Getting to a list of organisation references for IATI publishers


(Reid Porter) #8

Spoke with @andylolz about this briefly offline, and tbh I’m sure I don’t fully appreciate the use case here (though I trust there is one)…But…

  1. At a strategic level, I’m with @bill_anderson - this needs to be integrated with other infrastructure, sustainably financed, embedded with other complementary tools, etc. I return to my pub props - ketchup bottle, salt shaker, hot sauce (which then became water bottles at the last miniTAG): if we solve every problem individually, without tying those solutions together in some way, we’re increasing the maintenance burden for ourselves and adding to the overwhelmed feeling new publishers get when introduced to the ever-expanding list of tools and workarounds-as-tools. (I think this is aimed at a more forgiving clientele, but I’ll say it anyway.)

  2. Tactically, this seems very similar to, possibly overlapping in parts with, or at the very least tangentially related to the work @anjesh and Young Innovations are doing with the org data clean up, org-data API service, and AidStream UI integration. Anjesh can say more, and @TimDavies is involved in both, so I don’t think there’s risk of duplicating work, but wanted people to be aware since they’re working in proximity to each other. See more: https://github.com/younginnovations/aidstream-org-data


(Steven Flower) #9

I just want to take a slight step back here to the original question.

The reason (or use case) for this would be:

  • IATI publishing organisations are very likely to be talking about themselves and their involvement in activities
  • With traceability, a key and vital consideration is that organisations can talk about each other in consistent and unique ways (GB-GOV-1 rather than DFID or DfID or D.F.I.D)
  • IATI publishers self-identify. If we can be certain about what they use to identify themselves, then that can be useful to others, for reuse

The intention here is not to solve wider issues about the canonical reference for organisations outside of the 555 (now 592) publishers, right now. We just wanted to test and find the most reliable source for being confident of the reference for the publishers. It turned out that is the organisation file, published by these very same organisations!

In terms of org-id.guide: right now, that is a service to check/verify the first part of any reference (the GB-GOV of GB-GOV-1, for example). We used this in the methodology above. org-id.guide is not currently about hosting lists of specific organisation identifiers.
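
As a rough illustration of checking the first part of a reference, the sketch below splits an identifier into a prefix and a local part. The prefix set is a hand-typed sample (an assumption for illustration, not data pulled from org-id.guide), and `split_org_ref` is a hypothetical helper name.

```python
# Sample prefixes typed in by hand for illustration; a real check would use
# the full list of prefixes published by org-id.guide.
KNOWN_PREFIXES = {"GB-GOV", "GB-CHC", "XM-DAC"}


def split_org_ref(ref, prefixes=KNOWN_PREFIXES):
    """Return (prefix, local_part) if ref starts with a known prefix, else None."""
    ref = ref.strip()
    # Try the longest prefixes first so that a longer match wins.
    for prefix in sorted(prefixes, key=len, reverse=True):
        # Require a hyphen after the prefix and a non-empty local part.
        if ref.startswith(prefix + "-") and len(ref) > len(prefix) + 1:
            return prefix, ref[len(prefix) + 1:]
    return None


print(split_org_ref("GB-GOV-1"))  # ('GB-GOV', '1')
print(split_org_ref("DFID"))      # None
```

A check like this only says that the prefix is recognised; it says nothing about whether the full identifier belongs to a real organisation.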

The IATI Registry is a listing of organisations that publish data with the IATI format. We’ve a list, which is automated by the software underneath it, has an API underneath it, and is administered by the core @IATI-techteam. If the IATI Registry can serve out (perhaps along the methodology outlined above) the identifiers for the publishers registered on its own system, in an authoritative way, then that seems very helpful for people such as @Reichner and others.

Honestly, it’s really fantastic that @andylolz has built something, and that @anjesh and team are integrating org referencing tools into AidStream. But I also think building upon the core and common infrastructure we already have in place is well worth consideration for this particular question.


(Andy Lulham) #10

Registry metadata is available via API, and includes the publisher’s name and publisher’s organisation identifier. Here’s a random example. As both a user and a publisher, I’ve found the overlap between the publisher metadata on the registry versus the information in the organisation file super confusing.

Pulling organisation data from organisation files into the registry metadata would be awesome. I think that would achieve the thing you’re talking about here, @stevieflow. The registry archiver already does a similar job when it comes to last updated timestamps for datasets, so there’s precedent for this (albeit it’s currently being fixed!) What do you think to this change, @IATI-techteam?

Then the registry would be providing a publisher-maintained, centralised list of organisation identifiers, available via API. Adding endpoints to make it queryable by organisation name and/or organisation identifier would also be brilliant.
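
A minimal sketch of reading the publisher name and organisation identifier out of registry metadata, and indexing publishers by identifier. It works on a made-up sample record rather than a live call; the response shape and the field names `title` and `publisher_iati_id` are assumptions that should be checked against a real registry response.

```python
import json

# A trimmed, made-up publisher record shaped like a CKAN-style API response.
# The field names ("title", "publisher_iati_id") are assumptions here and
# should be checked against a real registry response.
SAMPLE_RESPONSE = json.loads("""
{
  "success": true,
  "result": {
    "name": "example_publisher",
    "title": "Example Publisher",
    "publisher_iati_id": "XX-EXAMPLE-123"
  }
}
""")


def publisher_identifier(response):
    """Return (publisher name, organisation identifier or None) for one record."""
    if not response.get("success"):
        raise ValueError("registry call did not succeed")
    record = response["result"]
    org_id = (record.get("publisher_iati_id") or "").strip()
    return record.get("title", ""), org_id or None


def index_by_identifier(responses):
    """Build {organisation identifier: publisher name}, skipping blank identifiers."""
    index = {}
    for response in responses:
        name, org_id = publisher_identifier(response)
        if org_id:
            index[org_id] = name
    return index


print(index_by_identifier([SAMPLE_RESPONSE]))
# {'XX-EXAMPLE-123': 'Example Publisher'}
```

An index like this is the kind of structure a "query by organisation identifier" endpoint could be served from; querying by name would need a second index keyed on the name.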


(Steven Flower) #11

Yes, agree.

(my emphasis) - but that’s the crucial bit for me.


(Andy Lulham) #12

Bumping this:

What do you think, @IATI-techteam?

I use https://andylolz.github.io/org-id-finder/ quite often. It would be great if the registry could provide this service directly.


(Steven Flower) #13

+1

I see this as core infrastructure, and based on the needs of publishers.


(Anjesh Tuladhar) #14

Hi Andy and all,

I am a bit late to the discussion here. As Reid mentioned earlier in the thread, we are doing something similar in and for AidStream users, where we are taking data from org-xml files and the publishers list in the registry. We are only consuming data that passes the org-id.guide criteria or is present in iati-org-codelist; the rest is ignored even if it is included in org-xml files. We want this to be a controlled list instead of solely consuming org-xml files. There are a number of issues with org-xml files which might give wrong info to the users. I randomly typed DFID and got this:
(image) Apparently this XML has that id: https://aidstream.org/files/xml/stromme_ug-org.xml

We are putting extra eyes on this to avoid situations like this one, but there is still a chance of missing some, especially as the number of orgs increases. So we call for suggestions from users as well to improve the data, such as providing alternative names for organisations so that searching for DFID also gives results here: http://api.stage.aidstream.org/organisation It’s far from perfect, but we hope that this will at least help the majority of AidStream users to improve a limited number of organisations to start with.
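
To make the alternative-names idea concrete, here is a small sketch of a search that matches on an organisation’s name or any of its aliases. The data structure and the `search` helper are hypothetical and hand-written for illustration; they are not taken from the AidStream service.

```python
# Hand-written sample data for illustration only.
ORGANISATIONS = [
    {
        "id": "GB-GOV-1",
        "name": "Department for International Development",
        "aliases": ["DFID", "DfID", "D.F.I.D"],
    },
]


def normalise(text):
    """Lower-case the text and drop everything except letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def search(query, organisations=ORGANISATIONS):
    """Return identifiers whose name or alias equals the query after normalising."""
    key = normalise(query)
    if not key:
        return []
    return [
        org["id"]
        for org in organisations
        if any(key == normalise(label) for label in [org["name"], *org["aliases"]])
    ]


print(search("d.f.i.d"))  # ['GB-GOV-1']
```

Normalising before comparing means one stored alias covers several spellings, so the alias list stays short.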

We are releasing this as part of a new AidStream feature solely targeting the participating-organisations data.

I would be very happy to collaborate and see how we can combine our forces on org-data.

Best
Anjesh.


(Andy Lulham) #15

Nice! Thanks for sharing, @anjesh! I’m excited about something like this being baked into AidStream.

Just to respond on this point:

So, I did it this way by design, mostly because I don’t have the time or desire to take ownership of someone else’s data issues :) Funnily enough, I did exactly the same search as you last week, and found the same data issue. But instead of taking responsibility for the problem and fixing it myself, I was able to trace where the problem was, by clicking the source link:


I reported the issue (via zendesk) last week, and it’s currently with the publisher in question to fix.

Admittedly, that doesn’t help users in the meantime – the data is bad, and remains bad until the publisher fixes it. But once it’s fixed, it’s fixed for everyone. I’d encourage you to also bubble up the data issues you find back to the publishers.


(Steven Flower) #16

At the “Mini developers TAG” meeting today, I heard several people (I think) reiterate the need for a canonical list of verified organisation references (my words). I pointed to this thread on twitter, but want to flag again.

I also wanted to reiterate that the method we went through confirmed (as @anjesh describes) that the Organisation XML files seem to be our best initial source of these references. This isn’t an ID for every single organisation mentioned in IATI data, but it is a start.

And - I’m going to do that thing of tagging people I heard say (or at least listen to!) this: @rolfkleef @pelleaardema @Herman @siemvaessen @hayfield @bill_anderson @Imogen_Kutz @r_clements @JohnAdams


(Herman van Loon) #17

A canonical list of activity ids would also be very helpful for implementing validation of references to other activities. The lack of both canonical org id and activity id lists as part of the IATI infrastructure causes quite some headaches and duplication of effort when doing very basic data validation checks.


(Siem Vaessen) #18

So, who will be managing this list to the extent that anyone can trust this list for it to be codified into a codelist?


(Herman van Loon) #19

Any working solution should not require manual intervention, since new organisations and activities are frequently added. What would be helpful is for the existing organisation and activity codes to be automatically extracted from the current IATI XML publications. These lists can then be used to validate whether referenced organisation ids and activity ids actually exist.

Trust is implicit since an IATI publisher is responsible for its own data. So if you as a publisher publish an activity with a certain identifier, that is by definition the truth, since you own the data for that activity.
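
A minimal sketch of that extract-then-validate idea, run on two tiny made-up XML snippets rather than real publications. It assumes 2.x element names (`organisation-identifier`, `iati-identifier`, `participating-org/@ref`); the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny made-up documents, just large enough to exercise the functions below.
ORG_XML = """
<iati-organisations version="2.02">
  <iati-organisation>
    <organisation-identifier>GB-GOV-1</organisation-identifier>
  </iati-organisation>
</iati-organisations>
"""

ACTIVITY_XML = """
<iati-activities version="2.02">
  <iati-activity>
    <iati-identifier>GB-GOV-1-EXAMPLE-1</iati-identifier>
    <participating-org ref="GB-GOV-1" role="1"/>
    <participating-org ref="DFID" role="4"/>
  </iati-activity>
</iati-activities>
"""


def element_texts(xml_text, tag):
    """Collect the stripped, non-empty text of every element named `tag`."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter(tag) if el.text and el.text.strip()}


def org_ids(org_xml):
    """Organisation identifiers declared in an organisation file."""
    return element_texts(org_xml, "organisation-identifier")


def activity_ids(activity_xml):
    """Activity identifiers declared in an activity file."""
    return element_texts(activity_xml, "iati-identifier")


def unknown_org_refs(activity_xml, known_orgs):
    """List (activity id, ref) pairs whose participating-org ref is not known."""
    root = ET.fromstring(activity_xml)
    missing = []
    for activity in root.iter("iati-activity"):
        act_id = (activity.findtext("iati-identifier") or "").strip()
        for org in activity.iter("participating-org"):
            ref = (org.get("ref") or "").strip()
            if ref and ref not in known_orgs:
                missing.append((act_id, ref))
    return missing


print(unknown_org_refs(ACTIVITY_XML, org_ids(ORG_XML)))
# [('GB-GOV-1-EXAMPLE-1', 'DFID')]
```

The same pattern would apply to references between activities: build the set with `activity_ids` and check each reference against it.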


(Andy Lulham) #20

For clarity: In what sense does this thing fail to do what you’re looking for, @stevieflow / @siemvaessen / @Herman?

^^ I agree with this! That’s why it’s exactly how this thing works (for org IDs, at least.) For activity IDs, I guess you need to look to a datastore like this one (though I’m afraid I have no way to judge its trustworthiness!)

I guess you’re talking about the secretariat either funding, managing, running or endorsing this somehow. Is that right?


(Siem Vaessen) #21

I guess you’re talking about the secretariat either funding, managing, running or endorsing this somehow. Is that right?

@andylolz yes.


(Steven Flower) #22

Please can we take out of this conversation mention of:

  • references to any organisation in the IATI corpus
  • a list of activity identifiers

This isn’t what is being talked about. One thing at a time, folks!

The remit of the original focus was to get a list of organisation references for IATI publishers. Whilst that sounds straightforward, we realised it isn’t always so. We then found that the data published via the Organisation standard was the optimal source.

I agree. This is why the methodology above took a look at the org data already provided by publishers. Any solution could continue to focus on this data as a source.

The interface to this is very useful. It’d be great to have a similar interface to many other “codes”! However, I think some organisations would also appreciate the data as a list, or in a format that could be imported into their applications. So this application doesn’t fail anything – it’s just a step on from there being a list!!

It’s probably important to consider this isn’t a list in the same way as other lists we use. It’s not a list of countries, that changes via formal announcements, for example.

This is a list of IATI publisher references that will a) increase regularly / at no set pattern b) be contributed to by each publisher, via their org file (so no central sanction of the list) c) have the possibility of changing, should a publisher change their org reference (which is theoretically possible – a UN agency may decide to add the XM-DAC prefix, for example.)

Therefore, the important step seems to be:

  • agreeing that the route to canonical references for IATI publishers is via the org file
  • inspecting the code we shared, to ensure that provides this
  • running / hosting that code to provide an updated/ongoing version of this list

I’d imagine the service to provide ongoing access to the contents of this list would be something considered core to the IATI infrastructure.

I think this can be a community effort, with backing from the Secretariat. We also have a built-in metric, in that we can actively track those organisations that do not provide an org file / matching references in their org file. This metric might be something the Secretariat could take on.
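
As a sketch of what that metric could look like, the function below compares the identifier each publisher declares on the registry with the one found in their org file. The input shape (two dictionaries keyed by a publisher name) and the sample values are assumptions for illustration.

```python
def org_file_coverage(registry_ids, org_file_ids):
    """Sort publishers into matching / mismatched / no_org_file.

    registry_ids:  {publisher: identifier declared on the registry}
    org_file_ids:  {publisher: identifier found in the publisher's org file}
    """
    report = {"matching": [], "mismatched": [], "no_org_file": []}
    for publisher, declared in sorted(registry_ids.items()):
        if publisher not in org_file_ids:
            report["no_org_file"].append(publisher)
        elif org_file_ids[publisher] == declared:
            report["matching"].append(publisher)
        else:
            report["mismatched"].append(publisher)
    return report


# Made-up sample data.
registry = {"pub_a": "GB-GOV-1", "pub_b": "XX-EXAMPLE-123", "pub_c": "XX-EXAMPLE-456"}
org_files = {"pub_a": "GB-GOV-1", "pub_b": "XX-EXAMPLE-999"}

report = org_file_coverage(registry, org_files)
print(report)
# {'matching': ['pub_a'], 'mismatched': ['pub_b'], 'no_org_file': ['pub_c']}
```

Counting the `matching` list against the total number of publishers gives a headline figure that could be tracked over time.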


(Andy Lulham) #23

The API is linked in the footer. You can use that to get a snapshot list of IATI org IDs, in CSV, JSON or atom.

I’ll change the footer links to make this more obvious.
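
For anyone wanting to import a CSV snapshot into their own application, a minimal sketch follows. The column names `org_id` and `name` and the sample rows are assumptions for illustration; the real export may use different headers.

```python
import csv
import io

# Made-up sample in place of a downloaded snapshot; the column names are
# assumptions and may not match the real export.
SAMPLE_CSV = """org_id,name
GB-GOV-1,Department for International Development
XX-EXAMPLE-123,Example Publisher
"""


def load_lookup(csv_text, id_column="org_id", name_column="name"):
    """Return {organisation identifier: name} from CSV text, skipping blank ids."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lookup = {}
    for row in reader:
        org_id = (row.get(id_column) or "").strip()
        if org_id:
            lookup[org_id] = (row.get(name_column) or "").strip()
    return lookup


lookup = load_lookup(SAMPLE_CSV)
print(lookup.get("GB-GOV-1"))  # Department for International Development
print("DFID" in lookup)        # False
```

A dictionary like this plays the same role as a spreadsheet lookup: given an identifier, return the publisher name, or flag that the identifier is not in the snapshot.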


Why does 2.02 include a code list that was not supported since 1.04?
(Herman van Loon) #24

Great API! Will this be ‘officially’ maintained?


(Anna Petruccelli) #27

Hi All

A bit late to this conversation, but I just wanted to say that I find @andylolz’s API super helpful - thanks Andy! I work for a funder and am currently in the process of publishing about 200 activities (i.e. grants) - I started looking for organisations’ ids by scrolling up and down the IATI Registry, then discovered this tool (thanks to @stevieflow) and almost halved the time it took me to look up orgs on the Registry.

I have also published our org file so the stat is now 407 out of 555 publishers!

Agree this should be maintained centrally, as a tool IATI should offer to encourage and support people to publish. I also agree a list would be helpful for things like vlookups etc. Tools like this can improve the quality of the data massively!

And I know this is slightly separate to this thread but an activity finder, based on the same concept and maintained centrally, would be amazing - it would be also useful to ensure that aid flows are tracked properly (by enabling publishers to check that they’re using the right activity id)


(Herman van Loon) #28

When using APIs in production applications, their continuity must be guaranteed. @bill_anderson would it therefore be possible to have this API of @andylolz defined as an IATI core service (to be hosted either by the IATI technical team or by a third party under the supervision of the IATI technical team)?


(Andy Lulham) #29

The source code is available here:

It’s MIT licensed, so if someone wants to take it, rebrand it, repurpose it… I’m fine with that.