Getting to a list of organisation references for IATI publishers

Question: where can I get a list of organisation references for IATI publishers?

Answer - here’s a list, which took some steps to produce

Sounds simple? Well, it took us a few steps to get to this (big thanks to Ben Webb - IATI Secretariat):

We first looked at the IATI Registry. Each publisher has a free-text field for their organisation reference. We found this to be inconsistent and unreliable: the dashboard confirms this.
We then looked at Activity data. The reporting-org reference is useful here, but it is repeated across activities and sometimes differs. It seemed a lot of overhead to check 4500+ files for around 500 identifiers.
So, finally, we landed at the Organisation file. Big surprise!

Yes. When discovering the preferred organisation reference for an IATI publisher, we found the Organisation files to be the best source. Specifically, we looked for instances where:

reporting-org/@ref matches the identifier in organisation-identifier (or iati-identifier in versions 1.0x)

(it’s entirely possible and feasible to reference many organisations in a single org file, but we focused on matches between the publisher and org reference initially)
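As a rough illustration of the matching rule above, here is a minimal sketch in Python. It is not the code used for the original research: the function name and the sample XML are made up for illustration, and it only assumes the element names mentioned in the post (reporting-org/@ref, organisation-identifier, and iati-identifier for 1.0x files).

```python
import xml.etree.ElementTree as ET


def org_ref_matches(org_xml):
    """Return (reporting-org ref, matches?) for each iati-organisation element.

    Hypothetical helper: compares reporting-org/@ref with the organisation's
    own identifier, falling back to iati-identifier for 1.0x files.
    """
    root = ET.fromstring(org_xml)
    results = []
    for org in root.iter("iati-organisation"):
        reporting = org.find("reporting-org")
        ref = (reporting.get("ref") or "").strip() if reporting is not None else ""
        ident_el = org.find("organisation-identifier")
        if ident_el is None:
            # Versions 1.0x carried the identifier in iati-identifier instead
            ident_el = org.find("iati-identifier")
        ident = (ident_el.text or "").strip() if ident_el is not None else ""
        results.append((ref, bool(ref) and ref == ident))
    return results


# Made-up sample data: one matching organisation, one mismatch in 1.0x style
SAMPLE = """<iati-organisations version="2.02">
  <iati-organisation>
    <organisation-identifier>GB-GOV-1</organisation-identifier>
    <reporting-org ref="GB-GOV-1" type="10"/>
  </iati-organisation>
  <iati-organisation>
    <iati-identifier>XX-EXAMPLE-2</iati-identifier>
    <reporting-org ref="XX-EXAMPLE-9"/>
  </iati-organisation>
</iati-organisations>"""

print(org_ref_matches(SAMPLE))
```

Only the organisations where the second value is True would count towards the "matching" figures.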

We took a look into this (in July 2017), and found:

Of 555 IATI publishers, there were 406 IATI organisation files (73% of publishers provide an Org file)
Of these 406 publishers, 392 organisation identifiers match the reporting-org/@ref (97%)

So - when a publisher provides an Org file, it’s highly likely that this will be a definitive source for their organisation reference.

So far, so good.

Next, we took a look at the prefixes for these references, to understand if these were available via the org-id project (which IATI supports). On doing this, we found:

Of 392 “matching” organisation identifiers, 333 started with a “recognised” prefix (85%)

So again - the trend seems to be that definitive and standardised / useful organisation references are highly likely to be found via the IATI Org file.
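The prefix check could be sketched along these lines. This is only an illustration: the prefix set below is a tiny subset taken from examples in this discussion, not the real org-id list, and the identifiers are made up.

```python
# Illustrative subset only (assumption): the real list of recognised
# prefixes would come from the org-id project.
RECOGNISED_PREFIXES = {"GB-GOV", "XM-DAC"}


def has_recognised_prefix(identifier, prefixes=RECOGNISED_PREFIXES):
    """True if the identifier is a recognised prefix followed by '-' and more."""
    return any(
        identifier.startswith(prefix + "-") and len(identifier) > len(prefix) + 1
        for prefix in prefixes
    )


def count_recognised(identifiers, prefixes=RECOGNISED_PREFIXES):
    """Return (number with a recognised prefix, total number of identifiers)."""
    hits = sum(1 for i in identifiers if has_recognised_prefix(i, prefixes))
    return hits, len(identifiers)


# Made-up identifiers for illustration
sample = ["GB-GOV-1", "XM-DAC-12345", "DFID", "GB-GOVX-1"]
print(count_recognised(sample))
```

Requiring the trailing hyphen avoids treating something like GB-GOVX-1 as if it used the GB-GOV prefix.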

What Next?

It seems obvious:

all publishers should provide an Org file
the definitive organisation reference for a publisher should be maintained in the Org file

It doesn’t seem a lot of effort to get to nearly 100% coverage in the above metrics. It would be useful to hear thoughts from others.

Last notes:

this isn’t a list of all the organisations mentioned via IATI data: we wanted to focus initially on the growing list of publishers
how such a list (if it is a list - Tim Davies thinks it’s more of a cache) is published, maintained and used is another topic. In doing this research, we output the identifiers as a spreadsheet, but also looked at how the IATI Organisation standard could be used.

Comments (36)

Otto Reichner • 6 years ago

Great job. Very valuable work has been done, and it is exactly this kind of groundwork (common code lists, alignment and validation of data published) that ensures that IATI data can be used properly. I very much hope that this continues and that all publishers are requested to publish the org file with proper content. Mandatory validation of such basics is a must if IATI wants to achieve its challenging goals.

Vincent van 't Westende • 6 years ago

stevieflow:

all publishers should provide an Org file
the definitive organisation reference for a publisher should be maintained in the Org file

Totally agree. In your list there are also some empty names, due to orgs not reporting the “name” element in the org standard. It would be good to put emphasis on that too, imo. At the moment it is harder than it should be for toolbuilders to just create a correct list of publishers (aka org ref + names) to be used as a filter.

At the moment the publisher name often differs from the organisation standard name element, which sometimes differs from their activities reporting-org/narrative. It might not sound like a big problem but it leads to usability problems when a user searches for a different name than what the application uses as name (had questions about this multiple times on IATI Studio).

Steven Flower • 6 years ago

Otto Reichner Vincent van 't Westende many thanks

Yes. I didn’t mention it, but wonder if we can say that the Org file (when the above conditions are met) is the canonical source of references for organisations who publish IATI data.

We can also look to refresh the research and get an updated “list”, but would be useful to understand if this approach and data is going to be useful to others.

Steven Flower • 6 years ago

We could try to update and maintain this list as a community effort, but when speaking to others at the Members’ Assembly (Otto Reichner , Andy Lulham ) we did think this “list” might be best considered key infrastructure for IATI, and could potentially be output via the IATI Registry.

IATI Technical Team : any thoughts?

Bill Anderson • 6 years ago

I would prefer to see this as part of a shared, cross-standard list, maintained by (a sustainably funded) org-id.guide.

Taking this a step further I would like to see (a sustainably funded) org-id.guide provide a service that harvests all valid org-ids used in all places in all participating standards.

Andy Lulham • 6 years ago

Here’s a first pass at an org-id finder, using (more or less) the methodology described above: https://andylolz.github.io/org-id-finder/

What do you have in mind here, Steven Flower ? Does that roughly do the thing publishers need it to do?

Reid Porter • 6 years ago

Spoke with Andy Lulham about this briefly offline, and tbh I’m sure I don’t fully appreciate the use case here (though I trust there is one)…But…

At a strategic level, I’m with Bill Anderson - this needs to be integrated with other infrastructure, sustainably financed, embedded with other complementary tools, etc. I return to my pub props - ketchup bottle, salt shaker, hot sauce (which then became water bottles at the last miniTAG) - if we solve every problem individually, without tying those solutions together in some way, we’re increasing the maintenance burden for ourselves and adding to the overwhelmed feeling new publishers get when introduced to the ever-expanding list of tools and workarounds-as-tools. (I think this is aimed at a more forgiving clientele, but I’ll say it anyway.)
Tactically, this seems very similar, possibly overlapping in parts, but at the very least tangentially related to the work Anjesh Tuladhar and Young Innovations are doing with the org data clean up, org-data API service, and AidStream UI integration. Anjesh can say more, and Tim Davies is involved in both, so I don’t think there’s risk of duplicating work, but wanted people to be aware since they’re working in proximity to each other. See more: https://github.com/younginnovations/aidstream-org-data

Steven Flower • 6 years ago

I just want to take a slight step back here to the original question

stevieflow:

where can I get a list of organisation references for IATI publishers?

The reason (or use case) for this would be:

IATI publishing organisations are very likely to be talking about themselves and their involvement in activities
With traceability, a key and vital consideration is that organisations can talk about each other in consistent and unique ways (GB-GOV-1 rather than DFID or DfID or D.F.I.D)
IATI publishers self-identify. If we can be certain about what they use to identify themselves, then that can be useful to others, for reuse

The intention here is not to solve wider issues about the canonical reference for organisations outside of the 555 (now 592) publishers, right now. We just wanted to test and find the most reliable source for being confident of the reference for the publishers. It turned out that is the organisation file, published by these very same organisations!

In terms of org-id.guide. Right now, that is a service to check/verify the first part of any reference (the GB-GOV or GB-GOV-1, for example). We used this in the methodology above. org-id.guide is not currently about hosting lists of specific organisation identifiers.

The IATI Registry is a listing of organisations that publish data with the IATI format. We’ve a list, which is automated by the software underneath it, has an API underneath it, and is administered by the core IATI Technical Team . If the IATI Registry can serve out (perhaps along the methodology outlined above) the identifiers for the publishers registered on its own system, in an authoritative way, then that seems very helpful for people such as Otto Reichner and others.

Honestly, it’s really fantastic that Andy Lulham has built something, and that Anjesh Tuladhar and team are integrating org referencing tools into AidStream. But I also think building upon the core and common infrastructure we already have in place is well worth consideration for this particular question.

Andy Lulham • 6 years ago

Registry metadata is available via API, and includes the publisher’s name and publisher’s organisation identifier. Here’s a random example. As both a user and a publisher, I’ve found the overlap between the publisher metadata on the registry versus the information in the organisation file super confusing.

Pulling organisation data from organisation files into the registry metadata would be awesome. I think that would achieve the thing you’re talking about here, Steven Flower . The registry archiver already does a similar job when it comes to last updated timestamps for datasets, so there’s precedent for this (albeit it’s currently being fixed!) What do you think to this change, IATI Technical Team ?

Then the registry would be providing a publisher-maintained, centralised list of organisation identifiers, available via API. Adding endpoints to make it queryable by organisation name and/or organisation identifier would also be brilliant.

Steven Flower • 6 years ago

Yes, agree.

andylolz:

a publisher-maintained, centralised list of organisation identifiers

(my emphasis) - but that’s the crucial bit for me.

Andy Lulham • 6 years ago

Bumping this:

andylolz:

Pulling organisation data from organisation files into the registry metadata would be awesome. I think that would achieve the thing you’re talking about here, Steven Flower . The registry archiver already does a similar job when it comes to last updated timestamps for datasets, so there’s precedent for this […] What do you think to this change, IATI Technical Team ?

Then the registry would be providing a publisher-maintained, centralised list of organisation identifiers, available via API.

What do you think, IATI Technical Team ?

I use https://andylolz.github.io/org-id-finder/ quite often. It would be great if the registry could provide this service directly.

Anjesh Tuladhar • 6 years ago

Hi Andy and all,

I am a bit late to the discussion here. As Reid mentioned earlier in the thread, we are doing something similar in and for AidStream users, where we are taking data from org xml and the publishers list in the registry. We are only consuming data that passes org-id.guide criteria or is present in iati-org-codelist; the rest is ignored even if it is included in org-xmls. We want this to be a controlled list instead of solely consuming org-xml files. There are a number of issues with org-xml files which might give wrong info to the users. I randomly typed DFID and got this
Apparently this xml has that id https://aidstream.org/files/xml/stromme_ug-org.xml

We are putting extra eyes on this to avoid situations like this, but there is still a chance of missing some, especially as the number of orgs increases. So we call for suggestions from users as well to improve the data, like providing alternative names for organisations so that searching for DFID also gives results (see http://api.stage.aidstream.org/organisation). It’s far from perfect, but we hope that this will at least help the majority of AidStream users to improve a limited number of organisations to start with.

We are releasing this as part of a new AidStream feature solely targeting the participating-organisations data.

I would be very happy to collaborate and see how we can combine our forces on org-data.

Best
Anjesh.

Andy Lulham • 6 years ago

Nice! Thanks for sharing, Anjesh Tuladhar! I’m excited about something like this being baked into AidStream.

Just to respond on this point:

anjesh:

I randomly typed DFID and got this

Apparently this xml has that id https://aidstream.org/files/xml/stromme_ug-org.xml

So, I did it this way by design, mostly because I don’t have the time or desire to take ownership of someone else’s data issues. Funnily enough, I did exactly the same search as you last week, and found the same data issue. But instead of taking responsibility for the problem and fixing it myself, I was able to trace where the problem was, by clicking the source link:

I reported the issue (via zendesk) last week, and it’s currently with the publisher in question to fix.

Admittedly, that doesn’t help users in the meantime – the data is bad, and remains bad until the publisher fixes it. But once it’s fixed, it’s fixed for everyone. I’d encourage you to also bubble up the data issues you find back to the publishers.

Steven Flower • 6 years ago

At the “Mini developers TAG” meeting today, I heard several people (I think) reiterate the need for a canonical list of verified organisation references (my words). I pointed to this thread on Twitter, but want to flag it again.

I also wanted to reiterate that the method we went through confirmed (as Anjesh Tuladhar describes) that the Organisation XML files seem to be our best initial source of these references. This isn’t an ID for every single organisation mentioned in IATI data, but it is a start.

And - I’m going to do that thing of tagging people I heard say (or at least listen to!) this: Rolf Kleef Pelle Aardema Herman van Loon [~379] Hayden Field Bill Anderson Imogen Kutz r_clements John Adams

Steven Flower • 6 years ago

I see this as core infrastructure, and based on the needs of publishers.

Herman van Loon • 6 years ago

A canonical list of activity ids would also be very helpful for implementing validation of references to other activities. The lack of both canonical org id and activity id lists as part of the IATI infrastructure causes quite some headaches and duplication of effort when doing very basic data validation checks.

Anonymous • 6 years ago

So, who will be managing this list to the extent that anyone can trust this list for it to be codified into a codelist?

Herman van Loon • 6 years ago

Any working solution should not have manual intervention, since new organizations and activities are frequently added. What would be helpful is that the existing organization and activity codes are automatically extracted from the current IATI XML publications. Then these lists can be used to validate whether references to organisation ids and activity ids actually exist.

Trust is implicit since an IATI publisher is responsible for its own data. So if you as a publisher publish an activity with a certain identifier, that is by definition the truth, since you own the data for that activity.
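A minimal sketch of that "extract, then validate" idea for organisation ids, in Python. The function names and sample XML are hypothetical; the only assumptions about the data are the organisation-identifier element in org files and the ref attribute on participating-org in activity files.

```python
import xml.etree.ElementTree as ET


def known_org_ids(org_xml_docs):
    """Collect every non-empty organisation-identifier found in the org files."""
    known = set()
    for doc in org_xml_docs:
        for el in ET.fromstring(doc).iter("organisation-identifier"):
            if el.text and el.text.strip():
                known.add(el.text.strip())
    return known


def unknown_refs(activity_xml, known):
    """List participating-org/@ref values that are not in the extracted set."""
    refs = []
    for el in ET.fromstring(activity_xml).iter("participating-org"):
        ref = (el.get("ref") or "").strip()
        if ref and ref not in known:
            refs.append(ref)
    return refs


# Made-up documents for illustration
ORG_DOCS = [
    "<iati-organisations><iati-organisation>"
    "<organisation-identifier>GB-GOV-1</organisation-identifier>"
    "</iati-organisation></iati-organisations>"
]
ACTIVITY = """<iati-activities><iati-activity>
  <participating-org ref="GB-GOV-1" role="1"/>
  <participating-org ref="DFID" role="1"/>
  <participating-org role="4"/>
</iati-activity></iati-activities>"""

print(unknown_refs(ACTIVITY, known_org_ids(ORG_DOCS)))
```

Nothing here is maintained by hand: the set of known ids is rebuilt from whatever the publishers have published.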

Andy Lulham • 6 years ago

For clarity: In what sense does this thing fail to do what you’re looking for, Steven Flower / [~379] / Herman van Loon ?

Herman:

Any working solution should not have manual intervention, since new organizations and activities are frequently added. What would be helpful is that the existing organization and activity codes are automatically extracted from the current IATI XML publications. Then these lists can be used to validate whether references to organisation ids and activity ids actually exist.

^^ I agree with this! That’s why it’s exactly how this thing works (for org IDs, at least.) For activity IDs, I guess you need to look to a datastore like this one (though I’m afraid I have no way to judge its trustworthiness!)

siemvaessen:

So, who will be managing this list to the extent that anyone can trust this list for it to be codified into a codelist?

I guess you’re talking about the secretariat either funding, managing, running or endorsing this somehow. Is that right?

Anonymous • 6 years ago

siemvaessen:

So, who will be managing this list to the extent that anyone can trust this list for it to be codified into a codelist?

I guess you’re talking about the secretariat either funding, managing, running or endorsing this somehow. Is that right?

Andy Lulham yes.

Steven Flower • 6 years ago

Please can we take out of this conversation mention of:

references to any organisation in the IATI corpus
a list of activity identifiers

This isn’t what is being talked about. One thing at a time, folks!

The remit of the original focus was to get a list of organisation references for IATI publishers. Whilst that sounds straightforward, we realised it isn’t always so. We then found that the data published via the Organisation standard was the optimal source.

Herman:

Any working solution should not have manual intervention

I agree. This is why the methodology above took a look at the org data already provided by publishers. Any solution could continue to focus on this data as a source.

andylolz:

In what sense does this thing fail to do what you’re looking for

The interface to this is very useful. It’d be great to have a similar interface to many other “codes”! However, I think some organisations would also appreciate the data as a list, or in a format that could be imported into their applications. So this application doesn’t fail anything – it’s just a step on from there being a list!!

siemvaessen:

So, who will be managing this list to the extent that anyone can trust this list for it to be codified into a codelist?

It’s probably important to consider this isn’t a list in the same way as other lists we use. It’s not a list of countries, that changes via formal announcements, for example.

This is a list of IATI publisher references that will a) increase regularly / at no set pattern b) be contributed to by each publisher, via their org file (so no central sanction of the list) c) have the possibility of changing, should a publisher change their org reference (which is theoretically possible – a UN agency may decide to add the XM-DAC prefix, for example.)

Therefore, the important step seems to be:

agreeing that the route to canonical references for IATI publishers is via the org file
inspecting the code we shared, to ensure that provides this
running / hosting that code to provide an updated/ongoing version of this list

I’d imagine the service to provide ongoing access to the contents of this list would be something considered core to the IATI infrastructure.

I think this can be a community effort, with backing from the Secretariat. We also have an in-built metric, in that we can actively track those organisations that do not provide an org file / matching references in their org file. This metric might be something the Secretariat could take on.

Andy Lulham • 6 years ago

stevieflow:

However, I think some organisations would also appreciate the data as a list, or in a format that could be imported into their applications. So this application doesn’t fail anything – it’s just a step on from there being a list!!

The API is linked in the footer. You can use that to get a snapshot list of IATI org IDs, in CSV, JSON or atom.

I’ll change the footer links to make this more obvious.
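For anyone wanting to import such a snapshot into their own application, a rough Python sketch follows. The column names (org_id, name) and the sample rows are assumptions for illustration only, not the actual format returned by the API.

```python
import csv
import io

# Made-up snapshot; the real export's column names may differ (assumption)
SNAPSHOT = (
    "org_id,name\n"
    "GB-GOV-1,Example Department\n"
    "XX-EXAMPLE-2,Example Foundation\n"
)


def load_lookup(csv_text):
    """Build an {org_id: name} dictionary from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["org_id"]: row["name"] for row in reader}


def find_by_name(lookup, query):
    """Case-insensitive substring search over organisation names."""
    needle = query.lower()
    return [(org_id, name) for org_id, name in lookup.items() if needle in name.lower()]


lookup = load_lookup(SNAPSHOT)
print(lookup.get("GB-GOV-1"))
print(find_by_name(lookup, "foundation"))
```

A local lookup like this is only as fresh as the snapshot it was built from, so it would need refreshing as new publishers register.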

Herman van Loon • 6 years ago

Great API! Will this be ‘officially’ maintained?

Anna Petruccelli • 6 years ago

Hi All

A bit late to this conversation, but I just wanted to say that I find Andy Lulham’s API super helpful - thanks Andy! I work for a funder and am currently in the process of publishing about 200 activities (i.e. grants). I started looking for organisations’ ids by scrolling up and down the IATI Registry, then discovered this tool (thanks to Steven Flower) and almost halved the time it took me to look up orgs on the Registry.

I have also published our org file so the stat is now 407 out of 555 publishers!

I agree this should be maintained centrally, as a tool IATI should offer to encourage and support people to publish, and agree that a list would also be helpful for things like vlookups etc. Tools like this can improve the quality of the data massively!

And I know this is slightly separate to this thread but an activity finder, based on the same concept and maintained centrally, would be amazing - it would be also useful to ensure that aid flows are tracked properly (by enabling publishers to check that they’re using the right activity id)

Herman van Loon • 5 years ago

When using APIs in production applications, their continuity must be guaranteed. Bill Anderson, would it therefore be possible to have this API of Andy Lulham’s defined as an IATI core service (to be hosted either by the IATI technical team or by a third party under the supervision of the IATI technical team)?

Andy Lulham • 5 years ago

The source code is available here:
GitHub andylolz/org-id-finder

OLD REPO. This is just here to redirect to the new location - andylolz/org-id-finder

It’s MIT licensed, so if someone wants to take it, rebrand it, repurpose it… I’m fine with that.

Andy Lulham • 4 years ago

An update on the figures from the original post, two years on.

Of 555 IATI publishers, there were 406 IATI organisation files (73% of publishers provide an Org file)

We’re now up to 1,024 IATI publishers. 694 of those have an IATI organisation file. That’s 68% - so the numbers have gone up, but the percentage has slipped a little.

Of these 406 publishers, 392 organisation identifiers match the reporting-org/@ref (97%)

Of the 694 publishers with organisation files, 678 of the organisation-identifiers in those org files match the reporting-org/@ref (98%).

Of 392 “matching” organisation identifiers, 333 started with a “recognised” prefix (85%)

Of the 678 “matching” organisation identifiers, 637 start with a “recognised” prefix (94%). I guess that means new publishers are choosing (or being given) valid org IDs, and some existing publishers have updated their org IDs to the new system. That’s excellent.

It looks like there are also notable improvements to the registry metadata. I suspect the IATI Technical Team have pushed to improve this, so that is great. (Recent changes to the registry ensuring the org ID can only be modified on request will certainly also help with this.)

First of all, 100% of organisation identifiers in these 678 org files match the organisation identifiers in the registry metadata. Steven Flower and Ben Webb - IATI Secretariat didn’t include this figure last time so we don’t have it for comparison, but I suspect it wasn’t this high.

958 of the 1,024 org identifiers in the registry metadata match the reporting-org/@ref (94%).

Of these 958 “matching” organisation identifiers, 898 start with a recognised prefix (94%).
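For anyone wanting to check the arithmetic, the percentages quoted above can be recomputed from the counts in this post and the original one. This is just a sketch; the labels are informal descriptions, and the numbers are copied from the text.

```python
# (count, out of) pairs copied from the figures quoted above
FIGURES = {
    "2017: publishers with an org file": (406, 555),
    "2017: org files matching reporting-org/@ref": (392, 406),
    "2017: matching ids with a recognised prefix": (333, 392),
    "now: publishers with an org file": (694, 1024),
    "now: org files matching reporting-org/@ref": (678, 694),
    "now: matching ids with a recognised prefix": (637, 678),
    "now: registry ids matching reporting-org/@ref": (958, 1024),
    "now: registry matches with a recognised prefix": (898, 958),
}


def percentage(part, whole):
    """Share of `whole` made up by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


for label, (part, whole) in FIGURES.items():
    print(f"{label}: {part}/{whole} = {percentage(part, whole)}%")
```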

Steven Flower • 4 years ago

Many thanks for this timely update Andy Lulham

It does represent excellent news in terms of the growth of the number and quality of the organisation references available. Thanks to the IATI Technical Team for pushing this forward for us all.

I’m unsure what the “recent changes to the registry” are, but these also seem welcome. Is there an announcement of these changes anywhere Wendy Thomas ?

Andy Lulham can you confirm that the service you voluntarily host remains up to date / in sync (I think it’s automagically so)?

From all this, there look to be four questions for us all to address:

Given that the registry itself now contains a high match (94%) of org references to the reporting-org/@ref data, is this to be the preferred methodology?
If so, where does that leave the purpose of the organisation file? Our original point was to highlight that the org file could be the perfect place to maintain a single source of truth in terms of a publisher org reference (and less dependent on the registry itself, which might be a use case for data users who download the whole corpus)
What do we do about the fact that 32% of publishers (330) do not publish an organisation file?
Whilst org references are available in a list on the registry, do we need to ensure services such as Andy Lulham’s org id finder are core, central and easily available to all?

Andy Lulham • 4 years ago

stevieflow:

I’m unsure what the “recent changes to the registry” are, but these also seem welcome.

Sorry, I should have provided a link. I was referring to changes to the registry that prevent publishers from changing their IATI publisher and Org IDs.

stevieflow:

Andy Lulham can you confirm that the service you voluntarily host remains up to date / in synch ( I think it’s automagically so)?

Indeed yes. The footer currently says it was last updated 9 hours ago.

I’m tempted to change the methodology that org-id-finder uses, to instead use registry metadata. This would make it much simpler. But it would be good to get confirmation that this is now the most reliable source.
