Registry publisher_iati_id inconsistencies

(Vincent van 't Westende) #1

A while ago I found a publisher that was publishing an activity under the IATI identifier GB-1-123456 (one of DFID’s activities). The publisher was publishing through CSV2IATI and had probably misunderstood the guidance, which is to reference GB-1-123456 as the provider-activity-id on the DFID-funded transactions within its own activity.

Of course we wanted to invalidate this data as it was not DFID’s. I could do that quite easily because the reporting-org ref was set correctly: when an activity’s identifier doesn’t use its reporting-org ref as a prefix, I just invalidate the activity.
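A minimal sketch of that prefix check, assuming activities arrive as IATI XML with an `iati-identifier` element and a `reporting-org` element carrying a `ref` attribute. The sample data, the function names, and the organisation ID `NL-KVK-00000000` are hypothetical, and the strict "ref followed by a hyphen" rule is an assumption that may be too strict for some publishers.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first activity reuses a DFID identifier,
# the second is prefixed with its own reporting-org ref.
SAMPLE = """<iati-activities>
  <iati-activity>
    <iati-identifier>GB-1-123456</iati-identifier>
    <reporting-org ref="NL-KVK-00000000"/>
  </iati-activity>
  <iati-activity>
    <iati-identifier>NL-KVK-00000000-A1</iati-identifier>
    <reporting-org ref="NL-KVK-00000000"/>
  </iati-activity>
</iati-activities>"""


def has_reporting_org_prefix(activity):
    """True if the activity identifier starts with '<reporting-org ref>-'."""
    identifier = (activity.findtext("iati-identifier") or "").strip()
    reporting_org = activity.find("reporting-org")
    ref = ""
    if reporting_org is not None:
        ref = (reporting_org.get("ref") or "").strip()
    # An empty ref can never vouch for an identifier.
    return bool(ref) and identifier.startswith(ref + "-")


def invalid_identifiers(xml_text):
    """Return the identifiers of activities that fail the prefix check."""
    root = ET.fromstring(xml_text)
    return [
        (activity.findtext("iati-identifier") or "").strip()
        for activity in root.iter("iati-activity")
        if not has_reporting_org_prefix(activity)
    ]


print(invalid_identifiers(SAMPLE))
```

This check only works when the reporting-org ref itself can be trusted, which is the problem raised next.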

It becomes more error-prone if the reporting-org is also set incorrectly: how do I know who’s telling the truth then? I don’t expect anyone to do this on purpose, but I’ve seen odder mistakes in IATI and try to avoid assumptions where possible.

For this we need a validation check that the activity’s reporting-org is the publisher of the dataset (are there any edge cases here?). This is possible by using the publisher_iati_id key on the IATI Registry CKAN API endpoint to get all publishers:
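As a rough sketch of such a check, assuming the publisher records have already been fetched from the Registry and reduced to a dict keyed by publisher name: only the `publisher_iati_id` key comes from the post above, while the dict shape, the publisher names, the IDs, and the function name are hypothetical.

```python
# Hypothetical, already-fetched Registry publisher records keyed by publisher name.
REGISTRY = {
    "example_org": {"publisher_iati_id": "NL-KVK-00000000"},
    "blank_org": {"publisher_iati_id": ""},
}


def check_reporting_org(dataset_publisher, reporting_org_ref, registry_publishers):
    """Compare an activity's reporting-org ref with the Registry record.

    Returns "match", "mismatch", or "unknown" (when the publisher is not in
    the Registry data or has no publisher_iati_id filled in).
    """
    record = registry_publishers.get(dataset_publisher) or {}
    registry_id = (record.get("publisher_iati_id") or "").strip()
    if not registry_id:
        return "unknown"
    if registry_id == reporting_org_ref.strip():
        return "match"
    return "mismatch"


print(check_reporting_org("example_org", "NL-KVK-00000000", REGISTRY))
print(check_reporting_org("example_org", "GB-1", REGISTRY))
```

Returning "unknown" separately from "mismatch" keeps a missing Registry value from being treated as proof that the data is wrong.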

Though unfortunately, this is not always set correctly:

Another scenario where this would help: I’m building an IATI publishing tool and would like to make it easier for new publishers to start publishing correctly. I ask them to create an IATI Registry account and, after registration, ask for their IATI Registry user name and API key. Based on that I fetch the publisher with its IATI ID, validate the user, and automatically set the reporting-org correctly so they can’t make any errors there.


- Is this a shared need?

- How can we keep the IATI Registry publisher_iati_id in sync? It seems that quite a few orgs did not fill in the publisher_iati_id correctly on registration, but afterwards did use the correct one in their reporting-org (see the gdoc). @dalepotter @hayfield can this be included in the new validator?

(Andy Lulham) #2

Good point @VincentVW! (and very useful spreadsheet!)

I agree it would be useful to at least have a canonical set of publisher_iati_ids in publishers’ metadata, to validate against.

Am I correct in thinking the Org IDs project can help check the publisher_iati_ids? I’ve used it to find the relevant APIs, and have added a few comments to your spreadsheet with URLs to verify IDs, just as a proof of concept.

(SJohns) #3

Hi, this spreadsheet is really useful, thank you! There is a similar function on the IATI Dashboard, where it shows whether an organisation’s ID on the Registry and in their data match.

I think something to be aware of is that some publishing tools (e.g. AidStream) help you build the organisation identifier based on the rules: the tool asks you to identify your country of registration, then the registration authority you are registered under (using the centralised list of registration authorities), then your registration number/code, and then creates your organisation ID for you. It also asks you whether you are duplicating an organisation entry. We’ve seen this cut down the number of mistakes made by organisations publishing recently.
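To illustrate the kind of guided construction described here, a minimal sketch follows. It is not AidStream’s actual implementation: the function name, the small excerpt of registration-authority codes, and the sample registration number are all assumptions made for the example.

```python
# Illustrative excerpt of a registration-authority list, keyed by country code.
# A real tool would load the full centralised list instead.
AGENCIES_BY_COUNTRY = {
    "GB": {"GB-COH", "GB-CHC"},
    "NL": {"NL-KVK"},
}


def build_org_id(country, agency_code, registration_number):
    """Build '<agency code>-<registration number>' after basic validation."""
    agencies = AGENCIES_BY_COUNTRY.get(country.strip().upper())
    if agencies is None:
        raise ValueError(f"unknown country: {country!r}")
    code = agency_code.strip().upper()
    if code not in agencies:
        raise ValueError(f"{code!r} is not a known authority for {country!r}")
    number = registration_number.strip()
    # Reject empty numbers and numbers containing internal whitespace.
    if not number or any(ch.isspace() for ch in number):
        raise ValueError("registration number must be non-empty without spaces")
    return f"{code}-{number}"


print(build_org_id("gb", "gb-coh", " 01234567 "))
```

Because the user picks the authority from a list instead of typing the whole identifier, the prefix cannot be mistyped; only the registration number is free text.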

However, this function is not currently available on the Registry, so it is still possible to make mistakes there, as the ID is typed in manually. There are legacy issues as well (i.e. where an organisation was publishing for a grant, that grant has stopped, and the Registry account is just languishing).

So using the Registry as the definitive list may not be the best way to go.

My thoughts are:

  1. (I think this is already planned) IATI uses the Org IDs list of lists as the source for registration authorities, and perhaps for registration numbers where APIs into the data exist.
  2. Build something similar to AidStream’s managed creation process into the IATI Registry account set-up, to cut down on mistakes.
  3. The tech team needs to feed back to organisations where there is a mismatch between the ID on the Registry and the ID in the data. For example, where a UK govt department has switched to the new GB-GOV- format for the organisation ID in their data, they may not know/remember to change it in their Registry account.

Cheers, Sarah

(Vincent van 't Westende) #4

Right @SJohns I forgot about that table here (which even has more info than that spreadsheet). Anyway, good that we got this discussion started here, all good suggestions in the above messages imo.

I don’t know how much effort has already been put into getting these mismatches down to zero, but I would like to note that, as a user of all IATI data, this is one of the few issues where we are dependent on others.

IATI challenges like org names, traceability, data validation and such are something data users can handle in their own way, but here we are forced to choose between two ways of dealing with invalid publisher IATI IDs/reporting-org refs, and both look awkward. Invalidating them feels harsh and is bad for the usefulness of my application; accepting them opens my application up to erroneous data.

(Hayden Field) #5

“can this be included in the new validator?”

Part of the PyLib work we were doing last week was on metadata. The outcome of this was that Datasets should have metadata (including that from the Registry where appropriate) directly associated with them.

The new validator (to be built on top of the library) will take a Dataset as part of its input. As such, it will be possible to write rules to make checks like this, yes.

Tagging @petyakangalova because of the Org IDs discussion in this thread.

(Vincent van 't Westende) #6

Thanks for the reply @hayfield !

Small additional bug related to this: IATI Registry publisher_iati_id values are also not unique.
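A quick way to surface such duplicates, sketched under the assumption that the publisher records are available as a list of dicts with a `name` and a `publisher_iati_id`: the record shape, publisher names, and IDs below are hypothetical.

```python
from collections import defaultdict

# Hypothetical publisher records; two of them share one publisher_iati_id.
PUBLISHERS = [
    {"name": "org_a", "publisher_iati_id": "NL-KVK-00000000"},
    {"name": "org_b", "publisher_iati_id": "NL-KVK-00000000"},
    {"name": "org_c", "publisher_iati_id": "GB-COH-01234567"},
    {"name": "org_d", "publisher_iati_id": ""},
]


def duplicate_publisher_ids(publishers):
    """Map each publisher_iati_id used more than once to the publishers using it."""
    names_by_id = defaultdict(list)
    for publisher in publishers:
        publisher_id = (publisher.get("publisher_iati_id") or "").strip()
        if publisher_id:  # skip records with no ID filled in
            names_by_id[publisher_id].append(publisher["name"])
    return {
        publisher_id: sorted(names)
        for publisher_id, names in names_by_id.items()
        if len(names) > 1
    }


print(duplicate_publisher_ids(PUBLISHERS))
```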