Add metadata from Registry 'About' page to Org files

(Matt Geddes) #1

I have been looking for email addresses for a range of organisations recently, and often ended up using the one given in the ‘about’ page (e.g. here) on the IATI registry, as none was provided in the activity file or the organisation file. This was a complete pain because, while the org file can be queried using standard IATI tools, the ‘about’ page cannot (as far as I am aware).

Therefore I am suggesting a modification to require the registry ‘about’ page information to be added to the org file instead.

The ‘about’ page on the registry should in turn pull from the org file, and perhaps the act of registering for the registry should be reduced to supplying a link to an org file with all the relevant information.

This would require simple changes to the org file elements, combined with a rule requiring an org file from any organisation with an activity file on the registry.

There would need to be adjustments made by publishers to their publication systems.

(Andy Lulham) #2

Apologies – bit of a braindump below. I did some work on this a year or so ago, but not much since. The situation may have progressed since then.

You’re right that registry metadata contacts and org file contacts are frequently out of sync. Elsewhere, I’ve suggested syncing them up by auto-populating, but the other way around: from org file to registry metadata. That’s mostly because auto-populating org files would fundamentally change the way IATI works. Publishers control their XML files; IATI doesn’t. Furthermore, IATI already auto-populates a number of registry metadata fields via the IATI harvester (a CKAN plugin), so this could be extended to auto-populate contact information.

[Sidenote: the IATI tech audit unanimously agreed based on UNHCR’s experience that the IATI harvester should be moved external to CKAN. This seemed like a great recommendation, and it’s a shame it wasn’t followed.]

As discussed in this ticket, publisher contact information is particularly useful for closing the feedback loop between data users and data publishers. So this seems a worthwhile pursuit.

Publisher metadata is available via the registry API, though it’s not documented in the registry API documentation. (You have to refer to the CKAN documentation.) So for instance, the PWYF metadata includes a publisher_contact_email field. There’s sometimes a publisher_contact field that includes an email address. Dataset metadata also includes contact information, which may or may not be the same as the publisher contact.
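
As a rough illustration of pulling those contact fields out of a publisher record: the sketch below assumes the registry exposes the standard CKAN `organization_show` action under `https://iatiregistry.org/api/3/action/` (an assumption worth checking, since some CKAN setups use `group_show` instead), and that the record carries the `publisher_contact_email` and `publisher_contact` fields mentioned above. The helper names and the sample record are made up for illustration, and nothing here hits the network.

```python
import json
import re
from urllib.parse import urlencode

# Assumed base URL for the registry's CKAN action API; verify before relying on it.
REGISTRY_API = "https://iatiregistry.org/api/3/action/"

# Deliberately loose pattern: good enough to spot addresses in free text.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def publisher_url(publisher_id):
    """Build the (assumed) CKAN organization_show URL for a publisher."""
    return REGISTRY_API + "organization_show?" + urlencode({"id": publisher_id})


def contact_emails(response_text):
    """Return email addresses found in a CKAN-style publisher response."""
    payload = json.loads(response_text)
    record = payload.get("result", {})
    emails = []
    # Check the dedicated email field first, then scan the free-text contact field.
    for field in ("publisher_contact_email", "publisher_contact"):
        value = record.get(field) or ""
        for match in EMAIL_RE.findall(value):
            if match not in emails:
                emails.append(match)
    return emails


# Made-up sample shaped like a CKAN action response.
sample = json.dumps({
    "success": True,
    "result": {
        "name": "example-publisher",
        "publisher_contact_email": "data@example.org",
        "publisher_contact": "Data team, 1 Example Street, info@example.org",
    },
})

print(publisher_url("example-publisher"))
print(contact_emails(sample))
```

In practice you would fetch `publisher_url(...)` with whatever HTTP client you prefer and pass the response body to `contact_emails`.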

18 months ago, the previous registry supplier, Petya and I did some work to try to tidy this situation up. That thread might be worth reading. I’m afraid I’m not sure where that got to. It looks like the ticket was closed and the work was deprioritised.

(Matt Geddes) #3

Thanks @andylolz - braindump much appreciated

I was also unclear. I agree that it should definitely populate from the org file to the registry metadata. When I said ‘information’ from the ‘about’ page, I should have said ‘fields’ from the ‘about’ page, as I had lazily assumed that the issue was that the org file didn’t contain all the relevant elements to capture the registry metadata.

Given that there is broad agreement that there would be a lot of benefits (yay feedback loops, yay all metadata available through the DS) to cleaning this up, and a whole bunch of preliminary work done, it seems sensible to make sure that any adjustments to the org file elements needed for them to act as the source for all the registry metadata are made in the next version bump.

In terms of syncing the org file and registry metadata, I can’t think of a reason why the publisher contact field in the registry metadata couldn’t be given by the publisher contact in the org file, but maybe there is a good one?
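
A minimal sketch of what that one-way sync (org file to registry metadata) could look like. Everything specific here is an assumption: the `<contact-info>`/`<email>` element in the org file is hypothetical and stands in for whatever element might be agreed, the identifier and addresses are invented, and `sync_contact` is an illustrative helper, not part of the IATI harvester.

```python
import xml.etree.ElementTree as ET

# Hypothetical org file: the <contact-info> element is a placeholder for
# whatever element would be agreed; it is not claimed to exist in the standard.
ORG_FILE = """
<iati-organisations version="2.03">
  <iati-organisation>
    <organisation-identifier>XM-EXAMPLE-123</organisation-identifier>
    <contact-info>
      <email> data@example.org </email>
    </contact-info>
  </iati-organisation>
</iati-organisations>
"""


def contact_from_org_file(xml_text):
    """Return the contact email from the org file, or None if absent."""
    root = ET.fromstring(xml_text)
    email = root.findtext("iati-organisation/contact-info/email")
    return email.strip() if email and email.strip() else None


def sync_contact(registry_record, xml_text):
    """Return a copy of the registry record with the org file's email applied.

    The org file is treated as the source of truth; if it has no contact
    email, the existing registry value is left untouched.
    """
    updated = dict(registry_record)
    email = contact_from_org_file(xml_text)
    if email:
        updated["publisher_contact_email"] = email
    return updated


# Made-up registry record with a stale contact address.
registry_record = {"name": "example-publisher", "publisher_contact_email": "old@example.org"}
print(sync_contact(registry_record, ORG_FILE))
```

Leaving the registry value alone when the org file has no contact is one possible design choice; the alternative (blanking it) would make missing org file data more visible.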

I guess the next step towards getting this done for standard v3 is a crosswalk of the registry and org file metadata fields? I see that there are some, e.g. the Registry Publisher ID, for which we would need either to make clear that they are generated by the registry, or to make them available for the publisher to include in their org file.
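
To make the crosswalk idea concrete, here is one way it might be written down as data. The field names and XPaths are placeholders for illustration only (not an agreed mapping, and not checked against the registry or the organisation schema); the point is just that each registry field gets either an org file location or an explicit "generated by the registry" marker.

```python
# Illustrative crosswalk: registry field -> where its value would come from.
# All field names and XPaths below are placeholders, not an agreed mapping.
CROSSWALK = {
    "publisher_contact_email": {"source": "org_file", "xpath": "contact-info/email"},
    "title": {"source": "org_file", "xpath": "name/narrative"},
    "publisher_iati_id": {"source": "org_file", "xpath": "organisation-identifier"},
    "registry_publisher_id": {"source": "registry", "xpath": None},
}


def split_by_source(crosswalk):
    """Group registry field names by the system that would own the value."""
    grouped = {}
    for field, rule in crosswalk.items():
        grouped.setdefault(rule["source"], []).append(field)
    return {source: sorted(fields) for source, fields in grouped.items()}


def unmapped_org_fields(crosswalk):
    """List org-file-sourced fields that still lack an XPath."""
    return sorted(
        field
        for field, rule in crosswalk.items()
        if rule["source"] == "org_file" and not rule["xpath"]
    )


print(split_by_source(CROSSWALK))
print(unmapped_org_fields(CROSSWALK))
```

A table like this would also show quickly which registry fields have no home in the org file yet, which is where any new elements would be needed.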