Data use observation: A reference for an organisation alone is not enough


(Steven Flower) #1

I use d-portal a lot. I find it very helpful for finding good data.

I wanted to share this recent insight that digs into how the standard is implemented, and what this means for a) a data use project such as d-portal and b) a data user using a data use project!

This might seem quite a minor aspect of the standard, but I think it is interesting nevertheless, and it certainly has an impact on the end use. Many thanks to @shi at d-portal for the prompt replies and actions.

When we describe an organisation with an org reference, should we include the name too?

When looking at one activity, I was presented with the following:

[image: d-portal screenshot in which the organisation appears only as the reference NL-KVK-41198677]

As much as I like organisation references, I don’t think I can recall who NL-KVK-41198677 is!

The reason for this is that the publisher had not included the org name in the XML:

<participating-org ref="NL-KVK-41198677" role="2" type="21"/>
<participating-org ref="XM-DAC-7" role="1" type="10"/>
<participating-org ref="NL-KVK-41198677" role="4" type="21"/>

Hence, d-portal couldn’t tell us the name of the organisation, as it wasn’t in the data.
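
For anyone wanting to spot this gap in their own data, here is a minimal sketch. It assumes IATI 2.x activity XML, where the name (when present) sits in a `narrative` child element, and also accepts the 1.x style of putting the name in the element text. The function name `orgs_missing_name` and the wrapping `iati-activity` element are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The three elements quoted above, wrapped in a single activity for parsing.
SAMPLE = """
<iati-activity>
  <participating-org ref="NL-KVK-41198677" role="2" type="21"/>
  <participating-org ref="XM-DAC-7" role="1" type="10"/>
  <participating-org ref="NL-KVK-41198677" role="4" type="21"/>
</iati-activity>
"""


def orgs_missing_name(activity_xml):
    """Return the distinct org refs that carry no human-readable name."""
    root = ET.fromstring(activity_xml.strip())
    missing = []
    for org in root.iter("participating-org"):
        # 2.x style: name in <narrative>; 1.x style: name as element text.
        has_narrative = any(
            n.text and n.text.strip() for n in org.findall("narrative")
        )
        has_text = bool(org.text and org.text.strip())
        ref = org.get("ref")
        if ref and not has_narrative and not has_text and ref not in missing:
            missing.append(ref)
    return missing


print(orgs_missing_name(SAMPLE))
```

Both references in the sample come back as missing a name, which is exactly what a tool like d-portal has to work around.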

Thankfully, @shi found a solution, and we now have:

[image: orgs]

But - the fact remains that this data (and no doubt others) misses the org name. This would potentially cause headaches for data users.

Do we therefore insist on the name being included in the XML? Or … is there a need for a central look up of references, which isn’t as straightforward as it might seem.

But, at the publisher end, it probably makes sense to keep things simple, and avoid repetition, particularly if you are talking about the same organisations time and again. We only need to say NL-KVK-41198677 = Hivos once, right? After all, we want the machines to read the machine-readable data?

Any thoughts welcome


(Andy Lulham) #2

Nice fix from @shi here – that’s very cool (to clarify: d-portal is doing an org-id lookup using all registry metadata. So it should work for any reference to an org that already publishes to IATI.)

If the org-id can’t be used to figure out the org name, then yes – I don’t see any option but to insist on a name.

In general, OIPA is great for this sort of lookup – it does something very similar to what d-portal is doing here. Unfortunately, OIPA doesn’t help for that example… My guess is that OIPA uses the org file instead of the registry metadata, and the Hivos org file looks a bit borked :cry:

One potential fallback option would be for d-portal to link to the relevant org-id.guide page, so the user could then follow the links and perform the lookup themselves. In this case, that’s http://org-id.guide/list/NL-KVK but if you give that a go you’ll find you quickly come unstuck.
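
A rough sketch of how such a fallback link could be built. It assumes the list code is the first two hyphen-separated parts of the reference (as in NL-KVK-41198677 -> NL-KVK), which may not hold for every identifier. The function name `org_id_guide_url` is hypothetical; only the URL pattern comes from the link above.

```python
def org_id_guide_url(org_ref):
    """Build an org-id.guide list URL from an organisation reference.

    Assumption: the list code is the first two hyphen-separated parts.
    Returns None when the reference does not have that shape.
    """
    parts = org_ref.strip().split("-")
    if len(parts) < 3 or not all(parts[:2]):
        return None
    return "http://org-id.guide/list/" + "-".join(parts[:2])


print(org_id_guide_url("NL-KVK-41198677"))
```

This only gets the user as far as the list page; finding the organisation itself is still the manual step described above.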


(Steven Flower) #3

Would be interested to know more —> @VincentVW ?

I think the wider issue persists though - publishers are having to repeat the same thing again and again in their data – and that seems to be because humans are reading it (whether in the XML or via d-portal, for example)…


(Vincent van 't Westende) #4

Ah yes, that is a bug and we’ve got an open issue on OIPA to fix it. [short explanation as to what’s wrong] It now looks at the organisation file -> name element, and if that is missing (or invalid) it should fall back (but doesn’t) on displaying the publisher activity -> reporting-org narrative or the IATI Registry publisher name. (We might make the latter the preferred name since it always exists.)
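
To illustrate the intended fallback order, here is a minimal sketch. It is not OIPA's code; the function and parameter names are invented for this example, and the three values are assumed to have been extracted already.

```python
def display_name(org_file_name, reporting_org_narrative, registry_publisher_name):
    """Pick the first usable name, in the fallback order described above."""
    candidates = (
        org_file_name,            # organisation file -> name element
        reporting_org_narrative,  # activity -> reporting-org narrative
        registry_publisher_name,  # IATI Registry publisher name
    )
    for candidate in candidates:
        # Treat None, empty and whitespace-only values as missing.
        if candidate and candidate.strip():
            return candidate.strip()
    return None


# Org file name missing, so the reporting-org narrative is used.
print(display_name(None, "Hivos", "Hivos"))
```

Making the registry name the preferred one would just mean moving it to the front of `candidates`.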

To get more back on topic: totally agree that we can’t avoid using participating-org narratives for all organisations that do not publish yet for the reason that we can’t validate the name with the ref.

A ‘do not repeat yourself’ solution would be to make it mandatory to report all the orgs you name in activities in your organisation file (that’s already possible atm right?). Not sure if that makes things easier for publishers or developers though. It would increase consistency in naming.

A central lookup using org-id.guide sounds like the best option to make it easier both on ‘search an org ref’ and ‘auto ref to name lookups’ in IATI visualisation tools. Also would be easy to integrate in publishing tools that use a CSV to IATI approach (pivot tables?) and AidStream. Any big issues with feasibility on this that you ran into at MA discussions @andylolz @stevieflow ?


(shi) #5

In case anyone is curious, this is how we do it:

  1. Using the IATI Registry API - we suck down all the publisher metadata.
    You can find it here - this list is updated nightly-ish.

  2. Refining the data from here is, of course, an obvious and easy step; ie. turn publisher_iati_id into the org list.

Bonus points - Since this is under source control, you can see the history of changes in the IATI Registry over time.
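
The refining step above might look roughly like the sketch below. This is not d-portal's actual code. It assumes the publisher metadata has already been downloaded into a list of dicts, and that the display name lives in a `title` field; only `publisher_iati_id` is taken from the description above. The second sample record is a made-up placeholder.

```python
def build_org_lookup(publishers, id_field="publisher_iati_id", name_field="title"):
    """Turn publisher metadata records into an {org id: name} lookup."""
    lookup = {}
    for record in publishers:
        org_id = (record.get(id_field) or "").strip()
        name = (record.get(name_field) or "").strip()
        # Skip records with no id or no name; keep the first name seen per id.
        if org_id and name:
            lookup.setdefault(org_id, name)
    return lookup


def label_for(ref, lookup):
    """Return the known name for a ref, or the bare ref if it is unknown."""
    return lookup.get(ref, ref)


# Illustrative records only; the field layout is an assumption.
sample = [
    {"publisher_iati_id": "NL-KVK-41198677", "title": "Hivos"},
    {"publisher_iati_id": "", "title": "Placeholder publisher without an id"},
]
lookup = build_org_lookup(sample)
print(label_for("NL-KVK-41198677", lookup))
print(label_for("XX-UNKNOWN-1", lookup))
```

References that are not in the registry simply fall through as the bare id, which matches the limitation discussed earlier in the thread.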


(Yohanna Loucheur) #6

It’s already the case - it’s strongly suggested in guidance. Unless by insist you mean quasi-mandatory.

We do publish org names (both English and French when relevant) in our file, mostly to make our data usable if downloaded (eg via datastore), something @shi’s much appreciated fix for the D-Portal won’t help with. Until we can solve this, perhaps we must insist more strongly that publishers include both the Org ID and name (though in reality, we first need to insist that they actually publish an ID instead of a generic org category like CSO…)

Now, I would prefer to see both Org ID and name on the D-Portal. The ID conveys useful information, like which country the org is from.


(shi) #7

I second this. Considering activities reported from these ids usually begin with the same id in their identifiers, it’s a useful thing to have.

Below are links showing all reporting-org publishers that have included both org name and id, along with the number of activities reported.

JSON
CSV

We found a total of 444 unique org name and id combinations in the data pulled in from the registry, which means about 200 or so are not reporting org names, at least from our findings.

There are also instances where the org id is something like execution (117 publishers), null (126 publishers) or finished (20 publishers), though this could just be down to an overzealous publishing tool.
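
A quick way to flag values like these is a shape check, sketched below. The pattern is an assumption (a two-letter code, a list code, then a local identifier, as in NL-KVK-41198677), not an official IATI rule, so valid identifiers with another shape would be flagged too. The function name is made up.

```python
import re

# Assumed shape: two capital letters, a list code, then a local identifier.
ORG_ID_SHAPE = re.compile(r"[A-Z]{2}-[A-Z0-9_]+-\S+")


def looks_like_org_id(value):
    """Return True if the value has the assumed org id shape."""
    return bool(value) and ORG_ID_SHAPE.fullmatch(value.strip()) is not None


for value in ["NL-KVK-41198677", "XM-DAC-7", "execution", "null", "finished"]:
    print(value, looks_like_org_id(value))
```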

Update

Thanks, @VincentVW for pointing out that the numbers above could be wrong!

The 444 publishers may have been counted from 2.01 data only, and not across all versions. The strange instances were counted as publishers when they should have been counted as activities - this was all down to a single publisher with questionable data.

We will have to take a closer look, as we do not use this part of the data, so it is subject to bit rot. On the plus side, we will fix more bugs :}


(Steven Flower) #8

Have added this: https://github.com/devinit/D-Portal/issues/417