Discussion

Pulling non-mirrored codelists?

matmaxgeds • 8 November 2018

Hi,

Please can I ask how other people handle the codelists that are not directly mirrored by IATI? For DAC5 etc. we (a computer) can pull them directly, but for e.g. the Humanitarian sectors/clusters, the IATI codelist links to a webpage where you can download them as a CSV. Of course we can write a scraper to locate the CSV and download it, and then deal with the import, but this could easily break and is reliant on a non-IATI third party.
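To make the fragility concrete, here is a minimal sketch of the "download the CSV, then import it" step with a fallback to the last known-good copy. It is only an illustration under stated assumptions: the column names `code` and `name`, and the function names `parse_codelist` and `load_codelist`, are hypothetical and do not come from IATI or from any codelist provider.

```python
import csv
import io
from typing import Callable, Dict


def parse_codelist(csv_text: str, code_col: str = "code",
                   name_col: str = "name") -> Dict[str, str]:
    """Parse CSV text into {code: name}.

    The column names are assumptions; a real third-party file may differ.
    Raises ValueError if the expected columns are missing, which is what
    happens when the provider changes the layout or serves an HTML page.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    if code_col not in fields or name_col not in fields:
        raise ValueError(f"unexpected columns: {fields}")
    return {
        row[code_col].strip(): (row[name_col] or "").strip()
        for row in reader
        if row[code_col]  # skip rows with a blank or missing code
    }


def load_codelist(fetch: Callable[[], str], cached_csv: str) -> Dict[str, str]:
    """Try the live source first; fall back to the cached copy on failure.

    `fetch` is any zero-argument function returning CSV text, for example
    a wrapper around urllib.request.urlopen for the provider's download URL.
    """
    try:
        codes = parse_codelist(fetch())
        if not codes:
            raise ValueError("empty codelist")
        return codes
    except (OSError, ValueError):
        # Network errors (URLError is a subclass of OSError) and layout
        # changes both land here, so the import keeps working on stale data.
        return parse_codelist(cached_csv)
```

Passing the download step in as `fetch` keeps the parsing testable without network access, and the fallback means a moved or reformatted file degrades to stale data rather than a failed import. It does not remove the dependence on the third party, it only softens it.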

Any tips for how everyone else handles this? Any pointers to why some codelists are integrated (therefore higher class IATI citizens) and others are not?

Thanks a lot,

Matt

Comments (7)

Andy Lulham

Hi Matt,

I’d suggest adding an issue on the Open Knowledge datasets repo:
GitHub: datasets/awesome-data

Awesome datasets curated. By topic and for core data https://datahub.io/docs/core-data

Then there’s potential to find collaborators who can help with maintaining a scraper.

matmaxgeds:

Any pointers to why some codelists are integrated (therefore higher class IATI citizens) and others are not?

I don’t know. I guess there’s some heuristic that factors in how widely used the codelist is likely to be, how difficult the original is to use (e.g. the format it’s in), how frequently it updates etc…

IATI Technical Team

matmaxgeds, ideally we would not have to replicate any of the external codelists, but historically we have done so for the OECD DAC and a few others. We have done this mainly for the convenience of users and to keep the lists accessible.
We are unable to maintain and replicate all external codelists. It is the responsibility of the codelist providers to make sure their lists are up to date and easily accessible to users via the URLs they have provided, which we link to on the IATI page.

matmaxgeds

Hi IATI Technical Team - thanks for the response.

Could we move to a system where the community maintain them as part of IATI in the same way they can submit pull requests for DAC5?

I ask because not embedding these codelists makes it much harder to use any IATI data that does not use DAC5 sectors, and all the different systems that use IATI data have to maintain their own copies of the codelists. This would be much more efficient if it were handled by IATI.

Thanks,

Matt

Bill Anderson

I don’t think it is realistic to expect a voluntary community to maintain a reliable suite of replicated standards.

Rather I think we, along with other open data standards and other consumers of these coding standards, should argue:

  • for all codelists to be API accessible
  • and (I know this is ambitious) for common protocols in consuming these lists?

matmaxgeds

Thanks Bill Anderson - this sounds a good direction of travel.

Currently, non-DAC5 data is completely inaccessible to non-technical users: for example, I can’t search for it via d-portal, OIPA or the datastore. So if we are going to keep it as part of the standard, the question is what steps we are going to take to make it useful.

I think your focus on APIs is the right one. Can we adjust the rules so that IATI either only adds external codelists that are available by API, or requests the organisations to whom the codelists belong to upload a copy in a standard format for IATI to make available via their API?

