Pulling non-mirrored codelists?


(Matt Geddes) #1

Hi,

Please can I ask how other people handle the codelists that are not directly mirrored by IATI? For DAC5 etc. we can pull them directly by machine, but for others, e.g. the Humanitarian sectors/clusters, the IATI codelist links to a webpage where you can download them as a CSV. Of course we can write a scraper to locate the CSV, download it, and then deal with the import, but this could easily break and relies on a non-IATI third party.
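
To make the fragility concern concrete, here is a minimal sketch of one way to soften it: download the third-party CSV, and fall back to the last good local copy if the source has moved or changed. The function names, the cache file, and the idea that the CSV sits at a stable URL are all assumptions for illustration; locating the CSV link on the provider's page is a separate step not shown here.

```python
import csv
import io
import urllib.request
from pathlib import Path


def download(url: str) -> str:
    """Fetch a URL and return its body as text (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse(text: str) -> list[dict]:
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def load_codelist(url: str, cache_path: Path, fetch=download) -> list[dict]:
    """Return codelist rows, preferring a fresh download over the cached copy.

    `url` and `cache_path` are hypothetical: they stand for wherever the
    third party publishes the CSV and wherever we keep the last good copy.
    """
    try:
        text = fetch(url)
        rows = parse(text)
        if not rows:
            raise ValueError("downloaded file contained no rows")
    except (OSError, ValueError, csv.Error):
        # Source moved, is offline, or returned something unusable:
        # reuse the last good copy (raises if no copy was ever saved).
        rows = parse(cache_path.read_text(encoding="utf-8"))
    else:
        # Only overwrite the cache once the download has parsed cleanly.
        cache_path.write_text(text, encoding="utf-8")
    return rows
```

This does not remove the dependence on the third party; it only means a broken link degrades to stale data rather than a failed import.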

Any tips on how everyone else handles this? Any pointers as to why some codelists are integrated (and are therefore higher-class IATI citizens) and others are not?

Thanks a lot,

Matt


(Matt Geddes) #2

Bump - anyone?..


(Andy Lulham) #3

Hi Matt,

I’d suggest adding an issue on the Open Knowledge datasets repo:

Then there’s potential to find collaborators who can help with maintaining a scraper.

As for why some codelists are integrated and others are not, I don’t know. I guess there’s some heuristic that factors in how widely used the codelist is likely to be, how difficult the original is to use (e.g. the format it’s in), how frequently it updates, etc.


(Matt Geddes) #4

Thanks @andylolz

@IATI-techteam can you shed any light on this? How come only the DAC5 vocab is integrated and not the other sector vocabs?


(IATI Technical Team) #5

@matmaxgeds Ideally we would not have to replicate any of the external codelists, but historically we have done so for the OECD DAC and a few others. We have done this mainly for the convenience of users and to make the lists more accessible.
We are unable to maintain and replicate all external codelists. It is the responsibility of the codelist providers to make sure their lists are up to date and easily accessible to users via the URLs they have provided, which we link to on the IATI page.


(Matt Geddes) #6

Hi @IATI-techteam - thanks for the response.

Could we move to a system where the community maintains them as part of IATI, in the same way that they can submit pull requests for DAC5?

I ask because not embedding these codelists makes it much harder to use any IATI data that does not use DAC5 sectors, and every system that uses IATI data has to maintain its own copies of the codelists. It would be much more efficient if this were handled by IATI.

Thanks,

Matt


(Bill Anderson) #7

I don’t think it is realistic to expect a voluntary community to maintain a reliable suite of replicated standards.

Rather I think we, along with other open data standards and other consumers of these coding standards, should argue:

  • for all codelists to be API accessible
  • and (I know this is ambitious) for common protocols in consuming these lists?


(Matt Geddes) #8

Thanks @bill_anderson - this sounds like a good direction of travel.

Currently non-DAC5 data is completely inaccessible to non-technical users, e.g. I can’t search for it via d-portal, OIPA or the datastore. So if we are going to keep it as part of the standard, the question is: what steps are we going to take to make it useful?

I think your focus on APIs is the right one. Can we adjust the rules so that IATI either only adds external codelists that are available by API, or requests the organisations to whom the codelists belong to upload a copy in a standard format for IATI to make available via their API?
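
As a rough illustration of what "a copy in a standard format" could mean in practice, here is a minimal sketch that converts a provider's CSV into a uniform list of code/name records. The record shape, the function name, and the column names in the usage note are assumptions made up for the example, not anything IATI has defined.

```python
import csv
import io
import json


def normalise_codelist(csv_text: str, code_column: str, name_column: str) -> str:
    """Convert a provider's CSV into a JSON list of {"code", "name"} records.

    The output shape is a hypothetical common format; each provider names
    its columns differently, so the caller says which ones to use.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {code_column, name_column} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found in CSV header: {sorted(missing)}")

    records = []
    for row in reader:
        code = (row[code_column] or "").strip()
        name = (row[name_column] or "").strip()
        if code:  # skip rows that carry no code at all
            records.append({"code": code, "name": name})
    return json.dumps(records, indent=2, ensure_ascii=False)
```

For example, `normalise_codelist(text, "Cluster Code", "Cluster Name")` would map a file with those (hypothetical) headers onto the common shape, so that consumers only ever have to read one format.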