Download everything from the Registry

(John Adams) #1

What’s the current recommended way to download the entire IATI dataset in XML? Separate files are OK.

Looking to experts like @andylolz and @Herman.

(Rory Scott) #2

Amusingly, this torrent still exists, but a) it’s from 2016 and b) it’s only a snapshot. @siemvaessen might be happy to put another torrent online, though :stuck_out_tongue:.

I imagine pyIATI has something to do this, but I’m not sure how long it takes to sort out Python environments, etc. I’m fully expecting @andylolz or @Herman to have some magic for this.

Serious suggestions though:

If you don’t mind the known issues with the IATI Datastore, or the existence of some wrapper XML including query metadata, this (theoretically) should do the trick as an ultra lazy approach:

But if you want the exact published XML, this is, as far as I know, still the simplest, lowest-friction way of downloading all of the datasets in their raw form.

(Andy Lulham) #3

@JohnAdams I made this proof-of-concept yesterday evening:

It provides all IATI data on the registry – unfettered and unfiltered – in a single HTTP request.

It’s a big zip file (currently ~350MB) that’s auto-updated daily. It’s made using the IATI Registry Refresher that @rory_scott mentions above, and is hosted on Dropbox.

I’ll try and make it look a bit more presentable, but let me know if that’s useful.
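
For anyone scripting against a zip like this, here is a minimal Python sketch of reading the XML files straight out of the archive without unpacking it to disk first. The function name `iter_xml_members` is hypothetical, and it assumes the published files inside the archive end in `.xml`, which has not been checked against the actual dump layout.

```python
import zipfile


def iter_xml_members(source):
    """Yield (member name, raw bytes) for each .xml file in a zip archive.

    `source` may be a path to the zip on disk or an open binary file object.
    """
    with zipfile.ZipFile(source) as archive:
        for info in archive.infolist():
            # Skip directories and anything that is not an XML file
            # (assumption: published datasets are stored with a .xml suffix).
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            yield info.filename, archive.read(info)
```

Usage would look something like `for name, xml_bytes in iter_xml_members("iati_dump.zip"): ...`, where `iati_dump.zip` stands in for wherever the download was saved. Files are read one at a time, so only a single dataset needs to be held in memory.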

(Siem Vaessen) #4

Great. I have also created another torrent, just in case:

(Herman van Loon) #5

Currently I download IATI XML data using Pentaho Data Integrator on a file-by-file basis for a defined list of approx. 140 publishers, making use of the URLs from the registry.
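
The same file-by-file idea can be sketched in Python for those not using Pentaho. This is not the Pentaho job described above, just a rough equivalent under stated assumptions: the list of `(publisher, dataset name, url)` tuples is assumed to have been collected from the registry beforehand, and the names `fetch_url` and `download_datasets` are hypothetical.

```python
import urllib.request
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def fetch_url(url: str) -> bytes:
    """Download one URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def download_datasets(
    datasets: Iterable[Tuple[str, str, str]],
    dest: Path,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[Tuple[str, str]]:
    """Save each (publisher, dataset name, url) as dest/publisher/name.xml.

    Returns a list of (url, error message) for downloads that failed,
    so that one broken link does not stop the whole run.
    """
    failures = []
    for publisher, name, url in datasets:
        target = Path(dest) / publisher / f"{name}.xml"
        target.parent.mkdir(parents=True, exist_ok=True)
        try:
            target.write_bytes(fetch(url))
        except Exception as exc:  # network errors, HTTP errors, timeouts
            failures.append((url, str(exc)))
    return failures
```

Collecting failures instead of raising seems the safer choice here, since with a list of around 140 publishers some URLs are likely to be unreachable on any given run.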

(Matt Geddes) #6

Is there any push to have either @andylolz’s wonderful IATI data dump, or something identical (with Andy’s permission), provided by the new datastore and therefore adopted as an official IATI product?

(Andy Lulham) #7

Please consider my permission hereby granted! Agree, it would be cool if this service could be provided officially.

Ongoing maintenance is minimal – I frequently use this service, but it hasn’t required any maintenance at all since it was created.