Update on IATI Datastore API

(IATI Technical Team) #1

New DataStore API

As everyone is aware, the new IATI DataStore is being built. This Discuss post addresses a few questions that have come up about its API. It starts with some background on the old DataStore for context, then covers the new DataStore API.

History of the old Datastore

When the IATI Standard was launched there were strong arguments against IATI maintaining a database: this was seen as overlapping with the service provided by the OECD CRS, which is a curated database. The datastore was instead agreed as an uncurated view of the files held on the Registry.

The original datastore was built in 2011/12 by Open Knowledge (who also managed the IATI Registry at the time). It replicated all (readable) data on the Registry. It was sold as “everything in the Registry today will be in the datastore (DS) tomorrow”.
Locations and results data were not included in this alpha: only transactions, budgets and activity-level data were imported into the DataStore (the whole activity was only available through an XML blob stored for each activity). The alpha product did not clean data; it just showed exactly what was on the Registry.

The plan had been that these additional features would be added in future phases of the DataStore but due to a number of reasons (including budget and contractual matters) the project never went beyond phase one.

New IATI datastore

The new datastore is based on the existing open source software ‘OIPA’, which is actively maintained by Zimmerman & Zimmerman. OIPA has been used by a variety of international organisations and governments, such as UNESCO, IOM, DFID and MFA, among many others. The new datastore will scrape the IATI Registry for IATI Publishers and their XML data sources, validate the XML using the new IATI Validator service, and then transform and store that data and expose it through an API for anyone to use. The API has 14 endpoints, each with a specific purpose. The datastore will also allow users to export data in XML, CSV and XLSX formats. Snapshot of functionality as per the original specification:

  1. ETL (Extract, Transform, Load) from XML to JSON

  2. Validation provided by new IATI Validator

  3. IATI Version support

  4. XML exports

  5. Range of filters available

  6. API output

  7. CSV/XLSX Serialisations

Timeline for delivery

The new DataStore will be launched together with the IATI Validator this summer.

The IATI technical team met with Zimmerman and Zimmerman along with Data4Development earlier this month to agree how the two systems will integrate. We will share more information on the system integration of the two products in early May with an update on the timeline.

Moving from old to new API

The new OIPA-powered Datastore will differ from the current one both in API calls and in the results returned. Because it exposes more of the IATI data, a new structure is needed to present it logically.

For API calls, we have limited the changes as far as possible whilst still delivering a product that has a different core structure and more capabilities. Mostly, what will be required is small tweaks to the URLs in use so that they point to the new location. The mapping will not be 1:1, so we cannot keep the old API structure and rely on redirects.

For returned results, the underlying structure will once again be mostly the same, with a few changes. The new Datastore allows a more comprehensive and precise result set to be explored. An example is the participating-org result row: in the current system, users receive a result containing participating-org.role = 3, while the new Datastore offers an expanded view, providing both participating-org.role.code = 3 and participating-org.role.name = Extending, effectively removing the need for cross checks and extra calls.
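To illustrate the cross check that the expanded view removes, here is a minimal Python sketch. The field names come from the example above and the code-to-name pairs from the IATI OrganisationRole codelist; the flat dictionary layout and the `expand_role` helper are hypothetical and do not represent the Datastore's actual output format.

```python
# IATI OrganisationRole codelist (code -> name).
ORGANISATION_ROLE = {
    "1": "Funding",
    "2": "Accountable",
    "3": "Extending",
    "4": "Implementing",
}


def expand_role(old_row):
    """Turn an old-style row (code only) into the expanded code/name form.

    Hypothetical helper: this is the lookup a client currently has to do
    itself, and which the new Datastore is described as doing for it.
    """
    code = str(old_row["participating-org.role"])
    return {
        "participating-org.role.code": code,
        # None if the code is not in the codelist.
        "participating-org.role.name": ORGANISATION_ROLE.get(code),
    }


old_row = {"participating-org.role": 3}
print(expand_role(old_row))
```

With the new Datastore, a client would read the name directly from the result instead of keeping its own copy of the codelist.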

Don’t panic!

The technical team are here to help. There will be documentation of all the new parameters, queries possible and outputs so that the transition can happen as smoothly as possible.

We are also making sure there is a grace period in which the old DataStore will run in parallel until the end of 2019, so that there is plenty of time to make the necessary changes.

Details of who is using the current DataStore are being collected here.

(Herman van Loon) #2

Thanks for the update on the status of the DS. Is there an overview of what functionality will be delivered when against the functional requirements as specified in the final version of the Terms of Reference for the DS? If not, it would be very helpful to have such an overview because it will enable planning of the migration of existing applications of the old DS to the new DS.

Kind regards
Herman

(Matt Geddes) #3

+1 for the overview against the ToR

(IATI Technical Team) #4

Full details of the project are publicly available here: https://github.com/zimmerman-zimmerman/OIPA/projects/2 so you can clearly see how implementation is progressing.

When the IATI DataStore is launched, clear user documentation and developer documentation will be made available to help smooth the transition from the old DataStore to the new one. On the question of “what will be delivered when”: the full TOR will be met at the point of release. Any future releases would be for upgrades / new features.

(Mark Brough) #5

Thanks for the update.

Please can you explain / review the below paragraph? If the new API will return the original XML and full, unpaginated results (as I believe it will), then I don’t understand why it would not basically be a fairly trivial process to convert the old URLs to the new URLs, by passing the current (very limited!) list of filters on the old Datastore to their equivalent values on the new Datastore.

Please can you confirm that this will include meeting the following requirement included in the draft ToR, and if not, why the Secretariat decided to remove this requirement?

Provides support for (all ?) existing routes for the IATI Datastore? So that existing software using the current Datastore API does not break.

At the very least, the IATI Secretariat should be working to ensure continuity of service for anyone who has managed to begin using IATI data over the last ten years, especially when it should be fairly simple to implement once on the IATI Secretariat’s end rather than many times on many users’ ends. Please can this be reconsidered? :slight_smile:

(IATI Technical Team) #6

As mentioned in our original post, there is no 1:1 mapping, so the mapping would be infinite. It would also introduce new potential for technical debt on the new Datastore. By issuing a redirect at Datastore level:

  • We would essentially be introducing a static mapping file

  • It would by no means resolve the move from the old domain to the new domain (e.g. the redirect from old-datastore.iatistandard.org to new-datastore.iatistandard.org should live on the old Datastore’s server; this redirect will cease to exist once the old Datastore is switched off, so there is no benefit to it)

  • In future iterations of 3rd party software, the new urls will need to be used

There are 6 months available to update the API calls and there will be information available with the launch of the new Datastore to help people transition.

We have already said that we will be meeting the terms of the TOR. Where there has had to be a deviation in order to bring technical benefits to the whole community, we have posted here to notify users. If there are any other changes there will be notifications.

The IATI Secretariat are committed to ongoing improvement to services. In order to do this, change will sometimes be required. Changes are not made in an arbitrary way; they are considered carefully and we engage with the community to keep everyone informed of possible impacts. In this instance, in order to provide a better datastore, the structure of things has had to shift. We are providing a long grace period within which we are happy to speak to you directly if you need further support.

(Mark Brough) #7

The response above has worried me a lot. This decision breaks many of the few country systems using IATI data, which would set us back significantly in terms of data use at country level, so it seems worth spending some time seeing if we can avoid it. I put together a quick mapping file here which suggests it could be fairly straightforward to redirect requests (at least for XML data). Perhaps we can have a quick chat about any remaining technical barriers?

(Andy Lulham) #8

I’ve had a go at turning @markbrough’s quick mapping file into a tiny redirect application:

The code is on github.
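The general idea of a mapping-based redirect can be sketched in a few lines of Python. This is not the code from the repository: the filter names in `FILTER_MAP`, the `NEW_BASE` URL and the `translate` function are all placeholders invented for illustration, and the real mapping file may look quite different.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical mapping of old filter names to new ones (illustrative only).
FILTER_MAP = {
    "reporting-org": "reporting_organisation_identifier",
    "recipient-country": "recipient_country",
    "sector": "sector",
}

# Placeholder base URL for the new API, not a real endpoint.
NEW_BASE = "https://new-datastore.example.org/api/activities/"


def translate(old_url):
    """Return the equivalent new-style URL, or None if a filter is unmapped."""
    query = urlsplit(old_url).query
    new_params = [("format", "xml")]
    for key, value in parse_qsl(query):
        if key not in FILTER_MAP:
            # No known equivalent: let the caller decide (e.g. return a 404).
            return None
        new_params.append((FILTER_MAP[key], value))
    return NEW_BASE + "?" + urlencode(new_params)


old = "https://old-datastore.example.org/api/1/access/activity.xml?reporting-org=GB-1"
print(translate(old))
```

A redirect service built this way would answer each old-style request with an HTTP redirect to the translated URL, and decline requests that use a filter it has no mapping for.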

(Matt Geddes) #9

Thanks (amazing) @andylolz

Not suggesting you should do it now now, but trying to think about any other potential issues, I guess it wouldn’t be hard to add the header that comes with datastore xml?

<result><ok>True</ok>
<iati-activities generated-datetime="2019-05-09T10:11:11.870934">
<query><total-count>1</total-count><start>0</start>
<limit>50</limit></query>
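As a rough sketch of how a redirect layer might add that header, assuming the `<query>` element sits inside `<iati-activities>` as the snippet above suggests (the `wrap_with_header` function and its parameters are hypothetical):

```python
import xml.etree.ElementTree as ET


def wrap_with_header(activities, total_count, start=0, limit=50):
    """Wrap an <iati-activities> element in an old-style <result> header.

    Assumption: <query> is the first child of <iati-activities>, following
    the order of elements in the header snippet above.
    """
    result = ET.Element("result")
    ET.SubElement(result, "ok").text = "True"

    query = ET.Element("query")
    ET.SubElement(query, "total-count").text = str(total_count)
    ET.SubElement(query, "start").text = str(start)
    ET.SubElement(query, "limit").text = str(limit)

    # Put the query metadata ahead of the activities themselves.
    activities.insert(0, query)
    result.append(activities)
    return result


source = ET.fromstring(
    '<iati-activities generated-datetime="2019-05-09T10:11:11.870934">'
    "<iati-activity/></iati-activities>"
)
print(ET.tostring(wrap_with_header(source, total_count=1), encoding="unicode"))
```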

The redirects for the datafiles route seem to require the internal name? but I guess we could do a lookup for this - or maybe the new datastore will include both options.

It also looks like as soon as the new datastore can do =activity&transaction, not just =transaction then we should also be able to redirect those.

(Mark Brough) #10

This is super good, thanks @andylolz! I think it shows how simple it would be to avoid breaking systems at country level. On @matmaxgeds’s point: the XML output is still under active development, some additional metadata (total number of results and status) would be useful, and something similar is already in the JSON output. So perhaps @siemvaessen could just adjust the XML output to include this metadata?