Thoughts on API as a Service & Data Playground


(James Hughes) #1

In March I shared some thoughts via a Gist about some ways in which IATI data could be used in a way that would make it more accessible to NGOs and other parties. I have been asked to move the content of the gist here as it is a better forum (in that it is actually a forum) for discussing it.

I must caveat the following:

  1. These are just ideas off the top of my head and could be further refined.
  2. I currently don’t work with IATI data or projects, so the user need associated with these ideas may be very limited.
  3. This was written roughly in a single pass and might contain terrible atrocities against grammar and spelling. I apologise for that.
  4. I am in no way saying any of this is right or that anything currently existing is wrong. These are just thoughts that may or may not spark some discussion.

EARLY ALPHA: Some very early unvalidated thoughts on an IATI API Service/Playground.

Introduction

There appears, from an outside view, to be an attempt to create a single IATI API that will support querying all available IATI data in a unified way. Projects like IATI Datastore and, to a certain extent, the DevTracker API [There are more and we should call them out here] have gone some way to achieving this goal.

DevTracker

During the development of DevTracker we explored using various IATI datastores, but every one of them had constraints. Some were problematic, others potentially catastrophic to the project. Existing solutions either adhered strictly to the IATI standard, and so refused to consume most of the necessary data (IATI Datastore), or had assumptions baked into their logic that conflicted with the DFID data structure.

Root Cause Analysis

The reason for these problems was that existing datastores attempt to provide a generic approach to a non-generic problem. While most API-based solutions give people access to Activity streams and Organisation streams, there simply isn’t enough guidance within the developing IATI standard yet to expand on this. Consumers’ needs are typically rather unique to them in some regard, which imposes certain overheads on a generic API. In DevTracker, for example, using an existing solution would have meant extracting data by crawling the (often paged) endpoints, deserialising it and aggregating the data into our own structure that we want to consume. This would have been more time-consuming and more prone to error.
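To make the crawl, deserialise and aggregate overhead concrete, here is a minimal sketch of what a consumer ends up writing against a paged endpoint. Everything in it is an assumption for illustration: the page shape (`activities`, `next_page`), the `country` and `budget` fields, and the `FAKE_PAGES` stand-in for an HTTP client. It is not how DevTracker or any existing datastore actually works.

```python
from collections import defaultdict


def crawl(fetch_page):
    """Yield every activity by following next_page until it is None.

    fetch_page is a hypothetical callable: page number -> deserialised page.
    """
    page_number = 1
    while page_number is not None:
        page = fetch_page(page_number)
        yield from page["activities"]
        page_number = page.get("next_page")


def budget_by_country(activities):
    """Aggregate the flat activity stream into the structure we actually want."""
    totals = defaultdict(float)
    for activity in activities:
        totals[activity["country"]] += activity["budget"]
    return dict(totals)


# Stand-in for a remote paged endpoint (made-up data, assumed page shape).
FAKE_PAGES = {
    1: {
        "activities": [
            {"country": "KE", "budget": 100.0},
            {"country": "UG", "budget": 50.0},
        ],
        "next_page": 2,
    },
    2: {
        "activities": [{"country": "KE", "budget": 25.0}],
        "next_page": None,
    },
}

totals = budget_by_country(crawl(FAKE_PAGES.__getitem__))
print(totals)
```

Even in this toy form, the consumer owns paging, error handling and the aggregation logic, which is the cost described above.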

The other problem was that most of these solutions aren’t open to extension without diving into the code itself, something that many people would consider too high a barrier to entry.

Suggestion

Two things I have observed (from my outsider view) are that people want:

  1. Access to data (or subset of)
  2. Ability to construct their own queries over this data

These people may want to build a site, make an API of a subset of data publicly available, or simply do some spreadsheet-based work. All of these require some very specific query abilities and custom outputs that don’t belong in a generic solution.

Over the last few weeks I’ve been toying with the idea of an “API as a Service” and/or “Data Playground”. Both of these could be seen as an extension to IATI Datastore (though an extensible approach would be more favourable due to the needs of some users).

API as a Service

This concept revolves around a multi-tenant approach to providing an API. A user would sign up to the service, configure their resources (select desired data sources, refresh frequency etc.) and immediately be given access to a basic but full-featured API similar to the IATI Datastore. They can then add their own aggregations to the API, e.g. Country View, Sectors etc., each of which will be exposed as an endpoint on their specific API, e.g. http://jhughes.iati-api.org/api/countries.
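As a rough sketch of the idea, a tenant could be modelled as a data subset plus a set of named aggregations, each mapped to a path. The `TenantApi` class, the `count_by` helper and the sample records are all hypothetical names invented for this illustration; a real service would sit behind an HTTP layer and a proper datastore.

```python
from collections import Counter


class TenantApi:
    """One tenant's API: their chosen data plus aggregations exposed as paths."""

    def __init__(self, name, activities):
        self.name = name
        self.activities = activities
        self.endpoints = {}

    def register(self, path, aggregation):
        """Expose a user-supplied aggregation under the given path."""
        self.endpoints[path] = aggregation

    def handle(self, path):
        """Run the aggregation registered for path against this tenant's data."""
        if path not in self.endpoints:
            raise KeyError(f"{self.name} has no endpoint {path}")
        return self.endpoints[path](self.activities)


def count_by(field):
    """Build an aggregation that counts activities per value of field."""
    def aggregation(activities):
        return dict(Counter(activity[field] for activity in activities))
    return aggregation


# Made-up sample data for a single tenant.
api = TenantApi("jhughes", [
    {"id": "A1", "country": "KE", "sector": "health"},
    {"id": "A2", "country": "KE", "sector": "education"},
    {"id": "A3", "country": "UG", "sector": "health"},
])
api.register("/api/countries", count_by("country"))
api.register("/api/sectors", count_by("sector"))

countries = api.handle("/api/countries")
sectors = api.handle("/api/sectors")
print(countries, sectors)
```

The point of the design is that the baseline API stays generic while each tenant carries their own aggregations with them.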

Custom queries could be provided through some sort of query language [TBD], e.g. a read-only SQL subset, read-only XPath/XQuery, or some sort of graphical interface akin to Yahoo Pipes.
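One way the read-only SQL option might look, purely as a sketch: load the data into SQLite, switch the connection to query-only mode, and accept only statements that start with SELECT. The `activity` table, its columns and the sample rows are assumptions made up for this example, and the prefix check is deliberately naive; a real service would need a proper parser or an allow-list.

```python
import sqlite3


def open_playground_db():
    """Create an in-memory database with made-up data, then lock it for writes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (id TEXT, country TEXT, budget REAL)")
    conn.executemany(
        "INSERT INTO activity VALUES (?, ?, ?)",
        [("A1", "KE", 100.0), ("A2", "UG", 50.0), ("A3", "KE", 25.0)],
    )
    conn.commit()
    # SQLite pragma: further write statements on this connection will fail.
    conn.execute("PRAGMA query_only = ON")
    return conn


def run_user_query(conn, sql):
    """Run a user-supplied query, rejecting anything that is not a SELECT."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are accepted")
    return conn.execute(sql).fetchall()


conn = open_playground_db()
rows = run_user_query(
    conn,
    "SELECT country, SUM(budget) FROM activity GROUP BY country ORDER BY country",
)
print(rows)
```

The pragma acts as a backstop in case the prefix check is bypassed, but a user statement could in principle try to switch it off, which is one reason the choice of query language is still marked TBD.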

Various other customisations could be provided if there was a need (i18n of API data, custom codelists etc.), but the goal should be to provide flexibility without inventing an entirely new platform with a million bells and whistles.

Data Playground

Not everyone wants an API. Some users may simply want to access a subset of data and play around with it to perform some ad hoc calculations, or embed it as a widget in another site. Where the API as a Service feature would be too full-blown, they could use the Data Playground. Inspired by GitHub’s Gist feature (which lets you share code without having to use a full-blown git repository) and Neo4j’s GraphGist (which lets you save and visualise graph queries), the playground would enable users to write and save one-off queries against a set of data. Results would come back as a table, CSV, JSON, XML or graphs (D3 powered, for example) and could be embedded in other sites or exported to spreadsheets and other tools.

Again, the format of the queries is open to discussion.
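Whatever the query format ends up being, the output side of the playground is fairly mechanical. As a small sketch, assuming a saved query returns a list of column names and a list of row tuples (an assumption, as are the sample values), the same result could be rendered as CSV for spreadsheets or JSON for widgets:

```python
import csv
import io
import json


def to_csv(columns, rows):
    """Render a result set as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(columns, rows):
    """Render a result set as a JSON array of objects keyed by column name."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


# Made-up result of a saved playground query.
columns = ["country", "budget"]
rows = [("KE", 125.0), ("UG", 50.0)]

csv_text = to_csv(columns, rows)
json_text = to_json(columns, rows)
print(csv_text)
print(json_text)
```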

Thoughts

Extensible Data Store

Datastores could be pluggable (assuming a commonly defined interface), so you could provide a public store and plug it into your own API platform account. Or other people could provide a BaseX version of IATI, or you could run everything locally, etc.
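A minimal sketch of what such a common interface might look like follows. The `Datastore` protocol, its single `activities` method, and the two example stores are hypothetical; the real interface would need to be agreed and would certainly be richer than this.

```python
from typing import Iterable, List, Protocol


class Datastore(Protocol):
    """Hypothetical common interface every pluggable store would implement."""

    def activities(self) -> Iterable[dict]:
        ...


class InMemoryStore:
    """A local store, e.g. for running everything on your own machine."""

    def __init__(self, records):
        self._records = list(records)

    def activities(self):
        return iter(self._records)


class FilteredStore:
    """Wraps any other store and exposes only one reporting organisation."""

    def __init__(self, inner, reporting_org):
        self._inner = inner
        self._reporting_org = reporting_org

    def activities(self):
        return (
            activity
            for activity in self._inner.activities()
            if activity["reporting_org"] == self._reporting_org
        )


def activity_ids(store: Datastore) -> List[str]:
    """Platform code depends only on the interface, not on a concrete store."""
    return [activity["id"] for activity in store.activities()]


# Made-up records standing in for a public store.
public_store = InMemoryStore([
    {"id": "A1", "reporting_org": "ORG-1"},
    {"id": "A2", "reporting_org": "ORG-2"},
    {"id": "A3", "reporting_org": "ORG-1"},
])
all_ids = activity_ids(public_store)
org_ids = activity_ids(FilteredStore(public_store, "ORG-1"))
print(all_ids, org_ids)
```

Because `activity_ids` only sees the interface, a store backed by BaseX or a remote public service could be swapped in without touching the platform code.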

Monetisation

As the IATI dataset grows and the potential userbase gets larger, the cost of running this type of service would also increase. Queries would not necessarily be cheap (though extensive caching in line with the IATI data refresh would help). So a tiered subscription model could be introduced, giving access to various features or increasing numbers of requests/queries per month.
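The caching point can be sketched as a cache that is keyed by query and thrown away whenever the underlying data is refreshed. The `RefreshAwareCache` class and the `data_version` labels are assumptions for illustration; how a refresh is actually detected would depend on the datastore.

```python
class RefreshAwareCache:
    """Cache query results until the data version changes."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._results = {}
        self._version = None

    def get(self, query, data_version):
        # A new data version means a refresh happened: drop everything.
        if data_version != self._version:
            self._results.clear()
            self._version = data_version
        if query not in self._results:
            self._results[query] = self._run_query(query)
        return self._results[query]


calls = []


def expensive_query(query):
    """Stand-in for a costly query; records each real execution."""
    calls.append(query)
    return f"result of {query}"


cache = RefreshAwareCache(expensive_query)
first = cache.get("countries", "v1")   # executes the query
second = cache.get("countries", "v1")  # served from the cache
third = cache.get("countries", "v2")   # data refreshed, executes again
print(len(calls))
```

With this shape, repeated queries between refreshes cost nothing extra, so per-month quotas would mostly be counting cache misses.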

Benefits

For the User/Consumer

  • They aren’t obliged to download and set up another product or technology stack just to get some data.
  • They don’t have to work around the constraints of a system designed to cater to everyone.

For the API Designers

  • They can focus on providing a baseline API (performance, scale, API semantics) without building extensive aggregations and trying to guess people’s needs.