Work towards an improved IATI validator


(Dale Potter) #1

Over the past few weeks, the IATI Technical Team have been taking the first steps towards building an improved validator. This follows the great work done by David Carpenter and others who built the initial validator, which has served IATI well since its initial development in 2012.

Why a new tool is needed

However, as users call for added validation functionality, the technical team feel that users would be better served by a revamped version than by bolting functionality onto this legacy tool.

The overall aim is to build a new validation tool that features:

  • XML syntax and IATI schema checking.
  • Content checking, to look for adherence to IATI Rulesets and codelists, plus basic sense checking where appropriate (for example: do the specified transaction dates fall within the broad range of the specified activity start/end dates?).
  • RESTful API functionality that can be used by bespoke publisher applications.
  • Improved error messages, showing the context of any errors and pointing users towards appropriate documentation and guidance.
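
To illustrate the kind of sense check mentioned in the list above, here is a minimal sketch of the transaction-date example. It is not the validator's actual code: the function name is made up for illustration, it uses Python's standard-library XML parser rather than lxml, and it assumes IATI 2.x markup where `activity-date` types 1 and 2 are planned/actual start dates and types 3 and 4 are planned/actual end dates.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Assumption: IATI 2.x ActivityDateType codes (1/2 = start, 3/4 = end).
START_TYPES = {"1", "2"}
END_TYPES = {"3", "4"}


def out_of_range_transaction_dates(activity_xml):
    """Return transaction dates outside the broad activity date range.

    Hypothetical helper: takes the XML of a single <iati-activity> element.
    The "broad range" is taken to run from the earliest start date to the
    latest end date; a missing bound is simply not checked.
    """
    activity = ET.fromstring(activity_xml)

    starts, ends = [], []
    for element in activity.findall("activity-date"):
        iso = element.get("iso-date")
        if not iso:
            continue
        if element.get("type") in START_TYPES:
            starts.append(date.fromisoformat(iso))
        elif element.get("type") in END_TYPES:
            ends.append(date.fromisoformat(iso))

    earliest = min(starts) if starts else None
    latest = max(ends) if ends else None

    flagged = []
    for element in activity.findall("transaction/transaction-date"):
        iso = element.get("iso-date")
        if not iso:
            continue
        when = date.fromisoformat(iso)
        too_early = earliest is not None and when < earliest
        too_late = latest is not None and when > latest
        if too_early or too_late:
            flagged.append(iso)
    return flagged
```

A check like this would most likely produce a warning rather than a hard failure, since a transaction just outside the stated dates is not necessarily wrong.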

The new tool will also have technical advantages. It will:

  • Rationalise the IATI technology stack further towards Python and the ‘lxml’ XML parser.
  • Make use of improved software design patterns.
  • Enable integration into our other IATI tools, making better use of common functionality.

Our development approach

We are using agile and ‘API first’ approaches to the development of this tool. Work is organised around 2-3 week sprints, culminating in the release of a usable and testable product at the end of each iteration.

The improved validator will continue to be an open-source tool, and work is taking place on a new branch of the IATI-Public-Validator GitHub repository. As ever, we welcome comments, suggestions and scrutiny on our work.

Sprint 1 complete

The first sprint has just been completed. It focused on basic API functionality for checking XML syntax (well-formedness) and validating against an (automatically detected) IATI version.
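
As a rough illustration of those two steps, the sketch below checks well-formedness and then guesses the IATI version. It is not the validator's own code: it uses Python's standard-library parser instead of lxml so that it runs without extra dependencies, the function name is hypothetical, and reading the `version` attribute from the root element is an assumption about how detection could work. The schema validation step itself is left out, as it needs the IATI XSD files.

```python
import xml.etree.ElementTree as ET


def check_well_formed_and_detect_version(xml_text):
    """Report whether xml_text parses, and which IATI version it declares.

    Hypothetical helper. The version is read from the 'version' attribute
    on the root element (e.g. <iati-activities version="2.02">); this is
    an assumed detection method, not a confirmed one.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        # Not well-formed: report the parser's message and stop here.
        return {"well_formed": False, "version": None, "error": str(error)}
    return {"well_formed": True, "version": root.get("version"), "error": None}
```

A schema check would then pick the XSD matching the detected version and validate the parsed document against it.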

Example API usage can be found here, with a public endpoint available at: http://dev.validator.iatistandard.org/api/validate/xml

Future sprints will focus on adding documentation, a user-interface and content checking.

Watch this topic for updates.

We will be posting further updates to this thread and deploying to the public instance at regular intervals. In the meantime, we welcome user comments and testing of the current work.


(Herman van Loon) #2

Hi Dale. Good to see that you have started improving the IATI public validator. Is any work still being done on the IATI public validator? The current implementation works fine for files up to 10 MB. Unfortunately, large XML files, such as the activity file from Oxfam Novib Netherlands, cannot be validated. Since the IATI public validator is a critical piece of software for IATI data quality assurance, I think it is important that the validator is also able to handle large XML files.


(Dale Potter) #3

Hi Herman,

You’re right that the current validator seems to struggle with files over 10MB, including the Oxfam Novib dataset that you highlighted, which is around 35MB in size.

It’s unlikely that we’ll work to improve the current validator to handle larger inputs, but the good news is that the new validator seems to handle large files very well. I’ve just used it to confirm that this large file validates against version 2.02 of the schema.

$ wget http://www.oxfamnovib.nl/redactie/Downloads/XML/iati.xml -O oxfamnovib_iati.xml
# Small edit made to the file to remove the encoding declaration on the first line - i.e. '<?xml version="1.0" encoding="utf-8"?>'
$ curl --data-urlencode "xml@oxfamnovib_iati.xml" http://dev.validator.iatistandard.org/api/validate/xml
{
  "error_count": 0, 
  "error_detail": [], 
  "metadata": {
    "began": "Fri, 07 Oct 2016 16:20:57 GMT", 
    "completed": "Fri, 07 Oct 2016 16:20:59 GMT", 
    "file_size_bytes": 38453296, 
    "version": {
      "type": "Detected", 
      "version_tested": "2.02"
    }
  }, 
  "status": {
    "status_detail": {
      "status_schema": "Pass", 
      "status_well_formed_xml": "Pass"
    }, 
    "status_overall": "Pass"
  }
}
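
For anyone who would rather call the API from Python than curl, something like the sketch below should be roughly equivalent. It mirrors the curl call above by sending the file contents as a URL-encoded form field named `xml`, and it reads only fields that appear in the response shown above. The function names are made up for illustration, and the response format may well change while the API is in development.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dev.validator.iatistandard.org/api/validate/xml"


def build_validation_request(xml_text, endpoint=ENDPOINT):
    """Build a POST request carrying the XML as the form field 'xml'.

    Mirrors `curl --data-urlencode "xml@file"`: the body is
    xml=<url-encoded file contents>.
    """
    body = urlencode({"xml": xml_text}).encode("ascii")
    return Request(endpoint, data=body, method="POST")


def summarise_response(response_text):
    """Turn the JSON response into a one-line summary."""
    result = json.loads(response_text)
    overall = result["status"]["status_overall"]
    version = result["metadata"]["version"]["version_tested"]
    errors = result["error_count"]
    return f"{overall}: {errors} error(s) against version {version}"


def validate(xml_text):
    """Send the XML to the public endpoint and summarise the result."""
    with urlopen(build_validation_request(xml_text)) as response:
        return summarise_response(response.read().decode("utf-8"))
```

Only `validate` touches the network; the other two functions can be tried offline, for example by passing the JSON above to `summarise_response`.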

We are also working to finalise a user interface as an alternative to the API. This has been delayed, partly due to TAG preparation work, but we hope to release the first iteration of this in the coming weeks.

Hope that this helps in the meantime.


(John Adams) #4

Hi Dale, has there been any further progress on the shiny new validator?


(Dale Potter) #5

Hi John,

I’m afraid work on a new validator paused in the autumn on account of other internal priorities (including initial preparation for the upcoming TAG, recruitment of more developers, and others). Furthermore, we have pivoted our focus in the past few weeks to shoring up current tools, adding a number of automated tests to check availability and improve deployment practices.

We expect to refocus attention on building an improved validator user interface and API sometime in Q1 of next year. Before then, we will be spending some time scoping and building the foundations of a new architecture plan for the array of IATI Secretariat tools. This will aim to build solid foundations for new tools (such as the validator) as well as transition legacy products - more on this to be shared publicly in the new year!

There could be scope to discuss IATI Secretariat architecture plans at the TAG 2017, as we will be keen to gather expert input and advice in order to reduce the past proliferation of common functionality and integrate existing tools - watch this space…

Sorry that there hasn’t been more exciting news to report on the validator, but hope that this gives a good overview of what (we hope) is to come!