IATI Dashboard & Data quality - your views welcome

(Dale Potter) #1

I have been asked to scope out improvements to the IATI Dashboard and am keen to hear the community’s views on this tool.

We are particularly interested in how we can help publishers to understand their data quality.

Currently, every IATI publisher has a page which offers an overview of their data (example page for the UK Department for International Development). This page has the following sections:

  • Headlines - Overview of data by this publisher (number of files, IATI versions used, etc). Plus graphs showing the number of activities/activity files/organisation files/total file size over time
  • Data Quality - summary of automated issues found, for example Files Failing Validation
  • Financial - counts the number of budgets published, together with aggregated totals
  • Exploring Data
    • Files - Overview of all publisher’s files, showing the number of activities contained, file size and version
    • Codelist values - summary of the codelist values used for each element used by this publisher
    • Elements and Attributes published - shows counts of how often each element and attribute of the IATI schema is used by this publisher, and details which of their files contain them

Questions for publishers/data users:

We are keen to hear your thoughts; here are some questions to get you thinking:

  • Which of the above components (if any) do publishers/data users find most useful?
  • Which of the above components (if any) do publishers/data users find least useful?
  • Is there any data which should be presented in a different way? Or should the page have a different look and feel in general?
  • Are there other aspects to data quality that should also be included? Can you demonstrate how this would make a difference to you?
  • How often do you use the dashboard? Would this change if different data or formats were available?
  • Is the page fine as it is? In which case, are there other things which you would suggest we should improve instead?

At present, we are in the scoping phase and timescales for development have not been fully defined, but we are keen to hear your views to inform our thinking on how much priority should be assigned to this.

Data Quality Issues
(Vincent van 't Westende) #2

Hi Dale,

First of all I have to say the dashboard is a great tool and I use it a lot! As a data user it provides me with the necessary metadata on the datasets / fields used. Very handy when developing data portals. As maintainer of OIPA it provides me with an easy way to cross-check my findings.

The components I find most useful: the headline overview table, the Exploring Data components, and the table of contents that links to those components at the bottom of the page.

The components I find least useful: All the headline graphs (they are useful, but only for use cases that don’t happen very often), Financial, Data quality (XSD validation is not enough for my use cases).

The presentation of all components is ok for me, no suggestions on that.

The thing that could use improvement is the data quality component. XSD validation just covers the basics, there’s more in-depth validation to check. Here’s everything I can think of now:

  • Are all used codelist values on the accompanying codelist? This is already displayed on the dashboard under Exploring data -> Codelist values, but that ‘Values not on codelist’ column should also feed into the issues listed in the data quality component.

  • Check whether defaults are set when data is omitted. For example, according to my data there are over 500,000 narratives (in 519 files) that have neither a language nor a default language set. Strictly speaking this violates the rule at http://iatistandard.org/201/activity-standard/iati-activities/iati-activity/#iati-activities-iati-activity-xml-lang and those narratives should be invalidated (correct me if I’m wrong here!).

  • For a data publisher it would be useful to know where these errors happen, so each error should come with an iati-identifier and/or XML line number.

  • For a data journalist/scientist it might be useful to have aggregated numbers on these errors? Not sure on this one.
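The first three checks above can be sketched in a few lines of Python. This is a minimal illustration, not OIPA's actual validation code: the hard-coded codelist subset and the `check_activity_file` function are my own assumptions, and a parser such as lxml would additionally expose the XML line number for each element.

```python
# Minimal sketch of the per-activity checks described above.
# POLICY_MARKER_CODES is an illustrative subset of the real codelist.
import xml.etree.ElementTree as ET

POLICY_MARKER_CODES = {"1", "2", "3", "4", "5", "6", "7", "8", "9"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def check_activity_file(path):
    """Yield (iati-identifier, message) pairs for each issue found."""
    tree = ET.parse(path)
    for activity in tree.iter("iati-activity"):
        identifier = activity.findtext("iati-identifier", default="(missing)")
        default_lang = activity.get(XML_LANG)
        # Codelist membership: is every policy-marker code on the codelist?
        for marker in activity.iter("policy-marker"):
            code = marker.get("code")
            if code not in POLICY_MARKER_CODES:
                yield identifier, "policy-marker code %r not on codelist" % code
        # Defaults: narratives with neither xml:lang nor a default language.
        for narrative in activity.iter("narrative"):
            if narrative.get(XML_LANG) is None and default_lang is None:
                yield identifier, "narrative has no language and no default language"
```

Because each issue is reported against the containing iati-identifier, a publisher can jump straight to the offending activity rather than re-validating the whole file.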

I consider it quite concerning for IATI that there are hidden errors in publishers' data, since this leads to incomparable data. Here's an example. It's admittedly a bad example, because the error is not hidden and the files don't even pass XSD validation, but it demonstrates a bad practice caused by bad data.

The policy markers of EC-DEVCO all have a zero prepended to their code: http://dashboard.iatistandard.org/publisher/ec-devco.html#p_codelists

This causes all 300,976 policy markers to be invalid. Meanwhile, I can still find those activities by filtering on policy markers in the EU Aid Explorer: https://euaidexplorer.ec.europa.eu/SearchPageAction.do
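The mismatch is easy to reproduce (a codelist subset is assumed here for illustration): the published code only becomes valid once a consumer strips the leading zero, which is exactly the kind of consumer-side workaround that masks the underlying data error.

```python
# The real PolicyMarker codelist uses codes "1" through "9"; subset assumed.
POLICY_MARKER_CODES = {"1", "2", "3", "4", "5", "6", "7", "8", "9"}

published = "01"  # code as published by EC-DEVCO
print(published in POLICY_MARKER_CODES)              # False: invalid as published
print(published.lstrip("0") in POLICY_MARKER_CODES)  # True: the consumer-side "fix"
```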

So instead of getting DEVCO to correct the error in the data, the developers of the EU Aid Explorer cheated the standard and wrote something to turn these into valid policy markers. Obviously this leads to IATI dialects and incomparable data.

I also saw an example last week where this kind of thing happens on the transaction receiver/provider, and that's an even bigger problem: then not only comparability but also traceability becomes impossible without cheating. Note to developers: try never to cheat the standard on purpose! I know that sounds impossible in some cases, but it really shouldn't be your problem, and in the end it's bad for everyone.

Recently I've been working on making OIPA's validation errors more transparent. This provides some of the functionality I just described that isn't in the IATI Dashboard yet. I'm thinking about creating a small tool to display it: the first reason is awareness, since this isn't fully shown on the IATI Dashboard yet; the second is that people can point me at errors we make in OIPA, though I hope that isn't the case too often. For both reasons it would be great if this were also possible through the IATI Dashboard. That would give us both a cross-check to improve our tools, and in the end provide data publishers with better tools to produce technically high-quality IATI files.