Challenge: results with multiple indicators and multiple measures


(Steven Flower) #1

I’m working with an organisation who wish to publish results data.

For any indicator, there will be multiple measurements (periods) that provide targets and actuals, like this:

The standard allows this:

  • a result can have one or more indicators
  • any indicator can have more than one period (which then holds the target and actual data)

So far so good. But how do we actually maintain this data for producing IATI XML? And then, how do we use it?

Missing in the schema is the ability to identify individual results and indicators, meaning that whilst you might want to say in the above model:

the actual value 100 relates to the period Q2 1617, and contributes to Indicator A, which is a part of Result 1.

It doesn’t seem possible to declare the concepts of Result 1 and Indicator A.

(the ref attribute of transaction might be a good analogy here)

This becomes a real difficulty when you’re managing data on multiple activities, all with many results, a range of indicators and various measurement periods.

Of course, it’s possible to do this directly in the XML, but the struggle is how to manage it in the lead-up to publishing, whether in spreadsheets, in-house systems or elsewhere. The Aidstream interface would enable this multi-multi recording, but I assume there’s a database behind it. @anjesh, do you provide a database key (or similar) for the above elements?

And at the other end, if someone wants to convert some XML to a spreadsheet or another format, how do packages then handle such multi-multi relationships, with IDs?

@Herman @rolfkleef @pelleaardema what’s your experience?

@SJohns @mikesmith @rbesseling did this come up in the CSO work on results?


(Hayden Field) #2

Indicators have a reference element, so you could potentially declare the UID of Indicator A by adding the following within an <indicator>:

<reference vocabulary="99" code="Indicator A" indicator-uri="http://example.com/document-detailing-indicator-usage" />

There is not currently a similar element or attribute to define a UID for results… though one is to be added at 2.03.


(Pelle Aardema) #3

Hi Steven,

Yes, we’re gaining more and more experience with this.

The relation between an actual value and the indicator (and result) it provides a value for comes from the nested structure.

Your example is still a relatively straightforward one, with a limited number of indicators and the same periods for target and actual values. In practice I often see:

  • period: 5 years, target: X
  • period: year 1, actual: Y
  • period: year 2, actual: Z

The Aidstream import function is perfectly capable of handling the kind of table structure in your example.
But managing many activities with many results and indicators does indeed become rather laborious. I get the feeling that the easiest way to maintain the structure in Aidstream is to keep the result sheets in a spreadsheet and do a clean import for each publication.

Things become a bit more interesting when organisations use the same indicators in different activities (and even in different datasets by different organisations). That is where the references come in handy.

Another nice exercise is to add another level and start adding dimensions.

Concerning the conversion of XML to spreadsheet: I would make it a ‘proper’ data table again, so repeat the result and indicator elements in each row in order to maintain the relation.
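A minimal sketch of that "proper data table" approach, using Python's standard-library ElementTree: one row per period, with the activity identifier and the result and indicator titles repeated on every row so the relation survives. The sample XML is hypothetical and only loosely follows the IATI 2.0x result structure (no namespaces, a single narrative per title); the column names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely following the IATI 2.0x result structure.
SAMPLE = """<iati-activity>
  <iati-identifier>XM-EXAMPLE-1</iati-identifier>
  <result type="1">
    <title><narrative>Result 1</narrative></title>
    <indicator measure="1">
      <title><narrative>Indicator A</narrative></title>
      <period>
        <period-start iso-date="2016-04-01"/>
        <period-end iso-date="2016-06-30"/>
        <target value="120"/>
        <actual value="90"/>
      </period>
      <period>
        <period-start iso-date="2016-07-01"/>
        <period-end iso-date="2016-09-30"/>
        <target value="120"/>
        <actual value="100"/>
      </period>
    </indicator>
  </result>
</iati-activity>"""


def attr(el, path, name):
    """Return attribute `name` of the element at `path` below `el`, or None."""
    found = el.find(path)
    return found.get(name) if found is not None else None


def flatten(activity):
    """One row per period; activity, result and indicator columns repeat."""
    rows = []
    activity_id = activity.findtext("iati-identifier")
    for result in activity.findall("result"):
        for indicator in result.findall("indicator"):
            for period in indicator.findall("period"):
                rows.append({
                    "activity": activity_id,
                    "result": result.findtext("title/narrative"),
                    "indicator": indicator.findtext("title/narrative"),
                    "start": attr(period, "period-start", "iso-date"),
                    "end": attr(period, "period-end", "iso-date"),
                    "target": attr(period, "target", "value"),
                    "actual": attr(period, "actual", "value"),
                })
    return rows


rows = flatten(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

Each printed row can be read on its own, at the cost of repeating the result and indicator titles.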



(Steven Flower) #4

@pelleaardema @hayfield many thanks for your thoughts - much appreciated

Side note: with organisations we are well drilled in the fact that the string DFID, or D.F.I.D, is not enough to point precisely at the entity known as DFID. To do this, we also need to include the reference attribute GB-GOV-1, which uniquely identifies that organisation. We know that isn’t widespread in IATI, but the mechanism and the will are there.

This is what I miss in results. My example may have misled, as it looked as if the results and indicators had unique references. I’ve tried the following (please note: this is not meant to indicate any actual real data or context).

In this example an organisation is publishing results in two different activities. It’s a one-to-one relationship, though, so any measurements are simply associated with the relevant indicator and result via the activity ID.

But then:

By luck (!) the names of the indicators are slightly different, so we can still identify the measurements accordingly, but I think this illustrates your point:

Are we therefore expecting people who manage the data (whether publishing it by any means, or using it in some other way) to rely on strings for identification?

Apologies, I still don’t think I’m explaining the issue in prose tremendously well. I’m thinking specifically of instances when this data is flat/normalised, either before publication or in use. I spoke to @bjwebb briefly about this; he said: “In Open Contracting everything has a UID…”


(Pelle Aardema) #5

Sorry, I’m not sure if I understand the problem you’re trying to express here.
The relation between the value->indicator->result comes from the nested structure… and if organisations choose to include two (nearly) similar indicators under the same result that sounds like a badly designed result framework, rather than an IATI problem.

And with the new upgrade, indicators already get the possibility of including a unique reference.

Where am I missing the point?


(Hayden Field) #6

@rory_scott has attempted to explain the context of this in person, so I understand it’s to do with the lack of UIDs and the difficulty this presents in creating a normalised data structure that can handle more than a single snapshot of information.


On the publishing-side:

  • Assuming full control of the end-to-end system: it should be possible to add UIDs to the system. These are then stripped out (i.e. not displayed) during the process of converting to IATI XML.
  • Without full control of the end-to-end system (e.g. using Aidstream): there is no UID in IATI to map “new data” to “old data” to know exactly what you are updating. If the internal results system is reasonable, some combination of other values should be able to act as a UID, though this could cause problems when, for example, a typo is fixed.
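A sketch of the "combination of other values" idea, assuming (hypothetically) that activity identifier, result title and indicator title together are unique within a publisher's system. The `natural_key` function and the sample titles are illustrative only; the example shows the typo problem directly.

```python
import hashlib


def natural_key(activity_id, result_title, indicator_title):
    """Hypothetical composite key built from existing values, since no UID exists.

    Titles are trimmed and lower-cased so trivial formatting changes don't matter.
    """
    parts = [activity_id, result_title.strip().lower(), indicator_title.strip().lower()]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()[:12]


before = natural_key("XM-EXAMPLE-1", "Result 1", "Number of teachers trianed")
after = natural_key("XM-EXAMPLE-1", "Result 1", "Number of teachers trained")  # typo fixed

# Same indicator, but a different key: the link to the "old data" is lost.
print(before == after)  # False
```

Normalising case and whitespace absorbs some noise, but any real edit to a title produces a new key.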

On the data-use side:

  • Using a Dataset once: add some keys to the database and use it.
  • Updating data in a database, e.g. to see changes between publications: use some heuristic that probably differs by publisher, isn’t entirely reliable, and in general is a pain to work with.
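One possible heuristic, sketched with Python's `difflib.SequenceMatcher`: pair each result title in an old snapshot with the most similar title in a new one. The `match_by_title` function, the 0.8 threshold and the sample titles are all hypothetical. The point is that the threshold is arbitrary and nothing guarantees a correct pairing.

```python
from difflib import SequenceMatcher


def match_by_title(old_titles, new_titles, threshold=0.8):
    """Pair each old title with the most similar new title scoring >= threshold.

    A heuristic only: the threshold is arbitrary, and rewordings or
    near-duplicate titles can produce wrong or missing matches.
    """
    matches = {}
    for old in old_titles:
        best, best_score = None, 0.0
        for new in new_titles:
            score = SequenceMatcher(None, old.lower(), new.lower()).ratio()
            if score > best_score:
                best, best_score = new, score
        matches[old] = best if best_score >= threshold else None
    return matches


# Hypothetical titles from two snapshots of the same activity.
old = ["Improved access to clean water", "Hygiene awareness raised"]
new = ["Improved access to safe drinking water", "Hygiene awareness raised in schools"]
print(match_by_title(old, new))
```

Whether a reworded title clears the threshold depends on the wording, which is exactly why this approach is unreliable across publishers.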

So…

  • Dealing with one-off actions is fine.
  • Handling data-over-time cannot reasonably be undertaken with the Standard as it stands.

The reference elements (available on indicator at 2.02, and to be added to result at 2.03) codify what the information is about, but are not suitable for acting as a UID. This is especially true given the addition of the rule: “The reference element MUST ONLY be reported once. Either [at indicator] OR at result level.”


Soooo, due to the 1:many and many:many relationships and lack of UID in the data, it is extremely challenging to track results data over time - on both the publishing and data use sides.

Adding UIDs (akin to transaction/@ref) would help here. There may also be other places in the Standard where it would be helpful to add UIDs for similar reasons.
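To illustrate what a UID would enable, this sketch assumes a hypothetical `ref` attribute on `<result>`, by analogy with `transaction/@ref`. No such attribute exists in IATI 2.02. With it, matching two snapshots becomes a plain key lookup, even when titles and ordering change.

```python
import xml.etree.ElementTree as ET

# Hypothetical: `ref` on <result> is NOT part of IATI 2.02; it is assumed here
# by analogy with transaction/@ref to show what a UID would make possible.
OLD = """<iati-activity>
  <result ref="R1"><title><narrative>Result 1</narrative></title></result>
  <result ref="R2"><title><narrative>Result 2</narrative></title></result>
</iati-activity>"""
NEW = """<iati-activity>
  <result ref="R2"><title><narrative>Result 2 (revised)</narrative></title></result>
  <result ref="R1"><title><narrative>Result one</narrative></title></result>
</iati-activity>"""


def results_by_ref(xml_text):
    """Map each result's (hypothetical) ref to its title."""
    root = ET.fromstring(xml_text)
    return {r.get("ref"): r.findtext("title/narrative") for r in root.findall("result")}


old, new = results_by_ref(OLD), results_by_ref(NEW)
# Titles and order both changed, but the ref still pairs them unambiguously.
changes = {ref: (old[ref], new.get(ref)) for ref in old}
print(changes)
# {'R1': ('Result 1', 'Result one'), 'R2': ('Result 2', 'Result 2 (revised)')}
```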


(Steven Flower) #7

+1

Thanks @hayfield & @rory_scott for talking this through, and writing up

Apologies @pelleaardema that I couldn’t explain in my post(s)!


(Steven Flower) #8

@pelleaardema so… I’ve an example from the Netherlands Red Cross that might be of interest:

http://d-portal.org/q.html?aid=NL-KVK-40409352-PRJ08-260-0009

As you can see, this activity has three results, each of which has multiple indicators, and most indicators seem to have two periods. Of course, we’re lacking the “data” in the form of targets and actuals in most cases, but that doesn’t matter…

I took the XML and “flattened” it, taking the nested data and representing it as flat files (in this case a tabbed workbook):

This process results in three sheets to look at: the results, the indicators and the periods. As you can hopefully see, there doesn’t seem to be a way to piece these back together into the original arrangement. I don’t know which indicators / periods belong to which results, for example.

This, as @hayfield & @rory_scott discuss, looks to be because we don’t have UIDs in the XML, which could help…

Is that a useful example?


(Hayden Field) #9

I don’t know which indicators / periods belong to which results, for example.

During the flattening process, your own UIDs can be added. This would allow you to piece the information back together.
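A minimal sketch of adding your own keys while flattening, so that three sheets (results, indicators, periods) can be joined back together. The keys are hypothetical and built from document position (e.g. `XM-EXAMPLE-1/r1/i1`), so they hold only within one snapshot and do not help with matching across snapshots. The sample XML is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two results, three periods in total.
SAMPLE = """<iati-activity>
  <iati-identifier>XM-EXAMPLE-1</iati-identifier>
  <result><title><narrative>Result 1</narrative></title>
    <indicator><title><narrative>Indicator A</narrative></title>
      <period><actual value="90"/></period>
      <period><actual value="100"/></period>
    </indicator>
  </result>
  <result><title><narrative>Result 2</narrative></title>
    <indicator><title><narrative>Indicator B</narrative></title>
      <period><actual value="7"/></period>
    </indicator>
  </result>
</iati-activity>"""


def flatten_with_keys(activity):
    """Split results, indicators and periods into three tables with parent keys.

    Keys come from document position, so they are only valid within this snapshot.
    """
    aid = activity.findtext("iati-identifier")
    results, indicators, periods = [], [], []
    for r_pos, result in enumerate(activity.findall("result"), start=1):
        r_key = f"{aid}/r{r_pos}"
        results.append({"key": r_key, "title": result.findtext("title/narrative")})
        for i_pos, indicator in enumerate(result.findall("indicator"), start=1):
            i_key = f"{r_key}/i{i_pos}"
            indicators.append({"key": i_key, "result_key": r_key,
                               "title": indicator.findtext("title/narrative")})
            for p_pos, period in enumerate(indicator.findall("period"), start=1):
                actual = period.find("actual")
                periods.append({"key": f"{i_key}/p{p_pos}", "indicator_key": i_key,
                                "actual": actual.get("value") if actual is not None else None})
    return results, indicators, periods


results, indicators, periods = flatten_with_keys(ET.fromstring(SAMPLE))

# Piece the three sheets back together by following the parent keys.
indicator_by_key = {i["key"]: i for i in indicators}
result_by_key = {r["key"]: r for r in results}
for p in periods:
    ind = indicator_by_key[p["indicator_key"]]
    res = result_by_key[ind["result_key"]]
    print(res["title"], "|", ind["title"], "|", p["actual"])
```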

As @pelleaardema noted, the relation comes from the nested structure. As such, whatever manipulation is undertaken needs to maintain a representation of this nesting if no information is to be lost.

The addition of a UID to the source XML would help, but it does not prevent this task from being undertaken.

This, as @hayfield & @rory_scott discuss, looks to be because we don’t have UIDs in the XML, which could help…

The “this can’t be done without a UID” problem requires multiple snapshots of data.

  1. I have a copy of that same activity file, containing 3 results.
  2. The Netherlands Red Cross update the data so that each of the 3 results has changed.
  3. I cannot with certainty say which of the results in the new file relates to which of the results in the original file.

(Steven Flower) #10

Sure. But that places an overhead on me to do this. And even if I figure it out, my neighbour might not have.

We have a focus on people using IATI data, no?


(Hayden Field) #11

But that places an overhead on me to do this.

The publisher has already provided details of the relationships in the Dataset through the use of child elements. Adding UIDs to ‘solve’ this problem would duplicate information already in the XML, which is bad.

One reasonable solution to the situation you are detailing would be proposal number 1 by @JohnAdams: mappings between technical implementations of the Standard, with details of each, could define how nesting is handled in different formats.


(Steven Flower) #12

I’d rephrase: The publisher has provided details of the relationships using the existing standard. The existing standard doesn’t easily enable UIDs for these sub elements.

Bad for whom?


(Hayden Field) #13

The publisher has provided details of the relationships using the existing standard.

The Standard needs to enable the relationships between elements in a single snapshot of data to be defined and understood. As stated here, it does that.

The existing standard doesn’t easily enable UIDs for these sub elements.

UIDs are not required in the XML to understand the relationships that exist within a single data snapshot.

Bad for whom?

Users of data - it would lead to multiple ways to define literally the same information.

Create proper conversion tools and there is zero information missing to undertake 2-way snapshot conversion to and from a custom flattened format. It’s a tooling problem, not a data problem.
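As a sketch of that claim for a single snapshot: the Python below converts results XML to rows, whose position columns (`r`, `i`, `p`) carry the nesting, and then rebuilds the XML from those rows. For brevity it assumes every result has at least one indicator, every indicator at least one period, and every period both a target and an actual; the sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
SAMPLE = """<iati-activity>
  <result><title><narrative>Result 1</narrative></title>
    <indicator><title><narrative>Indicator A</narrative></title>
      <period><target value="120"/><actual value="100"/></period>
    </indicator>
    <indicator><title><narrative>Indicator B</narrative></title>
      <period><target value="10"/><actual value="4"/></period>
    </indicator>
  </result>
</iati-activity>"""


def to_rows(activity):
    """XML -> rows. The position columns (r, i, p) carry the nesting."""
    rows = []
    for r, result in enumerate(activity.findall("result")):
        for i, ind in enumerate(result.findall("indicator")):
            for p, period in enumerate(ind.findall("period")):
                rows.append((r, i, p,
                             result.findtext("title/narrative"),
                             ind.findtext("title/narrative"),
                             period.find("target").get("value"),
                             period.find("actual").get("value")))
    return rows


def to_xml(rows):
    """Rows -> XML, creating one element per distinct result/indicator position."""
    activity = ET.Element("iati-activity")
    results, indicators = {}, {}
    for r, i, p, r_title, i_title, target, actual in sorted(rows):
        if r not in results:
            results[r] = ET.SubElement(activity, "result")
            ET.SubElement(ET.SubElement(results[r], "title"), "narrative").text = r_title
        if (r, i) not in indicators:
            indicators[(r, i)] = ET.SubElement(results[r], "indicator")
            ET.SubElement(ET.SubElement(indicators[(r, i)], "title"), "narrative").text = i_title
        period = ET.SubElement(indicators[(r, i)], "period")
        ET.SubElement(period, "target", value=target)
        ET.SubElement(period, "actual", value=actual)
    return activity


rows = to_rows(ET.fromstring(SAMPLE))
# Round trip: XML -> rows -> XML -> rows gives back identical rows.
print(to_rows(to_xml(rows)) == rows)  # True
```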


I’m not against UIDs where they enable something that is not otherwise possible (see: above). For this use case, however, they provide no information that is otherwise unavailable in the XML.


(Pelle Aardema) #14

I agree.

@stevieflow Looking at your ‘Flattened data’ example, I get the impression you’re separating elements that belong together, and I still don’t understand what you’re trying to do:

  • Why would you separate the indicators from the results they aim to measure?
  • And why would you separate the periods from the indicators they belong to?

Concerning the issue of measuring/showing changes over time: the IATI Standard wasn’t designed to facilitate a diff analysis between files, but rather to keep the history available in one place:

“Publishers should NOT create additional new files that only contain the most recent information each time they publish. Ideally all the information that relates to a specific activity should be kept together as much as is possible as this makes the ultimate use of the data by any third party much more efficient. Data should be cumulative (e.g. if publishing every quarter, the new files will include the quarter being reported on, and the previous quarter reported last time, and so on). It will also enable amendments to be made to existing data.”

At the Netherlands Red Cross we keep track of changes over time by defining different periods for targets and actuals:

  • the target is usually valid for the entire activity period (but it’s also possible to work with annual targets)
  • the actual values will reflect the progress made over a shorter period of time, e.g. a quarter. In this case it is important that the actual value shows the increment in that period, rather than the cumulative value.
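A small worked example of why the increment matters, using hypothetical figures. It assumes a single target for the whole activity period and quarterly actuals reported as increments. Progress is then the running total divided by the target, and running totals, if a publisher reports those instead, can be turned back into increments.

```python
from itertools import accumulate

# Hypothetical figures: one target for the whole activity period,
# quarterly actuals reported as increments (not running totals).
target = 500
quarterly_actuals = [60, 90, 100, 80]

cumulative = list(accumulate(quarterly_actuals))
progress = [round(100 * c / target) for c in cumulative]
print(cumulative)  # [60, 150, 250, 330]
print(progress)    # [12, 30, 50, 66]

# Recover increments from running totals by differencing consecutive values.
increments = [b - a for a, b in zip([0] + cumulative, cumulative)]
print(increments == quarterly_actuals)  # True
```

Summing values that were already cumulative would double-count, which is why it matters which convention a publisher uses.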

(Steven Flower) #15

I’m not doing anything - I just asked a machine (in the form of CoVE code) to output the XML as spreadsheet!

I can also use some XSLT that @TimDavies wrote way back, which is still integrated with the Registry. Hence, I can get a flattened version of the XML for this publisher. Take a look at the Full Activity CSV. Does that work?

I’d be interested to know what the IATI Datastore does with results data when you request CSV output, but I don’t think the Datastore API outputs results data. Correct, @dalepotter?

@VincentVW can I request results data from OIPA in CSV? If I do, what does NL-KVK-40409352-PRJ08-260-0009 look like?

In short, IATI XML is a machine-readable format:

IATI uses a data format called XML. It is very easy to convert this format into more accessible formats – such as CSV, or even use it to drive tools that can generate graphs and tables of data from queries.

Use of XML enables swift, machine-readable data to be easily exchanged, compared, and mashed up with other data published in the XML format.

Therefore, when we ask these machines to convert the data into other formats, can it still be used (in the context of results)? To repeat: this is not about me pasting data into spreadsheets or something! And yes, I get that it’s nested in the XML!


(Hayden Field) #16

The confusion, therefore, is that you are expecting a tool that performs one-way transformation of published data (XML -> spreadsheet) to provide an output that can be used for two-way transformation (there and back again).

The IATI XML contains information about the relationships between results and their nested elements such as indicators. That these relationships are lost during transformation is a problem with CoVE, not with the IATI XML.

Correct.

Yes.

IATI XML is an XML-based format for data exchange. As such, any tool that wishes to exchange data in other formats cannot assume that those other formats will be fully compatible with each other; they will be compatible with the specific tool(s) they take input from.

IATI XML contains all the information (e.g. nested elements) and capabilities (e.g. custom namespaces) required for 2-way transformation. If a specific tool only provides one-way transformation, that’s a tool design decision rather than an IATI XML bug.


Should it be deemed that exchanging data in non-XML formats is a fundamental requirement of IATI, proposal number 1 from this linked post should be taken forward.


(Pelle Aardema) #17

Then CoVE doesn’t seem to be the right machine to ask for this specific job. It loses vital information while deconstructing the results framework.

The XSLT that Tim wrote hasn’t been updated for IATI 2.0x.
I have made an attempt to fit it to 2.01 in the past. Not 100% accurate, but helpful for quick analysis.
This code, however, combines multiple values in one cell - so it won’t do the job for this specific use case either.

As far as I know the IATI datastore doesn’t output results data…

Simplest solutions I know:

  • Import an IATI file in OpenRefine, and limit the columns to iati-identifier + the results section.
  • I’ve played for half an hour with Excel Query on the NLRC dataset, and fairly easily pulled out the results framework (although I somehow missed the actuals) - see the attached file

NLRC results.xlsx (86.0 KB)


(Steven Flower) #18

I’m not going to let this one go, btw.

Looking at data users, I noticed that d-portal actually presents a form of numbering/ordering when there is results data. Two random examples:

http://d-portal.org/q.html?aid=44000-P106390
http://d-portal.org/q.html?aid=NL-KVK-41198677-AFGO_PROJECTCLUSTER-1006719

The latter example is interesting, as the publisher has actually included some form of UID in the narrative. Obviously, d-portal doesn’t recognise this, and presents a different take:

So - here’s an example of a publisher trying to include an identifier for results (in narrative) and a data user applying its own. Would be a lot easier if this could be expressed and used via the actual XML, no?


(Bill Anderson) #19

Are you saying that for every case of double-nesting the standard should insist on UIDs?

  • Isn’t the alternative a set of guidance for those wanting to create flat files?
  • Or campaigning for the standardisation of indicators (which would provide a UID and make it possible to compare results across publishers and activities)?

(Steven Flower) #20

@bill_anderson I’m not campaigning for anything! I’ve been trying to reflect how problematic it can be to prepare and use results data, given the lack of UIDs in the IATI schema.

I’m working with a publisher who have UIDs for their results and indicators (and even the time periods) in their database (obviously!) - so we will try and output this in IATI XML as an extension.

@hayfield & @pelleaardema seem to actively disagree with everything I suggest, so I’ll try and demonstrate it in some published data…
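A sketch of how database UIDs might be output as an extension, using attributes in a custom namespace (the custom-namespace mechanism mentioned earlier in the thread). The namespace URI, the `ex:id` attribute name and the ID values are hypothetical. Whether the IATI schema accepts foreign-namespace attributes on `result`, `indicator` and `period` should be checked against the schema before publishing.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace; a real publisher would choose its own URI.
EX = "http://example.org/ns/iati-results-ext"
ET.register_namespace("ex", EX)

# Hypothetical internal database IDs for one result, indicator and period.
db_ids = {"result": "RES-0001", "indicator": "IND-0042", "period": "PER-2016Q2"}

activity = ET.Element("iati-activity")
result = ET.SubElement(activity, "result", {f"{{{EX}}}id": db_ids["result"]})
indicator = ET.SubElement(result, "indicator", {f"{{{EX}}}id": db_ids["indicator"]})
period = ET.SubElement(indicator, "period", {f"{{{EX}}}id": db_ids["period"]})
ET.SubElement(period, "actual", value="100")

# Serialised with an xmlns:ex declaration and ex:id attributes.
xml_text = ET.tostring(activity, encoding="unicode")
print(xml_text)
```

A consumer that knows the namespace can read the IDs back; one that doesn't can ignore them and still see the standard nesting.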