IATI rulesets - add rules for sector percentages and policy marker conditionals


(Steven Flower) #1

(posting this in Data Quality to begin with…)

The IATI Rulesets are a set of conditions that it would be very useful for any IATI publisher to consider in their publication.

These are not enforced at the schema level; they are additional. Think of these rules as the bits of “guidance” you often see in the IATI schema, represented as rules so that we can actually use them! They cover logic (e.g. start date before end date!), conditions (percentages of multiple countries should add to 100%) and other things!

These rulesets are very important, and have long been available around IATI. We know that they are not currently used in the “official IATI validator”, but are used by others. At Open Data Services, we use the rulesets in our tools, for example.

Who else uses the rulesets? @rolfkleef @rbesseling @JohnAdams maybe?

Looking at the current IATI rules, and in discussion with others such as @YohannaLoucheur @Herman @robredpath & @andylolz, there seem to be a couple of omissions, which I would suggest we incorporate:

Sectors
There is no rule requiring that multiple sectors from the same vocabulary add up to 100%. The standard text says:

All reported sectors from the same vocabulary MUST add up to 100.

There is no rule in the ruleset for this
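
As a rough illustration of what such a check involves, here is a minimal Python sketch. It is not the ruleset's own implementation: the function name `sector_percentage_errors` is hypothetical, and it assumes that a missing @vocabulary means the DAC vocabulary ("1") and that a lone sector with no @percentage passes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from decimal import Decimal


def sector_percentage_errors(activity):
    """Return the sector vocabularies whose percentages do not total 100."""
    by_vocab = defaultdict(list)
    for sector in activity.findall("sector"):
        # Assumption: a missing @vocabulary means the DAC vocabulary ("1").
        vocab = sector.get("vocabulary", "1")
        by_vocab[vocab].append(sector.get("percentage"))

    failing = []
    for vocab, pcts in by_vocab.items():
        # Assumption: a lone sector without a percentage counts as 100%.
        if len(pcts) == 1 and pcts[0] is None:
            continue
        # Decimal avoids float rounding, since @percentage may be a decimal.
        total = sum((Decimal(p) for p in pcts if p is not None), Decimal(0))
        if total != Decimal(100):
            failing.append(vocab)
    return sorted(failing)


example = ET.fromstring(
    '<iati-activity>'
    '<sector vocabulary="1" code="11220" percentage="60"/>'
    '<sector code="11230" percentage="30"/>'
    '<sector vocabulary="99" code="A1" percentage="50.5"/>'
    '<sector vocabulary="99" code="A2" percentage="49.5"/>'
    '</iati-activity>'
)
print(sector_percentage_errors(example))  # ['1']
```

Note that under these assumptions, several sectors in one vocabulary with no percentages at all are flagged as failing; whether that is the right reading of the standard is a separate question.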

Policy Markers
There is no rule covering the conditions on the Policy Marker codes.

The standard says:

Policy Significance code = 4 (Explicit primary objective) SHOULD ONLY be used in conjunction with Policy Marker code = 9 (Reproductive, Maternal, Newborn and Child Health)

There is no rule in the ruleset for this
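
To make the condition concrete, a check along these lines might look like the Python sketch below. The function name `policy_marker_errors` is hypothetical, and the sketch assumes the condition only applies to the OECD DAC vocabulary (taken here to be @vocabulary="1" or @vocabulary absent).

```python
import xml.etree.ElementTree as ET


def policy_marker_errors(activity):
    """Return the codes of policy markers that use significance 4 with a code other than 9."""
    errors = []
    for marker in activity.findall("policy-marker"):
        # Assumption: the condition only concerns the DAC vocabulary ("1" or absent).
        if marker.get("vocabulary", "1") != "1":
            continue
        if marker.get("significance") == "4" and marker.get("code") != "9":
            errors.append(marker.get("code"))
    return errors


example = ET.fromstring(
    '<iati-activity>'
    '<policy-marker code="1" significance="4"/>'
    '<policy-marker code="9" significance="4"/>'
    '<policy-marker code="2" significance="1"/>'
    '</iati-activity>'
)
print(policy_marker_errors(example))  # ['1']
```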

==================

There may be others too - so I wonder if there is chance to get a wider review of these rulesets…?


(Andy Lulham) #2

So, the same page also says:

When multiple sector are declared, then the @percentage values should sum to 100% for the specific iati-activity.

I.e. the “must” becomes a “should”. Also:

1.03

Where used, the @percentage attribute is now designated as a decimal value and no longer as a positive Integer

The “where used” here suggests @percentage isn’t required. But I don’t know what the rule should be when it isn’t provided. Indeed, there are examples on the page that don’t include a @percentage e.g. this one:

<sector vocabulary="99" vocabulary-uri="http://example.com/vocab.html" code="A1" />

I’m not trying to muddy the waters here – I’m just pointing out that the waters are muddy. The standard appears to leave things a bit open to interpretation. IIRC, my own code uses the following quite pragmatic interpretation:

  • If percentages for a given sector vocab are not provided, assume equal split (e.g. if there are two, assume 50% each)
  • If percentages for a given sector vocab are sometimes provided, assume they’re correct, and assume equal split of the remainder
  • If percentages for a given sector vocab are always provided, but their total is greater than 100, rescale (i.e. don’t treat percentages as percentages, but treat them instead as a proportion of their sum)
  • If percentages are sometimes provided, but their total is greater than 100, ignore them and assume equal split (since there’s no satisfactory way to work around this data error)
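
The four cases above can be sketched in Python roughly as follows. This is a reconstruction from the bullet points, not the actual code referred to; the function name `resolve_sector_shares` is hypothetical. The case where every percentage is provided but the total is below 100 is not covered by the list, so the sketch leaves those values unchanged (an assumption).

```python
def resolve_sector_shares(percentages):
    """Resolve the shares (in percent) for the sectors of one vocabulary.

    `percentages` holds one entry per sector: a number, or None if no
    @percentage was given.
    """
    n = len(percentages)
    if n == 0:
        return []
    given = [p for p in percentages if p is not None]
    missing = n - len(given)
    total = sum(given)

    if not given:
        # None provided: equal split.
        return [100.0 / n] * n
    if missing == 0:
        if total > 100:
            # Always provided but over 100: treat as proportions of their sum.
            return [100.0 * p / total for p in percentages]
        # Assumption: totals of 100 or less are left as published.
        return list(percentages)
    if total > 100:
        # Sometimes provided and over 100: ignore them, equal split.
        return [100.0 / n] * n
    # Sometimes provided: trust them, split the remainder equally.
    fill = (100.0 - total) / missing
    return [fill if p is None else p for p in percentages]


print(resolve_sector_shares([None, None]))          # [50.0, 50.0]
print(resolve_sector_shares([50, None, None]))      # [50, 25.0, 25.0]
print(resolve_sector_shares([100, 100]))            # [50.0, 50.0]
print(resolve_sector_shares([60, 60, None, None]))  # [25.0, 25.0, 25.0, 25.0]
```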

TL;DR: for the data user, life is complicated.


Anyway. Assuming the correct interpretation of this rule is the “MUST” one, I made a start at adding it to the standard ruleset. This fix adds a rule specifically for the DAC vocab (i.e. @vocabulary="1" or @vocabulary not present). Doing it for an arbitrary vocab would involve looping, which would constitute a bigger (though by no means insurmountable!) change.


(Matt Geddes) #3

Hi both,

I have also struggled with this when trying to use IATI data - as a result I would be in favour of adding a ‘must’ to the rule.

Alternatively, would it be possible/helpful to add a section to the standard on the ‘official approach to interpreting’, e.g. following the rules @andylolz uses? Publishers who did not give percentages adding to 100% could then be sure of how their data would be interpreted, and I would hope that all the platforms that publish IATI data would use the same approach. Otherwise we have the situation where different platforms could show different results for the same IATI data, which I think starts to really damage the standard.

Does anyone know what ruleset e.g. d-portal uses - is it the same as Andy’s?

I would prefer the ‘must add to 100 to be valid’ approach because when sector percentages are not given, it is incredibly unlikely to be an even split - and so IATI data ends up inaccurate and gets rejected at the country level by users who are aware of the individual projects. When I have done number crunching with an ‘assume equal split’ approach (because I agree that there is not a better one) it has often ended up with a (false) upward bias to small sectors that often get thrown into many projects e.g. PFM, gender, human rights - leading to wrong policy conclusions.

Does the same kind of issue apply in other places in the standard/rules e.g. in the percentage allocations to different countries for multi-country projects?


(Andy Lulham) #4

It appears to be the following:

  • if there’s no @percentage, assume it’s 100% (even if there are multiple sectors)
  • totals that don’t add to 100% are still shown
  • everything gets rescaled for the pie chart. I.e. if the total is greater or less than 100%, rescale.
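
For comparison, the behaviour described in these three points can be sketched in Python as below. This is an illustration of the observed behaviour only, not d-portal's actual code; the function name `rescaled_pie_shares` is hypothetical, and returning zeros when everything sums to 0 is an assumption.

```python
def rescaled_pie_shares(percentages):
    """Rescale sector percentages so that the pie chart always totals 100.

    `percentages` holds one entry per sector: a number, or None if no
    @percentage was given.
    """
    # A missing percentage is taken as 100, even with multiple sectors.
    raw = [100.0 if p is None else float(p) for p in percentages]
    total = sum(raw)
    if total == 0:
        # Assumption: nothing sensible to rescale, so return zeros.
        return [0.0] * len(raw)
    # Treat the raw values as ratios of their sum.
    return [100.0 * r / total for r in raw]


print(rescaled_pie_shares([None, None]))  # [50.0, 50.0]
print(rescaled_pie_shares([25, 25]))      # [50.0, 50.0]
print([round(s, 2) for s in rescaled_pie_shares([None, 50])])  # [66.67, 33.33]
```

The last call shows the side effect of the first point: a sector with no percentage ends up with twice the share of one published at 50%.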

For instance, here’s an activity in d-portal with 10 sectors, adding up to 1,000%:

ref.

Here’s one with 2 sectors adding up to 50%:

ref.

recipient-country and recipient-region have their own issues! But we should probably save that for a separate discussion.


(Matt Geddes) #5

thanks @andylolz for the detective work - I guess I can see the logic in the second approach - stick to the given numbers (but as you say, the pie chart should really only show 50% accounted for) - as for the first one - no comment!


(Andy Lulham) #6

Just to illustrate one bit that I didn’t provide an example for…

<sector vocabulary="DAC" code="11220"/>
<sector vocabulary="DAC" code="11230" percentage="50"/>

^^ In this case, we have one sector with a @percentage and one without. The one without defaults to 100%, so this gets rendered as:

ref.


(Andy Lulham) #7

I’m reminded that there was also relevant discussion on this topic during the proposal for a sector/@no-aggregation attribute, as part of the v2.03 upgrade.


(Steven Flower) #8

Thanks for the discussion and research on the sector issue @andylolz @matmaxgeds

I just want to circle back up to the original post I made:

I can appreciate this gets complicated, quickly - but I’m advocating for an inclusion of a rule in the rulesets, so people can at least use it. Of course, there will be exceptions and complications - but I also understand that is the purpose of rules in the rulesets: they are there for guidance - it’s not “essential” that every single activity pass them successfully (whereas schema validation is much more strict?)

If we continue to avoid mention of sectors/percentages in the rulesets, then I think we can have a very long thread on where they are not working, but no means to help people address it, systematically.

Therefore, I’d support the contribution from @andylolz.


(Matt Geddes) #9

@stevieflow sorry for the drag into a discussion - also very much in favour of the rule from my side


(Andy Lulham) #10

I also support adding this to the standard ruleset. But it would be great to firm up the docs so it’s clear.

I’ve now sent a fix for v2.03, both for DAC and other vocabs:

It was Python-only at first (hence the failing PHP tests), but it is now implemented in both PHP and Python.


(Steven Flower) #11

Thanks @matmaxgeds @andylolz

How about the Policy Marker rule @YohannaLoucheur @Herman (@andylolz - could you do similar for this one, maybe?) ?


(Herman van Loon) #12

Hi Steven,

I can appreciate this gets complicated, quickly - but I’m advocating for an inclusion of a rule in the rulesets, so people can at least use it. Of course, there will be exceptions and complications - but I also understand that is the purpose of rules in the rulesets: they are there for guidance - it’s not “essential” that every single activity pass them successfully (whereas schema validation is much more strict?)

I do not agree with this statement. An IATI rule is, in my opinion, applicable to each and every IATI activity. An IATI guideline is advice to IATI publishers, and does not need to apply to each activity.

With regard to sector percentages in the same vocabulary: if percentages are published, then they should add up to 100%. Since percentages are optional, there should imo be another rule:
either you publish a percentage for every sector, or you publish no percentages at all. So no mixed bags of some sectors having a percentage and some not.
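
A minimal Python sketch of these two rules, applied to the sectors of a single vocabulary, might look like this. The function name `all_or_none_problems` and the problem labels are hypothetical; percentages are taken as the raw attribute strings, or None where the attribute is absent.

```python
from decimal import Decimal


def all_or_none_problems(percentages):
    """Check the sectors of one vocabulary against the two proposed rules."""
    given = [Decimal(p) for p in percentages if p is not None]
    problems = []
    # Rule 1: percentages on every sector, or on none of them.
    if given and len(given) < len(percentages):
        problems.append("mixed")
    # Rule 2: if percentages are published, they add up to 100.
    if given and sum(given) != Decimal(100):
        problems.append("total")
    return problems


print(all_or_none_problems(["50", "50"]))  # []
print(all_or_none_problems([None, None]))  # []
print(all_or_none_problems(["50", None]))  # ['mixed', 'total']
print(all_or_none_problems(["70", "70"]))  # ['total']
```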


(Steven Flower) #13

Fair point @Herman

That’s useful, when put that way. I guess the question is: have we any guidance that is currently a rule? Or vice versa?!


(shi) #14

Thanks, @andylolz - yup, that’s what we’ve done.

If you don’t specify a number, we treat it as 100%. This way, when multiple sectors are not given a percentage, we are still able to rescale the pie chart and they all get an equal share.

We basically treat the numbers as ratios.

Once we’ve got all of the sectors, we add them up and if it comes to more or less than 100%, we adjust it so it adds up to 100% by scaling; i.e. if it adds up to 200%, we halve it to fit.

If the numbers don’t add up to 100%, we attribute this to a data quality issue.

The original numbers are displayed in SAVi.

In d-portal, yes. We can’t split the money unless it all adds up to 100% so we always make sure it adds up to that.

If people are publishing the right numbers, this would not be a problem but when they don’t, we make a ‘best guess’. Otherwise, we will not be able to include that activity in the portal.

Ultimately, these graphs are not just visual representations of the data, we also use these calculated percentages for the rest of d-portal; ie. all the sector and publisher tables.

This means the whole site is dependent on the quality of the data that has been published.