(posting this in Data Quality to begin with…)

The IATI Rulesets are a set of conditions that any IATI publisher would do well to consider when publishing.

These are not enforced at the schema level; they are additional to it. Think of these rules as the bits of “guidance” you often see in the IATI schema, which are then represented as rules to help us use them! They represent logic (e.g. start date before end date!), conditions (percentages for multiple countries should add up to 100%) and other things!

These rulesets are very important, and have long been available around IATI. We know that they are not currently used in the “official IATI validator”, but are used by others. At Open Data Services, we use the rulesets in our tools, for example.

Who else uses the rulesets? Rolf Kleef, Roderick Besseling, John Adams maybe?

Looking at the current IATI rules, and in discussion with others such as Yohanna Loucheur, Herman van Loon, Rob Redpath (IATI Secretariat) and Andy Lulham, there seem to be a couple of omissions, which I would suggest we incorporate:

Sectors
There is no rule stating that multiple sectors from the same vocabulary should add up to 100%. The standard text says:

All reported sectors from the same vocabulary MUST add up to 100.

There is no rule in the ruleset for this.
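As a rough illustration of what such a rule might check, here is a minimal Python sketch. It assumes the sectors of one activity have already been parsed into `(vocabulary, percentage)` pairs, treats a missing `@vocabulary` as `"1"` (the DAC vocabulary), skips any vocabulary where some sector has no `@percentage`, and uses an arbitrary 0.01 tolerance. The function name and all of these choices are hypothetical, not taken from the IATI-Rulesets code.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Optional


def sectors_not_summing_to_100(
    sectors: Iterable[tuple[Optional[str], Optional[float]]],
    tolerance: float = 0.01,
) -> dict[str, float]:
    """Return {vocabulary: total} for each vocabulary whose percentages don't sum to 100.

    Hypothetical helper. A missing vocabulary is treated as "1" (DAC);
    vocabularies where any sector lacks a percentage are skipped.
    """
    totals: dict[str, float] = defaultdict(float)
    incomplete: set[str] = set()
    for vocab, pct in sectors:
        key = vocab or "1"
        if pct is None:
            incomplete.add(key)
        else:
            totals[key] += pct
    return {
        vocab: total
        for vocab, total in totals.items()
        if vocab not in incomplete and abs(total - 100) > tolerance
    }
```

For example, `sectors_not_summing_to_100([("1", 60.0), ("1", 30.0), ("99", 100.0)])` flags vocabulary `"1"` with a total of 90, while the `"99"` sector passes.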

Policy Markers
There is no rule covering the conditions on Policy Marker codes.

The standard says:

Policy Significance code = 4 (Explicit primary objective) SHOULD ONLY be used in conjunction with Policy Marker code = 9 (Reproductive, Maternal, Newborn and Child Health)

There is no rule in the ruleset for this.
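A minimal sketch of how this condition could be checked, using Python's standard `xml.etree.ElementTree` on a single `iati-activity`. The function name is hypothetical, and the assumption that a missing `@vocabulary` means `"1"` (OECD DAC CRS), with other vocabularies left out of scope, is mine rather than the ruleset's.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def significance_4_violations(activity: ET.Element) -> list[str]:
    """Return codes of policy markers that use significance 4 with a code other than 9.

    Hypothetical helper. Assumes a missing @vocabulary means "1" (OECD DAC CRS)
    and only checks markers in that vocabulary.
    """
    violations = []
    for marker in activity.findall("policy-marker"):
        # Assumption: markers from other vocabularies are out of scope here.
        if marker.get("vocabulary", "1") != "1":
            continue
        code = marker.get("code", "")
        if marker.get("significance") == "4" and code != "9":
            violations.append(code)
    return violations


activity = ET.fromstring(
    '<iati-activity>'
    '<policy-marker vocabulary="1" code="9" significance="4"/>'
    '<policy-marker vocabulary="1" code="2" significance="4"/>'
    '</iati-activity>'
)
print(significance_4_violations(activity))  # ['2']
```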

==================

There may be others too - so I wonder if there is a chance to get a wider review of these rulesets…?

Comments (14)

Andy Lulham

stevieflow:

The standard says:

Policy Significance code = 4 (Explicit primary objective) SHOULD ONLY be used in conjunction with Policy Marker code = 9 (Reproductive, Maternal, Newborn and Child Health)

So, the same page also says:

When multiple sector are declared, then the @percentage values should sum to 100% for the specific iati-activity.

I.e. the “must” becomes a “should”. Also:

1.03

Where used, the @percentage attribute is now designated as a decimal value and no longer as a positive Integer

The “where used” here suggests @percentage isn’t required. But I don’t know what the rule should be when it isn’t provided. Indeed, there are examples on the page that don’t include a @percentage e.g. this one:

<sector vocabulary="99" vocabulary-uri="http://example.com/vocab.html" code="A1" />

I’m not trying to muddy the waters here – I’m just pointing out that the waters are muddy. The standard appears to leave things a bit open to interpretation. IIRC, my own code uses the following quite pragmatic interpretation:

  • If percentages for a given sector vocab are not provided, assume equal split (e.g. if there are two, assume 50% each)
  • If percentages for a given sector vocab are sometimes provided, assume they’re correct, and assume equal split of the remainder
  • If percentages for a given sector vocab are always provided, but their total is greater than 100, rescale (i.e. don’t treat percentages as percentages, but treat them instead as a proportion of their sum)
  • If percentages are sometimes provided, but their total is greater than 100, ignore them and assume equal split (since there’s no satisfactory way to work around this data error)
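The four cases above can be sketched in Python for the sectors of a single vocabulary. This is a hypothetical reconstruction from the description only, not Andy's actual code: `None` stands for a sector without `@percentage`, and the one case the list doesn't cover (all percentages given, totalling less than 100) is left unchanged.

```python
from __future__ import annotations

from typing import Optional


def allocate_percentages(pcts: list[Optional[float]]) -> list[float]:
    """Apply the pragmatic interpretation described above to one vocabulary's sectors."""
    n = len(pcts)
    if n == 0:
        return []
    given = [p for p in pcts if p is not None]
    total = sum(given)
    if not given:
        # No percentages at all: assume an equal split.
        return [100 / n] * n
    if len(given) == n:
        # All percentages given: treat them as proportions only if they exceed 100.
        if total > 100:
            return [p * 100 / total for p in given]
        return list(given)
    if total > 100:
        # Some given but already over 100: no sensible fix, so fall back to an equal split.
        return [100 / n] * n
    # Some given and within 100: trust them and split the remainder equally.
    share = (100 - total) / (n - len(given))
    return [p if p is not None else share for p in pcts]
```

For instance, `allocate_percentages([60.0, None, None])` keeps the 60 and gives the other two sectors 20 each.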

TL;DR: for the data user, life is complicated.

Anyway. Assuming the correct interpretation of this rule is the “MUST” one, I made a start at adding it to the standard ruleset. This fix adds a rule specifically for the DAC vocab (i.e. @vocabulary="1" or @vocabulary not present). Doing it for an arbitrary vocab would involve looping, which would constitute a bigger (though by no means insurmountable!) change.

matmaxgeds

Hi both,

I have also struggled with this when trying to use IATI data - as a result I would be in favour of adding a ‘must’ to the rule.

Alternatively, would it be possible/helpful to add to the standard a section on an ‘official approach to interpreting’, e.g. following the rules Andy Lulham uses? Publishers who did not give percentages adding up to 100% could then be sure of how their data would be interpreted, and I would hope that all the platforms that publish IATI data would also use the same approach. Otherwise we have the situation where different platforms could show different results for the same IATI data, which I think starts to really damage the standard.

Does anyone know what ruleset e.g. d-portal uses - is it the same as Andy’s?

I would prefer the ‘must add to 100 to be valid’ approach because when sector percentages are not given, it is incredibly unlikely to be an even split, and so IATI data ends up inaccurate and gets rejected at the country level by users who are aware of the individual projects. When I have done number crunching with an ‘assume equal split’ approach (because I agree that there is not a better one), it has often ended up with a (false) upward bias towards small sectors that often get thrown into many projects, e.g. PFM, gender, human rights, leading to wrong policy conclusions.

Does the same kind of issue apply in other places in the standard/rules e.g. in the percentage allocations to different countries for multi-country projects?

Andy Lulham

matmaxgeds:

Does anyone know what ruleset e.g. d-portal uses - is it the same as Andy’s?

It appears to be the following:

  • if there’s no @percentage, assume it’s 100% (even if there are multiple sectors)
  • totals that don’t add to 100% are still shown
  • everything gets rescaled for the pie chart. I.e. if the total is greater or less than 100%, rescale.
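For comparison, a minimal Python sketch of the d-portal behaviour as described above, again for one vocabulary's sectors. It is based only on this description, not on d-portal's source; the function name is hypothetical, and returning zero shares when every sector is explicitly 0% is an assumption.

```python
from __future__ import annotations

from typing import Optional


def dportal_style_shares(pcts: list[Optional[float]]) -> list[float]:
    """Treat a missing @percentage as 100, then rescale so the shares total 100."""
    values = [100.0 if p is None else p for p in pcts]
    total = sum(values)
    if total == 0:
        # Assumption: nothing meaningful to rescale, so report zero shares.
        return [0.0] * len(values)
    return [v * 100 / total for v in values]


# Ten sectors at 100% each (1,000% in total) become ten shares of 10%.
print(dportal_style_shares([100.0] * 10))
# Two sectors with no @percentage are treated as 100% each and split 50/50.
print(dportal_style_shares([None, None]))  # [50.0, 50.0]
```

Under this approach, a sector with no `@percentage` carries the same weight as one reported at 100%.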

For instance, here’s an activity in d-portal with 10 sectors, adding up to 1,000%:

 

[Screenshot: Screen Shot 2018-07-17 at 22.11.33.png]

Here’s one with 2 sectors adding up to 50%:

 

[Screenshot: Screen Shot 2018-07-17 at 22.27.54.png]

matmaxgeds:

Does the same kind of issue apply in other places in the standard/rules e.g. in the percentage allocations to different countries for multi-country projects?

recipient-country and recipient-region have their own issues! But we should probably save that for a separate discussion.

matmaxgeds

Thanks Andy Lulham for the detective work. I guess I can see the logic in the second approach: stick to the given numbers (but as you say, the pie chart should really only show 50% accounted for). As for the first one, no comment!

shi

andylolz:

Does anyone know what ruleset e.g. d-portal uses - is it the same as Andy’s?

It appears to be the following:

Thanks, Andy Lulham - yup, that’s what we’ve done.

If you don’t specify a number, we treat it as 100%. That way, when multiple sectors are not given a percentage, we are still able to rescale the pie chart and they all get an equal share.

We basically treat the numbers as ratios.

Once we’ve got all of the sectors, we add them up, and if the total comes to more or less than 100%, we scale everything so it adds up to 100%; i.e. if it adds up to 200%, we halve it to fit.

If the numbers don’t add up to 100%, we attribute this to a data quality issue.

The original numbers are displayed in SAVi.

matmaxgeds:

Does the same kind of issue apply in other places in the standard/rules e.g. in the percentage allocations to different countries for multi-country projects?

In d-portal, yes. We can’t split the money unless it all adds up to 100% so we always make sure it adds up to that.

If people are publishing the right numbers, this would not be a problem but when they don’t, we make a ‘best guess’. Otherwise, we will not be able to include that activity in the portal.

Ultimately, these graphs are not just visual representations of the data; we also use these calculated percentages for the rest of d-portal, i.e. all the sector and publisher tables.

This means the whole site is dependent on the quality of the data that has been published.

Steven Flower

Thanks for the discussion and research on the sector issue, Andy Lulham and matmaxgeds.

I just want to circle back up to the original post I made:

stevieflow:

There is no rule in terms of multiple sectors from the same vocabulary should add to 100%.

I can appreciate this gets complicated, quickly - but I’m advocating for an inclusion of a rule in the rulesets, so people can at least use it. Of course, there will be exceptions and complications - but I also understand that is the purpose of rules in the rulesets: they are there for guidance - it’s not “essential” that every single activity pass them successfully (whereas schema validation is much more strict?)

If we continue to avoid mention of sectors/percentages in the rulesets, then I think we can have a very long thread on where they are not working, but no means to help people address it, systematically.

Therefore, I’d support the contribution from Andy Lulham.

Andy Lulham

I also support adding this to the standard ruleset. But it would be great to firm up the docs so it’s clear.

I’ve now sent a fix for v2.03, both for DAC and other vocabs:
github.com/IATI/IATI-Rulesets: “Add sector percentage rule (v2.03)” by andylolz, 08:56PM, 18 Jul 18 UTC (3 commits, 10 files changed, 116 additions and 2 deletions)

It was Python-only at first (hence the failing PHP tests), but it is now implemented in both PHP and Python.

Steven Flower

Thanks, matmaxgeds and Andy Lulham.

How about the Policy Marker rule, Yohanna Loucheur and Herman van Loon? (Andy Lulham, could you maybe do something similar for this one?)

Herman van Loon

Hi Steven,

stevieflow:

I can appreciate this gets complicated, quickly - but I’m advocating for an inclusion of a rule in the rulesets, so people can at least use it. Of course, there will be exceptions and complications - but I also understand that is the purpose of rules in the rulesets: they are there for guidance - it’s not “essential” that every single activity pass them successfully (whereas schema validation is much more strict?)

I do not agree with this statement. An IATI rule is, in my opinion, applicable to each and every IATI activity. An IATI guideline is advice to IATI publishers, but does not need to apply to every activity.

With regard to sector percentages in the same vocabulary: if percentages are published, then they should add up to 100%. Since percentages are optional, there should in my opinion be another rule: either you publish a percentage for every sector, or you publish no percentages at all. So no mixed bag of some sectors having a percentage and some not.
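These two conditions could be combined into one check per vocabulary, sketched below in Python. The function name, the message wording and the 0.01 tolerance are hypothetical; `None` marks a sector without `@percentage`.

```python
from __future__ import annotations

from typing import Optional


def check_sector_percentages(
    pcts: list[Optional[float]], tolerance: float = 0.01
) -> list[str]:
    """Check one vocabulary's sectors: percentages all-or-none, and summing to 100 if present."""
    given = [p for p in pcts if p is not None]
    if not given:
        return []  # publishing no percentages at all is allowed
    if len(given) != len(pcts):
        return ["some sectors have a percentage and some do not"]
    total = sum(given)
    if abs(total - 100) > tolerance:
        return [f"percentages add up to {total:g}, not 100"]
    return []


print(check_sector_percentages([50.0, 40.0]))  # ['percentages add up to 90, not 100']
```

Checking the all-or-none condition first means a mixed set is reported as such, rather than as a misleading partial total.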

Steven Flower

Fair point, Herman van Loon.

Herman:

I do not agree with this statement. An IATI rule is, in my opinion, applicable to each and every IATI activity. An IATI guideline is advice to IATI publishers, but does not need to apply to every activity.

That’s useful, when put that way. I guess the question is: have we any guidance that is currently a rule? Or vice versa?!

