Standards Day Proposal: Accommodating non-statistical, secondary sectors

Key Facts

Proposal Authors: Open Ag Funding Partnership
Standard change type: schema (and guidelines indirectly)
Elements affected: iati-activities/iati-activity/sector
Change Type: backwardly compatible
Proposed IATI Version: 2.03
Issues addressed:
- non-statistical activity classification
- activity-search
- SDGs

TL;DR

Let’s add an @aggregation-status attribute to the iati-activity/sector element so that we can put in tags for discovery, categorise in a non-statistical way, and make implementing the SDGs more straightforward.

Please note that this proposal has arisen from discussions that have taken place at the TAG. We came to the TAG with the aim of starting discussions around this topic and evangelising our own user needs. There, we found that the humanitarian sector, and publishers trying to incorporate the SDGs into their data, have similar concerns, and so the following proposal has been put together for consideration.

This is not about changing or replacing the OECD DAC CRS Purpose Codes!

This isn’t the only way of addressing the user needs below, and we’re open to other options!

The user needs

1. Activity search

The proposed change is not sector specific, but informed by a sector-specific use case. Let’s consider this, and some other use cases:

As agricultural users and publishers of IATI we want to make agricultural investment data more findable, without making IATI less usable or clear, and with the minimum impact on the standard, to facilitate and promote more agricultural publication and data use.

Now let’s abstract this:

As a sector-specific user (humanitarian, health, nutrition, education, etc.), I want a way of finding relevant IATI activities which are categorised to a level of detail which fits my sector’s interests, so that I can access and use investment data which is relevant to it.

2. Non-statistical / SDG classification

Proviso from the author: I’m not an SDG expert, but I have gotten the impression that others are asking similar questions about classification, and I hope this section starts a conversation at least.

Another potential scenario is that a publisher wants to associate their activity with several SDGs, but there is no sensible way to allocate percentages, because the SDGs aren’t mutually exclusive and one activity might contain interventions which simultaneously address multiple goals.

In this instance, it would still be valuable for data users to know that an activity is relevant to an SDG. They would then know to look at that activity’s transactions for statistical analysis of the goals (as the SDGs could be used there also), or at its results for analysis of the activity’s outputs relative to the SDG indicators.

The current obstacle

The sector element assumes that classification is statistical, i.e. that aggregation works with the sectors.

For instance, if I publish an activity that spans two sectors, I am expected to include a percentage split, so that the funds can be split amongst the sectors and, for instance, a pie chart can be drawn. The percentages of all codes within a given vocabulary must add up to 100%.

This doesn’t play nice with the use-cases outlined above.
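The 100% rule can be illustrated with a minimal Python sketch. The XML fragments, the function names, the treatment of a missing percentage as 100, and the default vocabulary of "1" are all assumptions made for illustration; this is not taken from any IATI tooling.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment: a publisher forces three tags into a percentage split.
SPLIT_XML = """
<iati-activity>
  <sector vocabulary="99" code="c_10967" percentage="40"/>
  <sector vocabulary="99" code="c_37836" percentage="30"/>
  <sector vocabulary="99" code="c_24935" percentage="30"/>
</iati-activity>
"""

# Hypothetical fragment: the same tags, each honestly describing the whole activity.
OVERLAPPING_XML = """
<iati-activity>
  <sector vocabulary="99" code="c_10967" percentage="100"/>
  <sector vocabulary="99" code="c_37836" percentage="100"/>
  <sector vocabulary="99" code="c_24935" percentage="100"/>
</iati-activity>
"""

def percentage_totals(activity):
    """Sum the declared percentages for each sector vocabulary."""
    totals = defaultdict(float)
    for sector in activity.findall("sector"):
        vocabulary = sector.get("vocabulary", "1")  # assumed default vocabulary
        # Assumption: a sector with no percentage counts as 100.
        totals[vocabulary] += float(sector.get("percentage", "100"))
    return dict(totals)

def vocabularies_not_summing_to_100(activity, tolerance=0.01):
    """Return the vocabularies whose percentages do not add up to 100."""
    totals = percentage_totals(activity)
    return sorted(v for v, t in totals.items() if abs(t - 100.0) > tolerance)

split_problems = vocabularies_not_summing_to_100(ET.fromstring(SPLIT_XML))
overlap_problems = vocabularies_not_summing_to_100(ET.fromstring(OVERLAPPING_XML))
print(split_problems)    # []
print(overlap_problems)  # ['99']
```

The first fragment passes the check only because the tags were given an arbitrary split; the second describes the activity more faithfully but breaks the rule.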

The proposal

The proposal is to add an @aggregation-status attribute to secondary sector elements. It defaults to true, but when declared as false it indicates that this instance of a sector should not be included in aggregations. This would work in the same way as result/@aggregation-status.

Rules and guidance should reflect that:

- This only applies to non-primary sectors
- This absolutely does not apply to DAC CRS purpose codes
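As a rough sketch of how a data user might honour the proposed attribute, the Python below skips any sector that opts out of aggregation when splitting a total. The sample XML, the function names, and the choice to accept both "false" and "0" as boolean false are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical activity: two statistical sectors plus one non-statistical tag.
SAMPLE_XML = """
<iati-activity>
  <sector vocabulary="99" code="c_10967" percentage="60"/>
  <sector vocabulary="99" code="c_37836" percentage="40"/>
  <sector vocabulary="99" code="c_24935" aggregation-status="false"/>
</iati-activity>
"""

def is_aggregated(sector):
    """True unless the sector explicitly opts out; the attribute defaults to true."""
    value = sector.get("aggregation-status", "true").strip().lower()
    return value not in ("false", "0")

def allocate(activity, total):
    """Split `total` across the sectors that take part in aggregation."""
    shares = {}
    for sector in activity.findall("sector"):
        if is_aggregated(sector):
            # Assumption: a missing percentage counts as 100.
            percentage = float(sector.get("percentage", "100"))
            shares[sector.get("code")] = total * percentage / 100
    return shares

def tags(activity):
    """Codes that classify the activity without taking part in aggregation."""
    return [s.get("code") for s in activity.findall("sector") if not is_aggregated(s)]

activity = ET.fromstring(SAMPLE_XML)
shares = allocate(activity, 1000.0)
tag_codes = tags(activity)
print(shares)     # {'c_10967': 600.0, 'c_37836': 400.0}
print(tag_codes)  # ['c_24935']
```

Existing tools that ignore the attribute would still see every sector element, which is why the rules and guidance matter as much as the schema change.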

Status Quo

<sector vocabulary="1" code="31140">
    <narrative>Irrigation And Drainage</narrative>
</sector>
<sector vocabulary="99" vocabulary-uri="http://example.vocabulary.com/tag-registry/" code="c_10967" percentage="40">
    <narrative xml:lang="en">food security</narrative>
</sector>
<sector vocabulary="99" vocabulary-uri="http://example.vocabulary.com/tag-registry/" code="c_37836" percentage="30">
    <narrative xml:lang="en">capacity building</narrative>
</sector>
<sector vocabulary="99" vocabulary-uri="http://example.vocabulary.com/tag-registry/" code="c_24935" percentage="30">
    <narrative xml:lang="en">off season cultivation</narrative>
</sector>

Proposed

<sector vocabulary="1" code="31140">
    <narrative>Irrigation And Drainage</narrative>
</sector>
<sector vocabulary="99" vocabulary-uri="http://example.vocabulary.com/tag-registry/" code="c_10967" aggregation-status="false">
    <narrative xml:lang="en">food security</narrative>
</sector>
<sector vocabulary="99" vocabulary-uri="http://example.vocabulary.com/tag-registry/" code="c_37836" aggregation-status="false">
    <narrative xml:lang="en">capacity building</narrative>
</sector>
<sector vocabulary="99" vocabulary-uri="http://example.vocabulary.com/tag-registry/" code="c_24935" aggregation-status="false">
    <narrative xml:lang="en">off season cultivation</narrative>
</sector>
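To connect this back to the activity search use case, here is a hedged Python sketch of tag-based discovery over data published in the proposed form. The activity identifiers, the sample file contents and the find_by_tag helper are hypothetical.

```python
import xml.etree.ElementTree as ET

TAG_REGISTRY = "http://example.vocabulary.com/tag-registry/"

# Hypothetical file with two activities tagged in the proposed way.
ACTIVITIES_XML = f"""
<iati-activities>
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-1</iati-identifier>
    <sector vocabulary="99" vocabulary-uri="{TAG_REGISTRY}" code="c_10967" aggregation-status="false"/>
    <sector vocabulary="99" vocabulary-uri="{TAG_REGISTRY}" code="c_24935" aggregation-status="false"/>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XX-EXAMPLE-2</iati-identifier>
    <sector vocabulary="99" vocabulary-uri="{TAG_REGISTRY}" code="c_37836" aggregation-status="false"/>
  </iati-activity>
</iati-activities>
"""

def find_by_tag(root, vocabulary_uri, code):
    """Return identifiers of activities carrying the given tag code."""
    matches = []
    for activity in root.findall("iati-activity"):
        for sector in activity.findall("sector"):
            if sector.get("vocabulary-uri") == vocabulary_uri and sector.get("code") == code:
                matches.append(activity.findtext("iati-identifier"))
                break  # one match per activity is enough
    return matches

root = ET.fromstring(ACTIVITIES_XML)
food_security = find_by_tag(root, TAG_REGISTRY, "c_10967")
print(food_security)  # ['XX-EXAMPLE-1']
```

Because the tags carry no percentages, an activity can be found under every concept it is relevant to without distorting any statistical breakdown.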

Arguments against

This changes the way people will think about sectors

Outline: tool builders and data users currently assume that the percentages of any sector elements of the same vocabulary will add up to 100%. This change would break that assumption and cause inconvenience or confusion.

Initial response: most changes which would allow for the above use cases would have some impact on current users. This is really an issue of priorities.

Moral hazard for publishers in their primary sector code?

Outline: The schema doesn’t allow for one rule to be applied to a primary sector code and another to a secondary one, so this change might allow publishers to put this aggregation-status flag on their DAC purpose codes, resulting in worse quality data.

Initial response: The schema already allows for bad or misleading data, and we have to assume that publishers adhere to rules and guidelines. We shouldn’t constrain the capability of all publishers because we don’t trust some of them. Instead, we should introduce more robust validation and more immediate feedback mechanisms.
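One possible shape for such a validation rule, sketched in Python under stated assumptions: the set of vocabulary codes treated as DAC ("1", "2" and the older "DAC" style used in some examples), the default vocabulary, and the warning wording are all illustrative, not part of any existing validator.

```python
import xml.etree.ElementTree as ET

# Assumed set of vocabulary codes that identify DAC purpose codes.
DAC_VOCABULARIES = {"1", "2", "DAC"}

# Hypothetical activity in which the flag has been misapplied to a DAC code.
SUSPECT_XML = """
<iati-activity>
  <sector vocabulary="1" code="31140" aggregation-status="false"/>
  <sector vocabulary="99" code="c_10967" aggregation-status="false"/>
</iati-activity>
"""

def misuse_warnings(activity):
    """Flag DAC purpose codes that have been opted out of aggregation."""
    warnings = []
    for sector in activity.findall("sector"):
        vocabulary = sector.get("vocabulary", "1")  # assumed default vocabulary
        status = sector.get("aggregation-status", "true").strip().lower()
        if vocabulary in DAC_VOCABULARIES and status in ("false", "0"):
            code = sector.get("code")
            warnings.append(f"sector {code}: aggregation-status must not be false on a DAC purpose code")
    return warnings

warnings = misuse_warnings(ET.fromstring(SUSPECT_XML))
for message in warnings:
    print(message)
```

A check like this could run at publication time, giving the immediate feedback mentioned above without the schema itself having to distinguish primary from secondary sectors.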

Alternatives considered or suggested

Add a new element, i.e. sector-tag, which uses the same vocabularies as <sector> but acts as a content tag

This is potentially a purer solution, but it requires the addition of a new element to the standard, for which there is little appetite, and it would require data users to deal with another element instead of just another attribute.

This isn’t a major drawback though, so this option could serve as an alternative if there is consensus that the solution proposed above is untenable.

Using policy-marker

This would be misleading for several of the anticipated use cases, particularly for classifications which deal with concepts like crops, natural disasters, or sector purposes which can’t be summed but share a codelist.

In Sum

Our aim is to start a conversation around this, and we’re not wedded to the proposed fix. We want it to be clear that there are legitimate needs for a non-statistical classification which links out to codelists.
