Modify definition of secondary publisher (included 2.03)

activity_standard
2-03_2ndry_publishers

(IATI Technical Team) #1

This proposal is part of the 2.03 upgrade process, please comment by replying below.

Standard
Activity

Schema Object
reporting-org/@secondary-publisher

Type of Change
Schema modification

Issue
The concept of a “secondary publisher” was introduced into the standard to allow for organisations that collect data on aid activities from a number of actors (such as UN OCHA’s Financial Tracking Service, an Aid Information Management System, or bodies representing NGOs or foundations) to publish this data to IATI. Much, but not all, of this data may already be published to IATI by the primary source and so the @secondary-publisher flag is a warning to users to understand this context. This was never intended to cover situations where an agency (such as Swedish SIDA) officially publishes data on behalf of another related institution (such as the Swedish Ministry of Foreign Affairs).

Proposal
Modify the definition of reporting-org/@secondary-publisher

  • From: “A flag indicating that the reporting organisation is a secondary publisher: publishing data for which it is not directly responsible. This flag must not be reported by primary source publishers.”
  • To: “A flag indicating that the reporting organisation of this activity is acting as a secondary publisher. A secondary publisher is one that reproduces data on the activities of an organisation for which it is not directly responsible. This does not include a publisher officially assigned as a proxy to report on behalf of another.”
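To make the flag concrete, here is a minimal sketch of how a data user might read it. It assumes the attribute is named `secondary-reporter` in the published schema (as pointed out later in this thread) and that it is an XML boolean; the identifiers and organisation names in the sample are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: identifiers and organisations are invented.
SAMPLE = """
<iati-activities version="2.03">
  <iati-activity>
    <iati-identifier>XX-AGGREGATOR-A1</iati-identifier>
    <reporting-org ref="XX-AGGREGATOR" type="40" secondary-reporter="1">
      <narrative>Example aggregator</narrative>
    </reporting-org>
  </iati-activity>
  <iati-activity>
    <iati-identifier>XX-DONOR-B2</iati-identifier>
    <reporting-org ref="XX-DONOR" type="10">
      <narrative>Example donor</narrative>
    </reporting-org>
  </iati-activity>
</iati-activities>
"""


def is_secondary(activity):
    """Return True when reporting-org carries a true secondary-reporter flag."""
    org = activity.find("reporting-org")
    if org is None:
        return False
    # Assumption: an XML boolean ("1"/"true" or "0"/"false"); absent means false.
    return org.get("secondary-reporter", "0").strip().lower() in {"1", "true"}


root = ET.fromstring(SAMPLE)
flags = {
    a.findtext("iati-identifier"): is_secondary(a)
    for a in root.iter("iati-activity")
}
print(flags)
```

Under the proposed wording, a proxy such as an agency mandated to report for a ministry would leave the flag unset, so it would show up here like the second activity.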

Standards Day
There was confusion about what is meant by the term ‘secondary publisher’. The use of the term ‘proxy’ was suggested to clarify cases where publishers report ‘officially’ on behalf of another organisation (e.g. Swedish SIDA and the Ministry of Foreign Affairs).

Links


Secondary Publishers: A Manifesto! ...and some honest questions
(Herman van Loon) #2

The use case described here for a secondary publisher assumes that the IATI data may already have been published by the primary publisher. If that is the case, why republish the IATI data again, conflicting with the ‘Publish once, use often’ rule? Couldn’t this be solved by introducing a rule that secondary publishers ONLY publish activities which have NOT been published by the primary publisher? In this way, the uniqueness of IATI activity identifiers can also be guaranteed.


(Graeme Jones) #3

Any secondary publisher that commits significant unfunded effort to surfacing a primary publisher’s open data as IATI datasets would need to reconsider that effort. A primary publisher could simply “adopt” the datasets (download, retag and upload) without acknowledgement or engagement, then not upload any new datasets, or stop at the next budget or election. That would simply marginalise the secondary publisher, with the net result that its efforts stop.


(IATI Technical Team) #4

This topic has been included for consideration in the formal 2.03 proposal.


(IATI Technical Team) #7

Notes from consultation calls w/c 3rd July

Discussion:
There were some questions around double counting; the difference between a ‘secondary publisher’ and a ‘proxy’ was explained.
There was agreement that more use cases would be good, in addition to a clearer definition.
IATI tech team to provide more examples of secondary publishers and more use cases showing what information is being published by existing secondary publishers.

Outcomes:
The proposal was reviewed by those on the call and there was no objection from the group.


(Yohanna Loucheur) #8

According to yesterday’s update, this modification is listed as having consensus.

This status is not clear to me from the above. If people asked for more examples of secondary publishers and use cases, does it not mean the proposal is not fully supported (or at least fully understood)?


(Herman van Loon) #9

I agree with @YohannaLoucheur. It seems no harm is done if we take some more time to find out what the business case is.

If someone is of the opinion that harm is done and explains why, we might have our business case.


(Bill Anderson) #10

I think the use case is clear.

Swedish SIDA publishes another institution’s activities but is NOT a secondary publisher - because it is mandated to publish the Ministry of Foreign Affairs activities as primary data. The current definition makes no distinction between this case, and publishers such as FTS or AidData.


(Herman van Loon) #11

Ok, from that perspective the definition change is fine with me.

For my understanding: does the definition of secondary publisher mean that when an activity is published by a secondary publisher, it has already been published by the primary publisher AND that a different IATI identifier is used by the secondary publisher than by the primary publisher?

I cannot find any guidance in the standard about this topic, but maybe I am missing something.


(Bill Anderson) #12

No.

  • OCHA’s FTS publishes ALL its data irrespective of whether the primary source publishes to IATI.
  • The US Foundation Center has recently published “$4.3 billion worth of grants from nearly 1,900 funders to more than 3,000 organizations around the world”, but “to avoid duplication of data on the IATI Registry, we have removed funders already publishing to IATI from our IATI data”

Yes

NB that on Standards Day there was another proposal …

Add attribute reporting-org/@secondary-unique
“A flag indicating that this activity, reported by a secondary reporter, is not reported to IATI by a primary publisher and can therefore be expected to be unique.”

… which wasn’t taken forward.


(Herman van Loon) #13

@bill_anderson Maybe not for the 2.03 discussion, but would it be an idea to use exactly the same IATI identifier as the primary publisher, when you are republishing an activity as a secondary publisher?

Now there is no way to find out that the same activity is published twice with different IATI identifiers, leading to double counting. Doesn’t that violate a core IATI principle: publish once, use often? When using the same IATI identifier, you can at least identify that an activity is a duplicate.


(Yohanna Loucheur) #14

Just to clarify, I assume you mean using the same Activity ID as the primary publisher?

Would support this 1,000%. In fact, I naively assumed this was the case… Not using the same Activity ID should be a cardinal sin of republishing.


(Andy Lulham) #15

Reusing the same IATI identifier, albeit for the same (republished) activity, breaks a rule in the standard ruleset:

It MUST be globally unique among all activities published through the IATI Registry

I’d suggest instead a new RelatedActivityType. But I’d also agree that this is out of scope for 2.03 discussions.
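As a rough illustration of the uniqueness rule quoted above, the following sketch flags identifiers reported by more than one organisation. The function name and all organisation references and identifiers are hypothetical; a real check would read them from published datasets.

```python
from collections import defaultdict


def find_shared_identifiers(records):
    """Return identifiers used by more than one reporting org.

    `records` is an iterable of (reporting_org_ref, iati_identifier) pairs.
    """
    seen = defaultdict(set)
    for reporting_org, identifier in records:
        seen[identifier].add(reporting_org)
    # Keep only identifiers that are not unique to a single publisher.
    return {i: orgs for i, orgs in seen.items() if len(orgs) > 1}


# Hypothetical data: a republisher reusing the primary publisher's identifier.
records = [
    ("XX-DONOR", "XX-DONOR-001"),
    ("XX-REPUBLISHER", "XX-DONOR-001"),
    ("XX-REPUBLISHER", "XX-REPUBLISHER-0001"),
]
print(find_shared_identifiers(records))
```

Any identifier returned here would break the rule as currently written, which is the objection to reusing identifiers when republishing.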


(Herman van Loon) #16

@andylolz Yes I see your point. The concern is about the duplication of activities with all the risks of inconsistencies and double counting. Your suggestion to add a new RelatedActivity type might help at data use time to identify such duplications.

@reidmporter started a discussion about this subject in the community zone.


(Herman van Loon) #17

@YohannaLoucheur Yes, that is what I meant. @andylolz though has a valid objection against using the same identifier when republishing.


(Yohanna Loucheur) #18

In fact, the problem isn’t so much the need for a globally unique activity identifier (it would remain unique) but the requirement that the activity ID start with the reporting org ID even in the case of re-publishers. (Side note: we should distinguish secondary publishers and re-publishers. This issue arises with re-publishers, not secondary publishers.)

Let’s say we publish an activity CA-3-D123456, and this activity is also reported/pulled to FTS. If FTS publishes the exact same activity again, they could name it CA-3-D123456. Why should they rename it some random FTS number in IATI data if it’s the same activity and this activity was already published in IATI under CA-3-D123456?

This rule seems to be the source of the problem:
“This MUST be prefixed with EITHER the current IATI organisation identifier for the reporting organisation (reporting-org/@ref) OR a previous identifier reported in other-identifier, and suffixed with the organisation’s own activity identifier.”

Should this rule be relaxed in the case of re-publishers?
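For reference, the prefix rule quoted above can be sketched as a simple check. This is only an illustration: the function name is made up, `XX-REPUBLISHER` is a hypothetical organisation reference, and the hyphen separator between the organisation identifier and the activity's own identifier is an assumption.

```python
def has_valid_prefix(iati_identifier, reporting_org_ref, other_identifiers=()):
    """Check the quoted prefix rule for one activity.

    The identifier must start with the reporting org's current ref, or with
    a previous identifier reported in other-identifier.
    """
    prefixes = [reporting_org_ref, *other_identifiers]
    # Assumption: the org identifier and the activity's own id are joined by "-".
    return any(p and iati_identifier.startswith(p + "-") for p in prefixes)


# The primary publisher's own activity passes.
print(has_valid_prefix("CA-3-D123456", "CA-3"))
# The same identifier reported by a (hypothetical) republisher fails.
print(has_valid_prefix("CA-3-D123456", "XX-REPUBLISHER"))
# A previous org identifier listed in other-identifier also passes.
print(has_valid_prefix("XX-OLD-77", "XX-NEW", other_identifiers=("XX-OLD",)))
```

The second call shows why a re-publisher cannot keep CA-3-D123456 under the rule as it stands.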


(Herman van Loon) #19

@YohannaLoucheur: for my understanding, what is in your view the fundamental difference between a secondary publisher and a republisher? Don’t they both use existing IATI data, modify or add to these data, and publish the modified data as IATI again?


(Yohanna Loucheur) #20

I would suggest that a secondary publisher is publishing data from organizations that don’t publish IATI data themselves - like Interaction or US Foundation (per examples provided by Reid and Bill). This creates low/no risk of double-counting.

Republishing, by contrast, involves taking data from IATI and publishing it again, like FTS. In some cases they may add content to it (as could happen, for instance, if someone adds detailed agriculture codes or geo locations), but for the most part it’s data already available in IATI format, hence the high risk of double-counting.


(Herman van Loon) #21

Given these definitions, wouldn’t this suggest that:

1 - ‘Secondary publishers’ should use the organization prefix of the primary publisher in the activity IDs. There is no risk of publishing the same activity twice, since the primary publisher does not publish themselves. The secondary publisher is nothing more than an administrative service provider.

2 - ‘Republishers’ (e.g. FTS) should NOT reuse the already published activity identifiers of the primary publishers, since that would cause confusion about who is the original data owner. It would also introduce great risks of double counting. Republishers should additionally ALWAYS mark an activity as ‘Republished’ and preferably refer back to the original activity with the related activity type, as suggested by @andylolz. This would enable data users to easily distinguish between original data and republished data.
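To show how such a back-reference could help at data use time, here is a minimal sketch. The `republished_from` field stands in for the suggested related activity link, which does not exist in the standard today, and all identifiers and amounts are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Activity:
    identifier: str
    budget: float
    # Hypothetical link back to the original activity's IATI identifier.
    republished_from: Optional[str] = None


def total_without_duplicates(activities):
    """Sum budgets, skipping republished copies whose original is also present."""
    present = {a.identifier for a in activities}
    return sum(
        a.budget
        for a in activities
        if a.republished_from is None or a.republished_from not in present
    )


activities = [
    Activity("XX-DONOR-001", 100.0),
    # Copy of an activity that is already in the data: skipped.
    Activity("XX-REPUBLISHER-9", 100.0, republished_from="XX-DONOR-001"),
    # Copy whose original is not in the data: counted.
    Activity("XX-REPUBLISHER-10", 40.0, republished_from="XX-DONOR-002"),
]
print(total_without_duplicates(activities))
```

A republished copy is only dropped when its original is actually present, so data from republishers still counts where the primary source is missing.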

I am not sure though if the proposed definition of a ‘secondary publisher’ according to @bill_anderson matches with your definitions above.

Maybe the use of the term ‘secondary publisher’ is too confusing. Wouldn’t the terms ‘original publication’ and ‘republication’ be a better way to describe the status of the data? It looks more important to know if you are dealing with the original data or the reprocessed data, than to know that you are processing the data of someone else.


(Andy Lulham) #22

@IATI-techteam: The name of the attribute in the standard is @secondary-reporter, not @secondary-publisher (ref). Could the proposal be amended to reflect that?

@YohannaLoucheur: If both publishers use the same IATI identifier then it is not globally unique. I expect this global uniqueness is core to various systems, so I think it would likely be problematic to relax that (i.e. by allowing/encouraging republishers to use the same identifier). As an example: If I were to look up the activity on d-portal, what would I expect http://d-portal.org/ctrack.html?#view=act&aid=CA-3-D123456 to show? Should it amalgamate all information from all publishers of iati-activity elements with that IATI identifier? I can anticipate problems with that.

When a new reporting org starts publishing, a ‘secondary publisher’ (of the reporting org’s activities) becomes a ‘republisher’. That change is outside of the control of the secondary publisher. So I don’t think we should expect the secondary publisher/republisher to declare which one they are, because I suppose that information could easily become inaccurate.