Data use observation: people go to the documents


(discussion started at Members’ Assembly 17)

To contribute to the focus on people using IATI data, I wanted to provide some observations from supporting people to do so. Hopefully this is useful. I’d suggest that we try to refrain from getting into a solution immediately, but try to gather insights on how people do actually use IATI.

A first observation relates to documents. Specifically, this is about the log frames, evaluation strategies, reports and such that are often linked to IATI activities. DFID, for example, publish links to many documents via their IATI data: here’s a random example.

My observation is really quite simple: when using interfaces such as devtracker or d-portal, people often end up at these documents.

People usually do this after a search and filter process: “show me the Agriculture activities in Tanzania”. On receipt of a list of relevant projects, people then scan through, check, and home in on the ones they are interested in. On landing on a page about one of these activities, the documents are a hotspot for attention. Often, people are both surprised and engaged by the access to these documents: it seems to make transparency more evident than the numbers and dates we more readily associate with data.

And, that’s it! It’s nothing groundbreaking, but worth noting.

What does this mean for the concept of data use and IATI? Perhaps this presents the community with a couple of challenges:

1 - would we consider people accessing documents via IATI data to be data users?
2 - and if documents are useful, then would publishing them instead of data be acceptable (it’s possible to point to a results document, for example, without publishing results data)?

Underneath all this is probably a discussion about the relative merits and uses of quantitative and qualitative information. I want to stress, however: this is an observation from people using IATI data…

Has anybody witnessed the same?

Comments (23)

Aria Grabowski • 6 years ago

I think the documents frequently provide more information and are easier to read and comprehend than the long list of results, which frequently gets jumbled in d-portal, so that at times portions of the results don’t make sense at all. I think the biggest challenge with the documents, which sometimes have the best and most useful information, is that it’s very time consuming to search through them and find the information you are looking for.

Anonymous • 6 years ago

At ZZ we have been extracting data from external documents, enriching search and context analysis. It is not implemented in any front-end yet, but is available in the OIPA API.

Vincent van 't Westende was this released into the Master version yet? Or still in development branch?

Vincent van 't Westende • 6 years ago

The ‘search in document-link texts’ functionality is on the master branch of OIPA. We don’t have the documents indexed on any of the instances we host though. We did try that once and the disk was full after an hour, filled with 500+ mb PDFs.

If anyone’s wondering how it performs: indexing the text of the documents (~620k documents atm) takes a long time, but once indexed it should be quick to search through. For example, retrieving all activities that have a document attached that mentions the word ‘cashew’ should take a few seconds at most.

Andy Lulham • 6 years ago

VincentVW:

We don’t have the documents indexed on any of the instances we host though. We did try that once and the disk was full after an hour, filled with 500+ mb PDFs.

[sorry to be a bit technical, but…] If storage is an issue, can’t you download, index, dump? Surely each PDF only needs to be stored until indexing of its contents is complete?
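A minimal sketch of that download, index, dump idea, to show that each file only needs to exist while it is being indexed. This is not OIPA's code: `index_then_discard`, `fetch` and `extract_text` are hypothetical names, and the real download and text-extraction steps are passed in as plain callables.

```python
import os
import tempfile
from typing import Callable, Dict, Iterable


def index_then_discard(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    extract_text: Callable[[str], str],
) -> Dict[str, str]:
    """Download each document, index its text, then delete the file.

    `fetch` and `extract_text` are hypothetical stand-ins for the real
    download and PDF-to-text steps.
    """
    index: Dict[str, str] = {}
    for url in urls:
        # Hold the file on disk only while it is being indexed.
        fd, path = tempfile.mkstemp(suffix=".pdf")
        try:
            with os.fdopen(fd, "wb") as handle:
                handle.write(fetch(url))
            index[url] = extract_text(path)
        finally:
            # The "dump" step: free the disk space straight away.
            os.remove(path)
    return index
```

With this shape, disk use is bounded by the largest single document rather than by the whole collection.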

Vincent van 't Westende • 6 years ago

[np, I’ll try to keep it to a short tech answer haha] True, it’s not really a problem, we just need to up the storage if we want to enable this. The upsides to storing the documents are that we:

Don’t have to re-download the document when we improve our document indexing capabilities.
Can do checksum comparison when updating the indexes (download the document and, if it didn’t change, don’t update the indexes; better for performance) [as Andy pointed out, this is not a valid reason and we actually store the checksum]
Can host the files and be a mirror for them if ever necessary? Not sure if that is OK to do, and it’s definitely not the primary reason.
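
As the bracketed note says, only the checksum needs to be kept to skip unchanged documents, not the file itself. A rough sketch of that check follows. It is not OIPA's implementation: the function name, the in-memory dict and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from typing import Dict


def needs_reindex(url: str, content: bytes, checksums: Dict[str, str]) -> bool:
    """Return True (and record the new checksum) if the content changed.

    `checksums` is a hypothetical store mapping document URL to the
    hex digest seen at the last indexing run.
    """
    digest = hashlib.sha256(content).hexdigest()
    if checksums.get(url) == digest:
        return False  # unchanged: leave the index alone
    checksums[url] = digest
    return True
```

The document still has to be downloaded to compute the digest, so this saves indexing work rather than bandwidth.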

Steven Flower • 6 years ago

Polite reminder > Andy Lulham Vincent van 't Westende

stevieflow:

I’d suggest that we try to refrain from getting into a solution immediately, but try to gather insights on how people do actually use IATI

I know you’re keen

Andy Lulham • 6 years ago

stevieflow:

Polite reminder […]

Absolutely – you’re right. It’s okay, we took it to twitter.

matmaxgeds • 6 years ago

From the research I have done on in-country data use, this was also absolutely the case: people needed very basic data to find/identify a project, and then for anything more complex, wanted either the documents or the email address of the project manager.

I think this has implications for IATI development, i.e. even if we keep adding more features, can we realistically expect the standard to compete with the linked documents, e.g. for M&E, or results tracking, or descriptions of the target populations, or even for the detailed locations? I think maybe not.

Steven Flower • 6 years ago

matmaxgeds thanks. Yes. As the data standard becomes more complex in scale and implementation, the observation that people value the narrative documents seems “inconvenient”. As Aria Grabowski rightly adds, these documents then need time to digest and comprehend.

I guess my original observation was simply: this is a thing. If we want to talk about data use, then we should be prepared for the fact that much of this may be right-clicking and saving PDF documents. And (for the tooling folk): perhaps we can respond to these user stories.

Aria Grabowski • 6 years ago

It was just pointed out to me that USAID added links to evaluation documents and, like the World Bank project appraisal documents, they are long. They potentially hold good info that is actually helpful and useful, which just makes me want a tool to search and filter, so that data is the trifecta of available, useful and accessible. My first thought is that the tool would let you search for services provided (results of all kinds) and subnational locations (in addition to the search criteria that are available already) before opening the doc, so it can be a way to find the projects that meet the criteria you are looking for. Maybe I am dreaming and asking for too much, but if it’s possible maybe there can be a way to figure out how to make it happen??

Yohanna Loucheur • 6 years ago

ariag:

On my first thought that tool would let you search for services provided (results of all kinds) and subnational locations (in addition to the search criteria that is available already) before opening the doc, so it can be a way to find the projects that meet the criteria you are looking for.

Subnational locations would be taken care of in project data, ideally (can’t wait to test the auto-coder from OpenAg!). So you’d use the geolocation data to identify projects that meet your geo criteria, then assume the documents will be relevant (rather than identifying projects via documents).

Matt Bartlett • 6 years ago

Just a very quick update/fyi on this - Shi noticed this conversation/other feedback about documents in IATI and made some subtle changes to make the document links on an activity page (such as this one) a bit more prominent in d-portal. A small change but hopefully useful for users interested in an activity’s docs!

matmaxgeds • 6 years ago

Matt Bartlett

Matt:

subtle changes to make the document links on an activity page

very cool! Maybe it can be made collapsible e.g. give a taste and click to see the rest - in case adding documents really takes off and projects have hundreds!

Steven Flower • 6 years ago

… A hidden aspect of document links could be that it’s feasible for a publisher to create a very basic IATI activity (just observing the minimum for the schema) and then add narrative, budget, results and conditions (for example) as PDFs … the IATI Document Category codes would support this.

Would people consider that against the spirit of IATI, or welcome transparency?

(side note: apparently, PDF is now 3-star open data - via Rory Scott Andy Lulham )

Matt Bartlett • 6 years ago

matmaxgeds there’s the option to show/hide various aspects now on the individual project pages, including the list of documents - thanks for the suggestion (useful on pages like this! http://d-portal.org/ctrack.html#view=act&aid=SE-0-SE-6-7100174403-BIH-15150)

matmaxgeds • 6 years ago

Matt Bartlett ace, thanks (and nice choice of example project!)

Steven Flower • 6 years ago

wow

Might be useful to have a count and some indication of the category…

Yohanna Loucheur • 6 years ago

matmaxgeds:

Matt Bartlett ace, thanks (and nice choice of example project!)

+1!

Also found this one interesting, as an example of a project with lots and lots of documents. (I came across it in relation to another discussion with matmaxgeds - related projects reported by GAC and DFID. I want to start a thread on comparing the 3, both in terms of data published and presentation on portal.)
Openaid.se

Openaid.se is a web-based information service about Swedish aid built on open government data.

matmaxgeds • 6 years ago

Yohanna Loucheur - yes, please start the thread. In addition to our discussion, I think that different IATI portals showing different data is going to start being a v. serious problem: true for documents, but v. bad for numbers - which is the ‘right one’? Or do you have to start quoting the source portal when telling someone to ‘get it from IATI’?

Steven Flower • 6 years ago

matmaxgeds:

I think that different IATI portals showing different data is going to start being a v. serious problem

I’m not sure I fully understand from the examples cited (d-portal | open Sida) - but look forward to the new thread!

Yohanna Loucheur • 6 years ago

Slightly edited my previous message, hoping it’s a bit clearer why I posted the SIDA example.

In terms of the 3 portals showing different things, it’s totally normal - they are different projects, one isn’t “more true” than the other. But each is missing some useful data, so that’s interesting to compare and contrast. They also present the information in very different ways - again interesting to compare, especially for those of us trying to improve presentation tools. Would be great to have user feedback (hence the thread, hopefully later today).

Steven Flower • 6 years ago

Thanks Yohanna Loucheur

Yes, interesting how different portals use the data. I think that could be a whole new Data Use Observation thread - agree? Could include screenshots too!

Yohanna Loucheur • 6 years ago

stevieflow:

Yes, interesting how different portals use the data. I think that could be a whole new Data Use Observation thread - agree?

Indeed: Data on pooled funding - a case study
