Publishing and updating process - checking my understanding

(Richard) #1

Dear all, we are just starting to publish our activity data and I want to check that I understand the process correctly:

  1. I ‘publish’ our activity data by registering a data set on the IATI Registry, which links to a file that I store at a URL of our choice

  2. When I want to update our activity file I just replace the file at our URL with a new activity XML file. There is no need to re-register it

  3. People who want to consume the data have to either read it from all of the individual XML files scattered around the web or pull it from another source that has already done so. The IATI Registry itself does not collate the data

If the three statements above are correct then I have a couple of extra questions:

A. How often is our XML file read, or does that depend on who is doing the reading? I’m imagining that whoever is polling the file just looks at the file datestamp to see when there are changes

B. When I initially register the dataset there is a ‘Date Updated’ field but I don’t see what that is for if I only have to register the data once. Do we actually need to complete that?

Hope that makes sense. I’m trying to pull together bits of info from various places to get the process down.

(Andy Lulham) #2

All makes sense, and all excellent questions, @RichPepp!

The three statements above are indeed correct!
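To make statement 3 a bit more concrete, here is a minimal sketch of what a consumer does with the registry's metadata: it only gets back links to each publisher's file, never the activity data itself. This assumes the registry exposes the standard CKAN `package_search` action and that each dataset lists its file under `resources[].url`; the sample response, dataset names and URLs below are made up for illustration.

```python
import json


def resource_urls(package_search_response):
    """Collect (dataset name, file URL) pairs from a CKAN package_search response."""
    pairs = []
    for dataset in package_search_response["result"]["results"]:
        for resource in dataset.get("resources", []):
            url = resource.get("url")
            if url:
                pairs.append((dataset["name"], url))
    return pairs


# Hand-written sample in the shape CKAN returns; names and URLs are hypothetical.
SAMPLE = json.loads("""
{
  "success": true,
  "result": {
    "count": 2,
    "results": [
      {"name": "exampleorg-activities",
       "resources": [{"url": "https://example.org/iati/activities.xml"}]},
      {"name": "exampleorg-org",
       "resources": [{"url": "https://example.org/iati/organisation.xml"}]}
    ]
  }
}
""")

for name, url in resource_urls(SAMPLE):
    # A real consumer would now download each URL itself.
    print(name, url)
```

Replacing the file at the same URL (statement 2) changes nothing in this metadata, which is why no re-registration is needed.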

On question A: it does depend on who’s doing the reading. Most services update daily, but the IATI Dashboard, for instance, updates every three days (see the FAQ).
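For the "file datestamp" idea, one thing a polling service could do is compare the HTTP `Last-Modified` response header against the time of its previous fetch. This is only a sketch of that idea, not a description of how any particular IATI tool works. The function name `changed_since` is hypothetical, headers are assumed to arrive as a plain dict, and a missing or unparseable header is treated as "changed" so the file gets re-read.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def changed_since(headers, last_checked):
    """Return True if the file may have changed since last_checked (an aware datetime)."""
    value = headers.get("Last-Modified")
    if value is None:
        return True  # no header: we cannot tell, so re-read the file
    try:
        modified = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return True  # unparseable header: re-read to be safe
    if modified.tzinfo is None:
        # Some date forms parse as naive datetimes; treat them as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    return modified > last_checked


headers = {"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}
print(changed_since(headers, datetime(2015, 10, 20, tzinfo=timezone.utc)))
print(changed_since(headers, datetime(2015, 10, 22, tzinfo=timezone.utc)))
```

Falling back to a re-read is the cautious choice here, since a publisher's server may not set this header in a meaningful way.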

There are a number of timestamps! Below is a list of some:

  • The XML root node has a @generated-datetime attribute. It’s highly recommended that this is used, though its meaning (and therefore its usefulness) isn’t well-defined.
  • Each repeated iati-organisation and iati-activity has a @last-updated-datetime attribute. According to the documentation:

    This date must change whenever the value of any field changes

  • The registry stores data_updated in the dataset metadata (shown as “IATI data updated”), which is updated ~daily to equal the most recent of these @last-updated-datetimes. This would be a very useful reference, but note that this is handled by the registry archiver and this process has been very unreliable in the recent past. The tech audit recommended updating this metadata from outside of CKAN.
  • The registry also stores metadata_modified in the dataset metadata (shown as “IATI registry updated”). This is unrelated to data modifications – I think it records the most recent manual metadata update.
  • The server responding to the data request will itself include various dates in the response headers. It would be cool if publishers could guarantee that these response headers were meaningful or consistent, but I’m not sure about guidance or consensus on this.
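As a rough illustration of the first two timestamps, the sketch below reads @generated-datetime from the root node and works out the most recent @last-updated-datetime in a file, which is the value data_updated is meant to mirror. The sample XML, its identifiers and the helper names are made up, and this is not the registry archiver's actual code.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical, heavily trimmed activity file.
SAMPLE = """<iati-activities version="2.03" generated-datetime="2019-03-01T09:00:00Z">
  <iati-activity last-updated-datetime="2019-02-10T12:00:00Z">
    <iati-identifier>XX-EXAMPLE-1</iati-identifier>
  </iati-activity>
  <iati-activity last-updated-datetime="2019-02-27T16:30:00Z">
    <iati-identifier>XX-EXAMPLE-2</iati-identifier>
  </iati-activity>
</iati-activities>"""


def parse_dt(value):
    # Older Python versions of fromisoformat() reject a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def file_timestamps(xml_text):
    """Return (generated datetime, most recent last-updated datetime), either may be None."""
    root = ET.fromstring(xml_text)
    generated = root.get("generated-datetime")
    updates = []
    for tag in ("iati-activity", "iati-organisation"):
        for element in root.iter(tag):
            stamp = element.get("last-updated-datetime")
            if stamp:
                updates.append(parse_dt(stamp))
    return (parse_dt(generated) if generated else None,
            max(updates) if updates else None)


generated, latest = file_timestamps(SAMPLE)
print("generated:", generated)
print("most recent last-updated:", latest)
```

Note that the two values can legitimately differ: a file can be regenerated without any activity having changed.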

On question B: no, you don’t need to complete the ‘Date Updated’ field. It’ll be populated automatically. The automatically populated fields shouldn’t really be editable by the publisher; it would be cool if the registry made them read-only.

(Richard) #3

Awesome reply, thank you. That is completely clear.

    This date must change whenever the value of any field changes

Excellent, I’ll use that. I had been looking through other published datasets and some seemed to use it more as a generated-datetime rather than last-updated-datetime. last-updated works well for me and will fit nicely with our checking workflow
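A minimal sketch of one way to honour that rule when generating a file: fingerprint each activity with the last-updated-datetime attribute left out, and only bump the attribute when the fingerprint differs from the one stored at the previous publish. The function names and sample activities are hypothetical, and storing the previous fingerprints is left out.

```python
import hashlib
import xml.etree.ElementTree as ET
from copy import deepcopy


def content_fingerprint(activity):
    """Hash an activity element, ignoring its last-updated-datetime attribute."""
    clone = deepcopy(activity)
    clone.attrib.pop("last-updated-datetime", None)
    return hashlib.sha256(ET.tostring(clone, encoding="utf-8")).hexdigest()


def stamp_if_changed(activity, previous_fingerprint, now):
    """Set last-updated-datetime to `now` only if the content changed; return the new fingerprint."""
    fingerprint = content_fingerprint(activity)
    if fingerprint != previous_fingerprint:
        activity.set("last-updated-datetime", now)
    return fingerprint


TEMPLATE = ('<iati-activity last-updated-datetime="2019-01-01T00:00:00Z">'
            '<title><narrative>{}</narrative></title></iati-activity>')

published = ET.fromstring(TEMPLATE.format("Water project"))
stored = content_fingerprint(published)

unchanged = ET.fromstring(TEMPLATE.format("Water project"))
edited = ET.fromstring(TEMPLATE.format("Water and sanitation project"))

stamp_if_changed(unchanged, stored, "2019-06-01T00:00:00Z")
stamp_if_changed(edited, stored, "2019-06-01T00:00:00Z")
print(unchanged.get("last-updated-datetime"))  # stays at the old value
print(edited.get("last-updated-datetime"))     # moves to the new value
```

This compares serialised XML, so a purely cosmetic change such as reordered attributes would also count as a change; that errs on the side of bumping the date too often rather than too rarely.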

Thanks again