One problem with the expectation that the cultural sector will publish lots of Linked Data is that most (if not all) of our current data consists of string values held in traditional databases. I have discussed before how we need software support to convert these string values to Linked Data URLs, either as a one-off operation within the source database, or on the fly as part of the Linked Data publication process. Given that many Linked Data resources whose URLs we might want to use will offer a SPARQL endpoint, it would be nice if we could use such an endpoint directly to enhance our data.
Looking at this problem in the specific context of the Modes software, we have the option of setting up “web termlists”, which treat an external web resource as an authority file. The interface to this resource includes a URL pattern which can be used to query for concepts matching a given string. The assumption is that HTTP requests to this URL will return an XML response. An XSLT transform converts the returned XML into a form of XML which can be stored locally in a Modes data file. In the past we have used the “web termlist” technique to interface to resources like Geonames, which have a reasonably simple query syntax.
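The web termlist idea can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the Modes implementation. The URL pattern, the use of “***” as its placeholder, the `<result>`/`<name>`/`<uri>` response elements and the `<terms>`/`<term>` output elements are all hypothetical. ElementTree stands in here for the XSLT transform that Modes actually uses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical URL pattern for a web termlist; "***" marks where the search term goes.
URL_PATTERN = "https://example.org/search?q=***&format=xml"


def build_lookup_url(pattern: str, term: str) -> str:
    """Substitute the percent-encoded search term into the URL pattern."""
    return pattern.replace("***", quote(term, safe=""))


def to_local_terms(response_xml: str) -> ET.Element:
    """Stand-in for the XSLT step: map each <result> in a (hypothetical)
    response onto a local <term> element carrying its label and identifier."""
    root = ET.fromstring(response_xml)
    local = ET.Element("terms")  # hypothetical element name, not the real Modes schema
    for result in root.iter("result"):
        uri = result.findtext("uri", default="").strip()
        term = ET.SubElement(local, "term", uri=uri)
        term.text = result.findtext("name", default="").strip()
    return local
```

In practice the URL from `build_lookup_url` would be fetched (for example with `urllib.request.urlopen`) and the response body passed to the transform step.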
One “SPARQL challenge” arose while making the standard Modes termlists into Linked Data resources. The British Museum has published its materials termlist as a SKOS concept scheme as part of its online collections data, and the BM suggested that instead of creating a parallel Modes file, we should simply access their data directly.
The first job was to work out how to query the BM data so as to retrieve just materials termlist concepts, and how to retrieve useful information. This was tackled by going to the search box for the SPARQL endpoint and hitting it with queries until something useful came back. Three key learning points came out of this exercise:
- for this purpose a CONSTRUCT query is far more useful than a SELECT query: CONSTRUCT returns a subset of the original data as RDF triples, whereas SELECT returns only a table of variable bindings
- restricting the results to the required SKOS concept scheme simply needs a ?s skos:inScheme <http://collection.britishmuseum.org/id/thesauri/material> triple pattern in the query
- the endpoint must support FILTER and the regex() function if you want to search the data for string matches, e.g. FILTER regex(?term, "^agave")
This is the SPARQL query pattern which I eventually used:
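PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Return every triple about concepts in the BM materials scheme
# that have a string value beginning with the search term
CONSTRUCT { ?s ?p ?o } WHERE
{
  ?s ?rel ?term .
  ?s skos:inScheme <http://collection.britishmuseum.org/id/thesauri/material> .
  ?s ?p ?o
  FILTER regex(?term, "^***")
}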
(where “***” is replaced by the user’s search term). This web termlist allows Modes users to record the string value of a material in their data, and then quickly look up and include the corresponding BM Linked Data identifier.
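As a rough sketch of how the search term might be dropped into this pattern, the Python below substitutes it for “***” and builds an HTTP GET request URL following the SPARQL 1.1 Protocol, which passes the query in the `query` parameter. The endpoint address is a placeholder, since it is not given here, and the two escaping functions are illustrative additions rather than part of the termlist described above: without them, a term containing characters such as `.` or `(` would be read as regular-expression syntax, and a `"` would break the query string.

```python
from urllib.parse import urlencode

# Placeholder endpoint address (assumption: the real BM endpoint URL is not given).
ENDPOINT = "https://example.org/sparql"

# The materials query pattern, with the skos: prefix declared.
MATERIALS_QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT { ?s ?p ?o } WHERE
{
  ?s ?rel ?term .
  ?s skos:inScheme <http://collection.britishmuseum.org/id/thesauri/material> .
  ?s ?p ?o
  FILTER regex(?term, "^***")
}"""

# Metacharacters of the XPath regex syntax used by SPARQL's regex() function.
_REGEX_SPECIAL = set("\\|.?*+(){}-[]^$")


def escape_regex(term: str) -> str:
    """Backslash-escape regex metacharacters so the term is matched literally."""
    return "".join("\\" + c if c in _REGEX_SPECIAL else c for c in term)


def escape_sparql_string(value: str) -> str:
    """Escape a value for use inside a double-quoted SPARQL string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def build_materials_query(term: str) -> str:
    """Replace the *** placeholder with the user's escaped search term."""
    return MATERIALS_QUERY.replace("***", escape_sparql_string(escape_regex(term)))


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """SPARQL 1.1 Protocol: send the query as the 'query' parameter of a GET."""
    return endpoint + "?" + urlencode({"query": query})
```

The resulting URL could then be fetched with an `Accept` header for whichever RDF serialisation the endpoint supports.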

The second SPARQL endpoint which I wanted to access is the Ordnance Survey postcode resource. The reason for this is that web services such as Historypin require geolocation information (latitude/longitude) for uploaded items, and Modes users want to be able to contribute to such services. However, typical Modes local history data may include postcodes, but is unlikely to include lat/long coordinates. The Ordnance Survey postcode data includes lat/long and National Grid Reference (NGR) coordinates for the centre of each postcode area. So, by making a link to the OS data, Modes users can get the required coordinate information “for free”. This is a good example of how cultural history institutions can get added value from adopting a Linked Data approach.
Looking up postcodes is more straightforward than searching for materials keywords. A postcode has a simple, well-understood format, so there is no need for a SPARQL “search”, just a direct lookup in the OS data. This query pattern finds the resource whose skos:notation is the given postcode, then returns all triples of which that resource is the subject:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Find the postcode resource by its typed skos:notation, then return all its triples
CONSTRUCT { ?s ?p ?o }
WHERE {
  ?s skos:notation "***"^^<http://data.ordnancesurvey.co.uk/ontology/postcode/Postcode> .
  ?s ?p ?o .
}
(Again, “***” in this pattern is replaced by the postcode, e.g. “RH15 8JA”.) The only complication is that the skos:notation value is a typed literal, so the query must give the same datatype: a plain “RH15 8JA” string would not match. When a postcode is selected from the web termlist, a new record is created and stored in a local Modes data file, using the Place application.
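A similar sketch for the postcode lookup: the user's input is tidied into the “RH15 8JA” form before being substituted into the typed literal. The normalisation rule (upper case, a single space before the final three characters) and the simplified validation pattern are assumptions based on the example above, not a statement of how the OS data formats every postcode. Validating first also stops arbitrary text from being injected into the query.

```python
import re

# The postcode query pattern, with "***" as the placeholder for the postcode.
POSTCODE_QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
CONSTRUCT { ?s ?p ?o }
WHERE {
  ?s skos:notation "***"^^<http://data.ordnancesurvey.co.uk/ontology/postcode/Postcode> .
  ?s ?p ?o .
}"""

# Simplified UK postcode shape (outward code + inward code, spaces removed);
# an assumption that ignores a few special cases.
_POSTCODE_SHAPE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}")


def normalise_postcode(raw: str) -> str:
    """Upper-case the postcode and put one space before the inward code."""
    compact = re.sub(r"\s+", "", raw).upper()
    if not _POSTCODE_SHAPE.fullmatch(compact):
        raise ValueError(f"not a recognisable UK postcode: {raw!r}")
    return compact[:-3] + " " + compact[-3:]


def build_postcode_query(raw: str) -> str:
    """Substitute the normalised postcode into the typed-literal lookup."""
    return POSTCODE_QUERY.replace("***", normalise_postcode(raw))
```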

Because this cached copy of the data is stored locally in a linked Modes data file, and in an XML format which is compatible with other Modes place data, the lat/long coordinates which it contains can easily be accessed and used in views and reports. In particular, they can be included in a report which generates output for “bulk load” into Historypin (our original objective).
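Once a response comes back, the coordinates have to be pulled out and cached locally. The sketch below assumes that the endpoint can return N-Triples, and that the OS data gives coordinates with the W3C WGS84 `geo:lat` and `geo:long` properties; both are assumptions to check against an actual response. The `<place>`, `<postcode>`, `<latitude>` and `<longitude>` element names are hypothetical and do not reflect the real Modes Place schema. The line parser is deliberately minimal (IRI subjects and literal objects only), so an RDF library such as rdflib would be a sturdier choice in practice.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: coordinates use the W3C WGS84 vocabulary.
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"

# Matches an N-Triples line whose subject is an IRI and whose object is a literal
# (optionally typed or language-tagged).
_LITERAL_TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+"((?:[^"\\]|\\.)*)"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?\s*\.\s*$'
)


def extract_coordinates(ntriples: str) -> tuple[str, float, float]:
    """Return (subject URI, latitude, longitude) from an N-Triples response."""
    subject = None
    lat = lng = None
    for line in ntriples.splitlines():
        m = _LITERAL_TRIPLE.match(line.strip())
        if not m:
            continue  # skip IRI-valued objects, blank nodes, comments
        s, p, value = m.groups()
        if p == GEO_LAT:
            subject, lat = s, float(value)
        elif p == GEO_LONG:
            subject, lng = s, float(value)
    if lat is None or lng is None:
        raise ValueError("no geo:lat/geo:long found in response")
    return subject, lat, lng


def to_place_record(postcode: str, ntriples: str) -> ET.Element:
    """Build a local cached place record (hypothetical element names)."""
    uri, lat, lng = extract_coordinates(ntriples)
    place = ET.Element("place", uri=uri)
    ET.SubElement(place, "postcode").text = postcode
    ET.SubElement(place, "latitude").text = str(lat)
    ET.SubElement(place, "longitude").text = str(lng)
    return place
```

Keeping the cached record in the same shape as other local place data is what lets views and reports (such as a bulk-load export) read the coordinates without knowing they came from a SPARQL endpoint.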
These experiments demonstrate that if data is published in Linked Data format with a SPARQL endpoint, it is quite possible to use this as an “API” to access and use the data in a variety of ways. In particular, we can use the “web termlist” approach to generate new Linked Data connections to existing resources, and so enrich the developing cultural history Linked Data environment.