The W3C Library Linked Data group is preparing a draft report summarizing its conclusions and recommendations. This article offers some thoughts, from the perspective of a member of the museum community, which might (!) be helpful. Please feel free to quote or copy whatever is deemed to be of use.
At present the report focuses entirely on the library community, and on how its existing data and standards might be expressed as Linked Data. However, one possibility which the Linked Data initiative opens up is the seamless linking of information from related communities. I think it would be a lost opportunity (and would demonstrate a certain inwardness of world-view) not to discuss potential points of contact between library data and museum, archive and general literary and historical data resources.
One framework which has been widely used to aid interoperability of this sort is the CIDOC CRM. In the context of library data, the CRM has been used to model FRBR in a generic manner: FRBRoo. CIDOC is actively promoting the adoption of Linked Data, and is working on a set of guidelines for its adoption in the museum community. Meanwhile, the LOCAH project is doing analogous work in the U.K. archives community. Both would, I am sure, welcome discussions with library colleagues (as indeed will happen at the LOD-LAM Summit in June this year).
Reconsider what “library data” is, or could be
The focus of the group’s discussions has been mostly on how to express existing library data in a Linked Data framework. This has served to demonstrate how difficult a job it will be to re-engineer existing records (MARC records, for example) as Linked Data, and also to show how little practical help the FRBR framework is going to be, given the actual nature and state of most current library data.
The draft report mentions the challenges of opening up library data to other communities in a Linked Data world, though it only gets as far as the idea of sharing it with publishers. What about readers? From a reader’s perspective, much of the existing library data would be irrelevant. However, readers might well appreciate descriptions of the content of works which take advantage of the possibilities offered by Linked Data. For example, subject “keywords” could be expressed as URLs which identify people, dates and places unambiguously, something that generic subject indexing schemes like Dewey and LCSH don’t attempt to do. This would allow cross-linking with resources like Wikipedia, and would make very specific searching possible (e.g. “travel books describing Paris in the mid-nineteenth century”).
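To make the idea of unambiguous subject “keywords” concrete, here is a minimal sketch in Python of the kind of search they would allow. Everything in it is an assumption for illustration: the `Work` structure, the `find_works` function, the `example.org` genre identifier and the catalogue entries are invented, and the DBpedia-style URLs simply stand in for whatever identifiers a real service would use.

```python
from dataclasses import dataclass

# Illustrative identifiers only: a real service would choose its own URLs.
PARIS = "http://dbpedia.org/resource/Paris"
LONDON = "http://dbpedia.org/resource/London"
TRAVEL = "http://example.org/genre/travel"  # hypothetical genre URL


@dataclass(frozen=True)
class Work:
    title: str
    genre: str          # URL identifying the kind of work
    places: frozenset   # URLs identifying the places described
    period: tuple       # (first_year, last_year) covered by the content


# Invented catalogue entries, not real books.
CATALOGUE = [
    Work("Invented Paris travelogue", TRAVEL, frozenset({PARIS}), (1848, 1852)),
    Work("Invented London travelogue", TRAVEL, frozenset({LONDON}), (1850, 1855)),
    Work("Invented later Paris guide", TRAVEL, frozenset({PARIS}), (1920, 1925)),
]


def find_works(works, genre, place, start_year, end_year):
    """Titles of works of the given genre which describe the given place
    during a period overlapping start_year..end_year."""
    return [
        w.title
        for w in works
        if w.genre == genre
        and place in w.places
        and w.period[0] <= end_year
        and w.period[1] >= start_year
    ]


# "Travel books describing Paris in the mid-nineteenth century"
matches = find_works(CATALOGUE, TRAVEL, PARIS, 1840, 1860)
```

Because the place is a URL rather than a text string, the search cannot confuse Paris, France with any other Paris, and the same URL can be followed to find related resources elsewhere.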
Scope for sharing, and the redundancy issue
I’m puzzled by the discussions that have taken place on data sharing and redundancy, and the feeling that this is a problem, rather than an opportunity. My understanding is that libraries invented the concept of shared cataloguing decades before the rest of the world, and built systems which supported it. I assume that ISO 2709 files on mag tape have been superseded by more efficient delivery mechanisms, and that the basic idea of shared cataloguing still holds sway?
With Linked Data, there is the prospect of implementing the “shared cataloguing” approach in real time, since a master catalogue record becomes a Web resource which can be dereferenced whenever it is required. Any update to that record will be instantly available, removing the problem of latency when updating from a central authoritative source. Standard caching techniques can be used to distribute master records for efficient access.
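As a rough illustration of how dereferencing and caching might fit together, the sketch below simulates a master catalogue and a local library cache entirely in memory, with no network access. The class names, the version counter and the `example.org` URL are all assumptions made for this example; the version check imitates the general idea of an HTTP conditional request, and is not a description of any existing library system.

```python
class MasterCatalogue:
    """Stands in for the central authoritative source (in memory only)."""

    def __init__(self):
        self._records = {}      # url -> (version, record)
        self.full_fetches = 0   # how many times a full record was sent

    def publish(self, url, record):
        """Create or update a master record, bumping its version."""
        version = self._records.get(url, (0, None))[0] + 1
        self._records[url] = (version, dict(record))

    def fetch(self, url, known_version=None):
        """Return (version, record), or (version, None) when the caller's
        copy is already current (compare HTTP's "304 Not Modified")."""
        version, record = self._records[url]
        if known_version == version:
            return version, None
        self.full_fetches += 1
        return version, dict(record)


class LocalLibraryCache:
    """A library's local cache of master records."""

    def __init__(self, master):
        self._master = master
        self._cache = {}        # url -> (version, record)

    def dereference(self, url):
        """Return the current master record, re-using the cached copy
        whenever the master confirms it is still up to date."""
        cached = self._cache.get(url)
        known_version = cached[0] if cached else None
        version, record = self._master.fetch(url, known_version)
        if record is None:
            return cached[1]
        self._cache[url] = (version, record)
        return record


master = MasterCatalogue()
book = "http://example.org/book/1"  # hypothetical record URL
master.publish(book, {"title": "An Invented Title"})

library = LocalLibraryCache(master)
first = library.dereference(book)    # full record transferred
second = library.dereference(book)   # cached copy confirmed and re-used
master.publish(book, {"title": "An Invented Title (corrected)"})
third = library.dereference(book)    # update picked up on next access
```

The design choice here is to check with the master on every access, so that an update is seen immediately while the full record is only transferred when it has changed. A cache which trusted its copy for a fixed period would save the check, at the cost of some latency.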
This seems like such a massive potential efficiency “win” for the library community that I would have expected it to feature in the report. Take Open Library, for example. If it succeeds in delivering “one web page for every book”, and then publishes those pages as Linked Data, isn’t that a resource which all libraries could make use of?
This raises a related point about “sharing”: it isn’t necessary for the library community itself to create all the Linked Data that relates to books. Crowd-sourcing and other Web 2.0 techniques can be harnessed, as is already happening with projects like Galaxy Zoo and Old Weather, especially where the information to be gathered is more “popular” and less “technical” than traditional bibliographic descriptions.