Thoughts on library Linked Data

The W3C Library Linked Data group is preparing a draft report summarizing its conclusions and recommendations. This article offers some thoughts, from the perspective of a member of the museum community, which might (!) be helpful. Please feel free to quote or copy whatever is deemed to be of use.

Cross-disciplinary possibilities

At present the report focuses entirely on the library community, and on how its existing data and standards might be expressed as Linked Data. However, one possibility which the Linked Data initiative opens up is the seamless linking of information from related communities. I think it would be a lost opportunity (and would demonstrate a certain insularity of outlook) not to discuss potential points of contact between library data and museum, archive and general literary and historical data resources.

One framework which has been widely used to aid interoperability of this sort is the CIDOC CRM.  In the context of library data, the CRM has been used to model FRBR in a generic manner: FRBRoo. CIDOC is actively promoting the adoption of Linked Data, and is working on a set of guidelines for its adoption in the museum community.  Meanwhile, the LOCAH project is doing analogous work in the U.K. archives community.  Both would, I am sure, welcome discussions with library colleagues (as indeed will happen at the LOD-LAM Summit in June this year).

Reconsider what “library data” is, or could be

The focus of the group’s discussions has been mostly on how to express existing library data in a Linked Data framework. This has served to demonstrate how difficult a job it will be to re-engineer e.g. MARC records as Linked Data, and also to show how little practical help the FRBR framework is going to be, given the actual nature and state of most current library data.

The draft report mentions the challenges of opening up library data to other communities in a Linked Data world, though it only gets as far as the idea of sharing it with publishers. What about readers? From a reader’s perspective, much of the existing library data would be irrelevant. However, they might well appreciate descriptions of the content of works which take advantage of the possibilities offered by Linked Data: for example, subject “keywords” (URLs) which identify people, dates and places unambiguously, something that generic subject indexing schemes like Dewey and LCSH don’t attempt to do. This would allow cross-linking with resources like Wikipedia, and the possibility of very specific searching (e.g. “travel books describing Paris in the mid-nineteenth century”).
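
To make the “very specific searching” idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the book URIs, the predicate names and the genre URI all live under a made-up example.org namespace, the date range is simplified to a pair of years, and a real system would use an RDF store and a query language rather than a Python set. The only non-invented identifiers are the DBpedia URIs for Paris and London.

```python
# Hypothetical namespace and vocabulary: none of these terms are a real standard.
EX = "http://example.org/"
GENRE = EX + "vocab/genre"
PLACE = EX + "vocab/describesPlace"
YEAR_FROM = EX + "vocab/describesYearFrom"
YEAR_TO = EX + "vocab/describesYearTo"

TRAVEL = EX + "genre/travel"
NOVEL = EX + "genre/novel"

# Unambiguous identifiers for places, shared with resources outside the library.
PARIS = "http://dbpedia.org/resource/Paris"
LONDON = "http://dbpedia.org/resource/London"

# Invented sample data: (subject, predicate, object) statements about four books.
TRIPLES = {
    (EX + "book/1", GENRE, TRAVEL), (EX + "book/1", PLACE, PARIS),
    (EX + "book/1", YEAR_FROM, 1840), (EX + "book/1", YEAR_TO, 1860),
    (EX + "book/2", GENRE, TRAVEL), (EX + "book/2", PLACE, LONDON),
    (EX + "book/2", YEAR_FROM, 1850), (EX + "book/2", YEAR_TO, 1855),
    (EX + "book/3", GENRE, NOVEL), (EX + "book/3", PLACE, PARIS),
    (EX + "book/3", YEAR_FROM, 1845), (EX + "book/3", YEAR_TO, 1850),
    (EX + "book/4", GENRE, TRAVEL), (EX + "book/4", PLACE, PARIS),
    (EX + "book/4", YEAR_FROM, 1900), (EX + "book/4", YEAR_TO, 1910),
}


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def find_books(genre, place, year_from, year_to):
    """Find books of a genre that describe a place within a range of years."""
    candidates = {s for s, p, o in TRIPLES if p == GENRE and o == genre}
    hits = []
    for book in sorted(candidates):
        if place not in objects(book, PLACE):
            continue
        starts = objects(book, YEAR_FROM)
        ends = objects(book, YEAR_TO)
        if not starts or not ends:
            continue
        # Keep the book if its described period overlaps the requested one.
        if min(starts) <= year_to and max(ends) >= year_from:
            hits.append(book)
    return hits


# "Travel books describing Paris in the mid-nineteenth century"
print(find_books(TRAVEL, PARIS, 1840, 1870))
```

Because the place is identified by a URI rather than the string “Paris”, the same query cannot be confused by Paris, Texas, and the identifier can be followed to other resources that use it.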

Scope for sharing, and the redundancy issue

I’m puzzled by the discussions that have taken place on data sharing and redundancy, and the feeling that this is a problem, rather than an opportunity.  My understanding is that libraries invented the concept of shared cataloguing decades before the rest of the world, and built systems which supported it.  I assume that ISO 2709 files on mag tape have been superseded by more efficient delivery mechanisms, and that the basic idea of shared cataloguing still holds sway?

With Linked Data, there is the prospect of implementing the “shared cataloguing” approach in real time, since a master catalogue record becomes a Web resource which can be dereferenced whenever it is required.  Any update to that record will be instantly available, removing the problem of latency when updating from a central authoritative source. Standard caching techniques can be used to distribute master records for efficient access.
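
As a rough sketch of what “dereference the master record, with caching” could look like, the following Python uses only the standard library. The class name, the five-minute default and the choice to ask for Turtle are assumptions made for illustration, not a description of any existing library system; a production set-up would more likely rely on ordinary HTTP caching infrastructure.

```python
import time
import urllib.request


def http_fetch(uri):
    """Dereference a URI over HTTP, asking for an RDF serialization (Turtle)."""
    request = urllib.request.Request(uri, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


class RecordCache:
    """Hypothetical local cache in front of a master catalogue (illustrative only)."""

    def __init__(self, fetch=http_fetch, max_age_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch            # how to dereference a record URI
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the cache can be tested
        self._entries = {}             # uri -> (time fetched, record body)

    def get(self, uri):
        """Return the record for uri, going back to the master copy when stale."""
        now = self._clock()
        cached = self._entries.get(uri)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]
        body = self._fetch(uri)
        self._entries[uri] = (now, body)
        return body
```

A local catalogue would call something like `RecordCache().get(record_uri)` each time it displays a record. The maximum age is the trade-off: a shorter value brings the local view closer to “instantly available” updates, at the cost of more requests to the master source.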

This seems like such a massive potential efficiency “win” for the library community that I would have expected it to feature in the report. Take Open Library, for example. If it succeeds in delivering “one web page for every book”, and then publishes those pages as Linked Data, isn’t that a resource which all libraries could make use of?

This raises a related point about “sharing”: it isn’t necessary for the library community itself to create all the Linked Data that relates to books.  Crowd-sourcing and other Web 2.0 techniques can be harnessed, as is already happening with projects like Galaxy Zoo and Old Weather, especially where the information to be gathered is more “popular” and less “technical” than traditional bibliographic descriptions.


One Response to Thoughts on library Linked Data

  1. It’s really valuable to have your thoughts, Richard (especially with the explicit rights to reuse them)!

    I agree with what you say — and especially with the point about reusing the data for readers. (On that note, I’ll plug a JCDL workshop that’s relevant: Semantic Web Technologies for Libraries and Readers http://stlr2011.weebly.com/ ) There’s lots of great data we already *have* — how can we make it serve readers?

    In your section about scope for sharing, and redundancy, you ask:
    “I assume that ISO 2709 files on mag tape have been superseded by more efficient delivery mechanisms, and that the basic idea of shared cataloguing still holds sway?” It’s meant rhetorically, I know.

    There are two issues: philosophical and pragmatic. First: while redundancy has many downsides, the upside (from the library/conservative perspective) is that “lots of copies keeps stuff safe”; we still need better architecture for localization (i.e. collection-dependent description) of LD. Second: it’s going to take some time before libraries adopt Linked Data catalogs in great numbers, not least because we still have to figure out how to make that inexpensive and drop-dead simple (to suit limited technical resources and small budgets).

    I think the suggestions you make are likely to find greater voice in the report as it develops from its current (early draft) state!
