The GB1900.org project – first look

I found out about GB1900.org from Twitter, and went across to have a look.  Well, more than just a look: I ended up spending the evening adding entries for my locality.  It gets to be surprisingly addictive when the places you are logging are ones that you know.

The aim of the GB1900 project (a collaboration between Portsmouth University’s ‘Vision of Britain’ project and a number of Welsh and Scottish institutions – no English ones, I notice, and the map scans actually come from the National Library of Scotland) is to crowd-source the transcription of place names and other features from the OS 6″ to a mile maps of around 1900.  This will provide an early-modern resource to complement the OS 50K Gazetteer (which is being withdrawn anyway), and will offer an order of magnitude more detail.  The results will be available under a Creative Commons Zero (CC0 1.0 Universal) licence.

After a simple signing-on procedure, you are let loose on a map of England, Wales and Scotland which overlays (if you zoom in and look carefully) a modern OpenStreetMap.  You can choose to start wherever you like (though there is an option to find your current location on the map).  You can either ‘tag’ names yourself, or verify names that have already been tagged by someone else.  Once a name has been entered the same way twice, it is marked as ‘done’.  You get a simple dialogue to enter the name, which also allows you to record alternative names for the place, your own memories, etc.

It’s a great idea, but I think it could be even better.
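The "entered the same way twice" rule can be illustrated with a minimal Python sketch. How the site actually compares entries is not described, so the whitespace normalisation, the threshold parameter and the function names `normalise` and `is_done` are all assumptions made for illustration, not GB1900's implementation.

```python
from collections import Counter


def normalise(name: str) -> str:
    """Collapse runs of whitespace (an assumed comparison rule; case is kept)."""
    return " ".join(name.split())


def is_done(transcriptions: list[str], required: int = 2) -> bool:
    """Return True once any one spelling has been entered `required` times."""
    counts = Counter(normalise(t) for t in transcriptions)
    return any(n >= required for n in counts.values())


if __name__ == "__main__":
    # Two matching entries (after whitespace normalisation) confirm the name.
    print(is_done(["Mill Lane", "Mill  Lane"]))
    # Entries that differ, even by a full stop, do not confirm each other.
    print(is_done(["Mill Lane", "Mill Lane."]))
```

Under a rule like this, a stray full stop or a different capitalisation would leave a name unconfirmed, which is one reason transcribers might want to see what has already been entered.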

Where’s the feedback?

My first, and main, gripe is that this looks suspiciously like yet another crowd-sourcing site which steals your work and hides it away.  Yes, you see pins appearing on the map where a name has been transcribed, but you never see the transcribed data (or any memories or alternative names, where these have been recorded).  They say the transcribers’ work will be released under CC Zero, but … when?  And how?

This strikes me as a massive missed opportunity.  Why isn’t there a live list of the most frequently entered names?  Or a search facility, with the results being displayed on a map view?  If transcribers could see the results of their work appearing in real time, they might be more inclined to continue transcribing.

As well as place names, there are physical features like fords and building types such as smithies.  If these were searchable, it might encourage the recording of what might otherwise look like trivial details, given the daunting scale of the overall project.

Quite apart from wanting feedback as a transcriber, I want access to the data as a user of historical data.  I want it now, and I want it as Linked Data.

Only one chance

Once an entry has been entered and confirmed, it can’t be accessed or updated.  This means that if someone did have an alternative name for a place, or an interesting story about it, they won’t be able to add this if the name has already been entered.  This is surely a design fault.

More help; more precision?

I assume that the coordinates that are recorded when we click on the first letter of a name will be recorded as part of the data to be released.  However, that doesn’t necessarily help with locating the actual feature which the name describes.  Couldn’t there be a feature where you can indicate (by a directional arrow or by more clicking) the centre or end-points of the feature described by the name?

In a similar spirit, names on these maps describe entities of different types and at different levels.  An option to indicate the type or level of name would potentially add considerable value.

Common features such as footpaths (F.P.) crop up with such regularity that there must be a case for providing a key to them.  I’m assuming, for example, that ‘W’ means ‘well’, but I could be wrong.

Providing more feedback and usable data from the start would, I think, vastly improve the prospects of this useful and ambitious project.

1 thought on “The GB1900.org project – first look”

  1. I am the person who originally proposed what is now the GB1900 system, and am leading the English end of it — but in a sense I agree with most of Richard’s criticisms and we certainly need to make data gathered by the system available more or less as it comes in (see below). As always, you need to understand the history.

    In May 2011, a meeting at the National Library of Wales in Aberystwyth discussed how to move forward a Welsh equivalent to the English Place Names Survey, but without taking quite so long (they started in 1923, but have still not begun work on some counties; see http://www.nottingham.ac.uk/research/groups/epns/survey.aspx). I suggested they focus on the first stage of the EPNS methodology, which is to gather all the place names on early Ordnance Survey six inch maps, but accelerate it through crowd-sourcing, and by working with geo-referenced images of the maps. The Welsh partner organisations (the Royal Commission on the Ancient and Historical Monuments of Wales, the University of Wales Centre for Advanced Welsh and Celtic Studies, the National Library of Wales and the People’s Collection Wales) then obtained limited funding from the Welsh Assembly and commissioned developers associated with the Zooniverse project to develop software, which worked with geo-referenced images the Royal Commission had already licensed from Landmark Information. The Cymru1900 system launched in the autumn of 2013:

    http://www.cymru1900wales.org/

    Meanwhile, I was trying to get funding for a more ambitious system for historical gazetteer-building through crowd-sourcing. Where Cymru1900 and GB1900 create a new gazetteer from a single set of maps, here the main aim would be to find more and more instances of the places already in an existing gazetteer, on many different maps from a range of dates. The first stage of this was actually funded, by the Joint Information Systems Committee (Jisc), as the Old Maps Online project, which assembled information about how to access many tens of thousands of online historical map images. However, Jisc no longer offers project grants, and applications to other funding bodies failed, partly because they did not believe there would be wide popular interest in transcribing place names on old maps.

    By the summer of 2015 potential funding sources were pretty much exhausted. Meanwhile, Cymru1900 had succeeded in crowd-sourcing the transcription of nearly 300,000 Welsh “place names” (as Richard points out, really text strings), but the software did not encourage users to do additional confirmatory transcriptions, so there was little prospect of it being “finished” while the Assembly’s funding for cloud hosting was running out; and, of course, Cymru1900 was limited to Wales. Given that the National Library of Scotland had separately scanned and geo-referenced all the relevant six inch maps for Great Britain, and that my team had a server and a little developer time available, the potential for a new project was obvious and GB1900 was born.

    However, it must be emphasised that GB1900 is completely unfunded, each of the partners contributing available resources. The available developer time limited us to getting the system off the cloud and onto our local server; making it work with the Scottish map server; and tweaking the actual crowd-sourcing process to encourage users to add those confirmatory transcriptions.

    Despite the limitations Richard points out, GB1900 has two immense virtues over the crowd-sourcing system described in my AHRC “Technical Plan” in 2013: it actually exists and, despite its limitations, it really works: in the last week, over 60,000 new transcriptions have been added and, in particular, while under 4,000 “confirmations” were ever done in Cymru1900, over 20,000 have been added in the last week. Further, the infrastructure inherited from Zooniverse has proved its worth: at one point on Saturday, 7 different users were adding places in a period of 30 seconds, at a rate of 1 to 3 a second; and one user’s additions show up automatically in the browser of someone else working on the same area, without a refresh.

    We don’t have the resources to make additional major changes to the software, and given some of the software’s fundamental limitations it probably wouldn’t make sense to put a lot of effort in even if we had the resources. I suspect that the Citizen Science Alliance, the organisation behind Zooniverse and Galaxy Zoo, don’t expect their “citizen scientists” to want to download “their” particular set of galaxies, but local and family historians do want gazetteers of particular local areas.

    Longer term, we hope to add data gathered by GB1900 to our Vision of Britain system, both to enable online gazetteer searches and to permit area-specific downloads (if you go here you will find a lot of the necessary functionality: http://www.visionofbritain.org.uk/data/#tab03). In the short term, the system already generates a nightly dump of all transcriptions as a CSV file, but this includes lots of unconfirmed names, and information about who contributed them. We are sorting out how to create a simpler dump limited to confirmed names and the associated coordinates. My aim is to have that available by the end of October, and to update it at least monthly (until I have discussed this with our partners, that commitment has to be limited to places in England).
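The "simpler dump" described in the comment above (confirmed names and coordinates only, without contributor details) could be derived from a nightly CSV along the following lines. This is only a sketch: the real dump's layout is not given, so the column names `name`, `lat`, `lon`, `confirmed` and `user`, and the convention that `confirmed` holds `1` for confirmed rows, are hypothetical.

```python
import csv
import io

# Hypothetical column names; the actual GB1900 dump format is not stated.
KEEP = ("name", "lat", "lon")


def confirmed_only(dump_text: str) -> str:
    """Keep confirmed rows and drop every column except name and coordinates."""
    reader = csv.DictReader(io.StringIO(dump_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=KEEP, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Assumed convention: "1" marks a transcription as confirmed.
        if row["confirmed"] == "1":
            writer.writerow({key: row[key] for key in KEEP})
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "name,lat,lon,confirmed,user\n"
        "Mill Lane,51.5,-1.2,1,alice\n"
        "W,51.6,-1.3,0,bob\n"
    )
    print(confirmed_only(sample))
```

Dropping the contributor column at this stage keeps personal information out of the public file, while the full dump remains available internally.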
