I started to compile a list of library news from 2011 and was sidetracked into thinking about the various crowdsourcing projects that have come about over the last couple of years.
Galaxy Zoo was first released in 2007, and while its creators expected it to take two years to classify the million galaxies, within 24 hours they were receiving 70,000 classifications an hour. The Galaxy Zoo project has since branched out into the Zooniverse collection of crowdsourcing projects (currently 11 of them), mostly astronomy-focused but now expanding into climate change, biology, and the humanities. This year we will be launching our own Zooniverse project, What’s the Score at the Bodleian?
As libraries and archives saw the power of crowdsourcing, they jumped on board and began releasing projects of their own. The best known of these from the .edu/.ac world is probably Transcribe Bentham. Released in April 2010 with the goal of transcribing a portion of Jeremy Bentham’s papers, it is now 42% of the way through the 5,580 manuscripts uploaded to the website. Like a lot of digital projects, though, Transcribe Bentham only had funding for one year, and the team is now searching for ways to remain sustainable.
Nonetheless, crowdsourcing has spread in the library world, and a number of new projects were launched in 2011. NYPL’s What’s on the Menu? has been quite popular, and with Digitalkoot the National Library of Finland ‘gamified’ the kind of newspaper OCR correction that the National Library of Australia started in 2009. And just this month the US National Archives released its Citizen Archivist Dashboard for transcribing and tagging digitized archives.
I think we are likely to see even more library-based crowdsourcing projects in 2012, but I would like to have an honest conversation about the successes and failures of these endeavors. We need to discuss what we really want out of these projects and whether they are worth the money. By some accounts (although I can’t find the citation now), for what it has spent to date UCL could have hired a team to transcribe Bentham’s manuscripts and would be further along. However, if the goal is not just to transcribe Bentham’s papers but to build a community of people around the archive, and to create amateur and young Bentham scholars, then that is a different kind of success.
Looking around the world of crowdsourcing (both online and offline), I think there are some important lessons that libraries need to heed if they want to engage users in these projects.
1. Stop being obsessed with context. Library and archive projects tend to give users an artifact (a manuscript or a page) and ask them to transcribe it. But this can feel a lot like work. Some of the most successful projects I have seen, namely Digitalkoot and reCAPTCHA, do just the opposite: they provide snippets or single words for people to transcribe. This allows the tasks to be completed entirely out of context, in the former as a game and in the latter as proof that you are human. If the goal of the project is the quantity of transcribed material (rather than community building or engagement), I think this is important.
2. Reward your users. It doesn’t take much, and I think the Zooniverse projects do this nicely. They tell you right up front how your work will help, and as participants in Old Weather transcribe shipping logs they progress up the ranks of the ship to Captain. You can also lose your Captain status if you don’t stay active. In the original Galaxy Zoo project, each of the contributors was named as a co-author on one of the resulting papers, and active participants continue to be listed as co-authors occasionally. One of the rewards of NYPL’s menu project is that the data are made available to the public, and the library encourages their re-use.
3. It’s OK to fail. As I mentioned above, there are reasons to run a crowdsourcing project beyond just getting the data. And sometimes you will get things you would never expect. In 2010 Gap famously released a new logo that was almost universally hated. The company quickly tried to turn lemons into lemonade by ‘crowdsourcing’ the design of a new logo. You just can’t buy the sort of publicity that the whole discussion generated. Chiquita let the public both design and vote on its banana stickers. This may not have sold more bananas, but my guess is that it built a community around the product.
4. Remember that crowdsourcing is a form of engagement, and it can happen offline as well as online. Tate Britain does this very well with its ‘Your Collection’ pamphlets (which are also online, but I first saw them in the gallery and so think of them as analogue). Pamphlets scattered around the gallery offer interesting paths through the collections (e.g., the ‘rainy day’ collection or the ‘I’ve just split up’ collection). There are also blank pamphlets that let you build and name your own collection. Lovely, simple, and analogue.
So either 2012 will be the year of crowdsourcing in libraries and archives, or it won’t. If we are lucky, we will get some well-described and/or transcribed library collections. But if we are successful, we will engage more users than we did last year and build communities around some of our library collections.