Martin Poulter
Wikimedian In Residence at the University of Oxford

1. Introduction

DIY digitization is a great opportunity to expand the cultural impact of special collections, particularly when the results are shared on open communities such as Wikipedia. There must, however, exist a mutual understanding between the host institution and volunteer contributors. This means that institutions need to understand and respect the culture of openness that drives the success of those communities.

This chapter approaches the topic obliquely. We will look at how Wikipedia and related resources are built using crowdsourced effort from a global community, why this might drive people to seek out and photograph items from library special collections and what this means for the relationship between the library and its patrons.

2. The Wiki way of working

Wikimedia is a charitable project working towards a world “in which every single human being can freely share in the sum of all knowledge”.[1] It takes the form of eleven freely available “content projects”. These include Wikimedia Commons, a repository of photos, video clips and other digital media; Wikibooks, which creates textbooks and tutorials; and Wikisource, the free library, which transcribes out-of-copyright text. The best known of these projects is Wikipedia, the free encyclopaedia. Wikipedia is, according to LaFrance, the most popular informational web site and in some languages is the biggest and most popular web site.[2]

Wikipedia’s role is not to compete against the knowledge produced by traditional scholarly sources but to bring that knowledge to the attention of a mass audience. It has a “no original research” rule: the information is not meant to be the contributors’ own opinions or observations but should summarize and derive its reliability from authoritative published sources. According to Casper Grathwohl of Oxford University Press, Wikipedia is “an ideal bridge between the validated and unvalidated Web,” helping readers respond to information overload by enabling “pre-research”.[3]

Having no adverts, commercial influences or paid content creators, Wikimedia is dependent on donations and volunteer work. There are many millions of registered accounts and around 80,000 regular contributors. The working methods of this community naturally differ from the way in which a project would normally be run in the commercial or public sectors. Everything has to be done according to what might be called a DIY ethos. The term “knowledge philanthropy” has been coined for this kind of activity: a hobby in which people learn and freely share their learning so that others can benefit and build on it.[4]

Knowledge philanthropy gives a new motivation for people to spend time in a library and, especially, seek out its globally unique holdings. The kind of person who most needs a library is someone writing an encyclopaedia or a textbook. Wikipedia reports 70,000 active users and tens of millions of user accounts, which form a large community of people whose hobby is improving an encyclopaedia. Some of the contributions are textual but this hobby can also take the form of taking photographs and uploading them for use in educational resources.

Wikipedia’s tagline is “the free encyclopaedia”, Wiktionary’s is “the free dictionary”, and other Wikimedia projects similarly tout their “free” status. “Free” here reflects not just gratis (free as in “free beer”) but libre (free as in “free speech”). All the Wikimedia projects are “free cultural works”, a term originating in the open-source software movement.[5] This means readers are allowed to republish them and reuse them, including altered versions. Cultural works get this status either by being old enough to be out of copyright or by their creators deliberately giving them a licence which permits use as a Free Cultural Work. In practice this means a Creative Commons Attribution (CC-BY), Attribution-ShareAlike (CC-BY-SA) or CC-Zero (CC0) licence.

“Free” is not a gimmick but essential. Wikipedia depends on people correcting, expanding, and translating existing text, or enhancing images. Remixing is taken for granted in the open culture movement: anything might be edited and nothing is in a final state. It is axiomatic that contributors can alter an article or image without prior permission from its creators. Require each change to be signed off by a manager or a lawyer and resources like Wikipedia simply could not exist. The library sector has a great opportunity in the form of this large, very engaged community who have an interest in seeking out and freely sharing out-of-copyright cultural material. The DIY ethos can, however, seem alien and scary to some heritage institutions who are under pressure to find and keep control of commercial uses.

There already exists a flourishing global “GLAM-Wiki” community consisting of formal and informal working arrangements between Wikimedia contributors and professionals in galleries, libraries, archives and museums. Sometimes this involves the institution hiring a Wikipedian or Wikimedian In Residence (WIR) and adopting Wikimedia-compatible policies on intellectual property. DIY digitization is a smaller-scale collaboration but it fits neatly into the ethos and working practice of the Wikimedia community. To illustrate this, we will look at some closely related activities.

3. The crowd and cultural heritage

To create the sum of all knowledge is an enormously ambitious goal. Break it down into smaller sub-goals and each of those could be an expensive and ambitious project. To take some examples:

  • Create an article about each notable book or document (“notable” meaning “written about in multiple scholarly sources”), with key facts such as publication date, original language and a summary of its content.
  • Photograph monuments, public art and architecture around the world, upload the photographs to a shared repository and tag them with a description and location.
  • Create biographies for every notable author, from every era and culture.
  • Document notable objects from museums around the world and upload the photographs with descriptions.

One way Wikimedia deals with this is by taking a long-term “eventualist” view. There is no project plan breaking things down into stages; instead, things happen when a volunteer gets around to doing them. This means that things take a long time to happen and they happen unevenly. The architecture of English cities is documented in higher detail than monuments in rural India.[6]

Another way to handle large tasks is to break them down into smaller tasks, spread across many contributors. Consider the example of creating an image repository of monuments and architectural features from around the world. There is no formal funding for this project, no formal plan and no paid workforce to carry it out and yet it is part of the Wikimedia community’s stated goal. What the Wikimedia community does is run an annual campaign called “Wiki Loves Monuments”.[7] Although they are not evenly spread, Wikimedia does have volunteer contributors all over the world, many of whom own cameras. For almost any notable monument, there is someone in the region who can take and share a photograph. Some of these enthusiasts are very dedicated photographers—even professionals—but amateurs can and do contribute useful content with simple equipment such as smartphones.

Naturally, the technical quality of the resulting photographs varies wildly and may be very poor. This does not harm Wikimedia, because the presence of a photo on Commons does not commit anyone to using it. The authors of educational materials, on Wikipedia or elsewhere, have the option, not the obligation, to use these images. The purpose of the online competition is to identify and showcase the best photographs in terms of technical, aesthetic and educational quality. It thanks and encourages the best contributors, rather than labelling poor-quality images negatively.

Users do not have to upload directly to Wikimedia Commons. An alternative is to share on Flickr with a Wikimedia-compatible Creative Commons licence. An automated process (a “bot”) transfers images and their attribution from Flickr to Commons on request of a Commons user. This automatically reads the licence tag to verify compatibility.
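
The licence check performed by such a bot can be pictured with a minimal sketch. This is not the actual bot's code: the function names and the licence labels are hypothetical, chosen only for illustration, and the set of acceptable licences is simply the one described in this chapter (CC-BY, CC-BY-SA and CC0, with non-commercial and no-derivatives variants excluded).

```python
# Hypothetical sketch of a licence-compatibility check; not the real bot's code.
# The licence labels are illustrative identifiers, not an official API.

FREE_LICENCES = {"CC-BY", "CC-BY-SA", "CC0"}


def is_wikimedia_compatible(licence: str) -> bool:
    """Return True if the licence label allows reuse and adaptation by anyone."""
    # Normalise case and spacing so "cc by-sa" and "CC-BY-SA" compare equal.
    label = licence.strip().upper().replace(" ", "-")
    return label in FREE_LICENCES


def select_transferable(photos: list[dict]) -> list[dict]:
    """Keep only the photos whose 'licence' field is compatible."""
    return [p for p in photos if is_wikimedia_compatible(p.get("licence", ""))]


if __name__ == "__main__":
    sample = [
        {"title": "Title page", "licence": "CC-BY-SA"},
        {"title": "Binding", "licence": "All rights reserved"},
        {"title": "Frontispiece", "licence": "CC-BY-NC"},
    ]
    for photo in select_transferable(sample):
        print(photo["title"])
```

In this sketch, anything carrying a non-commercial or no-derivatives clause, or with all rights reserved, is simply left where it is rather than transferred.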

Wiki Loves Monuments is a relatively high-profile campaign. There are similar, smaller initiatives to visually document particular locations or topics. Wikipedia Takes Coventry got members of the public to take and share more than 1500 photographs of architecture and interesting sights in the vicinity of the Herbert Museum and Art Gallery.[8] A monthly competition on Wikimedia Commons invites photographs relating to a given word or phrase such as “wheels” or “religious practices”, resulting in multiple interpretations from around the world. This expands the range of images available to illustrate those concepts in Wikipedia or in other educational materials.[9]

For an example of how museums are welcoming the crowd (and a close model for DIY digitization in libraries), consider “backstage pass” events for Wikipedians. This is an event where a museum invites in a group of Wikimedia contributors to photograph its exhibits, usually giving them access to objects not normally on public display. There is an understanding that these photographers will share metadata and descriptions along with their photographs, and they have access to the description cards and museum staff to get information about exhibits.[10]

The appeal of these events to the host institution is in having an event with a highly engaged audience who are keen to learn about the exhibits and write about them for a mass audience, as well as increased cultural reach as shared images and articles raise awareness of the collection. Sara Snyder of the Archives of American Art described a backstage pass event as “one of the most inspirational days I have ever had at my job”.[11]

A museum event provides an unconventional example of a document that reached a mass audience via DIY digitization. The British Museum holds the Cyrus Cylinder, a clay cylinder from the sixth century BCE with cuneiform inscriptions, which some sources regard (though controversially) as the first declaration of human rights. Wikimedia Commons hosts a photograph of the cylinder taken by Mike Peel, an academic from Manchester who contributes to Wikipedia and Commons in his spare time. This image is now used in more than a hundred pages on Wikipedia and related projects. These include the French language article on human rights, the English article on 539 BCE and the Farsi article on cultural icons.

Photograph by Mike Peel. Modifications by مانفی. CC-BY-SA 4.0

Just as Wikipedia records past versions of each article in a “page history”, Commons has past versions of each image. The history for this image shows that the version presently visible is not the one uploaded by Peel. A user with a Farsi username made small adjustments to the crop and colour balance. So the appearance of the image in, say, the Bosnian language article on the Achaemenid Empire results from at least three volunteers with different skills and opportunities. One person had the camera, the photography skill and physical access to the object. Another user, someone who likely never met the photographer and is probably based in a different continent, optimized the image for use in an encyclopaedia. Someone else with the relevant language skills chose that image (from amongst several photographs of the Cyrus Cylinder on Commons) and added it to the article with a suitable Bosnian caption.

This example—not at all unusual—shows that the person who comes through the door to take the photograph might not be the person who uses the image in a Wikipedia article, or even uploads the final version of the image. When that person is a Wikimedia contributor, they are one part of the community that makes that knowledge and culture available and accessible. It also shows how “remixability”, made possible by the open licensing of the photograph, is crucial to the working process within Wikimedia.

People can approach knowledge philanthropy for different reasons. For some the co-operative aspect—sharing something which others find useful and can build on—attracts them to this work. Others might take a competitive approach, seeking to upload an impressive number of photographs and illustrate an impressive number of articles. Either way, there is a strong incentive for the contributor to connect the digital file they upload to its source; to make clear the location and provenance of the photographed object. With this attribution, the uploader can signal to the rest of the online community that they are contributing something valuable and distinctive.

We have seen how Wikimedia inspires and enables a global community to seek out and photograph interesting artefacts of many kinds. This naturally includes books and documents.

4. Writing about books on Wikipedia

As of Summer 2016, the English language version of Wikipedia has 20,000 articles about books.[12] Anything which is written about in multiple reliable sources can have a Wikipedia article, so there could be a great many more articles about books, both present and historic.

The pursuit of quality on Wikipedia gives contributors an incentive to seek out interesting images related to the topic. The connection, however, is complicated. Although not immediately obvious to most readers, there is a quality scale of Wikipedia articles, with the rating for an article located on its associated “Talk” page. The lowest rating is “Stub”. Presently, about half of Wikipedia’s book articles are at this level. At the top end of the scale are “Good Article” and “Featured Article” ratings. These involve formal review processes in which Wikipedians uninvolved with writing the article assess it against a list of specified criteria. If the article passes, it is badged, gets greater protection from vandalism and is added to showcases, which might involve being temporarily linked from the front page. The article’s main authors can also add badges to their own profiles to show that they have contributed some of Wikipedia’s best content. The GA and FA criteria are demanding: only about 1.5% of the articles about books are at these levels.

One criterion for GA and FA is that articles are illustrated with relevant images.[13] The interpretation of “relevant” is left up to the authors of each article. For a historic book this could mean:

  • a cover or title page, ideally of a first edition or illustrated edition;
  • an excerpt from an early draft or author’s notes related to the book;
  • illustrations or figures, especially if they are notable in their own right;
  • illuminated letters or pages;
  • an author’s portrait from around the time of the book’s publication; or
  • cultural responses to the book, which might include other books or pamphlets, cartoons, posters or letters.

The article about Alice’s Adventures in Wonderland, for example, is illustrated with some of John Tenniel’s character drawings, the cover and title pages of the original 1865 edition, a page from an early manuscript by the author and illustrations from various editions.[14] The articles about Thomas Paine’s The Age of Reason and Mary Wollstonecraft’s Original Stories from Real Life use title pages from other books that influenced or responded to the article’s subject.[15]

A caricature of Thomas Paine by George Cruikshank, used in the Wikipedia article on The Age of Reason. Public domain.

For these purposes, Wikipedia does not need a scan of the whole book. In fact, it cannot use a complete scan, although sister projects including Wikimedia Commons and Wikisource would have a use for it. For the Wikipedia article, it may be enough to have one visually interesting page. The scan need not be archive-quality; for the main purpose of illustrating an online encyclopaedia, a photo from a mobile phone may well be suitable.

Given Wikipedia contributors’ interest in moving articles about books up the quality scale, they have an ideal reason to visit a library’s special collections. They will not be seeking out just any copy of a book but the rarest and earliest edition or the copy with the most tangible connection to the author. The same may happen when someone is writing about an author, or about cultural or political phenomena. The article on the “Islamic Golden Age”, for example, contains pages from several relevant manuscripts.[16]

The same image may get many uses. For instance, the title page from the original edition of On the Origin of Species (in the form of a scan from the University of Sydney) is used hundreds of times across 65 different language versions of Wikipedia. On the English version, it is used not just in the article about the book itself but in the articles “1859 in literature”, “1859 in science”, “Darwin Centennial celebration”, “Publication of Darwin’s theory”, “Bibliography of biology” and “Dates of Epoch-Making Events”.[17]

Each of these uses can in turn reach a large audience, sometimes more than a million per year. Here are some view statistics for English Wikipedia articles about books, for a 30-day period. They include visits by people but not by programs such as web search robots.

Views for the English Wikipedia article about each book over 30 days (to nearest hundred)

Book                                    Views
The Picture of Dorian Gray              327,400
To Kill a Mockingbird                   130,300
Alice’s Adventures in Wonderland        128,700
Lord of the Flies                        73,800
Uncle Tom’s Cabin                        41,600
On the Origin of Species                 31,000
Sir Gawain and the Green Knight          12,600
Original Stories from Real Life           7,200
The Age of Reason                         6,400
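
To relate these 30-day figures to the yearly audiences mentioned above, they can be scaled up. The following is a rough sketch only: it assumes a constant daily rate of viewing, which real traffic does not follow, and the function name is a hypothetical one chosen for illustration.

```python
# Rough annual estimates from the 30-day figures in the table above.
# Assumes a constant daily viewing rate, which is a simplification.

VIEWS_30_DAYS = {
    "The Picture of Dorian Gray": 327_400,
    "To Kill a Mockingbird": 130_300,
    "Alice's Adventures in Wonderland": 128_700,
    "Lord of the Flies": 73_800,
    "Uncle Tom's Cabin": 41_600,
    "On the Origin of Species": 31_000,
    "Sir Gawain and the Green Knight": 12_600,
    "Original Stories from Real Life": 7_200,
    "The Age of Reason": 6_400,
}


def estimate_annual_views(views_30_days: int) -> int:
    """Scale a 30-day count to 365 days at the same daily rate."""
    return round(views_30_days * 365 / 30)


# Titles whose estimated yearly audience exceeds one million views.
over_a_million = [
    title
    for title, views in VIEWS_30_DAYS.items()
    if estimate_annual_views(views) > 1_000_000
]

if __name__ == "__main__":
    for title in over_a_million:
        print(title, estimate_annual_views(VIEWS_30_DAYS[title]))
```

On this rough estimate, the first three titles in the table would each exceed a million views per year.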

Part of what attracts people to knowledge philanthropy is an opportunity for personal expression. When someone is reproducing already-existing material, they are not being creative in the same sense as someone who writes an article. Nevertheless, they can express themselves in their choice of text to focus on. Wikisource, the free library, has a collection of texts relating to slavery. While the community has transcribed many anti-slavery texts such as Harriet Beecher Stowe’s A Key to Uncle Tom’s Cabin or Harriet Ann Jacobs’ Incidents in the Life of a Slave Girl, dozens of defences of slavery lie blank, waiting to be transcribed.[18] Understandably, volunteers find more appeal in working on texts that were on the right side of history. At least part of their motivation is the feeling of connection to that historic struggle.

We should expect the same forces to shape DIY digitization. The choice of what to digitize will be driven by personal interests and feelings of connection with the past. This will not correspond with the priority of a library to preserve all the learning and culture of a period.

Knowledge philanthropy drives a great interest in the special collections of libraries around the world. It has its own preferences and biases but it appreciates the value of rare or unique cultural artefacts. Poulter argues that free availability of the digital replica does not substitute for interest in the physical object but instead expands the pool of people who appreciate the significance of, for example, the private papers of a historical figure and are willing to come to the library for an experience that the digital cannot capture.[19]

Formal digitization projects can meet some of this demand if the digitized material is released under suitable licences and terms for copying to Wikimedia. The resources for formal digitization are tiny, however, compared to the number of images that could usefully be added to Wikimedia. The demand outstrips what libraries can service. This mismatch between supply and demand could be seen as a problem but DIY digitization turns it into a positive; it turns demand into supply. If knowledge philanthropists are keen to informally digitize and share cultural treasures and are able to do so with the co-operation and mutual respect of institutions, everybody gains (“everybody” having the widest possible meaning). This can succeed or fail depending on how we frame the relationship between the library and the digitizing patron.

5. Crowdsourcing and the library-patron relationship

Knowledge philanthropy drives some of the most popular web resources and also drives some interest in special collections. In reacting to this, libraries can choose to muffle that enthusiasm or to foster it. This reaction will define the relationship between the library and its patrons. That relationship could be adversarial or co-operative. Doctorow’s advice to museums is no less applicable to library special collections: “Stop telling your patrons to put their cameras away […] You can’t convey the mission of cultural preservation and communication to an audience whom you are prohibiting from preserving and communicating their interactions with culture.”[20]

If a collection is valuable and its value is not appreciated by everyone, one natural conclusion is that it needs to be protected. Sensible as this statement is, in an overly risk-averse context it could lead to treating patrons with an adversarial mindset. We should not be distracted from another natural conclusion: that those who appreciate our collections are valuable to us and we to them. Mutual appreciation is the ground of a very positive working relationship.

As we have seen earlier in this chapter, the library and the digitizing patron are not the only stakeholders. There is also the distal, possibly global, community with an interest in images of cultural heritage and in putting them to educational use. This is especially the case when the library holds cultural treasures from other countries. Most editors of Japanese Wikipedia, the people best placed to contextualize a piece of culture for a large Japanese-language audience, will never step through the doors of a library in the UK. Yet libraries in the UK hold, for various reasons, significant cultural treasures from Japan. Again, we can see this audience as those from whom collections should be protected or we can recognize that, as people who appreciate the value of our collections, they are natural and valuable allies.

Flickr presents contributors with a variety of licensing options, including several flavours of Creative Commons as well as All rights reserved. In the Bodleian DIY Digitization Flickr group, it has been common so far for users to choose All rights reserved. This may seem to involve least risk but consider what it means in practice. Libraries have preserved and catalogued a cultural treasure, perhaps for hundreds of years, often thanks to charitable funds as well as public taxation. The person who gets access to the out-of-copyright work takes a photograph of it. It is not certain that this photograph is legally a separate work with its own copyright status; assume for the sake of argument that it is. That copyright is owned by the person who made the derivative work, the photographer. Reserving all rights, that patron shares the image for public view but forbids anyone from making use of it (adapting it, incorporating it in educational materials, redistributing it) without getting permission.

In sum, the patron asserts a private benefit for themselves. They have this benefit only because, as a society, we consider it important to preserve cultural heritage and have put in considerable investment to do so, and because that person lived in a part of the world where they could take advantage of the opportunity. Even the library that preserves the book or document has to request permission to make use of the photograph.

What is wrong with this? Although the event of photographing a document is an opportunity to create social value, this arrangement creates the minimum possible value. For a start, it locks the images away from the majority of the people who could find a legitimate use for them. In theory, anyone could ask for permission to republish or adapt an image. In practice, getting that permission is going to be difficult or impossible. People forget to check their Flickr email address, they lose their Flickr password, or for other reasons become unreachable.

To give advance permission just for the beneficial uses of an image requires us to anticipate all those uses, which no one can do. Earlier in this chapter, we saw that Wikimedia uses one image of the Cyrus Cylinder on more than a hundred pages. No one person thought of all those uses and they were not anticipated at the time the photograph was taken. Anything less than an open-ended free licence will restrict the value, for society as a whole, that could be created by sharing the image.

The photograph may well not even be copyrightable. Petri’s comments about museum exhibits are even more applicable to out-of-copyright publications in library special collections.[21] Using copyright to restrict the use of photographic copies is probably not viable and institutions have other ways, including trademark law, to protect items they can use commercially. Even in the case of UK libraries, UK law may not be applicable given that Wikimedia and other popular image-sharing sites reside on servers in the United States.

The most obvious way to use the image for the widest public benefit is to share it through Wikimedia. The previous section explained that DIY digitization will produce many images that Wikipedia hungers for. We have seen that Wikipedia and the Wikimedia projects rely on text and images that are usable by anyone for any purpose. Non-commercial or no-derivative clauses are unacceptable; that content will just not be visible on Wikimedia platforms. We have seen that this is integral to the way these communities work, where everything is collaborative and continually being remixed but where everybody is entitled to credit for their contributions.

An alternative approach would be to encourage an ethos of “paying it forward”: to tell patrons, “You have had a rare opportunity to interact with a precious cultural artefact, now share that opportunity freely with others.” This is not suggested as a form of words but as a message that should be implicit in the way digitizers are encouraged and supported. Patrons could be allowed to take photographs on the condition that they share them under a free licence through a platform like Flickr or Wikimedia Commons, with attribution in the form preferred by the library. This is a message based on reciprocity: not “You are lucky to be here, and we have to minimize the damage you might do,” but “We are giving you an opportunity and we have a right to expect this favour in return.”

We should start from the assumption that patrons want the photographs they share to have the greatest possible value in the communities where images are shared. One aspect of this is good metadata about the digitized object; informative titles that help people use the image, as well as location and shelfmark that will allow others to track down and find more about the physical object. Patrons don’t always know how to include this information with their uploaded photographs. Here the library can actively help them as part of their positive working relationship.

At the Bodleian, we want images to be used in many languages, not just English. Translating image descriptions into multiple languages was too much to take on but we managed to come up with a one-line attribution, “This file comes from the Bodleian Libraries, a group of research libraries at Oxford University.” Staff and student contacts helped translate it into a total of fourteen languages. This has been coded into a template which can be added to images on Wikimedia, showing readers the attribution statement in their own language, if it is available.[22] Libraries could learn how to create this kind of template and show patrons how to attach it to their uploads.

Commons has institutional tags that can be added to show where the physical copy of a depicted art work resides. There are also categories which can be added to files to show their origin. These tags can be read by open source tools which count the uses of an image and page views on the articles that use it.[23] It is in libraries’ interest to learn how these work and to share this knowledge with DIY digitizers. This is a small amount of work for the library but its effect is amplified by the work of knowledge philanthropists. An enormous cultural impact—the fulfilment of the potential of special collections—is the potential reward.



[1] Wikimedia, ‘Vision Statement’; available at [last accessed 05/08/2016].

[2] Adrienne LaFrance, ‘The Internet’s favourite website’, The Atlantic (20 May 2016); available at [last accessed 05/08/2016].

[3] Casper Grathwohl, ‘Wikipedia Comes of Age’, The Chronicle of Higher Education (7 January 2011); available at [last accessed 21/06/2016].

[4] The term was coined by Jack Herrick, founder of WikiHow (a knowledge-sharing community that is similar to, but not connected with, Wikimedia). See Hilda Bastian, ‘Are You a Knowledge Philanthropist? If Not, Why Not?’, Scientific American Blog (18/07/2012); available at [last accessed 05/08/2016].

[5] Defined at [last accessed 05/08/2016].

[6] The distinction between a formally planned project and the work of Wikimedia communities parallels the distinction between artificial computers and slower but much more energy-efficient natural computers such as DNA. This analogy is explored in Martin Poulter, ‘Crowdsourcing: the Wiki Way of Working’, Jisc Guides (26 September 2014); available at [last accessed 05/08/2016].

[7] Available at [last accessed 05/08/2016].

[8] Available at [last accessed 05/08/2016].

[9] Available at [last accessed 05/08/2016].

[10] For examples of write-ups from backstage pass events, see Liam Wyatt, ‘Backstage Pass and its achievements’, Wittylama Blog Post (13 June 2010); available at [last accessed 05/08/2016]. See also Derek Lieu, ‘How the Smithsonian is Helping Wikipedia’, The Chronicle of Philanthropy (11 August 2011); available at [last accessed 05/08/2016].

[11] Sara Snyder, ‘“Wikipedia is made of people”: Revelations from Collaborating with the World’s Most Popular Encyclopaedia’, Outreach: Innovative Practices for Archives and Special Collections, ed. Kate Theimer (Plymouth: Rowman & Littlefield), pp. 91–106 (p. 96).

[12] Available at [last accessed 05/08/2016].

[13] These criteria are available at [last accessed 05/08/2016].

[14] Available at [last accessed 05/08/2016].

[15] Available at and [last accessed 05/08/2016].

[16] Available at [last accessed 05/08/2016].

[17] Available at [last accessed 05/08/2016].

[18] Available at [last accessed 05/08/2016].

[19] Martin Poulter, ‘Shiver-inducing contacts with the past’, CILIP Update (Chartered Institute of Library and Information Professionals, November 2015); available at [last accessed 05/08/2016].

[20] Cory Doctorow, ‘GLAM and the Free World’, Museums and the Web 2013, eds. N. Proctor and R. Cherry (Silver Spring, MD: Museums and the Web, 19 February 2014); available at [last accessed 28/07/2016].

[21] Grischka Petri, ‘The Public Domain vs. the Museum: The Limits of Copyright and Reproductions of Two-Dimensional Works of Art’, Journal of Conservation and Museum Studies, 12.1 (2014), art. 8; available at [last accessed 05/08/2016].

[22] Available at [last accessed 05/08/2016].

[23] Available at [last accessed 05/08/2016].

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.