The collaboration between Google and the Complutense to scan books and make them available for free

30 November 2012, 13:35

Google has scanned more than 20 million books. You can search every word on every page and download the books that are in the public domain.

The Complutense is the first non-English-speaking partner to join the project, which involves some of the world's leading libraries:

National libraries such as the British Library and those of Italy, Holland and Bavaria.

Universities such as Michigan, Harvard, Oxford, California and Cornell.

The world's largest public library, the New York Public Library.

Google created several digitization centers. Some scan more than 4,000 books a day.

The Complutense has digitized 120,000 books with Google at the Madrid scanning center, creating the largest Spanish collection of scanned ancient books.

The Complutense University of Madrid and its Library

30 November 2012, 13:35

The Complutense University of Madrid is the largest university in Spain, with over 80,000 students and 6,000 scholars.

The university library is the second largest in Spain by number of books, after the National Library.

Today we have more than 3 million books, 34 branch libraries, 11,300 reading seats, 1,500 computers and 411 librarians.

Our commitment to collaboration for digitization and dissemination of scientific production and heritage

30 November 2012, 13:34

Like other libraries, the Library of the Complutense University of Madrid has had to redefine its strategy for disseminating knowledge and research.

To serve our own academic community, as well as the global academic community and the general public, we were aware that the only viable way to make this work was through collaboration with public and private institutions.

- Thus, our historic collections have been digitized in collaboration with public institutions and agencies (Spanish Government, Regional Government of Madrid, Madrid academic libraries consortium, Europeana, HathiTrust, Internet Text Archive) and with private institutions (Google, Santander Universities, Health Sciences Foundation and Editorial Extramuros).

- Our scientific output has been digitized in collaboration with Banco de Santander and distributed through alliances with publishers and distributors: Springer, Thomson Reuters, ProQuest and E-Libro.

Complutense Digital Collections: a) Academic works

30 November 2012, 13:33

In order to carry out this strategy, we have developed three lines of action.

The first line, academic works, has more than 25,000 digital dissertations and two open access portals with 30,000 articles from journals published by our university and 11,000 e-prints.

Complutense Digital Collections: b) Materials to support teaching and research

30 November 2012, 13:32

The second line is materials to support teaching and research.

We have more than 400,000 newspapers (with restricted access, of course) and several open access portals with photographs from the Spanish Civil War, 18th-century drawings from the School of Arts, 19th-century Japanese prints, and more.

Complutense Digital Collections: c) Ancient books and cultural heritage

30 November 2012, 13:32

The third line, which is the specific subject matter of this virtual exhibition, is building collections through digitization of our cultural heritage to make it universally accessible and properly preserved.

We now have 125,000 scanned books, which you can download freely from the Internet.

Status of Complutense ancient books digitization in 2006: the Dioscorides Digital Collection

30 November 2012, 13:32

Reaching this number of digitized ancient books has not been easy. Six years ago we had only 2,800 scanned books.

Although this collection was the largest in Spain, we were far from satisfied:

  • The pace of digitization had been slow, which made it very difficult to extend the project to include all our cultural heritage within a reasonable period of time.
  • At that rate it would have taken us more than 400 years to scan the number of books we scanned with Google in 3 years.
  • Dissemination on the Internet was poor owing to the local nature of the project and the portal's deficiencies: it was not multilingual, was not adapted to social web developments and had no long-term digital preservation.
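
The 400-year estimate implies an earlier pace of fewer than 300 books per year. A quick check, using only the figures quoted in this exhibition (the variable names are illustrative, and the calculation assumes a constant pace in both periods):

```python
# Figures quoted in the text.
books_scanned_with_google = 120_000   # scanned with Google in 3 years
years_with_google = 3
years_at_old_pace = 400               # "more than 400 years" at the earlier pace

# Annual throughput with Google.
google_rate = books_scanned_with_google / years_with_google

# Upper bound on the earlier pace implied by the 400-year estimate.
implied_old_rate = books_scanned_with_google / years_at_old_pace

print(f"With Google: {google_rate:,.0f} books per year")
print(f"Earlier pace: under {implied_old_rate:,.0f} books per year")
print(f"Speed-up: more than {google_rate / implied_old_rate:,.0f}x")
```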

It was therefore necessary to find an alternative digitization model and migrate to a large-scale model that by then was already being implemented by Google.

The Complutense University of Madrid - Google Agreement

30 November 2012, 13:31

A partnership agreement was signed with Google in 2006 for digitizing our public domain collections and offering free online access.

In 3 years we have scanned 120,000 books.

To date, Google has scanned more than 20 million books around the world. Over 80% of them come from the libraries participating in the program.

These books have been processed with OCR and included in Google's general index: any word on any page of these 20,000,000 books can be retrieved when someone searches in Google.

The project's success is evident when you consider that every 6 months over 98% of the books have been visited at least once, something many librarians have always dreamed of.

A controversial project

30 November 2012, 13:31

This is, nonetheless, a controversial project.

For some people:

- It is a violation of the rights of authors and publishers.

- It carries a risk of monopolizing access to the content of books.

- It transfers public cultural heritage to a commercial company.

- The scanning quality is insufficient: in their view, Google produces poor images and even worse OCR.

For others:

- It was a unique opportunity to democratize knowledge by digitizing the content of millions of books, which would otherwise have been impossible in a reasonable time.

- It created a free and easy tool that lets you search the contents of the books and download them for reading.

- The scanning quality is good enough. In our case, digitization with Google is a considerable improvement on what other companies have done with our collections.

- It has created public collections of digitized books of a kind that had never been created before.

- It stimulated other public and private mass digitization projects.

 

In any case, the facts are:

First, participating libraries have used their digital copies to create important public collections of scanned books.

Second, when you search Google you find not only websites but also books that can be downloaded, and the general public uses this tool (and how!).

What does Google do in our university?

30 November 2012, 13:30

  • Scans the documents and bears the costs. Books are scanned twice to avoid errors.
  • Produces two copies of the digitized material, one for the Complutense and the other for Google.
  • 120,000 works have been digitized without a single incident, in a process that has included printed material of every period and value, from incunabula to first editions such as our first edition of Isaac Newton's Principia Mathematica.
  • Makes out-of-copyright scanned books freely searchable and downloadable through Google Books and the Google index.
  • Provides an exclusive interface for the University and its users to access and download the digital works of the program.

And what does the Complutense Library do?

30 November 2012, 13:29

The University contributed its holdings and the staff needed to select and handle the documents to be digitized.

This has meant not only selecting the material, but also ensuring that everything was properly cataloged, classified and physically prepared.

It is worth noting that over 50% of our old collection had not yet been entered in the automated catalog.

This work has involved many basic operational tasks, from deep cleaning of the books and the stacks to repairing volumes and even cutting open the uncut pages of books that had never been opened before.

Finally, we have taken steps to disseminate these digital copies and preserve them for the long term at a reasonable cost.

An opportunity for academics and the general public

30 November 2012, 13:29

This has been a unique opportunity to provide Complutense scholars, especially those working in the social sciences and humanities, with a corpus of digital materials that enables digital projects to be developed.

One example is Dialogyca, a portal for Spanish dialogues of the past five centuries created by Complutense academics. It provides information on the characteristics of each work, the literature on it and the full text of its manuscript and printed witnesses.

Project Planning and Design: 2007 Actions

30 November 2012, 13:28

In 2007 the library analyzed its collections and storage spaces to ascertain the number of public domain volumes to be digitized and to obtain information on the facilities and accessibility at each location.

The library also prepared a Bookbinding Plan and a Recommendations Guide for selecting the works to be scanned.

The restoration department of the Complutense Library drew up a list of technical recommendations for repairing ancient books.

As a result, it was possible to digitize many previously discarded volumes and improve their conservation.

The selection of volumes was based on three main variables: date of publication, physical condition of the item and fitness for scanning.

Project Planning and Design: 2008-2011 Actions

30 November 2012, 13:28

A Cataloguing Plan was drawn up for non-catalogued materials. Approximately 220,000 volumes published before the 20th century were catalogued.

Between 2008 and 2011, 145,000 books were analyzed for scanning, and 120,000 of them were scanned at the Google Scanning Center in Madrid. A further 80,000 books from the National Library of Catalonia were also digitized at this center.

This scanning center was small and scanned only 300 books per day, not the 4,000 books per day of other Google scanning centers.

Google ended its operations in Madrid in June 2011.
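
To put the two daily rates in perspective, the short calculation below estimates how many scanning days the combined Madrid workload would take at each rate. It is only a rough sketch: it assumes a constant daily rate, and the figure of 250 working days per year is an assumption, not something stated here.

```python
import math

# Figures quoted in the text.
madrid_rate = 300            # books per day at the Madrid center
large_center_rate = 4_000    # books per day at the larger Google centers
books = 120_000 + 80_000     # Complutense + National Library of Catalonia

# Assumption (not from the text): about 250 working days per year.
working_days_per_year = 250

days_madrid = math.ceil(books / madrid_rate)
days_large = math.ceil(books / large_center_rate)

print(f"Madrid center: {days_madrid} scanning days "
      f"(about {days_madrid / working_days_per_year:.1f} working years)")
print(f"Large center:  {days_large} scanning days")
```

Under these assumptions the Madrid center needs several hundred scanning days, which is roughly in line with an operation that ran from 2008 to mid-2011.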

Technological Developments: Web application for project management

30 November 2012, 13:27

For this project we developed a web application providing real-time information on all the processes involved in scanning.

This application collects preservation metadata for each book and adds it to the catalogue.

It also offers a complete overview of the daily movements of books sent to Google and shows automatically updated information on the availability or status of each item in the catalogue.

Technological Developments: PDA application

30 November 2012, 13:26

We also developed an application that selection teams use directly on a PDA in the stacks.

When the application reads the book's barcode, a form is displayed on the touch screen to evaluate the physical condition of the book.

This information on preservation status is then dispatched to the web application and a summary of this information is automatically forwarded to the catalogue records in a note field.
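
The flow from PDA form to catalogue note could be modelled roughly as follows. This is a minimal sketch, not the library's actual application: the `ConditionReport` class, its fields, the rule for deciding fitness and the sample barcode are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConditionReport:
    """Hypothetical record captured on the PDA form for one scanned barcode."""
    barcode: str
    binding_ok: bool
    issues: list[str] = field(default_factory=list)  # e.g. "loose pages"

    @property
    def fit_for_scanning(self) -> bool:
        # Assumed rule: scannable only if the binding is sound and no issues were noted.
        return self.binding_ok and not self.issues


def catalogue_note(report: ConditionReport) -> str:
    """Summarize the report as a one-line note for the catalogue record."""
    status = "fit for scanning" if report.fit_for_scanning else "excluded from scanning"
    details = []
    if not report.binding_ok:
        details.append("binding damaged")
    details.extend(report.issues)
    suffix = f" ({'; '.join(details)})" if details else ""
    return f"Preservation status: {status}{suffix}"


# Made-up barcode, for illustration only.
report = ConditionReport("5300000000", binding_ok=True, issues=["uncut pages"])
print(catalogue_note(report))
```

Keeping the summary as a single string mirrors the description above, where the catalogue record receives only a note field rather than the full form.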

How do you access the Complutense digitized books? 1. Searching anything in Google

30 November 2012, 13:26

You can access the Complutense scanned books by searching for anything in Google (or Google Books or Google Play).

This matters because it places our collections where the general public looks for information.

It offers a multilingual interface and is constantly developed and upgraded.

You have easy access to other editions, related books, keywords or geographical places mentioned in the book.

And remember: every 6 months more than 90% of the 20,000,000 books in Google Books are visited!

How do you access the Complutense digitized books? 2. Exclusive Google search interface for searching Complutense books.

30 November 2012, 13:26

You can also access the Complutense digitized books through the exclusive Google search interface for Complutense books (and the rest of the books in the project).

Visit our exclusive interface here: http://www.ucm.es/BUCM/atencion/25403.php

How do you access the Complutense digitized books? 3. Catalogue of the Library of the Complutense University.

30 November 2012, 13:26

Third, through our library catalogue, which has links to the digital volume and a box for full-text searching.

Our web application for logistics management runs a daily script that accesses GRIN, the Google partners management interface, obtains the item barcodes and adds an 856 MARC field with a link to the digital copy in Google Books.
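
As an illustration of the last step, the sketch below builds an 856 field from an item barcode. It does not show the GRIN retrieval, since that interface is not described here. The link pattern (`vid=UCM:` followed by the barcode), the public note and the sample barcode are assumptions made for the example, not the library's confirmed format.

```python
from urllib.parse import urlencode

# Assumed link target; the text only says the field links to Google Books.
GOOGLE_BOOKS_BASE = "http://books.google.com/books"


def google_books_url(barcode: str, library_prefix: str = "UCM") -> str:
    """Build a link from an item barcode (the URL pattern is an assumption)."""
    vid = f"{library_prefix}:{barcode}"
    return f"{GOOGLE_BOOKS_BASE}?{urlencode({'vid': vid}, safe=':')}"


def marc_856(barcode: str) -> str:
    """Return a plain-text MARC 856 field pointing at the digital copy.

    856 is Electronic Location and Access. First indicator 4 means HTTP,
    second indicator 1 means a version of the resource. Subfield $u holds
    the URI and $z a public note.
    """
    return f"856 41 $u {google_books_url(barcode)} $z Digital copy in Google Books"


# Made-up barcode, for illustration only.
print(marc_856("5300000000"))
```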

 

It also runs another script that uses a Google API to check whether there is a digital copy of a book in Google Books; if so, the catalogue shows a search box for searching the full text of that specific book.
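
The text does not say which Google API is used. Assuming a response shaped like the Google Books API v1 volumes list (an `items` array whose entries carry `accessInfo.viewability`), the decision to show the search box might look like the sketch below. The function name and the sample response are hypothetical, and the network request itself is left out.

```python
def has_digital_copy(volumes_response: dict) -> bool:
    """Return True if any returned volume has viewable pages.

    Assumes the JSON shape of a Google Books API v1 volumes list, where
    accessInfo.viewability is a value such as "ALL_PAGES", "PARTIAL"
    or "NO_PAGES".
    """
    for item in volumes_response.get("items", []):
        viewability = item.get("accessInfo", {}).get("viewability")
        if viewability in {"ALL_PAGES", "PARTIAL"}:
            return True
    return False


# Hand-written sample response, for illustration only.
sample = {
    "totalItems": 1,
    "items": [
        {"id": "EXAMPLE_ID",
         "accessInfo": {"viewability": "ALL_PAGES", "publicDomain": True}}
    ],
}

show_search_box = has_digital_copy(sample)
print("Show full-text search box:", show_search_box)
```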

Our catalogue, Cisne, is available at http://cisne.sim.ucm.es/

How do you access the Complutense digitized books? 4. HathiTrust Digital Library.

30 November 2012, 13:25

HathiTrust is a library consortium that works to ensure that the cultural heritage is preserved and accessible long into the future.

It holds more than 5 million digitized books and 200,000 serial titles, 31% of them in the public domain.

The Complutense joined HathiTrust in 2010 and is the only non-American partner.

Other partners include the Library of Congress, New York Public Library, California Digital Library and academic libraries such as Columbia, Cornell, Harvard, MIT, Princeton, Stanford, California, Chicago, Michigan and Yale.

Why did we join HathiTrust? Because at that time there was no project in Spain or in Europe that would allow us to preserve and disseminate our digital objects at reasonable cost and to proper standards.

HathiTrust is available at http://www.hathitrust.org/

How do you access the Complutense digitized books? 5. In your own library catalogue if you have a discovery tool

30 November 2012, 13:25

When you search with some of the current discovery tools for library catalogues, such as Summon, you find not only the books in your own collection but HathiTrust books too.

So the Complutense digitized books can be found when you search your own library catalogue.

How do you access the Complutense digitized books? 6. More: Internet Archive, Europeana…

30 November 2012, 13:25

Finally, thousands of our books digitized by Google are included in several public digital libraries.

For example, you can search thousands of our books in the Internet Text Archive (http://archive.org/details/texts), and soon the entire collection will be in Europeana.

Europeana Libraries Project: A commitment to access to digitized heritage objects from European institutions

30 November 2012, 13:24

Complutense University is collaborating on the Europeana Libraries project with eighteen research libraries from across Europe and the United Kingdom.

Five million digitized objects will be aggregated to Europeana. The project is coordinated by The European Library and hosted by the National Library of the Netherlands.

What does Europeana Libraries offer you?

• 1,200 film and video clips

• 850,000 images

• 4.3 million texts (books, journal articles, theses, letters)

What does Complutense University Library offer to you in Europeana?

30 November 2012, 13:24

Complutense University Library will aggregate:

- Complutense Rare Books from 16th Century to 1870: Metadata for more than 120,000 books digitized by Google.

- Engravings of Dioscorides Collection: approximately 50,000 engravings.

- Complutense University Theses: 6,000 theses.

- Journal articles published by Complutense University Press: 31,000 articles

- Fine Art Old Drawings from the Fine Arts Faculty (between 1752 and 1914): 287 drawings.

- Photos of the Spanish Civil War from the Historical Archive of the Communist Party of Spain: 3,200 photographs approximately.

- Personal Archive of Ruben Dario: 5,000 documents.

- Complutense Manuscripts & Incunabula

- Cartographic material, maps and city views

Scanning process total data

30 November 2012, 13:24

This process has provided the Library with very comprehensive information on the conservation of our heritage: some 17% of the pre-1871 volumes have issues that exclude them from the digitization project.

The main issues relate to the text block (fungi, loose pages, uncut pages and other physical damage) and the binding.
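
The 17% figure is consistent with the 2008-2011 totals given in the planning section (145,000 books analyzed, 120,000 scanned). The check below assumes that every analyzed volume that was not scanned was excluded for condition reasons, which the text does not state explicitly.

```python
# Figures from the 2008-2011 planning section.
analyzed = 145_000
scanned = 120_000

# Assumption: all analyzed-but-unscanned volumes were excluded for their condition.
excluded = analyzed - scanned
excluded_share = excluded / analyzed

print(f"{excluded:,} volumes not scanned ({excluded_share:.1%} of those analyzed)")
```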

Access to Complutense Books in Google: Most visited books (one week)

30 November 2012, 13:23

Are our books used? Fortunately, yes.

Two out of three books are visited each week and more than 90% are visited at least once every 6 months.

The most visited book in Google Books scanned at a European partner is one from the Complutense Library: Pascual Madoz's "Diccionario Geográfico-estadístico de España y sus posesiones de Ultramar".

How do we preserve our digitized books? HathiTrust

30 November 2012, 13:23

We believe that digital preservation can only be achieved through a cooperative initiative.

As a result, in November 2010 the Complutense joined HathiTrust, a repository for high-quality storage of and access to scanned books and journals.

HathiTrust serves a dual role: First, as a trusted repository it guarantees the long-term preservation of the materials it holds. Second, as a service for partners and a public good, HathiTrust offers persistent access to the digital collections.

This includes downloading public domain volumes. Specialized features are also available which facilitate access by persons with print disabilities, and allow users to gather subsets of the digital library into 'collections'. Once a collection is created, the full text of those volumes can be searched as a set.

Bibliographic metadata are managed in an Aleph Library Management System.

HathiTrust Characteristics:

30 November 2012, 13:23

The repository was designed in accordance with the Open Archival Information System (OAIS) reference model, open source technologies and other international standards:

  • Trustworthy Repositories Audit & Certification (TRAC)
  • Preservation Metadata Implementation Strategies (PREMIS)
  • TIFF and JPEG 2000 image formats
  • Permanent URLs

 

The repository has two mirror sites at the University of Michigan and Indiana University.

The two libraries that have digitized the most books with Google are also the largest contributors to HathiTrust: the University of Michigan with 50% and the University of California with 28%.


Universidad Complutense de Madrid - Ciudad Universitaria - 28040 Madrid - Tel. +34 914520400