ERPANET Case Study: Project Gutenberg by ERPANET
- Author: ERPANET
The task of selecting the sectors for the case studies and of identifying the respective companies to be studied is incumbent upon the management board. They compiled a first list of sectors at the very beginning of the project. But sector and company selection is an ongoing process, and the list is regularly updated and complemented. The Directors are assisted in this task by an advisory committee (5).
Chapter 4: Project Gutenberg
Project Gutenberg produces free electronic versions of literature and reference works that are in the public domain. As the project has only a few paid staff members (6), the majority of eBooks are scanned and edited by volunteers. Available via the Internet since 1994, Project Gutenberg is the oldest producer of freely accessible, electronic books (eBooks). From 1971 until 1997 over 1,100 eBooks were created. In the first eleven weeks of 2004 alone, three hundred new eBooks have been generated. There are now over 13,380 eBooks available and the production of eBooks is constantly increasing. Project Gutenberg is dedicated to making these resources available to the general public in a form that the vast majority of the computers, programs and people can easily read (ASCII). However, most texts are available in a wide range of formats for users to select.
New features have been added recently to Project Gutenberg's core services. Specifically, the new Radio Gutenberg (7) makes audio and video files accessible to the public for download as well as broadcasts on their two radio channels and Gutenberg Music (8) makes digitised music sheets accessible. This project focuses only on the preservation of the eBooks.
The Project Gutenberg Literary Archive Foundation (PGLAF) is recognised as a charitable organisation by the US Internal Revenue Service.
http://www.gutenberg.net
Perception and Awareness of Digital Preservation
Project Gutenberg is one of the earliest web sites on the Internet and one of the earliest digital libraries in existence. They have been active in creating eBooks for over thirty years and are aware of the social benefits to be gained through preserving these resources for public access. Project Gutenberg ensures that all eBooks are available in plain text and other open formats to avoid obsolescence. The eBooks are uploaded to two main servers (9) and can then be mirrored by over thirty sites worldwide. The combination of open formats and many copies should ensure that access to these digitised literary works is preserved for the long term.
The Main Problems
The major long-term problem lies in ensuring that copyright laws are respected for all of the digitised works made accessible by Project Gutenberg. Mirror sites exist in many countries around the world and, as such, ensuring that copyright laws are respected in each can be difficult. However, no eBook will be posted to the main site in the U.S. without gaining copyright clearance. Recent extensions to copyright laws in the U.S. and Europe have presented new challenges for the Project Gutenberg team, because no new works will be released to the public domain until 2018. Michael Hart, the project's founder, believes that these extensions to copyright laws benefit 'very few copyright holders at the expense of universal access to literature and knowledge' (10). These changes will increase the amount of research that needs to be done before an eBook can be digitised and made available.
Asset Value and Risk Exposure
Project Gutenberg exists to make literature and reference materials freely accessible to the general public in a digitised format. As mentioned above, Michael Hart believes that free access to literary works is vital for enabling the sharing of knowledge, art, music and culture.
Regulatory Environment
Project Gutenberg must adhere to U.S. laws involving operation as a not-for-profit corporation. However, these regulations are not sector specific. Project Gutenberg must be exceedingly careful to respect U.S. copyright laws regarding the works that they digitise and make available over the Internet. However, once a publication has been verified as being in the public domain, there are no other legal restrictions affecting Project Gutenberg.
Preservation Activity
Policies and Strategies
Project Gutenberg scans literary works and employs OCR technology to create eBooks. In some cases, eBooks are typed in by hand. The eBooks are then edited by a team of volunteer proof-readers. There are procedures and guidelines available online for volunteers to consult when scanning and editing texts for Project Gutenberg to ensure that all eBooks follow a standard format. Once the eBook has been produced, it is uploaded to two main servers. The eBook is made accessible via the official Project Gutenberg website and the Internet Archive site and on over thirty mirror sites around the world. As there are no access or distribution issues, Project Gutenberg encourages users to save copies of the eBooks to CD or DVD.
Project Gutenberg believes that generating a multitude of copies (those stored on the main servers, those held on local servers through mirror sites, and those downloaded to CD and DVD) will ensure that the bit stream of the literary work is preserved for access. This embodies the philosophy of the LOCKSS (Lots of Copies Keep Stuff Safe) strategy. LOCKSS 'uses the caching technology of the web to collect pages of journals as they are published, allowing libraries to take physical custody of selected electronic titles they purchase' (11). LOCKSS was inspired by the words of Thomas Jefferson, who said "let us save what remains: not by vaults and locks which fence them from the public eye and use in consigning them to the waste of time, but by such a multiplication of copies, as shall place them beyond the reach of accident." (12)
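The idea that many independent copies protect the bit stream can be illustrated with a small sketch. It is not Project Gutenberg's or LOCKSS's actual procedure: the function names, the location names and the choice of SHA-256 are all assumptions made for the example. It compares the checksum of each copy and reports any copy that disagrees with the majority.

```python
import hashlib
from collections import Counter


def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_copies(copies: dict) -> list:
    """Return the names of copies whose digest differs from the most common one.

    `copies` maps a (hypothetical) location name to the bytes stored there.
    """
    if not copies:
        return []
    digests = {name: sha256_of(data) for name, data in copies.items()}
    # The digest shared by the largest number of copies is treated as correct.
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(name for name, d in digests.items() if d != majority_digest)


# Hypothetical example: three copies of the same eBook, one of them corrupted.
copies = {
    "main-server": b"Alice was beginning to get very tired...",
    "mirror-eu": b"Alice was beginning to get very tired...",
    "home-cd": b"Alice was beginning to get very tirXd...",
}
damaged = find_damaged_copies(copies)
```

With more copies in more places, a single damaged copy is both easier to detect and easier to replace from the others, which is the intuition behind the strategy described above.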
Selection
Project Gutenberg aims to make digitised versions of popular literature and reference materials in the public domain freely accessible to the general public. As copyright expires, publications can be freely replicated and distributed. Many of these works are out of print. By digitising the out of print works, Project Gutenberg feels that they are saving the publications from 'obscurity and ultimate oblivion' (13). Broadly, the texts can be classified into three categories: light literature (such as Alice in Wonderland), heavy literature (such as Shakespeare and Dante) and references (such as Roget's Thesaurus). Mathematical and scientific works, including the Human Genome, are also made available. There are no real restrictions on what Project Gutenberg will make accessible. As long as the material is in the public domain, it can be digitised and submitted to Project Gutenberg. However, Project Gutenberg aims to benefit the widest possible audience and therefore prioritises the digitisation of popular literature and reference materials over extremely specialised works. Project Gutenberg already has texts in over 31 languages and is especially keen to increase its multilingual holdings.
Preservation
Project Gutenberg already has numerous plain text files that are 20-30 years old. In that time, many file formats have come and gone while plain text is still readable on virtually all computers. The use of plain text will also help to insure against future obsolescence. All Project Gutenberg eBooks are created as plain ASCII text files. This means that people with 'Apples and Ataris all the way to the old homebrew Z80 computers' (14) as well as Mac and UNIX users are all able to read the text files. Any open format can be submitted but the Project Gutenberg team will also generate plain ASCII (15) text files. Project Gutenberg encourages users to create new formats from the plain text files to suit their individual needs. Once the eBook has been generated and edited by volunteers, it is uploaded to two main servers. The first is the Project Gutenberg site itself and the other is the Internet Archive site. From this point, mirror sites can download the redundant files to their own sites and store them on their own servers.
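A check for the plain ASCII requirement might look like the following minimal sketch. The function name and the sample text are assumptions for illustration only and are not taken from Project Gutenberg's own tools.

```python
def non_ascii_characters(text: str) -> list:
    """Return (line number, column, character) for each non-ASCII character.

    Line and column numbers start at 1. An empty result means the text
    contains only ASCII characters.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # ASCII covers code points 0 to 127
                problems.append((line_no, col, ch))
    return problems


# Hypothetical sample: the second line contains a typographic apostrophe.
sample = "Plain text is fine.\nA curly quote \u2019 is not."
issues = non_ascii_characters(sample)
```

Reporting the position of each offending character, rather than a simple yes or no, would let a volunteer find and replace it by hand.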
Project Gutenberg uses the unique eBook number as the file name. Therefore, if the eBook is the 10,001st created, its plain text file will be named 10001.txt. Project Gutenberg will accept as many open file formats as volunteers are willing to submit, but will also generate a plain text version. Additional versions in other formats will be named accordingly but with different file extensions (e.g., html, pdf, xml). Each eBook has its own subdirectory that contains all versions of the eBook.
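The naming scheme described above can be sketched as a small helper. The text states only that each eBook has its own subdirectory, so naming that subdirectory after the eBook number is an assumption of this example, as is the function name.

```python
from pathlib import PurePosixPath


def ebook_paths(ebook_number: int, extensions=("txt",)) -> list:
    """Build the file paths for every version of one eBook.

    The file name is the eBook number plus the format's extension.
    Assumption: the eBook's subdirectory is named after its number.
    """
    if ebook_number < 1:
        raise ValueError("eBook numbers start at 1")
    directory = PurePosixPath(str(ebook_number))
    return [directory / f"{ebook_number}.{ext}" for ext in extensions]


# The plain text version plus two additional formats for eBook 10001.
paths = [str(p) for p in ebook_paths(10001, ("txt", "html", "pdf"))]
```

Because the number is the only variable part of the name, every format of a given eBook can be located from its number alone.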
Project Gutenberg have volunteers representing a wide range of sectors (cultural heritage, government and higher education). Through these affiliations, they keep up to date with digital preservation developments. Project Gutenberg staff have ties with many organisational leaders and informal collaborations on best practices are common.
Access
The eBooks are catalogued by Project Gutenberg volunteers to include the author, the author's dates of birth & death, language, eBook number, and the Library of Congress classification to enhance online searching capabilities. As the publications that Project Gutenberg aims to make accessible are already in the public domain, restricting access is not really an issue. Project Gutenberg is mirrored in over thirty sites around the world. As such, they cannot accurately estimate the number of downloads that take place across all of the mirrored sites, but state that the equivalent of 1 million eBooks are downloaded each month from the main central server (16). In an effort to increase accessibility by non-English users, eBooks can be generated and submitted in any language.
Project Gutenberg uses Dublin Core to describe their electronic resources to enable resource discovery.
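As a rough illustration, the catalogue fields listed above could be mapped onto Dublin Core elements as follows. The class name, the inclusion of a title field, the choice of elements, and the way the author's dates are appended to the creator are all assumptions of this sketch. The sample values are illustrative and are not copied from the Project Gutenberg catalogue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogueRecord:
    """Catalogue fields named in the text, plus a title (an added assumption)."""
    title: str
    author: str
    author_birth: Optional[int]
    author_death: Optional[int]
    language: str
    ebook_number: int
    loc_class: str

    def to_dublin_core(self) -> dict:
        """Map the record onto Dublin Core element names."""
        creator = self.author
        if self.author_birth is not None or self.author_death is not None:
            birth = "" if self.author_birth is None else str(self.author_birth)
            death = "" if self.author_death is None else str(self.author_death)
            # Assumption: dates follow the name, as in many library catalogues.
            creator = f"{self.author}, {birth}-{death}"
        return {
            "dc:title": self.title,
            "dc:creator": creator,
            "dc:language": self.language,
            "dc:identifier": str(self.ebook_number),
            "dc:subject": self.loc_class,
        }


# Illustrative values only.
record = CatalogueRecord(
    title="Alice's Adventures in Wonderland",
    author="Carroll, Lewis",
    author_birth=1832,
    author_death=1898,
    language="en",
    ebook_number=11,
    loc_class="PR",
)
dc = record.to_dublin_core()
```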
Compliance Monitoring
There are no external requirements that Project Gutenberg must meet. However, Distributed Proof-readers (17) work to edit and ensure that the eBook content is as accurate as possible. The eBook goes through two rounds of proofreading where it may be examined by hundreds of volunteers. Once the eBook has been proofread, it goes to the post-processing stage. 'The ultimate goal of post-processing is to create a plain text eBook with consistent formatting throughout, which contains as few errors as possible, and which accurately reflects the intentions of the author' (18). Project Gutenberg citations, for example in the Online Computer Library Center (OCLC), appear as their own editions and, as such, do not correspond to any particular paper edition. In some cases Project Gutenberg editions are listed as the only edition in existence. Project Gutenberg makes every effort to ensure that they comply with U.S. copyright laws and encourages all volunteers to verify that materials proposed for digitisation are in the public domain. Guidance and advice on undertaking this research is provided on the project website. However, the Project Gutenberg team are ultimately responsible for verifying public domain status and require that a copy of the title page be submitted for each proposed publication to assist in this process.
Digital Preservation Costs
A registered charity, Project Gutenberg relies on donations to pay its few dedicated staff members and to cover operational costs. Nearly 100 per cent of the operational budget is focused on preservation. In terms of storage costs, the project founder believes that as disk drives become larger and cheaper, the price of putting eBooks on computers will become negligible (19).
Future Outlook
Project Gutenberg has already been implemented in Australia and Europe. Project Gutenberg of Canada is to be founded in the near future. Project Gutenberg also hopes "to also create such projects in Africa, Asia, and other regions. In particular, they hope to create projects by which e-books can reach the masses via digital radio links to solar-powered PDAs. In addition, Project Gutenberg will be adding more multimedia e-books: paintings, sculptures, music, audio e-books, movies, etc., along with a wider variety of text formats." (20)
Project Gutenberg will continue digitising literary works and aim to offer over 10,000,000 eBooks in over 100 languages by the time they celebrate their 50th anniversary in 2021. Project Gutenberg aim to enable the migration on request of their plain text files. This would mean that the plain text version could be generated in any type of file requested on the fly. This is currently in test mode. Project Gutenberg is also investigating creating the eBooks as born XML to allow easier creation of other formats on demand (21).
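To give a sense of what generating another format from a plain text file on request could involve, here is a minimal sketch that turns plain text into a simple HTML page. It is not the conversion service that Project Gutenberg has in test mode. The function name, the rule that blank lines separate paragraphs, and the page layout are assumptions of the example.

```python
import html
import re


def plain_text_to_html(text: str, title: str) -> str:
    """Convert plain text into a minimal HTML document.

    Assumption: paragraphs in the plain text are separated by blank lines.
    """
    normalised = text.replace("\r\n", "\n")
    paragraphs = re.split(r"\n\s*\n", normalised)
    body = "\n".join(
        # Collapse line breaks inside a paragraph and escape &, < and >.
        "<p>" + html.escape(" ".join(p.split())) + "</p>"
        for p in paragraphs
        if p.strip()
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        f"</head>\n<body>\n{body}\n</body>\n</html>\n"
    )


page = plain_text_to_html("One & two.\n\nSecond\nparagraph.", "Sample eBook")
```

Keeping the plain text file as the single source and deriving other formats from it means that only one version has to be proofread and preserved.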
Chapter 7: Conclusions
As the first and largest collection of eBooks, Project Gutenberg has been preserving electronic publications and making them accessible for over thirty years. By adhering to strict guidelines regarding the format of the eBook