This was totally serendipitous, as it turned out that two of the four-operator crew happened to be Michael's best friend and his brother's best friend. Michael just happened "to be at the right place at the right time": there was more computer time than people knew what to do with, and those operators were encouraged to do whatever they wanted with that fortune in "spare time" in the hopes they would learn more for their job proficiency.
At any rate, Michael decided there was nothing he could do, in the way of "normal computing," that would repay the huge value of the computer time he had been given. . .so he had to create $100,000,000 worth of value in some other manner. An hour and 47 minutes later, he announced that the greatest value created by computers would not be computing, but would be the storage, retrieval, and searching of what was stored in our libraries.
He then proceeded to type in the "Declaration of Independence" and tried to send it to everyone on the networks. . .which can only be described today as a not so narrow miss at creating an early version of what was later called the "Internet Virus."
A friendly dissuasion from this yielded the first posting of a document in electronic text, and Project Gutenberg was born as Michael stated that he had "earned" the $100,000,000 because a copy of the Declaration of Independence would eventually be an electronic fixture in the computer libraries of 100,000,000 of the computer users of the future.
This philosophical premise has created several offshoots:
The reason for this is that 99% of the hardware and software a person is likely to run into can read and search these files.
Any other system of etext storage is going to fall short of an audience of 99%.
This does not mean there are not other valid means of doing the etext business. . .after all, over half the computers are DOS, so one could address a wide audience by just doing DOS. Plain Vanilla ASCII, however, addresses the audience with Apples and Ataris all the way to the old homebrew Z80 computers, while an audience of Mac, UNIX and mainframers is still included.
In this same vein, Project Gutenberg selects etexts targeted a bit on the "bang for the buck" philosophy. . .we choose etexts we hope extremely large portions of the audience will want and use frequently. We are constantly asked to prepare etext from out of print editions of esoteric materials, but this does not provide for usage by the audience we have targeted, 99% of the general public.
Also in the same vein, Project Gutenberg has avoided requests, demands, and pressures to create "authoritative editions." We do not write for the reader who cares whether a certain phrase in Shakespeare has a ":" or a ";" between its clauses. We put our sights on a goal to release etexts that are 99.9% accurate in the eyes of the general reader. Given the preferences our proofreaders have, and the general lack of reading ability the public is currently reported to have, we probably exceed those requirements by a significant amount. However, the person who wants an "authoritative edition" will have to wait some time until this becomes more feasible. We do, however, intend to release many editions of Shakespeare and the other classics for comparative study on a scholarly level, before the end of the year 2001, when we are scheduled to complete our 10,000 book Project Gutenberg Electronic Public Library.
Project Gutenberg hopes to be a part of the massive celebrations that the 100th Anniversary of Public Libraries deserves in 1995, and in 1997 hopes to found "The Public Domain Register," on the 100th Anniversary of The U.S. Copyright Register.
We hope you will be part of it, too. You are all invited.
This goal of presenting Public Domain Editions immediately has the Public Domain Register as its predecessor. Before we can expect the availability of all Public Domain materials, we have to at least come up with a way of listing what those titles are. If you are interested, please let us know before 1997 so we might be able to include your efforts in the Public Domain Register.
This has several ramifications:
i.e. when we started, the files had to be very small as a normal 300 page book took one meg of space which no one in 1971 could be expected to have (in general). So doing the U.S. Declaration of Independence (only 5K) seemed the best place to start. This was followed by the Bill of Rights-- then the whole US Constitution, as space was getting large (at least by the standards of 1973). Then came the Bible, as individual books of the Bible were not that large, then Shakespeare (a play at a time), and then into general work in the areas of light and heavy literature and references.
By the time Project Gutenberg got famous, the standard was 360K disks, so we did books such as Alice in Wonderland or Peter Pan because they could fit on one disk. Now 1.44 is the standard disk and ZIP is the standard compression; the practical filesize is about three million characters, more than long enough for the average book.
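The disk arithmetic above can be checked with a short sketch. The 2:1 compression ratio is an assumption, chosen only because it is roughly what the figures in the text imply; real ZIP ratios vary from book to book. The function name is hypothetical.

```python
# Formatted capacity of a "1.44" floppy:
# 2 sides * 80 tracks * 18 sectors * 512 bytes per sector.
FLOPPY_1_44_BYTES = 2 * 80 * 18 * 512

# ASSUMPTION: plain English text shrinks to about half its size when ZIPped.
ASSUMED_ZIP_RATIO = 2.0


def plain_text_capacity(disk_bytes, ratio=ASSUMED_ZIP_RATIO):
    """Estimate how many plain ASCII characters fit on a disk once compressed."""
    return int(disk_bytes * ratio)


if __name__ == "__main__":
    print(plain_text_capacity(FLOPPY_1_44_BYTES))  # 2949120
```

Under that assumed ratio the result is just under three million characters, in line with the practical filesize quoted above.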
Pictures, however, are still so bulky to store on disk that it will be a while before we include even the lowres Tenniel illustrations in Alice and Looking-Glass. We ARE very interested in doing them, and are only waiting for advances in technology to release a test edition. The market will have to establish SOME standards for graphics before we can attempt to reach general audiences, at least on the graphics level.
To illustrate our faith in graphics, and in the future, we have gone one step further in our pursuit of what we named "Replicator Technology" TM a few years ago. We would like to end this phase of Project Gutenberg (at year's end, 2001) with a first 3D application of Replicator Technology, by doing CAT, MRI and XRAY Fluoroscopy scans of something, perhaps a painting, and printing 3D copies. If anyone can get us access to a hundred year old masterpiece. . . .
This has created a need to present these Project Gutenberg Etexts in "Plain Vanilla ASCII" as we have come to call it over the years.
The reason for this is simple. . .it is the only text mode that is easy on both the eyes and the computer.
However, this encourages others to improve our etexts in a variety of ways and to distribute them in a variety of the available media, as follows:
Once an etext is created in Plain Vanilla ASCII, it is the foundation for as many editions as anyone could hope to do in the future. Anyone desiring an etext edition matching, or not matching, a particular paper edition can readily do the changes they like without having to prepare that whole book again. They can use the Project Gutenberg Etext as a foundation, and then build in any direction they like.
Thus any complaints about how we do italics, bold, and underscoring, or whether we should use this or that markup formula, are sent back with encouragement to do it any way a person wants it, with the basic work already done, and with our compliments.
The same goes for media. We have had a long-standing work ethic of providing our etexts in any medium people wanted: Amiga, Apple, Atari. . .to IBM, to Mac, to TRS-80. . . .
However, now that our etexts are carried on so many BBS's, networks and other locations, it is easier for you to download the files in a manner that puts them in your format than it is for us to make and mail a disk, so we don't really do that too much.
The major point of all this is that years from now Project Gutenberg Etexts are still going to be viable, but program after program, and operating system after operating system are going to go the way of the dinosaur, as will all those pieces of hardware running them. Of course, this is valid for all Plain Vanilla ASCII etexts. . .not just those your access has allowed you to get from Project Gutenberg. The point is that a decade from now we probably won't have the same operating systems, or the same programs, and therefore all the various kinds of etexts that are not Plain Vanilla ASCII will be obsolete. We need to have etexts in files a Plain Vanilla search/reader program can deal with; this is not to say there should never be any markup. . .just that those forms of markup should be easily convertible into regular, Plain Vanilla ASCII files so their utility does not expire when the programs to use them are no longer with us. Remember all the trouble with CONVERT programs to get files changed from old word processor programs into Plain Vanilla ASCII?
Do you want to go through all that again with every book a whole world ever puts into etext?
The value of Plain Vanilla ASCII is obvious. . .so is much of the value of most of the various markup systems we have in the world. But until some real standards arrive-- we would be limiting our options a great deal if we did not keep copies of all etexts in Plain Vanilla ASCII as well.
We don't have anything against markup. Not vice versa.
Alice in Wonderland, the Bible, Shakespeare, the Koran and many others will be with us as long as civilization. . .an operating system, a program, a markup system. . .will not.
This includes the many requests we have for compression in particular formats. There are only two formats we know of that are suitable for transfer to a wide general audience: Plain Vanilla ASCII (.txt files) and ZIPped files of them (.zip files). Requests for other compression formats must be ignored as they are appropriate only for small portions of our target audience. However, (programmers take note: we will need help) we are planning to put some compression links on our files so they can be transmitted in any of an assortment of compression formats on the fly. That is, we should be able to generate any kind of file asked for, while keeping only one copy of each etext on our servers. . .as the .Z compression format does in a similar manner today.
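For programmers who want to help, here is a minimal sketch of the "one stored copy, many formats on the fly" idea, written in Python using only its standard library. It is an illustration under assumptions, not a description of how our servers work: the function name and the format labels are hypothetical, and gzip and bzip2 stand in for "other formats" because the Python standard library does not write .Z files.

```python
import bz2
import gzip
import io
import zipfile


def compress_on_the_fly(text_bytes, fmt, name="etext.txt"):
    """Return the single stored plain-text copy re-encoded in the requested format.

    fmt is a hypothetical label: "txt", "zip", "gz" or "bz2".
    """
    if fmt == "txt":
        return text_bytes  # the stored copy itself, unchanged
    if fmt == "zip":
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(name, text_bytes)
        return buf.getvalue()
    if fmt == "gz":
        return gzip.compress(text_bytes)
    if fmt == "bz2":
        return bz2.compress(text_bytes)
    raise ValueError("unsupported format: " + fmt)
```

Only the plain .txt bytes are ever kept; every other format is generated at the moment of the request and then thrown away.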
We want people to be able to look up quotations they heard in conversation, movies, music, or other books, easily, with a library containing all these quotations in an easy to find etext format. With Plain Vanilla ASCII you will be easily able to search an entire library, without any program more sophisticated than a plain search program. In fact, these Project Gutenberg Etext files are so plain that you can do a search on them without even using an intermediate search program (i.e. a program between you and the disk). Norton's and other direct disk access programs can search every one of your files without you even naming them, pointing to an etext directory, or whatever. You can simply search the raw output from the disk. . .I do this on a half gigabyte disk partition, containing all our editions.
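To show how little is needed, here is a minimal sketch of a "plain search program" in Python. The function name and the directory layout (a folder of .txt files) are assumptions made for illustration; any language with file reading and substring matching would do the same job.

```python
import os


def search_etexts(root, phrase):
    """Return (path, line_number, line) for every line containing phrase.

    The match is case-insensitive and looks only at files ending in .txt.
    """
    needle = phrase.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".txt"):
                continue
            path = os.path.join(dirpath, name)
            # Plain Vanilla ASCII: no parser needed, just read the lines.
            with open(path, "r", encoding="ascii", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line.lower():
                        hits.append((path, lineno, line.rstrip("\n")))
    return hits
```

Calling search_etexts("etext", "curiouser") would list every matching line under a hypothetical "etext" directory. Note that this simple version checks one line at a time, so a quotation that is broken across two lines of the file will not be found.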
Copyright [1992] Michael Hart