This past week has been a big one for the EPUB eBook format, with Google, Sony and Project Gutenberg all announcing support for it.
Project Gutenberg EPUB Books
Over at Project Gutenberg, Marcello Perathoner has been working hard to convert all the Gutenberg titles into the EPUB format. At this time these versions should be considered experimental, but after trying several different titles, I found them all more than readable.
The books are converted where possible from the HTML version in the Gutenberg archives, and for those titles without an HTML version, Marcello uses the plain .txt book. The plain .txt files at Gutenberg are notoriously inconsistent in their layout, so converting them accurately is extremely difficult, as I know only too well myself. Perhaps it’s time Project Gutenberg embraced a Master Format.
What sets this apart from the other news (see below) is that all the Gutenberg books go through a proofreading process, so their accuracy is very high. This is why so many other eBook projects are based on the Gutenberg archives.
Google and Sony partner to release 500,000 Public Domain EPUB Books
Over the last few years Google has been scanning books by the million and making them available on their book search, but this is the first time any of them have been made available to an eBook reader. All the titles are in the public domain (pre-1923 titles only) and, once added to the current Sony Reader catalogue, bring the total number of available titles to around 600,000, far surpassing the Amazon Kindle’s catalogue of 240,000. Amazon still use their own proprietary eBook format and do not currently allow EPUB files to be read natively (conversion is needed first), but as the Google EPUB books all come DRM free, there are many tools out there that will let you read them on a Kindle or other reading platform.
As the press release states, “the publishing industry has more or less united on EPUB for e-book distribution”, so c’mon Amazon, let’s see some native EPUB support!
Don’t have a Sony Reader?
Although this announcement is aimed at Sony Reader owners, you don’t need to own a PRS-505 or PRS-700 to read these Google EPUB books. Just download the Sony “eBook Library” software to search and download. Paul Biba’s article over at TeleRead shows how to access the books without owning a Reader.
No access for non-U.S. based PRS-505 owners
Yet again, this is a U.S.-only release, at least if you already own a PRS-505. You can only access the Google books via Sony’s “eBook Library” software, but my UK version has no access to the Sony Book Store, as we are only pointed to the Waterstone website. When I tried to install the new version (2.5), it told me it cannot be installed in my region! I believe this is because my UK reader is already registered, and that if I were to install the software on a second computer I would then have access to these titles.
If there are any non-U.S. PRS-505 users out there who have succeeded in accessing the archive, please let me know.
Google EPUB Quality?
As Google does not proofread their scanned books, relying completely on OCR (Optical Character Recognition) software to produce the digital characters, the quality of the texts themselves is not as good as that of archives like Project Gutenberg or websites that take their books from PG, such as Feedbooks.com. Then again, we are getting 500K free books with the potential to access another million Google books in the future.
All in all, this has been a good week for the EPUB format.
It’s unfortunate that the Gutenberg EPUB process isn’t taking more care in creating those files. Many are a single monolithic chunk of HTML wrapped in EPUB. We’ve worked very hard at a Gutenberg conversion process, and we would certainly be willing to help them out if they want to use our API to create better conversions. Gutenberg zipped HTML and HTML files create beautiful EPUBs with our converter, and it would spare the community the pain of dealing with even more batches of poorly converted files. Send them our way and we’ll be happy to lend some time and assistance.
Aaron Miller
Hi Aaron, I’ve already recommended that the files should be broken into separate chapters where possible. Hopefully this will be fixed in the near future.
You might want to contact Marcello directly regarding your proposal; his details can be found on the Project Gutenberg Contact page.