This past week has been quite a week for the EPUB eBook format, with announcements from Google, Sony and Project Gutenberg about their support for EPUB.
Project Gutenberg EPUB Books
Over at Project Gutenberg, Marcello Perathoner has been working hard to convert all the Gutenberg titles into the EPUB format. For now these versions should be considered experimental, but having tried several different titles, I found them all more than readable.
The books are converted from the HTML version in the Gutenberg archives where possible; for titles without an HTML version, Marcello uses the plain .txt file. The plain .txt files at Gutenberg are notoriously inconsistent in their layout, so converting them accurately is extremely difficult, as I know only too well. Perhaps it’s time Project Gutenberg embraced a Master Format.
What sets this apart from the other news (see below) is that all the Gutenberg books go through a proofreading process, so their accuracy is very high. This is why so many other eBook projects are based on the Gutenberg archives.
Google and Sony partner to release 500,000 Public Domain EPUB Books
Over the last few years Google has been scanning books by the million and making them available on their book search, but this is the first time they have made any of them available for an eBook reader. All the titles are in the public domain (pre-1923 titles only), and once added to the current Sony Reader catalogue they bring the total number of available titles to around 600,000, far surpassing the Amazon Kindle’s catalogue of 240,000. Amazon still use their own proprietary eBook format and do not currently allow EPUB files to be read natively (conversion is needed first), but as the Google EPUB books all come DRM-free, there are many tools that will let you read them on a Kindle or other reading platform.
Over the last few weeks several people have emailed asking if epubBooks.com has been abandoned – the answer is a resounding No! Okay, I know I’ve not been very active recently so please accept my apologies for that.
The reason for such limited activity is that I am working very hard on developing the new site for epubbooks.com – yes, the blog and current resources will still be accessible. The new site will allow you to download all kinds of different EPUB books, including many from the Project Gutenberg archives.
If you don’t know what my ePub Books Project is, here is a short summary.
The project was started to provide free downloads of nicely formatted EPUB files, the majority of which will be taken from the Project Gutenberg archives. These will not just be plain TEXT files enclosed in an EPUB container, but fully converted to XML (of a TEI flavour), which is itself converted using XSLT into professional-quality EPUB files. Here are some of the features:
- Properly formatted and displayed Chapter Titles/Subtitles.
- Footnotes which are Linkable (forward and backward) for instant access
- Books with Illustrations will also be available.
- Text Formatting (italics, etc.)
- Nice indents for block quotes, letters of correspondence, epigraphs, etc.
- …and many more features
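To give a rough idea of the element-to-element mapping involved in turning TEI markup into XHTML, here is a minimal sketch. The project itself uses XSLT, and this sketch swaps in Python’s standard-library ElementTree instead, so it is not the project’s actual stylesheet. The mapping (TEI head to h2, hi rend="italic" to em, quote to blockquote) and the function names convert and tei_to_xhtml are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical TEI-to-XHTML mapping; a real stylesheet covers far more elements.
SIMPLE_MAP = {"head": "h2", "p": "p", "quote": "blockquote"}


def convert(node, parent):
    """Copy a TEI element (and its children) into an XHTML parent element."""
    local = node.tag.split("}")[-1]  # strip the TEI namespace
    if local == "hi":
        # <hi rend="italic"> becomes <em>; other renditions fall back to <span>
        tag = "em" if node.get("rend") == "italic" else "span"
    else:
        tag = SIMPLE_MAP.get(local, "div")  # div and anything unmapped become <div>
    out = ET.SubElement(parent, tag)
    out.text = node.text
    for child in node:
        convert(child, out)
    out.tail = node.tail  # keep the text that follows the element
    return out


def tei_to_xhtml(tei_xml: str) -> str:
    """Convert the <body> of a TEI document into a minimal XHTML document."""
    root = ET.fromstring(tei_xml)
    body = root.find(f".//{{{TEI_NS}}}body")
    if body is None:
        raise ValueError("no TEI <body> element found")
    html = ET.Element("html", {"xmlns": XHTML_NS})
    xbody = ET.SubElement(html, "body")
    for child in body:
        convert(child, xbody)
    return ET.tostring(html, encoding="unicode")
```

A real conversion would also need to handle notes, figures, line groups and so on. XSLT’s template matching is well suited to that kind of growth, which is one reason to prefer it over hand-written recursion like this.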
The new web application is the biggest project I’ve developed to date and so is naturally a challenge to my programming skills, which is why it’s taking some time to complete; however, things are going very well.
The basic skeleton of the site is up and running and I am now working on programming for usability. Of course, it’s these less obvious items that are some of the hardest things to programme, so at this time I can’t give an accurate launch date. Rest assured it will be sooner rather than later.
Thanks for your patience and understanding and do keep checking back regularly for any new updates.
I Twittered (@epub) about the Cleveland Public Library press release when it was first announced, and David from TeleRead has also written a post on this. “This” being that the Cleveland library is the first library to offer eBook downloads in the EPUB format! Naturally this is great news for EPUB fans, but more importantly it’s great for the public at large.
OverDrive are providing them and another 8,500 libraries with access to EPUB books for borrowing. We can also presume that as OverDrive increase their number of EPUB titles, all these libraries will be offered them too.
As TeleRead mentions, it would be great if they could also offer their books via popular iPhone readers such as Stanza, which could then encourage younger readers to get back to books.
I expect 8,500 libraries gives good coverage across the U.S., but as a European I hope our libraries can strike a similar deal. If both sides of the big pond can offer these services, there’s potential for more countries to follow suit, which would be particularly useful where moving tons of paper books around is difficult and expensive.
During 2008 the EPUB eBook format gained huge acceptance, and rumours had it that there would be 20,000 EPUB titles available by the end of the year. Waterstones were saying this prior to the release of the Sony Reader in the UK.
As we head into 2009, Waterstones still shows fewer than 7,000 titles in their catalogue when viewing all available eBook titles. However, I get the feeling that this will change quite soon.
A few days back BooksOnBoard made an announcement on WebWire that they now have 30,000 titles available for the iPhone. After doing a search on their site I found that almost 20,000 of those are in the EPUB format. Great news for ePub fans, but we need more. Still, BooksOnBoard was the first retailer to make commercial ePub formatted books available and their entire online eBook collection consists of almost 300,000 titles. Perhaps they will be the first to reach 100,000 ePub books!
No doubt other online retailers such as Waterstones will soon follow suit. Will 2009 be the year of the EPUB format?
Below is an example of <pageList> markup (valid per the NCX DTD) that can be used to mark up page numbers within ePub documents.
Note that <pageList> must be placed directly after the required <navMap> and before the first optional <navList>. There may be only one <pageList> (and, of course, only one <navMap>), but there can be any number of <navList> elements.
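<pageList id="pagelist"> <!-- the id value and the src values are placeholders -->
  <navLabel><text>Paper Edition Page Mapping</text></navLabel>
  <pageTarget id="page-iii" value="3" type="front" playOrder="82">
    <navLabel><text>iii</text></navLabel><content src="..."/></pageTarget>
  <!-- ... -->
  <pageTarget id="page-105" value="105" type="normal" playOrder="192">
    <navLabel><text>105</text></navLabel><content src="..."/></pageTarget>
</pageList>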
1. There is currently a bug in epubcheck 1.0.3: it reports that <pageList> must include both the id and class attributes, although both should be optional. Hopefully that bug will get fixed. The markup above includes the optional id (generally a good idea) but not the class, so it will not validate against epubcheck 1.0.3.
Back in September I wrote about my epubBooks development, a project to convert the .TXT ebooks from Project Gutenberg into the IDPF’s EPUB format. After many months of hard work I’ve finally finished the conversion tools, and I’m now preparing to develop the website itself, which will allow anyone to download my EPUB books, all for free.
Although I’m happy with the current formatting of the EPUB files, I wanted to turn to you, the ebook community, and ask for your feedback, in the hope that your suggestions will make these EPUB ebooks even better.
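If you generate NCX files programmatically, the placement rule above is easy to enforce in code: one <pageList>, directly after <navMap> and before any <navList>. The following is a minimal sketch using Python’s xml.etree.ElementTree. The function add_page_list, its tuple layout and the default id "pagelist" are hypothetical, and the content src values would come from your own file layout.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
ET.register_namespace("", NCX_NS)  # serialize NCX elements without a prefix


def q(tag: str) -> str:
    """Qualify a tag name with the NCX namespace."""
    return f"{{{NCX_NS}}}{tag}"


def add_page_list(ncx_root, pages, list_id="pagelist"):
    """Insert a <pageList> directly after <navMap>.

    pages: iterable of (page_id, value, page_type, play_order, label, src)
    tuples; this layout is a hypothetical convenience, not part of the NCX spec.
    """
    if ncx_root.find(q("pageList")) is not None:
        raise ValueError("an NCX may contain only one <pageList>")
    nav_map = ncx_root.find(q("navMap"))
    if nav_map is None:
        raise ValueError("the NCX has no <navMap>")

    page_list = ET.Element(q("pageList"), {"id": list_id})
    heading = ET.SubElement(page_list, q("navLabel"))
    ET.SubElement(heading, q("text")).text = "Paper Edition Page Mapping"
    for page_id, value, page_type, play_order, label, src in pages:
        target = ET.SubElement(page_list, q("pageTarget"), {
            "id": page_id, "value": str(value),
            "type": page_type, "playOrder": str(play_order)})
        nav_label = ET.SubElement(target, q("navLabel"))
        ET.SubElement(nav_label, q("text")).text = label
        ET.SubElement(target, q("content"), {"src": src})

    # Placing it right after <navMap> also puts it before any <navList>.
    position = list(ncx_root).index(nav_map) + 1
    ncx_root.insert(position, page_list)
    return page_list
```

For example, calling add_page_list(root, [("page-iii", 3, "front", 82, "iii", "front.xhtml#page-iii")]) on a parsed NCX root adds one page mapping, where "front.xhtml" stands in for a real content file. You can then write the tree back out with ET.tostring.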
EPUB Book Features
- Linked Footnotes – each footnote number is a link; click it to see the footnote (I’ve actually made them all endnotes). Clicking the note’s number takes you back to the original page.
- Images – Some titles will include images.
- Nicely formatted titles, subtitles, etc.
- Paragraph indents – except on the first paragraph of a chapter/section, as is usual in paper books.
- Block Indents – Small left/right indents on block quotes, letters of correspondence, songs, etc.
This is just a small selection of the formatting features I’ve implemented.
Please Note: As certain systems enforce their own styling by default, various features will display differently. UPDATE (2011): This isn’t as common as it used to be.
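The linked footnotes described above come down to a pair of anchors per note. The marker in the text links forward to the endnote, and the endnote’s number links back to the marker. Here is a minimal sketch in Python. The [1]-style markers in the source text, the id scheme (noteref-N and note-N) and the notes.xhtml file name are assumptions for illustration, not necessarily the markup used in these EPUB files.

```python
import re
from html import escape

MARKER = re.compile(r"\[(\d+)\]")  # hypothetical "[1]"-style note markers


def link_footnotes(chapter_text, notes, chapter_file, notes_file="notes.xhtml"):
    """Return (chapter_html, notes_html) with forward and backward note links.

    notes maps a note number to its text; file names are placeholders.
    """
    def forward(match):
        n = match.group(1)
        # The marker gets an id so the endnote can link back to it.
        return (f'<a id="noteref-{n}" href="{notes_file}#note-{n}">'
                f'<sup>{n}</sup></a>')

    chapter_html = MARKER.sub(forward, escape(chapter_text))
    notes_html = "\n".join(
        # Each endnote's number links back to its marker in the chapter.
        f'<p id="note-{n}"><a href="{chapter_file}#noteref-{n}">{n}.</a> '
        f'{escape(text)}</p>'
        for n, text in sorted(notes.items()))
    return chapter_html, notes_html
```

Because every marker and endnote carries its own id, reading systems can follow the link in either direction without any scripting.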
Test the EPUB
The title I’m making available as a pre-release download is Gulliver’s Travels by Jonathan Swift – this has many features which show off my conversion. As this eBook contains images it is quite large, weighing in at over 5MB.
The test book has now been removed, as you can find the final release here:
All comments on both the front-end formatting (indents, italics, etc.) and the underlying code (OPF, NCX, HTML markup) are very much appreciated.
This ebook can be read using Adobe Digital Editions, Stanza (desktop and iPhone version), Sony Reader (PRS-505 and PRS-700), BeBook and FBReader.
An industry-wide standard of EPUB formatted books is what I, and many others, want. But can we achieve this without Amazon’s adoption, or at least ePub support on their Kindle eBook reader?
It would certainly be a lot easier to have a standard eBook format if Amazon joined the ePub party. I’m reluctant to say it, but all current indications show that Amazon will not adopt the ePub format in the near future – but perhaps there is hope.
Recently we have seen a flurry of publishers and eBook projects (including yours truly) adopting the ePub format, and a number of these are pushing their titles onto the iPhone/iPod Touch platform via the Stanza eBook reader. Feedbooks and Project Gutenberg are the two big projects, but we now have Pan Macmillan offering commercial Tasters, and in the last few days BookGlutton announced that they have joined forces with Stanza. Interesting times ahead for sure.
With all this recent iPhone/eBook activity I am asking myself, where is Amazon? The Kindle is certainly making waves with big sales numbers, but this is probably nothing compared to iPhone sales. This makes me wonder whether Amazon will start making their titles available on this platform, and if so, what format they will use. If they use their own eBook format (AZW), they would need to release a dedicated ‘Amazon eBook Reader’, but how many different iPhone reader applications will people accept?
Everyone around here knows that having one standard eBook format will better serve everyone. If Amazon opens their Kindle to the ePub format and strikes a deal with a company such as Lexcycle (Stanza), they could kill two birds with one stone. Hmm, perhaps an Amazon/Stanza union is a little too much wishful thinking.
Providing direct purchase and download would make Amazon a serious option for any iPhone or iPod Touch user, and vice versa.
So, can Amazon leverage the iPhone to further dominate the eBook market and can they continue to resist the ePub eBook format?
Disclaimer: The Amazon/ePub logo I created is intended just for fun.
During the Digital Lunch seminar at this year’s Frankfurt Book Fair, Michael Vantusko from Overdrive commented that W.H.Smith was one of their eBook customers. As I was updating the epubbooks.com homepage I thought I’d check them out to see if they actually have them online yet. I don’t know the date W.H.Smith went live with their ePub books, but they currently have almost 6,500 ePub titles.
As Overdrive also distribute to Waterstones I would imagine that the W.H.Smith eBook collection will grow quite quickly. It’s great to see more stores offering ePub formatted books to the consumers – perhaps this extra competition will result in lower eBook prices sooner rather than later.
There is, however, quite some way to go before the number of ePub titles reaches the overall eBook numbers. Here’s a quick breakdown of the current eBook titles in the W.H.Smith eBook store:
Liza Daly from threepress.org has just released an article outlining problems she is having with users uploading invalid ePub documents to Bookworm, an online ePub book reader. It’s very important for anyone developing ePub eBooks to produce valid markup. Not only will your books render as intended in Bookworm, but you’ll also be covering yourself for any future rendering engines and conversions you might need.
It’s actually quite surprising how many errors are showing up in files submitted to Bookworm. You should go over to the threepress blog for a full explanation, but here’s a list of the main errors:
- Missing required attributes in the metadata
- Metadata that hasn’t been proofread
- Improper nesting of the ePub zip file
- Items declared in the OPF file that are missing from the archive
- Invalid XHTML
The first four points are really quite vital, although it is understandable that many documents have invalid XHTML. Still, if it is within your means, I would keep your XHTML as valid as you can.
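The packaging problems in the list above, improper nesting of the zip and manifest items missing from the archive, are easy to check mechanically before you upload anything. The sketch below reads "improper nesting" as the mimetype entry not coming first, or files not sitting where container.xml and the OPF say they are. It uses only Python’s standard library. First it packages files with the mimetype entry first and uncompressed, as the OCF specification requires. It then checks an archive for a misplaced mimetype, a missing container.xml or OPF file, and manifest items absent from the zip. It assumes manifest hrefs are plain relative paths (no percent-encoding), and the function names write_epub and check_epub are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from posixpath import dirname, join, normpath

OPF = "{http://www.idpf.org/2007/opf}"
CONTAINER = "{urn:oasis:names:tc:opendocument:xmlns:container}"


def write_epub(files: dict) -> bytes:
    """Zip files into EPUB bytes with 'mimetype' first and stored uncompressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, content in files.items():
            zf.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def check_epub(data: bytes) -> list:
    """Return a list of packaging problems found in EPUB bytes."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        names = set(zf.namelist())
        if not infos or infos[0].filename != "mimetype":
            problems.append("'mimetype' is not the first entry in the archive")
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' is compressed")
        elif zf.read("mimetype") != b"application/epub+zip":
            problems.append("'mimetype' has the wrong contents")

        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
            return problems
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{CONTAINER}rootfile")
        opf_path = rootfile.get("full-path") if rootfile is not None else None
        if opf_path not in names:
            problems.append(f"OPF file {opf_path} is missing")
            return problems

        # Manifest hrefs are relative to the OPF file's own directory.
        opf = ET.fromstring(zf.read(opf_path))
        base = dirname(opf_path)
        for item in opf.iter(f"{OPF}item"):
            path = normpath(join(base, item.get("href", "")))
            if path not in names:
                problems.append(f"manifest item {path} is missing")
    return problems
```

This only covers the zip-level problems. Missing metadata and invalid XHTML still need a full validator such as epubcheck.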
I have plans to write some detailed articles regarding the creation of both the NCX and OPF files found in an ePub document, so keep a lookout for those.
In October 2007 the IDPF elevated OPS 2.0 to an official standard, and it was at this point that I realised we might well see the ePub format adopted worldwide as an eBook standard.
Planning started on how I would go about converting my TEI eBooks to the ePub format. After plenty of research I decided the best solution would be to utilise XSLT.
Okay, so I’d never actually used XSLT before, but how hard could it be?
TEI to XHTML Conversion using XSLT
In June 2008 I set to work on teaching myself this new language, XSLT, getting thoroughly confused in the process. So after a few weeks I decided I needed help and while on a trip to London, I popped into Waterstones and bought the book XSLT 2.0 and XPath 2.0 by Michael Kay – only the paper edition though ;-)
This gave a big boost to my skills and from this point on I made quick progress…well, quick by my standards.