In my last post I talked about the epubBooks Project: how I plan to convert Project Gutenberg .txt eBooks to the ePub format and make them available for download from ePubBooks.com.
I already have in place a converter to transform the PG .txt files to a TEI Master Format and also an XSLT script to convert these into XHTML. The final task now is to create a converter for TEI to the ePub format.
Before I attempt to write this converter I need a much better understanding of how a book is laid out inside the ePub OEBPS Container Format (OCF) .zip archive. So I set about taking my XHTML output file and breaking it up into the appropriate parts, ready to be packaged into an .epub file.
On the whole this went fairly smoothly, although I did encounter a couple of issues, which I’ll explain at the end of this article.
A great way to understand how to make your own ePub Book is to download and examine a pre-existing book. My reference book was Jon Noring’s submission of “My Ántonia” by Willa Cather, found on the IDPF website.
After unzipping and examining the contents everything looked straightforward, so I went ahead and started editing Jon’s file into my own.
My first task was to split up the all-in-one XHTML file into separate chapters, title page, footnotes, etc., thus creating the OPS files. During this I added the appropriate header and footer (using My Ántonia as the guide), making sure I also included the correct link to the CSS file and giving each its own title.
As XHTML 1.1 can be used directly within an ePub document there was nothing to change within the text itself.
Once I had all my separate OPS parts I went ahead and started editing the ePub OPF file.
Again using Jon’s example as a guide, I entered all the book information (Title, Author, etc.) into the meta tags. An important tag to note is dc:identifier. For this you will need to create a unique identifier for the book/document. You can use anything you like here (including an ISBN) as long as it is completely unique. As this is just a test file I used the epubbooks.com domain name, the date and the time. (This ID will also be used in the NCX file.)
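As a rough illustration of the domain-plus-date-plus-time idea, here is a minimal Python sketch. The function name `make_identifier` and the exact layout of the string (hyphen-separated, with compact date and time) are my own assumptions for the example; any scheme works as long as the result is unique.

```python
from datetime import datetime, timezone


def make_identifier(domain: str, when: datetime) -> str:
    """Combine a domain name with a date and a time into one identifier string."""
    # Assumed layout: domain, then YYYYMMDD, then HHMMSS, joined by hyphens.
    return f"{domain}-{when:%Y%m%d}-{when:%H%M%S}"


# Example: an identifier based on the current UTC date and time.
book_id = make_identifier("epubbooks.com", datetime.now(timezone.utc))
print(book_id)
```

The same string would then be reused wherever the NCX file asks for the book's unique ID.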
Once I was happy with the data I went on to the manifest section and listed all the files used in the publication: cover, title page, introduction, chapters, footnotes, CSS style sheets, images and finally the NCX file.
The spine section lists the reading order for the book and was pretty straightforward.
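To make the relationship between the two sections concrete, here is a small Python sketch that builds a manifest and a spine from one file list. Apart from chapter001.xml, the ids and filenames (`cover.xml`, `style.css`, `toc.ncx` and so on) are hypothetical, and the function name `build_manifest_and_spine` is mine. It only produces the two fragments, not a complete OPF file.

```python
import xml.etree.ElementTree as ET

# Hypothetical file list in reading order: (id, href, media type, part of the spine?)
FILES = [
    ("cover", "cover.xml", "application/xhtml+xml", True),
    ("title", "title.xml", "application/xhtml+xml", True),
    ("chapter001", "chapter001.xml", "application/xhtml+xml", True),
    ("chapter002", "chapter002.xml", "application/xhtml+xml", True),
    ("css", "style.css", "text/css", False),
    ("ncx", "toc.ncx", "application/x-dtbncx+xml", False),
]


def build_manifest_and_spine(files, ncx_id="ncx"):
    """Return (manifest, spine) elements for the given file list."""
    manifest = ET.Element("manifest")
    # The spine's toc attribute points at the manifest id of the NCX file.
    spine = ET.Element("spine", {"toc": ncx_id})
    for item_id, href, media_type, in_spine in files:
        # Every file in the publication gets a manifest item.
        ET.SubElement(manifest, "item",
                      {"id": item_id, "href": href, "media-type": media_type})
        # Only readable content documents are referenced from the spine.
        if in_spine:
            ET.SubElement(spine, "itemref", {"idref": item_id})
    return manifest, spine


manifest, spine = build_manifest_and_spine(FILES)
print(ET.tostring(manifest, encoding="unicode"))
print(ET.tostring(spine, encoding="unicode"))
```

Keeping a single list as the source for both sections means a spine entry can never refer to an id that is missing from the manifest.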
Next I edited the NCX (Navigation Center eXtended) file. This provides the Reading System with the TOC listing and navigation links. Each entry is given an ID, a playOrder, a label and a filename. IDs should always be unique, and the ‘playOrder’ values start at “1” with no gaps in the sequence.
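Numbering the entries by hand is an easy place to introduce gaps or duplicates, so here is a hedged Python sketch that generates them instead. The function name `build_nav_map`, the `navpoint-N` id pattern and the chapter labels are assumptions made for the example. It emits only the navMap part of an NCX file.

```python
import xml.etree.ElementTree as ET


def build_nav_map(entries):
    """Build a navMap from (label, filename) pairs given in reading order."""
    nav_map = ET.Element("navMap")
    # enumerate(..., start=1) gives a playOrder that starts at 1 with no gaps.
    for play_order, (label, src) in enumerate(entries, start=1):
        nav_point = ET.SubElement(
            nav_map, "navPoint",
            {"id": f"navpoint-{play_order}", "playOrder": str(play_order)},
        )
        nav_label = ET.SubElement(nav_point, "navLabel")
        ET.SubElement(nav_label, "text").text = label
        ET.SubElement(nav_point, "content", {"src": src})
    return nav_map


nav_map = build_nav_map([
    ("Chapter 1", "chapter001.xml"),
    ("Chapter 2", "chapter002.xml"),
])
print(ET.tostring(nav_map, encoding="unicode"))
```

Deriving both the id and the playOrder from the same counter keeps the ids unique for free.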
There are a couple of important points to take note of here. The ‘Unique ID’ created in the OPF file (dc:identifier) needs to be included in the meta section of the NCX file. You will also need to adjust the <meta name="dtb:depth" content="1"/> value.
If you have an eBook with just chapters then the depth will be “1”. If you have an eBook that has Books, Chapters and Sections, then Book is Level 1, Chapters are Level 2 and Sections are Level 3. The more levels you have within your TOC, the higher the depth value you will need to state.
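The depth rule can be worked out mechanically. The sketch below assumes the TOC is held as nested (label, children) pairs, which is purely my own representation for the example, and counts the deepest level.

```python
def toc_depth(entries):
    """Return the nesting depth of a TOC given as (label, children) pairs."""
    if not entries:
        return 0
    # One level for these entries, plus the deepest level found beneath them.
    return 1 + max(toc_depth(children) for _label, children in entries)


# An eBook with just chapters.
chapters_only = [("Chapter 1", []), ("Chapter 2", [])]

# An eBook with Books, Chapters and Sections.
books = [("Book 1", [("Chapter 1", [("Section 1", [])])])]

for toc in (chapters_only, books):
    print(f'<meta name="dtb:depth" content="{toc_depth(toc)}"/>')
```

For the two sample structures this gives a depth of 1 and 3, matching the levels described above.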
The final editing needed was to set up links for the footnotes. As I’m storing the footnotes in a separate file I marked up its entry in the spine with linear="no", as this should be considered an “auxiliary” file.
Now all that was needed was to add the filename to the a tag in the footnotes.xml file, which in this case became chapter001.xml#fn-place-1, and in the chapter001.xml file I added the matching link to the footnotes file.
Creating the .epub file
There are a couple of rules to follow when creating your .zip (ePub) file:
- mimetype must be the first file in the .zip
- No compression is to be used on this file.
Once you have this file in place you can go ahead and add the rest of the content. Just make sure you retain the directory structure.
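The packaging rules above can be sketched with Python's standard zipfile module. The function name `build_epub` and its two parameters are assumptions for the example; it expects `source_dir` to hold the unpacked book and `epub_path` to live outside that directory so the output is not swept into itself.

```python
import os
import zipfile


def build_epub(source_dir: str, epub_path: str) -> None:
    """Zip source_dir into epub_path with mimetype first and uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as epub:
        # Rule 1 and 2: mimetype is the first entry and is stored, not deflated.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Archive names are relative to source_dir, so the directory
                # structure is retained; zip paths always use forward slashes.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                epub.write(full_path, arcname,
                           compress_type=zipfile.ZIP_DEFLATED)
```

A call such as `build_epub("mybook", "mybook.epub")` would then package a hypothetical `mybook` directory. Writing the mimetype entry from a string rather than from disk avoids accidentally including a trailing newline in it.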
Problems and further research
One thing to remember is that filenames are case sensitive. Make sure you use the same case as stated in your OPF and NCX files, otherwise the files will not be displayed.
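A quick way to catch this kind of slip is to compare the names you reference with the names actually on disk. The following Python sketch does that for a list of hrefs; the function name `find_case_mismatches` is hypothetical, and collecting the hrefs from the OPF and NCX files is left out.

```python
import os


def find_case_mismatches(hrefs, content_dir):
    """Return (href, actual_path) pairs where only the letter case differs."""
    actual = set()
    for root, _dirs, files in os.walk(content_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), content_dir)
            actual.add(rel.replace(os.sep, "/"))
    # Map each lower-cased path back to the spelling used on disk.
    lowered = {path.lower(): path for path in actual}
    mismatches = []
    for href in hrefs:
        path = href.split("#", 1)[0]  # ignore any fragment identifier
        if path not in actual and path.lower() in lowered:
            mismatches.append((href, lowered[path.lower()]))
    return mismatches
```

An href that matches exactly, or that does not match any file at all, is not reported; only the case-only differences are.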
When I created my XHTML version I had each TOC entry linking to the appropriate chapter, and if you clicked on the chapter heading you would be transported back to the TOC entry. When using Adobe Digital Editions (DE) on my desktop computer there did not seem to be a need to link back to the TOC, but until I get myself a Sony Reader or BeBook I won’t be able to test exactly how this works on a dedicated reader.
Although my .epub eBook displays perfectly well in Adobe DE, it fails on many points when tested against the epubcheck tool. Most of these seem related to undeclared entities (ndash) and some undefined fragment identifiers. I guess I’ll just need to get stuck into the specifications and see where I’m going wrong. I don’t think these are going to be major issues though.
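One possible workaround for the entity errors, which has not been checked against epubcheck here, is to swap named entities for numeric character references, since those need no declaration. The Python sketch below assumes that approach; the function name `to_numeric_entities` is mine, and it leaves XML's five built-in entities untouched.

```python
import re
from html.entities import name2codepoint

# These five are predefined in XML itself and never need declaring.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def to_numeric_entities(text: str) -> str:
    """Replace named entities such as &ndash; with numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave built-in or unknown names alone
        return f"&#{name2codepoint[name]};"

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


print(to_numeric_entities("1914&ndash;1918 &amp; after"))
```

Whether this is the right fix depends on what the specifications actually require, so treat it as a starting point for investigation rather than a confirmed solution.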
I hope this article has provided a nice overview of creating an ePub eBook. I still need to clean up these epubcheck errors, but once that’s done I can get on with writing the XSLT conversion script. I will likely do a follow-up article covering what was needed to validate against epubcheck, and I will try to write some more detailed articles on creating both the OPF and NCX files.