epubBooks Project Part 3: ePub conversion and epubBooks.com development

In October 2007 the IDPF elevated OPS 2.0 to an official standard, and from that point I realised that the ePub format might well be adopted worldwide as an eBook standard.

I started planning how to convert my TEI eBooks to the ePub format. After plenty of research I decided the best solution would be to use XSLT.

Okay, so I’d never actually used XSLT before, but how hard could it be?

TEI to XHTML Conversion using XSLT

In June 2008 I set to work on teaching myself this new language, XSLT, getting thoroughly confused in the process. After a few weeks I decided I needed help, so while on a trip to London I popped into Waterstones and bought the book XSLT 2.0 and XPath 2.0 by Michael Kay, though only the paper edition ;-)

This gave a big boost to my skills and from this point on I made quick progress…well, quick by my standards.

I worked through some examples from the book and a few I found online, but basically I got stuck right in with converting my TEI files. I decided on outputting straight-up XHTML, as I am quite advanced in coding XHTML and CSS and would be able to understand better what was going on.

After many false starts and rewrites I finally finished writing my Super-Lite TEI to XHTML conversion script. Here’s a rundown of some of the features:

  • Simple front matter section: Title, Sub-title, Author(s), Illustrator(s), Publisher(s), original publication date (where given in the source), etc.
  • TOC and Footnotes – generated automatically.
  • Book, Volume, Part and Chapter sections.
  • Paragraphs, Tables, Quotes [within Quotes within Quotes], Italics, Superscript, etc.
  • Images with headings and descriptions, when available.
  • Blockquotes – enclosing letters of correspondence, line-groups and other quote passages.
  • Epigraph, Preface, Introduction, Prologue, etc.
  • Page breaks, thought breaks, etc.
  • ID and class attributes added where useful, as hooks for the CSS stylesheet.
  • Text alignment: center, right, indenting, etc.
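
To give a feel for what such a conversion involves, here is a rough sketch of the element-by-element mapping idea. It is not the script described above (that one is written in XSLT and is not shown in this post); it is a Python stand-in, and the TEI element names, the rend values and the tag mapping are assumptions chosen for illustration. It also ignores XML namespaces, which a real TEI file may use.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a few TEI element names to XHTML element names.
TAG_MAP = {"div": "div", "head": "h2", "p": "p", "q": "q", "quote": "blockquote"}

# Hypothetical mapping from <hi rend="..."> values to XHTML inline elements.
REND_MAP = {"italic": "em", "sup": "sup"}


def convert(node):
    """Recursively copy one TEI element into its XHTML counterpart."""
    if node.tag == "hi":
        # TEI marks highlighted text with <hi rend="...">.
        out = ET.Element(REND_MAP.get(node.get("rend", ""), "span"))
    else:
        # Unknown elements fall back to a plain <span>.
        out = ET.Element(TAG_MAP.get(node.tag, "span"))
        if node.tag == "div" and node.get("type"):
            # Carry the section type over as a class for CSS styling.
            out.set("class", node.get("type"))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(convert(child))
    return out


def tei_to_xhtml(tei_source):
    """Convert a TEI fragment (string) into an XHTML fragment (string)."""
    root = ET.fromstring(tei_source)
    return ET.tostring(convert(root), encoding="unicode")


SAMPLE = (
    '<div type="chapter"><head>Chapter 1</head>'
    '<p>It was a <hi rend="italic">dark</hi> night.</p></div>'
)
print(tei_to_xhtml(SAMPLE))
```

An XSLT stylesheet expresses the same idea declaratively, with one template per source element rather than a lookup table.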

TEI to ePub Conversion using XSLT

This brings us to the current state of the project. The next task is to create ePub versions from my master TEI eBooks. I believe most of the hard work on the XSLT scripting was done with the TEI to XHTML converter, so with luck I should be able to get this next script out of the door pretty quickly.
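
For readers who have not looked inside an ePub file: it is a zip archive holding the XHTML content plus a few small XML files that describe it. The sketch below shows roughly what the packaging step could look like in Python. It is only an illustration under stated assumptions, not the planned conversion script: the file names (OEBPS/content.opf, toc.ncx, chapter1.xhtml), the single-chapter layout, the hard-coded language and the sample identifier are all hypothetical, and the output has not been run through a validator.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Tells a reading system where to find the package (OPF) file.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Minimal OPF 2.0 package: metadata, manifest of files, and reading order.
OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{book_id}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

# Minimal NCX table of contents with a single entry.
NCX_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{book_id}"/></head>
  <docTitle><text>{title}</text></docTitle>
  <navMap>
    <navPoint id="np1" playOrder="1">
      <navLabel><text>{title}</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""


def build_epub(title, book_id, chapter_xhtml):
    """Return the bytes of a one-chapter ePub archive."""
    fields = {
        "title": escape(title),
        "book_id": escape(book_id, {'"': "&quot;"}),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epub:
        # The mimetype entry must come first and must not be compressed.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        deflated = zipfile.ZIP_DEFLATED
        epub.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflated)
        epub.writestr("OEBPS/content.opf", OPF_TEMPLATE.format(**fields),
                      compress_type=deflated)
        epub.writestr("OEBPS/toc.ncx", NCX_TEMPLATE.format(**fields),
                      compress_type=deflated)
        epub.writestr("OEBPS/chapter1.xhtml", chapter_xhtml, compress_type=deflated)
    return buffer.getvalue()


# Hypothetical sample chapter and identifier, for illustration only.
SAMPLE_CHAPTER = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Sample</title></head>'
    "<body><h2>Chapter 1</h2><p>Hello.</p></body></html>"
)
epub_bytes = build_epub("Sample Book", "urn:uuid:00000000-0000-0000-0000-000000000000",
                        SAMPLE_CHAPTER)
```

Writing epub_bytes to a file with an .epub extension gives something a reading system can attempt to open; a real book would list every chapter, stylesheet and image in the manifest.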

Initially I will just work on books without images although I will try to get some converted early on. There will be a lot more work involved in producing image books and I would prefer to have a reasonable catalogue as quickly as possible.

During the learning phase I will be posting articles to the ePub Books blog following my progress. Although I’ll be working with my custom TEI format I’m hoping these short articles will still be useful to anyone who is considering creating ePub books but is not already an expert in related technologies.

Making ePub eBooks Freely available for Download

The next challenge will be to find a way to make all these ePub books available for everyone to download. I could put together a very simple website with the books listed, but I really want to make this not only super easy but also an enjoyable experience, so I’m currently searching for another solution.

I’ve looked over a number of open source projects, including Drupal, Joomla and WordPress in conjunction with extensions, and also some commercial applications. I wasn’t able to find anything suitable among the commercial options, and all the library scripts I found would still leave me hand coding far too much. But I think I have now found a solution that is part open source and part commercial. I’ll go into more detail once I’ve finalised this.

With luck I should have something ready and in place before the end of 2008. Once I reach this goal I can then concentrate on creating more eBooks from the Gutenberg archives. Of course I will start with the obvious Classics, which should keep me busy for a good few months, but I will also be setting up a system for book requests.

As books with images take longer to convert, I won’t be providing too many to start with. I do already have 15 titles in the TEI format, though, so I will make these available as soon as possible.

I hope you’ve found this set of articles interesting and are as excited as I am about the future of epubBooks.com.


2 thoughts on “epubBooks Project Part 3: ePub conversion and epubBooks.com development”

  1. TEI to epub is a development I’m interested in.

    You hit upon the key issue: how do you narrow down which TEI elements are important and which are not?

    I’m playing around with a hybrid solution for an ebook: use XSLT scripts from DocBook, etc. to produce static HTML files uploaded by FTP, and use PHP files in the same directory for the more dynamic parts. Kind of kludgy, but I think it will work. (But my needs are simple.)

    A naive question: why is it so hard to make ePub files that show images? (Feedbooks, etc. don’t seem to support it.)

  2. Marking up every possible thing in a text has its place for sure, but for our everyday usage I believe an ultra-lite TEI mark-up is more suitable, while still leaving the document open to more detailed tagging later. My hope has always been that PG will take this approach.

    Images in ePub shouldn’t be that difficult, but they are more time consuming. For my own project, preparing the image mark-up in my TEI master is what takes the time; once this is done I don’t see any difficulties in converting to ePub. I’m guessing this is the same scenario for Feedbooks. (I do have some questions about image sizes, so I’ll write a separate post on this soon.)

    If you only have a couple of books to convert, then I would say get your XHTML output and make the ePub version by hand. From XHTML it really shouldn’t take more than an hour or three. If you have many books then a proper pipeline is in order.

    My own scripts are pretty ugly things but ultimately the output is as it needs to be and that, I say, is the important part for the moment.
