Below is example <pageList> markup, valid per the NCX DTD, which can be used to mark up page numbers within ePub documents.
Note that <pageList> must be placed right after the required <navMap> and before the first optional <navList>. There may be at most one <pageList> and exactly one <navMap>, but any number of <navList> elements.
<navLabel><text>Paper Edition Page Mapping</text></navLabel>
<pageTarget id="page-iii" value="3" type="front" playOrder="82">
<!-- ... -->
<pageTarget id="page-105" value="105" type="normal" playOrder="192">
1. epubcheck 1.0.3 currently has a bug: it reports that <pageList> must include both the id and class attributes, although both should be optional. Hopefully that bug will get fixed. The above markup includes the optional id (generally a good idea) but not the class, so it will not validate with epubcheck 1.0.3.
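The placement rule described above (one <navMap>, then at most one <pageList>, then any <navList> elements) can be checked mechanically. The following is only a minimal sketch using Python's standard library; the function name `check_page_list_placement` and the sample document are hypothetical, and it is not a substitute for validating against the NCX DTD or running epubcheck.

```python
import xml.etree.ElementTree as ET

def check_page_list_placement(ncx_xml):
    """Return a list of placement problems for <pageList> (empty if none)."""
    root = ET.fromstring(ncx_xml)
    # Local names of the direct children of <ncx>, in document order
    # (the namespace prefix in "{uri}name" is stripped).
    children = [child.tag.split("}")[-1] for child in root]
    problems = []
    if children.count("navMap") != 1:
        problems.append("expected exactly one <navMap>")
    if children.count("pageList") > 1:
        problems.append("more than one <pageList>")
    if "pageList" in children:
        page_idx = children.index("pageList")
        if page_idx == 0 or children[page_idx - 1] != "navMap":
            problems.append("<pageList> must directly follow <navMap>")
        if "navList" in children and children.index("navList") < page_idx:
            problems.append("<pageList> must come before the first <navList>")
    return problems

# Hypothetical skeleton document, with element contents omitted for brevity.
SAMPLE = (
    '<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/">'
    "<head/><docTitle/><navMap/><pageList/><navList/>"
    "</ncx>"
)
print(check_page_list_placement(SAMPLE))
```

An empty list means the ordering looks right; it says nothing about whether the individual <pageTarget> entries are valid.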
A couple of weeks back we had a new release of the ePub validation tool, as the old one was not validating documents properly. epubcheck-1.0.3 was released to fix the XMLParser, which did not allow multiple validators to be added.
The error was first spotted by Jon Noring, who noticed that Adobe’s “page-map” attribute extension, which is used in the NCX, was being validated incorrectly. This extended markup can be used for mapping page numbers (to align with those in the paper book edition).
Jon Noring has posted to several communities about the page-map issue. Here’s a short extract (slightly edited):
Liza Daly from threepress.org has just released an article outlining problems she is having with users uploading invalid ePub documents to Bookworm, an online ePub book reader. It’s very important for anyone developing ePub eBooks to produce valid markup. Not only will Bookworm render valid files better, but you’ll also be covering yourself for any future rendering engines and conversions you might need to do.
It’s actually quite surprising how many errors are showing up in files submitted to Bookworm. You should go over to the threepress blog for a full explanation, but here’s a list of the main errors:
- Missing required attributes in the metadata
- Metadata that hasn’t been proofread
- Improper nesting of the ePub zip file
- Items declared in the OPF file that are missing from the archive
- Invalid XHTML
The first four points are really quite vital, although it is understandable for many documents to have invalid XHTML. Still, if it is within your means, I would try to keep that under control as best you can.
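One of the errors listed above, items declared in the OPF file that are missing from the archive, is easy to test for before uploading. Here is a rough sketch using only Python's standard library. The function name `find_missing_manifest_items` is hypothetical, and the sketch assumes the archive has a readable META-INF/container.xml pointing at an OPF package file; it does not check metadata or XHTML validity.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"

def find_missing_manifest_items(epub_path):
    """Return archive paths declared in the OPF manifest but absent from the zip."""
    with zipfile.ZipFile(epub_path) as archive:
        names = set(archive.namelist())
        # META-INF/container.xml names the OPF package file via full-path.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find(f".//{CONTAINER_NS}rootfile")
        if rootfile is None or not rootfile.get("full-path"):
            raise ValueError("container.xml does not name an OPF file")
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(archive.read(opf_path))
        # Manifest hrefs are relative to the folder holding the OPF file.
        opf_dir = posixpath.dirname(opf_path)
        missing = []
        for item in opf.iter(f"{OPF_NS}item"):
            href = unquote(item.get("href", ""))
            target = posixpath.normpath(posixpath.join(opf_dir, href))
            if target not in names:
                missing.append(target)
        return missing
```

Calling `find_missing_manifest_items("book.epub")` (with a file name of your own) returns an empty list when every declared item is present. A missing container.xml raises `KeyError`, which is itself a sign of improper nesting inside the zip file.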
I have plans to write some detailed articles regarding the creation of both the NCX and OPF files found in an ePub document, so keep a lookout for those.
In my last post I talked about the epubBooks Project and how I plan to convert Project Gutenberg .txt eBooks to the ePub format and how I will make these eBooks available for download from ePubBooks.com.
I already have in place a converter to transform the PG .txt files to a TEI Master Format and also an XSLT script to convert these into XHTML. The final task now is to create a converter for TEI to the ePub format.
Before I attempt to write this converter I will need a much better understanding of how a book is laid out inside the ePub OEBPS Container Format (OCF) .zip archive. So I set about taking my XHTML output file and breaking it up into the appropriate parts, ready to be packaged into an .epub file.
On the whole this went fairly smoothly, although I did encounter a couple of issues, which I’ll explain at the end of this article.
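For readers who want to try the packaging step themselves, here is a minimal sketch in Python of zipping an unpacked book folder into an .epub file. It is an illustration only, not the converter described in this article. The function name `package_epub` and the folder layout are assumptions: the folder is expected to already contain a `mimetype` file, META-INF/container.xml and the book content. The one OCF detail it takes care of is that `mimetype` is written first and stored uncompressed.

```python
import os
import zipfile

def package_epub(source_dir, epub_path):
    """Zip an unpacked book folder into an .epub file (hypothetical helper)."""
    mimetype_path = os.path.join(source_dir, "mimetype")
    with zipfile.ZipFile(epub_path, "w") as archive:
        # OCF expects "mimetype" as the first entry, stored without compression.
        archive.write(mimetype_path, "mimetype", compress_type=zipfile.ZIP_STORED)
        for folder, subdirs, files in os.walk(source_dir):
            subdirs.sort()  # visit subfolders in a stable order
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                # Archive names are relative to the folder and use "/" separators.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                archive.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Write the output file somewhere outside `source_dir`, otherwise the half-written archive would be picked up by the directory walk.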