<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>epubBlog &#187; PG</title>
	<atom:link href="http://blog.epubbooks.com/tag/project-gutenberg/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.epubbooks.com</link>
	<description>epubBlog: EPUB eBook Help &#38; Resources</description>
	<lastBuildDate>Tue, 15 May 2012 08:49:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Google + Sony + Project Gutenberg = EPUB bliss!</title>
		<link>http://blog.epubbooks.com/403/google-sony-project-gutenberg-epub-bliss</link>
		<comments>http://blog.epubbooks.com/403/google-sony-project-gutenberg-epub-bliss#comments</comments>
		<pubDate>Sun, 22 Mar 2009 11:35:18 +0000</pubDate>
		<dc:creator>Mike Cook</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Books]]></category>
		<category><![CDATA[eReaders]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[PG]]></category>
		<category><![CDATA[Sony Reader]]></category>

		<guid isPermaLink="false">http://www.epubbooks.com/blog/?p=403</guid>
		<description><![CDATA[This last week has proved to be quite a week for the EPUB eBook format with announcements from Google, Sony and Project Gutenberg on their support for the EPUB format. Project Gutenberg EPUB Books Over at Project Gutenberg, Marcello Perathoner has been working hard to convert all the Gutenberg titles into the EPUB format. At [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>This last week has proved to be quite a week for the EPUB eBook format with announcements from Google, Sony and Project Gutenberg on their support for the EPUB format.</p>
<h3>Project Gutenberg EPUB Books</h3>
<p>Over at Project Gutenberg, Marcello Perathoner has been working hard to convert all the <a title="Gutenberg book in the EPUB format" href="http://www.gutenbergnews.org/20090320/epub-books-now-available-at-project-gutenberg/">Gutenberg titles into the EPUB format</a>. At this time these versions should be considered experimental, but after trying several different titles, they are all more than readable.</p>
<p>The books are converted where possible from the HTML version in the Gutenberg archives and for those titles without a HTML version, Marcello uses the plain .txt book. The plain .txt files at Gutenberg are notoriously inconsistent in their layout so converting these accurately is extremely difficult &#8212; I know this myself only too well. Perhaps it&#8217;s time Project Gutenberg embraced a <a title="What is an eBook &quot;Master Format&quot;" href="http://www.teleread.org/blog/2007/02/13/digital-text-masters-digitizing-the-classic-public-domain-books/">Master Format</a>.</p>
<p>What sets this apart from the other news (see below) is that all the Gutenberg books go through a proofreading process and so the accuracy is very high. This is why so many other eBook projects are based on the Gutenberg archives.</p>
<h3>Google and Sony partner to release 500,000 Public Domain EPUB Books</h3>
<p>Over the last few years Google has been scanning books by the million, making them available on their book search, but this is the first time they have made any of them available to an eBook reader. All the titles are in the public domain (pre-1923 titles only) and, once added to the current Sony Reader catalogue, they bring the total available titles to around 600,000, far surpassing Amazon Kindle&#8217;s 240,000 catalogue. Amazon still use their own proprietary eBook format and do not currently allow EPUB files to be read natively &#8211; conversion is needed first &#8211; but as the Google EPUB books all come DRM free, there are many tools out there that will allow you access to these on a Kindle or other reading platform.</p>
<p><span id="more-403"></span>As the press release states, &#8220;the publishing industry has more or less united on EPUB for e-book distribution&#8221;, so c&#8217;mon Amazon, let&#8217;s see some native EPUB support!</p>
<h4>Don&#8217;t have a Sony Reader?</h4>
<p>Although this announcement is aimed at Sony Reader owners, you don&#8217;t need to own a PRS-505 or PRS-700 to read these Google EPUB books. Just download the Sony &#8220;eBook Library&#8221; software to search and download. <a title="Teleread article that shows how to access Google EPUB books" href="http://www.teleread.org/2009/03/18/google-and-sony-team-up-to-provide-500000-public-domain-titles-in-epub-for-sony-reader-owners/">Paul Biba&#8217;s article over at TeleRead</a> shows how to access the books without owning a Reader.</p>
<h4>No access for non-U.S. based PRS-505 owners</h4>
<p>Yet again, this is a U.S. only release, at least if you already own a PRS-505. You can only access the Google books via Sony&#8217;s &#8220;eBook Library&#8221; software, but in my UK version we don&#8217;t have access to the Sony Book Store as we are only pointed to the Waterstone&#8217;s website. When I tried to install the new version (2.5) it told me it could not install in my region! I believe this is because I already have my UK reader registered, and that if I were to install on a second computer I would then have access to these titles.</p>
<p>If there are any non-U.S. PRS-505 users out there who have succeeded in accessing the archive, please let me know.</p>
<h4>Google EPUB Quality?</h4>
<p>As Google does not proofread their scanned books, relying completely on the OCR (Optical Character Recognition) software to produce the digital characters, the quality of the texts themselves is not as good as archives like Project Gutenberg or websites that take their books from PG, such as Feedbooks.com. Then again, we are getting 500K free books with the potential to access another million Google books in the future.</p>
<p>All in all, this has been a good week for the EPUB format.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.epubbooks.com/403/google-sony-project-gutenberg-epub-bliss/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>ePub Books Project Update &#8211; Still Alive and Kicking!</title>
		<link>http://blog.epubbooks.com/393/epub-books-project-update-alive-and-kicking</link>
		<comments>http://blog.epubbooks.com/393/epub-books-project-update-alive-and-kicking#comments</comments>
		<pubDate>Mon, 09 Feb 2009 20:46:06 +0000</pubDate>
		<dc:creator>Mike Cook</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[PG]]></category>
		<category><![CDATA[TEI]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://www.epubbooks.com/blog/?p=393</guid>
		<description><![CDATA[Over the last few weeks several people have emailed asking if epubBooks.com has been abandoned &#8211; the answer is a resounding No! Okay, I know I&#8217;ve not been very active recently so please accept my apologies for that. The reason for such limited activity is that I am working very hard on developing the new [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Over the last few weeks several people have emailed asking if <a href="http://www.epubbooks.com">epubBooks.com</a> has been abandoned &#8211; the answer is a resounding No! Okay, I know I&#8217;ve not been very active recently so please accept my apologies for that.</p>
<p>The reason for such limited activity is that I am working very hard on developing the new site for epubbooks.com &#8211; yes, the blog and current resources will still be accessible. The new site will allow you to download all kinds of different EPUB books, including many from the Project Gutenberg archives.</p>
<p>If you don&#8217;t know what my <a title="ePub Books Project outline" href="/161/the-epub-books-project-part-1-an-introduction">ePub Books Project</a> is, here is a short summary.</p>
<p>The project was started to provide free downloads of nicely formatted EPUB files, the majority of which will be taken from the Project Gutenberg archives. These will not just be plain TEXT files enclosed in an EPUB container, but texts fully converted to XML (of a TEI flavour), which is itself converted using XSLT into professional quality EPUB files. Here are some of the features:</p>
<ul>
<li>Properly formatted and displayed Chapter Titles/Subtitles.</li>
<li>Footnotes which are Linkable (forward and backward) for instant access.</li>
<li>Books with Illustrations will also be available.</li>
<li>Text Formatting (italics, etc.)</li>
<li>Nice indents for block quotes, letters of correspondence, epigraphs, etc.</li>
<li>&#8230;and many more features</li>
</ul>
<p>The new web application is the biggest project I&#8217;ve developed to date and so is naturally a challenge to my programming skills, which is why it&#8217;s taking some time to complete. However, things are going very well.</p>
<p>The basic skeleton of the site is up and running and I am now working on programming for usability. Of course it&#8217;s these less obvious items which are some of the hardest things to programme, so at this time I can&#8217;t give an accurate launch date. Rest assured it will be sooner rather than later.</p>
<p>Thanks for your patience and understanding and do keep checking back regularly for any new updates.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.epubbooks.com/393/epub-books-project-update-alive-and-kicking/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ePub Books Project Part 2: A Little History</title>
		<link>http://blog.epubbooks.com/168/epubbooks-project-part2-history</link>
		<comments>http://blog.epubbooks.com/168/epubbooks-project-part2-history#comments</comments>
		<pubDate>Mon, 15 Sep 2008 18:34:50 +0000</pubDate>
		<dc:creator>Mike Cook</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[PG]]></category>
		<category><![CDATA[TEI]]></category>

		<guid isPermaLink="false">http://www.epubbooks.com/blog/?p=168</guid>
		<description><![CDATA[My first suspicion that eBooks were going to be the future came way back in the day when all those CDs were coming out and taking over from our beloved Vinyl LP, but the real light-bulb moment was when I first discovered Project Gutenberg sometime at the end of the 1990s. If memory serves me correctly, [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>My first suspicion that eBooks were going to be the future came way back in the day when all those CDs were coming out and taking over from our beloved Vinyl LP, but the real <em>light-bulb</em> moment was when I first discovered Project Gutenberg sometime at the end of the 1990s.</p>
<p>If memory serves me correctly, I even considered trying to set up an eBook site back then. I believe what stopped me was that there were just not yet any decent reading devices available &#8211; reading from a computer monitor was, and still is, the most uncomfortable experience ever.</p>
<p>And so the years rolled by&#8230;</p>
<p>Then in 2004/05 I heard about the Sony Librie and immediately knew that the eBook’s time was coming&#8230;and soon!</p>
<p><span id="more-168"></span>It took me a couple of years to get the project off the ground, but toward the end of 2006 I was seriously working out how to become a part of the eBook revolution. At this time I also started as the Project Gutenberg Newsletter editor which allowed me to get among, and learn from, those who’d been there right from the start.</p>
<p>I spent many months trying to figure out what eBook format would be best suited as my Master Format, but after much research and some brain picking, I had my shortlist: Jon Noring’s BookX, TEI and the OPS (ePub) format.</p>
<p>At the time I actually rejected the ePub format as I felt trying to manage all those individual files would be just too much trouble &#8212; for my own tastes I still do. I liked the concept of BookX but felt my XML and DTD skills, which were non-existent at the time, would make it difficult.</p>
<h4>TEI as a Master Format</h4>
<p>I chose TEI not just because there is plenty of documentation, but also because Project Gutenberg was showing indications (albeit reluctant) that this could be an accepted format in their archives on a mass scale. Even if PG won’t accept it as a <em>Master</em> format, you know those PG volunteers are going to keep on producing TEI eBooks for the archives.</p>
<p>Another thing that really attracted me to TEI was that it utilises ODD: One Document Does it all.</p>
<p>I’ve continued to keep an eye out for alternative formats, even considering DTBook at one point, but I guess by sticking with the TEI format I can eventually make my files available for inclusion into the PG archives, so I stayed put.</p>
<p>In April 2007 I started teaching myself Perl and so work began on my pg2tei.pl script. The Gutenberg.org webmaster (Marcello Perathoner) kindly allowed me to use his gut2tei.pl script as a starting point and although the basic structure and a handful of routines remain the same, I’ve rewritten much of the original and added numerous new routines.</p>
<p>From this I created a sample base of around 70 PG eBooks in the TEI P4 format, converting over to TEI P5 when this was released in November 2007. I continue to make improvements and fix bugs wherever possible.</p>
<p>Alas, the script is not fully automated, although it does catch most things. The biggest manual work required includes:</p>
<ul>
<li><strong>&lt;teiHeader&gt;</strong> – Mostly automated but does need to be double-checked. The odd error happens so editing and adding changes for missing information is a must – luckily this normally takes just a minute or two, although the more awkward documents can take longer.</li>
<li><strong>Images</strong> – The inclusion of the <strong>&lt;figure&gt;</strong> tags is automatic but as none of the original TXT documents include filenames, I have to manually work out which image goes with which <strong>&lt;figure&gt;</strong> tag. This can be quite time consuming.</li>
<li><strong>Quote tags</strong> – This is probably the biggest consumption of time in the whole process. Although 99.99% of the <strong>&lt;q&gt;</strong> tags are correct, fixing that tiny percent can add many minutes to a conversion. Several times I’ve considered omitting the <strong>&lt;q&gt;</strong> tag mark-up, either leaving the original “double” and ‘single’ quotes in place or just replacing with a quote entity. However, I still feel the versatility this can offer makes them well worth the work.</li>
</ul>
<p>This whole process produces what I call my <strong>Super-Lite TEI</strong>: a set of around 22 TEI tags (excluding the &lt;teiHeader&gt;, &lt;front&gt; and &lt;back&gt; sections) and no more than a dozen attributes.</p>
<p><em>In the final article of the ePub Books Project, I’ll talk about the plan to convert to the ePub format and the future of the ePubBooks.com website.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.epubbooks.com/168/epubbooks-project-part2-history/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

