EPUB: Chapter and Verse – XML Prague 2011
Transcription
EPUB: Chapter and Verse
Tony Graham, Mentea, [email protected], http://www.mentea.net/
Mark Howe, [email protected]
Version 1.0 – XML Prague 2011 – 26-27 March 2011
© 2011 Mentea

Outline
• Bibles
• EPUB

Bibles 1

Handheld device 2

iPhone 3

Dedicated reader 4

Web browser 5

In the beginning, there was ... SFM 6
    \c 1
    \ms AU COMMENCEMENT
    \mr 1--11
    \s Dieu crée l'univers et l'humanité
    \p
    \v 1 Au commencement Dieu créa le ciel et la terre*fa*.
    \fm a \fr 1.1 \f |iAu commencement...:|x traduction la plus fréquente de ce verset. Elle est imitée de l'ancienne version grecque, à laquelle se réfère très probablement l'évangile de Jean (1.1). Mais le texte hébreu serait mieux rendu par |iQuand Dieu commença de créer le ciel et la terre... Dieu dit|x.
    \p
    \v 2 La terre était sans forme et vide, et l'obscurité couvrait l'océan primitif. Le souffle de Dieu se déplaçait à la surface de l'eau*fb*.
    \fm b \fr 1.2 \f Le v.$2 constitue en hébreu une sorte de parenthèse, le v.$1 ayant sa suite au v.$3. -- |isans forme et vide:|x l'expression hébraïque correspondante a donné en français |itohu-bohu|x. Le jeu de mots pourrait être rendu par |iun désert en désordre|x. -- |iLe souffle de Dieu:|x autre traduction |iun vent terrible|x.

OSIS 7
    <div type="book">
    <title short="GENÈSE" type="main">Genèse</title>
    <chapter osisID="Gen.1" sID="Gen.1" n="1"/>
    <div type="majorSection"><title>AU COMMENCEMENT</title>
    <title type="scope"><reference>1–11</reference></title>
    <title type="x-section">Dieu crée l’univers et l’humanité</title>
    <p><verse osisID="Gen.1.1" sID="Gen.1.1" n="1"/>Au commencement Dieu créa le ciel et la terre<note n="a" type="crossReference">
    <reference type="source" osisRef="Gen.1.1">1.1</reference>
    <hi type="italic">Au commencement... :</hi> traduction la plus fréquente de ce verset. Elle est imitée de l’ancienne version grecque, à laquelle se réfère très probablement l’évangile de Jean (1.1). Mais le texte hébreu serait mieux rendu par <hi type="italic">Quand Dieu commença de créer le ciel et la terre... Dieu dit</hi>.</note>.<verse eID="Gen.1.1"/></p>
    <p><verse osisID="Gen.1.2" sID="Gen.1.2" n="2"/>La terre était sans forme et vide, et l’obscurité couvrait l’océan primitif. Le souffle de Dieu se déplaçait à la surface de l’eau<note n="b" type="crossReference"><reference type="source" osisRef="Gen.1.2">1.2</reference>Le v. 2 constitue en hébreu une sorte de parenthèse, le v. 1 ayant sa suite au v. 3. – <hi type="italic">sans forme et vide :</hi> l’expression hébraïque correspondante a donné en français <hi type="italic">tohu-bohu</hi>. Le jeu de mots pourrait être rendu par <hi type="italic">un désert en désordre</hi>. – <hi type="italic">Le souffle de Dieu :</hi> autre traduction <hi type="italic">un vent terrible</hi>.</note>.
    <verse eID="Gen.1.2"/>

EPUB 8
• XHTML
    <body class="chapter" id="top"><div class="book">
    <h2 class="majorSection nolinegap">AU COMMENCEMENT</h2>
    <h2 class="scope wholelinegap"><span class="reference">1–11</span></h2>
    <h2 class="x-section halflinegap">Dieu crée l’univers et l’humanité</h2>
    <p class="p"><a class="verse" id="vGen.1.1"/><span class="displayReference">
    <span class="chapterBookName"><a class="chapter-to-index-link" href="Gen.xml#cGen.1">GEN</a></span><span class="firstChapterNumber">1</span>
    </span>Au commencement Dieu créa le ciel et la terre
• NCX
    <navPoint class="category" id="navpoint-3" playOrder="3">
    <navLabel><text>ANCIEN TESTAMENT</text></navLabel>
    <content src="Gen.xml"/><navPoint class="category" id="navpoint-3" playOrder="3">
    <navLabel><text>LE PENTATEUQUE</text></navLabel>
    <content src="Gen.xml"/><navPoint class="book" id="navpoint-3" playOrder="3">
    <navLabel><text>Genèse</text></navLabel>
    <content src="Gen.xml"/></navPoint>
• OPF
    <spine toc="ncx"><itemref idref="titlepage" linear="yes"/>
    <itemref idref="copyright" linear="yes"/>
    <itemref idref="toc" linear="yes"/>
    <itemref idref="bk-Gen" linear="yes"/>
    <itemref idref="intro-Gen1" linear="yes"/>
    <itemref idref="ch-Gen-1" linear="yes"/>

How hard can it be? 9
• Turn SFM for five Bibles into EPUB
• We had sample files!
• Bible is well structured:
  • Testaments
  • Books
  • Chapters
  • Paragraphs and poetry
  • Verses
• SFM is documented

Bibles contain... 10
• Testaments
• Introductions
• Glossaries
• Notes
• Copyright statements
• Title pages

Testaments contain... 11
• Books
  • Different in different translations
• Introductions
• Book groups
  • E.g., Pentateuch, minor prophets, etc.
  • With introductions

Books contain... 12
• Chapters
• Introductions
  • Maybe multiple
• Sections
• Footnotes
• Notes

Chapters contain... 13
• Paragraphs
• Poetry
• Lists
• Tables
• Sections
• Footnotes
• Other notes
  • Source in external OSIS files

Paragraphs, poetry, etc., contain... 14
• Verses
• Titles
• Chapters
• Selah (pause markers)
• Hebrew annotations

Verses contain... 15
• Text
• Highlights
• Glossary terms
• Footnotes
• Other notes
  • Source in external OSIS files

Numbers and sequences 16
• Different books
  • Catholic Deuterocanonicals
  • One Daniel or two?
  • Psalm 151?
• Different hierarchy of books

Chapter numbers 17
• Start at 1
• Are numeric
• Are present
• One per chapter

Verse numbers 18
• Start at 1
• Are all present
• One per verse
• Are consecutive

Standard Format Markers (SFM) 19
    \c 1
    \ms AU COMMENCEMENT
    \mr 1--11
    \s Dieu crée l'univers et l'humanité
    \p
    \v 1 Au commencement Dieu créa le ciel et la terre*fa*.
• Now standardising as “Unified Standard Format Markers” (USFM)
• “The first task in preparing to convert SFM files to OSIS is to clean the text.
  The more regular your source files are, the more likely the conversion process will operate correctly.”
  • Wasn't an option

Balises 20

SFM codes 21
• Are documented
• Start a line
    \v 1 Commencement*µ* de la *y*création*fa**µ* par Dieu du ciel et de la terre. \fm a \fr 1.1 \f Il ...
• Mean the same
    \s L'Homme // et la Femme en Éden
    \f 10.2//; 17.1; 20.20//; 27.56; Mc 1.29
• Are codes, not formatting
    \v 6|ia|x
    \v (1)
    \v [21
• Still were more similar than different
  • Despite some used in one book in one translation

How we did it 22
• XSLT 2.0
  • Turn text into XML
  • Chain of transformations
  • Nearly 50 transforms
    • Some for all books
    • Some for one chapter of one book
• Ant
  • Handle configuration
  • Run XSLT

Transitions 23
• From SFM ...
    \c 1
    \ms AU COMMENCEMENT
    \mr 1--11
    \s Dieu crée l'univers et l'humanité
    \p
    \v 1 Au commencement Dieu créa le ciel et la terre*fa*.
• ... to lines ...
    <t:line>\c 1</t:line>
    <t:line>\ms AU COMMENCEMENT</t:line>
    <t:line>\mr 1--11</t:line>
    <t:line>\s Dieu crée l'univers et l'humanité</t:line>
    <t:line>\p</t:line>
    <t:line>\v 1 Au commencement Dieu créa le ciel et la terre*fa*.</t:line>

Transitions 24
• ... to partial OSIS ...
    <div type="book"><title short="GENÈSE" type="main">Genèse</title>
    <chapter osisID="Gen.1" sID="Gen.1" n="1"/><div type="majorSection">
    <title>AU COMMENCEMENT</title><t:line>\mr 1–11</t:line>
    <title type="x-section">Dieu crée l’univers et l’humanité</title>
    <t:line>\p <verse osisID="Gen.1.1" sID="Gen.1.1" n="1"/>Au commencement Dieu créa le ciel et la terre<note n="a" ID="nGen.1.1">
    <reference type="source" osisRef="Gen.1.1">1.1</reference>|iAu commencement... :|x traduction la plus fréquente de ce verset. Elle est imitée de l’ancienne version grecque, à laquelle se réfère très probablement l’évangile de Jean (1.1). Mais le texte hébreu serait mieux rendu par |iQuand Dieu commença de créer le ciel et la terre... Dieu dit|x.</note>.</t:line>
• ... to OSIS
    <div type="book"><title short="GENÈSE" type="main">Genèse</title>
    <chapter osisID="Gen.1" sID="Gen.1" n="1"/><div type="majorSection">
    <title>AU COMMENCEMENT</title><title type="scope"><reference>1–11</reference></title>
    <title type="x-section">Dieu crée l’univers et l’humanité</title>
    <p><verse osisID="Gen.1.1" sID="Gen.1.1" n="1"/>Au commencement Dieu créa le ciel et la terre<note n="a" ID="nGen.1.1">
    <reference type="source" osisRef="Gen.1.1">1.1</reference>
    <hi type="italic">Au commencement... :</hi> traduction la plus fréquente de ce verset. Elle est imitée de l’ancienne version grecque, à laquelle se réfère très probablement l’évangile de Jean (1.1). Mais le texte hébreu serait mieux rendu par <hi type="italic">Quand Dieu commença de créer le ciel et la terre... Dieu dit</hi>.</note>.
    <verse eID="Gen.1.1"/></p>

Transform from the outside in 25
• Introduction and body
• Sections
• Chapters
• Verse start milestones
• Notes
• Tables, poetry, lists, paragraphs
• Highlights and glossary terms
• Verse end milestones

OSIS 26
• Open Scriptural Information Standard
• Developed by linguistic, theological, and XML experts
• Strong TEI influence
• Designed for subject matter experts and XML beginners

Where OSIS worked well 27
• “Non-intuitive” numbering
  • chapter/@n and verse/@n record number
  • Optional in schema
  • Very necessary in practice
• Multiple verses in one sentence
  • OSIS ID scheme: Gen.1.1, John.3.16, etc.
  • verse/@osisID can have multiple IDs
  • Separate @sID and @eID for use with milestones
• Split verses
  • <verse osisID="Zac.4.6!a" n="6a"/>

Where OSIS didn't work well 28
• No verse start/end in highlighted text
  • 5½ chapters in Daniel all highlighted text
  • Except for “mene mene tekel parsin”
  • Added yet another stage just to split highlights
• No titles in poetry line groups
  • Our translations put titles everywhere
  • Fixed by hand

EPUB 29
• What is it?
• OSIS to EPUB

What is an eBook? An EPUB? 30
• eBook – Electronic book
  • Book delivered electronically over the Internet or to handheld reading devices
  • PDF
  • “Website-in-a-file”
• EPUB – Standard eBook format
  • International Digital Publishing Forum
  • Current: 2.0.1
  • Next: EPUB 3, due mid-2011

Advantages 31
• Fewer dead trees
• Back catalog always available
• Many free books available
• Out of copyright books often free

Disadvantages 32
• Can be as expensive as physical book
• You don't own the book in the same way
  • Amazon deleted 1984 from Kindles
• Can't read during takeoff and landing
• Reading devices aren’t environmentally friendly

eBook growth 33
• US$119.7 Million wholesale in Q3 2010
• Amazon sold more eBooks than paperbacks in Q4 2010
• Third-generation Kindle bestselling product in Amazon’s history

EPUB file format 34
• Zip file
• mimetype file provides EPUB signature
  • First file in archive
  • Uncompressed
  • Must be “application/epub+zip”
• Text is XHTML
• “OPF” file for manifest and spine
• “NCX” file for Table of Contents

OSIS to EPUB XHTML 35
[Diagram: Books (Gen, Mac1, John), Ext. Notes, and Glossary are combined into four EPUB variants: Catho, Notes; Catho, No Notes; Prot, Notes; Prot, No Notes]

Glossary cross-references to notes 36
[Diagram: the same four variants (Catho, Notes; Catho, No Notes; Prot, Notes; Prot, No Notes), with glossary versions labelled v1, v2, v3]

Putting the EPUB together 37

Bible contents 38
    <contents id="bfc-catho">
      <i18n>
        <label id="ot">ANCIEN TESTAMENT</label>
        <label id="penta">LE PENTATEUQUE</label>
        <label id="history">LIVRES HISTORIQUES</label>
        <label id="poetry">LIVRES POÉTIQUES</label>
        <label id="prophets">LIVRES PROPHÉTIQUES</label>
        <label id="nt">NOUVEAU TESTAMENT</label>
      </i18n>
      <books>
        <category id="ot">
          <include id="catho-ot"/>
        </category>
        <category id="nt">
          <include id="abf-nt"/>
        </category>
      </books>
    </contents>

NCX <navMap> = Table of Contents 39
    <navMap>
    <navPoint class="titlepage" id="navpoint-1" playOrder="1">
    <navLabel> <text>La Bible en français courant</text> </navLabel>
    <content src="titlepage.xml"/>
    </navPoint>
    <navPoint class="copyright" id="navpoint-2" playOrder="2">
    <navLabel> <text>Conditions générales d'utilisation</text> </navLabel>
    <content src="copyright.xml"/>
    </navPoint>
    <navPoint class="category" id="navpoint-3" playOrder="3">
    <navLabel> <text>ANCIEN TESTAMENT</text> </navLabel>
    <content src="Gen.xml"/>
    <navPoint class="category" id="navpoint-3" playOrder="3">
    <navLabel> <text>LE PENTATEUQUE</text> </navLabel>
    <content src="Gen.xml"/>
    <navPoint class="book" id="navpoint-3" playOrder="3">
    <navLabel> <text>Genèse</text> </navLabel>
    <content src="Gen.xml"/>
    </navPoint>
    ...
    </navMap>

OPF <manifest> 40
    <manifest>
    <!-- CSS Style Sheets -->
    <item id="main-css" href="css/book.css" media-type="text/css"/>
    <item id="local-css" href="css/book-local.css" media-type="text/css"/>
    <!-- Metadata images. Not to be included in spine. -->
    <item id="images-217-2-jpg" href="images/217-2.jpg" media-type="image/jpeg"/>
    <!-- NCX -->
    <item id="ncx" href="epb.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="titlepage" href="titlepage.xml" media-type="application/xhtml+xml"/>
    <item id="copyright" href="copyright.xml" media-type="application/xhtml+xml"/>
    <item id="toc" href="toc.xml" media-type="application/xhtml+xml"/>
    <item id="glossaire" href="glossaire.xml" media-type="application/xhtml+xml"/>
    <item id="bk-Gen" href="Gen.xml" media-type="application/xhtml+xml"/>
    <item id="intro-Gen1" href="Gen-intro1.xml" media-type="application/xhtml+xml"/>
    <item id="ch-Gen-1" href="Gen-1.xml" media-type="application/xhtml+xml"/>
    <item id="notes-Gen-1" href="Gen-1-notes.xml" media-type="application/xhtml+xml"/>

OPF <spine> = Linear reading order 41
    <spine toc="ncx">
    <itemref idref="titlepage" linear="yes"/>
    <itemref idref="copyright" linear="yes"/>
    <itemref idref="toc" linear="yes"/>
    <itemref idref="bk-Gen" linear="yes"/>
    <itemref idref="intro-Gen1" linear="yes"/>
    <itemref idref="ch-Gen-1" linear="yes"/>
    <itemref idref="ch-Gen-2" linear="yes"/>

Where XSLT 2.0 worked well 42
• Hard logic made easy, e.g., inserting verse end milestones
    <!-- true() only if $text is child of the right kind of element
         and within a chapter. -->
    <xsl:function name="t:versable" as="xs:boolean">
      <xsl:param name="text" as="text()" />
      <xsl:sequence select="exists($text/preceding::o:chapter[1]) and
          empty($text/ancestor::o:title[not(@canonical = 'true')]) and
          empty($text/ancestor::o:note) and
          empty($text/ancestor::o:speaker) and
          empty($text/ancestor::o:w) and
          exists(for $element in $text/ancestor::*
                 return if (namespace-uri($element) eq namespace-uri($o:ns) and
                            local-name($element) = $versable-elements)
                        then $element
                        else ())" />
    </xsl:function>

Where XSLT 2.0 didn't help 43
• “Moving” a node means copying plus dropping
    <xsl:template match="o:div[exists(t:opening-chapter-start(.))]" mode="move-up">
      <xsl:copy-of select="t:opening-chapter-start(.)" />
      <xsl:copy>
        <xsl:apply-templates select="@*|node()" mode="#current" />
      </xsl:copy>
    </xsl:template>
    <xsl:template match="o:chapter[t:is-opening-chapter-start(.)]" mode="move-up" />
• Not good for processing SFM highlight markup when content included notes and glossary terms

Further Information 44
• Open Scriptural Information Standard (OSIS) http://bibletechnologies.net/
• IDPF http://www.idpf.org/

Credits 45
• slide 33 – eBook sales – http://idpf.org/about-us/industry-statistics (accessed 22 February 2011)
• slide 33 – Amazon sales – http://phx.corporate-ir.net/phoenix.zhtml?c=97664&p=irol-newsArticle&ID=1521089 (accessed 22 February 2011)
• slide 32 – http://www.theregister.co.uk/2009/07/18/amazon_removes_1984_from_kindle/ (accessed 22 February 2011)
• slide 32 – http://xkcd.com/750/
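
Sketch: the EPUB zip signature
The "EPUB file format" slide (34) says the mimetype file must be the first file in the archive, uncompressed, and contain "application/epub+zip". The following is a minimal Python sketch of that rule only. It is not the authors' build, which ran XSLT from Ant. The function names and the example file name are hypothetical, and a complete EPUB needs further files (OPF, NCX, container description) that the caller would have to supply.

```python
import io
import zipfile

MIMETYPE = b"application/epub+zip"


def build_epub(target, files):
    """Write `files` (archive name -> bytes) to `target` as an EPUB-style zip.

    `target` may be a path or a binary file object. The mimetype entry is
    written first and stored without compression, as slide 34 requires.
    """
    with zipfile.ZipFile(target, "w") as archive:
        # First entry: the signature, stored uncompressed.
        archive.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        # Remaining entries can be compressed normally.
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


def has_epub_signature(source):
    """Return True if the first zip entry is an uncompressed, correct mimetype."""
    with zipfile.ZipFile(source) as archive:
        first = archive.infolist()[0]
        return (first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED
                and archive.read(first) == MIMETYPE)


if __name__ == "__main__":
    buffer = io.BytesIO()
    build_epub(buffer, {"Gen-1.xml": b"<html/>"})  # hypothetical content file
    buffer.seek(0)
    print(has_epub_signature(buffer))
```

Writing the mimetype entry before anything else matters because zip entries keep insertion order, so adding it later would leave it somewhere other than first.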
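
Sketch: from SFM to lines
The first step on the "Transitions" slide (23) wraps each SFM line in a t:line element before later stages recognise individual markers. The authors did this in XSLT 2.0; the following Python sketch only illustrates the idea. The namespace URI bound to "t", the t:lines wrapper element, and both function names are assumptions, since the slides do not show them.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI: the slides use the "t" prefix but never show its namespace.
T_NS = "http://example.com/ns/t"
ET.register_namespace("t", T_NS)


def sfm_to_lines(sfm_text):
    """Wrap every non-blank SFM line in a t:line element.

    Returns a t:lines wrapper element (the wrapper name is an assumption).
    """
    root = ET.Element(f"{{{T_NS}}}lines")
    for raw in sfm_text.splitlines():
        stripped = raw.strip()
        if stripped:
            ET.SubElement(root, f"{{{T_NS}}}line").text = stripped
    return root


def split_marker(line_text):
    """Split a line into its leading marker and the rest, e.g. '\\v' and '1 Au ...'.

    Only the line-initial marker is handled; slide 21 notes that real files
    also contain codes that do not start a line.
    """
    marker, _, rest = line_text.partition(" ")
    return marker, rest


if __name__ == "__main__":
    sample = "\\c 1\n\\ms AU COMMENCEMENT\n\\p\n\\v 1 Au commencement Dieu créa le ciel et la terre*fa*.\n"
    lines = sfm_to_lines(sample)
    print(ET.tostring(lines, encoding="unicode"))
    for element in lines:
        print(split_marker(element.text))
```

Keeping unrecognised text inside t:line elements lets each later transform convert only the markers it understands, which matches the partial OSIS on slide 24 where t:line and OSIS elements appear side by side.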