EPUB: Chapter and Verse – XML Prague 2011

EPUB
Chapter and Verse
Tony Graham
Mentea
http://www.mentea.net/
Mark Howe
Version 1.0 – XML Prague 2011 – 26-27 March 2011
© 2011 Mentea
Chapter and Verse
• Bibles
• EPUB
Bibles
1
Handheld device
2
iPhone
3
Dedicated reader
4
Web browser
5
In the beginning, there was ... SFM
6
\c 1
\ms AU COMMENCEMENT
\mr 1--11
\s Dieu crée l'univers et l'humanité
\p
\v 1 Au commencement Dieu créa le ciel et la terre*fa*.
\fm a
\fr 1.1
\f |iAu commencement...:|x traduction la plus fréquente de
ce verset. Elle est imitée de l'ancienne version grecque, à laquelle
se réfère très probablement l'évangile de Jean (1.1). Mais le
texte hébreu serait mieux rendu par |iQuand Dieu
commença de créer le ciel et la terre... Dieu dit|x.
\p
\v 2 La terre était sans forme et vide, et l'obscurité
couvrait l'océan primitif. Le souffle de Dieu se déplaçait à
la surface de l'eau*fb*.
\fm b
\fr 1.2
\f Le v.$2 constitue en hébreu une sorte de parenthèse,
le v.$1 ayant sa suite au v.$3. -- |isans forme et vide:|x
l'expression hébraïque correspondante a donné en français
|itohu-bohu|x. Le jeu de mots pourrait être rendu par |iun
désert en désordre|x. -- |iLe souffle de Dieu:|x autre
traduction |iun vent terrible|x.
OSIS
7
<div type="book">
<title short="GENÈSE" type="main">Genèse</title>
<chapter osisID="Gen.1" sID="Gen.1" n="1"/>
<div type="majorSection"><title>AU COMMENCEMENT</title>
<title type="scope"><reference>1–11</reference></title>
<title type="x-section">Dieu crée l’univers et l’humanité</title>
<p><verse osisID="Gen.1.1" sID="Gen.1.1" n="1"/>Au
commencement Dieu créa le ciel et la terre<note n="a"
type="crossReference"> <reference type="source"
osisRef="Gen.1.1">1.1</reference> <hi type="italic">Au
commencement... :</hi> traduction la plus fréquente de
ce verset. Elle est imitée de l’ancienne version
grecque, à laquelle se réfère très probablement
l’évangile de Jean (1.1). Mais le texte hébreu serait
mieux rendu par <hi type="italic">Quand Dieu commença de
créer le ciel et la terre... Dieu
dit</hi>.</note>.<verse eID="Gen.1.1"/></p>
<p><verse osisID="Gen.1.2" sID="Gen.1.2" n="2"/>La terre
était sans forme et vide, et l’obscurité couvrait
l’océan primitif. Le souffle de Dieu se déplaçait à la
surface de l’eau<note n="b"
type="crossReference"><reference type="source"
osisRef="Gen.1.2">1.2</reference>Le v. 2 constitue en
hébreu une sorte de parenthèse, le v. 1 ayant sa suite
au v. 3. – <hi type="italic">sans forme et vide :</hi>
l’expression hébraïque correspondante a donné en
français <hi type="italic">tohu-bohu</hi>. Le jeu de
mots pourrait être rendu par <hi type="italic">un désert
en désordre</hi>. – <hi type="italic">Le souffle de
Dieu :</hi> autre traduction <hi type="italic">un vent
terrible</hi>.</note>. <verse eID="Gen.1.2"/>
EPUB
8
• XHTML
<body class="chapter" id="top"><div class="book">
<h2 class="majorSection nolinegap">AU COMMENCEMENT</h2>
<h2 class="scope wholelinegap"><span class="reference">1–11</span></h2>
<h2 class="x-section halflinegap">Dieu crée l’univers et l’humanité</h2>
<p class="p"><a class="verse" id="vGen.1.1"/><span class="displayReference">
<span class="chapterBookName"><a class="chapter-to-index-link"
href="Gen.xml#cGen.1">GEN</a></span><span class="firstChapterNumber">1</span>
</span>Au commencement Dieu créa le ciel et la terre
• NCX
<navPoint class="category" id="navpoint-3" playOrder="3">
<navLabel><text>ANCIEN TESTAMENT</text></navLabel>
<content src="Gen.xml"/><navPoint class="category" id="navpoint-3" playOrder="3">
<navLabel><text>LE PENTATEUQUE</text></navLabel>
<content src="Gen.xml"/><navPoint class="book" id="navpoint-3" playOrder="3">
<navLabel><text>Genèse</text></navLabel>
<content src="Gen.xml"/></navPoint>
• OPF
<spine toc="ncx"><itemref idref="titlepage" linear="yes"/>
<itemref idref="copyright" linear="yes"/>
<itemref idref="toc" linear="yes"/>
<itemref idref="bk-Gen" linear="yes"/>
<itemref idref="intro-Gen1" linear="yes"/>
<itemref idref="ch-Gen-1" linear="yes"/>
How hard can it be?
9
• Turn SFM for five Bibles into EPUB
• We had sample files!
• Bible is well structured:
• Testaments
• Books
• Chapters
• Paragraphs and poetry
• Verses
• SFM is documented
Bibles contain...
10
• Testaments
• Introductions
• Glossaries
• Notes
• Copyright statements
• Title pages
Testaments contain...
11
• Books
• Different in different translations
• Introductions
• Book groups
• E.g., Pentateuch, minor prophets, etc.
• With introductions
Books contain...
12
• Chapters
• Introductions
• Maybe multiple
• Sections
• Footnotes
• Notes
Chapters contain...
13
• Paragraphs
• Poetry
• Lists
• Tables
• Sections
• Footnotes
• Other notes
• Source in external OSIS files
Paragraphs, poetry, etc., contain...
14
• Verses
• Titles
• Chapters
• Selah (pause markers)
• Hebrew annotations
Verses contain...
15
• Text
• Highlights
• Glossary terms
• Footnotes
• Other notes
• Source in external OSIS files
Numbers and sequences
16
• Different books
• Catholic Deuterocanonicals
• One Daniel or two?
• Psalm 151?
• Different hierarchy of books
Chapter numbers
17
• Start at 1
• Are numeric
• Are present
• One per chapter
Verse numbers
18
• Start at 1
• Are all present
• One per verse
• Are consecutive
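The numbering assumptions on the two slides above are easy to state and easy to test. The following is a minimal sketch in Python, not part of the authors' XSLT pipeline; the function name `check_verse_numbers` and the wording of its messages are hypothetical. It lists where the verse labels of one chapter break the "numeric, starting at 1, consecutive" assumptions.

```python
import re


def check_verse_numbers(labels):
    """Return a list of problems found in one chapter's verse labels.

    `labels` are the raw strings that follow each verse marker, in order.
    """
    problems = []
    expected = 1
    for label in labels:
        if re.fullmatch(r"\d+", label) is None:
            # Labels such as "(1)" are reported and otherwise ignored.
            problems.append(f"non-numeric verse label {label!r}")
            continue
        number = int(label)
        if number != expected:
            problems.append(f"expected verse {expected}, found {number}")
        # Resynchronise on the number actually found.
        expected = number + 1
    return problems
```

For example, `check_verse_numbers(["1", "3"])` reports that verse 2 was expected, while a clean chapter returns an empty list.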
Standard Format Markers (SFM)
19
\c 1
\ms AU COMMENCEMENT
\mr 1--11
\s Dieu crée l'univers et l'humanité
\p
\v 1 Au commencement Dieu créa le ciel et la terre*fa*.
• Now being standardised as “Unified Standard Format Markers” (USFM)
• “The first task in preparing to convert SFM files to OSIS is to clean the text. The more regular
your source files are, the more likely the conversion process will operate correctly.”
• Wasn't an option
Balises (French for “markers” or “tags”)
20
SFM codes
21
• Are documented
• Start a line
\v 1 Commencement*µ* de la *y*création*fa**µ* par
Dieu du ciel et de la terre. \fm a \fr 1.1 \f Il ...
• Mean the same
\s L'Homme // et la Femme en Éden
\f 10.2//; 17.1; 20.20//; 27.56; Mc 1.29
• Are codes, not formatting
\v 6|ia|x
\v (1)
\v [21
• Were still more similar than different
• Even though some codes were used in only one book of one translation
How we did it
22
• XSLT 2.0
• Turn text into XML
• Chain of transformations
• Nearly 50 transforms
• Some for all books
• Some for one chapter of one book
• Ant
• Handle configuration
• Run XSLT
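The "chain of transformations" idea can be sketched outside XSLT as well. The Python below is only an illustration under stated assumptions, not the authors' code: `run_chain`, `dashes` and `strip_trailing_space` are hypothetical names, and the `--` to en dash rule is inferred from the samples in this deck (`\mr 1--11` becoming `1–11`).

```python
from functools import reduce


def run_chain(document, transforms):
    """Apply each transform in order, feeding each result to the next."""
    return reduce(lambda doc, transform: transform(doc), transforms, document)


def dashes(text):
    # Hypothetical pass for all books: "--" in the source becomes an en dash.
    return text.replace("--", "\u2013")


def strip_trailing_space(text):
    # Hypothetical tidy-up pass: drop whitespace at the end of every line.
    return "\n".join(line.rstrip() for line in text.split("\n"))
```

Keeping each pass small means a fix for one chapter of one book can be a separate step in the list instead of a special case inside a general one.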
Transitions
23
• From SFM ...
\c 1
\ms AU COMMENCEMENT
\mr 1--11
\s Dieu crée l'univers et l'humanité
\p
\v 1 Au commencement Dieu créa le ciel et la terre*fa*.
• ... to lines ...
<t:line>\c 1</t:line>
<t:line>\ms AU COMMENCEMENT</t:line>
<t:line>\mr 1--11</t:line>
<t:line>\s Dieu crée l'univers et l'humanité</t:line>
<t:line>\p</t:line>
<t:line>\v 1 Au commencement
Dieu créa le ciel et la terre*fa*.</t:line>
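The first transition, from raw SFM text to one logical line per marker, can be sketched as follows. This is a Python illustration, not the authors' XSLT; `sfm_to_lines` is a hypothetical name, and the sketch assumes that a physical line without a leading backslash continues the previous marker's content.

```python
def sfm_to_lines(sfm_text):
    """Group physical lines into logical lines, one per line-initial marker."""
    logical = []
    for raw in sfm_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("\\") or not logical:
            # A backslash at the start of a line opens a new logical line.
            logical.append(line)
        else:
            # Anything else continues the previous marker's content.
            logical[-1] += " " + line
    return logical
```

Markers that appear in the middle of a line, such as `\fm a \fr 1.1 \f` after a verse, stay inside their logical line here and would need a later pass.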
Transitions
24
• ... to partial OSIS ...
<div type="book"><title short="GENÈSE" type="main">Genèse</title>
<chapter osisID="Gen.1" sID="Gen.1" n="1"/><div type="majorSection">
<title>AU COMMENCEMENT</title><t:line>\mr 1–11</t:line>
<title type="x-section">Dieu crée l’univers et l’humanité</title>
<t:line>\p <verse osisID="Gen.1.1" sID="Gen.1.1" n="1"/>Au
commencement Dieu créa le ciel et la terre<note n="a" ID="nGen.1.1">
<reference type="source" osisRef="Gen.1.1">1.1</reference>|iAu
commencement... :|x traduction la plus fréquente de
ce verset. Elle est imitée de l’ancienne version grecque, à laquelle
se réfère très probablement l’évangile de Jean (1.1). Mais le
texte hébreu serait mieux rendu par |iQuand Dieu
commença de créer le ciel et la terre... Dieu dit|x.</note>.</t:line>
• ... to OSIS
<div type="book"><title short="GENÈSE" type="main">Genèse</title>
<chapter osisID="Gen.1" sID="Gen.1" n="1"/><div type="majorSection">
<title>AU COMMENCEMENT</title><title type="scope"><reference>1–11</reference></title>
<title type="x-section">Dieu crée l’univers et l’humanité</title>
<p><verse osisID="Gen.1.1" sID="Gen.1.1" n="1"/>Au
commencement Dieu créa le ciel et la terre<note n="a" ID="nGen.1.1">
<reference type="source" osisRef="Gen.1.1">1.1</reference>
<hi type="italic">Au commencement... :</hi> traduction la plus fréquente de
ce verset. Elle est imitée de l’ancienne version grecque, à laquelle
se réfère très probablement l’évangile de Jean (1.1). Mais le
texte hébreu serait mieux rendu par <hi type="italic">Quand Dieu
commença de créer le ciel et la terre... Dieu dit</hi>.</note>.
<verse eID="Gen.1.1"/></p>
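The difference between the partial and the final OSIS above is largely the `|i ... |x` highlight markup becoming `<hi type="italic">`. A minimal Python sketch of that one step, assuming plain text with no nested markup inside the highlight (the function name `convert_italics` is hypothetical and this is not the authors' XSLT):

```python
import re

# Non-greedy so that each |i pairs with the nearest following |x.
HIGHLIGHT = re.compile(r"\|i(.*?)\|x", re.DOTALL)


def convert_italics(text):
    """Rewrite |i...|x runs as OSIS <hi type="italic"> elements."""
    return HIGHLIGHT.sub(r'<hi type="italic">\1</hi>', text)
```

A regular expression like this only holds up while the highlighted content is plain text; once notes or glossary terms sit inside the highlight, a string-level rewrite is no longer enough.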
Transform from the outside in
25
• Introduction and body
• Sections
• Chapters
• Verse start milestones
• Notes
• Tables, poetry, lists, paragraphs
• Highlights and glossary terms
• Verse end milestones
OSIS
26
• Open Scripture Information Standard
• Developed by linguistic, theological, and XML experts
• Strong TEI influence
• Designed for subject matter experts and XML beginners
Where OSIS worked well
27
• “Non-intuitive” numbering
• chapter/@n and verse/@n record number
• Optional in schema
• Very necessary in practice
• Multiple verses in one sentence
• OSIS ID scheme: Gen.1.1, John.3.16, etc.
• verse/@osisID can have multiple IDs
• Separate @sID and @eID for use with milestones
• Split verses
• <verse osisID="Zac.4.6!a" n="6a"/>
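The ID scheme on this slide is regular enough to parse and to generate mechanically. The Python sketch below is an illustration only: `parse_osis_id` and `verse_milestones` are hypothetical helpers, the pattern covers just the `Book.chapter.verse!part` shapes shown in this deck, and the milestone strings copy the attribute layout of the samples above.

```python
import re

OSIS_ID = re.compile(
    r"(?P<book>[^.!\s]+)\.(?P<chapter>\d+)"
    r"(?:\.(?P<verse>\d+))?(?:!(?P<part>\w+))?"
)


def parse_osis_id(osis_id):
    """Split an ID such as 'Zac.4.6!a' into book, chapter, verse and part."""
    match = OSIS_ID.fullmatch(osis_id)
    if match is None:
        raise ValueError(f"not a recognised osisID: {osis_id!r}")
    verse = match.group("verse")
    return {
        "book": match.group("book"),
        "chapter": int(match.group("chapter")),
        "verse": int(verse) if verse is not None else None,
        "part": match.group("part"),
    }


def verse_milestones(osis_id, n):
    """Return the start and end milestone tags for a single-ID verse."""
    start = f'<verse osisID="{osis_id}" sID="{osis_id}" n="{n}"/>'
    end = f'<verse eID="{osis_id}"/>'
    return start, end
```

Keeping `n` separate from the ID is what lets a printed label such as `6a` differ from the machine-readable `Zac.4.6!a`.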
Where OSIS didn't work well
28
• No verse start/end in highlighted text
• 5½ chapters in Daniel all highlighted text
• Except for “mene mene tekel parsin”
• Added yet another stage just to split highlights
• No titles in poetry line groups
• Our translations put titles everywhere
• Fixed by hand
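The extra stage that splits highlights can be pictured as follows. This is a Python sketch with `xml.etree.ElementTree`, not the authors' XSLT; `split_highlight` is a hypothetical name, namespaces are left out, and the sketch assumes that every child of the highlight is an empty verse milestone. The verse IDs in the usage example are made up.

```python
import xml.etree.ElementTree as ET


def split_highlight(hi):
    """Split one <hi> so that its milestones become siblings of the pieces."""
    pieces = []

    def add_piece(text):
        # Only create a highlight for runs that contain visible text.
        if text and text.strip():
            piece = ET.Element(hi.tag, dict(hi.attrib))
            piece.text = text
            pieces.append(piece)

    add_piece(hi.text)
    for child in hi:
        # Copy the milestone without its tail; the tail is highlighted text.
        pieces.append(ET.Element(child.tag, dict(child.attrib)))
        add_piece(child.tail)
    return pieces
```

Given `<hi type="italic">one<verse eID="Dan.3.1"/><verse sID="Dan.3.2"/>two</hi>`, the result is a highlight, two milestones, and a second highlight, so no milestone is left inside highlighted text. Reattaching the pieces to the parent, and any text that followed the original `<hi>`, is left to the caller.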
EPUB
29
• What is it?
• OSIS to EPUB
What is an eBook? An EPUB?
30
• eBook – Electronic book
• Book delivered electronically over the Internet or to handheld reading devices
• PDF
• “Website-in-a-file”
• EPUB – Standard eBook format
• International Digital Publishing Forum
• Current: 2.0.1
• Next: EPUB 3, due mid-2011
Advantages
31
• Fewer dead trees
• Back catalog always available
• Many free books available
• Out of copyright books often free
Disadvantages
32
• Can be as expensive as physical book
• You don't own the book in the same way
• Amazon deleted 1984 from Kindles
• Can't read during takeoff and landing
• Reading devices aren’t environmentally friendly
eBook growth
33
• US$119.7 Million wholesale in Q3 2010
• Amazon sold more eBooks than paperbacks in Q4 2010
• Third-generation Kindle bestselling product in Amazon’s history
EPUB file format
34
• Zip file
• mimetype file provides EPUB signature
• First file in archive
• Uncompressed
• Must be “application/epub+zip”
• Text is XHTML
• “OPF” file for manifest and spine
• “NCX” file for Table of Contents
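The container rules above (mimetype first, uncompressed, exact content) translate directly into a few lines with Python's standard `zipfile` module. This is a minimal sketch, not the authors' Ant build; `write_epub` and its `files` argument are hypothetical. A complete EPUB also needs a `META-INF/container.xml` entry that points at the OPF file, which the caller would pass in `files`.

```python
import zipfile


def write_epub(path, files):
    """Write an EPUB container; `files` maps archive names to bytes."""
    with zipfile.ZipFile(path, "w") as archive:
        # The signature entry comes first and is stored, not deflated.
        archive.writestr(
            "mimetype",
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        # Everything else may be compressed.
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

Writing the mimetype entry explicitly, before the loop, avoids relying on the ordering of whatever directory listing produced `files`.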
OSIS to EPUB XHTML
35
[Diagram: boxes labelled Books (Gen, John, Mac1), Ext. Notes, Note, Glossary and Gloss, grouped under “Catho, Notes”, “Catho, No Notes”, “Prot, Notes” and “Prot, No Notes”.]
Glossary cross-references to notes
36
[Diagram: the same boxes as on slide 35 (Books, Ext. Notes, Note, Glossary, Gloss), with Gloss boxes labelled v1, v2 and v3, grouped under “Catho, Notes”, “Catho, No Notes”, “Prot, Notes” and “Prot, No Notes”.]
Putting the EPUB together
37
Bible contents
38
<contents id="bfc-catho">
<i18n>
<label id="ot">ANCIEN TESTAMENT</label>
<label id="penta">LE PENTATEUQUE</label>
<label id="history">LIVRES HISTORIQUES</label>
<label id="poetry">LIVRES POÉTIQUES</label>
<label id="prophets">LIVRES PROPHÉTIQUES</label>
<label id="nt">NOUVEAU TESTAMENT</label>
</i18n>
<books>
<category id="ot">
<include id="catho-ot"/>
</category>
<category id="nt">
<include id="abf-nt"/>
</category>
</books>
</contents>
NCX <navMap> = Table of Contents
39
<navMap>
<navPoint class="titlepage" id="navpoint-1" playOrder="1">
<navLabel>
<text>La Bible en français courant</text>
</navLabel>
<content src="titlepage.xml"/>
</navPoint>
<navPoint class="copyright" id="navpoint-2" playOrder="2">
<navLabel>
<text>Conditions générales d'utilisation</text>
</navLabel>
<content src="copyright.xml"/>
</navPoint>
<navPoint class="category" id="navpoint-3" playOrder="3">
<navLabel>
<text>ANCIEN TESTAMENT</text>
</navLabel>
<content src="Gen.xml"/>
<navPoint class="category" id="navpoint-3" playOrder="3">
<navLabel>
<text>LE PENTATEUQUE</text>
</navLabel>
<content src="Gen.xml"/>
<navPoint class="book" id="navpoint-3" playOrder="3">
<navLabel>
<text>Genèse</text>
</navLabel>
<content src="Gen.xml"/>
</navPoint>
...
</navMap>
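A nested table of contents like the one above can be generated from a simple tree of entries. The Python sketch below is an illustration, not the authors' XSLT: `build_nav_map` and the `(class, label, src, children)` tuple layout are hypothetical, the NCX namespace is omitted as on the slide, and, as an assumption of this sketch, every navPoint gets its own sequential `id` and `playOrder`.

```python
import xml.etree.ElementTree as ET


def build_nav_map(entries):
    """Build a <navMap>; each entry is (css_class, label, src, children)."""
    nav_map = ET.Element("navMap")
    counter = 0

    def add(parent, entry):
        nonlocal counter
        css_class, label, src, children = entry
        counter += 1
        point = ET.SubElement(parent, "navPoint", {
            "class": css_class,
            "id": f"navpoint-{counter}",
            "playOrder": str(counter),
        })
        nav_label = ET.SubElement(point, "navLabel")
        ET.SubElement(nav_label, "text").text = label
        ET.SubElement(point, "content", {"src": src})
        # Children are numbered after their parent, in document order.
        for child in children:
            add(point, child)

    for entry in entries:
        add(nav_map, entry)
    return nav_map
```

Calling it with a category entry whose `children` list holds a book entry yields a book navPoint nested inside the category navPoint, mirroring the testament, book group, book nesting shown above.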
OPF <manifest>
40
<manifest>
<!-- CSS Style Sheets -->
<item id="main-css" href="css/book.css" media-type="text/css"/>
<item id="local-css" href="css/book-local.css" media-type="text/css"/>
<!-- Metadata images. Not to be included in spine. -->
<item id="images-217-2-jpg"
href="images/217-2.jpg" media-type="image/jpeg"/>
<!-- NCX -->
<item id="ncx"
href="epb.ncx" media-type="application/x-dtbncx+xml"/>
<item id="titlepage"
href="titlepage.xml" media-type="application/xhtml+xml"/>
<item id="copyright"
href="copyright.xml" media-type="application/xhtml+xml"/>
<item id="toc"
href="toc.xml" media-type="application/xhtml+xml"/>
<item id="glossaire"
href="glossaire.xml" media-type="application/xhtml+xml"/>
<item id="bk-Gen"
href="Gen.xml" media-type="application/xhtml+xml"/>
<item id="intro-Gen1"
href="Gen-intro1.xml" media-type="application/xhtml+xml"/>
<item id="ch-Gen-1"
href="Gen-1.xml" media-type="application/xhtml+xml"/>
<item id="notes-Gen-1"
href="Gen-1-notes.xml" media-type="application/xhtml+xml"/>
OPF <spine> = Linear reading order
41
<spine toc="ncx">
<itemref idref="titlepage" linear="yes"/>
<itemref idref="copyright" linear="yes"/>
<itemref idref="toc" linear="yes"/>
<itemref idref="bk-Gen" linear="yes"/>
<itemref idref="intro-Gen1" linear="yes"/>
<itemref idref="ch-Gen-1" linear="yes"/>
<itemref idref="ch-Gen-2" linear="yes"/>
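The manifest and the spine are two views of the same file list, so they can be produced together. A minimal Python sketch, assuming every XHTML document passed in belongs in the linear reading order; `build_manifest_and_spine` and its arguments are hypothetical, the OPF namespace is omitted, and this is not the authors' XSLT.

```python
import xml.etree.ElementTree as ET

XHTML = "application/xhtml+xml"


def build_manifest_and_spine(documents, extras=()):
    """documents: (id, href) in reading order; extras: (id, href, media_type)."""
    manifest = ET.Element("manifest")
    spine = ET.Element("spine", {"toc": "ncx"})
    # Style sheets, images and the NCX are listed but never enter the spine.
    for item_id, href, media_type in extras:
        ET.SubElement(manifest, "item",
                      {"id": item_id, "href": href, "media-type": media_type})
    for item_id, href in documents:
        ET.SubElement(manifest, "item",
                      {"id": item_id, "href": href, "media-type": XHTML})
        ET.SubElement(spine, "itemref", {"idref": item_id, "linear": "yes"})
    return manifest, spine
```

Generating both from one list keeps every `idref` in the spine pointing at an `id` that exists in the manifest.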
Where XSLT 2.0 worked well
42
• Hard logic made easy, e.g., inserting verse end milestones
<!-- true() only if $text is child of the right kind of element and
within a chapter. -->
<xsl:function name="t:versable" as="xs:boolean">
<xsl:param name="text" as="text()" />
<xsl:sequence
select="exists($text/preceding::o:chapter[1]) and
empty($text/ancestor::o:title[not(@canonical = 'true')]) and
empty($text/ancestor::o:note) and
empty($text/ancestor::o:speaker) and
empty($text/ancestor::o:w) and
exists(for $element in $text/ancestor::*
return if (namespace-uri($element) eq namespace-uri($o:ns) and
local-name($element) = $versable-elements)
then $element else ())" />
</xsl:function>
Where XSLT 2.0 didn't help
43
• “Moving” a node means copying plus dropping
<xsl:template
match="o:div[exists(t:opening-chapter-start(.))]"
mode="move-up">
<xsl:copy-of
select="t:opening-chapter-start(.)" />
<xsl:copy>
<xsl:apply-templates select="@*|node()" mode="#current" />
</xsl:copy>
</xsl:template>
<xsl:template
match="o:chapter[t:is-opening-chapter-start(.)]"
mode="move-up" />
• Not good for processing SFM highlight markup when content included notes and glossary
terms
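For contrast with the copy-plus-drop pattern above, a mutable tree API can move a node in place. The Python sketch below uses `xml.etree.ElementTree` and is an illustration only: `move_opening_chapter_up` is a hypothetical name, namespaces are omitted, and it assumes that an "opening chapter start" is a `chapter` element that is the first thing inside a `div`, which may not match the authors' `t:opening-chapter-start` exactly.

```python
import xml.etree.ElementTree as ET


def move_opening_chapter_up(parent):
    """Move a chapter milestone that opens a <div> to just before that <div>."""
    position = 0
    while position < len(parent):
        child = parent[position]
        first = child[0] if len(child) else None
        opens_with_chapter = (
            child.tag == "div"
            and first is not None
            and first.tag == "chapter"
            and not (child.text or "").strip()
        )
        if opens_with_chapter:
            child.remove(first)
            # Text that followed the milestone stays inside the div.
            child.text = (child.text or "") + (first.tail or "")
            first.tail = None
            parent.insert(position, first)
            position += 1  # step over the milestone just inserted
        position += 1
```

The in-place version is shorter, but it gives up the side-effect-free style that makes a long chain of XSLT passes easy to reason about.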
Further Information
44
• Open Scripture Information Standard (OSIS)
http://bibletechnologies.net/
• IDPF
http://www.idpf.org/
Credits
45
• slide 33 – eBook sales – http://idpf.org/about-us/industry-statistics (accessed 22 February 2011)
• slide 33 – Amazon sales – http://phx.corporate-ir.net/phoenix.zhtml?c=97664&p=irol-newsArticle&ID=1521089 (accessed 22 February 2011)
• slide 32 – http://www.theregister.co.uk/2009/07/18/amazon_removes_1984_from_kindle/ (accessed 22 February 2011)
• slide 32 – http://xkcd.com/750/