**********************************************************************

BookX --> BookXHTML (conformant to XHTML 1.1)
Procedural Description

05-Mar-2007 version, supersedes 04-Mar-2007.

For reference, the latest draft BookX DTD is found at:

   http://www.bookx.org/dtd/bookx10.dtd


I. Assumptions and global requirements:
---------------------------------------

1) All BookX documents will be valid to the BookX DTD and meet ALL
   other requirements listed in the BookX DTD, including those which
   cannot be enforced by validation to the DTD. Thus, there is no need
   to pre-test for malformed BookX documents before transforming.
   A BookX conformance checker is being planned.

2) One requirement is that all BookX documents must be UTF-8 or UTF-16
   encoded, and the XHTML 1.1 output must use the same encoding as
   its source document.

3) The output XHTML 1.1 document must follow this "top-level
   template":

======================================================================
======================================================================

Notes on the above "template":

1) The encoding MUST be either UTF-8 or UTF-16, as given in the XML
   declaration in the source BookX document.

2) The language-countrycode applied to 'xml:lang' in <html> is that
   from the 'lang' attribute in the <bookx> root element.

**********************************************************************

II. Common attribute mapping
----------------------------

All BookX elements may, and in some cases must, include one, some, or
all of the Common attributes of 'id', 'lang', 'pubcomment' and
'privcomment'. When converting BookX to XHTML 1.1 as specified herein,
these are the Common attribute value mappings, except where noted to
the contrary elsewhere in this document (for the major exception, note
especially the section dealing with structural division headers where
'id's are shifted from the header titles to the wrapping <div>
):

   BookX          -->   XHTML 1.1
   --------------------
   'id'           -->   'id'
   'lang'         -->   'xml:lang'
   'pubcomment'   -->   'title'
   'privcomment'  -->   [must not be transferred, ignore]

**********************************************************************

III. XHTML 1.1 <head> elements
------------------------------

1) The XHTML <title> element is to be given the content of the first
   BookX <booktitle> (usually there will be only one <booktitle>).
   Ignore/remove any inline tags that might be present in <booktitle>
   (but keep all PCDATA!).

   For example, the following BookX:

      <booktitle>This is the <emph>Book</emph> Title</booktitle>

   maps to the XHTML 1.1:

      <title>This is the Book Title</title>

2) <meta/> elements

   There are two types of <meta/>:

   A) The required Content-Type <meta/>. It is placed right after
      <title> in <head>. According to the rules, XHTML 1.1 *should*
      have a media-type of "application/xhtml+xml" (for some
      applications this may need to be changed to "text/html", which
      is not an issue for this transformation). The 'charset' value is
      either "utf-8" or "utf-16", depending upon the encoding of the
      source BookX document.

   B) Those derived from the <bookinfo> metadata.

      All book metadata in <bookinfo> must be mapped to XHTML <meta/>
      elements. They are to be ordered in XHTML as they are ordered in
      the BookX document, and begin appearing after the required
      "Content-Type" <meta/>. There may be more than one entry for a
      particular metadata type in BookX.

      If there are any inline tags within the content of the metadata
      elements, they must first be stripped out as described above for
      the <booktitle> --> <title> (in <head>) mapping.

      Here are the mappings (and in their order) to embed into
      XHTML 1.1:

      a) <booktitle>[content]</booktitle>

         maps to

         <meta name="dc.title" content="[content]"/>

      b) <creator role="xxx">[content]</creator>

         maps to

         <meta name="dc.creator" scheme="marcrel.xxx" content="[content]"/>

         (Note that the optional 'file-as' attribute is to be ignored.
         Unfortunately, there's nowhere clean to put it in XHTML,
         although the XHTML 1.1 'title' attribute could be used.)

         (The 'role' attribute value is a three character string from
         the MARC Relator code list.)

      c) <publisher>[content]</publisher>

         maps to

         <meta name="dc.publisher" content="[content]"/>

      d) <imprint>[content]</imprint>

         maps to

         <meta name="dc.publisher" content="[content]"/>

      e) <copyright>[content]</copyright>

         maps to

         <meta name="dc.rights" content="[content]"/>

      f) <identifier ver="[N]">[content1]</identifier>

         maps to

         <meta name="dc.identifier" content="[content1]::ver=[N]"/>

      g) <description>[content]</description>

         maps to

         <meta name="dc.description" content="[content]"/>

      h) Add a language metadata item at this position (it won't be in
         the <bookinfo> section, but add it to XHTML <meta/> anyway):

         <meta name="dc.language" scheme="dcterms.rfc3066" content="xx-yy"/>

         Where "xx-yy" is the language/country-code (conforming to
         RFC3066) assigned by the 'lang' attribute in the <bookx> root
         element. Note that according to RFC3066, the value does not
         have to include a country code, and each field may be 3 (and
         possibly more) characters long rather than 2 -- so simply map
         the value and assume it follows RFC3066.

      i) <contributor role="xxx">[content]</contributor>

         maps to

         <meta name="dc.contributor" scheme="marcrel.xxx" content="[content]"/>

         (Note that the optional 'file-as' attribute is to be ignored.)

      j) <bookdate event="[content1]">[content2]</bookdate>

         maps to

         <meta name="dc.date.[content1]" scheme="dcterms.W3CDTF" content="[content2]"/>

      k) <subject scheme="[content1]">[content2]</subject>

         Set [content3] to one of the following:

         [content3] == [content1]     if [content1] not equal to "lcc" or "lcsh"
         [content3] == dcterms.lcc    when [content1]=="lcc"
         [content3] == dcterms.lcsh   when [content1]=="lcsh"

         (Note: "lcc" and "lcsh" for [content1] are case-insensitive,
         so check all case variations. The conversion to XHTML though
         should be to lower case, i.e.
"dcterms.lcc" and "dcterms.lcsh".) Then <subject> maps to: <meta name="dc.subject" scheme="[Content3]" content="[content2]"/> 3) Style sheet assignments using <link/> It is not yet determined how style sheet assignments using <link/> are to be added to the XHTML 1.1 documents, but since such assignments will not use any data from the BookX documents, it doesn't matter. We will figure this out at a later time (and of course it is pretty trivial.) When <link/> is included at a future time, it will be placed between <title> and the first <meta/>, so this item #3 is a little bit out of order. ********************************************************************** IV. Clean mappings ------------------ Many elements can be straightforwardly mapped without complications where the start tags and end tags are simply replaced (but still remember the Common attributes -- when they appear in any BookX element, they must be mapped as described in section II.) In approximate order of appearance in the BookX DTD: <sup>, <sub> and <code> remain "as is" (no mapping). 
   <quote>...</quote>                             --> <blockquote>...</blockquote>
   <listitem>...</listitem>                       --> <li>...</li>
   <epigraph-chap>...</epigraph-chap>             --> <div class="epigraph-chap">...</div>
   <verse>...</verse>                             --> <div class="verse">...</div>
   <stanza>...</stanza>                           --> <div class="stanza">...</div>
   <letter>...</letter>                           --> <div class="letter">...</div>
   <opener>...</opener>                           --> <div class="opener">...</div>
   <closer>...</closer>                           --> <div class="closer">...</div>
   <ending>...</ending>                           --> <div class="ending">...</div>
   <notestitle>...</notestitle>                   --> <h2 class="notestitle">...</h2>
   <notescommentary>...</notescommentary>         --> <div class="notescommentary">...</div>
   <note>...</note>                               --> <div class="note">...</div>
   <glossarytitle>...</glossarytitle>             --> <h2 class="glossarytitle">...</h2>
   <glossarycommentary>...</glossarycommentary>   --> <div class="glossarycommentary">...</div>
   <gloss>...</gloss>                             --> <dl class="gloss">...</dl>
   <term>...</term>                               --> <dt class="term">...</dt>
   <definition>...</definition>                   --> <dd class="definition">...</dd>
   <caption>...</caption>                         --> <div class="caption">...</div>
   <versetitle>...</versetitle>                   --> <h2 class="versetitle">...</h2>
   <verseauthor>...</verseauthor>                 --> <h3 class="verseauthor">...</h3>
   <signed>...</signed>                           --> <div class="signed">...</div>
   <dateline>...</dateline>                       --> <div class="dateline">...</div>
   <salute>...</salute>                           --> <div class="salute">...</div>
   <notemark>...</notemark>                       --> <div class="notemark">...</div>
   <themebreak/>                                  --> <div class="themebreak"></div>
   <emph>...</emph>                               --> <em class="emph">...</em>
   <emph-strong>...</emph-strong>                 --> <strong class="emph-strong">...</strong>
   <title>... --> ...
   ... --> ...
   ... --> ...

   (note, BookX should not map to XHTML since there are problems with
   that element because of quotation marks plus the default for XHTML
   is not italic. In BookX, is for quotations *intended* to be
   highlighted with the default being italic.)

   ... --> ...
   ... --> ...
   ... --> ...

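The clean mappings, combined with the Common attribute rules of section II, can be sketched as a simple lookup table. This is only an illustration in Python using the standard-library ElementTree module; the names CLEAN_MAP and map_clean are hypothetical, and the table holds just a subset of the rows listed above.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Subset of the section IV table: BookX name -> (XHTML tag, class or None).
CLEAN_MAP = {
    "sup": ("sup", None),
    "sub": ("sub", None),
    "code": ("code", None),
    "quote": ("blockquote", None),
    "listitem": ("li", None),
    "verse": ("div", "verse"),
    "stanza": ("div", "stanza"),
    "notestitle": ("h2", "notestitle"),
    "term": ("dt", "term"),
    "definition": ("dd", "definition"),
    "emph": ("em", "emph"),
    "emph-strong": ("strong", "emph-strong"),
}


def map_clean(elem):
    """Rename a BookX element in place and remap its Common attributes.

    Children that also appear in CLEAN_MAP are handled recursively;
    any other child is left untouched for other rules to process.
    """
    tag, cls = CLEAN_MAP[elem.tag]
    src = dict(elem.attrib)
    new_attrs = {}
    if cls is not None:
        new_attrs["class"] = cls
    if "id" in src:
        new_attrs["id"] = src["id"]
    if "lang" in src:
        new_attrs["{%s}lang" % XML_NS] = src["lang"]   # 'lang' -> 'xml:lang'
    if "pubcomment" in src:
        new_attrs["title"] = src["pubcomment"]         # 'pubcomment' -> 'title'
    # 'privcomment' is deliberately dropped (must not be transferred).
    elem.tag = tag
    elem.attrib.clear()
    elem.attrib.update(new_attrs)
    for child in elem:
        if child.tag in CLEAN_MAP:
            map_clean(child)
    return elem


if __name__ == "__main__":
    src = '<stanza id="s1" privcomment="check"><emph>O</emph> rose</stanza>'
    print(ET.tostring(map_clean(ET.fromstring(src)), encoding="unicode"))
```

Keeping the table as data rather than code means the remaining rows from the list above can be added without touching the function.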
********************************************************************** V. Cover Image --------------- When is present in BookX, the information is to be transferred to the
section as shown in the XHTML 1.1 template and as described below. The BookX: ... (note that 'id' and 'loititle' are required; ignore 'loiitem' if present.) maps to the XHTML 1.1:
...
For the contained , that maps to XHTML <img/>, with the 'source'
attribute mapping to the 'src' attribute. In addition, map the value of
'loititle' in the parent to both the 'alt' and 'title' attributes of
XHTML <img/>. (If 'pubcomment' is present on , ignore that.)

Also, if 'href' is present in , that means the image is to be hypertext
linked. In this case, wrap <img/> within <a>...</a>.

BookX Example:

(Again note that 'id' is always required on )

The above maps to the XHTML 1.1 (various white space has been added to
improve readability):

**********************************************************************

VI. Title Page (or frontispiece)
-------------------------------

This section of the XHTML 1.1 document is entirely reconstructed from
the BookX metadata. The parts are to be ordered as follows:

   a. Book Title(s) (required, from <booktitle>)
   b. Creator(s) (required, from <creator>)
   c. Publisher/Imprint (both optional, from <publisher> and <imprint>)
   d. Copyright (optional, from <copyright>)
   e. Identifier(s) (optional, from <identifier>)

Note that for this section, any inline elements in any of the BookX
metadata elements must be preserved, and mapped to their XHTML
equivalents as outlined elsewhere in this document.

a. For Book Title(s), map each <booktitle>...</booktitle> to

...

and wrap all of them (one or more) within a
...
. For example, if there are two :

First Line of Book Title

Second Line of Book Title

b. For Creator(s), map each ... to

...

and wrap all of them (one or more) within a
...
. In addition, for each creator transfer the value of the 'role' attribute (a three character MARC Relator code) into the 'class' attribute. For example, if there are two <creator> elements, one of which is an "aut" (for author) and the other an "ill" (for illustrator):

John Doe

Jane Doe

(Ignore the 'file-as' attribute if present in <creator>.)

c. For <publisher> and <imprint>, if either one or both are present, map their contents to (as appropriate):

Publisher Name

Imprint Name

d. For <copyright>...</copyright>, map contents to:

e. For <identifier>, the mapping is somewhat complicated. Each BookX <identifier> is mapped to
, and all of them are wrapped in
. Order of multiple identifiers, if present, is significant and must be preserved. The content of BookX <identifier>, which may only be character data (i.e., it cannot contain any inline elements), is to be modified using the value of the required 'ver' attribute, best explained by the following BookX example:

   <identifier ver="1">urn:uuid:6a2014b0-87a2-11da-a72b-0800200c9a66</identifier>
   <identifier ver="2">urn:isbn:90-70002-34-5</identifier>

maps to:
urn:uuid:6a2014b0-87a2-11da-a72b-0800200c9a66 (version 1)
urn:isbn:90-70002-34-5 (version 2)
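
The two renderings of an identifier described in this document (the "::ver=[N]" form used for the dc.identifier <meta/> in section III, and the "(version N)" form used on the title page) can be sketched as follows. The function names are hypothetical and chosen only for this illustration.

```python
def identifier_meta_content(content, ver):
    """Value for the 'content' attribute of the dc.identifier <meta/>."""
    return f"{content}::ver={ver}"


def identifier_title_page_text(content, ver):
    """Text shown for the identifier on the title page."""
    return f"{content} (version {ver})"


if __name__ == "__main__":
    # Identifiers in document order; order is significant and is preserved.
    identifiers = [
        ("urn:uuid:6a2014b0-87a2-11da-a72b-0800200c9a66", "1"),
        ("urn:isbn:90-70002-34-5", "2"),
    ]
    for content, ver in identifiers:
        print(identifier_title_page_text(content, ver))
```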
********************************************************************** VII. Table of Contents ---------------------- If there are any valid entries for the Table of Contents, as will be explained here, then include this section in the XHTML 1.1 document. Otherwise it can be left out or represented by the empty
. [NEED TO ADD THIS] ********************************************************************** VIII. List of Illustrations --------------------------- If there are any valid entries for the List of Illustrations, as will be explained here, then include this section in the XHTML 1.1 document. Otherwise it can be left out or represented by the empty
. [NEED TO ADD THIS] ********************************************************************** IX. Frontmatter portion ----------------------- In BookX, the optional section content maps to the XHTML 1.1
...
section (see template). The "frontmatter" section is to include the and content when either or both are present (when both are present, they appear in that order and will be mapped in that order.) Here are these element mappings: ... -->
...
... -->
...
Ignore the value of 'tocitem' if present. For either and , if 'toctitle' appears, use its value for the XHTML 1.1 'title' attribute, which overrides the value in the BookX 'pubcomment' attribute if that is also present. ********************************************************************** X. endmatter portion -------------------- In BookX, the optional section maps to the XHTML 1.1
...
section (see template). The endmatter section is to include the and content when either or both are present (when both are present, they appear in that order and will be mapped to XHTML 1.1 in that order.) Here are these element mappings: ... -->
...
and ... -->
...
Ignore 'tocitem' if present. In addition, if 'toctitle' is present, map its value to 'title' instead of the value in 'pubcomment' if that is also present. Example: ... maps to:
...
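
The precedence rule used in the frontmatter and endmatter sections ('toctitle' wins over 'pubcomment' for the XHTML 'title' attribute, and 'tocitem' is ignored) is small enough to state as a sketch. The function name is hypothetical; None stands for "attribute not present".

```python
def title_attribute(pubcomment=None, toctitle=None):
    """Return the XHTML 'title' value, or None if neither attribute is present.

    'toctitle' overrides 'pubcomment' when both are given.
    """
    if toctitle is not None:
        return toctitle
    return pubcomment


if __name__ == "__main__":
    print(title_attribute(pubcomment="Editor's remark", toctitle="Notes"))
```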
********************************************************************** XI. Some more complicated mappings ---------------------------------- 1)

====== BookX

straightforwardly maps to XHTML 1.1

. However, if BookX

includes the 'continuation' attribute, do the following: a) If the value of 'continuation' is "no", ignore this attribute; no information is transferred to XHTML 1.1

. b) If the value of 'continuation' is "yes", then add the 'class' attribute to XHTML

and set its value to "continuation". 2) ========= When the attribute 'ordered' is present and its value is "yes", then ... -->

    ...
Otherwise: ... -->
    ...
3) ============ ... maps to: ... In addition, if the 'vhireq' attribute is present in , all the space-separated NMTOKEN values in 'vhireq' are placed into the XHTML 'class' attribute along with the value for the 'type' attribute. For example: ... maps to XHTML 1.1: ... 4) and ============================ (This is the most complex transformation in BookX.) and are for embedding an image into a BookX document. In addition, may be present in . The top-level mapping is: (note that 'id' and 'loititle' are required.) maps to
...
In addition, place the value of the 'position' attribute (whether specified or its implied default value of 'inflow') into the class attribute, e.g.,
...
Finally, map the value of the required 'loititle' attribute to the 'alt' and 'title' attributes for the associated XHTML 1.1 element (see below). Ignore 'loiitem' if present. (For what to do with the 'assocwith' attribute, if present, refer to the section discussing hierarchical structural levels. Otherwise this value is not transferred anywhere to the XHTML 1.1
division.) For , that maps to XHTML , with the 'source' attribute mapping to the 'src' attribute. As noted above, map the 'loititle' value to the 'alt' and 'title' attributes (for 'title', this would override any given value for 'pubcomment' if present.) If 'href' is present in , that means the image is to be hypertext linked. In this case, wrap with ... element. BookX Example (including ): ------------------------------------

Headquarters of the Acme Corporation

(Note that 'id' is always required on ) The above maps to the XHTML 1.1:
Acme Corporation Headquarters

Headquarters of the Acme Corporation

5) BookX
===============

... --> ...

In addition, if the 'hiddenlink' attribute is present and its value is
"yes", then add the 'class' value of "hidden" to . For example:

... maps to

6)
==============

... maps to
...
where N in "indentN" is an integer from 0 to 6: a) If the 'indent' attribute is not present on , set N="0" b) If the 'indent' attribute is present, set its value, which is a single digit integer of 0 to 6, to the value of "N". For example: ... maps to
...
and ... maps to
...
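
The rule for choosing N in "indentN" can be sketched as below. The function name is hypothetical, and the range check is an extra safeguard added for this illustration; the procedure itself assumes valid BookX input.

```python
def indent_class(indent=None):
    """Return the "indentN" class token for an optional 'indent' value.

    A missing attribute (None) gives "indent0"; otherwise the value is a
    single digit from 0 to 6 and is used as N.
    """
    if indent is None:
        return "indent0"
    n = int(indent)
    if not 0 <= n <= 6:
        raise ValueError(f"'indent' must be 0 to 6, got {indent!r}")
    return f"indent{n}"


if __name__ == "__main__":
    print(indent_class())      # attribute absent
    print(indent_class("3"))   # attribute present
```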
7) and
============================

... maps to where * is one of three tokens:

   "lines-left"
   "lines-center"
   "lines-right"

Which token to use for * is determined by:

a) If the 'justify' attribute is not present on , use "lines-center"

b) If the 'justify' attribute is present on , use:

   justify="left"   --> "lines-left"
   justify="center" --> "lines-center"
   justify="right"  --> "lines-right"

Examples:

... -->
... -->

For ... maps to
, use "medium" b) If the 'size' attribute is present on , use: size="large" --> "large" size="medium" --> "medium" size="small" --> "small" Examples: ... -->
...
... maps to
...
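
The 'justify' and 'size' token selection described in this item can be sketched as two small lookups with defaults. The function names are hypothetical; None stands for "attribute not present".

```python
JUSTIFY_TOKENS = {
    "left": "lines-left",
    "center": "lines-center",
    "right": "lines-right",
}
SIZE_TOKENS = ("large", "medium", "small")


def justify_token(justify=None):
    """Class token for 'justify'; defaults to "lines-center" when absent."""
    if justify is None:
        return "lines-center"
    return JUSTIFY_TOKENS[justify]


def size_token(size=None):
    """Class token for 'size'; defaults to "medium" when absent."""
    if size is None:
        return "medium"
    if size not in SIZE_TOKENS:
        raise ValueError(f"unexpected 'size' value: {size!r}")
    return size


if __name__ == "__main__":
    print(justify_token("right"), size_token())
```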
8) <noteref>
============

... maps to ...

where [noteid] is the id associated with some note in the document.
A <noteref> must point to a <note> in the section. Since we want to
enable an actual link, a "#" must be placed in front of the noteref id,
making it a fragment identifier.

9) The four document division header titles: , , , and [and which may immediately follow any of the above four]
======================================================================

Transforming these four important document division header titles is
somewhat complicated, and it is important to do it right to ensure the
final XHTML 1.1 is properly and explicitly structured in a hierarchical
sense.

These four header titles define the start of a document division of
the specified hierarchical level. (Note that one may appear immediately
following any of these header titles.)

Each header title must include an 'id' attribute, and may include a
'tocitem' and a 'toctitle' (when either or both are present, they are
ignored for the purposes of this section).

The mapping to XHTML is as follows:

a. Each header title element (ignoring any attributes in the source
   headers for the moment) is mapped as follows:

... -->

...

... -->

...

... -->

...

... -->

...

b. Whenever a appears immediately after a header title, it is mapped to an XHTML header with the same XHTML header level as that precedes it. So each will map to one of these: ... -->

...

... -->

...

... -->

...

(e.g., if appears after , it will map to

. If appears after , it will map to

.) c. Each mapped header title and the content associated with it is to be wrapped within a
...
, where the class values must be "part", "chapter", "section" or "subsection". This will create a nesting when more than one hierarchy is used as the full example later on will show. For example: Chapter Title will map to:

Chapter Title

Note in this example that the 'id' value has been transferred from to the wrapping
, as noted next. d. The value of the required 'id' is transferred to the wrapping
, as seen in the above example. e. If there is an immediately preceding a header title of any level, the becomes a part of that header title division *if either* 'assocwith' is not present, or if 'assocwith' is present and has the value of "next". Otherwise, the is part of the content of the block immediately preceding the . Putting it all together, let's transform the following example BookX markup fragment, which, inherent to BookX, is totally flat, quite simple, and implies structural hierarchy: ====================================================================== Chapter 1

Some chapter 1 content.

Section 1

Some section 1 content.

Subsection 1

Some subsection 1 content.

Chapter 2 The End

And the last paragraph.

====================================================================== the above will map to XHTML 1.1: ======================================================================

Chapter 1

Some chapter 1 content.

Section 1

Some section 1 content.

Subsection 1

Some subsection 1 content.

Chapter 2

The End

And the last paragraph.
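
The nesting step in 9c and 9d (turning the flat BookX sequence into nested divisions, with the 'id' moved to the wrapper) can be sketched with a stack of open divisions. This is only an illustration under stated assumptions: the input is a hypothetical list of dictionaries rather than real BookX markup, the 'id' values are made up, the output is a plain nested structure rather than XHTML, and the subtitle rule (9b) and the 'assocwith' rule (9e) are not modeled.

```python
# Hierarchical level for each division class named in 9c.
LEVELS = {"part": 1, "chapter": 2, "section": 3, "subsection": 4}


def nest(flat_items):
    """Group a flat item sequence into nested divisions.

    Each item is {"kind": ..., "text": ...}; header items use one of the
    four division classes as "kind" and also carry an "id". A header
    closes every open division of the same or deeper level, then opens
    a new one. Other items are appended to the innermost open division.
    """
    root = {"class": None, "id": None, "title": None, "children": []}
    stack = [(0, root)]                       # (level, division) pairs
    for item in flat_items:
        kind = item["kind"]
        if kind in LEVELS:
            level = LEVELS[kind]
            while stack[-1][0] >= level:      # close same or deeper levels
                stack.pop()
            division = {
                "class": kind,
                "id": item["id"],             # 'id' moves to the wrapper
                "title": item["text"],
                "children": [],
            }
            stack[-1][1]["children"].append(division)
            stack.append((level, division))
        else:
            stack[-1][1]["children"].append(item["text"])
    return root["children"]


if __name__ == "__main__":
    flat = [
        {"kind": "chapter", "id": "c1", "text": "Chapter 1"},
        {"kind": "p", "text": "Some chapter 1 content."},
        {"kind": "section", "id": "s1", "text": "Section 1"},
        {"kind": "p", "text": "Some section 1 content."},
        {"kind": "subsection", "id": "ss1", "text": "Subsection 1"},
        {"kind": "p", "text": "Some subsection 1 content."},
        {"kind": "chapter", "id": "c2", "text": "Chapter 2"},
        {"kind": "p", "text": "And the last paragraph."},
    ]
    for division in nest(flat):
        print(division["class"], division["id"], len(division["children"]))
```

A stack is used because a new header of level k implicitly ends every open division at level k or deeper, which is exactly what the flat BookX markup leaves unstated.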