
The Inherent Structure of Content

October 12, 2015 | Falk Aupers

All information that we read is presented to us in one way or another. It may be a book telling a story, a sign conveying a short message, or assembly instructions for a bed downloaded from the manufacturer’s website. Sometimes the same information is even available in different formats, such as a printed book and an eBook made especially for tablets or mobile phones.

Let us have a closer look at the assembly instructions for the bed. The document, which is used for printing as well as for creating the PDF offered for download on the website, is authored in software like Microsoft Word. The bed is very popular, which makes it a bestseller for the manufacturer. Over the years, small changes are made to the design of the bed, and the instructions need to be updated. In the meantime the software has been updated, and the original file is no longer compatible with the new version. Instead of a quick update to the instructions, the whole document needs to be revised. And this procedure is repeated every 5 to 6 years!

The main challenge with this approach is that such software uses its own proprietary data format, which changes now and then. Opening the file in a simple text editor like Microsoft’s Notepad reveals a large number of special binary characters that are not readable by humans. The actual content of the instructions can be found in among these characters. Compared to the amount of binary data, the content is only a tiny part of the file. Why is that?

Figure 1 – A poem, separating layout from actual content

In order to present the information, the software needs to store layout information such as fonts, font sizes, page size, and margins. This makes up the biggest part of the data and is encoded in a way that only the specific version of the software can render the content correctly. To make the information independent of the software, you need to separate the layout from the actual content, which in this case is the assembly procedure.

To illustrate the process of separating layout from actual content, let us use a much simpler example than an assembly procedure: a poem, shown in Figure 1. In my training classes I ask participants to imagine having only a simple text editor without any layout capabilities. What would they consider to be content and put into this file?

The first answer is usually something like: “Type all characters that you see on the page and that belong to the poem into the editor. Then save the file.” This would result in:

“A Reasonable Affliction”
(Matthew Prior)
On his death-bed poor Lubin lies;
His spouse is in despair;
With frequent sobs, and mutual cries,
They both express their care.

“A different cause,” says Parson Sly,
“The same effect may give:
Poor Lubin fears, that he shall die;
His wife that he may live.”

Further discussion leads to the removal of the empty lines in between and the parentheses around “Matthew Prior”. Empty lines are definitely not content, and the parentheses could have been added by the publisher. A few people go to the extreme and finally write all the text on a single line. In either case, looking at the final result shows that something has been lost. Without doubt the actual content has been separated from the layout. But is it possible to reproduce the original presentation from the pure content? Why would I put parentheses around “Matthew Prior” and not around “With frequent sobs, and mutual cries,”? Why would I present “A Reasonable Affliction”, rather than “Matthew Prior”, in a bigger font size and blue color?

The specific layout chosen assigns a meaning to the respective content. The bigger font size and blue color mark “A Reasonable Affliction” as the title. The parentheses around “Matthew Prior” and the italic typeface mark this as the author. The empty lines make the stanzas, or verses, visible, each of which consists of four lines. The complete package then makes up the poem. So in addition to the actual content, there is an inherent structure attached to the information presented to us, and this structure must be kept as well. This is done in the following version of the poem:

<poem>
  <title>“A Reasonable Affliction”</title>
  <author>Matthew Prior</author>
  <verse>
    <line>On his death-bed poor Lubin lies;</line>
    <line>His spouse is in despair;</line>
    <line>With frequent sobs, and mutual cries,</line>
    <line>They both express their care.</line>
  </verse>
  <verse>
    <line>“A different cause,” says Parson Sly,</line>
    <line>“The same effect may give:</line>
    <line>Poor Lubin fears, that he shall die;</line>
    <line>His wife, that he may live.”</line>
  </verse>
</poem>

This is an Extensible Markup Language (XML) version of the poem. The content is marked up with so-called “tags” [W3C 1998, para. 3]. The start-tag “<…>” marks the beginning and the corresponding end-tag “</…>” the end of the respective structure. The tags show what kind of information a piece of content represents, which makes it easy to process. Software creating a PDF can automatically assign a bigger font size and blue color to the content within the title tags, insert an empty line before each verse structure, and put parentheses around the content within the author tags.
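The automatic assignment of layout to tagged content can be sketched in a few lines of Python. This is only an illustration, not a real publishing pipeline: it produces HTML rather than PDF, it uses Python's standard `xml.etree.ElementTree` module, and the function name `render_html` and the specific HTML styling are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# The poem as pure content plus structure; no layout information at all.
POEM_XML = """\
<poem>
  <title>“A Reasonable Affliction”</title>
  <author>Matthew Prior</author>
  <verse>
    <line>On his death-bed poor Lubin lies;</line>
    <line>His spouse is in despair;</line>
    <line>With frequent sobs, and mutual cries,</line>
    <line>They both express their care.</line>
  </verse>
  <verse>
    <line>“A different cause,” says Parson Sly,</line>
    <line>“The same effect may give:</line>
    <line>Poor Lubin fears, that he shall die;</line>
    <line>His wife, that he may live.”</line>
  </verse>
</poem>
"""


def render_html(xml_text):
    """Turn the poem structure into HTML; all layout decisions live here."""
    poem = ET.fromstring(xml_text)
    parts = []
    # Title: rendered as a heading (bigger font) in blue.
    title = escape(poem.findtext("title", ""))
    parts.append('<h1 style="color: blue">' + title + "</h1>")
    # Author: italic, with parentheses added by the renderer, not the author.
    author = escape(poem.findtext("author", ""))
    parts.append("<p><em>(" + author + ")</em></p>")
    # Each verse becomes its own paragraph, which produces the visual gap.
    for verse in poem.findall("verse"):
        lines = [escape(line.text or "") for line in verse.findall("line")]
        parts.append("<p>" + "<br/>\n".join(lines) + "</p>")
    return "\n".join(parts)


html_output = render_html(POEM_XML)
print(html_output)
```

Changing the look of every title or author only requires changing `render_html`; the XML content stays untouched.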

Returning to the assembly procedure, it is obvious that its inherent structure differs from that of a poem. But the principle is the same: with a suitable set of tags, the assembly procedure can be written in XML.

But what is a suitable set of tags for a procedure? Different manufacturers, for instance, could each have their own way of describing the inherent structure of a procedure. As long as they do not deal with each other and there is no need to exchange their documentation, this might not be an issue. But what if they deliver their products to the same customer, say a ship owner? Then the ship owner has to deal with different structures for basically the same thing. This is the situation in which it becomes important to agree on a common, standard set of tags for exchanging such XML data.

XML provides two ways of defining a set of tags for a document type. One is the Document Type Definition (DTD) [W3C 1998, para. 2.8] and the other is the XML Schema [W3C 2012]. Both are separate files that can be exchanged between partners. They basically serve two purposes: first, they guide your own authors in creating content using the predefined tags; second, they allow you to validate that the XML files you receive from project partners comply with the rules.
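To give a feeling for what such validation does, here is a deliberately simplified sketch in Python. It is not a DTD or XML Schema validator: the rule table `ALLOWED_CHILDREN` and the function `find_violations` are hypothetical stand-ins that only check which elements may appear inside which. A real DTD or schema can also express order, cardinality, attributes, and data types.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set for the poem document type: for each element,
# the child elements it may contain (an empty set means text only).
ALLOWED_CHILDREN = {
    "poem": {"title", "author", "verse"},
    "title": set(),
    "author": set(),
    "verse": {"line"},
    "line": set(),
}


def find_violations(xml_text):
    """Return a list of readable rule violations (empty if none found)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "poem":
        problems.append("root element is <%s>, expected <poem>" % root.tag)
        return problems

    def visit(element):
        allowed = ALLOWED_CHILDREN[element.tag]
        for child in element:
            if child.tag in allowed:
                visit(child)  # known element in a permitted place
            else:
                problems.append(
                    "<%s> is not allowed inside <%s>" % (child.tag, element.tag)
                )

    visit(root)
    return problems


# A file from a partner who uses <stanza> instead of the agreed <verse>.
good = "<poem><title>T</title><verse><line>x</line></verse></poem>"
bad = "<poem><title>T</title><stanza><line>x</line></stanza></poem>"

print(find_violations(good))
print(find_violations(bad))
```

The second file is rejected because its sender used a tag that is not part of the agreed set, which is exactly the kind of mismatch a shared DTD or schema is meant to catch.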

S1000D is a specification that, among other things, standardizes the inherent structures of document types used in technical documentation, for instance descriptive, procedural, or spare parts information. A separate XML schema exists for each of these. A big advantage of S1000D lies in the experience that has gone into developing these sets of tags, the schemas. For instance, a maintenance procedure consists not only of the individual steps to be performed, but also of preliminary requirements that must be met before the procedure can start, as well as finishing tasks to be carried out afterwards. The preliminary requirements include lists of the spare parts, supplies, and tools needed to perform the procedure. These three structures have something in common: each entry has a name, a part number, and a manufacturer, so the schema defines corresponding tags for this information. When such an XML procedure is loaded into a maintenance planning system, the system can immediately check whether the spare parts, supplies, and tools are in stock or available.
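The stock check mentioned above can be sketched as follows. Note the assumptions: the element names (`procedure`, `preliminaryRequirements`, `item`, `partNumber`, and so on) are invented for this illustration and are not the actual S1000D tags, and the `STOCK` dictionary stands in for whatever inventory a real maintenance planning system would query.

```python
import xml.etree.ElementTree as ET

# Element names are invented for illustration; they are NOT real S1000D tags.
PROCEDURE_XML = """\
<procedure>
  <preliminaryRequirements>
    <spares>
      <item>
        <name>Bed slat</name>
        <partNumber>BS-100</partNumber>
        <manufacturer>Example Beds</manufacturer>
      </item>
    </spares>
    <supplies>
      <item>
        <name>Wood glue</name>
        <partNumber>WG-20</partNumber>
        <manufacturer>Example Glue</manufacturer>
      </item>
    </supplies>
    <tools>
      <item>
        <name>Allen key</name>
        <partNumber>AK-5</partNumber>
        <manufacturer>Example Tools</manufacturer>
      </item>
    </tools>
  </preliminaryRequirements>
  <steps>
    <step>Remove the broken slat.</step>
    <step>Insert the new slat.</step>
  </steps>
</procedure>
"""

# Hypothetical inventory: part number -> quantity in stock.
STOCK = {"BS-100": 4, "AK-5": 1}


def missing_items(xml_text, stock):
    """List (part number, name) for every required item that is not in stock."""
    root = ET.fromstring(xml_text)
    missing = []
    # Spares, supplies, and tools share the same item structure,
    # so one loop covers all three lists.
    for item in root.iter("item"):
        number = item.findtext("partNumber")
        if stock.get(number, 0) < 1:
            missing.append((number, item.findtext("name")))
    return missing


print(missing_items(PROCEDURE_XML, STOCK))  # [('WG-20', 'Wood glue')]
```

Because all three lists use the same tags for name, part number, and manufacturer, a single loop can evaluate them without knowing anything about the rest of the procedure.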

To sum it up, the advantages of using XML for marking up technical documentation are:

  • It is independent of software and hardware,
  • It is long-lived,
  • It can be published in different formats like HTML or PDF,
  • It can be easily exchanged when using a common standard set of tags,
  • It can be evaluated automatically,
  • It is future-proof.