SIGDOC Newsletter
December 2004
Volume 5, Number 4
Features
Documentation tooling and conversion issues
By Rob Pierce
robertp@us.ibm.com
The profession of information design and development requires both
writing and technical skills, and the literature on the profession addresses
both of these areas. Many articles cover a broad range of topics, including
usability, single-sourcing, retrievability, content design, information
architecture, structured content, and methodologies for designing,
developing, and presenting content, to name a few.
However, I have seen few articles addressing the tooling that provides
the context for these topics, and even fewer on converting technical
content from one format or structure to another, whether on its own or
as part of migrating to a different set of tools.
While I understand that the point of developing content is to transcend
the tooling so that the concepts are not platform-specific, in practice
we all work within a basic framework or set of tools. Yes, good indexing
techniques apply equally to a technical writer using a text editor and
to one using a GUI application, but the workflow, or process, of the
actual implementation differs significantly from one solution to another.
Experience shows us that the tool drives the process.
For instance, I have not seen articles that address the costs, time,
and maintenance involved in taking legacy information and converting it
to a new structured markup language such as XML. If the original content
was in a collection of Microsoft Word documents, what would be the actual
cost of retraining the writers, purchasing the XML documentation tools,
converting the content, producing the output deliverables, and ensuring
the same (that is, consistent) or improved (that is, optimized) product?
A case study would be a valuable opportunity to gather metrics of direct
relevance to documentation managers across many industries.
Many companies likely have thousands of documents or pages of
documentation created in a proprietary tool, or in a tool that is no
longer sold because its vendor no longer exists. There are many examples
of both situations; we’ll leave the names out to protect the innocent.
The areas of concern in converting content include not only the text
itself but also the formatting, the references, and the metadata.
Some products use or create simple text files, HTML, or other basic
formats from which it is relatively easy to extract the content. Other
products generate proprietary file types from which you cannot simply
extract the text. In both scenarios, the best you can hope for is to
extract the text itself, and you can likely expect to lose all the
formatting. Anyone who tries to move content between Adobe FrameMaker
and Word documents will discover this.
So, there is usually no direct migration path for moving legacy
documentation into a new format that will be housed in, or worked with
in, a new set of tools. The worst-case scenario is frequently the most
common one: cutting and pasting text from the old tool into the new one
and reformatting as you go.
Scripts (sometimes called “transforms”) can be written to automate this
process when there are enough files that can all be treated the same
way, which is the case when every file from a legacy system conforms to
the same set of formatting (or markup) rules. When there are variations,
no single script can handle them all. And even when scripts automate the
bulk of the work, some cleanup is likely to be required afterward.
Either way, there are always conversion issues.
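To make the idea concrete, here is a minimal sketch of such a transform in Python. It assumes a hypothetical legacy format in which every line begins with a dot-command (`.TITLE`, `.PARA`, `.NOTE`); these commands, and the XML element names they map to, are illustrative only and do not come from any real product. Lines that match a rule are converted, and lines that do not are set aside for the manual cleanup described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy markup: each line starts with a dot-command such as
# ".TITLE", ".PARA", or ".NOTE", followed by a space and the text.
# Each rule maps a dot-command to the XML element that replaces it.
RULES = {
    ".TITLE": "title",
    ".PARA": "p",
    ".NOTE": "note",
}


def transform(legacy_text):
    """Convert conforming lines to XML.

    Returns (xml_string, unhandled), where unhandled lists
    (line_number, line) pairs that no rule matched and that therefore
    need manual cleanup.
    """
    root = ET.Element("topic")
    unhandled = []
    for lineno, line in enumerate(legacy_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        command, _, text = line.partition(" ")
        element_name = RULES.get(command)
        if element_name is None:
            unhandled.append((lineno, line))  # a variation the rules don't cover
            continue
        ET.SubElement(root, element_name).text = text
    return ET.tostring(root, encoding="unicode"), unhandled


if __name__ == "__main__":
    legacy = (
        ".TITLE Installing the product\n"
        ".PARA Run the installer.\n"
        ".WARN Back up first.\n"
    )
    xml_out, todo = transform(legacy)
    print(xml_out)  # <topic><title>Installing the product</title><p>Run the installer.</p></topic>
    print(todo)     # [(3, '.WARN Back up first.')]
```

The rule table is deliberately small: adding a rule handles another variation, but any file that uses markup the table doesn't know about still ends up on the cleanup list rather than being silently dropped.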
The manual conversion process is more time-consuming, but in the end it
gives writers a way to learn the new tooling as they paste the content
into new files within the new tool and format it as required.
Another aspect of technical content is referencing: both cross-references
and index entries. A manual cut and paste will not preserve this
information. Scripts can be written to extract this information and
preserve its integrity, including the reference points and target
locations, but the task is not trivial.
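A minimal sketch of the first step such a script might take: collect every target location and every reference point across a set of files, record which file defines each target, and report references whose target cannot be found or is defined more than once. The `{anchor:...}` and `{xref:...}` syntax is a hypothetical stand-in for whatever cross-reference markup a given legacy tool actually uses.

```python
import re

# Hypothetical legacy syntax: "{anchor:name}" marks a target location and
# "{xref:name}" marks a reference point that should lead to that target.
ANCHOR = re.compile(r"\{anchor:([\w-]+)\}")
XREF = re.compile(r"\{xref:([\w-]+)\}")


def map_references(files):
    """Map cross-reference targets across a set of files.

    files maps a file name to its text. Returns (targets, problems), where
    targets maps each anchor name to the file that defines it, and problems
    lists duplicate anchors and references whose target does not exist.
    """
    targets = {}
    problems = []
    # First pass: find every target location.
    for name, text in files.items():
        for anchor in ANCHOR.findall(text):
            if anchor in targets:
                problems.append(f"duplicate anchor {anchor!r} in {name}")
            else:
                targets[anchor] = name
    # Second pass: check that every reference point resolves.
    for name, text in files.items():
        for ref in XREF.findall(text):
            if ref not in targets:
                problems.append(f"dangling reference {ref!r} in {name}")
    return targets, problems


if __name__ == "__main__":
    files = {
        "install.txt": "{anchor:install} Installing. See {xref:config}.",
        "config.txt": "{anchor:config} Configuring. See {xref:uninstall}.",
    }
    targets, problems = map_references(files)
    print(targets)   # {'install': 'install.txt', 'config': 'config.txt'}
    print(problems)  # ["dangling reference 'uninstall' in config.txt"]
```

The targets map is what lets a later step rewrite each reference so that it points at the correct file in the new format; the problem list shows where the integrity of the references was already broken before conversion began.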
There is also the issue of metadata, that is, data about the data,
such as attributes or properties for a given section of content.
Unique IDs for sections, or properties that mark text as conditional,
are examples of metadata, and this content is difficult to convert from
one file type to another. Yet this is not information you can usually
afford to lose in a conversion.
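One way to keep metadata from being lost silently is to audit each converted file against the source. The sketch below assumes the legacy properties have already been read into Python dictionaries (a hypothetical representation; getting them out of a proprietary file is the hard part) and that the converted XML is expected to carry the same unique IDs and conditional-text properties as attributes. The attribute names `platform` and `audience` are illustrative only.

```python
import xml.etree.ElementTree as ET


def audit_metadata(source_sections, converted_xml):
    """Compare source metadata with a converted XML file.

    source_sections: list of dicts, each with an "id" and a "conditions"
    dict (a hypothetical representation of the legacy properties).
    converted_xml: XML text in which each section is expected to appear as
    an element carrying the same id and condition attributes.
    Returns a list of human-readable problems; empty means nothing was lost.
    """
    root = ET.fromstring(converted_xml)
    # Index every converted element that has an id attribute.
    by_id = {elem.get("id"): elem for elem in root.iter() if elem.get("id")}
    problems = []
    for section in source_sections:
        elem = by_id.get(section["id"])
        if elem is None:
            problems.append(f"missing section id {section['id']!r}")
            continue
        # Each conditional property must survive with the same value.
        for key, value in section["conditions"].items():
            if elem.get(key) != value:
                problems.append(
                    f"{section['id']}: condition {key}={value!r} "
                    f"became {elem.get(key)!r}"
                )
    return problems


if __name__ == "__main__":
    source = [
        {"id": "s1", "conditions": {"platform": "windows"}},
        {"id": "s2", "conditions": {"audience": "admin"}},
        {"id": "s3", "conditions": {}},
    ]
    converted = (
        '<topic><section id="s1" platform="windows">A</section>'
        '<section id="s2">B</section></topic>'
    )
    for problem in audit_metadata(source, converted):
        print(problem)
    # s2: condition audience='admin' became None
    # missing section id 's3'
```

An audit like this does not recover lost metadata, but it turns a silent loss into a list of specific sections a writer can fix by hand.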
Finally, a book or Help project usually has an outer layer, or wrapper
file, that contains all of the individual sections, chapters, or topics
of the documentation deliverable (even if the deliverable is a reference
to an Internet location that hosts the actual content). Because the
wrapper file may simply be a collection of references to content stored
in other files, there is no direct way to convert it. This, too,
requires manual conversion.
In conclusion, while there are many articles of interest covering all
areas of the design and development of technical communication, it
would be beneficial if more information became available in the future
on the tooling we use and on the handling of the actual text when it
moves between tools.
If readers know of existing articles, or of where to find related
information, please send a note to robertp@us.ibm.com so that I can
post them in the next Newsletter.