SIGDOC Newsletter
December 2004 :: Volume 5, Number 4


Features

Documentation tooling and conversion issues
By Rob Pierce
robertp@us.ibm.com

The profession of information design and development requires both writing and technical skills, and the literature on the profession addresses both areas. Articles cover a broad range of topics, including usability, single-sourcing, retrievability, content design, information architecture, structured content, and methodologies for designing, developing, and presenting content, to name a few.

However, I have not seen a wide array of articles addressing the tooling that provides the context for these topics. Nor have I seen much on converting technical content from one format or structure to another, or on migrating content to a different set of tools.

While I understand that the point of developing content is to transcend the tooling so that the concepts are not platform specific, in practice we all work within a basic framework or set of tools. Yes, the concepts of good indexing apply equally to the technical writer using a text editor and to the one using a GUI application, but the workflow of the actual implementation differs significantly from one solution to another. Experience shows us that the tool drives the process.

For instance, I have not seen articles that address the cost, time, and maintenance involved in taking legacy information and converting it to a structured markup language such as XML. If the original content was in a collection of Microsoft Word documents, what would be the actual cost of retraining the writers, purchasing the XML documentation tools, converting the content, producing the output deliverables, and ensuring the same (that is, consistent) or an improved (that is, optimized) product? A case study might be a fascinating opportunity to determine some metrics that would be of direct relevance to documentation managers across numerous industries.

There are likely many companies with thousands of documents or pages of documentation, all created in a proprietary tool or in a tool that is no longer sold because its vendor no longer exists. There are many examples of both situations, and we’ll leave the names out to protect the innocent.

The areas of concern regarding conversion of content include not only the actual text but also the formatting, the references, and the metadata.

Some products use or create simple text files, HTML, or other basic formats from which it is relatively easy to extract the actual content. Other products generate proprietary file types from which you cannot simply extract the text. In either situation, the best that might be hoped for is to extract the content itself; you can likely expect to lose all the formatting. Anyone who tries to move content between Adobe FrameMaker and Word documents will discover this.

So, there is most often no direct migration path for moving legacy documentation to a new format that will be housed or worked with in a new set of tools. The worst-case scenario is frequently also the most common one, namely cutting and pasting text from the old tool into the new tool and reformatting as you go along.

Scripts (sometimes called “transforms”) can sometimes be written to automate this process if there are enough files that can all be treated in the same manner. This is the case when all files from a legacy system conform to the same set of formatting (or markup) rules. When there are variations, no single script can contend with them all. And even where scripts can automate the process, cleanup work is likely to be required afterward. Either way, there are always conversion issues.
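
To make the idea of a transform concrete, here is a minimal sketch in Python. It is an illustration only: the legacy markup (lines beginning with ".H1" and ".P") and the target element names are invented for this example and do not correspond to any particular product. Lines that follow the rules are converted; lines that do not are set aside for the manual cleanup described above.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical legacy rules (assumptions for this sketch, not a real format):
# ".H1 text" is a heading and ".P text" is a paragraph.
RULES = [
    (re.compile(r"^\.H1 (.+)$"), "title"),
    (re.compile(r"^\.P (.+)$"), "p"),
]


def convert(lines):
    """Return (converted, leftovers) for a list of legacy lines."""
    converted, leftovers = [], []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # blank lines carry no content
        for pattern, tag in RULES:
            match = pattern.match(text)
            if match:
                # Escape &, < and > so the output is well-formed XML text.
                converted.append(f"<{tag}>{escape(match.group(1))}</{tag}>")
                break
        else:
            # No rule matched: record the line for manual cleanup.
            leftovers.append((number, text))
    return converted, leftovers


legacy = [
    ".H1 Installing",
    ".P Run setup & reboot.",
    "*** odd hand formatting ***",
]
converted, leftovers = convert(legacy)
print(converted)
print(leftovers)
```

The leftovers list is the point of the sketch: the more the legacy files vary, the longer that list grows, and the closer the job comes to a manual conversion.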

The manual conversion process is more time-consuming, but in the end it gives writers a means to learn the new tooling as they paste the content into new files within the new tool and learn to format it as required.

Another aspect of technical content is the referencing, both the cross-references and the index entries. A manual cut and paste will not preserve this information. Scripts can be written to extract this information and preserve its integrity, including the reference points and target locations, but the task is not trivial.
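
As a rough sketch of what such a script involves, the Python below collects reference targets, cross-references, and index entries from a set of files, then reports any cross-reference whose target is missing. The bracketed markers ([[id:...]], [[ref:...]], [[index:...]]) are a hypothetical notation assumed for this example; a real legacy format would need its own patterns, and that is where much of the non-trivial work lies.

```python
import re

# Hypothetical markers, assumed for this sketch only.
TARGET = re.compile(r"\[\[id:([\w-]+)\]\]")       # a location that can be referenced
REF = re.compile(r"\[\[ref:([\w-]+)\]\]")         # a cross-reference to a target
INDEX = re.compile(r"\[\[index:([^\]]+)\]\]")     # an index entry


def collect_references(files):
    """files maps a file name to its text; returns (targets, refs, index)."""
    targets = {}  # target id -> name of the file that defines it
    refs = []     # (file name, target id) for every cross-reference
    index = []    # (file name, index term)
    for name, text in files.items():
        for target_id in TARGET.findall(text):
            targets[target_id] = name
        refs.extend((name, ref_id) for ref_id in REF.findall(text))
        index.extend((name, term.strip()) for term in INDEX.findall(text))
    return targets, refs, index


def broken_references(targets, refs):
    """Return the cross-references that point at no known target."""
    return [(name, ref_id) for name, ref_id in refs if ref_id not in targets]


files = {
    "install.txt": "[[id:install]] [[index:installation]] Run setup. See [[ref:config]].",
    "config.txt": "[[id:config]] Edit the file. See [[ref:install]] and [[ref:uninstall]].",
}
targets, refs, index = collect_references(files)
print(broken_references(targets, refs))
```

Running the same check before and after a conversion gives a simple way to confirm that no reference point or target location was lost along the way.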

There is also the issue of metadata, that is, data about the data, such as attributes or properties for a given section of content. Unique IDs for sections and properties that specify conditionalization of text are examples of metadata, and this content is difficult to convert from one file type to another. Yet this is not information you can usually afford to lose in a conversion.
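
One way to picture the goal is metadata that travels with the content as attributes in the new markup. The sketch below assumes the legacy section has already been read into a Python dictionary; the field names ("id", "conditions") and the output element and attribute names are hypothetical choices for this example rather than those of any particular schema.

```python
import xml.etree.ElementTree as ET


def section_to_xml(section):
    """Build a <section> element, carrying the metadata over as attributes."""
    attrs = {"id": section["id"]}  # the unique ID must survive the conversion
    if section.get("conditions"):
        # Conditional-text properties become one space-separated attribute.
        attrs["condition"] = " ".join(sorted(section["conditions"]))
    element = ET.Element("section", attrs)
    ET.SubElement(element, "title").text = section["title"]
    ET.SubElement(element, "p").text = section["body"]
    return element


# Hypothetical legacy section, already extracted from the old tool.
legacy_section = {
    "id": "sec-install",
    "conditions": ["windows", "linux"],
    "title": "Installing",
    "body": "Run setup.",
}
xml_text = ET.tostring(section_to_xml(legacy_section), encoding="unicode")
print(xml_text)
```

Writing the output is the easy half. The difficult half, as noted above, is getting the IDs and conditions out of the legacy file type in the first place.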

Finally, in a book format or Help project, there is usually an outer layer or wrapper file that contains all of the individual sections, chapters, or topics of the actual documentation deliverable (even if the deliverable is a reference to an Internet location that hosts the actual content). Since the wrapper file may simply be a collection of references to content that is contained in other files, there is no direct means of converting the outer layer file. This, too, requires a manual conversion.

So, in conclusion, while there are many articles of interest in all areas of the design and development of technical communication, it would be beneficial if more information became available in the future on the handling of the actual text.

If readers know of existing articles or where to find related information, please send a note to robertp@us.ibm.com and I can post them in the next Newsletter.

 
