upCastRT – What's new?
Modular Processing Pipelines
Conversion and processing pipelines are now built from specialized modules that can be arranged in any order you need. You can also create subordinate pipelines that other pipelines include. Settings can be kept relative, so a configuration can easily be ported between environments and even between platforms.
Each pipeline has its own variable pool, which lets you pass custom variables to any module in the pipeline. Variables can also access Java properties or even include a file's content.
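To make the variable pool idea concrete, here is a minimal Java sketch of how a per-pipeline pool might expand variable references in a module setting. The `${...}` reference syntax, the `sysprop:` and `file:` prefixes, and the `VariablePool` class itself are hypothetical illustrations, not upCastRT's actual syntax or API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a per-pipeline variable pool; not upCastRT's API. */
public class VariablePool {
    // Hypothetical reference syntax: ${name}, ${sysprop:java.property}, ${file:/path/to/file}
    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");
    private final Map<String, String> vars = new HashMap<>();

    /** Defines a custom variable for all modules of this pipeline. */
    public void set(String name, String value) {
        vars.put(name, value);
    }

    /** Resolves one variable name to its value. */
    public String lookup(String name) {
        if (name.startsWith("sysprop:")) {
            // Read a Java system property; an unset property expands to "" in this sketch
            String value = System.getProperty(name.substring("sysprop:".length()));
            return value == null ? "" : value;
        }
        if (name.startsWith("file:")) {
            // Include the complete content of a file (read as UTF-8)
            try {
                return Files.readString(Path.of(name.substring("file:".length())));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        String value = vars.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Undefined variable: " + name);
        }
        return value;
    }

    /** Replaces every ${...} reference in a module setting with its resolved value. */
    public String expand(String setting) {
        Matcher m = REF.matcher(setting);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(lookup(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        VariablePool pool = new VariablePool();
        pool.set("outDir", "build/xml");
        System.out.println(pool.expand("${outDir}/result.xml")); // prints build/xml/result.xml
    }
}
```

Failing loudly on an undefined custom variable, rather than expanding it to an empty string, is one reasonable design choice: a typo in a variable name then surfaces immediately instead of producing a wrong output path.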
Upcast Processing Language (UPL)
This highly specialized language lets you define complex document processing tasks in just a few lines, including grouping by regular expressions or style properties, layout style queries, element hoisting, complex grouping operations, and more.
Export options: Java source code, Ant task
Any pipeline setup can be exported to Java source code or an Ant task for effortless integration into existing environments.
Solve grouping problems with the innovative Painter Concept
Complex grouping operations are now as easy as (virtually) coloring the nodes to be grouped. Supported modes include grouping by start, end, last before start, first after end, or adjacency, with support for strict and relaxed rules. Even nested grouping can be specified declaratively in a single pass over the document tree (no nesting
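The following Java sketch illustrates the general "paint, then group" idea on a flat list of sibling nodes: a painter assigns a color (or nothing) to each node, and grouping then wraps nodes according to their colors. It covers only adjacency grouping and start grouping, without strict/relaxed rules or nesting; the `PainterGrouping` class and its methods are hypothetical and not part of upCastRT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of "paint, then group" over sibling nodes; not upCastRT's API. */
public class PainterGrouping {

    /** A run of sibling nodes sharing one color; color == null marks an ungrouped node. */
    public static final class Group {
        public final String color;
        public final List<String> members = new ArrayList<>();

        Group(String color) {
            this.color = color;
        }

        @Override
        public String toString() {
            return (color == null ? "-" : color) + members;
        }
    }

    /** Adjacency grouping: consecutive nodes painted with the same color form one group. */
    public static List<Group> groupAdjacent(List<String> nodes, Function<String, String> painter) {
        List<Group> result = new ArrayList<>();
        Group current = null;
        for (String node : nodes) {
            String color = painter.apply(node); // null = node not painted
            boolean extend = color != null && current != null && color.equals(current.color);
            if (!extend) {
                current = new Group(color); // unpainted nodes always get a group of their own
                result.add(current);
            }
            current.members.add(node);
        }
        return result;
    }

    /** Start grouping: each node painted as "start" opens a group that absorbs the following nodes. */
    public static List<Group> groupFromStart(List<String> nodes, Predicate<String> isStart, String color) {
        List<Group> result = new ArrayList<>();
        Group open = null;
        for (String node : nodes) {
            if (isStart.test(node)) {
                open = new Group(color);
                result.add(open);
            }
            if (open != null) {
                open.members.add(node);
            } else {
                Group single = new Group(null); // nodes before the first start stay ungrouped
                single.members.add(node);
                result.add(single);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> siblings = List.of("h1", "li", "li", "p", "li");
        System.out.println(groupAdjacent(siblings, n -> n.equals("li") ? "list" : null));
        // prints [-[h1], list[li, li], -[p], list[li]]
    }
}
```

Painting every `li` node "list" and grouping by adjacency wraps each run of consecutive list items into its own group, while the unpainted `h1` and `p` nodes are left alone. Separating the painting step from the grouping step is what keeps the grouping rules declarative.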
Regular expressions across element boundaries
Use regular expressions with groups to mark up your document content, even across element boundaries. Processing keeps element structures within a matching group intact, and if a regex group does not align with the existing element structure, elements are split or duplicated as necessary to keep the result well-formed. Deep and shallow grouping modes allow maximum flexibility.
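One way to apply a regular expression across element boundaries is to match against the concatenated text of all runs, then split runs at the group's start and end offsets so that the new wrapper element nests cleanly. The Java sketch below assumes a simplified, flat model (a paragraph as a list of runs, each optionally wrapped in a single inline element) and does no XML escaping; `Run`, `markup` and the output format are hypothetical and not upCastRT's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: regex markup across inline element boundaries (flat runs only). */
public class CrossBoundaryRegex {

    /** A text run, optionally wrapped in one inline element (tag == null means plain text). */
    public static final class Run {
        final String tag;
        final String text;

        public Run(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }
    }

    /** Wraps each match of capturing group 1 (which the pattern must have) in <wrapper>. */
    public static String markup(List<Run> runs, Pattern pattern, String wrapper) {
        StringBuilder text = new StringBuilder();
        TreeSet<Integer> cuts = new TreeSet<>();
        for (Run r : runs) {
            cuts.add(text.length()); // run boundaries are always cut points
            text.append(r.text);
        }
        cuts.add(text.length());

        // Match against the concatenated text, ignoring element boundaries.
        List<int[]> spans = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            if (m.start(1) >= 0 && m.end(1) > m.start(1)) {
                spans.add(new int[] {m.start(1), m.end(1)});
                cuts.add(m.start(1)); // group boundaries become extra cut points,
                cuts.add(m.end(1));   // which is exactly where runs get split
            }
        }

        StringBuilder out = new StringBuilder();
        Integer[] points = cuts.toArray(new Integer[0]);
        int runIdx = 0, runStart = 0, spanIdx = 0;
        boolean inSpan = false;
        for (int i = 0; i + 1 < points.length; i++) {
            int a = points[i], b = points[i + 1];
            while (a >= runStart + runs.get(runIdx).text.length()) { // find the run containing a
                runStart += runs.get(runIdx).text.length();
                runIdx++;
            }
            if (inSpan && a == spans.get(spanIdx)[1]) { // close the wrapper at the group end
                out.append("</").append(wrapper).append('>');
                inSpan = false;
                spanIdx++;
            }
            if (!inSpan && spanIdx < spans.size() && a == spans.get(spanIdx)[0]) { // open at group start
                out.append('<').append(wrapper).append('>');
                inSpan = true;
            }
            Run r = runs.get(runIdx);
            String piece = text.substring(a, b);
            // A split run is emitted once per piece, duplicating its element around each piece
            out.append(r.tag == null ? piece : "<" + r.tag + ">" + piece + "</" + r.tag + ">");
        }
        if (inSpan) {
            out.append("</").append(wrapper).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(new Run("b", "Call 0800"), new Run(null, "-1234 now"));
        System.out.println(markup(runs, Pattern.compile("(\\d{4}-\\d{4})"), "phone"));
        // prints <b>Call </b><phone><b>0800</b>-1234</phone> now
    }
}
```

In the example, the match starts inside the bold run, so that run is split into two `<b>` elements: one stays outside the new `<phone>` element and one moves inside it. Real documents with nested inline elements need a tree-aware version of the same cut-and-wrap idea.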
Open several pipelines simultaneously, duplicate module definitions or copy modules between pipelines.
…and more refinements
upCastRT has already been used in several of our internal projects and has received many small (and not so small) improvements, refinements, performance optimizations and changes to make the tedious and often complex task of document conversion as enjoyable as possible. See for yourself!
Changes & Improvements
The RTF Exporter (formerly downCast) is about 30% to 100% faster on typical documents.
The most important thing to recognize is that upCastRT is not a simple upgrade to upCast 5, but a completely new application. To emphasize this fact, we're even skipping a "version 6" altogether.
Due to the drastic architectural changes in upCastRT, configuration files created for upCast 5.x or downCast 1.x cannot be used or imported directly for the time being. However, we do offer upCast and downCast templates that set up processing pipelines mimicking a basic upCast or downCast configuration. From there, it should be relatively easy to recreate your old configurations in upCastRT.
Consequently, the Java API and command-line interface are not compatible with code and batch files written for earlier versions of upCast or downCast.
What about XML (Raw) output?
upCast 5 included an XML (Raw) exporter module that made style information easy to access from languages like XSLT, because that information was added to elements in the form of attributes. That format was never documented and was always subject to change.
upCastRT employs the same concept, but now uses namespaces to clearly separate attributes derived from CSS properties from other semantic information on an element. The schema behind all this is the upCast internal DTD, which is generated by the RTF Importer module. Documentation on this schema (elements, namespaces, attributes) will be available soon; for now, you can serialize the internal tree using an instance of the XML Exporter module within your processing pipeline.
The major structural difference between the new upCast internal DTD and the former XML (Raw) output is that out-of-flow elements (such as footnotes, text boxes and index entries) are now collected in a top-level container in the document, and the link to the place where each one conceptually belongs is established via an ID/IDREF connection.
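The out-of-flow structure can be pictured as a small DOM transformation: each footnote is moved into a container at the end of the document, and an anchor element pointing to it by ID is left at its original position. The Java sketch below uses the standard `javax.xml` and `org.w3c.dom` APIs; the element and attribute names (`footnote`, `footnote-ref`, `idref`, `out-of-flow`) are invented for illustration, since the upCast internal DTD is not documented yet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: collect out-of-flow elements in a top-level container, linked by ID/IDREF. */
public class OutOfFlow {

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    /** Moves every <footnote> into <out-of-flow> and leaves a <footnote-ref idref="..."/> anchor. */
    public static void hoistFootnotes(Document doc) {
        // getElementsByTagName returns a live list, so take a snapshot before moving nodes
        NodeList live = doc.getElementsByTagName("footnote");
        Element[] notes = new Element[live.getLength()];
        for (int i = 0; i < notes.length; i++) {
            notes[i] = (Element) live.item(i);
        }
        Element container = doc.createElement("out-of-flow"); // appended even if empty
        for (int i = 0; i < notes.length; i++) {
            String id = "fn" + (i + 1);
            Element anchor = doc.createElement("footnote-ref");
            anchor.setAttribute("idref", id);                        // IDREF to the moved element
            notes[i].getParentNode().replaceChild(anchor, notes[i]); // anchor keeps the in-flow position
            notes[i].setAttribute("id", id);
            container.appendChild(notes[i]);
        }
        doc.getDocumentElement().appendChild(container);
    }

    public static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot serialize XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<doc><p>Text<footnote>Note</footnote> more.</p></doc>");
        hoistFootnotes(doc);
        System.out.println(serialize(doc));
    }
}
```

Moving rather than copying keeps each out-of-flow element in exactly one place in the tree; the ID/IDREF pair is then the only link back to its position in the running text.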