XTC Short Introduction


Table Of Contents


Introduction
Quick start
XTC in detail
Technical notes


Introduction


XTC (XML Tree Compare) is a diff tool for XML files. It is intended as a change-detection tool for two versions of a file (for example, an old and a new one).

The compare process is kept as generic as possible. The only requirement is that the XML documents are well-formed, so XTC can be used for any XML-based format such as SVG, XVL, etc. The result of the comparison is written to a file (the result file). In addition, a result visualization shows the XML structure as trees with the changes marked.

XTC is useful for:

Quick Start

A) GUI version:
1. Start XTC by double-clicking the program icon or by selecting its entry in the 'Programs' menu of your Windows installation. The program's main window appears. Select the two XML files to be compared using the buttons 'XML file 1' and 'XML file 2'. Once the files have been selected, their paths are shown in the white text boxes to the left of the buttons.


2. Press the large button 'Diff'.
3. Depending on the size of the selected files and your computer's hardware, the comparison may take a few seconds. XTC turns the mouse pointer into an hourglass to indicate that the process is running.
4. A message box informs you as soon as the comparison is finished.
5. After the comparison has finished, the result can be viewed by pressing the leftmost button on the toolbar (showing the tree symbol) or by choosing File->Visualization from the menu. A separate window opens, showing the XML structure of XML file 1 on the left and the structure of XML file 2, including the changes, on the right. The visualization is interactive: if an element in one of the trees is selected, both the selected element and its counterpart in the other tree are highlighted (note that added or deleted elements have no counterpart).

B) Server edition:
1. Open a shell (DOS box on Windows) and change to the directory where the XTC executable is stored.
2. Type 'xtc.exe xmlfile1 xmlfile2 -batch', where 'xmlfile1' is the path to the first of the two XML files and 'xmlfile2' is the path to the second XML file.
3. Depending on the size of the selected files and your computer's hardware, the comparison may take a few seconds.
4. After the process has finished, the result file can be found in the directory where 'xmlfile2' is stored. If you can't find the result file, check the configuration file (xtc_cfg.xml in the current directory or, alternatively, in your home directory) for the 'writeresultfile' entry. It must look like this:
<xtcparam_bool name="writeresultfile">1</xtcparam_bool>
If an error occurred during the comparison, check the log file xtclog.txt.
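The batch steps above can be scripted. The following Python sketch builds the command line exactly as given in step 2, derives the directory in which the result file is expected, and reads the 'writeresultfile' entry from a configuration file. The function names are hypothetical, and the sketch assumes that the <xtcparam_bool> entry may appear anywhere below the configuration's root element, since the rest of the xtc_cfg.xml structure is not documented here.

```python
import os
import xml.etree.ElementTree as ET
from typing import List, Optional


def batch_command(xmlfile1: str, xmlfile2: str, exe: str = "xtc.exe") -> List[str]:
    """Argument list for 'xtc.exe xmlfile1 xmlfile2 -batch'."""
    return [exe, xmlfile1, xmlfile2, "-batch"]


def result_directory(xmlfile2: str) -> str:
    """Directory where the result file is expected: the one holding xmlfile2."""
    return os.path.dirname(os.path.abspath(xmlfile2))


def writes_result_file(config_xml: str) -> Optional[bool]:
    """True/False from the 'writeresultfile' entry, or None if it is missing.

    Assumption: <xtcparam_bool name="writeresultfile"> may sit anywhere
    below the configuration's root element.
    """
    root = ET.fromstring(config_xml)
    for param in root.iter("xtcparam_bool"):
        if param.get("name") == "writeresultfile":
            return (param.text or "").strip() == "1"
    return None
```

From a script, subprocess.run(batch_command("old.xml", "new.xml", exe=os.path.join(xtc_dir, "xtc.exe")), cwd=xtc_dir) would start the comparison, where xtc_dir is a hypothetical variable holding the directory of the executable. Setting cwd matters because XTC looks for xtc_cfg.xml in the current directory first.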
Result file example (fragment):

XTC in detail

Overview

Motivation

The comparison of XML documents can serve different purposes, such as taking a quick look at what has changed since the last revision of a document, or tracking changes in a later step of your XML processing, for example if changes have to be marked in a rendered version of the XML file. Quality assurance may also require comparing XML, for example if drawings are stored in an XML-based format (such as SVG). A comparison of two versions can reveal all changes, even very small ones that are not visible in a graphical editor.

Change Marks

Changes are marked in the result file by adding change marks. A change mark is a processing instruction that describes the comparison result for the XML element following it. For each XML element in file B, the comparison results in one of the following ten change marks:

complete match
position change
content change
attribute change
attribute and content change
position and content change
position and attribute change
position and content and attribute change
element added
element deleted

Four change marks (position change, position and content change, position and attribute change, position and content and attribute change) indicate that the element has moved. The distance of the move (relative to the parent element) is indicated by a number added to the change mark: the difference between the element's current position and its former position in file A. A number greater than zero indicates a 'move forward', meaning that the distance to the parent element is greater than before (e.g. because a new element has been added in between). A number smaller than zero indicates that the element is now nearer to its parent element (e.g. because an element has been deleted in between).
Example:

file A:

<parent>
<one/>
<two/>
<three/>
</parent>

file B:

<parent>
<one/>
<two/>
<hundred/>
<three/>
</parent>


The result will be:
<parent>
<one/>
<two/>
<?XTC element added?>
<hundred/>
<?XTC position change 1?>
<three/>
</parent>


(In this example 'complete match' change marks have been omitted for the sake of simplicity).
Note that the change marks (here <?XTC element added?>) do not count when calculating the distance of a move, because they are not 'real' content of the document. Other (non-XTC) processing instructions count as normal content.
The change marks are also used in the visualization.
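The move distance can be reproduced on the example above. This Python sketch uses xml.dom.minidom because, unlike ElementTree's default parser, it keeps processing instructions. It assumes that only element nodes and non-XTC processing instructions occupy positions (text, whitespace and comments are skipped) and that the moved element can be identified by its tag name among its siblings; XTC's own matching is more general. The function names are hypothetical.

```python
from xml.dom import Node, minidom


def countable_children(parent):
    """Children that count for move distances.

    Assumption: elements and non-XTC processing instructions count;
    text, comments and <?XTC ...?> change marks do not.
    """
    result = []
    for node in parent.childNodes:
        if node.nodeType == Node.ELEMENT_NODE:
            result.append(node)
        elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE and node.target != "XTC":
            result.append(node)
    return result


def position(parent, tag):
    """Index of the first child element named `tag` among the countable children."""
    for index, node in enumerate(countable_children(parent)):
        if node.nodeType == Node.ELEMENT_NODE and node.tagName == tag:
            return index
    raise ValueError(f"no child element <{tag}>")


def move_distance(xml_a, xml_b, tag):
    """Position in B minus position in A; > 0 means the element moved forward."""
    parent_a = minidom.parseString(xml_a).documentElement
    parent_b = minidom.parseString(xml_b).documentElement
    return position(parent_b, tag) - position(parent_a, tag)
```

For file A and file B from the example, move_distance(a, b, "three") gives 1, and the same value results when b is the result file, because the <?XTC ...?> marks are skipped.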

Configuration

To cover a large variety of use cases, XTC has been designed as a generic tool: the only requirement for the files to be compared is that they are well-formed. As with any generic tool, XTC has to be configured to serve the user's needs as well as possible. XTC's configuration options are explained below.

Note: the following text assumes that you use the GUI version. If you only have the command-line version, you must configure XTC by editing its configuration file xtc_cfg.xml, which is located in the current directory or in your home directory. XTC looks in the current directory first and, if no xtc_cfg.xml is found there, in the user's home directory.
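The lookup order for the configuration file can be expressed in a few lines. This Python sketch assumes that 'current directory' means the process's working directory and that the home directory is what Path.home() returns; the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

CONFIG_NAME = "xtc_cfg.xml"


def config_candidates(cwd: Path, home: Path) -> List[Path]:
    """Locations to check, in order: current directory first, then home."""
    return [cwd / CONFIG_NAME, home / CONFIG_NAME]


def find_config(cwd: Optional[Path] = None, home: Optional[Path] = None) -> Optional[Path]:
    """First existing xtc_cfg.xml, or None if neither location has one."""
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home
    for candidate in config_candidates(cwd, home):
        if candidate.is_file():
            return candidate
    return None
```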

Start the XTC GUI (see Quick start). Press the second button on the toolbar or choose Edit->Configuration from the menu. A tabbed dialog opens with four tabs ('General Settings', 'Anchor Elements', 'Text Diffing' and 'Change Marks').

The 'General Settings' tab:



The 'Diff mode' group configures the basic features of the comparison.


'Root elements' defines how XTC handles the XML root elements, since they are not part of the diff process (this is sometimes needed when XTC is used in batch mode).

'File handling' sets the parameters for writing the result file.

The 'Attribute' tab:

The 'Anchor elements' tab:

Anchor elements are used to 'navigate' through the XML tree during the comparison. They serve as a hint to the algorithm and can force the program to find the right counterpart of an element. An anchor is an XML node or attribute whose content does not change, so that it is identical in both XML documents. Such a node can serve as an anchor for an enclosing element.

Example:

<chapter>
<section>
<sectionmeta>
<sectionid>3</sectionid>
<sectionname>.....</sectionname>
</sectionmeta>
<para>....</para>
...
</section>

<section>
<sectionmeta>
<sectionid>4</sectionid>
<sectionname>.....</sectionname>
</sectionmeta>
<para>....</para>
...
</section>
<section>
...
</section>
</chapter>

Here the element <sectionid> can be defined as an anchor. If the order of the 'section' elements has changed in the second XML document, and so has their content (apart from the 'sectionid' element, of course), the program can still find the right counterpart of each 'section' element by looking up its subelement 'sectionid' and comparing its content. Anchor elements are matched by element content, here the text child node of <sectionid>: XTC compares the text of the anchor element's whole subtree (attributes are ignored).
In this example the anchor definition would be:

section sectionmeta/sectionid
(see picture)

The steps of the path to the anchor element are separated by / (slash).

Attributes can also be defined as anchors, for example:
<chapter>
<section>
<sectionmeta id="3">
<sectionname>.....</sectionname>
</sectionmeta>
<para>....</para>
...
</section>

<section>
<sectionmeta id="4">
<sectionname>.....</sectionname>
</sectionmeta>
<para>....</para>
...
</section>
<section>
...
</section>
</chapter>

Here the anchor definition would be:

section sectionmeta/@id

The '@' indicates that the anchor is an attribute.


To define a new anchor for an XML element, press the 'Add element' button.
A small dialog window appears in which you enter the element's name and the path to the element's anchor.

Only one anchor should be defined for an XML element. If more than one anchor is defined, only the last one is used. The anchor functionality for attributes works even when 'ignore attributes' is activated. Anchor elements that are defined but not found in the XML structure are ignored.
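The anchor lookup described in this section can be sketched in Python with xml.etree.ElementTree. An anchor definition is modelled as an element name plus a relative path: a final '@name' step selects an attribute; otherwise the text of the anchor element's whole subtree is used, with attributes ignored. Elements lacking the anchor are skipped, mirroring the rule that anchors not found are ignored. The function names and the first-come pairing of equal anchor values are assumptions for illustration, not XTC's implementation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def anchor_value(element: ET.Element, path: str) -> Optional[str]:
    """Anchor content of `element`, or None if the anchor is not present.

    'sectionmeta/sectionid' -> text of that subtree (attributes ignored)
    'sectionmeta/@id'       -> value of the id attribute of <sectionmeta>
    """
    steps = path.split("/")
    attribute = steps.pop()[1:] if steps[-1].startswith("@") else None
    target = element.find("/".join(steps)) if steps else element
    if target is None:
        return None
    if attribute is not None:
        return target.get(attribute)
    return "".join(target.itertext())


def match_by_anchor(parent_a: ET.Element, parent_b: ET.Element,
                    tag: str, path: str) -> List[Tuple[int, int]]:
    """Pairs (index in A, index in B) of <tag> children with equal anchors."""
    waiting: Dict[str, List[int]] = {}
    for j, element in enumerate(parent_b.findall(tag)):
        value = anchor_value(element, path)
        if value is not None:  # elements without the anchor are ignored
            waiting.setdefault(value, []).append(j)
    pairs = []
    for i, element in enumerate(parent_a.findall(tag)):
        value = anchor_value(element, path)
        if value is not None and waiting.get(value):
            pairs.append((i, waiting[value].pop(0)))
    return pairs
```

If anchor definitions were read into a Python dict keyed by element name, later entries would overwrite earlier ones, which corresponds to the rule that only the last anchor defined for an element is used.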

The 'absolute' checkbox:

As mentioned before, an anchor serves as a way-finding mechanism for locating an element's counterpart. If no suitable counterpart can be found, there are two possible reasons:
1. The anchor's content didn't match any anchor content on the other side (anchor mismatch).
2. The element doesn't have the anchor that has been defined in the configuration.
When the 'absolute' checkbox is checked, elements with anchor mismatches are tagged as 'added' or 'deleted', while elements without the defined anchor remain untouched. If the checkbox is not checked, no such distinction is made: all elements for which an anchor is defined but no counterpart could be found remain untouched.
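The effect of the 'absolute' checkbox on an element that found no counterpart can be summarised as a small decision function. This Python sketch assumes that mismatched elements from file B are marked 'element added' and those from file A 'element deleted', and it returns None where the anchor logic leaves the element untouched; the function name is hypothetical.

```python
from typing import Optional


def unmatched_mark(anchor: Optional[str], absolute: bool, in_file_b: bool) -> Optional[str]:
    """Mark for an element whose anchor is defined but that found no counterpart.

    `anchor` is the element's anchor content, or None if it lacks the anchor.
    Returns None when the anchor logic leaves the element untouched.
    """
    if absolute and anchor is not None:
        # Anchor mismatch: the element is treated as existing on one side only.
        return "element added" if in_file_b else "element deleted"
    # Missing anchor, or 'absolute' switched off: no special treatment.
    return None
```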

The 'normalize spaces' checkbox:

Since the anchor's value is an element's content (a textual representation of it), a whole subtree can be an anchor's content. If this option is checked, multiple spaces are reduced to one, and carriage returns and tabs inside texts are eliminated. This is useful because some editor programs add unwanted spaces and carriage returns.
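One possible reading of the 'normalize spaces' option, sketched in Python: the anchor text is the concatenated text of the whole subtree, line breaks and tabs are turned into spaces, and runs of spaces are reduced to one. Whether XTC replaces line breaks and tabs with a space or drops them entirely, and whether it trims leading and trailing spaces, is not stated above; this sketch assumes replacement and trimming. The function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

_LAYOUT = re.compile(r"[\r\n\t]")
_SPACES = re.compile(r" {2,}")


def normalized_anchor_text(element: ET.Element) -> str:
    """Text of the element's whole subtree with 'normalize spaces' applied.

    Assumption: CR, LF and tab become a space, runs of spaces shrink to one,
    and leading/trailing spaces are removed.
    """
    text = "".join(element.itertext())
    text = _LAYOUT.sub(" ", text)
    return _SPACES.sub(" ", text).strip()
```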

The 'Text diffing' tab


A textual diff can be applied to text elements whose content has changed. The 'enable text diff' checkbox switches the text diff on and off. The algorithm classifies the text into three categories:


The Settings:


For real (natural-language) texts, a minimum LCSS length of 3 or 4 is useful.
Finally, the text marks for insertion and deletion can be entered. The default values are:
[+] text insertion start
[/+] text insertion end
[-] text deletion start
[/-] text deletion end
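The minimum LCSS length setting can be illustrated with a generic recursive diff: find the longest common run of words, keep it, and diff the parts before and after it; if the longest run is shorter than the minimum, the old words are marked as deleted and the new words as inserted, using the default text marks. This Python sketch reads LCSS as longest common substring and measures its length in words. Both are assumptions (XTC may count characters instead), the three-category classification is not modelled, and the sketch is not XTC's algorithm.

```python
from typing import Dict, List, Tuple

DEFAULT_MARKS: Dict[str, str] = {
    "ins_start": "[+]", "ins_end": "[/+]",
    "del_start": "[-]", "del_end": "[/-]",
}


def lcss(a: List[str], b: List[str]) -> Tuple[int, int, int]:
    """Longest common substring of two word lists: (start in a, start in b, length)."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def text_diff(old: str, new: str, min_len: int = 3,
              marks: Dict[str, str] = DEFAULT_MARKS) -> str:
    """Mark deletions and insertions between two texts, word by word."""

    def diff(a: List[str], b: List[str]) -> List[str]:
        i, j, n = lcss(a, b)
        if n == 0 or n < min_len:
            # No common run long enough: old words deleted, new words inserted.
            out = []
            if a:
                out.append(marks["del_start"] + " ".join(a) + marks["del_end"])
            if b:
                out.append(marks["ins_start"] + " ".join(b) + marks["ins_end"])
            return out
        # Keep the common run, then diff the parts before and after it.
        return diff(a[:i], b[:j]) + [" ".join(a[i:i + n])] + diff(a[i + n:], b[j + n:])

    return " ".join(diff(old.split(), new.split()))
```

With min_len=1, 'the quick brown fox jumps' against 'the quick red fox jumps' yields 'the quick [-]brown[/-] [+]red[/+] fox jumps'. With min_len=3, the two-word common runs are too short, so the whole text is marked as deleted and reinserted; a higher minimum thus trades fine-grained marks for fewer spurious matches.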

The 'Change Marks' tab:

This tab lets you enter your own texts for the change marks.
The checkboxes indicate whether a change mark will appear in the result document; every change mark can be switched on or off.
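How custom texts and on/off switches could be applied when a change mark is written, sketched in Python: each mark name maps to an enabled flag and the text placed after the XTC target, and the move distance is appended as in '<?XTC position change 1?>'. The settings table and function name are hypothetical, and the sketch assumes that only the text after 'XTC' is customisable and that marks missing from the table are enabled with their default text.

```python
from typing import Dict, Optional, Tuple

# Hypothetical settings: mark name -> (enabled, text written into the result file).
MARK_SETTINGS: Dict[str, Tuple[bool, str]] = {
    "complete match": (False, "complete match"),
    "element added": (True, "element added"),
    "position change": (True, "position change"),
}


def render_mark(mark: str, settings: Dict[str, Tuple[bool, str]],
                distance: Optional[int] = None) -> Optional[str]:
    """Processing instruction for a change mark, or None if it is switched off."""
    enabled, text = settings.get(mark, (True, mark))
    if not enabled:
        return None
    if distance is not None:
        text = f"{text} {distance}"
    return f"<?XTC {text}?>"
```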

Technical notes

XTC is programmed in C++ using the Qt library (from Trolltech, later Nokia).

A dedicated XML API has been developed to ensure the tool's flexibility and very good performance.

XTC runs under Windows 2000, XP, Vista and Windows 7.
Versions for other operating systems can be provided on request.

http://xmldifftool.com

Questions:
info@xmldifftool.com

Copyright © 2009-2012 Martin Achtziger