NooJ can read over 100 text file formats, including MS Word and HTML.
Text units can be paragraphs, segments delimited by a Perl regular expression, or XML nodes.
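To illustrate what delimiting text units with a regular expression means, here is a minimal sketch in Python. It is not NooJ's own code: the function name `split_into_units` and the default delimiter (one or more blank lines) are assumptions made for the example, and Python's `re` syntax is used in place of Perl's, which agrees with it for a simple pattern like this one.

```python
import re


def split_into_units(text, delimiter=r"\n\s*\n"):
    """Split text at each match of the delimiter regex and drop empty units.

    Hypothetical helper for illustration; the default delimiter treats
    a blank line as the boundary between two text units.
    """
    return [unit.strip() for unit in re.split(delimiter, text) if unit.strip()]


sample = "First paragraph.\n\nSecond paragraph,\nstill the second.\n\nThird."
units = split_into_units(sample)

for number, unit in enumerate(units, start=1):
    print(number, repr(unit))
```

Passing a different pattern, for example `r"(?<=[.!?])\s+"`, would cut the same text into rough sentence-like units instead of paragraphs.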
NooJ can parse XML documents; XML tags can be integrated into NooJ's annotation system.
NooJ can export some of its annotations as XML tags.
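The round trip between XML tags and annotations can be sketched as follows. This is an illustration under stated assumptions, not NooJ's implementation: an annotation is modeled as a `(start, end, tag)` triple over the plain text, the functions `xml_to_annotations` and `annotations_to_xml` and the tag names `doc` and `N` are hypothetical, and the export step assumes the spans do not overlap.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def xml_to_annotations(xml_string):
    """Return (plain_text, annotations); each annotation is (start, end, tag)."""
    root = ET.fromstring(xml_string)
    parts = []        # text fragments in document order
    annotations = []  # one (start, end, tag) triple per XML element
    pos = 0           # current offset in the plain text

    def walk(elem):
        nonlocal pos
        start = pos
        if elem.text:
            parts.append(elem.text)
            pos += len(elem.text)
        for child in elem:
            walk(child)
            if child.tail:  # text that follows the child's closing tag
                parts.append(child.tail)
                pos += len(child.tail)
        annotations.append((start, pos, elem.tag))

    walk(root)
    return "".join(parts), annotations


def annotations_to_xml(text, annotations):
    """Wrap each annotated span in an XML tag; assumes non-overlapping spans."""
    out = []
    pos = 0
    for start, end, tag in sorted(annotations):
        out.append(escape(text[pos:start]))
        out.append(f"<{tag}>{escape(text[start:end])}</{tag}>")
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


text, annotations = xml_to_annotations("<doc>Le <N>chat</N> dort.</doc>")
# Keep only the inner annotations so the exported spans do not overlap.
inner = [a for a in annotations if a[2] != "doc"]
exported = annotations_to_xml(text, inner)
```

Storing offsets rather than inline tags keeps the text itself unchanged, so annotations from other sources can be added over the same spans before any of them are exported.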
The figure shows a French XML document, encoded in Unicode, being imported.