New functionalities (v2.3):

-- a syntactic parser that produces two trees: the derivation tree represents the structure of the grammar that was applied to parse the sequence; the structural tree represents the structure of the parsed sequence. The syntactic structure is represented in the TAS, and can be used to disambiguate the text.

-- new disambiguation tools: automatic (using local grammars), semi-automatic (using simple queries such as that/<PRO>), semi-automatic (using the list of all ambiguities), and manual (clicking and deleting annotations directly in the TAS)


New functionalities (v2.0):

-- two types of constraints can be produced by a syntactic or a morphological grammar, for instance:

<$N=:N+Hum> checks that the linguistic unit stored in $N matches the query N+Hum (noun, human)

<$N$Sem=Hum> checks that the value of property $Sem of linguistic unit $N is equal to "Hum"

<$N$Nb=$A$Nb> checks that the value of property $Nb of linguistic unit $N is equal to the value of property $Nb of the linguistic unit $A
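
The three checks above can be pictured with a minimal Python sketch. It is not NooJ code: the Unit class, its fields and the three helper functions are hypothetical names introduced only to show what each constraint tests.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for a linguistic unit stored in a variable."""
    lemma: str
    category: str
    props: dict = field(default_factory=dict)    # e.g. {"Sem": "Hum", "Nb": "p"}
    features: set = field(default_factory=set)   # e.g. {"Hum"}


def matches_query(unit, query):
    """Like <$N=:N+Hum>: same category, and every +feature of the query is present."""
    category, *wanted = query.split("+")
    return unit.category == category and all(f in unit.features for f in wanted)


def prop_equals(unit, name, value):
    """Like <$N$Sem=Hum>: one property of the unit has the given value."""
    return unit.props.get(name) == value


def props_agree(a, b, name):
    """Like <$N$Nb=$A$Nb>: both units carry the property, with the same value."""
    return name in a.props and a.props[name] == b.props.get(name)


noun = Unit("student", "N", {"Sem": "Hum", "Nb": "p"}, {"Hum"})
adjective = Unit("young", "A", {"Nb": "p"})
print(matches_query(noun, "N+Hum"), prop_equals(noun, "Sem", "Hum"),
      props_agree(noun, adjective, "Nb"))
```

The third helper is the one that expresses agreement: it compares two units with each other rather than one unit with a constant.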

-- the new dictionary compiler is much more robust than before. It compiles a 120+ million entry dictionary (for Hungarian) in a few hours.

-- the new corpus processor is much faster. It has no problem processing 10,000+ text files.

v1.43:

-- NooJ's syntactic parser can process contracted words: for instance it can locate and annotate a Noun Phrase in the French text "aux étudiants" (where "aux" is a contraction of a preposition and a determiner)

-- noojapply can be used to build .not and .noc files.

-- "Lab > Construct A Corpus" can now split a large text file, including a PDF document, to build a corpus.

v1.42:

-- full support for frozen expressions: NooJ manages discontinuous annotations and processes dictionary/grammar pairs that mimic lexicon-grammar tables. The new feature +FXC is used in a dictionary to tell NooJ that the lexical entry is in fact a Frozen eXpression Component (that must be recognized by the corresponding grammar). The +XREF feature in a grammar's output is used to link together different parts of a discontiguous annotation. See the French lexical resource C1D (C1D.xls, C1D.dic and C1D.nog) for an example of a table of frozen expressions.

v1.41:

-- improved XML import: XML tags that represent Lexical Units, namely <LU> tags, are now converted into lexical annotations. Two properties of these tags are special: LEMMA and CAT. For instance:

<LU LEMMA="table" CAT="N" plural>tables</LU>

will be converted into the corresponding lexical annotation in the Text Annotation Structure. Note that XML tags that have been converted into annotations are now replaced with blanks in the text, which makes the text more readable.

This allows users to import into NooJ texts that have been tagged by other NLP applications.
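
As an illustration only, the conversion can be sketched in Python. The attribute syntax is inferred from the single example above, the annotation string format (form,lemma,CAT+features) is an assumption, and import_lu is a hypothetical helper, not part of NooJ.

```python
import re

# <LU ...>form</LU>, with attributes either NAME="value" or a bare feature name.
LU_TAG = re.compile(r"<LU\s+([^>]*)>(.*?)</LU>", re.S)
ATTRIBUTE = re.compile(r'(\w+)(?:="([^"]*)")?')


def import_lu(text):
    """Return (plain_text, annotations); tags become blanks, so offsets are kept."""
    annotations = []

    def blank_tags(match):
        form = match.group(2)
        attrs = dict(ATTRIBUTE.findall(match.group(1)))
        lemma = attrs.pop("LEMMA", form)
        category = attrs.pop("CAT", "")
        # Remaining attributes are treated as features (assumption).
        features = "".join("+" + (f"{k}={v}" if v else k) for k, v in attrs.items())
        annotations.append((match.start(2), f"{form},{lemma},{category}{features}"))
        opening = match.start(2) - match.start()
        closing = match.end() - match.end(2)
        return " " * opening + form + " " * closing

    return LU_TAG.sub(blank_tags, text), annotations


plain, found = import_lu('two <LU LEMMA="table" CAT="N" plural>tables</LU> here')
print(found)
```

Replacing each tag with the same number of blanks keeps every character offset unchanged, so the recorded annotation position still points at the word form in the cleaned text.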

-- Lexical features can be copied to syntactic annotations via the system of lexical information fields. Moreover, the new special field variables $CAT, $ALLS and $ALLF can be used to copy, respectively, the category of a lexeme, all its syntactic and semantic information, and all its inflectional information.

For instance, if the following lexeme is stored in variable $N:

table,N+Conc+sing

then $N stores "table", $N$Nb stores "sing", $N$CAT stores "N", $N$ALLS stores "+Conc" and $N$ALLF stores "+sing".
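
The split between the fields can be sketched in Python as follows. This is only a sketch: the INFLECTIONAL table, which says that "sing" and "plur" are values of the property Nb, is an assumption made for the example, and expand is a hypothetical helper.

```python
# Assumed property table: inflectional properties and their possible values.
INFLECTIONAL = {"Nb": {"sing", "plur"}}


def expand(variable, lexeme):
    """Map a lexeme such as 'table,N+Conc+sing' to the fields reachable from a variable."""
    lemma, info = lexeme.split(",", 1)
    category, *features = info.split("+")
    values = {
        variable: lemma,
        variable + "$CAT": category,
        variable + "$ALLS": "",
        variable + "$ALLF": "",
    }
    for feature in features:
        prop = next((p for p, vals in INFLECTIONAL.items() if feature in vals), None)
        if prop is None:
            # Not inflectional: counted as syntactic/semantic information.
            values[variable + "$ALLS"] += "+" + feature
        else:
            values[variable + "$ALLF"] += "+" + feature
            values[variable + "$" + prop] = feature
    return values


print(expand("$N", "table,N+Conc+sing"))
```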

v1.40:

-- Lexical constraints can be inserted in syntactic grammars. They make it possible to process agreement very naturally. For instance, a lexical constraint such as <$V=V+PR+3+s> could be used to check that a certain verb is conjugated in PResent, 3rd person singular. This constraint could be produced in the output of a node that recognizes "he+she+it": the agreement is then processed independently of the structure of the sentence.

Moreover, lexical constraints can be used to formalize the lexical constraints of lexicon-grammar tables (such as tables for frozen expressions). A constraint such as <$N1="les pédales"> could be associated with the frozen expression’s lexical entry "perdre" (perdre les pédales = lose one’s mind).

-- Compound variables can be used to get the property value of a lexical entry, and then to apply a morphological operator to it. For instance, "$V$FR_PP+m+s" can be used to compute the French translation of a verb, and then to produce its masculine singular past participle.

***** VERSION 1.30 *****
     October 2006
************************

-- NooJ can import XML <LU> tags that were either exported from NooJ (TEXT > Export annotations as XML tags), or that come from other systems. In the latter case, make sure that each tag has both a LEMMA and a CAT property so that NooJ can construct the corresponding Linguistic Unit.

-- Syntactic Grammars can now be automatically launched (via Info > Preferences > Syntactic Analysis) in one of three modes: LONGEST, SHORTEST, ALL. LONGEST is still the default behaviour. To launch a grammar in ALL mode, add the suffix "-A" to its name, e.g. "DATE-A.nog". To launch it in SHORTEST mode, add the suffix "-S".
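
The naming convention can be summarized by a small Python sketch; parsing_mode is a hypothetical helper written for this note, not a NooJ function.

```python
from pathlib import Path


def parsing_mode(grammar_file):
    """Derive the matching mode from the suffix of a grammar's file name."""
    name = Path(grammar_file).stem        # "DATE-A.nog" -> "DATE-A"
    if name.endswith("-A"):
        return "ALL"
    if name.endswith("-S"):
        return "SHORTEST"
    return "LONGEST"                      # default behaviour


for grammar in ("DATE.nog", "DATE-A.nog", "DATE-S.nog"):
    print(grammar, parsing_mode(grammar))
```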

-- A new special category "NW" is used; it is similar to the feature "+NW" (non-word), but is seen by NooJ’s syntactic parser and therefore can be used in a syntactic grammar (even though it is not displayed in the Text Annotation Structure).

-- the debugger displays the derivation tree of the current solution.

-- the +UNAMB feature is also available for the syntactic parser.

-- Variables now have access to property names. For instance, the variable:

$Noun_$FR

would have as its value the value of the +FR property of the lexical entry that recognizes $Noun.

-- Variables can now be set explicitly. For instance, the path:

$(N not anymore $)negation

tells NooJ that if "not anymore" has matched, the variable $N must be set to "negation". In the same manner:

$(N $)affirmation

tells NooJ that if there is nothing, the variable $N must be set to "affirmation".

-- The Tokens and the Digrams "Export" functionality now exports frequencies.

-- Morphological grammars can now be recursive and can be combined, thanks to a new recursive operator "=:" in lexical constraints. For instance, the word form "reremountable" can be parsed by the morphological grammar "Verb#able", which produces the recursive lexical constraint <reremount=:V>. The morphological grammar <re#Verb> then produces the lexical constraint <remount=:V>, and then the lexical constraint <mount=V>, which checks OK. The resulting analysis is then

reremountable,mount,A+Repetition+Repetition+Able
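
The chain of constraints can be mimicked with a short recursive Python sketch. It is a toy model under stated assumptions: LEXICON is a one-entry stand-in for the dictionary, and the two functions are hypothetical counterparts of the grammars "re#Verb" and "Verb#able", not NooJ code.

```python
LEXICON = {"mount": "V"}   # toy dictionary, assumed for this example


def analyze_verb(form):
    """Return (lemma, features) if the form is a verb, possibly prefixed by re-."""
    if LEXICON.get(form) == "V":          # plain dictionary lookup, like <mount=V>
        return form, []
    if form.startswith("re"):             # re#Verb: recurse on the rest, like <...=:V>
        inner = analyze_verb(form[2:])
        if inner is not None:
            lemma, features = inner
            return lemma, ["Repetition"] + features
    return None


def analyze_able(form):
    """Verb#able: strip -able and require the remaining stem to be a verb."""
    if form.endswith("able"):
        inner = analyze_verb(form[:-4])
        if inner is not None:
            lemma, features = inner
            return f"{form},{lemma},A+" + "+".join(features + ["Able"])
    return None


print(analyze_able("reremountable"))
```

Each recursive call corresponds to one "=:" constraint; the recursion stops at the first form that the dictionary lookup accepts.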

-- A new special annotation, TRANS, can be used in conjunction with XML export to translate parts of the text. This functionality resembles the INTEX replace mode, but is more powerful because a given grammar can generate more than one translation.

-- A Grammar Debugger for morphological and syntactic grammars

-- Derivational paradigms can now be combined with Inflectional paradigms. For instance:

#in dictionary:
play,V+FLX=ASK+DRV=ER:TABLE

#in inflectional/derivational description:
ASK = ... ;
ER = er/N ;
TABLE = <E>/s + s/p;

inflects "play" according to inflectional paradigm "ASK", then derives "play" to the Noun "player", then inflects "player" according to inflectional paradigm "TABLE".
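
The derive-then-inflect sequence for "player" can be sketched in Python. This sketch only handles plain suffixation and the empty string <E>; other NooJ operators are not modelled, the elided ASK paradigm is left out, and both helper functions are hypothetical.

```python
def parse_paradigm(rule):
    """Split 'suffix/code + suffix/code' into (suffix, code) pairs; <E> is the empty string."""
    pairs = []
    for alternative in rule.split("+"):
        suffix, code = alternative.strip().split("/")
        pairs.append(("" if suffix == "<E>" else suffix, code))
    return pairs


def apply_paradigm(lemma, rule):
    """Append each suffix of the paradigm to the lemma."""
    return [(lemma + suffix, code) for suffix, code in parse_paradigm(rule)]


ER = "er/N"
TABLE = "<E>/s + s/p"

for derived, category in apply_paradigm("play", ER):      # play -> player (N)
    for form, number in apply_paradigm(derived, TABLE):    # player, players
        print(form, derived, category, number)
```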

-- Syntax Coloring for morphological description rules

-- Concordance > Export Matching or Non-Matching text units

-- Grammars in the Syntactic Analysis module can now color matching sequences in texts. Use feature +COLOR=XXX, where the color is RED, GREEN, BLUE, YELLOW, PURPLE, CYAN, LIGHTRED, LIGHTGREEN, LIGHTBLUE.

-- A new "Export XML annotations" function is available for corpora

-- NooJ grammars can produce embedded annotations, e.g.:

<HUM><COMP>IBM</COMP>’s CEO</HUM>
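
To make the nesting explicit, the following Python sketch reads such a tagged string and lists each annotation with its character span. It uses only the standard library; annotation_spans is a hypothetical helper, and the (tag, start, end) output format is an assumption chosen for the example.

```python
import xml.etree.ElementTree as ET


def annotation_spans(tagged):
    """Return (plain_text, [(tag, start, end), ...]) for a string with nested tags."""
    root = ET.fromstring(f"<root>{tagged}</root>")   # wrapper so any fragment parses
    pieces, spans = [], []
    position = 0

    def walk(node):
        nonlocal position
        start = position
        if node.text:
            pieces.append(node.text)
            position += len(node.text)
        for child in node:
            walk(child)
            if child.tail:                 # text following the child's closing tag
                pieces.append(child.tail)
                position += len(child.tail)
        if node is not root:
            spans.append((node.tag, start, position))

    walk(root)
    return "".join(pieces), spans


text, spans = annotation_spans("<HUM><COMP>IBM</COMP>'s CEO</HUM>")
print(text, spans)
```

Inner annotations are listed before the annotation that contains them, and the span of COMP lies entirely inside the span of HUM.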

-- NooJ’s syntactic parser has been optimized (up to 20 times faster)

***** VERSION 1.21 *****
     October 2005
************************

-- automatic vowelization for Semitic languages
-- morphological operators fully integrated in the syntactic parser
-- derivational morphology integrated into FLX files
-- new algorithms & data structures for faster processing
-- unlimited history for dictionary (.dic/.flx/.def) and grammar files
-- syntactic variables can be used in loops and in embedded graphs
-- codes in DELA dictionaries have been adapted to NooJ

v1.20: September 2005.

v1.10: first public release, March 2005.

v1.0: presentation, Univ. de Tours, June 2004.

Bug fixes:

-- XML export could crash when asked to export annotations that were embedded in a contracted word form
-- the automatic disambiguation process (in Info > Syntactic Analysis) would choose a random disambiguation solution when a sequence was not 100% disambiguated
-- Text Annotations’ window did not take disambiguations into account
-- Text coloring did not always work; Concordance filtering did not always work.
-- Grammars that contained loops with embedded variables and outputs, inside which  was recognized, could loop forever.
-- Highly ambiguous grammars (e.g. with 65536+ ambiguities for a few words) would crash NooJ’s syntactic parser
-- Grammars’ contracts were not always using the same lexical resources as the ones in Info > Preferences > Lexical Analysis
-- NooJ did not properly display texts in corpora that have never been parsed; Corpus’s REMOVE button was not properly updating the corpus’s list of texts.
-- words recognized by derivational rules were not properly displayed in the Annotations window.
-- the Preference window was hidden behind the main window after selecting fonts
-- concordances did not correctly display new matching sequences if "Show Input/Output" was modified before a new Locate
-- grammars starting with <CAP>, <LOW> or <UPP> were not finding all matching sequences
-- unexpected crashes when displaying annotations just after having performed a linguistic analysis on a corpus.
-- after importing a lexical resource, an incorrect error message was displayed
-- compounds that occurred across lines, or that contained an XML tag, e.g. prime <Ital> minister </Ital> were not recognized
-- the large dictionary compiler now consumes less memory; some large Hungarian dictionaries that previously could not be compiled now can
-- the initial word form counts in texts were incorrect
-- graph history is no longer recorded in grammar files; after a series of Undo’s and Redo’s, Purge now deletes "past" as well as "future" modifications
-- the option "Longest Match" was not always removing shorter matches
-- unexpected crashes when loading some grammars
-- unexpected crashes when exporting some annotated texts into XML just after having performed a linguistic analysis.