Category Archives: NLP

Natural Language Processing issues and methods

Download Perl script to create instances from TreeTagger format files

TT2seq_feat_header_3grams_context.pl is a Perl program that you can amend at will.

Posted in NLP, PERL scripts | Comments Off

TreeTagger .par file trained on native WSJ corpus

After modifying some of the PoS tags related to this, that and it in the WSJ corpus, I trained TreeTagger (TT) on this new subset and obtained a .par file (see contribution page) that can be used to tag other corpora with the modified Penn … Continue reading

Posted in NLP | Comments Off

Customise PoS tags with TreeTagger

It’s possible to add to or modify the tagset used by TreeTagger. The solution involves three steps: 1. Retag a Penn Treebank-compliant corpus. 2. Train TreeTagger on it. 3. Use the trained .par file to tag another corpus with TreeTagger. More … Continue reading
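The first of the three steps, retagging, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the post: it assumes the corpus is in one-token-per-line "word<TAB>tag" form, and the custom tag names (DT_THIS, DT_THAT, PRP_IT) and the names CUSTOM_TAGS, retag_line and retag_corpus are hypothetical.

```python
# Hypothetical mapping: (lowercased word, original Penn tag) -> custom tag.
# These custom tag names are invented for illustration only.
CUSTOM_TAGS = {
    ("this", "DT"): "DT_THIS",
    ("that", "DT"): "DT_THAT",
    ("it", "PRP"): "PRP_IT",
}


def retag_line(line, mapping=CUSTOM_TAGS):
    """Return one 'word<TAB>tag' line, with the tag replaced if the mapping applies."""
    stripped = line.rstrip("\n")
    parts = stripped.split("\t")
    if len(parts) != 2:
        # Leave blank or malformed lines untouched.
        return stripped
    word, tag = parts
    new_tag = mapping.get((word.lower(), tag), tag)
    return f"{word}\t{new_tag}"


def retag_corpus(lines, mapping=CUSTOM_TAGS):
    """Retag every line of a corpus given as an iterable of strings."""
    return [retag_line(line, mapping) for line in lines]


# Small usage example: only the (word, tag) pairs listed in the mapping change,
# so "that" tagged IN is left alone.
sample = ["This\tDT", "is\tVBZ", "that\tIN", "it\tPRP", ".\t."]
retagged = retag_corpus(sample)
for line in retagged:
    print(line)
```

Keying the mapping on both the word and its original tag keeps the change narrow: a word is only retagged in the reading you list. The retagged file would then serve as training input for step 2; the lexicon and open-class files given to the training program would presumably need to include the new tags as well.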

Posted in NLP | Comments Off