Words of Second Language Acquisition

TreeTagger2NITE NXT with positional tags

Posted on 14/03/2016 by Thomas

This script converts TreeTagger format to NITE XML format and includes positional information (nominative or oblique). Download here.

Posted in Linguistics & SLA | Comments Off

Download perl script to create instances from TreeTagger format files

Posted on 11/06/2015 by Thomas

TT2seq_feat_header_3grams_context.pl is a Perl program that you can amend at will.

Posted in NLP, PERL scripts | Comments Off

Create your matrix of features from texts

Posted on 17/03/2014 by Thomas

In the contributions section, PERL script 1 extracts text units and displays them in a matrix-like format so that the files can be imported as data frames in R, for example. They can also be used in classifiers such as TiMBL.
The texts must be tokenized and tagged beforehand if you want to use the script as is. It also targets the forms it, this and that only. Feel free to modify it for your own use.
Here is what the output looks like in my case:
DIDID TOKENS TAGS TOKENS3BEFORE TAGS3BEFORE TOKENS2BEFORE TAGS2BEFORE TOKENS1BEFORE TAGS1BEFORE TOKENS1AFTER TAGS1AFTER TOKENS2AFTER TAGS2AFTER TOKENS3AFTER TAGS3AFTER CONTEXT DISCOURSE
DID0014-S001.seq it PNR me PRP something NN about IN SYM SYM okay JJ 0 0
DID0014-S001.seq that TCOM also RB the DT fact NN peoples NNS are VBP drinking VBG 0 0

…
The first line contains the column headers. TiMBL does not expect a header line, so delete it before using the file with TiMBL.
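
For readers who prefer Python, here is a minimal sketch of the same idea. It is not the Perl script from the contributions section: it assumes TreeTagger-style input (token, tab, tag, optionally a lemma), pads missing context with a hypothetical "_" filler, and leaves out the CONTEXT and DISCOURSE columns, which come from annotation not described in this post. The function names are made up for illustration.

```python
from typing import Iterable, List, Tuple

TARGETS = {"it", "this", "that"}  # the forms targeted in the post
PAD = ("_", "_")                  # assumption: filler for missing context
WINDOW = 3                        # three tokens before and after

# Column names follow the header shown above (first 15 columns only).
HEADER = ["DIDID", "TOKENS", "TAGS"]
HEADER += [f"{kind}{n}BEFORE" for n in (3, 2, 1) for kind in ("TOKENS", "TAGS")]
HEADER += [f"{kind}{n}AFTER" for n in (1, 2, 3) for kind in ("TOKENS", "TAGS")]


def read_tagged(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse lines of the form token<TAB>tag[<TAB>lemma] into (token, tag)."""
    pairs = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2 and parts[0]:
            pairs.append((parts[0], parts[1]))
    return pairs


def build_rows(doc_id: str, pairs: List[Tuple[str, str]]) -> List[List[str]]:
    """Return one row per target form, with its left and right context."""
    rows = []
    for i, (token, tag) in enumerate(pairs):
        if token.lower() not in TARGETS:
            continue
        row = [doc_id, token, tag]
        for offset in range(WINDOW, 0, -1):      # 3, 2, 1 tokens before
            j = i - offset
            row.extend(pairs[j] if j >= 0 else PAD)
        for offset in range(1, WINDOW + 1):      # 1, 2, 3 tokens after
            j = i + offset
            row.extend(pairs[j] if j < len(pairs) else PAD)
        rows.append(row)
    return rows


def format_matrix(rows: List[List[str]], with_header: bool = True) -> List[str]:
    """Join rows with spaces; drop the header when the file is meant for TiMBL."""
    lines = [" ".join(HEADER)] if with_header else []
    lines.extend(" ".join(row) for row in rows)
    return lines
```

Call `format_matrix(rows, with_header=True)` for a file to import into R, and `with_header=False` for TiMBL.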

Posted in Linguistics & SLA | Comments Off

TreeTagger .par file trained on native WSJ corpus

Posted on 06/02/2014 by Thomas

After modifying some of the PoS tags related to this, that and it in the WSJ corpus, I trained TreeTagger on this new subset and obtained a .par file (see the contribution page) that can be used to tag other corpora with the modified Penn tag set.

Posted in NLP | Tagged training, Treetagger | Comments Off

The acquisition of ‘this’ and ‘that’ by learners

Posted on 06/02/2014 by Thomas

Learners of English do not necessarily have a good command of the demonstratives. There are a variety of unexpected uses, and these can be classified according to several criteria. Semantically, learners may experience difficulties when constructing referential processes. Research on deictic and anaphoric processes provides a better understanding of their output. At the functional level, learners experience difficulties in selecting between one form and the other. Still at the functional level, there are two learner-specific micro-systems of use in which the forms interact. Firstly, in their proform function, they interact with the pronoun it. Secondly, in their determiner function, they interact with the determiner the. It appears that, for learners, this and that have competitor forms, and a close investigation of their use in learner corpora would provide answers on the extent to which such issues arise.

More on this in (Gaillat, 2013a). Draft version here.

Posted in Linguistics & SLA | Comments Off

What it looks like to tag with TreeTagger

Posted on 30/01/2014 by Thomas

Once you have your files, this is what happens …

Enjoy and let me know if it helps!

Posted in Linguistics & SLA | Comments Off

What it looks like to train TreeTagger

Posted on 25/01/2014 by Thomas

It takes three files and a few seconds.

Video here

Posted in Linguistics & SLA | Comments Off

Customise PoS tags with TreeTagger

Posted on 18/01/2014 by Thomas

It’s possible to add to or modify the tagset employed by TreeTagger. The solution involves a three-step method:
1. Retag a Penn Treebank compliant corpus
2. Train TreeTagger on it
3. Use the trained .par file to tag another corpus with TreeTagger

More details in this paper (Gaillat, 2013).
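
The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure from the paper: the retagging rule (complementizer that, tagged IN in the Penn tagset, becoming TCOM) is a hypothetical example, the helper names and file paths are made up, and the command lines rely on the standard TreeTagger binaries `train-tree-tagger` (lexicon, open class file, tagged corpus, output parameter file) and `tree-tagger` (parameter file, input, output).

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical rule for illustration: (lowercased token, Penn tag) -> new tag.
RULES: Dict[Tuple[str, str], str] = {("that", "IN"): "TCOM"}


def retag(lines: Iterable[str], rules: Dict[Tuple[str, str], str]) -> List[str]:
    """Step 1: rewrite the tag column of token<TAB>tag lines using the rules."""
    out = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            key = (parts[0].lower(), parts[1])
            parts[1] = rules.get(key, parts[1])  # keep the tag if no rule applies
        out.append("\t".join(parts))
    return out


def train_command(lexicon: str, open_class: str, corpus: str, par_out: str) -> List[str]:
    """Step 2: argument list for training on the retagged corpus."""
    return ["train-tree-tagger", lexicon, open_class, corpus, par_out]


def tag_command(par_file: str, infile: str, outfile: str) -> List[str]:
    """Step 3: argument list for tagging another corpus with the new .par file."""
    return ["tree-tagger", "-token", par_file, infile, outfile]
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` once the TreeTagger binaries are on your PATH. Note that the lexicon and open class files must also use the modified tags, otherwise training will complain about unknown tags.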

Posted in NLP | Tagged PoS tagging modified tagset | Comments Off

The purpose of this blog

Posted on 19/12/2013 by Thomas

Welcome to my blog on SLA and linguistics.

The purpose of this blog is to document and share my experience in the use of NLP tools for the linguistic analysis of various corpora (learners and natives).

Posted in Linguistics & SLA | Comments Off
