NLTK perceptron tagger "TypeError: 'LazySubsequence' object does not support item assignment"

I would like to use the PerceptronTagger from the nltk package with Python 3.5, but I am getting the error TypeError: 'LazySubsequence' object does not support item assignment.

I would like to train it with data from the Brown corpus, using the universal tagset.

Here is the code that triggers the issue:

import nltk,math
tagged_sentences = nltk.corpus.brown.tagged_sents(categories='news',tagset='universal')
i = math.floor(len(tagged_sentences)*0.2)
testing_sentences = tagged_sentences[0:i]
training_sentences = tagged_sentences[i:]
perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False)
perceptron_tagger.train(training_sentences)

Training fails with the following stack trace:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-61332d63d2c3> in <module>()
      1 perceptron_tagger = nltk.tag.perceptron.PerceptronTagger(load=False)
----> 2 perceptron_tagger.train(training_sentences)

/home/nathan/anaconda3/lib/python3.5/site-packages/nltk/tag/perceptron.py in train(self, sentences, save_loc, nr_iter)
    192                     c += guess == tags[i]
    193                     n += 1
--> 194             random.shuffle(sentences)
    195             logging.info("Iter {0}: {1}/{2}={3}".format(iter_, c, n, _pc(c, n)))
    196         self.model.average_weights()

/home/nathan/anaconda3/lib/python3.5/random.py in shuffle(self, x, random)
    270                 # pick an element in x[:i+1] with which to exchange x[i]
    271                 j = randbelow(i+1)
--> 272                 x[i], x[j] = x[j], x[i]
    273         else:
    274             _int = int

TypeError: 'LazySubsequence' object does not support item assignment

It seems to be coming from the shuffle function in the random module but that doesn't really seem right.

Is there something else that could cause the problem? Has someone had this issue?

I am running this on Ubuntu 16.04.1 with Anaconda Python 3.5. The nltk version is 3.2.1.


NLTK has a lot of custom "lazy" types, which are meant to ease the handling of large bodies of data, such as annotated corpora. They behave like the standard lists, tuples, dicts etc. in many ways, but avoid occupying too much memory unnecessarily.

One instance of this is the LazySubsequence, which is the result of the slice expression tagged_sentences[i:]. If tagged_sentences were a normal list, splitting the data into test and training sets would create an entire copy of the data. Instead, this LazySubsequence is a view onto part of the original sequence.

While the memory benefits of this are probably a good thing, the problem here is that this view is read-only. Apparently the PerceptronTagger wants to shuffle its input data in place, which the view does not allow, hence the exception.
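
The mechanism can be reproduced without NLTK. The following is a minimal sketch, assuming a hypothetical ReadOnlyView class that only stands in for a lazy slice (it is not NLTK's LazySubsequence): it supports indexing and len(), but defines no item assignment, so random.shuffle cannot swap elements in it.

```python
import random
from collections.abc import Sequence


class ReadOnlyView(Sequence):
    """Hypothetical stand-in for a lazy, read-only slice (not NLTK's class)."""

    def __init__(self, source, start, stop):
        self._source = source
        self._start = start
        self._stop = stop

    def __len__(self):
        return self._stop - self._start

    def __getitem__(self, index):
        # Only non-negative integer indices are handled in this sketch.
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self._source[self._start + index]


def try_shuffle(data):
    """Return None on success, or the error message if shuffling fails."""
    try:
        random.shuffle(data)
    except TypeError as error:
        return str(error)
    return None


view = ReadOnlyView(["a", "b", "c", "d"], 1, 4)
error_message = try_shuffle(view)
print(error_message)
```

Reading from the view works as usual; only the in-place swap inside random.shuffle is rejected, which matches the last frame of the stack trace in the question.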

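A quick (but maybe not the most elegant) solution is to give the tagger a mutable copy of the data. Note that this has to be a list: a tuple does not support item assignment either, so shuffling it would fail in the same way.

perceptron_tagger.train(list(training_sentences))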

You might have to do the same thing with the test data.
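
If both parts should be plain lists, the split itself can do the copying. This is only a sketch under assumptions: split_as_lists is a hypothetical helper name (not part of NLTK), and the sample data below is made up to keep the example self-contained.

```python
import math
import random


def split_as_lists(sentences, test_fraction=0.2):
    """Split a sequence into (test, train) lists that own their items."""
    cut = math.floor(len(sentences) * test_fraction)
    # list(...) copies each slice, so both results support item assignment
    # even when `sentences` is a read-only lazy view.
    return list(sentences[:cut]), list(sentences[cut:])


# Made-up tagged sentences, in the same (word, tag) shape as the corpus data.
sample = [
    [("The", "DET"), ("dog", "NOUN")],
    [("It", "PRON"), ("barks", "VERB")],
    [("Cats", "NOUN"), ("sleep", "VERB")],
    [("We", "PRON"), ("run", "VERB")],
    [("Birds", "NOUN"), ("sing", "VERB")],
]
original = list(sample)

test_part, train_part = split_as_lists(sample)
random.shuffle(train_part)  # works: train_part is an ordinary list
```

With the data from the question, the call would be testing_sentences, training_sentences = split_as_lists(tagged_sentences), followed by perceptron_tagger.train(training_sentences). The price is that both copies are held in memory in full, which is what the lazy view was designed to avoid.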