When manipulating nested list data, how do I make the first item of each inner list the processed list and the second item the information about that list?

Ok, I'm trying to pass along a list of values together with information about that list of values, and I want to do that while transforming the data. Let me show you what's going on:

worddictlist2 = []
for innertweet in namelist:
        worddictlist = []
        for tweet in innertweet[0]:
                worddict = {word: tweet.count(word) for word in wordlist}
                worddictlist.append(worddict)
                worddictlist2.append(worddictlist)

namelist is a variable with the following information:

[[['blah blah blah string blah blah blah blah blah blah', 'another string, blah blah blah, string string', 'string string string'], category], [['string string another string, blah', 'more words, more words, etc', 'yet again, here we go'], category2]]

I am counting the number of times that a particular word occurs in each phrase. However I still want to keep the category assignment in some way.

I've tried appending to different lists at various points in the loops, and I've tried different list comprehensions, but I'm just not seeing the result I want, which is as follows:

[[{word1: 0, word2: 7, word3: 12, word4: 6}, category], [{word1: 3, word2: 9, word3: 1, word4: 2}, category2]]

How can I get this output? Am I doing this inefficiently? The way I am torturing this data makes me feel like I am doing this process inefficiently.


Given data:

category = "C"
category2 = "C2"

namelist = [
  [['blah blah blah string blah blah blah blah blah blah', 'another string, blah blah blah, string string', 'string string string'],
   category
  ],
  [['string string another string, blah', 'more words, more words, etc', 'yet again, here we go'],
   category2
  ]
]

wordlist = "blah string words".split()

Then this should work as described:

from collections import defaultdict

worddictlist2 = []
for innertweet in namelist:
    worddict = defaultdict(int)
    category = innertweet[1]
    for tweet in innertweet[0]:
        for word in wordlist:
            worddict[word] += tweet.count(word)

    # optional - convert the defaultdict to a plain dict so it prints cleanly
    worddictClean = dict(worddict)

    worddictlist2.append([worddictClean, category])

print(worddictlist2)

And it outputs:

[[{'blah': 12, 'string': 7, 'words': 0}, 'C'], [{'blah': 1, 'string': 3, 'words': 2}, 'C2']]
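
Since the question mentions list comprehensions, here is a rough sketch of the same aggregation written as one. The function name count_words_by_category is made up for illustration, and it assumes every entry in namelist is exactly a [tweets, category] pair, as in the sample data above:

```python
def count_words_by_category(namelist, wordlist):
    """Return [[{word: total_count}, category], ...] for each [tweets, category] pair."""
    return [
        # Sum each word's count over all tweets in the group, keep the category next to it.
        [{word: sum(tweet.count(word) for tweet in tweets) for word in wordlist}, category]
        for tweets, category in namelist  # assumes each entry has exactly two items
    ]


namelist = [
    [['blah blah blah string blah blah blah blah blah blah',
      'another string, blah blah blah, string string',
      'string string string'], "C"],
    [['string string another string, blah',
      'more words, more words, etc',
      'yet again, here we go'], "C2"],
]
wordlist = "blah string words".split()

result = count_words_by_category(namelist, wordlist)
print(result)
```

Either way, keep in mind that str.count counts substring occurrences, so 'string' would also be counted inside a word like 'strings'. If that matters for your data, you would need to split each tweet into words before counting.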