Categories
Uncategorized

Twitter Word Cloud

As previously mentioned I’m a bit of a Twitter user. One of the things that I came across, actually the first python project I did, was writing code to create a word cloud based on the most recent 20 posts of my Twitter feed.

I used a post by Sebastian Raschka and a post on TechTrek.io as guides and was able to generate the word cloud pretty easily.

As usual, we import the need libraries:

import tweepy, json, random
from tweepy import OAuthHandler
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
from scipy.misc import imread

The code below allows access to my feed using secret keys from my twitter account. They have been removed from the post so that my twitter account doesn’t stop being mine:

consumer_key = consumer_key
consumer_secret = consumer_secret
access_token = access_token
access_secret = access_secret

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

Next I open a file called tweets and write to it the tweets (referred to in the for loop as status) and encode with utf-8. If you don’t do the following error is thrown: TypeError: a bytes-like object is required, not 'str'. And who wants a TypeError to be thrown?

f = open('tweets', 'wb')

for status in api.user_timeline():
    f.write(api.get_status(status.id).text.encode("utf-8"))
f.close()

Now I’m ready to do something with the tweets that I collected. I read the file into a variable called words

words=' '
count =0
f = open('tweets', 'rb')
for line in f:
    words= words + line.decode("utf-8")
f.close

Next, we start on constructing the word cloud itself. We declare words that we want to ignore (in this case https is ignored, otherwise it would count the protocol of links that I’ve been tweeting).

stopwords = {'https', 'co', 'RT'}

Read in the twitter bird to act as a mask

logomask = imread('twitter_mask.png')

Finally, generate the wordcloud, plot it and save the image:

wordcloud = WordCloud(
    font_path='/Users/Ryan/Library/Fonts/Inconsolata.otf',
    stopwords=STOPWORDS.union(stopwords),
    background_color='white',
    mask = logomask,
    max_words=500,
    width=1800,
    height=1400
).generate(words)

plt.imshow(wordcloud.recolor(color_func=None, random_state=3))
plt.axis('off')
plt.savefig('./Twitter Word Cloud - '+time.strftime("%Y%m%d")+'.png', dpi=300)
plt.show()

The second to last line generates a dynamically named file based on the date so that I can do this again and save the image without needing to do too much thinking.

Full Code can be found on my GitHub Report

My Twitter Word Cloud as of today looks like this:

I think it will be fun to post this image every once in a while, so as I remember, I’ll run the script again and update the Word Cloud!