Twitter Data Cleaning Using Python

Twitter data was collected from three hashtags: #NHL, #NHLNews, and #FantasyHockey. The data was then cleaned using Python.

Steps Taken

The data was imported, and the file names were taken from the corpus.
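As a rough illustration of the import step, the sketch below collects the text files in a corpus folder and derives a name for each one. The function name `list_corpus_files`, the `.txt` extension, and the flat folder layout are assumptions made for this example, not details taken from the project.

```python
from pathlib import Path


def list_corpus_files(corpus_dir):
    """Return (paths, names) for every .txt file in corpus_dir, sorted by file name.

    Assumption: the corpus is a single folder of .txt files.
    """
    paths = sorted(Path(corpus_dir).glob("*.txt"))
    # Use the file name without its extension as the document name.
    names = [p.stem for p in paths]
    return [str(p) for p in paths], names
```

The returned names can later serve as row labels, so each row of the word matrix can be traced back to its source file.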

The data was then run through a count vectorizer, which created a matrix of word counts for each text file.

The resulting data frame contained a large number of numeric tokens, which were removed to reduce its size and drop unneeded terms.
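One way to sketch this removal, assuming "numeric" means a column name made up entirely of digits (such as "2021" or "99") and that the matrix is a pandas data frame. The function name `drop_numeric_columns` is hypothetical.

```python
import pandas as pd


def drop_numeric_columns(term_matrix):
    """Return a copy of term_matrix without columns whose names are all digits."""
    # str.isdigit() is True only when every character is a digit,
    # so mixed tokens such as "2nd" are kept (an assumption about the intent).
    keep = [col for col in term_matrix.columns if not str(col).isdigit()]
    return term_matrix[keep].copy()


# Hypothetical word-count matrix with two numeric columns.
example = pd.DataFrame(
    [[1, 2, 0, 3], [0, 1, 4, 1]],
    columns=["2021", "goal", "99", "nhl"],
)
cleaned = drop_numeric_columns(example)
```

Tokens that mix digits and letters are left in place here; a stricter rule, such as dropping any token containing a digit, would need a different test.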