Twitter Data Cleaning Using Python

Twitter data was collected from three hashtags: #NHL, #NHLNews, and #FantasyHockey. The data was then cleaned using Python.

Link to Raw Data

Link to Code

Link to Clean Data

Steps Taken

Step 1: Importing Twitter Data Set and Setting File Names

The data set was imported, and the file names were set from the corpus.
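
A minimal sketch of this step, assuming the corpus is a folder of .txt files and the file names (without extension) serve as labels. The folder layout, the helper name list_corpus_files, and the sample files are illustrative assumptions, not the original code.

```python
from pathlib import Path
import tempfile


def list_corpus_files(corpus_dir):
    """Return the sorted .txt paths in corpus_dir and their names without extension."""
    paths = sorted(Path(corpus_dir).glob("*.txt"))
    names = [p.stem for p in paths]
    return paths, names


# Demo on a throwaway corpus so the sketch runs anywhere (sample files are made up).
with tempfile.TemporaryDirectory() as tmp:
    samples = [("NHL", "Great game tonight"), ("FantasyHockey", "Start your goalie")]
    for name, text in samples:
        (Path(tmp) / f"{name}.txt").write_text(text, encoding="utf-8")

    paths, file_names = list_corpus_files(tmp)
    print(file_names)
```

Sorting the paths keeps the order of the labels stable between runs, which matters later when the labels are used as row names.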

Step 2: Using Count Vectorizer

The data was then run through a count vectorizer, which created a matrix of word counts for each text file.

Step 3: Removing the Unneeded Words

The data frame contained a large number of numeric tokens (words made up of digits). These columns were removed to reduce the size of the data frame and drop terms that were not needed.
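
One way to sketch this filter, assuming "numeric words" means column names made up entirely of digits. The small data frame is made up, and the original code may have used a different rule (for example, dropping any token that contains a digit).

```python
import pandas as pd

# Made-up word-count matrix: rows are files, columns are tokens.
dtm = pd.DataFrame(
    {"2023": [0, 1], "10": [1, 0], "goalie": [0, 1], "leafs": [1, 0], "3rd": [1, 0]},
    index=["NHL", "FantasyHockey"],
)

# Collect the columns whose names consist only of digits, then drop them.
numeric_cols = [col for col in dtm.columns if col.isdigit()]
cleaned = dtm.drop(columns=numeric_cols)

print(f"Removed {len(numeric_cols)} numeric columns")
print(cleaned)
```

Note that a mixed token such as "3rd" survives this rule because it is not purely numeric.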

Before

After

Step 4: Exporting the Data
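
A minimal sketch of this step, assuming the cleaned data frame is written to a CSV file with pandas. The output format, the file name, and the "file" index label are assumptions, and the data frame is a made-up stand-in.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up stand-in for the cleaned word-count matrix.
cleaned = pd.DataFrame(
    {"goalie": [0, 1], "leafs": [1, 0]},
    index=["NHL", "FantasyHockey"],
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "clean_twitter_data.csv"  # hypothetical file name

    # Write the row labels as a named column so they survive the round trip.
    cleaned.to_csv(out_path, index_label="file")

    # Read the file back to confirm the export is usable.
    restored = pd.read_csv(out_path, index_col="file")

print(restored)
```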