SVM for Record Data in R and Python

SVM for Record Data in R

Link to Data Set

Link to Code

SVM for Biographical Statistics

Biographical statistics such as birth month, height, and weight were used to analyze their relationship with estimated earnings.
First, players were divided into two groups: those who have made more than 20 million dollars and those who have made less.
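The split into two earnings groups can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function name, the class labels "high" and "low", and the sample records are hypothetical, and placing a player with exactly 20 million dollars in the "low" group is an assumption, since the write-up does not say how that case was handled.

```python
THRESHOLD_USD = 20_000_000  # cutoff described in the text


def earnings_class(estimated_earnings_usd):
    """Return 'high' for earnings above the cutoff, otherwise 'low'.

    Assumption: a player at exactly the cutoff is labeled 'low'.
    """
    return "high" if estimated_earnings_usd > THRESHOLD_USD else "low"


# Hypothetical records, for illustration only.
players = [
    {"name": "Player A", "earnings": 35_500_000},
    {"name": "Player B", "earnings": 4_200_000},
    {"name": "Player C", "earnings": 20_000_000},
]

labels = [earnings_class(p["earnings"]) for p in players]
print(labels)
```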

The data was then analyzed using Support Vector Machines (SVMs) with different kernels.
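The analysis in this section was done in R. As a rough Python counterpart, the sketch below uses scikit-learn's `SVC` and `GridSearchCV` to tune the cost for each kernel by cross-validated accuracy. The data is randomly generated stand-in data (not the player data set), the cost grid and the 5-fold cross-validation are assumptions, and the original tuning procedure may have differed.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in data: birth month, height (in), weight (lb).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(1, 13, size=n),
    rng.normal(73, 2, size=n),
    rng.normal(200, 15, size=n),
])
# Hypothetical label: 1 = earned more than $20M, 0 = earned less.
y = rng.integers(0, 2, size=n)


def tune_cost(kernel, costs=(0.01, 0.1, 1, 10)):
    """Return the cost (C) with the best cross-validated accuracy for one kernel."""
    # Scaling matters because height, weight, and month are on different scales.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    search = GridSearchCV(model, {"svc__C": list(costs)}, cv=5, scoring="accuracy")
    search.fit(X, y)
    return search.best_params_["svc__C"], search.best_score_


results = {kernel: tune_cost(kernel) for kernel in ("linear", "poly", "rbf")}
for kernel, (best_c, score) in results.items():
    print(f"{kernel}: best C={best_c}, CV accuracy={score:.2f}")
```

Because the labels here are random, the printed accuracies should hover around chance; the point is only the shape of the workflow, in which each kernel gets its own cost search.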

Polynomial Kernel

The polynomial kernel had its best accuracy at cost=10, as determined by tuning the cost. The accuracy of the model is 58%, as seen in the confusion matrix below. The SVM classification plot is also shown below.

Linear Kernel

The linear kernel had its best accuracy at cost=0.01, as determined by tuning the cost. The accuracy of the model is 67%, as seen in the confusion matrix below. The SVM classification plot is also shown below.

Radial Kernel

The radial kernel had its best accuracy at cost=0.01, as determined by tuning the cost. The accuracy of the model is 67%, as seen in the confusion matrix below. The SVM classification plot is also shown below.

Conclusions

The results show that either the linear or the radial kernel models the data best, since both reach a higher accuracy (67%) than the polynomial kernel (58%). The results suggest there is a slight relationship between biographical statistics such as birth month, height, and weight and estimated earnings. This might mean that players with a certain body type are more likely to be paid more than others. Players could use this to estimate their earnings from their current body type and birth month and train toward a body type associated with higher earnings. Additionally, recruiters could use it to identify the ideal body type for hockey success and actively recruit players who meet these requirements.

SVM for Text Data in Python

Link to Data Set

Link to Code

SVM for Twitter Data

Twitter data centered around the hashtags #NHL and #NHLNews was examined.

The data was then analyzed using Support Vector Machines with different kernels. First, the feature importance plot below shows that "NHL" and more general words about the game were more closely associated with #NHL, while fantasy hockey and news terms were more closely associated with #NHLNews. This indicates a clear divide between the two hashtags among the most important features.

Linear Kernel

The linear kernel was most accurate at cost=0.01. The accuracy of the model was 100%, as seen in the confusion matrix below, with 16 correct identifications and 0 misidentifications.
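The accuracy figure follows directly from the confusion matrix: correct predictions sit on the diagonal, and accuracy is their share of all predictions. In the sketch below, the total of 16 correct and 0 incorrect comes from the text, but the 8/8 split between the two hashtags is a hypothetical placeholder, since the per-class counts are only visible in the figure.

```python
def accuracy(confusion):
    """Accuracy = sum of the diagonal / sum of all cells."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Rows are true classes, columns are predicted classes.
# Hypothetical 8/8 split; only the totals (16 correct, 0 wrong) are from the text.
text_confusion = [
    [8, 0],
    [0, 8],
]
print(f"accuracy = {accuracy(text_confusion):.0%}")
```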

Polynomial Kernel

The polynomial kernel had its best accuracy at cost=1. The accuracy of the model was 100%, as seen in the confusion matrix below, with 16 correct identifications and 0 misidentifications.

Radial Kernel

The radial kernel had its best accuracy at cost=0.01. The accuracy of the model was 100%, as seen in the confusion matrix below, with 16 correct identifications and 0 misidentifications.

Conclusions

All the models had 100% accuracy; however, more tweets need to be collected in order to build a more robust classification model. This is because hashtags are not a strong signal on their own, as shown in the previous sections, which gives tweets a level of unpredictability. The model will allow tweets that are not centered around these hashtags to be classified into one of these two bins. This will allow one to search more effectively for NHL news and the public's opinion on it across a wider range of tweets. It will inform players and franchises what fans think about certain teams and current moves in the market, so they can address fans' concerns in real time, especially with regard to salary. The model will need a larger data set to become more robust, but initial findings show that it is able to separate tweets based on their topic, which will help players and teams make decisions based on public opinion.
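A quick calculation shows why 16 test tweets are not enough to trust a 100% result. When all n test cases are correct, the exact (Clopper-Pearson) two-sided lower confidence bound on the true accuracy has the closed form (alpha / 2) ** (1 / n). The sketch below assumes the test tweets are an independent random sample; the 95% confidence level and the function name are choices made for this illustration.

```python
def lower_bound_all_correct(n, confidence=0.95):
    """Exact two-sided lower bound on accuracy when all n test cases are correct."""
    alpha = 1 - confidence
    # With n of n correct, the bound p solves p ** n == alpha / 2.
    return (alpha / 2) ** (1 / n)


for n in (16, 160, 1600):
    print(f"n={n}: true accuracy is plausibly as low as {lower_bound_all_correct(n):.3f}")
```

With 16 tweets the bound is a little under 0.80, so a perfect score on this test set is still compatible with a model that misclassifies roughly one tweet in five. The bound tightens quickly as more tweets are collected.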