2011 Draft Bio Data Cleaning Using R

Biographic data was cleaned using R to create a data set that is easy to analyze. The code is below on the left and the original data set is on the right, with the clean data set below.

Clean Data

Steps Taken

Step 1: Importing Data Set and Flattening the Nested Data

The data was imported into R in a nested JSON format. Code was written to flatten the JSON so that the nested fields could be accessed as ordinary columns, as seen below.
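A minimal sketch of this kind of flattening is shown here. It assumes the jsonlite package, which the original does not name, and the two-player sample and its field names are hypothetical stand-ins for the real draft file.

```r
library(jsonlite)  # assumed package; the original does not say which one was used

# Hypothetical sample standing in for the real nested draft data
raw_json <- '[
  {"id": 1, "name": "Player A", "birthDate": "1993-04-12",
   "position": {"code": "C", "name": "Center"}},
  {"id": 2, "name": "Player B", "birthDate": "1992-11-23",
   "position": {"code": "D", "name": "Defenseman"}}
]'

# flatten = TRUE expands nested objects into prefixed columns,
# for example position.code and position.name
players <- fromJSON(raw_json, flatten = TRUE)

names(players)
```

With `flatten = TRUE`, each nested object becomes a set of regular columns, so later steps can treat the result as an ordinary data frame.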

Step 2: Removing Unneeded Columns

Unneeded columns, such as the several redundant formats for position, were removed from the data frame as seen below.
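One way to drop redundant columns in base R is sketched here. The column names are hypothetical; the original does not list which columns were removed.

```r
# Hypothetical data frame with three overlapping position columns
players <- data.frame(
  name = c("Player A", "Player B"),
  position.code = c("C", "D"),
  position.name = c("Center", "Defenseman"),
  position.abbreviation = c("C", "D"),
  stringsAsFactors = FALSE
)

# Columns assumed to be redundant for this illustration
drop_cols <- c("position.name", "position.abbreviation")

# Keep every column whose name is not in drop_cols
players <- players[, !(names(players) %in% drop_cols), drop = FALSE]

names(players)
```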

Step 3: Converting the Data

Columns were converted from character to the appropriate type, such as numeric or factor, as seen below.
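A short sketch of such conversions in base R follows. The column names and values are hypothetical and only illustrate the character-to-numeric and character-to-factor cases mentioned above.

```r
# Hypothetical columns that arrived as character strings
players <- data.frame(
  height_in = c("73", "70"),
  weight_lb = c("195", "180"),
  position = c("C", "D"),
  stringsAsFactors = FALSE
)

# Measurements become numeric so they can be used in calculations
players$height_in <- as.numeric(players$height_in)
players$weight_lb <- as.numeric(players$weight_lb)

# A small set of repeated labels is a natural fit for a factor
players$position <- factor(players$position)

str(players)
```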

Step 4: Filling in Missing Values

Missing values for age were filled in because age can be calculated from date of birth. Missing values for current team were left as-is because not all players are currently playing in the NHL, so those rows should not be removed.
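The age calculation could look roughly like the sketch below. The reference date, the helper `age_on`, and the column names are all assumptions for illustration; the original does not state the date against which age was computed.

```r
# Hypothetical helper: whole years between a date of birth and a reference date
age_on <- function(dob, ref) {
  dob <- as.POSIXlt(dob)
  ref <- as.POSIXlt(ref)
  years <- ref$year - dob$year
  # Has the birthday already occurred in the reference year?
  had_birthday <- (ref$mon > dob$mon) |
    (ref$mon == dob$mon & ref$mday >= dob$mday)
  years - ifelse(had_birthday, 0, 1)
}

# Hypothetical sample: one missing age, one missing current team
players <- data.frame(
  name = c("Player A", "Player B"),
  birth_date = as.Date(c("1993-04-12", "1992-11-23")),
  age = c(NA, 28),
  current_team = c("Team X", NA),
  stringsAsFactors = FALSE
)

ref_date <- as.Date("2021-06-01")  # assumed reference date

# Fill only the missing ages; current_team NAs are left untouched
missing_age <- is.na(players$age)
players$age[missing_age] <- age_on(players$birth_date[missing_age], ref_date)

players
```

Only the age column is modified, so players without a current team stay in the data set with `NA` in that column.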

Step 5: Converting Date Data

Date of birth was converted from a character string to a date format.
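In base R this conversion is typically done with `as.Date`. The sketch below assumes the strings are in year-month-day form, which the original does not confirm; the sample values are hypothetical.

```r
# Hypothetical date-of-birth strings, assumed to be in YYYY-MM-DD form
birth_chr <- c("1993-04-12", "1992-11-23")

# Parse the strings into Date objects using an explicit format
birth_date <- as.Date(birth_chr, format = "%Y-%m-%d")

class(birth_date)
```

Stating the format explicitly makes the parse fail visibly (as `NA`) if a string does not match, rather than guessing.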

Step 6: Exporting

The cleaned data was exported.
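The export format is not stated, so the sketch below assumes a CSV file written with base R. The file name and sample rows are hypothetical.

```r
# Hypothetical cleaned data frame
players <- data.frame(
  name = c("Player A", "Player B"),
  age = c(28, 28),
  stringsAsFactors = FALSE
)

# Hypothetical output path; a temporary directory keeps the example self-contained
out_path <- file.path(tempdir(), "draft_2011_clean.csv")

# row.names = FALSE avoids writing an extra index column
write.csv(players, out_path, row.names = FALSE)

# Read the file back to confirm the export round-trips
check <- read.csv(out_path, stringsAsFactors = FALSE)
```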