
Cleaning Data

Record Data Cleaning with R

The Youth Risk Behavior Surveillance System (YRBSS), developed in 1990, monitors priority health-risk behaviors that contribute markedly to the leading causes of death, disability, and social problems among youth and adults in the United States. The survey asks about suicide; drug, alcohol, and tobacco use; violence; and other contributing risk factors.

The raw data contained 109 variables. During cleaning, I dropped a few variables, labeled the remaining ones, combined some columns to create a new variable, removed missing values, and made other smaller fixes.
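The author's cleaning was done in R. The sketch below uses plain Python only to illustrate the kinds of steps listed above (dropping variables, labeling, combining columns, removing missing values). Every column name, code, and value in it is invented for illustration and is not taken from the YRBSS file, and treating "missing" as "any empty field in a record" is an assumption.

```python
# Hypothetical raw records: column names and codes are invented for this sketch.
raw_rows = [
    {"id": "1", "q1": "1", "q2": "2", "site": "XX"},
    {"id": "2", "q1": "",  "q2": "2", "site": "XX"},   # has a missing answer
    {"id": "3", "q1": "2", "q2": "2", "site": "XX"},
]

DROP = {"site"}                                         # variables to discard
RENAME = {"q1": "felt_sad", "q2": "considered_suicide"}  # readable variable names
YES_NO = {"1": "Yes", "2": "No"}                        # assumed value labels
LABELS = {"felt_sad": YES_NO, "considered_suicide": YES_NO}


def clean_records(rows):
    """Drop, rename, filter, label, and combine columns for each record."""
    cleaned = []
    for row in rows:
        # 1. Drop unwanted variables and rename the rest.
        rec = {RENAME.get(k, k): v for k, v in row.items() if k not in DROP}
        # 2. Skip records that contain any missing (empty) value.
        if any(v == "" for v in rec.values()):
            continue
        # 3. Replace numeric codes with labels (unknown codes are left as-is).
        for col, mapping in LABELS.items():
            rec[col] = mapping.get(rec[col], rec[col])
        # 4. Combine two columns into a new variable.
        answers = (rec["felt_sad"], rec["considered_suicide"])
        rec["any_risk"] = "Yes" if "Yes" in answers else "No"
        cleaned.append(rec)
    return cleaned


cleaned = clean_records(raw_rows)
```

Here `cleaned` keeps only the two complete records, each with labeled answers and the derived `any_risk` column.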

Figure 1: Youth Risk Behavior Surveillance System (YRBSS) Before Cleaning
Figure 2: Youth Risk Behavior Surveillance System (YRBSS) After Cleaning

This R code was used to turn the original data into the cleaned data.

Text Data Cleaning with Python

Twitter data may also provide insight into the ongoing public conversation about the mental well-being of Black youth. Cleaning the text involved removing stop words, numbers, and punctuation, then normalizing the remaining words to reduce the dimensionality of the data. In the final data set, each individual word is a variable (column), and each row represents a topic of discussion.
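The text-cleaning steps described above can be sketched roughly as follows. This is a minimal standard-library illustration, not the author's script: the stop-word list is a tiny hypothetical one, lowercasing stands in for whatever normalization was actually used, and the sample strings are made up. It also assumes each input string holds the text gathered for one topic.

```python
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in", "for", "on", "about"}

# Translation table that turns every punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Lowercase, strip links, numbers and punctuation, then drop stop words."""
    text = text.lower()                         # simple normalization (assumption)
    text = re.sub(r"https?://\S+", " ", text)   # remove links
    text = re.sub(r"\d+", " ", text)            # remove numbers
    text = text.translate(PUNCT_TO_SPACE)       # remove punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocabulary, rows): one column per word, one row per input text."""
    counts = [Counter(clean_text(d)) for d in docs]
    vocab = sorted(set().union(*counts)) if counts else []
    rows = [[c[w] for w in vocab] for c in counts]
    return vocab, rows


# Made-up sample texts, one per topic.
sample = [
    "Mental health of Black youth matters in 2021!",
    "Talking about youth mental health: https://example.com",
]
vocab, rows = document_term_matrix(sample)
```

Each entry of `rows` lines up with `vocab`, so the result has the shape described above: words as columns and one row per text.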

Figure 3: Twitter Data in JSON
Figure 4: Twitter Data After Cleaning

This code was used to clean the original data.

Quote of the day: "If you torture data long enough, it will tell you whatever you want to hear."