This 'NYCDataScience' folder contains the source code for projects I implemented at NYC Data Science Academy. Please email me if you have any questions or comments.
For the first project, I did exploratory data analysis and visualization of the Yelp academic dataset using R, R Markdown, knitr, googleVis, and Shiny. I identified relationships between a user's number of followers on Yelp and both average rating frequency and average rating score. Namely, as the number of followers increases, users tend to rate more frequently up to a point, and they tend to give higher ratings.
The Shiny Dashboard application is an interactive tool that compares data scientist salaries against those of eight other professions in the US, including software engineer, business analyst, and assistant professor.
This application is based on the Yelp academic dataset and provides restaurant information for users' areas of interest. Through interactive components such as data explorers, scatter plots, and comparison maps, users can gain a broader and deeper understanding of various aspects of the dataset.
Implemented an interactive application that embeds K-means clustering and sentiment analysis based on web-scraped data about jobs and companies from indeed.com and dice.com. The content was extracted and processed with a variety of Python packages and natural language processing libraries, including NLTK.
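To illustrate the clustering step, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the project's actual code: the `kmeans` function, the toy `postings` features (counts of two keywords per job posting), and the choice of initial centroids are all assumptions made for the example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy features per job posting: (count of "python", count of "sales").
postings = [(5, 0), (4, 1), (6, 0), (0, 5), (1, 4), (0, 6)]
labels, centroids = kmeans(postings, centroids=[postings[0], postings[3]])
print(labels)
print(centroids)
```

In practice the feature vectors would come from the processed text (for example, term counts produced after tokenizing with NLTK), and a library implementation would usually be preferable to a hand-written loop.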
Participated in the Kaggle competition (BNP Paribas Cardif project) to predict the category of user claims from features available early in the claim management process, using a large, anonymized dataset with 30~40% missing values. I applied and compared multiple supervised learning methods to improve prediction accuracy, including decision trees, random forests, gradient boosting models, and XGBoost.
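As a rough illustration of how a missing-value rate like the one above can be measured, the sketch below computes the fraction of missing entries per column and overall. The `missing_fraction` helper, the column names `v1` to `v3`, and the toy rows are hypothetical and are not taken from the competition data.

```python
def missing_fraction(rows):
    """Return {column: fraction of rows whose value is None}."""
    columns = list(rows[0].keys())
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


# Hypothetical toy rows with anonymized feature names.
rows = [
    {"v1": 1.2, "v2": None, "v3": "A"},
    {"v1": None, "v2": 0.4, "v3": "B"},
    {"v1": 0.7, "v2": None, "v3": None},
    {"v1": 2.1, "v2": 3.3, "v3": "A"},
]

fractions = missing_fraction(rows)
# Overall rate: mean of the per-column rates (every column has the same row count).
overall = sum(fractions.values()) / len(fractions)
print(fractions)
print(round(overall, 3))
```

A per-column view like this helps decide which features to impute, which to drop, and which to pass to a model that tolerates missing values.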
Collaborating with clients from DataSci4Good on the Wise project, which is designed to help young people develop financial literacy. Our project team is currently working to implement improved visualizations and embed recommendation components in the existing Shiny dashboard application for a more efficient learning experience.