JMichael Blog

Thinking will not overcome fear but action will.

Natural Language Processing for Entity (name, place, etc.) Extraction using R

Apply NLP techniques in R to annotate people and places in text files and extract them into a clean table.

Overview In this project I work with a yearbook PDF document (University of Cincinnati_1928) and extract the names, places, and institutions in the book for analysis and further study...

Jupyter Lab Customization with Python, Java, C++, R, Matlab Environments and SQL, Diagram, Markdown Interface

A step-by-step guide to customizing your Jupyter project

Overview JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner. You...

HopSkipDrive Driver Marketplace Analysis

A marketplace analysis of 27k supplier & customer records, including cohort analysis, concentration, take rate, conversion rate, power users, etc., using Excel and R.

For demo and recruiters' view only; please do not repost or disclose any data or information. Background This data project comes from a HopSkipDrive interview challenge: HopSkipDrive...

Machine Learning Application on Heart Disease Prediction

Preventing heart disease is important. Good data-driven systems for predicting heart disease can improve the entire research and prevention process, making sure that more people can live healthy lives.

About (from: DrivenData) In the United States, the Centers for Disease Control and Prevention is a good resource for information about heart disease. According to their website: About 610,000 peo...

Web-browser Automation with Selenium

With Selenium, a single Python script can enter, search, scrape, and manipulate information from web pages, so one click runs the code and returns the result.

This is a project from a due diligence company called Vcheck Global, which provides services such as background screening, document retrieval and specialized research. The objective is to achie...

SPE JupyterHub & Python on remote Linux/Unix servers

A presentation to 45 interested colleagues at Sony Pictures in summer 2019 -- architecture for R, Python and Julia environments for corporate data science project initiatives.

Jupyter on remote Linux server Jupyter is an online service that enables interactive computing in different programming languages across multiple users. Jupyter Notebook JupyterNotebook Acces...

EY NextWave Data Science Challenge 2019

Local/regional finalist, ranked top 10 in the US, and regional finalist in China among 2,936 participants.

The EY NextWave Data Science Challenge 2019 focuses on how data can help the next smart city thrive, and boost the mobility of the future. As a challenge participant, I downloaded a dataset wit...

Zipline Unmanned Aerial Vehicle Data Exploration & Analysis

An unstructured, independent exploratory data analysis & visualization assignment using 450+ flight dataset CSV files to discern details and find patterns, business insights, engineering risks, or anomalies.

Zipline Data Scientist Take Home Project Introduction Zipline operates the world’s only drone delivery system at national scale to send urgent medicines like blood transfusions and vaccines to th...

UCLA Data Fest 2019 -- Sports Analytics for Athlete's Fatigue Levels

Effects of Acute and Chronic Fatigue on a Rugby Player’s Performance and Advice for Coaches.

Overview The Canadian National Women’s Rugby Team seeks your advice on the role of workload and fatigue in Rugby 7s. Rugby 7s is a fast-paced, physically demanding sport that pushes the limits of ...

Demographic Analysis of People in City of Seattle

Tableau Desktop can be a powerful tool for studying census data statistically and displaying plots that demonstrate business insights and other interesting findings.

THE CHALLENGE: Review the 1990, 2000, and 2010 census data for the city of Seattle or Washington state. Find a way to represent the data in a creative and engaging way. Provide with some interesti...

Textbook Resources for Data Science (Copyright Owned by the Authors)

Textbooks from the Public Internet

Books: An Introduction to Statistical Learning with Applications in R; Introduction to Machine Learning with R: Rigorous Mathematical Analysis; R and Data Mining: Example...

Data Mining & Machine Learning applied in Predictive Analysis

Identifying the most influential of 79 potential variables for predicting affordability, and the most effective model, by applying different classification methods including logistic regression, k-nearest neighbors, and random forest.

Second place in the class Kaggle competition among 200 people across two lectures. Project Report...