Projects

Breast Cancer Analysis and Prediction

Built, tested and optimised a k-nearest neighbours (k-NN) classifier for breast cancer diagnosis, using feature selection, PCA and cross-validation.
Python, scikit-learn, machine learning, feature selection, PCA, cross-validation, evaluation metrics, Pandas, IPython notebook
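A k-NN workflow of this kind can be sketched with scikit-learn as below. This is a minimal illustration, not the project's actual code. It assumes the Wisconsin Diagnostic Breast Cancer data bundled with scikit-learn (`load_breast_cancer`), and the PCA sizes, neighbour counts and macro-F1 scoring are hypothetical choices. Features are standardised before PCA and k-NN because both are sensitive to feature scale. Hyperparameters are tuned by stratified cross-validation on a training split, and the held-out test split is used only for the final evaluation metrics.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumed dataset: the Wisconsin Diagnostic Breast Cancer data shipped with
# scikit-learn (30 numeric features; target 0 = malignant, 1 = benign).
data = load_breast_cancer()
X, y = data.data, data.target

# Hold out a stratified test set for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# k-NN is distance-based, so scale features before PCA and the classifier.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("knn", KNeighborsClassifier()),
])

# Hypothetical search space, for illustration only.
param_grid = {
    "pca__n_components": [5, 10, 15],
    "knn__n_neighbors": [3, 5, 7, 9],
    "knn__weights": ["uniform", "distance"],
}

# 5-fold stratified cross-validation on the training split only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="f1_macro")
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the untouched test split.
y_pred = search.predict(X_test)
print("Best parameters:", search.best_params_)
print(f"Mean cross-validated macro F1: {search.best_score_:.3f}")
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred, target_names=data.target_names))
```

Wrapping scaling and PCA inside the `Pipeline` means they are refit within each cross-validation fold. This keeps information from the validation folds out of the preprocessing steps.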

Identify Fraud from Enron Email

Used machine learning on public Enron financial and email data to identify which Enron employees were more likely to have committed fraud.
Python, scikit-learn, machine learning, natural language processing, feature selection

Explore and Summarize Data

Investigated a wine dataset in R using exploratory data analysis techniques, examining both individual variables and the relationships between them.
RStudio, R packages, plotting in R, exploratory data analysis techniques

Wrangle OpenStreetMap Data

Chose a map region and used data munging techniques to assess the quality of its OpenStreetMap data for validity, accuracy, completeness, consistency and uniformity.
Python, data verification, data cleaning

Investigate a Dataset

Posed a question about a dataset, answered it from the data using NumPy and Pandas, and wrote a report to share the results.
Python, NumPy, Pandas, Matplotlib, IPython notebook