from "How to Build an Awesome Data Science Portfolio", on freeCodeCamp
from "Why Have a Data Science Portfolio and What It Shows", on eugeneyan.com
by Nick Singh, author of the best-selling book Ace the Data Science Interview.
A selection of recent data science projects that have been visited more than others
Natural Language Processing
Fake Review Detection and Sentiment Analysis
Fake reviews often damage the reputation and integrity of services such as Yelp. In this project, fake reviews were created using natural language processing, and n-gram and GPT-2 models were used to discern between real and fake reviews. A sentiment analysis of the reviews was also performed to add context around the business use case of fake reviews, including how often the fake reviews are negative. The models used pretrained weights from BERT and DistilBERT to improve accuracy.
Time Series Forecasting
Automated Cross-Validation Framework for Time Series Model Selection
Created an automated framework for a demand forecasting problem that chooses the best-performing model on a weekly basis. The models considered are Facebook Prophet, Moving Average, Simple Exponential Smoothing, Holt-Winters, and ARIMA.
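As a rough illustration of how such a selection loop might work, the sketch below scores a few candidate forecasters with rolling-origin (expanding-window) cross-validation and picks the one with the lowest mean absolute error. It is not the project's actual code: the function names, the `weekly_demand` numbers, and the parameter defaults are all hypothetical, and only the simplest candidates (a naive baseline, moving average, simple exponential smoothing) are included so the example needs nothing beyond the Python standard library. Prophet, Holt-Winters, and ARIMA would plug in as further entries in the same dictionary.

```python
from statistics import mean


def naive(history):
    """Baseline (an assumption, not in the project list): repeat the last value."""
    return history[-1]


def moving_average(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    return mean(history[-window:])


def simple_exp_smoothing(history, alpha=0.5):
    """Forecast the next value with simple exponential smoothing."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def rolling_origin_mae(series, forecaster, min_train=4):
    """Mean absolute error of one-step-ahead forecasts over expanding windows.

    Each fold trains on series[:split] and is scored on series[split], so the
    model never sees observations from the future.
    """
    errors = []
    for split in range(min_train, len(series)):
        train, actual = series[:split], series[split]
        errors.append(abs(forecaster(train) - actual))
    return mean(errors)


def select_best_model(series, candidates, min_train=4):
    """Return the name of the lowest-error candidate and all scores."""
    scores = {
        name: rolling_origin_mae(series, forecaster, min_train)
        for name, forecaster in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical candidate set; heavier models could be added the same way.
CANDIDATES = {
    "naive": naive,
    "moving_average": moving_average,
    "simple_exp_smoothing": simple_exp_smoothing,
}

# Made-up weekly demand figures, for illustration only.
weekly_demand = [120, 132, 128, 141, 150, 147, 158, 163, 160, 172]

best, scores = select_best_model(weekly_demand, CANDIDATES)
print("Best model this week:", best)
for name, score in scores.items():
    print(f"  {name}: MAE = {score:.2f}")
```

Rerunning `select_best_model` each week on the updated series is one simple way to get the "choose the best model on a weekly basis" behaviour described above. The time-ordered splits matter: shuffled k-fold validation would leak future observations into training.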
Machine Learning, Data Visualisation
Spotify Recommender and Visualizer
By generating song recommendations that highlight global talent and multilingual songs, our project aims to create a more diverse experience for music listeners, while also providing a fun user experience through live, audio-responsive artistic visuals.
Data Science for Business Applications | Auto ML
Data Science for Dummies | Data Voyager
Developing an application that serves as an automated end-to-end (E2E) machine learning pipeline, aimed at users who are interested in machine learning but lack the technical skillset or time to dive in.
Wildfires are becoming more common and destructive in California. This project aims to classify and predict how big a wildfire may become by training models on a variety of weather conditions to help provide more information to prevent large wildfires.
US 2020 Elections Analytics
With my team, we used tweets about the 2020 US elections collected with the Twitter API. Using the LDA algorithm, we identified the most talked-about topics in the months leading up to the election and for a short period after it. We also trained ML models on annotated data for gender, age, and political affiliation prediction, and we calculated emotion and subjectivity features. We applied these models to present charts of our findings in a web application. Finally, we clustered tweets based on the LDA representation of each tweet and identified 3 tweet clusters.
A selection of data science portfolios created this month