from "How to Build an Awesome Data Science Portfolio", on freeCodeCamp
from "Why Have a Data Science Portfolio and What It Shows", on eugeneyan.com
by Nick Singh, author of the best-selling book Ace the Data Science Interview.
Trending projects
A selection of recent data science projects that have received the most visits
Natural Language Processing
Fake Review Detection and Sentiment Analysis
Fake reviews often damage the reputation and integrity of services such as Yelp. In this project, fake reviews were created using natural language processing, and n-gram and GPT-2 models were used to discern between real and fake reviews. A sentiment analysis of the reviews was also performed to add context around the business use case of fake reviews, such as how often fake reviews are negative. The models used pretrained weights from BERT and DistilBERT to improve their accuracy.
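The description does not say how the n-gram model separates real from fake reviews. One possible approach, assumed here, is to train a language model on genuine reviews and score each new review by its average log-probability, so that reviews whose word sequences look unusual relative to the training data get lower scores. The sketch below uses a bigram model with add-one smoothing; the function names and toy reviews are hypothetical, and the GPT-2, BERT and DistilBERT parts of the project are not shown.

```python
import math
from collections import Counter


def train_bigram_model(texts):
    """Count bigrams and their left contexts over lowercase, whitespace-split reviews."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for text in texts:
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
        vocab.update(tokens[1:])  # every token that can be predicted
    return contexts, bigrams, len(vocab)


def avg_log_prob(text, model):
    """Average add-one-smoothed log-probability per bigram (higher means more typical)."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + text.lower().split() + ["</s>"]
    pairs = list(zip(tokens[:-1], tokens[1:]))
    total = 0.0
    for prev, word in pairs:
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total / len(pairs)


# Hypothetical "real" reviews used as training data.
real_reviews = [
    "great food and friendly staff",
    "the food was great and the staff were friendly",
    "friendly staff and great service",
]
model = train_bigram_model(real_reviews)
typical = avg_log_prob("great food and friendly service", model)
odd = avg_log_prob("purple quantum bicycle sings loudly", model)
print(typical > odd)  # True
```

In practice a score like this would more likely serve as one feature or a threshold for a classifier than as a verdict on its own.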
Computer Vision
OpenCV Auto Attendance management system
An attendance management system that recognizes faces using a trained AI model, with an interactive GUI built in Tkinter.
Time Series Forecasting
Automated Cross-Validation Framework for Time Series Model Selection
Created an automated framework for a demand forecasting problem that chooses the best-performing model on a weekly basis. The models considered are Facebook Prophet, Moving Average, Simple Exponential Smoothing, Holt-Winters, and ARIMA.
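One way such a framework could work is to run a walk-forward (rolling-origin) evaluation each week and keep whichever model has the lowest error on recent history. The sketch below assumes mean absolute error (MAE) as the selection metric and includes only two of the listed models, a moving average and simple exponential smoothing, written in plain Python; Prophet, Holt-Winters and ARIMA would plug in as further entries in the candidates dictionary. All function names, parameters and demand figures are illustrative assumptions.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def ses_forecast(history, alpha=0.5):
    """Simple exponential smoothing: level <- alpha * y + (1 - alpha) * level."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def walk_forward_mae(series, forecaster, min_train=4):
    """Forecast each point from the observations before it and average the absolute errors."""
    errors = [abs(series[t] - forecaster(series[:t])) for t in range(min_train, len(series))]
    return sum(errors) / len(errors)


def select_best_model(series, candidates, min_train=4):
    """Return the candidate name with the lowest walk-forward MAE, plus all scores."""
    scores = {name: walk_forward_mae(series, f, min_train) for name, f in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical weekly demand; rerun the selection whenever a new week arrives.
weekly_demand = [100, 102, 101, 105, 130, 128, 131, 129, 133, 130]
candidates = {
    "moving_average": moving_average_forecast,
    "ses_alpha_0.8": lambda history: ses_forecast(history, alpha=0.8),
}
best, scores = select_best_model(weekly_demand, candidates)
print(best)  # ses_alpha_0.8
```

Here the jump in demand at week five penalizes the slower moving average, so smoothing with a high alpha adapts faster and wins on this toy series.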
Machine Learning, Data Visualisation
Spotify Recommender and Visualizer
By generating global song recommendations that highlight global talent and multilingual songs, our project aims to create a more diverse experience for music listeners, while also providing a fun user experience through live, audio-responsive artistic visuals.
Data Science for Business Applications | Auto ML
Data Science for Dummies | Data Voyager
Developing an application that serves as an automated end-to-end (E2E) machine learning pipeline for users who are interested in machine learning but lack the technical skill set or time to dive in.
Machine Learning
California Wildfires
Wildfires are becoming more common and destructive in California. This project aims to classify and predict how large a wildfire may become by training models on a variety of weather conditions, providing information that can help prevent large wildfires.
Natural Language Processing
Named Entity Recognition for Icelandic
In this project, a BERT model is applied to an Icelandic named entity recognition (NER) corpus. Evaluated with 10-fold cross-validation, the model obtains an F1 score of 89.24.
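The description does not say which F1 variant was reported. NER results are commonly given as entity-level, exact-match micro F1, so the sketch below assumes that: an entity counts as correct only if both its span and its type match the gold annotation. It also shows a plain 10-fold index split (contiguous folds; shuffling first is a common alternative). The BIO tag names, function names and example sentence are hypothetical, and training the BERT model on each fold is not shown.

```python
def bio_spans(tags):
    """Extract (start, end, type) entity spans from one sentence's BIO tags."""
    spans, start, ent_type = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != ent_type)
        if start is not None and (tag == "O" or starts_new):
            spans.append((start, i, ent_type))
            start, ent_type = None, None
        if starts_new and start is None:
            start, ent_type = i, tag[2:]
    return spans


def entity_f1(gold_sequences, pred_sequences):
    """Micro-averaged exact-match entity F1 over a corpus of sentences."""
    true_pos = n_gold = n_pred = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        gold_spans, pred_spans = set(bio_spans(gold)), set(bio_spans(pred))
        true_pos += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for k contiguous, non-overlapping folds."""
    start = 0
    for fold in range(k):
        size = n_items // k + (1 if fold < n_items % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


gold = [["B-Person", "I-Person", "O", "B-Location"]]
pred = [["B-Person", "I-Person", "O", "O"]]
print(round(entity_f1(gold, pred), 4))  # 0.6667

folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Under 10-fold cross-validation, a score like this would be computed on each held-out fold and then averaged or pooled; the description does not say which was done.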
Modeling
Anomaly detection with IsolationForest
A project that analyses anomalies in spending on the Brazilian government's corporate credit cards.
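In Python, IsolationForest usually refers to scikit-learn's sklearn.ensemble.IsolationForest. To make the idea behind it concrete, the sketch below is a small from-scratch version of the isolation forest algorithm: random trees split on a random feature at a random threshold, and points that get isolated after few splits receive an anomaly score close to 1. The transaction amounts, function names and parameters (n_trees, sample_size) are hypothetical, not the project's data or code.

```python
import math
import random


def average_path_length(n):
    """c(n): expected path length of an unsuccessful search in a BST built from n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Grow an isolation tree by splitting on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:  # cannot split on this feature; treat as a leaf
        return {"size": len(points)}
    threshold = rng.uniform(low, high)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth of the leaf reached by `point`, plus c(size) for points left unsplit there."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    branch = "left" if point[node["feature"]] < node["threshold"] else "right"
    return path_length(point, node[branch], depth + 1)


def isolation_forest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Score each point with s = 2 ** (-mean_path / c(psi)); values near 1 suggest anomalies."""
    psi = min(sample_size, len(points))
    if psi < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    c = average_path_length(psi)
    scores = []
    for p in points:
        mean_path = sum(path_length(p, tree) for tree in trees) / n_trees
        scores.append(2 ** (-mean_path / c))
    return scores


# Hypothetical single-feature data: one card transaction amount per row.
amounts = [(120.0,), (95.5,), (130.2,), (110.0,), (101.3,), (125.8,), (99.9,), (4800.0,)]
scores = isolation_forest_scores(amounts, n_trees=200, seed=42)
most_anomalous = max(range(len(amounts)), key=lambda i: scores[i])
print(amounts[most_anomalous])  # (4800.0,)
```

The extreme amount is almost always cut off by the first random split, so its average path is short and its score is the highest.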
Data Analytics
US 2020 Elections Analytics
My team and I used tweets about the 2020 US elections collected with the Twitter API. Using the latent Dirichlet allocation (LDA) algorithm, we identified the most discussed topics in the months leading up to the election and for a short period after it. We also trained ML models on annotated data to predict gender, age, and political affiliation, and we calculated emotion and subjectivity features. We used these models to produce charts of our findings, presented in a web application. Finally, we clustered the tweets based on each tweet's LDA representation and identified three clusters.
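The description does not name the clustering algorithm. As an assumption, the sketch below runs k-means with k = 3 on per-tweet topic distributions, which is what an LDA representation of a document usually is (one probability per topic). It uses a deterministic farthest-first initialisation so the result is reproducible. The topic vectors are invented for illustration; in a real pipeline they would come from a fitted LDA model.

```python
import math


def kmeans(vectors, k, n_iter=50):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the vector farthest from its nearest existing centroid.
        farthest = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors]
        new_centroids = []
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical doc-topic distributions for nine tweets over three LDA topics.
topic_vectors = [
    (0.80, 0.10, 0.10), (0.70, 0.20, 0.10), (0.85, 0.05, 0.10),
    (0.10, 0.80, 0.10), (0.20, 0.70, 0.10), (0.05, 0.90, 0.05),
    (0.10, 0.10, 0.80), (0.15, 0.15, 0.70), (0.05, 0.10, 0.85),
]
labels, centroids = kmeans(topic_vectors, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```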
Latest portfolios
A selection of data science portfolios created this month