SQL Data Mastery: From Fundamentals to Advanced Analysis

This project is a comprehensive journey through SQL and MySQL, where I’ve built and documented core querying skills such as aliases, joins, aggregations, and filtering techniques.

To enhance interactivity and data handling, I integrated Python using mysql.connector and pandas, enabling seamless connection to MySQL databases and transformation of raw query outputs into clean, insightful data frames.

This practical approach bridges traditional SQL querying with modern Python workflows—ideal for real-time data analysis and reporting.
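A minimal sketch of the kind of query this project covers (aliases, a join, filtering, and aggregation) run from Python. To keep it runnable without a database server, the standard-library sqlite3 module stands in for mysql.connector; the tables, columns, and values are invented for illustration and are not taken from the project.

```python
import sqlite3

# sqlite3 is a stand-in for mysql.connector so the sketch runs anywhere.
# Table names, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict

conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'PT'), (2, 'Ben', 'UK'), (3, 'Chi', 'UK');
    INSERT INTO orders VALUES (1, 1, 60.0), (2, 2, 20.0), (3, 2, 30.0), (4, 3, 5.0);
""")

# Aliases (AS), an inner join, a WHERE filter, and GROUP BY aggregation.
query = """
    SELECT c.country     AS country,
           COUNT(o.id)   AS order_count,
           SUM(o.amount) AS total_amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.amount >= 10
    GROUP BY c.country
    ORDER BY total_amount DESC
"""

rows = [dict(r) for r in conn.execute(query)]
conn.close()

for row in rows:
    print(row)
```

With a live MySQL connection, the same query string could be passed to pandas.read_sql(query, conn) to get the result as a data frame instead of a list of dicts.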

View Project on GitHub: Portfolio-1

Machine Learning for Finance: A Complete ML Zoomcamp Journey

As part of the DataTalks.Club ML Zoomcamp, I spent three months applying machine learning concepts directly to financial datasets. This included hands-on work with regression, classification, evaluation metrics, model deployment, decision trees, ensemble models, and neural networks.

I used Python to build, train, and deploy models, drawing actionable insights from real financial data. This project demonstrates my ability to connect machine learning theory with high-impact finance applications, such as risk analysis and investment prediction.

View Project on GitHub: Portfolio-2

Student Stay Duration & Well-being Score Analysis

This project explores how the duration of student stays relates to their well-being, using three key psychological metrics:

PHQ – Patient Health Questionnaire
SCS – Social Connectedness Scale
AS – Anxiety Scale

The analysis focuses on:

Computing average scores for each metric based on stay duration.
Counting international students across different stay periods.

By uncovering patterns between stay length and well-being, this study provides actionable insights to support student welfare and improve program design for international students.
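The two computations listed above (average scores per stay duration and a count of international students per stay period) could be sketched as follows. The records and field names are hypothetical placeholders, not the project's actual dataset or columns.

```python
from collections import defaultdict
from statistics import mean

# Invented sample records; field names are assumptions for illustration.
students = [
    {"stay_years": 1, "international": True,  "phq": 8,  "scs": 36, "as": 70},
    {"stay_years": 1, "international": True,  "phq": 10, "scs": 40, "as": 80},
    {"stay_years": 2, "international": True,  "phq": 6,  "scs": 42, "as": 60},
    {"stay_years": 1, "international": False, "phq": 5,  "scs": 45, "as": 50},
    {"stay_years": 2, "international": True,  "phq": 9,  "scs": 38, "as": 75},
]

# Group international students by length of stay.
groups = defaultdict(list)
for s in students:
    if s["international"]:
        groups[s["stay_years"]].append(s)

# For each stay length: student count and the mean of each score.
summary = {
    stay: {
        "count": len(members),
        "avg_phq": round(mean(m["phq"] for m in members), 2),
        "avg_scs": round(mean(m["scs"] for m in members), 2),
        "avg_as": round(mean(m["as"] for m in members), 2),
    }
    for stay, members in sorted(groups.items())
}

for stay, stats in summary.items():
    print(stay, stats)
```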

View Project on GitHub: Portfolio-3

Diamond Price Predictor: The 4C Model

As part of the SuperDataScience Community Project, I contributed to building a machine learning model that predicts the price of diamonds based on key characteristics—carat, cut, color, clarity—as well as additional features like depth and dimensions.

This end-to-end project involved:

Data preprocessing and cleaning for model readiness
Feature engineering to optimize model inputs
Model training and evaluation using regression techniques
Deployment through a Streamlit web application for real-time prediction

Working collaboratively with other data professionals, I helped turn raw data into a functional pricing tool, bridging data science with real-world application in the gemstone industry.
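One possible feature engineering step for the 4C inputs is ordinal encoding, which maps each quality grade to its rank so a regression model can use it as a number. This is a sketch only: the grade labels below assume the grading scales used in the widely shared diamonds dataset, and the function name encode_diamond is hypothetical, not the project's code.

```python
# Grades ordered from lowest to highest quality (assumed label sets).
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def encode_diamond(diamond: dict) -> list:
    """Turn one diamond record into a numeric feature vector.

    Carat is kept as-is; cut, color, and clarity become their rank
    in the ordered grade lists above (0 = lowest grade).
    """
    return [
        diamond["carat"],
        CUT_ORDER.index(diamond["cut"]),
        COLOR_ORDER.index(diamond["color"]),
        CLARITY_ORDER.index(diamond["clarity"]),
    ]


# Invented example record.
sample = {"carat": 0.7, "cut": "Ideal", "color": "D", "clarity": "IF"}
features = encode_diamond(sample)
print(features)
```

Ordinal encoding keeps the natural ordering of the grades, which one-hot encoding would discard. An unknown grade label raises a ValueError rather than being silently mapped.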

View Project on GitHub: Portfolio-4

SuperStore Sales Analysis and Forecasting with Python

In this project, I applied data science techniques to analyze and forecast sales performance for the SuperStore retail dataset. Using Python, I explored sales trends, customer behavior, regional performance, and profitability drivers.

I also built predictive models to forecast future sales using regression techniques. The insights generated support business decision-making around inventory, marketing, and resource allocation—demonstrating the power of data-driven strategy in retail operations.
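As a rough illustration of regression-based forecasting, the sketch below fits a straight-line trend to monthly sales with ordinary least squares and extrapolates one month ahead. The sales figures are invented and the helper name fit_line is hypothetical. The project's actual models and features may differ.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented monthly sales totals; the month index (0, 1, 2, ...) is the only feature.
monthly_sales = [100.0, 112.0, 118.0, 130.0]
months = list(range(len(monthly_sales)))

slope, intercept = fit_line(months, monthly_sales)

# Forecast for the next, unseen month.
next_month = len(monthly_sales)
forecast = intercept + slope * next_month
print(f"trend: {slope:.1f} per month, forecast: {forecast:.1f}")
```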

View Project on GitHub: Portfolio-5

Stock Market Analysis of Tech Giants Using Python

This project focuses on analyzing stock performance of leading tech companies—Apple (AAPL), Microsoft (MSFT), Google (GOOGL), and Amazon (AMZN).

Using Python, I processed historical stock data to uncover market trends, calculate returns, and evaluate investment performance over time. The goal was to understand stock volatility and correlation, enabling data-driven portfolio strategy decisions.

The project showcases data manipulation, visualization, and financial metric computation using libraries such as pandas, matplotlib, and yfinance.
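The return, volatility, and correlation metrics mentioned above can be sketched in plain Python. The closing prices here are invented rather than downloaded, and annualizing volatility with 252 trading days is a common convention that I am assuming, not something stated by the project.

```python
from math import sqrt
from statistics import mean, stdev


def daily_returns(prices):
    """Simple day-over-day percentage returns."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented closing prices for illustration only.
closes = {
    "AAPL": [100.0, 102.0, 101.0, 105.0],
    "MSFT": [200.0, 198.0, 203.0, 207.0],
}

returns = {ticker: daily_returns(p) for ticker, p in closes.items()}

# Cumulative return over the whole period.
cumulative = {t: p[-1] / p[0] - 1 for t, p in closes.items()}

# Volatility as the sample standard deviation of daily returns, annualized.
volatility = {t: stdev(r) * sqrt(252) for t, r in returns.items()}

corr = pearson(returns["AAPL"], returns["MSFT"])
print(cumulative, volatility, round(corr, 3))
```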

View Project on GitHub: Portfolio-6

YouTube Comment Sentiment Analysis

This project explores public sentiment on immigration in the UK by analyzing YouTube comments from politically charged videos. Using the YouTube Data API, comments are extracted and processed through an automated ETL pipeline powered by Apache Airflow.

A sentiment analysis model built with Hugging Face Transformers classifies each comment as positive, negative, or neutral. The final model is deployed on Streamlit or Hugging Face Spaces, enabling real-time analysis through an interactive web interface.

View Project on GitHub: Portfolio-7

Financial Data Analytics & Forecasting

This project applies data science and business intelligence techniques to a large-scale financial transactions dataset sourced from Kaggle. With over 1 million records, it covers customer behavior, card usage, and financial activity throughout the 2010s. The project demonstrates a full-stack approach—combining SQL, Python, Machine Learning, and Power BI—to deliver insights across four key areas:

Financial Forecasting: Predicting revenue and expenses using models like ARIMA, XGBoost, and LSTM.
Variance Analysis: Investigating gaps between budgeted and actual performance across spending categories.
Customer Segmentation: Identifying customer groups based on transaction patterns using clustering techniques.
Interactive Reporting: Designing Power BI dashboards for real-time financial monitoring and insights.

This end-to-end pipeline showcases a data-driven approach to financial planning and decision-making.
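The variance analysis step compares budgeted and actual spending per category. A minimal sketch, assuming made-up category names and amounts and a hypothetical helper called variance_report:

```python
def variance_report(budget, actual):
    """Return absolute and percentage variance per spending category.

    Positive variance means spending exceeded the budget.
    """
    report = {}
    for category, planned in budget.items():
        spent = actual.get(category, 0.0)
        diff = spent - planned
        if diff > 0:
            status = "over"
        elif diff < 0:
            status = "under"
        else:
            status = "on budget"
        report[category] = {
            "variance": diff,
            "variance_pct": round(100 * diff / planned, 1),
            "status": status,
        }
    return report


# Invented figures for illustration.
budget = {"groceries": 500.0, "travel": 300.0, "dining": 200.0}
actual = {"groceries": 540.0, "travel": 240.0, "dining": 200.0}

report = variance_report(budget, actual)
for category, stats in report.items():
    print(category, stats)
```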

View Project on GitHub: Portfolio-8