Database design, queries, and reporting pipelines for finance and business insights.
🗓️ SQL Journey — From Fundamentals to Advanced Data Mastery
An end-to-end learning portfolio documenting my growth in SQL and MySQL, designed to strengthen my expertise in data analysis, reporting, and database management.
💡 What makes this journey different?
I integrate SQL with Python — using mysql.connector and pandas — to transform query results into interactive data frames, visualizations, and machine learning-ready datasets, bridging the gap between database skills and applied analytics.
Featured Content:
- 🔍 Querying & Filtering — Core SQL skills: SELECT, WHERE, DISTINCT, ORDER BY
- 🌐 Table Joins & Relationships — INNER, LEFT, RIGHT, and SELF JOIN
- 📊 Aggregations & Grouping — GROUP BY, HAVING, and summarization for insights
- 🧩 Subqueries & Nested Queries — Using subqueries and derived tables for complex logic
- 🛠️ Set Operations — UNION, INTERSECT, EXCEPT for combining datasets
- 🗄️ Database Management — CREATE DATABASE, CREATE TABLE, schema design
- 🔐 Constraints & Integrity — Enforcing reliability with PRIMARY KEY & FOREIGN KEY
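As a rough illustration of how several of these topics fit together, the sketch below builds two related tables, then joins, groups, and filters them. It is an assumption-laden example rather than code from the portfolio: an in-memory SQLite database stands in for MySQL, and the `customers` and `orders` tables with their columns are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 40.0);
""")

# INNER JOIN + GROUP BY + HAVING: customers whose orders total more than 100.
query = """
    SELECT c.name, COUNT(o.order_id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    HAVING SUM(o.amount) > 100
    ORDER BY total DESC
"""
big_spenders = conn.execute(query).fetchall()

# The foreign key rejects an order that points at a customer that does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

print(big_spenders, fk_enforced)
```

The same SQL should carry over to MySQL with minor type changes (for example `INT` and `DECIMAL`), though that has not been verified against the portfolio's own schemas.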
Key Insights:
- Filtering & Joins unlock deeper insights across multiple tables.
 - Aggregations & Grouping make data summaries and trend analysis possible.
 - Subqueries simplify complex analysis and help modularize logic.
 - Python integration enhances SQL workflows with interactivity and visualization.
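The hand-off from SQL to Python can be sketched as follows. This is a minimal example under stated assumptions: it uses the standard-library `sqlite3` module instead of `mysql.connector`, plain dictionaries instead of a pandas DataFrame, and a hypothetical `sales` table. With pandas installed, the same query could be passed to `pandas.read_sql_query` to get a DataFrame directly.

```python
import sqlite3
from statistics import mean

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.executescript("""
    CREATE TABLE sales (region TEXT, month TEXT, revenue REAL);
    INSERT INTO sales VALUES
        ('North', '2024-01', 100.0), ('North', '2024-02', 140.0),
        ('South', '2024-01', 90.0),  ('South', '2024-02', 70.0);
""")

# Filter in SQL with a bound parameter, then continue the analysis in Python.
rows = conn.execute(
    "SELECT region, month, revenue FROM sales "
    "WHERE revenue >= ? ORDER BY region, month",
    (80.0,),
).fetchall()
records = [dict(r) for r in rows]  # list of {'region': ..., 'month': ..., 'revenue': ...}

# Group the filtered records by region and average the revenue.
by_region = {}
for rec in records:
    by_region.setdefault(rec["region"], []).append(rec["revenue"])
avg_revenue = {region: mean(values) for region, values in by_region.items()}

print(avg_revenue)
```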
 
Outcome:
Building a comprehensive SQL portfolio that demonstrates:
- Real-world query solving for analytics
 - Database design and management skills
 - SQL + Python pipelines for end-to-end analysis
 - Applied learning documented on Notion and GitHub
 
🛠️ Tools & Tech:
- 📊 SQL (MySQL · PostgreSQL · SQL Server)
- 🐍 Python (pandas, mysql.connector, numpy)
- 📈 matplotlib · seaborn for visualization
 - ⚙️ scikit-learn · XGBoost for advanced pipelines
 
🌐 Documentation:
This journey is extended on my Notion Page, where I share:
- 📘 In-depth SQL explanations
 - 📝 Personal reflections & milestones
 - 📚 Additional resources for deeper learning
 
Explore the full project on GitHub
🗄️ MySQL Database Engineering Portfolio: Queries, Analytics & Real-World Projects
A comprehensive repository documenting my structured progression through MySQL databases — from fundamentals to advanced analytics and optimisation.
💡 What makes this project different?
It isn't just SQL snippets — it's a complete learning archive, combining structured curricula, advanced tutorials, hands-on projects, and integration with Python for real-world financial and business insights.
What's Inside
- Core SQL Foundations — Database creation, SELECT queries, filtering & aggregations
 - Joins & Subqueries — Inner/Outer joins, correlated queries, EXISTS patterns
 - Window Functions & Analytics — Ranking, partitioning, and advanced calculations
 - Functions & Data Types — String, date, numeric, and control flow mastery
 - Data Analysis — Time-series, statistical analysis, EDA with real datasets
 - Performance Optimisation — Indexing, query tuning, execution planning
 - Python-SQL Integration — Analysis pipelines in pandas & Jupyter
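To make the window-function item concrete, here is a small sketch of ranking and partitioning. It assumes a hypothetical `sales` table and runs on in-memory SQLite (version 3.25 or later, which supports window functions); MySQL 8.0+ accepts the same `OVER (PARTITION BY ...)` syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (rep TEXT, region TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('Ana', 'East', 500), ('Ben', 'East', 300), ('Cy', 'East', 500),
        ('Dee', 'West', 200), ('Eli', 'West', 400);
""")

# RANK() restarts in each region; tied amounts share a rank and the next rank is skipped.
# SUM() OVER the same partition attaches the regional total to every row.
ranked = conn.execute("""
    SELECT region, rep, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, rnk, rep
""").fetchall()

for row in ranked:
    print(row)
```

Unlike `GROUP BY`, the window version keeps one output row per input row, which is why it suits "top N per group" and share-of-total questions.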
 
Key Highlights
- 📖 DataCamp Track — SQL foundations with relational databases, joins, and EDA
 - 🎓 Comprehensive MySQL Tutorial — In-depth coverage of functions, grouping, subqueries
 - 🛠️ Practical Database Ops — Consumer complaints & retail datasets queried for insights
- 📈 Cathy Tanimura Track — Advanced SQL for time-series and retail sales analysis
 - ⏰ 24-Hour SQL Intensive — Rapid mastery of query techniques with applied exercises
 - 📊 SQL + Python (Luke's Track) — Full Python integration, analysis workflows, and statistical modelling
 
Outcomes
This repository demonstrates:
- Ability to design and manage databases from scratch
 - Proficiency in advanced SQL techniques (CTEs, window functions, recursive queries)
 - Strength in business analytics use cases — sales performance, customer segmentation, inventory management
 - Capability to integrate SQL with Python for end-to-end data analysis
 - Consistency in best practices: optimisation, indexing, modular query design
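A recursive query of the kind listed above can be sketched like this. The `employees` table and its `reports_to` column are hypothetical (loosely modelled on a reporting hierarchy), and in-memory SQLite stands in for MySQL 8.0+, which uses the same `WITH RECURSIVE` form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, reports_to INTEGER);
    INSERT INTO employees VALUES
        (1, 'Dana', NULL), (2, 'Omar', 1), (3, 'Priya', 1), (4, 'Sam', 2);
""")

# Anchor member: the employee with no manager.
# Recursive member: everyone who reports to a row already in the result.
chain = conn.execute("""
    WITH RECURSIVE org(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE reports_to IS NULL
        UNION ALL
        SELECT e.id, e.name, org.depth + 1
        FROM employees AS e
        JOIN org ON e.reports_to = org.id
    )
    SELECT name, depth FROM org ORDER BY depth, name
""").fetchall()

print(chain)
```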
 
🛠️ Tools & Tech
- Database: MySQL 8.0+ (ClassicModels dataset as foundation)
 - Tools: MySQL Workbench · DbVisualizer · Jupyter Lab/Notebooks
 - Programming: Python 3.x (pandas, NumPy, matplotlib, seaborn, scikit-learn)
 - Version Control: GitHub repository with structured commits
 
📊 Real-World Applications
- Customer Analysis — segmentation, behaviour patterns, lifetime value
 - Sales Analytics — trend analysis, seasonal performance, forecasting
 - Inventory Management — stock optimisation, reorder points, demand planning
 - Financial Reporting — revenue, profit margins, liquidity analysis
 
Explore the Full Repository on GitHub →
🎓 International Student Wellbeing Analytics: Advanced SQL Portfolio
An enterprise-level SQL project analyzing how stay duration and demographics impact international student mental health outcomes, using advanced SQL queries, database design, and data warehousing techniques.
💡 What makes this project different?
I combined normalization, star schema design, and advanced SQL analytics (CTEs, window functions, OLAP) with ETL pipelines and BI dashboards to deliver actionable wellbeing insights for educational institutions and student services.
Featured Content
- 🏗️ Database Architecture — 3NF normalization, star schema with fact & dimension tables, SCD Type 2
- 🔍 Advanced SQL Queries — window functions (ROW_NUMBER, LAG, PERCENTILE_CONT), recursive CTEs, OLAP (ROLLUP, CUBE)
- ⚙️ ETL Pipelines — production-ready processes for transformation and loading
 - 📈 Analytics & Insights — variance analysis, risk scoring, cohort and longitudinal studies
 - 📊 Visualization — interactive wellbeing dashboards for institutional decision-making
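The SCD Type 2 idea mentioned under Database Architecture can be sketched as follows: when a tracked attribute changes, the current dimension row is closed and a new version is inserted, so history is preserved. Everything here is an assumption for illustration, including the `dim_student` table, the `stay_band` attribute, and the `apply_scd2` helper; in-memory SQLite stands in for the project's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_student (
        student_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        student_id  TEXT NOT NULL,                      -- natural key
        stay_band   TEXT NOT NULL,                      -- tracked attribute (hypothetical)
        valid_from  TEXT NOT NULL,
        valid_to    TEXT,                               -- NULL while the row is current
        is_current  INTEGER NOT NULL DEFAULT 1
    );
    INSERT INTO dim_student (student_id, stay_band, valid_from)
    VALUES ('S001', '1-2 years', '2023-09-01');
""")

def apply_scd2(conn, student_id, new_band, change_date):
    """Close the current row and open a new one if the tracked attribute changed."""
    current = conn.execute(
        "SELECT student_key, stay_band FROM dim_student "
        "WHERE student_id = ? AND is_current = 1",
        (student_id,),
    ).fetchone()
    if current is not None and current[1] == new_band:
        return  # nothing changed, keep history as is
    with conn:  # one transaction: expire the old version, insert the new one
        if current is not None:
            conn.execute(
                "UPDATE dim_student SET valid_to = ?, is_current = 0 "
                "WHERE student_key = ?",
                (change_date, current[0]),
            )
        conn.execute(
            "INSERT INTO dim_student (student_id, stay_band, valid_from) "
            "VALUES (?, ?, ?)",
            (student_id, new_band, change_date),
        )

apply_scd2(conn, "S001", "3-4 years", "2025-09-01")
history = conn.execute(
    "SELECT stay_band, valid_from, valid_to, is_current FROM dim_student "
    "WHERE student_id = 'S001' ORDER BY student_key"
).fetchall()
print(history)
```

Fact rows would then reference `student_key` rather than `student_id`, so each fact stays tied to the version of the student record that was valid at the time.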
 
Key Insights
- Short-stay students (1–2 years): 34% higher anxiety, 28% lower connectedness.
 - Cultural adaptation improves mental health outcomes over longer stay durations.
 - First 6–12 months identified as the highest-risk period requiring focused interventions.
 - SQL-driven predictive models can flag students at risk early, enabling proactive support.
 
Outcome
Delivered actionable insights and advanced SQL implementations that enable:
- Robust data warehouse design for wellbeing analytics
 - Early warning systems for student support services
 - Longitudinal and cohort-based mental health analysis
 - Integration of SQL + Python workflows for visualization and predictive modeling
 
🛠️ Tools & Tech
- 📊 SQL (MySQL · PostgreSQL)
 - 🐍 Python (pandas, matplotlib, seaborn)
 - ⚙️ ETL Pipelines · OLAP · Recursive CTEs · Window Functions
 - 📈 Jupyter Notebooks for interactive analysis & reporting
 
Explore the full project on GitHub
🌐 YouTube Comment Sentiment Analysis: Python & SQL
An end-to-end data engineering and NLP project analyzing YouTube comments from videos on immigration and UK politics, using APIs, SQL workflows, and sentiment classification models.
💡 What makes this project different?
I combined ETL pipelines (Airflow), SQL storage, and sentiment analysis models (TextBlob, VADER, Hugging Face) to transform unstructured YouTube comments into real-time insights and dashboards for trend analysis.
Featured Content
- 📥 Data Collection — fetching YouTube comments via API & storing in PostgreSQL
- 🧹 Data Cleaning — preprocessing text (stopwords, stemming, lemmatization)
- 🤖 Sentiment Analysis — TextBlob & VADER models, plus Hugging Face transformers
- 📊 Insights & Visualization — sentiment distributions, trend dashboards, cohort analysis
- ⚙️ ETL Pipelines — Airflow DAGs for automated extraction, transformation, and loading
- 🚀 Deployment — Streamlit app / Hugging Face Spaces for real-time sentiment checks

Key Insights
- Audience reactions to UK immigration debates are highly polarized, with strong negative sentiment clustering around economic concerns.
- TextBlob vs. VADER: VADER performs better for short, opinion-heavy comments, while TextBlob captures longer narrative sentiment.
- Cohort analysis reveals sentiment shifts across news outlets (The Young Turks vs. Al Jazeera) and video timelines.
- Real-time pipelines enable continuous monitoring of public opinion, supporting research and policy insights.

Outcome
Delivered a production-ready workflow that enables:
- Automated collection & classification of YouTube comments
- Comparative evaluation of multiple sentiment models
- Real-time dashboards for audience trend analysis
- Scalable ETL pipelines deployable in research or media monitoring contexts

🛠️ Tools & Tech
- 📊 SQL (PostgreSQL) · Apache Airflow
- 🐍 Python (pandas · numpy · matplotlib · seaborn)
- 🤖 NLP (NLTK · spaCy · TextBlob · VADER · Hugging Face)
- ☁️ YouTube Data API · Streamlit / Hugging Face Spaces
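One way to compare two sentiment models such as VADER and TextBlob is to map each model's score to a label and measure how often the labels agree. The sketch below is illustrative only: the score pairs are made up, the `label` helper is hypothetical, and applying VADER's commonly used ±0.05 compound-score cut-offs to TextBlob polarity as well is an assumption, not necessarily the project's method.

```python
def label(score, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

# Made-up (vader_compound, textblob_polarity) pairs, for illustration only.
scores = [(0.62, 0.40), (-0.48, -0.10), (0.02, 0.30), (-0.71, -0.55)]

vader_labels = [label(v) for v, _ in scores]
textblob_labels = [label(t) for _, t in scores]

# Share of comments on which the two models assign the same label.
agreement = sum(a == b for a, b in zip(vader_labels, textblob_labels)) / len(scores)

print(vader_labels, textblob_labels, agreement)
```

In a real run the scores would come from the libraries themselves, and disagreements (like the third pair here) are the comments worth reading by hand.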
Explore the full project on GitHub
📂 Financial Data Analytics: Forecasting, Variance Analysis & Customer Insights
An end-to-end financial analytics project leveraging 1M+ transactions across 21 features, combining SQL, Python, and Power BI to deliver forecasts, variance analysis, and customer segmentation.
💡 What makes this project different?
I integrate SQL for data extraction & transformation, Python for forecasting & clustering, and Power BI for interactive dashboards — creating a full-stack financial analysis pipeline that turns raw transaction data into decision-ready insights.
Featured Content
- 📈 Forecasting — revenue & expense prediction with ARIMA, XGBoost, and LSTMs
 - 📊 Variance Analysis — budget vs. actual performance with variance drivers across customers & merchants
 - 👥 Customer Insights — segmentation using K-Means & DBSCAN (frequency, spend, categories)
 - 📉 Seasonality & Trends — detecting cyclical spending and financial deviations
 - 📊 Business Intelligence — Power BI dashboards for real-time revenue & variance tracking
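Budget-versus-actual variance analysis reduces to a join and an aggregate: sum the actual spend per category, subtract the budget, and express the gap as a percentage of budget. The sketch below assumes hypothetical `budget` and `transactions` tables with made-up figures, and uses in-memory SQLite in place of the project's databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE budget (category TEXT PRIMARY KEY, budget REAL NOT NULL);
    CREATE TABLE transactions (category TEXT NOT NULL, amount REAL NOT NULL);
    INSERT INTO budget VALUES ('Travel', 1000.0), ('Software', 500.0);
    INSERT INTO transactions VALUES
        ('Travel', 700.0), ('Travel', 500.0), ('Software', 450.0);
""")

# LEFT JOIN keeps budgeted categories with no spend; COALESCE turns their NULL sum into 0.
variance = conn.execute("""
    SELECT b.category,
           b.budget,
           COALESCE(SUM(t.amount), 0) AS actual,
           COALESCE(SUM(t.amount), 0) - b.budget AS variance,
           ROUND(100.0 * (COALESCE(SUM(t.amount), 0) - b.budget) / b.budget, 1)
               AS variance_pct
    FROM budget AS b
    LEFT JOIN transactions AS t ON t.category = b.category
    GROUP BY b.category, b.budget
    ORDER BY variance DESC
""").fetchall()

for row in variance:
    print(row)
```

A positive variance here means overspend against budget; the sign convention is a choice and would need to match whatever the dashboards use.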
 
Key Insights
- Revenue forecasts highlight seasonal peaks with predictive accuracy measured via RMSE, MAPE, and R².
 - Variance analysis uncovers spending categories with largest budget deviations.
 - Customer segmentation shows high-value customers (top 20%) driving ~60% of revenue.
 - Seasonal variance trends reveal critical months for financial risk monitoring.
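For reference, the three accuracy metrics named above can be computed as in this sketch. The `forecast_metrics` helper and the sample numbers are hypothetical, and the MAPE shown assumes no actual value is zero.

```python
import math

def forecast_metrics(actual, predicted):
    """Return (RMSE, MAPE in percent, R squared) for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mape, r2

# Made-up monthly revenue figures, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
rmse, mape, r2 = forecast_metrics(actual, predicted)

print(round(rmse, 3), round(mape, 3), round(r2, 3))
```

RMSE is in the units of the series, MAPE is scale-free, and R² compares the model with a constant mean forecast, so reporting all three gives complementary views of the same errors.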
 
Outcome
Delivered data-driven insights that empower financial decision-making by:
- Improving forecast accuracy for revenue & expenses
 - Identifying variance drivers behind budget gaps
 - Highlighting customer profiles & spending behaviors
 - Providing real-time dashboards for executives and finance teams
 
🛠️ Tools & Tech
- 📊 SQL (MySQL · PostgreSQL · SQL Server)
 - 🐍 Python (pandas · numpy · scikit-learn · XGBoost)
 - 📈 Power BI · matplotlib · seaborn
 - ⚙️ Time-series modeling · Clustering · Variance reporting
 
Explore the full project on GitHub