
Data Science & Analytics (2026)

Roadmap from analytics foundations to AI integration—Git, SQL, Python (Pandas/Polars), EDA, ML with proper evaluation, pipelines (Airflow/Dagster), and GenAI. Includes milestone projects.

Python · SQL · Last updated 3 Feb 2026

Level 1: Data Foundations & Analytics

Focus on the most in-demand industry skills: version control, pulling data, exploratory analysis, and building reports. EDA is introduced early so you don’t jump into ML without insight.

Session 1: The Data Mindset & Environment Setup

  • Set up Python (Anaconda/VS Code) and intro to Jupyter Notebook.
  • Git basics: version control for scripts and notebooks (commit, branch, push); why it matters for reproducible work.
  • Prompt engineering intro: GenAI mindset—how to ask models for code, explanations, and checks; sets the stage for later AI-assisted analysis.
  • Why data science in 2026 is different (what AI automates vs. what humans still own).

Session 2: SQL Masterclass (Part 1)

  • Basic queries: SELECT, FROM, WHERE, ORDER BY.
  • Filtering & aggregation: GROUP BY, HAVING, COUNT/SUM/AVG.
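To make these clauses concrete, here is a minimal sketch using Python's built-in `sqlite3` module with a hypothetical `sales` table (the table and its rows are invented for practice; any SQL engine works the same way):

```python
import sqlite3

# In-memory SQLite database with a hypothetical `sales` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("West", "A", 100.0), ("West", "B", 250.0),
     ("East", "A", 80.0), ("East", "B", 120.0), ("East", "A", 60.0)],
)

# SELECT / FROM / WHERE / ORDER BY: rows over 90, biggest first.
rows = conn.execute(
    "SELECT region, product, amount FROM sales "
    "WHERE amount > 90 ORDER BY amount DESC"
).fetchall()
print(rows)  # [('West', 'B', 250.0), ('East', 'B', 120.0), ('West', 'A', 100.0)]

# GROUP BY / HAVING with aggregates: total per region, keep totals >= 300.
totals = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) >= 300"
).fetchall()
print(totals)  # [('West', 2, 350.0)]
```

Note that `WHERE` filters rows before grouping while `HAVING` filters the groups themselves, which is why the aggregate goes in `HAVING`.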

Session 3: SQL Masterclass (Part 2)

  • Relational databases: JOIN (Inner, Left, Right).
  • Advanced SQL: Common Table Expressions (CTE) and subqueries.
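A small sketch of both ideas together, again on an invented two-table schema (`customers`, `orders`) in in-memory SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ana"), (2, "Ben"), (3, "Cara")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50.0), (1, 70.0), (2, 30.0)])

# LEFT JOIN keeps customers with no orders (Cara appears with NULL).
joined = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.amount"
).fetchall()
print(joined)  # [('Ana', 50.0), ('Ana', 70.0), ('Ben', 30.0), ('Cara', None)]

# CTE: name an intermediate result, then query it like a table.
big_spenders = conn.execute(
    "WITH spend AS ("
    "  SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
    ") "
    "SELECT c.name, s.total FROM spend s "
    "JOIN customers c ON c.id = s.customer_id WHERE s.total > 40"
).fetchall()
print(big_spenders)  # [('Ana', 120.0)]
```

An inner `JOIN` would silently drop Cara; choosing the join type is choosing which missing rows you care about.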

Session 4: Python for Data (Pandas & Polars)

  • Reading various file types (CSV, Excel, JSON).
  • Table manipulation: filtering, sorting, and creating new columns.
  • Polars vs Pandas: when to use which; quick benchmark (speed, memory) so you can choose the right tool for scale.
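The core table operations above look like this in Pandas (the data is a made-up transaction table; in the session you would load it with `pd.read_csv`, `pd.read_excel`, or `pd.read_json` instead):

```python
import pandas as pd

# Hypothetical transaction table standing in for a loaded file.
df = pd.DataFrame({
    "city": ["Jakarta", "Bandung", "Jakarta", "Surabaya"],
    "price": [120, 80, 200, 150],
    "qty": [2, 5, 1, 3],
})

# New column, filtering, sorting — the bread and butter of table work.
df["revenue"] = df["price"] * df["qty"]
busy = df[df["qty"] >= 2].sort_values("revenue", ascending=False)
print(busy[["city", "revenue"]].to_string(index=False))
```

Polars expresses the same pipeline with `filter`, `with_columns`, and `sort`; the concepts transfer directly, which is why the session compares them side by side.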

Session 5: Data Cleaning & Wrangling

  • Handling missing values and duplicate data.
  • Data transformation: date, category, and string formatting.
  • EDA preview: spot correlation and outliers during cleaning so you’re not blind when you hit ML—simple checks (e.g. value ranges, duplicates by key) and when to dig deeper.
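A sketch of those cleaning steps plus the in-flight sanity checks, on an invented messy extract (missing values, a duplicate key, inconsistent strings):

```python
import pandas as pd

# Hypothetical messy extract: nulls, a duplicated order_id, mixed-case strings.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "city": ["jakarta", "Bandung ", "Bandung ", None, "SURABAYA"],
    "amount": [100.0, None, None, 250.0, 90.0],
})

df = df.drop_duplicates(subset="order_id")                 # duplicates by key
df["city"] = df["city"].fillna("unknown").str.strip().str.title()
df["amount"] = df["amount"].fillna(df["amount"].median())  # simple imputation
print(df)

# EDA-style checks done *during* cleaning, not after:
assert df["order_id"].is_unique          # key really is a key
assert (df["amount"] > 0).all()          # values in a sane range
```

Median imputation is a placeholder choice here; the point is to decide deliberately rather than let nulls flow into a model.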

Session 6: Data Storytelling, Visualization & EDA

  • EDA principles: distribution, outliers, and “what does this variable tell us?” before you build anything.
  • Visual design principles: choosing the right chart (Bar, Line, Scatter).
  • Business context & domain: why this data exists, what decisions it supports—don’t ignore the “why” behind the numbers.
  • Tools: intro to Tableau or Power BI to build your first dashboard.
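Before the drag-and-drop tools, it helps to see the chart-choice rule in code. A minimal sketch with matplotlib (the numbers are invented; Tableau/Power BI apply the same bar-vs-line logic):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Made-up figures for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [120, 135, 128, 160]
by_region = {"West": 350, "East": 260}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(list(by_region), list(by_region.values()))  # categories -> bar chart
ax1.set_title("Revenue by region")
ax2.plot(months, revenue, marker="o")               # trend over time -> line chart
ax2.set_title("Monthly revenue")
fig.savefig("eda_charts.png")
```

Bar for comparing categories, line for trends over time, scatter for relationships between two numeric variables: the medium changes, the rule does not.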

Level 2: Machine Learning & Statistical Thinking

Focus on prediction logic and rigorous model evaluation so you can defend your work in practice.

Session 7: Statistics for Practical Analysts

  • Data distribution, outliers, and correlation (why A relates to B).
  • Ties back to EDA from Level 1; formalize the intuition you built.
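A small sketch of those three checks on simulated data (the income/spending relationship and the planted outlier are invented for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated data: income loosely drives spending, plus one planted outlier.
income = rng.normal(5000, 800, 200)
spend = 0.4 * income + rng.normal(0, 200, 200)
df = pd.DataFrame({"income": income, "spend": spend})
df.loc[0, "spend"] = 50_000  # the outlier

# Correlation: "does A move with B?" (Pearson, between -1 and 1).
print(df.corr().round(2))

# IQR rule: flag outliers before trusting that correlation number.
q1, q3 = df["spend"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["spend"] < q1 - 1.5 * iqr) | (df["spend"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} outlier row(s) flagged")
```

One extreme row can drag a correlation substantially, which is exactly why the distribution and outlier checks come before interpreting "A relates to B."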

Session 8: Supervised Learning (Regression)

  • Predicting numbers: e.g. rental price or monthly income.
  • Model evaluation: metrics (RMSE, MAE, R²), train/validation split, overfitting checks.
  • Simple hyperparameter tuning: e.g. grid search on one or two knobs so you don’t ship default-only models.
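All three bullets fit in one short scikit-learn sketch: synthetic "rental price" data (invented coefficients), a held-out split, proper metrics, and a one-knob grid search over Ridge's regularization strength:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(42)
# Synthetic data: price depends on two features plus noise.
X = rng.uniform(20, 120, size=(300, 2))
y = 15 * X[:, 0] + 120 * X[:, 1] + rng.normal(0, 150, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One knob (alpha), tuned with a small grid search + 5-fold CV on train only.
model = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5)
model.fit(X_train, y_train)

pred = model.predict(X_test)
rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))
print(f"RMSE={rmse:.1f}  MAE={mean_absolute_error(y_test, pred):.1f}  "
      f"R^2={r2_score(y_test, pred):.3f}  best alpha={model.best_params_['alpha']}")
```

The test set stays untouched until the very end; tuning happens inside cross-validation on the training data, which is the overfitting discipline the session drills.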

Session 9: Supervised Learning (Classification)

  • Predicting categories: e.g. spam classification or fraud detection.
  • Metrics: accuracy, precision, recall, F1; when to optimize for which (e.g. recall in fraud).
  • Cross-validation and overfitting checks; simple hyperparameter tuning where it matters.
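A sketch of the metric trade-off on synthetic imbalanced data (about 10% positives, mimicking a fraud-like class balance; the dataset is generated, not real):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic imbalanced "fraud-like" data: ~10% positive class.
X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

# On imbalanced data, accuracy alone flatters the model — check all three.
print(f"precision={precision_score(y_test, pred):.2f} "
      f"recall={recall_score(y_test, pred):.2f} "
      f"f1={f1_score(y_test, pred):.2f}")

# 5-fold cross-validation as an overfitting sanity check.
scores = cross_val_score(clf, X_train, y_train, cv=5, scoring="f1")
print(f"CV f1: mean={scores.mean():.2f} ± {scores.std():.2f}")
```

With 90% negatives, a model that predicts "not fraud" for everything scores 90% accuracy while catching zero fraud; recall is what exposes that failure.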

Session 10: Unsupervised Learning (Clustering)

  • Segmentation: grouping customers by behavior.
  • Evaluation: silhouette, inertia; how to sanity-check clusters and avoid over-interpreting.
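A sketch of that sanity-checking loop on synthetic segments (three generated blobs standing in for customer behavior data):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three synthetic, well-separated "customer segments".
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# Compare inertia and silhouette across candidate k values instead of
# trusting a single run.
for k in (2, 3, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}: inertia={km.inertia_:.0f} "
          f"silhouette={silhouette_score(X, km.labels_):.2f}")
```

Inertia always falls as k grows, so it can't pick k on its own; silhouette peaking at a value of k is the stronger signal, and even then the clusters deserve a business-meaning check before anyone names them "segments."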

Level 3: The "Engineer" Edge (2026 Special)

The part that gives you an edge in 2026: pipelines, cloud, AI, and deployment.

Session 11: Automated Data Pipeline (The Mini ETL)

  • Build Python scripts to pull data from APIs or databases on a schedule.
  • Scheduling: Airflow or Dagster intro—DAGs, tasks, and running pipelines reliably.
  • Cloud basics: GCP or AWS free tier—run a small pipeline in the cloud so you understand scalability beyond your laptop.
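The extract-transform-load shape is the core idea; here is a minimal sketch with three plain functions and an in-memory SQLite sink (the "API response" is a stubbed list, standing in for a real `requests` call; in Airflow or Dagster each function would become a task in a DAG and the scheduler would run the chain daily):

```python
import sqlite3

def extract():
    # Stand-in for an API call, e.g. requests.get(url).json().
    return [{"day": "2026-02-01", "amount": "120.5"},
            {"day": "2026-02-01", "amount": "80.0"},
            {"day": "2026-02-02", "amount": "200.0"}]

def transform(rows):
    # Type coercion plus a simple daily aggregate.
    totals = {}
    for r in rows:
        totals[r["day"]] = totals.get(r["day"], 0.0) + float(r["amount"])
    return sorted(totals.items())

def load(daily, conn):
    # Idempotent load: re-running the pipeline overwrites, not duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales (day TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT OR REPLACE INTO daily_sales VALUES (?, ?)", daily)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM daily_sales ORDER BY day").fetchall())
# [('2026-02-01', 200.5), ('2026-02-02', 200.0)]
```

Keeping each step a separate, idempotent function is exactly what makes the later move to an orchestrator (and to the cloud) painless.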

Session 12: AI Agents for Data Analysis

  • Integrate Gemini/OpenAI APIs for automated text analysis (sentiment, summarization).
  • Prompt patterns for data tasks (generating code, explaining results).
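A prompt is just a carefully built string, so prompt patterns can live in plain functions. A sketch of two such patterns (the actual Gemini/OpenAI call is a single API request, omitted here so the example runs without keys or network):

```python
def sentiment_prompt(reviews):
    # Pattern: numbered inputs + explicit output schema -> parseable answers.
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))
    return (
        "Classify each review as positive, neutral, or negative.\n"
        'Answer as JSON: [{"id": <number>, "label": <label>}].\n\n'
        f"Reviews:\n{numbered}"
    )

def explain_code_prompt(code):
    # Pattern: fenced code + a concrete ask keeps the model on task.
    return (
        "Explain what this analysis code does, step by step, "
        "and point out one possible bug:\n\n```python\n" + code + "\n```"
    )

prompt = sentiment_prompt(["Great product!", "Arrived late."])
print(prompt)
```

Asking for a fixed JSON schema is the key habit: it turns free-form model output into something your Pandas code can load directly.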

Session 13: Model Deployment (From Local to API)

  • Wrap your ML model as a simple API with FastAPI so others can use it.
  • Versioning and basic monitoring (e.g. logging inputs/outputs).
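To show the shape of a prediction API without extra dependencies, here is a stdlib-only sketch using `http.server`; in the session you would use FastAPI, which replaces the handler boilerplate with a decorated function. The `predict` function and `MODEL_VERSION` tag are hypothetical stand-ins for a trained, pickled model:

```python
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO)
MODEL_VERSION = "v1"  # hypothetical version tag shipped with every response

def predict(size_m2: float) -> float:
    # Stand-in for a trained model; imagine loading a pickled pipeline here.
    return 15.0 * size_m2 + 120.0

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        result = {"model": MODEL_VERSION, "prediction": predict(body["size_m2"])}
        logging.info("input=%s output=%s", body, result)  # basic monitoring
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To serve for real: HTTPServer(("", 8000), PredictHandler).serve_forever()
```

Logging every input/output pair with the model version is the minimum viable monitoring: it lets you replay traffic against a new model before swapping versions.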

Session 14: Final Project Review & Portfolio Building

  • Milestone projects: 2–3 end-to-end examples (e.g. fraud detection from data → EDA → model → evaluation → simple API or report) so you can show “I can ship it.”
  • Tips to showcase your work on GitHub (repos, README, Git history) and LinkedIn so recruiters notice.
  • How to talk about domain and “why” in interviews.

Interested in this course? I offer mentoring and structured learning—get in touch to discuss your goals.