
Data Science & Analytics (2026)

Roadmap from analytics foundations to AI integration—Git, SQL, Python (Pandas/Polars), EDA, ML with proper evaluation, pipelines (Airflow/Dagster), and GenAI. Includes milestone projects.

Python · SQL · Last updated 3 Feb 2026

Level 1: Data Foundations & Analytics

Focus on the most in-demand industry skills: version control, pulling data, exploratory analysis, and building reports. EDA is introduced early so you don’t jump into ML without insight.

Session 1: The Data Mindset & Environment Setup

  • Set up Python (Anaconda/VS Code) and intro to Jupyter Notebook.
  • Git basics: version control for scripts and notebooks (commit, branch, push); why it matters for reproducible work.
  • Prompt engineering intro: GenAI mindset—how to ask models for code, explanations, and checks; sets the stage for later AI-assisted analysis.
  • Why data science in 2026 is different (what AI automates vs. what humans still own).

Session 2: SQL Masterclass (Part 1)

  • Basic queries: SELECT, FROM, WHERE, ORDER BY.
  • Filtering & aggregation: GROUP BY, HAVING, COUNT/SUM/AVG.
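To make these clauses concrete, here is a minimal sketch using Python's built-in `sqlite3` module with a hypothetical `sales` table (the table and its rows are invented for practice; any SQL engine works the same way):

```python
import sqlite3

# In-memory SQLite database with a hypothetical `sales` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("West", "A", 100.0), ("West", "B", 250.0),
     ("East", "A", 80.0), ("East", "B", 120.0), ("East", "A", 60.0)],
)

# SELECT / FROM / WHERE / ORDER BY: rows over 90, biggest first.
rows = conn.execute(
    "SELECT region, product, amount FROM sales "
    "WHERE amount > 90 ORDER BY amount DESC"
).fetchall()
print(rows)  # [('West', 'B', 250.0), ('East', 'B', 120.0), ('West', 'A', 100.0)]

# GROUP BY / HAVING with aggregates: total per region, keep totals >= 300.
totals = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) >= 300"
).fetchall()
print(totals)  # [('West', 2, 350.0)]
```

Note that `WHERE` filters rows before grouping while `HAVING` filters the groups themselves, which is why the aggregate goes in `HAVING`.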

Session 3: SQL Masterclass (Part 2)

  • Relational databases: JOIN (Inner, Left, Right).
  • Advanced SQL: Common Table Expressions (CTE) and subqueries.
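A small sketch of both ideas together, again on an invented two-table schema (`customers`, `orders`) in in-memory SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ana"), (2, "Ben"), (3, "Cara")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50.0), (1, 70.0), (2, 30.0)])

# LEFT JOIN keeps customers with no orders (Cara appears with NULL).
joined = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.amount"
).fetchall()
print(joined)  # [('Ana', 50.0), ('Ana', 70.0), ('Ben', 30.0), ('Cara', None)]

# CTE: name an intermediate result, then query it like a table.
big_spenders = conn.execute(
    "WITH spend AS ("
    "  SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id"
    ") "
    "SELECT c.name, s.total FROM spend s "
    "JOIN customers c ON c.id = s.customer_id WHERE s.total > 40"
).fetchall()
print(big_spenders)  # [('Ana', 120.0)]
```

An inner `JOIN` would silently drop Cara; choosing the join type is choosing which missing rows you care about.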

Session 4: Python for Data (Pandas & Polars)

  • Reading various file types (CSV, Excel, JSON).
  • Table manipulation: filtering, sorting, and creating new columns.
  • Polars vs Pandas: when to use which; quick benchmark (speed, memory) so you can choose the right tool for scale.
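The core table operations above look like this in Pandas (the data is a made-up transaction table; in the session you would load it with `pd.read_csv`, `pd.read_excel`, or `pd.read_json` instead):

```python
import pandas as pd

# Hypothetical transaction table standing in for a loaded file.
df = pd.DataFrame({
    "city": ["Jakarta", "Bandung", "Jakarta", "Surabaya"],
    "price": [120, 80, 200, 150],
    "qty": [2, 5, 1, 3],
})

# New column, filtering, sorting — the bread and butter of table work.
df["revenue"] = df["price"] * df["qty"]
busy = df[df["qty"] >= 2].sort_values("revenue", ascending=False)
print(busy[["city", "revenue"]].to_string(index=False))
```

Polars expresses the same pipeline with `filter`, `with_columns`, and `sort`; the concepts transfer directly, which is why the session compares them side by side.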

Session 5: Data Cleaning & Wrangling

  • Handling missing values and duplicate data.
  • Data transformation: date, category, and string formatting.
  • EDA preview: spot correlation and outliers during cleaning so you’re not blind when you hit ML—simple checks (e.g. value ranges, duplicates by key) and when to dig deeper.
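A sketch of those cleaning steps plus the in-flight sanity checks, on an invented messy extract (missing values, a duplicate key, inconsistent strings):

```python
import pandas as pd

# Hypothetical messy extract: nulls, a duplicated order_id, mixed-case strings.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "city": ["jakarta", "Bandung ", "Bandung ", None, "SURABAYA"],
    "amount": [100.0, None, None, 250.0, 90.0],
})

df = df.drop_duplicates(subset="order_id")                 # duplicates by key
df["city"] = df["city"].fillna("unknown").str.strip().str.title()
df["amount"] = df["amount"].fillna(df["amount"].median())  # simple imputation
print(df)

# EDA-style checks done *during* cleaning, not after:
assert df["order_id"].is_unique          # key really is a key
assert (df["amount"] > 0).all()          # values in a sane range
```

Median imputation is a placeholder choice here; the point is to decide deliberately rather than let nulls flow into a model.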

Session 6: Data Storytelling, Visualization & EDA

  • EDA principles: distribution, outliers, and “what does this variable tell us?” before you build anything.
  • Visual design principles: choosing the right chart (Bar, Line, Scatter).
  • Business context & domain: why this data exists, what decisions it supports—don’t ignore the “why” behind the numbers.
  • Tools: intro to Tableau or Power BI to build your first dashboard.
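Before the drag-and-drop tools, it helps to see the chart-choice rule in code. A minimal sketch with matplotlib (the numbers are invented; Tableau/Power BI apply the same bar-vs-line logic):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Made-up figures for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
revenue = [120, 135, 128, 160]
by_region = {"West": 350, "East": 260}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(list(by_region), list(by_region.values()))  # categories -> bar chart
ax1.set_title("Revenue by region")
ax2.plot(months, revenue, marker="o")               # trend over time -> line chart
ax2.set_title("Monthly revenue")
fig.savefig("eda_charts.png")
```

Bar for comparing categories, line for trends over time, scatter for relationships between two numeric variables: the medium changes, the rule does not.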

Level 2: Machine Learning & Statistical Thinking

Focus on prediction logic and rigorous model evaluation so you can defend your work in practice.

Session 7: Statistics for Practical Analysts

  • Data distribution, outliers, and correlation (why A relates to B).
  • Ties back to EDA from Level 1; formalize the intuition you built.
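A small sketch of those three checks on simulated data (the income/spending relationship and the planted outlier are invented for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Simulated data: income loosely drives spending, plus one planted outlier.
income = rng.normal(5000, 800, 200)
spend = 0.4 * income + rng.normal(0, 200, 200)
df = pd.DataFrame({"income": income, "spend": spend})
df.loc[0, "spend"] = 50_000  # the outlier

# Correlation: "does A move with B?" (Pearson, between -1 and 1).
print(df.corr().round(2))

# IQR rule: flag outliers before trusting that correlation number.
q1, q3 = df["spend"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["spend"] < q1 - 1.5 * iqr) | (df["spend"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} outlier row(s) flagged")
```

One extreme row can drag a correlation substantially, which is exactly why the distribution and outlier checks come before interpreting "A relates to B."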

Session 8: Supervised Learning (Regression)

  • Predicting numbers: e.g. rental price or monthly income.
  • Model evaluation: metrics (RMSE, MAE, R²), train/validation split, overfitting checks.
  • Simple hyperparameter tuning: e.g. grid search on one or two knobs so you don’t ship default-only models.
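All three bullets fit in one short scikit-learn sketch: synthetic "rental price" data (invented coefficients), a held-out split, proper metrics, and a one-knob grid search over Ridge's regularization strength:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(42)
# Synthetic data: price depends on two features plus noise.
X = rng.uniform(20, 120, size=(300, 2))
y = 15 * X[:, 0] + 120 * X[:, 1] + rng.normal(0, 150, 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One knob (alpha), tuned with a small grid search + 5-fold CV on train only.
model = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5)
model.fit(X_train, y_train)

pred = model.predict(X_test)
rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))
print(f"RMSE={rmse:.1f}  MAE={mean_absolute_error(y_test, pred):.1f}  "
      f"R^2={r2_score(y_test, pred):.3f}  best alpha={model.best_params_['alpha']}")
```

The test set stays untouched until the very end; tuning happens inside cross-validation on the training data, which is the overfitting discipline the session drills.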

Session 9: Supervised Learning (Classification)

  • Predicting categories: e.g. spam classification or fraud detection.
  • Metrics: accuracy, precision, recall, F1; when to optimize for which (e.g. recall in fraud).
  • Cross-validation and overfitting checks; simple hyperparameter tuning where it matters.
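A sketch of the metric trade-off on synthetic imbalanced data (about 10% positives, mimicking a fraud-like class balance; the dataset is generated, not real):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic imbalanced "fraud-like" data: ~10% positive class.
X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

# On imbalanced data, accuracy alone flatters the model — check all three.
print(f"precision={precision_score(y_test, pred):.2f} "
      f"recall={recall_score(y_test, pred):.2f} "
      f"f1={f1_score(y_test, pred):.2f}")

# 5-fold cross-validation as an overfitting sanity check.
scores = cross_val_score(clf, X_train, y_train, cv=5, scoring="f1")
print(f"CV f1: mean={scores.mean():.2f} ± {scores.std():.2f}")
```

With 90% negatives, a model that predicts "not fraud" for everything scores 90% accuracy while catching zero fraud; recall is what exposes that failure.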

Session 10: Unsupervised Learning (Clustering)

  • Segmentation: grouping customers by behavior.
  • Evaluation: silhouette, inertia; how to sanity-check clusters and avoid over-interpreting.
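A sketch of that sanity-checking loop on synthetic segments (three generated blobs standing in for customer behavior data):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three synthetic, well-separated "customer segments".
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# Compare inertia and silhouette across candidate k values instead of
# trusting a single run.
for k in (2, 3, 4):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}: inertia={km.inertia_:.0f} "
          f"silhouette={silhouette_score(X, km.labels_):.2f}")
```

Inertia always falls as k grows, so it can't pick k on its own; silhouette peaking at a value of k is the stronger signal, and even then the clusters deserve a business-meaning check before anyone names them "segments."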

Level 3: The "Engineer" Edge (2026 Special)

The part that gives you an edge in 2026: pipelines, cloud, AI, and deployment.

Session 11: Automated Data Pipeline (The Mini ETL)

  • Build Python scripts to pull data from APIs or databases on a schedule.
  • Scheduling: Airflow or Dagster intro—DAGs, tasks, and running pipelines reliably.
  • Cloud basics: GCP or AWS free tier—run a small pipeline in the cloud so you understand scalability beyond your laptop.
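The extract-transform-load shape is the core idea; here is a minimal sketch with three plain functions and an in-memory SQLite sink (the "API response" is a stubbed list, standing in for a real `requests` call; in Airflow or Dagster each function would become a task in a DAG and the scheduler would run the chain daily):

```python
import sqlite3

def extract():
    # Stand-in for an API call, e.g. requests.get(url).json().
    return [{"day": "2026-02-01", "amount": "120.5"},
            {"day": "2026-02-01", "amount": "80.0"},
            {"day": "2026-02-02", "amount": "200.0"}]

def transform(rows):
    # Type coercion plus a simple daily aggregate.
    totals = {}
    for r in rows:
        totals[r["day"]] = totals.get(r["day"], 0.0) + float(r["amount"])
    return sorted(totals.items())

def load(daily, conn):
    # Idempotent load: re-running the pipeline overwrites, not duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales (day TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT OR REPLACE INTO daily_sales VALUES (?, ?)", daily)

conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM daily_sales ORDER BY day").fetchall())
# [('2026-02-01', 200.5), ('2026-02-02', 200.0)]
```

Keeping each step a separate, idempotent function is exactly what makes the later move to an orchestrator (and to the cloud) painless.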

Session 12: AI Agents for Data Analysis

  • Integrate Gemini/OpenAI APIs for automated text analysis (sentiment, summarization).
  • Prompt patterns for data tasks (generating code, explaining results).
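A prompt is just a carefully built string, so prompt patterns can live in plain functions. A sketch of two such patterns (the actual Gemini/OpenAI call is a single API request, omitted here so the example runs without keys or network):

```python
def sentiment_prompt(reviews):
    # Pattern: numbered inputs + explicit output schema -> parseable answers.
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(reviews))
    return (
        "Classify each review as positive, neutral, or negative.\n"
        'Answer as JSON: [{"id": <number>, "label": <label>}].\n\n'
        f"Reviews:\n{numbered}"
    )

def explain_code_prompt(code):
    # Pattern: fenced code + a concrete ask keeps the model on task.
    return (
        "Explain what this analysis code does, step by step, "
        "and point out one possible bug:\n\n```python\n" + code + "\n```"
    )

prompt = sentiment_prompt(["Great product!", "Arrived late."])
print(prompt)
```

Asking for a fixed JSON schema is the key habit: it turns free-form model output into something your Pandas code can load directly.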

Session 13: Model Deployment (From Local to API)

  • Wrap your ML model as a simple API with FastAPI so others can use it.
  • Versioning and basic monitoring (e.g. logging inputs/outputs).
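To show the shape of a prediction API without extra dependencies, here is a stdlib-only sketch using `http.server`; in the session you would use FastAPI, which replaces the handler boilerplate with a decorated function. The `predict` function and `MODEL_VERSION` tag are hypothetical stand-ins for a trained, pickled model:

```python
import json
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(level=logging.INFO)
MODEL_VERSION = "v1"  # hypothetical version tag shipped with every response

def predict(size_m2: float) -> float:
    # Stand-in for a trained model; imagine loading a pickled pipeline here.
    return 15.0 * size_m2 + 120.0

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        result = {"model": MODEL_VERSION, "prediction": predict(body["size_m2"])}
        logging.info("input=%s output=%s", body, result)  # basic monitoring
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

# To serve for real: HTTPServer(("", 8000), PredictHandler).serve_forever()
```

Logging every input/output pair with the model version is the minimum viable monitoring: it lets you replay traffic against a new model before swapping versions.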

Session 14: Final Project Review & Portfolio Building

  • Milestone projects: 2–3 end-to-end examples (e.g. fraud detection from data → EDA → model → evaluation → simple API or report) so you can show “I can ship it.”
  • Tips to showcase your work on GitHub (repos, README, Git history) and LinkedIn so recruiters notice.
  • How to talk about domain and “why” in interviews.

Interested in this course? I offer mentoring and structured learning—get in touch to discuss your goals.