Nathan Leclercq
nathan.leclercq9@protonmail.com | LinkedIn | GitHub | Blog
Profile
Data Engineer and ML Engineer with 3 years at DataKhi (apprenticeship, then full-time). I design and operate end-to-end data platforms: collection, pipelines, ML models, deployment, monitoring. Background in mathematics and computer science, Master’s in Machine Learning (University of Lille). I don’t just write code: I deploy, industrialize and deliver.
Professional Experience
DataKhi, Data Consulting Firm, Tourcoing (2023 - present)
Data Engineer — Nyukom Project · Full-time · Oct 2025 - present
- Designed and deployed an end-to-end telecom data platform: collection (3CX web scraping, Centreon API), MinIO data lake, PostgreSQL star schema warehouse, Power BI reporting
- Full infrastructure deployment: K3s, Airflow, Ansible, private Docker registry
- Multi-tenant architecture with partitioning, idempotency and historical backfill
- Stack: Airflow, K3s, Ansible, Docker, PostgreSQL, MinIO, Playwright, Pandas
ML Engineer — Hall U Need Project · Full-time (continued from apprenticeship) · 2023 - present
- Industrialized a restaurant demand forecasting model (XGBoost quantile regression)
- Multi-restaurant prediction models, feature engineering (weather, calendar, reservations)
- Custom loss function (Huber), confidence interval calibration, non-regression tests
- Full pipeline: Microsoft Fabric collection → training → prediction · Makefile workflow
Data Engineer & ML Engineer — Tossée Project · Apprenticeship · 2023 - 2025
- Architected a complete data ecosystem for an eco-friendly fashion aggregator
- Multi-brand scraping (Playwright, Scrapy, custom YAML rules engine)
- Normalization pipeline, environmental impact calculation (Ecobalyse API), product embeddings
- Backend API (FastAPI, PostgreSQL/pgvector, semantic search, recommendations)
- Flutter mobile app: virtual try-on (DM-VTON), barcode scanning, multi-provider OAuth, geolocation
- React/TypeScript browser extension for real-time environmental impact display
- AI agent (OpenAI Agents SDK) for automated data extraction from HTML
- Hybrid on-premise / Azure deployment (Functions, Blob, DevOps)
Full-Stack Developer · Internship · 2023 · 4 months
- Power BI versioning system: C++ backend (report diffing), React frontend, Electron distribution
Music Teacher · 2017 - present
- Saxophone (jazz, soul) and music theory — private lessons and music schools
Technical Skills
Data Engineering
- End-to-end ETL pipelines, star schema, partitioning, idempotency, backfill
- Apache Airflow · PostgreSQL · MinIO (S3) · Parquet / PyArrow · Microsoft Fabric
Machine Learning
- XGBoost (quantile regression) · Feature engineering · Temporal cross-validation
- Embeddings / vector search (pgvector) · CamemBERT / Transformers · MLflow
- Confidence interval calibration · Custom loss functions
DevOps / Infrastructure
- Kubernetes (K3s) · Docker · Ansible (IaC, roles, vault) · Proxmox
- Monitoring: Prometheus / Grafana · CI/CD: Makefile, pipelines
- Azure (Fabric, Functions, Blob, DevOps)
Development
- Python (FastAPI, Pandas, scikit-learn) · SQL · TypeScript (React) · Dart (Flutter)
- Scraping: Playwright, Scrapy, BeautifulSoup
- Familiar with: Go, Rust, Haskell, C++
Scientific / Competitive Programming
- Julia (competitions: Google Hash Code, Reply Challenge, Cloudflight) · R · NumPy / SciPy
Personal Projects
MLOps Homelab Platform · 2024 - present
- Self-hosted infrastructure: Proxmox, GPU servers, ML services, crewAI agents with RAG
- Prometheus/Grafana monitoring, Ansible deployment, Docker registry, Gitea
- Published technical articles
Book Recommendation System · 2023 - 2025
- Full data pipeline: scraping of a large book catalogue, embeddings (TF-IDF + CamemBERT), FastAPI service
- PostgreSQL/pgvector, MLflow, Vue.js interface
- Published technical articles
Algorithms Club · 2020 - 2024
- Prepared for and competed in programming contests
- Optimized solutions in Julia · Google Hash Code, Reply Challenge, Cloudflight
Research: Melody Harmonization · 2024
- Comparative study of models and algorithms for automatic melody harmonization
Education
Master’s in Machine Learning · University of Lille · 2023 - 2025
- Deep Learning, NLP, MLOps · LLM deployment on GPU infrastructure
Bachelor’s in Computer Science · University of Lille · 2020 - 2023
- Advanced algorithms, distributed architecture, full-stack development
Mathematics Studies (3 years) · University of Lille · 2017 - 2020
- Numerical analysis, probability/statistics, applied linear algebra
Languages
- French: native
- English: professional (TOEIC 885)
Interests
- Music: jazz/soul saxophone, orchestra
- Sports: daily cycling, badminton
- Reading: science fiction, technical essays
- Tabletop role-playing games
Publications
- Technical articles on the DataKhi blog (2026)
- Technical articles on my personal blog (2024-2025)