SOEUK BONDOL — DATA SCIENTIST — CAMBODIA

SOEUK BONDOL.

Data Science × Machine Learning × MLOps

Turning raw data into decisions.
Building models that generalise, pipelines that scale,
and systems that actually ship.

sys.profile() LIVE
role: Data Scientist
location: Phnom Penh, KH
focus: ML · CV · NLP
lang: Python
skill_index[]
Python: 93
ML/DL: 80
SQL: 76
DevOps: 70
DATA SCIENCE · MACHINE LEARNING · COMPUTER VISION · FASTAPI · PYTHON · SCIKIT-LEARN · MLOPS · DOCKER · NLP · PANDAS · DEEP LEARNING · PYTORCH
02
Projects

THINGS I'VE SHIPPED.

From Document AI and research papers to real-time distributed streaming:
production-grade architectures and clean code, engineered with scientific intent.

002 DE
Data Engineering / Streaming

Real-Time GDELT Trends Analytics

A high-throughput analytical pipeline that processes the global GDELT event stream in real time. It ingests the live event feed through Apache Kafka, aggregates geopolitical sentiment over streaming windows with PySpark, and runs daily workflow tasks with Apache Airflow.

GDELT Live Streams → Apache Kafka → PySpark Streaming → PostgreSQL → Apache Airflow → FastAPI
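
A minimal sketch of the windowed sentiment aggregation idea, written in plain Python rather than as the project's PySpark job. The event keys (`ts`, `country`, `tone`), the 15-minute window, and the sample values are assumptions made for illustration, not taken from the repository.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 15 * 60  # assumed window size; the real pipeline may differ


def window_start(ts: datetime, size: int = WINDOW_SECONDS) -> datetime:
    """Floor a timestamp to the start of its tumbling window (UTC)."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size, tz=timezone.utc)


def aggregate_tone(events, size: int = WINDOW_SECONDS):
    """Average the tone score per (window start, country) bucket.

    Each event is a dict with hypothetical keys: 'ts', 'country', 'tone'.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for event in events:
        key = (window_start(event["ts"], size), event["country"])
        sums[key][0] += event["tone"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Made-up sample events, not real GDELT records.
events = [
    {"ts": datetime(2025, 1, 1, 12, 1, tzinfo=timezone.utc), "country": "KH", "tone": 2.0},
    {"ts": datetime(2025, 1, 1, 12, 9, tzinfo=timezone.utc), "country": "KH", "tone": 4.0},
    {"ts": datetime(2025, 1, 1, 12, 20, tzinfo=timezone.utc), "country": "KH", "tone": -1.0},
]
averages = aggregate_tone(events)
```

The first two events fall into the 12:00 window and the third into the 12:15 window, so `averages` holds one mean per window. A streaming engine applies the same bucketing incrementally instead of over a finished list.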

Apache Kafka · PySpark · Apache Airflow · PostgreSQL · FastAPI · Python
GitHub Project ↗
003 FIN
Data Engineering / Finance

Stock Streaming Analytics

An end-to-end financial streaming and analytics pipeline. It ingests live market data, computes windowed streaming metrics, stores them in layered Bronze/Silver/Gold tables in PostgreSQL, and serves the results through FastAPI to a React dashboard.

Stock APIs → Kafka Producer → Apache Kafka → PySpark Streaming → PostgreSQL (Bronze/Silver/Gold) → FastAPI → React Dashboard
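
A minimal sketch of the Bronze/Silver/Gold layering, using in-memory lists in place of PostgreSQL tables. The row shape (`symbol`, `price`), the cleaning rules, and the chosen metrics are assumptions for illustration only.

```python
from statistics import mean


def to_silver(bronze_rows):
    """Clean raw ticks: drop malformed rows, normalise price type and symbol case."""
    silver = []
    for row in bronze_rows:
        try:
            price = float(row["price"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable price
        if price <= 0 or not row.get("symbol"):
            continue  # skip non-positive prices and rows with no symbol
        silver.append({"symbol": row["symbol"].upper(), "price": price})
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned ticks into per-symbol metrics ready to serve."""
    by_symbol = {}
    for row in silver_rows:
        by_symbol.setdefault(row["symbol"], []).append(row["price"])
    return {
        symbol: {"avg": mean(prices), "high": max(prices), "low": min(prices)}
        for symbol, prices in by_symbol.items()
    }


# Made-up raw ticks standing in for the Bronze layer.
bronze = [
    {"symbol": "aapl", "price": "100.0"},
    {"symbol": "AAPL", "price": "110.0"},
    {"symbol": "AAPL", "price": "n/a"},  # malformed, dropped in Silver
    {"symbol": "MSFT", "price": 50},
]
gold = to_gold(to_silver(bronze))
```

Keeping the raw layer untouched and deriving the other two from it means a bad cleaning rule can be fixed and replayed without re-ingesting market data.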

Kafka · PySpark · PostgreSQL · FastAPI · React · Data Pipeline
GitHub Project ↗
004 RAG
Research / Document AI

KhmerDoc-Ai

An intelligent document processing (IDP) and layout-aware retrieval system for low-resource Khmer business documents. It applies hierarchical, fine-grained layout analysis inspired by the KH-FUNSD dataset (APSIPA 2025) to parse complex structures, extract structured JSON, and support semantic RAG search.

APSIPA 2025 Paper Implementation (KH-FUNSD) · Hierarchical Khmer document parsing · Complex JSON tree extraction · Semantic chunking & layout-aware RAG vector search.
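
A minimal sketch of how a hierarchical JSON tree could be flattened into layout-aware chunks for retrieval. The node shape (`text`, `children`), the `chunk_tree` helper, and the English sample document are hypothetical; they are not the KH-FUNSD schema or the project's actual code.

```python
def chunk_tree(node, path=()):
    """Yield one chunk per leaf node, tagged with the headings above it."""
    label = node.get("text", "")
    children = node.get("children", [])
    if not children:
        # A leaf carries the content; the path records where it sits in the layout.
        yield {"path": " > ".join(path), "text": label}
        return
    for child in children:
        yield from chunk_tree(child, path + (label,))


# Made-up document tree for illustration.
doc = {
    "text": "Invoice",
    "children": [
        {"text": "Seller", "children": [{"text": "Name: ABC Co."}]},
        {"text": "Total", "children": [{"text": "125.00 USD"}]},
    ],
}
chunks = list(chunk_tree(doc))
```

Each chunk keeps its heading path, so a retriever can embed the path together with the text and a query about the total is less likely to match the seller block.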

Python · Document AI · RAG · Layout Analysis · KH-FUNSD · Vector DB · Khmer NLP
GitHub Project ↗
03
Stack

THE
TOOLKIT.

Every tool chosen with intent. Zero bloat.

Python
PyTorch
Scikit-learn
Pandas
NumPy
OpenCV
Jupyter
FastAPI
PostgreSQL
SQLAlchemy
Docker
GitHub
Linux
Git
Neovim
Astro
HuggingFace
uv
skill_matrix.json CURRENT
Python (Data & ML): 93
Data Analysis (Data & ML): 88
Machine Learning (ML): 82
FastAPI (Backend): 80
Linux / Bash (DevOps): 85
SQL (Data): 76
Docker (DevOps): 72
Deep Learning (ML): 70
04
Contact

LET'S
BUILD
SOMETHING.

Open to data science roles, ML consulting, freelance projects, and interesting collaborations. Based in Cambodia — available remotely worldwide. I reply to everything.

Available for opportunities
Type: Full-time · Freelance · Contract
Mode: Remote worldwide
TZ: ICT (UTC+7)
Response: < 24h