DATA SCIENCE·MACHINE LEARNING·COMPUTER VISION·FASTAPI·PYTHON·SCIKIT-LEARN·MLOPS·DOCKER·NLP·PANDAS·DEEP LEARNING·PYTORCH
02 Projects
THINGS I'VE SHIPPED.
From Document AI and research papers to real-time distributed streaming —
production-grade architectures, clean code, engineered with scientific intent.
001 ML / Computer Vision
Khmer Number Recognition
A deep learning model for recognising handwritten Khmer numerals. Trained a custom CNN on Khmer digit datasets — tackling one of the hardest OCR challenges in Southeast Asian language processing.
Custom CNN architecture · Data augmentation pipeline · Evaluated on real-world Khmer handwriting samples.
A high-throughput analytical pipeline processing the global GDELT event stream in real time. Ingests live telemetry streams, aggregates geopolitical sentiment windows, and orchestrates daily workflow tasks.
An end-to-end financial streaming and analytical pipeline. Ingests live market data, processes streaming window metrics, stores structured multi-layer data tables, and serves instant API metrics.
An advanced intelligent document processing (IDP) and layout-aware retrieval system for low-resource Khmer business documents. Implements hierarchical fine-grained layout analysis inspired by the KH-FUNSD dataset (APSIPA 2025) to parse complex structures, extract structured JSON schemas, and enable highly accurate semantic RAG search.
Open to data science roles, ML consulting, freelance projects, and
interesting collaborations. Based in Cambodia — available remotely
worldwide. I reply to everything.