Neural Machine Translation
We fine-tune translation models such as Google's TranslateGemma for English↔Kikuyu using LoRA, a parameter-efficient fine-tuning (PEFT) method. Our first 12B model achieved 19.61 BLEU, a 758% improvement over zero-shot performance on 30,430 curated sentence pairs. Our current production 4B model improves on this with 21.93 BLEU and 42.87 chrF++, while also serving faster on the web.
Key areas: LoRA and rsLoRA optimization, regularization tuning, BLEU/chrF++ evaluation, production deployment on Modal serverless GPUs
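To illustrate the difference between LoRA and rsLoRA mentioned above, here is a minimal sketch of the two adapter scaling factors: standard LoRA multiplies the low-rank update by alpha / r, while rank-stabilized LoRA uses alpha / sqrt(r). The function names and the alpha and rank values are illustrative assumptions, not our training configuration.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA: the low-rank update B @ A is scaled by alpha / r."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """Rank-stabilized LoRA: the update is scaled by alpha / sqrt(r)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return alpha / math.sqrt(rank)


if __name__ == "__main__":
    alpha = 16.0  # illustrative value only
    for rank in (8, 16, 64):
        print(f"r={rank}: LoRA={lora_scale(alpha, rank):.3f}, "
              f"rsLoRA={rslora_scale(alpha, rank):.3f}")
```

With standard scaling the factor shrinks linearly as the rank grows, which can mute higher-rank adapters; the square-root scaling shrinks more slowly, which is the motivation usually given for trying rsLoRA at larger ranks.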
Speech-to-Speech Models
We are building end-to-end voice AI using the Mimi neural codec adapted for Kikuyu tonal fidelity. Our Stage 1 codec adaptation is complete (79.3M params, 1.1 Hz pitch error), with streaming inference and full-duplex conversation in development.
Key areas: Mimi codec adaptation, pitch-preservation loss, cascaded ASR→LLM→TTS pipeline
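A pitch error reported in Hz can be read as an average gap between the pitch (F0) contour of the original audio and that of the codec's reconstruction. The sketch below shows one common way to compute such a figure, assuming per-frame F0 tracks in which unvoiced frames are marked as 0.0. The function name and the voicing convention are assumptions for illustration; this is not necessarily how the 1.1 Hz figure above was measured.

```python
from typing import Sequence


def mean_pitch_error_hz(ref_f0: Sequence[float], est_f0: Sequence[float]) -> float:
    """Mean absolute F0 difference in Hz over frames voiced in both tracks.

    Assumption: unvoiced frames are encoded as 0.0 and are skipped.
    """
    if len(ref_f0) != len(est_f0):
        raise ValueError("F0 tracks must have the same number of frames")
    diffs = [abs(r - e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not diffs:
        raise ValueError("no frames are voiced in both tracks")
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Toy contours in Hz; the first and last frames are excluded as unvoiced.
    reference = [0.0, 120.0, 125.0, 130.0, 0.0]
    reconstructed = [0.0, 121.0, 124.0, 131.5, 90.0]
    print(f"{mean_pitch_error_hz(reference, reconstructed):.2f} Hz")
```

Restricting the average to frames voiced in both tracks keeps voicing-detection mistakes from dominating the number, which matters for a tonal language where small F0 shifts carry meaning.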
Dataset Engineering
We use the African Next Voices corpus (750+ hours of Kikuyu audio) and Google's WAXAL TTS dataset (~9 hours of studio-quality audio) to train robust speech and translation models.
Key areas: Audio preprocessing, noise augmentation, multi-source dataset curation
Inference & Deployment
We deploy models on Modal serverless GPUs (A10G, A100) with keep-warm strategies to minimize cold starts. Our Next.js frontend proxies API requests to serverless backends for production-grade serving.
Key areas: Modal serverless, keep-warm scheduling, Next.js API routes