Neural Machine Translation
We fine-tune translation models such as Google's TranslateGemma-12B for English↔Kikuyu using Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method. Fine-tuned on 30,430 curated sentence pairs, our deployed model reaches 19.61 BLEU, a 758% improvement over zero-shot performance.
Key areas: LoRA optimization, regularization tuning, production deployment on Modal serverless GPUs
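The core of the LoRA approach above is that the base weights stay frozen while a small low-rank update is trained. As a minimal stdlib sketch (in practice this would be done with a PEFT library on the model's attention projections; all names here are illustrative):

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A.

    B is zero-initialized, so the adapted layer initially matches the base
    layer exactly; only A and B (r * (d_in + d_out) values) are trained,
    never the full d_out * d_in weight W.
    """
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W                    # frozen base weight
        self.scale = alpha / r
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]  # zero-init: no change at start

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]
```

The zero-initialized B is what makes fine-tuning start from the pretrained model's behavior rather than perturbing it.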
Speech-to-Speech Models
We are building end-to-end voice AI using the Mimi neural codec adapted for Kikuyu tonal fidelity. Our Stage 1 codec adaptation is complete (79.3M params, 1.1 Hz pitch error), with streaming inference and full-duplex conversation in development.
Key areas: Mimi codec adaptation, pitch-preservation loss, cascaded ASR→LLM→TTS pipeline
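The 1.1 Hz figure above is a pitch-error metric; a metric of that shape can be sketched as the mean absolute F0 difference over frames voiced in both contours. The exact F0 estimator and the differentiable loss used in training are not specified here, so this is an illustrative evaluation-side sketch:

```python
def pitch_error_hz(f0_ref, f0_pred, unvoiced=0.0):
    """Mean absolute F0 difference (Hz) over frames voiced in both contours.

    f0_ref / f0_pred: per-frame fundamental-frequency estimates in Hz,
    with `unvoiced` marking frames that carry no pitch (skipped).
    """
    diffs = [abs(r - p)
             for r, p in zip(f0_ref, f0_pred)
             if r != unvoiced and p != unvoiced]
    if not diffs:
        raise ValueError("no frames voiced in both contours")
    return sum(diffs) / len(diffs)
```

A pitch-preservation training loss would compute the same quantity on batched tensors with a differentiable F0 proxy, but the scalar version above is what a "1.1 Hz pitch error" report measures.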
Dataset Engineering
We use the African Next Voices corpus (750+ hours of Kikuyu audio) and Google's WAXAL TTS dataset (~9 hours of studio-quality recordings) to train robust speech and translation models.
Key areas: Audio preprocessing, noise augmentation, multi-source dataset curation
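Noise augmentation typically means mixing background noise into clean speech at a controlled signal-to-noise ratio. A minimal stdlib sketch of SNR-targeted mixing (real pipelines would operate on arrays and sample noise clips from a corpus; the function name is illustrative):

```python
import math

def mix_at_snr(signal, noise, snr_db):
    """Additively mix `noise` into `signal` at a target SNR in dB.

    Both inputs are equal-length lists of float samples. The noise is
    scaled so that 10*log10(P_signal / P_noise) equals snr_db, where P
    is mean power, then added sample-by-sample.
    """
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]
```

Training on mixtures at a range of SNRs (e.g. 5-20 dB) makes ASR models robust to the recording conditions found in field-collected corpora.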
Inference & Deployment
We deploy models on Modal serverless GPUs (A10G, A100) with keep-warm strategies to minimize cold starts. Our Next.js frontend proxies API requests to serverless backends for production-grade serving.
Key areas: Modal serverless, keep-warm scheduling, Next.js API routes