Email me to unlock the full case study and demo API.
Domain-tuned, latency-optimized sentiment analysis for financial headlines and articles.
We adapted an 8B LLaMA model with LoRA for three-way sentiment classification (positive/neutral/negative) on a finance-focused corpus. Prompts are kept short, with a structured JSON output schema that makes parsing trivial. The tokenizer was domain-tuned, and inference uses mixed precision, KV caching, and a small beam width for speed.
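The structured JSON output contract can be sketched as below. This is a hypothetical illustration, not the actual production prompt or parser: the template wording, the fallback-to-neutral policy, and the function names are assumptions.

```python
import json

# Hypothetical prompt template: the model is asked to reply with a single
# JSON object, so parsing reduces to one json.loads call.
PROMPT_TEMPLATE = (
    "Classify the sentiment of this financial headline as positive, "
    "neutral, or negative. Respond with JSON only: "
    '{{"label": "<positive|neutral|negative>"}}\n\nHeadline: {headline}'
)

VALID_LABELS = {"positive", "neutral", "negative"}


def parse_sentiment(raw_output: str) -> str:
    """Extract the sentiment label from the model's JSON response.

    Tolerates leading/trailing chatter around the JSON object and falls
    back to "neutral" on malformed output, so downstream code never sees
    an unexpected label (an assumed policy, for illustration).
    """
    try:
        start = raw_output.index("{")        # skip any preamble text
        end = raw_output.rindex("}") + 1     # skip any trailing text
        label = json.loads(raw_output[start:end]).get("label", "")
    except ValueError:  # covers both missing braces and bad JSON
        return "neutral"
    return label if label in VALID_LABELS else "neutral"
```

Constraining the output to one small JSON object is what keeps the parsing path simple and the completion short, which also helps latency.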
A two-stage system routes easy cases to a lightweight prompt classifier and reserves the 8B model for hard examples, selected by an uncertainty score. This cut cost and latency substantially while improving macro F1. We monitor drift and recalibrate the routing thresholds monthly.
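The routing logic above can be sketched as follows. This is a minimal illustration under assumptions: the source does not specify the uncertainty score, so entropy over the light classifier's probabilities is used here as a stand-in, and the threshold value and function names are hypothetical.

```python
import math

ENTROPY_THRESHOLD = 0.9  # hypothetical cutoff; recalibrated periodically in practice

LABELS = ("positive", "neutral", "negative")


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route(headline, light_classifier, heavy_classifier):
    """Two-stage routing: trust the cheap classifier when it is confident,
    escalate to the larger model otherwise.

    light_classifier(headline) -> tuple of 3 class probabilities
    heavy_classifier(headline) -> a label string
    """
    probs = light_classifier(headline)
    if entropy(probs) < ENTROPY_THRESHOLD:
        # Confident: take the light classifier's argmax label.
        return LABELS[max(range(len(LABELS)), key=probs.__getitem__)]
    # Uncertain: escalate to the expensive model.
    return heavy_classifier(headline)
```

Since most headlines are easy, the expensive model only runs on the ambiguous tail, which is where the cost and latency savings come from.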