LLaMA-8B News Sentiment — Case Study

Domain-tuned, latency-optimized sentiment for financial headlines & articles.

Key Metrics

  Macro F1:           0.89
  Latency (p95):      < 500 ms
  Throughput:         10k docs/hr
  Cost per 1k docs:   $0.004
Dashboard charts (not shown): Class Distribution, F1 by Class, Latency p95, Throughput (docs/hr).

Approach

We adapted a LLaMA-8B model with LoRA for three-way sentiment classification (positive / neutral / negative) on a finance-focused corpus. Prompts were kept short, and responses follow a structured JSON output schema to simplify parsing. The tokenizer was domain-tuned, and inference combines mixed precision, KV caching, and a small beam width to keep decoding fast.
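
To make the structured-output contract concrete, here is a minimal sketch of a prompt template and a tolerant parser. The template wording, schema fields, and neutral fallback are illustrative assumptions, not the production versions.

import json

VALID_LABELS = {"positive", "neutral", "negative"}

def build_prompt(text: str) -> str:
    # Illustrative template (assumption): the production prompt differs.
    return (
        "Classify the sentiment of this financial headline as positive, "
        "neutral, or negative. Reply with JSON only, e.g. "
        '{"label": "neutral", "confidence": 0.72}\n\n'
        f"Headline: {text}"
    )

def parse_sentiment(raw: str) -> dict:
    # Tolerant parse: malformed or off-schema output falls back to neutral
    # so a single bad generation never crashes downstream aggregation.
    try:
        obj = json.loads(raw)
        if obj.get("label") in VALID_LABELS:
            return {"label": obj["label"],
                    "confidence": float(obj.get("confidence", 0.0))}
    except (json.JSONDecodeError, AttributeError, TypeError, ValueError):
        pass
    return {"label": "neutral", "confidence": 0.0}

print(build_prompt("Acme beats Q3 estimates"))
print(parse_sentiment('{"label": "positive", "confidence": 0.93}'))

Constraining the model to a fixed schema means the parser, not the prompt, absorbs output noise, which keeps the serving path simple.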

A two-stage system routes easy cases to a lightweight prompt classifier and reserves the 8B model for hard examples, selected by an uncertainty score; a minimal sketch follows. This routing cut cost and latency substantially while improving macro F1. We monitor drift and recalibrate the routing thresholds monthly.
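
The sketch below shows the routing shape, assuming the stage-1 confidence doubles as the uncertainty signal. The threshold value and the keyword heuristic standing in for the lightweight classifier are illustrative assumptions, not the production components.

from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    confidence: float  # stage-1 confidence; low values mean "hard example"

ROUTE_THRESHOLD = 0.85  # assumption: recalibrated monthly alongside drift checks

def light_classifier(text: str) -> Prediction:
    # Stand-in for the lightweight prompt classifier: a keyword heuristic
    # used here only so the sketch runs end to end.
    lowered = text.lower()
    if any(w in lowered for w in ("beats", "surges", "record high")):
        return Prediction("positive", 0.92)
    if any(w in lowered for w in ("misses", "plunges", "defaults")):
        return Prediction("negative", 0.92)
    return Prediction("neutral", 0.60)

def llama_8b_classify(text: str) -> str:
    # Stand-in for the LoRA-tuned LLaMA-8B call (stage 2).
    return "neutral"

def classify(text: str) -> str:
    pred = light_classifier(text)
    if pred.confidence >= ROUTE_THRESHOLD:
        return pred.label           # easy case: stay on the cheap path
    return llama_8b_classify(text)  # hard case: escalate to the 8B model

print(classify("Acme beats Q3 estimates"))      # resolved by stage 1
print(classify("Outlook mixed ahead of vote"))  # escalated to stage 2

Routing on stage-1 confidence keeps the 8B model off the hot path for the bulk of traffic, which is where the cost and latency savings come from.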