Email me to unlock the full case study and demo API.
Domain-tuned, latency-optimized sentiment analysis for financial headlines and articles.
We adapted an 8B LLaMA model with LoRA for three-way sentiment classification (positive/neutral/negative) on a finance-focused corpus. Prompts are kept short, with a structured JSON output schema that makes parsing trivial. The tokenizer was domain-tuned, and inference uses mixed precision, KV caching, and a small beam width for speed.
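The structured JSON output contract can be sketched as below. This is a hypothetical illustration, not the actual production prompt or parser: the template wording, the fallback-to-neutral policy, and the function names are assumptions.

```python
import json

# Hypothetical prompt template: the model is asked to reply with a single
# JSON object, so parsing reduces to one json.loads call.
PROMPT_TEMPLATE = (
    "Classify the sentiment of this financial headline as positive, "
    "neutral, or negative. Respond with JSON only: "
    '{{"label": "<positive|neutral|negative>"}}\n\nHeadline: {headline}'
)

VALID_LABELS = {"positive", "neutral", "negative"}


def parse_sentiment(raw_output: str) -> str:
    """Extract the sentiment label from the model's JSON response.

    Tolerates leading/trailing chatter around the JSON object and falls
    back to "neutral" on malformed output, so downstream code never sees
    an unexpected label (an assumed policy, for illustration).
    """
    try:
        start = raw_output.index("{")        # skip any preamble text
        end = raw_output.rindex("}") + 1     # skip any trailing text
        label = json.loads(raw_output[start:end]).get("label", "")
    except ValueError:  # covers both missing braces and bad JSON
        return "neutral"
    return label if label in VALID_LABELS else "neutral"
```

Constraining the output to one small JSON object is what keeps the parsing path simple and the completion short, which also helps latency.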
A two-stage system routes easy cases to a lightweight prompt classifier and reserves the 8B model for hard examples, selected by an uncertainty score. This cut cost and latency substantially while improving macro F1. We monitor drift and recalibrate the routing thresholds monthly.
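The routing logic above can be sketched as follows. This is a minimal illustration under assumptions: the source does not specify the uncertainty score, so entropy over the light classifier's probabilities is used here as a stand-in, and the threshold value and function names are hypothetical.

```python
import math

ENTROPY_THRESHOLD = 0.9  # hypothetical cutoff; recalibrated periodically in practice

LABELS = ("positive", "neutral", "negative")


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route(headline, light_classifier, heavy_classifier):
    """Two-stage routing: trust the cheap classifier when it is confident,
    escalate to the larger model otherwise.

    light_classifier(headline) -> tuple of 3 class probabilities
    heavy_classifier(headline) -> a label string
    """
    probs = light_classifier(headline)
    if entropy(probs) < ENTROPY_THRESHOLD:
        # Confident: take the light classifier's argmax label.
        return LABELS[max(range(len(LABELS)), key=probs.__getitem__)]
    # Uncertain: escalate to the expensive model.
    return heavy_classifier(headline)
```

Since most headlines are easy, the expensive model only runs on the ambiguous tail, which is where the cost and latency savings come from.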