[P] Domain-specific LoRA fine-tuning on consumer hardware
Been experimenting with a pattern for building domain-specific local LLMs that I haven’t seen documented cleanly elsewhere.
The problem: base models are fine for general tasks but struggle with domain-specific structured data; they make wrong schema assumptions, format output inconsistently, and hallucinate column names even when the data is passed as context via RAG.
The approach:
Phase 1 — Use your existing RAG pipeline to automatically generate (question, SQL, data, baseline_answer) examples with a local model. No manual annotation, no cloud calls; ~100-200 examples in about 20 minutes.
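A minimal sketch of what a Phase 1 record could look like. The prompt template and field names here are my own illustration, not the author's exact format; the real pipeline would fill these fields from existing RAG traces.

```python
import json

# Hypothetical prompt template for the self-labeling pass; adjust the
# fields to match whatever your RAG pipeline already produces.
PROMPT = (
    "Schema: {schema}\n"
    "Question: {question}\n"
    "SQL: {sql}\n"
    "Rows: {rows}\n"
    "Answer:"
)

def make_example(schema: str, question: str, sql: str,
                 rows: list, baseline_answer: str) -> dict:
    """Package one RAG trace as a (prompt, completion) training record."""
    return {
        "prompt": PROMPT.format(schema=schema, question=question,
                                sql=sql, rows=json.dumps(rows)),
        "completion": baseline_answer,
    }

def write_jsonl(examples: list, path: str) -> None:
    """Write records as JSONL; both mlx-lm and TRL accept JSONL datasets."""
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
```

The (prompt, completion) split keeps the data format-agnostic: the same file can be mapped onto a chat template later if the trainer expects one.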
Phase 2 — Single cloud pass: a stronger model rewrites baseline answers to gold-standard quality in your target style. One-time cost ~$2-5. This is the only external API call in the entire pipeline.
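The rewrite pass can stay API-agnostic by injecting the cloud call as a plain function. Everything below (instruction text, signature) is an illustrative sketch, not the author's implementation:

```python
# Assumed instruction text for the gold-standard rewrite; tune to taste.
REWRITE_INSTRUCTIONS = (
    "Rewrite the draft answer into a gold-standard answer in the target "
    "style. Keep every number from the draft; do not invent data."
)

def rewrite_dataset(examples: list, rewrite_fn) -> list:
    """Phase 2: one pass over the Phase 1 records, replacing each baseline
    completion with a stronger model's rewrite.

    `rewrite_fn(instructions, prompt, draft) -> str` wraps whatever cloud
    API you use; it is the only external call in the pipeline.
    """
    return [
        {"prompt": ex["prompt"],
         "completion": rewrite_fn(REWRITE_INSTRUCTIONS,
                                  ex["prompt"], ex["completion"])}
        for ex in examples
    ]
```

Because the cloud model only rewrites style on top of answers grounded in real query results, the one-time cost stays in the quoted $2-5 range for a few hundred examples.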
Phase 3 — LoRA fine-tune on Qwen3.5-4B using mlx-lm (Apple Silicon) or Unsloth+TRL (CUDA). 15-40 min on M4 Mac mini, 10-25 min on RTX 3090.
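For the mlx-lm path, the stock LoRA entry point covers Phase 3. The flags below are real, but the values are a plausible starting point rather than the author's exact settings, and the model path is a placeholder:

```shell
# Train LoRA adapters on a local JSONL dataset (train.jsonl / valid.jsonl
# inside ./data). <path-to-4b-model> stands in for whatever 4B checkpoint
# you have downloaded locally.
python -m mlx_lm.lora \
    --model <path-to-4b-model> \
    --train \
    --data ./data \
    --batch-size 4 \
    --iters 600
```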
Phase 4 — Fuse and serve locally. mlx-lm on Apple Silicon, GGUF + Ollama on any platform.
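On the mlx-lm side, fusing and a quick sanity check look roughly like this (paths are placeholders; `--adapter-path` defaults to where Phase 3 wrote the adapters):

```shell
# Merge the LoRA adapters into the base weights for standalone serving.
python -m mlx_lm.fuse \
    --model <path-to-4b-model> \
    --adapter-path adapters \
    --save-path fused-model

# Sanity-check the fused model before wiring it into anything.
python -m mlx_lm.generate \
    --model fused-model \
    --prompt "What is the total balance?"
```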
Key observations:
– RAG alone doesn’t fix schema hallucination in smaller models — LoRA is needed for structural consistency
– The annotation quality ceiling matters more than example count past ~100 samples
– Fine-tuned 4B models outperform untuned 70B models on narrow domain tasks in my testing
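To put a number on the schema-hallucination observation, one crude metric is to count SQL identifiers that don't exist in the schema. This is my own rough sketch, not the author's evaluation code; the keyword list and tokenization are deliberately simplified and it is not a SQL parser:

```python
import re

# Minimal keyword list -- enough for simple SELECT queries; extend as
# needed for your query mix.
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "sum", "avg",
    "count", "min", "max", "as", "and", "or", "limit", "desc", "asc",
}

def hallucinated_identifiers(sql: str, known: set) -> set:
    """Return identifiers in the SQL that are neither keywords nor known
    schema names (tables or columns). Numbers and operators are ignored."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql))
    return {t for t in tokens
            if t.lower() not in SQL_KEYWORDS and t not in known}
```

Running this over tuned vs. untuned model outputs gives a per-query hallucination count you can average across an eval set.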
Built a working implementation with a finance coach example. Curious if others have found better approaches to the annotation phase specifically — that feels like the biggest lever.
submitted by /u/sandseb123