[P] Domain-specific LoRA fine-tuning on consumer hardware
Been experimenting with a pattern for building domain-specific local LLMs that I haven’t seen documented cleanly elsewhere.
The problem: base models are fine for general tasks but struggle with domain-specific structured data; they make wrong schema assumptions, format output inconsistently, and hallucinate column names even when the data is passed as context via RAG.
The approach:
Phase 1 — Use your existing RAG pipeline to automatically generate (question, SQL, data, baseline_answer) examples with a local model. No manual annotation, no cloud calls; ~100-200 examples in about 20 minutes.
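A minimal sketch of what a Phase 1 record could look like. The prompt template and field names here are my own illustration, not the author's exact format; the real pipeline would fill these fields from existing RAG traces.

```python
import json

# Hypothetical prompt template for the self-labeling pass; adjust the
# fields to match whatever your RAG pipeline already produces.
PROMPT = (
    "Schema: {schema}\n"
    "Question: {question}\n"
    "SQL: {sql}\n"
    "Rows: {rows}\n"
    "Answer:"
)

def make_example(schema: str, question: str, sql: str,
                 rows: list, baseline_answer: str) -> dict:
    """Package one RAG trace as a (prompt, completion) training record."""
    return {
        "prompt": PROMPT.format(schema=schema, question=question,
                                sql=sql, rows=json.dumps(rows)),
        "completion": baseline_answer,
    }

def write_jsonl(examples: list, path: str) -> None:
    """Write records as JSONL; both mlx-lm and TRL accept JSONL datasets."""
    with open(path, "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")
```

The (prompt, completion) split keeps the data format-agnostic: the same file can be mapped onto a chat template later if the trainer expects one.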
Phase 2 — Single cloud pass: a stronger model rewrites baseline answers to gold-standard quality in your target style. One-time cost ~$2-5. This is the only external API call in the entire pipeline.
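The rewrite pass can stay API-agnostic by injecting the cloud call as a plain function. Everything below (instruction text, signature) is an illustrative sketch, not the author's implementation:

```python
# Assumed instruction text for the gold-standard rewrite; tune to taste.
REWRITE_INSTRUCTIONS = (
    "Rewrite the draft answer into a gold-standard answer in the target "
    "style. Keep every number from the draft; do not invent data."
)

def rewrite_dataset(examples: list, rewrite_fn) -> list:
    """Phase 2: one pass over the Phase 1 records, replacing each baseline
    completion with a stronger model's rewrite.

    `rewrite_fn(instructions, prompt, draft) -> str` wraps whatever cloud
    API you use; it is the only external call in the pipeline.
    """
    return [
        {"prompt": ex["prompt"],
         "completion": rewrite_fn(REWRITE_INSTRUCTIONS,
                                  ex["prompt"], ex["completion"])}
        for ex in examples
    ]
```

Because the cloud model only rewrites style on top of answers grounded in real query results, the one-time cost stays in the quoted $2-5 range for a few hundred examples.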
Phase 3 — LoRA fine-tune on Qwen3.5-4B using mlx-lm (Apple Silicon) or Unsloth+TRL (CUDA). 15-40 min on M4 Mac mini, 10-25 min on RTX 3090.
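For the mlx-lm path, the stock LoRA entry point covers Phase 3. The flags below are real, but the values are a plausible starting point rather than the author's exact settings, and the model path is a placeholder:

```shell
# Train LoRA adapters on a local JSONL dataset (train.jsonl / valid.jsonl
# inside ./data). <path-to-4b-model> stands in for whatever 4B checkpoint
# you have downloaded locally.
python -m mlx_lm.lora \
    --model <path-to-4b-model> \
    --train \
    --data ./data \
    --batch-size 4 \
    --iters 600
```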
Phase 4 — Fuse and serve locally. mlx-lm on Apple Silicon, GGUF + Ollama on any platform.
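On the mlx-lm side, fusing and a quick sanity check look roughly like this (paths are placeholders; `--adapter-path` defaults to where Phase 3 wrote the adapters):

```shell
# Merge the LoRA adapters into the base weights for standalone serving.
python -m mlx_lm.fuse \
    --model <path-to-4b-model> \
    --adapter-path adapters \
    --save-path fused-model

# Sanity-check the fused model before wiring it into anything.
python -m mlx_lm.generate \
    --model fused-model \
    --prompt "What is the total balance?"
```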
Key observations:
– RAG alone doesn’t fix schema hallucination in smaller models — LoRA is needed for structural consistency
– The annotation quality ceiling matters more than example count past ~100 samples
– Fine-tuned 4B models outperform untuned 70B models on narrow domain tasks in my testing
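To put a number on the schema-hallucination observation, one crude metric is to count SQL identifiers that don't exist in the schema. This is my own rough sketch, not the author's evaluation code; the keyword list and tokenization are deliberately simplified and it is not a SQL parser:

```python
import re

# Minimal keyword list -- enough for simple SELECT queries; extend as
# needed for your query mix.
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "sum", "avg",
    "count", "min", "max", "as", "and", "or", "limit", "desc", "asc",
}

def hallucinated_identifiers(sql: str, known: set) -> set:
    """Return identifiers in the SQL that are neither keywords nor known
    schema names (tables or columns). Numbers and operators are ignored."""
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", sql))
    return {t for t in tokens
            if t.lower() not in SQL_KEYWORDS and t not in known}
```

Running this over tuned vs. untuned model outputs gives a per-query hallucination count you can average across an eval set.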
Built a working implementation with a finance coach example. Curious if others have found better approaches to the annotation phase specifically — that feels like the biggest lever.
submitted by /u/sandseb123