[P] I Trained a Language Model on CPU for 40 Hours – It Beat the GPU Baseline
If you have been following this project, you may recall FlashLM v3, then v4 “Bolt”, and v5.2 “Nova-Ignition”. I am pleased to announce that FlashLM v5 “Thunderbolt” is now complete.
Results
| Metric | Value |
|---|---|
| Final PPL | 1.36 |
| Final BPC | 0.44 |
| Parameters | 29.7M (26.5M ternary) |
| Training Time | ~40 hours |
| Hardware | AMD Ryzen 7950X3D |
FlashLM v5 achieves a validation perplexity of 1.36, beating the TinyStories-1M baseline (PPL 1.59). This is the first time a CPU-trained model has beaten this baseline.
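The reported PPL and BPC figures are consistent with each other if perplexity is measured per character, in which case BPC is simply log2(PPL). A quick sanity check (this assumes character-level perplexity, which the post does not state explicitly):

```python
import math

# If perplexity is per-character, bits-per-character = log2(PPL).
ppl = 1.36
bpc = math.log2(ppl)
print(round(bpc, 2))  # 0.44, matching the reported BPC
```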
Architecture
FlashLM v5 utilizes ParallelGatedRecurrence, a MatMul-free architecture featuring:
- BitLinear with ternary weights {-1, 0, +1}
- Parallel gated recurrence with learned decay gates
- No matrix multiplications in the forward pass
- Total parameters: 29,750,784
- Ternary: 26,542,080 (89%)
- Float: 3,208,704 (11%)
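This is not the repository's actual implementation, just a minimal NumPy sketch of the two core ideas, assuming BitNet-style absmean quantization for the ternary weights and a first-order decay gate for the recurrence (the function names `ternary_quantize` and `gated_recurrence` are illustrative, not FlashLM's API):

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    # Absmean-style quantization: scale by the mean |w|, then round
    # each weight to the nearest value in {-1, 0, +1}.
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

def gated_recurrence(x, g):
    # Element-wise gated recurrence: h_t = g_t * h_{t-1} + (1 - g_t) * x_t.
    # Note there is no matrix multiplication here: the recurrence is
    # purely element-wise over the hidden dimension.
    h = np.zeros(x.shape[1])
    out = []
    for t in range(x.shape[0]):
        h = g[t] * h + (1 - g[t]) * x[t]
        out.append(h)
    return np.stack(out)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
q, scale = ternary_quantize(w)
print(sorted(np.unique(q)))  # a subset of {-1.0, 0.0, 1.0}
```

With ternary weights, a "matmul" against `q` reduces to additions and subtractions of activations, which is what makes CPU training competitive here.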
Acknowledgments
I would like to thank arki05 for providing the AMD Ryzen 7950X3D used for training. Without this contribution, the project would not have been possible.
Generation Comparison
| Version | PPL | BPC | Output Quality |
|---|---|---|---|
| v4 “Bolt” | 15.05 | 0.88 | Short, repetitive |
| v5.2 “Nova-Ignition” | 10.56 | 0.78 | Better coherence |
| v5 “Thunderbolt” | 1.36 | 0.44 | Significantly better |
Analysis:
- v5 demonstrates more cohesive storytelling than v4 and v5.2
- v5 shows better vocabulary diversity and grammar
- BPC improved from 0.88 (v4) to 0.44 (v5), representing a 2x improvement
- PPL improved from 15.05 (v4) to 1.36 (v5), representing an 11x improvement
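The improvement factors above follow directly from the comparison table; a quick arithmetic check:

```python
# Metrics for v4 "Bolt" vs. v5 "Thunderbolt", taken from the table above.
ppl_v4, ppl_v5 = 15.05, 1.36
bpc_v4, bpc_v5 = 0.88, 0.44

print(round(bpc_v4 / bpc_v5, 2))  # 2.0   -> the "2x" BPC improvement
print(round(ppl_v4 / ppl_v5, 2))  # 11.07 -> the "11x" PPL improvement
```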
Samples
Prompt: “Once upon a time, there was a brave girl named Lucy.”
Once upon a time, there was a brave girl named Lucy. her big tiny looked door, and she wanted. Lucy loved to creative things. She would find toy when, while small laughing, when she thought. She would be friends all day.One day, Lucy found her toy saw a little hole. Lucy was very happy. She wanted to see who was mean. The little hole was not alone anymore. When Lucy was done playing, she saw the little…
Links
- Live Demo: https://huggingface.co/spaces/changcheng967/flashlm-v5-demo
- Model Card: https://huggingface.co/changcheng967/flashlm-v5-thunderbolt
- GitHub: https://github.com/changcheng967/FlashLM
Future Directions
FlashLM v5 concludes the v5 series. Future work includes:
- FlashLM v6 – Continuing to validate the ParallelGatedRecurrence architecture
- Nano-Coder (NC series) – Applying FlashLM techniques to code generation
submitted by /u/Own-Albatross868