[P] I Trained a Language Model on CPU for 40 Hours – It Beat the GPU Baseline

If you have been following this project, you may recall FlashLM v3, then v4 “Bolt”, and v5.2 “Nova-Ignition”. I am pleased to announce that FlashLM v5 “Thunderbolt” is now complete.

Results

| Metric | Value |
| --- | --- |
| Final PPL | 1.36 |
| Final BPC | 0.44 |
| Parameters | 29.7M (26.5M ternary) |
| Training time | ~40 hours |
| Hardware | AMD Ryzen 7950X3D |

FlashLM v5 achieves a validation perplexity of 1.36, which beats the TinyStories-1M baseline (PPL 1.59). This represents the first instance of a CPU-trained model beating this baseline.
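As a sanity check on the numbers above: BPC and PPL are related by BPC = log2(PPL) / (average characters per token), which for a character-level model reduces to log2(PPL). The post doesn't state the tokenization, but the reported v5 figures line up exactly under a character-level assumption:

```python
import math

# BPC = log2(PPL) / (average characters per token).
# For a character-level model (1 char per token) this is just log2(PPL).
def bpc_from_ppl(ppl, chars_per_token=1.0):
    return math.log2(ppl) / chars_per_token

# Reported v5 numbers (PPL 1.36, BPC 0.44) are consistent
# under the character-level assumption:
print(round(bpc_from_ppl(1.36), 2))  # 0.44
```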

Architecture

FlashLM v5 utilizes ParallelGatedRecurrence, a MatMul-free architecture featuring:

  • BitLinear with ternary weights {-1, 0, +1}
  • Parallel gated recurrence with learned decay gates
  • No matrix multiplications in the forward pass

  • Total parameters: 29,750,784
  • Ternary: 26,542,080 (89%)
  • Float: 3,208,704 (11%)
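For readers unfamiliar with the components above, here is a minimal NumPy sketch of the two main ideas: ternary BitLinear weights (absmean quantization to {-1, 0, +1}, in the style of BitNet b1.58) and a gated recurrence with learned decay. This is an illustrative reconstruction under stated assumptions, not FlashLM's actual code; all function names are hypothetical.

```python
import numpy as np

def ternary_quantize(w, eps=1e-5):
    """Absmean ternary quantization: scale by mean |w|, round, clip to {-1, 0, +1}."""
    scale = np.abs(w).mean() + eps
    return np.clip(np.round(w / scale), -1, 1), scale

def bitlinear(x, w):
    """BitLinear forward: with ternary weights, the 'matmul' only ever
    adds or subtracts entries of x (no true multiplications needed)."""
    wq, scale = ternary_quantize(w)
    return (x @ wq) * scale

def gated_recurrence(x, g):
    """Sequential reference for a gated recurrence with learned decay gates:
        h_t = g_t * h_{t-1} + (1 - g_t) * x_t
    x: (T, d) inputs, g: (T,) decay gates in [0, 1]."""
    h = np.zeros(x.shape[1])
    out = []
    for t in range(x.shape[0]):
        h = g[t] * h + (1 - g[t]) * x[t]
        out.append(h.copy())
    return np.array(out)
```

The ternary weights are what makes the forward pass effectively multiplication-free; a training-time "parallel" version of the recurrence would replace the Python loop with a scan over the decay gates, which is presumably what the ParallelGatedRecurrence name refers to.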

Acknowledgments

I would like to thank arki05 for providing the AMD Ryzen 7950X3D used for training. Without this contribution, the project would not have been possible.

Generation Comparison

| Version | PPL | BPC | Output quality |
| --- | --- | --- | --- |
| v4 “Bolt” | 15.05 | 0.88 | Short, repetitive |
| v5.2 “Nova-Ignition” | 10.56 | 0.78 | Better coherence |
| v5 “Thunderbolt” | 1.36 | 0.44 | Significantly better |

Analysis:

  • v5 demonstrates more cohesive storytelling than v4 and v5.2
  • v5 shows better vocabulary diversity and grammar
  • BPC improved from 0.88 (v4) to 0.44 (v5), representing a 2x improvement
  • PPL improved from 15.05 (v4) to 1.36 (v5), representing an 11x improvement

Samples

Prompt: “Once upon a time, there was a brave girl named Lucy.”

Once upon a time, there was a brave girl named Lucy. her big tiny looked door, and she wanted. Lucy loved to creative things. She would find toy when, while small laughing, when she thought. She would be friends all day.One day, Lucy found her toy saw a little hole. Lucy was very happy. She wanted to see who was mean. The little hole was not alone anymore. When Lucy was done playing, she saw the little…

Future Directions

FlashLM v5 concludes the v5 series. Future work includes:

  1. FlashLM v6 – Continuing to validate the ParallelGatedRecurrence architecture
  2. Nano-Coder (NC series) – Applying FlashLM techniques to code generation

submitted by /u/Own-Albatross868