[P] Update to my Neuromorphic Architecture!

I’ve been working on my neuromorphic architectures quite a lot over the past few months, to the point where I have started a company, here is what I have made so far:

N1 — Loihi 1 feature parity. 128 cores, 1,024 neurons per core, 131K synapses per core, 8×16 mesh network-on-chip. 96 simulation tests passing. Basic STDP learning. Got it running on FPGA to validate the architecture worked.

N2 — Loihi 2 feature parity. Same 128-core topology but with a programmable 14-opcode microcode learning engine, three-factor eligibility learning with reward modulation, variable-precision synaptic weights, and graded spike support. 3,091 verification tests across CPU, GPU, and FPGA backends. 28 out of 28 hardware tests passing on AWS F2 (f2.6xlarge). Benchmark results competitive with published Intel Loihi numbers — SHD 90.7%, N-MNIST 99.2%, SSC 72.1%, GSC 88.0%.

N3 — Goes beyond Loihi 2. 128 cores across 16 tiles (8 cores per tile), 4,096 neurons per core at 24-bit precision scaling up to 8,192 at 8-bit — 524K to 1.05M physical neurons. Time-division multiplexing with double-buffered shadow SRAM gives x8 virtual scaling, so up to 4.2M virtual neurons at 24-bit or 8.4M at 8-bit. Async hybrid NoC (synchronous cores, asynchronous 4-phase handshake routers with adaptive routing), 4-level memory hierarchy (96 KB L1 per core, 1 MB shared L2 per tile, DRAM-backed L3, CXL L4 for multi-chip), ~36 MB total on-chip SRAM. Learning engine expanded to 28 opcodes with 4 parallel threads and 6 eligibility traces per neuron. 8 neuron models — 7 hardwired (LIF, ANN INT8, winner-take-all, adaptive LIF, sigma-delta, gated, graded) plus a fully programmable one driven by microcode. Hardware short-term plasticity, metaplasticity, and homeostatic scaling all at wire speed. NeurOS hardware virtualization layer that can schedule 680+ virtual networks with ~20-40 us context switches. Multi-chip scales to 4,096 cores and 134M virtual neurons. 1,011+ verification tests passing. 19 out of 19 hardware tests passing on AWS F2. Running at 14,512 timesteps/sec on an 8-core configuration at 62.5 MHz.

The whole thing is written in Verilog from scratch — RTL, verification testbenches, etc. Python SDK handles compilation, simulation, and FPGA deployment.

Happy to answer questions about the FPGA side — synthesis, timing closure on F2, verification methodology, etc. None of these are open source but I plan to make these openly accessible for anyone to test and use, but if you email me directly at henry@catalyst-neuromorphic.com I would be happy to arrange access to all three architectures for free via a cloud api build!

If anyone has any tips on how to acquire funding it would be much appreciated as I hope I can eventually tape these out!

submitted by /u/Mr-wabbit0
[link] [comments]

Liked Liked