Controllable Symbolic Music Generation via Stage-Aware Style Routing and Differentiable Melody Regularization
Controllable symbolic music generation must preserve a reference melody while remaining responsive to style prompts. Existing hierarchical diffusion systems typically reuse a single shared condition vector across the harmony, rhythm, and timbre stages, which can entangle stylistic factors and weaken melody preservation. We present HCDMG++, a hierarchical diffusion framework that addresses these two limitations through Stage-Aware Style Routing and Differentiable Melody Regularization. The routing module uses a residual multilayer perceptron (MLP) to project text-derived style embeddings into stage-specific subspaces, while the regularization branch aligns soft pitch histograms and contour trajectories with the conditioning melody during training. We evaluate the integrated system on a 384-sample benchmark covering four melodies, eight styles, four random seeds, and three denoising budgets (4 × 8 × 4 × 3 = 384 runs). HCDMG++ produces valid four-track outputs in all runs and reaches a peak pitch-histogram similarity of 0.508 under a 64-step budget. A matched comparison in a legacy-compatible reference setting further shows substantially stronger pitch-histogram alignment for HCDMG++ than for Legacy-HCDMG. These results indicate that stage-specific conditioning and differentiable structural guidance improve controllability in symbolic music diffusion.
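To make the notion of a soft pitch histogram concrete, the following is a minimal sketch and not the HCDMG++ implementation. It assumes a 12-bin pitch-class histogram, per-step pitch-class logits from the generator, and cosine similarity as the alignment measure; all three choices, along with every function name, are illustrative assumptions rather than details stated above.

```python
import numpy as np

def soft_pitch_histogram(logits):
    """Average the per-step softmax over pitch classes into a 12-bin soft histogram.

    logits: array of shape (T, 12) holding hypothetical per-step pitch-class scores.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)

def hard_pitch_histogram(midi_pitches):
    """Normalized pitch-class counts of a reference melody given as MIDI note numbers."""
    counts = np.bincount(np.asarray(midi_pitches) % 12, minlength=12).astype(float)
    return counts / counts.sum()

def histogram_similarity(p, q, eps=1e-8):
    """Cosine similarity between two histograms (an assumed choice of metric)."""
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q) + eps))

# Hypothetical reference melody (MIDI note numbers) and random stand-in logits.
reference = [60, 62, 64, 65, 67, 67, 64, 60]
target = hard_pitch_histogram(reference)

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 12))
generated = soft_pitch_histogram(logits)

similarity = histogram_similarity(generated, target)
melody_loss = 1.0 - similarity  # lower means closer pitch-class content
```

Because the histogram is built from softmax probabilities rather than hard note counts, the same computation written in an autodiff framework would pass gradients back to the logits, which is what lets a term like this act as a training-time regularizer.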