Hamilton-Jacobi-Bellman Equations and Reinforcement Learning: A Theoretical Framework and Empirical Study for Dynamic Credit Decision-Making
Traditional credit scoring models reduce decisions to static classification, ignoring dynamic risk evolution and long-term profit. This paper integrates the Hamilton-Jacobi-Bellman (HJB) equation with deep reinforcement learning, reformulating credit risk as a discrete-time stochastic optimal control problem. Theoretically, we establish equivalence between discrete Markov decision processes and the HJB equation, prove existence and uniqueness of the optimal value function, derive the closed-form Riccati solution under linear-quadratic assumptions, and show that neural-network value iteration is an effective numerical scheme with separable errors. Empirically, using LendingClub data (2016–2018), the HJB-based PPO model significantly outperforms all static baseline models considered (e.g., logistic regression, random forest, XGBoost) in average profit (1.5167) and total profit (786,700.4682). In ablation experiments, replacing the policy network with a linear mapping reduces profit by 34.7%, confirming the necessity of nonlinear approximation. Theoretical validation gives a mean squared error of 0.0006 between the neural value function and the Riccati solution. This work provides a rigorous mathematical foundation for reinforcement learning in financial risk control and a path from static classification to dynamic optimization in credit scoring.
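To illustrate the linear-quadratic benchmark mentioned in the abstract, the sketch below iterates the standard discrete-time Riccati recursion to its fixed point, the closed-form optimal value function V(x) = xᵀPx against which a learned value function can be checked. The matrices A, B, Q, R are arbitrary stand-ins for illustration, not the paper's calibrated credit-risk model.

```python
import numpy as np

# Illustrative linear-quadratic setup (assumed, not from the paper):
# dynamics x_{t+1} = A x_t + B u_t, per-step cost x^T Q x + u^T R u.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

# Value iteration on the Riccati recursion:
#   P <- Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
P = np.zeros((2, 2))
for _ in range(1000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain
    P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

# Residual of the discrete-time algebraic Riccati equation at P;
# a near-zero residual confirms convergence to the fixed point.
residual = (Q + A.T @ P @ A
            - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
            - P)
print(np.max(np.abs(residual)))
```

A neural value function trained by value iteration can then be compared to xᵀPx on sampled states, mirroring the paper's MSE-based theoretical validation.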