Reinforcement Learning-Guided Input Scheduling for Kernel Fuzzing

In this study, we propose an RL-guided fuzzing scheduler that learns mutation ordering and seed prioritization from kernel coverage reward signals. The agent observes execution depth, subsystem transitions, and historical crash density, and adapts its exploration strategy accordingly. On Linux 5.10, the RL-fuzzer triggers 22% more unique crashes and reaches 31% more deep execution paths than AFL-style schedulers, and it identifies 7 previously unknown vulnerabilities, including mismanaged capability checks. Despite the additional overhead of RL inference, throughput remains within 85% of baseline fuzzers. These results demonstrate the feasibility of applying RL-based policy learning to kernel fuzzing orchestration.
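As a rough illustration of the scheduling idea described above, the sketch below implements a minimal epsilon-greedy bandit that prioritizes seeds by the coverage reward their mutations have historically produced. This is a hypothetical simplification, not the paper's implementation: the class name, the reward shaping (new edges plus a deep-path bonus), and the parameter values are all assumptions for illustration.

```python
import random

class RLSeedScheduler:
    """Hypothetical sketch of an RL-guided seed scheduler: an
    epsilon-greedy bandit that learns a per-seed value estimate
    from coverage reward feedback. Not the paper's implementation."""

    def __init__(self, seeds, epsilon=0.1, alpha=0.5):
        self.seeds = list(seeds)
        self.epsilon = epsilon  # exploration probability
        self.alpha = alpha      # learning rate for the running reward estimate
        self.value = {s: 0.0 for s in self.seeds}  # estimated coverage reward

    def select_seed(self):
        # Explore a random seed with probability epsilon; otherwise
        # exploit the seed with the highest estimated reward.
        if random.random() < self.epsilon:
            return random.choice(self.seeds)
        return max(self.seeds, key=lambda s: self.value[s])

    def update(self, seed, new_edges, deep_path_bonus=0.0):
        # Reward = newly covered edges plus an (assumed) bonus for
        # reaching deep paths, folded into a running average.
        reward = new_edges + deep_path_bonus
        self.value[seed] += self.alpha * (reward - self.value[seed])

if __name__ == "__main__":
    sched = RLSeedScheduler(["seed_a", "seed_b", "seed_c"], epsilon=0.0)
    # Simulate feedback: seed_b's mutations keep discovering new edges.
    sched.update("seed_b", new_edges=12)
    sched.update("seed_a", new_edges=1)
    print(sched.select_seed())  # with epsilon=0, picks the highest-value seed
```

In a real kernel fuzzer the state would be richer (execution depth, subsystem transitions, crash density, as the abstract describes) and the policy would be learned rather than tabular, but the feedback loop — select, mutate, observe coverage, update — has the same shape.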