Grammar-Guided Incremental Method for Efficient LLM-Generated Code Execution

Rapid advances in large language models with code generation abilities have enabled new paradigms in automated software development, positioning AI both as a coding assistant and as an active actor within complex software ecosystems. Traditional code generation pipelines, which mostly rely on tool calling via the ReAct approach, require a complete code snippet to be generated before validation and correction can begin, often incurring significant latency and resource overhead due to sequential inference and execution. This research introduces a novel asynchronous inference algorithm that integrates context-free grammar parsing with real-time REPL-based execution, enabling early detection of syntax, semantic, and runtime errors without waiting for the entire code snippet to be generated. We formally define suitability criteria for LLMs in a target programming language, establish parse-tree-based identification of top-level statements, and present an incremental buffer-parsing mechanism that triggers execution upon recognition of each complete statement. Implemented for Python 3 using the Lark parser and evaluated on a modified MBPP split ($N{=}113$ tasks; dataset and prompts in the Appendix) across six models—CodeAct–Mistral, GPT-OSS~20B, Gemma~3, Llama~3.2, Phi~4, and Qwen3-Coder~30B—our method is compared against a synchronous baseline using paired Wilcoxon tests with Bonferroni correction. Empirical results show significantly faster time-to-first-output for every model, large reductions in total latency where top-level script execution dominates (up to roughly an order of magnitude for CodeAct–Mistral), and no material change in pass or correctness rates, indicating that incremental execution improves responsiveness without altering task outcomes. With special prompting or finetuning, the method achieves up to a 4× reduction in latency for valid code generation.
The benchmark results confirm that the constraints of synchronous inference can be alleviated through grammar-guided incremental execution, enabling more efficient and responsive agent-driven code execution workflows. Future research will explore predictive parsing techniques, deeper integration with agentic system architectures, security constraints, and runtime requirements for scalable deployment of LLM-generated code execution environments.
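The incremental buffer-parsing idea described in the abstract can be illustrated with a short sketch. This simplified version uses Python's stdlib `ast` module rather than the paper's Lark grammar, and it only acts when the whole buffer parses: at that point every top-level statement except the last is provably complete (a subsequent dedented line has begun) and can be executed immediately. A true grammar-guided incremental parser, as in the paper, can additionally act on valid prefixes that `ast.parse` rejects. The function name and chunking are illustrative, not the paper's API.

```python
import ast

def incremental_execute(chunks):
    """Execute streamed Python source statement-by-statement, running each
    top-level statement as soon as it is provably complete (sketch)."""
    buffer, namespace, done = "", {}, 0
    for chunk in chunks:
        buffer += chunk
        try:
            tree = ast.parse(buffer)
        except SyntaxError:
            continue  # buffer ends mid-statement; keep accumulating tokens
        # All statements before the last are final: a later top-level
        # (dedented) line has started, so they can no longer grow.
        ready = tree.body[done:len(tree.body) - 1]
        if ready:
            module = ast.Module(body=ready, type_ignores=[])
            exec(compile(module, "<stream>", "exec"), namespace)
            done += len(ready)
    # Stream ended: the remaining statements are complete; flush them.
    # (Raises SyntaxError if the stream ends mid-statement.)
    tail = ast.Module(body=ast.parse(buffer).body[done:], type_ignores=[])
    exec(compile(tail, "<stream>", "exec"), namespace)
    return namespace
```

For example, feeding the chunks `["x = 1\n", "y = x ", "+ 1\n", "z = y * 2\n"]` executes `x = 1` as soon as the second statement begins, without waiting for the rest of the snippet; a runtime error in an early statement would surface before later tokens are even generated, which is the latency saving the abstract reports.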
