The Importance of a Feedback Loop: An Ablation Study on Neural Code Generation

Written by photosynthesis | Published 2025/09/25
Tech Story Tags: neural-networks | tree-diffusion | code-generation | program-synthesis | repl | feedback-loops | neural-code-generation | reverse-mutation

TL;DR: This article presents an ablation study on the Tree-Diffusion model to evaluate the impact of its key design decisions.

Table of Links

Abstract and 1. Introduction

2. Background & Related Work

3. Method

    3.1 Sampling Small Mutations

    3.2 Policy

    3.3 Value Network & Search

    3.4 Architecture

4. Experiments

    4.1 Environments

    4.2 Baselines

    4.3 Ablations

5. Conclusion, Acknowledgments and Disclosure of Funding, and References

Appendix

A. Mutation Algorithm

B. Context-Free Grammars

C. Sketch Simulation

D. Complexity Filtering

E. Tree Path Algorithm

F. Implementation Details

4.3 Ablations

To understand the impact of our design decisions, we performed ablation studies on the simplified Rainbow environment using a smaller transformer model.

First, we examined the effect of removing the current image (no REPL) from the policy network’s input. As shown in Figure 5(a), this drastically hindered performance, confirming the importance of a REPL-like interface, as observed by Ellis et al. [11].
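To make this ablation concrete, here is a minimal PyTorch sketch of how the policy’s visual input changes when the current rendered image is withheld. The layer sizes, class name, and `use_repl` flag are illustrative assumptions, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn

class PolicyImageEncoder(nn.Module):
    """Illustrative encoder for the ablation: with use_repl=True the policy sees
    both the target image and the current rendered image; with use_repl=False
    (the "no REPL" ablation) it sees only the target."""

    def __init__(self, d_model: int = 256, use_repl: bool = True):
        super().__init__()
        self.use_repl = use_repl
        in_channels = 6 if use_repl else 3  # two stacked RGB images vs. one
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )

    def forward(self, target_img: torch.Tensor, current_img: torch.Tensor) -> torch.Tensor:
        # target_img, current_img: (batch, 3, H, W) renderings
        if self.use_repl:
            x = torch.cat([target_img, current_img], dim=1)
        else:
            x = target_img  # ablated: no feedback from the current program's output
        return self.cnn(x)
```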

Next, we investigated the necessity of our reverse mutation path algorithm. While training on the last mutation step alone provides a valid path, it introduces noise by potentially targeting suboptimal intermediate states. Figure 5(a) demonstrates that utilizing the reverse mutation path significantly improves performance, particularly in finding solutions with fewer steps. However, both methods eventually reach similar performance levels, suggesting that a noisy path, while less efficient, can still lead to a solution.
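The difference between the two supervision schemes can be sketched roughly as follows. The helpers `apply`, `inverse`, and `tree_edit_path` are assumed interfaces for illustration, not the authors’ implementation.

```python
def make_training_target(clean_prog, mutations, use_path, tree_edit_path):
    """Sketch of the two supervision schemes compared in the ablation.

    `mutations` is the list of forward mutations applied to `clean_prog`,
    producing a noisy program. `tree_edit_path(a, b)` is assumed to return
    the sequence of small edits transforming program a into program b.
    """
    noisy_prog = clean_prog
    for m in mutations:
        noisy_prog = m.apply(noisy_prog)

    if use_path:
        # Reverse mutation path: supervise on the first edit of the computed
        # path from the noisy program straight back to the clean program.
        target_edit = tree_edit_path(noisy_prog, clean_prog)[0]
    else:
        # Last-mutation baseline: simply undo the most recent forward mutation,
        # even if that points toward a suboptimal intermediate program.
        target_edit = mutations[-1].inverse()
    return noisy_prog, target_edit
```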

Finally, we explored whether the incremental noise process is crucial, given our tree edit path algorithm. Couldn’t we directly sample two random expressions, calculate the path, and train the network to imitate it? We varied the training data composition between pure forward diffusion (ρ = 0.0) and pure random initialization (ρ = 1.0) as shown in Figure 5(b). We found that a small proportion (ρ = 0.2) of pure random initializations combined with forward diffusion yielded the best results. This suggests that forward diffusion provides a richer training distribution around target points, while random initialization teaches the model to navigate the program space more broadly. The emphasis on fine-grained edits from forward diffusion proves beneficial for achieving exact pixel matches in our evaluations.
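As a rough illustration of how the training mixture could be assembled, the sketch below draws a pure random initialization with probability ρ and a forward-diffusion corruption otherwise. `sample_program` and `mutate` are hypothetical helpers for the grammar in use, and the mutation count is arbitrary.

```python
import random

def sample_training_pair(rho, sample_program, mutate, num_mutations=5):
    """Mix forward diffusion and pure random initialization with probability rho."""
    target = sample_program()
    if random.random() < rho:
        # Pure random initialization: start from an unrelated program.
        start = sample_program()
    else:
        # Forward diffusion: start from a slightly mutated copy of the target.
        start = target
        for _ in range(num_mutations):
            start = mutate(start)
    # The policy is then trained to edit `start` toward `target`
    # along the computed tree edit path.
    return start, target
```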

Authors:

(1) Shreyas Kapur, University of California, Berkeley ([email protected]);

(2) Erik Jenner, University of California, Berkeley ([email protected]);

(3) Stuart Russell, University of California, Berkeley ([email protected]).


This paper is available on arxiv under CC BY-SA 4.0 DEED license.

