Table of Links
Appendix
3.2 Policy
3.2.1 Forward Process
3.2.2 Reverse Mutation Paths
Since we have access to the ground-truth mutations, we can generate targets to train a neural network by simply reversing the sampled trajectory through the forward process Markov-Chain, z0 → z1 → . . .. At first glance, this may seem a reasonable choice. However, training to simply invert the last mutation can potentially create a much noisier signal for the neural network.
Consider the case where, within a much larger syntax tree, a color was mutated as,
Authors:
(1) Shreyas Kapur, University of California, Berkeley ([email protected]);
(2) Erik Jenner, University of California, Berkeley ([email protected]);
(3) Stuart Russell, University of California, Berkeley ([email protected]).
This paper is