Implementation Details of Tree-Diffusion: Architecture and Training for Inverse Graphics

Written by photosynthesis | Published 2025/09/27
Tech Story Tags: pytorch | nf-resnet | image-encoder | program-synthesis | autoregressive-model | tree-diffusion | inverse-graphics | csgnet

TL;DR: This article provides the technical implementation details of the Tree-Diffusion architecture using PyTorch and NF-ResNet.

Table of Links

Abstract and 1. Introduction

2. Background & Related Work

3. Method

  3.1 Sampling Small Mutations

  3.2 Policy

  3.3 Value Network & Search

  3.4 Architecture

4. Experiments

  4.1 Environments

  4.2 Baselines

  4.3 Ablations

5. Conclusion, Acknowledgments and Disclosure of Funding, and References

Appendix

A. Mutation Algorithm

B. Context-Free Grammars

C. Sketch Simulation

D. Complexity Filtering

E. Tree Path Algorithm

F. Implementation Details

F Implementation Details

We implement our architecture in PyTorch [1]. For our image encoder we use the NF-ResNet26 [4] implementation from the open-source library by Wightman [38]. Images are of size 128 × 128 × 1 for CSG2D and 128 × 128 × 3 for TinySVG. We pass the current and target images as a stack of image planes into the image encoder, and we also provide the absolute difference between the current and target images as additional planes.
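As a concrete illustration, here is a minimal sketch of how that encoder input could be assembled, assuming the timm library (Wightman's package) and its nf_resnet26 model; the channel counts follow the text (1 plane per image for CSG2D, 3 for TinySVG):

```python
import torch
import timm

C = 3  # planes per image: 3 for TinySVG, 1 for CSG2D

# Current and target renders plus their absolute difference are stacked
# along the channel dimension, giving 3 * C input planes.
encoder = timm.create_model(
    "nf_resnet26",
    pretrained=False,
    in_chans=3 * C,   # current + target + |current - target|
    num_classes=0,    # return pooled features instead of class logits
)

current = torch.rand(8, C, 128, 128)  # batch of current renders
target = torch.rand(8, C, 128, 128)   # batch of target renders
planes = torch.cat([current, target, (current - target).abs()], dim=1)

features = encoder(planes)  # pooled image embedding, shape (8, feature_dim)
print(features.shape)
```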

For the autoregressive (CSGNet) baseline, we trained the model to output ground-truth programs from target images and provided a blank current image. For tree-diffusion methods, we initialized the search and rollouts with the output of the autoregressive model, which counted as a single node expansion. For our re-implementation of Ellis et al. [11], we flattened the CSG2D tree into a left-to-right sequence of shapes to be added. We then randomly sampled a position in this shape array, compiled the output up to the sampled position, and trained the model to output the next shape using constrained grammar decoding.
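Constrained grammar decoding can be implemented by masking the model's logits so that only grammar-valid tokens can be chosen. Below is a minimal sketch; `valid_next_tokens` is a hypothetical helper that queries the context-free grammar for the token ids allowed after the current partial program, and the vocabulary size is illustrative:

```python
import torch

def constrained_decode_step(logits, prefix, valid_next_tokens):
    """Greedily pick the highest-scoring token the grammar allows."""
    mask = torch.full_like(logits, float("-inf"))
    mask[list(valid_next_tokens(prefix))] = 0.0  # unmask grammar-valid tokens
    return int(torch.argmax(logits + mask))

# Illustrative usage with a toy 100-token vocabulary and a stub grammar.
logits = torch.randn(100)
next_token = constrained_decode_step(
    logits, prefix=["Circle", "("], valid_next_tokens=lambda p: [3, 7, 42]
)
```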

This is a departure from the pointer network architecture in their work. We attribute the performance difference between their reported results and our re-implementation to the lack of prior shaping, the departure from their graphics-specific pointer network, and the absence of reinforcement-learning fine-tuning. Our method requires none of these additional components, which makes the comparison fairer. For tree-diffusion search, we used a beam size of 64 with a maximum budget of 5000 node expansions.
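A minimal sketch of such a budgeted beam search appears below; `propose_edits` and `score` are hypothetical stand-ins for the tree-diffusion policy's edit proposals and the value network, and the autoregressive initialization is counted as one expansion, as described above:

```python
import heapq

def beam_search(init_program, propose_edits, score, beam_size=64, budget=5000):
    """Beam search under a fixed node-expansion budget (sketch)."""
    expansions = 1  # the autoregressive initialization counts as one expansion
    beam = [(score(init_program), init_program)]
    best = beam[0]
    while beam and expansions < budget:
        candidates = []
        for _, prog in beam:
            if expansions >= budget:
                break
            expansions += 1  # each expanded node counts against the budget
            for child in propose_edits(prog):
                candidates.append((score(child), child))
        if not candidates:
            break
        # Keep the top-scoring `beam_size` candidates for the next round.
        beam = heapq.nlargest(beam_size, candidates, key=lambda t: t[0])
        best = max(best, beam[0], key=lambda t: t[0])
    return best[1]  # highest-value program found within the budget
```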

Authors:

(1) Shreyas Kapur, University of California, Berkeley ([email protected]);

(2) Erik Jenner, University of California, Berkeley ([email protected]);

(3) Stuart Russell, University of California, Berkeley ([email protected]).


This paper is available on arxiv under CC BY-SA 4.0 DEED license.

