Table of Links

Abstract and 1. Introduction
Preliminaries and Related Work
Key Bottlenecks in PC Parallelization
Harnessing Block-Based PC Parallelization
    4.1. Fully Connected Sum Layers
    4.2. Generalizing To Practical Sum Layers
    4.3. Efficient Implementations by Compiling PC Layers
    4.4. Analysis: IO and Computation Overhead
Optimizing Backpropagation with PC Flows
Experiments
    6.1. Faster Models with PyJuice
    6.2. Better PCs At Scale
    6.3. Benchmarking Existing PCs
Conclusion, Acknowledgements, Impact Statement, and References
A. Algorithm Details
B. Additional Technical Details
C. Experimental Details
D. Additional Experiments

5. Optimizing Backpropagation with PC Flows

While similar results have been established in a slightly different context (Peharz et al., 2020a), we prove the PC flow equations in Appendix B.2 for completeness.

Another important design choice that leads to a significant reduction in memory footprint is to recompute the product nodes' probabilities in the backward pass instead of storing them all in GPU memory during the forward pass. Specifically, we maintain a scratch space on GPU HBM that can hold the results of the largest product layer. All product layers write their outputs to this same scratch space, and the required product node probabilities are recomputed when requested by a sum layer during backpropagation. Since product layers are extremely fast to evaluate compared to the sum layers (e.g., see the runtime breakdown in Fig. 2), this leads to significant memory savings at the cost of slightly increased computation time.
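To make the scratch-space scheme concrete, below is a minimal PyTorch sketch of the idea. This is not PyJuice's actual implementation: the ProductLayer class, the node indexing, and the buffer sizes are hypothetical stand-ins. The point is that all product layers share one buffer sized for the largest layer, and a product layer's output is simply recomputed whenever the backward pass needs it.

```python
import torch

class ProductLayer:
    """Toy product layer: each product node's log-probability is the
    sum of its children's log-probabilities (a product in linear space)."""

    def __init__(self, child_ids: torch.Tensor):
        self.child_ids = child_ids          # (num_nodes, num_children) indices into the child vector
        self.num_nodes = child_ids.shape[0]

    def forward(self, child_log_probs: torch.Tensor, scratch: torch.Tensor) -> torch.Tensor:
        # Write into the shared scratch buffer instead of allocating
        # per-layer output storage.
        out = scratch[: self.num_nodes]
        out.copy_(child_log_probs[self.child_ids].sum(dim=1))
        return out

# One scratch buffer, sized for the largest product layer in the circuit.
layers = [ProductLayer(torch.randint(0, 8, (n, 2))) for n in (16, 32, 24)]
scratch = torch.empty(max(layer.num_nodes for layer in layers))

child_log_probs = torch.randn(8).log_softmax(dim=0)

# Forward pass: every product layer overwrites the same scratch space; a sum
# layer consumes the outputs before the next product layer runs.
for layer in layers:
    prod_log_probs = layer.forward(child_log_probs, scratch)
    # ... a sum layer would read `prod_log_probs` here ...

# Backward pass: instead of having stored every layer's outputs, recompute
# them on demand. Product layers are cheap relative to sum layers, so this
# trades a little compute for a large reduction in memory footprint.
for layer in reversed(layers):
    prod_log_probs = layer.forward(child_log_probs, scratch)
    # ... backpropagate through the corresponding sum layer using `prod_log_probs` ...
```

Under this scheme, peak memory for product outputs is bounded by the size of the largest product layer rather than the sum over all product layers, which is where the savings come from.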
6. Experiments

We evaluate the impact of using PyJuice to train PC models. In Section 6.1, we compare PyJuice against existing implementations in terms of time and memory efficiency. To demonstrate its generality and flexibility, we evaluate PyJuice on four commonly used dense PC structures as well as highly unstructured and sparse PCs. Next, in Section 6.2, we demonstrate that PyJuice can readily be used to scale up PCs for various downstream applications. Finally, in Section 6.3, we benchmark existing PCs on high-resolution image datasets, hoping to incentivize future research on better PC structures and learning algorithms.

[5] If such nodes exist, we can always collapse them into a single sum or product node.

Authors:

(1) Anji Liu, Department of Computer Science, University of California, Los Angeles, USA (liuanji@cs.ucla.edu);
(2) Kareem Ahmed, Department of Computer Science, University of California, Los Angeles, USA;
(3) Guy Van den Broeck, Department of Computer Science, University of California, Los Angeles, USA.

This paper is available on arxiv under CC BY 4.0 DEED license.