Table of Links

Abstract and 1. Introduction
2. Background
2.1 Amortized Stochastic Variational Bayesian GPLVM
2.2 Encoding Domain Knowledge through Kernels
3. Our Model and 3.1 Pre-Processing and Likelihood
3.2 Encoder
4. Results and Discussion and 4.1 Each Component is Crucial to Modified Model Performance
4.2 Modified Model achieves Significant Improvements over Standard Bayesian GPLVM and is Comparable to SCVI
4.3 Consistency of Latent Space with Biological Factors
4. Conclusion, Acknowledgement, and References
A. Baseline Models
B. Experiment Details
C. Latent Space Metrics
D. Detailed Metrics

4.2 MODIFIED MODEL ACHIEVES SIGNIFICANT IMPROVEMENTS OVER STANDARD BAYESIAN GPLVM AND IS COMPARABLE TO SCVI

We compare our proposed model with three benchmark models: OBGPLVM, the current state-of-the-art scVI (Lopez et al., 2018) (Appendix A.1), and a simplified scVI model with a linear decoder (LDVAE) (Svensson et al., 2020) (Appendix A.2) on the synthetic dataset and a real-world COVID-19 dataset (Stephenson et al., 2021).

The UMAP plots for the COVID dataset are presented in Figure 3, and the detailed latent space metrics and UMAP plots are given in Appendix D. Based on the UMAP visualizations, we observe that for both the simulated and COVID datasets, the modified BGPLVM achieves more visually separated cell types and better-mixed batches than the standard Bayesian GPLVM. The model also produces visualizations visually comparable to those of scVI and LDVAE (Figures 7 and 3). While the modified model may not outperform scVI and LDVAE, the GPLVM offers a more intuitive way to encode prior domain knowledge, and exploring kernels and likelihoods more tailored to specific datasets is left for future work.

This paper is available on arXiv under CC BY-SA 4.0 DEED license.
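The comparison above is qualitative (visual UMAP inspection); the paper's quantitative latent-space metrics are deferred to Appendices C and D and are not reproduced here. As an illustrative sketch only, the two notions being judged by eye — cell-type separation and batch mixing — can both be quantified with silhouette scores on synthetic latent coordinates (the cluster layout and score thresholds below are hypothetical, not the paper's data or metric):

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for a model's latent space: 300 cells in a 2-D
# latent space, drawn from three well-separated clusters ("cell types").
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
cell_type = rng.integers(0, 3, size=300)
latent = centers[cell_type] + rng.normal(scale=0.5, size=(300, 2))

# Cell-type separation: a higher silhouette score means latent clusters
# align better with the cell-type labels.
type_score = silhouette_score(latent, cell_type)

# Batch mixing: assign random "batch" labels. In a well-mixed latent
# space, batches are indistinguishable, so the score sits near zero.
batch = rng.integers(0, 2, size=300)
batch_score = silhouette_score(latent, batch)

print(f"cell-type silhouette: {type_score:.2f}")  # high is good
print(f"batch silhouette:     {batch_score:.2f}")  # near zero is good
```

In practice, the same two scores computed on each model's latent means would turn the "more separated cell types, better-mixed batches" comparison into numbers rather than a visual judgment.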
Authors:

(1) Sarah Zhao, Department of Statistics, Stanford University (smxzhao@stanford.edu);
(2) Aditya Ravuri, Department of Computer Science, University of Cambridge (ar847@cam.ac.uk);
(3) Vidhi Lalchand, Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard (vidrl@mit.edu);
(4) Neil D. Lawrence, Department of Computer Science, University of Cambridge (ndl21@cam.ac.uk).