Too Long; Didn't Read

Researchers introduce Deep Lake, an open-source lakehouse for deep learning, optimizing complex data storage and streaming for deep learning frameworks.
(1) Sasun Hambardzumyan, Activeloop, Mountain View, CA, USA;

(2) Abhinav Tuli, Activeloop, Mountain View, CA, USA;

(3) Levon Ghukasyan, Activeloop, Mountain View, CA, USA;

(4) Fariz Rahman, Activeloop, Mountain View, CA, USA;

(5) Hrant Topchyan, Activeloop, Mountain View, CA, USA;

(6) David Isayan, Activeloop, Mountain View, CA, USA;

(7) Mark McQuade, Activeloop, Mountain View, CA, USA;

(8) Mikayel Harutyunyan, Activeloop, Mountain View, CA, USA;

(9) Tatevik Hakobyan, Activeloop, Mountain View, CA, USA;

(10) Ivo Stranic, Activeloop, Mountain View, CA, USA;

(11) Davit Buniatyan, Activeloop, Mountain View, CA, USA.


We presented Deep Lake, the lakehouse for deep learning. Deep Lake is designed to help deep learning workflows run as seamlessly as analytical workflows run on the Modern Data Stack. Notably, Deep Lake is built to retain prominent features of data lakes, such as time travel, querying, and rapid data ingestion at scale. One important distinction from traditional data lakes is Deep Lake's ability to store unstructured data, together with all its metadata, in a deep-learning-native columnar format, which enables rapid data streaming. This makes it possible to materialize data subsets on the fly, visualize them in the browser, or ingest them into deep learning frameworks without sacrificing GPU utilization. Finally, we showed through multiple benchmarks that Deep Lake achieves state-of-the-art performance for deep learning on large datasets.


The authors would like to thank Richard Socher, Travis Oliphant, Charu Rudrakshi, Artem Harutyunyan, Iason Ofeidis, Diego Kiedanski, Vishnu Nair, Fayaz Rahman, Dyllan McCreary, Benjamin Hindman, Eduard Grigoryan, Kristina Grigoryan, Ben Chislett, Joubin Houshyar, Andrii Liubimov, Assaf Pinhasi, Eshan Arora, Shashank Agarwal, Pawel Janowski, Kristina Arezina, Gevorg Karapetyan, Vigen Sahakyan, and the open-source community, including its contributors. The project was funded by Activeloop. We also thank the CIDR reviewers for their feedback.


This paper is available on arXiv under the CC 4.0 license.