An Overview of the Data-Loader Landscape: Conclusion, Acknowledgments, and References
by @serialization

Too Long; Didn't Read

In this paper, researchers highlight dataloaders as key to improving ML training, comparing libraries for functionality, usability, and performance.


(1) Iason Ofeidis, Department of Electrical Engineering, and Yale Institute for Network Science, Yale University, New Haven {Equal contribution};

(2) Diego Kiedanski, Department of Electrical Engineering, and Yale Institute for Network Science, Yale University, New Haven {Equal contribution};

(3) Leandros Tassiulas, Department of Electrical Engineering, and Yale Institute for Network Science, Yale University, New Haven;

(4) Levon Ghukasyan, Activeloop, Mountain View, CA, USA.


In this paper, we explored the current landscape of PyTorch libraries that allow machine learning practitioners to load their datasets into their models. These libraries offer a wide array of features, including increased loading speed, views over only a subset of the data, and loading data from remote storage. Of these features, we believe remote loading holds the most promise, since it enables the decoupling of data storage and model training. Even though loading over the public internet is naturally slower than loading from a local disk, some libraries, such as Deep Lake, showed remarkable results (only a 13% increase in time). For the most part, we did not find a considerable difference in performance across libraries; the exceptions were FFCV on multiple GPUs and Deep Lake for networked loading, both of which performed remarkably well. However, we did notice that the documentation for most of these libraries is not readily available or comprehensive, which might result in misconfigured setups. Since good practices are hard to find, a programmer might reuse what works well in one dataloader, which need not work in a new library. At this point, the performance gains do not seem large enough to justify migrating existing code bases for small to medium jobs. For larger jobs, switching to one of the faster libraries could yield significant cost reductions. Finally, we believe that an innovative caching system designed for machine learning applications could be the final piece in realizing the vision of a truly decoupled dataset-model system. Any such approach would have to build on existing knowledge of dataset summarization and active learning.
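To make the caching idea concrete, the following is a minimal sketch of a read-through cache placed between a training loop and a slow sample source. It is not the system the conclusion envisions: it uses a plain least-recently-used (LRU) eviction policy rather than anything informed by dataset summarization or active learning. The class `CachedSampleStore` and the function `fake_remote_fetch` are hypothetical names introduced only for illustration, and the "remote" read is simulated locally.

```python
from collections import OrderedDict
from typing import Any, Callable


class CachedSampleStore:
    """Read-through LRU cache in front of a slow (e.g. remote) sample fetcher.

    Hypothetical illustration only; not part of any library discussed here.
    """

    def __init__(self, fetch: Callable[[int], Any], capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._fetch = fetch
        self._capacity = capacity
        self._cache: "OrderedDict[int, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def __getitem__(self, index: int) -> Any:
        if index in self._cache:
            # Cache hit: mark the sample as most recently used.
            self._cache.move_to_end(index)
            self.hits += 1
            return self._cache[index]
        # Cache miss: fall back to the slow source, then store the result.
        self.misses += 1
        sample = self._fetch(index)
        self._cache[index] = sample
        if len(self._cache) > self._capacity:
            # Evict the least recently used sample.
            self._cache.popitem(last=False)
        return sample


def fake_remote_fetch(index: int) -> bytes:
    """Hypothetical stand-in for a network read of one sample."""
    return f"sample-{index}".encode()


store = CachedSampleStore(fake_remote_fetch, capacity=2)
for i in [0, 1, 0, 2, 1]:
    store[i]
print(f"hits={store.hits} misses={store.misses}")
```

Because the training loop only indexes into the store, the storage backend can change without touching the training code. With uniformly shuffled epochs over a dataset much larger than the cache, an LRU policy gets few hits, which is one reason a cache designed specifically for machine learning access patterns may be needed.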


The authors would like to thank the Activeloop team for their support and insights during the development of this project. The authors would also like to thank both Tryolabs and Activeloop for providing resources to run some of the experiments.


Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL Software available from

Adolf, R., Rama, S., Reagen, B., Wei, G.-Y., and Brooks, D. Fathom: Reference workloads for modern deep learning methods. In 2016 IEEE International Symposium on Workload Characterization (IISWC), pp. 1–10. IEEE, 2016.

Baidu-Research. DeepBench, 2020. URL https://

Ben-Nun, T., Besta, M., Huber, S., Ziogas, A. N., Peter, D., and Hoefler, T. A modular benchmarking infrastructure for high-performance and reproducible deep learning. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 66–77. IEEE, 2019.

Bianco, S., Cadene, R., Celona, L., and Napoletano, P. Benchmark analysis of representative deep neural network architectures. IEEE Access, 6:64270–64277, 2018.

Buslaev, A., Iglovikov, V. I., Khvedchenya, E., Parinov, A., Druzhinin, M., and Kalinin, A. A. Albumentations: fast and flexible image augmentations. Information, 11(2): 125, 2020.

Coleman, C., Kang, D., Narayanan, D., Nardi, L., Zhao, T., Zhang, J., Bailis, P., Olukotun, K., Ré, C., and Zaharia, M. Analysis of DAWNBench, a time-to-accuracy machine learning performance benchmark. ACM SIGOPS Operating Systems Review, 53(1):14–25, 2019.

Gao, W., Tang, F., Zhan, J., Lan, C., Luo, C., Wang, L., Dai, J., Cao, Z., Xiong, X., Jiang, Z., et al. Aibench: An agile domain-specific benchmarking methodology and an ai benchmark suite. arXiv preprint arXiv:2002.07162, 2020.

Hadidi, R., Cao, J., Xie, Y., Asgari, B., Krishna, T., and Kim, H. Characterizing the deployment of deep neural networks on commercial edge devices. In 2019 IEEE International Symposium on Workload Characterization (IISWC), pp. 35–48. IEEE, 2019.

Hambardzumyan, S., Tuli, A., Ghukasyan, L., Rahman, F., Topchyan, H., Isayan, D., Harutyunyan, M., Hakobyan, T., Stranic, I., and Buniatyan, D. Deep lake: a lakehouse for deep learning, 2022. URL abs/2209.10785.

Heterogeneous Computing Lab at HKBU, D. DLBench, 2017. URL dlbench.

Hinton, G., Srivastava, N., and Swersky, K. Neural networks for machine learning lecture 6a overview of mini-batch gradient descent. Cited on, 14(8):2, 2012.

Hu, H., Jiang, C., Zhong, Y., Peng, Y., Wu, C., Zhu, Y., Lin, H., and Guo, C. dpro: A generic performance diagnosis and optimization toolkit for expediting distributed dnn training. Proceedings of Machine Learning and Systems, 4:623–637, 2022.

Ignatov, A., Timofte, R., Chou, W., Wang, K., Wu, M., Hartley, T., and Van Gool, L. Ai benchmark: Running deep neural networks on android smartphones. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pp. 0–0, 2018.

Krizhevsky, A., Hinton, G., et al. Learning multiple layers of features from tiny images. 2009.

Kumar, A. V. and Sivathanu, M. Quiver: An informed storage cache for deep learning. In 18th USENIX Conference on File and Storage Technologies (FAST 20), pp. 283–296, Santa Clara, CA, February 2020. USENIX Association. ISBN 978-1-939133-12-0. URL fast20/presentation/kumar.

Leclerc, G., Ilyas, A., Engstrom, L., Park, S. M., Salman, H., and Madry, A. ffcv. libffcv/ffcv/, 2022. commit xxxxxxx.

Li, S., Zhao, Y., Varma, R., Salpekar, O., Noordhuis, P., Li, T., Paszke, A., Smith, J., Vaughan, B., Damania, P., et al. Pytorch distributed: Experiences on accelerating data parallel training. arXiv preprint arXiv:2006.15704, 2020.

Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. Microsoft COCO: Common objects in context. In European conference on computer vision, pp. 740–755. Springer, 2014.

Liu, L., Wu, Y., Wei, W., Cao, W., Sahin, S., and Zhang, Q. Benchmarking deep learning frameworks: Design considerations, metrics and beyond. In 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), pp. 1258–1269. IEEE, 2018.

Mattson, P., Cheng, C., Diamos, G., Coleman, C., Micikevicius, P., Patterson, D., Tang, H., Wei, G.-Y., Bailis, P., Bittorf, V., et al. Mlperf training benchmark. Proceedings of Machine Learning and Systems, 2:336–349, 2020.

Mohan, J., Phanishayee, A., Raniwala, A., and Chidambaram, V. Analyzing and mitigating data stalls in dnn training, 2020. URL 2007.06775.

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.

PyTorch Core Team. PyTorch: PyTorch Docs. PyTorch.

Shi, S., Wang, Q., Xu, P., and Chu, X. Benchmarking state-of-the-art deep learning software tools. In 2016 7th International Conference on Cloud Computing and Big Data (CCBD), pp. 99–104. IEEE, 2016.

Tao, J.-H., Du, Z.-D., Guo, Q., Lan, H.-Y., Zhang, L., Zhou, S.-Y., Xu, L.-J., Liu, C., Liu, H.-F., Tang, S., et al. Benchip: Benchmarking intelligence processors. Journal of Computer Science and Technology, 33(1):1–23, 2018.

Team, A. D. Hub: A dataset format for ai. a simple api for creating, storing, collaborating on ai datasets of any size & streaming them to ml frameworks at scale. GitHub. Note:, 2022a.

Team, S. D. Squirrel: A python library that enables ml teams to share, load, and transform data in a collaborative, flexible, and efficient way. GitHub. Note:, 2022b. doi: 10.5281/zenodo.6418280.

TorchData. Torchdata: A prototype library of common modular data loading primitives for easily constructing flexible and performant data pipelines. https: //, 2021.

Wang, Y., Wei, G.-Y., and Brooks, D. A systematic methodology for analysis of deep learning hardware and software platforms. Proceedings of Machine Learning and Systems, 2:30–43, 2020.

Webdataset. Webdataset format. https://github.com/webdataset/webdataset, 2013.

Wu, Y., Cao, W., Sahin, S., and Liu, L. Experimental characterizations and analysis of deep learning frameworks. In 2018 IEEE International Conference on Big Data (Big Data), pp. 372–377. IEEE, 2018.

Wu, Y., Liu, L., Pu, C., Cao, W., Sahin, S., Wei, W., and Zhang, Q. A comparative measurement study of deep learning as a service framework. IEEE Transactions on Services Computing, 2019.

Zhang, W., Wei, W., Xu, L., Jin, L., and Li, C. Ai matrix: A deep learning benchmark for alibaba data centers. arXiv preprint arXiv:1909.10562, 2019.

Zhu, H., Akrout, M., Zheng, B., Pelegris, A., Phanishayee, A., Schroeder, B., and Pekhimenko, G. Tbd: Benchmarking and analyzing deep neural network training. arXiv preprint arXiv:1803.06905, 2018.

This paper is available on arXiv under a CC 4.0 license.