Lessons for Improving Training Performance — Part 1

Written by emwatz | Published 2019/10/02
Tech Story Tags: artificial-intelligence | big-data | tech-stack | tensorflow | machine-learning | ai-technology | performance | infrastructure

TLDR Pure Storage published TensorFlow deep learning performance results in March. The performance gains came from ten months of application development, not from any single factor. With FP16 support, developers can take advantage of the Tensor Cores present on Nvidia GPUs, trading lower numerical precision for higher training throughput. With larger batch sizes, more samples are processed together per step, amortizing coordination work. The input pipeline during training, previously a performance limiter, is now more efficient. In Part 2 we'll investigate how input pipelines affect overall training throughput.
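The batch-size point can be made concrete with a toy throughput model: each training step pays a fixed coordination cost (gradient synchronization, kernel launch latency) plus a per-sample compute cost, so larger batches spread the fixed cost over more samples. This is a minimal sketch, not from the article; the timing constants are hypothetical and chosen only for illustration.

```python
# Toy model: fixed per-step overhead is amortized as batch size grows.
# Both constants below are hypothetical illustration values.

FIXED_OVERHEAD_S = 0.050   # hypothetical per-step cost (sync, launch latency)
PER_SAMPLE_S = 0.002       # hypothetical compute time per sample

def throughput(batch_size: int) -> float:
    """Samples processed per second for one training step."""
    step_time = FIXED_OVERHEAD_S + batch_size * PER_SAMPLE_S
    return batch_size / step_time

for bs in (32, 128, 512):
    print(f"batch={bs:4d}  throughput={throughput(bs):7.1f} samples/s")
```

Under this model, throughput rises with batch size but can never exceed `1 / PER_SAMPLE_S`, which is why batch-size increases eventually stop helping and other factors (like the input pipeline) become the limiter.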


Published by HackerNoon on 2019/10/02