Too Long; Didn't Read
Pure Storage published TensorFlow deep learning performance results in March; in Part 2 we investigate how input pipelines affect overall training throughput. The performance gains came from ten months of application development, not from any single factor. FP16 support lets developers take advantage of the Tensor Cores on Nvidia GPUs, trading lower precision for higher training throughput. Larger batch sizes process more samples per step, amortizing fixed coordination work across them. And the input pipeline, previously a performance limiter during training, is now more efficient.
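The batch-size point can be sketched with a toy throughput model (the cost figures below are hypothetical, chosen only to illustrate the shape of the effect, not measured from the workload in the post): if each training step pays a fixed coordination cost plus a per-sample cost, throughput rises with batch size as the fixed cost is spread over more samples.

```python
def throughput(batch_size, per_sample_ms=1.0, overhead_ms=20.0):
    """Samples/second for one training step under a simple cost model:
    a fixed per-step overhead plus a linear per-sample cost.
    (Both cost parameters are illustrative assumptions.)"""
    step_ms = overhead_ms + batch_size * per_sample_ms
    return batch_size / step_ms * 1000.0

for bs in (32, 128, 512):
    print(f"batch={bs:4d}  ~{throughput(bs):.0f} samples/s")
```

The fixed overhead never disappears, so throughput approaches but never reaches the per-sample limit (1000 samples/s here); real training adds memory limits and convergence effects that cap useful batch sizes.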