Too Long; Didn't Read
In machine learning, training samples are generally assumed to be Independent and Identically Distributed (IID). For sequence data, this assumption does not always hold. When the values in a sequence depend on earlier values, as in time series data, the samples are not independent and the IID assumption fails. Sequence models come in two flavors, Stateless and Stateful, depending on the architecture used during training. In the Stateful architecture, the batches are not shuffled internally (shuffling is otherwise the default step), so the state from one batch can be carried over to the next.
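Here is a minimal sketch, in plain Python with no deep learning library, of why ordering matters for Stateful training. The hypothetical helper `stateful_batches` cuts one long series into `batch_size` contiguous streams. Row i of batch k+1 then picks up exactly where row i of batch k left off, and a stateful model relies on this layout to carry its hidden state across batches. The integer series `0..N-1` is an illustrative assumption: it makes continuity easy to check, because each window must start one step after the previous one ended. In Keras, for example, this roughly corresponds to building the recurrent layer with `stateful=True` and calling `fit(..., shuffle=False)`. The code below does not depend on any such library.

```python
from typing import List


def stateful_batches(series: List[int], batch_size: int, window: int) -> List[List[List[int]]]:
    """Hypothetical helper: split `series` into `batch_size` contiguous streams,
    then cut each stream into consecutive windows. Row i of batch k+1 continues
    row i of batch k, so a stateful model can reuse its final state."""
    stream_len = len(series) // batch_size
    # Trim so every stream holds a whole number of windows.
    stream_len -= stream_len % window
    streams = [series[i * stream_len:(i + 1) * stream_len] for i in range(batch_size)]
    n_batches = stream_len // window
    return [
        [stream[k * window:(k + 1) * window] for stream in streams]
        for k in range(n_batches)
    ]


def is_continuous(batches: List[List[List[int]]]) -> bool:
    """Check that row i of every batch starts one step after row i of the
    previous batch ended (valid for an integer series 0, 1, 2, ...)."""
    for prev, curr in zip(batches, batches[1:]):
        for earlier, later in zip(prev, curr):
            if later[0] != earlier[-1] + 1:
                return False
    return True


batches = stateful_batches(list(range(24)), batch_size=2, window=3)
print(batches[0])  # [[0, 1, 2], [12, 13, 14]]
print(batches[1])  # [[3, 4, 5], [15, 16, 17]]
print(is_continuous(batches))        # True
print(is_continuous(batches[::-1]))  # False: reordering breaks the hand-off
```

Reversing the batches is only a deterministic stand-in for a random shuffle. Any reordering breaks the hand-off between consecutive batches, which is why shuffling is switched off for Stateful training.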