We will look at the implementation of the function suffix in collections in terms of the evolution of computational complexity from the basic protocol Sequence to the RandomAccessCollection protocol.
Before reading this article, get acquainted with the concept of computational complexity.
Each Swift developer used features from the Sequence family of protocols. It comes from the Sequence protocol and includes Sequence, Collection, BidirectionalCollection, RandomAccessCollection, and other protocols related to asynchrony, laziness, collection variability. Here, we'll look at the most basic protocols that provide read-only access: Sequence, Collection, BidirectionalCollection, and RandomAccessCollection protocols.
We use the functions and properties of collections (e.g., prefix, sort, count) without thinking about internal implementation, although the implementation initially looks simple. However, a feature available at different levels of these protocols' hierarchy can be implemented differently and with diverse computational complexity.
Each new protocol in the protocol hierarchy from Sequence to RandomAccessCollection announces new basic capabilities. They allow you to implement new default functions in the protocol extension and optimize existing ones declared in inherited protocols. We'll see the evolution of implementation based on new functions at each level of the protocol hierarchy on the example of the standard implementation of the suffix function for the Sequence, Collection, BidirectionalCollection, and RandomAccessCollection. The suffix function returns a subsequence of the last k elements.
The Sequence is the base protocol in the hierarchy of collections. The primary ability is to iterate one by one element from some set due to the iterator. This set can be limited or unlimited, and an iteration of elements can be destructive or non-destructive. The protocol doesn't say anything about it.
Sequence does not have property count, but there is an underestimatedCount property. On the simple principle of iterating elements, it becomes possible to write many amazing functions, such as prefix, suffix, contains, first, min, max, drop, sorted, map, etc. The protocol has a default implementation for these functions. They may be suboptimal; for example, we cannot access an element using an index with computational complexity O(1), based only on the principle of iterative elements. The simplicity of this protocol limits the ability to implement functions. We can only iterate the elements. That's it.
Only one simple function allows us to iterate over a set of elements from start to finish at this protocol level. The only way to get the last k elements is to enumerate every Sequence component, putting the current part into a ring buffer structure. The default implementation of this function defines RingBuffer with the capacity of k elements.
The first element from the Sequence is put into the first cell of the ring buffer. The second element puts to 2-nd cell, k-th element puts to k-th cell, k+1-th puts to the 1-st cell of buffer (while previous items stay in their cells), and further every current piece rewrite an old one. Thus, when we get the last element of the sequence, all the last k elements are in the ring buffer. The only thing we should do is cut and glue the result array from the buffer with the right index equal to (count of elements in the sequence+1) % k. This function's computational complexity is O(n) since iterate over all the elements.
The Collection protocol has the same capabilities as the Sequence protocol, but it declares non-destructive iteration of the elements multiple times and provides access to the items by index.
In addition, iteration over elements provides the same order as the elements when indexing. Iterating through the elements by an index is only available in one direction. You can get an element by index, calculate the following index, get the next item by this index, calculate the following index, and so on. However, you cannot calculate the index in reverse order.
For example, for simplicity, suppose that we have a collection whose indices are Int numbers. To get the first element, we should use the startIndex property available in the protocol (also, there is endIndex), which will be 0. Then we get the following index of our collection, next to the initial index — index(after: 0), will be 1. However, in Collection, there is no way to define the index preceding index 1. BidirectionalCollection adds this ability.
In Collection, we can call the subscript of a collection by range to get the elements we need, and it seems like a simple operation at first. We need to get the start index and call collection's subscript [suffixStartIndex .. <endIndex]. The last function will work with O(1) complexity, but there is a problem with getting the start index of the last k elements. We do not have a function for getting the i-th index with complexity O(1) to obtain it. The protocol only provides the ability to get the next index in O(1), — index(after:). To reach the i-th index, first, we need to calculate startIndex + 1, startIndex + 2, and so on, and then up to our index suffixStartIndex.
Thus, the suffix function in Collection has a different default implementation based on index calculation rather than iteration over all elements, but the complexity of this function remains O(n).
A collection that has just one additional base function index(before:) adds new features. For example, the property last (computed as an item at the index preceding endIndex) and reversed (returns another ReversedCollection structure, passing our collection as a parameter to the initializer).
Yes, until now, we haven't been able to implement these functions in previous protocols because they would have been highly ineffective. For instance, we could iterate over all the elements on the Sequence protocol and take the last one. In reality, there is no such function, but if we call the suffix(1) function, such an algorithm will be used.
The BidirectionalCollection provides one new primary feature, which is to get the following index backward order. As we remember, in the Collection protocol, the starting index of the required subsequence was calculated from the very beginning.
However, implementing the suffix function in the BidirectionalCollection protocol uses the ability to compute the previous index; it calculates the index of the starting element of the subsequence, starting from the end. This approach can have its advantages, especially if k is a small number.
It looks like it is possible to optimize the suffix function here by calculating how profitable it is to go from start to end or end to start to compute the subsequence's starting index. But to make such a comparison, you need to call the count property, which has O(n) complexity in this protocol.
Therefore, the implementation assumes that the number of elements k is less than half of n. Otherwise, the function execution will not be the most efficient.
Each new protocol in the protocol hierarchy, in addition to new capabilities, provides more efficient implementations of the functions defined in the base protocols. The peculiarity of this protocol is that it doesn't offer new primary functions, but it defines the requirements for the complexity of working with indices; calculating the distance between the indices and getting the index at some distance from the given one must be implemented with complexity O(1).
In the previous protocol, the complexity of these functions could be O(k), where k is the number of indices between two given ones, but now these functions have a complexity of O(1). Many functions can be significantly optimized based on this, for example, the count property, which now has a computational complexity of O(1).
RandomAccessCollection doesn't provide any new core functions, but it declares that the index(_: offsetBy:) and distance(from: to:) functions must operate with complexity O(1), which gives us the ability to calculate the starting index of our suffix instantly.
Moreover, there is no need to invent any new algorithm or implementation. It’s enough to use the old implementation but of much greater complexity - O(1).
With these simple examples, we've seen how the capabilities of the sequence protocols from Sequence to RandomAccessCollection evolve. I hope you've raised your awareness of the protocol choices for your custom sequences, and you'll understand better the default implementations in these protocols and the logic of the functions based on the protocols' core capabilities.