Consider the challenge of keeping track of a list of integers A
. Users of the list A
are interested in performing two types of operations:
(1) Updating the value of any element in A
, and
(2) Calculating the sum of some sub-array A[l..r]
.
The length of list A is O(N)
, the number of Type 1 queries is O(Q_1)
, and the number of type 2 queries is O(Q_2)
. In addition to conducting the procedures correctly, the time spent on each process must be appropriate in order to accommodate all queries.
Example: Consider the following list of integers A = [1, 2, 3, 4, 5].
The user requests a type 1 query to increase the 2nd element of A by 3. This results in the updated list A = [1, 5, 3, 4, 5]. The user's next request is for the sum of elements in A[1..3]. This value evaluates to 9.
The essence of the challenge is determining the best data structure to accommodate these kinds of needs. One such good option is the Fenwick tree, which takes logarithmic time for each query (both update and range query). The remainder of this article explores the concept behind this intriguing data structure.
Let us investigate a few potential solutions to the situation. The very natural solution is to keep only the list of integers A
in its original form. Any update query can be handled quickly and easily. Since the update query requires changing an element of A
, the request can be handled in O(1)
time. The sum query, on the other hand, is dependent on the length of the interval requested. Every time a type 2 query is made, the sum had to be calculated by iterating over the range. In the worst-case scenario, the range sum query can take O(N)
. When the amount of type 2 queries is small, this strategy will function efficiently. However, as the quantity of type 2 queries increases, the overall time complexity becomes O(Q_2.N)
. Which is quadratic in nature and hence not desirable.
When updates are more and range sum queries are less, maintaining only list A solves the problem.
But what if Q2
is large? We would want to speed up the range sum queries. Simple observation shows that any subarray sum can be denoted as the difference of prefix sums. Here is how it is calculated -
sum(A[l..r]) = sum(A[0..r]) - sum(A[0..l-1])
Thus, if we have the prefix sums pre-calculated, the range sub-query can be handled in O(1)
time. Building upon this idea, let us maintain a prefix sum array P
for the array A
. Now any range query can be answered using the values present in the list P
. But this makes the update operation costly. Note how any change to the value of A[i]
will cause us to recalculate entries in P
for all positions greater or equal to i
. Thus in the worst case, the update operation may take O(n)
time.
When the number of update operations is less compared to the number of range queries, maintaining a prefix array is an efficient solution.
What we just saw are two extremes of all the possible solutions. The way the Fenwick tree maintains all this data allows us to answer both types of queries in O(logn)
time.
The idea of Fenwick Tree is to maintain a list T
. The length of the list T
is the same as that of list A
. Each element of the list, T[i]
, stores the sum of elements of A
in the range g(i)
to i
. Where g(i)
falls in the range [0, i]
. Thus there are many choices of the function g(.)
. Once the list T
is created, we can handle the query to find the sum of a prefix A[0..r]
as the following procedure suggests.
def sum(r): # Sum from A[0...r]
s = 0
while r >= 0:
s = s + T[r]
r = g(r) - 1
return s
For an update query, the list T needs to be updated to accommodate an increase of delta
in elements at pos
index. Thus an update query will affect those index j
in T
whose range contains pos
. The following procedures summarizes the same.
def update(pos, delta):
for all j where g(j) <= pos <= j:
T[j] += delta
return
Choosing a function g()
is the last piece of the puzzle. A smart choice of the same function helps achieve the bounds of Fenwick Tree.
When g(i)=0
, the list T
boils down to prefix sum. And, when g(i)=i
, the list T
becomes exactly equal to list A
.
The following choice of function g()
is made in Fenwick Tree. It is defined as g(i)
is the number obtained by flipping all the trailing zeros of the number i
. In other words, g(i) is obtained by converting all the ones to zero when moving from Least Significant Bit (LSB) to Most Significant Bit (MSB) until a zero is obtained. This function can be written beautifully in terms of bitwise operators as g(i) = i & (i + 1)
.
Examples - Bold numbers are in Binary
g(8) = g(1000) = 1000 = 8
g(7) = g(111) = 000 = 0
g(23) = g(10111) = 10000 = 16
Thus we have all the details required for handling the sum queries. But there is still a roadblock in implementing the update method. How do we find all those positions in list T
which are affected by the update operation? It is important to note that following the conversion of trailing ones to zero, any positions of list T
that become smaller than or equal to the position pos
that is being updated, are bound to be updated too. All these numbers can be generated by replacing the least significant zero with 1
. Continuing this process will result in all those positions which will be affected by the update query. There is a direct function for this too and it can be given as follows -h(j) = j | (j + 1)
Example
pos = 5. Thus T[5]
will be updated.
h(5) = h(101) = 111 = 7. Thus T[7]
will be updated.
h(8) = h(111) = 1111 = 15. Thus T[15]
will be updated.
….
This completes the final piece of the puzzle as well. Both of the queries can now be answered in O(log n)
time. We will next see informal proof of this claim.
First, let us take the sum method. For any range, [0, r]
we move from r
to g(r)-1
to g(g(r))-1
and so on until we hit 0
. Consider a number n
whose least significant bit (LSB) is zero. If the number n does not have any 1s in its binary representation, we are done. So assume that number n has a 1 at ith
a bit from left. The number n will be of kind n = b_1 b_2 b_3 … b_(i-1) 1 0 0 0 … 0
, where b_1, b_2, …, b_(i-1)
is either 0
or 1
. Since there are no trailing ones we will have g(n) = n
. Thus we will move from n
to n-1
during the loop. The number n-1
will have the following binary representation n-1 = b_1 b_2 b_3 … b_(i-1) 0 1 1 1 … 1
. Note that g(n-1)
will look like g(n-1) = b_1 b_2 b_3 … b_(i-1) 0 0 0 0 … 0
. Thus in just 2 steps, the rightmost set bit in number n is unset. And since there can be at most log n set bits in a number, the loop will terminate in logarithmic number of steps.
If the number already has trailing zeros, then also each step will unset the rightmost set bit. Thus the entire procedure is bound to stop in logarithmic number of steps.
Using Fenwick Tree for handling point updates and range sum query is one of the many applications in which this interesting data structure finds its use. Many implementations of Fenwick Tree are available all over the internet. One such can be found here. For further reading, readers are encouraged to refer to the following sources that the author has consulted while writing this article.
With this interesting data structure, you can solve many interesting problems. Do you know of any other data structure which can also be used to solve this problem? Answers in comments please!