Part 1: Introduction, coding practice and simple solutions
Part 2: Advanced solutions
Part 3: Expert solutions
Here we have an innocent-looking problem that bled me dry (luckily, it was not in a real interview!). Turned out, there are many interesting ways to solve it, and I thought it could be a good illustration to some fundamental algorithms. If you digested advanced solutions OK, here is the desert menu: segment tree, lazy propagation segment tree, and height-balanced BST with counters.
7. Segment tree
Segment tree has the same purpose as binary indexed tree (BIT) - efficiently sum ranges in a mutating array. Go ahead and review solution #5 in the previous post to see how we combine update and sum operations to count inversions.
I used to confuse segment tree with interval tree, but they are quite different. Interval tree stores arbitrary intervals in binary search tree, so it can find an overlap in a logarithmic time. Segment tree always works with a predefined interval, which is the size of the underlying array, and uses logarithmic segments of that interval to efficiently perform range operations.
I used to confuse segment tree with interval tree, but they are quite different. Interval tree stores arbitrary intervals in binary search tree, so it can find an overlap in a logarithmic time. Segment tree always works with a predefined interval, which is the size of the underlying array, and uses logarithmic segments of that interval to efficiently perform range operations.
Segment tree is an actual binary tree (unlike BIT); leaves represent elements of the array, and parent nodes represent segments of the array, storing the operation result of its children (and therefore all elements in the segment). Because of that, segment tree requires additional memory comparing to BIT, but also supports non-cumulative operations, such as minimum or maximum value in the range.
- Example 1: for range [1..5], we will split it to [1..4] and [5..5]. In the left subtree, we will stop at [1..4] (value 14), and in the right subtree we will go all the way to element #5 (value 3). The result is [1..4] + #5 = 14 + 3 = 17.
- Example 2: for range [3..7], we will split it into [3..4] and [5..7]. In the left subtree, we will stop at [3..4] (value 11). In the right subtree, we will split [5..7] again into [5..6] and [7..7], and stop at the corresponding ranges (values 8 and 6). The result is [3..4] + [5..6] + [7..7] = 11 + 8 + 6 = 25.
Segment tree can be easily implemented using an array. You will need to know the index of the current node (root's index is zero), and range of the current segment. The left child is at index * 2 + 1, at right child - at index * 2 + 2.
This recursive implementation is good to demonstrate the segment tree operations, but it isn't very efficient. Below is the iterative implementation that uses properties of the segment tree to identify nodes related to a given interval. Based on my experiments, this iterative implementation is twice as fast as the recursive one. I borrowed it from this post, where you can find good explanations on how it works.
Same as for BIT solution, we need normalize the array, so we can use array elements as positions in segment tree. Starting from the back of the array, we will increment value in segment tree for that position, and use the sum operation to count (previously incremented) smaller positions. Note that here we only update one element at a time (no range updates).
The size of the array to hold segment tree is double the underlying array size (extended to power of two) minus one. For example, it's 15 for array sizes 5-8, 31 for array sizes 9-16, 63 for array sizes 17-32, and so on. This solution is accepted by Online Judge for both the original and similar problem 493, and provides linearithmic time in the worst, average and best cases.
8. Lazy propagation segment tree
Segment tree is an efficient data structure to perform range updates, e.g. increment all elements in a specific range, when it's combined with lazy propagation. Instead of making the change all the way to leaves, we stop at a segment that is equal to the current sub-range, and mark the children of the segment node as having a pending update. The following picture shows the segment tree from the previous section after adding 1 to interval [1..6].
When performing a query operation, we check if there is a pending update for the current node, and if so, apply this update and mark the children as having a pending update. One way to track pending updates is to use another array to accumulate pending changes for a node. Check out this video on YouTube and this article on GeeksforGeeks for the additional explanations.
If you are paying attention you would stop me right here, as our solution does not do any range updates. That's right, but what if we flip our operations? Starting from the back of the array, we will increment all elements in range [index + 1, n]. This way, the value at a given index will accumulate the count of inversions.
This solution is based on the recursive implementation of segment tree and it's good to demonstrate lazy propagation. It's not very efficient, unfortunately, in our case as range operations are not prevalent. There is also a way to use lazy propagation for the iterative implementation, described in this post [2].
This solution is accepted by Online Judge for both the original and similar problem 493, and provides linearithmic time in the worst, average and best cases.
When performing a query operation, we check if there is a pending update for the current node, and if so, apply this update and mark the children as having a pending update. One way to track pending updates is to use another array to accumulate pending changes for a node. Check out this video on YouTube and this article on GeeksforGeeks for the additional explanations.
If you are paying attention you would stop me right here, as our solution does not do any range updates. That's right, but what if we flip our operations? Starting from the back of the array, we will increment all elements in range [index + 1, n]. This way, the value at a given index will accumulate the count of inversions.
This solution is based on the recursive implementation of segment tree and it's good to demonstrate lazy propagation. It's not very efficient, unfortunately, in our case as range operations are not prevalent. There is also a way to use lazy propagation for the iterative implementation, described in this post [2].
This solution is accepted by Online Judge for both the original and similar problem 493, and provides linearithmic time in the worst, average and best cases.
9. Height-balanced BST with counters
Solution #4 in the previous post, self-balanced BST with counters, was slower than merge sort and binary indexed tree. I wondered if I could try to tune it up a notch by getting rid of memory allocation overhead and using more efficient tree balancing criteria.
The code below is slightly modified solution 4. Tree nodes now store actual height and use it to decide when to rotate [2]. Also, I am storing nodes in a vector to avoid new and delete operations.
This solution is accepted by Online Judge for both the original and similar problem 493, and provides linearithmic time in the worst, average and best cases.
The code below is slightly modified solution 4. Tree nodes now store actual height and use it to decide when to rotate [2]. Also, I am storing nodes in a vector to avoid new and delete operations.
This solution is accepted by Online Judge for both the original and similar problem 493, and provides linearithmic time in the worst, average and best cases.
Runtime comparison
This is a follow-up to Runtime comparison in the previous post to include additional solutions. I am using 200 arrays populated with random numbers and larger array sizes (10,000 - 100,000). Since we have established that these solutions have linearithmic time in the worst case, I wanted to focus on the average case. I also excluded lazy propagation segment tree as it's ~4 times slower than the iterative segment tree.
External links
- Segment tree vs. BIT on CodeChef
- Segment tree on YouTube
- Efficient and easy segment trees on Codeforces
- Lazy propagation segment tree on YouTube
- Lazy propagation segment tree on GeeksforGeeks
1 I found lazy propagation for the iterative segment tree quite difficult to comprehend. I think that it will be faster that the recursive implementation, but still slower than other solutions. I'll keep it in mind if I need an efficient segment tree and heavy range operations.
2 Tracking the actual height allows achieving the perfect balance (abs (hl - hr) <= 1). Experimentally, I am getting the best runtime when the height difference is less than or equal 2.
No comments:
Post a Comment