Tuesday, February 28, 2017

Interview Pearls: Count Inversions

I am collecting problems that start easy and then lead to a deep conversation, touching on multiple aspects of computer science fundamentals. These problems are being asked during interviews in the top technology companies, often with a twist to see if you understand the underlying principles and can adapt your solution.

Part 1: Introduction, coding practice and simple solutions
Part 2: Advanced solutions
Part 3: Expert solutions

Here we have an innocent-looking problem that bled me dry (luckily, it was not in a real interview!). Turned out, there are many interesting ways to solve it, and I thought it could be a good illustration to some fundamental algorithms. In this first post, we will start with three relatively simple (but not very efficient) solutions. In later parts, we fiddle with binary search trees, indexed trees, merge sort, and even segment trees.

Problem Statement

"Given an array of numbers, count pairs where i < j and numbers[i] > numbers[j]"
I saw this problem also referenced as "Reverse Pairs" and "Count of Smaller Numbers After Self". The result indicates how far the array is from being sorted. It's zero if the array is already sorted, and it's maximum (1 + 2 + ... + size - 1, or size * (size - 1) / 2) if it's sorted in the reverse order. Another way to look at this; the result is the number of swap operations required to sort the array using the bubble sort algorithm.

Coding Practice

Try to solve this problem before moving on to the solutions. It is available on LeetCode Online Judge (problem #315) with a minor twist to count of pairs for each element of the array. As it does not change the essence of the problem, I used LeetCode Online Judge to run and validate the solutions below. As you read through each solution, try to implement it before looking at the code.

LeetCode is my most favorite destinations for algorithmic problems. It has 526 problems (at the time of this post) with test cases to automatically validate your solution correctness, as well as computational and memory complexity. Most of the problems are free, and there is a large community of programmers discussing different approaches, tips and tricks.

Introduction

I encountered this problem during a programming contest, and I was struggling with it quite a bit. A brute force solution with a quadratic complexity did not meet the time limit; I sensed a linearithmic  solution but could not find a standard container to do the job. I ended up implementing BST with counters, like solution #3 below.

1. Brute force

This is a straightforward solution with two nested loops. The computational complexity is quadratic, so it does not pass LeetCode Online Judge ("Time Limit Exceeded"). As with any solution, you need to watch for corner cases, like an empty input array (see the comment in the code).

2. Sorted container

Now, if quadratic is too slow, what complexity should we shoot for? Linear does not seem realistic, and between quadratic and linear lies linearithmic . Therefore, we should think about sorting the input, applying a divide & conquer strategy, or using a binary search tree.

We can try to use a container that stores elements in a sorted order, like set (or multiset, if duplicates are possible) in C++. We will insert elements starting from the end of the array, and the position of the inserted element will indicate the number of inversions for that element.
We can insert and find a single element in a sorted container in a logarithmic time, resulting in linearithmic time for all elements in the array. Sounds promising, right?

Unfortunately, when we submit this solution, we still get the "Time Limit Exceeded" error. The reason is that, internally, a sorted container is a self-balancing BST. While it allows logarithmic insertion and search, calculating the distance is a traversal from the beginning. In other words, we need to enumerate all notes before we reach the target, which requires linear time. This solution is more efficient than the brute force one, and it provides linearithmic time in the best case, but it's quadratic in the average and worst cases.

3. BST with counters

So, what if we have BST that can tell us how many elements are smaller than any given value? In each node, we can track the number of elements in the left sub-tree. The number of inversions of any element will be the number of elements in the parent's left subtree plus one (for the parent element itself). In the example below when we insert an element as a right child of the root, the number of inversions is 2 (elements in the root's left subtree) plus one (the root element itself).
Since it's our own implementation of BST, we can make a neat optimization - instead of creating multiple nodes for duplicate values, we can just count the occurrences of that value in a single node.

This solution is accepted by Online Judge. However, this is because Online Judge does not have a good test for the worst case. There is a similar problem 493, and a solution based on BST with counters is not accepted there. If the input is mostly sorted, our tree will be unbalanced, and insert and find operation will require linear time. As I mentioned above, a standard sorted container is backed up by self-balancing BST, and our custom implementation does not have this logic (yet). As the result, this solution has quadratic time in the worst case, and linearithmic time in the average and best cases.

1 In the Big O notation, quadratic is O (n2), and linearithmic is O (n log n).
2 This type of analysis is called the best conceivable run-time in Cracking the Codding Interview.