Mar 20, 2025
An actual binary search on a linked list???
A data structure that can simulate a binary search on a linked list? Yes, please!
Popular Content
Subscribe to my newsletter to get updates on new posts and email only specials.
Mar 20, 2025
A data structure that can simulate a binary search on a linked list? Yes, please!
Subscribe to my newsletter to get updates on new posts and email only specials.
During my last data structures lecture, our professor was talking about priority queues, selection and insertion sorts. And he mentioned, as a non-examable material, how we might achieve a faster insertion when considering a sorted sequence-based priority queueue.
Considering the interface of a priority queue, we need to implement three main methods:
We note that the last two methods could also be implemented as one: which in addition to removing the minimum element, it also returns it. What's more, the implementation details what we mean by a "sequence-based priority queue on a sorted list". Specifically, the implementation for this data structure would be to store elements inside the priority queue (as the elements are pushed) as a sorted list, so that and would be in time complexity.
A naïve approach to this would be to, as inserting into the priority queue, going through the entire list and finding where that element belongs. Consider the example below, when we're adding 52 to the sorted list, we need to compare 52 with all of the numbers in the priority queue: 7, 9, 13, and 21. This means that an insertion would take time.
What's unlucky is, if we were to implement a sorting algorithm from this, we would get time complexity, due to the insertion taking too long. How do we improve this?
My first thought to this problem was to just binary search the elements already inside the priority queue. We already know that the list is sorted, so we could just search the elements in time. The tricky thing is we want to insert into the list, and the list is stored as a continuous array, meaning that, even if we were to find the index we want to insert in time, we still need to copy over the elements to hold the structure of the array — it still has to be continuous in memory — which would take time. So... We aren't so fast after all.
Okay, so we need to find a way to insert quite quickly in a list... And it hit me: the answer is linked list. If we were to store the elements as a linked list, when we find the location of the insertion, adding the element is as simple as changing the head pointers. But, how do we find where to insert the element. It's not like we can binary search a linked list — spoilers...
Well, the current data structure only uses memory — just for storing the list — so could we use some more memory to get a lower time complexity? Let's try storing the partial solutions somewhere as we insert into the priority queue, just to avoid some recalculations. We once more consider the example given above. From the elements already inside the priority queue, we had already calculated that . When 52 came in, we should probably directly compare 52 with 21, without comparing with all of the other numbers.
Let's try thinking about this with a linked list. We say we want to insert 19 to the linked list:
For this case, we would compare with 7, 9, 13, and 21. When we see that , we would put 19 in between 13 and 21. As expected this takes time, only if there was a faster way to do this by skipping some nodes in the linked list. What if we had something like:
This would allow us to compare with 7, use the express lane to 13, compare with 13, and finally compare with 21. If we had this in a large list with optimally placed express lanes, this could save tremendous amounts of time. More formally, we would have something like:
With this kind of linked list, if we were to add 32, we would insert it as:
This just takes three comparisons, instead of the usual seven. This seems like a good start, but how do we construct this linked list? How do we know where to put these express lanes? I mean, it seems quite arbitrary...
This data structure is called a skip list or a skip linked list, since we have these express lanes to skip over some elements.
For ease of implementation, we need to add a sentinal or head node to the beginning of the skip list. This allows us to easily add to or remove from the start of the skip list.
Furthermore, when we think of skip lists conceptually, we see that they are a sequence of singly linked lists , where each contains a subset of the items in .
As you can see, can be considered to be the given list that will be inserted to the skip list. We also say that for an element in a skip list, we call the height of the largest value such that appears in . So, for example, elements that only appear in have a height of zero.
Finally, we see that the expected length of the search path for any node in is at most . I will upload another blog post with the proof, but for now, you can just assume this finding. This result shows us the power of this data structure: we can now find an element in logarithmic time, with a type of linked list that allows us easy deletion and insertion.
Before discussing the actual construction of the skip list, let's look at how we will be storing the information of the skip list.
We define a Node to be consisting of a data value , and an array of pointers, where points to 's successor in the list . For our example, if is the first node (with the value 1), then we say is the second node (with the value 7), or is the sixth node (with the value 21).
An implementation of this node structure in C++ would be as follows.
This way of implementing the structure allows us to have low space complexity. If is the length of our skip list and is the height of our skip list, then the memory complexity would be . What's more, we already know the value of . Since the search path of finding an element is no greater than , the height can't be greater than too, meaning that the memory complexity of this data structure is actually . Quite good, huh?
Do you remember when I said the heights and the placement of these express lanes were quite arbitrary? In fact... they are random! We are actually able to generate from by throwing coins. Specifically, for each element in , we throw a coin, and if it comes up tails, we add that element to . This process ends when we create a list that is empty.
If you had a keen eye, you would spot that this doesn't help much in our case, since we're not creating a skip list from a known list of elements, we are adding to the skip list one-by-one — that is literally the whole point of a priority queue. But if you had an even more keen eye, you would see that the height of an element corresponds to this thought experiment:
Toss a coin repeatedly until the first time it comes up heads. How many times did it come up tails?
When we are adding an element into the skip list, we can recreate this to determine the height of .
Also notice that this probabilistic behaviour doesn't need to be limited by a coin toss (which is a probability), we can change this value to change the memory and time complexity to be as desired.
First, let's look at the main structure of our skip list class.
In the constructor, we set the maximum level (the height) that the skip list can have and the probability value mentioned above. We also create the head (or the sentinel) node which will be the entry point to our skip list.
To add an element into the priority queue, we must first find the location we want to add; in other words, we need to find the greatest element such that .
Following the search path for is easy enough: when situated at some node in , we look right to . If , then we take a step to the right in , otherwise we move down into .
In the end, would be pointing to the element we want. Now, we can just roll for the height of the new element and change the connections inside the linked lists.
The problem is... We're not going to just connect the new element from . We might roll quite unluckly and get a tall node, which we would need to connect it with the previous nodes from the start. This means that we need to store some of the nodes as we process through the skip list — going downwards. For this, we just need to keep track of the nodes at which the search path goes down from some list to . We can store this in an array (or a for you C++ heads) . Precisely, is the node in where the search path proceeds down to . Therefore, the nodes that we will modify to insert the new element are precisely the nodes . Keeping track of the vector is easy for the search path too. The change is:
Now, we need to get the height of the new element: . For this, we can continously get a random number and check if it is less than our value.
Note that, since I'm using C++ here, returns a random integer value, instead of a random number between 0 and 1, like most other programming languages, which is why we need to divide by . That while loop can be done with a do-while loop, but I leave that to you to be more cool.
As a side note, you can also generate a random integer and check the number of leading ones to determine the height.
If the new height we get from the method is greater than the height of the skip list, we need to increase it.
Finally, we create our new node and connect it up with the rest of the skip list.
All in all, we have the following for adding an element into the skip list.
Continuing on, is trivial. We only need to get the first element, and we use .
I'm returning the default value for our type here, but you could also just throw an exception if the size is zero.
Finally, we only got to implement. Since, we're just removing the first element from the skip list, we can just change the pointers of to the element after the element that it is currently pointing at. We also need to consider the height of our entire skip list going down, which only happens when the element we removed had no successors.
And... That's it! If we put everything together, we get the following.
In this blog post so far, I've talked about how a skip list structure can be used to implement the interface of a priority queue; however, skip lists are much more than that. Particularly, we can use them to implement a set. Skip list implemention of a set have the same asymptotic expected time bounds as balanced trees (or red-black trees, just like how C++ implements it) and are simpler, faster, and use less space. You can find that implementation on the GitHub repo for this post, in addition to the priority queue implementation of skip list in C++ and Java.
Apart from using this algorithm in a job interview, skip lists are highly useful in implementing concurrent priority queues with less lock contention (see: Skiplist-Based Concurrent Priority Queues). This is useful in decentralised contexts.
But, do you know what's the issue with skip lists? It's obviously that I can't access and modify any element I want. But, in fact, we can create an efficient random-access list/vector/array implementation with skip lists. To do it, we only need to consider the "length" of these express lanes inside the skip list.
Notice that the length of the upper layers is the sum of the lengths of the layer underneath it. This way, we can go through the skip list, counting all the lengths until we find our desired location. I've added this implementation in the GitHub repo as well.
u
u
val
next
u
u
u
u
update
update[0], update[1], ..., update[r]
update
PROB
RAND_MAX
head
head
push(T x)
top()
pop()
removeMin()
top()
pop()
u.next[i]
u.next[0]
u.next[3]
1template<typename T>2class Node {3public:4T val;5vector<Node<T> *> next;6Node(T _val, int h) : val(_val) {7next = vector<Node<T> *>(h + 1);8}9};
1template<typename T>2class SkipListPQ {3public:4int MAX_LEVEL;5double PROB;6int level;7int size;8Node<T> *head;910SkipListPQ(int max_level, double prob) : MAX_LEVEL(max_level), PROB(prob) {11level = size = 0;12head = new Node<T>(T(), MAX_LEVEL);13}1415int get_size() { return size; }16};
u.next[i].val
x >= u.next[i].val
1Node<T> *u = head;2for (int i = level; i >= 0; i--) {3while (u->next[i] != nullptr && u->next[i]->val < x) {4u = u->next[i];5}6}
std::vector
update[i]
1Node<T> *u = head;2vector<Node<T> *> update(MAX_LEVEL + 1);3for (int i = level; i >= 0; i--) {4while (u->next[i] != nullptr && u->next[i]->val < x) {5u = u->next[i];6}7update[i] = u;8}
int h = random_level();
1int random_level() {2int h = 0;3double r = rand() / (double) RAND_MAX;4while (h < MAX_LEVEL && r < PROB) {5h++;6r = rand() / (double) RAND_MAX;7}8return h;9}
rand()
1int h = random_level();2if (h > level) {3for (int i = level + 1; i <= h + 1; i++) {4update[i] = head;5}6level = h;7}
1Node<T> *n = new Node<T>(x, h);2for (int i = 0; i <= h; i++) {3n->next[i] = update[i]->next[i];4update[i]->next[i] = n;5}
1void push(T x) {2Node<T> *u = head;3vector<Node<T> *> update(MAX_LEVEL + 1);4for (int i = level; i >= 0; i--) {5while (u->next[i] != nullptr && u->next[i]->val < x) {6u = u->next[i];7}8update[i] = u;9}1011int h = random_level();12if (h > level) {13// We need to increase the height of the skip list14for (int i = level + 1; i <= h + 1; i++) {15update[i] = head;16}17level = h;18}1920Node<T> *n = new Node<T>(x, h);21for (int i = 0; i <= h; i++) {22n->next[i] = update[i]->next[i];23update[i]->next[i] = n;24}2526size++;27}
top()
1T top() {2if (size == 0) return T();3return head->next[0]->val;4}
pop()
1bool pop() {2if (size == 0) return false;3for (int i = level; i >= 0; i--) {4// We only remove from the head5if (head->next[i] != nullptr) {6head->next[i] = head->next[i]->next[i];7if (head->next[i] == nullptr) {8// Skip list size has gone down9level--;10}11}12}13size--;14return true;15}
1template<typename T>2class Node {3public:4T val;5vector<Node<T> *> next;6Node(T _val, int h) : val(_val) {7next = vector<Node<T> *>(h + 1);8}9};
1template<typename T>2class SkipListPQ {3public:4int MAX_LEVEL;5double PROB;6int level;7int size;8Node<T> *head;910SkipListPQ(int max_level, double prob) : MAX_LEVEL(max_level), PROB(prob) {11level = size = 0;12head = new Node<T>(T(), MAX_LEVEL);13}1415int get_size() { return size; }1617int random_level() {18int h = 0;19double r = rand() / (double) RAND_MAX;20while (h < MAX_LEVEL && r < PROB) {21h++;22r = rand() / (double) RAND_MAX;23}24return h;25}2627void push(T x) {28Node<T> *u = head;29vector<Node<T> *> update(MAX_LEVEL + 1);30for (int i = level; i >= 0; i--) {31while (u->next[i] != nullptr && u->next[i]->val < x) {32u = u->next[i];33}34update[i] = u;35}3637int h = random_level();38if (h > level) {39// We need to increase the height of the skip list40for (int i = level + 1; i <= h + 1; i++) {41update[i] = head;42}43level = h;44}4546Node<T> *n = new Node<T>(x, h);47for (int i = 0; i <= h; i++) {48n->next[i] = update[i]->next[i];49update[i]->next[i] = n;50}5152size++;53}5455T top() {56if (size == 0) return T();57return head->next[0]->val;58}5960bool pop() {61if (size == 0) return false;62for (int i = level; i >= 0; i--) {63// We only remove from the head64if (head->next[i] != nullptr) {65head->next[i] = head->next[i]->next[i];66if (head->next[i] == nullptr) {67// Skip list size has gone down68level--;69}70}71}72size--;73return true;74}75};
std::set