skiplist

Description

NOTE: Updated January 20, 2006!

Upon Eric Anholt's request, I have added a license to the skiplist files. They are now offically licensed under the 2-clause BSD license. The license change is the only change I made between v1.0 and v1.1.

Thanks to Ariel Burbaickij who pointed out a slight bug in my skiplist code that was allocating too much memory. I have since fixed this bug, and skiplists now only uses about 20% more memory (for a 6 million item, height 24 skiplist) than B-Tree does.

I have also run some preliminary numbers and if you have complex compares that take up cpu time, then skiplists are not good. On a run of only 1 million nodes, skiplists spent twice as much time in intcmp than B-Tree did. If you use complex indirection, this will end up eating up skiplist's simple alogorithm benefits.

Skiplists will also thrash your cache more. Skiplists do not maintain as much locality and have to pull more pages from main memory. At first, skiplists will be able to maintain some locality, but as time passes, skiplists will fragment memory causing heavy cache misses. B-Tree maintains locality by sticking more data together in each node. For a 32bit machine with a nodesize of 1024 bytes, there are 83 data elements in each node. This means for each main memory hit, you eliminate 83 possibilities, in skiplists you could end up having to go to mainmeory 83 times!

Memory is the bottle neck of modern computers. If you don't use that memory efficiently and effectively, you won't be able to achieve maxium performance.

I will try to run some numbers and also introduce some more complex tests.

1999

You may have read about how much faster and better skiplists are over normal binary trees. After reading about skiplists, I immediately had to go out and do some performance testing as people were reporting that skiplists used less memory and performed better than binary trees. I had already writen a B-Tree library, so I went and looked at various skiplist implementations. After none of them being easy to use, I decided to write my own implemenation. I did some testing and my beliefs were correct, skiplists DO take up significantly more memory, and for large data sets (that require swapping) perform terribly compared to a B-Tree. The problem is that you have to have n pointers for 2^n elements. Most skiplist implementations hard code n, which is usually between 16 and 20. This means that as long as you limit the number of elements, you won't have problems, but why not use a hash table then? You still have the upper limit on performance.

I can regenerate the numbers, but my B-Tree implementation hits swap much later than the skiplist does.

In Skip Lists vs. B-Trees by Michael D. Black, he talks about much faster skiplists are over B-Trees. He even goes as far as saying that to keep a B-Tree balanced requires extra time to keep balanced. Obviously, he has not studied many basic algorithm books. One popular tree that is taught is the Red-Black tree which is proven to be always balanced. As part of the B-Tree algorithm, it is assumed that you keep the tree balanced. This is not hard to do once you understand exactly how B-Trees work. If you would like to learn more about basic data structures and algorithms, please read Introduction to Algorithma by Cormen, Leiserson and Rivest.

Download

Source for the skiplist package as of Fri Jan 20 2006. The latest release is v1.1 done on Fri Jan 20 2006.

Usage and Details

Functions

struct skiplist *createskiplist(int (*)(void *, void *), int maxlvl, int chunk, void *min, void *max);
The function createskiplist allocates and sets up the necessary information to store data. It requires the following arguments:
- cmp_func
  This function is used to compare and order the data you insert into the skiplist. Return < 0 if the first argument is less than the second. Return > 0 if the first argument is greater than the second. Return 0 if they are equal.
- maxlvl
  This is the upper bound on how many levels are possible. This should be log2(n), where n is the maximum elements you plan to have in your skiplist at a time. There are no hard limits, so you could have maxlvl be 4, and put in five million elements, but your performance would be terrible. Recommended value is 20 if your not sure.
- chunk
  This is the probability that you'll continue to the next level. Recommended value is 1 which gives a log2 style tree.
- min
  This is a data element that will always compare less than any data you pass in. Equivalent to -inf.
- max
  This is a data element that will always compare greater than any data you pass in. Equivalent to inf.
int insertkey(struct skiplist *sl, void *k);
Add data k to skiplist sl
void *searchkey(struct skiplist *sl, void *k);
Find data k in skiplist sl and return data.
void printskiplist(struct skiplist *sl);
Print out the contents of skiplist sl.
void *deletekey(struct skiplist *sl, void *k);
Delete data k from skiplist sl and return the value that was removed. Data returned may be different from k, but will compare equal by cmp_funct.