First Annual SIGMOD Programming Contest
Main Memory Transactional Index

The SIGMOD 2010 contest has been announced!

The 2009 contest is over! The winner was Clément Genzmer, from Ecole polytechnique. Please see our results page for more information.

Student teams from degree granting institutions are invited to compete in a programming contest to develop an indexing system for main memory data. The winning team will be awarded a prize of $5,000. Submissions will be judged based on their overall performance on a supplied workload. The top three submissions will be invited to compete in a "bakeoff" to be held at SIGMOD; up to two students from each team will receive travel grants to attend the conference.

Updates
Please subscribe to our contest mailing list to receive updates and notifications (this will be a low-traffic mailing list).

Task overview

The goal of the contest is to design an index for main memory data. The index must be capable of supporting exact match queries and range queries, as well as updates, inserts, and deletes. The system must also support serializable execution of user-specified transactions. The choice of data structures (e.g., B-tree, AVL-tree, etc.) as well as the mechanism for enforcing serializability (locking, OCC, one-at-a-time) is up to you. The system does not need to support crash recovery.

Contestants must supply the source code for their entries, and agree to license their code under the BSD or MIT open source license should their system win the contest.

Submissions may be written in any language, but an x86 shared-library and source code that conforms to a supplied build environment will be required.

Important Dates
December 12, 2008: Detailed specification of the system will be available on this website. (Note that we have pushed this date back on account of the SIGMOD deadline.)
January 20, 2009: The complete workload will be made available.
March 31, 2009: Entries due -- submission details will be posted on this website.
April 30, 2009: Finalists notified.
We will host a final competition among the finalists at SIGMOD; details TBD.

Task Details, API and Test Implementation

The data structure is a collection of {key, payload} pairs. Your system must be able to index 32-bit integers (int32), 64-bit integers (int64), and variable-length character strings of 128 bytes or less (varchars). Keys cannot be assumed to be unique.

The payload is a variable length null-terminated string between 10 and 100 bytes.

There are no restrictions on representation or compression, but your system must be capable of supporting any number of indexes of the above form, as long as the total amount of "raw" (uncompressed) information does not exceed 4 Gbytes.

API

The commands which your system must support are specified in the API, which can be downloaded from the following link:

An example implementation of the API was created using Berkeley DB. A set of unit tests has also been provided, which all valid implementations should pass. The example implementation and unit tests can be downloaded using the following links:

More unit tests and our complete test driver will be released in mid-January, but these should help get you started. You can also download these files, together with the API header file and Makefile, as a single tarball using this link:

Please note that you will need to modify the defines (e.g., BASE) at the beginning of the Makefile to point to your installation of BerkeleyDB.

Running the Test Cases

In order to run the Berkeley DB implementation, you must have Berkeley DB installed on your machine. You can download it for free from this site:

http://www.oracle.com/technology/products/berkeley-db/index.html

If Berkeley DB is installed in /usr/local/BerkeleyDB.4.7, you can run this command using our default Makefile (if Berkeley DB is installed in some other location, you will have to modify the Makefile -- see the comments at the beginning of the Makefile):

make test

To compile on a MacOS machine, use the command:

make macos

The contest binary created by this step can be executed to run the unit tests:

./contest

This will produce a directory ENV into which the Berkeley DB database and environment information is stored, as well as an output file error.log in which error messages are recorded (some error messages are expected while running these unit tests, as they purposefully test for invalid keys, etc.). Note that the test cases will not pass if the ENV directory already exists when the contest binary is run; the make test target deletes this directory before running.

The current batch of unit tests run through all of the API calls via three threads: two threads run over the same index and check for basic transactional logic, while the third thread runs on a separate index, and so should not see any data from the first two threads. Furthermore, one of the threads uses two indices at the same time.

Your Implementation, Submission, and Our Hardware Configuration

As with our test cases, the final driver will be written in C (with some scripting support in Python). It will connect to your implementation via a number of threads (implemented using the pthreads package.) You must supply a Linux shared library (.so file) called lib.so, which can be built as in our supplied Makefile, e.g.:

gcc -shared -I /usr/include/db4/ /usr/lib/libdb.a bdbimpl.c -o lib.so

In the above example, we have linked in BerkeleyDB since our test implementation depends on it. Your implementation probably won't use BerkeleyDB!

We will build our contest binary that links against this shared library as follows (our supplied Makefile also includes this step):

gcc unittests.c ./lib.so -pthread -o contest

We will run on a 64-bit x86 machine running Red Hat Fedora Core 10. We will supply additional details of the test machine when they become available, but we anticipate using a new-ish Quad Core machine such as the Dell PowerEdge 1900 Enhanced (see this page), with 8 GB of main memory.

We will provide a submission site where you can upload your implementation as the final deadline approaches.

Benchmark

The benchmark can be found here: speed_test.c. We have also supplied a Python script that will execute test files given to it, as long as they have a run() method.

The benchmark will consist of approximately 50 concurrent streams of transactions, each executed by a separate thread. The system will start out with no data. A driver program will create between 1 and approximately 50 indexes totaling not more than 4 Gbytes. Each index will be created by a different thread. Then the driver will have all of the threads perform inserts to populate the index structures. The data type of the key and its distribution are not known in advance. Lastly, each thread will run a collection of transactions.

There are three kinds of transactions that will be used in the benchmark:

  1. (10%) Scan: Transaction does a get followed by K-1 getNext commands. K is uniformly distributed between 100 and 200.

  2. (30%) Get: Transaction does a collection of L get commands. L is uniformly distributed between 20 and 30.

  3. (60%) Update: Transaction does a collection of M (insert, delete) pairs. M is uniformly distributed between 5 and 10.

However, your implementation must be able to handle any kind of transaction possible given the API, not just these three types of interactions (see the unit tests for some examples of other transactions your code should be able to handle).

The driver program will run on the same machine as your indexing implementation.

Your implementation must give the same answer as some serial execution of these transactions. You can decide how to achieve serializability.

Each thread issues the next command only after the answer to the previous command is returned. If you choose to abort a transaction, then it will be immediately retried by the same thread.

Your implementation must correctly handle the "phantom problem".

Your code must be multi-threaded, so that it can support simultaneous connections from the 50 threads (implemented using the pthreads library). These threads will first load a collection of {key, payload} pairs in a collection of transactions as noted above. Then, the threads will submit a mix of queries and updates.

The driver will run some number of transactions, in the ratios specified above, and the score for your submission will be based on the time taken to complete those transactions.

Organizers
Samuel Madden (madden@csail.mit.edu), MIT
Michael Stonebraker (stonebraker@csail.mit.edu), MIT

Please contact us at sigmodcontest09@nms.csail.mit.edu.

Sponsorship
The contest is supported by a grant from the NSF (IIS-0848727). Prizes will be donated by Microsoft and Vertica Systems.