| Updates | Task Overview | Important Dates | Task Details, API, and Example | Your Implementation | Benchmark | Sponsorship |
| Updates | Task Overview | Important Dates | Task Details, API, and Example | Your Implementation | Benchmark | Sponsorship |
Student teams from degree granting institutions are invited to compete in a programming contest to develop an indexing system for main memory data. The winning team will be awarded a prize of $5,000. Submissions will be judged based on their overall performance on a supplied workload. The top three submissions will be invited to compete in a "bakeoff" to be held at SIGMOD; up to two students from each team will receive travel grants to attend the conference.
We have created a system that allows old jobs to be re-run using the new code for speed_test. Both the bug and the new key/payload generation system may have affected the run times for your implementations, so you may want to re-run old tests with the updated code to see how yours is affected. We will also re-run all of the leaderboard tests ourselves using the new code, and require that any future leaderboard entries are run using the new code.
Finally, we have added a second leaderboard that performs 10x as many inserts and queries. When you upload a test you have an option of selecting which leaderboard tests you would like to run, and the leaderboard page now displays two sets of results.
We also created a leaderboard for the fastest, valid (i.e., pass the unit tests) implementations.
We have decided not to count SLA violations as previously specified. Instead, the overall run time, with respect to the number of transactions completed, is what matters.
Also, changes were made to the following files:
speed_test.c: The way the random data used for the tests is generated has been changed so that the time it takes to do so is no longer included in the run time. Also made the main() method able to receive an initial state so that it can be run multiple times and report cumulative results.
harness.py: Loops to run the speed_test twice, and then runs the unit tests and reports whether they pass as well. The results reported are cumulative for the speed_test.
Makefile: Some slight adjustments to the types of files made and how they are cleaned up.
bdbimpl.c: The ordering of the integral keys has been adjusted to account for byte order of the machine and the sign of the number.
The goal of the contest is to design an index for main memory data. The index must be capable of supporting exact match queries and range queries, as well as updates, inserts, and deletes. The system must also support serializable execution of user-specified transactions. The choice of data structures (e.g., B-tree, AVL-tree, etc.) as well as the mechanism for enforcing serializability (locking, OCC, one-at-a-time) is up to you. The system does not need to support crash recovery.
Contestants must supply the source code for their entries, and agree to license their code under the BSD or MIT open source license should their system win the contest.
Submissions may be written in any language, but an x86 shared-library and source code that conforms to a supplied build environment will be required.
| December 12, 2008 | Detailed specification of the system will be available on the website given above. (Note that we have pushed this date back on account of the SIGMOD deadline.) |
| January 20, 2009 | The complete workload will be made available. |
| March 31, 2009 | Entries due -- submission details will be posted on this website. |
| April 30, 2009 | Finalists notified. |
| We will host a final competition amongst finalists at SIGMOD; details TBD. |
The data structure is a collection of {key, payload} pairs. Your system must be able to index 32-bit integers (int32), 64-bit integers (int64) and variable-length characters strings of 128 bytes or less (varchars). Keys cannot be assumed to be unique.
The payload is a variable length null-terminated string between 10 and 100 bytes.
There are no restrictions on representation or compressions, but your system must be capable of supporting any number of indexes of the above form, as long as the total amount of "raw" (uncompressed) information does not exceed 4 Gbytes.
The commands which your system must support are specified in the API, which can be downloaded from the following link:
An example implementation of the API was created using Berkeley DB. A set of unit tests have been provided which all valid implementations should pass. The example implementation and unit tests can be downloaded using the following links:
BASE) at the beginning of the Makefile to point to your installation of BerkeleyDB.
Running the Test Cases
In order to run the Berkeley DB implementation, you must have Berkeley DB installed on your machine. You can download it for free from this site:
http://www.oracle.com/technology/products/berkeley-db/index.htmlIf Berkeley DB was installed in /usr/local/BerkeleyDB.4.7, you can and run this command using our default Makefile (if BerkeleyDB is installed in some other location, you will have to modify the Makefile -- see the comments at the beginning of the Makefile):
make test
To compile on a MacOS machine, use the command:
make macos
The binary contest created by this can be executed to run the unit tests:
./contest
This will produce a directory ENV into which the Berkeley DB database and environment information is stored, as well as an output file error.log in which error messages are recorded (some error messages are expected while running these unit tests, as they purposefully test for invalid keys, etc.). Note that the test cases will not pass if the ENV directory already exists when the contest binary is run; the make test target deletes this directory before running.
The current batch of unit tests run through all of the API calls via three threads: two threads run over the same index and check for basic transactional logic, while the third thread runs on a separate index, and so should not see any data from the first two threads. Furthermore, one of the threads uses two indices at the same time.
As with our test cases, the final driver will be written in C (with some scripting support in Python). It will connect to your implementation via a number of threads (implemented using the pthreads package.) You must supply a Linux shared library (.so file) called lib.so, which can be built as in our supplied Makefile, e.g.:
In the above example, we have linked in BerkeleyDB since our test implementation depends on it. Your implementation probably won't use BerkeleyDB!
We will build our contest binary that links against this shared library as follows (our supplied Makefile also includes this step):
We will provide a submission site where you can upload your implementation as the final deadline approaches.
The benchmark can be found here: speed_test.c. We have also supplied a python script which will execute test files provided to it, provided that they have a run() method.
The benchmark will consist of approximately 50 concurrent streams of transactions, each executed by a separate thread. The system will start out with no data. A driver program will create between 1 and approximately 50 indexes totaling not more than 4 Gbytes. Each index will be created by a different thread. Then the driver will have all of the threads perform inserts to populate the index structures. The data type of the key and its distribution are not known in advance. Lastly, each thread will run a collection of transactions.
There are three kinds of transactions that will be used in the benchmark:
get followed by K-1 getNext commands. K is uniformly distributed between 100 and 200.
get commands. L is uniformly distributed between 20 and 30.
insert, delete) pairs. M is uniformly distributed between 5 and 10.
However, your implementation must be able to handle any kind of transaction possible given the API, not just these three types of interactions (see the unit tests for some examples of other transactions your code should be able to handle).
The driver program will run on the same machine as your indexing implementation.
Your implementation must give the same answer as some serial execution of these transactions. You can decide how to achieve serializability.
Each thread issue the next command when the answer to the previous command is returned. If you choose to abort a transaction, then it will be immediately retried by the same thread.
Your implementation must correctly handle the "phantom problem".
Your code must be multi-threaded, so that it can support simultaneous connections from the 50 threads (implemented using the pthreads library). These threads will first load a collection of {key, pointer} pairs in a collection of transactions as noted above. Then, the threads will submit a mix of queries of updates.
The driver will run some number of transactions, in the ratios specified above, and the score for your submission will be based on the time taken to complete those transactions.
Please contact us at sigmodcontest09@nms.csail.mit.edu.