CStore version 0.2 October 2006 Active code authors: Daniel Abadi (dna@mit.edu) Alex Rasin (alexr@cs.brown.edu) Nga Tran (nga@cs.brandeis.edu) Given that C-Store is a university research project and we currently have no staff members doing generic code maintenance and testing, C-Store has progressed in the past year only in pieces, and only by students who need to do so for paricular research interests. For this reason, some parts of the systems (like the query executer) see large improvements over the last release while other pieces (e.g. tuple mover and WOS) have not been touched at all compared with the last release. Since the WOS is so rudimentory compared with the ROS (WOS stand for write-optimized system, ROS for read-optimized system - see the original C-Store paper) we have decided for this release to have the plan generator only generate plans that query the ROS (and not the WOS). Thus, this release of C-Store is a "pure" read-only column-store with no way to insert or update data other than to recreate the ROS data files. This is still sufficient to do research on certain aspects column-oriented systems. However the version 0.1 release WOS, insert and delete code is still included in case you wish to extend it for more complete research into column-oriented DBMSs. You will find that this new release will be able to handle a wider variety of queries without crashing (or producing incorrect answers) compared with the previous release. However this system is still nowhere near a complete DBMS and the query executer still is known not to work for many queries. If you want to do real column-oriented DBMS querying, we recommend Vertica (http://www.vertica.com) as a good commerical DBMS. This release also contains a column-creator so that you can run C-Store on your own data (rather than the TPC-H data we supply). For this release, the column-creater only works for integer data and works as follows: 1) Create a space delimited file with each column in the file representing a column in your table or "C-Store projection". For example, in the cstore/data directory, there exists the data file lshipdate.sorted.tiny with the following contents: 8036 221 8036 1693 8036 3697 8037 5862 8037 6428 8037 6982 8038 7486 8038 9194 8038 10372 8038 11056 This is a two column projection; the first column is an integer representation of lineitem.shipdate, the second column is lineitem.suppkey. You may have up to 256 columns in your projection (this is an arbitrary number - if you want more simply change the code in cstore/src/UnitTests/ProjMaker.cpp). 2) Run the C-Store Projection/Column-Maker. From the cstore/src directory, run: ./cstoreqptest 0 projectionMaker.cnf global.cnf e.g. for the above example, you'd run ./cstoreqptest 0 projectionMaker.cnf global.cnf ../data/lshipdate.sorted.tiny follow the command line instructions from there. As soon as you are finished with this process, the projection will be queryable in C-Store. Note: for non-integer data, you will still have to create/query these columns by hand. This is an artifact of the query executer being able to handle generic type data but our not updating the plan generator to handle generic type data at the time of the release. The next release will have this fixed. CStore version 0.1 July 2005 Active code authors: Daniel Abadi (dna@mit.edu) Xuedong Chen (xuedchen@cs.umb.edu) Tien Hoang (hoangt@brandeis.edu) Edmond Lau (edmond@mit.edu) Velen Liang (vliang@csail.mit.edu) Alex Rasin (alexr@cs.brown.edu) Nga Tran (nga@cs.brandeis.edu) To compile and run the code, see the README file. This document describes the features of the CStore paper (CStore: A Column-Oriented DBMS, Stonebraker et. al. included as vldb.pdf in these docs) that are included in this release. It also gives an overview of what can be input to the system, and how this translates to code that is run. A detailed overview of the query executer, some of the thinking behind it, and the data structures used is given in Miguel Ferreira's thesis provided in a separate file (thesis.pdf). An overview of the plan generator can be found in: PlanGenerator_20050628.doc. The other pieces of the system (the catalog and the tuple mover) have brief overviews that are given in this document. What we support: We have ROS, WOS, and a tuple mover as described in the paper. WOS is physically laid out in memory as a row store in the release (there's a flag in the constructor that allows it to be laid out as a column store - however this feature is not fully functional after last batch of changes). ROS is, of course, a column store. WOS is uncompressed, whereas ROS has 2 compression types that are fully tested and working well (type 1 and type 4 from the paper), a bunch of extra compression types that we were playing around with for experimentation but we haven't touched in more than a few months and may or may not work well (null suppression, lzo - a variant of lempel ziv, dictionary, and rle II - which is run length encoding without encoding start position), and type 2 compression from the paper (bit-string or "delta-pos" encoding) but this was incompletely implemented by a member of the team who has since left the project and to our knowledge this is buggy. So while type 1 (RLE) and type 4 (uncompressed) are the only ways to store data on disk that are fully functional, other compression types are within epsilon of working. To add a new compression type, all that has to be done is to implement the DataSource for it, the Block (and Block Writer) for it, and add the encoder and decoder. So the system is fairly extensible for new compression types. The tuple mover is not fully integrated into the system, in that it must be run externally to the query system. This is described later in this document. This release is not distributed (cstore runs on one node only). Further, there is only one projection per query since we did not implement join indicies for the release. There is no concept of a transaction, epoch, or timestamp authority. There is no query parallelism - queries must be run sequentially (this includes the tuple mover). The following operators from the paper exist: - Aggregate Operators (sum, avg, count, max, min) - Decompress (though this is not a physical operator in our system - any operator can decompress input when it finds it necessary) - Mask (again, not a physical operator - this is done through position filtering in data sources) - Select (again, not a physical operator - this is done through predication in data sources - though down the road this may need to also be a separate operator) - Bit-string operators (BAnd, BNot, BOr) - NLJoin (but only when the inner table fits in memory, and can only run on uncompressed input) - Project (again, not needed to be a physical operator) The following operators exist, but aren't listed in the paper: - MergeSortedGroups: takes sorted results of an aggregation operation and merges them (useful for merging ROS and WOS results) - Union: takes non-aggregated results of ROS and WOS and merges them. There are four versions of Union, of which only one is implemented in this release. See Union.doc for a more detailed account of Union - Insert and Delete operators (see below in this document) - BlockPrinter: This is the root node of any query tree - it prints the results of a query to screen or to a file - ColumnExtracter: This reads a text file and converts it to binary data for use in creating BerkeleyDB ROS files. This will not be used by the system once the system exclusively builds itself from WOS, though it might be useful in a varied form for storing temporary results. The following operators are listed in section 8 of the paper, but aren't implemented in this release: - Concat (because we only have one projection per table) - Permute (again because we don't have join indicies) - Sort The queries presented in section 9 of the paper are implemented in Query1.cpp through Query7.cpp. The correct answers for these queries are listed in Query1.ans through Query7.ans. Query1U.cpp through Query7U.cpp code the same query plans, but operating only on uncompressed data. The data received through the make data command is identical data we used for the paper. The rest of this document overviews various pieces of the release (except the query executor which is given in thesis.pdf and the plan generator which is given in PlanGenerator_20050628.doc). Tuple Mover: The main function of the tuple mover is to merge out the WOS tuples to the corresponding ROS projections. The WOS is relatively trivial in size compared with ROS, and is kept in memory all the time. The Tuple Mover is invoked when the number of WOS tuples is above a certain threshold. Tuple Mover Merger Algorithm: When the Tuple Mover does a merge-out for the given projection, the WOS tuples must be retrieved in the same sort order as that projection. The projection can have multiple columns in its sort key. The current implementation supports multiple-column sort keys where the first column is in Type1 encoding and the remaining are in Type1A (RLE encoded, but not sorted, so it's possible to have more than one triple per value) except the last one can be either Type1A or Type4 (uncompressed integers, but sorted for each value of the sort-key columns to its left, a temporary placeholder for Type1A.) The Tuple Mover iterates through the WOS tuples in sort order and merges each of them into the new version of the ROS projection column by column. Thus there is one pointer for each ROS column. These pointers are advanced simultaneously. In order to find the exact insert position in the new ROS for given WOS tuple, the sort key columns must be processed first. Through the sort key column processing, the final insert position is determined and the Tuple Mover uses that insert position for the other non-sort-key columns' merge-out. How to run Tuple Mover Merger? To merge the D6 projection (the lineitem fact table), simply run: ./cstoreqptest 0 tm.cnf global.cnf After which, you can run: ./cstoreqptest and issue queries on the resulting merged table, e.g.: SELECT MO_D6_l_shipdate, COUNT (*) FROM MO_D6.data WHERE MO_D6_l_shipdate > 9000 GROUP BY MO_D6_l_shipdate; or SELECT MO_D6_l_suppkey, COUNT (*) FROM MO_D6.data WHERE MO_D6_l_shipdate = 9000 GROUP BY MO_D6_l_suppkey; The major class for Tuple Mover is called TMMerger under the C-Store CVS CSTORE/cstoreqptest/src/TM directory. It provides a simple constructor, which takes only a CatalogInstance pointer and a projection name string. After the TMMerger is instantiated, a simple API TMMerger::doMerge() will start the merge-out process. After the merge-out you can get the statistics by calling printProfile(). Once there is a driver (or manager) that can detect the WOS threshold and has the access to the CatalogInstance, then it can easily construct the TMMerger to merge-out the WOS. After the merge-out the driver will find the best time for switching the newly merged ROS project to the main Catalog and dump the old ROS and remove those merged WOS tuples. As current setup the merge-out ROS BDB tables has "_MO" prefix. As noted above, there currently is no such driver for the tuple mover. The Tuple Mover Merger also consists of other helper classes, which are very straight forward. There is a TmMergerTest class under CSTORE/cstoreqptest directory, which handles all sorts of unit tests. For how to perform unit tests please consult the README. What hadn't been done yet? Currently the merge-out does not deal with the foreign key order, does not handle type2 and type3 encoding column merge-out. There is a new design for merge-out of WOS and ROS column by column instead of looping through all columns at the same time, but this has not not yet been implemented. Plan Generator: Plan generator documentation can be found in: PlanGenerator_20050628.doc Catalog: The catalog maintains an overview of every projection in the RuntimeData directory. At all times there exists a single static instance of the catalog. Any component (TupleMover, PlanGenerator, etc) that wishes to access ROS or WOS can request an access method (ROSAM for ROS access, WOSAM for WOS access and DVAM for DeleteVector both for ROS and WOS) based on projection and column name. To add a new projection user must do the following: Change the size of the array PROJECTIONS and add the name of newProjection to that static array. In the constructor, add the desired columns using: new MetaColumn( name, type, compression type, position, bool sortorder?) CStore will look for corresponding BerkeleyDB data files in local RuntimeData directory. In order to support WOS operation, the user needs to create a WOSManager with the name of a data file from which WOS can be loaded. Note that WOS will record a .TUPLES file for future use, to avoid loading input from the text file on the disk. You will also need to add a mapping in _wosManager, so that this WOSManager can be looked up in the catalog. Current implementation also requires that a WOSManager mapping is made for every column in the projection (so that any column name leads to the same shared WOS). DeleteVectors are supported, but loading or storing their status on disk is not. Therefore the CStore prototype always starts out with a blank DeleteVector. INSERT and DELETE Operation INSERT Currently we support one forms of insertion (to WOS). INSERT Syntax #1: INSERT INTO table-name [(column-name1 [,column-name2]...)] VALUES (insert-value1 [, insert-value2]...) InsertOperator inserts one row at a time. So the idea is to obtain a storage key for the new row, then populate the fields at the end of the WOS table by calling appendTuple using the storage key. The Plan generator for Insert takes queries of this forms and: 1. constructs a WOSManager 2. constructs an InsertOperator 3. calls InsertOperator::initTuple to get storage key 4. calls InsertOperator::populateField to populate each field 5. calls InsertOperator::appendTuple to insert this row Note this next version of INSERT is NOT CURRENTLY SUPPORTED, but will be in the next release. INSERT Syntax #2: INSERT INTO table-name [(column-name1 [,column-name2]...)] VALUES (select-statement) InsertOperator provides appendColumns to support syntax #2. Say we have SQL statement INSERT INTO S (s1, s2, s3) VALUE SELECT t2, t3, t1 FROM T In this case we have three input columns t2, t3, t1 that are inserted into to the columns s1, s2, s3 of the projection S. The Plan generator for this insert takes queries of this forms and: 1. constructs an array of data source pointers that hold input data sources t1, t2, t3 (Operator* dataSource[] = {ds1, ds2, ds3}) 2. constructs a array of pointers that hold indices of column number in S( int columnIndex[] = {3, 1, 2}) 3. calls appendColumns DELETE Two things need to be done to support deletion: 1. Tuples must be marked as deleted in the delete vector of the projection 2. A mechanism must be added to the query plan to filter out deleted entries. Currently we use two delete vectors(list of deleted positions), one for ROS and one for WOS for each projection. So deletion is accomplished by adding the deleted position to the delete vectors. We have implemented item 1 but do not currently support item 2 (e.g., the plan generator has not been modified to filter according to the tuples in the deletion vector, though the test file SelectDV.cpp shows an example of a plan that does this. Data structure: Since deletion is projection-wise and the majority of the access methods/operators are column-wise, we devised DVAM, namely delete vector access method to help support deletion. Delete Tuples: To delete columns/rows from a projection, DeleteOperator provides a couple methods: 1.void deleteTuple(int position, bool isROS) this just adds the position to the delete vector. 2.void deleteTuple(Operator* inOp, bool isROS) this method takes an input data source(inOP) (usually with some predicate on it), iterate through the returned (value, position) pairs the data source returns, and adds the positions to the delete vector. 3.void deleteProjection() deletes the whole projection. 4.void deleteByPositionList(Operator* pList, bool isROS) this method takes a position list and marks the positions that are being deleted. Filter Deleted Tuples: Thanks to DVAM/DVConverter, delete vectors are just like regular columns. To do position filtering using the delete vector - construct an IntDataSource like this: (example is for ROS delete vector) IntDataSource* dsDVROS = new IntDataSource ( wosMgr->getDVAM(true), true, //sorted false, //isROS ); This data source dsDVROS contains a list of positions that are being deleted (again, this is just logical - TM will take care of physically removal of tuples). The data source positions are then passed to a BNot operator to generate the positions that are not deleted, and this list is sent to a positionFilter of a DataSource to get the results values of non deleted entries.