Is it a correct pattern to build compound primary keys in wide column stores?


HBase and Cassandra are built as wide column stores, using the concepts of both rows and columns.

A row is composed of a key, similar to the concept of a primary key in an RDBMS, and a value composed of several columns.

A representation can be the following:

*******|    Key     |                   Value
Columns|            |     name    |                 value
       |     a      |   title     | "Building a python graphdb in one night"
       |     b      |   body      | "You maybe already know that I am..."
       |     c      | publishedat |              "2015-08-23"
       |     d      |   name      |                database

       |     e      |   start     |                   1
       |     f      |    end      |                   2

            ...          ...                         ...

       |    u       |   title     |     "key/value store key composition"

            ...          ...                         ...

       |    x       |   title     |    "building a graphdb with HappyBase"

            ...          ...                         ...

Is it correct, at the application layer, to build composite primary keys so that colocated rows can be iterated over quickly?

This can be represented as follows:

*******|           Key            |                 Value
Columns| identifier |  name       |                 value
       |     1      |   title     | "Building a python graphdb in one night"
       |     1      |   body      | "You maybe already know that I am..."
       |     1      | publishedat |              "2015-08-23"
       |     2      |   name      |                database

       |     3      |   start     |                   1
       |     3      |    end      |                   2

            ...          ...                         ...

       |     4      |   title     |     "key/value store key composition"

            ...          ...                         ...

       |     42     |   title     |    "building a graphdb with HappyBase"

            ...          ...                         ...

The name column moved from the Value to the Key, and the Value now has a single column named value.
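To make the idea concrete, here is a minimal sketch in plain Python of what I mean by iterating over colocated rows. It does not use HBase or Cassandra at all: `OrderedStore`, `put` and `scan` are hypothetical names invented for this illustration, and the store is just a sorted in-memory list of `(identifier, name)` keys.

```python
import bisect


class OrderedStore:
    """Toy in-memory ordered key/value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._keys = []    # sorted list of (identifier, name) composite keys
        self._values = {}  # composite key -> value

    def put(self, identifier, name, value):
        key = (identifier, name)
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._values[key] = value

    def scan(self, identifier):
        """Yield (name, value) for every row whose key starts with identifier."""
        # (identifier,) sorts just before any (identifier, name) key,
        # so this finds the first row of the group.
        index = bisect.bisect_left(self._keys, (identifier,))
        while index < len(self._keys) and self._keys[index][0] == identifier:
            key = self._keys[index]
            yield key[1], self._values[key]
            index += 1


store = OrderedStore()
store.put(1, "title", "Building a python graphdb in one night")
store.put(1, "body", "You maybe already know that I am...")
store.put(1, "publishedat", "2015-08-23")
store.put(2, "name", "database")
store.put(3, "start", 1)
store.put(3, "end", 2)

# Rows sharing identifier 3 sit next to each other, sorted by name.
print(list(store.scan(3)))  # [('end', 2), ('start', 1)]
```

Because the keys are sorted, all rows sharing an identifier are adjacent, so reading one "document" is a single seek followed by a short sequential scan.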

Compound keys are used all the time when designing Cassandra schemas.

In C*, the key is broken down into two parts: the partition key and the clustering columns.

The partition key is used to hash data to nodes within the cluster. A partition is a bucket of data that can hold a single row or multiple rows based on the clustering columns. Data within a partition is local to a node and is kept in sorted order by the clustering keys, which makes accessing data within a partition fast and efficient, with support for range queries on the clustering keys.
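As a rough mental model of that paragraph, here is a small Python sketch. It is not how Cassandra is implemented: `ToyCluster`, `node_for`, `NODE_COUNT` and the modulo-of-a-hash placement are all assumptions made for illustration (a real cluster uses a partitioner and token ranges). It only shows the two roles: the partition key picks a node, and the clustering key keeps rows sorted inside the partition so that range queries are cheap.

```python
import bisect
import hashlib
from collections import defaultdict

NODE_COUNT = 3  # assumed cluster size, for illustration only


def node_for(partition_key):
    """Map a partition key to a node index (simplified stand-in for a partitioner)."""
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT


class ToyCluster:
    """Hypothetical model: node -> partition key -> rows sorted by clustering key."""

    def __init__(self):
        self.nodes = [defaultdict(list) for _ in range(NODE_COUNT)]

    def insert(self, partition_key, clustering_key, value):
        rows = self.nodes[node_for(partition_key)][partition_key]
        # (clustering_key,) sorts just before (clustering_key, value).
        index = bisect.bisect_left(rows, (clustering_key,))
        if index < len(rows) and rows[index][0] == clustering_key:
            rows[index] = (clustering_key, value)  # overwrite existing row
        else:
            rows.insert(index, (clustering_key, value))

    def range_query(self, partition_key, low, high):
        """Return rows of one partition with low <= clustering key <= high."""
        rows = self.nodes[node_for(partition_key)].get(partition_key, [])
        index = bisect.bisect_left(rows, (low,))
        result = []
        while index < len(rows) and rows[index][0] <= high:
            result.append(rows[index])
            index += 1
        return result


cluster = ToyCluster()
cluster.insert(1, "title", "Building a python graphdb in one night")
cluster.insert(1, "body", "You maybe already know that I am...")
cluster.insert(1, "publishedat", "2015-08-23")
cluster.insert(3, "start", 1)
cluster.insert(3, "end", 2)

# Only the node owning partition 1 is touched; rows come back in sorted order.
print(cluster.range_query(1, "body", "publishedat"))
```

The query never looks at other partitions or other nodes, which is the property that makes a compound key with a well-chosen partition key efficient.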

C* also allows regular data columns that are not part of the compound key. These are not generally used to filter queries unless you create a secondary index on them.

The "wide column" terminology is a little outdated for C*. In the current CQL view of things, data is thought of in more traditional terms, as rows in a table that are grouped into partitions that are efficient to access.

So to answer your question: yes, in C* it is common to move columns that might have been thought of as data columns in an RDBMS into the compound key.

For more information on partition keys and clustering columns, and how they affect the types of queries you can run, see "A deep look at the CQL WHERE clause".