Hector
A high level Java client for Apache Cassandra

Virtual Keyspaces

One of the use cases for Keyspaces in Cassandra was multi-tenant applications. Unfortunately, Keyspaces use a lot of memory, to the degree that it’s unlikely that you’d be running with any significant number of them. In the discussion of this topic on the Cassandra-user mailing list, the consensus was that if you need Keyspace-like functionality for a large number of Keyspaces (e.g. a user can point and click to create a new Keyspace), you have to use a single static Keyspace and simulate the idea of different keyspaces in the application layer. Hector now has a feature that makes it much simpler to maintain these “virtual keyspaces” within your application.

The approach used is similar to the “Shared Database, Shared Schema” approach to multi-tenancy often used with conventional RDBMS, where an additional column containing some sort of tenant-id is added to all the tables in a database schema (see [1] and [2]). With Cassandra, while adding an additional column may make sense for indexed queries, much of the time you’re working with row keys. One potentially simple way to implement a virtual keyspace model in Cassandra, therefore, is to use a “Shared Keyspace, Shared Column Families” approach where we prepend the tenant-id to every CF row key value.

Hector does a good job of abstracting access to Cassandra away from Thrift and the native Thrift data structures, and it passes all operations through the KeyspaceService interface, which is implemented by KeyspaceServiceImpl. Virtual keyspaces are implemented by a subclass of KeyspaceServiceImpl called VirtualKeyspaceServiceImpl. It adds the tenant-id prefix to all row keys that are sent to Cassandra, removes the tenant-id prefix from all keys that are returned, and discards any returned key that doesn’t contain a matching tenant-id prefix. This should have the effect of completely hiding rows that aren’t in your virtual keyspace.

Keep in mind, though, that you may very well still want to use an indexed tenant-id column in your CF if you’re doing lots of indexed queries. The virtual keyspace code will discard returned rows whose row key isn’t prefixed with the correct tenant-id, but if you also have an indexed tenant-id column and specify it in your indexed query, that is going to be more efficient than relying on the virtual keyspace code to filter out a large number of returned rows. Adding and using that tenant-id column is currently left up to you; the virtual keyspace code doesn’t handle that part.
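The prefixing and filtering behaviour described above can be sketched in plain Java. This is only an illustration of the technique, not Hector’s actual VirtualKeyspaceServiceImpl code: the class PrefixedKeySketch, its method names, and the “tenantA:” style prefixes are all hypothetical, and keys are modelled as raw byte arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of tenant-prefixed row keys; not Hector's implementation. */
final class PrefixedKeySketch {

    /** Builds the physical row key: tenant prefix followed by the application's key. */
    static byte[] addPrefix(byte[] prefix, byte[] key) {
        byte[] out = Arrays.copyOf(prefix, prefix.length + key.length);
        System.arraycopy(key, 0, out, prefix.length, key.length);
        return out;
    }

    /** Returns true when the stored key starts with the given tenant prefix. */
    static boolean hasPrefix(byte[] prefix, byte[] stored) {
        if (stored.length < prefix.length) {
            return false;
        }
        return Arrays.equals(prefix, Arrays.copyOf(stored, prefix.length));
    }

    /** Strips the prefix from matching keys and silently drops all other keys. */
    static List<byte[]> visibleKeys(byte[] prefix, List<byte[]> storedKeys) {
        List<byte[]> out = new ArrayList<>();
        for (byte[] stored : storedKeys) {
            if (hasPrefix(prefix, stored)) {
                out.add(Arrays.copyOfRange(stored, prefix.length, stored.length));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed example prefixes; both serialize to the same number of bytes.
        byte[] tenantA = "tenantA:".getBytes(StandardCharsets.UTF_8);
        byte[] tenantB = "tenantB:".getBytes(StandardCharsets.UTF_8);
        byte[] key = "user42".getBytes(StandardCharsets.UTF_8);

        // Two tenants store the same logical key in the shared physical keyspace.
        List<byte[]> stored = Arrays.asList(addPrefix(tenantA, key), addPrefix(tenantB, key));

        // Tenant A sees only its own row, with the prefix removed: prints "user42" once.
        for (byte[] k : visibleKeys(tenantA, stored)) {
            System.out.println(new String(k, StandardCharsets.UTF_8));
        }
    }
}
```

Note that the filtering happens after the rows have been returned, which is why an indexed tenant-id column can still be the cheaper option for indexed queries.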

To make use of this, call HFactory.createVirtualKeyspace rather than HFactory.createKeyspace, and specify a prefix and a serializer for that prefix. Unless you do this, none of the new virtual keyspace code will be in your execution path, so there is no risk of affecting your existing applications. Ideally, you should only use this with a clean, empty physical keyspace. Make sure that all of your prefixes serialize to byte arrays of equal length; the expectation is that the prefix will typically be a UUID. Note that although any prefix type that has a serializer is supported, the OrderPreservingPartitioner expects row keys to be UTF8 encoded, so if you are using that partitioner, your prefix should also be a UTF8 string or the OPP will reject your keys. In this case, you may still want to use a UUID, but use uuid.toString() as your prefix together with the StringSerializer. That way a fixed-length UTF8 string representation of the UUID is prepended, and the OPP will still be happy.
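To illustrate the equal-length requirement, here is a minimal sketch using only the JDK that compares the two UUID encodings mentioned above. The class TenantPrefixSketch and its methods are hypothetical, and the 16-byte layout (most significant bits first) is an assumption made for illustration; it is not guaranteed to match the output of Hector’s own serializers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical helper comparing binary and UTF8 string encodings of a UUID prefix. */
final class TenantPrefixSketch {

    /** Raw binary form: always 16 bytes (assumed layout: high 64 bits, then low 64 bits). */
    static byte[] binaryPrefix(UUID tenantId) {
        return ByteBuffer.allocate(16)
                .putLong(tenantId.getMostSignificantBits())
                .putLong(tenantId.getLeastSignificantBits())
                .array();
    }

    /** Canonical string form: 36 ASCII characters, so always 36 bytes in UTF8. */
    static byte[] utf8Prefix(UUID tenantId) {
        return tenantId.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        UUID tenantId = UUID.randomUUID();
        System.out.println(binaryPrefix(tenantId).length); // prints 16
        System.out.println(utf8Prefix(tenantId).length);   // prints 36
    }
}
```

Either encoding gives every tenant a prefix of the same length; the string form is the one that remains valid UTF8, which is what matters when the OrderPreservingPartitioner is in use.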

The unit test for this feature is currently a subclass of ApiV2SystemTest that performs all the same tests but uses a prefixed keyspace. It’s in VirtualKeyspaceTest.java, and you can look at it as an example of everything that’s required to get a keyspace operator that will perform the automatic prefixing.

Note: Despite the name, neither HFactory.createKeyspace nor HFactory.createVirtualKeyspace actually creates a Cassandra keyspace. These methods create a keyspace object that Hector uses for performing operations on the actual Cassandra keyspace. Nothing special needs to be done to prepare a Cassandra keyspace for virtual keyspace usage beyond what you’d normally do to create a keyspace in Cassandra, and Hector provides an addKeyspace method in Cluster.java that performs this operation.

[1] http://iablog.sybase.com/kleisath/index.php/2009/11/multi-tenant-database-architecture-part-5

[2] http://msdn.microsoft.com/en-us/library/aa479086.aspx