Showing posts with label Cassandra. Show all posts
Showing posts with label Cassandra. Show all posts

Monday, July 7, 2014

Custom Cassandra Data Types

In the blog post Connecting to Cassandra from Java, I mentioned that one advantage for Java developers of Cassandra being implemented in Java is the ability to create custom Cassandra data types. In this post, I outline how to do this in greater detail.

Cassandra has numerous built-in data types, but there are situations in which one may want to add a custom type. Cassandra custom data types are implemented in Java by extending the org.apache.cassandra.db.marshal.AbstractType class. The class that extends this must ultimately implement three methods with the following signatures:

public ByteBuffer fromString(final String) throws MarshalException
public TypeSerializer getSerializer()
public int compare(Object, Object)

This post's example implementation of AbstractType is shown in the next code listing.

UnitedStatesState.java - Extends AbstractType
package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.serializers.MarshalException;
import org.apache.cassandra.serializers.TypeSerializer;

import java.nio.ByteBuffer;

/**
 * Representation of a state in the United States that
 * can be persisted to Cassandra database.
 */
public class UnitedStatesState extends AbstractType
{
   public static final UnitedStatesState instance = new UnitedStatesState();

   @Override
   public ByteBuffer fromString(final String stateName) throws MarshalException
   {
      return getStateAbbreviationAsByteBuffer(stateName);
   }

   @Override
   public TypeSerializer getSerializer()
   {
      return UnitedStatesStateSerializer.instance;
   }

   @Override
   public int compare(Object o1, Object o2)
   {
      if (o1 == null && o2 == null)
      {
         return 0;
      }
      else if (o1 == null)
      {
         return 1;
      }
      else if (o2 == null)
      {
         return -1;
      }
      else
      {
         return o1.toString().compareTo(o2.toString());
      }
   }

   /**
    * Provide standard two-letter abbreviation for United States
    * state whose state name is provided.
    *
    * @param stateName Name of state whose abbreviation is desired.
    * @return State's abbreviation as a ByteBuffer; will return "UK"
    *    if provided state name is unexpected value.
    */
   private ByteBuffer getStateAbbreviationAsByteBuffer(final String stateName)
   {
      final String upperCaseStateName = stateName != null ? stateName.toUpperCase().replace(" ", "_") : "UNKNOWN";
      String abbreviation;
      try
      {
         abbreviation =  upperCaseStateName.length() == 2
                       ? State.fromAbbreviation(upperCaseStateName).getStateAbbreviation()
                       : State.valueOf(upperCaseStateName).getStateAbbreviation();
      }
      catch (Exception exception)
      {
         abbreviation = State.UNKNOWN.getStateAbbreviation();
      }
      return ByteBuffer.wrap(abbreviation.getBytes());
   }
}

The above class listing references the State enum, which is shown next.

State.java
package dustin.examples.cassandra.cqltypes;

/**
 * Representation of state in the United States.
 */
public enum State
{
   ALABAMA("Alabama", "AL"),
   ALASKA("Alaska", "AK"),
   ARIZONA("Arizona", "AZ"),
   ARKANSAS("Arkansas", "AR"),
   CALIFORNIA("California", "CA"),
   COLORADO("Colorado", "CO"),
   CONNECTICUT("Connecticut", "CT"),
   DELAWARE("Delaware", "DE"),
   DISTRICT_OF_COLUMBIA("District of Columbia", "DC"),
   FLORIDA("Florida", "FL"),
   GEORGIA("Georgia", "GA"),
   HAWAII("Hawaii", "HI"),
   IDAHO("Idaho", "ID"),
   ILLINOIS("Illinois", "IL"),
   INDIANA("Indiana", "IN"),
   IOWA("Iowa", "IA"),
   KANSAS("Kansas", "KS"),
   LOUISIANA("Louisiana", "LA"),
   MAINE("Maine", "ME"),
   MARYLAND("Maryland", "MD"),
   MASSACHUSETTS("Massachusetts", "MA"),
   MICHIGAN("Michigan", "MI"),
   MINNESOTA("Minnesota", "MN"),
   MISSISSIPPI("Mississippi", "MS"),
   MISSOURI("Missouri", "MO"),
   MONTANA("Montana", "MT"),
   NEBRASKA("Nebraska", "NE"),
   NEVADA("Nevada", "NV"),
   NEW_HAMPSHIRE("New Hampshire", "NH"),
   NEW_JERSEY("New Jersey", "NJ"),
   NEW_MEXICO("New Mexico", "NM"),
   NORTH_CAROLINA("North Carolina", "NC"),
   NORTH_DAKOTA("North Dakota", "ND"),
   NEW_YORK("New York", "NY"),
   OHIO("Ohio", "OH"),
   OKLAHOMA("Oklahoma", "OK"),
   OREGON("Oregon", "OR"),
   PENNSYLVANIA("Pennsylvania", "PA"),
   RHODE_ISLAND("Rhode Island", "RI"),
   SOUTH_CAROLINA("South Carolina", "SC"),
   SOUTH_DAKOTA("South Dakota", "SD"),
   TENNESSEE("Tennessee", "TN"),
   TEXAS("Texas", "TX"),
   UTAH("Utah", "UT"),
   VERMONT("Vermont", "VT"),
   VIRGINIA("Virginia", "VA"),
   WASHINGTON("Washington", "WA"),
   WEST_VIRGINIA("West Virginia", "WV"),
   WISCONSIN("Wisconsin", "WI"),
   WYOMING("Wyoming", "WY"),
   UNKNOWN("Unknown", "UK");

   private String stateName;

   private String stateAbbreviation;

   State(final String newStateName, final String newStateAbbreviation)
   {
      this.stateName = newStateName;
      this.stateAbbreviation = newStateAbbreviation;
   }

   public String getStateName()
   {
      return this.stateName;
   }

   public String getStateAbbreviation()
   {
      return this.stateAbbreviation;
   }

   public static State fromAbbreviation(final String candidateAbbreviation)
   {
      State match = UNKNOWN;
      if (candidateAbbreviation != null && candidateAbbreviation.length() == 2)
      {
         final String upperAbbreviation = candidateAbbreviation.toUpperCase();
         for (final State state : State.values())
         {
            if (state.stateAbbreviation.equals(upperAbbreviation))
            {
               match = state;
            }
         }
      }
      return match;
   }
}

We can also provide an implementation of the TypeSerializer interface returned by the getSerializer() method shown above. That class implementing TypeSerializer is typically most easily written by extending one of the numerous existing implementations of TypeSerializer that Cassandra provides in the org.apache.cassandra.serializers package. In my example, my custom Serializer extends AbstractTextSerializer and the only method I need to add has the signature public void validate(final ByteBuffer bytes) throws MarshalException. Both of my custom classes need to provide a reference to an instance of themselves via static access. Here is the class that implements TypeSerializer via extension of AbstractTypeSerializer:

UnitedStatesStateSerializer.java - Implements TypeSerializer
package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.serializers.AbstractTextSerializer;
import org.apache.cassandra.serializers.MarshalException;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Serializer for UnitedStatesState.
 */
public class UnitedStatesStateSerializer extends AbstractTextSerializer
{
   public static final UnitedStatesStateSerializer instance = new UnitedStatesStateSerializer();

   private UnitedStatesStateSerializer()
   {
      super(StandardCharsets.UTF_8);
   }

   /**
    * Validates provided ByteBuffer contents to ensure they can
    * be modeled in the UnitedStatesState Cassandra/CQL data type.
    * This allows for a full state name to be specified or for its
    * two-digit abbreviation to be specified and either is considered
    * valid.
    *
    * @param bytes ByteBuffer whose contents are to be validated.
    * @throws MarshalException Thrown if provided data is invalid.
    */
   @Override
   public void validate(final ByteBuffer bytes) throws MarshalException
   {
      try
      {
         final String stringFormat = new String(bytes.array()).toUpperCase();
         final State state =  stringFormat.length() == 2
                            ? State.fromAbbreviation(stringFormat)
                            : State.valueOf(stringFormat);
      }
      catch (Exception exception)
      {
         throw new MarshalException("Invalid model cannot be marshaled as UnitedStatesState.");
      }
   }
}

With the classes for creating a custom CQL data type written, they need to be compiled into .class files and archived in a JAR file. This process (compiling with javac -cp "C:\Program Files\DataStax Community\apache-cassandra\lib\*" -sourcepath src -d classes src\dustin\examples\cassandra\cqltypes\*.java and archiving the generated .class files into a JAR named CustomCqlTypes.jar with jar cvf CustomCqlTypes.jar *) is shown in the following screen snapshot.

The JAR with the class definitions of the custom CQL type classes needs to be placed in the Cassandra installation's lib directory as demonstrated in the next screen snapshot.

With the JAR containing the custom CQL data type classes implementations in the Cassandra installation's lib directory, Cassandra should be restarted so that it will be able to "see" these custom data type definitions.

The next code listing shows a Cassandra Query Language (CQL) statement for creating a table using the new custom type dustin.examples.cassandra.cqltypes.UnitedStatesState.

createAddress.cql
CREATE TABLE us_address
(
   id uuid,
   street1 text,
   street2 text,
   city text,
   state 'dustin.examples.cassandra.cqltypes.UnitedStatesState',
   zipcode text,
   PRIMARY KEY(id)
);

The next screen snapshot demonstrates the results of running the createAddress.cql code above by describing the created table in cqlsh.

The above screen snapshot demonstrates that the custom type dustin.examples.cassandra.cqltypes.UnitedStatesState is the type for the state column of the us_address table.

A new row can be added to the US_ADDRESS table with a normal INSERT. For example, the following screen snapshot demonstrates inserting an address with the command INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'New York', '10118');:

Note that while the INSERT statement inserted "New York" for the state, it is stored as "NY".

If I run an INSERT statement in cqlsh using an abbreviation to start with (INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'NY', '10118');), it still works as shown in the output shown below.

In my example, an invalid state does not prevent an INSERT from occurring, but instead persists the state as "UK" (for unknown) [see the implementation of this in UnitedStatesState.getStateAbbreviationAsByteBuffer(String)].

One of the first advantages that comes to mind justifying why one might want to implement a custom CQL datatype in Java is the ability to employ behavior similar to that provided by check constraints in relational databases. For example, in this post, my sample ensured that any state column entered for a new row was either one of the fifty states of the United States, the District of Columbia, or "UK" for unknown. No other values can be inserted into that column's value.

Another advantage of the custom data type is the ability to massage the data into a preferred form. In this example, I changed every state name to an uppercase two-digit abbreviation. In other cases, I might want to always store in uppercase or always store in lowercase or map finite sets of strings to numeric values. The custom CQL datatype allows for customized validation and representation of values in the Cassandra database.

Conclusion

This post has been an introductory look at implementing custom CQL datatypes in Cassandra. As I play with this concept more and try different things out, I hope to write another blog post on some more subtle observations that I make. As this post shows, it is fairly easy to write and use a custom CQL datatype, especially for Java developers.

Tuesday, May 27, 2014

Connecting to Cassandra from Java

In my post Hello Cassandra, I looked at downloading the Cassandra NoSQL database and using cqlsh to connect to a Cassandra database. In this post, I look at the basics of connecting to a Cassandra database from a Java client.

Although there are several frameworks available for accessing the Cassandra database from Java, I will use the DataStax Java Client JAR in this post. The DataStax Java Driver for Apache Cassandra is available on GitHub. The datastax/java-driver GitHub project page states that it is a "Java client driver for Apache Cassandra" that "works exclusively with the Cassandra Query Language version 3 (CQL3)" and is "licensed under the Apache License, Version 2.0."

The Java Driver 2.0 for Apache Cassandra page provides a high-level overview and architectural details about the driver. Its Writing Your First Client section provides code listings and explanations regarding connecting to Cassandra with the Java driver and executing CQL statements from Java code. The code listings in this post are adaptations of those examples applied to my example cases.

The Cassandra Java Driver has several dependencies. The Java Driver 2.0 for Apache Cassandra documentation includes a page called Setting up your Java development environment that outlines the Java Driver 2.0's dependencies: cassandra-driver-core-2.0.1.jar (datastax/java-driver 2.0), netty-3.9.0-Final.jar (netty direct), guava-16.0.1.jar (Guava 16 direct), metrics-core-3.0.2.jar (Metrics Core), and slf4j-api-1.7.5.jar (slf4j direct). I also found that I needed to place LZ4Factory.java and snappy-java on the classpath.

The next code listing is of a simple class called CassandraConnector.

CassandraConnector.java
package com.marxmart.persistence;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.Session;

import static java.lang.System.out;

/**
 * Class used for connecting to Cassandra database.
 */
public class CassandraConnector
{
   /** Cassandra Cluster. */
   private Cluster cluster;

   /** Cassandra Session. */
   private Session session;

   /**
    * Connect to Cassandra Cluster specified by provided node IP
    * address and port number.
    *
    * @param node Cluster node IP address.
    * @param port Port of cluster host.
    */
   public void connect(final String node, final int port)
   {
      this.cluster = Cluster.builder().addContactPoint(node).withPort(port).build();
      final Metadata metadata = cluster.getMetadata();
      out.printf("Connected to cluster: %s\n", metadata.getClusterName());
      for (final Host host : metadata.getAllHosts())
      {
         out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
            host.getDatacenter(), host.getAddress(), host.getRack());
      }
      session = cluster.connect();
   }

   /**
    * Provide my Session.
    *
    * @return My session.
    */
   public Session getSession()
   {
      return this.session;
   }

   /** Close cluster. */
   public void close()
   {
      cluster.close();
   }
}

The above connecting class could be invoked as shown in the next code listing.

Code Using CassandraConnector
/**
 * Main function for demonstrating connecting to Cassandra with host and port.
 *
 * @param args Command-line arguments; first argument, if provided, is the
 *    host and second argument, if provided, is the port.
 */
public static void main(final String[] args)
{
   final CassandraConnector client = new CassandraConnector();
   final String ipAddress = args.length > 0 ? args[0] : "localhost";
   final int port = args.length > 1 ? Integer.parseInt(args[1]) : 9042;
   out.println("Connecting to IP Address " + ipAddress + ":" + port + "...");
   client.connect(ipAddress, port);
   client.close();
}

The example code in that last code listing specified default node and port of localhost and port 9042. This port number is specified in the cassandra.yaml file located in the apache-cassandra/conf directory. The Cassandra 1.2 documentation has a page on The cassandra.yaml configuration file which describes the cassandra.yaml file as "the main configuration file for Cassandra." Incidentally, another important configuration file in that same directory is cassandra-env.sh, which defines numerous JVM options for the Java-based Cassandra database.

For the examples in this post, I will be using a MOVIES table created with the following Cassandra Query Language (CQL):

createMovie.cql
CREATE TABLE movies
(
   title varchar,
   year int,
   description varchar,
   mmpa_rating varchar,
   dustin_rating varchar,
   PRIMARY KEY (title, year)
);

The above file can be executed within cqlsh with the command source 'C:\cassandra\cql\examples\createMovie.cql' (assuming that the file is placed in the specified directory, of course) and this is demonstrated in the next screen snapshot.

One thing worth highlighting here is that the columns that were created as varchar datatypes are described as text datatypes by the cqlsh describe command. Although I created this table directly via cqlsh, I also could have created the table in Java as shown in the next code listing and associated screen snapshot that follows the code listing.

Creating Cassandra Table with Java Driver
final String createMovieCql =
     "CREATE TABLE movies_keyspace.movies (title varchar, year int, description varchar, "
   + "mmpa_rating varchar, dustin_rating varchar, PRIMARY KEY (title, year))";
client.getSession().execute(createMovieCql);

The above code accesses an instance variable client. The class with this instance variable that it might exist in is shown next.

Shell of MoviePersistence.java
package dustin.examples.cassandra;

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

import java.util.Optional;

import static java.lang.System.out;

/**
 * Handles movie persistence access.
 */
public class MoviePersistence
{
   private final CassandraConnector client = new CassandraConnector();

   public MoviePersistence(final String newHost, final int newPort)
   {
      out.println("Connecting to IP Address " + newHost + ":" + newPort + "...");
      client.connect(newHost, newPort);
   }

   /**
    * Close my underlying Cassandra connection.
    */
   private void close()
   {
      client.close();
   }
}

With the MOVIES table created as shown above (either by cqlsh or with Java client code), the next steps are to manipulate data related to this table. The next code listing shows a method that could be used to write new rows to the MOVIES table.

/**
 * Persist provided movie information.
 *
 * @param title Title of movie to be persisted.
 * @param year Year of movie to be persisted.
 * @param description Description of movie to be persisted.
 * @param mmpaRating MMPA rating.
 * @param dustinRating Dustin's rating.
 */
public void persistMovie(
   final String title, final int year, final String description,
   final String mmpaRating, final String dustinRating)
{
   client.getSession().execute(
      "INSERT INTO movies_keyspace.movies (title, year, description, mmpa_rating, dustin_rating) VALUES (?, ?, ?, ?, ?)",
      title, year, description, mmpaRating, dustinRating);
}

With the data inserted into the MOVIES table, we need to be able to query it. The next code listing shows one potential implementation for querying a movie by title and year.

Querying with Cassandra Java Driver
/**
 * Returns movie matching provided title and year.
 *
 * @param title Title of desired movie.
 * @param year Year of desired movie.
 * @return Desired movie if match is found; Optional.empty() if no match is found.
 */
public Optional<Movie> queryMovieByTitleAndYear(final String title, final int year)
{
   final ResultSet movieResults = client.getSession().execute(
      "SELECT * from movies_keyspace.movies WHERE title = ? AND year = ?", title, year);
   final Row movieRow = movieResults.one();
   final Optional<Movie> movie =
        movieRow != null
      ? Optional.of(new Movie(
           movieRow.getString("title"),
           movieRow.getInt("year"),
           movieRow.getString("description"),
           movieRow.getString("mmpa_rating"),
           movieRow.getString("dustin_rating")))
      : Optional.empty();
   return movie;
}

If we need to delete data already stored in the Cassandra database, this is easily accomplished as shown in the next code listing.

Deleting with Cassandra Java Driver
/**
 * Deletes the movie with the provided title and release year.
 *
 * @param title Title of movie to be deleted.
 * @param year Year of release of movie to be deleted.
 */
public void deleteMovieWithTitleAndYear(final String title, final int year)
{
   final String deleteString = "DELETE FROM movies_keyspace.movies WHERE title = ? and year = ?";
   client.getSession().execute(deleteString, title, year);
}

As the examples in this blog post have shown, it's easy to access Cassandra from Java applications using the Java Driver. It is worth noting that Cassandra is written in Java. The advantage of this for Java developers is that many of Cassandra's configuration values are JVM options that Java developers are already familiar with. The cassandra-env.sh file in the Cassandra conf directory allows one to specify standard JVM options used by Cassandra (such as heap sizing parameters -Xms, -Xmx, and -Xmn),HotSpot-specific JVM options (such as -XX:-HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath, garbage collection tuning options, and garbage collection logging options), enabling assertions (-ea), and exposing Cassandra for remote JMX management.

Speaking of Cassandra and JMX, Cassandra can be monitored via JMX as discussed in the "Monitoring using JConsole" section of Monitoring a Cassandra cluster. The book excerpt The Basics of Monitoring Cassandra also discusses using JMX to monitor Cassandra. Because Java developers are more likely to be familiar with JMX clients such as JConsole and VisualVM, this is an intuitive approach to monitoring Cassandra for Java developers.

Another advantage of Cassandra's Java roots is that Java classes used by Cassandra can be extended and Cassandra can be customized via Java. For example, custom data types can be implemented by extending the AbstractType class.

Conclusion

The Cassandra Java Driver makes it easy to access Cassandra from Java applications. Cassandra also features significant Java-based configuration and monitoring and can even be customized with Java.

Monday, May 19, 2014

Hello Cassandra

A colleague recently told me about several benefits of Cassandra and I decided to try it out. Apache Cassandra is described in A Quick Introduction to Apache Cassandra as "one of today’s most popular NoSQL-databases." The main page for Apache Cassandra states that the "Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance." Cassandra is being used by companies such as eBay, Netflix, Adobe, Reddit, Instagram, and Twitter. This post is a summary of steps for getting started with Cassandra.

Apache Cassandra can be downloaded from the main Apache Cassandra web page. The Download page states that "the latest stable release of Apache Cassandra is 2.0.7 (released on 2014-04-18)" and this is the version I will discuss and use in this post.

For this post, I downloaded and installed the DataStax Community Edition from Planet Cassandra Downloads. The DataStax Community 2.0.7 edition includes "The Most Stable and Recommended Version of Apache Cassandra for Production (2.0.7)." There are DataStax Community Edition downloads available for Mac OS X, Microsoft Windows, and several flavors of Linux.

The next screen snapshot shows the directory listing for the "bin" directory of the Apache Cassandra included with the DataStax Community Edition installation.

From that "bin" directory, the Cassandra server can be started simply by running the appropriate executable. In the case of this single Windows machine, that command is cassandra.bat and this step is illustrated in the next screen snapshot.

The interactive command-line tool cqlsh is also located in the Apache Cassandra "bin" subdirectory. This tool is similar to SQL*Plus for Oracle databases, mysql for MySQL databases, and psql for PostgreSQL. It allows one to enter various CQL (Cassandra Query Language) statements such as inserting new data and querying data. Starting cqlsh from the command line on a Windows machine is shown in the next screen snapshot.

There are several useful observations that can be made from the previous image. As the output of from starting cqlsh shows, this version of Apache Cassandra is 2.0.7, this version of cqlsh is 4.1.1, and the relevant CQL specification is 3.1.1. The immediately previous screen snapshot also demonstrates help provided by running the "HELP" command. We can see that there are several "documented shell commands" as well as even more "CQL help topics."

The previous screen snapshot demonstrated that the "help" command in cqlsh lists individual topics on which the help command can be specifically run. For example, the next screen snapshot demonstrates the output from running "help types" in cqlsh.

In this screen snapshot, we see CQL data types that are supported in cqlsh such as ascii, text/varchar, decimal, int, double, timestamp, list, set, and map.

Keyspaces in Cassandra

Keyspaces are significant in Cassandra. Although this post covers Cassandra 2.0, the Cassandra 1.0 documentation nicely explains keyspaces in Cassandra: "In Cassandra, the keyspace is the container for your application data, similar to a schema in a relational database. Keyspaces are used to group column families together. Typically, a cluster has one keyspace per application." This documentation goes on to explain that keyspaces are typically used to group column families by replication strategy. The next screenshot demonstrates creation of a keyspace in cqlsh and listing the available keyspaces.

The last screen snapshot included an example of using the command SELECT * FROM system.schema_keyspaces; to see the available keyspaces. When one just wants a list of the names of the available keyspaces without all of the other details, it is easy to use desc keyspaces as shown in the next screens snapshot.

Creating a Column Family ("Table")

With a keyspace created, a column family (or table) can be created. The next screen snapshot demonstrates using the newly created movies_keyspace with the use movies_keyspace; statement and then shows using the cqlsh command SOURCE (similar to using @ in SQL*Plus) to run an external file to create a table (column family). The screen snapshot demonstrates listing available tables with the desc tables command and listing specific details of a given table (MOVIES in this case) with the desc table movies command.

The above screen snapshot demonstrated running an external file called createMovie.cql using the SOURCE command. The code listing for the createMovie.cql file is shown next.

CREATE TABLE movies
(
   title varchar,
   year int,
   description varchar,
   PRIMARY KEY (title, year)
);
Inserting Data into and Querying from the Column Family

The next screen snapshot demonstrates how to insert data into the newly created column family/table [insert into movies_keyspace.movies (title, year, description) values ('Raiders of the Lost Ark', 1981, 'First of Indiana Jones movies.');]. The image also shows how to query the column family/table to see its contents [select * from movies].

Cassandra is NOT a Relational Database

Looking at the Cassandra Query Language (CQL) statements just shown might lead someone to believe that Cassandra is a relational database. However, CQL is a relational-like feature added to Cassandra 2.0 intended to help people with SQL expertise more readily adopt Cassandra. Similarly, triggers are being added in 2.0/2.1. Despite the presence of these Cassandra features intended to make it easier for relational database users to adopt Cassandra, there are significant differences between Cassandra and a relational database.

The Cassandra Data Model is a page in the Apache Cassandra 1.0 Documentation that describes some key differences between Cassandra and relational databases. These include:

  • "Cassandra does not enforce relationships between column families the way that relational databases do between tables"
    • There are no foreign keys in Cassandra and there is no "joining" in Cassandra.
    • Denormalization is not a shameful thing in Cassandra and is actually welcomed to a certain degree.
  • Cassandra "table" (column family) modeling should be done based on expected queries to be used.
Conclusion

I've just begun to get my feet wet with Cassandra but look forward to learning more about it. This post has focused on some basics of acquiring and starting to use Cassandra. There is much to learn about Cassandra and some "deeper" topics that really need to be understood to truly appreciate Cassandra include Cassandra architecture (and here), Cassandra Data Modeling (and here), and Cassandra's strengths and weaknesses.