Monday, July 7, 2014

Custom Cassandra Data Types

In the blog post Connecting to Cassandra from Java, I mentioned that one advantage of Cassandra being implemented in Java is that Java developers can create custom Cassandra data types. In this post, I outline how to do this in greater detail.

Cassandra has numerous built-in data types, but there are situations in which one may want to add a custom type. Cassandra custom data types are implemented in Java by extending the org.apache.cassandra.db.marshal.AbstractType class. The class that extends this must ultimately implement three methods with the following signatures:

public ByteBuffer fromString(final String) throws MarshalException
public TypeSerializer getSerializer()
public int compare(Object, Object)

This post's example implementation of AbstractType is shown in the next code listing.

UnitedStatesState.java - Extends AbstractType
package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.serializers.MarshalException;
import org.apache.cassandra.serializers.TypeSerializer;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Representation of a state in the United States that
 * can be persisted to Cassandra database.
 */
public class UnitedStatesState extends AbstractType
{
   public static final UnitedStatesState instance = new UnitedStatesState();

   @Override
   public ByteBuffer fromString(final String stateName) throws MarshalException
   {
      return getStateAbbreviationAsByteBuffer(stateName);
   }

   @Override
   public TypeSerializer getSerializer()
   {
      return UnitedStatesStateSerializer.instance;
   }

   @Override
   public int compare(Object o1, Object o2)
   {
      if (o1 == null && o2 == null)
      {
         return 0;
      }
      else if (o1 == null)
      {
         return 1;
      }
      else if (o2 == null)
      {
         return -1;
      }
      else
      {
         return o1.toString().compareTo(o2.toString());
      }
   }

   /**
    * Provide standard two-letter abbreviation for United States
    * state whose state name is provided.
    *
    * @param stateName Name of state whose abbreviation is desired.
    * @return State's abbreviation as a ByteBuffer; will return "UK"
    *    if provided state name is unexpected value.
    */
   private ByteBuffer getStateAbbreviationAsByteBuffer(final String stateName)
   {
      final String upperCaseStateName = stateName != null ? stateName.toUpperCase().replace(" ", "_") : "UNKNOWN";
      String abbreviation;
      try
      {
         abbreviation =  upperCaseStateName.length() == 2
                       ? State.fromAbbreviation(upperCaseStateName).getStateAbbreviation()
                       : State.valueOf(upperCaseStateName).getStateAbbreviation();
      }
      catch (Exception exception)
      {
         abbreviation = State.UNKNOWN.getStateAbbreviation();
      }
      return ByteBuffer.wrap(abbreviation.getBytes(StandardCharsets.UTF_8));
   }
}

The above class listing references the State enum, which is shown next.

State.java
package dustin.examples.cassandra.cqltypes;

/**
 * Representation of state in the United States.
 */
public enum State
{
   ALABAMA("Alabama", "AL"),
   ALASKA("Alaska", "AK"),
   ARIZONA("Arizona", "AZ"),
   ARKANSAS("Arkansas", "AR"),
   CALIFORNIA("California", "CA"),
   COLORADO("Colorado", "CO"),
   CONNECTICUT("Connecticut", "CT"),
   DELAWARE("Delaware", "DE"),
   DISTRICT_OF_COLUMBIA("District of Columbia", "DC"),
   FLORIDA("Florida", "FL"),
   GEORGIA("Georgia", "GA"),
   HAWAII("Hawaii", "HI"),
   IDAHO("Idaho", "ID"),
   ILLINOIS("Illinois", "IL"),
   INDIANA("Indiana", "IN"),
   IOWA("Iowa", "IA"),
   KANSAS("Kansas", "KS"),
   LOUISIANA("Louisiana", "LA"),
   MAINE("Maine", "ME"),
   MARYLAND("Maryland", "MD"),
   MASSACHUSETTS("Massachusetts", "MA"),
   MICHIGAN("Michigan", "MI"),
   MINNESOTA("Minnesota", "MN"),
   MISSISSIPPI("Mississippi", "MS"),
   MISSOURI("Missouri", "MO"),
   MONTANA("Montana", "MT"),
   NEBRASKA("Nebraska", "NE"),
   NEVADA("Nevada", "NV"),
   NEW_HAMPSHIRE("New Hampshire", "NH"),
   NEW_JERSEY("New Jersey", "NJ"),
   NEW_MEXICO("New Mexico", "NM"),
   NORTH_CAROLINA("North Carolina", "NC"),
   NORTH_DAKOTA("North Dakota", "ND"),
   NEW_YORK("New York", "NY"),
   OHIO("Ohio", "OH"),
   OKLAHOMA("Oklahoma", "OK"),
   OREGON("Oregon", "OR"),
   PENNSYLVANIA("Pennsylvania", "PA"),
   RHODE_ISLAND("Rhode Island", "RI"),
   SOUTH_CAROLINA("South Carolina", "SC"),
   SOUTH_DAKOTA("South Dakota", "SD"),
   TENNESSEE("Tennessee", "TN"),
   TEXAS("Texas", "TX"),
   UTAH("Utah", "UT"),
   VERMONT("Vermont", "VT"),
   VIRGINIA("Virginia", "VA"),
   WASHINGTON("Washington", "WA"),
   WEST_VIRGINIA("West Virginia", "WV"),
   WISCONSIN("Wisconsin", "WI"),
   WYOMING("Wyoming", "WY"),
   UNKNOWN("Unknown", "UK");

   private String stateName;

   private String stateAbbreviation;

   State(final String newStateName, final String newStateAbbreviation)
   {
      this.stateName = newStateName;
      this.stateAbbreviation = newStateAbbreviation;
   }

   public String getStateName()
   {
      return this.stateName;
   }

   public String getStateAbbreviation()
   {
      return this.stateAbbreviation;
   }

   public static State fromAbbreviation(final String candidateAbbreviation)
   {
      State match = UNKNOWN;
      if (candidateAbbreviation != null && candidateAbbreviation.length() == 2)
      {
         final String upperAbbreviation = candidateAbbreviation.toUpperCase();
         for (final State state : State.values())
         {
            if (state.stateAbbreviation.equals(upperAbbreviation))
            {
               match = state;
            }
         }
      }
      return match;
   }
}
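
As a quick, hypothetical illustration of the enum's lookup behavior (this demo class is not part of the original example), both lookup paths and the fallback to UNKNOWN can be exercised as follows:

StateLookupDemo.java - Hypothetical Demonstration of State Lookups
package dustin.examples.cassandra.cqltypes;

public class StateLookupDemo
{
   public static void main(final String[] arguments)
   {
      // Two-letter abbreviations are matched case-insensitively.
      System.out.println(State.fromAbbreviation("ny"));  // NEW_YORK
      // Full names are looked up as enum constant names.
      System.out.println(State.valueOf("NEW_YORK").getStateAbbreviation());  // NY
      // Unrecognized abbreviations fall back to UNKNOWN rather than throwing.
      System.out.println(State.fromAbbreviation("ZZ"));  // UNKNOWN
   }
}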

We also need to provide an implementation of the TypeSerializer interface that is returned by the getSerializer() method shown above. A class implementing TypeSerializer is typically most easily written by extending one of the numerous existing implementations of TypeSerializer that Cassandra provides in the org.apache.cassandra.serializers package. In my example, the custom serializer extends AbstractTextSerializer, and the only method I need to add has the signature public void validate(final ByteBuffer bytes) throws MarshalException. Note that both of my custom classes provide a reference to an instance of themselves via static access. Here is the class that implements TypeSerializer by extending AbstractTextSerializer:

UnitedStatesStateSerializer.java - Implements TypeSerializer
package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.serializers.AbstractTextSerializer;
import org.apache.cassandra.serializers.MarshalException;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Serializer for UnitedStatesState.
 */
public class UnitedStatesStateSerializer extends AbstractTextSerializer
{
   public static final UnitedStatesStateSerializer instance = new UnitedStatesStateSerializer();

   private UnitedStatesStateSerializer()
   {
      super(StandardCharsets.UTF_8);
   }

   /**
    * Validates provided ByteBuffer contents to ensure they can
    * be modeled in the UnitedStatesState Cassandra/CQL data type.
    * This allows for a full state name to be specified or for its
    * two-letter abbreviation to be specified and either is considered
    * valid.
    *
    * @param bytes ByteBuffer whose contents are to be validated.
    * @throws MarshalException Thrown if provided data is invalid.
    */
   @Override
   public void validate(final ByteBuffer bytes) throws MarshalException
   {
      try
      {
         final String stringFormat = new String(bytes.array(), StandardCharsets.UTF_8).toUpperCase();
         // An invalid full state name causes State.valueOf to throw (caught below),
         // while a two-letter input falls back to State.UNKNOWN via fromAbbreviation.
         final State state =  stringFormat.length() == 2
                            ? State.fromAbbreviation(stringFormat)
                            : State.valueOf(stringFormat);
      }
      catch (Exception exception)
      {
         throw new MarshalException("Invalid model cannot be marshaled as UnitedStatesState.");
      }
   }
}

With the classes for creating a custom CQL data type written, they need to be compiled into .class files and archived in a JAR file. This process (compiling with javac -cp "C:\Program Files\DataStax Community\apache-cassandra\lib\*" -sourcepath src -d classes src\dustin\examples\cassandra\cqltypes\*.java and archiving the generated .class files into a JAR named CustomCqlTypes.jar with jar cvf CustomCqlTypes.jar *) is shown in the following screen snapshot.
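
For reference, the two commands from that process are shown below (the jar command is assumed to be run from within the classes directory containing the compiled .class files):

javac -cp "C:\Program Files\DataStax Community\apache-cassandra\lib\*" -sourcepath src -d classes src\dustin\examples\cassandra\cqltypes\*.java
jar cvf CustomCqlTypes.jar *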

The JAR with the class definitions of the custom CQL type classes needs to be placed in the Cassandra installation's lib directory as demonstrated in the next screen snapshot.

With the JAR containing the custom CQL data type classes implementations in the Cassandra installation's lib directory, Cassandra should be restarted so that it will be able to "see" these custom data type definitions.

The next code listing shows a Cassandra Query Language (CQL) statement for creating a table using the new custom type dustin.examples.cassandra.cqltypes.UnitedStatesState.

createAddress.cql
CREATE TABLE us_address
(
   id uuid,
   street1 text,
   street2 text,
   city text,
   state 'dustin.examples.cassandra.cqltypes.UnitedStatesState',
   zipcode text,
   PRIMARY KEY(id)
);

The next screen snapshot demonstrates the results of running the createAddress.cql code above by describing the created table in cqlsh.

The above screen snapshot demonstrates that the custom type dustin.examples.cassandra.cqltypes.UnitedStatesState is the type for the state column of the us_address table.

A new row can be added to the us_address table with a normal INSERT. For example, the following screen snapshot demonstrates inserting an address with the command INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'New York', '10118');:

Note that while the INSERT statement inserted "New York" for the state, it is stored as "NY".

If I run an INSERT statement in cqlsh using an abbreviation to start with (INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'NY', '10118');), it still works as shown in the output shown below.

In my example, an invalid state does not prevent an INSERT from occurring, but instead persists the state as "UK" (for unknown) [see the implementation of this in UnitedStatesState.getStateAbbreviationAsByteBuffer(String)].

One of the first advantages that comes to mind for implementing a custom CQL datatype in Java is the ability to employ behavior similar to that provided by check constraints in relational databases. For example, in this post, my sample ensured that any state value entered for a new row was either one of the fifty states of the United States, the District of Columbia, or "UK" for unknown. No other values can be inserted into that column.

Another advantage of the custom data type is the ability to massage the data into a preferred form. In this example, I changed every state name to its standard two-letter uppercase abbreviation. In other cases, I might want to always store strings in uppercase or always in lowercase, or map finite sets of strings to numeric values. The custom CQL datatype allows for customized validation and representation of values in the Cassandra database.

Conclusion

This post has been an introductory look at implementing custom CQL datatypes in Cassandra. As I play with this concept more and try different things out, I hope to write another blog post on some more subtle observations that I make. As this post shows, it is fairly easy to write and use a custom CQL datatype, especially for Java developers.

Friday, June 27, 2014

Next Generation Project Valhalla Proposed

Earlier this week, Brian Goetz proposed Project Valhalla on an OpenJDK mailing list. Goetz's proposal states:

In accordance with the OpenJDK guidelines, this project will provide a venue to explore and incubate advanced Java VM and Language feature candidates such as Value Types, Generic Specialization, enhanced volatiles (and possibly other related topics, such as reified generics.)

Note that Goetz's proposal should not be confused with the 1997 version of Oracle's Project Valhalla that is described as "the code name for Oracle's flexible Java development environment for building, debugging and deploying component-based applications for the network computing platform." This "other" Project Valhalla is also described in the 15 December 1997 edition of InfoWorld: "Code-named Project Valhalla, the Oracle AppBuilder for Java is slated to ship in the first quarter of 1998. The Java tool, which is based on Borland's JBuilder Java IDE, is designed for writing n-tier, thin client applications."

The "other" Project Valhalla became AppBuilder for Java and was included with JDeveloper Suite (15 April 1998). The Oracle Application Server (4.0 at the time) that was part of this same JDeveloper Suite has new company at Oracle with Oracle's acquiring WebLogic with the BEA acquisition and acquiring GlassFish with the Sun acquisition and JDeveloper is an Oracle-provided (free in the "free beer" sense, but not open source) Java IDE.

Returning to the "next generation" of Project Valhalla, the voting on this Goetz proposal closes on 7 July 2014. Several have already expressed "yes" in replies to Goetz's original e-mail post. In that forum, Patrick Wright asked, "how is this/will this be different from the Da Vinci Project/MLVM?" and John Rose answered that "the charter for Da Vinci is to incubate JVM features for languages other than Java ... or for general language support without associated Java language changes" while "Valhalla is intended to support the evolution of Java itself."

I was excited about Project Coin when it was announced and enjoyed reading and hearing about its progress, but Project Valhalla would be much more ambitious and much more exciting if it is investigated and some or all of the proposed features are implemented.

Wednesday, June 25, 2014

Book Review: Penetration Testing with the Bash Shell

This post is my review of the Packt Publishing book Penetration Testing with the Bash shell by Keith Makan. I think it's important to emphasize its subtitle: "Make the most of the Bash shell and Kali Linux's command-line-based security assessment tools." As the main title and subtitle suggest, this book is about penetration testing with Bash on Kali Linux. The book has approximately 125 pages covering 5 chapters.

The Preface of Penetration Testing with the Bash shell begins with an articulation of why it is important for penetration testers to be familiar with command-line tools. The Preface also provides brief summaries of each of the five chapters in the book and some of the command-line tools covered in each chapter.

Under the "What you need for this book" section of Penetration Testing with the Bash shell's Preface, Makan writes, "The only software requirement for this book is the Kali Linux operating system, which you can download in the ISO format from http://www.kali.org." You don't necessarily need to use Kali Linux as many of the bash examples and described command-line tools are also available with or for other Linux distributions. The "Who this book is for" section of the Preface states, "Command line hacking is a book for anyone interested in learning how to wield their Kali Linux command lines to perform effective penetration testing."

Chapter 1: Getting to Know Bash

The initial chapter explains that "throughout the book, the bash environment or the host operating system that will be discussed will be Kali Linux" and that "Kali Linux is a distribution adapted from Debian." The chapter then goes on to explain, with text and examples, Bash concepts such as man pages and bash commands such as cd, ls, pwd, and find. The chapter also describes using Linux redirection (output and input) and pipes before concluding with a discussion of grep and regular expressions. As the author suggests, this chapter could generally be skipped by developers familiar with Linux, but it would be must-read information for those new to Linux. Although Kali Linux is used for the examples, I didn't see anything in this initial chapter that seemed specific to Kali Linux.

Chapter 2: Customizing Your Shell

The second chapter of Penetration Testing with the Bash shell begins with coverage of terminal control sequences to change appearance of text in the terminal. The chapter builds upon this information to demonstrate customization of the terminal prompt. Chapter 2 also introduces aliases, customizing command history, and customizing tab completion. Like the first chapter, this second chapter covers additional general Bash concepts rather than specifics of penetration testing.

Chapter 3: Network Reconnaissance

Chapter 3 of Penetration Testing with the Bash shell transitions from coverage of Linux and bash to "move on to using the shell and the Kali Linux command-line utilities to collect information about the networks you find yourself in." The chapter examines tools such as whois, dig, dnsmap, Nmap, and arping.

Chapter 4: Exploitation and Reverse Engineering

Penetration Testing with the Bash shell's fourth chapter focuses on reverse engineering and tools that "may enable you to discover memory corruption, code injection, and general data- or file-handling flaws that may be used to instantiate arbitrary code execution vulnerabilities." Tools covered in this chapter include Metasploit (including command-line msfcli, mixing msfcli with bash commands and other command line tools, msfpayload, Meterpreter), objdump, and GDB.

Chapter 5: Network Exploitation and Monitoring

The fifth and final chapter of Penetration Testing with the Bash shell focuses "on the network exploitation available in Kali Linux and how to take advantage of it in the modern bash shell environment." This relatively longer chapter begins with discussion of "Media Access Control (MAC) spoofing" and "Address Resolution Protocol (ARP) abuse." The chapter looks at tools such as macchanger, ifconfig, and arpspoof.

Chapter 5 describes man-in-the-middle (MITM) attacks and provides a detailed introduction to ettercap. The chapter also explains server interrogation and describes tools for Simple Network Management Protocol (SNMP) interrogation (snmpwalk and Metasploit's snmp_enum and snmp_login) and for Simple Mail Transfer Protocol (SMTP) interrogation (smtp-user-enum).

There is a section in Chapter 5 on brute force authentication that demonstrates use of Medusa. The section of Chapter 5 on traffic filtering demonstrates use of TCPDump. SSLyze is demonstrated as a tool for "assess[ing] SSL/TLS implementations" and SkipFish and Arachni are demonstrated as tools for scanning web pages and web applications for vulnerabilities.

General Observations
  • The electronic version of Penetration Testing with the Bash shell that I reviewed included numerous colored screen snapshots.
  • I liked the "Further Reading" sections at the end of each chapter that provided links to online information with more detail about subjects covered in the chapter.
  • The first two chapters of Penetration Testing with the Bash shell have very little information specific to penetration testing but provide an overview of some bash features used in later chapters. The third, fourth, and fifth chapters are heavily focused on penetration testing and build upon the bash basics covered in the first two chapters.
  • Penetration Testing with the Bash shell provides background information on various security and assessment concepts as it illustrates the tools available that are related to those concepts.
  • There are some awkward sentence structures in Penetration Testing with the Bash shell and several of the sentences are too long for my taste (especially early in the book), but I was generally able to make out the intent of the writing. An example of the occasional awkward but understandable text is on the first page of the first chapter: "Why are discussing the bash shell?"
  • Although this book does mention Kali Linux specifically and frequently, the majority of the described tools are available as built-in tools or as separately installable tools on other flavors of Linux. With little or no effort, one could run the examples in different Linux distributions. It's also worth pointing out that the author repeatedly mentions that several commands run with no special effort in Kali Linux because things are run as root. In other Linux distributions, you may need to use sudo to apply these tools.
  • Although I am fairly familiar with bash, I still learned a few new things from the first two chapters (primarily the second chapter). I am far less experienced with penetration testing and learned quite a bit from the other three chapters.

Conclusion

Penetration Testing with the Bash shell: Make the most of the Bash shell and Kali Linux's command-line-based security assessment tools outlines how bash in general and Kali Linux in particular provide command-line security assessment tools. The book introduces the tools and how to apply them and explains security-related concepts along the way.

Monday, June 23, 2014

I Don't Think That Software Development Word Means What You Think It Means

There are several terms used inappropriately or incorrectly in software development. In this post, I look at some of these terms and the negative consequences of misuse of these terms.

"agile"

The Agile Manifesto started a movement that resonated with many software developers frustrated with inefficiencies and inadequacies of prevalent software development methodologies. Unfortunately, the relatively simple concepts of the Agile Manifesto were interpreted, changed, evangelized, commercialized, and sold in so many different ways that it became difficult to uniquely describe agile. To some, agile became synonymous with "no documentation" and to others agile meant going straight to coding without any process. So many disparate methodologies and practices are now sold as agile that it's become increasingly difficult to describe what makes something agile or not.

There have been several negative consequences of the multiple interpretations of what agile means. Implementation of so-called agile practices without an understanding of agile can lead to failures that are blamed on agile but may have little or nothing to do with agile. Unrealistic expectations of agile and what it can do for development lead to inevitable disappointment, as there still is no silver bullet. With so many different interpretations in circulation, it is difficult to help new developers, developers new to agile, managers, customers, and other stakeholders understand what agile is and how it may or may not be appropriate for them. I was at a presentation by an agile enthusiast several years ago when he suggested that agile was anything that was successful and was not anything that is not successful.

For me, "agile" means processes and approaches that closely match the values outlined in the Agile Manifesto (individuals and interactions, working software, customer collaboration, and responding to change). There are other approaches and methodologies out there that may be useful and positive, but if they aren't inspired by these values, it is difficult for me to hear them called "agile."

"REST"

Roy Fielding's dissertation Architectural Styles and the Design of Network-based Software Architectures popularized the term Representational State Transfer (REST). Unfortunately, many have used REST and HTTP interchangeably and in the process have muddled the conversations about both the REST architectural style and the Hypertext Transfer Protocol (HTTP).

It is easy to see from a historical perspective why REST and HTTP are often treated interchangeably. REST embraced the functionality already provided by HTTP as a significant part of its architectural style at a time other popular architectural styles and frameworks were doing everything they could to hide or abstract away HTTP specifics details. REST leverages HTTP's stateless nature while others were trying to wrap HTTP with state. Although REST certainly played a major role in raising awareness of HTTP, REST is more than HTTP. I have found that many who think of REST and HTTP as one and the same don't appreciate the HATEOAS concept in REST. HATEOAS stands for Hypermedia as the Engine of Application State and refers to the concept of application state being embodied within the hypermedia exchanged between server and client rather than in the client.

"refactoring"

I've known of clients and managers filled with dread when they hear a developer state that he or she is going to "refactor" something. The reason is that "refactoring" too often means the developer plans to change the code structure and "improve" or "fix" behavior as part of this. Refactoring is supposed to consist of code improvements that do not affect the results of the software but lead to more maintainable code. Too many developers are lured into making other changes "while they are in the code" that change results. Even when such changes are for the better, they are not in the spirit of refactoring, and when they have broken existing functionality, they have cast "refactoring" in a bad light.

Comprehensive unit tests and other tests can help ensure that refactoring does not change any expected behavior, but developers should also clearly understand whether the goal is to maintain current functionality with improved code structure (refactor) or actually change/improve functionality and only use the term "refactoring" when appropriate to avoid confusion.
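
To make the distinction concrete, here is a minimal, hypothetical Java sketch of a true refactoring: duplicated logic is extracted into one place, but every caller observes exactly the same results before and after.

// Before refactoring: tax logic duplicated in two methods.
public double totalWithTax(final double subtotal)
{
   return subtotal + subtotal * 0.08;
}

public double shippingWithTax(final double shipping)
{
   return shipping + shipping * 0.08;
}

// After refactoring: duplication extracted; observable behavior unchanged.
private static final double TAX_RATE = 0.08;

private double withTax(final double amount)
{
   return amount + amount * TAX_RATE;
}

public double totalWithTax(final double subtotal)
{
   return withTax(subtotal);
}

public double shippingWithTax(final double shipping)
{
   return withTax(shipping);
}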

"premature optimization"

I generally agree with the principle behind the now famous quotation, "Premature optimization is the root of all evil." My interpretation of this is that one should not write less maintainable or less readable code in an attempt to achieve small expected performance gains. However, as I posted in When Premature Optimization Isn't, this term occasionally gets used as justification for avoiding good architecture and high-level design decisions just because those decisions also happen to carry a performance benefit. Some architectural decisions are difficult to change later, and performance does need to be accounted for in them. Similarly, even at the implementation level, there are times when better-performing code is just as readable and easy to write as less-performant code, and in those cases there is no good reason not to write the better-performing code.

NoSQL

The term NoSQL was an unfortunate one for a class of databases that probably would have been better labeled "Not Relational." As numerous "NoSQL databases" have adopted SQL (without adopting the relational model), alternative terms have been tried such as "Not Just SQL."

"open source"

The term "open source" has often led to confusion about whether the software in question is "free" in terms of "freedom" (libre/free speech) and/or "free" in terms of no monetary price (gratuit/free beer). There can even be confusion about the minor differences between "open source" and "free software." For me, "open source" means source code that I can look at and modify as necessary.

JavaScript

With JavaScript's increased popularity, its poorly chosen name does not seem to confuse as many people as it used to. However, I still do occasionally hear people who think that JavaScript must have some relationship to Java because "Java" is in both languages' names.

SLOC

I generally despise everything about the idea of source lines of code (SLOC). The appeal of SLOC is the pretense that somehow lines of code can be counted the same as beans and widgets. All lines of code are not created equal, and there are differences in lines of code across different languages, across different developers, and across different functionality. Some have even gone so far as to think that more SLOC is always a good thing, whereas I've found that more concise code with fewer lines can often be preferable. I have blogged before on lines of code and unintended consequences.

SOAP

This one is not a big deal in terms of negative consequences from its misuse, but it is worth noting that SOAP no longer stands for Simple Object Access Protocol.

JDBC

This is another one that doesn't really lead to any problems even though it technically has never stood for Java Database Connectivity and is not even an acronym. The fact that it does indeed relate to connecting to databases, that it is Java-related, and that it is so widely said to stand for Java Database Connectivity means that this misuse of the term JDBC has no significant negative side effects. In fact, I suspect that Sun Microsystems folks intentionally wanted people to think of it as an acronym for Java Database Connectivity while explicitly stating that it was not an acronym because it allowed people to quickly understand what JDBC is via their awareness of ODBC.

Conclusion

The incorrect use of many of the terms discussed in this post could be described as largely pedantic, but misuse of a few of them can lead to miscommunication and general confusion. In some cases (such as "agile" and "refactoring"), the misuse of terms has led to negative experiences and soiled reputations for those terms. In other cases (such as using JDBC and SOAP as acronyms when they really are not acronyms), the confusion seems small and harmless as everyone discussing the falsely advertised "acronym" seems to understand what it implies.

Monday, June 2, 2014

Review: DZone's 2014 Guide to Mobile Development

DZone is launching the public release of its 2014 Guide to Mobile Development today. The 35-page PDF is intended to be used with the DZone Mobile Development Research site to "explore the mobile application development landscape, examining best-practice strategies and comparing tools and frameworks that accelerate mobile development."

For those desiring an "executive summary" of the state of mobile development as described in DZone's 2014 Guide to Mobile Development, a single page (page 3) contains a paragraph and five bullets summarizing what the guide covers along with four "key takeaways" from the mobile development research.

The "Key Research Findings" section (pages 4 and 5) includes summary text and colorful graphics to illustrate results of surveys of mobile developers. The findings include things such as types of mobile developers, types of mobile applications being developed, and common timeline for mobile app development.

Pages 6 and 7 of DZone's 2014 Guide to Mobile Development look at developing mobile apps as web apps, as native apps, and as hybrids of those. Pages 10 and 11 elaborate more on the "state" of apps that are native, web-based, and hybrid and present a table of 7 comparison factors for the three approaches.

Pages 14-15 of DZone's 2014 Guide to Mobile Development discuss issues related to integration of mobile applications with back-end enterprises. Pages 21-22 discuss perceived performance versus actual performance metrics.

The 23rd page of DZone's 2014 Guide to Mobile Development features a "Mobile Application Development Checklist." Pages 24-34 are the "Solutions Directory," and it is this section that may be of most interest to those trying to decide on tools, frameworks, and platforms to use in their new mobile development projects. Each "solution" has a brief description listed along with certain characteristics (the set of characteristics depends on whether it is a framework, a mobile application development platform [MADP], or both).

The DZone 2014 Guide to Mobile Development also includes several pages of full-page advertisements as well as pages split in half with the top half containing text about a mobile development issue and the bottom half containing an advertisement for a related solution addressing the need discussed in text. These half-page text sections and associated advertisements are typically written by "research partners" (Progress/Progress Pacific, Outsystems/Outsystems Platform, ICEsoft/ICEmobile, Telerik/Telerik Platform, and New Relic Mobile Monitoring).

The DZone 2014 Guide to Mobile Development is a polished product that will likely be of most benefit to those considering starting mobile application development or those just entering mobile application development. Although it contains ideas likely to interest even more experienced mobile application developers, much of it is introductory in nature. Experienced mobile developers who want to see what others are doing will also find the research findings interesting and may find the guide useful if they are looking for an alternate framework or MADP to adopt. This report is not highly detailed, but instead focuses on high-level trends, survey results, and product offerings that affect mobile development.

The sections of DZone's 2014 Guide to Mobile Development that I found most interesting are the Summary and Key Takeaways, Key Research Findings, Cross-Platform Problems and Solutions, The State of Native vs. Hybrid vs. Web, the checklist, and the Solutions Directory. Reading this report has helped solidify some of my opinions by providing background and support, has helped me increase my understanding of a couple things, and helped me to think about some common mobile development issues from a different perspective. I also was able to read about some frameworks and MADPs that I had not previously been aware of. In short, this report provides a succinct and well presented summary of the overall current state of mobile development.

Tuesday, May 27, 2014

Connecting to Cassandra from Java

In my post Hello Cassandra, I looked at downloading the Cassandra NoSQL database and using cqlsh to connect to a Cassandra database. In this post, I look at the basics of connecting to a Cassandra database from a Java client.

Although there are several frameworks available for accessing the Cassandra database from Java, I will use the DataStax Java Driver in this post. The DataStax Java Driver for Apache Cassandra is available on GitHub. The datastax/java-driver GitHub project page states that it is a "Java client driver for Apache Cassandra" that "works exclusively with the Cassandra Query Language version 3 (CQL3)" and is "licensed under the Apache License, Version 2.0."

The Java Driver 2.0 for Apache Cassandra page provides a high-level overview and architectural details about the driver. Its Writing Your First Client section provides code listings and explanations regarding connecting to Cassandra with the Java driver and executing CQL statements from Java code. The code listings in this post are adaptations of those examples applied to my example cases.

The Cassandra Java Driver has several dependencies. The Java Driver 2.0 for Apache Cassandra documentation includes a page called Setting up your Java development environment that outlines the Java Driver 2.0's dependencies: cassandra-driver-core-2.0.1.jar (datastax/java-driver 2.0), netty-3.9.0-Final.jar (netty direct), guava-16.0.1.jar (Guava 16 direct), metrics-core-3.0.2.jar (Metrics Core), and slf4j-api-1.7.5.jar (slf4j direct). I also found that I needed to place the LZ4 library (which provides LZ4Factory) and snappy-java on the classpath.
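
As an illustration only (the flat JAR layout shown here is an assumption, not something prescribed by the driver's documentation), compiling a client class against these dependencies on Windows might look something like this:

javac -cp "cassandra-driver-core-2.0.1.jar;netty-3.9.0-Final.jar;guava-16.0.1.jar;metrics-core-3.0.2.jar;slf4j-api-1.7.5.jar" -sourcepath src -d classes src\com\marxmart\persistence\*.java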

The next code listing is of a simple class called CassandraConnector.

CassandraConnector.java
package com.marxmart.persistence;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.Session;

import static java.lang.System.out;

/**
 * Class used for connecting to Cassandra database.
 */
public class CassandraConnector
{
   /** Cassandra Cluster. */
   private Cluster cluster;

   /** Cassandra Session. */
   private Session session;

   /**
    * Connect to Cassandra Cluster specified by provided node IP
    * address and port number.
    *
    * @param node Cluster node IP address.
    * @param port Port of cluster host.
    */
   public void connect(final String node, final int port)
   {
      this.cluster = Cluster.builder().addContactPoint(node).withPort(port).build();
      final Metadata metadata = cluster.getMetadata();
      out.printf("Connected to cluster: %s\n", metadata.getClusterName());
      for (final Host host : metadata.getAllHosts())
      {
         out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
            host.getDatacenter(), host.getAddress(), host.getRack());
      }
      session = cluster.connect();
   }

   /**
    * Provide my Session.
    *
    * @return My session.
    */
   public Session getSession()
   {
      return this.session;
   }

   /** Close cluster. */
   public void close()
   {
      cluster.close();
   }
}

The above connecting class could be invoked as shown in the next code listing.

Code Using CassandraConnector
/**
 * Main function for demonstrating connecting to Cassandra with host and port.
 *
 * @param args Command-line arguments; first argument, if provided, is the
 *    host and second argument, if provided, is the port.
 */
public static void main(final String[] args)
{
   final CassandraConnector client = new CassandraConnector();
   final String ipAddress = args.length > 0 ? args[0] : "localhost";
   final int port = args.length > 1 ? Integer.parseInt(args[1]) : 9042;
   out.println("Connecting to IP Address " + ipAddress + ":" + port + "...");
   client.connect(ipAddress, port);
   client.close();
}

The example code in that last code listing specifies a default node of localhost and a default port of 9042. This port number is specified in the cassandra.yaml file located in the apache-cassandra/conf directory. The Cassandra 1.2 documentation has a page on The cassandra.yaml configuration file, which describes cassandra.yaml as "the main configuration file for Cassandra." Incidentally, another important configuration file in that same directory is cassandra-env.sh, which defines numerous JVM options for the Java-based Cassandra database.

For the examples in this post, I will be using a MOVIES table created with the following Cassandra Query Language (CQL):

createMovie.cql
CREATE TABLE movies
(
   title varchar,
   year int,
   description varchar,
   mmpa_rating varchar,
   dustin_rating varchar,
   PRIMARY KEY (title, year)
);

The above file can be executed within cqlsh with the command source 'C:\cassandra\cql\examples\createMovie.cql' (assuming that the file is placed in the specified directory, of course) and this is demonstrated in the next screen snapshot.

One thing worth highlighting here is that the columns that were created as varchar datatypes are described as text datatypes by the cqlsh describe command. Although I created this table directly via cqlsh, I also could have created the table in Java as shown in the next code listing and associated screen snapshot that follows the code listing.

Creating Cassandra Table with Java Driver
final String createMovieCql =
     "CREATE TABLE movies_keyspace.movies (title varchar, year int, description varchar, "
   + "mmpa_rating varchar, dustin_rating varchar, PRIMARY KEY (title, year))";
client.getSession().execute(createMovieCql);

The above code accesses an instance variable named client. A shell of the class in which that instance variable might exist is shown next.

Shell of MoviePersistence.java
package dustin.examples.cassandra;

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

import java.util.Optional;

import static java.lang.System.out;

/**
 * Handles movie persistence access.
 */
public class MoviePersistence
{
   private final CassandraConnector client = new CassandraConnector();

   public MoviePersistence(final String newHost, final int newPort)
   {
      out.println("Connecting to IP Address " + newHost + ":" + newPort + "...");
      client.connect(newHost, newPort);
   }

   /**
    * Close my underlying Cassandra connection.
    */
   private void close()
   {
      client.close();
   }
}

With the MOVIES table created as shown above (either by cqlsh or with Java client code), the next step is to manipulate data related to the table. The next code listing shows a method that could be used to write new rows to the MOVIES table.

Inserting with Cassandra Java Driver
/**
 * Persist provided movie information.
 *
 * @param title Title of movie to be persisted.
 * @param year Year of movie to be persisted.
 * @param description Description of movie to be persisted.
 * @param mmpaRating MMPA rating.
 * @param dustinRating Dustin's rating.
 */
public void persistMovie(
   final String title, final int year, final String description,
   final String mmpaRating, final String dustinRating)
{
   client.getSession().execute(
      "INSERT INTO movies_keyspace.movies (title, year, description, mmpa_rating, dustin_rating) VALUES (?, ?, ?, ?, ?)",
      title, year, description, mmpaRating, dustinRating);
}
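
A hypothetical invocation of this method (the two rating values here are purely illustrative) might look like the following:

final MoviePersistence moviePersistence = new MoviePersistence("localhost", 9042);
moviePersistence.persistMovie(
   "Raiders of the Lost Ark", 1981, "First of Indiana Jones movies.",
   "PG", "9/10");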

With the data inserted into the MOVIES table, we need to be able to query it. The next code listing shows one potential implementation for querying a movie by title and year.

Querying with Cassandra Java Driver
/**
 * Returns movie matching provided title and year.
 *
 * @param title Title of desired movie.
 * @param year Year of desired movie.
 * @return Desired movie if match is found; Optional.empty() if no match is found.
 */
public Optional<Movie> queryMovieByTitleAndYear(final String title, final int year)
{
   final ResultSet movieResults = client.getSession().execute(
      "SELECT * from movies_keyspace.movies WHERE title = ? AND year = ?", title, year);
   final Row movieRow = movieResults.one();
   final Optional<Movie> movie =
        movieRow != null
      ? Optional.of(new Movie(
           movieRow.getString("title"),
           movieRow.getInt("year"),
           movieRow.getString("description"),
           movieRow.getString("mmpa_rating"),
           movieRow.getString("dustin_rating")))
      : Optional.empty();
   return movie;
}
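
The Movie class used in the above listing is not shown in this post; a minimal, hypothetical sketch that is consistent with the constructor arguments passed in queryMovieByTitleAndYear might look like this:

Movie.java - Hypothetical Sketch
package dustin.examples.cassandra;

/**
 * Simple representation of a movie (hypothetical sketch; the actual
 * class backing this post's examples is not shown).
 */
public class Movie
{
   private final String title;
   private final int year;
   private final String description;
   private final String mmpaRating;
   private final String dustinRating;

   public Movie(
      final String newTitle, final int newYear, final String newDescription,
      final String newMmpaRating, final String newDustinRating)
   {
      this.title = newTitle;
      this.year = newYear;
      this.description = newDescription;
      this.mmpaRating = newMmpaRating;
      this.dustinRating = newDustinRating;
   }

   public String getTitle() { return this.title; }
   public int getYear() { return this.year; }
   public String getDescription() { return this.description; }
   public String getMmpaRating() { return this.mmpaRating; }
   public String getDustinRating() { return this.dustinRating; }
}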

If we need to delete data already stored in the Cassandra database, this is easily accomplished as shown in the next code listing.

Deleting with Cassandra Java Driver
/**
 * Deletes the movie with the provided title and release year.
 *
 * @param title Title of movie to be deleted.
 * @param year Year of release of movie to be deleted.
 */
public void deleteMovieWithTitleAndYear(final String title, final int year)
{
   final String deleteString = "DELETE FROM movies_keyspace.movies WHERE title = ? and year = ?";
   client.getSession().execute(deleteString, title, year);
}
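
Putting these methods together, a hypothetical end-to-end usage (assuming the Movie sketch above) might look like this:

final MoviePersistence moviePersistence = new MoviePersistence("localhost", 9042);
final Optional<Movie> movie =
   moviePersistence.queryMovieByTitleAndYear("Raiders of the Lost Ark", 1981);
movie.ifPresent(foundMovie -> out.println("Found: " + foundMovie.getTitle()));
moviePersistence.deleteMovieWithTitleAndYear("Raiders of the Lost Ark", 1981);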

As the examples in this blog post have shown, it's easy to access Cassandra from Java applications using the Java Driver. It is worth noting that Cassandra is written in Java. One advantage of this for Java developers is that many of Cassandra's configuration values are JVM options that Java developers are already familiar with. The cassandra-env.sh file in the Cassandra conf directory allows one to specify standard JVM options used by Cassandra (such as the heap sizing parameters -Xms, -Xmx, and -Xmn), HotSpot-specific JVM options (such as -XX:-HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath, garbage collection tuning options, and garbage collection logging options), options for enabling assertions (-ea), and options for exposing Cassandra for remote JMX management.

Speaking of Cassandra and JMX, Cassandra can be monitored via JMX as discussed in the "Monitoring using JConsole" section of Monitoring a Cassandra cluster. The book excerpt The Basics of Monitoring Cassandra also discusses using JMX to monitor Cassandra. Because Java developers are more likely to be familiar with JMX clients such as JConsole and VisualVM, this is an intuitive approach to monitoring Cassandra for Java developers.

Another advantage of Cassandra's Java roots is that Java classes used by Cassandra can be extended and Cassandra can be customized via Java. For example, custom data types can be implemented by extending the AbstractType class.

Conclusion

The Cassandra Java Driver makes it easy to access Cassandra from Java applications. Cassandra also features significant Java-based configuration and monitoring and can even be customized with Java.

Monday, May 19, 2014

Hello Cassandra

A colleague recently told me about several benefits of Cassandra and I decided to try it out. Apache Cassandra is described in A Quick Introduction to Apache Cassandra as "one of today’s most popular NoSQL-databases." The main page for Apache Cassandra states that the "Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance." Cassandra is being used by companies such as eBay, Netflix, Adobe, Reddit, Instagram, and Twitter. This post is a summary of steps for getting started with Cassandra.

Apache Cassandra can be downloaded from the main Apache Cassandra web page. The Download page states that "the latest stable release of Apache Cassandra is 2.0.7 (released on 2014-04-18)" and this is the version I will discuss and use in this post.

For this post, I downloaded and installed the DataStax Community Edition from Planet Cassandra Downloads. The DataStax Community 2.0.7 edition includes "The Most Stable and Recommended Version of Apache Cassandra for Production (2.0.7)." There are DataStax Community Edition downloads available for Mac OS X, Microsoft Windows, and several flavors of Linux.

The next screen snapshot shows the directory listing for the "bin" directory of the Apache Cassandra included with the DataStax Community Edition installation.

From that "bin" directory, the Cassandra server can be started simply by running the appropriate executable. In the case of this single Windows machine, that command is cassandra.bat and this step is illustrated in the next screen snapshot.

The interactive command-line tool cqlsh is also located in the Apache Cassandra "bin" subdirectory. This tool is similar to SQL*Plus for Oracle databases, mysql for MySQL databases, and psql for PostgreSQL. It allows one to enter various CQL (Cassandra Query Language) statements such as inserting new data and querying data. Starting cqlsh from the command line on a Windows machine is shown in the next screen snapshot.

There are several useful observations that can be made from the previous image. As the output from starting cqlsh shows, this version of Apache Cassandra is 2.0.7, this version of cqlsh is 4.1.1, and the relevant CQL specification is 3.1.1. The immediately previous screen snapshot also demonstrates the help provided by running the "HELP" command. We can see that there are several "documented shell commands" as well as even more "CQL help topics."

The previous screen snapshot demonstrated that the "help" command in cqlsh lists individual topics on which the help command can be specifically run. For example, the next screen snapshot demonstrates the output from running "help types" in cqlsh.

In this screen snapshot, we see CQL data types that are supported in cqlsh such as ascii, text/varchar, decimal, int, double, timestamp, list, set, and map.

Keyspaces in Cassandra

Keyspaces are significant in Cassandra. Although this post covers Cassandra 2.0, the Cassandra 1.0 documentation nicely explains keyspaces in Cassandra: "In Cassandra, the keyspace is the container for your application data, similar to a schema in a relational database. Keyspaces are used to group column families together. Typically, a cluster has one keyspace per application." This documentation goes on to explain that keyspaces are typically used to group column families by replication strategy. The next screenshot demonstrates creation of a keyspace in cqlsh and listing the available keyspaces.
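
For reference, a keyspace like the one used later in this post can be created with a CQL statement along these lines (the replication settings shown are an assumption appropriate for a single-node development setup):

CREATE KEYSPACE movies_keyspace
   WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};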

The last screen snapshot included an example of using the command SELECT * FROM system.schema_keyspaces; to see the available keyspaces. When one just wants a list of the names of the available keyspaces without all of the other details, it is easy to use desc keyspaces as shown in the next screen snapshot.

Creating a Column Family ("Table")

With a keyspace created, a column family (or table) can be created. The next screen snapshot demonstrates using the newly created movies_keyspace with the use movies_keyspace; statement and then shows using the cqlsh command SOURCE (similar to using @ in SQL*Plus) to run an external file to create a table (column family). The screen snapshot demonstrates listing available tables with the desc tables command and listing specific details of a given table (MOVIES in this case) with the desc table movies command.

The above screen snapshot demonstrated running an external file called createMovie.cql using the SOURCE command. The code listing for the createMovie.cql file is shown next.

CREATE TABLE movies
(
   title varchar,
   year int,
   description varchar,
   PRIMARY KEY (title, year)
);

Inserting Data into and Querying from the Column Family

The next screen snapshot demonstrates how to insert data into the newly created column family/table [insert into movies_keyspace.movies (title, year, description) values ('Raiders of the Lost Ark', 1981, 'First of Indiana Jones movies.');]. The image also shows how to query the column family/table to see its contents [select * from movies].
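
Formatted for easier reading, those two CQL statements are:

insert into movies_keyspace.movies (title, year, description)
   values ('Raiders of the Lost Ark', 1981, 'First of Indiana Jones movies.');

select * from movies;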

Cassandra is NOT a Relational Database

Looking at the Cassandra Query Language (CQL) statements just shown might lead someone to believe that Cassandra is a relational database. However, CQL is a relational-like layer intended to help people with SQL expertise more readily adopt Cassandra. Similarly, triggers are being added in 2.0/2.1. Despite the presence of these features intended to ease the transition for relational database users, there are significant differences between Cassandra and a relational database.

The Cassandra Data Model is a page in the Apache Cassandra 1.0 Documentation that describes some key differences between Cassandra and relational databases. These include:

  • "Cassandra does not enforce relationships between column families the way that relational databases do between tables"
    • There are no foreign keys in Cassandra and there is no "joining" in Cassandra.
    • Denormalization is not a shameful thing in Cassandra and is actually welcomed to a certain degree.
  • Cassandra "table" (column family) modeling should be done based on expected queries to be used.

Conclusion

I've just begun to get my feet wet with Cassandra but look forward to learning more about it. This post has focused on some basics of acquiring and starting to use Cassandra. There is much to learn about Cassandra and some "deeper" topics that really need to be understood to truly appreciate Cassandra include Cassandra architecture (and here), Cassandra Data Modeling (and here), and Cassandra's strengths and weaknesses.