Saturday, October 24, 2009

Java SourceVersion and Character

The SourceVersion class provides information on Java source versions and also illustrates some of the terminology covered in the Java Language Specification. In this blog posting, I look briefly at some of the more interesting observations one can make using this class, which was introduced with Java SE 6, together with the long-standing Character class.

The SourceVersion class provides several handy static methods for ascertaining details about the source version of the current Java runtime. The methods SourceVersion.latest() and SourceVersion.latestSupported() provide details regarding the latest source versions that can be "modeled" and "fully supported" respectively.

The following code snippet demonstrates these two methods in action.


// Assumes "import static java.lang.System.out;" and "import javax.lang.model.SourceVersion;"
out.println("Latest source version that can be modeled: ");
out.println("\tSourceVersion.latest(): " + SourceVersion.latest());
out.println("Latest source version fully supported by current execution environment ");
out.println("\tSourceVersion.latestSupported(): " + SourceVersion.latestSupported());


The output from running this code is shown next.



As the code example and corresponding output above indicate, the latest source version that can be modeled and the latest source version that is fully supported are both easily accessible. Although the SourceVersion class was introduced with Java SE 6, it has been built to support future versions of Java. Not only does the Javadoc documentation state that "additional source version constants will be added to model future releases of the language," but the SourceVersion.values() method also provides all supported version enums. A code example and associated output are shown next to demonstrate this method in action.


out.println("SourceVersion enum Values:");
final SourceVersion[] versions = SourceVersion.values();
for (final SourceVersion version : versions)
{
   out.println("\t" + version);
}




The Javadoc documentation tells us the meanings of the various enum values shown in the above output. Each represents a different "source version of the Java programming language" and the platform version it is associated with. As shown earlier, RELEASE_6 is associated with Java SE 6, RELEASE_5 with J2SE 5, RELEASE_4 with JDK 1.4, RELEASE_3 with JDK 1.3, RELEASE_2 with JDK 1.2, RELEASE_1 with JDK 1.1, and RELEASE_0 with "the original version." The Javadoc documentation for Java SE 7 indicates that SourceVersion.RELEASE_7 is supported in Java SE 7.
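
Because SourceVersion is an enum whose constants are declared from oldest to newest, its natural ordering can be used to compare versions. The following is only a minimal sketch of that idea; the class name SourceVersionComparison and the method supportsAtLeast are hypothetical names chosen for illustration, and the sketch assumes that future constants will continue to be added in release order.

```java
import javax.lang.model.SourceVersion;

class SourceVersionComparison
{
   /**
    * Hypothetical helper: indicates whether the current execution environment
    * fully supports at least the provided source version. Relies on the enum
    * constants being declared in release order (an assumption for future releases).
    */
   public static boolean supportsAtLeast(final SourceVersion required)
   {
      return SourceVersion.latestSupported().compareTo(required) >= 0;
   }

   public static void main(final String[] arguments)
   {
      System.out.println(
         "Fully supports RELEASE_5? " + supportsAtLeast(SourceVersion.RELEASE_5));
      System.out.println(
         "Fully supports RELEASE_6? " + supportsAtLeast(SourceVersion.RELEASE_6));
      // The latest version that can be modeled is never older than the latest
      // version that is fully supported.
      System.out.println(
         "latest() at or beyond latestSupported()? "
         + (SourceVersion.latest().compareTo(SourceVersion.latestSupported()) >= 0));
   }
}
```

A comparison like this avoids parsing version strings such as "RELEASE_6" by hand.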

The SourceVersion class provides three static methods that indicate whether a provided CharSequence is an identifier, a name, or a keyword: SourceVersion.isIdentifier(), SourceVersion.isName(), and SourceVersion.isKeyword(), respectively.

These methods allow one to determine whether a particular string is reserved as a keyword, whether it is syntactically a valid identifier at all, and whether a string that is a valid identifier is not a keyword and is thus a valid name. The isName() method returns true for a "syntactically valid name" that is not also a keyword or literal. The isKeyword() method indicates whether the provided string is one of the Java keywords or one of the literals true, false, and null.

The following code runs strings representing various combinations of these three categories through all three methods.


public static void printIdentifierTest(final String stringToBeTested)
{
   out.println(
        "Is '" + stringToBeTested + "' an identifier? "
      + SourceVersion.isIdentifier(stringToBeTested));
}

public static void printKeywordTest(final String stringToBeTested)
{
   out.println(
        "Is '" + stringToBeTested + "' a keyword? "
      + SourceVersion.isKeyword(stringToBeTested));
}

public static void printNameTest(final String stringToBeTested)
{
   out.println(
        "Can '" + stringToBeTested + "' be used as a name? "
      + SourceVersion.isName(stringToBeTested));
}

public static void printTests(final String stringToBeTested)
{
   out.println("\n=============== " + stringToBeTested + " ===============");
   printIdentifierTest(stringToBeTested);
   printKeywordTest(stringToBeTested);
   printNameTest(stringToBeTested);
}

public static void printTests(
   final String stringToBeTested,
   final String alternateHeaderString)
{
   out.println("\n=============== " + alternateHeaderString + " ===============");
   printIdentifierTest(stringToBeTested);
   printKeywordTest(stringToBeTested);
   printNameTest(stringToBeTested);
}

/**
 * Main function for demonstrating SourceVersion enum.
 *
 * @param arguments Command-line arguments: none expected.
 */
public static void main(final String[] arguments)
{
   final String dustinStr = "Dustin";
   printTests(dustinStr);
   final String dustinLowerStr = "dustin";
   printTests(dustinLowerStr);
   final String instanceOfStr = "instanceof";
   printTests(instanceOfStr);
   final String constStr = "const";
   printTests(constStr);
   final String gotoStr = "goto";
   printTests(gotoStr);
   final String trueStr = "true";
   printTests(trueStr);
   final String nullStr = "null";
   printTests(nullStr);
   final String weirdStr = "/#";
   printTests(weirdStr);
   final String tabStr = "\t";
   printTests(tabStr, "TAB (\\t)");
   final String classStr = "class";
   printTests(classStr);
   final String enumStr = "enum";
   printTests(enumStr);
   final String assertStr = "assert";
   printTests(assertStr);
   final String intStr = "int";
   printTests(intStr);
   final String numeralStartStr = "1abc";
   printTests(numeralStartStr);
   final String numeralEmbeddedStr = "abc1";
   printTests(numeralEmbeddedStr);
   final String dollarStartStr = "$dustin";
   printTests(dollarStartStr);
   final String underscoreStartStr = "_dustin";
   printTests(underscoreStartStr);
   final String spacesStartStr = " dustin";
   printTests(spacesStartStr, " dustin (space in front)");
   final String spacesInStr = "to be";
   printTests(spacesInStr);
}


When the above code is executed, the following output is generated.


=============== Dustin ===============
Is 'Dustin' an identifier? true
Is 'Dustin' a keyword? false
Can 'Dustin' be used as a name? true

=============== dustin ===============
Is 'dustin' an identifier? true
Is 'dustin' a keyword? false
Can 'dustin' be used as a name? true

=============== instanceof ===============
Is 'instanceof' an identifier? true
Is 'instanceof' a keyword? true
Can 'instanceof' be used as a name? false

=============== const ===============
Is 'const' an identifier? true
Is 'const' a keyword? true
Can 'const' be used as a name? false

=============== goto ===============
Is 'goto' an identifier? true
Is 'goto' a keyword? true
Can 'goto' be used as a name? false

=============== true ===============
Is 'true' an identifier? true
Is 'true' a keyword? true
Can 'true' be used as a name? false

=============== null ===============
Is 'null' an identifier? true
Is 'null' a keyword? true
Can 'null' be used as a name? false

=============== /# ===============
Is '/#' an identifier? false
Is '/#' a keyword? false
Can '/#' be used as a name? false

=============== TAB (\t) ===============
Is ' ' an identifier? false
Is ' ' a keyword? false
Can ' ' be used as a name? false

=============== class ===============
Is 'class' an identifier? true
Is 'class' a keyword? true
Can 'class' be used as a name? false

=============== enum ===============
Is 'enum' an identifier? true
Is 'enum' a keyword? true
Can 'enum' be used as a name? false

=============== assert ===============
Is 'assert' an identifier? true
Is 'assert' a keyword? true
Can 'assert' be used as a name? false

=============== int ===============
Is 'int' an identifier? true
Is 'int' a keyword? true
Can 'int' be used as a name? false

=============== 1abc ===============
Is '1abc' an identifier? false
Is '1abc' a keyword? false
Can '1abc' be used as a name? false

=============== abc1 ===============
Is 'abc1' an identifier? true
Is 'abc1' a keyword? false
Can 'abc1' be used as a name? true

=============== $dustin ===============
Is '$dustin' an identifier? true
Is '$dustin' a keyword? false
Can '$dustin' be used as a name? true

=============== _dustin ===============
Is '_dustin' an identifier? true
Is '_dustin' a keyword? false
Can '_dustin' be used as a name? true

=============== dustin (space in front) ===============
Is ' dustin' an identifier? false
Is ' dustin' a keyword? false
Can ' dustin' be used as a name? false

=============== to be ===============
Is 'to be' an identifier? false
Is 'to be' a keyword? false
Can 'to be' be used as a name? false



The above output demonstrates that a valid name must be a valid identifier that is not a keyword. As far as SourceVersion.isIdentifier() is concerned, every keyword is also a syntactically valid identifier, but not every identifier is a keyword. Some strings that are neither keywords nor reserved words are not identifiers at all because they do not meet the rules for Java identifiers.
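
The strings tested above are all simple (unqualified) candidates. The isName() method also accepts dot-separated qualified names, which is where it differs most visibly from isIdentifier(). The following is a minimal sketch of that difference; the class name QualifiedNameCheck and the method classify are hypothetical names used only for illustration.

```java
import javax.lang.model.SourceVersion;

class QualifiedNameCheck
{
   /** Hypothetical helper: summarizes the three SourceVersion checks for one string. */
   public static String classify(final String candidate)
   {
      return "identifier=" + SourceVersion.isIdentifier(candidate)
           + ", keyword=" + SourceVersion.isKeyword(candidate)
           + ", name=" + SourceVersion.isName(candidate);
   }

   public static void main(final String[] arguments)
   {
      final String[] candidates =
      {
         "java.util.List",   // every dot-separated segment is a non-keyword identifier
         "java.int.List",    // the middle segment is a keyword
         "java..util"        // contains an empty segment
      };
      for (final String candidate : candidates)
      {
         System.out.println("'" + candidate + "': " + classify(candidate));
      }
   }
}
```

Only the first candidate is reported as a name, and none of the three is reported as an identifier because the dot character cannot be part of an identifier.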

The examples above indicate that a name for a variable or other construct cannot begin with a numeral, but can begin with $ or _. Another way to determine this is to use the static method Character.isJavaIdentifierStart(char). The following code snippet demonstrates this along with the similar method Character.isJavaIdentifierPart(char), which returns true if the provided character can appear anywhere in the name other than as the first character.


public static void printTestForValidIdentifierCharacter(
   final char characterToBeTested)
{
   out.println(
        "Character '" + characterToBeTested
      + ( Character.isJavaIdentifierStart(characterToBeTested)
          ? "': VALID "
          : "': NOT VALID ")
      + "FIRST character and "
      + ( Character.isJavaIdentifierPart(characterToBeTested)
          ? "VALID "
          : "NOT VALID ")
      + "OTHER character in a Java name.");
   out.println(
        "\tType of '" + characterToBeTested + "': "
      + Character.getType(characterToBeTested));
}

public static void demonstrateCharacterJavaIdentifierStart()
{
   out.println("\nTEST FOR FIRST AND OTHER CHARACTERS IN A VALID JAVA NAME");
   printTestForValidIdentifierCharacter('A');
   printTestForValidIdentifierCharacter('a');
   printTestForValidIdentifierCharacter('1');
   printTestForValidIdentifierCharacter('\\');
   printTestForValidIdentifierCharacter('_');
   printTestForValidIdentifierCharacter('$');
   printTestForValidIdentifierCharacter('#');
   printTestForValidIdentifierCharacter('\n');
   printTestForValidIdentifierCharacter('\t');
}


The output from the above appears below.


TEST FOR FIRST AND OTHER CHARACTERS IN A VALID JAVA NAME
Character 'A': VALID FIRST character and VALID OTHER character in a Java name.
Type of 'A': 1
Character 'a': VALID FIRST character and VALID OTHER character in a Java name.
Type of 'a': 2
Character '1': NOT VALID FIRST character and VALID OTHER character in a Java name.
Type of '1': 9
Character '\': NOT VALID FIRST character and NOT VALID OTHER character in a Java name.
Type of '\': 24
Character '_': VALID FIRST character and VALID OTHER character in a Java name.
Type of '_': 23
Character '$': VALID FIRST character and VALID OTHER character in a Java name.
Type of '$': 26
Character '#': NOT VALID FIRST character and NOT VALID OTHER character in a Java name.
Type of '#': 24
Character '
': NOT VALID FIRST character and NOT VALID OTHER character in a Java name.
Type of '
': 15
Character ' ': NOT VALID FIRST character and NOT VALID OTHER character in a Java name.
Type of ' ': 15
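
The two Character methods shown above are enough to sketch a whole-string check that can be compared against SourceVersion.isIdentifier(). The following is a minimal sketch under that assumption, not a statement of how SourceVersion is implemented; the class name IdentifierCharacterCheck and the method hasIdentifierCharacters are hypothetical names used only for illustration.

```java
import javax.lang.model.SourceVersion;

class IdentifierCharacterCheck
{
   /**
    * Hypothetical helper: true if the first code point is a valid identifier
    * start and every remaining code point is a valid identifier part.
    */
   public static boolean hasIdentifierCharacters(final String candidate)
   {
      if (candidate.isEmpty())
      {
         return false;
      }
      final int first = candidate.codePointAt(0);
      if (!Character.isJavaIdentifierStart(first))
      {
         return false;
      }
      int index = Character.charCount(first);
      while (index < candidate.length())
      {
         final int codePoint = candidate.codePointAt(index);
         if (!Character.isJavaIdentifierPart(codePoint))
         {
            return false;
         }
         index += Character.charCount(codePoint);
      }
      return true;
   }

   public static void main(final String[] arguments)
   {
      final String[] candidates =
         {"abc1", "1abc", "$dustin", "_dustin", " dustin", "to be", "class"};
      for (final String candidate : candidates)
      {
         System.out.println(
              "'" + candidate + "': character check="
            + hasIdentifierCharacters(candidate)
            + ", SourceVersion.isIdentifier="
            + SourceVersion.isIdentifier(candidate));
      }
   }
}
```

For the sample strings used here, the two checks agree, including for the keyword "class", which is consistent with the earlier output in which keywords were reported as identifiers.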


Because the Character.getType(char) method has been with us for quite a while and predates the enum construct introduced with J2SE 5, this method returns a primitive int. One can refer to Java's Constant Field Values to determine what each of these constants stands for.

To make the above example's output a little more readable, I have added a simple "converter" method that converts the returned int to a more readable String. I have only added switch cases for the integers returned from my example, but one could add cases for all supported types represented by different integers.


public static String extractReadableStringFromJavaCharacterTypeInt(
   final int characterTypeInt)
{
   String characterType;
   switch (characterTypeInt)
   {
      case Character.CONNECTOR_PUNCTUATION :
         characterType = "Connector Punctuation";
         break;
      case Character.CONTROL :
         characterType = "Control";
         break;
      case Character.CURRENCY_SYMBOL :
         characterType = "Currency Symbol";
         break;
      case Character.DECIMAL_DIGIT_NUMBER :
         characterType = "Decimal Digit Number";
         break;
      case Character.LETTER_NUMBER :
         characterType = "Letter/Number";
         break;
      case Character.LOWERCASE_LETTER :
         characterType = "Lowercase Letter";
         break;
      case Character.OTHER_PUNCTUATION :
         characterType = "Other Punctuation";
         break;
      case Character.UPPERCASE_LETTER :
         characterType = "Uppercase Letter";
         break;
      default :
         characterType = "Unknown Character Type Integer: " + characterTypeInt;
   }
   return characterType;
}


When the integers returned from Character.getType(char) in the earlier example are run through this switch statement, the revised output appears as shown next.


TEST FOR FIRST AND OTHER CHARACTERS IN A VALID JAVA NAME
Character 'A': VALID FIRST character and VALID OTHER character in a Java name.
Type of 'A': Uppercase Letter
Character 'a': VALID FIRST character and VALID OTHER character in a Java name.
Type of 'a': Lowercase Letter
Character '1': NOT VALID FIRST character and VALID OTHER character in a Java name.
Type of '1': Decimal Digit Number
Character '\': NOT VALID FIRST character and NOT VALID OTHER character in a Java name.
Type of '\': Other Punctuation
Character '_': VALID FIRST character and VALID OTHER character in a Java name.
Type of '_': Connector Punctuation
Character '$': VALID FIRST character and VALID OTHER character in a Java name.
Type of '$': Currency Symbol
Character '#': NOT VALID FIRST character and NOT VALID OTHER character in a Java name.
Type of '#': Other Punctuation
Character '
': NOT VALID FIRST character and NOT VALID OTHER character in a Java name.
Type of '
': Control
Character ' ': NOT VALID FIRST character and NOT VALID OTHER character in a Java name.
Type of ' ': Control
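
The converter above covers only the character types that appear in this example. One possible way to cover the rest without writing every case by hand is to read the constant names from the Character class reflectively. The following is only a sketch of that alternative; the class name CharacterTypeNames and its methods are hypothetical, and it assumes that every public static final byte field of Character whose name does not start with DIRECTIONALITY_ is a general category constant.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

class CharacterTypeNames
{
   /** Hypothetical helper: maps each general category int to its constant name. */
   public static Map<Integer, String> buildCategoryNames()
   {
      final Map<Integer, String> names = new TreeMap<Integer, String>();
      for (final Field field : Character.class.getFields())
      {
         final int modifiers = field.getModifiers();
         final boolean isByteConstant =
               field.getType() == byte.class
            && Modifier.isStatic(modifiers)
            && Modifier.isFinal(modifiers);
         // Assumption: the remaining byte constants are the DIRECTIONALITY_ ones.
         if (isByteConstant && !field.getName().startsWith("DIRECTIONALITY_"))
         {
            try
            {
               names.put((int) field.getByte(null), field.getName());
            }
            catch (IllegalAccessException accessEx)
            {
               throw new IllegalStateException(accessEx);
            }
         }
      }
      return names;
   }

   /** Hypothetical helper: returns the constant name for a character's type. */
   public static String describe(final char character)
   {
      final String name = buildCategoryNames().get(Character.getType(character));
      return name != null ? name : "Unknown Character Type Integer";
   }

   public static void main(final String[] arguments)
   {
      for (final char character : new char[] {'A', 'a', '1', '\\', '_', '$', '#'})
      {
         System.out.println("Type of '" + character + "': " + describe(character));
      }
   }
}
```

This trades the readable labels of the switch statement for the raw constant names, such as UPPERCASE_LETTER, so the hand-written converter is still preferable when friendlier wording matters.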


The SourceVersion class is useful for dynamically determining information about the Java source code version and the keywords and valid names applicable for that version. The Character class also provides useful information on what a particular character's type is and whether or not that character can be used as the first character of a name or as any other character in a valid name.
