Saturday, March 17, 2018

Raw String Literals Coming to Java

It appears likely that "raw string literals" are coming to Java. JEP 326 ("Raw String Literals") started as Issue JDK-8196004 and was announced as a "new JEP candidate" on March 2. The JEP and associated issue point out that "Java remains one of a small group of contemporary programming languages that do not provide language-level support for raw strings." The JEP and associated issue specifically reference programming languages C, C++, C# ("verbatim"), Dart, Go, Groovy, Haskell, JavaScript, Kotlin, Perl, PHP, Python, R, Ruby, Scala and Swift and the "Unix tools" bash, grep, and sed that were "surveyed for their delimiters and use of raw and multi-line strings."

JEP 326's "Summary" provides an overview of the proposed Java raw string literals: "A raw string literal can span multiple lines of source code and does not interpret escape sequences, such as \n, or Unicode escapes, of the form \uXXXX." The "Motivations" section of this JEP adds, "This JEP proposes a new kind of literal, a raw string literal, which sets aside both Java escapes and Java line terminator specifications, to provide character sequences that under many circumstances are more readable and maintainable than the existing traditional string literal." JEP 326 does not introduce interpolation and, in fact, rules it out in its "Non-Goals" section: "Raw string literals do not directly support string interpolation. Interpolation may be considered in a future JEP."

Multi-line String literals have long been desired in Java. JEP 326 ("Raw String Literals") currently lists several examples of how raw string literals would make it easier to implement common things in Java and these example uses include multi-line strings, operating system file paths, regular expressions, relational database SQL statements, and polygot (Java+JavaScript).

The current version of JEP 326 states that Java's raw string literals will be denoted via the use of the "backtick" character (`), which is also described in the JEP as \u0060 (Unicode "Grave Accent"), "backquote", and "accent grave". I don't show any examples of the proposed syntax because the JEP already does a nice job of listing these proposed raw string literal examples alongside examples of traditional Java code needed to implement the same thing. This makes it easy to compare the required current syntax to what would be needed in the future to accomplish the same thing if raw string literals are supported.

Support for raw string literals in Java will provide nice convenience for Java developers wishing to write more readable code to support use cases like those described in the JEP. It will provide similar advantages to libraries and even to the JDK code. The core-lib-devs mailing list post "Raw String Literal Library Support" [JDK-8196005] starts a "discussion with regards to RSL library support." (The context of "library support" in this case is the JDK and RSL stands for Raw String Literal.).

In the referenced post Raw String Literal Library Support, Jim Laskey provides a list of methods to potentially add to String to take advantage of raw string literals. These ideas for kicking off discussion include "line support", enhancements to "trim" methods, "margin management", and "escape management". Some of these are facilitated by RSL while others are necessitated by RSL. The cited post provides multiple examples of each of these.

Issue JDK-8198986 points out that "a new JLS section is needed for raw string literals." This issue links to a currently proposed section to be added to the cited Java Language Specification.

Although JEP 326 is still just a "Candidate" and is not associated with a particular release of Java, recent work on it and recent discussion in mailing list posts seeking input related to it lead me to be cautiously optimistic that we'll see multi-line Java strings and other raw string literals coming to Java in a future release.


@DustinMarx said...

Brian Goetz has posted the message "Raw string literals -- where we are, how we got here" that "summarize[s] where we are, and how we got here" related to raw string literals "now that things have largely stabilized." The post discusses design and syntax decisions and why the particular decisions were made instead of proposed alternatives.

@DustinMarx said...

Two new messages on the amber-dev mailing list under the title "Raw String Literals (RSL) - indent stripping of multi-line strings and written by Jim Laskey and John Rose discuss low-level details related to raw string literal "'indent stripping' of multi-line strings."

@DustinMarx said...

In the message "[RSL] RSL update", Brian Goetz summarizes "where we are with respect to the multi-line aspect of Raw String Literals" (RSL).