Monday, January 18, 2010

Reproducing "code too large" Problem in Java

Code conventions and standard software development wisdom dictate that methods should not be too long: long methods are difficult to fully comprehend, difficult to read, difficult to unit test appropriately, and difficult to reuse. Because most Java developers strive to write highly modular code with small, highly cohesive methods, the "code too large" error in Java is not seen very often. When this error is seen, it is often in generated code.

In this blog post, I intentionally force this "code too large" error to occur. Why in the world would one intentionally do this? In this case, it is because I always understand things better when I tinker with them rather than just reading about them and because doing so gives me a chance to demonstrate Groovy, the Java Compiler API (Java SE 6), and javap.

It turns out that the magic number for the "code too large" error is 65535 bytes. This limit applies to the compiled byte code of a single method, not to the source code or to the .class file as a whole. Hand-writing a method large enough to compile to that much byte code would be tedious (and not worth the effort in my opinion). However, it is typically generated code that leads to this in the wild and so generation of code seems like the best approach to reproducing the problem. When I think of generic Java code generation, I think Groovy.

The Groovy script that soon follows generates a Java class that isn't very exciting. However, the class will have its main function be of an approximate size based on how many conditions I tell the script to create. This allows me to quickly try generating Java classes with different main() method sizes to ascertain when the main() becomes too large.

After the script generates the Java class, it also uses the Java Compiler API to automatically compile the newly generated Java class for me. The resultant .class file is placed in the same directory as the source .java file. The script, creatively named generateJavaClass.groovy, is shown next.

generateJavaClass.groovy

#!/usr/bin/env groovy

import javax.tools.ToolProvider

println "You're running the script ${System.getProperty('script.name')}"
if (args.length < 2)
{
println "Usage: generateJavaClass packageName className [baseDir] [numberOfConditionals]"
System.exit(-1)
}

// No use of "def" makes the variable available to entire script including the
// defined methods ("global" variables)

packageName = args[0]
packagePieces = packageName.tokenize(".") // Get directory names
def fileName = args[1].endsWith(".java") ? args[1] : args[1] + ".java"
def baseDirectory = args.length > 2 ? args[2] : System.getProperty("user.dir")
numberOfConditionals = args.length > 3 ? Integer.valueOf(args[3]) : 10

NEW_LINE = System.getProperty("line.separator")

// The setting up of the indentations shows off Groovy's easy feature for
// multiplying Strings and Groovy's tie of an overloaded * operator for Strings
// to the 'multiply' method. In other words, the "multiply" and "*" used here
// are really the same thing.
SINGLE_INDENT = ' '
DOUBLE_INDENT = SINGLE_INDENT.multiply(2)
TRIPLE_INDENT = SINGLE_INDENT * 3

def outputDirectoryName = createDirectories(baseDirectory)
def generatedJavaFile = generateJavaClass(outputDirectoryName, fileName)
compileJavaClass(generatedJavaFile)


/**
* Generate the Java class and write its source code to the output directory
* provided and with the file name provided. The generated class's name is
* derived from the provided file name.
*
* @param outDirName Name of directory to which to write Java source.
* @param fileName Name of file to be written to output directory (should include
* the .java extension).
* @return Fully qualified file name of source file.
*/
def String generateJavaClass(outDirName, fileName)
{
def className = fileName.substring(0,fileName.size()-5)
outputFileName = outDirName.toString() + File.separator + fileName
outputFile = new File(outputFileName)
outputFile.write "package ${packageName};${NEW_LINE.multiply(2)}"
outputFile << "public class ${className}${NEW_LINE}"
outputFile << "{${NEW_LINE}"
outputFile << "${SINGLE_INDENT}public static void main(final String[] arguments)"
outputFile << "${NEW_LINE}${SINGLE_INDENT}{${NEW_LINE}"
outputFile << DOUBLE_INDENT << 'final String someString = "Dustin";' << NEW_LINE
outputFile << buildMainBody()
outputFile << "${SINGLE_INDENT}}${NEW_LINE}"
outputFile << "}"
return outputFileName
}


/**
* Compile the provided Java source code file name.
*
* @param fileName Name of Java file to be compiled.
*/
def void compileJavaClass(fileName)
{
// Use the Java SE 6 Compiler API (JSR 199)
// http://java.sun.com/mailers/techtips/corejava/2007/tt0307.html#1
compiler = ToolProvider.getSystemJavaCompiler()

// The use of nulls in the call to JavaCompiler.run indicate use of defaults
// of System.in, System.out, and System.err.
int compilationResult = compiler.run(null, null, null, fileName)
if (compilationResult == 0)
{
println "${fileName} compiled successfully"
}
else
{
println "${fileName} compilation failed"
}
}


/**
* Create directories to which generated files will be written.
*
* @param baseDir The base directory used in which subdirectories for Java
* source packages will be generated.
*/
def String createDirectories(baseDir)
{
def outDirName = new StringBuilder(baseDir)
for (pkgDir in packagePieces)
{
outDirName << File.separator << pkgDir
}
outputDirectory = new File(outDirName.toString())
if (outputDirectory.exists() && outputDirectory.isDirectory())
{
println "Directory ${outDirName} already exists."
}
else
{
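isDirectoryCreated = outputDirectory.mkdirs() // Use mkdirs in case multiple directory levels must be created
// outputDirectoryName is declared with "def" at script level and is not visible here; use outDirName
println "Directory ${outDirName} ${isDirectoryCreated ? 'was' : 'was NOT'} created."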
}
return outDirName.toString()
}


/**
* Generate the body of generated Java class source code's main function.
*/
def String buildMainBody()
{
def str = new StringBuilder() << NEW_LINE
str << DOUBLE_INDENT << "if (someString == null || someString.isEmpty())" << NEW_LINE
str << DOUBLE_INDENT << "{" << NEW_LINE
str << TRIPLE_INDENT << 'System.out.println("The String is null or empty.");'
str << NEW_LINE << DOUBLE_INDENT << "}" << NEW_LINE
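// Note: Groovy's 0..n range is inclusive on both ends, so this loop emits
// numberOfConditionals + 1 "else if" blocks (a0 through a<numberOfConditionals>).
for (i in 0..numberOfConditionals)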
{
str << DOUBLE_INDENT << 'else if (someString.equals("a' << i << '"))' << NEW_LINE
str << DOUBLE_INDENT << "{" << NEW_LINE
str << TRIPLE_INDENT << 'System.out.println("You found me!");' << NEW_LINE
str << DOUBLE_INDENT << "}" << NEW_LINE
}
str << DOUBLE_INDENT << "else" << NEW_LINE
str << DOUBLE_INDENT << "{" << NEW_LINE
str << TRIPLE_INDENT << 'System.out.println("No matching string found.");'
str << NEW_LINE << DOUBLE_INDENT << "}" << NEW_LINE
return str
}


Because this script is intended primarily for generating Java code to learn more about the "code too large" error and to demonstrate a few things, I did not make it nearly as fancy as it could be. For one thing, I did not use Groovy's built-in Apache CLI support for handling command-line arguments as I have demonstrated in previous blog posts on using Groovy to check seventh grade homework.

Even though the script above does not apply Groovy's full potential, it still manages to demonstrate some Groovy niceties. I tried to add comments in the script describing some of these. These include features such as Groovy GDK's String.tokenize method and other useful Groovy String extensions.
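For readers who want to see the compilation step outside of Groovy, the following is a minimal Java sketch of the same JSR 199 idea. It is not the script's actual code. It assumes a JDK (not just a JRE) is running, it compiles source held in memory rather than a file on disk, and it collects errors with a DiagnosticCollector instead of letting them go to System.err. The class names CompileWithDiagnostics and StringSource are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class CompileWithDiagnostics {

    /** Hypothetical helper: exposes a String as a Java source "file". */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles the given source and returns its error messages (empty on success). */
    static List<String> compile(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // getSystemJavaCompiler() returns null when no compiler is available
            throw new IllegalStateException("No system Java compiler available");
        }
        String outputDir;
        try {
            // Write .class files to a temporary directory instead of the working directory
            outputDir = Files.createTempDirectory("generated-classes").toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, diagnostics,
                Arrays.asList("-d", outputDir),
                null,
                Collections.singletonList(new StringSource(className, source)));
        boolean succeeded = task.call();

        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add(d.getMessage(Locale.ENGLISH));
            }
        }
        if (!succeeded && errors.isEmpty()) {
            errors.add("compilation failed");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(compile("Hello", "public class Hello {}"));
    }
}
```

Collecting diagnostics this way makes it possible to inspect the compiler's complaint programmatically rather than only reading it from the console, which is the main difference from the simpler JavaCompiler.run call used in the script.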

When I run this script from the directory C:\java\examples\groovyExamples\javaClassGeneration with the arguments "dustin.examples" (package structure), "BigClass" (name of generated Java class), "." (current directory is the base directory for generated code), and "5" (number of conditionals to be in generated code), the script's output is shown here and in the following screen snapshot:


You're running the script C:\java\examples\groovyExamples\javaClassGeneration\generateJavaClass.groovy
Directory .\dustin\examples already exists.
.\dustin\examples\BigClass.java compiled successfully




This output tells us that the generated Java class with five conditionals compiled successfully. To get a taste of what this generated Java class looks like, we'll look at this newly generated version with only five conditionals.

BigClass.java (generated with 5 conditionals)

package dustin.examples;

public class BigClass
{
public static void main(final String[] arguments)
{
final String someString = "Dustin";

if (someString == null || someString.isEmpty())
{
System.out.println("The String is null or empty.");
}
else if (someString.equals("a0"))
{
System.out.println("You found me!");
}
else if (someString.equals("a1"))
{
System.out.println("You found me!");
}
else if (someString.equals("a2"))
{
System.out.println("You found me!");
}
else if (someString.equals("a3"))
{
System.out.println("You found me!");
}
else if (someString.equals("a4"))
{
System.out.println("You found me!");
}
else if (someString.equals("a5"))
{
System.out.println("You found me!");
}
else
{
System.out.println("No matching string found.");
}
}
}


The above code includes two default conditionals every time, regardless of how many conditionals are requested when the class generation script is run: the initial check for a null or empty String and the final else clause. Between them are the else if statements driven by the number passed to the script. In this case that number was 5; because the script's loop uses Groovy's inclusive range (0..5), the generated class actually contains six else if conditionals (a0 through a5) between the two default conditionals on either end. As this demonstrates, it will be easy to scale up the number of conditionals until the Java compiler just won't take it anymore.
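To make the count of generated branches concrete, here is a small Java sketch that mirrors the loop in the script's buildMainBody method and counts the else if blocks it produces. It is an illustrative port under my own assumptions (fixed "\n" line endings, no indentation), and the names MainBodySketch, buildMainBody, and countElseIfs are hypothetical rather than part of the original script.

```java
public class MainBodySketch {

    /** Builds the body of main() with "else if" blocks a0 through a<lastIndex>, inclusive. */
    static String buildMainBody(int lastIndex) {
        StringBuilder body = new StringBuilder();
        body.append("if (someString == null || someString.isEmpty())\n{\n");
        body.append("System.out.println(\"The String is null or empty.\");\n}\n");
        // Inclusive upper bound, like Groovy's 0..lastIndex range in the script
        for (int i = 0; i <= lastIndex; i++) {
            body.append("else if (someString.equals(\"a").append(i).append("\"))\n{\n");
            body.append("System.out.println(\"You found me!\");\n}\n");
        }
        body.append("else\n{\n");
        body.append("System.out.println(\"No matching string found.\");\n}\n");
        return body.toString();
    }

    /** Counts how many "else if (" branches appear in the generated source. */
    static int countElseIfs(String source) {
        int count = 0;
        int from = 0;
        while ((from = source.indexOf("else if (", from)) >= 0) {
            count++;
            from++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Passing 5 yields branches a0..a5
        System.out.println(countElseIfs(buildMainBody(5))); // prints 6
    }
}
```

The inclusive bound means the value passed on the command line is the index of the last else if branch rather than the number of branches, which is consistent with the a0 through a5 branches in the generated listing above.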

I now try the Groovy script for generating the Java class again, but this time go all out and select 5000 as the number of desired conditionals. As the output shown below and in the following screen snapshot indicates, Groovy has no trouble generating the text file representing the Java class with this many conditionals in its main() method, but the Java compiler doesn't like it one bit.


You're running the script C:\java\examples\groovyExamples\javaClassGeneration\generateJavaClass.groovy
Directory .\dustin\examples already exists.
.\dustin\examples\BigClass.java:5: code too large
public static void main(final String[] arguments)
^
1 error
.\dustin\examples\BigClass.java compilation failed




Obviously, the attempt to compile the generated class with 5000 requested conditionals (plus the two defaults) in main() was too much. Through a little iterative trial-and-error, I was able to determine that 2265 was the largest value I could pass to the script and still get a compilable main() method; 2266 would break it. (Because the script's loop range is inclusive, a value of 2265 produces else if blocks a0 through a2265.) This is demonstrated in the next screen snapshot.
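The trial-and-error search described above could also be automated with a binary search. The Java sketch below is only an illustration of that idea: it assumes the outcome flips exactly once (every value up to some threshold compiles, every larger value fails), and it takes the "does it compile?" check as a predicate. The predicate used in main is a hypothetical stand-in that hard-codes the 2265 result reported in this post; a real run would generate and compile the class for each candidate value instead.

```java
import java.util.function.IntPredicate;

public class ThresholdSearch {

    /**
     * Returns the largest n in [low, high) for which compiles.test(n) is true.
     * Assumes compiles.test(low) is true, compiles.test(high) is false,
     * and the result flips from true to false exactly once in between.
     */
    static int largestPassing(int low, int high, IntPredicate compiles) {
        while (high - low > 1) {
            int mid = low + (high - low) / 2;
            if (compiles.test(mid)) {
                low = mid;   // mid still compiles; the threshold is at mid or above
            } else {
                high = mid;  // mid fails; the threshold is below mid
            }
        }
        return low;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for "generate the class with n conditionals and compile it"
        IntPredicate standIn = n -> n <= 2265;
        // 5 is known to compile and 5000 is known to fail, per the runs above
        System.out.println(largestPassing(5, 5000, standIn)); // prints 2265
    }
}
```

With a known-good value of 5 and a known-bad value of 5000, this needs roughly a dozen compilations rather than a long series of manual guesses.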



Knowing our limits better, we can now "look" at the byte code using the javap tool provided with Sun's JDK to analyze the corresponding class file. Because there was a compiler error when we tried to compile the code with 2266 additional conditionals, we must run javap against the BigClass.class file generated with 2265 additional conditionals. The output of running javap with the -c option for this large class is too large (~1 MB) to bludgeon readers with here. However, I include key snippets from its output below.


Compiled from "BigClass.java"
public class dustin.examples.BigClass extends java.lang.Object{
public dustin.examples.BigClass();
Code:
0: aload_0
1: invokespecial #1; //Method java/lang/Object."<init>":()V
4: return

public static void main(java.lang.String[]);
Code:
0: ldc #2; //String Dustin
2: ifnonnull 10
5: goto_w 23
10: ldc #2; //String Dustin
12: invokevirtual #3; //Method java/lang/String.isEmpty:()Z
15: ifne 23
18: goto_w 36
23: getstatic #4; //Field java/lang/System.out:Ljava/io/PrintStream;
26: ldc #5; //String The String is null or empty.
28: invokevirtual #6; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
31: goto_w 65512
36: ldc #2; //String Dustin
38: ldc #7; //String a0
40: invokevirtual #8; //Method java/lang/String.equals:(Ljava/lang/Object;)Z
43: ifne 51
46: goto_w 64
51: getstatic #4; //Field java/lang/System.out:Ljava/io/PrintStream;
54: ldc #9; //String You found me!
56: invokevirtual #6; //Method java/io/PrintStream.println:


. . .

. . .

. . .

65411: goto_w 65512
65416: ldc #2; //String Dustin
65418: ldc_w #2272; //String a2263
65421: invokevirtual #8; //Method java/lang/String.equals:(Ljava/lang/Object;)Z
65424: ifne 65432
65427: goto_w 65445
65432: getstatic #4; //Field java/lang/System.out:Ljava/io/PrintStream;
65435: ldc #9; //String You found me!
65437: invokevirtual #6; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
65440: goto_w 65512
65445: ldc #2; //String Dustin
65447: ldc_w #2273; //String a2264
65450: invokevirtual #8; //Method java/lang/String.equals:(Ljava/lang/Object;)Z
65453: ifne 65461
65456: goto_w 65474
65461: getstatic #4; //Field java/lang/System.out:Ljava/io/PrintStream;
65464: ldc #9; //String You found me!
65466: invokevirtual #6; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
65469: goto_w 65512
65474: ldc #2; //String Dustin
65476: ldc_w #2274; //String a2265
65479: invokevirtual #8; //Method java/lang/String.equals:(Ljava/lang/Object;)Z
65482: ifne 65490
65485: goto_w 65503
65490: getstatic #4; //Field java/lang/System.out:Ljava/io/PrintStream;
65493: ldc #9; //String You found me!
65495: invokevirtual #6; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
65498: goto_w 65512
65503: getstatic #4; //Field java/lang/System.out:Ljava/io/PrintStream;
65506: ldc_w #2275; //String No matching string found.
65509: invokevirtual #6; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
65512: return
}


From the snippets of javap output shown above, we see that the highest Code offset (65512) for this function pushing the limits of the method size was getting awfully close to the magic 65535 bytes (2^16 - 1 or Short.MAX_VALUE - Short.MIN_VALUE).
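The offsets in the javap listing also explain why one more conditional tips the method over the limit. The Java sketch below works through that arithmetic using only numbers read from the listing: the final return sits at offset 65512 and occupies one byte, and the last two else if blocks start at offsets 65445 and 65474. It assumes that an additional else if block would occupy the same number of bytes as those last blocks and that nothing else in the method changes; the class and method names are hypothetical.

```java
public class CodeSizeEstimate {

    /** Largest permitted size, in bytes, of a single method's byte code. */
    static final int MAX_CODE_BYTES = 65535;

    /** Code length is the offset of the last instruction plus that instruction's size. */
    static int codeLength(int lastOffset, int lastInstructionSize) {
        return lastOffset + lastInstructionSize;
    }

    /** True if a method with the given code length stays within the limit. */
    static boolean fits(int codeLength) {
        return codeLength <= MAX_CODE_BYTES;
    }

    public static void main(String[] args) {
        int lastOffset = 65512;          // offset of the final "return" in the listing
        int returnSize = 1;              // "return" is a one-byte instruction
        int blockBytes = 65474 - 65445;  // start of a2265 block minus start of a2264 block

        int current = codeLength(lastOffset, returnSize);
        int withOneMore = current + blockBytes;

        System.out.println(current);                  // prints 65513
        System.out.println(blockBytes);               // prints 29
        System.out.println(MAX_CODE_BYTES - current); // prints 22
        System.out.println(withOneMore);              // prints 65542
        System.out.println(fits(current));            // prints true
        System.out.println(fits(withOneMore));        // prints false
    }
}
```

Under those assumptions, only 22 bytes of headroom remain while each additional else if block needs 29, which matches the observation that 2265 compiles and 2266 does not.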


Conclusion

Most Java developers don't see the "code too large" problem very often because they write methods and classes that are reasonable in size (or at least more reasonable than the limits allow). However, generated code can much more easily exceed these "limitations." So of what value is intentionally reproducing this error? Well, the next time someone tries to convince you that bigger is better, you can refer that person to this post.


Other Resources

Java Language Specification, Third Edition

Class File Format

Code Too Large for Try Statement?

Code Too Long

Is There Any Number of Lines Limit in a Java Class?

4 comments:

Edson said...

You don't need to go to such great lengths to get a "code too large" problem in Java.
Just write a not-so-large JSP page.
For example:
http://www.coderanch.com/t/71466/Websphere/code-too-large-try-statement

Dave Newton said...

I was going to mention the same JSP issue; too many static includes trivially reproduces this.

Dustin said...

Edson and Dave,

Thanks for posting and mentioning the JSP issue. That is certainly the most common way to see this error. However, as I stated in the post, approaching this the way I did in this blog post also allowed me to demonstrate Groovy, the Java Compiler API, and javap in addition to the "code too large" error itself.

Diablo said...

It's not uncommon if you are doing parser work; some tools like JFLEX and Byac produce state variables (in generated source code) that easily cause this problem.