Monday, May 21, 2018

New JDK 11 Files Method isSameContent()

It has been proposed that a method named isSameContent() be added to the Files class in JDK 11 via JDK-8202285 ["(fs) Add a method to Files for comparing file contents"]. Proposed by Joe Wang, this new method is "intended to be an extension to the existing isSameFile method since it stopped short of comparing the content to answer the query for whether two files are equal." JDK-8201276 also references this method and describes it as "a utility method that compares two files."

Regarding the usage of this new method, JDK-8202285's Description states:

Proposing a new Files method isSameContent. Files currently has a method called isSameFile that answers the query on whether or not two files are the same file. Since two files containing the same contents may also be viewed as the same, it is desirable to add a method that further compares the contents, that would make the "is same file" query complete.
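To make the distinction between "same file" and "same content" concrete, here is a minimal sketch of a content comparison written against existing APIs only. The class name FileContentCompare and the helper sameContent are hypothetical names chosen for illustration; this is not the proposed JDK method, and its eventual signature and behavior may well differ.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileContentCompare {

    // Hypothetical helper (not the proposed JDK method):
    // returns true when both paths hold byte-for-byte identical content.
    static boolean sameContent(Path first, Path second) throws IOException {
        // If both paths locate the same file, the content is trivially equal.
        if (Files.isSameFile(first, second)) {
            return true;
        }
        // Files of different sizes cannot have equal content.
        if (Files.size(first) != Files.size(second)) {
            return false;
        }
        try (InputStream in1 = new BufferedInputStream(Files.newInputStream(first));
             InputStream in2 = new BufferedInputStream(Files.newInputStream(second))) {
            int b1;
            while ((b1 = in1.read()) != -1) {
                if (b1 != in2.read()) {
                    return false; // first differing byte found
                }
            }
            // Equal only if the second stream is exhausted as well.
            return in2.read() == -1;
        }
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("a", ".txt");
        Path b = Files.createTempFile("b", ".txt");
        Files.write(a, "hello".getBytes(StandardCharsets.UTF_8));
        Files.write(b, "hello".getBytes(StandardCharsets.UTF_8));

        // Two distinct files, so isSameFile is false, yet the content matches.
        System.out.println("isSameFile:  " + Files.isSameFile(a, b));
        System.out.println("sameContent: " + sameContent(a, b));

        Files.delete(a);
        Files.delete(b);
    }
}
```

The size check is only a shortcut; the byte-by-byte loop is what decides equality.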

The OpenJDK core-libs-dev mailing list discussion on this thread provides additional details on the background of, motivation for, and implementation of this new method. For example, there are messages on this thread that do the following:

A particularly insightful message in this thread is one from Rémi Forax that provides code demonstrating how to use the JDK 9-added InputStream.transferTo(OutputStream) method, the JDK 10-added local variable type inference, and the classes MessageDigest and DigestOutputStream to hash the contents of a file in six lines of Java code.
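The following is a sketch in the spirit of that approach, not a reproduction of the code in the mailing list message. It combines InputStream.transferTo(OutputStream), `var`, MessageDigest, and DigestOutputStream. The class name FileHash, the method name sha256, and the choice of SHA-256 are assumptions made for illustration, and OutputStream.nullOutputStream() requires JDK 11.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FileHash {

    // Hypothetical helper: computes the SHA-256 digest of a file's content.
    static byte[] sha256(Path path) throws IOException, NoSuchAlgorithmException {
        var digest = MessageDigest.getInstance("SHA-256");
        try (var in = Files.newInputStream(path);
             // Bytes written here update the digest and are then discarded.
             var out = new DigestOutputStream(OutputStream.nullOutputStream(), digest)) {
            in.transferTo(out);
        }
        return digest.digest();
    }

    public static void main(String[] args) throws Exception {
        Path a = Files.createTempFile("hash-a", ".txt");
        Path b = Files.createTempFile("hash-b", ".txt");
        Files.write(a, "same bytes".getBytes(StandardCharsets.UTF_8));
        Files.write(b, "same bytes".getBytes(StandardCharsets.UTF_8));

        // Files with identical content produce identical digests.
        System.out.println(MessageDigest.isEqual(sha256(a), sha256(b)));

        Files.delete(a);
        Files.delete(b);
    }
}
```

Comparing digests can only suggest that two files match, since distinct contents could in principle collide; a byte-by-byte comparison is still needed for a definitive answer.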

It's looking increasingly likely that JDK 11 will provide several new useful "utility" methods in addition to the JEPs and other more significant features that will be coming with JDK 11.

1 comment:

PR said...

Couple of suggestions, please let me know your views:
1. The comparison should happen by file type (types need to be same to compare the content): Either programmers ensure they are comparing same type or method checks for type else throws TypeMismatchException
2. If there is a need to compare some specific content, programmers should be provided with some handle to compare by content (for instance: 2 simple CSVs having headers, comparison can happen for some select fields for content)