Monday, January 25, 2016

Book Review: JavaScript Concurrency

Adam Boduch's book JavaScript Concurrency has been published by Packt Publishing (December 2015). The subtitle of the book is, "Build better software with concurrent JavaScript programming, and unlock a more efficient and forward thinking approach to web development." JavaScript Concurrency consists of ten chapters spanning over 260 substantive pages. The author has described this book in his blog post of the same title.

Preface

The Preface of JavaScript Concurrency provides a single sentence summary of each of the book's ten chapters (find longer chapter descriptions in the author's blog post). The Preface states that the "requirements for this book" are a modern web browser, Node.js, and a code editor.

JavaScript Concurrency's Preface states that "all aspects of concurrent, asynchronous, and parallel programming are covered from first principles" and advertises that the book is "written for any JavaScript developer who wants to learn how to write more efficient, powerful, and maintainable applications that utilize the latest developments in the JavaScript language."

Chapter 1: Why JavaScript Concurrency?

The first sentence of the first chapter of JavaScript Concurrency opens with, "JavaScript is not a language associated with concurrency," but then points out that "this has changed a lot over the past few years, especially with new language features in ES 2015. The chapter introduces concepts of synchronous and asynchronous JavaScript. There is coverage of asynchronous versus parallel that uses simple juggler examples to illustrate the differences.

JavaScript Concurrency's initial chapter introduces the "JavaScript concurrency principles" of "parallelize," "synchronize," and "conserve." The chapter describes the "parallelize principle" as "taking advantage of modern CPU capabilities to compute results in less time" and explains that JavaScript parallelism is achieved in web browsers with Web Workers and in Node.js by spawning processes.

Chapter 1 describes the "synchronize principle" as "about the mechanisms used to coordinate concurrent actions and the abstractions of those mechanisms" and introduces callback functions and Promises. The chapter describes the "conserve principle" as "about saving on compute and memory resources" and introduces "lazy evaluation."

Chapter 2: The JavaScript Execution Model

JavaScript Concurrency's second chapter introduces the JavaScript execution model and related terminology: tasks, execution environment, interpreter, task queue, and event loop. The section "Creating tasks using timers" introduces and demonstrates use of setTimeout() and setInterval() "to schedule JavaScript code to run at a later time."

The second chapter of JavaScript Concurrency looks at basic ways of "responding to DOM events" and "responding to network events" using JavaScript callbacks. As part of this discussion, discussion and demonstrative code listings are provided related to use of debounce and for coordinating multiple requests. The chapter concludes with discussion of some of the challenges of using JavaScript callbacks as an introduction to the motivation for promises.

Chapter 3: Synchronizing with Promises

The third chapter of JavaScript Concurrency opens with a brief history of promises in JavaScript and defines key terms associated with promises: Promise, State, Executor, Resolver, Rejector, and Thenable. The objective of this chapter is to cover how the ES6 implementation of promises in JavaScript helps address JavaScript's so-called "callback hell."

Chapter 3 provides explanations of each of the terms related to JavaScript promises that were briefly introduced at the beginning of the chapter. The chapter uses multiple code listings and descriptive text to explain different approaches to using promises and I liked the id it presents of extending the promise prototype with an always() function that should be run for a given promise whether it resolves or rejects.

JavaScript Concurrency's third chapter reinforces that promises "are born into a pending state, and they die in either a resolved or rejected state" and then goes onto describe consequence of this. There is more in-depth coverage of use of JavaScript promises in different contexts that includes discussion and example code illustrating Promise.all() for handling multiple promises, Promise.race() for specifying that first resolved promise wins, and Promise.resolve(), and Promise.reject() for dealing with a JavaScript function that has "the potential to be both synchronous and asynchronous."

Chapter 4: Lazy Evaluation with Generators

Chapter 4 of JavaScript Concurrency opens with the sentence, "Lazy evaluation is a programming technique, which is used when we don't want to compute values until the very last second." The early portion of this chapter also explains, "The Generator is a new primitive type introduced to JavaScript as a part of the ES6 specification of the language" that helps "implement lazy evaluation techniques in [JavaScript] code." The chapter looks at the JavaScript Generator in more detail, including coverage of generator function syntax (function*), use of yield and next, and use of for .. of for iterating over a sequence of generators..

JavaScript Concurrency's fourth chapter describes with text and illustrates with code how to use generators to work with infinite and alternating/circular sequences. The chapter similarly describes and illustrates deferring between generators and interweaving generators. There are also sections in this chapter regarding passing data to generators, reusing general generators, and implementing "lightweight map/reduce." The "Coroutines" section of Chapter 4 "looks at some techniques for implementing coroutines in JavaScript using generators" and provides examples of implementing and using JavaScript coroutines based on generators.

Chapter 5: Working with Workers

The fifth chapter of JavaScript Concurrency begins with the sentence, "Web workers enable true parallelism within a web browser," and states, "In this chapter, we'll walk through the conceptual ideas of web workers, and how they relate to the concurrency principles that we're trying to achieve in our applications." The chapter introduces web workers as "at their core, ... nothing more than operating system-level threads."

Chapter 5 covers three types of web workers: dedicated (default), sub-workers, and shared workers. There is also coverage of worker environments and how these differ from the main thread's environment.

JavaScript Concurrency's fifth chapter explains and demonstrates the worker importScripts() function to import the lodash library. The chapter also covers communication between workers and related message serialization. I like that the chapter includes discussion of trade-offs and disadvantages associated with use of shared workers and sub-workers. The chapter concludes with coverage of error and exception handling in web workers.

Chapter 6: Practical Parallelism

Chapter 6 of JavaScript Concurrency introduces functional programming and related concepts such as immutability and referential transparency. A section in this chapter asks, "Do we need to go parallel?" That section briefly discusses how to decide whether to apply parallelism or not. This is complemented by a section that briefly discusses how to determine how many cores are available and use that information to decide how much parallelism is appropriate. The chapter also introduces "candidate problems" that can be addressed with parallelism such as mapping and reducing, searching collections, and "keeping the DOM responsive."

Chapter 7: Abstracting Concurrency

The seventh chapter begins by looking "at the approaches that we might use to insulate the rest of our application from tricky concurrency bits." The chapter explains how promises can be used to encapsulate the complexities of concurrent code. The explanations and code examples demonstrate "creating helper functions that wrap the worker communications" to return promises and "extending the web worker interface" postMessage() method. The chapter's section on "lazy workers" looks at "using generators in web workers."

Several of the code listings in Chapter 7 are at least partially taken from http://adambom.github.io/parallel.js/ (and this is designated as code comments in those listings). After referencing these multiple code listings based on that library, the author formally introduces Parallel.js and explains that this library's purpose is "to make interacting with web workers as seamless as possible." The chapter discusses how to use Parallel.js to provide one's own functions to workers provided by Parallel.js rather than implementing one's own workers. The chapter demonstrates using Parallel.js to spawn workers and for mapping and reducing. I like that this section includes discussion of parallel slowdown and how to address that.

Chapter 7 concludes with coverage of worker pools. There are several pages devoted to explanation and demonstrative code listings related to implementing a worker "pool abstraction [that] look[s] and behave[s] like a plain dedicated worker."

Chapter 8: Evented IO with NodeJS

The purpose of Chapter 8 is to "explain how Node handles concurrency and how we need to program our NodeJS code to take full advantage of this environment." The chapter introduces evented IO, the Node.js Event Loop, and why these work well in web environments. The author introduces Dan Kegel's concept of the C10K problem and discusses how Node helps address this. The areas of specific focus are network IO and file system IO with descriptive text explanations and associated code samples.

Chapter 9: Advanced NodeJS Concurrency

Chapter 9 begins by introducing the co library as a library relying on generators and promises "to implement coroutines." The introduction to co introduces the "co() function to create a coroutine," demonstrates how co's awaiting values support is similar to proposed ES7 syntax, demonstrates resolving promises with co, and explains using "the wrap() utility to make a plain coroutine function into a reusable function."

JavaScript Concurrency's ninth chapter includes a section that explains how to fork new Node child processes to avoid issues with the blocked IO event loop. There is also a section on applying "a pool of general-purpose processes." The "Server Clusters" section discusses "scaling our Node application to several machines" and covers examples of this in the context of proxying, microservices, and load balancing.

Chapter 10: Building a Concurrent Application

The final chapter of JavaScript Concurrency uses a "simple chat application" to demonstrate concepts covered in the earlier chapters for building a concurrent application in JavaScript. The chapter begins by revisiting why it's easier to design applications to be concurrent from the beginning than to retrofit an application that was not designed to be concurrent.

The sample chat application built in Chapter 10 is based on Node.js and uses a few libraries such as commander and formidable. Although several simplifying assumptions are made to keep the example simpler, there is still a lot of code to list and describe. The chapter concludes with a section on other features that could be added to the sample chat application and with a paragraph that closes out the entire book.

General Observations

  • JavaScript Concurrency covers several modern (ES6/ES7) topics. This is very useful as many JavaScript books don't yet cover these modern abilities. However, the one drawback is that some of the covered material is not finalized and is subject to change. This is a trade-off that every book written about "cutting edge" topics faces.
  • Several JavaScript libraries with concurrent-orientation are introduced and demonstrated in JavaScript Concurrency. The most attention is paid to Parallel.js, co, formidable, and commander, probably in that order.
  • JavaScript Concurrency not only introduces syntax of new JavaScript concurrency features, but it also includes discussion on motivations for using the different constructs and trade-offs and design decisions that should be considered when selecting a JavaScript concurrent construct.
  • JavaScript Concurrency is best suited for an intermediate or advanced JavaScript developer who has little or no experience with promises, generators, workers, coroutines, and other ES6/ES7 concurrency concepts. A potential reader of this book should probably understand the material covered in Concurrency Model and Event Loop before reading this book. I wouldn't recommend this book to someone trying to learn JavaScript with no previous experience with the language, but would recommend it to someone with basic familiarity and some experience with JavaScript.
  • JavaScript Concurrency covers JavaScript concurrency both in web browser environments and on the server with Node.js.
  • There are numerous code listings in JavaScript Concurrency. These code listings are black text on white background with no color syntax and no line numbers. The code is available for download, which is probably preferable so that the code can be viewed in a favorite editor or IDE with color syntax and line numbers. The code listings are heavily commented (more so and at a lower level than I'd want in production code) and this is desirable in a book that is introducing and explaining concepts. It's nice to be able to read the code and associated comments to gain understanding and switch less between the code and associated text discussion for that understanding.
  • JavaScript Concurrency contains numerous simple diagrams to illustrate concepts. These diagrams consist of simple shapes such as boxes and arrows. In some cases, these simple diagrams make a concept easier to understand and in some cases the diagrams add very little value perhaps because they are overly simplistic.
  • There are some typos in JavaScript Concurrency that generally did not preclude me from understanding the points being made. I list a sample of them here so that readers of this post can decide if these are significant to their understanding, mildly distracting, or not an issue at all.
    • Page 100 has text introducing a code snippet that states, "Next, we'll use the loadScripts() function to bring the lodash library into our library...", but fortunately the actual code listing correctly demonstrates importScripts() rather than loadScripts().
    • Page 149, "Our application is likely filled with operations these, where concurrency simply doesn't make sense."
    • Page 233, "it will be a very simply chat application"
  • Because preferences can be subjective and opinionated, I like to include references to other reviews of the same books in many of my book reviews. Other reviews of JavaScript Concurrency are available:

Conclusion

JavaScript Concurrency sticks to what its two-words title describes: JavaScript concurrency. It is in many ways a "modern JavaScript" book that added clarity to my understanding of what JavaScript now supports or soon will support in many different browsers and on the server with Node.js.

Monday, January 4, 2016

Leaner Java Collections with FastUtil

In response to my recent post Discovering a Trove of Java Primitives Collection Handling on the GNU Trove library, TheAlchemist pointed out some advantages of fastutil over trove: "I much prefer fastutil (http://fastutil.di.unimi.it/), because it's still in active development, has more features, supports large sizes (> 2^32), and has better documentation." Attila-Mihaly Balazs has seconded this: "I second @TheAlchemist's recommendation for fastutil! It's a great library." In this post, I look at fastutil from some of the same perspectives that I previously looked at trove.

The main fastutil page describes fastutil as an extension of the JavaTM Collections Framework that provides "type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion" along with "big (64-bit) arrays, sets and lists, and fast, practical I/O classes for binary and text files." The license for fastutil is Apache License, Version 2 and the current version of fastutil requires Java 7 or newer. There are currently (as of this writing) "unmaintained" versions of fastutil available for download as well for Java 6 and Java 5 as well.

Adding elements to a FastUtil collection is accomplished with the same API calls as used with standard Java collections as demonstrated in the next code listing which compares inserting elements into a JDK ArrayList to inserting elements into a FastUtil DoubleArrayList.

Inserting Doubles with JDK and Inserting doubles with FastUtil DoubleArrayList
/**
 * Demonstrate standard JDK {@code ArrayList<Double>}
 * with some JDK 8 functionality.
 */
public void demonstrateJdkArrayListForDoubles()
{
   final ArrayList<Double> doubles = new ArrayList<>();
   doubles.add(15.5);
   doubles.add(24.4);
   doubles.add(36.3);
   doubles.add(67.6);
   doubles.add(10.0);
   out.println("JDK ArrayList<Double>:");
   out.println("\tDoubles List: " + doubles);
}

/**
 * Demonstrate use of DoubleArrayList and show how
 * similar using it is to using {@code ArrayList<Double>}.
 */
public void demonstrateFastUtilArrayListForDoubles()
{
   // Demonstrate adding elements to DoubleArrayList is
   // exactly like adding elements to ArrayList<Double>.
   final DoubleArrayList doubles = new DoubleArrayList();
   doubles.add(15.5);
   doubles.add(24.4);
   doubles.add(36.3);
   doubles.add(67.6);
   doubles.add(10.0);
   out.println("FastUtil DoubleArrayList:");  // DoubleArrayList overrides toString()
   out.println("\tDoubles List: " + doubles);
}

When the two above methods are executed, the list of doubles that are written to standard output appear exactly the same with even the same square braces surrounding the comma-separated doubles values.

FastUtil collections tend to implement the appropriate JDK collection interfaces. For example, the just-demonstrated class DoubleArrayList implements several interfaces including Collection<Double> and List<Double>. It turns out that DoubleArrayList also implements it.unimi.dsi.fastutil.doubles.DoubleStack and it.unimi.dsi.fastutil.Stack<Double>. The ability to use this class as a stack is demonstrated in the next code listing.

Using FastUtil's DoubleArrayList as a Stack
/**
 * Demonstrate FastUtil's Double Stack.
 *
 * FastUtil's DoubleStack allows access to its
 * contents via push, pop, and peek. It is declared
 * as a DoubleArrayList type here so that the size()
 * method is available without casting.
 */
public void demonstrateFastUtilDoubleStack()
{
   final DoubleArrayList stack = new DoubleArrayList();
   stack.push(15.5);
   stack.push(17.3);
   stack.push(16.6);
   stack.push(2.2);
   out.println("FastUtil Stack of Doubles");
   out.println("\tPeek: " + stack.peek(0) + "; After Size: " + stack.size());
   out.println("\tPop:  " + stack.pop() + "; After Size: " + stack.size());
   out.println("\tPeek: " + stack.peek(0) + "; After Size: " + stack.size());
}

As I discussed in the blog post on Trove, Trove provides a gnu.trove.TCollections class that is an analogous (subset) to java.util.Collections. FastUtil provides similar functionality, but this approach of providing static methods to act upon FastUtil collections is separated into type-specific and structure-specific classes with static methods rather than in a single class with static methods. The next code listing demonstrates using one of these type-specific and structure-specific classes with static methods, IntSets, in conjunction with a FastUtil IntLinkedOpenHashSet. As the name suggests, the IntSets class provides "static methods and objects that do useful things with [int]-specific sets."

Using IntSets with IntLinkedOpenHashSet
/**
 * Demonstrate one of FastUtil's "equivalent"s of the
 * java.util.Collections class. FastUtil separates its
 * grouping of static methods into classes that are
 * specific to the data type of the collection and to
 * the data structure type of the collection.
 */
public void demonstrateFastUtilCollectionsClass()
{
   final IntLinkedOpenHashSet integers = new IntLinkedOpenHashSet();
   integers.add(5);
   integers.add(7);
   integers.add(3);
   integers.add(1);
   final IntSet unmodifiableIntegers = IntSets.unmodifiable(integers);
   out.println("Unmodifiable Integers:");
   out.println("\tClass: " + unmodifiableIntegers.getClass().getCanonicalName());
   try
   {
      unmodifiableIntegers.add(15);
   }
   catch (Exception ex)
   {
      out.println("\tException caught: " + ex);
   }
}

FastUtil supports that standard Java iteration approaches of using an explicit iterator and using the Java SE 5-introduced for-each loop. FastUtil collections even support the JDK 8 style using .forEach (assuming code is built and run on JDK 8) because the FastUtil collections implement java.lang.Iterable. These are demonstrated in the next code listing.

Iterating FastUtil Collections in Standard Java Style
/**
 * Demonstrate "traditional" Java iteration of a
 * FastUtil collection.
 */
public void demonstrateIterationWithIterator()
{
   final LongOpenHashSet longs = new LongOpenHashSet();
   longs.add(15);
   longs.add(6);
   longs.add(12);
   longs.add(13);
   longs.add(2);
   final LongIterator longIterator = longs.iterator();
   while (longIterator.hasNext())
   {
      final long longValue = longIterator.next();
      out.print(longValue + " ");
   }
}

/**
 * Demonstrate iteration of a FastUtil collection
 * using Java's enhanced for-each approach.
 */
public void demonstrateIterationWithForEach()
{
   final LongLinkedOpenHashSet longs = new LongLinkedOpenHashSet();
   longs.add(15);
   longs.add(6);
   longs.add(12);
   longs.add(13);
   longs.add(2);
   for (final long longValue : longs)
   {
      out.println(longValue + " ");
   }
}

/**
 * Demonstrate iteration of a FastUtil collection
 * using JDK 8 .forEach approach.
 */
public void demonstrateIterationWithJdk8ForEach()
{
   final LongLinkedOpenHashSet longs = new LongLinkedOpenHashSet();
   longs.add(15);
   longs.add(6);
   longs.add(12);
   longs.add(13);
   longs.add(2);
   longs.forEach(longValue -> out.print(longValue + " "));
}
Additional Observations Related to FastUtil
  • Because FastUtil collections implement standard JDK 8 collection interfaces, the APIs are easy to pick up and use with familiar Java idioms.
  • FastUtil collections generally provide a constructor that accepts an array of the underlying data type and an overridden toArray() method and a type-specific method such as toDoubleArray() [for double-oriented collections] to provide their data elements in form of array of primitives.
  • FastUtil collections generally provide explicitly overridden toString() implementations that allow for the individual data elements to be easily written similar to JDK collections and differently than Java arrays (which require Arrays.toString() methods).
  • FastUtil's Java packages are organized generally by primitive type with specific implementations of the various data structure types for that primitive type all in the same package. For example, packages are named like it.unimi.dsi.fastutil.doubles, it.unimi.dsi.fastutil.ints, and so on.
  • Because each FastUtil collection is specific to a particular primitive data type, each collection does not require a generic parameter and has none of the issues related to generics (such as erasure). I haven't seen FastUtil take advantage of the collections being type-specific for type-specific methods like Trove does, probably because FastUtil more closely implements the corresponding Java collection interfaces.
  • FastUtil's Javadoc API documentation is probably the best place to start with when learning to use FastUtil. The classes, interfaces, enums, and packages tend to be fairly well documented in FastUtil's Javadoc-based API documentation.
Conclusion

Like Trove, FastUtil is a library that can potentially be used to more efficiently (in terms of memory and performance) work with Java collections. While Trove seems to have formerly been the most popular of the many choices available, FastUtil is perhaps the most popular currently for reasons that include those cited by TheAlchemist: "still in active development, has more features, supports large sizes (> 2^32), and has better documentation." Similar libraries besides Trove and FastUtil include High Performance Primitive Collections for Java (HPPC), Koloboke, Goldman Sachs Collections, Mahout collections, and Javolution.