Fast dynamic languages — how did JavaScript get so fast?

Topics

Introduction

What follows is a tentative list of major topics together with a brief discussion, intended to spur curiosity and to present some key points to discuss.

Papers are currently cited by title, and they can be found through Google Scholar or similar services, like DBLP or CiteSeerX. Mentioned papers are intended as a starting point, but are often not sufficient. Especially for research papers, the results achieved in a paper and the main idea behind the result is often more important than the technical details of the contributions. Survey papers are different, because they are intended to select (and often organize) the research literature on a given topic.

Criteria to select papers

Publication journal or venue

Considering the conference or journal where a paper is published is important to estimate the importance of the paper. However, I also chose many relevant papers which were published in journals with different focus, so this criteria is not sufficient to exclude a paper from consideration.

Some top-level related conferences, workshops and journals are:

PLDI – Programming Languages Design and Implementation;
ISMM – International Symposium on Memory Management (previous editions are known as International Workshops on Memory Management), specialized on memory allocators and garbage collectors;
CGO – Code Generation and Optimization;
TOPLAS – ACM Transactions on Programming Languages and Systems.

Some second-rank related workshop are:

DLS – Dynamic Languages Symposium, colocated with the prestigious OOPSLA/SPLASH conference;
ICOOLPS – Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, part of the prestigious ECOOP conference;
VEE – Virtualized Execution Environments, colocated with the prestigious ASPLOS conference.

Reading the article

Reading the abstract, introduction, and conclusion suggests whether a paper is interesting. However, it is also often important to read the evaluation section, study the graphs, and check the results. Good papers will acknowledge disadvantages of their approach, and try to argue why one should accept those disadvantages, but such disadvantages are often discussed only in the evaluation or in the paper body.

Main topics

Adaptive Just-In-Time (JIT) compilation

JIT compilation can improve performance over Ahead Of Time compilation, thanks to adaptive optimization; on the other hand, choosing how much CPU time should be spent on compilation, and what should be compiled, is often difficult. A Survey of Adaptive Optimization in Virtual Machines provides a survey of the area, which needs to be complemented by the study of some of the techniques mentioned there or in later developments.

Trace-based JIT compilation, introduced by the research group of Michael Franz, is one such technique, introduced in One method at a time is quite a waste of time and HotpathVM: an effective JIT compiler for resource-constrained devices.

The experience of Firefox developers, with TraceMonkey and JägerMonkey, suggests further fundamental refinements for trace-based compilation, and I would like this to be discussed as well.

Support for Object-Oriented Programming

Virtual Machines often implement object-oriented languages, and efficient implementation of OO features often require specific optimizations; many of those optimizations were pioneered in Efficient implementation of the Smalltalk-80 system and An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes.

Later research concentrated on Java (for instance Fast Subtype Checking in the HotSpot JVM, Cliff Click) and JavaScript (for instance Trace-based just-in-time type specialization for dynamic languages, which explains how trace-based JIT compilation better supports dynamically typed languages).

Garbage collection

Garbage collection spares the programmer the responsibility of managing memory: it is essential to prevent run-time errors due to dangling pointers, and it reduces (but does not eliminates) the possibility of memory leaks. A classical survey of the basic techniques for Garbage Collection is given by Uniprocessor Garbage Collection Techniques (from 1992).

Two papers discussing the overhead caused by Garbage Collection compared to manual memory management, leading to contradictory results, are The Measured Cost of Conservative Garbage Collection, and Quantifying the Performance of Garbage Collection vs. Explicit Memory Management.

A basic technique for parallel GC as done in the HotSpot Java VM is discussed in A generational mostly-concurrent garbage collector.

Some interesting novel garbage collectors are introduced by The pauseless GC algorithm, Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance, Garbage-First Garbage Collection.

Further topics

Following topics are interesting but not fundamental, so they will not be discussed by students of the current seminar.

Bytecode design and fast interpretation

A slow interpreter can be 100x slower than a fast one. JIT compilation is just 10x faster than a fast interpreter, in the very best case. What makes an interpreter fast?
See: The Structure and Performance of Efficient Interpreters.
Possibly compare to: Virtual-Machine Abstraction and Optimization Techniques.

Tricks for further interpreter optimization

Various tricks such as top-of-stack caching, indirect threading, context threading and dynamic superinstructions allow for further speedup of interpreters, and code copying allows interpreters to become simple JIT compilers.

Development support tools

How to support full-speed debugging and transparent deoptimization, and tool support in general.
Development eupport tools like debugging and profiling are essential for developers.

Debugging optimized code is usually hard, for instance in C, because the developer has to reason in terms of optimization to understand what the code is doing. Disabling optimizations for debugging can make debugging so slow to be impractical. The challenge, first set and solved by Debugging Optimized Code with Dynamic Deoptimization for Self, is to make optimizations transparent to developers.
Extracting accurate profiling information for time and space usage can also be challenging – various techniques for time profiling, like sample-based vs instrumentation-based profiling, are available.

Efficient support for multithreading on multiple CPUs

Support for multithreading should be standard nowadays, but most implementations of dynamic languages such as Python and Ruby cannot take advantage of them. When multithreading is supported, as in Java, optimizing lock usage also poses major challenges, and many papers have been written about them.

Object model and data layout

In general, how to represent data can be a complex question, especially for objects in dynamic object oriented languages. Shadow classes play a role also here.

One of the important topics for data layout is support for unboxing, or how the structure of basic entities is important for space consumption and efficiency, and various other object layout optimizations.
In many cases (e.g. Java), efficient support for synchronization, identity hash codes, reflective information is also important.
The stack representation is important in languages which allow manipulating it (e.g. Smalltalk, Python, languages supporting continuations) .
Discover why the Java language is memory-bloated (it is not only because of garbage collection).
Pointer tagging is, among others, an important technique allowing primitive values to be unboxed, mentioned for instance by the above mentioned paper about Self (An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes). A survey of this technique and its alternative is present in Representing Type Information in Dynamically Typed Languages.

The PyPy project: generating a JIT compiler from an interpreter

PyPy is a research project concerned with making virtual machines fundamentally more modular and easy to implement. Crosscutting design decisions are separated, and JIT compilation is implemented separately from any specific language. The resuts are interesting and the project might work in the real world, but performance is currently still inferior to standard approaches.