Sebastian Erdweg

Top publications highlighted.

2018

Declarative Specification of Indentation Rules
Luís Eduardo de Souza Amorim, Michael J. Steindorfer, Sebastian Erdweg, and Eelco Visser. In Proceedings of Software Language Engineering (SLE). ACM, 2018
Distinguished Paper Award. [ pdf ]

In layout-sensitive languages, the indentation of an expression or statement can influence how a program is parsed. While some of these languages (e.g., Haskell and Python) have been widely adopted, there is little support for software language engineers in building tools for layout-sensitive languages. As a result, parsers, pretty-printers, program analyses, and refactoring tools often need to be handwritten, which decreases the maintainability and extensibility of these tools. Even state-of-the-art language workbenches have little support for layout-sensitive languages, restricting the development and prototyping of such languages.
In this paper, we introduce a novel approach to declarative specification of layout-sensitive languages using layout declarations. Layout declarations are high-level specifications of indentation rules that abstract from low-level technicalities. We show how to derive an efficient layout-sensitive generalized parser and a corresponding pretty-printer automatically from a language specification with layout declarations. We validate our approach in a case-study using a syntax definition for the Haskell programming language, investigating the performance of the generated parser and the correctness of the generated pretty-printer against 22191 Haskell files.

Scalable Incremental Building with Dynamic Task Dependencies
Gabriël Konat, Sebastian Erdweg, and Eelco Visser. In Proceedings of International Conference on Automated Software Engineering (ASE). ACM, 2018. [ pdf ]

Incremental build systems are essential for fast, reproducible software builds. Incremental build systems enable short feedback cycles when they capture dependencies precisely and selectively execute build tasks efficiently. A much overlooked feature of build systems is the expressiveness of the scripting language, which directly influences the maintainability of build scripts. In this paper, we present a new incremental build algorithm that allows build engineers to use a proper programming language with explicit task invocation, value and file inspection facilities, and conditional and iterative language constructs. In contrast to prior work on incrementality for such programmable builds, our algorithm scales with the number of tasks affected by a change and is independent of the size of the software project being built. Specifically, our algorithm accepts a set of changed files, transitively detects and re-executes affected build tasks, but also accounts for new task dependencies discovered during building. We have evaluated the performance of our algorithm in a real-world case study and confirm its scalability.

Incrementalizing Lattice-Based Program Analyses in DataLog
Tamás Szabó, Gábor Bergmann, Sebastian Erdweg, and Markus Völter. In Proceedings of the ACM on Programming Languages (OOPSLA). ACM, 2018. [ pdf ]

Program analyses detect errors in code, but when code changes frequently as in an IDE, repeated re-analysis from-scratch is unnecessary: It leads to poor performance unless we give up on precision and recall. Incremental program analysis promises to deliver fast feedback without giving up on precision or recall by deriving a new analysis result from the previous one. However, Datalog and other existing frameworks for incremental program analysis are limited in expressive power: They only support the powerset lattice as representation of analysis results, whereas many practically relevant analyses require custom lattices and aggregation over lattice values. To this end, we present a novel algorithm called DRedL that supports incremental maintenance of recursive lattice-value aggregation in Datalog. The key insight of DRedL is to dynamically recognize increasing replacements of old lattice values by new ones, which allows us to avoid the expensive deletion of the old value. We integrate DRedL into the analysis framework IncA and use IncA to realize incremental implementations of strong-update points-to analysis and string analysis for Java. As our performance evaluation demonstrates, both analyses react to code changes within milliseconds.

An Infrastructure for Combining Domain Knowledge with Automated Theorem Provers (System description)
Sylvia Grewe, Sebastian Erdweg, André Pacak, and Mira Mezini. In Proceedings of International Symposium on Principles and Practice of Declarative Programming (PPDP). ACM, 2018. [ pdf ]

Computer science has seen much progress in the area of automated verification in the last decades. Yet, there are many domains where abstract strategies for verifying standard properties are well-understood by domain experts, but still not automated to a satisfactory degree. One example for such a domain are type soundness proofs. Being able to express domain-specific verification strategies using domain-specific terminology and concepts can help to narrow down this gap toward more automated verification.
We present the requirements, design, and implementation of a configurable verification infrastructure that allows for expressing domain knowledge about proofs and for interfacing with existing automated theorem provers and solvers to verify individual proof steps. As an application scenario for our infrastructure, we present the development of a standard type soundness proof for a typed subset of SQL.

Incremental Overload Resolution in Object-Oriented Programming Languages
Tamás Szabó, Edlira Kuci, Matthijs Bijman, Mira Mezini, and Sebastian Erdweg. In Proceedings of International Workshop on Formal Techniques for Java-like Programs (FTfJP). ACM, 2018. [ pdf ]

Object-oriented programming languages feature static and dynamic overloading: Multiple methods share the same name but provide different implementations. Dynamic overloading (also know as dynamic dispatch) is resolved at run time based on the type of the receiver object. In this paper, we focus on static overloading in Featherweight Java, which is resolved at compile time based on the types of the method arguments.
The challenge this paper addresses is to incrementalize static overload resolution in IDEs. IDEs resolve overloaded methods for the developer to help them discern which implementation a method call refers to. However, as the code changes, the IDE has to reconsider previously resolved method calls when they are affected by the code change. This paper clarifies when a method call is affected by a code change and how to re-resolve method calls with minimal computational effort. To this end, we explore and compare two approaches to incremental type checking: co-contextual type checking and IncA.

Compositional Soundness Proofs of Abstract Interpreters
Sven Keidel, Casper Bach Poulsen, and Sebastian Erdweg. In Proceedings of the ACM on Programming Languages (ICFP). ACM, 2018. [ pdf ]

Abstract interpretation is a technique for developing static analyses. Yet, proving abstract interpreters sound is challenging for interesting analyses, because of the high proof complexity and proof effort. To reduce complexity and effort, we propose a framework for abstract interpreters that makes their soundness proof compositional. Key to our approach is to capture the similarities between concrete and abstract interpreters in a single shared interpreter, parameterized over an arrow-based interface. In our framework, a soundness proof is reduced to proving reusable soundness lemmas over the concrete and abstract instances of this interface; the soundness of the overall interpreters follows from a generic theorem.
To further reduce proof effort, we explore the relationship between soundness and parametricity. Parametricity not only provides us with useful guidelines for how to design non-leaky interfaces for shared interpreters, but also provides us soundness of shared pure functions as free theorems. We implemented our framework in Haskell and developed a k-CFA analysis for PCF and a tree-shape analysis for Stratego. We were able to prove both analyses sound compositionally with manageable complexity and effort, compared to a conventional soundness proof.

Versatile Event Correlation with Algebraic Effects
Oliver Bračevac, Nada Amin, Guido Salvaneschi, Sebastian Erdweg, Patrick Eugster, and Mira Mezini. In Proceedings of the ACM on Programming Languages (ICFP). ACM, 2018. [ pdf ]

We present the first language design to uniformly express variants of n-way joins over asynchronous event streams from different domains, e.g., stream-relational algebra, event processing, reactive and concurrent programming. We model asynchronous reactive programs and joins in direct style, on top of algebraic effects and handlers. Effect handlers act as modular interpreters of event notifications, enabling fine-grained control abstractions and customizable event matching. Join variants can be considered as cartesian product computations with ”degenerate” control flow, such that unnecessary tuples are not materialized a priori. Based on this computational interpretation, we decompose joins into a generic, naive enumeration procedure of the cartesian product, plus variant-specific extensions, represented in terms of user-supplied effect handlers. Our microbenchmarks validate that this extensible design avoids needless materialization. Alongside a formal semantics for joining and prototypes in Koka and multicore OCaml, we contribute a systematic comparison of the covered domains and features.

PIE: A Domain-Specific Language for Interactive Software Development Pipelines
Gabriël Konat, Michael J. Steindorfer, Sebastian Erdweg, and Eelco Visser. In Art, Science, and Engineering of Programming, 2(3). 2018. [ pdf ]

Context. Software development pipelines are used for automating essential parts of software engineering processes, such as build automation and continuous integration testing. In particular, interactive pipelines, which process events in a live environment such as an IDE, require timely results for low-latency feedback, and persistence to retain low-latency feedback between restarts. Inquiry. Developing an incrementalized and persistent version of a pipeline is one way to reduce feedback latency, but requires implementation of dependency tracking, cache invalidation, and other complicated and error-prone techniques. Therefore, interactivity complicates pipeline development if timeliness and persistence become responsibilities of the pipeline programmer, rather than being supported by the underlying system. Systems for programming incremental and persistent pipelines exist, but do not focus on ease of development, requiring a high degree of boilerplate, increasing development and maintenance effort.
Approach. We develop Pipelines for Interactive Environments (PIE), a Domain-Specific Language (DSL), API, and runtime for developing interactive software development pipelines, where ease of development is a focus. The PIE DSL is a statically typed and lexically scoped language. PIE programs are compiled to programs implementing the API, which the PIE runtime executes in an incremental and persistent way. Knowledge. PIE provides a straightforward programming model that enables direct and concise expression of pipelines without boilerplate, reducing the development and maintenance effort of pipelines. Compiled pipeline programs can be embedded into interactive environments such as code editors and IDEs, enabling timely feedback at a low cost.
Grounding. Compared to the state of the art, PIE reduces the code required to express an interactive pipeline by a factor of 6 in a case study on syntax-aware editors. Furthermore, we evaluate PIE in two case studies of complex interactive software development scenarios, demonstrating that PIE can handle complex interactive pipelines in a straightforward and concise way.
Importance. Interactive pipelines are complicated software artifacts that power many important systems such as continuous feedback cycles in IDEs and code editors, and live language development in language workbenches. New pipelines, and evolution of existing pipelines, is frequently necessary. Therefore, a system for easily developing and maintaining interactive pipelines, such as PIE, is important.

Exploration of Language Specifications by Compilation to First-Order Logic (extended version)
Sylvia Grewe, Sebastian Erdweg, André Pacak, Michael Raulf, and Mira Mezini. In Science of Computer Programming, 155 (Special Issue on Selected Papers of PPDP'16). Elsevier, 2018. [ pdf ]

Exploration of language specifications helps to discover errors and inconsistencies early during the development of a programming language. We propose exploration of language specifications via application of existing automated first-order theorem provers (ATPs). To this end, we translate language specifications and exploration tasks to first-order logic, which many ATPs accept as input. However, there are several different strategies for compiling a language specification to first-order logic, and even small variations in the translation may have a large impact on the time it takes ATPs to find proofs.
In this paper, we first present a systematic empirical study on how to best compile language specifications to first-order logic such that existing ATPs can solve typical exploration tasks efficiently. We have developed a compiler product line that implements 36 different compilation strategies and used it to feed language specifications to 4 existing first-order theorem provers. As benchmarks, we developed language specifications for typed SQL and for a Questionnaire Language (QL), with 50 exploration goals each. Our study empirically confirms that the choice of a compilation strategy greatly influences prover performance in general and shows which strategies are advantageous for prover performance. Second, we extend our empirical study with 4 domain-specific strategies for axiom selection and find that axiom selection does not influence prover performance in our benchmark specifications.

2017

Toward Abstract Interpretation of Program Transformations
Sven Keidel and Sebastian Erdweg. In International Workshop on Meta-Programming Techniques and Reflection. ACM, 2017. [ pdf ]

Developers of program transformations often reason about transformations to assert certain properties of the generated code. We propose to apply abstract interpretation to program transformations in order to automate and support such reasoning. In this paper, we present work in progress on the development and application of an abstract interpreter for the program transformation language Stratego. In particular, we present challenges encountered during the development of the abstract Stratego interpreter and how we intend to solve these challenges.

IncAL: A DSL for Incremental Program Analysis with Lattices
Tamás Szabó, Sebastian Erdweg, and Markus Völter. In International Workshop on Incremental Computing (IC). 2017. [ pdf ]

We describe IncAL, a DSL for incremental lattice-based program analyses. IncAL is an extension of our previous work, IncA, which supported relational program analyses, that has been used for practically relevant analyses on industrial code bases. IncAL improves the expressive power of IncA by adding support for synthesis of data, enabling, for example, incremental execution of interval analysis.

Privacy-aware Distributed Incremental Computation
Mirko Köhler, Philipp Haller, Sebastian Erdweg, Mira Mezini, and Guido Salvaneschi. In International Workshop on Incremental Computing (IC). 2017. [ pdf ]

Distributed incremental processing is an effective solution for processing large amounts of data in an efficient way. In this setting, algorithms for operator placement automatically distribute data queries the the available processing units. However, current algorithms for operator placement focus on performance and ignore privacy concerns that arise when handling sensitive data.
We present ongoing research on a new methodology for privacy-aware operator placement that both prevents leakage of sensitive information and improves performance. We implement a working prototype based on previous work on (local) incremental computation.

A Co-contextual Type Checker for Featherweight Java
Edlira Kuci, Sebastian Erdweg, Oliver Bračevac, Andi Bejleri, and Mira Mezini. In Proceedings of European Conference on Object-Oriented Programming (ECOOP). 2017. [ pdf ]

This paper addresses compositional and incremental type checking for object-oriented programming languages. Recent work achieved incremental type checking for structurally typed functional languages through co-contextual typing rules, a constraint-based formulation that removes any context dependency for expression typings. However, that work does not cover key features of object-oriented languages: Subtype polymorphism, nominal typing, and implementation inheritance. Type checkers encode these features in the form of class tables, an additional form of typing context inhibiting incrementalization.
In the present work, we demonstrate that an appropriate co-contextual notion to class tables exists, paving the way to efficient incremental type checkers for object-oriented languages. This yields a novel formulation of Igarashi et al.'s Featherweight Java (FJ) type system, where we replace class tables by the dual concept of class table requirements and class table operations by dual operations on class table requirements. We prove the equivalence of FJ's type system and our co-contextual formulation. Based on our formulation, we implemented an incremental FJ type checker and compared its performance against javac on a number of realistic example programs.

A Module-System Discipline for Model-Driven Software Development
Sebastian Erdweg and Klaus Ostermann. In Art, Science, and Engineering of Programming, 1(2). 2017. [ pdf ]

Model-driven development is a pragmatic approach to software development that embraces domain-specific languages (DSLs), where models correspond to DSL programs. A distinguishing feature of model-driven development is that clients of a model can select from an open set of alternative semantics of the model by applying different model transformations. However, in existing model-driven frameworks, dependencies between models, model transformations, and generated code artifacts are either implicit or globally declared in build scripts, which impedes modular reasoning, separate compilation, and programmability in general.
We propose the design of a new module system that incorporates models and model transformations as modules. A programmer can apply transformations in import statements, thus declaring a dependency on generated code artifacts. Our design enables modular reasoning and separate compilation by preventing hidden dependencies, and it supports mixing modeling artifacts with conventional code artifacts as well as higher-order transformations. We have formalized our design and the aforementioned properties and have validated it by an implementation and case studies that show that our module system successfully integrates model-driven development into conventional programming languages.

2016

Programming Language Techniques for Incremental and Reactive Computing (Dagstuhl Seminar 16402)
Camil Demetrescu, Sebastian Erdweg, Matthew A. Hammer, and Shriram Krishnamurthi. In Dagstuhl Reports, 6(10):1–12. 2016. [ pdf ]

Incremental computations are those that process input changes faster than naive computation that runs from scratch, and reactive computations consist of interactive behavior that varies over time. Due to the importance and prevalence of incremental, reactive systems, ad hoc variants of incremental and reactive computation are ubiquitous in modern software systems.
In response to this reality, the PL research community has worked for several decades to advance new languages for systems that interface with a dynamically-changing environment. In this space, researchers propose new general-purpose languages and algorithms to express and implement efficient, dynamic behavior, in the form of incremental and reactive language systems.
While these research lines continue to develop successfully, this work lacks a shared community that synthesizes a collective discussion about common motivations, alternative techniques, current results and future challenges. To overcome this lack of community, this seminar will work towards building one, by strengthening existing research connections and by forging new ones. Developing a shared culture is critical to the future advancement of incremental and reactive computing in modern PL research, and in turn, this PL research is critical to developing the efficient, understandable interactive systems of the future.

Bootstrapping, Default Formatting, and Skeleton Editing in the Spoofax Language Workbench
Gabriël Konat, Luís Eduardo de Souza Amorim, Sebastian Erdweg, and Eelco Visser. In Language Workbench Challenge (LWC). 2016. [ pdf ]

Language workbenches are tools that help language designers to design and implement (domain-specific) programming languages, aiming to produce a full featured programming environment from a high-level language description. A recent paper, resulting from a series of language workbench challenge workshops, describes a collection of benchmark problems for language workbench research. In this paper, we describe solutions to two of these benchmark problems in the Spoofax Language Workbench, i.e. default formatting in Section 3 and skeleton editing in Section 4. In addition, we introduce a new benchmark problem - bootstrapping of meta-languages in a workbench - and describe the support for bootstrapping we developed for Spoofax in Section 2.

The IDE Portability Problem and its Solution in Monto
Sven Keidel, Wulf Pfeiffer, and Sebastian Erdweg. In Proceedings of Software Language Engineering (SLE). ACM, 2016. [ pdf ]

Modern Integrated Development Environments (IDEs) support multiple programming languages via plug-ins, but developing a high-quality language plug-in is a huge development effort and individual plug-ins are not reusable in other IDEs. We call this the IDE portability problem.
In this paper, we present a solution to the IDE portability problem based on a language-independent and IDE-independent intermediate representation (IR) for editor-service products. This IR enables IDE-independent language services to provide editor services for arbitrary IDEs, using language-independent IDE plug-ins. We combine the IR with a service-oriented architecture to facilitate the modular addition of language services, the decomposition of language services into smaller interdependent services, and the use of arbitrary implementation languages for services.
To evaluate the feasibility of our design, we have implemented the IR and architecture in a framework called Monto. We demonstrate the generality of our design by constructing language services for Java, JavaScript, Python, and Haskell and show that they are reusable in the Eclipse IDE and in a web-based IDE. We also evaluate the performance of Monto and show that Monto is responsive and has admissible performance overhead.

Principled Syntactic Code Completion using Placeholders
Luís Eduardo de Souza Amorim, Sebastian Erdweg, Guido Wachsmuth, and Eelco Visser. In Proceedings of Software Language Engineering (SLE). ACM, 2016. [ pdf ]

Principled syntactic code completion enables developers to change source code by inserting code templates, thus increasing developer efficiency and supporting language exploration. However, existing code completion systems are ad-hoc and neither complete nor sound. They are not complete and only provide few code templates for selected programming languages. They also are not sound and propose code templates that yield invalid programs when inserted. This paper presents a generic framework that automatically derives complete and sound syntactic code completion from the syntax definition of arbitrary languages. A key insight of our work is to provide an explicit syntactic representation for incomplete programs using placeholders. This enables us to address the following challenges for code completion separately: (i) completing incomplete programs by replacing placeholders with code templates, (ii) injecting placeholders into complete programs to make them incomplete, and (iii) introducing lexemes and placeholders into incorrect programs through error-recovery parsing to make them correct so we can apply one of the previous strategies. We formalize our framework and provide an implementation in Spoofax.

Efficient Development of Consistent Projectional Editors using Grammar Cells
Markus Völter, Tamás Szabó, Sascha Lisson, Bernd Kolb, Sebastian Erdweg, and Thorsten Berger. In Proceedings of Software Language Engineering (SLE). ACM, 2016.
Nominated for Distinguished Paper Award. [ pdf ]

The definition of a projectional editor does not just specify the notation of a language, but also how users interact with the notation. Because of that it is easy to end up with different interaction styles within one and between multiple languages. The resulting inconsistencies have proven to be a major usability problem. To address this problem, we introduce grammar cells, an approach for declaratively specifying textual notations and their interactions for projectional editors. In the paper we motivate the problem, give a formal definition of grammar cells, and define their mapping to low-level editor behaviors. Our evaluation based on project experience shows that grammar cells improve editing experience by providing a consistent and intuitive “text editor-like” user experience for textual notations. At the same time they do not limit language composability and the use of non-textual notations, the primary benefits of projectional editors. We have implemented grammar cells for Jetbrains MPS, but they can also be used with other projectional editors.

Bootstrapping Domain-Specific Meta-Languages in Language Workbenches
Gabriël Konat, Sebastian Erdweg, and Eelco Visser. In Proceedings of Generative Programming: Concepts & Experiences (GPCE). ACM, 2016. [ pdf ]

It is common practice to bootstrap compilers of programming languages. By using the compiled language to implement the compiler, compiler developers can code in their own high-level language and gain a large-scale test case. In this paper, we investigate bootstrapping of compiler-compilers as they occur in language workbenches. Language workbenches support the development of compilers through the application of multiple collaborating domain-specific meta-languages for defining a language’s syntax, analysis, code generation, and editor support. We analyze the bootstrapping problem of language workbenches in detail, propose a method for sound bootstrapping based on fixpoint compilation, and show how to conduct breaking meta-language changes in a bootstrapped language workbench. We have applied sound bootstrapping to the Spoofax language workbench and report on our experience.

IncA: A DSL for the Definition of Incremental Program Analyses
Tamás Szabó, Sebastian Erdweg, and Markus Völter. In Proceedings of International Conference on Automated Software Engineering (ASE). ACM, 2016. [ pdf ]

Program analyses support software developers, for example, through error detection, code-quality assurance, and by enabling compiler optimizations and refactorings. To provide real-time feedback to developers within IDEs, an analysis must run efficiently even if the analyzed code base is large.
To achieve this goal, we present a domain-specific language called IncA for the definition of efficient incremental program analyses that update their result as the program changes. IncA compiles analyses into graph patterns and relies on existing incremental matching algorithms. To scale IncA analyses to large programs, we describe optimizations that reduce caching and prune change propagation. Using IncA, we have developed incremental control flow and points-to analysis for C, well-formedness checks for DSLs, and 10 FindBugs checks for Java. Our evaluation demonstrates significant speedups for all analyses compared to their non-incremental counterparts.

An Extensible Framework for Variable-Precision Data-Flow Analyses in MPS
Tamás Szabó, Simon Alperovich, Markus Völter, and Sebastian Erdweg. In Proceedings of International Conference on Automated Software Engineering (ASE) — Tool Track . ACM, 2016. [ pdf ]

Data-flow analyses are used as part of many software engineering tasks: they are the foundations of program understanding, refactorings and optimized code generation. Similar to general-purpose languages (GPLs), state-of-the-art domain-specific languages (DSLs) also require sophisticated data-flow analyses. However, as a consequence of the different economies of DSL development and their typically relatively fast evolution, the effort for developing and evolving such analyses must be lowered compared to GPLs. This tension can be resolved with dedicated support for data-flow analyses in language workbenches.
In this tool paper we present MPS-DF, which is the component in the MPS language workbench that supports the definition of data-flow analyses for DSLs. Language developers can define data-flow graph builders declaratively as part of a language definition and compute analysis results efficiently based on these data-flow graphs. MPS-DF is extensible such that it does not compromise the support for language composition in MPS. Additionally, clients of MPS-DF analyses can run the analyses with variable precision thus trading off precision for performance. This allows clients to tailor an analysis to a particular use case.

Exploration of Language Specifications by Compilation to First-Order Logic
Sylvia Grewe, Sebastian Erdweg, Michael Raulf, and Mira Mezini. In Proceedings of International Symposium on Principles and Practice of Declarative Programming (PPDP). ACM, 2016. [ pdf ]

Exploration of language specifications helps to discover errors and inconsistencies early during the development of a programming language. We propose exploration of language specifications via application of existing automated first-order theorem provers (ATPs). To this end, we translate language specifications and exploration tasks to first-order logic, which many ATPs accept as input. However, there are several different strategies for compiling a language specification to first-order logic, and even small variations in the translation have a large impact on the time it takes ATPs to find proofs.
In this paper, we present a systematic empirical study on how to best compile language specifications to first-order logic such that existing ATPs can solve typical exploration tasks efficiently. We have developed a compiler product line that implements 36 different compilation strategies and used it to feed language specifications to 4 existing first-order theorem provers. As a benchmark, we developed a language specification for typed SQL with 50 exploration goals. Our study confirms that the choice of a compilation strategy greatly influences prover performance and shows which strategies are advantageous for prover performance.

Automating Proof Steps of Progress Proofs: Comparing Vampire and Dafny
Sylvia Grewe, Sebastian Erdweg, and Mira Mezini. In Proceedings of Vampire Workshop. EPiC, 2016. [ pdf ]

Developing provably sound type systems is a non-trivial task which, as of today, typically requires expert skills in formal methods and a considerable amount of time. Our Veritas project aims at providing support for the development of soundness proofs of type systems and efficient type checker implementations from type system specifications. To this end, we investigate how to best automate typical steps within type soundness proofs.
In this paper, we focus on progress proofs for type systems of domain-specific languages. As a running example for such a type system, we model a subset SQL and augment it with a type system. We compare two different approaches for automating proof steps of the progress proofs for this type system against each other: firstly, our own tool Veritas, which translates proof goals and specifications automatically to TPTP and calls Vampire on them, and secondly, the programming language Dafny, which translates proof goals and specifications to the intermediate verification language Boogie 2 and calls the SMT solver Z3 on them. We find that Vampire and Dafny are equally well-suited for automatically proving simple steps within progress proofs.

Towards Live Language Development
Gabriël Konat, Sebastian Erdweg, and Eelco Visser. In Proceedings of Workshop on Live Programming Systems (LIVE). 2016. [ pdf ]

We would like to see live programming applied to language development, to get live language development. With live language development, a language developer gets fast feedback when they change their language, enabling experimentation with language design and development.
In this paper, we describe what live language development is and why it is useful, and we analyze what is needed to achieve live language development. Moreover, we describe our work in progress in supporting live language development in the Spoofax language workbench.

CPL: A Core Language for Cloud Computing
Oliver Bračevac, Sebastian Erdweg, Guido Salvaneschi, and Mira Mezini. In Proceedings of International Conference on Modularity (AOSD), pages 94–105. ACM, 2016. [ pdf ]

Running distributed applications in the cloud involves deployment. That is, distribution and configuration of application services and middleware infrastructure. The considerable complexity of these tasks resulted in the emergence of declarative JSON-based domain-specific deployment languages to develop deployment programs. Existing deployment programs unsafely compose artifacts written in different languages, leading to bugs that are hard to detect before run time. Furthermore, deployment languages do not have extension points for custom implementations of existing cloud services, e.g., adding application-specific load balancing policies is difficult.
To address these shortcomings, we propose CPL (Cloud Platform Language), a statically-typed core language for programming both distributed applications as well as their deployment on a cloud platform. In CPL, application services and deployment programs interact through statically typed, extensible interfaces, and an application can trigger further deployment at run time. We formally define the dynamic and static semantics of CPL and demonstrate that it enables type-safe, composable and extensible libraries of service combinators, such as load balancing and fault tolerance.

Sound Type-Dependent Syntactic Language Extension
Florian Lorenzen and Sebastian Erdweg. In Proceedings of Symposium on Principles of Programming Languages (POPL), pages 204–216. ACM, 2016. [ pdf ]

Syntactic language extensions can introduce new facilities into a programming language while requiring little implementation effort and modest changes to the compiler. It is typical to desugar language extensions in a distinguished compiler phase after parsing or type checking, not affecting any of the later compiler phases. If desugaring happens before type checking, the desugaring cannot depend on typing information and type errors are reported in terms of the generated code. If desugaring happens after type checking, the code generated by the desugaring is not type checked and may introduce vulnerabilities. Both options are undesirable.
We propose a system for syntactic extensibility where desugaring happens after type checking and desugarings are guaranteed to only generate well-typed code. A major novelty of our work is that desugarings operate on typing derivations instead of plain syntax trees. This provides desugarings access to typing information and forms the basis for the soundness guarantee we provide, namely that a desugaring generates a valid typing derivation. We have implemented our system for syntactic extensibility in a language-independent fashion and instantiated it for a substantial subset of Java, including generics and inheritance. We provide a sound Java extension for Scala-like for-comprehensions.

2015

Evaluating and Comparing Language Workbenches: Existing Results and Benchmarks for the Future
Sebastian Erdweg, Tijs van der Storm, Markus Völter, Laurence Tratt, Meinte Boersma, Remi Bosman, William R. Cook, Albert Gerritsen, Angelo Hulshout, Steven Kelly, Alex Loh, Gabriël Konat, Pedro J. Molina, Martin Palatnik, Risto Pohjonen, Eugen Schindler, Klemens Schindler, Riccardo Solmi, Vlad Vergu, Eelco Visser, Kevin van der Vlist, Guido Wachsmuth, and Jimi van der Woning. In Computer Languages, Systems and Structures, 44, Part A:24–47. Elsevier, 2015. [ pdf ]

Language workbenches are environments for simplifying the creation and use of computer languages. The annual Language Workbench Challenge (LWC) was launched in 2011 to allow the many academic and industrial researchers in this area an opportunity to quantitatively and qualitatively compare their approaches. We first describe all four LWCs to date, before focussing on the approaches used, and results generated, during the third LWC. We give various empirical data for ten approaches from the third LWC. We present a generic feature model within which the approaches can be understood and contrasted. Finally, based on our experiences of the existing LWCs, we propose a number of benchmark problems for future LWCs.

Modular Capture Avoidance for Program Transformations
Nico Ritschel and Sebastian Erdweg. In Proceedings of Software Language Engineering (SLE), 59–70. ACM, 2015. Best Presentation Award. [ pdf ]

The application of program transformations and refactorings involves the risk of capturing variables, which may break the intended semantics of the transformed code. One way to resolve variable capture is by renaming of the involved identifiers. However, in a modular context, the renaming of exported declarations is undesirable (affecting a module's clients), and the renaming of imported declarations is impossible (requiring changes to third-party modules).
We present an algorithm name-fix that detects and eliminates variable capture modularly. We extend a previous non-modular version of name-fix in order to (i) minimize renamings of exported declarations, (ii) propagate necessary renamings of exported declarations to clients, and (iii) avoid renamings of imported declarations altogether. Together with support for transitive name bindings and conflicting declarations, our extensions to name-fix enable the application to real-world languages that feature separate compilation. To demonstrate the applicability of name-fix, we use it to modularly resolve variable capture for optimizations, refactorings, and desugarings of Lightweight Java.

A Sound and Optimal Incremental Build System with Dynamic Dependencies
Sebastian Erdweg, Moritz Lichter, and Manuel Weiel. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 89–106. ACM, 2015. [ pdf ]

Build systems are used in all but the smallest software projects to invoke the right build tools on the right files in the right order. A build system must be sound (after a build, generated files consistently reflect the latest source files) and efficient (recheck and rebuild as few build units as possible). Contemporary build systems provide limited efficiency because they lack support for expressing fine-grained file dependencies.
We present a build system called pluto that supports the definition of reusable, parameterized, interconnected builders. When run, a builder notifies the build system about dynamically required and produced files as well as about other builders whose results are needed. To support fine-grained file dependencies, we generalize the traditional notion of time stamps to allow builders to declare their actual requirements on a file's content. pluto collects the requirements and products of a builder with their stamps in a build summary. This enables pluto to provides provably sound and optimal incremental rebuilding. To support dynamic dependencies, our rebuild algorithm interleaves dependency analysis and builder execution and enforces invariants on the dependency graph through a dynamic analysis. We have developed pluto as a Java API and used it to implement more than 25 builders. We describe our experience with migrating a larger Ant build script to pluto and compare the respective build times.

A Co-contextual Formulation of Type Rules and Its Application to Incremental Type Checking
Sebastian Erdweg, Oliver Bračevac, Edlira Kuci, Matthias Krebs, and Mira Mezini. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 880–897. ACM, 2015. [ pdf ]

Type rules associate types to expressions given a typing context. As the type checker traverses the expression tree top-down, it extends the typing context with additional context information that becomes available. This way, the typing context coordinates type checking in otherwise independent subexpressions, which inhibits parallelization and incrementalization of type checking.
We propose a co-contextual formulation of type rules that only take an expression as input and produce a type and a set of context requirements. Co-contextual type checkers traverse an expression tree bottom-up and merge context requirements of independently checked subexpressions. We describe a method for systematically constructing a co-contextual formulation of type rules from a regular context-based formulation and we show how co-contextual type rules give rise to incremental type checking. Using our method, we derive incremental type checkers for PCF and for extensions that introduce records, parametric polymorphism, and subtyping. Our performance evaluation shows that co-contextual type checking has performance comparable to standard context-based type checking, and incrementalization can improve performance significantly.

Towards Secure Integration of Cryptographic Software
Steven Arzt, Sarah Nadi, Karim Ali, Eric Bodden, Sebastian Erdweg, and Mira Mezini. In Proceedings of Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software (Onward), 1–13. ACM, 2015. [ pdf ]

While cryptography is now readily available to everyone and can, provably, protect private information from attackers, we still frequently hear about major data leakages, many of which are due to improper use of cryptographic mechanisms. The problem is that many application developers are not cryptographic experts. Even though high-quality cryptographic APIs are widely available, programmers often select the wrong algorithms or misuse APIs due to a lack of understanding. Such issues arise with both simple operations such as encryption as well as with complex secure communication protocols such as SSL. In this paper, we provide a long-term solution that helps application developers integrate cryptographic components correctly and securely by bridging the gap between cryptographers and application developers.
Our solution consists of a software product line (with an underlying feature model) that automatically identifies the correct cryptographic algorithms to use, based on the developer's answers to high-level questions in non-expert terminology. Each feature (i.e., cryptographic algorithm) maps into corresponding Java code and a usage protocol describing API restrictions. By composing the user's selected features, we automatically synthesize a secure code blueprint and a usage protocol that corresponds to the selected usage scenario. Since the developer may change the application code over time, we use the usage protocols to statically analyze the program and ensure that the correct use of the API is not violated over time.

Type Systems for the Masses: Deriving Soundness Proofs and Efficient Checkers
Sylvia Grewe, Sebastian Erdweg, Pascal Wittmann, and Mira Mezini. In Proceedings of Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software (Onward), 137–150. ACM, 2015. [ pdf ]

The correct definition and implementation of non-trivial type systems is difficult and requires expert knowledge, which is not available to developers of domain-specific languages and specialized APIs in practice. We propose Veritas, a workbench that simplifies the development of sound type systems. Veritas provides a single, high-level specification language for type systems, from which it automatically tries to derive soundness proofs and efficient and correct type-checking algorithms. For verification, Veritas combines off-the-shelve automated first-order theorem provers with automated proof strategies specific to type systems. For deriving efficient type checkers, Veritas provides a collection of optimization strategies whose applicability to a given type system is checked through verification on a case-by-case basis. We have developed a prototypical implementation of Veritas and used it to verify type soundness of the simply-typed lambda calculus and of parts of typed SQL. Our experience suggests that a high degree of automation is possible for type systems of domain-specific languages.

Using Vampire in Soundness Proofs of Type Systems
Sylvia Grewe, Sebastian Erdweg, and Mira Mezini. In Proceedings of Vampire Workshop, pages 33–51. EPiC, 2015. [ pdf ]

Type systems for programming languages shall detect type errors in programs before runtime. To ensure that a type system meets this requirement, its soundness must be formally verified. We aim at automating soundness proofs of type systems to facilitate the development of sound type systems for domain-specific languages.

Soundness proofs for type systems typically require induction. However, many of the proofs of individual induction cases only require first-order reasoning. For the development of our workbench Veritas, we build on this observation by combining automated first-order theorem provers such as Vampire with automated proof strategies specific to type systems. In this paper, we describe how we encode type soundness proofs in first-order logic using TPTP. We show how we use Vampire to prove the soundness of type systems for the simply-typed lambda calculus and for parts of a typed SQL. We report on which parts of the proofs are handled well by Vampire, and what parts work less well with our current approach.

Domain-specific languages (Dagstuhl Seminar 15062)
Sebastian Erdweg, Martin Erwig, Richard F. Paige, and Eelco Visser. In Dagstuhl Reports, 5(2):26–43. 2015. [ pdf ]

This report documents the program and outcomes of Dagstuhl Seminar 15062 "Domain-Specific Languages", which took place February 1-6, 2015. The seminar was motivated on the one hand by the high interest in domain-specific languages in academia and industry and on the other hand by the observation that the community is divided into largely disconnected subdisciplines (e.g., internal, external, visual, model-driven). The seminar included participants across these subdisciplines and included overview talks, technical talks, demos, discussion groups, and an industrial panel. This report collects the abstracts of talks and other activities at the seminar and summarizes the outcomes of the seminar.

Capture-Avoiding Program Transformations with name-fix
Sebastian Erdweg, Tijs van der Storm, and Yi Dai. In Proceedings of GI Fachtagung Software Engineering, pages 93–94. GI, 2015.

We present an algorithm called name-fix that automatically eliminates variable capture from a generated program by systematically renaming variables. name-fix is guided by a graph representation of the binding structure of a program, and requires name-resolution algorithms for the source language and the target language of a transformation. name-fix is generic and works for arbitrary transformations in any transformation system that supports origin tracking for names.

2014

i3QL: Language-Integrated Live Data Views
Ralf Mitschke, Sebastian Erdweg, Mirko Köhler, Mira Mezini, and Guido Salvaneschi. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 417–432. ACM, 2014. [ pdf ]

An incremental computation updates its result based on a change to its input, which is often an order of magnitude faster than a recomputation from scratch. In particular, incrementalization can make expensive computations feasible for settings that require short feedback cycles, such as interactive systems, IDEs, or (soft) real-time systems.
This paper presents i3QL, a general-purpose programming language for specifying incremental computations. i3QL provides a declarative SQL-like syntax and is based on incremental versions of operators from relational algebra, enriched with support for general recursion. We integrated i3QL into Scala as a library, which enables programmers to use regular Scala code for non-incremental subcomputations of an i3QL query and to easily integrate incremental computations into larger software projects. To improve performance, i3QL optimizes user-defined queries by applying algebraic laws and partial evaluation. We describe the design and implementation of i3QL and its optimizations, demonstrate its applicability, and evaluate its performance.

Towards Virtual Traits in Scala
Manuel Weiel, Ingo Maier, Sebastian Erdweg, Michael Eichberg, and Mira Mezini. In Proceedings of Scala Workshop, pages 67–75. ACM, 2014. [ pdf ]

Scala is a powerful language that supports a variety of features, but it lacks virtual traits. Virtual traits are class-valued object attributes and can be redefined within subtraits. They support higher-order hierarchies and family polymorphism. This work introduces virtual traits into Scala and explains how to encode virtual traits on top of existing Scala features. We have implemented this encoding using Scala annotation macros and have conducted two small case studies.

Evolution of Software Systems with Extensible Languages and DSLs
Sebastian Erdweg, Stefan Fehrenbach, and Klaus Ostermann. In IEEE Software, Special Issue on New Directions in Programming Languages, 31(5):68–75. IEEE, 2014. [ pdf ]

Domain-specific languages (DSLs) provide various advantages regarding the maintainability of software systems. Unfortunately, existing software systems do not exploit DSLs and their maintenance benefits. Based on the extensible programming language SugarJ, we present a process for gradually integrating DSLs into existing software systems, we report on our experience in integrating three DSLs into two existing software systems, and we outline a roadmap for the development of tool support for the integration of DSLs.

Capture-Avoiding and Hygienic Program Transformations
Sebastian Erdweg, Tijs van der Storm, and Yi Dai. In Proceedings of European Conference on Object-Oriented Programming (ECOOP), pages 489–514. Springer, 2014. [ pdf ]

Program transformations in terms of abstract syntax trees compromise referential integrity by introducing variable capture. Variable capture occurs when in the generated program a variable declaration accidentally shadows the intended target of a variable reference. Existing transformation systems either do not guarantee the avoidance of variable capture or impair the implementation of transformations.
We present an algorithm called name-fix that automatically eliminates variable capture from a generated program by systematically renaming variables. name-fix is guided by a graph representation of the binding structure of a program, and requires name-resolution algorithms for the source language and the target language of a transformation. name-fix is generic and works for arbitrary transformations in any transformation system that supports origin tracking for names. We verify the correctness of name-fix and identify an interesting class of transformations for which name-fix provides hygiene. We demonstrate the applicability of name-fix for implementing capture-avoiding substitution, inlining, lambda lifting, and compilers for two domain-specific languages.

Tracing Program Transformations with String Origins
Pablo Inostroza, Tijs van der Storm, and Sebastian Erdweg. In Proceedings of International Conference on Model Transformation (ICMT), pages 154–169. Springer, 2014. [ pdf ]

Program transformations play an important role in domain-specific languages and model-driven development. Tracing the execution of such transformations has well-known benefits for debugging, visualization and error reporting. In this paper, we introduce string origins, a lightweight, generic and portable technique to establish a tracing relation between the textual fragments in the input and output of a program transformation. We discuss the semantics and the implementation of string origins using the Rascal meta programming language as an example. We illustrate the utility of string origins by presenting data structures and operations for tracing generated code, implementing protected regions, performing name resolution and fixing inadvertent name capture in generated code.

Modular Specification and Dynamic Enforcement of Syntactic Language Constraints when Generating Code
Sebastian Erdweg, Vlad Vergu, Mira Mezini, and Eelco Visser. In Proceedings of International Conference on Modularity (AOSD), pages 241–252. ACM, 2014. [ pdf ]

A key problem in metaprogramming is to guarantee that generated code is well-formed with respect to the context-free and context-sensitive constraints of the target language. We propose typesmart constructors as a dynamic approach to enforcing the well-formedness of generated code. A typesmart constructor is a function that is used in place of a regular constructor to create values, but it may reject the creation of values if the given data violates some language-specific constraint. While typesmart constructors can be implemented individually, we demonstrate how to derive them automatically from a grammar, so that the grammar remains the sole specification of a language's syntax and is not duplicated. We have integrated support for typesmart constructors into the run-time system of Stratego to enforce usage of typesmart constructors implicitly whenever a regular constructor is called. We evaluate the applicability, performance, and usefulness of typesmart constructors for syntactic constraints in a compiler for MiniJava developed with Spoofax and in various language extensions of Java and Haskell implemented with SugarJ and SugarHaskell.

Finding Bugs in Program Generators by Dynamic Analysis of Syntactic Language Constraints (demo paper)
Sebastian Erdweg, Vlad Vergu, Mira Mezini, and Eelco Visser. In Proceedings of International Conference on Modularity (AOSD), pages 17–20. ACM, 2014. [ pdf ]

Program generators and transformations are hard to implement correctly, because the implementation needs to generically describe how to construct programs, for example, using templates or rewrite rules. We apply dynamic analysis to program generators in order to support developers in finding bugs and identifying the source of the bug. Our analysis focuses on syntactic language constraints and checks that generated programs are syntactically well-formed. To retain a language's grammar as the unique specification of the language's syntax, we devised mechanisms to derive the analysis from the grammar. Moreover, we designed a run-time system to support the modular activation/deactivation of the analysis, so that generators do not require adaption. We have implemented the analysis for the Stratego term-rewriting language and applied it in case studies based on Spoofax and SugarJ.

2013

A Framework for Extensible Languages
Sebastian Erdweg and Felix Rieger. In Proceedings of Generative Programming: Concepts & Experiences (GPCE), pages 3–12. ACM, 2013. [ pdf ]

Extensible programming languages such as SugarJ or Racket enable programmers to introduce customary language features as extensions of the base language. Traditionally, systems that support language extensions are either (i) agnostic to the base language or (ii) only support a single base language. In this paper, we present a framework for language extensibility that turns a non-extensible language into an extensible language featuring library-based extensible syntax, extensible static analyses, and extensible editor support. To make a language extensible, our framework only requires knowledge of the base language's grammar, the syntax for import statements (which activate extensions), and how to compile base-language programs. We have evaluated the generality of our framework by instantiating it for Java, Haskell, Prolog, JavaScript, and System F-omega, and by studying existing module-system features and their support in our framework.

Template Constructors for Reusable Object Initialization
Marko Martin, Mira Mezini, and Sebastian Erdweg. In Proceedings of Generative Programming: Concepts & Experiences (GPCE), pages 43–52. ACM, 2013. [ pdf ]

Reuse of and abstraction over object initialization logic is not properly supported in mainstream object-oriented languages. This may result in significant amount of boilerplate code and proliferation of constructors in subclasses. It also makes it impossible for mixins to extend the initialization interface of classes they are applied to. We propose template constructors, which employ template parameters and pattern matching of them against signatures of superclass constructors to enable a one-to-many binding of super-calls. We demonstrate how template constructors solve the aforementioned problems. We present a formalization of the concept, a Java-based implementation, and use cases which exercise its strengths.

Software Evolution to Domain-Specific Languages
Stefan Fehrenbach, Sebastian Erdweg, and Klaus Ostermann. In Proceedings of Software Language Engineering (SLE), volume 8225 of LNCS, pages 96–116. Springer, 2013. [ pdf ]

Domain-specific languages (DSLs) can improve software maintainability due to less verbose syntax, avoidance of boilerplate code, more accurate static analysis, and domain-specific tool support. However, most existing applications cannot capitalize on these benefits because they were not designed to use DSLs, and rewriting large existing applications from scratch is infeasible. We propose a process for evolving existing software to use embedded DSLs based on modular definitions and applications of syntactic sugar as provided by the extensible programming language SugarJ. Our process is incremental along two dimensions: A developer can add support for another DSL as library, and a developer can refactor more code to use the syntax, static analysis, and tooling of a DSL. Importantly, the application remains executable at all times and no complete rewrite is necessary. We evaluate our process by incrementally evolving the Java Pet Store and a deliberately small part of the Eclipse IDE to use language support for field-accessors, JPQL, XML, and XML Schema. To help maintainers to locate Java code that would benefit from using DSLs, we developed a tool that analyzes the definition of a DSL to derive patterns of Java code that could be represented with a high-level abstraction of the DSL instead.

The State of the Art in Language Workbenches
Sebastian Erdweg, Tijs van der Storm, Markus Völter, Meinte Boersma, Remi Bosman, William R. Cook, Albert Gerritsen, Angelo Hulshout, Steven Kelly, Alex Loh, Gabriël Konat, Pedro J. Molina, Martin Palatnik, Risto Pohjonen, Eugen Schindler, Klemens Schindler, Riccardo Solmi, Vlad Vergu, Eelco Visser, Kevin van der Vlist, Guido Wachsmuth, and Jimi van der Woning. In Proceedings of Software Language Engineering (SLE), volume 8225 of LNCS, pages 197–217. Springer, 2013. [ pdf ]

Language workbenches are tools that provide high-level mechanisms for the implementation of (domain-specific) languages. Language workbenches are an active area of research that also receives many contributions from industry. To compare and discuss existing language workbenches, the annual Language Workbench Challenge was launched in 2011. Each year, participants are challenged to realize a given domain-specific language with their workbenches as a basis for discussion and comparison. In this paper, we describe the state of the art of language workbenches as observed in the previous editions of the Language Workbench Challenge. In particular, we capture the design space of language workbenches in a feature model and show where in this design space the participants of the 2013 Language Workbench Challenge reside. We compare these workbenches based on a DSL for questionnaires that was realized in all workbenches.

Modular and Automated Type-Soundness Verification for Language Extensions
Florian Lorenzen and Sebastian Erdweg. In Proceedings of International Conference on Functional Programming (ICFP). pages 331–342. ACM, 2013. [ pdf ]

Language extensions introduce high-level programming constructs that protect programmers from low-level details and repetitive tasks. For such an abstraction barrier to be sustainable, it is important that no errors are reported in terms of generated code. A typical strategy is to check the original user code prior to translation into a low-level encoding, applying the assumption that the translation does not introduce new errors. Unfortunately, such assumption is untenable in general, but in particular in the context of extensible programming languages, such as Racket or SugarJ, that allow regular programmers to define language extensions.
In this paper, we present a formalism for building and automatically verifying the type-soundness of syntactic language extensions. To build a type-sound language extension with our formalism, a developer declares an extended syntax, type rules for the extended syntax, and translation rules into the (possibly further extended) base language. Our formalism then validates that the user-defined type rules are sufficient to guarantee that the code generated by the translation rules cannot contain any type errors. This effectively ensures that an initial type check prior to translation precludes type errors in generated code. We have implemented a core system in PLT Redex and we have developed a syntactically extensible variant of System F-omega that we extend with let notation, monadic do blocks, and algebraic data types. Our formalism verifies the soundness of each extension automatically.

XPoints: Extension Interfaces for Multilayered Applications
Mohamed Aly, Anis Charfi, Sebastian Erdweg, and Mira Mezini. In Proceedings of Enterprise Distributed Object Computing (EDOC), pages 237–246. IEEE, 2013. [ pdf ]

Extensibility is a key requirement in modern software applications. In the context of business applications it is one of the major selection criteria from the customer perspective. However, there are some challenges concerning the specification and enforcement of extension interfaces. Extension interfaces define the resources of the base applications that are allowed to be extended, where and when the extension code will run, and what resources of the base application an extension is allowed to access. While concepts for such interfaces are still a hot research topic for "traditional" software constructed using a single programming language, they are completely missing for complex consisting of several abstraction layers. In addition, state-of-the-art approaches do not support providing different extension interfaces for different stakeholders.
This paper attempts to fill this gap by introducing XPoints, an approach and a language for specifying and enforcing extension interfaces in multilayered applications. An extension interface in XPoints defines the available extension points on the different abstraction layers, controls the access and visibility of the core application to the extension, and con- strains the interplay between extension points possibly from different abstraction layers. Several extension interfaces can be overlaid over the same core application, hence, enabling multiple extender views to co-exist. Using an XPoints interface, a software provider can automatically generate the extensibility code infrastructure to provide the extension interface for the core application.

Modular and Automated Type-Soundness Verification for Language Extensions
(extended abstract) Sebastian Erdweg and Florian Lorenzen. In Proceedings of Workshop on Scalable Language Specification (SLS), 2013. [ pdf ]

We present a formalism for building language extensions that ensures soundness of extensions with respect to the type system of the base language. To build a type-sound language extension with our formalism, a developer declares an extended syntax, type rules for the extended syntax, and translation rules into the (possibly further extended) base language. Our formalism then modularly and automatically validates that the user-defined type rules are sufficient to guarantee that code generated by the translation rules cannot contain any type errors.

Extensible Languages for Flexible and Principled Domain Abstraction
Sebastian Erdweg. PhD thesis, Philipps-Universität Marburg, 2013. [ pdf ]

Most programming languages are designed for general-purpose software development in a one-size-fits-all fashion: They provide the same set of language features and constructs for all possible applications programmers ever may want to develop. As with shoes, the one-size-fits-all solution grants a good fit to few applications only.
The trend toward domain-specific languages, model-driven development, and language-oriented programming counters general-purpose languages by promoting the use of domain abstractions that facilitate domain-specific language features and constructs tailored to certain application domains. In particular, domain abstraction avoids the need for encoding domain concepts with general-purpose language features and thus allows programmers to program at the same abstraction level as they think.
Unfortunately, current approaches to domain abstraction cannot deliver on the promises of domain abstraction. On the one hand, approaches that target internal domain-specific languages lack flexibility regarding the syntax, static checking, and tool support of domain abstractions, which limits the level of actually achieved domain abstraction. On the other hand, approaches that target external domain-specific languages lack important principles, such as modular reasoning and composition of domain abstractions, which inhibits the applicability of these approaches in the development of larger software systems. In this thesis, we pursue a novel approach that unifies the advantages of internal and external domain-specific languages to support flexible and principled domain abstraction.
We propose library-based extensible programming languages as a basis for domain abstraction. In an extensible language, domain abstraction can be realized by extending the language with domain-specific syntax, static analysis, and tool support. This enables domain abstractions as flexible as external domain-specific languages. To ensure the compliance with important software-development principles, we organize language extensions as libraries and use simple import statements to activate extensions. This facilitates modular reasoning (by inspecting import statements), supports the composition of domain abstractions (by importing multiple extensions), and allows uniform self-application of language extensions in the development of further extensions (by importing extensions in an extension definition). A library-based organization of extensions enables domain abstractions as principled as internal domain-specific languages.
We designed and implemented SugarJ, a library-based extensible programming language on top of Java. SugarJ libraries can declare and export extensions of SugarJ's syntax, static analysis, and editor support. Thereby, a syntactic extension consists of an extended syntax and a desugaring transformation from the extended syntax into SugarJ base syntax, an analysis extension matches on part of the current file's abstract syntax tree and produces a list of errors, and an editor extension declares editor services such as coloring or code completion for certain language constructs. SugarJ extensions are fully self-applicable: An extended syntax can desugar into the declaration of another extensions, an extended analysis can check the declaration of an extension, and an extended editor can assist developers in writing extensions. To process a source file with extensions, the SugarJ compiler and IDE inspect the imported libraries to determine active extensions. The compiler and IDE adapt the parser, code generator, analyzer, and editor of the source file according to the active extensions.
In this thesis, we do not only describe the design and implementation of SugarJ, but also report on extensions of the original design. In particular, we designed and implemented a generalization of the SugarJ compiler that supports alternative base languages besides Java. Using this generalization, we developed the library-based extensible programming languages SugarHaskell, SugarProlog, and SugarFomega. Furthermore, we developed an extension of SugarJ that supports polymorphic domain abstraction and ensures communication integrity. Polymorphic domain abstraction enables programmers to provide multiple desugarings for the same domain-specific syntax. This increases the flexibility of SugarJ and supports scenarios known from model-driven development. Communication integrity specifies that components of a software system may communicate over explicit channels only. This is interesting in the context of code generation where it effectively prohibits the generation of implicit module dependencies. We augmented SugarJ's principles by enforcing communication integrity.
On the basis of SugarJ and numerous case studies, we argue that flexible and principled domain abstraction constitutes a scalable programming model for the development of complex software systems.

Embedding a Questionnaire DSL with SugarJ
Sebastian Erdweg. In Language Workbench Challenge (LWC), 2013. [ pdf ]

We describe our SugarJ-based solution to the 2013 Language Workbench Competition. As part of this competition, we developed domain-specific language (DSL) for questionnaires that features conditional questions, locally defined questions, derived values, name checking, type checking, checking for overlapping question instances, and basic tool support such as code coloring. Using SugarJ, we have realized the questionnaire DSL as a language extension of Java that translates a questionnaire into a Java Swing component. We have realized the questionnaire DSL via separate components for the syntax, code generation, tool support, name checking, type checking, and overlap checking. Moreover, we use SugarJ's support for layout-sensitive syntax to use indentation instead of parentheses in the design of the questionnaire DSL.

2012

Language Composition Untangled
Sebastian Erdweg, Paolo G. Giarrusso, and Tillmann Rendel. In Proceedings of Workshop on Language Descriptions, Tools and Applications (LDTA), pages 7:1–7:8. ACM, 2012. [ pdf ]

In language-oriented programming and modeling, software developers are largely concerned with the definition of domain-specific languages (DSLs) and their composition. While various implementation techniques and frameworks exist for defining DSLs, language composition has not obtained enough attention and is not well-enough understood. In particular, there is a lack of precise terminology for describing observations about language composition in theory and in existing language-development systems. To clarify the issue, we specify five forms of language composition: language extension, language restriction, language unification, self-extension, and extension composition. We illustrate this classification by various examples and apply it to discuss the performance of different language-development system with respect to language composition. We hope that the terminology provided by our classification will enable more precise communication on language composition.

Layout-sensitive Generalized Parsing
Sebastian Erdweg, Tillmann Rendel, Christian Kästner, and Klaus Ostermann. In Proceedings of Software Language Engineering (SLE), volume 7745 of LNCS, pages 244–263. Springer, 2012. [ pdf ]

The theory of context-free languages is well-understood and context-free parsers can be used as off-the-shelf tools in practice. In particular, to use a context-free parser framework, a user does not need to understand its internals but can specify a language declaratively as a grammar. However, many languages in practice are not context-free. One particularly important class of such languages is layout-sensitive languages, in which the structure of code depends on indentation and whitespace. For example, Python, Haskell, F#, and Markdown use indentation instead of curly braces to determine the block structure of code. Their parsers (and lexers) are not declaratively specified but hand-tuned to account for layout-sensitivity.

To support declarative specifications of layout-sensitive languages, we propose a parsing framework in which a user can annotate layout in a grammar as constraints on the relative positioning of tokens in the parsed subtrees. For example, a user can declare that a block consists of statements that all start on the same column. We have integrated layout constraints into SDF and implemented a layout-sensitive generalized parser as an extension of generalized LR parsing. We evaluate the correctness and performance of our parser by parsing 33290 open-source Haskell files. Layout-sensitive generalized parsing is easy to use, and its performance overhead compared to layout-insensitive parsing is small enough for most practical applications.

A Variability-Aware Module System
Christian Kästner, Klaus Ostermann, and Sebastian Erdweg. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 773–792. ACM, 2012. [ pdf ]

Module systems enable a divide and conquer strategy to software development. To implement compile-time variability in software product lines, modules can be composed in different combinations. However, this way variability dictates a dominant decomposition. Instead, we introduce a variability-aware module system that supports compile-time variability inside a module and its interface. This way, each module can be considered a product line that can be type checked in isolation. Variability can crosscut multiple modules. The module system breaks with the antimodular tradition of a global variability model in product-line development and provides a path toward software ecosystems and product lines of product lines developed in an open fashion. We discuss the design and implementation of such a module system on a core calculus and provide an implementation for C, which we use to type check the open source product line Busybox with 811 compile-time options.

Layout-sensitive Language Extensibility with SugarHaskell
Sebastian Erdweg, Felix Rieger, Tillmann Rendel, and Klaus Ostermann. In Proceedings of Haskell Symposium, pages 149–160. ACM, 2012. [ pdf ]

Programmers need convenient syntax to write elegant and concise programs. Consequently, the Haskell standard provides syntactic sugar for some scenarios (e.g., do notation for monadic code), authors of Haskell compilers provide syntactic sugar for more scenarios (e.g., arrow notation in GHC), and some Haskell programmers implement preprocessors for their individual needs (e.g., idiom brackets in SHE). But manually written preprocessors cannot scale: They are expensive, error-prone, and not composable. Most researchers and programmers therefore refrain from using the syntactic notations they need in actual Haskell programs, but only use them in documentation or papers. We present a syntactically extensible version of Haskell, SugarHaskell, that empowers ordinary programmers to implement and use custom syntactic sugar.

Building on our previous work on syntactic extensibility for Java, SugarHaskell integrates syntactic extensions as sugar libraries into Haskell's module system. Syntax extensions in SugarHaskell can declare arbitrary context-free and layout-sensitive syntax. SugarHaskell modules are compiled into Haskell modules and further processed by a Haskell compiler. We provide an Eclipse-based IDE for SugarHaskell that is extensible, too, and automatically provides syntax coloring for all syntax extensions imported into a module.

We have validated SugarHaskell with several case studies, including arrow notation (as implemented in GHC) and EBNF as a concise syntax for the declaration of algebraic data types with associated concrete syntax. EBNF declarations also show how to extend the extension mechanism itself: They introduce syntactic sugar for using the declared concrete syntax in other SugarHaskell modules.

Toward Variability-Aware Testing
Christian Kästner, Alexander von Rhein, Sebastian Erdweg, Jonas Pusch, Sven Apel, Tillmann Rendel, Klaus Ostermann. In Proceedings of Workshop on Feature-Oriented Software Development (FOSD), pages 1–8. ACM, 2012. [ pdf ]

We investigate how to execute a unit test in all configurations of a product line without generating each product in isolation in a brute-force fashion. Learning from variability-aware analyses, we (a) design and implement a variability-aware interpreter and (b) reencode variability of the product line to simulate the test cases with a model checker. The interpreter internally reasons about variability, executing paths not affected by variability only once for the whole product line. The model checker achieves similar results by reusing powerful off-the-shelf analyses. We experimented with a prototype implementation for each strategy. We compare both strategies and discuss trade-offs and future directions.

2011

Library-based Model-driven Software Development with SugarJ (demo paper)
Sebastian Erdweg and Lennart C. L. Kats and Tillmann Rendel and Christian Kästner and Klaus Ostermann and Eelco Visser. In Companion to the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 17–18. ACM, 2011.

SugarJ: Library-based Language Extensibility (poster paper)
Sebastian Erdweg and Lennart C. L. Kats and Tillmann Rendel and Christian Kästner and Klaus Ostermann and Eelco Visser. In Companion to the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 187–188. ACM, 2011.

SugarJ is a Java-based programming language that provides extensible surface syntax, static analyses, and IDE support. SugarJ extensions are organized as libraries; conventional import statements suffice to activate and compose language extensions. We demonstrate how programmers can use SugarJ to modularly extend Java's syntax, semantic analyses and IDE support.

Growing a Language Environment with Editor Libraries
Sebastian Erdweg and Lennart C. L. Kats and Tillmann Rendel and Christian Kästner and Klaus Ostermann and Eelco Visser. In Proceedings of Conference on Generative Programming and Component Engineering (GPCE), pages 167–176. ACM, 2011. [ pdf ]

Large software projects consist of code written in a multitude of different (possibly domain-specific) languages, which are often deeply interspersed even in single files. While many proposals exist on how to integrate languages semantically and syntactically, the question of how to support this scenario in integrated development environments (IDEs) remains open: How can standard IDE services, such as syntax highlighting, outlining, or reference resolving, be provided in an extensible and compositional way, such that an open mix of languages is supported in a single file?

Based on our library-based syntactic extension language for Java, SugarJ, we propose to make IDEs extensible by organizing editor services in editor libraries. Editor libraries are libraries written in the object language, SugarJ, and hence activated and composed through regular import statements on a file-by-file basis. We have implemented an IDE for editor libraries on top of SugarJ and the Eclipse-based Spoofax language workbench. We have validated editor libraries by evolving this IDE into a fully-fledged and schema-aware XML editor as well as an extensible LaTeX editor, which we used for writing this paper.

SugarJ: Library-based Syntactic Language Extensibility
Sebastian Erdweg, Tillmann Rendel, Christian Kästner and Klaus Ostermann. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 391–406. ACM, 2011. Distinguished Paper Award. [ pdf ]

Existing approaches to extend a programming language with syntactic sugar often leave a bitter taste, because they cannot be used with the same ease as the main extension mechanism of the programming language—libraries. Sugar libraries are a novel approach for syntactically extending a programming language within the language. A sugar library is like an ordinary library, but can, in addition, export syntactic sugar for using the library. Sugar libraries maintain the composability and scoping properties of ordinary libraries and are hence particularly well-suited for embedding a multitude of domain-specific languages into a host language. They also inherit the self-applicability of libraries, which means that the syntax extension mechanism can be applied in the definition of sugar libraries themselves.

To demonstrate the expressiveness and applicability of sugar libraries, we have developed SugarJ, a language on top of Java, SDF and Stratego that supports syntactic extensibility. SugarJ employs a novel incremental parsing technique that allows changing the syntax within a source file. We demonstrate SugarJ by five language extensions, including embeddings of XML and closures in Java, all available as sugar libraries. We illustrate the utility of self-applicability by embedding XML Schema, a metalanguage to define XML languages.

Variability-Aware Parsing in the Presence of Lexical Macros and Conditional Compilation
Christian Kästner, Paolo G. Giarrusso, Tillmann Rendel, Sebastian Erdweg, Klaus Ostermann and Thorsten Berger. In Proceedings of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 805–824. ACM, 2011. [ pdf ]

In many projects, lexical preprocessors are used to manage different variants of the project (using conditional compilation) and to define compile-time code transformations (using macros). Unfortunately, while being a simply way to implement variability, conditional compilation and lexical macros hinder automatic analysis, even though such analysis would be urgently needed to combat variability-induced complexity. To analyze code with its variability, we need to parse it without preprocessing it. However, current parsing solutions use heuristics, support only a subset of the language, or suffer from exponential explosion. As part of the TypeChef project, we contribute a novel variability-aware parser that can parse unpreprocessed code without heuristics in practicable time. Beyond the obvious task of detecting syntax errors, our parser paves the road for further analysis, such as variability-aware type checking. We implement variability-aware parsers for Java and GNU C and demonstrate practicability by parsing the product line MobileMedia and the entire X86 architecture of the Linux kernel with 6065 variable features.

Library-based Language Extensibility
Sebastian Erdweg. In Participant's Workshop of Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE). 2011. Best Presentation Award (sponsored by Google). [ pdf ]

Abstract Features in Feature Modeling
Thomas Thüm, Christian Kästner, Sebastian Erdweg, and Norbert Siegmund. In Proceedings of Software Product Line Conference (SPLC), pages 191–200. IEEE Computer Society, 2011. [ pdf ]

A software product line is a set of program variants, typically generated from a common code base. Feature models describe variability in product lines by documenting features and their valid combinations. In product-line engineering, we need to reason about variability and program variants for many different tasks. For example, given a feature model, we might want to determine the number of all valid feature combinations or detect specific feature combinations for testing. However, we found that contemporary reasoning approaches can only reason about feature combinations, not about program variants, because they do not take abstract features into account. Abstract features are features used to structure a feature model that, however, do not have any impact at implementation level. Using existing feature-model reasoning mechanisms for product variants leads to incorrect results. We raise awareness of the problem of abstract features for different kinds of analyses on feature models. We argue that, in order to reason about program variants, abstract features should be made explicit in feature models. We present a technique based on propositional formulas to reason about program variants. In practice, our technique can save effort that is caused by considering the same program variant multiple times, for example, in product-line testing.

2010

Featherweight TeX and Parser Correctness
Sebastian Thore Erdweg and Klaus Ostermann. In Proceedings of Software Language Engineering (SLE), volume 6563 of LNCS, pages 397–416. Springer, 2010. [ pdf ]

TeX (and its LaTeX incarnation) is a widely used document preparation system for technical and scientific documents. At the same time, TeX is also an unusual programming language with a quite powerful macro system. Despite the wide range of TeX users (especially in the scientific community), and despite a widely perceived considerable level of "pain" in using TeX, there is almost no research on TeX. This paper is an attempt to change that.

To this end, we present Featherweight TeX, a formal model of TeX which we hope can play a similar role for TeX as Featherweight Java did for Java. The main technical problem which we study in terms of Featherweight TeX is the parsing problem. As for other dynamic languages performing syntactic analysis at runtime, the concept of "static" parsing and its correctness is unclear in TeX and shall be clarified in this paper. Moreover, it is the case that parsing TeX is impossible in general, but we present evidence that parsers for practical subsets exists.

We furthermore outline three immediate applications of our formalization of TeX and its parsing: a macro debugger, an analysis that detects syntactic inconsistencies, and a test framework for TeX parsers.