Modular and Automated Type-Soundness Verification for
Language Extensions (extended abstract)
Sebastian Erdweg, Florian Lorenzen. In Workshop on Scalable Language Specification (SLS), 2013. [ pdf ]
We present a formalism for building language extensions that ensures soundness of extensions with respect to the type system of the base language. To build a type-sound language extension with our formalism, a developer declares an extended syntax, type rules for the extended syntax, and translation rules into the (possibly further extended) base language. Our formalism then modularly and automatically validates that the user-defined type rules are sufficient to guarantee that code generated by the translation rules cannot contain any type errors.
Extensible Languages for Flexible and Principled Domain Abstraction
Sebastian Erdweg. PhD thesis, Philipps-Universität Marburg, 2013. [ pdf ]
Most programming languages are designed for general-purpose software development in a one-size-fits-all fashion: They provide the same set of language features and constructs for all possible applications programmers ever may want to develop. As with shoes, the one-size-fits-all solution grants a good fit to few applications only.
The trend toward domain-specific languages, model-driven development, and language-oriented programming counters general-purpose languages by promoting the use of domain abstractions that facilitate domain-specific language features and constructs tailored to certain application domains. In particular, domain abstraction avoids the need for encoding domain concepts with general-purpose language features and thus allows programmers to program at the same abstraction level as they think.
Unfortunately, current approaches to domain abstraction cannot deliver on the promises of domain abstraction. On the one hand, approaches that target internal domain-specific languages lack flexibility regarding the syntax, static checking, and tool support of domain abstractions, which limits the level of actually achieved domain abstraction. On the other hand, approaches that target external domain-specific languages lack important principles, such as modular reasoning and composition of domain abstractions, which inhibits the applicability of these approaches in the development of larger software systems. In this thesis, we pursue a novel approach that unifies the advantages of internal and external domain-specific languages to support flexible and principled domain abstraction.
We propose library-based extensible programming languages as a basis for domain abstraction. In an extensible language, domain abstraction can be realized by extending the language with domain-specific syntax, static analysis, and tool support. This enables domain abstractions as flexible as external domain-specific languages. To ensure the compliance with important software-development principles, we organize language extensions as libraries and use simple import statements to activate extensions. This facilitates modular reasoning (by inspecting import statements), supports the composition of domain abstractions (by importing multiple extensions), and allows uniform self-application of language extensions in the development of further extensions (by importing extensions in an extension definition). A library-based organization of extensions enables domain abstractions as principled as internal domain-specific languages.
We designed and implemented SugarJ, a library-based extensible programming language on top of Java. SugarJ libraries can declare and export extensions of SugarJ's syntax, static analysis, and editor support. Thereby, a syntactic extension consists of an extended syntax and a desugaring transformation from the extended syntax into SugarJ base syntax, an analysis extension matches on part of the current file's abstract syntax tree and produces a list of errors, and an editor extension declares editor services such as coloring or code completion for certain language constructs. SugarJ extensions are fully self-applicable: An extended syntax can desugar into the declaration of another extensions, an extended analysis can check the declaration of an extension, and an extended editor can assist developers in writing extensions. To process a source file with extensions, the SugarJ compiler and IDE inspect the imported libraries to determine active extensions. The compiler and IDE adapt the parser, code generator, analyzer, and editor of the source file according to the active extensions.
In this thesis, we do not only describe the design and implementation of SugarJ, but also report on extensions of the original design. In particular, we designed and implemented a generalization of the SugarJ compiler that supports alternative base languages besides Java. Using this generalization, we developed the library-based extensible programming languages SugarHaskell, SugarProlog, and SugarFomega. Furthermore, we developed an extension of SugarJ that supports polymorphic domain abstraction and ensures communication integrity. Polymorphic domain abstraction enables programmers to provide multiple desugarings for the same domain-specific syntax. This increases the flexibility of SugarJ and supports scenarios known from model-driven development. Communication integrity specifies that components of a software system may communicate over explicit channels only. This is interesting in the context of code generation where it effectively prohibits the generation of implicit module dependencies. We augmented SugarJ's principles by enforcing communication integrity.
On the basis of SugarJ and numerous case studies, we argue that flexible and principled domain abstraction constitutes a scalable programming model for the development of complex software systems.
Embedding a Questionnaire DSL with SugarJ
Sebastian Erdweg. In Language Workbench Challenge, 2013. [ pdf ]
We describe our SugarJ-based solution to the 2013 Language Workbench Competition. As part of this competition, we developed domain-specific language (DSL) for questionnaires that features conditional questions, locally defined questions, derived values, name checking, type checking, checking for overlapping question instances, and basic tool support such as code coloring. Using SugarJ, we have realized the questionnaire DSL as a language extension of Java that translates a questionnaire into a Java Swing component. We have realized the questionnaire DSL via separate components for the syntax, code generation, tool support, name checking, type checking, and overlap checking. Moreover, we use SugarJ's support for layout-sensitive syntax to use indentation instead of parentheses in the design of the questionnaire DSL.
Toward Variability-Aware Testing
Christian Kästner, Alexander von Rhein, Sebastian Erdweg, Jonas Pusch, Sven Apel, Tillmann Rendel, Klaus Ostermann. In Workshop on Feature-Oriented Software Development (FOSD), pages 1–8. ACM, 2012. [ pdf ]
We investigate how to execute a unit test in all configurations of a product line without generating each product in isolation in a brute-force fashion. Learning from variability-aware analyses, we (a) design and implement a variability-aware interpreter and (b) reencode variability of the product line to simulate the test cases with a model checker. The interpreter internally reasons about variability, executing paths not affected by variability only once for the whole product line. The model checker achieves similar results by reusing powerful off-the-shelf analyses. We experimented with a prototype implementation for each strategy. We compare both strategies and discuss trade-offs and future directions.
Layout-sensitive Generalized Parsing
Sebastian Erdweg, Tillmann Rendel, Christian Kästner and Klaus Ostermann. In Proceedings of Software Language Engineering (SLE), volume 7745 of LNCS, pages 244–263. Springer, 2012. [ pdf ]
The theory of context-free languages is well-understood and context-free parsers can be used as off-the-shelf tools in practice. In particular, to use a context-free parser framework, a user does not need to understand its internals but can specify a language declaratively as a grammar. However, many languages in practice are not context-free. One particularly important class of such languages is layout-sensitive languages, in which the structure of code depends on indentation and whitespace. For example, Python, Haskell, F\#, and Markdown use indentation instead of curly braces to determine the block structure of code. Their parsers (and lexers) are not declaratively specified but hand-tuned to account for layout-sensitivity.
To support declarative specifications of layout-sensitive languages, we propose a parsing framework in which a user can annotate layout in a grammar as constraints on the relative positioning of tokens in the parsed subtrees. For example, a user can declare that a block consists of statements that all start on the same column. We have integrated layout constraints into SDF and implemented a layout-sensitive generalized parser as an extension of generalized LR parsing. We evaluate the correctness and performance of our parser by parsing 33290 open-source Haskell files. Layout-sensitive generalized parsing is easy to use, and its performance overhead compared to layout-insensitive parsing is small enough for most practical applications.
A Variability-Aware Module System
Christian Kästner, Klaus Ostermann, and Sebastian Erdweg. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 773–792. ACM, 2012. [ pdf ]
Module systems enable a divide and conquer strategy to software development. To implement compile-time variability in software product lines, modules can be composed in different combinations. However, this way variability dictates a dominant decomposition. Instead, we introduce a variability-aware module system that supports compile-time variability inside a module and its interface. This way, each module can be considered a product line that can be type checked in isolation. Variability can crosscut multiple modules. The module system breaks with the antimodular tradition of a global variability model in product-line development and provides a path toward software ecosystems and product lines of product lines developed in an open fashion. We discuss the design and implementation of such a module system on a core calculus and provide an implementation for C, which we use to type check the open source product line Busybox with 811 compile-time options.
Layout-sensitive Language Extensibility with SugarHaskell
Sebastian Erdweg, Felix Rieger, Tillmann Rendel, and Klaus Ostermann. In Proceedings of Haskell Symposium, pages 149–160. ACM, 2012. [ bib | pdf ]
Programmers need convenient syntax to write elegant and concise programs. Consequently, the Haskell standard provides syntactic sugar for some scenarios (e.g., do notation for monadic code), authors of Haskell compilers provide syntactic sugar for more scenarios (e.g., arrow notation in GHC), and some Haskell programmers implement preprocessors for their individual needs (e.g., idiom brackets in SHE). But manually written preprocessors cannot scale: They are expensive, error-prone, and not composable. Most researchers and programmers therefore refrain from using the syntactic notations they need in actual Haskell programs, but only use them in documentation or papers. We present a syntactically extensible version of Haskell, SugarHaskell, that empowers ordinary programmers to implement and use custom syntactic sugar.
Building on our previous work on syntactic extensibility for Java, SugarHaskell integrates syntactic extensions as sugar libraries into Haskell's module system. Syntax extensions in SugarHaskell can declare arbitrary context-free and layout-sensitive syntax. SugarHaskell modules are compiled into Haskell modules and further processed by a Haskell compiler. We provide an Eclipse-based IDE for SugarHaskell that is extensible, too, and automatically provides syntax coloring for all syntax extensions imported into a module.
We have validated SugarHaskell with several case studies, including arrow notation (as implemented in GHC) and EBNF as a concise syntax for the declaration of algebraic data types with associated concrete syntax. EBNF declarations also show how to extend the extension mechanism itself: They introduce syntactic sugar for using the declared concrete syntax in other SugarHaskell modules.
Language Composition Untangled
Sebastian Erdweg, Paolo G. Giarrusso, and Tillmann Rendel. In Proceedings of Workshop on Language Descriptions, Tools and Applications (LDTA), pages 7:1–7:8. ACM, 2011. [ bib | pdf ]
In language-oriented programming and modeling, software developers are largely concerned with the definition of domain-specific languages (DSLs) and their composition. While various implementation techniques and frameworks exist for defining DSLs, language composition has not obtained enough attention and is not well-enough understood. In particular, there is a lack of precise terminology for describing observations about language composition in theory and in existing language-development systems. To clarify the issue, we specify five forms of language composition: language extension, language restriction, language unification, self-extension, and extension composition. We illustrate this classification by various examples and apply it to discuss the performance of different language-development system with respect to language composition. We hope that the terminology provided by our classification will enable more precise communication on language composition.
Library-based Model-driven Software Development with SugarJ (Demonstration)
Sebastian Erdweg and Lennart C. L. Kats and Tillmann Rendel and Christian Kästner and Klaus Ostermann and Eelco Visser. In Companion to the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 17–18. ACM, 2011. [ bib ]
SugarJ: Library-based Language Extensibility (Poster)
Sebastian Erdweg and Lennart C. L. Kats and Tillmann Rendel and Christian Kästner and Klaus Ostermann and Eelco Visser. In Companion to the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 187–188. ACM, 2011. [ bib ]
SugarJ is a Java-based programming language that provides extensible surface syntax, static analyses, and IDE support. SugarJ extensions are organized as libraries; conventional import statements suffice to activate and compose language extensions. We demonstrate how programmers can use SugarJ to modularly extend Java's syntax, semantic analyses and IDE support.
Growing a Language Environment with Editor Libraries
Sebastian Erdweg and Lennart C. L. Kats and Tillmann Rendel and Christian Kästner and Klaus Ostermann and Eelco Visser. In Proceedings of Conference on Generative Programming and Component Engineering (GPCE), pages 167–176. ACM, 2011. [ bib | pdf ]
Large software projects consist of code written in a multitude of different (possibly domain-specific) languages, which are often deeply interspersed even in single files. While many proposals exist on how to integrate languages semantically and syntactically, the question of how to support this scenario in integrated development environments (IDEs) remains open: How can standard IDE services, such as syntax highlighting, outlining, or reference resolving, be provided in an extensible and compositional way, such that an open mix of languages is supported in a single file?
Based on our library-based syntactic extension language for Java, SugarJ, we propose to make IDEs extensible by organizing editor services in editor libraries. Editor libraries are libraries written in the object language, SugarJ, and hence activated and composed through regular import statements on a file-by-file basis. We have implemented an IDE for editor libraries on top of SugarJ and the Eclipse-based Spoofax language workbench. We have validated editor libraries by evolving this IDE into a fully-fledged and schema-aware XML editor as well as an extensible LaTeX editor, which we used for writing this paper.
SugarJ: Library-based Syntactic Language Extensibility
Sebastian Erdweg, Tillmann Rendel, Christian Kästner and Klaus Ostermann. In Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 391–406. ACM, 2011. Distinguished Paper Award. [ bib | pdf ]
Existing approaches to extend a programming language with syntactic sugar often leave a bitter taste, because they cannot be used with the same ease as the main extension mechanism of the programming language—libraries. Sugar libraries are a novel approach for syntactically extending a programming language within the language. A sugar library is like an ordinary library, but can, in addition, export syntactic sugar for using the library. Sugar libraries maintain the composability and scoping properties of ordinary libraries and are hence particularly well-suited for embedding a multitude of domain-specific languages into a host language. They also inherit the self-applicability of libraries, which means that the syntax extension mechanism can be applied in the definition of sugar libraries themselves.
To demonstrate the expressiveness and applicability of sugar libraries, we have developed SugarJ, a language on top of Java, SDF and Stratego that supports syntactic extensibility. SugarJ employs a novel incremental parsing technique that allows changing the syntax within a source file. We demonstrate SugarJ by five language extensions, including embeddings of XML and closures in Java, all available as sugar libraries. We illustrate the utility of self-applicability by embedding XML Schema, a metalanguage to define XML languages.
Variability-Aware Parsing in the Presence of Lexical Macros and
Christian Kästner, Paolo G. Giarrusso, Tillmann Rendel, Sebastian Erdweg, Klaus Ostermann and Thorsten Berger. In Proceedings of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 805–824. ACM, 2011. [ bib | pdf ]
In many projects, lexical preprocessors are used to manage different variants of the project (using conditional compilation) and to define compile-time code transformations (using macros). Unfortunately, while being a simply way to implement variability, conditional compilation and lexical macros hinder automatic analysis, even though such analysis would be urgently needed to combat variability-induced complexity. To analyze code with its variability, we need to parse it without preprocessing it. However, current parsing solutions use heuristics, support only a subset of the language, or suffer from exponential explosion. As part of the TypeChef project, we contribute a novel variability-aware parser that can parse unpreprocessed code without heuristics in practicable time. Beyond the obvious task of detecting syntax errors, our parser paves the road for further analysis, such as variability-aware type checking. We implement variability-aware parsers for Java and GNU C and demonstrate practicability by parsing the product line MobileMedia and the entire X86 architecture of the Linux kernel with 6065 variable features.
Library-based Language Extensibility
Sebastian Erdweg. In Participant's Workshop of Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE). 2011. Best Presentation Award (sponsored by Google). [ pdf ]
Abstract Features in Feature Modeling
Thomas Thüm, Christian Kästner, Sebastian Erdweg, and Norbert Siegmund. In Proceedings of Software Product Line Conference (SPLC), pages 191–200. IEEE Computer Society, 2011. [ bib | pdf ]
A software product line is a set of program variants, typically generated from a common code base. Feature models describe variability in product lines by documenting features and their valid combinations. In product-line engineering, we need to reason about variability and program variants for many different tasks. For example, given a feature model, we might want to determine the number of all valid feature combinations or detect specific feature combinations for testing. However, we found that contemporary reasoning approaches can only reason about feature combinations, not about program variants, because they do not take abstract features into account. Abstract features are features used to structure a feature model that, however, do not have any impact at implementation level. Using existing feature-model reasoning mechanisms for product variants leads to incorrect results. We raise awareness of the problem of abstract features for different kinds of analyses on feature models. We argue that, in order to reason about program variants, abstract features should be made explicit in feature models. We present a technique based on propositional formulas to reason about program variants. In practice, our technique can save effort that is caused by considering the same program variant multiple times, for example, in product-line testing.
Featherweight TeX and Parser Correctness
Sebastian Thore Erdweg and Klaus Ostermann. In Proceedings of Software Language Engineering (SLE), volume 6563 of LNCS, pages 397–416. Springer, 2010. [ bib | pdf ]
TeX (and its LaTeX incarnation) is a widely used document preparation system for technical and scientific documents. At the same time, TeX is also an unusual programming language with a quite powerful macro system. Despite the wide range of TeX users (especially in the scientific community), and despite a widely perceived considerable level of "pain" in using TeX, there is almost no research on TeX. This paper is an attempt to change that.
To this end, we present Featherweight TeX, a formal model of TeX which we hope can play a similar role for TeX as Featherweight Java did for Java. The main technical problem which we study in terms of Featherweight TeX is the parsing problem. As for other dynamic languages performing syntactic analysis at runtime, the concept of "static" parsing and its correctness is unclear in TeX and shall be clarified in this paper. Moreover, it is the case that parsing TeX is impossible in general, but we present evidence that parsers for practical subsets exists.
We furthermore outline three immediate applications of our formalization of TeX and its parsing: a macro debugger, an analysis that detects syntactic inconsistencies, and a test framework for TeX parsers.