Category Archives: Compilers

SimpleScript (3) The Parser, Expressions and Commands

Let’s look at the parser, which is a separate module. It starts with a simple declaration:

'use strict';

var lexer;

if (typeof lexer == 'undefined')
    lexer = require('./lexer');

var parser = (function () {
    var TokenType = lexer.TokenType;
    var binoperators = [ "+", "-", "*", "/", "==", "!=", "<", ">", "<=", ">=" ];

It requires the lexer module when one is not already defined. After this declaration come many expressions and commands. This is the expression “class” for a name (i.e., foo):

function NameExpression(name) {
    this.isLeftValue = true;

    this.compile = function () {
        return name;
    };

    this.getName = function () {
        return name;
    };

    this.collectContext = function (context) {
        // ...
    };
}

In an upcoming post, I will describe how commands and expressions are detected and built. An expression should implement two methods: compile, which returns a string with the compiled JavaScript code for the expression, and collectContext, which discovers the variables used in an expression or command. In the code above, NameExpression declares its name to a context, an object that collects the used variables.
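
A minimal sketch of this two-method contract follows. The Context object and its declare and getNames methods are hypothetical names chosen for illustration; the real SimpleScript context is not shown in this post and may work differently.

```javascript
// Hypothetical context: records each used variable name once.
// (Assumption: the real SimpleScript context API is not shown in the post.)
function Context() {
    var names = [];

    this.declare = function (name) {
        if (names.indexOf(name) < 0)
            names.push(name);
    };

    this.getNames = function () {
        return names.slice();
    };
}

// A name expression that follows the compile/collectContext contract.
function SketchNameExpression(name) {
    this.compile = function () {
        return name;
    };

    this.collectContext = function (context) {
        context.declare(name);
    };
}

var context = new Context();
var expr = new SketchNameExpression('foo');

// Visiting the same expression twice still records the name only once.
expr.collectContext(context);
expr.collectContext(context);

console.log(expr.compile());       // foo
console.log(context.getNames());   // [ 'foo' ]
```

With a context like this, a compiler could emit the local variable declarations for a function after visiting its body.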

This is an IndexedExpression, composed of an expression and another one for the index (like foo[42+1]):

function IndexedExpression(expr, indexpr) {
    this.isLeftValue = true;

    this.compile = function () {
        return expr.compile() + '[' + indexpr.compile() + ']';
    };

    this.collectContext = function (context) {
        // ...
    };
}

The collectContext method visits the internal expression (I could also visit the index expression).
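
The variant mentioned in parentheses, visiting the index expression too, could look like the sketch below. Leaf and the seen/sketchContext objects are hypothetical stand-ins, not code from the project.

```javascript
// Hypothetical leaf node standing in for NameExpression.
function Leaf(name) {
    this.compile = function () { return name; };
    this.collectContext = function (context) { context.declare(name); };
}

// Sketch of an indexed expression that visits both children.
function SketchIndexedExpression(expr, indexpr) {
    this.compile = function () {
        return expr.compile() + '[' + indexpr.compile() + ']';
    };

    this.collectContext = function (context) {
        expr.collectContext(context);
        indexpr.collectContext(context);   // the additional visit
    };
}

// Hypothetical context: collects names into the `seen` array.
var seen = [];
var sketchContext = {
    declare: function (name) {
        if (seen.indexOf(name) < 0)
            seen.push(name);
    }
};

var indexed = new SketchIndexedExpression(new Leaf('foo'), new Leaf('bar'));
indexed.collectContext(sketchContext);

console.log(indexed.compile());   // foo[bar]
console.log(seen);                // [ 'foo', 'bar' ]
```

Without the second visit, a variable used only inside an index (bar here) would never reach the context.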

There are commands, like IfCommand:

function IfCommand(cond, thencmd, elsecmd) {
    this.compile = function () {
        var code = 'if (' + cond.compile() + ') { ' + thencmd.compile() + ' }';
        if (elsecmd)
            code += ' else { ' + elsecmd.compile() + ' }';
        return code;

    this.collectContext = function (context) {
        if (elsecmd)

The distinction between commands and expressions is a formal one. Again, a command should implement compile and collectContext. The code above generates a JavaScript if statement.
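
To see the generated text, the compile method of IfCommand can be exercised with stub nodes. The raw helper below is hypothetical, and the strings it returns (including the trailing semicolons) are assumptions about what the condition and the inner commands would compile to.

```javascript
// Hypothetical stub: any object with a compile method can stand in
// for a parsed expression or command.
function raw(code) {
    return { compile: function () { return code; } };
}

// The compile method as shown in the post (collectContext left out here).
function IfCommand(cond, thencmd, elsecmd) {
    this.compile = function () {
        var code = 'if (' + cond.compile() + ') { ' + thencmd.compile() + ' }';
        if (elsecmd)
            code += ' else { ' + elsecmd.compile() + ' }';
        return code;
    };
}

var withElse = new IfCommand(raw('a < 1'), raw('a = 1;'), raw('a = 2;'));
var withoutElse = new IfCommand(raw('a < 1'), raw('a = 1;'));

console.log(withElse.compile());      // if (a < 1) { a = 1; } else { a = 2; }
console.log(withoutElse.compile());   // if (a < 1) { a = 1; }
```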

As usual, I followed a TDD (Test-Driven Development) workflow. A partial example of the tests:

exports['Compile string without quotes inside'] = function (test) {
    test.equal(compileExpression("'foo'", test), "'foo'");
    test.equal(compileExpression('"foo"', test), "'foo'");
};

exports['Compile name'] = function (test) {
    test.equal(compileExpression("foo", test), "foo");
};

exports['Qualified name'] = function (test) {
    test.equal(compileExpression("", test), "");
};

exports['Indexed term'] = function (test) {
    test.equal(compileExpression("foo[bar]", test), "foo[bar]");
};

Remember: No TDD, no paradise! 😉

Next topics: how to recognize and build expressions and commands.

Stay tuned!

Angel “Java” Lopez

SimpleScript (2) The Lexer

These days, I have been working on improving my SimpleScript compiler to JavaScript. Today, I want to comment on the implementation of the lexer (see the repo).

The lexer code now resides in a dedicated file, lib/lexer.js, which exposes a module that can be consumed from Node.js and from the browser. It starts by defining the token types:

var lexer = (function () {
    var TokenType = { 
        Name: 1, 
        Integer: 2, 
        Real: 3, 
        String: 4, 
        NewLine: 5, 
        Separator: 6, 
        Assignment: 7 };

Then it defines some operators, separators, and the Token, with two elements: value and type.

var separators = ".,()[]";
var assignments = ["=", "+=", "-=", "*=", "/="];
var operators = ["+", "-", "*", "/", "==", "!=", "<", ">", "<=", ">="];

function Token(value, type) {
    this.value = value;
    this.type = type;
}

The main job is in the Lexer “class”, with the method nextToken:

function Lexer(text) {
    var length = text ? text.length : 0;
    var position = 0;
    var next = [];

    this.nextToken = function () {
        if (next.length > 0)
            return next.pop();


        var ch = nextChar();

        if (ch === null)
            return null;

        if (ch === '"' || ch === "'")
            return nextString(ch);

        if (ch === '\n')
            return new Token(ch, TokenType.NewLine);

        if (ch === '\r') {
            var ch2 = nextChar();

            if (ch2 === '\n')
                return new Token(ch + ch2, TokenType.NewLine);

            if (ch2)

            return new Token(ch, TokenType.NewLine);

        if (isAssignment(ch))
            return new Token(ch, TokenType.Assignment);

        if (isOperator(ch))
            return nextOperator(ch);

        if (isSeparator(ch))
            return new Token(ch, TokenType.Separator);

        if (isFirstCharOfName(ch))
            return nextName(ch);

        if (isDigit(ch))
            return nextInteger(ch);
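
The helpers called from nextToken are not shown in this post. As a rough sketch only, character classification and name scanning could be written as below. The bodies are assumptions guided by the tests further down (names such as foo_123 and _foo), and scanName, isLetter, and isCharOfName are hypothetical helpers, not the module's own functions.

```javascript
// Assumed character classes; the real lexer may differ.
function isDigit(ch) {
    return ch >= '0' && ch <= '9';
}

function isLetter(ch) {
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

function isFirstCharOfName(ch) {
    return isLetter(ch) || ch === '_';
}

function isCharOfName(ch) {
    return isFirstCharOfName(ch) || isDigit(ch);
}

// Hypothetical helper: reads a name starting at `position` and returns
// the name together with the position just after it.
function scanName(text, position) {
    var start = position;

    while (position < text.length && isCharOfName(text[position]))
        position++;

    return { value: text.substring(start, position), position: position };
}

console.log(scanName('foo_123 = 1', 0));   // { value: 'foo_123', position: 7 }
```

Separating the first-character test from the following-character test is what lets foo123 be a name while 123 is read as an integer.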

Finally, the module exposes a lexer factory and the enumeration of token types:

return {
    lexer: function (text) { return new Lexer(text); },
    TokenType: TokenType
};
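
One common way to make such a module usable from both Node.js and the browser is sketched below. This is an assumption about the general pattern, not the exact code in lib/lexer.js, and greeter is a hypothetical module used only for illustration.

```javascript
// The module is built once by an immediately invoked function,
// which keeps its internals private.
var greeter = (function () {
    function greet(name) {
        return 'Hello, ' + name;
    }

    return {
        greet: greet
    };
})();

// In Node.js, `module` exists, so the same object is published there.
// In a browser script tag, `greeter` simply remains a global variable.
if (typeof module !== 'undefined' && module.exports)
    module.exports = greeter;

console.log(greeter.greet('Node'));   // Hello, Node
```

The parser shown in the later post handles the consuming side: it calls require only when no lexer variable is already defined.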

The code was developed using a Test-Driven Development workflow. Here is a fragment of the file test/lexer.js:

function getToken(text, value, type, test) {
    var lexer = sslexer.lexer(text);
    var token = lexer.nextToken();
    test.equal(token.value, value);
    test.equal(token.type, type);
    test.equal(lexer.nextToken(), null);
}

exports['Get names'] = function (test) {
    getToken('foo', 'foo', TokenType.Name, test);
    getToken('foo123', 'foo123', TokenType.Name, test);
    getToken('foo_123', 'foo_123', TokenType.Name, test);
    getToken('_foo', '_foo', TokenType.Name, test);
};

exports['Get integer'] = function (test) {
    getToken('123', '123', TokenType.Integer, test);
    getToken('1234567890', '1234567890', TokenType.Integer, test);
};

Remember: no TDD, no paradise 😉

Next topics: the parser, commands and expressions implementations, compilation to JavaScript.

Stay tuned!

Angel “Java” Lopez

Code Katas in JavaScript/Node.js using TDD

These past weeks, I have been working on JavaScript/Node.js modules, using TDD at each step. Practice, practice, practice: the journey to mastery.

You can follow my progress by reviewing the commits I made at each new test. This is a summary of that work:

CobolScript: An implementation of COBOL as a compiler to JavaScript (see my posts), with console program samples, dynamic web pages, and access to Node.js modules. See the web sample, which uses MySQL and SimpleWeb.

SimplePipes: A way to define message passing using ‘pipes’ to connect different defined nodes/functions. I want to extend it to support distributed processing.

SimpleBoggle: Boggle solver, it is better than me! See console sample.

SimpleMemolap: Multidimensional OLAP-like processing with an in-memory model and a SimpleWeb site (see sample).

SimpleChess: Work in progress. It defines a board using SimpleBoard and makes moves. I’m working on SimpleGo, too, to have a board, game, and evaluators.

SimpleRules: Forward-chaining rule engine. I should add rule compilation to JavaScript. The engine works a la Rete-2, detecting the changes in the current state and triggering actions.

SimpleScript: My simple language, compiled to JavaScript. See posts. WIP.

Py2Script: Python language compiler to JavaScript, first step. WIP.

SimpleWeb: web middleware, a la Connect, with web sample.

BasicScript: My first steps to compile Basic to JavaScript. I want to use it to program and compile a game.

SimplePermissions: Today’s code kata. It implements subjects, roles, and permissions, granted by context.

SimpleFunc: Serialization of functions.

SimpleMapReduce: Exploring the implementation of a Map-Reduce algorithm.

SimpleTuring: Turing machine implementation.

Cellular: Cellular automata implementation, including a Game of Life console sample.

I will work on:

NodeDelicious: To retrieve my links from my Delicious account, now that the site was revamped and has no more pagination.

SimpleDatabase: In-memory database, maybe I will add file persistence.

SimpleSudoku: Rewrite of my AjSudoku solver, from scratch.

I’m having a lot of fun, as usual 😉

Keep tuned!

Angel “Java” Lopez

SimpleScript (1) First ideas

For the last two weeks, I was busy writing CobolScript, my COBOL compiler to JavaScript (see my posts). I have console sample programs and dynamic page samples running on Node.js (see samples). The web samples use the simple http Node.js module, or my new SimpleWeb module, a simple middleware layer a la Connect. I also started to write a Python to JavaScript compiler; see Py2Script. But now, after those training projects (my first ones that compile to JavaScript using JavaScript), I want to push the envelope and write a simple script compiler, which I named SimpleScript (see repo).

The key points:

– It compiles to JavaScript, so it’s JavaScript-oriented. It’s not a script language to be implemented in different technologies (.NET, Java, JavaScript). It’s totally oriented to JavaScript semantics.

– I love the C programming language tradition, but this time I want no semicolons or curly braces. I want a more Python/Ruby-oriented syntax.

– No indentation “hell”. OK, I like Python, but indentation as part of the syntax is not my preferred way.

– No command separator (no semicolon or something else) except new line or syntax. That is, I could write

if a < 1 a = 1

or

if a < 1
   a = 1
end

Notice the use of end.

You CANNOT write (semicolon is not a separator):

if a < 1 a=1; b=2

You MUST write:

if a < 1
  a = 1
  b = 2
end

– No parentheses around conditions (see the above if example).

– Only for … in, to be discussed. I want to have for-in as in JavaScript, but with a variant that accesses the values directly instead of the names/indexes of an object or array. Something like

for k in myarray

iterates over myarray indexes.

for k in myarray values

iterates over myarray values, directly. Range expressions will be supported:

for k in 0..n

– Loops with continue, break. The main loop construction is the while.

– Functions as first class citizens.

The function keyword will be used to define anonymous functions. Maybe (to be discussed) I will use a define keyword to define named functions.

– Function invocation with explicit parentheses (forget the Ruby convention, go for something like Python 3.x).

– Array access with [] (forget the Basic programming convention of using parentheses).

– External variables. I learnt this from my CobolScript work: the linkage section is useful. An external variable is something injected at runtime (not a global one), and it can be provided in the call to the program. For example, the print function could be an external one, so the calling program could redirect the output to the console, to a buffer, or to the web response, at will.

– Global variables should be declared explicitly. Any undeclared variable is considered local (to the function or unit where it is used).

– Functions have closures as in JavaScript. In contrast, AFAIK, Python requires an explicit declaration (global or nonlocal) to assign to outer variables from within a function. I like JavaScript’s automatic access to outer variables, so I will keep it.

– Async constructions. I added this to CobolScript, and I found it useful. It’s something like async/await in C# 5.0.

– Run on the browser and on Node.js.

– Class support: to be discussed (the only key use case I have, is the game client coding for my game project).
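
As a rough illustration of the three for … in forms listed above, this is the kind of JavaScript a compiler could emit for them. The generated shapes are assumptions, the syntax is still marked as to be discussed, and treating the upper bound of 0..n as inclusive is also a guess.

```javascript
var myarray = ['a', 'b', 'c'];
var n = 3;

var indexes = [];
var values = [];
var range = [];

// for k in myarray
// A plain JavaScript for-in: k takes the indexes (as strings).
for (var k in myarray)
    indexes.push(k);

// for k in myarray values
// Assumed translation: iterate the indexes, then read each value.
for (var i in myarray) {
    var v = myarray[i];
    values.push(v);
}

// for k in 0..n
// Assumed translation: a counted loop, upper bound treated as inclusive.
for (var r = 0; r <= n; r++)
    range.push(r);

console.log(indexes);   // [ '0', '1', '2' ]
console.log(values);    // [ 'a', 'b', 'c' ]
console.log(range);     // [ 0, 1, 2, 3 ]
```

The first loop shows why the values variant is attractive: a plain for-in over an array yields string indexes rather than the elements themselves.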

I know there are other implementations, like CoffeeScript. But I want to train myself in JavaScript, Node.js, and TDD.

Keep tuned!

Angel “Java” Lopez

Compilers: Links, News And Resources (4)

Compiling Ruby: From Text to Bytecode

The Ludicrous JIT Compiler
Ludicrous is a just-in-time compiler for Ruby 1.8 and 1.9. Though still in the experimental stage, its performance is roughly on par with YARV


Common Lisp DSL Compiler Framework

What are the available tools to compile .NET projects to standalone native binaries?

takeoutweight / clojure-scheme
Clojure to Scheme to C to the bare metal.

Introducing C# To Go: a C# Compiler for Android

The Julia Language
Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. …
Julia’s LLVM-based just-in-time (JIT) compiler combined with the language’s design allow it to approach and often match the performance of C/C++….

Vala – Compiler for the GObject type system

Quick fun with Mono’s CSharp compiler as a service

dodo / node-dt-compiler
Δt compiler – async & dynamic templating engine compiler

Universal Cobol Compiler


Lispm archaeology: Compiler Protocols

Outlet gets a Personality
If you haven’t been following, Outlet is a project I’ve been working on that compiles a Scheme-inspired language to javascript and other languages.

A guide to the CHICKEN compilation process
This document describes the compilation process used by the CHICKEN Scheme to C compiler by explaining the different compilation stages on a simple example program.

Free APL Interpreters and Compilers

facebook / hiphop-php
Source code transformer from PHP to C++

CHICKEN is a compiler for the Scheme programming language. CHICKEN produces portable, efficient C, supports almost all of the R5RS Scheme language standard, and includes many enhancements and extensions. CHICKEN runs on Linux, MacOS X, Windows, and many Unix flavours.

The Impact of Optional Type Information
on JIT Compilation of Dynamically Typed Languages

PyPy, Tutorial Part 2: Adding a JIT

mherkender / lua.js
An ECMAscript framework to compile and run Lua code, allowing Lua to run in a browser or in Flash

Parsing expression grammar

My Links

Keep tuned!

Angel “Java” Lopez

Compilers: Links, News And Resources (3)

More links about compiler implementations, ideas, interpreters, and programming languages:

Pharo Opal Compiler

Microsoft’s Roslyn: Reinventing the compiler as we know it

Roslyn CTP Now Available

Why Clojure Doesn’t Need Invokedynamic (Unless You Want It to be More Awesome)

C#5 and Meta-Programming

Roslyn: Compiler as a Service, not a black-box anymore!!

Project Roslyn or Compiler-as-a-Service. Bringing flexibility of dynamic languages to C#

Microsoft previews Compiler-as-a-Service software

O# (managed Objective-C) Compiler (requires .NET 4)

GNU Objective-C runtime features

An Introduction to Objective-C for .NET Programmers

Opa – a unified approach to web programming
Opa is a new open source language that lets you write an app and compile it to a JavaScript application on the client, complete with server-side support including a database.

Clojure, JVM 7 support (invokedynamic)

RES – An Open Cobol To Java Translator

Making Ruby Fast: The Rubinius JIT
by @dseminara

Introducing ClojureScript
by @stuartsierra

ClojureScript Rationale
by @stuarthalloway

Compiling Clojure to Javascript pt. 1 of n
by @fogus

Scala comes to .Net
Miguel Garcia, part of the Scala group at EPFL, has been striving to make the Scala productivity available to .Net developers too. In a project funded by Microsoft, he has now reached a major milestone in delivering that capability.

List of languages that compile to JS

Ferret: An Experimental Clojure Compiler

Precompile your MVC Razor views using RazorGenerator

Razor Generator

My Links

Keep tuned!

Angel “Java” Lopez

Compilers: Links, News And Resources (2)

More links about compiler implementations, ideas and technologies:

jitasm is a C++ library for runtime code generation of x86/x64. You can write the code like an inline assembler.

Compiling Scala to LLVM

Groovy Goodness: Add AST Transformations Transparently to Scripts
With Groovy 1.8 we can add compilation customizers when for example we want to run a Groovy script from our application code.

Jabaco is a simple programming language with a Visual Basic like syntax. Jabaco enables you to create powerful software for all Java supported operating systems.

How To Write a Console Application in PowerShell with Add-Type

Type inference
Type inference refers to the automatic deduction of the type of an expression in a programming language. If some, but not all, type annotations are already present it is referred to as type reconstruction.

LLVM Language Reference Manual

Example of how to use ANTLR+StringTemplate with LLVM to build a compiler.

Implementing the virtual method pattern in C#, Part Two

The goal of XMLVM is to offer a flexible and extensible cross-compiler toolchain. Instead of cross-compiling on a source code level, XMLVM cross-compiles byte code instructions from Sun Microsystem’s virtual machine and Microsoft’s Common Language Runtime.

Engineering a Compiler, 2nd Edition

Implementing Smalltalk’s Non-Local Returns in JavaScript

Smalltalk Classes in JavaScript

A Linux Compiler Deathmatch: GCC, LLVM, DragonEgg, Open64, Etc…

Pharen: Lisp -> PHP
Pharen is a compiler that takes a Lisp-like language and turns it into PHP code. This combines Lisp’s advantages of uniform syntax and homoiconicity (among others) and PHP’s advantage of…being everywhere.

clojurejs — a Clojure (subset) to Javascript translator

timcameronryan / mug
A self-hosted JavaScript compiler for the JVM. Written in CoffeeScript

Mono’s C# Compiler as a Service on Windows.

Smalltalk in small talks: The Setup
Smalltalk in Java

Xbase interactive programming environment
One of the exciting new features of the upcoming Xtext 2.0 is the integration of the general purpose expression language Xbase.

Compiler in Clojure
The Clojure compiler is currently a large Java class. Writing it in Clojure will produce many benefits

jarpiain / cljc
A clojure port of the clojure compiler

Inside Razor – Part 3 – Templates

My Links

More links about interpreters, compilers and programming languages are coming.

Keep tuned!

Angel “Java” Lopez

Compilers: Links, News and Resources (1)

I have been a programming language geek since the late seventies. I remember my first contact with COBOL, Fortran, Algol/W, assembly language for mainframes, and BCPL. In recent years, I have been studying new language implementations, in the form of interpreters, virtual machines, and compilers. These are some of the links I have found interesting over the last few years.

It parses .NET assemblies and generates unmanaged C++ code that can be compiled on any standard C++ compiler.

Mono CSharp Compiler
The Mono C# compiler is considered feature complete for C# 1.0, C# 2.0, C# 3.0 and C# 4.0 (ECMA).

Bytecode 2011

Write A Template Compiler For Erlang

alvaroc1 / s2js
Scala to Javascript compiler

Clojure Faster than Machine Code?


On FaceBook’s Thrift semantics, code generation, and OCaml



Introduction to GOLD Parser


Irony – .NET Language Implementation Kit
Irony is a development kit for implementing languages on .NET platform. Unlike most existing yacc/lex-style solutions Irony does not employ any scanner or parser code generation from grammar specifications written in a specialized meta-language. In Irony the target language grammar is coded directly in c# using operator overloading to express grammar constructs.

Interpreter and Compiler

Compiling .NET code on-the-fly

Dynamic Creation of Assemblies/Apps

NScript – A script host for C#/VB.NET/JScript.NET
NScript is a tool similar to WScript except that it allows scripts to be written in .NET languages such as C#, VB.NET and JScript.NET. NScript automatically compiles the code into an assembly in memory and executes the assembly.

CodeDom Assistant
Generating CodeDom Code By Parsing C# or VB

Building .NET Assemblies Dynamically

A Simple Compiler for the Common Language Runtime
An end-to-end example of a bottom up LALR(1) compiler for a fictitious language targeting the Common Language Runtime

Clojure Compilation
Ahead-of-time Compilation and Class Generation

The Lisp Before the End of My Lifetime

Hydrating Objects With Expression Trees – Part I

Mastering Expression Trees With .NET Reflector

Dumping Objects Using Expression Trees

Solving the Expression Problem with OOP

C# compiler as a service

Chicken is an implementation of the Scheme programming language that can compile Scheme programs to C code as well as interpret them.

Steel Bank Common Lisp
Steel Bank Common Lisp (SBCL) is a high performance Common Lisp compiler

Scheme2Js is a Scheme to JavaScript compiler

Comparing SPUR to PyPy

SPUR: A Trace-Based JIT Compiler for CIL

chandlerkent / HKCD
MiniJava compiler for RHIT compilers course (CSSE 404)

Google Closure Compiler Advanced mode

Free Pascal
Free Pascal (aka FPK Pascal) is a 32 and 64 bit professional Pascal compiler.

VMKit: a substrate for virtual machines
The VMKit project is a framework for building virtual machines. It uses LLVM for compiling and optimizing high-level languages to machine code, and MMTk to manage memory.

Hello from a libc-free world! (Part 1)

My Links