Angel “Java” Lopez on Blog

November 28, 2014

SimpleScript (3) The Parser, Expressions and Commands

Previous Post

Let’s visit the parser, which is a separate module. It starts with a simple declaration:

'use strict';

var lexer;

if (typeof lexer == 'undefined')
    lexer = require('./lexer');

var parser = (function () {
    var TokenType = lexer.TokenType;
    var binoperators = [ "+", "-", "*", "/", "==", "!=", "<", ">", "<=", ">=" ];

It requires the lexer module and uses its token types. After this declaration come many expressions and commands. This is the expression “class” for a name (e.g., foo):

function NameExpression(name) {
    this.isLeftValue = true;

    this.compile = function () {
        return name;
    };

    this.getName = function () {
        return name;
    }

    this.collectContext = function (context) {
        context.declare(name);
    }
}

In an upcoming post, I will describe how commands and expressions are detected and constructed. An expression should implement two methods: compile, which returns a string with the compiled JavaScript code for the expression, and collectContext, which allows the discovery of the variables used in an expression or command. In the code above, NameExpression declares its name to a context, an object that collects the used variables.
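
To make this contract concrete, here is a minimal sketch of a context object. The post only shows that a context has a declare method; the SimpleContext name, its internal array and the getNames helper are assumptions made for illustration, not the actual SimpleScript code.

```javascript
'use strict';

// Hypothetical context: it remembers each declared name only once.
function SimpleContext() {
    var names = [];

    this.declare = function (name) {
        if (names.indexOf(name) < 0)
            names.push(name);
    };

    // Assumed helper, not part of the original code
    this.getNames = function () {
        return names.slice();
    };
}

// Same shape as the NameExpression shown above
function NameExpression(name) {
    this.isLeftValue = true;
    this.compile = function () { return name; };
    this.getName = function () { return name; };
    this.collectContext = function (context) { context.declare(name); };
}

var context = new SimpleContext();
var foo = new NameExpression('foo');

foo.collectContext(context);
foo.collectContext(context); // declaring twice keeps a single entry

var compiled = foo.compile();      // the compiled JavaScript text
var declared = context.getNames(); // the names collected so far
```

With something like this, a compiler could first collect the names and then emit the needed declarations before the compiled code.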

This is an IndexedExpression, composed of an expression and another one for the index (like foo[42+1]):

function IndexedExpression(expr, indexpr) {
    this.isLeftValue = true;

    this.compile = function () {
        return expr.compile() + '[' + indexpr.compile() + ']';
    };

    this.collectContext = function (context) {
        expr.collectContext(context);
    }
}

The collectContext method visits only the inner expression (I could add a visit to the index expression, too).
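
A possible variant that also visits the index expression could look like the sketch below. It is not the code in the repo; the NameExpression stand-in and the array-based context are simplified assumptions used only to exercise it.

```javascript
'use strict';

// Variant of IndexedExpression that also collects the names used in the index
function IndexedExpression(expr, indexpr) {
    this.isLeftValue = true;

    this.compile = function () {
        return expr.compile() + '[' + indexpr.compile() + ']';
    };

    this.collectContext = function (context) {
        expr.collectContext(context);
        indexpr.collectContext(context); // the extra visit
    };
}

// Minimal stand-in (assumption) for a name expression
function NameExpression(name) {
    this.compile = function () { return name; };
    this.collectContext = function (context) { context.declare(name); };
}

// Hypothetical context that just records the declared names in order
var seen = [];
var context = { declare: function (name) { seen.push(name); } };

var indexed = new IndexedExpression(new NameExpression('foo'), new NameExpression('bar'));
indexed.collectContext(context);

var code = indexed.compile();
```

Here both foo and bar reach the context, so a variable used only as an index would no longer be missed.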

There are commands, like IfCommand:

function IfCommand(cond, thencmd, elsecmd) {
    this.compile = function () {
        var code = 'if (' + cond.compile() + ') { ' + thencmd.compile() + ' }';
        if (elsecmd)
            code += ' else { ' + elsecmd.compile() + ' }';
        return code;
    };

    this.collectContext = function (context) {
        cond.collectContext(context);
        thencmd.collectContext(context);
        if (elsecmd)
            elsecmd.collectContext(context);
    }
}

The distinction between commands and expressions is a formal one. Like an expression, a command should implement compile and collectContext. The code above generates a JavaScript if statement.
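
To see what IfCommand produces, here is a small sketch that feeds it stand-in nodes. The Raw constructor is hypothetical (it just compiles to a fixed string); in the real parser the condition and the branches would be expressions and commands built from the source text.

```javascript
'use strict';

// IfCommand as shown above
function IfCommand(cond, thencmd, elsecmd) {
    this.compile = function () {
        var code = 'if (' + cond.compile() + ') { ' + thencmd.compile() + ' }';
        if (elsecmd)
            code += ' else { ' + elsecmd.compile() + ' }';
        return code;
    };

    this.collectContext = function (context) {
        cond.collectContext(context);
        thencmd.collectContext(context);
        if (elsecmd)
            elsecmd.collectContext(context);
    };
}

// Hypothetical stand-in node: compiles to a fixed piece of JavaScript
function Raw(code) {
    this.compile = function () { return code; };
    this.collectContext = function () { };
}

var withoutElse = new IfCommand(new Raw('a < 1'), new Raw('a = 1;')).compile();
// 'if (a < 1) { a = 1; }'

var withElse = new IfCommand(new Raw('a < 1'), new Raw('a = 1;'), new Raw('a = 2;')).compile();
// 'if (a < 1) { a = 1; } else { a = 2; }'
```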

As usual, I followed a TDD (Test-Driven Development) workflow. A partial example of the tests:

exports['Compile string without quotes inside'] = function (test) {
    test.equal(compileExpression("'foo'", test), "'foo'");
    test.equal(compileExpression('"foo"', test), "'foo'");
}

exports['Compile name'] = function (test) {
    test.equal(compileExpression("foo", test), "foo");
}

exports['Qualified name'] = function (test) {
    test.equal(compileExpression("foo.bar", test), "foo.bar");
}

exports['Indexed term'] = function (test) {
    test.equal(compileExpression("foo[bar]", test), "foo[bar]");
}

Remember: No TDD, no paradise! 😉

Next topics: how to recognize and build expressions and commands.

Stay tuned!

Angel “Java” Lopez

http://www.ajlopez.com

http://twitter.com/ajlopez

November 11, 2014

SimpleScript (2) The Lexer

Previous Post
Next Post

These days, I have been improving SimpleScript, my compiler to JavaScript. Today, I want to comment on the implementation of the lexer. The repo is

http://github.com/ajlopez/SimpleScript

Now, the lexer code resides in a dedicated file, lib/lexer.js, which exposes a module that can be consumed from Node.js and from the browser. It starts by defining the token types:

var lexer = (function () {
    var TokenType = { 
        Name: 1, 
        Integer: 2, 
        Real: 3, 
        String: 4, 
        NewLine: 5, 
        Separator: 6, 
        Assignment: 7 };

Then, it defines some operators, delimiters, and the Token, with two elements, value and type:

var separators = ".,()[]";
var assignments = ["=", "+=", "-=", "*=", "/="];
var operators = ["+", "-", "*", "/", "==", "!=", "<", ">", "<=", ">="];

function Token(value, type) {
    this.value = value;
    this.type = type;
}

The main job is in the Lexer “class”, with the method nextToken:

function Lexer(text) {
    var length = text ? text.length : 0;
    var position = 0;
    var next = [];

    this.nextToken = function () {
        if (next.length > 0)
            return next.pop();

        skipSpaces();

        var ch = nextChar();

        if (ch === null)
            return null;

        if (ch === '"' || ch === "'")
            return nextString(ch);

        if (ch === '\n')
            return new Token(ch, TokenType.NewLine);

        if (ch === '\r') {
            var ch2 = nextChar();

            if (ch2 === '\n')
                return new Token(ch + ch2, TokenType.NewLine);

            if (ch2)
                pushChar(ch2);

            return new Token(ch, TokenType.NewLine);
        }

        if (isAssignment(ch))
            return new Token(ch, TokenType.Assignment);

        if (isOperator(ch))
            return nextOperator(ch);

        if (isSeparator(ch))
            return new Token(ch, TokenType.Separator);

        if (isFirstCharOfName(ch))
            return nextName(ch);

        if (isDigit(ch))
            return nextInteger(ch);
    }
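
The predicates called by nextToken are not shown in this fragment. A plausible sketch, consistent with the separators string above and with the tests below (a name may start with a letter or an underscore), could be the following. The function bodies and the isLetter helper are assumptions, not the code in the repo.

```javascript
'use strict';

var separators = ".,()[]";

// True when ch is one of the separator characters
function isSeparator(ch) {
    return separators.indexOf(ch) >= 0;
}

// True for a single decimal digit
function isDigit(ch) {
    return ch >= '0' && ch <= '9';
}

// Assumed helper: ASCII letters only
function isLetter(ch) {
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

// Names like foo, foo_123 and _foo start with a letter or an underscore
function isFirstCharOfName(ch) {
    return isLetter(ch) || ch === '_';
}
```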

Finally, the module exposes a lexer factory and the enumeration of token types:

return {
    lexer: function (text) { return new Lexer(text); },
    TokenType: TokenType
}

The code was developed using a Test-Driven Development workflow. Here is a fragment of the file test/lexer.js:

function getToken(text, value, type, test) {
    var lexer = sslexer.lexer(text);
    var token = lexer.nextToken();
    test.ok(token);
    test.equal(token.value, value);
    test.equal(token.type, type);
    test.equal(lexer.nextToken(), null);
};

exports['Get names'] = function (test) {
    getToken('foo', 'foo', TokenType.Name, test);
    getToken('foo123', 'foo123', TokenType.Name, test);
    getToken('foo_123', 'foo_123', TokenType.Name, test);
    getToken('_foo', '_foo', TokenType.Name, test);
}

exports['Get integer'] = function (test) {
    getToken('123', '123', TokenType.Integer, test);
    getToken('1234567890', '1234567890', TokenType.Integer, test);
}

Remember: no TDD, no paradise 😉

Next topics: the parser, commands and expressions implementations, compilation to JavaScript.

Stay tuned!

Angel “Java” Lopez

http://www.ajlopez.com

http://twitter.com/ajlopez

January 2, 2013

SimpleScript (1) First ideas

For the last two weeks, I was busy writing CobolScript, my COBOL compiler to JavaScript (see my posts). I have console sample programs and dynamic page samples running on Node.js (see samples). The web samples use the simple Node.js http module, or my new SimpleWeb module, a simple middleware layer à la Connect. I also started to write a Python to JavaScript compiler, see Py2Script. But now, after those training projects (my first ones that compile to JavaScript using JavaScript), I want to push the envelope and write a simple script compiler, which I named SimpleScript (see repo).

The key points:

– It compiles to JavaScript, so it’s JavaScript-oriented. It’s not a scripting language to be implemented on different technologies (.NET, Java, JavaScript). It’s totally oriented to JavaScript semantics.

– I love the C programming language tradition, but this time I want no semicolons and no curly braces. I want a more Python/Ruby-oriented syntax.

– No indentation “hell”. OK, I like Python, but indentation as part of the syntax is not my preferred way.

– No command separator (no semicolon or anything else) except a new line or the syntax itself. That is, I could write

if a < 1 a = 1

or

if a < 1
   a = 1
end

Notice the use of end.

You CANNOT write (semicolon is not a separator):

if a < 1 a=1; b=2

You MUST write:

if a < 1
  a = 1
  b = 2
end

– No parentheses around conditions (see the if example above).

– Only for … in, to be discussed. I want to have for-in as in JavaScript, but with a variant that accesses the values directly instead of the names/indexes of an object or array. Something like

for k in myarray

iterates over myarray indexes.

for k in myarray values

iterates over myarray values, directly. Range expression will be supported:

for k in 0..n

– Loops with continue and break. The main loop construct is while.

– Functions as first class citizens.

The function keyword will be used to define anonymous functions. Maybe (to be discussed) I will use a define keyword to define named functions.

– Function invocation with explicit parentheses (forget the Ruby convention, go for something like Python 3.x).

– Array access with [] (forget the Basic convention of using parentheses).

– External variables. I learnt this from my CobolScript work: the linkage section is useful. An external variable is something injected at runtime (not a global one), and it can be provided in the call to the program. For example, the print function could be an external one, so the calling program could redirect the output at will: to the console, to a buffer or to the web response.

– Global variables should be declared explicitly. Every non-declared variable is considered local (to the function or unit where it is used).

– Functions have closures as in JavaScript. In contrast, AFAIK, Python requires an explicit declaration (global or nonlocal) to assign to outer variables in a function. I like JavaScript’s automatic access to outer variables, so I will keep it.

– Async constructs. I added this to CobolScript and found it useful. It’s something like async/await in C# 5.0.

– Run on the browser and on Node.js.

– Class support: to be discussed (the only key use case I have is the game client coding for my game project).
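
The external variables idea from the list above can be illustrated in plain JavaScript. This is only a sketch of the concept, not the code SimpleScript generates: the program function, the externals argument and its print property are hypothetical names.

```javascript
'use strict';

// Hypothetical shape of a compiled program: externals arrive in the call,
// they are not globals
function program(externals) {
    var print = externals.print;

    print('hello');
    print('world');
}

// The caller decides where print goes; here, to a buffer
var buffer = [];
program({ print: function (text) { buffer.push(text); } });

// To write to the console instead, the caller could pass:
// program({ print: function (text) { console.log(text); } });
```

The same program runs unchanged whether the output goes to a buffer, to the console or to a web response, because the destination is chosen by the caller.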

I know there are other implementations, like CoffeeScript. But I want to train myself in JavaScript, Node.js and TDD.

Keep tuned!

Angel “Java” Lopez
http://www.ajlopez.com
http://twitter.com/ajlopez
