Category Archives: GrammGen

GrammGen In C# (2) First Rules

Previous Post

Let’s see today how a parser is built with GrammGen using code. The idea is to define using GrammGen how to build a tree with expressions detected in the text under process. The text is written in the programming language that we want to implement. The expression tree is created using the GrammGen definition, given by code.

I presented in the previous post for an example of a console program, which employs GrammGen. A parser is defined using rules. The rules indicate:

– How to process text characters
– How then form text elements, such as numbers, names, operators
– How to go from putting together such elements, expressions, commands that define our language to implement.

In the sample of calculator

https://github.com/ajlopez/GrammGen/tree/master/Samples/Calculator

the rules are defined by the parser:

https://github.com/ajlopez/GrammGen/blob/master/Samples/Calculator/Calculator/ExpressionParser.cs

The first rules:

private static Rule[] rules = new Rule[]
{
   Rule.Or(' ', '\t', '\r', '\n').Skip(),
   Rule.Or('+', '-').Generate("Oper0"),
   Rule.Or('*', '/').Generate("Oper1"),
// ...


The idea is to get an Expression. The first rule defines that anyone (Rule.Or) of space, tab, carriage return, new line items are ignored (. Skip ()). You can put more than one rule of this type, for example, to set comments to be ignored from the input text.

The second rule defines both + (the plus sign) and – (minus sign) produce a called terminal Oper0. They differ from * (the token) and / (divide sign) because we want to take precedence among operators.

There is a rule to construct integers:

 

Rule.Get("0-9").OneOrMore().Generate("Integer", MakeIntegerConstantExpression), 

The Rule.Get allows you to specify a range of characters, and then, with the fluent interface defined, you can attach the .OneOrMore () method, which specifies that the rule can be applied once or several times. When you can not apply more, it ends generating a node called "Integer". But in GrammGen a node, in addition to a name, may have an associated object. Al. A second parameter can be passed to Generate method, with a function that creates the associated object.

MakeIntegerConstantExpression job is:

– Get the object that was formed with the rule, in this case, a concatenated string with all digits were found.

– Transform it to integer

– Create a ConstantExpression with the intever as constant value.

The class ConstantExpression is not from GrammGen. It is defined by us. We have the freedom to define the objects we want to attach to each node of the GrammGen generated tree.

The code of MakeIntegerConstantExpression is:


   private static object MakeIntegerConstantExpression(object obj)
   {
       int value = int.Parse((string)obj);

       return new ConstantExpression(value);
   }

In the next posts we will see how the nodes of binary operations are formed, how operator precedence is implemented, how to manage left recursion, and evaluation of the generated expressions.

Stay tuned!

Angel “Java” Lopez

http://www.ajlopez.com

http://twitter.com/ajlopez

GrammGen in C# (1) First Concepts

Next Post

I worked a lot writing lexers and parsers, see:

https://github.com/ajlopez/AjSharp
https://github.com/ajlopez/AjTalk
https://github.com/ajlopez/AjTalkJs
https://github.com/ajlopez/Mass
https://github.com/ajlopez/AjLispJava

and more. Two years ago, I met Ian Piumarta en la Smalltalks 2011, at Universidad de Quilmes (see AjSoda). He suggested I could work  on an implementation of a PEG:

http://en.wikipedia.org/wiki/Parsing_expression_grammar

Past year I tried to write something in JavaScript, but this year I put my effort on writing two implementations. I’m not sure they are PEGs, but they are close. Inm JavaScript, I have:

https://github.com/ajlopez/SimpleGrammar

I’m doing “dog fooding” using it at:

https://github.com/ajlopez/PageJs

with good results. But today, I would like to present my C# implementation:

https://github.com/ajlopez/GrammGen

It’s a solution built using TDD workflow. It has a class library and tests:

The idea is to have a parser with a list of rules. Each rule recognize a series of characters and patterns:

var rule = Rule.Get("0-9").OneOrMore().Generate("Integer");

The above rule recognize a series of digits, producing a non-terminal element with name “Integer” (the non-terminal are named with first letter in upper case). There is a fluent interface, and the .Generate method groups the result  (a character string) under a named element “Integer”.

The .Generate method could associate a custom object to “Integer” element:

Rule.Get("0-9").OneOrMore().Generate("Integer", 
    x => int.Parse((string)x));

In this case, it associates the conversion of the collected string to native integer.

You can write a list of rules:

var rules = new Rule[] {
    // ...
    Rule.Get("Term", Rule.Or('*', '/'), "Factor").Generate("Term", MakeBinaryOperatorExpresion),
    Rule.Get("Factor").Generate("Term"),
    Rule.Get("Integer").Generate("Factor"),
    // ...
}

and then, with you can create a Parser object:

var parser = new Parser("1+2", rules);

Then, you can get a named element:

var element = parser.Parse("Expression");
var expression = (IExpression)element.Value;

Our code should generate an element of IExpression. That is our interface: it’s not part of GrammarGen. Each element returned by the library can have an associated custom element, attached in .Generate methods. See the test code, and the calculator sample (ie., it has left recusrion).

Now I have a console sample:

It parse and evaluate an string with an arithmetic expression:

I could use GrammGen for more ambitious projects, ie. building an AST (Abstract Syntax Tree) for a programming language.

Keek tuned!

Angel “Java” Lopez

http://www.ajlopez.com

http://twitter.com/ajlopez