Monday, February 20, 2012

4. Syntactic analysis of Wally


Parts: 1 2 3 4 5 6 7

In this part we'll be creating the parser rules for Wally. If you haven't already got the necessary files, download them here.

A Wally script optionally starts with one ore more use-statements followed by zero or more file_atoms and ending with an optional return statement. A use statement looks like this:

# use object Tuple from the file Tuple.wy
use Tuple from "Tuple.wy"
 
# use everything from the file Math.wy
use * from "Math.wy"

Resulting in the following parser rules:

parse
 : EOL* use_stats? file_atom* return_stat EOF
 ;
 
use_stats
 : use_stat+
 ;
 
use_stat
 : Use (Id | '*') From Str EOL+
 ;

A file_atom is one of three things:

  • an object;
  • a function;
  • or a block_atom (an assignment, function call, if-statement, ...).
file_atom
 : object
 | function
 | block_atom
 ;
 
object
 : Obj Id EOL INDENT object_atom+ DEDENT EOL* 
 ;
 
object_atom
 : vars
 | function
 ;
 
vars
 : Id (',' Id)* EOL
 ;
 
function
 : any_id id_list EOL block
 ;
 
any_id
 : ReservedId
 | In
 | Id
 ;
 
id_list
 : '(' (Id (',' Id)*)? ')'
 ;
 
block
 : INDENT block_atom* return_stat DEDENT
 ;

Since there is a bit of ambiguity w.r.t. an assignment and a function_call (both can start with a lookup), we'll add a syntactic predicate before the first to make sure we only parse an assignment if there really is an assignment ahead.

block_atom
 : (assignment)=> assignment EOL
 | function_call EOL
 | Break EOL
 | for_stat
 | if_stat
 | while_stat
 ;
 
for_stat
 : For Id '=' expr To expr EOL block
 | For Id In expr EOL block
 ;
 
if_stat
 : If expr EOL block elif_stat* else_stat?
 ;
 
elif_stat
 : Elif expr EOL block
 ;
 
else_stat
 : Else EOL block
 ;
 
while_stat
 : While expr EOL block
 ;
 
assignment
 : lookup ( '=' expr
          | '||=' expr
          | '&&=' expr
          | '+=' expr
          | '-=' expr
          | '*=' expr
          | '/=' expr
          | '%=' expr
          | '^=' expr
          )
 ;
 
function_call
 : lookup
 | Print '(' expr ')'
 | Println '(' expr? ')'
 | Assert '(' expr (',' expr)? ')'
 ;
 
return_stat
 : Return expr EOL
 | /* epsilon */
 ;

Below are the various expression rules:

expr
 : or ( '?' expr ':' expr
      | In expr
      )?
 ;
 
or
 : and (Or and)*
 ;
 
and
 : rel (And eq)*
 ;
 
rel
 : eq ((Lt | Gt | LtEq | GtEq) eq)?
 ;
 
eq
 : add ((Eq | NEq) add)*
 ;
 
add
 : mult ((Add | Sub) mult)*
 ;
 
mult
 : unary ((Mult | Div | Mod) unary)*
 ;
 
unary
 : Sub pow
 | Not unary
 | pow
 ;
 
pow
 : atom (Pow atom)*
 ;
 
atom
 : '(' expr ')'
 | New Id expr_params
 | lookup
 | list
 | Param
 | Str
 | Num
 | True
 | False
 | None
 ;
 
lookup
 : Id tail*
 | This tail*
 | any_id expr_params
 ;
 
tail
 : '.' Id
 | '[' expr ']'
 | '.' any_id expr_params
 ;
 
expr_params
 : '(' (expr (',' expr)*)? ')'
 ;
 
list
 : OBrace (expr (',' expr)*)? CBrace
 ;

That's the entire parser grammar. Now paste the following in the Test.wy file:

# import Tuple
use Tuple from "Tuple.wy"
 
# function twice
twice(n)
  temp = n + n
  return temp
 
# a rational object
obj Rational
 
  num, den
 
  Rational(n, d)
    num = n
    den = d
 
  str()
    return num + "/" + den
 
# script starts here
r = new Rational(1, 2)
t = twice(r)
println("t = " + t)

Recreate the JAR file and execute the Test.wy script:

ant jar
java -jar Wally.jar src/scripts/Test.wy

Again, you shouldn't see anything on the console, meaning the tokenization and parsing of the input went okay.

Download the project with the updates from this part here.

The next step is to let out parser construct a proper AST for us to walk (and evaluate) at a later time. Continue reading here.

Wednesday, February 15, 2012

3. Lexical analysis of Wally


Parts: 1 2 3 4 5 6 7

In this part we'll be creating the lexer of Wally. Before writing any code, or grammar, let's set up a small project structure with an Ant build file that will handle the generating of the parser and lexer, the compilation of the source files and creation of an executable JAR file.

Create the following folder structure:

+-- build/
|
+-- lib/
|    |
|    +-- antlr-3.3-complete.jar
|
+-- src/
|    |
|    +-- grammar/
|    |    |
|    |    +-- Wally.g
|    |
|    +-- main/
|    |    |
|    |    +-- wally/
|    |         |
|    |         +-- parser/
|    |         |
|    |         +-- Wally.java
|    |
|    +-- scripts/
|         |
|         +-- Test.wy
|
+-- build.xml

Download version 3.3 of ANTLR here: http://www.antlr.org/download/antlr-3.3-complete.jar

Wally.g

Initialize Wally.g with the following:

grammar Wally;
 
@parser::header {
  package wally.parser;
}
 
@lexer::header {
  package wally.parser;
}
 
parse
 : .* EOF
 ;
 
Any
 : .
 ;

This is just a simple stub, of course: we'll be filling it later on.

build.xml

Copy the following contents inside the build.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<project name="Wally">
 
    <path id="classpath">
        <pathelement location="build/classes"/>
        <fileset dir="lib">
            <include name="*.jar"/>
        </fileset>
    </path>
 
    <target name="clean">
        <delete file="Wally.jar"/>
        <delete dir="build/classes/"/>
        <mkdir dir="build/classes/"/>
    </target>
 
    <target name="compile" depends="clean">
        <javac srcdir="src/main/" destdir="build/classes" includeantruntime="false">
            <classpath refid="classpath"/>
        </javac>
    </target>
 
    <target name="generate" depends="clean">
        <echo>Generating the lexer and parser...</echo>
        <java classname="org.antlr.Tool" fork="true" failonerror="true">
            <arg value="-fo"/>
            <arg value="src/main/wally/parser/"/>
            <arg value="src/grammar/Wally.g"/>
            <classpath refid="classpath"/>
        </java>
        <antcall target="compile"/>
    </target>
 
    <target name="jar" depends="generate, compile">
        <jar destfile="Wally.jar" whenmanifestonly="fail">
            <fileset dir="build/classes">
            </fileset>
            <manifest>
                <attribute name="Class-Path" value="lib/antlr-3.3.jar"/>
                <attribute name="Main-Class" value="wally.Wally"/>
            </manifest>
        </jar>
    </target>
 
</project>

It contains 4 targets:

  • clean - removes all compiled files;
  • compile - compiles all Java source files;
  • generate - generates the lexer and parser from the grammar file (Wally.g);
  • jar - bundles all compiled Java files in a JAR file to be able to run the main class more easily.

Wally.java

And the main Java file, Wally.java, will look like this:

package wally;
 
import org.antlr.runtime.*;
import wally.parser.*;
import java.io.File;
import java.util.Scanner;
 
public class Wally {
 
  public static WallyParser getParser(String fileName) throws Exception {
    WallyLexer lexer = new WallyLexer(new ANTLRStringStream(contentsOf(fileName)));
    return new WallyParser(new CommonTokenStream(lexer));
  }
 
  // Note that the input must have a trailing line break: otherwise the lexer 
  // will not be able to produce DEDENT tokens once we reach the EOF as you 
  // will see later. So don't trim() the String returned by this method!
  private static String contentsOf(String fileName) throws Exception {
    StringBuilder b = new StringBuilder();
    Scanner file = new Scanner(new File(fileName));
    while(file.hasNextLine()) {
      String line = file.nextLine();
      b.append(line).append('\n');
    }
    return b.toString();
  }
 
  public static void main(String[] args) {
 
    if(args.length == 0) {
      System.err.println("usage: java -jar Wally-0.1.jar SCRIPT-FILE [PARAMS]");
      System.exit(1);
    }
 
    try {
      WallyParser parser = getParser(args[0]);
      parser.parse();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Test.wy

The file Test.wy will hold the Wally source. Copy the following into it:

a = 42

Test run

Okay, the project structure is in place, let's execute the jar target with Ant. Open a shell/command, navigate to the project's root directory and do:

ant jar

You should see the following output:

Buildfile: .../build.xml

clean:
   [delete] Deleting: .../Wally.jar
   [delete] Deleting directory .../build/classes
    [mkdir] Created dir: .../build/classes

generate:
     [echo] Generating the lexer and parser...

clean:
   [delete] Deleting directory .../build/classes
    [mkdir] Created dir: .../build/classes

compile:
    [javac] Compiling 3 source files to .../build/classes

compile:

jar:
      [jar] Building jar: .../Wally.jar

BUILD SUCCESSFUL

The jar target did all the dull stuff: it generated a parser and lexer from our grammar and put it in the folder src/main/wally/parser, compiled all source files, and packaged it all in a JAR file whose classpath was properly set, and it also points to the class containing the main method.

Now run the JAR file and provide the Test.wy script as a parameter:

java -jar Wally.jar src/scripts/Test.wy

If all goes well, nothing happens. This means the lexer and parser had no problem tokenizing and parsing the input file. Not surprising of course: our grammar tokenizes every character as an Any token, and the parser rule parse accepts zero or more of such Any tokens. No matter what the input is, it would always pass.

in- and dedent tokens

Let's immediately dive into the deep end of the pool by implementing the rule for in- and dedent tokens. An in- or dedent will always occur right after a line break. But we'll also want to create line break tokens, and since an ANTLR lexer rule can always create a single token, we'll override some methods that will let us emit more than a single token inside a single lexer rule. The following should come directly after the lexer::header section:

@lexer::members {
  private Queue<Token> tokens = new LinkedList<Token>();
  private Stack<Integer> indents = new Stack<Integer>();
  private int openBrace = 0;
 
  @Override
  public void emit(Token t) {
    state.token = t;
    tokens.offer(t);
  }
 
  @Override
  public Token nextToken() {
    super.nextToken();
    return tokens.isEmpty() ? Token.EOF_TOKEN : tokens.poll();
  }
}
 
// more rules...
 
OBrace     : '{' {openBrace++;};
CBrace     : '}' {openBrace--;};
 
EOL : /* todo */ ;

Notice that I also added a stack that keep track of the size of the indentation levels and a counter that will keep track of the number of open list-braces we've encountered: we won't be creating any in-, dedent and line break tokens when openBrace > 0.

Now the EOL (end-of-line) rule:

EOL
 : ('\r'? '\n' | '\r') Spaces? // match a line break followed by zero or more spaces
   {
     // Custom code here to inspect what the `Spaces` fragment matched
     // and produce EOL, INDENT or DEDENT tokens (or nothing at all).
     // Scroll down to see what Java code to put here.
   }
 ;
 
fragment Spaces  // replace tabs with 4 spaces
 : (' ' | '\t')+ {setText(getText().replace("\t", "    "));}
 ;

The custom code follows (I put it outside the ANTLR code for nicer formatting):

// Look at the next character in the char-stream
int next = input.LA(1);
 
if(openBrace > 0 || next == '\r' || next == '\n' || next == '#') {
  // If we're inside a list or on a blank line (a commented line is blank 
  // as well!), ignore all indents, dedents and line breaks
  skip();
}
else {
  // Push an EOL token on the token-queue by invoking our overridden 
  // `emit(...)` method
  emit(new CommonToken(EOL, "EOL"));
 
  // Get the length of the indents
  int indent = $Spaces == null ? 0 : $Spaces.text.length();
 
  // Find out how much indents the previous INDENT token was
  int previous = indents.isEmpty() ? 0 : indents.peek();
 
  if(indent == previous) {
    // Skip indents of the same size as the present indent-size
    skip();
  }
  else if(indent > previous) {
    // We encountered an INDENT, add it in the token-queue and remember the number 
    // of indents by pushing it on the stack
    indents.push(indent);
    emit(new CommonToken(INDENT, "INDENT"));
  }
  else {
    // If we get here, it must be a dedent token. Keep emitting DEDENT tokens
    // until either the queue is empty, or the last value on the stack is more than
    // the current indent-value
    while(!indents.isEmpty() && indents.peek() > indent) {
      emit(new CommonToken(DEDENT, "DEDENT"));
      indents.pop();
    }
  }
}

That should do the trick. Here's the grammar with all lexer rules and a parse rule that will print all the tokens it encounters on the console:

grammar Wally;
 
@parser::header {
  package wally.parser;
}
 
@lexer::header {
  package wally.parser;
 
  import java.util.Queue;
  import java.util.LinkedList;
}
 
@lexer::members {
  private Stack<Integer> indents = new Stack<Integer>();
  private Queue<Token> tokens = new LinkedList<Token>();
  private int openBrace = 0;
 
  @Override
  public void emit(Token t) {
    state.token = t;
    tokens.offer(t);
  }
 
  @Override
  public Token nextToken() {
    super.nextToken();
    return tokens.isEmpty() ? Token.EOF_TOKEN : tokens.poll();
  }
}
 
parse
 : (t=. 
     {System.out.printf("\%-20s \%s\n", 
          tokenNames[$t.type], $t.text.replace("\n", "\\n"));}
   )* EOF
 ;
 
ReservedId
 : 'add'    // + 
 | 'and'    // &&
 | 'div'    // /
 | 'eq'     // ==
 | 'get'    // ID[index]
 | 'gt'     // > 
 | 'gteq'   // >=
 | 'lt'     // <
 | 'lteq'   // <=
 | 'mod'    // %
 | 'mult'   // *
 | 'not'    // !
 | 'or'     // ||
 | 'pow'    // ^
 | 'set'    // ID[index] = expr
 | 'size'   // the size as a number
 | 'str'    // str() is called when concatenating strings
 | 'sub'    // -
 | 'type'   // implemented by default: returns the type as a string
 ;
 
In      : 'in';
New     : 'new';
For     : 'for';
To      : 'to';
Return  : 'return';
None    : 'none';
Obj     : 'obj';
Println : 'println';
Print   : 'print';
True    : 'true';
False   : 'false';
Assert  : 'assert';
If      : 'if';
Elif    : 'elif';
Else    : 'else';
While   : 'while';
Break   : 'break';
This    : 'this';
Use     : 'use';
From    : 'from';
 
AddAssign  : '+=';
SubAssign  : '-=';
MultAssign : '*=';
DivAssign  : '/=';
OrAssign   : '||=';
AndAssign  : '&&=';
PowAssign  : '^=';
ModAssign  : '%=';
Assign     : '=';
Add        : '+';
Sub        : '-';
Mult       : '*';
Div        : '/';
Eq         : '==';
NEq        : '!=';
Or         : '||';
And        : '&&';
Not        : '!';
Mod        : '%';
Pow        : '^';
Lt         : '<';
Gt         : '>';
LtEq       : '<=';
GtEq       : '>=';
OBrace     : '{' {openBrace++;};
CBrace     : '}' {openBrace--;};
 
Num
 : '0'..'9'+ ('.' '0'..'9'*)?
 ;
 
Param
 : '$' '0'..'9'+
 ;
 
Id
 : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
 ;
 
Str
 : '"' ('""' | ~('\r' | '\n' | '"'))* '"'
   {
     String s = getText().replace("\\n", "\n")
                         .replace("\\r", "\r")
                         .replace("\\t", "\t");
     setText(s.substring(1, s.length()-1).replace("\"\"", "\""));
   }
 ;
 
EOL
 : ('\r'? '\n' | '\r') Spaces?
   {
     int next = input.LA(1);
 
     if(openBrace > 0 || next == '\r' || next == '\n' || next == '#') {
       skip();
     }
     else {
       emit(new CommonToken(EOL, "EOL"));
 
       int indent = $Spaces == null ? 0 : $Spaces.text.length();
       int previous = indents.isEmpty() ? 0 : indents.peek();
 
       if(indent == previous) {
         skip();
       }
       else if(indent > previous) {
         indents.push(indent);
         emit(new CommonToken(INDENT, "INDENT"));
       }
       else {
         while(!indents.isEmpty() && indents.peek() > indent) {
           emit(new CommonToken(DEDENT, "DEDENT"));
           indents.pop();
         }
       }
     }
   }
 ;
 
Skip
 : Spaces  {skip();}
 | Comment {skip();}
 ;
 
fragment Spaces
 : (' ' | '\t')+ {setText(getText().replace("\t", "    "));}
 ;
 
fragment Comment
 : '#' ~('\r' | '\n')*
 ;
 
fragment DEDENT : ;
fragment INDENT : ;

To test if the INDENT and DEDENT tokens are properly created, paste the following inside Test.wy:

a
  b
    c
    cc
  bb
    ccc

where we'd expect the following INDENT (>) and DEDENT (<) tokens to be created:

a
> b
  > c
    cc
< bb
  > ccc
< <

Recreate the JAR file and execute the test script:

ant jar
java -jar Wally.jar src/scripts/Test.wy

which would print the following to the console:

Id                   a
EOL                  EOL
INDENT               INDENT
Id                   b
EOL                  EOL
INDENT               INDENT
Id                   c
EOL                  EOL
Id                   cc
EOL                  EOL
DEDENT               DEDENT
Id                   bb
EOL                  EOL
INDENT               INDENT
Id                   ccc
EOL                  EOL
DEDENT               DEDENT
DEDENT               DEDENT

which corresponds to our expectations perfectly!

There are still some lexer rules missing, like parenthesis and square brackets. But since these will be discarded from the AST our parser will be creating, we'll create such tokens on the fly inside parser rules. Which brings us to the next part of this tutorial: the syntactic analysis of Wally.

Download the project created so far here.

Tuesday, February 14, 2012

2. Some Wally samples


Parts: 1 2 3 4 5 6 7

In this part I'll be introducing you to Wally by providing some example code and indicating how it would (or should) evaluate.

white spaces

Wally will look a bit like Python w.r.t. indentation and line-breaks acting like end of statements. For example, take the function twice:

twice(n)
    temp = n + n
    return temp
x = twice(6)
print(x)         # => 12

(note that the # => 12 comment indicates that the value 12 is either returned, or displayed)

The two statements temp = n + n and return temp both belong to the code block of the function twice. Notice that both statements don't have a semi colon indicating their end: a line break indicates the end of a statement.

comments

Wally only knows single line comments which start with a # sign:

# a single line comment
# and another comment

data types

Wally will support the following data types: boolean, number, string, list and user/programmer defined objects. There will also be none which is the null equivalent in Java.

boolean

A boolean is one of:

true
false

number

A number can be an integer, or floating point number:

1234
100.987

string

A string is a double quoted value. If you want to use a literal double quote, it needs to be escaped with another double quote:

"a string"
"A ""B"" C"    # => A "B" C 
"1\t2"         # => 1    2

list

A list is zero or more expressions/values separated by a comma, surrounded by { and }. Note that although Wally will be white-space sensitive, lists will be able to span more than a single line. The following two lists are therefor equivalent:

{1, 2, 3, 4, {5, 55, 555}}
 
{
  1, 2, 3, 4, 
  {
    5, 55, 555
  }
}

objects

A user object starts with the keyword obj followed by an identifier, which will be the name of the object. After that, one or more indented identifiers (instance variables) and/or functions are placed under the object indicating these belong to an instance of such an object. For example, a Rational object might look like this:

obj Rational
 
    num # the numerator
    den # the denominator
 
    # a function with the same name of the 'obj' acts like a constructor
    Rational(n, d)
        num = n
        den = d
 
    denominator()
        return den
 
    # multiply 'this' with 'that'
    mult(that)
        return new Rational(num*that.num, den*that.den)
 
    numerator()
        return num
 
    # returns a string representation of 'this'
    str()
        return num + "/" + den
 
# some statements outside the object are executed as a script
s = new Rational(3, 4)
t = new Rational(5, 6)
 
println("s   = " + s)
println("t   = " + t)
println("s*t = " + (s*t))

Executing the script above would result in the following output:

s   = 3/4
t   = 5/6
s*t = 15/24

operator overloading

As you may have noticed in the Rational object, the expression (s*t) resulted in the mult function to be invoked. Some function names have a special meaning in Wally, like mult and str in the Rational object. Below is a list of all special function names that map to an operator, or a special action:

add(that)           # this + that
and(that)           # this && that
div(that)           # this / that
eq(that)            # this == that
get(index)          # this[index]
gt(that)            # this > that
gteq(that)          # this >= that
in(needle)          # check if needle is present in this
lt(that)            # this < that
lteq(that)          # this <= that
mod(that)           # this % that
mult(that)          # this * that
not()               # !this
or(that)            # this || that
pow(n)              # this ^ n
set(index, expr)    # this[index] = expr
size()              # this' size
str()               # this' string representation
sub(that)           # this - that
More on this later.

expressions

The usual mathematical operators can be used in Wally. Below are some examples in order of their precedence (the operators with the lowest precedence are listed first).

logical or

true || false      # => true
false || false     # => false

logical and

true && false      # => false
false && false     # => false
true && true       # => true

relational

3 >= 4             # => false
99 > 4             # => true
3 <= 3             # => true
3 < 4              # => true

equality

"1" == 1           # => false
"1" != 1           # => true
{1, 2} == {2, 1}   # => false
true != false      # => true

addition

2 + 3              # => 5
"ab" + 2           # => "ab2"
{1, 2} + 3         # => {1, 2, 3}
{1, 2} + {3, 4}    # => {1, 2, {3, 4}}

subtraction

2 - 3              # => -1
{1, 2, 2} - 2      # => {1, 2}
{1, 2, 3} - {3}    # => {1, 2, 3}

multiplication

2 * 3              # => 6
"abc" * 3          # => "abcabcabc"

division

6 / 3              # => 2
1 / 3              # => 0.33333333333

modulus

5 % 3              # => 2
5 % 1.5            # => 0.5

unary minus, not

-5                 # => -5
!true              # => false
!!false            # => false

power

2^10               # => 1024
2^3^4              # => 2417851639229258300000000
(2^3)^4            # => 4096

built-in functions

Wally will have 3 built-in functions: print, println and assert.

print("x")                  # => "x"
println("x")                # => "x\n"
assert(2 == 1+1)            # => passes
assert(2 == 1-1)            # => throws an error
assert(2 == 1-1, "Oops!")   # => throws an error with the message: "Oops!"

assignments

Assignments can be simple assignments like:

a = 5
print(a)    # => 5

But it could also be assignments in (nested) lists:

list = {1, 2, 3, {44, 55}}
list[2] = "c"
println(list)               # => {1, 2, c, {44, 55}}
list[3][0] = "d"
println(list)               # => {1, 2, c, {d, 55}}

or object instances:

obj Box
    value
 
b = new Box()
println(b.value)    # => none
b.value = 42
println(b.value)    # => 42

For all binary operations, +, -, * etc., there will also be the shorthand: +=, -=, *= etc. Where a += 1 is equivalent to a = a + 1.

user defined functions

A user defined function is just an identifier with parenthesis placed after it, with optional parameters inside these parenthesis followed by an indented block of statements.

factorial(n)
    if n < 2
        return n
    else
        return n * factorial(n - 1)
 
print(factorial(5))    # => 120

Note that a call to factorial could also be placed before the function is defined:

print(factorial(5))    # => 120

factorial(n)
    if n < 2
        return n
    else
        return n * factorial(n - 1)

And functions have their own scope: a variable defined in the global scope of a script is not visible inside a function, nor are variables defined inside the function available in the global scope. Below is an example that illustrates this:

total = "global total"
 
f()
    total = "local total"
    return total
 
println(total)    # => global total
println(f())      # => local total
println(total)    # => global total

parameter passing

Like Java, Wally passes its parameters by value. Note that this means its references to objects are copied and passed by value, not the entire object itself! A little demo:

obj Box
 
    value
 
    Box(v)
        value = v
 
# function f
f(b)
    b.value = 1
 
# function g    
g(b)
    b = new Box(100)
 
box = new Box(42)
println(box.value)    # => 42
f(box)
println(box.value)    # => 1
g(box)
println(box.value)    # => 1

In other words, changing the object's internal attributes is possible, but assigning a new instance to b, as done in the function g(...), has no effect on the original object instance box. Again, this works the same as in Java.

control statements

There will be the following control statements: if, for and while. In both the for and while statements, a break, or an early return, can terminate the loop.

if statement

An if statement will look like this:

if a > b
    println("a is more than b")
elif a < b
    println("a is less than b")
else
    println("a and b are equal")

ternary

There will also be the ternary statement:

n = a > b ? "a" : "b"

which would be equivalent to the following if statement:

n = none
if a > b
    n = "a"
else
    n = "b"

for statement

A for statement comes in two flavors:

for n=? to ? ...
for n in ? ...

where the question marks are expressions. Some examples:

for i = 1 to 3
    println(i)
 
for n in {10, 20, 30}
    println(n)
 
for c in "abc"
    if c == "c"
        break
    println(c)

When the script above is executed, the following will be printed to the console:

1
2
3
10
20
30
a
b

The nice thing about the for ... in ... statement is that you can iterate over custom objects, as long as the functions get(...) and size() are implemented. An example:

obj Bag
 
    itemA, itemB, itemC
 
    Bag(a, b, c)
        itemA = a
        itemB = b
        itemC = c
 
    get(index)
        if index == 0
            return itemA
        elif index == 1
            return itemB
        else
            return itemC
 
    size()
        return 3
 
    str()
        return "Bag=[" + itemA + ", " + itemB + ", " + itemC + "]"
 
# the script
bag = new Bag("x", "y", "z")
println(bag)
for item in bag
    println(item)

which would print:

Bag=[x, y, z]
x
y
z

while statement

The while statement will probably not pose any problems:

x = 5
 
while x > 0
    println("x=" + x)
    x -= 1

would print:

x=5
x=4
x=3
x=2
x=1

in operator

Finally, there's the in operator that tests if a certain value is present in a collection, or object.

list = {3, 4, 5}    
string = "abc"
 
println(2 in list)        # => false
println(5 in list)        # => true
println("A" in string)    # => false
println("b" in string)    # => true

And if the in function is implemented in a custom object, the in operator would use that overridden function:

obj Bag
 
    value_a, value_b
 
    Bag(a, b)
        value_a = a
        value_b = b
 
    in(needle)
        return value_a == needle || value_b == needle
 
# test the bag's in(...) function below
bag = new Bag("beer", "coffee")
 
println("Beer" in bag)    # => false
println("beer" in bag)    # => true

wrap up

That pretty much sums up the functionality we'll be putting into Wally. I realize I didn't provide many details about objects and how the operator overloading mechanism works exactly, but I'll get to that in later parts of this blog-series. The first problem we'll be tackling is the lexical analysis of Wally, of which the trickiest part will be how to let ANTLR properly recognize indent and dedent white spaces (and ignore them when inside a list!). Continue reading here.

If you like, you can also play around with the language a bit: I deployed a small interpreter in Appspot here.

1. Programming Wally


Parts: 1 2 3 4 5 6 7

In a previous series of posts, I demonstrated how to create a small, dynamically typed programming language. In this blog we'll be creating a (slightly) more complicated language. Just as the previous language, we'll be using ANTLR to create a parser for our language and a sort of iterator (tree walker) that will walk the AST and evaluate the input.

The previous language was called TL (Tiny Language), which was a rather dull name. The language we'll be creating in the upcoming blog posts will be called Wally, after the (in)famous character from Scott Adams' (IMHO) hilarious Dilbert comic strip.

Wally will have Pythonesque grouping of statements into code blocks by white space indenting. There will be no semi-colons that indicate the end of a statement, some form of operator overloading will be supported and it will also support user defined objects and the possibility to import other (Wally) objects or functions from other files, to name the most radical changes compared to TL.

Note that in this blog series, I will not go too deeply into ANTLR: for a more gentle introduction to ANTLR, please see my previous blog posts.

Go ahead and continue reading part 2 where I'll give a birds-eye view of Wally's syntax.