The Target Language

The compiler course I took at Johns Hopkins featured a small language called "XL." I'm not sure who came up with this language, but the course was taught by Dr. John Moore, and he presented XL to us. I wanted to at least give him some credit for getting me really interested in compilers and such. If anyone knows a reason why I shouldn't be presenting this language, please let me know and I'll remove this from the net. I'll be presenting the language almost verbatim from Dr. Moore's writeup of the language.

Note that in this tutorial I will not be implementing the entire XL language. I will only be implementing the features I selected when writing the compiler for the class. Eventually I will try to implement all the features in this document, if I have lots of extra time between reading to the kids and changing their diapers. I will try to mention when features are not implemented.

XL is a small but complete programming language based primarily on Ada and, to a lesser extent on C, C++ and Pascal. XL was designed to be suitable for use as a project language in a course on compiler design and construction. Its features illustrate many of the techniques and problems associated with language translation.


General

XL is case sensitive. Upper-case letters and lower-case letters are considered to be distinct in all tokens, including reserved words.

White space (space character, tab character and end-of-line) serves to separate tokens; otherwise, it is ignored. No token can extend past end-of-line. Spaces may not appear inside any token except character and string literals.

A comment begins with two forward slashes and extends to end of line, as in C++.


Identifiers

Identifiers start with a letter and contain letters and digits. An identifier must fit on a single line and its first 100 characters are significant.


Reserved Words

The following keywords are reserved in XL:

and array Boolean begin Char
constant else elsif end exit
function if Integer loop mod
not of or out procedure
program record return String then
type var when while

Literals

An integer literal consists of a sequence of one or more digits

A character literal is a single character enclosed by a pair of apostrophes (sometimes called "single quotes".) Examples include 'A', 'x', and '''. A character literal is distinct from a string literal of length one.

A string literal is a sequence of zero or more printable characters enclosed by a pair of quotation marks ("double quotes.") The double quote character itself can be represented in a string as two adjacent double quotes. In XL, string literals are only used as arguments to the predefined I/O procedure "put" (see below)


Other Tokens (delimiters and operators)

: ; , . ( ) [ ] // one character

+ - * / < = > '

.. := /= >= <= // two characters

and the end-of-file character


Standard (Predefined) Scalar Types

The type Boolean is treated as a predefined enumeration type with elements FALSE and TRUE

The type Integer is a predefined type as in Pascal and FORTRAN (equivalent to type int in C)

The type Char is a predefined type which is equivalent to Pascal's CHAR. The predecessor and successor functions are invoked as attributes Char'pred and Char'succ. Thus, Char'pred('D') is 'C' and Char'succ('A') is 'B'. (At this point, these attributes are not implemented. I may add them later.)


Type Generators

An enumeration type is defined by listing the identifiers which are the actual values of the type. As with type Char, the predecessor and successor functions of an enumeration type T as invoked as attributes T'pred and T'succ. For example:

type CardSuit = <CLUB, DIAMOND, HEART, SPADE>;

           

Then CardSuit'pred(HEART) is DIAMOND and CardSuit'succ(HEART) is SPADE.

An array type (one dimensional only) is defined by giving a range of indices and the component type. Only integer indices are allowed in XL. For example

type Table = array[1..10] of Integer;

           

A record type is defined by listing the individual component names (fields) and their types. For example:

type Date =

    record

        day : Integer;

        month: Integer;

        year: Integer;

    end record;

           

Named Constants

Named constants are introduced by declarations of the form

constant ID, ID, . . ., ID: typeName := expression;

           

typeName must be an identifier representing a scalar type (Integer, Boolean, Char or a user-defined enumeration type.) The phrase ":= expression" is required. For example:

constant maxIndex : Integer := 100;

           

Variables

Variables are introduced by declarations of the form

var ID, ID, ..., ID : typeName := expression;

           

typeName must be an identifier, not a type constructor such as an array or record. The phrase ":= expression" is optional and can only involve literals and named constants. For example:

var I : Integer := 1;

var b1, b2 : Boolean;

           

Operators and Expressions

Operators

The operators, in order of precedence, are

Boolean negation not
Unary adding operators + -
Multiplying operators * / mod
Binary adding operators + -
Relational operators = /= < <= > >=
Logical operators and or

Expressions

For binary operators, both operands must be the same type. Similarly, for assignment compatibility, both the left and right sides must have the same type. Objects are considered to have the same type only if they both have the same type name. Thus, two distinct type definitions are considered different even if they may be structurally identical. This is known as "name equivalence" of types.

Short Circuiting

Logical operators and and or use short-circuit evaluation.

(If you are not familiar with short-circuiting, this means that as soon as the truth value can be determined, evaluation stops. For example, if the first operand of an and evaluates false, the expression will evaluate false no matter what the second operand is, so the second operand is not even evaluated. If the first operand of an or evaluates true, the second isn't evaluated either.)


Statements

All statements are terminated with a semicolon.

Assignment statement

(":=" is the assignment operator). For example

i := 2*I + 5;     

If statement

if x > MAX then             if x < 10 then

    MAX := x;                   x := x + 1;

elsif x < MIN then              y := 2*x;

    MIN := x;               end if;

end if;

           

Loop and exit statements

loop                        while I < n loop

    get(x);                     sum := sum + a[I];

    exit when x = SIGNAL;       i := i + 1;

    process(x);             end loop;

end loop;

           

I/O Statements (for text I/O only)

XL defines only sequential I/O for two basic character streams, standard input and standard output. All I/O is provided by the following procedures:

procedure put(item : String); // for string literals

procedure put(item : Char);

procedure put(item : Integer);

procedure newLine;

procedure get(var item : Char);

procedure get(var item : Integer);

procedure skipLine;

           

Subprograms

Procedures

Procedures are similar to those in Pascal except that an explicit return is allowed. The program is essentially the outermost procedure (with no parameters) and serves as a starting point for the program.

Functions

Functions can return scalar types only, not arrays or records. Only value parameters are allowed. A function returns a value by executing a statement of the form

return expression;

           

Parameters

There are two parameter modes in XL: value parameters and variable parameters. Value parameters (passed by copy) are the default. Variable parameters (passed by reference) must be explicitly declared as in

procedure p(var x : Integer) ...

           

Subprogram end

The name of the procedure must be repeated at the end of its declaration. For example

procedure proc1;

end proc1;

           

Specification Conclusion

OK, so it's sketchy. I know, I know. But that's what I got. Maybe I'll improve it someday, but I think if you're a programmer you'll probably understand it pretty well anyway...

Next we'll discuss our strategy...