Friday, November 27, 2015

Script Language for Memory Constrainded Embedded Systems

A little digression on my quest for the ideal embedded scripting language.

I wanted to add scripting capability into a system I've created as open source project for quite some time:

After some back and forth with languages like eLua the conclusion I've reached is that although powerful the run-time footprint was far away from what I could afford. At the time I couldn't find anything suitable.
Then recently when developing a tool, for the company I'm working right now, I felt the need for some mechanism to flexibly configure the system. My answer too provide this flexibility was again an embedded scripting language. 

I revisited some  options previously considered and also new ones. Frustrated with the outcome I decided to write my own scripting language and run-time environment. I've start studying compilers theory, attended some online classes and after a while I was ready to go.

First thing was to create a list of requirements for the language, which are:
- The language is not meant to be a general purpose one, so we can get away with some complex constructs.
- The main use is to bind existing functions of the system in order to provide dynamic response to events. A sort of user defined behavioural response (the ultimate flexibility for configuration).
- Small runtime footprint.
- Should run portable p-code (bytecodes) in an compact virtual machine. It should allow for the bytecode stream to be generated in a host and transferred to run in the target.
- Support complex integer arithmetic and logic expressions.
- 32 bits integers variables is a requirement.
- Exception handling is desirable as a clean way of dealing with errors.
- Strings are optional specially string manipulation routines. The reason for that is that this requires dynamic memory allocation. 
- For stings it must support constant strings in non volatile memory.
- It's not required to support user defined functions.
- The system can support multiple scripts. Possibly one script for each type of event.
- Support global user defined variables shared among scripts. Meaning that when one script is fired in response to an event it can record some information to be used for another script later one.
- The syntax must be close to a common programming language, like C or Java.
- The compiler must be compact to be embedded in the target system.
- No optimization of the code is necessary.
- The syntax analysis should be relatively strong (difficult to measure). The idea is to do a certain amount of static analysis to reduce the run-time checking as much as possible. Some common constructs can be left out of the language for this reason.
- If possible avoid complex run time memory management support. They are tend to be expensive in terms of memory and/or CPU usage.
- It should be possible to compile chunks of code at time. The compiler have to save it's state to resume it;s operation when more code became available. This is needed in order to compile code embedded as arrays of strings in JSON files without resorting to stitching the strings before compiling the full text.

From this initial requirements some other where added due to the limitation of syntax analysis with very little memory:
- Has to be a one pass compiler (translator) no parse or abstract syntax tree can be generated due to memory limitation.
- It can't be a recursive compiler because it's too heavy on the stack. Also memory limit check for the stack are hard to implement.So a recursive descent parser was out of the equation, although simple to implement.

These requirements narrowed down the solution to a LL(1) grammar and a table based Syntax Directed Translator. The problem now was how to generate the a compact lexical analyzer and the parser. The lexer (scanner) was solved by a handcrafted specialized code, not particularly difficult. 
Next dragon to slay was the parser. I've tried to find some tools to generate the tables for LL(1) grammars but nothing good. Some tools crashed as soon as the grammar grew to be a little more complex. Also the tables generated where very large, unsuitable for what I wanted. The solution was to write my own parser generator for LL(1) grammars. But wait, searching the internet I found a perfect starting point which was a tool developed by Prof. Ivo Mateljan from the University of Split, Croatia. It was a code he wrote for his students in a computer science classes. The program called ELL had almost everything I needed. It cold parse a grammar create the first and follow sets and create the list of predictions for each rule. I asked Prof. Ivo to modify his program, which he promptly and generously did. Then I added a code to generate the tables in C code and an extension to insert semantic actions inside the productions in the grammar. The big trick was a method I devised to binary search for the correct rule in the predictions list instead of a single lookup table. The ELL then generates 4 tables 2 functions and a set of constant definitions to generate the skeleton of a Syntax Directed Translator. It's pretty neat. I hope to create an open source project out of it, it may help other people as well.

The result is a language which I'm calling provisionally "MicroJS" because resembles JavaScript. 
As it stands right now, the minimum compiled code targeting an ARM Cortex-M3 microcontroller is 9064bytes of FLASH (Code) 416 bytes of RAM plus some 128bytes for the stack. This includes a small library with some 9 functions including a "printf", the compiler, the virtual machine. 
A more realistic example is a system that includes a serial driver, a console shell a small basic flash filesystem some basic commands to upload manage files and to upload scripts using the Xmodem protocol. All of that costs 17720 bytes of code and 968 bytes of memory and around 1256 bytes for stack (the problem is the xmodem here that requires a 1k buffer, a better implementation could reuse the microjs space for the xmodem buffer which would reduce the total memory requirement to around 1.5KiB).

In case you may be wandering the type of code I can run. These are 2 examples of the code I used to test the system described above:

Example 1:
// Generate the Fibonacci sequence up to the maximum 32 bits signed integer

var x0 = 0, x1 = 1, x;

try {
    var i = 1;
    while (1) {
        // Check whether the next sum will overflow or not
        if (0x7fffffff - x1 <= x0) {
            throw 1; // overflow
        x = x1 + x0;
        x0 = x1;
        x1 = x;
        printf("%2d | %10u\n", i, x);
        i = i + 1;
} catch (err) {
    printf(" - overflow error!\n");
Produces the output:
[JS]$ js fib.js
Code: 85 bytes.
Data: 12 bytes.
 1 |          1
 2 |          2
 3 |          3
 4 |          5
 5 |          8
 6 |         13
 7 |         21
 8 |         34
 9 |         55
10 |         89
11 |        144
12 |        233
13 |        377
14 |        610
15 |        987
16 |       1597
17 |       2584
18 |       4181
19 |       6765
20 |      10946
21 |      17711
22 |      28657
23 |      46368
24 |      75025
25 |     121393
26 |     196418
27 |     317811
28 |     514229
29 |     832040
30 |    1346269
31 |    2178309
32 |    3524578
33 |    5702887
34 |    9227465
35 |   14930352
36 |   24157817
37 |   39088169
38 |   63245986
39 |  102334155
40 |  165580141
41 |  267914296
42 |  433494437
43 |  701408733
44 | 1134903170
45 | 1836311903
 - overflow error!
Example 2:

// Print a list of 100 random prime numbers
var j, cnt = 0;

srand(time()); // initialize random number generator


for (j = 0; j < 100; ) {
    var n = rand();
    var prime;
    if (n <= 3) {
        prime = n > 1;
    } else {
        if (n % 2 == 0 || n % 3 == 0) {
            prime = false;
        } else {
            var i;
            var m;
            m = sqrt(n) + 1;
            prime = true;
            for (i = 5; (i < m) && (prime); i = i + 6) {
                if (n % i == 0 || n % (i + 2) == 0) {
                    prime = false;
    if (prime) {
        j = j + 1;
        printf("%3d %12d\n", j, n);
    cnt = cnt + 1;


var x = (j * 10000) / cnt;

printf("%d out of %d are prime, %d.%02d %%.\n",
       j, cnt, x / 100, x % 100);

The result form the console (the intermediate values were cut):
[JS]$ js prime.js
Code: 230 bytes.
Data: 12 bytes.
  1   1840531613
  2   1518954509
 98   1946156671
 99    821160383
100    359376917
100 out of 2022 are prime, 4.49 %.

This can give you an idea of the syntax and capabilities of the language.
The compiled code size is reasonable. And the execution speed is considerably good. But this is just the impression I have from the complexity of the prime algorithm It took 28 seconds to factor 2022 32 bit numbers in a 16MHz machine, It won't break any cryptosystem but seems good enough for and embedded scripting.

Some observations about the language:

- There is no increment (++) or decrement (--) operations.
- The only assignment operation allowed is equals (=), contrasting with C alternative assignments like:  +=, *= ...
- The "for", "if" and "while" structures require the statements to be surrounded by braces "{ }", this is to avoid the famous dangling "else" issue, which is hard to treat with LL(1) grammars.
- There is no "switch/case" construct in the language.
- All variables are 32 bits signed integers. Although the language can accept strings and chars and booleans they will be stored ant treated internally as signed integers.
- There are no support (for the moment) of "break" and "continue" declarations.
- There is no "goto" construct.
- No user defined functions. All callable functions are provided by a compile time defined library. This is a problem difficult to decide. Although it's not that complicated to allow functions, it's misuse can lead to problems difficult to treat like recursive calls that may exhaust the stack very quickly. Also static analysis is much simpler with no function calls to deal with. Library calls area easy to handle as they don't use the VM's memory space to run.
- Variadic functions are allowed. Yippee. I can't live without printf()...
- Multiple return of functions arguments are allowed (work in progress). With the not so common construct:
     (x, y) = get_point();
- There is a default catch all exception handler which silently terminates the script. The exception number is returned as a return value of the virtual machine. So it's better not to throw a 0 exception, which will be difficult to catch.
- There are no real differences between logical and bitwise AND and OR operators, which will perform as expected on boolean values anyway. So "&" and "&&" are interchangeable.
- BUG: There is a small limitation (which I plan to fix soon) in the precedence of "*" and "/" operators, they are at the same level and evaluated from right to left (easy to solve).
- TODO: arrays. This is a tricky one for non-typed language (but hey, we are intrinsically typed everything is integral). The problem is two folds. First is how to correctly allocate memory for it. Easy to solve if we force defining the size in the declaration, alternatively a static analysis could do the trick. But how to check for bound in run-time without too much of metadata being managed by the VM? Does someone have the answer for that? Other issue is the utility of arrays if we can't have other types except for integers. Maybe the trade offs do not favor implementing arrays. Extra complexity with no real benefit for the intended use. Other idea is to implement library defined arrays only. At least you could do something like:
    x = sensor[2];
    valve[1] = x * 4;
This can be easily implemented by a syntax action which calls access functions (get()/set()).
- TODO: packaging the byte code for remote target. How to carry the required library information without taking too much space.

Well I think that's enough for now.

Thanks for listening :)...