Friday, November 27, 2015

Script Language for Memory Constrained Embedded Systems

A little digression on my quest for the ideal embedded scripting language.

For quite some time I've wanted to add scripting capability to a system I created as an open source project: https://code.google.com/p/yard-ice/

After some back and forth with languages like eLua, the conclusion I reached is that, although powerful, their run-time footprint was far beyond what I could afford. At the time I couldn't find anything suitable.
Then recently, when developing a tool for the company I'm working for right now, I felt the need for some mechanism to flexibly configure the system. My answer to provide this flexibility was again an embedded scripting language.

I revisited some options previously considered, along with new ones. Frustrated with the outcome, I decided to write my own scripting language and run-time environment. I started studying compiler theory, attended some online classes, and after a while I was ready to go.

The first thing was to create a list of requirements for the language:
- The language is not meant to be a general purpose one, so we can do without some complex constructs.
- The main use is to bind existing functions of the system in order to provide dynamic responses to events: a sort of user-defined behavioural response (the ultimate flexibility for configuration).
- Small runtime footprint.
- It should run portable p-code (bytecode) in a compact virtual machine, and it should allow the bytecode stream to be generated on a host and transferred to run on the target.
- Support complex integer arithmetic and logic expressions.
- 32-bit integer variables are a requirement.
- Exception handling is desirable as a clean way of dealing with errors.
- Strings are optional, especially string manipulation routines, because these require dynamic memory allocation.
- For strings, it must support constant strings in non-volatile memory.
- It's not required to support user defined functions.
- The system can support multiple scripts. Possibly one script for each type of event.
- Support global user-defined variables shared among scripts, meaning that when one script is fired in response to an event it can record some information to be used by another script later on.
- The syntax must be close to a common programming language, like C or Java.
- The compiler must be compact to be embedded in the target system.
- No optimization of the code is necessary.
- The syntax analysis should be relatively strong (difficult to measure). The idea is to do a certain amount of static analysis to reduce the run-time checking as much as possible. Some common constructs can be left out of the language for this reason.
- If possible, avoid complex run-time memory management support, which tends to be expensive in terms of memory and/or CPU usage.
- It should be possible to compile chunks of code at a time. The compiler has to save its state and resume operation when more code becomes available. This is needed in order to compile code embedded as arrays of strings in JSON files without stitching the strings together before compiling the full text.

From these initial requirements some others were added due to the limitations of syntax analysis with very little memory:
- It has to be a one-pass compiler (translator): no parse tree or abstract syntax tree can be generated, due to memory limitations.
- It can't be a recursive compiler, because recursion is too heavy on the stack, and memory limit checks for the stack are hard to implement. So a recursive descent parser was out of the equation, although it would be simple to implement.

These requirements narrowed the solution down to an LL(1) grammar and a table-based Syntax Directed Translator. The problem then was how to generate a compact lexical analyzer and parser. The lexer (scanner) was solved with handcrafted, specialized code, which was not particularly difficult.
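For illustration, a handcrafted scanner of this kind is little more than a loop switching on the leading character. The sketch below is illustrative only (hypothetical token names and input primitives), not the actual MicroJS code:

#include <ctype.h>
#include <stdio.h>

enum { TOK_ERR, TOK_EOF, TOK_INT, TOK_ID, TOK_PLUS, TOK_ASSIGN };

extern int next_char(void);     /* returns EOF at end of input */
extern void unget_char(int c);
extern int lex_ident(int c);    /* accumulate name, check keywords */
int lex_val;                    /* value of the last TOK_INT */

int lex_next(void)
{
    int c = next_char();

    while (c == ' ' || c == '\t' || c == '\r' || c == '\n')
        c = next_char();        /* skip blanks */

    if (isdigit(c)) {           /* integer literal */
        lex_val = 0;
        do {
            lex_val = lex_val * 10 + (c - '0');
            c = next_char();
        } while (isdigit(c));
        unget_char(c);
        return TOK_INT;
    }
    if (isalpha(c) || c == '_') /* identifier or keyword */
        return lex_ident(c);

    switch (c) {                /* one case per punctuator */
    case '+': return TOK_PLUS;
    case '=': return TOK_ASSIGN;
    case EOF: return TOK_EOF;
    }
    return TOK_ERR;
}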
The next dragon to slay was the parser. I tried to find tools to generate the tables for LL(1) grammars, but found nothing good. Some tools crashed as soon as the grammar grew a little more complex, and the tables generated were very large, unsuitable for what I wanted. The solution was to write my own parser generator for LL(1) grammars. But wait: searching the internet I found a perfect starting point, a tool developed by Prof. Ivo Mateljan from the University of Split, Croatia. It was code he wrote for his students in his computer science classes. The program, called ELL, had almost everything I needed: it could parse a grammar, compute the FIRST and FOLLOW sets, and create the list of predictions for each rule. I asked Prof. Ivo to modify his program, which he promptly and generously did. Then I added code to generate the tables in C and an extension to insert semantic actions inside the productions in the grammar. The big trick was a method I devised to binary search for the correct rule in the predictions list instead of using a single lookup table. ELL then generates four tables, two functions, and a set of constant definitions to form the skeleton of a Syntax Directed Translator. It's pretty neat. I hope to create an open source project out of it; it may help other people as well.
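To give an idea of what the generated skeleton drives, below is a sketch of a table-driven LL(1) parse loop with a binary search over a sorted prediction list. The names and table layout are illustrative only, not the actual ELL output, and the semantic actions are omitted:

#include <stdbool.h>
#include <stdint.h>

/* hypothetical generated tables: predictions sorted by a key
   packing the nonterminal and the lookahead token */
struct predict {
    uint16_t key;              /* (nonterm << 8) | token */
    uint8_t rule;              /* production to apply */
};
extern const struct predict predict_tab[];
extern const unsigned predict_cnt;
extern const uint8_t * rule_rhs(unsigned rule, unsigned * len);
extern bool is_terminal(uint8_t sym);
extern int lex_next(void);
enum { SYM_EOF = 0, SYM_PROGRAM = 128 };

static int predict_lookup(uint8_t nt, uint8_t tok)
{
    uint16_t key = (nt << 8) | tok;
    unsigned lo = 0;
    unsigned hi = predict_cnt;

    while (lo < hi) {          /* binary search for the rule */
        unsigned mid = (lo + hi) / 2;
        if (predict_tab[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if ((lo < predict_cnt) && (predict_tab[lo].key == key))
        return predict_tab[lo].rule;
    return -1;                 /* no prediction: syntax error */
}

bool ll1_parse(void)
{
    uint8_t stack[64];         /* fixed symbol stack, easy to bound */
    unsigned sp = 0;
    int tok = lex_next();

    stack[sp++] = SYM_EOF;
    stack[sp++] = SYM_PROGRAM; /* start symbol */

    while (sp > 0) {
        uint8_t sym = stack[--sp];

        if (is_terminal(sym)) {    /* match and advance */
            if (sym != tok)
                return false;
            tok = lex_next();
        } else {                   /* expand the predicted rule */
            int rule = predict_lookup(sym, tok);
            unsigned n;
            const uint8_t * rhs;

            if (rule < 0)
                return false;
            rhs = rule_rhs(rule, &n);
            if (sp + n > sizeof(stack))
                return false;      /* explicit stack limit check */
            while (n > 0)          /* push the RHS in reverse */
                stack[sp++] = rhs[--n];
        }
    }
    return true;
}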

The result is a language which I'm provisionally calling "MicroJS" because it resembles JavaScript.
As it stands right now, the minimum compiled code targeting an ARM Cortex-M3 microcontroller is 9064 bytes of FLASH (code) and 416 bytes of RAM, plus some 128 bytes for the stack. This includes a small library with some 9 functions (including a "printf"), the compiler, and the virtual machine.
A more realistic example is a system that includes a serial driver, a console shell, a small basic flash filesystem, some basic commands to manage files, and the ability to upload scripts using the Xmodem protocol. All of that costs 17720 bytes of code, 968 bytes of memory, and around 1256 bytes for the stack (the problem here is Xmodem, which requires a 1 KiB buffer; a better implementation could reuse the MicroJS space for the Xmodem buffer, which would reduce the total memory requirement to around 1.5 KiB).

In case you are wondering what kind of code it can run, here are two examples of the code used to test the system described above:

Example 1:
//
// Generate the Fibonacci sequence up to the maximum 32-bit signed integer
//

var x0 = 0, x1 = 1, x;

try {
    var i = 1;
    while (1) {
        // Check whether the next sum will overflow or not
        if (0x7fffffff - x1 <= x0) {
            throw 1; // overflow
        }
        x = x1 + x0;
        x0 = x1;
        x1 = x;
        printf("%2d | %10u\n", i, x);
        i = i + 1;
    }
} catch (err) {
    printf(" - overflow error!\n");
}
Produces the output:
[JS]$ js fib.js
"fib.js"
Code: 85 bytes.
Data: 12 bytes.
 1 |          1
 2 |          2
 3 |          3
 4 |          5
 5 |          8
 6 |         13
 7 |         21
 8 |         34
 9 |         55
10 |         89
11 |        144
12 |        233
13 |        377
14 |        610
15 |        987
16 |       1597
17 |       2584
18 |       4181
19 |       6765
20 |      10946
21 |      17711
22 |      28657
23 |      46368
24 |      75025
25 |     121393
26 |     196418
27 |     317811
28 |     514229
29 |     832040
30 |    1346269
31 |    2178309
32 |    3524578
33 |    5702887
34 |    9227465
35 |   14930352
36 |   24157817
37 |   39088169
38 |   63245986
39 |  102334155
40 |  165580141
41 |  267914296
42 |  433494437
43 |  701408733
44 | 1134903170
45 | 1836311903
 - overflow error!
[JS]$
Example 2:

// Print a list of 100 random prime numbers
//
var j, cnt = 0;

srand(time()); // initialize random number generator

printf("----------------------\n");

for (j = 0; j < 100; ) {
    var n = rand();
    var prime;
    if (n <= 3) {
        prime = n > 1;
    } else {
        if (n % 2 == 0 || n % 3 == 0) {
            prime = false;
        } else {
            var i;
            var m;
            m = sqrt(n) + 1;
            prime = true;
            for (i = 5; (i < m) && (prime); i = i + 6) {
                if (n % i == 0 || n % (i + 2) == 0) {
                    prime = false;
                }
            }
        }
    }
    if (prime) {
        j = j + 1;
        printf("%3d %12d\n", j, n);
    }
    cnt = cnt + 1;
}

printf("----------------------\n");

var x = (j * 10000) / cnt;

printf("%d out of %d are prime, %d.%02d %%.\n",
       j, cnt, x / 100, x % 100);

printf("---\n\n");
The result from the console (the intermediate values were cut):
[JS]$ js prime.js
"prime.js"
Code: 230 bytes.
Data: 12 bytes.
----------------------
  1   1840531613
  2   1518954509
...
 98   1946156671
 99    821160383
100    359376917
----------------------
100 out of 2022 are prime, 4.49 %.
---
[JS]$

This should give you an idea of the syntax and capabilities of the language.
The compiled code size is reasonable, and the execution speed is considerably good, although this is just the impression I have from the complexity of the prime algorithm: it took 28 seconds to factor 2022 32-bit numbers on a 16 MHz machine. It won't break any cryptosystem, but it seems good enough for embedded scripting.

Some observations about the language:

- There are no increment (++) or decrement (--) operators.
- The only assignment operator allowed is equals (=), in contrast with C's compound assignments like += and *=.
- The "for", "if" and "while" structures require the statements to be surrounded by braces "{ }", this is to avoid the famous dangling "else" issue, which is hard to treat with LL(1) grammars.
- There is no "switch/case" construct in the language.
- All variables are 32-bit signed integers. Although the language accepts strings, chars and booleans, they are stored and treated internally as signed integers.
- There is no support (for the moment) for "break" and "continue" statements.
- There is no "goto" construct.
- No user-defined functions. All callable functions are provided by a compile-time defined library (a sketch of such a binding table follows this list). This was a difficult decision. Although it's not that complicated to allow functions, their misuse can lead to problems that are difficult to treat, like recursive calls that may exhaust the stack very quickly. Also, static analysis is much simpler with no function calls to deal with. Library calls are easy to handle as they don't use the VM's memory space to run.
- Variadic functions are allowed. Yippee. I can't live without printf()...
- Multiple function return values are allowed (work in progress), with the not-so-common construct:
     (x, y) = get_point();
- There is a default catch-all exception handler which silently terminates the script. The exception number is returned as the return value of the virtual machine, so it's better not to throw a 0 exception, which will be difficult to tell apart from a normal termination.
- There are no real differences between logical and bitwise AND and OR operators, which will perform as expected on boolean values anyway. So "&" and "&&" are interchangeable.
- BUG: There is a small limitation (which I plan to fix soon) in the precedence of the "*" and "/" operators: they are at the same level and evaluated from right to left (easy to solve).
- TODO: arrays. This is a tricky one for a non-typed language (but hey, we are intrinsically typed: everything is integral). The problem is twofold. First is how to correctly allocate memory for them. This is easy to solve if we force the size to be defined in the declaration; alternatively, static analysis could do the trick. But how to check bounds at run time without too much metadata being managed by the VM? Does someone have the answer for that? The other issue is the utility of arrays if we can't have types other than integers. Maybe the trade-offs do not favor implementing arrays: extra complexity with no real benefit for the intended use. Another idea is to implement library-defined arrays only. At least you could do something like:
    x = sensor[2];
    valve[1] = x * 4;
This can be easily implemented by a syntax action which calls access functions (get()/set()).
- TODO: packaging the byte code for a remote target. How to carry the required library information without taking too much space?
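On the library binding point above: a plausible shape for the compile-time library table (purely illustrative, not the actual MicroJS interface) pairs each script-visible name with a C handler that operates on the VM's evaluation stack; library-defined arrays like "sensor" and "valve" would map onto get/set handlers of the same kind:

#include <stdint.h>

/* hypothetical: a handler pops its arguments from the VM stack and
   pushes its result; it runs on the C stack, not in the VM's space */
typedef int (* lib_call_t)(int32_t * sp, int argc);

struct lib_entry {
    const char * nm;       /* name visible to the script */
    uint8_t argc_min;      /* required arguments */
    uint8_t argc_max;      /* > argc_min for variadics like printf */
    lib_call_t call;
};

extern int lib_rand(int32_t * sp, int argc);
extern int lib_srand(int32_t * sp, int argc);
extern int lib_printf(int32_t * sp, int argc);
extern int lib_sensor_get(int32_t * sp, int argc); /* x = sensor[i] */
extern int lib_valve_set(int32_t * sp, int argc);  /* valve[i] = x  */

const struct lib_entry microjs_lib[] = {
    { "rand",   0, 0, lib_rand },
    { "srand",  1, 1, lib_srand },
    { "printf", 1, 8, lib_printf },
    { "sensor", 1, 1, lib_sensor_get },
    { "valve",  2, 2, lib_valve_set },
};

The compiler would check the name and argument count at compile time, so the VM only dispatches on a small index at run time.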

Well I think that's enough for now.

Thanks for listening :)...

Monday, October 22, 2012

Modified PID Controller with Constrained Cubic Spline Error Function


A PID controller is a good tool to have in your belt. You can use it as the first approach for most of the control problems a practical electronics engineer faces. You don't have to have the complete dynamic model of your system to be able to use it. What you have to know is how to tune it. After some time you will start feeling how the gains affect the system, and how to test the limits of the system to determine stability.

My comments here will most likely freak out my control theory teachers, but what the heck. Sometimes (most of the time, to be more precise) you don't have all the tools you need to evaluate your system, for several reasons. And most of the time a PID controller will be good enough to move your project forward.

Just to give some context: I'm not talking about the industrial PID controllers implemented as a box you can buy and connect to your boiler. Here, PID refers to the algorithm and its implementation as a piece of software.

In this post I will present a modification of the classical PID controller to enable a kind of symmetrical behaviour when the input changes to saturation limits.

This idea occurred to me when I was working with video cameras and having some issues with the auto-exposure system, particularly with the control of the electromechanical iris in the lenses. The control of the iris was achieved by a standard PID, which produced a very noticeable difference in convergence speed when moving the camera from a bright to a dark scene compared to the opposite (dark to bright). This produced an annoying, very bright picture that could last for some seconds. I had no luck tweaking the gains of the controller because while it solved the problem in one condition, it created instabilities in the other.

Analysis of the problem led me to develop the improvement described here.

The PID equation

The equation for the PID controller in its parallel form is:
\[u(t)=K_pe(t)+K_i\int_{0}^{t}{e(\tau)}\,{d\tau} + K_d\frac{d}{dt}e(t)\]
where:
\(K_p\): Proportional gain
\(K_i\): Integral gain
\(K_d\): Derivative gain
\(e\): Error
The error is the difference between the set point and the process variable:
\(e=SP-PV\)
\(SP\): Set Point
\(PV\): Process variable (input)
I often use this form to directly implement a discrete PID controller.

In a PID controller, the proportional and integral terms contribute to the convergence speed.
The integral term is necessary to eliminate the steady-state error.
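As an illustration of that discrete implementation, here is a minimal sketch (my own, with illustrative names): rectangular integration for the I term, a backward difference for the D term, and the sample period folded into the Ki and Kd gains.

struct pid {
    float kp;    /* proportional gain */
    float ki;    /* integral gain, pre-multiplied by dt */
    float kd;    /* derivative gain, pre-divided by dt */
    float i;     /* integral accumulator */
    float e1;    /* previous error, for the derivative */
};

float pid_step(struct pid * p, float sp, float pv)
{
    float e = sp - pv;        /* e = SP - PV */
    float d = e - p->e1;      /* backward difference */

    p->i += e;                /* accumulate the integral */
    p->e1 = e;

    return p->kp * e + p->ki * p->i + p->kd * d;
}

The modification proposed below only changes how e is computed; the rest of the loop stays the same.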

The Problem

For any physical system there will be saturation in every single part. The input will saturate at a maximum and a minimum level. You can design your controller to work with any range of inputs, which is most probably impractical, or you can artificially limit the input to a certain range. For digital controllers there is a big chance of this saturation being imposed by your A/D converter or other analog signal conditioning circuitry. Another good reason to saturate the input is to avoid numerical instability problems.

There are some classes of systems where, for some reason, you want to limit the convergence speed, or you cannot improve speed without destabilizing the system.

Fig. 1
Now let's consider what happens if a very fast change in the input leads to saturation. The error will be constant until the accumulated integral part is enough to offset it. The error derivative is zero at this point, as the error is not changing.

For a quick analysis, let's consider that the integral part is much bigger than the proportional one and dominates the equation. While this situation persists the output will increase, or decrease, at a constant rate. This is represented in Fig. 1: between 1 and 2.5 seconds the input saturates at its maximum, and starting from 3.5 seconds it saturates at its minimum.

Observe that the period of time necessary to recover from a saturated minimum and the one to recover from a saturated maximum are very different. When saturated at the maximum it takes 1.5 seconds, but 6 seconds at the minimum: 4 times longer. The reason is that our set point is at 1/4 of the maximum value. The integral term is the dominating factor:

\[{I}(t)=K_{i}\int_{0}^{t}{e(\tau)}\,{d\tau}\]

While saturated, the error term being integrated is constant:

\(e(\tau)_{max}=SP-PV_{max}\)
\(e(\tau)_{min}=SP-PV_{min}\)

In our example:
\(SP=0.2\)
\(PV_{max}=1\)
\(PV_{min}=0\)

This leads to:
\(e(\tau)_{max}=-0.8\)
\(e(\tau)_{min}=0.2\)

The ratio between the two absolute errors is 0.8/0.2 = 4, exactly the factor observed between the two recovery times.

Linear Error Function

The error function for the classical PID controller is:

\[e=SP-PV\]


Fig. 2


In Fig. 2 we can see the error 'curve' plotted for 3 different set points in a saturation-limited (bounded) system. Observe that if the set point is set to half of the allowed range (0.5 in this case), the error will have the same absolute value at both limits, leading to equivalent rising and falling times for the integral and, as a consequence, for the convergence.

Solution

What I propose is to replace the error function in the PID controller with a constrained cubic spline with some special requirements:
  • The three points of the curve are: P1=(0, 0.5); P2=(SP, 0) and P3=(1, -0.5)
  • The curve has to be smooth at P2
  • The 1st derivative at P2 should be -1
  • The 1st derivatives should always be negative, i.e. there will be no overshoot
  • It has to be simple enough to be computed at runtime
Fig. 3 shows a plot of the proposed function.

Fig. 3
In Fig. 3 we can see the same 3 cases depicted previously. But now, regardless of the set point value, the error at both saturation points has the same absolute value: -0.5 and 0.5. This way the system will converge at approximately the same rate in both directions. Notice also that for SP = 0.5 the curve is the same line as in the original error function.


The idea is to replace the error function with a spline. The proposed solution is a two-segment cubic spline whose polynomials are:

\[e_1(t) = a_{1} + b_{1}PV(t) + c_{1}PV(t)^2 + d_{1}PV(t)^3\]
\[e_2(t) = a_{2} + b_{2}PV(t) + c_{2}PV(t)^2 + d_{2}PV(t)^3\]

Thus, the error function is given by:

\[e(t)=\left\{\begin{matrix}
e_1(t) & PV(t) \leq SP\\
e_2(t) & PV(t) > SP
\end{matrix}\right.\]

The coefficients of the polynomials are functions of the set point and must be computed each time this value changes. To calculate the coefficients we must solve the spline equations, including the proposed constraints.

The equations below are a solution for the two-segment cubic spline with the constraints presented above.

\[f'_{1}(x_{1})=\frac{2}{\frac{x_{2}-x_{1}}{y_{2}-y_{1}}+\frac{x_{1}-x_{0}}{y_{1}-y_{0}}}\]
\[f'_{1}(x_{0})= \frac{3(y_{1}-y_{0})}{2(x_{1}-x_{0})}-\frac{f'_{1}(x_{1})}{2}\]
\[f''_{1}(x_{0})=\frac{-2(f'_{1}(x_{1}) + 2 f'_{1}(x_{0}))}{(x_{1} - x_{0})} + \frac{6 (y_{1} - y_{0})}{(x_{1} - x_{0})^2}\]
\[f''_{1}(x_{1})=\frac{2(2 f'_{1}(x_{1}) + f'_{1}(x_{0}))}{(x_{1} - x_{0})} - \frac{6 (y_{1} - y_{0})}{(x_{1} - x_{0})^2}\]
\[d_{1} = \frac{f''_{1}(x_{1}) - f''_{1}(x_{0})}{6 (x_{1} - x_{0})}\]
\[c_{1} = \frac{x_{1} f''_{1}(x_{0}) - x_{0} f''_{1}(x_{1})}{2(x_{1} - x_{0})}\]
\[b_{1} = \frac{(y_{1} - y_{0}) - c_{1}(x_{1}^2 - x_{0}^2) - d_{1}( x_{1}^3 - x_{0}^3)}{x_{1} - x_{0}}\]
\[a_{1} = y_{0} - b_{1} x_{0} - c_{1} x_{0}^2 - d_{1} x_{0}^3\]
\[f'_{2}(x_{1})=\frac{2}{\frac{x_{2} - x_{1}}{y_{2} - y_{1}} + \frac{x_{1} - x_{0}}{y_{1} - y_{0}}}\]
\[f'_{2}(x_{2})= \frac{3(y_{2} - y_{1})}{2(x_{2} - x_{1})}-\frac{f'_{2}(x_{1})}{2}\]
\[f''_{2}(x_{1})=\frac{-2(f'_{2}(x_{2}) + 2 f'_{2}(x_{1}))}{(x_{2}-x_{1})} + \frac{6 (y_{2} - y_{1})}{(x_{2} - x_{1})^2}\]
\[f''_{2}(x_{2})=\frac{2(2 f'_{2}(x_{2}) + f'_{2}(x_{1}))}{(x_{2} - x_{1})} - \frac{6 (y_{2} - y_{1})}{(x_{2} - x_{1})^2}\]
\[d_{2} = \frac{f''_{2}(x_{2}) - f''_{2}(x_{1})}{6 (x_{2} - x_{1})}\]
\[c_{2} = \frac{x_{2} f''_{2}(x_{1}) - x_{1} f''_{2}(x_{2})}{2(x_{2} - x_{1})}\]
\[b_{2} = \frac{(y_{2} - y_{1}) - c_{2}(x_{2}^2 - x_{1}^2) - d_{2}( x_{2}^3 - x_{1}^3)}{x_{2} - x_{1}}\]
\[a_{2} = y_{1} - b_{2} x_{1} - c_{2} x_{1}^2 - d_{2} x_{1}^3\]
\[x_{0} = 0\]
\[y_{0} = 0.5\]
\[x_{1} = SP\]
\[y_{1} = 0\]
\[x_{2} = 1\]
\[y_{2} = -0.5\]

To help with the calculation of the polynomials' coefficients I've developed a small Matlab (Octave) program. Below are some results.

SP=0.1250
  a1=   0.50000 b1=  -5.50000 c1=   0.00000 d1=  96.00000
  a2=   0.13703 b2=  -1.19679 c2=   0.83965 d2=  -0.27988

SP=0.2500
  a1=   0.50000 b1=  -2.50000 c1=   0.00000 d1=   8.00000
  a2=   0.29630 b2=  -1.38889 c2=   0.88889 d2=  -0.29630

SP=0.5000
  a1=   0.50000 b1=  -1.00000 c1=   0.00000 d1=   0.00000
  a2=   0.50000 b2=  -1.00000 c2=   0.00000 d2=   0.00000

SP=0.7500
  a1=   0.50000 b1=  -0.50000 c1=   0.00000 d1=  -0.29630
  a2=  -6.00000 b2=  21.50000 c2= -24.00000 d2=   8.00000

SP=0.8750
  a1=   0.50000 b1=  -0.35714 c1=   0.00000 d1=  -0.27988
  a2= -91.00000 b2= 282.50000 c2=-288.00000 d2=  96.00000
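
For reference, here is a C sketch of the same computation (my own, with illustrative names). It follows the equations above, reproduces the coefficient tables, and evaluates the piecewise error with Horner's rule:

/* coefficients of one cubic segment, from its end points and the
   reference slopes sa, sb at those points */
static void seg_coef(double xa, double ya, double xb, double yb,
                     double sa, double sb, double k[4])
{
    double h = xb - xa;
    double fa = -2.0 * (sb + 2.0 * sa) / h + 6.0 * (yb - ya) / (h * h);
    double fb =  2.0 * (2.0 * sb + sa) / h - 6.0 * (yb - ya) / (h * h);

    k[3] = (fb - fa) / (6.0 * h);                          /* d */
    k[2] = (xb * fa - xa * fb) / (2.0 * h);                /* c */
    k[1] = ((yb - ya) - k[2] * (xb * xb - xa * xa)
           - k[3] * (xb * xb * xb - xa * xa * xa)) / h;    /* b */
    k[0] = ya - k[1] * xa - k[2] * xa * xa - k[3] * xa * xa * xa;
}

/* recompute both segments whenever the set point changes */
void spline_update(double sp, double k[2][4])
{
    /* knots: P1 = (0, 0.5), P2 = (SP, 0), P3 = (1, -0.5); the
       constrained-spline slope at P2 works out to -1 for any SP */
    double s1 = -1.0;
    double s0 = 3.0 * (0.0 - 0.5) / (2.0 * sp) - s1 / 2.0;
    double s2 = 3.0 * (-0.5 - 0.0) / (2.0 * (1.0 - sp)) - s1 / 2.0;

    seg_coef(0.0, 0.5, sp, 0.0, s0, s1, k[0]);
    seg_coef(sp, 0.0, 1.0, -0.5, s1, s2, k[1]);
}

/* piecewise error function, replacing e = SP - PV */
double spline_err(double pv, double sp, double k[2][4])
{
    double * c = k[(pv <= sp) ? 0 : 1];
    return c[0] + pv * (c[1] + pv * (c[2] + pv * c[3]));
}

For SP = 0.5 this yields the straight line e = 0.5 - PV, matching the table above.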

Limitations

When SP is close to the limits (0 or 1) the derivatives (slopes) become very steep and may cause numerical problems, so this method should be used with caution near the extremes.

Real Application

This modified control method was devised when I was developing a digital controller for the auto-exposure system of video surveillance cameras. The objective of this system is to control the brightness of the image being captured by the camera. It has to be able to perform under very extreme light conditions such as direct sunlight and poorly illuminated indoor areas.
There are three parameters to control in the camera in order to regulate the exposure:
  • Image sensor gain
  • Shutter speed
  • Iris opening
Not all lenses have a controllable iris, so the system can operate in two different modes depending on the type of lens installed:
  • Iris mode - regulates the amount of light entering in the camera
  • Shutter mode - regulates the exposure time of each captured frame
I will present some examples of improvements in both modes.

Iris Mode

In this mode the dominant parameter to be controlled is the amount of light entering the camera. This is accomplished by regulating the opening of a mechanical iris embedded in the lens.
The video below shows two similar sequences, the first with the normal PID and the second with the modified version. The sequences consist of moving the camera from a dark to a bright scene.


As we can observe in this video, during the first sequence the camera overshot and went "blind" for a little more than 2 seconds. This effect is due to the very high hysteresis of the electromechanical iris. The next sequence shows a mere half-second dark picture. This represents an 8-fold improvement over the original design.
Figures 4 and 5 are the output of a real-time scope that was monitoring the controller operation when the videos were shot. Figure 4 shows the normal PID and Figure 5 the spline-error modified version. The major horizontal divisions represent the time in seconds (10 seconds in total). The vertical axis is the interval from -1 to 1; all the variables were normalized to fit this interval.
 The traces captured are:
  • Blue: Set point (Illumination reference)
  • Red: Input (current measured illumination in the image sensor)
  • Green: Integral term (normalized to the interval [-1, 1])
Fig. 4 - Iris control with standard PID error

Fig. 5 - Iris controller with modified spline error

Shutter Mode

In this mode the amount of light entering the camera is fixed; what is controlled is the frame's exposure time. The mechanism that allows us to do this is an electronic shutter implemented in the image sensor itself.
The next video is an example of moving from bright to dark. Visually, we don't observe as dramatic an improvement as in the previous case. But, as the graphs below show, the controller took 4 seconds to converge with the normal PID and 2 seconds with the improved version.


It's also worth mentioning that the shutter model is linear, so we don't observe an overshoot as in the previous case.

Figures 6 and 7 were captured when the above video sequences were taken. Note that the graphs show a little more than the video sequences: they include moving the camera from dark to bright, which is the point where the red line suddenly goes up.

Fig. 6 - Shutter control with standard PID error
Fig. 7 - Shutter control with modified spline error.

Friday, October 19, 2012

Interprocess RPC generation tool

Introduction

This post discusses a methodology to create a Remote Procedure Call interface for process-to-process intercommunication, that is, for communication between two processes on the same host.
The general solution to the problem is summarized here as a design pattern. A tool to automatically generate the RPC stub code, called irpcgen, is presented as well.

The Problem

We have two processes in an embedded system; let's call them H and I.

  • H - is a hard real-time process with strict deadlines. This can be one or more control loops or an acquisition system, for example.
  • I - controls the operation of H and performs other non-time-sensitive tasks. It may implement the user interface, operational logging, etc., but its main task is to create and watchdog H.

The question that arises is, what's the best approach to create a communication channel between H and I? More specifically, we want to answer two questions:

  1. which IPC mechanism will be best suited for the task?
  2. how to send and receive structured information over these channels?

The Solution

To answer question 1, I created a small set of programs to benchmark several alternative IPC mechanisms in Linux. See my previous posts on the subject: Embedded Linux Interprocess Communication Mechanisms Benchmark - Part 2

With this data in hand it occurred to me that a natural channel would be two pipes. I have used this approach on other occasions, but I had never considered using unnamed pipes for the task. That's what I propose here: to use a pair of unnamed pipes connected to stdin and stdout. This was the best thing to do, as our controlling process I is the one forking H, and H has only a single controller attached to it. This way we create a two-way process-to-process IPC channel. Now we have to be able to send and receive structured data through it.
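A minimal sketch of that wiring (illustrative names, error handling omitted): process I creates both pipes, forks H, and connects the child's pipe ends to its stdin/stdout.

#include <sys/types.h>
#include <unistd.h>

pid_t spawn_h(const char * path, int * to_h, int * from_h)
{
    int in[2], out[2];   /* in: I -> H, out: H -> I */
    pid_t pid;

    pipe(in);
    pipe(out);

    if ((pid = fork()) == 0) {        /* child: H */
        dup2(in[0], STDIN_FILENO);    /* commands arrive on stdin */
        dup2(out[1], STDOUT_FILENO);  /* replies leave on stdout */
        close(in[0]); close(in[1]);
        close(out[0]); close(out[1]);
        execl(path, path, (char *)NULL);
        _exit(1);                     /* exec failed */
    }
    /* parent: I keeps the opposite ends */
    close(in[0]);
    close(out[1]);
    *to_h = in[1];
    *from_h = out[0];
    return pid;
}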

My requisites for the data exchange mechanisms were:
  • Simple to program and extend
  • The communication has to be synchronous
  • The programming interface has to be high level, RPC-like
The first thing to do was to transform an asynchronous channel into a synchronous one. To do this, a small overhead protocol was introduced. It just defines a framing structure to delimit the message boundaries and a scheme to multiplex different message types; it also introduces control messages for synchronization and link management.
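Just to make the idea concrete, a frame header for such an overhead protocol could look like the sketch below. This is an illustration only, not the actual wire format: a sync byte delimits frames, a type field multiplexes message and control types, and an explicit length closes the frame.

#include <stdint.h>

#define FRAME_SYNC 0x55

struct frame_hdr {
    uint8_t sync;      /* frame delimiter */
    uint8_t type;      /* demux key; a few codes reserved for
                          control: SYN, ACK, link reset ... */
    uint16_t len;      /* payload length in bytes */
} __attribute__((packed));         /* GCC-style packing */

/* the payload (a raw C struct, no marshalling) follows the header */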

The next step was to create a way of encapsulating C structures into the messages and labeling them in order to be able to demultiplex them on reception. No marshalling is necessary because both processes are on the same host. This involved defining, for each message, a function to be called to transmit it and a corresponding callback to be invoked on reception.

This can be done manually. As a matter of fact, I did just that in the first product developed with this approach; it was also a way of validating the strategy without incurring too much tooling effort.
But for this to be generally useful, a tool to automatically generate the code was needed.

The irpcgen tool

To make development easy I created a tool that generates the stubs for the server and client, as well as sample server service calls. The program works pretty much like the SUN RPC rpcgen tool, except that instead of reading an RPCL input file (.x) it reads a standard C header (.h) file. This simplifies things a lot: you just write your API in a header file and use the functions on the client's side. The implementation of the functions will be on the server's side.

The irpcgen tool will read the header file and create stubs for all declared functions that can be used as RPCs. These functions represent the server's API and have to follow some rules:

1 - The return has to be a bool type;
2 - There must be at most 2 arguments to the function;
3 - It cannot be declared static;
4 - If a second argument is provided it has to be a pointer, but not a void pointer;

Functions that fail to conform to any of these rules are not considered IRPC functions, and no stubs will be generated for them.

Furthermore, the direction of the data transmission is derived from the position and type of the arguments. The following cases are possible:

No arguments

Sends nothing, returns nothing (but invokes the corresponding callback on the server side).

bool my_rpc(void);

Single argument passed by value.

This is a server input value. This is more or less obvious, as the client cannot read anything back.

bool my_rpc_set(int val);
bool my_rpc_set(struct my_req req);

Single argument passed by constant reference.

This is similar to the previous case: the single argument is a server input value.

bool my_rpc_set(const struct my_req * req);

Single argument passed by reference.

In this case the argument is a return value from the server. The client should provide a pointer to a variable that will receive the data.

bool my_rpc_get(int * val);
bool my_rpc_get(struct my_rsp * rsp);

Two arguments

In this case the first argument is an input value and the second a return value from the server. The client should provide a pointer to a variable that will receive the data. Note that the second argument must be a non-constant reference. The first argument can be anything except a void pointer (void *).

bool my_rpc_set_and_get(struct my_req * req, struct my_rsp * rsp);
bool my_rpc_set_and_get(const struct my_req * req, struct my_rsp * rsp);
bool my_rpc_set_and_get(struct my_req req, struct my_rsp * rsp);
bool my_rpc_set_and_get(int req, int * rsp);

Strings

If any argument is passed as a char pointer (char *) it will be treated as a NULL terminated string. The rule for a single argument is the same as for references, i.e. if it's declared const it represents a client-to-server message, and the direction is reversed for non-const strings.

bool my_rpc_set_and_get(char * req, char * rsp);
bool my_rpc_set_and_get(const char * req, char * rsp);


bool my_rpc_set(const char * req);
bool my_rpc_get(char * rsp);

Service calls

The irpcgen tool will optionally create a ".h" file with "_svc" appended to the input file name. E.g., if the input is "my_rpc.h" the generated file will be "my_rpc_svc.h". This file will contain the signatures of the services to be implemented.

The file:

bool my_rpc_get(int * val);
bool my_rpc_set(int val);

Will produce:

bool my_rpc_get_svc(int * val);
bool my_rpc_set_svc(int val);

The "_svc" functions must implement the server behaivour. Optionally a "*_svc.c" can be created with dummy functions. All you need to do is to fill this functions body to have a functional RPC system.

The libirpc

libirpc is the companion library to irpcgen. The generated code depends on this library to run.

Source code

The irpcgen tool is GPL open-source and can be downloaded from: irpcgen.tar.gz
The package also contains the libirpc and a sample. The library is LGPL licensed.

There is a Makefile in the directories irpcgen, libirpc and sample. You need to compile irpcgen and libirpc before compiling the test.

If you want to cross-compile the library and the sample for an embedded platform, set the environment variable CROSS_COMPILE to the prefix of your toolchain, e.g. export CROSS_COMPILE=arm-gnu-linux-.





Friday, October 12, 2012

YARD-ICE goes Open Source

YARD-ICE

YARD-ICE stands for Yet Another Remote Debugger - In Circuit Emulator. It is a hardware and software platform I recently made public at Google Code. The project goal is to design the hardware and software of a JTAG tool to program and debug ARM microcontrollers. The target audience includes developers of deep embedded systems with shallow pockets.

Link to the Project: YARD-ICE on Google Code

Why Another JTAG Tool?

There are tons of tools on the market. Why another one? There are three main reasons:

  1. Performance. Some basic, low-cost tools available on the market are really slow. One of the main reasons is that low-level operations are performed by the host PC; the USB round trip is the one to blame. YARD-ICE solves this problem with an FPGA handling the serialization and other bit handling.
  2. Support for Linux/Mac platforms. Most ICE hardware lacks decent support for non-Windows platforms. There are some exceptions, but those are expensive tools with TCP/IP support. YARD-ICE is a TCP/IP based tool with an embedded GDB server. It's designed to work with any IDE supporting GDB, such as Eclipse.
  3. Flexibility. Some tools are OK for some processors, but their best performance is tied to a certain proprietary tool. Scripting is not always an option, and when the possibility exists it's some obscure language or API with Windows DLL dependencies, and too slow. Why not write a simple shell or Python script on the host to automate a test or to program your systems in the factory? YARD-ICE provides a simple csh-like scripting capability; you can run small scripts remotely through an ordinary TCP connection. And better than that: what if you don't like the way we do things, or want to customize your tool? No worries, it's LGPL open source, meaning that you have what you need to do just that.

Apart from that, I really like bit scrubbing. It's a good way of getting to know the processor cores in depth.