Mostly Embedded

Script Language for Memory Constrained Embedded Systems

2015-11-27T06:23:00.001-08:00

A little digression on my quest for the ideal embedded scripting language.

For quite some time I have wanted to add scripting capability to a system I created as an open source project: https://code.google.com/p/yard-ice/

After some back and forth with languages like eLua, I concluded that, although they are powerful, their run-time footprint was far beyond what I could afford. At the time I couldn't find anything suitable.

Then recently, when developing a tool for the company I'm working for right now, I felt the need for some mechanism to flexibly configure the system. My answer to provide this flexibility was again an embedded scripting language.

I revisited some options previously considered and also new ones. Frustrated with the outcome, I decided to write my own scripting language and run-time environment. I started studying compiler theory, attended some online classes, and after a while I was ready to go.

First thing was to create a list of requirements for the language, which are:

- The language is not meant to be a general purpose one, so we can get away without some complex constructs.

- The main use is to bind existing functions of the system in order to provide dynamic response to events. A sort of user defined behavioural response (the ultimate flexibility for configuration).

- Small runtime footprint.

- Should run portable p-code (bytecode) in a compact virtual machine. It should allow for the bytecode stream to be generated on a host and transferred to run on the target.

- Support complex integer arithmetic and logic expressions.

- 32-bit integer variables are a requirement.

- Exception handling is desirable as a clean way of dealing with errors.

- Strings are optional, especially string manipulation routines, because these require dynamic memory allocation.

- For strings it must support constant strings in non volatile memory.

- It's not required to support user defined functions.

- The system can support multiple scripts. Possibly one script for each type of event.

- Support global user defined variables shared among scripts. This means that when one script is fired in response to an event, it can record some information to be used by another script later on.

- The syntax must be close to a common programming language, like C or Java.

- The compiler must be compact to be embedded in the target system.

- No optimization of the code is necessary.

- The syntax analysis should be relatively strong (difficult to measure). The idea is to do a certain amount of static analysis to reduce the run-time checking as much as possible. Some common constructs can be left out of the language for this reason.

- If possible, avoid complex run-time memory management support. It tends to be expensive in terms of memory and/or CPU usage.

- It should be possible to compile chunks of code one at a time. The compiler has to save its state to resume its operation when more code becomes available. This is needed in order to compile code embedded as arrays of strings in JSON files without resorting to stitching the strings together before compiling the full text.
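The chunked compilation requirement above can be pictured with a minimal sketch in C. This is not the MicroJS compiler; it only illustrates keeping the scanner state in a structure so that a token split across two chunks (for example two consecutive strings of a JSON array) is still recognized as one token. The names `scan_state`, `scan_init`, `scan_feed` and `scan_finish` are made up for this illustration, and summing the numbers stands in for real code generation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner state that survives between chunks. */
struct scan_state {
    int  in_number; /* non-zero while in the middle of a number */
    long value;     /* digits accumulated so far for the current number */
    long sum;       /* stand-in for "emitted code": sum of all numbers */
    int  count;     /* how many complete numbers were recognized */
};

static void scan_init(struct scan_state *s)
{
    s->in_number = 0;
    s->value = 0;
    s->sum = 0;
    s->count = 0;
}

/* Close the token in progress, if any. */
static void scan_flush(struct scan_state *s)
{
    if (s->in_number) {
        s->sum += s->value;
        s->count++;
        s->value = 0;
        s->in_number = 0;
    }
}

/* Feed one chunk. A number may start in one chunk and end in the next,
 * because nothing is flushed just because the chunk ended. */
void scan_feed(struct scan_state *s, const char *chunk, size_t len)
{
    assert(s != NULL && chunk != NULL);
    for (size_t i = 0; i < len; i++) {
        char c = chunk[i];
        if (c >= '0' && c <= '9') {
            s->value = s->value * 10 + (c - '0');
            s->in_number = 1;
        } else {
            scan_flush(s);
        }
    }
}

/* Signal the end of the input so that a trailing token gets flushed. */
void scan_finish(struct scan_state *s)
{
    assert(s != NULL);
    scan_flush(s);
}
```

Feeding the two chunks `"12 3"` and `"4 5"` and then calling `scan_finish()` yields three numbers (12, 34 and 5): the token `34` straddles the chunk boundary and is still seen as a single number.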

From these initial requirements some others were added due to the limitations of syntax analysis with very little memory:

- It has to be a one pass compiler (translator). No parse tree or abstract syntax tree can be generated due to the memory limitation.

- It can't be a recursive compiler because that is too heavy on the stack. Also, memory limit checks for the stack are hard to implement. So a recursive descent parser was out of the equation, although it is simple to implement.

These requirements narrowed down the solution to an LL(1) grammar and a table based Syntax Directed Translator. The problem now was how to generate a compact lexical analyzer and the parser. The lexer (scanner) was solved with handcrafted specialized code, which was not particularly difficult.

Next dragon to slay was the parser. I tried to find tools to generate the tables for LL(1) grammars but found nothing good. Some tools crashed as soon as the grammar grew a little more complex. Also, the tables generated were very large, unsuitable for what I wanted. The solution was to write my own parser generator for LL(1) grammars. But wait, searching the internet I found a perfect starting point: a tool developed by Prof. Ivo Mateljan from the University of Split, Croatia. It was code he wrote for his students in a computer science class. The program, called ELL, had almost everything I needed. It could parse a grammar, create the first and follow sets, and create the list of predictions for each rule. I asked Prof. Ivo to modify his program, which he promptly and generously did. Then I added code to generate the tables as C code and an extension to insert semantic actions inside the productions in the grammar. The big trick was a method I devised to binary search for the correct rule in the predictions list instead of using a single lookup table. ELL then generates 4 tables, 2 functions and a set of constant definitions that form the skeleton of a Syntax Directed Translator. It's pretty neat. I hope to create an open source project out of it, as it may help other people as well.
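To make the idea of a non-recursive, table based LL(1) parser with a binary-searched prediction list more concrete, here is a minimal sketch in C. It is not the output of ELL and does not reproduce its tables; the toy grammar (E -> T E', E' -> + T E' | epsilon, T -> num | ( E )), the key encoding and all names are assumptions made for this illustration. It has no semantic actions, so it only accepts or rejects a token string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Terminals come first, so "sym < NT_BASE" means terminal. */
enum { T_NUM, T_PLUS, T_LPAR, T_RPAR, T_EOF, NT_BASE,
       N_E = NT_BASE, N_EP, N_T };

/* Right-hand side of each rule and its length. */
static const uint8_t rule_rhs[][3] = {
    { N_T, N_EP, 0 },        /* 0: E  -> T E'    */
    { T_PLUS, N_T, N_EP },   /* 1: E' -> + T E'  */
    { 0, 0, 0 },             /* 2: E' -> epsilon */
    { T_NUM, 0, 0 },         /* 3: T  -> num     */
    { T_LPAR, N_E, T_RPAR }, /* 4: T  -> ( E )   */
};
static const uint8_t rule_len[] = { 2, 3, 0, 1, 3 };

/* Prediction list sorted by key = (nonterminal << 4) | lookahead.
 * Only valid (nonterminal, lookahead) pairs are stored. */
struct pred { uint8_t key; uint8_t rule; };
#define KEY(nt, t) ((uint8_t)(((nt) << 4) | (t)))
static const struct pred pred_tab[] = {
    { KEY(N_E, T_NUM), 0 },   { KEY(N_E, T_LPAR), 0 },
    { KEY(N_EP, T_PLUS), 1 }, { KEY(N_EP, T_RPAR), 2 },
    { KEY(N_EP, T_EOF), 2 },
    { KEY(N_T, T_NUM), 3 },   { KEY(N_T, T_LPAR), 4 },
};

/* Binary search for the rule to apply; -1 means syntax error. */
static int predict(uint8_t nt, uint8_t lookahead)
{
    uint8_t key = KEY(nt, lookahead);
    int lo = 0;
    int hi = (int)(sizeof pred_tab / sizeof pred_tab[0]) - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (pred_tab[mid].key == key)
            return pred_tab[mid].rule;
        if (pred_tab[mid].key < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

/* Returns 1 if the token string (terminated by T_EOF) is accepted. */
int ll1_parse(const uint8_t *tok)
{
    uint8_t stack[32]; /* explicit parse stack, no recursion */
    int sp = 0;
    assert(tok != NULL);
    stack[sp++] = T_EOF;
    stack[sp++] = N_E;
    while (sp > 0) {
        uint8_t top = stack[--sp];
        if (top < NT_BASE) {           /* terminal: must match the input */
            if (top != *tok)
                return 0;
            if (top == T_EOF)
                return 1;
            tok++;
        } else {                       /* nonterminal: expand by a rule */
            int r = predict(top, *tok);
            if (r < 0)
                return 0;
            if (sp + rule_len[r] > (int)sizeof stack)
                return 0;              /* stack limit is trivial to check */
            for (int i = rule_len[r]; i > 0; i--)
                stack[sp++] = rule_rhs[r][i - 1]; /* push RHS reversed */
        }
    }
    return 0;
}
```

The sorted list holds only the seven valid predictions, whereas a full lookup table for this grammar would need 3 x 5 entries. The saving grows with the grammar, at the cost of a few comparisons per expansion.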

The result is a language which I'm provisionally calling "MicroJS" because it resembles JavaScript.

As it stands right now, the minimum compiled code targeting an ARM Cortex-M3 microcontroller takes 9064 bytes of FLASH (code) and 416 bytes of RAM, plus some 128 bytes for the stack. This includes the compiler, the virtual machine and a small library with some 9 functions, including a "printf".

A more realistic example is a system that includes a serial driver, a console shell, a small basic flash filesystem, and some basic commands to manage files and to upload scripts using the Xmodem protocol. All of that costs 17720 bytes of code, 968 bytes of memory and around 1256 bytes of stack (the problem here is the xmodem, which requires a 1k buffer; a better implementation could reuse the microjs space for the xmodem buffer, which would reduce the total memory requirement to around 1.5KiB).

In case you are wondering what kind of code I can run, here are 2 examples of the code I used to test the system described above:

Example 1:

//
// Generate the Fibonacci sequence up to the maximum 32 bits signed integer
//

var x0 = 0, x1 = 1, x;

try {
    var i = 1;
    while (1) {
        // Check whether the next sum will overflow or not
        if (0x7fffffff - x1 <= x0) {
            throw 1; // overflow
        }
        x = x1 + x0;
        x0 = x1;
        x1 = x;
        printf("%2d | %10u\n", i, x);
        i = i + 1;
    }
} catch (err) {
    printf(" - overflow error!\n");
}



Produces the output:



[JS]$ js fib.js
"fib.js"
Code: 85 bytes.
Data: 12 bytes.
 1 |          1
 2 |          2
 3 |          3
 4 |          5
 5 |          8
 6 |         13
 7 |         21
 8 |         34
 9 |         55
10 |         89
11 |        144
12 |        233
13 |        377
14 |        610
15 |        987
16 |       1597
17 |       2584
18 |       4181
19 |       6765
20 |      10946
21 |      17711
22 |      28657
23 |      46368
24 |      75025
25 |     121393
26 |     196418
27 |     317811
28 |     514229
29 |     832040
30 |    1346269
31 |    2178309
32 |    3524578
33 |    5702887
34 |    9227465
35 |   14930352
36 |   24157817
37 |   39088169
38 |   63245986
39 |  102334155
40 |  165580141
41 |  267914296
42 |  433494437
43 |  701408733
44 | 1134903170
45 | 1836311903
 - overflow error!
[JS]$




Example 2:

// Print a list of 100 random prime numbers
//
var j, cnt = 0;

srand(time()); // initialize random number generator

printf("----------------------\n");

for (j = 0; j < 100; ) {
    var n = rand();
    var prime;
    if (n <= 3) {
        prime = n > 1;
    } else {
        if (n % 2 == 0 || n % 3 == 0) {
            prime = false;
        } else {
            var i;
            var m;
            m = sqrt(n) + 1;
            prime = true;
            for (i = 5; (i < m) && (prime); i = i + 6) {
                if (n % i == 0 || n % (i + 2) == 0) {
                    prime = false;
                }
            }
        }
    }
    if (prime) {
        j = j + 1;
        printf("%3d %12d\n", j, n);
    }
    cnt = cnt + 1;
}

printf("----------------------\n");

var x = (j * 10000) / cnt;

printf("%d out of %d are prime, %d.%02d %%.\n",
       j, cnt, x / 100, x % 100);

printf("---\n\n");

The result from the console (the intermediate values were cut):

[JS]$ js prime.js
"prime.js"
Code: 230 bytes.
Data: 12 bytes.
----------------------
  1   1840531613
  2   1518954509
...

 98   1946156671
 99    821160383
100    359376917
----------------------
100 out of 2022 are prime, 4.49 %.
---
[JS]$

This can give you an idea of the syntax and capabilities of the language.

The compiled code size is reasonable, and the execution speed is considerably good. But this is just the impression I have, given the complexity of the prime algorithm: it took 28 seconds to test 2022 32-bit numbers on a 16MHz machine. It won't break any cryptosystem, but it seems good enough for an embedded scripting language.

Some observations about the language:

- There is no increment (++) or decrement (--) operations.

- The only assignment operation allowed is equals (=), contrasting with C alternative assignments like: +=, *= ...

- The "for", "if" and "while" structures require the statements to be surrounded by braces "{ }", this is to avoid the famous dangling "else" issue, which is hard to treat with LL(1) grammars.

- There is no "switch/case" construct in the language.

- All variables are 32-bit signed integers. Although the language accepts strings, chars and booleans, they are stored and treated internally as signed integers.

- There is no support (for the moment) for "break" and "continue" statements.

- There is no "goto" construct.

- No user defined functions. All callable functions are provided by a library defined at compile time. This was a difficult decision. Although it's not that complicated to allow functions, their misuse can lead to problems that are difficult to handle, like recursive calls that may exhaust the stack very quickly. Also, static analysis is much simpler with no function calls to deal with. Library calls are easy to handle as they don't use the VM's memory space to run.

- Variadic functions are allowed. Yippee. I can't live without printf()...

- Multiple return values from functions are allowed (work in progress), with the not so common construct:

(x, y) = get_point();

- There is a default catch all exception handler which silently terminates the script. The exception number is returned as a return value of the virtual machine. So it's better not to throw a 0 exception, which will be difficult to catch.

- There are no real differences between logical and bitwise AND and OR operators, which will perform as expected on boolean values anyway. So "&" and "&&" are interchangeable.

- BUG: There is a small limitation (which I plan to fix soon) in the precedence of "*" and "/" operators, they are at the same level and evaluated from right to left (easy to solve).

- TODO: arrays. This is a tricky one for a non-typed language (but hey, we are intrinsically typed: everything is an integer). The problem is twofold. First, how to correctly allocate memory for an array. This is easy to solve if we force the size to be defined in the declaration; alternatively, static analysis could do the trick. But how do we check bounds at run-time without too much metadata being managed by the VM? Does someone have the answer for that? The other issue is the utility of arrays if we can't have types other than integers. Maybe the trade-offs do not favor implementing arrays: extra complexity with no real benefit for the intended use. Another idea is to implement library defined arrays only. At least you could do something like:

x = sensor[2];

valve[1] = x * 4;

This can be easily implemented by a syntax action which calls access functions (get()/set()).
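One way to picture such library defined arrays is the following sketch in C. It is only an assumption about how the access functions could look, not MicroJS code: the descriptor `lib_array`, the functions `lib_array_get()`/`lib_array_set()` and the exception number `ERR_BOUNDS` are hypothetical names. The point is that the bound lives in a constant descriptor owned by the library, so the VM itself carries no array metadata.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ERR_NONE   0
#define ERR_BOUNDS 1 /* hypothetical exception number */

/* Backing storage owned by the library, not by the VM. */
static int32_t sensor_data[4] = { 10, 20, 30, 40 };
static int32_t valve_data[2];

/* Constant descriptor the compiler would bind to a name in the script. */
struct lib_array {
    int32_t *data;
    int32_t  size;
    int      writable;
};

static const struct lib_array lib_sensor = { sensor_data, 4, 0 };
static const struct lib_array lib_valve  = { valve_data, 2, 1 };

/* x = arr[idx]; returns an exception number for the VM to throw. */
int lib_array_get(const struct lib_array *a, int32_t idx, int32_t *out)
{
    assert(a != NULL && out != NULL);
    if (idx < 0 || idx >= a->size)
        return ERR_BOUNDS;
    *out = a->data[idx];
    return ERR_NONE;
}

/* arr[idx] = val; read-only arrays reject the write. */
int lib_array_set(const struct lib_array *a, int32_t idx, int32_t val)
{
    assert(a != NULL);
    if (!a->writable || idx < 0 || idx >= a->size)
        return ERR_BOUNDS;
    a->data[idx] = val;
    return ERR_NONE;
}
```

With these descriptors, `x = sensor[2]; valve[1] = x * 4;` would translate into one `lib_array_get()` call followed by one `lib_array_set()` call, and an out-of-range index would surface as an exception number rather than as a memory corruption.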

- TODO: packaging the byte code for remote target. How to carry the required library information without taking too much space.

Well I think that's enough for now.

Thanks for listening :)...

Modified PID Controller with Constrained Cubic Spline Error Function

2012-10-22T19:16:00.000-07:00

A PID controller is a good tool to have in your belt. You can use it as the first approach for most of the control problems a practical electronics engineer faces. You don't have to have the complete dynamic model of your system to be able to use it. What you have to know is how to tune it. After some time you will start feeling how the gains affect the system, and how to test the limits of the system to determine stability.

My comments here will most likely freak out my control theory teachers, but what the heck. Sometimes, most of the time to be more precise, you don't have all the tools you need to evaluate your system, for several reasons. And most of the time a PID controller will be good enough to move your project forward.

Just to give some context here, I'm not talking about the industrial PID controllers implemented as a box you can buy and connect to your boiler. Here the PID refers to the algorithm and its implementation as a piece of software.

In this post I will present a modification of the classical PID controller to enable a kind of symmetrical behaviour when the input changes to saturation limits.

This idea occurred to me when I was working with video cameras and having some issues with the auto exposure system. Particularly with the control of the electromechanical iris in the lenses. The control of the iris was achieved by a standard PID, which produced a very noticeable difference in the convergence speed when moving the camera from a bright to a dark scene compared to performing the opposite (dark to bright). This produced one annoying very bright picture that could last for some seconds. I had no luck in tweaking the gains of the controller because while it solved the problem in one condition, it created instabilities in the other.

Analysis of the problem led me to develop the improvement described here.

The PID equation

The equation for the PID controller in its parallel form is:
\[u(t)=K_pe(t)+K_i\int_{0}^{t}{e(\tau)}\,{d\tau} + K_d\frac{d}{dt}e(t)\]
where:
$K_p$: Proportional gain
$K_i$: Integral gain
$K_d$: Derivative gain
$e$: Error
The error is the difference between the set point and the process variable:
$e=SP-PV$
$SP$: Set Point
$PV$: Process variable (input)
I often use this form to directly implement a discrete PID controller.

In a PID controller, the proportional and integral terms contribute to the convergence speed.
The integral term is necessary to eliminate the steady-state error.
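As an illustration of implementing the parallel form directly, here is a minimal discrete sketch in C. It assumes a fixed sample period `dt`, a rectangular (running sum) approximation of the integral and a backward difference for the derivative; the structure and function names are made up for this example, and there is no output limiting or anti-windup.

```c
#include <assert.h>
#include <stddef.h>

/* Controller gains and state; names are illustrative only. */
struct pid {
    double kp, ki, kd; /* proportional, integral and derivative gains */
    double dt;         /* sample period in seconds */
    double integral;   /* running approximation of the integral of e */
    double prev_err;   /* previous error, for the backward difference */
};

/* One controller step: u = Kp*e + Ki*integral(e) + Kd*de/dt */
double pid_step(struct pid *c, double sp, double pv)
{
    assert(c != NULL && c->dt > 0.0);
    double e = sp - pv;                      /* e = SP - PV */
    c->integral += e * c->dt;                /* rectangular integration */
    double deriv = (e - c->prev_err) / c->dt;
    c->prev_err = e;
    return c->kp * e + c->ki * c->integral + c->kd * deriv;
}
```

The function is meant to be called once per sample period with the current set point and process variable. Keeping the three terms separate, as in the equation, makes it easy to swap the error computation for a different error function later.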

The Problem

For any physical system there will be saturation in every single part. The input will saturate at a maximum and a minimum level. You can design your controller to work with any range of inputs, which is most probably impractical, or you can artificially limit the input to a certain range. For digital controllers there is a big chance of this saturation being imposed by your A/D converter or other analog signal conditioning circuit. Another good reason to saturate the input is to avoid numerical instability problems.

There are some classes of systems where, for some reason, you want to limit the convergence speed, or where you cannot improve the speed without destabilizing the system.

Fig. 1

Now let's consider what happens if a very fast change in the input leads to saturation. The error will be constant until the accumulated integral part is enough to offset it. The error derivative is zero at this point, as the error is not changing.

For a quick analysis, let's consider that the integral part is much bigger than the proportional one and dominates the equation. While this situation persists the output will increase, or decrease, at a constant rate. This is represented in Fig. 1. Between 1 and 2.5 seconds the input saturates at its maximum, and starting from 3.5 seconds it saturates at its minimum.

Observe that the period of time necessary to recover from a saturated minimum and the one to recover from a saturated maximum are very different. When the input is saturated at the maximum it takes 1.5 seconds, and when at the minimum it takes 6 seconds. That is 4 times longer. The reason is that, with our set point, the error at the minimum saturation is 1/4 of the error at the maximum saturation. The integral term is the dominating factor:

\[{I}(t)=K_{i}\int_{0}^{t}{e(\tau)}\,{d\tau}\]

As we are saturated the error term being integrated is constant:

$e(\tau)_{max}=SP-PV_{max}$

$e(\tau)_{min}=SP-PV_{min}$

In our example:

$SP=0.2$
$PV_{max}=1$
$PV_{min}=0$

Leads to:

$e(\tau)_{max}=-0.8$

$e(\tau)_{min}=0.2$
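As a quick check of these figures, assume that the integral term dominates, that the saturated error is constant, and that the integral has to move by the same amount $\Delta I$ in both directions. The integral then changes at a rate $K_i|e|$, so the recovery time is inversely proportional to the error magnitude:

\[\Delta t = \frac{\Delta I}{K_i\,|e|} \quad\Rightarrow\quad \frac{\Delta t_{min}}{\Delta t_{max}} = \frac{|e(\tau)_{max}|}{|e(\tau)_{min}|} = \frac{0.8}{0.2} = 4\]

This agrees with the 6 seconds versus 1.5 seconds read from Fig. 1.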

Linear Error Function

The error function for the classical PID controller is:

\[e=SP-PV\]

Fig. 2

In Fig. 2 we can see the error 'curve' plotted for 3 different set points in a saturation limited (bounded) system. Observe that if the set point is set to half of the allowed range (0.5 in this case), the error will have the same absolute value at both limits, leading to equivalent rise and fall times for the integral and, as a consequence, for the convergence.

Solution

What I propose is to replace the error function in the PID controller by a constrained cubic spline with some special requirements:

The three points of the curve are: P1=(0, 0.5), P2=(SP, 0) and P3=(1, -0.5)
The curve has to be smooth at P2
The 1st derivative at P2 should be -1
The 1st derivative should always be negative, i.e. there will be no overshoot
It has to be simple enough to be computed at runtime

Fig. 3 shows a plot of the proposed function.

Fig. 3

In Fig. 3 we can see the same 3 cases depicted previously. But now, regardless of the set point value, the error at both saturation points has the same absolute value: 0.5 and -0.5. This way the system will converge at approximately the same rate in both directions. Notice also that for SP = 0.5 the curve is the same line as in the original error function.

The idea is to replace the error function by a spline. The proposed solution is a two-segment cubic spline whose polynomials are:

\[e_1(t) = a_{1} + b_{1}PV(t) + c_{1}PV(t)^2 + d_{1}PV(t)^3\]
\[e_2(t) = a_{2} + b_{2}PV(t) + c_{2}PV(t)^2 + d_{2}PV(t)^3\]

Thus, the error function is given by:

\[e(t)=\left\{\begin{matrix}
e_1(t) & PV(t) \leq SP\\
e_2(t) & PV(t) > SP
\end{matrix}\right.\]

The coefficients of the polynomials are functions of the set point and must be computed each time this value changes. To calculate the coefficients we must solve the spline equations including the proposed constraints.

The equations below are a solution for the two-segment cubic spline with the constraints presented above.

\[f'_{1}(x_{1})=\frac{2}{\frac{x_{2}-x_{1}}{y_{2}-y_{1}}+\frac{x_{1}-x_{0}}{y_{1}-y_{0}}}\]
\[f'_{1}(x_{0})= \frac{3(y_{1}-y_{0})}{2(x_{1}-x_{0})}-\frac{f'_{1}(x_{1})}{2}\]
\[f''_{1}(x_{0})=\frac{-2(f'_{1}(x_{1}) + 2 f'_{1}(x_{0}))}{(x_{1} - x_{0})} + \frac{6 (y_{1} - y_{0})}{(x_{1} - x_{0})^2}\]
\[f''_{1}(x_{1})=\frac{2(2 f'_{1}(x_{1}) + f'_{1}(x_{0}))}{(x_{1} - x_{0})} - \frac{6 (y_{1} - y_{0})}{(x_{1} - x_{0})^2}\]
\[d_{1} = \frac{f''_{1}(x_{1}) - f''_{1}(x_{0})}{6 (x_{1} - x_{0})}\]
\[c_{1} = \frac{x_{1} f''_{1}(x_{0}) - x_{0} f''_{1}(x_{1})}{2 (x_{1} - x_{0})}\]
\[b_{1} = \frac{(y_{1} - y_{0}) - c_{1}(x_{1}^2 - x_{0}^2) - d_{1}( x_{1}^3 - x_{0}^3)}{x_{1} - x_{0}}\]
\[a_{1} = y_{0} - b_{1} x_{0} - c_{1} x_{0}^2 - d_{1} x_{0}^3\]
\[f'_{2}(x_{1})=\frac{2}{\frac{x_{2} - x_{1}}{y_{2} - y_{1}} + \frac{x_{1} - x_{0}}{y_{1} - y_{0}}}\]
\[f'_{2}(x_{2})= \frac{3(y_{2} - y_{1})}{2(x_{2} - x_{1})}-\frac{f'_{2}(x_{1})}{2}\]
\[f''_{2}(x_{1})=\frac{-2(f'_{2}(x_{2}) + 2 f'_{2}(x_{1}))}{(x_{2}-x_{1})} + \frac{6 (y_{2} - y_{1})}{(x_{2} - x_{1})^2}\]
\[f''_{2}(x_{2})=\frac{2(2 f'_{2}(x_{2}) + f'_{2}(x_{1}))}{(x_{2} - x_{1})} - \frac{6 (y_{2} - y_{1})}{(x_{2} - x_{1})^2}\]
\[d_{2} = \frac{f''_{2}(x_{2}) - f''_{2}(x_{1})}{6 (x_{2} - x_{1})}\]
\[c_{2} = \frac{x_{2} f''_{2}(x_{1}) - x_{1} f''_{2}(x_{2})}{2 (x_{2} - x_{1})}\]
\[b_{2} = \frac{(y_{2} - y_{1}) - c_{2}(x_{2}^2 - x_{1}^2) - d_{2}( x_{2}^3 - x_{1}^3)}{x_{2} - x_{1}}\]
\[a_{2} = y_{1} - b_{2} x_{1} - c_{2} x_{1}^2 - d_{2} x_{1}^3\]
\[x_{0} = 0\]
\[y_{0} = 0.5\]
\[x_{1} = SP\]
\[y_{1} = 0\]
\[x_{2} = 1\]
\[y_{2} = -0.5\]

To help with the calculation of the polynomials' coefficients I've developed a small Matlab (Octave) program. Below are some results.

SP=0.1250
a1= 0.50000 b1= -5.50000 c1= 0.00000 d1= 96.00000
a2= 0.13703 b2= -1.19679 c2= 0.83965 d2= -0.27988

SP=0.2500
a1= 0.50000 b1= -2.50000 c1= 0.00000 d1= 8.00000
a2= 0.29630 b2= -1.38889 c2= 0.88889 d2= -0.29630

SP=0.5000
a1= 0.50000 b1= -1.00000 c1= 0.00000 d1= 0.00000
a2= 0.50000 b2= -1.00000 c2= 0.00000 d2= 0.00000

SP=0.7500
a1= 0.50000 b1= -0.50000 c1= 0.00000 d1= -0.29630
a2= -6.00000 b2= 21.50000 c2= -24.00000 d2= 8.00000

SP=0.8750
a1= 0.50000 b1= -0.35714 c1= 0.00000 d1= -0.27988
a2= -91.00000 b2= 282.50000 c2=-288.00000 d2= 96.00000
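The coefficient equations can also be sketched in C, which may be handy when the set point changes at run time on the target. This is not the Matlab (Octave) program mentioned above; the names `cubic`, `spline_coeffs()` and `spline_error()` are assumptions for this example. Note that the sketch divides the quadratic coefficient by 2(x_b - x_a), as in the usual constrained cubic spline formulation; this is what reproduces the tabulated values, for instance c2 = 0.88889 for SP = 0.25.

```c
#include <assert.h>
#include <stddef.h>

struct cubic { double a, b, c, d; }; /* e = a + b*PV + c*PV^2 + d*PV^3 */

/* One segment from (xa, ya) to (xb, yb) with end slopes fa and fb. */
static struct cubic segment(double xa, double ya, double xb, double yb,
                            double fa, double fb)
{
    double h = xb - xa, dy = yb - ya;
    /* second derivatives at the start and at the end of the segment */
    double f2a = -2.0 * (fb + 2.0 * fa) / h + 6.0 * dy / (h * h);
    double f2b =  2.0 * (2.0 * fb + fa) / h - 6.0 * dy / (h * h);
    struct cubic p;
    p.d = (f2b - f2a) / (6.0 * h);
    p.c = (xb * f2a - xa * f2b) / (2.0 * h);
    p.b = (dy - p.c * (xb * xb - xa * xa)
              - p.d * (xb * xb * xb - xa * xa * xa)) / h;
    p.a = ya - p.b * xa - p.c * xa * xa - p.d * xa * xa * xa;
    return p;
}

/* Coefficients of both segments for a set point strictly inside (0, 1). */
void spline_coeffs(double sp, struct cubic *e1, struct cubic *e2)
{
    const double x0 = 0.0, y0 = 0.5, y1 = 0.0, x2 = 1.0, y2 = -0.5;
    double x1 = sp;
    assert(e1 != NULL && e2 != NULL && sp > 0.0 && sp < 1.0);
    /* slope at the knot P2, then the slopes at both ends */
    double fk = 2.0 / ((x2 - x1) / (y2 - y1) + (x1 - x0) / (y1 - y0));
    double f0 = 3.0 * (y1 - y0) / (2.0 * (x1 - x0)) - fk / 2.0;
    double f2 = 3.0 * (y2 - y1) / (2.0 * (x2 - x1)) - fk / 2.0;
    *e1 = segment(x0, y0, x1, y1, f0, fk);
    *e2 = segment(x1, y1, x2, y2, fk, f2);
}

/* Piecewise error: first segment for PV <= SP, second one otherwise. */
double spline_error(const struct cubic *e1, const struct cubic *e2,
                    double sp, double pv)
{
    const struct cubic *p = (pv <= sp) ? e1 : e2;
    return p->a + pv * (p->b + pv * (p->c + pv * p->d)); /* Horner form */
}
```

`spline_coeffs()` only needs to run when the set point changes, while `spline_error()` is the per-sample cost: three multiplications and three additions. The assertion on `sp` reflects the fact that the slopes grow without bound as the set point approaches 0 or 1.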

Limitations

When SP is close to the limits (0 or 1) the derivatives (slopes) become very steep and may cause numerical problems. So this method should be used with caution near its extremes.

Real Application

This modified controlling method was devised when I was developing a digital controller for the auto-exposure system of video surveillance cameras. The objective of this system is to control the brightness of the image being captured by the camera. It has to be able to perform under very extreme light conditions such as direct sunlight and poorly illuminated indoor areas.
There are three parameters to control in the camera in order to regulate the exposure:

Image sensor gain
Shutter speed
Iris opening

Not all the lenses have a controllable Iris, so the system can operate in two different modes depending on the type of lens installed:

Iris mode - regulates the amount of light entering in the camera
Shutter mode - regulates the exposure time of each captured frame

I will present some examples of improvements in both modes.

Iris Mode

In this mode the dominant parameter to be controlled is the amount of light that enters the camera. This is accomplished by regulating the opening of a mechanical iris embedded in the lens.

The video below shows two similar sequences, the first with the normal PID and the second with the modified version. The sequences consist of moving the camera from a dark to a bright scene.

As we can observe in this video, during the first sequence the camera overshot and got "blind" for a little more than 2 seconds. This effect is due to the very high hysteresis of the electromechanical iris. The next sequence shows a mere half second of dark picture. This represents an 8 fold improvement over the original design.

Figures 4 and 5 are the output of a real-time scope that was monitoring the controller operation when the videos were shot. Figure 4 shows the normal PID and figure 5 the spline error modified version. The major horizontal divisions represent the time in seconds (10 seconds in total). The vertical axis is an interval from -1 to 1; all the variables were normalized to fit this interval.

The traces captured are:

Blue: Set point (Illumination reference)
Red: Input (current measured illumination in the image sensor)
Green: Integral term (normalized to the interval 1, -1)

Fig. 4 - Iris control with standard PID error

Fig. 5 - Iris controller with modified spline error

Shutter Mode

In this mode the amount of light entering the camera is fixed; what is controlled is the frame's exposure time. The mechanism that allows us to do this is an electronic shutter implemented in the image sensor itself.

The next video is an example of moving from bright to dark. Visually, we don't observe as dramatic an improvement as in the previous case. But, as the graphs below show, the controller took 4 seconds to converge with the normal PID and 2 seconds with the improved version.

It is also worth mentioning that the shutter model is linear, so we don't observe an overshoot as in the previous case.

Figures 6 and 7 were captured when the above video sequences were taken. Note that the graphs show a little more than the video sequences. The graphs include moving the camera from dark to bright, which is the point where the red line goes up suddenly.

Fig. 6 - Shutter control with standard PID error

Fig. 7 - Shutter control with modified spline error

Interprocess RPC generation tool

2012-10-19T07:35:00.001-07:00

Introduction

This post discusses a methodology to create a Remote Procedure Call interface for process-to-process intercommunication, that is, to communicate between two processes on the same host.
The general solution for the problem is summarized here as a design pattern. A tool to automatically generate the RPC stub code, called irpcgen, is presented as well.

The Problem

We have two processes in an embedded system, let's call them H and I.

H - is a hard real-time process with strict deadlines. This can be one or more control loops or an acquisition system, for example.
I - controls the operation of H and performs other non time sensitive tasks. It may implement the user interface, operational logging, etc., but its main task is to create and watchdog H.

The question that arises is, what's the best approach to create a communication channel between H and I? More specifically, we want to answer two questions:

1 - Which IPC mechanism is best suited for the task?
2 - How to send and receive structured information over this channel?

The Solution

To answer 1. I created a small set of programs to benchmark several alternative IPC mechanisms in Linux. See my previous posts on the subject: Embedded Linux Interprocess Communication Mechanisms Benchmark - Part 2

With this data in hand, it occurred to me that a natural channel would be two pipes. I have used this approach on other occasions, but I had never considered using unnamed pipes for the task. That's what I propose here: to use a pair of unnamed pipes connected to stdin and stdout. This is the natural thing to do, as our controlling process I is the one forking H, and H has only a single controller attached to it. This way we create a two-way process-to-process IPC channel. Now we have to be able to send and receive structured data through it.

My requirements for the data exchange mechanism were:
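The channel setup can be sketched in C with the standard POSIX calls `pipe()`, `fork()` and `dup2()`. This is a minimal illustration and not code from libirpc: `spawn_worker()` and `echo_worker()` are hypothetical names, and the worker function stands in for process H (a real system would typically `exec` the real-time program instead of calling a function).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Demo worker standing in for process H: echoes stdin to stdout. */
static void echo_worker(void)
{
    char buf[64];
    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
        if (write(STDOUT_FILENO, buf, (size_t)n) != n)
            break;
    }
}

/* Fork a worker whose stdin/stdout are the two unnamed pipes.
 * Returns the child's pid and stores the parent's ends, or -1 on error. */
pid_t spawn_worker(void (*worker)(void), int *to_child, int *from_child)
{
    int down[2], up[2]; /* down: I -> H, up: H -> I */
    assert(worker != NULL && to_child != NULL && from_child != NULL);
    if (pipe(down) < 0)
        return -1;
    if (pipe(up) < 0) {
        close(down[0]); close(down[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(down[0]); close(down[1]);
        close(up[0]); close(up[1]);
        return -1;
    }
    if (pid == 0) {
        /* child (H): the pipe ends become stdin and stdout */
        dup2(down[0], STDIN_FILENO);
        dup2(up[1], STDOUT_FILENO);
        close(down[0]); close(down[1]);
        close(up[0]); close(up[1]);
        worker();
        _exit(0);
    }
    /* parent (I): keep only its own ends of the two pipes */
    close(down[0]);
    close(up[1]);
    *to_child = down[1];
    *from_child = up[0];
    return pid;
}
```

The parent writes requests to `to_child` and reads replies from `from_child`. Closing `to_child` makes the worker see end-of-file on its stdin, which is a simple way of asking it to terminate; the parent should then reap it with `waitpid()`.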

Simple to program and extend
The communication has to be synchronous
The programming interface has to be at high level, RPC like

First thing to do was to transform an asynchronous channel into a synchronous one. To do this, a protocol with a small overhead was introduced. It defines a framing structure to delimit the message boundaries and a scheme to multiplex different message types. It also introduces control messages for synchronization and link management.

Next step was to create a way of encapsulating C structures into the messages and labeling them in order to be able to demultiplex on reception. No marshalling is necessary because both processes are on the same host. This involved defining, for each message, a function to be called to transmit it and a corresponding callback to be invoked on reception.

This can be done manually. As a matter of fact I did just that in the first product developed with this approach. It was also a way of validating the strategy without incurring too much tooling effort.

But for this to be generally useful, a tool to automatically generate the code was needed.
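A framing layer of this kind can be sketched as follows. The actual libirpc frame format is not described in this post, so the layout used here (one type byte followed by a 16-bit payload length in host byte order) and the names `frame_encode()` and `frame_decode()` are assumptions made purely for illustration. Host byte order is acceptable only because both processes run on the same machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed frame layout: [type:1][len:2, host order][payload:len] */
#define FRAME_HDR_LEN 3

/* Build a frame in 'out'; returns the frame size, or -1 if it won't fit. */
int frame_encode(uint8_t *out, size_t cap, uint8_t type,
                 const void *payload, uint16_t len)
{
    assert(out != NULL && (payload != NULL || len == 0));
    if (cap < (size_t)FRAME_HDR_LEN + len)
        return -1;
    out[0] = type;
    memcpy(&out[1], &len, sizeof len);
    if (len > 0)
        memcpy(&out[FRAME_HDR_LEN], payload, len);
    return FRAME_HDR_LEN + len;
}

/* Try to extract one frame from the bytes received so far.
 * Returns the number of bytes consumed, or 0 if the frame is incomplete. */
size_t frame_decode(const uint8_t *in, size_t avail, uint8_t *type,
                    const uint8_t **payload, uint16_t *len)
{
    uint16_t n;
    assert(in != NULL && type != NULL && payload != NULL && len != NULL);
    if (avail < FRAME_HDR_LEN)
        return 0;
    memcpy(&n, &in[1], sizeof n);
    if (avail < (size_t)FRAME_HDR_LEN + n)
        return 0;
    *type = in[0];
    *payload = &in[FRAME_HDR_LEN];
    *len = n;
    return (size_t)FRAME_HDR_LEN + n;
}
```

Since a pipe is a byte stream, a `read()` may return only part of a frame. The receiver keeps accumulating bytes and calls `frame_decode()` until it returns a non-zero size, which is how the message boundaries are recovered.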
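The pairing of a transmit function with a reception callback can be pictured with this hand-written sketch, roughly the kind of code one would write before having a generator. All the names (`my_req`, `MSG_MY_REQ`, `msg_register()`, `msg_dispatch()`, `my_req_pack()`, `on_my_req()`) are hypothetical, and the structure bytes are copied as they are, which is valid only because sender and receiver share the same host and ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message body, shared by both processes via a header. */
struct my_req { int32_t channel; int32_t value; };

typedef void (*msg_callback)(const void *body, size_t len);

enum { MSG_MY_REQ = 1, MSG_TYPE_MAX = 8 }; /* message labels */

static msg_callback callbacks[MSG_TYPE_MAX];

/* Receiver side: bind a callback to a message label. */
int msg_register(uint8_t type, msg_callback cb)
{
    if (type >= MSG_TYPE_MAX)
        return -1;
    callbacks[type] = cb;
    return 0;
}

/* Receiver side: demultiplex on the label carried by the frame. */
int msg_dispatch(uint8_t type, const void *body, size_t len)
{
    assert(body != NULL || len == 0);
    if (type >= MSG_TYPE_MAX || callbacks[type] == NULL)
        return -1;
    callbacks[type](body, len);
    return 0;
}

/* Sender side: raw bytes of the structure, no marshalling. */
size_t my_req_pack(const struct my_req *req, uint8_t *out, size_t cap)
{
    assert(req != NULL && out != NULL);
    if (cap < sizeof *req)
        return 0;
    memcpy(out, req, sizeof *req);
    return sizeof *req;
}

/* Example callback: keeps a copy of the last request received. */
static struct my_req last_req;
static void on_my_req(const void *body, size_t len)
{
    if (len == sizeof last_req)
        memcpy(&last_req, body, len);
}
```

Writing one such pack function and one callback per message by hand quickly becomes repetitive, which is the motivation for generating them from the function declarations.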

The irpcgen tool

To make the development easy I created a tool to generate the stubs for the server and client as well as sample server service calls. The program works pretty much like the SUN RPC rpcgen tool, except that instead of reading an RPCL input file (.x) it reads a standard C header (.h) file. This is to super simplify things. You just need to write your API in a header file and use the functions on the client side. The implementation of the functions will be on the server side.

The irpcgen will read the header file and create stubs for all declared functions that can be used as RPCs. These functions represent the server's API and have to follow some rules:

1 - The return has to be a bool type;
2 - There must be at most 2 arguments to the function;
3 - It cannot be declared static;
4 - If a second argument is provided it has to be a pointer to something except a void pointer;

Functions that fail to conform to any of these rules are not considered IRPC and no stub will be generated for them.

Furthermore the direction of the data transmission will be derived by the position and type of the arguments. The following cases are possible:

No arguments

Sends nothing, returns nothing (but invokes the corresponding callback on the server side).

bool my_rpc(void);

Single argument passed by value.

This is a server input value. This is more or less obvious, as the client cannot read anything back.

bool my_rpc_set(int val);
bool my_rpc_set(struct my_req req);

Single argument passed by constant reference.

This is similar to the previous case: the single argument is a server input value.

bool my_rpc_set(const struct my_req * req);

Single argument passed by reference.

In this case the argument is a return value from the server. The client should provide a pointer to a variable that will receive the data.

bool my_rpc_get(int * val);
bool my_rpc_get(struct my_rsp * rsp);

Two arguments

In this case the first argument is an input value and the second a return value from the server. The client should provide a pointer to a variable that will receive the data. Note that the second argument must be a non constant reference. The first argument can be of any type, except a void pointer (void *).

bool my_rpc_set_and_get(struct my_req * req, struct my_rsp * rsp);
bool my_rpc_set_and_get(const struct my_req * req, struct my_rsp * rsp);
bool my_rpc_set_and_get(struct my_req req, struct my_rsp * rsp);
bool my_rpc_set_and_get(int req, int * rsp);

Strings

If any argument is passed as a char pointer (char *) it will be treated as a NULL terminated string. The rule for a single argument is the same as for a reference, i.e. if it's declared as const it represents a client to server message, and the direction is reversed for non-const strings.

bool my_rpc_set_and_get(char * req, char * rsp);
bool my_rpc_set_and_get(const char * req, char * rsp);


bool my_rpc_set(const char * req);
bool my_rpc_get(char * rsp);

Service calls

The irpcgen tool will optionally create a ".h" file with "_svc" appended to the input file name. E.g. if the input is "my_rpc.h" the generated file will be "my_rpc_svc.h". The file will contain the signatures of the services to be implemented.

The file:

bool my_rpc_get(int * val);
bool my_rpc_set(int val);

Will produce:

bool my_rpc_get_svc(int * val);
bool my_rpc_set_svc(int val);

The "_svc" functions must implement the server behaviour. Optionally a "*_svc.c" file can be created with dummy functions. All you need to do is fill in the bodies of these functions to have a functional RPC system.
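As an illustration, filling in the two service functions from the example above could look like the sketch below. The variable `stored_val` and the behaviour (a simple stored integer) are invented for the example, and it is an assumption here that the bool return value reports success to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Server-side state touched by the two services (illustrative only). */
static int stored_val;

/* Service for: bool my_rpc_set(int val); the value comes from the client. */
bool my_rpc_set_svc(int val)
{
    stored_val = val;
    return true; /* assumed to mean "call succeeded" */
}

/* Service for: bool my_rpc_get(int * val); the value goes to the client. */
bool my_rpc_get_svc(int *val)
{
    assert(val != NULL); /* the stub is assumed to pass a valid pointer */
    *val = stored_val;
    return true;
}
```

On the client side the code simply calls `my_rpc_set(42)` and `my_rpc_get(&v)` as if they were local functions; the generated stubs and the library take care of carrying the data across the pipes.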

The libirpc

The libirpc is the companion of the irpcgen. The generated code depends on this library to run.

Source code

The irpcgen tool is GPL open-source and can be downloaded from: irpcgen.tar.gz

The package also contains the libirpc and a sample. The library is LGPL licensed.

There is a Makefile in the directories irpcgen, libirpc and sample. You need to compile irpcgen and libirpc before compiling the test.

If you want to cross-compile the library and the sample for an embedded platform, set the environment variable CROSS_COMPILE to the prefix of your tool-chain, e.g. export CROSS_COMPILE=arm-gnu-linux-.

YARD-ICE goes Open Source

2012-10-12T15:43:00.001-07:00

YARD-ICE

YARD-ICE stands for Yet Another Remote Debugger - In Circuit Emulator. It is a hardware and software platform I recently made public at Google Code. The project goal is to design the hardware and software of a JTAG tool to program and debug ARM microcontrollers. The target audience includes developers of deep embedded systems with shallow pockets.

Link to the Project: YARD-ICE on Google Code

Why Another JTAG Tool?

There are tons of tools on the market. Why another one? There are three main reasons:

- Performance. Some basic, low-cost tools available on the market are really slow. One of the main reasons is that low-level operations are performed by the host PC, and the USB round trip is the one to blame. YARD-ICE solves this problem with an FPGA handling the serialization and other bit handling.
- Support for Linux/Mac platforms. Most ICE hardware lacks decent support for non-Windows platforms. There are some exceptions, but those are expensive tools with TCP/IP support. YARD-ICE is a TCP/IP based tool with an embedded GDB server. It's designed to work with any IDE supporting GDB, like Eclipse.
- Flexibility. Some tools are OK for some processors, but their best performance is tied to a certain proprietary tool. Scripting is not always an option, and when the possibility exists it's some obscure language or API with Windows DLL dependencies, and too slow. Why not write a simple shell or Python script on the host to automate a test or to program your systems in the factory? YARD-ICE provides a simple csh-like scripting capability, and you can run small scripts remotely through an ordinary TCP connection. And better than this: if you don't like the way we do things, or want to customize your tool, no worries. It's LGPL open source, meaning that you have what you need to do just that.

Apart from that I really like bit scrubbing. It's a good way of knowing the processor cores in depth.

Unix Select+Timers

2012-10-05T12:57:00.000-07:00

Introduction

When developing real-time network protocols and other time-sensitive embedded systems, it is common to have to read from one or more file descriptors while keeping track of various timeouts at the same time.

This post discusses a method to implement timers and file descriptor polling in a single loop. It uses very few resources and is relatively fast for a small number of timers and file descriptors. These conditions are usually met in embedded systems, where a device is either not allowed or not expected to serve too many clients.

The solution is fairly portable among UNIX-like OSs as it uses POSIX calls. I won't claim that this is the best method to do it, but I have to say it has been successfully used in some time-sensitive protocol implementations.

Timers

To implement the timers we have to keep track of time, which is done by a clock. The clock is a monotonic counter obtained through the clock_gettime() call with the CLOCK_MONOTONIC clock id. It usually represents the time elapsed since system start-up. The reference or epoch of the clock doesn't really matter; the important thing is that it is not affected by jumps in the system time, such as the wall clock being set by hand or stepped by NTP.

The timers are represented in the same way the clock is. Active timers (not expired) have their times set in the future. Timers with a time in the past are expired and are considered inactive. We compare the clock value with the timer values to determine when a timer expires, so that an appropriate action can be taken. One approach is to associate callback functions with the timers.

To improve performance on 32-bit systems, we use only the low 32 bits of the value for the clock and timers, so the time wraps every 4294 seconds, or about 71 minutes. That means that, to unequivocally determine whether a timer's timeout is in the future, the timeout should be at most half of the wrapping value, or about 35.8 minutes. This is more than enough for most real-time applications. If this is not your case, consider using a millisecond clock reference (see below).
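The half-range rule can be sketched as a pair of small helpers. The names timer_deadline, timer_expired and TMR_TMO_MAX_US are made up for this illustration; the comparison itself is the same signed-difference test used in the polling loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest timeout that can be told apart from a time in the past:
   half of the 32-bit range, about 2147 seconds in microseconds. */
#define TMR_TMO_MAX_US 0x7fffffffu

/* Absolute expiration time for a timeout starting at 'now'.
   The addition is allowed to wrap around 2^32. */
static uint32_t timer_deadline(uint32_t now, uint32_t tmo_us)
{
    assert(tmo_us <= TMR_TMO_MAX_US);
    return now + tmo_us;
}

/* True when the deadline is at or before 'now'. The unsigned
   subtraction wraps modulo 2^32; reading the result as a signed
   value gives the distance to the deadline even across a wrap. */
static bool timer_expired(uint32_t deadline, uint32_t now)
{
    return (int32_t)(deadline - now) <= 0;
}
```

For example, a 32 microsecond timeout started at clock value 0xfffffff0 gets the deadline 0x00000010, which is numerically smaller than the clock but is still correctly seen as being in the future.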

To set up a timer timeout it's just a matter of adding the timeout time in microseconds to the clock. In the example below we use a value of 0 to represent an inactive timer, so if the value of the clock plus the timeout wraps to 0 we add 1 microsecond to avoid this condition. Other, more elaborate methods can be used, but this one has the advantage of avoiding an extra memory reference when polling the timers.

Polling

The idea is to use the select() system call to poll the file descriptors, adjusting the timeout parameter according to the expiration time of the timers. We compare all the timers with the current clock and select the smallest difference, greater than zero, between the expiration time and the current time.
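One detail when handing that difference to select(): the tv_usec field is meant to hold less than one second, and larger values are not guaranteed to be accepted on every system. A minimal sketch of the conversion, with the hypothetical helper name us_to_timeval, could look like this:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

#define ONE_SECOND 1000000

/* Split a non-negative interval in microseconds into the seconds and
   microseconds fields of a struct timeval. */
static struct timeval us_to_timeval(int32_t dt_us)
{
    struct timeval tv;

    assert(dt_us >= 0);
    tv.tv_sec = dt_us / ONE_SECOND;
    tv.tv_usec = dt_us % ONE_SECOND;
    return tv;
}
```

The one minute default used in the loop then becomes 60 seconds and 0 microseconds instead of 60000000 microseconds.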

The select() system call has the advantage of being fast and conservative in its resource usage for a small number of file descriptors. The cost of the call depends less on the number of file descriptors than on the value of the highest file descriptor in the set. Another advantage of select() is portability.

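#include <errno.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <sys/select.h>

#define ONE_SECOND 1000000
#define TMR_MAX 8
#define FD_MAX 8

/* application callbacks, implemented elsewhere
   (signatures assumed from the calls below) */
void on_timeout(unsigned int id);
void on_recv(int fd);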

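/* get the system monotonic clock value in microseconds,
   truncated to 32 bits. */
static uint32_t get_clock_us(void)
{
    struct timespec tv;
    clock_gettime(CLOCK_MONOTONIC, &tv);
    /* do the multiplication in unsigned 32 bits so that it wraps
       instead of overflowing a signed type */
    return ((uint32_t)tv.tv_sec * 1000000u) + (uint32_t)(tv.tv_nsec / 1000);
}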

/* the maximum timer timeout allowed is 
   2147 seconds ~ 35 minutes */

uint32_t tmr[TMR_MAX]; /* List of timers */
unsigned int tmr_cnt; /* Number of timers in the list */

int fd[FD_MAX]; /* List of file descriptors */
unsigned int fd_cnt; /* Number of descriptors in the list */

static void * my_task(void * arg)
{
    struct timeval tv;
    uint32_t clock;
    int32_t dt_min; /* shortest time until a timer expires */
    int fd_max;
    fd_set rs;
    int ret;
    unsigned int i;

    (void)arg;


    for (;;) {
        /* get the current time in microseconds */
        clock = get_clock_us();

        /* clear the fd set */
        FD_ZERO(&rs);
        /* initialize dt_min to 1 minute */
        dt_min = 60 * ONE_SECOND;
        /* initialize fd_max */
        fd_max = 0;

        for (i = 0; i < tmr_cnt; i++) {
            int32_t dt;
            if (tmr[i] == 0) /* timer is inactive */
                continue;
            if ((dt = (int32_t)(tmr[i] - clock)) <= 0) {
                /* timer timeout: mark it inactive first,
                   so that it fires only once */
                tmr[i] = 0;
                on_timeout(i);
            } else if (dt < dt_min) {
                /* adjust the minimum timeout time */
                dt_min = dt;
            }
        }

        /* keep tv_usec below one second for portability */
        tv.tv_sec = dt_min / ONE_SECOND;
        tv.tv_usec = dt_min % ONE_SECOND;

        for (i = 0; i < fd_cnt; i++) {
            if (fd[i] != -1) {
                FD_SET(fd[i], &rs);
                if (fd[i] > fd_max)
                    fd_max = fd[i];
            }
        }


        ret = select(fd_max + 1, &rs, NULL, NULL, &tv);

        if (ret < 0) {
            if (errno == EINTR) /* select() interrupted */
                continue;
            /* select() failed */
            return NULL;
        }

        for (i = 0; i < fd_cnt; i++) {
            if ((fd[i] != -1) && FD_ISSET(fd[i], &rs)) {
                /* read from the file descriptor */
                on_recv(fd[i]);
            }
        }
    }
}

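void timer_set(unsigned int id, unsigned int tmo_us)
{
    /* the deadline is the current clock plus the timeout */
    tmr[id] = get_clock_us() + tmo_us;
    if (tmr[id] == 0) /* 0 is reserved for inactive timers */
        tmr[id]++;
}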

I've tried to keep the example as short as possible, so the structure is far from ideal in terms of encapsulation. If the list of timers or file descriptors changes dynamically, a mutual exclusion mechanism should be implemented as well, to avoid race conditions when evaluating the timers or the file descriptors.
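One possible shape for that mutual exclusion, assuming POSIX threads are available, is a mutex around the timer list. The function names timer_set_locked and timer_snapshot are invented for this sketch, and the current clock value is passed in as a parameter to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define TMR_MAX 8

static uint32_t tmr[TMR_MAX]; /* shared list of timers */
static pthread_mutex_t tmr_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Called from any thread: arm timer 'id' to expire 'tmo_us'
   microseconds after 'now'. */
void timer_set_locked(unsigned int id, uint32_t now, uint32_t tmo_us)
{
    uint32_t t = now + tmo_us;

    assert(id < TMR_MAX);
    if (t == 0) /* 0 is reserved for inactive timers */
        t++;
    pthread_mutex_lock(&tmr_mutex);
    tmr[id] = t;
    pthread_mutex_unlock(&tmr_mutex);
}

/* Called by the polling loop: take a consistent copy of the list so
   the timers can be evaluated without holding the lock. */
void timer_snapshot(uint32_t copy[TMR_MAX])
{
    pthread_mutex_lock(&tmr_mutex);
    memcpy(copy, tmr, sizeof(tmr));
    pthread_mutex_unlock(&tmr_mutex);
}
```

Keep in mind that a timer armed while the loop is blocked in select() is only noticed on the next pass through the loop, so some way of waking the loop up may still be needed.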

Minor improvements

It may be a good idea to avoid arithmetic divisions on platforms that don't have a hardware divide instruction, like ARMv4 and ARMv5 (ARM7 to ARM9). This will improve the performance a little bit. The following code is an alternative to the original clock function that uses sums of shifts to approximate the division by 1000 when calculating the number of microseconds.

static inline uint32_t get_sys_clock_us(void)
{
   struct timespec ts;

   clock_gettime(CLOCK_MONOTONIC, &ts);
   /* This is a fast, no division, good approximation to:
      tv_nsec / 1000. The maximum error is 74 microseconds.
      It costs only 5 instructions on ARMv5 */
   return (ts.tv_sec * 1000000) + (ts.tv_nsec >> 10) +
      (ts.tv_nsec >> 15) - (ts.tv_nsec >> 17) + (ts.tv_nsec >> 21);
}
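The error of the shift-and-add expression can be checked on a host machine by comparing it against a real division over the valid tv_nsec range. The helper names below are made up for this check, and the scan uses a step instead of visiting every value, so it gives an estimate rather than a proof.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add approximation of ns / 1000, as used in the clock
   function above. */
static uint32_t div1000_approx(uint32_t ns)
{
    return (ns >> 10) + (ns >> 15) - (ns >> 17) + (ns >> 21);
}

/* Scan the range 0..999999999 with the given step and return the
   largest absolute difference from the exact quotient. */
static uint32_t div1000_max_error(uint32_t step)
{
    uint32_t max_err = 0;
    uint32_t ns;

    assert(step > 0 && step <= 1000000000u);
    for (ns = 0; ns < 1000000000u; ns += step) {
        uint32_t exact = ns / 1000;
        uint32_t approx = div1000_approx(ns);
        uint32_t err = (exact > approx) ? exact - approx : approx - exact;

        if (err > max_err)
            max_err = err;
    }
    return max_err;
}
```

The error grows with the nanoseconds value, so the worst cases sit near the end of each second. The clock therefore advances slightly unevenly when the seconds counter rolls over, which is worth remembering when timing very short intervals.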

If timers longer than 35 minutes are needed, the clock function can be modified to count in milliseconds instead of microseconds. With a millisecond clock the 32-bit counter wraps after about 49.7 days, so timeouts of up to roughly 24.8 days can be handled. Below is a non-division implementation of the clock function, followed by the conversion needed to set up the timeval struct:

static uint32_t get_clock_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    /* This is a fast, no division, coarse approximation to:
       tv_nsec / 1000000. It underestimates by about 2.6%,
       i.e. by roughly 27 ms near the end of each second. */
    return (ts.tv_sec * 1000) + (ts.tv_nsec >> 20) +
        (ts.tv_nsec >> 25) - (ts.tv_nsec >> 26) +
        (ts.tv_nsec >> 28) + (ts.tv_nsec >> 29);
}

... 

/* dt_min is in milliseconds here */
tv.tv_sec = dt_min / 1000;
tv.tv_usec = (dt_min % 1000) * 1000;

The Espresso Machine

2012-02-10T14:07:00.000-08:00

This tale begins in Brazil, in winter time. I mean winter in the northern hemisphere. Naturally it was summer in South America, where we fled to escape the peak of Canada's cold (it turns out that the winter was not that bad this year). Anyway, my wife and I were on vacation visiting our relatives there. While my wife went to the north-east part of the country, I had to go to the capital of Minas Gerais state, the city of Belo Horizonte. That is where my younger sister has been living.

I won't say that I do not appreciate a good espresso coffee, but I'm more of a tea kind of guy. Even someone as inexperienced as I am has to admit that there is something rather pleasant in the taste of a good coffee extracted by a skilled barista. That was surely the case when we went to a coffee shop called Kahlúa. On the recommendation of my brother-in-law, as well as my sister, I tasted two 'single origin' coffees ('single origin' being the opposite of 'blends', as I learned from them). The first one is called Araponga and the other one Sul-de-Minas Especial (South of Minas Gerais Special); to be more precise, we tasted the latter first. I may fail to describe the sensation of smelling the 'exquisite' aroma, a mix of the brew and the freshly roasted beans. They were roasting the coffee while we were at the store. All that I can say is that the coffees were amazing, neither bitter nor sour, just perfect. So much so that I couldn't help myself but buy two packets right away. One for myself, my wife and dog (you have to know the dog to understand), and the other one for a couple of friends who were 'dog sitting' our little cockapoo. It is worth mentioning that the beans were medium roasted, packed and sealed while we were in the store. This helps preserve most of the characteristics of the coffee, I suppose.

All very well, except for the fact that we had neither the grinder to get a coffee powder nor the espresso machine to brew it into something worth drinking. On returning to Toronto, the first thing I did was to look for machines and learn a little bit about the art of espresso making. Well, there is a plethora of ways to brew coffee and a lot of different types of machines to make espresso variants. The choice of a particular type of machine will depend, as we learned, on how much you want to be involved in the process of coffee making. It can range from completely manual to fully automated. In some matters, such as food and beverages, I like to be in control of the preparation whenever possible, or at least be part of it. Although I don't classify myself as a gourmet, I like to fancy myself a reasonable cook. So I decided to venture into this new endeavour of espresso making.

After some googling around, I settled on the Rancilio Silvia espresso machine and the Baratza Vario grinder. The main reasons were the good reviews of both machines on several sites like CoffeeGeek (http://coffeegeek.com/reviews/consumer/rancilio_silvia, http://coffeegeek.com/proreviews/firstlook/baratzavario/details), and the fact that the bundle was within the budget we had available. I located a store (http://www.espressoplanet.com/) in Mississauga (a city near Toronto) which had this particular combination in a promotional package, along with some accessories and 1 kg of coffee beans. The first Saturday after arriving home, we went there. I must say that I was very impressed by the store, which turned out to be much larger than I expected. The person who took care of us there was very kind and knowledgeable. We had the opportunity to test the machines on the spot, clarify some doubts and taste coffees. Needless to say, we bought the package and other stuff we deemed necessary to complete the espresso experience. These included: a calibrated tamper, a knock box, some 'vacuum' sealed containers for the beans and a new water filter. In the picture below you can see how the two machines are happily installed in our dining room.

Rancilio Silvia and Baratza Vario

ARM-GCC Toolchain How-To

2012-02-10T11:52:00.000-08:00

Once in a while I have to compile the GCC toolchain (Binutils, GCC, GDB) for a new platform, either because I want some new feature, or because of a bug fix, or after installing a new operating system. As I don't do this often, I always have trouble remembering some steps. That's why I'm posting it here.

Before you go any further I want to point out that we will not cover here how to compile the C++ compiler (g++), as this requires the compilation of a runtime library and is a little more challenging. Only the C language will be supported, and no C library (libc) will be generated either. This will be, for sure, a limiting factor for almost everybody except those who are developing system software.

This tutorial will explain how to compile a cross GCC toolchain for ARM processors on an Ubuntu 10.04 LTS host machine. It will probably work fine on other Ubuntu releases as well, but please be aware that there is a good chance of these procedures failing if you intend to use a different set of OS and source code (other versions of GCC, binutils or GDB).

So there we go. First of all, let's get the packages:

Downloading the source code

cd /tmp
wget ftp://ftp.gnu.org/pub/gnu/binutils/binutils-2.22.tar.bz2
wget ftp://ftp.gnu.org/pub/gnu/gcc/gcc-4.6.2/gcc-core-4.6.2.tar.bz2
wget ftp://ftp.gnu.org/pub/gnu/gdb/gdb-7.4.tar.bz2

Now let's prepare the environment to compile and install. I usually install the tools in a subdirectory under the /opt directory. In this case we will be installing in the /opt/arm-none-eabi directory. The binaries (programs such as gcc and gdb) will be located in the /opt/arm-none-eabi/bin subdirectory and will be prefixed by "arm-none-eabi-" (arm-none-eabi-as, arm-none-eabi-gcc, ...).

Installing development libraries

sudo apt-get install libmpfr-dev libgmp3-dev libmpc-dev
sudo apt-get install libz-dev

The first line installs the MPFR, GMP and MPC development libraries. GMP and MPFR have been required to compile GCC since version 4.3, and MPC since version 4.5.
The last line adds the zlib development package, as you may get an error when compiling the zlib provided with GCC.

Creating a build tree

Assuming that all the source code files were downloaded to the /tmp directory, and that we will compile in our home directory:

cd
mkdir gcc-toolchain
cd gcc-toolchain
bzip2 -dc /tmp/binutils-2.22.tar.bz2 | tar -vxf -
bzip2 -dc /tmp/gcc-core-4.6.2.tar.bz2 | tar -vxf -
bzip2 -dc /tmp/gdb-7.4.tar.bz2 | tar -vxf -
mkdir arm-none-eabi
cd arm-none-eabi
mkdir binutils-2.22
mkdir gcc-4.6.2
mkdir gdb-7.4
export PATH=/opt/arm-none-eabi/bin:/bin:/usr/bin

The last line will set up the PATH for the compilation. Notice that the first entry (/opt/arm-none-eabi/bin) does not exist yet, but it will be created when installing the binutils and will be necessary for compiling GCC.

Compiling GNU binutils

First let's do the basics: assembler, archiver, linker and object files utilities.

cd binutils-2.22
../../binutils-2.22/configure --prefix=/opt/arm-none-eabi --target=arm-none-eabi --disable-nls
make -j 8
sudo make install
cd ..

Compiling GCC

If everything went well, we are good to compile the cross-compiler. To make sure check the /opt/arm-none-eabi/bin directory, all the "arm-none-*" family of binutils must be there.

cd gcc-4.6.2
../../gcc-4.6.2/configure --prefix=/opt/arm-none-eabi --target=arm-none-eabi --disable-nls --disable-libssp --disable-zlib --enable-languages="c"
make -j 8
sudo make install
cd ..

GCC is up, let's see if it's running:

$ arm-none-eabi-gcc
arm-none-eabi-gcc: fatal error: no input files
compilation terminated.

If you got that message your compiler is fine.
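A slightly stronger check is to compile a small source file. Since this toolchain has no C library, the file should not include any libc headers. The file name smoke.c and the function below are only an example; a command along the lines of arm-none-eabi-gcc -ffreestanding -c smoke.c -o smoke.o should then produce an object file without errors.

```c
/* smoke.c: no headers and no C library calls, so it builds with a
   C-only cross compiler that has no libc installed. */

/* Sum the bytes of a buffer, truncated to 8 bits. */
unsigned char checksum8(const unsigned char * buf, unsigned int len)
{
    unsigned char sum = 0;
    unsigned int i;

    for (i = 0; i < len; i++)
        sum = (unsigned char)(sum + buf[i]);

    return sum;
}
```

Running arm-none-eabi-objdump -d smoke.o afterwards should show ARM instructions, which also confirms that the binutils were picked up correctly.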

Compiling GDB
As an optional step, you can compile GDB. This will allow you, with the right tool, to remotely debug your embedded application.

cd gdb-7.4
../../gdb-7.4/configure --prefix=/opt/arm-none-eabi --target=arm-none-eabi --disable-nls
make -j 8
sudo make install
cd ..

Update your PATH

You have to include the newly created toolchain bin directory into your PATH environment. Edit .bashrc, in your home directory, and add the following line:

export PATH=$PATH:/opt/arm-none-eabi/bin

For the changes to take effect you will have to restart the terminal or source your .bashrc with:

$ source ~/.bashrc

/!\ Attention: the -j 8 parameter in the make line allows for parallel building, which speeds up the compilation process quite a lot. But, from my experience, I recommend not using -j alone (without a number), as this may result in a non-responsive computer, and sometimes the compilation itself or some other applications may crash. For that reason always set the number of tasks to match the number of cores or threads your machine has. For example, I'm using an Intel Core i7 with 4 cores and 2 threads per core (Intel Hyper-Threading), so I use -j 8.

Other tips

GMP and MPFR

On some operating systems the GMP and MPFR libraries required to compile GCC are outdated. To solve this, we download the source code and tell the configure script where to find it:

ftp://ftp.gnu.org/pub/gnu/gmp/gmp-4.3.2.tar.bz2
http://www.mpfr.org/mpfr-current/mpfr-3.1.0.tar.bz2

bzip2 -dc /tmp/gmp-4.3.2.tar.bz2 | tar -vxf -
bzip2 -dc /tmp/mpfr-3.1.0.tar.bz2 | tar -vxf -

Newlib

Depending on what your projects are you most probably will need a C library. NewLib is a good option and you can compile it along with GCC.
Quoting from the Newlib's website: "Newlib is a C library intended for use on embedded systems. It is a conglomeration of several library parts, all under free software licenses that make them easily usable on embedded products."

cd /tmp
wget ftp://sources.redhat.com/pub/newlib/newlib-1.20.0.tar.gz
cd ~/gcc-toolchain
gzip -dc /tmp/newlib-1.20.0.tar.gz | tar -vxf -

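In the GCC compilation step you need to tell the configure script that you want to use NewLib as your default C library: --with-newlib
As NewLib provides support for building the run-time elements of C++, we can enable C++ in the GCC compilation as well: --enable-languages="c,c++". Note that the gcc-core tarball contains only the C front end, so enabling C++ also requires the matching gcc-g++ tarball (or the full gcc tarball).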

cd gcc-4.6.2
../../gcc-4.6.2/configure --prefix=/opt/arm-none-eabi --target=arm-none-eabi --disable-nls --disable-libssp --disable-zlib --enable-languages="c,c++" --with-newlib --with-headers=../../newlib-1.20.0/newlib/libc/include 
make -j 8
sudo make install
cd ..

Now you can compile the library:

mkdir newlib-1.20.0
cd newlib-1.20.0
../../newlib-1.20.0/configure --prefix=/opt/arm-none-eabi --target=arm-none-eabi
make -j 8
sudo make install
cd ..

NewLib Notes

If you want to use NewLib to do I/O, dynamic memory, file operations and some other functions, you will need to create an OS adaptation layer. I may write a tutorial on the subject one of these days.
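To give an idea of what such a layer involves, here is a minimal sketch of two of the stubs NewLib expects the system to provide: _sbrk(), used by malloc(), and _write(), used by the output functions. The static heap array, its size and the uart_putc() hook are assumptions made for the example, and the exact prototypes should be checked against the NewLib version in use.

```c
#include <errno.h>
#include <stddef.h>

#define HEAP_SIZE 4096 /* assumed heap size */

static char heap[HEAP_SIZE]; /* stand-in for a linker-defined heap */
static size_t heap_used;

/* Hypothetical board hook: replace the body with the real UART
   transmit routine. Here it only remembers the last character. */
static char uart_last;
static void uart_putc(char c)
{
    uart_last = c;
}

/* Called by malloc() to grow (or shrink) the heap by 'incr' bytes.
   Returns the previous end of the heap, or -1 when out of memory. */
void * _sbrk(ptrdiff_t incr)
{
    char * prev = heap + heap_used;

    if (incr > (ptrdiff_t)(HEAP_SIZE - heap_used) ||
        incr < -(ptrdiff_t)heap_used) {
        errno = ENOMEM;
        return (void *)-1;
    }
    heap_used = (size_t)((ptrdiff_t)heap_used + incr);
    return prev;
}

/* Called by printf() and friends: send every byte to the console. */
int _write(int file, char * ptr, int len)
{
    int i;

    (void)file; /* every descriptor goes to the same UART here */
    for (i = 0; i < len; i++)
        uart_putc(ptr[i]);
    return len;
}
```

A complete layer usually needs a few more stubs of the same kind (for reading, closing and querying files), most of which can simply return an error on a system without a file system.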

Updates

I've tried the compilation sequence on my netbook running Lubuntu 12.04 and it worked like a charm. If you have an old, resource-limited computer, or simply can't swallow the new Gnome/Ubuntu interface, I really recommend Lubuntu: http://lubuntu.net/.

Embedded Linux Interprocess Communication Mechanisms Benchmark 2nd Part

2011-12-05T17:44:00.001-08:00

This is the second part of the benchmark of some IPC on Embedded Linux. See the previous post: Embedded Linux Interprocess Communication Mechanisms Benchmark - Part 1.

Source Code
The source code with the tests can be downloaded from here: ipc_bm.tar.gz.
To compile just adjust the variable CROSS_COMPILE in the main Makefile and do make all.

Results
The listings below show the output of the tests.
POSIX Message Queue

* IPC Benchmark start
 - POSIX mq server...
 - POSIX mq client...
 - Large message send test:
      0.52 secs,  3870.7 Msgs/s,  3870.7 KiB/s
 - Large message receive test:
      0.52 secs,  3858.2 Msgs/s,  3858.2 KiB/s
 - Medium message send test:
      0.50 secs,  4019.3 Msgs/s,   502.4 KiB/s
 - Medium message receive test:
      0.46 secs,  4386.2 Msgs/s,   548.3 KiB/s
 - Small message send test:
      0.50 secs,  4021.6 Msgs/s,    62.8 KiB/s
 - Small message receive test:
      0.45 secs,  4426.0 Msgs/s,    69.2 KiB/s
 - Event posting test:
      0.34 secs,  5918.5 Msgs/s,    23.1 KiB/s
* IPC Benchmark end.

Shared Memory

* IPC Benchmark start
 - POSIX shared memory server...
 - POSIX shared memory client...
 - Large message send test:
      0.82 secs,  2453.5 Msgs/s,  2453.5 KiB/s
 - Large message receive test:
      0.82 secs,  2447.1 Msgs/s,  2447.1 KiB/s
 - Medium message send test:
      0.80 secs,  2503.6 Msgs/s,   313.0 KiB/s
 - Medium message receive test:
      0.80 secs,  2494.6 Msgs/s,   311.8 KiB/s
 - Small message send test:
      0.79 secs,  2519.9 Msgs/s,    39.4 KiB/s
 - Small message receive test:
      0.80 secs,  2515.5 Msgs/s,    39.3 KiB/s
 - Event posting test:
      0.79 secs,  2540.8 Msgs/s,     9.9 KiB/s
* IPC Benchmark end.

ONC RPC

* IPC Benchmark start
 - ONC RPC Server...
 - ONC RPC Client...
 - Large message send test
   -    1.73 secs,  1156.8 Msgs/s,  1156.8 KiB/s
 - Large message receive test
   -    1.74 secs,  1148.4 Msgs/s,  1148.4 KiB/s
 - Medium message send test
   -    1.65 secs,  1209.0 Msgs/s,   151.1 KiB/s
 - Medium message receive test
   -    1.65 secs,  1211.2 Msgs/s,   151.4 KiB/s
 - Small message send test
   -    1.64 secs,  1219.2 Msgs/s,    19.0 KiB/s
 - Small message receive test
   -    1.66 secs,  1202.1 Msgs/s,    18.8 KiB/s
 - Event posting test
   -    0.25 secs,  7904.5 Msgs/s,    30.9 KiB/s
* IPC Benchmark end.
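The listings do not print the message sizes, but they can be estimated from the two rates on each line, since throughput equals message rate times message size. The helpers below are only a sketch of that arithmetic (the function names are made up). Applied to the figures above they suggest sizes of 1024, 128, 16 and 4 bytes for the large, medium, small and event messages; this is an inference from the numbers, not something stated in the test output.

```c
#include <assert.h>

/* Throughput in KiB/s for a given message rate and message size. */
static double kib_per_sec(double msgs_per_sec, unsigned int msg_bytes)
{
    return msgs_per_sec * msg_bytes / 1024.0;
}

/* Inverse: message size in bytes implied by a rate and a throughput,
   rounded to the nearest byte. */
static unsigned int implied_msg_bytes(double msgs_per_sec, double kib_s)
{
    assert(msgs_per_sec > 0.0);
    return (unsigned int)(kib_s * 1024.0 / msgs_per_sec + 0.5);
}
```

Since the message rate barely changes between the small and the large messages in all three mechanisms, the per-message overhead seems to dominate over the cost of copying the payload on this platform.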

Comments
In order to run all the tests you must be sure that the following options are enabled in the kernel:

General Setup:
  [*] POSIX Message Queues 
  ...
  Configure standard kernel features (for small systems) :
    [*]   Use full shmem filesystem  

File systems:
  ...
  Pseudo filesystems:
    ...
    [*] Virtual memory file system support (former shm fs)

<< Embedded Linux Interprocess Communication Mechanisms Benchmark - Part 1

Embedded Linux Interprocess Communication Mechanisms Benchmark

2011-12-04T12:20:00.000-08:00

Hi there,

this is my first attempt to blog, so please excuse me for not having or following any stylistic conventions for this kind of writing. As a matter of fact, writing is not something I do often. That being said, I'm open to any criticism regarding either misuses of the English language or errors/omissions in the information I will eventually present. So feel free to post comments and such.

This first post is intended to be the initial part of a benchmark test of some IPC (Interprocess Communication) mechanisms that I'm evaluating for use in a commercial product. I am not going to disclose what the project or product is about, but I will outline the requirements of the subsystem involved in the test.

Overview

Before we start dealing with the problem, I would like to make some comments regarding the usefulness of the results I intend to achieve. Most of the reviews or comparisons out there lack information regarding the platform on which the tests were performed. In my opinion, this makes things a little confusing when you try to figure out whether a solution will be appropriate for your system. Differences in cache sizes and architectures, memory bandwidth, library implementation and other factors may affect the results, so favoring one solution or another depends on taking these conditions into account. So, whatever the results of my particular tests may be, I will only recommend a particular approach to someone who has a similar platform.

The system I'm working on right now is based on an ARM926EJ-S processor with 8KiB of data cache and 16KiB of instruction cache. The processor clock is close to 300MHz and the system memory is a 128MiB 16-bit DDR2 type running at roughly 540MHz.
The Linux kernel version is 2.6.32 and the C library is glibc version 2.8.

The testing setup will consist of two processes, a server and a client, that will perform 3 types of conversation:

1 - Synchronous request - the client issues a request to the server, which will perform some tasks and return some data as a result. The size of the reply may vary between 4 and 1024 bytes, depending on the service being requested. The client will wait (block) for the server to reply.
2 - Synchronous send - the client sends some data to the server (4 to 1024 bytes) and waits for the server to process it and reply back with a status.
3 - Asynchronous notification - the client sends a notification (event) to the server without waiting for acknowledgment.

The mechanisms being tested are:

1 - POSIX Message Queues (mq): in this case the messages will be sent, received and synchronized through the mechanism itself. It's a very straightforward implementation.
2 - POSIX Shared Memory + POSIX Semaphores: this will be a little more involved, as we need one mechanism to send the data (Shared Memory) and another for synchronization and mutual exclusion (in case we have more than one client accessing the server's shared memory resources).
3 - ONC RPC (Open Network Computing Remote Procedure Call, aka SUN RPC): it may seem a little odd that I even want to consider this, but some of the reasons are:

- It will simplify the interface creation through the use of XDR (a kind of IDL).
- It will enable the same API to be used for remote access, which is another requirement of the product.
- It has provision for UNIX sockets for local transactions (although I, myself, have never used it).
- It will be fun to do it.

Embedded Linux Interprocess Communication Mechanisms Benchmark - Part 2 >>


\[{I}(t)=K_{i}\int_{0}^{t}{e(\tau)}\,{d\tau}\]

As we are saturated the error term being integrated is constant:

\(e(\tau)_{max}=SP-PV_{max}\)

\(e(\tau)_{min}=SP-PV_{min}\)

In our example:

\(SP=0.2\)
\(PV_{max}=1\)
\(PV_{min}=0\)

Leads to:

\(e(\tau)_{max}=-0.8\)

\(e(\tau)_{min}=0.2\)