How much RAM does my object take?

If, like me, you have worked on microcontrollers in the past, then hardware like the Netduino is somewhat amazing. I worked on several Motorola (now Freescale) MCUs in the late 80s, when the typical resources were a few kilobytes of ROM and a few hundred bytes of RAM. The language was assembly or C, and the program was uploaded to the internal EPROM. There was no Flash yet, and every single deployment of the test firmware was a long trip.

Those of you who have worked with EPROM chips will also remember the erasing cycle done via UV light. In our lab we had a kind of “oven” with a UV lamp for all the “windowed” chips. Every cycle took about 20 minutes, and all around the “oven” there was a disgusting smell of ozone (not so healthy). The lamp emitted UV-C rays (high energy), which are highly dangerous for humans. What else?

But that was 20 years ago; today it is much easier (and safer!).

Now we are in the full boom-duino age. I am proud that Arduino is an Italian idea: in my experience, the Arduino board deserves huge credit for rediscovering the pleasure of playing with electronics, for hobby or for fast prototyping. The board itself and the quantity of shields around it are a kind of Lego-like game for everyone. From the programming viewpoint, however, Arduino is nothing new in the MCU world, where people have always used C/C++ or assembly to solve their problems.

SecretLabs made a really good move indeed: the creation of the Netduino is an important milestone, because it takes the flexibility of Arduino and adds the power of a managed language like C# to such a small device. That brings abstraction and error-resistant programming. Of course, all that comes at a cost: fairly powerful hardware even for simple jobs, a huge amount of resources needed, and poor performance.

My Netduino Plus workbench on a messy table

Yes: it’s just like driving a truck to the mall, but it is quite exciting to play with!

Most of us have experience with managed languages and frameworks on PCs, maybe on big servers. When you get in touch with small devices running managed languages like C# on top of the .Net Micro Framework, you will have at least three problems to bear in mind. Essentially all of them derive from the very limited resources of the device, compared to a PC.
The first problem is that the available libraries are a small subset of the ordinary .Net Framework. When the amount of ROM/Flash is under a MiB, it’s obvious that the framework has to be trimmed down to the essentials. On a PC, where a hard disk can hold hundreds of GiB, it does not matter if you need only a small fraction of the .Net Framework: it is easier to just install the whole bunch of libraries.

The second problem is that not all of the libraries are supported by the specific hardware you are working on. With the Micro Framework, Microsoft gives us a layer for working with managed languages, but we must keep in mind that the hardware is not standardized the way PCs are. In other words, we may rely on a certain core library of the framework only when the hardware manufacturer grants us adequate support (and drivers).

The third obstacle is the scarce RAM available, which is the most precious resource a managed language needs. I think that RAM quantity (i.e. price) has always been the real obstacle to implementing applications. Managed languages, functional programming, and whatever else has proven to be highly reliable and highly productive programming were held back for decades by the huge amount of memory they require.

Some days ago there was an interesting thread on the Netduino forum about the quantity of ROM/RAM needed to efficiently store an array of patterns, used for a display, I guess. The most interesting thing, the solution apart, is that most of the participants (me included) were not able to give an efficient-enough solution to the problem. When you use C# every day on a PC, the focus is on reliability, readability of the sources (usually hundreds of classes), and flexibility. You don’t care about the amount of RAM required, at least I don’t. The most advanced tuning I might do is improving the performance of a WPF application, but that is another story.

The day before yesterday I was playing with my Netduino Plus, writing a pretty simple application. My goal is to collect the ADC data at regular intervals, then expose that stream through a web server. Each collection is composed of 100 samples, then the cycle restarts, forever.

This is much more an experiment than a serious application: there is no optimization at all. However, since the interval is fairly tight for the Netduino (10 ms or less), each collection completes in about one second. Since a web request could come in asynchronously at any time, more than one collection must be available: one is collecting data while some others are cached, available for the server.

I have defined my sample-item as follows:

    public class SampleData
    {
        public TimeSpan TimestampBegin;
        public TimeSpan TimestampEnd;
        public int An0;
        public int An1;
        public int An2;
        public int An3;
    }

Each sampling consists of readings from all four ADCs, without any particular conversion, plus the machine time captured before and after reading the data.


        cell.TimestampBegin = Utility.GetMachineTime();

        //sample data
        cell.An0 = this._an0.Read();
        cell.An1 = this._an1.Read();
        cell.An2 = this._an2.Read();
        cell.An3 = this._an3.Read();

        cell.TimestampEnd = Utility.GetMachineTime();
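
For context, here is a sketch of the fields backing the snippet above; the declarations and the pin choices are my assumptions, shown only to make the fragment self-contained:

// hypothetical declarations of the four ADC inputs used above
// (AnalogInput comes from SecretLabs.NETMF.Hardware,
//  Pins from SecretLabs.NETMF.Hardware.NetduinoPlus)
private AnalogInput _an0 = new AnalogInput(Pins.GPIO_PIN_A0);
private AnalogInput _an1 = new AnalogInput(Pins.GPIO_PIN_A1);
private AnalogInput _an2 = new AnalogInput(Pins.GPIO_PIN_A2);
private AnalogInput _an3 = new AnalogInput(Pins.GPIO_PIN_A3);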

Again, never mind why I want to do that: the real point is the RAM occupation of even a simple application, which on an old PC would be negligible.

So, I statically declared three collections of sample-items, each one able to hold exactly 100 of them. Once the program is deployed and started, the debugger halts the execution showing the fatal “out of memory” error!… Ta-da!
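
For reference, the declarations looked something along these lines (the field names here are my assumptions): three buffers of 100 samples each, where every cell must be populated with an instance, since SampleData is a class:

static SampleData[] _buffer0 = new SampleData[100];
static SampleData[] _buffer1 = new SampleData[100];
static SampleData[] _buffer2 = new SampleData[100];

static void AllocateBuffers()
{
    // the arrays alone hold only null references: each cell
    // needs its own SampleData instance before sampling starts
    for (int i = 0; i < 100; i++)
    {
        _buffer0[i] = new SampleData();
        _buffer1[i] = new SampleData();
        _buffer2[i] = new SampleData();
    }
}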

Note that I had not seen this kind of message in a long, long time: I guess since the DOS environment or something like that.
An "out of memory" exception, awaited from a long long time...

That message puzzled me a lot: the Netduino Plus MCU (Atmel AT91SAM7X512) has 128 KiB of RAM, and I guessed not that many resources were allocated. When Visual C# Express completes the compilation and begins uploading the firmware, it shows in the output pane a figure of less than 14 KiB of RAM. That is, I should have over 100 KiB free for my application. Well, I really don’t know how much is actually free for the custom application, but all of this seems far from running out of the available memory.

What the compiler reports about the allocated resources

That is why I really wanted to understand what happens behind the scenes or, better, under the framework layer. Yep: I really think we have become too comfortable developing software, and the 2-4 GiB of a common PC has made us forget tons of object-allocation problems.

So, I decided to run some basic experiments using the standard .Net Framework, measuring the RAM requested with a really useful tool.

In the lab where I work, we use the SciTech Memory Profiler. It is a very well-made and useful tool for detecting memory leaks and inspecting the object allocation of any .Net application. I think it can also profile ASP.Net applications and services, but I have never tried that.

The first program is a potpourri of instance types, where I want to inspect the most common CLR types, as well as arrays, structs, and classes. Note that I did not test anything involving generics, because my primary interest is the .Net Micro Framework, which does not support them. Soon you will also understand why they are not supported!

The profiler I used shows details only for instantiated (heap) objects, and not for pinned or stack-allocated ones (or at least I have not yet understood how to see them). So I chose to create several structs, each one containing a different basic type: bool, byte, int, etc.

struct StructEmpty
{
}

struct StructOfBool
{
    public bool A;
    public bool B;
    public bool C;
}

struct StructOfByte
{
    public byte A;
    public byte B;
    public byte C;
}

struct StructOfInt
{
    public int A;
    public int B;
    public int C;
}

    // ...

By declaring arrays of just one cell for every kind of struct, the profiler sees these arrays as “instantiated”, thus fully traceable.


static object[] _reference = new object[0];
static StructEmpty[] _arrayEmpty = new StructEmpty[1];
static StructOfBool[] _arrayBool = new StructOfBool[1];
static StructOfByte[] _arrayByte = new StructOfByte[1];
static StructOfInt[] _arrayInt = new StructOfInt[1];

    // ...

On top of that, I also created some one-dimensional arrays (a.k.a. vectors) and two different versions of the “famous” sample-item of my Netduino application. The two versions are pretty much equivalent: they have exactly the same content, but one is a class (as my first attempt was) and the other is a structure.

Let the code speak for itself:

class MyClass
{
    public TimeSpan TS0;
    public TimeSpan TS1;
    public int A;
    public int B;
    public int C;
    public int D;
}

struct MyStruct
{
    public TimeSpan TS0;
    public TimeSpan TS1;
    public int A;
    public int B;
    public int C;
    public int D;
}

For these types, I declared something more than in the simpler cases:

static MyClass[] _samples = new MyClass[100];
static MyClass _singleSample = new MyClass();
static MyStruct[] _strutture = new MyStruct[100];

Moreover, each cell in the array of MyClass-es is filled with an instance in the very first lines of the program.
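
That initialization is nothing more than a loop like this one (a sketch; since MyClass is a reference type, the array alone holds only null references until every cell is populated):

for (int i = 0; i < _samples.Length; i++)
    _samples[i] = new MyClass();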

Well, what does the profiler show us?

The upper part of the list of the memory profiler

The first thing that stands out (at least for me) is that there are tons of “hidden” objects carried within/behind the application, which eat the greatest part of the total space. As you can see, my test application is quite simple, without any code beyond the essential. That is probably because this is a desktop console application: the lightest way to build a program on a PC, but maybe still too heavy for a device like the Netduino.

In the picture above, I have highlighted for convenience the 100+1 instances of MyClass. Remember? There is a single instance, plus an array of 100 MyClass instances. We can see that 101 instances take as much as 4,040 bytes, that is, 40 bytes each.

Note: we will see that the useful data in the class take only 32 bytes, so 8 bytes is the overhead of a “class” to keep in mind.

Let’s scroll the grid down a little:

The detail of the custom structures allocation

Here are the rest of the instances, one per type.

From the viewpoint of the Netduino sampler application, it is pretty clear that an array of MyStruct is much more compact than the equivalent implementation using MyClass. The struct needs just 32 bytes of data, while the class needs 8 bytes more. The whole bunch of MyStruct instances, including array management, takes 3,212 bytes in all (12 bytes for the array plus 100 × 32 bytes of data). The other version, using MyClass, requires 40 × 100 + 416 = 4,416 bytes!… Over 30% in excess.

Note: it is worth remembering that by declaring an array (one-dimensional for simplicity), the compiler implicitly instantiates a System.Array object, meant to hold all the cells (or cell references).

Analyzing the “Total” column, we can easily see that a “System.Array” instance always takes 12 bytes, just for itself. After that comes the “real” content, such as booleans, bytes, strings, etc.

Since each of the structs I defined for the basic types always holds three homogeneous fields, we can see that, for example, the “StructOfBool” takes 15 bytes in all; minus the 12 for the array, that leaves 3: one physical byte per boolean. In the same way we can notice that one byte is one physical byte, one integer (System.Int32) is actually 4 bytes, and so on.
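
If you want to cross-check these figures without a profiler, the C# sizeof operator reports the managed size of a struct. This is just a sketch of mine, not part of the original test: it requires compiling with the /unsafe switch.

unsafe static void PrintStructSizes()
{
    Console.WriteLine(sizeof(StructEmpty));  // 1: an empty struct is never zero-sized
    Console.WriteLine(sizeof(StructOfBool)); // 3: one physical byte per bool
    Console.WriteLine(sizeof(StructOfByte)); // 3: one byte per byte, as expected
    Console.WriteLine(sizeof(StructOfInt));  // 12: four bytes per Int32
}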

Nothing strange, so far.

Another way to look at the memory occupation

The most careful readers may have noticed that most of the structs in the test hold primitive values, i.e. structs themselves, deriving from System.ValueType. The only exceptions are the string cases, which hold a type (System.String) not deriving from ValueType.

In other words, a structure (or a class) that declares a field whose type is not a ValueType keeps inside only the reference (to the heap or to the stack). That is why in the program there are three kinds of struct-of-string: one holds a null reference, the second points to a static reference (String.Empty), and the last holds a real text string (“hello”). In the list, all three of those strings allocate just four bytes, that is, the pointer to the real vector of characters. So, while both the “null” and the “empty” strings do not require anything other than the pointer itself, the third “hello” string will allocate several more bytes somewhere (e.g. on the heap).
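
The three cases could be declared more or less like this (a sketch; the actual names in my test program may differ):

struct StructOfString
{
    public string Text;
}

// each struct stores only the 4-byte reference;
// only the "hello" case also allocates a character vector on the heap
static StructOfString[] _nullString  = { new StructOfString { Text = null } };
static StructOfString[] _emptyString = { new StructOfString { Text = string.Empty } };
static StructOfString[] _realString  = { new StructOfString { Text = "hello" } };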

It also tickles my curiosity that an empty array of objects requires more space than a 3-byte array!

You can read more about this in connection with the fixed statement of C#.

The size of primitive types is nothing new under the sun. At most, I might have wanted to know how much a DateTime or a TimeSpan requires, but those could be estimated, because internally they both work on a 64-bit (long) basis.
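
Under the same /unsafe trick as before, this can be verified quickly (again, a sketch of mine):

unsafe static void PrintTimeSizes()
{
    Console.WriteLine(sizeof(TimeSpan)); // 8: internally a single 64-bit tick counter
    Console.WriteLine(sizeof(DateTime)); // 8: the same 64-bit basis
}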

Maybe more interesting is that an array object itself (System.Array) needs 12 bytes. But how much does a two-dimensional array (i.e. a matrix), or even a jagged array, require?

Another really interesting thing, I think, is the great difference between classes and structures. However, we will see later that structures are not a panacea, and it is not straightforward to decide whether to use a struct or a class.

Several kinds of arrays.

Completing this batch of tests takes only a small extra step.

I wrote another program, again a console application for the desktop.

class Program2
{
    static byte[] _array1D = new byte[100];
    static byte[,] _array2D = new byte[100, 100];
    static byte[][] _array2J = new byte[100][];

    static void Main(string[] args)
    {
        for (int i = 0; i < 100; i++)
            _array2J[i] = new byte[100];

        Console.ReadKey();
    }
}

As you may see there are three kinds of array:

  • the first is a one-dimensional array, often called a “vector”;
  • the second is a square two-dimensional array, also called a matrix;
  • the third is a jagged array.

The first kind needs no particular mention: it is the most common form of array used in programming. Its byte basis makes it the backbone for files, streaming, bitmaps, etc.

The second is much less used, but it is particularly useful in math algorithms and anywhere else that needs “tables” of simple, homogeneous data. It is worth noting that the .Net Micro Framework does not support arrays with more than one dimension.

The third kind of array is supported in the Micro Framework, but it is somewhat fuzzy. You can define the number of “rows” at compile time, but what will happen in each row is unclear, except that it will hold another array of the same type.

Personally, I do not like arrays, and I never use them in desktop programming: they have many faults (you may call them “limitations”), although they are pretty simple and intuitive to use. They are “immutable” in size, but not in their cell content. There is also no guarantee about what you may place in them (see the covariance and contravariance rules). For these reasons I prefer to use generic collections.

In the .Net Micro Framework there is no support for generics, nor for matrices. I would also like to say “there are no jagged arrays”, because I like having stronger compile-time checking. While on a PC an exception pops up a tedious message, on a device like the Netduino there is no easy way to detect a runtime error.

Anyway, let’s take a peek at what the memory profiler shows about these arrays.

The RAM requested by several kinds of arrays

Well, some of the results are as expected.

The easiest figure to check is the matrix. Since it was defined as 100 by 100 bytes, the total allocation requested is at least 10,000 bytes; there are an additional 28 bytes needed for the two-dimensional array instance.

The jagged array is just as easy to locate, but it requires a lot of space. The array object itself, holding the references to the row arrays, needs 416 bytes (I am not able to tell you more). But the jagged array is also composed of 100 rows of 100 bytes each, which we find among the 101 vectors.

The vectors (one-dimensional arrays) are 100+1 strips of bytes: one explicitly declared and the other 100 instantiated inside the jagged array. We have previously seen that each one takes 12+100=112 bytes (the array object itself plus the real data): the grand total is exactly 101 × 112 = 11,312 bytes. An overhead of more than 10%.

OK, let’s sort out some numbers:

  • for a one-dimensional array, we must consider 12 bytes plus the real data;
  • for a two-dimensional array, we must consider 28 bytes plus the real data;
  • for a jagged array, we must consider 16 bytes as a basis, then, for each row, 4 bytes for the reference plus the full cost of a one-dimensional array.

Yes: the jagged array requires a lot of memory (in our case: 416 bytes for the container plus 100 × 112 bytes for the rows, that is 11,616 bytes in all, versus 10,028 for the equivalent matrix). Basically, it is almost the same thing as an array of class instances, but the class is cheaper and more strongly typed. I do not want to discuss the pros and cons of jagged arrays versus arrays of a custom class. The important thing is to bear in mind how they work internally, so we can choose the one that best fits our needs.

Hmm… if a structure is cheaper in terms of allocation, why don’t we always choose it instead of a class?

There are several differences, and I am not the right person to explain them all, but I think it is interesting to focus on the main one.

Let’s consider that our data are not allocated just for fun; we have to use them in some way.

Now let’s get back to the test application, where we have several primitive types and the two versions of the “sample-item”. We have 100 samples with four integers inside, which are the ADC readings. To test the performance, we want to write a function that takes all those samples, sums together all the ADC readings, and returns the grand total.

Note: the meaning of the result is absolutely unimportant here; we are also assuming there is no overflow or anything else.

OK, how many kinds of routines can you write?

Let’s begin with this one:

static int TestWithClass0(MyClass[] collection)
{
    int sum = 0;

    for (int i = 0, count = collection.Length; i < count; i++)
    {
        sum += collection[i].A + collection[i].B + collection[i].C + collection[i].D;
    }

    return sum;
}

This version assumes the sample-items are carried within a class, so the routine accepts the array of MyClass-es as input, then loops through the instances, summing the four fields.

However, we can write it in a more elegant way:

static int TestWithClass1(MyClass[] collection)
{
    int sum = 0;

    for (int i = 0, count = collection.Length; i < count; i++)
    {
        var cell = collection[i];
        sum += cell.A + cell.B + cell.C + cell.D;
    }

    return sum;
}

Or even this way:

static int TestWithClass2(MyClass[] collection)
{
    int sum = 0;

    for (int i = 0, count = collection.Length; i < count; i++)
    {
        var cell = collection[i];
        sum += cell.A;
        sum += cell.B;
        sum += cell.C;
        sum += cell.D;
    }

    return sum;
}

All of them give the same result. We should also write the same three routines for the MyStruct version (see the sketch below): again, the result is identical.
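
For example, the struct counterpart of the second routine might look like this (a sketch, with the hypothetical name TestWithStruct1; only the parameter type changes):

static int TestWithStruct1(MyStruct[] collection)
{
    int sum = 0;

    for (int i = 0, count = collection.Length; i < count; i++)
    {
        // unlike MyClass, this assignment copies the whole
        // 32-byte structure by value into the local variable
        var cell = collection[i];
        sum += cell.A + cell.B + cell.C + cell.D;
    }

    return sum;
}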

Now, let’s try to measure the time taken by each routine. There are six tests, one for each routine; every routine is executed 100K times with the same collection.

The time values are in milliseconds, but they are not important in absolute terms; what matters is the comparison among the cases.
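
The harness around each routine can be as simple as a Stopwatch (a minimal sketch; the exact structure of my test program may differ):

var sw = System.Diagnostics.Stopwatch.StartNew();

for (int n = 0; n < 100000; n++)
    TestWithClass1(_samples);

sw.Stop();
Console.WriteLine("TestWithClass1: {0} ms", sw.ElapsedMilliseconds);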

The performance timing results

The first three tests use the class, and the last three use the structure.

Except for test #0, the class version performs much faster than the structure version. The reason is that assigning a structure to a local variable copies the whole structure “by value”, while the class version copies only the reference.

However, you can see that accessing a cell of the array several times, or splitting the sum into four separate “self-additions”, has a considerable cost in terms of performance.

For convenience, here is the source of the project used for the test. Remember to change the extension to “.zip”.

Hoping to have given you some useful information, I thank you for reading.
