Which is better?

Just a quick post on how to write “better” a small piece of code.
First off, “better” is ambiguous: it should mean “elegant”, or “readable”? Maybe “performing” or even “compact”? In general, I tend to favor the readability, for a better maintenance; whereas possible a good performance as well.
Secondly, although this case is against the .Net Micro Framework (where the resources are very poor), the considerations may be applied for any platform. Here the discussion is focused just on the IL: not more in depth.
Last but lot least, the language is C#. Plain and safe C#: pointer tricks are not allowed at all.

The problem.

The problem depicted here is just an example. The goal is to store an Int16 value (16-bits wide signed integer) onto a byte-array, at a certain offset, using the Little-Endian format. It’s clear that it could be used any kind of integer, and also a different format.
As for example, given:

  • N = 12345 (0x3039 in Hex), as the value to store;
  • K = 23 (0x17 in Hex), as the starting offset of the array

The aim is storing N in the array as follows:

K K+1
Offset: 21 22 23 24 25 20
Content: x x 0x39 0x30 x x

Many ways to do it: which is better?

I found four different ways to solve the problem, but new versions are welcome.

The first is perhaps the most intuitive. I think there’s nothing to explain.

        private static void Test1(short value)
        {
            _buffer[_ptr++] = (byte)value;
            _buffer[_ptr++] = (byte)(value >> 8);
        }

The second way is a revised version of the first one, just because the post-increment is typically less compact yet performing than the pre-increment.

        private static void Test2(short value)
        {
            _buffer[_ptr] = (byte)value;
            _buffer[++_ptr] = (byte)(value >> 8);
            ++_ptr;
        }

The third way looks as the dumbest one: instead updating the offset on every step, just calculate the actual cell index. At the end, the global offset is updated once only.

        private static void Test3(short value)
        {
            _buffer[_ptr] = (byte)value;
            _buffer[_ptr + 1] = (byte)(value >> 8);
            _ptr += 2;
        }

The fourth looks as a file corruption, because it seems having no sense. Hard to read, hard to understand what the program does, and even if what id does, does it correctly. Always reliable?…hum…
However, there’s an explanation for this code further.

        private static void Test4(short value)
        {
            _buffer[_ptr] = (byte)(value + 0 * (_buffer[(_ptr += 2) - 1] = (byte)(value >> 8)));
        }

The results.

The comparison of the four snippets is relative to the speed of execution, but also on the compactness.
The execution template looks as follows:

            const int num = 10000000;
            const short k = 12345;
            Stopwatch sw;

            sw = Stopwatch.StartNew();
            for (int i = 0; i < num; i++)
            {
                _ptr = 0;
                Test1(k);
                Test1(k);
                Test1(k);
                Test1(k);
                Test1(k);
                Test1(k);
                Test1(k);
                Test1(k);
                Test1(k);
                Test1(k);
            }

            sw.Stop();
            Console.WriteLine("Test1=" + sw.ElapsedMilliseconds);

Here are the IL dump of the various snippets:

        private static void Test1(short value)
        {
            _buffer[_ptr++] = (byte)value;
            _buffer[_ptr++] = (byte)(value >> 8);
        }
	IL_0000: ldsfld uint8[] ConsoleApplication1.Program::_buffer
	IL_0005: ldsfld int32 ConsoleApplication1.Program::_ptr
	IL_000a: dup
	IL_000b: ldc.i4.1
	IL_000c: add
	IL_000d: stsfld int32 ConsoleApplication1.Program::_ptr
	IL_0012: ldarg.0
	IL_0013: conv.u1
	IL_0014: stelem.i1
	IL_0015: ldsfld uint8[] ConsoleApplication1.Program::_buffer
	IL_001a: ldsfld int32 ConsoleApplication1.Program::_ptr
	IL_001f: dup
	IL_0020: ldc.i4.1
	IL_0021: add
	IL_0022: stsfld int32 ConsoleApplication1.Program::_ptr
	IL_0027: ldarg.0
	IL_0028: ldc.i4.8
	IL_0029: shr
	IL_002a: conv.u1
	IL_002b: stelem.i1
	IL_002c: ret

        private static void Test2(short value)
        {
            _buffer[_ptr] = (byte)value;
            _buffer[++_ptr] = (byte)(value >> 8);
            ++_ptr;
        }
	IL_0000: ldsfld uint8[] ConsoleApplication1.Program::_buffer
	IL_0005: ldsfld int32 ConsoleApplication1.Program::_ptr
	IL_000a: ldarg.0
	IL_000b: conv.u1
	IL_000c: stelem.i1
	IL_000d: ldsfld uint8[] ConsoleApplication1.Program::_buffer
	IL_0012: ldsfld int32 ConsoleApplication1.Program::_ptr
	IL_0017: ldc.i4.1
	IL_0018: add
	IL_0019: dup
	IL_001a: stsfld int32 ConsoleApplication1.Program::_ptr
	IL_001f: ldarg.0
	IL_0020: ldc.i4.8
	IL_0021: shr
	IL_0022: conv.u1
	IL_0023: stelem.i1
	IL_0024: ldsfld int32 ConsoleApplication1.Program::_ptr
	IL_0029: ldc.i4.1
	IL_002a: add
	IL_002b: stsfld int32 ConsoleApplication1.Program::_ptr
	IL_0030: ret

        private static void Test3(short value)
        {
            _buffer[_ptr] = (byte)value;
            _buffer[_ptr + 1] = (byte)(value >> 8);
            _ptr += 2;
        }
	IL_0000: ldsfld uint8[] ConsoleApplication1.Program::_buffer
	IL_0005: ldsfld int32 ConsoleApplication1.Program::_ptr
	IL_000a: ldarg.0
	IL_000b: conv.u1
	IL_000c: stelem.i1
	IL_000d: ldsfld uint8[] ConsoleApplication1.Program::_buffer
	IL_0012: ldsfld int32 ConsoleApplication1.Program::_ptr
	IL_0017: ldc.i4.1
	IL_0018: add
	IL_0019: ldarg.0
	IL_001a: ldc.i4.8
	IL_001b: shr
	IL_001c: conv.u1
	IL_001d: stelem.i1
	IL_001e: ldsfld int32 ConsoleApplication1.Program::_ptr
	IL_0023: ldc.i4.2
	IL_0024: add
	IL_0025: stsfld int32 ConsoleApplication1.Program::_ptr
	IL_002a: ret

        private static void Test4(short value)
        {
            _buffer[_ptr] = (byte)(value + 0 * (_buffer[(_ptr += 2) - 1] = (byte)(value >> 8)));
        }
	IL_0000: ldsfld uint8[] ConsoleApplication1.Program::_buffer
	IL_0005: ldsfld int32 ConsoleApplication1.Program::_ptr
	IL_000a: ldarg.0
	IL_000b: ldsfld uint8[] ConsoleApplication1.Program::_buffer
	IL_0010: ldsfld int32 ConsoleApplication1.Program::_ptr
	IL_0015: ldc.i4.2
	IL_0016: add
	IL_0017: dup
	IL_0018: stsfld int32 ConsoleApplication1.Program::_ptr
	IL_001d: ldc.i4.1
	IL_001e: sub
	IL_001f: ldarg.0
	IL_0020: ldc.i4.8
	IL_0021: shr
	IL_0022: conv.u1
	IL_0023: stelem.i1
	IL_0024: conv.u1
	IL_0025: stelem.i1
	IL_0026: ret

The real surprise is on the speed results (milliseconds):

Test1=1034
Test2=848
Test3=712
Test4=801

It is worthwhile to notice that:

  • the pre-increment yields a little bonus in performance, despite the less-readable code. Also notice there are five “_ptr” accesses vs four in the first snippet, but that seems running faster anyway.
  • Surprisingly, the “dumbest” way to write the code is also the best one: not just on the speed, but compactness and readability as well.
  • The fourth snippet was just a test on how to “force” a certain IL generation, and -yes- it is very compact. Despite this effort, the speed result isn’t gratifying at all. That’s the loser solution for sure.
  • The interesting thing in the firth snippet is the fake multiplication (by zero), which aims to “compress” the two assignments in a single row. I’m pleased to see how smart is the compiler: it discard the useless multiplication, but not its terms.

Conclusions.

Just a short lesson on how to writer the code better.
Most of the times, you don’t have to bump your head against the wall to find the best solution as it were in native languages like C/C++. That’s why I love C#!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s