AngularJS experimental page routing/templating

This is my very first post about “pure-web” tech, and it’s also a very short one. I began dealing with these technologies some months ago, but I feel there’s still a long road to walk.
Here is an attempt to rethink the Single-Page (web) Application (a.k.a. SPA) using AngularJS, toward a more abstract way of templating. The reasons behind such a solution are hard to grasp from this post alone, but shortly I’ll publish a much larger, concrete framework for telemetry applications.
As a hint, think of the ability to compose a page from a series of components, and to store/retrieve the layout on any persistent medium (e.g. file, database, etc.).

As far as I can tell, AngularJS is among the web frameworks closest to the desktop’s WPF, which is (at least in my mind) the best framework for LOB apps.
However, I noticed that the ability to reuse components, abstract views away and so on is still neither standardized nor widely used. That’s why I threw myself into this challenge, and the result isn’t as bad as expected (for a web-dev noob like me).
A short video should explain far better than a thousand words what the result looks like:

Follow the project on the Github repository:
https://github.com/highfield/ng-route1
Stay tuned for hotter articles in the near future!

Nesting a private C# Dynamic object

I don’t often use the dynamic feature of the C# language, but the day before yesterday I bumped into a subtle issue.

A basic implementation.

Consider a very basic dynamic object implementation built on DynamicObject, which looks like a stripped-down ExpandoObject:

    using System.Collections.Generic;
    using System.Dynamic;


    public class MyDynamicObject
        : DynamicObject
    {
        public MyDynamicObject()
        {
            this._dict = new Dictionary<string, object>();
        }


        private readonly Dictionary<string, object> _dict;


        /**
         * called when the host tries to GET the value
         * from a member
         **/
        public override bool TryGetMember(
            GetMemberBinder binder,
            out object result
            )
        {
            //look for the member into the dictionary
            bool found = this._dict.TryGetValue(
                binder.Name,
                out result
                );

            if (found)
            {
                return true;
            }

            //yield the default behavior
            return base.TryGetMember(
                binder,
                out result
                );
        }


        /**
         * called when the host tries to SET a value
         * against a member
         **/
        public override bool TrySetMember(
            SetMemberBinder binder,
            object value
            )
        {
            //store the value in the dictionary
            this._dict[binder.Name] = value;
            return true;
        }

    }

Its usage may be expressed as follows:

    class Program
    {
        static void Main(string[] args)
        {
            dynamic d = new MyDynamicObject();
            d.first = "John";
            d.last = "Doe";
            d.birthdate = new DateTime(1966, 7, 23);
            d.registered = true;

            Console.WriteLine(d.first);
            Console.WriteLine(d.last);
            Console.WriteLine(d.birthdate);
            Console.WriteLine(d.registered);

            Console.ReadKey();
        }
    }

So far, so good. But what about retrieving a member “by name”, that is, using a string as a “key” to map the desired member?
The above snippet could be refined as follows:

    class Program
    {
        static void Main(string[] args)
        {
            dynamic d = new MyDynamicObject();
            d.first = "John";
            d.last = "Doe";
            d.birthdate = new DateTime(1966, 7, 23);
            d.registered = true;

            Console.WriteLine(d.first);
            Console.WriteLine(d.last);
            Console.WriteLine(d.birthdate);
            Console.WriteLine(d.registered);

            Console.WriteLine();
            Console.Write("Please enter a field name: ");
            string key = Console.ReadLine();

            //how to map the required field?
            //Console.WriteLine("The field value is: " + ??? );

            Console.ReadKey();
        }
    }

Again, with an ExpandoObject everything would be straightforward (it also implements IDictionary&lt;string, object&gt;), but the actual “MyDynamicObject” used in the original application requires a more complex content, with XML and a dictionary working side by side.


Going down this road, the “keyed” dynamic object implementation is easy to refine:

    public class MyDynamicObject
        : DynamicObject
    {

        // ... original implementation ...


        /**
         * provide a member access through a key
         **/
        public object this[string key]
        {
            get { return this._dict[key]; }
            set { this._dict[key] = value; }
        }

    }

At this point, the demo application works fine with both access styles. It looks much like a JavaScript object!

    class Program
    {
        static void Main(string[] args)
        {
            dynamic d = new MyDynamicObject();
            d.first = "John";
            d.last = "Doe";
            d.birthdate = new DateTime(1966, 7, 23);
            d["registered"] = true;

            Console.WriteLine(d.first);
            Console.WriteLine(d.last);
            Console.WriteLine(d.birthdate);
            Console.WriteLine(d.registered);

            Console.WriteLine();
            Console.Write("Please enter a field name: ");
            string key = Console.ReadLine();

            Console.WriteLine("The field value is: " + d[key]);
            Console.ReadKey();
        }
    }


The problem: a nested-private dynamic object.

Consider a proxy pattern, with a dynamic object exposed indirectly to the hosting application. Also consider that the dynamic object should be marked as “private”, to prevent any possible usage outside its context.
The revised component would look as follows:

    class MyClass
    {

        public IDynamicMetaObjectProvider GetDynamicAccess()
        {
            return new MyDynamicObject();
        }


        //notice that the below class is marked as "private"
        private class MyDynamicObject
            : DynamicObject
        {

            // ... implementation as the keyed-one seen above ...

        }
    }

When used in a sample application like the following, it won’t work:

    class Program
    {
        static void Main(string[] args)
        {
            var c = new MyClass();

            dynamic d = c.GetDynamicAccess();
            d.first = "John";
            d.last = "Doe";
            d.birthdate = new DateTime(1966, 7, 23);
            d["registered"] = true;     //throws!

            Console.WriteLine(d.first);
            Console.WriteLine(d.last);
            Console.WriteLine(d.birthdate);
            Console.WriteLine(d.registered);

            Console.WriteLine();
            Console.Write("Please enter a field name: ");
            string key = Console.ReadLine();

            //the following would also throw
            Console.WriteLine("The field value is: " + d[key]);
            Console.ReadKey();
        }
    }

Better said: the “keyed” access won’t work, although the classic member access is still available. The reason is that the runtime binder honors accessibility: the member get/set calls are dispatched through the public DynamicObject overrides, while the indexer is an ordinary member of a type that is invisible outside MyClass, so the binder throws a RuntimeBinderException.

I wasn’t able to find *ANY* solution unless you have the ability to modify the implementation. Here are the possible solutions.

Solution 1: mark the MyDynamicObject class as “public”.

This is the simplest way, but I’d say it’s also a NON-solution, because the original desire is to keep the class “private”.

Solution 2: use reflection.

You know, reflection can dig into the deepest, most hidden corners of your assembly, but it’s a last-resort tool: the compiler has very little (or no) control over what we access through reflection. Feasible, but I’d discourage it.
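For completeness, here is a sketch of the reflection route, assuming the keyed MyDynamicObject shown earlier. A C# indexer is compiled to a property named “Item”, and reflection can reach it even though the declaring class is private (the property itself is public):

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Reflection;

class MyClass
{
    public IDynamicMetaObjectProvider GetDynamicAccess()
    {
        return new MyDynamicObject();
    }

    //still private, as in the original design
    private class MyDynamicObject : DynamicObject
    {
        private readonly Dictionary<string, object> _dict = new Dictionary<string, object>();

        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            return this._dict.TryGetValue(binder.Name, out result);
        }

        public override bool TrySetMember(SetMemberBinder binder, object value)
        {
            this._dict[binder.Name] = value;
            return true;
        }

        //the indexer compiles to a property named "Item"
        public object this[string key]
        {
            get { return this._dict[key]; }
            set { this._dict[key] = value; }
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        dynamic d = new MyClass().GetDynamicAccess();
        d.first = "John";

        //reach the indexer via reflection, bypassing the dynamic binder
        object obj = d;
        PropertyInfo indexer = obj.GetType().GetProperty("Item");
        object value = indexer.GetValue(obj, new object[] { "first" });

        Console.WriteLine(value);   //prints "John"
    }
}
```

It does work, but the compiler can’t verify any of it: a typo in "Item" or in the field name only surfaces at run time.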

Solution 3: add an interface.

The “best” solution (although I’d demote it to “decent”) is adding an interface, whose aim is to expose the indexed (keyed) access to the host application.

    interface IKeyedAccess
    {
        object this[string name] { get; set; }
    }


    class MyClass
    {

        public IDynamicMetaObjectProvider GetDynamicAccess()
        {
            return new MyDynamicObject();
        }


        //notice that the below class is marked as "private"
        private class MyDynamicObject
            : DynamicObject, IKeyedAccess
        {

            // ... implementation as the keyed-one seen above ...

        }
    }

Our keyed dynamic object must implement the interface, which is rather obvious since that is exactly our primary goal.
The major difference lies rather in the object usage:

    class Program
    {
        static void Main(string[] args)
        {
            var c = new MyClass();

            dynamic d = c.GetDynamicAccess();
            var dk = (IKeyedAccess)d;
            d.first = "John";
            d.last = "Doe";
            d.birthdate = new DateTime(1966, 7, 23);
            dk["registered"] = true;

            Console.WriteLine(d.first);
            Console.WriteLine(d.last);
            Console.WriteLine(d.birthdate);
            Console.WriteLine(d.registered);

            Console.WriteLine();
            Console.Write("Please enter a field name: ");
            string key = Console.ReadLine();

            Console.WriteLine("The field value is: " + dk[key]);
            Console.ReadKey();
        }
    }

Unfortunately not as good as hoped, but at least it allows us to stick to the “private” constraint.

Here is the source code.

Microsoft TechDays 2013 Paris: un grand merci!

The greatest Microsoft event in Europe has just closed in Paris, France.
I am soooo honored to have been mentioned in the “Geek in da House” session by Laurent Ellerbach.


He presented two very interesting projects, both involving a Netduino and a little hardware around it.
In the first part of his session, Laurent presented his remotely controlled garden sprinkler system. Afterward, his Netduino was used in a totally different way: as a transmitter of IR commands to a Lego train. My help was just on the latter project.
Here is the link to the video (in French).
Have fun!

How to get our Netduino running faster

Before going on with my graphic library for led matrices, I think it’s time to optimize the code a bit, in order to get the Netduino running faster.
My job is programming .Net desktop applications, but a PC is very rich in resources such as RAM and processor speed. Instead, the Micro Framework offers a very small environment, where every extra byte might have an impact on the final result.
Here is a brief bunch of tests showing a comparison of different approaches to the same task. Sometimes you don’t care about the best way to write the code, but the interesting thing is actually knowing how things work. You will be surprised, as I was.

The test bench.

The base program for the tests is very simple: it is an endless loop where the code under test runs, interleaved with a short pause of 100 ms. The comparison is mostly between different, commonly used types, such as Int32, Single, Double and Byte.
The timings are taken using a scope, watching two output ports as they change state.
Except for the very first, each test cycles 50 times over a set of 20 operations: that minimizes the overhead due to the “for-loop”. By the way, the first test is targeted just at measuring the “for-loop” overhead.
Here is the test program template:

using System.Threading;
using Microsoft.SPOT.Hardware;
using SecretLabs.NETMF.Hardware.Netduino;

namespace PerformanceTest
{
    public class Program
    {
        private const int Count = 50;

        private static OutputPort QTest = new OutputPort(Pins.GPIO_PIN_D0, false);
        private static OutputPort QPulse = new OutputPort(Pins.GPIO_PIN_D1, false);


        public static void Main()
        {
            byte b;
            byte bx = 50;
            byte by = 16;

            int i;
            int ix = 50;
            int iy = 16;

            float f;
            float fx = 50.0f;
            float fy = 16.0f;

            double d;
            double dx = 50.0;
            double dy = 16.0;

            while (true)
            {
                //start of the test
                QTest.Write(true);


                // ... operations to test ...


                //end of the test
                QTest.Write(false);
                Thread.Sleep(100);
            }
        }


        private static void Pulse()
        {
            QPulse.Write(true);
            QPulse.Write(false);
        }

    }
}

The basic for-loop.

Since every test will use the “for-loop”, we should measure how much overhead it introduces.
Here is the snippet…

                for (int n = 0; n < 1000; n++)
                {
                    //do nothing
                }

…and here is the timing:
(scope capture)

Roughly speaking, we could say that every for-loop cycle takes about 7 microseconds.

What do the IL opcodes generated by the compiler look like (restricted to the for-loop only)?
Well, it is pretty interesting to dig a bit behind (or under?) the scenes. I will take advantage of the awesome ILSpy, which is a free, open-source decompiler, disassembler and much more, provided by the SharpDevelop team.

		IL_0042: ldc.i4.0
		IL_0043: stloc.s n
		IL_0045: br.s IL_004f
		// loop start (head: IL_004f)
			IL_0047: nop
			IL_0048: nop
			IL_0049: ldloc.s n
			IL_004b: ldc.i4.1
			IL_004c: add
			IL_004d: stloc.s n

			IL_004f: ldloc.s n
			IL_0051: ldc.i4 1000
			IL_0056: clt
			IL_0058: stloc.s CS$4$0000
			IL_005a: ldloc.s CS$4$0000
			IL_005c: brtrue.s IL_0047
		// end loop

Notice how the final branch-on-true jumps back to the first opcode, which implies a couple of “nop”s: why?
Anyway, we are not going to optimize the for-loop yet.

Addition.

The addition will be performed over three common types: Int32, Single and Double.
Here is the snippet…

                for (int n = 0; n < Count; n++)
                {
                    i = ix + iy; //repeated 20 times
                }

                Pulse();

                for (int n = 0; n < Count; n++)
                {
                    f = fx + fy; //repeated 20 times
                }

                Pulse();

                for (int n = 0; n < Count; n++)
                {
                    d = dx + dy; //repeated 20 times
                }

…and here is the timing:
(scope capture)

Again, an “average” addition takes about 2 microseconds.

Many users blame the poor speed of a board like the Netduino, given that its core can run at over 200 MIPS. Two microseconds for an addition (integer or floating-point) seems a waste of performance, but…please bear in mind that such a small yet inexpensive board performs about the same as an old 1984 IBM PC-AT machine (estimated price US$5000).

The interesting thing is that there’s almost no difference between using Int32 or Single, which are both 32-bit based. Surprisingly, even choosing Double as the type, the calculation takes only insignificantly longer than the other cases. However, a Double takes 8 bytes.
Below are the parts of the IL that depict the operations:

                        // ...

			IL_004e: ldloc.s ix
			IL_0050: ldloc.s iy
			IL_0052: add
			IL_0053: stloc.3
                        
                        // ...

			IL_00eb: ldloc.s fx
			IL_00ed: ldloc.s fy
			IL_00ef: add
			IL_00f0: stloc.s f
                        
                        // ...

			IL_019c: ldloc.s dx
			IL_019e: ldloc.s dy
			IL_01a0: add
			IL_01a1: stloc.s d
                        
                        // ...

Multiplication.

Here is the snippet…

                for (int n = 0; n < Count; n++)
                {
                    i = ix * iy; //repeated 20 times
                }

                Pulse();

                for (int n = 0; n < Count; n++)
                {
                    i = ix << 4; //repeated 20 times
                }

                Pulse();

                for (int n = 0; n < Count; n++)
                {
                    f = fx * fy; //repeated 20 times
                }

                Pulse();

                for (int n = 0; n < Count; n++)
                {
                    d = dx * dy; //repeated 20 times
                }

…and here is the timing:
(scope capture)

As with the addition, the multiplication takes almost the same time to perform, and there seems to be no significant loss of performance across the different data types.
There is an extra, special case, which calculates the multiplication leveraging the left-shift operator. It’s a very particular case, but the better speed compared with an ordinary multiplication is noticeable. Is it worthwhile choosing a shift over a real multiplication? I don’t believe so…
Below are the parts of the IL that depict the operations:

                        // ...

			IL_004e: ldloc.s ix
			IL_0050: ldloc.s iy
			IL_0052: mul
			IL_0053: stloc.3
                        
                        // ...

			IL_00e8: ldloc.s ix
			IL_00ea: ldc.i4.4
			IL_00eb: shl
			IL_00ec: stloc.3
                        
                        // ...

			IL_016e: ldloc.s fx
			IL_0170: ldloc.s fy
			IL_0172: mul
			IL_0173: stloc.s f
                        
                        // ...

			IL_021f: ldloc.s dx
			IL_0221: ldloc.s dy
			IL_0223: mul
			IL_0224: stloc.s d
                        
                        // ...

Logical AND.

Here is the snippet…

                for (int n = 0; n < Count; n++)
                {
                    i = ix & iy; //repeated 20 times
                }

                Pulse();

                for (int n = 0; n < Count; n++)
                {
                    b = (byte)(bx & by); //repeated 20 times
                }

…and here is the timing:
(scope capture)

It is clear that a logical operation like AND takes almost the same time as an ordinary addition between Int32s. The interesting thing, instead, is seeing how different working with Int32 and Byte is.
Every .Net Framework operates on at least 32-bit operands (where possible it uses 64-bit). Thus, when you constrain your variables to a tiny byte, most operations will cast the values to Int32 and back. That takes much more time, and demonstrates why the habits inherited from small CPUs (where narrower means faster) are wrong in the .Net world.
Below are the parts of the IL that depict the operations:

                        // ...

			IL_004e: ldloc.s ix
			IL_0050: ldloc.s iy
			IL_0052: and
			IL_0053: stloc.3
                        
                        // ...

			IL_00e8: ldloc.1
			IL_00e9: ldloc.2
			IL_00ea: and
			IL_00eb: conv.u1
			IL_00ec: stloc.0
                        
                        // ...

Min/Max calculation.

Here is the snippet…

                for (int n = 0; n < Count; n++)
                {
                    i = System.Math.Min(ix, iy);
                    i = System.Math.Max(ix, iy);
                    // ... repeated 10 times
                }

                Pulse();

                for (int n = 0; n < Count; n++)
                {
                    i = ix < iy ? ix : iy;
                    i = ix > iy ? ix : iy;
                    // ... repeated 10 times
                }

                Pulse();

                for (int n = 0; n < Count; n++)
                {
                    i = ix; if (ix < iy) i = iy;
                    i = ix; if (ix > iy) i = iy;
                    // ... repeated 10 times
                }

…and here is the timing:
(scope capture)

Please bear in mind that the time scale here is 5× that of the charts above.

Using a library function is usually preferable: we should avoid “reinventing the wheel”, and most of the time a library function embeds native code and yields faster results. However, when the function is particularly simple, it may be better to choose another approach, as in this example.
The timings clearly show that calling the framework’s Min/Max functions takes about three times as long as using a trivial ternary-if. Even the third attempt at calculating the min/max yields no better results than the most trivial way.
Let’s have a peek at the IL assembly:

                        // ...

			IL_004e: ldloc.s ix
			IL_0050: ldloc.s iy
			IL_0052: call int32 [mscorlib]System.Math::Min(int32, int32)
			IL_0057: stloc.3
                        
                        // ...

			IL_013b: ldloc.s ix
			IL_013d: ldloc.s iy
			IL_013f: blt.s IL_0145

			IL_0141: ldloc.s iy
			IL_0143: br.s IL_0147

			IL_0145: ldloc.s ix

			IL_0147: stloc.3
                        
                        // ...

			IL_0264: ldloc.s ix
			IL_0266: stloc.3
			IL_0267: ldloc.s ix
			IL_0269: ldloc.s iy
			IL_026b: clt
			IL_026d: ldc.i4.0
			IL_026e: ceq
			IL_0270: stloc.s CS$4$0000
			IL_0272: ldloc.s CS$4$0000
			IL_0274: brtrue.s IL_0279

			IL_0276: ldloc.s iy
			IL_0278: stloc.3
                        
                        // ...

Sample expression.

Here is the snippet…

                for (int n = 0; n < Count; n++)
                {
                    d = ix * (fx + dx) * (fy + dy); //repeated 20 times
                }

                Pulse();

                for (int n = 0; n < Count; n++)
                {
                    d = ix; 
                    d *= fx + dx; 
                    d *= (fy + dy);
                    // ... repeated 20 times
                }

…and here is the timing:
(scope capture)

The timings show that an inline expression performs better than a sequence of compound assignments. That’s normal, because the compiler does exactly what the user wrote: it stores each intermediate result in the variable. That prevents the optimizations allowed by the inline syntax.
The IL opcodes demonstrate the longer task in the second case:

                        // ...

			IL_004e: ldloc.s ix
			IL_0050: conv.r8
			IL_0051: ldloc.s fx
			IL_0053: conv.r8
			IL_0054: ldloc.s dx
			IL_0056: add
			IL_0057: mul
			IL_0058: ldloc.s fy
			IL_005a: conv.r8
			IL_005b: ldloc.s dy
			IL_005d: add
			IL_005e: mul
			IL_005f: stloc.s d
                        
                        // ...

			IL_01ef: ldloc.s ix
			IL_01f1: conv.r8
			IL_01f2: stloc.s d
			IL_01f4: ldloc.s d
			IL_01f6: ldloc.s fx
			IL_01f8: conv.r8
			IL_01f9: ldloc.s dx
			IL_01fb: add
			IL_01fc: mul
			IL_01fd: stloc.s d
			IL_01ff: ldloc.s d
			IL_0201: ldloc.s fy
			IL_0203: conv.r8
			IL_0204: ldloc.s dy
			IL_0206: add
			IL_0207: mul
			IL_0208: stloc.s d
                        
                        // ...

Conclusion.

As a professional programmer, I am obsessed with well-written source code, patterns, good practices and so on. However, I also believe it’s useful to know when and how to put your finger on a program, to get the most out of it.
That is also a good programming practice, IMHO.

Led-matrix controller driven via SPI

Introduction.


Some time ago, Stanislav -a Netduino community user- posted a problem about driving a 6-by-4 led matrix using his Netduino. After some experimenting, he got stuck with the circuit, because a matrix must be multiplexed, and that’s not easy to solve.
Here is the link to the forum thread.

 

If you read the message exchange in the thread, you’ll easily collect a list of constraints. Here they are:

  • the leds have already been assembled (i.e. only the multiplex driver is needed)
  • the overall price should fall within 10 Euro
  • it must be handcrafted, thus no tiny parts (e.g. SMDs)
  • the multiplexing must not stop when the Netduino stops (to avoid led burnout)
  • the circuit should avoid complicated wiring, so that the PCB stays pretty simple
  • it must be reliable enough
  • finally, Stanislav asked to learn how to design such a circuit

It was clear that the Netduino alone wasn’t enough to drive a 6×4 led matrix: first, because of its inability to supply enough current for the leds, and secondly because of the relative slowness of the managed code running on it.

 

The problem in depth.

Lighting up a led is very simple. Given the power supply and the desired current flowing through the led as parameters, you calculate the resistor to put in series. A led needs from a few mA (SMDs) up to several hundred mA (or even more) for the high-power class.
Let’s face the multiplexing problem thinking of a normal discrete led, which needs 10 mA for a normal brightness.

So, what is a multiplex?
Multiplexing is a technique for driving many loads (e.g. leds) using a relatively low number of wires. Thinking of a 6×4 led matrix: instead of having 24 wires (one per led), the multiplexed way needs only 6+4 = 10 wires in total. The trick is enabling one column at a time, while issuing the related row pattern. If this process is fast enough, our eyes can’t perceive the scanning.
Now, let’s focus on one of the four columns, holding six leds: the scan process cycles over the columns, but each one is enabled for only 25% (i.e. 1/4) of the total cycle time. It means that to yield the same brightness as a led lit steadily with 10 mA, we should raise the current by a factor of 4, thus 40 mA. This current is beyond the upper limit achievable by a normal logic chip.
By the way, 40 mA is probably above the led’s continuous limit. However, the current flows only for a quarter of the cycle, so there’s no overheating *on average*. We only have to take care to *avoid* any break of the cycling, otherwise the 40 mA would flow for too long, and the led would blow.
That’s not all. When a column is enabled, its 6 leds might all be on (worst case). So, the total current flowing is 40 mA × 6 = 240 mA.
How much is the current of each row, instead? A row drives only the led of the currently enabled column, at 25% duty of course. That means the 40 mA seen above.
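The arithmetic above can be condensed into a couple of helpers (my own sketch, just to fix the idea; the names are hypothetical):

```csharp
using System;

static class MatrixCurrents
{
    //peak led current needed to match a steady brightness:
    //the duty cycle is 1/columns, so the pulse must be "columns" times larger
    public static int PeakLedMilliAmp(int steadyMilliAmp, int columns)
    {
        return steadyMilliAmp * columns;
    }

    //worst case through the enabled column driver: all leds of the column lit
    public static int ColumnPeakMilliAmp(int peakLedMilliAmp, int rows)
    {
        return peakLedMilliAmp * rows;
    }
}

class Demo
{
    static void Main()
    {
        //the 6x4 example: 10 mA steady, 4 columns, 6 leds per column
        int peak = MatrixCurrents.PeakLedMilliAmp(10, 4);        //40 mA
        int column = MatrixCurrents.ColumnPeakMilliAmp(peak, 6); //240 mA
        Console.WriteLine(peak + " mA per led, " + column + " mA per column");
    }
}
```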

 

My solution.

To solve this problem, I see three ways:

  1. using any small micro-controller (e.g. AVR, STM8, etc.), then creating a program both for multiplexing the matrix and for communicating with the Netduino. However, this solution still has the current-limitation problem, and could easily take the overall cost over 10 Euro.
  2. using an ASIC, such as the AS1108: this would probably keep the cost within the limit, but the current can’t exceed the chip’s maximum rating.
  3. creating the circuit the “classic way”, using simple logic chips that you can buy everywhere for a few Euro. This solution seems to have only one fault: low compactness. However, there are tricks to minimize the hardware, and compactness didn’t seem to be a constraint.

My choice was the third option, for several reasons: it’s easy to create, it teaches how to design a led-matrix driver, and it’s cheap. I’d add that it’s also pretty flexible, because you could change some components according to the led current. It’s even modular: you can (theoretically) create as many rows and columns as you wish, by simply adding a stage.
I had an old 7×5 led-matrix display: not so good, IMHO. It requires about 10 mA to light a led, but the brightness isn’t that high. Even raising the current to 20-25 mA, it doesn’t shine significantly better. I have several modern leds which would fit such a project much better, because with as little as 5 mA they shine a lot more than my matrix. However, I used the matrix for faster prototyping.
In my circuit I also swapped the rows/columns roles, but that changes nothing about the concept. It’s only for my convenience.

 

How it works.

The circuit is based on the famous 74HC595 shift register. A single register holds one column pattern, that is 7 leds. Since several shift registers can be chained, I chained 5 of them: one for each column. The software for loading a bit stream into the chained registers via SPI is trivial, and there are several libraries around, such as Stefan’s “Bit-shift shizzle”.
The Netduino has only one task: shifting the whole stream of bits into the five registers, using the SPI.
Afterward, the trick is playing with the /OE input of each register: when this line is high, all the register’s outputs are completely “detached” from the circuit. That is, we can parallel all the outputs, and enable one register at a time by leveraging the /OE behavior.
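As a sketch of the software side (my own illustration, not the actual project code): pack a 7-row by 5-column frame into five bytes, one per chained register, ready to be shifted out in a single SPI write. Bit order and active level are assumptions, to be adapted to the real wiring.

```csharp
using System;

static class FramePacker
{
    //frame[row, column] -> one byte per column (bit r = row r, MSB unused)
    public static byte[] Pack(bool[,] frame)
    {
        int rows = frame.GetLength(0);      //7 in my matrix
        int columns = frame.GetLength(1);   //5 in my matrix
        var stream = new byte[columns];

        for (int c = 0; c < columns; c++)
        {
            byte pattern = 0;
            for (int r = 0; r < rows; r++)
            {
                if (frame[r, c])
                    pattern |= (byte)(1 << r);
            }
            stream[c] = pattern;
        }
        return stream;
    }
}
```

On the Netduino, the resulting array could then be pushed out to the chained registers with a single SPI write.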

NOTE: the circuit shows only three registers for clarity.

The /OE signals must be cycled. To do that, I used a simple clock generator (NE555) and a 4017 (or 74HC4017), which is a Johnson counter.
The NE555 generates a square wave of about 500 Hz, which feeds the counter. The 4017 simply drives one of its outputs high at a time: at every clock edge the next output is pulled high, in a natural 10-step sequence. This sequence is also used for the column cycle, because the register enabling must be synchronized with the proper led-column activation. Since the matrix is composed of 5 columns, the 4017 sequence must be shortened to that count. To achieve this, simply wire the 6th output to the counter reset: as soon as the sequence hits the 6th output, its logic high resets the counter, taking the first output high again immediately.


Neither the shift-register circuit nor the sequencer presents any particular difficulty.
The complex section is the actual led driver, which has to amplify the current (while wasting almost no power).
There are two kinds of led-matrix pattern: common-cathode or common-anode. To clarify, let’s take the columns as reference: the column lines represent either the leds’ cathodes, or the leds’ anodes.


My display is a common-cathode one, while Stanislav’s case is about a common-anode, instead.

 

The led driver for columns sharing the cathodes.

The following circuit targets the current amplification for both the column and row lines.

NOTE: the circuit shows only few rows/cols for clarity.

The row signals come directly from the 74HC595 outputs, which aren’t powerful enough to drive the leds. Thus a PNP transistor (I used a BC640) per row amplifies the signal. The PNP is wired as common-emitter, so the registers have to drive their output low to activate a led row.
The ultimate goal for these transistors is driving them into saturation: that minimizes the voltage drop across emitter-collector. More voltage available for the leds, and less power wasted on the transistors themselves.
Notice that I didn’t use any resistor in series with the transistor bases. That’s because I wanted to maximize the base current, so that saturation is guaranteed. The register stress is relative, because -yes- the current is above the chip’s rating, but only for a short period of the cycle. We should always bear in mind the *average* behavior.

For the columns the amplification circuit is more complex.
That’s because every column transistor should saturate while sinking a current of 350 mA (or more). It’s worth noting that my matrix is 7×5: with a 1/5 duty the peak led current is 50 mA, so the column current is 7 × 50 mA = 350 mA (see the calculation above).
The BC639 NPN transistor is the complement of the BC640, and it’s rated up to 1 A (continuous). Its hFE (current gain) is rated at about 25 when the collector current is about 500 mA. That means a base current greater than 500 / 25 = 20 mA, to ensure saturation. This value is very close to the upper limit achievable by the 74HC4017, and furthermore the C-E drop would still be pretty high: the BC639 specs indicate VCE = 0.5 V at Ic = 500 mA and Ib = 50 mA. All that implies another pre-amplification stage.
The pre-amplification stage is a common-collector pattern: simple, yet good for raising the current. Please also notice that the transistor pair is NOT connected as a Darlington. The Darlington’s fault is that you cannot drive it into saturation, and that is exactly what we want.
I used a 1 kOhm resistor for the final-stage base, which is fine for a column current up to 500 mA. However, you could lower this value (e.g. 330 Ohm) wherever the required column current is greater.
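The sizing rule used above can be written down as a tiny helper (again my own sketch, with the article’s figures as the example):

```csharp
using System;

static class BjtSizing
{
    //minimum base current to saturate a BJT:
    //the collector current divided by the current gain (hFE)
    public static double MinBaseMilliAmp(double collectorMilliAmp, double hfe)
    {
        return collectorMilliAmp / hfe;
    }
}

class Demo
{
    static void Main()
    {
        //BC639 figures from the text: Ic = 500 mA, hFE ~ 25 -> at least 20 mA
        Console.WriteLine(BjtSizing.MinBaseMilliAmp(500, 25) + " mA");
    }
}
```

Since a 74HC output cannot comfortably source 20 mA, the number itself explains why the pre-amplification stage is needed.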

As you know, this driver is designed to pull the columns to ground in order to light the leds. So, when a 74HC4017 output is high (just one at a time), the related transistor-pair stage shorts the column line to ground.
But…remember? We also need to enable the related shift register, so that the bit pattern is issued on the rows. The same column signal is also used for activating the /OE of the 74HC595. Since, when the column is not active, nothing would take /OE high, there’s an additional pull-up (2.2 kOhm) to achieve that. However, the presence of this pull-up doesn’t affect the matrix behavior in any way.

 

The led driver for columns sharing the anodes.

This is the case of Stanislav, and the circuit looks a little simpler than the above.


The considerations about the current flowing through the transistors are the same as before. The only difference is in the polarity, which has to be reversed.

 

Power supply.

NOTE: this section is very important. An improper wiring may lead to an unexpected or quirky behavior of the circuit.

There are three sections involved in the whole project: the Netduino, the logic (shift-registers and the sequencer), and the drivers (rows’ and columns’). All the grounding must be carefully shared: the logic can be powered from the Netduino +5V supply, but the driver can’t.
The drivers (i.e. the leds) need a lot of power, which must be supplied separately.
You should observe this pattern for supplying the various power sources:


The circuit should be laid out as two sections: the logic and the drivers.
The two grounds (the Netduino’s and the leds’ power) should be joined at a single point between the two sections, so that the two current loops are kept separate.
As stated, the positive leads of the two supplies must not be connected together.

 

Modularity.

As stated in the beginning, the circuit is clearly modular.
The 74HC595 offers up to 8 outputs (i.e. rows), so you can add a transistor and a led strip with ease. Since the registers are already chained, it’s also not hard to double the chain and reach 16 rows or even more.
Much the same applies to the columns: the 74HC4017 yields up to 10 outputs (i.e. columns). Just add a transistor-pair stage and the leds.
In this case it is a bit more difficult to expand beyond ten column outputs, but not impossible at all. I’ll avoid any description here, unless explicitly requested.
Modularity yields a relatively easy wiring of the PCB, or any other concrete implementation.

 

The prototype.

NOTE: the circuit scans the led-matrix automatically, even when the Netduino is halted or detached. This should prevent over-current through the leds, and facilitates the debugging of the software application.

Here are some pictures of the prototype, built across two (!) bread-boards.

The whole prototype seen from above.

 

Detail of the 5×7 led-matrix

 

Detail of the five shift-registers (74HC595) chained.

 

Detail of the clock generator, and the 74HC4017 (i.e. the column scanning)

 

Detail of the seven rows’ drivers (BC640)

 

Detail of the five columns’ drivers (2 x BC639)

 

The demo program.

Below is the complete source code used for the demo; nothing else is required.

    /// <summary>
    /// Sample application for testing the led-matrix driver
    /// </summary>
    /// <remarks>
    /// NOTE: it's important to bear in mind that the circuit
    /// uses a negative logic, thus a logic '1' means a led off.
    /// </remarks>
    public class Program
    {
        //define the bit-buffer as mirror to the 'HC595 chain
        private static byte[] _buffer = new byte[5];

        //define some bit-masks, just for improving speed
        private static int[] _mask0 = new int[8] { 0xFE, 0xFD, 0xFB, 0xF7, 0xEF, 0xDF, 0xBF, 0x7F };
        private static int[] _mask1 = new int[8] { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 };


        public static void Main()
        {
            //fill the whole buffer with logic '1' (turn all the leds off)
            for (int i = 0; i < 5; i++)
            {
                _buffer[i] = 0xFF;
            }

            //defines the first SPI slave device with pin #10 as SS
            var cfg595 = new SPI.Configuration(
                Pins.GPIO_PIN_D10, // SS-pin
                false,             // SS-pin active state
                0,                 // The setup time for the SS port
                0,                 // The hold time for the SS port
                false,             // The idle state of the clock
                true,              // The sampling clock edge (this must be "true" for the 74HC595)
                1000,              // The SPI clock rate in KHz
                SPI_Devices.SPI1   // The used SPI bus (refers to a MOSI MISO and SCLK pinset)
            );

            //open the SPI port
            using (var spi = new SPI(cfg595))
            {
                //set the initial ball's position
                var ball1 = new Point();
                ball1.X = 2;
                ball1.Y = 5;

                //set the initial ball's speed
                var speed1 = new Point();
                speed1.X = 1;
                speed1.Y = 1;

                //endless loop
                while (true)
                {
                    //clear the led where the ball now is
                    SetPixel(ref ball1, false);

                    //move the ball accordingly to its speed
                    ball1.X += speed1.X;
                    ball1.Y += speed1.Y;

                    //check for the display "walls"
                    //NOTE: it's a rect having width=5, and height=7
                    if (ball1.X > 4)
                    {
                        ball1.X = 4;
                        speed1.X = -1;
                    }
                    else if (ball1.X < 0)
                    {
                        ball1.X = 0;
                        speed1.X = 1;
                    }

                    if (ball1.Y > 6)
                    {
                        ball1.Y = 6;
                        speed1.Y = -1;
                    }
                    else if (ball1.Y < 0)
                    {
                        ball1.Y = 0;
                        speed1.Y = 1;
                    }

                    //light the led at the new ball's position
                    SetPixel(ref ball1, true);

                    //copy the bit-buffer to the 'HC595 chain
                    spi.Write(_buffer);

                    //wait a little
                    Thread.Sleep(120);
                }
            }

            //useless in this case, but better than missing it!
            Thread.Sleep(Timeout.Infinite);
        }


        /// <summary>
        /// Simple helper for setting the state of a pixel
        /// </summary>
        /// <param name="pt">The pixel coords</param>
        /// <param name="state">The desired led state (true=on)</param>
        /// <remarks>
        /// The "ref" yields better performance than
        /// passing two parameters (X,Y) separately
        /// </remarks>
        private static void SetPixel(
            ref Point pt, 
            bool state)
        {
            if (state)
            {
                //the led should be turned on,
                //then let's clear the related bit
                _buffer[pt.X] &= (byte)_mask0[pt.Y];
            }
            else
            {
                //the led should be turned off,
                //then let's set the related bit
                _buffer[pt.X] |= (byte)_mask1[pt.Y];
            }
        }

    }


    /// <summary>
    /// The basic point structure
    /// </summary>
    public struct Point
    {
        public int X;
        public int Y;
    }

Here is a short video demonstrating the circuit prototype.
The Netduino runs a small program for bouncing a ball.

Enjoy!

Netduino SPI: “S” is for “Speed”

Introduction.

One of the most used peripherals of the Netduino microcontroller is the SPI. Its simplicity together with its speed makes it a very good medium to exchange data between the microcontroller (thus your program) and other external devices. The market is full of shields and circuits basing their data exchange on SPI, and a vast series of ICs are SPI-ready, making the logic connection a pretty easy task.
There are several articles, tutorials and programs demonstrating how to use the SPI: most of them are related to some specific shield. That’s not what we are talking about here; instead, it is interesting to point out how to interface our own logic while obtaining good performance.

Using the SPI.

Many times we need to expand the number of I/Os of our board. Sometimes we need a solution to realize a parallel transfer, since the base framework does not offer that feature.

A simple solution to achieve that is using a normal shift-register chip. As the name suggests, a shift-register takes just one bit at a time as input, and shifts it along a byte-register (8 cells). The shift is not automatic: it must be driven by the “clock”, where each clock event means a one-bit shift. There is no constraint on the number of clock pulses applied: the data simply overflows, so we must take care to issue the exact number of clocks.
This brief description depicts a “serial-in-parallel-out” logic, but there are “parallel-in-serial-out” and other flavors as well.
All this is well described as “synchronous communication”. Vice versa, an “asynchronous communication” (e.g. the UART) relies on the implicit matching of the data rates: if they don’t match, the devices involved don’t understand each other.
When we have to connect chips together (i.e. over very short distances) the synchronous choice is surely better: it offers high throughput, reliability, frequency independence and a lot more. Its price is a small amount of logic and a certain number of lines to manage it.
When is a synchronous choice ruled out? When we need to exchange data over relatively long distances, or when we cannot afford the cost of many wires to connect the devices.
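As a mental model, the serial-in-parallel-out behavior can be sketched in a few lines of C#. This is purely illustrative (the real 74HC595 is clocked in hardware); the class and member names are mine:

```csharp
// Tiny software model of a serial-in-parallel-out shift-register,
// just to visualize the shift-and-latch behavior (not firmware)
class ShiftRegister8
{
    private byte _shift;                      // internal shift stage
    public byte Output { get; private set; }  // latched parallel output

    // one clock pulse: every bit moves one place, the new bit enters as LSB
    public void Clock(bool dataBit)
        => _shift = (byte)((_shift << 1) | (dataBit ? 1 : 0));

    // the latch pulse copies the shift stage onto the parallel outputs
    public void Latch() => Output = _shift;
}
```

Clocking in the eight bits of 0xB1 MSB-first and then latching makes Output read 0xB1; without the latch pulse the parallel outputs never change.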

Visual devices connected via SPI.

When we begin playing with the Netduino, the first experiences are driving leds, lcds and other “visual” stuff. That’s absolutely normal, because the visual way is the most intuitive and direct solution for having concrete feedback on what our program is doing.
Why should the SPI be involved with visual devices?
Well, just after our “hello-world” program, able only to blink a led, we would try connecting two, three or more leds. Our Netduino luckily provides many I/O ports, so it is very easy to add even a dozen leds. But what’s the sense of wasting all your precious ports just for driving leds?
The Mighty Stefan was among the first to run this “gold rush” around the shift-registers. He enjoyed driving 8 leds through a shift-register so much that he wanted to go further, chaining several chips as a cascade of bits.

As I write, I know he has connected up to 5 shifters linked together.

However, my very first attempt to use a shift-register with my Netduino was for driving an LCD module: I found reading text much more exciting than watching a psychedelic game of leds. Szymon presented a very good tutorial on how to interface a common 16×2 LCD module using a 74HC595 shift-register. It worked on the first run.
I would point out that here we will talk about the 74HC595 chip, but there is *nothing* specific about it. This model is often used due to its versatility, and it is preferable to keep the discussion on well-known devices, so everyone can test things easily.

Where is this article going?

Well, the “problem” might not be an actual problem; it depends on what we are looking for. However, there are situations where the simple connection of a shift-register doesn’t solve our problems, or where the data speed turns out to be far from the “megabits-promise” of the manufacturer. The microcontroller specifications say that the SPI could reach even the processor clock, but that sounds much more like a theoretical value than an effective data-rate.
We will analyze two hardware solutions to connect a shift-register: one is trivial, the other is more sophisticated, but it offers a much higher performance.
A generic task will be considered in both contexts: the program should transfer a series of bytes to an external device, but each byte must be “notified” to the target consumer. For example, we may suppose we are feeding a parallel DAC, where each 8-bit sample must be latched onto the converter, so that the analog output is set accordingly. Please consider this example as merely illustrative: several DACs have an SPI interface built-in.
The SPI clock frequency is the same for both circuits, and it has been chosen as 2 MHz.

The easy way.

Let’s first analyze how the simplest and most common connection of a 74HC595 shift-register works.
Here is the sample code for the test. Our goal is to transfer a series of bytes to the register as fast as possible. The buffer transfer is repeated indefinitely, with only a short pause between cycles. During a buffer transfer we should expect a byte-rate of approximately 2M / 8 bits = 250 KBytes/s.
Will we be able to reach that?

    public class Program
    {
        private static SPI SPIBus;

        public static void Main()
        {
            // Defines the first SPI slave device with pin 10 as SS
            SPI.Configuration Device1 = new SPI.Configuration(
                Pins.GPIO_PIN_D10, // SS-pin
                false,             // SS-pin active state
                0,                 // The setup time for the SS port
                0,                 // The hold time for the SS port
                true,              // The idle state of the clock
                true,              // The sampling clock edge (this must be "true" for the 74HC595)
                2000,              // The SPI clock rate in KHz
                SPI_Devices.SPI1   // The used SPI bus (refers to a MOSI MISO and SCLK pinset)
            );

            // Initializes the SPI bus, with the first slave selected
            SPIBus = new SPI(Device1);

            DoWorkSlow();
            //DoWorkFast();
        }

        /// <summary>
        /// Send 8 bytes out to the SPI (one byte at a time)
        /// </summary>
        private static void DoWorkSlow()
        {
            //set-up a one-byte buffer
            byte[] buffer = new byte[1];

            while (true)
            {
                for (int i = 0; i < 8; i++)
                {
                    buffer[0] = (byte)i;
                    SPIBus.Write(buffer);
                }

                Thread.Sleep(5);
            }
        }

    }

Note: part of the code was “borrowed” from Stefan’s tutorial on SPI, in the Netduino wiki section.

The code clearly shows a computational overhead, because there’s no way to send a single byte directly to the SPI driver. Instead, we must create a single-byte buffer and then populate its only cell with the desired value.
Another fault of this approach is that we *must* take care of sending every single byte, time that could be spent doing other, more useful work.
Here is the logical timing sequence. Note that on this chart the time proportions are not respected.

On every byte (i.e. every Write call), the SSEL line (Slave Select) falls and stays low until the stream is over. Since the “stream” is just one byte, the SSEL rises after the 8th bit.
The clock (SCLK) pulses for 8 periods. The data (MOSI) is shifted out of the Netduino on each clock falling edge. That is because the 74HC595 samples the data (MOSI) when its clock input rises: to avoid any misinterpretation of the data, the best thing is keeping the data perfectly stable during the SCLK rising edge.
The rising edge of the SSEL line is also used to latch the byte shifted on the 74HC595 parallel output.
All that does what we expect, but…what is the real timing?

It is easy to see the 8 single-byte transfers, separated by the pause. It seems that the real delay is almost 6 ms instead of 5, but maybe this is not a problem.
Much more interesting is measuring the time that elapses from the beginning of one byte to the next. For example, this time could be taken as the period between two consecutive rising edges of the SSEL line.
The scope shows 368 us, that is about 2.7 KBytes/s: around 100 times slower than the expected rate!
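To put numbers side by side, here is the arithmetic behind the expected and measured rates (an illustrative calculation using the values from the text; the names are mine):

```csharp
// Turning the SPI clock and the scope measurement into byte-rates
static class Throughput
{
    // theoretical byte rate: SPI clock (Hz) divided by 8 bits per byte
    public static double Expected(double sclkHz) => sclkHz / 8.0;

    // actual byte rate: reciprocal of the measured per-byte period
    public static double Measured(double bytePeriodSeconds) => 1.0 / bytePeriodSeconds;
}
```

Expected(2e6) gives 250,000 bytes/s, while Measured(368e-6) gives roughly 2,700 bytes/s: the ≈100× gap measured above.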

Finally, here are both the schematic and the breadboard layout, for those who want to build and test this circuit.

The smart way.

The program is almost the same as before, yet much simpler and more efficient, because the buffer is sent “as-is”. Our application doesn’t care about how the bytes are shifted out: it feels like heaven!
Here is only the part that differs from the code above.

        /// <summary>
        /// Send 8 bytes out to the SPI (the whole buffer at once)
        /// </summary>
        private static void DoWorkFast()
        {
            //set-up the desired buffer
            byte[] buffer = new byte[]
            {
                0,1,2,3,4,5,6,7
            };

            while (true)
            {
                SPIBus.Write(buffer);
                Thread.Sleep(5);
            }
        }

Please, bear in mind that the SPI clock frequency is still the same as before, so the expectation is always a throughput of 250 KBytes/s.


The logical timing sequence of the SPI is similar to the one-byte transfer, but now there is a problem: the SSEL line rises only at the end of the last byte. How could the preceding bytes be latched onto the 74HC595 register?
We must add some logic to help the circuit. Here is the revised logical timing sequence.

The extra logic must be able to provide a latch clock to the 74HC595 exactly after every 8th bit. To achieve this, we need a counter; a normal 3-stage counter will do, because we only need to count up to 8. This counter should also “trigger” some other logic so that we obtain a pulse, whose rising edge can finally latch the data on the register.

The counter is a 74HC4040. It is a 12-stage binary counter that increments its value on every falling edge of the clock, and can be reset by pulling the related pin high. In fact, the reset input is connected to the SSEL line, so that when the transfer begins we are guaranteed the counter is zero.
The SCLK from the Netduino must feed the counter, but we need to invert it. Remember? The 74HC595 shifts on the rising edge of the clock, so the counter must increment at the same time. To invert the SCLK line I used a 40106: it is an old CMOS part, not as fast as an HCMOS, but it embeds 6 inverters with Schmitt-trigger inputs, and it allows a very wide power supply range.
Here is the detail of the SCLK (light blue) and the output of the first stage of the 74HC4040 (yellow). Note that the propagation delay from the rising edge of the SCLK to the output change is almost comparable with half of the clock period. Not a good thing: we should have chosen some chip performing better than the 40106.

The 3-stage counter starts at 000, then 001, then 010, up to 111, then rolls back to zero. We will take advantage of the third bit, because it falls from 1 to 0 just after 8 clocks. At this point, the only problem is to create a pulse triggered by the falling edge of the counter output.
The pulse generator is realized by a small analog circuit (R, C and a diode), along with a couple of 40106 inverters.
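Before looking at the analog details, a tiny C# simulation confirms that the third counter bit falls exactly every 8th clock (illustrative only; the names are mine):

```csharp
using System.Collections.Generic;

// Software model of the 3-stage counter trick: Q2 (the third bit)
// falls from 1 to 0 exactly on every 8th clock
static class LatchCounter
{
    public static List<int> FallingEdgesOfQ2(int clocks)
    {
        var edges = new List<int>();
        int counter = 0;
        bool prevQ2 = false;
        for (int clk = 1; clk <= clocks; clk++)
        {
            counter = (counter + 1) & 0x7;      // 3-bit counter rolls over after 111
            bool q2 = (counter & 0x4) != 0;     // third stage output
            if (prevQ2 && !q2) edges.Add(clk);  // falling edge -> latch pulse
            prevQ2 = q2;
        }
        return edges;
    }
}
```

FallingEdgesOfQ2(24) returns 8, 16, 24: one latch pulse per transferred byte, which is exactly what the 74HC595 needs.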

  • Consider the Q2 output of the 74HC4040 at logic level “1” (+5V), just before it drops to zero. Across the capacitor the voltage is zero, because there is +5V on the counter side, and +5V on the TRIG point, pulled up by the resistor.
  • As soon as Q2 falls to zero, the capacitor keeps its voltage drop, bringing the TRIG point toward zero Volts as well.
  • However, current flowing through the resistor charges the capacitor, so the voltage on the TRIG point begins to rise with an exponential shape.
  • Finally, the Q2 output switches back to the high level. Again, the capacitor tries to keep its voltage drop, pushing the TRIG point above +5V, but the diode limits it.


The Schmitt-trigger inverters are needed because the input signal is analog, and it is best conditioned properly, to prevent unpredictable behavior during the transitions. Two inverters in cascade mean no logical inversion at all: we only need the Schmitt-trigger capability.
Here follows the detail of the pulse following the 8-clock sequence. Again, note the not-so-sharp fall and rise of the pulse, due to the poor performance of the 40106.

The “smart” (and complex) solution has been built. What about the data transfer performance?
Here is the picture showing the buffer transfers separated by the Sleep pause. Again, the spacing is closer to 6 ms than 5, but that’s not what we are looking for.

Here is the interesting chart: the scope clearly shows a huge speed increase between one byte and the next. The actual byte period is about 5.5 us, which is over 180 KBytes/s. This is still far from the theoretical value of 250K, but it seems the best the Netduino can do (keeping the clock frequency stable at 2 MHz).

Conclusions.

Using this pretty simple extra logic we are able to hugely increase the actual performance of the Netduino SPI. An analog trick is not the most reliable way to manage digital circuits, but sometimes it is acceptable.
This article was a halfway step, explaining a technique to improve the SPI throughput that my next project will need.
Stay tuned!

Renovating a Collada viewer application – Part I

Introduction.

The community around Google SketchUp offers a very large number of good-looking 3D models, and it is also very easy to create new models. Several years ago I considered the idea of embedding these models, available in the Collada (TM) format, in WPF applications. So, in a very short time, I made a simple program to convert this kind of model so that it could be viewed in a WPF window.
The experimentation goal was reached, but the project stopped there.
Nowadays I am again considering the idea of 3D for some new projects, but I have realized that the old Collada converter is too messy and rigid to be a serious component.
I’ll take advantage of this to show you the renewal of this old program, splitting the evolution into several parts, so that it will be clear how and where to apply any improvement. This is not for the sake of the viewer itself: rather, it is interesting to emphasize the usefulness of good programming practices, and to learn the use of the right tools.

The Collada format.

Collada is an acronym deriving from Collaborative Design Activity, and it defines a file format able to describe whole 3D scenes, even with animations and physics in the latest standard specifications.
I will avoid talking about the history and the tech specs of Collada, and I invite you to visit the related portal. Instead, from my viewpoint, I find it useful to highlight that it is an XML-based format. Moreover, its schema is very well oriented toward the common 3D-engine modeling specs: after all, it was born just for this target.
Furthermore, it is not my goal to dig into the programming techniques around DirectX or OpenGL; I assume instead that the reader has a minimum of knowledge about the Media3D section of WPF.

COLLADA and the COLLADA logo are trademarks of the Khronos Group Inc.

The first viewer.

Opening the old converter-viewer program, the very first thing you notice is that it is a bunch of classes, all mixed together, doing exactly what is expected of them: no more, no less.
It is a simple WPF application project, containing a parser, a converter and a 3D viewer. Everything is somewhat bound together, where the mess is far more noticeable than any ability to keep the sections separated, for reuse and future expansion.
There are several limitations as well: the parser is able to interpret only a small part of the specifications. Moreover, the program must be fed with a compressed (zip) Collada source, because the source model often includes some bitmaps.
The comments are almost missing, and that makes the comprehension of the program sources very hard after years of inactivity.

So far, we may say that the program works well, and it has shown that leveraging the Collada models together with the WPF framework was on target. However, we are quite far from considering this experiment as something reusable and professional.
It is much like experimenting with an electronic circuit on a bread-board: we make it work quickly, but it is not a definitive application.
The logical data-flow is extremely simple: the Collada source file must be in a compressed (zip) form, and is first parsed into an XML DOM. From this XML model, the “parser” section builds an intermediate Collada model. The “builder” section then takes the intermediate model and builds a 3D scene in WPF, by programmatically creating several Visual3D instances and grouping them together.

The graphic presentation of the 3D object is realized by a normal Viewport3D control. There are also some helper tools to facilitate the three-dimensional viewing. I am not a 3D-editor expert, but I noticed that the interaction offered by Google SketchUp is particularly easy and effective. So I decided to take inspiration from it and mimic its navigation, giving the user three degrees of freedom and the mouse to move around.

The parser and the builder.

As stated above, the source Collada document must be in the compressed (zip) form, i.e. a compressed archive containing an XML document, together with any number of bitmaps used as textures. The extraction of these files from the source archive is made automatically, in memory, thanks to the wonderful SharpZipLib library, part of the awesome SharpDevelop project.
The very first step is to load the XML document into an XLinq DOM. I prefer XDocument over the classic XmlDocument, because it is much more effective and straightforward, plus it makes it possible to take advantage of XLinq. Before using C#, I used XmlDOM (COM) in Visual Basic 6 for years, and it is a very good library anyway, especially when used in conjunction with XPath.

The intermediate Collada model is built by scanning the XML tree and instantiating classes upon a simplified schema. To do this, just invoke the static method LoadModel of the ColladaParser class: it returns a context, which is the container of the several resources composing the overall intermediate Collada model.
At this point the role of the parser is over.
To generate the WPF 3D model, just call the static method CreateModel of the ColladaBuilder class. This function creates a ModelVisual3D instance, based on the intermediate Collada model. The final step is the insertion of that instance into the WPF viewport hosted by the application.

The most careful readers may have noticed an extra step, which could apparently be avoided.
From the Collada XML source (an XDocument) an intermediate Collada model is produced, and only then the definitive WPF model. Passing through an intermediate model looks unnecessary; however, it offers some ease and was faster to develop as an experiment.
The intermediate model has a great advantage as well.
Suppose we start from a relatively complex source containing several bitmaps: duplicating the same model many times on the target viewport could lead to useless and costly processing. The bitmap extraction itself has a cost, maybe not a relevant one, but repeating the same operation several times is surely a waste of resources. In this way the intermediate model is a concrete trick to avoid wasting CPU time, because the costly operations, such as extraction and texture composition, are done only once, during the parsing phase. Replicating the WPF 3D model any number of times is a relatively cheap task, done by the builder section, because it only has to “assemble” various parts together as a bunch of ModelVisual3D instances.
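The parse-once/build-many flow can be sketched as follows. Only the class and method names (ColladaParser.LoadModel, ColladaBuilder.CreateModel) come from the project; the exact signatures and the surrounding variables (sourceStream, viewport) are assumptions for illustration:

```csharp
// Sketch of the parse-once / build-many flow described above
// (assumed signatures; not the project's actual code)
var context = ColladaParser.LoadModel(sourceStream);    // costly: zip extraction,
                                                        // XML parsing, textures

for (int i = 0; i < 10; i++)
{
    // cheap: each call only assembles ModelVisual3D instances
    var visual = ColladaBuilder.CreateModel(context);
    viewport.Children.Add(visual);                      // the WPF Viewport3D
}
```

The point is that the expensive work lives entirely behind LoadModel, so duplicating a model on the scene costs almost nothing.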

It is worth noting that the intermediate model can be managed by the builder with an option. During the 3D surface building pass, that property allows choosing whether to render both the front and the back face, instead of the front only. I was not able to understand how the Collada specs indicate this detail, so I decided to add the choice and let the user decide.

Any 3D surface is modeled with many triangular planar sections (facets). Of course, being a planar section, any facet has two faces. During the building pass, it is necessary to specify the vector orthogonal to the plane (at the triangle vertices). The direction of this vector, along with the vertex ordering, determines the “front” face, which is textured by default. The opposite (back) face is normally not considered, and in the viewport it would be seen as non-existent (i.e. fully transparent).

This peculiarity, if not managed properly, can lead to noticeable imperfections in the rendering. That is why I decided to insert an option. Fortunately, the WPF libraries simplify this kind of management a lot, because the GeometryModel3D class has a property for the front “material”, and another one targeted at the back face.

The viewer.

The viewer itself is the Viewport3D control included in the WPF libraries. We are going to describe what kind of helpers have been built around that control, to ease the 3D object-moving interaction between the human and the machine.
It must be noticed that this program does not allow modifying most of the parameters of the 3D scene, such as the lights and the background. The only thing the user can do is move the camera around the space, using the mouse.
As briefly stated, the only way to interact with the program is the mouse, and that is a clear limitation: the keyboard could be useful, and touch as well, wherever the monitor supports it.
The three fundamental functions allowed for the interaction are:

  • orbit
  • zoom
  • pan

The orbit movement lets the user control the inclination of the camera with respect to the “floor”.
The zoom moves the camera along the direction of the observer.
Finally, the panning allows the user to move the camera anywhere on the floor plane, the Y-axis being considered orthogonal to that plane.

Even in this case several limitations of the program are noticeable. For example, it is not possible to spin the camera around the observer’s direction; also, there is no easing for moving the camera around a particular point in space (typically an observed point).
The most important limitation is that the parser takes no care of the axis-system declared by the Collada document: this often leads to an upside-down rendering of the model.
The program was born as an experiment and it gives pretty good results, but there are many features that should be added. It should offer the ability to manage the lighting, to move any single model instead of the camera, and so on.
All these gaps should be seen as an input for a dramatic and careful revision of the project, making it able to easily accommodate extensions and component reuse.
A surgeon cannot operate on a patient without having a clear idea of the problem.
It is necessary to start from a careful analysis of the current application, and then set some targets to reach, step-by-step, in the future releases.

Code analysis with NDepend.

I want to emphasize once again the aim of this series of articles: the real goal is to show how to write good code, stable and reliable. It is perfectly reasonable for an application to evolve and become more complex, but that cannot be an excuse to turn the source into spaghetti code that is painful to maintain.
A very interesting tool for code analysis is NDepend.
NDepend easily performs several different kinds of analysis on our code, and on assemblies as well, leveraging a very smart core feature: CQL (Code Query Language). CQL is the Columbus' egg of code analysis, because it treats the sources of our projects as if they were a database, against which we can run queries in a SQL-like fashion.
This simple-to-use, yet effective, engine computes a large number of code metrics; it represents the code structure graphically in several ways, along with its complexity and even any cyclic dependencies between classes. This is only an overview of the features of NDepend, and I invite you to browse them carefully on its home site.
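As a taste of the syntax, here is a hypothetical CQL query (not one taken from the viewer's report; the metric name is a standard NDepend one, but the threshold is purely illustrative) which reads much like SQL:

```sql
// list the ten longest methods of the code base
SELECT TOP 10 METHODS WHERE NbLinesOfCode > 200 ORDER BY NbLinesOfCode DESC
```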
For now, it is interesting to run the analysis on the Collada viewer code, so as to understand where the potential structural problems are.
NDepend installs as an add-in for the ordinary Visual Studio IDE, but it may also be used stand-alone, e.g. by Visual Studio Express users. The following picture shows the main screen of Visual NDepend, the stand-alone version.

Analyzing our viewer code is straightforward: just pick “Analyze VS solutions” and browse to the proper Visual Studio solution.
The analysis process takes only a few seconds, and it ends by creating a useful HTML report, shown automatically in your favorite browser. The report summarizes the overall results of the many CQL queries performed on the code, giving an at-a-glance overview of the project structure.

The “CQL Rules summary” section summarizes the overall results of all the queries performed. Note that every project is subjected to over 100 different tests. In the specific case of our Collada viewer there are no critical errors, but 26 warnings are listed.
At this point it becomes really interesting to dig deeper, to understand where the problems are and how to solve them. To do this, let's close the report and take a look at the NDepend main screen.
This screen surely makes a strong impression, full of colored panes, but it may also scare novices. However, the environment is very well documented, so you will gain familiarity with it quickly.
Let’s go step-by-step.
In the lower part of the NDepend screen there is the “Query explorer” pane, showing all the planned queries along with their result status.

Our test yields several warnings, even though the overall result is not so bad. For example, there is a clear indication of the scarcity of comments, and good documentation is a very important task to consider in the development process.

As seen above, each test performed is a CQL query. In the left pane of the NDepend window we may read how these queries are written, along with a brief description of their meaning and expectations. From my viewpoint, CQL is an extremely powerful and versatile feature, but it requires a bit of familiarity with common code-quality analysis practices.
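For instance, the comment-related warning mentioned above comes from a rule shaped roughly like the following (rewritten from memory of NDepend's built-in rules; the exact thresholds may differ from the shipped ones):

```sql
// warn when non-trivial methods are poorly commented
WARN IF Count > 0 IN SELECT METHODS WHERE
  PercentageComment < 20 AND NbLinesOfCode > 10
```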

We already know about the mess in the Collada viewer sources. Now it's time to take a peek at the dependency matrix.

A colored box indicates a dependency between the types of the related row and column. A blue box means that the column type refers to the row type; vice versa for a green box. The number indicates how many dependencies that pair of types involves.

We may see that two rows/columns (ColladaParser and ColladaParserContext) are particularly dense with references compared to the others. That is good news, because it indicates that most of the data are “flowing through” these classes. A certain separation between logical layers indicates pretty good abstraction, even though we are examining a single bunch of classes that offers several different functions.
The bad news is that there are two cases of cyclic dependency. In other words, two different types refer to, and depend on, each other. This is something that should be avoided, because it is a barrier to good abstraction, and thus to the separation and reuse of components.
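To make the problem concrete, here is a minimal C# sketch of such a cycle and one common way to break it, by extracting an interface. The type and member names are invented for illustration and do not reflect the viewer's actual sources:

```csharp
using System.Collections.Generic;

// A cyclic dependency would look like this:
//     class ColladaParser        { ColladaParserContext Context; }
//     class ColladaParserContext { ColladaParser Parser; }
// One common fix: extract an interface, so the dependency flows one way.

public interface IParserContext
{
    // the parser only needs this narrow contract, not the whole context type
    void Register(string elementName);
}

public class ColladaParserContext : IParserContext
{
    private readonly List<string> _elements = new List<string>();

    public void Register(string elementName)
    {
        this._elements.Add(elementName);
    }

    public int ElementCount
    {
        get { return this._elements.Count; }
    }
}

public class ColladaParser
{
    private readonly IParserContext _context;

    // the parser depends on the abstraction only; the context
    // no longer holds any reference back to the parser
    public ColladaParser(IParserContext context)
    {
        this._context = context;
    }

    public void Parse()
    {
        this._context.Register("geometry");
    }
}
```

With this arrangement, ColladaParser depends only on the IParserContext abstraction, and ColladaParserContext no longer needs any reference back to the parser, so the dependency flows in one direction only.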

The dependency matrix itself can also be seen as a graph. This may look like the most intuitive way to observe the classes' dependencies, but it is also true that the graph becomes increasingly hard to read due to the huge number of links.

At any time it is possible to select a class (type) by clicking on it, so that it is highlighted along with its dependencies. Please note the cyclic-dependency cases on the graph: they are connected by a reddish bidirectional link.

It is also possible to view the complexity of the code structure through the tree map.

Conclusions.

Next time we will see how to approach a similar Collada viewer application, bearing in mind flexibility, abstraction, and good programming practices. The goal is to build a viewer able to grow in functionality, without falling into a rigid implementation that is expensive to maintain.

Here is the source of this application:
Highfield.ColladaViewer.doc
(Remember to change the .doc extension to .zip)