A simple watchdog for the Netduino

What if you write your own Netduino application, and an error halts it unexpectedly?
First answer: no problem, I’m in front of the debugger, and I’ll have the complete stack trace of the exception.
Second answer: it’s a nightmare! I’m on holiday for the weekend, and my Netduino-based sprinkler system looks frozen… I’ll have to ask my neighbor to reset the board or, better, to water the grass.

The nightmare.

Obviously we are interested in the second answer or, better, in how to avoid such a situation.
When the hardware is damaged or the software is buggy, there aren’t many ways to regain control of the board. However, there are many, many situations where our board works perfectly (maybe for weeks) without any problem at all. As soon as it is moved “on-site”, the first problem happens.
Yeah, it’s much like “proudly works on my PC”!
Unfortunately, our customers rarely have such a sense of humour, and don’t “understand” what there is to be proud of. I wonder why…
Since none of us would like to be woken at 3 AM by a system halt, or forced to go back home because the sprinkler did not water properly, we can improve reliability by adding a so-called “watchdog”.
It’s a very simple way to protect the system from undesired halts, but it does not cure every ill. On the contrary, it’s more of an “extrema ratio”, a last resort for something we really can’t foresee.

Good programming practice, along with good hardware, is the must-have foundation for most of the reliability of any system.

Think seriously about it.

Courage the cowardly dog - by John R. Dilworth
Artwork courtesy of EspionageDB7

A simple watchdog.

I don’t know who invented the name “watchdog”, but it seems clear (at least to me) that there are two different subjects:

  • the dog, which is the controller;
  • the house, being watched over by the dog.

Now, the system that may fail is the house, and the dog lives “outside” the house.
That’s obvious, because if the system hangs, how could it come to its own rescue?
It’s so obvious that many people still ask for a pure-software solution, maybe using a separate thread as the controller.
Yet there are so many situations that can make an MCU hang that anyone should be led straight to an external solution.
Some examples:

  • under- or over-temperature;
  • voltage spikes (both above the supply and below ground);
  • strong noise in general (especially when long wires are connected directly to the MCU pins);
  • software bugs;
  • hardware instability (e.g. the crystal stops oscillating);
  • many others.

I could have used a dedicated watchdog chip. The Atmel chip of the Netduino embeds a watchdog, but the firmware does not drive it.
Instead, I’d like to show a very simple circuit, whose aim is primarily to teach how to solve this particular problem.

We’ll use a simple counter: the 74HC4060. It’s a 14-stage binary counter that also embeds a basic R-C oscillator. Together they give us a re-triggerable, long-period timer.
The word “timer” immediately brings to my mind the amazing “555” chip: a masterpiece of hardware design from the ’70s. However, we need a relatively long reaction time: at least several seconds. That’s because the Netduino takes a couple of seconds to complete the full reset process, and then we should allow for the slowness of the program. A normal watchdog reacts within milliseconds, while here we’re considering tens of seconds, maybe more. For such a long timing the normal 555 timer is not reliable, because it relies on the charge of a capacitor. Also, we would need a pretty large capacitor.
The 74HC4060 is much simpler for long timings. I tuned the oscillator to a frequency of about 60 Hz by using:

  • Rt = 68 kΩ
  • Ct = 100 nF

Note: refer to the 74HC4060 specs.
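
As a rough check of the 60 Hz figure: datasheets for the 74HC4060 typically give an approximate formula for the R-C oscillator. Assuming the common form with a factor of 2.5 (an assumption taken from the datasheet, not from this article), the values above work out as follows:

```latex
f_{osc} \approx \frac{1}{2.5 \, R_t \, C_t}
        = \frac{1}{2.5 \times 68\,\mathrm{k\Omega} \times 100\,\mathrm{nF}}
        = \frac{1}{17\,\mathrm{ms}}
        \approx 59\,\mathrm{Hz}
```

The real frequency depends on the supply voltage and on component tolerances, so this is only an estimate.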

Then, I chose the output of the 10th stage (i.e. Q9) as the “timeout signal”, which triggers the Netduino reset after about 10 seconds. Now, ten binary stages yield a frequency division of 1024 (= 2^10), so why does 60 Hz divided by 1024 yield a period of almost 20 seconds, while the timeout is only about 10?
Because the reset happens as soon as the Q9 output turns high, which is after just half of the full period.
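
The arithmetic can be written out explicitly. This is a sketch that takes the nominal 60 Hz oscillator frequency at face value and assumes the counter starts from zero after a reset pulse:

```latex
T_{Q9} = \frac{2^{10}}{f_{osc}} = \frac{1024}{60\,\mathrm{Hz}} \approx 17.1\,\mathrm{s}
\qquad
t_{timeout} = \frac{T_{Q9}}{2} = \frac{2^{9}}{f_{osc}} = \frac{512}{60\,\mathrm{Hz}} \approx 8.5\,\mathrm{s}
```

In round numbers these are the “about 20” and “about 10” seconds mentioned above. Picking a later output stage would double the timeout for each stage.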


So, what’s the role of the Netduino, which lives in fear of being reset by the 74HC4060?
Well, yeah… our program running on the Netduino must continuously “refresh” the counter, so that it never reaches the point where Q9 goes high. Basically we need any of the Netduino outputs to generate a short positive pulse, which resets the counter. As long as the Netduino application is running properly, the pulses keep the counter at a relatively low value, and Q9 never turns high. Conversely, when the program hangs, no more reset pulses are generated, and the counter can run until Q9 goes high. That signal resets the Netduino.

A simple test program.

The following program is used as a test for the watchdog.
It blinks the LED for a certain period, then throws an exception. That is a simulated “bug”, which actually hangs the whole board. In such a circumstance, you have only two choices: press the “reset” button, or unplug and reconnect the supply. Since neither operation is suitable for a remote context, we’ll introduce a little “helper” that “presses the reset button for us”.

using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
using SecretLabs.NETMF.Hardware;
using SecretLabs.NETMF.Hardware.NetduinoPlus;

namespace NetduinoWatchdog
{
    public class Program
    {
        public static void Main()
        {
            //define the LED port
            var led = new OutputPort(Pins.ONBOARD_LED, false);

            //just a long loop to make the LED blink
            for (int i = 0; i < 1000; i++)
            {
                //call the critical section
                Freezer(i);

                //wait for a while, then toggle the LED status
                Thread.Sleep(100);
                led.Write(!led.Read());
            }
        }

        static void Freezer(int count)
        {
            //this is just to simulate an unexpected event:
            //the unhandled exception stops the program, so the
            //watchdog is no longer refreshed
            if (count == 20)
                throw new Exception();

            //keep the dog awake
            Watchdog();
        }

        //define the watchdog port
        static OutputPort wdt = new OutputPort(Pins.GPIO_PIN_A5, false);

        static void Watchdog()
        {
            //generate a positive pulse to reset the external counter
            wdt.Write(true);
            wdt.Write(false);
        }

    }
}

There is no other code, because the project focuses mainly on the external circuit built around the 74HC4060.
It's also clear that a program like this will hang every time: it makes no sense in a real context. A more realistic application should be much more robust against exceptions, and maybe able to "correct itself" after a certain failure. For instance, suppose your application is writing a file to the SD card, but the user pulls the card out. It's a bit difficult to write a bullet-proof procedure that writes data without ever raising an exception. However, once the Netduino has been reset, it can test for the presence of the SD card and skip any related operation.

The demo.

Enjoy it!

30 thoughts on “A simple watchdog for the Netduino”

  1. Monroe

    Hi Mario. External watchdog timer for the Netduino is a great idea. Could you please share the schematic diagram for this project? Thanks!

    • Mario Vernari

      Uh, funny!… I usually share the schematics, but most people aren’t able to understand them, and prefer a Fritzing-like snapshot of the circuit.
      Give me a few hours to find a bit of spare time, then I’ll post it.
      Thanks.

  2. Monroe

    Wow, excellent, Mario, grazie!! I guess I’m old school…I prefer the schematics. Thanks so much for the super quick reply and posting of the schematic. This is a huge help. My netduino application relies on ethernet connection and the processor hangs up if ethernet is lost for any reason.

  3. Tal

    I found out that when netduino Plus II gets the reset, the 3v3 drops to 0.5v (USB powered board) and causes the board to hang.

    Did you get the same?

    • Mario Vernari

      Good catch!
      I haven’t tried it on the Plus 2 yet, but it’s clearly another point against the choice of detaching the +3.3V and the +5V instead of keeping them always connected.
      Give me some time to find a workaround.
      Cheers

      EDIT: during the reset cycle, the +3.3V rail is cut off for just 1ms, thus a simple 10uF capacitor across the +3.3V output should be enough. Could you try that for me?

      • Gerard

        Hello Mario,

        Is it possible to elaborate more on the problem with the Plus 2?
        And is the 5 volt rail not powered down for a millisecond? Is that why the suggested solution is to connect to 5V instead of 3.3V?
        I cannot test it because I have the version one board, but I want to solve it for the version two owners 🙂

      • Mario Vernari

        Gerard, give me a little time and I’ll try the circuit on the Netduino 2. I don’t think there’ll be particular problems, though.
        Cheers

  4. JP van Mackelenbergh

    Hello Mario, I ran into your solution and I would like to (also) use it on the Netduino Plus 2. Did you try that in the meantime ?

    • Mario Vernari

      No, I haven’t tried yet because I’m pretty busy. However, I believe it should work fine: at most, it needs a capacitor to hold up the supply during the power cycle.
      If you have more time than me, feel free to try and I’ll assist you if needed.
      Cheers

  5. René

    This solution will not work when the MCU has crashed in such a way that the output port pin connected to MR stays high forever. Only pulses should reset the counter, not a permanent high signal. If you put a decoupling capacitor in the line, together with 2 resistors to ground, it should work in that case too: 100 nF with 2x 100k, one from each end of the cap to ground.

  6. Mario Vernari

    You are right; however, I intended to generate a very narrow pulse in order to minimize the chance of hanging in such a state.
    I will modify the circuit with your suggestion, as well as for the Netduino 2.
    Thanks and cheers

    • Mario Vernari

      Of course it’s possible! The actual challenge is making a reliable watchdog with a minimum of parts!
      Cheers

  7. Larre Ländin

    This was great and very helpful. I had one major problem when I first tested the watchdog without attaching it to the Netduino. It took a while before I realized that MR needs to be grounded in order for the 74HC4060 to start counting. I also tried out René’s idea about the capacitor and resistors on the MR line, and it worked like a charm.

  8. Cyn

    Hi Mario, You did a great job on explaining the watchdog. My Netduino periodically hangs and I thought this to be the perfect solution. Thank you for this article.

    I am trying to set it up on my Netduino Plus v1 and almost have it running. I am using a 2N3904 instead of the BC337 which I understand I need to flip (flat side away instead of towards me) in the diagram above. I think I have everything wired as shown but it doesn’t restart the Netduino.

    What I have found is that if I touch the Collector wire with anything metal it restarts the Netduino. I was trying to get readings with my multimeter and to my surprise the Netduino restarted when my probe touched it.

    Do you have any thoughts on what I did wrong?
    Thanks

    • Mario Vernari

      Yes, you have to flip the 2N3904, but its specs are similar to the BC337, so everything should work fine.
      About the issue you describe, it looks as if you were NOT shorting the RESET to ground. Are you sure the emitter is hard-wired to ground? Again, place a voltmeter on pin 15 of the 4060, and wait for the timeout: the pin should rise to logic “one” and the collector should go to ground (close to zero volts). Verify that, please.

      • Cyn

        Thanks for getting back to me so quickly. I moved the watchdog to its own breadboard, took off the xBee Shield and disconnected the Ethernet cable just to make everything more clear. I tested the transistor, replaced the 4060 chip and verified that I’m running 3.3v off the Netduino to the breadboard just to be sure. Everything looks good to me but obviously something’s not right.

        Ok on to your questions: Yes the emitter is connected to ground. When I touch Pin 15 with multimeter RED wire OR the Collector Pin (Black wire) it resets the Netduino once. I have to move the wire off then back on to reset it.

        When I leave the Red Wire on Pin15 and touch the Collector pin (Black Wire) it reads 3.3 thru the entire cycle and doesn’t restart the Netduino.

        When I leave the Red Wire on Pin15 and touch the Emitter pin (Black Wire) it reads 0 thru the entire cycle and doesn’t restart the Netduino.

        This is the first time I’m working with a timer or resistor, I thank you for your time.

      • Mario Vernari

        I don’t believe you made any mistake with transistors and timers. Instead, it looks like the RESET pin is floating, but on the schematic it’s clearly connected to a pull-up. Please detach the Reset pin from the collector and try to measure its voltage: it should be close to +3.3V and should NOT reset the board. A voltmeter has almost no effect when inserted in a circuit (unless it is faulty), and if you observe unstable behavior the pull-up may be broken or missing. Check it, please.

      • Cyn

        Ok here’s what I see, Pin 15 is connected to 2.2K(red red red) resistor then the other side of the resistor is connected to the base of transistor (middle pin). The emitter is connected to ground on the breadboard and the collector is connected to Netduino /Reset. Am I doing this right?

        I disconnected the /Reset wire from the collector pin on the circuit and attached to multimeter .. it reads a constant 3.3v The Netduino does not reset until I start disconnecting the multimeter

        Thanks again for helping me understand this.

      • Mario Vernari

        Fine. What happens when the counter reaches the timeout, that is the pin 15 goes high? You should see the /reset falling, and the Netduino resetting. Check this.

      • Cyn

        I never get a reading on Pin 15. The multimeter always shows 0. I’ve tried 3 different 74HC4060s. Same result. Do you think they could all be bad?

      • Mario Vernari

        I don’t think so. Assuming you don’t own a scope, you should proceed as follows: take an LED (any color) and a resistor (300-500 Ohms, not critical). Now, wire the resistor to ground with the LED in series, so that connecting the other LED lead to +3.3V lights it. At this point, leave the 4060 running free (detach the /reset), and touch the LED “probe” to each counter output (Q3, Q4, etc). You should see the LED blinking as soon as the frequency is low enough to perceive. If everything’s going fine, you should see the LED turning on and off even on the output feeding the transistor base.
        http://www.nxp.com/documents/data_sheet/74HC_HCT4060_Q100.pdf

      • Cyn

        No scope and the LED doesn’t Blink. Thanks for the link to the datasheet, I’ve been reading it ever since I decided to try this project and it actually makes sense to me. Is it possible to send you a picture to make sure I added the LED in correctly?

      • Mario Vernari

        Sure! Take one or more close-up snapshots of the circuit and send them to me at “vernarim” (at) “libero” (dot) “it”.
        However, at this point it looks like the problem is right in the oscillator stage of the 4060. Don’t worry, though…

  9. Cyn

    Eureka! I now know the difference between 68 ohms & 68K ohms. Thank you so very much for walking me thru this. Sending you a plate of Mom’s homemade gnocchi.
