Fast calculation of the square-root

At first glance this post seems having no sense. Today’s programming languages have at least minimal math to work with. Even an “ancient” 20-year old home PC has a CPU with a powerful coprocessor, doing blazing fast math operations. However not always is so simple: if you have tried to work with microcontrollers (MCU), then for sure you have noticed that even the most common things available on a PC yield a relative effort to solve.

By playing with my Netduino and the .Net Micro Framework, I need for a basic calculation of a square root. I mean the function for real numbers only, not involving complex values at all. Believing or not, that function is not available on the framework, so the simplest way I may do is searching for a pretty good C# algorithm. That calculation has to be fast enough, but I don’t need an high precision: by working with floats, the precisions is very limited.

Searching in internet for an algorithm that could fit my needs, I have found only this one, which is part of Grommet: an open-source toolkit tailor-made for the .Net MF.

        /// Returns the square root of a specified number
        /// A number
        /// square root of x
        public static double Sqrt(double x)
            double i = 0;
            double x1 = 0.0F;
            double x2 = 0.0F;

            if (x == 0F) return 0F;

            while ((i * i) <= x)
                i += 0.1F;

            x1 = i;

            for (int j = 0; j < 10; j++)
                x2 = x;
                x2 /= x1;
                x2 += x1;
                x2 /= 2;
                x1 = x2;

            return x2;


The problem is that function is pretty fast for arguments around 1, but incredibly slow as much as the value is far from. More after, that uses double representation that is heavy for a such small device like Netduino.

The Newton-Raphson method.

That algorithm takes advantage by the Newton-Raphson method, that is a formal explanation for the most ancient method to calculate the square-root: the Babylonian method.

The Newton-Raphson way is pretty convenient to find out the solution of a similar problem. Its convergence is very fast, but may have problems when the function is not monotonic. Fortunately this is not the case.

Finding the square-root of the variable x, it means solving the equation:

y = \sqrt{x}

that is, finding the (unique) zero-crossing points of the follows:

y - \sqrt{x} = 0 or even y^2 - x = 0 (always being y non-negative)

On the following graph it is easy to understand what it means to solve the equation above. The black curve is the square root of the x-axis. Given y=1.414, for example, we must search for the intersection of the red line with the black one: the abscissa is the desired value.

Graphically solving the equation to find the square-root

To make the convergence fast enough, using few iterations, the starting point is very important. That is my approach is to make the iteration working on a very limited domain, from 1 to 100. Values outside these bounds are prescaled accordingly. That is possible since any number x can be written as follows:

x = x' \times 10^{2k} where k is any integer and x’ is a real number falling into the 1..100 interval.

but also the calculation of the square-root could be simplified as follows:

y = \sqrt{x} = \sqrt{x' \times 10^{2k}} = 10^k \times \sqrt{x'}

Another trick I have added to my version of the algorithm is to split the domain (1..100) in two decades. For every decade there is a specific starting point, who has been found during the optimization process.

The basis points of the optimization are:

  • the working interval (1..100) must be split to two parts only;
  • the number of iterations should be as little as possible;
  • the maximum absolute error must be under 0.01% (i.e. < 1E-4);

With a simple optimization program running the target algorithm under several conditions, I have found that three iterations are enough to fit the constraint.

Here is the final source:

namespace Highfield.Core
    /// Math Library for Micro Framework
    /// (C)opyright 2011 Mario Vernari,
    /// This Sourcecode is Public Domain.
    /// You are free to use this class Non-Commercialy and Commercialy.
    /// This sourcecode is provided AS-IS. I take no responsibility for direct or indirect
    /// damage coused by this program/class.
    public static class Math
        /// Returns the square root of a specified number
        /// <param name="x" />A positive real number
        /// The square root of x
        public static float Sqrt(float x)
            //cut off any special case
            if (x <= 0.0f)
                return 0.0f;

            //here is a kind of base-10 logarithm
            //so that the argument will fall between
            //1 and 100, where the convergence is fast
            float exp = 1.0f;

            while (x < 1.0f)
                 x *= 100.0f;
                 exp *= 0.1f;

            while (x > 100.0f)
                x *= 0.01f;
                exp *= 10.0f;

            //choose the best starting point
            //upon the actual argument value
            float prev;

            if (x > 10f)
                //decade (10..100)
                prev = 5.51f;
            else if (x == 1.0f)
                //avoid useless iterations
                return x * exp;
                //decade (1..10)
                prev = 1.741f;

            //apply the Newton-Rhapson method
            //just for three times
            prev = 0.5f * (prev + x / prev);
            prev = 0.5f * (prev + x / prev);
            prev = 0.5f * (prev + x / prev);

            //adjust the result multiplying for
            //the base being cut off before
            return prev * exp;

The performance measured on a standard PC running the standard .Net Framework is amazingly high, considering the computations are made via managed C# with any native code. However we must bear in mind that the precision obtained is quite far from the double offered by the System library.

The performance comparison between the native function and the algorithm

The program used for the test is basically a large loop (N=100,000), where every iteration there is the calculation of a predefined set of values. The simplified version used for the Micro Framework is shown below.

            //creates an array of real number to calculate
            //their square-root
            var given = new float[369];
            var count = 0;
            var exp = 1e-20f;
            while (exp <= 1e20f)
                for (int i = 1; i < 10; i++)
                    given[count++] = i * exp;

                exp *= 10.0f;

            //start the test
            var crono = Utility.GetMachineTime().Ticks;

            const int N = 1;
            float temp;
            for (int i = 0; i < N; i++)
                for (int j = 0; j < count; j++)
                    temp = Highfield.Core.Math.Sqrt(given[j]);

            //read the elapsed time
            crono = Utility.GetMachineTime().Ticks - crono;

Sounds good…but what about the performance on a Netduino-turtle?

I have tested the algorithm on the .Net Micro Framework by loading an equivalent program on Netduino. The only difference is the number of iterations, but the set of values is exactly the same as the desktop version.

The test yielded about 225ms to complete just one iteration, over the 369-values set. That is less than one millisecond for a single calculation.



3 thoughts on “Fast calculation of the square-root

  1. Stefan Thoolen

    Nice solution. Math should be fast and simple indeed. I find it interesting that NETMF lacks quite some basic features imo.
    On the other hand, it just requires some tricks to work around those missing features like you did and it saves some precious memory to not include features that aren’t in use.

    • Mario Vernari

      There is no problem to share my code, but I think that it could be written even better by using native code. I have posted this article for anyone cannot take advantage of C/C++ and must use managed code.
      It seems also that .Net MF is completely written in C/C++, so this code won’t fit at all.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s