Computable Minds

Any algorithm in hardware is faster than in software

Posted on: Aug, 27th 2010
In this post I'm going to explain something quite basic that many people either don't know or don't understand well.
I'm going to clarify why algorithms (the sequences of steps a computer has to carry out to complete a given task) run faster in hardware than in software.

I'll begin by defining each one separately:

1.- Implementation of an algorithm in software, or in other words, prepared to be executed on a processor:

Software always has to be executed on the hardware of the machine where it resides. Normally we have a general-purpose processor, so called because it is built to execute any algorithm. To do so it uses a set of general instructions, for example "add two numbers", "store this value at this memory position" or "load the value from this memory position".

Also, to execute each instruction, the general-purpose processor has to perform a sequence of steps that is always the same (unlike in a hardware implementation).
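To make this more concrete, here is a minimal sketch of a toy general-purpose processor. It is only an illustration: the instruction names (LOAD, ADD, STORE), the single accumulator and the `run` function are my own assumptions, not a real instruction set. It shows that every instruction, whatever it does, goes through the same fetch, decode and execute steps.

```python
# Toy general-purpose processor (illustrative assumption, not a real CPU).
# Every instruction passes through the same fetch -> decode -> execute steps.

def run(program, memory):
    """Execute a list of (opcode, address) pairs; return instructions executed."""
    acc = 0      # single accumulator register
    pc = 0       # program counter
    steps = 0    # number of instructions executed
    while pc < len(program):
        opcode, address = program[pc]      # fetch the next instruction
        pc += 1
        if opcode == "LOAD":               # decode, then execute
            acc = memory[address]
        elif opcode == "ADD":
            acc = acc + memory[address]
        elif opcode == "STORE":
            memory[address] = acc
        else:
            raise ValueError(f"unknown instruction: {opcode}")
        steps += 1
    return steps


# Add the values at positions 0 and 1 and store the result at position 2.
memory = [2, 3, 0]
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2)]
steps = run(program, memory)
print(memory[2], steps)
```

Even a single addition needs three general instructions here, each one paying for the same fetch and decode work.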

I should clarify that in a computer, in addition to the processor, parts of the software can run on other devices, for instance a graphics card, which uses more specific instructions. But when someone talks about the software implementation of an algorithm, they mean one that uses only the processor.

2.- Implementation of an algorithm in specific hardware:

An algorithm implemented directly in hardware can be executed faster, because the only instruction that has to be carried out is "execute the algorithm".

The main reason it is faster is that you are not tied to a general instruction set, so there is more freedom in deciding how to solve the problem.

To implement the algorithm, the basic hardware components (logic gates) are joined to build more complex components. The implementation can use several optimization techniques that are not available to the software implementation, for instance dividing the problem into parts that are solved at the same time (parallelization).
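As a rough illustration of what parallelizing buys, the sketch below adds a list of numbers in two ways: one addition after another, and in pairs, level by level. The function names are my own, and the code only simulates the idea in software. The assumption is that dedicated hardware could place one adder per pair, so every addition in a level happens at the same time and only the number of levels counts.

```python
def serial_add_steps(n):
    """Adding n numbers one after another takes n - 1 sequential additions."""
    return max(n - 1, 0)


def tree_add_levels(values):
    """Sum values pairwise, level by level; return (total, levels).

    The additions inside one level do not depend on each other, so
    hardware with one adder per pair could do them simultaneously.
    """
    level = list(values)
    levels = 0
    while len(level) > 1:
        # Add neighbours: (0, 1), (2, 3), ...
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])   # odd element passes through unchanged
        level = nxt
        levels += 1
    return (level[0] if level else 0), levels


values = list(range(1, 9))          # 1, 2, ..., 8
total, levels = tree_add_levels(values)
print(total, serial_add_steps(len(values)), levels)
```

For eight numbers the sequential version needs seven steps, while the pairwise version needs only three levels.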

If we implement an algorithm in a chip, we are talking about a specific-purpose processor.

- For instance:

Imagine a processor that only has the instruction "add two numbers".

If we have to multiply two numbers with this instruction, the processor will have to execute it several times to get the result.

Instead, if we have hardware that can multiply directly, it only has to execute the multiply instruction once.
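The example above can be sketched in a few lines. This is only a model: the function name is hypothetical, and it assumes non-negative integers and the simplest possible method (repeated addition), counting how many times the "add" instruction runs.

```python
def multiply_with_additions(a, b):
    """Multiply a by a non-negative integer b using only addition.

    Returns (product, additions), where additions is the number of
    times the "add two numbers" instruction was executed.
    """
    if b < 0:
        raise ValueError("this sketch only handles non-negative b")
    product = 0
    additions = 0
    for _ in range(b):
        product += a        # one "add two numbers" instruction
        additions += 1
    return product, additions


product, additions = multiply_with_additions(6, 7)
print(product, additions)
```

With only the add instruction, 6 times 7 costs seven instructions. A processor with a multiplier in hardware would issue a single multiply instruction instead.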

- Conclusions:

The more complex the instruction to implement, the more time we save by implementing it in hardware.

Now imagine, for instance, that we have to carry out, using only additions, a slightly more complex operation, such as multiplying a matrix by a vector (this is used, among other things, to move 3D objects). The number of instructions executed on the general-purpose processor will be much higher.
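To get a feeling for how quickly the count grows, here is a small sketch that multiplies a matrix by a vector using nothing but additions and counts them. The helper names, the sample 3x3 matrix and the vector are assumptions for illustration, and the sketch only handles non-negative integers in the vector.

```python
def mul_by_add(a, b):
    """a * b for non-negative integer b by repeated addition.

    Returns (product, additions used).
    """
    product, count = 0, 0
    for _ in range(b):
        product += a
        count += 1
    return product, count


def matvec_with_additions(matrix, vector):
    """Matrix-vector product using only additions; returns (result, additions)."""
    result = []
    total_additions = 0
    for row in matrix:
        acc = 0
        for m, v in zip(row, vector):
            term, count = mul_by_add(m, v)
            acc += term                    # one more addition to accumulate
            total_additions += count + 1
        result.append(acc)
    return result, total_additions


matrix = [[1, 0, 2],
          [0, 3, 0],
          [4, 0, 1]]
vector = [5, 6, 7]
result, additions = matvec_with_additions(matrix, vector)
print(result, additions)
```

For this small case the add-only processor executes 63 additions, and the count grows with both the size of the matrix and the size of the numbers.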

Normally, if we want a complex algorithm implemented in software on general-purpose processors to run faster than one implemented directly in hardware, we have to use hundreds of these processors working in parallel.

The problem with using hardware is that it is more expensive. We can only use it for the things it was specifically designed for, and to update it we normally have to replace it.

So, the next time you see a graphics card with "3D hardware acceleration", you'll know why it accelerates. This phrase has led people to invent the nonsensical term "3D software acceleration" for games that only use the processor. It would be more correct to say something like "software 3D engine".

Another interesting thing to take into account is that, in rare cases, when a program doesn't use hardware acceleration properly, the traffic of information between the processor, the memory and the specific hardware can make the program run even more slowly than it would using only the processor. But this only happens in badly implemented applications such as the Adobe Flash Player.
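A very simplified way to reason about this is to add the transfer time to the time the specific hardware needs, and compare the sum with the processor-only time. The function name and all the numbers below are hypothetical, chosen only to show both outcomes. Real systems can overlap transfers with computation, which this sketch ignores.

```python
def offload_is_faster(cpu_ms, accel_ms, transfer_ms):
    """Return True if sending the work to the specific hardware pays off.

    cpu_ms:      time to do the work on the processor alone
    accel_ms:    time the specific hardware needs for the same work
    transfer_ms: time spent moving data to and from that hardware
    """
    return transfer_ms + accel_ms < cpu_ms


# Hypothetical numbers: the hardware is 10 times faster in both cases.
print(offload_is_faster(cpu_ms=10, accel_ms=1, transfer_ms=2))    # cheap transfers
print(offload_is_faster(cpu_ms=10, accel_ms=1, transfer_ms=20))   # costly transfers
```

In the second case the hardware is still ten times faster at the work itself, but the program as a whole is slower because of the data movement.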

On another topic, there is also reconfigurable hardware, which lets us program an algorithm in a similar way to software and execute it at almost the same speed as its hardware implementation. But I will talk about that in another post.

Comments (2):

Date: Jan, 11th 2012 Time: 1:17:19

Very interesting and helpful, thank you.

You need to improve your grammar though.

Date: Apr, 14th 2013 Time: 11:43:05

You are wrong.

Prior to implementing an algorithm in hardware, I usually run some experiments in software, and I frequently find that my 3.x GHz desktop is faster than the 20 MHz FPGA board I'm bound to testing my designs with. On the other hand, I agree that the final ASIC might indeed be faster.

So, what did you forget to mention? Right: the clock frequency.

Hardware is not _per se_ faster than software.
