How the JPEG algorithm cheats the human eye

Posted on: Aug, 3rd 2011

Surely all of you know ".JPG" image files, and if you don't, you should know that your mobile phone, digital camera, webcam, tablet PC, etc. take photos in that format. Its use has spread so widely because it can make an uncompressed image about 10 times smaller without us noticing any drop in quality. This is achieved by applying what we know about the human eye to the compression algorithm.

The JPEG compression algorithm is a lossy compression algorithm. This means that information is lost, and image quality with it, but thanks to the way it takes advantage of the defects of the human eye, we don't notice the loss.

Let's see, step by step, how the algorithm works:

Normally we start with an image in which each pixel, or point of the image, is made up of an intensity of red, another of green and another of blue (we say that there is one channel per colour). The first thing the algorithm does is transform this representation into another one that has one brightness channel and two colour channels, instead of the three original channels of red, green and blue.
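As a minimal sketch of this first step, the conversion can be written for a single pixel as below. The post does not give the formula, so the weights here are an assumption: they are the ones commonly used in JPEG/JFIF files (brightness is usually called Y, and the two colour channels Cb and Cr). The function name `rgb_to_ycbcr` is just illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr).

    Assumption: the full-range weights commonly used in JFIF files.
    Y is the brightness channel; Cb and Cr are the two colour channels,
    centred on 128 so that grey pixels have no colour component.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each value back into the 0..255 range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


white = rgb_to_ycbcr(255, 255, 255)
grey = rgb_to_ycbcr(128, 128, 128)
```

Note how white and grey differ only in the brightness value: both colour channels stay at the neutral value 128.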

Next, it reduces the resolution of the colour channels, that is to say, it subsamples them. This is similar to pixelating the image. It is done by assigning a single colour to each block of, for instance, four pixels of the colour channels, chosen to be as close as possible to the colours in the original image. This is clearer with an image that shows the result of pixelating the different channels in different ways.

This is done because the eye has more resolution for brightness than for colour. We have roughly 120 million rods (the photoreceptors for brightness on the retina), compared with about 6 million cones (the photoreceptors for colour), so if we reduce the resolution of the brightness channel we notice it immediately, whereas if we do it with the colour channels we remove a lot of information and don't notice the difference, as can be seen in the previous image.
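The pixelation of a colour channel described above can be sketched roughly like this. It is only an illustration: it assumes the channel is a list of rows with even width and height, and it uses the average of each block of four pixels as the shared colour. The names `subsample_2x2` and `upsample_2x2` are made up for this example.

```python
def subsample_2x2(channel):
    """Halve a channel in both directions by averaging each 2x2 block.

    Assumption: `channel` is a list of rows with even width and height.
    """
    out = []
    for y in range(0, len(channel), 2):
        row = []
        for x in range(0, len(channel[0]), 2):
            total = (channel[y][x] + channel[y][x + 1]
                     + channel[y + 1][x] + channel[y + 1][x + 1])
            row.append(round(total / 4))
        out.append(row)
    return out


def upsample_2x2(channel):
    """Restore the original size by repeating each value over a 2x2 block."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


cb = [[100, 102, 50, 50],
      [98, 100, 50, 50]]
small = subsample_2x2(cb)        # one value per block of four pixels
restored = upsample_2x2(small)   # what the decoder would display
```

The reduced channel stores a quarter of the values, and the restored one is a slightly blockier version of the original colour channel.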

This first step can usually be configured in photo retouching programs when we save a JPG image. Normally an option appears such as "4:2:2", "4:2:0" or "4:4:4". These values are written in chroma subsampling notation, and they mean the following:

For "X:Y:Z":

X = horizontal sampling frequency of the brightness channel. The sampling frequency is the rate at which an eyedropper would pick up pixels as the image passes beneath it at a constant speed. Since in this case it is horizontal, the image would pass beneath the eyedropper cut into rows, with the rows placed one after another.

This value is always 4, because it has been established as the reference rate at which all the pixels are taken, so no resolution is lost. Historically, this value was chosen because it is related to the sampling frequency of standard-definition television. Here is a visual example of what a sampling frequency of 4 (the first) and a sampling frequency of 2 would look like:

example of the ideal sampling frequency

example of half the ideal sampling frequency

Y = horizontal sampling frequency of the colour channels relative to X. X sets the maximum reference rate, so a 2 means taking half of the pixels (because it is half of the reference rate given by X). In other words, we take 2 pixels out of every 4, so we keep half of the horizontal resolution.

Z = vertical sampling frequency of the colour channels relative to Y. In this case the value of Y sets the maximum reference rate, so if we have a 2 in Y and a 2 in Z, Z takes pixels at the rate needed to avoid skipping any row. The same happens with 4:4:4. In 4:2:0, the 0 indicates that both the horizontal and the vertical resolution are halved (what one might intuitively write as 4:2:1), with each 2x2 block taking the average of its pixels instead of the first pixel that appears.

With this information you can now work out which formats lose the most quality and compress the most, and which do the opposite.
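To make that deduction concrete, here is a small worked calculation. It counts how many values have to be stored for a region 4 pixels wide and 2 rows tall, reading the three digits the way the text above does. The function names are hypothetical, and the sketch only covers the three schemes mentioned in this post.

```python
def samples_per_region(scheme):
    """Count stored values for a region 4 pixels wide and 2 rows tall.

    Reads "X:Y:Z" as described above: X brightness samples per row,
    Y colour samples per row, and Z = 0 meaning that the second row
    shares the colour samples of the first one.
    """
    x, y, z = (int(n) for n in scheme.split(":"))
    brightness = x * 2                 # every pixel of both rows
    colour_rows = 1 if z == 0 else 2   # rows with their own colour samples
    colour = 2 * y * colour_rows       # there are two colour channels
    return brightness + colour


def fraction_kept(scheme):
    """Fraction of the original data kept, relative to 4:4:4."""
    return samples_per_region(scheme) / samples_per_region("4:4:4")


for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    print(scheme, samples_per_region(scheme), round(fraction_kept(scheme), 2))
```

Under these assumptions, 4:4:4 keeps everything, 4:2:2 keeps two thirds of the values and 4:2:0 keeps half, before any of the later compression steps are applied.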

Next, every channel is divided into squares of 8x8 pixels (the squares we can make out when looking at a heavily compressed image), and the following steps of the compression algorithm are applied to each one.
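Dividing a channel into squares could look something like the sketch below. It assumes the channel is a list of rows whose width and height are multiples of 8, which keeps the example short; images with other sizes would need their edges padded first. The name `split_into_blocks` is illustrative.

```python
def split_into_blocks(channel, size=8):
    """Yield (top, left, block) for each size x size square of a channel.

    Assumption: the width and height of `channel` are multiples of `size`.
    """
    for top in range(0, len(channel), size):
        for left in range(0, len(channel[0]), size):
            block = [row[left:left + size] for row in channel[top:top + size]]
            yield top, left, block


# A dummy channel 24 pixels wide and 16 pixels tall.
channel = [[(x + y) % 256 for x in range(24)] for y in range(16)]
blocks = list(split_into_blocks(channel))
```

Each block can then be processed on its own, which is why the squares become visible when the compression is very strong.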

The next step applies something called the two-dimensional discrete cosine transform, followed by perceptual quantization. Visually, it smooths out sudden variations in brightness and colour, like applying a kind of imperceptible blur to the image. This is done because human vision is less sensitive to big variations within a small area than to small variations across a large area. This selective removal of information is based on statistical studies of human vision, carried out by surveying people.
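A rough sketch of this step for one 8x8 block follows. It uses the textbook formula of the transform, written in the slow but readable way, and assumes the pixel values have already been shifted from 0..255 to -128..127. The table `LUMA_TABLE` is the example brightness table printed in the JPEG standard; encoders are free to use other tables, so treat it as one possible choice rather than the only one.

```python
import math

# Example brightness quantization table from the JPEG standard.
# Larger numbers (bottom right) discard more of the fine detail.
LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dct_2d(block):
    """Two-dimensional discrete cosine transform of an 8x8 block."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):          # vertical frequency
        for v in range(8):      # horizontal frequency
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * v * math.pi / 16)
                              * math.cos((2 * y + 1) * u * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


def quantize(coeffs, table=LUMA_TABLE):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)]
            for u in range(8)]


# A faint checkerboard: big variation from pixel to pixel, tiny amplitude.
faint = [[2 if (x + y) % 2 == 0 else -2 for x in range(8)] for y in range(8)]
result = quantize(dct_2d(faint))
```

In this example every value of `result` is rounded to zero, so the faint pixel-to-pixel pattern disappears completely. That rounding is where the information is actually lost.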

In image editing programs we can adjust this smoothing of big variations in small areas by moving a slider that sets the compression level or quality of the JPG. The lower the quality, the more we notice the squares into which the image is divided, the less we see the details of textures, and the less space the final file takes up.
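How the quality slider relates to the quantization is up to each program. As one illustration, the sketch below follows the scaling rule used by the widely used IJG libjpeg library, where quality 50 leaves the base table untouched; other programs may map their slider differently. The function name `scale_table` and the short sample row are made up for this example.

```python
def scale_table(base_table, quality):
    """Scale a quantization table for a quality setting from 1 to 100.

    Assumption: the rule used by the IJG libjpeg library. Lower quality
    gives larger divisors, so more detail is rounded away.
    """
    quality = max(1, min(100, quality))
    if quality < 50:
        scale = 5000 // quality
    else:
        scale = 200 - 2 * quality
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row]
            for row in base_table]


base = [[16, 11, 10, 16]]         # a few entries of a base table
low = scale_table(base, 10)       # strong compression
same = scale_table(base, 50)      # base table unchanged
best = scale_table(base, 100)     # every divisor becomes 1
```

With quality 10 the divisors are five times larger than in the base table, which is why fine textures vanish and the squares become visible.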

Finally, a lossless compression algorithm is applied, called Huffman coding.
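The basic idea of Huffman coding is to give the shortest bit sequences to the values that appear most often. The sketch below shows only that idea on a plain list of numbers; it is not the exact procedure or table format used inside a JPEG file, and the name `huffman_code` is illustrative.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:                       # only one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dictionaries from being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)         # the two rarest groups
        w2, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


# After quantization most values are zero, so zero gets the shortest code.
values = [0] * 20 + [1] * 5 + [-1] * 3 + [7]
code = huffman_code(values)
total_bits = sum(len(code[v]) for v in values)
```

Since nothing is discarded here, the original values can be recovered exactly from the bit sequence, which is what makes this last step lossless.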
