Computable Minds - www.computableminds.com

What fragmentation is, and the old myth that Linux doesn't fragment

Posted on: Mar, 11th 2012
In computer science, false beliefs sometimes appear out of ignorance of what really happens inside the machine. This time I'm going to debunk one of these old beliefs: the claim that the hard disk can't become fragmented when we run a Linux-based operating system.

First, for those who don't know: fragmentation consists in storing a file in non-contiguous blocks, which slows the hard disk down, because it has to move its heads more times to read the same file.

I'm going to go into a lot of detail in this article, so for those who don't want to read that much, here are the three reasons why fragmentation is unavoidable:

1. Several processes may want to write to the disk at the same time, and the block allocation algorithm has to place their data, often without knowing the real size of each file.

2. When the hard disk is almost full, files can only be stored in whatever blocks are still available.

3. The block allocation algorithm can't predict how much files will grow or which files will be deleted.

I know that a lot of Linux users will insist that their system can't be fragmented beyond a certain percentage, and this is because Linux's fsck tool reports the percentage of non-contiguous files. That figure is totally irrelevant for knowing the real fragmentation of the disk. We can have a million small files without fragmentation and only a thousand heavily fragmented files, and the tool will report a low percentage. The Windows tool, on the other hand, reports the fragmentation of the disk rather than the share of fragmented files, so for the same level of fragmentation Windows will always show a higher percentage.
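To see why the share of non-contiguous files says little about actual fragmentation, here is a minimal sketch. The numbers are the made-up scenario from the paragraph above, not output from fsck or any real tool:

```python
# Represent each file by its number of fragments (1 = contiguous).
# Hypothetical figures from the example above: a million contiguous
# small files plus a thousand files shattered into 100 pieces each.
frag_counts = [1] * 1_000_000 + [100] * 1_000

noncontiguous = sum(1 for n in frag_counts if n > 1)
pct_files = 100 * noncontiguous / len(frag_counts)   # fsck-style metric

# Extra head seeks caused by fragmentation: one per fragment beyond the first.
extra_seeks = sum(frag_counts) - len(frag_counts)

print(f"{pct_files:.2f}% of files are non-contiguous")   # 0.10%
print(f"{extra_seeks} extra seeks")                      # 99000 extra seeks
```

A file-counting metric reports a near-perfect 0.10%, while the disk actually forces 99,000 extra head movements.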

Another argument used by defenders of the impossible-fragmentation-under-Linux claim, like Roberto Di Cosmo in his article "Chests of drawers and brainwashing", is the fact that Windows ships a defragmentation tool and Linux doesn't (at least not natively). That the tool doesn't come installed by default doesn't mean it's useless. In fact, Windows gives better support to this problem precisely by providing it.

Normally, the file system gets blamed when a given operating system shows more or less fragmentation. That isn't entirely correct: the real culprit isn't the file system but a part of it, the block allocation algorithm, whose job is to decide where the files we create will be stored. This algorithm can vary between different implementations of the same file system; for instance, Windows's implementation of the FAT file system allocates blocks differently than Linux's implementation of that same file system.

Next, I'm going to go through the three points listed above in detail. For this, suppose we have a tiny fictitious hard disk with 11 free blocks. I'll represent each free block as "[   ]"; when a block is taken, I'll write inside it the file it contains, in the format [file number.chunk number]. Let's go:

1. Several processes may want to write to the disk at the same time, and the block allocation algorithm has to place their data, often without knowing the real size of each file.

When two processes have to write to the hard disk at the same time, the operating system, to keep the user from noticing that one of them has been paused and to make them appear to run simultaneously, lets one process store a chunk of its file, takes it off the processor, and loads the other process, which stores a chunk of a different file. In this kind of situation, it's easy for something like this to happen:

Suppose our browser is downloading two files at the same time. Each file begins to be stored in a different part of the disk:

[1.1][   ][   ][   ][2.1][   ][   ][   ][   ][   ][   ]

As time passes, the two processes leave the hard disk in this state:

[1.1][1.2][1.3][1.4][2.1][2.2][1.5][   ][   ][   ][   ]

File 1 ended up occupying 5 blocks, one more than the contiguous space left for it, so fragmentation appears.
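The interleaving above can be simulated with a toy "first free block" allocator. This is a hypothetical sketch for illustration, not any real file system's algorithm:

```python
# Toy disk: 11 blocks, None means free.
disk = [None] * 11

def write_chunk(file_id, chunk_no):
    """Store one chunk in the first free block (a naive allocator)."""
    i = disk.index(None)
    disk[i] = (file_id, chunk_no)

# The two downloads start in different parts of the disk, as in the figure:
disk[0] = (1, 1)
disk[4] = (2, 1)

# The scheduler then interleaves the writes of the two processes:
for file_id, chunk_no in [(1, 2), (1, 3), (1, 4), (2, 2), (1, 5)]:
    write_chunk(file_id, chunk_no)

print(disk)
# [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (1, 5),
#  None, None, None, None]
```

Neither process knows the other's file size in advance, so chunk 1.5 lands past file 2 and file 1 is fragmented.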

2. When the hard disk is almost full, files can only be stored in whatever blocks are still available.

In this case, suppose 10 blocks are occupied and the files are in an ideal situation, that is to say, perfectly contiguous:

[1.1][1.2][2.1][2.2][3.1][3.2][3.3][4.1][4.2][5.1][   ]

The next day we delete two files, 2 and 4, each of which occupied two blocks, leaving this:

[1.1][1.2][   ][   ][3.1][3.2][3.3][   ][   ][5.1][   ]

Then a new file arrives; call it file 6, and it occupies three blocks. Also, take into account that filling a block takes two hours and emptying one takes one hour. We have two possible strategies:

1. Store file 6 wherever we can, causing fragmentation. In this case we spend six hours storing the new file in three blocks:

[1.1][1.2][6.1][6.2][3.1][3.2][3.3][6.3][   ][5.1][   ]

2. Move file 3 to make room for file 6, avoiding fragmentation. In this case we spend three hours removing file 3, six hours storing it again, and another six hours storing file 6. Altogether we spend fifteen hours storing the new file, in exchange for a perfectly organized disk that saves us five minutes each time we later read file 6.

[1.1][1.2][3.1][3.2][3.3][6.1][6.2][6.3][   ][5.1][   ]

Obviously, in general we aren't going to have time for the second strategy, so every file system uses the first option. Thus, in every file system we are going to have fragmentation, but that's better than waiting all day to copy our holiday photos. Isn't there some other way to avoid fragmentation without having to defragment every time we store a file? No, there isn't, unless we have an infinite hard disk.
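Strategy 1 above is just first-fit allocation into whatever holes exist. A minimal sketch of it (hypothetical code, not any real file system's allocator):

```python
def write_file(disk, file_id, n_blocks):
    """Place each chunk of a new file into the first free blocks found."""
    chunk = 1
    for i in range(len(disk)):
        if chunk > n_blocks:
            break
        if disk[i] is None:
            disk[i] = (file_id, chunk)
            chunk += 1

# The disk after deleting files 2 and 4 (None means free):
disk = [(1, 1), (1, 2), None, None, (3, 1), (3, 2), (3, 3),
        None, None, (5, 1), None]
write_file(disk, 6, 3)
print(disk)
# File 6 ends up split around file 3, matching the figure above:
# [(1, 1), (1, 2), (6, 1), (6, 2), (3, 1), (3, 2), (3, 3),
#  (6, 3), None, (5, 1), None]
```

The allocator is fast precisely because it never moves existing data, and that is what makes fragmentation unavoidable on a nearly full disk.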

3. The block allocation algorithm can't predict how much files will grow or which files will be deleted.

We have stored two files. At allocation time, the algorithm can choose between storing them contiguously or separated from each other, in case the first file grows later. The first strategy is better for case 2 (and is roughly what Windows file systems tend to use), and the second is better for this case (roughly what Linux file systems tend to use, at the cost of a problem called free space fragmentation). So suppose we use the better strategy for this case and store them spread out:

[1.1][1.2][   ][   ][2.1][2.2][   ][   ][   ][   ][   ]

Now we have to add three more chunks to file 1. Again we have two possible solutions:

1. Store the remaining chunks after file 2, causing fragmentation:

[1.1][1.2][1.3][1.4][2.1][2.2][1.5][   ][   ][   ][   ]

2. Take out file 1 and put it back after file 2. This way there is no fragmentation, but we spend much more time storing the new chunks:

[   ][   ][   ][   ][2.1][2.2][1.1][1.2][1.3][1.4][1.5]

Again, the solution chosen by absolutely every file system's block allocation algorithm is the first, because the second is too slow to be viable.
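Using the toy timings from point 2 (two hours to write a block, one hour to empty one), the gap between the two solutions is easy to put in numbers. These figures are illustrative assumptions from the example, not measurements:

```python
WRITE_H, REMOVE_H = 2, 1   # assumed hours per block, from the example above

# Solution 1: just write the three new chunks into free blocks.
append_in_place = 3 * WRITE_H                 # 6 hours

# Solution 2: empty file 1's two existing blocks, then rewrite all
# five of its blocks contiguously after file 2.
relocate = 2 * REMOVE_H + 5 * WRITE_H         # 12 hours

print(append_in_place, relocate)   # 6 12
```

The relocation cost grows with the size of the file being moved, which is why no allocator takes that path on every append.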

I don't know whether the block allocation strategies of Linux file systems or of Windows file systems are better; perhaps it's ext4 on Linux, perhaps the latest NTFS in Windows 7. If someone knows of a scientific paper where this has been tested, please share it in the comments. What is certain is that fragmentation is unavoidable and grows with use.

Finally, those who, after reading this, think they now understand the problem of fragmentation well enough should know that they haven't seen a tenth of it. It's really far more complex, and there are many varied techniques for dealing with it. For a hard disk to read a file that spans more than one track as fast as possible, the file must be placed in a particular ideal position, so that when the head moves to that track after reading the previous one, it lands perfectly aligned with the next block it has to read.
