Storage Solutions: RAID 0 query


stormblast

Wanted to know: what's the difference in a RAID 0 array between a 4KB cluster and a 64KB cluster?

I myself have always used a 4KB cluster. I would like to hear from someone who has used both 4KB and 64KB cluster sizes: what is the technical difference between the two, and is there any noticeable real-world difference between them while using the PC?
 
Well, typically the cluster size should be the same as the average request size in multi-user environments. Since for games the request size is generally quite big (textures, sounds, etc.), a bigger cluster size like 64K or 96K would be better than a smaller one. I'm not certain about this, I'm just speculating. When I had RAID enabled on my drives, I used 64K. The only real benefit of striping is that the average transfer rate doesn't taper off towards the end of the medium like it does on a normal single-drive setup.
 
Other guys using a RAID 0 array, please reply; what's taking you so long? deejay, karan, etc.

@ chaos - at a 64K cluster size, did you notice a difference in games and while loading, and also in general usage when loading files and doing multiple things at once? Because when I used a RAID 0 array with a 4K cluster size it made a huge difference, not just in benchmarks: it was easily noticeable in game loading, which was twice or more than twice as fast when loading levels etc., and it was also faster in Max and Photoshop.
 
^^ Game loading was the only thing that seemed a bit faster. I'm gonna RAID my 200GB PATA drives in a few days... I shall try out different cluster sizes once I do that. Gotta back up all the data before I do that though, so it might take a while!
 
I will just quote the technical aspect of cluster size - which size is better depends on the size of the files/data being requested.

AnandTech said:
As we mentioned before, stripes are blocks of a single file that are broken into smaller pieces. The stripe size, or the size that the data is broken into, is user definable and can range from 1KB to 1024KB or more. The way it works is when data is passed to the RAID controller, it is divided by the stripe size to create 1 or more blocks. These blocks are then distributed among drives in the array, leaving different pieces on different drives.

Like we discussed before, the information can be written faster because it is as if the hard drive is writing a smaller file, although it is really only writing pieces of a large file. At the same time, reading the data is faster because the blocks of data can be read off of all the drives in the array at the same time, so reading back a large file may only require the reading of two smaller files on two different hard drives at the same time.

There is quite a bit of debate surrounding what stripe size is best. Some claim that the smaller the stripe the better, because this ensures that no matter how small the original data is it will be distributed across the drives. Others claim that larger stripes are better since the drive is not always being taxed to write information.

To understand how a RAID card reacts to different stripe sizes, let's use the most drastic cases as examples. We will assume that there are 2 drives set up in a RAID 0 stripe array that has one of two stripe sizes: a 2KB stripe and a 1024KB stripe. To demonstrate how the stripe sizes influence the reading and writing of data, we will also use two different data sizes to be written and read: a 4KB file and an 8192KB file.

On the first RAID 0 array with a 2KB stripe size, the array is happy to receive the 4KB file. When the RAID controller receives this data, it is divided into two 2KB blocks. Next, one of the 2KB blocks is written to the first disk in the array and the second 2KB block is written to the second disk in the array. This, in theory, divides the work that a single hard drive would have to do in half, since the hard drives in the array only have to write a single 2KB file each.

When reading back, the outcome is just as pretty. If the original 4KB file is needed, both hard drives in the array move to and read a single 2KB block to reconstruct the 4KB file. Since each hard drive works independently and simultaneously, the speed of reading the 4KB file back should be the same as reading a single 2KB file back.

This pretty picture changes into a nightmare when we try to write the 8192KB file. In this case, to write the file, the RAID controller must break it into no less than 4096 blocks, each 2KB in size. From here, the RAID card must pass pairs of the blocks to the drives in the array, wait for the drive to write the information, and then send the next 2KB blocks. This process is repeated 4096 times and the extra time required to perform the breakups, send the information in pieces, and move the drive actuator to various places on the disk all add up to an extreme bottleneck.

Reading the information back is just as painful. To recreate the 8192KB file, the RAID controller must gather information from 4096 places on each drive. Once again, moving the hard drive head to the appropriate position 4096 times is quite time consuming.

Now let's move to the same array with a 1024KB stripe size. When writing a 4KB file, the RAID array in this case does essentially nothing. Since the 4KB file is smaller than the 1024KB stripe size, the RAID controller just takes the 4KB file and passes it to one of the drives in the array. The data is not split, or striped, because of the large stripe size and therefore the performance in this instance should be identical to that of a single drive.

Reading back the file results in the same story. Since the data is only stored on one drive in our array, reading back the information from the array is just like reading back the 4KB file from a single disk.

The RAID 0 array with the 1024KB stripe size does better when it comes to the 8192KB file. Here, the 8192KB file is broken into eight blocks of 1024KB in size. When writing the data, both drives in the array receive four blocks of the data, meaning that each drive only has the task of writing four 1024KB files. This increases the writing performance of the array, since the drives work together to write a small number of blocks. At the same time, reading back the file requires four 1024KB files to be read back from each drive. This holds a distinct advantage over reading back a single 8192KB file.

http://www.anandtech.com/storage/showdoc.aspx?i=1491&p=5
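The block-splitting arithmetic in the quoted example is easy to check for yourself. Here is a minimal sketch in Python (purely illustrative, assuming a simple round-robin RAID 0 layout; stripe_file is a made-up helper, not any controller's actual code) that reproduces AnandTech's 2KB-vs-1024KB figures:

```python
import math

def stripe_file(file_kb, stripe_kb, n_drives=2):
    """Split a file into stripe-sized blocks and deal them out
    round-robin across the member drives."""
    n_blocks = max(1, math.ceil(file_kb / stripe_kb))
    per_drive = [0] * n_drives
    for block in range(n_blocks):
        per_drive[block % n_drives] += 1
    return n_blocks, per_drive

# The two extreme stripe sizes and file sizes from the quote above.
for stripe in (2, 1024):
    for size in (4, 8192):
        blocks, spread = stripe_file(size, stripe)
        print(f"{size}KB file, {stripe}KB stripe -> {blocks} block(s), {spread} per drive")
```

Running it gives 2 blocks (one per drive) for the 4KB file on the 2KB stripe, 4096 blocks for the 8192KB file on the 2KB stripe, a single unsplit block for the 4KB file on the 1024KB stripe, and 8 blocks (four per drive) for the 8192KB file, matching the article's numbers.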

One more thing to check out

The impact of stripe size upon performance is more difficult to quantify than the effect of stripe width:

Decreasing Stripe Size: As stripe size is decreased, files are broken into smaller and smaller pieces. This increases the number of drives that an average file will use to hold all the blocks containing the data of that file, theoretically increasing transfer performance, but decreasing positioning performance.

Increasing Stripe Size: Increasing the stripe size of the array does the opposite of decreasing it, of course. Fewer drives are required to store files of a given size, so transfer performance decreases. However, if the controller is optimized to allow it, the requirement for fewer drives allows the drives not needed for a particular access to be used for another one, improving positioning performance.

So what should you use for a stripe size?

The best way to find out is to try different values: empirical evidence is the best for this particular problem. Also, as with most "performance optimizing endeavors", don't overestimate the difference in performance between different stripe sizes; it can be significant, particularly if contrasting values from opposite ends of the spectrum like 4 kiB and 256 kiB, but the difference often isn't all that large between similar values. And if you must have a rule of thumb, I'd say this: transactional environments where you have large numbers of small reads and writes are probably better off with larger stripe sizes (but only to a point); applications where smaller numbers of larger files need to be read quickly will likely prefer smaller stripes. Obviously, if you need to balance these requirements, choose something in the middle.

http://www.pcguide.com/ref/hdd/perf/raid/concepts/perfStripe-c.html
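The transfer-versus-positioning trade-off PCGuide describes can also be seen with a quick back-of-the-envelope sketch (again just illustrative Python, assuming files start on a stripe boundary and a 4-drive array chosen only as an example):

```python
import math

def drives_touched(file_kb, stripe_kb, n_drives=4):
    """How many member drives a single file of this size spans."""
    return min(n_drives, math.ceil(file_kb / stripe_kb))

# Smaller stripes spread one file over more drives (better transfer rate,
# worse positioning); bigger stripes keep small files on a single drive,
# leaving the other drives free to service other requests.
for stripe in (4, 64, 256):
    for size in (16, 512, 8192):
        print(f"{size:>4}KB file, {stripe:>3}KB stripe: "
              f"spans {drives_touched(size, stripe)} of 4 drives")
```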
 
Good find, dipdude. But after reading both I'm still confused: AnandTech says that for large files a bigger cluster is better, i.e. 1024K according to them, but on the other hand PCGuide says this at the end: "applications where smaller numbers of larger files need to be read quickly will likely prefer smaller stripes." So the two sites are saying different things.

It would be nice if anyone could find some benchmarks on the net of a RAID 0 array with different cluster sizes and the difference in read and write speeds.

@ chaos - try out 4K, 64K, 128K and 256K if you can, and some screenshots would be helpful.
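In case nobody turns up published numbers, here is a very rough do-it-yourself read test (a minimal Python sketch; the file path is hypothetical, there is no control for OS caching, and a proper tool like HD Tach or IOMeter will give far more trustworthy figures):

```python
import os
import random
import time

def read_test(path, chunk_kb=64, random_seeks=200):
    """Crude sequential + random read timing of one large test file.
    Copy the same big file to each array you build and compare."""
    size = os.path.getsize(path)
    chunk = chunk_kb * 1024

    # Sequential pass: read the whole file front to back.
    t0 = time.time()
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    seq_mbps = (size / (1024 * 1024)) / (time.time() - t0)

    # Random pass: seek to random offsets and read one chunk each time.
    t0 = time.time()
    with open(path, "rb") as f:
        for _ in range(random_seeks):
            f.seek(random.randrange(0, max(1, size - chunk)))
            f.read(chunk)
    rnd_ms = (time.time() - t0) / random_seeks * 1000

    print(f"{path}: {seq_mbps:.1f} MB/s sequential, {rnd_ms:.2f} ms per random read")

# Example (hypothetical path on the striped volume):
# read_test(r"D:\testfile.bin")
```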
 
I used to use 16K clusters before, but right now I'm using a 64K cluster. Didn't benchmark the two, but disk fragmentation is lower now, so I suppose performance should be better :) . Couldn't reply sooner as my connection had been down for 2 days. Damn BSNL :@
 
Well, the cluster size should depend on the kind of work to be done and how it is done. Other things to be considered are how much data will be accessed and how often.

For e.g., on a NAS, people usually dump large files to free up space on their own workstations. Now, if the NAS is accessed many times, one has to make sure it is able to keep up with the requests. Hence, a cluster size of 512K suits such an environment.

Here's the rule:

Large Files = Big Cluster Size

Small Files = Small Cluster Size

Frequently Accessed = Big Cluster Size

Rarely Accessed = Small Cluster Size

If there is a database server which stores login and profile information, that type of data takes very little space, and hence a small cluster size is highly advisable.

For a gaming rig, a cluster size of 128K used to be the best setting some years ago. But with the advent of mega-sized textures and 5-channel sounds, a bigger cluster size seems more logical. Now, if you need to load a texture of nearly 10 MB (that's the average size of a texture nowadays), a 4KB cluster will require you to access the hard disk nearly 2500 times. The access time of the hard disk matters a lot here; it will kill your performance rather than the transfer speed.
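The texture arithmetic above checks out, and it is easy to extend to other cluster sizes. A quick illustrative Python snippet (worst-case counts only, ignoring read-ahead and caching, which soften this in practice):

```python
def cluster_reads(file_mb, cluster_kb):
    """Worst-case number of cluster-sized reads needed for one file."""
    return (file_mb * 1024) // cluster_kb

# A ~10 MB texture at various cluster sizes: 4KB needs roughly 2500 reads,
# as mentioned above, while bigger clusters cut the count dramatically.
for cluster in (4, 64, 128, 512):
    print(f"10 MB file @ {cluster:>3}KB clusters: {cluster_reads(10, cluster)} reads")
```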
 
evilution said:
Well, the cluster size should depend on the kind of work to be done and how it is done. Other things to be considered are how much data will be accessed and how often. ...

????

He was asking about what cluster size to use when creating a RAID 0 array (i.e. the stripe size), not about the cluster size used when formatting partitions. I think you got the two mixed up somehow.

But yes, what you said ^^ above stands correct in its own right.

Regards,

Karan
 