Guide Study: RAM Timings and their Effects


Hello and welcome to my humble attempt at clearing up some of the paranoia and confusion that surrounds matters such as RAM timings, timings vs. MHz, etc.


The setup being used:

Opteron 165 "CCBWE 0551 UPMW"
DFI Ultra-D AD0 w/ 704-2bta
Transcend Samsung UCCC "Random Retail Part"
Windows XP SP2 (w/ any tweaks)


What will I be doing?

I will start with my 24x7 OC setup and proceed from there. I will change one RAM timing at a time (tRP, tRAS, tRRD, tRC, tRFC, tREF, etc.) and record 3 benchmark runs per setting to get an average view of how that timing affects the benchmarks.

The program being used for benchmarking is Everest Ultimate Edition. Its "Cache and Memory Benchmark" tool makes it easy for me to test as many combinations as possible.

Version of program used: v2.80.534


How did I do it?

First of all, to remove as many variables from the equation as possible, I did an "End Process Tree" on Explorer.exe. This ensured that no other processor-intensive applications were loading the CPU. Then I manually started the different programs (A64Info.exe and everest.exe in my case). I set everest.bin's priority to High (Realtime is detrimental to benchmarking) and then set A64Info.exe to Low.

I did this to ensure no interference from the A64Info.exe process (which runs an auto-refresh cycle every 2 seconds to update system parameters).

Then I changed the timings one by one using A64Info.exe and ran the benchmark 6-7 times until I got 3 close (concordant) readings. Those 3 results per timing change are reproduced here as evidence.
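The "run it 6-7 times and keep 3 close readings" step can be made mechanical. Below is a minimal Python sketch, assuming "close" means the 3 readings with the smallest max-min spread (the procedure above does not define an exact criterion). The function name and the sample Copy-bandwidth numbers are hypothetical, not my measured results.

```python
from itertools import combinations
from statistics import mean


def pick_concordant(runs, k=3):
    """Return the k readings with the smallest spread (max - min) and their mean.

    Assumption: "concordant" = the k-reading subset with the narrowest range.
    """
    if len(runs) < k:
        raise ValueError("need at least k benchmark runs")
    # Try every k-sized subset and keep the one whose max - min is smallest.
    best = min(combinations(runs, k), key=lambda group: max(group) - min(group))
    return sorted(best), mean(best)


# Hypothetical Copy results (MB/s) from 6 runs of the same setting.
group, avg = pick_concordant([7012, 6890, 7030, 7025, 6950, 7100])
print(group, round(avg, 1))
```

Picking the narrowest subset throws away the outliers (a background task waking up, a stray disk access) instead of letting them drag the average.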


Links to Software Used

Everest UE: http://www.lavalys.com/products/overview.php?pid=3&lang=en
A64Info: http://www.xtremesystems.org/forums/showthread.php?t=96678

PS: Thanks to Anish for telling me to link to the software used.


Hmm, where's the source?

I have mentioned 2 references (DFI Street and a DDR article), but the research, benchmarking and overclocking were my own. Hence no source article :P (Thanks, Justin, for telling me to mention this.)


[break=The Base Overclock]

The Base Overclock


This is the base overclock against which I will be testing the various changes in RAM timings.




Benchmarks at the base (optimal) settings:


[break=Playing with tRAS]

Playing with tRAS

Definition:

tRAS: This BIOS feature controls the memory bank's minimum row active time (tRAS), i.e. the time from when a row is activated until that same row can be deactivated (precharged). If the tRAS period is too long, it can reduce performance by unnecessarily delaying the deactivation of active rows. Reducing the tRAS period allows the active row to be deactivated earlier. However, if the tRAS period is too short, there may not be enough time to complete a burst transfer; this reduces performance, and data may be lost or corrupted. For optimal performance, use the lowest value you can. Usually, this should be CAS latency + tRCD + 2 clock cycles. For example, if you set the CAS latency to 2 clock cycles and the tRCD to 3 clock cycles, the optimum tRAS value would be 7 clock cycles. But if you start getting memory errors or system crashes, increase the tRAS value one clock cycle at a time until your system becomes stable.
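The CAS latency + tRCD + 2 rule of thumb is easy to script. A minimal Python sketch; the function name is hypothetical, and rounding half-cycle CAS values (like 2.5) up to a whole cycle is my assumption, since tRAS is set in whole cycles. Treat the result as a starting point, not a guaranteed-stable value.

```python
import math


def suggested_tras(cas_latency, trcd, margin=2):
    """Rule-of-thumb tRAS in clock cycles: CAS latency + tRCD + margin (2).

    DDR CAS latency can be fractional (e.g. 2.5), so round up to a whole
    cycle because tRAS is configured in whole cycles.
    """
    return math.ceil(cas_latency + trcd + margin)


print(suggested_tras(2, 3))    # the example from the definition: 7
print(suggested_tras(2.5, 3))  # 7.5 rounded up: 8
```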


Benchmarks at the base (optimal) settings:

tRAS = 1

tRAS = 2

tRAS = 3

tRAS = 4

tRAS = 5

tRAS = 6

tRAS = 7

tRAS = 8

tRAS = 9

tRAS = 10

Deduction:

We see that the tRAS value doesn't affect the benchmarks much. Whatever variation we see is probably run-to-run variation of the benchmark rather than an effect of the changing tRAS value. So achieving a tRAS of 0 will not necessarily give you a big boost in scores. In fact, it is entirely possible that the optimal tRAS value is much higher: for example, when I had my tRAS set to 9, I noticed the latency figure dip to 39.9 ns. I will investigate and update later.
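To separate a real effect from run-to-run noise, one simple heuristic is to count a change as real only when the difference between the average results is larger than the spread (max - min) seen within either set of runs. This is a rough rule of thumb assumed for illustration, not a formal statistical test; the function name and the latency figures below are hypothetical.

```python
from statistics import mean


def looks_like_real_change(baseline, candidate):
    """Heuristic: True if the difference in means exceeds the larger of the
    two within-set spreads (max - min). Not a formal significance test."""
    spread = max(max(baseline) - min(baseline), max(candidate) - min(candidate))
    return abs(mean(candidate) - mean(baseline)) > spread


# Hypothetical latency readings (ns), 3 concordant runs each.
print(looks_like_real_change([40.3, 40.5, 40.4], [40.4, 40.2, 40.5]))  # False
print(looks_like_real_change([40.3, 40.5, 40.4], [43.9, 44.1, 44.0]))  # True
```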


[break=Playing with tRP]

Playing with tRP

Definition:

tRP: This BIOS feature controls the RAS Precharge time (tRP): the minimum number of clock cycles that must elapse after a PRECHARGE command closes the open row in a bank before an ACTIVATE command can open another row in that same bank. The shorter the delay, the sooner a new row can be opened after a page miss, which improves read and write performance. However, if the delay is too short, the bank may not be fully precharged when the next row is activated, which can cause instability or data corruption. Use the lowest value that is stable, and if you face stability issues, increase it one clock cycle at a time.


Benchmarks at the base (optimal) settings:

tRP = 4


Deduction:

We see that tRP has a considerable effect on the benchmarks. The latency increases by about 0.2 ns, and the "Copy" speed dips below the 7000 MB/s mark for the first time.
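One plausible reason tRP shows up when tRAS barely does comes from the simplified textbook model of a DRAM access: a page hit costs only the CAS latency, an access to a closed bank costs tRCD + CL, and a page miss (a different row already open) must precharge first, costing tRP + tRCD + CL. The Python sketch below converts those cycle counts to nanoseconds. It ignores command rate, controller overhead and bank interleaving; the function name is hypothetical and the timings passed in are illustrative, not my base settings.

```python
def access_latency_ns(clock_mhz, cl, trcd, trp):
    """Simplified DRAM access latency (ns) for the three row-buffer cases."""
    cycle_ns = 1000.0 / clock_mhz  # one memory clock in nanoseconds
    return {
        "page hit": cl * cycle_ns,                 # row already open
        "page empty": (trcd + cl) * cycle_ns,      # bank idle: activate, then read
        "page miss": (trp + trcd + cl) * cycle_ns, # wrong row open: precharge first
    }


for trp in (2, 4):
    print(trp, access_latency_ns(200, cl=2, trcd=2, trp=trp))
```

Only the page-miss path contains tRP, so raising it slows down every access that has to switch rows.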


[break=Playing with 1T/2T]

Playing with 1T/2T

Definition:

CPC: This BIOS feature allows you to select the delay from the assertion of the Chip Select signal until the memory controller starts sending commands to the memory bank. The lower the value, the sooner the memory controller can send commands to the activated memory bank. When this feature is enabled, the memory controller inserts a command delay of only one clock cycle (1T). When this feature is disabled, the memory controller inserts a command delay of two clock cycles (2T). The Auto option allows the memory controller to use the memory module's SPD value for command delay. If the SDRAM command delay is too long, it can reduce performance by unnecessarily preventing the memory controller from issuing commands sooner. However, if the SDRAM command delay is too short, the memory controller may not be able to translate the addresses in time, and the resulting "bad commands" will cause data loss and corruption. It is recommended that you try enabling SDRAM 1T Command for better memory performance, but if you face stability issues, disable this BIOS feature.

Benchmarks at the base (optimal) settings:


CPC (Command Per Clock) = Disabled (= 2T)


Deduction:

This single parameter has a very marked effect on all the benchmark results. The latency worsens by about 3.7 ns, and the write speeds are hit worst, plunging to a lowly 5000 MB/s mark.


[break=Playing with tWTR]

Playing with tWTR

Definition:

tWTR: This BIOS feature controls the Write Data In to Read Command Delay (tWTR) memory timing. This constitutes the minimum number of clock cycles that must occur between the last valid write operation and the next read command to the same internal bank of the DDR device. The 1 Cycle option naturally offers faster switching from writes to reads and consequently better read performance. The 2 Cycles option reduces read performance but it will improve stability, especially at higher clock speeds. It may also allow the memory chips to run at a higher speed. In other words, increasing this delay may allow you to overclock the memory module higher than is normally possible. It is recommended that you select the 1 Cycle option for better memory read performance if you are using DDR266 or DDR333 memory modules. You can also try using the 1 Cycle option with DDR400 memory modules. But if you face stability issues, revert to the default setting of 2 Cycles.


Benchmarks at the base (optimal) settings:

tWTR = 2


Deduction:

tWTR affects the "Copy" speed somewhat: with tWTR = 2 it consistently stays below the 7000 MB/s mark.


[break=Playing with tRRD]

Playing with tRRD

Definition:

tRRD: This BIOS feature specifies the minimum amount of time between successive ACTIVATE commands to the same DDR device. The shorter the delay, the faster the next bank can be activated for read or write operations. However, because row activation requires a lot of current, using a short delay may cause excessive current surges. For desktop PCs, a delay of 2 cycles is recommended as current surges aren't really important. The performance benefit of using the shorter 2 cycles delay is of far greater interest. The shorter delay means every back-to-back bank activation will take one clock cycle less to perform. This improves the DDR device's read and write performance. Switch to 3 cycles or higher only when there are stability problems with the 2 cycles setting.
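The rule above can be sketched as a tiny scheduler that tracks two constraints: any two consecutive ACTIVATEs must be at least tRRD apart, and two ACTIVATEs to the same bank must be at least tRC apart. This is a simplified model assumed for illustration (it ignores every other timing and the command bus), and the function name is hypothetical. It shows that tRRD only sets the pace when the controller opens rows in different banks back to back; a repeat visit to the same bank is governed by tRC instead.

```python
def activate_schedule(banks, trrd, trc):
    """Earliest clock cycle at which each ACTIVATE in `banks` can issue.

    Simplified model: consecutive ACTIVATEs (any banks) >= tRRD apart,
    ACTIVATEs to the same bank >= tRC apart. Nothing else is modelled.
    """
    last_any = None        # cycle of the previous ACTIVATE to any bank
    last_per_bank = {}     # cycle of the previous ACTIVATE to each bank
    times = []
    for bank in banks:
        t = 0 if last_any is None else last_any + trrd
        if bank in last_per_bank:
            t = max(t, last_per_bank[bank] + trc)
        times.append(t)
        last_any = t
        last_per_bank[bank] = t
    return times


print(activate_schedule([0, 1, 2, 3], trrd=2, trc=10))  # [0, 2, 4, 6]
print(activate_schedule([0, 1, 0], trrd=2, trc=10))     # [0, 2, 10]
```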


Benchmarks at the base (optimal) settings:

tRRD = 1

tRRD = 4

tRRD = 5


Deduction:

Again, we observe that tRRD doesn't affect the benchmark scores much at all. Whatever variation we see is just run-to-run difference in the benchmark program. So it's not worth trying to push this setting to the lowest possible value; rather, use a value that gives maximum stability.


[break=Playing with tWR]

Playing with tWR

Definition:

tWR: This BIOS feature controls the Write Recovery Time (tWR) of the memory modules. It specifies the amount of delay (in clock cycles) that must elapse after the completion of a valid write operation before an active bank can be precharged. This delay is required to guarantee that data in the write buffers can be written to the memory cells before precharge occurs. The shorter the delay, the earlier the bank can be precharged for another read/write operation. This improves performance but runs the risk of corrupting data written to the memory cells. It is recommended that you select 2 Cycles if you are using DDR200 or DDR266 memory modules and 3 Cycles if you are using DDR333 or DDR400 memory modules. You can try using a shorter delay for better memory performance, but if you face stability issues, revert to the specified delay to correct the problem.
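Those cycle recommendations are consistent with a fixed write-recovery time in nanoseconds. Assuming the commonly quoted DDR specification of 15 ns for tWR, the Python sketch below rounds 15 ns up to whole clock cycles for each speed grade and reproduces the 2-cycle / 3-cycle split. The function name is hypothetical, and DDRxxx ratings are treated as twice the memory clock in MHz.

```python
import math

TWR_NS = 15.0  # assumption: commonly quoted DDR write-recovery time


def twr_cycles(ddr_rating, twr_ns=TWR_NS):
    """Whole clock cycles needed to cover twr_ns at a given DDR rating."""
    clock_mhz = ddr_rating / 2        # DDR transfers twice per clock
    cycle_ns = 1000.0 / clock_mhz     # one memory clock in nanoseconds
    return math.ceil(twr_ns / cycle_ns)


for rating in (200, 266, 333, 400):
    print(f"DDR{rating}: {twr_cycles(rating)} cycles")
```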


Benchmarks at the base (optimal) settings:

tWR = 3


Deduction:

As is visible, tWR affects the "Copy" bandwidth to a small extent, but nothing to write home about.


[break=Playing with tRWT]

Playing with tRWT

Definition:

tRWT: When the memory controller receives a write command immediately after a read command, an additional delay is normally introduced before the write command is actually initiated. As its name suggests, this BIOS feature allows you to skip (or raise) that delay. Skipping it improves the write performance of the memory subsystem, so it is recommended that you enable this feature for faster read-to-write turnarounds. However, not all memory modules can work with the tighter read-to-write turnaround. If your memory modules cannot handle the faster turnaround, the data written to the memory module may be lost or become corrupted. So, when you face stability issues, disable this feature (or raise its value) to correct the problem.


Benchmarks at the base (optimal) settings:

tRWT = 4

tRWT = 5

tRWT = 6


Deduction:

Again the "Copy" bandwidth takes a hit, degrading severely as we increase tRWT. By tRWT = 6, the copy bandwidth has already fallen to the 6500 MB/s mark.


[break=Playing with ICL]

Playing with ICL (Idle Cycle Limit)

Definition:

ICL: This BIOS setting specifies the number of idle memory clocks after which the controller forcibly closes (precharges) an open page. It appears that this setting is the maximum number of tries allowed for a page of memory to be read before arbitration kicks in and forces precharge once again for that page.
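The trade-off behind an idle cycle limit can be shown with a toy open-page model: a small limit closes pages early, turning would-be page hits into page-empty accesses, while a large limit keeps stale pages open, so an access to a different row becomes a page miss that must pay tRP. This Python sketch is a simplified model assumed for illustration (a single bank, with "idle" measured as the gap between accesses), not how the K8 memory controller actually counts. The function name and the access trace are hypothetical.

```python
def classify_accesses(accesses, icl):
    """Toy open-page model for one bank.

    `accesses` is a time-ordered list of (cycle, row). After each access the
    row stays open; if more than `icl` idle cycles pass before the next
    access, the page is assumed to have been closed (precharged).
    """
    open_row = None
    last_cycle = None
    counts = {"hit": 0, "empty": 0, "miss": 0}
    for cycle, row in accesses:
        if open_row is not None and cycle - last_cycle > icl:
            open_row = None              # page closed while the bank sat idle
        if open_row is None:
            counts["empty"] += 1         # needs ACTIVATE only
        elif open_row == row:
            counts["hit"] += 1           # row already open
        else:
            counts["miss"] += 1          # wrong row open: PRECHARGE + ACTIVATE
        open_row = row
        last_cycle = cycle
    return counts


trace = [(0, 5), (10, 5), (40, 5), (45, 7)]  # hypothetical (cycle, row) pairs
for icl in (4, 16, 64):
    print(icl, classify_accesses(trace, icl))
```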

Benchmarks at the base (optimal) settings:

ICL = 0 cycles

ICL = 4 cycles

ICL = 8 cycles

ICL = 32 cycles

ICL = 64 cycles

ICL = 128 cycles


Deduction:

We see that the optimal value for ICL is 16 cycles in my case. Going too tight on this value increases the latency to as much as 50.8 ns. Going too loose also increases the latency, though not by as much. So we should strive to find the optimal value for our RAM, and it will likely be near 16 cycles if your memory controller is anything like mine.


[break=References]

References

I have used these references in my report to help the reader gain as much knowledge as possible.

I thank the authors for creating such wonderful articles to aid the enthusiast community.


[break=Conclusion]

Conclusion

It must be obvious to everyone now that there is no magical set of timings that will give optimal results on every memory controller + memory combination.

Reasons:
  • The memory controller might perform optimally at timings that are different from those of other memory controllers.
  • The memory in question might not be stable (or even able to POST) at the optimal timings for the given memory controller. So we must work towards the optimal configuration for each specific combination of memory and memory controller.

Keep in mind that more MHz does not just mean more bandwidth; it also changes what the timings mean. For example, 2-2-2-5 @ 200 MHz cannot be compared directly with 2.5-4-4-10 @ 300 MHz, because each timing is a count of clock cycles and the cycles have become shorter: at 200 MHz a cycle lasts 1/0.2 GHz = 5 ns, while at 300 MHz it lasts 1/0.3 GHz ≈ 3.33 ns. Hence, strive to obtain the tightest timings possible at the highest frequency. Of course this is nothing new, but the benchmarks above demonstrate it consistently.
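The cycle-to-nanosecond conversion above can be applied to a whole timing set. A small Python sketch, assuming the usual CL-tRCD-tRP-tRAS order for timing strings like 2-2-2-5; the function name is hypothetical.

```python
def timings_in_ns(clock_mhz, timings):
    """Convert timings (in clock cycles) to nanoseconds at a given clock."""
    cycle_ns = 1000.0 / clock_mhz  # 5 ns at 200 MHz, ~3.33 ns at 300 MHz
    return [round(t * cycle_ns, 2) for t in timings]


print(timings_in_ns(200, [2, 2, 2, 5]))     # [10.0, 10.0, 10.0, 25.0]
print(timings_in_ns(300, [2.5, 4, 4, 10]))  # [8.33, 13.33, 13.33, 33.33]
```

In absolute terms the 300 MHz set has a shorter CAS latency but longer tRCD, tRP and tRAS, which is why the two sets cannot be ranked by their numbers alone.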

I am exhausted now (3 straight hours of article writing). I have another 60 or so benchmarks to document and include in the report, but I will do so another day when I can muster some courage. Till then, I am off to frag noobs :D

I am sorry:
  • for being unable to show you guys the variance that comes with varying tCL. The RAM I have (Transcend UCCC) does not allow me to play with tCL (CAS) at all. Neither 2.5 nor 3.5 (:P) is stable, so I didn't bother with it much.
  • for the sheer number of benchmarks: well, I like to document my findings. Plus, this should serve as evidence for skeptical minds.
  • for the typographical and unintentional "information" errors that have slipped past my eye. Guys, please help me correct them. ;)
  • for putting this link here. I love green bars :D LINK --> :D

Regards and Aloha,

Karan Misra
 
Awesome Guide Karan !

Reps added ! That must have taken at least 2-3 hours ;)

Bookmarked btw, will read carefully later and point out all your errors :p
 
haha thanks.... yea took almost 4hrs to be precise. thanks again for the appreciation, and pls tell me the errors. i would love to have an error-free article.
 
i wonder how my system booted when i changed my twinmos ram settings to 2-2-3-5..........:O i dunno if it was stable; it ran fine for 7 hours, then for fear of goin kaput i changed em back to normal.
i'll use super pi and see :P.
btw awesome tutorial karan.. reps :)
 
awesome guide .. u must have worked really hard for this .. will go through it later .. repping u now.

EDIT : Cannot rep u as of now as i have to spread some reps. will do so later :)
 
awesome guide karan .... i must say its one of the best i have seen on TE :D

i am surprised only 6 replies tho ....
 
Karan, Gr8 effort there m8. Keep coming up with such articles.

Sorry m8 cant rep ya
"You must spread some Reputation around before giving it to Karan again."

But one due on u for sure.

Thread Rated.
 
Awesome Awesome Awesome Guide :D.

Went through it fully today :D.

And karan, since ur asking for errors, theres just a typo I could find on Page 7 ("observer" instead of "observe").

And you used to keep the Tras timings at 0?? :O.

Also, delay of 2 cycles means the numeric value "2" or subtracting 2 from the existing value??Please clarify :).

Thanx for an awesome guide. U got more green now :).
 

heheh.... how did u dig this thread up....... but anyways thanks....... many such articles are in the pipeline...... just cant churn them out every other day cuz it takes quite a bit of time to ensure the "quality" of the thing..... thnx again.
 
Karan, Please clarify this for me:

"The performance benefit of using the shorter 2 cycles delay is of far greater interest"

here, shorter 2 cycles means subtracting 2 from the current number of cycles??i.e reducing the present number of cycles by 2??

Also, how can you use the tRas as 0??:O, really...is ur system stable at this settings??Experts, please help dis n00b here :).
 

tRas is not a very sensitive setting....... my system was SuperPi32M stable even @ tRas = 0..... surely if the ram was stressed it would perform suboptimally @ tRas = 0, but it was just a good feeling to have tRas = 0...... i keep tRas = 9 for 24x7... or at least i kept. now the rig is with akshay.... a friend of mine.
 