AMD Dual Opteron 4180 + Asus KCMA-D8 Review
Introduction
To begin with, this is not a review of a server's performance in enterprise applications. This review instead treats the machine as a high-end desktop rig rather than a server.
Here we have a dual AMD Opteron based (AMD QuadFX-like) system, also known as the AMD Lisbon server platform. We will take it through a bunch of tests and compare it to an Intel Gulftown-based system, but before we do that, let's turn the time machine back to July 2006.
July 2006
AMD and Intel have been trading blows for quite a while now, but since the introduction of AMD's K10 architecture, AMD has for the most part been pushed into the low- and mid-end CPU arena while Intel has comfortably dominated the high end with its enthusiast-class Extreme Edition CPUs.
This was not always the case. After the introduction of its K8-based Opterons in April 2003, followed shortly by the Athlon 64 and Athlon 64 FX, AMD dominated Intel in CPU performance. The Athlon 64 and Athlon 64 FX were a big hit, bringing features like AMD64 (x86-64), HyperTransport Technology (HTT) and an Integrated Memory Controller (IMC), none of which Intel had at the time. Intel's CPUs of the day were simply slower, offered less performance-per-watt and lacked 64-bit support compared with AMD's Athlon 64 and Athlon 64 FX.
This forced Intel to license and implement the AMD64 instruction set in its processors, as its own 64-bit strategy, the HP/Intel Itanium processors built on Intel's IA-64 instruction set, had crashed and burned on all fronts. Even so, the first Intel processors with AMD's 64-bit instruction set would not be out until February 2004, and an Integrated Memory Controller (IMC) and QuickPath Interconnect (QPI, Intel's counterpart to HTT) would not come to Intel CPUs until November 14, 2008. In 2001 Intel bought the rights to the DEC Alpha 21x64 designs, which are rumored to have been the reference for Intel Nehalem's IMC.
Interestingly, the CPU design team lead for AMD's K8 architecture, Derrick R. (Dirk) Meyer, was also a co-architect of the DEC Alpha 21x64; he later went on to become President and CEO of AMD.
On July 27, 2006, all of this changed with the launch of Intel's new “Core 2” architecture. That was the day Intel took the CPU performance crown from AMD, and AMD has not been able to catch up since! The new Intel CPUs delivered better performance and lower power consumption while AMD was still selling its older K8-based CPUs.
At that time AMD's next upgrade, the K10 architecture, was still some time away. In another blow to AMD, Intel followed the “Core 2” launch with its first quad-core x86 processor, the Intel Core 2 Extreme QX6700, in November of the same year.
AMD had no plans to launch a quad-core x86 processor any time soon, but it also did not want to lose its huge army of fans. So, to stay competitive until its new architecture (AMD K10/AMD Phenom) and its quad-core processors arrived, AMD launched the AMD 4x4 platform right behind the Intel Core 2 Extreme QX6700. AMD 4x4 was later renamed the AMD QuadFX platform!
The AMD QuadFX platform consisted of a dual-processor desktop board and two dual-core Athlon 64 FX CPUs, for a total of four cores across two CPUs. It allowed AMD to compete with Intel's quad-core offering of the time.
Here is how an AMD QuadFX platform looked in 2006:
(AMD Quad FX – Dual Athlon 64 FX on a Socket F Asus L1N64-SLI WS.)
The successor to the QuadFX platform would have been the AMD FASN8 (First AMD Silicon Next-gen 8-core) platform, which was supposed to take two AMD Phenom quad-cores on Socket F boards for a total of eight cores. FASN8 was later cancelled, and the successor to the FASN8 platform would have been the dual AMD Opteron rig being reviewed today!
Back to Now...
As we all know, AMD's K10 architecture failed to take the technology lead back from Intel, and with the later release of Intel's Nehalem architecture, AMD was made irrelevant altogether. AMD's K10.5/Phenom II parts held the line until the arrival of Bulldozer, AMD's much-hyped CPU architecture, which was released 30 months behind its original schedule and proved to be a disappointing failure.
I won't be talking about Bulldozer here, but was AMD really uncompetitive in the enthusiast space with K10.5?
The answer is that it could have been competitive in the enthusiast CPU space had it continued to pitch its physical cores against Intel's logical cores!
Here is why…
While Intel strongly believed in Hyper-Threading, sharing each core's execution resources between two threads to squeeze out 15% to 30% more performance, AMD didn't. AMD believed in giving each thread a dedicated core, and that is exactly what we saw in the market: Athlon II X4s with four physical cores competed with Nehalem/Westmere-based Core i3s with two physical and two virtual cores, and beat them at a lower price point.
Higher up the order, however, Intel had its quad-core Core i7 CPUs, which supported Hyper-Threading for a total of eight logical (four physical + four virtual) cores, while AMD had nothing that could crunch eight threads at a time. AMD countered the Bloomfield Core i7s to some extent with its six-core Phenom II processors, which traded blows with them at higher clock speeds. Higher still, Intel had the Core i7 980X, the undisputed champion of processors: a six-core monster with Hyper-Threading enabled for a total of 12 logical cores. AMD simply couldn't compete in this space, as its own six-core processors could only fend off Intel's eight-thread Bloomfield quad-cores!
So AMD had no processor capable of crunching eight threads to compete against the Bloomfield Core i7s, and it certainly had nothing capable of crunching 12 threads to compete with the Gulftown Core i7s. This let Intel beat it in the enthusiast CPU space and pushed AMD into the low- to mid-end “value-for-money” CPU arena to stay competitive. So while AMD claimed (rightly) that physical cores are better than logical cores, it had no products in its line-up to compete against Intel CPUs with 8 and 12 logical cores.
Things have only gotten better for Intel with Sandy Bridge, and AMD has been pushed further down the order: if you believe Tom's Hardware, the AMD Athlon II X4 now competes with the Intel Pentium G860 instead of the Core i3 5xx.
I hope this clarifies the point behind comparing a 12-core (physical) AMD rig to a 12-core (logical) Intel rig. Still, I wonder why AMD killed the FASN8 platform. If that platform were around today, AMD would be much more competitive with Intel, even in the enthusiast CPU space!
Anyway, moving on with the review: if there had ever been an AMD “12x12” platform succeeding FASN8, it would have looked like this:
The AMD Lisbon Platform
The AMD “Lisbon” platform consists of up to two AMD Opteron 4000 series processors paired with an AMD SR56x0 northbridge (NB), which connects to an AMD SP5100 southbridge (SB). The platform follows pretty much the same idea as the AMD QuadFX platform. The processors go into AMD's Socket C32, which is physically similar to the Socket F used by the QuadFX platform: both are 1207-contact flip-chip LGA sockets, although the two are not interchangeable.
As with most AMD sockets, there is an upgrade path: the Bulldozer-based Opteron 4200 series is drop-in compatible with the Lisbon platform's C32 socket.
There are three NB configurations available for the platform –
1. SR5690 – 42 PCIe lanes / 11 engines – similar to the AMD 990FX desktop chipset.
2. SR5670 – 30 PCIe lanes / 9 engines – similar to the AMD 990X desktop chipset.
3. SR5650 – 22 PCIe lanes / 8 engines – similar to the AMD 970 desktop chipset.
Platform Operating System Support –
Windows® XP and XPe, Windows Vista, Windows 7, Embedded Linux, Windows Server 2003 and Windows Server 2008.
The Processors – 2x AMD Opteron 4180
The Opteron 4180 configuration: the AMD Opteron 4180 has a core clock speed of 2600MHz, its IMC runs at 2200MHz and its HT links run at 3200MHz, effectively 6400MT/s since HT is a DDR bus, which is quite good from a server's point of view. The processor is a six-core part with 9MB of cache in total (6x 512KB L2 plus 6MB L3) and a TDP of 95W at a Vcore of 1.26V.
The CPU is basically an underclocked and undervolted Thuban six-core, the die also used in the Phenom II X6. Being an Opteron part, it has a few extra server-specific features that the Phenom II series lacks, but other than that it is the same as a Phenom II X6.
From a server point of view, the Opteron 4100 series is designed to provide high performance-per-watt and a good core count at a very low cost, so virtualization, rendering and similar workloads are what you would want to throw at these processors.
From a desktop point of view, its clocks are a little lower than you would want but still acceptable. If you need higher clocks, there is also the Opteron 4184, a six-core part that runs at 2.8GHz, the same clock speed as a desktop Phenom II X6 1055T.
There are also several Bulldozer-based 42xx models, available as 4-, 6- and 8-core parts. They clock higher and have more cache than the K10.5-based 4100 series Opterons, although I think it's better to stick with the 4100s, as they have a higher IPC (instructions per clock).
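To put the HT figure in perspective, the short sketch below turns a link clock into a transfer rate and a peak bandwidth. It assumes a full-width 16-bit HyperTransport link; the link width used between the two sockets on this board is an assumption, not something stated by AMD or Asus in this review.

```python
# Peak HyperTransport link bandwidth from the link clock.
# Assumption: a full-width 16-bit link (the actual link width on this board is not given here).

def ht_bandwidth_gbs(link_clock_mhz: float, width_bits: int = 16) -> tuple[float, float, float]:
    """Return (transfer rate in MT/s, GB/s per direction, GB/s for both directions)."""
    transfers_mt_s = link_clock_mhz * 2              # DDR: data moves on both clock edges
    bytes_per_transfer = width_bits / 8
    per_direction = transfers_mt_s * bytes_per_transfer / 1000   # MB/s -> GB/s
    return transfers_mt_s, per_direction, per_direction * 2      # HT links are full duplex


if __name__ == "__main__":
    mts, one_way, both = ht_bandwidth_gbs(3200)
    print(f"{mts:.0f} MT/s, {one_way:.1f} GB/s per direction, {both:.1f} GB/s total")
```

These are theoretical peaks; protocol overhead means real transfers land somewhat lower.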
The Flip-Chip LGA 1207 socket C32 for AMD processors –
The Motherboard – Asus KCMA-D8
The Package –
The Asus board is Rock Solid!
Six SATA cables, a back plate, two temperature sensors, a manual and a driver CD were all included with the motherboard.
The Asus KCMA-D8 is a dual-processor Socket C32 motherboard with an AMD SR5670 NB paired with an AMD SP5100 SB. In other words, dual-GPU CrossFireX at x16+x8 is supported, which is a sane configuration. If you find this limiting in any way, you can go for a TYAN S8225 motherboard with two SR5690 NBs; that monster of a board supports four-GPU CrossFireX with all four slots running at full x16!
The board is an ATX form-factor motherboard, so compatibility with desktop cabinets is ensured. The KCMA-D8 also works with standard ATX power supplies, so you don't have to buy some costly, servers-only PSU!
(A second 12V ATX power connector would have meant more power to the CPUs and more stable operation under full load, but the CPUs are 75W ACP parts anyway, so it wouldn't make much of a difference.)
The board provides 4+1 phase power to each CPU. It could have been 8+2 phase per CPU, but you can't really complain: Asus has already done a fantastic job of fitting two processors onto an ATX board!
The BIOS –
The BIOS of the KCMA-D8 is OK. It has all the necessary options, not too much to play with but not too little either. All the memory options a ccNUMA system needs are there, and since the KCMA-D8 is technically a server board, there are no overclocking options.
Here are a few snaps –
Asus Boot time logo.
BIOS CPU Configuration.
ccNUMA options – Throughout the review, Bank and Channel Interleaving were enabled while Node Interleaving was disabled. Bank Swizzle was also enabled, but it made no difference, as only four memory sticks were used, one per channel.
Motherboard hardware monitoring.
The board also supports Promise software RAID in RAID 0, 1, 5, 10 and JBOD modes.
Overall the board was pretty good: all components were of good quality, and it was fun and easy to work with. The only downsides were the lack of SATA 6Gb/s (SATA 3.0) and USB 3.0 support and the lack of passive cooling on the power FETs.
One CPU fan blows into the other cooler, but each CPU is rated at 75W ACP, so it is not a big issue: both CPUs stayed below 40°C under full load. There are 6 SATA ports (black and red) and 8 SAS ports (blue and red) on the board. The 1st and 3rd slots are the PCIe 2.0 x16 slots, and the board layout can accommodate two dual-slot GPUs.
The board also has an integrated graphics processor, the ASpeed AST2050. The manual specifies 8MB of video RAM, yet on the board I found a 512MB DDR2 800MHz RAM chip between the IGP and the ASMB4-iKVM module socket.
The Winbond chip is the 512MB of DDR2 memory. No specific details were given about the ASMB4-iKVM module or this memory chip, so I could not figure out why there was a 512MB chip when the IGP was using only 8MB and no ASMB4 module was plugged in.
Anyway…
A very interesting piece of trivia: since this is technically a server platform, you might think you need a server CPU HSF compatible with Socket C32. The good news is that you don't necessarily have to buy a Supermicro or other server HSF, which might have availability issues in India and would cost more than you would like to spend. You can use any Socket AM3-compatible HSF with any C32 motherboard, as the mounting pitch of the two sockets is exactly the same!
I used a pair of Cooler Master Hyper 212+ HSFs for the system.
Also, a piece of advice if you ever come across this board: however shiny and inviting the chokes look, DO NOT TOUCH THEM! AND CERTAINLY NOT AFTER THE SYSTEM HAS BEEN RUNNING FOR HALF AN HOUR!
The Test System & Benchmarks –
Processors:
2x AMD Opteron 4180
Motherboard:
Asus KCMA-D8
Graphics Card:
Asus ATI Radeon HD 4770 512MB
Memory:
4x 4096MB 1333MHz 9-9-9-24-33 1T Transcend RDIMMs with ECC.
Hard Drives:
4x 1TB Western Digital Green w/ 64MB cache in RAID 10.
Power Supply:
Seasonic S12 750W 80+ Bronze
Monitor:
AOC 2036Sa
Operating System:
Windows 7 64bit
Software Benchmarks:
• 7-Zip
• Blender
• Cinebench R10
• Cinebench R11.5
• SuperPi
• wPrime
• UC Bench 2011
• Geekbench
• Aida64
• SiSoft Sandra
• Resident Evil 5 DX10
• TM Nations
• Devil May Cry 4
NOTE – All the figures for the Intel Core i7 980X have been taken from reputable hardware review sites such as tomshardware.com, anandtech.com and overclockersclub.com. The Core i7 980X figures are published here only for the reader's convenience; I did not have a Core i7 980X based system at the time of testing.
Also, the benchmark settings and software versions match those used in Tom's Hardware's Core i7 980X review. Some of the software used here was not part of that review; for those, the settings and scores are matched against a Core i7 980X review from an equally credible site, such as overclockersclub.com.
The AIDA64 benchmark suite compares the test system against reference results that were already built into the software at the time of benchmarking. Hence, in all AIDA64 benchmarks the test system is compared with an entry labelled "i7 980X @ 3.46GHz", which is essentially the Core i7 990X.
The Benchmarks –
7-Zip Benchmark – Compression and decompression rates in MB/s. 7-Zip is everyday compression software used to compress files.
Blender Benchmark – Blender is a free and open-source 3D modeling, rendering and compositing package. The Blender benchmark is multithreaded and very efficient at utilizing CPU resources.
Interesting to note: a recent Tom's Hardware review of Intel's Sandy Bridge-E Core i7 shows the Blender benchmark being completed by a Core i7 3960X @ 3.3GHz in 42 seconds, a Core i7 3820 @ 4.625GHz in 38 seconds and a Core i7 3930K @ 4.5GHz in 34 seconds.
Dual Opteron systems seem to perform very nicely in rendering applications despite their lower clock speeds.
Cinebench R10 Benchmark – Cinebench is another rendering test suite from Maxon, used to stress-test the CPU and GPU compute capabilities of a system.
Despite its 733MHz lower clock speed, the Opteron-based machine matches the performance of an i7 980X.
Cinebench R11.5 Benchmark – Cinebench R11.5 is the successor to Maxon's Cinebench R10. It offers tighter integration and a more intense rendering test, designed to stress the CPU's rendering compute capabilities.
Maxon may have made some optimizations for AMD in CB R11.5, as this benchmark favors the AMD system by a small margin.
wPrime 32M & 1024M Benchmark – wPrime is a popular multithreaded benchmark for x86 processors that measures processor performance by calculating square roots using repeated iterations of Newton's method.
The AMD system gets the better of the Intel system despite running at significantly lower clocks and having a lower IPC. This clearly shows why the AMD system makes more sense than a Core i7 980X: you get nearly the same performance from both systems, while the AMD system costs about $460 less, going by the prices listed in the conclusion.
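For readers who want a feel for what a compression/decompression rate measures, here is a minimal sketch using Python's standard `lzma` module (LZMA is also 7-Zip's default codec). It is single-threaded and uses an arbitrary test buffer, so its numbers are not comparable with 7-Zip's own multithreaded benchmark; both rates are computed on the uncompressed size.

```python
import lzma
import os
import time


def lzma_rates(data: bytes, preset: int = 5) -> tuple[float, float]:
    """Return (compression MB/s, decompression MB/s), both measured on the uncompressed size."""
    start = time.perf_counter()
    packed = lzma.compress(data, preset=preset)
    mid = time.perf_counter()
    unpacked = lzma.decompress(packed)
    end = time.perf_counter()
    if unpacked != data:
        raise RuntimeError("LZMA round trip did not reproduce the input")
    size_mb = len(data) / 1_000_000
    # max() avoids dividing by zero if a tiny input finishes within the timer resolution
    return size_mb / max(mid - start, 1e-9), size_mb / max(end - mid, 1e-9)


if __name__ == "__main__":
    # Mix incompressible and highly repetitive data so both extremes are exercised.
    sample = os.urandom(256 * 1024) + b"benchmark " * 50_000
    comp, decomp = lzma_rates(sample)
    print(f"compress: {comp:.1f} MB/s, decompress: {decomp:.1f} MB/s")
```

As with 7-Zip, decompression typically runs much faster than compression, which is why the two rates are reported separately.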
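The core idea of the workload can be sketched in a few lines: Newton's method refines a guess x for the square root of n with x_next = (x + n / x) / 2. This is only an illustration of the technique, not wPrime's actual code; the helper names and the tolerance are assumptions.

```python
import math


def newton_sqrt(n: float, tol: float = 1e-12) -> float:
    """Square root of n >= 0 via Newton's method: x_next = (x + n / x) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0.0
    # Start at or above the true root so the iterates decrease monotonically toward it.
    x = n if n >= 1 else 1.0
    while True:
        nxt = 0.5 * (x + n / x)
        if abs(nxt - x) <= tol * nxt:   # stop once the relative change is negligible
            return nxt
        x = nxt


def sqrt_workload(count: int) -> float:
    """Hypothetical mini-workload: sum the square roots of 1..count (wPrime's 32M run uses 32 million)."""
    return sum(newton_sqrt(float(i)) for i in range(1, count + 1))


if __name__ == "__main__":
    print(newton_sqrt(2.0), math.sqrt(2.0))
    print(sqrt_workload(10_000))
```

A real multithreaded benchmark would split the range of numbers across all available threads, which is exactly where a 12-core system benefits.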
UC Bench 2011 Benchmark – It simulates brute force searching of the archive password (password length is 6 chars). It utilizes all available cores, and supports SSE2/SSSE3/SSE4.1.
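A brute-force search like this scales across cores because the keyspace can be split into independent chunks. The sketch below shows one way to size and partition a 6-character keyspace and to map an index back to a candidate password. The lowercase a-z character set and the contiguous-chunk split are assumptions for illustration; UC Bench's actual character set and work distribution are not documented here.

```python
def keyspace_size(charset: str, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return len(charset) ** length


def split_keyspace(total: int, workers: int) -> list[range]:
    """Divide candidate indices 0..total-1 into near-equal contiguous chunks, one per worker."""
    base, extra = divmod(total, workers)
    chunks, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)   # the first `extra` workers take one more
        chunks.append(range(start, start + size))
        start += size
    return chunks


def index_to_candidate(index: int, charset: str, length: int) -> str:
    """Map a keyspace index to its password by writing it in base len(charset)."""
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(charset))
        chars.append(charset[digit])
    return "".join(reversed(chars))


if __name__ == "__main__":
    charset = "abcdefghijklmnopqrstuvwxyz"        # assumed character set
    total = keyspace_size(charset, 6)
    for worker, chunk in enumerate(split_keyspace(total, 12)):   # 12 threads, as on both test rigs
        print(worker, index_to_candidate(chunk.start, charset, 6), len(chunk))
```

Each worker only needs its range, so threads never have to coordinate until a match is found.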
Geekbench 2.0 Benchmark – Geekbench 2.0 is a benchmark that tests CPU and memory performance in an easy-to-use tool. The measure used for comparison is the total suite average score.
The Geekbench 2.0 benchmark suite is made up of multiple single- and multi-threaded benchmarks. Due to its 733MHz (22%) lower clock speed, the AMD system falls about 12% behind the i7 980X, although it seems that at similar clocks the AMD system would have bettered the Intel system slightly.
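The "at similar clocks" reasoning can be checked with simple arithmetic: divide the relative score by the relative clock speed. This sketch assumes performance scales linearly with clock speed, which is optimistic (memory and other bottlenecks do not scale), and it normalizes whole-system scores rather than per-core IPC.

```python
def clock_deficit(clock_mhz: float, reference_mhz: float) -> float:
    """Fraction by which clock_mhz falls short of reference_mhz."""
    return 1 - clock_mhz / reference_mhz


def per_clock_ratio(relative_score: float, clock_mhz: float, reference_mhz: float) -> float:
    """Relative score after normalizing both systems to the same clock (assumes linear scaling)."""
    return relative_score / (clock_mhz / reference_mhz)


if __name__ == "__main__":
    # Opteron 4180 (2600 MHz) vs Core i7 980X (3333 MHz); the AMD system scores ~12% lower, i.e. 0.88x.
    print(f"clock deficit: {clock_deficit(2600, 3333):.0%}")
    print(f"clock-normalized score: {per_clock_ratio(0.88, 2600, 3333):.2f}x")
```

Under that assumption the AMD system would come out roughly 13% ahead clock-for-clock, consistent with the "slightly better" estimate above.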
AIDA64 Benchmark Suite
AIDA64 Memory Bandwidth – Memory bandwidth benchmarks (Memory Read, Memory Write, and Memory Copy) measure the maximum achievable memory data transfer bandwidth.
AIDA64 Memory Latency – The AIDA64 Memory Latency benchmark measures the typical delay when the CPU reads data from system memory. Memory latency is the penalty measured from the issuing of the read command until the data arrives in the CPU's integer registers.
AIDA64 CPU Queen Benchmark – This simple integer benchmark focuses on the branch prediction capabilities and the misprediction penalties of the CPU. It finds the solutions for the classic "Queens problem" on a 10 by 10 sized chessboard. At the same clock speed theoretically the processor with the shorter pipeline and smaller misprediction penalties will attain higher benchmark scores.
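To see why this workload is branch-heavy, here is a minimal backtracking solver that counts all solutions of the N-Queens problem using bitmasks. It is an illustration of the classic algorithm, not AIDA64's implementation; the data-dependent loop over free columns is exactly the kind of branch that is hard to predict.

```python
def count_queens(n: int) -> int:
    """Count the placements of n mutually non-attacking queens on an n x n board."""
    full = (1 << n) - 1   # bitmask with one bit per column

    def place(cols: int, left_diag: int, right_diag: int) -> int:
        if cols == full:                      # a queen in every row: one complete solution
            return 1
        solutions = 0
        free = full & ~(cols | left_diag | right_diag)
        while free:                           # data-dependent branch, hard to predict
            bit = free & -free                # lowest free column in this row
            free ^= bit
            # Diagonal attacks shift one column per row in opposite directions.
            solutions += place(cols | bit, (left_diag | bit) << 1, (right_diag | bit) >> 1)
        return solutions

    return place(0, 0, 0)


if __name__ == "__main__":
    print(count_queens(10))   # 724 solutions on a 10x10 board
```
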
AIDA64 CPU Photoworxx – This benchmark performs common digital photo processing tasks, such as Fill, Flip, Crop, Difference and Color to Black & White, on a very large image.
AIDA64 CPU Z-Lib – This integer benchmark measures combined CPU and memory subsystem performance through the public Z-Lib compression library.
AIDA64 CPU Hash – This benchmark measures CPU performance using the SHA1 hashing algorithm.
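As a rough illustration of what a hashing throughput test does, the sketch below feeds a buffer through Python's `hashlib` and reports MB/s, with a known-answer check for SHA-1. Buffer size, round count and single-threaded execution are assumptions; AIDA64's own buffer sizes and threading are not reproduced.

```python
import hashlib
import time


def hash_rate_mbs(algorithm: str, data: bytes, rounds: int = 16) -> float:
    """Feed `data` through `algorithm` `rounds` times and return the throughput in MB/s."""
    hasher = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(rounds):
        hasher.update(data)
    hasher.digest()   # finalize so the padding of the last block is included in the timing
    elapsed = max(time.perf_counter() - start, 1e-9)
    return rounds * len(data) / 1_000_000 / elapsed


def sha1_self_check() -> bool:
    """Known-answer test: SHA-1 of b"abc", the standard FIPS 180 example."""
    return hashlib.sha1(b"abc").hexdigest() == "a9993e364706816aba3e25717850c26c9cd0d89d"


if __name__ == "__main__":
    assert sha1_self_check()
    print(f"SHA-1: {hash_rate_mbs('sha1', bytes(1 << 20)):.0f} MB/s")
```

The same function accepts other algorithm names such as "sha256", which is the hash used in the SANDRA cryptography test.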
AIDA64 CPU FPU VP8 – This benchmark measures video compression performance using the Google VP8 (WebM) video codec Version 0.9.5. FPU VP8 test encodes 1280x720 pixel ("HD ready") resolution video frames in 1-pass mode at 8192 kbps bit rate with best quality settings.
AIDA64 CPU FPU Julia – This benchmark measures the single precision (also known as 32-bit) floating-point performance through the computation of several frames of the popular "Julia" fractal.
AIDA64 CPU FPU Mandel – This benchmark measures the double precision (also known as 64-bit) floating-point performance through the computation of several frames of the popular "Mandelbrot" fractal.
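The fractal tests boil down to the escape-time iteration sketched below: repeat z = z*z + c and count the steps until |z| exceeds 2. For the Mandelbrot set, z starts at 0 and c is the pixel; a Julia set uses the same iteration with c fixed and z starting at the pixel. Python's complex numbers use double precision, matching the Mandel test; the single- and 80-bit precision variants are not reproduced, and the image size and iteration limit are arbitrary assumptions.

```python
def escape_time(c: complex, max_iter: int = 256) -> int:
    """Iterations of z -> z*z + c (from z = 0) before |z| exceeds 2; max_iter if it never does."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def render(width: int = 60, height: int = 24, max_iter: int = 64) -> list[str]:
    """ASCII Mandelbrot frame: '#' for points that never escape, '.' otherwise."""
    rows = []
    for y in range(height):
        im = -1.2 + 2.4 * y / (height - 1)
        row = ""
        for x in range(width):
            re = -2.1 + 3.0 * x / (width - 1)
            row += "#" if escape_time(complex(re, im), max_iter) == max_iter else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

Every pixel is independent, which is why these tests spread so well across many cores.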
AIDA64 CPU FPU SinJulia – This benchmark measures the extended precision (also known as 80-bit) floating-point performance through the computation of a single frame of a modified "Julia" fractal.
SISOFT SANDRA Benchmarks Suite
SANDRA Processor Arithmetic Benchmark – A synthetic test, benchmarking the raw performance potential of an integer core (Dhrystone) & the floating point units (Whetstone) in a CPU.
SANDRA Processor Multimedia Benchmark
SANDRA Processor Cryptography Benchmark – Benchmarks a processor's cryptography performance in AES256 and SHA256. The AMD-based system lags behind in AES because it lacks hardware AES instructions, while the Intel Core i7 980X supports AES-NI for better cryptography performance.
SANDRA Memory Bandwidth Benchmark – This benchmark has been tough on AMD's desktop processors, with Intel using first a triple-channel memory controller on its X58 platform and now a quad-channel one on X79.
The dual Opteron system here has four memory channels in total, since each CPU has its own dual-channel memory controller. It comfortably outperforms the Core i7 980X in this benchmark by a considerable margin, despite a 22% lower clock speed. This is a great result for the AMD-based system.
SANDRA Cache & Memory Benchmark – The system's cache and memory performance is exceptionally good, and it again beats the Intel system.
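The channel count matters because theoretical peak bandwidth is simply transfer rate × bus width × channels. The sketch below assumes DDR3-1333 on standard 64-bit (8-byte) channels; the triple-channel line is a hypothetical X58 setup at the same speed, since the memory configuration behind the borrowed Intel results is not known.

```python
def peak_bandwidth_gbs(transfers_mt_s: float, channels: int, bus_width_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: transfer rate x bus width x channel count."""
    return transfers_mt_s * bus_width_bytes * channels / 1000


if __name__ == "__main__":
    dual_opteron = peak_bandwidth_gbs(1333, channels=4)    # 2 CPUs x 2 channels each
    one_node = peak_bandwidth_gbs(1333, channels=2)        # one CPU's local controller only
    triple_channel = peak_bandwidth_gbs(1333, channels=3)  # hypothetical X58 at the same speed
    print(f"dual Opteron: {dual_opteron:.1f} GB/s, one node: {one_node:.1f} GB/s, "
          f"triple channel: {triple_channel:.1f} GB/s")
```

On a ccNUMA system the full figure is only reachable when threads on both CPUs work on their own local memory, which is why Node Interleaving was left disabled.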
Speed factor –
Memory Latency –
Physical Drive & Random Access Benchmark –
The AMD system used the motherboard's fake-RAID (software RAID) solution in RAID 10 mode with four 1TB WD Green HDDs (64MB buffer each), and it led the Intel system with more than 2x the transfer speed and lower random access times.
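RAID 10 explains both the speed and the capacity: data is striped across mirrored pairs, so reads can be spread over all drives while every block still has two copies. The sketch below maps logical blocks to disks under assumed conventions (disks paired as 0/1 and 2/3, a one-block stripe unit); the Promise implementation's actual stripe size and layout are not documented here.

```python
def raid10_map(logical_block: int, disks: int = 4) -> tuple[int, list[int]]:
    """Map a logical block to (physical block offset, [disks holding a copy]).

    Assumed layout: mirrored pairs (0, 1), (2, 3), ... with stripes rotating across pairs.
    """
    if disks < 4 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    pairs = disks // 2
    pair = logical_block % pairs        # which mirrored pair this stripe unit lands on
    offset = logical_block // pairs     # position of the block on each disk of that pair
    return offset, [2 * pair, 2 * pair + 1]


def raid10_capacity_tb(disks: int, disk_tb: float) -> float:
    """Usable capacity: half the raw capacity, because every block is mirrored."""
    return disks * disk_tb / 2


if __name__ == "__main__":
    for block in range(6):
        print(block, raid10_map(block))
    print(raid10_capacity_tb(4, 1.0), "TB usable")
```

Consecutive blocks alternate between pairs, so a sequential read keeps several drives busy at once, which is consistent with the more-than-2x lead over a single drive.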
Gaming Benchmarks
Resident Evil 5 DirectX 10.0 Benchmark – At 1360x768 resolution, 32-bit color depth and 4x MSAA, the system delivered a stable 60+ fps throughout the game.
As Resident Evil 5 is an eight-threaded game, it is more likely that the HD 4770 512MB hit its processing or memory limits (it has a small frame buffer) than that processor performance held back the frame rate.
TM Nations Forever Benchmark – At 1366x768 resolution and 32-bit color depth, with the Very High Quality preset (8x MSAA and 8x AF), the system delivered an average of 39.2 fps. Although ~40 fps is a very playable frame rate, the CPU usage graph below suggests that this dual-threaded game was bottlenecked by the CPU's low IPC and clock speeds.
Devil May Cry 4 Benchmark – At 1600x900 resolution, 32-bit color depth and 4x AA, the game ran at a smooth 60+ fps for the most part. The CPU was not the bottleneck here; the GPU was, probably due to its small frame buffer. Playable frame rates with a decent GPU despite the CPU's limited clocks are, all in all, a good result for AMD!
Conclusion
You can buy two Opteron 4180 CPUs for $420, an Asus KCMA-D8 for $290 and 4x 4GB of compatible RAM for another $120 on newegg.com. Compare this to $1000 for an Intel Core i7 980X, $200 for a decent X58 motherboard and another $90 for 3x 4GB of RAM. The AMD system costs about $460 less and gives you the same performance in multithreaded applications.
The only downside is its single-threaded performance. Thuban cores are known to reach 4GHz easily, so overclocking could be an option here: an overclocked Phenom II X6 can definitely give an AMD FX a run for its money, and the same should apply here. If this system could be overclocked in any way, it should give even the Sandy Bridge-E CPUs a hard time.
Another plus point of the platform is that regular desktop RAM is also supported, so you don't have to spend extra on server-specific registered DIMMs, which also add latency! All in all, this is a great platform for anybody looking for a decent workstation. If overclocked, it would make an excellent gaming rig too, with lots of options!
The lack of overclocking options in the BIOS was expected, as this is a server board, and it makes overclocking difficult if it is possible at all. I would also have liked to see heatsinks on the power FETs, but the board lacks them.
Overall the platform is very decent, and its performance is right up there with the Core i7 980X. I wonder why AMD is not bringing this to its desktop line: it would give AMD a fighting chance against Intel in the enthusiast market, performing better than Intel in multithreaded apps and offering decent, if not better, single-threaded performance with slightly higher, desktop-like CPU clocks! Next time you're in the market for a high-end rig, keep this platform right at the top of your list!
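The price comparison is simple enough to check by hand, but here it is spelled out using only the prices quoted above (newegg.com prices at the time of writing; they will have changed since).

```python
AMD_BUILD = {"2x Opteron 4180": 420, "Asus KCMA-D8": 290, "4x 4GB RAM": 120}
INTEL_BUILD = {"Core i7 980X": 1000, "X58 motherboard": 200, "3x 4GB RAM": 90}


def total(build: dict[str, int]) -> int:
    """Sum of component prices in US dollars."""
    return sum(build.values())


if __name__ == "__main__":
    amd, intel = total(AMD_BUILD), total(INTEL_BUILD)
    print(f"AMD ${amd}, Intel ${intel}, difference ${intel - amd}")
```

That works out to $830 against $1290, a $460 saving for the AMD platform.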
Pros -
1. Excellent upgrade path.
2. All components of very high quality.
3. Support for 1333MHz desktop DIMMs.
4. Enthusiast-class performance for much less.
5. 4-GPU CrossFireX at x16/x16/x16/x16 on dual-SR5690 boards such as the TYAN S8225.
6. Great value for money.
Cons -
1. Lack of overclocking/desktop features in the motherboard BIOS.
2. Lack of heatsinks on the power FETs of this particular Asus board.
3. Low clock speeds result in weak single-threaded performance.
P.S. Thanks to ico[TDF] for helping with this review!
Introduction
To begin with - this is not the review of a server’s performance in enterprise applications. This review in fact considers the machine as a high-end desktop rig rather than a server.
Here, we have a dual AMD Opteron based (AMD QuadFX-like) system also known as the AMD Lisbon Server platform with us – we shall be taking it through a bunch of tests and comparing it to an Intel Gulftown based system; now before we do that, let’s turn the time machine to July of 2006.
July 2006
AMD and Intel both have been trading blows for quite a while now, although since the introduction of AMD’s Kryptonite10 architecture; AMD has for the most part has been pushed into the low and mid end CPU arena while Intel has conveniently dominated the higher end with their enthusiast class Extreme Edition CPU’s.
Although this was not the case always – AMD, after the introduction of their K8 architecture based Opteron’s in April 2003 and Athlon64 & Athlon64 FX following shortly were dominating Intel in the CPU performance arena. The Athlon64 and Athlon64 FX were a big hit then with new features like AMD64/x86-64, Hypertransport Technology (HTT) and an Integrated Memory Controller (IMC) which Intel did not have at the time. Intel CPU’s at the time were simply slower, offered less performance-per-watt and lacked support for 64 bit vs. AMD Athlon64 and Athlon64 FX CPU’s.
This forced Intel to license and implement the AMD64 Instruction set in their processors as their 64bit strategy had crashed & burned on all possible fronts with their HP/Intel Itanium processors which implemented Intel’s IA64 Instruction Set. Following this, the first Intel processors with AMD’s 64bit instruction set still wouldn’t be out until next year, in February of 2004. The Integrated Memory Controller (IMC) & Quick Path Interconnect (Intel implemented QPI in place of HTT.) would not come to Intel CPU’s till Nov, 14 2008 either. In 2007 Intel bought the rights to DEC Alpha 21x64 designs which are rumored to be the reference for Intel Nehalem’s IMC.
Interestingly, the CPU design team-lead for AMD K8 architecture was also the co-architect for DEC Alpha 21x64, Mr. Derrick R. Meyer; he later went on to become the President & CEO of AMD.
On July 27, 2006 all of this changed with the launch of Intel’s new “Core 2†architecture. That was the day Intel took the crown for CPU performance from AMD and AMD has not been able to catch-up since!! The new CPU’s from Intel delivered better performance and lower power consumption while AMD was still selling their older K8 based CPU’s.
At that time the next upgrade from AMD, the K10 architecture was still sometime away. In another blow to AMD, following the launch of Intel “Core 2â€, Intel also launched their first Quad Core x86-processor – The Intel Core 2 Extreme QX6700 later in the year, in November.
At that time AMD had no plans to launch a Quad Core x86 processor anytime soon although they also wanted to refrain from losing the huge army of fan following they had, so in order to stay competitive till their new architecture (AMD K10/AMD Phenom) and the quad core processors, arrived – They launched the AMD 4x4 platform right behind the Intel Core 2 Extreme QX6700, AMD 4x4 later got renamed to the AMD QuadFX platform!
The AMD QuadFX platform consisted of a dual processor desktop board and two Athlon FX dual core CPU’s to make for a total of 4(quad) cores in two CPU’s. AMD QuadFX platform enabled AMD to compete with Intel’s quad core offering at the time.
Here is a how an AMD QuadFX platform looked like in 2006 –
(AMD Quad FX – Dual Athlon 64 FX on a Socket F Asus L1N64-SLI WS.)
The successor to the QuadFX platform would have been the AMD FASN8 (First AMD Silicon Next-gen 8-core) platform, which was supposed to accommodate two AMD Phenom Quad Cores in Socket F boards to make for an 8core platform, FASN8 was cancelled later-on, and the successor to the FASN8 platform would have been the dual AMD Opteron rig that is being reviewed today!
Back to Now...
As we all know, AMD Kryptonte10 architecture failed to take the technology lead back from Intel and later with the release of Intel’s Nehalem architecture, AMD was made irrelevant altogether. Lately AMD’s K10.5/Phenom II’s had been holding it off till the arrival of Bulldozer Core – AMD’s much hyped CPU architecture which was released 30months behind the original schedule which proved to be a disappointing failure.
I won’t be talking about Bulldozer here but... but was AMD NOT-COMPETITIVE in the enthusiast space with K10.5??
The answer is, it could have been competitive in the enthusiast CPU space if it had continued to pitch its physical cores vs. Intel’s logical cores!!
Here is why…
While Intel strongly believed in Hyper-threading technology and bottlenecking a CPU core to churn out that 15% - 30% higher performance, AMD didn’t. AMD believed in providing threads with dedicated cores and that is exactly what we saw happening in the market – Athlon II X4’s with 4 physical cores were competing with Nehalem/Westmere based Core i3’s with 2Physical & 2Virtual cores, and were beating them at a lower price point.
Although as we move higher up the order Intel had their Core i7 Quad Core CPU’s which supported hyper-threading and had a total of 8 logical (4Physical+4Virtual) CPU cores but AMD has had nothing for crunching 8 threads(if-partly), at a time. The Bloomfield Core i7’s from Intel were countered by AMD to some extent with their Six Core Phenom processors, which traded blows with them at a higher clock speed. Moving higher up still, Intel had the Core i7 980X. The undisputed champion of processors, a six core monstrosity, with hyper-threading enabled to give a total of 12 logical cores. AMD just couldn’t compete in this space as its own six core processors could only fend off Bloomfield Quad Core’s from Intel with 8 logical cores!!
Now, AMD did not have any processors capable of crunching 8 threads to compete against the Bloomfield Core i7’s and it certainly did NOT have anything capable of crunching 12 threads to compete with the Gulftown Core i7’s. This led to Intel beating it in the enthusiast CPU space and pushing it into the low-mid end “value-for-money†CPU arena, to stay competitive. So while AMD claimed that having physical cores is better (and they were right) than having logical cores, they had no products in their line-up to compete against Intel CPU’s with 8 and 12 logical cores.
Things with Sandybridge have only gotten better for Intel with AMD being pushed further down the order. Now AMD Athlon II X4 competes vs. Intel Pentium G860 in place of Core i3 5xx, if you are to believe tom’s hardware.
I hope this clears the point behind comparing a 12 (physical) core AMD rig to a 12 (logical) core Intel rig. Although, I wonder why AMD killed the FASN8 platform?! If that platform were here today – AMD would have been much more competitive vs. Intel, competing even in the enthusiast CPU-space!!!
Anyways, moving on with the Review – If there ever would have been an AMD 12 x 12 platforms the successor to AMD FASN8, It would have been this –
The AMD Lisbon Platform
The AMD “Lisbon” platform consists of up to 2 AMD Opteron 4000-series processors paired with an AMD SR56x0 northbridge (NB), which is connected to an AMD SP5100 southbridge (SB). The platform is much the same as the AMD QuadFX platform. The processors go into AMD's Socket C32, which has the same physical form as the Socket F used by the QuadFX platform (the two are not interchangeable): both are flip-chip LGA sockets with 1207 pins.
As with most AMD sockets, the Bulldozer-based Opteron 4200 series is drop-in compatible with the current C32 socket of the Lisbon platform.
There are 3 NB configurations available for the platform –
1. SR5690 – 42 PCIe Lanes / 11 Engines – Similar to AMD 990FX desktop chipset.
2. SR5670 – 30 PCIe Lanes / 9 Engines – Similar to AMD 990X desktop chipset.
3. SR5650 – 22 PCIe Lanes / 8 Engines – Similar to AMD 970 desktop chipset.
Platform Operating System Support –
Windows Vista, Windows® XP and XPe, Windows 7, Embedded Linux, Windows Server 2003 and Windows Server 2008.
The Processors – 2x AMD Opteron 4180
The Opteron 4180 Configuration – The AMD Opteron 4180 has a core clock speed of 2600MHz, its IMC runs at 2200MHz and its HT links run at 3200MHz, which is effectively 6400MT/s as HT is a DDR bus; this is quite good from a server's point of view. The processor is a 6-core part with ~10MB of cache, a TDP of 95W and a Vcore of 1.26V.
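To put the HT figure in perspective, here is a rough sketch of the arithmetic behind it. It assumes a 16-bit HyperTransport link, which the specs above don't state, and the helper name is made up for illustration:

```python
def ht_link_bandwidth_gbs(clock_mhz: float, width_bits: int = 16) -> float:
    """Per-direction HyperTransport bandwidth in GB/s (decimal units).

    HT is a DDR bus, so it moves data on both clock edges: a 3200MHz
    link clock gives 6400 million transfers per second.
    """
    transfers_per_s = clock_mhz * 1e6 * 2   # DDR: two transfers per clock
    bytes_per_transfer = width_bits / 8     # assumed 16-bit link -> 2 bytes
    return transfers_per_s * bytes_per_transfer / 1e9


per_direction = ht_link_bandwidth_gbs(3200)
aggregate = 2 * per_direction               # HT links carry traffic both ways at once
print(f"{per_direction:.1f} GB/s per direction, {aggregate:.1f} GB/s aggregate")
```

Under that assumption the link works out to 12.8 GB/s each way, or 25.6 GB/s in total.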
The CPU is basically an underclocked and undervolted Thuban six-core, i.e. a Phenom II X6. Being an Opteron part, it has a few extra server-specific features that the Phenom II series lacks, but other than that it is the same as a Phenom II X6.
From a server point of view, the Opteron 4100 series is designed to provide the highest performance-per-watt and a good core count at a very low cost. Hence virtualization, rendering and similar workloads are what you would want to throw at these processors.
From a desktop point of view, though, its clocks are a little lower than you would want, but still acceptable. If you need higher clocks, there is also an Opteron 4184, a 6-core part with the same 2.8GHz clock speed as a desktop Phenom II X6 1055T!!
There are also a number of 42xx models based on the Bulldozer architecture, available as 4-core, 6-core and 8-core parts. They clock higher and have more cache than the K10.5-based 4100-series Opterons, although I think it's better to stick with the Opteron 4100s, as they have a higher IPC (instructions per clock).
The Flip-Chip LGA 1207 socket C32 for AMD processors –
The Motherboard – Asus KCMA-D8
The Package –
The Asus board is Rock Solid!
6 SATA Cables, a back plate, 2 Temperature Sensors, a manual, driver CD were all included with the motherboard.
The Asus KCMA-D8 is a dual-processor Socket C32 motherboard with an AMD SR5670 NB paired with an AMD SP5100 SB. Its 30 PCIe lanes allow dual-GPU CrossFireX in an x16 + x8 configuration, which is in line with a saner setup. If you find this limiting in any way, you can go for a TYAN S8225 motherboard with 2 x SR5690 NBs. That monster of a board supports 4-GPU CrossFireX with all four x16 slots running at full x16!!
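The slot configurations above come down to simple lane arithmetic. The sketch below checks a set of slot widths against a northbridge's lane budget; the helper is hypothetical, and it ignores the lanes a real board may also spend on onboard controllers:

```python
from typing import List, Tuple


def lane_budget(slot_widths: List[int], lanes_available: int) -> Tuple[bool, int]:
    """Return (fits, spare_lanes) for a proposed set of PCIe slot widths."""
    used = sum(slot_widths)
    return used <= lanes_available, lanes_available - used


SR5670_LANES = 30       # the KCMA-D8's northbridge
LARGEST_NB_LANES = 42   # largest single northbridge in the chipset list

print(lane_budget([16, 8], SR5670_LANES))               # (True, 6)
print(lane_budget([16, 16], SR5670_LANES))              # (False, -2)
print(lane_budget([16] * 4, LARGEST_NB_LANES))          # (False, -22)
print(lane_budget([16] * 4, 2 * LARGEST_NB_LANES))      # (True, 20)
```

Four full x16 slots need 64 lanes, more than any single northbridge in the list provides, which is why a board like that needs two of them.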
The board is an ATX form-factor motherboard, so compatibility with desktop cabinets is ensured. The KCMA-D8 also works with standard ATX power supplies, so you don't have to buy some special, costly, server-only PSU!!
(A second 12V ATX power connector would have meant more power to the CPUs and more stable operation under full load, but the CPUs are 75W ACP parts anyway, so it wouldn't make much of a difference.)
The board provides 4+1 phase power to each CPU. This could have been 8+2 phase per CPU, but at this point you can't really complain too much: Asus has already done a fantastic job of fitting two processors into an ATX form factor, and you can't really expect more!!
The BIOS –
The BIOS of the KCMA-D8 is OK: it has all the necessary options, not too much to play with but not too little either. All the memory options a ccNUMA system needs are there, and since the KCMA-D8 is technically a server board, there are no overclocking options.
Here are a few snaps –
Asus Boot time logo.
BIOS CPU Configuration.
ccNUMA options – Throughout the review, Bank and Channel Interleaving were enabled while Node Interleaving was disabled. Bank Swizzle was also enabled, but it made no difference as only 4 memory sticks were used, one stick per channel.
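To illustrate what the Node and Channel Interleaving settings change, here is a toy address-mapping sketch. The granularity, the simple modulo mapping and the function name are all assumptions for illustration; real AMD memory controllers use their own mappings:

```python
from typing import Tuple

# Illustrative constants only: real AMD memory controllers use their own
# address mappings and interleave granularities.
LINE = 64                  # assumed interleave granularity: one cache line
CHANNELS_PER_NODE = 2      # two DDR3 channels per Opteron 4180
NODES = 2                  # two CPUs
NODE_SIZE = 8 << 30        # 8 GiB per node (2x 4GB DIMMs per CPU in this build)


def map_address(addr: int, node_interleave: bool) -> Tuple[int, int]:
    """Return the (node, channel) that would serve a physical address."""
    line = addr // LINE
    if node_interleave:
        # Consecutive lines alternate between the two CPUs' memory.
        node = line % NODES
        channel = (line // NODES) % CHANNELS_PER_NODE
    else:
        # NUMA layout: each CPU owns one contiguous block of addresses,
        # and channel interleaving spreads lines across its two channels.
        node = addr // NODE_SIZE
        channel = line % CHANNELS_PER_NODE
    return node, channel


for a in (0, 64, 128, 8 << 30):
    print(hex(a), map_address(a, node_interleave=False), map_address(a, node_interleave=True))
```

With Node Interleaving disabled, a NUMA-aware OS such as Windows 7 can keep a thread's memory on its own CPU, which is why that setting was left off.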
Motherboard hardware monitoring.
The board also supports Promise software RAID for - RAID 0, 1, 5, 10, JBOD modes.
Overall the board was pretty good – all components were of good quality, it was fun & easy to work with too; the only downside to the board was that it didn’t support SATA 3.0 or USB 3.0 and the power FET’s were not passively cooled.
The two CPU fans are in line, so one blows into the other, but the CPUs are 75W ACP parts, so this is not a big issue: both CPUs stayed below 40°C under full load. There are 6 SATA ports (black & red) and 8 SAS ports (blue & red) on the board. The 1st and 3rd slots are the PCIe 2.0 x16 slots, and the board layout accommodates 2 dual-slot GPUs.
The board also has an integrated graphics processor, the ASpeed AST2050. The manual specifies 8MB of video RAM, yet I found a 512MB DDR2 800MHz RAM chip on the board between the IGP and the ASMB4-iKVM module socket.
The Winbond chip is the 512MB of DDR2 memory. No details were given about the ASMB4-iKVM module or the memory chip that would explain why there is a 512MB chip when the IGP uses only 8MB and the ASMB4 module wasn't even plugged in?!
Anyway…
A very interesting piece of trivia: since this is technically a server platform, you might think you need a server CPU HSF compatible with Socket C32. The good news is that you don't necessarily have to go for a Supermicro or other server HSF, which may have availability issues in India and would cost quite a bit more than you'd like to spend. You can use any AM3-compatible HSF with any C32 motherboard, as the mounting pitch of both sockets is exactly the same!!!
I used a pair of CM Hyper 212+ HSF for the system.
Also, a piece of advice, if you ever happen to come across this board – However shiny and inviting the chokes appear – DO NOT TOUCH THEM!!! AND CERTAINLY NOT AFTER THE SYSTEM HAS BEEN UP FOR A HALF HOUR!
The Test System & Benchmarks –
Processors:
2x AMD Opteron 4180
Motherboard:
Asus KCMA-D8
Graphics Card:
Asus ATI Radeon HD 4770 512MB
Memory:
4x 4096MB 1333MHz 9-9-9-24-33 1T Transcend RDIMM’s with ECC.
Hard Drives:
4x 1TB Western Digital Green w/ 64MB cache in RAID 10.
Power Supply:
Seasonic S12 750W 80+ Bronze
Monitor:
AOC 2036Sa
Operating System:
Windows 7 64bit
Software Benchmark:
• 7-Zip
• Blender
• Cinebench R10
• Cinebench 11.5
• SuperPi
• wPrime
• UC Bench 2011
• Geekbench
• Aida64
• SiSoft Sandra
• Resident Evil 5 DX10
• TM Nations
• Devil May Cry 4
NOTE – All the figures for the Intel Core i7 980X have been taken from reputed hardware review sites such as tomshardware.com, anandtech.com, overclockersclub.com, etc. The Intel Core i7 980X figures are published here only for the convenience of the reader! I did not have a Core i7 980X based system at the time of testing.
Also, all the benchmark settings and software versions match those of the Core i7 980X review on tomshardware.com. Some of the software in this review was not part of the tomshardware.com review; the settings and scores for those were matched with a Core i7 980X review from an equally credible website, such as overclockersclub.com.
The AIDA64 benchmark suite compares the test system with a Core i7 990X whose results were built into the software at the time of benchmarking. Hence, in all the AIDA64 benchmarks, the test system is compared with an entry labelled "i7 980X @3.46GHz", which is in fact the Core i7 990X.
The Benchmarks –
7-Zip Benchmark – Compression and decompression rates in MB/s. 7-Zip is everyday file compression software.
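For readers curious how an MB/s compression figure is produced, here is a minimal single-threaded sketch using Python's built-in lzma module (the LZMA algorithm family 7-Zip uses by default). It is not 7-Zip's own benchmark, which is multithreaded and uses its own settings, so the numbers are not comparable; the function name is hypothetical:

```python
import lzma
import time
from typing import Tuple


def lzma_rates(data: bytes, preset: int = 6) -> Tuple[float, float]:
    """Return (compress MB/s, decompress MB/s) for one single-threaded LZMA round trip."""
    t0 = time.perf_counter()
    packed = lzma.compress(data, preset=preset)
    t1 = time.perf_counter()
    unpacked = lzma.decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:                      # LZMA is lossless; anything else is a bug
        raise RuntimeError("round trip failed")
    mb = len(data) / 1e6
    # Guard against a zero interval on very small inputs.
    return mb / max(t1 - t0, 1e-9), mb / max(t2 - t1, 1e-9)


sample = b"The quick brown fox jumps over the lazy dog. " * 50_000  # ~2.25 MB
comp, decomp = lzma_rates(sample)
print(f"compress {comp:.1f} MB/s, decompress {decomp:.1f} MB/s")
```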
Blender Benchmark – Blender is free and open-source 3D modeling, rendering and compositing software. Its render benchmark is multithreaded and very efficient at utilizing CPU resources.
Interesting to note – a recent Tom's Hardware review of Intel's Sandy Bridge-E Core i7s shows the Blender benchmark completed in 42 seconds by a Core i7 3960X @ 3.3GHz, in 38 seconds by a Core i7 3820 @ 4.625GHz and in 34 seconds by a Core i7 3930K @ 4.5GHz.
Dual Opteron systems seem to perform very nicely in rendering applications despite their lower clock speeds.
Cinebench R10 Benchmark – Cinebench is another rendering test suite, from Maxon, for stress testing the CPU and GPU compute capabilities of a system.
Despite the 733MHz lower clock speed, the Opteron based machine is matching the performance of an i7 980X.
Cinebench R11.5 Benchmark – Cinebench 11.5 is the upgrade to Maxon's Cinebench R10 benchmark. It offers tighter integration and a more intense rendering test to stress the CPU's rendering capabilities.
Maxon seems to have made some AMD-friendly optimizations in CB R11.5, as this benchmark favors the AMD system by a small margin.
wPrime 32m & 1024m Benchmark – wPrime is a leading multithreaded benchmark for x86 processors that tests your processor performance by calculating square roots with a recursive call of Newton's method for estimating functions.
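The core idea behind wPrime's workload is the Newton iteration for a square root. The sketch below shows only that per-number step, written as a loop rather than a recursive call; it is not wPrime's code, which also splits millions of such calculations across threads, and the function name is hypothetical:

```python
import math


def newton_sqrt(x: float, rel_tol: float = 1e-12) -> float:
    """Square root of x >= 0 via Newton's method on f(r) = r*r - x."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0
    r = max(x, 1.0)                  # start at or above sqrt(x): iterates then fall monotonically
    while True:
        nxt = 0.5 * (r + x / r)      # r - f(r)/f'(r) simplifies to the average of r and x/r
        if r - nxt <= rel_tol * nxt: # step is negligible: converged
            return nxt
        r = nxt


print(newton_sqrt(2.0), math.sqrt(2.0))
```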
The AMD system gets the better of the Intel system despite running at significantly lower clocks and having a lower IPC. This clearly shows why the AMD system makes more sense than a Core i7 980X: despite the lower IPC and clocks, you get nearly the same performance from both systems, while the pair of Opterons costs nearly $600 less than the Core i7 980X.
UC Bench 2011 Benchmark – It simulates a brute-force search for an archive password (6 characters long). It utilizes all available cores and supports SSE2/SSSE3/SSE4.1.
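A brute-force password search is easy to sketch, and the keyspace arithmetic shows why it scales so well across cores. The charset below (lowercase letters plus digits) is an assumption, as UC Bench's is not stated, and SHA-256 merely stands in for the real archive password check; both function names are hypothetical:

```python
import hashlib
import itertools
from typing import Optional


def keyspace(charset_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return charset_size ** length


def brute_force(target_hex: str, charset: str, length: int) -> Optional[str]:
    """Try every password of `length` chars; SHA-256 stands in for the real check."""
    for combo in itertools.product(charset, repeat=length):
        candidate = "".join(combo)
        if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
            return candidate
    return None


# Assumed charset: lowercase letters + digits (36 symbols).
print(keyspace(36, 6))  # 2176782336 candidates for a 6-char password

target = hashlib.sha256(b"ab1").hexdigest()
print(brute_force(target, "ab12", 3))  # ab1
```

Because every candidate is independent, the keyspace splits cleanly across cores, for example by giving each worker its own slice of first characters.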
Geekbench 2.0 Benchmark – Geekbench 2.0 is a benchmark that tests CPU and memory performance in an easy-to-use tool. The measure used for comparison is the total suite average score.
The Geekbench 2.0 benchmark suite is made up of multiple benchmarks, both single- and multi-threaded. Due to its 733MHz (22%) lower clock speed, the AMD system falls about 12% behind the i7 980X. At similar clocks, though, it looks like the AMD system would have bettered the Intel system slightly.
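The "similar clocks" argument can be made concrete with a naive clock normalization. This sketch assumes the score scales linearly with clock speed, which real workloads only partly do, and it compares whole systems (12 cores against 6 cores with Hyper-Threading), not per-core IPC:

```python
intel_ghz = 3.333   # Core i7 980X stock clock
amd_ghz = 2.6       # Opteron 4180 stock clock

clock_deficit = (intel_ghz - amd_ghz) / intel_ghz    # ~0.22, the "22% lower clock"
amd_relative_score = 0.88                            # AMD ~12% behind in Geekbench

# Naive per-clock comparison: assumes the score scales linearly with clock speed.
per_clock_ratio = amd_relative_score / (amd_ghz / intel_ghz)
print(f"clock deficit {clock_deficit:.0%}, AMD per-clock ratio {per_clock_ratio:.2f}x")
```

Under that assumption the AMD system would come out roughly 13% ahead clock-for-clock.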
AIDA64 Benchmark Suite
AIDA64 Memory Bandwidth – Memory bandwidth benchmarks (Memory Read, Memory Write, and Memory Copy) measure the maximum achievable memory data transfer bandwidth.
AIDA64 Memory Latency – The AIDA64 Memory Latency benchmark measures the typical delay when the CPU reads data from system memory. Memory latency means the penalty measured from the issuing of the read command until the data arrives in the integer registers of the CPU.
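Latency tests typically use pointer chasing: each read depends on the previous one, so the CPU cannot overlap the misses. Below is a sketch of building such a chain with Sattolo's algorithm, which yields a single random cycle through every slot. Python's own overhead dwarfs DRAM latency, so this only shows the access pattern a native test would time; it is not AIDA64's method, and the names are hypothetical:

```python
import random
from typing import List


def sattolo_cycle(n: int, seed: int = 0) -> List[int]:
    """Random permutation that forms one cycle through all n slots (Sattolo's algorithm)."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)            # j < i, never j == i: guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: List[int], steps: int, start: int = 0) -> int:
    """Follow the chain; every load depends on the result of the previous one."""
    p = start
    for _ in range(steps):
        p = nxt[p]
    return p


chain = sattolo_cycle(1 << 16)
print(chase(chain, 1 << 16))  # 0: a full lap of the single cycle returns to the start
```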
AIDA64 CPU Queen Benchmark – This simple integer benchmark focuses on the branch prediction capabilities and the misprediction penalties of the CPU. It finds the solutions for the classic "Queens problem" on a 10 by 10 sized chessboard. At the same clock speed theoretically the processor with the shorter pipeline and smaller misprediction penalties will attain higher benchmark scores.
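For reference, here is a compact backtracking solver for the Queens problem. AIDA64's implementation isn't public, so this is only an illustration of the kind of branchy, data-dependent code that stresses branch prediction; the function name is hypothetical:

```python
def count_queens(n: int) -> int:
    """Count all placements of n mutually non-attacking queens on an n x n board."""
    full = (1 << n) - 1

    def place(cols: int, diag_l: int, diag_r: int) -> int:
        if cols == full:                 # every column used: one complete solution
            return 1
        total = 0
        free = full & ~(cols | diag_l | diag_r)
        while free:                      # data-dependent branch: hard to predict
            bit = free & -free           # lowest free square in this row
            free ^= bit
            total += place(cols | bit, ((diag_l | bit) << 1) & full, (diag_r | bit) >> 1)
        return total

    return place(0, 0, 0)


print(count_queens(10))  # 724
```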
AIDA64 CPU Photoworxx – This benchmark performs common digital photo processing tasks such as Fill, Flip, Crop, Difference and Color to Black & White on a very large image.
AIDA64 CPU Z-Lib – This integer benchmark measures combined CPU and memory subsystem performance through the public Z-Lib compression library.
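A multithreaded compression test has to hand independent pieces of work to each core. One common approach, sketched here with Python's zlib binding, is to compress fixed-size chunks separately, trading a little compression ratio for parallelism. How AIDA64 actually splits its Z-Lib work isn't documented here, so the chunking and names are assumptions; in CPython a process pool is the safer choice if threads don't overlap:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_chunks(data: bytes, chunk_size: int = 1 << 20, level: int = 6) -> List[bytes]:
    """Split data into independent chunks and compress them in a thread pool."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))


def decompress_chunks(packed: List[bytes]) -> bytes:
    """Inverse of compress_chunks: decompress each chunk and concatenate."""
    return b"".join(zlib.decompress(p) for p in packed)


payload = b"AIDA64 " * 500_000                 # 3.5 MB of repetitive text
packed = compress_chunks(payload)
assert decompress_chunks(packed) == payload
print(len(packed), sum(map(len, packed)) < len(payload))  # 4 True
```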
AIDA64 CPU Hash – This benchmark measures CPU performance using the SHA1 hashing algorithm.
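A single-core SHA1 throughput measurement takes only a few lines with Python's hashlib. This is a sketch, not AIDA64's harness (which runs on all cores with its own buffers); the buffer size and function name are assumptions:

```python
import hashlib
import time


def sha1_mb_per_s(size_mb: int = 64) -> float:
    """Hash a zero-filled buffer once and return the throughput in MB/s."""
    buf = bytes(size_mb * 1_000_000)
    t0 = time.perf_counter()
    hashlib.sha1(buf).digest()
    return size_mb / max(time.perf_counter() - t0, 1e-9)


# Sanity check against the standard FIPS 180 test vector for "abc".
assert hashlib.sha1(b"abc").hexdigest() == "a9993e364706816aba3e25717850c26c9cd0d89d"
print(f"{sha1_mb_per_s():.0f} MB/s on one core")
```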
AIDA64 CPU FPU VP8 – This benchmark measures video compression performance using the Google VP8 (WebM) video codec Version 0.9.5. FPU VP8 test encodes 1280x720 pixel ("HD ready") resolution video frames in 1-pass mode at 8192 kbps bit rate with best quality settings.
AIDA64 CPU FPU Julia – This benchmark measures the single precision (also known as 32-bit) floating-point performance through the computation of several frames of the popular "Julia" fractal.
AIDA64 CPU Mandel – This benchmark measures the double precision (also known as 64-bit) floating-point performance through the computation of several frames of the popular "Mandelbrot" fractal.
AIDA64 CPU FPU SinJulia – This benchmark measures the extended precision (also known as 80-bit) floating-point performance through the computation of a single frame of a modified "Julia" fractal.
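The Julia and Mandelbrot tests share one escape-time loop and differ only in what is held fixed. A minimal sketch follows; Python floats are 64-bit doubles, so it matches the precision of the Mandel test rather than the single-precision Julia or 80-bit SinJulia tests, and the Julia constant is an arbitrary choice:

```python
def escape_time(z: complex, c: complex, max_iter: int = 256) -> int:
    """Iterations of z -> z*z + c before |z| exceeds 2 (max_iter if it never does)."""
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i + 1
    return max_iter


def mandelbrot(c: complex, max_iter: int = 256) -> int:
    return escape_time(0j, c, max_iter)      # Mandelbrot: start at 0, vary c


def julia(z0: complex, c: complex = -0.8 + 0.156j, max_iter: int = 256) -> int:
    return escape_time(z0, c, max_iter)      # Julia: fix c, vary the start point
```

Rendering a frame means calling one of these for every pixel, and since each pixel is independent, the work spreads across all cores.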
SISOFT SANDRA Benchmarks Suite
SANDRA Processor Arithmetic Benchmark – A synthetic test, benchmarking the raw performance potential of an integer core (Dhrystone) & the floating point units (Whetstone) in a CPU.
SANDRA Processor Multimedia Benchmark
SANDRA Processor Cryptography Benchmark – Benchmarks a processor's cryptography performance in AES256 and SHA256. The AMD system lags behind in AES because it lacks the AES instruction set, while the Intel Core i7 980X supports AES-NI for faster cryptography.
SANDRA Memory Bandwidth Benchmark – This benchmark has been tough on AMD's desktop processors, with Intel first using a triple-channel memory controller on its X58 platform and now a quad-channel one on X79.
The dual Opteron system here effectively has quad-channel memory, with 2 channels per CPU. It outperforms the Core i7 980X in this benchmark by a considerable margin despite a 22% lower clock speed. This is a great result for the AMD system.
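The theoretical peaks explain the result. The sketch below assumes both systems run DDR3-1333 (the Intel system's memory speed isn't stated here) and uses the standard 64-bit (8-byte) DDR3 channel; the helper name is hypothetical:

```python
def peak_bandwidth_gbs(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth: transfers/s x 8-byte channel width x channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


ddr3_1333 = 4000 / 3                        # DDR3-1333 is really 1333.33 MT/s
opteron_pair = peak_bandwidth_gbs(ddr3_1333, channels=4)  # 2 CPUs x 2 channels
per_node = peak_bandwidth_gbs(ddr3_1333, channels=2)      # what one CPU sees locally
i7_980x = peak_bandwidth_gbs(ddr3_1333, channels=3)       # assumes DDR3-1333 here too
print(f"{opteron_pair:.1f} / {per_node:.1f} / {i7_980x:.1f} GB/s")
```

Note that the ~42.7 GB/s total only shows up when threads on both CPUs use their local memory; a single thread sees roughly one node's 21.3 GB/s.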
SANDRA Cache & Memory Benchmark – The cache and memory performance of the system is exceptionally good; it again beats the Intel system.
Speed factor –
Memory Latency –
Physical Drive & Random Access Benchmark –
The AMD system used the motherboard's software ("fake") RAID in RAID 10 mode across four 1TB WD Green HDDs with 64MB cache each, and it led the Intel system with more than 2x the transfer speed and lower random access times.
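RAID 10 stripes data across mirrored pairs, which is where the read speedup comes from: sequential reads alternate between pairs, and each block can be read from either disk of its pair. Here is a sketch of the block mapping; the 64KB stripe unit and simple layout are assumptions, as the Promise RAID's actual layout may differ:

```python
from typing import Tuple

STRIPE_BLOCKS = 128        # assumed 64KB stripe unit (128 x 512-byte blocks)


def raid10_map(lba: int, disks: int = 4) -> Tuple[Tuple[int, int], int]:
    """Map a logical block to its mirror pair and the block offset on each disk of that pair."""
    pairs = disks // 2
    stripe = lba // STRIPE_BLOCKS                 # which stripe unit the block falls in
    pair = stripe % pairs                         # stripe units rotate across mirror pairs
    offset = (stripe // pairs) * STRIPE_BLOCKS + lba % STRIPE_BLOCKS
    return (2 * pair, 2 * pair + 1), offset       # both disks of the pair hold a copy


def raid10_capacity_tb(disks: int, disk_tb: float) -> float:
    return disks // 2 * disk_tb                   # half the raw capacity goes to mirrors


print(raid10_map(0), raid10_map(128), raid10_map(256))   # ((0, 1), 0) ((2, 3), 0) ((0, 1), 128)
print(raid10_capacity_tb(4, 1.0))                        # 2.0
```

So the four 1TB drives give 2TB of usable space, with every block stored twice.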
Gaming Benchmarks
Resident Evil 5 DirectX 10.0 Benchmark – At 1360 x 768 resolution, 32-bit color depth and 4x MSAA, the system delivered a stable 60+ fps throughout the game.
As Resident Evil 5 is an 8-threaded game, the higher frame rates were more likely held back by the HD 4770 512MB running into its processing limit or its small frame buffer than by the processors.
TM Nations Forever Benchmark – At 1366x768 resolution and 32-bit color depth with the Very High Quality preset (8x MSAA and 8x AF), the system delivered an average of 39.2fps. Although ~40fps is a very playable frame rate, the CPU usage graph below suggests that this dual-threaded game was bottlenecked by the CPU's low IPC and clock speed.
Devil May Cry 4 Benchmark – At 1600x900 resolution, 32-bit color depth and 4xAA, the game ran smoothly at 60+ fps for the most part. The CPU was not the bottleneck here; the GPU was, probably due to its small frame buffer. Playable frame rates with a decent GPU despite the CPU's limited clocks are, all in all, a good result for AMD!
Conclusion
You can buy 2 Opteron 4180 CPUs for $420, an Asus KCMA-D8 for $290 and 4x4GB sticks of compatible RAM for another $120 on newegg.com. Compare this to $1000 for an Intel Core i7 980X, $200 for a decent X58 motherboard and another $90 for 3x4GB of RAM. You get the AMD system for about $460 less, and it gives you the same performance in multithreaded applications.
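The platform totals, using the Newegg prices quoted above, are a quick sum:

```python
amd_build = {"2x Opteron 4180": 420, "Asus KCMA-D8": 290, "4x 4GB RAM": 120}
intel_build = {"Core i7 980X": 1000, "X58 motherboard": 200, "3x 4GB RAM": 90}

amd_total = sum(amd_build.values())
intel_total = sum(intel_build.values())
print(amd_total, intel_total, intel_total - amd_total)   # 830 1290 460
```

On the CPUs alone the gap is wider: $1000 against $420, a difference of $580.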
The only downside is its single-threaded performance. However, Thuban cores are known to reach 4GHz easily, so overclocking the system could be an option here. An overclocked Phenom II X6 can definitely give an AMD FX a run for its money, and the same should apply here too!! If this system could be overclocked in any way, it should give even the Sandy Bridge-E CPUs a hard time.
Another plus point of the platform is that regular desktop RAM is also supported, so you don't have to spend extra on server-specific DIMMs, which can also reduce performance in a PC!! All in all, this is a great platform for anybody looking for a decent workstation. If it could be overclocked, it would make an excellent gaming rig too, with lots of options!!!
The lack of overclocking options in the BIOS was expected, as this is a server board, and it certainly makes overclocking difficult, if it is possible at all. I would also have liked to see heatsinks on the power FETs, but the board lacks them.
Overall the platform is very decent, with performance right up there with the Core i7 980X. I wonder why AMD isn't bringing this to its desktop line-up: it would give AMD a fighting chance against Intel in the enthusiast market, performing better than Intel in multithreaded apps and offering decent, if not better, single-threaded performance with slightly higher, desktop-like CPU clocks!!! Next time you're in the market for a high-end rig, keep this platform right at the top of your list!!!
Pros: -
1. Excellent Upgrade path.
2. All components of very high quality.
3. Support for 1333MHz desktop DIMMs.
4. Enthusiast class performance for much less.
5. Platform support for 4-GPU x16 + x16 + x16 + x16 CrossFireX (on dual-NB boards such as the TYAN S8225).
6. Great value for money.
Cons -
1. Lack of overclocking/desktop features in the motherboard BIOS.
2. Lack of heatsinks on the power FETs of this particular Asus board.
3. Low clock speeds result in weak single-threaded performance.
P.S. Thanks to ico[TDF] for helping with this review!