This article was posted on AnandTech, but they decided to take it down due to angry fanboys and Sony/MS.
http://www.ansonwilson.com/anandreview.htm
IBM’s pitch to Microsoft was based on the peak theoretical floating point performance-per-dollar that the Xenon CPU would offer, and given Microsoft’s focus on cost savings with the Xbox 360, they took the bait.
While Microsoft and Sony have been childishly playing this FLOPS war, comparing the 1 TFLOPS processing power of the Xenon CPU to the 2 TFLOPS processing power of the Cell, the real-world performance war has already been lost.
Right now, from what we’ve heard, the real-world performance of the Xenon CPU is about twice that of the 733MHz processor in the first Xbox. Considering that this CPU is supposed to power the Xbox 360 for the next 4-5 years, it’s nothing short of disappointing. To put it in perspective, floating point multiplies are apparently 1/3 as fast on Xenon as on a Pentium 4.
The reason for the poor performance? The very narrow 2-issue in-order core also happens to be very deeply pipelined, apparently with a branch predictor that’s not the best in the business. In the end, you get what you pay for, and with such a small core, it’s no surprise that performance isn’t anywhere near the Athlon 64 or Pentium 4 class.
The Cell processor doesn’t get off the hook just because it only uses a single one of these horribly slow cores; the SPE array ends up being fairly useless in the majority of situations, making it little more than a waste of die space.
We mentioned before that collision detection can be accelerated on the SPEs of Cell, despite being fairly branch heavy. The lack of a branch predictor in the SPEs apparently isn’t that big of a deal, since most collision detection branches are basically random and can’t be predicted even with the best branch predictor. So not having a branch predictor doesn’t hurt; what does hurt, however, is the very small amount of local memory available to each SPE. In order to access main memory, the SPE places a DMA request on the bus (or the PPE can initiate the DMA request) and waits for it to be fulfilled. According to those who have had experience with the PS3 development kits, this access takes far too long to be used in many real-world scenarios. It is the small amount of local memory that each SPE has access to that keeps the SPEs from being able to work on more than a handful of tasks. While physics acceleration is an important one, there are many more tasks that can’t be accelerated by the SPEs because of the memory limitation.
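To make the local memory point a bit more concrete, here is a rough sketch in plain C of what "stream everything through a small local buffer" looks like. This is not Cell SDK code and not from the article: `LOCAL_STORE_FLOATS`, `fake_dma_get` and `sum_streamed` are made-up names, the buffer size is arbitrary, and a blocking `memcpy` stands in for the DMA request, so it shows only the access pattern, not real latencies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an SPE's local memory: a small fixed buffer.
   The size is an arbitrary illustration value, not a hardware figure. */
#define LOCAL_STORE_FLOATS 1024

/* Stand-in for a DMA request that pulls data from main memory into the
   local buffer. Here it is just a blocking memcpy (an assumption made
   to keep the sketch portable). */
static void fake_dma_get(float *local, const float *main_mem, size_t count)
{
    memcpy(local, main_mem, count * sizeof(float));
}

/* Sum an array that lives in "main memory" by streaming it through the
   small local buffer, one chunk per transfer. The number of transfers
   is written to *transfers. */
static float sum_streamed(const float *main_mem, size_t n, size_t *transfers)
{
    float local[LOCAL_STORE_FLOATS];
    float total = 0.0f;
    size_t done = 0;

    assert(main_mem != NULL && transfers != NULL);
    *transfers = 0;

    while (done < n) {
        /* Never request more than fits in the local buffer. */
        size_t chunk = n - done;
        if (chunk > LOCAL_STORE_FLOATS)
            chunk = LOCAL_STORE_FLOATS;

        /* Every chunk costs one round trip to main memory. */
        fake_dma_get(local, main_mem + done, chunk);
        (*transfers)++;

        /* Compute only touches the local copy. */
        for (size_t i = 0; i < chunk; i++)
            total += local[i];

        done += chunk;
    }
    return total;
}
```

Calling `sum_streamed` on a 2500-element array with this buffer size takes three transfers (1024 + 1024 + 452 elements). A linear scan like this chunks up cleanly; the trouble the article describes is with work whose data set does not fit in the buffer and cannot be walked in order, where each miss means waiting on another transfer.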
The other point that has been made is that even if you can offload some of the physics calculations to the SPE array, the Cell’s PPE ends up being a pretty big bottleneck thanks to its overall lackluster performance. It’s akin to having an extremely fast GPU but without a fast CPU to pair it up with.
Doesn't sound too good... looks like Carmack was right.