Bulldozer's FPU has an advantage in another area, though, as the presence of two 128-bit FMAC units indicates. FMAC is short for "fused multiply-accumulate," an operation that's sometimes known as FMA, for "fused multiply-add," instead. Whatever you call it, a single operation that joins multiplication with addition is new territory for x86 processors, and it has two main benefits.
The first, pretty straightforwardly, is higher performance. The need to multiply two numbers and then add the result turns out to be very common in graphics and media workloads, and fusing them means the processor can achieve twice the throughput for those operations. We've seen multiply-add instructions in GPUs for ages, which is why each ALU in a GPU shader can produce two ops per clock at peak. With dual 128-bit FMACs, Bulldozer's peak FLOPS throughput should be comparable to Sandy Bridge's peak with AVX and 256-bit vectors.
Second, because an FMA operation feeds the result of the multiply directly into the adder without rounding, the mathematical precision of the result is higher. For this reason, the DirectX 11 generation of GPUs adopted FMA as their new standard, as well.