User Review Gaming GPUs, beyond just gaming...


Prologue

The GPU, as we know it today, has evolved by leaps and bounds since its early days. The first graphics chip marketed under the term GPU was the nVidia GeForce 256. Today, several generations later, the GPU has become an integral part of most computer systems, and then some. While most of us know the GPU as a must-have part for gaming, a few years ago nVidia decided to push the GPU as general-purpose processing hardware, beyond its primary task of gaming. It made sense, because GPUs have immense parallel processing power that can be used for things other than gaming. Obviously, that’s easier said than done, because tapping into this power is a huge task from a developer’s standpoint. Unlike gaming, where developers can use standardized system-level APIs (DirectX and OpenGL), there was no standardized API for GPGPU (General-Purpose computing on Graphics Processing Units) at the time, and nVidia’s proposed CUDA (Compute Unified Device Architecture) framework was proprietary, leaving ATi GPUs out of all the GPGPU fun.

Initial CUDA apps offered very little for everyday consumers, except GPU-accelerated video encoding. Even then, and for that matter even today, GPU-accelerated video encoding only offers speed; it cannot match, let alone surpass, the quality offered by pure CPU/x86 video encoding.

All that is set to change with the introduction of OpenCL (Open Computing Language), initially developed by Apple and currently maintained and further developed by the Khronos Group. It has been widely adopted and refined by companies like AMD/ATi, Intel, nVidia and ARM. Thanks to its heterogeneous and non-proprietary nature, OpenCL is the most promising technology in the GPGPU arena. Adobe had initially adopted the CUDA framework to provide GPU acceleration in its Creative Suite applications like Premiere, where the gains were significant. However, due to the proprietary nature of CUDA, only selected nVidia GPUs were able to offer these performance gains. With the advent of OpenCL, though, the latest versions of the CS applications use OpenCL to provide acceleration on various GPUs. These GPU acceleration features are no longer limited to Premiere; even Photoshop has various features that take advantage of GPU acceleration. While there are many video encoding applications that support various GPU encoding interfaces (CUDA, AVIVO/ATi APP and Intel QuickSync), they are not really ideal for enthusiasts seeking high-quality video encoding comparable to what a good H.264 encoder such as x264 provides. Currently there is an experimental build of Handbrake (a popular x264 video encoding app) available which uses a combination of OpenCL and CPU-based encoding to speed up the encoding process.

So this exciting development in the GPGPU arena raises the question: how much do these GPUs really help in a work environment? Do they offer enough gains, and if so, do these gains warrant consideration in your GPU shopping decision? Let’s find out…

-------------------------------------------------------------------------------------------------------------------------------------

Intro

Now, unlike the usual 3DMark and synthetic benchmarks, there is hardly any material online showing how much these GPUs help in work applications, or even in 2D performance for that matter. To find out how much difference today’s GPUs bring to the table vs. pure CPU processing power, I ran a couple of tests. Before we proceed though, let me clear up a few things.

  1. These tests are minimal and currently limited to the applications that I use on a very regular basis.
  2. The GPUs featured in this test are limited in number, as these were the only GPUs I could get my hands on at the time of writing. This is not a competitive review, but rather a small attempt at writing an informative piece.
  3. OpenCL support is growing, but at this point very few commercial applications offer any definitive support. In the future, I hope that more and more applications will take advantage of the massive GPU power and prompt mainstream media to take note and include more GPGPU benchmarks in reviews.
With that cleared up, let’s move ahead with the test methodology and setup.

-------------------------------------------------------------------------------------------------------------------------------------

Test setup & methodology
First, the contenders:

  1. Intel HD4000 (specs)
  2. AMD Radeon HD6950 1GB GDDR5 (specs)
  3. AMD Radeon HD7850 2GB GDDR5 (specs)
  4. AMD Radeon HD7970 3GB GDDR5 (specs)
  5. nVidia GeForce GTX680 2GB GDDR5 (specs)
Note: All GPUs except the HD6950 and HD7850 use non-reference coolers; however, all GPUs and their respective memories are running at reference clock speeds.

Test Setup:

  • Intel Core i5 3570k @ 4.2GHz
  • Corsair Vengeance DDR3 RAM 8GB @ 1600MHz
  • Asus Maximus V Gene motherboard
  • Intel SSD 530 series 180GB (Primary OS Drive)
  • Samsung SSD 840 series 120GB (Test Data Drive)
  • Corsair TX-650 PSU
  • OS: Windows 7 x64 SP1 (fresh install with all latest updates installed)
  • Drivers:
    • For Intel HD4000 – HD Graphic Drivers v15.31.9.64.3165 and Intel OpenCL SDK 2013 (OCL 1.2)
    • For AMD Radeon – Catalyst 13.6 (current stable) with AMD OpenCL APP driver (OCL 1.2)
    • For nVidia GeForce – Forceware 340.49 (current stable) with OpenCL 1.1 driver
Test Methodology:

We will be using two applications for real-world testing:

  • Photoshop CS6 (v. 13.0.1 x64)
  • Handbrake (build SVN5466 with OpenCL support)
Three applications for pseudo-synthetic testing:
  • Toms 2D GDI benchmark
    (for testing 2D performance of windows GDI functions, responsible for most of the interface/text drawing of windows)
  • CLBenchmark 1.1.3 (OpenCL benchmark application)
  • DirectComputeBenchmark 0.45b (DirectCompute and OpenCL benchmark application)
For real-world testing in Photoshop, I designed a benchmark using Photoshop actions, where a series of Photoshop functions supporting GPU acceleration is run on three different images. This test is divided into two parts. The first part covers GPU-exclusive functions in Photoshop, which run only when GPU acceleration is enabled. The second part mixes GPU-exclusive features with functions that do not rely exclusively on the GPU but can use GPU acceleration, when enabled, to speed up processing. Each part is executed on three images: a 12MP JPG, a 21MP JPG and a 12MP RAW file. The 12MP RAW file is identical to the 12MP JPG, except that it is a 300DPI/16-bit image with significantly more color information. This will help us test whether that makes any big difference in image processing. For your information, the following functions are used in the GPU-exclusive and GPU-assisted tests.

GPU Exclusive Features Test
  • Adaptive Wide Angle
  • Oil Paint
  • Lighting Effects
GPU Assisted Features Test
  • Adaptive Wide Angle
  • Oil Paint
  • Liquify (preset mesh)
  • Field Blur
  • Iris Blur
  • Tilt Shift
  • Lighting Effects
Note: To maintain computing integrity, each blur filter was paired with a reset action, which restores the image so that the next blur filter computes on the original data. Also, the Photoshop CS6 trial version was used for testing; however, it has no known limitations in features or processing performance compared to the licensed version.
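
The reset step in the note above can be pictured with a minimal Python sketch. This is not how Photoshop actions work internally: the two filter functions and the `run_with_reset` helper are hypothetical stand-ins, and only the idea of handing every filter an untouched copy of the source comes from the text.

```python
import copy
import time


def run_with_reset(original, filters):
    """Time each filter on a fresh copy of the original image data."""
    timings = {}
    for name, apply_filter in filters.items():
        working = copy.deepcopy(original)  # the "reset": discard earlier edits
        start = time.perf_counter()
        apply_filter(working)              # the filter mutates only the copy
        timings[name] = time.perf_counter() - start
    return timings


# Hypothetical stand-ins for blur filters; each one modifies pixels in place.
def fake_field_blur(pixels):
    for i in range(len(pixels)):
        pixels[i] = pixels[i] // 2


def fake_iris_blur(pixels):
    for i in range(len(pixels)):
        pixels[i] = pixels[i] + 1


original = [10, 20, 30, 40]  # placeholder "image"
timings = run_with_reset(
    original, {"field": fake_field_blur, "iris": fake_iris_blur}
)
```

Without the copy, the second filter would be timed on already-blurred data, so its number would not be comparable across GPUs.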

For Handbrake, I created a preset for the x264 profile with OpenCL acceleration enabled. To ensure that the proper computing operations are used, the output was resized to 720p with the following settings:

  • Output:
    • Dimensions - 1280x720
    • Format – MPEG4/AVC-1, MP4 container
  • Filters:
    • Detelecine – Off
    • Decomb – Default
    • Deinterlace – Off
    • Denoise – Off
    • Deblock – Off
  • Video Settings:
    • Codec – H.264 (x264)
    • Profile – High, Level 4.1
    • Framerate – same as source, variable
    • Quality – Average Bitrate, 2048kbps, 2-Pass encoding, Turbo first pass
    • x264 Preset – Very Slow
    • x264 Tune – None, Fast Decode checked
  • Audio Settings:
    • Set to auto-passthrough
      (since audio encoding is not GPU accelerated it was set to passthrough without any encoding)
The source file was a 1:42-long 1080p clip (MKV container, H.264 video at 40Mbps, Dolby AC3 stereo audio at 448kbps).

For the pseudo-synthetic benchmarks, each benchmark is run three times and the final value is the average of the three runs.
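
As a small illustration of the three-run averaging, here is a hedged Python sketch. The `summarize_runs` helper and the scores are placeholders, not measured results; the spread is an extra I added because it makes an inconsistent run easy to spot.

```python
from statistics import mean


def summarize_runs(run_scores):
    """Return (average, spread) for the scores of three repeated runs."""
    if len(run_scores) != 3:
        raise ValueError("expected exactly three runs")
    spread = max(run_scores) - min(run_scores)  # how far apart the runs are
    return mean(run_scores), spread


# Placeholder scores, NOT values taken from the charts.
example_runs = [1000, 1010, 990]
final_score, spread = summarize_runs(example_runs)
```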

All the test data (images for PS manipulation, GPU feature presets in Photoshop and source video files) is loaded from the Samsung SSD 840 to ensure that disk I/O does not affect the performance numbers.

So let’s get cracking on the results then, shall we?


-------------------------------------------------------------------------------------------------------------------------------------

Test Results / Analysis

Before we do some real-world tests, let’s put these contenders through some benchmark tools to see how they stack up against each other.
Toms 2D GDI benchmark

One of the most ignored and underrated aspects of a modern GPU is its 2D performance, because marketing always wants to show the latest 3DMark numbers and record achievements. Unfortunately, this also pressures the engineering teams, both software and hardware, to concentrate on 3D performance. A couple of years ago, things were so bad that the 2D performance of almost every available GPU was absolutely rubbish. Fortunately, someone at Tom’s Hardware noticed this, built a simple GDI benchmark to test the 2D performance of the cards, and made sure that the results were shown to the GPU makers. Both nVidia and AMD/ATi responded positively and promised to bring 2D performance up via driver updates. Let’s take a look at the results of Toms 2D GDI benchmark and then we’ll talk about why it matters.

Chart-01-Toms2D.png

Looking at these numbers, the 2D performance of all these cards appears to be broadly similar. But let me give you some context. Back when I read the Tom’s Hardware article about 2D performance, I was using a Radeon HD5770. I monitored the 2D performance of the HD5770 from Catalyst 10.2 through 11.5b. On Catalyst 10.2 the HD5770 managed to score only 981 2D points, while my previous GPU, the HD3870, was able to score 1770. By 11.5b, ATi had delivered on its promise to bring 2D performance up, and the HD5770 scored 1620 2D points. Somewhere around this point I upgraded to the HD6950, and on Catalyst 11.8 the HD6950 could only best the HD5770 by roughly 250 points, at 1866. Compared to those numbers, the current standings certainly look better. The highest I was able to achieve was 3025 2D points on the HD6950 with the Catalyst 13.4 release, but even with multiple clean driver installs I wasn’t able to replicate that result for this benchmark. ATi drivers, as always, continue to be finicky. Look at the results, for example: the HD6950 is still better at 2D than the latest ATi offerings, bested only by the GTX680. Even the lowly Intel IGP manages to take the third spot here. ATi’s monster HD7970 is completely shamed by landing in last place. I don’t know if it’s the drivers or the hardware itself, but the results are pretty much consistent with this positioning even with beta drivers (I haven’t included beta driver results here since… well, it’s called beta for a reason).

As for why all this matters… 2D/GDI performance dictates most of your day-to-day computing experience. When Microsoft says that the Windows interface is hardware accelerated, it’s not really 3D acceleration they are talking about. All the window interaction and UI elements in Windows are now GPU accelerated with DirectDraw and GDI+, which means that the better your GPU performs 2D operations, the smoother your Windows/application UI experience will be.

CLBenchmark 1.1.3 and DirectComputeBenchmark 0.45b

CLBenchmark is a set of about 17 OpenCL operations, while DirectComputeBenchmark (hereafter referred to as DCB) is a mixed set of DirectCompute and OpenCL benchmarks. You’ll notice that the OpenCL results also include the CPU while the DirectCompute results don’t; that is because DirectCompute is GPU-exclusive, while OpenCL is heterogeneous and can run on the CPU.

Chart-02-CLBenchmark.png

Nothing much surprising here: the HD7970 takes the cake and the HD7850 follows. The GTX680 takes the third spot, with last generation’s HD6950 close on its heels. As you can see, the CPU is in last place, which is not surprising given that the core count of a CPU is much, much lower than the huge number of parallel processors (shaders) on the GPUs. The HD4000 is where it should be: higher than the CPU but still miles away from the dedicated GPUs.

Chart-03-DCB-DC.png

Chart-04-DCB-OCL.png

I’m not sure what’s up with the DCB results, but even with multiple runs (with different driver configurations) the results stayed the same. Surprisingly, the HD7850 leads the HD7970 in DirectCompute operations despite the HD7970’s monstrous power. In the OpenCL results of DCB, the HD6950 slightly edges out the HD7850, while the GTX680 takes a massive lead over the HD7970, the complete opposite of the CLBenchmark picture. Either DCB’s benchmark suite is flawed and in dire need of an update, or it’s strictly OCL 1.1 based, which might be skewing the results for the HD7000 series. The only reason I decided to include these results is to show the difference between CPU and GPU performance in OpenCL.

Now that we have established the OpenCL performance difference in benchmark suites and learned about the 2D performance of current-gen GPUs, let’s move on to our real-world tests to find out how all this extra performance helps us in regular applications.

Photoshop Test – GPU Exclusive features

Let’s take a look at how our contenders perform in the Photoshop GPU-exclusive features test. But before that, please note that these GPU-exclusive features use the GPU for real-time preview and not for applying the manipulations to the image. However, since we have recorded actions and presets for these functions, some GPU processing will still be used.

Chart-05-PS-GPUEX.png

For some reason, the previous-generation Radeon HD6950 beats both the HD7970 and the HD7850. The HD7850 takes the second spot and the GTX680 the third, staying dangerously close to the HD7850. In fact, the HD7850 and GTX680 trade blows and claim marginal victories over each other. The HD4000 is where it should be in comparison.

Photoshop Test – GPU assisted features

While GPU exclusive features mainly use the GPU for real-time preview for image manipulation, GPU assisted features largely use GPU processing for computing and applying end manipulation results to the image.

Also, before you see the results, here is the comparison number for when these features are run without GPU acceleration (CPU only, no GPU-exclusive tests included).

12MP JPG test: 22:44.3

Now let’s see the rest of the test results with GPU acceleration.

Chart-06-PS-GPUASSIST.png

Would you look at that, how the tables have turned. The HD7970 and GTX680 finish very, very close, with the GTX680 edging out the HD7970 by a negligible margin. The HD7850 is again bested by the HD6950 in the overall picture. Although the HD7850 is marginally faster in the 12MP RAW file test, overall the HD6950 is faster than the HD7850 by almost a minute. The HD4000 is right where it should be. But just take a moment to compare the 12MP results with the CPU number above and see how big a difference GPU assist makes in Photoshop.
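
To put that CPU-vs-GPU comparison into a single number, here is a rough Python sketch that turns a "minutes:seconds" time into seconds and computes a speedup factor. The helper names are mine, and the GPU time is a made-up placeholder rather than a value read from the chart; only the 22:44.3 CPU-only time comes from the test above.

```python
def to_seconds(stamp):
    """Convert a 'minutes:seconds' string such as '22:44.3' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def speedup(cpu_time, gpu_time):
    """How many times faster the GPU-assisted run is than the CPU-only run."""
    return to_seconds(cpu_time) / to_seconds(gpu_time)


cpu_only = "22:44.3"          # CPU-only 12MP JPG time quoted above
hypothetical_gpu = "05:41.0"  # placeholder, NOT a measured result

print(f"CPU only: {to_seconds(cpu_only):.1f} s")
print(f"Speedup with the placeholder GPU time: {speedup(cpu_only, hypothetical_gpu):.2f}x")
```

Swap the placeholder for any bar in the chart to get the factor for that card.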

Apart from these features, Photoshop also offers GPU-driven interface features such as interactive zoom, flick panning and OpenGL drawing on the image canvas. Although it is not explicitly documented, GPU load can be observed (using GPU-Z or a similar GPU monitoring utility) during Photoshop’s automerge function for creating panorama images. It’s not clear whether Photoshop uses the GPU to compute the blending of image seams or whether it’s just the OpenGL drawing during preview rendering of the finished image. Suffice to say, thanks to OpenCL, Photoshop brings its GPU-driven features to vendor-agnostic configurations as well as multiple platforms.

Handbrake Test – GPU Assisted encoding

Handbrake is a very popular cross-platform video encoding application in the video-enthusiast community. While Handbrake has a GUI and preset support even for novice users, expert users can easily create their own video encoding settings. The developers of the x264 encoder and of Handbrake have been working on OpenCL support for quite some time now, but as of today there is no stable build of Handbrake with it available to the general public. There is an experimental build with OpenCL support available for developers and testers, for testing the performance of Handbrake.

Unlike most commercial video encoding applications, Handbrake uses neither OpenCL for the actual encoding nor any of the onboard video encoders of current GPUs (QuickSync, nVENC etc.). OpenCL/GPU acceleration in Handbrake/x264 comes in the following three forms:

  • DXVA support for GPU accelerated video decode (decoding the source stream)
  • OpenCL/GPU acceleration for video scaling and color space conversion
  • OpenCL/GPU acceleration of the lookahead function of the x264 encoding process
Since the primary focus of x264 and Handbrake has always been quality, the CPU is still where the main encoding is done, but Handbrake/x264 uses OpenCL/GPU acceleration for other functions where massively parallel processing can take on computing tasks and assist the CPU.

Let’s see how much difference it makes in video encoding…

Chart-07-HB-FPS.png

The first pass is where the source decode, lookahead and motion detection/estimation operations occur, while the second pass is where the actual encoding and disk writing happen. As mentioned in the test methodology, the 1080p source is scaled down to 720p; this is to ensure that the OpenCL operations for video scaling kick in. Also, since we set the x264 preset to “very slow”, more detailed motion detection/estimation should happen, thus engaging OpenCL/GPU in the process. The second pass should be the most compute-intensive, as it stresses the CPU the most, and since we have enabled the OpenCL/GPU option it should also take the GPU’s help. As you can see, the raw power of the HD7970 clearly shows an improvement in Handbrake, while the HD7850 and GTX680 are pretty close to raw CPU operation. The HD6950 is neck and neck with the HD7850 and the CPU in the second pass, while it loses out in the first pass and in overall average FPS. For some reason, though, the HD4000 worsens performance (and, marginally, so does the HD6950 when you look at the average).
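
One way to relate per-pass FPS to total encoding time is sketched below in Python. It assumes both passes process the same number of frames, and the frame count and FPS values are placeholders, not chart data. I am not claiming this is how Handbrake computes its own average; the point is only that the slow pass dominates, so the effective rate sits well below the simple mean of the two passes.

```python
def two_pass_seconds(frames, fps_pass1, fps_pass2):
    """Total wall-clock time when both passes process the same frames."""
    return frames / fps_pass1 + frames / fps_pass2


def effective_fps(frames, fps_pass1, fps_pass2):
    """Frames processed across both passes divided by the total time."""
    return 2 * frames / two_pass_seconds(frames, fps_pass1, fps_pass2)


frames = 2400                  # placeholder frame count
fast_pass, slow_pass = 40.0, 10.0  # placeholder FPS for pass 1 and pass 2

total = two_pass_seconds(frames, fast_pass, slow_pass)
rate = effective_fps(frames, fast_pass, slow_pass)
simple_mean = (fast_pass + slow_pass) / 2  # overstates the real throughput
```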

Chart-08-HB-Time.png

Total encoding time is what ultimately matters, and the picture is pretty clear here. The HD7970 saves almost a minute of encoding time; with longer movie encodes this should translate into significant gains. In second place, the HD7850 brings about 4 seconds of savings. The HD6950 and GTX680 actually worsen performance by 4-5 seconds relative to the CPU alone, while the HD4000 should absolutely be avoided with Handbrake and OpenCL-assisted encoding. Despite the marginal delay caused by the HD6950 and GTX680, remember that on a weaker CPU this could turn the other way around. Bear in mind, though, that Handbrake/x264 OpenCL support is not yet finalized and is still pretty much in an experimental state. Things may change depending on driver enhancements as well as future Handbrake/x264 development.
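
To get a feel for what a saving on the short test clip could mean for a full movie, here is a back-of-the-envelope Python sketch. It leans on three assumptions of mine: that the "1:42" clip length means 1 minute 42 seconds, that "almost a minute" can be rounded to 60 seconds, and that savings scale linearly with duration (same settings, similar content). Treat the result as a rough guess, not a measurement.

```python
CLIP_SECONDS = 1 * 60 + 42  # assumes the "1:42" test clip is 1 min 42 s


def projected_saving(saving_on_clip_s, movie_seconds, clip_seconds=CLIP_SECONDS):
    """Scale a time saving measured on the test clip to a longer source."""
    return saving_on_clip_s * movie_seconds / clip_seconds


# 60 s approximates "almost a minute"; the two-hour movie is hypothetical.
two_hour_movie = 2 * 60 * 60
saving = projected_saving(60, two_hour_movie)
print(f"Projected saving: about {saving / 60:.0f} minutes")
```

The same function shows why the HD7850’s roughly 4 seconds matters far less: scaled the same way, it stays in the range of a few minutes on a full-length movie.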

Please note, though, that QuickSync on the HD4000 is faster than anything else, and while its quality is not as good as Handbrake/x264 CPU encoding, it’s still far better and faster than both nVidia’s and AMD’s GPU encoders (nVENC and AVIVO). For video conversions meant for handheld/mobile devices, QuickSync is the best option, with a great balance of quality and speed. However, if you have a fast enough GPU, there are quite a few commercial video encoding applications out there that support all three encoders: CUDA/nVENC, ATi APP/AVIVO and Intel QuickSync.

-------------------------------------------------------------------------------------------------------------------------------------
Epilogue

As you can see, there is no clear winner. With results this wide and varied, even with a limited set of tests, it’s hard to pick one brand over the other. Then again, this was never meant to be a competitive test.

The main goal of this test was to determine how far the GPGPU promise has actually been fulfilled from a consumer’s perspective. Beyond the marketing speak, how far have GPUs and their related technologies come, and where does having a dedicated GPU actually help beyond just gaming? After all, most of us don’t play games for a living. I’m shooting in the dark here, but I would guess that at least 60% of the time, if not more, our GPUs are not running any games or intensive 3D operations. So if we could squeeze out that idle power of our expensive GPUs and put it to work on something other than gaming… it does make a lot of sense.

Professional graphics cards like Quadro or FirePro/GL, with good GPU power, are still out of budget for most consumers. But with gaming GPUs getting more and more powerful, technologies like OpenCL give us the opportunity to use their power for work as well as play.

With such advancements in technology, I’m pretty sure that one day general-purpose computing will become a criterion for the non-developer/consumer crowd to consider while shopping for a new GPU. With wide adoption of OpenCL and more productivity applications tapping into the unused power of our GPUs, hopefully that day will come soon. We are getting there, but are not quite there yet.


Special Note: I plan to update the benchmarks soon with another nVidia card, the GeForce GTX780.

-------------------------------------------------------------------------------------------------------------------------------------

Shameless plug
If you liked the article, please don't forget to comment and share your views. You can spare some time and visit my blog as well (although currently it's sparsely updated)
http://yogiee.in/gaming-gpus-beyond-gaming/
 
Firstly, thanks for the well written article.

Secondly, pardon me, because I only skimmed through it, but it seems the article focuses on OpenCL implementations alone. If that is the case, then I'd just like to add an FYI for any other readers who are more into things such as 3D rendering on the GPU: kindly approach things from the software side if you're planning a GPU purchase, because it is the s/w that determines whether OpenCL or CUDA is implemented. As an example, Iray, which is now bundled with 3ds Max, will utilize only an Nvidia GPU and not ATI/AMD.

Again, I'd like to say that the author has done a fine job with the article. My post is not in any way trying to undermine his efforts. I think he has covered the topic at hand comprehensively & very well. My post is only meant to serve as a "heads up" to other readers.

Cheers!
 