After years of preparation, GPUs (graphics processing units) are finally making their way into servers from major vendors. IBM and Dell are the first tier-one server manufacturers to offer GPUs as server co-processors.
These chips are normally found in consumer-level computers, where they deliver high-speed graphics performance, primarily for 3D games. But it is slowly dawning on the industry that GPUs make excellent math co-processors for far more than stunning graphics. Both Dell and IBM announced in spring 2010 that they will use Nvidia's Tesla M2050 GPUs in their latest server models.
It was a big win for Nvidia, which has spent years actively pushing the idea of using its GPUs as high-performance math co-processors. If you used desktop PCs in the 1980s, you may remember math co-processors: chips sold separately to accelerate math-intensive calculations. Spreadsheets such as Lotus 1-2-3, for example, could recalculate noticeably faster with one installed.
Over the years, ATI and Nvidia have built their GPUs into huge multi-core math processors. Nvidia's Fermi architecture offers up to 512 stream processors, while ATI's Radeon 5000-series GPUs can pack as many as 1,600.
In general, stream processing is a parallel technique for accelerating math computations, in which the software manages chores such as data synchronization and memory allocation. The stream processors themselves are connected by high-performance interconnects.
Threads on a GPU are much smaller than typical CPU threads because they are just collections of math instructions, often as simple as an addition. A stream processor can switch from one thread to another in a single clock cycle, something common server CPUs can't do. A CPU thread, by contrast, is an intricate series of instructions, simply too cumbersome for simple math calculations.
Lately, people who need high-performance computing have realized that the stream processors in GPUs might do something more useful than pumping out high frame rates. Nvidia and ATI wholeheartedly agree, and both have recast their graphics processors as math co-processors, although Nvidia has been far more emphatic in promoting the idea.
Complex scientific calculations need double-precision floating-point math, and both ATI and Nvidia have added double-precision support to their GPUs. The capability is of little use to gamers but essential for, say, scientific researchers simulating global climate patterns.
To take full advantage of a GPU as a math co-processor, server application programmers need to update their code to use the stream processors. Adopting GPUs in servers isn't a drop-in task: you can't just buy a new server and expect your current web apps to run faster. Code changes are required, and they usually amount to more than a few lines and a library or two.
3D games have always been the canonical example of applications that perform massive amounts of floating-point calculation, but those same calculations are also useful in scientific and medical imaging, 3D visualization and imaging for oil and gas exploration, financial modeling, aeronautics simulation, and more.
The new approach is called general-purpose GPU computing, or GPGPU. The task basically involves telling an application to send its calculations directly to the GPU, and thus far that means some code rewriting is necessary.
For this purpose Nvidia offers CUDA, a development platform whose programming language is derived from C and lets developers write applications that run in parallel on Nvidia GPUs. Unlike on a typical CPU, such applications don't run 2, 4, or 8 threads in parallel; they run hundreds or even thousands.
OpenCL (Open Computing Language) is another route to using GPUs in servers. Originally developed by Apple and now managed by the Khronos Group, which also oversees OpenGL, it allows programs to work across CPUs, GPUs, and other types of processors. It includes a C-based language for writing kernels, along with APIs for data-parallel and task-parallel programming.
OpenCL has a few advantages over CUDA. For starters, it is processor-agnostic: it can run on both GPUs and CPUs, while CUDA works only on Nvidia GPUs. OpenCL conceivably supports everything from ARM embedded processors to Itanium to Sun's UltraSparc. On the other hand, OpenCL is the newer framework and still lacks much of CUDA's maturity and feature set. Most notably, CUDA ships with an FFT (Fast Fourier Transform) library, while OpenCL does not; the FFT is a powerful algorithm used in many advanced scientific calculations and in image processing. And CUDA is controlled exclusively by Nvidia, a major technology company with strong influence in the hardware industry, while OpenCL is developed by a standards group.
Another alternative is Microsoft's DirectCompute, a new component of the DirectX 11 library that ships with Windows 7, which ties it to Microsoft's newest operating systems: it won't run on Windows XP at all, and Windows Vista needs a platform update to get it. Like OpenCL, it lets programs tap the GPU's computing power.