AMD release APP SDK 2.8, CodeXL 1.0 and Bolt

This is old news now that AMD have released APP SDK 2.8, CodeXL 1.0 and Bolt. Bolt is a C++ abstraction of OpenCL which allows you to write OpenCL applications without actually writing OpenCL. I personally haven’t tried it as we are not going to be using it at work but it sounds a good step forward in making GPGPU more accessible to the mainstream. For anyone interested there are a number of blog posts by AMD on how to use Bolt.

OpenCL Cookbook: Multi-device utilisation strategies

A quick note on all the different ways in which one can utilise multiple devices in OpenCL. This overlaps with my previous post on the same subject in the series but is a higher level overview than that previous post.

Here are the different ways one can load balance across multiple devices.

  • Use CPU and GPU devices.
  • Use multiple GPU devices.

For each of the above one can utilise them in the following ways.

  • Single context – Use one context in total across all devices and one command queue per device.
  • Multiple contexts single-host-thread – Use one context and one command queue per device.
  • Multiple contexts multi-host-threaded – Same as multiple contexts above but use one host thread per device to enqueue to the command queue for that device. This can also be in the form of starting one host process instance per device which in essence is the same.

For an elaboration on the latter strategies see my previous post on the subject.

Project Rogue: Conception and specification

Project Rogue in action in Crysis 2
Project Rogue in action in Crysis 2

Introduction

Project Rogue was conceived out of necessity in September 2012. Constituent components were researched in October 2012 and then subsequently ordered end of October 2012. A week later the project was complete in the sense that it functioned as a whole. It was alive and breathing. One month later, in December 2012, Project Rogue has now completed its extensive accessorisation phase and now I reveal it to you in all its glory.

How was Project Rogue conceived?

Early September 2012 I started working with some insanely powerful desktop workstation hardware – dual xeon 16 core 32 thread watercooled cpus, 8 water cooled gpus, 128GB ram, dual 1200W power supplies and immensely large cases. We were loading up these beasts with heavy computations that were utilising both multiple physical cpus and multiple gpus.

At that time my personal machine at home was a humble 11″ macbook air which although previously had been more than adequate for my needs it had now essentially been made suitably redundant and in a sense outdone. It was clear that I would need something a bit more ambitious to be able to do the kind of work at home that I was doing at work.

Something that would allow me to do uncompromising computations on the cpu and across multi-gpus but within the constraints of a personal budget (which I far exceeded in the end due to my own uncompromising nature). Something that within loose constraints would be uncompromising in its demands for high quality components and power and not settle for what was just good enough – something almost rogue in nature.

What is Project Rogue?

Sadly it’s nothing as grand as the preceding section makes it out to be where I got a little carried away but to me, actually, it is grand for reasons I’ll get into below. Project Rogue is a home computation and gaming rig with modern high quality components within the constraints of a home budget (sadly a characteristically generous one).

There are two things that make Project Rogue highly significant to me which is why it is being made into a series of posts on this blog. Firstly, it is a real world machine born out of a real world need created from scratch in the modern day and I’m sure that there are many that would benefit from an account of the entire process of creating such a rig. I certainly wish there was such a guide for me when I started.

Secondly, and more importantly, this was my first pc build ever which I admit without reservation and given that fact and the calibre of this machine I feel a sense of achievement and would like to catalogue this milestone of mine. I also think that the latter point makes, this account to come, even more valuable for those out there who haven’t done this before in making it that much more accessible and approachable to them.

I have to say, that whilst my assumption was (as always) that I was the only one who hadn’t done anything like this before, that could not have been further from the truth. I was really very surprised to learn that most ex-colleagues and some current colleagues over the years had either had no exposure to hardware to date or had simply become totally disconnected from it in mainstream commodity development.

I’ve been there and I know what it’s like. You’re developing on a virtual machine. You don’t need to know where it runs and you don’t need to care. The illusion of abstraction and portability reassures you that as long as you write your code to be fast it will automatically run faster when put on better hardware but in reality that is not necessarily true. Hardware is never proactively part of the development process and developers are generally unaware of what hardware they are running on in production.

Home computation and gaming rig component list

Here follows the component and accessory list for a home computation and gaming rig. Each component was picked with a particular motivation in mind and I’ll write about those motivations in a separate post. But generally I tried to go for the best, not unreasonably expensive, quality components that were released most recently.

For the experts out there if you are expecting a workstation or server specification build or a high end consumer build such as a multi cpu, xeon or sandybridge build you’ll be disappointed. This is, sadly, a bit simpler and a bit cheaper but at least up to date with ivybridge and the latest 7970 cards. It is a consumer grade build built with two gpu computations and gaming in mind. It is uncompromising in the gpus chosen however. If you’re expecting a full build watercooled solution you’ll also be sorely disappointed. I needed to start with air-cooling but no doubt next up is water.

In terms of cost – the total cost of the following component list comes to £4385 but a few parts having been bought have been replaced and the sales of the older parts are still pending and once they go through the total cost will reduce to £3533 which sounds a little less mindless.

# Component Price Model Vendor Notes
1 Case £143 Corsair Obsidian 650D Amazon, OC Windowed and matte black
2 Motherboard £231 Asus Maximus V Formula/ThunderFX Amazon, OC Multi-gpu PCI-E, dual band wifi
3 PSU £205 Corsair AX1200 Amazon, OC Fully modular and future proof
4 CPU £240 Intel IvyBridge i7-3770K Amazon, OC 4 cores, 8 threads
5 CPU Cooler £80 Corsair H100 Amazon, OC Sealed liquid cooler
6 Memory £200 Corsair 16GB Dominator Platinum Amazon, OC Pure aesthetic beauty
7 GPU 1 £395 Asus Matrix HD7970 Platinum Amazon, OC Top end triple slot AMD radeon card
8 GPU 2 £395 Asus Matrix HD7970 Platinum Amazon, OC Top end triple slot AMD radeon card
9 GPU 3 £380 MSI R7970 Lightning Boost Edition Amazon, OC Top end dual slot AMD radeon card
10 GPU 4 £380 MSI R7970 Lightning Boost Edition Amazon, OC Top end dual slot AMD radeon card
11 Monitor £489 Asus PB278Q Amazon, OC 27″ LED, fully adjustable
12 SSD 1 £171 Corsair Force GS 240GB Amazon, OC 90K IOPS, bright red
13 SSD 2 £171 Corsair Force GS 240GB Amazon, OC 90K IOPS, bright red
14 HDD 1 £66 WD Scorpio Blue 1TB (Spec) (PDF) Amazon, OC Fast 2.5″ 1TB HD
15 HDD 2 £66 WD Scorpio Blue 1TB (Spec) (PDF) Amazon, OC Fast 2.5″ 1TB HD
16 Bluray writer £80 Asus BW-12B1ST Amazon, OC Multi-purpose drive
17 4x SSD Rack £58 Icy Dock 4×2.5″ SSD/HD Rack Amazon, OC 4x drive hot swap rack
18 Fan controller £47 Lamptron FC5V2 Amazon, OC Multi-colour multi-output
19 120mm fans £18 Corsair SP120 Quiet Edition Amazon, OC Radiator fans with red fan rims!
20 120mm fans £10 Corsair AF120 Quiet Edition Amazon, OC Airflow fans with red rims!
21 200mm fan £15 Bitfenix Spectre Pro Amazon, OC Quiet black 200mm fan
22 Speakers £183 Corsair SP2500 Amazon, OC Top end 2.1 speakers
23 Keyboard 1 £11 MS Keyboard 600 Amazon, OC Cheap and nasty
24 Keyboard 2 £70 Logitech G510 Amazon, OC Soft touch with macros
25 Mouse 1 £45 Logitech Performance MX Amazon, OC Cordless
26 Mouse 2 £35 Logitech G500 Amazon, OC Corded
27 USB3 Stick £50 Corsair Survivor Stealth 64GB Amazon, OC Stunning and fast
28 USB3 adapter cable £7 Akasa USB3 adapter cable Amazon, OC Converts usb3 to mb header
29 Thermal paste £10 Artic Silver 5 Amazon, OC Thermal paste and cleaner
30 Cable ties 1 £3 Large velcro cable ties Amazon, OC Multi-coloured and chunky
31 Cable ties 2 £4 Large velcro cable ties Amazon, OC Black and chunky
32 Cable ties 3 £12 Small velcro cable ties Amazon, OC Thin ties bulk pack
33 Power meter £19 Efergy Socket 2.0 Amazon, OC Shows real time watts used
34 Sata cables 1 £5 Akasa 50cm sata3 Amazon, OC Fat sata cable
35 Sata cables 2 £17 4x Akasa super-slim 50cm sata3 Amazon, OC Super slim form factor
36 DVD-R £7 25x Verbatim DVD-R Amazon, OC System restore discs
37 CD/DVD pens £5 Fellowes CD/DVD Pens Amazon, OC No comment
38 External HDD £62 Samsung M3 1TB Amazon, OC Out of case backup

A sneak peek at Project Rogue
Sadly there was only room for two but the sight of four was nice while it lasted!

What’s next in the series?

Project Rogue will become a series starting with a step by step account of its own build in the form of a photo journal and then continuing to document its evolution in real time. I will also focus on important aspects of a computer build that would be relevant to anyone attempting any build. Post Project Rogue there will be another good few builds coming each offering a progression of some kind of previous builds and if I haven’t run out of cash by then and had to shut the site down I’ll document those too! Stay tuned!

Have you donated to Wikipedia?

Do you use wikipedia? Have you ever considered yourself to be privileged to be able to use wikipedia whenever you wanted for whatever you wanted totally free of charge? How many times has it helped you avoid an embarrassing admission of ignorance? How many times has it helped you prepare for a job interview? Considering that – have you ever donated to wikipedia? I just did. And this is the email I received in return.

Dear Dhruba,

Thank you for donating to the Wikimedia Foundation. You are wonderful!

It’s easy to ignore our fundraising banners, and I’m really glad you didn’t. This is how Wikipedia pays its bills — people like you giving us money, so we can keep the site freely available for everyone around the world.

People tell me they donate to Wikipedia because they find it useful, and they trust it because even though it’s not perfect, they know it’s written for them. Wikipedia isn’t meant to advance somebody’s PR agenda or push a particular ideology, or to persuade you to believe something that’s not true. We aim to tell the truth, and we can do that because of you. The fact that you fund the site keeps us independent and able to deliver what you need and want from Wikipedia. Exactly as it should be.

You should know: your donation isn’t just covering your own costs. The average donor is paying for his or her own use of Wikipedia, plus the costs of hundreds of other people. Your donation keeps Wikipedia available for an ambitious kid in Bangalore who’s teaching herself computer programming. A middle-aged homemaker in Vienna who’s just been diagnosed with Parkinson’s disease. A novelist researching 1850s Britain. A 10-year-old in San Salvador who’s just discovered Carl Sagan.

On behalf of those people, and the half-billion other readers of Wikipedia and its sister sites and projects, I thank you for joining us in our effort to make the sum of all human knowledge available for everyone. Your donation makes the world a better place. Thank you.

Most people don’t know Wikipedia’s run by a non-profit. Please consider sharing this e-mail with a few of your friends to encourage them to donate too. And if you’re interested, you should try adding some new information to Wikipedia. If you see a typo or other small mistake, please fix it, and if you find something missing, please add it. There are resources here that can help you get started. Don’t worry about making a mistake: that’s normal when people first start editing and if it happens, other Wikipedians will be happy to fix it for you.

I appreciate your trust in us, and I promise you we’ll use your money well.

Thanks,
Sue

Sue Gardner
Executive Director,
Wikimedia Foundation
https://donate.wikimedia.org

If you’ve used wikipedia you owe them and all contributors a debt of gratitude. And a donation symbolises a repayment of that debt of gratitude. Donate today. It doesn’t have to be much. £5, £10 – anything that you can afford.

Collapsing two arrays into one in C and C++

A quick tip – if you have two arrays and you want to merge/collapse/concatenate/append/combine (whatever you want to call it) them into one here is one way to do it in C and C++. The code also prints the combined array to check it’s correct.

Note the subtle differences between C and C++. One of the (innumerable!) problems with C++ is that you can write as much C as you want in it and leave the next developer wondering why you were writing C instead of C++ and whether you had specific justifiable reasons to do so in each instance. I personally also consider it bad practice. This is one of the reasons I dislike hybrid languages – they allow you to blend two languages together in arbitrary proportions making the end result unreadable and unmaintainable and just raising more questions than the problems that it solves.

C

[c]
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main () {

double a[4] = {1,2,3,4};
double b[4] = {5,6,7,8};

double* c = malloc(8 * sizeof(double));

memcpy(c, a, 4 * sizeof(double));
memcpy(c + 4, b, 4 * sizeof(double));

for (int i = 0; i < 8; i++) {
printf("%d %fn", i, c[i]);
}

free(c);

}
[/c]

C++

[cpp]
#include <iostream>
#include <cstring>
#include <iomanip>

using namespace std;

int main () {

double a[4] = {1,2,3,4};
double b[4] = {5,6,7,8};

double* c = new double[8];

copy(a, a + 4, c);
copy(b, b + 4, c + 4);

for (int i = 0; i < 8; i++) {
cout << i << " " << c[i] << endl;
}

delete[] c;

}
[/cpp]

Note that in the C++ program:

  • I’m not using includes from C
  • I’m not using memcpy from C
  • I’m not using printf from C
  • I’m not using malloc from C
  • I’m not using sizeof from C
  • I’m not using free from C

8 GPU watercooled computation rig using Power Color Devil 13 HD 7990 cards

Workstation motherboards provide 4 or more PCIE x16 gen 3 slots with a good distribution of dedicated links direct to multiple onboard CPUs. With dual cards such as 7990 which are in fact 2×7970 each one can in theory load up a 4 slot motherboard with 8 gpus in total. Alas, in practice, there can be obstacles booting with 7 or 8 of them with certain motherboads as they tend to run out of PCIE resources after around 6 cards which could be considered to be surprising for a workstation class motherboard. Incidentally, I think it’s worth noting that not only are these the only model of 7990 made but they are practically impossible to get hold of so very rare gems indeed.

Watercooling four dual gpus on a standard workstation motherboard can also be a challenge due to a severe shortage of space between them. Dual slot spacing between PCIE x16 slots is a tight fit as the tubes can take up almost three slots worth of space between cards. In this configuration they are installed on alternate slots so if you had 7 slots in total you’d only install on 1, 3, 5 and 7 leaving 2, 4 and 6 empty. Though 2, 4 and 6 will usually be PCIE x8 slots anyway as opposed to the rest being PCIE x16.

As you can see below these cards have been completely stripped down of their air cooling and heatsink apparatus prior to attaching waterblocks and tubing for coolant to pass through them. The tubing is secured using barb fittings which are stronger than the alternative: compression fittings though they do lack the aesthetic appeal of compression fittings. Compression fittings can come apart under tension and that can create a real mess as I realised the hard way one night.

If not using full card waterblocks (which these aren’t) individual adhesive heatsinks for all the ram chips (known as ram sinks) are required for sufficient cooling. There may be upto 12 of these tiny little ram sinks on each face of a card. I don’t have any photos of that right now but I’ll try and get some. Though, ram sinks, can be flimsy and easily become dislodged and fall off the cards if they are knocked which is why some people prefer full cover blocks. Full cover blocks are more robust but also more expensive.

In terms of power the system is supplied with 2400 watts of power composed of two 1200W power supplies chained by an adapter for the first to kickstart the other on boot. Half the gpus are powered by one and half by the other. This particular machine has 128GB of RAM and dual xeons with a combined total of 32 cores. Update: As rightly pointed out by the commenter below I meant hardware threads not physical cores here.

Note: I do not own the hardware or the photographs but consent has been acquired to publish them here. Also this system is not conceived or assembled by me though I do have the pleasure of loading it with OpenCL benchmarks and computations.

OpenCL Cookbook: Managing your GPUs using amdconfig/aticonfig – a powerful utility in the AMD OpenCL toolset on Linux

When you install AMD Catalyst drivers (I’m using 12.11 beta) on linux you gain access to a command line utility called amdconfig. This is better known by its legacy name aticonfig but in this article we’ll stick with the new name. This tool provides a wealth of functionality in querying and configuring your AMD gpu making it a very powerful utility in your OpenCL toolset on Linux.

Here we explore some basic yet powerful uses of this tool to query and manage the state of our gpus. In the following examples I use the long form of the arguments to the command so that it is easier to remember and understand for those new to the command. Note that in general write commands due to the inherent risks involved can only be run as root. Read only commands can be run by normal users.

List all gpu adapters

Note – just because you see multiple adapters here does not mean they will be enabled in the OpenCL runtime. For that you have to generate a new X configuration with all adapters (see command for further down).

dhruba@debian:~$ amdconfig --list-adapters
* 0. 05:00.0 AMD Radeon HD 7900 Series
  1. 06:00.0 AMD Radeon HD 7900 Series
  2. 09:00.0 AMD Radeon HD 7900 Series
  3. 0a:00.0 AMD Radeon HD 7900 Series
  4. 85:00.0 AMD Radeon HD 7900 Series
  5. 86:00.0 AMD Radeon HD 7900 Series

* - Default adapter

Generate a fresh X config with all adapters enabled

Generating a new config in this way will backup your old config.

dhruba@debian:~$ sudo amdconfig --initial --force --adapter=all
Uninitialised file found, configuring.
Using xorg.conf
Saving back-up to xorg.conf.fglrx-0

Or you can specify your old and new config files explicitly.

dhruba@debian:~$ sudo amdconfig --initial --force --adapter=all --input=foo --output=bar
Uninitialised file found, configuring.
Using bar
Saving back-up to bar.original-0

Query current clocks, clock ranges and load for all adapters

Note here that by default, in idle mode, on linux adapters are clocked at 300/150 (core/memory) but under load the clocks automatically increase to 925/1375 (core/memory) which is nice.

dhruba@debian:~$ amdconfig --od-getclocks --adapter=all

Adapter 0 - AMD Radeon HD 7900 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    99%

Adapter 1 - AMD Radeon HD 7900 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Adapter 2 - AMD Radeon HD 7900 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Adapter 3 - AMD Radeon HD 7900 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Adapter 4 - AMD Radeon HD 7900 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Adapter 5 - AMD Radeon HD 7900 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Query temperatues for all adapters

This is handy to keep an eye on your gpus under load to check they are not overheating. The following temperatures were taken under load and you can see that adapter 3 has reached 70C despite all cards being aggressively water cooled.

dhruba@debian:~$ amdconfig --odgt --adapter=all

Adapter 0 - AMD Radeon HD 7900 Series
            Sensor 0: Temperature - 65.00 C

Adapter 1 - AMD Radeon HD 7900 Series
            Sensor 0: Temperature - 50.00 C

Adapter 2 - AMD Radeon HD 7900 Series
            Sensor 0: Temperature - 60.00 C

Adapter 3 - AMD Radeon HD 7900 Series
            Sensor 0: Temperature - 70.00 C

Adapter 4 - AMD Radeon HD 7900 Series
            Sensor 0: Temperature - 58.00 C

Adapter 5 - AMD Radeon HD 7900 Series
            Sensor 0: Temperature - 54.00 C

List crossfire candidates and crossfire status

For OpenCL it is essential that you have crossfire disabled. You can disable it either using amdconfig --crossfire=off or through catalyst control centre which you start by running amdcccle.

dhruba@debian:~$ amdconfig --list-crossfire-candidates

Master adapter:  0. 05:00.0 AMD Radeon HD 7900 Series
    Candidates:  none
Master adapter:  1. 06:00.0 AMD Radeon HD 7900 Series
    Candidates:  none
Master adapter:  2. 09:00.0 AMD Radeon HD 7900 Series
    Candidates:  none
Master adapter:  3. 0a:00.0 AMD Radeon HD 7900 Series
    Candidates:  none
Master adapter:  4. 85:00.0 AMD Radeon HD 7900 Series
    Candidates:  none
Master adapter:  5. 86:00.0 AMD Radeon HD 7900 Series
    Candidates:  none
dhruba@debian:~$ amdconfig --list-crossfire-status
    Candidate Combination:
    Master: 0:0:0
    Slave: 0:0:0
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART
    Candidate Combination:
    Master: 0:0:0
    Slave: 0:0:0
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART
    Candidate Combination:
    Master: 0:0:0
    Slave: 0:0:0
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART
    Candidate Combination:
    Master: 0:0:0
    Slave: 0:0:0
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART
    Candidate Combination:
    Master: 0:0:0
    Slave: 0:0:0
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART
    Candidate Combination:
    Master: 0:0:0
    Slave: 0:0:0
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART

You can also use amdconfig to set core and memory clocks but this will be covered in a separate article. I do not want to run these commands on my system as I’m happy with current clocks. But here’s a snippet from the man page which is fairly self explanatory. Bear in mind that to tweak clocks you need to enable overdrive using –od-enable.

  --od-enable
        Unlocks the ability to change core or memory clock values by
        acknowledging that you have read and understood the AMD Overdrive (TM)
        disclaimer and accept responsibility for and recognize the potential
        dangers posed to your hardware by changing the default core or memory
        clocks
  --od-disable
        Disables AMD Overdrive(TM) set related aticonfig options.  Previously
        commited core and memory clock values will remain, but will not be set
        on X Server restart.
  --odsc, --od-setclocks={NewCoreClock|0,NewMemoryClock|0}
        Sets the core and memory clock to the values specified in MHz
        The new clock values must be within the theoretical ranges provided
        by --od-getclocks.  If a 0 is passed as either the NewCoreClock or
        NewMemoryClock it will retain the previous value and not be changed.
        There is no guarantee that the attempted clock values will succeed
        even if they lay inside the theoretical range.  These newly set
        clock values will revert to the default values if they are not
        committed using the "--od-commitclocks" command before X is
        restarted
  --odrd, --od-restoredefaultclocks
        Sets the core and memory clock to the default values.
        Warning X needs to be restarted before these clock changes will take
        effect
  --odcc, --od-commitclocks
        Once the stability of a new set of custom clocks has been proven this
        command will ensure that the Adapter will attempt to run at these new
        values whenever X is restarted