CUDA 5.0 Runtime VS2012 Project Template

As a continuation of my post about programming CUDA-capable programs with the VS2012 IDE, I have decided to make a project template to make things even easier.

You just need to copy the zipped files to the templates folder, and you will then find the project template when creating a new project inside VS2012.

Along with the project template, I have included template files for a CUDA C source file and a header file.

Download

I hope you find it useful!

Happy parallel computing 😀


CUDA 4.1 with Visual Studio 2012

INTRODUCTION

Every time I install a newer version of Visual Studio, I want to be able to use it for all my projects, and Visual Studio 2012 was no exception.

But, just like with the older versions, you need to customize and set up the IDE so it can compile and run CUDA C programs.

Below is a quick and easy guide for doing just that. It’s pretty much the same as for VS2010, so if you are already familiar with that setup, this will be even easier to follow.

IDE CONFIGURATION

First of all, make sure you already have the CUDA Toolkit, SDK, and drivers installed.

  1. Open up VS2012 and go to Tools->Options->Text Editor->File Extension
  2. Add a new extension for cu and select Microsoft Visual C++ as an Editor
  3. Copy C:\Users\<User>\AppData\Local\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.0\C\doc\syntax_highlighting\visual_studio_8\usertype.dat to C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\IDE (this will add syntax highlighting for the CUDA C language keywords, like __global__ and __device__; a few representative entries from the file are shown below)
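
In case you are curious, usertype.dat is just a plain list of keywords, one per line, that the Visual C++ editor will highlight. The entries below are only an illustrative subset; use the file shipped with the SDK rather than typing it yourself.

__global__
__device__
__host__
__shared__
__constant__
blockIdx
threadIdx
blockDim
gridDim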

PROJECT CONFIGURATION

Start by creating something simple, like a console application, and follow these steps:

  1. Right-click on the project in the Solution Explorer and select Build Customizations…
  2. You should now see a list of build customization files, but none of them are for CUDA, so click the Find Existing button and add the CUDA target files located at C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations
  3. Select one of the CUDA configuration files and click OK.
  4. Add a new cpp source file and change its extension to .cu
  5. Open the file’s Properties (Alt+Enter) and change its Configuration Properties->General->Item Type to CUDA C/C++
  6. Now, go to the project’s property pages and add cudart.lib to the linker’s Additional Dependencies (Configuration Properties->Linker->Input->Additional Dependencies)
  7. Finally, still on the project’s property pages, change Configuration Properties->General->Platform Toolset to Visual Studio 2010 (v100). A quick sanity check you can build at this point is shown right after this list.
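
If you want to confirm the toolchain is wired up before moving on to the longer example below, a minimal sketch like the following should build and run. The kernel and messages here are my own, not part of any sample; any .cu file configured as in steps 4 and 5 will do.

#include <cstdio>
#include <cuda_runtime.h>

// Empty kernel: we only care that it compiles and launches.
__global__ void smoke_test() { }

int main() {
	smoke_test<<<1, 1>>>();

	// cudaDeviceSynchronize reports launch/execution errors,
	// e.g. when no CUDA-capable device is present.
	cudaError_t err = cudaDeviceSynchronize();
	printf("Kernel launch: %s\n",
		err == cudaSuccess ? "OK" : cudaGetErrorString(err));

	return err == cudaSuccess ? 0 : 1;
}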

EXAMPLE

You can add the following code to the source file to test if Visual Studio is ready.

This is a very simplistic program: it displays some information about your CUDA devices and sums two vectors using just 10 blocks with 1 thread each.

#include <cstdio>
#include <cuda_runtime.h>

#define N 10
#define CUDA_ERROR 1

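// Each block adds one pair of elements; tid is the block index because the launch below uses 1 thread per block.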
__global__ void add( int *a, int *b, int *c ) {
	int tid = blockIdx.x;

	if(tid < N)
		c[tid] = a[tid] + b[tid];
}

int main( void ) {
	int count;
	cudaDeviceProp prop;

	int a[N], b[N], c[N];
	int *dev_a, *dev_b, *dev_c;

	if( cudaGetDeviceCount(&count) != cudaSuccess)
		return CUDA_ERROR;

	for(int i = 0; i < count; i++) {
		if(cudaGetDeviceProperties(&prop, i) != cudaSuccess)
			return CUDA_ERROR;

		printf("--- General Information for device %d ---\n", i);
		printf("Name: %s\n", prop.name);
		printf("Compute capability: %d.%d\n", prop.major, prop.minor);
		printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
		printf("Max thread dimensions: (%d, %d, %d)\n",
			prop.maxThreadsDim[0],
			prop.maxThreadsDim[1],
			prop.maxThreadsDim[2]);
		printf("Max grid dimensions; (%d, %d, %d)\n",
			prop.maxGridSize[0],
			prop.maxGridSize[1],
			prop.maxGridSize[2]);
	}

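	// Allocate device memory for the three vectors (error checking omitted for brevity).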
	cudaMalloc((void**)&dev_a, N * sizeof(int));
	cudaMalloc((void**)&dev_b, N * sizeof(int));
	cudaMalloc((void**)&dev_c, N * sizeof(int));

	printf("\n\n--- Adding 2 vectors on the GPU ---\n");
	for(int i = 0; i < N; i++) {
		a[i] = i * 2;
		b[i] = i * i;
	}

	cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);
	cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);

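	// Launch the kernel with N blocks of 1 thread each.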
	add<<<N, 1>>>(dev_a, dev_b, dev_c);

	cudaMemcpy( c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost );

	for(int i = 0; i < N; i++) {
		printf("%d + %d = %d\n", a[i], b[i], c[i]);
	}

	cudaFree( dev_a );
	cudaFree( dev_b );
	cudaFree( dev_c );

	return 0;
}
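
Note that the example only checks the return codes of the device-query calls; cudaMalloc, cudaMemcpy, and the kernel launch itself are not checked. In real code you will probably want to wrap every runtime call, for instance with a macro along these lines (CUDA_CHECK is my own naming, not part of the CUDA runtime):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line information when a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
	do {                                                              \
		cudaError_t err_ = (call);                                    \
		if (err_ != cudaSuccess) {                                    \
			fprintf(stderr, "CUDA error %s at %s:%d\n",               \
				cudaGetErrorString(err_), __FILE__, __LINE__);        \
			exit(EXIT_FAILURE);                                       \
		}                                                             \
	} while (0)

// Typical usage:
//   CUDA_CHECK(cudaMalloc((void**)&dev_a, N * sizeof(int)));
//   add<<<N, 1>>>(dev_a, dev_b, dev_c);
//   CUDA_CHECK(cudaGetLastError());       // catches launch configuration errors
//   CUDA_CHECK(cudaDeviceSynchronize());  // catches errors during execution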

RESULT

This is the result I get. Yours may yield different values for the device’s properties.

--- General Information for device 0 ---
Name: GeForce GT 240M
Compute capability: 1.2
Max threads per block: 512
Max thread dimensions: (512, 512, 64)
Max grid dimensions: (65535, 65535, 1)
--- Adding 2 vectors on the GPU ---
0 + 0 = 0
2 + 1 = 3
4 + 4 = 8
6 + 9 = 15
8 + 16 = 24
10 + 25 = 35
12 + 36 = 48
14 + 49 = 63
16 + 64 = 80
18 + 81 = 99