Tag Archives: vs2008

Step 2 - Create solution and project

Cuda v3.2 template project using C++


The tutorial is missing some compilation, linker, library settings steps. However you can download the template project, it works and has everything setup.


I am a developer who has been developing software using .NET and C# for several years. I have never used C or C++, and it has never been required.

I like to investigate new technologies, mainly because I am curious, but also because it could make my daily development work easier or smarter.

Recently my focus has been directed towards GPGPU on the Nvidia Cuda platform.

The programming language for Cuda is called “Cuda C”. The name implies that knowledge of C indeed is required for using GPGPU on the Cuda platform.

I discovered that there exist .NET bindings to the Cuda platform and drivers. However, I find their usage complicated and insufficient, and further more kernel development will still have to done in Cuda C.

These fact made me realise that I would have to learn a bit of C and C++ to use Cuda as it was actually intended by Nvidia. Nvidia provides many samples and suggest that a Cuda development environment on Windows could use Visual Studio and Nvidia Parallel NSight for debugging, profiling etc.

As my knowledge of C and C++ development was severely limited, so was the setting up and configuration of Visual Studio 2008 for Cuda C development.

I have read “Cuda by example…” (http://developer.nvidia.com/object/cuda-by-example.html), “Programming Massively Parallel Processors…” (http://www.nvidia.com/object/io_1264656303008.html) and “C Programming Language, 2. edition” (http://www.pearsonhighered.com/educator/product/C-Programming-Language/9780131103627.page). These books have given me the foundation to start developing using Cuda C and GPGPU.

Setting up Visual Studio 2008 and making the compiler work required some work, but here is what I did.

1. Download and install driver and toolkit

Download Cuda toolkit and the developer driver and install. A restart is probably required. (http://developer.nvidia.com/object/gpucomputing.html)

2. Start Visual Studio 2008, and create a new project of type Win32 Console Application

Give the project and solution a name (here called Cuda_Template).

3. Click Next

4. Select Console application and check the empty project, then click Finish

5. Add new item called main.cpp of type C++ File

6. Add the following code the file

#include <stdio.h> 

int main() {

    printf("Hello world...\n);
    return 0;


7. Build and try to run the exe file. The output should be

8. Select Project -> Custom Build Rules…

9. Select Cuda Runtime API build rule (v3.2)

10. Add a new file called kernel.cu

11. Add the following to the file kernel.cu

/* power: raise base to n-th power; n >= 0 */
__device__ int devicePower(int base, int n) {

    int p = 1;

    for (int i = 1; i <= n; ++i) {
        p = p * base;

    return p;

__global__ void power( int *base, int *n, int *output, int threadMax ) {

    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    if (tid < threadMax) {
        output[tid] = devicePower(base[tid], n[tid]);


12. Right click the newly created file and select properties

13. Set the “Exclude From Build” and make sure that the project still builds

14. Create a new file called call_kernel.cu

15. Add the following to the file call_kernel.cu

#include <cuda_runtime_api.h>
#include "main.h"

// includes, kernels
#include <kernel.cu>

void call_kernel_power(int *base, int *n, int *output, int elementCount) {

    int *dev_base, *dev_n, *dev_output;
    int gridX = (elementCount+ThreadsPerBlock-1)/ThreadsPerBlock;

    cudaMalloc( (void**)&dev_base, elementCount * sizeof(int) );
    cudaMalloc( (void**)&dev_n, elementCount * sizeof(int) );
    cudaMalloc( (void**)&dev_output, elementCount * sizeof(int) );

    cudaMemcpy( dev_base, base, elementCount * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy( dev_n, n, elementCount * sizeof(int), cudaMemcpyHostToDevice);

    power<<<gridX,ThreadsPerBlock>>>(dev_base, dev_n, dev_output, elementCount);

    cudaMemcpy( output, dev_output, elementCount * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree( dev_base );
    cudaFree( dev_n );
    cudaFree( dev_output );

16. Create a new header file called main.h

17. Add the following content to the file

#define ThreadsPerBlock 128

#include <stdio.h>

void call_kernel_power(int *base, int *n, int *output, int elementCount);

18. Update the main.cpp file with the following:

#include "main.h"

#define N   80000

int main() {

    printf("Power Cuda kernel test from C++\n");
    printf("Testing %d elements\n", N);

    int base[N], n[N], output[N];

    for(int i = 0; i < N; i++) {
          base[i] = 2;
          n[i] = i+1;
          output[i] = 0;

    call_kernel_power(base, n, output, N);

    for(int i = 0; i < N && i < 15; i++) {

          printf("%d^%d = %d\n", base[i], n[i], output[i]);



    return 0;

19. That should be it…

You now have a template that you can work from. When you build the file and run it, I get this output:

Visual Studio 2008 debugging and breakpoints

My development tools are currently Visual Studio 2008, Resharper 4.5, Gallio and Testdriven.NET. But for as long as I have used these tools, my Visual Studio debugger has only worked partly.

The problem

When I inserted some breakpoints in my code, the first one was almost all the time hit. However, when I tried to step through the code, it worked fine for a bout 3-6 steps. But then Visual Studio debugger decided that was enough, and just completed the code.

The good thing was, it forced me to write many unit test cases, but sometimes I really think it is nice to visually see the state of your objects, in the code.

I have spoken to colleagues, and tested with different combinations of Gallio, Testdriven.NET and Re-sharper enabled, as I thought they cased the problem. But the debugger just kept bugging me. I had actually given up finding a solution to this problem, thinking that Visual Studio 2010 would fix it.

The solution

So finally, in a complete other context, I stumbled upon this page:


And thought…. that title sounds interesting… I quickly moved to the KB article, and started to read. I thought, that sounds exactly as my problem.

With nothing to loose I installed the update, and started to test the debugger… until now it seems to work :)

– please note, I do not use the debugger that often as I write very reliable code :)

Visual Studio 2008 – ItemTemplatesCache missing

The issue 

My Visual Studio 2008 opened the solution file with no problem. However, when I was to add a new class, user control, aspx file or similar, I received the error:

Could not find part of "c:\Program Files\Microsoft Visual Studio 9.0\…\ItemTemplatesCache…"

[and the path to the ItemTemplatesCache in Visual Studio 2008 folder].

The solution

Well, one slution would be to reinstall or repair installation of Visual studio 2008. However, after searching for a while I found a solution that is much easier:

  1. Open "Visual Studio 2008 Command Prompt" as an Administrator
  2. Run the command: "devenv /installvstemplates"
  3. That's it


It takes a while but it completely rebuild your item template cache if it is corrupt or completely missing like me.