Cuda v3.2 template project using C++

Important

The tutorial is missing some compilation, linker, and library settings steps. However, you can download the template project; it works and has everything set up.

Introduction

I am a developer who has been developing software using .NET and C# for several years. I have never used C or C++, and it has never been required.

I like to investigate new technologies, mainly because I am curious, but also because it could make my daily development work easier or smarter.

Recently my focus has been directed towards GPGPU on the Nvidia Cuda platform.

The programming language for Cuda is called “Cuda C”. The name implies that knowledge of C is indeed required for using GPGPU on the Cuda platform.

I discovered that .NET bindings to the Cuda platform and drivers exist. However, I find their usage complicated and insufficient, and furthermore, kernel development will still have to be done in Cuda C.

These facts made me realise that I would have to learn a bit of C and C++ to use Cuda as it was actually intended by Nvidia. Nvidia provides many samples and suggests that a Cuda development environment on Windows could use Visual Studio and Nvidia Parallel Nsight for debugging, profiling etc.

As my knowledge of C and C++ development was severely limited, so was my experience setting up and configuring Visual Studio 2008 for Cuda C development.

I have read “Cuda by example…” (http://developer.nvidia.com/object/cuda-by-example.html), “Programming Massively Parallel Processors…” (http://www.nvidia.com/object/io_1264656303008.html) and “The C Programming Language, 2nd edition” (http://www.pearsonhighered.com/educator/product/C-Programming-Language/9780131103627.page). These books have given me the foundation to start developing using Cuda C and GPGPU.

Setting up Visual Studio 2008 and making the compiler work required some work, but here is what I did.

1. Download and install driver and toolkit

Download the Cuda toolkit and the developer driver and install them. A restart is probably required. (http://developer.nvidia.com/object/gpucomputing.html)

2. Start Visual Studio 2008, and create a new project of type Win32 Console Application

Give the project and solution a name (here called Cuda_Template).

3. Click Next

4. Select Console application and check the empty project, then click Finish

5. Add new item called main.cpp of type C++ File

6. Add the following code to the file

#include <stdio.h> 

int main() {

    printf("Hello world...\n");
    return 0;

}

7. Build and try to run the exe file. The output should be “Hello world…”

8. Select Project -> Custom Build Rules…

9. Select Cuda Runtime API build rule (v3.2)

10. Add a new file called kernel.cu

11. Add the following to the file kernel.cu

/* power: raise base to n-th power; n >= 0 */
__device__ int devicePower(int base, int n) {

    int p = 1;

    for (int i = 1; i <= n; ++i) {
        p = p * base;
    }

    return p;
}

__global__ void power( int *base, int *n, int *output, int threadMax ) {

    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    if (tid < threadMax) {
        output[tid] = devicePower(base[tid], n[tid]);
    }

}

12. Right click the newly created file and select properties

13. Set “Exclude From Build” to Yes and make sure that the project still builds

14. Create a new file called call_kernel.cu

15. Add the following to the file call_kernel.cu

#include <cuda_runtime_api.h>
#include "main.h"

// includes, kernels
#include "kernel.cu"

void call_kernel_power(int *base, int *n, int *output, int elementCount) {

    int *dev_base, *dev_n, *dev_output;
    int gridX = (elementCount+ThreadsPerBlock-1)/ThreadsPerBlock;

    cudaMalloc( (void**)&dev_base, elementCount * sizeof(int) );
    cudaMalloc( (void**)&dev_n, elementCount * sizeof(int) );
    cudaMalloc( (void**)&dev_output, elementCount * sizeof(int) );

    cudaMemcpy( dev_base, base, elementCount * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy( dev_n, n, elementCount * sizeof(int), cudaMemcpyHostToDevice);

    power<<<gridX,ThreadsPerBlock>>>(dev_base, dev_n, dev_output, elementCount);

    cudaMemcpy( output, dev_output, elementCount * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree( dev_base );
    cudaFree( dev_n );
    cudaFree( dev_output );
}

16. Create a new header file called main.h

17. Add the following content to the file

#define ThreadsPerBlock 128

#include <stdio.h>

void call_kernel_power(int *base, int *n, int *output, int elementCount);

18. Update the main.cpp file with the following:

#include "main.h"

#define N   80000

int main() {

    printf("Power Cuda kernel test from C++\n");
    printf("Testing %d elements\n", N);

    int base[N], n[N], output[N];

    for (int i = 0; i < N; i++) {
        base[i] = 2;
        n[i] = i+1;
        output[i] = 0;
    }

    call_kernel_power(base, n, output, N);

    for (int i = 0; i < N && i < 15; i++) {
        printf("%d^%d = %d\n", base[i], n[i], output[i]);
    }

    printf("Done\n");

    return 0;
}

19. That should be it…

You now have a template that you can work from. When you build and run it, you should see the first 15 power results followed by “Done”.

NHibernate dirty problem, weird updates in NHprof

Upgrading to NH3 from NH2.1.2

We have recently updated to NHibernate 3 in one of our projects. While testing, I suddenly saw some weird UPDATE statements in NHProf.

It seemed like the NHibernate IsDirty detection had changed.

After some digging it turned out that our mapping files (made in FluentNHibernate) actually contained some minor errors when compared to the DB schema; minor errors like value-typed properties not being nullable when the ClassMap said they should be.

Debugging of NHibernate projects

I found that inserting the following into the classmaps made debugging a lot easier:

DynamicUpdate();

Now only the properties that are dirty (according to NHibernate) are used in the SQL UPDATE statement. Here is the description from the NHibernate docs:

  • dynamic-update (optional, defaults to false): Specifies that UPDATE SQL should be generated at runtime and contain only those columns whose values have changed.
  • dynamic-insert: (optional, defaults to false): Specifies that INSERT SQL should be generated at runtime and contain only the columns whose values are not null.

That led me directly to the property in the mapping files that had errors in it, and I could easily fix it.

Conclusion

So, in our case, it wasn’t really the NHibernate dirty detection that changed; NHibernate 3 is just not as fault tolerant as NH2. So the upgrade made us discover some errors that needed to be corrected.

As such, this is a good thing, but it would have been nicer if NH2 had made us aware of these errors in the first place.

But hey, better late than never ;)

Mercurial repository locked on network share and is never released

I have experienced this when the connection is dropped in the middle of a HG command being executed.

The solution is to run the following command on the machine hosting the repository. Fire the command in the directory containing the problematic/locked .hg folder. (It is not strictly required to execute the command on the hosting machine, but it will be a lot quicker than over the network.)

hg recover

That is it.

NHibernate CompositeId and GetHashCode SELECT N+1 problem

I was looking in my NHProf log and saw a lot of SELECT requests. Too many, and I thought I had a SELECT N+1 problem.

Using the “Stack Trace” in NHprof I identified a call being made to GetHashCode() of the class (MyClass) holding the composite id. MyClass looked like this:

public class MyClass
{
    public virtual Key1Class Key1 { get; set; }
    public virtual Key2Class Key2 { get; set; }

    /*
     * Other virtual properties
     * ...
     */

    public override int GetHashCode()
    {
        unchecked
        {
            return ((Key1 != null ? Key1.GetHashCode() : 0) * 397) ^ (Key2 != null ? Key2.GetHashCode() : 0);
        }
    }
}

Note that Key1 and Key2 are both references, and please take a closer look at the GetHashCode method.

The problem

The problem I was facing: NHibernate called the GetHashCode() of MyClass when doing the query, and the call to MyClass‘s GetHashCode method subsequently called GetHashCode() of both the Key1 and Key2 classes.

Key1 and Key2 were loaded as proxies, and hence a call to anything other than their Id resulted in a SELECT being fired.

The solution

The solution was to rewrite the GetHashCode method to this:

public override int GetHashCode()
{
    unchecked
    {
        return ((Key1 != null ? Key1.Id : 0) * 397) ^ (Key2 != null ? Key2.Id : 0);
    }
}

Note that the Ids of both Key1Class and Key2Class are integers.

Now a call to MyClass‘s GetHashCode does not result in individual SELECT statements being fired, even though the Keys are lazy loaded.

IDrive .NET backup library

What does IDriveLibrary do?

IDrive EVS gives you 5GB of free backup space on the IDrive platform. IDrive EVS makes it possible, via a kind of HTTP web interface, to upload files and more.

The library I have written is called IDriveLibrary and can be found here: https://bitbucket.org/ovesen/idrivelibrary. My idea for the project was to be able to write a backup routine in C# for my servers.

Unfortunately, IDrive EVS seems buggy and slow. In an earlier review of backup providers (http://blog.ovesens.net/2010/01/test-of-online-backup-providers/) I already deemed IDrive buggy. Nothing has changed.

During the development of the library I have had to contact IDrive 5 times, either because documented features simply did not work or because of errors in the documentation.

NHibernate Membase caching provider

Enyim has made a nice client library for both Memcache and Membase. Membase can be used as a free distributed caching platform.

With the new provider model introduced in ASP.NET 4, it is now possible to write providers for output caching and session state. Enyim has done just that with their memcache-provider.

I use NHibernate in some of my projects, and being able to use a Membase cache provider for NHibernate’s second-level cache would be very nice.

With help from Christian Dalager and myself, such a library is now available. Take a look here: https://bitbucket.org/ovesen/membasecacheprovider

Zipped binaries are provided, mapped against NHibernate 2.0.1, 2.1.0, 2.1.2 and NH3, here: https://bitbucket.org/ovesen/membasecacheprovider/downloads

An example configuration file is provided in the downloads, but you can also see the options here:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
    <!-- This is a sample configuration file for using MembaseCacheProvider -->

    <configSections>

        <!-- (Required) Default configuration section for Membase -->
        <section name="membase" type="Membase.Configuration.MembaseClientSection, Membase" />

        <!-- (Optional) An additional configuration section for Membase, can be used to target another bucket or servers -->
        <section name="membaseNhibernate" type="Membase.Configuration.MembaseClientSection, Membase" />

        <!-- (Optional) A section for the Enyim Membase client, allows logging to e.g. log4net -->
        <sectionGroup name="enyim.com">
            <section name="log" type="Enyim.Caching.Configuration.LoggerSection, Enyim.Caching" />
        </sectionGroup>

        <!-- (Optional, but recommended) A configuration section that allows specific settings to be set for different NHibernate caching regions -->
        <section name="membaseNhibernateCache" type="MembaseCacheProvider.MembaseNhConfiguration, MembaseCacheProvider" />

    </configSections>

    <!-- (Optional) Defines that log4net should be used for Membase client logging, remember to define log4net configuration -->
    <enyim.com>
        <log factory="Enyim.Caching.Log4NetFactory, Enyim.Caching.Log4NetAdapter" />
    </enyim.com>

    <!-- (Required) Maps to the configuration section named "membase", defines a single server url and the "default" bucket -->
    <!-- Other configuration settings are available, look in the Enyim documentation and examples -->
    <!-- here http://memcached.enyim.com/ or here https://github.com/enyim/memcached-providers -->
    <membase>
        <servers bucket="default">
            <add uri="http://localhost:8091/pools/default" />
        </servers>
    </membase>

    <!-- (Optional) Maps to the configuration section named "membaseNhibernate", defines a single server url and the bucket "default_nhibernate" -->
    <membaseNhibernate>
        <servers bucket="default_nhibernate">
            <add uri="http://localhost:8091/pools/default" />
        </servers>

        <!-- (Optional) Sets specific settings for the socket pool -->
        <socketPool minPoolSize="10" maxPoolSize="100" connectionTimeout="00:00:10" />

        <!-- (Optional) Sets a specific locator -->
        <locator type="Enyim.Caching.Memcached.DefaultNodeLocator, Enyim.Caching" />
    </membaseNhibernate>

    <!-- (Optional, but recommended) Configuration settings of NHibernate second level cache, Membase provider -->
    <!-- The section attribute defines that the Membase configuration names "membaseNhibernate" should be used -->
    <membaseNhibernateCache section="membaseNhibernate">

        <!-- Here expiration details are specified for the NHibernate cache regions LongTerm and ShortTerm -->
        <!-- Expirations are defined in minutes, meaning LongTerm = 60 minutes and ShortTerm = 2 minutes -->
        <cache region="LongTerm" expiration="60" />
        <cache region="ShortTerm" expiration="2" />
    </membaseNhibernateCache>

</configuration>

Membase local development machine ip problem – FIXED

Membase caching stops working

I struggled with Membase last night. I added NHibernate second-level caching, ASP.NET session caching, output caching and custom caching using some new Membase providers.

It worked at first, but then suddenly nothing was cached, and nothing was retrieved from the cache. I was working on my development machine, and it seems the problem is caused by switching networks.

Well, here is how it was solved. First navigate to the Membase server directory (C:\Program Files\Membase\Server\bin), then execute the following commands:

service_stop.bat
service_unregister.bat
service_register.bat ns_1@127.0.0.1
service_start.bat

Read the original here: http://blog.danhulton.com/2011/02/05/membase-on-windows-7-ip-address-fix/

Membase cache item expiration issue

Membase can be used as a distributed caching platform, and best of all… it is free to use. This is from their website:

Unlimited use in development and up to two nodes may be deployed for free in a production cluster, with modest additional prices for additional nodes.

When I started to develop and test Membase, I wondered: why does the total number of cached items keep increasing? Why does it never decrease?

I expected items to be removed when their expiration time passed, but they didn’t and I started to wonder whether I was doing something wrong.

It turns out that Membase does not actively clean up expired items. If an item has expired, it is not removed until the next Get, or when the cache is full and items need to be evicted.

Here is a very good description of the logic (Original here):

Your understanding of evictions is correct. When a new item is added and there is no space for it, an older item must be thrown away. One key thing to understand is that expiration and eviction have nothing to do with one another.

When an item expires, nothing actually happens. It is only upon the next access of that item that the server will notice it has expired and remove it from memory. There is also no guarantee that already expired items will be evicted first, though it practicality they are usually old enough that they tend to be among the first to get evicted (but nothing in the server controls this).

Now, onto the nuances of evictions. Within memcached there is a slab allocator which handles memory allocation for items. Basically, the whole memory space is broken up into 1mb pages which are then broken up into slabs of varying sizes. There are configuration options to control this, but that’s the default mechanism. Each size is called a “slab class” and has its own eviction logic which means that if one slab class fills up, you can’t reclaim memory from other slab classes. Depending on the variation in size of the objects that you are putting in, this could become more or less of a problem for you. Can you telnet to your servers and run ‘stats’ and ‘stats slabs’ and paste the output? That will let me understand exactly where memory is being taken up in your server.

Another key point to understand is that evictions are not always a bad thing. With changing workloads you can end up with lots of stale data still stored in cache that just needs to take some time to be pushed out. If everything seems to be running fine (with the exception of the evictions) then you probably don’t really need to take any action. Are you seeing a large number of misses as well? High misses and high evictions can usually be correlated to mean that the application is expecting data to be there when it’s not, and that would be a problem. If the application is receiving the data that it’s asking for, your cache is working correctly.

So if it seems like the total number of your Membase cache items just increases and increases, don’t worry: it is normal behavior. Just make sure you have configured the server instance with the recommended amount of server memory and disk resources. Take a look at the best practices.

Sample TwitterTest authentication and console application

How to make your application or service tweet

My case: I have a Windows service, and I want to be able to know the internal state of this service, e.g. when an exception was thrown. The obvious answer for this is logging. But I already use Log4Net, and I want to be able to access this info without having to remote to the server.

Others have solved this issue by making e.g. Kayak part of the service and exposing that state as JSON. That does sound tempting; however, I am more inclined towards some kind of logging mechanism. I know that Twitter appenders for Log4Net exist, but by using something existing I will not learn anything :)

So what I want to implement is custom Twitter logging, and here is how.

  1. Register your application with Twitter (http://dev.twitter.com/apps/new)
  2. Save your consumerKey and consumerSecret
  3. Run TwitterAuthenticator to retrieve the Access token. Save Token and TokenSecret
  4. Use the TweetConsole as a sample for Tweeting. Use the consumerKey, consumerSecret and the Access Token details

That is it…

You can find the source code here: https://bitbucket.org/ovesen/twittertest
