Hello world - My first OpenCL program - György Kovács OpenCL

Chapter 2. My first OpenCL program

2. Hello world

The aim of OpenCL is to generalize the programming of high-performance computing devices by hiding the differences between the underlying hardware. As a drawback of this generality, the initialization of the OpenCL environment requires several steps that must be carried out in every program using OpenCL. Therefore, the simplest "Hello world!" application is much longer than the introductory programs of other parallel programming technologies, such as OpenMP or Pthreads.

OpenCL cannot be discussed in detail before the reader gets to know the abstract architecture of the OpenCL device. Therefore, one has to become familiar with many concepts before every line of a sample program is clear. To give the reader a first impression of these concepts and of the structure of OpenCL programs, and to provide an application that can be used to test whether the environment is properly configured, we present a short sample code and discuss its structure and the main steps of compilation and linking. The sample code simply writes the string "Hello world!" to the standard output; however, the letters of the string are put together in parallel on the OpenCL device.

The first step is to write the piece of code that runs on the OpenCL device, called the kernel code. This code is stored in a file with the extension .k, referring to "kernel". In fact, any extension can be used: the file is neither compiled nor linked when the application is built, so no compiler or linker has to recognize its extension, and we do not have to follow any naming convention. The kernel is written in the OpenCL C language, which is based on the C language (ISO C99) and extends it with some elements related to parallel execution. The OpenCL C language is covered in a later chapter of the book. Nevertheless, the code can be easily understood without knowing the novel elements of OpenCL C.

      ...
      break;
  }
}

The parameter of the kernel function is a string. The kernel determines the index of the kernel instance running in parallel and sets the character at the same index of the string to the proper letter. The index of the kernel instance, namely the return value of the function get_global_id, is discussed in detail later.
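The behaviour described above can be tried out without an OpenCL device. The following is a minimal sketch in plain C, not the kernel of the book: the hypothetical function hello_world_work_item plays the role of one kernel instance and receives its index as an explicit parameter, standing in for the value that get_global_id(0) would return on the device. The helper run_all_work_items and the value 13 of ARRAY_SIZE are likewise assumptions made for illustration, and a lookup table is used where the original kernel apparently uses a switch statement.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed buffer size: 12 characters of "Hello world!" plus the terminator. */
#define ARRAY_SIZE 13

/* Emulates one kernel instance: gid stands in for get_global_id(0). */
void hello_world_work_item(char *output, size_t gid)
{
    static const char text[ARRAY_SIZE] = "Hello world!";

    assert(output != NULL);
    if (gid < ARRAY_SIZE) {
        /* Each instance writes exactly one character, at its own index. */
        output[gid] = text[gid];
    }
}

/* Emulates the launch. On a device the instances would run in parallel;
   here they run one after another, in reverse order, to stress that the
   result does not depend on the execution order. */
void run_all_work_items(char *output, size_t global_work_size)
{
    size_t gid = global_work_size;

    while (gid > 0) {
        gid--;
        hello_world_work_item(output, gid);
    }
}
```

Calling run_all_work_items(buffer, ARRAY_SIZE) on a buffer of ARRAY_SIZE characters leaves the complete, terminated string in the buffer, whatever order the instances run in.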

The OpenCL functions called from the main function and the main parts of the program are discussed after the source code.

/* Reports err through printErrorString unless it is CL_SUCCESS. */
#define ERROR(err, prefix) \
  do { if ( (err) != CL_SUCCESS ) printErrorString(stderr, (err), (prefix)); } while (0)

/* Writes a readable message for the error code err to the stream o, prefixed by p. */
int printErrorString(FILE* o, int err, const char* p) {
  switch(err) {
    case CL_SUCCESS: fprintf(o, "%s: Success.\n", p); break;
    case CL_DEVICE_NOT_FOUND: fprintf(o, "%s: Device not found.\n", p); break;
    case CL_DEVICE_NOT_AVAILABLE: fprintf(o, "%s: Device not available.\n", p); break;
    case CL_COMPILER_NOT_AVAILABLE: fprintf(o, "%s: Compiler not available.\n", p); break;
    case CL_MEM_OBJECT_ALLOCATION_FAILURE: fprintf(o, "%s: Mem. obj. alloc. fail.\n", p); break;
    case CL_OUT_OF_RESOURCES: fprintf(o, "%s: Out of resources.\n", p); break;
    case CL_OUT_OF_HOST_MEMORY: fprintf(o, "%s: Out of host memory.\n", p); break;
    case CL_BUILD_PROGRAM_FAILURE: fprintf(o, "%s: Program build failure.\n", p); break;
    case CL_INVALID_VALUE: fprintf(o, "%s: Invalid value.\n", p); break;
    case CL_INVALID_DEVICE_TYPE: fprintf(o, "%s: Invalid device type.\n", p); break;
    case CL_INVALID_PLATFORM: fprintf(o, "%s: Invalid platform.\n", p); break;
    case CL_INVALID_DEVICE: fprintf(o, "%s: Invalid device.\n", p); break;
    case CL_INVALID_CONTEXT: fprintf(o, "%s: Invalid context.\n", p); break;
    case CL_INVALID_QUEUE_PROPERTIES: fprintf(o, "%s: Invalid queue properties.\n", p); break;
    case CL_INVALID_COMMAND_QUEUE: fprintf(o, "%s: Invalid command queue.\n", p); break;
    case CL_INVALID_HOST_PTR: fprintf(o, "%s: Invalid host pointer.\n", p); break;
    case CL_INVALID_MEM_OBJECT: fprintf(o, "%s: Invalid memory object.\n", p); break;
    case CL_INVALID_BINARY: fprintf(o, "%s: Invalid binary.\n", p); break;
    case CL_INVALID_BUILD_OPTIONS: fprintf(o, "%s: Invalid build options.\n", p); break;
    case CL_INVALID_PROGRAM: fprintf(o, "%s: Invalid program.\n", p); break;
    case CL_INVALID_PROGRAM_EXECUTABLE: fprintf(o, "%s: Invalid program exec.\n", p); break;
    case CL_INVALID_KERNEL_NAME: fprintf(o, "%s: Invalid kernel name.\n", p); break;
    case CL_INVALID_KERNEL_DEFINITION: fprintf(o, "%s: Invalid kernel def.\n", p); break;
    case CL_INVALID_KERNEL: fprintf(o, "%s: Invalid kernel.\n", p); break;
    case CL_INVALID_ARG_INDEX: fprintf(o, "%s: Invalid argument index.\n", p); break;
    case CL_INVALID_ARG_VALUE: fprintf(o, "%s: Invalid argument value.\n", p); break;
    case CL_INVALID_ARG_SIZE: fprintf(o, "%s: Invalid argument size.\n", p); break;
    case CL_INVALID_KERNEL_ARGS: fprintf(o, "%s: Invalid kernel arguments.\n", p); break;
    case CL_INVALID_WORK_DIMENSION: fprintf(o, "%s: Invalid work dimension.\n", p); break;
    case CL_INVALID_WORK_GROUP_SIZE: fprintf(o, "%s: Invalid work group size.\n", p); break;
    case CL_INVALID_WORK_ITEM_SIZE: fprintf(o, "%s: Invalid work item size.\n", p); break;
    case CL_INVALID_GLOBAL_OFFSET: fprintf(o, "%s: Invalid global offset.\n", p); break;
    case CL_INVALID_EVENT_WAIT_LIST: fprintf(o, "%s: Invalid event wait list.\n", p); break;
    case CL_INVALID_OPERATION: fprintf(o, "%s: Invalid operation.\n", p); break;
    default: fprintf(o, "%s: Unknown error.\n", p); break;
  }
  fflush(o);
  /* The function is declared int, so hand the code back to the caller. */
  return err;
}

void readSourceProgram(char* filename, char** source, size_t* length) {
  ...
  fgets(*source + *length, maxLength, input);
  *length= strlen(*source);
  ...

  /** Querying and creating the platform, device, context and command queue. */
  err= clGetPlatformIDs(MAX_PLATFORMS, platforms, &numPlatforms);
  ...
    properties[i*2 + 1]= (cl_context_properties)(platforms[i]);
  }
  context= clCreateContextFromType(properties, CL_DEVICE_TYPE_GPU, NULL, NULL, &err);
  ERROR(err, "clCreateContextFromType");
  err= clGetContextInfo(context, CL_CONTEXT_NUM_DEVICES, sizeof(cl_uint), &numDevices, &size);
  ERROR(err, "clGetContextInfo");
  ...
  queue= clCreateCommandQueue(context, devices[0], 0, &err);
  ERROR(err, "clCreateCommandQueue");

  /** Creating the memory object. */
  memobj= clCreateBuffer(context, 0, ARRAY_SIZE * sizeof(char), NULL, &err);
  ERROR(err, "clCreateBuffer");

  /** Compiling the kernel code, creating the kernel object. */
  program= clCreateProgramWithSource(context, 1, (const char**)&kernelSource, NULL, &err);
  ERROR(err, "clCreateProgramWithSource");
  err= clBuildProgram(program, numDevices, devices, NULL, NULL, NULL);
  ERROR(err, "clBuildProgram");
  kernel= clCreateKernel(program, "helloWorld", &err);
  ERROR(err, "clCreateKernel");

  /** Setting the kernel arguments. */
  err= clSetKernelArg(kernel, 0, sizeof(memobj), &memobj);
  ERROR(err, "clSetKernelArg");

  /** Scheduling the kernel executions, reading the result. */
  global_work_size= ARRAY_SIZE;
  err= clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, NULL);
  ERROR(err, "clEnqueueNDRangeKernel");

  /** Reading and printing the results (CL_TRUE requests a blocking read). */
  err= clEnqueueReadBuffer(queue, memobj, CL_TRUE, 0, sizeof(char)*ARRAY_SIZE, output, 0, NULL, NULL);
  ERROR(err, "clEnqueueReadBuffer");
  printf("%s\n", output);

  /** Releasing the allocated resources. */
  clReleaseMemObject(memobj);
  ...

At the beginning, some conventional headers of the standard library and the CL/opencl.h header are included. The latter contains the declarations of all functions belonging to the C interface of the OpenCL standard.

For the sake of clear code, some constants and the macro ERROR are defined. The macro is used to handle error codes. Generally, OpenCL reports runtime errors through error codes. Since many OpenCL functions take as many as a dozen parameters, one has to expect many possible error codes. The code becomes cleaner and easier to read if these codes are handled by a separate function. The function that handles all the error codes that can turn up in the "Hello world!" application is called printErrorString. It takes three parameters: the first is the output stream (a FILE*) to which the error message is written; this can also be the standard output. The second parameter is the error code itself, and the third is a prefix that is printed before the error message; it can be used to mark the position in the code where the error occurred. The macro ERROR shortens the use of printErrorString by passing stderr as the first argument and by suppressing the messages related to successful function calls.
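The pattern described here can be sketched independently of OpenCL. In the following plain C sketch, the status codes DEMO_SUCCESS, DEMO_DEVICE_NOT_FOUND and DEMO_OUT_OF_RESOURCES, the functions demo_status_text and demo_print_error_string, and the macro DEMO_ERROR are hypothetical stand-ins for CL_SUCCESS, the other OpenCL error codes, printErrorString and ERROR; they are not part of any OpenCL header. Wrapping the macro in do { ... } while (0) is an addition of this sketch, a common C idiom that keeps the macro safe inside if/else statements.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes standing in for the OpenCL error codes. */
enum demo_status {
    DEMO_SUCCESS = 0,
    DEMO_DEVICE_NOT_FOUND = -1,
    DEMO_OUT_OF_RESOURCES = -5
};

/* Maps a status code to a readable text. */
const char *demo_status_text(int err)
{
    switch (err) {
    case DEMO_SUCCESS:          return "Success.";
    case DEMO_DEVICE_NOT_FOUND: return "Device not found.";
    case DEMO_OUT_OF_RESOURCES: return "Out of resources.";
    default:                    return "Unknown error.";
    }
}

/* Writes "<prefix>: <message>" to the stream o and returns the value
   reported by fprintf (negative on an output error). */
int demo_print_error_string(FILE *o, int err, const char *prefix)
{
    int written;

    assert(o != NULL && prefix != NULL);
    written = fprintf(o, "%s: %s\n", prefix, demo_status_text(err));
    fflush(o);
    return written;
}

/* Reports only failures, always to stderr; successful calls stay silent. */
#define DEMO_ERROR(err, prefix)                                   \
    do {                                                          \
        if ((err) != DEMO_SUCCESS)                                \
            demo_print_error_string(stderr, (err), (prefix));     \
    } while (0)
```

With these definitions, a call such as DEMO_ERROR(status, "createBuffer") placed after each step prints nothing on success and one prefixed line on failure, which is the role the macro ERROR plays in the listing.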

The function readSourceProgram is used to read the kernel source code from a text file. The first parameter is the name of the file containing the source code (HelloWorld.k), the second parameter is a pointer to the string through which the kernel code itself is returned, and the length of the kernel code is returned through the third parameter.
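Only a fragment of readSourceProgram is visible in the listing, so the following is a sketch of one possible way to load a source file into a string, not the implementation of the book. The function name read_text_file, its int return value and the chunk size are assumptions made for illustration; only standard C library calls are used.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 1024  /* assumed number of bytes read per fread call */

/* Reads the whole file into a freshly allocated, NUL-terminated string.
   Returns 0 on success and -1 on failure; on success the caller owns
   *source and has to free() it. */
int read_text_file(const char *filename, char **source, size_t *length)
{
    FILE *input;
    char *buffer = NULL;
    size_t used = 0, capacity = 0, got;

    assert(filename != NULL && source != NULL && length != NULL);

    input = fopen(filename, "rb");
    if (input == NULL)
        return -1;

    do {
        /* Grow the buffer so that one more chunk and the terminator fit. */
        if (capacity - used < CHUNK + 1) {
            size_t new_capacity = capacity + 4 * CHUNK;
            char *grown = realloc(buffer, new_capacity);
            if (grown == NULL) {
                free(buffer);
                fclose(input);
                return -1;
            }
            buffer = grown;
            capacity = new_capacity;
        }
        got = fread(buffer + used, 1, CHUNK, input);
        used += got;
    } while (got == CHUNK);

    if (ferror(input)) {
        free(buffer);
        fclose(input);
        return -1;
    }
    fclose(input);

    buffer[used] = '\0';
    *source = buffer;
    *length = used;
    return 0;
}
```

The returned string can be passed on as the kernel source, in the same way as the variable kernelSource is used in the listing.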

The main function can be divided into several logical parts.

1. lines 94-114: The required variables are declared and the identifiers of the available OpenCL devices are queried. Note that a computer can contain several different devices supporting OpenCL; moreover, those devices can support different versions of OpenCL. Therefore, in every OpenCL program we have to determine explicitly which OpenCL device we want to use for computation. The parameter CL_DEVICE_TYPE_GPU of the call to clCreateContextFromType expresses the intention to use graphics processors. Thus, the available GPUs are used to create the abstract context of the computation. Altogether, the first step is the discovery and identification of the OpenCL devices.

2. lines 116-118: In the second step, memory is allocated on the OpenCL device, since the program running on the CPU usually cannot access the memory of the OpenCL device directly. In most applications this step is unavoidable, since the data to be processed has to be uploaded to the memory of the OpenCL device.

3. lines 120-129: In the third step, the kernel source code is built for execution. This step can be more complicated if the kernel source code is assembled from several source fragments or from already compiled libraries. Note that the kernel code is handled as a simple string; it is built for execution only at run time.

4. lines 130-132: In the fourth step, the arguments of the built kernel are set: in our case, the only argument is the identifier of the memory region allocated on the OpenCL device. Thinking back to the kernel code, it is easy to imagine that this identifier is translated to a real address in the memory of the OpenCL device and that the kernel function works with that address as its parameter. Now the kernel is ready to run; the only thing left to specify is the number of parallel instances.

5. lines 134-144: The execution of the kernel is launched by calling the function clEnqueueNDRangeKernel. With the parameters of the function we specify that the kernel should run global_work_size times, with the index range [0, ARRAY_SIZE-1]. During the execution, the kernel function runs once with each element of the index range as its global index and sets one character of the parameter string to the proper letter. Finally, when all the kernel instances have finished, the string "Hello world!" is complete.

6. lines 142-147: The contents of the memory region allocated on the OpenCL device are read by calling the function clEnqueueReadBuffer, and the string is written to the standard output by the function printf.

7. lines 148-158: In the last step, the allocated resources are released.
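Steps 2, 5 and 6 of the list above can be mimicked on the host to make the data flow visible. The following plain C sketch does not call OpenCL at all: the hypothetical functions fake_device_alloc, fake_enqueue_1d and fake_read_buffer merely stand in for the roles of clCreateBuffer, clEnqueueNDRangeKernel and clEnqueueReadBuffer, and a separately allocated block plays the part of the device memory that the host program cannot touch directly. The sample kernel fill_letters is also an assumption made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Signature of an emulated kernel: one call handles one global index. */
typedef void (*work_item_fn)(char *device_mem, size_t gid);

/* Stand-in for step 2: reserve zero-filled memory "on the device". */
char *fake_device_alloc(size_t bytes)
{
    return calloc(bytes, 1);
}

/* Stand-in for step 5: run the kernel once for each index in
   [0, global_work_size - 1]; a real device would do this in parallel. */
void fake_enqueue_1d(work_item_fn kernel, char *device_mem,
                     size_t global_work_size)
{
    size_t gid;

    assert(kernel != NULL && device_mem != NULL);
    for (gid = 0; gid < global_work_size; gid++)
        kernel(device_mem, gid);
}

/* Stand-in for step 6: copy the result back into host memory. */
void fake_read_buffer(const char *device_mem, char *host_mem, size_t bytes)
{
    assert(device_mem != NULL && host_mem != NULL);
    memcpy(host_mem, device_mem, bytes);
}

/* Sample kernel: instance gid writes the gid-th lower-case letter
   (assumes an ASCII-compatible character set). */
void fill_letters(char *device_mem, size_t gid)
{
    device_mem[gid] = (char)('a' + (int)(gid % 26));
}
```

A host program would call the three stand-ins in this order and finally free() the emulated device block, which corresponds to step 7.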

The order of the steps is somewhat flexible; for example, memory can be allocated on the OpenCL device right after the kernel code has been built. However, we can state that the functions called in the separate blocks are necessarily present in every application using OpenCL.
