5. Runtime layer

for ( i= 0; i < numDevices; ++i )
    binaryCodes[i]= (unsigned char*)malloc(sizeof(unsigned char)*(binaryLengths[i]));

/* A single call fills the whole array of binary pointers, so the query is not repeated for each device. */
err= clGetProgramInfo(binary, CL_PROGRAM_BINARIES, sizeof(unsigned char*)*numDevices, binaryCodes, &size);

The file discriminant.k contains the implementation of function discriminant; discriminant.h contains the specification of function discriminant, and the source code secondOrder.k begins by including the header d.h.

The first lines of the main function initialize OpenCL; then the OpenCL C source files are read and the program objects are instantiated. First, the program object sS containing the source of function secondOrder is compiled to binary code, and the output of the compiler is written to the standard output. Although the function secondOrder calls the function discriminant, secondOrder compiles without errors, since the implementation of discriminant is not required for compilation. Note that the source discriminant.h is passed to clCompileProgram, and we have also specified that the source of discriminant.h is included under the name d.h. The second call of clCompileProgram creates binary code from the source code of discriminant.k, and again the output of the compiler is written to the standard output. In the first call of clLinkProgram, the option -create-library specifies that the aim is to create a program library from the program objects containing binary code. The last step is to link the library with the program object sS containing the binary code of the kernel function secondOrder; the result is the program object binary containing the executable code.

5.6. Kernel objects

Let us compare the binary codes written to the disk in the examples of the previous section. On NVIDIA devices the binary codes are, surprisingly, the same: no matter how the source code is split up, the OpenCL C compiler replaces the calls of function discriminant by the body of the function. In other words, the function discriminant is treated as an inline function. In general, OpenCL compilers targeting GPUs handle simple functions as inline functions, and other types of OpenCL-capable hardware can show similar behavior. Accordingly, the question may arise: if the OpenCL C code can be reorganized in arbitrary ways by the OpenCL compiler (including the elimination of functions), how can one specify the entry point of the executable code? How can we specify the function where the parallel execution starts on the OpenCL device?

Before answering the question, let us compile the file discriminant.k, which contains the implementation of function discriminant, on its own, without the contents of the file secondOrder.k. The binary code written to the disk turns out to be empty:

user@linux> cat discriminant.b
//
// Generated by NVIDIA NVVM Compiler
// Compiler built on Wed Mar 27 23:57:44 2013 (1364425064)
// Driver 310.44

.version 3.0
.target sm_11, texmode_independent
.address_size 32

How can this be explained? OpenCL C compilers are highly implementation dependent. If the hardware architecture favors minimizing the number of function calls (GPU processors, for example, do have a program stack to maintain nested function calls, but using it is expensive), function calls can be replaced by the bodies of the called functions. In other cases (for example on CPUs) the OpenCL compiler may leave the functions intact, because a function call is less expensive there than on GPUs. Among these heterogeneous implementations, the functions declared with the __kernel qualifier, the so-called kernel functions, are fixed points. The entry points of parallel execution on OpenCL devices are always kernel functions. Kernel functions are identified by their names, so they are never eliminated by inlining. Moreover, kernel functions never return values: they run for every element of the index range, which would produce plenty of unnecessary return values.
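The properties of kernel functions listed above can be illustrated with a short sketch. The bodies of discriminant and secondOrder below are hypothetical, since the original sources are not reproduced here. The empty macros, the get_global_id stub and the helper run_secondOrder are assumptions that let the same text compile as plain C on the host; on a device the OpenCL C compiler provides the qualifiers and the index.

```c
/* Host-side stand-ins (assumptions) so that the kernel text compiles as plain C.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
static size_t emulated_id = 0;
static size_t get_global_id(unsigned int dim) { (void)dim; return emulated_id; }
#endif

/* Ordinary helper function: the compiler is free to inline it. */
float discriminant(float a, float b, float c)
{
    return b * b - 4.0f * a * c;
}

/* Kernel function: a named entry point that returns void and reports
   its result through a buffer, one element per work item.
   Hypothetical body: counts the real roots of a*x^2 + b*x + c. */
__kernel void secondOrder(__global const float* a, __global const float* b,
                          __global const float* c, __global int* numRoots)
{
    size_t i = get_global_id(0);
    float d = discriminant(a[i], b[i], c[i]);
    numRoots[i] = d > 0.0f ? 2 : (d == 0.0f ? 1 : 0);
}

#ifndef __OPENCL_VERSION__
/* Hypothetical host-side emulation of a one-dimensional index range:
   the kernel is called once for every index. */
void run_secondOrder(const float* a, const float* b, const float* c,
                     int* numRoots, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL && numRoots != NULL);
    for (emulated_id = 0; emulated_id < n; ++emulated_id)
        secondOrder(a, b, c, numRoots);
}
#endif
```

Calling run_secondOrder with arrays of n coefficients fills numRoots with one result per index, which mirrors how a kernel communicates through buffers instead of return values.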

In the light of kernel functions, the empty code of the example can be interpreted as follows: the NVIDIA OpenCL 1.1 compiler considers code without kernel functions to be code without an entry point, so the non-kernel functions are not compiled. Note that OpenCL 1.1 does not specify the functions clCompileProgram and clLinkProgram. On NVIDIA devices the build is performed in one step by the function clBuildProgram, and creating an executable without an entry point makes no sense at all. In general, this kind of behaviour is neither specified in the OpenCL standard nor described in the official guides of the vendors of OpenCL implementations. The best way to figure out the behaviour of the compiler and linker is to compile OpenCL C source files and try to interpret the binary code⁶⁵.

Executable program objects always contain some kernel functions, the names of the kernels and, if the option -cl-kernel-arg-info was specified when the program was compiled (with clBuildProgram or clCompileProgram), also the description of the parameters of the kernels. However, a single program object, like a single OpenCL C source file, can contain many kernel functions.

Thus, a program object is not enough to identify a kernel function, that is, the entry point of a parallel execution. Kernel functions are identified and handled by the programmer through kernel objects. Kernel objects can be created from program objects containing executable code by the function clCreateKernel (OpenCL 1.0)⁶⁶.

Specification:

cl_kernel clCreateKernel( cl_program program,
                          const char* kernel_name,
                          cl_int* errcode_ret);

Parameters: program - A program object containing executable code.

kernel_name - The name of the kernel defined in the program object program.

errcode_ret - The error code is written to this address.

Return value: A valid kernel object in the case of successful execution; otherwise NULL is returned and the error code is written to errcode_ret.

⁶⁵ As the saying among Java programmers goes: "Nobody is a Java programmer until he has recognized that the first two bytes of Java class files are 0xCAFE in hexadecimal."

⁶⁶ http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clCreateKernel.html

If the goal is to create kernel objects from all the kernels defined in a program object, the function clCreateKernelsInProgram (OpenCL 1.0)⁶⁷ can be used.

Specification:

cl_int clCreateKernelsInProgram( cl_program program,
                                 cl_uint num_kernels,
                                 cl_kernel* kernels,
                                 cl_uint* num_kernels_ret);

Parameters: program - Program object containing executable code.

num_kernels - The size of preallocated array kernels.

kernels - The array where the created kernel objects are written.

num_kernels_ret - The number of created kernel objects is written to this address.

Return value: Error code in the case of unsuccessful execution, CL_SUCCESS otherwise.

Although the function clCreateKernelsInProgram can be used to create kernel objects for all the kernels defined in a program object, the names and arguments of these kernels are still unknown. Among other things, this problem is remedied by the function clGetKernelInfo (OpenCL 1.0)⁶⁸. Its arguments are similar to those of the other clGet*Info functions. The constants specifying the properties, together with the types and short descriptions of the properties, are summarized in table 4.15.

Table 4.15. The constants specifying the properties of kernel objects and the types and short descriptions of the properties

cl_kernel_info               Type         Description
CL_KERNEL_FUNCTION_NAME      char[]       Name of the kernel function.
CL_KERNEL_NUM_ARGS           cl_uint      Number of arguments of the kernel function.
CL_KERNEL_REFERENCE_COUNT    cl_uint      The value of the reference counter.
CL_KERNEL_CONTEXT            cl_context   The context associated with the kernel.
CL_KERNEL_PROGRAM            cl_program   The program object containing the definition of the kernel.
CL_KERNEL_ATTRIBUTES         char[]       The list of attributes of the function specified by the __attribute__ qualifier, separated by spaces.

In fact, handling kernel functions as objects makes OpenCL resemble interpreters rather than conventional C and C++ applications: executable functions with arbitrary parameters can be loaded and their properties queried at runtime. If even the types of the arguments could be queried, kernel functions could be handled at the level of abstraction provided by interpreters. Since the host program is only middleware between the host machine and the OpenCL device, the OpenCL standard enables this abstract handling of kernel arguments as well: the function clGetKernelArgInfo (OpenCL 1.2)⁶⁹ can be used to query the properties of kernel arguments.

⁶⁷ http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clCreateKernelsInProgram.html

⁶⁸ http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetKernelInfo.html

Specification:

cl_int clGetKernelArgInfo( cl_kernel kernel,
                           cl_uint arg_indx,
                           cl_kernel_arg_info param_name,
                           size_t param_value_size,
                           void* param_value,
                           size_t* param_value_size_ret);

Parameters: kernel - The kernel object whose argument is queried.

arg_indx - The index of an argument of the kernel object.

param_name - A constant specifying the property to query. Possible values are summarized in table 4.16.

param_value_size - The number of bytes that can be written to the address param_value.

param_value - The address where the value of the property is written.

param_value_size_ret - The number of bytes written to the address param_value.

Return value: Error code in the case of unsuccessful execution, CL_SUCCESS otherwise.
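The three parameters param_value_size, param_value and param_value_size_ret are typically used in two steps: the first call asks only for the required size, the second call fetches the value into a buffer of that size. The following sketch shows this pattern. The function mock_get_info is a hypothetical stand-in that only imitates the calling convention, so the example compiles without an OpenCL SDK; it is not part of OpenCL, and its return codes (0 and -1) are assumptions. With a real kernel object, clGetKernelInfo or clGetKernelArgInfo would be called in its place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a clGet*Info call (NOT an OpenCL function):
   it "knows" one string property and follows the same calling convention. */
static int mock_get_info(size_t param_value_size, void* param_value,
                         size_t* param_value_size_ret)
{
    static const char value[] = "secondOrder";
    if (param_value != NULL) {
        if (param_value_size < sizeof(value))
            return -1;                      /* buffer too small */
        memcpy(param_value, value, sizeof(value));
    }
    if (param_value_size_ret != NULL)
        *param_value_size_ret = sizeof(value);
    return 0;                               /* plays the role of CL_SUCCESS */
}

/* Two-step query: ask for the size first, then fetch the value.
   Returns a malloc'ed string that the caller frees, or NULL on failure. */
char* query_string(void)
{
    size_t size = 0;
    char* buffer;

    if (mock_get_info(0, NULL, &size) != 0 || size == 0)
        return NULL;
    buffer = (char*)malloc(size);
    if (buffer == NULL)
        return NULL;
    if (mock_get_info(size, buffer, NULL) != 0) {
        free(buffer);
        return NULL;
    }
    return buffer;
}
```

Querying the size first avoids guessing an upper bound for string properties such as names and type names.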

Table 4.16. The constants specifying the properties of kernel arguments, the types and short descriptions of the properties

cl_kernel_arg_info                 Type                               Description
CL_KERNEL_ARG_ADDRESS_QUALIFIER    cl_kernel_arg_address_qualifier    The address space of the argument. Possible values: CL_KERNEL_ARG_ADDRESS_GLOBAL, CL_KERNEL_ARG_ADDRESS_LOCAL, CL_KERNEL_ARG_ADDRESS_CONSTANT, CL_KERNEL_ARG_ADDRESS_PRIVATE.
CL_KERNEL_ARG_TYPE_NAME            char[]                             The name of the type of the argument.
CL_KERNEL_ARG_TYPE_QUALIFIER       cl_kernel_arg_type_qualifier       The modifier of the argument. Possible values: CL_KERNEL_ARG_TYPE_CONST, CL_KERNEL_ARG_TYPE_RESTRICT, CL_KERNEL_ARG_TYPE_VOLATILE, CL_KERNEL_ARG_TYPE_NONE.
CL_KERNEL_ARG_NAME                 char[]                             The name of the argument.

⁶⁹ http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetKernelArgInfo.html
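The type qualifier is a bit field, so several of the listed values can be set at the same time. As a minimal sketch, the value could be turned into readable text as follows. The ARG_TYPE_* macros are local mirrors of the corresponding CL_KERNEL_ARG_TYPE_* bit values, defined here only as an assumption so that the example compiles without the OpenCL headers, and describe_type_qualifier is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local mirrors of the CL_KERNEL_ARG_TYPE_* bit values (assumption: defined
   here only so that the sketch compiles without CL/cl.h). */
#define ARG_TYPE_NONE     0u
#define ARG_TYPE_CONST    (1u << 0)
#define ARG_TYPE_RESTRICT (1u << 1)
#define ARG_TYPE_VOLATILE (1u << 2)

/* Writes a space-separated list of the qualifiers set in q into out. */
void describe_type_qualifier(unsigned long long q, char* out, size_t outSize)
{
    static const struct { unsigned long long bit; const char* name; } table[] = {
        { ARG_TYPE_CONST,    "const"    },
        { ARG_TYPE_RESTRICT, "restrict" },
        { ARG_TYPE_VOLATILE, "volatile" }
    };
    size_t i, used = 0;

    assert(out != NULL && outSize > 0);
    out[0] = '\0';
    if (q == ARG_TYPE_NONE) {
        snprintf(out, outSize, "none");
        return;
    }
    for (i = 0; i < sizeof(table) / sizeof(table[0]); ++i) {
        if ((q & table[i].bit) != 0 && used < outSize) {
            /* Prefix every entry except the first with a space. */
            int n = snprintf(out + used, outSize - used, "%s%s",
                             used > 0 ? " " : "", table[i].name);
            if (n > 0)
                used += (size_t)n;
        }
    }
}
```

With the real headers, the value returned for CL_KERNEL_ARG_TYPE_QUALIFIER could be passed to such a helper after replacing the local macros by the CL_KERNEL_ARG_TYPE_* constants.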

Similarly to program objects, kernel objects are created dynamically, and the retain/release model has to be used to handle their references properly. The function clRetainKernel (OpenCL 1.0)⁷⁰ increases and the function clReleaseKernel (OpenCL 1.0)⁷¹ decreases the value of the reference counter.
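The retain/release model itself can be sketched in a few lines of plain C. The type Object and the functions object_create, object_retain and object_release are hypothetical and only model the idea of a reference counter; they are not the OpenCL types or functions. The sketch assumes that an object starts with a counter of 1 and is destroyed when the counter reaches 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; NOT the cl_kernel type. */
typedef struct {
    unsigned int refCount;
    int* destroyed;        /* set to 1 on destruction, for demonstration only */
} Object;

/* A new object starts with a reference count of 1. */
Object* object_create(int* destroyed)
{
    Object* o = (Object*)malloc(sizeof(Object));
    if (o != NULL) {
        o->refCount = 1;
        o->destroyed = destroyed;
    }
    return o;
}

/* Every additional owner increments the counter. */
void object_retain(Object* o)
{
    assert(o != NULL);
    ++o->refCount;
}

/* Every owner releases once; the last release frees the object. */
void object_release(Object* o)
{
    assert(o != NULL && o->refCount > 0);
    if (--o->refCount == 0) {
        if (o->destroyed != NULL)
            *o->destroyed = 1;
        free(o);
    }
}
```

Each retain has to be paired with exactly one release; the creator's own reference is released last.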

The use of functions related to kernel objects is demonstrated by the following sample code.

Example 4.30. kernel.c

context= clCreateContextFromType(properties, CL_DEVICE_TYPE_ALL, notify, &stderr, &err);
err= clGetContextInfo(context, CL_CONTEXT_NUM_DEVICES, sizeof(cl_uint), &numDevices, &size);
/* ... */
queue= clCreateCommandQueue(context, devices[0], 0, &err);
ERROR(err, "clCreateCommandQueue");

readBinaryProgram(argv[1], &binaryCode, &binaryLength, &n);
program= clCreateProgramWithBinary(context, numDevices, devices, binaryLength, binaryCode, status, &err);
/* ... (lines missing from this excerpt: the kernel objects are created and the outer loop over kernels[i] begins) ... */

    /* pString is passed directly as the destination buffer. */
    err= clGetKernelInfo(kernels[i], CL_KERNEL_FUNCTION_NAME, MAX_STRING_LENGTH, pString, &size);
    ERROR(err, "clGetKernelInfo");
    printf("kernel name: %s\n", pString);

    err= clGetKernelInfo(kernels[i], CL_KERNEL_NUM_ARGS, sizeof(numArgs), &numArgs, &size);
    ERROR(err, "clGetKernelInfo");
    printf("number of args: %u\n", numArgs);

    for ( j= 0; j < numArgs; ++j ) {
        /* The two clGetKernelArgInfo calls are missing from this excerpt; they are
           reconstructed from the ERROR labels and the description below (assumption). */
        err= clGetKernelArgInfo(kernels[i], j, CL_KERNEL_ARG_TYPE_NAME, MAX_STRING_LENGTH, pString, &size);
        ERROR(err, "clGetKernelArgInfo");
        printf("\tkernel arg type name: %s\n", pString);

        err= clGetKernelArgInfo(kernels[i], j, CL_KERNEL_ARG_NAME, MAX_STRING_LENGTH, pString, &size);
        ERROR(err, "clGetKernelArgInfo");
        printf("\tkernel arg name: %s\n", pString);
    }

After the file containing the binary code, passed as a command-line argument, is read, kernel objects are created for all the kernels defined in the code by the function clCreateKernelsInProgram. In the outer loop the name and the number of arguments of each kernel are queried, and in the inner loop the names and types of the arguments are written to the standard output.

Source: György Kovács, OpenCL (pages 96-101)