Map/Unmap - Buffer objects and memory handling


5. Runtime layer

5.3. Buffer objects and memory handling


    err= clEnqueueWriteBuffer(queue,                        //command queue object
                              memobj,                       //buffer object
                              CL_FALSE,                     //blocking write
                              0,                            //offset in bytes
                              sizeof(int)*(ARRAY_SIZE/2),   //size in bytes
                              output,                       //pointer in host memory
                              0,                            //number of event objects
                              NULL,                         //array of event objects
                              &writeEvent);                 //pointer of output event object
    ERROR(err, "clEnqueueWriteBuffer");

    err= clEnqueueReadBuffer(queue,                         //command queue object
                             memobj,                        //buffer object
                             CL_TRUE,                       //blocking read
                             sizeof(int)*(ARRAY_SIZE/2),    //offset in bytes
                             sizeof(int)*(ARRAY_SIZE/2),    //size in bytes
                             output,                        //pointer in host memory
                             1,                             //number of event objects
                             &writeEvent,                   //array of event objects
                             &readEvent);                   //pointer of output event object
    ERROR(err, "clEnqueueReadBuffer");

5.3.3. Map/Unmap

As mentioned previously, the functions clEnqueueReadBuffer and clEnqueueWriteBuffer always transfer data between the buffer object and the main memory, regardless of the type of operation and the location of the buffer object. There is another, somewhat more sophisticated way to access the contents of a buffer object in the host program: mapping the buffer into the address space of the host. This way of memory access is called pinned memory access.

The basic idea and practice of the approach can be summarized as follows.

1. The OpenCL buffer is mapped to a region of the host memory, that is, to an address in the address space of the host (clEnqueueMapBuffer). The mapping operation returns a pointer that can be used to read or write the contents of the buffer in the host address space. Depending on the flags used to create the buffer object, the mapping can cause data transfer. In the optimal case, however, the mapping merely returns a pointer to a region that already resides in host memory or pinned memory.

2. While the buffer is mapped, kernels must not access the buffer object, even if it resides on the OpenCL device.

3. Once the host program has finished working on the contents of the buffer, the mapping is released by the function clEnqueueUnmapMemObject and the modifications appear in the buffer object. After the unmapping is performed, kernels can access the contents of the buffer again.
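The three steps above can be sketched as a small host-only model. It does not call the OpenCL API: model_buffer, model_map, model_kernel_may_access and model_unmap are hypothetical names introduced only for illustration, and the staging copy stands in for the case where mapping requires a data transfer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical host-only model of a buffer object; NOT the OpenCL API. */
typedef struct {
    int   *storage;  /* stands in for the memory of the buffer object */
    size_t count;    /* number of int elements in the buffer */
    int   *mapped;   /* staging region while mapped, NULL otherwise */
} model_buffer;

/* Step 1: map. The contents are copied to a host-visible staging region
   and a pointer to that region is returned (NULL on failure). */
int *model_map(model_buffer *b) {
    assert(b != NULL);
    if (b->mapped != NULL) return NULL;  /* the model allows one mapping */
    b->mapped = malloc(b->count * sizeof(int));
    if (b->mapped == NULL) return NULL;
    memcpy(b->mapped, b->storage, b->count * sizeof(int));
    return b->mapped;
}

/* Step 2: while the buffer is mapped, kernels have to leave it alone. */
bool model_kernel_may_access(const model_buffer *b) {
    assert(b != NULL);
    return b->mapped == NULL;
}

/* Step 3: unmap. The modifications made through the mapped pointer become
   visible in the buffer. Returns 0 on success, -1 if ptr is not the
   pointer of the current mapping. */
int model_unmap(model_buffer *b, int *ptr) {
    assert(b != NULL);
    if (ptr == NULL || ptr != b->mapped) return -1;
    memcpy(b->storage, b->mapped, b->count * sizeof(int));
    free(b->mapped);
    b->mapped = NULL;
    return 0;
}
```

In this model the host doubles the elements through the pointer returned by model_map, and the buffer contents change only when model_unmap is called. A real implementation may skip both copies when the buffer already resides in host memory.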

The approach pays off when the mapping reduces the data transfer between the OpenCL device and the host memory:

• If the OpenCL device is the CPU, the buffer objects reside in the main memory. In this case mapping a buffer object to the address space of the host is trivial: the pointer to the memory region of the buffer object is returned. No data transfer is required at all.

• If the OpenCL device has dedicated physical memory (like standalone graphics cards) and the buffer objects are created on the device (neither CL_MEM_ALLOC_HOST_PTR nor CL_MEM_USE_HOST_PTR is used), data transfer is unavoidable for the host program to access the contents of the buffer object.

• If the flag CL_MEM_ALLOC_HOST_PTR is used to create the buffer object, the memory region is allocated in the physical memory that provides the fastest access for both the CPU and the device. If the OpenCL device is a peripheral, the buffer usually resides in pinned memory pages. Since pinned pages are part of the address space of the host, the mapping simply returns their address without data transfer.

• If the flag CL_MEM_USE_HOST_PTR is used, the buffer object uses the memory region specified by the host_ptr argument of the function clCreateBuffer. This memory region resides in the main memory, therefore the mapping returns that pointer without any data transfer.

In summary, whenever the OpenCL device is a CPU or the buffer is created using the flag CL_MEM_ALLOC_HOST_PTR or CL_MEM_USE_HOST_PTR, the memory region of the buffer resides in the main memory, and it can be accessed by the host program without data transfer. Using buffer objects properly, tuning their access with the options described so far, and fitting them to the problem to be solved form the basis of fast and efficient OpenCL programs.
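The rule of thumb from the list above can be written down as a small decision helper. This is only a sketch of the reasoning in the text, not a query of any real implementation: the SKETCH_* constants and the function sketch_map_needs_transfer are hypothetical stand-ins, and real code would use the flags CL_MEM_USE_HOST_PTR, CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR from the OpenCL headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the buffer creation flags discussed above. */
enum {
    SKETCH_USE_HOST_PTR   = 1 << 0,
    SKETCH_ALLOC_HOST_PTR = 1 << 1,
    SKETCH_COPY_HOST_PTR  = 1 << 2
};

/* Returns true if, following the list above, mapping the buffer is
   expected to involve a data transfer, and false if the mapping is
   expected to return a pointer to memory the host can already reach. */
bool sketch_map_needs_transfer(bool device_is_cpu, unsigned flags) {
    const unsigned known = SKETCH_USE_HOST_PTR | SKETCH_ALLOC_HOST_PTR |
                           SKETCH_COPY_HOST_PTR;
    assert((flags & ~known) == 0);  /* only the flags modeled here */

    if (device_is_cpu) {
        return false;  /* the buffer resides in the main memory */
    }
    if (flags & (SKETCH_USE_HOST_PTR | SKETCH_ALLOC_HOST_PTR)) {
        return false;  /* host memory or pinned pages back the buffer */
    }
    return true;       /* the buffer lives in dedicated device memory */
}
```

Note that the copy flag alone does not change the answer: it only initializes the buffer from host data at creation time.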

Buffer objects can be mapped to regions of the main memory by the function clEnqueueMapBuffer (OpenCL 1.0) [36].

Specification:

    void* clEnqueueMapBuffer(cl_command_queue command_queue,
                             cl_mem           buffer,
                             cl_bool          blocking_map,
                             cl_map_flags     map_flags,
                             size_t           offset,
                             size_t           size,
                             cl_uint          num_events_in_wait_list,
                             const cl_event*  event_wait_list,
                             cl_event*        event,
                             cl_int*          errcode_ret);

Parameters:

command_queue - Command queue object.

buffer - The buffer object to map to a region of the main memory.

blocking_map - If set (CL_TRUE), the function returns only when the mapping has been performed. Otherwise the returned pointer must not be used until the map command has completed.

map_flags - A bitfield describing the operations performed on the buffer after the mapping:

    CL_MAP_READ - the host reads the mapped region;
    CL_MAP_WRITE - the host writes the mapped region;
    CL_MAP_WRITE_INVALIDATE_REGION - the host overwrites the mapped region, so the previous contents of the region need not be preserved and the returned pointer is not guaranteed to hold the latest contents of the buffer.

offset - The offset in bytes of the region in the buffer object to be mapped.

size - The size of the region to be mapped in bytes.

num_events_in_wait_list - The size of the array event_wait_list.

event_wait_list - An array of event objects of size num_events_in_wait_list.

event - The event object belonging to the operation is written to this address.

errcode_ret - The error code is written to this address.

Return value: A pointer to the mapped memory region in the address space of the host.

[36] http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueMapBuffer.html

The return value of the function is a pointer to a memory region in the address space of the host that has the same contents as the buffer object passed as the buffer argument. If the mapping can be performed without data transfer, one gets the same pointer every time the buffer object is mapped to the host memory. If the mapping cannot be accomplished without data transfer, each call of the function clEnqueueMapBuffer can return a different pointer for the same buffer. In the latter case each call of the function clEnqueueMapBuffer typically involves a memory allocation in the host memory as well.
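Since the offset and size parameters are given in bytes while host code usually thinks in array elements, a small conversion helper can prevent off-by-a-factor mistakes. The sketch below is plain C and independent of the OpenCL headers; byte_region and region_for_elements are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a region of a buffer expressed in bytes. */
typedef struct {
    size_t offset;  /* value for the offset parameter */
    size_t size;    /* value for the size parameter */
} byte_region;

/* Converts the element range [first, first + count) to a byte region.
   Returns 0 on success and -1 if the range does not fit into a buffer of
   buffer_bytes bytes or if elem_size is 0. */
int region_for_elements(size_t first, size_t count, size_t elem_size,
                        size_t buffer_bytes, byte_region *out) {
    assert(out != NULL);
    if (elem_size == 0) return -1;

    size_t capacity = buffer_bytes / elem_size;  /* elements that fit */
    if (first > capacity || count > capacity - first) return -1;

    out->offset = first * elem_size;
    out->size   = count * elem_size;
    return 0;
}
```

For a buffer of 10 int elements, region_for_elements(5, 5, sizeof(int), 10 * sizeof(int), &r) describes the second half of the buffer, which corresponds to the offset and size expression sizeof(int)*(ARRAY_SIZE/2) when ARRAY_SIZE is assumed to be 10.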

The mapping can be released by the function clEnqueueUnmapMemObject (OpenCL 1.0) [37].

Specification:

    cl_int clEnqueueUnmapMemObject(cl_command_queue command_queue,
                                   cl_mem           buffer,
                                   void*            mapped_ptr,
                                   cl_uint          num_events_in_wait_list,
                                   const cl_event*  event_wait_list,
                                   cl_event*        event);

Parameters:

command_queue - A command queue object.

buffer - The buffer object whose mapping is to be released.

mapped_ptr - The pointer returned by clEnqueueMapBuffer and used to access the contents of the buffer object.

num_events_in_wait_list - The size of the array event_wait_list.

event_wait_list - An array of event objects of size num_events_in_wait_list.

event - The event object belonging to the command is written to this address.

Return value: An error code in the case of unsuccessful execution, CL_SUCCESS otherwise.

A notable feature of the function clEnqueueUnmapMemObject is that it has no parameter for requesting a blocking call. Therefore, an event object (or a synchronization call such as clFinish) has to be used to make sure that the unmapping operation has finished.

The use of functions clEnqueueMapBuffer and clEnqueueUnmapMemObject is demonstrated through a variant of the sample code memory.c presented before.

[37] http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueUnmapBuffer.html

Example 4.9. memory.c: 49-92

    ...
    CL_MEM_COPY_HOST_PTR,                //buffer flag
    ...
    printf("input pointer:\t%p\nmapped pointer:\t%p\n", input, p);

    for ( i= 0; i < ARRAY_SIZE; ++i )
      p[i]*= 2;

    err= clEnqueueUnmapMemObject(queue,  //command queue
                                 memobj, //memory object
    ...

The output of the program is given below.

    user@linux> ./memory
    input pointer:  0x7fff72fb9b70
    mapped pointer: 0x7f2e15b86000
    10 12 14 16 18

Similarly to the previous variants, a buffer object is created on the OpenCL device (a GPU in this case), and the contents of the array input are written into that buffer. Then the function clEnqueueMapBuffer maps the buffer to the address space of the main memory, and the contents of the buffer are modified through the returned pointer: every element of the buffer is multiplied by 2. The text written to the standard output shows that the pointers input and p are not the same. The reason is that the buffer was created on the device, so the mapping performed an allocation in the main memory and a copy operation from the buffer object to the main memory. The mapping is released by the function clEnqueueUnmapMemObject, and the event object is used to block the program until the operation has been performed.

Then, the contents of the buffer are read by the function clEnqueueReadBuffer, and written to the standard output. It is easy to see that the numbers are exactly what we expected, that is, the program is working properly.

If the call of the function clCreateBuffer is changed as shown below, the output of the program changes as well.

Example 4.10. memory.c: 53-58

    CL_MEM_USE_HOST_PTR,          //buffer flag
    ARRAY_SIZE * sizeof(int),     //size in bytes
    input,                        //host pointer
    &err);                        //pointer to error code variable
    ERROR(err, "clCreateBuffer");

    user@linux> ./memory
    input pointer:  0x7fff94363c00
    mapped pointer: 0x7fff94363c00
    10 12 14 16 18

It is easy to see that the pointers input and p are the same due to the use of the flag CL_MEM_USE_HOST_PTR. Accordingly, one can expect that no data transfer was performed.

As a third variant, the buffer is created by using the flags CL_MEM_ALLOC_HOST_PTR and CL_MEM_COPY_HOST_PTR.

Example 4.11. memory.c: 53-58

    CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR,  //buffer flags
    ARRAY_SIZE * sizeof(int),                      //size
    input,                                         //host pointer
    &err);                                         //pointer to error code variable
    ERROR(err, "clCreateBuffer");

    user@linux> ./memory
    input pointer:  0x7fff93b37260
    mapped pointer: 0x7f33a6e8d000
    10 12 14 16 18

Although the pointers differ, no data transfer is performed when the function clEnqueueMapBuffer is called. The difference between the pointers comes from the use of the flag CL_MEM_ALLOC_HOST_PTR: memory is allocated in the pinned memory region and the data is transferred to that region at the call of the function clCreateBuffer. Every call of the function clEnqueueMapBuffer returns the same pointer, which differs from the pointer input. Data transfer is performed only once, when the memory region allocated in pinned pages is initialized.

In document György Kovács OpenCL (pages 69-73)