image2d_t corrupted when switching to the OpenCL kernel


I'm writing a pathtracer in Haskell and OpenCL, and I'm having an issue with passing an image2d_t to my kernel to write the output to. Namely, calling any of the get_image_* functions in OpenCL on the image2d_t returns nonsense values (usually either 0 or 2^24-1), and write_imagef does nothing. This only happens when running on the GPU -- the CPU runs it fine. Calling clGetImageInfo on the host returns the correct values. The Haskell bindings for OpenCL convert error codes to exceptions, so it's not a matter of forgetting to check for errors. clinfo reports my version as "OpenCL 1.2 AMD-APP (1084.2)". I should note that I experienced (and reported) multiple bugs causing the OpenCL compiler to segfault or fail to link, so this may be a result of that instead of a bug in my code.

I initialise OpenCL like this (hopefully it's relatively intelligible even to people who don't know Haskell):

(platform:_) <- clGetPlatformIDs
(device:_) <- clGetDeviceIDs platform CL_DEVICE_TYPE_GPU
glContext <- glXGetCurrentContext
glDisplay <- glXGetCurrentDisplay
context <- clCreateContext [CL_GL_CONTEXT_KHR glContext, CL_GLX_DISPLAY_KHR glDisplay] [device] putStrLn
queue <- clCreateCommandQueue context device []
source <- readFile ""
program <- clCreateProgramWithSource context source
clBuildProgram program [device] "-cl-strict-aliasing"
        `catch` (\e -> case (e :: CLError) of
                            CL_BUILD_PROGRAM_FAILURE -> putStrLn "Building OpenCL program failed:"
                                                     >> clGetProgramBuildLog program device >>= putStrLn
                                                     >> throw e
                            _                        -> return ())
kernel <- clCreateKernel program "sample"
pCorners <- mallocArray 4
buffer <- clCreateBuffer context [CL_MEM_READ_ONLY, CL_MEM_USE_HOST_PTR] (4*sizeOf (undefined :: V.Vec4F), castPtr pCorners)
clSetKernelArgSto kernel 1 buffer
tex@(TextureObject texid) <- head <$> (genObjectNames 1)
activeTexture $= TextureUnit 0
textureBinding Texture2D $= Just tex
textureFilter Texture2D $= ((Nearest, Nothing), Nearest)
textureWrapMode Texture2D S $= (Repeated, Clamp)
textureWrapMode Texture2D T $= (Repeated, Clamp)
texImage2D Nothing NoProxy 0 RGBA' (TextureSize2D initialWidth initialHeight) 0 (PixelData RGBA UnsignedByte nullPtr)
image <- clCreateFromGLTexture2D context [CL_MEM_READ_WRITE] gl_TEXTURE_2D 0 texid
clSetKernelArgSto kernel 2 image

And I call this (slightly simplified) to run the kernel and render the result:

clSetKernelArgSto kernel 0 position
pokeArray pCorners orientedCorners -- update the pCorners array
finish -- This is glFinish()
clEnqueueAcquireGLObjects queue [image] []
clEnqueueNDRangeKernel queue kernel [width, height] [] []
clEnqueueReleaseGLObjects queue [image] []
clFinish queue
drawElements TriangleFan 4 UnsignedInt offset0

Finally, the test kernel:

__kernel void sample(float3 position, __constant float3 corner[4], image2d_t output) {
        write_imagef(output, (int2)(get_global_id(0), get_global_id(1)), (float4)(0, 0.5f, 1, 1));
}
The output of this is a fullscreen quad displaying a random uninitialized area of GPU memory. It should be a fullscreen cyan quad. I had some printfs in there to display the results of the get_image_* functions, but they've started causing the program to hang.

I had a similar problem. After reordering the kernel arguments so that all image2d_t arguments come first, it worked. Especially calling get_image_dim then returned the right results. I don't know if this is a bug. My GPU: ATI Radeon 7950.