Cycles: Pack kernel textures into buffers for OpenCL
ClosedPublic

Authored by Mai Lavelle (maiself) on Jul 20 2017, 11:23 AM.

Details

Summary

Image textures were being packed into a single buffer for OpenCL, which limited the amount of memory available for images to the size of one buffer (usually 4gb on AMD hardware). By packing textures into multiple buffers that limit is removed, while simultaneously reducing the number of buffers that need to be passed to each kernel.

Benchmarks were within 2%. Will fix issue from T51554.

Diff Detail

Repository
rB Blender

Really helpfull addition and it seems very robust. It shows that the latest AMD drivers also allows to use swap for buffers. I managed to render a scene that took 36GB on a 16GB (system) and 8GB (RX480) system. Of course, it was slow, but still interesting.
Only thing that it showed is that cycles seem to have like a memory leak or hidden cache. See report https://developer.blender.org/T52135

Brecht Van Lommel (brecht) requested changes to this revision.Jul 20 2017, 8:14 PM

I like the idea. The main downside I see is that updating one texture (which can be an image or mesh data or shaders) will now free and re-transfer all the scene memory the GPU. That's going to be quite slow in complex scenes if you just want to tweak a single material color.

Is there anything we can do to avoid this? Maybe spreading the textures as much as possible of the 8 buffers and only re-transferring the changed buffers or the parts that changed?

intern/cycles/device/device_opencl.cpp
97–98

We can entirely remove info.packed_images now that no device is using it anymore.

intern/cycles/device/opencl/opencl.h
545–552

Can we have just a single map instead of two?

554–555

Perhaps rename this to texture_packed_buffers and textures_need_update, to make it clear this is specifically for textures.

intern/cycles/device/opencl/opencl_base.cpp
145

Non need to check if this is null, mem_free already does it.

520–523

This is not deleting the corresponding entry in texture_options, would be avoided if we have a single map.

618

No need to check if this is null.

647

When possible I prefer to define the variable on the line it is initialized.

685

Can we define a const int NUM_TEXTURE_PACKED_BUFFERS to avoided the hardcoded 8 here?

703

Can we use a vector for this?

This revision now requires changes to proceed.Jul 20 2017, 8:14 PM

Builds with this patch and ASAN enabled on Linux can't render the default cube scene (crash) with cycles and OpenCL split kernel.
VS2013 builds start rendering but stay forever with an idle GPU on the first tile on complex scenes.
On Linux without ASAN it renders correctly and fast.

Tested with Vs 2015 and works fine for me.

  • Rebase on master
  • Created MemoryManager class to contain logic of managing buffers
  • Rename and cleanup
  • Removed old image packing code

The new memory management system is simple: allocations are spread out into the buffers, buffers are only reallocated when their size changes, and memory is copied from the most efficient location only when updated. This should be much faster than the previous patch, even for frequent changes in large scenes.

Great, thanks for addressing the viewport update performance. Seems to work fine in my tests.

intern/cycles/device/opencl/memory_manager.cpp
115

This causes a warning for me.

intern/cycles/device/opencl/memory_manager.cpp:114:10: warning: destination for this 'memcpy' call is a pointer to dynamic class 'device_memory'; vtable pointer will be overwritten [-Wdynamic-class-memaccess]
                memcpy(&buffer, &new_buffer, sizeof(buffer));
                ~~~~~~ ^

Removed use of memcpy on device_memory.

Looks good to me now.

This revision is now accepted and ready to land.Aug 7 2017, 12:55 PM
This revision was automatically updated to reflect the committed changes.