
Fix T71071: Complex scene, one GPU renders successfully, but multiple GPUs fail to render
ClosedPublic

Authored by Ha Hyung-jin (robobeg) on Thu, Oct 24, 8:50 AM.

Details

Summary

The Symptom, also reported as T71071:
A very complex scene, which a single GPU can render successfully with the help of AGP memory, cannot be rendered properly by multiple GPUs. The render may fail from the start, or it may produce a mosaic of correctly rendered tiles and corrupted tiles.

This symptom was reproduced with the following multi-GPU combinations and is fixed by this patch.

  • two NVIDIA GeForce 980 Ti GPUs
  • three NVIDIA GeForce 980 Ti GPUs
  • NVIDIA GeForce 980 Ti + GeForce RTX 2080 SUPER

    (You can test this with the 'Cosmos Laundromat Demo' (https://download.blender.org/demo/test/benchmark.zip) using the GPU compute device. It is a CPU render demo, but a single GPU with enough AGP memory can render it.)

The Cause:
Under a multidevice, some GPUs may store a resource in their own device memory while others store and share it in mapped host AGP memory. The code related to 'device_memory' does not handle this situation properly.
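The failure mode can be sketched with a deliberately simplified model. This is a hypothetical illustration, not the actual Cycles code: the struct and member names below (SubDevice, has_gpu_room, alloc) are made up for the example; only device_memory, host_pointer, shared_pointer, and use_mapped_host echo the real identifiers discussed in this review.

```cpp
#include <cassert>
#include <map>

// Hypothetical, simplified model of the failure (not the actual Cycles
// code). Several sub-devices of a multidevice share one device_memory
// record, but each sub-device independently decides whether the resource
// lives in its own GPU memory or in mapped host (AGP) memory.
struct device_memory {
  void *host_pointer = nullptr;    // host copy of the data
  void *shared_pointer = nullptr;  // mapped host (AGP) allocation, shared
};

struct SubDevice {
  bool has_gpu_room;  // does this device still have free GPU memory?
  // Per-device residency flag: the crux of the fix is that this state
  // cannot live inside the shared device_memory struct.
  std::map<device_memory *, bool> use_mapped_host;

  void alloc(device_memory &mem, void *mapped) {
    if (has_gpu_room) {
      use_mapped_host[&mem] = false;  // keep the resource in GPU memory
    } else {
      use_mapped_host[&mem] = true;   // fall back to mapped host memory;
      mem.shared_pointer = mapped;    // these writes are visible to ALL
      mem.host_pointer = mapped;      // sub-devices via the shared struct
    }
  }
};
```

After a device with room and a device without room both allocate, mem.host_pointer == mem.shared_pointer holds globally, so a pointer comparison wrongly concludes that the first device also uses mapped host memory; only the per-device flag records the true residency.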

Diff Detail

Repository
rB Blender

Event Timeline

Ha Hyung-jin (robobeg) edited the summary of this revision. Thu, Oct 24, 9:07 AM
Ha Hyung-jin (robobeg) retitled this revision from Bug fix: Using only one GPU will render successfully, but multiple GPUs fails to render. to Fix T71071: Using only one GPU will render successfully, but multiple GPUs fail to render..
Ha Hyung-jin (robobeg) retitled this revision from Fix T71071: Using only one GPU will render successfully, but multiple GPUs fail to render. to Fix T71071: Complex scene, one GPU renders successfully, but multiple GPUs fail to render. Thu, Oct 24, 2:15 PM
Ha Hyung-jin (robobeg) edited the summary of this revision.

Tab spacing was adjusted to match the surrounding code.

Ha Hyung-jin (robobeg) edited the summary of this revision. Fri, Oct 25, 2:11 AM

The code related to 'device_memory' does not handle this situation properly.

Can you elaborate a bit more about it (on a higher level)?

intern/cycles/device/device_cuda.cpp
875

We do not use this style of comments.
We use /* ... */ in almost all cases, and sometimes //.

The code related to 'device_memory' does not handle this situation properly.

Can you elaborate a bit more about it (on a higher level)?

For example, let's inspect line 896 of repository device_cuda.cpp:
mem.host_pointer != mem.shared_pointer

Even if mem.host_pointer == mem.shared_pointer, that may say nothing about the state of mem on the current (sub)device. The current device in a multidevice may still hold the resource in GPU memory, while other subdevices of the same multidevice ran short of GPU memory, moved the resource into AGP memory, and copied the pointer into host_pointer.

This isn't really a higher-level overview of the problem.

Am I right that the issue is that multiple compute devices were using the same device pointer, while the code relied on the assumption that this never happens?

This isn't really a higher-level overview of the problem.
Am I right that the issue is that multiple compute devices were using the same device pointer, while the code relied on the assumption that this never happens?

Sorry, I will try to elaborate further. The issue is that multiple compute devices in a multidevice share the same mem (device_memory) data structure, including the host_pointer and shared_pointer inside it. The device_pointer of mem is replaced with the actual device pointer of each compute device whenever the multidevice passes mem to it.

Some devices still have enough GPU memory and keep a resource there, while others move it to AGP memory (and share it). The mem (device_memory) data structure alone cannot represent such a situation.

This isn't really a higher-level overview of the problem.
Am I right that the issue is that multiple compute devices were using the same device pointer, while the code relied on the assumption that this never happens?

In the current system, mem.shared_pointer is used for sharing AGP memory only in the multidevice case, so the code is already written with a shared mem data structure in mind. The point is that it is not possible to determine whether a device uses GPU memory just by comparing mem.host_pointer and mem.shared_pointer, so cuda_mem_map[&mem].use_mapped_host should be checked first. The cuda_mem_map[&mem].map_host_pointer in the repository code could be used for that check, but 1. the AGP address is already stored in shared_pointer, so it is a duplication, and 2. it can be left dangling by another device that originally allocated the memory. This is why I replaced it with a boolean flag and introduced shared_counter in device_memory.
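The flag-plus-counter idea can be sketched as follows. This is a hypothetical illustration: shared_counter and use_mapped_host echo the names discussed in the patch, but SubDevice, map_shared, and unmap_shared are invented for the example, and the real code allocates CUDA mapped host memory rather than calling malloc.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical sketch of reference-counted sharing of the mapped host
// (AGP) allocation. shared_counter stands for the field this patch adds
// to device_memory; the rest is illustrative.
struct device_memory {
  void *shared_pointer = nullptr;  // mapped host (AGP) allocation
  int shared_counter = 0;          // number of sub-devices mapping it
};

struct SubDevice {
  // Per-device boolean replacing the old per-device map_host_pointer,
  // which duplicated the address already held in shared_pointer.
  bool use_mapped_host = false;

  void map_shared(device_memory &mem) {
    if (mem.shared_pointer == nullptr)
      mem.shared_pointer = std::malloc(64);  // first user allocates
    mem.shared_counter++;
    use_mapped_host = true;
  }

  void unmap_shared(device_memory &mem) {
    if (use_mapped_host && --mem.shared_counter == 0) {
      std::free(mem.shared_pointer);  // last user frees, so no other
      mem.shared_pointer = nullptr;   // device is left holding a
    }                                 // dangling pointer
    use_mapped_host = false;
  }
};
```

Because the allocation is freed only when the last sharing device unmaps it, no device can observe a pointer that another device has already released, which is exactly the dangling-pointer hazard described above.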

Ha Hyung-jin (robobeg) set the repository for this revision to rB Blender.

The comment style has been updated accordingly.

Thanks for the explanation!

Stefan is the original author of the feature, so adding him to the reviewers.

I don't really have time to dig into the code and verify it line by line, but overall it looks fine. I tested simple scenes and they still work.
A more complex scene fails for me, but that also happens in master (this is an issue to be solved, but it is not related to this patch).

intern/cycles/device/device_optix.cpp
1265–1266

Please add parentheses. It will make the code easier to understand and will also silence the strict compiler warning: suggest parentheses around '&&' within '||' [-Werror=parentheses].

Ha Hyung-jin (robobeg) marked an inline comment as done.

Parentheses were added around '&&' within '||' to make the code easier to understand and to silence the strict compiler warning.
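For reference, the precedence issue behind that warning, shown on a generic example rather than the patch code (check and its parameters are invented for the illustration):

```cpp
#include <cassert>

// `a || b && c` parses as `a || (b && c)` because && binds tighter than
// ||. GCC's -Wparentheses (promoted to an error by -Werror=parentheses)
// asks for explicit grouping so the intended precedence is unmistakable.
bool check(bool a, bool b, bool c) {
  return a || (b && c);  // explicit parentheses: same meaning, no warning
}
```

The parentheses do not change the behavior; they only make the already-applied precedence visible to the reader and to the compiler's diagnostics.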

This revision is now accepted and ready to land. Tue, Nov 5, 4:41 PM