OptiX kernels fail to load when Ambient Occlusion node is used #92363

Closed
opened 2021-10-20 12:40:28 +02:00 by Marcin Twarowski · 12 comments

System Information
Operating system: Windows-10-10.0.19041-SP0 64 Bits
Graphics card: NVIDIA GeForce GTX 1080/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 496.13

Blender Version
Broken: version: 3.0.0 Alpha, branch: master, commit date: 2021-10-19 23:31, hash: ef9269bd62
Worked: 3.0.0 Alpha, branch: master, commit date: 2021-10-18 21:38, hash: d7b231baa8

Short description of error
I have "Cycles Render Devices" set to OptiX. When I switch to rendered mode, the kernels start to load but fail with this message:
obraz.png (https://archive.blender.org/developer/F11320584/obraz.png)
The culprit seems to be the Ambient Occlusion node, as other scenes load properly. Interestingly, simply detaching the AO node from the shader does not help; I need to remove the AO node from the shader editor completely, then save and reload.
Switching to CUDA in the settings fixes the problem. Note that I don't have an RTX card.
The same file works with yesterday's build, so the problem must have been introduced recently.

Exact steps for others to reproduce the error
Open attached blend file and switch to rendered mode.
optix_fail.blend (https://archive.blender.org/developer/F11320739/optix_fail.blend)

Added subscriber: @MarcinTwarowski

Added subscribers: @pmoursnv, @brecht

Changed status from 'Needs Triage' to: 'Confirmed'

It doesn't happen for me with an RTX card on Linux, but will mark as confirmed anyway since I have no doubt it's a real bug we should fix.

Presumably 943e73b, d06828f or fd77a28 is triggering this internal compiler error.

CC @pmoursnv.

This is a driver bug, which can only happen with the current r495 release, so 496.13 on Windows or 495.29.05 on Linux. Fix is in the works.

Added subscriber: @SteffenD

@pmoursnv, thanks, depending on how soon this is fixed we may still want to work around it in Cycles though.

Right. The bug is that the compiler does not handle the sub.s16 PTX instruction. add.s16, neg.s16, sub.s32 and so on do work, however, so technically we just need to change the code so it uses those instead.
Fortunately sub.s16 only occurs in one place in Cycles (in integrator_shade_surface.h, due to a subtraction with uint16_t values). Unfortunately it's rather difficult to convince the optimizer not to use it (I tried casting, moving code around, and so on, without success) ... Some inline PTX does the job though:

diff --git a/intern/cycles/kernel/integrator/integrator_shade_surface.h b/intern/cycles/kernel/integrator/integrator_shade_surface.h
--- a/intern/cycles/kernel/integrator/integrator_shade_surface.h
+++ b/intern/cycles/kernel/integrator/integrator_shade_surface.h
@@ -199,9 +199,22 @@ ccl_device_forceinline void integrate_surface_direct_light(KernelGlobals kg,
 
   INTEGRATOR_STATE_WRITE(shadow_state, shadow_path, render_pixel_index) = INTEGRATOR_STATE(
       state, path, render_pixel_index);
+#  ifdef __KERNEL_OPTIX__ // Work around OptiX bug in first r495 driver
+  uint16_t rng_offset_value = INTEGRATOR_STATE(state, path, rng_offset);
+  asm("{ .reg .s16 tmp0; .reg .s16 tmp1;"
+      "neg.s16 tmp0, %3;"
+      "mul.lo.s16 tmp1, %2, tmp0;"
+      "add.s16 %0, %1, tmp1; }"
+      : "=h"(rng_offset_value) // %0
+      :  "h"(rng_offset_value), // %1
+         "h"(static_cast<uint16_t>(PRNG_BOUNCE_NUM)), // %2
+         "h"(transparent_bounce)); // %3
+  INTEGRATOR_STATE_WRITE(shadow_state, shadow_path, rng_offset) = rng_offset_value;
+#  else
   INTEGRATOR_STATE_WRITE(
       shadow_state, shadow_path, rng_offset) = INTEGRATOR_STATE(state, path, rng_offset) -
                                                PRNG_BOUNCE_NUM * transparent_bounce;
+#  endif
   INTEGRATOR_STATE_WRITE(shadow_state, shadow_path, rng_hash) = INTEGRATOR_STATE(
       state, path, rng_hash);
   INTEGRATOR_STATE_WRITE(shadow_state, shadow_path, sample) = INTEGRATOR_STATE(
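
For readers who want to convince themselves that the neg/mul/add sequence in the patch computes the same value as the original subtraction, here is a minimal host-side sketch in plain C++. The function names and parameters are hypothetical and not part of Cycles, and the code only checks the 16-bit wraparound arithmetic; it is not a workaround by itself, since, as noted above, the optimizer may still emit sub.s16 for equivalent C++ code.

```cpp
#include <cstdint>

// Hypothetical helper (not Cycles code): the original expression,
// rng_offset - bounce_num * transparent_bounce, truncated to 16 bits.
// The arithmetic is done in uint32_t to avoid signed overflow from
// integer promotion.
inline uint16_t offset_with_sub(uint16_t rng_offset,
                                uint16_t bounce_num,
                                uint16_t transparent_bounce)
{
  const uint32_t product = static_cast<uint32_t>(bounce_num) * transparent_bounce;
  return static_cast<uint16_t>(rng_offset - product);
}

// Hypothetical helper (not Cycles code): the same value computed as
// negate, multiply (low 16 bits), add, mirroring the
// neg.s16 / mul.lo.s16 / add.s16 sequence in the inline PTX.
inline uint16_t offset_with_neg_add(uint16_t rng_offset,
                                    uint16_t bounce_num,
                                    uint16_t transparent_bounce)
{
  // Two's-complement negation, truncated to 16 bits.
  const uint16_t negated = static_cast<uint16_t>(0u - transparent_bounce);
  // Low 16 bits of the product.
  const uint16_t product = static_cast<uint16_t>(static_cast<uint32_t>(bounce_num) * negated);
  // 16-bit wrapping add.
  return static_cast<uint16_t>(static_cast<uint32_t>(rng_offset) + product);
}
```

Both forms agree for every input because 16-bit negation, multiplication and addition are all taken modulo 2^16, so a - b*c and a + b*(-c) are the same residue.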
This issue was referenced by blender/cycles@f72bf84eb6f5df41ebb9ecb89cbcfc9e52b3d841

This issue was referenced by be558d2d9775b3d9d1f84d316d2675a205932d92

Ah, thanks for figuring that out. I think I can refactor the logic so we don't need to use a subtraction at all, and avoid the inline PTX.

Changed status from 'Confirmed' to: 'Resolved'

Brecht Van Lommel self-assigned this 2021-10-21 21:25:56 +02:00
Reference: blender/blender#92363