Page MenuHome

fix denoiser negative index crash in CPU kernels
Needs ReviewPublic

Authored by Mohamed Sakr (msakr) on Apr 4 2019, 4:40 PM.

Diff Detail

Repository
rB Blender
Branch
master
Build Status
Buildable 3309
Build 3309: arc lint + arc unit

Event Timeline

@Mohamed Sakr (msakr) , just commandeer it, and click edit revision in the upper right to fill out the details and proper title

Mohamed Sakr (msakr) commandeered this revision.Apr 4 2019, 4:49 PM

denoiser was crashing in OSX (latest Cycles standalone from 14/2/2019), it was doing negative indices randomly, behavior is undefined.

in my case the test scene was the default cube, light scene.
800x600 image
denoiser radius = 8
CPU tile = 64x64

example:
line 168,
idx_q = (y+dy)*stride + (x+dx);
y = 5, dy = -5, x = 5 (lowered to 4 by the round_down(rect.x, 4)), dx = -5
idx_q = -1, crash on the next load
float4 val = load4_u(image, idx_q);

@Stefan Werner (swerner) confirmed the negative index

Mohamed Sakr (msakr) retitled this revision from diff upload for @msakr to fix denoiser negative index crash in CPU kernels.
Mohamed Sakr (msakr) added a project: Cycles.
LazyDodo (LazyDodo) resigned from this revision.Apr 5 2019, 4:37 PM

This seems like a workaround more than a fix for the real issue.

I would expect it to only loop over pixels within bounds, not use zero for pixels outside of bounds.

Can you figure out why the indices are negative in the first place?

Mohamed Sakr (msakr) added a comment.EditedApr 5 2019, 5:22 PM

This seems like a workaround more than a fix for the real issue.

I would expect it to only loop over pixels within bounds, not use zero for pixels outside of bounds.

Can you figure out why the indices are negative in the first place?

TRUE, I just did this for a fast fix before Cycles 4D release (was yesterday and the bug was a day before), and I didn't want to break the denoiser function (algorithm) as I dunno what is happening.
it gets the bounds (x, y, dx, dy) before entering that function from here
https://developer.blender.org/diffusion/C/browse/master/src/device/device_cpu.cpp$504-507
so that rect is actually wrong (I did a fast fix first time by limiting the dx like this):
on https://developer.blender.org/diffusion/C/browse/master/src/device/device_cpu.cpp$506
dx = (dx < 0) ? round_down(dx, 4) : dx;
but it crashed in one of these kernels as it rounds more stuff from dynamic data (the f parameter)
@Lukas Stockner (lukasstockner97) should know more about the right way to fix.

Can you figure out why the indices are negative in the first place?

in 1 of the tiles, here, https://developer.blender.org/diffusion/C/browse/master/src/device/device_cpu.cpp$504-507

dy = -5. dx = -5, local_rect = {5, 5, 80, 80) (xyzw)
in the kernel, x + dx is intended for 0
all of the bugs in all cases happen because the bounds are rounded somewhere (x is rounded to lower 4) so the 5 will be converted to 4
4 + -5 = -1, crash.
all lines that got round_down in that file (6 lines) needs to be reworked so it ensures that it doesn't go below 0.

I'm just going ahead and updating this with my fix, the history should still be there so we could go back to your fix.

Essentially, all of the aligning was there for performance, but I looked into this a bit further and it turns out that nowadays aligned vs. unaligned doesn't really matter anymore, so my fix is just to remove all the aligned accesses.

The resulting patch is unfortunately quite large since I needed to tweak the buffer row padding (apparently, the aligning before was hiding this issue) and therefore need to pass an additional parameter to a lot of kernels.

I'm still not happy with the manual vectorized code, the readability is not exactly great. It'd be great if auto-vectorization could deal with that for us, but that does not seem to be the case currently. I suppose something like ISPC could help here, but that's a bit overkill...

Also, I couldn't test the GPU code yet (no functional changes, but there's always the possibility of messing up things like argument order), I'll try to do that tomorrow.