
Cycles denoising on CPU buffer overflow
Open, Needs Triage by Developer. Public

Description

System Information
Operating system: any

Blender Version
Broken: (master, as of now, rB35c707684befb2fe823ab1bfa002b34785c31841), intern/cycles/kernel/filter/filter_nlm_cpu.h

Short description of error
When denoising on the CPU, Cycles can sometimes read values from outside the image buffer.

Exact steps for others to reproduce the error
The issue lies in these functions:
kernel_filter_nlm_calc_difference, at the line: int aligned_lowx = rect.x & (~3);
kernel_filter_nlm_update_output, at the line: int aligned_lowx = round_down(rect.x, 4);

This happens when aligned_lowx + dx becomes less than zero.

For example, if the function gets rect.x = 5 and dx = -5, then rounding down gives aligned_lowx = 4, and aligned_lowx + dx becomes -1. On a row where y + dy is 0, the image read then looks something like load4_u(image, -1):

ccl_device_inline void kernel_filter_nlm_update_output(int dx, int dy, ..., int4 rect, ...) {
  nlm_blur_horizontal(difference_image, temp_image, rect, stride, f);

  int aligned_lowx = round_down(rect.x, 4); // Becomes 4
  for (int y = rect.y; y < rect.w; y++) {
    for (int x = aligned_lowx; x < rect.z; x += 4) {
      int4 x4 = make_int4(x) + make_int4(0, 1, 2, 3);
      int4 active = (x4 >= make_int4(rect.x)) & (x4 < make_int4(rect.z));

      int idx_p = y * stride + x, idx_q = (y + dy) * stride + (x + dx); // idx_q - becomes -1

      float4 weight = load4_a(temp_image, idx_p);
      load4_a(accum_image, idx_p) += mask(active, weight);

      float4 val = load4_u(image, idx_q); // we try to read image at position -1
      if (channel_offset) {
        val += load4_u(image, idx_q + channel_offset);
        val += load4_u(image, idx_q + 2 * channel_offset);
        val *= 1.0f / 3.0f;
      }

      load4_a(out_image, idx_p) += mask(active, weight * val);
    }
  }
}
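
The index arithmetic from the example above can be checked in isolation. This is a minimal sketch, not Cycles code: round_down_to_4, first_idx_q and load4_in_bounds are hypothetical helpers that only mirror the expressions in the loop, and the stride and buffer size used are arbitrary assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for the rounding in the report (rect.x & ~3).
// Only meant for non-negative x here.
constexpr int round_down_to_4(int x)
{
  return x & ~3;
}

// Index of the first unaligned load in row y, mirroring idx_q in the loop:
// idx_q = (y + dy) * stride + (x + dx), with x starting at aligned_lowx.
constexpr int first_idx_q(int rect_x, int y, int dx, int dy, int stride)
{
  return (y + dy) * stride + (round_down_to_4(rect_x) + dx);
}

// True if a 4-wide load starting at idx stays inside a buffer of num_floats.
constexpr bool load4_in_bounds(int idx, int num_floats)
{
  return idx >= 0 && idx + 4 <= num_floats;
}

// Values from the report: rect.x = 5, dx = -5, on the row where y + dy == 0.
static_assert(round_down_to_4(5) == 4, "5 rounds down to 4");
static_assert(first_idx_q(5, 0, -5, 0, 64) == -1, "first load starts at -1");
static_assert(!load4_in_bounds(first_idx_q(5, 0, -5, 0, 64), 64 * 64),
              "the load starts before the buffer");
```

Note that the active mask in the kernel only guards the accumulation; the load4_u call itself is issued for all four lanes, which is why the read happens even though pixel 4 is outside rect.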

A possible fix would be the following (but I am not sure it is correct for the denoising algorithm):

int aligned_lowx = round_down(rect.x, 4);
if (aligned_lowx + dx < 0) {
    aligned_lowx += 4;
}
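
One way to probe the doubt about this fix is to check which pixels the loop would no longer visit. The sketch below is an assumption-laden illustration, not Cycles code: proposed_start_x and skipped_active_pixels are hypothetical helpers, and rect.z = 16 is an arbitrary choice. With the values from the example, the start column moves from 4 to 8, so pixels 5, 6 and 7, which lie inside rect, would be skipped.

```cpp
#include <cassert>

// Hypothetical stand-in for the rounding in the report (rect.x & ~3).
constexpr int round_down_to_4(int x)
{
  return x & ~3;
}

// Start column of the inner loop with the proposed fix applied.
constexpr int proposed_start_x(int rect_x, int dx)
{
  int aligned_lowx = round_down_to_4(rect_x);
  if (aligned_lowx + dx < 0) {
    aligned_lowx += 4;
  }
  return aligned_lowx;
}

// Number of pixels in [rect_x, rect_z) that a loop starting at start_x
// never reaches, i.e. active pixels that would get no contribution.
constexpr int skipped_active_pixels(int rect_x, int rect_z, int start_x)
{
  int skipped = 0;
  for (int x = rect_x; x < rect_z && x < start_x; x++) {
    skipped++;
  }
  return skipped;
}

// Values from the report: rect.x = 5, dx = -5; rect.z = 16 is assumed.
static_assert(proposed_start_x(5, -5) == 8, "start moves from 4 to 8");
static_assert(skipped_active_pixels(5, 16, 8) == 3, "pixels 5, 6, 7 skipped");
```

This only checks the index arithmetic. Whether dropping those pixels is acceptable for the filter, or whether the read should be guarded in some other way, is not something the sketch can settle.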

I am not sure I can reproduce this easily, as I was cross-compiling Cycles and then running it on ARM.

Details

Type
Bug

Event Timeline

Tautvydas Andrikys (esminis) renamed this task from Cycles denoising on GPU buffer overflow to Cycles denoising on CPU buffer overflow. Thu, Oct 10, 11:57 AM
Tautvydas Andrikys (esminis) updated the task description.