Stefan Werner (swerner)
User

User Details

User Since
Mar 31 2015, 9:29 AM (107 w, 6 d)

Recent Activity

Thu, Apr 20

Stefan Werner (swerner) added an edge to rBaeda1a16f3e4: D2607: Switch eye dropper to use linear color space internally: D2607: Switch eye dropper to use linear color space internally.
Thu, Apr 20, 10:42 PM
Stefan Werner (swerner) added 1 commit(s) for D2607: Switch eye dropper to use linear color space internally: rBaeda1a16f3e4: D2607: Switch eye dropper to use linear color space internally.
Thu, Apr 20, 10:42 PM
Stefan Werner (swerner) committed rBaeda1a16f3e4: D2607: Switch eye dropper to use linear color space internally (authored by Stefan Werner (swerner)).
D2607: Switch eye dropper to use linear color space internally
Thu, Apr 20, 10:41 PM
Stefan Werner (swerner) added 1 commit(s) for D2608: Allow HDR picking from Compositor background: rBb628f765b091: D2608: Allow HDR picking from Compositor background Replaced some STREQ(snode….
Thu, Apr 20, 10:33 PM
Stefan Werner (swerner) added an edge to rBb628f765b091: D2608: Allow HDR picking from Compositor background Replaced some STREQ(snode…: D2608: Allow HDR picking from Compositor background.
Thu, Apr 20, 10:33 PM
Stefan Werner (swerner) committed rBb628f765b091: D2608: Allow HDR picking from Compositor background Replaced some STREQ(snode… (authored by Stefan Werner (swerner)).
D2608: Allow HDR picking from Compositor background Replaced some STREQ(snode…
Thu, Apr 20, 10:32 PM

Fri, Apr 7

Stefan Werner (swerner) updated the diff for D2607: Switch eye dropper to use linear color space internally.
Fri, Apr 7, 10:33 PM
Stefan Werner (swerner) added a comment to D2608: Allow HDR picking from Compositor background.

While I'm at it, shouldn't we also replace the various STREQ(snode->tree_idname, ...) calls in node_group.c with the ED_node_is_*() calls for improved readability?

Fri, Apr 7, 10:19 PM
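
The readability point in the comment above can be illustrated with a minimal, self-contained sketch. `SpaceNode` here is a simplified stand-in, the idname strings are assumed values, and `node_is_compositor()` / `node_is_shader()` are hypothetical helpers modeled on the `ED_node_is_*()` naming, not Blender's actual implementation.

```cpp
#include <cstring>

// Simplified stand-in for Blender's SpaceNode (assumption: only the field used here).
struct SpaceNode {
    char tree_idname[64];
};

// Assumed idname strings; in Blender these come from the registered tree types.
static const char *const kCompositorIdname = "CompositorNodeTree";
static const char *const kShaderIdname = "ShaderNodeTree";

// Equivalent of a STREQ-style comparison.
inline bool str_equal(const char *a, const char *b)
{
    return std::strcmp(a, b) == 0;
}

// Hypothetical predicates: the call site states intent instead of
// repeating the string comparison.
inline bool node_is_compositor(const SpaceNode *snode)
{
    return str_equal(snode->tree_idname, kCompositorIdname);
}

inline bool node_is_shader(const SpaceNode *snode)
{
    return str_equal(snode->tree_idname, kShaderIdname);
}
```

A call site then reads `if (node_is_compositor(snode))` rather than spelling out the comparison each time.
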
Stefan Werner (swerner) created D2608: Allow HDR picking from Compositor background.
Fri, Apr 7, 10:15 PM
Stefan Werner (swerner) updated the summary of D2607: Switch eye dropper to use linear color space internally.
Fri, Apr 7, 9:57 PM
Stefan Werner (swerner) updated the summary of D2607: Switch eye dropper to use linear color space internally.
Fri, Apr 7, 9:56 PM
Stefan Werner (swerner) created D2607: Switch eye dropper to use linear color space internally.
Fri, Apr 7, 9:52 PM

Wed, Apr 5

Stefan Werner (swerner) added a comment to D2443: Render API/Cycles: Identify Render Passes by their Name instead of a type flag.

What will it take to get this and D2444 into master? I'd love to be able to have the foundation for many other passes (light groups, Cryptomatte, eventually even LPEs) to be added to Blender and Cycles.

Wed, Apr 5, 2:54 PM

Wed, Mar 29

Stefan Werner (swerner) added a comment to D2586: Cycles: Make all #include statements relative to cycles source directory.

This is also very useful for 3rd party integrations. I've been in a situation before where there were two headers called node.h in the header search path.

Wed, Mar 29, 12:50 PM

Mar 23 2017

Stefan Werner (swerner) added a comment to D2575: Cycles: WIP: Implement UDIM texture node.

We've been on OIIO 1.7.8 since early December.

Mar 23 2017, 3:22 PM
Stefan Werner (swerner) added a comment to D2575: Cycles: WIP: Implement UDIM texture node.

For support in OSL, it should only take an update to OpenImageIO 1.7 or newer:

Mar 23 2017, 2:25 PM

Mar 22 2017

Stefan Werner (swerner) updated the diff for D2056: Allow CUDA GPU rendering to use host memory.

This is an update of the patch against the latest master. It still needs to be changed to share host memory between GPUs where possible instead of creating duplicate allocations.

Mar 22 2017, 10:58 PM · Cycles
Stefan Werner (swerner) committed rB412220c8d3b7: Cycles: fixed warnings (authored by Stefan Werner (swerner)).
Cycles: fixed warnings
Mar 22 2017, 12:28 PM

Feb 17 2017

Stefan Werner (swerner) awarded D2348: Cycles: Refactor split kernel and implement for CPU a Like token.
Feb 17 2017, 11:37 AM

Oct 6 2016

Stefan Werner (swerner) added a comment to D2226: Cycles: Speedup transparent shadows on CUDA.

I have experimented in the same direction. My code was using shared memory, which should in theory be faster than global/local memory. As Brecht mentions, moving through shadow in steps of N intersections at a time could make this work for arbitrary shadow depths. One would need to find the N closest intersections then, which I would try by keeping the intersections in a sorted heap (similar to Jensen's photon mapping code). Then sorting and volumes should also be doable.

Oct 6 2016, 10:03 AM
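
The "N closest intersections in a sorted heap" idea from the comment above can be sketched on the host side as follows. This is only an illustration under stated assumptions: `Hit` and `ClosestHits` are hypothetical names, the code uses standard-library heap functions instead of shared memory, and it is not the code from D2226.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical minimal intersection record.
struct Hit {
    float t;   // distance along the ray
    int prim;  // primitive index (illustrative only)
};

// Keeps the N closest hits seen so far in a max-heap keyed on t.
// The farthest retained hit sits at the front, so it can be replaced
// in O(log N) whenever a closer hit arrives.
class ClosestHits {
public:
    explicit ClosestHits(std::size_t capacity) : capacity_(capacity)
    {
        hits_.reserve(capacity);
    }

    void insert(const Hit &hit)
    {
        if (capacity_ == 0) {
            return;
        }
        if (hits_.size() < capacity_) {
            hits_.push_back(hit);
            std::push_heap(hits_.begin(), hits_.end(), closer);
        }
        else if (hit.t < hits_.front().t) {
            // Evict the current farthest hit and insert the new one.
            std::pop_heap(hits_.begin(), hits_.end(), closer);
            hits_.back() = hit;
            std::push_heap(hits_.begin(), hits_.end(), closer);
        }
    }

    // Returns a copy of the retained hits ordered from near to far.
    std::vector<Hit> sorted() const
    {
        std::vector<Hit> out = hits_;
        std::sort(out.begin(), out.end(), closer);
        return out;
    }

private:
    // Less-than on distance; with std::push_heap this yields a max-heap on t.
    static bool closer(const Hit &a, const Hit &b)
    {
        return a.t < b.t;
    }

    std::size_t capacity_;
    std::vector<Hit> hits_;
};
```

Processing a shadow ray in batches would then mean filling the heap, shading the sorted hits, and restarting the traversal just past the farthest retained distance.
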

Jul 21 2016

Stefan Werner (swerner) added inline comments to D1995: Parametric Geometry coordinates for point and area lights in Cycles.
Jul 21 2016, 3:20 AM · Cycles
Stefan Werner (swerner) updated the diff for D1995: Parametric Geometry coordinates for point and area lights in Cycles.

Here's an updated patch.

Jul 21 2016, 3:17 AM · Cycles

Jun 30 2016

Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

Looks like cuMemAllocHost() is not the same as malloc() followed by mlock(). While malloc/mlock easily allows me to get 14 out of 16GB on my machine, cuMemAllocHost() freezes the entire machine at 12GB.

Jun 30 2016, 7:53 AM · Cycles
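
For reference, the `malloc()` followed by `mlock()` path mentioned in the comment above looks roughly like this on a POSIX system. `LockedBuffer` is a hypothetical wrapper, not code from the patch, and memory locked this way is not automatically known to the CUDA driver (it would still need to be registered, for example with `cuMemHostRegister()`).

```cpp
#include <cstddef>
#include <cstdlib>
#include <sys/mman.h>

// Hypothetical RAII wrapper: a heap allocation that is page-locked with mlock().
class LockedBuffer {
public:
    explicit LockedBuffer(std::size_t bytes)
        : data_(std::malloc(bytes)), bytes_(bytes), locked_(false)
    {
        if (data_ != nullptr) {
            // mlock() can fail, e.g. when RLIMIT_MEMLOCK is too low;
            // the buffer is then still usable, just not pinned.
            locked_ = (mlock(data_, bytes_) == 0);
        }
    }

    ~LockedBuffer()
    {
        if (locked_) {
            munlock(data_, bytes_);
        }
        std::free(data_);
    }

    LockedBuffer(const LockedBuffer &) = delete;
    LockedBuffer &operator=(const LockedBuffer &) = delete;

    void *data() const { return data_; }
    std::size_t size() const { return bytes_; }
    bool locked() const { return locked_; }

private:
    void *data_;
    std::size_t bytes_;
    bool locked_;
};
```

Callers should check `locked()` rather than assume the lock succeeded, since the limit on locked memory is set per process by the operating system.
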

Jun 29 2016

Stefan Werner (swerner) added a comment to D2032: Cycles: Implement stackless BVH traversal.

Right now ray traversal is not making any use of shared memory at all. There may be an opportunity here, either for storing a short stack or the bit mask from the Barringer paper in shared memory. If I understood the Barringer paper correctly, they compared a CPU implementation only. The differences between global/local memory access and shared memory access could give different results in a GPU implementation.

Jun 29 2016, 12:25 PM
Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

The change to SHADOW_RECORD_ALL wasn't meant as a permanent solution, but to compare how GPU performance would be in the largest Blender scene available to me (will happily render any other scenes). When SHADOW_RECORD_ALL is different for CPU and GPU, we're not tracing the same number of rays, and benchmarks with high levels of transparent shadows will always perform better on the CPU.

Jun 29 2016, 7:36 AM · Cycles

Jun 22 2016

Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

My computer just finished rendering the Victor/Gooseberry benchmark scene on two GPUs, coming in at about 1h:36m (conservative tile size of 64x64). So I think my patch is pretty solid.

Jun 22 2016, 4:00 PM · Cycles

Jun 19 2016

Stefan Werner (swerner) updated subscribers of D1985: Light Linking.
Jun 19 2016, 8:07 PM · Cycles

Jun 16 2016

Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

There is no swapping with pinned memory. That's the whole point about pinned memory - it always remains fixed in physical memory, at the same address, no matter how severe the OS' memory pressure is. That said, it was an SSD.

Jun 16 2016, 2:50 PM · Cycles
Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

Octane exposes what they call "out-of-core" textures, which I assume is textures in pinned memory, as a feature that the user has to enable and to pick how much memory it is supposed to use: https://docs.otoy.com/Standalone_2_0/?page_id=3216

Jun 16 2016, 11:32 AM · Cycles
Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

It might also make sense to use pinned memory for all device_vector allocations even if we want to copy memory to the GPU. That way we can use async copies which should be faster, particularly for multi GPU.

Jun 16 2016, 7:44 AM · Cycles
Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

Different operating systems may handle this differently, but for what it's worth, I ran a simple test on OS X: cuMemAllocHost() in a loop, allocating 200MB chunks, 100 times, which should try to allocate ~20GB in total. My machine has 16GB of physical memory. At just under 10GB allocated, the machine froze completely. That is, not even the mouse pointer would move and the machine no longer responded to network requests. After about one or two minutes of being frozen, the machine rebooted, the "crash" log pointing to the Watchdog task.

Jun 16 2016, 7:38 AM · Cycles

Jun 15 2016

Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

From the CUDA documentation it sounds like pinning all available memory may not be a good idea:

Jun 15 2016, 10:00 PM · Cycles
Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

It only pins the amount of memory required, no more. The user preference would only set an upper bound to that.

Jun 15 2016, 7:59 PM · Cycles
Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

A couple of ideas for improvement:

Jun 15 2016, 7:06 PM · Cycles
Stefan Werner (swerner) updated the diff for D2056: Allow CUDA GPU rendering to use host memory.

This should now include all changes, squashed into a single commit.

Jun 15 2016, 4:38 PM · Cycles
Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

Yes. I'm still trying to figure out how this system and git patches work. It looks like when I try to upload my diff file with multiple commits in it, it takes only the first one.

Jun 15 2016, 4:30 PM · Cycles
Stefan Werner (swerner) added a comment to D2056: Allow CUDA GPU rendering to use host memory.

This seems to be solid enough to allow me to launch a render of the Gooseberry benchmark scene on my 4GB GPU with TDR turned off. However, some tiles render extremely slowly (counting seconds per sample instead of samples per second!); I haven't found out yet what crazy things are happening in them.

Jun 15 2016, 4:20 PM · Cycles
Stefan Werner (swerner) added a comment to T48651: Allow CUDA GPU rendering to use host memory.

OK, here you are: https://developer.blender.org/D2056

Jun 15 2016, 4:17 PM · Cycles
Stefan Werner (swerner) retitled D2056 to Allow CUDA GPU rendering to use host memory.
Jun 15 2016, 4:17 PM · Cycles
Stefan Werner (swerner) added a comment to T48651: Allow CUDA GPU rendering to use host memory.

I have attached a patch that should hopefully take care of the Linux memory query. Apply this on top of the first patch.

Jun 15 2016, 12:25 PM · Cycles
Stefan Werner (swerner) added a comment to T48651: Allow CUDA GPU rendering to use host memory.

The code for determining the amount of physical system memory is not implemented yet for Linux - I am about to create a Linux VM to implement and test that part. So it is possible that the current patch is not working as intended on Linux systems.

Jun 15 2016, 8:42 AM · Cycles
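
One common way to query the amount of physical system memory on Linux is via `sysconf()`. The sketch below is an assumption about how such a query could look, not the code from the attached patch; `physical_memory_bytes()` is a hypothetical helper name, and `_SC_PHYS_PAGES` is a glibc extension rather than part of POSIX.

```cpp
#include <cstdint>
#include <unistd.h>

// Returns the total physical RAM in bytes, or 0 if the query fails.
inline std::uint64_t physical_memory_bytes()
{
    const long pages = sysconf(_SC_PHYS_PAGES);    // number of physical pages
    const long page_size = sysconf(_SC_PAGESIZE);  // bytes per page
    if (pages <= 0 || page_size <= 0) {
        return 0;
    }
    return static_cast<std::uint64_t>(pages) * static_cast<std::uint64_t>(page_size);
}
```

An upper bound for pinned host memory could then be derived from this value, for instance as a fraction of it.
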

Jun 14 2016

Stefan Werner (swerner) added a comment to T48651: Allow CUDA GPU rendering to use host memory.

I can get the Victor scene to start rendering on the GPU, but it fails with a kernel timeout after about 20 tiles (using the default small tile size). Maybe someone with TDR turned off can try benchmarking it?

Jun 14 2016, 10:54 PM · Cycles
Stefan Werner (swerner) added a comment to T48651: Allow CUDA GPU rendering to use host memory.

PCIe bandwidth seems to be the main factor here. When I move the K5000 to a slower slot (x4 instead of x16), performance drops dramatically. I haven't had the patience to let it run all the way through, but the BMW scene with all data in host memory is still at the first tile after 4 minutes, showing a remaining estimate of over one hour.

Jun 14 2016, 6:08 PM · Cycles
Stefan Werner (swerner) added a comment to T48651: Allow CUDA GPU rendering to use host memory.

Well, the answer is a clear "it depends". My desktop machine has a single i5 CPU and a K5000 GPU, so the GPU can take quite a performance penalty before the CPU overtakes it. My laptop has an i7 quad core and a 750M; there the GPU/CPU difference is not as strong. I know, K5000 sounds like a monster card, but for this purpose, it's more or less a GTX 670 with twice the memory.

Jun 14 2016, 2:53 PM · Cycles
Stefan Werner (swerner) added a comment to T48651: Allow CUDA GPU rendering to use host memory.

Jun 14 2016, 10:45 AM · Cycles
Stefan Werner (swerner) created T48651: Allow CUDA GPU rendering to use host memory.
Jun 14 2016, 10:44 AM · Cycles

Jun 13 2016

Stefan Werner (swerner) committed rB2566652ae6c0: Cycles: fixed a typo that would crash shaders that use the "Is Diffuse Ray"… (authored by Stefan Werner (swerner)).
Cycles: fixed a typo that would crash shaders that use the "Is Diffuse Ray"…
Jun 13 2016, 1:34 PM

May 24 2016

Stefan Werner (swerner) awarded D1999: Cycles: Add support for bindless textures. a Like token.
May 24 2016, 10:21 AM

May 19 2016

Stefan Werner (swerner) retitled D2008 to Fixed a rare case of NaN in Cycles.
May 19 2016, 10:51 AM · Cycles

May 18 2016

Stefan Werner (swerner) updated subscribers of D1999: Cycles: Add support for bindless textures..
May 18 2016, 8:21 AM
Stefan Werner (swerner) updated subscribers of D2002: Cycles: Add multi-scattering, energy-conserving GGX as an option to the Glossy, Anisotropic and Glass BSDFs.
May 18 2016, 8:21 AM
Stefan Werner (swerner) updated subscribers of D2003: Cycles: Add a new Metallic BSDF, combining conductive fresnel and multi-scattering GGX.
May 18 2016, 8:21 AM

May 17 2016

Stefan Werner (swerner) updated the diff for D1995: Parametric Geometry coordinates for point and area lights in Cycles.

This updated patch removes the now unused ray_quad_intersect() and ray_triangle_intersect().

May 17 2016, 3:16 PM · Cycles

May 16 2016

Stefan Werner (swerner) added a comment to D1995: Parametric Geometry coordinates for point and area lights in Cycles.

Here is a before/after comparison with an area light:


The same shader on a point light:

May 16 2016, 1:02 PM · Cycles

May 15 2016

Stefan Werner (swerner) retitled D1995 to Parametric Geometry coordinates for point and area lights in Cycles.
May 15 2016, 9:16 PM · Cycles

Nov 27 2015

Stefan Werner (swerner) added a comment to D1621: Cycles: reduced memory usage of subsurface scattering.

Sorry for not responding earlier, there were a number of other things to work on. Sergey, is this still relevant or do your latest changes to SSS take care of this?

Nov 27 2015, 4:24 PM · Cycles
Stefan Werner (swerner) added a comment to T46760: Branched Path Tracing converges to different result than plain Path Tracing.

Thanks!

Nov 27 2015, 4:21 PM · BF Blender, Cycles

Nov 20 2015

Stefan Werner (swerner) added a comment to T46760: Branched Path Tracing converges to different result than plain Path Tracing.

Here we go. This would be my proposed fix. I hope my code style isn't too far from your standards.

Nov 20 2015, 5:45 PM · BF Blender, Cycles

Nov 19 2015

Stefan Werner (swerner) added a comment to D1621: Cycles: reduced memory usage of subsurface scattering.

Querying for CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES (via cuFuncGetAttribute) on a machine with SM 5.2, returns 89024 bytes before the patch and 75056 after the patch. So it's about 15% less local memory per kernel thread.

Nov 19 2015, 12:57 PM · Cycles
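
The "about 15%" figure follows directly from the two local-memory sizes quoted in the comment above. A short check of the arithmetic, with hypothetical helper names:

```cpp
#include <cstddef>

// Local memory per kernel thread as reported in the comment (bytes).
constexpr std::size_t kLocalBytesBefore = 89024;
constexpr std::size_t kLocalBytesAfter = 75056;

// Absolute saving in bytes per thread.
constexpr std::size_t saved_bytes(std::size_t before, std::size_t after)
{
    return before - after;
}

// Relative saving as a fraction of the original size.
constexpr double saved_fraction(std::size_t before, std::size_t after)
{
    return static_cast<double>(before - after) / static_cast<double>(before);
}
```

With these inputs the saving is 13968 bytes per thread, a little under 16% of the original size.
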
Stefan Werner (swerner) added a comment to D1621: Cycles: reduced memory usage of subsurface scattering.

Never mind, I have to dampen expectations: sizeof(ShaderData) is ~5kB, so the amount of memory saved is about 14kB/thread.

Nov 19 2015, 11:33 AM · Cycles

Nov 18 2015

Stefan Werner (swerner) added a comment to D1621: Cycles: reduced memory usage of subsurface scattering.

I'm away from my dev machine right now, but I think that sizeof(ShaderData) is somewhere in the range of 20kB-40kB. The patch should save three instances of ShaderData, so it's reducing local memory usage in the ballpark of 80kB per thread. Don't quote me on those exact numbers though, I can look it up in more detail tomorrow.

Nov 18 2015, 9:12 PM · Cycles
Stefan Werner (swerner) added a comment to T46814: Cycles: reduced memory usage of subsurface scattering.

Sure, no problem:
https://developer.blender.org/D1621

Nov 18 2015, 7:11 PM · Cycles
Stefan Werner (swerner) retitled D1621 to Cycles: reduced memory usage of subsurface scattering.
Nov 18 2015, 7:11 PM · Cycles
Stefan Werner (swerner) created T46814: Cycles: reduced memory usage of subsurface scattering.
Nov 18 2015, 6:05 PM · Cycles
Stefan Werner (swerner) added a comment to T46760: Branched Path Tracing converges to different result than plain Path Tracing.

The ground truth would be the path traced result without MIS, which matches PT with MIS and BPT without MIS.

Nov 18 2015, 10:52 AM · BF Blender, Cycles

Nov 17 2015

Stefan Werner (swerner) added a comment to T46760: Branched Path Tracing converges to different result than plain Path Tracing.

Brecht, I'm not sure I'm following you. In my opinion, the path tracing integrator gives the correct result, where diffuse is uniform; the branched path tracer is incorrect and darkens diffuse in the presence of specular.

Nov 17 2015, 6:28 PM · BF Blender, Cycles

Nov 13 2015

Stefan Werner (swerner) created T46760: Branched Path Tracing converges to different result than plain Path Tracing.
Nov 13 2015, 1:50 PM · BF Blender, Cycles

Jul 16 2015

Stefan Werner (swerner) added a comment to T45447: Area light importance sampling improvement.

Excellent, thanks!

Jul 16 2015, 10:04 AM · BF Blender, Cycles

Jul 15 2015

Stefan Werner (swerner) created T45447: Area light importance sampling improvement.
Jul 15 2015, 5:01 PM · BF Blender, Cycles
Stefan Werner (swerner) added a comment to T38279: Improve Cycles standalone.

JSON or XML should be irrelevant, both are going to be equally easy to read or write with a decent library. I hope nobody has intentions of reinventing the wheel by writing yet another XML parser!

Jul 15 2015, 1:13 PM · BF Blender, Cycles

Mar 31 2015

Stefan Werner (swerner) updated subscribers of D1200: Cycles OpenCL kernel-splitting work.
Mar 31 2015, 9:36 AM · Cycles