
George Kyriazis (kyriazis)
Disabled

Projects

User Details

User Since
Oct 6 2014, 5:18 PM (254 w, 15 h)
Roles
Disabled

Recent Activity

May 10 2015

Pablo Vazquez (pablovazquez) awarded rB7f4479da425b: Cycles: OpenCL kernel split a Like token.
May 10 2015, 6:32 PM
Carlo Andreacchio (candreacchio) awarded rB7f4479da425b: Cycles: OpenCL kernel split a Like token.
May 10 2015, 4:08 AM

May 9 2015

Germano Cavalcante (mano-wii) awarded rB7f4479da425b: Cycles: OpenCL kernel split a Love token.
May 9 2015, 6:06 PM
Martijn Berger (juicyfruit) awarded rB7f4479da425b: Cycles: OpenCL kernel split a Like token.
May 9 2015, 5:37 PM
Thomas Dinges (dingto) awarded rB7f4479da425b: Cycles: OpenCL kernel split a Love token.
May 9 2015, 5:15 PM

Apr 12 2015

Dennis Brown (DBrown) awarded D1200: Cycles OpenCL kernel-splitting work a Love token.
Apr 12 2015, 6:38 PM · Cycles

Apr 6 2015

George Kyriazis (kyriazis) added a comment to T44197: Cycles OpenCL kernel-splitting work.

I also tested the original patch from D1200 on Windows 7 64-bit with an AMD Radeon HD 6950 2 GB, AMD Omega Driver 14.12.
I tried to render the standard cube.
Notes:
I can confirm Sv. Lockal's note that the sample count goes in a negative direction during rendering.

  • Viewport rendering doesn't work for me: Blender freezes and the display goes completely black. The only way to recover is to restart the computer.
  • Compiling the kernel works on my GPU (Console: "Device init sucess")
  • Rendering the standard cube in Blender produces a fully transparent image with black pixels at the corners of the tiles.

@George Kyriazis (kyriazis) Is it possible to get Cycles working on non-GCN architecture GPUs?

Apr 6 2015, 5:45 PM · Cycles, BF Blender
Grigore Florin (numarul7) awarded D1200: Cycles OpenCL kernel-splitting work a Like token.
Apr 6 2015, 4:00 AM · Cycles

Mar 31 2015

George Kyriazis (kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.

@Brecht Van Lommel (brecht), nice to hear from you!
For CUDA, yes, it's the "biggest" node that matters. But AMD OpenCL inlines all the functions (@George Kyriazis (kyriazis) please correct me if my information is totally outdated), so adding more nodes will likely put more pressure on registers, I'm afraid.

Yes, that's exactly the problem. Too many things for the optimizer to juggle. At least for the current drivers, that is.

Mar 31 2015, 3:06 AM · Cycles
George Kyriazis (kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.

Thanks for the comment. Hmm, if you split the kernel into a few (3-4) kernels because of svm_eval_nodes(), don't you end up needing to execute all of them anyway, because you have no guarantee that all the nodes you need will be in the same kernel?
For example, if you split into 3 kernels as follows: kernel 1 has cases ABC, kernel 2 has cases DEF and kernel 3 has cases GHI, then what happens if your shader requires nodes ADEG? You'll end up needing to go through all 3 kernels. (Is that what you mean by splitting?)

I meant one kernel that has ABC, one that has ABCDEF, and one that has ABCDEFGHI, where GHI would be the nodes that require the most registers, etc.
I was thinking that if the number of possible kernels that ever need to be compiled is some reasonable fixed number, then they could all be compiled and cached once, so compile time is not as big a concern. But thinking about it more, there are probably other #defines influencing the SVM nodes too, so the number may be too big to cache it all beforehand.
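
The nested-superset idea can be illustrated with a small Python sketch. This is not Cycles code: the tier names, the single-letter node names and the `pick_kernel` helper are all hypothetical, and the tiers simply mirror the ABC / ABCDEF / ABCDEFGHI example above. The point is that a shader needing ADEG maps to exactly one kernel (the largest), instead of three separate ones.

```python
# Hypothetical tiers: each one is a superset of the previous one,
# ordered from the cheapest kernel to the most register-hungry one.
KERNEL_TIERS = [
    ("small", frozenset("ABC")),
    ("medium", frozenset("ABCDEF")),
    ("large", frozenset("ABCDEFGHI")),
]


def pick_kernel(required_nodes):
    """Return the name of the first (smallest) tier covering all required nodes."""
    required = frozenset(required_nodes)
    for name, nodes in KERNEL_TIERS:
        if required <= nodes:  # subset test: every required node is compiled in
            return name
    missing = sorted(required - KERNEL_TIERS[-1][1])
    raise ValueError(f"no kernel covers nodes: {missing}")


print(pick_kernel("AB"))    # only cheap nodes are needed
print(pick_kernel("ADEG"))  # one expensive node forces the largest tier
```

Because the tiers are nested, only one kernel ever runs per shader, at the price of compiling the cheap nodes into every tier.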

Mar 31 2015, 12:57 AM · Cycles
George Kyriazis (kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.

Here's the issue: svm_eval_nodes() is a huge case statement. This translates into a huge if-then-else statement. The optimizer tries to optimize the whole thing in one batch, which makes compilation slow. In addition, the generated code is "less than optimal", causing register spills, etc. By limiting the number of cases in the switch statement, we make the compiler's life easier, and you end up not going through a sequence of if-then-if-then-if-then-.. too often.

That's interesting, I thought the compiler would be able to do similar optimizations as on the CPU, something like a jump table. I guess one way to speed it up would be to do a kind of binary search with nested if / else, that at least makes it logarithmic, though that's not ideal yet.
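
To make the cost difference concrete, here is a minimal Python sketch (not Cycles or OpenCL code) that counts comparisons for a linear if/else-if chain versus the nested binary-search dispatch suggested above. It assumes node ids are sorted integers; both function names are hypothetical.

```python
def linear_dispatch(node_ids, target):
    """Walk an if/else-if chain; return (index, comparisons made)."""
    comparisons = 0
    for i, node_id in enumerate(node_ids):
        comparisons += 1
        if node_id == target:
            return i, comparisons
    return -1, comparisons


def binary_dispatch(node_ids, target):
    """Emulate nested if/else over sorted ids; return (index, comparisons made)."""
    lo, hi = 0, len(node_ids) - 1
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if target <= node_ids[mid]:
            hi = mid        # target is in the lower half
        else:
            lo = mid + 1    # target is in the upper half
    comparisons += 1  # final equality check at the leaf
    if node_ids and node_ids[lo] == target:
        return lo, comparisons
    return -1, comparisons


ids = list(range(64))  # 64 hypothetical node ids
print(linear_dispatch(ids, 63))
print(binary_dispatch(ids, 63))
```

For 64 cases, the worst case drops from 64 comparisons to 7. Whether a GPU compiler actually benefits depends on how it handles the nested branches, which this sketch does not model.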

Mar 31 2015, 12:52 AM · Cycles
George Kyriazis (kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.

Regarding SVM nodes, my experience with CUDA was that for run time, it's only the "biggest" node that matters. Having many nodes isn't really a problem by itself, it's not going to make the code more divergent. But the biggest node determines how many registers you need, and how much data needs to be moved between registers and global memory, and that has an effect on code that you typically wouldn't think is affected.
That happens in general: if you have multiple types of geometric primitives, multiple types of samplers, and so on, you can add new ones without slowing down the kernel as long as their code is smaller than the code for all of the existing ones.
So if the kernels are cached and keeping the number of nodes small to reduce compile time is not the concern, then perhaps it makes sense to make a few groups of nodes according to their size, instead of doing it for individual node types. Then you could have maybe 3 or 4 cached kernels, which isn't too much to store on disk. I'm not sure if it's easy to find those groups, or if that would actually help with AMD OpenCL.
To avoid duplicate shader globals member definitions, the same ugly trick as done with kernel_textures.h can be used: make a file that declares all the members and then include it in various places with different macro definitions.
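
As a rough illustration of the size-based grouping suggested above, the following Python sketch sorts nodes by an estimated register cost and builds cumulative tiers. Everything here is an assumption made for illustration: the `build_tiers` helper, the node names and the cost numbers are invented, and real register usage would have to be measured per device and driver.

```python
def build_tiers(register_cost, num_tiers):
    """Split nodes into at most num_tiers cumulative groups, cheapest first.

    Returns a list of (nodes, tier_cost) pairs, where tier_cost is the cost
    of the biggest node in the tier (the one that dominates register usage).
    """
    ordered = sorted(register_cost, key=lambda n: (register_cost[n], n))
    if not ordered:
        return []
    size = -(-len(ordered) // num_tiers)  # ceiling division
    tiers = []
    for end in range(size, len(ordered) + size, size):
        nodes = ordered[:end]  # each tier also contains all cheaper nodes
        tiers.append((nodes, max(register_cost[n] for n in nodes)))
    return tiers


# Invented numbers, for illustration only.
costs = {"math": 6, "mix": 8, "bump": 20, "tex_image": 24, "noise": 40, "voronoi": 48}
for nodes, cost in build_tiers(costs, 3):
    print(cost, nodes)
```

The tier cost is taken as the maximum, not the sum, which reflects the observation that only the biggest node matters for register pressure.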

Mar 31 2015, 12:32 AM · Cycles
George Kyriazis (kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.

Viewport is actually much the same as a final render set to "Progressive Refine", and that option is something artists want to have. If it's supported for final renders then the viewport should be a piece of cake to support as well. So I think a rough plan in this context is something like:

  • Look into Progressive Refine for final renders
  • Once that works, look into what remains to be done for the viewport
  • Look into reducing delays

Out of curiosity: what's the time difference between a kernel compiled with only the needed nodes and a kernel compiled with all the nodes? (I don't have AMD cards handy, and I'm afraid Intel CPU and NVIDIA OpenCL are not really the best things for benchmarks at this point.)

Mar 31 2015, 12:21 AM · Cycles

Mar 30 2015

George Kyriazis (kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.

Some comments here, and Lenny will reply on some other issues in more detail.

Made a preliminary check of the new code. There are some major issues which need to be addressed.

  • You totally violated data flow in rendering: kernels are to be loaded before doing device_update(), otherwise it's just asking for huge problems (e.g. memory allocation issues due to missing modules and so on). I'm not sure why OpenCL is happy with this, but it's just crashing CUDA.

Is that the one that you said to ignore in a later comment?

No, the thing I said to ignore was about forced #undefs. The device_update() issue is still there. There might be a rather simple solution: make it a utility function in ShaderManager which will either give you all node types used or directly fill them into the given device.
I'm still trying to find a better way to communicate such features to the device, but haven't come up with anything better looking.

  • What you're doing with nodes (selective SVM node compilation) is fine for final renders, but for the viewport it'll need to be the union of all nodes ever used during the rendered viewport session. This is because you don't want kernels to be re-{loaded,compiled} when you're tweaking the shader tree.

Hmm.. The reason we are doing this is to simplify the kernel (similar stuff may need to be done inside the bsdf function) and allow for faster code and faster compilation. In your mind, if we were to recompile on-the-fly, what is the maximum delay (in seconds) that is acceptable between someone changing the material, and the preview starting?

The goal should be 0.0 sec of delay :) It's a bit of a research topic perhaps, but I think it's worth trying to make it so that once a node has been added to the device it never gets removed during the BlenderSession. This way you can safely tweak all the connections in the shader tree without triggering a kernel re-load.
On the other hand, if the latency caused by a kernel re-load (reloading an already compiled kernel; let's set compilation time aside, since after some hours of work artists will have all sorts of kernels compiled anyway :) ) is not perceptible to a human, the current code will work too.
Perhaps we'd better concentrate on the viewport render first and solve latency after that (when latency is visible it's easier to address it anyway).
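
The "never remove a node during the session" policy described above can be sketched in a few lines of Python. This is only an illustration under assumed names: `SessionNodeSet` and the node names are hypothetical and do not correspond to actual Cycles classes. The set of compiled-in nodes only grows, so a kernel reload is needed only when a node type appears for the first time.

```python
class SessionNodeSet:
    """Tracks the union of node types seen during one (hypothetical) session."""

    def __init__(self):
        self.nodes = frozenset()
        self.reloads = 0

    def update(self, nodes_in_use):
        """Merge in the current shader's nodes; return True if a reload is needed."""
        merged = self.nodes | frozenset(nodes_in_use)
        if merged == self.nodes:
            return False  # nothing new: keep the already loaded kernel
        self.nodes = merged
        self.reloads += 1
        return True


session = SessionNodeSet()
print(session.update({"diffuse", "mix"}))  # first shader: reload needed
print(session.update({"diffuse"}))         # node removed from tree: no reload
print(session.update({"glossy"}))          # new node type: reload needed
print(session.update({"mix", "glossy"}))   # rewiring known nodes: no reload
```

The trade-off is that the kernel can only get bigger over a session, which runs against the goal of keeping the node set small for faster code.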

How about allowing the option to use OpenCL for final rendering, while still going down the CPU path for preview?

Mar 30 2015, 11:27 PM · Cycles
George Kyriazis (kyriazis) added a comment to T44197: Cycles OpenCL kernel-splitting work.

I get a repeatable crash with OpenCL on Intel with a simple file.
Open the .blend and start a render with F12; it crashes before the last tile finishes.
Hash: rB30c689ff7f10
Opensuse 13.2/64
Intel i5 3770K
GTX 760 4 GB
Driver 349.12
opencl-1.2-5.0.0.57



Cheers, mib

Mar 30 2015, 10:50 PM · Cycles, BF Blender
George Kyriazis (kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.

Hi Sergey,

Mar 30 2015, 10:42 PM · Cycles

Mar 28 2015

xueke pei (yuzukyo) awarded D1200: Cycles OpenCL kernel-splitting work a Love token.
Mar 28 2015, 9:12 PM · Cycles

Mar 27 2015

Sv. Lockal (lockal) awarded D1200: Cycles OpenCL kernel-splitting work a The World Burns token.
Mar 27 2015, 3:03 PM · Cycles

Mar 26 2015

George Kyriazis (kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.
  1. When using some mix materials containing transparent shader, the transparency doesn't work correctly (we can see through those materials, but direct light doesn't go through).

This is related to the TRANSPARENT_SHADOWS feature, which is turned off (see kernel/kernel_types.h) in this patch. There are still some unsolved issues with supporting this feature on AMD devices, but it should work just fine on NVIDIA.

TRANSPARENT_SHADOWS is working in the official 2.74 RC3 with Catalyst 15.3, but you certainly know better than I do what has to improve on the driver side.
What about the black render/crash with viewport render and some tile size? Known bug or should we report it somewhere?

Mar 26 2015, 4:55 PM · Cycles
George Kyriazis (kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.

Tested a few tile sizes.
Found that x=480, y=540 in the BMW scene gets me render times of 31.98 sec.

Mar 26 2015, 4:52 PM · Cycles

Mar 25 2015

Daniel Salazar (zanqdo) awarded D1200: Cycles OpenCL kernel-splitting work a Love token.
Mar 25 2015, 9:35 PM · Cycles
Campbell Barton (campbellbarton) awarded D1200: Cycles OpenCL kernel-splitting work a Like token.
Mar 25 2015, 6:36 PM · Cycles
mathieu menuet (bliblubli) awarded D1200: Cycles OpenCL kernel-splitting work a Like token.
Mar 25 2015, 11:25 AM · Cycles
Nathan Letwory (jesterking) awarded D1200: Cycles OpenCL kernel-splitting work a Like token.
Mar 25 2015, 11:04 AM · Cycles
Martin Capitanio (capnm) awarded D1200: Cycles OpenCL kernel-splitting work a Like token.
Mar 25 2015, 10:36 AM · Cycles
derek barker (lordodin) awarded D1200: Cycles OpenCL kernel-splitting work a Love token.
Mar 25 2015, 4:22 AM · Cycles
GiantCowFIlms awarded D1200: Cycles OpenCL kernel-splitting work a Like token.
Mar 25 2015, 4:05 AM · Cycles
catlover2 (catlover2) awarded D1200: Cycles OpenCL kernel-splitting work a Like token.
Mar 25 2015, 2:13 AM · Cycles
Ellwood Zwovic (gandalf3) awarded D1200: Cycles OpenCL kernel-splitting work a Love token.
Mar 25 2015, 1:55 AM · Cycles
Sebastian Brachi (brachi) awarded D1200: Cycles OpenCL kernel-splitting work a Like token.
Mar 25 2015, 12:32 AM · Cycles

Mar 24 2015

George Kyriazis (kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.

Thank you guys for the (upcoming) review.

Mar 24 2015, 11:01 PM · Cycles
Sterling Roth (sterlingroth) awarded D1200: Cycles OpenCL kernel-splitting work a Like token.
Mar 24 2015, 9:08 PM · Cycles
Thomas Dinges (dingto) awarded D1200: Cycles OpenCL kernel-splitting work a Love token.
Mar 24 2015, 6:49 PM · Cycles
Germano Cavalcante (mano-wii) awarded D1200: Cycles OpenCL kernel-splitting work a Love token.
Mar 24 2015, 6:12 PM · Cycles
David Schrott (thelasthope) awarded D1200: Cycles OpenCL kernel-splitting work a Orange Medal token.
Mar 24 2015, 6:04 PM · Cycles
Jeffrey Hoover (italic_) awarded D1200: Cycles OpenCL kernel-splitting work a Yellow Medal token.
Mar 24 2015, 5:42 PM · Cycles
tuqueque tuquequin (tuqueque) awarded D1200: Cycles OpenCL kernel-splitting work a Love token.
Mar 24 2015, 5:39 PM · Cycles
George Kyriazis (kyriazis) retitled D1200 from an empty title to Cycles OpenCL kernel-splitting work.
Mar 24 2015, 4:20 PM · Cycles