Home

Activity Feed

Today

VincentG (Vincent Girès) created T44200: Node groups : group inputs have no effects inside group.
Tue, Mar 31, 2:15 AM · Compositing
dfelinto (Dalai Felinto) added a comment to rB56283d0af8f1: BGE: setUniformEyef need automatic refresh on every frame..

Hi Benoit, I think you may not even need this setUniformEyef. In Python you can get the current eye already, and then you can change whatever uniform accordingly.

Tue, Mar 31, 2:09 AM
greylica (Ivan Paulos Tomé) committed rBTS3423: New translations 100%.
New translations 100%
Tue, Mar 31, 1:43 AM
greylica (Ivan Paulos Tomé) committed rBTS3422: New translations 100%.
New translations 100%
Tue, Mar 31, 1:33 AM
campbellbarton (Campbell Barton) added a comment to D1183: Implement the delta mush modifier..

Committed to master rBc16a8983efba9ecacd8da408d03c37a55483e528

Tue, Mar 31, 1:29 AM
marcclintdion (marc dion) updated subscribers of T44197: Cycles OpenCL kernel-splitting work.
Tue, Mar 31, 1:27 AM · Cycles, BF Blender
campbellbarton (Campbell Barton) closed T43988: Delta Mush Modifier as "Resolved".

Committed rBc16a8983efba9ecacd8da408d03c37a55483e528

Tue, Mar 31, 1:23 AM · BF Blender
campbellbarton (Campbell Barton) committed rBc16a8983efba: Corrective Smooth Modifier (aka delta-mush) (authored by campbellbarton (Campbell Barton)).
Corrective Smooth Modifier (aka delta-mush)
Tue, Mar 31, 1:21 AM
Tunge (Hakan Tunc) updated subscribers of T28386: New Compositing Nodes: Vibrance, HCL, Pixelize, Posterize, ToonAA.
Tue, Mar 31, 1:05 AM · Nodes, BF Blender
kyriazis (George Kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.
In D1200#22424, @brecht wrote:

thanks for the comment. Hmm, if you split the kernel into a few (3-4) kernels because of svm_eval_nodes(), don't you end up needing to execute all of them anyway, because you have no guarantee that all the nodes you need will be in the same kernel?

For example, if you split into 3 kernels as follows: kernel 1 has cases ABC, kernel 2 cases DEF and kernel 3 cases GHI, then what happens if your shader requires nodes ADEG? You'll end up needing to go through all 3 kernels. (is that what you mean by splitting?)

I meant one kernel that has ABC, one that has ABCDEF, and one that has ABCDEFGHI, where GHI would be the nodes that requires the most registers, etc.

I was thinking that if the number of possible kernels that ever need to be compiled is some reasonable fixed number, then they could all be compiled and cache once, so compile time is not as a big a concern. But thinking about it more, there's probably other #defines influencing the SVM nodes too, so the number may be too big to cache it all beforehand.

Tue, Mar 31, 12:57 AM · Cycles
kyriazis (George Kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.
In D1200#22421, @brecht wrote:

Here's the issue: svm_eval_nodes() is a huge case statement. This translates into a huge if-then-else statement. the optimizer is trying to optimize the whole thing in one batch, causing compilation to be slow. In addition, the generated code is "less than optimal", causing register spills, etc.. By limiting the number of cases for the switch statement, we are making the compiler's life easier, and you end up not going through a sequence of if-then-if-then-if-then-.. too often.

That's interesting, I thought the compiler would be able to do similar optimizations as on the CPU, something like a jump table. I guess one way to speed it up would be to do a kind of binary search with nested if / else, that at least makes it logarithmic, though that's not ideal yet.

Tue, Mar 31, 12:52 AM · Cycles
campbellbarton (Campbell Barton) committed rB43b20d5b6c08: minor cleanup (authored by campbellbarton (Campbell Barton)).
minor cleanup
Tue, Mar 31, 12:50 AM
campbellbarton (Campbell Barton) committed rB3121619a7479: Double the strength of length-weight smoothing (authored by campbellbarton (Campbell Barton)).
Double the strength of length-weight smoothing
Tue, Mar 31, 12:48 AM
campbellbarton (Campbell Barton) committed rB0420384283e4: Optimize simple smoothing, approx 60% faster (authored by campbellbarton (Campbell Barton)).
Optimize simple smoothing, approx 60% faster
Tue, Mar 31, 12:41 AM
campbellbarton (Campbell Barton) committed rBf95d73fb5221: Comment tweaks (authored by campbellbarton (Campbell Barton)).
Comment tweaks
Tue, Mar 31, 12:41 AM
brecht (Brecht Van Lommel) added a comment to D1200: Cycles OpenCL kernel-splitting work.

thanks for the comment. Hmm, if you split the kernel into a few (3-4) kernels because of svm_eval_nodes(), don't you end up needing to execute all of them anyway, because you have no guarantee that all the nodes you need will be in the same kernel?

For example, if you split into 3 kernels as follows: kernel 1 has cases ABC, kernel 2 cases DEF and kernel 3 cases GHI, then what happens if your shader requires nodes ADEG? You'll end up needing to go through all 3 kernels. (is that what you mean by splitting?)

Tue, Mar 31, 12:37 AM · Cycles
kyriazis (George Kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.
In D1200#22419, @brecht wrote:

Regarding SVM nodes, my experience with CUDA was that for run time, it's only the "biggest" node that matters. Having many nodes isn't really a problem by itself, it's not going to make the code more divergent. But the biggest node determines how many registers you need, and how much data needs to be moved between registers and global memory, and that has an effect on code that you typically wouldn't think is affected.

That happens in general, if you have multiple types of geometric primitives, multiples types of samplers, .. you can add new ones without slowing down the kernel as long as their code is smaller than code for all of the existing ones.

So if the kernels are cached and keeping the amount of nodes small to reduce compile time is not the concern, then perhaps it makes sense make a few groups of nodes according to their size, instead of doing it for individual node types. Then you could have maybe 3 or 4 cached kernels which isn't too big to store on disk. I'm not sure if it's easy to find those groups, or if that would actually help with AMD OpenCL.

To avoid duplicate shader globals member definitions, the same ugly trick as done with kernel_textures.h can be used, make a file that declares all the members and then include it in various places with different macro definitions.

Tue, Mar 31, 12:32 AM · Cycles
sergey (Sergey Sharybin) added a comment to D1200: Cycles OpenCL kernel-splitting work.

@brecht, nice to hear from you!

Tue, Mar 31, 12:27 AM · Cycles
brecht (Brecht Van Lommel) added a comment to D1200: Cycles OpenCL kernel-splitting work.

Here's the issue: svm_eval_nodes() is a huge case statement. This translates into a huge if-then-else statement. the optimizer is trying to optimize the whole thing in one batch, causing compilation to be slow. In addition, the generated code is "less than optimal", causing register spills, etc.. By limiting the number of cases for the switch statement, we are making the compiler's life easier, and you end up not going through a sequence of if-then-if-then-if-then-.. too often.

Tue, Mar 31, 12:26 AM · Cycles
kyriazis (George Kyriazis) added a comment to D1200: Cycles OpenCL kernel-splitting work.
In D1200#22417, @sergey wrote:

Viewport is actually quite the same as final render set to "Progressive Refine" and that option is something artists want to have. If it's supported for final renders then viewport should be a piece of cake to support as well. So think rough plan in this context is something like:

  • Look into Progressive Refine for final renders
  • Once that work, look into what's remained to do for the viewport
  • Look into reducing delays

    Out of curiosity: what's the time difference bewteen kernel compiled with only needed nodes and kernel compiled with all the nodes? (i don't have AMD cards handy, and afraid intel CPU and nvidia's opencl are not really best things for benchmarks at this point).
Tue, Mar 31, 12:21 AM · Cycles
brecht (Brecht Van Lommel) added a comment to D1200: Cycles OpenCL kernel-splitting work.

Regarding SVM nodes, my experience with CUDA was that for run time, it's only the "biggest" node that matters. Having many nodes isn't really a problem by itself, it's not going to make the code more divergent. But the biggest node determines how many registers you need, and how much data needs to be moved between registers and global memory, and that has an effect on code that you typically wouldn't think is affected.

Tue, Mar 31, 12:18 AM · Cycles
lordloki (Jorge Bernal) added a comment to T41825: Effects in bge.

We can collaborate in both.

Tue, Mar 31, 12:13 AM · Game Engine
campbellbarton (Campbell Barton) committed rB71a75433eadc: remove asm, committed by accident (authored by campbellbarton (Campbell Barton)).
remove asm, committed by accident
Tue, Mar 31, 12:11 AM
campbellbarton (Campbell Barton) committed rBb5496c09c2b9: Minor edits, preparing for master (authored by campbellbarton (Campbell Barton)).
Minor edits, preparing for master
Tue, Mar 31, 12:09 AM
campbellbarton (Campbell Barton) committed rBfff8db369220: Show message when bind data is missing (authored by campbellbarton (Campbell Barton)).
Show message when bind data is missing
Tue, Mar 31, 12:09 AM