Thanks for the comment. Hmm, if you split the kernel into a few (3-4) kernels because of svm_eval_nodes(), don't you end up needing to execute all of them anyway, since there's no guarantee that all the nodes you need will be in the same kernel?
For example, if you split into 3 kernels as follows: kernel 1 has cases ABC, kernel 2 has cases DEF and kernel 3 has cases GHI, then what happens if your shader requires nodes ADEG? You'll end up needing to go through all 3 kernels. (Is that what you mean by splitting?)
I meant one kernel that has ABC, one that has ABCDEF, and one that has ABCDEFGHI, where GHI would be the nodes that require the most registers, etc.
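
To make the idea concrete, here is a rough sketch of how the host side might pick one of those cumulative kernels. The tier table and the pick_kernel_tier() helper are made up for illustration, and the letters stand in for real SVM node types; this is not how any existing code does it.

```python
# Hypothetical cumulative tiers: each kernel supports everything the
# previous one does, plus a more register-hungry group of nodes.
KERNEL_TIERS = [
    frozenset("ABC"),
    frozenset("ABCDEF"),
    frozenset("ABCDEFGHI"),
]


def pick_kernel_tier(required_nodes):
    """Return the index of the smallest tier that covers every required node."""
    required = frozenset(required_nodes)
    for index, supported in enumerate(KERNEL_TIERS):
        if required <= supported:
            return index
    missing = sorted(required - KERNEL_TIERS[-1])
    raise ValueError(f"no kernel tier supports nodes {missing}")
```

With this layout a shader that needs ADEG runs only the largest kernel, once, instead of going through three separate kernels, while a shader that only needs AB stays on the smallest one.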
I was thinking that if the number of possible kernels that ever need to be compiled is some reasonable fixed number, then they could all be compiled and cached once, so compile time is not as big a concern. But thinking about it more, there are probably other #defines influencing the SVM nodes too, so the number may be too big to cache it all beforehand.
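
As a back-of-the-envelope check on that worry, assuming each extra #define is an independent on/off switch (an assumption for illustration; the real defines may not be independent, and the function name is made up), the number of variants to pre-compile grows like this:

```python
def count_kernel_variants(num_tiers, num_boolean_defines):
    """Upper bound on kernels to pre-compile: one per tier per on/off combination."""
    return num_tiers * 2 ** num_boolean_defines


for defines in (0, 4, 10):
    print(defines, "defines ->", count_kernel_variants(3, defines), "kernels")
```

Three tiers alone are trivial to cache, but even ten independent switches would push the count into the thousands.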