
Cycles X - Multi-device rendering performance
Needs Triage, Normal, Public, To Do

Authored By: Sergey Sharybin (sergey)
Tue, Jul 13, 12:22 PM
"Love" token, awarded by gilberto_rodrigues."Like" token, awarded by GeorgiaPacific."Love" token, awarded by Alaska."Love" token, awarded by MarcoHoo."Love" token, awarded by HEYPictures.


Initial work on multi-device rendering has been done, putting all the required building blocks in place. However, the actual balancing of the amount of work between the devices still needs improvement.

There are two main issues with the current logic:

  • The per-device performance estimate is based on the time the device spent path tracing. This works fine for frames of "uniform" complexity, but fails when the frame contains easy areas such as sky.
  • Headless render needs to schedule rebalancing a bit more often than it currently does (headless render renders 1 sample, and then keeps devices occupied for ~30 sec). This can leave a device idle for a long time if the balance was not calculated accurately enough.

Tweaking the headless scheduling is simple, but from own experiments it is really important to schedule pixels of roughly equal complexity to the devices; otherwise the balance will not converge fast enough (or will never converge).

There are a few ideas to try:

  • Change the objective function from "amount of work based on an approximate performance of a device on uniform work" to "amount of work that equalizes the time the devices spend path tracing". From experiments, the tricky part here is choosing the weight step (avoiding over-compensation while still converging to a good balance quickly).
  • Do interleaved scanline scheduling. The issue with this approach is that adaptive sampling becomes very tricky.
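The first idea above can be sketched as a damped weight update: estimate each device's effective speed from the work it did and the time it took, set the target work fractions proportional to those speeds, and only move part of the way toward the target each round to avoid over-compensation. This is a minimal illustrative sketch, not the actual Cycles code; all names and the `damping` parameter (the "weight step") are hypothetical.

```python
# Hypothetical sketch: rebalance per-device work fractions so that the time
# each device spends path tracing converges toward equal.
# `damping` plays the role of the "weight step" discussed above: too high
# over-compensates and oscillates, too low converges slowly.

def rebalance(weights, times, damping=0.5):
    """Return new work fractions from the last round's per-device times.

    weights -- fraction of work each device was given (sums to 1)
    times   -- wall time each device spent path tracing that work
    """
    # Effective speed of each device: work done per unit of time.
    speeds = [w / t for w, t in zip(weights, times)]
    total = sum(speeds)
    # Fractions that would equalize time if the speeds stay accurate.
    targets = [s / total for s in speeds]
    # Move only part of the way toward the target to avoid oscillation.
    new = [w + damping * (t - w) for w, t in zip(weights, targets)]
    norm = sum(new)
    return [w / norm for w in new]
```

Iterating this with a GPU four times faster than a CPU converges to roughly an 80/20 split; the damping factor directly trades convergence speed against stability, which mirrors the over-compensation concern above.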

The scene which best demonstrates the shortcomings of the current approach is pabellon.blend in an F12 render (it behaves OK in headless rendering, because a half-decent balance after the first sample is good enough, and subsequent re-balancing does not happen often, so it does not hurt performance).

Various open related topics:

  • Command line render (i.e. the benchmark tool) seems to have a higher deviation in render times than one would expect. Is thermal throttling/boost interfering with the balancing strategy/schedule?
  • Rebalancing closer to the end of adaptive sampling: what is the best thing to do? Fall back to a single device to avoid overheads?

Event Timeline

Some of the logic has been tweaked in rB289a173d938f, which helped a lot to avoid a huge speed regression in the Pabellon scene when a fast GPU is used together with a slow CPU.

There is still some performance penalty for such an unbalanced configuration, and it is not clear what the proper solution would be. In a way, tile-based rendering dealt with such configurations better (smaller schedule units combined with work stealing), but it also did not keep GPUs really busy. So there is a tradeoff between reacting quickly enough to an unbalanced configuration and keeping the devices always occupied.

It would be interesting to test whether the observed slowdown is more of a constant amount of time (some time penalty to balance things out during the first samples of the render, which means the percentage of penalty goes down when adding more samples) or a constant percentage of the overall render time (which means the percentage of penalty does not go down as more samples are added).
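The distinction between the two hypotheses can be made concrete with a toy model. This is purely illustrative; the per-sample time and penalty values are made-up numbers, not measurements.

```python
# Toy model of the two slowdown hypotheses.
# Hypothesis A: the balancing penalty is a fixed amount of time paid during
# the first samples, so its share of the total render time shrinks as the
# sample count grows.
# Hypothesis B: the penalty is a fixed fraction of the render, so its share
# stays flat regardless of the sample count.

def penalty_share_constant_time(samples, t_sample=1.0, t_penalty=5.0):
    """Fraction of total time lost if the penalty is a fixed t_penalty."""
    return t_penalty / (samples * t_sample + t_penalty)

def penalty_share_constant_fraction(samples, fraction=0.1):
    """Fraction of total time lost if the penalty scales with render time."""
    return fraction
```

Plotting (or simply comparing) the measured penalty share at, say, 10 vs. 1000 samples against these two curves would indicate which regime the current balancer is in, and therefore whether adding samples amortizes the cost.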