
Cycles: delay CUDA and OpenCL initialization to avoid driver crashes.
Closed, Public

Authored by Brecht Van Lommel (brecht) on Sun, Jan 27, 8:23 PM.

Details

Summary

We've had many reported crashes on Windows where we suspect there is a
corrupted OpenCL driver. The purpose here is to keep Blender generally
usable in such cases.

Now it always shows None / CUDA / OpenCL in the preferences, and only when
one of them is selected does it check whether any GPUs are available. This
should avoid crashes when opening the preferences or on startup.
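The lazy behaviour described above can be sketched as follows. This is an illustrative Python model, not Blender's actual C++ code: the class `LazyDeviceList` and the per-backend probe functions are hypothetical stand-ins for the real driver queries. The backend names are always available, while a backend's driver is only queried the first time that backend is selected.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical probe type: a function that talks to a GPU driver and
# returns the names of the devices it finds.
Probe = Callable[[], List[str]]


class LazyDeviceList:
    """Backend names are static; devices are enumerated on first selection."""

    BACKENDS: Tuple[str, ...] = ("NONE", "CUDA", "OPENCL")

    def __init__(self, probes: Dict[str, Probe]) -> None:
        self._probes = probes                   # backend name -> driver query
        self._cache: Dict[str, List[str]] = {}  # results of probes already run

    def backends(self) -> Tuple[str, ...]:
        # Listed in the preferences without touching any GPU driver.
        return self.BACKENDS

    def devices(self, backend: str) -> List[str]:
        if backend == "NONE":
            return []
        if backend not in self._cache:
            # Only now, after the user picked this backend, is its driver queried.
            self._cache[backend] = self._probes[backend]()
        return self._cache[backend]
```

With this structure, a broken OpenCL driver is never touched unless the user actually selects OpenCL.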

Diff Detail

Repository
rB Blender

Event Timeline

This just delays the problem. While the initial startup of Blender might go fine, Blender will crash as soon as someone tries to enable GPU compute. Or even worse, after the initial setup is done and something is updated in the OS, Blender will crash again.

So it seems to me that the only reliable way to fully address the root of the issue is to query devices in a subprocess.
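A minimal sketch of the subprocess approach, assuming the probe is an arbitrary Python snippet that prints a JSON list of device names. The function name `query_devices_isolated` and the JSON protocol are hypothetical; `sys.executable -c` stands in for whatever child process a real implementation would launch. If the driver crashes or hangs, only the child dies and the caller just sees "no devices".

```python
import json
import subprocess
import sys
from typing import List, Optional


def query_devices_isolated(probe_source: str, timeout: float = 30.0) -> Optional[List[str]]:
    """Run a device-enumeration probe in a child Python process.

    The probe is expected to print a JSON list of device names to stdout.
    Returns None if the child crashed, hung, or printed something unexpected.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", probe_source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # driver hung; run() has already killed the child
    if result.returncode != 0:
        return None  # non-zero exit, or killed by a signal (negative on POSIX)
    try:
        devices = json.loads(result.stdout)
    except json.JSONDecodeError:
        return None
    if not isinstance(devices, list) or not all(isinstance(d, str) for d in devices):
        return None
    return devices
```

For example, a probe that calls `os.abort()` returns `None` here instead of taking the calling process down with it.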

Letting a process crash each time Blender starts up is not good either in my opinion. If the user did not enable GPU rendering in the preferences, we should not be querying it.

We can do the separate process in addition to this.

> Letting a process crash each time Blender starts up is not good either in my opinion.

If Windows shows an "Application has crashed, send report to developers?" message box, then sure. Otherwise I don't see a big issue.

> If the user did not enable GPU rendering in the preferences, we should not be querying it.

But we do query other things, like tablets, space navigators, and the audio system. I guess the main difference is that those don't crash as often.

I'm not advocating leaving the crash here, but here are my thoughts:

  • From the user's perspective, it is a bit weird to require them to care about OpenCL/CUDA. In an ideal world we would just show them compute devices.
  • Before this change users almost never saw both CUDA and OpenCL, so that wasn't much of a source of confusion.
  • What if we ever support OptiX? We would need to merge CUDA and OptiX devices anyway, and initialize both libraries, which potentially also increases the probability of crashes.
  • Even if we solve the initial startup crash, this doesn't solve the issue of drivers *becoming* "corrupted" (Windows Update *shrug*). That will lead to constant crashes of Cycles and the user preferences again.
  • This doesn't seem to solve the crash on a buggy system when one generates system info? I'm also not sure it's a good idea to leave out platform information about potentially usable but not enabled devices.

Do we know which exact part crashes? Is it the dlopen of the OpenCL ICD? Enumerating platforms? Enumerating a specific platform? There is a binary I compiled a while back to help nail down such issues: https://download.blender.org/ftp/sergey/clinfo/
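One way to answer "which exact part crashes" on a user's machine is to run the probe in a child process that announces each stage before entering it, then report the last stage announced before the child died. This is a hedged sketch: the `STAGE <name>` protocol, the function `crashed_stage`, and the stage names are all hypothetical, and the stages stand in for steps such as loading the ICD loader, calling `clGetPlatformIDs`, and querying each platform.

```python
import subprocess
import sys
from typing import Optional

STAGE_PREFIX = "STAGE "


def crashed_stage(probe_source: str, timeout: float = 30.0) -> Optional[str]:
    """Run a staged probe in a child process and report where it died.

    The probe must print "STAGE <name>" and flush stdout before each step.
    Returns None if the child exited cleanly, otherwise the last stage it
    announced, or "<startup>" if it died before announcing any stage.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", probe_source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        stdout, ok = result.stdout, result.returncode == 0
    except subprocess.TimeoutExpired as exc:
        # A hang counts as a failure; partial output may be bytes or None.
        raw = exc.stdout or b""
        stdout = raw.decode(errors="replace") if isinstance(raw, bytes) else raw
        ok = False
    if ok:
        return None
    stages = [line[len(STAGE_PREFIX):] for line in stdout.splitlines()
              if line.startswith(STAGE_PREFIX)]
    return stages[-1] if stages else "<startup>"
```

Flushing before each stage matters: a hard crash discards buffered output, so an unflushed marker would point at the wrong stage.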

Can you reproduce the crash on your system?

> If Windows shows an "Application has crashed, send report to developers?" message box, then sure. Otherwise I don't see a big issue.

I don't trust letting the OpenCL driver crash right before starting Blender. Ideally it's all safe and isolated, but graphics drivers are not that predictable.

  > • From the user's perspective, it is a bit weird to require them to care about OpenCL/CUDA. In an ideal world we would just show them compute devices.
  > • Before this change users almost never saw both CUDA and OpenCL, so that wasn't much of a source of confusion.
  > • What if we ever support OptiX? We would need to merge CUDA and OptiX devices anyway, and initialize both libraries, which potentially also increases the probability of crashes.

I don't think there will be confusion, users can click the settings and see where the devices are. It's slightly less convenient, but not a real problem. We can still have a GPU rendering on/off toggle in the future where we detect both types of devices.

  > • Even if we solve the initial startup crash, this doesn't solve the issue of drivers *becoming* "corrupted" (Windows Update *shrug*). That will lead to constant crashes of Cycles and the user preferences again.
  > • This doesn't seem to solve the crash on a buggy system when one generates system info? I'm also not sure it's a good idea to leave out platform information about potentially usable but not enabled devices.

Yes, it doesn't solve those problems. System info still includes all devices. But at least it will crash when doing a specific action, which gives a clue about where the problem is, and it keeps the rest of Blender usable.

> Do we know which exact part crashes? Is it the dlopen of the OpenCL ICD? Enumerating platforms? Enumerating a specific platform? There is a binary I compiled a while back to help nail down such issues: https://download.blender.org/ftp/sergey/clinfo/

> Can you reproduce the crash on your system?

We've never been able to reproduce the problem ourselves, nor do we know the extent of the problem. It might be very rare and not worth the code complexity of running a separate process. We regularly get "Blender crashes on startup" reports, and it would be nice to exclude this possibility.

In one case it crashed in clGetPlatformIDs() doing some kind of internal initialization. This is the very first function we call.

This was a simple change to make. Either it solves startup crashes for a bunch of users and tells us that this is in fact a common problem, and we can spend time to make it better. Or it makes no difference and we save time.

A summary of the discussion here:

  • Since the crash can happen inside clGetPlatformIDs, there is not much we can do about it on our side.
  • One thing we can do to make the interface a bit cleaner is to check for the CUDA library and, if it does not exist, not show the CUDA option. We could also disable OpenCL if CUDA is found. This can happen later, though.
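The library check suggested above could look roughly like this. It is a sketch under stated assumptions: Python's `ctypes.util.find_library` only locates a library file and does not load it, so no driver code runs; the base names `nvcuda` (Windows) and `cuda` (elsewhere) for the CUDA driver and `OpenCL` for the ICD loader are assumptions about typical installs, and `available_backends` is a hypothetical name. A library being present does not guarantee that it works.

```python
import ctypes.util
import sys
from typing import Callable, List, Optional

# Assumed base names of the driver libraries as find_library expects them.
CUDA_DRIVER_LIB = "nvcuda" if sys.platform == "win32" else "cuda"
OPENCL_LIB = "OpenCL"


def available_backends(
    find: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> List[str]:
    """Decide which backend options to show without initializing any driver."""
    backends = ["NONE"]
    if find(CUDA_DRIVER_LIB) is not None:
        backends.append("CUDA")
    elif find(OPENCL_LIB) is not None:
        # Following the suggestion above: only offer OpenCL when CUDA is absent.
        backends.append("OPENCL")
    return backends
```

The `find` parameter is injectable so the decision logic can be exercised without real drivers installed.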

Going the whole subprocess route is indeed much more hassle. But I am now more convinced that this change is in the right direction.

This revision is now accepted and ready to land. Mon, Jan 28, 3:42 PM
This revision was automatically updated to reflect the committed changes.

Yes, they resolve that as well.

@Brecht Van Lommel (brecht) Is it possible to have this in Blender 2.8 as well?

Everything in 2.7 ends up in 2.8, just not always immediately.