Document performance profiling Blender
Confirmed, Normal | Public | TO DO

Description

We are missing documentation on how to use profiling tools with Blender:
https://wiki.blender.org/wiki/Tools

Ideally for each platform we would document one recommended, user-friendly profiler and exact steps to use it.

Event Timeline

For Mac.

Is OpenGL Profiler relevant? https://developer.apple.com/library/archive/documentation/GraphicsImaging/Conceptual/OpenGLProfilerUserGuide/Introduction/Introduction.html#//apple_ref/doc/uid/TP40006475

Also, Instruments.app seems to be a fine choice for Mac, since it ships with Xcode, which is required anyway.

I'm not sure OpenGL Profiler is still available with Xcode. But Instruments is indeed the profiling tool to recommend on macOS.

I can document how I profile with this tool next week: https://github.com/KDAB/hotspot.

Hello,

I'm a GSoC aspirant. I would like to contribute to the project, and as a beginner I want to fix this bug. Can I get some help?

Thank you

This is not really a bug in Blender, but a documentation TODO about profiling. Start at https://wiki.blender.org/wiki/GSoC

@Jacques Lucke (JacquesLucke) I have tried twice, unsuccessfully, to build hotspot on Mac. https://github.com/brendangregg/FlameGraph is much more convenient. You can see many more tools on Brendan Gregg's website and blog.

For Instruments.app, I have used several of its "modes": zombies, metal, time, etc. It would be nice if I could also make proper sense of the traces, not just "oh something's wrong there". @Jean First (robbott) could you write something?

Hotspot can't work on macOS; it's only designed to work with perf. But there's also not much point, since there is Instruments, which is pretty easy to get started with?
https://help.apple.com/instruments/mac/10.0/#/dev44b2b437

If there's a good tutorial or documentation, we can link to that and add Blender-specific advice, like using RelWithDebInfo builds.

I would like to document how profiling works on Windows. This is a good preparation step for my GSoC "Improving Compositor Performance" proposal.
Does anybody already have experience using Visual Studio or any other native profiler on Windows? Visual Studio even offers a GPU profiler, but maybe other tools (e.g. from Nvidia) allow going even deeper?

I think this ticket is mostly aimed at CPU profiling, not GPU. That being said:

I'm struggling to combine good and user-friendly. Both Windows Performance Analyzer and Intel's VTune are rather amazing profilers, but the learning curve is nearly a sheer vertical wall. The built-in profiler in VS is easy to use, but not that useful. Profilers are no magic bullets; it's best to figure out what you want to measure and what that will tell you, then pick the product that best matches your requirements.

Awesome, thanks for the feedback.

So, for this ticket, I would try the different options and create a simple and an advanced section in the wiki.
@Ray molenkamp (LazyDodo): What do you think?

Shall we create separate wiki pages about different platforms?

We can start by documenting the simple setup all on one page. Then if needed we can split it up or extend it.

The important thing is that a developer tests profiling Blender with the tool, and documents any Blender-specific steps, configuration or pitfalls. We need just the minimal information to get started.

For example with VTune on Windows that might be something like:

  • Install VTune with Visual Studio integration
  • Set Blender build configuration to RelWithDebInfo (or Release?)
  • Configure VTune like this so it works correctly with Blender
  • Run profile like this and you will get results that look like this
  • To isolate results for one operation in Blender (without .blend load time etc.), do this
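
As a rough illustration of the last bullet (isolating one operation from load time), here is a minimal Python sketch that times phases separately with wall-clock timers. It is not a substitute for a sampling profiler such as VTune, and load_scene and run_operation are hypothetical placeholders, not Blender API; the point is only that the operation of interest gets its own measurement, separate from startup work.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_region(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def load_scene():
    # Hypothetical stand-in for startup work such as loading a .blend file.
    time.sleep(0.05)


def run_operation():
    # Hypothetical stand-in for the single operation being investigated.
    return sum(i * i for i in range(200_000))


def measure():
    """Time the load phase and the operation phase independently."""
    results = {}
    with timed_region("load", results):
        load_scene()
    with timed_region("operation", results):
        run_operation()
    return results


if __name__ == "__main__":
    for label, seconds in measure().items():
        print(f"{label}: {seconds * 1000:.1f} ms")
```

With a real profiler the same idea usually means starting collection paused and resuming it only around the operation, but the exact steps depend on the tool.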

@Brecht Van Lommel (brecht): Thanks for the valuable input. Will let you know once I have the first draft ready.

@Brecht Van Lommel (brecht), @Ray molenkamp (LazyDodo):

Here is a draft for the instructions to start profiling on Windows: https://docs.google.com/document/d/1s15X1qZ8iRC2zc3zkEwDboJCARqhqpECXZ2PRLKG7QI/edit?usp=sharing
It would be awesome if you could give me some feedback on the content, and how to improve the document.

I discovered that there is a problem with the CMake RelWithDebInfo configuration, which leads to VS not generating the debug information (unless it is activated manually for each project). I will try to figure out how to solve this in the next few days.

I fixed that yesterday.

I think it's important to note that when you're profiling and comparing performance between builds, you have to be very careful to compare apples to apples: there can be up to a 20% difference in performance between a Release and a RelWithDebInfo build. A fun thread where I miserably failed at remembering this is T70463.

As for VTune vs. WPA, they answer different questions. VTune is focused on measuring whether every instruction is living up to its true potential, by recording whether it's executing as fast as it could and what is holding it back, while throwing tons and tons of data at you. Things get overwhelming fast, but it's great if you're looking to squeeze every last bit of performance out of a piece of code.

WPA is less concerned with this low-level detail. It's great for answering questions like "When am I running? Am I running as often as I can, and if not, what is preventing me? What are the threads waiting for?" or just the generic question "Where is time being spent?" D6267 is a great example where analysis of both those things can be leveraged to deal with performance issues. Bruce Dawson from Google has written a wonderful tool called UIforETW that makes recording WPA traces as easy as a single button click. It also takes care of installing the required tools, and it's highly recommended. Learning to use it, well, let's say there is a steep learning curve. However, Bruce has an excellent blog where he shows how to use the tool to analyze various problems, and he keeps a list of those posts.

Now for the fun part: WPA and VTune do not coexist peacefully on a single machine. You install VTune, WPA appears to be working, but the traces you end up with are virtually empty...

There is a workaround to disable VTune temporarily, but it is only documented in an obscure post on the Intel forums from 2013: run amplxe-sepreg.exe -u pax to remove the problematic driver (and reboot). That restores WPA, but now VTune is broken; amplxe-sepreg.exe -i reinstalls the driver.

Personally I find myself using WPA more often than VTune. That being said, for the longest time VTune was not available for free, so that may have influenced that behavior.

I fixed that yesterday.

Thanks. I saw in the VS docs that the /ZI (Program Database for Edit and Continue) option might disable many optimizations. Maybe for profiling it's better to use /Zi for the RelWithDebInfo configuration (Program Database without the Edit and Continue feature).
https://docs.microsoft.com/en-us/cpp/build/reference/z7-zi-zi-debug-information-format?view=vs-2019

most optimizations are incompatible with Edit and Continue

Here is a patch in case you would like to change it for RelWithDebInfo:

I think it's important to note that when you're profiling and comparing performance between builds, you have to be very careful to compare apples to apples: there can be up to a 20% difference in performance between a Release and a RelWithDebInfo build. A fun thread where I miserably failed at remembering this is T70463.

Very good point, I've added this to the Tips and Tricks section in the draft.
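
To make the apples-to-apples point concrete, here is a small Python sketch of the comparison arithmetic. The Measurement type and relative_change helper are hypothetical names made up for this illustration, and the timings are invented numbers; the sketch simply refuses to compare runs from different build types, since that difference alone can account for a large part of the gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    build_type: str  # e.g. "Release" or "RelWithDebInfo"
    seconds: float   # wall-clock time for the same workload


def relative_change(baseline: Measurement, candidate: Measurement) -> float:
    """Return the fractional change of candidate relative to baseline.

    A negative value means the candidate is faster. Raises ValueError when
    the two measurements come from different build types.
    """
    if baseline.build_type != candidate.build_type:
        raise ValueError(
            f"build types differ: {baseline.build_type} vs {candidate.build_type}"
        )
    if baseline.seconds <= 0:
        raise ValueError("baseline time must be positive")
    return (candidate.seconds - baseline.seconds) / baseline.seconds


if __name__ == "__main__":
    # Invented example timings, both from RelWithDebInfo builds.
    before = Measurement("RelWithDebInfo", 10.0)
    after = Measurement("RelWithDebInfo", 9.0)
    print(f"change: {relative_change(before, after):+.1%}")
```

Comparing a Release run against a RelWithDebInfo run raises an error here instead of silently reporting a misleading speedup or slowdown.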

Okay, now I better understand how WPA can be useful. I will try it out, document the setup as well, and add a summary of when WPA can be useful.
Thanks for the input!

Yeah, there's no reason for RelWithDebInfo to be using the E&C flags. I'll get that fixed up, thanks for the patch!
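
For anyone who wants to check their own configuration, the /ZI versus /Zi distinction can be tested mechanically. Below is a minimal Python sketch, assuming the compiler flags are available as one whitespace-separated string (for example, copied from the CMAKE_CXX_FLAGS_RELWITHDEBINFO cache entry). The helper names and the example flag string are illustrative assumptions, not Blender's actual build settings.

```python
# MSVC options are case-sensitive: /ZI enables Edit and Continue,
# while /Zi produces a separate PDB without it. Options may start
# with either "/" or "-".
EDIT_AND_CONTINUE = {"/ZI": "/Zi", "-ZI": "-Zi"}


def uses_edit_and_continue(flags: str) -> bool:
    """Return True if the flag string contains the /ZI option."""
    return any(token in EDIT_AND_CONTINUE for token in flags.split())


def prefer_plain_pdb(flags: str) -> str:
    """Return the flags with /ZI swapped for /Zi, others left untouched."""
    return " ".join(EDIT_AND_CONTINUE.get(token, token) for token in flags.split())


if __name__ == "__main__":
    # Illustrative flag string only.
    flags = "/MD /ZI /O2 /Ob1 /DNDEBUG"
    if uses_edit_and_continue(flags):
        print("Edit and Continue is enabled; suggested flags:")
        print(prefer_plain_pdb(flags))
```

The comparison is deliberately case-sensitive, because a case-insensitive match would treat the two options as the same flag.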