Inefficient use of GPU? #91791

Closed
opened 2021-09-28 20:23:09 +02:00 by Kepa · 16 comments

System Information
Operating system: Windows-10-10.0.19041-SP0 64 Bits
Graphics card: NVIDIA GeForce RTX 2080 Ti/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 472.12

Blender Version
Broken: version: 3.0.0 Alpha, branch: master, commit date: 2021-09-27 18:29, hash: c53ffda8a4
Worked: (newest version of Blender that worked as expected)

Short description of error
GPU utilization during rendering has dropped by around 10-15% between the latest LTS version (2.93.4) and 3.0.0 Alpha, according to monitoring I did with several different tools.
The scene is the same in both cases.

The machine has four NVIDIA RTX 2080 Ti GPUs. I ran the tests with one, two, three, and all four GPUs, and the result was always the same.
I am attaching two screenshots in the hope that they are useful.

Exact steps for others to reproduce the error
Simply compare GPU usage with monitoring software while a render is running. In my case I used the latest version of TechPowerUp GPU-Z.
2.93.4: 2.93.4_version.jpg (https://archive.blender.org/developer/F10654862/2.93.4_version.jpg)
3.0.0 Alpha: 3.0.0 Alpha_version.jpg (https://archive.blender.org/developer/F10654868/3.0.0_Alpha_version.jpg)

Author

Added subscriber: @Kepa-1

Added subscriber: @deadpin

There is still ongoing work in this area as detailed within #89833 (see last few comments).

Author

Hi Jesse,

Thanks for responding so quickly. I have read what you wrote, but it is not clear to me whether the issue is caused by how tasks are distributed across multiple GPUs, or whether it also occurs in 3.0.0 Alpha with only one GPU.

I ran the tests with a single GPU and with several, and the result is the same, so I am not sure whether what you describe applies only to multi-GPU setups.

Thank you.

The mentioned bug is for multiple devices. If you are observing a render time regression compared to previous versions with just 1 device, please make it very clear in your report that it occurs with just 1 GPU, and attach an example .blend file (as per the bug filing template and guidelines) so that it can be investigated.
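
A render time comparison of the kind asked for here could be sketched roughly as follows. This is only an illustrative script, not an official benchmarking procedure: the helper names are made up for this example, and it assumes both Blender builds can be launched from the command line with `-b` (background) and `-f` (render a frame). The measured wall-clock time also includes startup and scene loading, so it is only a coarse indicator.

```python
import subprocess
import time


def render_command(blender_exe, blend_file, frame=1):
    """Build a background render command for a single frame."""
    return [blender_exe, "-b", blend_file, "-f", str(frame)]


def time_command(cmd):
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def relative_change(old_seconds, new_seconds):
    """Return the fractional change; a positive value means the new build is slower."""
    return (new_seconds - old_seconds) / old_seconds


def compare_builds(old_exe, new_exe, blend_file, frame=1):
    """Time the same frame in two builds (paths are supplied by the caller)."""
    old_seconds = time_command(render_command(old_exe, blend_file, frame))
    new_seconds = time_command(render_command(new_exe, blend_file, frame))
    return old_seconds, new_seconds, relative_change(old_seconds, new_seconds)
```

Calling `compare_builds` with the paths of the 2.93.4 and 3.0.0 Alpha executables and the same .blend file would give two timings and their relative difference. Which GPUs are used still depends on the render device settings of each build.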

Author

Thanks Jesse,

In the description of the problem I state that I ran the test with one, two, three, and all four GPUs; that is, the result is the same with any number of GPUs. If it would be better for me to file a separate report that mentions only a single GPU, tell me and I will do so.

This happens with any scene, so I am not sure whether sending a scene is relevant; it happens even with a simple box.
Tell me if you need me to do anything else and I will gladly help.

Added subscriber: @brecht

GPU load being lower than 100% is not necessarily indicative of a bug. It could be that there is something to be optimized, but I would actually expect it to be lower with the cycles-x changes. What we would consider a bug is if there is a significant performance regression in render time or interactivity.

Previously we would schedule one big kernel for all paths, keeping all GPU cores occupied. However, due to divergence, paths might terminate and keep cores occupied while not doing anything useful. Now when paths are terminated, we no longer schedule any work for them, and if few active paths remain, that means not reaching 100% GPU load. However, the effective occupancy might still be higher than before, just not visible in these kinds of statistics.
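
The difference between reported load and useful work can be illustrated with a toy model. This is not how Cycles is implemented; the function names and the example path lengths are invented for illustration, and the model ignores scheduling overhead, memory traffic, and everything else a real GPU does. Each path simply runs for a fixed number of steps and then terminates.

```python
def single_kernel_stats(path_lengths):
    """One big kernel: every slot stays occupied until the longest path ends."""
    slots = len(path_lengths)
    steps = max(path_lengths)
    occupied = slots * steps          # slot-steps a load monitor would count as busy
    useful = sum(path_lengths)        # slot-steps that actually advanced a path
    return {
        "reported_load": occupied / (slots * steps),
        "useful_fraction": useful / occupied,
    }


def active_only_stats(path_lengths):
    """Only paths that are still active get work scheduled at each step."""
    slots = len(path_lengths)
    steps = max(path_lengths)
    scheduled = sum(
        sum(1 for length in path_lengths if length > step)
        for step in range(steps)
    )
    return {
        "reported_load": scheduled / (slots * steps),
        "useful_fraction": 1.0,       # nothing is scheduled for terminated paths
    }


# Hypothetical number of steps before each of eight paths terminates.
PATH_LENGTHS = [1, 2, 2, 8, 3, 1, 4, 3]

print(single_kernel_stats(PATH_LENGTHS))  # reported_load 1.0, useful_fraction 0.375
print(active_only_stats(PATH_LENGTHS))    # reported_load 0.375, useful_fraction 1.0
```

In this toy model the single-kernel approach shows 100% load while only 37.5% of it is useful work, whereas the active-only approach shows a lower load with no wasted work. That matches the point above that a load statistic alone does not say whether rendering got slower.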

Author

Thanks Brecht,

I understand what you are telling me. I decided to report it because it seemed strange to me that, under identical conditions and with the same scene, the load was lower, as shown in the screenshots.

Added subscriber: @iss

Changed status from 'Needs Triage' to: 'Needs User Info'

@Kepa-1 do you see render time being worse in 3.0 alpha?

Author

Added subscriber: @richard-27

Author

Render time is better in version 3.0.0 Alpha, so what I am reporting is probably not significant and I will not insist on it. Even if Cycles in 3.0 Alpha only loaded the GPU to 60-70%, it would still be faster than 2.93.4. The difference I notice is that, at least visually, the GPU load is not the same. As Brecht says, that does not necessarily mean it is a bug. What I do not know is whether we would gain even more performance if that additional 10-15% of GPU load could be used.

Changed status from 'Needs User Info' to: 'Archived'

Thanks for the clarification. Since there is no performance regression, I will close the report. As @brecht said, there may be some room for improvement, but it is hard to tell from the information provided here.

For more information on why this isn't considered a bug, visit: https://wiki.blender.org/wiki/Reference/Not_a_bug

Reference: blender/blender#91791