Asset Browser aggressively requests stat calls #101382

Closed
opened 2022-09-26 15:44:39 +02:00 by Pavel Duong · 8 comments

System Information
Operating system: Arch Linux, Windows 10
Graphics card: Intel UHD 620, NVIDIA GTX 1060 Super respectively

Blender Version
Broken: 19ae71c11342 (3.3.1 RC)
Worked:

Short description of error

Note: Asset Indexing is enabled in the Experimental tab.

The asset browser (monitored with strace on Linux) calls stat() very frequently, even when the mouse is just hovering over the thumbnails of already loaded assets. This doesn't seem to be a problem on my Linux setup, where I assume the calls are cached by the kernel, but the interface stutters badly when the asset library is on a network Samba share accessed from Windows.

To demonstrate, here are videos from Linux and Windows. On Linux, the asset library is locally stored, whereas on Windows, the asset library is on a network share, provided by the Linux system.

blender_assetbrowser_strace.mp4 (https://archive.blender.org/developer/F13575255/blender_assetbrowser_strace.mp4)

As you can see from the terminal output, I've used two commands: one to print the traced read and stat calls and one to count them as they come in. Just hovering over the asset browser produces a high number of calls. However, since the files are stored locally (and, I presume, cached by the kernel), performance remains high and I notice no stuttering.
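The exact commands are only visible in the video, so as a rough stand-in, here is a minimal Python sketch that counts traced calls from a saved strace log. It assumes the log was captured with something like `strace -f -T -e trace=read,stat -p <pid> -o trace.log`; the file name `trace.log` and the helper `summarize` are hypothetical, and newer systems may report `newfstatat` or `statx` instead of `stat`.

```python
import re
from collections import Counter, defaultdict

# Syscall name at the start of a line, with the optional "[pid N]" prefix
# that `strace -f` adds when following child processes/threads.
NAME_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")
# Time spent in the call, appended as "<seconds>" by `strace -T`.
DURATION_RE = re.compile(r"<(\d+\.\d+)>\s*$")


def summarize(lines):
    """Return (call counts, total seconds) per syscall name."""
    counts = Counter()
    seconds = defaultdict(float)
    for line in lines:
        name_match = NAME_RE.match(line)
        if name_match is None:
            continue  # signals, exit notices, resumed calls, etc.
        name = name_match.group(1)
        counts[name] += 1
        duration_match = DURATION_RE.search(line)
        if duration_match is not None:
            seconds[name] += float(duration_match.group(1))
    return counts, seconds
```

Usage would be along the lines of `counts, seconds = summarize(open("trace.log"))`, after which `counts["stat"]` and `seconds["stat"]` give the number of calls and the time spent in them.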

On Windows, just opening the asset browser takes at least a minute to load the index/thumbnails (not shown in the video). Once we reach the same state as on Linux, we observe the stuttering when hovering. Unfortunately, I don't know enough about Windows internals, so I can't provide any system call traces at the moment.

However, to demonstrate the issue, simple animation playback shows the FPS in the viewport's top-left corner, and we can see a significant dip once we begin to scroll/hover over the asset browser (around 20 s in the provided video). The highlight of the hovered thumbnail also lags far behind the cursor.

blender_assetbrowser_windows.mp4 (https://archive.blender.org/developer/F13575286/blender_assetbrowser_windows.mp4)

There's also the issue of the slow loading of the index/thumbnail on a cold launch, even if the thumbnails/index were generated previously, but I suppose that might be a topic for a different issue.

Exact steps for others to reproduce the error

  1. Have an asset library on a network share that contains a relatively high number of files (in my case, the Poly Haven library)
  2. Add the asset library and open the asset browser
  3. (Possibly wait for indexing/thumbnail generation to finish; this can take a long time)
  4. Notice that even when just hovering over the thumbnails, the Blender interface lags behind (e.g. the thumbnail highlighting is way behind the cursor)
Author

Added subscriber: @vignette

Member

Added subscriber: @Harley

Member

Honestly not sure what the provided information can achieve. It might be a very good starting point for investigation if you were considering attempting to optimize this performance yourself. But on its own it doesn't say anything conclusive.

There is an assertion here that the "stuttering" performance experienced is related to the number of stat calls shown, but evidence that these things are connected would be found in actually profiling the code. Otherwise it seems to be just a guess from the only type of data collected. By what metric do you judge these stat calls to be too many to the point of "aggressive"? How many is the correct number?

There could be more calls to stat than is required. We might be calling it to see if a file (and the thumbnail for it) exists and then also calling it again to find the file dates or file size. But... stat calls are extremely fast and would not account for the long delays shown in your second video. To understand what is happening there requires that you load up the source and profile to see what is taking up the time. There will be something, and likely something that can be improved. But it won't be the stats.

Author

> In #101382#1423146, @Harley wrote:
> Honestly not sure what the provided information can achieve. It might be a very good starting point for investigation if you were considering attempting to optimize this performance yourself. But on its own it doesn't say anything conclusive.

Just thought to attempt to raise a potential issue more than anything, sorry if this isn't the right channel for it.

> There is an assertion here that the "stuttering" performance experienced is related to the number of stat calls shown, but evidence that these things are connected would be found in actually profiling the code. Otherwise it seems to be just a guess from the only type of data collected.

I'll have to admit, those are indeed guesses, based on the fact that one setup stores the library on a local physical (thus faster) device and the other on a network (much slower) share.

I've tried to at least remove the OS factor and tested the same setup on two Linux devices, one connected to the other similarly to how Windows was connected to the Linux machine's SMB share, with the additional difference of being connected over Wi-Fi rather than by cable.

The same effect occurred, and strace showed that each stat can take up to 0.5 seconds, which quickly adds up.
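To put that figure in perspective: at up to 0.5 s per call, even 100 uncached stat calls could block for up to 50 s. For anyone who wants to check per-call latency on their own share without strace, a minimal Python sketch follows. It is only an assumption about how one might measure this, not something used in the report; `time_stats` and `report` are hypothetical helper names.

```python
import os
import time


def time_stats(directory):
    """Time one os.stat() call per file under `directory`.

    Returns a list of (path, seconds) tuples.
    """
    timings = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            start = time.perf_counter()
            try:
                os.stat(path)
            except OSError:
                continue  # file vanished or is not accessible
            timings.append((path, time.perf_counter() - start))
    return timings


def report(timings):
    """Summarize the timings: file count, total time and the slowest call."""
    total = sum(seconds for _path, seconds in timings)
    slowest = max(timings, key=lambda item: item[1], default=None)
    return {"files": len(timings), "total_seconds": total, "slowest": slowest}
```

Running `report(time_stats("/path/to/asset/library"))` once against a local copy and once against the network share would show whether stat latency differs as much as the strace output suggests. Note that a second run may be served from the operating system's cache.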

> By what metric do you judge these stat calls to be too many to the point of "aggressive"? How many is the correct number?

As shown in the second part of the first video, where the calls are counted, they quickly reached the thousands from just hovering with the mouse. I'd assume that, at the very least, such information shouldn't need to be fetched from the filesystem again so often, and that it would be cached in memory or at least somewhere local.

As to how many is the correct number, I'd say that tens of thousands while just hovering the mouse is probably not the correct one.
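To illustrate the kind of in-memory caching meant here, a minimal Python sketch follows. Blender's file browser is written in C/C++, so this is not its actual code and says nothing about how it works internally; `StatCache`, `ttl_seconds`, `stat_func` and `clock` are hypothetical names chosen for the example.

```python
import os
import time


class StatCache:
    """Remember stat results for a short time to avoid repeated filesystem calls."""

    def __init__(self, ttl_seconds=2.0, stat_func=os.stat, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._stat = stat_func
        self._clock = clock
        self._entries = {}  # path -> (expiry time, stat result)
        self.misses = 0     # number of calls that reached the filesystem

    def stat(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no filesystem access
        self.misses += 1
        result = self._stat(path)
        self._entries[path] = (now + self._ttl, result)
        return result

    def invalidate(self, path=None):
        """Drop one cached entry, or all of them when no path is given."""
        if path is None:
            self._entries.clear()
        else:
            self._entries.pop(path, None)
```

With a cache like this, redrawing the same thumbnails on every mouse move would reach the filesystem at most once per file per `ttl_seconds`. The trade-off is that external changes to a file are noticed up to `ttl_seconds` late unless `invalidate` is called.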

> There could be more calls to stat than is required. We might be calling it to see if a file (and the thumbnail for it) exists and then also calling it again to find the file dates or file size.

I understand that, but is it necessary to call it on every frame the mouse hovers over the thumbnail/asset library?

> But... stat calls are extremely fast and would not account for the long delays shown in your second video.

They might be fast on local devices, but in use cases where the asset library is stored on a network share (or even on slow storage media), they might pose some trouble when the delays stack up with the number of files.

> To understand what is happening there requires that you load up the source and profile to see what is taking up the time. There will be something, and likely something that can be improved. But it won't be the stats.

Indeed, as I've said, it was just a guess and a shot in the dark. It might not be directly the stat calls, but it seems to be linked with the asset library being on a slower storage device.

I understand that this kind of "unscientific" investigation isn't exactly the best. However, I lack the resources to provide a detailed analysis, so I just left it at attempting to raise a potential issue.

Member

> @vignette - Just thought to attempt to raise a potential issue more than anything, sorry if this isn't the right channel for it

Yes, no worries. Although this doesn't "fit" as a bug report - and will probably just get closed - I will probably poke around in that code there when I get a chance.

> I'll have to admit, those are indeed guesses based on the fact, that one stores the library in a physical (thus faster) device and the other on a network (way slower) share.

There are probably some other more intensive file accesses in there, like reads.

> with the additional difference of being linked on Wi-Fi, as opposed to being linked by cable.

Yes, there will be lots of performance bottlenecks when connecting to a file share over a slow connection like Wi-Fi, especially when creating thumbnails of files, since that requires reading the entirety of each file to make each preview.

> ...but is it necessary to call it on every frame the mouse hovers over the thumbnail/asset library?

To answer that you would have to know about the internals of that code.

> ...but in usecases such as the asset library being shared and stored on a network share (or even a slow media storage) might pose some trouble, when the delays stack up with the amount of files.

For sure, but in this case it will be other file activity, like accesses and reads, that will contribute to most of that time, not the stats.

> ...however I lack the resources to provide detailed analysis, so I just left it at attempting to raise a potential issue.

No worries. I really do appreciate the attempt to help. But this requires that someone else do the actual profiling to find the root of the issue and then improve it. This just doesn't fit within this system of bug reporting.

Member

Added subscriber: @PratikPB2123

Member

Changed status from 'Needs Triage' to: 'Archived'

Member

Hi, thanks for the report. As far as I understand from the above discussion, stat() calls are not entirely responsible for the slowdown, right?
Also to note:

  • We need a reliable way to reproduce the problem in order to fix it
  • Reports on performance improvements are not considered bug reports

Closing this ticket for now. Don't hesitate to comment or reopen the report if there is a misunderstanding.

> While we do continue to work on improving performance in general, potential performance improvements are not handled as bug reports.
> To improve performance, consider using less complex geometry, simpler shaders and smaller textures.

Reference: blender/blender#101382