Official 2.7, lagging when sculpting. #39281

Closed
opened 2014-03-20 13:15:16 +01:00 by michalis zissiou · 25 comments

OSX 10.8.5
MacPro, dual Xeon, 16 threads, ATI HD 5870.

Broken: official 2.7
Worked: all recent 2.7x buildbot builds and all my own builds work fine

Strokes lag, even on a 5k mesh.
Multires and modifiers are not needed to trigger it.
The same lag happens with dyntopo (though dyntopo is not the cause; the lag does not depend on mesh density).
BTW,
I don't know how to build a Blender that supports OpenMP, and buildbot builds don't support OpenMP either; this is well known.
The official 2.7 supports OpenMP, as seen in the console.

Changed status to: 'Open'

Added subscriber: @michaliszissiou

jens verwiebe was assigned by Bastien Montagne 2014-03-20 14:40:38 +01:00

Added subscriber: @mont29

Jens, do you have insight here? Thanks! :)

Member

I have tested sculpting on OS X 10.9 and 10.6, where it works fine.
As far as I know, OMP threading is limited to 2 threads, so in real-world use it never made
a huge difference for me.

Can you check what happens if you disable sculpt threading (uncheck the box)?

I have no idea what exactly could go wrong here, but I updated the Intel OMP lib with
dedicated 10.6+ load commands (although it showed no issues on 10.6), just to be on the safe side.

I can only speculate that on your dual quad-core machine (== 16 HT threads), OMP is spread across
both CPUs and is less efficient.
You could try setting OMP environment variables to limit the number of OMP threads and/or restrict OMP to a single CPU.

Appendix, OMP environment variables:

OMP_NUM_THREADS=num: sets the maximum number of threads to use during execution.
OMP_PROC_BIND: enables or disables binding threads to processors. Valid values are TRUE or FALSE.
Jens

Excuse me,
I'm not a programmer.
All the builds from buildbot
and all my own builds
work just fine.
All the previous official builds, 2.69 and earlier, work fine (and OpenMP makes a big difference in sculpting (multires) mode).
2.7 does not.
You did something wrong.
Please try to fix it.
Thank you

If this is the new policy, to discard as many bug reports as possible, go ahead.
Some of my recent reports were discarded. However, on the BA forum, we all noticed those issues and reproduced them.

@michaliszissiou there is no new policy, and this report is still open afaik!!! Jens just gave you hints to try to understand the issue better, since he isn’t able to reproduce it offhand.

And things like “you made something wrong” and “please try to fix it” are not very nice; please consider that we spend a lot of time and effort producing a tool you get for free!

BTW, yeah,
disabling multithreaded sculpting stops the lagging.
However, this is the first Blender that does not support this important feature.
I'll keep a 2.69 for multires sculpting. Serious sculpting, I mean.
Thank you for helping.
I wonder how you build on OS X with OpenMP support.
I wish I could find someone to ask for advice.

Member

The point is, at some point before 2.7 development I switched the OMP-enabled build from a very hackish, self-compiled gcc-4.8.1 to
clang-omp 3.4.

The results are pretty much the same in physics, if not faster.
Additionally, I don't see the issue here, so I will first have to pin down
why you see lagging and I don't (on 2 very different systems: a 2010 3.33 GHz hexcore with OS X 10.9 and a 2006 2.66 GHz dual dual-core with OS X 10.6).

I could go back to compiling with gcc, with the disadvantage of lots of uncovered backwards-compatibility issues.

So this will take a while to sort out before deciding on the next release or a silent release update.

Stay tuned

Jens

Thank you for replying.
See here on the BA forum:
http://blenderartists.org/forum/showthread.php?264568-Dyntopo-tests&p=2603544&viewfull=1#post2603544
Doris is on a new 27" iMac i7 with nVidia and lots of RAM (I think on OS X 10.8.5).
Same behavior, though.

Member

Okay, I did some investigation; my results:

Whether the OMP builds are done with clang-omp or gcc doesn't matter much.
But I found the code in question. It uses 2 * omp_get_num_procs(), which
returns the number of logical cores, so on my hexcore with HT that gives 24 threads. That's plainly
suboptimal and has so much overhead that it is slower than non-threaded in some
cases.
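The arithmetic behind that overhead can be reproduced in a few lines of Python. The machine figures come from this thread (Jens's hexcore with HT; the reporter's dual quad-core Xeon with 16 threads and 8 physical cores), assuming 2 hardware threads per core. This only reproduces the thread counts; it does not model the overhead itself.

```python
def old_sculpt_threads(logical_cpus):
    """Thread count before the fix: 2 * omp_get_num_procs(),
    where omp_get_num_procs() reports logical CPUs (HT included)."""
    return 2 * logical_cpus


def physical_core_threads(logical_cpus, threads_per_core):
    """Thread count equal to the number of physical cores."""
    return logical_cpus // threads_per_core


# (machine, logical CPUs, hardware threads per core), from this thread
MACHINES = [
    ("hexcore with HT", 12, 2),
    ("dual quad-core Xeon with HT", 16, 2),
]

for name, logical, per_core in MACHINES:
    print(f"{name}: {old_sculpt_threads(logical)} threads before the fix, "
          f"{physical_core_threads(logical, per_core)} physical cores")
# hexcore with HT: 24 threads before the fix, 6 physical cores
# dual quad-core Xeon with HT: 32 threads before the fix, 8 physical cores
```

On the reporter's machine the old formula asks for four times as many threads as there are physical cores.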

We must clear up one thing here: you said former OMP builds such as the 2.69 release were superior?
Or did you always use non-OMP builds?

I formerly did a lot of research on OMP in fluids and typically found nothing working better than
the physical thread count.

So I made a test with 6 threads here and found it works better, but I have no real comparison at the moment.
I used a sphere subdivided up to 8 times, and at that point it starts to lag in any case.

Roundup:

  • the new build environment seems to change nothing significant here.
  • the currently chosen thread count of 2 * logical core count is definitely suboptimal on machines with many cores (known problem)
  • there might be a comparison misunderstanding, OMP vs. no OMP (buildbot is, as far as I know, always non-OMP, built with an old Apple gcc 4.2). So if you liked those builds better, the conclusion would be that sculpt OMP is broken by design.

I'll do a bit more research on the best solution here, stay tuned.

Please give exact details on the scene used, perhaps a simple .blend, so I can directly compare the subdivision levels you used with mine.

Jens

Member

I think I found 2 bugs.

1: the thread count is not set right; typically it should be the core count without HT (physical cores)
2: once threading has been enabled, unchecking the button does not go back to a single thread, which is faster in some cases
3: as threading defaults to true for OMP builds, see 2
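Bug 2 amounts to the thread count being decided once and never revisited. A hypothetical Python model of the intended behavior (the class and attribute names are illustrative, not Blender's): the count is re-derived for every stroke, so unchecking the box falls back to one thread.

```python
class SculptThreading:
    """Hypothetical model of the sculpt threading toggle (not Blender's code)."""

    def __init__(self, physical_cores, use_threading=True):
        # Threading is on by default in OMP builds (item 3 above).
        self.physical_cores = max(1, physical_cores)
        self.use_threading = use_threading

    def threads_for_stroke(self):
        # Re-evaluated on every stroke, so unchecking the box takes
        # effect immediately instead of keeping the earlier thread count.
        return self.physical_cores if self.use_threading else 1


state = SculptThreading(physical_cores=8)
print(state.threads_for_stroke())  # 8
state.use_threading = False
print(state.threads_for_stroke())  # 1
```

In C with OpenMP, the same decision can be expressed with the standard `if` and `num_threads` clauses on `#pragma omp parallel`; whether the actual fix uses them is not stated in this thread.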

Will take care of this tomorrow ...

Jens

Member

I pushed a fix for the threading cases and set the thread count to
an empirically found optimal value (== physical cores, not HT!)

Please test ...
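Since the fix keys on physical cores rather than logical CPUs, testers may want to know their machine's physical count. A best-effort Python sketch, assuming the OS X `sysctl` key `hw.physicalcpu`; it falls back to `os.cpu_count()`, which counts logical CPUs and may therefore overestimate on HT machines. `physical_core_count` is a hypothetical helper, not the code in the commit.

```python
import os
import subprocess


def physical_core_count():
    """Best-effort physical core count.

    On OS X, `sysctl -n hw.physicalcpu` reports physical cores. If sysctl
    is missing, fails, or prints nothing usable, fall back to
    os.cpu_count(), which counts logical CPUs (hyperthreads included).
    """
    try:
        out = subprocess.run(
            ["sysctl", "-n", "hw.physicalcpu"],
            capture_output=True, text=True, check=True,
        ).stdout
        return max(1, int(out.strip()))
    except (OSError, subprocess.CalledProcessError, ValueError):
        return os.cpu_count() or 1


print(physical_core_count())
```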

Btw., for self-compiling an OpenMP build, use this in your user-config:

CC = '../lib/darwin-9.x.universal/clang-omp/bin/clang'
CXX = '../lib/darwin-9.x.universal/clang-omp/bin/clang++'

The needed compiler and OMP lib are in the Blender libs for OS X.
OMP is enabled automatically.
Don't forget to do an svn update if necessary.
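Because the user-config file is itself Python and the compiler paths above are relative, a quick pre-build check can catch a missing or stale lib checkout. A minimal sketch, assuming the paths are resolved from the Blender source directory where scons is run; `missing_compilers` is a hypothetical helper.

```python
from pathlib import Path

# The user-config entries above, relative to the Blender source checkout.
CLANG_OMP = {
    "CC": "../lib/darwin-9.x.universal/clang-omp/bin/clang",
    "CXX": "../lib/darwin-9.x.universal/clang-omp/bin/clang++",
}


def missing_compilers(source_dir):
    """Return the user-config keys whose compiler binary is not found."""
    root = Path(source_dir)
    return sorted(key for key, rel in CLANG_OMP.items() if not (root / rel).is_file())


if __name__ == "__main__":
    missing = missing_compilers(".")
    print("clang-omp found" if not missing else f"missing: {missing}")
```

If both entries are reported missing, the svn update of the lib folder mentioned above is the first thing to check.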

Jens

Added subscriber: @Psy-Fi

Hey Jens,

How would one go about enabling OpenMP in CMake?
Would

  -DCMAKE_C_COMPILER=../lib/darwin-9.x.universal/clang-omp/bin/clang -DCMAKE_CXX_COMPILER=../lib/darwin-9.x.universal/clang-omp/bin/clang++ -DWITH_OPENMP=ON

do it?
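This question was not answered in the thread, so the flags above remain unconfirmed. As a sketch only, the same command can be assembled in Python to avoid shell-quoting mistakes; `cmake_openmp_args` is a hypothetical helper and `../blender` is an example source directory. Absolute compiler paths may be safer than relative ones, since relative paths depend on the directory cmake is run from.

```python
import shlex

# Compiler directory from the question above (relative, as in the user-config).
CLANG_OMP_BIN = "../lib/darwin-9.x.universal/clang-omp/bin"


def cmake_openmp_args(source_dir, compiler_bin=CLANG_OMP_BIN):
    """Return the cmake argument list proposed above (not confirmed to work)."""
    return [
        "cmake",
        f"-DCMAKE_C_COMPILER={compiler_bin}/clang",
        f"-DCMAKE_CXX_COMPILER={compiler_bin}/clang++",
        "-DWITH_OPENMP=ON",
        source_dir,
    ]


# Print the command for copy-pasting into a shell from the build directory.
print(shlex.join(cmake_openmp_args("../blender")))
```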

Added subscriber: @JorgeLosilla

I have a relatively new iMac with an Nvidia GTX 680MX. There is lagging, and the smooth brush especially is a pain in the ass to work with. Not to mention the problem with the inflate brush, which also lags and creates weird geometry, just as reported in previous bugs.

No OpenMP support yet.
Sorry.
The worst OS X Blender build ever!
Apple must be responsible for this crap.
Apple, always.
Sorry, not a convincing excuse. Apple vs. Blender devs. Surrealism.

Member

The only surrealism is your reports.
Without exact details, no one can do any real comparison.
The only hint at the moment is that the new compiler may behave differently, but I cannot reproduce it here.
With the same test setup (2.69, 2.70 gcc or 2.70 clang-omp), sculpt behaves the same as before,
with threading giving little gain in most cases (even with the latest optimization, in trunk).

"Without exact details no one can do any real comparison."
OK.
Visit the BArtists forum, the Blender tests/dyntopo tests threads, and ask for opinions.
We are the artists who participated in Blender's dyntopo and multires sculpting development.
And who are you?
I already posted links to posts with the same complaints.
You didn't read them.
Sorry, wrong behavior.
Now I'll share your reply with the blenderartists community.
Let's see, be my guest.
Surrealism at its best.

You obviously have no idea how to build on OS X.
We read:
"Btw. for selfcompiling an openMP build use in your user-config:

CC = '../lib/darwin-9.x.universal/clang-omp/bin/clang'
CXX = '../lib/darwin-9.x.universal/clang-omp/bin/clang++'

The needed compiler and omplib is in blender libs for OSX.
OMP is enabled automatically.
Don't forget to eventually do svn update."
Seriously?

Changed status from 'Open' to: 'Archived'

Invalid reporter…

@michaliszissou

You just called the maintainer of the OS X build systems a newbie and insignificant,
maybe the only one who can help you resolve the issue.

Since you like publicity, I will post some details here so people know what is happening. I was the one who set up your build system. However, I am not an Apple dev and cannot know for sure how to enable OpenMP. I did try to follow some instructions from Jens, but:

  1. I tried to modify your build script remotely, without access to the build cache to check whether OpenMP was indeed enabled or whether errors occurred.
  2. Getting information from you remotely was hard because you were not familiar with the build systems. In fact, a request such as 'post the error for me' would generate lots of frustration and confusion, after which I quit trying remotely and told you I would check it in person the next time we met.
  3. You yourself have failed more than once to use the build script properly. In fact, the last time I checked, you still couldn't build an OpenMP build with the modified script I gave you, and I am not really sure you ever managed to. Since then I haven't had a chance to check your system in person, and you certainly could not have fixed it yourself.

It's most likely an error on our part.

So I suggest a healthy dose of humility, patience and manners, please.

hash: a5be03b
It works beautifully.
Physical cores (in my case 8) work best. 16 threads, not so much. As expected.
Many apologies for my manners,
thank you very much.

@Psy-Fi,
I replied to you via PM, but haven't finished it yet.

Member

michaliszissiou, I accept your apologies, no problem.

Some background:

Intel itself states that hyperthreads do not work so well on older CPUs. I have no real insight into where the sweet spot is, but I guess my
Westmere EP (hexcore) is on the good side of it, since I don't see penalties as huge as yours. My other machine (Harpertown) does not even have HT, so
it also has no huge penalty when using all threads.

I guess all i7, i5 and pre-Westmere CPUs do better with the physical core count now used.
I made an interesting observation: I can saturate my CPU well with all logical threads minus 1 (i.e. 11!); higher settings drop back to low CPU usage.
Although this looks busy in Activity Monitor, I cannot verify a real benefit here, but perhaps I am just missing a suitable scene.

The Intel OMP is perhaps a bit pickier than gcc OMP. I observed a lot of wait_sleeps when using too high a core count. This shows
the calculations cannot saturate the cores in this case. But: the Intel OMP has been found to perform 10-20% better than gcc in general (when used right).
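Because a busy Activity Monitor does not prove a speedup, candidate thread counts are best compared by wall time. A minimal Python harness, assuming a hypothetical callable `run(n)` that performs one reproducible workload with `n` threads (for example a scripted stroke in a Blender started with OMP_NUM_THREADS=n); the candidate list mirrors the settings discussed in this thread, and all helper names are illustrative.

```python
import time


def candidate_thread_counts(physical, logical):
    """Thread counts worth timing, based on the observations in this thread:
    1 (no threading), physical cores, logical CPUs minus one, all logical
    CPUs, and the old default of 2 * logical CPUs."""
    counts = {1, physical, logical - 1, logical, 2 * logical}
    return sorted(c for c in counts if c >= 1)


def time_candidates(run, counts, repeats=3):
    """Call run(n) for each thread count n and keep the best wall time."""
    results = {}
    for n in counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(n)
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results


def fastest(results):
    """Thread count with the lowest measured time."""
    return min(results, key=results.get)


print(candidate_thread_counts(6, 12))  # [1, 6, 11, 12, 24] for a Westmere hexcore
```

Keeping the best of several repeats reduces noise from other processes; a scene heavy enough to lag at 1 thread is needed for the differences to show.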

A comment on the actual implementation:
Psy-Fi and I (sorry, I forgot to credit him in the commit for reusing his work on scene storage and GUI) made this almost global for now, while using the
core tweaks only in sculpt/dyntopo, since I guess we might find other OMP'd areas where lowering the thread count to physical cores may be better.

Fluid already got its own thread tweaking a long time ago. Smoke seems to saturate well; cloth has not been investigated in depth yet.

Anyway, if someone finds this commit too intrusive, I have no problem cutting it down to the sculpt area again.
It also came to my mind that per-scene storage is perhaps not the best (michalis? Is it very scene-dependent?).
Perhaps having it user-local, i.e. machine-dependent, is more useful?
On the other hand, the "AUTO" setting will always return the optimal setting.
Opinions are welcome.
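The trade-off between an "AUTO" setting and a fixed count can be sketched as a small resolver. A hypothetical Python model (the `ThreadSetting` fields and `resolve_threads` are illustrative, not the RNA properties in the commit), assuming AUTO means physical cores as in the fix, and clamping fixed values to the machine's logical CPU count as a design choice of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ThreadSetting:
    """Hypothetical model of the setting discussed above."""
    mode: str = "AUTO"      # "AUTO" or "FIXED"
    fixed_threads: int = 1  # only used when mode == "FIXED"


def resolve_threads(setting, physical, logical):
    """Turn a stored setting into a thread count for the current machine."""
    if setting.mode == "AUTO":
        # Resolved at run time, so a .blend saved elsewhere still adapts.
        return physical
    if setting.mode == "FIXED":
        # A value saved on a bigger machine is clamped to this one.
        return min(max(1, setting.fixed_threads), logical)
    raise ValueError(f"unknown mode: {setting.mode!r}")


print(resolve_threads(ThreadSetting(), physical=8, logical=16))             # 8
print(resolve_threads(ThreadSetting("FIXED", 32), physical=8, logical=16))  # 16
```

Because AUTO is resolved per machine, it travels safely inside a scene; a FIXED value stored per scene is the case where user-local storage would matter most.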

Jens

P.S.: OS maintainers are welcome to translate these insights to other OSes.

Reference: blender/blender#39281