Official 2.7, lagging when sculpting. #39281

michalis zissiou commented

2014-03-20 13:15:16 +01:00

OSX 10.8.5
MacPro, dual Xeon, 16 threads, ATI HD 5870.

Broken: official 2.7
Worked: all recent buildbot 2.7x builds and all my own builds work fine

Lagging when stroking, even on a 5k mesh.
No need to use multires or any modifiers.
Same lagging under dyntopo (it has nothing to do with dyntopo, though, and is unrelated to the density of the mesh).
BTW
I don't know how to build a Blender that supports OpenMP, and the buildbot uploads don't support OpenMP either; this is well known.
The official 2.7 supports OpenMP, as seen in the console.

michalis zissiou commented

2014-03-20 13:15:16 +01:00

Changed status to: 'Open'

michalis zissiou commented

2014-03-20 13:15:16 +01:00

Added subscriber: @michaliszissiou

jens verwiebe was assigned by Bastien Montagne

2014-03-20 14:40:38 +01:00

Bastien Montagne commented

2014-03-20 14:40:38 +01:00

Added subscriber: @mont29

Bastien Montagne commented

2014-03-20 14:40:38 +01:00

Jens, do you have insight here? Thanks! :)

jens verwiebe commented

2014-03-20 18:20:58 +01:00

I have tested sculpt on 10.9 and 10.6, where it is fine.
Threading with omp is afaik limited to 2 threads, so in the real world it never made
any huge difference for me.

Can you check what happens if you disable sculpt threading? (uncheck the box)

I have no idea what exactly could go wrong here, but I updated the intel omp lib with
dedicated 10.6+ load commands (although it showed no issues on 10.6) just to be on the safe side.

I can only speculate that on your dual 4 core ( == 16 HT ), the omp work is spread between
CPUs and is less efficient.
You could try setting omp environment vars to limit the omp threads and/or limit it to use only 1 CPU.

Appendix/omp env vars:

OMP_NUM_THREADS=num - Sets the maximum number of threads to use during execution.
OMP_PROC_BIND=TRUE|FALSE - Enables or disables binding of threads to processors.

Jens
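
A minimal sketch of the suggestion above, in Python: it builds an environment with the two OpenMP variables set and hands it to a child process. The Blender binary path and the helper names `omp_env` and `launch_blender` are assumptions for illustration, not part of Blender; adjust the path to your install.

```python
import os
import subprocess

# Hypothetical path to the Blender binary inside the OSX app bundle; adjust as needed.
BLENDER_BIN = "/Applications/blender.app/Contents/MacOS/blender"


def omp_env(num_threads, bind=None, base=None):
    """Return a copy of the environment with the OpenMP limits applied."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(num_threads)
    if bind is not None:
        # OMP_PROC_BIND takes TRUE or FALSE, as listed in the appendix above.
        env["OMP_PROC_BIND"] = "TRUE" if bind else "FALSE"
    return env


def launch_blender(num_threads, bind=None):
    """Start Blender with the limited OpenMP environment."""
    return subprocess.Popen([BLENDER_BIN], env=omp_env(num_threads, bind))
```

For example, `launch_blender(8, bind=True)` would start Blender with at most 8 OpenMP threads requested. Note that a program which sets its thread count explicitly in code overrides `OMP_NUM_THREADS`, so the variable may have no visible effect on every code path.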

michalis zissiou commented

2014-03-20 20:19:09 +01:00

Excuse me,
I'm not a programmer.
All the builds from buildbot,
and all my builds,
are working just fine.
All the previous official builds, 2.69 and back, are working fine (and OpenMP makes a lot of difference in sculpting (multires) mode).
2.7 is not.
You made something wrong.
Please try to fix it.
Thank you

If this is the new policy, to discard as many bug reports as possible, go ahead.
Some of my recent reports were discarded. However, on the BA forum, we all noticed them and replicated them.

Bastien Montagne commented

2014-03-20 20:27:26 +01:00

@michaliszissiou there is no new policy, and this report is still open afaik!!! Jens just gave you hints to try to understand the issue better, since he isn’t able to reproduce it off hand.

And things like “you made something wrong” and “please try to fix it” are not very nice; please consider that we spend a lot of time and effort producing a tool you get for free!

michalis zissiou commented

2014-03-20 20:28:54 +01:00

BTW, yeah,
disabling multithreaded sculpting stops the lagging.
However, this is the first Blender that does not support this important feature.
I'll keep a 2.69 for multires sculpting. Serious sculpting, I mean.
Thank you for helping.
I wonder, how do you build on OSX with OpenMP support?
I wish I could find someone to ask for advice.

jens verwiebe commented

2014-03-20 20:49:00 +01:00

The point is that I changed the omp-enabled build from the very hackish self-compiled gcc-4.8.1 to
clang-omp 3.4 at some point before 2.7 development.

The results in physics are pretty much the same, if not faster.
Additionally, I don't see the issue here, so I will first have to pin down
why you see lagging but I don't (on 2 very different systems: hexcore 3.33 2010 with OSX 10.9 and dual dualcore 2.66 2006 with OSX 10.6).

I could go back to compiling with gcc, with the disadvantage of lots of uncovered backwards-compat. issues.

So this will take a while to sort out, and then to decide between the next release or a silent release update.

Stay tuned

Jens

michalis zissiou commented

2014-03-20 21:28:54 +01:00

Thank you for replying.
See here on the BA forum:
http://blenderartists.org/forum/showthread.php?264568-Dyntopo-tests&p=2603544&viewfull=1#post2603544
Doris is on a new iMac i7 (27), nVidia, lots of RAM (I think on OSX 10.8.5).
Same behavior, though.

jens verwiebe commented

2014-03-20 22:42:00 +01:00

Okay, I did some investigation. My results:

Whether the OMP builds are done with clang-omp or gcc doesn't matter much.
But I found the code in question. It uses 2 * omp_get_num_procs(), which
returns the logical cores here, so on my hexcore with HT that is 24 threads. That's plainly
suboptimal and has so much overhead that it's slower than non-threaded in some
cases.

We must clear up one thing here: you said former omp builds such as release 2.69 were superior?
Or did you always use non-omp'ed builds?

I formerly did a lot of research on omp in fluids and typically found nothing working better than
the physical thread count.

So I made a test with 6 threads here and found it working better, but I have no real comparison atm.
I used a sphere subdivided up to 8 times, and it starts to lag then in any case.

Roundup:

- the new build environment seems to change nothing significant here.
- the currently chosen thread count of 2 * logical core count is definitely suboptimal on machines with many cores (known problem)
- there might be a comparison misunderstanding of omp vs. no omp (buildbot is afaik always non-omp, done with an old apple gcc 4.2). So if you liked those better, the conclusion would be that sculpt omp is broken by design.

I will do a bit more research on what the best solution is here, stay tuned.

Please give exact details on the scene used, perhaps a simple blend, so I can compare the subdiv stages you used with mine directly.

Jens
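
The arithmetic behind the 24-thread figure can be sketched as follows. This is only an illustration in Python of the formula quoted above (2 * logical cores); the function names are hypothetical, and the core counts are the hexcore-with-HT numbers from the comment.

```python
def threads_requested(logical_cores, factor=2):
    """Thread count from the formula quoted above: factor * logical cores."""
    return factor * logical_cores


def oversubscription(threads, physical_cores):
    """How many software threads compete for each physical core."""
    return threads / physical_cores


# Hexcore with Hyper-Threading: 6 physical cores, 12 logical cores.
physical, logical = 6, 12
requested = threads_requested(logical)

print(requested)                              # 24
print(oversubscription(requested, physical))  # 4.0
print(oversubscription(physical, physical))   # 1.0
```

With four threads per physical core, scheduling and synchronization overhead can outweigh the gain from parallelism, which matches the observation that 6 threads worked better on this machine.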

jens verwiebe commented

2014-03-21 01:04:11 +01:00

I think I found 2 bugs.

1: the thread count is not set right; typically it should be the core count without HT (physical cores)
2: once you set threaded, deactivating the button does not go back to single thread, which is faster in some cases
3: as threaded is default==true for omp builds, see 2:

Will take care of this tomorrow ...

Jens
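
A rough sketch of the behavior the two points above ask for, written in Python rather than Blender's C: the thread count follows the physical core count when threading is on, and drops back to a single thread as soon as the option is switched off. The class and method names are hypothetical and do not mirror Blender's actual code.

```python
class SculptThreads:
    """Tracks the thread count used for sculpt strokes (illustrative only)."""

    def __init__(self, physical_cores):
        # Physical cores, i.e. the core count without Hyper-Threading.
        self.physical_cores = max(1, physical_cores)
        self.count = 1

    def set_threaded(self, threaded):
        # Recompute on every toggle, so unchecking the box really
        # returns to single-threaded execution.
        self.count = self.physical_cores if threaded else 1
        return self.count


threads = SculptThreads(physical_cores=8)
print(threads.set_threaded(True))   # 8
print(threads.set_threaded(False))  # 1
```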

jens verwiebe commented

2014-03-21 15:30:57 +01:00

I pushed a fix for threading cases and optimized thread count to
an empirically found optimal value ( == physical cores, not HT ! )

Please test ...

Btw. for selfcompiling an openMP build use in your user-config:

CC = '../lib/darwin-9.x.universal/clang-omp/bin/clang'
CXX = '../lib/darwin-9.x.universal/clang-omp/bin/clang++'

The needed compiler and omplib are in the blender libs for OSX.
OMP is enabled automatically.
Don't forget to do an svn update if needed.

Jens

Antonis Ryakiotakis commented

2014-03-21 21:13:55 +01:00

Added subscriber: @Psy-Fi

Antonis Ryakiotakis commented

2014-03-21 21:13:55 +01:00

Hey Jens,

How would one go about enabling openmp in cmake?
would

-DCMAKE_C_COMPILER=../lib/darwin-9.x.universal/clang-omp/bin/clang -DCMAKE_CXX_COMPILER=../lib/darwin-9.x.universal/clang-omp/bin/clang++ -DWITH_OPENMP=ON

do it?

Jorge Losilla Martínez commented

2014-03-21 22:59:25 +01:00

Added subscriber: @JorgeLosilla

Jorge Losilla Martínez commented

2014-03-21 22:59:25 +01:00

I have a relatively new iMac with an Nvidia GTX 680MX. There is lagging, and the smooth brush especially is a pain in the ass to work with. Not to mention the problem with the inflate brush, which also lags and creates weird geometry, just as reported in previous bugs.

michalis zissiou commented

2014-03-23 22:46:47 +01:00

No OpenMP support yet.
Sorry.
The worst OSX Blender build ever!
Apple must be responsible for this crap.
Apple, always.
Sorry, not a convincing excuse. Apple vs blender devs. Surrealism

jens verwiebe commented

2014-03-23 22:56:25 +01:00

The only surrealism is your reports.
Without exact details no one can do any real comparison.
The only hint atm is that the new compiler may behave differently, but I cannot reproduce that here.
With the same test setup on 2.69, 2.70 gcc or 2.70 clang-omp, sculpt behaves the same as before,
with not much win from threading in most cases (even with the last optimization, in trunk).

michalis zissiou commented

2014-03-23 23:31:56 +01:00

"Without exact details no one can do any real comparison."
OK
Visit the BArtists forum, the Blender tests/dyntopo tests threads, and ask for opinions.
We are the artists who participated in the Blender dyntopo and multires sculpting development.
And who are you?
I already posted links to posts where the same complaints are being made.
You didn't read them.
Sorry, wrong behavior.
Now I'll share your reply with the blenderartists community.
Let's see, be my guest.
Surrealism at its best.

You obviously have no idea on how to build on OSX.
We read:
"Btw. for selfcompiling an openMP build use in your user-config:

CC = '../lib/darwin-9.x.universal/clang-omp/bin/clang'
CXX = '../lib/darwin-9.x.universal/clang-omp/bin/clang++'

The needed compiler and omplib is in blender libs for OSX.
OMP is enabled automatically.
Don't forget to eventually do svn update."
Seriously ?

Bastien Montagne commented

2014-03-23 23:49:58 +01:00

Changed status from 'Open' to: 'Archived'

Bastien Montagne closed this issue

2014-03-23 23:49:58 +01:00

Bastien Montagne commented

2014-03-23 23:49:58 +01:00

Invalid reporter…

Antonis Ryakiotakis commented

2014-03-24 13:18:15 +01:00

@michaliszissiou

You just called the maintainer of OSX build systems a newbie and insignificant.
Maybe the only one who can help you resolve the issue.

Since you like publicity I will post some details here so people can know what is happening. I was the one who built your build system. However I am not an Apple dev and cannot know for sure how to enable openmp. I did try to follow some instructions from Jens but:

1) I tried to modify your build script remotely, without having access to the build cache to check if openmp is indeed enabled, or if errors occurred.
2) Getting information from you remotely was hard because you were not familiar with the build systems. In fact, a request such as 'post the error for me' would generate lots of frustration and confusion. After that I quit trying to do it remotely, telling you I would check it in person the next time we meet.
3) You yourself have failed more than once to use the build script properly. In fact, the last time I checked you still couldn't build an openmp build with the modified script I gave you, and I am not really sure if you ever managed to do it. Since then I haven't had a chance to check your system in person, and it's out of the question that you would have been able to fix it yourself.

It's most likely an error on our part.

So I suggest a healthy dose of humility, patience and manners please.

michalis zissiou commented

2014-03-31 17:59:26 +02:00

hash: a5be03b
It works beautifully.
Physical cores (in my case =8) is the best. 16 threads not so. Expected.
Many apologies for my manners,
Thank you very much.

@Psy-Fi,
I replied to you via PM, but I haven't finished it yet.

jens verwiebe commented

2014-03-31 18:35:57 +02:00

michaliszissiou, I accept your apologies, no problem.

Some background:

Intel itself claims that hyperthreads do not work so well on older CPUs. I have no real insight into where the cutoff point is, but I guess it is around my
Westmere EP (hexcore), where I don't see such huge penalties as you do. My other machine does not even have HT (Harpertown), so
it also has no huge penalty when using full threads.

I guess all i7, i5 and pre Westmere do better with the now used physical core count.
I made an interesting observation: I can saturate my CPU well with all logical threads minus 1 (aka 11!); higher settings are down to low CPU usage again.
Although this looks busy in Activity Monitor, I cannot verify a real benefit here, but perhaps I am only missing a useful scene.

The intel omp behaves a bit pickier than gcc omp perhaps. I observed a lot of wait_sleeps when using too high a core count. This shows
the calculations cannot saturate the cores in this case. But: the intel omp is found to perform 10-20% better than gcc generally (when used right).

A comment on the actual implementation:
Psy-fi and I (sorry, forgot to give credits in the commit for reusing his work on scene storage and GUI) made this almost global for now, while using the
core tweaks only in sculpt/dyntopo, since I guess we might find other omp'ed areas where lowering the thread count to physical cores may be better.

Fluid already got its own thread tweaking a long time ago. Smoke seems to saturate well; cloth has not been investigated in depth yet.

Anyway, if someone finds this commit too intrusive, I have no problem cutting this down to the sculpt area again.
It also came to my mind that storage per scene is perhaps not the best (michalis? is it very scene-dependent?).
Having this user-local, aka machine-dependent, is perhaps more useful?
On the other hand, the "AUTO" setting will always return the optimal setting.
Opinions are welcome.

Jens

P.S: OS maintainers are welcome to translate these insights to other OSes.
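
For anyone wanting to experiment with the physical-versus-logical distinction outside Blender, here is a hedged Python sketch. On OSX it asks `sysctl` for `hw.physicalcpu`; elsewhere it falls back to the logical count, because the Python standard library has no portable call for physical cores. The helper names and the `resolve_threads` function, which imitates an "AUTO" setting, are hypothetical and not Blender's implementation.

```python
import os
import subprocess
import sys


def parse_count(text):
    """Parse sysctl output into a positive int, or None if unusable."""
    try:
        value = int(text.strip())
    except (ValueError, AttributeError):
        return None
    return value if value > 0 else None


def physical_core_count():
    """Best-effort physical core count; falls back to the logical count."""
    if sys.platform == "darwin":
        try:
            out = subprocess.check_output(["sysctl", "-n", "hw.physicalcpu"])
            count = parse_count(out.decode())
            if count is not None:
                return count
        except (OSError, subprocess.CalledProcessError):
            pass
    # Fallback: this is the logical count, which includes HT threads.
    return os.cpu_count() or 1


def resolve_threads(setting):
    """'AUTO' -> detected core count; otherwise an explicit positive int."""
    if setting == "AUTO":
        return physical_core_count()
    return max(1, int(setting))
```

`resolve_threads("AUTO")` would return 8 on the dual quad-core machine from this report, while `resolve_threads(4)` simply returns 4. Keeping the parsing in its own function makes the sysctl path easy to check without an OSX machine.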
