- User Since
- Jun 2 2013, 4:47 PM (246 w, 4 d)
Sat, Feb 17
Well, Carve was actually a good fallback method when the BMesh boolean failed. Wouldn't it be better to remove fallbacks only once the main method is polished and covers all the cases the fallback did?
Wed, Jan 31
- As @Vuk Gardašević (lijenstina) said, I would keep OpenCollada in any case until another exporter offers the same speed for heavy meshes/scenes and is as memory efficient (see https://developer.blender.org/T53236#469082).
- Second, it saved my day many times when the FBX exporter fails.
- Last but not least, in the workflow I see the most (archviz), it's a must-have for exchanging data with Sketchup.
Tue, Jan 30
Jan 16 2018
Just wanted to report the same bug and noticed it's the last known one for bevel, congrats @Howard Trickey (howardt) for making it so robust :)
Jan 3 2018
rB5aa08eb3cc7 is safe to include.
Dec 29 2017
Yes it is, although what is difficult for one GPU may be easier for another, so there is no universal definition of complexity for all GPUs. See http://download.blender.org/institute/benchmark171221/latest_snapshot.html and this selection:
For Classroom, all GPUs render about equally fast, while for Koro, the Vega and TitanV are much faster.
Is it possible somehow to render a scene on a nvidia card by using OpenCL and not CUDA for GPU?
By the way, to avoid users having to fiddle with their registry, we could just send fewer samples to the GPU at once for CUDA. OpenCL can take anything because there is no such timeout, but for CUDA I don't see an easy way to send the right amount of samples to the GPU.
- A static approach based on the number of CUDA cores wouldn't adapt to scene complexity, so we would have to be very conservative.
- A dynamic solution, for example taking the mean samples/sec of the last n tiles as reference, may lead to very high render times if rendering starts with sky and then goes from a full-sky tile to one with 10% sky and fur behind a tree with transparency, translucency, etc., which may render 100 or even 1000x slower than pure sky, triggering the timeout.
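To make the failure mode of the dynamic approach concrete, here is a minimal sketch. It is not Cycles code: the function names, the sample rates, the time budget and the timeout value are all made up for illustration.

```python
# Hypothetical model of "size the next launch from the mean rate of the last n tiles".
# All numbers below are assumptions chosen for illustration, not measurements.

TIMEOUT_S = 2.0   # assumed driver timeout for a single kernel launch
BUDGET_S = 1.0    # assumed target duration per launch, safely below the timeout


def samples_per_launch(recent_rates, budget_s):
    """Pick a sample count from the mean samples/sec of the last tiles."""
    mean_rate = sum(recent_rates) / len(recent_rates)
    return int(mean_rate * budget_s)


def launch_time(samples, actual_rate):
    """Time the launch really takes at the rate of the tile being rendered."""
    return samples / actual_rate


sky_rate = 1000.0            # samples/sec on pure sky tiles (assumed)
fur_rate = sky_rate / 100    # a tile that renders 100x slower (assumed)

# The last four tiles were pure sky, so the estimate is tuned for sky.
n_samples = samples_per_launch([sky_rate] * 4, BUDGET_S)

# The next tile is the slow one: the launch overshoots the timeout by far.
slow_launch_s = launch_time(n_samples, fur_rate)
print(n_samples, slow_launch_s, slow_launch_s > TIMEOUT_S)
```

With these assumed numbers the estimate sends 1000 samples, which the slow tile needs 100 seconds to finish, far beyond any reasonable timeout. A static limit avoids this only by being conservative everywhere.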
@Christoph Werner (Taros) you said 500 samples at 16x16 was already triggering the bug. A 1000 spp render at 64x64 with same GPU takes about 8 times longer to render. But I'll try at 1400spp. IIRC @Brecht Van Lommel (brecht) said that this timeout bug was not happening for all users somehow?
Dec 28 2017
Dec 26 2017
Hi, thanks for the report, but don't set the priority yourself. Everybody finds their own bug to be the most important one.
I couldn't reproduce the crash, neither with 2.79 nor with latest buildbot using driver 17.12.2 on Vega64. So I would recommend to update your driver and test again.
Dec 24 2017
note that I rendered at 1000spp with 64x64 tiles on a 1080Ti too. Windows 7, latest master and driver 388.31
Trying to render with VS2015 builds, I get a similar error (not sure why it's not the same):
@Brecht Van Lommel (brecht) adding this information in the tooltip or as a unit like for the distance values would clarify such discrepancies.
Dec 16 2017
Rendering the scene lightning.blend in the link above (https://cloud.blender.org/p/agent-327/591ac3cabb3ea141675bbaf2), tested with latest master and 17.11.2 on a Vega 64, OpenCL system memory usage skyrockets to 27.6GB:
while CUDA and CPU stay both at around 16GB system memory usage
Vega 64 has only 8GB and cycles reports about 10GB peak memory usage, so I could understand a difference around 2GB (the maximal size of buffers in latest OpenCL code) showing the driver allocating system memory. But why nearly 12GB? Of course the GPU will then idle most of the time, waiting for the system memory.
@Brecht Van Lommel (brecht) did it successfully render the scene lightning.blend in a normal time on the RX480?
Another point: why is so much system memory kept in use while doing GPU rendering? Couldn't the data be freed once everything has been written to the GPU?
Dec 11 2017
Added a description. I also don't see the point of doing something like that, but @Ton Roosendaal (ton) and @Campbell Barton (campbellbarton) asked me to upload diffs instead of pastes last time, so I do it. Otherwise, I would just upload builds with a paste until the patch is stable and well tested by users.
Dec 3 2017
Dec 2 2017
with latest version of the branch, barbershop renders correctly now, but the whole PC hangs after the third tile and only a hard reset is possible (windows 7, driver 17.11.4 on Vega64). It would be good to report the bug to AMD.
Nov 30 2017
@Lukas Stockner (lukasstockner97) wouldn't it be possible to let the CPU do the denoising when using GPU rendering? It would save dedicated memory (denoising data and kernel would be in system memory) and let the GPU work only on rendering, which should give some speedup.
Nov 27 2017
At least on Windows, the barbershop scene renders some tiles like a Z pass with this branch and then crashes.
kernel build time with this patch on the barbershop scene goes from 98sec (split kernel only) to 89sec. Using the branch, it even goes down to 69sec. Together with kernel_base, it goes from nearly 2 minutes to a bit less than 1 and a half minutes. Still pretty high, but already a good speedup.
OSL can't build with this patch on windows.
Nov 26 2017
@Brecht Van Lommel (brecht) I don't have commit rights, could you commit it please?
Nov 25 2017
here is the fix
Nov 19 2017
https://git.blender.org/gitweb/gitweb.cgi/blender.git/commit/659ba012b0f30450c6de13f8b1c2fccce32fc461 render correctly and https://git.blender.org/gitweb/gitweb.cgi/blender.git/commit/f77cdd1d59f6e895b567c4d5fdcc6f2440e03307 renders black
Nov 14 2017
Yeah, that's annoying, and sometimes it's even worse. You have to save/open/save/open/etc. until the whole chain of dependencies has been processed. Say you delete obj1, which had material1, which had tex1. You have to save and open and save to really get a file without obj1, which leaves material1 with 0 users; then you must save/open to leave tex1 with 0 users, and then save/open again to really have that texture removed too. When you add linked groups to that story, it's sometimes really time consuming and cumbersome to get a clean file.
An option "clean all unused datablocks and save" would really help.
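A small model of why the chain needs several cycles, and what a one-shot "clean all unused datablocks" option would have to do: repeat the purge until nothing changes. This is a plain-Python sketch, not Blender's API; the datablock names come from the example above, while `scene`, `obj2` and the function names are hypothetical.

```python
# Each datablock maps to the datablocks it uses. "scene" stands in for
# anything that is always kept (an assumption of this sketch).
blocks = {
    "scene": ["obj2"],
    "obj2": [],
    "obj1": ["material1"],   # obj1 was deleted from the scene: nothing uses it
    "material1": ["tex1"],
    "tex1": [],
}
roots = {"scene"}


def purge_pass(blocks, roots):
    """One save/reload cycle: drop every block that currently has no user."""
    users = {name: 0 for name in blocks}
    for refs in blocks.values():
        for ref in refs:
            if ref in users:
                users[ref] += 1
    return {
        name: refs
        for name, refs in blocks.items()
        if users[name] > 0 or name in roots
    }


def purge_all(blocks, roots):
    """Repeat purge passes until a pass removes nothing; count effective passes."""
    passes = 0
    while True:
        remaining = purge_pass(blocks, roots)
        if len(remaining) == len(blocks):
            return remaining, passes
        blocks = remaining
        passes += 1


remaining, passes = purge_all(blocks, roots)
print(sorted(remaining), passes)
```

In this model obj1 goes in the first pass, material1 in the second and tex1 in the third, which matches the three save/open cycles described above. The option would simply run that loop internally before saving.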
Nov 12 2017
I rechecked with VS2013 builds. The system memory usage varies a bit (max 500MB compared to many GB with 2015) and the performance also is more stable (max 35% variation during 10 renders).
Could someone confirm those behaviours on Windows and test on Linux?
Nov 6 2017
and that's the log with latest buildbot:
@Brecht Van Lommel (brecht) thanks for the patch. Latest master with it (I had to apply manually as it seems it was done on a branch?) gives this log on 3 consecutive renders:
Yes, I used another version to get the free memory reported and tried to see if limiting global size to make it all fit in memory would solve the problem, but it didn't. I can redo the log with vanilla master if you want. Here is the code:
@LazyDodo (LazyDodo) the GPU-Z log is wrong somehow, it ignores half of the memory. But it gives the impression that no memory leak happens on the GPU.
Nov 5 2017
actually, 2.79 has the bug, only the official one had the device selection bug and took the 1080Ti instead, which doesn't use system memory.
So it may be a driver bug, but then why is the first render always 30sec?
After some renders, I got up to 114seconds to render = nearly 3x slower... At this point however, the GPU was idling a lot, maybe waiting all the time for system memory access?
Here is a picture of the task manager with 2 consecutive renders on the same instance of Blender.
It may be a coincidence, but VS2013 builds had only +/-10% between first and consecutive renders (made 5 of them) while VS2015 builds go crazy with up to 3x the render time.
It would help if someone could test on Linux with an RX480 to see if GCC or the Linux driver handles this differently. As said before, the RX480 can render this scene. On Linux, the Nvidia drivers destroy a part of the AMD driver and I couldn't find a solution to have both drivers side by side yet.
commit b53e35c655d4 already has the bug, so it's not due to the buffer patch.
Nov 4 2017
got some explanations on IRC, sorry I didn't know the whole story.
the bug was already there 24.08.2017, so my guess is that rBec8ae4d5e9f7 is the commit we look for.
just to give an idea of the mess to bisect:
- CUDA completely disables OpenCL in the majority of revisions, so you have to rebuild without CUDA
- device selection changed, so the user prefs have to be modified depending on the revision you test, and bisecting requires going back and forth in time.
- kernel compilation takes 1min50 for victor
- scene preparation takes 2min04
So each step takes about 5 minutes of VS compile, then manual tweaks to the user prefs, then a 2-minute kernel compile, then 2 renders at 2 minutes (scene prep) + 2 minutes (render) each = 8 minutes of rendering. That's a quarter of an hour with 4 user interventions, between which you can't do much.
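The arithmetic above, written out as a rough sketch. The per-step durations are the rounded figures from this comment; the number of candidate commits is a made-up value used only to show how the total scales.

```python
import math

# Rounded per-step durations in minutes, taken from the estimate above.
vs_compile_min = 5
kernel_compile_min = 2
renders_per_step = 2
scene_prep_min = 2
render_min = 2

rendering_min = renders_per_step * (scene_prep_min + render_min)
per_step_min = vs_compile_min + kernel_compile_min + rendering_min


def bisect_steps(num_commits):
    """Upper bound on the number of revisions a bisect has to test."""
    return math.ceil(math.log2(num_commits))


# Hypothetical range size: the real number of commits is not known here.
assumed_commits = 500
total_min = bisect_steps(assumed_commits) * per_step_min
print(rendering_min, per_step_min, bisect_steps(assumed_commits), total_min)
```

So even a clean bisect over a few hundred commits (an assumed figure) would cost a couple of hours of mostly waiting, before counting the manual tweaks.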
rBec8ae4d5e9f7 only added support for more than 4GB of textures iirc.
You don't need HBCC support. On win7, even on Vega, there is no HBCC and my RX480 also renders full victor scene since a year on windows and since some months on Linux.
@Brecht Van Lommel (brecht) Is there a simple command to disable the context caching?
I could try to bisect, but @Mai Lavelle (maiself) should have better guesses of what could have introduced this bug. The scene preparation of Victor takes more than 2 minutes on my computer. With compile time on windows on top, shooting in the dark to bisect would take a lot of time.
The random render times are also in buildbots (but not in 2.79), this patch just makes it even more obvious. Render times vary between 29 and 102 seconds for a small border render of victor. So I reported the bug T53249.
Note that in all the above cases, system memory is used as the scene doesn't fit in the dedicated 8GB memory. So it doesn't seem to come from how the driver allocates the memory between dedicated and system memory.
Also, the power usage of the GPU was reduced to ensure no throttling happens; frequencies were stable during all the tests.
Can someone else confirm that with victor scene, the second render is much slower with OpenCL?
Yes, I added a bevel shader to test if we could remove the selective compilation on all the scenes I tested. It would be ok for scenes fitting in memory, but it slows down a lot on the others.
Nov 3 2017
Some scenes like Barcelona are even slightly faster with bevel, as it seems to remove the slowdowns on simple scenes from D2249. The times in general with latest D2249 + D2803 are the same as with master.
However, victor is 2x slower with bevel and D2249 compared to D2249 alone and 23% slower compared to master.
Could register pressure be further reduced on this patch? It worked really well with bicubic texture filtering.
with D2249, render speed is the same with and without bevel on a Vega64 (just forced to be compiled with an unconnected node). I'm still testing Victor to see if scenes that use system memory get slowdowns from it.
After testing on several scenes in my library, I only got good speedups for those not fitting in memory and slowdowns under 2% on smaller ones. So on the performance side, the benefit really outweighs the very small slowdowns, if any (many scenes were as fast).
I had to hit tab a lot (spam it more than 10 times) after joining and got a crash indeed on win7 using latest master.
I do all my tests with lowered voltage. Temperature never goes above 72°C (max set at 80°C) and frequency is stable. So the timings are also reproducible, with differences under 1%.
Some further results:
|scene||slowdown on Vega64/win7|
Victor goes from 23minutes to 17min40 to render with the patch. However, rendering a second time takes 43minutes, so something goes wrong. I guess the speedup comes from the reduced memory usage and something is not freed properly until Blender is closed.
Rendering after restarting Blender again gives 17min40 the first time and 43minutes the second time.
A border render at low resolution of victor takes 1min20 of pure render time with master vs 3min08 with P553 with crimson drivers 17.10.2 on win7 and Vega64.
I can also confirm the speed regression on Vega64/win7 with D2249. I expected the biggest speedups in heavy scenes like victor.blend, but render time went from 23 minutes to 48 minutes. I still have to test Brecht's version.
Oct 25 2017
This patch makes some scenes with volumes crash, like this one: https://blenderartists.org/forum/showthread.php?439394-The-new-Cycles-GPU-2-79-Benchmark
Oct 21 2017
I had the same problem with 1080Ti and win7. It was on classroom scene, so it had nothing to do with principled shader. Killing the process, restarting Blender and relaunching the render was enough. No idea what happened as I was using the release UI without any debug info and I couldn't reproduce it since then. It happened just after opening the scene and hitting F12.
There is an AUR package to get proprietary OpenCL on the Mesa driver: https://aur.archlinux.org/packages/opencl-amd
And if you know what you are doing, you can compile the AMD staging Linux kernel with ROCm and a custom LLVM. It's fully open source.
Oct 19 2017
@Leo (.Pixel) you are welcome. Please report your times with and without HBCC. Some websites report 15% speedup even when it all fits in the VRAM. Would be interesting to have your results to compare.
@Leo (.Pixel) the latest buildbot has the fix for this error. I just tried and it renders without any tricks/simplification.
Oct 12 2017
Viewport rendering of BMW from the official benchmark pack takes 12 seconds on the 1080Ti, 20 seconds on the Vega64 and 16 seconds using both. With F12 render it's the opposite: the Vega is faster at 82 seconds (at 128x128, best time), the 1080Ti takes 93 seconds (at 16x16, best time) and both together take 44 seconds, using latest master with initial_num_samples at 5000.
To sum up:
- viewport seems really slow in latest master with OpenCL. 2.78c with selective node compilation for viewport renders nearly 2x faster on Vega 64. It's not due to SSS or volume, as those are not compiled in the viewport kernel either. I can investigate that.
- multi-device rendering is slower with viewport/progressive rendering than the fastest device alone. One would expect it to only wait for the slower device to finish its half, which would be around 10 seconds for the Vega?
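For reference, a back-of-the-envelope bound on what two devices could reach together. It assumes the work splits perfectly in proportion to each device's throughput with no sync or update overhead, which is an idealisation and not how the scheduler actually works; the function name is made up.

```python
def ideal_combined_time(*device_times):
    """Time for devices working in parallel on one job, assuming a perfect
    split proportional to throughput and zero overhead (an idealisation)."""
    return 1.0 / sum(1.0 / t for t in device_times)


# Single-device times in seconds, from the measurements above.
viewport_ideal = ideal_combined_time(12, 20)   # 1080Ti, Vega64
f12_ideal = ideal_combined_time(93, 82)        # 1080Ti, Vega64

print(round(viewport_ideal, 1), round(f12_ideal, 1))
```

Under that assumption the F12 result of 44 seconds is already close to the ideal of about 43.6 seconds, while the viewport result of 16 seconds is more than twice the ideal 7.5 seconds and even above the naive half-and-half estimate of 10 seconds.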
if you or someone at BF/BI have a direct contact with AMD, maybe the best would be to report it as a bug if it works under Linux.
Oct 11 2017
The openCL kernel is about 20% slower on the BMW scene with this commit. from 1min33 to 1min50 on Vega64 using latest driver on win7.
Oct 10 2017
are the 15-10% for the total rendering time of production scenes or just the intersection code?
From tests made in the UI, the CPU is indeed always the last one to finish. The more threads a CPU has, the higher the probability that the GPUs will idle, because those 16 or 32 tiles are already being rendered very slowly by the CPU. So if it's possible without too much work, I would say it would be more effective in real scenarios to let all the CPU threads render one tile, just like all the threads of the GPU render one tile. It may also improve cache behaviour and increase render speed. Of course, letting the GPU render several tiles would still be needed to ensure better occupancy.
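A toy simulation of that tail effect, assuming a simple "next free worker takes the next tile" scheduler. The tile count and per-tile times are invented for illustration, and the model ignores everything else Cycles does.

```python
import heapq


def makespan(num_tiles, tile_times):
    """Greedy scheduling: whichever worker is free first takes the next tile.
    tile_times[i] is the time worker i needs for one tile."""
    heap = [(0.0, i) for i in range(len(tile_times))]
    heapq.heapify(heap)
    finish = 0.0
    for _ in range(num_tiles):
        free_at, worker = heapq.heappop(heap)
        done = free_at + tile_times[worker]
        finish = max(finish, done)
        heapq.heappush(heap, (done, worker))
    return finish


NUM_TILES = 64          # assumed
GPU_TILE_S = 1.0        # assumed: GPU renders a tile in 1 s
CPU_ALL_THREADS_S = 4.0 # assumed: the whole CPU on one tile takes 4 s
CPU_THREADS = 16
# One thread per tile: each tile takes 16x longer than with all threads.
CPU_ONE_THREAD_S = CPU_ALL_THREADS_S * CPU_THREADS

# Current behaviour: every CPU thread grabs its own tile.
per_thread = makespan(NUM_TILES, [GPU_TILE_S] + [CPU_ONE_THREAD_S] * CPU_THREADS)

# Proposed behaviour: all CPU threads cooperate on a single tile.
whole_cpu = makespan(NUM_TILES, [GPU_TILE_S, CPU_ALL_THREADS_S])

print(per_thread, whole_cpu)
```

With these made-up numbers the per-thread variant finishes only when the slow CPU tiles are done, long after the GPU has run out of work, while the whole-CPU variant finishes earlier even though the CPU's total throughput is identical in both cases.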
Oct 8 2017
replacing line 1374 with:
It seems the multi-device patch works when CPU is not selected. Note that in this case, selecting cpu in either the CUDA or OpenCL Tab of user pref activates it in the other one, so maybe it conflicts somehow?
indeed, it works now :) thanks for the solution, I think it's safe for master. Here are the diff we spoke of in IRC:
To allow OpenCL to be selected when CUDA is present:
To render all samples at once on OpenCL to limit update overhead (but the bottleneck is somewhere else when rendering small tiles):
And the patch you proposed to render with CUDA, OpenCL and CPU together, but only init phase works.
full is made with "make.bat full" = no cubins
release is made with "make.bat release" = with cubins
Oct 7 2017
Oct 3 2017
Oct 1 2017
If the same behavior could be obtained on OpenCL, the next benchmark pack could have only one version with 32x32 tiles for CPU, CUDA and OpenCL. That would also spare the end user from playing with tile size. Currently on OpenCL, only tile sizes between 64 and 256 render at about the same speed; 32 is much slower, and so is a single tile (by a two-digit percentage). It might also later allow a tile server that renders one frame with the CPU code path for CPUs, the CUDA code path for NVidia and the OpenCL code path for AMD, all using one tile size.
Sep 23 2017
The boolean in question was already applied many months ago in the project files. Trying to recreate the same object with a modifier, the bug doesn't show up.
Sep 22 2017
Sep 20 2017
After testing many versions, the bug is already present in 2.74.
@Campbell Barton (campbellbarton) I'm working on a file for the boolean part. If the Collada and FBX exporters manage to export the geometry correctly, even with 0-area faces, why not make the OBJ export also work in such cases, like the 2 others?
Sep 19 2017
It's the result of a boolean modifier on a cube. The bug happens with the boolean modifier too (so a simple cube with cubes cutting it). I applied everything and removed some parts to simplify the file.
Just tested, Collada export doesn't have this bug.
Thanks for the quick fix, but maybe only one fix would be enough? I don't know if the 2 fixes together may decrease performance.
The thing is that it looks right in viewport before applying. If the bad geometry is the "right" one, then it should also look bad before applying. A modifier should look the same applied or not.