- User Since
- Jun 2 2013, 4:47 PM (329 w, 20 h)
Jul 19 2019
Ideas to solve this, as requested via PM:
- Better tile dispatching: give each GPU a corner to work on, so they only need to exchange tiles in the middle.
- Use CUDA streams to allow exchanging buffers during rendering, maybe?
- Transfer the rendered tiles to the CPU as each one finishes and do the denoising on the CPU (already partially implemented, since GPU+CPU rendering works with denoising?). I concentrate on GPU only, so this is just an idea.
- Render everything first and do the denoising in one batch at the end. That would also allow decoupling the tile size for path tracing from the one for denoising: denoising is faster with big tiles, while path tracing is faster with small tiles.
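The corner-based dispatch idea from the first bullet can be sketched as follows. This is a minimal sketch under a few assumptions: tiles form a regular grid, there are at most four devices, and a round-robin loop stands in for devices finishing tiles at different speeds. `corner_tile_dispatch` is a hypothetical helper, not Cycles' tile manager.

```python
import math

def corner_tile_dispatch(tiles_x, tiles_y, num_devices):
    """Assign every tile to a device, each device growing out from its own corner.

    Each device repeatedly takes the unrendered tile closest to its corner, so
    devices only meet (and compete for tiles) near the middle of the image.
    Hypothetical helper for illustration, not Cycles code.
    """
    # Opposite corners first, so two devices start as far apart as possible.
    corners = [(0, 0), (tiles_x - 1, tiles_y - 1), (tiles_x - 1, 0), (0, tiles_y - 1)]
    if num_devices > len(corners):
        raise ValueError("this sketch only handles up to 4 devices")
    # Per device: all tiles sorted by distance to that device's corner, nearest first.
    queues = []
    for cx, cy in corners[:num_devices]:
        order = sorted(((x, y) for y in range(tiles_y) for x in range(tiles_x)),
                       key=lambda t: math.hypot(t[0] - cx, t[1] - cy))
        queues.append(order)
    taken = {}                      # tile -> device that rendered it
    positions = [0] * num_devices   # how far each device got in its own queue
    remaining = tiles_x * tiles_y
    device = 0
    while remaining:
        queue = queues[device]
        # Skip tiles that another device already grabbed.
        while positions[device] < len(queue) and queue[positions[device]] in taken:
            positions[device] += 1
        if positions[device] < len(queue):
            taken[queue[positions[device]]] = device
            remaining -= 1
        # Round-robin stands in for "this device just finished a tile".
        device = (device + 1) % num_devices
    return taken
```

For example, with two GPUs on a 4x4 grid, device 0 starts at tile (0, 0) and device 1 at (3, 3), and they only contend for tiles along the diagonal between them.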
Jul 18 2019
Apr 5 2019
Note that in some scenes, even closing the popup doesn't help. In the classroom scene, you have to click on the scene name and then confirm to get the presets to apply.
Feb 21 2019
Thanks for the very quick fix.
Apr 12 2018
Apr 3 2018
@Brecht Van Lommel (brecht) Yes, so we agree: it's this same bug that makes the wood floor look strange and adds the noise in the other scene.
The differences in noise level in https://developer.blender.org/T54486#491645 are really due to the windows, which should be completely ignored by non-camera rays since their visibility is set to camera rays only. 2.79a ignores them, latest master does not. When I render without the windows, both stable and master render the same image.
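Ray visibility works like a bitmask test: an object can only be hit by rays whose type is enabled in its visibility mask. A window set to camera-only visibility should therefore be invisible to diffuse, glossy and shadow rays, so bounce light passes through it as if it weren't there. That is the 2.79a behaviour described above. The sketch below illustrates only the idea. The flag names and values are hypothetical and are not Cycles' actual `PATH_RAY_*` constants.

```python
from enum import IntFlag

class RayType(IntFlag):
    # Illustrative ray-type flags; values are hypothetical, not Cycles' constants.
    CAMERA = 1
    DIFFUSE = 2
    GLOSSY = 4
    TRANSMISSION = 8
    SHADOW = 16

def object_hit_allowed(object_visibility: RayType, ray_type: RayType) -> bool:
    """An object may only be intersected by rays whose type is in its visibility mask."""
    return bool(object_visibility & ray_type)

def intersect(objects, ray_type):
    """Return the names of objects this ray may hit; the rest are skipped as if absent."""
    return [name for name, visibility in objects if object_hit_allowed(visibility, ray_type)]
```

With a wall visible to all ray types and a window visible to `CAMERA` only, a diffuse or shadow ray should only ever see the wall.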
@Brecht Van Lommel (brecht) I changed line 235 of kernel_path_state.h from
I tried to reproduce it in a simple scene (a box with Principled using multiscatter GGX, one opening, one portal and one window with camera-only visibility), but somehow the bug doesn't show up in master with this file. Making it a bit more complex, with some textures, some objects to occlude and more bounces, shows a clear difference in noise pattern, but not really in noise level. Complex, real use-case scenes, on the other hand, clearly show that 2.79a has a much better noise level, but I couldn't find the reason yet. Keeping only the walls, windows and lights in the living room/kitchen scene above, master renders significantly darker than 2.79a.
@Brecht Van Lommel (brecht) So if I understand you right, this also explains the much higher noise levels in these scenes (see pictures). This scene is lit by portals, with windows set to be visible to camera rays only. But indeed, although those windows should not be hit by non-camera rays, removing the windows completely greatly improves lighting and noise in latest master. So it showcases what you said about ray visibility. Here is a comparison of scene 10 from Archinteriors 43 for Blender in master and stable:
2.79a = low noise level:
buildbot = very high noise level:
Apr 2 2018
@Milan Jaros (jar091) Could you provide a Windows or Linux build of latest master using Intel compiler 2018, to compare performance in different use cases?
Apr 1 2018
As this pack https://evermotion.org/shop/show_product/archinteriors-vol-43-for-blender/14563 has just been released this week, it would be great if users had an option so that it's not already broken in buildbot/the next release. Blender is just starting to appear on one of the most renowned visualization websites.
@Brecht Van Lommel (brecht) It's much more subtle in this case, but the same thing should be happening. The master render has much darker specular reflections than stable, like in the wood renders in the first post.
OK, I hope I nailed it.
@Brecht Van Lommel (brecht) Which bugfix commits touched the specular part of the Principled shader?
OS is Linux. If the difference seen in the attached .blend is not enough, I'll try to find a free alternative wood texture. The materials in the renders above are copyrighted by Evermotion.
Mar 27 2018
Add-ons relying on the Carve boolean solver, like this one https://blenderartists.org/forum/showthread.php?409837-Destructive-Extrude-BETA/page9, are not working in master, and the BMesh results are not there yet in some cases (like overlapping/coplanar edges/faces).
I think it was a very good idea to keep only one version of the boolean modifier, and BMesh is really fast, but it should offer as much as the old combination did. Or is the plan to just remove functionality in this case?
Mar 3 2018
@Campbell Barton (campbellbarton) As Carve boolean has been removed, would it be possible to finalize the BMesh version? Before, users could use the other one as a fallback, but now we have to use BMesh. I'm happy to have only one variant if it's robust enough.
Feb 23 2018
Thanks for investigating. Can you please attach a file and the corresponding steps to reproduce the crash?
Feb 17 2018
Well, Carve was actually a good fallback when BMesh boolean failed. Wouldn't it be better to remove fallbacks only once the main method is polished and covers all the cases the fallback did?
Jan 31 2018
- As @Vuk Gardašević (lijenstina) said, I would keep OpenCollada in any case until another exporter reaches the same speed for heavy meshes/scenes and is as memory efficient (see https://developer.blender.org/T53236#469082).
- Second, it saved my day many times when the FBX exporter failed.
- Last but not least, in the workflow I see most often (archviz), it's a must-have for exchanging with SketchUp.
Jan 30 2018
Jan 16 2018
Just wanted to report the same bug and noticed it's the last known one for bevel, congrats @Howard Trickey (howardt) for making it so robust :)
Jan 3 2018
rB5aa08eb3cc7 is safe to include.
Dec 29 2017
Yes it is, although what is difficult for a GPU may be easier for another one, so there is no universal definition of complexity for all GPUs. See http://download.blender.org/institute/benchmark171221/latest_snapshot.html and this selection:
For Classroom, all GPUs render at about the same speed, while for Koro, the Vega and Titan V are much faster.
Is it possible somehow to render a scene on a nvidia card by using OpenCL and not CUDA for GPU?
By the way, to save users from fiddling with their registry, we could just send fewer samples to the GPU at once with CUDA. OpenCL can take anything because there is no such timeout, but for CUDA I don't see an easy way to send the right number of samples to the GPU:
- A static approach based on the number of CUDA cores wouldn't adapt to scene complexity, so we would have to be very conservative.
- A dynamic solution, for example taking the mean samples/sec of the last n tiles as reference, may lead to a launch taking far too long. If rendering starts with sky and then goes from a full-sky tile to a tile with 10% sky and fur behind a tree with transparency, translucency, etc., that tile may render 100 or even 1000x slower than pure sky, triggering the timeout.
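The dynamic approach and its weak point can be sketched in a few lines. This is a minimal sketch, assuming a watchdog timeout of 2 seconds (the usual Windows TDR default, which is what the registry tweak raises) and a hypothetical `safety` factor. `samples_per_launch` is an illustrative helper, not Cycles code. It sizes a launch from the *slowest* recently measured rate rather than the mean, but it still cannot see a much harder tile coming.

```python
def samples_per_launch(recent_rates, timeout_s=2.0, safety=0.25,
                       min_samples=1, max_samples=1024):
    """Pick how many samples to send to the GPU in one kernel launch.

    recent_rates: samples/second measured on the last few tiles (hypothetical input).
    The launch is sized to take only `safety * timeout_s` at the slowest recent
    rate, and clamped to [min_samples, max_samples].
    """
    if not recent_rates:
        return min_samples
    slowest = min(recent_rates)
    budget = slowest * timeout_s * safety
    return max(min_samples, min(max_samples, int(budget)))
```

Calibrated on fast sky tiles (about 4000 samples/s), this returns the cap of 1024 samples. On a tile that renders 1000x slower, that same launch takes over 200 seconds, far beyond the timeout. A lower `safety` or `max_samples` only trades throughput for a smaller risk.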
@Christoph Werner (Taros) You said 500 samples at 16x16 was already triggering the bug. A 1000 spp render at 64x64 with the same GPU takes about 8 times longer to render. But I'll try at 1400 spp. IIRC @Brecht Van Lommel (brecht) said that this timeout bug somehow doesn't happen for all users?
Dec 28 2017
Dec 26 2017
Hi, thanks for the report, but please don't set the priority yourself. Everybody finds their own bug to be the most important one.
I couldn't reproduce the crash, neither with 2.79 nor with latest buildbot using driver 17.12.2 on Vega64. So I would recommend to update your driver and test again.
Dec 24 2017
Note that I rendered at 1000 spp with 64x64 tiles on a 1080Ti too: Windows 7, latest master and driver 388.31.
Trying to render with VS2015 builds, I get a similar error (not sure why it's not the same):
@Brecht Van Lommel (brecht) adding this information in the tooltip or as a unit like for the distance values would clarify such discrepancies.
Dec 16 2017
Rendering the scene lightning.blend from the link above (https://cloud.blender.org/p/agent-327/591ac3cabb3ea141675bbaf2), tested with latest master and driver 17.11.2 on a Vega 64, OpenCL system memory usage skyrockets to 27.6 GB:
while CUDA and CPU both stay at around 16 GB of system memory usage.
The Vega 64 has only 8 GB and Cycles reports about 10 GB peak memory usage, so I could understand a difference of around 2 GB (the maximum buffer size in the latest OpenCL code) from the driver allocating system memory. But why nearly 12 GB? Of course, the GPU then idles most of the time, waiting for system memory.
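The memory figures above, worked through. All numbers are the ones reported in this comment:

```latex
% Figures from the comment above; "unexplained" is simply observed minus expected.
\begin{aligned}
\text{expected overflow} &= 10\,\text{GB (Cycles peak)} - 8\,\text{GB (Vega 64 VRAM)} = 2\,\text{GB} \\
\text{observed extra}    &= 27.6\,\text{GB (OpenCL)} - 16\,\text{GB (CUDA/CPU)} = 11.6\,\text{GB} \\
\text{unexplained}       &= 11.6\,\text{GB} - 2\,\text{GB} = 9.6\,\text{GB}
\end{aligned}
```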
@Brecht Van Lommel (brecht) did it successfully render the scene lightning.blend in a normal time on the RX480?
Another point: why is so much system memory kept in use during GPU rendering? Couldn't the data be freed once everything has been copied to the GPU?
Dec 11 2017
Added a description. I also don't see the point of doing something like that, but @Ton Roosendaal (ton) and @Campbell Barton (campbellbarton) asked me last time to upload diffs instead of pastes, so I'm doing it. Otherwise, I would just upload builds with a paste until the patch is stable and well tested by users.
Dec 3 2017
Dec 2 2017
With the latest version of the branch, barbershop renders correctly now, but the whole PC hangs after the third tile and only a hard reset helps (Windows 7, driver 17.11.4 on Vega64). It would be good to report the bug to AMD.
Nov 30 2017
@Lukas Stockner (lukasstockner97) Wouldn't it be possible to let the CPU do the denoising when using GPU rendering? It would spare dedicated memory (the denoising data and kernel would be in system memory) and let the GPU work only on rendering, which should give some speedup.
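The hand-off being proposed is a producer/consumer queue: the GPU thread only path traces and pushes finished tiles, while a CPU thread pops and denoises them. Below is a minimal sketch of just that hand-off. `render_tile` and `denoise_tile` are hypothetical placeholders for the real GPU and CPU work, and `render_with_cpu_denoising` is not a Cycles function.

```python
import queue
import threading

def render_with_cpu_denoising(tiles, render_tile, denoise_tile):
    """Render tiles on the 'GPU' side and denoise each finished tile on a CPU thread.

    render_tile(tile_id) and denoise_tile(pixels) are hypothetical stand-ins for
    GPU path tracing and CPU denoising; only the hand-off between them is shown.
    """
    finished = queue.Queue()
    results = {}
    done = object()  # sentinel telling the CPU worker that rendering is over

    def cpu_denoiser():
        while True:
            item = finished.get()
            if item is done:
                break
            tile_id, pixels = item
            results[tile_id] = denoise_tile(pixels)

    worker = threading.Thread(target=cpu_denoiser)
    worker.start()
    for tile_id in tiles:
        # The GPU side only path traces; denoising buffers never need to live in VRAM.
        finished.put((tile_id, render_tile(tile_id)))
    finished.put(done)
    worker.join()  # after this, every tile has been denoised
    return results
```

One thing the sketch ignores is that the 2.79 denoiser also reads the neighbouring tiles. In practice, a tile could only be handed off once its neighbours are rendered too, which is one more argument for the "denoise everything in one batch at the end" variant.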
Nov 27 2017
At least on Windows, the barbershop scene renders some tiles like a Z pass with this branch and then crashes.
Kernel build time with this patch on the barbershop scene goes from 98 sec (split kernel only) to 89 sec. Using the branch, it even goes down to 69 sec. Together with kernel_base, it goes from nearly 2 minutes to a bit under 1.5 minutes. Still pretty high, but already a good speedup.
OSL doesn't build with this patch on Windows.
Nov 26 2017
@Brecht Van Lommel (brecht) I don't have commit rights, could you commit it please?
Nov 25 2017
here is the fix
Nov 19 2017
https://git.blender.org/gitweb/gitweb.cgi/blender.git/commit/659ba012b0f30450c6de13f8b1c2fccce32fc461 renders correctly and https://git.blender.org/gitweb/gitweb.cgi/blender.git/commit/f77cdd1d59f6e895b567c4d5fdcc6f2440e03307 renders black.
Nov 14 2017
Yeah, that's annoying, and sometimes it's even worse. You have to save/open/save/open/etc. until the whole chain of dependencies has been processed. Say you delete obj1, which had material1, which had tex1. You have to save, open and save again to really get a file without obj1, which leaves material1 with 0 users. Then you must save/open to get tex1 to 0 users, and then save/open again to really have that texture removed as well. When you add linked groups to the story, it's sometimes really time-consuming and cumbersome to get a clean file.
An option "clean all unused datablocks and save" would really help.
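Such an option would essentially have to keep removing zero-user datablocks until no more drop to zero, instead of one level per save/reload. Below is a minimal sketch using a hypothetical user-count/dependency map rather than Blender's actual ID structures (a fake user would simply count as one more user here).

```python
def purge_unused(users, deps, roots):
    """Remove every datablock that ends up with zero users, recursively.

    users: datablock name -> current user count (hypothetical model)
    deps:  datablock name -> datablocks it uses, e.g. "obj1" -> ["material1"]
    roots: datablocks the user deleted (their user count drops to zero)
    Returns the set of removed datablocks, in one pass instead of save/reopen cycles.
    """
    users = dict(users)  # don't modify the caller's counts
    for name in roots:
        users[name] = 0
    removed = set()
    pending = [name for name, count in users.items() if count == 0]
    while pending:
        name = pending.pop()
        if name in removed:
            continue
        removed.add(name)
        # Removing this block releases one user from everything it references.
        for child in deps.get(name, []):
            users[child] -= 1
            if users[child] == 0:
                pending.append(child)
    return removed
```

In the obj1 → material1 → tex1 example, a single call removes all three, while a texture that is also used elsewhere keeps its remaining user and survives.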
Nov 12 2017
I rechecked with VS2013 builds. The system memory usage varies a bit (at most 500MB, compared to many GB with VS2015) and the performance is also more stable (at most 35% variation over 10 renders).
Could someone confirm those behaviours on Windows and test on Linux?
Nov 6 2017
P556 seems to limit the slowdown to about 68 seconds (from 48), while latest buildbot 8a72be7 goes up to 78 sec (from 45 sec), and its slowdown grows with each new render.
And that's the log with the latest buildbot:
@Brecht Van Lommel (brecht) Thanks for the patch. Latest master with it (I had to apply it manually, as it seems it was made on a branch?) gives this log on 3 consecutive renders:
Yes, I used another version to get the free memory reported and tried to see if limiting global size to make it all fit in memory would solve the problem, but it didn't. I can redo the log with vanilla master if you want. Here is the code:
@LazyDodo (LazyDodo) the GPU-Z log is wrong somehow, it ignores half of the memory. But it gives the impression that no memory leak happens on the GPU.
Nov 5 2017
Actually, 2.79 has the bug too. The official build just had the device selection bug and used the 1080Ti instead, which doesn't use system memory.
So it may be a driver bug, but then why is the first render always 30sec?
After some renders, I got up to 114 seconds per render, nearly 3x slower... At this point, however, the GPU was idling a lot, maybe waiting all the time for system memory access?
Here is a picture of the task manager with 2 consecutive renders on the same instance of Blender.
It may be a coincidence, but VS2013 builds had only +/-10% between first and consecutive renders (made 5 of them) while VS2015 builds go crazy with up to 3x the render time.
It would help if someone could test on Linux with an RX480 to see whether GCC or the Linux driver handles this differently. As said before, the RX480 can render this scene. On Linux, the Nvidia drivers break part of the AMD driver, and I haven't found a way to have both drivers side by side yet.
commit b53e35c655d4 already has the bug, so it's not due to the buffer patch.
Nov 4 2017
Got some explanations on IRC, sorry, I didn't know the whole story.
The bug was already there on 24.08.2017, so my guess is that rBec8ae4d5e9f7 is the commit we're looking for.
Just to give an idea of how messy bisecting this is:
- CUDA completely disables OpenCL in the majority of revisions, so you have to rebuild without CUDA.
- Device selection changed, so the user preferences have to be modified depending on the revision you test, and bisecting requires going back and forth in time.
- Kernel compilation takes 1min50 for Victor.
- Scene preparation takes 2min04.
So it takes about 5 minutes of VS compilation, then manual tweaks to the user preferences, then 2 minutes of kernel compilation, then 2 renders at 2 (scene preparation) + 2 (render) = 8 minutes of rendering. That's a quarter of an hour per step, with 4 user interventions in between, during which you can't do much.
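Put as arithmetic: the per-step cost comes from the numbers above, and a binary search over N commits needs roughly ⌈log₂ N⌉ steps. The commit range N = 200 below is a hypothetical example, not the actual range between 2.79 and the bad revision.

```latex
% Per-step times from the comment above; N = 200 is a hypothetical commit range.
\begin{aligned}
t_{\text{step}} &= \underbrace{5}_{\text{VS build}} + \underbrace{2}_{\text{kernel}}
  + 2 \times \bigl(\underbrace{2}_{\text{scene prep}} + \underbrace{2}_{\text{render}}\bigr) = 15\ \text{min} \\
t_{\text{bisect}} &\approx \lceil \log_2 N \rceil \, t_{\text{step}},
  \qquad N = 200 \;\Rightarrow\; 8 \times 15 = 120\ \text{min}
\end{aligned}
```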
rBec8ae4d5e9f7 only added support for more than 4GB of textures, IIRC.
You don't need HBCC support. On Windows 7, even on Vega, there is no HBCC, and my RX480 has been rendering the full Victor scene for a year on Windows and for some months on Linux.
@Brecht Van Lommel (brecht) Is there a simple command to disable the context caching?
I could try to bisect, but @Mai Lavelle (maiself) should have better guesses of what could have introduced this bug. The scene preparation of Victor takes more than 2 minutes on my computer. With compile time on windows on top, shooting in the dark to bisect would take a lot of time.
The random render times also appear in buildbot builds (but not in 2.79); this patch just makes them even more obvious. Render times vary between 29 and 102 seconds for a small border render of Victor, so I reported the bug as T53249.
Note that in all the cases above, system memory is used, as the scene doesn't fit in the dedicated 8 GB. So it doesn't seem to come from how the driver splits allocations between dedicated and system memory.
Also, the GPU power limit was reduced to ensure no throttling happens, and frequencies were stable during all the tests.
Can someone else confirm that with victor scene, the second render is much slower with OpenCL?
Yes, I added a bevel shader on all the scenes I tested, to see whether we could remove the selective compilation. It would be OK for scenes that fit in memory, but slows the others down a lot.
Nov 3 2017
Some scenes like Barcelona are even slightly faster with bevel, as it seems to remove the slowdowns on simple scenes from D2249. Times in general with latest D2249 + D2803 are the same as with master.
However, victor is 2x slower with bevel and D2249 compared to D2249 alone and 23% slower compared to master.
Could register pressure be further reduced on this patch? It worked really well with bicubic texture filtering.
With D2249, render speed is the same with and without bevel on a Vega64 (I just forced it to be compiled with an unconnected node). I'm still testing Victor to see whether scenes that use system memory get slowdowns from it.
After testing several scenes from my library, I only got good speedups for those not fitting in memory, and slowdowns under 2% on smaller ones. So on the performance side, the benefit really outweighs the very small slowdowns, if any (many scenes were just as fast).
I had to hit Tab a lot (more than 10 times) after joining, and indeed got a crash on Windows 7 using latest master.
I do all my test with lowered voltage. Temperature never goes above 72°C (max set at 80°C) and frequency is stable. So the timings are also reproducible with differences under 1%.
Some further results:
|scene||slowdown on Vega64/win7|
Victor goes from 23 minutes to 17min40 to render with the patch. However, rendering a second time takes 43 minutes, so something goes wrong. I guess the speedup comes from the reduced memory usage, and something is not freed properly until Blender is closed.
Rendering after restarting Blender again gives 17min40 the first time and 43minutes the second time.
A border render at low resolution of victor takes 1min20 of pure render time with master vs 3min08 with P553 with crimson drivers 17.10.2 on win7 and Vega64.
I can also confirm the speed regression on Vega64/win7 with D2249. I expected the biggest speedups in heavy scenes like victor.blend, but render time went from 23 minutes to 48 minutes. I still have to test Brecht's version.
Oct 25 2017
This patch makes some scenes with volumes crash, like this one https://blenderartists.org/forum/showthread.php?439394-The-new-Cycles-GPU-2-79-Benchmark
Oct 21 2017
I had the same problem with 1080Ti and win7. It was on classroom scene, so it had nothing to do with principled shader. Killing the process, restarting Blender and relaunching the render was enough. No idea what happened as I was using the release UI without any debug info and I couldn't reproduce it since then. It happened just after opening the scene and hitting F12.
There is an AUR package to get the proprietary OpenCL on top of the Mesa driver: https://aur.archlinux.org/packages/opencl-amd
And if you know what you're doing, you can compile the AMD staging Linux kernel with ROCm and a custom LLVM. It's fully open source.
Oct 19 2017
@Leo (.Pixel) you are welcome. Please report your times with and without HBCC. Some websites report 15% speedup even when it all fits in the VRAM. Would be interesting to have your results to compare.
@Leo (.Pixel) The latest buildbot builds have the fix for this error. I just tried, and it renders without any tricks/simplifications.
Oct 12 2017
Viewport rendering of BMW from the official benchmark pack takes 12 seconds on a 1080Ti, 20 seconds on Vega64 and 16 seconds using both. With F12 rendering it's the opposite: Vega is faster with 82 seconds (at 128x128, best time), the 1080Ti takes 93 seconds (at 16x16, best time), and both together take 44 seconds using latest master with initial_num_samples at 5000.
To sum up:
- The viewport seems really slow in latest master with OpenCL: 2.78c with selective node compilation renders the viewport nearly 2x faster on Vega 64. It's not due to SSS or volumes, as those are not compiled into the viewport kernel either. I can investigate that.
- Multi-device rendering is slower in viewport/progressive rendering than the fastest device alone. Logically, we would only have to wait for the slower device to finish its half, which would be around 10 seconds for Vega?
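The expectation in the last bullet can be checked with the viewport numbers above (12 s on the 1080Ti, 20 s on Vega64 alone). With an even split, the slower device decides the finish time. With a split proportional to each device's speed, both devices finish together. This is a minimal sketch. Both helpers are hypothetical and assume perfectly divisible work with no per-device overhead.

```python
def even_split_time(times_alone):
    """Each of n devices renders 1/n of the work; the slowest one decides."""
    n = len(times_alone)
    return max(t / n for t in times_alone)

def proportional_split(times_alone):
    """Give each device a share proportional to its speed (1 / time alone).

    Returns (shares, finish_time); with these shares all devices finish together.
    """
    speeds = [1.0 / t for t in times_alone]
    total = sum(speeds)
    shares = [s / total for s in speeds]
    return shares, 1.0 / total
```

For [12, 20], the even split gives 10 s, matching the guess above. A speed-proportional split (62.5% / 37.5%) would give 7.5 s, compared with the 16 s measured with both devices.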
If you or someone at BF/BI has a direct contact at AMD, maybe the best thing would be to report it as a bug, if it works under Linux.
Oct 11 2017
The OpenCL kernel is about 20% slower on the BMW scene with this commit: from 1min33 to 1min50 on Vega64 using the latest driver on Windows 7.
Oct 10 2017
Is the 10-15% for the total rendering time of production scenes, or just for the intersection code?
From tests made in the UI, the CPU is indeed always the last one to finish. The more threads a CPU has, the higher the probability that the GPUs will idle, because those 16 or 32 tiles are already being rendered very slowly by the CPU. So if it's possible without too much work, I think it would be more effective in real scenarios to let all the CPU threads render one tile together, just like all the threads of the GPU render one tile. It may also improve cache behaviour and increase render speed. Of course, letting the GPU render several tiles at once would still be needed to ensure better occupancy.
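Letting all CPU threads cooperate on one tile can be sketched with a shared row counter: each thread grabs the next free row until the tile is done, so the CPU holds a single tile at a time instead of one tile per thread. This is a minimal sketch. `render_tile_with_all_threads` is hypothetical, and `shade(x, y)` stands in for path tracing one pixel. Python's GIL means it illustrates the scheduling only, not the speedup.

```python
import threading

def render_tile_with_all_threads(width, height, num_threads, shade):
    """All CPU threads cooperate on one tile, each grabbing the next free row.

    shade(x, y) is a hypothetical stand-in for path tracing one pixel.
    """
    tile = [[None] * width for _ in range(height)]
    next_row = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_row
        while True:
            with lock:  # claim the next unrendered row atomically
                row = next_row
                next_row += 1
            if row >= height:
                return
            for x in range(width):
                tile[row][x] = shade(x, row)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tile
```

With this scheme, the CPU is always busy with exactly one tile near the end of a render, so it can no longer hold 16 or 32 slow tiles that the GPUs have to wait for.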
Oct 8 2017
replacing line 1374 with: