- User Since
- Jun 2 2013, 4:47 PM (233 w, 2 d)
Sun, Nov 19
659ba012b0f30450c6de13f8b1c2fccce32fc461 renders correctly.
Tue, Nov 14
yeah that's annoying, sometimes it's even worse. You have to save/open/save/open/etc. until the whole chain of dependencies has been processed. Say you delete obj1, which had material1, which had tex1. You have to save and reopen to really get a file without obj1, which leaves material1 with 0 users, then you must save/reopen to get tex1 down to 0 users, and then save/reopen again to really have that texture removed as well. When you add linked groups to that story, it's sometimes really time consuming and cumbersome to get a clean file.
An option "clean all unused datablocks and save" would really help.
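The save/open dance above is essentially manual reference-count garbage collection, one dependency level per cycle. Here is a minimal sketch in plain Python (my own model with hypothetical names, not Blender's bpy API) of why each level of the obj1 → material1 → tex1 chain needs its own pass, and how a "clean all unused datablocks" option would collapse them into one operation:

```python
# Hypothetical model of datablock user counts (NOT the real bpy API).
# deps maps each datablock to the datablocks it uses.
deps = {"obj1": ["material1"], "material1": ["tex1"], "tex1": []}
users = {"obj1": 0, "material1": 1, "tex1": 1}  # obj1 was just deleted

def purge_pass(deps, users):
    """One save/open cycle: drop blocks with 0 users, decrement what they used."""
    removed = [b for b, n in users.items() if n == 0]
    for b in removed:
        for d in deps[b]:
            users[d] -= 1
        del users[b], deps[b]
    return removed

def purge_all(deps, users):
    """'Clean all unused datablocks': repeat passes until nothing is removed."""
    total = []
    while (r := purge_pass(deps, users)):
        total.extend(r)
    return total

print(purge_all(deps, users))  # -> ['obj1', 'material1', 'tex1']
```

Each level of the chain only reaches 0 users in the pass after its user was removed, which is exactly why one save/open per level is needed today.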
Sun, Nov 12
I rechecked with VS2013 builds. The system memory usage varies a bit (max 500MB compared to many GB with 2015) and the performance also is more stable (max 35% variation during 10 renders).
Could someone confirm those behaviours on Windows and test on Linux?
Mon, Nov 6
and that's the log with latest buildbot:
@Brecht Van Lommel (brecht) thanks for the patch. Latest master with it (I had to apply manually as it seems it was done on a branch?) gives this log on 3 consecutive renders:
Yes, I used another version to get the free memory reported and tried to see if limiting global size to make it all fit in memory would solve the problem, but it didn't. I can redo the log with vanilla master if you want. Here is the code:
@LazyDodo (LazyDodo) the GPU-Z log is wrong somehow; it ignores half of the memory. But it gives the impression that no memory leak happens on the GPU.
Sun, Nov 5
actually, 2.79 has the bug too; only the official build had the device selection bug and took the 1080Ti instead, which doesn't use system memory.
So it may be a driver bug, but then why is the first render always 30 seconds?
After some renders, I got up to 114 seconds to render = nearly 3x slower... At that point however, the GPU was idling a lot, maybe waiting all the time for system memory access?
Here is a picture of the task manager with 2 consecutive renders on the same instance of Blender.
It may be a coincidence, but VS2013 builds had only +/-10% between first and consecutive renders (I made 5 of them), while VS2015 builds go crazy with up to 3x the render time.
It would help if someone could test on Linux with an RX480 to see if GCC or the Linux driver handles this differently. As said before, the RX480 can render this scene. On Linux, the Nvidia driver breaks part of the AMD driver and I couldn't find a way to have both drivers side by side yet.
commit b53e35c655d4 already has the bug, so it's not due to the buffer patch.
Sat, Nov 4
got some explanations on IRC, sorry I didn't know the whole story.
the bug was already there on 24.08.2017, so my guess is that rBec8ae4d5e9f7 is the commit we are looking for.
just to give an idea of the mess to bisect:
- cuda completely disables opencl in the majority of revisions, so you have to rebuild without cuda
- device selection changed, so the user preferences have to be modified depending on the revision you test, and bisecting requires going back and forth in time.
- kernel compilation takes 1min50 for victor
- scene preparation takes 2min04
so each revision takes about 5 minutes of VS compile, then manual user preference tweaks, then a 2-minute kernel compile, then 2 renders at 2 (scene prep) + 2 (render) = 8 minutes of rendering. That's a quarter of an hour with 4 user interventions in between, during which you can't do much else.
rBec8ae4d5e9f7 only added support for more than 4GB of textures iirc.
You don't need HBCC support. On win7, even on Vega, there is no HBCC, and my RX480 has also rendered the full victor scene for a year on Windows and for some months on Linux.
@Brecht Van Lommel (brecht) Is there a simple command to disable the context caching?
I could try to bisect, but @Mai Lavelle (maiself) should have better guesses of what could have introduced this bug. The scene preparation of Victor takes more than 2 minutes on my computer. With compile time on windows on top, shooting in the dark to bisect would take a lot of time.
The random render times are also in buildbots (but not in 2.79); this patch just makes it even more obvious. Render times vary between 29 and 102 seconds for a small border render of victor. So I reported the bug as T53249.
Note that in all the above cases, system memory is used, as the scene doesn't fit in the dedicated 8GB of memory. So it doesn't seem to come from how the driver allocates the memory between dedicated and system memory.
Can someone else confirm that with victor scene, the second render is much slower with OpenCL?
Yes, I added a bevel shader to test if we could remove the selective compilation on all the scenes I tested. It would be ok for scenes fitting in memory, but it slows down a lot on others.
Fri, Nov 3
Some scenes like Barcelona are even slightly faster with bevel, as it seems to remove the slowdowns on simple scenes from D2249. The times in general with latest D2249 + D2803 are the same as with master.
However, victor is 2x slower with bevel and D2249 compared to D2249 alone and 23% slower compared to master.
Could register pressure be further reduced on this patch? It worked really well with bicubic texture filtering.
with D2249, render speed is the same with and without bevel on a Vega64 (just forced to be compiled with an unconnected node). I'm still testing Victor to see if scenes that use system memory get slowdowns from it.
After testing on several scenes in my library, I only got good speedups for those not fitting in memory and slowdowns under 2% on smaller ones. So on the performance side, the benefit really outweighs the very small slowdowns, if any (many scenes were as fast).
I had to hit tab a lot (spam it more than 10 times) after joining and got a crash indeed.
I do all my test with lowered voltage. Temperature never goes above 72°C (max set at 80°C) and frequency is stable. So the timings are also reproducible with differences under 1%.
Some further results:
| scene | slowdown on Vega64/win7 |
Victor goes from 23minutes to 17min40 to render with the patch. However, rendering a second time takes 43minutes, so something goes wrong. I guess the speedup comes from the reduced memory usage and something is not freed properly until Blender is closed.
Rendering after restarting Blender again gives 17min40 the first time and 43minutes the second time.
A border render at low resolution of victor takes 1min20 of pure render time with master vs 3min08 with P553 with crimson drivers 17.10.2 on win7 and Vega64.
I can also confirm the speed regression on Vega64/win7 with D2249. Where I expected speedups the most was in heavy scenes like victor.blend, but render time went from 23 minutes to 48 minutes. I still have to test Brecht's version.
Wed, Oct 25
this patch makes some scenes with volumes crash, like this one: https://blenderartists.org/forum/showthread.php?439394-The-new-Cycles-GPU-2-79-Benchmark
Oct 21 2017
I had the same problem with 1080Ti and win7. It was on classroom scene, so it had nothing to do with principled shader. Killing the process, restarting Blender and relaunching the render was enough. No idea what happened as I was using the release UI without any debug info and I couldn't reproduce it since then. It happened just after opening the scene and hitting F12.
there is an AUR package to get the proprietary OpenCL on top of the Mesa driver: https://aur.archlinux.org/packages/opencl-amd
and if you know what you're doing, you can compile the amd-staging Linux kernel with ROCm and a custom LLVM. It's fully open source.
Oct 19 2017
@Leo (.Pixel) you are welcome. Please report your times with and without HBCC. Some websites report 15% speedup even when it all fits in the VRAM. Would be interesting to have your results to compare.
@Leo (.Pixel) the latest buildbot builds have the fix for this error. I just tried, and it renders without any tricks/simplification.
Oct 12 2017
viewport rendering of BMW from the official benchmark pack takes 12 seconds on the 1080Ti, 20 seconds on the Vega64 and 16 seconds using both. With F12 render, it's the opposite: Vega is faster with 82 sec (at 128x128, best time), the 1080Ti takes 93 seconds (at 16x16, best time) and both together take 44 seconds, using latest master with initial_num_samples at 5000.
To sum up:
- viewport seems really slow in latest master with OpenCL. 2.78c with selective node compilation for viewport renders nearly 2x faster on Vega64. It's not due to SSS or volume, as those are not compiled in the viewport kernel either. I can investigate that.
- multi-device rendering is slower with viewport/progressive rendering than the fastest device alone. Logically it would only have to wait for the slowest half to finish, which would be around 10 seconds for Vega?
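A quick back-of-the-envelope model (my own sketch, not Cycles code) puts numbers on the second point, using the viewport BMW timings above (1080Ti: 12 s, Vega64: 20 s alone):

```python
# Expected multi-device viewport render time for two devices that take
# 12 s (1080Ti) and 20 s (Vega64) to render the whole frame alone.

def equal_split_time(times):
    """Each device renders an equal share; we wait for the slowest one."""
    n = len(times)
    return max(t / n for t in times)

def proportional_split_time(times):
    """Work split proportionally to device speed (ideal load balancing)."""
    return 1.0 / sum(1.0 / t for t in times)

times = [12.0, 20.0]
print(equal_split_time(times))         # 10.0 s, the "slowest half" estimate
print(proportional_split_time(times))  # ~7.5 s, the ideal lower bound
```

Both estimates are well below the 16 s actually measured with both devices, so the multi-device overhead is not just waiting on the slower card.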
if you or someone at BF/BI have a direct contact with AMD, maybe the best would be to report it as a bug if it works under Linux.
Oct 11 2017
The OpenCL kernel is about 20% slower on the BMW scene with this commit: from 1min33 to 1min50 on Vega64 using the latest driver on win7.
Oct 10 2017
are the 10-15% for the total rendering time of production scenes, or just for the intersection code?
from tests made in the UI, the cpu indeed is always the last one to finish. The more threads a cpu has, the higher the probability that the GPUs will idle, because those 16 or 32 tiles are already being rendered very slowly by the cpu. So if it's possible without too much work, I would say it would be more effective in real scenarios to let all the CPU threads render one tile together, just like all the threads of the GPU render one tile. It may also improve cache behaviour and increase render speed. Of course, letting the GPU render several tiles would still be needed to ensure better occupancy.
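A toy scheduler simulation (my own sketch, purely illustrative, not Cycles' actual scheduler, with made-up speeds) shows the effect: with one tile per CPU thread, the frame waits on the slow CPU tiles while the GPU sits idle; letting all CPU threads cooperate on one tile shortens that tail:

```python
import heapq

def makespan(num_tiles, worker_speeds):
    """Greedy scheduler: the earliest-free worker grabs the next tile
    (each tile is 1 unit of work). Returns when the last tile finishes."""
    free = [(0.0, i) for i in range(len(worker_speeds))]
    heapq.heapify(free)
    finish = 0.0
    for _ in range(num_tiles):
        t, i = heapq.heappop(free)
        t += 1.0 / worker_speeds[i]          # time this worker needs per tile
        finish = max(finish, t)
        heapq.heappush(free, (t, i))
    return finish

# Hypothetical speeds in tiles/s: one fast GPU plus 16 slow CPU threads.
per_thread_tiles = makespan(32, [20.0] + [0.5] * 16)  # one tile per CPU thread
shared_cpu_tile = makespan(32, [20.0, 8.0])           # 16 threads share a tile

print(per_thread_tiles, shared_cpu_tile)
# The per-thread variant is dominated by the 2 s CPU tiles grabbed at the
# start, while the GPU finishes its share early and then idles.
```

With these numbers the per-thread run ends at the 2 s CPU tile boundary, while the cooperative CPU finishes well under that, matching the intuition above.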
Oct 8 2017
replacing line 1374 with:
It seems the multi-device patch works when the CPU is not selected. Note that in this case, selecting the cpu in either the CUDA or the OpenCL tab of the user preferences activates it in the other one too, so maybe it conflicts somehow?
indeed, it works now :) thanks for the solution, I think it's safe for master. Here are the diffs we spoke of on IRC:
To allow OpenCL to be selected when CUDA is present:
to render all samples at once on OpenCL to limit update overhead, but the bottleneck is somewhere else when rendering small tiles:
And the patch you proposed to render with CUDA, OpenCL and CPU together, but only the init phase works.
full is made with "make.bat full" = no cubins
release is made with "make.bat release" = with cubins
Oct 7 2017
Oct 3 2017
Oct 1 2017
If the same behavior could be obtained on OpenCL, the next benchmark pack could have only one version with 32x32 for CPU, CUDA and OpenCL. That would also spare the end user from playing with tile sizes. On OpenCL currently, only tile sizes between 64 and 256 are about the same speed; 32 is much slower, and so is a single tile (by a two-digit percentage). It would also maybe allow later having a tile server that renders with the CPU code path on CPUs, the CUDA code path on NVidia and the OpenCL code path on AMD, all on one frame, using one tile size.
Sep 23 2017
The boolean in question was applied many months ago already in the project files. Trying to recreate the same object with a modifier, the bug doesn't show up.
Sep 22 2017
Sep 20 2017
After testing many versions: the bug is already present in 2.74.
@Campbell Barton (campbellbarton) I'm working on a file for the boolean part. If the Collada and FBX exporters manage to export the geometry correctly, even with 0-area faces, why not make the OBJ exporter also work in such cases, like the other 2?
Sep 19 2017
it's the result of a boolean modifier on a cube. The bug happens with the boolean modifier too (so a simple cube with cubes cutting it). I applied everything and removed some parts for simplification purposes.
just tested, the Collada export doesn't have this bug
Thanks for the quick fix, but maybe only one of the fixes would be enough? I don't know whether the 2 fixes together might decrease performance?
The thing is that it looks right in viewport before applying. If the bad geometry is the "right" one, then it should also look bad before applying. A modifier should look the same applied or not.
Sep 18 2017
Sep 16 2017
@rm beer (rmbeer) couldn't reproduce either with the 2 files. Maybe get a fresh buildbot and delete your config folder?
Sep 10 2017
it seems to only happen with AO approximation activated. I'll have a more in-depth look tomorrow.
Sep 9 2017
Using only one function, performance stays the same.
It looked like way too much, so I tested again. The time differences were due to a throttling issue. With throttling removed, both versions render at the same speed. I'll update the diff with the new one.
With one function, the scandinavian scene from Chocofur renders in 471 sec against 426 s with the current version. So the current version renders 10.6% more frames in the same time.
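For reference, the 10.6% figure follows directly from the two timings, since throughput scales as the inverse of render time:

```python
# 426 s with the current version vs 471 s with the one-function diff.
current_throughput = 1 / 426   # frames per second, current version
one_func_throughput = 1 / 471  # frames per second, one-function diff
gain = current_throughput / one_func_throughput - 1  # = 471/426 - 1
print(f"{gain:.1%}")  # -> 10.6%
```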
Here is the diff to test:
So I think the best solution is to keep the patch as it is, but to follow the naming convention of the existing workarounds like lcg_state_init_addrspace, i.e. adding _addrspace instead of _ocl. @Brecht Van Lommel (brecht) What do you think?
I can confirm 17.9.1 still crashes on Vega and RX 480 with another machine and a clean install. It seems the OpenCL driver somehow wasn't updated with a normal install. So in the meantime, staying on 17.7.1 is recommended.
@Lennard Haaks (lennardh) both your file and my file rendered on the latest buildbot using 17.9.1 on the RX480. I did a normal install coming from 17.7.1, but the OpenCL driver should have been updated. I'll try today on Vega to check with a clean install.
Sep 8 2017
driver 17.9.1 resolves the bug.
Sep 7 2017
RC1 doesn't work; a build from 21 July does, with 17.8.2
Driver 17.7.1 works, 17.8.2 doesn't, for both files
Sep 6 2017
I think Aaron means making them only visible in top view or camera view, for example. That's indeed very useful. And if it works for empties, it would be great to have it work for all objects. I would add a kind of sprite rendering like in old games, where the image always faces the view, but can also be constrained to only rotate around Z, for example. The empty would then have a list of views ([always show, camera, front, back, top, bottom]) where it is displayed or not, plus the constraints. Blender's current constraints don't work if you have multiple 3D viewports with different views, but with OpenGL it's ok.
The X-Ray option to draw behind or in front should be enough to have then all functionalities with a much better usability.
Sep 3 2017
Updated the diff with the required changes. As Nvidia now also supports OCL 2.0, I suppose it will make sense to switch to it for 2.8x. So I made 2 functions to work around the address space problem of OCL 1.2. When we switch to 2.0, removing the second function will be easy, and until then, the code change is bundled in one place. If you prefer another workaround, I'm open to it.
I agree to deduplicate, although doing it as part of https://developer.blender.org/D2644 would maybe make more sense?
Aug 31 2017
I remember you already had a patch to make the optimization before the kernel compile options are set/gathered. Could you share it in the tracker?
Aug 30 2017
Aug 22 2017
on simple scenes where the global size stays at or above 1024 on x, the speed impact is negligible. The default cube is even faster with 4spp bevel than without ... 14.05 sec with and 15.36 sec without. If you fill the camera FOV with cubes at 4spp, render time goes from 33 sec without to 40 sec with = about 21% slower. With 16spp, it goes up to 56 sec = 70% slower
Welcome back Brecht :)
Regarding performance on split kernel, using a RX VEGA 64 on power save mode (to prevent throttling and get stable results):
- Applying the latest revision on latest master and adding an invisible cube with bevel at 4 samples, just to get the code compiled in the Barcelona scene, makes the global size fall from 1024 to 576 on x.
- Render time increases from 430 seconds to 478 seconds (without synchronization time)
Aug 13 2017
note that the edgeloop pairs are made correctly, so the same loops are connected; the bug happens at the vertex level somehow.
Aug 12 2017
Very helpful for review. It's a pleasure to see a student with such long-term commitment.
Jul 26 2017
@Sergey Sharybin (sergey) T51975 shows variants of the bug that pop up when this carve object is in the scene. Sometimes part of the mesh (the box containing the vegetables) was missing, sometimes the textures were pink, sometimes the shader turned white. Most of the time hitting F12 would segfault. Sometimes it succeeded, but the segfault appeared just after the render finished (I couldn't even save the file).
@Fable Fox (fablefox) debug builds indeed are working. Of course, replacing carve with bmesh also makes the file stable again.
To reproduce the bugs and crashes more easily, you have to:
- open with VS2013 buildbots
- append everything in the attached file into a very complex scene (there must be something to corrupt in memory, so many shaders, lots of meshes, etc. make it crash more often)
- duplicate what you appended in place (the location may be important to trigger the bug) to make the probability of a crash higher (the overflows are more likely to produce a crash if many parts of memory are overwritten/corrupted, so duplicate it a hundred times if it doesn't crash)
- then try to render, undo, etc...
This part of the scene was modeled in November last year and hasn't been touched since then. The bugs and crashes happened more often as scene complexity increased.
Note that the crash on file open only happens with the VS2015 release, but the instability is in the VS2013 = official release too. It just happens randomly during a material change, an undo, a final render start, or whatever touches the part of memory that was corrupted.