
Master Broken, Cuda kernel compat errors compile, OpenCL GPU just sits there on one tile never rendering anything.
Closed, ResolvedPublic

Description

System Information
Win 10, AMD Fire Pro W9100

Blender Version
Master

The build fails with errors from the CUDA compat file, so I commented those lines out and got master to compile. Rendering on CPU works, but OpenCL GPU rendering just sits on one tile and never renders (I left it for 15 minutes on one 128x128 tile and got nothing; normally the entire scene would render in 4-5 minutes).

Compiled as Release for x64 on Windows 10 through VS.

Event Timeline

Sergey Sharybin (sergey) lowered the priority of this task from Needs Triage by Developer to Needs Information from User. Mar 8 2017, 6:56 PM

First of all, please make sure you're using the latest driver from the amd.com site. Sometimes Windows will override manually installed drivers, screwing things up, and we were testing Cycles on very recent drivers, since they might contain some crucial fixes.

As for the CUDA errors, it's not clear to me. What are the exact compat errors? Can you attach the log as a file? Inlining logs into comments makes reports difficult to follow.

Adding Mai and Hristo as subscribers. They might have clues / make some tests.

The file Kernel_compat_cuda.h was the problem. I don't have the error log, as I just commented out the offending lines and recompiled, but from memory the lines causing the issue were 57 to 105 (the #define lines are OK; they can be left in and it still compiles).

More important for me is that rendering with OpenCL on the GPU now just sits there and doesn't render anything, even though the console shows no shader compile error and the GPU is being used.

I'm using the VERY latest drivers from AMD's website; the Enterprise drivers are the most up to date.

Here's a video. It first shows an older build I've made for OpenCL GPU with SSS and volumes, denoising and a few other things not in master, with the samples dropped to 5 to make a fast example render.

Master Opecl Issue Video

Then I show loading today's master and rendering the same scene with the same 5 samples. It should render in a matter of seconds, but it just sits there rendering nothing even though the GPU fan is going mental. I left the same scene rendering for 15 minutes earlier and still got nothing (but clearly I'm not going to let the video sit there showing nothing happening for 15 minutes).

I did bring up the CUDA issue with Mai on the Blender dev forum not long back. Just like everything on that dev page, it got ignored. Code review on the Blender dev page NEEDS someone dedicated to checking .diff files and patches. Good updates just sit there for months going to waste, with no chance of getting into master unless either you or Brecht takes a look.

You can't keep treating code helpers who are not core devs like this and expect people to want to help you guys out.

This NEEDS immediate discussion between you core devs and the funding managers to get it resolved.

I am not sure what the "blender dev forum" is. We can only address bugs reported here in the bug tracker. If there's something questionable or something which needs a developer's attention, then the proper way to raise it is to inform us either on the mailing list or in IRC. Those are the only places we are constantly looking at, and we don't have the time or resources to check more places for possible user issues.

We surely do extensive tests of all patches which go to master, but we can't cover all possible compiler/OS/driver/hardware combinations. That being said, this chart was made on hardware here in the studio just before the OpenCL optimization went to master. Additionally, we have been keeping track of performance for months now, and all the improvements are documented in this spreadsheet. It doesn't seem we've put something untested into master. But surely you will always run into some unpredictable cases with various OS/driver/hardware configurations.

Just to stress: the WX7100 is similar hardware to the W9100. I just re-tested the latest build from builder.blender.org on Windows (it's Windows 7, though) using the 17.Q1 driver, and the render time of the files from our benchmark bundle is similar to what I get on Linux. So there is obviously something particular to your exact SW/HW combination.

Things to test:

  • Get the latest build from builder.blender.org and see if the issue happens with builds which are done in a controlled environment here.
  • Re-install the AMD driver, even if you think the latest driver is installed. Windows 10 is known for re-installing drivers using ones from Windows Update. This is already causing huge issues with CUDA (CUDA rendering suddenly stops working, and re-installing the driver helps), and it's not something we can possibly control from the Blender side.
  • Test if the issue happens with files from our benchmark bundle (there might be something particular to your file). You can find the bundle here.

P.S. We don't have managers; we've got a handful of developers.

@Sergey Sharybin (sergey), don't get me wrong, you guys are doing a great job with limited resources. I'm not having a go at you guys as such.

I understand you heavily test code that you're adding to master. What I meant was all the .diff files and patches that are sitting here on developer.blender.org.

There are some really nice features, like new procedural noise nodes, vector displacement, a compositor SMAA node, SSS and volumes for OpenCL, and a 30-50% speed-up patch for OpenCL, but they've been sitting here for so long that, especially now that the new split kernel has been added, they will be useless without a complete rewrite.

Maybe you devs who have contact with the Blender team that sponsors the code devs could have a word about next hiring someone whose main job is to test, as soon as possible, all the new .diff files and patches from non-core devs that get posted here. That would get them into a good enough working state to pass to guys like you and Brecht to push to master quicker, without you core devs, who have no time, getting swamped.

I don't see how reinstalling my drivers will make any difference. All my older branches of Blender using OpenCL on GPU still work, and so does my custom AMD FireRays OpenCL GPU renderer. Only current master, after the split kernel, is broken. But I'll try uninstalling the drivers completely and reinstalling, just to help test.

Keep up the good work. We do appreciate how hard you guys try.

OpenCL drivers are VERY fragile, especially the ones from AMD. We've changed the way we communicate with the split kernels running on the GPU in order to reduce latency, improve occupancy and things like that. Comparing the new code to the old one is not fair at all. Additionally, there was a known bug related to reading state buffers which was causing infinite loops. So, just to eliminate the possibility of buggy driver interference, reinstall it. It's really not that hard and will reduce the number of variables to be checked here.

We can only fix issues which we can reproduce, and so far a similar setup works just fine here. A freshly installed driver might make a huge difference.

@Sergey Sharybin (sergey), I uninstalled all AMD software and reinstalled the latest W9100 Enterprise drivers from AMD, and it still does the same thing: it just sits there on the first tile, never rendering anything.

Any debug command you want me to use?

Does it do this on all files, or just a specific .blend? (If so, please attach a repro case to this post.) Can you reproduce it with a nightly build from builder.blender.org?

@James W E Bird (3dLuver) Please also share the git hash (first 8 letters) of the master which you use.
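
One way to read that hash from a source checkout is sketched below in Python. It assumes git is on the PATH and that repo_dir points at the Blender source directory; the helper name short_git_hash is made up for this example.

```python
import subprocess


def short_git_hash(repo_dir=".", length=8):
    """Return the abbreviated hash of HEAD in repo_dir, or None if git fails.

    Hypothetical helper: it simply wraps `git rev-parse --short=<length> HEAD`.
    """
    try:
        completed = subprocess.run(
            ["git", "rev-parse", f"--short={length}", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, repo_dir does not exist, or it is not a repository.
        return None
    return completed.stdout.strip()


# Example (path is a placeholder for your own checkout):
# print(short_git_hash("path/to/blender"))
```

Note that git may print more than the requested number of characters if the shorter prefix would be ambiguous.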

I've just done a git pull from today's master and I'm compiling now. When it's done I'll test again and post the info.

It does this on all scenes with yesterday's master. We'll see what happens with today's once it's compiled. Cheers for the help, guys.

Just compiled today's master and it still does the same: it just hangs on the first tile.

Hash: 9de9f25

Hate to sound like a broken record, but:

  1. can you try with the latest from builder.blender.org
  2. is this with all files, if not please attach the problematic file to this report

Hahaha, no worries LazyDodo, I'm here to try to help, like you're trying to help me.

I downloaded blender-2.78-9de9f25-win64.zip, the latest official build for win64, today from the builder page (not even the VS2015 experimental version), and it does exactly the same thing: it just sits there hanging on the first tile.

This is with all scenes I've tried.

I just tried with the default scene, a plane and a cube, and it does the same: it just never renders anything.

Please run blender with the --debug-cycles option and attach the full output after the tile gets stuck. Also run clinfo and attach the output of that as well.
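
A minimal sketch of collecting both outputs as files, so they can be attached rather than pasted inline. The helper name capture_to_file and the log file names are made up for illustration, and it assumes blender and clinfo can be found on the PATH (otherwise pass full paths). Only the --debug-cycles option and the clinfo tool come from the request above.

```python
import subprocess


def capture_to_file(cmd, log_path):
    """Run cmd and write its stdout and stderr, merged, to log_path.

    Returns the process exit code, or None if the executable was not found.
    Hypothetical helper, not part of Blender.
    """
    with open(log_path, "wb") as log:
        try:
            completed = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
        except FileNotFoundError:
            log.write(f"executable not found: {cmd[0]}\n".encode())
            return None
    return completed.returncode


# Example usage (blocks until Blender is closed, so start the render,
# wait for the tile to get stuck, then quit Blender):
# capture_to_file(["blender", "--debug-cycles"], "blender_debug_cycles.txt")
# capture_to_file(["clinfo"], "clinfo.txt")
```

Merging stderr into stdout keeps the messages in the order they were printed, which makes it easier to see the last thing logged before the hang.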

Hi Mai, Thanks for the help.

The debug info is everything up until the tile renders. There is no other debug info shown once the tile is in the locked state.

Hope this helps.

P.S. I noticed only 12 GB of my 16 GB GDDR5 is being reported. I tried an old method with environment variables to force 100% memory use, but it doesn't seem to have worked. Do you know the exact variables I need to add to the environment to force use of the full 16 GB?

Mai Lavelle (maiself) claimed this task. Edited Mar 10 2017, 11:44 PM
Mai Lavelle (maiself) raised the priority of this task from Needs Information from User to Normal.

@James W E Bird (3dLuver), thanks for the logs, very interesting results. Will try to get this fixed quickly.

Also your clinfo looks normal, no need to change environment, the full memory is available to you.