Page MenuHome

Crashpad support
Needs ReviewPublic

Authored by LazyDodo (LazyDodo) on Aug 3 2018, 5:17 PM.

Details

Summary

This implements the client part of the crash reporting for windows, server wise I developed
with https://sentry.io since it's opensource and they offer both hosted and onsite installs.
(I also tested with https://backtrace.io which will work equally well but appear to be less
open)

It's a little rough around the edges but the basic functionality is there. Things of note:

  • Consent/privacy, it never uploads anything to the server without the users consent, there's a

dialog for every crash, minidumps are essentialy small memory dumps and may contain personal information.
sentry however only extracts the callstacks and removes the actual dump once done, so no actual
personal information will be stored on the server.

  • The crashpad api is different between windows and mac/linux due to wstrings being used for

windows for some reason, so the implementation has been seperated out in a _win.cpp file for
this reason, mac/linux may be able to share an implementation.

  • I attached a copy of the required libs they need to go into win64_vc14\crashpad

-crashpad is designed to have no UI what so ever, so after installing the crashpad handler,
I revert back to our handler and call crashpad only after getting consent from the end user.

-Symbols are only required on the server, so we do not have to ship a 300mb+ symbol file to end
users.

-I'm pretty sure there's some whitespace and other codestyle issues, which will have to be cleaned up
but are not the main focus for this review at this point in time.

-2.8 only, i don't see another 2.79 release coming and 2.8 is pretty crashy at this point in time
so it felt like a natural fit.

Testing

Setting it all up is a little tricky, but following these steps should work.

  1. unpack the attached libs to win64_64/crashpad
  2. Get an account on https://sentry.io, the free one will work, setting up our own server should work as well but will probably require a little larger investment of time.
  3. when you are logged in , get the minidump url from https://docs.sentry.io/clients/minidump/crashpad/
  4. run "win64_vc14/crashpad/bin/sentry-cli-Windows-x86_64 login" and authenticate , this will create a settings file in your profile folder so we don't have to include logins and passwords in the build scripts.
  5. Build with -DWITH_CRASHPAD=On -DCRASHPAD_URL=https://blahblah.whatever.minidump.url.they.gave.you
  6. Run the resulting blender.exe (do not run in the debugger, since the debugger will catch the exceptions, not the crashhandler)
  7. validate crashdumps are coming in on https://sentry.io/your-organization/your-appname/
  8. Remove the null deref I left in creator.c
  9. business as usual.

Possible improvements

-Not a super huge fan of the popup everytime, however users prefs may not have loaded at the time of the crash, UI team might have a thing or two to say here.
-Wording of the popup also could probably be improved.
-Sentry does support getting traces from python as well, could be interesting for the python devs?
-There's some integration available between sentry and phabricator, could be interesting to explore.

Disclaimer

I'm no export on sentry, I'm about 2 weeks of googling ahead of you guys.

Diff Detail

Repository
rB Blender
Branch
arcpatch-D3576_2
Build Status
Buildable 2198
Build 2198: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@Martin Felke (scorpion81) manually uploaded the last diff, which implicitly switched branches from 2.8 to master, given we're doing work in build_environment I decided to stick to master as well.

Added building of the dependencies to build_environment:

  • switched breakpad to a githash+zip instead of checking it out the latest from git directly.
  • reverted stray change in source/blender/blenkernel/intern/image.c by @Martin Felke (scorpion81)
  • Scripted breakpads dump_syms for windows in cmake, linux/mac seems different code one of the platform maintainers will need to take a look what needs to be done here.
  • Scripted crashpad in cmake, windows is tested, linux/mac might need a tweak or two.
  • Scripted mini_chromium dep for crashpad.cmake, windows is tested, linux/mac might need a tweak or two.

Removed sentry-cli from the svn libs, doesn't seem easily build-able, platform maintainer will just needs to obtain it him/her self have it in the system path

tested on win64/win32

updated binaries for testing on win64

LazyDodo (LazyDodo) edited the summary of this revision. (Show Details)Sep 11 2018, 7:37 PM
LazyDodo (LazyDodo) edited the summary of this revision. (Show Details)Sep 17 2018, 10:06 PM
Brecht Van Lommel (brecht) requested changes to this revision.Sep 18 2018, 9:05 AM

Removed sentry-cli from the svn libs, doesn't seem easily build-able, platform maintainer will just needs to obtain it him/her self have it in the system path

I'm not sure why sentry-cli not being easily buildable prevents if being added to the svn libs?

build_files/build_environment/CMakeLists.txt
95

Is breakpad needed for all platforms or just Linux?

build_files/build_environment/cmake/breakpad.cmake
42

This breakpad.diff is massive. Why is it needed, how was it generated? To me it seems like most of it could be autogenerated, if not we need at least a comment explaining why it's done this way.

build_files/build_environment/cmake/crashpad.cmake
47

Where did all these custom cmake files come from. Were they created manually for Blender or copied from somewhere?

build_files/build_environment/cmake/versions.cmake
301

Comment style:

# Latest as of 2018-09-09.
build_files/cmake/macros.cmake
401–402

This doesn't seem like the right place to call find_library(), should be in platform_apple.cmake along with the others. bsm and Security can be added to CRASHPAD_LIBRARIES then.

build_files/cmake/platform/platform_apple.cmake
382

Rename to CRASHPAD_ROOT_DIR for consistency.

392

This is redundant.

build_files/cmake/platform/platform_unix.cmake
60 ↗(On Diff #11827)

Rename to CRASHPAD_ROOT_DIR for consistency.

build_files/cmake/platform/platform_win32.cmake
134–135

Please add a comment explaining what this does.

213

Rename to CRASHPAD_ROOT_DIR for consistency.

source/creator/CMakeLists.txt
73

Use elseif.

73–76

Use elseif. Now apple compiles both creator_crashpad_apple.cpp and creator_breakpad_linux.cpp

77

Rename file to unix, as in principle this could also run on BSD for example.

452

Fix indentation.

source/creator/creator_breakpad_linux.cpp
47 ↗(On Diff #11827)

Don't indent everything for anynomous namespaces. It's not even needed since everything is declared static anyway?

55 ↗(On Diff #11827)

Remove debug code. If it's intended for users, use fprintf(stderr,

57 ↗(On Diff #11827)

Follow Blender comment style. for coments.

58 ↗(On Diff #11827)

Does this work on all Linux distros? Wayland?

112 ↗(On Diff #11827)

Use fprintf(stderr, .

113 ↗(On Diff #11827)

Follow Blender code style.

138 ↗(On Diff #11827)

Do we really need to catch all of these? SIGFPE and SIGTRAP seem wrong to me. Is this copied from breakpad documentation or so, which signals does e.g. chrome handle?

144 ↗(On Diff #11827)

Don't hardcode /tmp, use appropriate operating system functions to get a temporary file or directory.

158 ↗(On Diff #11827)

Follow Blender code style, put { on same line as for.

source/creator/creator_crashpad_apple.cpp
62 ↗(On Diff #11827)

Debug code to be removed.

93 ↗(On Diff #11827)

Code style: always use {}.

96–98 ↗(On Diff #11827)

Code style:

/* Long
 * comment
 * like this. */
109–112 ↗(On Diff #11827)

Simplify:

return rc;
120 ↗(On Diff #11827)

We must ask the user permission before sending data, as we do on Windows.

source/creator/creator_crashpad_win.cpp
131 ↗(On Diff #11522)

This extern "C" is not needed if the declaration moves to creator_intern.h.

47–54 ↗(On Diff #11827)

Can we deduplicate more of the code between platforms? For example the main difference between macOS and Windows seems to be strings, but that is easily abstracted:

#ifdef _WIN32

typedef std::wstring native_string;
static native_string to_native_string(const std::string& s)
{
    std::wstring temp(s.length(), L' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}

#else

typedef std::string native_string;
static native_string to_native_string(const std::string& s)
{
    return s;
}

#endif
50–51 ↗(On Diff #11827)

This seems wrong, I think it should use something like BLI_strncpy_wchar_from_utf8 to properly handle non-ascii characters.

57 ↗(On Diff #11827)

Code style, don't use camel case for function names. Though this code could be directly in crashpad_init() too, no need for a separate function.

66 ↗(On Diff #11827)

It's not ascii, but utf8.

Do we really need to write this to user datafiles, or is it more like a temporary file?

It seems like we are not removing this file after we've sent it, is there a reason to keep it around?

69–70 ↗(On Diff #11827)

Simplify two lines to one BLI_join_dirfile.

73 ↗(On Diff #11827)

Debug code to be removed.

76–77 ↗(On Diff #11827)

Add a comment to explain why these options are used?

140 ↗(On Diff #11827)

We need a way for users to disable this popup, maybe a user preference.

143 ↗(On Diff #11827)

Change this line to: "Blender has crashed"

source/creator/creator_intern.h
33

Comment style: start with a capital and end with a dot.

source/creator/creator_signals.c
85–91

Move this to creator_intern.h, don't use declarations in C files like this because they can get out of sync without a compile error.

86–87

Add some #ifdef so we don't declare functions that don't exist, we don't have both at the same time.

We could also rename the functions and files to crash_report or so, to abstract platform differences a bit more.

307–319

Simplify to:

if (app_state.signal.use_crash_handler) {
ifdef WITH_CRASHPAD
        crashpad_init();
#endif

#ifdef WIN32 
        SetUnhandledExceptionFilter(windows_exception_handler);
#else
        /* after parsing args */
        signal(SIGSEGV, sig_handle_crash);
#endif
    }
321

What are these conditions for, can we at least get a comment explaining why macOS is different than Windows and Linux?

This revision now requires changes to proceed.Sep 18 2018, 9:05 AM
LazyDodo (LazyDodo) edited the summary of this revision. (Show Details)

initial pass with low hanging fruit from brechts review.

LazyDodo (LazyDodo) commandeered this revision.Sep 19 2018, 8:58 PM

I'm not sure why sentry-cli not being easily buildable prevents if being added to the svn libs?

I'm not comfortable with executables from unknown origin in svn, everything else in svn

  1. I can vouch for was build from source, with the latest virus scanner and definitions
  2. Source is available for it, with buildscripts for facilitate an identical build in case someone has trust issues with 1.

Dropping in an executable from a 3rd party would mess up this good thing we have going.

now if this was a common binary (ie all devs would need it during a build) i would go out of my way to script it, however, sentry-cli will only be run by either

a) the platform devs while making an official release,
b) the buildbots.
c) a few advanced users that have their own blender distribution going (fracture modifier, maybe a studio specific build) , which will probably not report to the blender.org sentry server (there will need to be such a thing i suppose) and might not even use sentry.

Those people imho can manage to get this binary on their own.

sorry about all the 'done' but i forgot to commandeer so it won't give me the checkmarks and there were already too many comments to delete and start over.

build_files/build_environment/CMakeLists.txt
95

from what i got from chrome, they still use breakpad for linux , and crashpad on windows/mac. So in regard of the actual crash handler this is linux only. however, breakpad has a tool to prepare the symbol information for uploading called dump_syms which crashpad still needs.

build_files/build_environment/cmake/breakpad.cmake
42

@Martin Felke (scorpion81) this one is yours.

build_files/build_environment/cmake/crashpad.cmake
47

I breakpad/crashpad i manually scripted them from the gyp build files, its original build system is just not compatible with anything we are doing , minichromium is based on an existing script, it's origin is included in the cmakelists file

build_files/build_environment/cmake/versions.cmake
301

fixed

build_files/cmake/platform/platform_win32.cmake
134–135

Done.

213

Done.

source/creator/CMakeLists.txt
73

done.

73–76

done

452

done.

source/creator/creator_crashpad_win.cpp
73 ↗(On Diff #11827)

done

76–77 ↗(On Diff #11827)

done.

140 ↗(On Diff #11827)

agree, but user preferences might not be available yet in the case of a startup crash, i used this popup as a stop gap measure, we'll have to do better here, just unsure what yet.

143 ↗(On Diff #11827)

done.

source/creator/creator_intern.h
33

Done

LazyDodo (LazyDodo) marked 9 inline comments as done.Sep 19 2018, 9:01 PM
  • fix popup title.
LazyDodo (LazyDodo) marked an inline comment as done.Sep 19 2018, 9:12 PM

One alternative way of handling crash uploading could be to:

  1. Blender crashes
  2. Blender unconditionally stores a crash dump into a known location (without telling the user)
  3. Blender terminates (possibly without telling the user)
  4. User restarts Blender
  5. Blender sees the crash report from last run and does whatever its preferences say.

Another upside of this is that doing just about anything in the context of a crashed application can potentially be unreliable, who knows what memory has been overwritten and how basic function calls will even work in this state? Or threads and synchronization for that matter, what if something crashes while holding a lock?

This would not work for startup crashes obviously, so they would have to be special-cased.

Don't know how to handle this really, but one possibility is:

  1. Get this thing into production *without* reporting startup crashes.
  2. Handle startup crashes in a followup change.

This would still provide a lot of useful information earlier rather than later.

Replies for part of the reviewer's questions :)

build_files/build_environment/cmake/breakpad.cmake
42

This diff does two things:

  1. it fixed a symbol clash with the listed UTF functions between breakpad and llvm 3.4 (since I implemented this in the 2.79 FM branch too, this was important to me; since 2.79 is more or less abandoned it wont hurt there, and it wont affect FM 2.8 merging etc) and modifying the breakpad side was WAY easier here for me than messing around with llvm. Might be newer llvm doesnt have this static lib symbol clash anymore, but better safe than sorry.
  1. it includes linux_syscall_support.h file, which is NEEDED by breakpad but annoyingly not in the same git repo (nor submodule). Very annoying and brittle too to make an extra CMake external project step depending on that other repo. Thats why I copied the file to the directory it belongs

and voila it worked. Since using git in CMake as download target (and even "master" then as moving target) is bad practice (i learned) and things will be "frozen" to some version for the time being anyway, this seemed like a simple and good solution to me.

source/creator/creator_breakpad_linux.cpp
58 ↗(On Diff #11827)

Probably it wont work with wayland, found only this after quick googling...

https://github.com/Tarnyko/wlmessage but i doubt this has a bit of chance to be considered at all (additional dependency for just one popup etc)

So maybe, as @LazyDodo (LazyDodo) suggested only write the dump and as @Johan Walles (walles) suggested, rather send it the next time blender runs (via userpreference or something maybe)

Xmessage was here only used as last resort because more fancy stuff like SDL window / popup or even blender / ghost stuff didnt work.

138 ↗(On Diff #11827)

This is indeed copied from the breakpad ExceptionHandler class, which usually does some more or less working automatic crash handler setup. Here it rather did not work, so I was forced to set up signals manually, and just took their setup which signals to catch.

some more clarification

source/creator/creator_signals.c
307–319

breakpad_init() does pass the existing blender exception handlers as function pointers to breakpad basically.

This happens because this way they can be executed after the manually set up breakpad signal handling (via sigaction, not just via signal, since this is the less sloppy and more correct way to do it).

In practice, this means after writing (and optionally sending) the minidump the sigsegv handler in linux is still called to create the callstack in the terminal after blender terminates, or in case of sigabrt, just cleaning some temporary stuff in blender.

The existing exception handlers basically were just hooked up into the new breakpad handler routine, after the minidump part already ran.
So this imho cant be "simplified" away here.

  • Updated with feedback.
  • Updated to latest master
  • Added cmake option to seperate crashpad support from the uploading to sentry ( WITH_CRASHPAD_SENTRY_UPLOAD )
  • removed all platform specific UI this was problematic for so many reasons, better to get rid of it all together. There is now a single checkbox in userprefs->System to enable/disable error reporting, default is off. To catch any startup issues where the users is unable reach the userpreferences or when the user prefs are not yet loaded there is the enviroment variable BLENDER_CRASHREPORTING which when set to 1 will also send out a crash report. Since setting environment variables isn't that easy for some users this functionality can also be enabled with the --enable-crash-reporting command line parameter.
  • Removed breakpad support (sorry!), I just can't test/maintain it, mac is close-enough-(tm) but the breakpad integration for linux seems too big of a mess to cleanup without being able to test it.
  • Merged crashpad support for mac/windows into a single code file.

Todo:

Mac: On windows we run our crash handler first, then manually call crashpad_activate to activate
crashpad, which will do the userprefs + environment var checks before doing the actual dump + upload.
I'm unclear how exception handling on mac works so can't wire that up.

Mac: breakpads dump_syms for mac is not scripted in the build_environment scripts.

LazyDodo (LazyDodo) marked 10 inline comments as done.Oct 7 2018, 10:12 PM

Forgot one update, the build in crash moved to the debug menu, value of 1 now crashes blender (this will need to go before landing, but it's easy to have to test)

LazyDodo (LazyDodo) planned changes to this revision.Oct 8 2018, 8:41 AM
  • remove last bits of breakpad

I hope that Blender Foundation eventually does not rely on Sentry.Io or any other 3rd party entity when it comes this functionalty. This is all how it starts, first you want to know about the crashes, then you want to know what was leading to the crash, then how the user interacts with Blender, then you want everyone to have an account with Blender to use it etc etc. Not saying there are such intentions but feature creeps happen all the time. Also I had seen these exact same foot steps with all the apps I used in the past like Maya, Modo, Mudbox, Photoshop etc etc.

Personaly I would be more than happy to share my crash logs, if all this is handled by Blender foundation, on their own servers.

thanks

Which is one of the reasons i picked sentry unlike other solutions in this space, it's opensource and you can run it on your own servers.

Which is one of the reasons i picked sentry unlike other solutions in this space, it's opensource and you can run it on your own servers.

I appreciate the functionality and effort you are putting. We all want better Blender and effective developer engagement, anything that saves the Blender dev's time is cool. I was just stating my own perspective as a long time Blender user.

@LazyDodo (LazyDodo) there are still some not-responded-to comments from @Brecht Van Lommel (brecht), will you work on those? A "no" here is fine of course, I just want to know!

And for the record I'm not involved in Blender development in any way.

I do however think that the functionality that you are working on here @LazyDodo (LazyDodo) would be super valuable to the Blender project, I might actually be your biggest fan :)

brecht and i discussed this off the tracker, and it's been decided to put this functionality on hold until the initial release of 2.8 is out the door, setting up the server side takes a little more resources than we have available right now.

Thanks for the update @LazyDodo (LazyDodo)!

Regarding the server side, did you consider testing this out with the Sentry people's hosted solution?

Whether or not Sentry.io would be the long term solution, couldn't it be used as a low effort way to test this whole idea out in practice? As a way to get going it sounds much easier than setting up and operating something Blender specific.

I just looked it up and their cheapest setup is at $29/month for 100k crashes (https://sentry.io/pricing/).