
Add support for Zstd compression for .blend files
Needs Revision · Public

Authored by Lukas Stockner (lukasstockner97) on Sep 15 2019, 5:10 AM.

Details

Summary

Compressing blendfiles can help save a lot of disk space, but the slowdown while loading and saving is a major annoyance.
Currently Blender uses Zlib (aka gzip aka Deflate) for compression, but there are now several more modern algorithms that outperform it in every way.

In this patch, I decided on Zstandard (aka Zstd) for several reasons:

  • It is widely supported, both by other programs and libraries and by general-purpose compression utilities on Unix.
  • It is extremely flexible, spanning several orders of magnitude of compression speed depending on the level setting.
  • It is pretty much on the Pareto frontier for all of its configurations (meaning that no other algorithm is both faster and more efficient).

One downside of course is that older versions of Blender will not be able to read these files, but one can always just re-save them without compression or run the file through unzstd.
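For context, here is a minimal sketch (not the patch's actual code) of what the write side can look like with libzstd's streaming API. `write_zstd` is a hypothetical helper, buffer handling and error paths are simplified, and real integration would of course go through Blender's existing file-writing code instead of raw stdio:

```c
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

/* Hypothetical helper: compress `len` bytes from `src` into `fp` as a single
 * zstd frame. Buffer handling and error reporting are simplified. */
static int write_zstd(FILE *fp, const void *src, size_t len, int level)
{
  ZSTD_CCtx *cctx = ZSTD_createCCtx();
  if (cctx == NULL) {
    return -1;
  }
  /* This is where a user-facing "Compression Level" setting would plug in. */
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);

  const size_t out_cap = ZSTD_CStreamOutSize();
  void *out_buf = malloc(out_cap);

  ZSTD_inBuffer in = {src, len, 0};
  size_t remaining = 0;
  do {
    ZSTD_outBuffer out = {out_buf, out_cap, 0};
    /* ZSTD_e_end flushes and finishes the frame; the return value is how much
     * is still pending, 0 once the frame is complete. */
    remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
    if (ZSTD_isError(remaining)) {
      break;
    }
    fwrite(out_buf, 1, out.pos, fp);
  } while (remaining != 0);

  free(out_buf);
  ZSTD_freeCCtx(cctx);
  return ZSTD_isError(remaining) ? -1 : 0;
}
```

Because the whole file goes out as one ordinary zstd frame, the result stays readable by the standalone zstd/unzstd tools, which is what the re-save workaround above relies on.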

ToDos:

  • Versioning when loading older files: currently the Compression Method setting is set to None when loading a gzipped file from an unpatched Blender (the reverse direction works). A possible magic-number check is sketched right after this list.
  • Compression Level setting: I don't know if we want to expose this, but I can see uses (e.g. turning up the compression before sending the file to someone). As a workaround, you can run the uncompressed blendfile through the standalone zstd at any setting and it will load.
  • MAYBE: other algorithms? Something like LZ4 might even be interesting for in-memory undo.
  • Most of the build_files changes are untested; it builds on my Arch machine, but that's all I can say so far.
  • CMake scripts for manually/statically building libzstd are missing.
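Regarding the versioning point above: the loader can tell the three cases apart from the first bytes of the file, since an uncompressed .blend starts with the ASCII magic "BLENDER", a gzip stream starts with 0x1F 0x8B, and a zstd frame starts with the magic number 0xFD2FB528 stored little-endian. A rough sketch of such a check (function and enum names are made up for illustration, not the patch's code):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef enum {
  BLEND_COMPRESS_NONE,
  BLEND_COMPRESS_GZIP,
  BLEND_COMPRESS_ZSTD,
  BLEND_COMPRESS_UNKNOWN,
} BlendCompressType;

/* Hypothetical helper: peek at the first bytes of a file to decide how (or
 * whether) it is compressed. The real loader would do this on the file handle
 * it has already opened. */
static BlendCompressType blend_detect_compression(const char *path)
{
  unsigned char header[8] = {0};
  FILE *fp = fopen(path, "rb");
  if (fp == NULL) {
    return BLEND_COMPRESS_UNKNOWN;
  }
  size_t n = fread(header, 1, sizeof(header), fp);
  fclose(fp);

  if (n >= 7 && memcmp(header, "BLENDER", 7) == 0) {
    return BLEND_COMPRESS_NONE; /* plain .blend magic */
  }
  if (n >= 2 && header[0] == 0x1F && header[1] == 0x8B) {
    return BLEND_COMPRESS_GZIP; /* gzip magic */
  }
  if (n >= 4) {
    uint32_t magic = (uint32_t)header[0] | ((uint32_t)header[1] << 8) |
                     ((uint32_t)header[2] << 16) | ((uint32_t)header[3] << 24);
    if (magic == 0xFD2FB528u) {
      return BLEND_COMPRESS_ZSTD; /* zstd frame magic (little-endian) */
    }
  }
  return BLEND_COMPRESS_UNKNOWN;
}
```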

Benchmarks

Some benchmarks on a random massive file I had lying around:

Method | Size on Disk | Time to load | Time to save
------ | ------------ | ------------ | ------------
None   | 863 MB       | 1.449 sec    | 0.333 sec
Zlib   | 476 MB       | 3.426 sec    | 14.46 sec
Zstd   | 455 MB       | 1.505 sec    | 1.630 sec

All I/O is to/from ramdisk. The time to load also includes a few linked files iirc, so just look at the differences instead of the absolute times.

To validate that the implementation isn't bottlenecked somewhere, I ran the standalone zstd on the uncompressed file; it took 1.365 sec, which matches the difference measured above almost exactly.

Diff Detail

Repository
rB Blender
Branch
zstd (branched from master)
Build Status
Buildable 4961
Build 4961: arc lint + arc unit

Event Timeline

Lukas Stockner (lukasstockner97) edited the summary of this revision.
LazyDodo (LazyDodo) requested changes to this revision. Sep 15 2019, 5:26 AM

How does this compare to D4402? I'll leave the merits of LZ4 vs Zstd to the other reviewers; however, from a platform perspective, the Windows Explorer thumbnail DLL will have to be updated before this lands. (But I'd hold off on any work on that until it's somewhat clear what direction we're planning to go with this.)

This revision now requires changes to proceed. Sep 15 2019, 5:26 AM

Oh, well, I didn't know about that patch. I'll retest tomorrow with LZ4 included.

Regarding thumbnails: good point; I'd expect the same goes for some Python stuff.

The patch looks good, and hopefully it's not wasted effort; the LZ4 patch was listed under T63728: Data, Assets & I/O Module.

A few considerations:

  • My intention was that LZ4 could be enabled by default, because saving can be faster when it's used (depending on the HDD speed). I've read that ZSTD at low compression levels approaches LZ4's speed, so the same advantage may hold for ZSTD.
  • Currently there is an optimization in file reading that delays reading the data from linked libraries (added because Spring files were spiking memory on load). This isn't used for ZLIB files, since seeking is _very_ slow. It would be interesting to check how LZ4/ZSTD seek performance compares, if it's even practical to enable seek for either of the formats.

    If we can't use seeking, this is a down-side to enabling compression by default, at least for large projects that use linking. So it would be good to check if either compression can work well in this case.

    An alternative which allows seeking is to compress the content of BHead instead of the entire file.
  • I'd prefer not to have too many compression formats, for users having none/fast/best options is probably enough.

Comparison with LZ4 here (average of three runs):

Algorithm        | Saving    | Loading  | Size
---------------- | --------- | -------- | ------
Zstd -20         | 0.62 sec  | 1.12 sec | 498 MB
Zstd -10         | 0.65 sec  | 1.06 sec | 488 MB
Zstd -5          | 0.69 sec  | 1.07 sec | 483 MB
Zstd -3          | 0.75 sec  | 1.08 sec | 479 MB
Zstd -1          | 0.83 sec  | 1.13 sec | 475 MB
Zstd 0           | 1.77 sec  | 1.18 sec | 455 MB
Zstd 1           | 1.20 sec  | 1.20 sec | 470 MB
Zstd 2           | 1.33 sec  | 1.24 sec | 464 MB
Zstd 3 (default) | 1.76 sec  | 1.19 sec | 455 MB
Zstd 5           | 7.68 sec  | 1.19 sec | 452 MB
Zstd 10          | 21.09 sec | 1.21 sec | 443 MB
LZ4              | 0.81 sec  | 0.97 sec | 491 MB

Loading times aren't comparable to the original benchmark since I used userspace time instead of wall time.

Regarding seeking: I don't think either format supports it. However, we could eventually support compressing only the contents of each block as you suggested - that could even support selective compression (e.g. not recompressing packed images or not compressing the thumbnail to make that part easier). However, since that would need to store per-block compression flags, a modification to the BHead struct might be needed (maybe steal a byte from the SDNA index?)
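To make the per-block idea a bit more concrete, here is one possible shape of the write side (the struct and function names are hypothetical and this is not the actual BHead layout; the flag byte stands in for whatever bit we would carve out of the header):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

/* Hypothetical per-block header: the real BHead would instead carry a
 * compression flag, e.g. carved out of the SDNA index as discussed above. */
typedef struct {
  uint32_t uncompressed_len; /* size of the original payload */
  uint32_t stored_len;       /* bytes actually written after the header */
  uint8_t is_compressed;     /* 0 = raw, 1 = zstd */
} BlockHeader;

/* Compress one block payload; fall back to storing it raw when compression
 * does not pay off (e.g. already-compressed packed images, tiny blocks). */
static void *pack_block(const void *data, size_t len, int level, BlockHeader *hdr)
{
  size_t bound = ZSTD_compressBound(len);
  void *buf = malloc(bound);
  size_t csize = ZSTD_compress(buf, bound, data, len, level);

  hdr->uncompressed_len = (uint32_t)len;
  if (!ZSTD_isError(csize) && csize < len) {
    hdr->is_compressed = 1;
    hdr->stored_len = (uint32_t)csize;
    return buf; /* caller writes hdr, then stored_len bytes of buf */
  }
  /* Not worth it: store the payload unchanged. */
  hdr->is_compressed = 0;
  hdr->stored_len = (uint32_t)len;
  memcpy(buf, data, len);
  return buf;
}
```

Storing both lengths keeps raw blocks readable without any decompression step and makes it trivial to skip (seek past) a block that isn't needed.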

Update on the per-block compression: it works, but the compression ratio suffers from the small blocks; the file from above is now 544 MB at level 3. In theory dictionaries could help here, but I'm not sure how to implement that practically.

We could do some sort of chunking scheme that provides seekability on a granularity of a few MB, but that's not really practical either...
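For the record, the chunking idea would roughly mean: split the write stream into fixed-size chunks, compress each chunk as an independent zstd frame, and append a small index of (uncompressed offset, compressed offset) pairs so the reader can jump to the frame containing a given offset and only decompress that chunk (zstd's contrib tree has a "seekable format" built on the same principle). A sketch only; the names and index layout below are made up:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

#define CHUNK_SIZE (4 << 20) /* 4 MB of uncompressed data per frame */

/* Hypothetical index entry, one per chunk, written as a trailer so the
 * reader can binary-search for the chunk containing a wanted offset. */
typedef struct {
  uint64_t uncompressed_offset;
  uint64_t compressed_offset;
} ChunkIndexEntry;

/* Compress `len` bytes as a sequence of independent zstd frames and return
 * the number of chunks written. Each frame can later be decompressed on its
 * own, which gives seekability at CHUNK_SIZE granularity. The caller must
 * size `index` for ceil(len / CHUNK_SIZE) entries. */
static size_t write_chunked(FILE *fp, const uint8_t *src, size_t len,
                            int level, ChunkIndexEntry *index)
{
  size_t bound = ZSTD_compressBound(CHUNK_SIZE);
  void *buf = malloc(bound);
  uint64_t in_off = 0, out_off = 0;
  size_t chunk = 0;

  while (in_off < len) {
    size_t this_len = len - in_off;
    if (this_len > CHUNK_SIZE) {
      this_len = CHUNK_SIZE;
    }
    size_t csize = ZSTD_compress(buf, bound, src + in_off, this_len, level);
    if (ZSTD_isError(csize)) {
      break;
    }
    index[chunk].uncompressed_offset = in_off;
    index[chunk].compressed_offset = out_off;
    fwrite(buf, 1, csize, fp);
    in_off += this_len;
    out_off += csize;
    chunk++;
  }
  free(buf);
  return chunk;
}
```

On the read side, looking up a linked data-block would then decompress at most one chunk instead of the whole file, which is exactly the granularity trade-off mentioned above.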