Page MenuHome

Blender compositor memory optimalization
AbandonedPublic

Authored by Sergey Sharybin (sergey) on Jul 1 2014, 9:46 PM.

Details

Reviewers
Jeroen Bakker (jbakker)
Group Reviewers
Compositing
Summary

Blender's compositor always uses 4 floats to store data of a pixel. Even if the pixel is a value (only one floats needed) or vector (only three floats needed). This patch will allocate only the memory needed reducing memory. (normally around 50%).

This memory model also suits better with other area's in blender where we don't need that much of conversions between memory models This will have some (but little) positive impact on performance.

The free memory can be used eventually to keep buffers alive etc, buffering and other features.

more information can be found

Diff Detail

Event Timeline

Did just a quick look, main concerns:

  • Seems you just renamed WorkPackage to Tile? In this case make sure you used git mv for this, so history is preserved.
  • Not happy with using "NO" in a meaning of "number". Use "NUM" instead.
  • Not sure it actually worth having virtual classes for color/vector/value memory buffers, wouldn't it be better from the speed POV to use a templated class MemoryBuffer<int num_channels> ?
  • Does this patch make it so memory buffer for active tiles only is stored?
source/blender/compositor/intern/COM_MemoryBuffer.h
78

Picky, m_num_channels. no_channels is kind of meaning "no channels" as a boolean flag meaning there's no channels :)

source/blender/compositor/intern/COM_MemoryBufferValue.h
52 ↗(On Diff #2026)

Alignment.

Would try to keep meaningless changes (WorkPackage -> Tile rename) out of the patch for now, if it's just about channels.

About virtual functions: This is getting worse now with the additional dispatch by channel number ... Now we have even 2 nested virtual functions, e.g. readBilinear() calling 4x read().
This may be out of scope for the patch, but for these core low-level functions i would really suggest looking into some moderate templating, or at least make dedicated read methods for each buffer type instead of letting it branch out internally based on the num_channels. This would move the dynamic dispatch out of the inner loops, any expensive pixel processor could then determine the actual buffer types first, then start looping and call non-virtual read functions for the known types (this is where templates are useful to generate typed loop variants).

source/blender/compositor/intern/COM_MemoryBuffer.h
78

Would name this m_num_channels for consistency and to avoid confusion with negated value.

Jeroen Bakker (jbakker) updated this revision to Unknown Object (????).Jul 9 2014, 8:19 PM

Renamed the no_channels to num_channels

Jeroen Bakker (jbakker) updated this revision to Unknown Object (????).Jul 9 2014, 8:33 PM

Updated the alignment according to Blender Code Style

Jeroen Bakker (jbakker) updated this revision to Unknown Object (????).Jul 9 2014, 9:53 PM

Fix for Bokeh Blur. A part was skipped during the migration.

Jeroen Bakker (jbakker) updated this revision to Unknown Object (????).Jul 15 2014, 8:38 PM

Added base for sampling. Still needs a lot of work to get sampling where we want it to be.
Currently updated Defocus, Bokeh + VariableSizedBokeh
Perhaps the other parts should be done in master and reorganize the code a bit for developers.

Main issue is that using samplers aren't the solely solution and only available for complex nodes

source/blender/compositor/intern/COM_Sampler.h
26 ↗(On Diff #2110)

Get rid of dead code?

source/blender/compositor/intern/COM_SocketConnection.h
23 ↗(On Diff #2110)

Does this bring back code Lukas was cleaning up before?

Sergey Sharybin (sergey) requested changes to this revision.

To process further, would suggest solving the issues with SocketConnection class (which seems to revert some changes from @Lukas Toenne (lukastoenne)) and go ahead with the commit.

Lukas, atmind please solve this confusion!

This revision now requires changes to proceed.Jul 31 2014, 9:42 AM

Seems to be a wrong merge The code lukas_t did is leading. Will clean up

Jeroen Bakker (jbakker) updated this revision to Diff 2324.

Updated to latest blender master

I would rather put this change on hold until the more fundamental sampling/resolution issues are tackled. The proliferation of virtual functions is a concern and makes code even harder to follow. Compared to that i don't think the advantages are significant enough. Rather we should perhaps try to avoid the consolidated memory buffer construction, which, i suspect, can add quite a lot of allocation and copy overhead.

@lukas_t strange, the roadmap was presented in March this year, We received no comments on it.

When you reviewed the patch you wanted us to minimize the virtual functions (even as you write it is not in scope for this patch). We worked on them by reducing them to a much lower state than Trunk. Please tell us which virtual calls you are referring to and how many times they are called, and why they are a problem?

this patch has already better "allocation and copy overhead" as less buffers are created and no additional buffers needs to be allocated for device transfers. But you want to introduce more memory models to the compositor. If you have a plan please enlighten us. We want the compositor to evolve in little steps, but you are pointing into a direction of a single giant big redesign. If this is the case, please come with a plan. Until then we will should hold all developments on the compositor core.

Please come with facts and not things like "I don't think" and "I suspect". As for users this patch reduces memory overhead (camimandes 01a, 30% memory reduction), increases overall compositor performance. This patch allows us to implement bicubical sampling. Something you and others were asking for. Are you telling us this is not significant enough?

Trunk has consolidated memory buffer construction, the scope of this patch is not to change something in this area (see roadmap), so why are you mentioning this?

been testing this patch, and compositor works really fast and quite responsive than before. Awesome...!

Sergey Sharybin (sergey) requested changes to this revision.

Eh, the complexity of the patch drastically grew up since my last review, now i see why Lukas could be so skeptical.

Performance
Didn't really run own benchmarks yet, but if it's indeed up to 30% then it's something significant. Would run caminandes and mango scenes tomorrow to doublecheck on this.

Unrelated changes
Seems you've been doing changes like renaming WorkPackage to Tile. I don't mind such change, but it obviously don't belong to this patch and should be applied separately. Doesn't really matter if it'll happen before or after finalizing the work on the actual memory optimization.

Same seems to apply to the OutputSocket. Just re-read the patch before publishing and strip all the unrelated changes to a separate revision.

Virtual calls
Not really sure why you mention there's fewer virtual calls comparing to master branch (unless i misunderstand the comments). Just look into MemoryBuffer -- it's now full of virtual calls. And what's more bad -- virtual calls which happens per pixel. This is really something to be avoided.

For the proposal about them read further.

Decoupled memory buffers
I don't see an advantage of having separate memory buffer classes for each of channels configurations. It's really just an extra complexity which is easy to avoid, make code easier to follow and avoid virtual functions.

I'd say it should be single MemoryBuffer class with a number of channels it serves stored as it's property (if one REALLY worry about performance here, it could be a templated class, but personally i don't see reason for this). This single class would serve all that read*() functions, which are no longer virtual.

Not sure how much in details i should go here, but guess you got the idea.

Sampler classes
Imo that's rather stupid to allocate own sampler (or even three at the time?) for each memory buffer. Especially since current samplers implementation is like adding a cyclic dependency -- MemoryBuffer references and owns the sampler, sampler is referencing to the MemoryBuffer buffer.

Sampler should be just a sampler: stateless, reentrant, shared across all the users. And for sure they shouldn't duplicate logic from BLI_color_interp.h. BlenLib is to be extended with the interpolation functions we want to sue in compositor, not the compositor should duplicate code which belongs to other area.

Currently all the 1, 3 and 4 channels are supported by the BLI_color_interp.h. You might make that functions more flexible if they're not really usable for compo (don't see why tho).

Proposal
So the proposal would be:

  • Don't push the patch for 2.72, but rather make it really nice, so everyone is happy
  • Strip all the unrelated changes to a separate patches, which seems to be really easy to do even for 2.72
  • Get rid of samplers in favor of BLI_color_interp functions.
  • Get rid of multiple MemoryBuffers
This revision now requires changes to proceed.Aug 12 2014, 11:13 PM

Had a discussion with sergey online. We agree that this patch will be postponed until the points have been addressed.

  • one MemoryBuffer with num_of_channels
  • using BLI functions in stead of the read operations inside Complex nodes and samplers.
  • should make a reusable EWA interpolation
Jeroen Bakker (jbakker) updated this revision to Diff 2605.
  • Updated to latest master.
  • Removed the different MemoryBuffer classes
  • Enabled bicubic sampling

I would really suggest splitting WorkPackage->Tile rename to be separated from the main change. Plus bicubic might be useful but i would make it a separate patch as well and wouldn't make it default anywhere.

source/blender/compositor/intern/COM_MemoryBuffer.cpp
59

Use BLI_rcti_size_x().

60

Use BLI_rcti_size_y()

73

Same as above.

86

And again.

94

Can the initializations above be de-duplicated?

source/blender/compositor/intern/COM_MemoryBuffer.h
221

use memset() instead.

source/blender/compositor/nodes/COM_DefocusNode.cpp
106 ↗(On Diff #2605)

Is there a particular reason to remove this?

If so, would prefer it to be done as a separate patch.

source/blender/compositor/operations/COM_KeyingBlurOperation.cpp
66

Seems this also changed indentation.

Same seems to happen with other changes in the patch.

source/blender/compositor/operations/COM_ReadBufferOperation.cpp
66

Bicubic adds a bit extra fuzzyness, would think bilinear should be listed first.

source/blender/compositor/operations/COM_RenderLayersProg.cpp
145

BLI_assert(elemsize == 4)

source/blender/compositor/operations/COM_ScreenLensDistortionOperation.cpp
150 ↗(On Diff #2605)

Such changes i'll leave for later. Just using bicubic doesn't mean you'll have much higher quality image.

Sergey Sharybin (sergey) requested changes to this revision.
This revision now requires changes to proceed.Dec 3 2014, 6:33 PM

Hope to have some time next week to fix these issues.

source/blender/compositor/nodes/COM_DefocusNode.cpp
106 ↗(On Diff #2605)

Was unused code since 2.63

source/blender/compositor/operations/COM_ReadBufferOperation.cpp
66

Good one.

source/blender/compositor/operations/COM_ScreenLensDistortionOperation.cpp
150 ↗(On Diff #2605)

Well the original version (2.62) used Bicubic sampling So it is more that between 2.63 and now it was wrongly implemented.There are also issues in the trackier pointing this out. https://developer.blender.org/T37462

Jeroen Bakker (jbakker) updated this revision to Diff 3188.

Now the patch only contains the memory optimization. Other patches will be done when this patch is in trunk.

Thanks for update! It's a bit late here now, will check it tomorrow.

Jeroen Bakker (jbakker) updated this revision to Diff 3189.

Remove redundant field

Seems majority of the changes are now in actual operations, which is nice. There's one small thing in the code which seems to be a typo.

Once that's clarified i think you could go ahead and land the patch to master.

P.S. Also, does EWA really work for number of channels not equal to 4?

source/blender/compositor/intern/COM_MemoryBuffer.h
218

Was this expected to be

memset(result, 0, this->m_num_channels*sizeof(float))

instead?

Added assert in EWA as it is only supported when using colors.
Fixed memset syntax

Sergey Sharybin (sergey) commandeered this revision.

That was committed to my knowledge. So closing..