This new function is part of the 'parallel for loops' functions. It
takes an iterator callback to generate items to be processed, in
addition to the usual 'process' func callback.
This allows to use common code from BLI_task for a wide range of custom
iteratiors, whithout having to re-invent the wheel of the whole tasks &
data chuncks handling.
This supports all settings features from BLI_task_parallel_range(),
including dynamic and static (if total number of items is knwon)
scheduling, TLS data and its finalize callback, etc.
One question here is whether we should provide usercode with a spinlock
by default, or enforce it to always handle its own sync mechanism.
I kept it, since imho it will be needed very often, and generating one
is pretty cheap even if unused...
Additionaly, this commit converts (currently unused)
BLI_task_parallel_listbase() to use that generic code. This was done
mostly as proof of concept, but performance-wise it shows some
interesting data, roughly:
- Very light processing (that should not be threaded anyway) is several times slower, which is expected due to more overhead in loop management code.
- Heavier processing can be up to 10% quicker (probably thanks to the switch from dynamic to static scheduling, which reduces a lot locking to fill-in the per-tasks chunks of data). Similar speed-up in non-threaded case comes as a surprise though, not sure what can explain that.
While this conversion is not really needed, imho we should keep it
(instead of existing code for that function), it's easier to have
complex handling logic in as few places as possible, for maintaining and
for improving it.
Note: That work was initially done to allow for D5372 to be possible... Unfortunately that one proved to be not better than orig code on performances point of view.