I did not write this patch; it was made by the developers of Natron VFX.
What is the measured speedup, how is it being measured?
How does this behave with track-wide threading (when different tracks are handled in different threads)? From the looks of it, this will create n^2 active threads, which is not acceptable.
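To make the concern concrete, here is a minimal sketch of the nesting, not the patch's actual code: the function name and both parameters are hypothetical, and it assumes each per-track thread starts its own set of inner workers rather than sharing a pool. It only counts how many inner worker threads get created.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical illustration only; nothing here comes from the patch.
// Each of `num_outer` per-track threads spawns `num_inner` workers of its own,
// and every worker increments a shared counter once.
int CountNestedWorkerThreads(int num_outer, int num_inner) {
  std::atomic<int> created(0);
  std::vector<std::thread> outer;
  for (int i = 0; i < num_outer; ++i) {
    outer.emplace_back([&created, num_inner]() {
      // Inner parallel region, started independently by every outer thread.
      std::vector<std::thread> inner;
      for (int j = 0; j < num_inner; ++j) {
        inner.emplace_back([&created]() { created.fetch_add(1); });
      }
      for (std::thread& t : inner) t.join();
    });
  }
  for (std::thread& t : outer) t.join();
  return created.load();
}
```

With both levels sized to the core count n, the call creates n * n inner workers on top of the n outer threads. A shared pool, or disabling the inner parallelism when the caller is already threaded, would keep the total near n.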
Such defines belong somewhere like base/port.h.
But first of all, what exactly are they needed for?
There cannot be a single define for all types of pixel processing. This is usually calculated on a per-case basis.
This shouldn't depend on Ceres.