
Adding "minimal Python Interpreter" for Drivers
Open, Normal, Public

Description

This is a summary of an IRC chat from today (with Kaito and Aligorith).

When drivers are used, Python scripting must be enabled. However, it looks like most drivers can be created using a very limited subset of Python (basic math operations come to mind). Because of this, it may be possible and desirable to add a "minimal Python interpreter" only for usage with Drivers.

@Joshua Leung (aligorith): Would you mind adding some comments about how such a "minimal python interpreter" could possibly be implemented?

Some remarks collected from the IRC chat (just so that this information doesn't get lost):

from Aligorith:

  • Basically it (the mini interpreter) would strictly white-list the types of stuff you can do with it - anything that gets too close to being able to be used for naughty stuff wouldn't be allowed.
  • really, most of the drivers I've seen seem to use some combination of sin/cos and/or simple addition/mult/etc.
  • ... we could play around with doing stuff like using the python ast's, and pruning out stuff we don't like when in "secure" mode

from Kaito:

  • Absolute security doesn't exist, so we'd better just try to minimize risk or damage.
  • The idea of a sandboxed 2nd Python interpreter which only does Blender py RNA and basic math is great.
  • You know, with such a mini-py we could also drop the GPL requirement and make .blends with embedded driver scripts etc. 'free' again.
  • I don't think we need to code our own interpreter.
  • I would check with the python.org team what the smallest compatible interpreter would look like, and I would check with Houdini, Maya, LightWave and others who are moving to py what they do.
  • The traditional Python devs work on servers; they see security quite differently...

from me:

  • What about adding a couple of Python classes for the basic operations? Those classes could be marked as "can be used in drivers without scripts enabled", or so?
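The AST-pruning approach Aligorith mentions above could be sketched roughly as follows. This is only an illustrative sketch, not existing or proposed Blender API: `safe_eval`, `SAFE_NODES` and `SAFE_NAMES` are made-up names, and the whitelist shown is deliberately tiny.

```python
import ast
import math

# Node types a "secure mode" driver expression would be allowed to contain.
# Anything outside this set (attribute access, subscripts, lambdas, ...)
# is rejected before evaluation.
SAFE_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Name, ast.Load, ast.Call,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.Mod,
    ast.USub, ast.UAdd,
)

# Whitelisted callables/constants available to the expression.
SAFE_NAMES = {"sin": math.sin, "cos": math.cos, "pi": math.pi}

def safe_eval(expression, variables=None):
    """Evaluate a driver-style expression after pruning unsafe AST nodes."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, SAFE_NODES):
            raise ValueError("disallowed syntax: %s" % type(node).__name__)
    namespace = dict(SAFE_NAMES)
    namespace.update(variables or {})
    # __builtins__ is emptied so the expression can't reach open(), __import__, etc.
    return eval(compile(tree, "<driver>", "eval"), {"__builtins__": {}}, namespace)
```

For example, `safe_eval("sin(0) + 2 * 3")` evaluates fine, while `safe_eval("().__class__")` is rejected at the AST check because tuple literals and attribute access aren't whitelisted.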

Event Timeline

Gaia Clary (gaiaclary) updated the task description. (Show Details)
Gaia Clary (gaiaclary) raised the priority of this task from to Needs Triage by Developer.
Gaia Clary (gaiaclary) added a project: Python.
Gaia Clary (gaiaclary) set Type to Bug.

One concern I have with this is that it gives some kind of challenge for people to circumvent it.

A while ago I looked into the BGE's sandboxing options, and each time I managed to lock it down, some clever Python guys would show an example of how it could be worked around (quite trivially).
I realize that with some more advanced tricks (checking bytecode or AST), some restricted set of Python could be enforced. However, if this ends up being *easy* to work around (say, 10 minutes of searching online), then I'm not sure it's worth the effort to attempt to sandbox in the first place, since we would then be promoting a feature as secure which would in fact be quite insecure.
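For illustration, this is the sort of trivial workaround meant above (a classic, widely known pattern, not specific to Blender): even with `__builtins__` stripped from the evaluation namespace, plain attribute access on a literal walks back to every loaded class, which is usually enough to reach `os` or `subprocess` indirectly.

```python
# A restricted namespace with no builtins at all.
restricted_globals = {"__builtins__": {}}

# Attribute access alone escapes it: tuple -> object -> all subclasses.
payload = "().__class__.__base__.__subclasses__()"
leaked_classes = eval(payload, restricted_globals)

# leaked_classes is now a non-empty list of every class loaded in the
# interpreter, despite the "sandbox".
```

This is why pure namespace-stripping is not enough, and why the AST/bytecode checks have to forbid attribute access entirely.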


Other points...

  • Don't think this makes any changes to how the GPL works with drivers; a driver can already use a restricted set of functions - and many do.
  • The kinds of drivers you would evaluate with a restricted API are likely small math expressions - not something you would hold copyright on (maybe too big a topic for this task, but think this is the case for very small expressions).

@gaia, would need a more concrete example - how would the classes work?

Campbell Barton (campbellbarton) triaged this task as Normal priority. Mar 16 2016, 3:06 PM
Campbell Barton (campbellbarton) changed Type from Bug to Design.

My preferred option for eliminating any lingering GPL and Python sandbox bypassing would be to simply write our own simple parser and/or interpreter.

Pros:

  • There isn't the risk of anyone escaping the sandbox, as our parser would simply barf on any inputs that try to do anything tricky that it can't handle. Anything that we can't handle is passed back to the standard Python interpreter (which only runs when allowed). UI-wise this distinction should be indicated (maybe via the presence of a py icon or the old radiosity icon).
  • From a GPL perspective, I'm guessing that if we make this "GPL compatible but not GPL" licensed, it would rule out any of the standard concerns there.
  • As we are only handling a very limited subset of Python, there may be some perf benefits in some cases? It of course depends how we do it, but just by short-circuiting some of the type checks and callback lookups we should get some minimal differences in theory.

Cons:

  • We need to write a simple parser + interpreter. That however is not such a big issue and can be done quite easily... it just needs a little time...
  • Potential for other security slips from having our own parser.

Also, just to reiterate: we can only use this for handling "simple" driver expressions - i.e. the sort that just perform math using the builtin math funcs, +-×÷, and the driver vars that were defined for that driver.

Think if this is to be solved, having a simple parser that handles basic math expressions is better than attempting to sandbox CPython.
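A minimal sketch of what such a parser + interpreter could look like, assuming only numbers, driver variables, `+ - * /`, unary minus, parentheses, and a whitelisted set of math functions. All names here (`evaluate`, `ALLOWED_FUNCS`, ...) are illustrative, not a proposed Blender API:

```python
import math

# Whitelisted functions a driver expression may call.
ALLOWED_FUNCS = {"sin": math.sin, "cos": math.cos, "tan": math.tan,
                 "sqrt": math.sqrt, "abs": abs}

def tokenize(text):
    """Split an expression into numbers, names and single-char operators."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in "+-*/()":
            tokens.append(c); i += 1
        elif c.isdigit() or c == ".":
            j = i
            while j < len(text) and (text[j].isdigit() or text[j] == "."):
                j += 1
            tokens.append(float(text[i:j])); i = j
        elif c.isalpha() or c == "_":
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j]); i = j
        else:
            raise ValueError("unexpected character: %r" % c)
    return tokens

def evaluate(text, variables):
    """Recursive-descent evaluation; barfs on anything not whitelisted."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError("unexpected end of expression or token")
        pos += 1
        return tok

    def expr():  # addition / subtraction (lowest precedence)
        value = term()
        while peek() in ("+", "-"):
            value = value + term() if take() == "+" else value - term()
        return value

    def term():  # multiplication / division
        value = factor()
        while peek() in ("*", "/"):
            value = value * factor() if take() == "*" else value / factor()
        return value

    def factor():  # numbers, names, calls, parens, unary minus
        tok = take()
        if tok == "-":
            return -factor()
        if tok == "(":
            value = expr()
            take(")")
            return value
        if isinstance(tok, float):
            return tok
        if peek() == "(":  # whitelisted function call
            take("(")
            arg = expr()
            take(")")
            if tok not in ALLOWED_FUNCS:
                raise ValueError("function not allowed: %s" % tok)
            return ALLOWED_FUNCS[tok](arg)
        if tok in variables:  # driver variable lookup
            return variables[tok]
        raise ValueError("unknown name: %s" % tok)

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result
```

Usage would look like `evaluate("sin(x) * 2 + 1", {"x": frame_rotation})`, where the variables dict is filled from the driver vars. Since the evaluator only ever calls whitelisted functions and looks up explicitly passed variables, there is no sandbox to escape.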


As for GPL issues - think we should get advice here, and not take on a lot of work because of GPL issues that *might* exist.
We should understand exactly what the implications currently are.

We could make an official statement (and get FSF to double check it), eg:

Driver expressions that use only Python APIs and don't call into Blender's APIs aren't subject to the GPL.

... this covers typical math expressions (most drivers).

Correction, the patch linked has an error and isn't working, see reply below (fixed and linked to differential).


Similar to @Joshua Leung (aligorith)'s suggestion to manipulate the AST, there have been a few projects that allow byte-code level manipulation.
One that's quite popular and well maintained is numba [0], which converts Python bytecode to LLVM instructions, and interestingly has the ability to disable calling back into CPython from the converted functions [1].

Here is an experimental patch, P338, which uses numba from Blender's PyDrivers when auto-execution is disabled: import and open raise an exception, while math functions (sin/cos/tan... etc) work as expected.
However, since this isn't written with security as the main purpose, it's possible there is some way to break out of the sandbox (I'll mail their list and see if this is considered *secure*).

Tested this with a production file from glass-half (01_render.blend); the rigs work without any problems and with the same performance.
(Improved performance may be possible; most likely the performance cost is setting up the Python context and not the execution itself.)

[0]: http://numba.pydata.org
[1]: http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython

@Campbell Barton (campbellbarton): Interesting find!

A few questions we'd need to check on:

  1. How do we set up numba to test this?
  2. What sort of impact would numba have on distribution sizes? From the downloads page, the packages seem to be just under 1 MB. (I haven't checked yet whether that includes any LLVM stuff, though I imagine that LLVM tends to be quite a bit larger. Anyway, if LLVM is not included, then we already have it included for some of the Cycles stuff, so it wouldn't be too much of a stretch I guess.)
  3. You mentioned import and file IO. What about some of the other nasties such as os (and other ways of executing commands)?
  4. What happens with custom functions added to the driver functions namespace - stuff that riggers can define in textblocks and register? Is numba restricted to running with what it can see in the expression (and a few other builtins it has converted), or does that extend to everything in the namespace it encounters?

@Campbell Barton (campbellbarton): Interesting find!

A few questions we'd need to check on:

  1. How do we set up numba to test this?

http://numba.pydata.org/#installing

Though I built it from source - https://github.com/numba/numba#installing-numba

  2. What sort of impact would numba have on distribution sizes? From the downloads page, the packages seem to be just under 1 MB. (I haven't checked yet whether that includes any LLVM stuff, though I imagine that LLVM tends to be quite a bit larger. Anyway, if LLVM is not included, then we already have it included for some of the Cycles stuff, so it wouldn't be too much of a stretch I guess.)

Both the dependencies (LLVM and Numpy) are already included with Blender.
So we should be able to use it without adding extra deps apart from numba itself.

  3. You mentioned import and file IO. What about some of the other nasties such as os (and other ways of executing commands)?

You can't access os because you can't import, and even if you add the functions into the namespace, they won't execute (from my own tests in the Python 3.5 command prompt). I've mailed their list to ask if this could be used as a sandbox, since it isn't mentioned in their docs.

  4. What happens with custom functions added to the driver functions namespace - stuff that riggers can define in textblocks and register? Is numba restricted to running with what it can see in the expression (and a few other builtins it has converted), or does that extend to everything in the namespace it encounters?

Anything that calls back to the CPython API raises an exception; that includes any functions you pass in the namespace.
They must be handling calls from the math module as a special case, since the math functions in existing rigs are working as expected.

It seems I am talking rubbish and this is not working at all! My tests in the Py console overlooked that the function needs to run at least once before we can get the newly created "code" object back out of the function. (So the basic principle can work, but needs some tweaks.)

However, it looks like this isn't so hard to support, though we will need function calls instead of evaluating with a namespace, since numba doesn't support reading variables, only arguments to a function.
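The calling convention this implies could be sketched like so: turn the driver expression into a generated function whose arguments are the driver variables, which is the form a compiler such as numba expects. This is purely illustrative (the names are made up, numba itself is not used here), and the `exec()` stand-in provides no sandboxing by itself.

```python
import math

def expression_to_function(expression, var_names):
    """Wrap a driver expression in a function taking the driver vars as args.

    E.g. ("sin(x) + y", ["x", "y"]) becomes:
        def _driver(x, y):
            return sin(x) + y
    A jit compiler could then be applied to the returned function.
    """
    source = "def _driver(%s):\n    return %s" % (", ".join(var_names), expression)
    namespace = {"sin": math.sin, "cos": math.cos}
    exec(source, namespace)  # stand-in only; NOT a sandbox
    return namespace["_driver"]

driver_fn = expression_to_function("sin(x) + y", ["x", "y"])
```

The point is just the shape of the API: the driver system would pass variable values as positional arguments on each evaluation, instead of rebuilding an eval namespace.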

You might use nodes for drivers?
They would visually represent the AST of an arithmetic expression.
From the node inputs the dependencies for the dependency graph could be derived.

In a text parser you would have to resolve the variables from the driver, which you have to set up beforehand?

Further you might want to support vector inputs and vector operations?
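Even with a text parser, the dependency information that node inputs would expose can be derived by walking the parsed expression. A small sketch using Python's own parser (`expression_dependencies` is an illustrative name; function names are filtered out so only driver variables remain):

```python
import ast

def expression_dependencies(expression,
                            known_functions=frozenset({"sin", "cos", "tan", "sqrt"})):
    """Collect the variable names a driver expression reads.

    These are the inputs the dependency graph would need to track;
    whitelisted function names are excluded.
    """
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree)
            if isinstance(node, ast.Name) and node.id not in known_functions}
```

For example, `expression_dependencies("sin(x) + y * 2")` yields `{"x", "y"}`, which is exactly the set of driver vars that must be set up beforehand.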

Update, got numba working correctly, and tested with glass-half file, D1860

In summary - it works but initial driver compilation is very slow.

I think the most sensible comment has been "let's check how others do it". Oh boy, it almost doesn't seem like Blender! So how do others do it?