
Fix for Python failure with .blend files loaded from paths including non-ASCII characters
Closed, Resolved (Public)

Description

Test platform: 64-bit Windows Vista, with the active code page set to 932 (Japanese Shift_JIS)
Test Blender revision: 56458

Problem: Freestyle fails when .blend files are loaded from paths including non-ASCII characters (e.g., Japanese).
File loading is okay (thanks to the trunk revision 56454), but Freestyle rendering results in a Python error shown below:

UnicodeDecodeError: 'mbcs' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

The error comes from Py_CompileString() called from python_script_exec() in source/blender/python/intern/bpy_interface.c.
The second argument of Py_CompileString() must be a byte array in the default file system encoding (Py_FileSystemDefaultEncoding).
On the test platform, Py_FileSystemDefaultEncoding is "mbcs" (i.e., the Windows ANSI code page encoding, here code page 932).

Apparently, G.main->name is a byte array in the UTF-8 encoding (please confirm this).
This variable indicates the fully qualified name of the .blend file being opened and manipulated by Blender.
The .blend file name is then embedded in a string (fn_dummy) to be passed as the 2nd argument of Py_CompileString().

In summary, the UnicodeDecodeError above results from an attempt in Py_CompileString() to decode a UTF-8 byte array as if
the byte array were in the MBCS encoding.
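
The mismatch can be illustrated outside Blender with a small Python sketch. It assumes cp932 as a stand-in for the "mbcs" codec (which exists only on Windows), and the path is made up for illustration; neither comes from the patch.

```python
# Hypothetical .blend path containing Japanese characters (illustration only).
path = "C:/テスト/日本語.blend"

# Blender keeps the path as UTF-8 bytes (as G.main->name does).
utf8_bytes = path.encode("utf-8")


def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Decoding with the matching codec recovers the path.
as_utf8 = try_decode(utf8_bytes, "utf-8")

# Decoding the same bytes as cp932 (assumed stand-in for "mbcs") either
# fails or yields mojibake; it does not recover the original path.
as_cp932 = try_decode(utf8_bytes, "cp932")

print(as_utf8 == path)
print(as_cp932 == path)
```

Whether the wrong codec raises an error or silently produces garbage depends on the byte sequence, which is why the failure shows up only for some paths.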

The attached patch is intended to fix the reported issue. Code review and testing are much appreciated.

Event Timeline

Campbell is the expert on this, so assigning to him.

G.main->name is indeed UTF-8, and the python documentation mentions Py_CompileString expects a string in the file system encoding, so this change makes sense to me. PyRun_File in the same function looks like it needs the same conversion to the file system encoding?

Probably it would be good to split this into a utility function that converts a char* from UTF-8 to filesystem default encoding so it can be reused in different places.

Hi Brecht and Campbell,

Thank you for the confirmation and positive response on this matter. Following the code review comments from Brecht, I have duly updated the patch set and made it more comprehensive. There are actually three places in the code base where file names in the file system default encoding are necessary (instead of UTF-8 as it is so far). It is noted that the identified string encoding inconsistency affects the following ways of Python script execution.

- Execution of Freestyle scripts.
- Execution of Python scripts using the command line option -P.
- Execution of text datablocks (marked as Python modules) at start-up.
- Reloading of imported modules using imp.reload().

The attached patch accounts for all these execution methods, allowing: (a) .blend files and Python scripts to reside in paths including non-ASCII characters; and (b) text datablocks to have non-ASCII ID names.

Since the trunk revision 56454 is likely included in the upcoming 2.67 release and file loading from non-ASCII paths will be okay, the reported issue will affect many Blender applications including Freestyle rendering. A timely fix of the issue would hence be greatly appreciated.

About the PyC_EncodeFSDefault implementation: I found that PyUnicode_EncodeFSDefault exists in the Python API and is already used in Blender. The implementation is here: http://hg.python.org/cpython/file/56ca8eb5207a/Objects/unicodeobject.c

Perhaps we should just always call that instead to be sure it does the right thing, since your version seems to be a bit different? I'm not sure how much using e.g. "surrogateescape" instead of "strict" matters in practice, but maybe it matters in some corner cases.
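
As a rough illustration of the difference between the two error handlers (a generic Python sketch using UTF-8 and a made-up file name, not a statement about what either C function does on a given platform):

```python
# A byte string that is not valid UTF-8 (0xFF never occurs in UTF-8).
raw = b"dir/\xff.blend"

# "surrogateescape" carries the bad byte through as a lone surrogate...
text = raw.decode("utf-8", errors="surrogateescape")

# ...and the same handler restores the original bytes when encoding.
round_tripped = text.encode("utf-8", errors="surrogateescape")


def encodes_strictly(s):
    """Report whether `s` can be encoded as UTF-8 with the "strict" handler."""
    try:
        s.encode("utf-8", errors="strict")
        return True
    except UnicodeEncodeError:
        return False


print(round_tripped == raw)
print(encodes_strictly(text))
```

So a string that came from undecodable bytes survives a round trip with "surrogateescape" but is rejected by "strict".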

Also the strcpy should be replaced by BLI_strncpy.

Hi Brecht and Campbell,

Thanks for the comments. I agree that PyUnicode_EncodeFSDefault would be better. Also BLI_strncpy is used instead of strcpy. I hope the updated patch set is okay now.

Any update on this? Many users want the problem to be resolved.

I keep seeing many Japanese Blender users suffering from this issue.

For ease of further tests and code review I have just added D267. Any action on the patch set is much appreciated.

Checking on this: note that file paths in Blender aren't strictly UTF-8. We already need to support RNA file paths that are not UTF-8.

I am afraid I missed the point. What do you mean by being not strictly utf-8?

The only requirement for the patch is that G.main->name and the ID name of text data blocks be encoded in UTF-8.
Are there cases where they are not encoded in UTF-8? If so, how do we know the encoding?

I had a conversation with Campbell on IRC. Here is a brief summary of the discussion.

The main concern is that the PyC_EncodeFSDefault() function I wrote may return NULL when the given string is not valid as a UTF-8 string, so that the conversion to the file system default encoding fails. Since callers of the function may not expect NULL, it would be nicer to return some string instead of NULL. It is recalled that the input string to PyC_EncodeFSDefault() is bpy.data.filepath + os.sep + text.name and what may cause the failure of encoding conversion is the first part. Two options were discussed:

  1. PyC_UnicodeFromByte() may provide a solution.
  2. Just omit the troublesome bpy.data.filepath and return text.name as a fallback.

Another concern is that some paths cannot be opened with byte strings at all (cf. BLI_fopen() and intern/utfconv/utf_winfunc.c). This may happen and there is no obvious solution for it.

Correction:

Another concern is that some paths cannot be opened with byte strings at all (cf. BLI_fopen() and intern/utfconv/utf_winfunc.c).

I meant on Windows.

I have updated D267 based on the discussions with Campbell concerning a junk UTF-8 string as input. Now the helper function PyC_EncodeFSDefault() (added by the patch) always returns a valid string rather than NULL, so that callers of the function won't fail even when they don't expect NULL.

The chosen solution here is just to return the input string as it is when encoding conversion cannot be done.
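
The fallback behaviour can be sketched in Python as follows. This is a hypothetical analogue, not the C helper from the patch: the function name mirrors the thread, and the `fs_encoding` parameter (defaulting to cp932) is an assumption added so the example is reproducible on any platform.

```python
def encode_fs_default(utf8_bytes, fs_encoding="cp932"):
    """Convert UTF-8 bytes to `fs_encoding`, or return the input unchanged.

    Hypothetical Python analogue of the C helper discussed in this thread;
    `fs_encoding` is a parameter here only so the behaviour is reproducible.
    """
    try:
        return utf8_bytes.decode("utf-8").encode(fs_encoding)
    except UnicodeError:
        # Junk input or unencodable text: give up and pass the bytes through.
        return utf8_bytes


# Valid UTF-8 is converted to the target encoding.
valid = "日本語.py".encode("utf-8")
converted = encode_fs_default(valid)

# Invalid UTF-8 is returned as it is, never None.
junk = b"\xff\xfe.py"
passthrough = encode_fs_default(junk)

print(converted == "日本語.py".encode("cp932"))
print(passthrough == junk)
```

Callers always get bytes back, so they need no NULL-style check; the cost is that a junk path is passed on unchanged and fails later, when the file is actually opened.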

I have examined several other fallback solutions, including:

  • PyUnicode_DecodeLatin1()
  • PyUnicode_DecodeRawUnicodeEscape()
  • PyUnicode_DecodeUnicodeEscape()

for conversion from bytes to unicode, and

  • PyUnicode_AsLatin1String()
  • PyUnicode_AsRawUnicodeEscapeString()
  • PyUnicode_AsUnicodeEscapeString()

for conversion from unicode to bytes. The Latin-1 decoder and encoder pass through all bytes, so they are equivalent to a simple copy of the input string to output. The UnicodeEscape and RawUnicodeEscape functions involve the additional handling of backslash escaped characters such as \xNN and \uNNNN. Since backslash is a directory separator on Windows, using these functions may further complicate error conditions.
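
Both observations can be checked from Python, whose "latin-1" and "unicode_escape" codecs correspond to the C functions listed above. A minimal sketch, with a made-up Windows path:

```python
# Latin-1 maps each byte 0x00-0xFF to the code point of the same value,
# so decoding and re-encoding is an identity operation.
all_bytes = bytes(range(256))
latin1_round_trip = all_bytes.decode("latin-1").encode("latin-1")

# A hypothetical Windows path whose components happen to start with n and t.
win_path = b"C:\\new\\table.blend"

# "unicode_escape" interprets \n and \t as newline and tab.
mangled = win_path.decode("unicode_escape")

print(latin1_round_trip == all_bytes)
print(repr(mangled))
```

The decoded path contains a newline and a tab where the directory separators used to be, which is the kind of added error condition described above.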

Since a junk UTF-8 string cannot be a valid path anyway, it looks like the only possibility is to give up encoding conversion and fall back to BLI_strncpy().

Tamito Kajiyama (kjym3) renamed this task from Fix for Freestyle failure with .blend files loaded from paths including non-ASCII characters to Fix for Python failure with .blend files loaded from paths including non-ASCII characters. Feb 13 2014, 2:23 PM

Just changed the title, since the problem being addressed here is not Freestyle specific but related to Python in general.

I've been looking into this bug and oddly enough I can't redo it (my fs encoding is mbcs), but I've tried to use text which isn't mbcs compatible and it's not giving a Python error.

I'd still rather avoid adding new string conversion functions, since we can get bpy.data.filepath already.

Note, we already ran into similar problems here.
http://bugs.python.org/issue9713

Committed change to merge text compile into a single function.

Attached a patch which I think fixes the problem for the text editor, by using Py_CompileStringObject so we can convert the string into a PyObject first (using the same method used to get bpy.data.filepath).

@Tamito Kajiyama (kjym3) could you check if this works for you?

Note, this doesn't deal with bpy_interface.c, just compiling text.

@Campbell Barton (campbellbarton)
The changes by unicode_text.diff look okay. Now that we rely on Py_CompileStringObject() that takes a file name in the form of a Unicode object, we don't have to deal with the file system default encoding.
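
At the Python level the same idea looks like this: the built-in compile() takes the file name as a str, so no byte encoding of the name is involved. This is only an analogue of the C-level change, with a made-up file name and source string.

```python
# Hypothetical file name with non-ASCII characters (illustration only).
filename = "C:/テスト/スクリプト.py"
source = "result = 6 * 7\n"

# compile() takes the file name as str, so no byte encoding is involved.
code = compile(source, filename, "exec")

namespace = {}
exec(code, namespace)

print(code.co_filename == filename)
print(namespace["result"])
```

The file name is stored on the code object unchanged and is only used for tracebacks and similar messages.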

For Freestyle, we have to make similar changes to python_script_exec() called from BPY_text_exec().

It is noted that Py_CompileStringObject() is new in Python 3.4. Windows binaries of this version are not in the lib svn repository. I just built Python 3.4c1 myself from the tarball using VS 2008 to test the patch.

Below find a revised version of your patch set, including your original changes plus mine to get Freestyle working with non-ASCII file paths.

So OK to postpone this fix until 2.71? (or whenever we bundle Py3.4).

Yes (personally I prefer to have this fix asap though).

As we won't include Python 3.4 for Blender 2.70, removing Blender 2.70 here.

Just realized that the problem has been left partly unaddressed. Please consider reviewing the patch just added to the task.

Here is a quick .blend file for testing. Just press the "Run Script" button in the text editor. Without the proposed fix you will see a stack trace printed in the console (after the completion of the script execution).

@Tamito Kajiyama (kjym3). I can't redo the bug (MSVC2013, Windows7)

I tried to run and to register and both work fine: it prints "hello" with no exception.

EDIT: somehow missed D595, looks good.

Committed, resolved.

I am experiencing a related problem in the latest buildbot build as of Jun 19, (b49e6d0) (and all previous builds as far back as 2.65 that I've tried) on Japanese Windows 8 on a Vaio Ultrabook. Python fails on startup and I am unable to find any workaround to run Blender with its menus, etc intact (I've tried putting the Blender install directly under C:\ so as to avoid Japanese characters in the path, but no luck).

This also gives me the same error as described above in the console:

UnicodeDecodeError: 'mbcs' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

Should I open a new report for this?

Tony

Fwiw, I'm adding the screenshot I took of the Traceback.

Exactly the same error occurs when the name of the user running Blender contains non-ASCII MBCS-compatible characters such as those in Japanese. Patch D604 is intended to address this issue. The reported problem is different from what T35176 was meant to address, so filing another bug report is appreciated.

Ok, thanks. Created a separate bug report.