Fix for Python failure with .blend files loaded from paths including non-ASCII characters #35176

Closed
opened 2013-05-02 02:43:03 +02:00 by Tamito Kajiyama · 37 comments

Test platform: 64-bit Windows Vista, with the active code page set to 932 (Japanese Shift_JIS)
Test Blender revision: 56458

Problem: Freestyle fails when .blend files are loaded from paths including non-ASCII characters (e.g., Japanese).
File loading is okay (thanks to the trunk revision 56454), but Freestyle rendering results in a Python error shown below:

UnicodeDecodeError: 'mbcs' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

The error comes from Py_CompileString() called from python_script_exec() in source/blender/python/intern/bpy_interface.c.
The second argument of Py_CompileString() (the file name) must be a byte array in the default file system encoding (Py_FileSystemDefaultEncoding).
On the test platform, Py_FileSystemDefaultEncoding is "mbcs" (i.e., a Windows-specific class of encodings).

Apparently, G.main->name is a byte array in the UTF-8 encoding (please confirm this).
This variable indicates the fully qualified name of the .blend file being opened and manipulated by Blender.
The .blend file name is then embedded in a string (fn_dummy) to be passed as the 2nd argument of Py_CompileString().

In summary, the UnicodeDecodeError above results from an attempt in Py_CompileString() to decode a UTF-8 byte array as if
the byte array were in the MBCS encoding.

The attached patch is intended to fix the reported issue. Code review and testing are much appreciated.
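The mismatch described in this report can be reproduced in plain Python. The following is a minimal sketch, not code from the patch: the path is hypothetical, and the cp932 codec stands in for "mbcs", which exists only on Windows. It takes the UTF-8 bytes of a Japanese path and decodes them once with the matching codec and once with the wrong one.

```python
# Hypothetical .blend path containing Japanese characters.
PATH = "C:\\作業\\日本語.blend"

# Blender keeps the path as UTF-8 bytes (cf. G.main->name).
utf8_bytes = PATH.encode("utf-8")


def decode_as(raw, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Decoding with the codec the bytes were written in recovers the path.
right = decode_as(utf8_bytes, "utf-8")

# Decoding the same bytes as code page 932 either fails (None) or yields
# mojibake; in neither case is the original path recovered.
wrong = decode_as(utf8_bytes, "cp932")

print(right == PATH)  # True
print(wrong == PATH)  # False
```

Whether the wrong codec raises an error or silently produces garbage depends on the byte sequence, which is why the sketch only checks that the original path is not recovered.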

Author
Member

Changed status to: 'Open'


Campbell is the expert on this, so assigning to him.

G.main->name is indeed UTF-8, and the Python documentation mentions that Py_CompileString expects a string in the file system encoding, so this change makes sense to me. PyRun_File in the same function looks like it needs the same conversion to the file system encoding?

It would probably be good to split this into a utility function that converts a char* from UTF-8 to the file system default encoding so it can be reused in different places.

Author
Member

Hi Brecht and Campbell,

Thank you for the confirmation and positive response on this matter. Following Brecht's code review comments, I have updated the patch set and made it more comprehensive. There are actually three places in the code base where file names need to be in the file system default encoding (instead of UTF-8, as they are now). The identified string encoding inconsistency affects the following ways of executing Python scripts:

  • Execution of Freestyle scripts.
  • Execution of Python scripts using the command line option -P.
  • Execution of text datablocks (marked as Python modules) at start-up.
  • Reloading of imported modules using imp.reload().

The attached patch accounts for all these execution methods, allowing: (a) .blend files and Python scripts to reside in paths including non-ASCII characters; and (b) text datablocks to have non-ASCII ID names.

Since trunk revision 56454 is likely to be included in the upcoming 2.67 release, file loading from non-ASCII paths will work, and the reported issue will then affect many uses of Blender, including Freestyle rendering. A timely fix would hence be greatly appreciated.


About the PyC_EncodeFSDefault implementation: I found that there is a PyUnicode_EncodeFSDefault in the Python API, which is already used in Blender. The implementation is here: http://hg.python.org/cpython/file/56ca8eb5207a/Objects/unicodeobject.c

Perhaps we should just always call that instead to be sure it does the right thing, since your version seems to be a bit different? I'm not sure how much using e.g. "surrogateescape" instead of "strict" matters in practice, but it may matter in some corner cases.

Also, the strcpy should be replaced by BLI_strncpy.
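To illustrate the difference between the "strict" and "surrogateescape" error handlers mentioned above, here is a small Python sketch. It is only an illustration at the Python level, not the C code under review; the byte string is a made-up example of invalid UTF-8, and os.fsencode() is shown as the Python-level counterpart of encoding a string with the file system encoding.

```python
import os

# Hypothetical file name bytes that are NOT valid UTF-8 (0xFF never occurs in UTF-8).
junk = b"caf\xff"


def try_strict(raw):
    """Decode with the 'strict' handler; return None when the bytes are invalid."""
    try:
        return raw.decode("utf-8", "strict")
    except UnicodeDecodeError:
        return None


# 'strict' refuses the invalid byte outright.
strict_result = try_strict(junk)

# 'surrogateescape' smuggles the invalid byte through as a lone surrogate ...
escaped = junk.decode("utf-8", "surrogateescape")

# ... so that encoding with the same handler restores the original bytes.
round_trip = escaped.encode("utf-8", "surrogateescape")

print(strict_result)       # None
print(round_trip == junk)  # True

# For a plain ASCII name, encoding with the file system encoding is a no-op.
print(os.fsencode("plain.blend"))  # b'plain.blend'
```

In other words, "strict" turns junk input into an error the caller has to handle, while "surrogateescape" preserves the junk bytes so they survive a round trip.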

Author
Member

Hi Brecht and Campbell,

Thanks for the comments. I agree that PyUnicode_EncodeFSDefault would be better. BLI_strncpy is now also used instead of strcpy. I hope the updated patch set is okay now.

Member

Added subscriber: @IRIEShinsuke

Member

Any update on this? Many users want the problem to be resolved.

Author
Member

I keep seeing many Japanese Blender users suffering from this issue.

For ease of further tests and code review I have just added D267 (https://archive.blender.org/developer/D267). Any action on the patch set is much appreciated.


Checking on this. Note that file paths in Blender aren't strictly UTF-8; we already need to support RNA file paths that are not UTF-8.

Author
Member

I am afraid I missed the point. What do you mean by not being strictly UTF-8?

The only requirement for the patch is that G.main->name and the ID name of text data blocks be encoded in UTF-8.
Are there cases where they are not encoded in UTF-8? If so, how do we know the encoding?

Author
Member

I had a conversation with Campbell on IRC. Here is a brief summary of the discussion.

The main concern is that the PyC_EncodeFSDefault() function I wrote may return NULL when the given string is not valid UTF-8 and the conversion to the file system default encoding therefore fails. Since callers of the function may not expect NULL, it would be nicer to return some string instead of NULL. Recall that the input string to PyC_EncodeFSDefault() is bpy.data.filepath + os.sep + text.name, and that it is the first part that may cause the encoding conversion to fail. Two options were discussed:

  1. PyC_UnicodeFromByte() may provide a solution.
  2. Just omit the troublesome bpy.data.filepath and return text.name as a fallback.

Another concern is that some paths cannot be opened with byte strings at all (cf. BLI_fopen() and intern/utfconv/utf_winfunc.c). This may happen and there is no obvious solution for it.

Author
Member

Correction:

> Another concern is that some paths cannot be opened with byte strings at all (cf. BLI_fopen() and intern/utfconv/utf_winfunc.c).

I meant on Windows.

Member

Added subscriber: @MartijnBerger

Member

Added subscriber: @Lockal

Author
Member

I have updated D267 (https://archive.blender.org/developer/D267) based on the discussions with Campbell concerning a junk UTF-8 string as input. Now the helper function PyC_EncodeFSDefault() (added by the patch) always returns a valid string rather than NULL, so that callers of the function won't fail even when they don't expect NULL.

The chosen solution here is just to return the input string as it is when encoding conversion cannot be done.

I have examined several other fallback solutions, including:

  • PyUnicode_DecodeLatin1()
  • PyUnicode_DecodeRawUnicodeEscape()
  • PyUnicode_DecodeUnicodeEscape()

for conversion from bytes to Unicode, and

  • PyUnicode_AsLatin1String()
  • PyUnicode_AsRawUnicodeEscapeString()
  • PyUnicode_AsUnicodeEscapeString()

for conversion from Unicode to bytes. The Latin-1 decoder and encoder pass through all bytes, so they are equivalent to a simple copy of the input string to the output. The UnicodeEscape and RawUnicodeEscape functions additionally handle backslash-escaped characters such as \xNN and \uNNNN. Since backslash is a directory separator on Windows, using these functions may further complicate error conditions.
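The two observations about these codecs can be checked from Python, where the "latin-1" and "unicode_escape" codecs correspond to the Latin-1 and UnicodeEscape C functions listed above. This is only a sketch; the Windows path is a made-up example.

```python
# Latin-1 maps every byte value 0..255 to the code point of the same number,
# so decoding and re-encoding is equivalent to a plain copy.
all_bytes = bytes(range(256))
latin1_round_trip = all_bytes.decode("latin-1").encode("latin-1")
print(latin1_round_trip == all_bytes)  # True

# Hypothetical Windows path as raw bytes: C:\new\table.blend
win_path = b"C:\\new\\table.blend"

# The unicode_escape codec interprets backslash sequences, so the directory
# separators followed by 'n' and 't' become a newline and a tab.
escaped = win_path.decode("unicode_escape")
print(repr(escaped))  # 'C:\new\table.blend'
```

The second result no longer contains any backslash, which shows how an escape-based codec can silently corrupt a Windows path.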

Since a junk UTF-8 string cannot be a valid path anyway, it looks like the only possibility is to give up encoding conversion and fall back to BLI_strncpy().
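The fallback behaviour described in this comment can be sketched in Python as follows. This is an illustration of the idea only, not the C helper from the patch: the function name encode_fs_default and the fs_encoding parameter are hypothetical, and cp932 is used as an assumed stand-in for the Windows "mbcs" encoding.

```python
def encode_fs_default(raw, fs_encoding="cp932"):
    """Convert UTF-8 bytes to `fs_encoding`, or return them unchanged on failure.

    Hypothetical sketch: the result is always a byte string, never None,
    so callers do not need a separate error path.
    """
    try:
        text = raw.decode("utf-8")        # fails for junk (non-UTF-8) input
        return text.encode(fs_encoding)   # fails if a character has no mapping
    except UnicodeError:
        return raw                        # give up and pass the input through


# A valid UTF-8 path is converted to the target code page.
good = encode_fs_default("日本語.blend".encode("utf-8"))
print(good == "日本語.blend".encode("cp932"))  # True

# Junk bytes come back unchanged instead of raising or returning None.
junk = encode_fs_default(b"\xff\xfe")
print(junk)  # b'\xff\xfe'
```

Returning the input unchanged does not make a junk path usable, but it keeps the failure at the later point where the file is actually opened rather than at the conversion step.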

Tamito Kajiyama changed title from Fix for Freestyle failure with .blend files loaded from paths including non-ASCII characters to Fix for Python failure with .blend files loaded from paths including non-ASCII characters 2014-02-13 14:23:42 +01:00
Author
Member

Just changed the title, since the problem being addressed here is not Freestyle specific but related to Python in general.


I've been looking into this bug and oddly enough I can't redo it (my fs encoding is mbcs), but I've tried to use text which isn't mbcs compatible and it's not giving a Python error.

I'd still rather avoid adding new string conversion functions, since we can get bpy.data.filepath already.

Note, we already ran into similar problems here.
http://bugs.python.org/issue9713

Committed change to merge text compile into a single function.

Attached a patch which I think fixes the problem for the text editor, by using Py_CompileStringObject so we can convert the string into a PyObject first (using the same method used to get bpy.data.filepath).

@kjym3 could you check if this works for you?

unicode_text.diff (https://archive.blender.org/developer/F77245/unicode_text.diff)

Note, this doesn't deal with bpy_interface.c, just compiling text.

Author
Member

@ideasman42
The changes in unicode_text.diff look okay. Now that we rely on Py_CompileStringObject(), which takes the file name as a Unicode object, we don't have to deal with the file system default encoding.
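The same idea is visible at the Python level: the built-in compile() takes the file name as a str (a Unicode object), so no byte encoding of the name is involved. A minimal sketch follows; the source text and the dummy file name built from a .blend path plus a text datablock name are hypothetical.

```python
# Hypothetical script body, standing in for the contents of a text datablock.
source = "answer = 6 * 7\n"

# Hypothetical dummy file name: .blend path + separator + text datablock name.
fake_name = "C:\\作業\\日本語.blend\\Text"

# The file name is passed as a str, so it never has to be encoded to bytes.
code = compile(source, fake_name, "exec")

namespace = {}
exec(code, namespace)

print(code.co_filename == fake_name)  # True
print(namespace["answer"])            # 42
```

The file name is stored unchanged on the code object and is used only for tracebacks and similar diagnostics, which is why a dummy name works here.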

For Freestyle, we have to make similar changes to python_script_exec() called from BPY_text_exec().

Note that Py_CompileStringObject() is new in Python 3.4. Windows binaries of this version are not in the lib svn repository. I just built Python 3.4c1 myself from the tarball using VS 2008 to test the patch.

Below is a revised version of your patch set, including your original changes plus mine to get Freestyle working with non-ASCII file paths.

unicode_text_v2.diff (https://archive.blender.org/developer/F77515/unicode_text_v2.diff)


So OK to postpone this fix until 2.71? (or whenever we bundle Py3.4).

Author
Member

Yes (personally I prefer to have this fix asap though).


Added subscriber: @ThomasDinges


As we won't include Python 3.4 for Blender 2.70, removing Blender 2.70 here.


This issue was referenced by blender/blender-addons-contrib@4d1a109dde


This issue was referenced by 4d1a109dde


Changed status from 'Open' to: 'Resolved'


Closed by commit 4d1a109dde.

Author
Member

I just realized that part of the problem has been left unaddressed. Please consider reviewing the patch just added to the task.

Author
Member

Changed status from 'Resolved' to: 'Open'

Author
Member

py-exec-fix-test.blend (https://archive.blender.org/developer/F94103/py-exec-fix-test.blend)

Here is a quick .blend file for testing. Just press the "Run Script" button in the text editor. Without the proposed fix you will see a stack trace printed in the console (after the completion of the script execution).


@kjym3, I can't redo the bug (MSVC2013, Windows 7).

I tried to run and to register, and both work fine: they print "hello" with no exception.

EDIT: somehow missed D595 (https://archive.blender.org/developer/D595), looks good.


Changed status from 'Open' to: 'Resolved'


Committed, resolved.

Author
Member

Great, thanks!


Added subscriber: @TonyMullen


I am experiencing a related problem in the latest buildbot build as of Jun 19 (b49e6d0), and in all previous builds as far back as 2.65 that I've tried, on Japanese Windows 8 on a Vaio Ultrabook. Python fails on startup and I am unable to find any workaround to run Blender with its menus, etc. intact (I've tried putting the Blender install directly under C:\ so as to avoid Japanese characters in the path, but no luck).

This also gives me the same error as described above in the console:

UnicodeDecodeError: 'mbcs' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

Should I open a new report for this?

Tony

FWIW, I'm adding the screenshot I took of the traceback.

blenderscreenshot.PNG (https://archive.blender.org/developer/F95316/blenderscreenshot.PNG)

Author
Member

Exactly the same error occurs when the name of the user running Blender contains non-ASCII MBCS-compatible characters such as those in Japanese. Patch D604 (https://archive.blender.org/developer/D604) is intended to address this issue. The reported problem is different from what #35176 was meant to address, so filing another bug report would be appreciated.


Ok, thanks. Created a separate bug report.

Reference: blender/blender#35176