i18n messages extraction script: fix handling of C unicode-escapes.

rB1f5647c07d15 introduced, for the first time, a unicode escape in a string to be translated that is extracted directly from the C code itself. This revealed that the current code did not handle this case properly. For now, we work around it using Python's `raw_unicode_escape` encoding/decoding.
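A minimal sketch of the round trip used by the workaround, outside of the extraction script. The helper name `unescape_c_unicode` and the sample strings are hypothetical and only serve to illustrate the behavior of the codec; they are not part of the patch.

```python
def unescape_c_unicode(s: str) -> str:
    """Turn literal C-style unicode escapes found in `s` into real characters.

    Hypothetical helper: it isolates the encode/decode round trip so it can
    be tried on its own.
    """
    # Encoding keeps backslashes as-is and writes characters above U+00FF as
    # backslash-u escapes; decoding then interprets every such escape,
    # including the ones that were already present as literal text.
    return s.encode('raw_unicode_escape').decode('raw_unicode_escape')


# A raw string stands in for text read from a C source file: it holds a
# literal backslash, 'u' and four hex digits, not the character itself.
extracted = r"Size \u2715 2"
cleaned = unescape_c_unicode(extracted)

print(len(extracted), len(cleaned))  # the six-character escape collapses to one character
print(cleaned == "Size \u2715 2")

# Other backslash sequences and existing non-ASCII text pass through unchanged.
print(unescape_c_unicode(r"a\nb") == r"a\nb")
print(unescape_c_unicode("café") == "café")
```

Only the `\u` and `\U` forms are interpreted by this codec, so other C escapes in the extracted text are left exactly as they were.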
2021-02-22 18:29:52 +01:00 · 32073993a8
parent 46bdf6d59f
commit 32073993a8
1 changed file with 3 additions and 1 deletion
--- a/release/scripts/modules/bl_i18n_utils/bl_extract_messages.py
+++ b/release/scripts/modules/bl_i18n_utils/bl_extract_messages.py
@@ -735,7 +735,9 @@ def dump_src_messages(msgs, reports, settings):
    _clean_str = re.compile(settings.str_clean_re).finditer

    def clean_str(s):
-        return "".join(m.group("clean") for m in _clean_str(s))
+        # The encode/decode to/from 'raw_unicode_escape' allows to transform the C-type unicode hexadecimal escapes
+        # (like '\u2715' for the '×' symbol) back into a proper unicode character.
+        return "".join(m.group("clean") for m in _clean_str(s)).encode('raw_unicode_escape').decode('raw_unicode_escape')

    def dump_src_file(path, rel_path, msgs, reports, settings):
        def process_entry(_msgctxt, _msgid):