
Text rendering: Add full support for complex scripts layout (like devanagari, thai, etc.)
Confirmed, Low | Public | TO DO

Description

There has been a steady (low-level) stream of reports for years now from users wanting to write in Blender in their own language, which requires complex script layout.

This is currently not possible in Blender; we only have very basic layout code. Some scripts (like Arabic) can partially work around the situation by pasting 'pre-processed' strings that use pre-laid-out Unicode code points instead of the 'raw' ones, but this means people still cannot type directly in Blender in those languages. And this workaround is not available for other scripts, like many from South and South-East Asia (India, Cambodia…).
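To illustrate the workaround described above, here is a small Python sketch. It assumes the 'pre-processed' strings rely on the Arabic Presentation Forms code points (the task does not name them explicitly), and the helper names `describe` and `to_raw` are only for illustration.

```python
import unicodedata

RAW_BEH = "\u0628"        # ARABIC LETTER BEH, what a keyboard normally produces
PRESHAPED_BEH = "\ufe91"  # ARABIC LETTER BEH INITIAL FORM, a pre-shaped variant


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def to_raw(text):
    """Fold pre-shaped presentation forms back to the raw letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)


print(describe(RAW_BEH))
print(describe(PRESHAPED_BEH))
print(to_raw(PRESHAPED_BEH) == RAW_BEH)
```

A simple per-character renderer can draw the pre-shaped code point as is, which is why pasting such strings partially works, while typed text arrives as raw letters that still need shaping.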

The most obvious solution currently would seem to be replacing our current layout code with HarfBuzz, probably together with FriBidi.

This is not a huge amount of work, but it is likely not simple either, and though most of it should be limited to the BLF area of the code, it could require some changes elsewhere too. A first step would be to try a quick proof-of-concept, to see how well that could go…

See also D1809: Complex Text Layout, an old patch adding support to complex scripts.

Event Timeline

Bastien Montagne (mont29) lowered the priority of this task from 90 to Low. Jul 22 2019, 4:11 PM
Bastien Montagne (mont29) created this task.

I know that this is not a high-priority task, but I still think we should consider it at some point. It would also be nice to be able to refer to it when closing the related bug reports we keep getting (like the latest one, T67438: Error Font), and as a basis if someone volunteers to work on it. Might even make a nice GSoC project?

If the User Interface project admins agree with it, this should be listed in T63726: User Interface Module.

Will Text object language support be possible?

I'm fine with this being added on the UI module page.

@zdy (NGENNGT) what would be the point/usage for such a setting?

@Brecht Van Lommel (brecht) thx, will do then. :)

Bastien Montagne (mont29) renamed this task from Text rendering: Add full support for complex scripts layout (like devanagri, thai, etc.) to Text rendering: Add full support for complex scripts layout (like devanagari, thai, etc.). Jul 22 2019, 4:41 PM

Mentioning T62011, an add-on to write text in reversed order.

Unknown Object (User) added a subscriber: Unknown Object (User). Jul 27 2021, 12:06 AM

Pleeeeeaaaaaaaase, someone do something about it.
Isn't it possible to copy or take inspiration from other open-source software that supports it, such as Inkscape, GIMP, Scribus (version 1.5.x), LibreOffice, etc.?
This is the ONLY reason that Blender is not YET my primary software for video editing. It would be, if I could write Arabic/Persian text without preprocessing every single sentence with other tools.
It'll be AWESOME if this is fixed.

This isn’t forgotten. Ideally this would be done by a developer who is fluent in a language that uses complex shaping and is also familiar with Blender source and the relevant libraries needed to do this.

This is also on my own very long “to do” list of text-related Blender changes, but it might take me a year or more to get to text direction and shaping, depending on how many changes I can get accepted per release. For this issue, the work is mostly about getting our code into a position where we can output arrays of multiple glyphs rather than one code point at a time as we do now.

@Harley Acheson (harley) just came across this thread and wondering what is its status?

I used to work on ICU4C & ICU4J open source bidirectional engine for Arabic right to left and characters shaping back in the days, so thought to check if this would help in the Arabic issues in any way. It is a bit old, but I think I still remember how this used to work with the Unicode bidi algorithm, and I could start adding Arabic support to Blender, but I will need someone to direct me on how to start.

@Ayman Roshdy (ayman.roshdy) - just came across this thread and wondering what is its status?

The status is best described as "a few English speakers slowly trying to improve our international text handling".

This project as a whole has fairly complex needs and wants for text output, and only a few people familiar with (or interested in) this area of code, none of them native users of non-Latin scripts. I have a rough timeline for changes to our text output, that includes getting to bidirectional output and complex shaping here: https://developer.blender.org/T94030

The above is mostly me trying to clean up our text output code and simplify it before adding new features that I want (or need) to add before complex shaping. One notable prior step is D12622: BLF: Fallback Font Stack, as that will free us from using our current monolithic metafonts. We want all language glyphs available to all users, regardless of output language selection, but the way we currently do that makes it harder to add new languages and makes it more difficult for users to use their own fonts.

Another notable prerequisite is D13137: BLF: Implement FreeType Caching, for fairly technical reasons. We require fast text output, so we cache rendered glyphs. But we currently save these indexed by character code, while we will need fast access by glyph index once we do text shaping. It is easy enough to index our glyphs by glyph index, but then we lose quick lookups from charcode to glyph index. Rather than use the (slow) FT_Get_Char_Index all over, I'd rather move to FreeType caching and use their FTC_CMapCache_Lookup. Using FreeType caching also fits well with the use of fallback fonts above, since fonts that are never used would never have to be fully loaded, or could be loaded and discarded when not needed.
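The indexing problem described above can be sketched in Python as a two-level cache. This is only an illustration of the idea, not Blender's BLF code or the FreeType cache API: `GlyphCache`, `fake_char_to_index`, `fake_render` and `FAKE_CMAP` are hypothetical stand-ins, with the slow callback playing the role of a direct charcode-to-glyph-index lookup.

```python
class GlyphCache:
    """Two-level cache: charcode -> glyph index, then glyph index -> glyph."""

    def __init__(self, char_to_index, render):
        self._char_to_index = char_to_index  # slow font lookup (assumed)
        self._render = render                # rasterises one glyph (assumed)
        self._cmap = {}                      # charcode -> glyph index
        self._glyphs = {}                    # glyph index -> rendered glyph
        self.slow_lookups = 0

    def glyph_index(self, charcode):
        """Map a charcode to a glyph index, hitting the font only once."""
        if charcode not in self._cmap:
            self.slow_lookups += 1
            self._cmap[charcode] = self._char_to_index(charcode)
        return self._cmap[charcode]

    def glyph(self, index):
        """Fetch a rendered glyph by glyph index (what a shaper hands back)."""
        if index not in self._glyphs:
            self._glyphs[index] = self._render(index)
        return self._glyphs[index]

    def glyph_for_char(self, charcode):
        """Unshaped path: still cheap, via the charcode -> index cache."""
        return self.glyph(self.glyph_index(charcode))


# Hypothetical font data, for demonstration only.
FAKE_CMAP = {0x20: 1, 0x41: 36}


def fake_char_to_index(charcode):
    return FAKE_CMAP.get(charcode, 0)  # index 0 stands for the missing glyph


def fake_render(index):
    return {"index": index, "advance": 10}


cache = GlyphCache(fake_char_to_index, fake_render)
first = cache.glyph_for_char(0x41)
second = cache.glyph_for_char(0x41)
print(first is second, cache.slow_lookups)
```

Keying the glyph store by glyph index lets shaped output be looked up directly, while the small charcode map keeps the existing per-character path from paying for the slow lookup on every call.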

Once our code is in nicer shape, is using a fallback font stack, and is using FreeType caching, it would be a nice time to handle bidirectional output and text shaping. And we could probably use some help then. At that point we'd basically be changing from our current "output UTF-8 strings, one charcode at a time" to converting runs of UTF-8 characters that use the same Unicode block to arrays of charcodes that are optionally reordered (probably by FriBidi), then converted to arrays of glyph indexes and advances by HarfBuzz, and then output.

I used to work on ICU4C & ICU4J open source bidirectional engine for Arabic right to left and characters shaping back in the days, so thought to check if this would help in the Arabic issues in any way...

Does ICU still include an implementation of the bidirectional algorithm and a text shaping engine? For some reason I thought it had these things but then dropped them in favor of the use of FriBidi and HarfBuzz. I could certainly be wrong. I ALSO want to add ICU to our project anyway for other internationalization issues like proper Unicode sorting.

Are there ways that you can see yourself getting involved and helping with this? There would be ways to help that could range from simple testing and feedback all the way to coding.

@Harley Acheson (harley) it seems you will have a very busy schedule ahead, good luck with that :)

ICU still has a full implementation of the bidirectional algorithm. Back in the days I used to work at IBM Egypt, and we had a project to contribute to ICU both the bidirectional algorithm and the Arabic shaping engine that we used to have in IBM. I ported the shaping part myself inside the ICU4C code. In short, ICU takes an array of characters (or a string), handles the Arabic RTL and the Shaping then returns the new string, which I believe matches your target. Arabic specifically can't be handled as one character at a time, because the shape of each character differs based on the character that comes before it and the character that comes after it. On top of that there are somewhat more complex cases like the Lam + Alef combinations, and what we call in Arabic Tashkeel: small glyphs that appear above or below the characters to express the Arabic grammar correctly. I can say that all of this will be a bit of science fiction for non-Arabic-speaking developers!
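To make the point about context-dependent letter shapes concrete, here is a minimal Python sketch. It is not ICU's shaping code or HarfBuzz: it assumes that picking among the isolated, initial, medial and final presentation forms listed in the Unicode database is a fair approximation, and it deliberately ignores Tashkeel and the Lam + Alef ligatures mentioned above. The helper names `form`, `joins_forward`, `joins_backward` and `shape` are hypothetical.

```python
import unicodedata


def form(ch, kind):
    """Return the presentation form of `ch` (e.g. kind="INITIAL"), or None."""
    try:
        return unicodedata.lookup(f"{unicodedata.name(ch)} {kind} FORM")
    except (KeyError, ValueError):
        return None


def joins_forward(ch):
    """True if the letter can connect to the letter that follows it."""
    return form(ch, "INITIAL") is not None


def joins_backward(ch):
    """True if the letter can connect to the letter that precedes it."""
    return form(ch, "FINAL") is not None


def shape(word):
    """Pick a contextual form for each letter based on its two neighbours."""
    out = []
    for i, ch in enumerate(word):
        prev_ok = i > 0 and joins_forward(word[i - 1]) and joins_backward(ch)
        next_ok = (i + 1 < len(word) and joins_forward(ch)
                   and joins_backward(word[i + 1]))
        if prev_ok and next_ok:
            kind = "MEDIAL"
        elif prev_ok:
            kind = "FINAL"
        elif next_ok:
            kind = "INITIAL"
        else:
            kind = "ISOLATED"
        out.append(form(ch, kind) or ch)
    return "".join(out)


# SEEN, LAM, ALEF, MEEM in logical (typing) order.
shaped = shape("\u0633\u0644\u0627\u0645")
print([unicodedata.name(c) for c in shaped])
```

In this word the ALEF only connects backwards, so the MEEM after it falls back to its isolated form, which is the kind of neighbour dependency a one-character-at-a-time loop cannot express. A real shaper would additionally merge the LAM and ALEF into a single ligature glyph.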

For reference to the shaping engine, here you go, the code written in C: ushape.cpp
It is nice that I found my name still kept in the file after all of these years - this dates back to 2000 - 2001 time frame :)

Anyway, I'm willing to help by porting whatever code we need to support Arabic in Blender, and later with testing of course. One thing I tried just now: I opened "Blender -> File -> Open -> Blender File View" and tested what happens with a folder name written in Arabic. As expected, it is not displayed RTL and there is no shaping. If you can point me to the file in the Blender source code that handles this, I can try testing with ICU4C and see how we can make the 'Blender File View' display Arabic correctly. Once this test is complete, I believe we can go ahead and start porting (or actually calling) the same code everywhere that reads/writes text.

What do you think?

@Ayman Roshdy (ayman.roshdy) - ICU still has a full implementation of the bidirectional algorithm...

Nice!

Shaping then returns the new string, which I believe matches your target, Arabic specifically can't be handled as one character at a time...

I have rudimentary knowledge of output ordering and shaping, but no direct experience in implementing either. I do know that Arabic characters differ depending on word position, so our current text output will need quite a bit of work to get there. We have had some demonstration patches that show complex shaping working in our codebase, but not in a way that works well...

Just for background, defining some terms. All of the Unicode characters have set 32-bit values that I will call "charcodes" here. Each font has a number of glyphs that are in an order that does not match charcodes. For example, the glyph with index 1 in one font might be for charcode U+0020, while the 60th glyph in some other font could be for charcode U+0904.

We currently start with a UTF-8 string, loop through it one UTF-32 character at a time, and treat each as a charcode. We look in our glyph cache for it (by charcode), grab the glyph if it is there or make it if not. That cached glyph also contains the advance value, so we then output the glyph and move by the advance, with some added complications to do with hinting and kerning.

The planned loop would instead go through our input UTF-8 string and group runs that are in the same default script. So a string containing English, Chinese, and then Arabic might turn into three separate runs. For each run we need to convert to an array of 32-bit charcodes, then give this to a bidirectional function to reorder as needed, based on the detected script. Then this array is sent to a shaper like HarfBuzz, which gives us back pairs of glyph indexes and advances. We then need to find our glyphs in our cache, output these, and move with the advances.
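The first part of that planned loop (grouping runs by script and turning each into an array of charcodes) could look roughly like the Python sketch below. It is only an approximation under loud assumptions: Python's standard library has no Script property, so `script_of` guesses the script from the first word of the Unicode character name, and the direction flag comes from the per-character bidi class rather than from the full bidirectional algorithm that FriBidi implements. All function names here are hypothetical, and the shaping step is left out entirely.

```python
import unicodedata


def script_of(ch):
    """Heuristic script guess (assumption): first word of the Unicode name."""
    if unicodedata.category(ch)[0] not in "LM":
        return None  # spaces, digits, punctuation attach to a neighbouring run
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def split_runs(text):
    """Group consecutive characters that share the same guessed script."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or runs[-1][0] in (None, script)):
            prev_script, chars = runs[-1]
            runs[-1] = (prev_script or script, chars + ch)
        else:
            runs.append((script, ch))
    return runs


def is_rtl(chars):
    """True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(c) in ("R", "AL") for c in chars)


def prepare_runs(text):
    """Turn text into per-script runs of charcodes, ready for a shaper."""
    prepared = []
    for script, chars in split_runs(text):
        prepared.append({
            "script": script,
            "rtl": is_rtl(chars),
            # Kept in logical order; reordering and shaping would follow here.
            "charcodes": [ord(c) for c in chars],
        })
    return prepared


# English, Chinese, then Arabic, as in the example above.
for run in prepare_runs("Hi \u4e2d\u6587 \u0633\u0644\u0627\u0645"):
    print(run["script"], run["rtl"], [hex(c) for c in run["charcodes"]])
```

Each resulting run would then be handed to the bidirectional and shaping libraries, which return the glyph indexes and advances used for the cache lookup and output steps.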

That's my current understanding of the process anyway. I might have something wrong or have to alter my thoughts as I get there. But the endgame is having a user running Blender in Spanish yet being able to enter a text string containing a mixture of Japanese, Devanagari, and Arabic and have it not only display correctly but also select and edit correctly. And be able to convert the entire thing to curves to make 3D shapes from them too.

What do you think?

I think you might come in handy! You should probably start by getting the Blender source compiling. There are good instructions for that here: https://wiki.blender.org/wiki/Building_Blender

Once you are compiling, a good start would be looking at some of my own patches to do with these things: https://developer.blender.org/T94030. Most of the meat of our text output that will be affected by text shaping is touched by those patches, mostly blf.c and blf_glyph.c.

If you follow the link in my name you will find lots of ways to get in touch, with blender.chat being the most convenient. I am at UTC-8 though, so might not always be available, depending on your own location.

Seems like a good algorithm to handle text with different languages, and BTW if we are to go with ICU4C then we don't really need Harfbuzz as ICU comes with its own shaping engine

I will start by compiling the Blender source and getting a build environment up and running. I have already downloaded the latest ICU4C and started to refresh my memory, and will catch you on blender.chat later once I have played a bit with the code. I am at UTC+2, but I don't have a problem connecting with you during my night, which will probably be your morning to noon, since I will probably be working on this at night anyway.

Chat with you again soon
Have a nice day

@Ayman Roshdy (ayman.roshdy) - if we are to go with ICU4C then we don't really need Harfbuzz as ICU comes with its own shaping engine

I'm seeing that as being deprecated in ICU 54 and removed in version 58. Now at version 70.1

https://unicode-org.github.io/icu/userguide/layoutengine/

@Harley Acheson (harley) I didn't see this comment in version 58; I only downloaded the 70.1 version, searched for the code, and found it there. Anyway, let's think about which shaping engine to use after I am more familiar with the code. Working on this step now.

Whenever I want to write Persian text in left-to-right software (e.g. Unity or Blender) I use this site: http://bobardo.com/reshaper/
The algorithm and conversion code on this site are written in JavaScript, and the code is visible by inspecting the page. The site also works offline, so you can download it and read it in an IDE. Maybe reverse engineering its algorithm for converting LTR to RTL would help with what you're doing... I don't know.