Discussion and questions on Unicode Han Unification

Ange Gapes
Hello,

Sorry, this is not directly about bugs in FreeFont or about development matters, but I could not find a more general mailing list for your project. I think this kind of discussion is still of interest, and hopefully you will think so too.

I recently became interested in the Han unification project and the problems it implies for texts that mix languages. As you are a font project, I guess you know the issues, but for those who don't, I summarize them this way: three main languages (Chinese, Japanese, and Korean, hence CJK, though Korean does not use them much in modern writing) use Chinese-originated characters (Han characters), and the Unicode project decided to unify characters with the same origin (Han unification: Unihan). This leads to problems when the actual way of writing them differs from country to country, sometimes slightly (style), sometimes in a more obvious way. The Wikipedia page has good examples of the issue: http://en.wikipedia.org/wiki/Unihan#Examples_of_language_dependent_characters (this is meaningful only if you have the right fonts installed, ones that actually show the characters with their differences).

The way it is dealt with is:
- if you use only one of these languages, you don't care, and simply pick fonts that display your chosen language's forms.
- if you read texts in several languages, or even mixed within the same text, the text can carry some kind of markup so that different fonts are selected. This is how it is done in HTML, which is why you can see different fonts for the very same Unicode character on the Wikipedia page I linked above.

But what about a raw text file without markup, for instance? There is no reliable way for the editor to tell the language, and the mixed characters won't display correctly.
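In HTML the language is carried by markup such as the `lang` attribute, not by the characters themselves. The following minimal Python sketch, using only the standard library's `html.parser`, pulls (language, text) runs out of a small sample snippet; the class name `LangRunCollector` and the sample markup are illustrative, not taken from any real page. Both runs contain the same code point, U+76F4, and differ only in the tag a browser would use to choose a font; strip the markup and that hint is gone.

```python
from html.parser import HTMLParser


class LangRunCollector(HTMLParser):
    """Collect (lang, text) runs; the innermost lang attribute wins.

    Assumes well-formed markup without void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.lang_stack = []  # language in effect for each open element
        self.runs = []        # collected (lang, text) pairs

    def handle_starttag(self, tag, attrs):
        inherited = self.lang_stack[-1] if self.lang_stack else None
        # An element without its own lang attribute inherits its parent's.
        self.lang_stack.append(dict(attrs).get("lang", inherited))

    def handle_endtag(self, tag):
        if self.lang_stack:
            self.lang_stack.pop()

    def handle_data(self, data):
        lang = self.lang_stack[-1] if self.lang_stack else None
        self.runs.append((lang, data))


collector = LangRunCollector()
collector.feed('<p><span lang="ja">直</span> <span lang="zh-Hans">直</span></p>')

# Keep only runs with visible text and show which code point each carries.
han_runs = [(lang, text) for lang, text in collector.runs if text.strip()]
for lang, text in han_runs:
    print(lang, f"U+{ord(text):04X}")
# ja U+76F4
# zh-Hans U+76F4
```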

So why am I telling you all this? I would like to know your opinion, if not your position, on this Unicode decision. Do you have any remarks on it?
Also, what does it mean for a project like yours? Is it possible, within the same font family, to provide several different designs for the same character along with "context" information (= this design is preferred for Chinese display only, unless there is no other choice; this one for Japanese; and so on), plus perhaps a default one (if no context is available, use this "generic" design)? That way, software using only your font could still display different designs depending on the language being displayed (if it knows it), or a default version otherwise...

On a side note, I read somewhere that similar problems may arise with other kinds of characters. In particular, I read on a website about Arabic characters being used in several countries/languages but displayed slightly differently. Yet after some searching, I could not find actual information on this specific issue, so I don't know whether it is true, or whether the Unicode project has since fixed it by assigning specific characters, or control characters, to change the display. (Arabic doesn't have nearly as many characters as the East Asian languages, so there is less of a space issue in duplicating characters.)
Do you know about such a specific Arabic-character issue? Or about similar issues with glyphs in other alphabets?
Do you participate in Unicode standardization? Do you have details on what led to this internally? Is it really ONLY a space problem? Because even though these countries certainly have a lot of characters, it looks to me like there are still plenty of unassigned slots, far more than enough (that's how Unicode was designed after all: with enough slots for all of history, as far as I know). So I don't see the point of keeping them free for no reason (it's not as if new alphabets of hundreds of thousands of brand-new characters will suddenly be created in the next century).
And in the worst case, Unicode can still be extended.
So if you have any particularly interesting links to discussions in the Unicode project (mailing lists, maybe?) about how this came about, those would be interesting as well.
I will also ask the Unicode people directly later, but I first wanted to know the opinion of a font project whose goal is to span all of Unicode. What does it imply for you?

And on a second level, why do I ask all this? First of all, simply because I am interested in Unicode and in such questions, for personal use but also out of pure intellectual interest (among other reasons, I am myself involved in standardization processes, though not directly in Unicode, for now at least). Also because I think this is pretty sad: when I read about it, I didn't agree much with such moves (the prime goal of Unicode was to support every existing character, so this looks like a step backwards; and also because we know that some countries, Japan at least as far as I know, are not very keen on standardization, so they don't use the Unicode encodings such as UTF-8 much, preferring localized encodings, and this kind of move won't make them want to change).
And also because I am currently beginning to write what may become a book some day; not on this topic in particular, but this kind of topic may be part of it.
So thanks, all. Any opinion and information on the topic would be greatly appreciated.

Ange

P.S.: And for personal use, one last question: do you plan on supporting these East Asian characters in the foreseeable future? In particular, Japanese Hiragana, Katakana, and Kanji, and the basic Korean alphabet?

Re: Discussion and questions on Unicode Han Unification

Steve White-9
Hi Ange,

Sorry for the long delay. I can't go into a lot of detail, although I
could say a lot here.
I'm just generally a little pressed for time right now.

At this time, I'll just answer your last (P.S.) question, briefly.

No, there are no immediate plans to support CJK scripts in FreeFont.

The problem has to do with balancing resources against quality. For
me to do the work myself is out of the question: any of these
languages would hugely increase the size of the project. Regarding
quality, I have looked at a lot of free CJK fonts, to see if I could
simply drop them in, but none has satisfied me. Maintaining the level
of technical quality of the glyphs that we are trying to achieve with
the font would involve glyph-by-glyph editing: months of work.

Another consideration is: what is really the point of having these
scripts in a single font file? As I wrote in the article about policy,
  http://www.gnu.org/software/freefont/articles/Why_Unicode_fonts.html
the point of such a family is to provide a simple means of having
mixed writing systems (and symbols) that look pretty good together in
some sense. This is fairly meaningful for alphabetic scripts, but for
mixed alphabetic and ideographic scripts, I have my doubts. These
days, with font renderers that automatically find characters in
installed fonts, it seems to me to be of less importance.
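A much simplified Python model of that per-character fallback, under stated assumptions: the coverage tables are hypothetical (FreeSerif's real coverage is far larger, and "SomeCJKFont" stands in for any installed CJK font), and `pick_font` and `segment` are illustrative names. Each character goes to the first font in the preference list that covers it. Real systems such as fontconfig also weigh language and style when matching; this sketch ignores that, which is also why, in unmarked text, Han characters simply take whatever regional style the first covering font happens to have.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical coverage tables; a real renderer reads these from each font's cmap.
FONT_COVERAGE: Dict[str, Set[int]] = {
    "FreeSerif": set(range(0x0020, 0x0250)),  # illustrative: Basic Latin to Latin Extended-B
    "SomeCJKFont": set(range(0x3040, 0x3100)) | set(range(0x4E00, 0xA000)),  # kana + core Han
}


def pick_font(ch: str, preferred: List[str]) -> Optional[str]:
    """Return the first font, in preference order, that has a glyph for ch."""
    cp = ord(ch)
    for name in preferred:
        if cp in FONT_COVERAGE.get(name, set()):
            return name
    return None  # nothing covers it; a renderer would typically draw a .notdef box


def segment(text: str, preferred: List[str]) -> List[Tuple[Optional[str], str]]:
    """Split text into runs that can each be drawn with a single font."""
    runs: List[Tuple[Optional[str], str]] = []
    for ch in text:
        font = pick_font(ch, preferred)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)  # extend the current run
        else:
            runs.append((font, ch))  # start a new run
    return runs


print(segment("FreeFont 直", ["FreeSerif", "SomeCJKFont"]))
# [('FreeSerif', 'FreeFont '), ('SomeCJKFont', '直')]
```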

Cheers!


On Wed, Jan 26, 2011 at 7:52 AM, Ange Gapes <[hidden email]> wrote:

> [...]


Re: Discussion and questions on Unicode Han Unification

Ange Gapes
Hi,

On Thu, Feb 10, 2011 at 9:42 AM, Steve White <[hidden email]> wrote:
> Hi Ange,
>
> Sorry for the long delay. I can't go into a lot of detail, although I
> could say a lot here.
> I'm just generally a little pressed for time right now.

Sorry about my own (even longer) delay. I took your answer into consideration when it came in, but never took the time to reply myself!
Anyway, if you ever have time some day to answer the main questions about how Unicode works as a decision-making body, and why it makes the decisions it makes, don't hesitate.

Moreover, I recently had another thought: they regularly standardize icons whose use cases seem really doubtful (animals, people in funny costumes, etc.), and they do a ton of them. I guess that encoding custom kanji for the various countries using them would indeed take even more space, but it would also be a better use of that space, I would say. (I have nothing against it, though, if they really *also* want an icon representing a cake or smiling rabbits or whatever.)
 
> At this time, I'll just answer your last (P.S.) question, briefly.
>
> No, there are no immediate plans to support CJK scripts in FreeFont.
>
> The problem has to do with balancing resources against quality. For
> me to do the work myself is out of the question: any of these
> languages would hugely increase the size of the project. Regarding
> quality, I have looked at a lot of free CJK fonts, to see if I could
> simply drop them in, but none has satisfied me. Maintaining the level
> of technical quality of the glyphs that we are trying to achieve with
> the font would involve glyph-by-glyph editing: months of work.
>
> Another consideration is: what is really the point of having these
> scripts in a single font file? As I wrote in the article about policy,
>   http://www.gnu.org/software/freefont/articles/Why_Unicode_fonts.html
> the point of such a family is to provide a simple means of having
> mixed writing systems (and symbols) that look pretty good together in
> some sense. This is fairly meaningful for alphabetic scripts, but for
> mixed alphabetic and ideographic scripts, I have my doubts.

I see. I don't know if I fully agree, though. I guess you can still have some common style even between very different scripts (stroke width, a curved or square style, a handwritten or typographic style, exotic writing styles, and so on).

But I understand your point of view. Clearly, the need for 'é' to look like 'e' is not the same as the need for an alphabetic letter to share a common style with an ideogram mixed into the same text.

> These days, with font renderers that automatically find characters in
> installed fonts, it seems to me to be of less importance.

So doesn't that partly contradict your original aim of a generic font? (OK, you already answered that above, so disregard this question. I am teasing.)
 
> Cheers!

Anyway, thanks for the answer! :-)

Ange
 

Re: Discussion and questions on Unicode Han Unification

Steve White-9
Hi Ange,

You wrote a lot of stuff, and asked a lot of questions, and some of
the topics are to my mind sort of mixed.  Furthermore, your recent
reply makes me wonder about my understanding of the questions you were
asking... It may be we are misunderstanding one another.


On Wed, Jan 26, 2011 at 7:52 AM, Ange Gapes <[hidden email]> wrote:

> Hello,
>
> Sorry, this is not directly about bugs in FreeFont or about development
> matters, but I could not find a more general mailing list for your
> project. I think this kind of discussion is still of interest, and
> hopefully you will think so too.
>
> I recently became interested in the Han unification project and the
> problems it implies for texts that mix languages. As you are a font
> project, I guess you know the issues, but for those who don't, I
> summarize them this way: three main languages (Chinese, Japanese, and
> Korean, hence CJK, though Korean does not use them much in modern
> writing) use Chinese-originated characters (Han characters), and the
> Unicode project decided to unify characters with the same origin (Han
> unification: Unihan). This leads to problems when the actual way of
> writing them differs from country to country, sometimes slightly
> (style), sometimes in a more obvious way. The Wikipedia page has good
> examples of the issue:
> http://en.wikipedia.org/wiki/Unihan#Examples_of_language_dependent_characters
> (this is meaningful only if you have the right fonts installed, ones that
> actually show the characters with their differences).
>
> The way it is dealt with is:
> - if you use only one of these languages, you don't care, and simply pick
> fonts that display your chosen language's forms.
> - if you read texts in several languages, or even mixed within the same
> text, the text can carry some kind of markup so that different fonts are
> selected. This is how it is done in HTML, which is why you can see
> different fonts for the very same Unicode character on the Wikipedia
> page I linked above.
>
> But what about a raw text file without markup, for instance? There is no
> reliable way for the editor to tell the language, and the mixed
> characters won't display correctly.
>
> So why am I telling you all this? I would like to know your opinion, if
> not your position, on this Unicode decision. Do you have any remarks on
> it?

Overall, I think the choice was a good one. There were several issues
that had to be balanced.

From what I understand of your examples, I would say the Unicode
standard allows authors the latitude to handle the issues of shared
characters among scripts in more than one way, depending on their
needs. There may have been specific oversights, but frankly, I don't
grasp your objection.

In the case of a text file, the author has a choice of either
specifying a shared character that would be understood regardless of
the language context, or else a specially encoded alternative for the
character. It depends on what they want to do.
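One concrete Unicode mechanism for such a "specially encoded alternative" in plain text is the Ideographic Variation Sequence: a unified ideograph followed by one of the variation selectors VS17 to VS256 (U+E0100 to U+E01EF). The Python sketch below builds one for U+845B; whether that particular sequence is registered in the Ideographic Variation Database, and whether any installed font honors it, is outside what the sketch checks. It also shows why the older route, CJK compatibility ideographs, is fragile: normalization folds them back into a unified character.

```python
import unicodedata

base = "\u845B"        # 葛: one unified code point, drawn differently by region
vs17 = "\U000E0100"    # VARIATION SELECTOR-17, first of the ideographic selectors
ivs = base + vs17      # Ideographic Variation Sequence: base + selector

print([unicodedata.name(c) for c in ivs])
# ['CJK UNIFIED IDEOGRAPH-845B', 'VARIATION SELECTOR-17']

# Normalization keeps the selector, so the glyph request survives NFC...
assert unicodedata.normalize("NFC", ivs) == ivs

# ...and software that ignores selectors still sees the shared character.
IDEOGRAPHIC_VS = range(0xE0100, 0xE01F0)  # VS17..VS256
plain = "".join(c for c in ivs if ord(c) not in IDEOGRAPHIC_VS)
assert plain == base

# A CJK compatibility ideograph, by contrast, is canonically equivalent to a
# unified ideograph, so NFC silently replaces it.
compat = "\uF900"
nfc = unicodedata.normalize("NFC", compat)
print(unicodedata.name(compat), "->", unicodedata.name(nfc))
```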

What would you propose?


> Also, what does it mean for a project like yours?

The policy for FreeFont, and its rationale as a multi-script set of
fonts, is that it provides a set of characters that look OK together
in text of mixed writing systems.

If we were to support CJK, I'm fairly confident that there are
adequate technical means to handle all the issues you raise.

> Is it possible, within the same font family, to provide several different
> designs for the same character along with "context" information (= this
> design is preferred for Chinese display only, unless there is no other
> choice; this one for Japanese; and so on), plus perhaps a default one (if
> no context is available, use this "generic" design)? That way, software
> using only your font could still display different designs depending on
> the language being displayed (if it knows it), or a default version
> otherwise...

Yes.

A TrueType/OpenType "feature" of a font can indicate that the glyph
for a character ought to be replaced by another glyph when it appears
in the context of a specified script and language. The current
FreeFont has some instances of just this kind of thing.

The feature could specify that the glyph of a different Unicode
character, or even a glyph with no Unicode encoding, should serve as
the replacement.

It is up to the font rendering software to actually implement the
feature, though.

And this of course has no bearing on the case of encoding text files.
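In OpenType this is typically done with the `locl` (Localized Forms) GSUB feature, which applies substitutions only under a given script and language system, while the glyph mapped in the font's cmap remains the default design. The Python below only models that lookup with plain dictionaries; the `LOCL` table, `glyph_for`, and glyph names such as `uni76F4.jp` are hypothetical. In a real font the rules live in the GSUB table and are applied by the shaping engine (HarfBuzz, for example), not by application code. OpenType language system tags are four characters padded with spaces, hence "JAN " and "KOR ".

```python
from typing import Dict, Optional, Tuple

# Hypothetical 'locl'-style table: (OpenType script tag, language tag) ->
# {default glyph name: localized glyph name}. Glyph names are illustrative.
LOCL: Dict[Tuple[str, str], Dict[str, str]] = {
    ("hani", "JAN "): {"uni76F4": "uni76F4.jp"},
    ("hani", "KOR "): {"uni76F4": "uni76F4.ko"},
}


def glyph_for(ch: str, script: str = "hani", lang: Optional[str] = None) -> str:
    """Pick a glyph name for ch, honoring a language tag when one is known."""
    default = f"uni{ord(ch):04X}"  # the font's default ("generic") design
    if lang is None:
        return default  # no language context, e.g. a plain text file
    # Substitute only if this script/language pair has a rule for the glyph.
    return LOCL.get((script, lang), {}).get(default, default)


for lang in (None, "JAN ", "KOR ", "ZHS "):
    print(repr(lang), glyph_for("直", lang=lang))
# None uni76F4
# 'JAN ' uni76F4.jp
# 'KOR ' uni76F4.ko
# 'ZHS ' uni76F4
```

The fallback to the default glyph is the "generic design" case: the same font serves unmarked text and language-tagged text alike.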

> On a side note, I read somewhere that similar problems may arise with
> other kinds of characters. In particular, I read on a website about
> Arabic characters being used in several countries/languages but
> displayed slightly differently. Yet after some searching, I could not
> find actual information on this specific issue, so I don't know whether
> it is true, or whether the Unicode project has since fixed it by
> assigning specific characters, or control characters, to change the
> display. (Arabic doesn't have nearly as many characters as the East
> Asian languages, so there is less of a space issue in duplicating
> characters.)
> Do you know about such a specific Arabic-character issue? Or about
> similar issues with glyphs in other alphabets?

I'm not the one to ask, although I know this issue, and others, arise.

In the case of Arabic mixed with Latin script, other problems arise, such as
accommodating the vertical range of Arabic while keeping a tight line height.

> Do you participate in Unicode standardization?

I don't, at all. Some people have been involved in both FreeFont and
Unicode standardization.

> Do you have details on what led to this internally?

No.

> Is it really ONLY a space problem?

Again, I have no inside information, but I see other issues more
essential than space.

In the case of Chinese, fitting the Han characters within 16 bits, the
size of the original Unicode code space, was a concern. The unified
set fit, but without leaving a lot of room.
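To put numbers on the 16-bit constraint: 16 bits address 65,536 code points, which was the whole code space in Unicode's original design. Since Unicode 2.0 the code space runs to U+10FFFF, seventeen planes of that size, and later Han extensions such as Extension B sit outside the 16-bit Basic Multilingual Plane. A quick Python check (the two sample characters are arbitrary choices):

```python
import unicodedata

BMP_SIZE = 1 << 16    # code points addressable with 16 bits
CODESPACE = 0x110000  # U+0000..U+10FFFF, the code space since Unicode 2.0
print(BMP_SIZE, CODESPACE, CODESPACE // BMP_SIZE)
# 65536 1114112 17

# A Han character from the Basic Multilingual Plane fits in one UTF-16 unit...
bmp_han = "\u76F4"
# ...while one from Extension B (Plane 2) needs a surrogate pair.
ext_b_han = "\U00020000"
print(unicodedata.name(ext_b_han))         # CJK UNIFIED IDEOGRAPH-20000
print(len(bmp_han.encode("utf-16-le")))    # 2
print(len(ext_b_han.encode("utf-16-le")))  # 4
```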

But as to whether it's "ONLY a space problem", I would say no.
Unicode has a reluctance toward, if not a fixed policy against, making
separate encodings based on purely stylistic differences. There are
cases of this, but I think somebody had to argue that somehow the
meaning of the character was different.

The characters shared among CJK are commonly viewed as being
historically and essentially the same characters, even if sometimes
the style of writing them has drifted in a different way in one region
than another.  Furthermore, the style of writing the characters has
also drifted over time, in the same place.

There are western analogs.

For one example, 200 years ago English typography featured a "long
s", which is no longer commonly used and is very confusing for modern
readers where it appears.
Unicode encodes the "long s" as a character of its own (U+017F).

If it's important to the content of the text, the specially encoded
long s can be used (at the risk that no font on the system has a
glyph for the letter, and that the reader might not recognize the old
letter).

If not, the author can just use "s" for the "long s", as is usually
done when transcribing these old texts.
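Both options map neatly onto Unicode data: the long s has its own code point, U+017F, but also a compatibility mapping to plain "s", so NFKC normalization (or case folding, for search) does what a transcriber does. A short Python illustration; the sample word is just an example of period spelling:

```python
import unicodedata

long_s = "\u017F"  # ſ
print(unicodedata.name(long_s))  # LATIN SMALL LETTER LONG S

old = "Congreſs"  # spelling as found in 18th-century printing
print(unicodedata.normalize("NFKC", old))  # Congress
print(old.casefold() == "congress")        # True
```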

> Because even though these countries certainly have a lot of characters,
> it looks to me like there are still plenty of unassigned slots, far more
> than enough (that's how Unicode was designed after all: with enough slots
> for all of history, as far as I know). So I don't see the point of
> keeping them free for no reason (it's not as if new alphabets of hundreds
> of thousands of brand-new characters will suddenly be created in the next
> century).
> And in the worst case, Unicode can still be extended.
> So if you have any particularly interesting links to discussions in the
> Unicode project (mailing lists, maybe?) about how this came about, those
> would be interesting as well.
> I will also ask the Unicode people directly later, but I first wanted to
> know the opinion of a font project whose goal is to span all of Unicode.
> What does it imply for you?
>
I'm not sure what you're getting at here.

Do you think some characters are unrepresented somehow?

Note the distinction between a "character" and the form in which it is printed.
For example, the Latin letter "a" has many forms (which sometimes
amounted to regional variants), but these forms don't get different
encodings.
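The same distinction can be seen directly in the character database: a glyph variant shares a code point, while a shape that carries a different meaning gets its own. The IPA letter alpha is a handy Latin-script example of the second case, since IPA contrasts [a] and [ɑ] as different sounds. A minimal Python check:

```python
import unicodedata

# Single-storey and double-storey "a" are two designs of one character:
# whichever a font draws, the text stores U+0061.
print(unicodedata.name("a"))  # LATIN SMALL LETTER A

# IPA contrasts [a] and [ɑ], so the single-storey shape used in phonetic
# transcription is encoded as a distinct letter.
alpha = "\u0251"
print(unicodedata.name(alpha))  # LATIN SMALL LETTER ALPHA
print(unicodedata.normalize("NFKC", alpha) == "a")  # False: no mapping to "a"
```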

> And on a second level, why do I ask all this? First of all, simply
> because I am interested in Unicode and in such questions, for personal
> use but also out of pure intellectual interest (among other reasons, I am
> myself involved in standardization processes, though not directly in
> Unicode, for now at least). Also because I think this is pretty sad:
> when I read about it, I didn't agree much with such moves (the prime goal
> of Unicode was to support every existing character, so this looks like a
> step backwards; and also because we know that some countries, Japan at
> least as far as I know, are not very keen on standardization, so they
> don't use the Unicode encodings such as UTF-8 much, preferring localized
> encodings, and this kind of move won't make them want to change).
>
Again it seems as if you think Unicode is failing to represent a character.

There are surely some omissions, but I expect they're very rare.

If I understand you rightly, I think you're missing something here.


First off, the notion of language distinction is a bit of an illusion.

The big nationalistic efforts of the 19th and 20th centuries enforced
the illusion of a single historical German and French and English, but
200 years ago, the languages across Europe formed a sort of continuum.
Likewise with Eastern languages.

What is worse, languages and their scripts change in time.

Would you have a separate encoding for each valley, for each town?
A separate encoding for each generation?

Or does one buy into these nationalistic notions of separate and
un-mixable peoples and traditions?

In this sense, Unicode represents the historical fact and common
understanding that the shared Eastern ideographs come from a common
source.

And as I said, modern font technology can handle stylistic differences
between regional scripts, if it is called for.

Once again, it's a trade-off.  It's the real world.


> And also because I am currently beginning to write what may become a book
> some day; not on this topic in particular, but this kind of topic may be
> part of it.
>
Good luck!

If it does become a book, be sure to post a pointer on this list!

> So thanks, all. Any opinion and information on the topic would be greatly
> appreciated.
>
You're welcome. But that's enough for now!

>
> P.S.: And for personal use, one last question: do you plan on supporting
> these East Asian characters in the foreseeable future? In particular,
> Japanese Hiragana, Katakana, and Kanji, and the basic Korean alphabet?
>
(Answered previously)

Cheers!