New glyphs are unixxxx while other glyphs are uniXXXX.

New glyphs are unixxxx while other glyphs are uniXXXX.

Tae Wong
New glyphs are unixxxx while other glyphs are uniXXXX.

For example, in FreeSerif Regular, the glyph 0x1D29 is named uni1d29
and not uni1D29.

This is because the maintainer created some glyphs that use the
unixxxx part. SC UniPad is not updated and does not use Unicode 6.0.


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

Steve White-12
Hello, Tae Wong,

If I understand you correctly, you are referring to the *names* of
glyphs in the font, as opposed to their encoding.

It is true that sometimes uppercase hex digits are used and sometimes
lowercase. That could be considered aesthetically displeasing, but
usually I don't regard any reasonable name for a glyph as a *bug* --
our policy is that glyphs in a Unicode font must be referred to *only*
by their Unicode encoding, and that the name is *only* for human
consumption.

I'm not familiar with SC UniPad, and you didn't say quite what effect
you are seeing with it.
If it uses glyph names to refer to glyphs in the font, I would call
that a bug with that software.

If you could give us a test case so we could see for ourselves, it
would be clearer what is going on.

Let us know!


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

BobH-5
On 2013-06-21 at 10:26 Steve White stevan.white-at-gmail.com |OpenType
stuff| wrote:
> Hello, Tae Wong,
>
> If I understand you correctly, you are referring to the *names* of
> glyphs in the font, as
> opposed to their encoding.

Yes, that is how I interpreted it also.

> It is true that sometimes uppercase hex digits are used and sometimes lowercase.
> That could be considered aesthetically displeasing, but usually I
> don't regard any reasonable name for a glyph as a *bug*

According to
http://sourceforge.net/adobe/aglfn/wiki/AGL%20Specification/ uni names
must use uppercase digits to be recognized as representing a Unicode
character.

> -- our policy is that glyphs in a Unicode font
> must be referred to *only* by their Unicode encoding, and that the
> name is *only* for human consumption.

Bad assumption, imo. Certain PDF generators will result in PDF files
that do not include the Unicode text data, and if someone wants to "copy
and paste" from such PDFs then Reader (or whatever) uses the glyph names
to deduce the original character string. In this case, "uni" names with
lower case won't be recognized.

I'd say use of lower case is a bug.
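
To make the case rule concrete, here is a minimal Python sketch of the
syntactic check described in the AGL Specification: "uni" followed by
groups of four uppercase hex digits, or "u" followed by four to six
uppercase hex digits. The function name is made up for illustration, and
the sketch only looks at a single name component; it does not check
value ranges such as surrogates.

```python
import re

# "uni" + one or more groups of exactly four UPPERCASE hex digits,
# or "u" + four to six UPPERCASE hex digits.
_UNI = re.compile(r"uni(?:[0-9A-F]{4})+")
_U = re.compile(r"u[0-9A-F]{4,6}")


def is_agl_unicode_name(name):
    """True if `name` has the syntactic form of an AGL 'uni'/'u' name."""
    base = name.split(".", 1)[0]  # a suffix such as ".alt" is ignored
    return bool(_UNI.fullmatch(base) or _U.fullmatch(base))


print(is_agl_unicode_name("uni1D29"))  # uppercase hex digits
print(is_agl_unicode_name("uni1d29"))  # lowercase hex digits
```

Under this check the name uni1D29 is accepted and uni1d29 is not, which
is the distinction the original report is about.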

regards,

Bob Hallissy


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

BobH-5
In reply to this post by Steve White-12
PS:

On 2013-06-21 at 10:26 Steve White stevan.white-at-gmail.com |OpenType stuff| wrote:
> our policy is that glyphs in a Unicode font
> must be referred to *only* by their Unicode encoding

What about unencoded glyphs (such as ligatures or conjuncts or contextual variants) that are targeted by OpenType lookups?  There is no way to refer to these by their "Unicode encoding".

Bob

Re: New glyphs are unixxxx while other glyphs are uniXXXX.

Bugzilla from moyogo@gmail.com
On Fri, Jun 21, 2013 at 5:02 PM, BobH <[hidden email]> wrote:

> PS:
>
>
> On 2013-06-21 at 10:26 Steve White stevan.white-at-gmail.com |OpenType
> stuff| wrote:
>
> our policy is that glyphs in a Unicode font
> must be referred to *only* by their Unicode encoding
>
>
> What about unencoded glyphs (such as ligatures or conjuncts or contextual
> variants) that are targeted by OpenType lookups?  There is no way to refer
> to these by their "Unicode encoding".
>
> Bob

A ligature or conjunct composed of uniXXXX and uniYYYY should be
called uniXXXXYYYY; if the components have AGLFN names, then it should
be name1_name2 for components name1 and name2. For contextual variants
it's uniXXXX.whatever.
This is all explained in
http://sourceforge.net/adobe/aglfn/wiki/AGL%20Specification/
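
As a rough illustration of how a consumer might read such names back
into text, here is a Python sketch following the outline of the AGL
Specification: drop the suffix after the first period, split on
underscores, then map each component. SAMPLE_AGLFN is a tiny stand-in
for the real AGLFN table, and the function name is hypothetical.

```python
import re

# Tiny stand-in for the AGLFN table; the real list is far longer.
SAMPLE_AGLFN = {"f": "f", "i": "i", "l": "l", "A": "A"}


def _is_scalar(cp):
    """True for code points that are not surrogates and are in range."""
    return cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def glyph_name_to_text(name, aglfn=SAMPLE_AGLFN):
    """Map a glyph name to a character string, AGL style."""
    base = name.split(".", 1)[0]      # drop a ".whatever" suffix
    out = []
    for part in base.split("_"):      # ligature components
        if part in aglfn:
            out.append(aglfn[part])
        elif re.fullmatch(r"uni(?:[0-9A-F]{4})+", part):
            digits = part[3:]
            cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
            if all(_is_scalar(cp) for cp in cps):
                out.extend(chr(cp) for cp in cps)
        elif re.fullmatch(r"u[0-9A-F]{4,6}", part):
            cp = int(part[1:], 16)
            if _is_scalar(cp):
                out.append(chr(cp))
        # any other component contributes nothing
    return "".join(out)
```

With this sketch, "f_i" yields "fi", "uni09150937" yields the two
characters U+0915 U+0937, and a lowercase name such as "uni1d29" yields
an empty string.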

--
Denis Moyogo Jacquerye
African Network for Localisation http://www.africanlocalisation.net/
Nkótá ya Kongó míbalé --- http://info-langues-congo.1sd.org/
DejaVu fonts --- http://www.dejavu-fonts.org/


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

Steve White-12
Hi,

Bob, the Adobe Glyph List is deprecated.  It has not been maintained,
as it was an unfortunate idea to begin with.
We largely ignore it (except for some of the names which were good
ideas individually).

The PDF text-copying issue is known to us.  It is indeed a problem,
but no efforts on the font side will fix it.

The attempt to re-construct text from named glyph fragments was an
idea--unfortunately, an idea that could never work in general.  It
rests on severe assumptions about how glyphs might be assembled to
form other glyphs, which would be an unworkable restriction for many
font designers, and which are not followed in practice.

For one example: a ligature glyph might be formed by several different
combinations of characters.  Therefore the mapping from ligature to
text cannot generally work, as it's a one-to-many mapping.  But it
gets worse than that.

Because of this, for complex text including combining forms and
ligatures, copying text from PDF often fails, regardless of the font
used.  It's especially catastrophic with Indic text, and often can't
reconstruct anything readable.

The solution for PDF is to embed the Unicode text.

You talked about other practical considerations regarding unencoded glyphs.

First, the font's lookup tables (including the ones you mentioned) may
indeed refer to un-encoded glyphs (a main purpose of the Private Use
areas in Unicode is to provide slots for such glyphs within a font).
The purpose of these is to go in the opposite direction from the
text-copy issue--from encoded text to graphical rendering. It is to be
regarded as a reference to a glyph *internal to the font*. These
references are used by the font rendering libraries, but should not
normally be accessed from application software.  The mechanism works
so long as the glyph names are unique, and so the AGL has no bearing
on the subject.

Second, special-purpose applications can refer to unencoded glyphs in
special-purpose fonts in multiple ways, including glyph number and
glyph name.  By definition, there is no standard for this, so it has
nothing to do with Unicode, and by the same token has nothing to do
with the AGL.

FreeFont is explicitly and essentially a Unicode font.  We would
consider re-naming a glyph in the interest of clarity or internal
consistency, but *not* to refer to unencoded glyphs from application
software.

Cheers!


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

BobH-5
You may do as you wish, of course.  And you need not reply to this, but lest I be misunderstood...

On 2013-06-22 at 5:28 Steve White stevan.white-at-gmail.com |OpenType stuff| wrote:
> Hi,
>
> Bob, the Adobe Glyph List is deprecated.  It has not been maintained,
> as it was an unfortunate idea to begin with.

The link I gave refers both to the
  • AGL (which, although not deprecated in the traditional sense, is, by conscious decision, not being extended) and
  • The Adobe Glyph List for New Fonts.
The latter is far from deprecated -- it is the standard way of naming glyphs for most type designers I know.

For their part, Microsoft and ISO both recommend that names conform to the Adobe standard.

> We largely ignore it (except for some of the names which were good
> ideas individually).

Certainly your prerogative. But if one chooses this route, I'd question including any names at all in the finished font -- just move to post fmt3 table.

Cheers,
Bob


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

Steve White-12
BobH

On Mon, Jun 24, 2013 at 5:34 AM, BobH <[hidden email]> wrote:

> You may do as you wish, of course.  And you need not reply to this, but lest
> I be misunderstood...
>
>
> On 2013-06-22 at 5:28 Steve White stevan.white-at-gmail.com |OpenType stuff|
> wrote:
>
> Bob, the Adobe Glyph List is deprecated.  It has not been maintained,
> as it was an unfortunate idea to begin with.
>
> The link I gave refers both to the
>
> AGL (which, although not deprecated in the traditional sense, is, by
> conscious decision, not being extended) and
> The Adobe Glyph List for New Fonts.
>
> The latter is far from deprecated -- it is the standard way of naming glyphs
> for most type designers I know.
>
I do not have access to statistics of how many fonts follow Adobe's convention.
That might be helpful information in this discussion.
However, I do know many fonts just ignore it, because it is very ugly,
and not very helpful.

> For their part, Microsoft and ISO both recommend names conform to the Adobe
> standard.
>
I know.

The ostensible reason for this naming scheme does not make sense, as I said.
The only other argument I have ever heard for propagating it is to
support legacy software.  (A few printer problems too have occasionally
been reported that *might* be related to glyph names, but they are few
and far between, and I can only regard these as firmware bugs.)

My take on it is that it would be better for the legacy software to be
replaced than to force this ugly naming convention on new fonts.

(BTW I still have never seen the bug Tae Wong mentioned with my own eyes, so I
can't be 100% sure the bug really was due to glyph naming.
A test file and thorough description of the problem might resolve that
question.)

> We largely ignore it (except for some of the names which were good
> ideas individually).
>
> Certainly your prerogative. But if one chooses this route, I'd question
> including any names at all in the finished font -- just move to post fmt3
> table.
>
This is certainly a thought.  The usefulness of glyph names in the font binaries
is questionable.

Human-readable glyph names are very helpful to anybody who inspects the font,
to understand what the font designer intended.  If you've ever dealt
with Indic or Arabic scripts, you know the tables can be almost impenetrable.

Do glyph names belong in the binary though?  What purpose do they serve there?

One view has been that the binaries and the SFD file are interchangeable,
the SFD being merely a human-readable version.  I have usually taken this view.

However, there are other bits of information in the SFD file that are
not supported in the binaries (table names anyway---I can't think of
anything else except info caches).
So the SFD may be regarded as "source".

In the view that the SFD file is source, and given that with GPLed
software the user must have access to the source, it makes sense to
say that an interested user could always look into the SFD file to find
the glyph names.

Of course for the purposes of text display, all this human-readable
stuff is extraneous.

Removing the glyph names wouldn't have any effect on the PDF text-copy
issue, I think.
(I only know the one way I mentioned to resolve that.)
I wonder if it would make Tae Wong's problem go away -- but is making
that problem go away the right thing to do?  Maybe that software ought
to be fixed.
So it isn't clear to me that your proposal addresses the bug at hand.

Bob, I'll think about your proposal.  Let me know if you have further
thoughts on it.
It might be worth opening a bug report on it, where it can be further discussed.

Thanks!


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

Steve White-12
Hi again Bob,

I've experimented with this, and now I think I spoke too soon.
But maybe I don't understand your suggestion.

The glyphs *must* have unique names.
The options include simply changing the names to Adobe ones.
Is this what you're suggesting?

My problem with this is: I do not approve of the Adobe names.  If it's
a choice between our meaningful names and the ill-advised, ugly Adobe
ones--well, the choice is clear.

I guess when I responded before, I was thinking as if, somehow the
names would be just the slot numbers or something -- removing all
trace of interpretation, and maybe saving a small amount of space...I
see no option for that however.

Still thinking about it...


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

BobH-5
Hi Steve,

On 2013-06-25 at 9:12 you wrote:
> Hi again Bob,
>
> I've experimented with this, and now I think I spoke too soon.
> But maybe I don't understand your suggestion.
>
> The glyphs *must* have unique names.

Why?  (only slightly rhetorical)

If they *must* have names, then those names are being used by one or more processes (human and/or machine) and, if that is true then the names need to conform to whatever such processes require.  But you have claimed you don't need to conform to at least one such machine process (PDF text copy), so I'm trying to understand what process(es) you believe do require names.

Once one knows what processes are required, only then can one decide what the name requirements are, if any.

> My problem with this is: I do not approve of the Adobe names.

Can you give examples of what names (from the Adobe Glyph List For New fonts) you find objectionable and why they are so?

Or is it the "uni" and "u" names (used for everything not in the aglfn) that you don't like?

Bob

Re: New glyphs are unixxxx while other glyphs are uniXXXX.

Steve White-12
Hi

On Tue, Jun 25, 2013 at 8:28 PM, BobH <[hidden email]> wrote:
> On 2013-06-25 at 9:12 you wrote:
>
> I've experimented with this, and now I think I spoke too soon.
> But maybe I don't understand your suggestion.
>
> The glyphs *must* have unique names.
>
> Why?  (only slightly rhetorical)
>
I mixed things up there.  Internally references are made by glyph ID,
so I know of no structural reason why they must be unique.
FontForge correctly insists on uniqueness when names are being set.

I'm talking about font generation by FontForge.
I find no way to *remove* the glyph names.
Only ways to replace them with other sets of glyph names.
Am I missing something?

> If they *must* have names, then those names are being used by one or more
> processes (human and/or machine) and, if that is true then the names need to
> conform to whatever such processes require.  But you have claimed you don't
> need to conform to at least one such machine process (PDF text copy),

Our conformance to this broken idea would not unbreak it.
It might serve to prolong it, but I think that is detrimental to the
public good.

> so I'm
> trying to understand what process(es) you believe do require names.
>
??  Very few. Have you misunderstood me?

As I wrote in the posting before last, for the primary purpose of
display, the names are superfluous.

The only process I even know of that used glyph names and had a
legitimate intent was the PDF copying one.  But again, I regard that
as broken and defunct.

> Once one knows what processes are required, only then can one decide what
> the name requirements are, if any.
>
> My problem with this is: I do not approve of the Adobe names.
>
> Can you give examples of what names (from the Adobe Glyph List For New
> fonts) you find objectionable and why they are so?
>
> Or is it the "uni" and "u" names (used for everything not in the aglfn) that
> you don't like?
>
It's the AGLFN names I don't like.  They amount to a loss of
information, besides being ugly.

There are a few AGL names that are just wrong (and there are also a
few Unicode names that are just wrong too).  These things happen.  To
be honest, more names in FreeFont are just wrong, than either the AGL
or Unicode.  But they're *our* names, and have other advantages too
(brevity, etc).  But again, the functioning of outside code must not
depend on glyph names.

Keep me thinking, Bob!


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

BobH-5
On 2013-06-25 at 18:15 Steve White stevan.white-at-gmail.com |OpenType
stuff| wrote:
>> On 2013-06-25 at 9:12 you wrote:
>>
>> I've experimented with this, and now I think I spoke too soon.
>> But maybe I don't understand your suggestion.
>>
>> The glyphs *must* have unique names.

First, apologies for addressing only one of the two areas your statement
raised.  My email was focused on whether or not glyphs had to have names
at all -- I didn't address whether names had to be unique.

To address that fully: *IF* the glyphs have names, then (except for
".notdef") names should be used only once within the font. Or said
another way: glyph names must be unique except that many glyphs can be
named ".notdef" (which really means it has no name).
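
As a small illustration of that rule, here is a Python sketch that
reports names used more than once while letting ".notdef" repeat. It
works on a plain list of glyph names in glyph order; how that list is
obtained from a font is left out, and the function name is made up.

```python
from collections import Counter


def duplicate_glyph_names(names):
    """Return the sorted names that occur more than once, ignoring '.notdef'."""
    counts = Counter(n for n in names if n != ".notdef")
    return sorted(n for n, c in counts.items() if c > 1)


print(duplicate_glyph_names([".notdef", ".notdef", "A", "uni1D29", "A"]))
```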

> I mixed things up there.  Internally references are made by glyph ID,
> so I know of no structural reason why they must be unique.
> FontForge correctly insists on uniqueness when names are being set.

Right.

> I'm talking about font generation by FontForge.
> I find no way to *remove* the glyph names.
> Only ways to replace them with other sets of glyph names.
> Am I missing something?

I think we're on a rabbit trail here (because the OpenType spec
recommends that ttf-flavored fonts should have glyph names) but if you
wanted to pursue this then you need to see if there are settings in
FontForge that will cause it to generate a Format 3 post table. If not,
you can use any number of font hacking tools to change the post table.
If you want help I can show you how to use the Font::TTF Perl module to
do this.
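
Before changing anything, it can help to see which post table format a
generated font actually carries. The following Python sketch (used here
instead of the Font::TTF Perl module) reads the sfnt table directory
with the standard library only. It assumes a plain TrueType/OpenType
file, not a TTC collection or WOFF, and the function name is invented
for illustration.

```python
import struct


def post_table_version(font_bytes):
    """Return the version of the 'post' table (e.g. 2.0 or 3.0), or None."""
    # sfnt header: uint32 version, uint16 numTables, then three uint16 fields.
    (num_tables,) = struct.unpack(">H", font_bytes[4:6])
    for i in range(num_tables):
        start = 12 + 16 * i  # table records follow the 12-byte header
        tag, _checksum, offset, _length = struct.unpack(
            ">4sIII", font_bytes[start:start + 16])
        if tag == b"post":
            # The table begins with a 16.16 fixed-point version number.
            (raw,) = struct.unpack(">I", font_bytes[offset:offset + 4])
            return raw / 65536.0
    return None
```

Calling it on the bytes of a font file, for example
post_table_version(open(path, "rb").read()) with path pointing at a
generated .ttf, gives 2.0 for a font that stores glyph names and 3.0 for
one that stores none.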

>> so I'm
>> trying to understand what process(es) you believe do require names.
>>
> ??  Very few. Have you misunderstood me?

I take this to mean that you use the names for development process but
beyond that don't think the names have purpose.

However, I suspect that neither of us knows whether names are essential
to correctly render a font on the full variety of
platforms/drivers/printers that might be required.

> It's the AGLFN names I don't like. They amount to a loss of
> information, besides being ugly. There are a few AGL names that are
> just wrong (and there are also a few Unicode names that are just wrong
> too). These things happen. To be honest, more names in FreeFont are
> just wrong, than either the AGL or Unicode. But they're *our* names,
> and have other advantages too (brevity, etc). But again, the
> functioning of outside code must not depend on glyph names. Keep me
> thinking, Bob!

I'm curious what possible "loss of information" you might be aware of.

I assume you understand that you can use one set of glyph names during
font development and then, prior to packaging, change the names to be
AGL conformant?  This is a pretty common practice in the industry.
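
For the specific case that started this thread, such a renaming step
could be as small as uppercasing the hex digits of "uni" names. Here is
a Python sketch of that idea; the function names are hypothetical, only
the "uni" form is handled, and a development name that happens to look
like "uni" plus hex digits would also be rewritten.

```python
import re

# "uni" followed by one or more groups of four hex digits, in either case.
_UNI_ANY_CASE = re.compile(r"uni((?:[0-9A-Fa-f]{4})+)")


def normalize_component(part):
    """Uppercase the hex digits of a 'uniXXXX...' component; leave others alone."""
    m = _UNI_ANY_CASE.fullmatch(part)
    return "uni" + m.group(1).upper() if m else part


def normalize_glyph_name(name):
    """Normalize every underscore-separated component, keeping any suffix."""
    base, dot, suffix = name.partition(".")
    parts = [normalize_component(p) for p in base.split("_")]
    return "_".join(parts) + dot + suffix


print(normalize_glyph_name("uni1d29"))
```

Names that do not match the pattern, including ".notdef", pass through
unchanged.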

The focus of my argument all along has not been what names you use for
development -- no one cares about that at all -- but it is what gets
delivered for end-users that, imo, should conform to existing standards.

Bob

PS: This *whole* very long thread started out from a bug report that
"New glyphs are unixxxx while other glyphs are uniXXXX" -- I don't think
you can argue that one is less ugly than the other, but I can argue that
one has less information than the other (from the perspective of
processes based on Adobe standards).


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

Steve White-12
Hi,

As Bob points out at the end of his post, this started with a problem report.
I still have not yet received a test case, and have not seen the
problem myself, nor have I been enlightened as to how the problem is
manifested.
For all I know, a bug exists in that software, and its authors should
be alerted.

Please send me a test case.

Bob, I answered a couple of your direct questions below.
(Only one of those contains new information, unfortunately.)

As you pointed out, we're going in circles.  I'm afraid we've become entrenched.

Instead of pursuing this lengthening exchange, I propose this:
I'm already writing up for myself a summary of the issues, and the pros/cons
of the proposed remedies.  Give me a little time; I'll send it to you
for revision.
When we're both satisfied, we could post that here.

For myself, I need to focus on what is best for our users and for the project.
It may well be that something in the direction of your proposals is in order.
I am still thinking about it.

I have to say, I'm still bucking and kicking -- the AGLFN is a poor idea, that
requires overbearing restrictions on fonts in order to function at all,
and doesn't deliver what it promises.  There is a better way.

(more below)

On Wed, Jun 26, 2013 at 4:59 PM, BobH <[hidden email]> wrote:

> On 2013-06-25 at 18:15 Steve White stevan.white-at-gmail.com |OpenType
> stuff| wrote:
>>>
>>> On 2013-06-25 at 9:12 you wrote:
>>>
>> I'm talking about font generation by FontForge.
>> I find no way to *remove* the glyph names.
>> Only ways to replace them with other sets of glyph names.
>> Am I missing something?
>
>
> I think we're on a rabbit trail here (because the OpenType spec recommends
> that ttf-flavored fonts should have glyph names) but if you wanted to pursue
> this then you need to see if there are settings in FontForge that will cause
> it to generate a Format 3 post table. If not, you can use any number of font
> hacking tools to change the post table. If you want help I can show you how
> to use the Font::TTF Perl module to do this.
>
I'd be glad to look at your scripts!

>>> so I'm
>>> trying to understand what process(es) you believe do require names.
>>>
>> ??  Very few. Have you misunderstood me?
>
>
> I take this to mean that you use the names for development process but
> beyond that don't think the names have purpose.
>
> However, I suspect that neither of us knows whether names are essential to
> correctly render a font on the full variety of platforms/drivers/printers
> that might be required.
>
My conclusion was: one glyph name is as good as another for the
primary purpose of text display.
A general *flattening* of the names could be performed, which could
even save a tiny fraction of file size (no I don't care about that!).
On the other hand... I see no compelling reason to do that.  And
FontForge doesn't provide an option like that.

Regarding supporting all platforms and software:
I regard software that relies on glyph names as being buggy (outside
of font-inspection tools).
We cannot and should not support all bugs in all software and all printers.

The PDF scheme for text copying (without embedded Unicode text) *does*
use glyph names, and I regard this as a kludge and a disaster.
It was a bad idea, and it doesn't work as advertised.  We do not support it.

>> It's the AGLFN names I don't like. They amount to a loss of information,
>> besides being ugly. There are a few AGL names that are just wrong (and there
>> are also a few Unicode names that are just wrong too). These things happen.
>> To be honest, more names in FreeFont are just wrong, than either the AGL or
>> Unicode. But they're *our* names, and have other advantages too (brevity,
>> etc). But again, the functioning of outside code must not depend on glyph
>> names. Keep me thinking, Bob!
>
>
> I'm curious what possible "loss of information" you might be aware of.
>
The AGLFN assumes that each un-encoded glyph corresponds to a specific
Unicode sequence.

However, a font can map multiple Unicode strings to the same glyph.
Imposing any particular AGLFN sequence to the glyph therefore results
in a loss of information.

Such many-to-one mappings commonly happen in Indic scripts, but a font
designer might choose to apply a trick involving such a mapping, in
any script.
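
A small Python sketch of that point: invert a text-to-glyph mapping and
list the glyphs that more than one input string leads to. The SAMPLE
mapping is a made-up font in which both the pair "fi" and the
precomposed character U+FB01 end up at one glyph named f_i; the
function name is hypothetical too.

```python
from collections import defaultdict

# Hypothetical font: input string -> name of the glyph it is rendered with.
SAMPLE = {"f": "f", "i": "i", "fi": "f_i", "\ufb01": "f_i"}


def ambiguous_glyphs(text_to_glyph):
    """Return {glyph: set of input strings} for glyphs with several sources."""
    glyph_to_texts = defaultdict(set)
    for text, glyph in text_to_glyph.items():
        glyph_to_texts[glyph].add(text)
    return {g: t for g, t in glyph_to_texts.items() if len(t) > 1}


print(ambiguous_glyphs(SAMPLE))
```

A name such as f_i can only be read back as one of those strings, so
whichever one the name encodes, the other is lost on the way back.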

Sure, font designers could avoid use of all such mappings.  This would
be a pity.
The AGLFN was a bad solution to the problem in the first place, a
solution that itself caused problems.
To force all fonts to compensate for its weaknesses is to put a bandage
on top of a bandage.

> I assume you understand that you can use one set of glyph names during font
> development and then, prior to packaging, change the names to be AGL
> conformant?  This is a pretty common practice in the industry.
>
Yes I know the name can be changed at font generation.

But I have objections to using the Adobe names generally.

> The focus of my argument all along has not been what names you use for
> development -- no one cares about that at all -- but it is what gets
> delivered for end-users that, imo, should conform to existing standards.
>
We do not conform to all standards: that would not make sense.  We
have to choose standards carefully.
We conform to standards that we consider useful: Unicode, OpenType, etc.
Again, we regard the AGL and AGLFN as poor ideas, and we largely ignore them.

> PS: This *whole* very long thread started out from a bug report that "New
> glyphs are unixxxx while other glyphs are uniXXXX" -- I don't think you can
> argue that one is less ugly than the other, but I can argue that one has
> less information than the other (from the perspective of processes based on
> Adobe standards).
>
That's right.

An aesthetic argument would have resolved the issue of the
capitalization of hex values immediately.
And I may well go ahead and deal with that anyway.

However a bug appearing in other software was also mentioned, so I
requested to see a test case.
This has not been forthcoming.

Then you proposed that we implement the AGLFN, and most of this
discussion has followed.

From my perspective, progress is being made.
But what actually happens in the end might not be what any of us imagined at first.


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

Steve White-12
In reply to this post by BobH-5
Hi,

I should correct some statements I made before.  This draws mostly
from the PDF reference, v. 1.7.

I maintain that the relation between glyphs and Unicode sequences
supposed by the AGLFN is a bad idea and essentially broken.  That
said, however:

1) My reference to "embedding Unicode text" seems to have been a trick
of my imagination.  It seems to be a conflation of the PDF technology
of embedded fonts with that of embedded documents.  However, after
looking for several evenings now,  I can't find anything like the idea
that seemed so clear to me:  Unicode text embedded in PDF with
references from positioned glyphs into that text.  (I didn't *think* I
was making this up -- I seemed vividly to remember reading about it,
and even doing it! Disturbing, yes.)

Sure, embedded text would be a working solution, but it seems not to
have been implemented at all in current PDF technologies.

2) While the existing PDF "ToUnicode" mapping is incapable of
reproducing the original text, and while it may mangle text in complex
scripts such as Indic ones, it seems to be the *only* technology
existing for extracting text from a PDF document.

Furthermore, in most cases, for simple alphabetic scripts anyway, the
text produced by the "ToUnicode" mapping would usually be meaningful.
For more complex scripts, the worst-case scenario occurs, but rarely:
most sentences produced would be readable.

That is, if the whole system really works as advertised.

3) The PDF "ToUnicode" stream could or should be produced by the font
layout engine based on font table entries.  Evidently the incorrect,
restrictive and ugly AGLFN is the way current  PDF software is
supposed to get the info to populate the "ToUnicode" entries.
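
To give an idea of what such entries look like, here is a Python sketch
that writes only the bfchar section of a "ToUnicode" CMap, where each
character code maps to one UTF-16BE string. It assumes two-byte codes
equal to glyph IDs, leaves out the surrounding CMap boilerplate and the
per-section entry limit, and the function name is invented.

```python
def bfchar_section(code_to_text):
    """Format {character code: text} as a ToUnicode 'bfchar' section."""
    lines = [f"{len(code_to_text)} beginbfchar"]
    for code in sorted(code_to_text):
        # Destination strings are written as UTF-16BE in hex.
        target = code_to_text[code].encode("utf-16-be").hex().upper()
        lines.append(f"<{code:04X}> <{target}>")
    lines.append("endbfchar")
    return "\n".join(lines)


print(bfchar_section({3: "fi"}))
```

Each code gets exactly one destination string, so a glyph that several
different input strings can produce still has to be assigned a single
one of them.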

I'm still working on a summary of the issues and technologies.

The AGLFN (unfortunate though it is) represents the only way currently
proposed to effect the secondary, but important, function of copying
text from a rendered PDF document. So I'm now working on a way to
apply AGLFN names to FreeFont auxiliary glyphs at build time.

The next question is:
    Does it work for our users?
If the font's auxiliary glyph names are made as those specified by
Adobe, and standard Unix/Linux tools are used to create a PDF with the
font embedded, will the auxiliary glyphs be (somehow) converted to
text?


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

BobH-5
On 2013-07-03 at 5:48 Steve White wrote:
> I'm still working on a summary of the issues and technologies.

You are now well ahead of me in understanding the PDF technologies in question, but in case it helps:

According to this reply on Typophile, one of the problematic situations is when a PDF is created from a postscript file without access to the original font.

Bob

Re: New glyphs are unixxxx while other glyphs are uniXXXX.

Steve White-12
Hi all!

I'm currently experimenting with text copying from PDF files
* created by various programs
* using various fonts using different naming schemes.

So far, the results are erratic, but
* nothing created on Linux copies Hindi text correctly from PDF files.

This is looking very bad so far.  Not that I'm surprised--I'm only
surprised by *how* catastrophic it is.
I'll post what I find here shortly.

Please, guys, help me out here...

** What exactly is the purpose of the AGLFN standard, if it is not to
copy text from PDF files?

** Is there some other use case that I'm missing?


Re: New glyphs are unixxxx while other glyphs are uniXXXX.

James Cloos-9
>>>>> "SW" == Steve White <[hidden email]> writes:

SW> So far, the results are erratic, but
SW> * nothing created on Linux copies Hindi text correctly from PDF files.

The proper solution for text extraction from PDF files, especially for
complex and/or r2l scripts, is for the PDF creator to include ActualText
objects in the PDF.

Nothing else can work for all scripts.

Cf §10.8.3 Replacement Text in PDFReference17.pdf; the same is §14.9.4
   in PDF32000_2008.pdf.
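
As a rough picture of what that looks like in a content stream, here is
a Python sketch that wraps some text-showing operators in a
marked-content sequence carrying an ActualText entry, with the
replacement text written as a UTF-16BE hex string with a byte order
mark. The function name and the "(A) Tj" operator string are
illustrative only; a real PDF producer would emit this from its own
layout code.

```python
def actual_text_span(text, show_ops):
    """Wrap content-stream operators in a /Span with an ActualText entry."""
    # PDF text strings may be UTF-16BE prefixed with the byte order mark FEFF.
    hex_text = ("\ufeff" + text).encode("utf-16-be").hex().upper()
    return f"/Span <</ActualText <{hex_text}>>> BDC\n{show_ops}\nEMC"


print(actual_text_span("A", "(A) Tj"))
```

The extracted text then comes from the ActualText string rather than
from glyph names or per-glyph mappings.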

-JimC
--
James Cloos <[hidden email]>         OpenPGP: 1024D/ED7DAEA6