mhfixmsg: possible bugette, -textcharset/-replacetextplain questions


Conrad Hughes
A few questions re: mhfixmsg (nmh 1.7.1-4 on Debian)..

  - I'm contemplating running the above command on my entire email
    archive; is there any reason not to use "-textcharset utf-8" on
    everything?  Seems to me like an eminently sensible thing to do on
    the face of it, as without it trying to read emails containing
    (heaven forfend) mixed encodings is asking for trouble.  Think
    that's been mentioned here before as a source of headaches.

  - Similarly I was wondering about adding -replacetextplain to all
    conversions, but I'm kind of thinking that that's not so smart, nor
    so useful — since simply running mhfixmsg will render stuff usefully
    grep'able, the UTF-8 conversion will make the files more reliably
    readable, and 'show' will prefer HTML parts anyway so the
    replacetextplain doesn't really give me anything useful.  Is that
    right?

  - The attached message contains some Windows-1252 parts, yet when I
    try to "mhfixmsg -textcharset utf-8 -verbose" on it, I get the
    following:

      mhfixmsg: 1 part 1.2, decode text/plain; charset="Windows-1252"
      mhfixmsg: 1 part 1.1, decode text/html; charset="Windows-1252"
      mhfixmsg: 1 part 1.2, convert utf-8 to utf-8
      mhfixmsg: 1 part 2, convert utf-8 to utf-8

    .. "convert utf-8 to utf-8" looks like a reporting bug, no?  Should
    be "convert Windows-1252 to utf-8"?  The conversion from 1252 *is*
    actually performed.

Conrad

From: [hidden email]
To: [hidden email]
Subject: Blah
Date: Tue, 15 May 2018 08:50:13 +0000
Message-ID: <[hidden email]>
Content-Type: multipart/mixed; boundary="----------=_1526374215-10656-57"
MIME-Version: 1.0
Content-Transfer-Encoding: binary

This is a multi-part message in MIME format...

------------=_1526374215-10656-57
From: [hidden email]
To: [hidden email]
Subject: Blah
Date: Tue, 15 May 2018 08:50:13 +0000
Message-ID: <[hidden email]>
Content-Type: multipart/alternative;
        boundary="_000_VI1PR0501MB2832D910EEB111D8526EDDEEAC930VI1PR0501MB2832_"
MIME-Version: 1.0

--_000_VI1PR0501MB2832D910EEB111D8526EDDEEAC930VI1PR0501MB2832_
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

Dear All,

'Do join us if you can.=92

--_000_VI1PR0501MB2832D910EEB111D8526EDDEEAC930VI1PR0501MB2832_
Content-Type: text/html; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

<html xmlns:o=3D"urn:schemas-microsoft-com:office:office" xmlns:w=3D"urn:sc=
hemas-microsoft-com:office:word" xmlns:m=3D"http://schemas.microsoft.com/of=
fice/2004/12/omml" xmlns=3D"http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DWindows-1=
252">
</head>
<body>
<p>'Do join us if you can.=
=92</p>
</body>
</html>

--_000_VI1PR0501MB2832D910EEB111D8526EDDEEAC930VI1PR0501MB2832_--

------------=_1526374215-10656-57
Content-Type: text/plain
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
Content-Description: dot sig

A dot sig

------------=_1526374215-10656-57--
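
For reference, the conversion that -textcharset utf-8 is expected to perform on the text/plain part above can be reproduced outside nmh. This is a minimal sketch using Python's standard library, not mhfixmsg's own code; the function name qp_cp1252_to_utf8 is made up for illustration. It decodes the quoted-printable body, interprets the bytes as Windows-1252, and re-encodes them as UTF-8.

```python
import quopri

# Body line copied from the text/plain part of the sample message.
raw_qp = b"'Do join us if you can.=92"


def qp_cp1252_to_utf8(qp_bytes: bytes) -> bytes:
    """Decode quoted-printable, read as Windows-1252, re-encode as UTF-8.

    Hypothetical helper for illustration only; nmh does this via iconv(3).
    """
    decoded = quopri.decodestring(qp_bytes)  # "=92" becomes the single byte 0x92
    text = decoded.decode("cp1252")          # 0x92 is U+2019 in Windows-1252
    return text.encode("utf-8")              # U+2019 is E2 80 99 in UTF-8


utf8_body = qp_cp1252_to_utf8(raw_qp)
print(utf8_body)
```

The trailing byte 0x92 (a right single quotation mark in Windows-1252) ends up as the three bytes E2 80 99, which is why the source charset matters even though the verbose output reported "utf-8 to utf-8".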


Re: mhfixmsg: possible bugette, -textcharset/-replacetextplain questions

David Levine-3
Conrad writes:

> A few questions re: mhfixmsg (nmh 1.7.1-4 on Debian)..
>
>   - I'm contemplating running the above command on my entire email
>     archive; is there any reason not to use "-textcharset utf-8" on
>     everything?  Seems to me like an eminently sensible thing to do

Yes, and that's one of the examples in the mhfixmsg(1) documentation.
It's not the default because its use requires that nmh be built with
iconv(3).

>   - Similarly I was wondering about adding -replacetextplain to all
>     conversions, but I'm kind of thinking that that's not so smart, nor
>     so useful — since simply running mhfixmsg will render stuff usefully
>     grep'able, the UTF-8 conversion will make the files more reliably
>     readable, and 'show' will prefer HTML parts anyway so the
>     replacetextplain doesn't really give me anything useful.  Is that
>     right?

Yes.  The motivating use case is messages that contain an empty
text/plain part.  To save Ralph the trouble of replying that the
sender should be notified so that they can fix their messages, I've
been trying for over 6 years now.

>   - The attached message contains some Windows-1252 parts, yet when I
>     try to "mhfixmsg -textcharset utf-8 -verbose" on it, I get the
>     following:
>
>       mhfixmsg: 1 part 1.2, decode text/plain; charset="Windows-1252"
>       mhfixmsg: 1 part 1.1, decode text/html; charset="Windows-1252"
>       mhfixmsg: 1 part 1.2, convert utf-8 to utf-8
>       mhfixmsg: 1 part 2, convert utf-8 to utf-8
>
>     .. "convert utf-8 to utf-8" looks like a reporting bug, no?

Right, fixed, thanks!

David


Re: mhfixmsg: possible bugette, -textcharset/-replacetextplain questions

Ken Hornstein-2
In reply to this post by Conrad Hughes
>  - I'm contemplating running the above command on my entire email
>    archive; is there any reason not to use "-textcharset utf-8" on
>    everything?  Seems to me like an eminently sensible thing to do on
>    the face of it, as without it trying to read emails containing
>    (heaven forfend) mixed encodings is asking for trouble.  Think
>    that's been mentioned here before as a source of headaches.

I am not sure what you mean.  Assuming your nmh is built with iconv
(which it almost certainly is), you have a "normal" nmh setup, and you
don't do anything weird in your profile, character set conversion to
your native character set should "just work".  Also, I don't know what
you mean by "mixed encoding", unless you mean two different text/plain
parts in the same message with different encodings.  That is possible,
but I have never seen it (well, maybe I have, because I wouldn't notice
it, but I think it would be hard to generate unless you did it
explicitly).
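
To make the "two text/plain parts with different encodings" case concrete, here is a minimal sketch using Python's standard email package rather than nmh. The message is constructed by hand for illustration (the boundary "b" and the body text are assumptions, not taken from a real message). Each part is decoded according to its own declared charset, so both come out as the same text.

```python
from email import message_from_bytes
from email.policy import default

# Hand-built message: two text/plain parts, each declaring a different charset.
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="b"\r\n'
    b"\r\n"
    b"--b\r\n"
    b'Content-Type: text/plain; charset="Windows-1252"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"can.=92\r\n"
    b"--b\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"can.=E2=80=99\r\n"
    b"--b--\r\n"
)

msg = message_from_bytes(raw, policy=default)

# Collect (declared charset, decoded text) for every text part.
parts = [
    (part.get_content_charset(), part.get_content())
    for part in msg.walk()
    if part.get_content_maintype() == "text"
]

for charset, text in parts:
    print(charset, repr(text))
```

As long as every part labels its charset correctly, a reader that honours the per-part Content-Type header does not need the parts to agree with each other.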

The problems people have with emails in different character sets seem
(in my experience) to be mostly self-inflicted.  The two big ones
are "I set some stuff in my profile 25 years ago to deal with different
character sets and it turns out it doesn't work right for everything",
and "I set my locale to 'C' but I really want everything output as
UTF-8".  I don't really understand the latter one, but more than one
person had that setup (I never really saw a comprehensible explanation
as to why things were that way).

We used to deal with character set conversion failures poorly; that
was a huge problem for the "I want to display UTF-8 but keep my locale
character set 'C'" crowd.  We now recover and insert a substitution
character.
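
The general idea of recovering from a conversion failure by substituting a placeholder can be sketched as follows. This uses Python's built-in "replace" error handler as a stand-in; it is not nmh's code, and the substitution character nmh actually emits may differ from Python's U+FFFD. The helper name decode_with_substitution is made up for illustration.

```python
def decode_with_substitution(data: bytes, charset: str) -> str:
    """Decode bytes, replacing undecodable sequences instead of failing.

    Hypothetical helper; Python substitutes U+FFFD for each bad sequence.
    """
    return data.decode(charset, errors="replace")


# Latin-1 bytes mislabelled as UTF-8: 0xE9 is not valid UTF-8 on its own.
mislabelled = b"caf\xe9 au lait"

# Strict decoding raises, which is the "handled poorly" failure mode.
try:
    mislabelled.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

recovered = decode_with_substitution(mislabelled, "utf-8")
print(strict_failed, repr(recovered))
```

The rest of the text survives and only the undecodable byte is lost, which is usually preferable to refusing to display the message at all.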

If you want to convert everything to UTF-8, that is of course your
business; I am just pointing out that show(1) should work fine on any
message that mhfixmsg can process.

--Ken