[nmh-workers] Formatting HTML to Text: netrik.


Ralph Corderoy
Hi,

Revisiting once again the issue of nice text from horrible HTML emails,
I found another textual web browser, like lynx(1), etc., but
http://netrik.sourceforge.net/ uses colour in its --dump output.
It doesn't list the URL destinations though, e.g. the noisy `[42]' that
links prefixes to the anchor's text.  I also haven't looked into what
remote accesses it may make.

--
Cheers, Ralph.

--
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers

Re: Formatting HTML to Text: netrik.

Steffen Nurpmeso
Ralph Corderoy wrote in <[hidden email]>:
 |Revisiting once again the issue of nice text from horrible HTML emails,
 |I found another textual web browser, like lynx(1), etc., but
 |http://netrik.sourceforge.net/ uses colour in its --dump output.
 |It doesn't list the URL destinations though, e.g. the noisy `[42]' that
 |links prefixes to the anchor's text.  I also haven't looked into what
 |remote accesses it may make.

You could try to take out the very simple and extremely primitive
HTML filter code of the MUA i maintain and embed it into nmh.  It
only uses standard POSIX C interfaces, so it should not be too
hard.  I do not use anything else for the few HTML things i get,
i have never seen it fail.  (I have test mails around which count
as terror of a pathological maniac.  The only one where i can see
some garbage is actually from the german computer magazine c't,
and it looks like

  [-- #1.2 1007/75296 text/html, quoted-printable, utf-8 --]

  96

  Silex: Neue Malware legt schlecht gesicherte Geräte im Internet of Things still -------------------------------------------------- Die Malware Silex kapert mit Default-Credentials IoT-Geräte, um sie
  lahmzulegen. Ihr 14-jähriger Entwickler handelt offenbar aus Spaß. [%p(2,2)]» https://www.heise.de/security/meldung/Silex-Neue-Malware-legt-schlecht-gesicherte-Geraete-im-Internet-of-Things-still-
  4455677.html[%hr(=)]

Wherever "96", "%p(2,2)" and "%hr(=)" come from and whatever they
are, i never looked into this.  I only ever see this from these
guys.  But WWF, Naturschutzbund and Conversation work for years,
as well as some german shops, and Change.org and a lot of other
mails in the HTML test box do, too.
Guaranteed no loading of external data, anyway.
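
For readers wondering what a primitive, strictly offline HTML filter can look like, here is a minimal sketch in Python. It is not the code from the MUA mentioned above (that one is described as using POSIX C interfaces); the names TextExtractor and html_to_text are made up for this illustration. It relies only on the standard library's html.parser, so it cannot load external data.

```python
from html.parser import HTMLParser

# Tags that should start or end a line in the text output (illustrative choice).
BLOCK_TAGS = {"p", "div", "br", "tr", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose content is never shown to the reader.
SKIP_TAGS = {"script", "style"}


class TextExtractor(HTMLParser):
    """Collect visible text; hypothetical helper, not part of nmh or any MUA."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line) + "\n"


if __name__ == "__main__":
    sample = "<body><p>Hello <b>world</b></p><p>Fish &amp; chips</p></body>"
    print(html_to_text(sample), end="")
```

A filter this small drops links, tables and layout entirely, which is the trade-off against tools such as lynx or w3m.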

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


Re: Formatting HTML to Text: netrik.

Ronald F. Guilmette-2
In reply to this post by Ralph Corderoy
In message <[hidden email]>,
Ralph Corderoy <[hidden email]> wrote:

>Revisiting once again the issue of nice text from horrible HTML emails...

And here, all this time, I thought that it was just me!  I assumed
that I had just failed to read the documentation well enough or long
enough to work out some appropriate way of causing nmh to properly
render the (now prevalent) HTMLized and/or base64 encoded emails of
the modern era.

I had just sort-of jury-rigged my own very local and very idiosyncratic
mechanism for dealing with the problem/issue some years ago, and I have
been just trying to struggle along with it for all this time because
I just haven't had the time to find a proper fix... which I've always
assumed is buried in the documentation someplace.  But if there really
is no good solution in the case of nmh, then I guess that eventually...
and probably sooner rather than later... I'm going to have to switch
mail clients at long last, although I sure will miss some of the nicer
nmh features.

Quite simply, all I would wish for would be something that would -properly-
convert -both- HTMLized emails -and- "Content-Transfer-Encoding: base64"
emails (like this one I'm responding to) into good old fashioned ascii,
at least for purposes of the "show" and "repl" commands.  I have my
jury-rigged solution working adequately well for the base64 encoding
still, but only for the "show" command, which means that I have to do
some manual cutting-and-pasting when/if I want to reply to a base64
encoded email. :-(
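
The base64 half of that wish is mechanical. As a minimal sketch, assuming Python's standard email package and a made-up sample message (the addresses and the helper name readable_body are hypothetical, and this is not how nmh itself does it), decoding a base64 text part back to readable text looks roughly like this:

```python
import base64
from email import message_from_string, policy

# A made-up message whose body is base64-encoded UTF-8 text.
RAW = (
    "From: sender@example.org\n"
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode("Hello, plain text!\n".encode("utf-8")).decode("ascii")
    + "\n"
)


def readable_body(raw: str) -> str:
    """Return the decoded text/plain body of a message, or '' if there is none."""
    msg = message_from_string(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""
    # get_content() undoes the transfer encoding and decodes the charset.
    return part.get_content()


if __name__ == "__main__":
    print(readable_body(RAW), end="")
```

The original message is only read here, never rewritten, so nothing is lost by decoding for display.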


Regards,
rfg


Re: Formatting HTML to Text: netrik.

nmh-workers mailing list
Hi,

Ronald F. Guilmette wrote:
> Quite simply, all I would wish for would be something that would -properly-
> convert -both- HTMLized emails -and- "Content-Transfer-Encoding: base64"
> emails (like this one I'm responding to) into good old fashioned ascii,
> at least for purposes of the "show" and "repl" commands.  I have my
> jury-rigged solution working adequately well for the base64 encoding
> still, but only for the "show" command, which means that I have to do
> some manual cutting-and-pasting when/if I want to reply to a base64
> encoded email. :-(

mhfixmsg does this pretty well I think, you just have to use it in
your procmail rules. It will convert the messages to plain ascii (from
base64) as well as convert html to text and add it as a text/plain section
in the mime. I use the following:

mhfixmsg-format-text/html: charset="%{charset}"; /usr/bin/w3m -I ${charset} -T text/html -dump

This works in combination with replyfilter (a perl script distributed
with nmh or used to be) which nicely puts only the plain text section in
the reply for you (I am not sure if nmh would do this, I don't see why
not).

With all these in combination you end up with a reasonable reply to HTML emails.
The downside is that you don't get to keep the original email unless you
make a copy of it and it's fairly hacky.

Regards,
spaceman


Re: Formatting HTML to Text: netrik.

Ronald F. Guilmette-2
In message <[hidden email]>,
spaceman <[hidden email]> wrote:

>With all these in combination you end up with a reasonable reply to HTML emails.
>The downside is that you don't get to keep the original email unless you
>make a copy of it and it's fairly hacky.

Thanks for all the tips, but this is a non-starter for me.  I need to
preserve originals.

I would think there should be some way of doing that *and* getting
nicely TEXTified emails, no?



Re: Formatting HTML to Text: netrik.

Michael Richardson-5
In reply to this post by Ronald F. Guilmette-2

Ronald F. Guilmette <[hidden email]> wrote:
    > Quite simply, all I would wish for would be something that would -properly-
    > convert -both- HTMLized emails -and- "Content-Transfer-Encoding: base64"
    > emails (like this one I'm responding to) into good old fashioned ascii,
    > at least for purposes of the "show" and "repl" commands.  I have my
    > jury-rigged solution working adequately well for the base64 encoding
    > still, but only for the "show" command, which means that I have to do
    > some manual cutting-and-pasting when/if I want to reply to a base64
    > encoded email. :-(

I would also like that for the cases where I want to use show.
I use mh-e, and I mostly have things configured right:
  1) use text/plain if it exists.
  2) format text/html if no text/plain
  (3) but often text/plain is bullshit-pseudo-HTML and you need to avoid it.
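
Rules 1) and 2) above can be sketched with Python's standard email package, whose get_body() method takes an ordered preference list. This is only an illustration of the selection logic, not what mh-e or show(1) actually run; the helper name pick_text_part and the sample messages are made up, and rule (3), detecting a bogus text/plain part, is not attempted.

```python
from email.message import EmailMessage


def pick_text_part(msg):
    """Prefer text/plain; fall back to text/html; None if neither exists."""
    return msg.get_body(preferencelist=("plain", "html"))


# A multipart/alternative message carrying both variants.
both = EmailMessage()
both.set_content("plain version\n")
both.add_alternative("<p>html version</p>\n", subtype="html")

# A message that only has an HTML body.
html_only = EmailMessage()
html_only.set_content("<p>only html</p>\n", subtype="html")

if __name__ == "__main__":
    print(pick_text_part(both).get_content_type())       # text/plain
    print(pick_text_part(html_only).get_content_type())  # text/html
```

Swapping the order of the preference list gives the opposite policy, which is one way to work around senders whose text/plain part is useless.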

The additional problem is that reply yanks text from formatted text/html
rather than text/plain.  The good HTML formatters in mh-e (Emacs) are slow,
and the fast ones do a poor job. I would rather show did things and mh-e used
that, but too many moving parts.

--
]               Never tell me the odds!                 | ipv6 mesh networks [
]   Michael Richardson, Sandelman Software Works        |    IoT architect   [
]     [hidden email]  http://www.sandelman.ca/        |   ruby on rails    [




Re: Formatting HTML to Text: netrik.

Ken Hornstein-2
In reply to this post by Ronald F. Guilmette-2
>I had just sort-of jury-rigged my own very local and very idiosyncratic
>mechanism for dealing with the problem/issue some years ago, and I have
>been just trying to struggle along with it for all this time because
>I just haven't had the time to find a proper fix... which I've always
>assumed is buried in the documentation someplace.  But if there really
>is no good solution in the case of nmh, then I guess that eventually...
>and probably sooner rather than later... I'm going to have to switch
>mail clients at long last, although I sure will miss some of the nicer
>nmh features.

Geez, I thought we handled that pretty well.

For base64 emails, we should handle that fine for display, full stop.

If you are building from source, _if_ you have one of the common text-based
HTML browsers available (I use w3m, but we also support lynx and elinks),
then show(1) does the right thing, and pretty reasonably I would say.  If
you are installing from an OS package ... well, what happens there varies.
At least for MacOS X, it will use w3m (this all is configured at install
time).  But if you've put your own entry in for mhshow-show-text/html, then
you wouldn't see that.  We've done that since 1.6 (released in 2014).

As for repl(1), we've shipped with replyfilter in $(DOCDIR)/contrib
since 1.5 (released in 2012), and I use it on every message.  I won't
say it's perfect, but 95% of the time it does the right thing for me,
and handles HTML and base64 email just fine.  Unfortunately without a
complete rewrite of mhl it was hard to put it in there transparently, so
it does require extra configuration (see the comments at the top of
replyfilter).  We talked about this a LOT on the mailing list during its
development, and here's the snippet from the NEWS file:

- Preliminary support for improved MIME handling when replying to messages!
  Yes, a long requested feature has a solution.  A perl script
  called replyfilter is available; it is designed to act as a mhl
  external filter to process MIME messages in a more logical way.
  It is available in $(srcdir)/docs/contrib/replyfilter or is
  typically installed as $(prefix)/share/doc/nmh/contrib/replyfilter.
  See the comments at the top of replyfilter for usage information;
  it will likely require some adjustment for your site.  replyfilter
  requires the MIME-Tools and MailTools perl modules.

So, I hope it doesn't seem like I'm crapping on you, because you're
not the first person who is unaware of replyfilter (you can search the
mailing list archives for people who have asked this question before,
and I can write this email in my sleep now).  I thought we did a pretty
good job of putting this information out there, but clearly in the 7
years since 1.5 has come out everyone hasn't gotten the word yet.  So my
question to YOU is ... what could we have done better to let you know?

(Yes, in a perfect world 'repl' would just do the right thing automatically
without any extra configuration, but that's a heavy lift).

--Ken


Re: Formatting HTML to Text: netrik.

Conrad Hughes
Ken> As for repl(1), we've shipped with replyfilter in $(DOCDIR)/contrib
Ken> since 1.5 (released in 2012), and I use it on every message.

Just to say thanks for your patience on this: has been bugging me for a
while, and finally just sitting down and unpacking replyfilter made a
huge difference.  Two minor issues cropped up for me while setting it up
though; these may already have been addressed (I use Debian, so am on
nmh 1.6-16), but just in case:

  - replyfilter depends on 'par', which users may not have installed, but
    the failure mode in its absence is suboptimal: the quoted text is
    absent, and the error shown on the command line is something like

      Pipe reader process exited with 72057594037927935

    .. could well be enough just to list the required support tools in
    the instructions at the start of replyfilter.

  - Reading the manpage I see that .mh_profile comments are supposed to
    start '#:', but lines starting '#' give every practical appearance
    of working as comments too — I've <ahem> wrongly used that for years
    — except that if you have two of them in a row, nmh commands fail
    with "no blank lines are permitted in .mh_profile".  The error at
    least seems confused.  Is the colon in '#:' just for easy parsing?
    Would it break things for # to introduce a comment?

Finally, I'd almost be inclined to have nmh-without-replyfilter display a
message about replyfilter, for example maybe in whatnowproc, so after
grinding your teeth about the undecoded base64 you at least see a
message suggesting a remedy for this after exiting the editor.  I
realise though that accurate detection of circumstances where it would
be helpful to display such a message might not be easy, but it would save a
certain amount of repetition.

Conrad


Re: Formatting HTML to Text: netrik.

Ken Hornstein-2
>  - replyfilter depends on 'par', which users may not have installed, but
>    the failure mode in its absence is suboptimal: the quoted text is
>    absent, and the error shown on the command line is something like
>
>      Pipe reader process exited with 72057594037927935
>
>    .. could well be enough just to list the required support tools in
>    the instructions at the start of replyfilter.

Fair enough; it's down a bit farther, but putting a bit at the top saying
you might want to look at that (and the HTML converter) would be helpful.
You could also use "fmt", but I've found par tends to work better.

>  - Reading the manpage I see that .mh_profile comments are supposed to
>    start '#:', but lines starting '#' give every practical appearance
>    of working as comments too — I've <ahem> wrongly used that for years
>    — except that if you have two of them in a row, nmh commands fail
>    with "no blank lines are permitted in .mh_profile".  The error at
>    least seems confused.  Is the colon in '#:' just for easy parsing?
>    Would it break things for # to introduce a comment?

Weeeellll.... yes.

You may notice on closer inspection that mh-profile actually looks
a lot like a message header, in that no blank lines are permitted and
it consists of a header field name ('name:') and header field text.
This is not a coincidence.  The same routine used for parsing message
headers is also used to parse the profile (and context files, and
message sequence files ... sigh).

So we'd have to either introduce some special-case code during profile
parsing, or change the email parser code (ugh).  Both of these are hard;
the function used during message parsing (m_getfld()) really takes over
the input stream and does a fair amount of caching for efficiency, so
you can't easily look at the input stream and say, 'Oh, this starts with
a #, skip this line'.  And for changing m_getfld() ... well, take a look
at it sometime and tell me if YOU want to mess with it.  Welcome to
how the sausage is made :-/

What you're doing when you put in '#:' is creating a special profile
entry called '#' which is not used for anything, but it looks like a
header field just enough that m_getfld() is happy.  I forget who
pointed this out on the mailing list many years ago, but the easiest
solution was just to document the current behavior.
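
As an analogy only: Python's standard email parser is not m_getfld(), and its handling of a bare '#' line differs from what nmh does, but it shows the same underlying idea. A generic header parser accepts '#:' as an ordinary field named '#', whereas a line without a colon is not a header field at all. The profile text below is a made-up sample.

```python
from email import message_from_string

# Made-up profile snippets; "Path" and "Editor" stand in for real entries.
with_colon = "Path: Mail\n#: a comment\nEditor: vi\n"
without_colon = "Path: Mail\n# a comment\nEditor: vi\n"

good = message_from_string(with_colon)
bad = message_from_string(without_colon)

if __name__ == "__main__":
    # '#' is just another field name, so parsing carries on past it.
    print(good.keys())   # ['Path', '#', 'Editor']
    print(good["#"])     # a comment
    # Python's parser stops reading headers at the first non-header line;
    # nmh reacts differently, but the line is equally not a valid field.
    print(bad.keys())    # ['Path']
```

The point of the comparison is that '#:' works by satisfying the field syntax rather than by any comment support in the parser.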

I have been playing around with a flex-based email header parser; if
that gets working it would be very easy to create a slight modification
that handled '#' based comments for things like the profile.  That
would be part of my "full MIME parsing" work and wouldn't be done for
a while.

>Finally, I'd almost be inclined to have nmh-without-replyfilter display a
>message about replyfilter, for example maybe in whatnowproc, so after
>grinding your teeth about the undecoded base64 you at least see a
>message suggesting a remedy for this after exiting the editor.  I
>realise though that accurate detection of circumstances where it would
>be helpful to display such a message might not be easy, but it would save a
>certain amount of repetition.

I am sympathetic to that idea ... the problem is that requires making
mhl be smarter about what is and isn't a MIME body.  Right now mhl knows
nothing of MIME, it just sees the message body as one big text blob.
Making it do MIME is a lot of work.  In a perfect "full MIME parsing"
world it would just DTRT and replyfilter could vanish.  There are sadly
no wonderful solutions.

--Ken


Re: Formatting HTML to Text: netrik.

Ronald F. Guilmette-2
In reply to this post by Ken Hornstein-2
In message <[hidden email]>,
Ken Hornstein <[hidden email]> wrote:

>>I had just sort-of jury-rigged my own very local and very idiosyncratic
>>mechanism for dealing with the problem/issue some years ago, and I have
>>been just trying to struggle along with it for all this time because
>>I just haven't had the time to find a proper fix... which I've always
>>assumed is buried in the documentation someplace.  But if there really
>>is no good solution in the case of nmh, then I guess that eventually...
>>and probably sooner rather than later... I'm going to have to switch
>>mail clients at long last, although I sure will miss some of the nicer
>>nmh features.
>
>Geez, I thought we handled that pretty well.

Apologies if I offended, unintentionally. I should have gone further to
clarify that I'm quite completely sure that all of the issues I've
encountered are due either to my own ignorance, or to the fact that
I had jury-rigged my personal nmh install, long long ago, approximately
in the bronze age, and I've just never had the time to re-do any of
this stuff properly, or even to -learn- the specifics of what I have
been doing wrong, let alone the "new" and proper way of doing things.
(I've been using nmh and its predecessor MH since about 1982, so that
will give you some idea of the fact that I'm set in my ways, more than
a little bit.)

>For base64 emails, we should handle that fine for display, full stop.

Perhaps, but for display, since so much of the email I've received for
lo these many years now has been either partially or totally HTMLized
crap, *and* because I have some rather specialized needs, long ago I
invented and implemented my own personal quirky little "solution",
which involved running HTMLized MIME parts through the lynx browser to
render them as plain text.

I freely admit that it was most probably a *really* bad idea for me to
have strayed so far from the beaten path, but I do a lot of anti-spam
work, and since the dawn of time this has caused me to insist on always
seeing the FULL HEADERS of each and every email message I read.  (And
I'm sure this was part of what motivated me to go down this route, although
I have paid for it in daily agony ever since.)

So anyway, here's what I have now, and you're all invited to tell me what
obvious garbage it is, and how I should be doing things The Right Way...

For starters, I have in my ~/.mh_profile the following lines:

showproc: mtext
showmimeproc: mtext

On my system "mtext" is a trivial Perl script I wrote that simply
figures out the full pathname of the "current" NMH message and then
passes that to a different small Perl script I wrote (called "textualize")
that does most of the heavy lifting.  Here are those two short Perl scripts:

    https://pastebin.com/raw/XgqbekaZ
    https://pastebin.com/raw/EMSgdXaD

Together, these two work well enough to show me what I want to see when I do
"show", but the whole mess breaks down badly (and is very nearly entirely
useless) when I try to "repl" any message.

>If you are building from source, _if_ you have one of the common text-based
>HTML browsers available (I use w3m, but we also support lynx and elinks),
>then show(1) does the right thing, and pretty reasonably I would say.

OK, well, but as noted above, I've screwed things up, badly, so what should
I *actually* have in my .mh_profile file for the definitions of showproc
and showmimeproc?  What are the current (proper) defaults for those?  And
do the defaults have options to show me FULL headers for each message?
If not, how do I make that happen?

>If you are installing from an OS package ... well, what happens there varies.
>At least for MacOS X, it will use w3m (this all is configured at install
>time).  But if you've put your own entry in for mhshow-show-text/html, then
>you wouldn't see that.  We've done that since 1.6 (released in 2014).

Here is my current, much fiddled ~/.mh_profile.  Tell me -everything- I
should fix and I'll fix it all:

   https://pastebin.com/raw/PEMJ06VH

(Yes, this .mh_profile has been reused and recycled and fiddled and ignored
ad infinitum from literally EONS ago.  If you see anything in there that
hasn't even been supported for 10+ years, don't be surprised.)

>As for repl(1), we've shipped with replyfilter in $(DOCDIR)/contrib
>since 1.5 (released in 2012), and I use it on every message.  I won't
>say it's perfect, but 95% of the time it does the right thing for me,
>and handles HTML and base64 email just fine.

I've just upgraded everything, so I'm on a fresh new FreeBSD 12.0 system
with a fresh and shiny new nmh-1.7.1 pre-built package installed, so I do
believe that I have everything needed in order to get rolling.

Checking now, I see that I *do* have the following:

   /usr/local/share/doc/nmh/contrib/replyfilter

>Unfortunately without a
>complete rewrite of mhl it was hard to put it in there transparently, so
>it does require extra configuration (see the comments at the top of
>replyfilter).  We talked about this a LOT on the mailing list during its
>development, and here's the snippet from the NEWS file:
>
>- Preliminary support for improved MIME handling when replying to messages!
>  Yes, a long requested feature has a solution.  A perl script
>  called replyfilter is available; it is designed to act as a mhl
>  external filter to process MIME messages in a more logical way.

So basically, you guys hacked together a solution, sort of like I did!
That's OK.  I'm 100% sure that your solution is WAY better than my crappy
and ham-fisted one.

(I'm just real glad that you folks did yours in Perl, cuz that happens to
be one of the few languages that I actually speak somewhat. It would all
have been utterly Greek to me if it was in Python or Ruby or Go or something.)

I'm looking at the comments at the top of the replyfilter Perl script.
I'll try to just do everything it tells me to do.  Wish me luck!  If you
never hear from me again (because I have screwed up and totally broken
my email entirely) please tell my family that I said I love them. :-)

>  It is available in $(srcdir)/docs/contrib/replyfilter or is
>  typically installed as $(prefix)/share/doc/nmh/contrib/replyfilter.

I guess that I should make an executable copy of that in /usr/local/bin
alongside show and repl and the rest of the gang, yes?  It doesn't seem
quite right to just be executing it from a /docs/ directory.

>  See the comments at the top of replyfilter for usage information;
>  it will likely require some adjustment for your site.

I don't follow.  What will need adjusting, exactly?

>  replyfilter
>  requires the MIME-Tools and MailTools perl modules.

In FreeBSD parlance, that would be "p5-MIME-Tools" and "p5-Mail-Tools".
I have just successfully installed both, so I'm good.

>So, I hope it doesn't seem like I'm crapping on you, because you're
>not the first person who is unaware of replyfilter (you can search the
>mailing list archives for people who have asked this question before,
>and I can write this email in my sleep now).

Sir, you mistake me for someone who gives a crap if someone craps on me in
a mailing list!  I don't.  I have thick skin and I'm just damned grateful
for the help.  I came here perfectly prepared to admit my abundant ignorance,
which I believe I have done.  If not, allow me to do so now.  I'm ignorant.
I've always felt that ignorance isn't something to be at all ashamed of.
There's a BIG difference between ignorance, which can be easily cured, and
stupidity, which can't be.

>I thought we did a pretty
>good job of putting this information out there, but clearly in the 7
>years since 1.5 has come out everyone hasn't gotten the word yet.

Um.... I don't suppose it will help my case any if I were to say something
like "I was busy, and had a LOT of stuff to do", would it?  (But that's
actually true, and I hope that I've done some Good Things for the Internet
and for humanity in that time.)

>So my
>question to YOU is ... what could we have done better to let you know?

See above.  Not your fault.  Nobody's fault in fact.  I am just Rip Van
Winkle with respect to {N}MH.  I've continued to use it, year after year
after year, even while never taking the time to catch up on recent
developments... even ones that I personally could greatly benefit from.

If you do figure out a way... or if you have figured out a way... to pack
25 hours into every day and if you then neglect to tell me about it, THEN
I will have something to blame you for, but not until then.  Till then,
the fault is all on my end.

>(Yes, in a perfect world 'repl' would just do the right thing automatically
>without any extra configuration, but that's a heavy lift).

I could talk a LOT about this, but probably shouldn't here.

I just "upgraded" two of my systems here.  The wall-clock time needed in
each case for me to get -everything- that I previously had running back
and perfectly configured and running again is shown below:

     Ubuntu  176.04 -> 18.04  (Time: approx 1.5 days.)
     FreeBSD 9.1 -> 12.0      (Time: approx 2.5 WEEKS)

The disparity here has to do with several factors:

    1) MUCH shorter "upgrade gap" in the case of Ubuntu (many YEARS different)

    2) In the FreeBSD case, that system is *both* my main desktop *and*
       the central file & mail server for my whole home network, so there
       were a lot more little things that had to be made to work... again.

    3) Ubuntu has a for-profit corp, working night and day to sandpaper down
       all of the rough edges, especially when it comes to using the OS as
       a desktop system.  FreeBSD doesn't have this.  It's the absolute
       best for -any- "server" type application, but as a desktop, not so
       much.

My point is that having "repl" just "do the right thing", 100% automagically,
is a nice goal, but nmh is not a commercial product of anybody.  So some
allowances can and should be made.

I, for one, am just damn glad the thing works at all.


Regards,
rfg


Re: Formatting HTML to Text: netrik.

Ronald F. Guilmette-2
In reply to this post by Conrad Hughes
In message <E1hgeXd-0003He-PG@sleekit>,
Conrad Hughes <[hidden email]> wrote:

>  - replyfilter depends on 'par', which users may not have installed, but

Ummmm... I don't have that either.  What is it and where do I get it?

>Finally, I'd almost be inclined to have nmh-without-replyfilter display a
>message about replyfilter, for example maybe in whatnowproc, so after
>grinding your teeth about the undecoded base64 you at least see a
>message suggesting a remedy for this after exiting the editor.  I
>realise though that accurate detection of circumstances where it would
>be helpful to display such a message might not be easy, but it would save a
>certain amount of repetition.

Seconded.

Some folks... me included... need to be very explicitly knocked upside the
head in order to make sure that we get the message.

Or maybe replyfilter should just become part of the (shipped) default
configuration.

Lord knows that well over 50% of all emails I've received over the past
several years contain either base64 or HTML or both, so that would seem
to make some sense.


Regards,
rfg


P.S.  Speaking as a relic of a now long bygone era (which I am), I really
do wish that all of this base64 and HTMLized email stuff would get off my
lawn.  (Yes, I'm an old geezer.)

It all annoys me very much, because I know damn well that all of this stupid
HTML stuff... which makes all mails about a factor of ten bigger... isn't
actually carrying any useful additional information that could not have
been represented and expressed just as well with good old plain text.

But I already lost this battle at least a decade or two ago.  Sigh.  Oh well.
We live and we adapt.



Re: Formatting HTML to Text: netrik.

Jude DaShiell
I'm an old geezer too.  I even go so far as to think people who use
computer mice just by virtue of that use are diminishing their potential
productivity.
In Slint, and I expect other distributions, par is a paragraph
reformatter (par2cmdline, despite the similar name, is a PAR2 parity-file
tool and unrelated).  Slint comes with par already installed by
default.
I need to read up on both since if I can find tools that work better
than fmt and newfmt that will make some of my work easier.



Re: Formatting HTML to Text: netrik.

David Levine-3
In reply to this post by nmh-workers mailing list
spaceman via nmh-workers wrote:

> mhfixmsg does this pretty well I think, you just have to use it in
> your procmail rules.

mhfixmsg doesn't have to be invoked from procmail.  I do that so that
I can easily grep my messages, but that's certainly not required.

I use the aliases in doc/nmh/contrib/replaliases by sourcing it from
my .bashrc.  Then, for example, to reply to any message to which I
won't be adding any attachments, I run rtm on the message.  If I
will be adding attachments, I use rt.

The downside is that I sometimes end up editing quoted-printable.
If that's unacceptable, I would think that we could come up with
other ways.

> The downside is that you don't get to keep the original email unless
> you make a copy of it

You do get to keep it if you want to, see the example in the mhfixmsg
man page.  Other arrangements for keeping the original could also be
handled, I expect limited only by imagination.

David


Re: Formatting HTML to Text: netrik.

Ken Hornstein-2
In reply to this post by Ronald F. Guilmette-2
>Apologies if I offended, unintentionally.

Not offended at all; I am more lamenting that we didn't do better.

>>If you are building from source, _if_ you have one of the common
>>text-based HTML browsers available (I use w3m, but we also support lynx
>>and elinks), then show(1) does the right thing, and pretty reasonably I
>>would say.
>
>OK, well, but as noted above, I've screwed things up, badly, so what
>should I *actually* have in my .mh_profile file for the definitions of
>showproc and showmimeproc? what are the current (proper) defaults for
>those?  And do the defaults have options to show me FULL headers for
>each message?  If not, how do I make that happen?

Ok, these things are not hard.

First, delete the entries for showproc and showmimeproc; the defaults
for these programs are fine (they are, respectively, mhl and mhshow; you
can look at mh-profile(5) for the exact defaults).

Secondly, I see that in your profile you have:

        mhshow-show-text/html: more '%F'

Which isn't helping things, so delete that also.  I don't know if the
package for nmh on FreeBSD sets things up correctly, but if you look
at mhn.defaults (which should be in the 'etc' directory somewhere) you
might see a line that looks like:

mhshow-show-text/html: charset=%{charset}; %lw3m -dump ${charset:+-I "$charset"} -T text/html %F

If you DON'T, then you could add that to your profile (assuming you are using
w3m) and that should make 'show' mostly do the right thing.
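
A quick way to check whether such an entry is already present can be
sketched as below.  The helper name has_html_entry is made up for
illustration and is not part of nmh, and the location of mhn.defaults
varies by installation, so the file is passed in as an argument.

```shell
# Succeeds if the given nmh defaults or profile file already has an
# mhshow-show-text/html entry.
# has_html_entry is a hypothetical helper name, not part of nmh.
has_html_entry() {
    grep -qi '^mhshow-show-text/html:' "$1"
}

# Demonstration with a throwaway file standing in for mhn.defaults:
tmp=$(mktemp)
printf '%s\n' 'mhshow-show-text/html: placeholder' > "$tmp"
if has_html_entry "$tmp"; then
    echo "entry present"
else
    echo "entry missing"
fi
rm -f "$tmp"
```

Run against the real mhn.defaults and against ~/.mh_profile, this
shows whether the line still needs to be added.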

I also see you have:

        mhshow-charset-iso-8859-1: %s

I'd delete that also (we have native support for iconv).  And just as a
matter of future-proofing, I'd recommend using a UTF-8 locale if you aren't
already.

Now, for showing full headers ... this is also not hard, but what you need
to do isn't obvious because things are kinda spread out.  The program by
default that handles message display is mhl, so you should read
the mhl man page.  Header fields are called 'components' in MH-speak,
so normally you need to list all of the components you want displayed
(or ignored).  But if you want them all:

       The component "Extras" will output all of the components of the message
       which were not matched by  explicit  components,  or  included  in  the
       ignore list.  If this component is not specified, an ignore list is not
       needed since all non-specified components will be ignored.

So just list the ones in the mhl form file you want first then use "extras"
to display the rest.  In fact ... it looks like this is actually the default
mhl format?  So maybe the defaults do what you want already?  Well, except
that maybe you want to delete the "ignores" line.

But here's where things are "fun".  For historical reasons, show/mhl don't
handle MIME messages; that gets passed off to mhshow.  This was fine back
20-30 years ago, but nowadays most messages are MIME messages.  What happens
with a MIME message is mhshow uses mhl to display the headers and then
handles the body of the message on its own.  So to get full headers for
MIME messages you need to modify the mhl format used by mhshow to
display the headers; the default one is mhl.headers, and I believe that
again what you need to do is simply delete the 'ignores' line.  What
I do is copy the ones provided by nmh to my local 'Mail' directory and
modify them appropriately; that way I don't need to modify my profile.
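
The copy-and-modify step can be sketched as follows, assuming the form
file writes the ignore list as a single line starting with "ignores=".
The helper name copy_without_ignores is hypothetical, and the paths in
the usage comment are assumptions that depend on where nmh is installed.

```shell
# Copy an mhl form file, leaving out any "ignores=" line, so that the
# copy no longer suppresses header components.
# copy_without_ignores is a hypothetical helper name, not part of nmh.
copy_without_ignores() {
    src=$1
    dst=$2
    # grep -v exits with status 1 when no lines are selected;
    # treat that as success as well.
    grep -v '^ignores=' "$src" > "$dst" || [ $? -eq 1 ]
}

# Possible usage (paths are assumptions, adjust for your system):
#   copy_without_ignores /usr/local/etc/nmh/mhl.headers "$HOME/Mail/mhl.headers"
```

Because the modified copy lives in the local Mail directory, the
system-wide file stays untouched.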

So, I believe that will do what you want with regards to message display.
If that does not work, please let us know.

>(I'm just real glad that you folks did yours in Perl, cuz that happens
>to be one of the few languages that I actually speak somewhat. It would
>all have been utterly Greek to me if it was in Python or Ruby or Go or
>something.)

I hear you; maybe we're both old enough to remember when Perl was the
hot new language, and now we sound like grumpy old men.  "Kids these
days, inventing all of these new languages!".

>>  See the comments at the top of replyfilter for usage information; it
>>  will likely require some adjustment for your site.
>
>I don't follow.  What will need adjusting, exactly?

Well, I could improve the documentation there as well.  Specifically
you might need to change $filterprogram, $outcharset, and @htmlconv.
You asked about 'par'; you can download it here:

        http://www.nicemice.net/par/

I was under the vague impression that there is a package for FreeBSD
for it.  But you don't HAVE to use it; 'fmt' would probably be an
acceptable substitute.
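
To see which of the two formatters is available before editing
$filterprogram, something like the following sketch could be used.
The helper name pick_formatter is made up for illustration; it only
reports what is on PATH and does not touch replyfilter itself.

```shell
# Print the name of the first paragraph formatter found on PATH,
# preferring par and falling back to fmt.
# pick_formatter is a hypothetical helper name for illustration.
pick_formatter() {
    for prog in par fmt; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
            return 0
        fi
    done
    return 1
}

pick_formatter || echo "neither par nor fmt found" >&2
```

Whatever name it prints is a candidate value for $filterprogram.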

>>I thought we did a pretty good job of putting this information out
>>there, but clearly in the 7 years since 1.5 has come out everyone
>>hasn't gotten the word yet.
>
>Um.... I don't suppose it will help my case any if I were to say
>something like "I was busy, and had a LOT of stuff to do", would it?
>(But that's actually true, and I hope that I've done some Good Things
>for the Internet and for humanity in that time.)

I mean, that's a fine answer, and I get that.  We just had a number of
people who have said in the 7 (!) years since we released 1.5, "Hey,
did you guys ever come up with any solution to dealing with MIME message
replies?" and I'm like, "Um, YES!".  So I want to understand where we
failed.  "I was busy, I had something which sorta-worked and I didn't
have time to fix it" is a fine answer.  I'd like to figure out how
to make THAT better, but I am busy as well.  Sigh.

So, anyway, I think we have the tools to make things much better for you.
Hopefully in the future it will all be mostly-automatic.  If you run into
problems, please let us know here.

--Ken


Re: Formatting HTML to Text: netrik.

Ronald F. Guilmette-2
In reply to this post by Michael Richardson-5
Catching up on my emails...

In message <7376.1561672209@localhost>,
Michael Richardson <[hidden email]> wrote:

>Ronald F. Guilmette <[hidden email]> wrote:
>    > Quite simply, all I would wish for would be something that would -properly-
>    > convert -both- HTMLized emails -and- "Content-Transfer-Encoding: base64"
>    > emails (like this one I'm responding to) into good old fashioned ascii,
>    > at least for purposes of the "show" and "repl" commands.  I have my
>    > jury-rigged solution working adequately well for the base64 encoding
>    > still, but only for the "show" command, which means that I have to do
>    > some manual cutting-and-pasting when/if I want to reply to a base64
>    > encoded email. :-(
>
>I would also like that for the cases where I want to use show.

I'm going to be trying the solution that was suggested to me... hopefully
today.

>I use mh-e, and I mostly have things configured right:
>  1) use text/plain if it exists.

Ummm... YEA!  Gosh!  I would hope so!

>  2) format text/html if no text/plain
>  (3) but often text/plain is bullshit-pseudo-HTML and you need to avoid it.

I do not have any understanding of your points (2) and (3).  Could you
elaborate?

>The additional problem is that reply yanks text from formatted text/html
>rather than text/plain.

Yes, that's a problem.  And it should most certainly get fixed.  (My own
crappy/broken solution that I slapped together with spit and bubble gum
years ago did at least try to grab a text/plain section, when available.)

>The good HTML formatters in mh-e (Emacs) are slow,
>and the fast ones do a poor job.

Are you saying that, for example, lynx does a crummy job?

It's pretty fast.  Does that mean it also produces crappy results?

I have trouble believing that in this day and age, when we have had REALLY
widespread use of HTML for around a couple of decades now, that there are
still -zero- tools that can quickly render HTML into plain text without
mucking it up somehow.


Regards,
rfg


Re: Formatting HTML to Text: netrik.

Ronald F. Guilmette-2
In reply to this post by Ken Hornstein-2
In message <[hidden email]>,
Ken Hornstein <[hidden email]> wrote:

>>>  - replfilter depends on 'par', which users may not have installed,

FreeBSD comes with a tool called "fmt" pre-installed, so I guess I need
to use that in place of par, yes?

OK, so where do I make that substitution?


Regards,
rfg


Re: Formatting HTML to Text: netrik.

Ken Hornstein-2
>>>>  - replfilter depends on 'par', which users may not have installed,
>
>FreeBSD comes with a tool called "fmt" pre-installed, so I guess I need
>to use that in place of par, yes?

You could, or you could install par from FreeBSD ports (I double-checked
and it is in there).

>OK, so where do I make that substitution?

I think my other email explained the variables in replyfilter that you
should look at; really, start at the top, and page down a bit.  There's
a small paragraph for each one of those variables that hopefully explains
enough.

--Ken


Re: Formatting HTML to Text: netrik.

Ronald F. Guilmette-2
In reply to this post by Ken Hornstein-2
In message <[hidden email]>,
Ken Hornstein <[hidden email]> wrote:

>Ok, these things are not hard.

Speak for yourself!  My little pea brain is still trying to grok all this.

>First, delete the entries for showproc and showmimeproc; the defaults
>for these programs are fine (they are, respectively, mhl and mhshow; you
>can look at mh-profile(5) for the exact defaults).

My man page sez the defaults are:
      showmimeproc: /usr/local/bin/mhshow
       showproc: /usr/local/libexec/nmh/mhl

Hope those are right!

>Secondly, I see that in your profile you have:
>
> mhshow-show-text/html: more '%F'
>
>Which isn't helping things, so delete that also.

Done.

>I don't know if the
>package for nmh on FreeBSD sets things up correctly, but if you look
>at mhn.defaults (which should be in the 'etc' directory somewhere) you
>might see a line that looks like:
>
>mhshow-show-text/html: charset=3D%{charset}; %lw3m -dump ${charset:+-I "$c=
>harset"} -T text/html %F

I have *no* line in /usr/local/etc/nmh/mhn.defaults file that begins with
or even contains the literal string "mhshow-show-text/html".  So I have tried
to add what you posted to my test account ~/.mh_profile file, but the
line you sent got garbled in transmission because your posting was:

    Content-Transfer-Encoding: quoted-printable

I have tried my best to infer what you wrote. Please check and tell me if
this is exactly correct:

   https://pastebin.com/raw/MQLysN7U
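
The garbling in the quoted line looks like ordinary quoted-printable
encoding: "=3D" stands for a literal "=", and a "=" at the very end
of a line is a soft line break.  A minimal sketch that undoes just
those two things is shown below.  The name qp_unfold is hypothetical,
and this is not a full quoted-printable decoder; it ignores every
other "=XX" escape.

```shell
# Undo the two quoted-printable artifacts seen in the quoted line:
# join soft line breaks (a trailing "="), then turn "=3D" back into "=".
# qp_unfold is a hypothetical helper name, not a complete QP decoder.
qp_unfold() {
    awk '
        /[=]$/ { sub(/[=]$/, ""); printf "%s", $0; next }
        { print }
    ' | sed 's/=3D/=/g'
}

# The two garbled lines, without the leading ">" quote markers:
printf '%s\n' \
    'mhshow-show-text/html: charset=3D%{charset}; %lw3m -dump ${charset:+-I "$c=' \
    'harset"} -T text/html %F' | qp_unfold
# prints: mhshow-show-text/html: charset=%{charset}; %lw3m -dump ${charset:+-I "$charset"} -T text/html %F
```

The result is a single profile line with no "=3D" and no break in
the middle of "$charset".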

>If you DON'T, then you could add that to your profile

Done.

>(assuming you are using w3m)

I wasn't.  In fact I never even heard of the thing before now.

But fortunately, there is a port/package of that for FreeBSD and I have
just installed it.

>and that should make 'show' mostly do the right thing.

Mostly, it looks OK, but I keep on getting stuff like this at the top:

   [ part 1 - text/html -   1.3KB  ]

I actually kind of like that.  I just hope that it won't show up when
I go to "repl".

>I also see you have:
>
> mhshow-charset-iso-8859-1: %s
>
>I'd delete that

Done.

>also (we have native support for iconv).  And just as a
>matter of future-proofing, I'd recommend using a UTF-8 locale if you aren't
>already.

How would I know, one way or the other?

Here is what I have set.  Is this what you are talking about?  Or do I
need to fiddle something else entirely?

    % env | fgrep LOCALE
    XTERM_LOCALE=en_US.UTF-8
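
Note that XTERM_LOCALE is set by xterm itself.  Programs normally take
their locale from LC_ALL, LC_CTYPE and LANG (in that order of
precedence), and the standard "locale charmap" command prints the
character set currently in effect.  A minimal sketch of a check on
those variables follows; the helper name is_utf8_locale is made up
for illustration.

```shell
# Succeeds if the effective character-type locale setting names UTF-8.
# Precedence follows POSIX: LC_ALL, then LC_CTYPE, then LANG.
# is_utf8_locale is a hypothetical helper name for illustration.
is_utf8_locale() {
    setting=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
    case $setting in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale; then
    echo "UTF-8 locale in effect"
else
    echo "not a UTF-8 locale (or none set)"
fi
```

If none of the three variables is set, the check reports no UTF-8
locale, even if the terminal itself can display UTF-8.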

>Now, for showing full headers ... this is also not hard...

Wait!  Looking at some messages in my test account, where I have already
made all of the changes suggested above, it appears that I *am* seeing
the full headers already when I just do "show".  I am quite certainly
seeing all of the following header types (which is good):

    Return-Path:
    X-Original-To:
    Delivered-To:
    Received:
    X-Virus-Scanned:
    Reply-To:
    From:
    Subject:
    Date:
    MIME-Version:
    Content-Type:
    X-Priority:
    X-MSMail-Priority:
    X-Mailer:
    X-MimeOLE:
    Message-Id:

It sure looks to me like *all* header types are *already* being displayed
when I just do "show", so why are you thinking that I need to do something
additional in order to see the full headers?

(The only problem is that the headers, as they are displayed with "show",
appear to have had some extra whitespace inserted... by *something*... just
after the colons in each case.  How do I shut THAT off?)

>... but what you need
>to do isn't obvious because things are kinda spread out.  The program by
>default that handles message display is mhl, so you should read...

I'm going to leave all this for now, since I already seem to be getting
full headers.

>... So to get full headers for
>MIME messages you need to modify the mhl format used by mhshow to
>display the headers; the default one is mhl.headers, and I believe that
>again what you need to do is simply delete the 'ignores' line.

Ummmm... Here is the /usr/local/etc/nmh/mhl.headers file that is these
days being distributed & installed with nmh itself for FreeBSD:

   https://pastebin.com/raw/w7gmsWVJ

I'm not seeing any "ignore" lines in there.  So maybe, in my case, this
whole issue has been "pre-fixed" to my liking already (?)

Ohhhhhhh!  Wait!  In the Mail/ directory of my "test" account it seems
that I already had a file called "mhl.headers" in there, and I guess
that is overriding the system default.  Here is the Mail/mhl.headers
that's already in my test account.  (Is this really no good?  Seems
perfectly serviceable to me!)

    https://pastebin.com/raw/BdeQQjCF

Well, so I deleted the Mail/mhl.headers file for my test account
and did a "show" again on one piece of old spam that had been received
back in Feb. and this is what that looks like now when I do "show":

     https://pastebin.com/raw/JLAGbvbM

This seems fairly messed up to me.  I understand that the default systemwide
mhl.headers file is (somehow) preventing me from seeing the Received: headers,
but even ignoring that small fly in the ointment, why the bleep am I now
getting (what I assume are) the "important" headers, shown at the top,
followed by a blank line and then a bunch of *other* header lines?  Is
that really the effect that the default mhl.headers file is *supposed*
to produce??


Now on to repl...

>>>  See the comments at the top of replyfilter for usage information; it
>>>  will likely require some adjustment for your site.
>>
>>I don't follow.  What will need adjusting, exactly?
>
>Well, I could improve the documentation there as well.  Specifically
>you might need to change $filterprogram, $outcharset, and @htmlconv.

OK, so I put a copy of the distributed replyfilter Perl script into
/usr/local/bin/ and I edited just one line, which was the definition
of $filterprogram:

    $filterprogram = 'fmt';

(Apparently, "fmt" is what I have by default on my FreeBSD system.)

I left those other two variables with the definitions they already had:

   $outcharset = 'utf-8';
   @htmlconv = ('w3m', '-dump', '-cols', $maxcolwidth - 2, '-T', 'text/html',
             '-O', $outcharset, '-I');

I think that will be OK, because I think that I have my xterm windows set
to permit/support UTF-8.  (But how can I check this to be sure?)

Anyway, this is most definitely NOT working.  Look at this, which I cut
and pasted from my xterm window when I tried to do a repl:

    https://pastebin.com/raw/KzuXRg04

I don't think that this is or was the intended effect.

>So, anyway, I think we have the tools to make things much better for you.

I'm still hoping that is the case, but it isn't looking good so far.

I hope that you can guide me on how to debug these issues.  (I'm sure that
it would help a lot if I actually knew what the hell I was doing, but I am
hoping that together we can struggle along and find solutions even in the
absence of that.)

>Hopefully in the future it will all be mostly-automatic.  If you run into
>problems, please let us know here.

You betcha!  Consider it done!


Regards,
rfg


Re: Formatting HTML to Text: netrik.

David Levine-3
rfg wrote:

> Anyway, this is most definitely NOT working.  Look at this, which I cut
> and pasted from my xterm window when I tried to do a repl:

Does this work better?

  repl -filter mhl.replywithoutbody -convertargs text/html '' -convertargs text/plain ''

Then enter mime at the "What now?" prompt if you want to edit the draft.

David


Re: Formatting HTML to Text: netrik.

Ken Hornstein-2
In reply to this post by Ronald F. Guilmette-2
Focusing on just ONE thing, since I just woke up ...

>Anyway, this is most definitely NOT working.  Look at this, which I cut
>and pasted from my xterm window when I tried to do a repl:
>
>    https://pastebin.com/raw/KzuXRg04
>
>I don't think that this is or was the intended effect.

Did you follow ALL of the directions at the top of replyfilter?  Specifically
the changes you need to make to your profile and the mhl reply filter
you need to create?  If they are unclear please let me know.

--Ken
