[nmh-workers] inc: Unable to find a line terminator after 32768 bytes

Andy Bradford-2
Hello,

This is the first time I've ever seen such an error from inc. In looking
at  the message  that is  causing the  problem, apparently  it's a  MIME
message that has a base64 encoded MIME body that is all on one line that
even sed has a hard time parsing:

$ time (cat bigmessage | sed -ne '62p' | wc)
       1       1 11370773
    1m08.62s real     0m15.01s user     0m23.09s system

This just  seems ridiculous.  I'm tempted  to modify  my SMTP  server to
reject such messages, but I'm curious to know if this is simply a bug in
the sending software or if something has changed?

Unfortunately, I  cannot share the  entire message,  but I'm not  sure I
need to.  It should  be easily  reproduced by  simply base64  encoding a
large file and  putting it all on  one line. Here are  the redacted MIME
headers:

------=_Part_8167195_1805258438.1568043535371
Content-Type: application/octet-stream;
        name="attachment.eml"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
        filename="attachment.eml"
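
The reproduction described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `build_message`, the addresses, the boundary string, and the top-level headers are invented for the example; only the MIME part headers are taken from the redacted sample.

```python
import base64


def build_message(payload: bytes, boundary: str = "example-boundary") -> bytes:
    """Return a MIME message whose base64 body sits on a single line."""
    mark = boundary.encode("ascii")
    # b64encode() never inserts line breaks, so the whole body is one line.
    encoded = base64.b64encode(payload)
    lines = [
        b"From: sender@example.org",        # hypothetical address
        b"To: recipient@example.org",       # hypothetical address
        b"Subject: one-line base64 test",
        b"MIME-Version: 1.0",
        b'Content-Type: multipart/mixed; boundary="' + mark + b'"',
        b"",
        b"--" + mark,
        b"Content-Type: application/octet-stream;",
        b'        name="attachment.eml"',
        b"Content-Transfer-Encoding: base64",
        b"Content-Disposition: attachment;",
        b'        filename="attachment.eml"',
        b"",
        encoded,
        b"--" + mark + b"--",
        b"",
    ]
    return b"\r\n".join(lines)
```

Base64 expands data by a factor of 4/3, so a payload of about 8.5 MB should give a single line of about 11 MB, comparable to the one reported here.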

Is this something I should report to  the sender as a clear violation of
RFC5322,  which as  far as  I can  tell, restricts  line lengths  to 998
characters, or is there something special about MIME that supersedes the
limit and which means inc needs fixing?
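
One way to confirm the violation before reporting it is to measure the lines directly. A minimal sketch, assuming the message is available as a binary stream; `overlong_lines` is a hypothetical helper, and 998 is the RFC 5322 limit, which excludes the terminating CRLF.

```python
import io

RFC5322_MAX = 998  # characters per line, not counting the CRLF


def overlong_lines(stream, limit=RFC5322_MAX):
    """Yield (line_number, length) for every line longer than limit."""
    for number, line in enumerate(stream, start=1):
        length = len(line.rstrip(b"\r\n"))
        if length > limit:
            yield number, length


sample = io.BytesIO(b"short\r\n" + b"A" * 2000 + b"\r\n")
print(list(overlong_lines(sample)))  # prints [(2, 2000)]
```

Note that iterating over the stream still loads each whole line into memory, which is fine for a one-off check on a message of this size.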

Thanks,

Andy
--
TAI64 timestamp: 400000005d76af72



--
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers

Re: inc: Unable to find a line terminator after 32768 bytes

Ken Hornstein-2
>This is the first time I've ever seen such an error from inc. In looking
>at  the message  that is  causing the  problem, apparently  it's a  MIME
>message that has a base64 encoded MIME body that is all on one line that
>even sed has a hard time parsing:

So ... yeah, that's a total violation of RFC 5322 (sendmail, for all of
its faults, will force a newline when it encounters those huge-lined
messages, although that has its own problems).  Nothing in MIME
supersedes any of those limits; those messages shouldn't be generated.

Well, okay, there is ONE minor exception; content which has a
Content-Transfer-Encoding of "binary" does not require a CR-LF pair
every 1000 bytes.  That doesn't apply in this case, and I have never
actually seen any binary-encoded content in the wild.  I only mention
it for completeness.

In a perfect world I think we SHOULD parse those messages (up to the limits
of virtual memory), but right now we don't.

>Is this something I should report to  the sender as a clear violation of
>RFC5322,  which as  far as  I can  tell, restricts  line lengths  to 998
>characters, or is there something special about MIME that supersedes the
>limit and which means inc needs fixing?

Based on my personal experience ... you may not be able to find anyone
who really cares about fixing that (I have run into some people who
care about fixing broken email, but most of the time I get ignored or
blown off).  Just to warn you.

--Ken


Re: inc: Unable to find a line terminator after 32768 bytes

Andy Bradford-2
Thus said Ken Hornstein on Mon, 09 Sep 2019 17:04:05 -0400:

> In a perfect world  I think we SHOULD parse those  messages (up to the
> limits of virtual memory), but right now we don't.

That's actually  how I figured  this problem out.  I found that  my POP3
daemon kept  crashing and when  I investigated it,  I found that  it was
because  it didn't  have  sufficient  memory to  respond  to inc's  RETR
command. After I increased the amount of memory that the POP3 daemon was
allowed to  allocate, the RETR  command succeeded,  but then I  ended up
with an inc that refused to incorporate emails.

Whether or not  we think making inc handle nonconforming  lines is worth
tackling, it  might be  a good  idea to  make inc  handle the  failure a
little better.  What happened instead  was that inc exited  after having
partially RETR'ieved the message, without having told the POP3 server to
DELE the  ones it had already  successfully pulled down. So  each time I
ran inc, it would pull down the messages, die on the same bogus message,
and repeat; so that I ended up with a few duplicates.

I think issuing a warning and leaving  a bad message on the server would
be better than aborting the entire POP3 session and causing a repeat.

> Based on my personal experience ... you may not be able to find anyone
> who really  cares about fixing that  (I have run into  some people who
> care about  fixing broken  email, most  of the time  I get  ignored or
> blown off). Just to warn you.

Yeah, I just wanted to double-check my  facts before I sent off an email
asking them if they are aware of their misbehaving mail system. I'll see
how they  react (if they even  get the message---it's difficult  to find
functioning postmaster@ addresses these days).

Thanks,

Andy
--
TAI64 timestamp: 400000005d7706d8




Re: inc: Unable to find a line terminator after 32768 bytes

Ralph Corderoy
In reply to this post by Andy Bradford-2
Hi Andy,

> $ time (cat bigmessage | sed -ne '62p' | wc)
>        1       1 11370773

I expect you know how to remedy this so you can read the email.
Something like

    perl -lpe 'length > 39 and s/.{16}/$&\n/g'

with those numbers, which I used for testing, adjusted to suit.
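
The same re-wrapping idea can be sketched in Python. The function name `rewrap` and its defaults are assumptions for illustration: 998 is the RFC 5322 line limit and 76 is the usual maximum for base64 lines in MIME. Unlike the one-liner, this version does not emit an empty trailing piece when the length is an exact multiple of the width.

```python
def rewrap(line, width=76, threshold=998):
    """Split a line longer than threshold into pieces of at most width."""
    if len(line) <= threshold:
        return [line]
    return [line[i:i + width] for i in range(0, len(line), width)]


# Same test numbers as the perl example: wrap at 16 once a line exceeds 39.
print(rewrap("A" * 40, width=16, threshold=39))
```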

--
Cheers, Ralph.


Re: inc: Unable to find a line terminator after 32768 bytes

Ralph Corderoy
In reply to this post by Ken Hornstein-2
Hi Ken,

> In a perfect world I think we SHOULD parse those messages (up to the
> limits of virtual memory), but right now we don't.

My regular statement: no, we should not accept such drivel into the
system.  Users should be aware it's arriving.  If they then want to fix
it, then mhfixmsg(1) is where knowledge of all the world's crud can be
put.  And some users, like David, can arrange for it to handle all
incoming email so they never need to know.

    https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00

I would not like virtual memory to have to be exhausted, crippling the
machine's performance, before nmh finally bails out and gives a `line
too long' that it could have stated much, much earlier.
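
Bailing out early with bounded memory can be sketched as follows. This is not nmh's code: `LineTooLong` and `read_bounded_line` are hypothetical names, and the 32768 default is simply the figure from the error message in this thread.

```python
import io


class LineTooLong(Exception):
    """Raised when no line terminator appears within the allowed size."""


def read_bounded_line(stream, limit=32768):
    """Read one line, refusing to buffer more than limit bytes."""
    # readline(limit) returns at most `limit` bytes, so memory use is capped.
    line = stream.readline(limit)
    # A full-size read without a newline means the terminator was not found
    # in time. (A final unterminated line of exactly `limit` bytes is also
    # rejected by this simple check.)
    if len(line) == limit and not line.endswith(b"\n"):
        raise LineTooLong("no line terminator after %d bytes" % limit)
    return line


stream = io.BytesIO(b"Subject: test\n" + b"A" * 100000)
print(read_bounded_line(stream))
try:
    read_bounded_line(stream)
except LineTooLong as exc:
    print("rejected:", exc)
```

The error is raised after reading at most `limit` bytes of the offending line, long before the rest of it is touched.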

--
Cheers, Ralph.


Re: inc: Unable to find a line terminator after 32768 bytes

Ralph Corderoy
In reply to this post by Andy Bradford-2
Hi Andy,

> What happened instead was that inc exited after having partially
> RETR'ieved the message, without having told the POP3 server to DELE
> the ones it had already successfully pulled down.

Out of interest, you may want to see how fetchmail(1) handles this.

--
Cheers, Ralph.


Re: inc: Unable to find a line terminator after 32768 bytes

Ken Hornstein-2
In reply to this post by Ralph Corderoy
>> In a perfect world I think we SHOULD parse those messages (up to the
>> limits of virtual memory), but right now we don't.
>
>My regular statement that no we should not accept such dribble into the
>system.  Users should be aware it's arriving.  If they then want to fix
>it, then mhfixmsg(1) is where knowledge of all the world's crud can be
>put.

The problem I have with THAT is that pretty much every other MUA deals with
this just fine; that makes us the odd one out.  And I see a chicken and
egg problem here; if we can't incorporate such a message, we can't really
have mhfixmsg deal with it.

Also, thinking more about this makes me think that at least for inc
we should be able to deal with this WITHOUT having to parse everything
line-by-line.  Even if it was a single line of 200 MB, you should be able
to write that out without having to malloc() out 200 MB.

--Ken


Re: inc: Unable to find a line terminator after 32768 bytes

Ken Hornstein-2
In reply to this post by Ken Hornstein-2
>That's actually  how I figured  this problem out.  I found that  my POP3
>daemon kept  crashing and when  I investigated it,  I found that  it was
>because  it didn't  have  sufficient  memory to  respond  to inc's  RETR
>command. After I increased the amount of memory that the POP3 daemon was
>allowed to  allocate, the RETR  command succeeded,  but then I  ended up
>with an inc that refused to incorporate emails.

So how big WAS this message, actually?  I'm trying to understand the scope
of the problem.

>Whether or not  we think making inc handle nonconforming  lines is worth
>tackling, it  might be  a good  idea to  make inc  handle the  failure a
>little better.  What happened instead  was that inc exited  after having
>partially RETR'ieved the message, without having told the POP3 server to
>DELE the  ones it had already  successfully pulled down. So  each time I
>ran inc, it would pull down the messages, die on the same bogus message,
>and repeat; so that I ended up with a few duplicates.
>
>I think issuing a warning and leaving  a bad message on the server would
>be better than aborting the entire POP3 session and causing a repeat.

Architecturally, this is difficult.

We issue a DELE after every message we RETR, but those DELEs don't get
committed until you issue the QUIT (this is part of the POP3 protocol).
We call die() a lot and that just means we call exit() and never issue
the QUIT.  Really, I think that the best course of action would be that
inc always tries to write something out (unless it encounters something
like an I/O error) and exits cleanly.
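
The DELE-then-QUIT behaviour, and the "warn and leave the bad message on the server" idea from earlier in the thread, can be sketched in Python. This is an illustration, not how inc works: `fetch_all` and `store` are hypothetical names, and `conn` is assumed to behave like a `poplib.POP3` object (`stat`, `retr`, `dele`, `quit`).

```python
def fetch_all(conn, store):
    """Retrieve every message; return a list of (number, error) for failures.

    conn  -- object with poplib.POP3-style stat/retr/dele/quit methods
    store -- callable(number, lines) that raises if a message is unusable
    """
    skipped = []
    try:
        count, _total_size = conn.stat()
        for number in range(1, count + 1):
            try:
                _response, lines, _octets = conn.retr(number)
                store(number, lines)
            except Exception as exc:  # broad on purpose for this sketch
                # Leave the message on the server and carry on. After a
                # real protocol error the session may not be usable.
                skipped.append((number, exc))
                continue
            # Only mark messages that were stored successfully.
            conn.dele(number)
    finally:
        # The server commits the DELEs only when it receives QUIT.
        conn.quit()
    return skipped
```

With a real server, `conn` could be created with `poplib.POP3_SSL(host)` followed by `user()` and `pass_()`; the caller would then print a warning for each entry in the returned list.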

--Ken


Re: inc: Unable to find a line terminator after 32768 bytes

Andy Bradford-2
Thus said Ken Hornstein on Tue, 10 Sep 2019 09:15:00 -0400:

> So how  big WAS this message,  actually? I'm trying to  understand the
> scope of the problem.

The  entire size  of the  message  on disk  (including additional  trace
headers added by  my MTA) is 11,374,046 while the  size of the offending
line is 11,370,773. That means that  the rest of the message headers and
text/plain part of the message occupy 3,273 bytes.
 
> Really,  I think  that the  best course  of action  would be  that inc
> always tries  to write something  out (unless it  encounters something
> like an I/O error) and exits cleanly.

Actually I failed to report that inc *did* write out something. It wrote
out until the MIME  content started, so it got up to  the headers of the
MIME  part and  then while  trying  to scan  the next  line issued  that
error---the resulting file was truncated. I found the problem later when
I was reading messages with EXMH which showed the attachment, but when I
saved the attachment it was a 0  byte file. EXMH must be lenient when it
comes to missing MIME part markers (maybe it just assumes end-of-file is
good enough).

Thanks,

Andy
--
TAI64 timestamp: 400000005d77a9cf




Re: inc: Unable to find a line terminator after 32768 bytes

Ralph Corderoy
In reply to this post by Ken Hornstein-2
Hi Ken,

> So how big WAS this message, actually?

wc(1) said the long line was 11,370,773 bytes.

--
Cheers, Ralph.


Re: inc: Unable to find a line terminator after 32768 bytes

Ken Hornstein-2
In reply to this post by Ken Hornstein-2
>The  entire size  of the  message  on disk  (including additional  trace
>headers added by  my MTA) is 11,374,046 while the  size of the offending
>line is 11,370,773. That means that  the rest of the message headers and
>text/plain part of the message occupy 3,273 bytes.

It occurs to me that allocating 11 MB shouldn't be a problem on any modern
system.  But really, this isn't necessary; once inc(1) parses the headers
it doesn't care about the content.  It could just go in a loop and
read data and write it out.  All it REALLY cares about is converting
\r\n to \n.
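
The read-and-write loop described above can be sketched in Python. This is an illustration of the idea, not inc's implementation; `copy_crlf_to_lf` is a hypothetical name. The only subtlety is a \r\n pair split across two reads, which is handled by holding back a trailing \r until the next chunk arrives.

```python
import io


def copy_crlf_to_lf(src, dst, chunk_size=65536):
    """Copy src to dst in fixed-size chunks, converting \\r\\n to \\n."""
    pending_cr = False
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if pending_cr:
            # Re-attach the \r held back from the previous chunk.
            chunk = b"\r" + chunk
            pending_cr = False
        if chunk.endswith(b"\r"):
            # Its \n may be the first byte of the next chunk.
            chunk = chunk[:-1]
            pending_cr = True
        dst.write(chunk.replace(b"\r\n", b"\n"))
    if pending_cr:
        dst.write(b"\r")  # a lone \r at end of input is kept as-is


src = io.BytesIO(b"line one\r\nline two\r\n")
dst = io.BytesIO()
copy_crlf_to_lf(src, dst, chunk_size=9)  # 9 splits the first \r\n pair
print(dst.getvalue())  # prints b'line one\nline two\n'
```

Memory use depends on `chunk_size` rather than on the length of any line, so an 11 MB or 200 MB line costs the same as a short one.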

--Ken


Re: inc: Unable to find a line terminator after 32768 bytes

Ralph Corderoy
In reply to this post by Andy Bradford-2
Hi Andy,

> EXMH must be lenient when it comes to missing MIME part markers (maybe
> it just assumes end-of-file is good enough).

mhstore(1) fares badly too.

    $ mhstore
    mhstore: bogus multipart content in message 20593
    storing message 20593 part 1 as file 20593.1
    $ echo $?
    0

--
Cheers, Ralph.
