Segfault and faulty focus (related regressions after v1.4.8)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Segfault and faulty focus (related regressions after v1.4.8)

Axel Svensson
I have found 3 regressions in ratpoison. They seem somewhat related.

General reproduction notes (for all 3 regressions):
- Use a docker container based on the debian:latest image.
- Run ratpoison against Xvfb.
- Important: Some reproductions might not succeed unless performed under
  the conditions above.
- If necessary I can provide more details for reproduction, let me know
  if you want this.

Regression 1: Segfault at startup.
- To reproduce:
  - Xvfb :81 -screen 0 2000x2000x24
  - export DISPLAY=:81
  - ./src/ratpoison
  - (press Ctrl-C to kill ratpoison)
  - ./src/ratpoison
- Expected: Ratpoison starts again
- Actual: Ratpoison segfaults
- Workaround: ./configure --without-xrandr
- Introduced in commit: 59b8671 @ libXext isn't used
- Notes:
  - This is the most serious problem. It is only present in certain
    circumstances, which explains why it hasn't been found sooner. Feel
    free to ask for more details. I could for example provide a
    Dockerfile that reproduces the problem, if you only let me know that
    you'd like to use it.

Regression 2: Faulty :focusup and :focusdown
- To reproduce:
  - ./configure --without-xrandr
  - Compile and run ratpoison
  - With no windows, split screen vertically and horizontally into four
    frames of equal size.
  - Go to the top (or bottom) right frame
  - :focusdown (or :focusup)
- Expected: The bottom (or top) right frame is focused
- Actual: The bottom (or top) left frame is focused
- Workaround: None found except running an earlier version.
- Introduced in: 5318f01 Update screen-aware focus* commands after
  changes to support XRandR
- Notes:
  - It seems that 5318f01 (which introduces this regression), is trying
    to correct some calculation of frame coordinates with respect to
    xrandr. Possibly, this is not working as intended when configured
    with --without-xrandr.

Regression 3: Faulty :focusright
- To reproduce:
  - ./configure --without-xrandr
  - Compile and run ratpoison
  - With no windows, split screen vertically and horizontally into four
    frames of equal size.
  - Go to the bottom left frame
  - :focusright
- Expected: The bottom right frame is focused
- Actual: The top right frame is focused
- Workaround: None found except running an earlier version.
- Introduced in: 4b7b046 Fix: check frame overlap in find_frame_left,
  find_frame_right
- Notes:
  - :focusleft might be faulty too; I haven't been able to test it.
  - It seems that 4b7b046 (which introduces this regression), is trying
    to make :focusleft and :focusright work the same way as :focusup and
    :focusdown. This is indeed the case, only that the introduced
    behaviour is faulty when ratpoison is compiled and run as indicated
    above.

Regards,
Axel Svensson

_______________________________________________
Ratpoison-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel
Reply | Threaded
Open this post in threaded view
|

Re: Segfault and faulty focus

Martin Samuelsson
Axel Svensson @ 2019-07-15 (Monday), 18:01 (+0200)
>I have found 3 regressions in ratpoison. They seem somewhat related.

Interesting... Would you be able to run that reproducable seg. fault though
gdb and generate a stack trace of the crash?

Given the complexity there is the risk that it might be hard for someone
else to reproduce it. So since you have an environment where you can trigger
the regression, maybe that's the best place to attempt hunting down what's
causing it?

I'd be happy to assist with how to use a gdb if you're new to that area.  
Possibly it might be easiest starting ratpoison through gdbserver in the
Dockerfile, in order to be able to attach to the crashing process from
outside the container.
--
/Martin

_______________________________________________
Ratpoison-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel
Reply | Threaded
Open this post in threaded view
|

Re: Segfault and faulty focus

Axel Svensson
On Tue, Jul 16, 2019 at 12:24 AM Martin Samuelsson
<[hidden email]> wrote:
> Interesting... Would you be able to run that reproducable seg. fault though
> gdb and generate a stack trace of the crash?

Never used gdb before, but I believe I have produced a stack trace
below. I might be reading this wrong, but it seems like
add_to_window_list (frame #0) is called with the rp_screen *s argument
set to null. This would explain why it segfaults at the first s->
member access. As a newbie I get a little confused by frame #1.
Despite scanwins having one argument and all call sites provide an
argument, gdb does not list any arguments for the frame. If this
simply means that s=null, then I'm not sure exactly why a segfault is
not produced when s->root is accessed when calling XQueryTree or
XGetWindowAttributes. Perhaps you can shed some light on that, or at
least give me a pointer for what information I need to provide next.
Thanks for taking your time!

========================================================================

root@2f988de7c2d8:~/ratpoison# gdb ./src/ratpoison
GNU gdb (Debian 8.2.1-2) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./src/ratpoison...done.
(gdb) run
Starting program: /root/ratpoison/src/ratpoison
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
add_to_window_list (s=0x0, w=4194305) at window.c:181
181       new_window->colormap = DefaultColormap (dpy, s->screen_num);
(gdb) info stack
#0  add_to_window_list (s=0x0, w=4194305) at window.c:181
#1  0x0000561e59326249 in scanwins () at manage.c:452
#2  0x0000561e5931218a in main (argc=1, argv=0x7ffdcb6af288) at main.c:432
(gdb) info frame
Stack level 0, frame at 0x7ffdcb6af070:
 rip = 0x561e5932a0ab in add_to_window_list (window.c:181); saved rip
= 0x561e59326249
 called by frame at 0x7ffdcb6af150
 source language c.
 Arglist at 0x7ffdcb6af028, args: s=0x0, w=4194305
 Locals at 0x7ffdcb6af028, Previous frame's sp is 0x7ffdcb6af070
 Saved registers:
  rbx at 0x7ffdcb6af050, rbp at 0x7ffdcb6af058, r12 at 0x7ffdcb6af060,
rip at 0x7ffdcb6af068
(gdb) info args
s = 0x0
w = 4194305
(gdb) up
#1  0x0000561e59326249 in scanwins () at manage.c:452
452           win = add_to_window_list (screen, wins[i]);
(gdb) info frame
Stack level 1, frame at 0x7ffdcb6af150:
 rip = 0x561e59326249 in scanwins (manage.c:452); saved rip = 0x561e5931218a
 called by frame at 0x7ffdcb6af1b0, caller of frame at 0x7ffdcb6af070
 source language c.
 Arglist at 0x7ffdcb6af068, args:
 Locals at 0x7ffdcb6af068, Previous frame's sp is 0x7ffdcb6af150
 Saved registers:
  rbx at 0x7ffdcb6af128, rbp at 0x7ffdcb6af130, r12 at 0x7ffdcb6af138,
r13 at 0x7ffdcb6af140,
  rip at 0x7ffdcb6af148
(gdb) info args
No arguments.
(gdb) up
#2  0x0000561e5931218a in main (argc=1, argv=0x7ffdcb6af288) at main.c:432
432       scanwins ();
(gdb) info frame
Stack level 2, frame at 0x7ffdcb6af1b0:
 rip = 0x561e5931218a in main (main.c:432); saved rip = 0x7fe58c6dd09b
 caller of frame at 0x7ffdcb6af150
 source language c.
 Arglist at 0x7ffdcb6af148, args: argc=1, argv=0x7ffdcb6af288
 Locals at 0x7ffdcb6af148, Previous frame's sp is 0x7ffdcb6af1b0
 Saved registers:
  rbx at 0x7ffdcb6af178, rbp at 0x7ffdcb6af180, r12 at 0x7ffdcb6af188,
r13 at 0x7ffdcb6af190,
  r14 at 0x7ffdcb6af198, r15 at 0x7ffdcb6af1a0, rip at 0x7ffdcb6af1a8
(gdb) info args
argc = 1
argv = 0x7ffdcb6af288
(gdb) up
Initial frame selected; you cannot go up.
(gdb)

_______________________________________________
Ratpoison-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel
Reply | Threaded
Open this post in threaded view
|

Re: Segfault and faulty focus (related regressions after v1.4.8)

Antoine Busque-2
In reply to this post by Axel Svensson
On Mon, Jul 15, 2019 at 12:01 PM Axel Svensson <[hidden email]> wrote:
>
> I have found 3 regressions in ratpoison. They seem somewhat related.
>
> Regression 1: Segfault at startup.
>
> Regression 2: Faulty :focusup and :focusdown
>
> Regression 3: Faulty :focusright
>

Hi Axel,

Thanks a lot for the detailed reproduction steps, it's really helpful.

This is the first time I hear of that segfault issue, so I'm afraid I
can't help there without investigating further myself.

As for the other two regressions, they also exist when ratpoison is
configured with xrandr enabled. In fact, I posted a patch to this
mailing list several months ago (a year and a half ago actually now
that I'm checking the dates, time flies!), but we never got around to
merging it.

You can find the patch in the ratpoison-devel archives [0] or on my
copy of the ratpoison repository on github, in the fix-focus branch
[1].

Let me know if that helps with your issue. If so I can try to get
those changes merged into master, assuming Jérémie approves.

[0] https://lists.nongnu.org/archive/html/ratpoison-devel/2018-02/msg00000.html
[1] https://github.com/abusque/ratpoison/tree/fix-focus

_______________________________________________
Ratpoison-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel
Reply | Threaded
Open this post in threaded view
|

Re: Segfault and faulty focus (related regressions after v1.4.8)

Axel Svensson
On Tue, Jul 16, 2019 at 4:08 AM Antoine Busque <[hidden email]> wrote:

>
> On Mon, Jul 15, 2019 at 12:01 PM Axel Svensson <[hidden email]> wrote:
> >
> > Regression 1: Segfault at startup.
> >
> > Regression 2: Faulty :focusup and :focusdown
> >
> > Regression 3: Faulty :focusright
> >
>
> As for the other two regressions, they also exist when ratpoison is
> configured with xrandr enabled. In fact, I posted a patch
...
> [1] https://github.com/abusque/ratpoison/tree/fix-focus

Thank you!

I can confirm that A. Busque's fix-focus branch fixes regressions 2 and
3 when compiled with --without-xrandr. At this point no further work on
these regressions seems necessary, except maybe getting the fixes
merged.

_______________________________________________
Ratpoison-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel
Reply | Threaded
Open this post in threaded view
|

Re: Segfault and faulty focus

Martin Samuelsson
In reply to this post by Axel Svensson
Axel Svensson @ 2019-07-16 (Tuesday), 01:58 (+0200)
>
>Never used gdb before, but I believe I have produced a stack trace
>below.

I kind of assumed it was new for you, and must say you did great considering
that level of experience!

Sorry it took me a couple of days to get back to looking at your answer.

>I might be reading this wrong, but it seems like add_to_window_list (frame
>#0) is called with the rp_screen *s argument set to null.

My understanding is the same as yours, and furthermore I can see that there
seem to be an intact call stack. We are also lucky that the seg. fault
happens early without seeming to require any ratpoison specific state.

>Despite scanwins having one argument and all call sites provide an
>argument, gdb does not list any arguments for the frame.

The compiler might have optimized away the values. I'm not completely sure
that's what happened here, but generally one wants to rebuild the source
code without optimizations when debugging a more complex issue (with the
risk of losing reproduceability). With this bug though, my guess is that
it'll be easy enough to grasp what happens without recompiling.

>Perhaps you can shed some light on that, or at least give me a pointer for
>what information I need to provide next.

Setting a breakpoint at scanwins() is likely the next thing to do. I.e.  
execute "break scanwins" prior to executing "run". Once scanwins() is
reached the prompt should come back and it'll be possible to execute each
line of code interactively with "step" (or "s") and investigate what the
variables contain at using e.g. "print attr". There clearly are more
commands available, and even other interfaces on top of gdb which makes
debugging more convenient. Which might be useful for more tricky problems
than this one.

The general idea is of course to figure out if find_screen_by_attr(),
list_first() and XGetWindowAttributes() seem to be called with and return
reasonable values.

>Thanks for taking your time!

Thank you!
--
/Martin

_______________________________________________
Ratpoison-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel
Reply | Threaded
Open this post in threaded view
|

Re: Segfault and faulty focus

Axel Svensson
On Fri, Jul 19, 2019 at 1:09 PM Martin Samuelsson
<[hidden email]> wrote:
> The general idea is of course to figure out if find_screen_by_attr(),
> list_first() and XGetWindowAttributes() seem to be called with and return
> reasonable values.

OK, so I think I've come a little further along. The particular segfault
I reported originates in the assumption that there exists at least one
screen, i.e. list_size(&rp_screens)>0. This reasonable assumption is not
checked, so if there are no screens, add_to_window_list will be called
with s=NULL.

It takes fairly special circumstances to achieve the combination of all
screens removed and a subsequent call to add_to_window_list, it just so
happens that this occurred fairly often at startup in my environment.
See footnote [1] for a MWE that works for me, YMMW.

The interesting part is what causes the screen removal. This happens
when the xrandr tool is used to switch framebuffer size in Xvfb. See
footnote [2] for debug printout, I added a couple to show that the
screen is indeed removed. Note especially that an unknown XEvent of type
-90 is received (there seems to be no event with this type defined in
the X libraries). Also note the error message from xrandr.

At this point I think I'd appreciate some input. Could this be caused by
a bug in xrandr or Xvfb rather than ratpoison? Can there be something
off with ratpoison's event handling? Am I simply using the xrandr tool
wrong? If I exchange "--fb" for "--output screen --mode", it no longer
causes ratpoison to remove its only screen.

[1]: MWE

Xvfb :81 -screen 0 5000x5000x24 &
export DISPLAY=:81
./src/ratpoison &
sleep 1
ratpoison -c "set padding 0 0 0 0"
xrandr --newmode 1600x900 0 1600 0 0 0 900 0 0 0
xrandr --addmode screen 1600x900
xrandr --fb 1600x900 --dpi 130 # Part of the output is in [2].
ratpoison -c "set padding 0 0 0 0" &
jobs # Notice that the last ratpoison command doesn't exit.
fg 1 # Kill ratpoison with Ctrl-C.
jobs # Notice that the command sender still hasn't finished.
./src/ratpoison # This one segfaults

[2]: Debug output from ratpoison (in bg) and xrandr

root@c2985fd5e2f1:~/ratpoison# xrandr --fb 1600x900 --dpi 130
xrandr: Failed to get size of gamma for output screen
xrandr: specified screen 1600x900 not large enough for output screen
(5000x5000+0+0)
ratpoison:xrandr.c:245: debug: --- Handling RRNotify ---
ratpoison:xrandr.c:255: debug: ---          XRRCrtcChangeNotifyEvent ---
ratpoison:events.c:881: debug: --- Unknown event -90 ---
X Error of failed request:  BadValue (integer parameter out of range
for operation)
  Major opcode of failed request:  140 (RANDR)
  Minor opcode of failed request:  21 (RRSetCrtcConfig)
  Value in failed request:  0x0
  Serial number of failed request:  22
  Current serial number in output stream:  22
ratpoison:xrandr.c:245: debug: --- Handling RRNotify ---
ratpoison:xrandr.c:250: debug: ---          XRROutputChangeNotifyEvent ---
ratpoison:globals.c:535: debug: ========== Size of rp_screens: 1
ratpoison:xrandr.c:180: debug: xrandr_output_change: Removing screen screen
ratpoison:number.c:125: debug: ns=0x556e8885ae90 release 0
ratpoison:number.c:125: debug: ns=0x556e8885afb0 release 0
ratpoison:globals.c:535: debug: ========== Size of rp_screens: 0
ratpoison:events.c:881: debug: --- Unknown event -90 ---
[the rest redacted]

_______________________________________________
Ratpoison-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel
Reply | Threaded
Open this post in threaded view
|

Segfault on zero screens (Was: Segfault and faulty focus)

Martin Samuelsson
Axel Svensson @ 2019-07-20 (Saturday), 01:20 (+0200)
>
>The particular segfault I reported originates in the assumption that there
>exists at least one screen

Ah! That's it! A reasonable expectation prior to xrandr, but having zero
screens is fully allowed (although arguably useless) with xrandr support.

>See footnote [1] for a MWE that works for me, YMMW.

Once you explained the issue, I managed to reproduce with a normal Xorg
process merely having launched an xterm and the following commands on a text
mode console:

export DISPLAY=:0
for S in `xrandr --listmonitors|sed --silent 's/ [0-9]*:.* //p'`
do
  xrandr --output ${S} --off
done
./src/ratpoison # This one segfaults

>The interesting part is what causes the screen removal.

Makes sense. During reconfiguration there's a split second where no screen
actually exist and ratpoison should cope with that.

>Also note the error message from xrandr.

Without fully understanding them, I wouldn't be surprised if even the tools
for xrandr itself gets confused when there are zero screens.

>At this point I think I'd appreciate some input.

For most of its life ratpoison knew nothing about xrandr and support for it
was only added "somewhat recently". The code needs to be gone through to see
which other functions have this assumption on an existing screen.

I quickly tried adding a few null checks and successfully started ratpoison
with zero screens, reactivated my screen, launched an xlogo and the new
window did appear. My interpretation of that little exercise is that fixing
the bug should be very doable.

Unfortunately I don't know the code well enough to immediately understand
whether adding all the needed null checks before each use of screens are the
way to go, or if there could be a smarter way. Won't have more time to look
at it right now, I'm afraid since my weekend is fully booked.

>Am I simply using the xrandr tool wrong?

Nah. I'd say you're using it right and found a proper bug in ratpoison. Yet
if you manage to find a workaround to setup your environment in some way
which doesn't triggering the bug, go for it! Just beware that every time you
deploy a workaround you increase your technical debt.
--
/Martin

_______________________________________________
Ratpoison-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel
Reply | Threaded
Open this post in threaded view
|

Re: Segfault on zero screens (Was: Segfault and faulty focus)

Axel Svensson
On Sat, Jul 20, 2019 at 10:59 AM Martin Samuelsson
<[hidden email]> wrote:
> Ah! That's it! A reasonable expectation prior to xrandr, but having zero
> screens is fully allowed (although arguably useless) with xrandr support.
ok great, so I think we confirmed the bug then.

> Once you explained the issue, I managed to reproduce with a normal Xorg
Good, this might be easier to use for some people.

> For most of its life ratpoison knew nothing about xrandr and support for it
> was only added "somewhat recently". The code needs to be gone through to see
> which other functions have this assumption on an existing screen.
This seem fairly manageable. Maybe I can begin this work.

> Unfortunately I don't know the code well enough to immediately understand
> whether adding all the needed null checks before each use of screens are the
> way to go, or if there could be a smarter way.
Me neither. Maybe it's right but not enough. I wouldn't be surprised if
for example something has to be added to handle the special case of
going from 0 to 1 active screens. I'll look into the implications and
post my findings.

_______________________________________________
Ratpoison-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel