Bug 258433 - gdb reports "Failed to read a valid object file image from memory." when debugging
Status: RESOLVED WORKSFORME
Duplicates: 249255
Classification: openSUSE
Product: openSUSE 10.3
Component: Kernel
Version: Beta 1
Hardware: Other Other
Priority: P5 - None, Severity: Critical, with 11 votes
Target Milestone: Beta 1
Assigned To: Andreas Kleen
Reported: 2007-03-28 16:04 UTC by Reinhard Nißl
Modified: 2007-10-24 16:08 UTC

Attachments
Executable showing problem (1.63 MB, application/octet-stream)
2007-07-06 09:04 UTC, Kern Sibbald
Configuration file for bacula-sd executable (811 bytes, text/plain)
2007-07-06 09:05 UTC, Kern Sibbald
patch mentioned in comment 43 (801 bytes, patch)
2007-08-22 23:34 UTC, Andre Klapper

Description Reinhard Nißl 2007-03-28 16:04:26 UTC
Before I upgraded my machine to 10.2, I was able to debug, for example, deadlocks in xine by attaching ddd (and therefore gdb) to xine and looking at the backtraces of xine's threads.

Since 10.2, this no longer works, i.e. the backtraces do not contain useful symbol names. I'm sure I have all symbol files installed, as gdb doesn't complain about missing symbols. I'm not sure whether this backtrace problem is related to the above message, but at least in 10.1 I didn't get this message and the backtrace was OK.

To get this message, simply do the following:

gdb sleep
> run 300
Comment 1 Andreas Schwab 2007-03-28 19:23:28 UTC
This is a kernel bug.  The compat vDSO is mapped at ffffe000, but that page is not readable via ptrace.
Comment 2 Greg Kroah-Hartman 2007-03-29 17:28:23 UTC
*** Bug 249255 has been marked as a duplicate of this bug. ***
Comment 3 Reinhard Nißl 2007-04-03 17:49:41 UTC
I've read the comments on the duplicate bug and installed the kernel RPM of openSUSE 10.3 Alpha2 on my openSUSE 10.2 system. As a result the above message is gone and the backtrace is OK.
Comment 4 Kern Sibbald 2007-04-14 21:15:15 UTC
I'm experiencing the same problem reported here.

This is a *critical* problem for a developer like myself.  It would be helpful if someone would attach an exact link to this bug report pointing to a kernel that works -- I'll try to find the 10.3 Alpha kernel.

In addition to the reported problem, when running gdb in many cases the message SEG FAULT will appear, gdb "exits" and I am back at a command prompt -- note, this is not my program that is getting a seg fault.

Both problems happen on kernel-2.6.18.8-0.1-default

Example output for problem of no symbols reported by gdb when the program *is* compiled with the same Makefiles used for debugging for years now.

gdb bacula-sd
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
...
This GDB was configured as "i586-suse-linux"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) run -s -f -c stored.conf
Starting program: /home/kern/bacula/k/src/stored/bacula-sd -s -f -c stored.conf
Failed to read a valid object file image from memory.
[Thread debugging using libthread_db enabled]
[New Thread -1212569904 (LWP 4867)]
[New Thread -1214227568 (LWP 4874)]
...
(I type ctl-c at this point)

Program received signal SIGINT, Interrupt.
[Switching to Thread -1212569904 (LWP 4867)]
0xb7fc9410 in ?? ()
(gdb) where
#0  0xb7fc9410 in ?? ()
#1  0xbfab1788 in ?? ()
#2  0x00000000 in ?? ()
(gdb) thread apply all bt

Thread 3 (Thread -1222620272 (LWP 4875)):
#0  0xb7fc9410 in ?? ()
#1  0xb72043a4 in ?? ()
#2  0x00000043 in ?? ()
#3  0x00000000 in ?? ()

Thread 1 (Thread -1212569904 (LWP 4867)):
#0  0xb7fc9410 in ?? ()
#1  0xbfab1788 in ?? ()
#2  0x00000000 in ?? ()
(gdb)                    
Comment 5 Kern Sibbald 2007-04-14 21:54:15 UTC
I can confirm that your 2.6.21-rc5-git13-2-default kernel pulled from the Alpha3 cd1 resolves the problem.  I had to install it with --no-deps to get around armor dependencies, and of course, there are armor error messages when booting, but otherwise it seems to work.  It would be preferable if you could supply a 10.2 kernel that fixes this problem.
Comment 6 Kern Sibbald 2007-06-01 09:26:18 UTC
This major bug has been outstanding for two months now, and it is still marked as NEW. Could you provide an estimate of when you will release a new 10.2 kernel with this problem fixed?

I saw a comment from someone at SuSE who asked why this was urgent, here is my response. I consider this a major problem, and for me it is not acceptable to be running on a git (10.3) kernel in order to debug my programs for several reasons:

1. The git kernel is an alpha (or beta) kernel and not made to run on 10.2.
2. For the above reason, it does not load correctly with armour, probably because it needs new versions, but there is no procedure for doing this.
3. The kernel I am using 2.6.21-rc5-git13-2-default is not only alpha, but it is broken -- it does not handle opening USB ports correctly, reported in another bug (USB cameras can no longer connect) so I am forced to boot back and forth between kernels, one for debugging, one for accessing my camera.

I really like your SuSE distro (switched to SuSE in Jan 2007), however this bug means your released kernel is badly broken for developers. I understand it is a pain for you guys to update kernels when you want to focus on 10.3, but if you don't quickly fix these kinds of major bugs I think you will find that some developers such as myself will move to another distro that gives a higher priority to these kinds of problems.
Comment 8 Petr Vanek 2007-06-15 08:19:02 UTC
I can confirm this ugly behaviour and I'm voting for quick fix too.
Comment 9 Ryan Partridge 2007-07-02 21:53:15 UTC
Still no update on this bug???
Comment 10 Andre Klapper 2007-07-02 22:05:00 UTC
this issue has triggered hundreds of useless upstream stacktraces in bugzilla.gnome.org and still wastes the time both of 10.2 users and the gnome bugsquad that triages those reports.
i complained about exactly this issue in january and again in march (after jpr asked for this again). though it is nice to see the reason listed here in the first comment, it is unacceptable how an issue with such an outcome has more or less been ignored so far.
at least i can explicitly blame suse now in the upstream bug reports, when advanced reporters ask why their traces are shit, though debug packages are installed. "please do blame your opensuse distro, and not gnome."

to me this is a blocker bug by definition. it blocks testing (QA) work by lowering the "useful feedback"-rate dramatically.

cheers,
andre
Comment 11 Petr Vanek 2007-07-04 08:59:08 UTC
hi kernel devs,

I agree with Andre Klapper, it's one of the most important issues I'm facing in development now. Of course I understand there is too much work in different areas etc. etc....

But we are missing any feedback. Ok, you can say: "we will not fix it, wait for 10.3". Good - I'll find some workaround (messing my system and GPU drivers etc.). Anyway - I'd like to hear: "it will be fixed in updates after you return  from vacations."

cheers
petr
Comment 12 Kern Sibbald 2007-07-05 09:46:02 UTC
I previously stated that the problem was resolved with 2.6.21-rc5-git13-2-default, which is apparently only partially true for two reasons:
1. That kernel does not correctly detect USB devices, which means that one cannot download photos from a camera via USB without booting back to an older kernel.

2. The debugger did work in *some* situations. However, there is still a major problem with that kernel that prevents proper debugging. I confirmed this by setting up debugging on a FC4 computer where it worked perfectly, while on the git kernel, the debugger crashes.

While it is clear that this is Open Source, and since I am not paying for it, you (Novell) are not obligated to fix it, I would say that Petr's comments are extremely kind. My comment is:

"Novell screws Open Source programmers running OpenSuSE by either being incapable or unwilling to fix a major bug in their released version of the kernel in a reasonable timeframe.  Your distro is no longer suitable for developers."

It is a real pity because you have the best installer around and a very nice implementation of KDE, but this is a killer. I would have expected more from a company that pretends to be a major player in the Linux market.

Since this is the second major kernel show stopper I have experienced in the year since I switched from Fedora to SuSE, I give up and am switching distros again.  For reference, during the 6 years I was using RedHat then later Fedora, I never once had any problem with the kernel.

PS: I had forgotten how terribly slow Yast2 is at doing updates compared to yumex and synaptic until I started seriously looking at switching distros. It is still 4 to 10 times slower than the other major distributions. I recommend that you confiscate the current machine of the developer responsible for the Yast2 software module, replace it with a 400MHz 256M RAM machine (which I unfortunately recently tried to update), and tell him that he cannot have his normal machine back until he gets the time needed to add a repository and load a single new package down to 10 minutes maximum.  Even at that it would be slower than yum.
Comment 13 Andreas Kleen 2007-07-05 11:42:40 UTC
If the debugger has problems with the vdso you can disable
it with echo 0 > /proc/sys/vm/vdso_enabled

Re #11: Petr, if you really need bug fixes for old versions in a specific
time you'll need a support contract, sorry. For openSUSE the deal
is either upgrade or wait.

Re #12; USB problems: you likely need to update udev too. 
Good luck with Fedora, perhaps we'll see you back after their first bug.
The zypp (update performance) problem is being worked on, but I agree it's pretty bad. Hopefully 10.3 will be better though.
Comment 14 Kern Sibbald 2007-07-05 12:41:36 UTC
Well, one thing I can say is that you guys have always been very gentleman like in accepting criticism.  No way will I be going back to Fedora, they are too much on the bleeding edge for me, and they butcher KDE.  My server will be CentOS with SELinux (complicated but secure and close to my FC4 server) that is 100% decided, and my development machine will probably be SimplyMepis (Debian + Ubuntu with KDE) and a desktop almost as good as yours.

However, since I missed updating udev, I might give it one more try :-)

Concerning 10.3: yes, from the feature list I've seen, you are definitely going in the right direction.
Comment 15 Andreas Kleen 2007-07-05 12:49:01 UTC
With disabling the vdso the original 10.2 kernel should actually work.
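Before and after applying the workaround from comment 13, it may help to confirm what the sysctl currently holds. The following is only a minimal sketch, assuming Python is available and that the path is the one given in comment 13; the helper name `read_vdso_setting` is hypothetical, and the file may not exist on other kernels or architectures.

```python
from pathlib import Path
from typing import Optional

# Path taken from comment 13; it may be absent on other kernels or architectures.
VDSO_SYSCTL = Path("/proc/sys/vm/vdso_enabled")


def read_vdso_setting(path: Path = VDSO_SYSCTL) -> Optional[int]:
    """Return the integer stored in the sysctl file, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file, no permission, or unexpected content.
        return None
```

A return value of None only means the file could not be read or parsed; it says nothing about whether the debugger problem is present.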
Comment 16 Andreas Kleen 2007-07-05 12:59:18 UTC
Actually I tried to reproduce this now and I can step through the vdso
on a 32bit kernel with a simple test program.

Is there a test case that shows it definitely?
Comment 17 Andre Klapper 2007-07-05 13:49:44 UTC
andi, i wonder if requesting information from your colleague andreas (who is not in the CC list but came up with the vdso stuff) would not be more efficient?

YES, this issue still exists, even if you cannot reproduce. just take a look at bugzilla.gnome.org.
Comment 18 Andreas Kleen 2007-07-05 15:19:48 UTC
His analysis must have been wrong; the vDSO is readable
in my test. I also verified this from the source.

Well if i cannot reproduce i'll close the bug.
Sorry, i'm not going to do fishing expeditions in random bugzillas.
Comment 19 Andre Klapper 2007-07-05 17:14:59 UTC
andreas, ignoring bug reports by closing them does not magically make existing issues vanish. if you need more info, feel free to explicitly ask the folks here for more information and describing the steps to gather the requested information.
Comment 20 Jeff Mahoney 2007-07-05 18:14:01 UTC
Bug 289641 may be related. With the i586 glibc, this bug doesn't exist since vDSOs aren't used. With the i686 glibc, this gets triggered.
Comment 21 Andreas Kleen 2007-07-05 21:06:47 UTC
So far the bug is not closed yet. Yes I'm asking for a reproducible test case.
Please supply one.

Comment 22 Reinhard Nißl 2007-07-05 21:45:41 UTC
Well, I cannot contribute much. I've just booted kernel 2.6.18.8-0.1-default and the issue can be shown like mentioned above:

gdb sleep
> run 300

reports already the error message of this bug. Hitting CTRL+C and typing bt shows a backtrace without symbols.

During the past weeks I've been using kernel 2.6.21-3-default from openSUSE 10.3 which seems to be most compatible to 10.2 and stable. Repeating the above sequence with this newer kernel doesn't report the error message and shows a proper backtrace with symbols.

My system is almost 10.2 as I've updated some packages like gdb, ddd, gcc and kernel with packages from openSUSE 10.3. Here is some information which could be of interest:

> rpm -q gdb
gdb-6.6.50.20070511-7

> rpm -q glibc
glibc-2.5-25

> uname -a
Linux video 2.6.21-3-default #1 SMP Thu Apr 26 11:49:27 UTC 2007 i686 i686 i386 GNU/Linux
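The `gdb sleep` / `run 300` steps above could be scripted so that the check is repeatable across kernels. This is a rough sketch, not something tested on the kernels named in this report: it assumes Python 3.7 or later and a gdb that accepts the `-batch`, `-ex` and `--args` options, and both function names are hypothetical.

```python
import subprocess

# The message from the bug title, as printed by gdb when the program starts.
ERROR_TEXT = "Failed to read a valid object file image from memory."


def shows_vdso_read_error(gdb_output: str) -> bool:
    """True if the captured gdb output contains the message from this bug."""
    return ERROR_TEXT in gdb_output


def run_gdb_on_sleep(seconds: int = 1) -> str:
    """Run `sleep` under gdb in batch mode and return everything gdb printed."""
    result = subprocess.run(
        ["gdb", "-batch", "-ex", "run", "--args", "sleep", str(seconds)],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr
```

Calling `shows_vdso_read_error(run_gdb_on_sleep())` would then give a yes/no answer for the currently booted kernel, assuming gdb is installed.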
Comment 23 Kern Sibbald 2007-07-05 22:07:43 UTC
I don't know what you need to reproduce it.  I am running:
Linux rufus 2.6.18.8-0.3-bigsmp #1 SMP Tue Apr 17 08:42:35 UTC 2007 i686 i686 i386 GNU/Linux
which as far as I am aware is a perfectly stock released SuSE 10.2.  I run my program under the debugger and send it a signal 11, and I get:


Program received signal SIGSEGV, Segmentation fault.
0xb7f46410 in ?? ()
(gdb) where
#0  0xb7f46410 in ?? ()
#1  0xbfdd5298 in ?? ()
#2  0x00000000 in ?? ()
(gdb) thread apply all bt

Thread 16 (Thread -1214223472 (LWP 8592)):
#0  0xb7f46410 in ?? ()
#1  0xb7a05ec8 in ?? ()
#2  0x00000013 in ?? ()
#3  0x00000000 in ?? ()

Thread 11 (Thread -1240470640 (LWP 8581)):
#0  0xb7f46410 in ?? ()
#1  0xb60fdec8 in ?? ()
#2  0x00000014 in ?? ()
#3  0x00000000 in ?? ()

Thread 10 (Thread -1232077936 (LWP 8580)):
#0  0xb7f46410 in ?? ()
#1  0x00000003 in ?? ()
#2  0x00000000 in ?? ()

Thread 8 (Thread -1248863344 (LWP 8573)):
#0  0xb7f46410 in ?? ()
#1  0xb58fced8 in ?? ()
#2  0x000000de in ?? ()
#3  0x080bb684 in ?? ()
#4  0xb7ef0fab in __write_nocancel () from /lib/libpthread.so.0
#5  0x080803a6 in write_nbytes (bsock=0xb69004c0, ptr=0x80bb684 "", nbytes=222)
    at bnet.c:138
#6  0x080827ee in BSOCK::send (this=0xb69004c0) at bsock.c:292
#7  0x08082dd8 in BSOCK::despool (this=0xb69004c0,
    update_attr_spool_size=0x80783f0 <update_attr_spool_size>, tsize=2379278)
    at bsock.c:518
#8  0x08078aca in commit_attribute_spool (jcr=0x80c76c8) at spool.c:636
#9  0x080530f9 in do_append_data (jcr=0x80c76c8) at append.c:334
#10 0x080674c3 in append_data_cmd (jcr=0x80c76c8) at fd_cmds.c:194
#11 0x080670b1 in do_fd_commands (jcr=0x80c76c8) at fd_cmds.c:165
#12 0x08067612 in run_job (jcr=0x80c76c8) at fd_cmds.c:128
#13 0x08067aa4 in run_cmd (jcr=0x80c76c8) at job.c:192
#14 0x08062314 in handle_connection_request (arg=0xb69004c0) at dircmd.c:224
#15 0x0809bebc in workq_server (arg=0x80b5e60) at workq.c:357
#16 0xb7eea112 in start_thread () from /lib/libpthread.so.0
#17 0xb7be02ee in clone () from /lib/libc.so.6

Thread 3 (Thread -1222616176 (LWP 8531)):
#0  0xb7f46410 in ?? ()
#1  0xb72053a4 in ?? ()
#2  0x00000049 in ?? ()
#3  0x00000000 in ?? ()

Thread 1 (Thread -1213093008 (LWP 8526)):
#0  0xb7f46410 in ?? ()
#1  0xbfdd5298 in ?? ()
#2  0x00000000 in ?? ()
(gdb) 

Now that looks pretty broken to me as it provides no information on where the seg fault happened, and none of my threads are at 0x00000000. If I simply ctl-c the program, half of the threads show the nonsense you see above.  If the program seg faults itself, the debugger shows the same problems. With the exception of exactly how one stops the program or how it dies, the broken debugger output is essentially the same as the information I previously sent you in comment #4.

I can understand that you may not be able to reproduce it since your hardware or the program you are debugging may be different, but when there are at least 3 developers telling you it is broken, I'd say there is a 99.9% chance it is broken.  

By the way, the same problem occurs on several different computers loaded with SuSE 10.2.  It does not occur on any other OS I have used.
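For anyone triaging many such traces, a rough measure of how much of a backtrace is unusable is the share of frames gdb printed as `?? ()`. The sketch below is an illustration only, assuming Python and the frame format shown in the output above; the function name `unresolved_ratio` is hypothetical.

```python
import re

# Frames such as "#0  0xb7f46410 in ?? ()" in gdb backtrace output.
UNRESOLVED_FRAME = re.compile(r"^#\d+\s+0x[0-9a-fA-F]+ in \?\? \(\)[ \t]*$", re.MULTILINE)
# Any line that starts a frame, resolved or not.
ANY_FRAME = re.compile(r"^#\d+\s", re.MULTILINE)


def unresolved_ratio(backtrace: str) -> float:
    """Fraction of frames in the backtrace text that gdb could not resolve."""
    total = len(ANY_FRAME.findall(backtrace))
    if total == 0:
        return 0.0
    return len(UNRESOLVED_FRAME.findall(backtrace)) / total
```

Applied to the thread 1 trace above, every frame is unresolved, so the ratio would be 1.0; thread 8 would score much lower because most of its frames carry symbols.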
Comment 24 Kern Sibbald 2007-07-06 09:04:51 UTC
Created attachment 150021 [details]
Executable showing problem

Instructions on running given in bug note.
Comment 25 Kern Sibbald 2007-07-06 09:05:52 UTC
Created attachment 150022 [details]
Configuration file for bacula-sd executable

Configuration file that goes with bacula-sd executable. Instructions on reproducing bug in bug note.
Comment 26 Kern Sibbald 2007-07-06 09:15:18 UTC
I've uploaded two attachments which demonstrate this bug on my Dell Dimension 8300 computer.  The first attachment is bacula-sd an executable built on SUSE 10.2 all patches recently applied.  The second attachment is the config file for the executable. 

Place the two files in a directory, then do the following (root not required):

gdb bacula-sd
run -s -f
(after printout stops)
ctl-c

(output is)
kern@rufus:~/bacula/bug> gdb bacula-sd
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) run -s -f
Starting program: /home/kern/bacula/bug/bacula-sd -s -f
Failed to read a valid object file image from memory.
[Thread debugging using libthread_db enabled]
[New Thread -1211902256 (LWP 12764)]
[New Thread -1213027440 (LWP 12768)]
[New Thread -1221420144 (LWP 12769)]
[Thread -1213027440 (LWP 12768) exited]

Program received signal SIGINT, Interrupt.
[Switching to Thread -1211902256 (LWP 12764)]
0xb7eeb410 in ?? ()
(gdb) thread apply all bt

Thread 3 (Thread -1221420144 (LWP 12769)):
#0  0xb7eeb410 in ?? ()
#1  0xb73293a4 in ?? ()
#2  0x00000001 in ?? ()
#3  0x00000000 in ?? ()

Thread 1 (Thread -1211902256 (LWP 12764)):
#0  0xb7eeb410 in ?? ()
#1  0xbfcfe9d8 in ?? ()
#2  0x00000000 in ?? ()
(gdb) 

The program will write in the current directory and references /tmp and as noted above I don't normally run as root when testing.

gdb-6.5-28
glibc-2.5-25
Linux rufus 2.6.18.8-0.3-bigsmp #1 SMP Tue Apr 17 08:42:35 UTC 2007 i686 i686 i386 GNU/Linux

The correct output from a similar execution of the same code (very close but not identical source) built and executed on a FC4 machine is:

[kern@matou bin]$ gdb bacula-sd
GNU gdb Red Hat Linux (6.3.0.0-1.84rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) run -s -f
Starting program: /home/kern/bacula/regress/bin/bacula-sd -s -f
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0x900000
[Thread debugging using libthread_db enabled]
[New Thread -1208375616 (LWP 15986)]
Detaching after fork from child process 15989.
[New Thread -1210475600 (LWP 15990)]
[New Thread -1220965456 (LWP 15991)]
[Thread -1210475600 (LWP 15990) exited]

Program received signal SIGINT, Interrupt.
[Switching to Thread -1208375616 (LWP 15986)]
0x00900402 in __kernel_vsyscall ()
(gdb) thread apply all bt

Thread 3 (Thread -1220965456 (LWP 15991)):
#0  0x00900402 in __kernel_vsyscall ()
#1  0x00e80aec in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x0809039a in watchdog_thread (arg=0x0) at watchdog.c:307
#3  0x00e7ebd4 in start_thread () from /lib/libpthread.so.0
#4  0x001db4fe in clone () from /lib/libc.so.6

Thread 1 (Thread -1208375616 (LWP 15986)):
#0  0x00900402 in __kernel_vsyscall ()
#1  0x001d3eb1 in ___newselect_nocancel () from /lib/libc.so.6
#2  0x0807943a in bnet_thread_server (addrs=0x9df6048, max_clients=21,
    client_wq=0x80a9880,
    handle_client_request=0x805e264 <handle_connection_request(void*)>)
    at bnet_server.c:161
#3  0x0804d597 in main (argc=Variable "argc" is not available.
) at stored.c:264
(gdb)        
Comment 27 Kern Sibbald 2007-07-06 09:38:03 UTC
I've tried the suggestion in comment #13, which seems to work (more testing needed to confirm 100%), but it is a bit confusing: turning off vdso doesn't seem to change the .so map as I would expect it to:

rufus:/home/kern/bacula/bug # cat /proc/self/maps | grep vdso
b7fd4000-b7fd5000 r-xp b7fd4000 00:00 0          [vdso]
rufus:/home/kern/bacula/bug # echo 0 > /proc/sys/vm/vdso_enabled
rufus:/home/kern/bacula/bug # cat /proc/self/maps | grep vdso
b7f2e000-b7f2f000 r-xp b7f2e000 00:00 0          [vdso]

Note, vdso is apparently still there after turning it off.  However, after turning vdso off, I re-run the debug session and I get:

kern@rufus:~/bacula/bug> gdb bacula-sd
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i586-suse-linux"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) run -s -f
Starting program: /home/kern/bacula/bug/bacula-sd -s -f
[Thread debugging using libthread_db enabled]
[New Thread -1210984224 (LWP 13173)]
[New Thread -1212109936 (LWP 13177)]
[Thread -1212109936 (LWP 13177) exited]
[New Thread -1220502640 (LWP 13178)]

Program received signal SIGINT, Interrupt.
[Switching to Thread -1210984224 (LWP 13173)]
0xb7fcb8b2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) thread apply all bt

Thread 3 (Thread -1220502640 (LWP 13178)):
#0  0xb7fcb8b2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0xb7f727dc in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x08096af2 in watchdog_thread (arg=0x0) at watchdog.c:307
#3  0xb7f6e112 in start_thread () from /lib/libpthread.so.0
#4  0xb7de32ee in clone () from /lib/libc.so.6

Thread 1 (Thread -1210984224 (LWP 13173)):
#0  0xb7fcb8b2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0xb7ddca41 in ___newselect_nocancel () from /lib/libc.so.6
#2  0x0807edb3 in bnet_thread_server (addrs=0x80b5b18, max_clients=21,
    client_wq=0x80b0c80,
    handle_client_request=0x80602f0 <handle_connection_request(void*)>)
    at bnet_server.c:161
#3  0x0804b814 in main (argc=<value optimized out>, argv=0x0) at stored.c:264
(gdb)  

which looks perfectly fine.  Since I am not a kernel guy, I don't really understand what turning off vdso does, but if it resolves the debugging problem at least it is a valid workaround for me.
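The maps output in this comment gives the address range of the [vdso] mapping for one process. A small sketch like the following could check whether an unresolved frame address from a backtrace falls inside that range. It assumes Python, the helper names are hypothetical, and the maps text must come from the process being debugged, since the mapping address differs between the two `cat` runs shown above.

```python
from typing import Optional, Tuple


def vdso_range(maps_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the [vdso] mapping from /proc/<pid>/maps text, if present."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[vdso]":
            start, end = fields[0].split("-")
            return int(start, 16), int(end, 16)
    return None


def in_vdso(address: int, maps_text: str) -> bool:
    """True if the address lies inside the [vdso] mapping listed in maps_text."""
    rng = vdso_range(maps_text)
    return rng is not None and rng[0] <= address < rng[1]
```

This only reports where the mapping sits; as the output above shows, the mapping can still be listed after the sysctl is set to 0.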

Comment 28 Srinivasa Ragavan 2007-07-06 11:37:31 UTC
With 'echo 0 > /proc/sys/vm/vdso_enabled' I see that the traces are fine. I have tried 3-4 crashes and bug-buddy seems to get the traces fine.

Comment 29 Andreas Jaeger 2007-07-06 14:59:43 UTC
(In reply to comment #13 from Andreas Kleen)
> If the debugger has problems with the vdso you can disable
> it with echo 0 > /proc/sys/vm/vdso_enabled

I think this is a workaround.  We should document it nevertheless somewhere.

> Re #11: Petr, if you really need bug fixes for old versions in a specific
> time you'll need a support contract, sorry. For openSUSE the deal
> is either upgrade or wait.

Which is correct.  On the other hand, we do fix really critical bugs.  If there's a safe and non-intrusive patch for this problem, I would approve including it in the next kernel update.  We would not do an extra kernel update for this.

Comment 30 Sankar P 2007-07-20 11:12:50 UTC
(In reply to comment #28 from Srinivasa Ragavan)
> With 'echo 0 > /proc/sys/vm/vdso_enabled' I see the that the traces are fine. I
> have tried 3-4 crashes and bug-buddy seems to get the traces fine. 
> 

I tried this workaround. Even then I do not get traces for all crashes. On some crashes, gdb segfaults and bug-buddy launches (with a bad-stacktrace).
Comment 31 Andreas Kleen 2007-07-20 11:41:48 UTC
It will only be active for processes started after the sysctl change
Comment 32 Sankar P 2007-07-23 12:52:27 UTC
I have exported this variable and then started a new terminal. Even then it was not giving me entire traces all the time.

Also, on some scenarios, if I launch with gdb, the application and gdb crashes whereas if I launch the application alone, for the same scenario it works without crashing.
Comment 33 Sankar P 2007-07-23 12:53:19 UTC
it is not "exported the variable". It should be sysctl change. Sorry.
Comment 34 Loïc Minier 2007-07-26 14:48:57 UTC
Interestingly, Debian and Ubuntu systems have been suffering from this issue, even up to 2.6.22 kernels.

I tried setting vdso_enabled to 0 (from 2), which froze my system, but setting it on boot worked and I got slightly better backtraces, though frankly still ugly.  :-/

Does someone have a link to the kernel change which is supposed to have fixed this in OpenSuse?  Thanks!
Comment 35 Robert Kaiser 2007-08-03 17:44:25 UTC
I'm seeing this on current FACTORY now, but it was gone with the experimental 2.6.21 kernel that was available from the build service for some time in the past.

Should the issue on 10.3 FACTORY be filed as a separate bug from this or is it probably the same issue?

It would be helpful if I could get useful stacktraces out of gdb, they somehow tend to help my colleagues in the Mozilla project more than "I crashed somewhere" ;-)
Comment 36 Srinivasa Ragavan 2007-08-10 09:24:12 UTC
(In reply to comment #13 from Andreas Kleen)
> If the debugger has problems with the vdso you can disable
> it with echo 0 > /proc/sys/vm/vdso_enabled
> 

Btw, it seems like 10.3 is also broken. When I try this work-around my machine freezes. Seems like a blocker to me.
Comment 37 Kern Sibbald 2007-08-10 10:33:14 UTC
Although the echo 0 > /proc/sys/vm/vdso_enabled kludge does help a bit, I regret to inform you that it is *not* a solution for the production 10.2 kernel. It does seem to fix most of the problems, but quite often when the kludge is applied and things seem to work (i.e. no "Failed to read a valid object file image from memory." when starting the debugger) the debugger simply seg faults while my program is running -- *extremely* frustrating.

Given that this problem has been ongoing for well over four months on your "production" kernel and it has now cropped up in 10.3, I for one have decided that my project has suffered way too much on your distro.  

I have now brought up several CentOS 5 systems to replace my OpenSUSE systems; once augmented by excellent 3rd party repositories, CentOS has more up-to-date software available than OpenSuSE.  The best part is that the kernel and the debugger are rock solid under exactly the same conditions where OpenSUSE fails.

Except for this, which is a really stupid killer (blocker), and your slow Yast2 installer, which is a royal pain, I liked your distro.  IMO you guys and Novell need to seriously rethink your priorities.

Good bye.
Comment 41 Alberto Passalacqua 2007-08-21 20:36:40 UTC
This is in 10.3 too, as you can see by examining the attachment to the bug:

https://bugzilla.novell.com/show_bug.cgi?id=301716

And the solution can be found here:

https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/74691

Moving to 10.3.
Comment 43 Andreas Kleen 2007-08-22 15:13:23 UTC
Re #41 I don't see anything useful except some whining in that ubuntu link

I tried again the executable posted here and couldn't reproduce it,
but a custom test program in beta1 could after some trying.

Anyways, that was the problem that was fixed post beta1 with

Wed Aug 15 15:16:04 CEST 2007 - ak@suse.de
...
- patches.arch/i386-compat-vdso: i386: allow debuggers to access
  the vsyscall page with  compat vDSO.
...

In general i must say if you people here had given reproducible 
test cases I could have told you much earlier about this fix 
which had been around for a long time.

I readded the patch to the 10.2 tree too so it'll be fixed there in the
next update (or in kotd in a few hours)

-------------------------------------------------------------------
Wed Aug 22 17:12:10 CEST 2007 - ak@suse.de

- patches.arch/i386-compat-vdso: i386: allow debuggers to access
  the vsyscall page with  compat vDSO (258433).

Comment 44 Alberto Passalacqua 2007-08-22 16:01:57 UTC
The ubuntu link marks the bug as fixed. I reported it for that reason. 

>In general i must say if you people here had given reproducible 
>test cases I could have told you much earlier about this fix 
>which had been around for a long time.

To be frank, we are not experts but users, and I think it's already enough if we provide reports and information, as far as we can.

In general, you're supposed to check and fix bugs, maybe avoiding arrogant answers when someone is trying to help.

With kind regards.
Comment 45 Andreas Jaeger 2007-08-22 17:52:31 UTC
I just tested the beta2 kernel on i686 system and that message does not appear anymore when debugging sleep.  Hope that it's fixed for good.
Comment 46 Andre Klapper 2007-08-22 18:19:51 UTC
@ak: if you only expect developers to file bug reports, then good luck.
even novell developers that could reproduce this (sankar) commented on this bug, so with some motivation and the understanding that it's an important issue (if novell is interested in qa and *useful* traces in the user feedback in general) it shouldn't have been hard to track this down together.
Comment 47 Andreas Kleen 2007-08-22 23:09:11 UTC
well openSUSE is a community project, not a Novell only project.
You get great software for free, but you're expected to write good
bug reports if you don't contribute patches. 

Good bug reports involve reproducible cases
at least for deterministic bugs. That's how free software works.

Besides users of gdb should be developers anyways so we can probably
expect some higher standards.
Comment 48 Andre Klapper 2007-08-22 23:31:43 UTC
i know, i've been triaging for ximian and gnome for a few years now. :-)
anyway, thanks for the fix that i will attach here so other distros can also pick it up, so the triagers waste less time on useless traces.
Comment 49 Andre Klapper 2007-08-22 23:34:17 UTC
Created attachment 159330 [details]
patch mentioned in comment 43

"i386-compat-vdso", extracted from
https://api.opensuse.org/source/openSUSE:Factory/kernel-source/patches.arch.tar.bz2
Comment 50 Andreas Kleen 2007-08-23 00:07:20 UTC
I submitted it for stable anyways.

But there are probably not that many distributions with 2.6.22 kernels
and at least one seems to believe it is fixed already @)
Comment 51 Klaus Wagner 2007-10-24 16:08:41 UTC
Just for the record:
 
Patch:  patches.arch/i386-compat-vdso
 
included, enabled, and released in:
 
  10.2 kernel update 2.6.18.8-0.7
  dated Oct 03, 2007 & released Oct 10, 2007.