So, for something like openoffice.org the debuginfo is about 400Megs in size, which is rather a lot to ask someone to download. In any case, crashes happen when the user isn’t prepared: they never have a debugging OOo to hand, and if they install the debuginfo afterwards they won’t be able to reproduce the problem. So it’s useful for apps like this to print out their callstack on crash, and to let that be submitted to developers, who can map the stack back onto a debugging version of the deployed OOo using offline debuginfo.
i.e. using backtrace and friends from glibc (see the glibc backtrace documentation); here’s a basic Linux Journal article on the topic.
a) Shared libraries complicate this a little: the address that backtrace gives you for something in a shared library is where the code ended up when the library was loaded. What you really want is that location relative to the base address of the library, so that you can use addr2line -e that_shared_library (absolute address - base address) to get the position within the offline .so. Calling dladdr on each result from backtrace gives you the name of the library it belongs to in dli_fname and that library’s base address in dli_fbase.
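A rough sketch of that step in C, assuming glibc. The names format_frame and print_callstack are made up for illustration, and the output format just mimics the lines shown in this post. Note that fprintf is not async-signal-safe, so calling this straight from a crash handler is a shortcut rather than a recommendation.

```c
#define _GNU_SOURCE   /* needed for dladdr() and Dl_info in <dlfcn.h> */
#include <dlfcn.h>
#include <execinfo.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format one frame as "0xABS: library + 0xOFFSET".
 * Returns the snprintf length, or -1 if dladdr cannot place the address. */
int format_frame(void *addr, char *buf, size_t len)
{
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return -1;
    /* Offset of the address from where the object was loaded. */
    uintptr_t offset = (uintptr_t)addr - (uintptr_t)info.dli_fbase;
    return snprintf(buf, len, "%p: %s + %#lx",
                    addr, info.dli_fname, (unsigned long)offset);
}

/* Hypothetical helper: print the current call stack to stderr.
 * Returns the number of frames that backtrace() captured. */
int print_callstack(void)
{
    void *frames[64];
    char line[512];
    int n = backtrace(frames, 64);
    for (int i = 0; i < n; i++) {
        if (format_frame(frames[i], line, sizeof line) > 0)
            fprintf(stderr, "%s\n", line);
        else
            fprintf(stderr, "%p: ??\n", frames[i]);
    }
    return n;
}
```

On older glibc versions dladdr lives in libdl, so you may need to link with -ldl.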
b) All OK so far, but there’s another problem: prelink ate your binaries. While you weren’t looking, prelink came along, optimized the .so files on your computer and added some extra stuff to them. So the callstack from your copy of OOo no longer matches the offline debuginfo copy of OOo.
e.g. the original not-prelinked line of an address in libsw680li
0x064987e9: /usr/lib/openoffice.org/program/libsw680li.so + 0x5557e9
and the equivalent line after it has been prelinked
0x494bf7e9: /usr/lib/openoffice.org/program/libsw680li.so + 0x58d7e9
0x5557e9 has moved to 0x58d7e9, an increase of 0x38000
So we need a way to fix this problem. Sometimes dladdr also gives you a symbol name in dli_sname and that symbol’s address in dli_saddr. If so, you could print the symbol name and the offset from it in the call stack. As a bodge, you could then look up the location of that symbol in the offline original library and use it as a reference to adjust all the prelinked values for that library. But in plenty of cases you’re likely to get nothing back for those values, leaving unmappable gaps in the callstack. What’s needed is a better way to know that the values for libsw680li.so have to be adjusted by 0x38000 to get the offline position.
so, readelf on the original unprelinked library shows…
[20] .dynamic DYNAMIC 00865944 865944
and on the prelinked…
[20] .dynamic DYNAMIC 497cf944 89d944
The difference of 0x38000 in that section’s offset looks useful. Can we include that info in our callstack, so that offline we can compare it to our offline libs and make the adjustment? By using dl_iterate_phdr we can look for the PT_DYNAMIC program header of each loaded library and spit out its p_offset value in our callstack. i.e. now we get…
0x064987e9: 0x00865944: /usr/lib/openoffice.org/program/libsw680li.so + 0x5557e9 for a nonprelinked
0x494bf7e9: 0x0089d944: /usr/lib/openoffice.org/program/libsw680li.so + 0x58d7e9 for prelinked
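Something along these lines would collect that value, again assuming glibc. The function names are made up for illustration, and this version simply dumps the PT_DYNAMIC offset for every loaded object rather than weaving it into each callstack line.

```c
#define _GNU_SOURCE   /* needed for dl_iterate_phdr() in <link.h> */
#include <link.h>
#include <stdio.h>

/* Callback for dl_iterate_phdr: print the file offset (p_offset) of the
 * PT_DYNAMIC program header of one loaded object. */
static int print_dynamic_offset(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_DYNAMIC) {
            /* The main program is usually reported with an empty name. */
            const char *name = (info->dlpi_name && info->dlpi_name[0])
                                   ? info->dlpi_name : "(main program)";
            fprintf(stderr, "%#010lx: %s\n", (unsigned long)ph->p_offset, name);
            (*count)++;
        }
    }
    return 0;   /* returning 0 tells dl_iterate_phdr to keep going */
}

/* Hypothetical helper: dump the offsets for all loaded objects and
 * return how many PT_DYNAMIC headers were found. */
int dump_dynamic_offsets(void)
{
    int count = 0;
    dl_iterate_phdr(print_dynamic_offset, &count);
    return count;
}
```

In a real crash reporter you would probably stash the offset per library first and then print it alongside each frame, as in the lines above.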
So if we get a prelinked crash, we can take the dynamic section offset included for a given lib and compare it to the one in the non-prelinked offline copy. The difference tells us how to adjust the values on our callstack back to the non-prelinked case, and then addr2line on those values recovers the original function names and positions in the source. ta-da.
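The offline adjustment is just a subtraction. A minimal sketch, with made-up function names, that assumes (as this post does) that prelink shifted everything in the library by the same amount as the .dynamic offset:

```c
#include <stdint.h>

/* How far prelink moved things: reported .dynamic offset minus the
 * .dynamic offset of the offline, non-prelinked copy. */
uint64_t prelink_shift(uint64_t reported_dyn_off, uint64_t offline_dyn_off)
{
    return reported_dyn_off - offline_dyn_off;
}

/* Map a library-relative offset from the crash report back to the
 * offset to feed to addr2line against the offline library. */
uint64_t offline_offset(uint64_t reported_off,
                        uint64_t reported_dyn_off,
                        uint64_t offline_dyn_off)
{
    return reported_off - prelink_shift(reported_dyn_off, offline_dyn_off);
}
```

With the numbers from this post, offline_offset(0x58d7e9, 0x89d944, 0x865944) gives 0x5557e9, which is the value to pass to addr2line -e libsw680li.so.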
>> So, for something like openoffice.org the debuginfo is
>> about 400Megs in size which is sort of large to ask
>> someone to download.
For some (more and more) people 400Megs is no problem anymore.
>> Anyway, crashes happen when the user isn’t prepared, so
>> they never have a debugging OOo to hand and if they do
>> install the debuginfo they won’t be able to reproduce the
>> problem.
Members of the oooqa team would have the experience to reproduce a crash. So take the number of oooqa members who have broadband, and imagine they submitted crash reports from a debug version of OOo. Would that significantly help the core developers save time for fixing more bugs?
I don’t know the answer, I just want to ask a good question.
It is a fair point that the “vanilla” openoffice.org rpms don’t come with a matching “debuginfo” rpm. With one installed, a user could run their office inside gdb, which would give a call stack already mapped to symbol names, with function parameters and the ability to examine directly what went wrong. So there is an argument for making a debuginfo rpm for the vanilla OOo, so that the oooqa team members could keep it installed and use gdb on reproducible crashes to get better stacktraces for the devs.
> So, for something like openoffice.org the debuginfo is about 400Megs in size
Include only the .debug_frame section, which is much smaller (it holds just the call-frame information needed to unwind the stack), and ship the full debug symbols in a separate package. For an example of how to do this, see the glibc Debian package.
If you have the answer to this question, that would be awesome:
http://stackoverflow.com/questions/219800/convert-memory-address-range-in-running-linux-process-to-symbols-in-object-file