Bug hunt: Independent functions crashing gcov
NASA reported an odd crash in gcov, and I found it interesting and
deep enough to warrant describing the root cause and the path it took
me on. The actual crash itself was quickly found with gdb; reading the
block.locations.lines
when printing the source lines of a path when
the array was empty (with size()-1, even!). A fix would be to not do
that e.g. by guarding the access, but the question is why the array is
empty in the first place.
The crash seems to happen when all of the following are true:
- You have two classes with a method returning a static instance (singleton). The method does not need to be static.
- The method’s name (symbol) is on the same line in both files.
- The methods are both implemented in the header, directly in the class.
- The class itself has a non-trivial member for a non-trivial destructor.
If any of these change gcov doesn’t crash anymore, but some more
digging seems to confirm that the problem is deeper in gcc/gcov. If I
move the implementation to the source file, gcov prints this. Notice
the static A
.
function B::getInstance called 0 returned 0% blocks executed 0%
paths covered 0 of 3
path 0 not covered:
BB 2: 11:B& B::getInstance(){
BB 2: (true) 12: static B theInstance;
BB 3: 12: static B theInstance;
BB 4: (true) 12: static B theInstance;
BB 5: 12: static B theInstance;
BB 6: 7: static A theInstance;
BB 6: 12: static B theInstance;
BB 7: 12: static B theInstance;
BB 8: 13: return theInstance;
BB 9: 13: return theInstance;
Clearly two functions are mixed up, and the problem was only shown by path coverage, not caused by it. To reproduce:
// fst.h
#include <string>
struct fst {
fst() {}
static fst& instance() {
static fst self;
return self;
}
std::string data;
};
// snd.h
#include <string>
struct snd {
snd() {}
static snd& instance() {
static snd self;
return self;
}
std::string data;
};
// test.cc
#include "fst.h"
#include "snd.h"
int main(int args, char **argv) {
fst::instance().data = "foo";
snd::instance().data = "bar";
}
$ gcc --coverage -fpath-coverage test.cc -o test
$ gcov --stdout --prime-paths-source test --include=instance
Interestingly, this seems to only crash in one of them.
$ gcov --prime-paths-source test --stdout --include=fst # works
$ gcov --prime-paths-source test --stdout --include=snd # crashes
This is a result of inclusion order; if fst.h is included before snd.h, it crashes in snd, and vice versa. This is a good hint, so I dump the CFG from gcc. While these are unsurprisingly similar there is no obvious hint here.
At this point I started digging into gcc itself, since it seems gcov does the right thing given its input. I have gcov dump out the line mapping it reads from the .gcno file:
reading tag-line: lineno = 0, block = 2, file-name = snd.h
reading tag-line: lineno = 0, block = 3, file-name = snd.h
reading tag-line: lineno = 0, block = 4, file-name = snd.h
reading tag-line: lineno = 0, block = 5, file-name = snd.h
reading tag-line: lineno = 0, block = 6, file-name = snd.h
reading tag-line: lineno = 0, block = 6, file-name = fst.h
reading tag-line: lineno = 0, block = 7, file-name = snd.h
reading tag-line: lineno = 0, block = 8, file-name = snd.h
reading tag-line: lineno = 0, block = 9, file-name = snd.h
reading tag-line: lineno = 0, block = 2, file-name = snd.h
reading tag-line: lineno = 0, block = 3, file-name = snd.h
reading tag-line: lineno = 0, block = 2, file-name = snd.h
reading tag-line: lineno = 0, block = 3, file-name = snd.h
reading tag-line: lineno = 0, block = 2, file-name = fst.h
reading tag-line: lineno = 0, block = 3, file-name = fst.h
reading tag-line: lineno = 0, block = 4, file-name = fst.h
reading tag-line: lineno = 0, block = 5, file-name = fst.h
reading tag-line: lineno = 0, block = 6, file-name = fst.h
reading tag-line: lineno = 0, block = 7, file-name = fst.h
reading tag-line: lineno = 0, block = 8, file-name = fst.h
reading tag-line: lineno = 0, block = 9, file-name = fst.h
reading tag-line: lineno = 0, block = 2, file-name = fst.h
reading tag-line: lineno = 0, block = 3, file-name = fst.h
reading tag-line: lineno = 0, block = 2, file-name = fst.h
reading tag-line: lineno = 0, block = 3, file-name = fst.h
Ok, so the problem might be in the .gcno file - notice there is a single fst.h in the block of snd.h, but not the other way around. That probably means that means that gcc either records the source plain wrong, implicitly shares implementation (by accident), or implicitly shares implementation (on purpose).
I dump some diagnostics at the the corresponding writes in gcc where this is recorded:
diff --git a/gcc/profile.cc b/gcc/profile.cc
index c3d06c4d387..877d6ca0ac9 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -68,6 +68,7 @@ along with GCC; see the file COPYING3. If not see
#include "file-prefix-map.h"
#include "profile.h"
+#include "gimple-pretty-print.h"
struct condcov;
struct condcov *find_conditions (struct function*);
@@ -1108,6 +1109,9 @@ output_location (hash_set<location_triplet_hash> *streamed_locations,
prev_file_name = file_name;
gcov_write_unsigned (0);
gcov_write_filename (prev_file_name);
+ fprintf (stderr, "\n\nname-differs -- prev = %s, self = %s\n",
prev_file_name, file_name);
+ gimple_dump_bb (stderr, bb, 0, TDF_NONE);
}
if (line_differs)
{
name-differs -- prev = snd.h, self = snd.h
<bb 6> :
_11 = 1;
__cxxabiv1::__cxa_atexit (__dt_comp , &self, &__dso_handle);
name-differs -- prev = fst.h, self = fst.h
<bb 6> :
_11 = 1;
__cxxabiv1::__cxa_atexit (__dt_comp , &self, &__dso_handle);
[...]
name-differs -- prev = fst.h, self = fst.h
<bb 6> :
_11 = 1;
__cxxabiv1::__cxa_atexit (__dt_comp , &self, &__dso_handle);
So snd::instance gets two __cxa_atexit
prints, one from fst.h for
some reason, and fst::instance does not. This is the root cause as far
as gcov is concerned - gcc either generates a surprising entry, or a
bogus one. This leads to an empty block.locations.lines
which turns
into a bad access that crashes gcov. My best guess is that it
duplicates the basic block for static destructors which happens to
preserve the first source reference.
for (unsigned i = 0; i != loc.lines.size () - 1; ++i)
{
const unsigned line = loc.lines[i];
fprintf (gcov_file, "BB %2d: %-7s %3d", bb, "", line);
print_source_line (gcov_file, lines, line);
}
But this raises a few more questions. The locations array is not empty
for the fst block because the __cxa_atexit
is associated with it and
there is no file change. Effectively, the line is dropped from the
coverage report and fst prints like you would expect.
This is not strange, since it is compiler generated to guard the
static. The question now is, does this happen consistently for other
singletons, and why is this even a problem for snd?
I added yet another singleton (lst), which also got two __cxa_atexit
entries printed, but neither snd nor lst got a third entry, and fst
still only had its one. This indicates that the problem does not
change except when going from one to two statics.
The next step was to count the invocations of get_atexit_node()
, the
function which creates represents the __cxa_atexit
calls, which
is 1. It is in turn called from register_dtor_fn
which is called
once per static instance.
Tracking it to where basic blocks are built (tree-cfg.cc/make_blocks_1) confirms this: printing every statement assigned to the block including its location gives these notable statements:
[snd.h]: 4:20 D.35685 = 1;
[fst.h]: 4:20 __cxxabiv1::__cxa_atexit (__dt_comp , &self, &__dso_handle);
[snd.h]: 4:20 __cxa_guard_release (&_ZGVZN3snd8InstanceEvE4self);
...
[fst.h]: 4:20 D.35637 = 1;
[fst.h]: 4:20 __cxxabiv1::__cxa_atexit (__dt_comp , &self, &__dso_handle);
[fst.h]: 4:20 __cxa_guard_release (&_ZGVZN3fst8instanceEvE4self);
This pretty much concludes the hunt for the root cause. The same
object (or at least a plain copy) is reused for the __cxa_atexit
call, which happens to preserve the source of where it was first
generated. This call itself is not anchored too strongly to a line in
the source (well, when the static is constructed, but it is still
implicit) and the association to an otherwise unrelated file is then
misinterpreted by gcov as inlining. The __cxa_atexit
call being
shared is not really a problem, it is compiler generated after all,
which means the right thing for gcov to do is to skip empty location
arrays. This was first proposed fix, but now it is righteous.