Bug hunt: Independent functions crashing gcov

Blog

October 3, 2024

NASA reported an odd crash in gcov, and I found it interesting and deep enough to warrant describing the root cause and the path it took me on. The actual crash itself was quickly found with gdb; reading the block.locations.lines when printing the source lines of a path when the array was empty (with size()-1, even!). A fix would be to not do that e.g. by guarding the access, but the question is why the array is empty in the first place.

The crash seems to happen when all of the following are true:

You have two classes with a method returning a static instance (singleton). The method does not need to be static.
The method’s name (symbol) is on the same line in both files.
The methods are both implemented in the header, directly in the class.
The class itself has a non-trivial member for a non-trivial destructor.

If any of these change gcov doesn’t crash anymore, but some more digging seems to confirm that the problem is deeper in gcc/gcov. If I move the implementation to the source file, gcov prints this. Notice the static A.

function B::getInstance called 0 returned 0% blocks executed 0%
paths covered 0 of 3
path 0 not covered:
BB  2:          11:B& B::getInstance(){
BB  2: (true)   12:    static B theInstance;
BB  3:          12:    static B theInstance;
BB  4: (true)   12:    static B theInstance;
BB  5:          12:    static B theInstance;
BB  6:           7:    static A theInstance;
BB  6:          12:    static B theInstance;
BB  7:          12:    static B theInstance;
BB  8:          13:    return theInstance;
BB  9:          13:    return theInstance;

Clearly two functions are mixed up, and the problem was only shown by path coverage, not caused by it. To reproduce:

// fst.h
#include <string>
struct fst {
    fst() {}
    static fst& instance() {
        static fst self;
        return self;
    }
    std::string data;
};

// snd.h
#include <string>
struct snd {
    snd() {}
    static snd& instance() {
        static snd self;
        return self;
    }
    std::string data;
};

// test.cc
#include "fst.h"
#include "snd.h"

int main(int args, char **argv) {
    fst::instance().data = "foo";
    snd::instance().data = "bar";
}

$ gcc --coverage -fpath-coverage test.cc -o test
$ gcov --stdout --prime-paths-source test --include=instance

Interestingly, this seems to only crash in one of them.

$ gcov --prime-paths-source test --stdout --include=fst # works
$ gcov --prime-paths-source test --stdout --include=snd # crashes

This is a result of inclusion order; if fst.h is included before snd.h, it crashes in snd, and vice versa. This is a good hint, so I dump the CFG from gcc. While these are unsurprisingly similar there is no obvious hint here.

{fst,snd}::instance cfg

At this point I started digging into gcc itself, since it seems gcov does the right thing given its input. I have gcov dump out the line mapping it reads from the .gcno file:

reading tag-line: lineno = 0, block = 2, file-name = snd.h
reading tag-line: lineno = 0, block = 3, file-name = snd.h
reading tag-line: lineno = 0, block = 4, file-name = snd.h
reading tag-line: lineno = 0, block = 5, file-name = snd.h
reading tag-line: lineno = 0, block = 6, file-name = snd.h
reading tag-line: lineno = 0, block = 6, file-name = fst.h
reading tag-line: lineno = 0, block = 7, file-name = snd.h
reading tag-line: lineno = 0, block = 8, file-name = snd.h
reading tag-line: lineno = 0, block = 9, file-name = snd.h
reading tag-line: lineno = 0, block = 2, file-name = snd.h
reading tag-line: lineno = 0, block = 3, file-name = snd.h
reading tag-line: lineno = 0, block = 2, file-name = snd.h
reading tag-line: lineno = 0, block = 3, file-name = snd.h

reading tag-line: lineno = 0, block = 2, file-name = fst.h
reading tag-line: lineno = 0, block = 3, file-name = fst.h
reading tag-line: lineno = 0, block = 4, file-name = fst.h
reading tag-line: lineno = 0, block = 5, file-name = fst.h
reading tag-line: lineno = 0, block = 6, file-name = fst.h
reading tag-line: lineno = 0, block = 7, file-name = fst.h
reading tag-line: lineno = 0, block = 8, file-name = fst.h
reading tag-line: lineno = 0, block = 9, file-name = fst.h
reading tag-line: lineno = 0, block = 2, file-name = fst.h
reading tag-line: lineno = 0, block = 3, file-name = fst.h
reading tag-line: lineno = 0, block = 2, file-name = fst.h
reading tag-line: lineno = 0, block = 3, file-name = fst.h

Ok, so the problem might be in the .gcno file - notice there is a single fst.h in the block of snd.h, but not the other way around. That probably means that means that gcc either records the source plain wrong, implicitly shares implementation (by accident), or implicitly shares implementation (on purpose).

I dump some diagnostics at the the corresponding writes in gcc where this is recorded:

diff --git a/gcc/profile.cc b/gcc/profile.cc
index c3d06c4d387..877d6ca0ac9 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -68,6 +68,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "file-prefix-map.h"

 #include "profile.h"
+#include "gimple-pretty-print.h"

 struct condcov;
 struct condcov *find_conditions (struct function*);
@@ -1108,6 +1109,9 @@ output_location (hash_set<location_triplet_hash> *streamed_locations,
       prev_file_name = file_name;
       gcov_write_unsigned (0);
       gcov_write_filename (prev_file_name);
+      fprintf (stderr, "\n\nname-differs -- prev = %s, self = %s\n",
                prev_file_name, file_name);
+      gimple_dump_bb (stderr, bb, 0, TDF_NONE);
     }
   if (line_differs)
     {

name-differs -- prev = snd.h, self = snd.h
<bb 6> :
_11 = 1;
__cxxabiv1::__cxa_atexit (__dt_comp , &self, &__dso_handle);

name-differs -- prev = fst.h, self = fst.h
<bb 6> :
_11 = 1;
__cxxabiv1::__cxa_atexit (__dt_comp , &self, &__dso_handle);

[...]

name-differs -- prev = fst.h, self = fst.h
<bb 6> :
_11 = 1;
__cxxabiv1::__cxa_atexit (__dt_comp , &self, &__dso_handle);

So snd::instance gets two __cxa_atexit prints, one from fst.h for some reason, and fst::instance does not. This is the root cause as far as gcov is concerned - gcc either generates a surprising entry, or a bogus one. This leads to an empty block.locations.lines which turns into a bad access that crashes gcov. My best guess is that it duplicates the basic block for static destructors which happens to preserve the first source reference.

for (unsigned i = 0; i != loc.lines.size () - 1; ++i)
  {
    const unsigned line = loc.lines[i];
    fprintf (gcov_file, "BB %2d: %-7s %3d", bb, "", line);
    print_source_line (gcov_file, lines, line);
  }

But this raises a few more questions. The locations array is not empty for the fst block because the __cxa_atexit is associated with it and there is no file change. Effectively, the line is dropped from the coverage report and fst prints like you would expect. This is not strange, since it is compiler generated to guard the static. The question now is, does this happen consistently for other singletons, and why is this even a problem for snd?

I added yet another singleton (lst), which also got two __cxa_atexit entries printed, but neither snd nor lst got a third entry, and fst still only had its one. This indicates that the problem does not change except when going from one to two statics.

The next step was to count the invocations of get_atexit_node(), the function which creates represents the __cxa_atexit calls, which is 1. It is in turn called from register_dtor_fn which is called once per static instance.

Tracking it to where basic blocks are built (tree-cfg.cc/make_blocks_1) confirms this: printing every statement assigned to the block including its location gives these notable statements:

[snd.h]: 4:20 D.35685 = 1;
[fst.h]: 4:20 __cxxabiv1::__cxa_atexit (__dt_comp , &self, &__dso_handle);
[snd.h]: 4:20 __cxa_guard_release (&_ZGVZN3snd8InstanceEvE4self);

...

[fst.h]: 4:20 D.35637 = 1;
[fst.h]: 4:20 __cxxabiv1::__cxa_atexit (__dt_comp , &self, &__dso_handle);
[fst.h]: 4:20 __cxa_guard_release (&_ZGVZN3fst8instanceEvE4self);

This pretty much concludes the hunt for the root cause. The same object (or at least a plain copy) is reused for the __cxa_atexit call, which happens to preserve the source of where it was first generated. This call itself is not anchored too strongly to a line in the source (well, when the static is constructed, but it is still implicit) and the association to an otherwise unrelated file is then misinterpreted by gcov as inlining. The __cxa_atexit call being shared is not really a problem, it is compiler generated after all, which means the right thing for gcov to do is to skip empty location arrays. This was first proposed fix, but now it is righteous.

Trip report: NDC Techtown 2024 Early access: zcov