

GNU Tools Cauldron: Trip report

GNU Tools Cauldron is an annual meeting / informal conference of developers involved in GNU toolchain projects like gcc, gdb, or glibc. I used to be the team lead of Tools QE, and I still consider Tools one of my main areas of interest, so I am happy to attend this event whenever it is sufficiently close. It is certainly very useful, as one can follow the latest development plans not only of RH developers but also of those from other companies. We will eventually inherit this development in RHEL and DTS, so it is quite useful to know what is coming. Also, QE-related topics are sometimes discussed, especially in the context of upstream development.

What to take from here (version for busy people)

  • It is certainly beneficial for QE to attend such meetings: it is useful to meet developers in person, see what is coming, watch the community and recognize upstream involvement opportunities
  • Sanitizers find bugs in software, yet we do not use them in our testing process. We should start!
  • We are building libabigail, a tool suite for ABI compatibility testing
  • Codespeed is an interesting tool for long-term software performance testing

The talks

GDB Performance Testing

Speaker: Yao Qi
A quite technical talk about the performance-testing infrastructure of the GDB project [1]. The talk basically presented specific examples of performance tests, discovered performance problems, fixes, and evaluations of those fixes. I suppose it might be interesting to whoever tests GDB now. An interesting point for our Corly effort is the link to the long-term performance result repository / analysis tools, which are quite similar to what we are trying to achieve for test results. The Codespeed tool is open source [2] and deployed to measure other software products [3], but its upstream is not especially active.


Interesting long-term performance trackers

Interesting observations

  • Command thread does not scale
  • dcache is conservative for multi-threaded code

ABI Difference Testing

Speaker: Dodji Seketeli
Another take on ABI compatibility testing [1], this time based mostly on DWARF data (debuginfo). I came late to this one and got the impression the project is mostly in an early stage, with the first deliverables ready. This is an RH project, so I assume there are plans to use it internally to check ABI compatibility.



Answering the question: Is my application A still compatible with the new version of library L?

Adding a member at the end of a structure is generally harmless, unless the structure is used by value elsewhere (as a value parameter or embedded in another structure)

Supporting un-instantiated templates is a problem.

Collaboration between GCC and LLVM

Speaker: Renato Golin
This one needed courage (from the speaker). The GCC and LLVM communities are not especially friendly towards each other (though they are no longer hostile either). There are certainly opportunities for collaboration, but there is no wide agreement on what they are. Many of the proposals simply amounted to standardizing various aspects (UI, builtins), which the GNU community does not generally consider a good thing, as the gains are dubious.


There will necessarily be competition, but we can protect users and build on each other's work by collaborating


  • Discussion
  • Pro-active discussion of the future
  • Isolate legacy
  • Agree and follow docs

Common projects

  • Binutils (gas, ld)
  • glibc
  • Sanitizers

Common goals

User interface

  • Target triples (which are not really useful anymore)
  • Architecture flags
  • Tool invocation

Shouldn’t we agree on a common UI?


  • Common extensions
  • ASM
  • Linker API
  • Inline assembly
  • General behavior

Pointer Bounds Checker

Speaker: Ilya Enkovich, Intel
Ilya spoke about the implementation and usage of a mechanism for preventing some kinds of memory bugs. The mechanism is based on additional specialized instructions (the MPX set, not currently present in any real HW) inserted by the compiler. The projected overheads are a 15-20% hit on runtime, a 90% hit on code size, and a 70% hit on memory consumption. There is a branch of gcc with an experimental implementation, and Intel offers an emulator [1].


Based on Intel MPX instructions (instrumentation-based)
Lower performance penalty thanks to HW support
Finds struct-internal buffer overflows


Cannot detect dangling pointers

New registers: Bound registers and configuration registers

MPX instructions for manipulating and checking the bounds. These instructions are NOPs on old architectures and when instrumentation is not enabled.


No measurements yet (no real HW); projections:

  • 15-20% runtime overhead when instrumentation is on
  • 4-6% runtime overhead when off
  • ×1.9 code size (average; range ×1.35-×4)
  • ×1.7 memory consumption (average; range ×1-×4)

Present in the mpx branch of GCC; an emulator is available [1]

News from Sanitizers

Speaker: Kostya Serebryany / Evgeniy Stepanov

There is a new Sanitizer: MemorySanitizer, which reports usage of uninitialized memory in sensitive places (syscalls, conditionals). ThreadSanitizer has learned to detect more forms of deadlocks. AddressSanitizer now reports memory leaks too, supposedly with negligible runtime overhead. They are also working on detecting out-of-bounds accesses in C++ containers (beyond t.end() but below t.capacity()).

There was a discussion about using Sanitizers with glibc, which is currently not really possible because glibc uses the GNU dialect of C, which Clang does not support. The discussion was similar to the one during the LLVM/GCC collaboration talk, and there was a certain amount of vitriol in the air. This was not helped by the fact that the two presenting Googlers delivered the talk with a certain know-it-all-and-tell-you-what-to-do attitude…

Anyway, with at least some Sanitizers present in gcc, I feel we are missing a great opportunity by not using them during our RHEL testing. These things demonstrably find tons of bugs everywhere.

MemorySanitizer (new)

Detects use of uninitialized memory. It reports usage of uninitialized values in sensitive places, as well as the location where the memory was allocated.

Googlers boasting.

Reporting every load of uninitialized data would be too noisy, as such data is often copied around, and even used in calculations whose results are discarded.

MSan tracks the uninitialized property through the data and reports when it is used in:

  • Program counter
  • Address
  • System call


ThreadSanitizer acquired a DeadlockDetector


Leak Sanitizer

Reports memory leaks, supposedly with really small overheads


Collect coverage data while fuzzing with ASan (they are now working on this)

Overhead: 10-20%, <1s shutdown

They are using coverage-driven (genetic) fuzzing

Container overflows

Going beyond t.end() but within capacity, e.g. up to t.begin() + t.capacity()

Implemented for std::vector<> in libc++, almost deployed for std::vector<> in libstdc++.

They plan deque now.

Sanitizers and Glibc

They do not instrument glibc, but intercept interesting glibc functions to check for user-supplied bad pointers.

Whining about glibc using GNU C extensions (e.g. nested functions) instead of ANSI C.

It cannot benefit from Clang’s warnings and MemorySanitizer (which is probably true)

In a hackish way, ASan is usable with glibc.

More whining:

  • GCC and glibc not parallel enough
  • No public continuous integration testing for GCC
  • Annoying interface differences between Clang and GCC (sanitizers)

Apparently a bit controversial topic :)


Why should distributions build glibc with ASan? So that users find more bugs, both in glibc and in packages.
Are you suggesting distros should ship Sanitizer-enabled packages by default, or as extras? No clear answer; they do not seem to care.

Performance tuning for glibc

Speaker: Ondřej Bilka
Having spent a lot of time on the performance of malloc implementations myself, I found this a super-interesting topic. Unfortunately, the presentation was really hard to follow. Measuring performance at a level this low is hard due to interference from the CPU, making it difficult to measure impact on common usage scenarios. So Ondrej went beyond microbenchmarks, and here I basically got lost. There are some interesting insights here [1].




I had the pleasure of talking with a lot of people on many different topics. There seems to be general interest in upstreams in people who would invest effort and HW into ongoing automated QE measures like continuous integration. With gcc and friends being old-school SW projects born before anyone but mathematicians ever used the word ‘continuous’, the communities lack such measures (or at least some parts of them do). Of course, more QE involvement in upstreams would be welcome.

I talked to Dave Malcolm, and he mentioned he might still have some designs and thoughts on reporting and visualizing multidimensional data (like our test plans are) from the time he worked on the TCMS predecessor. With the ongoing efforts with Polarion, I suppose these might come in handy.

Carlos O’Donell mentioned a need for a test results repository in one conversation. I did not have the time to elaborate, but building one is part of one of our RH Lab projects. I plan to follow up with Carlos about this.

I also had a small chat with Jeff Law about RH Lab and a possible research involvement of the Tools team. This was quite late and interrupted by a session start, but it is also worth a follow-up.


Other reports

Alex Bradbury (the guy who writes the LLVM newsletter) posted his notes here:


Dave Malcolm

Hi Dave,

during our chat at the Cauldron, you mentioned you might still have some resources from the time you were working on the TCMS predecessor, especially some work on reporting and some design work on multi-dimensional data visualization. If so, could you make it available to us so we can perhaps use it with TCMS, Polarion, or whatever else? It seems to me that this kind of thing might still be useful…


Carlos O’Donell

Hi Carlos,

we only met very briefly at the Cauldron: I’m a former lead of Tools QE (Martin Cermak’s predecessor), and I am now working on a project trying to start an applied research collaboration between Red Hat and Brno University of Technology. We call it Red Hat Lab [1]. I came to the Cauldron partly to find people who might be interested in research, either having a problem we could try to solve or the ability to utilize something we are doing here.

The reason I’m following up is that you mentioned a desire for a ‘test result repository’ during a discussion on implementing CI measures in GNU upstreams. We have a project in progress that involves building something like that: basically a data store for test results run anywhere, with some structure imposed on them in the form of result timelines for comparable results (same environment, known place in history, e.g. via a commit hash). We intend to collect this data and mine it for useful information, but it would also be possible to build e.g. regression alerts or analysis tools on top of it. We are going for central storage with server-based front-ends for accepting and interpreting results, plus really thin client tools for submitting them. The client tools should be trivial to include in any existing CI/buildbot infrastructure.

So, the question: what are your use cases for such a test-result repository? Do you know of any related efforts underway? Would you be interested in further information, and perhaps in our further progress?



Jeff Law

Hi Jeff,

this is a follow-up to our short chat at the Cauldron regarding the possibility of coming up with something static-analysis related and useful for RH. We basically have two offices full of static analysis and formal verification experts here, who would be more than willing to come up with new static analysis techniques for Red Hat’s specific needs. We also have people to implement what the first group designs. What we are missing are the needs and problems to solve.

Do you think we can come up with something? I was thinking of something in the direction of change impact (patch risk) analysis, or perhaps some areas specialized enough not to be covered by Coverity: some of my colleagues work on dynamic analysis of programs using transactional memory, for example.