The ABRT project produces very helpful statistics about crashes in Fedora. We in the Red Hat desktop team have been using it intensively for some time. I’ve already written about it in one of my previous posts. It’s really helped us make Fedora much more stable.
Call me Captain Obvious who just discovered America, but until now I had very little idea that I could filter messages from FAF and set up alerts. So when a problem in one of my packages reaches, say, 1000 occurrences, I receive an email or IRC message telling me that there is a problem severe enough to look at.
This is pretty useful for every Fedora packager, and I think most of them are still not aware of it. If you’d like to set it up, go to the Fedora Notifications app, log in, choose either email or IRC settings, click “Create a new filter”, and pick one of the available FAF rules. You can be notified of every single reported crash or (at the other end of the scale) you can choose not to be disturbed until the problem reaches 1,000,000 occurrences. It really depends on how popular and “crashy” your packages are. Just check the FAF stats and set the limit accordingly.
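To make the threshold idea concrete, here is a minimal Python sketch of the decision such a filter makes. It is not the code of Fedora Notifications or FAF; the function name `crossed_threshold` and the sample counts are assumptions made up for illustration.

```python
def crossed_threshold(previous_count: int, current_count: int, threshold: int) -> bool:
    """Return True only when the count reaches the threshold for the first time."""
    return previous_count < threshold <= current_count


# Hypothetical occurrence counts of one problem, sampled over time.
counts = [120, 640, 999, 1000, 1500]
threshold = 1000

# Compare each sample with the one before it and keep only the crossings.
alerts = [
    current
    for previous, current in zip(counts, counts[1:])
    if crossed_threshold(previous, current, threshold)
]
print(alerts)
```

In this sketch an alert fires only at the moment the count crosses the limit, not for every later report above it, so a popular and “crashy” package would not keep triggering the same alert.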
Of course, this is just a small subset of the Fedora Notifications settings. The tool is very powerful: you can pick many other rules, combine them, and create filters tailored to you. Kudos to our infra team for it!
Last week, the official Fedora Project account asked users on social networks why Fedora is their distribution of choice. Probably the most frequent answer was that Fedora is THE GNOME distro, that it has the best supported GNOME, which really made me happy. But what made me even happier was that I found a lot of answers like “You won’t believe it, but I use Fedora for stability”. Indeed, the stability of Fedora has improved a lot since I started using it, especially in the last few releases. How did we achieve it?
There are several reasons why Fedora is more stable than ever before. What plays an important role is that the significant changes have settled. GNOME 3 has matured, the wild beginnings of systemd are over, and Anaconda has stabilized a lot, too. Another reason is the Fedora QA team, which now has 10 people who test Fedora full time. This is something no other community can enjoy. If you add volunteers and the fact that the team uses more and more automated testing, you get a lot of test coverage. What I think has also helped is focus. We created three official editions (Workstation, Server, Cloud) and defined what MUST be good (the three editions) and what CAN be good (everything else: spins, labs, …). We have also changed the strategy. Fedora is supposed to be progressive, but that doesn’t mean we need to force immature features on users. However, we also don’t want to be too conservative and become another Debian. I think we have found a good balance. The strategy is to have stable defaults and experimental features as opt-ins that are just a few clicks away for early adopters who would like to test them (this strategy was used for DNF, and now we’re using it for Wayland). This way, Fedora is stable enough for users who just want to use it, and still fun for those who like living on the edge of future technology.
However, today I’d like to focus on a different factor behind the improved stability of Fedora – ABRT, which stands for Automatic Bug Reporting Tool. It’s a tool that helps users report software problems. One of the main problems in software development is getting reports that are detailed enough for the problem to be identified and fixed. If the report states: “I clicked a button and the window disappeared”, it doesn’t help you find the problem and it most likely won’t get fixed. But if the user attaches a backtrace and a set of relevant logs, the chances go up sharply. That was the first milestone for ABRT – to collect all relevant data in the system and help the user report it.
But the result was that Bugzilla got flooded with ABRT reports. Developers simply didn’t have the capacity to go through them and analyze them. They usually ended up filtering ABRT reports out. That was why ABRT went on to another milestone – to create statistics that would help maintainers identify which bugs affect a lot of users (and thus should be fixed) and which are just corner cases. And this finally made ABRT a very interesting aid for developers.
The statistics can be found on Retrace Server. They provide a lot of information. Not only can you find out how many crashes a bug is responsible for, which is the most important information for prioritization, but you can also learn which Fedora release and which architecture it occurs on, and so on. What is also very useful is that ABRT can group crashes together based on similarity. Then you can find out that, for instance, crashes in ten different components are caused by a bug in a single library these components are using. The number of reports in Bugzilla has decreased significantly, too, because ABRT started identifying duplicates and creating reports only when enough info is collected.
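The grouping idea can be illustrated with a small Python sketch. This is not how ABRT or Retrace Server actually measures similarity; it assumes, purely for illustration, that two crashes are “similar” when the innermost frames of their backtraces match. The report data, component names, and the `signature` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical crash reports: (component, backtrace frames from innermost outwards).
reports = [
    ("app-a", ["lib_parse", "lib_load", "a_main"]),
    ("app-b", ["lib_parse", "lib_load", "b_open_file"]),
    ("app-c", ["c_render", "c_main"]),
    ("app-a", ["lib_parse", "lib_load", "a_main"]),
]


def signature(frames, depth=2):
    """Use the innermost `depth` frames as a crude similarity key (an assumption)."""
    return tuple(frames[:depth])


# Collect a crash count and the set of affected components per signature.
groups = defaultdict(lambda: {"count": 0, "components": set()})
for component, frames in reports:
    group = groups[signature(frames)]
    group["count"] += 1
    group["components"].add(component)

# Most frequent groups first, which is the order useful for prioritization.
ranked = sorted(groups.items(), key=lambda item: item[1]["count"], reverse=True)
for key, info in ranked:
    print(key, info["count"], sorted(info["components"]))
```

With this made-up data, the crashes of two different components end up in one group because they share the same library frames, which is the kind of insight the grouped statistics give a maintainer.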
The desktop team started using ABRT roughly a year and a half ago. Developers are told to check the stats to see if their components pop up in the chart of most frequent crashes. I regularly check it, too. And if I find something my team is responsible for, I notify the responsible developer about it. But it’s been quite boring lately. If you check stats from stable releases, you won’t find desktop components so easily. And if you do find something from the desktop after all, it’s usually already marked as fixed.
But it was not always like this. Fedora is primarily a desktop distribution, so desktop components are heavily used and they were high on the list of most frequent crashes. But ABRT enabled us to prioritize and focus on the most frequent crashes. And you can see the difference in real-life usage. I rarely experience a crash in GNOME or the default Workstation applications.
After a good experience with ABRT in GNOME, I also advised the KDE maintainers in my team to use it to prioritize. When they went through the list, they found Plasma crashes that had an origin lower in the stack (X11 or drivers), so not easily fixable for them, but they also found quite a few trivial one-liners which affected thousands of users. The ABRT stats are also used by some of our partners. I know Intel uses them to monitor problems in their video driver (btw, the kernel is associated with most of the frequent problems, but in this case the problems are not crashes, but rather kernel module oopses which users don’t even notice). CentOS started using ABRT, too. That’s helpful if you want to identify frequent crashes in RHEL because if something crashes in CentOS, it most likely crashes in RHEL as well.
ABRT is also useful for users. Not only can it collect relevant information about a crash for you and make it much easier to report it in Bugzilla, but if you don’t want to deal with any bug reporting, you can at least let it send microreports, which build the statistics. By doing so, you let us know that a crash that may have been fully reported by someone else affects you, too. You can even go for silent microreporting, which doesn’t disturb you at all. That’s what I turn on on the computers of average users. They will never report a single problem themselves, but by sending microreports they still contribute to the quality of Fedora.
I also use ABRT to report problems in software that is not part of Fedora repositories. ABRT collects info about a crash for me and I can pick what I need from it or send it to developers as a whole package.
ABRT has contributed significantly to the quality of Fedora, at least in the desktop part. Kudos to all who have worked on the project for that!
I regularly go through the most frequent problems reported to the ABRT retrace server because it helps me prioritize the Fedora bugs that are assigned to my team. I think the ABRT service is great for developers to prioritize their bugs, and it helps collect much more data about a crash than an average user normally provides.
However, I’ve noticed a significant drop in the number of reports in Fedora 22. It’s just two weeks before the final release, when many early adopters are already running F22, but the difference in the number of reports from F21 and F22 is huge: 64373 to 904.
12 days before F21 was released, we had collected 16081 reports from that version. That’s almost 18x more than the 904 we have from F22 now. I don’t think we’re experiencing such a huge drop in adoption, so I investigated more…
…and learned that GNOME Control Center got a new privacy setting in F22: Problem reporting. And if you upgrade from Fedora 21, automatic crash reporting is disabled even though you had it enabled before the upgrade. To make it even more confusing, if you go to the ABRT settings, automatic reporting is shown as enabled there. That’s because the setting in GNOME Control Center serves as a master setting that overrides the settings in ABRT. So if you have upgraded to F22 and want to provide developers with very valuable data, please go to Control Center->Privacy->Problem Reporting and enable automatic reporting. Manual reporting is still possible from the ABRT app.
The ABRT team is working on a fix for this.
If you do a fresh installation, you should be able to allow automatic reporting in the Initial Experience after installation.