Fedora, GNOME, Linux

Story of GNOME Shell Extensions

A long time ago (exactly 10 years ago) it was decided that the shell for GNOME would be written in JavaScript. GNOME 3 was still looking for its new face, a lot of UI experimentation was taking place, and JavaScript looked like the best candidate for it. Moreover, it was a popular language on the web, so the barrier to entry for new contributors would be significantly lower.

When you have the shell written in JavaScript you can very easily patch it and alter its look and behaviour. And that’s what people started doing. Upstream was not very keen to officially support extensions due to their nature: they’re just hot patching the GNOME Shell code. They have virtually unlimited possibilities in changing look and behaviour, but also in introducing instability.

But tweaking the shell became really popular. Why wouldn’t it? You can tweak your desktop by simply clicking buttons in your browser. No recompilations, no restarts. So extensions.gnome.org was introduced.

The number of available extensions grew to hundreds, and the instability some of them occasionally introduced seemed like a fair price for the unlimited tweakability. After all, when the Shell crashed it was just a blink: Xorg held up the session with its open clients, the Shell/Mutter was restarted, and the show could go on.

In 2016 GNOME switched to Wayland by default. There is no Xorg any more, and so nothing to hold up the session with its open clients when the Shell crashes. There is only Mutter as a Wayland compositor, but unfortunately it runs in the same process as GNOME Shell (a decision also made 10 years ago, when it also looked like a good idea). If the Shell goes down, so does Mutter. Suddenly harmless blinks became desktop crashes that lose all unsaved data in open applications.

I read a lot of user feedback on the Internet about the problems users are having with Fedora Workstation (and the Linux desktop in general). Desktop crashes caused by GNOME Shell (GS) extensions are by far the most frequent problem I’ve seen recently. I read stories like “I upgraded my Fedora to 28 and suddenly my desktop crashes 5 times a day. I can’t take it any more and I’m out of ideas” on a daily basis. If someone doesn’t step in and say: “Hey, do you have any GS extensions installed? If so, disable them and see if it keeps crashing. The extensions are not harmless, any error in them or incompatibility between them and the current version of GS can take the whole desktop down”, users usually leave with the experience of an unstable Linux desktop. It hurts our reputation really badly.

Are there any ways to fix or at least improve the situation? Certainly:

  1. Extensions used to be disabled when the Shell crashed hard (couldn’t be restarted). Since on Wayland that’s the result of every crash, we should do that after every GS crash. And when the user goes back to GNOME Tweak Tool to enable the extensions again, she/he should be told that it was most likely one of the 3rd party extensions that made the desktop crash, and that she/he should be careful when enabling them.
  2. Decoupling GNOME Shell and Mutter and/or taking other steps that would bring back the same behaviour as on Xorg: a GS crash would not take everything down. This would require major changes in the architecture and a lot of work, and the GNOME Shell and Mutter developer community already has a lot on its plate.
  3. Discontinuing the unlimited extensions, introducing a limited API they can use instead of hot patching the GS code itself. This would be a very unpopular step because it’d mean that many of the existing extensions would be impossible to implement again. But it may become inevitable in the future.
Fedora

How ABRT helped us make Fedora Workstation more stable

Last week, the official Fedora Project account asked users on social networks why Fedora is their distribution of choice. Probably the most frequent answer was that Fedora is THE GNOME distro, that it has the best supported GNOME, which really made me happy. But what made me even happier was that I found a lot of answers like “You won’t believe it, but I use Fedora for stability”. Indeed, the stability of Fedora has improved a lot since I started using it, especially in the last few releases. How did we achieve it?

There are several reasons why Fedora is more stable than ever before. What plays an important role is that the significant changes have settled. GNOME 3 has matured, the wild beginnings of systemd are also over, and Anaconda has stabilized a lot, too. Another reason is the Fedora QA team, which now has 10 people who test Fedora full time. This is something no other community can enjoy. If you add volunteers and the fact that the team uses more and more automated testing, you get a lot of test coverage. What I think has also helped is focus. We created three official editions (Workstation, Server, Cloud) and defined what MUST be good (the three editions) and what CAN be good (everything else: spins, labs,…). We have also changed the strategy. Fedora is supposed to be progressive, but it doesn’t mean we need to force immature features on users. However, we also don’t want to be too conservative and become another Debian. I think we have found a good balance. The strategy is to have stable defaults and experimental features as opt-ins that are just a few clicks away for early adopters who would like to test them (this strategy was used for DNF, and now we’re using it for Wayland). This way, Fedora is stable enough for users who just want to use it, and still fun for those who like living on the edge of future technology.

However, today I’d like to focus on a different factor behind the improved stability of Fedora: ABRT, which stands for Automatic Bug Reporting Tool. It’s a tool that helps users report software problems. One of the main problems in software development is getting reports that are detailed enough for the problem to be identified and fixed. If the report states: “I clicked a button and the window disappeared”, it doesn’t help you find the problem and it most likely won’t get fixed. But if the user attaches a backtrace and a set of relevant logs, the chances go up sharply. That was the first milestone for ABRT: to collect all relevant data in the system and help the user report it.

But the result was a Bugzilla flooded with ABRT reports. Developers simply didn’t have the capacity to go through them and analyze them, and they usually ended up filtering ABRT reports out. That was why ABRT went on to another milestone: to create statistics that would help maintainers identify which bugs affect a lot of users (and thus should be fixed) and which are just corner cases. And this finally made ABRT a very interesting aid for developers.

The statistics can be found on Retrace Server. They provide a lot of information. Not only can you find out how many crashes a bug is responsible for, which is the most important information for prioritization, but you can also learn in which release of Fedora, on which architecture, etc. it occurs. What is also very useful is that ABRT can group crashes together based on similarity. Then you can find out that, for instance, crashes in ten different components are caused by a bug in a single library these components are using. The number of reports in Bugzilla has decreased significantly, too, because ABRT started identifying duplicates and creating reports only when enough info is collected.
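
To illustrate the grouping idea, here is a rough sketch in Python. It is not ABRT’s actual algorithm, which I don’t describe here. It assumes a much simpler stand-in: two crashes count as "similar" when their innermost backtrace frames are identical. The function name, the report format, and the component and frame names are all made up for the example.

```python
from collections import defaultdict


def group_crashes(reports, depth=2):
    """Group crash reports by their innermost `depth` backtrace frames.

    `reports` is a list of (component, backtrace) pairs, where the
    backtrace lists frames from the innermost call outwards. This is a
    simplified stand-in for similarity grouping, not ABRT's real logic.
    """
    groups = defaultdict(lambda: {"count": 0, "components": set()})
    for component, backtrace in reports:
        signature = tuple(backtrace[:depth])
        groups[signature]["count"] += 1
        groups[signature]["components"].add(component)
    # Most frequent signature first, which is the order that matters
    # for prioritization.
    return sorted(groups.items(), key=lambda item: item[1]["count"], reverse=True)


# Hypothetical reports: three different components crash inside the
# same shared library, one crashes in its own code.
reports = [
    ("file-manager", ["libshared:parse", "libshared:load", "fm:open"]),
    ("image-viewer", ["libshared:parse", "libshared:load", "iv:show"]),
    ("text-editor", ["libshared:parse", "libshared:load", "te:read"]),
    ("terminal", ["term:resize", "term:draw"]),
]

ranked = group_crashes(reports)
top_signature, top_stats = ranked[0]
```

In this toy data the top group collects the crashes of three different components under one library signature, which is the kind of pattern that points to a single underlying bug.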

[Image: ABRT stats of a problem.]

The desktop team started using ABRT roughly a year and a half ago. Developers are told to check the stats to see whether their components pop up in the chart of most frequent crashes. I regularly check it, too. And if I find something my team is responsible for, I notify the responsible developer about it. But it’s been quite boring lately. If you check stats from stable releases, you won’t find desktop components so easily. And if you do find something from the desktop after all, it’s usually already marked as fixed.

But it was not always like this. Fedora is primarily a desktop distribution, so desktop components are heavily used and they used to be high on the list of most frequent crashes. ABRT enabled us to prioritize and focus on the most frequent ones. And you can see the difference in real-life usage: I rarely experience a crash in GNOME or the default Workstation applications.

After the good experience with ABRT in GNOME, I also advised the KDE maintainers in my team to use it to prioritize. When they went through the list, they found Plasma crashes that had their origin lower in the stack (X11 or drivers), so not easily fixable for them, but they also found quite a few trivial one-liners which affected thousands of users. The ABRT stats are also used by some of our partners. I know Intel uses them to monitor problems in their video driver (btw the kernel is associated with most of the frequent problems, but in this case the problems are not crashes, but rather kernel module oopses which users don’t even notice). CentOS started using ABRT, too. That’s helpful if you want to identify frequent crashes in RHEL, because if something crashes in CentOS, it most likely crashes in RHEL as well.

ABRT is also useful for users. Not only can it collect relevant information about a crash for you and make it much easier to report it in Bugzilla, but if you don’t want to deal with any bug reporting, you can at least let it send microreports, which build the statistics. By doing so, you let us know that a crash that may be fully reported by someone else affects you, too. You can even go for silent microreporting, which doesn’t disturb you at all. That’s what I turn on on the computers of average users. They will never report a single problem themselves, but by sending microreports they still contribute to the quality of Fedora.

I also use ABRT to report problems in software that is not part of Fedora repositories. ABRT collects info about a crash for me and I can pick what I need from it or send it to developers as a whole package.

ABRT has really significantly contributed to the quality of Fedora, at least in the desktop part. Kudos to all who have worked on the project for that!

Fedora

I’m going to FUDCon APAC 2015!

Last year, I was really impressed by the level of organization and the atmosphere at FUDCon APAC, which took place in Beijing, China. That is why I decided to submit a talk for FUDCon APAC 2015, which is going to take place in Pune, India. And guess what! My talk was accepted!

I named the talk “Present and Future of Fedora Workstation”. I’m now part of the Red Hat desktop team and we have a lot of interesting stuff that has made it to F22 and even more interesting stuff that is planned for F23. So I’ll talk about all the goodness that is changing Fedora Workstation into the best desktop system for active and creative users (developers, writers, designers,…).

I’m arriving in Mumbai at 8:35am on June 25th. I’ve seen that some people have arrivals around that time, too. It’d be great to organize transportation to Pune together. After FUDCon, I’m taking a week of holidays and would like to check out interesting places around. I hope to see e.g. Goa before the proper rainy season starts. India will be my 50th visited country and I’m looking forward to it.

See you in Pune!