On Fri, Apr 28, 2017 at 5:11 PM, Eric Blau <eblau@eblau.com> wrote:
On Fri, Apr 28, 2017 at 12:29 PM, Carsten Mattner via arch-general <arch-general@archlinux.org> wrote:
The constant churn of refactorings and whatnot makes it impossible for all the hardware that, say, i915 supports to actually work reliably across kernel releases. What used to work flawlessly in 4.1 can be broken in 4.4 because the devs do not test with Intel GPUs older than Gen7, for example, all the while claiming that hardware is supported in the now-refactored but practically untested code.
Carsten,
I agree with you about i915. I've been hitting this kernel panic regularly, about once per day, freezing my entire machine:
Bug 99295 - [Regression BDW] kernel panic in Intel i915 module, complete system freeze in 4.10-rc2 https://bugs.freedesktop.org/show_bug.cgi?id=99295
There's a fix that's been submitted to the tip, but no effort has been made to patch the bug in the 4.10.x stable series. It seems the devs don't care about having a stable kernel to use, only about moving the tip forward and staying on the bleeding edge. Shouldn't at least showstopper kernel panics be fixed in the "stable" release?
I requested a fix on the tip to be patched in the 4.9.x stable series a couple months ago because I tested the fix myself and verified it "worked for me" but it was subsequently reverted. I'm sure I don't know enough about the i915 driver to be able to make these types of decisions about what should or should not be patched other than to help with testing, but it would be nice if the i915 dev team made an effort to propagate fixes to stable as well.
It's possible that the fix causes other issues, but I've also seen crash fixes take very long to land in a stable release, sometimes 2 or 3 releases, while refactorings are intertwined with other fixes in stable releases. It looks odd.

On one machine where XFCE, GNOME3 and Weston work without errors, I've seen Plasma misbehave in its compositor so much that I couldn't get it to open KDE settings to turn off compositing. An earlier release of Plasma (4.x) used to work, so it has regressed recently. I attribute all of this to the massive amount of churn in applications and the graphics stack without adequate GPU testing. It's great that we're now at OpenGL 4.3 levels of support in Mesa for some Intel GPUs, but when Plasma just doesn't render correctly, it's clear QA failed.

Another problem with i915 and the Intel GPU stack: DRI3 was supposed to solve all tearing issues, and together with glamor one was supposed to use just the generic modesetting driver instead of xf86-video-intel. The reality is that DRI2 with TearFree and AccelMethod SNA is the only reliable tear-free mode. With DRI3, any time you start video playback with mpv you can see a small rectangle being resized to the final window size. This doesn't happen with DRI2. Some distros started to use generic modesetting by default, but I'd wager they didn't test for tearing and other functionality regressions, or are used to it and don't care. Since Wayland uses DRI3 by default, I've actually been able to observe tearing in the Sway Wayland compositor, though very seldom. I wonder how the situation is with AMD and nVidia GPUs with open and closed driver stacks.

It seems that if you run GNOME3 with GTK3 under Wayland and only GTK3 apps with GDK_BACKEND=wayland and no X apps, then it works well, but that's like forcing everyone to use just Android apps under ChromeOS. With libweston and libweston-desktop and further fixes in Xwayland, maybe in 2018 we will finally have what Wayland promised very long ago.
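
For anyone who wants to try the DRI2 setup mentioned above, a minimal sketch of the corresponding Xorg configuration follows. It assumes xf86-video-intel is installed; the file name and the Identifier string are arbitrary choices for illustration, not something taken from this thread, and the result may differ per GPU generation.

    # /etc/X11/xorg.conf.d/20-intel.conf  (file name is an assumption)
    Section "Device"
        Identifier "Intel Graphics"       # arbitrary label
        Driver     "intel"                # xf86-video-intel instead of generic modesetting
        Option     "AccelMethod" "sna"    # SNA acceleration
        Option     "TearFree"    "true"   # driver-side tear-free output
        Option     "DRI"         "2"      # force DRI2 instead of DRI3
    EndSection

After restarting X, the Xorg log should show which driver and DRI level were actually picked up.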
I wouldn't blame outsiders if they looked at the Linux desktop and thought that there are too many variants and too much change with little stabilization going on. Then there's outstandingly stable software like GNU Emacs, FVWM, xterm or XMonad. Your config from a decade or two ago still works, with minimal to no disruption from deprecations. So when it comes to open source video driver stacks, the best strategy is running one of the last two generations of GPUs (Broadwell and Skylake) and always staying in that range, since older GPUs lose QA coverage as new GPUs come out. If the capabilities of a GPU are clear and you cannot expect newer OpenGL support from a newer Mesa, then it would make sense to have a stable but old i915 stack that doesn't change for old GPUs, and a new i915 stack for newer GPUs. But Linux is a monolithic design without driver ABIs, for good reasons, and that design shows its disadvantages when QA is insufficient.