Hello,
A lot of the Windows Error Reporting fault buckets that get filled at Microsoft are from third-party code. I think at one point during Windows Vista's heyday Microsoft reported that the largest percentage of all bug checks (aka BSoDs) came from video card device drivers¹. When systemic problems like that occur, Microsoft usually does a good job of identifying the affected industry and working with them to resolve it. You don't typically hear about that, though; oftentimes the companies take credit for making faster and more stable software without mentioning that Microsoft basically came to them and forced them to use their assessment and code profiling tools to identify system impact and work with them to remediate it.
But these days, Microsoft's rapid release cycle means an increasing number of compatibility errors are showing up in Windows itself. In the
June 12, 2018 kernel update, Microsoft fixed a bug in their operating system which affected a popular accounting package--note that the software company could not fix this because it was a flaw in the operating system itself. There were also a couple of issues with games (and Windows is the largest gaming platform out there by far) and with screen brightness controls on laptops (which are a more popular form-factor by far than desktops right now). Third-party vendors are still to blame for some things, though: The KB article for the update also notes people ending up with black screens due to certain PC Tune-Up programs--which are largely superfluous these days, as Windows does a much better job of dynamically tuning itself now than it did in the XP days. Going back a month to the
May 23, 2018 kernel update shows that Microsoft had to fix incompatibilities with SSDs, from large tier-one SSD manufacturers that are generally well-regarded for shipping stable products². Some of the fixes were for preserving settings and regression errors--which are the kinds of things you really should be catching in your internal quality control process.
I have to say that Microsoft does try quite hard, though. They have various engineering access programs where developers, manufacturers and large customers sometimes get multiple builds a week, often with debug symbols. But sometimes even that's not enough. For example, one Windows 10 version had an issue that impacted some of my employer's customers. And that issue did not show up in any of the pre-release builds--it occurred due to changes between the final release candidate and the RTM³ version. And no matter how good your coding is and your testing is, you really can't anticipate it when an operating system vendor makes changes after everything is supposed to be locked down.
When Windows 10 originally shipped, Microsoft had a concept of phased distributions for various releases like Current Branch, Current Branch for Business, Long-Term Servicing Branch, Insider's Branch and so forth, with the idea that some of this could be ameliorated by giving consumers the latest build and only releasing it to enterprise customers after consumers had tested it. But that model has been tweaked a little, with the idea of various channels, and businesses running various percentages of their users on various channels, including the Insider's builds, which are a kind of public beta. Maybe Microsoft needs to refactor things again and have a concept of a "consumer stable" channel: The user doesn't stay on a previous version, they get the current version, but it only gets kernel updates when they have passed through some kind of quality gating mechanism, e.g., no sev-1, pri-1 defects. I think that would be a good way for Microsoft to regain a lot of its customers' trust, and it would also incentivize third-party developers with a well-tested stable platform to run on.
Regards,
Aryeh Goretsky
¹I'm looking for a reference to this, but cannot seem to find it. Does anyone recall the details?
²I'm not going to get into speculative execution errors--that was kind of a black swan that an entire industry missed; let's not forget it affected AMD, Arm, IBM and Qualcomm as well.
³Or RTW, since it seems most software is released to the web and not manufacturing these days.