I blame MS for the CrowdStrike incident
Yesterday we witnessed the largest tech failure in history: millions of PCs running Windows failed to boot because of a software glitch, disrupting airlines, banks, and media. CrowdStrike, a cybersecurity company, has admitted that a defect in its software update caused the incident, but I claim that Microsoft is also to blame.
CrowdStrike at fault
To see why, let’s first understand what the issue is. Normally, third-party software should not be able to crash your system. If it does, the OS is not doing its job. By third-party software, I mean software that runs in user space. If a user-space program hits an error, the worst thing that should ever happen is for the OS to force-kill the application. That’s it. That is what an OS is designed to do, and that is the responsibility of the OS.
Unfortunately, CrowdStrike is a little different. It is cybersecurity software, so it differs from typical applications. It runs at the kernel level of the OS because it needs to monitor the system's network activity. Hence, CrowdStrike runs with a higher level of privilege and has far more access to the system than other software, and that privilege should come with greater responsibility. What this means is that it can mess up the entire system if it wants to. In fact, it just did on Friday, July 19, 2024, though probably not intentionally. For more technical details, refer to this article.
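To make the user-space guarantee concrete, here is a minimal sketch in Python. It launches a deliberately faulty child process and shows that the parent, like the rest of the system, keeps running after the child dies. The `FAULTY_PROGRAM` snippet and the `run_isolated` helper are my own illustrative names, and the exact exit status depends on the platform.

```python
import subprocess
import sys

# Hypothetical faulty program: reads memory at address 0 through ctypes.
# On most systems the OS terminates the offending process; the exact exit
# status is platform-dependent.
FAULTY_PROGRAM = "import ctypes; ctypes.string_at(0)"


def run_isolated(source: str) -> int:
    """Run `source` in a separate user-space process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    status = run_isolated(FAULTY_PROGRAM)
    # Only the child process died; this process is still running.
    print(f"child exited with status {status}; parent is still running")
```

A non-zero status means only that one process failed. Code running in kernel space has no such boundary around it, which is why a bug there can take down the whole machine.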
Why Microsoft?
So why is Microsoft to blame? Because Microsoft knows all too well how deadly CrowdStrike can be to the system, given that it runs with escalated privilege, and hence should have implemented a safety net to prevent critical system failure if something goes wrong with CrowdStrike or with any other software that runs in kernel space. For example, Microsoft could have required all third-party vendors whose software runs in kernel space (there are not that many) to submit their apps and updates to Microsoft first. Microsoft would then review them to make sure they do not cause system failures and distribute them only after they pass its internal quality control. This can be difficult for hardware drivers because Microsoft may not have the required hardware, but it should be pretty straightforward for non-hardware software such as CrowdStrike. If you think this is an unrealistic scenario, look no further than Apple’s App Store.
Apple does this for every app with its App Store review process: not just the few that need kernel privilege, but all user-space apps as well. Yes, it is frustrating for developers, but there is no free lunch. Apple is serious about users' privacy and security, so it is willing to spend that much effort (one can argue more than necessary) to manually go through every single app and update to make sure it is harmless. Of course, this is not perfect (in fact, some harmful apps have slipped past App Store review in the past), but having a safety net is always better than letting apps free-fall! Apple knows that any system failure caused by a third-party app is a potential make-or-break problem, and hence is willing to spend an astronomical amount of money on its (notoriously) rigorous review process.
Microsoft knows that anything that runs in kernel space can mess up the system. Granting that access is like sharing a kill switch, and MS has done so willingly without fully understanding what the vendor would or could do with this god-like privilege. We just witnessed the fatal consequence. What if, hypothetically, someone at CrowdStrike wanted to cripple the system intentionally? Imagine an engineer at CrowdStrike who had been fired from Microsoft and wanted revenge by crashing every Windows system that runs CrowdStrike.
Trust may be the bedrock of all business partnerships, but trust without due diligence from both parties is bound to break. We have just witnessed the unfortunate byproduct of an arrangement that has proven to be outright wrong, naive, and dumb. After all, only Microsoft's customers had to suffer through the apocalypse.
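The submit, review, then distribute flow proposed above for kernel-space software can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of any process Microsoft or Apple actually runs: `KernelUpdate`, `review`, `submit`, and the stand-in check `boots_on_test_machine` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class KernelUpdate:
    """A hypothetical kernel-space update submitted by a third-party vendor."""
    vendor: str
    version: str
    payload: bytes


# A check inspects a submitted update and returns True if the update passes.
Check = Callable[[KernelUpdate], bool]


def review(update: KernelUpdate, checks: Iterable[Check]) -> bool:
    """Approve the update only if every check passes."""
    return all(check(update) for check in checks)


def submit(
    update: KernelUpdate,
    checks: Iterable[Check],
    distribute: Callable[[KernelUpdate], None],
) -> bool:
    """Distribute the update only after it passes review; otherwise hold it back."""
    if not review(update, checks):
        return False
    distribute(update)
    return True


if __name__ == "__main__":
    released: List[KernelUpdate] = []

    # Stand-in check (assumption): a real review might boot test machines
    # with the update installed and verify that they come up cleanly.
    def boots_on_test_machine(update: KernelUpdate) -> bool:
        return update.payload != b"crash"

    good = KernelUpdate("ExampleVendor", "1.0.1", b"ok")
    bad = KernelUpdate("ExampleVendor", "1.0.2", b"crash")

    print(submit(good, [boots_on_test_machine], released.append))
    print(submit(bad, [boots_on_test_machine], released.append))
    print([u.version for u in released])
```

The point of the design is that the vendor never calls `distribute` directly: the only path to users goes through the platform owner's review.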
References
Global IT outage causes chaos, disrupting airlines, banks, media, telecoms Cybersecurity firm CrowdStrike says outage caused by ‘defect’ in software update, rules out ‘cyber incident.’
Crowdstrike admits ‘defect’ in software update caused IT outage that is wreaking worldwide chaos Cybersecurity firm Crowdstrike has admitted that a “defect” in a software update has caused the IT outage currently…
Recovering from the global tech outage could be a long, arduous process | CNN Business The company that caused a massive computer outage across the world says a flawed update has been rolled back - but that…
Technical Details on July 19, 2024 Outage | CrowdStrike Learn more about the July 19, 2024 CrowdStrike outage and the technical details related to it.