Microsoft Windows Outage 2024

Microsoft Windows Outage: CrowdStrike Falcon Sensor Update

 

Like millions of others, I tried to go on vacation, only to have two flights delayed because of IT issues. As an engineer who enjoys problem-solving, and as CEO of the company, nothing amps me up more than a worldwide IT issue. What frustrates me the most is the lack of clear information.

According to the announcements on its website and on social media, CrowdStrike issued a defective update, and that update caused the Microsoft Windows outage. Computers that downloaded the update go into a boot loop: attempt to boot, error, attempt repair, restore system files, boot, repeat.

The update affects only Windows systems; Linux and Mac systems are unaffected.

The impact is widespread, and the focus is on Windows servers being down, because Microsoft outsourced part of its security to CrowdStrike, allowing CrowdStrike to patch the Windows operating system directly.

 

Microsoft and CrowdStrike Responses

Microsoft reported continuous improvements and ongoing mitigation actions, directing users to its admin center and status page for more details. Meanwhile, CrowdStrike acknowledged that recent crashes on Windows systems were linked to issues with the Falcon sensor.

The company stated that symptoms included Microsoft servers going down and hosts experiencing a blue screen error related to the Falcon Sensor, and it assured customers that its engineering teams were actively working on a resolution to this IT outage.

There is a deeper problem here, one that will impact us worldwide until we address it. The technology world is becoming too intertwined, with too little testing or accountability, leading to a decrease in durability and stability and an increase in outages.

 

Global Impact on Microsoft Windows Users

Windows users worldwide, including those in the US, Europe, and India, experienced the Windows server outage, which rendered their systems unusable. Users reported their PCs randomly restarting and showing the blue screen error, interrupting their workday. Social media posts showed screens stuck on the recovery page with messages indicating Windows didn’t load correctly and offering options to restart the PC.

 

If Microsoft had not outsourced certain modules to CrowdStrike, then this Windows server outage wouldn’t have occurred. Too many vendors build their products by assembling a hodgepodge of tools, leading to outages when one tool fails.

The global IT outage caused by CrowdStrike’s Falcon Sensor has highlighted the vulnerability of interconnected systems, especially during Windows server downtime.

I see it in the MSP industry all the time; most (if not all) of our competitors use outsourced support tools, outsourced ticket systems, outsourced hosting, outsourced technology stack, and even outsourced staff. If everything is outsourced, then how do you maintain quality?

We are very different, which is why component outages like the one occurring today do not impact us. The tools we use all run on servers we built; those servers run in clusters we own, which run in dedicated data centers we control. We plan for failures to occur, which to clients translates into unbelievable uptime, and that translates into unbelievable Net Promoter Scores.

The Net Promoter Score is an industry client “happiness” score; for the MSP industry, the average score is 32-38, but at Protected Harbor, our score is over 90.

Because we own our own stack, because all our staff are employees with no outsourcing, and because 85%+ of our staff are engineers, we can deliver amazing support and uptime, which translates into customer happiness.

If you are not a customer of ours and your systems are affected by this Windows server outage in the US, wait. The Microsoft downtime will likely resolve soon once an update is issued; however, a manual update process might be required. If your local systems are not impacted yet, turn them off right now and wait a couple of hours for updates on the outage. For our clients, go to work; everything is functioning perfectly. If your local systems or home system are impacted, contact support, and we will get you running.

 

What went wrong and why?

On July 19, 2024, CrowdStrike experienced a significant incident due to a problematic Rapid Response Content update, which led to a Windows crash widely recognized as the Windows Blue Screen of Death (BSOD). The issue originated from an IPC Template Instance that passed the Content Validator despite containing faulty content data. This bug triggered an out-of-bounds memory read, causing Windows operating systems to crash. The problematic update was part of Channel File 291, and while previous instances performed as expected, this particular update resulted in widespread disruptions.
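To make the failure mode concrete, here is a minimal sketch of the kind of bounds check that stops an out-of-bounds read. It is an illustration only, not CrowdStrike's code: the TemplateInstance class, the validate and read_fields functions, and the idea that a content entry lists the input fields it reads are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class TemplateInstance:
    """Hypothetical content entry: the indexes of the input fields it reads."""
    name: str
    field_indexes: list[int]


def validate(instance: TemplateInstance, input_count: int) -> list[str]:
    """Return a list of problems; an empty list means the instance passed."""
    problems = []
    for index in instance.field_indexes:
        # Every referenced field must exist in the inputs actually supplied.
        if not 0 <= index < input_count:
            problems.append(
                f"{instance.name}: field {index} is outside "
                f"the {input_count} inputs supplied"
            )
    return problems


def read_fields(instance: TemplateInstance, inputs: list[str]) -> list[str]:
    """Consumer side: re-check at read time instead of trusting the content."""
    problems = validate(instance, len(inputs))
    if problems:
        raise ValueError("; ".join(problems))
    return [inputs[i] for i in instance.field_indexes]
```

The point of checking twice, once when the content is validated and again when it is read, is that a gap in the validator alone can no longer take the system down. The consumer rejects the bad content instead of reading memory it does not own.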

The incident highlighted the need for enhanced testing and deployment strategies to prevent such occurrences. CrowdStrike plans to implement staggered deployment strategies, improved monitoring, and additional validation checks to ensure content integrity. They also aim to provide customers with greater control over content updates and detailed release notes. This incident underscores the critical need for robust content validation processes to prevent similar issues from causing outages, such as the one experienced with Microsoft.
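A staggered deployment can be sketched in a few lines. This is a simplified illustration under my own assumptions, not CrowdStrike's actual process: the staged_rollout function, the ring layout, and the deploy and is_healthy callbacks are hypothetical names chosen for the example.

```python
from typing import Callable


def staged_rollout(
    rings: list[list[str]],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> list[str]:
    """Deploy ring by ring and stop if too many hosts in a ring are unhealthy.

    Returns the hosts that received the update.
    """
    updated: list[str] = []
    for ring in rings:
        for host in ring:
            deploy(host)
            updated.append(host)
        # Check the ring's health before moving on to the next, larger ring.
        failures = sum(1 for host in ring if not is_healthy(host))
        if ring and failures / len(ring) > max_failure_rate:
            break  # halt: later rings never receive the update
    return updated
```

With rings such as one canary host, then a small early group, then the full fleet, a defective update that breaks a host in the early group stops there, and the full fleet is never touched. That is the difference between a handful of broken machines and a worldwide outage.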