CrowdStrike began rolling out a software update just after 9 p.m. PDT on Thursday, July 18, 2024. Nobody knew it yet, but it would be the start of one of the worst IT outages to date. Just six hours after it all started, while most people were still sleeping, our team was leaping into action to begin a day like no other.
We were fortunate to get some early intelligence from an unexpected source.
An engineer on the ISOutsource team is married to someone who works overnight at a hospital that shut down entirely because of the CrowdStrike outage. She came home at 3 a.m. and woke her husband to tell him about the incident. Rather than going back to sleep, he started investigating immediately, driven by curiosity and by a sense of duty to respond whenever an IT incident may affect his clients. At 6 a.m., another team member heard from family in New Zealand, where it was already the next day, that systems across the country were down and that this was a global CrowdStrike outage.
The engineer’s early response proved instrumental in helping us get ahead of this problem. His commitment was commendable, but as the day would reveal, it was not at all unusual: the entire ISOutsource team stepped into action.
Thinking Like a First Responder
By 6 a.m., our entire incident response and client experience teams were together on a call, collecting information, formulating plans, and delegating responsibilities.
Dozens of our clients, across various industries, use CrowdStrike. We determined which servers and devices were offline by correlating usage data and analyzing data sources. While some of those clients faced much larger outages than others, having IT go suddenly offline was causing problems for every one of them, and those problems were our responsibility to solve.
Just hours after the outage started, we understood that resolving it would require a significant effort. We also knew this would be an “all-hands-on-deck” scenario requiring the whole ISOutsource team to work as one. That meant everyone on our technical, consulting, and customer support teams, as well as most of our managers, salespeople, and even upper leadership. We committed to helping all those affected recover as fast as possible, drawing on all our resources in ways we never had before.
Before many of our clients even realized they had a problem, the consultants and engineers assigned to each of those accounts were reaching out with a personalized restoration plan and a way for them to track progress. Then it was time to hit the road and execute the plan.
Shortly after sunrise, our staff was behind the wheel, headed in different directions to client locations across the Pacific Northwest and Southwest to confront the dreaded “blue screen of death” in person. Technical experts reached most sites early in the workday, while the outage was still in its infancy, so they could facilitate the fix before IT issues caused any additional downtime. For our clients further away in locations like California, we arranged with some remote partners to provide on-site support so they could get back online ASAP.
Keeping Business Disruption to a Minimum
Bugs in software updates are nothing new. What stands out about this incident is how quickly it spread and how many systems it affected. It caused massive business disruption to companies like Delta and sent critical industries such as healthcare and transportation into chaos. For ISOutsource clients, by contrast, the effects were minimal thanks to our quick, coordinated, and fully committed response.
By the end of the first day, we had 96% of servers back online, along with the bulk of our clients’ endpoints; the remaining endpoints took longer because end users weren’t always available. Most clients were back to normal within 24 hours of the CrowdStrike outage starting. Many recovered even sooner with no negative impacts whatsoever. We drafted volunteers to provide extra support over the weekend after the outage, but they were largely unnecessary since most of our clients had already recovered. It may have been a history-making incident, but it felt like a brief bump in the road for our clients.
Several factors contributed to this success. First, our experience with incident response, together with the business continuity plans, practices, and preparations we’ve developed, allowed us to act fast, even when that involved deputizing extra support to help. Second, the competence and commitment of our staff were on full display as every member of ISOutsource took an extremely stressful situation in stride. Lastly, our team exemplified what we mean when we say “ISO Cares”: setting everything aside to make our clients’ needs their top priority.
Every service provider promises to go above and beyond. During this outage, ISOutsource proved we can and will do whatever it takes.
Final Takeaway
Always keep your business continuity plan accurate and up to date, especially the details of whom to contact and how. One of our biggest obstacles during the CrowdStrike outage was reaching clients as quickly as possible, an obstacle that proper planning could have avoided. Take this moment to review (or create) your Backup and Security Plan. Connect with our tech advisors today!