The recent "CrowdStrike Windows Outage" highlighted the need for robust incident response strategies. Here's a closer look at the event and the practical steps you can take to prepare for future disruptions.
What Happened?
On July 19, 2024, a major tech outage disrupted organisations worldwide. The issue originated from a faulty CrowdStrike update that caused widespread "blue screen of death" errors on Windows workstations, disrupting airlines, banks, hospitals, and other critical services.
Some coverage of the outage has also cited service interruptions reported by AT&T, Verizon, and T-Mobile customers. Those reports appear to relate to a separate AT&T network outage earlier in 2024 rather than to the CrowdStrike update: T-Mobile and Verizon clarified that AT&T's disruption did not directly affect their networks, although their customers had trouble connecting to AT&T's network. Speculation that hardware faults or recent solar flares played a part in that outage remains unconfirmed.
For real-time updates and a more detailed view of an outage's scope and impact, refer to resources such as ThousandEyes' Internet Outages Map.
Immediate Lessons Learned
Regular Backups: Ensure data is backed up regularly and stored securely off-site. Regularly test these backups for quick recovery.
Develop an Incident Response Plan: Create or update your incident response plan with clear steps for managing outages.
Establish Communication Channels: Set up effective communication protocols for internal and external stakeholders during an outage.
Manage Vendor Risk: Confirm that your third-party vendors have robust incident response plans of their own, and maintain open lines of communication with them.
Implement Redundancy Systems: Put redundancy and failover systems in place to maintain business operations during disruptions.
Conduct Training and Simulations: Regularly train your staff and conduct simulations to prepare them for potential outages.
Review Security Protocols: Frequently review and update your security measures to protect against emerging threats.
Perform Post-Incident Analysis: After an incident, conduct a thorough analysis to improve your response for future events.
Engage with Stakeholders: Maintain strong relationships and communicate effectively with stakeholders to address their concerns during disruptions.
Ensure Legal and Compliance Alignment: Verify that your response plans meet all legal and compliance requirements.
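The backup advice in the list above ("regularly test these backups") can be made concrete with a small restore check. The sketch below is one possible approach, not a prescribed method: it assumes backups are plain files on disk, records a SHA-256 checksum for each file at backup time, and compares a test restore against that record. The function names (write_manifest, verify_restore) and the manifest layout are hypothetical and chosen for illustration only.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(source_dir: Path, manifest_path: Path) -> None:
    """Record a checksum for every file under source_dir at backup time.

    Assumption: manifest_path lives outside source_dir.
    """
    manifest = {
        p.relative_to(source_dir).as_posix(): sha256_of(p)
        for p in sorted(source_dir.rglob("*"))
        if p.is_file()
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))


def verify_restore(restored_dir: Path, manifest_path: Path) -> list[str]:
    """Return the problems found in a test restore (an empty list means clean)."""
    manifest = json.loads(manifest_path.read_text())
    problems = []
    for relative_name, expected in manifest.items():
        candidate = restored_dir / relative_name
        if not candidate.is_file():
            problems.append(f"missing: {relative_name}")
        elif sha256_of(candidate) != expected:
            problems.append(f"checksum mismatch: {relative_name}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        source = root / "source"
        source.mkdir()
        (source / "ledger.csv").write_text("id,amount\n1,100\n")
        write_manifest(source, root / "manifest.json")
        # copytree stands in for a real restore from off-site storage.
        shutil.copytree(source, root / "restored")
        print(verify_restore(root / "restored", root / "manifest.json"))
```

Running a check like this on a schedule, and treating any non-empty result as an incident in its own right, is one way to find out that a backup is unusable before you need it.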
As the CrowdStrike Windows Outage demonstrates, incident response and resilience are essential in today's interconnected digital landscape. Recent regulatory updates echo this urgency, such as the Bank of England's PS16/24 policy on Critical Third Parties (CTPs). PS16/24 sets new operational resilience standards for critical service providers, emphasising scenario testing, incident notification, and robust risk management frameworks. For businesses that rely on third-party providers, aligning with such frameworks can be a key step in safeguarding operations and meeting compliance requirements.
Taking Action
Audit Your Current Plans: Review and assess your current incident response and continuity plans to identify any gaps or weaknesses.
Implement Immediate Changes: Address any identified issues promptly to strengthen your defences.
Educate and Train Your Team: Ensure your team is well-prepared and knowledgeable about incident response protocols.
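The audit step above can start as something very simple: a list of the controls you expect to have, compared against when each was last reviewed. The following is a minimal sketch under stated assumptions. The control names mirror the ten lessons listed earlier, while the ControlRecord type, the find_gaps function, and the 365-day review threshold are hypothetical choices for illustration, not a standard.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: one entry per lesson in the "Immediate Lessons Learned" list.
REQUIRED_CONTROLS = [
    "regular_backups",
    "incident_response_plan",
    "communication_channels",
    "vendor_management",
    "redundancy",
    "training_and_simulations",
    "security_review",
    "post_incident_analysis",
    "stakeholder_engagement",
    "legal_compliance",
]


@dataclass
class ControlRecord:
    name: str
    last_reviewed: date


def find_gaps(records: list[ControlRecord], today: date,
              max_age_days: int = 365) -> list[str]:
    """List controls that have no record or whose last review is too old."""
    reviewed = {record.name: record.last_reviewed for record in records}
    gaps = []
    for control in REQUIRED_CONTROLS:
        if control not in reviewed:
            gaps.append(f"{control}: no record")
        elif (today - reviewed[control]).days > max_age_days:
            gaps.append(f"{control}: review overdue")
    return gaps


if __name__ == "__main__":
    records = [
        ControlRecord("regular_backups", date(2024, 6, 1)),
        ControlRecord("incident_response_plan", date(2022, 1, 15)),
    ]
    for gap in find_gaps(records, today=date(2024, 7, 19)):
        print(gap)
```

The output of such a script is only a starting point for the "Implement Immediate Changes" step. Deciding which gaps matter most still calls for human judgement.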
Conclusion
The CrowdStrike Windows Outage serves as a critical reminder of the importance of robust incident response plans. By taking proactive steps now, you can safeguard your business against future challenges and ensure continuity in the face of unexpected disruptions. Stay prepared and resilient.