Using Observability to Prioritize CrowdStrike Remediation with Josh Wood

Explicit

05/08/2024


When thousands of systems show a blue screen, which ones do you fix first to bring your most critical systems back up quickly? To answer that, you need to know which systems are impacted, which mission-critical applications run on them, and which dependent systems are also affected by something like the recent CrowdStrike incident.
We invited Josh Wood, Principal Solutions Engineer at Dynatrace, who was one of the first responders helping organizations use observability data to identify which systems to fix first, so that critical applications such as ATMs, self-service terminals, and point-of-sale (POS) systems could be brought back up quickly.
In this special episode, Josh walks us through the technical details of the CrowdStrike BSOD (Blue Screen of Death): what caused it, how to use observability to get a prioritized list of systems to fix first, and what organizations can do to prevent issues caused by faulty software updates in the future.
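
To make the prioritization idea concrete, here is a minimal Python sketch. It is not the method Josh describes in the episode and does not use any Dynatrace API; the host names, application names, criticality weights, and scoring rule are all hypothetical. It only illustrates how the three inputs mentioned above (impacted hosts, the applications running on them, and dependent systems) could be combined into a fix-first list.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    impacted: bool                                        # e.g. host shows a BSOD or stopped reporting
    apps: list[str] = field(default_factory=list)         # applications running on this host
    dependents: list[str] = field(default_factory=list)   # systems that rely on this host

# Hypothetical criticality weights per application (higher = more critical).
APP_CRITICALITY = {"atm-gateway": 10, "pos-backend": 8, "intranet-wiki": 1}

def prioritize(hosts: list[Host]) -> list[str]:
    """Return the names of impacted hosts, most urgent first."""
    def score(h: Host) -> int:
        # Assumed scoring rule: sum of app weights plus one point per dependent system.
        app_score = sum(APP_CRITICALITY.get(a, 0) for a in h.apps)
        return app_score + len(h.dependents)

    impacted = [h for h in hosts if h.impacted]
    return [h.name for h in sorted(impacted, key=score, reverse=True)]

# Made-up inventory for illustration only.
hosts = [
    Host("win-001", True, ["intranet-wiki"]),
    Host("win-002", True, ["atm-gateway"], ["atm-17", "atm-18"]),
    Host("win-003", False, ["pos-backend"]),
    Host("win-004", True, ["pos-backend"], ["pos-3"]),
]

print(prioritize(hosts))  # ['win-002', 'win-004', 'win-001']
```

In practice the inventory, application mapping, and dependency data would come from whatever observability platform is in use rather than from a hard-coded list, and the weighting would reflect the organization's own view of what is mission critical.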

Here are the links we discussed in the episode:
Josh on LinkedIn: https://www.linkedin.com/in/joshuadwood/
Josh's blog on CrowdStrike BSOD: https://www.dynatrace.com/news/blog/crowdstrike-bsod-quickly-find-machines-impacted-by-the-crowdstrike-issue/
CrowdStrike Incident Takeaway Blog: https://www.dynatrace.com/news/blog/crowdstrike-incident-revisiting-vendor-quality-control/
