The Mega Outage

From what I’ve seen, people smart enough to get to that level have at least three scapegoats all lined up and fully fattened for the slaughter.
Not sure if "smart" is the word I'd go with. There are plenty of very, very smart people who do not possess the sociopathic tendencies it takes to line up three lambs for said slaughter without a second thought. I have turned down several opportunities in life to make good, legal money by taking advantage of people, such as quitting New York Life after getting licensed because of moral objections, and I have even gotten in trouble in management jobs for refusing to write up employees to cover up a mistake by leadership. And that is only a fraction of a fraction of what these guys and gals at the big desk are making. Not that I don't admire plenty of successful CEO types, but... for every one of them, there is a place like Boeing or CrowdStrike lol.
 
Well, I was supposed to fly out to SFO on United today for a big summer get-together with friends. Been looking forward to it for months.

My 6 a.m. flight was cancelled at 00:32, so I didn't see it until I woke up at 03:30, and by then any chance of rebooking on another airline was long gone.

• you very much, United and CrowdStrike.
 
It was caused by a third-party cybersecurity firm (CrowdStrike) that forgot to properly test its update before pushing it out to Windows machines. That is one expensive screw-up. LAS is also under a ground stop, but it seems SWA might be the only airline still moving, since they don’t use that software. Lesson learned.
I’ve been out of the ‘business’ for a while, but reading about what that ‘endpoint protection’ does has me wishing the U.S. had any (any!) sort of privacy laws for work computers.
 
Many years ago, when I worked for a biotech company as an intern and shortly after they rolled out information systems change control, I was poking around our Systems Management Server when I saw that Windows security patch compliance was 100 out of ~10,000 machines at headquarters for almost five months’ worth of patches (much less at global locations). It was a blink, rub your eyes, and read it again sort of moment, then a “well, I guess I’d better go tell the principal architect” moment, followed by an “I got to explain that to a Director after the principal architect made me repeat myself three times” moment. When 1) change control required that every business owner who touches a given product sign off on changes, and 2) nearly every application in that environment ‘touched’ the desktop, you basically had to have every IS business owner sign off on those patches, and apparently nobody had time for that.

Needless to say, 1) the patches got applied by a somewhat chastened desktop group that afternoon, and 2) that change control process got a little ‘tweaking’ to allow routine updates from Microsoft to be applied without signatures from every ‘business owner’ in the enterprise. I can’t remember exactly, but I think they settled on small pilot-group rollouts on Patch Tuesday, with the broad rollout going out that Friday if the desktops made it through the rest of the week with no evident problems.

This strikes me, from a governance perspective (and granted I’ve been out of the world on this a bit), as the complete opposite of change control. As in, there wasn’t any.
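The pilot-then-broad schedule recalled above can be sketched as a date calculation plus a go/no-go gate. This is a minimal illustration, not that company's actual process or tooling: `PilotReport`, `approve_broad_rollout`, and the zero-incident default are hypothetical. The one fixed fact it relies on is that Patch Tuesday falls on the second Tuesday of the month.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PilotReport:
    """Hypothetical summary of the pilot group's week on the new patches."""
    machines_patched: int
    incidents: int  # problems the pilot group reported after Patch Tuesday


def patch_tuesday(year: int, month: int) -> date:
    """Microsoft's Patch Tuesday: the second Tuesday of the month."""
    first = date(year, month, 1)
    # date.weekday(): Monday == 0, Tuesday == 1
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


def broad_rollout_date(year: int, month: int) -> date:
    """The Friday of Patch Tuesday week, when the enterprise-wide push goes out."""
    return patch_tuesday(year, month) + timedelta(days=3)


def approve_broad_rollout(report: PilotReport, max_incidents: int = 0) -> bool:
    """Gate (hypothetical threshold): go wide only if the pilot ran and stayed clean."""
    return report.machines_patched > 0 and report.incidents <= max_incidents


if __name__ == "__main__":
    report = PilotReport(machines_patched=50, incidents=0)
    print(patch_tuesday(2024, 7), broad_rollout_date(2024, 7), approve_broad_rollout(report))
```

Running it prints `2024-07-09 2024-07-12 True`. The design point is that the wide push is conditional on the pilot result, which is exactly the kind of check the post above says was missing.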
 
I can feel the "long pause, a sigh" and visualize him squeezing the sh— out of his stress ball. ;)
Mine says "worry about everything, panic about nothing," and many hours were spent squeezing it or bouncing it off my walls from 2020-2021.

It's great.
 
 
Principal Architect? Whatever. The guy below the Director.
 
The fixes discussed seem to be either:
  • Restore machines from backups, or
  • Keep rebooting and hope the network can snag an update before the machine crashes (wired connections have had more success than wireless)
This ain’t getting fixed fast.
I omitted a third option that seems to be popular for cloud-based servers, where there is no opportunity to boot from a thumb drive or equivalent intercept:

Detach the drive, attach it to a healthy machine, remove the offending .sys file (a CrowdStrike channel file, which despite the extension is not actually a kernel driver), detach the drive, and reattach it to the original machine.

Hundreds/thousands of servers.
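The file-removal step above can be sketched in Python, run on the healthy machine once the broken server's disk is attached and mounted. A minimal sketch under stated assumptions: the folder `Windows/System32/drivers/CrowdStrike` and the pattern `C-00000291*.sys` come from CrowdStrike's public workaround guidance; `remove_bad_channel_files` and the mount point are hypothetical. It defaults to a dry run so you can see what would be deleted first.

```python
from pathlib import Path

# Assumption: file pattern named in CrowdStrike's public workaround for the bad channel file.
DEFAULT_PATTERN = "C-00000291*.sys"


def remove_bad_channel_files(volume_root: str, pattern: str = DEFAULT_PATTERN,
                             dry_run: bool = True) -> list[Path]:
    """Find (and, unless dry_run, delete) matching files on an offline Windows volume.

    volume_root is wherever the detached system disk is mounted on the healthy
    machine, e.g. "E:/" on Windows or "/mnt/broken" on Linux (hypothetical paths).
    """
    folder = Path(volume_root) / "Windows" / "System32" / "drivers" / "CrowdStrike"
    if not folder.is_dir():
        return []  # wrong mount point, or the sensor isn't installed on this disk
    # Note: on a Linux helper machine this match is case-sensitive.
    matches = sorted(folder.glob(pattern))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    hits = remove_bad_channel_files("/mnt/broken")  # hypothetical mount point
    print(f"{len(hits)} file(s) would be removed: {hits}")
```

Multiply the attach/run/detach cycle by every affected server and it is clear why this is slow, even if each individual step is trivial.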
 
USB disabled. On which WinBox did we save that BIOS password document…?
 
Man, SJI just can’t recover from this. It’s embarrassing at this point, and I have a feeling our Q3 is shot. I don’t even know what to say anymore.
 
Thankfully I’m off for the next few weeks. I’m getting green slip calls every 10 minutes. I wanna help, really I do. But hearing all the horror stories about no hotels and nobody answering the phones, I’ll just watch from the sidelines. I feel terrible for all the crews and passengers. This is so wrong on so many levels. Should have been fixed years ago.

Friday, you can blame Microsoft or whoever. After that, this is all on the company.
 
Is that the day you fly to your Delta Air Lines interview?

Airlines*

I feel that the next computer crash that affects Delta is going to be because it didn’t recognize Air Lines as two words and got stuck in some fatal error loop. ;)

And no, it’s the day I go back DTW-LAX. I do that commute in the summer when I see family. That is, if Delta can figure out how to take an A321 with 190+1. Last time, the gate agent almost refused to list me because of “payload optimization.” Staff Traveler showed 22 open seats and 14 nonrevs, so I figured that with 8 seats to spare, it wouldn’t be an issue.

In the end, I did get on. But this BS is getting old.
 

Yep. I was only at United for about 4 months before moving to SJI. Their IT is light years ahead of Delta.
 