*10 minutes of hold music*
"Apple Support, how may I help you?"
"I've got an Xserve that becomes inaccessible for 30 minutes every night."
"Let me forward you to enterprise support."
*sigh, 15 minutes of hold*
"How may I help you?"
"My Xserve turns into a pumpkin every night at 11:30."
"We cannot open a ticket until you run the troubleshooter. Please reboot and run the troubleshooter CD. If you lost it, it's $15 to replace it."
Ugh. Let me get this straight. I've got a production server that you want me to take offline entirely for the duration of this "thorough" testing. You also won't allow customers to download a new copy. Thanks for treating your sysadmin customers like you treat your grandmothers.
I run your damn system updater, because it's either update or remain vulnerable to your latest pile of bugs. I reboot, and the NICs start resetting themselves if they don't receive data. Literally. If there was no traffic for roughly 80 seconds, the system would down the NICs, unload the drivers, reset the hardware, and reload the drivers. The NOC was kind enough to set one of their monitoring boxes to ping us every 30 seconds.
How do you people test your software? I imagine a brushed-aluminum room with a floor made of keyboards, each one plugged into a different test box somewhere. Someone is tasked with tossing a box full of cats (all wearing turtlenecks) into this room. If none of the systems catch fire within 30 minutes, testing is complete. Someone else must remove the cats. All have iPods.
I give up. I'm going into the data center tonight, ripping out the Xserve and replacing it with a toaster running Linux. At least when that one messes up, it's either my fault or something I can fix.
Edit: New Replacement Hardware