Things look quiet here. But I've been doing a lot of blogging at
dan.langille.org because I prefer WordPress now.
Not all my posts there are FreeBSD related.
I am in the midst of migrating The FreeBSD Diary over to WordPress
(and you can read about that here).
Once the migration is complete, I'll move the FreeBSD posts into the
new FreeBSD Diary website.
The RAID array rebuild took about 13 hours all up. I'm glad the machine stayed online during
that time. The machine in question runs the
FreshPorts BETA site and is my main
development server.
What caused the problem?
I don't know what caused the problem. I know some of the symptoms.
The box did not respond to pings.
Telnet to port 22 returned a standard SSH banner.
Attempts to ssh in failed; no login prompt ever appeared.
The console was sluggish.
Pressing ALT-F3 to switch to another vtty appeared to do nothing. When I returned
to the console some minutes later, I found I was now on the other vtty (also sluggish).
Attempts to log in on that vtty got no response, although it may just have been sluggish.
There were no entries in the logs.
After 20 or 30 minutes of trying to get the system going, I gave up. Of course,
a reboot would degrade the RAID array, and I wanted to avoid that, but I saw no other option.
I rebooted the box.
Someone suggested that one drive may have been experiencing an error. A hard drive that hits
an error tries to recover on its own, and that can take a long time. The RAID card sees the
drive is busy and simply waits, so no I/O occurs in the meantime. Western Digital has drives
designed for RAID that feature TLER (Time Limited Error Recovery), which limits how long the
drive spends on recovery. Such features have been available on SCSI drives for quite some time.
For what it's worth, the drives I'm planning to buy for the Dual Opteron server will have TLER.
Ideas? Suggestions? Comments? Please use the comments link to the right.