Things look quiet here. But I've been doing a lot of blogging at
dan.langille.org because I prefer WordPress now.
Not all my posts there are FreeBSD related.
I am in the midst of migrating The FreeBSD Diary over to WordPress
(and you can read about that here).
Once the migration is completed, I'll move the FreeBSD posts into the
new FreeBSD Diary website.
mknod - create the device, then mount
5 January 2004
My primary mail server went down on 1 January. While analyzing the
problem, I learned about a new tool: mknod. This article documents how
I used that tool, a live filesystem CD, and a floppy disk to look at the
disk of the dead box.
Happy New Year!
I first noticed a problem on New Year's Day. I couldn't ssh into the box, nor was it accepting
email. Attempts to connect were met with:
$ ssh m20
Password:
Last login: Thu Jan 1 11:38:35 2004 from betty
Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
The Regents of the University of California. All rights reserved.
-bash: /etc/profile: Device not configured
Connection to m20.example.org closed.
$
SMTP was also sick:
$ telnet m20 25
Trying 10.0.0.1...
Connected to m20.example.org.
Escape character is '^]'.
220 m20.example.org ESMTP Postfix
helo bast.example.org
250 m20.example.org
mail from: dan@example.org
250 Ok
rcpt to:eric@example.com
250 Ok
data
354 End data with <CR><LF>.<CR><LF>
test msg via m20
.
451 Error: queue file write error
quit
221 Bye
Connection closed by foreign host.
Because it was a holiday, I couldn't get into the colocation facility. It wasn't
until January 4th that I was able to get there.
Take a camera!
As I was driving to the colocation facility, I remembered my camera. I
thought about turning around to collect it, but didn't. Bad idea. I
lost useful information because of that decision: the console showed
messages which might have been useful. Next time, I hope I remember.
What I do remember is messages telling me to "see tuning(7)".
That's it. Nothing else. If I'd had a camera, I would have taken a
picture and we'd both be able to learn something from it. What
a silly mistake.
I hit enter once, and that started a stream of messages far too rapid to
read. CONTROL-S didn't halt it, nor did SCROLL-LOCK. I tried another
virtual console. I got a login prompt. But as soon as I touched a key,
the tty died with the following message:
/: create / symlink failed, no inodes free
This happened with each virtual console I tried.
I went back to the main console to look closely at the scrolling messages.
I could read nothing. I pressed the power switch, and that stopped the messages
for a short time, before they started again. I was able to read something like this:
vm_fault_pager read error pid 1 init
So... it looked like init was having problems. This was a sick system. I rebooted
the box.
The first reboot
The first reboot did not help: the system could not find the disk drive. I went into
the BIOS setup and found that nothing was listed for the primary drive.
Auto-detection found nothing. I had no choice but to take the system home
with me.
Booting at home
At home, I wanted to examine the system before booting it up, in case I lost
anything by writing to the drive. I booted from a CD I had, but couldn't mount
any drives. I also had a 4.7-RELEASE CD set from
FreeBSD Mall. Disk 2 contains a live
filesystem, which you can boot from to get a working FreeBSD system with
very little effort. I booted from it and tried to mount my disk.
dmesg(8) showed that the disk (ad0) had been found, but I could not mount
it: /dev/ad0s1 existed, but /dev/ad0s1e did not.
/dev/MAKEDEV, which would normally create the missing device nodes, was not
present on this live filesystem.
I was talking out loud about this in an IRC channel, when Anton Berezin had this
great idea:
mkdir -p /tmp/dev
cd /tmp/dev
/sbin/mknod ad0s1e c 116 0x00020000 root:operator
I tried it, but ran into a problem: this live filesystem CD did not have
mknod(8).
Another great idea from Anton: no mknod, no device. Copy mknod to a floppy :-)
Remember: the 4.9-RELEASE live filesystem ISO image contains
mknod. I wouldn't
have needed the floppy if I'd had that ISO just sitting around, ready to
go. I now have a CD ready to go....
That gave me a floppy with mknod. On the live
filesystem machine, I mounted the floppy and copied mknod to /tmp for
future use.
Trying mknod again
Then I tried the original command again:
/tmp/mknod ad0s1e c 116 0x00020000 root:operator
Now I got an error about no such group: there was no
/etc/group file on this machine. Not to worry.
You can use the numbers instead of the names.
/tmp/mknod ad0s1e c 116 0x00020000 0:0
This translates to root:wheel. Check /etc/passwd
and /etc/group and you'll see why.
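Looking the numbers up is easy to script. Below is a minimal sketch using a hypothetical helper named numeric_owner (not a system command) that reads passwd- and group-format files with awk and prints uid:gid. The optional third and fourth arguments are an assumption of this sketch: they let you point it at copies of another system's files instead of the local /etc/passwd and /etc/group.

```sh
#!/bin/sh
# Sketch: print "uid:gid" for a user name and a group name, read from
# passwd- and group-format files (default /etc/passwd and /etc/group).
# numeric_owner is a hypothetical helper, not a FreeBSD command.
numeric_owner() {
    _user=$1
    _group=$2
    _pwfile=${3:-/etc/passwd}
    _grfile=${4:-/etc/group}
    # In both file formats the numeric id is the third colon-separated field.
    _uid=$(awk -F: -v n="$_user" '$1 == n { print $3; exit }' "$_pwfile")
    _gid=$(awk -F: -v n="$_group" '$1 == n { print $3; exit }' "$_grfile")
    # An empty result means the name (or the file) was not found.
    if [ -z "$_uid" ] || [ -z "$_gid" ]; then
        echo "numeric_owner: unknown user or group" >&2
        return 1
    fi
    echo "$_uid:$_gid"
}
```

On a working FreeBSD box, numeric_owner root wheel should print 0:0, the owner used in the mknod command above.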
This worked. I then mounted that new device:
mount -r /tmp/dev/ad0s1e /mnt
That was it. I had my drive mounted. I checked around and found nothing unusual.
I then repeated the procedure for each partition on my drive.
/tmp/mknod ad0s1a c 116 0x00020000 0:0
/tmp/mknod ad0s1f c 116 0x00020000 0:0
/tmp/mknod ad0s1g c 116 0x00020000 0:0
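A brief explanation:
c means a character device.
116 is the major number for this type of device, as found in /dev/MAKEDEV.
0x00020000 is the minor number. It is a bit field that encodes the disk unit,
slice, and partition; the lowest bits select the partition (a is 0, b is 1,
and so on). You can see that here: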
crw-r----- 2 root operator 116, 0x00020000 Aug 15 16:44 /dev/ad0s1a
crw-r----- 2 root operator 116, 0x00020001 Aug 15 16:44 /dev/ad0s1b
crw-r----- 2 root operator 116, 0x00020002 Aug 15 16:45 /dev/ad0s1c
crw-r----- 2 root operator 116, 0x00020003 Aug 15 16:45 /dev/ad0s1d
crw-r----- 2 root operator 116, 0x00020004 Aug 15 16:45 /dev/ad0s1e
crw-r----- 2 root operator 116, 0x00020005 Aug 15 16:45 /dev/ad0s1f
crw-r----- 2 root operator 116, 0x00020006 Aug 15 16:45 /dev/ad0s1g
crw-r----- 2 root operator 116, 0x00020007 Aug 15 16:45 /dev/ad0s1h
This information was obtained from a working system.... Hopefully you'll have one
somewhere that you can access.
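If no working system is to hand, the minor numbers can also be computed. The sketch below assumes the FreeBSD 4.x encoding modelled on the dkmakeminor() macro: the slice number is stored offset by one (s1 becomes 2, since 0 and 1 are reserved) and shifted left 16 bits, and the partition letter occupies the low 3 bits. It reproduces the listing above for ad0s1a through ad0s1h; the disk unit bits are not checked against anything here, so treat them with caution. The function name dk_minor is hypothetical. The loop only prints mknod commands, one per partition of ad0s1, with major 116 and owner 0:0; it does not run them.

```sh
#!/bin/sh
# Sketch: compute FreeBSD 4.x minor numbers for disk partition nodes.
# ASSUMPTION: encoding modelled on the 4.x dkmakeminor() macro:
#   (slice << 16) | ((unit & 0x1e0) << 16) | ((unit & 0x1f) << 3) | part
# where slice s1 is stored as 2. Checked only against the ad0s1a..h
# listing (unit 0).

# dk_minor UNIT SLICE PART: print the minor number, e.g. 0x00020004
dk_minor() {
    _unit=$1     # disk unit: 0 for ad0
    _slice=$2    # slice number as it appears in the name: 1 for s1
    # Map the partition letter to its index: a=0, b=1, ... h=7.
    case $3 in
        a) _part=0 ;; b) _part=1 ;; c) _part=2 ;; d) _part=3 ;;
        e) _part=4 ;; f) _part=5 ;; g) _part=6 ;; h) _part=7 ;;
        *) echo "dk_minor: bad partition '$3'" >&2; return 1 ;;
    esac
    _s=$((_slice + 1))   # s1 -> 2, s2 -> 3, ...
    _m=$(( (_s << 16) | ((_unit & 0x1e0) << 16) ))
    _m=$(( _m | ((_unit & 0x1f) << 3) | _part ))
    printf '0x%08x\n' "$_m"
}

# Print (not run) one mknod command per partition of ad0s1.
for p in a b c d e f g h; do
    echo "/tmp/mknod ad0s1$p c 116 $(dk_minor 0 1 "$p") 0:0"
done
```

For ad0s1e the loop prints /tmp/mknod ad0s1e c 116 0x00020004 0:0, matching the listing above.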
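I was unable to mount more than one partition at a time; I kept
getting a "device busy" message. The likely reason: every node I created used
minor number 0x00020000, which the listing above shows is ad0s1a, so each node
referred to the same partition.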
But I was able to examine the drive and found nothing obviously wrong. I then
booted the system into single user mode by pressing the space bar during
the boot countdown and then issuing boot -s.
For a bit more information about single user mode, please read this
FAQ.
When I booted into single user mode, I had to run fsck(8) to clean the
file systems. They were marked dirty because the system had not been shut
down cleanly; a proper shutdown would have marked them clean, but that was
not possible.