|
| Debian Epileptic Seizure |
 |
Sun, 30 Mar 2008 21:05:59 -050 |
I have a Debian Etch RAID server which suddenly suffered a seizure the
other day. I had to physically move the system, so I did a clean shut down
and killed power to both the CPU and the external RAID array. I powered up
the RAID array with no problems, but when I turned on the main power to the
CPU box (not the power supply start-up), the UPS instantly shut down. It
didn't trip a breaker, it just quit. I reset the UPS, powered up the RAID
array, turned power on to the CPU system, and then hit the ON switch on the
CPU. Everything seemed to be coming up normally, except I noticed the RAID
array drive never loaded. Then suddenly I started getting lots of usbcore
messages on the boot screen (these were common in the kernel logs long
before this happened, however). Eventually I got the CLI login prompt, but
KDM and KDE never came up on the console. I was able to come in via XDMCP,
and KDM dutifully logged me in to KDE (or whatever). The link /dev/sda1 for
the RAID volume was missing. An lsmod showed that neither the RAID driver nor
the video (NVIDIA) driver was loaded. I can manually insmod either one, and
they seem to work. Browsing the logs doesn't bring up anything that jumps
out at me as being a root cause of the problem, but goodness knows what else
isn't loading or why. Unplugging the USB cable from the UPS made the annoying
usbcore messages go away, but didn't seem to help anything else.
Can anyone give me some advice on how to figure out the root cause of the
problem and how to eliminate it? Barring that, how can I figure out what
device drivers or other utilities are not working other than by realizing
some day something isn't working? Just from memory, it doesn't seem a huge
number of modules are missing from the list, but it also seems to me more
than just those two are. The loaded module list used to scroll a bit beyond
the bottom of a full screen window using the default font, but now it
doesn't quite fill up the entire window.
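One way to spot missing modules without waiting for something to break is to compare the current lsmod listing against one saved while the system was healthy. The following is only a minimal Python sketch of that comparison, not a tested tool: the function names `parse_lsmod` and `missing_modules` are made up for illustration, and it assumes a known-good listing was saved to a file beforehand.

```python
def parse_lsmod(text):
    """Return the set of module names found in lsmod-style output."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def missing_modules(baseline_text, current_text):
    """List modules present in the baseline listing but absent now."""
    return sorted(parse_lsmod(baseline_text) - parse_lsmod(current_text))
```

Calling `missing_modules` with the contents of the saved listing and the text of a fresh `lsmod` run returns the names that have dropped out. It only compares names, so it cannot say why a module failed to load.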
The only thing I can find which seems to be related is an error which
starts at approximately the correct time which says:
localhost kernel: Error attaching device data
This repeats 5 more times over the 4 hours after the incident (I
think) and not thereafter.
I'm skeptical it is directly related, but the usbcore message I keep
getting is:
drivers/usb/input/hid-core.c: ctrl urb status -110 received
Before adding in the RAID and video drivers manually, here was the
output of lsmod:
RAID-Server:/var/log# lsmod
Module                  Size  Used by
ext2                   70416  0
sd_mod                 25856  2
nfs                   236216  0
nfsd                  256200  17
exportfs               10368  1 nfsd
lockd                  67600  3 nfs,nfsd
nfs_acl                 8320  2 nfs,nfsd
sunrpc                166984  13 nfs,nfsd,lockd,nfs_acl
appletalk              46704  20
hptiop                 16384  0
ppdev                  14088  0
lp                     17736  0
button                 12192  0
ac                     10376  0
battery                15496  0
dm_snapshot            20664  0
dm_mirror              25216  0
dm_mod                 62800  2 dm_snapshot,dm_mirror
loop                   20112  0
tsdev                  13056  0
shpchp                 42156  0
pci_hotplug            20872  1 shpchp
pcspkr                  7808  0
psmouse                44432  0
serio_raw              12036  0
parport_pc             41640  1
parport                44684  3 ppdev,lp,parport_pc
floppy                 67112  0
evdev                  15360  1
ext3                  138512  1
jbd                    65392  1 ext3
mbcache                14216  2 ext2,ext3
ide_cd                 45088  1
cdrom                  40488  1 ide_cd
ide_disk               20608  3
ide_generic             5760  0 [permanent]
generic                10500  0 [permanent]
ide_core              147584  4 ide_cd,ide_disk,ide_generic,generic
skge                   43536  0
sata_nv                17412  0
libata                106784  1 sata_nv
scsi_mod              153008  4 sd_mod,rr232x,hptiop,libata
ohci_hcd               24836  0
ehci_hcd               36104  0
thermal                20240  0
processor              38248  1 thermal
fan                     9864  0
|
| Re: Debian Epileptic Seizure |
 |
Wed, 2 Apr 2008 15:21:55 -0500 |
"Leslie Rhorer" <lrhorer@satx.rr.com> wrote in message
news:47f04718$0$22795$4c368faf@roadrunner.com...
>I have a Debian Etch RAID server which suddenly suffered a seizure the
>other day. I had to physically move the system, so I did a clean shut down
>and killed power to both the CPU and the external RAID array. I powered up
>the RAID array with no problems, but when I turned on the main power to the
>CPU box (not the power supply start-up), the UPS instantly shut down. It
>didn't trip a breaker, it just quit. I reset the UPS, powered up the RAID
>array, turned power on to the CPU system, and then hit the ON switch on the
>CPU. Everything seemed to be coming up normally, except I noticed the RAID
>array drive never loaded. Then suddenly I started getting lots of usbcore
>messages on the boot screen (these were common in the kernel logs long
>before this happened, however). Eventually I got the CLI login prompt, but
>KDM and KDE never came up on the console. I was able to come in via XDMCP,
>and KDM dutifully logs me in to KDE (or whatever). The link /dev/sda1
>for the RAID volume was missing. An lsmod shows neither the RAID driver
>nor the video (NVIDIA) driver were loaded. I can manually insmod either
>one, and they seem to work. Browsing the logs doesn't bring up anything
>that jumps out at me as being a root cause of the problem, but goodness
>knows what else isn't loading or why. Pulling the USB port on the UPS
>makes the annoying usbcore messages go away, but didn't seem to help
>anything else.
OK, I think I figured it out, or the main issue, anyway. I upgraded a
number of packages, and although I don't recall specifically noticing it,
somewhere along the way the kernel was also upgraded. The RAID driver's
and the video driver's modules were apparently lost in the shuffle. I
re-installed them, and now things seem OK, but I'm still getting the annoying
usb error filling up the logs. Does anyone have a suggestion for that?
>drivers/usb/input/hid-core.c: ctrl urb status -110 received
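On the modules lost in the kernel upgrade: modules are installed under /lib/modules/ in a directory named for one specific kernel release, so drivers built outside the stock kernel package have to be rebuilt or re-installed for each new kernel. A rough Python sketch for checking whether a module file exists for the running kernel is below. The helper name `find_module_files` is hypothetical, and it assumes module files are named after the module with a `.ko` extension (optionally compressed), treating hyphens and underscores as equivalent.

```python
import os


def find_module_files(names, release=None, root="/lib/modules"):
    """Map each module name to the module files found for one kernel release."""
    # Default to the running kernel, as reported by uname.
    release = release or os.uname().release
    # Module names treat "-" and "_" as the same character.
    wanted = {name.replace("-", "_"): name for name in names}
    found = {name: [] for name in names}
    for dirpath, _dirs, files in os.walk(os.path.join(root, release)):
        for fname in files:
            stem, _, ext = fname.partition(".")
            # Accept "foo.ko" as well as compressed "foo.ko.gz" and similar.
            if ext != "ko" and not ext.startswith("ko."):
                continue
            key = stem.replace("-", "_")
            if key in wanted:
                found[wanted[key]].append(os.path.join(dirpath, fname))
    return found
```

For example, `find_module_files(["nvidia", "rr232x"])` would return an empty list for any driver that has no module file under the running kernel's directory, which is the situation a kernel upgrade can leave behind.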