Disks in error state
9 Nov 2006 12:53:29 -0500
Hi!

We have a problem with Veritas 4.1 on Solaris 9. Our SAN environment consists
of two IBM FastT900s at two physically separated locations. All LUNs
are mirrored, one mirror on each FastT, for redundancy if one FastT dives.

We use Qlogic HBA cards to connect the Solaris boxes to the SAN.

This past week we rebuilt our SAN environment, one FastT at a time. The
first one went smoothly: we disconnected it and DMP took care of everything,
letting the other FastT take over.

We upgraded the firmware, and when the FastT was back online we re-created
the LUNs and extended some of them as well while we were at it (planning to
do the same on the other FastT). Then we used VEA to replace the "failed"
disks with the newly re-created ones.
On this FastT we use LUNs 1-26. Previously we used the same LUN numbers on
the other FastT as well, but now we changed them from 1-26 to 50-76 to make
it easier to tell which LUNs are on which FastT.

The problem now is that after we fixed the other FastT and changed the LUNs
from 1-26 to 50-76, all of its disks remain in the error state.


This shows the LUNs on the FastT we upgraded first:
c7t4d0s2     auto:cdsdisk    -            -            online
c7t4d1s2     auto:cdsdisk    -            -            online
c7t4d2s2     auto:cdsdisk    -            -            online
c7t4d3s2     auto:cdsdisk    -            -            online
c7t4d4s2     auto:cdsdisk    -            -            online

This shows the LUNs on the FastT we upgraded last, the one that gives us trouble:
c7t2d50s2    auto            -            -            error
c7t2d51s2    auto            -            -            error
c7t2d52s2    auto            -            -            error
c7t2d53s2    auto            -            -            error
c7t2d54s2    auto            -            -            error

c7t4d0 and c7t2d50 are a mirror pair (or are supposed to be), as are c7t4d1
and c7t2d51, and so on.
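
With 26 LUNs per FastT it can help to pull out just the devices in the error state. A minimal sketch, assuming the listings above come from `vxdisk list` (the post does not name the command) and that the state is the last column; `error_disks` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the device names whose last column is "error".
# Reads a disk listing on stdin (assumed to be `vxdisk list` output).
error_disks() {
    awk '$NF == "error" { print $1 }'
}

# Usage on a live system (assumption): vxdisk list | error_disks
```

Matching on the last field rather than a fixed column number keeps the filter working if the disk or group columns are filled in instead of "-".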

I've done the following:
a) Asked Solaris to clear old disk devices (devfsadm -C -c disk)
b) Got the HBA card to rediscover LUNs (/opt/JNIC146x/jnic146x_update_drv -a -r)
c) Updated disk devices on Solaris (devfsadm -c disk)
d) Labeled the new disks using 'format'
e) Ran vxdctl to make vxconfigd aware of the newly added disks/LUNs
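
The non-interactive steps above can be collected into one script so the order is repeatable. This is only a sketch: the post says "vxdctl" without a subcommand, so `vxdctl enable` is an assumption, as is the closing `vxdisk list` check. `run` and `rescan_luns` are hypothetical names, and by default the script only prints the commands.

```shell
#!/bin/sh
# Steps a), b), c) and e) from the list above, in order.
# DRY_RUN defaults to 1, so commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rescan_luns() {
    run devfsadm -C -c disk                        # a) clear stale device links
    run /opt/JNIC146x/jnic146x_update_drv -a -r    # b) HBA rediscovers LUNs
    run devfsadm -c disk                           # c) create new device links
    # d) labeling with 'format' is interactive and is left out here
    run vxdctl enable                              # e) assumption: subcommand not given in the post
    run vxdisk list                                # assumption: check the resulting disk states
}
```

Calling `rescan_luns` with DRY_RUN unset prints the five commands for review; setting DRY_RUN=0 would actually execute them, as root.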

I am able to format the disks, so the OS is aware of them, and Veritas should
be too. Why do they remain in the error state?
I've tried rebooting the Solaris box; the only result was that the old LUNs
(1-26) got removed. The disks remain in the same state.

I understand that it's hard to help because there are lots of combinations
of HBAs, SAN controllers and OSes, and it's a long shot to ask for help
here, but I would be very happy if someone tries. Anything helps.
Re: Disks in error state
10 Nov 2006 07:44:29 -0500
Correction: we don't use Qlogic HBA cards, we use JNI cards. (We use Qlogic
on our Linux systems; I was in a hurry yesterday and mixed things up.)

Anyway, for some reason our problems seem to be connected to DMP. Our SAN
controller reports that some LUNs aren't on their preferred path; when we
change this, -some- LUNs leave the error state. It doesn't apply to all of
them though, so there seems to be some other problem as well.
Re: Disks in error state
10 Nov 2006 16:36:30 -0500
You've run into a "feature" in DMP after 4.0. A DMP device does not have
to match the hardware device. You can run the command
"vxdmpadm getsubpaths dmpnodename=<dmpdevice>" to see which
hardware device is actually mapped to the DMP device.

There is a tech note that tells how to clear the
DMP database, which may work.
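
To check several devices at once, the getsubpaths command can be generated per device. A rough sketch, assuming the DMP node names are the same as the device names in the disk listing (for example c7t2d50s2); `subpath_cmds` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: for each DMP node name on stdin, print the
# vxdmpadm command that lists its subpaths. Nothing is executed here.
subpath_cmds() {
    while read -r node; do
        if [ -n "$node" ]; then
            echo "vxdmpadm getsubpaths dmpnodename=$node"
        fi
    done
}

# Example: printf '%s\n' c7t2d50s2 c7t2d51s2 | subpath_cmds
```

Printing the commands first makes it easy to review them before piping the output to sh as root.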

Steve