|
| Disks in error state |
 |
9 Nov 2006 12:53:29 -0500 |
Hi!
We have a problem with Veritas 4.1 on Solaris 9. Our SAN environment consists
of two IBM FastT900 arrays at two physically separate locations. All LUNs
are mirrored, one mirror on each FastT, for redundancy if one FastT goes down.
We use QLogic HBA cards to connect the Solaris boxes to the SAN.
Over the past week we rebuilt our SAN environment, one FastT at a time. The
first one went smoothly: we disconnected it and DMP took care of everything,
letting the other FastT take over.
We upgraded the firmware, and when the FastT was back online we re-created
the LUNs and extended some of them as well while we were at it (planning to
do the same on the other FastT). Then we used VEA to replace the "failed"
disks with the newly re-created ones.
On this FastT we use LUNs ranging from 1-26. Previously we used the same
LUN numbers on the other FastT as well, but now we changed them from
1-26 to 50-76 to make it easier to tell which LUNs are on which
FastT.
The problem now is that after we fixed the other FastT and changed the LUNs
from 1-26 to 50-76, all of its disks remain in the error state.
This shows the LUNs on the FastT we upgraded first:
c7t4d0s2 auto:cdsdisk - - online
c7t4d1s2 auto:cdsdisk - - online
c7t4d2s2 auto:cdsdisk - - online
c7t4d3s2 auto:cdsdisk - - online
c7t4d4s2 auto:cdsdisk - - online
This shows the LUNs on the FastT we upgraded last, the one that gives us trouble:
c7t2d50s2 auto - - error
c7t2d51s2 auto - - error
c7t2d52s2 auto - - error
c7t2d53s2 auto - - error
c7t2d54s2 auto - - error
c7t4d0 and c7t2d50 are a mirror pair (or are supposed to be), as are c7t4d1
and c7t2d51, and so on.
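The pairing described above can be checked mechanically. The following is only a rough sketch in Python: it assumes the listing has the five columns shown (device, type, disk, group, status, as in what looks like `vxdisk list` output) and that the partner of c7t4dN is always c7t2d(N+50). The names `parse_status`, `mirror_pairs` and `LUN_OFFSET` are made up for the illustration.

```python
import re

# Listing copied from the post: device, type, disk, group, status.
LISTING = """\
c7t4d0s2 auto:cdsdisk - - online
c7t4d1s2 auto:cdsdisk - - online
c7t4d2s2 auto:cdsdisk - - online
c7t4d3s2 auto:cdsdisk - - online
c7t4d4s2 auto:cdsdisk - - online
c7t2d50s2 auto - - error
c7t2d51s2 auto - - error
c7t2d52s2 auto - - error
c7t2d53s2 auto - - error
c7t2d54s2 auto - - error
"""

# Devices on the array that was upgraded first (target 4 in the post).
DEVICE_RE = re.compile(r"^c7t4d(\d+)s2$")
# Assumption: c7t4dN is mirrored on c7t2d(N + 50), as the post describes.
LUN_OFFSET = 50


def parse_status(listing):
    """Map each device name to the status in the last column."""
    status = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 5:
            status[fields[0]] = fields[4]
    return status


def mirror_pairs(status):
    """Return (first-array device, partner device, partner status) tuples."""
    pairs = []
    for device in sorted(status):
        match = DEVICE_RE.match(device)
        if match:
            partner = f"c7t2d{int(match.group(1)) + LUN_OFFSET}s2"
            pairs.append((device, partner, status.get(partner, "missing")))
    return pairs


if __name__ == "__main__":
    for device, partner, state in mirror_pairs(parse_status(LISTING)):
        print(f"{device} <-> {partner}: partner is {state}")
```

A partner reported as "missing" would mean the expected device is not in the listing at all, which is a different problem from a device that is present but in the error state.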
I've done the following:
a) Asked Solaris to clear old disk devices (devfsadm -C -c disk)
b) Got the HBA card to rediscover LUNs (/opt/JNIC146x/jnic146x_update_drv -a -r)
c) Updated disk devices on Solaris (devfsadm -c disk)
d) Labeled the new disks using 'format'
e) Ran vxdctl to make vxconfigd aware of the newly added disks/LUNs
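For reference, steps a) to e) can be written down as a dry-run plan. This is a sketch in Python that only prints the commands and runs nothing. The post does not say which vxdctl subcommand was used, so `vxdctl enable` here is an assumption, and `STEPS` and `render_plan` are names invented for the example.

```python
import shlex

# Steps a) to e) from the post, as (description, argv) pairs.
STEPS = [
    ("clear stale disk device links", ["devfsadm", "-C", "-c", "disk"]),
    ("have the JNI HBA rediscover LUNs",
     ["/opt/JNIC146x/jnic146x_update_drv", "-a", "-r"]),
    ("create device links for new disks", ["devfsadm", "-c", "disk"]),
    # format is interactive, so it is listed here but would be run by hand.
    ("label the new disks", ["format"]),
    # Assumption: the post only says "ran vxdctl"; "enable" is a guess.
    ("make vxconfigd rescan for disks", ["vxdctl", "enable"]),
]


def render_plan(steps):
    """Return one printable line per step, lettered a), b), c), ..."""
    lines = []
    for label, (description, argv) in zip("abcde", steps):
        lines.append(f"{label}) {description}: {shlex.join(argv)}")
    return lines


if __name__ == "__main__":
    for line in render_plan(STEPS):
        print(line)
```

Keeping the commands as argument lists makes it easy to hand them to `subprocess.run` later, but on a production host it is probably safer to run them one at a time and check the disk listing after each step.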
I am able to format the disks, so the OS is aware of them, and Veritas
should be too. Why do they remain in the error state?
I've tried rebooting the Solaris box, with the only result that the old LUNs
(1-26) got removed; the disks remain in the same state, though.
I understand that it's hard to help because there are lots of combinations
of HBAs, SAN controllers and OSes, and it's a long shot to ask for help
here, but I would be very happy if someone tries. Anything helps.
|
|
| Re: Disks in error state |
 |
10 Nov 2006 07:44:29 -0500 |
Correction: we don't use QLogic HBA cards, we use JNI cards. (We use QLogic
on our Linux systems; I was in a hurry yesterday and mixed things up.)
Anyway, for some reason our problems seem to be connected to DMP. Our SAN
controller reports that some LUNs aren't on their preferred path. When we
change this, some LUNs leave the error state. It doesn't apply to all of
them, though, so there seems to be some other problem as well.
"Christian Nord" <mistkvist@gmail.com> wrote:
>I understand that it's hard to help because there are lots of combinations
>of HBAs, SAN controllers and OSes, and it's a long shot to ask for help
>here, but I would be very happy if someone tries. Anything helps.
|
|
| Re: Disks in error state |
 |
10 Nov 2006 16:36:30 -0500 |
You've run into a "feature" in DMP after 4.0: a DMP device does not have
to match the hardware device. You can run the command
"vxdmpadm getsubpaths dmpnodename=<dmpdevice>" to see which
hardware device is actually mapped to the DMP device.
There is a tech note that explains how to clear the
DMP database, which may work.
Steve
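As a rough illustration of the suggestion above, this Python sketch builds one `vxdmpadm getsubpaths` command line for each device in the error state. It assumes the DMP node names are the same as the device names in the listing (c7t2d50s2 and so on), which may not hold on every setup; `getsubpaths_commands` is a made-up helper name, and the script only prints the commands.

```python
# Part of the listing from the original post, one healthy disk and the
# five that are stuck in the error state.
LISTING = """\
c7t4d0s2 auto:cdsdisk - - online
c7t2d50s2 auto - - error
c7t2d51s2 auto - - error
c7t2d52s2 auto - - error
c7t2d53s2 auto - - error
c7t2d54s2 auto - - error
"""


def getsubpaths_commands(listing):
    """Build a getsubpaths command for every device whose status is 'error'."""
    commands = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[4] == "error":
            # Assumption: the DMP node name equals the listed device name.
            commands.append(f"vxdmpadm getsubpaths dmpnodename={fields[0]}")
    return commands


if __name__ == "__main__":
    for command in getsubpaths_commands(LISTING):
        print(command)
```

Comparing the subpaths reported for each DMP node against the expected c7t2dN hardware paths should show whether any node is still mapped to one of the old LUNs.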
"Christian Nord" <mistkvist@gmail.com> wrote:
>
>Correction: we don't use QLogic HBA cards, we use JNI cards. (We use QLogic
>on our Linux systems; I was in a hurry yesterday and mixed things up.)
>
>Anyway, for some reason our problems seem to be connected to DMP. Our SAN
>controller reports that some LUNs aren't on their preferred path. When we
>change this, some LUNs leave the error state. It doesn't apply to all of
>them, though, so there seems to be some other problem as well.
|