[dm-devel] Re: Data corruption on software RAID

Thu, 10 Apr 2008 10:21:01 -040
Mikulas Patocka wrote:
>>> Possibilities how to fix it:
>>>
>>> 1. lock the buffers and pages while they are being written --- this
>>> would cause performance degradation (the most severe degradation would
>>> be in the case when one process repeatedly does sync() and another,
>>> unrelated process repeatedly writes to some file).
>>>
>>> Lock the buffers and pages only for RAID --- would create many special
>>> cases and possible bugs.
>>>
>>> 2. never turn the region dirty bit off until the filesystem is
>>> unmounted --- this is the simplest fix. If the computer crashes after
>>> a long time, it resynchronizes the whole device. But it won't cause
>>> application-visible or filesystem-visible data corruption.
>>>
>>> 3. turn off the region bit if the region wasn't written in one pdflush
>>> period --- requires an interaction with pdflush, rather complex. The
>>> problem here is that pdflush makes its best effort to write data in
>>> dirty_writeback_centisecs interval, but it is not guaranteed to do it.
>>>
>>> 4. make more region states: Region has in-memory states CLEAN, DIRTY,
>>> MAYBE_DIRTY, CLEAN_CANDIDATE.
>>>
>>> When you start writing to the region, it is always moved to DIRTY
>>> state (and on-disk bit is turned on).
>>>
>>> When you finish all writes to the region, move it to MAYBE_DIRTY
>>> state, but leave bit on disk on. We now don't know if the region is
>>> dirty or not.
>>>
>>> Run a helper thread that does periodically:
>>> Change MAYBE_DIRTY regions to CLEAN_CANDIDATE
>>> Issue sync()
>>> Change CLEAN_CANDIDATE regions to CLEAN state and clear their on-disk
>>> bit.
>>>
>>> The rationale is that if the above write-while-modify scenario
>>> happens, the page is always dirty. Thus, sync() will write the page,
>>> kick the region back from CLEAN_CANDIDATE to MAYBE_DIRTY state and we
>>> won't mark the region as clean on disk.
>>>
>>>
>>> I'd like to know your ideas on this, before we start coding a
>>> solution.
>>>   
>>>       
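
The four-state scheme in option 4 can be sketched as a small state
machine. This is only an illustration under assumptions, not code from
the thread or from dm-raid1: the names Region, start_write, end_write
and helper_pass are hypothetical, and sync is a caller-supplied stand-in
for the real sync().

```python
from enum import Enum


class State(Enum):
    CLEAN = "CLEAN"
    DIRTY = "DIRTY"
    MAYBE_DIRTY = "MAYBE_DIRTY"
    CLEAN_CANDIDATE = "CLEAN_CANDIDATE"


class Region:
    """Hypothetical in-memory view of one mirror region."""

    def __init__(self):
        self.state = State.CLEAN
        self.on_disk_dirty = False   # models the persistent dirty bit
        self.writes_in_flight = 0

    def start_write(self):
        # Any write moves the region to DIRTY and sets the on-disk bit.
        self.writes_in_flight += 1
        self.state = State.DIRTY
        self.on_disk_dirty = True

    def end_write(self):
        # When the last write finishes, the region is only MAYBE_DIRTY;
        # the on-disk bit deliberately stays set.
        self.writes_in_flight -= 1
        if self.writes_in_flight == 0:
            self.state = State.MAYBE_DIRTY


def helper_pass(regions, sync):
    """One iteration of the periodic helper thread described above."""
    for r in regions:
        if r.state is State.MAYBE_DIRTY:
            r.state = State.CLEAN_CANDIDATE
    sync()  # may trigger start_write/end_write on still-dirty pages
    for r in regions:
        # Only regions that sync() did not touch are provably clean.
        if r.state is State.CLEAN_CANDIDATE:
            r.state = State.CLEAN
            r.on_disk_dirty = False
```

If sync() rewrites a page in a region, that region passes through DIRTY
back to MAYBE_DIRTY and keeps its on-disk bit for at least one more
helper pass.
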
>> I looked at just this problem a while ago, and came to the conclusion
>> that what was needed was a COW bit, to show that there was i/o in
>> flight, and that before modification it needed to be copied. Since you
>> don't want to let that recurse, you don't start writing the copy until
>> the original is written and freed. Ideally you wouldn't bother to
>> finish writing the original, but that doesn't seem possible. That
>> allows at most two copies of a chunk to take up memory space at once,
>> although it's still ugly and can be a bottleneck.
>>     
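
The COW-bit idea above can be sketched roughly as follows. This is a
minimal illustration under assumptions, not kernel code: CowBuffer and
its methods are hypothetical names, and "the COW bit is set" is modelled
as the in-flight buffer and the current buffer being the same object.

```python
class CowBuffer:
    """Hypothetical chunk buffer with copy-on-write during i/o."""

    def __init__(self, data):
        self.current = bytearray(data)  # what writers modify
        self.in_flight = None           # buffer handed to the i/o layer
        self.needs_write = False

    def start_io(self):
        # Do not recurse: while one write is in flight, a second one
        # must wait until the original is written and freed.
        if self.in_flight is not None:
            return None
        self.in_flight = self.current   # shared, i.e. COW bit set
        self.needs_write = False
        return self.in_flight

    def modify(self, offset, data):
        # First modification during i/o copies the chunk, so the data
        # being written to the disks stays stable.
        if self.in_flight is self.current:
            self.current = bytearray(self.in_flight)
        self.current[offset:offset + len(data)] = data
        self.needs_write = True

    def complete_io(self):
        # The original is written and freed; the copy may now be written.
        self.in_flight = None
```

At most two copies of a chunk exist at once: the one in flight and the
one being modified.
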
>
> Copying the data would be performance overkill. You can really write
> different data to different disks, you just must not forget to resync
> them after a crash. The filesystem/application will recover with either
> old or new data --- it just won't recover when it's reading old and new
> data from the same location.
>
>   
Currently you can go for hours without ever reaching a clean state on
active files. By deliberately not allowing the buffer to change during a
write, the chances of getting consistent data on the disk should be
significantly improved.
> From my point of view that trick with thread doing sync() and turning
> off region bits looks best. I'd like to know if that solution doesn't
> have any other flaw.
>
>   
>> For reliable operation I would want all copies (and/or CRCs) to be
>> written on an fsync, by the time I bother to fsync I really, really,
>> want the data on the disk.
>>     
>
> fsync already works this way.
>   

The point I was making is that after you change the code I would still 
want that to happen. And your comment above seems to indicate a goal of 
getting consistent data after a crash, with less concern that it be the 
most recent data written. Sorry in advance if that's a misreading of 
"you just must not forget to resync them after a crash."

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck

