March 2009

------- Comment #123 From Andrew Morton 2008-05-22 21:26:02 PDT -------

Life's too short to read through all this, but...

yes, ext3 does suck-by-design. Always has.

I would commend the use of the sync_file_range() syscall on Linux. It
can be used to sync a subsection of a file and will not trigger the
write-the-whole-world behavior. It will just sync the stuff you want

Yes. ext3 is terrible by design.
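
For reference, the sync_file_range() call recommended in the comment can be used roughly like this. This is only a minimal sketch: the helper names `flush_byte_range` and `append_and_flush` are made up for illustration, and the choice to set all three flags (wait, write, wait again) is an assumption about what "sync the stuff you want" should mean here.

```c
#define _GNU_SOURCE           /* exposes sync_file_range() in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write out bytes [offset, offset + len) of fd and
 * wait for that writeback to finish. Returns 0 on success, -1 on error.
 */
int flush_byte_range(int fd, off_t offset, off_t len)
{
    unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
                         SYNC_FILE_RANGE_WRITE |
                         SYNC_FILE_RANGE_WAIT_AFTER;

    return sync_file_range(fd, offset, len, flags);
}

/*
 * Hypothetical demo: append one record to a file and flush only the
 * bytes of that record. Returns 0 on success, -1 on error.
 */
int append_and_flush(const char *path, const char *record)
{
    assert(path != NULL && record != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    off_t start = lseek(fd, 0, SEEK_END);    /* where the new record begins */
    ssize_t len = (ssize_t)strlen(record);
    int rc = -1;

    if (start >= 0 && write(fd, record, (size_t)len) == len)
        rc = flush_byte_range(fd, start, (off_t)len);

    close(fd);
    return rc;
}
```

Note that the man page warns that sync_file_range() does not write out file metadata, so it is not a drop-in replacement for fsync() when the file size or block allocation has changed.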





On Tue, 17 Mar 2009, Nick Piggin wrote:
> > Yes, my patch isn't really a solution.
> > Andrea already pointed out that it's not O_DIRECT issue, it's gup vs fork
> > issue. *and* my patch is crazy slow :)
> Well, it's an interesting question. I'd say it probably is more than
> just O_DIRECT. vmsplice too, for example (which I think is much harder
> to fix this way because the pages are retired by the other end of
> the pipe, so I don't think you can hold a lock across it).

Well, only the "fork()" has the race problem.

So having a fork-specific lock (but not naming it by directio) actually
does make sense. The fork is much less performance-critical than most
random mmap_sem users - and doesn't have the same scalability issues
either (ie people probably _do_ want to do mmap/munmap/brk concurrently
with gup lookup, but there's much less worry about concurrent fork()

It doesn't necessarily make the general problem go away, but it makes the
_particular_ race between get_user_pages() and fork() go away. Then you
can do per-page flags or whatever and not have to worry about concurrent


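I only meant to throw Andrea a lifeline because I couldn't bear to watch, and then Linus says something like "shouldn't we merge the kosaki patch instead?" My jaw nearly hit the floor.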


> splice() already has a callback for releasing the pages, so it's doable.

doable, maybe.

Linus: Couldn't you do it this way?
Nick: ...maybe.


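・ last resort
・ I've got to go with the last resort. : At this point, the final measure is the only option left

People use this one a lot, don't they > "last resort"

The first time I saw it, I thought: "The resort you went to last time? Don't change the subject, you idiot! We're talking about the kernel here!!" :-p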

Andrew Morton says he will send NILFS to Linus for 2.6.30.

On Wed, 11 Mar 2009 01:55:42 +0900 (JST)
Ryusuke Konishi wrote:

> I've been working for the past several months to take review comments
> and to continually solve users' problems come up in mailing list

Yes, the maintenance has been impressive.

> (thanks for all giving comments and feedbacks!). Also, I've tried to
> stabilize API and disk format to restrict additional changes and
> ensure backward compatibility.

Well. From the point of view of mainline linux, there is no
back-compatibility issue, because the fs hasn't been merged yet.

You perhaps have back-compatibility concerns for existing users of the
out-of-tree patch, but I'd encourage you to not worry about that too
much - there will be fairly few users and they are probably pretty
technical and will be able to cope with a migration. It's a _bit_ hard
on them but on the other hand, omitting back-compatibility code leads
to a better implementation for the long term.

What you should be more concerned about is forward-compatibility. What
arrangements do you presently have in place to be able to later alter the
on-disk format without causing too much disruption? Having a strong
design here will make changes easier to do and will lead to a better

Also.. Don't get _too_ concerned about freezing the on-disk format at
this time. You could put in a mount-time printk("the nilfs on-disk
format may change at any time - do not place critical data on a nilfs
filesystem") and we leave that in place for a few months while things

And yes, I was planning on sending nilfs in to Linus for 2.6.30 unless
someone has decent-sounding reasons to hold it back.