June 2011

Studying English at a certain place again today

Talking about itasha, my love for Shaorin, and the like

I have a family who need a peaceful life without being finger-pointed at behind their backs....


So English also has "being pointed at behind your back"! Learned something new!


In English, the idiom listed in dictionaries is to talk about someone behind their back, but an elderly native-speaker neighbor of mine actually used the literal expression of pointing a finger at someone behind their back.


http://twitter.com/#!/nankyokujusei/status/83722915241144320

Just a moment! means "Hold it right there!" or "Waaait a second!", so Yunoka-san's usage can be called native.



http://twitter.com/#!/nankyokujusei/status/83725432100364288

Just a moment, folks. We are drifting out from our focus! "Everyone, just a moment. We're straying from the agenda." Saying this in a meeting sounds cool.



http://twitter.com/#!/nankyokujusei/status/83734953996726272

Wait a minute! comes across harsher than Just a moment! The latter won't particularly annoy anyone if said in a meeting, but the former might make you enemies.




Hmm, I see. So it's like "Hold it right there! (゚Д゚ )". I pretty much get it now. Very educational.
Whether I'll ever use it in a business setting is another question.

A patch saying that /proc/stat is just about the worst choice, so use /sys/devices/system/cpu/online instead.
Since this has no downside whatsoever, it will probably be accepted.



On Thu, Jun 16, 2011 at 12:07 PM, Ulrich Drepper  wrote:
>
> I'm not opposed to any improvements. Do you think I like this
> implementation while every other OS which has these interfaces
> implement sysconf() using a simple syscall?

The thing is, I'd be more impressed by your statement if I actually
believed it. Every time I report a problem with glibc, it does seem
like you're actually against improvements, and instead of improving
glibc code, you _invariably_ blame other things instead.

It might be the kernel interfaces, or it might be "broken programs"
like adobe flash. But never is it glibc itself that is the problem.

So I sent a simple patch that I actually think will improve
performance enormously, and nobody will ever complain. But no, not
acceptable.

We can do it other ways. You already parse /sys/devices/system/cpu for
_SC_NPROCESSORS_CONF, although you do it in an odd way (by iterating
over directory entries). You could just open
/sys/devices/system/cpu/online, and parse the result from there
instead. If that file doesn't exist, the system doesn't support
hotplug, so you can do the caching.

The fact is, the kernel interfaces are ALREADY much better than the
ones you actually use.

Here's a new patch. The code worked when it was a test-program, I
haven't actually tested it within the confines of the glibc thing.
There may be special magical internal glibc names it should use for
fopen/fscanf etc.

Linus


patch.diff

sysdeps/unix/sysv/linux/getsysstats.c | 84 +++++++++++++++++++++++++++++++--
1 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/getsysstats.c b/sysdeps/unix/sysv/linux/getsysstats.c
index af454b650de9..cda92fc4f8c8 100644
--- a/sysdeps/unix/sysv/linux/getsysstats.c
+++ b/sysdeps/unix/sysv/linux/getsysstats.c
@@ -124,9 +124,44 @@ next_line (int fd, char *const buffer, char **cp, char **re,
   return res == *re ? NULL : res;
 }
 
+static int __get_sysfs_cpus(const char *path)
+{
+	FILE *file;
+	int nr_cpus = 0;
+	int prev = -1;
+
+	file = fopen(path, "r");
+	if (!file)
+		return -1;
+	for (;;) {
+		char sep;
+		int cpu;
+		int n = fscanf(file, "%u%c", &cpu, &sep);
+		if (n <= 0)
+			break;
+
+		/* EOF == EOLN */
+		if (n == 1)
+			sep = '\n';
+
+		/* Was the previous CPU a range? */
+		if (prev >= 0) {
+			nr_cpus += cpu - prev + 1;
+			prev = -1;
+		} else if (sep == '-')
+			prev = cpu;
+		else
+			nr_cpus++;
+
+		if (sep == '\n')
+			break;
+	}
+	fclose(file);
+	return nr_cpus;
+}
 
-int
-__get_nprocs ()
+static int
+__get_nprocs_from_proc (void)
 {
   /* XXX Here will come a test for the new system call. */
 
@@ -171,13 +206,37 @@ __get_nprocs ()
 
   return result;
 }
+
+int
+__get_nprocs ()
+{
+	long ret;
+	static int cached = -1;
+
+	ret = cached;
+	if (ret < 0)
+	{
+		ret = __get_sysfs_cpus("/sys/devices/system/cpu/online");
+
+		/*
+		 * If that failed, we don't support hotplug, and we will
+		 * instead cache the result from reading /proc/stat.
+		 */
+		if (ret < 0)
+		{
+			ret = __get_nprocs_from_proc();
+			cached = ret;
+		}
+	}
+	return ret;
+}
 weak_alias (__get_nprocs, get_nprocs)
 
 
 /* On some architectures it is possible to distinguish between configured
    and active cpus. */
-int
-__get_nprocs_conf ()
+static int
+__get_nprocs_conf_internal (void)
 {
   /* XXX Here will come a test for the new system call. */
 
@@ -224,6 +283,23 @@ __get_nprocs_conf ()
 
   return result;
 }
+
+int
+__get_nprocs_conf ()
+{
+	long ret;
+	static int cached = -1;
+
+	ret = cached;
+	if (ret < 0)
+	{
+		ret = __get_sysfs_cpus("/sys/devices/system/cpu/possible");
+		if (ret < 0)
+			ret = __get_nprocs_conf_internal();
+		cached = ret;
+	}
+	return ret;
+}
 weak_alias (__get_nprocs_conf, get_nprocs_conf)
 
 /* General function to get information about memory status from proc

The anon_vma story goes on and on. Here is the scene where Andi carelessly says something like "well, I don't really know, but I'm sure glibc has its reasons", and Linus instantly tears into him.


On Thu, Jun 16, 2011 at 1:14 PM, Andi Kleen  wrote:
>
> I haven't analyzed it in detail, but I suspect it's some cache line bounce,
> which can slow things down quite a lot. Also the total number of invocations
> is quite high (hundreds of messages per core * 32 cores)

The fact is, glibc is just total crap.

I tried to send uli a patch to just add caching. No go. I sent
*another* patch to at least make glibc use a sane interface (and the
cache if it needs to fall back on /proc/stat for some legacy reason).
We'll see what happens.

Paul Eggert suggested "caching for one second" - by just calling
"gettimeofday()" to see how old the cache is. That would work too.

The point I'm making is that it really is a glibc problem. Glibc is
doing stupid expensive things, and not trying to correct for the fact
that it's expensive.

> I did, but I gave up fully following that code path because it's so
> convoluted :-/

I do agree that glibc sources are incomprehensible, with multiple
layers of abstraction (sysdeps, "posix", helper functions etc etc).

In this case it was really trivial to find the culprit with a simple

git grep /proc/stat

though. The code is crap. It's insane. It's using
/sys/devices/system/cpu for _SC_NPROCESSORS_CONF, which is at least a
reasonable interface to use. But it does it in odd ways, and actually
counts the CPU's by doing a readdir call. And it doesn't cache the
result, even though that particular result had better be 100% stable -
it has nothing to do with "online" vs "offline" etc.

But then for _SC_NPROCESSORS_ONLN, it doesn't actually use
/sys/devices/system/cpu at all, but the /proc/stat interface. Which is
slow, mostly because it has all the crazy interrupt stuff in it, but
also because it has lots of legacy stuff.

I wrote a _much_ cleaner routine (loosely based on what we do in
tools/perf) to just parse /sys/devices/system/cpu/online. I didn't
even time it, but I can almost guarantee that it's an order of
magnitude faster than /proc/stat. And if that doesn't work, you can
fall back on a cached version of the /proc/stat parsing, since if
those files don't exist, you can forget about CPU hotplug.

> So you mean caching it at startup time? Otherwise the parent would
> need to do sysconf() at least, which it doesn't do (the exim source doesn't
> really know anything about libdb internals)

Even if you do it in the children, it will help. At least it would be
run just _once_ per fork.

But actually looking at glibc just shows that they are simply doing
stupid things. And I absolutely _refuse_ to add new interfaces to the
kernel only because glibc is being a moron.

Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: email@kvack.org



Now, everyone, all together:

The fact is, glibc is just total crap.

Intel's performance team reported that performance dropped in recent kernels, kicking off an extremely long thread. It turned into a discussion of how exim uses libdb, which needs to get the CPU count, and that reads /proc/stat, which is slow.

Then, the moment Andi blurted out something like "well, with all that fork/exec, each process having to reinitialize its variables can't really be helped...", Linus tore into him.



From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Wed, 15 Jun 2011 17:16:57 -0700
Message-ID: <BANLkTi=Tw6je7zpi4L=pE0JJpZfeEC9Jsg@mail.gmail.com>
Subject: Re: REGRESSION: Performance regressions from switching anon_vma->lock to mutex


On Wed, Jun 15, 2011 at 3:19 PM, Andi Kleen wrote:
>
> Caching doesn't help because the library gets reinitialized in every child
> (it may already do caching, not fully sure for this; it does it for other
> sysconfs at least)

Why the hell do you continue to make excuses for glibc that are
*clearly*not*true*?

Stop this insanity, Andi. Do you realize that this kind of crazy
behavior just makes me convinced that there is no way in hell I should
*ever* take your sysconfig patch, since all your analysis for it is
totally worthless?

JUST LOOK AT THE NUMBERS, for chrissake!

When format_decode is 7% of the whole workload, and the top 15
functions of the profile look like this:

6.40% exim [kernel.kallsyms] [k] format_decode
5.26% exim [kernel.kallsyms] [k] page_fault
5.05% exim [kernel.kallsyms] [k] vsnprintf
3.55% exim [kernel.kallsyms] [k] number
3.00% exim [kernel.kallsyms] [k] copy_page_c
2.88% exim [kernel.kallsyms] [k] read_hpet
2.38% exim libc-2.13.90.so [.] __GI_vfprintf
1.92% exim [kernel.kallsyms] [k] kstat_irqs
1.53% exim [kernel.kallsyms] [k] find_vma
1.47% exim [kernel.kallsyms] [k] _raw_spin_lock
1.40% exim [kernel.kallsyms] [k] seq_printf
1.34% exim [kernel.kallsyms] [k] radix_tree_lookup
1.21% exim [kernel.kallsyms] [k] page_cache_get_speculative
1.20% exim [kernel.kallsyms] [k] clear_page_c
1.05% exim [kernel.kallsyms] [k] do_page_fault

I can pretty much guarantee that it doesn't do just one /proc/stat
read per fork() just to get the number of CPU's.

/proc/stat may be slow, but it's not slower than doing real work -
unless you call it millions of times.

And you didn't actually look at glibc sources, did you? Because if you
had, you would ALSO have seen that you are totally full of sh*t. Glibc
at no point caches anything.

So repeat after me: stop making excuses and lying about glibc. It's
crap. End of story.


> I don't think glibc is crazy in this. It has no other choice.

Stop this insanity, Andi. Why do you lie or just make up arguments? WHY?

There is very clearly no caching going on. And since exim doesn't even
execve, it just forks, it's very clear that it could cache things just
ONCE, so your argument that caching wouldn't be possible at that level
is also bogus.

I can certainly agree that /proc/stat isn't wonderful (it used to be
better), but that's no excuse for just totally making up excuses for
just plain bad *stupid* behavior in user space. And it certainly
doesn't excuse just making shit up!

Linus



Repeat after me!!


PS. By the way, what is "repeat after me" in Japanese? I understand the meaning but can't translate it; such a frustrating feeling. I could swear there was an idiom that nailed it exactly...
Japanese is hard, Japanese.


PS2: Here is the relevant glibc source. Indeed, it does no caching.


int
__get_nprocs ()
{
  /* XXX Here will come a test for the new system call. */

  const size_t buffer_size = __libc_use_alloca (8192) ? 8192 : 512;
  char *buffer = alloca (buffer_size);
  char *buffer_end = buffer + buffer_size;
  char *cp = buffer_end;
  char *re = buffer_end;
  int result = 1;

#ifdef O_CLOEXEC
  const int flags = O_RDONLY | O_CLOEXEC;
#else
  const int flags = O_RDONLY;
#endif
  /* The /proc/stat format is more uniform, use it by default. */
  int fd = open_not_cancel_2 ("/proc/stat", flags);
  if (fd != -1)
    {
      result = 0;

      char *l;
      while ((l = next_line (fd, buffer, &cp, &re, buffer_end)) != NULL)
        /* The current format of /proc/stat has all the cpu* entries
           at the front.  We assume here that stays this way. */
        if (strncmp (l, "cpu", 3) != 0)
          break;
        else if (isdigit (l[3]))
          ++result;

      close_not_cancel_no_status (fd);
    }
  else
    (snip)

vm4_pass_flood

1000.times{
Thread.new{loop{Thread.pass}}
}

i=0
while i<10000
i += 1
end


ruby 1.9.2p274 (2011-06-06 revision 31932) [x86_64-linux] 649.932942867279
ruby 1.9.2p274 (2011-06-06 revision 31932) [x86_64-linux] 655.249534845352
ruby 1.9.2p274 (2011-06-06 revision 31932) [x86_64-linux] 628.536836147308
ruby 1.9.3dev (2011-06-14 trunk 32073) [x86_64-linux] 1.52222299575806
ruby 1.9.3dev (2011-06-14 trunk 32073) [x86_64-linux] 1.41666984558105
ruby 1.9.3dev (2011-06-14 trunk 32073) [x86_64-linux] 1.50657510757446


As expected, 1.9.2 really is hopeless once the thread count grows.
Conversely, there are also cases where 1.9.3 is slower.

vm4_thread_pass

# Lots of Thread.pass calls.
# Performance may depend on the GVL implementation.

tmax = (ARGV.shift || 2).to_i
lmax = 200_000 / tmax

(1..tmax).map{
Thread.new{
lmax.times{
Thread.pass
}
}
}.each{|t| t.join}



ruby 1.9.2p274 (2011-06-06 revision 31932) [x86_64-linux] 0.430890083312988
ruby 1.9.2p274 (2011-06-06 revision 31932) [x86_64-linux] 0.422628164291382
ruby 1.9.2p274 (2011-06-06 revision 31932) [x86_64-linux] 0.426900863647461
ruby 1.9.3dev (2011-06-14 trunk 32073) [x86_64-linux] 2.10650300979614
ruby 1.9.3dev (2011-06-14 trunk 32073) [x86_64-linux] 2.13981986045837
ruby 1.9.3dev (2011-06-14 trunk 32073) [x86_64-linux] 2.23547196388245

Now I am paying for my naive assumption by sitting on the ankles.

Now I am receiving, sitting seiza, the consequences of my naive assumption
Now I am sitting seiza, paying for my naive presumption

Hmm, I'm starting to think I will never once use this idiom in my life

search_binary_handler() calls set_fs(USER_DS); without a second thought, so when starting init, if the target executable isn't on the first path tried, things go haywire afterwards. They're discussing how this bug already exists in 2.6.12.

Then, when Andrew Morton grumbles that the tree imported from BitKeeper has all sorts of commits mixed together and the logs are unreadable, Linus gallantly appears and says "use tglx's linux-history tree."

Hey, when did that come into existence?
This is handy, though.




From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Tue, 7 Jun 2011 19:00:24 -0700
Message-ID: <BANLkTinwMyVsR-pvimeVkcqQrCuNDK4zKw@mail.gmail.com>
Subject: Re: [PATCH] init: use KERNEL_DS when trying to start init process

On Mon, Jun 6, 2011 at 4:12 PM, Andrew Morton wrote:
>
> I tried to work out how that set_fs() got there, in the historical git
> tree but it's part of 14592fa9:
>
> 73 files changed, 963 insertions(+), 798 deletions(-)
>
> which is pretty useless (what's up with that?)

Use tglx's more complete linux-history tree:

http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=summary

instead of the bkcvs import tree.

That said, that commit (it's commit ID 4095b99c09e3d in tglx's tree)
predates the "real" BK history too: it's part of the (limited) 2.4.x
history that was imported from the release patches into BK at the
beginning of the use of BK. So at that point we didn't do individual
commits, it's just the import of the v2.4.3.7 -> v2.4.3.8 patch.

But yeah, it's old and crufty. And I agree that usually the correct
fix is to remove the set_fs() calls entirely.

Linus

