July 2008

Apparently the flight to Ottawa had spare business class seats, and I was told, "You can sit there."
Oho.

sysrq: new feature to show backtraces of all CPUs

This is listed as a new feature in 2.6.26. What it does: send "l" to Magic SysRq and you get the point each CPU is currently executing.
Usage example below:


# ./hackbench 10 process 10000 &
# echo l > /proc/sysrq-trigger

SysRq : Show backtrace of all active CPUs

Call Trace:
[] show_stack+0x80/0xa0
sp=e00000005ab8f8a0 bsp=e00000005ab811f0
[] showacpu+0xa0/0xe0
sp=e00000005ab8fa70 bsp=e00000005ab811d0
[] handle_IPI+0x210/0x3a0
sp=e00000005ab8fa70 bsp=e00000005ab81160
[] handle_IRQ_event+0x80/0x120
sp=e00000005ab8fa70 bsp=e00000005ab81128
[] __do_IRQ+0x140/0x420
sp=e00000005ab8fa70 bsp=e00000005ab810c8
[] ia64_handle_irq+0x3f0/0x420
sp=e00000005ab8fa70 bsp=e00000005ab81050
[] ia64_native_leave_kernel+0x0/0x270
sp=e00000005ab8fa70 bsp=e00000005ab81050
[] sock_alloc_send_skb+0x470/0x560
sp=e00000005ab8fc40 bsp=e00000005ab80fb8
[] unix_stream_sendmsg+0x3b0/0x6a0
sp=e00000005ab8fc70 bsp=e00000005ab80f18
[] sock_aio_write+0x260/0x2a0
sp=e00000005ab8fca0 bsp=e00000005ab80ed8
[] do_sync_write+0x170/0x260
sp=e00000005ab8fd20 bsp=e00000005ab80e88
[] vfs_write+0x310/0x320
sp=e00000005ab8fe20 bsp=e00000005ab80e38
[] sys_write+0x70/0xe0
sp=e00000005ab8fe20 bsp=e00000005ab80db8
[] ia64_ret_from_syscall+0x0/0x20
sp=e00000005ab8fe30 bsp=e00000005ab80db8
[] __kernel_syscall_via_break+0x0/0x20
sp=e00000005ab90000 bsp=e00000005ab80db8

Call Trace:
[] show_stack+0x80/0xa0
sp=e0000000670ef8b0 bsp=e0000000670e1258
[] showacpu+0xa0/0xe0
sp=e0000000670efa80 bsp=e0000000670e1238
[] handle_IPI+0x210/0x3a0
sp=e0000000670efa80 bsp=e0000000670e11c0
[] handle_IRQ_event+0x80/0x120
sp=e0000000670efa80 bsp=e0000000670e1188
[] __do_IRQ+0x140/0x420
sp=e0000000670efa80 bsp=e0000000670e1128
[] ia64_handle_irq+0x3f0/0x420
sp=e0000000670efa80 bsp=e0000000670e10b0
[] ia64_native_leave_kernel+0x0/0x270
sp=e0000000670efa80 bsp=e0000000670e10b0
[] unix_write_space+0x50/0x140
sp=e0000000670efc50 bsp=e0000000670e1070
[] sock_wfree+0x120/0x180
sp=e0000000670efc50 bsp=e0000000670e1048
[] skb_release_all+0x100/0x1a0
sp=e0000000670efc50 bsp=e0000000670e1020
[] __kfree_skb+0x20/0x1a0
sp=e0000000670efc50 bsp=e0000000670e1000
[] kfree_skb+0x60/0xc0
sp=e0000000670efc50 bsp=e0000000670e0fd8
[] unix_stream_recvmsg+0x360/0xa60
sp=e0000000670efc50 bsp=e0000000670e0f08
[] sock_aio_read+0x260/0x2a0
sp=e0000000670efca0 bsp=e0000000670e0ec8
[] do_sync_read+0x170/0x260
sp=e0000000670efd20 bsp=e0000000670e0e78
[] vfs_read+0x310/0x320
sp=e0000000670efe20 bsp=e0000000670e0e28
[] sys_read+0x70/0xe0
sp=e0000000670efe20 bsp=e0000000670e0da8
[] ia64_ret_from_syscall+0x0/0x20
sp=e0000000670efe30 bsp=e0000000670e0da8
[] __kernel_syscall_via_break+0x0/0x20
sp=e0000000670f0000 bsp=e0000000670e0da8

Call Trace:
[] show_stack+0x80/0xa0
sp=e00000005f8efa80 bsp=e00000005f8e1048
[] showacpu+0xa0/0xe0
sp=e00000005f8efc50 bsp=e00000005f8e1028
[] handle_IPI+0x210/0x3a0
sp=e00000005f8efc50 bsp=e00000005f8e0fb0
[] handle_IRQ_event+0x80/0x120
sp=e00000005f8efc50 bsp=e00000005f8e0f78
[] __do_IRQ+0x140/0x420
sp=e00000005f8efc50 bsp=e00000005f8e0f18
[] ia64_handle_irq+0x3f0/0x420
sp=e00000005f8efc50 bsp=e00000005f8e0ea0
[] ia64_native_leave_kernel+0x0/0x270
sp=e00000005f8efc50 bsp=e00000005f8e0ea0
[] inotify_inode_queue_event+0x0/0x200
sp=e00000005f8efe20 bsp=e00000005f8e0e78
[] vfs_read+0x270/0x320
sp=e00000005f8efe20 bsp=e00000005f8e0e28
[] sys_read+0x70/0xe0
sp=e00000005f8efe20 bsp=e00000005f8e0da8
[] ia64_ret_from_syscall+0x0/0x20
sp=e00000005f8efe30 bsp=e00000005f8e0da8
[] __kernel_syscall_via_break+0x0/0x20
sp=e00000005f8f0000 bsp=e00000005f8e0da8

Call Trace:
[] show_stack+0x80/0xa0
sp=e0000000500afa80 bsp=e0000000500a1080
[] showacpu+0xa0/0xe0
sp=e0000000500afc50 bsp=e0000000500a1060
[] handle_IPI+0x210/0x3a0
sp=e0000000500afc50 bsp=e0000000500a0fe8
[] handle_IRQ_event+0x80/0x120
sp=e0000000500afc50 bsp=e0000000500a0fb0
[] __do_IRQ+0x140/0x420
sp=e0000000500afc50 bsp=e0000000500a0f50
[] ia64_handle_irq+0x3f0/0x420
sp=e0000000500afc50 bsp=e0000000500a0ed8
[] ia64_native_leave_kernel+0x0/0x270
sp=e0000000500afc50 bsp=e0000000500a0ed8
[] rw_verify_area+0xf0/0x1a0
sp=e0000000500afe20 bsp=e0000000500a0e78
[] vfs_read+0x120/0x320
sp=e0000000500afe20 bsp=e0000000500a0e28
[] sys_read+0x70/0xe0
sp=e0000000500afe20 bsp=e0000000500a0da8
[] ia64_ret_from_syscall+0x0/0x20
sp=e0000000500afe30 bsp=e0000000500a0da8
[] __kernel_syscall_via_break+0x0/0x20
sp=e0000000500b0000 bsp=e0000000500a0da8

Call Trace:
[] show_stack+0x80/0xa0
sp=e00000006ceefa80 bsp=e00000006cee0fe8
[] showacpu+0xa0/0xe0
sp=e00000006ceefc50 bsp=e00000006cee0fc8
[] handle_IPI+0x210/0x3a0
sp=e00000006ceefc50 bsp=e00000006cee0f58
[] handle_IRQ_event+0x80/0x120
sp=e00000006ceefc50 bsp=e00000006cee0f20
[] __do_IRQ+0x140/0x420
sp=e00000006ceefc50 bsp=e00000006cee0ec0
[] ia64_handle_irq+0x3f0/0x420
sp=e00000006ceefc50 bsp=e00000006cee0e48
[] ia64_native_leave_kernel+0x0/0x270
sp=e00000006ceefc50 bsp=e00000006cee0e48
[] fget_light+0x30/0x1a0
sp=e00000006ceefe20 bsp=e00000006cee0e28
[] sys_read+0x30/0xe0
sp=e00000006ceefe20 bsp=e00000006cee0da8
[] ia64_ret_from_syscall+0x0/0x20
sp=e00000006ceefe30 bsp=e00000006cee0da8
[] __kernel_syscall_via_break+0x0/0x20
sp=e00000006cef0000 bsp=e00000006cee0da8

Call Trace:
[] show_stack+0x80/0xa0
sp=e000000065dafa80 bsp=e000000065da1080
[] showacpu+0xa0/0xe0
sp=e000000065dafc50 bsp=e000000065da1060
[] handle_IPI+0x210/0x3a0
sp=e000000065dafc50 bsp=e000000065da0fe8
[] handle_IRQ_event+0x80/0x120
sp=e000000065dafc50 bsp=e000000065da0fb0
[] __do_IRQ+0x140/0x420
sp=e000000065dafc50 bsp=e000000065da0f50
[] ia64_handle_irq+0x3f0/0x420
sp=e000000065dafc50 bsp=e000000065da0ed8
[] ia64_native_leave_kernel+0x0/0x270
sp=e000000065dafc50 bsp=e000000065da0ed8
[] rw_verify_area+0x100/0x1a0
sp=e000000065dafe20 bsp=e000000065da0e78
[] vfs_read+0x120/0x320
sp=e000000065dafe20 bsp=e000000065da0e28
[] sys_read+0x70/0xe0
sp=e000000065dafe20 bsp=e000000065da0da8
[] ia64_ret_from_syscall+0x0/0x20
sp=e000000065dafe30 bsp=e000000065da0da8
[] __kernel_syscall_via_break+0x0/0x20
sp=e000000065db0000 bsp=e000000065da0da8

Call Trace:
[] show_stack+0x80/0xa0
sp=e000000069befa90 bsp=e000000069be0f70
[] showacpu+0xa0/0xe0
sp=e000000069befc60 bsp=e000000069be0f50
[] handle_IPI+0x210/0x3a0
sp=e000000069befc60 bsp=e000000069be0ee0
[] handle_IRQ_event+0x80/0x120
sp=e000000069befc60 bsp=e000000069be0ea8
[] __do_IRQ+0x140/0x420
sp=e000000069befc60 bsp=e000000069be0e48
[] ia64_handle_irq+0x3f0/0x420
sp=e000000069befc60 bsp=e000000069be0dc8
[] ia64_native_leave_kernel+0x0/0x270
sp=e000000069befc60 bsp=e000000069be0dc8
[] sys_read+0x0/0xe0
sp=e000000069befe30 bsp=e000000069be0da8
[] ia64_ret_from_syscall+0x0/0x20
sp=e000000069befe30 bsp=e000000069be0da8
[] __kernel_syscall_via_break+0x0/0x20
sp=e000000069bf0000 bsp=e000000069be0da8
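
The same trigger can be driven from a program instead of the shell. The following is only a minimal sketch: the helper name sysrq_send is hypothetical (it is not a kernel or libc API), and it simply writes one command character to a trigger file. On a real system the path would be /proc/sysrq-trigger, which needs root and a kernel built with Magic SysRq support.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Write a single SysRq command character to trigger_path.
 * Hypothetical helper for illustration; the path is a parameter so the
 * function can be tried against an ordinary file first.
 * Returns 0 on success, -1 on any failure.
 */
int sysrq_send(const char *trigger_path, char key)
{
    int fd = open(trigger_path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* The trigger file takes exactly one command character per write. */
    ssize_t n = write(fd, &key, 1);
    int rc = close(fd);

    return (n == 1 && rc == 0) ? 0 : -1;
}
```

Calling sysrq_send("/proc/sysrq-trigger", 'l') as root would request the all-CPU backtrace shown above; the output still goes to the kernel log, not to the caller.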


From the Kernel Watch rejected-items pile.
A new version of LTP has been released. The highlights this time are:

* Addition of timerfd(), utimensat(), gettid() & io_cancel() tests,
* Addition of CPU & MEMORY HOTPLUG tests,
* Addition of Process Event Connector tests,
* Addition of Hackbench test,
* RT tests fix for START_LATENCY,
* FS_BIND fix for ia64 & for kernels below 2.6.15,
* SE-Linux fix to build against the latest refpolicy headers,
* Concurrency Fixes for some tests.


However, the memory hotplug tests are apparently pretty hopeless.

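From the Kernel Watch rejected-items pile

Vaidyanathan Srinivasan has been working on the scheduler and proposed that, after running

echo 1 > /sys/devices/system/cpu/sched_mc_power_savings

the scheduler should try to get the work done on as few CPUs as possible. The goal is apparently to add a power-saving feature.
Then, after a long-winded argument about the interface, the thread arrived at the out-of-left-field conclusion that, alongside nice and ionice, there should be a powernice.
Yes, with it a process would run in power-saving mode.

Of course, there is no implementation yet.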

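From the Kernel Watch rejected-items pile
Just before the linux-2.6.26 release, Linus added a format extension to printk. (That's right: code written by Linus always goes straight into mainline, merge window or not.)

Specifically:
%pS: print the symbol name for a pointer (data)
%pF: print the symbol name for a pointer (function)

Those two. The reason %pS alone is not enough is that function pointers are handled specially on IA64 and similar architectures.
The original plan was %S, but someone pointed out that gcc's __attribute__((format(printf, ..))) would warn about it, so the conclusion was that a two-character format was the only option.

Matthew Wilcox's comment on that was that

printk("Function %pSucks\n", sys_open);

becomes incompatible. Not that such code is likely to exist in the kernel anyway (^^)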
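
To make the incompatibility concrete, here is a rough userspace sketch of how a format scanner that knows the new extension would see such a string. It is not the kernel's vsnprintf code; the function name count_symbol_specifiers is made up for illustration, and the scanner is deliberately simplified (it ignores flags and field widths between '%' and 'p').

```c
/*
 * Count occurrences of "%pS" or "%pF" in a printf-style format string.
 * Illustrative sketch only: "%%" is skipped as a literal percent sign,
 * and anything else after '%' is ignored.
 */
int count_symbol_specifiers(const char *fmt)
{
    int count = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (p[0] != '%')
            continue;
        if (p[1] == '%') {
            p++;            /* literal "%%": skip the second '%' too */
            continue;
        }
        /* With the extension, the 'S' or 'F' after "%p" is consumed
         * as part of the specifier instead of being printed. */
        if (p[1] == 'p' && (p[2] == 'S' || p[2] == 'F'))
            count++;
    }
    return count;
}
```

For "Function %pSucks\n" this returns 1: the "S" that used to be ordinary text after a plain %p is now taken as part of the specifier, which is exactly the case Matthew Wilcox was joking about.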


That's right. LKML is the 2ch of the English-speaking world, so no discussion moves forward unless somebody says Suck or Crap!

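For an article I'm writing, I'm rereading the firmware discussion thread.
There is no technical content in it at all, so frankly it's a slog.

Still, the following mail made me laugh: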


> It almost never happens that you have kernel versions which _need_
> different firmware installed. In almost all cases, the older driver will
> continue to work just fine with the newer firmware (and its bug-fixes).

I'm not sure which planet you're from, but it's one without ipw2200
chips in it. And in any case, the file names change.



David Woodhouse, flatly contradicted, in tears.
Honestly, the tone of conversation on LKML really is just like 2ch.

Someone showed up on LKML saying "let's make GPL v4!"


From: "Morton Harrow"
To: linux-kernel@vger.kernel.org
Date: Thu, 17 Jul 2008 02:09:53 +0800
Subject: GPL version 4

Dear gentlemen (and included list-members),

Let me first introduce myself. My name is Morton Harrow, senior GNU/Linux consultant in the London metropolitan area. I have been around in the Open Source world since the early beginning. I am very happy with the spirit and efforts of the Free Software Foundation (FSF).

As the name mentions “free”, one would think this organisation embraces real freedom. I can't help but feel that the FSF has made a mistake with the release of the third version of the GPL (GPLv3). This license restricts the freedom and usage of open source software for governments, companies and end-users alike.
Linking from other software which is not regarded by the FSF as free software, is not allowed by this license. I can't help but wonder if this is the freedom the FSF intensions. Real free should be that users are allowed link any software against GPL licensed software, without restrictions. But the current “freedom” restricts the spirit of Richard M. Stallman's original vision on a free world.

We propose to release as soon as possible, version 4 of the General Public License.

The GPL version 4 will accept every other license, accepted by the Open Source Initiative as open source. Corporate usage of GPL released software should be possible without restrictions. Linking from closed source software to GPLv4 software and libraries will be permitted. GPLv4 software can be shipped in (commercial) closed source software. Only this and the original authors need to be mentioned. Also, I believe the copyright of the FSF software should be transferred to the United Nations. As “human knowledge belongs to the world.”

Our planned release date of GPLv4 is 15th September 2008. The first software to be released under the terms of this new license, will be a continuation of the stalled ReiserFS project. As the FSF headers allow software to be released under the terms of the GPLv2 or higher, we will prepare automatic relicensing of GPLv2 and GPLv3 software to the GPLv4.

If you have any questions or comments, please feel free to contact me.

With kind regards,

Morton Harrow


=


--
Powered by Outblaze
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Adrian Bunk promptly replied:


ROTFL

I've double-checked my calendar - today isn't April 1st.

Please don't feed the troll.



Wow. Not taken seriously at all.

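On LKML, Eric Rannaud posted under the title "madvise(2) MADV_SEQUENTIAL behavior", asking why, when he has told the kernel the access is sequential, the pages aren't thrown away after they are read.

A perfectly fair point, but frankly I think the answer is that it's a hassle to implement.

Also, used-once pages accessed through mmap are hard to handle.
The access bit gets set the moment a page is touched through mmap, so normally the page drops out of the immediate-reclaim candidates right there.

Once the pile of outstanding work in front of me is gone, I might give it some thought.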


mm/madvise.c and madvise(2) say:

* MADV_SEQUENTIAL - pages in the given range will probably be accessed
* once, so they can be aggressively read ahead, and
* can be freed soon after they are accessed.


But as the sample program at the end of this post shows, and as I
understand the code in mm/filemap.c, MADV_SEQUENTIAL will only increase
the amount of read ahead for the specified page range, but will not
influence the rate at which the pages just read will be freed from
memory.

Running the sample program on a large file, say 4GB on a machine with
3GB of RAM, the resident size of the program will grow enough to evict
pretty much everything else. (on 2.6.25.9-40.fc8)

Right before the program below is done reading the 4GB file:

7f6c3e654000-7f6d3e654000 r--s 00000000 fd:02 98125 /tmp/bigfile
Size: 4194304 kB
Rss: 2472220 kB
Pss: 2472220 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 2472220 kB
Private_Dirty: 0 kB
Referenced: 718748 kB


I'm well aware that the kernel is free to ignore the advice given
through madvise(2) (fadvise(2) seems to behave similarly, btw), so I'm
certainly not claiming this is a bug. However, I was wondering what was
the rationale behind it, and whether the manpages should be updated to
be more accurate.

There is a very straightforward workaround: MADV_DONTNEED on the range
just read, every so often, will be very effective at controlling the
resident size of the mapping. (mm/madvise.c:madvise_dontneed() calls
zap_page_range())

Thanks.



---
# dd if=/dev/zero of=/tmp/bigfile bs=1024 count=$((4*1024*1024))
# gcc test.c
# Run:
file=/tmp/bigfile; ./a.out $file & pid=$! ; while true; do cat /proc/$pid/smaps | grep -A 8 $file; sleep 1; done

# cat test.c

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2)
        return -EINVAL;

    char *fn = argv[1];
    int fd = open(fn, O_RDONLY);
    if (fd < 0)
        return -errno;

    struct stat st;
    int ret = fstat(fd, &st);
    if (ret)
        return -errno;

    /* Map the whole file read-only. */
    unsigned char *map = mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -errno;

    /* Tell the kernel the mapping will be read once, front to back. */
    ret = madvise(map, st.st_size, MADV_SEQUENTIAL);
    if (ret) {
        fprintf(stderr, "madvise failed\n");
        return -errno;
    }

    const int pagesize = sysconf(_SC_PAGESIZE);
    unsigned char dummy = 0;
    off_t i;

    /* Touch one byte per page so every page is faulted in. */
    for (i = 0; i < st.st_size; i += pagesize) {
        dummy += map[i];
    }

    munmap(map, st.st_size);
    close(fd);

    return dummy;
}






Peter Zijlstra, who had been the loudest opponent, has come around to supporting it, so momentum toward merging has suddenly picked up.

A tuning parameter like that has been added. I hadn't noticed.
Still, it's hard to say when you would actually use it.