openwrt/staging/blogic.git
13 years agoext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()
Theodore Ts'o [Mon, 1 Aug 2011 12:45:02 +0000 (08:45 -0400)]
ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()

Introduce new helper functions which try kmalloc, and then fall back
to vmalloc if necessary, and use them for allocating and deallocating
s_flex_groups.
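
A rough user-space analogy of the try-then-fall-back pattern is sketched below. It is not the kernel code: the fixed-size `arena` stands in for kmalloc, plain malloc() stands in for vmalloc, and the names `fast_alloc`, `is_arena_addr`, `kv_alloc` and `kv_free` are hypothetical. The point it illustrates is that the free helper has to work out which allocator a pointer came from.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_SIZE 4096

/* Small bump arena standing in for the "fast" allocator (assumption). */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Fast path: fails (returns NULL) when the request does not fit. */
void *fast_alloc(size_t size)
{
    size_t aligned = (size + 15) & ~(size_t)15;

    if (aligned < size || aligned > ARENA_SIZE - arena_used)
        return NULL;
    void *p = arena + arena_used;
    arena_used += aligned;
    return p;
}

/* Address-range test used by the free path to pick the right allocator. */
int is_arena_addr(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t base = (uintptr_t)arena;

    return a >= base && a < base + ARENA_SIZE;
}

/* Try the fast allocator first, then fall back to the general one. */
void *kv_alloc(size_t size)
{
    void *p = fast_alloc(size);

    if (!p)
        p = malloc(size);
    return p;
}

/* Free with whichever allocator owns the pointer. */
void kv_free(void *p)
{
    if (!p || is_arena_addr(p))
        return;         /* the bump arena never frees individual objects */
    free(p);
}
```

A 64-byte request is served from the arena, while a 1 MiB request falls through to malloc(); kv_free() accepts either pointer.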

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: use the correct error exit path in ext4_init_inode_table()
Yongqiang Yang [Mon, 1 Aug 2011 10:32:19 +0000 (06:32 -0400)]
ext4: use the correct error exit path in ext4_init_inode_table()

This patch makes ext4_init_inode_table() handle errors correctly.
On the error path, ext4_init_inode_table() should up_write() the
alloc_sem it has down_write()ed and stop the journal handle it started.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: add missing kfree() on error return path in add_new_gdb()
Dan Carpenter [Sat, 30 Jul 2011 16:58:41 +0000 (12:58 -0400)]
ext4: add missing kfree() on error return path in add_new_gdb()

We added some more error handling in b40971426a "ext4: add error
checking to calls to ext4_handle_dirty_metadata()".  But we need to
call kfree() as well to avoid a memory leak.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: change umode_t in tracepoint headers to be an explicit __u16
Theodore Ts'o [Sat, 30 Jul 2011 16:38:46 +0000 (12:38 -0400)]
ext4: change umode_t in tracepoint headers to be an explicit __u16

As requested by Al Viro, since umode_t may be changing to a u32 for
some architectures.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
13 years agoext4: fix races in ext4_sync_parent()
Theodore Ts'o [Sat, 30 Jul 2011 16:34:19 +0000 (12:34 -0400)]
ext4: fix races in ext4_sync_parent()

Fix problems if fsync() races against a rename of a parent directory
as pointed out by Al Viro in his own inimitable way:

>While we are at it, could somebody please explain what the hell is ext4
>doing in
>static int ext4_sync_parent(struct inode *inode)
>{
>        struct writeback_control wbc;
>        struct dentry *dentry = NULL;
>        int ret = 0;
>
>        while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) {
>                ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY);
>                dentry = list_entry(inode->i_dentry.next,
>                                    struct dentry, d_alias);
>                if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode)
>                        break;
>                inode = dentry->d_parent->d_inode;
>                ret = sync_mapping_buffers(inode->i_mapping);
>                ...
>Note that dentry obviously can't be NULL there.  dentry->d_parent is never
>NULL.  And dentry->d_parent would better not be negative, for crying out
>loud!  What's worse, there's no guarantees that dentry->d_parent will
>remain our parent over that sync_mapping_buffers() *and* that inode won't
>just be freed under us (after rename() and memory pressure leading to
>eviction of what used to be our dentry->d_parent)......

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: Fix overflow caused by missing cast in ext4_fallocate()
Utako Kusaka [Thu, 28 Jul 2011 02:11:20 +0000 (22:11 -0400)]
ext4: Fix overflow caused by missing cast in ext4_fallocate()

The logical block number in map.l_blk is a __u32, and so before we
shift it left by the block size, we need to cast it to a 64-bit size.

Otherwise i_size can be corrupted on an ENOSPC.

# df -T /mnt/mp1
Filesystem    Type   1K-blocks      Used Available Use% Mounted on
/dev/sda6     ext4     9843276    153056   9190200   2% /mnt/mp1
# fallocate -o 0 -l 2199023251456 /mnt/mp1/testfile
fallocate: /mnt/mp1/testfile: fallocate failed: No space left on device
# stat /mnt/mp1/testfile
  File: `/mnt/mp1/testfile'
  Size: 4293656576 Blocks: 19380440   IO Block: 4096   regular file
Device: 806h/2054d Inode: 12          Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2011-07-25 13:01:31.414490496 +0900
Modify: 2011-07-25 13:01:31.414490496 +0900
Change: 2011-07-25 13:01:31.454490495 +0900
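
The effect of the missing cast can be shown in isolation. This is a minimal sketch, assuming a 32-bit unsigned int; the function names are hypothetical and do not appear in fs/ext4/extents.c.

```c
#include <stdint.h>

/* Buggy form: the shift is evaluated in 32 bits, so the high bits
 * are lost before the result is widened to 64 bits. */
uint64_t byte_offset_no_cast(uint32_t lblk, unsigned int blkbits)
{
    return lblk << blkbits;
}

/* Fixed form: widen to 64 bits first, then shift. */
uint64_t byte_offset_cast(uint32_t lblk, unsigned int blkbits)
{
    return (uint64_t)lblk << blkbits;
}
```

With 4K blocks (blkbits = 12), logical block 1 << 21 lies at byte offset 1 << 33, but the uncast version wraps around to 0.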

Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
--
 fs/ext4/extents.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

13 years agoext4: add action of moving index in ext4_ext_rm_idx for Punch Hole
Robin Dong [Thu, 28 Jul 2011 01:29:33 +0000 (21:29 -0400)]
ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole

The old ext4_ext_rm_idx() function only handles the truncate case,
because it just removes the last index in the extent index block. When
punching a hole, we usually need to remove a "middle" index, so we must
move the indexes that follow it forward.

(I created a file with a depth-1 extent tree and punched a hole in the
middle of it; the last index in the index block strangely disappeared,
which is how I found this bug.)
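
Removing a "middle" entry and moving the following entries forward can be sketched as below. `struct ext_idx` and `remove_idx` are hypothetical stand-ins, not the on-disk ext4 index layout or the patched function.

```c
#include <stddef.h>
#include <string.h>

/* Simplified index entry (assumption: not the real on-disk format). */
struct ext_idx {
    unsigned int first_block;   /* first logical block covered */
    unsigned long long leaf;    /* block number of the child node */
};

/* Remove entry 'pos' from an array of 'count' entries and close the gap.
 * Returns the new entry count. */
size_t remove_idx(struct ext_idx *ix, size_t count, size_t pos)
{
    if (pos >= count)
        return count;           /* nothing to remove */

    /* Move every entry after 'pos' one slot forward. When 'pos' is the
     * last entry this copies zero bytes, which is the truncate case. */
    memmove(&ix[pos], &ix[pos + 1], (count - pos - 1) * sizeof(ix[0]));
    return count - 1;
}
```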

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: simplify parameters of reserve_backup_gdb()
Yongqiang Yang [Thu, 28 Jul 2011 01:23:13 +0000 (21:23 -0400)]
ext4: simplify parameters of reserve_backup_gdb()

The reserve_backup_gdb() function only needs the block group number;
there's no need to pass a pointer to struct ext4_new_group_data to it.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
13 years agoext4: simplify parameters of add_new_gdb()
Yongqiang Yang [Thu, 28 Jul 2011 01:16:33 +0000 (21:16 -0400)]
ext4: simplify parameters of add_new_gdb()

add_new_gdb() only needs the block group number; there is no need to
pass a pointer to struct ext4_new_group_data to add_new_gdb().
Instead of filling in a pointer to the struct buffer_head in
add_new_gdb(), it's simpler to have the caller fetch it from the
s_group_desc[] array.

[Fixed error path to handle the case where struct buffer_head *primary
 hasn't been set yet. -- Ted]

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: remove lock_buffer in bclean() and setup_new_group_blocks()
Yongqiang Yang [Thu, 28 Jul 2011 00:40:18 +0000 (20:40 -0400)]
ext4: remove lock_buffer in bclean() and setup_new_group_blocks()

There is no need to lock the buffers since no one else should be
touching these buffers besides the file system.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: simplify journal handling in setup_new_group_blocks()
Yongqiang Yang [Wed, 27 Jul 2011 02:24:41 +0000 (22:24 -0400)]
ext4: simplify journal handling in setup_new_group_blocks()

This patch simplifies journal handling in setup_new_group_blocks().

In the previous code, the block bitmap was modified throughout
setup_new_group_blocks(), and ext4_get_write_access() in
extend_or_restart_transaction() was used to guarantee that the block
bitmap stays in the new handle, which made things complicated.

The previous commit changed things so that the modifications on the
block bitmap are batched and done by ext4_set_bits() at the end of the
for loop.  This allows us to simplify things.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: let setup_new_group_blocks() set multiple bits at a time
Yongqiang Yang [Wed, 27 Jul 2011 02:05:53 +0000 (22:05 -0400)]
ext4: let setup_new_group_blocks() set multiple bits at a time

Rename mb_set_bits() to ext4_set_bits() and make it a global function
so that setup_new_group_blocks() can use it.
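
For readers unfamiliar with the helper, setting a run of bits in a bitmap with one call looks roughly like this. It is a plain bit-by-bit sketch with a hypothetical name (`set_bits`); the kernel helper's signature and implementation may differ.

```c
#include <limits.h>

/* Set 'len' consecutive bits starting at bit 'start'.
 * Bit 0 is the least significant bit of bitmap[0]. */
void set_bits(unsigned char *bitmap, unsigned int start, unsigned int len)
{
    for (unsigned int bit = start; bit < start + len; bit++)
        bitmap[bit / CHAR_BIT] |= (unsigned char)(1u << (bit % CHAR_BIT));
}
```

A caller can then mark a whole range, for example ten blocks starting at block 3, with a single call instead of one call per bit.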

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: fix a typo in ext4_group_extend()
Yongqiang Yang [Wed, 27 Jul 2011 01:53:35 +0000 (21:53 -0400)]
ext4: fix a typo in ext4_group_extend()

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: let ext4_group_add_blocks() handle 0 blocks quickly
Yongqiang Yang [Wed, 27 Jul 2011 01:51:08 +0000 (21:51 -0400)]
ext4: let ext4_group_add_blocks() handle 0 blocks quickly

If ext4_group_add_blocks() is called with 0 blocks, make it return 0
without doing any extra work.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: let ext4_group_add_blocks() return an error code
Yongqiang Yang [Wed, 27 Jul 2011 01:46:07 +0000 (21:46 -0400)]
ext4: let ext4_group_add_blocks() return an error code

This patch lets ext4_group_add_blocks() return an error code if it
fails, so that callers can handle the error correctly.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: rename ext4_add_groupblocks() to ext4_group_add_blocks()
Yongqiang Yang [Wed, 27 Jul 2011 01:43:56 +0000 (21:43 -0400)]
ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks()

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: prevent a fs with errors from being resized
Yongqiang Yang [Wed, 27 Jul 2011 01:39:09 +0000 (21:39 -0400)]
ext4: prevent a fs with errors from being resized

A filesystem with errors must not be resized; otherwise, it is easy
to destroy the filesystem.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: prevent parallel resizers by atomic bit ops
Yongqiang Yang [Wed, 27 Jul 2011 01:35:44 +0000 (21:35 -0400)]
ext4: prevent parallel resizers by atomic bit ops

Before this patch, parallel resizers were allowed and protected by a
mutex. In fact there is no need to support parallel resizers, so this
patch prevents them with atomic bit ops, as lock_page() and
unlock_page() do.

To do this, the patch removes the mutex s_resize_lock from struct
ext4_sb_info and adds an unsigned long field named s_resize_flags
which indicates whether a resizer is running.
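
The idea of rejecting a second resizer with an atomic test-and-set, rather than making it wait on a mutex, can be sketched in user space with C11 atomics. The names `resize_flags`, `RESIZING_BIT`, `resize_begin` and `resize_end` are hypothetical, and the kernel uses its own bit-op primitives rather than <stdatomic.h>.

```c
#include <errno.h>
#include <stdatomic.h>

#define RESIZING_BIT 0UL

/* Plays the role of s_resize_flags (assumption: one global instance). */
static atomic_ulong resize_flags;

/* Atomically set the bit; fail if somebody else already holds it. */
int resize_begin(void)
{
    unsigned long mask = 1UL << RESIZING_BIT;
    unsigned long old = atomic_fetch_or(&resize_flags, mask);

    return (old & mask) ? -EBUSY : 0;
}

/* Clear the bit so a later resize can start. */
void resize_end(void)
{
    atomic_fetch_and(&resize_flags, ~(1UL << RESIZING_BIT));
}
```

A second caller of resize_begin() gets -EBUSY immediately instead of sleeping until the first resize completes.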

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: fix data corruption in inodes with journalled data
Jan Kara [Tue, 26 Jul 2011 13:07:11 +0000 (09:07 -0400)]
ext4: fix data corruption in inodes with journalled data

When journalling data for an inode (either because it is a symlink or
because the filesystem is mounted in data=journal mode), ext4_evict_inode()
can discard unwritten data by calling truncate_inode_pages(). This is
because we don't mark the buffer / page dirty when journalling data but only
add the buffer to the running transaction, and thus mm does not know there
is still unwritten data.

Fix the problem by carefully tracking transaction containing inode's data,
committing this transaction, and writing uncheckpointed buffers when inode
should be reaped.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: correct comment for ext4_ext_check_cache
Robin Dong [Sun, 24 Jul 2011 01:53:25 +0000 (21:53 -0400)]
ext4: correct comment for ext4_ext_check_cache

The comment for ext4_ext_check_cache has a little mistake.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: correct the debug message in ext4_ext_insert_extent
Robin Dong [Sun, 24 Jul 2011 01:51:07 +0000 (21:51 -0400)]
ext4: correct the debug message in ext4_ext_insert_extent

The debug message in ext4_ext_insert_extent before moving extent
is incorrect (the "from xx to xx").

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: remove unused argument in ext4_ext_next_leaf_block
Robin Dong [Sun, 24 Jul 2011 01:49:07 +0000 (21:49 -0400)]
ext4: remove unused argument in ext4_ext_next_leaf_block

The "inode" argument of ext4_ext_next_allocated_block() appears to be
unused, so remove it.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: remove ac_repeats from ext4_allocation_context
Tao Ma [Sat, 23 Jul 2011 20:18:55 +0000 (16:18 -0400)]
ext4: remove ac_repeats from ext4_allocation_context

ac_repeats isn't referenced in the mballoc code. So remove it.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: don't increment s_mb_buddies_generated in ext4_mb_release
Tao Ma [Sat, 23 Jul 2011 20:18:05 +0000 (16:18 -0400)]
ext4: don't increment s_mb_buddies_generated in ext4_mb_release

In ext4_mb_release(), we use s_mb_buddies_generated++.  Although
the output is OK, I don't think we need this extra ++.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: remove unnecessary ext4_get_group_info in ext4_mb_load_buddy
Tao Ma [Sat, 23 Jul 2011 20:07:26 +0000 (16:07 -0400)]
ext4: remove unnecessary ext4_get_group_info in ext4_mb_load_buddy

ext4_mb_load_buddy() calls ext4_get_group_info() for setting both
"grp" and "e4b->bd_info", but it could do "e4b->bd_info = grp".

Reported-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: avoid eh_entries overflow before insert extent_idx
Robin Dong [Mon, 18 Jul 2011 03:43:42 +0000 (23:43 -0400)]
ext4: avoid eh_entries overflow before insert extent_idx

If eh_entries is equal to (or greater than) eh_max, inserting a new
extent_idx will overflow the number of entries.
So check eh_entries before inserting the new extent_idx.

Although the current code cannot hit this bug (ext4_ext_insert_index()
is called by ext4_ext_split(), and ext4_ext_split() is called only if
the index block has free space), the right logic is to check the
capacity before insertion.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: avoid wasted extent cache lookup if !PUNCH_OUT_EXT
Robin Dong [Mon, 18 Jul 2011 03:27:43 +0000 (23:27 -0400)]
ext4: avoid wasted extent cache lookup if !PUNCH_OUT_EXT

This patch avoids an extraneous lookup of the extent cache
in ext4_ext_map_blocks() when the flag
EXT4_GET_BLOCKS_PUNCH_OUT_EXT is absent.

The existing logic was performing the lookup but not making
use of the result. The patch simply reverses the order of evaluation
in the condition.

Since ext4_ext_in_cache() does not initialize newex on misses, bypassing
its invocation does not introduce any new issue in this regard.
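
The effect of reordering a short-circuit condition can be seen in a small stand-alone sketch. Everything here is hypothetical (`cache_lookup`, `PUNCH_OUT_FLAG`, the lookup counter); it only follows the description above, in which the flag test gates the lookup, and does not reproduce the actual condition in ext4_ext_map_blocks().

```c
#define PUNCH_OUT_FLAG 0x1u

static unsigned int lookups;    /* counts how often the cache is probed */

/* Stand-in for an extent cache probe: pretend even blocks are cached. */
int cache_lookup(unsigned int block)
{
    lookups++;
    return block % 2 == 0;
}

unsigned int lookup_count(void)
{
    return lookups;
}

/* Old order: the lookup always runs, even when its result is unused. */
int check_old(unsigned int flags, unsigned int block)
{
    return cache_lookup(block) && (flags & PUNCH_OUT_FLAG) != 0;
}

/* New order: the cheap flag test runs first and short-circuits the lookup. */
int check_new(unsigned int flags, unsigned int block)
{
    return (flags & PUNCH_OUT_FLAG) != 0 && cache_lookup(block);
}
```

Both versions return the same value for the same inputs; only the number of lookups performed differs.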

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Eric Gouriou <egouriou@google.com>
13 years agoext4: remove unneeded parameter to ext4_ext_remove_space()
Allison Henderson [Mon, 18 Jul 2011 03:21:03 +0000 (23:21 -0400)]
ext4: remove unneeded parameter to ext4_ext_remove_space()

This patch removes the extra parameter in ext4_ext_remove_space()
which is no longer needed.

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: punch hole optimizations: skip un-needed extent lookup
Allison Henderson [Mon, 18 Jul 2011 03:17:02 +0000 (23:17 -0400)]
ext4: punch hole optimizations: skip un-needed extent lookup

This patch optimizes the punch hole operation by skipping the
tree walking code that is used by truncate.  Since punch hole
is done through map blocks, the path to the extent is already
known in this function, so we do not need to look it up again.

Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: ignore a stripe width of 1
Dan Ehrenberg [Mon, 18 Jul 2011 01:18:51 +0000 (21:18 -0400)]
ext4: ignore a stripe width of 1

If the stripe width was set to 1, then this patch will ignore
that stripe width and ext4 will act as if the stripe width
were 0 with respect to optimizing allocations.

Signed-off-by: Dan Ehrenberg <dehrenberg@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: make the preallocation size be a multiple of stripe size
Dan Ehrenberg [Mon, 18 Jul 2011 01:11:30 +0000 (21:11 -0400)]
ext4: make the preallocation size be a multiple of stripe size

Previously, if a stripe width was provided, then it would be used
as the preallocation granularity, with no sanity checking and no
way to override this. Now, mb_prealloc_size defaults to the smallest
multiple of stripe size that is greater than or equal to the old
default mb_prealloc_size, and this can be overridden with the sysfs
interface.
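
The new default can be worked out with a simple round-up. This is a sketch only: `effective_stripe` and `default_prealloc` are hypothetical names, treating a stripe of 1 like 0 here is an assumption borrowed from the neighbouring "ignore a stripe width of 1" commit, and overflow is not handled.

```c
/* A stripe of 0 or 1 carries no alignment information. */
unsigned int effective_stripe(unsigned int stripe)
{
    return stripe <= 1 ? 0 : stripe;
}

/* Smallest multiple of the stripe that is >= the old default. */
unsigned int default_prealloc(unsigned int old_default, unsigned int stripe)
{
    stripe = effective_stripe(stripe);
    if (stripe == 0)
        return old_default;
    return ((old_default + stripe - 1) / stripe) * stripe;
}
```

For example, with an old default of 512 blocks and a stripe of 96 blocks, the result is 576 (6 x 96), the first multiple of 96 that is not below 512.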

Signed-off-by: Dan Ehrenberg <dehrenberg@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: fix compilation with -DDX_DEBUG
Bernd Schubert [Sat, 16 Jul 2011 23:41:23 +0000 (19:41 -0400)]
ext4: fix compilation with -DDX_DEBUG

Compilation of ext4/namei.c brought up an error and warning messages
when compiled with -DDX_DEBUG

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: remove unnecessary comments in ext4_orphan_add()
Lukas Czerner [Mon, 11 Jul 2011 22:47:04 +0000 (18:47 -0400)]
ext4: remove unnecessary comments in ext4_orphan_add()

The comment from Al Viro about a possible race in ext4_orphan_add() is
not justified. No race is possible, as we always either hold i_mutex
or the inode cannot be referenced from outside; hence the J_ASSERTs
should not be hit for the reason described in the comment.

This commit replaces it with a note that we are holding i_mutex, so it
should not be possible for i_nlink to change while waiting for
s_orphan_lock.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: Fix a double free of sbi->s_group_info in ext4_mb_init_backend
Tao Ma [Mon, 11 Jul 2011 22:42:42 +0000 (18:42 -0400)]
ext4: Fix a double free of sbi->s_group_info in ext4_mb_init_backend

If we hit an error in ext4_mb_add_groupinfo(), we kfree
sbi->s_group_info[group >> EXT4_DESC_PER_BLOCK_BITS(sb)] but fail to
reset it to NULL. The caller, ext4_mb_init_backend(), then tries to
kfree it again, causing a double free. Fix this by resetting it to NULL.

Some typos in the comments of mballoc.c are also fixed.
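
The free-then-reset idiom behind the double-free fix, in a user-space sketch with a hypothetical helper name (`free_slot`) and free() standing in for kfree():

```c
#include <stdlib.h>

/* Free the object a table slot points to and clear the slot, so that
 * a later cleanup pass over the table cannot free it a second time. */
void free_slot(void **slot)
{
    free(*slot);
    *slot = NULL;
}
```

Because free(NULL) is a no-op (as is kfree(NULL)), a caller that walks the table again on its own error path is harmless once the slot has been cleared.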

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: fix a race which could leak memory in ext4_groupinfo_create_slab()
Tao Ma [Mon, 11 Jul 2011 22:26:01 +0000 (18:26 -0400)]
ext4: fix a race which could leak memory in ext4_groupinfo_create_slab()

In ext4_groupinfo_create_slab(), we create ext4_groupinfo_caches while
holding ext4_grpinfo_slab_create_mutex but set it outside the lock, so
in some cases we may create it twice and leak memory. Fix this by
setting it before we call mutex_unlock().

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: avoid unneeded ext4_ext_next_leaf_block() while inserting extents
Robin Dong [Mon, 11 Jul 2011 22:24:01 +0000 (18:24 -0400)]
ext4: avoid unneeded ext4_ext_next_leaf_block() while inserting extents

Optimize ext4_ext_insert_extent() by avoiding
ext4_ext_next_leaf_block() when the result is not used/needed.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: remove redundant goto in ext4_ext_insert_extent()
Robin Dong [Mon, 11 Jul 2011 15:43:59 +0000 (11:43 -0400)]
ext4: remove redundant goto in ext4_ext_insert_extent()

If eh->eh_entries is smaller than eh->eh_max, the routine goes to
"repeat" and then directly to "has_space", since neither "depth" nor
"eh" has changed.

Therefore, go to "has_space" directly and remove the redundant "repeat"
label.

Signed-off-by: Robin Dong <sanbai@taobao.com>
13 years agoext4: Change the wrong param comment for ext4_trim_all_free
Tao Ma [Mon, 11 Jul 2011 04:04:34 +0000 (00:04 -0400)]
ext4: Change the wrong param comment for ext4_trim_all_free

In the ext4_trim_all_free() comment, there is no longer an @e4b
parameter; it is now @group.

Reported-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: Speed up FITRIM by recording flags in ext4_group_info
Tao Ma [Mon, 11 Jul 2011 04:03:38 +0000 (00:03 -0400)]
ext4: Speed up FITRIM by recording flags in ext4_group_info

In ext4, every time FITRIM is called we iterate over all the groups
and trim them one by one. This wastes time if a group has already been
trimmed and has not changed since the last trim.

So this patch adds a new flag in ext4_group_info->bb_state to indicate
that the group has been trimmed; it is cleared when blocks are freed
(in release_blocks_on_commit). In addition, trim_minlen is added to
ext4_sb_info to record the last minlen used to trim the volume, so
that if the caller provides a smaller one, we go ahead with the trim
regardless of bb_state.
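
The skip decision described above can be sketched as follows. The structures and function names (`struct group`, `struct fs`, `trim_group`, `record_trim_pass`, `blocks_freed`) are hypothetical simplifications, not the ext4 data structures.

```c
#include <stdbool.h>
#include <stdint.h>

struct group {
    bool was_trimmed;           /* set after a trim, cleared on free */
};

struct fs {
    uint64_t last_trim_minlen;  /* minlen used by the previous trim pass */
};

/* Returns true if the group was actually trimmed, false if skipped. */
bool trim_group(const struct fs *fs, struct group *g, uint64_t minlen)
{
    /* Skip only if nothing changed since the last trim and the caller
     * is not asking for smaller extents than last time. */
    if (g->was_trimmed && minlen >= fs->last_trim_minlen)
        return false;

    /* ... discard free extents of at least minlen here ... */
    g->was_trimmed = true;
    return true;
}

/* Remember the minlen of a completed pass over the volume. */
void record_trim_pass(struct fs *fs, uint64_t minlen)
{
    fs->last_trim_minlen = minlen;
}

/* Freeing blocks invalidates the "already trimmed" state. */
void blocks_freed(struct group *g)
{
    g->was_trimmed = false;
}
```

This is why the second and third FITRIM runs in the timings below return almost immediately: every group is skipped until blocks are freed or a smaller minlen is requested.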

A simple test with my intel x25m ssd:
df -h shows:
/dev/sdb1              40G   21G   17G  56% /mnt/ext4
Block size:               4096

run the FITRIM with the following parameter:
range.start = 0;
range.len = UINT64_MAX;
range.minlen = 1048576;

without the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m5.505s
user 0m0.000s
sys 0m1.224s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m5.359s
user 0m0.000s
sys 0m1.178s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m5.228s
user 0m0.000s
sys 0m1.151s

with the patch:
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m5.625s
user 0m0.000s
sys 0m1.269s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m0.002s
user 0m0.000s
sys 0m0.001s
[root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a
real 0m0.002s
user 0m0.000s
sys 0m0.001s

A big improvement for the 2nd and 3rd run.

Even after I delete some big image files, it is still much
faster than iterating the whole disk.

[root@boyu-tm test]# time ./ftrim /mnt/ext4/a
real 0m1.217s
user 0m0.000s
sys 0m0.196s

Cc: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: Add new ext4 trim tracepoints
Tao Ma [Mon, 11 Jul 2011 04:01:52 +0000 (00:01 -0400)]
ext4: Add new ext4 trim tracepoints

Add ext4_trim_extent and ext4_trim_all_free.

Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: speed up group trim with the right free block count
Tao Ma [Mon, 11 Jul 2011 04:00:07 +0000 (00:00 -0400)]
ext4: speed up group trim with the right free block count

When we trim free blocks in an ext4 group, we need to track the free
block count properly and check whether enough free blocks are left
for us to trim. The current solution only subtracts free extents that
are large enough to trim, which isn't appropriate.

Let us see a small example:
a group has 1.5M free, in extents of 300k, 300k, 300k, 300k, 300k,
and minblocks is 1M.  With the current solution, we have to iterate
over the whole group, since these 300k extents are never subtracted
from 1.5M.  But we should actually exit after finding the first 2
free extents: although they can't be trimmed, once their 600K is
subtracted the remaining 3 extents only sum to 900K.
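
The early exit in the example can be checked with a small sketch. `scan_free_extents` is a hypothetical function working on a plain array of extent lengths; it assumes the lengths never sum to more than `group_free`.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk the free extents of a group, counting every extent seen (whether
 * or not it is big enough to trim) against the group's free total.
 * Stops as soon as the free space not yet seen is below minlen.
 * Returns the number of extents examined. */
size_t scan_free_extents(const uint64_t *ext, size_t n,
                         uint64_t group_free, uint64_t minlen,
                         size_t *trimmed)
{
    uint64_t seen = 0;
    size_t visited = 0;

    *trimmed = 0;
    while (visited < n && group_free - seen >= minlen) {
        uint64_t len = ext[visited++];

        if (len >= minlen)
            (*trimmed)++;       /* a real implementation would discard here */
        seen += len;
    }
    return visited;
}
```

With five 300K extents, 1500K free in total and a minlen of 1000K, the loop stops after two extents, because the remaining 900K cannot contain a trimmable extent.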

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: fix trim length underflow with small trim length
Tao Ma [Mon, 11 Jul 2011 03:52:37 +0000 (23:52 -0400)]
ext4: fix trim length underflow with small trim length

In 0f0a25b, we adjust 'len' by s_first_data_block - start, but this
can underflow when blocksize=1K, fstrim_range.len=512 and
fstrim_range.start = 0. In this case, when we run the code:
len -= first_data_blk - start; len underflows to -1ULL.
Although the later last_group check safely limits the trim to the
whole volume, that isn't what the user really wants.

So this patch fixes it. It also adds a check for 'start', as in ext3,
so that we can bail out immediately if the start is invalid.
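
The underflow is easy to reproduce outside the kernel. In the sketch below both function names are hypothetical, and the guard in the checked version is one possible way to avoid the wrap-around, not necessarily the check the patch adds. With 1K blocks, a 512-byte length is 0 blocks and the first data block is 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unchecked adjustment: wraps around when len < first_data_blk - start. */
uint64_t adjust_len_unchecked(uint64_t start, uint64_t len,
                              uint64_t first_data_blk)
{
    if (start < first_data_blk)
        len -= first_data_blk - start;
    return len;
}

/* Checked adjustment: returns false when the requested range ends
 * before the first data block, i.e. there is nothing to trim. */
bool adjust_len_checked(uint64_t *start, uint64_t *len,
                        uint64_t first_data_blk)
{
    if (*start < first_data_blk) {
        uint64_t skip = first_data_blk - *start;

        if (*len <= skip)
            return false;
        *len -= skip;
        *start = first_data_blk;
    }
    return true;
}
```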

Cc: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: add tracepoint for ext4_journal_start
Theodore Ts'o [Mon, 11 Jul 2011 02:37:50 +0000 (22:37 -0400)]
ext4: add tracepoint for ext4_journal_start

This will help debug who is responsible for starting a jbd2 transaction.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agojbd2: remove jbd2_dev_to_name() from jbd2 tracepoints
Theodore Ts'o [Mon, 11 Jul 2011 02:05:08 +0000 (22:05 -0400)]
jbd2: remove jbd2_dev_to_name() from jbd2 tracepoints

Using function calls in TP_printk causes perf heartburn, so print the
MAJOR/MINOR device numbers instead.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails
Jiaying Zhang [Mon, 11 Jul 2011 00:07:25 +0000 (20:07 -0400)]
ext4: free allocated and pre-allocated blocks when check_eofblocks_fl fails

Upon corrupted inode or disk failures, we may fail after we already
allocate some blocks from the inode or take some blocks from the
inode's preallocation list, but before we successfully insert the
corresponding extent to the extent tree. In this case, we should free
any allocated blocks and discard the inode's preallocated blocks
because the entries in the inode's preallocation list may be in an
inconsistent state.

Signed-off-by: Jiaying Zhang <jiayingz@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
13 years agoext4: fix i_blocks/quota accounting when extent insertion fails
Maxim Patlasov [Sun, 10 Jul 2011 23:37:48 +0000 (19:37 -0400)]
ext4: fix i_blocks/quota accounting when extent insertion fails

The current implementation of ext4_free_blocks() always calls
dquot_free_block(). This is sensible in most cases: the blocks
to be freed are associated with the inode and were accounted in quota
and i_blocks some time ago.

However, there is a case where the blocks to free have not yet been
accounted by the time ext4_free_blocks() is called:

1. delalloc is on, write_begin pre-allocated some space in quota
2. write-back happens, ext4 allocates some blocks in ext4_ext_map_blocks()
3. then ext4_ext_map_blocks() gets an error (e.g.  ENOSPC) from
   ext4_ext_insert_extent() and calls ext4_free_blocks().

In this scenario, ext4_free_blocks() calls dquot_free_block(), which
in turn decrements i_blocks for blocks which were not accounted yet
(due to delalloc). After a clean umount, e2fsck reports something like:

> Inode 21, i_blocks is 5080, should be 5128.  Fix<y>?
because i_blocks was erroneously decremented as explained above.

The patch fixes the problem by passing the new flag
EXT4_FREE_BLOCKS_NO_QUOT_UPDATE to ext4_free_blocks(), to request
that the dquot_free_block() call be skipped.
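
In outline, the flag lets a caller free blocks without touching the accounting. The sketch below uses hypothetical names (`FREE_NO_QUOTA_UPDATE`, `struct inode_acct`, `free_blocks`) and collapses quota and i_blocks into a single counter with simplified units; it is not the ext4_free_blocks() implementation.

```c
#include <stdint.h>

#define FREE_NO_QUOTA_UPDATE 0x1u

/* Simplified per-inode accounting (assumption: one counter only). */
struct inode_acct {
    uint64_t i_blocks;
};

void free_blocks(struct inode_acct *ino, uint64_t count, unsigned int flags)
{
    /* ... releasing the blocks themselves is omitted ... */

    /* Skip the accounting update when the blocks were never charged
     * to this inode, as in the delalloc error path described above. */
    if (!(flags & FREE_NO_QUOTA_UPDATE))
        ino->i_blocks -= count;
}
```

Using the numbers from the e2fsck report: freeing 48 never-accounted units without the flag turns a correct i_blocks of 5128 into 5080, while passing the flag leaves it at 5128.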

Signed-off-by: Maxim Patlasov <maxim.patlasov@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
13 years agoext4: remove loop around bio_alloc()
Theodore Ts'o [Thu, 30 Jun 2011 01:44:45 +0000 (21:44 -0400)]
ext4: remove loop around bio_alloc()

These days, bio_alloc() is guaranteed to never fail (as long as nvecs
is less than BIO_MAX_PAGES), so we don't need the loop around the
struct bio allocation.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: quiet 'unused variables' compile warnings
Yongqiang Yang [Tue, 28 Jun 2011 14:19:05 +0000 (10:19 -0400)]
ext4: quiet 'unused variables' compile warnings

Unused variables were deleted.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: refactor duplicated block placement code
Eric Sandeen [Tue, 28 Jun 2011 14:01:31 +0000 (10:01 -0400)]
ext4: refactor duplicated block placement code

I found that ext4_ext_find_goal() and ext4_find_near()
share the same code for returning a coloured start block
based on i_block_group.

We can refactor this into a common function so that they
don't diverge in the future.

Thanks to adilger for suggesting the new function name.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: move ext4_ind_* functions from inode.c to indirect.c
Amir Goldstein [Mon, 27 Jun 2011 23:40:50 +0000 (19:40 -0400)]
ext4: move ext4_ind_* functions from inode.c to indirect.c

This patch moves functions from inode.c to indirect.c.
The moved functions are ext4_ind_* functions and their helpers.
Functions called from inode.c are declared extern.

Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: move common truncate functions to header file
Theodore Ts'o [Mon, 27 Jun 2011 23:16:04 +0000 (19:16 -0400)]
ext4: move common truncate functions to header file

Move two functions that are needed both by inode.c and by the indirect
functions about to be moved to indirect.c into truncate.h as inline
functions, so that we avoid duplicate copies of the functions (which
can be a maintenance problem) without having to expose them as global
functions.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: move __ext4_check_blockref to block_validity.c
Theodore Ts'o [Mon, 27 Jun 2011 23:16:02 +0000 (19:16 -0400)]
ext4: move __ext4_check_blockref to block_validity.c

In preparation for moving the indirect functions to a separate file,
move __ext4_check_blockref() to block_validity.c and rename it to
ext4_check_blockref(), which is exported as a globally visible function.

Also, rename the cpp macro ext4_check_inode_blockref() to
ext4_ind_check_inode(), to make it clear that it is only valid for use
with non-extent mapped inodes.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: rename ext4_indirect_* funcs to ext4_ind_*
Amir Goldstein [Mon, 27 Jun 2011 21:10:28 +0000 (17:10 -0400)]
ext4: rename ext4_indirect_* funcs to ext4_ind_*

We are going to move all ext4_ind_* functions to indirect.c.
Before we do that, let's rename 2 functions called ext4_indirect_*
to ext4_ind_*, to keep to the naming convention.

Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: split ext4_ind_truncate from ext4_truncate
Amir Goldstein [Mon, 27 Jun 2011 20:36:31 +0000 (16:36 -0400)]
ext4: split ext4_ind_truncate from ext4_truncate

We are about to move all indirect inode functions to a new file.
Before we do that, let's split ext4_ind_truncate() out of ext4_truncate()
leaving only generic code in the latter, so we will be able to move
ext4_ind_truncate() to the new file.

Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoext4: fix incorrect error msg in ext4_ext_insert_index
Robin Dong [Mon, 27 Jun 2011 19:35:53 +0000 (15:35 -0400)]
ext4: fix incorrect error msg in ext4_ext_insert_index

In ext4_ext_insert_index(), when eh_entries of curp is bigger than
eh_max, an error message is printed, but its content is about logical
and ei_block, which is incorrect.

Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agojbd2: use WRITE_SYNC in journal checkpoint
Tao Ma [Mon, 27 Jun 2011 16:36:29 +0000 (12:36 -0400)]
jbd2: use WRITE_SYNC in journal checkpoint

In journal checkpoint, we write the buffer and wait for it to finish.
But in cfq, the async queue has a very low priority, and in our test,
if there are too many sync queues and every queue is filled up with
requests, the write request is delayed for a long time and
all the tasks waiting for journal space end up with errors like:

INFO: task attr_set:3816 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
attr_set      D ffff880028393480     0  3816      1 0x00000000
 ffff8802073fbae8 0000000000000086 ffff8802140847c8 ffff8800283934e8
 ffff8802073fb9d8 ffffffff8103e456 ffff8802140847b8 ffff8801ed728080
 ffff8801db4bc080 ffff8801ed728450 ffff880028393480 0000000000000002
Call Trace:
 [<ffffffff8103e456>] ? __dequeue_entity+0x33/0x38
 [<ffffffff8103caad>] ? need_resched+0x23/0x2d
 [<ffffffff814006a6>] ? thread_return+0xa2/0xbc
 [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [<ffffffffa01f6224>] ? jbd2_journal_dirty_metadata+0x116/0x126 [jbd2]
 [<ffffffff81400d31>] __mutex_lock_common+0x14e/0x1a9
 [<ffffffffa021dbfb>] ? brelse+0x13/0x15 [ext4]
 [<ffffffff81400ddb>] __mutex_lock_slowpath+0x19/0x1b
 [<ffffffff81400b2d>] mutex_lock+0x1b/0x32
 [<ffffffffa01f927b>] __jbd2_journal_insert_checkpoint+0xe3/0x20c [jbd2]
 [<ffffffffa01f547b>] start_this_handle+0x438/0x527 [jbd2]
 [<ffffffff8106f491>] ? autoremove_wake_function+0x0/0x3e
 [<ffffffffa01f560b>] jbd2_journal_start+0xa1/0xcc [jbd2]
 [<ffffffffa02353be>] ext4_journal_start_sb+0x57/0x81 [ext4]
 [<ffffffffa024a314>] ext4_xattr_set+0x6c/0xe3 [ext4]
 [<ffffffffa024aaff>] ext4_xattr_user_set+0x42/0x4b [ext4]
 [<ffffffff81145adb>] generic_setxattr+0x6b/0x76
 [<ffffffff81146ac0>] __vfs_setxattr_noperm+0x47/0xc0
 [<ffffffff81146bb8>] vfs_setxattr+0x7f/0x9a
 [<ffffffff81146c88>] setxattr+0xb5/0xe8
 [<ffffffff81137467>] ? do_filp_open+0x571/0xa6e
 [<ffffffff81146d26>] sys_fsetxattr+0x6b/0x91
 [<ffffffff81002d32>] system_call_fastpath+0x16/0x1b

So this patch uses WRITE_SYNC in __flush_batch so that the request is
moved into the sync queue and handled by cfq promptly. We also use the new
plug, so that all the WRITE_SYNC requests are submitted as a whole when we
unplug it.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Reported-by: Robin Dong <sanbai@taobao.com>
13 years agoMerge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Linus Torvalds [Tue, 21 Jun 2011 17:22:35 +0000 (10:22 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: Fix oops in jbd2_journal_remove_journal_head()
  jbd2: Remove obsolete parameters in the comments for some jbd2 functions
  ext4: fixed tracepoints cleanup
  ext4: use FIEMAP_EXTENT_LAST flag for last extent in fiemap
  ext4: Fix max file size and logical block counting of extent format file
  ext4: correct comments for ext4_free_blocks()

13 years agoLinux 3.0-rc4
Linus Torvalds [Tue, 21 Jun 2011 03:25:46 +0000 (20:25 -0700)]
Linux 3.0-rc4

13 years agovfs: i_state needs to be 'unsigned long' for now
Linus Torvalds [Tue, 21 Jun 2011 03:13:49 +0000 (20:13 -0700)]
vfs: i_state needs to be 'unsigned long' for now

Commit 13e12d14e2dc ("vfs: reorganize 'struct inode' layout a bit")
moved things around a bit and changed i_state to be unsigned int instead of
unsigned long.  That was to help structure layout for the 64-bit case,
and shrink 'struct inode' a bit (admittedly that only happened when
spinlock debugging was on and i_flags didn't pack with i_lock).

However, Meelis Roos reports that this results in unaligned exceptions
on sparc, and it turns out that the bit-locking primitives that we use
for the I_NEW bit want to use the bitops.  Which want 'unsigned long',
not 'unsigned int'.

We really should fix the bit locking code to not have that kind of
requirement, but that's a much bigger change.  So for now, revert that
field back to 'unsigned long' (but keep the other re-ordering changes
from the commit that caused this).

Andi points out that we have played games with this in 'struct page', so
it's solvable with other hacks too, but since right now the struct inode
size advantage only happens with some rare config options, it's not
worth fighting.

It _would_ be worth fixing the bitlocking code, though.  Especially
since there is no type safety in the bitlocking code (this never caused
any warnings, and worked fine on x86-64, because the bitlocks take a
'void *' and x86-64 doesn't care that deeply about alignment).  So it's
currently a very easy problem to trigger by mistake and never notice.

Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
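The alignment issue described in the i_state commit above can be sketched in user space. This is only an illustration under stated assumptions: the two structs are hypothetical layouts (not the real 'struct inode'), and demo_test_and_set_bit() is a non-atomic stand-in for the kernel bitops, which operate on 'unsigned long' words.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical layouts for illustration only, not the real struct inode. */
struct demo_inode_int {
	unsigned int i_flags;
	unsigned int i_state;	/* 4-byte field, may land on a 4-byte boundary */
	void *i_private;
};

struct demo_inode_long {
	unsigned int i_flags;
	unsigned long i_state;	/* word-sized and word-aligned */
	void *i_private;
};

/*
 * Non-atomic stand-in for a test_and_set_bit()-style primitive.
 * Like the kernel bitops, it addresses memory in unsigned long words,
 * so the caller must pass a properly aligned unsigned long.
 */
static int demo_test_and_set_bit(unsigned int nr, unsigned long *addr)
{
	unsigned long *word;
	unsigned long mask;
	int old;

	assert(addr != NULL);
	word = addr + nr / DEMO_BITS_PER_LONG;
	mask = 1UL << (nr % DEMO_BITS_PER_LONG);
	old = (*word & mask) != 0;
	*word |= mask;
	return old;
}

/*
 * Nonzero when a field at this offset can be accessed as an
 * unsigned long, assuming the enclosing struct is itself aligned.
 */
static int demo_offset_is_long_aligned(size_t offset)
{
	return offset % _Alignof(unsigned long) == 0;
}
```

On a typical 64-bit ABI, offsetof(struct demo_inode_int, i_state) is 4, so demo_offset_is_long_aligned() reports it as unsuitable for word-sized access, while the 'unsigned long' variant is always suitable. Because the real bitlocks take a 'void *', the compiler gives no warning in the first case.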
13 years agoMerge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Tue, 21 Jun 2011 03:12:48 +0000 (20:12 -0700)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6

* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  drm/radeon/kms/r6xx+: voltage fixes
  drm/nouveau: drop leftover debugging
  drm/radeon: avoid warnings from r600/eg irq handlers on powered off card.
  drm/radeon/kms: add missing param for dce3.2 DP transmitter setup
  drm/radeon/kms/atom: fix duallink on some early DCE3.2 cards
  drm/nouveau: fix assumption that semaphore dmaobj is valid in x-chan sync
  drm/nv50/disp: fix gamma with page flipping overlay turned on
  drm/nouveau/pm: Prevent overflow in nouveau_perf_init()
  drm/nouveau: fix big-endian switch

13 years agoMerge branch 'msm-fix' of git://codeaurora.org/quic/kernel/davidb/linux-msm
Linus Torvalds [Tue, 21 Jun 2011 03:11:34 +0000 (20:11 -0700)]
Merge branch 'msm-fix' of git://codeaurora.org/quic/kernel/davidb/linux-msm

* 'msm-fix' of git://codeaurora.org/quic/kernel/davidb/linux-msm:
  msm: timer: Fix DGT rate on 8960 and 8660
  msm: timer: compensate for timer shift in msm_read_timer_count
  msm: timer: Fix SMP build error

13 years agoMerge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Tue, 21 Jun 2011 03:10:52 +0000 (20:10 -0700)]
Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux:
  nfsd4: fix break_lease flags on nfsd open
  nfsd: link returns nfserr_delay when breaking lease
  nfsd: v4 support requires CRYPTO
  nfsd: fix dependency of nfsd on auth_rpcgss

13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Tue, 21 Jun 2011 03:10:18 +0000 (20:10 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits)
  pxa168_eth: fix race in transmit path.
  ipv4, ping: Remove duplicate icmp.h include
  netxen: fix race in skb->len access
  sgi-xp: fix a use after free
  hp100: fix an skb->len race
  netpoll: copy dev name of slaves to struct netpoll
  ipv4: fix multicast losses
  r8169: fix static initializers.
  inet_diag: fix inet_diag_bc_audit()
  gigaset: call module_put before restart of if_open()
  farsync: add module_put to error path in fst_open()
  net: rfs: enable RFS before first data packet is received
  fs_enet: fix freescale FCC ethernet dp buffer alignment
  netdev: bfin_mac: fix memory leak when freeing dma descriptors
  vlan: don't call ndo_vlan_rx_register on hardware that doesn't have vlan support
  caif: Bugfix - XOFF removed channel from caif-mux
  tun: teach the tun/tap driver to support netpoll
  dp83640: drop PHY status frames in the driver.
  dp83640: fix phy status frame event parsing
  phylib: Allow BCM63XX PHY to be selected only on BCM63XX.
  ...

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
Linus Torvalds [Tue, 21 Jun 2011 03:09:15 +0000 (20:09 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
  fix comment in generic_permission()
  kill obsolete comment for follow_down()
  proc_sys_permission() is OK in RCU mode
  reiserfs_permission() doesn't need to bail out in RCU mode
  proc_fd_permission() is doesn't need to bail out in RCU mode
  nilfs2_permission() doesn't need to bail out in RCU mode
  logfs doesn't need ->permission() at all
  coda_ioctl_permission() is safe in RCU mode
  cifs_permission() doesn't need to bail out in RCU mode
  bad_inode_permission() is safe from RCU mode
  ubifs: dereferencing an ERR_PTR in ubifs_mount()

13 years agodrm/radeon/kms/r6xx+: voltage fixes
Alex Deucher [Mon, 20 Jun 2011 17:00:31 +0000 (13:00 -0400)]
drm/radeon/kms/r6xx+: voltage fixes

0xff01 is not an actual voltage value, but a flag
for the driver.  If the power state has that value,
skip setting the voltage.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agomsm: timer: Fix DGT rate on 8960 and 8660
Stephen Boyd [Thu, 21 Apr 2011 23:09:11 +0000 (23:09 +0000)]
msm: timer: Fix DGT rate on 8960 and 8660

The DGT runs at 27 MHz divided by 4 on 8660 and 8960.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: David Brown <davidb@codeaurora.org>
13 years agopxa168_eth: fix race in transmit path.
Richard Cochran [Sun, 19 Jun 2011 21:48:06 +0000 (21:48 +0000)]
pxa168_eth: fix race in transmit path.

Because the socket buffer is freed in the completion interrupt, it is not
safe to access it after submitting it to the hardware.

Cc: stable@kernel.org
Cc: Sachin Sanap <ssanap@marvell.com>
Cc: Zhangfei Gao <zgao6@marvell.com>
Cc: Philip Rakity <prakity@marvell.com>
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agoipv4, ping: Remove duplicate icmp.h include
Jesper Juhl [Sun, 19 Jun 2011 22:31:20 +0000 (22:31 +0000)]
ipv4, ping: Remove duplicate icmp.h include

Remove the duplicate inclusion of net/icmp.h from net/ipv4/ping.c

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agonetxen: fix race in skb->len access
Eric Dumazet [Sun, 19 Jun 2011 20:26:15 +0000 (20:26 +0000)]
netxen: fix race in skb->len access

As soon as skb is given to hardware, TX completion can free skb under us.
Therefore, we should update dev stats before kicking the device.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
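The ordering rule behind this fix (and the similar pxa168_eth one) can be sketched in user space. All names here are hypothetical: demo_buf stands in for an skb, and demo_hw_submit() models a device whose completion frees the buffer immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an skb and for netdev statistics. */
struct demo_buf {
	size_t len;
};

struct demo_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
};

/*
 * Models the hardware: once the buffer is submitted, the completion
 * path may free it at any time. Here it is freed immediately.
 */
static void demo_hw_submit(struct demo_buf *buf)
{
	free(buf);
}

static void demo_xmit(struct demo_buf *buf, struct demo_stats *stats)
{
	assert(buf != NULL && stats != NULL);

	/* Account for the packet while we still own the buffer. */
	stats->tx_bytes += buf->len;
	stats->tx_packets++;

	/* Ownership passes to the "hardware" here. */
	demo_hw_submit(buf);

	/* buf must not be dereferenced after this point. */
}
```

Reading buf->len after demo_hw_submit() would be a use after free in this model, which is the race the commit closes by moving the statistics update ahead of the submission.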
13 years agoMerge branch 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 20 Jun 2011 16:01:33 +0000 (09:01 -0700)]
Merge branch 'stable/bug.fixes' of git://git./linux/kernel/git/konrad/xen

* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/setup: Fix for incorrect xen_extra_mem_start.
  xen: When calling power_off, don't call the halt function.
  xen: Fix compile warning when CONFIG_SMP is not defined.
  xen: support CONFIG_MAXSMP
  xen: partially revert "xen: set max_pfn_mapped to the last pfn mapped"

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Mon, 20 Jun 2011 15:59:46 +0000 (08:59 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: sh_keysc - 8x8 MODE_6 fix
  Input: omap-keypad - add missing input_sync()
  Input: evdev - try to wake up readers only if we have full packet
  Input: properly assign return value of clamp() macro.

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs...
Linus Torvalds [Mon, 20 Jun 2011 15:58:53 +0000 (08:58 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mason/btrfs-unstable

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: avoid delayed metadata items during commits
  btrfs: fix uninitialized return value
  btrfs: fix wrong reservation when doing delayed inode operations
  btrfs: Remove unused sysfs code
  btrfs: fix dereference of ERR_PTR value
  Btrfs: fix relocation races
  Btrfs: set no_trans_join after trying to expand the transaction
  Btrfs: protect the pending_snapshots list with trans_lock
  Btrfs: fix path leakage on subvol deletion
  Btrfs: drop the delalloc_bytes check in shrink_delalloc
  Btrfs: check the return value from set_anon_super

13 years agoMerge branch 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Mon, 20 Jun 2011 15:58:07 +0000 (08:58 -0700)]
Merge branch 'kvm-updates/3.0' of git://git./virt/kvm/kvm

* 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Fix register corruption in pvclock_scale_delta
  KVM: MMU: fix opposite condition in mapping_level_dirty_bitmap
  KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS
  KVM: MMU: Fix build warnings in walk_addr_generic()

13 years agodevcgroup_inode_permission: take "is it a device node" checks to inlined wrapper
Al Viro [Sun, 19 Jun 2011 17:01:04 +0000 (13:01 -0400)]
devcgroup_inode_permission: take "is it a device node" checks to inlined wrapper

inode_permission() calls devcgroup_inode_permission() and almost all such
calls are _not_ for device nodes; let's at least keep the common path
straight...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
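A rough user-space sketch of the pattern named in this commit: a cheap inline wrapper filters out the common non-device case, and only device nodes reach the out-of-line check. The struct, the function names, and the call counter are assumptions made for illustration; the real check consults the device cgroup whitelist.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical minimal inode, carrying only the file mode. */
struct demo_inode {
	mode_t i_mode;
};

/* Counts how often the slow path is entered, for demonstration. */
static int demo_slow_calls;

/*
 * Out-of-line slow path. A real implementation would look up the
 * device in a whitelist; this stand-in only counts calls and allows.
 */
static int demo_devcgroup_check_slow(const struct demo_inode *inode, int mask)
{
	(void)inode;
	(void)mask;
	demo_slow_calls++;
	return 0;
}

/*
 * Inline wrapper: the "is it a device node" test is done here, so
 * regular files and directories never pay for a function call.
 */
static inline int demo_devcgroup_inode_permission(const struct demo_inode *inode,
						   int mask)
{
	if (!S_ISBLK(inode->i_mode) && !S_ISCHR(inode->i_mode))
		return 0;
	return demo_devcgroup_check_slow(inode, mask);
}
```

With this split, a permission check on a regular file returns from the wrapper without touching demo_slow_calls, and only block or character devices take the slow path.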
13 years agofix comment in generic_permission()
Al Viro [Sun, 19 Jun 2011 05:56:53 +0000 (01:56 -0400)]
fix comment in generic_permission()

CAP_DAC_OVERRIDE is enough for MAY_EXEC on directory, even if
no exec bits are set.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 years agokill obsolete comment for follow_down()
Al Viro [Fri, 17 Jun 2011 23:20:48 +0000 (19:20 -0400)]
kill obsolete comment for follow_down()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 years agoproc_sys_permission() is OK in RCU mode
Al Viro [Sun, 19 Jun 2011 00:42:00 +0000 (20:42 -0400)]
proc_sys_permission() is OK in RCU mode

nothing blocking there, since all instances of sysctl
->permissions() method are non-blocking - both of them,
that is.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 years agoreiserfs_permission() doesn't need to bail out in RCU mode
Al Viro [Sun, 19 Jun 2011 00:37:33 +0000 (20:37 -0400)]
reiserfs_permission() doesn't need to bail out in RCU mode

nothing blocking other than generic_permission() (and
check_acl callback does bail out in RCU mode).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 years agoproc_fd_permission() is doesn't need to bail out in RCU mode
Al Viro [Sun, 19 Jun 2011 00:35:23 +0000 (20:35 -0400)]
proc_fd_permission() is doesn't need to bail out in RCU mode

nothing blocking except generic_permission()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 years agonilfs2_permission() doesn't need to bail out in RCU mode
Al Viro [Sun, 19 Jun 2011 00:21:44 +0000 (20:21 -0400)]
nilfs2_permission() doesn't need to bail out in RCU mode

Nothing blocking except for generic_permission().  Which will DTRT.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 years agologfs doesn't need ->permission() at all
Al Viro [Sun, 19 Jun 2011 00:17:22 +0000 (20:17 -0400)]
logfs doesn't need ->permission() at all

... and never did, what with its ->permission() being what we do by default
when ->permission is NULL...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 years agocoda_ioctl_permission() is safe in RCU mode
Al Viro [Sun, 19 Jun 2011 00:11:43 +0000 (20:11 -0400)]
coda_ioctl_permission() is safe in RCU mode

return (mask & MAY_EXEC) ? -EACCES : 0; is non-blocking...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 years agocifs_permission() doesn't need to bail out in RCU mode
Al Viro [Sun, 19 Jun 2011 00:03:36 +0000 (20:03 -0400)]
cifs_permission() doesn't need to bail out in RCU mode

nothing potentially blocking except generic_permission(), which
will DTRT

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 years agobad_inode_permission() is safe from RCU mode
Al Viro [Sat, 18 Jun 2011 23:59:04 +0000 (19:59 -0400)]
bad_inode_permission() is safe from RCU mode

return -EIO; is *not* a blocking operation, thank you very much.
Nick, what the hell have you been smoking?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 years agoubifs: dereferencing an ERR_PTR in ubifs_mount()
Dan Carpenter [Mon, 20 Jun 2011 07:10:24 +0000 (10:10 +0300)]
ubifs: dereferencing an ERR_PTR in ubifs_mount()

d251ed271d5 "ubifs: fix sget races" left out the goto from this
error path so the static checkers complain that we're dereferencing
"sb" when it's an ERR_PTR.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
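The class of bug fixed here can be sketched in user space. The helpers below imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention of encoding a negative errno in a pointer value; the demo_* names, the struct, and the mount function are hypothetical and are not the ubifs code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* User-space imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int demo_is_err(const void *ptr)
{
	/* Error values occupy the top DEMO_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_sb {
	int flags;
};

/* Hypothetical lookup that returns an error pointer when fail != 0. */
static struct demo_sb *demo_get_sb(struct demo_sb *backing, int fail)
{
	if (fail)
		return demo_err_ptr(-ENOMEM);
	return backing;
}

static long demo_mount(struct demo_sb *backing, int fail)
{
	long err = 0;
	struct demo_sb *sb = demo_get_sb(backing, fail);

	assert(backing != NULL);
	if (demo_is_err(sb)) {
		err = demo_ptr_err(sb);
		goto out;	/* without this jump, sb is dereferenced below */
	}
	sb->flags |= 1;
out:
	return err;
}
```

Dropping the goto in demo_mount() would let the error path fall through to sb->flags, which is the kind of dereference the static checkers flagged.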
13 years agonfsd4: fix break_lease flags on nfsd open
J. Bruce Fields [Tue, 7 Jun 2011 15:50:23 +0000 (11:50 -0400)]
nfsd4: fix break_lease flags on nfsd open

Thanks to Casey Bodley for pointing out that on a read open we pass 0,
instead of O_RDONLY, to break_lease, with the result that a read open is
treated like a write open for the purposes of lease breaking!

Reported-by: Casey Bodley <cbodley@citi.umich.edu>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
13 years agodrm/nouveau: drop leftover debugging
Dave Airlie [Mon, 20 Jun 2011 05:25:35 +0000 (15:25 +1000)]
drm/nouveau: drop leftover debugging

this printk isn't really useful, just drop it for now.

Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agoMerge branch 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux...
Dave Airlie [Mon, 20 Jun 2011 02:02:38 +0000 (12:02 +1000)]
Merge branch 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes

* 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
  drm/nouveau: fix assumption that semaphore dmaobj is valid in x-chan sync
  drm/nv50/disp: fix gamma with page flipping overlay turned on
  drm/nouveau/pm: Prevent overflow in nouveau_perf_init()
  drm/nouveau: fix big-endian switch

13 years agodrm/radeon: avoid warnings from r600/eg irq handlers on powered off card.
Dave Airlie [Sat, 18 Jun 2011 03:59:51 +0000 (03:59 +0000)]
drm/radeon: avoid warnings from r600/eg irq handlers on powered off card.

Since we were calling the wptr function before checking whether the IH
was even enabled or whether the GPU was shut down, we'd get spam in the
logs when the GPU read back 0xffffffff. This reorders things so we return
early in the no-IH and GPU-shutdown cases.

Reported-and-tested-by: ManDay on #radeon
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agodrm/radeon/kms: add missing param for dce3.2 DP transmitter setup
Alex Deucher [Fri, 17 Jun 2011 06:11:30 +0000 (06:11 +0000)]
drm/radeon/kms: add missing param for dce3.2 DP transmitter setup

This is used during phy init to set up the phy for DP.  This may
fix DP problems on DCE3.2 cards.

Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agodrm/radeon/kms/atom: fix duallink on some early DCE3.2 cards
Alex Deucher [Fri, 17 Jun 2011 17:13:52 +0000 (13:13 -0400)]
drm/radeon/kms/atom: fix duallink on some early DCE3.2 cards

Certain revisions of the vbios on DCE3.2 cards have a bug
in the transmitter control table which prevents duallink from
being enabled properly on some cards.  The action switch statement
jumps to the wrong offset for the OUTPUT_ENABLE action.  The fix
is to use the ENABLE action rather than the OUTPUT_ENABLE action
on the affected cards.  In fixed versions of the vbios, both
actions jump to the same offset, so the change should be safe.

Reported-and-tested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
13 years agosgi-xp: fix a use after free
Eric Dumazet [Sun, 19 Jun 2011 12:52:36 +0000 (12:52 +0000)]
sgi-xp: fix a use after free

It's illegal to dereference skb after dev_kfree_skb(skb).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Robin Holt <holt@sgi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agohp100: fix an skb->len race
Eric Dumazet [Sun, 19 Jun 2011 12:43:33 +0000 (12:43 +0000)]
hp100: fix an skb->len race

As soon as skb is given to hardware and spinlock released, TX completion
can free skb under us. Therefore, we should update netdev stats before
spinlock release.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agoMerge branch 'davem.r8169' of git://git.kernel.org/pub/scm/linux/kernel/git/romieu...
David S. Miller [Sun, 19 Jun 2011 23:26:46 +0000 (16:26 -0700)]
Merge branch 'davem.r8169' of git://git./linux/kernel/git/romieu/netdev-2.6

13 years agonetpoll: copy dev name of slaves to struct netpoll
WANG Cong [Sun, 19 Jun 2011 23:13:01 +0000 (16:13 -0700)]
netpoll: copy dev name of slaves to struct netpoll

Otherwise we will not see the name of the slave dev in error
message:

[  388.469446] (null):  doesn't support polling, aborting.

Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13 years agoKVM: Fix register corruption in pvclock_scale_delta
Zachary Amsden [Thu, 16 Jun 2011 03:50:04 +0000 (20:50 -0700)]
KVM: Fix register corruption in pvclock_scale_delta

The 128-bit multiply in pvclock.h was missing an output constraint for
EDX which caused a register corruption to appear.  Thanks to Ulrich for
diagnosing the EDX corruption and Avi for providing this fix.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
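To illustrate why the EDX output constraint matters, here is a minimal sketch of a 64x32-bit multiply that keeps bits 32..95 of the product. It is not the pvclock.h code: demo_mul_shift32() is a hypothetical name, the x86-64 branch assumes a GCC-compatible compiler, and other targets fall back to the compiler's 128-bit integer type.

```c
#include <stdint.h>

/*
 * Returns (delta * mul_frac) >> 32, truncated to 64 bits.
 * Hypothetical helper for illustration only.
 */
static uint64_t demo_mul_shift32(uint64_t delta, uint32_t mul_frac)
{
#if defined(__x86_64__) && defined(__GNUC__)
	uint64_t lo, hi;

	/*
	 * MUL writes its result to RDX:RAX. Both registers are listed
	 * as outputs; if "=d" were left out, the compiler could keep a
	 * live value in RDX across the asm and have it overwritten.
	 */
	__asm__("mulq %[mul]"
		: "=a"(lo), "=d"(hi)
		: "a"(delta), [mul] "r"((uint64_t)mul_frac)
		: "cc");
	return (hi << 32) | (lo >> 32);
#else
	/* Portable fallback, assuming the compiler provides __int128. */
	return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
#endif
}
```

For example, demo_mul_shift32(10, 0x80000000u) scales 10 by one half and yields 5. The point of the sketch is only the constraint list: every register the instruction writes has to be declared, either as an output or as a clobber.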
13 years agoKVM: MMU: fix opposite condition in mapping_level_dirty_bitmap
Steve [Fri, 17 Jun 2011 02:25:39 +0000 (10:25 +0800)]
KVM: MMU: fix opposite condition in mapping_level_dirty_bitmap

The condition is inverted, so it always maps a huge page for the dirty tracked page.

Reported-by: Steve <stefan.bosak@gmail.com>
Signed-off-by: Steve <stefan.bosak@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS
Marcelo Tosatti [Mon, 6 Jun 2011 17:27:47 +0000 (14:27 -0300)]
KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS

Only decache guest CR3 value if vcpu->arch.cr3 is stale.
Fixes loadvm with live guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-by: Markus Schade <markus.schade@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoKVM: MMU: Fix build warnings in walk_addr_generic()
Borislav Petkov [Mon, 30 May 2011 20:11:17 +0000 (22:11 +0200)]
KVM: MMU: Fix build warnings in walk_addr_generic()

On 3.0-rc1 I get

In file included from arch/x86/kvm/mmu.c:2856:
arch/x86/kvm/paging_tmpl.h: In function ‘paging32_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function
In file included from arch/x86/kvm/mmu.c:2852:
arch/x86/kvm/paging_tmpl.h: In function ‘paging64_walk_addr_generic’:
arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function

caused by 6e2ca7d1802bf8ed9908435e34daa116662e7790. According to Takuya
Yoshikawa, ptep_user won't be used uninitialized so shut up gcc.

Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Link: http://lkml.kernel.org/r/20110530094604.GC21833@liondog.tnic
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
13 years agoMerge branches 'perf-urgent-for-linus', 'sched-urgent-for-linus', 'timers-urgent...
Linus Torvalds [Sun, 19 Jun 2011 16:00:18 +0000 (09:00 -0700)]
Merge branches 'perf-urgent-for-linus', 'sched-urgent-for-linus', 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tools/perf: Fix static build of perf tool
  tracing: Fix regression in printk_formats file

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  clocksource: Make watchdog robust vs. interruption
  timerfd: Fix wakeup of processes when timer is cancelled on clock change

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, MAINTAINERS: Add x86 MCE people
  x86, efi: Do not reserve boot services regions within reserved areas