git.openwrt.org Git - openwrt/staging/blogic.git/log

Merge git://git./linux/kernel/git/rusty/linux-2.6-for-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  stop_machine: fix error code handling on multiple cpus
  stop_machine: use workqueues instead of kernel threads
  workqueue: introduce create_rt_workqueue
  Call init_workqueues before pre smp initcalls.
  Make panic= and panic_on_oops into core_params
  Make initcall_debug a core_param
  core_param() for genuinely core kernel parameters
  param: Fix duplicate module prefixes
  module: check kernel param length at compile time, not runtime
  Remove stop_machine during module load v2
  module: simplify load_module.

Merge branch 'for_linus' of git://git./linux/kernel/git/mchehab/linux-2.6

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (36 commits)
  V4L/DVB (9336): cx88: always de-alloc frontends on fault condition
  V4L/DVB (9335): videobuf: split unregister bus creating self-contained frontend de-allocator
  V4L/DVB (9334): cx88: dvb_remove debug output
  V4L/DVB (9333): cx88: Not all boards that requires cx88-mpeg has frontends
  V4L/DVB (9332): cx88: initial fix for analogue only compilation
  V4L/DVB (9331): Remove unused inode parameter from video_ioctl2
  V4L/DVB (9330): Get rid of inode parameter at v4l_compat_translate_ioctl()
  V4L/DVB (9328): ivtvfb: FB_BLANK_POWERDOWN turns off video output
  V4L/DVB (9327): v4l: use video_device.num instead of minor in video%d
  V4L/DVB (9326): ivtv: avoid green flashing when loading ivtv
  V4L/DVB (9325): ivtv: switch to unlocked_ioctl.
  V4L/DVB (9324): v4l2: add video_ioctl2_unlocked for unlocked_ioctl support.
  V4L/DVB (9323): v4l2-int-if: Add enum_framesizes and enum_frameintervals ioctls.
  V4L/DVB (9322): v4l2-int-if: Export more interfaces to modules
  V4L/DVB (9321): v4l2-int-if: Define new power state changes
  V4L/DVB (9320): v4l2: Add 10-bit RAW Bayer formats
  V4L/DVB (9319): v4l2-int-if: Add cropcap, g_crop and s_crop commands.
  V4L/DVB (9318): v4l2-int-if: Add command to get slave private data.
  V4L/DVB (9316): s5h1411: Power down s5h1411 when not in use
  V4L/DVB (9315): s5h1411: Skip reconfiguring demod modulation if already at the desired modulation
  ...

Merge branch 'v28-timers-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'v28-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
NOHZ: fix thinko in the timer restart code path

Merge git://git.infradead.org/iommu-2.6

* git://git.infradead.org/iommu-2.6:
  Admit to maintaining VT-d, for my sins.
  dmar: fix uninitialised 'ret' variable in dmar_parse_dev()
  intel-iommu: use coherent_dma_mask in alloc_coherent
  amd_iommu: fix nasty bug that caused ILLEGAL_DEVICE_TABLE_ENTRY errors
  intel-iommu: IA64 support
  dmar: remove the quirk which disables dma-remapping when intr-remapping enabled
  dmar: Use queued invalidation interface for IOTLB and context invalidation
  dmar: context cache and IOTLB invalidation using queued invalidation
  dmar: use spin_lock_irqsave() in qi_submit_sync()

Merge git://git./linux/kernel/git/bart/ide-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
ide: remove useless subdirs from drivers/ide/

Merge git://git./linux/kernel/git/agk/linux-2.6-dm

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
  dm: tidy local_init
  dm: remove unused flush_all
  dm raid1: separate region_hash interface part1
  dm: mark split bio as cloned
  dm crypt: remove waitqueue
  dm crypt: fix async split
  dm crypt: tidy sector
  dm: remove dm header from targets
  dm: publish array_too_big
  dm exception store: fix misordered writes
  dm exception store: refactor zero_area
  dm snapshot: drop unused last_percent
  dm snapshot: fix primary_pe race
  dm kcopyd: avoid queue shuffle

Merge branch 'core-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcupdate: fix bug of rcu_barrier*()
profiling: fix !procfs build

Fixed trivial conflicts in 'include/linux/profile.h'

Merge branch 'sched-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: disable the hrtick for now
  sched: revert back to per-rq vruntime
  sched: fair scheduler should not resched rt tasks
  sched: optimize group load balancer
  sched: minor fast-path overhead reduction
  sched: fix the wrong mask_len, cleanup
  sched: kill unused scheduler decl.
  sched: fix the wrong mask_len
  sched: only update rq->clock while holding rq->lock

Merge branch 'irq-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  genirq: NULL struct irq_desc's member 'name' in dynamic_irq_cleanup()
  genirq: fix off by one and coding style
  genirq: fix set_irq_type() when recording trigger type

8250: Add more OxSemi devices

These have the Mainpine PCI identifier on however

Additional paranoia check for Tornado versions added by Alan Cox

(and this time I remembered to do an stg refresh so that the corrections ended
up in these patches not randomly attached to another diff -- Alan)

Signed-off-by: Lee Howard <lee.howard@mainpine.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8250: Oxford Semiconductor Devices

Add support for the OxSemi 'Tornado' devices.

Reformatted and reworked a bit by Alan Cox

Signed-off-by: Lee Howard <lee.howard@mainpine.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

tty: Fix tty_port kref screwup

Pass the brown paper bags please. I changed the semantics of this so the
function was supposed to do the extra kref itself then forgot to do the
change.. duh....

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

watchdog: Fix warning

This seems to have popped up after the recent merges:

drivers/watchdog/w83697ug_wdt.c: In function ‘w83697ug_select_wd_register’:
drivers/watchdog/w83697ug_wdt.c:105: warning: ‘return’ with a value, in function returning void

Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mutex: speed up generic mutex implementations

- atomic operations which both modify the variable and return something imply
  full smp memory barriers before and after the memory operations involved
  (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
  they don't modify the target). See Documentation/atomic_ops.txt.
  So remove extra barriers and branches.

- All architectures support atomic_cmpxchg. This has no relation to
  __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally

This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
to 203 cycles on a ppc970 system.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git./linux/kernel/git/czankel/xtensa-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/czankel/xtensa-2.6:
  xtensa: Add config files for Diamond 232L - Rev B processor variant
  xtensa: Fix io regions
  xtensa: Add support for the Sonic Ethernet device for the XT2000 board.
  xtensa: replace remaining __FUNCTION__ occurrences
  xtensa: use newer __SPIN_LOCK_UNLOCKED macro
  XTENSA: warn about including <asm/rwsem.h> directly.

memcg: fix page_cgroup allocation

page_cgroup_init() is called from mem_cgroup_init(). But at this
point, we cannot call alloc_bootmem().
(and this caused panic at boot.)

This patch moves page_cgroup_init() to init/main.c.

Time table is following:
==
  parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
  ....
  cgroup_init_early()  # "early" init of cgroup.
  ....
  setup_arch()         # memmap is allocated.
  ...
  page_cgroup_init();
  mem_init();   # we cannot call alloc_bootmem after this.
  ....
  cgroup_init() # mem_cgroup is initialized.
==

Before page_cgroup_init(), mem_map must be initialized. So,
I added page_cgroup_init() to init/main.c directly.

(*) maybe this is not very clean but
    - cgroup_init_early() is too early
    - in cgroup_init(), we have to use vmalloc instead of alloc_bootmem().
    use of vmalloc area in x86-32 is important and we should avoid very large
    vmalloc() in x86-32. So, we want to use alloc_bootmem() and added page_cgroup_init()
    directly to init/main.c

[akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
[akpm@linux-foundation.org: fix build]
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

jbd: abort instead of waiting for nonexistent transactions

The __log_wait_for_space function sits in a loop checkpointing
transactions until there is sufficient space free in the journal.
However, if there are no transactions to be processed (e.g. because the
free space calculation is wrong due to a corrupted filesystem) it will
never progress.

Check for space being required when no transactions are outstanding and
abort the journal instead of endlessly looping.

This patch fixes the bug reported by Sami Liedes at:
http://bugzilla.kernel.org/show_bug.cgi?id=10976

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Tested-by: Sami Liedes <sliedes@cc.hut.fi>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

jbd: test BH_Write_EIO to detect errors on metadata buffers

__try_to_free_cp_buf(), __process_buffer(), and __wait_cp_io() test
BH_Uptodate flag to detect write I/O errors on metadata buffers. But by
commit 95450f5a7e53d5752ce1a0d0b8282e10fe745ae0 "ext3: don't read inode
block if the buffer has a write error"(*), BH_Uptodate flag can be set to
inode buffers with BH_Write_EIO in order to avoid reading old inode data.
So now, we have to test BH_Write_EIO flag of checkpointing inode buffers
instead of BH_Uptodate. This patch does it.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext3: add checks for errors from jbd

If the journal has aborted due to a checkpointing failure, we have to
keep the contents of the journal space.  Otherwise, the filesystem will
lose uncheckpointed metadata completely and become inconsistent.  To
avoid this, we need to keep needs_recovery flag if checkpoint has
failed.

With this patch, ext3_put_super() detects a checkpointing failure from
the return value of journal_destroy(), then it invokes ext3_abort() to
make the filesystem read only and keep needs_recovery flag.  Errors
from journal_flush() are also handled by this patch in some places.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

jbd: fix error handling for checkpoint io

When a checkpointing IO fails, current JBD code doesn't check the error
and continue journaling.  This means latest metadata can be lost from both
the journal and filesystem.

This patch leaves the failed metadata blocks in the journal space and
aborts journaling in the case of log_do_checkpoint().  To achieve this, we
need to do:

1. don't remove the failed buffer from the checkpoint list where in
   the case of __try_to_free_cp_buf() because it may be released or
   overwritten by a later transaction
2. log_do_checkpoint() is the last chance, remove the failed buffer
   from the checkpoint list and abort the journal
3. when checkpointing fails, don't update the journal super block to
   prevent the journaled contents from being cleaned.  For safety,
   don't update j_tail and j_tail_sequence either
4. when checkpointing fails, notify this error to the ext3 layer so
   that ext3 don't clear the needs_recovery flag, otherwise the
   journaled contents are ignored and cleaned in the recovery phase
5. if the recovery fails, keep the needs_recovery flag
6. prevent cleanup_journal_tail() from being called between
   __journal_drop_transaction() and journal_abort() (a race issue
   between journal_flush() and __log_wait_for_space()

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

profiling: fix up CONFIG_PROC_FS=n build

In the case where procfs is disabled, create_proc_profile() does not
exist. Stub it in with the others.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: page_cgroup needs linux/vmalloc.h for vmalloc_node()/vfree().

mm/page_cgroup.c: In function 'init_section_page_cgroup':
mm/page_cgroup.c:111: error: implicit declaration of function 'vmalloc_node'
mm/page_cgroup.c:111: warning: assignment makes pointer from integer without a cast
mm/page_cgroup.c: In function '__free_page_cgroup':
mm/page_cgroup.c:140: error: implicit declaration of function 'vfree'

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git./linux/kernel/git/benh/powerpc

* git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (53 commits)
  powerpc: Support for relocatable kdump kernel
  powerpc: Don't use a 16G page if beyond mem= limits
  powerpc: Add del_node() for early boot code to prune inapplicable devices.
  powerpc: Further compile fixup for STRICT_MM_TYPECHECKS
  powerpc: Remove empty #else from signal_64.c
  powerpc: Move memory size print into common show_cpuinfo for 32-bit
  hvc_console: Remove __devexit annotation of hvc_remove()
  hvc_console: Add support for tty window resizing
  hvc_console: Fix loop if put_char() returns 0
  hvc_console: Add tty driver flag TTY_DRIVER_RESET_TERMIOS
  hvc_console: Add a hangup notifier for backends
  powerpc/83xx: Add DS1339 RTC support for MPC8349E-mITX boards .dts
  powerpc/83xx: Add support for MCU microcontroller in .dts files
  powerpc/85xx: Move mpc8572ds.dts to address-cells/size-cells = <2>
  of/spi: Support specifying chip select as active high via device tree
  powerpc: Remove device_type = "board_control" properties in .dts files
  i2c-cpm: Suppress autoprobing for devices
  powerpc/85xx: Fix mpc8536ds dma interrupt numbers
  powerpc/85xx: Enable enhanced functions for 8536 TSEC
  powerpc: Delete unused prom_strtoul and prom_memparse
  ...

Merge branch 'for-upstream' of git://git./linux/kernel/git/dvrabel/uwb

* 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb: (47 commits)
  uwb: wrong sizeof argument in mac address compare
  uwb: don't use printk_ratelimit() so often
  uwb: use kcalloc where appropriate
  uwb: use time_after() when purging stale beacons
  uwb: add credits for the original developers of the UWB/WUSB/WLP subsystems
  uwb: add entries in the MAINTAINERS file
  uwb: depend on EXPERIMENTAL
  wusb: wusb-cbaf (CBA driver) sysfs ABI simplification
  uwb: document UWB and WUSB sysfs files
  uwb: add symlinks in sysfs between radio controllers and PALs
  uwb: dont tranmit identification IEs
  uwb: i1480/GUWA100U: fix firmware download issues
  uwb: i1480: remove MAC/PHY information checking function
  uwb: add Intel i1480 HWA to the UWB RC quirk table
  uwb: disable command/event filtering for D-Link DUB-1210
  uwb: initialize the debug sub-system
  uwb: Fix handling IEs with empty IE data in uwb_est_get_size()
  wusb: fix bmRequestType for Abort RPipe request
  wusb: fix error path for wusb_set_dev_addr()
  wusb: add HWA host controller driver
  ...

Merge branch 'for-next' of git://git.o-hand.com/linux-mfd

* 'for-next' of git://git.o-hand.com/linux-mfd:
  mfd: check for platform_get_irq() return value in sm501
  mfd: use pci_ioremap_bar() in sm501
  mfd: Don't store volatile bits in WM8350 register cache
  mfd: don't export wm3850 static functions
  mfd: twl4030-gpio driver
  mfd: rtc-twl4030 driver
  mfd: twl4030 IRQ handling update

Merge branch 'for-linus' of git://git./linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/ehca: Reject dynamic memory add/remove when ehca adapter is present
  IB/ehca: Fix reported max number of QPs and CQs in systems with >1 adapter
  IPoIB: Set netdev offload features properly for child (VLAN) interfaces
  IPoIB: Clean up ethtool support
  mlx4_core: Add Ethernet PCI device IDs
  mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC
  mlx4_core: Multiple port type support
  mlx4_core: Ethernet MAC/VLAN management
  mlx4_core: Get ethernet MTU and default address from firmware
  mlx4_core: Support multiple pre-reserved QP regions
  Update NetEffect maintainer emails to Intel emails
  RDMA/cxgb3: Remove cmid reference on tid allocation failures
  IB/mad: Use krealloc() to resize snoop table
  IPoIB: Always initialize poll_timer to avoid crash on unload
  IB/ehca: Don't allow creating UC QP with SRQ
  mlx4_core: Add QP range reservation support
  RDMA/ucma: Test ucma_alloc_multicast() return against NULL, not with IS_ERR()

Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  sata_via: load DEVICE register when CTL changes
  libata: set device class to NONE if phys_offline
  libata-eh: fix slave link EH action mask handling
  libata: transfer EHI control flags to slave ehc.i
  libata-sff: fix ata_sff_post_internal_cmd()
  libata: initialize port_task when !CONFIG_ATA_SFF

Merge branch 'for-linus' of /home/rmk/linux-2.6-arm

* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] clps711x: add sparsemem definitions
  [ARM] 5315/1: Fix section mismatch warning (sa1111)
  [ARM] Orion: activate workaround for 88f6183 SPI clock erratum
  [ARM] Orion: instantiate the dsa switch driver
  [ARM] mv78xx0: force link speed/duplex on eth2/eth3
  [ARM] remove extra brace in arch/arm/mach-pxa/trizeps4.c
  [ARM] balance parenthesis in header file
  [ARM] pxa: fix trizeps PCMCIA build
  [ARM] pxa: fix trizeps defconfig
  [ARM] dmabounce requires ZONE_DMA
  [ARM] 5303/1: period_cycles should be greater than 1
  [ARM] 5310/1: Fix cache flush functions for ARMv4
  [ARM] pxa: fix 3bca103a1e658d23737d20e1989139d9ca8973bf
  [ARM] pxa: fix redefinition of NR_IRQS
  [ARM] S3C24XX: Fix redefine of DEFINE_TIMER() in s3c24xx pwm-clock.c
  [ARM] S3C2443: Fix HCLK rate
  [ARM] S3C24XX: Serial driver debug depends on DEBUG_LL
  [ARM] S3C24XX: pwm-clock set_parent mask fix

Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: (41 commits)
  [IA64] Fix annoying IA64_TR_ALLOC_MAX message.
  [IA64] kill sys32_pipe
  [IA64] remove sys32_pause
  [IA64] Add Variable Page Size and IA64 Support in Intel IOMMU
  ia64/pv_ops: paravirtualized instruction checker.
  ia64/xen: a recipe for using xen/ia64 with pv_ops.
  ia64/pv_ops: update Kconfig for paravirtualized guest and xen.
  ia64/xen: preliminary support for save/restore.
  ia64/xen: define xen machine vector for domU.
  ia64/pv_ops/xen: implement xen pv_time_ops.
  ia64/pv_ops/xen: implement xen pv_irq_ops.
  ia64/pv_ops/xen: define the nubmer of irqs which xen needs.
  ia64/pv_ops/xen: implement xen pv_iosapic_ops.
  ia64/pv_ops/xen: paravirtualize entry.S for ia64/xen.
  ia64/pv_ops/xen: paravirtualize ivt.S for xen.
  ia64/pv_ops/xen: paravirtualize DO_SAVE_MIN for xen.
  ia64/pv_ops/xen: define xen paravirtualized instructions for hand written assembly code
  ia64/pv_ops/xen: define xen pv_cpu_ops.
  ia64/pv_ops/xen: define xen pv_init_ops for various xen initialization.
  ia64/pv_ops/xen: elf note based xen startup.
  ...

sata_via: load DEVICE register when CTL changes

VIA controllers clear DEVICE register when IEN changes. Make sure
DEVICE is updated along with CTL.

This change is separated from Joseph Chan's larger patch.

http://thread.gmane.org/gmane.linux.kernel.commits.mm/40640

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Joseph Chan <JosephChan@via.com.tw>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: set device class to NONE if phys_offline

Reset methods don't have access to phys link status for slave links
and may incorrectly indicate device presence causing unnecessary probe
failures for unoccupied links. This patch clears device class to NONE
during post-reset processing if phys link is offline.

As on/offlineness semantics is strictly defined and used in multiple
places by the core layer, this won't change behavior for drivers which
don't use slave links.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata-eh: fix slave link EH action mask handling

Slave link action mask is transferred to master link and all the EH
actions are taken by the master link. ata_eh_about_to_do() and
ata_eh_done() are called with ATA_EH_ALL_ACTIONS to clear the slave
link actions during transfer. This always sets ATA_PFLAG_RECOVERED
flag causing spurious "EH complete" messages.

Don't set ATA_PFLAG_RECOVERED for slave link actions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: transfer EHI control flags to slave ehc.i

ATA_EHI_NO_AUTOPSY and ATA_EHI_QUIET are used to control the behavior
of EH. As only the master link is visible outside EH, these flags are
set only for the master link although they should also apply to the
slave link, which causes spurious EH messages during probe and
suspend/resume.

This patch transfers those two flags to slave ehc.i before performing
slave autopsy and reporting.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata-sff: fix ata_sff_post_internal_cmd()

ata_sff_post_internal_cmd() needs to grab port lock before calling
ata_bmdma_stop() and also need to clear hsm_task_state. Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: initialize port_task when !CONFIG_ATA_SFF

ap->port_task was not initialized if !CONFIG_ATA_SFF later triggering
lockdep warning. Make sure it's initialized.

Reported by Larry Finger.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

Merge branches 'cma', 'cxgb3', 'ehca', 'ipoib', 'mad', 'mlx4' and 'nes' into for-next

IB/ehca: Reject dynamic memory add/remove when ehca adapter is present

Since the ehca device driver does not support dynamic memory add and
remove operations, the driver must explicitly reject such requests in
order to prevent unpredictable behaviors related to existing memory
regions that cover all of memory being used by InfiniBand protocols in
the kernel.

The solution (for now at least) is to add a memory notifier to the
ehca device driver and if a request for dynamic memory add or remove
comes in, ehca will always reject it. The user can add or remove
memory by hot-removing the ehca adapter, performing the memory
operation, and then hot-adding the ehca adapter back.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IB/ehca: Fix reported max number of QPs and CQs in systems with >1 adapter

Because ehca adapters can differ in the maximum number of QPs and CQs
we have to save the maximum number of these ressources per adapter and
not globally per ehca driver. This fix introduces 2 new members to the
shca structure to store the maximum value for QPs and CQs per adapter.

The module parameters are now used as initial values for those
variables. If a user selects an invalid number of CQs or QPs we don't
print an error any longer, instead we will inform the user with a
warning and set the values to the respective maximum supported by the
HW.

Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IPoIB: Set netdev offload features properly for child (VLAN) interfaces

Child devices were created without any offload features set, fix this by
moving the code that computes the features into generic function which is
now called through non-child and child device creation.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
-- v1 has a bug where the 'result' flag in ipoib_vlan_add may be used uninitialized
Signed-off-by: Roland Dreier <rolandd@cisco.com>

IPoIB: Clean up ethtool support

Add a get_rx_csum method. Remove the driver's own get_tso method, as
the ethtool kernel code uses the default one if nothing is provided.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

mlx4_core: Add Ethernet PCI device IDs

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC

The Mellanox ConnectX can operate as an InfiniBand adapter, as an
Ethernet NIC, or as a Fibre Channel (FC) HBA. The kernel has a
low-level driver, mlx4_core, which handles multiplexing access to the
device, and there is also already an InfiniBad driver, mlx4_ib.

This patch adds a new driver, mlx4_en, which implements a standard
Ethernet NIC driver.

Signed-off-by: Liran Liss <liranl@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

mlx4_core: Multiple port type support

Multi-protocol adapters support different port types.  Each consumer
of mlx4_core queries for supported port types; in particular mlx4_ib
can no longer assume that all physical ports belong to it.  Port type
is configured through a sysfs interface.  When the type of a port is
changed, all mlx4 interfaces are unregistered, and then registered
again with the new port types.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

mlx4_core: Ethernet MAC/VLAN management

Add support for managing MAC and VLAN filters for each port.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

mlx4_core: Get ethernet MTU and default address from firmware

Get maximum ethernet MTU and default MAC address from the firmware
QUERY_DEV_CAP command.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

mlx4_core: Support multiple pre-reserved QP regions

For ethernet support, we need to reserve QPs for the ethernet and
fibre channel driver. The QPs are reserved at the end of the QP
table. (This way we assure that they are aligned to their size)

We need to consider these reserved ranges in bitmap creation, so we
extend the mlx4 bitmap utility functions to allow reserved ranges at
both the bottom and the top of the range.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

powerpc: Support for relocatable kdump kernel

This adds relocatable kernel support for kdump. With this one can
use the same regular kernel to capture the kdump. A signature (0xfeed1234)
is passed in r6 from panic code to the next kernel through kexec_sequence
and purgatory code. The signature is used to differentiate between
kdump kernel and non-kdump kernels.

The purgatory code compares the signature and sets the __kdump_flag in
head_64.S. During the boot up, kernel code checks __kdump_flag and if it
is set, the kernel will behave as relocatable kdump kernel. This kernel
will boot at the address where it was loaded by kexec-tools ie. at the
address reserved through crashkernel boot parameter.

CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump
kernel as relocatable. So the same kernel can be used as production and
kdump kernel.

This patch incorporates the changes suggested by Paul Mackerras to avoid
GOT use and to avoid two copies of the code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mohan Kumar M <mohan@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Don't use a 16G page if beyond mem= limits

If mem= is used on the boot command line to limit memory then the memory block where a 16G page resides may not be available.

Thanks to Michael Ellerman for finding the problem.

Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Add del_node() for early boot code to prune inapplicable devices.

Some platforms have variants that can share most of a flat device tree but need
a few devices selectively pruned at boot time. This adds del_node() to ops.h
to allow access to the existing fdt_del_node().

Signed-off-by: Mike Ditto <mditto@consentry.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Further compile fixup for STRICT_MM_TYPECHECKS

A patch of mine was recently committed to fix up STRICT_MM_TYPECHECKS
behaviour on powerpc (f5ea64dcbad89875d130596df14c9b25d994a737).
However, something which breaks it again seems to have slipped in
afterwards. So, here's another small fix.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Remove empty #else from signal_64.c

Remove empty/bogus #else from signal_64.c

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Move memory size print into common show_cpuinfo for 32-bit

Most of the platforms were printing the size of the memory
in their show_cpuinfo implementations. This moves that to
the common show_cpuinfo, so that all 32-bit platforms will
now print the size of memory. I also update the code
to deal with the fact that total_memory is now a phys_addr_t.

Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

hvc_console: Remove __devexit annotation of hvc_remove()

Removed __devexit annotation of hvc_remove() to avoid a section mismatch
if the backend initialization fails and hvc_remove() must be used to
clean up allocated hvc structs (called in section __init or __devinit).

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

hvc_console: Add support for tty window resizing

The patch provides the hvc_resize() function to update the terminal
window dimensions (struct winsize) for a specified hvc console.
The function stores the new window size and schedules a function
that finally updates the tty winsize and signals the change to
user space (SIGWINCH).
Because the winsize update must acquire a mutex and might sleep,
the function is scheduled instead of being called from hvc_poll()
or khvcd.

This patch uses the tty_do_resize() routine from the tty layer.
A pending resize work is canceled in hvc_close() and hvc_hangup().

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

hvc_console: Fix loop if put_char() returns 0

If put_char() routine of a hvc console backend returns 0, then the
hvc console starts looping in the following scenarios:

1. hvc_console_print()
If put_char() returns 0 then the while loop may loop forever.
I have added the missing check for 0 to throw away console messages.

2. khvcd may loop:
The thread calls hvc_poll() --> hvc_push()... if there are still
buffered data then the HVC_POLL_WRITE bit is set and causes the
khvcd thread to loop (if yield() returns immediately).

However, instead of looping, the khvcd thread could sleep for
MIN_TIMEOUT (doing the same as for get_chars()).
The MIN_TIMEOUT is set if hvc_push() was not able to write
data to the backend. If data has been written, the timeout is
set to 0 to immediately re-schedule hvc_poll().

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> (virtio_console)
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

hvc_console: Add tty driver flag TTY_DRIVER_RESET_TERMIOS

After a tty hangup() or close() operation, processes might not reset the
termio settings to a sane state. In order to reset the termios to its
default settings the tty driver flag TTY_DRIVER_RESET_TERMIOS has been added.

TTY driver flag description from include/linux/tty_driver.h:
TTY_DRIVER_RESET_TERMIOS --- requests the tty layer to reset the
termios setting when the last process has closed the device.
Used for PTY's, in particular.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

hvc_console: Add a hangup notifier for backends

I have added a hangup notifier that can be used by hvc console
backends to handle a tty hangup. The default irq hangup notifier
calls the notifier_del_irq() for compatibility.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

mfd: check for platform_get_irq() return value in sm501

sm501_devdata->irq is unsigned, while platform_get_irq() returns a
signed int.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

mfd: use pci_ioremap_bar() in sm501

Use the newly introduced pci_ioremap_bar() function in drivers/mfd.
pci_ioremap_bar() just takes a pci device and a bar number, with the goal
of making it really hard to get wrong, while also having a central place
to stick sanity checks.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

mfd: Don't store volatile bits in WM8350 register cache

This makes the contents of the cache clearer and fixes incorrect
initialisation of the cache for partially volatile registers.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

mfd: don't export wm3850 static functions

October 10th linux-next build (powerpc allyesconfig) failed like this:

drivers/mfd/wm8350-core.c:1131: error: __ksymtab_wm8350_create_cache causes a section type conflict

Caused by commit 89b4012befb1abca5e86d232bc0e2a797b0d9825 ("mfd: Core
support for the WM8350 AudioPlus PMIC"). wm8350_create_cache is not used
elsewhere, so remove the EXPORT_SYMBOL.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

mfd: twl4030-gpio driver

This adds basic support for the GPIOs in the twl4030 power management
chip. That includes two open drain LED drivers, and the use of GPIO-0
(and GPIO-1) as MMC/SD card detect switches which can control whether
the VMMC1 (and VMMC2) regulators are active.

This version of the code has a debounce call that will probably be
replaced before long, when a more generic interface exists.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

mfd: rtc-twl4030 driver

This adds a driver for the RTC inside the TWL4030 multi-function device.
It's a fairly basic RTC, with a wake-capable alarm.

Note that many of the pre-release Overo boards now in circulation can't
effectively use this RTC, because of a wiring error that puts its TWL
chip into "secure" mode. (As in "secure yourself against tampering".)
This isn't an issue on other OMAP3 boards now supported in mainline,
such as Beagle and Labrador.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

mfd: twl4030 IRQ handling update

- Move it into a separate file; clean and streamline it
- Restructure the init code for reuse during secondary dispatch
- Support both levels (primary, secondary) of IRQ dispatch
- Use a workqueue for irq mask/unmask and trigger configuration

Code for two subchips currently share that secondary handler code.
One is the power subchip; its IRQs are now handled by this core,
courtesy of this patch. The other is the GPIO module, which will
be supported through a later patch.

There are also minor changes to the header file, mostly related
to GPIO support; nothing yet in mainline cares about those. A
few references to OMAP-specific symbols are disabled; when they
can all be removed, the TWL4030 support ceases being OMAP-specific.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Samuel Ortiz <sameo@openedhand.com>

stop_machine: fix error code handling on multiple cpus

Using |= for updating a value which might be updated on several cpus
concurrently will not always work since we need to make sure that the
update happens atomically.
To fix this just use a write if the called function returns an error
code on a cpu. We end up writing the error code of an arbitrary cpu
if multiple ones fail but that should be sufficient.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

stop_machine: use workqueues instead of kernel threads

Convert stop_machine to a workqueue based approach. Instead of using kernel
threads for stop_machine we now use a an rt workqueue to synchronize all
cpus.
This has the advantage that all needed per cpu threads are already created
when stop_machine gets called. And therefore a call to stop_machine won't
fail anymore. This is needed for s390 which needs a mechanism to synchronize
all cpus without allocating any memory.
As Rusty pointed out free_module() needs a non-failing stop_machine interface
as well.

As a side effect the stop_machine code gets simplified.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

workqueue: introduce create_rt_workqueue

create_rt_workqueue will create a real time prioritized workqueue.
This is needed for the conversion of stop_machine to a workqueue based
implementation.
This patch adds yet another parameter to __create_workqueue_key to tell
it that we want an rt workqueue.
However it looks like we rather should have something like "int type"
instead of singlethread, freezable and rt.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>

Call init_workqueues before pre smp initcalls.

This allows to create workqueues from within the context of
a pre smp initcall (aka early_initcall).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Make panic= and panic_on_oops into core_params

This allows them to be examined and set after boot, plus means they
actually give errors if they are misused (eg. panic=yes).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Make initcall_debug a core_param

This is the one I really wanted: now it effects module loading, it
makes sense to be able to flip it after boot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>

core_param() for genuinely core kernel parameters

There are a lot of one-liner uses of __setup() in the kernel: they're
cumbersome and not queryable (definitely not settable) via /sys.  Yet
it's ugly to simplify them to module_param(), because by default that
inserts a prefix of the module name (usually filename).

So, introduce a "core_param".  The parameter gets no prefix, but
appears in /sys/module/kernel/parameters/ (if non-zero perms arg).  I
thought about using the name "core", but that's more common than
"kernel".  And if you create a module called "kernel", you will die
a horrible death.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

param: Fix duplicate module prefixes

Instead of insisting each new module_param sysfs entry is unique,
handle the case where it already exists (for builtin modules).

The current code assumes that all identical prefixes are together in
the section: true for normal uses, but not necessarily so if someone
overrides MODULE_PARAM_PREFIX. More importantly, it's not true with
the new "core_param()" code which uses "kernel" as a prefix.

This simplifies the caller for the builtin case, at a slight loss of
efficiency (we do the lookup every time to see if the directory
exists).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg Kroah-Hartman <gregkh@suse.de>

module: check kernel param length at compile time, not runtime

The kparam code tries to handle over-length parameter prefixes at
runtime. Not only would I bet this has never been tested, it's not
clear that truncating names is a good idea either.

So let's check at compile time. We need to move the #define to
moduleparam.h to do this, though.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Remove stop_machine during module load v2

Remove stop_machine during module load v2

module loading currently does a stop_machine on each module load to insert
the module into the global module lists.  Especially on larger systems this
can be quite expensive.

It does that to handle concurrent lock lessmodule list readers
like kallsyms.

I don't think stop_machine() is actually needed to insert something
into a list though. There are no concurrent writers because the
module mutex is taken. And the RCU list functions know how to insert
a node into a list with the right memory ordering so that concurrent
readers don't go off into the wood.

So remove the stop_machine for the module list insert and just
do a list_add_rcu() instead.

Module removal will still do a stop_machine of course, it needs
that for other reasons.

v2: Revised readers based on Paul's comments. All readers that only
    rely on disabled preemption need to be changed to list_for_each_rcu().
    Done that. The others are ok because they have the modules mutex.
    Also added a possible missing preempt disable for print_modules().

[cc Paul McKenney for review. It's not RCU, but quite similar.]

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

module: simplify load_module.

Linus' recent catch of stack overflow in load_module lead me to look
at the code. A couple of helpers to get a section address and get
objects from a section can help clean things up a little.

(And in case you're wondering, the stack size also dropped from 328 to
284 bytes).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

[ARM] clps711x: add sparsemem definitions

Fix the edb7211_defconfig build errors.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[ARM] 5315/1: Fix section mismatch warning (sa1111)

This patch fixes the section mismatch warning from
sa1111.o at buildtime.

  CC      arch/arm/common/sa1111.o
  LD      arch/arm/common/built-in.o
  LD      vmlinux.o
  MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x87f4): Section mismatch in reference from the function sa1111_probe() to the function .devinit.text:sa1110_mb_enable()
The function sa1111_probe() references
the function __devinit sa1110_mb_enable().
This is often because sa1111_probe lacks a __devinit
annotation or the annotation of sa1110_mb_enable is wrong.

Signed-off-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

ide: remove useless subdirs from drivers/ide/

Suggested-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

NOHZ: fix thinko in the timer restart code path

commit fb02fbc14d17837b4b7b02dbb36142c16a7bf208 (NOHZ: restart tick
device from irq_enter())

solves the problem of stale jiffies when long running softirqs happen
in a long idle sleep period, but it has a major thinko in it:

When the interrupt which came in _is_ the timer interrupt which should
expire ts->sched_timer then we cancel and rearm the timer _before_ it
gets expired in hrtimer_interrupt() to the next period. That means the
call back function is not called. This game can go on for ever :(

Prevent this by making sure to only rearm the timer when the expiry
time is more than one tick_period away. Otherwise keep it running as
it is either already expired or will expiry at the right point to
update jiffies.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Venkatesch Pallipadi <venkatesh.pallipadi@intel.com>

Merge branch 'master' of /linux/kernel/git/torvalds/linux-2.6

Conflicts:

drivers/pci/dmar.c

dm: tidy local_init

This patch tidies local_init() in preparation for request-based dm.
No functional change.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: remove unused flush_all

This patch removes the DM_WQ_FLUSH_ALL state that is unnecessary.

The dm_queue_flush(md, DM_WQ_FLUSH_ALL, NULL) in dm_suspend()
is never invoked because:
  - 'goto flush_and_out' is the same as 'goto out' because
    the 'goto flush_and_out' is called only when '!noflush'
  - If r is non-zero, then the code above will invoke 'goto out'
    and skip this code.

No functional change.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm raid1: separate region_hash interface part1

Separate the region hash code from raid1 so it can be shared by forthcoming
targets. Use BUG_ON() for failed async dm_io() calls.

Signed-off-by: Heinz Mauelshagen <hjm@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: mark split bio as cloned

When a bio gets split, mark its fragments with the BIO_CLONED flag.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm crypt: remove waitqueue

Remove waitqueue no longer needed with the async crypto interface.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm crypt: fix async split

When writing io, dm-crypt has to allocate a new cloned bio
and encrypt the data into newly-allocated pages attached to this bio.
In rare cases, because of hw restrictions (e.g. physical segment limit)
or memory pressure, sometimes more than one cloned bio has to be used,
each processing a different fragment of the original.

Currently there is one waitqueue which waits for one fragment to finish
and continues processing the next fragment.

But when using asynchronous crypto this doesn't work, because several
fragments may be processed asynchronously or in parallel and there is
only one crypt context that cannot be shared between the bio fragments.
The result may be corruption of the data contained in the encrypted bio.

The patch fixes this by allocating new dm_crypt_io structs (with new
crypto contexts) and running them independently.

The fragments contains a pointer to the base dm_crypt_io struct to
handle reference counting, so the base one is properly deallocated
after all the fragments are finished.

In a low memory situation, this only uses one additional object from the
mempool. If the mempool is empty, the next allocation simple waits for
previous fragments to complete.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm crypt: tidy sector

Prepare local sector variable (offset) for later patch.
Do not update io->sector for still-running I/O.

No functional change.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: remove dm header from targets

Change #include "dm.h" to #include <linux/device-mapper.h> in all targets.
Targets should not need direct access to internal DM structures.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm: publish array_too_big

Move array_too_big to include/linux/device-mapper.h because it is
used by targets.

Remove the test from dm-raid1 as the number of mirror legs is limited
such that it can never fail. (Even for stripes it seems rather
unlikely.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm exception store: fix misordered writes

We must zero the next chunk on disk *before* writing out the current chunk, not
after. Otherwise if the machine crashes at the wrong time, the "end of metadata"
marker may be missing.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org

dm exception store: refactor zero_area

Use a separate buffer for writing zeroes to the on-disk snapshot
exception store, make the updating of ps->current_area explicit and
refactor the code in preparation for the fix in the next patch.

No functional change.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org

dm snapshot: drop unused last_percent

The last_percent field is unused - remove it.
(It dates from when events were triggered as each X% filled up.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm snapshot: fix primary_pe race

Fix a race condition with primary_pe ref_count handling.

put_pending_exception runs under dm_snapshot->lock, it does atomic_dec_and_test
on primary_pe->ref_count, and later does atomic_read primary_pe->ref_count.

__origin_write does atomic_dec_and_test on primary_pe->ref_count without holding
dm_snapshot->lock.

This opens the following race condition:
Assume two CPUs, CPU1 is executing put_pending_exception (and holding
dm_snapshot->lock). CPU2 is executing __origin_write in parallel.
primary_pe->ref_count == 2.

CPU1:
if (primary_pe && atomic_dec_and_test(&primary_pe->ref_count))
origin_bios = bio_list_get(&primary_pe->origin_bios);
... decrements primary_pe->ref_count to 1. Doesn't load origin_bios

CPU2:
if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
flush_bios(bio_list_get(&primary_pe->origin_bios));
free_pending_exception(primary_pe);
/* If we got here, pe_queue is necessarily empty. */
return r;
}
... decrements primary_pe->ref_count to 0, submits pending bios, frees
primary_pe.

CPU1:
if (!primary_pe || primary_pe != pe)
free_pending_exception(pe);
... this has no effect.
if (primary_pe && !atomic_read(&primary_pe->ref_count))
free_pending_exception(primary_pe);
... sees ref_count == 0 (written by CPU 2), does double free !!

This bug can happen only if someone is simultaneously writing to both the
origin and the snapshot.

If someone is writing only to the origin, __origin_write will submit kcopyd
request after it decrements primary_pe->ref_count (so it can't happen that the
finished copy races with primary_pe->ref_count decrementation).

If someone is writing only to the snapshot, __origin_write isn't invoked at all
and the race can't happen.

The race happens when someone writes to the snapshot --- this creates
pending_exception with primary_pe == NULL and starts copying. Then, someone
writes to the same chunk in the snapshot, and __origin_write races with
termination of already submitted request in pending_complete (that calls
put_pending_exception).

This race may be reason for bugs:
http://bugzilla.kernel.org/show_bug.cgi?id=11636
https://bugzilla.redhat.com/show_bug.cgi?id=465825

The patch fixes the code to make sure that:
1. If atomic_dec_and_test(&primary_pe->ref_count) returns false, the process
must no longer dereference primary_pe (because someone else may free it under
us).
2. If atomic_dec_and_test(&primary_pe->ref_count) returns true, the process
is responsible for freeing primary_pe.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org

dm kcopyd: avoid queue shuffle

Write throughput to LVM snapshot origin volume is an order
of magnitude slower than those to LV without snapshots or
snapshot target volumes, especially in the case of sequential
writes with O_SYNC on.

The following patch originally written by Kevin Jamieson and
Jan Blunck and slightly modified for the current RCs by myself
tries to improve the performance by modifying the behaviour
of kcopyd, so that it pushes back an I/O job to the head of
the job queue instead of the tail as process_jobs() currently
does when it has to wait for free pages. This way, write
requests aren't shuffled to cause extra seeks.

I tested the patch against 2.6.27-rc5 and got the following results.
The test is a dd command writing to snapshot origin followed by fsync
to the file just created/updated.  A couple of filesystem benchmarks
gave me similar results in case of sequential writes, while random
writes didn't suffer much.

dd if=/dev/zero of=<somewhere on snapshot origin> bs=4096 count=...
   [conv=notrunc when updating]

1) linux 2.6.27-rc5 without the patch, write to snapshot origin,
average throughput (MB/s)
                     10M     100M    1000M
create,dd         511.46   610.72    11.81
create,dd+fsync     7.10     6.77     8.13
update,dd         431.63   917.41    12.75
update,dd+fsync     7.79     7.43     8.12

compared with write throughput to LV without any snapshots,
all dd+fsync and 1000 MiB writes perform very poorly.

                     10M     100M    1000M
create,dd         555.03   608.98   123.29
create,dd+fsync   114.27    72.78    76.65
update,dd         152.34  1267.27   124.04
update,dd+fsync   130.56    77.81    77.84

2) linux 2.6.27-rc5 with the patch, write to snapshot origin,
average throughput (MB/s)

                     10M     100M    1000M
create,dd         537.06   589.44    46.21
create,dd+fsync    31.63    29.19    29.23
update,dd         487.59   897.65    37.76
update,dd+fsync    34.12    30.07    26.85

Although still not on par with plain LV performance -
cannot be avoided because it's copy on write anyway -
this simple patch successfully improves throughtput
of dd+fsync while not affecting the rest.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Kazuo Ito <ito.kazuo@oss.ntt.co.jp>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: stable@kernel.org

V4L/DVB (9336): cx88: always de-alloc frontends on fault condition

De-alloc frontends on fault condition.

Signed-off-by: Darron Broad <darron@kewl.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

V4L/DVB (9335): videobuf: split unregister bus creating self-contained frontend de-allocator

This creates a self contained frontend de-allocator
for the instances where an adapter has not been
registered yet frontend de-allocation may
be required.

Signed-off-by: Darron Broad <darron@kewl.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

V4L/DVB (9334): cx88: dvb_remove debug output

Add debug output for dvb_remove enter.

Signed-off-by: Darron Broad <darron@kewl.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

V4L/DVB (9333): cx88: Not all boards that requires cx88-mpeg has frontends

The multifrontend changes on cx88 assumed that all boards that use cx88-mpeg
supports DVB. This is not true. There also a few analog-only boards based on
Blackboard design that also uses cx88-mpeg. For those boards, there's no need
to allocate dvb frontends.

This patch fixes videobuf allocation for those devices.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

V4L/DVB (9332): cx88: initial fix for analogue only compilation

Initial fix for when analogue only is selected
for compilation (ie, !CX88_DVB)

Signed-off-by: Darron Broad <darron@kewl.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

V4L/DVB (9331): Remove unused inode parameter from video_ioctl2

inode is never used on video_ioctl2. Remove it and rename the function to
__video_ioctl2. This allows its usage directly as a callback at
fops.unlocked_ioctl.

Since we still need a callback with inode to be used with fops.ioctl,
this patch adds video_ioctl2() that is just a call to __video_ioctl2().

Also, this patch adds some comments about video_ioctl2 and __video_ioctl2
usage at v4l2-ioctl.h.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>