Commit Graph

21551 Commits

Author SHA1 Message Date
Sebastian Andrzej Siewior
453281a973 mtd: nand: introduce NAND_CREATE_EMPTY_BBT
it will create an empty BBT table without considering vendor's BBT
information. Vendor's information may be unavailable if the NAND
controller has a different DATA & OOB layout or this information may be
allready purged.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-25 00:54:37 +01:00
Sebastian Andrzej Siewior
7cba7b14fe mtd: nand: add support for BBT without OOB
The first (sixt) byte in the OOB area contains vendor's bad block
information. During identification of the NAND chip this information is
collected by scanning the complete chip.
The option NAND_USE_FLASH_BBT is used to store this information in a sector so
we don't have to scan the complete flash. Unfortunately the code stores
a marker in order to recognize the BBT in the OOB area. This will fail
if the OOB area is completely used for ECC.
This patch introduces the option NAND_USE_FLASH_BBT_NO_OOB which has to be
used with NAND_USE_FLASH_BBT. It will then store BBT on flash without
touching the OOB area. The BBT format on flash remains same except the
first page starts with the recognition pattern followed by the version byte.
This change was tested in nandsim and it looks good so far :)

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-25 00:53:48 +01:00
Huang Shijie
12a40a57f7 mtd: add init_size hook for NAND driver
Not all the NAND devices have all the information in additional
id bytes.

So add a hook in the nand_chip{} is a good method to calculate the
right value of oobsize, erasesize and so on.

Without the hook,you will get the wrong value, and you have to hack
in the ->scan_bbt() to change the wrong value which make the code
mess.

Signed-off-by: Huang Shijie <shijie8@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-25 00:49:57 +01:00
Roman Tereshonkov
5daa7b2149 mtd: prepare partition add and del functions for ioctl requests
mtd_is_master, mtd_add_partition and mtd_del_partition functions
are added to give the possibility of partition manipulation
by ioctl request.

The old partition add function is modified to fit the dynamic
allocation.

Signed-off-by: Roman Tereshonkov <roman.tereshonkov@nokia.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-25 00:47:37 +01:00
Linus Walleij
6c009ab89a mtd: generic FSMC NAND MTD driver
This is the same driver submitted by ST Micros SPEAr team but
generalized and tested on the ST-Ericsson U300. It probably
easily works on the NHK8815 too.

Signed-off-by: Vipin Kumar <vipin.kumar@st.com>
Signed-off-by: Rajeev Kumar <rajeev-dlh.kumar@st.com>
Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-25 00:33:48 +01:00
Florian Fainelli
d1e1f4e42b mtd: nand: add support for reading ONFI parameters from NAND device
This patch adds support for reading NAND device ONFI parameters and use
the ONFI informations to define its geometry. In case the device supports
ONFI, the onfi_version field in struct nand_chip contains the version (BCD)
and the onfi_params structure can be used by drivers to set up timings and
such. We currently only support ONFI 1.0 parameters.

Signed-off-by: Brian Norris <norris@broadcom.com>
Signed-off-by: Matthieu Castet <matthieu.castet@parrot.com>
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-24 23:46:34 +01:00
Florian Fainelli
caa4b6f24c mtd: nand: add NAND_CMD_PARAM (0xec) definition
This command is used to read the device ONFI parameters page.

Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-24 23:45:00 +01:00
Brian Norris
5c709ee9f3 mtd: nand: Increase NAND_MAX_OOBSIZE
An increase in NAND_MAX_OOBSIZE and NAND_MAX_PAGESIZE is necessary
in order to support many new chips. Among those:

Toshiba TC58TxG4S2FBAxx  8KB page, 576B OOB
Micron MT29F64G08CBAAA   8KB page, 448B OOB

Signed-off-by: Brian Norris <norris@broadcom.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-24 23:37:51 +01:00
Brian Norris
0ceacf36e9 mtd: edit comments on deprecation of ioctl ECCGETLAYOUT
There were some improvements and additions necessary in the
comments explaining of the expansion of nand_ecclayout, the
introduction of nand_ecclayout_user, and the deprecation of the
ioctl ECCGETLAYOUT.

Also, I found a better placement for the macro MTD_MAX_ECCPOS_ENTRIES;
next to the definition of MTD_MAX_OOBFREE_ENTRIES in mtd-abi.h. The macro
is really only important for the ioctl code (found in drivers/mtd/mtdchar.c)
but since there are small edits being made to the user-space header, I
figured this is a better location.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-24 23:37:27 +01:00
Brian Norris
cc26c3cd3d mtd: nand: expand nand_ecc_layout, deprecate ioctl ECCGETLAYOUT
struct nand_ecclayout is too small for many new chips; OOB regions can be as
large as 448 bytes and may increase more in the future. Thus, copying that
struct to user-space with the ECCGETLAYOUT ioctl is not a good idea; the ioctl
would have to be updated every time there's a change to the current largest
size.

Instead, the old nand_ecclayout is renamed to nand_ecclayout_user and a
new struct nand_ecclayout is created that can accomodate larger sizes and
expand without affecting the user-space. struct nand_ecclayout can still
be used in board drivers without modification -- at least for now.

A new function is provided to convert from the new to the old in order to
allow the deprecated ioctl to continue to work with truncated data. Perhaps
the ioctl, the conversion process, and the struct nand_ecclayout_user can be
removed altogether in the future.

Note: There are comments in nand/davinci_nand.c::nand_davinci_probe()
regarding this issue; this driver (and maybe others) can be updated to
account for extra space. All kernel drivers can use the expanded
nand_ecclayout as a drop-in replacement and ignore its benefits.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-24 23:37:24 +01:00
Brian Norris
f4a2da0cd5 mtd: inftl.h: fix spacing errors
Replaced some spaces with tabs to fit CodingStyle guidelines

Signed-off-by: Brian Norris <norris@broadcom.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2010-10-24 23:27:24 +01:00
Linus Torvalds
a2724f28d9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits)
  tcp: Fix >4GB writes on 64-bit.
  net/9p: Mount only matching virtio channels
  de2104x: fix ethtool
  tproxy: check for transparent flag in ip_route_newports
  ipv6: add IPv6 to neighbour table overflow warning
  tcp: fix TSO FACK loss marking in tcp_mark_head_lost
  3c59x: fix regression from patch "Add ethtool WOL support"
  ipv6: add a missing unregister_pernet_subsys call
  s390: use free_netdev(netdev) instead of kfree()
  sgiseeq: use free_netdev(netdev) instead of kfree()
  rionet: use free_netdev(netdev) instead of kfree()
  ibm_newemac: use free_netdev(netdev) instead of kfree()
  smsc911x: Add MODULE_ALIAS()
  net: reset skb queue mapping when rx'ing over tunnel
  br2684: fix scheduling while atomic
  de2104x: fix TP link detection
  de2104x: fix power management
  de2104x: disable autonegotiation on broken hardware
  net: fix a lockdep splat
  e1000e: 82579 do not gate auto config of PHY by hardware during nominal use
  ...
2010-09-28 12:01:26 -07:00
David S. Miller
01db403cf9 tcp: Fix >4GB writes on 64-bit.
Fixes kernel bugzilla #16603

tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write
zero bytes, for example.

There is also the problem higher up of how verify_iovec() works.  It
wants to prevent the total length from looking like an error return
value.

However it does this using 'int', but syscalls return 'long' (and
thus signed 64-bit on 64-bit machines).  So it could trigger
false-positives on 64-bit as written.  So fix it to use 'long'.

Reported-by: Olaf Bonorden <bono@onlinehome.de>
Reported-by: Daniel Büse <dbuese@gmx.de>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-27 20:24:54 -07:00
Linus Torvalds
6a6aa2b7e4 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/amd-iommu: Fix rounding-bug in __unmap_single
  x86/amd-iommu: Work around S3 BIOS bug
  x86/amd-iommu: Set iommu configuration flags in enable-loop
  x86, setup: Fix earlyprintk=serial,0x3f8,115200
  x86, setup: Fix earlyprintk=serial,ttyS0,115200
2010-09-27 12:22:21 -07:00
Ingo Molnar
7329cf0201 Merge branch 'amd-iommu/2.6.36' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2010-09-24 11:19:53 +02:00
Joerg Roedel
4c894f47bb x86/amd-iommu: Work around S3 BIOS bug
This patch adds a workaround for an IOMMU BIOS problem to
the AMD IOMMU driver. The result of the bug is that the
IOMMU does not execute commands anymore when the system
comes out of the S3 state resulting in system failure. The
bug in the BIOS is that is does not restore certain hardware
specific registers correctly. This workaround reads out the
contents of these registers at boot time and restores them
on resume from S3. The workaround is limited to the specific
IOMMU chipset where this problem occurs.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-09-23 16:26:03 +02:00
FUJITA Tomonori
710224fa27 arm: fix "arm: fix pci_set_consistent_dma_mask for dmabounce devices"
This fixes the regression caused by the commit 6fee48cd33
("dma-mapping: arm: use generic pci_set_dma_mask and
pci_set_consistent_dma_mask").

ARM needs to clip the dma coherent mask for dmabounce devices. This
restores the old trick.

Note that strictly speaking, the DMA API doesn't allow architectures to do
such but I'm not sure it's worth adding the new API to set the dma mask
that allows architectures to clip it.

Reported-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-22 17:22:38 -07:00
Ollie Wild
56b49f4b8f net: Move "struct net" declaration inside the __KERNEL__ macro guard
This patch reduces namespace pollution by moving the "struct net" declaration
out of the userspace-facing portion of linux/netlink.h.  It has no impact on
the kernel.

(This came up because we have several C++ applications which use "net" as a
namespace name.)

Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-22 13:21:05 -07:00
Sage Weil
8b15575cae fs: {lock,unlock}_flocks() stubs to prepare for BKL removal
The lock structs are currently protected by the BKL, but are accessed by
code in fs/locks.c and misc file system and DLM code.  These stubs will
allow all users to switch to the new interface before the implementation
is changed to a spinlock.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-21 17:27:44 -07:00
Linus Torvalds
7d7dee96e1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  dca: disable dca on IOAT ver.3.0 multiple-IOH platforms
  netpoll: Disable IRQ around RCU dereference in netpoll_rx
  sctp: Do not reset the packet during sctp_packet_config().
  net/llc: storing negative error codes in unsigned short
  MAINTAINERS: move atlx discussions to netdev
  drivers/net/cxgb3/cxgb3_main.c: prevent reading uninitialized stack memory
  drivers/net/eql.c: prevent reading uninitialized stack memory
  drivers/net/usb/hso.c: prevent reading uninitialized memory
  xfrm: dont assume rcu_read_lock in xfrm_output_one()
  r8169: Handle rxfifo errors on 8168 chips
  3c59x: Remove atomic context inside vortex_{set|get}_wol
  tcp: Prevent overzealous packetization by SWS logic.
  net: RPS needs to depend upon USE_GENERIC_SMP_HELPERS
  phylib: fix PAL state machine restart on resume
  net: use rcu_barrier() in rollback_registered_many
  bonding: correctly process non-linear skbs
  ipv4: enable getsockopt() for IP_NODEFRAG
  ipv4: force_igmp_version ignored when a IGMPv3 query received
  ppp: potential NULL dereference in ppp_mp_explode()
  net/llc: make opt unsigned in llc_ui_setsockopt()
  ...
2010-09-19 11:05:50 -07:00
Herbert Xu
f0f9deae9e netpoll: Disable IRQ around RCU dereference in netpoll_rx
We cannot use rcu_dereference_bh safely in netpoll_rx as we may
be called with IRQs disabled.  We could however simply disable
IRQs as that too causes BH to be disabled and is safe in either
case.

Thanks to John Linville for discovering this bug and providing
a patch.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-17 16:55:03 -07:00
Linus Torvalds
94ca9d669a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: add documentation
2010-09-16 12:50:31 -07:00
Linus Torvalds
9c03f1622a Merge ssh://master.kernel.org/home/hpa/tree/sec
* ssh://master.kernel.org/home/hpa/tree/sec:
  x86-64, compat: Retruncate rax after ia32 syscall entry tracing
  x86-64, compat: Test %rax for the syscall number, not %eax
  compat: Make compat_alloc_user_space() incorporate the access_ok()
2010-09-14 17:07:51 -07:00
Linus Torvalds
de8d4f5d75 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  SUNRPC: Fix the NFSv4 and RPCSEC_GSS Kconfig dependencies
  statfs() gives ESTALE error
  NFS: Fix a typo in nfs_sockaddr_match_ipaddr6
  sunrpc: increase MAX_HASHTABLE_BITS to 14
  gss:spkm3 miss returning error to caller when import security context
  gss:krb5 miss returning error to caller when import security context
  Remove incorrect do_vfs_lock message
  SUNRPC: cleanup state-machine ordering
  SUNRPC: Fix a race in rpc_info_open
  SUNRPC: Fix race corrupting rpc upcall
  Fix null dereference in call_allocate
2010-09-14 17:04:48 -07:00
H. Peter Anvin
c41d68a513 compat: Make compat_alloc_user_space() incorporate the access_ok()
compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area.  A missing call could
introduce problems on some architectures.

This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length.
The existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the
implementation of the new global function.

This patch assumes NULL will cause __get_user()/__put_user() to either
fail or access userspace on all architectures.  This should be
followed by checking the return value of compat_access_user_space()
for NULL in the callers, at which time the access_ok() in the callers
can also be removed.

Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <jejb@parisc-linux.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: <stable@kernel.org>
2010-09-14 16:08:45 -07:00
Linus Torvalds
2bb3a259d8 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  dquot: do full inode dirty in allocating space
2010-09-13 12:46:09 -07:00
Linus Torvalds
6142811a33 Merge branch 'next-spi' of git://git.secretlab.ca/git/linux-2.6
* 'next-spi' of git://git.secretlab.ca/git/linux-2.6:
  spi/pl022: move probe call to subsys_initcall()
  powerpc/5200: mpc52xx_uart.c: Add of_node_put to avoid memory leak
  spi/pl022: fix APB pclk power regression on U300
  spi/spi_s3c64xx: Warn if PIO transfers time out
  spi/s3c64xx: Fix incorrect reuse of 'val' local variable.
  spi/s3c64xx: Fix compilation warning
  spi/dw_spi: clean the cs_control code
  spi/dw_spi: Allow interrupt sharing
  spi/spi_s3c64xx: Increase dead reckoning time in wait_for_xfer()
  spi/spi_s3c64xx: Move to subsys_initcall()
  spi: free children in spi_unregister_master, not siblings
  gpiolib: Add 'struct gpio_chip' forward declaration for !GPIOLIB case
  of: Fix missing includes - ll_temac
  spi/spi_s3c64xx: Staticise non-exported functions
  spi/spi_s3c64xx: Make probe more robust against missing board config
2010-09-13 12:45:50 -07:00
Tejun Heo
c54fce6eff workqueue: add documentation
Update copyright notice and add Documentation/workqueue.txt.

Randy Dunlap, Dave Chinner: misc fixes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-By: Florian Mickler <florian@mickler.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
2010-09-13 10:26:52 +02:00
Trond Myklebust
006abe887c SUNRPC: Fix a race in rpc_info_open
There is a race between rpc_info_open and rpc_release_client()
in that nothing stops a process from opening the file after
the clnt->cl_kref goes to zero.

Fix this by using atomic_inc_unless_zero()...

Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-09-12 19:55:25 -04:00
Linus Torvalds
ff3cb3fec3 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: Range check cpu in blk_cpu_to_group
  scatterlist: prevent invalid free when alloc fails
  writeback: Fix lost wake-up shutting down writeback thread
  writeback: do not lose wakeup events when forking bdi threads
  cciss: fix reporting of max queue depth since init
  block: switch s390 tape_block and mg_disk to elevator_change()
  block: add function call to switch the IO scheduler from a driver
  fs/bio-integrity.c: return -ENOMEM on kmalloc failure
  bio-integrity.c: remove dependency on __GFP_NOFAIL
  BLOCK: fix bio.bi_rw handling
  block: put dev->kobj in blk_register_queue fail path
  cciss: handle allocation failure
  cfq-iosched: Documentation help for new tunables
  cfq-iosched: blktrace print per slice sector stats
  cfq-iosched: Implement tunable group_idle
  cfq-iosched: Do group share accounting in IOPS when slice_idle=0
  cfq-iosched: Do not idle if slice_idle=0
  cciss: disable doorbell reset on reset_devices
  blkio: Fix return code for mkdir calls
2010-09-10 07:26:27 -07:00
David S. Miller
053d8f6622 Merge branch 'vhost-net' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost 2010-09-09 21:59:51 -07:00
Linus Torvalds
df423dc7f2 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata-sff: Reenable Port Multiplier after libata-sff remodeling.
  libata: skip EH autopsy and recovery during suspend
  ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDs
  ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDs
  libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load()
  ahci: fix hang on failed softreset
  pata_artop: Fix device ID parity check
2010-09-09 20:28:19 -07:00
Gwendal Grignou
ea3c64506e libata-sff: Reenable Port Multiplier after libata-sff remodeling.
Keep track of the link on the which the current request is in progress.
It allows support of links behind port multiplier.

Not all libata-sff is PMP compliant. Code for native BMDMA controller
does not take in accound PMP.

Tested on Marvell 7042 and Sil7526.

Signed-off-by: Gwendal Grignou <gwendal@google.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09 22:31:55 -04:00
Tejun Heo
e2f3d75fc0 libata: skip EH autopsy and recovery during suspend
For some mysterious reason, certain hardware reacts badly to usual EH
actions while the system is going for suspend.  As the devices won't
be needed until the system is resumed, ask EH to skip usual autopsy
and recovery and proceed directly to suspend.

Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09 22:27:59 -04:00
Christoph Lameter
aa45484031 mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
cheaper than scanning a number of lists.  To avoid synchronization
overhead, counter deltas are maintained on a per-cpu basis and drained
both periodically and when the delta is above a threshold.  On large CPU
systems, the difference between the estimated and real value of
NR_FREE_PAGES can be very high.  If NR_FREE_PAGES is much higher than
number of real free page in buddy, the VM can allocate pages below min
watermark, at worst reducing the real number of pages to zero.  Even if
the OOM killer kills some victim for freeing memory, it may not free
memory if the exit path requires a new page resulting in livelock.

This patch introduces a zone_page_state_snapshot() function (courtesy of
Christoph) that takes a slightly more accurate view of an arbitrary vmstat
counter.  It is used to read NR_FREE_PAGES while kswapd is awake to avoid
the watermark being accidentally broken.  The estimate is not perfect and
may result in cache line bounces but is expected to be lighter than the
IPI calls necessary to continually drain the per-cpu counters while kswapd
is awake.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:25 -07:00
Hugh Dickins
3399446632 swap: discard while swapping only if SWAP_FLAG_DISCARD
Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs
show a shift since I last tested in December: in part because of firmware
updates, in part because of the necessary move from barriers to awaiting
completion at the block layer.  While discard at swapon still shows as
slightly beneficial on both, discarding 1MB swap cluster when allocating
is now disadvanteous: adds 25% overhead on Intel, adds 230% on OCZ (YMMV).

Surrender: discard as presently implemented is more hindrance than help
for swap; but might prove useful on other devices, or with improvements.
So continue to do the discard at swapon, but make discard while swapping
conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using
only the lower 16 bits of int flags).

We can add a --discard or -d to swapon(8), and a "discard" to swap in
/etc/fstab: matching the mount option for btrfs, ext4, fat, gfs2, nilfs2.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nigel Cunningham <nigel@tuxonice.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:25 -07:00
Hugh Dickins
910321ea81 swap: revert special hibernation allocation
Please revert 2.6.36-rc commit d2997b1042
"hibernation: freeze swap at hibernation".  It complicated matters by
adding a second swap allocation path, just for hibernation; without in any
way fixing the issue that it was intended to address - page reclaim after
fixing the hibernation image might free swap from a page already imaged as
swapcache, letting its swap be reallocated to store a different page of
the image: resulting in data corruption if the imaged page were freed as
clean then swapped back in.  Pages freed to si->swap_map were still in
danger of being reallocated by the alternative allocation path.

I guess it inadvertently fixed slow SSD swap allocation for hibernation,
as reported by Nigel Cunningham: by missing out the discards that occur on
the usual swap allocation path; but that was unintentional, and needs a
separate fix.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ondrej Zary <linux@rainbow-software.org>
Cc: Andrea Gelmini <andrea.gelmini@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nigel Cunningham <nigel@tuxonice.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:25 -07:00
Gregory Bean
5affb60772 gpio: sx150x: correct and refine reset-on-probe behavior
Replace the arbitrary software-reset call from the device-probe
method, because:

- It is defective.  To work correctly, it should be two byte writes,
  not a single word write.  As it stands, it does nothing.

- Some devices with sx150x expanders installed have their NRESET pins
  ganged on the same line, so resetting one causes the others to reset -
  not a nice thing to do arbitrarily!

- The probe, usually taking place at boot, implies a recent hard-reset,
  so a software reset at this point is just a waste of energy anyway.

Therefore, make it optional, defaulting to off, as this will match the
common case of probing at powerup and also matches the current broken
no-op behavior.

Signed-off-by: Gregory Bean <gbean@codeaurora.org>
Reviewed-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:24 -07:00
Andrea Arcangeli
4969c1192d mm: fix swapin race condition
The pte_same check is reliable only if the swap entry remains pinned (by
the page lock on swapcache).  We've also to ensure the swapcache isn't
removed before we take the lock as try_to_free_swap won't care about the
page pin.

One of the possible impacts of this patch is that a KSM-shared page can
point to the anon_vma of another process, which could exit before the page
is freed.

This can leave a page with a pointer to a recycled anon_vma object, or
worse, a pointer to something that is no longer an anon_vma.

[riel@redhat.com: changelog help]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:24 -07:00
Michael S. Tsirkin
31583bb0cf cgroups: fix API thinko
Add cgroup_attach_task_all()

The existing cgroup_attach_task_current_cg() API is called by a thread to
attach another thread to all of its cgroups; this is unsuitable for cases
where a privileged task wants to attach itself to the cgroups of a less
privileged one, since the call must be made from the context of the target
task.

This patch adds a more generic cgroup_attach_task_all() API that allows
both the source task and to-be-moved task to be specified.
cgroup_attach_task_current_cg() becomes a specialization of the more
generic new function.

[menage@google.com: rewrote changelog]
[akpm@linux-foundation.org: address reviewer comments]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Ben Blum <bblum@google.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:23 -07:00
Huang Ying
e0bf1024b3 kfifo: add parenthesis for macro parameter reference
Some macro parameter references inside typeof() operator are not enclosed
with parenthesis.  It should be safer to add them.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:22 -07:00
David Vrabel
f3c65b2870 mmc: avoid getting CID on SDIO-only cards
The introduction of support for SD combo cards breaks the initialization
of all CSR SDIO chips.  The GO_IDLE (CMD0) in mmc_sd_get_cid() causes CSR
chips to be reset (this is non-standard behavior).

When initializing an SDIO card check for a combo card by using the memory
present bit in the R4 response to IO_SEND_OP_COND (CMD5).  This avoids the
call to mmc_sd_get_cid() on an SDIO-only card.

Signed-off-by: David Vrabel <david.vrabel@csr.com>
Acked-by: Michal Mirolaw <mirq-linux@rere.qmqm.pl>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:22 -07:00
Jonathan Corbet
a73f8844e1 lglock: make lg_lock_global() actually lock globally
lg_lock_global() currently only acquires spinlocks for online CPUs, but
it's meant to lock all possible CPUs.  Lglock-protected resources may be
associated with removed CPUs - and, indeed, that could happen with the
per-superblock open files lists.

At Nick's suggestion, change for_each_online_cpu() to
for_each_possible_cpu() to protect accesses to those resources.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 09:09:43 -07:00
Stefan Bader
39aa3cb3e8 mm: Move vma_stack_continue into mm.h
So it can be used by all that need to check for that.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 09:05:06 -07:00
Shaohua Li
d530148ae8 dquot: do full inode dirty in allocating space
Alex Shi found a regression when doing ffsb test. The test has several threads,
and each thread creates a small file, write to it and then delete it. ffsb
reports about 20% regression and Alex bisected it to 43d2932d88. The test
will call __mark_inode_dirty 3 times. without this commit, we only take
inode_lock one time, while with it, we take the lock 3 times with flags (
I_DIRTY_SYNC,I_DIRTY_PAGES,I_DIRTY). Perf shows the lock contention increased
too much. Below proposed patch fixes it.

fs is allocating blocks, which usually means file writes and the inode
will be dirtied soon. We fully dirty the inode to reduce some inode_lock
contention in several calls of __mark_inode_dirty.

Jan Kara: Added comment.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2010-09-09 16:08:51 +02:00
Linus Torvalds
2c20130f20 Merge branch 'semaphore-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'semaphore-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  semaphore: Add DEFINE_SEMAPHORE
2010-09-08 11:15:51 -07:00
Linus Torvalds
1faa6ec8cc Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks
  io-mapping: Fix the address space annotations
  x86: Fix the address space annotations of iomap_atomic_prot_pfn()
  x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev
2010-09-08 11:14:10 -07:00
Linus Torvalds
79637a41e4 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  gcc-4.6: kernel/*: Fix unused but set warnings
  mutex: Fix annotations to include it in kernel-locking docbook
  pid: make setpgid() system call use RCU read-side critical section
  MAINTAINERS: Add RCU's public git tree
2010-09-08 11:13:42 -07:00
Feng Tang
e3e55ff585 spi/dw_spi: clean the cs_control code
commit 052dc7c45i "spi/dw_spi: conditional transfer mode change"
introduced cs_control code, which has a bug by using bit offset
for spi mode to set transfer mode in control register. Also it
forces devices who don't need cs_control to re-configure the
control registers for each spi transfer. This patch will fix them

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-09-08 10:50:00 -06:00
Thomas Gleixner
febc88c594 semaphore: Add DEFINE_SEMAPHORE
The full cleanup of init_MUTEX[_LOCKED] and DECLARE_MUTEX has not been
done. Some of the users are real semaphores and we should name them as
such instead of confusing everyone with "MUTEX".

Provide the infrastructure to get finally rid of init_MUTEX[_LOCKED]
and DECLARE_MUTEX.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
LKML-Reference: <20100907125054.795929962@linutronix.de>
2010-09-08 15:04:10 +02:00