Commit Graph

376617 Commits

Author SHA1 Message Date
Michael Ellerman
cbda6aa10b powerpc/perf: Revert to original NO_SIPR logic
This is a revert and then some of commit 860aad7 "Add regs_no_sipr()".
This workaround was only needed on early chip versions.

As before NO_SIPR becomes a static flag of the PMU struct.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:29 +10:00
Kevin Hao
858957ab1e powerpc/pci: Remove the unused variables in pci_process_bridge_OF_ranges
The codes which ever used these two variables have gone. Throw away
them too.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:28 +10:00
Kevin Hao
2798389604 powerpc/pci: Remove the stale comments of pci_process_bridge_OF_ranges
These comments already don't apply to the current code. So just remove
them.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:28 +10:00
Srivatsa S. Bhat
f274ef8747 powerpc/pseries: Always enable CONFIG_HOTPLUG_CPU on PSERIES SMP
Adam Lackorzynski reported the following build failure on
!CONFIG_HOTPLUG_CPU configuration:

  CC      arch/powerpc/kernel/rtas.o
arch/powerpc/kernel/rtas.c: In function ‘rtas_cpu_state_change_mask’:
arch/powerpc/kernel/rtas.c:843:4: error: implicit declaration of function ‘cpu_down’ [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors
make[1]: *** [arch/powerpc/kernel/rtas.o] Error 1
make: *** [arch/powerpc/kernel] Error 2

The build fails because cpu_down() is defined only under CONFIG_HOTPLUG_CPU.

Looking further, the mobility code in pseries is one of the call-sites which
uses rtas_ibm_suspend_me(), which in turn calls rtas_cpu_state_change_mask().
And the mobility code is unconditionally compiled-in (it does not fall under
any Kconfig option). And commit 120496ac (powerpc: Bring all threads online
prior to migration/hibernation) which introduced this build regression is
critical for the proper functioning of the migration code. So it appears
that the only solution to this problem is to enable CONFIG_HOTPLUG_CPU if
SMP is enabled on PPC_PSERIES platforms. So make that change in the Kconfig.

Reported-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:27 +10:00
Paul Mackerras
8e44ddc3f3 powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation
This adds the remaining two hypercalls defined by PAPR for manipulating
the XICS interrupt controller, H_IPOLL and H_XIRR_X.  H_IPOLL returns
information about the priority and pending interrupts for a virtual
cpu, without changing any state.  H_XIRR_X is like H_XIRR in that it
reads and acknowledges the highest-priority pending interrupt, but it
also returns the timestamp (timebase register value) from when the
interrupt was first received by the hypervisor.  Currently we just
return the current time, since we don't do any software queueing of
virtual interrupts inside the XICS emulation code.

These hcalls are not currently used by Linux guests, but may be in
future.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:27 +10:00
Priyanka Jain
f7b3367774 powerpc/32bit:Store temporary result in r0 instead of r8
Commit a9c4e541ea
"powerpc/kprobe: Complete kprobe and migrate exception frame"
introduced a regression:

While returning from exception handling in case of PREEMPT enabled,
_TIF_NEED_RESCHED bit is checked in TI_FLAGS (thread_info flag) of current
task. Only if this bit is set, it should continue with the process of
calling preempt_schedule_irq() to schedule highest priority task if
available.

Current code assumes that r8 contains TI_FLAGS and check this for
_TIF_NEED_RESCHED, but as r8 is modified in the code which executes before
this check, r8 no longer contains the expected TI_FLAGS information.

As a result check for comparison with _TIF_NEED_RESCHED was failing even if
NEED_RESCHED bit is set in the current thread_info flag. Due to this,
preempt_schedule_irq() and in turn scheduler was not getting called even if
highest priority task is ready for execution.

So, store temporary results in r0 instead of r8 to prevent r8 from getting
modified as subsequent code is dependent on its value.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
CC: <stable@vger.kernel.org> [v3.7+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:27 +10:00
Aneesh Kumar K.V
0608d69246 powerpc/mm: Always invalidate tlb on hpte invalidate and update
If a hash bucket gets full, we "evict" a more/less random entry from it.
When we do that we don't invalidate the TLB (hpte_remove) because we assume
the old translation is still technically "valid". This implies that when
we are invalidating or updating pte, even if HPTE entry is not valid
we should do a tlb invalidate.

This was a regression introduced by b1022fbd29

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:26 +10:00
Michael Neuling
280a5ba22c powerpc/pseries: Improve stream generation comments in copypage/user
No code changes, just documenting what's happening a little better.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:26 +10:00
Michael Neuling
a515348fc6 powerpc/pseries: Kill all prefetch streams on context switch
On context switch, we should have no prefetch streams leak from one
userspace process to another.  This frees up prefetch resources for the
next process.

Based on patch from Milton Miller.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:25 +10:00
Nishanth Aravamudan
2ac6f427ad powerpc/cputable: Fix oprofile_cpu_type on power8
Maynard informed me that neither the oprofile kernel module nor oprofile
userspace has been updated to support that "legacy" oprofile module
interface for power8, which is indicated by "ppc64/power8." This results
in no samples. The solution is to default to the "timer" type, instead.
The raw entry also should be updated, as "ppc64/ibm-compat-v1" indicates
to oprofile userspace to use "compatibility events" which are obsolete
in ISA 2.07.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:25 +10:00
chenhui zhao
e242114aff powerpc/mpic: Fix irq distribution problem when MPIC_SINGLE_DEST_CPU
For the mpic with a flag MPIC_SINGLE_DEST_CPU, only one bit should be
set in interrupt destination registers.

The code is applicable to 64-bit platforms as well as 32-bit.

Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:24 +10:00
Michael Neuling
2b3f8e87cf powerpc/tm: Fix userspace stack corruption on signal delivery for active transactions
When in an active transaction that takes a signal, we need to be careful with
the stack.  It's possible that the stack has moved back up after the tbegin.
The obvious case here is when the tbegin is called inside a function that
returns before a tend.  In this case, the stack is part of the checkpointed
transactional memory state.  If we write over this non transactionally or in
suspend, we are in trouble because if we get a tm abort, the program counter
and stack pointer will be back at the tbegin but our in memory stack won't be
valid anymore.

To avoid this, when taking a signal in an active transaction, we need to use
the stack pointer from the checkpointed state, rather than the speculated
state.  This ensures that the signal context (written tm suspended) will be
written below the stack required for the rollback.  The transaction is aborted
becuase of the treclaim, so any memory written between the tbegin and the
signal will be rolled back anyway.

For signals taken in non-TM or suspended mode, we use the
normal/non-checkpointed stack pointer.

Tested with 64 and 32 bit signals

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:23 +10:00
Michael Neuling
b75c100ef2 powerpc/tm: Move TM abort cause codes to uapi
These cause codes are usable by userspace, so let's export to uapi.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:23 +10:00
Michael Neuling
6ce6c629fd powerpc/tm: Abort on emulation and alignment faults
If we are emulating an instruction inside an active user transaction that
touches memory, the kernel can't emulate it as it operates in transactional
suspend context.  We need to abort these transactions and send them back to
userspace for the hardware to rollback.

We can service these if the user transaction is in suspend mode, since the
kernel will operate in the same suspend context.

This adds a check to all alignment faults and to specific instruction
emulations (only string instructions for now).  If the user process is in an
active (non-suspended) transaction, we abort the transaction go back to
userspace allowing the HW to roll back the transaction and tell the user of the
failure.  This also adds new tm abort cause codes to report the reason of the
persistent error to the user.

Crappy test case here http://neuling.org/devel/junkcode/aligntm.c

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:22 +10:00
Michael Neuling
24b92375dc powerpc/tm: Update cause codes documentation
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # 3.9 only
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:22 +10:00
Michael Neuling
35f7097fce powerpc/tm: Make room for hypervisor in abort cause codes
PAPR carves out 0xff-0xe0 for hypervisor use of transactional memory software
abort cause codes.  Unfortunately we don't respect this currently.

Below fixes this to move our cause codes to below this region.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # 3.9 only
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:22 +10:00
Linus Torvalds
1d822d6094 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull reiserfs fixes from Jan Kara:
 "Three reiserfs fixes.  They fix real problems spotted by users so I
  hope they are ok even at this stage."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  reiserfs: fix deadlock with nfs racing on create/lookup
  reiserfs: fix problems with chowning setuid file w/ xattrs
  reiserfs: fix spurious multiple-fill in reiserfs_readdir_dentry
2013-06-01 06:59:14 +09:00
Linus Torvalds
7cfb953258 xfs: extended attribute fixes for CRCs
- Remove assert on count of remote attribute CRC headers
 - Fix the number of blocks read in for remote attributes
 - Zero remote attribute tails properly
 - Fix mapping of remote attribute buffers to have correct length
 - initialize temp leaf properly in xfs_attr3_leaf_unbalance, and
   xfs_attr3_leaf_compact
 - Rework remote atttributes to have a header per block
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJRqQE3AAoJENaLyazVq6ZOBHMQAIaaBdvrHV0tg/EyzVCuVFul
 BJH2RrL7WqFe6cXjkb5qV9neesj/dyDoSOEVqko7klNxlUhnSK6j9UnM9QfeAZVg
 x22TQ+c938giCu/iibdaFsyEVIgX2WfjPY+OoKJh1kTPOimO1ekRE/H8qRJ3A5Ge
 hl5H/pyDcJXiXowNVQEF6sPj59BzEn65kelpIn/uW5BzrmdIjfM+wnvr4mvkxrCS
 0FM9acLmEl/ri448wsZjrlA8yltqg/J8LS+oc6AgWhWGB9qL3LzM7xtQblv47ail
 kJrvJek8RMQT1nQ8N1gQa0jCSCtWB+Lir04ynTM0b5h2NKcm6iFih5kd6a+wl5qd
 p26LkTSjIQLv21dx9fylTvUIf4xx0NCW/BNqSPpEgTXi6FUiYGzEq7iuvCfbGSYf
 +k9PB3KkVhmRzv/IWvP8OmnrhXZhQVo55Z+OulHwuaZ/+orL0Y0TtypW6imjETK5
 b/JiRoVaN1OroZUyg1qk4OnYftl6j93X3iyLniIZiI5hTMNZBhiqe4EcHd++pYDN
 gpaQC/V/9pAgSWhyyD6E5Jv7HvkUuOc73mSPHH3oz5WAoEI6I1XWx+pExo15BiZD
 FADL3HSnnvGDgGxfWb0nYSS/M0KdAT9lpu+48lKg1sIXVANnYkHxJNofM6ZXe042
 BUuItw8SCGARf9JEJF8J
 =VDbP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.10-rc4-crc-xattr-fixes' of git://oss.sgi.com/xfs/xfs

Pull xfs extended attribute fixes for CRCs from Ben Myers:
 "Here are several fixes that are relevant on CRC enabled XFS
  filesystems.  They are followed by a rework of the remote attribute
  code so that each block of the attribute contains a header with a CRC.

  Previously there was a CRC header per extent in the remote attribute
  code, but this was untenable because it was not possible to know how
  many extents would be allocated for the attribute until after the
  allocation has completed, due to the fragmentation of free space.
  This became complicated because the size of the headers needs to be
  added to the length of the payload to get the overall length required
  for the allocation.  With a header per block, things are less
  complicated at the cost of a little space.

  I would have preferred to defer this and the rest of the CRC queue to
  3.11 to mitigate risk for existing non-crc users in 3.10.  Doing so
  would require setting a feature bit for the on-disk changes, and so I
  have been pressured into sending this pull request by Eric Sandeen and
  David Chinner from Red Hat.  I'll send another pull request or two
  with the rest of the CRC queue next week.

   - Remove assert on count of remote attribute CRC headers
   - Fix the number of blocks read in for remote attributes
   - Zero remote attribute tails properly
   - Fix mapping of remote attribute buffers to have correct length
   - initialize temp leaf properly in xfs_attr3_leaf_unbalance, and
     xfs_attr3_leaf_compact
   - Rework remote atttributes to have a header per block"

* tag 'for-linus-v3.10-rc4-crc-xattr-fixes' of git://oss.sgi.com/xfs/xfs:
  xfs: rework remote attr CRCs
  xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
  xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance
  xfs: correctly map remote attr buffers during removal
  xfs: remote attribute tail zeroing does too much
  xfs: remote attribute read too short
  xfs: remote attribute allocation may be contiguous
2013-06-01 06:56:21 +09:00
Linus Torvalds
e8d256aca0 xfs: fixes for 3.10-rc4
- Fix nested transactions in xfs_qm_scall_setqlim
 - Clear suid/sgid bits when we truncate with size update
 - Fix recovery for split buffers
 - Fix block count on remote symlinks
 - Add fsgeom flag for v5 superblock support
 - Disable XFS_IOC_SWAPEXT for CRC enabled filesystems
 - Fix dirv3 freespace block corruption
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJRqPJxAAoJENaLyazVq6ZOxVIQAIPG+G+PKTQsbj9kNAxk2yGb
 jAD9HD9bbFFD1Xz3RoO/CoJpXcxo/Gj/6r94ZNTlmvR1SHyL4P/9gCBI8F9Su+Ru
 wAtSyWyubV8bERYLfQ5zys7sNWBGhwYkHyjslIkuoRMv6jlAFpZugghKbF7hNCKg
 qNCkMlaj0uQQAWLAjiWMUwMYjQEhzmvds1VCf4pSCycI3/opYIi9ZV6Wr5vHcJkw
 nicZa5M9OH1oiIADKiMBXMdNGjmLx3l3gM7eqOoJeJWlok6MSp0AciB2hYwDuz/8
 5cxkId5dHO0IV1Bf0N4vYDGxnDBRbbncVzywFYS0WS/4MjL8IGR7jUzGZV+bH52v
 jax5TbS6EkbmgVr2hErmjBvUoAnVRZ1rbAYDNpvudhNjAz0730y0Zy17J2ahja6Q
 0wunmpyd/3hfaeczVu1DkXDWIYv6c0yGxz2Ca9o+bk9PIS7T0NjVWc2niATtnPWh
 HaQU0Ix/2y2zHWnSFJ28iikG1Dm5iTyByCUK1sToHLilB7o6GA8HVVzf5T02usN4
 OxLWqKunwUQzlTJ9Ng6YUGlI/iZiep/VgSY59QpivF6KwMN+S7XE1lPnGYRvHNVy
 gJCjyjbT9qPsJ1gy5pun9IuzbusYPpOzU/OUQtDNx35xyrRfV4Jy1fijSCHykHM+
 AWVVy0u03OueOE3/LQGv
 =r+vv
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.10-rc4' of git://oss.sgi.com/xfs/xfs

Pull xfs fixes from Ben Myers:
 - Fix nested transactions in xfs_qm_scall_setqlim
 - Clear suid/sgid bits when we truncate with size update
 - Fix recovery for split buffers
 - Fix block count on remote symlinks
 - Add fsgeom flag for v5 superblock support
 - Disable XFS_IOC_SWAPEXT for CRC enabled filesystems
 - Fix dirv3 freespace block corruption

* tag 'for-linus-v3.10-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: fix dir3 freespace block corruption
  xfs: disable swap extents ioctl on CRC enabled filesystems
  xfs: add fsgeom flag for v5 superblock support.
  xfs: fix incorrect remote symlink block count
  xfs: fix split buffer vector log recovery support
  xfs: kill suid/sgid through the truncate path.
  xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
2013-06-01 06:50:16 +09:00
Linus Torvalds
977b55cf98 Can't call pci_get_domain_bus_and_slot() from interupt context
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRp5IvAAoJEKurIx+X31iBGqMP/2aXdk0hBXKM76uCyYt1cn/X
 MibhCgX17L65BOvL19CB0Ylj+uAlgAw/0qzwMqP8eKgt1Unx0vzY5Pf0B1nOsHeF
 YQU2RAhu5Xw23sTSf3b9tRADki4n14pzhwfTzOSqY/a9JxeatoFElCyMjcXL5UaY
 hL96Ke1pvWZrAI3VT9tneH4AV0qfMajd7xSdCyS22+IdzcxEw/xln/SCXZieeFxE
 WQpNGGB/pSwJYnTU8nNJQdFLggO19Sa0v+rOZJzQ08BuKL1nJ6rlEZejk6T668So
 RJNlGTBTHdwMiH7HctUb9iYcTfyhGlZ+kxdq7puRtgKABbjMnzlSWpbJzIv/LSsm
 M0atocT9Se82V9g6g4lizEeBuwUjfv94GXrWlAtVgpi8KKpF0LOwMeeFvciFp96k
 p6hILjqiYVF+G9lP0Vkh1DPp9a2MfROLX0+GKZtuLjloZ0WJbqdOY8wHvFApsltL
 B5szN4TLLutQfHkfqH27MIDtFQmJe+tieMJ0EyfAJan4eigrlZu0iqw4tJkZWiHj
 yWIIgwaxFvZA639yWObVfk52nesjNuWNtVfKM+OjcUsR0O7UO9ZhcoI9Oew7lxqD
 XoEv4FgtPBFDz7Q659auUMrHx4cZ/STkxikUrLwQmR0QeR9nVW9OnGBGTsR8EKm+
 tymC0x7AYmnDJUPoZHqM
 =ToBG
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-aertracefix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull aer error logging fix from Tony Luck:
 "Can't call pci_get_domain_bus_and_slot() from interupt context"

* tag 'please-pull-aertracefix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  aerdrv: Move cper_print_aer() call out of interrupt context
2013-06-01 06:47:04 +09:00
Linus Torvalds
fe696b47eb - Module compilation issues (symbol not exported).
- Plug a hole where user space can bring the kernel down.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.9 (GNU/Linux)
 
 iQIcBAABAgAGBQJRqLxuAAoJEGvWsS0AyF7xeqoP/0aq848/C5mZRD0vhC3NzHLk
 Zkzwt1uGn4SF/kFoliITPhmkpGEBaLnoE3WA/7uinb7GpeSzfRTx6LNqNf4H0mvT
 K2NCB6BA4l2ve/x1IJ5NmV44ODMG8kxyTZlJBWJ7QQKf3a+q8pjatP4N621bbkZx
 1aNrT+IaNeDpZt6g07nhb+8+XN3qy5NzjlsRKNxF93Nnr4BCxLVPq+NxcHbr+FHl
 87M1hfTljhCnO4GP5OkUSIZppaPqAfVAOc0Y8B6ivbcVaCyNY9eRnb6rl60SJpBJ
 6yhL2BzwtllkFtnoKjBViJglQWHwFlO0nI8A9DzzO986nNLKX3ks9cBXIlqhpRfR
 Y4ZWVoDRIl6mw9oOVngfRO6mxXv0AdIkXbYxSYIzYGxA8TWTvGX5gcAHoX142+E2
 +qqoSpkkaXMrZ6slR1hHfZ2QlzuRs+Xq06Mte0IWCA3pa6c0skDQd9WkjaCRu6Gu
 072CUzBy4bxucFTIBtJanhNKtOmTa4F+W548QgsEcuhvg0mwksaabowdMlVzn9FR
 ttgb84VRA3VywHn5rRQRmPVUKxxUEjP0dcc6zELRwT9TgTdOaxDd7sPg43Odo4rS
 WnplZPLOnGNR/x94qNxbql3O4qRaIv2R36C3DfnUv01cqqrJI2Y6mruC8kJAm7TW
 QsrN64qznqKbmOJqIpPx
 =bsjQ
 -----END PGP SIGNATURE-----

Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64

Pull arm64 fixes from Catalin Marinas:
 - Module compilation issues (symbol not exported).
 - Plug a hole where user space can bring the kernel down.

* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
  arm64: don't kill the kernel on a bad esr from el0
  arm64: treat unhandled compat el0 traps as undef
  arm64: Do not report user faults for handled signals
  arm64: kernel: compiling issue, need 'EXPORT_SYMBOL(clear_page)'
2013-06-01 06:45:10 +09:00
Jeff Mahoney
a1457c0ce9 reiserfs: fix deadlock with nfs racing on create/lookup
Reiserfs is currently able to be deadlocked by having two NFS clients
where one has removed and recreated a file and another is accessing the
file with an open file handle.

If one client deletes and recreates a file with timing such that the
recreated file obtains the same [dirid, objectid] pair as the original
file while another client accesses the file via file handle, the create
and lookup can race and deadlock if the lookup manages to create the
in-memory inode first.

The create thread, in insert_inode_locked4, will hold the write lock
while waiting on the other inode to be unlocked. The lookup thread,
anywhere in the iget path, will release and reacquire the write lock while
it schedules. If it needs to reacquire the lock while the create thread
has it, it will never be able to make forward progress because it needs
to reacquire the lock before ultimately unlocking the inode.

This patch drops the write lock across the insert_inode_locked4 call so
that the ordering of inode_wait -> write lock is retained. Since this
would have been the case before the BKL push-down, this is safe.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-05-31 23:14:20 +02:00
Jeff Mahoney
4a8570112b reiserfs: fix problems with chowning setuid file w/ xattrs
reiserfs_chown_xattrs() takes the iattr struct passed into ->setattr
and uses it to iterate over all the attrs associated with a file to change
ownership of xattrs (and transfer quota associated with the xattr files).

When the setuid bit is cleared during chown, ATTR_MODE and iattr->ia_mode
are passed to all the xattrs as well. This means that the xattr directory
will have S_IFREG added to its mode bits.

This has been prevented in practice by a missing IS_PRIVATE check
in reiserfs_acl_chmod, which caused a double-lock to occur while holding
the write lock. Since the file system was completely locked up, the
writeout of the corrupted mode never happened.

This patch temporarily clears everything but ATTR_UID|ATTR_GID for the
calls to reiserfs_setattr and adds the missing IS_PRIVATE check.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-05-31 23:14:11 +02:00
Jeff Mahoney
0bdc7acba5 reiserfs: fix spurious multiple-fill in reiserfs_readdir_dentry
After sleeping for filldir(), we check to see if the file system has
changed and research. The next_pos pointer is updated but its value
isn't pushed into the key used for the search itself. As a result,
the search returns the same item that the last cycle of the loop did
and filldir() is called multiple times with the same data.

The end result is that the buffer can contain the same name multiple
times. This can be returned to userspace or used internally in the
xattr code where it can manifest with the following warning:

jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2)

reiserfs_for_each_xattr uses reiserfs_readdir_dentry to iterate over
the xattr names and ends up trying to unlink the same name twice. The
second attempt fails with -ENOENT and the error is returned. At some
point I'll need to add support into reiserfsck to remove the orphaned
directories left behind when this occurs.

The fix is to push the value into the key before researching.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2013-05-31 23:13:58 +02:00
Al Viro
4ad1f70ebc zoran: racy refcount handling in vm_ops ->open()/->close()
worse, we lock ->resource_lock too late when we are destroying the
final clonal VMA; the check for lack of other mappings of the same
opened file can race with mmap().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31 15:33:32 -04:00
Richard Genoud
56c21b53ab atmel_lcdfb: blank the backlight on remove
When removing atmel_lcdfb module, the backlight is unregistered but not
blanked. (only for CONFIG_BACKLIGHT_ATMEL_LCDC case).
This can result in the screen going full white depending on how the PWM
is wired.

Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
2013-06-01 03:18:55 +08:00
Richard Genoud
65ac057bce trivial: atmel_lcdfb: add missing error message
When a too small framebuffer is given, the atmel_lcdfb_check_var
silently fails.
Adding an error message will save some head scratching.

Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
2013-06-01 03:18:30 +08:00
Al Viro
448293aadb befs_readdir(): do not increment ->f_pos if filldir tells us to stop
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31 15:17:56 -04:00
Al Viro
31abdab9c1 hpfs: deadlock and race in directory lseek()
For one thing, there's an ABBA deadlock on hpfs fs-wide lock and i_mutex
in hpfs_dir_lseek() - there's a lot of methods that grab the former with
the caller already holding the latter, so it must take i_mutex first.

For another, locking the damn thing, carefully validating the offset,
then dropping locks and assigning the offset is obviously racy.

Moreover, we _must_ do hpfs_add_pos(), or the machinery in dnode.c
won't modify the sucker on B-tree surgeries.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31 15:17:43 -04:00
Al Viro
1d7095c72d qnx6: qnx6_readdir() has a braino in pos calculation
We want to mask lower 5 bits out, not leave only those and clear the
rest...  As it is, we end up always starting to read from the beginning
of directory, no matter what the current position had been.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31 15:17:31 -04:00
Jan Beulich
801d9d26bf fix buffer leak after "scsi: saner replacements for ->proc_info()"
That patch failed to set proc_scsi_fops' .release method.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31 15:16:51 -04:00
Takashi Iwai
5d477b6079 vfs: Fix invalid ida_remove() call
When the group id of a shared mount is not allocated, the umount still
tries to call mnt_release_group_id(), which eventually hits a kernel
warning at ida_remove() spewing a message like:
  ida_remove called for id=0 which is not allocated.

This patch fixes the bug simply checking the group id in the caller.

Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31 15:16:33 -04:00
Christian Borntraeger
e86cbd8765 s390/pgtable: Fix gmap notifier address
The address of the gmap notifier was broken, resulting in
unhandled validity intercepts in KVM. Fix the rmap->vmaddr
to be on a segment boundary.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-05-31 17:23:53 +02:00
Stefan Weinhuber
8b811bae69 s390/dasd: fix handling of gone paths
When a path is gone and dasd_generic_path_event is called with a
PE_PATH_GONE event, we must assume that any I/O request on that
subchannel is still running. This is unlike the dasd_generic_notify
handler and the CIO_NO_PATH event, which implies that the subchannel
has been cleared.
If dasd_generic_path_event finds that the path has been the last
usable path, it must not call dasd_generic_last_path_gone (which would
reset the state of running requests), but just set the
DASD_STOPPED_DC_WAIT bit.

Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-05-31 17:23:49 +02:00
Mark Rutland
9955ac47f4 arm64: don't kill the kernel on a bad esr from el0
Rather than completely killing the kernel if we receive an esr value we
can't deal with in the el0 handlers, send the process a SIGILL and log
the esr value in the hope that we can debug it. If we receive a bad esr
from el1, we'll die() as before.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: stable@vger.kernel.org
2013-05-31 16:04:51 +01:00
Mark Rutland
381cc2b970 arm64: treat unhandled compat el0 traps as undef
Currently, if a compat process reads or writes from/to a disabled
cp15/cp14 register, the trap is not handled by the el0_sync_compat
handler, and the kernel will head to bad_mode, where it will die(), and
oops(). For 64 bit processes, disabled system register accesses are
currently treated as unhandled instructions.

This patch modifies entry.S to treat these unhandled traps as undefined
instructions, sending a SIGILL to userspace. This gives processes a
chance to handle this and stop using inaccessible registers, and
prevents further issues in the kernel as a result of the die().

Reported-by: Johannes Jensen <Johannes.Jensen@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-05-31 16:04:44 +01:00
Finn Thain
df66834a43 m68k/mac: Fix unexpected interrupt with CONFIG_EARLY_PRINTK
The present code does not wait for the SCC to finish resetting itself
before trying to initialise the device. The result is that the SCC
interrupt sources become enabled (if they weren't already). This leads to
an early boot crash (unexpected interrupt) given CONFIG_EARLY_PRINTK. Fix
this by adding a delay. A successful reset disables the interrupt sources.

Also, after the reset for channel A setup, the SCC then gets a second
reset for channel B setup which leaves channel A uninitialised again. Fix
this by performing the reset only once.

Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Cc: stable@vger.kernel.org
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2013-05-31 10:43:18 +02:00
Nicholas Bellinger
aafc9d158b iscsi-target: Fix iscsit_free_cmd() se_cmd->cmd_kref shutdown handling
With the introduction of target_get_sess_cmd() referencing counting for
ISCSI_OP_SCSI_CMD processing with iser-target, iscsit_free_cmd() usage
in traditional iscsi-target driver code now needs to be aware of the
active I/O shutdown case when a remaining se_cmd->cmd_kref reference may
exist after transport_generic_free_cmd() completes, requiring a final
target_put_sess_cmd() to release iscsi_cmd descriptor memory.

This patch changes iscsit_free_cmd() to invoke __iscsit_free_cmd() before
transport_generic_free_cmd() -> target_put_sess_cmd(), and also avoids
aquiring the per-connection queue locks for typical fast-path calls
during normal ISTATE_REMOVE operation.

Also update iscsit_free_cmd() usage throughout iscsi-target to
use the new 'bool shutdown' parameter.

This patch fixes a regression bug introduced during v3.10-rc1 in
commit 3e1c81a95, that was causing the following WARNING to appear:

[  257.235153] ------------[ cut here]------------
[  257.240314] WARNING: at kernel/softirq.c:160 local_bh_enable_ip+0x3c/0x86()
[  257.248089] Modules linked in: vhost_scsi ib_srpt ib_cm ib_sa ib_mad ib_core tcm_qla2xxx tcm_loop
	tcm_fc libfc iscsi_target_mod target_core_pscsi target_core_file
	target_core_iblock target_core_mod configfs ipv6 iscsi_tcp libiscsi_tcp
	libiscsi scsi_transport_iscsi loop acpi_cpufreq freq_table mperf
	kvm_intel kvm crc32c_intel button ehci_pci pcspkr joydev i2c_i801
	microcode ext3 jbd raid10 raid456 async_pq async_xor xor async_memcpy
	async_raid6_recov raid6_pq async_tx raid1 raid0 linear igb hwmon
	i2c_algo_bit i2c_core ptp ata_piix libata qla2xxx uhci_hcd ehci_hcd
	mlx4_core scsi_transport_fc scsi_tgt pps_core
[  257.308748] CPU: 1 PID: 3295 Comm: iscsi_ttx Not tainted 3.10.0-rc2+ #103
[  257.316329] Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.00.0057.031020111721 03/10/2011
[  257.327597]  ffffffff814c24b7 ffff880458331b58 ffffffff8138eef2 ffff880458331b98
[  257.335892]  ffffffff8102c052 ffff880400000008 0000000000000000 ffff88085bdf0000
[  257.344191]  ffff88085bdf00d8 ffff88085bdf00e0 ffff88085bdf00f8 ffff880458331ba8
[  257.352488] Call Trace:
[  257.355223]  [<ffffffff8138eef2>] dump_stack+0x19/0x1f
[  257.360963]  [<ffffffff8102c052>] warn_slowpath_common+0x62/0x7b
[  257.367669]  [<ffffffff8102c080>] warn_slowpath_null+0x15/0x17
[  257.374181]  [<ffffffff81032345>] local_bh_enable_ip+0x3c/0x86
[  257.380697]  [<ffffffff813917fd>] _raw_spin_unlock_bh+0x10/0x12
[  257.387311]  [<ffffffffa029069c>] iscsit_free_r2ts_from_list+0x5e/0x67 [iscsi_target_mod]
[  257.396438]  [<ffffffffa02906c5>] iscsit_release_cmd+0x20/0x223 [iscsi_target_mod]
[  257.404893]  [<ffffffffa02977a4>] lio_release_cmd+0x3a/0x3e [iscsi_target_mod]
[  257.412964]  [<ffffffffa01d59a1>] target_release_cmd_kref+0x7a/0x7c [target_core_mod]
[  257.421712]  [<ffffffffa01d69bc>] target_put_sess_cmd+0x5f/0x7f [target_core_mod]
[  257.430071]  [<ffffffffa01d6d6d>] transport_release_cmd+0x59/0x6f [target_core_mod]
[  257.438625]  [<ffffffffa01d6eb4>] transport_put_cmd+0x131/0x140 [target_core_mod]
[  257.446985]  [<ffffffffa01d6192>] ? transport_wait_for_tasks+0xfa/0x1d5 [target_core_mod]
[  257.456121]  [<ffffffffa01d6f11>] transport_generic_free_cmd+0x4e/0x52 [target_core_mod]
[  257.465159]  [<ffffffff81050537>] ? __migrate_task+0x110/0x110
[  257.471674]  [<ffffffffa02904ba>] iscsit_free_cmd+0x46/0x55 [iscsi_target_mod]
[  257.479741]  [<ffffffffa0291edb>] iscsit_immediate_queue+0x301/0x353 [iscsi_target_mod]
[  257.488683]  [<ffffffffa0292f7e>] iscsi_target_tx_thread+0x1c6/0x2a8 [iscsi_target_mod]
[  257.497623]  [<ffffffff81047486>] ? wake_up_bit+0x25/0x25
[  257.503652]  [<ffffffffa0292db8>] ? iscsit_ack_from_expstatsn+0xd5/0xd5 [iscsi_target_mod]
[  257.512882]  [<ffffffff81046f89>] kthread+0xb0/0xb8
[  257.518329]  [<ffffffff81046ed9>] ? kthread_freezable_should_stop+0x60/0x60
[  257.526105]  [<ffffffff81396fec>] ret_from_fork+0x7c/0xb0
[  257.532133]  [<ffffffff81046ed9>] ? kthread_freezable_should_stop+0x60/0x60
[  257.539906] ---[ end trace 5520397d0f2e0800 ]---

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-31 01:21:28 -07:00
Nicholas Bellinger
d5ddad4168 target: Propigate up ->cmd_kref put return via transport_generic_free_cmd
Go ahead and propigate up the ->cmd_kref put return value from
target_put_sess_cmd() -> transport_release_cmd() -> transport_put_cmd()
-> transport_generic_free_cmd().

This is useful for certain fabrics when determining the active I/O
shutdown case with SCF_ACK_KREF where a final target_put_sess_cmd()
is still required by the caller.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-31 01:21:23 -07:00
Linus Torvalds
a93cb29aca Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "One qxl 32-bit warning fix, the rest is a bunch of radeon fixes from
  Alex for some issues we've been seeing."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/qxl: fix build warnings on 32-bit
  radeon: use max_bus_speed to activate gen2 speeds
  drm/radeon: narrow scope of Apple re-POST hack
  drm/radeon: don't check crtcs in card_posted() on cards without DCE
  drm/radeon: fix card_posted check for newer asics
  drm/radeon: fix typo in cu_per_sh on verde
  drm/radeon: UVD block on SUMO2 is the same as on SUMO
2013-05-31 16:04:05 +10:00
Dave Airlie
970fa986fa drm/qxl: fix build warnings on 32-bit
Just the usual printk related warnings.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-05-31 12:45:09 +10:00
Fabio Estevam
d68c380590 clk: mxs: Include clk mxs header file
Fix the following sparse warnings:

drivers/clk/mxs/clk-imx28.c:72:5: warning: symbol 'mxs_saif_clkmux_select' was not declared. Should it be static?
drivers/clk/mxs/clk-imx28.c:156:12: warning: symbol 'mx28_clocks_init' was not declared. Should it be static?

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: fixed $SUBJECT line]
2013-05-30 18:27:24 -07:00
Kees Cook
cea4dcfdad iscsi-target: fix heap buffer overflow on error
If a key was larger than 64 bytes, as checked by iscsi_check_key(), the
error response packet, generated by iscsi_add_notunderstood_response(),
would still attempt to copy the entire key into the packet, overflowing
the structure on the heap.

Remote preauthentication kernel memory corruption was possible if a
target was configured and listening on the network.

CVE-2013-2850

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-30 18:07:54 -07:00
Linus Torvalds
4203afc3fb Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
 "A couple minor fixes for the (new to 3.10) gss-proxy code.

  And one regression from user-namespace changes.  (XBMC clients were
  doing something admittedly weird--sending -1 gid's--but something that
  we used to allow.)"

* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
  svcrpc: fix failures to handle -1 uid's and gid's
  svcrpc: implement O_NONBLOCK behavior for use-gss-proxy
  svcauth_gss: fix error code in use_gss_proxy()
2013-05-31 09:48:56 +09:00
Nicholas Bellinger
21363ca873 target/file: Fix off-by-one READ_CAPACITY bug for !S_ISBLK export
This patch fixes a bug where FILEIO was incorrectly reporting the number
of logical blocks (+ 1) when using non struct block_device export mode.

It changes fd_get_blocks() to follow all other backend ->get_blocks() cases,
and reduces the calculated dev_size by one dev->dev_attrib.block_size
number of bytes, and also fixes initial fd_block_size assignment at
fd_configure_device() time introduced in commit 0fd97ccf4.

Reported-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reported-by: Badari Pulavarty <pbadari@us.ibm.com>
Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-30 17:46:27 -07:00
Linus Torvalds
484b002e28 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:

 - Three EFI-related fixes

 - Two early memory initialization fixes

 - build fix for older binutils

 - fix for an eager FPU performance regression -- currently we don't
   allow the use of the FPU at interrupt time *at all* in eager mode,
   which is clearly wrong.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Allow FPU to be used at interrupt time even with eagerfpu
  x86, crc32-pclmul: Fix build with older binutils
  x86-64, init: Fix a possible wraparound bug in switchover in head_64.S
  x86, range: fix missing merge during add range
  x86, efi: initial the local variable of DataSize to zero
  efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse
  efivarfs: Never return ENOENT from firmware again
2013-05-31 09:44:10 +09:00
Pekka Riikonen
5187b28ff0 x86: Allow FPU to be used at interrupt time even with eagerfpu
With the addition of eagerfpu the irq_fpu_usable() now returns false
negatives especially in the case of ksoftirqd and interrupted idle task,
two common cases for FPU use for example in networking/crypto.  With
eagerfpu=off FPU use is possible in those contexts.  This is because of
the eagerfpu check in interrupted_kernel_fpu_idle():

...
  * For now, with eagerfpu we will return interrupted kernel FPU
  * state as not-idle. TBD: Ideally we can change the return value
  * to something like __thread_has_fpu(current). But we need to
  * be careful of doing __thread_clear_has_fpu() before saving
  * the FPU etc for supporting nested uses etc. For now, take
  * the simple route!
...
 	if (use_eager_fpu())
 		return 0;

As eagerfpu is automatically "on" on those CPUs that also have the
features like AES-NI this patch changes the eagerfpu check to return 1 in
case the kernel_fpu_begin() has not been said yet.  Once it has been the
__thread_has_fpu() will start returning 0.

Notice that with eagerfpu the __thread_has_fpu is always true initially.
FPU use is thus always possible no matter what task is under us, unless
the state has already been saved with kernel_fpu_begin().

[ hpa: this is a performance regression, not a correctness regression,
  but since it can be quite serious on CPUs which need encryption at
  interrupt time I am marking this for urgent/stable. ]

Signed-off-by: Pekka Riikonen <priikone@iki.fi>
Link: http://lkml.kernel.org/r/alpine.GSO.2.00.1305131356320.18@git.silcnet.org
Cc: <stable@vger.kernel.org> v3.7+
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30 16:36:42 -07:00
Jan Beulich
2baad6121e x86, crc32-pclmul: Fix build with older binutils
binutils prior to 2.18 (e.g. the ones found on SLE10) don't support
assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ
in the same file should be used.

This requires making the helper macros capable of recognizing 32-bit
general purpose register operands.

[ hpa: tagging for stable as it is a low risk build fix ]

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com
Cc: Alexander Boyko <alexander_boyko@xyratex.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30 16:36:23 -07:00
Dave Chinner
7bc0dc271e xfs: rework remote attr CRCs
Note: this changes the on-disk remote attribute format. I assert
that this is OK to do as CRCs are marked experimental and the first
kernel it is included in has not yet reached release yet. Further,
the userspace utilities are still evolving and so anyone using this
stuff right now is a developer or tester using volatile filesystems
for testing this feature. Hence changing the format right now to
save longer term pain is the right thing to do.

The fundamental change is to move from a header per extent in the
attribute to a header per filesytem block in the attribute. This
means there are more header blocks and the parsing of the attribute
data is slightly more complex, but it has the advantage that we
always know the size of the attribute on disk based on the length of
the data it contains.

This is where the header-per-extent method has problems. We don't
know the size of the attribute on disk without first knowing how
many extents are used to hold it. And we can't tell from a
mapping lookup, either, because remote attributes can be allocated
contiguously with other attribute blocks and so there is no obvious
way of determining the actual size of the atribute on disk short of
walking and mapping buffers.

The problem with this approach is that if we map a buffer
incorrectly (e.g. we make the last buffer for the attribute data too
long), we then get buffer cache lookup failure when we map it
correctly. i.e. we get a size mismatch on lookup. This is not
necessarily fatal, but it's a cache coherency problem that can lead
to returning the wrong data to userspace or writing the wrong data
to disk. And debug kernels will assert fail if this occurs.

I found lots of niggly little problems trying to fix this issue on a
4k block size filesystem, finally getting it to pass with lots of
fixes. The thing is, 1024 byte filesystems still failed, and it was
getting really complex handling all the corner cases that were
showing up. And there were clearly more that I hadn't found yet.

It is complex, fragile code, and if we don't fix it now, it will be
complex, fragile code forever more.

Hence the simple fix is to add a header to each filesystem block.
This gives us the same relationship between the attribute data
length and the number of blocks on disk as we have without CRCs -
it's a linear mapping and doesn't require us to guess anything. It
is simple to implement, too - the remote block count calculated at
lookup time can be used by the remote attribute set/get/remove code
without modification for both CRC and non-CRC filesystems. The world
becomes sane again.

Because the copy-in and copy-out now need to iterate over each
filesystem block, I moved them into helper functions so we separate
the block mapping and buffer manupulations from the attribute data
and CRC header manipulations. The code becomes much clearer as a
result, and it is a lot easier to understand and debug. It also
appears to be much more robust - once it worked on 4k block size
filesystems, it has worked without failure on 1k block size
filesystems, too.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit ad1858d777)
2013-05-30 17:26:31 -05:00
Dave Chinner
634fd5322a xfs: fully initialise temp leaf in xfs_attr3_leaf_compact
xfs_attr3_leaf_compact() uses a temporary buffer for compacting the
the entries in a leaf. It copies the the original buffer into the
temporary buffer, then zeros the original buffer completely. It then
copies the entries back into the original buffer.  However, the
original buffer has not been correctly initialised, and so the
movement of the entries goes horribly wrong.

Make sure the zeroed destination buffer is fully initialised, and
once we've set up the destination incore header appropriately, write
is back to the buffer before starting to move entries around.

While debugging this, the _d/_s prefixes weren't sufficient to
remind me what buffer was what, so rename then all _src/_dst.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit d4c712bcf2)
2013-05-30 17:26:24 -05:00