Commit Graph

533609 Commits

Author SHA1 Message Date
James Morris
3dbbbe0eb6 Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/selinux into for-linus2 2015-07-11 09:13:45 +10:00
Stephen Smalley
892e8cac99 selinux: fix mprotect PROT_EXEC regression caused by mm change
commit 66fc130394 ("mm: shmem_zero_setup
skip security check and lockdep conflict with XFS") caused a regression
for SELinux by disabling any SELinux checking of mprotect PROT_EXEC on
shared anonymous mappings.  However, even before that regression, the
checking on such mprotect PROT_EXEC calls was inconsistent with the
checking on a mmap PROT_EXEC call for a shared anonymous mapping.  On a
mmap, the security hook is passed a NULL file and knows it is dealing
with an anonymous mapping and therefore applies an execmem check and no
file checks.  On a mprotect, the security hook is passed a vma with a
non-NULL vm_file (as this was set from the internally-created shmem
file during mmap) and therefore applies the file-based execute check
and no execmem check.  Since the aforementioned commit now marks the
shmem zero inode with the S_PRIVATE flag, the file checks are disabled
and we have no checking at all on mprotect PROT_EXEC.  Add a test to
the mprotect hook logic for such private inodes, and apply an execmem
check in that case.  This makes the mmap and mprotect checking
consistent for shared anonymous mappings, as well as for /dev/zero and
ashmem.

Cc: <stable@vger.kernel.org> # 4.1.x
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-07-10 16:45:29 -04:00
Linus Torvalds
1604f8719a arm64 fixes/clean-up:
- ACPI fix when checking the validity of the GICC MADT subtable
 - handle debug exceptions in the el*_inv exception entries
 - remove pointless register assignment in two compat syscall wrappers
 - unnecessary include path
 - defconfig update
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVn/w6AAoJEGvWsS0AyF7xP+4QAKutBGru8nUQ2n74mwNExD3c
 /6d9BKr9+ji1C5IYLFeaRNSuaZYxwDYoulzE3rL9Cs8IRkZRLGc2ACTdbbtzn/SP
 zgSRk/z5wdXbhosBAT3C4A1D0KrEQnFasW0VoTpt4AONezS95lljNdvEmWXwEoC+
 SEmQkh3DRLdHJFZNExunsB9hEdD2NOwackRvtQlzXEOuPTWo7uYF5O6o9spOi1KG
 cGQZiDdMtq7n44gH+NHfVyelGhxMgyxzLZhpfkG5l1a4gpfVXVrSwzGtF2ZFuufJ
 EsBHOozStl3PPyPvhfDSyS6NiySSqAM9ZYP/Dx9HSob3aLWS/4pmnkOolHJz7beo
 UoRdwPRquEUN+9cicBdkn+/t9dQRoL4TKSpoVNGkTEg8GjM0h8x++O3sgEmMcMXp
 KWFbUvEiJG5PcHQgEmVN8t4mQEcbVbymkdvQrvLo2lTMuJOEKdOSJNrXTisf4r8P
 2BcPKzcIWl83gxh+OdOykPioLIDzISYjJixqB0kgLBZd4Q+Idn/WRU6S2P+ryWnN
 3Iavmt7ZgBihwJ1Gtex6phxewxGsYRRM9gzW4kiIrDjkq1KbrtpA/s+wH7DzGRN1
 7KSKbuamzr9MO3+kHg5TuCbAfTnNGVujYmLauZp5QTFdDaS5SQfPHJF3Rq4po1H4
 89Rb625/L3XL887jih9+
 =4wCv
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes and clean-up from Catalin Marinas:
 - ACPI fix when checking the validity of the GICC MADT subtable
 - handle debug exceptions in the el*_inv exception entries
 - remove pointless register assignment in two compat syscall wrappers
 - unnecessary include path
 - defconfig update

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: entry32: remove pointless register assignment
  arm64: entry: handle debug exceptions in el*_inv
  arm64: Keep the ARM64 Kconfig selects sorted
  ACPI / ARM64 : use the new BAD_MADT_GICC_ENTRY macro
  ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro
  arm64: defconfig: Add Ceva ahci to the defconfig
  arm64: remove another unnecessary libfdt include path
2015-07-10 12:49:56 -07:00
John David Anglin
01ab605704 parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results
The increased use of pdtlb/pitlb instructions seemed to increase the
frequency of random segmentation faults building packages. Further, we
had a number of cases where TLB inserts would repeatedly fail and all
forward progress would stop. The Haskell ghc package caused a lot of
trouble in this area. The final indication of a race in pte handling was
this syslog entry on sibaris (C8000):

 swap_free: Unused swap offset entry 00000004
 BUG: Bad page map in process mysqld  pte:00000100 pmd:019bbec5
 addr:00000000ec464000 vm_flags:00100073 anon_vma:0000000221023828 mapping: (null) index:ec464
 CPU: 1 PID: 9176 Comm: mysqld Not tainted 4.0.0-2-parisc64-smp #1 Debian 4.0.5-1
 Backtrace:
  [<0000000040173eb0>] show_stack+0x20/0x38
  [<0000000040444424>] dump_stack+0x9c/0x110
  [<00000000402a0d38>] print_bad_pte+0x1a8/0x278
  [<00000000402a28b8>] unmap_single_vma+0x3d8/0x770
  [<00000000402a4090>] zap_page_range+0xf0/0x198
  [<00000000402ba2a4>] SyS_madvise+0x404/0x8c0

Note that the pte value is 0 except for the accessed bit 0x100. This bit
shouldn't be set without the present bit.

It should be noted that the madvise system call is probably a trigger for many
of the random segmentation faults.

In looking at the kernel code, I found the following problems:

1) The pte_clear define didn't take TLB lock when clearing a pte.
2) We didn't test pte present bit inside lock in exception support.
3) The pte and tlb locks needed to merged in order to ensure consistency
between page table and TLB. This also has the effect of serializing TLB
broadcasts on SMP systems.

The attached change implements the above and a few other tweaks to try
to improve performance. Based on the timing code, TLB purges are very
slow (e.g., ~ 209 cycles per page on rp3440). Thus, I think it
beneficial to test the split_tlb variable to avoid duplicate purges.
Probably, all PA 2.0 machines have combined TLBs.

I dropped using __flush_tlb_range in flush_tlb_mm as I realized all
applications and most threads have a stack size that is too large to
make this useful. I added some comments to this effect.

Since implementing 1 through 3, I haven't had any random segmentation
faults on mx3210 (rp3440) in about one week of building code and running
as a Debian buildd.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Helge Deller <deller@gmx.de>
2015-07-10 21:47:47 +02:00
Alex Ivanov
cb908ed349 stifb: Implement hardware accelerated copyarea
This patch adds hardware assisted scrolling. The code is based upon the
following investigation: https://parisc.wiki.kernel.org/index.php/NGLE#Blitter

A simple 'time ls -la /usr/bin' test shows 1.6x speed increase over soft
copy and 2.3x increase over FBINFO_READS_FAST (prefer soft copy over
screen redraw) on Artist framebuffer.

Signed-off-by: Alex Ivanov <lausgans@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
2015-07-10 21:44:19 +02:00
Linus Torvalds
3cdeb9d151 powerpc fixes for 4.2
- opal-prd mmap fix from Vaidy.
 - Set kernel taint for MCEs from Daniel.
 - Alignment exception description from Anton.
 - ppc4xx_hsta_msi build fix from Daniel
 - opal-elog interrupt fix from Alistair.
 - core_idle_state race fix from Shreyas.
 - hv-24x7 lockdep fix from Sukadev.
 - Multiple cxl fixes from Daniel, Ian, Mikey & Maninder.
 - Update MAINTAINERS to point at shared tree.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVn3eKAAoJEFHr6jzI4aWAPCAP/3ORZnylGUJGlR7GMtktn7vm
 XJyagXcFbtBYDz8HJUtDESwjCGd/mSOErBZaOEBWgv3qWf60VJzAnzQTgKvIWU1j
 4WXILIQivb9ibajUN5SkghgSgwekc7VqJnnlA2BfVTMtZiuD5DQqMWs4Mc/jIREU
 41g/Fc1vCiXW7dwFAxtvH14kBGCmkU+Fd/z9bDlOeLVAyDlqEl/dCdtjyRpipHSd
 nzAea2s9bwH6QYNSZKjtnTbJAelrg/ZG8CHSkr3UGTf/ak/YouPqzWp4aJcRmWe3
 GMCeC+93fCQ4bOuzQolgdYHPbMQa/sil+3RLuipPETLV+dbqhtMb/NLxqcihyKuE
 V8Sk7PsIPtveCbCOyvQTM3RrUtg7oOYPgraXrKtICx3n05vkVNI+Q/3uCWwmic42
 396KR9lcdpn3TDl6+MgJsWvKCxM0DX4dsFMQwjoXwi2Evd0EpMDfxIVBnCwzcRBw
 WNILcGT+uupfKrrROdC7NNmgevAK0mRWX5NeguRIk8AEe2ywaKZ2cBGhxte7669P
 Y98OuNtHhv4Pvhni0uRB0UTFaxjkSTZqJzUHXAl9xfRPlD1i+UVTdEAaRxN6yyn0
 r7c5b0o1fTiM/Nxvh6WL9rBV10XhJ0XerKqO4PU3zW9olZKG7ZUqFF/qsXklljAc
 FNJN31RCIgtctO+iLe5e
 =+CJX
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 - opal-prd mmap fix from Vaidy
 - set kernel taint for MCEs from Daniel
 - alignment exception description from Anton
 - ppc4xx_hsta_msi build fix from Daniel
 - opal-elog interrupt fix from Alistair
 - core_idle_state race fix from Shreyas
 - hv-24x7 lockdep fix from Sukadev
 - multiple cxl fixes from Daniel, Ian, Mikey & Maninder
 - update MAINTAINERS to point at shared tree

* tag 'powerpc-4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  cxl: Check if afu is not null in cxl_slbia
  powerpc: Update MAINTAINERS to point at shared tree
  powerpc/perf/24x7: Fix lockdep warning
  cxl: Fix off by one error allowing subsequent mmap page to be accessed
  cxl: Fail mmap if requested mapping is larger than assigned problem state area
  cxl: Fix refcounting in kernel API
  powerpc/powernv: Fix race in updating core_idle_state
  powerpc/powernv: Fix opal-elog interrupt handler
  powerpc/ppc4xx_hsta_msi: Include ppc-pci.h to fix reference to hose_list
  powerpc: Add plain English description for alignment exception oopses
  cxl: Test the correct mmio space before unmapping
  powerpc: Set the correct kernel taint on machine check errors
  cxl/vphb.c: Use phb pointer after NULL check
  powerpc/powernv: Fix vma page prot flags in opal-prd driver
2015-07-10 12:16:59 -07:00
Ross Zwisler
f0f2c072cf nfit: add support for NVDIMM "latch" flag
Add support in the NFIT BLK I/O path for the "latch" flag
defined in the "Get Block NVDIMM Flags" _DSM function:

http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

This flag requires the driver to read back the command register after it
is written in the block I/O path.  This ensures that the hardware has
fully processed the new command and moved the aperture appropriately.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-10 14:43:50 -04:00
Ross Zwisler
c2ad29540c nfit: update block I/O path to use PMEM API
Update the nfit block I/O path to use the new PMEM API and to adhere to
the read/write flows outlined in the "NVDIMM Block Window Driver
Writer's Guide":

http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

This includes adding support for targeted NVDIMM flushes called "flush
hints" in the ACPI 6.0 specification:

http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf

For performance and media durability the mapping for a BLK aperture is
moved to a write-combining mapping which is consistent with
memcpy_to_pmem() and wmb_blk().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-10 14:35:45 -04:00
Dan Williams
9d27a87ec9 tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test
In preparation for fixing the BLK path to properly use "directed
pcommit" enable the unit test infrastructure to emit mock "flush"
tables.  Writes to these flush addresses trigger a memory controller to
flush its internal buffers to persistent media, similar to the x86
"pcommit" instruction.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-10 14:07:03 -04:00
Dan Williams
f7ec83684a tools/testing/nvdimm: fix return code for unimplemented commands
The implementation for the new "DIMM Flags" DSM relies on the -ENOTTY
return code to indicate that the flags are unimplimented and to fall
back to a safe default.  As is the -ENXIO error code erroneoously
indicates to fail enabling a BLK region.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-10 13:50:50 -04:00
Dan Williams
b1b2e6235a tools/testing/nvdimm: mock ioremap_wt
In the 4.2-rc1 merge the default_memremap_pmem() implementation switched
from ioremap_nocache() to ioremap_wt().  Add it to the list of mocked
routines to restore the ability to run the unit tests.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-10 13:50:50 -04:00
Ross Zwisler
b864bc17f1 pmem: add maintainer for include/linux/pmem.h
The file include/linux/pmem.h was recently created to hold the PMEM API,
and is logically part of the PMEM driver.  Add an entry for this file to
MAINTAINERS.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-07-10 13:50:50 -04:00
Dmitry Torokhov
dbf3c37086 Revert "Input: synaptics - allocate 3 slots to keep stability in image sensors"
This reverts commit 63c4fda3c0 as it
causes issues with detecting 3-finger taps.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100481
Cc: stable@vger.kernel.org
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
2015-07-10 10:11:07 -07:00
Mark Rutland
ad2daa85bd arm64: entry32: remove pointless register assignment
We currently set x27 in compat_sys_sigreturn_wrapper and
compat_sys_rt_sigreturn_wrapper, similarly to what we do with r8/why on
32-bit ARM, in an attempt to prevent sigreturns from being restarted.

However, on arm64 we have always used pt_regs::syscallno for syscall
restarting (for both native and compat tasks), and x27 is never
inspected again before being overwritten in kernel_exit.

This patch removes the pointless register assignments.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-07-10 16:47:13 +01:00
Ralf Baechle
51d53674c3 MIPS: O32: Use compat_sys_getsockopt.
We were using the native syscall and that results in subtle breakage.

This is the same issue as fixed in 077d0e6561
(MIPS: N32: Use compat getsockopt syscall) but that commit did fix it only
for N32.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=100291
2015-07-10 11:02:22 +02:00
Paul Burton
1e18ac7aea MIPS: c-r4k: Extend way_string array
The L2 cache in the I6400 core has 16 ways, so extend the way_string
array to take such caches into account.

[ralf@linux-mips.org: Other already supported CPUs are free to support
more than 8 ways of cache as well.]

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10640/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-07-10 11:02:20 +02:00
James Hogan
6b5e741e9a MIPS: Pistachio: Support CDMM & Fast Debug Channel
Implement the mips_cdmm_phys_base() platform callback to provide a
default Common Device Memory Map (CDMM) physical base address for the
Pistachio SoC. This allows the CDMM in each VPE to be configured and
probed for devices, such as the Fast Debug Channel (FDC).

The physical address chosen is just below the default CPC address, which
appears to also be unallocated.

The FDC IRQ is also usable on Pistachio, and is routed through the GIC,
so implement the get_c0_fdc_int() platform callback using
gic_get_c0_fdc_int(), so the FDC driver doesn't have to fall back to
polling.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrew Bresticker <abrestic@chromium.org>
Cc: James Hartley <james.hartley@imgtec.com>
Cc: linux-mips@linux-mips.org
Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
Patchwork: http://patchwork.linux-mips.org/patch/9749/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-07-10 11:02:20 +02:00
James Hogan
6249ecbbb7 MIPS: Malta: Make GIC FDC IRQ workaround Malta specific
Wider testing reveals that the Fast Debug Channel (FDC) interrupt is
routed through the GIC just fine on Pistachio SoC, even though it
contains interAptiv cores. Clearly the FDC interrupt routing problems
previously observed on interAptiv and proAptiv cores are specific to the
Malta FPGA bitstreams.

Move the workaround for interAptiv and proAptiv out of
gic_get_c0_fdc_int() in the GIC irqchip driver into Malta's
get_c0_fdc_int() platform callback, to allow the Pistachio SoC to use
the FDC interrupt.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrew Bresticker <abrestic@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: linux-mips@linux-mips.org
Reviewed-by: Andrew Bresticker <abrestic@chromium.org>
Cc: James Hartley <james.hartley@imgtec.com>
Patchwork: http://patchwork.linux-mips.org/patch/9748/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-07-10 11:02:18 +02:00
Markos Chandras
cccf34e941 MIPS: c-r4k: Fix cache flushing for MT cores
MT_SMP is not the only SMP option for MT cores. The MT_SMP option
allows more than one VPE per core to appear as a secondary CPU in the
system. Because of how CM works, it propagates the address-based
cache ops to the secondary cores but not the index-based ones.
Because of that, the code does not use IPIs to flush the L1 caches on
secondary cores because the CM would have done that already. However,
the CM functionality is independent of the type of SMP kernel so even in
non-MT kernels, IPIs are not necessary. As a result of which, we change
the conditional to depend on the CM presence. Moreover, since VPEs on
the same core share the same L1 caches, there is no need to send an
IPI on all of them so we calculate a suitable cpumask with only one
VPE per core.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: <stable@vger.kernel.org> # 3.15+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/10654/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-07-10 10:59:16 +02:00
Dave Airlie
2d28b633c3 omapdrm fixes for 4.2
Small fixes for omapdrm, including:
 * Fix packed 24 bit color formats
 * Ensure the planes are inside the crtc
 * Handle out-of-dma-memory error
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVlkYTAAoJEPo9qoy8lh710vwQAJ6UlpVR9Yo1YjywKbFw3GrM
 NAhtBxpUkM13QIpWIq7dzEtSzOU90KC7rCPmlxCLSE8zDKUWDbBexja1gdpe43n4
 I2HViKQI/GMtph4OkolPVWn3tiIQJp8JAqQsBpZkiJpLvRQZC/SKSm42ZkuqPwMX
 4+gq3c82ZCnA86BrXeVkxIGit+J2Jzf+dqb5F/4g+sVu+XjpsHIyHv5wCXBetTkW
 HiCIlej0FaROQMDqnYOE+M8InpFuA4Wc9JGPgUGRH/RQKnBocaOsdp/i98iAyBPD
 rqjFJYMMt+TOE9/I6NFZ+GyAFSV3M5WT+bc7KUmj7RgCXNHBxxGBzWC3SH56Zk3o
 ipDsjyLyZksLyMvrVG9dPjMCcU3rm7JOYeodQa9Y3Hc6Em3Wc7UwWXg9JoPLJXRN
 0UayZYSQJsP2/mBF+1eNsiNgDNz1AyRwB4qPb7jEgwCti6kbGv4AMTiwX+BbIJpm
 989UaRBze07GAbjiO0UVjydegNKf6Vp8J3B8+GnGHBQWnFIyB3a6bGmdBSeubjsg
 uPcsC/r8f2RTkOpyf6Mk3pCpkyi+S8hQvUchaPqdgjl37o/xRl2tq9mo3szRASDV
 HvxYh1eqnM1YIgDrDfO/FJtNqQlVULlg0HQO1Tu+wwOMOC9rAuSHSvBXDKVLN7KU
 8kS017e4KRPXlGHDAL6q
 =mgUq
 -----END PGP SIGNATURE-----

Merge tag 'omapdrm-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux into drm-fixes

omapdrm fixes for 4.2

Small fixes for omapdrm, including:
* Fix packed 24 bit color formats
* Ensure the planes are inside the crtc
* Handle out-of-dma-memory error

* tag 'omapdrm-4.2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
  drm/omap: replace ALIGN(PAGE_SIZE) by PAGE_ALIGN
  drm/omap: fix align_pitch() for 24 bits per pixel
  drm/omap: fix omap_gem_put_paddr() error handling
  drm/omap: fix omap_framebuffer_unpin() error handling
  drm/omap: increase DMM transaction timeout
  drm/omap: check that plane is inside crtc
  drm/omap: return error if dma_alloc_writecombine fails
2015-07-10 15:59:35 +10:00
Dave Airlie
59e7a16d60 Merge tag 'drm-intel-fixes-2015-07-09' of git://anongit.freedesktop.org/drm-intel into drm-fixes
Pile of fixes for either 4.2 issues or cc: stable. This should fix the 2nd
kind of WARNING Linus's been seeing, please ask him to scream if that's
not the case.

* tag 'drm-intel-fixes-2015-07-09' of git://anongit.freedesktop.org/drm-intel:
  Revert "drm/i915: Allocate context objects from stolen"
  drm/i915: Declare the swizzling unknown for L-shaped configurations
  drm/i915: Use crtc_state->active in primary check_plane func
  drm/i915: Check crtc->active in intel_crtc_disable_planes
  drm/i915: Restore all GGTT VMAs on resume
  drm/i915/chv: fix HW readout of the port PLL fractional divider
2015-07-10 15:58:43 +10:00
Dave Airlie
008b3f1f1c Merge tag 'drm-amdkfd-fixes-2015-07-09' of git://people.freedesktop.org/~gabbayo/linux into drm-fixes
A single fix so far for 4.2:
- checking a pointer is not null before using it

* tag 'drm-amdkfd-fixes-2015-07-09' of git://people.freedesktop.org/~gabbayo/linux:
  drm/amdkfd: validate pdd where it acquired first
2015-07-10 15:56:19 +10:00
Dave Airlie
9d5715f9de Merge branch 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
radeon and amdgpu fixes for 4.2.  All over the place:
- fix cursor corruption on resume and re-enable no VT switch on suspend
- vblank fixes
- fix gpuvm error messages
- misc other fixes

* 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: disable vce init on cayman (v2)
  drm/amdgpu: fix timeout calculation
  drm/radeon: check if BO_VA is set before adding it to the invalidation list
  drm/radeon: allways add the VM clear duplicate
  Revert "Revert "drm/radeon: dont switch vt on suspend""
  drm/radeon: Fold radeon_set_cursor() into radeon_show_cursor()
  drm/radeon: unpin cursor BOs on suspend and pin them again on resume (v2)
  drm/radeon: Clean up reference counting and pinning of the cursor BOs
  drm/radeon: fix underflow in r600_cp_dispatch_texture()
  drm/radeon: default to 2048 MB GART size on SI+
  drm/radeon: fix HDP flushing
  drm/radeon: use RCU query for GEM_BUSY syscall
  drm/amdgpu: Handle irqs only based on irq ring, not irq status regs.
  drm/radeon: Handle irqs only based on irq ring, not irq status regs.
2015-07-10 15:55:48 +10:00
Eric Dumazet
a7d35f9d73 bridge: fix potential crash in __netdev_pick_tx()
Commit c29390c6df ("xps: must clear sender_cpu before forwarding")
fixed an issue in normal forward path, caused by sender_cpu & napi_id
skb fields being an union.

Bridge is another point where skb can be forwarded, so we need
the same cure.

Bug triggers if packet was received on a NIC using skb_mark_napi_id()

Fixes: 2bd82484bb ("xps: fix xps for stacked devices")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Bob Liu <bob.liu@oracle.com>
Tested-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 22:48:42 -07:00
Eric Dumazet
a4e2405cc5 tcp: do not export tcp_init_xmit_timers()
After commit 900f65d361 ("tcp: move duplicate code from
tcp_v4_init_sock()/tcp_v6_init_sock()"), we no longer
need to export tcp_init_xmit_timers()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 21:44:38 -07:00
Krzysztof Kozlowski
fcc028c106 net: axienet: Fix devm_ioremap_resource return value check
Value returned by devm_ioremap_resource() was checked for non-NULL but
devm_ioremap_resource() returns IOMEM_ERR_PTR, not NULL. In case of
error this could lead to dereference of ERR_PTR.

Signed-off-by: Krzysztof Kozlowski <k.kozlowski.k@gmail.com>
Cc: <stable@vger.kernel.org>
Fixes: 46aa27df88 ("net: axienet: Use devm_* calls")
Reviewed-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 21:43:16 -07:00
Nikolay Aleksandrov
09cf0211f9 bridge: mdb: fill state in br_mdb_notify
Fill also the port group state when sending notifications.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 21:41:11 -07:00
Masatake YAMATO
cb1c61680d route: remove unsed variable in __mkroute_input
flags local variable in __mkroute_input is not used as a variable.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 21:34:34 -07:00
Tom Herbert
35a256fee5 ipv6: Nonlocal bind
Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.

This add the ip_nonlocal_bind sysctl under ipv6.

Testing:

Set up nonlocal binding and receive routing on a host, e.g.:

ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1

Set up routing to 2001:0:0:1::/64 on peer to go to first host

ping6 -I 2001:0:0:1::1 peer-address -- to verify

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 21:09:10 -07:00
Daniel Axtens
2c069a118f cxl: Check if afu is not null in cxl_slbia
The pointer to an AFU in the adapter's list of AFUs can be null
if we're in the process of removing AFUs. The afu_list_lock
doesn't guard against this.

Say we have 2 slices, and we're in the process of removing cxl.
 - We remove the AFUs in order (see cxl_remove). In cxl_remove_afu
   for AFU 0, we take the lock, set adapter->afu[0] = NULL, and
   release the lock.
 - Then we get an slbia. In cxl_slbia we take the lock, and set
   afu = adapter->afu[0], which is NULL.
 - Therefore our attempt to check afu->enabled will blow up.

Therefore, check if afu is a null pointer before dereferencing it.

Cc: stable@vger.kernel.org
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-10 11:44:25 +10:00
Kevin Hilman
8dfbc0ab34 Minor fixes for omaps against v4.2-rc1. Mostly just minor dts changes
except for a GPMC fix to not use names for probing devices. Also a
 one liner clean-up to remove unecessary return from a void function.
 
 The summary for the changes being:
 
 - Fix probe for GPMC devices by reoving limitations based on device
   name
 
 - Remove unnecessary return from a void function
 
 - Revert beaglebone RTC sleep fix, we now have a better fix merged
 
 - Add am4372 EMIF node to fix a warning
 
 - Add am57xx-beagle-x15 power supply to fix USB2 if USB1 is disabled
 
 - Disable rfbi for am4372 as it does not have a driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVnjdMAAoJEBvUPslcq6VzDKoP/jgmjk4PwILrpu7P94RFpT2L
 Jp6mu8zg+LIJ+c/TtsDX/uVxw2VKfiXEjt4onLIPA4PyYJw+oj87dXIKooP8olCH
 6qoBqGzs7EtM/u+HAMifgCI0hshRoH4SinPb6Pq96cNSpvBrnckJaV5ZZ/yBgJAG
 uNUGZZKGs4RkKt6mybIaZgsDlrTMV0pQ+iSIYR9kHumnwq4PUMfFsECDfoI87W6I
 2ug962Co4e9+BNGhqpzqq8Xa5SJJrVatMrVKeAC5Eg54F+FihbnjncMBdqW4cZfv
 FYZDd//hIqDTE9t/2xy70+yHgtW7qgxmUCfNH62rdsMvVWsTzjcpaLw2G/bTALkz
 n0X3dCSNdMInjaFBxun93C63kNJUBLEWzKZ1TiR9uQEcS2O9wiH2NzZ8dfDEse2L
 qP+/G95M4mAQeePOZpTCcu3x6v3SA6AIgWepB8bat1kglL7OPaa08nO33RAIWfv+
 1iyZ6XY+N8bNYy7fCt6X4byws8woApKU4Fp28rTDwbm/jRD0GWz/AHSXz1Xoimz1
 R47IT3eyMrB4c6yDvIqm0ghKt+QXspblcElCVGuvV0iKfbKUrtwV00z3/dKCxMtc
 gfJHQOkZobfro4fCpMvg4jeTOpqAX9g9UdqotrsEtV0xZfbwrfbSRpuSVbLHz3Io
 dNG3MP164dB4SuxUWo7T
 =uoa8
 -----END PGP SIGNATURE-----

Merge tag 'omap-for-v4.2/fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes

Merge "omap fixes against v4.2-rc1" from Tony Lindgren:

Minor fixes for omaps against v4.2-rc1. Mostly just minor dts changes
except for a GPMC fix to not use names for probing devices. Also a
one liner clean-up to remove unecessary return from a void function.

The summary for the changes being:

- Fix probe for GPMC devices by reoving limitations based on device
  name

- Remove unnecessary return from a void function

- Revert beaglebone RTC sleep fix, we now have a better fix merged

- Add am4372 EMIF node to fix a warning

- Add am57xx-beagle-x15 power supply to fix USB2 if USB1 is disabled

- Disable rfbi for am4372 as it does not have a driver

* tag 'omap-for-v4.2/fixes-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
  ARM: dts: am4372.dtsi: disable rfbi
  ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2
  ARM: dts: am4372: Add emif node
  Revert "ARM: dts: am335x-boneblack: disable RTC-only sleep"
  ARM: OMAP2+: Remove unnessary return statement from the void function, omap2_show_dma_caps
  memory: omap-gpmc: Fix parsing of devices
2015-07-09 15:38:16 -07:00
David S. Miller
5a10ececc6 Merge branch 'tw_cleanups'
Eric Dumazet says:

====================
inet: timewait cleanups

Another round of patches to make tw handling simpler.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:12:21 -07:00
Eric Dumazet
dbe7faa404 inet: inet_twsk_deschedule factorization
inet_twsk_deschedule() calls are followed by inet_twsk_put().

Only particular case is in inet_twsk_purge() but there is no point
to defer the inet_twsk_put() after re-enabling BH.

Lets rename inet_twsk_deschedule() to inet_twsk_deschedule_put()
and move the inet_twsk_put() inside.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:12:20 -07:00
Eric Dumazet
fc01538f9f inet: simplify timewait refcounting
timewait sockets have a complex refcounting logic.
Once we realize it should be similar to established and
syn_recv sockets, we can use sk_nulls_del_node_init_rcu()
and remove inet_twsk_unhash()

In particular, deferred inet_twsk_put() added in commit
13475a30b6 ("tcp: connect() race with timewait reuse")
looks unecessary : When removing a timewait socket from
ehash or bhash, caller must own a reference on the socket
anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:12:20 -07:00
Eric Dumazet
3fd2f1b9d9 inet: remove BUG_ON() in twsk_destructor()
Kernel will crash the same if one of the pointer is NULL anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:12:20 -07:00
Kevin Hilman
d024bae2c4 Allwinner late changes for 4.2
A bunch of defconfig changes, and some patches to make the Allwinner H3 and
 A33 boot properly.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVmUC5AAoJEBx+YmzsjxAgc24P/Ah6q+SSSv68adzGQp6LGO8t
 ru2OWZwlQn+w2gHJK9qAytmQdpSRqbbjB+8muLz9WwNNjHliDmJawUvBPdC2p03L
 ZrugiVUSKBeQOzdqvG0B3vOO3xszDE06DA+i0PZaBWH2uXDVfohhCFQ2n2/oQsS7
 eDSlBl1EHtZOKEHtuEi988YDJhYDnJeSG/SWwyUbW73QH5eYYl+9TnSXdwDKGmJS
 JyfWkafIsI3EKWUYmOJ2NgUv13pYQU9k/we53n+PxeGevsPuenJH8r2j/N8pWtUl
 srfZP82aEYCqlY4M6QL21k2KOPW8MIDMOOk9dh8l8k9nSC2oJXqsROJQpw2nrJRo
 oPoT+gxWacc+AutX3Hz+XjXNsCIsE3q0Yd7yBfiszXroPtkIBslJGhdJRbh3VrxO
 JS7lKVubJepePpARiWT4ginenWPgGPirDHWixSUcfgBnMN6fpdhOGpdDxWePy1IJ
 7wox3lRThnBhhvI1HuLES92XEiRYhin21LA00juPbvCMbQKuG2W9fBryXTi3U3LC
 g3RQfy3ViZbrEw/7iTjlm2MVYSRxSdA1SAQHNS5Q+4fem22XKeSlzHON/0izrCdV
 10zWNUAlolESWhCyD3TMScFl7WG+KMrjaHE/02+q7wSA6sr1aMGfjL6buVA7R5bF
 KiCWz4Sqp8p/a4RejDVl
 =m0CY
 -----END PGP SIGNATURE-----

Merge tag 'sunxi-late-for-4.2' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into fixes

Merge "Allwinner late changes for 4.2" from Maxime Ripard:

Allwinner late changes for 4.2

A bunch of defconfig changes, and some patches to make the Allwinner H3 and
A33 boot properly.

* tag 'sunxi-late-for-4.2' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
  ARM: sunxi: Enable simplefb in the defconfig
  ARM: Remove deprecated symbol from defconfig files
  ARM: sunxi: Add Machine support for A33
  ARM: sunxi: Introduce Allwinner H3 support
  Documentation: sunxi: Update Allwinner SoC documentation
2015-07-09 15:08:44 -07:00
Florian Westphal
8b58a39846 ipv6: use flag instead of u16 for hop in inet6_skb_parm
Hop was always either 0 or sizeof(struct ipv6hdr).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:06:59 -07:00
David S. Miller
fc5778ca3f Merge branch 'pktgen-races'
Oleg Nesterov says:

====================
net: pktgen: fix race between pktgen_thread_worker() and kthread_stop()

I am not familiar with this code and I have no idea how to test
these changes, so 2/2 comes as a separate change. 1/2 looks like
the obvious bugfix, and probably candidate for -stable.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:05:33 -07:00
Oleg Nesterov
1fbe4b46ca net: pktgen: kill the "Wait for kthread_stop" code in pktgen_thread_worker()
pktgen_thread_worker() doesn't need to wait for kthread_stop(), it
can simply exit. Just pktgen_create_thread() and pg_net_exit() should
do get_task_struct()/put_task_struct(). kthread_stop(dead_thread) is
fine.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:05:32 -07:00
Oleg Nesterov
fecdf8be2d net: pktgen: fix race between pktgen_thread_worker() and kthread_stop()
pktgen_thread_worker() is obviously racy, kthread_stop() can come
between the kthread_should_stop() check and set_current_state().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Marcelo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:05:32 -07:00
Enrico Mioso
4a0e3e989d cdc_ncm: Add support for moving NDP to end of NCM frame
NCM specs are not actually mandating a specific position in the frame for
the NDP (Network Datagram Pointer). However, some Huawei devices will
ignore our aggregates if it is not placed after the datagrams it points
to. Add support for doing just this, in a per-device configurable way.
While at it, update NCM subdrivers, disabling this functionality in all of
them, except in huawei_cdc_ncm where it is enabled instead.
We aren't making any distinction between different Huawei NCM devices,
based on what the vendor driver does. Standard NCM devices are left
unaffected: if they are compliant, they should be always usable, still
stay on the safe side.

This change has been tested and working with a Huawei E3131 device (which
works regardless of NDP position), a Huawei E3531 (also working both
ways) and an E3372 (which mandates NDP to be after indexed datagrams).

V1->V2:
- corrected wrong NDP acronym definition
- fixed possible NULL pointer dereference
- patch cleanup
V2->V3:
- Properly account for the NDP size when writing new packets to SKB

Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:58:31 -07:00
Mugunthan V N
5a0266af10 drivers: net: cpsw: fix disabling of tx interrupt in rx isr
In commit 'c03abd84634d ("net: ethernet: cpsw: don't requests
IRQs we don't use")', common isr is split into tx and rx, but
in rx isr tx interrupt is also disabledi in cpsw_disable_irq().
So tx interrupts are not handled during rx interrupts and rx
napi completion and results in poor tx performance by 40Mbps.
Fixing by disabling only rx interrupt in rx isr.

Cc: Felipe Balbi <balbi@ti.com>
Cc: <stable@vger.kernel.org> # v4.0+
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:52:48 -07:00
Vaishali Thakkar
adb3505086 net: systemport: Use eth_hw_addr_random
Use eth_hw_addr_random() instead of calling random_ether_addr().
Here, this change is setting addr_assign_type to NET_ADDR_RANDOM.

The Coccinelle semantic patch that performs this transformation
is as follows:

@@
identifier a,b;
@@

-random_ether_addr(a->b);
+eth_hw_addr_random(a);

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:51:15 -07:00
Aleksey S. Kazantsev
7c3d0d67d5 dsa: mv88e6352/mv88e6xxx: Add support for Marvell 88E6320 and 88E6321
MV88E6320 and MV88E6321 are largely compatible to MV886352,
but are members of a different chip family.

Signed-off-by: Aleksey S. Kazantsev <ioctl@yandex.ru>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:34:23 -07:00
David S. Miller
986ca37eae Merge branch 'tcp-in-slow-start'
Yuchung Cheng says:

====================
tcp: fixes some congestion control corner cases

This patch series fixes corner cases of TCP congestion control.
First issue is to avoid continuing slow start when cwnd reaches ssthresh.
Second issue is incorrectly processing order of congestion state and
cwnd update when entering fast recovery or undoing cwnd.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:22:53 -07:00
Yuchung Cheng
b20a3fa30a tcp: update congestion state first before raising cwnd
The congestion state and cwnd can be updated in the wrong order.
For example, upon receiving a dubious ACK, we incorrectly raise
the cwnd first (tcp_may_raise_cwnd()/tcp_cong_avoid()) because
the state is still Open, then enter recovery state to reduce cwnd.

For another example, if the ACK indicates spurious timeout or
retransmits, we first revert the cwnd reduction and congestion
state back to Open state.  But we don't raise the cwnd even though
the ACK does not indicate any congestion.

To fix this problem we should first call tcp_fastretrans_alert() to
process the dubious ACK and update the congestion state, then call
tcp_may_raise_cwnd() that raises cwnd based on the current state.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:22:52 -07:00
Yuchung Cheng
76174004a0 tcp: do not slow start when cwnd equals ssthresh
In the original design slow start is only used to raise cwnd
when cwnd is stricly below ssthresh. It makes little sense
to slow start when cwnd == ssthresh: especially
when hystart has set ssthresh in the initial ramp, or after
recovery when cwnd resets to ssthresh. Not doing so will
also help reduce the buffer bloat slightly.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:22:52 -07:00
Yuchung Cheng
071d5080e3 tcp: add tcp_in_slow_start helper
Add a helper to test the slow start condition in various congestion
control modules and other places. This is to prepare a slight improvement
in policy as to exactly when to slow start.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:22:52 -07:00
Alexander Duyck
1007f59dce net: skb_defer_rx_timestamp should check for phydev before setting up classify
This change makes it so that the call skb_defer_rx_timestamp will first
check for a phydev before going in and manipulating the skb->data and
skb->len values.  By doing this we can avoid unnecessary work on network
devices that don't support phydev.  As a result we reduce the total
instruction count needed to process this on most devices.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:17:15 -07:00
Jon Maxwell
2251ae46af tcp: v1 always send a quick ack when quickacks are enabled
V1 of this patch contains Eric Dumazet's suggestion to move the per
dst RTAX_QUICKACK check into tcp_in_quickack_mode(). Thanks Eric.

I ran some tests and after setting the "ip route change quickack 1"
knob there were still many delayed ACKs sent. This occured
because when icsk_ack.quick=0 the !icsk_ack.pingpong value is
subsequently ignored as tcp_in_quickack_mode() checks both these
values. The condition for a quick ack to trigger requires
that both icsk_ack.quick != 0 and icsk_ack.pingpong=0. Currently
only icsk_ack.pingpong is controlled by the knob. But the
icsk_ack.quick value changes dynamically depending on heuristics.
The crux of the matter is that delayed acks still cannot be entirely
disabled even with the RTAX_QUICKACK per dst knob enabled. This
patch ensures that a quick ack is always sent when the RTAX_QUICKACK
per dst knob is turned on.

The "ip route change quickack 1" knob was recently added to enable
quickacks. It was modeled around the TCP_QUICKACK setsockopt() option.
This issue is that even with "ip route change quickack 1" enabled
we still see delayed ACKs under some conditions. It would be nice
to be able to completely disable delayed ACKs.

Here is an example:

# netstat -s|grep dela
    3 delayed acks sent

For all routes enable the knob

# ip route change quickack 1

Generate some traffic across a slow link and we still see the delayed
acks.

# netstat -s|grep dela
    106 delayed acks sent
    1 delayed acks further delayed because of locked socket

The issue is that both the "ip route change quickack 1" knob and
the TCP_QUICKACK option set the icsk_ack.pingpong variable to 0.
However at the business end in the __tcp_ack_snd_check() routine,
tcp_in_quickack_mode() checks that both icsk_ack.quick != 0
and icsk_ack.pingpong=0 in order to trigger a quickack. As
icsk_ack.quick is determined by heuristics it can be 0. When
that occurs the icsk_ack.pingpong value is ignored and a delayed
ACK is sent regardless.

This patch moves the RTAX_QUICKACK per dst check into the
tcp_in_quickack_mode() routine which ensures that a quickack is
always sent when the quickack knob is enabled for that dst.

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:15:44 -07:00