linux_dsm_epyc7002/drivers
Dean Nelson d5bc77a223 e1000: don't enable dma receives until after dma address has been setup
Doing an 'ifconfig ethN down' followed by an 'ifconfig ethN up' on a qemu-kvm
guest system configured with two e1000 NICs can result in an 'unable to handle
kernel paging request at 0000000100000000' or 'bad page map in process ...' or
something similar.

These result from a 4096-byte page being corrupted with the following two-word
pattern (16-bytes) repeated throughout the entire page:

  0x0000000000000000
  0x0000000100000000

There can be other bits set as well. What is a constant is that the 2nd word
has the 32nd bit set. So one could see:

        :
  0x0000000000000000
  0x0000000100000000
  0x0000000000000000
  0x0000000172adc067    <<< bad pte
  0x800000006ec60067
  0x0000000700000040
  0x0000000000000000
  0x0000000100000000
        :

Which came from from a process' page table I dumped out when the marked line
was seen as bad by print_bad_pte().

The repeating pattern represents the e1000's two-word receive descriptor:

struct e1000_rx_desc {
        __le64 buffer_addr;   /* Address of the descriptor's data buffer */
        __le16 length;        /* Length of data DMAed into data buffer */
        __le16 csum;          /* Packet checksum */
        u8 status;            /* Descriptor status */
        u8 errors;            /* Descriptor Errors */
        __le16 special;
};

And the 32nd bit of the 2nd word maps to the 'u8 status' member, and
corresponds to E1000_RXD_STAT_DD which indicates the descriptor is done.

The corruption appears to result from the following...

 . An 'ifconfig ethN down' gets us into e1000_close(), which through a number
   of subfunctions results in:
     1. E1000_RCTL_EN being cleared in RCTL register.  [e1000_down()]
     2. dma_free_coherent() being called.  [e1000_free_rx_resources()]

 . An 'ifconfig ethN up' gets us into e1000_open(), which through a number of
   subfunctions results in:
     1. dma_alloc_coherent() being called.  [e1000_setup_rx_resources()]
     2. E1000_RCTL_EN being set in RCTL register.  [e1000_setup_rctl()]
     3. E1000_RCTL_EN being cleared in RCTL register.  [e1000_configure_rx()]
     4. RDLEN, RDBAH and RDBAL registers being set to reflect the dma page
        allocated in step 1.  [e1000_configure_rx()]
     5. E1000_RCTL_EN being set in RCTL register.  [e1000_configure_rx()]

During the 'ifconfig ethN up' there is a window opened, starting in step 2
where the receives are enabled up until they are disabled in step 3, in which
the address of the receive descriptor dma page known by the NIC is still the
previous one which was freed during the 'ifconfig ethN down'. If this memory
has been reallocated for some other use and the NIC feels so inclined, it will
write to that former dma page with predictably unpleasant results.

I realize that in the guest, we're dealing with an e1000 NIC that is software
emulated by qemu-kvm. The problem doesn't appear to occur on bare-metal. Andy
suspects that this is because in the emulator link-up is essentially instant
and traffic can start flowing immediately. Whereas on bare-metal, link-up
usually seems to take at least a few milliseconds. And this might be enough
to prevent traffic from flowing into the device inside the window where
E1000_RCTL_EN is set.

So perhaps a modification needs to be made to the qemu-kvm e1000 NIC emulator
to delay the link-up. But in defense of the emulator, it seems like a bad idea
to enable dma operations before the address of the memory to be involved has
been made known.

The following patch no longer enables receives in e1000_setup_rctl() but leaves
them however they were. It only enables receives in e1000_configure_rx(), and
only after the dma address has been made known to the hardware.

There are two places where e1000_setup_rctl() gets called. The one in
e1000_configure() is followed immediately by a call to e1000_configure_rx(), so
there's really no change functionally (except for the removal of the problem
window. The other is in __e1000_shutdown() and is not followed by a call to
e1000_configure_rx(), so there is a change functionally. But consider...

 . An 'ifconfig ethN down' (just as described above).

 . A 'suspend' of the system, which (I'm assuming) will find its way into
   e1000_suspend() which calls __e1000_shutdown() resulting in:
     1. E1000_RCTL_EN being set in RCTL register.  [e1000_setup_rctl()]

And again we've re-opened the problem window for some unknown amount of time.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2011-09-28 23:06:57 -07:00
..
accessibility
acpi Merge branches 'apei', 'bz-13195' and 'doc' into acpi 2011-09-12 20:00:00 -04:00
amba
ata drivers/ata/sata_dwc_460ex.c: add missing kfree 2011-08-18 23:58:11 -04:00
atm atm: convert to SKB paged frag API. 2011-08-26 12:38:42 -04:00
auxdisplay
base Merge branch 'for-linus' of git://opensource.wolfsonmicro.com/regmap 2011-09-08 16:47:52 -07:00
bcma Merge branch 'master' of github.com:davem330/net 2011-09-22 03:23:13 -04:00
block floppy: use del_timer_sync() in init cleanup 2011-09-21 10:22:11 +02:00
bluetooth Bluetooth: add support for 2011 mac mini 2011-09-17 17:16:03 -03:00
cdrom
char drivers/char/msm_smd_pkt.c: don't use IS_ERR() 2011-08-25 16:25:33 -07:00
clk
clocksource
connector connector: add comm change event report to proc connector 2011-09-28 13:41:50 -04:00
cpufreq drivers/cpufreq/pcc-cpufreq.c: avoid NULL pointer dereference 2011-09-14 18:09:38 -07:00
cpuidle
crypto
dca
dio
dma dmaengine/ste_dma40: fix memory leak due to prepared descriptors 2011-09-05 17:08:26 +05:30
edac i7core_edac: fixed typo in error count calculation 2011-08-18 14:07:15 -07:00
eisa
firewire firewire: ohci: add no MSI quirk for O2Micro controller 2011-09-16 22:22:10 +02:00
firmware
gpio drivers/gpio/gpio-generic.c: fix build errors 2011-09-14 18:09:38 -07:00
gpu drm/radeon/kms: Make GPU/CPU page size handling consistent in blit code (v2) 2011-09-18 19:44:36 +01:00
hid Merge branch 'for-linus' of git://github.com/dtor/input 2011-09-16 14:09:19 -07:00
hwmon hwmon: (coretemp) Initialize tmin 2011-09-14 03:55:05 -07:00
hwspinlock
i2c i2c-tegra: fix possible race condition after tx 2011-09-07 00:13:40 +01:00
ide
idle
ieee802154
infiniband Merge branch 'master' of github.com:davem330/net 2011-09-22 03:23:13 -04:00
input Merge branch 'for-linus' of git://github.com/dtor/input 2011-09-16 14:09:19 -07:00
iommu x86, iommu: Mark DMAR IRQ as non-threaded 2011-09-13 23:44:53 +02:00
isdn
leds drivers/leds/ledtrig-timer.c: fix broken sysfs delay handling 2011-09-14 18:09:38 -07:00
lguest
macintosh
mca
md md: Fix handling for devices from 2TB to 4TB in 0.90 metadata. 2011-09-10 17:21:28 +10:00
media Merge branch 'master' of github.com:davem330/net 2011-09-22 03:23:13 -04:00
memstick
message
mfd mfd: Fix omap-usb-host build failure 2011-09-06 16:37:59 +02:00
misc drivers/misc/pti.c: give 'comm' function scope in pti_control_frame_built_and_sent() 2011-09-14 18:09:38 -07:00
mmc Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2011-09-21 13:20:21 -07:00
mtd UBI: do not link debug messages when debugging is disabled 2011-08-19 19:02:27 +03:00
net e1000: don't enable dma receives until after dma address has been setup 2011-09-28 23:06:57 -07:00
nfc NFC: Reserve tx head and tail room 2011-08-24 14:41:44 -04:00
nubus
of
oprofile
parisc
parport
pci pci: Don't crash when reading mpss from root complex 2011-09-13 16:08:31 -07:00
pcmcia
platform
pnp
power s3c-adc-battery: Fix compilation error due to missing header (module.h) 2011-08-19 21:01:46 +04:00
pps
ps3
ptp
rapidio rapidio: fix use of non-compatible registers 2011-08-25 16:25:34 -07:00
regulator
rtc drivers/rtc/rtc-s3c.c: fix no occurrence of alarm interrupt 2011-09-14 18:09:38 -07:00
s390 Merge branch 'master' of github.com:davem330/net 2011-09-22 03:23:13 -04:00
sbus
scsi Merge branch 'master' of github.com:davem330/net 2011-09-22 03:23:13 -04:00
sfi
sh
sn
spi
ssb ssb: fix DMA translation for some specific boards 2011-08-24 14:41:41 -04:00
staging Merge branch 'master' of github.com:davem330/net 2011-09-22 03:23:13 -04:00
target iscsi-target: Fix sendpage breakage with proper padding+DataDigest iovec offsets 2011-09-16 23:47:07 +00:00
tc
telephony
thermal
tty cris: fix a build error in drivers/tty/serial/crisv10.c 2011-09-14 18:09:38 -07:00
uio
usb USB: xHCI: prevent infinite loop when processing MSE event 2011-09-19 17:15:47 -07:00
uwb
vhost
video backlight: Declare backlight_types[] const 2011-09-10 14:00:02 -07:00
virt
virtio
vlynq
w1 MAINTAINERS: Evgeniy has moved 2011-08-25 16:25:33 -07:00
watchdog Merge branch 'master' of github.com:davem330/net 2011-09-22 03:23:13 -04:00
xen xen/pciback: Add flag indicating device has been assigned by Xen 2011-09-27 00:48:34 -04:00
zorro
Kconfig
Makefile