At a basic level, architectures define structures to record where active
ranges of page frames are located. Once located, the code to calculate zone
sizes and holes in each architecture is very similar. Some of this zone and
hole sizing code is difficult to read for no good reason. This set of patches
eliminates the similar-looking architecture-specific code.
The patches introduce a mechanism where architectures register where the
active ranges of page frames are with add_active_range(). When all areas have
been discovered, free_area_init_nodes() is called to initialise the pgdat and
zones. The zone sizes and holes are then calculated in an architecture
independent manner.
Patch 1 introduces the mechanism for registering and initialising PFN ranges
Patch 2 changes ppc to use the mechanism - 139 arch-specific LOC removed
Patch 3 changes x86 to use the mechanism - 136 arch-specific LOC removed
Patch 4 changes x86_64 to use the mechanism - 74 arch-specific LOC removed
Patch 5 changes ia64 to use the mechanism - 52 arch-specific LOC removed
Patch 6 accounts for mem_map as a memory hole as the pages are not reclaimable.
It adjusts the watermarks slightly
Tony Luck has successfully tested for ia64 on Itanium with tiger_defconfig,
gensparse_defconfig and defconfig. Bob Picco has also tested and debugged on
IA64. Jack Steiner successfully boot tested on a mammoth SGI IA64-based
machine. These were on patches against 2.6.17-rc1 and release 3 of these
patches but there have been no ia64-changes since release 3.
There are differences in the zone sizes for x86_64 as the arch-specific code
for x86_64 accounts the kernel image and the starting mem_maps as memory holes
but the architecture-independent code accounts the memory as present.
The big benefit of this set of patches is a sizable reduction of
architecture-specific code, some of which is very hairy. There should be a
greater reduction when other architectures use the same mechanisms for zone
and hole sizing but I lack the hardware to test on.
Additional credit:
Dave Hansen for the initial suggestion and comments on early patches
Andy Whitcroft for reviewing early versions and catching numerous
errors
Tony Luck for testing and debugging on IA64
Bob Picco for fixing bugs related to pfn registration, reviewing a
number of patch revisions, providing a number of suggestions
on future direction and testing heavily
Jack Steiner and Robin Holt for testing on IA64 and clarifying
issues related to memory holes
Yasunori for testing on IA64
Andi Kleen for reviewing and feeding back about x86_64
Christian Kujau for providing valuable information related to ACPI
problems on x86_64 and testing potential fixes
This patch:
Define the structure to represent an active range of page frames within a node
in an architecture independent manner. Architectures are expected to register
active ranges of PFNs using add_active_range(nid, start_pfn, end_pfn) and call
free_area_init_nodes() passing the PFNs of the end of each zone.
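The registration idea can be modelled in user space. The sketch below is not the kernel implementation: struct active_range, register_range(), present_pages() and hole_pages() are hypothetical stand-ins, the table size is arbitrary, and registered ranges are assumed not to overlap. It only shows how zone sizes and holes can be derived from registered PFN ranges without architecture-specific code.

```c
#include <stddef.h>

#define MAX_ACTIVE_RANGES 32	/* illustrative limit, not the kernel's */

/* Hypothetical user-space stand-in for a registered range of page frames. */
struct active_range {
	int nid;			/* node the range belongs to */
	unsigned long start_pfn;	/* first PFN in the range */
	unsigned long end_pfn;		/* one past the last PFN */
};

static struct active_range ranges[MAX_ACTIVE_RANGES];
static size_t nr_ranges;

/* Record that PFNs [start_pfn, end_pfn) on node nid are backed by memory. */
static int register_range(int nid, unsigned long start_pfn,
			  unsigned long end_pfn)
{
	if (nr_ranges == MAX_ACTIVE_RANGES || start_pfn >= end_pfn)
		return -1;
	ranges[nr_ranges++] = (struct active_range){ nid, start_pfn, end_pfn };
	return 0;
}

/* Count pages of node nid that are present inside [zone_start, zone_end). */
static unsigned long present_pages(int nid, unsigned long zone_start,
				   unsigned long zone_end)
{
	unsigned long present = 0;

	for (size_t i = 0; i < nr_ranges; i++) {
		unsigned long s, e;

		if (ranges[i].nid != nid)
			continue;
		/* Clamp the registered range to the zone boundaries. */
		s = ranges[i].start_pfn > zone_start ?
			ranges[i].start_pfn : zone_start;
		e = ranges[i].end_pfn < zone_end ?
			ranges[i].end_pfn : zone_end;
		if (s < e)
			present += e - s;
	}
	return present;
}

/* Holes are the part of the zone span not covered by any registered range. */
static unsigned long hole_pages(int nid, unsigned long zone_start,
				unsigned long zone_end)
{
	return (zone_end - zone_start) -
		present_pages(nid, zone_start, zone_end);
}
```

In this model an architecture would call register_range() once per discovered range and then ask for present_pages() and hole_pages() per zone, given only the PFNs at which each zone ends.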
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
un-, de-, -free, -destroy, -exit, etc. functions should in general return
void. Also, there is very little that, say, filesystem driver code can do
upon a failed kmem_cache_destroy(). If it is decided to BUG in this case,
the BUG should be put in generic code instead.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
More white space cleanups in preparation of cloning ext4 from ext3.
Removing spaces that precede a tab.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are a few places I've found in jbd that look like they may not be
16T-safe, or consistent with the use of unsigned longs for block
containers. Problems here would be somewhat hard to hit: they would require
journal blocks past the 8T boundary, which would not be terribly common.
Still, they should be fixed.
(some of these have come from the ext4 work on jbd as well).
I think there's one more possibility that the wrap() function may not be
safe IF your last block in the journal butts right up against the 2^32 block
boundary, but that seems like a VERY remote possibility, and I'm not
worrying about it at this point.
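For reference, the 8T and 16T figures are consistent with 32-bit block numbers and a 4 KiB block size. The block size is an assumption here, and the helper names below are hypothetical; the sketch only shows where a signed and an unsigned 32-bit block number run out.

```c
#include <stdint.h>

#define BLOCK_SIZE_BYTES 4096ULL	/* assumed 4 KiB block size */

/* Byte offset of the first block a signed 32-bit block number cannot hold. */
static uint64_t signed32_limit_bytes(void)
{
	return ((uint64_t)INT32_MAX + 1) * BLOCK_SIZE_BYTES;
}

/* Byte offset of the first block an unsigned 32-bit block number cannot hold. */
static uint64_t unsigned32_limit_bytes(void)
{
	return ((uint64_t)UINT32_MAX + 1) * BLOCK_SIZE_BYTES;
}

/* Returns 1 if blocknr survives being stored in a signed 32-bit integer. */
static int fits_in_signed32(uint64_t blocknr)
{
	return blocknr <= (uint64_t)INT32_MAX;
}
```

With these assumptions signed32_limit_bytes() is 8 TiB and unsigned32_limit_bytes() is 16 TiB, so a block number held in a plain int stops being representable halfway through a 16T device.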
Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove whitespace from ext3 and jbd, before we clone ext4.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6: (30 commits)
i2c: Drop unimplemented slave functions
i2c: Constify i2c_algorithm declarations, part 2
i2c: Constify i2c_algorithm declarations, part 1
i2c: Let drivers constify i2c_algorithm data
i2c-isa: Restore driver owner
i2c-viapro: Add support for the VT8237A and VT8251
i2c: Warn on i2c client creation failure
i2c-core: Drop useless bitmaskings
i2c-algo-pcf: Discard the mdelay data struct member
i2c-algo-bit: Cleanups
i2c-isa: Fail adding driver on attach_adapter error
i2c: __must_check fixes (chip drivers)
i2c-dev: attach/detach_adapter cleanups
i2c-stub: Chip address as a module parameter
i2c: Plan i2c-isa for removal
i2c: New bus driver for TI OMAP boards
i2c-algo-bit: Discard the mdelay data struct member
i2c-matroxfb: Struct init conversion
i2c: Fix copy-n-paste in subsystem Kconfig
i2c-au1550: Add I2C support for Au1200
...
This patch adds pci_stop_bus_device() which stops a PCI device (detach
the driver, remove from the global list and so on) and any children.
This is needed for ACPI based PCI-to-PCI bridge hot-remove, and it will
also be needed for ACPI based PCI root bridge hot-remove.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are numerous drivers that can use multithreaded probing, but having
some kind of global flag as the way to control this makes migration to
threaded probing hard, since it enables it everywhere and is almost
as likely to cause serious pain as holding a clog dance in a minefield.
If we have a pci_driver multithread_probe flag to inherit you can turn
it on for one driver at a time.
From playing so far however I think we need a different model at the
device layer which serializes until the called probe function says "ok
you can start another one now". That would need some kind of flag and
semaphore plus a helper function.
Anyway, in the absence of that, this is a starting point to usefully play
with this stuff.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patch 3 implements the core part of PCI-Express AER and aerdrv
port service driver.
When a root port service device is probed, the aerdrv will call
request_irq to register irq handler for AER error interrupt.
When a device sends a PCI-Express error message to the root port,
the root port triggers an interrupt, by either MSI or IO-APIC,
and the kernel then runs the irq handler. The handler collects the root
error status register and schedules a work item. The work item calls
the core part to process the error based on its type
(Correctable/non-fatal/fatal).
As for Correctable errors, the patch chooses to just clear the correctable
error status register of the device.
As for the non-fatal error, the patch follows generic PCI error handler
rules to call the error callback functions of the endpoint's driver. If
the device is a bridge, the patch chooses to broadcast the error to
downstream devices.
As for the fatal error, the patch resets the pci-express link and
follows generic PCI error handler rules to call the error callback
functions of the endpoint's driver. If the device is a bridge, the patch
chooses to broadcast the error to downstream devices.
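The three cases above can be summarised as a small decision function. This is a sketch of the described policy only, not the aerdrv code: the enum, the ACT_* flags and aer_plan() are hypothetical names invented for illustration.

```c
/* Hypothetical severity classes matching the three cases described above. */
enum aer_severity { AER_CORRECTABLE, AER_NONFATAL, AER_FATAL };

/* Hypothetical bit flags naming the recovery steps that would be taken. */
#define ACT_CLEAR_STATUS	0x1u	/* clear correctable error status */
#define ACT_RESET_LINK		0x2u	/* reset the PCI-Express link */
#define ACT_CALL_DRIVER		0x4u	/* call the endpoint driver's callbacks */
#define ACT_BROADCAST		0x8u	/* broadcast to downstream devices */

/* Return the set of steps for an error of severity sev on one device. */
static unsigned int aer_plan(enum aer_severity sev, int is_bridge)
{
	unsigned int actions = 0;

	/* Correctable errors are only acknowledged. */
	if (sev == AER_CORRECTABLE)
		return ACT_CLEAR_STATUS;

	/* Fatal errors additionally reset the link before recovery. */
	if (sev == AER_FATAL)
		actions |= ACT_RESET_LINK;

	/* Bridges pass the error downstream; endpoints get driver callbacks. */
	actions |= is_bridge ? ACT_BROADCAST : ACT_CALL_DRIVER;
	return actions;
}
```

For example, aer_plan(AER_FATAL, 1) yields a link reset plus a downstream broadcast, while aer_plan(AER_NONFATAL, 0) yields only the endpoint driver callbacks.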
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Introduce msi_ht_cap_enabled() to check the MSI capability in the
Hypertransport configuration space.
It is used in a generic quirk, quirk_msi_ht_cap(), to check whether
MSI is enabled on a HyperTransport chipset, and in an nVidia-specific quirk,
quirk_nvidia_ck804_msi_ht_cap(), where two HT MSI mappings have to
be checked.
Both quirks set the PCI_BUS_FLAGS_NO_MSI bus flag when MSI is disabled.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
0x08 is the HT capability, while PCI_CAP_ID_HT_IRQCONF would be
the subtype 0x80 that mpic_scan_ht_pic() uses.
Rename PCI_CAP_ID_HT_IRQCONF into PCI_CAP_ID_HT.
And by the way, use it in the ipath driver instead of defining its
own HT_CAPABILITY_ID.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i2c: Drop unimplemented slave functions
Drop the function declarations for slave mode support of i2c adapters.
This was never implemented, and by the time it is I bet we will want
something different anyway.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i2c: Let drivers constify i2c_algorithm data
Let drivers constify I2C algorithm method operations tables,
moving them from ".data" to ".rodata".
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i2c-algo-pcf: Discard the mdelay data struct member
Just as i2c-algo-bit, i2c-algo-pcf has an unused mdelay struct member,
which we can get rid of to spare some code and memory.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i2c-algo-bit: Discard the mdelay data struct member
The i2c_algo_bit_data structure has an mdelay member, which is not
used by the algorithm code (the code has always been ifdef'd out.)
Let's discard it to save some code and memory.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
i2c-algo-sibyte: Merge into i2c-sibyte
Merge i2c-algo-sibyte into i2c-sibyte, as this is a complete,
hardware-dependent SMBus implementation and not a reusable algorithm.
Perform some basic coding style cleanups while we're here (mainly
space-based indentation replaced by tabulations.)
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch enables building of individual WAN protocol support
routines (parts of generic HDLC) as separate modules.
All protocol-private definitions are moved from hdlc.h file
to protocol drivers. User-space interface and interface
between generic HDLC and underlying low-level HDLC drivers
are unchanged.
Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (47 commits)
Driver core: Don't call put methods while holding a spinlock
Driver core: Remove unneeded routines from driver core
Driver core: Fix potential deadlock in driver core
PCI: enable driver multi-threaded probe
Driver Core: add ability for drivers to do a threaded probe
sysfs: add proper sysfs_init() prototype
drivers/base: check errors
drivers/base: Platform notify needs to occur before drivers attach to the device
v4l-dev2: handle __must_check
add CONFIG_ENABLE_MUST_CHECK
add __must_check to device management code
Driver core: fixed add_bind_files() definition
Driver core: fix comments in drivers/base/power/resume.c
sysfs_remove_bin_file: no return value, dump_stack on error
kobject: must_check fixes
Driver core: add ability for devices to create and remove bin files
Class: add support for class interfaces for devices
Driver core: create devices/virtual/ tree
Driver core: add device_rename function
Driver core: add ability for classes to handle devices properly
...
Add the pm_trace attribute in /sys/power which has to be explicitly set to
one to really enable the "PM tracing" code compiled in when CONFIG_PM_TRACE
is set (which modifies the machine's CMOS clock in unpredictable ways).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change suspend_console() so that it waits for all consoles to flush the
remaining messages and make it possible to switch the console suspending off
with the help of a Kconfig option.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Stefan Seyfried <seife@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make swsusp use memory bitmaps to store its internal information during the
resume phase of the suspend-resume cycle.
If the pfns of saveable pages are saved during the suspend phase instead of
the kernel virtual addresses of these pages, we can use them during the resume
phase directly to set the corresponding bits in a memory bitmap. Then, this
bitmap is used to mark the page frames corresponding to the pages that were
saveable before the suspend (aka "unsafe" page frames).
Next, we allocate as many page frames as needed to store the entire suspend
image and make sure that there will be some extra free "safe" page frames for
the list of PBEs constructed later. Subsequently, the image is loaded and, if
possible, the data loaded from it are written into their "original" page
frames (ie. the ones they had occupied before the suspend).
The image data that cannot be written into their "original" page frames are
loaded into "safe" page frames and their "original" kernel virtual addresses,
as well as the addresses of the "safe" pages containing their copies, are
stored in a list of PBEs. Finally, the list of PBEs is used to copy the
remaining image data into their "original" page frames (this is done
atomically, by the architecture-dependent parts of swsusp).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove some things that are no longer used or defined elsewhere from suspend.h
and make the inline version of software_suspend() return the right error code.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The current suspend code has to be run on one CPU, so we use the CPU
hotplug to take the non-boot CPUs offline on SMP machines. However, we
should also make sure that these CPUs will not be enabled by someone else
after we have disabled them.
The functions disable_nonboot_cpus() and enable_nonboot_cpus() are moved to
kernel/cpu.c, because they now refer to some stuff in there that should
better be static. Also it's better if disable_nonboot_cpus() returns an
error instead of panicking if something goes wrong, and
enable_nonboot_cpus() has no reason to panic(), because the CPUs may have
been enabled by the userland before it tries to take them online.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement async reads for swsusp resuming.
Crufty old PIII testbox:
15.7 MB/s -> 20.3 MB/s
Sony Vaio:
14.6 MB/s -> 33.3 MB/s
I didn't implement the post-resume bio_set_pages_dirty(). I don't really
understand why resume needs to run set_page_dirty() against these pages.
It might be a worry that this code modifies PG_Uptodate, PG_Error and
PG_Locked against the image pages. Can this possibly affect the resumed-into
kernel? Hopefully not, if we're atomically restoring its mem_map?
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Jens Axboe <axboe@suse.de>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Switch the swsusp writeout code from 4k-at-a-time to 4MB-at-a-time.
Crufty old PIII testbox:
12.9 MB/s -> 20.9 MB/s
Sony Vaio:
14.7 MB/s -> 26.5 MB/s
The implementation is crude. A better one would use larger BIOs, but wouldn't
gain any performance.
The memcpys will be mostly pipelined with the IO and basically come for free.
The ENOMEM path has not been tested. It should be.
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the DIV_ROUND_UP() helper macro: divide `n' by `d', rounding up.
Stolen from the gfs2 tree(!) because the swsusp patches need it.
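A minimal sketch of such a macro and one typical use follows. The macro body is the usual round-up-division formulation; pages_needed() is a hypothetical helper added only to show a call site. Note that `d' is evaluated twice, so arguments with side effects should be avoided, and n + d - 1 must not overflow.

```c
/* Divide n by d, rounding the result up to the next integer. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: number of pages needed to hold nbytes. */
static unsigned long pages_needed(unsigned long nbytes,
				  unsigned long page_size)
{
	return DIV_ROUND_UP(nbytes, page_size);
}
```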
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If we're going to implement smp_call_function_single() on three architectures
with the same prototype then it should have a declaration in a
non-arch-specific header file.
Move it into <linux/smp.h>.
Cc: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I've come across some problems with the assembly version of the ELFNOTE
macro currently in -mm. (in
x86-put-note-sections-into-a-pt_note-segment-in-vmlinux.patch)
The first is that older gas does not support :varargs in .macro
definitions (in my testing 2.17 does while 2.15 does not, I don't know
when it became supported). The Changes file says binutils >= 2.12 so I
think we need to avoid using it. There are no other uses in mainline or
-mm. Old gas appears to just ignore it so you get "too many arguments"
type errors.
Secondly it seems that passing strings as arguments to assembler macros
is broken without varargs. It looks like they get unquoted or each
character is treated as a separate argument or something and this causes
all manner of grief. I think this is because of the use of -traditional
when compiling assembly files.
Therefore I have translated the assembler macro into a pre-processor
macro.
I added the desctype as a separate argument instead of including it with
the descdata as the previous version did since -traditional means the
ELFNOTE definition after the #else needs to have the same number of
arguments (I think so anyway, the -traditional CPP semantics are pretty
fscking strange!).
With this patch I am able to define elfnotes in assembly like this with
both old and new assemblers.
	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz, "linux")
	ELFNOTE(Xen, XEN_ELFNOTE_GUEST_VERSION, .asciz, "2.6")
	ELFNOTE(Xen, XEN_ELFNOTE_XEN_VERSION, .asciz, "xen-3.0")
	ELFNOTE(Xen, XEN_ELFNOTE_VIRT_BASE, .long, __PAGE_OFFSET)
Which seems reasonable enough.
Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch will pack any .note.* section into a PT_NOTE segment in the output
file.
To do this, we tell ld that we need a PT_NOTE segment. This requires us to
start explicitly mapping sections to segments, so we also need to explicitly
create PT_LOAD segments for text and data, and map the sections to them
appropriately. Fortunately, each section will default to its previous
section's segment, so it doesn't take many changes to vmlinux.lds.S.
This only changes i386 for now, but I presume the corresponding changes for
other architectures will be as simple.
This change also adds <linux/elfnote.h>, which defines C and Assembler macros
for actually creating ELF notes.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds support for the Atmel AVR32 architecture as well as the AT32AP7000
CPU and the AT32STK1000 development board.
AVR32 is a new high-performance 32-bit RISC microprocessor core, designed for
cost-sensitive embedded applications, with particular emphasis on low power
consumption and high code density. The AVR32 architecture is not binary
compatible with earlier 8-bit AVR architectures.
The AVR32 architecture, including the instruction set, is described by the
AVR32 Architecture Manual, available from
http://www.atmel.com/dyn/resources/prod_documents/doc32000.pdf
The Atmel AT32AP7000 is the first CPU implementing the AVR32 architecture. It
features a 7-stage pipeline, 16KB instruction and data caches and a full
Memory Management Unit. It also comes with a large set of integrated
peripherals, many of which are shared with the AT91 ARM-based controllers from
Atmel.
Full data sheet is available from
http://www.atmel.com/dyn/resources/prod_documents/doc32003.pdf
while the CPU core implementation including caches and MMU is documented by
the AVR32 AP Technical Reference, available from
http://www.atmel.com/dyn/resources/prod_documents/doc32001.pdf
Information about the AT32STK1000 development board can be found at
http://www.atmel.com/dyn/products/tools_card.asp?tool_id=3918
including a BSP CD image with an earlier version of this patch, development
tools (binaries and source/patches) and a root filesystem image suitable for
booting from SD card.
Alternatively, there's a preliminary "getting started" guide available at
http://avr32linux.org/twiki/bin/view/Main/GettingStarted which provides links
to the sources and patches you will need in order to set up a cross-compiling
environment for avr32-linux.
This patch, as well as the other patches included with the BSP and the
toolchain patches, is actively supported by Atmel Corporation.
[dmccr@us.ibm.com: Fix more pxx_page macro locations]
[bunk@stusta.de: fix `make defconfig']
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave McCracken <dmccr@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Permit __do_IRQ() to be dispensed with based on a configuration option.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Replace ctxid with sid in selinux_audit_rule_match interface for
consistency with other interfaces.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rename selinux_ctxid_to_string to selinux_sid_to_string to be
consistent with other interfaces.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eliminate selinux_task_ctxid since it duplicates selinux_task_get_sid.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are many places where we need to determine the node of a zone.
Currently we use a difficult to read sequence of pointer dereferencing.
Put that into an inline function and use throughout VM. Maybe we can find
a way to optimize the lookup in the future.
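A sketch of such a helper, using cut-down user-space stand-ins for the kernel structures. The helper name zone_to_nid() and the field names zone_pgdat and node_id are assumptions here, since this message does not name them.

```c
/* Cut-down stand-ins for the kernel structures (assumed field names). */
struct pglist_data {
	int node_id;			/* NUMA node this pgdat describes */
};

struct zone {
	struct pglist_data *zone_pgdat;	/* back pointer to the owning node */
};

/* Hide the pointer chase behind one readable helper. */
static inline int zone_to_nid(const struct zone *zone)
{
	return zone->zone_pgdat->node_id;
}
```

Callers then write zone_to_nid(zone) instead of open-coding the dereference, which also leaves one place to change if the lookup is optimized later.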
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently one can enable slab reclaim by setting an explicit option in
/proc/sys/vm/zone_reclaim_mode. Slab reclaim is then used as a final
option if the freeing of unmapped file backed pages is not enough to free
enough pages to allow a local allocation.
However, that means that the slab can grow excessively and that most memory
of a node may be used by slabs. We have had a case where a machine with
46GB of memory was using 40-42GB for slab. Zone reclaim was effective in
dealing with pagecache pages. However, slab reclaim was only done during
global reclaim (which is a bit rare on NUMA systems).
This patch implements slab reclaim during zone reclaim. Zone reclaim
occurs if there is a danger of an off node allocation. At that point we
1. Shrink the per node page cache if the number of pagecache
pages is more than min_unmapped_ratio percent of pages in a zone.
2. Shrink the slab cache if the number of the nodes reclaimable slab pages
(patch depends on earlier one that implements that counter)
are more than min_slab_ratio (a new /proc/sys/vm tunable).
The shrinking of the slab cache is a bit problematic since it is not node
specific. So we simply calculate what point in the slab we want to reach
(current per node slab use minus the number of pages that need to be
allocated) and then repeatedly run the global reclaim until that is
unsuccessful or we have reached the limit. I hope we will have zone based
slab reclaim at some point which will make that easier.
The default for the min_slab_ratio is 5%
Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.
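The two threshold checks and the slab target can be modelled as follows. This is a user-space sketch with hypothetical names (struct node_stats and the helpers below), not the vmscan.c code; the ratios are plain percentages as described above.

```c
/* Hypothetical per-node counters used by the model. */
struct node_stats {
	unsigned long present_pages;		/* pages in the zone */
	unsigned long unmapped_pagecache;	/* unmapped pagecache pages */
	unsigned long slab_reclaimable;		/* reclaimable slab pages */
};

/* Convert a percentage tunable into a page count for this zone. */
static unsigned long ratio_to_pages(unsigned long present,
				    unsigned int percent)
{
	return present * percent / 100;
}

/* Step 1: shrink the page cache only above min_unmapped_ratio. */
static int should_shrink_pagecache(const struct node_stats *s,
				   unsigned int min_unmapped_ratio)
{
	return s->unmapped_pagecache >
		ratio_to_pages(s->present_pages, min_unmapped_ratio);
}

/* Step 2: shrink the slab only above min_slab_ratio. */
static int should_shrink_slab(const struct node_stats *s,
			      unsigned int min_slab_ratio)
{
	return s->slab_reclaimable >
		ratio_to_pages(s->present_pages, min_slab_ratio);
}

/* Slab size to aim for: current use minus the pages we need to allocate. */
static unsigned long slab_target(unsigned long slab_now,
				 unsigned long nr_pages)
{
	return slab_now > nr_pages ? slab_now - nr_pages : 0;
}
```

In the model, global slab reclaim would be run repeatedly until the node's reclaimable slab count drops to slab_target() or a pass frees nothing.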
[akpm@osdl.org: cleanups]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the atomic counter for slab_reclaim_pages and replace the counter
and NR_SLAB with two ZVC counters that account for unreclaimable and
reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE.
Change the check in vmscan.c to refer to NR_SLAB_RECLAIMABLE. The
intent seems to be to check for slab pages that could be freed.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*_pages is a better description of the role of the variable.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In many places we will need to use the same combination of flags. Specify
a single GFP_THISNODE definition for ease of use in gfp.h.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This
flag is essential if a kernel component requires memory to be located on a
certain node. It will be needed for alloc_pages_node() to force allocation
on the indicated node and for alloc_pages() to force allocation on the
current node.
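The effect of the flag can be shown with a toy allocator. Everything below is a hypothetical user-space model (MODEL_THISNODE, free_pages[] and alloc_page_on() are invented names, and the round-robin fallback order is arbitrary); it does not reflect how the page allocator walks zonelists.

```c
#define NR_NODES	4	/* illustrative node count */
#define MODEL_THISNODE	0x1u	/* hypothetical stand-in for __GFP_THISNODE */

/* Free pages remaining on each node of the model. */
static unsigned long free_pages[NR_NODES];

/*
 * Try to take one page from node nid (0 <= nid < NR_NODES). Without
 * MODEL_THISNODE the search falls back to the other nodes in turn; with
 * it, failure on nid is final. Returns the node used, or -1 on failure.
 */
static int alloc_page_on(int nid, unsigned int flags)
{
	for (int i = 0; i < NR_NODES; i++) {
		int node = (nid + i) % NR_NODES;

		if (free_pages[node] > 0) {
			free_pages[node]--;
			return node;
		}
		/* The caller insisted on this node: do not fall back. */
		if (flags & MODEL_THISNODE)
			break;
	}
	return -1;
}
```

A caller that must have node-local memory passes the flag and handles the -1 itself, instead of silently receiving a page from another node.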
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Let's try to keep mm/ comments more useful and up to date. This is a start.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
lock_page needs the caller to have a reference on the page->mapping inode
due to sync_page, ergo set_page_dirty_lock is obviously buggy according to
its comments.
Solve it by introducing a new lock_page_nosync which does not do a sync_page.
akpm: unpleasant solution to an unpleasant problem. If it goes wrong it could
cause great slowdowns while the lock_page() caller waits for kblockd to
perform the unplug. And if a filesystem has special sync_page() requirements
(none presently do), permanent hangs are possible.
otoh, set_page_dirty_lock() is usually (always?) called against userspace
pages. They are always up-to-date, so there shouldn't be any pending read I/O
against these pages.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch splits alloc_percpu() up into two phases. Likewise for
free_percpu(). This allows clients to limit initial allocations to online
cpu's, and to populate or depopulate per-cpu data at run time as needed:
	struct my_struct *obj;

	/* initial allocation for online cpu's */
	obj = percpu_alloc(sizeof(struct my_struct), GFP_KERNEL);
	...
	/* populate per-cpu data for cpu coming online */
	ptr = percpu_populate(obj, sizeof(struct my_struct), GFP_KERNEL, cpu);
	...
	/* access per-cpu object */
	ptr = percpu_ptr(obj, smp_processor_id());
	...
	/* depopulate per-cpu data for cpu going offline */
	percpu_depopulate(obj, cpu);
	...
	/* final removal */
	percpu_free(obj);
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a notifier chain to the out of memory killer. If one of the registered
callbacks could release some memory, do not kill the process but return and
retry the allocation that forced the oom killer to run.
The purpose of the notifier is to add a safety net in the presence of
memory ballooners. If the resource manager inflated the balloon to a size
where memory allocations can not be satisfied anymore, it is better to
deflate the balloon a bit instead of killing processes.
The implementation for the s390 ballooner is included.
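The control flow can be sketched in user space as below. The names (oom_callback, register_oom_callback(), oom_must_kill(), the balloon variables) are hypothetical and the chain is a plain array rather than the kernel's notifier infrastructure; the point is only that a callback reporting freed pages turns a kill into a retry.

```c
#include <stddef.h>

/* Hypothetical callback type: add the number of pages released to *freed. */
typedef void (*oom_callback)(unsigned long *freed);

#define MAX_OOM_CALLBACKS 8	/* illustrative limit */

static oom_callback oom_chain[MAX_OOM_CALLBACKS];
static size_t nr_oom_callbacks;

static int register_oom_callback(oom_callback cb)
{
	if (nr_oom_callbacks == MAX_OOM_CALLBACKS)
		return -1;
	oom_chain[nr_oom_callbacks++] = cb;
	return 0;
}

/* Returns 1 if a process must be killed, 0 if the allocation can be retried. */
static int oom_must_kill(void)
{
	unsigned long freed = 0;

	for (size_t i = 0; i < nr_oom_callbacks; i++)
		oom_chain[i](&freed);
	return freed == 0;
}

/* Example callback: a balloon that gives back up to 16 pages per call. */
static unsigned long balloon_pages;

static void balloon_oom_callback(unsigned long *freed)
{
	unsigned long n = balloon_pages < 16 ? balloon_pages : 16;

	balloon_pages -= n;
	*freed += n;
}
```

Once the balloon is fully deflated the callback reports nothing freed, and the model falls through to killing a process as before.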
[akpm@osdl.org: cleanups]
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I wonder why we need this bitmask indexing into zone->node_zonelists[]?
We always start with the highest zone and then include all lower zones
if we build zonelists.
Are there really cases where we need allocation from ZONE_DMA or
ZONE_HIGHMEM but not ZONE_NORMAL? It seems that the current implementation
of highest_zone() makes that already impossible.
If we go linear on the index then gfp_zone() == highest_zone() and a lot
of definitions fall by the wayside.
We can now revert back to the use of gfp_zone() in mempolicy.c ;-)
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After we have done this we can now do some typing cleanup.
The memory policy layer keeps a policy_zone that specifies
the zone that gets memory policies applied. This variable
can now be of type enum zone_type.
The check_highest_zone function and the build_zonelists function must
then also take an enum zone_type parameter.
Plus there are a number of loops over zones that also should use
zone_type.
We run into some troubles at some points with functions that need a
zone_type variable to become -1. Fix that up.
[pj@sgi.com: fix set_mempolicy() crash]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is a check in zonelist_policy() that compares pieces of the bitmap
obtained from a gfp mask via GFP_ZONETYPES with a zone number.
The bitmap is an ORed mask of __GFP_DMA, __GFP_DMA32 and __GFP_HIGHMEM.
The policy_zone is a zone number with the possible values of ZONE_DMA,
ZONE_DMA32, ZONE_HIGHMEM and ZONE_NORMAL. These are two different domains
of values.
For some reason this seemed to work before the zone reduction patchset (it
definitely works on SGI boxes since we just have one zone and the check
cannot fail).
With the zone reduction patchset this check definitely fails on systems
with two zones if the system actually has memory in both zones.
This is because ZONE_NORMAL is selected using no __GFP flag at
all and thus gfp_zone(gfpmask) == 0. ZONE_DMA is selected when __GFP_DMA
is set. __GFP_DMA is 0x01. So gfp_zone(gfpmask) == 1.
policy_zone is set to ZONE_NORMAL (==1) if ZONE_NORMAL and ZONE_DMA are
populated.
For ZONE_NORMAL gfp_zone(<no __GFP_DMA>) yields 0 which is <
policy_zone(ZONE_NORMAL) and so policy is not applied to regular memory
allocations!
Instead gfp_zone(__GFP_DMA) == 1 which results in policy being applied
to DMA allocations!
What we really want in that place is to establish the highest allowable
zone for a given gfp_mask. If the highest zone is higher or equal to the
policy_zone then memory policies need to be applied. We have such
a highest_zone() function in page_alloc.c.
So move the highest_zone() function from mm/page_alloc.c into
include/linux/gfp.h. On the way we simplify the function and use the new
zone_type that was also introduced with the zone reduction patchset plus we
also specify the right type for the gfp flags parameter.
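The mismatch between the two value domains can be reproduced with a small model of the two-zone case described above. The constants and function names here are illustrative stand-ins (GFP_DMA_BIT for __GFP_DMA, highest_zone_sketch() for highest_zone()), not the kernel definitions.

```c
#include <assert.h>

/* Values mirror the two-zone configuration described above; illustrative only. */
enum zone_type { ZONE_DMA = 0, ZONE_NORMAL = 1 };
#define GFP_DMA_BIT 0x01u   /* stands in for __GFP_DMA */

/* Old check: compares raw flag bits with a zone number. */
static int policy_applies_old(unsigned int gfp_bits, enum zone_type policy_zone)
{
    return gfp_bits >= (unsigned int)policy_zone;
}

/* Highest zone an allocation with these flags may use. */
static enum zone_type highest_zone_sketch(unsigned int gfp_bits)
{
    assert((gfp_bits & ~GFP_DMA_BIT) == 0);   /* only the DMA bit is modelled */
    return (gfp_bits & GFP_DMA_BIT) ? ZONE_DMA : ZONE_NORMAL;
}

/* New check: compares a zone number with a zone number. */
static int policy_applies_new(unsigned int gfp_bits, enum zone_type policy_zone)
{
    return highest_zone_sketch(gfp_bits) >= policy_zone;
}
```

With policy_zone == ZONE_NORMAL, the old check skips policy for a plain allocation (no flag bits) and applies it to a DMA allocation; the new check does the opposite, which is the intended behaviour.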
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
eventcounters: Do not display counters for zones that are not available on an
arch
Do not define or display counters for the DMA32 and the HIGHMEM zone if such
zones were not configured.
[akpm@osdl.org: s390 fix]
[heiko.carstens@de.ibm.com: s390 fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make ZONE_HIGHMEM optional
- ifdef out code and definitions related to CONFIG_HIGHMEM
- __GFP_HIGHMEM falls back to normal allocations if there is no
ZONE_HIGHMEM
- GFP_ZONEMASK becomes 0x01 if there is no DMA32 and no HIGHMEM
zone.
[jdike@addtoit.com: build fix]
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make ZONE_DMA32 optional
- Add #ifdefs around ZONE_DMA32 specific code and definitions.
- Add CONFIG_ZONE_DMA32 config option and use that for x86_64
that alone needs this zone.
- Remove the use of CONFIG_DMA_IS_DMA32 and CONFIG_DMA_IS_NORMAL
for ia64 and fix up the way per node ZVCs are calculated.
- Fall back to prior GFP_ZONEMASK of 0x03 if there is no
DMA32 zone.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use enum for zones and reformat zones dependent information
Add comments explaining the use of zones and add a zones_t type for zone
numbers.
Line up information that will be #ifdefd by the following patches.
[akpm@osdl.org: comment cleanups]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move totalhigh_pages and nr_free_highpages() into highmem.c/.h
Move the totalhigh_pages definition into highmem.c/.h. Move the
nr_free_highpages function into highmem.c
[yoichi_yuasa@tripeaks.co.jp: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It fixes various coding style issues, especially where spaces are useless. For
example, '*' goes next to the function name.
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__init in headers is pretty useless because the compiler doesn't check it, and
they get out of sync relatively frequently. So if you see an __init in a
header file, it's quite unreliable and you need to check the definition
anyway.
Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes the following needlessly global functions static:
- slab.c: kmem_find_general_cachep()
- swap.c: __page_cache_release()
- vmalloc.c: __vmalloc_node()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that we can detect writers of shared mappings, throttle them. Avoids OOM
by surprise.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Tracking of dirty pages in shared writeable mmap()s.
The idea is simple: write protect clean shared writeable pages, catch the
write-fault, make writeable and set dirty. On page write-back clean all the
PTE dirty bits and write protect them once again.
The implementation is a tad harder, mainly because the default
backing_dev_info capabilities were too loosely maintained. Hence it is not
enough to test the backing_dev_info for cap_account_dirty.
The current heuristic is as follows, a VMA is eligible when:
- it's shared writeable
    (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
- it is not a 'special' mapping
    (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
- the backing_dev_info is cap_account_dirty
    mapping_cap_account_dirty(vma->vm_file->f_mapping)
- f_op->mmap() didn't change the default page protection
Pages from remap_pfn_range() are explicitly excluded because their COW
semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
because they don't have a backing store anyway.
mprotect() is taught about the new behaviour as well. However it overrides
the last condition.
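The four eligibility conditions of the heuristic can be written as one predicate. The following is a userspace sketch: struct vma_sketch and its two boolean fields are hypothetical stand-ins for the real VMA, mapping_cap_account_dirty() and the page protection comparison, and the flag values are illustrative rather than copied from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the real definitions live in the kernel headers. */
#define VM_WRITE      0x0002u
#define VM_SHARED     0x0008u
#define VM_PFNMAP     0x0400u
#define VM_INSERTPAGE 0x0800u

struct vma_sketch {
    unsigned int vm_flags;
    bool bdi_accounts_dirty;   /* stands in for mapping_cap_account_dirty() */
    bool default_page_prot;    /* f_op->mmap() left the protection untouched */
};

static bool vma_wants_dirty_tracking(const struct vma_sketch *vma)
{
    assert(vma);

    /* Must be both writeable and shared. */
    if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) != (VM_WRITE | VM_SHARED))
        return false;

    /* 'Special' mappings are excluded. */
    if (vma->vm_flags & (VM_PFNMAP | VM_INSERTPAGE))
        return false;

    /* The backing device must account dirty pages. */
    if (!vma->bdi_accounts_dirty)
        return false;

    /* The driver must not have changed the default protection. */
    return vma->default_page_prot;
}
```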
Cleaning the pages on write-back is done with page_mkclean(), a new rmap call.
It can be called on any page, but is currently only implemented for mapped
pages; if the page is found to be of a VMA that accounts dirty pages it will
also wrprotect the PTE.
Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from
under ->private_lock. This seems to be safe, since ->private_lock is used to
serialize access to the buffers, not the page itself. This is needed because
clear_page_dirty() will call into page_mkclean() and would thereby violate
locking order.
[dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce a VM_BUG_ON, which is turned on with CONFIG_DEBUG_VM. Use this
in the lightweight, inline refcounting functions; PageLRU and PageActive
checks in vmscan, because they're pretty well confined to vmscan. And in
page allocate/free fastpaths which can be the hottest parts of the kernel
for kbuilds.
Unlike BUG_ON, VM_BUG_ON must not be used to execute statements with
side-effects, and should not be used outside core mm code.
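The side-effect rule follows directly from how such a macro is compiled out. A minimal userspace sketch, assuming a build-time switch named DEBUG_VM_SKETCH in place of CONFIG_DEBUG_VM and assert() in place of BUG():

```c
#include <assert.h>

/* Define DEBUG_VM_SKETCH at build time to model CONFIG_DEBUG_VM=y. */
#ifdef DEBUG_VM_SKETCH
#define VM_BUG_ON(cond) assert(!(cond))
#else
#define VM_BUG_ON(cond) do { } while (0)   /* condition is never evaluated */
#endif

static int refcount = 1;

/* Correct: the check has no side effect, so behaviour is the same either way. */
static int get_ref(void)
{
    VM_BUG_ON(refcount == 0);
    return ++refcount;
}

/* Wrong: the increment disappears when the check is compiled out. */
static int get_ref_buggy(void)
{
    VM_BUG_ON(++refcount == 0);
    return refcount;
}
```

Built without DEBUG_VM_SKETCH, get_ref_buggy() returns the count unchanged, so the function silently changes behaviour depending on a debug option.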
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Give non-highmem architectures access to the kmap API for the purposes of
overriding (this is what the attached patch does).
The proposal is that we should now require all architectures with coherence
issues to manage data coherence via the kmap/kunmap API. Thus driver
writers never have to write code like
    kmap(page)
    modify data in page
    flush_kernel_dcache_page(page)
    kunmap(page)
instead, kmap/kunmap will manage the coherence and driver (and filesystem)
writers don't need to worry about how to flush between kmap and kunmap.
For most architectures, the page only needs to be flushed if it was
actually written to *and* there are user mappings of it, so the best
implementation looks to be: clear the page dirty pte bit in the kernel page
tables on kmap and on kunmap, check page->mappings for user maps, and then
the dirty bit, and only flush if it both has user mappings and is dirty.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get_cpu_var()/per_cpu()/__get_cpu_var() arguments must be simple
identifiers. Otherwise the arch dependent implementations might break.
This patch enforces the correct usage of the macros by producing a syntax
error if the variable is not a simple identifier.
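One way to get such a syntax error is to paste the macro argument into a declaration, which only parses when the argument is a single identifier. The sketch below shows the idea in portable userspace C; it is not the patch's actual macro body, and the array-based DEFINE_PER_CPU() is a stand-in for real per-cpu sections.

```c
#include <assert.h>

#define NR_CPUS 4

/* Userspace stand-in: one array slot per CPU instead of real per-cpu sections. */
#define DEFINE_PER_CPU(type, name) type per_cpu__##name[NR_CPUS]

/*
 * Pasting the argument into a member name only parses when the argument
 * is a single identifier: per_cpu(a.b, 0) becomes a syntax error.
 */
#define per_cpu(var, cpu) \
    (*((void)sizeof(struct { int simple_identifier_##var; }), \
       &per_cpu__##var[(cpu)]))

DEFINE_PER_CPU(int, hits);

static int count_hit(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return ++per_cpu(hits, cpu);
}
```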
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The OneNAND lock scheme depends on the density and process of the chip.
Some OneNAND chips support all-block unlock.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
The VIDIOC_G_SLICED_VBI_CAP needs to receive the v4l2_buf_type field before
it can return a result. Hence this ioctl must be IOWR, not IOR. Since this
ioctl is still marked experimental we can make this change.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
The current time_before/time_after macros will fail typechecks
when passed u64 values (as returned by get_jiffies_64()). On 64bit
systems, this will just result in a warning about mismatching types
without explicit casts, but since unsigned long and u64
(unsigned long long) are of same size, it will still work.
On 32bit systems, a long is 32bits, so the value from get_jiffies_64()
will be truncated by the cast and thus lose all the precision gained by
64bit jiffies.
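The truncation on 32-bit systems can be shown with fixed-width types. In this sketch time_after_32() models the existing macro with a 32-bit long, and time_after_64() is a hypothetical 64-bit variant; neither name is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Models the existing macro on a 32-bit system, where long is 32 bits wide. */
static int time_after_32(uint64_t a, uint64_t b)
{
    /* The casts drop the upper 32 bits of both values. */
    return (int32_t)((uint32_t)b - (uint32_t)a) < 0;
}

/* A 64-bit comparison keeps the full counter. */
static int time_after_64(uint64_t a, uint64_t b)
{
    return (int64_t)(b - a) < 0;
}
```

For a = 2^32 + 10 and b = 20, a is clearly later than b, and the 64-bit version says so. The truncating version sees 10 and 20 and reports the opposite.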
Signed-off-by: Dmitriy Zavin <dmitriyz@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch adds the per thread cookie field to the task struct and the PDA.
Also it makes sure that the PDA value gets the new cookie value at context
switch, and that a new task gets a new cookie at task creation time.
Signed-off-by: Arjan van Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
CC: Andi Kleen <ak@suse.de>
The EDD code would scan the command line as a fixed array, without
taking account of either whitespace, null-termination, the old
command-line protocol, late overrides early, or the fact that the
command line may not be reachable from INITSEG.
This should fix those problems, and enable us to use a longer command
line.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Apparently IA64 needs it, but i386/x86-64 don't anymore
since gcc 2.95 support was dropped. Nobody else on linux-arch
requested keeping it generically.
Cc: tony.luck@intel.com
Cc: kaos@sgi.com
Signed-off-by: Andi Kleen <ak@suse.de>
Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
context switch a trap is taken for the first FPU use to restore the FPU
context lazily. This is of course great for applications that have very
sporadic or no FPU use (since then you avoid doing the expensive
save/restore all the time). However for very frequent FPU users... you
take an extra trap every context switch.
The patch below adds a simple heuristic to this code: After 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch. If the app indeed uses the FPU, the
trap is avoided. (the chance of the 6th time slice using FPU after the
previous 5 having done so are quite high obviously).
After 256 switches, this is reset and lazy behavior is returned (until
there are 5 consecutive ones again). The reason for this is to give apps
that do longer bursts of FPU use still the lazy behavior back after some
time.
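The heuristic amounts to a small per-task counter. The sketch below assumes the reset after 256 switches comes from an 8-bit counter wrapping around; that detail, like the names task_sketch, account_slice() and restore_fpu_eagerly(), is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct task_sketch {
    uint8_t fpu_counter;   /* consecutive time slices that used the FPU */
};

/* Called when a task is switched out; used_fpu says whether this slice touched the FPU. */
static void account_slice(struct task_sketch *t, int used_fpu)
{
    assert(t);
    if (used_fpu)
        t->fpu_counter++;      /* 8-bit counter wraps to 0 after 256 slices */
    else
        t->fpu_counter = 0;    /* streak broken: back to lazy behaviour */
}

/* Called when the task is switched in: restore eagerly or wait for the trap? */
static int restore_fpu_eagerly(const struct task_sketch *t)
{
    assert(t);
    return t->fpu_counter >= 5;
}
```

A task that uses the FPU in five consecutive slices gets its state restored eagerly from then on, until either a slice without FPU use or the counter wrap sends it back to the lazy path.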
[akpm@osdl.org: place new task_struct field next to jit_keyring to save space]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
This patch moves the entry.S:error_entry to .kprobes.text section,
since code marked unsafe for kprobes jumps directly to entry.S::error_entry,
that must be marked unsafe as well.
This patch also moves all the ".previous.text" asm directives to ".previous"
for kprobes section.
AK: Following a similar i386 patch from Chuck Ebbert
AK: Also merged Jeremy's fix in.
+From: Jeremy Fitzhardinge <jeremy@goop.org>
KPROBE_ENTRY does a .section .kprobes.text, and expects its users to
do a .previous at the end of the function.
Unfortunately, if any code within the function switches sections, for
example .fixup, then the .previous ends up putting all subsequent code
into .fixup. Worse, any subsequent .fixup code gets intermingled with
the code it's supposed to be fixing (which is also in .fixup). It's
surprising this didn't cause more havoc.
The fix is to use .pushsection/.popsection, so this stuff nests
properly. A further cleanup would be to get rid of all
.section/.previous pairs, since they're inherently fragile.
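The nesting problem can be traced with a simplified model of the assembler's section tracking. This sketch assumes .previous swaps the current and previous section while .pushsection/.popsection use a stack; the struct and function names are made up for the model and it ignores subsections.

```c
#include <assert.h>

enum section { SEC_TEXT, SEC_KPROBES_TEXT, SEC_FIXUP };

#define SECTION_STACK_DEPTH 8

struct asm_state {
    enum section cur;                         /* section receiving code now */
    enum section prev;                        /* what .previous switches back to */
    enum section stack[SECTION_STACK_DEPTH];  /* .pushsection/.popsection stack */
    int depth;
};

/* .section name */
static void do_section(struct asm_state *s, enum section name)
{
    s->prev = s->cur;
    s->cur = name;
}

/* .previous: swap the current and the previous section */
static void do_previous(struct asm_state *s)
{
    enum section tmp = s->cur;

    s->cur = s->prev;
    s->prev = tmp;
}

/* .pushsection name */
static void do_pushsection(struct asm_state *s, enum section name)
{
    assert(s->depth < SECTION_STACK_DEPTH);
    s->stack[s->depth++] = s->cur;
    s->prev = s->cur;
    s->cur = name;
}

/* .popsection */
static void do_popsection(struct asm_state *s)
{
    assert(s->depth > 0);
    s->cur = s->stack[--s->depth];
}
```

Entering .kprobes.text with do_section(), using a .section .fixup/.previous pair inside, and then closing with do_previous() leaves the model in .fixup. Entering with do_pushsection() and closing with do_popsection() returns to .text regardless of what happened in between.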
+From: Chuck Ebbert <76306.1226@compuserve.com>
Because code marked unsafe for kprobes jumps directly to
entry.S::error_code, that must be marked unsafe as well.
The easiest way to do that is to move the page fault entry
point to just before error_code and let it inherit the same
section.
Also moved all the ".previous" asm directives for kprobes
sections to column 1 and removed ".text" from them.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
- Remove unused all_contexts parameter
No caller used it
- Move skip argument into the structure (needed for
followon patches)
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
For NUMA optimization and some other algorithms it is useful to have a fast
way to get the current CPU and node numbers in user space.
x86-64 added a fast way to do this in a vsyscall. This adds a generic
syscall for other architectures to make it a generic portable facility.
I expect some of them will also implement it as a faster vsyscall.
The cache is an optimization for the x86-64 vsyscall. Since
what the syscall returns is an approximation anyway and user space
often wants very fast results, it can be cached for some time. The normal
methods to get this information in user space are relatively slow.
The vsyscall is in a better position to manage the cache because it has direct
access to a fast time stamp (jiffies). For the generic syscall the cache
doesn't help much, but a valid argument is still enforced to keep programs
portable.
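The time stamp based cache can be sketched as follows. The struct layout, slow_getcpu() and the explicit `now` parameter are all hypothetical; the real cache is an opaque caller-supplied buffer and the time stamp comes from jiffies.

```c
#include <assert.h>

/* Hypothetical layout; the real cache is an opaque blob owned by the caller. */
struct getcpu_cache_sketch {
    unsigned long stamp;   /* time stamp when the entry was filled */
    unsigned cpu, node;
    int valid;
};

static unsigned long slow_lookups;

/* Stand-in for the expensive query; always reports cpu 3 on node 1. */
static void slow_getcpu(unsigned *cpu, unsigned *node)
{
    slow_lookups++;
    *cpu = 3;
    *node = 1;
}

static void getcpu_sketch(unsigned *cpu, unsigned *node,
                          struct getcpu_cache_sketch *cache, unsigned long now)
{
    assert(cpu && node);

    /* Reuse the cached answer while the time stamp has not moved on. */
    if (cache && cache->valid && cache->stamp == now) {
        *cpu = cache->cpu;
        *node = cache->node;
        return;
    }

    slow_getcpu(cpu, node);

    if (cache) {
        cache->stamp = now;
        cache->cpu = *cpu;
        cache->node = *node;
        cache->valid = 1;
    }
}
```

Repeated calls within the same tick cost one slow lookup; the answer may be stale by up to one tick, which is acceptable because the result is only an approximation in the first place.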
I only added an i386 syscall entry for now. Other architectures can follow
as needed.
AK: Also added some cleanups from Andrew Morton
Signed-off-by: Andi Kleen <ak@suse.de>
This patch adds a vgetcpu vsyscall which, depending on the CPU's RDTSCP
capability, uses either RDTSCP or CPUID to obtain the CPU and node
numbers and pass them to the program.
AK: Lots of changes over Vojtech's original code:
Better prototype for vgetcpu()
It's better to pass the cpu / node numbers as separate arguments
to avoid mistakes when going from SMP to NUMA.
Also add a fast time stamp based cache using a user supplied
argument to speed things up further.
Use fast method from Chuck Ebbert to retrieve node/cpu from
GDT limit instead of CPUID
Made sure RDTSCP init is always executed after node is known.
Drop printk
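Retrieving both numbers from a single value (a segment limit or the RDTSCP auxiliary register) implies packing them into one word. A sketch of such a packing, assuming the CPU number occupies the low 12 bits and the node number the bits above it; the width and helper names are assumptions for illustration:

```c
#include <assert.h>

#define CPU_BITS 12u                      /* assumed width of the CPU field */
#define CPU_MASK ((1u << CPU_BITS) - 1)

/* Combine cpu and node into one word. */
static unsigned pack_cpu_node(unsigned cpu, unsigned node)
{
    assert(cpu <= CPU_MASK);
    return (node << CPU_BITS) | cpu;
}

/* Split the word again; either output pointer may be null. */
static void unpack_cpu_node(unsigned p, unsigned *cpu, unsigned *node)
{
    if (cpu)
        *cpu = p & CPU_MASK;
    if (node)
        *node = p >> CPU_BITS;
}
```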
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
To quote Alan Cox:
The default Linux behaviour on an NMI of either memory or unknown is to
continue operation. For many environments such as scientific computing
it is preferable that the box is taken out and the error dealt with than
an uncorrected parity/ECC error get propagated.
A small number of systems do generate NMI's for bizarre random reasons
such as power management so the default is unchanged. In other respects
the new proc/sys entry works like the existing panic controls already in
that directory.
This is separate to the edac support - EDAC allows supported chipsets to
handle ECC errors well, this change allows unsupported cases to at least
panic rather than cause problems further down the line.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi
watchdog.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
There is a potential deadlock in the driver core. It boils down to
the fact that bus_remove_device() calls klist_remove() instead of
klist_del(), thereby waiting until the reference count of the
klist_node in the bus's klist of devices drops to 0. The refcount
can't reach 0 so long as a modprobe process is trying to bind a new
driver to the device being removed, by calling __driver_attach(). The
problem is that __driver_attach() tries to acquire the device's
parent's semaphore, but the caller of bus_remove_device() is quite
likely to own that semaphore already.
It isn't sufficient just to replace klist_remove() with klist_del().
Doing so runs the risk that the device would remain on the bus's klist
of devices for some time, and so could be bound to another driver even
after it was unregistered. What's needed is a new way to distinguish
whether or not a device is registered, based on a criterion other than
whether its klist_node is linked into the bus's klist of devices. That
way driver binding can fail when the device is unregistered, even if
it is still linked into the klist.
This patch (as782) implements the solution, by adding a new bitflag to
indicate when a struct device is registered, by testing the flag before
allowing a driver to bind a device, and by changing the definition of
the device_is_registered() inline.
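The effect of decoupling "registered" from "still linked into the klist" can be sketched in userspace. struct device_sketch and both functions are hypothetical models, not the driver core code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct device_sketch {
    bool on_bus_klist;          /* still linked into the bus's list of devices */
    unsigned is_registered:1;   /* the new flag */
    const char *driver;         /* bound driver name, or null */
};

/* Unregistering clears the flag at once; the list unlink may happen later. */
static void device_unregister_sketch(struct device_sketch *dev)
{
    assert(dev);
    dev->is_registered = 0;
}

/* Binding tests the flag, not list membership. */
static int driver_bind_sketch(struct device_sketch *dev, const char *drv)
{
    assert(dev && drv);
    if (!dev->is_registered)
        return -ENODEV;
    dev->driver = drv;
    return 0;
}
```

A device that has been unregistered but still sits on the list is refused, so the removal path no longer has to wait for the list reference to drop before binding becomes impossible.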
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds the infrastructure for drivers to do a threaded probe, and
waits at init time for all currently outstanding probes to complete.
A new kernel thread will be created when the probe() function for the
driver is called, if the multithread_probe bit is set in the driver
saying it can support this kind of operation.
I have tested this with USB and PCI, and it works, and shaves off a lot
of time in the boot process, but there are issues with finding root boot
disks, and some USB drivers assume that this can never happen, so it is
currently not enabled for any bus type. Individual drivers can enable
this right now if they wish, and bus authors can selectively turn it on
as well, once they determine that their subsystem will work properly
with it.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't be crufty. Mark it __must_check too.
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Those 1500 warnings can be a bit of a pain. Add a config option to shut them
up.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We're getting a lot of crashes in the sysfs/kobject/device/bus/class code and
they're very hard to diagnose.
I'm suspecting that in some cases this is because drivers aren't checking
return values and aren't handling errors correctly. So the code blithely
blunders on and crashes later in very obscure ways.
There's just no reason to ignore errors which can and do occur. So the patch
sprinkles __must_check all over these APIs.
Causes 1,513 new warnings. Heh.
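For readers unfamiliar with the annotation, here is a userspace sketch of what it does and of the caller pattern it is meant to force. MUST_CHECK models __must_check using the GCC/Clang warn_unused_result attribute; create_file_sketch() and driver_init_sketch() are made-up functions.

```c
#include <assert.h>
#include <errno.h>

/* Models __must_check; expands to nothing on compilers without the attribute. */
#if defined(__GNUC__)
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define MUST_CHECK
#endif

/* Hypothetical API that can fail. */
static MUST_CHECK int create_file_sketch(const char *name)
{
    if (!name || !*name)
        return -EINVAL;
    return 0;
}

/* A caller that propagates the error instead of blundering on. */
static int driver_init_sketch(const char *attr_name)
{
    int err = create_file_sketch(attr_name);

    if (err)
        return err;
    return 0;
}
```

Calling create_file_sketch() as a bare statement and discarding the result makes GCC and Clang emit a warning, which is where the new warnings come from.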
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make sysfs_remove_bin_file() void. If it detects an error,
printk the file name and call dump_stack().
sysfs_hash_and_remove() now returns an error code indicating
its success or failure so that sysfs_remove_bin_file() can
know success/failure.
Convert the only driver that checked the return value of
sysfs_remove_bin_file().
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Makes it easier for devices to create and remove binary attribute files
so they don't have to call directly into sysfs. This is needed to help
with the conversion from struct class_device to struct device.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When moving class_device usage over to device, we need to handle
class_interfaces properly with devices. This patch adds that support.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This change creates a devices/virtual/CLASS_NAME tree for struct devices
that belong to a class, yet do not have a "real" struct device for a
parent. It automatically creates the directories on the fly as needed.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds two new callbacks to the class structure:
    int (*dev_uevent)(struct device *dev, char **envp, int num_envp,
                      char *buffer, int buffer_size);
    void (*dev_release)(struct device *dev);
And one pointer:
    struct device_attribute * dev_attrs;
which all correspond to the same thing as the "normal" class devices
do, yet this is for when a struct device is bound to a class.
Someday soon, struct class_device will go away, and then the other
fields in this structure can be removed too. But this is necessary in
order to get the transition to work properly.
Tested out on a network core patch that converted it to use struct
device instead of struct class_device.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is needed for the network class devices in order to be able to
convert over to use struct device.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Teach platform_bus about the new suspend_late/resume_early PM calls,
issued with IRQs off. Do we really need sysdev and friends any more,
or can janitors start switching its users over to platform_device so
we can do a minor code-ectomy?
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Remove the new suspend_prepare() phase. It doesn't seem very usable,
has never been tested, doesn't address fault cleanup, and would need
a sibling resume_complete(); plus there are no real use cases. It
could be restored later if those issues get resolved.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This adds a new pm_message_t event type to use when preparing to restore a
swsusp snapshot. Devices that have been initialized by Linux after resume
(rather than left in power-up-reset state) may need to be reset; this new
event type gives drivers the chance to do that.
The drivers that will care about this are those which understand more hardware
states than just "on" and "reset", relying on hardware state during resume()
methods to be either the state left by the preceding suspend(), or a
power-lost reset. The best current example of this class of drivers are USB
host controller drivers, which currently do not work through swsusp when
they're statically linked.
When the swsusp freeze/thaw mechanism kicks in, a troublesome third state
could exist: one state set up by a different kernel instance, before a
snapshot image is resumed. This mechanism lets drivers prevent that state.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changes the PCI core to use the new suspend infrastructure changes.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Allow devices to participate in the suspend process more intimately,
in particular, allow the final phase (with interrupts disabled) to
also be open to normal devices, not just system devices.
Also, allow classes to participate in device suspend.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NetLabel]: update docs with website information
[NetLabel]: rework the Netlink attribute handling (part 2)
[NetLabel]: rework the Netlink attribute handling (part 1)
[Netlink]: add nla_validate_nested()
[NETLINK]: add nla_for_each_nested() to the interface list
[NetLabel]: change the SELinux permissions
[NetLabel]: make the CIPSOv4 cache spinlocks bottom half safe
[NetLabel]: correct improper handling of non-NetLabel peer contexts
[TCP]: make cubic the default
[TCP]: default congestion control menu
[ATM] he: Fix __init/__devinit conflict
[NETFILTER]: Add dscp,DSCP headers to header-y
[DCCP]: Introduce dccp_probe
[DCCP]: Use constants for CCIDs
[DCCP]: Introduce constants for CCID numbers
[DCCP]: Allow default/fallback service code.
Add logic to check ARP request / reply packets used for ARP
monitor link integrity checking.
The current method simply examines the slave device to see if it
has sent and received traffic; this can be fooled by extraneous traffic.
For example, if multiple hosts running bonding are behind a common
switch, the probe traffic from the multiple instances of bonding will
update the tx/rx times on each other's slave devices.
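The kind of check this implies can be sketched as follows: a received ARP packet only counts as evidence of link health if it was addressed to us and came from one of the configured probe targets. The struct, the function name and the exact matching rule are assumptions for illustration, not the bonding driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct arp_probe_cfg {
    uint32_t local_ip;         /* address our probes are sent from */
    const uint32_t *targets;   /* configured ARP monitor targets */
    size_t ntargets;
};

/* sip/tip: sender and target IP taken from the received ARP packet. */
static int arp_reply_is_ours(const struct arp_probe_cfg *cfg,
                             uint32_t sip, uint32_t tip)
{
    size_t i;

    assert(cfg);

    /* Traffic addressed to some other host proves nothing about our link. */
    if (tip != cfg->local_ip)
        return 0;

    /* Only replies from a configured target count. */
    for (i = 0; i < cfg->ntargets; i++)
        if (cfg->targets[i] == sip)
            return 1;

    return 0;
}
```

Probe traffic from another bonding host behind the same switch carries a different target address, so it no longer refreshes the slave's receive time in this model.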
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add priv_flag to specifically identify bonding-involved devices. Needed
because IFF_MASTER is an unreliable identifier (vlan interfaces above bonding
will inherit IFF_MASTER). Misidentification of devices would cause
notifier events for other devices to be erroneously processed by bonding,
causing various havoc.
Bug discovered by Martin Papik <martin.papik@ipsec.info>; this patch is
modified from his original.
Signed-off-by: Martin Papik <martin.papik@ipsec.info>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This is version 21 of the Wireless Extensions. Changelog :
o finishes migrating the ESSID API (remove the +1)
o netdev->get_wireless_stats is no more
o long/short retry
This is a redacted version of a patch originally submitted by Jean
Tourrilhes. I removed most of the additions, in order to minimize
future support requirements for nl80211 (or other WE successor).
CC: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
They all contain the same thing. Instead, have a single generic one in
include/asm-generic, and permit an arch to override as needed.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
- allow high-level nand_write_page() function to be overridden
- likewise low-level write_page_raw() and read_page_raw() functions
- Clean up the abuse of chip->ecc.{write,read}_page() with MTD_OOB_RAW
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
This patch adds xt_dscp.h and xt_DSCP.h to the kernel headers which are
exported via 'make headers_install'. These are necessary for userspace
to add rules using dscp match and DSCP target.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (28 commits)
ocfs2: Teach ocfs2_drop_lock() to use ->set_lvb() callback
ocfs2: Remove ->unblock lockres operation
ocfs2: move downconvert worker to lockres ops
ocfs2: Remove unused dlmglue functions
ocfs2: Have the metadata lock use generic dlmglue functions
ocfs2: Add ->set_lvb callback in dlmglue
ocfs2: Add ->check_downconvert callback in dlmglue
ocfs2: Check for refreshing locks in generic unblock function
ocfs2: don't unconditionally pass LVB flags
ocfs2: combine inode and generic blocking AST functions
ocfs2: Add ->get_osb() dlmglue locking operation
ocfs2: remove ->unlock_ast() callback from ocfs2_lock_res_ops
ocfs2: combine inode and generic AST functions
ocfs2: Clean up lock resource refresh flags
ocfs2: Remove i_generation from inode lock names
ocfs2: Encode i_generation in the meta data lvb
ocfs2: Free up some space in the lvb
ocfs2: Remove special casing for inode creation in ocfs2_dentry_attach_lock()
ocfs2: manually d_move() during ocfs2_rename()
[PATCH] Allow file systems to manually d_move() inside of ->rename()
...
Some file systems want to manually d_move() the dentries involved in a
rename. We can do this by making use of the FS_ODD_RENAME flag if we just
have nfs_rename() unconditionally do the d_move(). While there, we rename
the flag to be more descriptive.
OCFS2 uses this to protect that part of the rename operation with a cluster
lock.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
This has been discussed on dccp@vger and removes the necessity for applications
to supply service codes in each and every case.
If an application does not want to provide a service code, that's fine, it will
be given 0. Otherwise, service codes can be set via socket options as before.
This patch has been tested using various client/server configurations
(including listening on multiple service codes).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (50 commits)
[libata] Delete pata_it8172 driver
[PATCH] libata: improve handling of diagostic fail (and hardware that misreports it)
[PATCH] libata: fix non-uniform ports handling
Fix libata resource conflict for legacy mode
[libata] ata_piix: build fix
[PATCH] pata_amd: Check enable bits on Nvidia
[PATCH] Update SiS PATA
[libata] Add pata_jmicron driver to Kconfig, Makefile
[libata #pata-drivers] Trim trailing whitespace.
[libata] Trim trailing whitespace.
[libata] Add a bunch of PATA drivers.
Rename libata-bmdma.c to libata-sff.c.
libata: Grand renaming.
Clean up drivers/ata/Kconfig a bit.
[PATCH] CONFIG_PM=n slim: drivers/scsi/sata_sil*
[PATCH] sata_via: Add SATA support for vt8237a
[PATCH] libata: change path to libata in libata.tmpl
[PATCH] libata: s/CONFIG_SCSI_SATA/CONFIG_[S]ATA/g in pci/quirks.c
libata: Make sure drivers/ata is a separate Kconfig menu
[libata] ata_piix: add missing kfree()
...
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (217 commits)
net/ieee80211: fix more crypto-related build breakage
[PATCH] Spidernet: add ethtool -S (show statistics)
[NET] GT96100: Delete bitrotting ethernet driver
[PATCH] mv643xx_eth: restrict to 32-bit PPC_MULTIPLATFORM
[PATCH] Cirrus Logic ep93xx ethernet driver
r8169: the MMIO region of the 8167 stands behin BAR#1
e1000, ixgb: Remove pointless wrappers
[PATCH] Remove powerpc specific parts of 3c509 driver
[PATCH] s2io: Switch to pci_get_device
[PATCH] gt96100: move to pci_get_device API
[PATCH] ehea: bugfix for register access functions
[PATCH] e1000 disable device on PCI error
drivers/net/phy/fixed: #if 0 some incomplete code
drivers/net: const-ify ethtool_ops declarations
[PATCH] ethtool: allow const ethtool_ops
[PATCH] sky2: big endian
[PATCH] sky2: fiber support
[PATCH] sky2: tx pause bug fix
drivers/net: Trim trailing whitespace
[PATCH] ehea: IBM eHEA Ethernet Device Driver
...
Manually resolved conflicts in drivers/net/ixgb/ixgb_main.c and
drivers/net/sky2.c related to CHECKSUM_HW/CHECKSUM_PARTIAL changes by
commit 84fa7933a3 that just happened to be
next to unrelated changes in this update.
Some MMC hosts can only handle log2 (power-of-two) block sizes. Unfortunately,
the MMC password support needs to be able to send non-log2 block
sizes. Provide a capability so that the MMC password support can
decide whether it should use this support or not.
The unfortunate side effect of this host limitation is that any
MMC card protected by a password which is not a log2 block size
cannot be accessed on a host which only allows a log2 block size.
This change just adds the flag. The MMC password support code
needs updating to use it (if and when it is finally submitted.)
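As a rough illustration of the check such a capability enables, the sketch below tests whether a block size is a power of two and gates a transfer on a host capability bit. The flag name HOST_CAP_ANY_BLKSZ and both helpers are hypothetical and are not taken from the MMC code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bit: host accepts non-power-of-two block sizes. */
#define HOST_CAP_ANY_BLKSZ (1u << 0)

/* True when size is non-zero and has exactly one bit set. */
static bool is_pow2(size_t size)
{
	return size != 0 && (size & (size - 1)) == 0;
}

/* Decide whether a transfer using blksz-byte blocks can be issued. */
static bool host_can_send(unsigned int host_caps, size_t blksz)
{
	if (host_caps & HOST_CAP_ANY_BLKSZ)
		return blksz != 0;
	/* Without the capability only log2 block sizes are allowed. */
	return is_pow2(blksz);
}
```

A caller would test host_can_send() before building the request and fail early when the host cannot carry the block size.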
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (74 commits)
NFS: unmark NFS direct I/O as experimental
NFS: add comments clarifying the use of nfs_post_op_update()
NFSv4: rpc_mkpipe creating socket inodes w/out sk buffers
NFS: Use SEEK_END instead of hardcoded value
NFSv4: When mounting with a port=0 argument, substitute port=2049
NFSv4: Poll more aggressively when handling NFS4ERR_DELAY
NFSv4: Handle the condition NFS4ERR_FILE_OPEN
NFSv4: Retry lease recovery if it failed during a synchronous operation.
NFS: Don't invalidate the symlink we just stuffed into the cache
NFS: Make read() return an ESTALE if the file has been deleted
NFSv4: It's perfectly legal for clp to be NULL here....
NFS: nfs_lookup - don't hash dentry when optimising away the lookup
SUNRPC: Fix Oops in pmap_getport_done
SUNRPC: Add refcounting to the struct rpc_xprt
SUNRPC: Clean up soft task error handling
SUNRPC: Handle ENETUNREACH, EHOSTUNREACH and EHOSTDOWN socket errors
SUNRPC: rpc_delay() should not clobber the rpc_task->tk_status
Fix a referral error Oops
NFS: NFS_ROOT should use the new rpc_create API
NFS: Fix up compiler warnings on 64-bit platforms in client.c
...
Manually resolved conflict in net/sunrpc/xprtsock.c
In a subsequent patch, this will allow the portmapper to take a reference
to the rpc_xprt for which it is updating the port number, fixing an Oops.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now that we have a copy of the symlink path in the page cache, we can pass
a struct page down to the XDR routines instead of a string buffer.
Test plan:
Connectathon, all NFS versions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If the LOOKUP or GETATTR in nfs_instantiate fail, nfs_instantiate will do a
d_drop before returning. But some callers already do a d_drop in the case
of an error return. Make certain we do only one d_drop in all error paths.
This issue was introduced because over time, the symlink proc API diverged
slightly from the create/mkdir/mknod proc API. To prevent other coding
mistakes of this type, change the symlink proc API to be more like
create/mkdir/mknod and move the nfs_instantiate call into the symlink proc
routines so it is used in exactly the same way for create, mkdir, mknod,
and symlink.
Test plan:
Connectathon, all versions of NFS.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The two function call API for creating a new RPC client is now obsolete.
Remove it.
Also, remove an unnecessary check to see whether the caller is capable of
using privileged network services. The kernel RPC client always uses a
privileged ephemeral port by default; callers are responsible for checking
the authority of users to make use of any RPC service, or for specifying
that a nonprivileged port is acceptable.
Test plan:
Repeated runs of Connectathon locking suite. Check network trace to ensure
correctness of NLM requests and replies.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Prepare for more generic transport endpoint handling needed by transports
that might use different forms of addressing, such as IPv6.
Introduce a single function call to replace the two-call
xprt_create_proto/rpc_create_client API. Define a new rpc_create_args
structure that allows callers to pass in remote endpoint addresses of
varying length.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Remove some unused macros related to accessing an RPC peer address.
Test plan:
Compile kernel with CONFIG_NFS option enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
IPv6 addresses are big (128 bits). Now that all RPC client consumers treat
the addr field in rpc_xprt structs as opaque, and access it only via the
API calls, we can safely widen the field in the rpc_xprt struct to
accommodate larger addresses.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Provide an API for formatting the remote peer address for printing without
exposing its internal structure. The address could be dynamic, so we
support a function call to get the address rather than reading it straight
out of a structure.
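A user-space sketch of the idea, assuming the peer address is kept in a struct sockaddr_storage: the caller asks a helper for a printable string and never looks inside the address. The helper name peer_addr_to_string() is hypothetical; inet_ntop() is the standard socket library call, not the kernel API this patch adds.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: render the peer address held in an opaque
 * sockaddr_storage as text. Returns buf on success, NULL otherwise.
 */
static const char *peer_addr_to_string(const struct sockaddr_storage *ss,
				       char *buf, size_t len)
{
	switch (ss->ss_family) {
	case AF_INET:
		return inet_ntop(AF_INET,
				 &((const struct sockaddr_in *)ss)->sin_addr,
				 buf, (socklen_t)len);
	case AF_INET6:
		return inet_ntop(AF_INET6,
				 &((const struct sockaddr_in6 *)ss)->sin6_addr,
				 buf, (socklen_t)len);
	default:
		/* Unknown address family: nothing sensible to print. */
		return NULL;
	}
}
```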
Test-plan:
Destructive testing (unplugging the network temporarily). Probably need
to rig a server where certain services aren't running, or that returns an
error for some typical operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a new method to the transport switch API to provide a way to convert
the opaque contents of xprt->addr to a human-readable string.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
include/linux/sunrpc/clnt.h already includes include/linux/sunrpc/xprt.h.
We can remove xprt.h from source files that already include clnt.h.
Likewise include/linux/sunrpc/timer.h.
Test plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Provide an API for retrieving the remote peer address without allowing
direct access to the rpc_xprt struct.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduce a clean transport switch API for plugging in different types of
rpcbind mechanisms. For instance, rpcbind can cleanly replace the
existing portmapper client, or a transport can choose to implement RPC
binding any way it likes.
Test plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The previous patches removed the last user of RPC child tasks, so we can
remove support for child tasks from net/sunrpc/sched.c now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move connection and bind state that was maintained in the rpc_clnt
structure to the rpc_xprt structure. This will allow the creation of
a clean API for plugging in different types of bind mechanisms.
This brings improvements such as the elimination of a single spin lock to
control serialization for all in-kernel RPC binding. A set of per-xprt
bitops is used to serialize tasks during RPC binding, just like it now
works for making RPC transport connections.
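The per-transport serialization can be pictured with the sketch below, which uses C11 atomics in place of the kernel's bitops. The structure, the bit name and both functions are hypothetical stand-ins; the point is only that each transport carries its own "binding in progress" bit, so no global lock is needed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number: set while a bind is in progress. */
#define XPRT_SKETCH_BINDING 0

struct xprt_sketch {
	atomic_ulong state;	/* per-transport state bits */
};

/* Atomically set the bit; true means the caller won and must do the bind. */
static bool xprt_sketch_start_binding(struct xprt_sketch *x)
{
	unsigned long bit = 1UL << XPRT_SKETCH_BINDING;
	unsigned long old = atomic_fetch_or(&x->state, bit);

	return (old & bit) == 0;
}

/* Clear the bit once the bind has completed (or failed). */
static void xprt_sketch_finish_binding(struct xprt_sketch *x)
{
	atomic_fetch_and(&x->state, ~(1UL << XPRT_SKETCH_BINDING));
}
```

Tasks that lose the race would wait on the transport rather than spin; that part is omitted here.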
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Hide the contents and format of xprt->addr by eliminating direct uses
of the xprt->addr.sin_port field. This change is required to support
alternate RPC host address formats (eg IPv6).
Test-plan:
Destructive testing (unplugging the network temporarily). Repeated runs of
Connectathon locking suite with UDP and TCP.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The attached patch makes NFS share superblocks between mounts from the same
server and FSID over the same protocol.
It does this by creating each superblock with a false root and returning the
real root dentry in the vfsmount presented by get_sb(). The root dentry set
starts off as an anonymous dentry if we don't already have the dentry for its
inode, otherwise it simply returns the dentry we already have.
We may thus end up with several trees of dentries in the superblock, and if at
some later point one of the anonymous tree roots is discovered by normal filesystem
activity to be located in another tree within the superblock, the anonymous
root is named and materialises attached to the second tree at the appropriate
point.
Why do it this way? Why not pass an extra argument to the mount() syscall to
indicate the subpath and then pathwalk from the server root to the desired
directory? You can't guarantee this will work for two reasons:
(1) The root and intervening nodes may not be accessible to the client.
With NFS2 and NFS3, for instance, mountd is called on the server to get
the filehandle for the tip of a path. mountd won't give us handles for
anything we don't have permission to access, and so we can't set up NFS
inodes for such nodes, and so can't easily set up dentries (we'd have to
have ghost inodes or something).
With this patch we don't actually create dentries until we get handles
from the server that we can use to set up their inodes, and we don't
actually bind them into the tree until we know for sure where they go.
(2) Inaccessible symbolic links.
If we're asked to mount two exports from the server, eg:
mount warthog:/warthog/aaa/xxx /mmm
mount warthog:/warthog/bbb/yyy /nnn
We may not be able to access anything nearer the root than xxx and yyy,
but we may find out later that /mmm/www/yyy, say, is actually the same
directory as the one mounted on /nnn. What we might then find out, for
example, is that /warthog/bbb was actually a symbolic link to
/warthog/aaa/xxx/www, but we can't actually determine that by talking to
the server until /warthog is made available by NFS.
This would lead to having constructed an erroneous dentry tree which we
can't easily fix. We can end up with a dentry marked as a directory when
it should actually be a symlink, or we could end up with an apparently
hardlinked directory.
With this patch we need not make assumptions about the type of a dentry
for which we can't retrieve information, nor need we assume we know its
place in the grand scheme of things until we actually see that place.
This patch reduces the possibility of aliasing in the inode and page caches for
inodes that may be accessed by more than one NFS export. It also reduces the
number of superblocks required for NFS where there are many NFS exports being
used from a server (home directory server + autofs for example).
This in turn makes it simpler to do local caching of network filesystems, as it
can then be guaranteed that there won't be links from multiple inodes in
separate superblocks to the same cache file.
Obviously, cache aliasing between different levels of NFS protocol could still
be a problem, but at least that gives us another key to use when indexing the
cache.
This patch makes the following changes:
(1) The server record construction/destruction has been abstracted out into
its own set of functions to make things easier to get right. These have
been moved into fs/nfs/client.c.
All the code in fs/nfs/client.c has to do with the management of
connections to servers, and doesn't touch superblocks in any way; the
remaining code in fs/nfs/super.c has to do with VFS superblock management.
(2) The sequence of events undertaken by NFS mount is now reordered:
(a) A volume representation (struct nfs_server) is allocated.
(b) A server representation (struct nfs_client) is acquired. This may be
allocated or shared, and is keyed on server address, port and NFS
version.
(c) If allocated, the client representation is initialised. The state
member variable of nfs_client is used to prevent a race during
initialisation from two mounts.
(d) For NFS4 a simple pathwalk is performed, walking from FH to FH to find
the root filehandle for the mount (fs/nfs/getroot.c). For NFS2/3 we
are given the root FH in advance.
(e) The volume FSID is probed for on the root FH.
(f) The volume representation is initialised from the FSINFO record
retrieved on the root FH.
(g) sget() is called to acquire a superblock. This may be allocated or
shared, keyed on client pointer and FSID.
(h) If allocated, the superblock is initialised.
(i) If the superblock is shared, then the new nfs_server record is
discarded.
(j) The root dentry for this mount is looked up from the root FH.
(k) The root dentry for this mount is assigned to the vfsmount.
(3) nfs_readdir_lookup() creates dentries for each of the entries readdir()
returns; this function now attaches disconnected trees from alternate
roots that happen to be discovered attached to a directory being read (in
the same way nfs_lookup() is made to do for lookup ops).
The new d_materialise_unique() function is now used to do this, thus
permitting the whole thing to be done under one set of locks, and thus
avoiding any race between mount and lookup operations on the same
directory.
(4) The client management code uses a new debug facility: NFSDBG_CLIENT which
is set by echoing 1024 to /proc/sys/sunrpc/nfs_debug.
(5) Clone mounts are now called xdev mounts.
(6) Use the dentry passed to the statfs() op as the handle for retrieving fs
statistics rather than the root dentry of the superblock (which is now a
dummy).
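Step (2)(g) above, acquiring a superblock that is either freshly allocated or shared, can be sketched as a find-or-create lookup keyed on the client pointer and the FSID. Everything below (the types, the fixed-size table, sb_get()) is a hypothetical user-space model, not the sget() interface itself.

```c
#include <stddef.h>
#include <stdint.h>

struct fsid_sketch {
	uint64_t hi, lo;
};

struct sb_sketch {
	const void *client;		/* identity of the shared server record */
	struct fsid_sketch fsid;	/* filesystem ID on that server */
	int in_use;
};

#define SB_SKETCH_MAX 8
static struct sb_sketch sb_table[SB_SKETCH_MAX];

/*
 * Return the existing superblock for (client, fsid), or claim a free
 * slot for a new one. Returns NULL when the table is full.
 */
static struct sb_sketch *sb_get(const void *client, struct fsid_sketch fsid)
{
	struct sb_sketch *free_slot = NULL;

	for (size_t i = 0; i < SB_SKETCH_MAX; i++) {
		struct sb_sketch *sb = &sb_table[i];

		if (!sb->in_use) {
			if (!free_slot)
				free_slot = sb;
			continue;
		}
		if (sb->client == client &&
		    sb->fsid.hi == fsid.hi && sb->fsid.lo == fsid.lo)
			return sb;	/* shared: same server, same FSID */
	}
	if (free_slot) {
		free_slot->client = client;
		free_slot->fsid = fsid;
		free_slot->in_use = 1;
	}
	return free_slot;
}
```

Two mounts of different exports that share an FSID on the same server record would get the same entry back, which is the sharing behaviour described above.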
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Eliminate nfs_server::client_sys in favour of nfs_client::cl_rpcclient as we
only really need one per server that we're talking to since it doesn't have any
security on it.
The retransmission management variables are also moved to the common struct as
they're required to set up the cl_rpcclient connection.
The NFS2/3 client and client_acl connections are thenceforth derived by cloning
the cl_rpcclient connection and post-applying the authorisation flavour.
The code for setting up the initial common connection has been moved to
client.c as nfs_create_rpc_client(). All the NFS program definition tables are
also moved there as that's where they're now required rather than super.c.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the rpc_ops from the nfs_server struct to the nfs_client struct as they're
common to all server records of a particular NFS protocol version.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Maintain a common server record for NFS2/3 as well as for NFS4 so that common
stuff can be moved there from struct nfs_server.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add some extra const qualifiers into NFS.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Generalise the nfs_client structure by:
(1) Moving nfs_client to a more general place (nfs_fs_sb.h).
(2) Renaming its maintenance routines to be non-NFS4 specific.
(3) Moving those maintenance routines to a new non-NFS4 specific file (client.c)
and moving the declarations to internal.h.
(4) Making nfs_find/get_client() take a full sockaddr_in to include the port
number (will be required for NFS2/3).
(5) Making nfs_find/get_client() take the NFS protocol version (again will be
required to differentiate NFS2, 3 & 4 client records).
Also:
(6) Make nfs_client construction proceed akin to inodes, marking them as under
construction and providing a function to indicate completion.
(7) Make nfs_get_client() wait interruptibly if it finds a client that it can
share, but that client is currently being constructed.
(8) Make nfs4_create_client() use (6) and (7) instead of locking cl_sem.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a set_capabilities NFS RPC op so that the server capabilities can be set.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a lookup filehandle NFS RPC op so that a file handle can be looked up
without requiring dentries and inodes and other VFS stuff when doing an NFS4
pathwalk during mounting.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Return an error when starting the idmapping pipe so that we can detect it
failing.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename nfs_server::nfs4_state to nfs_client as it will be used to represent the
client state for NFS2 and NFS3 also.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Rename struct nfs4_client to struct nfs_client so that it can become the basis
for a general client record for NFS2 and NFS3 in addition to NFS4.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The attached patch adds a new directory cache management function that prepares
a disconnected anonymous dentry to be connected into the dentry tree. The
name and parentage are transferred to the anonymous dentry from another dentry.
The following changes were made in [try #2]:
(*) d_materialise_dentry() now switches the parentage of the two nodes around
correctly when one or other of them is self-referential.
The following changes were made in [try #7]:
(*) d_instantiate_unique() has had the interior part split out as function
__d_instantiate_unique(). Callers of this latter function must be holding
the appropriate locks.
(*) _d_rehash() has been added as a wrapper around __d_rehash() to call it
with the most obvious hash list (the one from the name). d_rehash() now
calls _d_rehash().
(*) d_materialise_dentry() is now __d_materialise_dentry() and is static.
(*) d_materialise_unique() added to perform the combination of d_find_alias(),
d_materialise_dentry() and d_add_unique() that the NFS client was doing
twice, all within a single dcache_lock critical section. This reduces the
number of times two different spinlocks were being accessed.
The following further changes were made:
(*) Add the dentries onto their parents' d_subdirs lists.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The current access cache only allows one entry at a time to be cached for each
inode. Add a per-inode red-black tree in order to allow more than one to
be cached at a time.
Should significantly cut down the time spent in path traversal for shared
directories such as ${PATH}, /usr/share, etc.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
IFA_F_HOMEADDRESS is introduced for Mobile IPv6 Home Addresses on
Mobile Node.
The IFA_F_HOMEADDRESS flag should be set for Mobile IPv6 Home
Addresses for 2 purposes. 1) We need to check this on receipt of
Type 2 Routing Header (RFC3775 Section 6.4), 2) We prefer Home
Address(es) in source address selection (RFC3484 Section 5 Rule 4).
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IFA_F_NODAD flag, similar to IN6_IFF_NODAD in BSDs, is introduced
to skip DAD.
This flag should be set on Mobile IPv6 Home Address(es) on the Mobile
Node because DAD would fail if we performed it: our Home Agent
protects our Home Address(es).
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do not always need proxy NDP functionality even when
forwarding is enabled.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the master PPTP connection times out while still having unfulfilled
expectations (and a GRE keymap entry) associated with it, the keymap entry
is not destroyed.
Add a destroy callback to struct ip_conntrack_helper and use it to destroy
PPTP siblings when the master is destroyed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicated expectation handling in the NAT helper and simplify
the remains in the conntrack helper.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a few header definitions to match RFC2637. Most importantly the
PptpOutCallRequest header included an invalid padding field and a
size check was disabled because of this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conntrack structure contains the call ID in host byte order for no
reason; get rid of the back-and-forth conversions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split the xt_compat_match/xt_compat_target into smaller type-safe functions
performing just one operation. Handle all alignment and size-related
conversions centrally in these functions instead of requiring each module to
implement a full-blown conversion function. Replace ->compat callback by
->compat_from_user and ->compat_to_user callbacks, responsible for
converting just a single private structure.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that IPv6 supports policy routing we need to reroute in NF_IP6_LOCAL_OUT
when the mark value changes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kill listhelp.h and use the list.h functions instead.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Additionally export the following information when providing
the list of registered generic netlink families:
- protocol version
- header size
- maximum number of attributes
- list of available operations including
- id
- flags
- availability of policy and doit/dumpit functions
libnl HEAD provides a utility to read this new information:
0x0010 nlctrl version 1
hdrsize 0 maxattr 6
op GETFAMILY (0x03) [POLICY,DOIT,DUMPIT]
0x0011 NLBL_MGMT version 1
hdrsize 0 maxattr 0
op unknown (0x02) [DOIT]
op unknown (0x03) [DOIT]
....
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function sk_filter() is called from the tcp_v{4,6}_rcv() functions with arg
needlock = 0, while the socket is not locked at that moment. In order to avoid
this and similar issues in the future, use RCU for sk->sk_filter field read
protection.
Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Do some simple optimization on the nf_bridge_pad() function
and don't use magic constants. Eliminate a double call and
the #ifdef'd code for CONFIG_BRIDGE_NETFILTER.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup and rearrangement for better style and clarity:
Split the function nf_bridge_maybe_copy_header into two pieces.
Move copy portion out of line.
Use Ethernet header size macros.
Use header file to handle CONFIG_BRIDGE_NETFILTER differences.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There will be a relatively small increase in sparse endian warnings, but
this (and sin_port) patch is a first step to make networking code
endian clean.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds transmit buffering to DCCP.
I have tested with CCID2/3 and with loss and rate limiting.
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support masking the nfmark value before the search. The mask value is
global for all filters contained in one instance. It can only be set
when a new instance is created; all filters must specify the same mask.
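A minimal sketch of the lookup this describes, assuming a linear filter list rather than the classifier's real hash table: the packet mark is masked once with the per-instance mask and the result is used as the search key. All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct fw_filter_sketch {
	uint32_t id;		/* masked mark value this filter handles */
	int classid;
};

struct fw_head_sketch {
	uint32_t mask;		/* one mask shared by every filter */
	const struct fw_filter_sketch *filters;
	size_t nfilters;
};

/* Mask the mark, then search. Returns the classid, or -1 if none matches. */
static int fw_classify_sketch(const struct fw_head_sketch *head,
			      uint32_t nfmark)
{
	uint32_t id = nfmark & head->mask;

	for (size_t i = 0; i < head->nfilters; i++)
		if (head->filters[i].id == id)
			return head->filters[i].classid;
	return -1;
}
```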
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a FRA_FWMASK attribute for fwmark masks. For compatibility a mask of
0xFFFFFFFF is used when a mark value != 0 is sent without a mask.
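The compatibility rule can be sketched as follows. The structure and function names are hypothetical; only the default of 0xFFFFFFFF for a non-zero mark without a mask comes from the description above, and the XOR-and-mask comparison is one common way to express a masked match.

```c
#include <stdbool.h>
#include <stdint.h>

struct rule_sketch {
	uint32_t mark;
	uint32_t mask;
};

/* Build a rule; have_mask is false when user space sent no mask. */
static struct rule_sketch rule_from_user(uint32_t mark, bool have_mask,
					 uint32_t mask)
{
	struct rule_sketch r = { .mark = mark, .mask = 0 };

	if (have_mask)
		r.mask = mask;
	else if (mark != 0)
		r.mask = 0xFFFFFFFFu;	/* old behaviour: exact match */
	return r;
}

/* A packet matches when it agrees with the rule on every masked bit. */
static bool rule_matches(const struct rule_sketch *r, uint32_t pkt_mark)
{
	return ((r->mark ^ pkt_mark) & r->mask) == 0;
}
```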
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The grow algorithm is simple, we grow if:
1) we see a hash chain collision at insert, and
2) we haven't hit the hash size limit (currently 1*1024*1024 slots), and
3) the number of xfrm_state objects is > the current hash mask
All of this needs some tweaking.
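The three conditions can be written down as a single predicate. This is a sketch with hypothetical names; it assumes the table size is hmask + 1 and takes the 1*1024*1024 slot limit from the text.

```c
#include <stdbool.h>

/* Size limit quoted above: 1*1024*1024 slots. */
#define HASH_SKETCH_MAX_SLOTS (1u * 1024 * 1024)

/*
 * Grow only when all three conditions hold: a chain collision was
 * seen at insert, the table is below the size limit, and there are
 * more state objects than the current hash mask.
 */
static bool hash_should_grow(bool chain_collision, unsigned int hmask,
			     unsigned int num_states)
{
	return chain_collision &&
	       (hmask + 1) < HASH_SKETCH_MAX_SLOTS &&
	       num_states > hmask;
}
```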
Remove __initdata from "hashdist" so we can use it safely at run time.
Signed-off-by: David S. Miller <davem@davemloft.net>
Sub policy can be used through the netlink socket.
PF_KEY uses main only; supporting sub there is a TODO.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sub policy is introduced. Main and sub policies are applied to the same flow.
(The policy that the current kernel uses is named main.)
A second transformation policy database is required to keep the lifetimes
of IPsec and Mobile IPv6 policies separate.
A policy which lives for a shorter time in the kernel should be a sub, i.e. normally
main is for IPsec and sub is for Mobile IPv6.
(Two IPsec policies on different databases can be used in this way, too.)
Limitations and TODOs:
- Sub policy is not supported for per-socket policies (they are always inserted as main).
- The current kernel caches outbound lookups with flowi to skip searching the database.
This patch disables that cache only when "two policies are used and
the first matched one is the bypass case", because neither flowi nor bundle
information knows about the transformation template size.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
XFRM_MSG_REPORT is a message that notifies user-space of a state protocol and
selector from the kernel.
Mobile IPv6 will use it when an inbound packet is rejected during route
optimization, to let user-space know that a binding error is required.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Mobility header definition for Mobile IPv6.
Based on MIPL2 kernel patch.
This patch was also written by: Antti Tuominen <anttit@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the inbound handler for the home address option by registering it in the TLV
table for the destination options header.
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add home address option definition for Mobile IPv6.
Based on MIPL2 kernel patch.
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch the last-used time of a transformation state is updated
on each send. Xtime is used for it, as for other state lifetime
expiration.
Mobile IPv6 enabled nodes will want to know the traffic status of each
binding (e.g. to decide whether a correspondent node should request a binding
refresh, or whether a mobile node should keep the home/care-of nonce alive).
The last-used timestamp is an important hint for this.
Based on MIPL2 kernel patch.
This patch was also written by: Henrik Petander <petander@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The care-of address is carried by the state as a transformation option, like
an IPsec encryption/authentication algorithm.
Based on MIPL2 kernel patch.
Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
The XFRM_STATE_WILDRECV flag is introduced. It is set on the last-resort state,
which receives packets that are not route optimized but use such
extension headers, i.e. Mobile IPv6 signaling (binding update and
acknowledgement). A node with Mobile IPv6 enabled adds the state.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for searching transformation states by address,
using a source address list, for Mobile IPv6 usage.
To use it from user-space, a message type for the
source address is also added as an xfrm state option.
Based on MIPL2 kernel patch.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Transformation mode is currently either IPsec transport or tunnel.
Two more modes are required for Mobile IPv6: route optimization and
inbound trigger.
Based on MIPL2 kernel patch.
This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on MIPL2 kernel patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shared match functions can use this to make runtime decisions based on the
used match.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unnecessary packed attributes in nfnetlink structures. Unfortunately
in a few cases they have to stay to avoid changing structure sizes.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The size is verified by x_tables and isn't needed by the modules anymore.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the mark event. ctnetlink can use this to know if
the mark needs to be dumped.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces IPv4 DSCP target by address family independent version.
This also
- utilizes dsfield.h to get/mangle DS field in IPv4/IPv6 header
- fixes Kconfig help text.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces IPv4 dscp match by address family independent version.
This also
- utilizes dsfield.h to get the DS field in IPv4/IPv6 header, and
- checks for the DSCP value from user space.
- fixes Kconfig help text.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
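As an illustration of the DS field handling that the two DSCP entries above refer to, here is a minimal sketch in plain C. It assumes only the standard layout of the DS field (DSCP in the upper six bits, ECN in the lower two); the helper names dscp_get() and dscp_set() are hypothetical and are not the functions from dsfield.h.

```c
#include <stdint.h>

/* The DS field is the former IPv4 TOS byte / IPv6 traffic class byte.
 * DSCP occupies the upper six bits; the lower two bits carry ECN. */
#define DSCP_SHIFT 2
#define DSCP_MAX   0x3f

/* Hypothetical helper: extract the DSCP value from a DS field byte. */
static uint8_t dscp_get(uint8_t dsfield)
{
    return dsfield >> DSCP_SHIFT;
}

/* Hypothetical helper: store a new DSCP value while preserving the
 * two ECN bits. Returns 0 on success, -1 if dscp is out of range
 * (the kind of check a user-supplied value needs). */
static int dscp_set(uint8_t *dsfield, unsigned int dscp)
{
    if (dscp > DSCP_MAX)
        return -1;
    *dsfield = (uint8_t)((*dsfield & 0x03) | (dscp << DSCP_SHIFT));
    return 0;
}
```

Because the same byte layout applies to IPv4 and IPv6, one pair of helpers like this can serve both address families.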
This patch adds more statistics info under /proc/net/sctp/snmp
that should be useful for debugging. The additional events that
are counted now include timer expirations, retransmits, packet
and data chunk discards.
The data chunk discards include all the cases where a data chunk
is discarded, including high tsn, bad stream, dup tsn and the most
useful one (out of receive buffer/rwnd).
Also moved the SCTP MIB data structures from the generic include
directories to include/sctp/sctp.h.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds rtnl_notify() to send rtnetlink notification messages and
rtnl_set_sk_err() to report notification errors as socket
errors in order to indicate the need of a resync due to loss
of events.
nlmsg_report() is added to properly document the meaning of
NLM_F_ECHO.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce RTA_TABLE route attribute and FRA_TABLE routing rule attribute
to hold 32 bit routing table IDs. Userspace compatibility is provided by
continuing to accept and send the rtm_table field, but because of its
limited size it can only carry the low 8 bits of the table ID. This
implies that if larger IDs are used, _all_ userspace programs using them
need to use RTA_TABLE.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
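The compatibility rule described in the RTA_TABLE entry above can be sketched as follows. This is a simplified illustration, not the kernel's code: struct route_hdr and both helper functions are hypothetical stand-ins, and the optional 32-bit attribute is modelled as a pointer that is NULL when the attribute is absent.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the fixed route message header:
 * the legacy table field is only 8 bits wide. */
struct route_hdr {
    uint8_t legacy_table;
};

/* Resolve the table ID: prefer the 32-bit attribute when present
 * (non-NULL), otherwise fall back to the legacy 8-bit field. */
static uint32_t route_table_id(const struct route_hdr *hdr,
                               const uint32_t *table_attr)
{
    return table_attr ? *table_attr : hdr->legacy_table;
}

/* What fits into the legacy field: only the low 8 bits survive. */
static uint8_t legacy_table_field(uint32_t table_id)
{
    return (uint8_t)(table_id & 0xff);
}
```

A program that reads only the legacy field would see table 1000 as 232, which is why every program working with large IDs has to look at the 32-bit attribute.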
sock_register() doesn't change the family, so the protocols can
define it read-only. No caller ever checks the return value from
sock_unregister().
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three values in net_proto_family are defined but never used.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts the DECnet rules code to use the generic
rules system created by Thomas Graf <tgraf@suug.ch>.
Signed-off-by: Steven Whitehouse <steve@chygwyn.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements wrapper functions that provide a convenient way
to access the sockets API for in-kernel users like sunrpc, cifs &
ocfs2 etc and any future users.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnetlink_rcv_msg() is no longer required to parse attributes
for the neighbour tables layer, remove dependency on obsolete and
buggy rta_buf.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves netlink neighbour bits to linux/neighbour.h. Also
moves bits to be exported to userspace from net/neighbour.h
to linux/neighbour.h and removes __KERNEL__ guards, userspace
is not supposed to be using it.
rtnetlink_rcv_msg() is no longer required to parse attributes
for the neighbour layer, remove dependency on obsolete and
buggy rta_buf.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update hardware checksums incrementally to avoid breaking GSO.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
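For readers unfamiliar with incremental checksum updates, the general technique (RFC 1624) can be sketched in standalone C as below. This is an illustration of the arithmetic only, not the code from the patch above; all function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Fold a 32-bit accumulator into 16 bits using one's-complement addition. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over an array of 16-bit words. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold32(sum);
}

/* Incremental update per RFC 1624: HC' = ~(~HC + ~m + m'),
 * where m is the old 16-bit word and m' the new one. */
static uint16_t csum_update(uint16_t check, uint16_t old_word,
                            uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;
    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~csum_fold32(sum);
}
```

The incremental form touches only the changed word, so it does not need to walk the rest of the packet data.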
Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).
Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for policy routing rules including a new
local table for routes with a local destination.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add NetLabel support to the SELinux LSM and modify the
socket_post_create() LSM hook to return an error code. The most
significant part of this patch is the addition of NetLabel hooks into
the following SELinux LSM hooks:
* selinux_file_permission()
* selinux_socket_sendmsg()
* selinux_socket_post_create()
* selinux_socket_sock_rcv_skb()
* selinux_socket_getpeersec_stream()
* selinux_socket_getpeersec_dgram()
* selinux_sock_graft()
* selinux_inet_conn_request()
The basic reasoning behind this patch is that outgoing packets are
"NetLabel'd" by labeling their socket and the NetLabel security
attributes are checked via the additional hook in
selinux_socket_sock_rcv_skb(). NetLabel itself is only a labeling
mechanism, similar to filesystem extended attributes; it is up to the
SELinux enforcement mechanism to perform the actual access checks.
In addition to the changes outlined above this patch also includes
some changes to the extended bitmap (ebitmap) and multi-level security
(mls) code to import and export SELinux TE/MLS attributes into and out
of NetLabel.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the Commercial IP Security Option (CIPSO) to the IPv4
network stack. CIPSO has become a de-facto standard for
trusted/labeled networking amongst existing Trusted Operating Systems
such as Trusted Solaris, HP-UX CMW, etc. This implementation is
designed to be used with the NetLabel subsystem to provide explicit
packet labeling to LSM developers.
The CIPSO/IPv4 packet labeling works by the LSM calling a NetLabel API
function which attaches a CIPSO label (IPv4 option) to a given socket;
this in turn attaches the CIPSO label to every packet leaving the
socket without any extra processing on the outbound side. On the
inbound side the individual packet's sk_buff is examined through a
call to a NetLabel API function to determine if a CIPSO/IPv4 label is
present and if so the security attributes of the CIPSO label are
returned to the caller of the NetLabel API function.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes to the core network stack to support the NetLabel subsystem. This
includes changes to the IPv4 option handling to support CIPSO labels.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This automatically labels the TCP, Unix stream, and dccp child sockets
as well as openreqs to be at the same MLS level as the peer. This will
result in the selection of appropriately labeled IPSec Security
Associations.
This also uses the sock's sid (as opposed to the isec sid) in SELinux
enforcement of secmark in rcv_skb and postroute_last hooks.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This defaults the label of socket-specific IPSec policies to be the
same as the socket they are set on.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This labels the flows that could utilize IPSec xfrms at the points the
flows are defined so that IPSec policy and SAs at the right label can
be used.
The following protos are currently not handled, but they should
continue to be able to use single-labeled IPSec like they currently
do.
ipmr
ip_gre
ipip
igmp
sit
sctp
ip6_tunnel (IPv6 over IPv6 tunnel device)
decnet
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This implements a seamless mechanism for xfrm policy selection and
state matching based on the flow sid. This also includes the necessary
SELinux enforcement pieces.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds security for IP sockets at the sock level. Security at the
sock level is needed to enforce the SELinux security policy for
security associations even when a sock is orphaned (such as in the TCP
LAST_ACK state).
This will also be used to enforce SELinux controls over data arriving
at or leaving a child socket while it's still waiting to be accepted.
Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.infradead.org/~dwmw2/hdroneline:
[HEADERS] One line per header in Kbuild files to reduce conflicts
Manual (trivial) conflict resolution in include/asm-s390/Kbuild
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (64 commits)
[BLOCK] dm-crypt: trivial comment improvements
[CRYPTO] api: Deprecate crypto_digest_* and crypto_alg_available
[CRYPTO] padlock: Convert padlock-sha to use crypto_hash
[CRYPTO] users: Use crypto_comp and crypto_has_*
[CRYPTO] api: Add crypto_comp and crypto_has_*
[CRYPTO] users: Use crypto_hash interface instead of crypto_digest
[SCSI] iscsi: Use crypto_hash interface instead of crypto_digest
[CRYPTO] digest: Remove old HMAC implementation
[CRYPTO] doc: Update documentation for hash and me
[SCTP]: Use HMAC template and hash interface
[IPSEC]: Use HMAC template and hash interface
[CRYPTO] tcrypt: Use HMAC template and hash interface
[CRYPTO] hmac: Add crypto template implementation
[CRYPTO] digest: Added user API for new hash type
[CRYPTO] api: Mark parts of cipher interface as deprecated
[PATCH] scatterlist: Add const to sg_set_buf/sg_init_one pointer argument
[CRYPTO] drivers: Remove obsolete block cipher operations
[CRYPTO] users: Use block ciphers where applicable
[SUNRPC] GSS: Use block ciphers where applicable
[IPSEC] ESP: Use block ciphers where applicable
...
ioremap must be balanced by an iounmap and failing to do so can result
in a memory leak.
Tested (compilation only) with:
- allmodconfig
- Modifying drivers/mtd/maps/Kconfig and drivers/mtd/nand/Kconfig to
make sure that the changed file is compiling without warning
Signed-off-by: Amol Lad <amol@verismonetworks.com>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
The meaning of fs_no used to be driven by the fs_enet driver, hence it was
an enumeration across all the possible fs_enet "users" in the SoC. Now, with
QE in the pipeline, and to make DTS descriptions clearer, fs_no holds the
relevant SoC part number, with an additional field to describe the SoC type.
Another reason is that fs_enet is no longer the only driver that will use
this. There might be UART, HDLC, and even USB, so to prevent confusion and
be ready for the upcoming OF_device transfer, the fs_enet and cpm_uart
drivers were updated accordingly, as well as the relevant DTS.
Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Incorporate the new way of cpm2 immr access, introduced in the previous
patch, into CPM2 peripheral devices (fs_enet and cpm_uart). Both ppc and
powerpc were verified to work (real actions are taken in powerpc only; ppc
just has a wrapper to keep init stuff consistent).
Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
This patch marks the crypto_digest_* functions and crypto_alg_available
as deprecated. They've been replaced by crypto_hash_* and crypto_has_*
respectively.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch converts padlock-sha to use crypto_hash for its fallback.
It also changes the fallback selection to use selection by type instead
of name. This is done through the new CRYPTO_ALG_NEED_FALLBACK bit,
which is set if and only if an algorithm needs a fallback of the same
type.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds the crypto_comp type to complete the compile-time checking
conversion. The functions crypto_has_alg and crypto_has_cipher, etc. are
also added to replace crypto_alg_available.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch removes the old HMAC implementation now that nobody uses it
anymore.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing digest user interface is inadequate for supporting asynchronous
operations. For one it doesn't return a value to indicate success or
failure, nor does it take a per-operation descriptor which is essential
for the issuing of requests while other requests are still outstanding.
This patch is the first in a series of steps to remodel the interface
for asynchronous operations.
For the ease of transition the new interface will be known as "hash"
while the old one will remain as "digest".
This patch also changes sg_next to allow chaining.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Mark the parts of the cipher interface that have been replaced by
block ciphers as deprecated. Thanks to Andrew Morton for suggesting
doing this before removing them completely.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a const modifier to the buf argument of sg_set_buf and
sg_init_one. This lets people call it with pointers that are const.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds the new type of block ciphers. Unlike current cipher
algorithms which operate on a single block at a time, block ciphers
operate on an arbitrarily long linear area of data. As it is block-based,
it will skip any data remaining at the end which cannot form a block.
The block cipher has one major difference when compared to the existing
block cipher implementation. The sg walking is now performed by the
algorithm rather than the cipher mid-layer. This is needed for drivers
that directly support sg lists. It also improves performance for all
algorithms as it reduces the total number of indirect calls by one.
In future the existing cipher algorithm will be converted to only have
a single-block interface. This will be done after all existing users
have switched over to the new block cipher type.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
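The "whole blocks only" behaviour described in the block cipher entry above can be sketched as follows. The cipher here is a toy XOR used purely for illustration (not real cryptography), and the names and the block size of 8 are assumptions rather than anything taken from the crypto API.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK_SIZE 8  /* hypothetical block size for illustration */

/* Toy single-block "cipher": XOR with a key byte. Not real cryptography. */
static void toy_encrypt_block(uint8_t *block, uint8_t key)
{
    for (size_t i = 0; i < TOY_BLOCK_SIZE; i++)
        block[i] ^= key;
}

/* Process every complete block in a linear buffer and return the number
 * of trailing bytes left untouched because they cannot form a block. */
static size_t toy_encrypt_linear(uint8_t *buf, size_t len, uint8_t key)
{
    size_t nblocks = len / TOY_BLOCK_SIZE;

    for (size_t b = 0; b < nblocks; b++)
        toy_encrypt_block(buf + b * TOY_BLOCK_SIZE, key);
    return len % TOY_BLOCK_SIZE;
}
```

With a 20-byte buffer, two blocks are processed and the final 4 bytes are skipped, so the caller can see from the return value that some data was left alone.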
This patch adds two new operations for the simple cipher that encrypts or
decrypts a single block at a time. This will be the main interface after
the existing block operations have moved over to the new block ciphers.
It also adds the crypto_cipher type which is currently only used on the
new operations but will be extended to setkey as well once existing users
have been converted to use block ciphers where applicable.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds the crypto_type structure which will be used for all new
crypto algorithm types, beginning with block ciphers.
The primary purpose of this abstraction is to allow different crypto_type
objects for crypto algorithms of the same type. In particular, there will
be different crypto_type objects for asynchronous algorithms.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Up until now all crypto transforms have been of the same type, struct
crypto_tfm, regardless of whether they are ciphers, digests, or other
types. As a result of that, we check the types at run-time before
each crypto operation.
This is rather cumbersome. We could instead use different C types for
each crypto type to ensure that the correct types are used at compile
time. That is, we would have crypto_cipher/crypto_digest instead of
just crypto_tfm. The appropriate type would then be required for the
actual operations such as crypto_digest_digest.
Now that we have the type/mask fields when looking up algorithms, it
is easy to request an algorithm of the precise type that the user
wants. However, crypto_alloc_tfm currently does not expose these new
attributes.
This patch introduces the function crypto_alloc_base which will carry
these new parameters. It will be renamed to crypto_alloc_tfm once
all existing users have been converted.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
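The idea of moving the type check from run time to compile time, as described in the entry above, can be sketched with distinct wrapper types. The struct and function names below are hypothetical and deliberately do not reuse the real crypto API names.

```c
/* Hypothetical generic transform, standing in for a shared base type. */
struct base_tfm {
    int kind;
};

enum { KIND_CIPHER = 1, KIND_DIGEST = 2 };

/* Distinct wrapper types: same layout, but not interchangeable in C. */
struct cipher_tfm { struct base_tfm base; };
struct digest_tfm { struct base_tfm base; };

/* Accepts only a digest. Passing a struct cipher_tfm * here draws an
 * incompatible-pointer diagnostic from the compiler, so no run-time
 * check of the kind field is needed before the operation. */
static int digest_kind(const struct digest_tfm *tfm)
{
    return tfm->base.kind;
}

/* The cipher-side counterpart, again typed for its own wrapper. */
static int cipher_kind(const struct cipher_tfm *tfm)
{
    return tfm->base.kind;
}
```

The wrappers cost nothing at run time; they only give the compiler enough information to reject a mismatched call.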
This patch adds the asynchronous flag and changes all existing users to
only look up algorithms that are synchronous.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch makes IV operations on ECB fail through nocrypt_iv rather than
calling BUG(). This is needed to generalise CBC/ECB using the template
mechanism.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Now that the tfm is passed directly to setkey instead of the ctx, we no
longer need to pass the &tfm->crt_flags pointer.
This patch also gets rid of a few unnecessary checks on the key length
for ciphers as the cipher layer guarantees that the key length is within
the bounds specified by the algorithm.
Rather than testing dia_setkey every time, this patch does it only once
during crypto_alloc_tfm. The redundant check from crypto_digest_setkey
is also removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add missing accessors for cra_driver_name and cra_priority.
Signed-off-by: Michal Ludvig <michal@logix.cz>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Spawns lock a specific crypto algorithm in place. They can then be used
with crypto_spawn_tfm to allocate a tfm for that algorithm. When the base
algorithm of a spawn is deregistered, all its spawns will be automatically
removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cryptomgr module is a simple manager of crypto algorithm instances.
It ensures that parameterised algorithms of the type tmpl(alg) (e.g.,
cbc(aes)) are always created.
This is meant to satisfy the needs for most users. For more complex
cases such as deeper combinations or multiple parameters, a netlink
module will be created which allows arbitrary expressions to be parsed
in user-space.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a notifier chain for algorithm/template registration events.
This will be used to register compound algorithms such as cbc(aes). In
future this will also be passed onto user-space through netlink.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now we've relied on module reference counting to ensure that the
crypto_alg structures don't disappear from under us. This was good enough
as long as each crypto_alg came from exactly one module.
However, with parameterised crypto algorithms a crypto_alg object may need
two or more modules to operate. This means that we need to count the
references to the crypto_alg object directly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously the __aligned__ attribute was added to the crypto_tfm context
member to ensure it is aligned correctly on architectures such as arm.
Unfortunately kmalloc does not use the same minimum alignment rules as
gcc so this is useless.
This patch changes it to use kmalloc's minimum alignment.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a bus for the adjunct processor interface. Up to 64 devices can
be connected to the ap bus interface, each device with 16 domains. That
makes 1024 message queues. The interface is asynchronous: the answer
to a message sent to a queue needs to be received at some later point
in time. Unfortunately the interface does not provide interrupts when
a message reply is pending, so the ap bus needs to implement some
fancy polling: each active queue is polled once per 1/HZ second, or
continuously if an idle cpu exists and the poll thread is active
(see poll_thread parameter).
The ap bus uses the sysfs path /sys/bus/ap and has two bus attributes,
ap_domain and config_time. The ap_domain selects one of the 16 domains
to be used for this system. This limits the maximum number of ap devices
to 64. The config_time attribute contains the number of seconds between
two ap bus scans to find new devices.
The ap bus uses the modalias entries of the form "ap:tN" to autoload
the ap driver for hardware type N. Currently known types are:
3 - PCICC, 4 - PCICA, 5 - PCIXCC, 6 - CEX2A and 7 - CEX2C.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Ralph Wuerthner <rwuerthn@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
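The queue arithmetic in the ap bus entry above (64 devices times 16 domains gives 1024 queues) can be made concrete with a small sketch. The flat index and all names here are hypothetical; the entry does not say how the driver actually encodes a queue identifier.

```c
#define AP_SKETCH_DEVICES 64  /* devices on the bus, per the text above */
#define AP_SKETCH_DOMAINS 16  /* domains per device, per the text above */

/* Hypothetical flat index for a (device, domain) pair.
 * Returns -1 when either value is out of range. */
static int ap_sketch_queue_index(int device, int domain)
{
    if (device < 0 || device >= AP_SKETCH_DEVICES ||
        domain < 0 || domain >= AP_SKETCH_DOMAINS)
        return -1;
    return device * AP_SKETCH_DOMAINS + domain;
}
```

Once a single domain is selected for the system, only one queue per device remains, which matches the limit of 64 usable ap devices.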
* git://git.infradead.org/mtd-2.6:
[MTD] Use SEEK_{SET,CUR,END} instead of hardcoded values in mtdchar lseek()
MTD: Fix bug in fixup_convert_atmel_pri
[JFFS2][SUMMARY] Fix a summary collecting bug.
[PATCH] [MTD] DEVICES: Fill more device IDs in the structure of m25p80
MTD: Add lock/unlock operations for Atmel AT49BV6416
MTD: Convert Atmel PRI information to AMD format
fs/jffs2/xattr.c: remove dead code
[PATCH] [MTD] Maps: Add dependency on alternate probe methods to physmap
[PATCH] MTD: Add Macronix MX29F040 to JEDEC
[MTD] Fixes of performance and stability issues in CFI driver.
block2mtd.c: Make kernel boot command line arguments work (try 4)
[MTD NAND] Fix lookup error in nand_get_flash_type()
remove #error on !PCI from pmc551.c
MTD: [NAND] Fix the sharpsl driver after breakage from a core conversion
[MTD] NAND: OOB buffer offset fixups
make fs/jffs2/nodelist.c:jffs2_obsolete_node_frag() static
[PATCH] [MTD] NAND: fix dead URL in Kconfig
Some laptops have separate "rfkill" buttons for disabling/enabling
Bluetooth and WLAN.
Signed-off-by: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
BUS_VIRTUAL can be used when creating virtual devices using uinput driver.
Note that when uinput is used to drive a real piece of hardware "real" bus
type (such as BUS_USB, BUS_BLUETOOTH) should be specified.
Signed-off-by: Michael Hanselmann <linux-kernel@hansmi.ch>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Our ATA probe code checks that a device is not reporting a diagnostic
failure during start up. Unfortunately at least one device seems to like
doing this - the Gigabyte iRAM.
This is only done for the master right now (which is fine for the iRAM
as it is SATA), as with PATA some combinations of ATAPI device seem to
fool the check into seeing a drive that isn't there if it is applied to
the slave.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Non-uniform ports handling got broken while updating libata to handle
those in the same host. Only separate irq for the non-uniform
secondary port was implemented while all other fields (host flags,
transfer mode...) of the secondary port simply shared those of the
first.
For ata_piix combined mode, which ATM is the only user of non-uniform
ports, this causes the secondary port to assume the wrong type. This can
cause a PATA port to use SATA ops, which results in a bogus check on PCS
and detection failure.
This patch adds ata_probe_ent->pinfo2 which points to optional
port_info for the secondary port. For the time being, this seems to
be the simplest solution. This workaround will be removed together
with ata_probe_ent itself after init model is updated to allow more
flexibility.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Nelson A. de Oliveira <naoliv@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch adds xt_SECMARK.h and xt_CONNSECMARK.h to the kernel
headers which are exported via 'make headers_install'. This is needed to
allow userland code to be built correctly with these features.
Please apply, and consider for inclusion with 2.6.18 as a bugfix.
Signed-off-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a capability flag for drivers to set when they can perform multi-
block transfers to cards _and_ correctly report the number of bytes
transferred should an error occur.
The last point is very important - if a driver reports more bytes than
were actually accepted by the card and an error occurs, there is the
possibility for data loss.
Pierre Ossman provided the patch for wbsd and sdhci.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The Microsoft Natural Elite Pro keyboard produces an unusual response to
the GET ID command: a single byte 0xaa (normally keyboards produce a
2-byte response). Fail the GET ID command so atkbd gets a chance to
do an alternate probe.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
The ethtool_ops structure is immutable; it is expected to be set up
by the driver and is never changed. This patch allows drivers to
declare their ethtool_ops structure read-only.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
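The pattern behind the ethtool_ops entry above is a const table of function pointers that the device only references. A minimal sketch, with hypothetical names that are not the real ethtool structures:

```c
/* Hypothetical operations table, standing in for a driver ops struct. */
struct sketch_ops {
    int (*get_speed)(void);
};

/* Hypothetical device that only ever reads its ops table. */
struct sketch_dev {
    const struct sketch_ops *ops;
};

static int fixed_speed(void)
{
    return 1000;
}

/* Declared const: the table is set up once at build time and can be
 * placed in read-only data, since nothing writes to it afterwards. */
static const struct sketch_ops sketch_driver_ops = {
    .get_speed = fixed_speed,
};

/* Call through the table, tolerating an unset hook. */
static int sketch_dev_speed(const struct sketch_dev *dev)
{
    return dev->ops->get_speed ? dev->ops->get_speed() : -1;
}
```

Holding the table through a pointer-to-const is what lets each driver declare its own instance const.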
There's useful stuff in <linux/timex.h> but <asm/timex.h> has nothing for
userspace. Stop exporting it, and include it only from within the existing
#ifdef __KERNEL__ part of <linux/timex.h>
This fixes a 'make headers_check' failure on i386 because asm-i386/timex.h
includes both asm-i386/tsc.h and asm-i386/processor.h, neither of which are
exported to userspace. It's not entirely clear _why_ it includes either of
these, but it does.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We don't need any of this crap included from the user-visible part of nfs_fs.h
-- remove it all.
In fact, we probably don't need anything but NFS_SUPER_MAGIC to be defined; is
there any need for anything else? And magic numbers should probably move to
<linux/magic.h> rather than being strewn across various fs-specific include
files which exist in userspace for solely that purpose.
With this patch, 'make headers_check' works again at least on PowerPC.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
V4L/DVB (4608c): Fix I2C dependencies for saa7146 modules
V4L/DVB (4608b): i2c deps fix on DVB
V4L/DVB (4605): Fixes an issue with V4L1 and make headers-install
V4L/DVB (4520): Fix an error when loading bttv driver on PV M4900.
V4L/DVB (4511): Restore tuner_ymec_tvf66t5_b_dff_pal_ranges[] to fix UHF switch functionality
V4L/DVB (4494a): Fix compilation when V4L1 support is not present
V4L1 support should be disabled when no CONFIG_VIDEO_V4L1_COMPAT is defined,
to allow checking for broken V4L2 ports. This is very important during the
migration phase for V4L2 API.
However, userspace apps should be capable of using both APIs, since they need
to test at runtime, via VIDIOCGCAP ioctl, if V4L1 is supported. So, when
__KERNEL__ is not defined, those ioctls and corresponding structs should be
visible.
This patch also removes the obsolete defines HAVE_V4L1 and HAVE_V4L2, which
were causing some confusion, and were replaced by CONFIG_VIDEO_V4L1_COMPAT
and CONFIG_VIDEO_V4L2.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
The logic in nfs_direct_read_schedule and nfs_direct_write_schedule can
allow data->npages to be one larger than rpages. This causes a page
pointer to be written beyond the end of the pagevec in nfs_read_data (or
nfs_write_data).
Fix this by making nfs_(read|write)_alloc() calculate the size of the
pagevec array, and initialise data->npages.
Also get rid of the redundant argument to nfs_commit_alloc().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
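One way an off-by-one like the pagevec overrun above can arise is by sizing a page array from the byte count alone, ignoring where the buffer starts within its first page. The sketch below shows the general calculation under an assumed 4096-byte page size; the function name is hypothetical and this is not the code from the NFS fix.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size */

/* Number of pages touched by a buffer that starts `offset` bytes into
 * its first page (offset < SKETCH_PAGE_SIZE) and is `len` bytes long. */
static size_t pages_spanned(size_t offset, size_t len)
{
    return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

A page-sized transfer fits in one page only when it is page aligned; shifted by a single byte, it touches two, so an array sized from the length alone is one entry short.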
New SiS south bridge device ID is 0x966.
Next coming product will be 0x968. (Will be released in Q4, this year)
We don't make any updates to the IDE controller.
Signed-off-by: David Wang <touch@sis.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rather than having two places which independently calculate the
timeout for data transfers, make it a library function instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Pierre Ossman <drzeus@drzeus.cx>
Let drivers constify MMC host method operations tables,
moving them from ".data" to ".rodata".
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The linux/device.h header is not included in David Woodhouse's
kernel-headers git tree, which is used for userspace kernel headers. This
results in compile errors when building iproute2. The attached patch moves
the linux/device.h include under the #ifdef __KERNEL__ section.
Signed-off-by: Ismail Donmez <ismail@pardus.org.tr>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Frank v. Waveren pointed out that on 64bit machines the timespec to
ktime_t conversion might overflow. This is also true for timeval to
ktime_t conversions. This breaks a "sleep inf" on 64bit machines.
While a timespec/timeval with tx.sec = MAX_LONG is valid by specification
the internal representation of ktime_t is based on nanoseconds. The
conversion of seconds to nanoseconds overflows for seconds values >=
(MAX_LONG / NSEC_PER_SEC).
Check the seconds argument to the conversion and limit it to the maximum
time which can be represented by ktime_t.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Frank v Waveren <fvw@var.cx>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
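The clamping described in the ktime_t entry above can be sketched in standalone C. The names below are hypothetical, the nanosecond count is modelled as a plain int64_t, and the sketch assumes non-negative seconds and a nanosecond part below one second.

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL
/* Largest whole-second value that still fits when scaled to nanoseconds. */
#define SKETCH_SEC_MAX (INT64_MAX / SKETCH_NSEC_PER_SEC)

/* Hypothetical conversion: seconds plus nanoseconds to a signed 64-bit
 * nanosecond count, saturating at the maximum instead of overflowing. */
static int64_t to_ns_saturating(int64_t secs, long nsecs)
{
    if (secs >= SKETCH_SEC_MAX)
        return INT64_MAX;
    return secs * SKETCH_NSEC_PER_SEC + nsecs;
}
```

With this check, a request to sleep for the largest representable number of seconds turns into "sleep for the longest time the nanosecond representation can express" rather than wrapping around.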
Fixes an error message on make xmldocs.
Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch formally adds support for the posting of FC events via netlink.
It is a followup to the original RFC at:
http://marc.theaimsgroup.com/?l=linux-scsi&m=114530667923464&w=2
and the initial posting at:
http://marc.theaimsgroup.com/?l=linux-scsi&m=115507374832500&w=2
The patch has been updated to optimize the send path, per the discussions
in the initial posting.
Per discussions at the Storage Summit and at OLS, we are to use netlink for
async events from transports. Also per discussions, to avoid a netlink
protocol per transport, I've created a single NETLINK_SCSITRANSPORT protocol,
which can then be used by all transports.
This patch:
- Creates new files scsi_netlink.c and scsi_netlink.h, which contain the
  single and shared definitions for the SCSI Transport. It is tied into the
  base SCSI subsystem initialization.
  Contains a single interface routine, scsi_send_transport_event(), for a
  transport to send an event (via multicast to a protocol specific group).
- Creates a new scsi_netlink_fc.h file, which contains the FC netlink event
  messages
- Adds 3 new routines to the fc transport:
    fc_get_event_number() - to get a FC event #
    fc_host_post_event() - to send a simple FC event (32 bits of data)
    fc_host_post_vendor_event() - to send a Vendor unique event, with
      arbitrary amounts of data.
Note: the separation of event number allows for a LLD to send a standard
event, followed by vendor-specific data for the event.
Note: This patch assumes 2 prior fc transport patches have been installed:
http://marc.theaimsgroup.com/?l=linux-scsi&m=115555807316329&w=2
http://marc.theaimsgroup.com/?l=linux-scsi&m=115581614930261&w=2
Sorry - next time I'll do something like making these individual
patches of the same posting when I know they'll be posted closely
together.
Signed-off-by: James Smart <James.Smart@emulex.com>
Tidy up configuration not to make SCSI always select NET
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6:
uhci-hcd: fix list access bug
USB: Support for ELECOM LD-USB20 in pegasus
USB: Add VIA quirk fixup for VT8235 usb2
USB: rtl8150_disconnect() needs tasklet_kill()
USB Storage: unusual_devs.h for Sony Ericsson M600i
USB Storage: Remove the finecam3 unusual_devs entry
UHCI: don't stop at an Iso error
usb gadget: g_ether spinlock recursion fix
USB: add all wacom device to hid-core.c blacklist
hid-core.c: Adds all GTCO CalComp Digitizers and InterWrite School Products to blacklist
USB floppy drive SAMSUNG SFD-321U/EP detected 8 times
Clean up allocation and freeing of tsk->delays used by delay accounting.
This solves two problems reported for delay accounting:
1. oops in __delayacct_blkio_ticks
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1844.html
Currently tsk->delays is getting freed too early in task exit which can
cause a NULL tsk->delays to get accessed via reading of /proc/<tgid>/stats.
The patch fixes this problem by freeing tsk->delays closer to when
task_struct itself is freed up. As a result, it also eliminates the use of
tsk->delays_lock which was only being used (inadequately) to safeguard
access to tsk->delays while a task was exiting.
2. Possible memory leak in kernel/delayacct.c
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1389.html
The patch cleans up tsk->delays allocations after a bad fork which was
missing earlier.
The patch has been tested to fix the problems listed above and stress
tested with rapid calls to delay accounting's taskstats command interface
(which is the other path that can access the same data, besides the /proc
interface causing the oops above).
Signed-off-by: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ZVC counter update threshold is currently set to a fixed value of 32.
This patch sets up the threshold depending on the number of processors and
the sizes of the zones in the system.
With the current threshold of 32, I was able to observe slight contention
when more than 130-140 processors concurrently updated the counters. The
contention vanished when I either increased the threshold to 64 or used
Andrew's idea of overstepping the interval (see ZVC overstep patch).
However, we saw contention again at 220-230 processors. So we need higher
values for larger systems.
But the current default is already a bit of an overkill for smaller
systems. Some systems have tiny zones where precision matters. For
example i386 and x86_64 have 16M DMA zones and either 900M ZONE_NORMAL or
ZONE_DMA32. These are even present on SMP and NUMA systems.
The patch here sets up a threshold based on the number of processors in the
system and the size of the zone that these counters are used for. The
threshold should grow logarithmically, so we use fls() as an easy
approximation.
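A rough sketch of such a calculation follows. The scaling choices here
(zone size counted in 128MB units, the factor of 2, and the cap of 125)
are assumptions made for illustration, and demo_fls() is a portable
stand-in for the kernel's fls():

```c
#include <stdint.h>

/* Find-last-set: 1-based position of the highest set bit, 0 if x == 0. */
static int demo_fls(uint64_t x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/*
 * Hypothetical threshold: grows roughly with log2(cpus) * log2(zone size).
 * All constants below are assumptions, not taken from the patch text.
 */
static int demo_zvc_threshold(unsigned int cpus, uint64_t zone_bytes)
{
	uint64_t mem = zone_bytes >> 27;	/* zone size in 128MB units */
	int threshold = 2 * demo_fls(cpus) * (1 + demo_fls(mem));

	return threshold > 125 ? 125 : threshold;
}
```

Under these assumptions a uniprocessor with a 16M zone gets a threshold
of 2, while 1024 processors with a 4TB zone hit the cap of 125.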
Results of tests on a system with 1024 processors (4TB RAM)
The following output is from a test allocating 1GB of memory concurrently
on each processor (Forking the process. So contention on mmap_sem and the
pte locks is not a factor):
                  X    MIN
TYPE:  CPUS    WALL   WALL        SYS   USER     TOTCPU
fork      1   0.552  0.552      0.540  0.012      0.552
fork      4   0.552  0.548      2.164  0.036      2.200
fork     16   0.564  0.548      8.812  0.164      8.976
fork    128   0.580  0.572     72.204  1.208     73.412
fork    256   1.300  0.660    310.400  2.160    312.560
fork    512   3.512  0.696   1526.836  4.816   1531.652
fork   1020  20.024  0.700  17243.176  6.688  17249.863
So a threshold of 32 is fine up to 128 processors. At 256 processors contention
becomes a factor.
Overstepping the counter (earlier patch) improves the numbers a bit:
fork      4   0.552  0.548      2.164  0.040      2.204
fork     16   0.552  0.548      8.640  0.148      8.788
fork    128   0.556  0.548     69.676  0.956     70.632
fork    256   0.876  0.636    212.468  2.108    214.576
fork    512   2.276  0.672    997.324  4.260   1001.584
fork   1020  13.564  0.680  11586.436  6.088  11592.523
Still contention at 512 and 1020. Contention at 1020 is down by a third.
256 still has a slight bit of contention.
After this patch the counter threshold will be set to 125 which reduces
contention significantly:
fork    128   0.560  0.548     69.776  0.932     70.708
fork    256   0.636  0.556    143.460  2.036    145.496
fork    512   0.640  0.548    284.244  4.236    288.480
fork   1020   1.500  0.588   1326.152  8.892   1335.044
[akpm@osdl.org: !SMP build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch to add VIA PCI quirk for Enhanced/Extended USB on VT8235
southbridge. It is needed in order to use EHCI/USB 2.0 with ACPI.
Without it IRQs are not routed correctly, you get an "Unlink after
no-IRQ?" error and the device is unusable.
I believe this could also be a fix for Bugzilla Bug 5835.
Signed-off-by: Mark Hindley <mark@hindley.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The current block queue implementation already contains most of the
machinery for shared tag maps. The only remaining piece is a way to
allocate and destroy a tag map independently of the queues (so that
the maps can be managed on the life cycle of the overseeing entity).
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Right now, various kernel modules are being migrated over to use
request_firmware in order to pull in binary firmware blobs from userland
when the module is loaded. This makes sense.
However, there is currently little mechanism in place to automatically
determine which binary firmware blobs must be included with a kernel in
order to satisfy the prerequisites of these drivers. This affects
vendors, and to a certain extent regular users as well.
The attached patch introduces MODULE_FIRMWARE as a mechanism for
advertising that a particular firmware file is to be loaded - it will
then show up via modinfo and could be used e.g. when packaging a kernel.
Signed-off-by: Jon Masters <jcm@redhat.com>
Comments added in line with all the other MODULE_ tags
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The vast majority of drivers and changes are from Alan Cox. Albert Lee
contributed and maintains pata_pdc2027x. Adrian Bunk, Andrew Morton,
and Tejun Heo contributed various minor fixes and updates.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Unlike the other tty comment patch this one has code changes. Specifically
it limits the queue size for a tty to 64K characters (128Kbytes) worst case
even if the tty is ignoring tty->throttle. This is because certain drivers
don't honour the throttle value correctly, although it is a useful
safeguard anyway.
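The cap can be pictured with the following user-space sketch. The
structure and function names are hypothetical, and the two bytes per
queued character are an assumption read off the "64K characters
(128Kbytes)" figure above:

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_TTY_QUEUE_LIMIT 65536	/* 64K characters, per the text */

struct demo_tty_queue {
	size_t chars_queued;	/* characters currently buffered */
};

/* Hypothetical allocator: refuses requests that would exceed the cap. */
static char *demo_queue_alloc(struct demo_tty_queue *q, size_t chars)
{
	char *buf;

	if (chars > DEMO_TTY_QUEUE_LIMIT - q->chars_queued)
		return NULL;

	/* Assumed layout: two bytes of storage per queued character. */
	buf = malloc(chars * 2);
	if (buf)
		q->chars_queued += chars;
	return buf;
}

/* Release a buffer and give its characters back to the budget. */
static void demo_queue_free(struct demo_tty_queue *q, char *buf, size_t chars)
{
	free(buf);
	q->chars_queued -= chars;
}
```

Because the check is made at allocation time, the bound holds even when
a driver never looks at the throttle state.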
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
reiserfs seems to have another locking level layer for the i_mutex due to the
xattrs-are-a-directory thing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
register_one_node() should also be defined under CONFIG_NUMA=n.
This fixes the following build error:
CC init/version.o
LD init/built-in.o
LD .tmp_vmlinux1
mm/built-in.o: In function `add_memory': undefined reference to `register_one_node'
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
JBD currently allocates commit and frozen buffers from slabs. With
CONFIG_SLAB_DEBUG, it's possible for an allocation to cross the page
boundary causing IO problems.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=200127
So, instead of allocating these from regular slabs, have JBD manage
allocation from its own slabs and disable slab debug for these slabs.
[akpm@osdl.org: cleanups]
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When reading /dev/vcsa while a font with more than 256 characters is
loaded, one of the attribute bits records the 9th bit of the character.
But depending on the console driver (vgacon or fbcon for instance), that's
bit 3 or bit 0. And there is no way for userland to know that, thus no way
for userland to safely grab the screen content. So here is a (tested)
patch:
Add a VT_GETHIFONTMASK ioctl for knowing which bit is the 9th bit for VC
text (vc_hi_font_mask field of the vc_data structure).
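As an illustration of how userland might apply the mask once it has been
obtained, here is a small sketch. It assumes the mask is expressed over a
16-bit cell with the attribute in the high byte (so attribute bit 3
corresponds to 0x0800 and bit 0 to 0x0100); demo_vcsa_glyph() is a
hypothetical helper, not part of the patch:

```c
#include <stdint.h>

/*
 * Combine one /dev/vcsa cell (character byte plus attribute byte) into a
 * glyph index of up to 9 bits. A mask of 0 means a font of at most 256
 * characters is loaded, so no attribute bit is borrowed.
 */
static unsigned int demo_vcsa_glyph(uint8_t ch, uint8_t attr,
				    uint16_t hi_font_mask)
{
	uint16_t cell = (uint16_t)((attr << 8) | ch);
	unsigned int glyph = ch;

	if (hi_font_mask && (cell & hi_font_mask))
		glyph |= 0x100;	/* the borrowed attribute bit is bit 8 */
	return glyph;
}
```

The same attribute byte therefore decodes differently depending on the
mask, which is why userland cannot guess it safely.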
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is a patch that adds support for the Instashield IS-200 2 port PCI
serial card.
Signed-off-by: Peter Horton <pdh@colonel-panic.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The bridge-netfilter code will overwrite memory if there is not enough
headroom in the skb to save the header. This first showed up when
using Xen with the sky2 driver, which doesn't allocate the extra space.
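The kind of guard needed can be sketched in user space as follows.
struct demo_skb and both helpers are simplified, hypothetical stand-ins
for the real skb handling, and the actual fix may differ in how it
obtains the missing headroom:

```c
#include <stddef.h>
#include <string.h>

struct demo_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the packet payload */
};

/* Bytes available between the buffer start and the payload. */
static size_t demo_headroom(const struct demo_skb *skb)
{
	return (size_t)(skb->data - skb->head);
}

/* Write a header in front of the payload; -1 if it would not fit. */
static int demo_restore_header(struct demo_skb *skb,
			       const unsigned char *hdr, size_t hdr_len)
{
	if (demo_headroom(skb) < hdr_len)
		return -1;	/* caller must get more headroom first */
	memcpy(skb->data - hdr_len, hdr, hdr_len);
	return 0;
}
```

Without the headroom check, the memcpy() would write before the start of
the buffer whenever the driver did not reserve the extra space.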
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Brown observed that the current limit of 32 bytes isn't enough to hold two
ip addresses and the rest of the stuff we're putting in it, so it's often
truncated to the point where it's unlikely to be unique. This can cause
spurious CLID_INUSE's from the server.
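The truncation effect is easy to reproduce in user space. In the sketch
below the id format is purely hypothetical (the real format string is not
given here); the point is only that two dotted-quad addresses already
fill a 32-byte buffer:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical client id builder. Returns what snprintf() returns: the
 * length the full string would have had, so a value >= len signals
 * truncation.
 */
static int demo_build_clid(char *buf, size_t len, const char *client_ip,
			   const char *server_ip, const char *proto,
			   unsigned int uniq)
{
	return snprintf(buf, len, "%s/%s %s %u",
			client_ip, server_ip, proto, uniq);
}
```

With addresses such as 192.168.100.101 and 192.168.100.102, everything
after the second address is cut off in a 32-byte buffer, so ids that
differ only in the trailing fields compare equal. A larger buffer keeps
them distinct.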
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from fc8c17ec251e984ab3df9182ed097aa5b577c915 commit)
Some hardware uses port 664 for its hardware-based IPMI listener. Teach
the RPC client to avoid using that port by raising the default minimum port
number to 665.
Test plan:
Find a mainboard known to use port 664 for IPMI; enable IPMI; mount NFS
servers in a tight loop.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 58e8cb3a035d22fc386e1c53a5d98c3f219530fb commit)
Make it take a dentry argument instead of a path
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 648d4116eb2509f010f7f34704a650150309b3e7 commit)
The biggest change is that ata_host_set is renamed to ata_host.
* ata_host_set => ata_host
* ata_probe_ent->host_flags => ata_probe_ent->port_flags
* ata_probe_ent->host_set_flags => ata_probe_ent->_host_flags
* ata_host_stats => ata_port_stats
* ata_port->host => ata_port->scsi_host
* ata_port->host_set => ata_port->host
* ata_port_info->host_flags => ata_port_info->flags
* ata_(.*)host_set(.*)\(\) => ata_\1host\2()
The leading underscore in ata_probe_ent->_host_flags is to avoid
reusing ->host_flags for a different purpose. Currently, the only user
of the field is libata-bmdma.c and probe_ent itself is scheduled to be
removed.
ata_port->host is reused for a different purpose, but this field is only
used inside the libata core proper and is of a different type.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This contains the board-specific portion needed to match the driver changes
(for 8272ads, 885ads and 866ads). It alters the platform_data structures as
well as the initial setup routines relevant to fs_enet.
Changes to the mpc8560ads ppc/ code are also introduced, but mainly as a
reference, since the entire board support is going to appear in arch/powerpc.
Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This makes it possible for boards without a hardware PHY to use the PAL (PHY
abstraction layer). Generic routines to connect to a fixed PHY are provided,
as well as the ability to specify a software callback that fills link, speed,
etc. information into the PHY descriptor (the latter feature has not been
tested so far).
Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
When the bridge recomputes features, it does not maintain the
constraint that SG/GSO must be off if TX checksum is off.
This patch adds that constraint.
On a completely unrelated note, I've also added TSO6 and TSO_ECN
feature bits if GSO is enabled on the underlying device through
the new NETIF_F_GSO_SOFTWARE macro.
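The SG/GSO constraint can be sketched as a small feature fixup. The
DEMO_F_* bits below are placeholder values chosen for this example, not
the kernel's NETIF_F_* definitions, and demo_fix_features() is a
hypothetical helper:

```c
#include <stdint.h>

/* Placeholder feature bits for illustration only. */
#define DEMO_F_SG	(1u << 0)
#define DEMO_F_IP_CSUM	(1u << 1)
#define DEMO_F_HW_CSUM	(1u << 2)
#define DEMO_F_GSO	(1u << 3)
#define DEMO_F_ALL_CSUM	(DEMO_F_IP_CSUM | DEMO_F_HW_CSUM)

/* Drop SG and GSO whenever no TX checksum offload is available. */
static uint32_t demo_fix_features(uint32_t features)
{
	if (!(features & DEMO_F_ALL_CSUM))
		features &= ~(uint32_t)(DEMO_F_SG | DEMO_F_GSO);
	return features;
}
```

Running the recomputed feature set through such a fixup as the last step
keeps the constraint intact however the per-port features were combined.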
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since __vlan_hwaccel_rx() is essentially bypassing the
netif_receive_skb() call that would have occurred if we did the VLAN
decapsulation in software, we are missing the skb_bond() call and the
associated checks it does.
Export those checks via an inline function, skb_bond_should_drop(),
and use this in __vlan_hwaccel_rx().
Signed-off-by: David S. Miller <davem@davemloft.net>
Atmel flash chips don't have PRI information in the same format as
AMD flash chips. This patch installs a fixup for all Atmel chips that
converts the relevant PRI fields into AMD format.
Only the fields that are actually used by the command set are
converted. The rest are initialized to zero (which should be safe).
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
Don't let fuse_readpages leave the @pages list non-empty when exiting
on error.
[akpm@osdl.org: kernel-doc fixes]
Signed-off-by: Alexander Zarochentsev <zam@namesys.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
linux/backlight.h pulls in header files (eg. ioport.h) that break
compilation of userspace programs. To solve the problem, only include
backlight.h in fb.h if compiling kernel stuff.
Signed-off-by: Michal Januszewski <spock@gentoo.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The CFA world has some additional rules and drive modes we need to support for
newer expansion cards and on embedded boxes.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The IPv4/IPv6 datagram output path was using skb_trim to trim paged
packets because it knows that the packet has not been cloned yet
(since the packet hasn't been given to anything else in the system).
This broke because skb_trim no longer allows paged packets to be
trimmed. Paged packets must be given to one of the pskb_trim functions
instead.
This patch adds a new pskb_trim_unique function to cover the IPv4/IPv6
datagram output path scenario and replaces the corresponding skb_trim
calls with it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>