linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-25 20:20:55 +07:00

Author	SHA1	Message	Date
Will Page	04bf7e745b	8250_pci: add support for National Instruments legacy 8420 RS232 boards Signed-off-by: Will Page <will.page@ni.com> Signed-off-by: Shawn Bohrer <shawn.bohrer@ni.com> Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 14:36:28 -07:00
Shawn Bohrer	46a0fac943	8250_pci: add support for National Instruments 843x RS232 devices This implements basic support for all 843x RS232 devices, but does not add DMA support. This means that sustained data transfers at high baud rates may not be possible on multiple ports simultaneously. Signed-off-by: Shawn Bohrer <shawn.bohrer@ni.com> Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 14:36:27 -07:00
Linus Torvalds	ffa009c366	Merge git://git.infradead.org/iommu-2.6 * git://git.infradead.org/iommu-2.6: drivers/pci/intr_remapping.c: include acpi.h intel-iommu: Fix oops in device_to_iommu() when devices not found. intel-iommu: Handle PCI domains appropriately. intel-iommu: Fix device-to-iommu mapping for PCI-PCI bridges. x2apic/intr-remap: decouple interrupt remapping from x2apic x86, dmar: check if it's initialized before disable queue invalidation intel-iommu: set compatibility format interrupt Intel IOMMU Suspend/Resume Support - Interrupt Remapping Intel IOMMU Suspend/Resume Support - Queued Invalidation Intel IOMMU Suspend/Resume Support - DMAR intel-iommu: Add for_each_iommu() and for_each_active_iommu() macros	2009-04-06 14:26:05 -07:00
Linus Torvalds	609862be07	Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: lockdep: add stack dumps to asserts hrtimer: fix rq->lock inversion (again)	2009-04-06 13:37:30 -07:00
Linus Torvalds	12fe32e4f9	Merge branch 'kmemtrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'kmemtrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: kmemtrace: trace kfree() calls with NULL or zero-length objects kmemtrace: small cleanups kmemtrace: restore original tracing data binary format, improve ABI kmemtrace: kmemtrace_alloc() must fill type_id kmemtrace: use tracepoints kmemtrace, rcu: don't include unnecessary headers, allow kmemtrace w/ tracepoints kmemtrace, rcu: fix rcupreempt.c data structure dependencies kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_unlzma.c kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_bunzip2.c kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_inflate.c kmemtrace, squashfs: fix slab.h dependency problem in squasfs kmemtrace, befs: fix slab.h dependency problem kmemtrace, security: fix linux/key.h header file dependencies kmemtrace, fs: fix linux/fdtable.h header file dependencies kmemtrace, fs: uninline simple_transaction_set() kmemtrace, fs, security: move alloc_secdata() and free_secdata() to linux/security.h	2009-04-06 13:30:00 -07:00
Linus Torvalds	a63856252d	Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux * 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits) nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4 nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc nfsd41: Documentation/filesystems/nfs41-server.txt nfsd41: CREATE_EXCLUSIVE4_1 nfsd41: SUPPATTR_EXCLCREAT attribute nfsd41: support for 3-word long attribute bitmask nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify nfsd41: pass writable attrs mask to nfsd4_decode_fattr nfsd41: provide support for minor version 1 at rpc level nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap nfsd41: access_valid nfsd41: clientid handling nfsd41: check encode size for sessions maxresponse cached nfsd41: stateid handling nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op nfsd41: destroy_session operation nfsd41: non-page DRC for solo sequence responses nfsd41: Add a create session replay cache nfsd41: create_session operation ...	2009-04-06 13:25:56 -07:00
Linus Torvalds	b24241a092	Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6 * 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: i2c: Delete unused i2c-algo-sgi helper module i2c: Delete many unused driver IDs i2c: Deprecate client_register and client_unregister methods	2009-04-06 13:25:27 -07:00
Linus Torvalds	3cd69271f8	Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds * 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds: leds: introduce lp5521 led driver leds: just ignore invalid GPIOs in leds-gpio leds: Fix &&/|| confusion in leds-pca9532.c leds: move h1940-leds's probe function to .devinit.text leds: remove an unnecessary "goto" on drivers/leds/leds-s3c24.c leds: add BD2802GU LED driver leds: remove experimental flag from leds-clevo-mail leds: Prevent multiple LED triggers with the same name leds: Add gpio-led trigger leds: Add rb532 LED driver for the User LED leds: Add suspend/resume state flags to leds-gpio leds: simple driver for pwm driven LEDs leds: Fix leds-gpio driver multiple module_init/exit usage leds: Add dac124s085 driver leds: allow led-drivers to use a variable range of brightness values leds: Add openfirmware platform device support	2009-04-06 13:22:45 -07:00
Yuji Shimada	296ccb086d	PCI: Setup disabled bridges even if buses are added This patch sets up disabled bridges even if buses have already been added. pci_assign_unassigned_resources is called after buses are added. pci_assign_unassigned_resources calls pci_bus_assign_resources. pci_bus_assign_resources calls pci_setup_bridge to configure BARs of bridges. Currently pci_setup_bridge returns immediately if the bus has already been added. So pci_assign_unassigned_resources can't configure BARs of bridges that were added in a disabled state; this patch fixes the issue. On logical hot-add, we need to prevent the kernel from re-initializing bridges that have already been initialized. To achieve this, pci_setup_bridge returns immediately if the bridge has already been enabled. We don't need to check whether the specified bus is a root bus or not. pci_setup_bridge is not called on a root bus, because a root bus does not have a bridge. The patch adds a new helper function, pci_is_enabled. I made the function name similar to pci_is_managed. Code that uses enable_cnt directly is changed to use pci_is_enabled. Acked-by: Alex Chiang <achiang@hp.com> Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2009-04-06 11:25:06 -07:00
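The pci_is_enabled() helper and the early-return rule described above can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: struct pci_dev is cut down to a single counter, C11 atomics stand in for the kernel's atomic_t, and bridge_needs_setup() is a hypothetical name for the check the message places in pci_setup_bridge.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cut-down device: only the enable counter is modeled. */
struct pci_dev {
	atomic_int enable_cnt; /* incremented on enable, decremented on disable */
};

/* A device counts as enabled while its enable count is positive. */
static inline bool pci_is_enabled(struct pci_dev *pdev)
{
	return atomic_load(&pdev->enable_cnt) > 0;
}

/* Hypothetical stand-in for the early return in pci_setup_bridge():
 * skip bridges that are already enabled, so a logical hot-add does not
 * re-initialize them. */
static bool bridge_needs_setup(struct pci_dev *bridge)
{
	return !pci_is_enabled(bridge);
}
```

Wrapping the counter in a named predicate is the point of the change: callers state intent ("is it enabled?") instead of open-coding a read of enable_cnt.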
Benny Halevy	04826f43d4	nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4 Fixes following modpost error: ERROR: "nfsd4_set_statp" [fs/nfsd/nfsd.ko] undefined! Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-06 09:48:20 -07:00
Benny Halevy	f0ad670d70	nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc Fixes the following compiler error, seen when CONFIG_NFSD_V4 is not set: fs/nfsd/nfssvc.c: In function 'set_max_drc': fs/nfsd/nfssvc.c:240: error: 'NFSD_DRC_SIZE_SHIFT' undeclared Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-06 09:17:53 -07:00
Jean Delvare	abe213d7f6	i2c: Delete unused i2c-algo-sgi helper module The i2c-algo-sgi code was merged into the vino driver, so we can delete it now. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2009-04-06 18:12:25 +02:00
Jean Delvare	7c8ad4aff0	i2c: Delete many unused driver IDs Delete many unused I2C driver IDs. We should be able to get rid of i2c_driver.id pretty soon now. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2009-04-06 18:12:25 +02:00
Jean Delvare	e3ee703366	i2c: Deprecate client_register and client_unregister methods The new i2c binding model makes the client_register and client_unregister methods of struct i2c_adapter useless, so we can remove them with the rest of the legacy model. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2009-04-06 18:12:24 +02:00
Kim Kyuwon	0b56129be7	leds: add BD2802GU LED driver ROHM BD2802GU is an RGB LED controller attached to the i2c bus and specifically engineered for decoration purposes. This RGB controller incorporates lighting patterns and illuminates. This driver is designed to minimize power consumption, so when no LED is emitting, the chip enters the reset state. Because the BD2802GU has many features that the current LED framework can't cover, the driver provides an Advanced Configuration Function (ADF) mode, so that user applications can set registers of BD2802GU directly. Here are basic usage examples: ; to turn on LED (not blink) $ echo 1 > /sys/class/leds/led1_R/brightness ; to blink LED $ echo timer > /sys/class/leds/led1_R/trigger $ echo 1 > /sys/class/leds/led1_R/delay_on $ echo 1 > /sys/class/leds/led1_R/delay_off ; to turn off LED $ echo 0 > /sys/class/leds/led1_R/brightness [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kim Kyuwon <chammoru@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>	2009-04-06 16:06:26 +01:00
Richard Purdie	defb512d25	leds: Add suspend/resume state flags to leds-gpio Add an option to preserve LED state when suspending/resuming to the LED gpio driver. Based on a suggestion from Robert Jarzmik. Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>	2009-04-06 16:06:26 +01:00
Luotao Fu	41c42ff5db	leds: simple driver for pwm driven LEDs Add a simple driver for pwm driven LEDs. pwm_id and period can be defined in the board file. It was developed for pxa; however, it is probably generic enough to be used on other platforms with pwm. Signed-off-by: Luotao Fu <l.fu@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>	2009-04-06 16:06:26 +01:00
Guennadi Liakhovetski	1bd465e6b0	leds: allow led-drivers to use a variable range of brightness values This patch allows drivers to override the default maximum brightness value of 255. We take care to preserve backwards-compatibility as much as possible, so that user-space ABI doesn't change for existing drivers. LED trigger code has also been updated to use the per-LED maximum. Signed-off-by: Guennadi Liakhovetski <lg@denx.de> Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>	2009-04-06 16:06:25 +01:00
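A minimal sketch of a per-LED maximum, assuming a cut-down hypothetical struct led_dev rather than the real struct led_classdev. A driver that leaves max_brightness at 0 gets the old fixed maximum of 255 at registration time, which is one way to keep existing drivers' behaviour unchanged, and every write is clamped to that LED's own limit. led_register() and led_set() are illustrative names, not the LED class API.

```c
/* Hypothetical cut-down LED device (not struct led_classdev). */
struct led_dev {
	unsigned int brightness;
	unsigned int max_brightness; /* 0 = driver did not choose a maximum */
};

#define LED_DEFAULT_MAX 255u /* the traditional fixed maximum */

/* At registration, fall back to the old fixed range if the driver
 * did not set its own maximum. */
static void led_register(struct led_dev *led)
{
	if (led->max_brightness == 0)
		led->max_brightness = LED_DEFAULT_MAX;
}

/* Clamp each requested value to this LED's own maximum. */
static void led_set(struct led_dev *led, unsigned int value)
{
	led->brightness = value > led->max_brightness ? led->max_brightness : value;
}
```

A 16-level LED would set max_brightness to 15 before registering; writes of 100 would then land on 15 rather than being scaled.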
Jens Axboe	aeb6fafb8f	block: Add flag for telling the IO schedulers NOT to anticipate more IO By default, CFQ will anticipate more IO from a given io context if the previously completed IO was sync. This used to be fine, since the only sync IO was reads and O_DIRECT writes. But with more "normal" sync writes being used now, we don't want to anticipate for those. Add a bio/request flag that informs the IO scheduler that this is a sync request that we should not idle for. Introduce WRITE_ODIRECT specifically for O_DIRECT writes, and make sure that the other sync writes set this flag. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 08:04:54 -07:00
Jens Axboe	a1f242524c	Add WRITE_SYNC_PLUG and SWRITE_SYNC_PLUG (S)WRITE_SYNC always unplugs the device right after IO submission. Sometimes we want to build up a queue before doing so, so add variants that explicitly DON'T unplug the queue. The caller must then do that after submitting all the IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 08:04:53 -07:00
Jens Axboe	1faa16d228	block: change the request allocation/congestion logic to be sync/async based This makes sure that we never wait on async IO for sync requests, instead of doing the split on writes vs reads. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-06 08:04:53 -07:00
Thomas Gleixner	81ec5364a5	[MTD] [NAND] Add support for 4KiB pages. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Siewior <bigeasy@linutronix.de> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-04-06 07:01:56 -07:00
Ingo Molnar	9efe21cb82	Merge branch 'linus' into irq/threaded Conflicts: include/linux/irq.h kernel/irq/handle.c	2009-04-06 01:41:22 +02:00
Linus Torvalds	48f286a28f	Merge branch 'for-next' of git://git.o-hand.com/linux-mfd * 'for-next' of git://git.o-hand.com/linux-mfd: mfd: fix da903x warning mfd: fix MAINTAINERS entry mfd: Use the value of the final spin when reading the AUXADC mfd: Storage class should be before const qualifier mfd: PASIC3: supply clock_rate to DS1WM via driver_data mfd: remove DS1WM clock handling mfd: remove unused PASIC3 bus_shift field pxa/magician: remove deprecated .bus_shift from PASIC3 platform_data mfd: convert PASIC3 to use MFD core mfd: convert DS1WM to use MFD core mfd: Support active high IRQs on WM835x mfd: Use bulk read to fill WM8350 register cache mfd: remove duplicated #include from pcf50633	2009-04-05 11:38:37 -07:00
Linus Torvalds	32fb6c1756	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (140 commits) ACPI: processor: use .notify method instead of installing handler directly ACPI: button: use .notify method instead of installing handler directly ACPI: support acpi_device_ops .notify methods toshiba-acpi: remove MAINTAINERS entry ACPI: battery: asynchronous init acer-wmi: Update copyright notice & documentation acer-wmi: Cleanup the failure cleanup handling acer-wmi: Blacklist Acer Aspire One video: build fix thinkpad-acpi: rework brightness support thinkpad-acpi: enhanced debugging messages for the fan subdriver thinkpad-acpi: enhanced debugging messages for the hotkey subdriver thinkpad-acpi: enhanced debugging messages for rfkill subdrivers thinkpad-acpi: restrict access to some firmware LEDs thinkpad-acpi: remove HKEY disable functionality thinkpad-acpi: add new debug helpers and warn of deprecated atts thinkpad-acpi: add missing log levels thinkpad-acpi: cleanup debug helpers thinkpad-acpi: documentation cleanup thinkpad-acpi: drop ibm-acpi alias ...	2009-04-05 11:16:25 -07:00
Linus Torvalds	3516c6a8dc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (714 commits) Staging: sxg: slicoss: Specify the license for Sahara SXG and Slicoss drivers Staging: serqt_usb: fix build due to proc tty changes Staging: serqt_usb: fix checkpatch errors Staging: serqt_usb: add TODO file Staging: serqt_usb: Lindent the code Staging: add USB serial Quatech driver staging: document that the wifi staging drivers a bit better Staging: echo cleanup Staging: BUG to BUG_ON changes Staging: remove some pointless conditionals before kfree_skb() Staging: line6: fix build error, select SND_RAWMIDI Staging: line6: fix checkpatch errors in variax.c Staging: line6: fix checkpatch errors in toneport.c Staging: line6: fix checkpatch errors in pcm.c Staging: line6: fix checkpatch errors in midibuf.c Staging: line6: fix checkpatch errors in midi.c Staging: line6: fix checkpatch errors in dumprequest.c Staging: line6: fix checkpatch errors in driver.c Staging: line6: fix checkpatch errors in audio.c Staging: line6: fix checkpatch errors in pod.c ...	2009-04-05 11:06:45 -07:00
Linus Torvalds	714f83d5d9	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits) tracing, net: fix net tree and tracing tree merge interaction tracing, powerpc: fix powerpc tree and tracing tree interaction ring-buffer: do not remove reader page from list on ring buffer free function-graph: allow unregistering twice trace: make argument 'mem' of trace_seq_putmem() const tracing: add missing 'extern' keywords to trace_output.h tracing: provide trace_seq_reserve() blktrace: print out BLK_TN_MESSAGE properly blktrace: extract duplidate code blktrace: fix memory leak when freeing struct blk_io_trace blktrace: fix blk_probes_ref chaos blktrace: make classic output more classic blktrace: fix off-by-one bug blktrace: fix the original blktrace blktrace: fix a race when creating blk_tree_root in debugfs blktrace: fix timestamp in binary output tracing, Text Edit Lock: cleanup tracing: filter fix for TRACE_EVENT_FORMAT events ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release() x86: kretprobe-booster interrupt emulation code fix ... Fix up trivial conflicts in arch/parisc/include/asm/ftrace.h include/linux/memory.h kernel/extable.c kernel/module.c	2009-04-05 11:04:19 -07:00
Linus Torvalds	90975ef712	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits) cpumask: remove cpumask allocation from idle_balance, fix numa, cpumask: move numa_node_id default implementation to topology.h, fix cpumask: remove cpumask allocation from idle_balance x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus x86: cpumask: update 32-bit APM not to mug current->cpus_allowed x86: microcode: cleanup x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash numa, cpumask: move numa_node_id default implementation to topology.h cpumask: convert node_to_cpumask_map[] to cpumask_var_t cpumask: remove x86 cpumask_t uses. cpumask: use cpumask_var_t in uv_flush_tlb_others. cpumask: remove cpumask_t assignment from vector_allocation_domain() cpumask: make Xen use the new operators. cpumask: clean up summit's send_IPI functions cpumask: use new cpumask functions throughout x86 x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t cpumask: convert node_to_cpumask_map[] to cpumask_var_t x86: unify 32 and 64-bit node_to_cpumask_map ...	2009-04-05 10:33:07 -07:00
Linus Torvalds	cab4e4c43f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param: module: use strstarts() strstarts: helper function for !strncmp(str, prefix, strlen(prefix)) arm: allow usage of string functions in linux/string.h module: don't use stop_machine on module load module: create a request_module_nowait() module: include other structures in module version check module: remove the SHF_ALLOC flag on the __versions section. module: clarify the force-loading taint message. module: Export symbols needed for Ksplice Ksplice: Add functions for walking kallsyms symbols module: remove module_text_address() module: __module_address module: Make find_symbol return a struct kernel_symbol kernel/module.c: fix an unused goto label param: fix charp parameters set via sysfs Fix trivial conflicts in kernel/extable.c manually.	2009-04-05 10:30:21 -07:00
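The strstarts() helper in this merge is a thin wrapper around the idiom quoted in its shortlog entry. A sketch of such a helper in plain C, written here for illustration (include/linux/string.h holds the in-tree version):

```c
#include <stdbool.h>
#include <string.h>

/* True if str begins with prefix; equivalent to the
 * !strncmp(str, prefix, strlen(prefix)) idiom. */
static inline bool strstarts(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}
```

Naming the idiom removes a common source of bugs: passing the wrong length (for example strlen(str) instead of strlen(prefix)) to strncmp.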
Linus Torvalds	e4c393fd55	Merge branch 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: printk: correct the behavior of printk_timed_ratelimit() vsprintf: unify the format decoding layer for its 3 users, cleanup fix regression from "vsprintf: unify the format decoding layer for its 3 users" vsprintf: fix bug in negative value printing vsprintf: unify the format decoding layer for its 3 users vsprintf: add binary printf printk: introduce printk_once() Fix trivial conflicts (printk_once vs log_buf_kexec_setup() added near each other) in include/linux/kernel.h.	2009-04-05 10:23:25 -07:00
Len Brown	478c6a43fc	Merge branch 'linus' into release Conflicts: arch/x86/kernel/cpu/cpufreq/longhaul.c Signed-off-by: Len Brown <len.brown@intel.com>	2009-04-05 02:14:15 -04:00
Len Brown	33526a5360	Merge branch 'x2apic' into release	2009-04-05 01:51:51 -04:00
Len Brown	7c27fd19b6	Merge branch 'sony-laptop' into release	2009-04-05 01:42:14 -04:00
Len Brown	3266d63c06	Merge branch 'battery' into release	2009-04-05 01:39:26 -04:00
Len Brown	4f3bff70a6	Merge branch 'thermal' into release	2009-04-05 01:39:12 -04:00
Philipp Zabel	7d33ccbeec	mfd: remove DS1WM clock handling This driver requests a clock that usually is supplied by the MFD in which the DS1WM is contained. Currently, it is impossible for an MFD to register its clocks with the generic clock API due to different implementations across architectures. For now, this patch removes the clock handling from DS1WM altogether, trusting that the MFD enable/disable functions will switch the clock if needed. The clock rate is obtained from a new parameter in driver_data. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>	2009-04-05 00:32:22 +02:00
Philipp Zabel	b72019dbd1	mfd: remove unused PASIC3 bus_shift field Removes the now-unused bus_shift field from pasic3_platform_data. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>	2009-04-05 00:32:22 +02:00
Philipp Zabel	a23a175795	mfd: convert DS1WM to use MFD core This patch converts the DS1WM driver into an MFD cell. It also calculates the bus_shift parameter from the memory resource size. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>	2009-04-05 00:32:20 +02:00
Mark Brown	3206450355	mfd: Support active high IRQs on WM835x Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>	2009-04-05 00:32:20 +02:00
Linus Torvalds	601cc11d05	Make non-compat preadv/pwritev use native register size Instead of always splitting the file offset into 32-bit 'high' and 'low' parts, just split them into the largest natural word-size - which in C terms is 'unsigned long'. This allows 64-bit architectures to avoid the unnecessary 32-bit shifting and masking for native format (while the compat interfaces will obviously always have to do it). This also changes the order of 'high' and 'low' to be "low first". Why? Because when we have it like this, the 64-bit system calls now don't use the "pos_high" argument at all, and it makes more sense for the native system call to simply match the user-mode prototype. This results in a much more natural calling convention, and allows the compiler to generate much more straightforward code. On x86-64, we now generate testq %rcx, %rcx # pos_l js .L122 #, movq %rcx, -48(%rbp) # pos_l, pos from the C source loff_t pos = pos_from_hilo(pos_h, pos_l); ... if (pos < 0) return -EINVAL; and the 'pos_h' register isn't even touched. It used to generate code like mov %r8d, %r8d # pos_low, pos_low salq $32, %rcx #, tmp71 movq %r8, %rax # pos_low, pos.386 orq %rcx, %rax # tmp71, pos.386 js .L122 #, movq %rax, -48(%rbp) # pos.386, pos which isn't _that_ horrible, but it does show how the natural word size is just a more sensible interface (same arguments will hold in the user level glibc wrapper function, of course, so the kernel side is just half of the equation!) Note: in all cases the user code wrapper can again be the same. You can just do #define HALF_BITS (sizeof(unsigned long)*4) __syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS); or something like that. That way the user mode wrapper will also be nicely passing in a zero (it won't actually have to do the shifts, the compiler will understand what is going on) for the last argument. 
And that is a good idea, even if nobody will necessarily ever care: if we ever do move to a 128-bit loff_t, this particular system call might be left alone. Of course, that will be the least of our worries if we really ever need to care, so this may not be worth really caring about. [ Fixed for lost 'loff_t' cast noticed by Andrew Morton ] Acked-by: Gerd Hoffmann <kraxel@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: Ingo Molnar <mingo@elte.hu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-04 14:20:34 -07:00
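The low/high split described in this message can be illustrated in portable C. pos_from_hilo() follows the name used in the quoted C source; pos_to_lohi() and HALF_LONG_BITS are hypothetical names for this sketch (the message's wrapper calls its constant HALF_BITS). Shifting twice by half a word avoids an undefined full-width shift, so on LP64 targets the high word is simply 0, while on 32-bit targets it carries the upper 32 bits of the offset.

```c
#include <limits.h>
#include <stdint.h>

/* Half the width of a native word: 32 on LP64, 16 on ILP32. */
#define HALF_LONG_BITS (sizeof(unsigned long) * CHAR_BIT / 2)

/* Caller side: split a 64-bit offset into native-size (low, high) words.
 * Two half-word shifts never shift by the full width of the type. */
static void pos_to_lohi(int64_t pos, unsigned long *lo, unsigned long *hi)
{
	*lo = (unsigned long)pos;
	*hi = (unsigned long)(((uint64_t)pos >> HALF_LONG_BITS) >> HALF_LONG_BITS);
}

/* Callee side: recombine with the same double-shift trick. On 64-bit
 * targets hi is 0 and contributes nothing. */
static int64_t pos_from_hilo(unsigned long hi, unsigned long lo)
{
	return (int64_t)((((uint64_t)hi << HALF_LONG_BITS) << HALF_LONG_BITS) | lo);
}
```

The same source compiles on both word sizes; on 64-bit targets a compiler can fold the high-word shifts away entirely, which matches the message's point that the native word size is the natural unit.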
David Brownell	1f24b5a8ec	[MTD] driver model updates Update driver model support in the MTD framework, so it fits better into the current udev-based hotplug framework: - Each mtd_info now has a device node. MTD drivers should set the dev.parent field to point to the physical device, before setting up partitions or otherwise declaring MTDs. - Those device nodes always map to /sys/class/mtdX device nodes, which no longer depend on MTD_CHARDEV. - Those mtdX sysfs nodes have a "starter set" of attributes; it's not yet sufficient to replace /proc/mtd. - Enabling MTD_CHARDEV provides /sys/class/mtdXro/ nodes and the /sys/class/mtd/dev attributes (for udev, mdev, etc). - Include a MODULE_ALIAS_CHARDEV_MAJOR macro. It'll work with udev creating the /dev/mtd nodes, not just a static rootfs. So the sysfs structure is pretty much what you'd expect, except that readonly chardev nodes are a bit quirky. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-04-04 14:29:07 +01:00
David Woodhouse	276dbf9970	intel-iommu: Handle PCI domains appropriately. We were comparing {bus,devfn} and assuming that a match meant it was the same device. It doesn't -- the same {bus,devfn} can exist in multiple PCI domains. Include domain number in device identification (and call it 'segment' in most places, because there's already a lot of references to 'domain' which means something else, and this code is infected with ACPI thinking already). Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-04-04 10:43:31 +01:00
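The identification fix amounts to comparing three fields instead of two. A minimal sketch with a hypothetical struct pci_dev_id (not the intel-iommu data structure); PCI_DEVFN() packs the slot and function numbers the same way as the kernel macro of that name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack slot (0-31) and function (0-7) into one devfn byte. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical device key: {bus, devfn} alone is ambiguous across PCI
 * domains, so the segment number must be part of the identity. */
struct pci_dev_id {
	uint16_t segment; /* PCI domain, called "segment" in ACPI terms */
	uint8_t  bus;
	uint8_t  devfn;
};

static bool same_device(const struct pci_dev_id *a, const struct pci_dev_id *b)
{
	return a->segment == b->segment && a->bus == b->bus && a->devfn == b->devfn;
}
```

Two devices at bus 3, slot 2, function 0 in segments 0 and 1 compare unequal here, whereas a {bus, devfn}-only comparison would treat them as the same device.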
Benny Halevy	79fb54abd2	nfsd41: CREATE_EXCLUSIVE4_1 Implement the CREATE_EXCLUSIVE4_1 open mode conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 This mode allows the client to atomically create a file if it doesn't exist while setting some of its attributes. It must be implemented if the server supports persistent reply cache and/or pnfs. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:23 -07:00
Benny Halevy	8c18f2052e	nfsd41: SUPPATTR_EXCLCREAT attribute Return bitmask for supported EXCLUSIVE4_1 create attributes. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:23 -07:00
Andy Adamson	7e70570647	nfsd41: support for 3-word long attribute bitmask Also, use client minorversion to generate supported attrs Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:23 -07:00
Marc Eshel	f3ec22b5b0	nfsd41: provide support for minor version 1 at rpc level Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:22 -07:00
Benny Halevy	8daf220a6a	nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions Support enabling and disabling nfsv4.1 via /proc/fs/nfsd/versions by writing the strings "+4.1" or "-4.1", respectively. Use the user-mode nfs-utils (rpc.nfsd option) to enable it. This will allow us to get rid of CONFIG_NFSD_V4_1. [nfsd41: disable support for minorversion by default] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:21 -07:00
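For illustration only, here is a hypothetical C parser for a single token in the "+4.1"/"-4.1" form the message describes. It is not the nfsd implementation: struct nfsd_version_req and parse_version_token() are invented names, and the accepted grammar (sign, major number, optional ".minor") is an assumption of this sketch.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical parsed form of one "+4.1" / "-4.1" style token. */
struct nfsd_version_req {
	bool enable; /* '+' enables, '-' disables */
	long major;
	long minor;  /* 0 when there is no ".N" suffix */
};

/* Parse one token; returns false on malformed input. */
static bool parse_version_token(const char *tok, struct nfsd_version_req *out)
{
	char *end;

	/* Require an explicit sign followed directly by a digit. */
	if ((tok[0] != '+' && tok[0] != '-') || !isdigit((unsigned char)tok[1]))
		return false;
	out->enable = (tok[0] == '+');
	out->major = strtol(tok + 1, &end, 10);
	out->minor = 0;
	if (*end == '.') {
		if (!isdigit((unsigned char)end[1]))
			return false;
		out->minor = strtol(end + 1, &end, 10);
	}
	return *end == '\0'; /* reject trailing junk */
}
```

Rejecting unsigned tokens keeps "enable" and "disable" explicit, which suits a control file where a silent default could turn a protocol version on by accident.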
Andy Adamson	d87a8ade95	nfsd41: access_valid For nfs41, the open share flags are used also for delegation "wants" and "signals". Check that they are valid. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:21 -07:00
Andy Adamson	6668958fac	nfsd41: stateid handling When sessions are used, stateful operation sequenceid and stateid handling are not used. When sessions are used, on the first open set the seqid to 1, mark state confirmed and skip seqid processing. When sessions are used, the stateid generation number is ignored when it is zero, whereas without sessions bad_stateid or stale stateid is returned. Add flags to propagate session use to all stateful ops and down to check_stateid_generation. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Andy Adamson <andros@netapp.com> [nfsd4_has_session should return a boolean, not u32] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: pass nfsd4_compoundres * to nfsd4_process_open1] [nfsd41: calculate HAS_SESSION in nfs4_preprocess_stateid_op] [nfsd41: calculate HAS_SESSION in nfs4_preprocess_seqid_op] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:19 -07:00
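The generation check described above can be sketched as follows. The status names, the HAS_SESSION flag value and check_generation() are assumptions for this sketch (the message names check_stateid_generation but not its signature), and mapping a generation newer than the server's to "bad" and an older one to "old/stale" is an interpretation of the message.

```c
#include <stdint.h>

/* Hypothetical status codes standing in for the nfsd error values. */
enum stateid_status { STATEID_OK, STATEID_BAD, STATEID_OLD };

#define HAS_SESSION 0x1 /* hypothetical flag: request arrived over a session */

/* Compare the generation the client sent (in) with the server's current
 * one (ref). Over a session, a zero generation means "don't check". */
static enum stateid_status check_generation(uint32_t in, uint32_t ref, int flags)
{
	if ((flags & HAS_SESSION) && in == 0)
		return STATEID_OK;
	if (in > ref)
		return STATEID_BAD; /* a generation the server never issued */
	if (in < ref)
		return STATEID_OLD; /* an outdated generation */
	return STATEID_OK;
}
```

Passing the session state down as a flag, as the message describes, lets one comparison routine serve both the sessions and the pre-sessions paths.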
Benny Halevy	dd453dfd70	nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op Currently we only use cstate->current_fh, will also be used by nfsd41 code. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:19 -07:00
Benny Halevy	e10e0cfc2f	nfsd41: destroy_session operation Implement the destroy_session operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 [use sessionid_lock spin lock] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:19 -07:00
Andy Adamson	bf864a31d5	nfsd41: non-page DRC for solo sequence responses A session inactivity time compound (lease renewal) or a compound where the sequence operation has sa_cachethis set to FALSE do not require any pages to be held in the v4.1 DRC. This is because struct nfsd4_slot is already caching the session information. Add logic to the nfs41 server to not cache response pages for solo sequence responses. Return nfserr_replay_uncached_rep on the operation following the sequence operation when sa_cachethis is FALSE. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use cstate session in nfsd4_replay_cache_entry] [nfsd41: rename nfsd4_no_page_in_cache] [nfsd41 rename nfsd4_enc_no_page_replay] [nfsd41 nfsd4_is_solo_sequence] [nfsd41 change nfsd4_not_cached return] Signed-off-by: Andy Adamson <andros@netapp.com> [changed return type to bool] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41 drop parens in nfsd4_is_solo_sequence call] Signed-off-by: Andy Adamson <andros@netapp.com> [changed "== 0" to "!"] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:19 -07:00
Andy Adamson	38eb76a54d	nfsd41: Add a create session replay cache Replace the nfs4_client cl_seqid field with a single struct nfs41_slot used for the create session replay cache. The CREATE_SESSION slot sets the sl_session pointer to NULL. Otherwise, the slot and its replay cache are used just like the session slots. Fix unconfirmed create_session replay response by initializing the create_session slot sequence id to 0. A future patch will set the CREATE_SESSION cache when a SEQUENCE operation precedes the CREATE_SESSION operation. This compound is currently only cached in the session slot table. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: revert portion of nfsd4_set_cache_entry] Signed-off-by: Andy Adamson <andros@netpp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:18 -07:00
Andy Adamson	ec6b5d7b50	nfsd41: create_session operation Implement the create_session operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Look up the client id (generated by the server on exchange_id, given by the client on create_session). If neither a confirmed nor an unconfirmed client is found, then the client id is stale. If a confirmed client is found (i.e. we already received create_session for it) then compare the sequence id to determine if it's a replay or possibly a mis-ordered rpc. If the seqid is in order, update the confirmed client seqid and proceed with updating the session parameters. If an unconfirmed client_id is found then verify the creds and seqid. If both match move the client id to confirmed state and proceed with processing the create_session. Currently, we do not support persistent sessions or RDMA. alloc_init_session generates a new sessionid and creates a session structure. NFSD_PAGES_PER_SLOT is used for the max response cached calculation, and for the counting of DRC pages using the hard limits set in struct svc_serv. A note on NFSD_PAGES_PER_SLOT: Other patches in this series allow for NFSD_PAGES_PER_SLOT + 1 pages to be cached in a DRC slot when the response size is less than NFSD_PAGES_PER_SLOT * PAGE_SIZE but xdr_buf pages are used. e.g. a READDIR operation will encode a small amount of data in the xdr_buf head, and then the READDIR in the xdr_buf pages. So, the hard limit calculation of pages used by a session is underestimated by the number of cached operations using the xdr_buf pages. Yet another patch caches no pages for the solo sequence operation, or any compound where cache_this is False. So the hard limit calculation of pages used by a session is overestimated by the number of these operations in the cache. TODO: improve resource pre-allocation and negotiate session parameters accordingly. Respect and possibly adjust backchannel attributes. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> [nfsd41: remove headerpadsz from channel attributes] Our client and server only support a headerpadsz of 0. [nfsd41: use DRC limits in fore channel init] [nfsd41: do not change CREATE_SESSION back channel attrs] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [use sessionid_lock spin lock] [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41 remove sl_session from alloc_init_session] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [simplify nfsd4_encode_create_session error handling] [nfsd41: fix comment style in init_forechannel_attrs] [nfsd41: allocate struct nfsd4_session and slot table in one piece] [nfsd41: no need to INIT_LIST_HEAD in alloc_init_session just prior to list_add] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:18 -07:00
Andy Adamson	da3846a286	nfsd41: nfsd DRC logic Replay a request in nfsd4_sequence. Add a minorversion to struct nfsd4_compound_state. Pass the current slot to nfs4svc_encode_compoundres via struct nfsd4_compoundres to set an NFSv4.1 DRC entry. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use cstate session in nfs4svc_encode_compoundres] [nfsd41 replace nfsd4_set_cache_entry] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:17 -07:00
Andy Adamson	c3d06f9ce8	nfsd41: hard page limit for DRC Use no more than 1/128th of the number of free pages at nfsd startup for the v4.1 DRC. This is an arbitrary default which should probably end up under the control of an administrator. Signed-off-by: Andy Adamson <andros@netapp.com> [moved added fields in struct svc_serv under CONFIG_NFSD_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [fix set_max_drc calculation of sv_drc_max_pages] [moved NFSD_DRC_SIZE_SHIFT's declaration up in header file] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:17 -07:00
Andy Adamson	074fe89753	nfsd41: DRC save, restore, and clear functions Cache all the result pages, including the rpc header in rq_respages[0], for a request in the slot table cache entry. Cache the statp pointer from nfsd_dispatch which points into rq_respages[0] just past the rpc header. When setting a cache entry, calculate and save the length of the nfs data minus the rpc header for rq_respages[0]. When replaying a cache entry, replace the cached rpc header with the replayed request rpc result header, unless there is not enough room in the cached results first page. In that case, use the cached rpc header. The session's fore channel maxresponse size cached is set to NFSD_PAGES_PER_SLOT * PAGE_SIZE. For compounds we are caching with operations such as READDIR that use the xdr_buf->pages to hold data, we choose to cache the extra page of data rather than copying data from xdr_buf->pages into the xdr_buf->head page. [nfsd41: limit cache to maxresponsesize_cached] [nfsd41: mv nfsd4_set_statp under CONFIG_NFSD_V4_1] [nfsd41: rename nfsd4_move_pages] [nfsd41: rename page_no variable] [nfsd41: rename nfsd4_set_cache_entry] [nfsd41: fix nfsd41_copy_replay_data comment] [nfsd41: add to nfsd4_set_cache_entry] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:17 -07:00
Benny Halevy	b85d4c01b7	nfsd41: sequence operation Implement the sequence operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Check for stale clientid (as derived from the sessionid). Enforce slotid range and exactly-once semantics using the slotid and seqid. If everything went well renew the client lease and mark the slot INPROGRESS. Add a struct nfsd4_slot pointer to struct nfsd4_compound_state. To be used for sessions DRC replay. [nfsd41: rename sequence catchthis to cachethis] Signed-off-by: Andy Adamson<andros@netapp.com> [pulled some code to set cstate->slot from "nfsd DRC logic"] [use sessionid_lock spin lock] [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd: add a struct nfsd4_slot pointer to struct nfsd4_compound_state] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: add nfsd4_session pointer to nfsd4_compound_state] [nfsd41: set cstate session] [nfsd41: use cstate session in nfsd4_sequence] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [simplify nfsd4_encode_sequence error handling] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:16 -07:00
Andy Adamson	a1bcecd29c	nfsd41: match clientid establishment method We need to distinguish between client names provided by NFSv4.0 clients SETCLIENTID and those provided by NFSv4.1 via EXCHANGE_ID when looking up the clientid by string. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Andy Adamson <andros@netapp.com> [nfsd41: use boolean values for use_exchange_id argument] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: simplify match_clientid_establishment logic] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:15 -07:00
Andy Adamson	0733d21338	nfsd41: exchange_id operation Implement the exchange_id operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-28 Based on the client provided name, hash a client id. If a confirmed one is found, compare the op's creds and verifier. If the creds match and the verifier is different then expire the old client (client re-incarnated), otherwise, if both match, assume it's a replay and ignore it. If an unconfirmed client is found, then copy the new creds and verifier if need update, otherwise assume replay. The client is moved to a confirmed state on create_session. In the nfs41 branch set the exchange_id flags to EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_SUPP_MOVED_REFER (pNFS is not supported, Referrals are supported, Migration is not.). Address various scenarios from section 18.35 of the spec: 1. Check for EXCHGID4_FLAG_UPD_CONFIRMED_REC_A and set EXCHGID4_FLAG_CONFIRMED_R as appropriate. 2. Return error codes per 18.35.4 scenarios. 3. Update client records or generate new client ids depending on scenario. Note: 18.35.4 case 3 probably still needs revisiting. The handling seems not quite right. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Andy Adamosn <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use utsname for major_id (and copy to server_scope)] [nfsd41: fix handling of various exchange id scenarios] Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: reverse use of EXCHGID4_INVAL_FLAG_MASK_A] [simplify nfsd4_encode_exchange_id error handling] [nfsd41: embed an xdr_netobj in nfsd4_exchange_id] [nfsd41: return nfserr_serverfault for spa_how == SP4_MACH_CRED] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:15 -07:00
Andy Adamson	069b6ad4bb	nfsd41: proc stubs Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:14 -07:00
Andy Adamson	2db134eb3b	nfsd41: xdr infrastructure Define nfsd41_dec_ops vector and add it to nfsd4_minorversion for minorversion 1. Note: nfsd4_enc_ops vector is shared for v4.0 and v4.1 since we don't need to filter out obsolete ops as this is done in the decoding phase. exchange_id, create_session, destroy_session, and sequence ops are implemented as stubs returning nfserr_opnotsupp at this stage. [was nfsd41: xdr stubs] [get rid of CONFIG_NFSD_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:14 -07:00
Marc Eshel	5282fd724b	nfsd41: sessionid hashing Simple sessionid hashing using its monotonically increasing sequence number. Locking considerations: sessionid_hashtbl access is controlled by the sessionid_lock spin lock. It must be taken for insert, delete, and lookup. nfsd4_sequence looks up the session id and if the session is found, it calls nfsd4_get_session (still under the sessionid_lock). nfsd4_destroy_session calls nfsd4_put_session after unhashing it, so when the session's kref reaches zero it's going to get freed. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [we don't use a prime for sessionid hash table size] [use sessionid_lock spin lock] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:14 -07:00
Marc Eshel	9fb870702d	nfsd41: introduce nfs4_client cl_sessions list [get rid of CONFIG_NFSD_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:13 -07:00
Andy Adamson	7116ed6b99	nfsd41: sessions basic data types This patch provides basic data structures representing the nfs41 sessions and slots, plus helpers for keeping a reference count on the session and freeing it. Note that our server only supports a headerpadsz of 0 and it ignores backchannel attributes at the moment. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: remove headerpadsz from channel attributes] [nfsd41: embed nfsd4_channel in nfsd4_session] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41 remove sl_session from nfsd4_slot] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:13 -07:00
Marc Eshel	10add806c3	nfsd41: define nfs41 error codes Define all error codes present in http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: clean up error code definitions] [nfsd41: change NFSERR_REPLAY_ME] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:12 -07:00
Benny Halevy	18df1884a8	nfs41: common protocol definitions Define all NFSv4.1 common operation and error code constants. Note that some of the definitions are used by both the nfs41 client and the server code. This patch is duplicated in the nfs41 and nfsd41 sessions patchset. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: add exchange id flags] Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [removed server-only hunk changing NFSERR_REPLAY_ME] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: add SEQ4_XX to nfs41-common-protocol] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: generic error code update] [nfs41: reverse EXCHGID4_INVAL_FLAG_MASK_{A,R}] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:12 -07:00
Andy Adamson	2f425878b6	nfsd: don't use the deferral service, return NFS4ERR_DELAY On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be returned. It is up to the NFSv4.1 client to resend only the operations that have not been processed. Initialize rq_usedeferral to 1 in svc_process(). It will be turned off in nfsd4_proc_compound() only when NFSv4.1 Sessions are used. Note: this isn't an adequate solution on its own. It's acceptable as a way to get some minimal 4.1 up and working, but we're going to have to find a way to avoid returning DELAY in all common cases before 4.1 can really be considered ready. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: reverse rq_nodeferral negative logic] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [sunrpc: initialize rq_usedeferral] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2009-04-03 17:41:12 -07:00
Linus Torvalds	b1dbb67911	Merge branch 'ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: s390: remove arch specific smp_send_stop() panic: clean up kernel/panic.c panic, smp: provide smp_send_stop() wrapper on UP too panic: decrease oops_in_progress only after having done the panic generic-ipi: eliminate WARN_ON()s during oops/panic generic-ipi: cleanups generic-ipi: remove CSD_FLAG_WAIT generic-ipi: remove kmalloc() generic IPI: simplify barriers and locking	2009-04-03 17:33:30 -07:00
Linus Torvalds	492f59f526	Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit] lockdep: remove duplicate CONFIG_DEBUG_LOCKDEP definitions lockdep: require framepointers for x86 lockdep: remove extra "irq" string lockdep: fix incorrect state name	2009-04-03 17:29:53 -07:00
Suresh Siddha	7237d3de78	x86, ACPI: add support for x2apic ACPI extensions All logical processors with APIC ID values of 255 and greater will have their APIC reported through Processor X2APIC structure (type-9 entry type) and all logical processors with APIC ID less than 255 will have their APIC reported through legacy Processor Local APIC (type-0 entry type) only. This is the same case even for NMI structure reporting. The Processor X2APIC Affinity structure provides the association between the X2APIC ID of a logical processor and the proximity domain to which the logical processor belongs. For OSPM, Processor IDs outside the 0-254 range are to be declared as Device() objects in the ACPI namespace. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2009-04-03 20:08:12 -04:00
Linus Torvalds	5fba0925fd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: remove compat stuff HID: constify arrays of struct apple_key_translation HID: add support for Kye/Genius Ergo 525V HID: Support Apple mini aluminum keyboard HID: support for Kensington slimblade device HID: DragonRise game controller force feedback driver HID: add support for another version of 0e8f:0003 device in hid-pl HID: fix race between usb_register_dev() and hiddev_open() HID: bring back possibility to specify vid/pid ignore on module load HID: make HID_DEBUG defaults consistent HID: autosuspend -- fix lockup of hid on reset HID: hid_reset_resume() needs to be defined only when CONFIG_PM is set HID: fix USB HID devices after STD with autosuspend HID: do not try to compile PM code with CONFIG_PM unset HID: autosuspend support for USB HID	2009-04-03 15:25:44 -07:00
Linus Torvalds	811158b147	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits) trivial: Update my email address trivial: NULL noise: drivers/mtd/tests/mtd_*test.c trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h trivial: Fix misspelling of "Celsius". trivial: remove unused variable 'path' in alloc_file() trivial: fix a pdlfush -> pdflush typo in comment trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL trivial: wusb: Storage class should be before const qualifier trivial: drivers/char/bsr.c: Storage class should be before const qualifier trivial: h8300: Storage class should be before const qualifier trivial: fix where cgroup documentation is not correctly referred to trivial: Give the right path in Documentation example trivial: MTD: remove EOL from MODULE_DESCRIPTION trivial: Fix typo in bio_split()'s documentation trivial: PWM: fix of #endif comment trivial: fix typos/grammar errors in Kconfig texts trivial: Fix misspelling of firmware trivial: cgroups: documentation typo and spelling corrections trivial: Update contact info for Jochen Hein trivial: fix typo "resgister" -> "register" ...	2009-04-03 15:24:35 -07:00
Evgeniy Polyakov	ce0d9d7255	Staging: dst: core files. This patch contains DST core files, which introduce block layer, connector and sysfs registration glue and main headers. Connector is used for the configuration of the node (its type, address, device name and so on). Sysfs provides bits of information about running devices in the following format: +/* + * DST sysfs tree for device called 'storage': + * + * /sys/bus/dst/devices/storage/ + * /sys/bus/dst/devices/storage/type : 192.168.4.80:1025 + * /sys/bus/dst/devices/storage/size : 800 + * /sys/bus/dst/devices/storage/name : storage + */ DST header contains structure definitions and protocol command description. Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-04-03 14:53:32 -07:00
Han, Weidong	161fde083f	intel-iommu: set compatibility format interrupt When extended interrupt mode (x2apic mode) is not supported in a system, it must set compatibility format interrupt to bypass interrupt remapping, otherwise compatibility format interrupts will be blocked. This will be used when interrupt remapping is enabled while x2apic is not supported. Signed-off-by: Weidong Han <weidong.han@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-04-03 21:46:01 +01:00
Fenghua Yu	b24696bc55	Intel IOMMU Suspend/Resume Support - Interrupt Remapping This patch enables suspend/resume for interrupt remapping. During suspend, interrupt remapping is disabled. On resume, interrupt remapping is enabled again. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-04-03 21:45:59 +01:00
Fenghua Yu	f59c7b69bc	Intel IOMMU Suspend/Resume Support - DMAR This patch implements the suspend and resume feature for Intel IOMMU DMAR. It hooks to kernel suspend and resume interface. When suspend happens, it saves necessary hardware registers. When resume happens, it restores the registers and restarts IOMMU by enabling translation, setting up root entry, and re-enabling queued invalidation. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2009-04-03 21:45:54 +01:00
David Woodhouse	8f912ba4d7	intel-iommu: Add for_each_iommu() and for_each_active_iommu() macros Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 21:45:46 +01:00
Linus Torvalds	133e2a3164	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: dma: Add SoF and EoF debugging to ipu_idmac.c, minor cleanup dw_dmac: add cyclic API to DW DMA driver dmaengine: Add privatecnt to revert DMA_PRIVATE property dmatest: add dma interrupts and callbacks dmatest: add xor test dmaengine: allow dma support for async_tx to be toggled async_tx: provide __async_inline for HAS_DMA=n archs dmaengine: kill some unused headers dmaengine: initialize tx_list in dma_async_tx_descriptor_init dma: i.MX31 IPU DMA robustness improvements dma: improve section assignment in i.MX31 IPU DMA driver dma: ipu_idmac driver cosmetic clean-up dmaengine: fail device registration if channel registration fails	2009-04-03 12:13:45 -07:00
Linus Torvalds	20bec8ab14	Merge branch 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext3: Add replace-on-rename hueristics for data=writeback mode ext3: Add replace-on-truncate hueristics for data=writeback mode ext3: Use WRITE_SYNC for commits which are caused by fsync() block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks	2009-04-03 11:10:33 -07:00
Linus Torvalds	18b34b9546	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: (32 commits) regulator: twl4030 VAUX3 supports 3.0V regulator: Support disabling of unused regulators by machines regulator: Don't increment use_count for boot_on regulators twl4030-regulator: expose VPLL2 regulator: refcount fixes regulator: Don't warn if we failed to get a regulator regulator: Allow boot_on regulators to be disabled by clients regulator: Implement list_voltage for WM835x LDOs and DCDCs twl4030-regulator: list more VAUX4 voltages regulator: Don't warn on omitted voltage constraints regulator: Implement list_voltage() for WM8400 DCDCs and LDOs MMC: regulator utilities regulator: twl4030 voltage enumeration (v2) regulator: twl4030 regulators regulator: get_status() grows kerneldoc regulator: enumerate voltages (v2) regulator: Fix get_mode() for WM835x DCDCs regulator: Allow regulators to set the initial operating mode regulator: Suggest use of datasheet supply or pin names for consumers regulator: email - update email address and regulator webpage. ...	2009-04-03 10:39:20 -07:00
Linus Torvalds	ca1ee219c0	Merge git://git.infradead.org/iommu-2.6 * git://git.infradead.org/iommu-2.6: intel-iommu: Fix address wrap on 32-bit kernel. intel-iommu: Enable DMAR on 32-bit kernel. intel-iommu: fix PCI device detach from virtual machine intel-iommu: VT-d page table to support snooping control bit iommu: Add domain_has_cap iommu_ops intel-iommu: Snooping control support Fixed trivial conflicts in arch/x86/Kconfig and drivers/pci/intel-iommu.c	2009-04-03 10:36:57 -07:00
Yinghai Lu	9756b15e1b	irq: fix cpumask memory leak on offstack cpumask kernels Need to free the old cpumask for affinity and pending_mask. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <49D18FF0.50707@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 19:14:44 +02:00
Linus Torvalds	3cc50ac0db	Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache * git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (41 commits) NFS: Add mount options to enable local caching on NFS NFS: Display local caching state NFS: Store pages from an NFS inode into a local cache NFS: Read pages from FS-Cache into an NFS inode NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching NFS: Add read context retention for FS-Cache to call back with NFS: FS-Cache page management NFS: Add some new I/O counters for FS-Cache doing things for NFS NFS: Invalidate FsCache page flags when cache removed NFS: Use local disk inode cache NFS: Define and create inode-level cache objects NFS: Define and create superblock-level objects NFS: Define and create server-level objects NFS: Register NFS for caching and retrieve the top-level index NFS: Permit local filesystem caching to be enabled for NFS NFS: Add FS-Cache option bit and debug bit NFS: Add comment banners to some NFS functions FS-Cache: Make kAFS use FS-Cache CacheFiles: A cache that backs onto a mounted filesystem CacheFiles: Export things for CacheFiles ...	2009-04-03 10:07:43 -07:00
Linus Torvalds	d9b9be024a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (36 commits) dm: set queue ordered mode dm: move wait queue declaration dm: merge pushback and deferred bio lists dm: allow uninterruptible wait for pending io dm: merge __flush_deferred_io into caller dm: move bio_io_error into __split_and_process_bio dm: rename __split_bio dm: remove unnecessary struct dm_wq_req dm: remove unnecessary work queue context field dm: remove unnecessary work queue type field dm: bio list add bio_list_add_head dm snapshot: persistent fix dtr cleanup dm snapshot: move status to exception store dm snapshot: move ctr parsing to exception store dm snapshot: use DMEMIT macro for status dm snapshot: remove dm_snap header dm snapshot: remove dm_snap header use dm exception store: move cow pointer dm exception store: move chunk_fields dm exception store: move dm_target pointer ...	2009-04-03 10:02:45 -07:00
Kumar Gala	3688e07f83	Fix highmem PPC build failure Commit `f4112de6b6` ("mm: introduce debug_kmap_atomic") broke PPC builds with CONFIG_HIGHMEM=y: CC init/main.o In file included from include/linux/highmem.h:25, from include/linux/pagemap.h:11, from include/linux/mempolicy.h:63, from init/main.c:53: arch/powerpc/include/asm/highmem.h: In function 'kmap_atomic_prot': arch/powerpc/include/asm/highmem.h:98: error: implicit declaration of function 'debug_kmap_atomic' In file included from include/linux/pagemap.h:11, from include/linux/mempolicy.h:63, from init/main.c:53: include/linux/highmem.h: At top level: include/linux/highmem.h:196: warning: conflicting types for 'debug_kmap_atomic' include/linux/highmem.h:196: error: static declaration of 'debug_kmap_atomic' follows non-static declaration include/asm/highmem.h:98: error: previous implicit declaration of 'debug_kmap_atomic' was here make[1]: *** [init/main.o] Error 1 make: *** [init] Error 2 Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-03 09:48:29 -07:00
Linus Torvalds	c54c4dec61	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: ixp4xx - Fix handling of chained sg buffers crypto: shash - Fix unaligned calculation with short length hwrng: timeriomem - Use phys address rather than virt	2009-04-03 09:45:53 -07:00
Linus Torvalds	223cdea4c4	Merge branch 'for-linus' of git://neil.brown.name/md * 'for-linus' of git://neil.brown.name/md: (53 commits) md/raid5 revise rules for when to update metadata during reshape md/raid5: minor code cleanups in make_request. md: remove CONFIG_MD_RAID_RESHAPE config option. md/raid5: be more careful about write ordering when reshaping. md: don't display meaningless values in sysfs files resync_start and sync_speed md/raid5: allow layout and chunksize to be changed on active array. md/raid5: reshape using largest of old and new chunk size md/raid5: prepare for allowing reshape to change layout md/raid5: prepare for allowing reshape to change chunksize. md/raid5: clearly differentiate 'before' and 'after' stripes during reshape. Documentation/md.txt update md: allow number of drives in raid5 to be reduced md/raid5: change reshape-progress measurement to cope with reshaping backwards. md: add explicit method to signal the end of a reshape. md/raid5: enhance raid5_size to work correctly with negative delta_disks md/raid5: drop qd_idx from r6_state md/raid6: move raid6 data processing to raid6_pq.ko md: raid5 run(): Fix max_degraded for raid level 4. md: 'array_size' sysfs attribute md: centralize ->array_sectors modifications ...	2009-04-03 09:08:19 -07:00
Linus Torvalds	ea02259fdf	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/linux-hdreg-h-cleanup * git://git.kernel.org/pub/scm/linux/kernel/git/bart/linux-hdreg-h-cleanup: remove <linux/ata.h> include from <linux/hdreg.h> include/linux/hdreg.h: remove unused defines isd200: use ATA_* defines instead of _STAT and _ERR ones include/linux/hdreg.h: cover WIN_* and friends with #ifndef/#endif __KERNEL__ aoe: WIN_* -> ATA_CMD_* isd200: WIN_* -> ATA_CMD_* include/linux/hdreg.h: cover struct hd_driveid with #ifndef/#endif __KERNEL__ xsysace: make it 'struct hd_driveid'-free ubd_kern: make it 'struct hd_driveid'-free isd200: make it 'struct hd_driveid'-free	2009-04-03 09:02:32 -07:00
David Howells	f42b293d6d	NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching nfs_readpage_async() needs to be non-static so that it can be used as a fallback for the local on-disk caching should an EIO crop up when reading the cache. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:44 +01:00
David Howells	6a51091d07	NFS: Add some new I/O counters for FS-Cache doing things for NFS Add some new NFS I/O counters for FS-Cache doing things for NFS. A new line is emitted into /proc/pid/mountstats if caching is enabled that looks like: fsc: <rok> <rfl> <wok> <wfl> <unc> Where <rok> is the number of pages read successfully from the cache, <rfl> is the number of failed page reads against the cache, <wok> is the number of successful page writes to the cache, <wfl> is the number of failed page writes to the cache, and <unc> is the number of NFS pages that have been disconnected from the cache. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:43 +01:00
David Howells	ef79c097bb	NFS: Use local disk inode cache Bind data storage objects in the local cache to NFS inodes. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:43 +01:00
David Howells	08734048b3	NFS: Define and create superblock-level objects Define and create superblock-level cache index objects (as managed by nfs_server structs). Each superblock object is created in a server level index object and is itself an index into which inode-level objects are inserted. Ideally there would be one superblock-level object per server, and the former would be folded into the latter; however, since the "nosharecache" option exists this isn't possible. The superblock object key is a sequence consisting of: (1) Certain superblock s_flags. (2) Various connection parameters that serve to distinguish superblocks for sget(). (3) The volume FSID. (4) The security flavour. (5) The uniquifier length. (6) The uniquifier text. This is normally an empty string, unless the fsc=xyz mount option was used to explicitly specify a uniquifier. The key blob is of variable length, depending on the length of (6). The superblock object is given no coherency data to carry in the auxiliary data permitted by the cache. It is assumed that the superblock is always coherent. This patch also adds uniquification handling such that two otherwise identical superblocks, at least one of which is marked "nosharecache", won't end up trying to share the on-disk cache. It will be possible to manually provide a uniquifier through a mount option with a later patch to avoid the error otherwise produced. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:42 +01:00
David Howells	147272813e	NFS: Define and create server-level objects Define and create server-level cache index objects (as managed by nfs_client structs). Each server object is created in the NFS top-level index object and is itself an index into which superblock-level objects are inserted. Ideally there would be one superblock-level object per server, and the former would be folded into the latter; however, since the "nosharecache" option exists this isn't possible. The server object key is a sequence consisting of: (1) NFS version (2) Server address family (eg: AF_INET or AF_INET6) (3) Server port. (4) Server IP address. The key blob is of variable length, depending on the length of (4). The server object is given no coherency data to carry in the auxiliary data permitted by the cache. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:42 +01:00
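The following userspace C sketch shows how the server key can vary in length. The `server_key` layout, the field widths and `make_server_key()` are assumptions made for illustration. They are not the kernel's actual key format. The sketch only shows that the meaningful key length depends on item (4): 4 bytes of address for AF_INET versus 16 for AF_INET6.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical key layout following items (1)-(4) above. */
struct server_key {
	uint16_t nfsversion; /* (1) NFS version */
	uint16_t family;     /* (2) AF_INET or AF_INET6 */
	uint16_t port;       /* (3) server port, network byte order */
	uint8_t  addr[16];   /* (4) 4 bytes used for IPv4, 16 for IPv6 */
};

/*
 * Fill in *key from a server socket address and return the number of
 * meaningful key bytes, which depends on the address length.
 * Returns 0 for an unsupported address family.
 */
size_t make_server_key(struct server_key *key, uint16_t version,
		       const struct sockaddr *sa)
{
	size_t alen;

	memset(key, 0, sizeof(*key));
	key->nfsversion = version;
	key->family = sa->sa_family;

	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		key->port = sin->sin_port;
		alen = sizeof(sin->sin_addr);
		memcpy(key->addr, &sin->sin_addr, alen);
		break;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

		key->port = sin6->sin6_port;
		alen = sizeof(sin6->sin6_addr);
		memcpy(key->addr, &sin6->sin6_addr, alen);
		break;
	}
	default:
		return 0;
	}
	return offsetof(struct server_key, addr) + alen;
}
```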
David Howells	c6a6f19e22	NFS: Add FS-Cache option bit and debug bit Add FS-Cache option bit to nfs_server struct. This is set to indicate local on-disk caching is enabled for a particular superblock. Also add debug bit for local caching operations. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:42 +01:00
David Howells	385e1ca5f2	CacheFiles: Permit the page lock state to be monitored Add a function to install a monitor on the page lock waitqueue for a particular page, thus allowing the page being unlocked to be detected. This is used by CacheFiles to detect read completion on a page in the backing filesystem so that it can then copy the data to the waiting netfs page. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:39 +01:00
David Howells	b510882281	FS-Cache: Implement data I/O part of netfs API Implement the data I/O part of the FS-Cache netfs API. The documentation and API header file were added in a previous patch. This patch implements the following functions for the netfs to call: (*) fscache_attr_changed(). Indicate that the object has changed its attributes. The only attribute currently recorded is the file size. Only pages within the set file size will be stored in the cache. This operation is submitted for asynchronous processing, and will return immediately. It will return -ENOMEM if an out of memory error is encountered, -ENOBUFS if the object is not actually cached, or 0 if the operation is successfully queued. (*) fscache_read_or_alloc_page(). (*) fscache_read_or_alloc_pages(). Request data be fetched from the disk, and allocate internal metadata to track the netfs pages and reserve disk space for unknown pages. These operations perform semi-asynchronous data reads. Upon returning they will indicate which pages they think can be retrieved from disk, and will have set in progress attempts to retrieve those pages. These will return, in order of preference, -ENOMEM on memory allocation error, -ERESTARTSYS if a signal interrupted proceedings, -ENODATA if one or more requested pages are not yet cached, -ENOBUFS if the object is not actually cached or if there isn't space for future pages to be cached on this object, or 0 if successful. In the case of the multipage function, the pages for which reads are set in progress will be removed from the list and the page count decreased appropriately. If any read operations should fail, the completion function will be given an error, and will also be passed contextual information to allow the netfs to fall back to querying the server for the absent pages. For each successful read, the page completion function will also be called. Any pages subsequently tracked by the cache will have PG_fscache set upon them on return. 
fscache_uncache_page() must be called for such pages. If supplied by the netfs, the mark_pages_cached() cookie op will be invoked for any pages now tracked. (*) fscache_alloc_page(). Allocate internal metadata to track a netfs page and reserve disk space. This will return -ENOMEM on memory allocation error, -ERESTARTSYS on signal, -ENOBUFS if the object isn't cached, or there isn't enough space in the cache, or 0 if successful. Any pages subsequently tracked by the cache will have PG_fscache set upon them on return. fscache_uncache_page() must be called for such pages. If supplied by the netfs, the mark_pages_cached() cookie op will be invoked for any pages now tracked. (*) fscache_write_page(). Request data be stored to disk. This may only be called on pages that have been read or alloc'd by the above three functions and have not yet been uncached. This will return -ENOMEM on memory allocation error, -ERESTARTSYS on signal, -ENOBUFS if the object isn't cached, or there isn't immediately enough space in the cache, or 0 if successful. On a successful return, this operation will have queued the page for asynchronous writing to the cache. The page will be returned with PG_fscache_write set until the write completes one way or another. The caller will not be notified if the write fails due to an I/O error. If that happens, the object will become unavailable and all pending writes will be aborted. Note that the cache may batch up page writes, and so it may take a while to get around to writing them out. The caller must assume that until PG_fscache_write is cleared the page is in use by the cache. Any changes made to the page may be reflected on disk. The page may even be under DMA. (*) fscache_uncache_page(). Indicate that the cache should stop tracking a page previously read or alloc'd from the cache. If the page was alloc'd only, but unwritten, it will not appear on disk. 
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:39 +01:00
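The read-path return codes listed above imply a dispatch step in the netfs. The C sketch below shows one plausible mapping. It assumes that a page reported with -ENODATA has had space reserved and can be written to the cache after it is fetched from the server, and that -ENOBUFS means the page should simply be read from the server. The enum and function names are hypothetical. ERESTARTSYS is defined locally because it is a kernel-internal error code that userspace headers do not provide.

```c
#include <errno.h>

/* -ERESTARTSYS is kernel-internal; define it so the sketch builds in userspace. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical actions a netfs readpage path might choose between. */
enum read_action {
	WAIT_FOR_CACHE,    /* 0: read in progress, completion callback will run */
	READ_SERVER_CACHE, /* -ENODATA: fetch from server, then store in cache */
	READ_SERVER_ONLY,  /* -ENOBUFS: object not cached or no space */
	FAIL_READ,         /* -ENOMEM, -ERESTARTSYS, anything else: propagate */
};

enum read_action classify_read_result(int ret)
{
	switch (ret) {
	case 0:
		return WAIT_FOR_CACHE;
	case -ENODATA:
		return READ_SERVER_CACHE;
	case -ENOBUFS:
		return READ_SERVER_ONLY;
	default:
		return FAIL_READ;
	}
}
```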
David Howells	ccc4fc3d11	FS-Cache: Implement the cookie management part of the netfs API Implement the cookie management part of the FS-Cache netfs client API. The documentation and API header file were added in a previous patch. This patch implements the following three functions: (1) fscache_acquire_cookie(). Acquire a cookie to represent an object to the netfs. If the object in question is a non-index object, then that object and its parent indices will be created on disk at this point if they don't already exist. Index creation is deferred because an index may reside in multiple caches. (2) fscache_relinquish_cookie(). Retire or release a cookie previously acquired. At this point, the object on disk may be destroyed. (3) fscache_update_cookie(). Update the in-cache representation of a cookie. This is used to update the auxiliary data for coherency management purposes. With this patch it is possible to have a netfs instruct a cache backend to look up, validate and create metadata on disk and to destroy it again. The ability to actually store and retrieve data in the objects so created is added in later patches. Note that these functions will never return an error. _All_ errors are handled internally to FS-Cache. The worst that can happen is that fscache_acquire_cookie() may return a NULL pointer - which is considered a negative cookie pointer and can be passed back to any function that takes a cookie without harm. A negative cookie pointer merely suppresses caching at that level. The stub in linux/fscache.h will detect the negative cookie pointer inline and abort the operation as fast as possible. This means that the compiler doesn't have to set up for a call in that case. See the documentation in Documentation/filesystems/caching/netfs-api.txt for more information. 
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:38 +01:00
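A small C sketch shows the negative-cookie convention. An inline wrapper tests for NULL and calls the out-of-line function only for a real cookie. The `demo_` names and the cookie struct are hypothetical stand-ins, not the definitions from linux/fscache.h.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cookie; not the real struct fscache_cookie. */
struct demo_cookie {
	int nr_updates;
};

/* Out-of-line slow path; only ever reached with a real cookie. */
void demo_update_cookie_slow(struct demo_cookie *cookie)
{
	cookie->nr_updates++;
}

/*
 * Inline wrapper: a NULL ("negative") cookie means nothing is cached at
 * this level, so the call is skipped after a single pointer test and no
 * function call has to be set up.
 */
static inline void demo_update_cookie(struct demo_cookie *cookie)
{
	if (cookie)
		demo_update_cookie_slow(cookie);
}
```

Because the NULL test sits in the inline wrapper, callers never need to check the cookie themselves. This matches the "pass it back without harm" rule described above.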
David Howells	726dd7ff10	FS-Cache: Add netfs registration Add functions to register and unregister a network filesystem or other client of the FS-Cache service. This allocates and releases the cookie representing the top-level index for a netfs, and makes it available to the netfs. If the FS-Cache facility is disabled, then the calls are optimised away at compile time. Note that whilst this patch may appear to work with FS-Cache enabled and a netfs attempting to use it, it will leak the cookie it allocates for the netfs as fscache_relinquish_cookie() is implemented in a later patch. This will cause the slab code to emit a warning when the module is removed. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:38 +01:00
David Howells	0e04d4cefc	FS-Cache: Add cache tag handling Implement two features of FS-Cache: (1) The ability to request and release cache tags - names by which a cache may be known to a netfs, and thus selected for use. (2) An internal function by which a cache is selected by consulting the netfs, if the netfs wishes to be consulted. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:37 +01:00
David Howells	7394daa8c6	FS-Cache: Add use of /proc and presentation of statistics Make FS-Cache create its /proc interface and present various statistical information through it. Also provide the functions for updating this information. These features are enabled by: CONFIG_FSCACHE_PROC CONFIG_FSCACHE_STATS CONFIG_FSCACHE_HISTOGRAM The /proc directory for FS-Cache is also exported so that caching modules can add their own statistics there too. The FS-Cache module is loadable at this point, and the statistics files can be examined by userspace: cat /proc/fs/fscache/stats cat /proc/fs/fscache/histogram Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:37 +01:00
David Howells	0dfc41d1ef	FS-Cache: Add the FS-Cache cache backend API and documentation Add the API for a generic facility (FS-Cache) by which caches may declare themselves open for business, and may obtain work to be done from network filesystems. The header file is included by: #include <linux/fscache-cache.h> Documentation for the API is also added to: Documentation/filesystems/caching/backend-api.txt This API is not usable without the implementation of the utility functions which will be added in further patches. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:36 +01:00
David Howells	2d6fff6370	FS-Cache: Add the FS-Cache netfs API and documentation Add the API for a generic facility (FS-Cache) by which filesystems (such as AFS or NFS) may call on local caching capabilities without having to know anything about how the cache works, or even if there is a cache:
+---------+
|         |                        +--------------+
|   NFS   |--+                     |              |
|         |  |                 +-->|   CacheFS    |
+---------+  |   +----------+  |   |  /dev/hda5   |
             |   |          |  |   +--------------+
+---------+  +-->|          |  |
|         |      |          |--+
|   AFS   |----->| FS-Cache |
|         |      |          |--+
+---------+  +-->|          |  |
             |   |          |  |   +--------------+
+---------+  |   +----------+  |   |              |
|         |  |                 +-->|  CacheFiles  |
|  ISOFS  |--+                     |  /var/cache  |
|         |                        +--------------+
+---------+
General documentation and documentation of the netfs specific API are provided in addition to the header files. As this patch stands, it is possible to build a filesystem against the facility and attempt to use it. All that will happen is that all requests will be immediately denied as if no cache is present. Further patches will implement the core of the facility. The facility will transfer requests from networking filesystems to appropriate caches if possible, or else gracefully deny them. If this facility is disabled in the kernel configuration, then all its operations will trivially reduce to nothing during compilation. WHY NOT I_MAPPING? ================== I have added my own API to implement caching rather than using i_mapping to do this for a number of reasons. These have been discussed a lot on the LKML and CacheFS mailing lists, but to summarise the basics: (1) Most filesystems don't do hole reportage. Holes in files are treated as blocks of zeros and can't be distinguished otherwise, making it difficult to distinguish blocks that have been read from the network and cached from those that haven't. 
(2) The backing inode must be fully populated before being exposed to userspace through the main inode because the VM/VFS goes directly to the backing inode and does not interrogate the front inode's VM ops. Therefore: (a) The backing inode must fit entirely within the cache. (b) All backed files currently open must fit entirely within the cache at the same time. (c) A working set of files in total larger than the cache may not be cached. (d) A file may not grow larger than the available space in the cache. (e) A file that's open and cached, and remotely grows larger than the cache is potentially stuffed. (3) Writes go to the backing filesystem, and can only be transferred to the network when the file is closed. (4) There's no record of what changes have been made, so the whole file must be written back. (5) The pages belong to the backing filesystem, and all metadata associated with that page are relevant only to the backing filesystem, and not anything stacked atop it. OVERVIEW ======== FS-Cache provides (or will provide) the following facilities: (1) Caches can be added / removed at any time, even whilst in use. (2) Adds a facility by which tags can be used to refer to caches, even if they're not available yet. (3) More than one cache can be used at once. Caches can be selected explicitly by use of tags. (4) The netfs is provided with an interface that allows either party to withdraw caching facilities from a file (required for (1)). (5) A netfs may annotate cache objects that belong to it. This permits the storage of coherency maintenance data. (6) Cache objects will be pinnable and space reservations will be possible. (7) The interface to the netfs returns as few errors as possible, preferring rather to let the netfs remain oblivious. (8) Cookies are used to represent indices, files and other objects to the netfs. The simplest cookie is just a NULL pointer - indicating nothing cached there. 
(9) The netfs is allowed to propose - dynamically - any index hierarchy it desires, though it must be aware that the index search function is recursive, stack space is limited, and indices can only be children of indices. (10) Indices can be used to group files together to reduce key size and to make group invalidation easier. The use of indices may make lookup quicker, but that's cache dependent. (11) Data I/O is effectively done directly to and from the netfs's pages. The netfs indicates that page A is at index B of the data-file represented by cookie C, and that it should be read or written. The cache backend may or may not start I/O on that page, but if it does, a netfs callback will be invoked to indicate completion. The I/O may be either synchronous or asynchronous. (12) Cookies can be "retired" upon release. At this point FS-Cache will mark them as obsolete and the index hierarchy rooted at that point will get recycled. (13) The netfs provides a "match" function for index searches. In addition to saying whether a match was made or not, this can also specify that an entry should be updated or deleted. FS-Cache maintains a virtual index tree in which all indices, files, objects and pages are kept. Bits of this tree may actually reside in one or more caches.
                      FSDEF
                        |
     +-------------------------------------+
     |                                     |
    NFS                                   AFS
     |                                     |
     +----------------------+              +------------+
     |                      |              |            |
  homedir                mirror         afs.org    redhat.com
     |                      |                           |
     +---------+            +-----------------+         +---------+
     |         |            |                 |         |         |
   00001     00002        00007             00125   vol00001  vol00002
     |         |            |                 |                   |
 +---+---+     +-----+      +---+      +------+------+       +----+----+
 |   |   |     |     |      |   |      |      |      |       |    |    |
PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT   R/W  R/O  Bak
                     |                                       |
                    PG0                                      +-------+
                                                             |       |
                                                           00001   00003
                                                             |
                                                         +---+---+
                                                         |   |   |
                                                        PG0 PG1 PG2
In the example above, two netfs's can be seen to be backed: NFS and AFS. These have different index hierarchies: (*) The NFS primary index will probably contain per-server indices. 
Each server index is indexed by NFS file handles to get data file objects. Each data file object can have an array of pages, but may also have further child objects, such as extended attributes and directory entries. Extended attribute objects themselves have page-array contents. (*) The AFS primary index contains per-cell indices. Each cell index contains per-logical-volume indices. Each volume index contains up to three indices for the read-write, read-only and backup mirrors of those volumes. Each of these contains vnode data file objects, each of which contains an array of pages. The very top index is the FS-Cache master index in which individual netfs's have entries. Any index object may reside in more than one cache, provided it only has index children. Any index with non-index object children will be assumed to only reside in one cache. The FS-Cache overview can be found in: Documentation/filesystems/caching/fscache.txt The netfs API to FS-Cache can be found in: Documentation/filesystems/caching/netfs-api.txt Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:36 +01:00
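The overview states two structural rules. First, indices can only be children of indices. Second, only an index whose children are all indices may span several caches. Both can be expressed as a small C sketch. The types and functions are hypothetical and model only these two rules, not FS-Cache's object management.

```c
#include <stdbool.h>
#include <stddef.h>

enum obj_type { OBJ_INDEX, OBJ_DATAFILE, OBJ_OTHER };

/* Hypothetical node in the virtual index tree. */
struct tree_obj {
	enum obj_type type;
	struct tree_obj *parent;
	int nr_nonindex_children;
};

/*
 * Attach child to parent, enforcing "indices can only be children of
 * indices". Returns false if the attachment is not allowed.
 */
bool attach_obj(struct tree_obj *parent, struct tree_obj *child)
{
	if (child->type == OBJ_INDEX && parent->type != OBJ_INDEX)
		return false;
	child->parent = parent;
	if (child->type != OBJ_INDEX)
		parent->nr_nonindex_children++;
	return true;
}

/* An index may reside in more than one cache only if all its children are indices. */
bool may_span_caches(const struct tree_obj *obj)
{
	return obj->type == OBJ_INDEX && obj->nr_nonindex_children == 0;
}
```

Non-index objects may still have non-index children, such as an extended attribute object under a data file. That is why the check only rejects an index placed under a non-index parent.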
David Howells	266cf658ef	FS-Cache: Recruit a page flags for cache management Recruit a page flag to aid in cache management. The following extra flag is defined: (1) PG_fscache (PG_private_2) The marked page is backed by a local cache and is pinning resources in the cache driver. If PG_fscache is set, then things that checked for PG_private will now also check for that. This includes things like truncation and page invalidation. The function page_has_private() has been added to make the checks for both PG_private and PG_private_2 at the same time. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:36 +01:00
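Checking both bits at once, as page_has_private() is described as doing, amounts to a single mask test. In the userspace sketch below, the bit numbers, the `demo_page` struct and the function name are assumptions. The kernel's real flag values and page structure differ.

```c
#include <stdbool.h>

/* Hypothetical bit numbers; the kernel's real page flag values differ. */
enum { DEMO_PG_private = 0, DEMO_PG_private_2 = 1 };
#define DEMO_PG_fscache DEMO_PG_private_2 /* PG_fscache aliases PG_private_2 */

struct demo_page {
	unsigned long flags;
};

/*
 * True if either private bit is set, so truncation and invalidation code
 * that used to test only PG_private also notices PG_fscache.
 */
static inline bool demo_page_has_private(const struct demo_page *page)
{
	const unsigned long mask = (1UL << DEMO_PG_private) |
				   (1UL << DEMO_PG_private_2);

	return (page->flags & mask) != 0;
}
```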
David Howells	03fb3d2af9	FS-Cache: Release page->private after failed readahead The attached patch causes read_cache_pages() to release page-private data on a page for which add_to_page_cache() fails. If the filler function fails, then the problematic page is left attached to the pagecache (with appropriate flags set, one presumes) and the remaining to-be-attached pages are invalidated and discarded. This permits pages with caching references associated with them to be cleaned up. The invalidatepage() address space op is called (indirectly) to do the honours. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:35 +01:00
David Howells	8f0aa2f25b	Document the slow work thread pool Document the slow work thread pool. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:35 +01:00
David Howells	12e22c5e4b	Make the slow work pool configurable Make the slow work pool configurable through /proc/sys/kernel/slow-work. (*) /proc/sys/kernel/slow-work/min-threads The minimum number of threads that should be in the pool as long as it is in use. This may be anywhere between 2 and max-threads. (*) /proc/sys/kernel/slow-work/max-threads The maximum number of threads that should be in the pool. This may be anywhere between min-threads and 255 or NR_CPUS * 2, whichever is greater. (*) /proc/sys/kernel/slow-work/vslow-percentage The percentage of active threads in the pool that may be used to execute very slow work items. This may be between 1 and 99. The resultant number is bounded to between 1 and one fewer than the number of active threads. This ensures there is always at least one thread that can process very slow work items, and always at least one thread that won't. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:35 +01:00
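The bounds described for max-threads and vslow-percentage translate into simple arithmetic. The C sketch below assumes the percentage is applied to the number of active threads with integer truncation before clamping. The function names are hypothetical, and `nr_cpus` stands in for NR_CPUS.

```c
/*
 * Number of active threads that may run very slow work items:
 * nr_threads * pct / 100 (rounded down, an assumption), clamped to
 * [1, nr_threads - 1]. Assumes nr_threads >= 2, the documented lower
 * limit of min-threads.
 */
unsigned int vslow_thread_limit(unsigned int nr_threads, unsigned int pct)
{
	unsigned int n = nr_threads * pct / 100;

	if (n < 1)
		n = 1;
	if (n > nr_threads - 1)
		n = nr_threads - 1;
	return n;
}

/* Upper limit for max-threads: 255 or NR_CPUS * 2, whichever is greater. */
unsigned int max_threads_ceiling(unsigned int nr_cpus)
{
	return nr_cpus * 2 > 255 ? nr_cpus * 2 : 255;
}
```

For example, with 10 active threads and vslow-percentage set to 99, at most 9 threads take very slow items, leaving one free for ordinary work.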
David Howells	07fe7cb7c7	Create a dynamically sized pool of threads for doing very slow work items Create a dynamically sized pool of threads for doing very slow work items, such as invoking mkdir() or rmdir() - things that may take a long time and may sleep, holding mutexes/semaphores and hogging a thread, and are thus unsuitable for workqueues. The number of threads is always at least a settable minimum, but more are started when there's more work to do, up to a limit. Because of the nature of the load, it's not suitable for a 1-thread-per-CPU type pool. A system with one CPU may well want several threads. This is used by FS-Cache to do slow caching operations in the background, such as looking up, creating or deleting cache objects. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Steve Dickson <steved@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Daire Byrne <Daire.Byrne@framestore.com>	2009-04-03 16:42:35 +01:00
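The grow-and-shrink policy can be sketched as two predicates over a snapshot of the pool. The rule that a new thread is started when queued items outnumber idle threads is an assumption chosen for illustration. The commit only says that more threads are started when there is more work, up to a limit. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the pool's state. */
struct pool_state {
	unsigned int nr_threads;  /* threads currently in the pool */
	unsigned int nr_idle;     /* threads waiting for work */
	unsigned int nr_queued;   /* work items not yet picked up */
	unsigned int min_threads;
	unsigned int max_threads;
};

/*
 * Start another thread if the pool is below its minimum, or if queued
 * work outnumbers idle threads and the limit has not been reached.
 */
bool pool_should_spawn(const struct pool_state *s)
{
	if (s->nr_threads < s->min_threads)
		return true;
	return s->nr_queued > s->nr_idle && s->nr_threads < s->max_threads;
}

/*
 * An idle thread may exit only when no work is queued and the pool
 * would stay at or above its minimum.
 */
bool pool_may_retire(const struct pool_state *s)
{
	return s->nr_idle > 0 && s->nr_queued == 0 &&
	       s->nr_threads > s->min_threads;
}
```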
Eduard - Gabriel Munteanu	ca2b84cb3c	kmemtrace: use tracepoints kmemtrace now uses tracepoints instead of markers. We no longer need to use format specifiers to pass arguments. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> [ folded: Use the new TP_PROTO and TP_ARGS to fix the build. ] [ folded: fix build when CONFIG_KMEMTRACE is disabled. ] [ folded: define tracepoints when CONFIG_TRACEPOINTS is enabled. ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <ae61c0f37156db8ec8dc0d5778018edde60a92e3.1237813499.git.eduard.munteanu@linux360.ro> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 12:23:06 +02:00
Eduard - Gabriel Munteanu	ac44021fcc	kmemtrace, rcu: don't include unnecessary headers, allow kmemtrace w/ tracepoints Impact: cleanup linux/percpu.h includes linux/slab.h, which generates circular inclusion dependencies when trying to switch kmemtrace to use tracepoints instead of markers. This patch allows tracing within slab headers' inline functions. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: paulmck@linux.vnet.ibm.com LKML-Reference: <1237898630.25315.83.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 12:23:05 +02:00
Ingo Molnar	a979241c53	kmemtrace, rcu: fix rcupreempt.c data structure dependencies Impact: cleanup We want to remove percpu.h from rcupreempt.h, but if we do that the percpu primitives there won't build anymore. Move them to the .c file instead. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: paulmck@linux.vnet.ibm.com LKML-Reference: <1237898630.25315.83.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 12:23:04 +02:00
Ingo Molnar	b1f77b0581	kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies Impact: build fix for all non-x86 architectures We want to remove percpu.h from rcuclassic.h/rcutree.h (for upcoming kmemtrace changes) but that would break the DECLARE_PER_CPU based declarations in these files. Move the quiescent counter management functions to their respective RCU implementation .c files - they were slightly above the inlining limit anyway. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: paulmck@linux.vnet.ibm.com LKML-Reference: <1237898630.25315.83.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 12:23:02 +02:00
Pekka Enberg	aa84442d67	kmemtrace, security: fix linux/key.h header file dependencies Impact: cleanup We want to remove percpu.h from rcupdate.h (for upcoming kmemtrace changes), but this is not possible currently without breaking the build because key.h has an implicit include file dependency on rwsem.h: CC [M] fs/cifs/cifs_spnego.o In file included from include/keys/user-type.h:15, from fs/cifs/cifs_spnego.c:24: include/linux/key.h:128: error: field ‘sem’ has incomplete type make[2]: *** [fs/cifs/cifs_spnego.o] Error 1 make[1]: *** [fs/cifs] Error 2 make: *** [fs] Error 2 Fix it by making the dependency explicit. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> LKML-Reference: <1237884886.25315.39.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 12:21:12 +02:00
Ingo Molnar	21e5445928	kmemtrace, fs: fix linux/fdtable.h header file dependencies Impact: cleanup We want to remove percpu.h from rcupdate.h (for upcoming kmemtrace changes), but this is not possible currently without breaking the build because fdtable.h has an implicit include file dependency: it uses __init but does not include init.h. This can cause build failures on non-x86 architectures: /home/mingo/tip/include/linux/fdtable.h:66: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'files_defer_init' make[2]: *** [fs/locks.o] Error 1 We got this header included indirectly via rcupdate.h's percpu.h inclusion - but if that is not there the build will break. Fix it. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: paulmck@linux.vnet.ibm.com LKML-Reference: <1237898630.25315.83.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 12:13:03 +02:00
Ingo Molnar	76791ab2d5	kmemtrace, fs: uninline simple_transaction_set() Impact: cleanup We want to remove percpu.h from rcupdate.h (for upcoming kmemtrace changes), but this is not possible currently without breaking the build because fs.h has an implicit include file dependency: it uses PAGE_SIZE but does not include asm/page.h which defines it. This problem gets masked in practice because most fs.h using sites use rcupreempt.h (and other headers) which includes percpu.h which brings in asm/page.h indirectly. We cannot add asm/page.h to asm/fs.h because page.h is not an exported header. Move simple_transaction_set() to the other simple-transaction file helpers in fs/libfs.c. This removes the include file hell and also reduces kernel size a bit. Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: paulmck@linux.vnet.ibm.com LKML-Reference: <1237898630.25315.83.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 12:09:09 +02:00
Pekka Enberg	3d544f411f	kmemtrace, fs, security: move alloc_secdata() and free_secdata() to linux/security.h Impact: cleanup We want to remove percpu.h from rcupdate.h (for upcoming kmemtrace changes), but this is not possible currently without breaking the build because fs.h has implicit include file dependencies: it uses GFP_* types in inlines but does not include gfp.h. In practice most fs.h using .c files get gfp.h included implicitly, via an indirect route: via rcupdate.h inclusion - so this underlying problem gets masked in practice. So we want to solve fs.h's dependency on gfp.h. gfp.h can not be included here directly because it is not exported and it would break the build the following way: /home/mingo/tip/usr/include/linux/bsg.h:11: found __[us]{8,16,32,64} type without #include <linux/types.h> /home/mingo/tip/usr/include/linux/fs.h:11: included file 'linux/gfp.h' is not exported make[3]: *** [/home/mingo/tip/usr/include/linux/.check] Error 1 make[2]: *** [linux] Error 2 As suggested by Alexey Dobriyan, move alloc_secdata() and free_secdata() to linux/security.h - they belong there. This also cleans fs.h of GFP_* usage. Suggested-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> LKML-Reference: <1237906803.25315.96.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 12:08:57 +02:00
Theodore Ts'o	f7ab34ea72	ext3: Add replace-on-truncate hueristics for data=writeback mode In data=writeback mode, start an asynchronous flush when closing a file which had been previously truncated down to zero. This lowers the probability of data loss in the case of applications that attempt to replace a file using truncate. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-03 01:34:35 -04:00
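The heuristic can be modelled as a flag that truncation to zero sets and the final close consumes. This C sketch is a hypothetical userspace model. `demo_inode` and the two functions are not ext3 code, and incrementing a counter stands in for starting an asynchronous flush.

```c
#include <stdbool.h>

/* Hypothetical inode model; not ext3's real data structures. */
struct demo_inode {
	long long size;
	bool writeback_mode;   /* mounted with data=writeback */
	bool flush_on_close;   /* set by truncate-to-zero, consumed on close */
	int flushes_started;   /* counts simulated asynchronous flushes */
};

void demo_truncate(struct demo_inode *inode, long long new_size)
{
	inode->size = new_size;
	/* Remember that the file was emptied, but only in data=writeback mode. */
	if (new_size == 0 && inode->writeback_mode)
		inode->flush_on_close = true;
}

void demo_release(struct demo_inode *inode)
{
	if (inode->flush_on_close) {
		inode->flush_on_close = false;
		/* Real code would start asynchronous writeback here. */
		inode->flushes_started++;
	}
}
```

In the "open with O_TRUNC, write new contents, close" pattern, the flush is triggered at close. By then the new data has been written, so it is the replacement contents that get pushed out early.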
Linus Torvalds	8fe74cf053	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: Remove two unneeded exports and make two symbols static in fs/mpage.c Cleanup after commit `585d3bc06f` Trim includes of fdtable.h Don't crap into descriptor table in binfmt_som Trim includes in binfmt_elf Don't mess with descriptor table in load_elf_binary() Get rid of indirect include of fs_struct.h New helper - current_umask() check_unsafe_exec() doesn't care about signal handlers sharing New locking/refcounting for fs_struct Take fs_struct handling to new file (fs/fs_struct.c) Get rid of bumping fs_struct refcount in pivot_root(2) Kill unsharing fs_struct in __set_personality()	2009-04-02 21:09:10 -07:00
Robin Holt	f5f7eac41d	Allow rwlocks to re-enable interrupts Pass the original flags to rwlock arch-code, so that it can re-enable interrupts if implemented for that architecture. Initially, make __raw_read_lock_flags and __raw_write_lock_flags stubs which just do the same thing as non-flags variants. Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:11 -07:00
Robin Holt	e8c158bb31	Factor out #ifdefs from kernel/spinlock.c to LOCK_CONTENDED_FLAGS SGI has observed that on large systems, interrupts are not serviced for a long period of time when waiting for a rwlock. The following patch series re-enables irqs while waiting for the lock, resembling the code which is already there for spinlocks. I only made the ia64 version, because the patch adds some overhead to the fast path. I assume there is currently no demand to have this for other architectures, because the systems are not so large. Of course, the possibility to implement raw_{read|write}_lock_flags for any architecture is still there. This patch: The new macro LOCK_CONTENDED_FLAGS expands to the correct implementation depending on the config options, so that IRQs are re-enabled when possible, but they remain disabled if CONFIG_LOCKDEP is set. Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Robin Holt <holt@sgi.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:10 -07:00
Gerd Hoffmann	f3554f4bc6	preadv/pwritev: Add preadv and pwritev system calls. This patch adds preadv and pwritev system calls. These syscalls are a pretty straightforward combination of pread and readv (same for write). They are quite useful for doing vectored I/O in threaded applications. Using lseek+readv instead opens race windows you'll have to plug with locking. Other systems have such system calls too, for example NetBSD, check here: http://www.daemon-systems.org/man/preadv.2.html The application-visible interface provided by glibc should look like this to be compatible with the existing implementations in the BSD family: ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset); ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset); This prototype has one problem though: on 32-bit archs the (64-bit) offset argument is unaligned, which the syscall ABI of several archs doesn't allow. At least s390 needs a wrapper in glibc to handle this. As we'll need a wrapper in glibc anyway, I've decided to push the problem to glibc entirely and use a syscall prototype which works without arch-specific wrappers inside the kernel: the offset argument is explicitly split into two 32-bit values. The patch sports the actual system call implementation and the windup in the x86 system call tables. Other archs follow as separate patches. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:08 -07:00
Neil Horman	04d491ab2a	kexec: add dmesg log symbols to /proc/vmcoreinfo lists It would be nice to be able to extract the dmesg log from a vmcore file without needing to keep the debug symbols for the running kernel handy all the time. We have a facility to do this in /proc/vmcore. This patch adds the log_buf and log_end symbols to the vmcoreinfo area so that tools (like makedumpfile) can easily extract the dmesg logs from a vmcore image. [akpm@linux-foundation.org: several fixes and cleanups] [akpm@linux-foundation.org: fix unused log_buf_kexec_setup()] [akpm@linux-foundation.org: build fix] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: Simon Horman <horms@verge.net.au> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Simon Horman <horms@verge.net.au> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:04 -07:00
Harry Ciao	7c5ff4f92e	pci: Add AMD8111 PCI Bridge PCI Device ID Add the PCI Device ID of the PCI Bridge Controller on AMD8111 chip. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:03 -07:00
Oleg Nesterov	1b0f7ffd0e	pids: kill signal_struct-> __pgrp/__session and friends We are wasting 2 words in signal_struct without any reason to implement task_pgrp_nr() and task_session_nr(). task_session_nr() has no callers since `2e2ba22ea4`, we can remove it. task_pgrp_nr() is still (I believe wrongly) used in fs/autofsX and fs/coda. This patch reimplements task_pgrp_nr() via task_pgrp_nr_ns(), and kills __pgrp/__session and the related helpers. The change in drivers/char/tty_io.c is cosmetic, but hopefully makes sense anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Alan Cox <number6@the-village.bc.nu> [tty parts] Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:02 -07:00
Oleg Nesterov	52ee2dfdd4	pids: refactor vnr/nr_ns helpers to make them safe IMHO, the safety rules for vnr/nr_ns helpers are horrible and buggy. task_pid_nr_ns(task) needs rcu/tasklist depending on task == current. As for "special" pids, vnr/nr_ns helpers always need rcu. However, if task != current, they are unsafe even under rcu lock: we can't trust task->group_leader without the special checks. And almost every helper has a callsite which needs a fix. Also, it is a bit annoying that the implementations of, say, task_pgrp_vnr() and task_pgrp_nr_ns() are not "symmetrical". This patch introduces the new helper, __task_pid_nr_ns(), which is always safe to use, and turns all other helpers into the trivial wrappers. After this I'll send another patch which converts task_tgid_xxx() as well; they are a bit special. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:02 -07:00
Oleg Nesterov	6dda81f438	pids: document task_pgrp/task_session is not safe without tasklist/rcu Even if task == current, it is not safe to dereference the result of task_pgrp/task_session. We can race with another thread which changes the special pid via setpgid/setsid. Document this. The next 2 patches give an example of the unsafe usage, we have more bad users. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:02 -07:00
Paul Fulghum	1f80769ffd	synclink_gt: add clock options Add support for x8 asynchronous sample rate and ability to specify base clock frequency. Signed-off-by: Paul Fulghum <paulkf@microgate.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:01 -07:00
Kirill A. Shutemov	a50b0aa4bd	struct linux_binprm: drop unused fields Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:01 -07:00
Lai Jiangshan	40e8a10de2	cpu hotplug: remove unused cpuhotplug_mutex_lock() cpuhotplug_mutex_lock() is not used, remove it. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:00 -07:00
Oleg Nesterov	bb24c679a5	tracehook_notify_death: use task_detached() helper Now that task_detached() is exported, change tracehook_notify_death() to use this helper, nobody else checks ->exit_signal == -1 by hand. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Metzger, Markus T" <markus.t.metzger@intel.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:00 -07:00
Oleg Nesterov	39c626ae47	forget_original_parent: split out the un-ptrace part By discussion with Roland. - Rename ptrace_exit() to exit_ptrace(), and change it to do all the necessary work with ->ptraced list by its own. - Move this code from exit.c to ptrace.c - Update the comment in ptrace_detach() to explain the rechecking of the child->ptrace. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Metzger, Markus T" <markus.t.metzger@intel.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:05:00 -07:00
Oleg Nesterov	4576145c1e	ptrace: fix possible zombie leak on PTRACE_DETACH When ptrace_detach() takes tasklist, the tracee can be SIGKILL'ed. If it has already passed exit_notify() we can leak a zombie, because a) ptracing disables the auto-reaping logic, and b) ->real_parent was not notified about the child's death. ptrace_detach() should follow the ptrace_exit's logic, change the code accordingly. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Roland McGrath <roland@redhat.com> Tested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:59 -07:00
Oleg Nesterov	43918f2bf4	signals: remove 'handler' parameter to tracehook functions Container-init must behave like global-init to processes within the container and hence it must be immune to unhandled fatal signals from within the container (i.e. SIG_DFL signals that terminate the process). But the same container-init must behave like a normal process to processes in ancestor namespaces and so if it receives the same fatal signal from a process in ancestor namespace, the signal must be processed. Implementing these semantics requires that send_signal() determine pid namespace of the sender but since signals can originate from workqueues/ interrupt-handlers, determining pid namespace of sender may not always be possible or safe. This patchset implements the design/simplified semantics suggested by Oleg Nesterov. The simplified semantics for container-init are: - container-init must never be terminated by a signal from a descendant process. - container-init must never be immune to SIGKILL from an ancestor namespace (so a process in parent namespace must always be able to terminate a descendant container). - container-init may be immune to unhandled fatal signals (like SIGUSR1) even if they are from ancestor namespace. SIGKILL/SIGSTOP are the only reliable signals to a container-init from ancestor namespace. This patch: Based on an earlier patch submitted by Oleg Nesterov and comments from Roland McGrath (http://lkml.org/lkml/2008/11/19/258). The handler parameter is currently unused in the tracehook functions. Besides, the tracehook functions are called with siglock held, so the functions can check the handler if they later need to. Removing the parameter simplifies changes to sig_ignored() in a follow-on patch. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:58 -07:00
David Rientjes	a1bc5a4eee	cpusets: replace zone allowed functions with node allowed The cpuset_zone_allowed() variants are actually only a function of the zone's node. Cc: Paul Menage <menage@google.com> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:57 -07:00
Li Zefan	bd1a8ab73e	cgroups: add 'data' field to struct cgroup_scanner We need to pass some data to test_task() or process_task() in some cases. Will be used later. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:56 -07:00
KAMEZAWA Hiroyuki	a3b2d69269	cgroups: use css id in swap cgroup for saving memory v5 Try to use CSS ID for records in swap_cgroup. By this, on 64bit machine, size of swap_cgroup goes down to 2 bytes from 8 bytes. This means, when 2GB of swap is equipped, (assume the page size is 4096 bytes) From size of swap_cgroup = 2G/4k * 8 = 4Mbytes. To size of swap_cgroup = 2G/4k * 2 = 1Mbytes. Reduction is large. Of course, there are trade-offs. This CSS ID will add overhead to swap-in/swap-out/swap-free. But in general, - swap is a resource which users tend to avoid using. - If swap is never used, swap_cgroup area is not used. - Reading traditional manuals, size of swap should be proportional to size of memory. Memory size of machine is increasing now. I think reducing size of swap_cgroup makes sense. Note: - ID->CSS lookup routine has no locks, it's under RCU-Read-Side. - memcg can be obsolete at rmdir() but not freed while refcnt from swap_cgroup is available. Changelog v4->v5: - reworked on to memcg-charge-swapcache-to-proper-memcg.patch Changelog ->v4: - fixed not configured case. - deleted unnecessary comments. - fixed NULL pointer bug. - fixed message in dmesg. [nishimura@mxp.nes.nec.co.jp: css_tryget can be called twice in !PageCgroupUsed case] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:56 -07:00
KOSAKI Motohiro	3918b96e03	memcg: remove mem_cgroup_reclaim_imbalance() remnants commit `4f98a2fee8` (vmscan: split LRU lists into anon & file sets) removed mem_cgroup_reclaim_imbalance(), but there are some leftovers in memcontrol.h. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:56 -07:00
KOSAKI Motohiro	c137b5ece4	memcg: remove mem_cgroup_calc_mapped_ratio() Currently, mem_cgroup_calc_mapped_ratio() is not used at all. It can be removed, as KAMEZAWA-san suggested. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:55 -07:00
Balbir Singh	e222432bfa	memcg: show memcg information during OOM Add RSS and swap to OOM output from memcg Display memcg values like failcnt, usage and limit when an OOM occurs due to memcg. Thanks to Johannes Weiner, Li Zefan, David Rientjes, Kamezawa Hiroyuki, Daisuke Nishimura and KOSAKI Motohiro for review. Sample output ------------- Task in /a/x killed as a result of limit of /a memory: usage 1048576kB, limit 1048576kB, failcnt 4183 memory+swap: usage 1400964kB, limit 9007199254740991kB, failcnt 0 [akpm@linux-foundation.org: compilation fix] [akpm@linux-foundation.org: fix kerneldoc and whitespace] [akpm@linux-foundation.org: add printk facility level] Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:55 -07:00
KAMEZAWA Hiroyuki	0b7f569e45	memcg: fix OOM killer under memcg This patch tries to fix OOM Killer problems caused by hierarchy. Now, memcg itself has OOM KILL function (in oom_kill.c) and tries to kill a task in memcg. But, when hierarchy is used, it's broken and correct task cannot be killed. For example, in following cgroup /groupA/ hierarchy=1, limit=1G, 01 nolimit 02 nolimit All tasks' memory usage under /groupA, /groupA/01, /groupA/02 is limited to groupA's 1Gbytes but OOM Killer just kills tasks in groupA. This patch makes the bad process be selected from all tasks under hierarchy. BTW, currently, oom_jiffies is updated against groupA in above case. oom_jiffies of tree should be updated. To see how oom_jiffies is used, please check mem_cgroup_oom_called() callers. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: const fix] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:55 -07:00
Li Zefan	099fca3225	cgroups: show correct file mode We have some read-only files and write-only files, but currently they are all set to 0644, which is counter-intuitive and causes trouble for some cgroup tools like libcgroup. This patch adds 'mode' to struct cftype to allow cgroup subsys to set its own files' file mode, and for the most cases cft->mode can default to 0 and cgroup will figure out proper mode. Acked-by: Paul Menage <menage@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:54 -07:00
KAMEZAWA Hiroyuki	ec64f51545	cgroup: fix frequent -EBUSY at rmdir In following situation, with memory subsystem, /groupA use_hierarchy==1 /01 some tasks /02 some tasks /03 some tasks /04 empty When tasks under 01/02/03 hit limit on /groupA, hierarchical reclaim is triggered and the kernel walks tree under groupA. In this case, rmdir /groupA/04 fails with -EBUSY frequently because of temporary refcnt from the kernel. In general, a cgroup can be rmdir'd if there are no children groups and no tasks. Frequent failures of rmdir() are not useful to users. (And the reason for -EBUSY is unknown to users.....in most cases) This patch tries to modify above behavior, by - retries if css_refcnt is got by someone. - add "return value" to pre_destroy() and allows subsystem to say "we're really busy!" Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:54 -07:00
KAMEZAWA Hiroyuki	38460b48d0	cgroup: CSS ID support Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code. This patch attaches unique ID to each css and provides following. - css_lookup(subsys, id) returns pointer to struct cgroup_subsys_state of id. - css_get_next(subsys, id, rootid, depth, foundid) returns the next css under "root" by scanning When cgroup_subsys->use_id is set, an id for css is maintained. The cgroup framework only prepares - css_id of root css for subsys - id is automatically attached at creation of css. - id is not freed automatically. Because the cgroup framework doesn't know lifetime of cgroup_subsys_state. free_css_id() function is provided. This must be called by subsys. There are several reasons to develop this. - Saving space .... For example, memcg's swap_cgroup is array of pointers to cgroup. But it is not necessary to be very fast. By replacing pointers (8 bytes per ent) with IDs (2 bytes per ent), we can reduce much amount of memory usage. - Scanning without lock. CSS_ID provides "scan id under this ROOT" function. By this, scanning css under root can be written without locks. ex) do { rcu_read_lock(); next = cgroup_get_next(subsys, id, root, &found); /* check sanity of next here */ css_tryget(); rcu_read_unlock(); id = found + 1; } while(...) Characteristics: - Each css has unique ID under subsys. - Lifetime of ID is controlled by subsys. - css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy - Allowed ID is 1-65535, ID 0 is UNUSED ID. Design Choices: - scan-by-ID v.s. scan-by-tree-walk. As /proc's pid scan does, scan-by-ID is robust when scanning is done by following kind of routine. scan -> rest a while (release a lock) -> continue from interrupted. memcg's hierarchical reclaim does this. - When subsys->use_id is set, # of css in the system is limited to 65535. [bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:53 -07:00
Grzegorz Nosek	313e924c08	cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups The ns_proxy cgroup allows moving processes to child cgroups only one level deep at a time. This commit relaxes this restriction and makes it possible to attach tasks directly to grandchild cgroups, e.g.: ($pid is in the root cgroup) echo $pid > /cgroup/CG1/CG2/tasks Previously this operation would fail with -EPERM and would have to be performed as two steps: echo $pid > /cgroup/CG1/tasks echo $pid > /cgroup/CG1/CG2/tasks Also, the target cgroup no longer needs to be empty to move a task there. Signed-off-by: Grzegorz Nosek <root@localdomain.pl> Acked-by: Serge Hallyn <serue@us.ibm.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:53 -07:00
Paul Menage	d20a390a0e	cgroups: fix cgroup.h comments Fix the style of some multi-line comments in cgroup.h to match Documentation/CodingStyle Signed-off-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:53 -07:00
Cyrus Massoumi	039fd8ce62	ext3: remove the BKL in ext3/ioctl.c Reformat ext3/ioctl.c to make it look more like ext4/ioctl.c and remove the BKL around ext3_ioctl(). Signed-off-by: Cyrus Massoumi <cyrusm@gmx.net> Cc: <linux-ext4@vger.kernel.org> Acked-by: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:52 -07:00
Michael Buesch	bfb9bcdbda	spi-gpio: allow operation without CS signal Change spi-gpio so that it is possible to drive SPI communications over GPIO without the need for a chipselect signal. This is useful in very small setups where there's only one slave device on the bus. This patch does not affect existing setups. I use this for a tiny communication channel between an embedded device and a microcontroller. There are not enough GPIOs available for chipselect and it's not needed anyway in this case. Signed-off-by: Michael Buesch <mb@bu3sch.de> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:51 -07:00
Mike Rapoport	96615841e1	rtc-v3020: add ability to access v3020 chip with GPIOs The v3020 RTC can be connected to GPIOs as well as to memory-like interface. Add ability to use GPIO bit-bang for v3020 read-write access. [akpm@linux-foundation.org: fix off-by-one in error path] Signed-off-by: Mike Rapoport <mike@compulab.co.il> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:51 -07:00
Alexey Dobriyan	6f2c55b843	Simplify copy_thread() First argument unused since 2.3.11. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:51 -07:00
David Brownell	14dd1ff0f9	memory_accessor: implement the new memory_accessor interfaces for SPI EEPROMs - Define new setup() hook to export the accessor - Implement accessor methods Moves some error checking out of the sysfs interface code into the layer below it, which is now shared by both sysfs and memory access code. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:50 -07:00
Kevin Hilman	7274ec8bd7	memory_accessor: implement the new memory_accessor interface for I2C EEPROM In the case of at24, the platform code registers a 'setup' callback with the at24_platform_data. When the at24 driver detects an EEPROM, it fills out the read and write functions of the memory_accessor and calls the setup callback passing the memory_accessor struct. The platform code can then use the read/write functions in the memory_accessor struct for reading and writing the EEPROM. Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Cc: David Brownell <dbrownell@users.sourceforge.net> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:50 -07:00
Kevin Hilman	06c421ee0d	memory_accessor: new interface for reading/writing persistent memory Add an interface by which other kernel code can read/write persistent memory such as I2C or SPI EEPROMs, or devices which provide NVRAM. Use cases include storage of board-specific configuration data like Ethernet addresses and sensor calibrations. Original idea, review and improvement suggestions by David Brownell. Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:50 -07:00
Jean Delvare	bf6aede712	workqueue: add to_delayed_work() helper function It is a fairly common operation to have a pointer to a work and to need a pointer to the delayed work it is contained in. In particular, all delayed works which want to rearm themselves will have to do that. So it would seem fair to offer a helper function for this operation. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Greg KH <greg@kroah.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:50 -07:00
Lee Schermerhorn	9a896c9a48	mm: define a UNIQUE value for AS_UNEVICTABLE flag A new "address_space flag"--AS_MM_ALL_LOCKS--was defined to use the next available AS flag while the Unevictable LRU was under development. The Unevictable LRU was using the same flag and "no one" noticed. Current mainline, since 2.6.28, has the same value for two symbolic flag names. So, define a unique flag value for AS_UNEVICTABLE--up close to the other flags, [at the cost of an additional #ifdef] so we'll notice next time. Note that #ifdef is not actually required, if we don't mind having the unused flag value defined. Replace #defines with an enum. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: <stable@kernel.org> [2.6.28.x, 2.6.29.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:49 -07:00
Eric Sandeen	8e2c3795c7	add fiemap.h to header-y Include fiemap.h in header-y; it defines the interface for the FS_IOC_FIEMAP file mapping ioctl. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:49 -07:00
David Howells	33e5d76979	nommu: fix a number of issues with the per-MM VMA patch Fix a number of issues with the per-MM VMA patch: (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on a NOMMU system with more than 2G pages. Makes no difference on a 32-bit system. (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value, lest it overflow. (3) Move the allocation of the vm_area_struct slab back for fork.c. (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs. (5) Use BUG_ON() rather than if () BUG(). (6) Make the default validate_nommu_regions() a static inline rather than a #define. (7) Make free_page_series()'s objection to pages with a refcount != 1 more informative. (8) Adjust the __put_nommu_region() banner comment to indicate that the semaphore must be held for writing. (9) Limit the number of warnings about munmaps of non-mmapped regions. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
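Item (2) is the classic 32-bit overflow. On a 32-bit machine, `vm_pgoff * PAGE_SIZE` is computed in `unsigned long` and wraps once the byte offset reaches 4 GiB. The sketch below emulates that with fixed-width types. It assumes 4 KiB pages, and the function names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumes 4 KiB pages */

/* Like a 32-bit 'unsigned long' product: wraps modulo 2^32. */
static uint32_t pgoff_to_bytes_32(uint32_t pgoff)
{
	return pgoff * SKETCH_PAGE_SIZE;
}

/* Widen to 64 bits before multiplying, so the byte offset is exact. */
static uint64_t pgoff_to_bytes_64(uint32_t pgoff)
{
	return (uint64_t)pgoff * SKETCH_PAGE_SIZE;
}
```

With 2^20 pages (0x100000), the 32-bit product wraps to 0, while the widened product gives 4 GiB.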
Akinobu Mita	ee3b4290ae	generic debug pagealloc: build fix This fixes a build failure with generic debug pagealloc: mm/debug-pagealloc.c: In function 'set_page_poison': mm/debug-pagealloc.c:8: error: 'struct page' has no member named 'debug_flags' mm/debug-pagealloc.c: In function 'clear_page_poison': mm/debug-pagealloc.c:13: error: 'struct page' has no member named 'debug_flags' mm/debug-pagealloc.c: In function 'page_poison': mm/debug-pagealloc.c:18: error: 'struct page' has no member named 'debug_flags' mm/debug-pagealloc.c: At top level: mm/debug-pagealloc.c:120: error: redefinition of 'kernel_map_pages' include/linux/mm.h:1278: error: previous definition of 'kernel_map_pages' was here mm/debug-pagealloc.c: In function 'kernel_map_pages': mm/debug-pagealloc.c:122: error: 'debug_pagealloc_enabled' undeclared (first use in this function) by fixing - debug_flags should be in struct page - define DEBUG_PAGEALLOC config option for all architectures Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-02 19:04:48 -07:00
Jonathan Brassow	7513c2a761	dm raid1: add is_remote_recovering hook for clusters The logging API needs an extra function to make cluster mirroring possible. This new function allows us to check whether a mirror region is being recovered on another machine in the cluster. This helps us prevent simultaneous recovery I/O and process I/O to the same locations on disk. Cluster-aware log modules will implement this function. Single machine log modules will not. So, there is no performance penalty for single machine mirrors. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Acked-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-04-02 19:55:30 +01:00
Mike Snitzer	ec44ab9d66	dm log: remove struct dm_dirty_log_internal Remove the 'dm_dirty_log_internal' structure. The resulting cleanup eliminates extra memory allocations. Therefore exposing the internal list_head to the external 'dm_dirty_log_type' structure is a worthwhile compromise. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2009-04-02 19:55:30 +01:00
Cheng Renquan	45194e4f89	dm target: remove struct tt_internal The tt_internal is really just a list_head to manage registered target_type in a double linked list, Here embed the list_head into target_type directly, 1. to avoid kmalloc/kfree; 2. then tt_internal is really unneeded; Cc: stable@kernel.org Signed-off-by: Cheng Renquan <crquan@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com>	2009-04-02 19:55:28 +01:00
Ingo Molnar	8302294f43	Merge branch 'tracing/core-v2' into tracing-for-linus Conflicts: include/linux/slub_def.h lib/Kconfig.debug mm/slob.c mm/slub.c	2009-04-02 00:49:02 +02:00
Hans-Christian Egtvedt	d9de451989	dw_dmac: add cyclic API to DW DMA driver This patch adds a cyclic DMA interface to the DW DMA driver. This is very useful if you want to use the DMA controller in combination with a sound device which uses cyclic buffers. Using a DMA channel for cyclic DMA will disable the possibility to use it as a normal DMA engine until the user calls the cyclic free function on the DMA channel. Also a cyclic DMA list can not be prepared if the channel is already active. Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com> Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2009-04-01 15:42:34 -07:00
Bartlomiej Zolnierkiewicz	eae6c2b641	remove <linux/ata.h> include from <linux/hdreg.h> All <linux/hdreg.h> users that need <linux/ata.h> have been fixed to include it directly. Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2009-04-01 21:42:26 +02:00
Bartlomiej Zolnierkiewicz	4fe6e30645	include/linux/hdreg.h: remove unused defines * Move HD_IRQ define to drivers/block/hd.c (only user). * Remove unused _STAT, _ERR, HD_*, CD, IO, REL and TAG_MASK defines. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2009-04-01 21:42:25 +02:00
Bartlomiej Zolnierkiewicz	dafd01cc14	include/linux/hdreg.h: cover WIN_* and friends with #ifndef/#endif __KERNEL__ Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2009-04-01 21:42:25 +02:00
Bartlomiej Zolnierkiewicz	6fd5c665d8	include/linux/hdreg.h: cover struct hd_driveid with #ifndef/#endif __KERNEL__ Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2009-04-01 21:42:23 +02:00
Linus Torvalds	4fe70410d9	Merge branch 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (58 commits) SUNRPC: Ensure IPV6_V6ONLY is set on the socket before binding to a port NSM: Fix unaligned accesses in nsm_init_private() NFS: Simplify logic to compare socket addresses in client.c NFS: Start PF_INET6 callback listener only if IPv6 support is available lockd: Start PF_INET6 listener only if IPv6 support is available SUNRPC: Remove CONFIG_SUNRPC_REGISTER_V4 SUNRPC: rpcb_register() should handle errors silently SUNRPC: Simplify kernel RPC service registration SUNRPC: Simplify svc_unregister() SUNRPC: Allow callers to pass rpcb_v4_register a NULL address SUNRPC: rpcbind actually interprets r_owner string SUNRPC: Clean up address type casts in rpcb_v4_register() SUNRPC: Don't return EPROTONOSUPPORT in svc_register()'s helpers SUNRPC: Use IPv4 loopback for registering AF_INET6 kernel RPC services SUNRPC: Set IPV6ONLY flag on PF_INET6 RPC listener sockets NFS: Revert creation of IPv6 listeners for lockd and NFSv4 callbacks SUNRPC: Remove @family argument from svc_create() and svc_create_pooled() SUNRPC: Change svc_create_xprt() to take a @family argument SUNRPC: svc_setup_socket() gets protocol family from socket SUNRPC: Pass a family argument to svc_register() ...	2009-04-01 10:58:42 -07:00
Linus Torvalds	395d73413c	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits) ext4: Regularize mount options ext4: fix locking typo in mballoc which could cause soft lockup hangs ext4: fix typo which causes a memory leak on error path jbd2: Update locking coments ext4: Rename pa_linear to pa_type ext4: add checks of block references for non-extent inodes ext4: Check for an valid i_mode when reading the inode from disk ext4: Use WRITE_SYNC for commits which are caused by fsync() ext4: Add auto_da_alloc mount option ext4: Use struct flex_groups to calculate get_orlov_stats() ext4: Use atomic_t's in struct flex_groups ext4: remove /proc tuning knobs ext4: Add sysfs support ext4: Track lifetime disk writes ext4: Fix discard of inode prealloc space with delayed allocation. ext4: Automatically allocate delay allocated blocks on rename ext4: Automatically allocate delay allocated blocks on close ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl ext4: Simplify delalloc code by removing mpage_da_writepages() ext4: Save stack space by removing fake buffer heads ...	2009-04-01 10:57:49 -07:00
Trond Myklebust	cc85906110	Merge branch 'devel' into for-linus	2009-04-01 13:28:15 -04:00
Linus Torvalds	c09bca786f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (59 commits) ide-floppy: do not complete rq's prematurely ide: be able to build pmac driver without IDE built-in ide-pmac: IDE cable detection on Apple PowerBook ide: inline SELECT_DRIVE() ide: turn selectproc() method into dev_select() method (take 5) MAINTAINERS: move old ide-{floppy,tape} entries to CREDITS (take 2) ide: move data register access out of tf_{read|load}() methods (take 2) ide: call {in|out}put_data() methods from tf_{read|load}() methods (take 2) ide-io-std: shorten ide_{in|out}put_data() ide: rename IDE_TFLAG_IN_[HOB_]FEATURE ide: turn set_irq() method into write_devctl() method ide: use ATA_HOB ide-disk: use ATA_ERR ide: add support for CFA specified transfer modes (take 3) ide-iops: only clear DMA words on setting DMA mode ide: identify data word 53 bit 1 doesn't cover words 62 and 63 (take 3) au1xxx-ide: auide_{in|out}sw() should be static ide-floppy: use ide_pio_bytes() ide-{floppy,tape}: fix padding for PIO transfers ide: remove CONFIG_BLK_DEV_IDEDOUBLER config option ...	2009-04-01 10:02:15 -07:00
Linus Torvalds	e76e5b2c66	Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits) PCI: fix HT MSI mapping fix PCI: don't enable too much HT MSI mapping x86/PCI: make pci=lastbus=255 work when acpi is on PCI: save and restore PCIe 2.0 registers PCI: update fakephp for bus_id removal PCI: fix kernel oops on bridge removal PCI: fix conflict between SR-IOV and config space sizing powerpc/PCI: include pci.h in powerpc MSI implementation PCI Hotplug: schedule fakephp for feature removal PCI Hotplug: rename legacy_fakephp to fakephp PCI Hotplug: restore fakephp interface with complete reimplementation PCI: Introduce /sys/bus/pci/devices/.../rescan PCI: Introduce /sys/bus/pci/devices/.../remove PCI: Introduce /sys/bus/pci/rescan PCI: Introduce pci_rescan_bus() PCI: do not enable bridges more than once PCI: do not initialize bridges more than once PCI: always scan child buses PCI: pci_scan_slot() returns newly found devices PCI: don't scan existing devices ... Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt	2009-04-01 09:47:12 -07:00
Andrew Morton	6a7f2829b5	fbdev: uninline lock_fb_info() Before: text data bss dec hex filename 3648 2910 32 6590 19be drivers/video/backlight/backlight.o 3226 2812 32 6070 17b6 drivers/video/backlight/lcd.o 30990 16688 8480 56158 db5e drivers/video/console/fbcon.o 15488 8400 24 23912 5d68 drivers/video/fbmem.o After: text data bss dec hex filename 3537 2870 32 6439 1927 drivers/video/backlight/backlight.o 3131 2772 32 5935 172f drivers/video/backlight/lcd.o 30876 16648 8480 56004 dac4 drivers/video/console/fbcon.o 15506 8400 24 23930 5d7a drivers/video/fbmem.o Cc: Andrea Righi <righi.andrea@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:29 -07:00
Krzysztof Helt	614c0dc932	cirrusfb: add accelerator constant Add an accelerator constant so almost all Cirrus are recognized as accelerators by the fbset command. Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:29 -07:00
Andrew Morton	78d89ef40c	rtc: convert LEAP_YEAR into an inline - the LEAP_YEAR macro is buggy - it references its arg multiple times. Fix this by turning it into a C function. - give it a more appropriate name - Move it to rtc.h so that other .c files can use it, instead of copying it. Cc: dann frazier <dannf@hp.com> Acked-by: Alessandro Zummo <alessandro.zummo@towertech.it> Cc: stephane eranian <eranian@googlemail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:24 -07:00
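The bug is the usual one with function-like macros: the argument text is pasted in several times, so any side effect runs more than once. The sketch below puts a macro of the same shape next to an inline replacement. The name `is_leap_year` is used for illustration (check `include/linux/rtc.h` for the name the commit chose), and `next_year`/`calls` are hypothetical helpers that count evaluations.

```c
#include <stdbool.h>

/* Macro of the same shape as the one replaced: 'y' may be evaluated
 * up to three times. */
#define LEAP_YEAR(y) ((!((y) % 4) && ((y) % 100)) || !((y) % 400))

/* Inline replacement: the argument is evaluated exactly once.
 * Gregorian rule: divisible by 4, except centuries not divisible by 400. */
static inline bool is_leap_year(unsigned int year)
{
	return (!(year % 4) && (year % 100)) || !(year % 400);
}

/* Hypothetical argument with a side effect, used to count evaluations. */
static int calls;

static unsigned int next_year(void)
{
	calls++;
	return 2000;
}
```

For 2000, all three tests in the macro run, so `LEAP_YEAR(next_year())` calls `next_year()` three times. The inline function calls it once.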
Ian Kent	79955898f9	autofs4: fix kernel includes autofs_dev-ioctl.h is included by both the kernel module and user space tools and it includes two kernel header files. Compiles work if the kernel headers are installed but fail otherwise. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:23 -07:00
Anton Vorontsov	35b4b3c0c1	spi_mpc83xx: add OF platform driver bindings Implement full support for OF SPI bindings. Now the driver can manage its own chip selects without any help from the board files and/or fsl_soc constructors. The "legacy" code is well isolated and could be removed as time goes by. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: David Brownell <david-b@pacbell.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:22 -07:00
Anton Vorontsov	364fdbc00f	spi_mpc83xx: rework chip selects handling The main purpose of this patch is to pass 'struct spi_device' to the chip select handling routines. This is needed so that we could implement full-fledged OpenFirmware support for this driver. While at it, also: - Replace two {de,activate}_cs routines by single cs_contol(). - Don't duplicate platform data callbacks in mpc83xx_spi struct. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: David Brownell <david-b@pacbell.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:22 -07:00
Davide Libenzi	c0da377536	epoll keyed wakeups: introduce new *_poll() wakeup macros Introduce new wakeup macros that allow passing an event mask to the wakeup targets. They exactly mimic their non-_poll() counterpart, with the added event mask passing capability. I did add only the ones currently requested, avoiding the _nr() and _all() for the moment. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: William Lee Irwin III <wli@movementarian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:20 -07:00
Davide Libenzi	4ede816ac3	epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key() This patchset introduces wakeup hints for some of the most popular (from epoll POV) devices, so that epoll code can avoid spurious wakeups on its waiters. The problem with epoll is that the callback-based wakeups do not, ATM, carry any information about the events the wakeup is related to. So the only choice epoll has (not being able to call f_op->poll() from inside the callback), is to add the file* to a ready-list and resolve the real events later on, at epoll_wait() (or its own f_op->poll()) time. This can cause spurious wakeups, since the wake_up() itself might be for an event the caller is not interested into. The rate of these spurious wakeup can be pretty high in case of many network sockets being monitored. By allowing devices to report the events the wakeups refer to (at least the two major classes - POLLIN/POLLOUT), we are able to spare useless wakeups by proper handling inside the epoll's poll callback. Epoll will have in any case to call f_op->poll() on the file* later on, since the change to be done in order to have the full event set sent via wakeup, is too invasive for the way our f_op->poll() system works (the full event set is calculated inside the poll function - there are too many of them to even start thinking the change - also poll/select would need change too). Epoll is changed in a way that both devices which send event hints, and the ones that don't, are correctly handled. The former will gain some efficiency though. As a general rule for devices, would be to add an event mask by using key-aware wakeup macros, when making up poll wait queues. I tested it (together with the epoll's poll fix patch Andrew has in -mm) and wakeups for the supported devices are correctly filtered. Test program available here: http://www.xmailserver.org/epoll_test.c This patch: Nothing revolutionary here. 
Just using the available "key" that our wakeup core already support. The __wake_up_locked_key() was no brainer, since both __wake_up_locked() and __wake_up_locked_key() are thin wrappers around __wake_up_common(). The __wake_up_sync() function had a body, so the choice was between borrowing the body for __wake_up_sync_key() and calling it from __wake_up_sync(), or make an inline and calling it from both. I chose the former since in most archs it all resolves to "mov $0, REG; jmp ADDR". Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: William Lee Irwin III <wli@movementarian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:20 -07:00
Davide Libenzi	bcd0b235bf	eventfd: improve support for semaphore-like behavior People started using eventfd in a semaphore-like way where before they were using pipes. That is, counter-based resource access. Where a "wait()" returns immediately by decrementing the counter by one, if counter is greater than zero. Otherwise will wait. And where a "post(count)" will add count to the counter releasing the appropriate amount of waiters. If eventfd the "post" (write) part is fine, while the "wait" (read) does not dequeue 1, but the whole counter value. The problem with eventfd is that a read() on the fd returns and wipes the whole counter, making the use of it as semaphore a little bit more cumbersome. You can do a read() followed by a write() of COUNTER-1, but IMO it's pretty easy and cheap to make this work w/out extra steps. This patch introduces a new eventfd flag that tells eventfd to only dequeue 1 from the counter, allowing simple read/write to make it behave like a semaphore. Simple test here: http://www.xmailserver.org/eventfd-sem.c To be back-compatible with earlier kernels, userspace applications should probe for the availability of this feature via #ifdef EFD_SEMAPHORE fd = eventfd2 (CNT, EFD_SEMAPHORE); if (fd == -1 && errno == EINVAL) <fallback> #else <fallback> #endif Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: <linux-api@vger.kernel.org> Tested-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:20 -07:00
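A minimal userspace sketch of the semaphore mode follows. It uses glibc's `eventfd()` wrapper (which issues the eventfd2 system call mentioned above) and assumes a kernel and libc that define `EFD_SEMAPHORE`. `EFD_NONBLOCK` is added so the last "wait" fails with `EAGAIN` instead of blocking. `drain_semaphore` is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Post 'count' units with one write(), then "wait" repeatedly with read().
 * Returns how many waits succeeded before the counter hit zero, or -1. */
static int drain_semaphore(uint64_t count)
{
	int fd = eventfd(0, EFD_SEMAPHORE | EFD_NONBLOCK);
	if (fd == -1)
		return -1;	/* e.g. EINVAL on kernels without EFD_SEMAPHORE */

	uint64_t v = count;
	if (write(fd, &v, sizeof(v)) != (ssize_t)sizeof(v)) {
		close(fd);
		return -1;
	}

	int taken = 0;
	while (read(fd, &v, sizeof(v)) == (ssize_t)sizeof(v)) {
		/* In semaphore mode each read yields 1 and decrements by 1. */
		if (v != 1) {
			close(fd);
			return -1;
		}
		taken++;
	}

	int saved = errno;	/* set by the read() that ended the loop */
	close(fd);
	return saved == EAGAIN ? taken : -1;
}
```

Without `EFD_SEMAPHORE`, the first `read()` would return 3 and reset the counter to zero. That is the whole-counter behaviour the commit calls cumbersome for semaphore use.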
Cyrill Gorcunov	311d07611e	introduce pr_cont() macro We cover all log-levels by pr_... macros except KERN_CONT one. Add it for convenience. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:18 -07:00
Eric Sandeen	c2d7543851	filesystem freeze: allow SysRq emergency thaw to thaw frozen filesystems Now that the filesystem freeze operation has been elevated to the VFS, and is just an ioctl away, some sort of safety net for unintentionally frozen root filesystems may be in order. The timeout thaw originally proposed did not get merged, but perhaps something like this would be useful in emergencies. For example, freeze /path/to/mountpoint may freeze your root filesystem if you forgot that you had that unmounted. I chose 'j' as the last remaining character other than 'h' which is sort of reserved for help (because help is generated on any unknown character). I've tested this on a non-root fs with multiple (nested) freezers, as well as on a system rendered unresponsive due to a frozen root fs. [randy.dunlap@oracle.com: emergency thaw only if CONFIG_BLOCK enabled] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: Takashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:17 -07:00
J. R. Okajima	53d6660836	loop: add ioctl to resize a loop device Add the ability to 'resize' the loop device on the fly. One practical application is a loop file with XFS filesystem, already mounted: You can easily enlarge the file (append some bytes) and then call ioctl(fd, LOOP_SET_CAPACITY, new); The loop driver will learn about the new size and you can use xfs_growfs later on, which will allow you to use full capacity of the loop file without the need to unmount. Test app: #include <linux/fs.h> #include <linux/loop.h> #include <sys/ioctl.h> #include <sys/stat.h> #include <sys/types.h> #include <assert.h> #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #define _GNU_SOURCE #include <getopt.h> char *me; void usage(FILE *f) { fprintf(f, "%s [options] loop_dev [backend_file]\n" "-s, --set new_size_in_bytes\n" "\twhen backend_file is given, " "it will be expanded too while keeping the original contents\n", me); } struct option opts[] = { { .name = "set", .has_arg = 1, .flag = NULL, .val = 's' }, { .name = "help", .has_arg = 0, .flag = NULL, .val = 'h' } }; void err_size(char *name, __u64 old) { fprintf(stderr, "size must be larger than current %s (%llu)\n", name, old); } int main(int argc, char *argv[]) { int fd, err, c, i, bfd; ssize_t ssz; size_t sz; __u64 old, new, append; char a[BUFSIZ]; struct stat st; FILE *out; char *backend, *dev; err = EINVAL; out = stderr; me = argv[0]; new = 0; while ((c = getopt_long(argc, argv, "s:h", opts, &i)) != -1) { switch (c) { case 's': errno = 0; new = strtoull(optarg, NULL, 0); if (errno) { err = errno; perror(argv[i]); goto out; } break; case 'h': err = 0; out = stdout; goto err; default: perror(argv[i]); goto err; } } if (optind < argc) dev = argv[optind++]; else goto err; fd = open(dev, O_RDONLY); if (fd < 0) { err = errno; perror(dev); goto out; } err = ioctl(fd, BLKGETSIZE64, &old); if (err) { err = errno; perror("ioctl BLKGETSIZE64"); goto out; } if (!new) { printf("%llu\n", old); 
goto out; } if (new < old) { err = EINVAL; err_size(dev, old); goto out; } if (optind < argc) { backend = argv[optind++]; bfd = open(backend, O_WRONLY|O_APPEND); if (bfd < 0) { err = errno; perror(backend); goto out; } err = fstat(bfd, &st); if (err) { err = errno; perror(backend); goto out; } if (new < st.st_size) { err = EINVAL; err_size(backend, st.st_size); goto out; } append = new - st.st_size; sz = sizeof(a); while (append > 0) { if (append < sz) sz = append; ssz = write(bfd, a, sz); if (ssz != sz) { err = errno; perror(backend); goto out; } append -= sz; } err = fsync(bfd); if (err) { err = errno; perror(backend); goto out; } } err = ioctl(fd, LOOP_SET_CAPACITY, new); if (err) { err = errno; perror("ioctl LOOP_SET_CAPACITY"); } goto out; err: usage(out); out: return err; } Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Tomas Matejicek <tomas@slax.org> Cc: <util-linux-ng@vger.kernel.org> Cc: Karel Zak <kzak@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:17 -07:00
Magnus Damm	a8af78982f	pm: rework includes, remove arch ifdefs Make the following header file changes: - remove arch ifdefs and asm/suspend.h from linux/suspend.h - add asm/suspend.h to disk.c (for arch_prepare_suspend()) - add linux/io.h to swsusp.c (for ioremap()) - x86 32/64 bit compile fixes Signed-off-by: Magnus Damm <damm@igel.co.jp> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:16 -07:00
Hugh Dickins	9fab5619bd	shmem: writepage directly to swap Synopsis: if shmem_writepage calls swap_writepage directly, most shmem swap loads benefit, and a catastrophic interaction between SLUB and some flash storage is avoided. shmem_writepage() has always been peculiar in making no attempt to write: it has just transferred a shmem page from file cache to swap cache, then let that page make its way around the LRU again before being written and freed. The idea was that people use tmpfs because they want those pages to stay in RAM; so although we give it an overflow to swap, we should resist writing too soon, giving those pages a second chance before they can be reclaimed. That was always questionable, and I've toyed with this patch for years; but never had a clear justification to depart from the original design. It became more questionable in 2.6.28, when the split LRU patches classed shmem and tmpfs pages as SwapBacked rather than as file_cache: that in itself gives them more resistance to reclaim than normal file pages. I prepared this patch for 2.6.29, but the merge window arrived before I'd completed gathering statistics to justify sending it in. Then while comparing SLQB against SLUB, running SLUB on a laptop I'd habitually used with SLAB, I found SLUB to run my tmpfs kbuild swapping tests five times slower than SLAB or SLQB - other machines slower too, but nowhere near so bad. Simpler "cp -a" swapping tests showed the same. slub_max_order=0 brings sanity to all, but heavy swapping is too far from normal to justify such a tuning. The crucial factor on that laptop turns out to be that I'm using an SD card for swap. What happens is this: By default, SLUB uses order-2 pages for shmem_inode_cache (and many other fs inodes), so creating tmpfs files under memory pressure brings lumpy reclaim into play. One subpage of the order is chosen from the bottom of the LRU as usual, then the other three picked out from their random positions on the LRUs. 
In a tmpfs load, many of these pages will be ones which already passed through shmem_writepage, so already have swap allocated. And though their offsets on swap were probably allocated sequentially, now that the pages are picked off at random, their swap offsets are scattered. But the flash storage on the SD card is very sensitive to having its writes merged: once swap is written at scattered offsets, performance falls apart. Rotating disk seeks increase too, but less disastrously. So: stop giving shmem/tmpfs pages a second pass around the LRU, write them out to swap as soon as their swap has been allocated. It's surely possible to devise an artificial load which runs faster the old way, one whose sizing is such that the tmpfs pages on their second pass are the ones that are wanted again, and other pages not. But I've not yet found such a load: on all machines, under the loads I've tried, immediate swap_writepage speeds up shmem swapping: especially when using the SLUB allocator (and more effectively than slub_max_order=0), but also with the others; and it also reduces the variance between runs. How much faster varies widely: a factor of five is rare, 5% is common. One load which might have suffered: imagine a swapping shmem load in a limited mem_cgroup on a machine with plenty of memory. Before 2.6.29 the swapcache was not charged, and such a load would have run quickest with the shmem swapcache never written to swap. But now swapcache is charged, so even this load benefits from shmem_writepage directly to swap. Apologies for the #ifndef CONFIG_SWAP swap_writepage() stub in swap.h: it's silly because that will never get called; but refactoring shmem.c sensibly according to CONFIG_SWAP will be a separate task. 
Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:15 -07:00
KAMEZAWA Hiroyuki	327c0e9686	vmscan: fix it to take care of nodemask try_to_free_pages() is used for the direct reclaim of up to SWAP_CLUSTER_MAX pages when watermarks are low. The caller to alloc_pages_nodemask() can specify a nodemask of nodes that are allowed to be used but this is not passed to try_to_free_pages(). This can lead to unnecessary reclaim of pages that are unusable by the caller and, in the worst case, lead to allocation failure because progress has not been made where it is needed. This patch passes the nodemask used for alloc_pages_nodemask() to try_to_free_pages(). Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:15 -07:00
David Howells	33925b25d2	nommu: there is no mlock() for NOMMU, so don't provide the bits The mlock() facility does not exist for NOMMU since all mappings are effectively locked anyway, so we don't make the bits available when they're not useful. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Enrik Berkhan <Enrik.Berkhan@ge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:14 -07:00
Akinobu Mita	f4112de6b6	mm: introduce debug_kmap_atomic x86 has debug_kmap_atomic_prot(), which is an error-checking function for kmap_atomic. It is useful for the other architectures, although it needs CONFIG_TRACE_IRQFLAGS_SUPPORT. This patch exposes it to the other architectures. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:14 -07:00
Nick Piggin	c2ec175c39	mm: page_mkwrite change prototype to match fault Change the page_mkwrite prototype to take a struct vm_fault, and return VM_FAULT_xxx flags. There should be no functional change. This makes it possible to return much more detailed error information to the VM (and also to provide more information, e.g. virtual_address, to the driver, which might be important in some special cases). This is required for a subsequent fix, and will also make it easier to merge page_mkwrite() with fault() in the future. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:14 -07:00
Anton Blanchard	c2fdf3a9b2	mm: enable hashdist by default on 64bit NUMA On PowerPC we allocate large boot time hashes on node 0. This leads to an imbalance in the free memory, for example on a 64GB box (4 x 16GB nodes): Free memory: Node 0: 97.03% Node 1: 98.54% Node 2: 98.42% Node 3: 98.53% If we switch to using vmalloc (like ia64 and x86-64) things are more balanced: Free memory: Node 0: 97.53% Node 1: 98.35% Node 2: 98.33% Node 3: 98.33% For many HPC applications we are limited by the free available memory on the smallest node, so even though the same amount of memory is used the better balancing helps. Since all 64bit NUMA capable architectures should have sufficient vmalloc space, it makes sense to enable it via CONFIG_64BIT. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:14 -07:00
Alexey Dobriyan	704503d836	mm: fix proc_dointvec_userhz_jiffies "breakage" Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9838 On i386 with HZ=1000, jiffies_to_clock_t() converts time in a somewhat strange way from the user's point of view: `# echo 500 >/proc/sys/vm/dirty_writeback_centisecs` followed by `# cat /proc/sys/vm/dirty_writeback_centisecs` prints 499. So, we have 5000 jiffies converted to only 499 clock ticks and reported back (TICK_NSEC = 999848, ACTHZ = 256039). Keeping the in-kernel variable in the units passed from userspace would fix the issue, of course, but this probably won't be right for every sysctl. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:13 -07:00
Akinobu Mita	6a11f75b6a	generic debug pagealloc CONFIG_DEBUG_PAGEALLOC is now supported by x86, powerpc, sparc64, and s390. This patch implements it for the rest of the architectures by filling the pages with poison byte patterns after free_pages() and verifying the poison patterns before alloc_pages(). This generic implementation cannot detect invalid page accesses immediately, but an invalid read access may cause an invalid dereference of poisoned memory, and an invalid write access can only be detected after a long delay. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:13 -07:00
Li Zefan	610a77e04a	memdup_user(): introduce I notice there are many places doing kmalloc() followed by copy_from_user(): dst = kmalloc(len, GFP_KERNEL); if (!dst) return -ENOMEM; if (copy_from_user(dst, src, len)) { kfree(dst); return -EFAULT; } memdup_user() is a wrapper of the above code. With this new function, we don't have to write 'len' twice, which can lead to typos/mistakes. It also produces smaller code and kernel text. A quick grep shows 250+ places where memdup_user() may be used. I'll prepare a patchset to do this conversion. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:13 -07:00
KOSAKI Motohiro	d1d7487173	mm: remove pagevec_swap_free() pagevec_swap_free() is now unused. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:13 -07:00
Edward Shishkin	e3a7cca1ef	vfs: add/use account_page_dirtied() Add a helper function account_page_dirtied(). Use it from two callsites. reiser4 adds a function which adds a third callsite. Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:12 -07:00
KOSAKI Motohiro	ee99c71c59	mm: introduce for_each_populated_zone() macro Impact: cleanup In most cases, for_each_zone() is used with populated_zone(), because most functions don't need memoryless node information. Therefore, for_each_populated_zone() can help simplify the code. This patch has no functional change. [akpm@linux-foundation.org: small cleanup] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:11 -07:00
Oleg Nesterov	9de1581e75	get_mm_hiwater_xxx: trivial, s/define/inline/ Andrew pointed out that get_mm_hiwater_xxx() evaluate the "mm" argument thrice/twice; make them inline. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:11 -07:00
Alexey Dobriyan	0f043a81eb	proc tty: remove struct tty_operations::read_proc struct tty_operations::proc_fops took its place and there is one less create_proc_read_entry() user now! Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:10 -07:00
Alexey Dobriyan	ae149b6bec	proc tty: add struct tty_operations::proc_fops Used for the gradual switch of TTY drivers away from ->read_proc, which helps with the gradual switch away from ->read_proc for the whole tree. As a side effect, fix a possible race condition when ->data is initialized after the PDE is hooked into the proc tree. ->proc_fops takes precedence over ->read_proc. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:08 -07:00
Dmitri Vorobiev	ced117c73e	Remove two unneeded exports and make two symbols static in fs/mpage.c Commit `29a814d2ee` (vfs: add hooks for ext4's delayed allocation support) exported the following functions, mpage_bio_submit() and __mpage_writepage(), for the benefit of ext4's delayed allocation support. Since commit `a1d6cc563b` (ext4: Rework the ext4_da_writepages() function), these functions are not used by the ext4 driver anymore. However, the now unnecessary exports still remain, and this patch removes those. Moreover, these two functions can become static again. The issue was spotted by namespacecheck. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-04-01 07:38:54 -04:00


15644 Commits