License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX license identifier should be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
The criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                            # files
---------------------------------------------------|-------
GPL-2.0                                              11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". The results were:
SPDX license identifier                            # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                        930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                            # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note                       270
GPL-2.0+ WITH Linux-syscall-note                      169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
LGPL-2.1+ WITH Linux-syscall-note                      15
GPL-1.0+ WITH Linux-syscall-note                       14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
LGPL-2.0+ WITH Linux-syscall-note                       4
LGPL-2.1 WITH Linux-syscall-note                        3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
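The heuristics listed above can be pictured as a small decision function. This is only a sketch of the logic as described in this message, not the tooling that was actually used; the enum, the function names and the idea of passing each scanner's result as a string (or NULL for "nothing found") are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of comparing the two scanner results for one file. */
enum spdx_decision {
	SPDX_USE_DEFAULT,	/* neither scanner found a license */
	SPDX_USE_DETECTED,	/* both scanners agree on the license */
	SPDX_MANUAL_REVIEW,	/* scanners disagree: inspect by hand */
};

/* scan_a / scan_b: license reported by each scanner, NULL if none found. */
static enum spdx_decision spdx_decide(const char *scan_a, const char *scan_b)
{
	if (!scan_a && !scan_b)
		return SPDX_USE_DEFAULT;
	if (scan_a && scan_b && strcmp(scan_a, scan_b) == 0)
		return SPDX_USE_DETECTED;
	return SPDX_MANUAL_REVIEW;
}

/* Identifier applied when a file carries no license information at all. */
static const char *spdx_default_id(bool is_uapi)
{
	return is_uapi ? "GPL-2.0 WITH Linux-syscall-note" : "GPL-2.0";
}
```

In this sketch a disagreement never resolves automatically; it only routes the file to the manual inspection and, where needed, the legal confirmation described above.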
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were any new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
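The header/source distinction mentioned above comes down to which comment style wraps the tag. The sketch below is not the script that was used; it assumes the convention from the kernel's license rules documentation (`/* ... */` in C headers, `// ...` in C source files), and the helper names `has_suffix` and `spdx_tag_line` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Returns nonzero if name ends with suffix. */
static int has_suffix(const char *name, const char *suffix)
{
	size_t n = strlen(name), s = strlen(suffix);

	return n >= s && strcmp(name + n - s, suffix) == 0;
}

/*
 * Format the SPDX tag line for path into buf.
 * Returns the snprintf() result, or -1 for file types this sketch
 * does not handle (the real script covered more types).
 */
static int spdx_tag_line(char *buf, size_t len, const char *path,
			 const char *id)
{
	if (has_suffix(path, ".h"))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	if (has_suffix(path, ".c"))
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
	return -1;
}
```

For a header such as include/linux/nodemask.h with identifier GPL-2.0, this yields the block-comment form of the tag.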
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 21:07:57 +07:00
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __LINUX_NODEMASK_H
#define __LINUX_NODEMASK_H

/*
 * Nodemasks provide a bitmap suitable for representing the
 * set of Node's in a system, one bit position per Node number.
 *
 * See detailed comments in the file linux/bitmap.h describing the
 * data type on which these nodemasks are based.
 *
 * For details of nodemask_parse_user(), see bitmap_parse_user() in
 * lib/bitmap.c. For details of nodelist_parse(), see bitmap_parselist(),
 * also in bitmap.c. For details of node_remap(), see bitmap_bitremap in
 * lib/bitmap.c. For details of nodes_remap(), see bitmap_remap in
 * lib/bitmap.c. For details of nodes_onto(), see bitmap_onto in
 * lib/bitmap.c. For details of nodes_fold(), see bitmap_fold in
 * lib/bitmap.c.
 *
 * The available nodemask operations are:
 *
 * void node_set(node, mask)		turn on bit 'node' in mask
 * void node_clear(node, mask)		turn off bit 'node' in mask
 * void nodes_setall(mask)		set all bits
 * void nodes_clear(mask)		clear all bits
 * int node_isset(node, mask)		true iff bit 'node' set in mask
 * int node_test_and_set(node, mask)	test and set bit 'node' in mask
 *
 * void nodes_and(dst, src1, src2)	dst = src1 & src2 [intersection]
 * void nodes_or(dst, src1, src2)	dst = src1 | src2 [union]
 * void nodes_xor(dst, src1, src2)	dst = src1 ^ src2
 * void nodes_andnot(dst, src1, src2)	dst = src1 & ~src2
 * void nodes_complement(dst, src)	dst = ~src
 *
 * int nodes_equal(mask1, mask2)	Does mask1 == mask2?
 * int nodes_intersects(mask1, mask2)	Do mask1 and mask2 intersect?
 * int nodes_subset(mask1, mask2)	Is mask1 a subset of mask2?
 * int nodes_empty(mask)		Is mask empty (no bits sets)?
 * int nodes_full(mask)			Is mask full (all bits sets)?
 * int nodes_weight(mask)		Hamming weight - number of set bits
 *
 * void nodes_shift_right(dst, src, n)	Shift right
 * void nodes_shift_left(dst, src, n)	Shift left
 *
 * int first_node(mask)			Number lowest set bit, or MAX_NUMNODES
 * int next_node(node, mask)		Next node past 'node', or MAX_NUMNODES
 * int next_node_in(node, mask)		Next node past 'node', or wrap to first,
 *					or MAX_NUMNODES
 * int first_unset_node(mask)		First node not set in mask, or
 *					MAX_NUMNODES
 *
 * nodemask_t nodemask_of_node(node)	Return nodemask with bit 'node' set
 * NODE_MASK_ALL			Initializer - all bits set
 * NODE_MASK_NONE			Initializer - no bits set
 * unsigned long *nodes_addr(mask)	Array of unsigned long's in mask
 *
 * int nodemask_parse_user(ubuf, ulen, mask)	Parse ascii string as nodemask
 * int nodelist_parse(buf, map)		Parse ascii string as nodelist
 * int node_remap(oldbit, old, new)	newbit = map(old, new)(oldbit)
mempolicy: add bitmap_onto() and bitmap_fold() operations
The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
with the usual cpumask and nodemask wrappers.
The bitmap_onto() operator computes one bitmap relative to another. If the
n-th bit in the origin mask is set, then the m-th bit of the destination mask
will be set, where m is the position of the n-th set bit in the relative mask.
The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
the input bitmap has some bit n set, where m == n mod sz, for the specified sz
value.
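To make the bitmap_onto() description concrete, here is a user-space sketch of the same mapping on a single 32-bit word. It is not the kernel implementation (which works on arbitrary-length bitmaps in lib/bitmap.c); the name `mask_onto` and the fixed 32-bit width are assumptions for illustration.

```c
#include <stdint.h>

/*
 * For every set bit in relmap, taken in ascending order, let n be its
 * ordinal (0 for the lowest set bit) and m its bit position.  Bit m of
 * the result is set iff bit n of orig is set.  Bits of orig at or above
 * the number of set bits in relmap have nowhere to map and are dropped.
 */
static uint32_t mask_onto(uint32_t orig, uint32_t relmap)
{
	uint32_t dst = 0;
	unsigned int n = 0;	/* ordinal of the current set bit in relmap */
	unsigned int m;

	for (m = 0; m < 32; m++) {
		if (!(relmap & (UINT32_C(1) << m)))
			continue;
		if (orig & (UINT32_C(1) << n))
			dst |= UINT32_C(1) << m;
		n++;
	}
	return dst;
}
```

With relmap covering bits 8-15 and orig holding bits 0 and 2, the result holds bits 8 and 10: the "relative" positions 0 and 2 are translated into the absolute positions of the zeroth and second set bits of relmap.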
There are two substantive changes between this patch and its
predecessor bitmap_relative:
1) Renamed bitmap_relative() to be bitmap_onto().
2) Added bitmap_fold().
The essential motivation for bitmap_onto() is to provide a mechanism for
converting a cpuset-relative CPU or Node mask to an absolute mask. Cpuset
relative masks are written as if the current task were in a cpuset whose CPUs
or Nodes were just the consecutive ones numbered 0..N-1, for some N. The
bitmap_onto() operator is provided in anticipation of adding support for the
first such cpuset relative mask, by the mbind() and set_mempolicy() system
calls, using a planned flag of MPOL_F_RELATIVE_NODES. These bitmap operators
(and their nodemask wrappers, in particular) will be used in code that
converts the user specified cpuset relative memory policy to a specific system
node numbered policy, given the current mems_allowed of the task's cpuset.
Such cpuset relative mempolicies will address two deficiencies
of the existing interface between cpusets and mempolicies:
1) A task cannot at present reliably establish a cpuset
relative mempolicy because there is an essential race
condition, in that the task's cpuset may be changed in
between the time the task can query its cpuset placement,
and the time the task can issue the applicable mbind or
set_mempolicy system call.
2) A task cannot at present establish what cpuset relative
mempolicy it would like to have, if it is in a smaller
cpuset than it might have mempolicy preferences for,
because the existing interface only allows specifying
mempolicies for nodes currently allowed by the cpuset.
Cpuset relative mempolicies are useful for tasks that don't distinguish
particularly between one CPU or Node and another, but only between how many of
each are allowed, and the proper placement of threads and memory pages on the
various CPUs and Nodes available.
The motivation for the added bitmap_fold() can be seen in the following
example.
Let's say an application has specified some mempolicies that presume 16 memory
nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
relative) nodes 12-15. Then let's say that application is crammed into a
cpuset that only has 8 memory nodes, 0-7. If one just uses bitmap_onto(),
this mempolicy, mapped to that cpuset, would ignore the requested relative
nodes above 7, leaving it empty of nodes. That's not good; better to fold the
higher nodes down, so that some nodes are included in the resulting mapped
mempolicy. In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
specifying nodes 4-7.
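The fold in this example can be checked with a few lines of user-space C. This is a sketch on a single 32-bit word, not the kernel's bitmap_fold(); the name `mask_fold` is hypothetical, and sz is assumed to be between 1 and 32.

```c
#include <stdint.h>

/*
 * Fold orig into the low sz bits: bit (n mod sz) of the result is set
 * iff some bit n of orig is set.  The caller must pass 1 <= sz <= 32.
 */
static uint32_t mask_fold(uint32_t orig, unsigned int sz)
{
	uint32_t dst = 0;
	unsigned int n;

	for (n = 0; n < 32; n++)
		if (orig & (UINT32_C(1) << n))
			dst |= UINT32_C(1) << (n % sz);
	return dst;
}
```

Relative nodes 12-15 are the mask 0xF000; folding with sz = 8 gives 0x00F0, that is nodes 4-7, matching the result stated above.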
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: <kosaki.motohiro@jp.fujitsu.com>
Cc: <ray-lk@madrabbit.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 16:12:29 +07:00
 * void nodes_remap(dst, src, old, new)	*dst = map(old, new)(src)
 * void nodes_onto(dst, orig, relmap)	*dst = orig relative to relmap
 * void nodes_fold(dst, orig, sz)	dst bits = orig bits mod sz
 *
 * for_each_node_mask(node, mask)	for-loop node over mask
 *
 * int num_online_nodes()		Number of online Nodes
 * int num_possible_nodes()		Number of all possible Nodes
 *
 * int node_random(mask)		Random node with set bit in mask
 *
 * int node_online(node)		Is some node online?
 * int node_possible(node)		Is some node possible?
 *
 * node_set_online(node)		set bit 'node' in node_online_map
 * node_set_offline(node)		clear bit 'node' in node_online_map
 *
 * for_each_node(node)			for-loop node over node_possible_map
 * for_each_online_node(node)		for-loop node over node_online_map
 *
 * Subtlety:
 * 1) The 'type-checked' form of node_isset() causes gcc (3.3.2, anyway)
 *    to generate slightly worse code. So use a simple one-line #define
 *    for node_isset(), instead of wrapping an inline inside a macro, the
 *    way we do the other calls.
mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware
At first, init_task's mems_allowed is initialized like this:
init_task->mems_allowed == node_state[N_POSSIBLE]
And the cpuset's top_cpuset mask is initialized like this:
top_cpuset->mems_allowed = node_state[N_HIGH_MEMORY]
Before 2.6.29:
The policy's mems_allowed is initialized like this:
1. update tasks->mems_allowed by its cpuset->mems_allowed.
2. policy->mems_allowed = nodes_and(tasks->mems_allowed, user's mask)
The task's mems_allowed is updated with reference to top_cpuset's one.
The cpuset's mems_allowed is always aware of N_HIGH_MEMORY.
In 2.6.30: After commit 58568d2a8215cb6f55caf2332017d7bdff954e1c
("cpuset,mm: update tasks' mems_allowed in time"), the policy's mems_allowed
is initialized like this:
1. policy->mems_allowed = nodes_and(task->mems_allowed, user's mask)
Here, if the task is in top_cpuset, task->mems_allowed is not updated from
init's one. Assume the user executes a command such as
# numactl --interleave=all ...
policy->mems_allowed = nodes_and(N_POSSIBLE, ALL_SET_MASK)
Then the policy's mems_allowed can include a possible node that has no pgdat.
MPOL's INTERLEAVE just scans the nodemask of task->mems_allowed and accesses
NODE_DATA(nid)->zonelist directly, even if NODE_DATA(nid) == NULL.
What we need, then, is to make policy->mems_allowed aware of
N_HIGH_MEMORY. This patch does that. But to do so, an extra nodemask would
be on the stack. Because I know cpumask has a new interface,
CPUMASK_ALLOC(), I added the equivalent for nodemasks.
This patch keeps the old behavior. But I feel this fix itself is just a
Band-Aid. A fundamental fix would have to take care of memory
hotplug, and that takes time. (task->mems_allowed should be N_HIGH_MEMORY, I
think.)
mpol_set_nodemask() should be aware of N_HIGH_MEMORY, and the policy's nodemask
should include only online nodes.
In the old behavior, this was guaranteed by frequent references to the cpuset
code. Now most of those are removed, and mempolicy has to check it
itself.
To do the check, a few nodemask_t variables are used for calculating the
nodemask. But the size of nodemask_t can be big, and it is not good to
allocate them on the stack. cpumask_t now has CPUMASK_ALLOC/FREE, a simple
way to get a scratch area; NODEMASK_ALLOC/FREE should exist as well.
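The restriction described in this message amounts to one extra AND before the user's mask is applied. The sketch below models nodemasks as plain 64-bit words in user space; `policy_nodes` and its parameter names are hypothetical and are not the kernel's mpol_set_nodemask() interface.

```c
#include <stdint.h>

/*
 * user_mask:         nodes requested by the caller (e.g. "all")
 * task_allowed:      the task's mems_allowed (may be every possible node)
 * nodes_with_memory: stand-in for node_states[N_HIGH_MEMORY]
 *
 * The intermediate value plays the role of the scratch nodemask the
 * message refers to: nodes the task may use that actually have memory.
 */
static uint64_t policy_nodes(uint64_t user_mask, uint64_t task_allowed,
			     uint64_t nodes_with_memory)
{
	uint64_t usable = task_allowed & nodes_with_memory;

	return user_mask & usable;
}
```

With eight possible nodes (0xFF) of which only four have memory (0x0F), an "interleave over everything" request ends up confined to 0x0F, so a node without a pgdat is never selected.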
[akpm@linux-foundation.org: cleanups & tweaks]
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Paul Menage <menage@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 05:07:33 +07:00
 *
 * NODEMASK_SCRATCH
 * When doing above logical AND, OR, XOR, Remap operations the callers tend to
 * need temporary nodemask_t's on the stack. But if NODES_SHIFT is large,
 * nodemask_t's consume too much stack space. NODEMASK_SCRATCH is a helper
 * for such situations. See below and CPUMASK_ALLOC also.
 */

#include <linux/kernel.h>
#include <linux/threads.h>
#include <linux/bitmap.h>
#include <linux/numa.h>

typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t;
extern nodemask_t _unused_nodemask_arg_;

/**
 * nodemask_pr_args - printf args to output a nodemask
 * @maskp: nodemask to be printed
 *
 * Can be used to provide arguments for '%*pb[l]' when printing a nodemask.
 */
#define nodemask_pr_args(maskp)	__nodemask_pr_numnodes(maskp), \
				__nodemask_pr_bits(maskp)
static inline unsigned int __nodemask_pr_numnodes(const nodemask_t *m)
{
	return m ? MAX_NUMNODES : 0;
}
static inline const unsigned long *__nodemask_pr_bits(const nodemask_t *m)
{
	return m ? m->bits : NULL;
}

/*
 * The inline keyword gives the compiler room to decide to inline, or
 * not inline a function as it sees best. However, as these functions
 * are called in both __init and non-__init functions, if they are not
 * inlined we will end up with a section mis-match error (of the type of
 * freeable items not being freed). So we must use __always_inline here
 * to fix the problem. If other functions in the future also end up in
 * this situation they will also need to be annotated as __always_inline
 */
#define node_set(node, dst) __node_set((node), &(dst))
static __always_inline void __node_set(int node, volatile nodemask_t *dstp)
{
	set_bit(node, dstp->bits);
}

#define node_clear(node, dst) __node_clear((node), &(dst))
static inline void __node_clear(int node, volatile nodemask_t *dstp)
{
	clear_bit(node, dstp->bits);
}

#define nodes_setall(dst) __nodes_setall(&(dst), MAX_NUMNODES)
static inline void __nodes_setall(nodemask_t *dstp, unsigned int nbits)
{
	bitmap_fill(dstp->bits, nbits);
}

#define nodes_clear(dst) __nodes_clear(&(dst), MAX_NUMNODES)
static inline void __nodes_clear(nodemask_t *dstp, unsigned int nbits)
{
	bitmap_zero(dstp->bits, nbits);
}

/* No static inline type checking - see Subtlety (1) above. */
#define node_isset(node, nodemask) test_bit((node), (nodemask).bits)

#define node_test_and_set(node, nodemask) \
			__node_test_and_set((node), &(nodemask))
static inline int __node_test_and_set(int node, nodemask_t *addr)
{
	return test_and_set_bit(node, addr->bits);
}

#define nodes_and(dst, src1, src2) \
			__nodes_and(&(dst), &(src1), &(src2), MAX_NUMNODES)
static inline void __nodes_and(nodemask_t *dstp, const nodemask_t *src1p,
					const nodemask_t *src2p, unsigned int nbits)
{
	bitmap_and(dstp->bits, src1p->bits, src2p->bits, nbits);
}

#define nodes_or(dst, src1, src2) \
			__nodes_or(&(dst), &(src1), &(src2), MAX_NUMNODES)
static inline void __nodes_or(nodemask_t *dstp, const nodemask_t *src1p,
					const nodemask_t *src2p, unsigned int nbits)
{
	bitmap_or(dstp->bits, src1p->bits, src2p->bits, nbits);
}

#define nodes_xor(dst, src1, src2) \
			__nodes_xor(&(dst), &(src1), &(src2), MAX_NUMNODES)
static inline void __nodes_xor(nodemask_t *dstp, const nodemask_t *src1p,
					const nodemask_t *src2p, unsigned int nbits)
{
	bitmap_xor(dstp->bits, src1p->bits, src2p->bits, nbits);
}

#define nodes_andnot(dst, src1, src2) \
			__nodes_andnot(&(dst), &(src1), &(src2), MAX_NUMNODES)
static inline void __nodes_andnot(nodemask_t *dstp, const nodemask_t *src1p,
					const nodemask_t *src2p, unsigned int nbits)
{
	bitmap_andnot(dstp->bits, src1p->bits, src2p->bits, nbits);
}

#define nodes_complement(dst, src) \
			__nodes_complement(&(dst), &(src), MAX_NUMNODES)
static inline void __nodes_complement(nodemask_t *dstp,
					const nodemask_t *srcp, unsigned int nbits)
{
	bitmap_complement(dstp->bits, srcp->bits, nbits);
}

#define nodes_equal(src1, src2) \
			__nodes_equal(&(src1), &(src2), MAX_NUMNODES)
static inline int __nodes_equal(const nodemask_t *src1p,
					const nodemask_t *src2p, unsigned int nbits)
{
	return bitmap_equal(src1p->bits, src2p->bits, nbits);
}

#define nodes_intersects(src1, src2) \
			__nodes_intersects(&(src1), &(src2), MAX_NUMNODES)
static inline int __nodes_intersects(const nodemask_t *src1p,
					const nodemask_t *src2p, unsigned int nbits)
{
	return bitmap_intersects(src1p->bits, src2p->bits, nbits);
}

#define nodes_subset(src1, src2) \
			__nodes_subset(&(src1), &(src2), MAX_NUMNODES)
static inline int __nodes_subset(const nodemask_t *src1p,
					const nodemask_t *src2p, unsigned int nbits)
{
	return bitmap_subset(src1p->bits, src2p->bits, nbits);
}

#define nodes_empty(src) __nodes_empty(&(src), MAX_NUMNODES)
static inline int __nodes_empty(const nodemask_t *srcp, unsigned int nbits)
{
	return bitmap_empty(srcp->bits, nbits);
}

#define nodes_full(nodemask) __nodes_full(&(nodemask), MAX_NUMNODES)
static inline int __nodes_full(const nodemask_t *srcp, unsigned int nbits)
{
	return bitmap_full(srcp->bits, nbits);
}

#define nodes_weight(nodemask) __nodes_weight(&(nodemask), MAX_NUMNODES)
static inline int __nodes_weight(const nodemask_t *srcp, unsigned int nbits)
{
	return bitmap_weight(srcp->bits, nbits);
}

#define nodes_shift_right(dst, src, n) \
			__nodes_shift_right(&(dst), &(src), (n), MAX_NUMNODES)
static inline void __nodes_shift_right(nodemask_t *dstp,
					const nodemask_t *srcp, int n, int nbits)
{
	bitmap_shift_right(dstp->bits, srcp->bits, n, nbits);
}

#define nodes_shift_left(dst, src, n) \
			__nodes_shift_left(&(dst), &(src), (n), MAX_NUMNODES)
static inline void __nodes_shift_left(nodemask_t *dstp,
					const nodemask_t *srcp, int n, int nbits)
{
	bitmap_shift_left(dstp->bits, srcp->bits, n, nbits);
}

/* FIXME: better would be to fix all architectures to never return
          > MAX_NUMNODES, then the silly min_ts could be dropped. */

#define first_node(src) __first_node(&(src))
static inline int __first_node(const nodemask_t *srcp)
{
	return min_t(int, MAX_NUMNODES, find_first_bit(srcp->bits, MAX_NUMNODES));
}

#define next_node(n, src) __next_node((n), &(src))
static inline int __next_node(int n, const nodemask_t *srcp)
{
	return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1));
}

/*
 * Find the next present node in src, starting after node n, wrapping around to
 * the first node in src if needed. Returns MAX_NUMNODES if src is empty.
 */
#define next_node_in(n, src) __next_node_in((n), &(src))
int __next_node_in(int node, const nodemask_t *srcp);

static inline void init_nodemask_of_node(nodemask_t *mask, int node)
{
	nodes_clear(*mask);
	node_set(node, *mask);
}

#define nodemask_of_node(node) \
({ \
	typeof(_unused_nodemask_arg_) m; \
	if (sizeof(m) == sizeof(unsigned long)) { \
		m.bits[0] = 1UL << (node); \
	} else { \
		init_nodemask_of_node(&m, (node)); \
	} \
	m; \
})

#define first_unset_node(mask) __first_unset_node(&(mask))
static inline int __first_unset_node(const nodemask_t *maskp)
{
	return min_t(int,MAX_NUMNODES,
			find_first_zero_bit(maskp->bits, MAX_NUMNODES));
}

#define NODE_MASK_LAST_WORD BITMAP_LAST_WORD_MASK(MAX_NUMNODES)

#if MAX_NUMNODES <= BITS_PER_LONG

#define NODE_MASK_ALL \
((nodemask_t) { { \
	[BITS_TO_LONGS(MAX_NUMNODES)-1] = NODE_MASK_LAST_WORD \
} })

#else

#define NODE_MASK_ALL \
((nodemask_t) { { \
	[0 ... BITS_TO_LONGS(MAX_NUMNODES)-2] = ~0UL, \
	[BITS_TO_LONGS(MAX_NUMNODES)-1] = NODE_MASK_LAST_WORD \
} })

#endif

#define NODE_MASK_NONE \
((nodemask_t) { { \
	[0 ... BITS_TO_LONGS(MAX_NUMNODES)-1] = 0UL \
} })

#define nodes_addr(src) ((src).bits)

#define nodemask_parse_user(ubuf, ulen, dst) \
		__nodemask_parse_user((ubuf), (ulen), &(dst), MAX_NUMNODES)
static inline int __nodemask_parse_user(const char __user *buf, int len,
					nodemask_t *dstp, int nbits)
{
	return bitmap_parse_user(buf, len, dstp->bits, nbits);
}

#define nodelist_parse(buf, dst) __nodelist_parse((buf), &(dst), MAX_NUMNODES)
static inline int __nodelist_parse(const char *buf, nodemask_t *dstp, int nbits)
{
	return bitmap_parselist(buf, dstp->bits, nbits);
}

#define node_remap(oldbit, old, new) \
		__node_remap((oldbit), &(old), &(new), MAX_NUMNODES)
static inline int __node_remap(int oldbit,
		const nodemask_t *oldp, const nodemask_t *newp, int nbits)
{
	return bitmap_bitremap(oldbit, oldp->bits, newp->bits, nbits);
}

#define nodes_remap(dst, src, old, new) \
		__nodes_remap(&(dst), &(src), &(old), &(new), MAX_NUMNODES)
static inline void __nodes_remap(nodemask_t *dstp, const nodemask_t *srcp,
		const nodemask_t *oldp, const nodemask_t *newp, int nbits)
{
	bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits);
}

mempolicy: add bitmap_onto() and bitmap_fold() operations
The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
with the usual cpumask and nodemask wrappers.
The bitmap_onto() operator computes one bitmap relative to another. If the
n-th bit in the origin mask is set, then the m-th bit of the destination mask
will be set, where m is the position of the n-th set bit in the relative mask.
The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
the input bitmap has some bit n set, where m == n mod sz, for the specified sz
value.
There are two substantive changes between this patch and its
predecessor bitmap_relative:
1) Renamed bitmap_relative() to be bitmap_onto().
2) Added bitmap_fold().
The essential motivation for bitmap_onto() is to provide a mechanism for
converting a cpuset-relative CPU or Node mask to an absolute mask. Cpuset
relative masks are written as if the current task were in a cpuset whose CPUs
or Nodes were just the consecutive ones numbered 0..N-1, for some N. The
bitmap_onto() operator is provided in anticipation of adding support for the
first such cpuset relative mask, by the mbind() and set_mempolicy() system
calls, using a planned flag of MPOL_F_RELATIVE_NODES. These bitmap operators
(and their nodemask wrappers, in particular) will be used in code that
converts the user specified cpuset relative memory policy to a specific system
node numbered policy, given the current mems_allowed of the tasks cpuset.
Such cpuset relative mempolicies will address two deficiencies
of the existing interface between cpusets and mempolicies:
1) A task cannot at present reliably establish a cpuset
relative mempolicy because there is an essential race
condition, in that the tasks cpuset may be changed in
between the time the task can query its cpuset placement,
and the time the task can issue the applicable mbind or
set_memplicy system call.
2) A task cannot at present establish what cpuset relative
mempolicy it would like to have, if it is in a smaller
cpuset than it might have mempolicy preferences for,
because the existing interface only allows specifying
mempolicies for nodes currently allowed by the cpuset.
Cpuset relative mempolicies are useful for tasks that don't distinguish
particularly between one CPU or Node and another, but only between how many of
each are allowed, and the proper placement of threads and memory pages on the
various CPUs and Nodes available.
The motivation for the added bitmap_fold() can be seen in the following
example.
Let's say an application has specified some mempolicies that presume 16 memory
nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
relative) nodes 12-15. Then let's say that application is crammed into a
cpuset that only has 8 memory nodes, 0-7. If one just uses bitmap_onto(),
this mempolicy, mapped to that cpuset, would ignore the requested relative
nodes above 7, leaving it empty of nodes. That's not good; better to fold the
higher nodes down, so that some nodes are included in the resulting mapped
mempolicy. In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
specifying nodes 4-7.
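The folding example above can be checked with a small user-space sketch. This is not the kernel implementation: it models a bitmap as a single 64-bit word, and the names mask_onto() and mask_fold() are hypothetical stand-ins for bitmap_onto() and bitmap_fold(), written only from the descriptions given in this message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for bitmap_onto() on one 64-bit word: if bit n
 * of orig is set, set bit m of the result, where m is the position of
 * the n-th set bit of relmap. Bits of orig at or beyond the weight of
 * relmap have nowhere to land and are dropped.
 */
static uint64_t mask_onto(uint64_t orig, uint64_t relmap)
{
	uint64_t dst = 0;
	unsigned int n = 0;	/* ordinal of the current set bit in relmap */
	unsigned int m;

	for (m = 0; m < 64; m++) {
		if (!(relmap & (UINT64_C(1) << m)))
			continue;
		if (orig & (UINT64_C(1) << n))
			dst |= UINT64_C(1) << m;
		n++;
	}
	return dst;
}

/*
 * Hypothetical stand-in for bitmap_fold(): bit (n mod sz) of the result
 * is set iff some bit n of orig is set.
 */
static uint64_t mask_fold(uint64_t orig, unsigned int sz)
{
	uint64_t dst = 0;
	unsigned int n;

	assert(sz > 0 && sz <= 64);
	for (n = 0; n < 64; n++)
		if (orig & (UINT64_C(1) << n))
			dst |= UINT64_C(1) << (n % sz);
	return dst;
}
```

With relative nodes 12-15 (0xF000) and a cpuset allowing nodes 0-7 (0xFF), mask_onto(0xF000, 0xFF) is empty, while mask_fold(0xF000, 8) gives 0xF0 (nodes 4-7), which mask_onto() then maps unchanged onto nodes 0-7.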
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: <kosaki.motohiro@jp.fujitsu.com>
Cc: <ray-lk@madrabbit.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 16:12:29 +07:00
#define nodes_onto(dst, orig, relmap) \
		__nodes_onto(&(dst), &(orig), &(relmap), MAX_NUMNODES)
static inline void __nodes_onto(nodemask_t *dstp, const nodemask_t *origp,
		const nodemask_t *relmapp, int nbits)
{
	bitmap_onto(dstp->bits, origp->bits, relmapp->bits, nbits);
}

#define nodes_fold(dst, orig, sz) \
		__nodes_fold(&(dst), &(orig), sz, MAX_NUMNODES)
static inline void __nodes_fold(nodemask_t *dstp, const nodemask_t *origp,
		int sz, int nbits)
{
	bitmap_fold(dstp->bits, origp->bits, sz, nbits);
}

2005-04-17 05:20:36 +07:00
#if MAX_NUMNODES > 1
#define for_each_node_mask(node, mask) \
	for ((node) = first_node(mask); \
		(node) < MAX_NUMNODES; \
		(node) = next_node((node), (mask)))
#else /* MAX_NUMNODES == 1 */
#define for_each_node_mask(node, mask) \
	if (!nodes_empty(mask)) \
		for ((node) = 0; (node) < 1; (node)++)
#endif /* MAX_NUMNODES */

Memoryless nodes: Generic management of nodemasks for various purposes
Why do we need to support memoryless nodes?
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> For fujitsu, problem is called "empty" node.
>
> When ACPI's SRAT table includes "possible nodes", ia64 bootstrap(acpi_numa_init)
> creates nodes, which includes no memory, no cpu.
>
> I tried to remove empty-node in past, but that was denied.
> It was because we can hot-add cpu to the empty node.
> (node-hotplug triggered by cpu is not implemented now. and it will be ugly.)
>
>
> For HP, (Lee can comment on this later), they have memory-less-node.
> As far as I hear, HP's machine can have following configuration.
>
> (example)
> Node0: CPU0 memory AAA MB
> Node1: CPU1 memory AAA MB
> Node2: CPU2 memory AAA MB
> Node3: CPU3 memory AAA MB
> Node4: Memory XXX GB
>
> AAA is very small value (below 16MB) and will be omitted by ia64 bootstrap.
> After boot, only Node 4 has valid memory (but have no cpu.)
>
> Maybe this is memory-interleave by firmware config.
Christoph Lameter <clameter@sgi.com> wrote:
> Future SGI platforms (actually also current one can have but nothing like
> that is deployed to my knowledge) have nodes with only cpus. Current SGI
> platforms have nodes with just I/O that we so far cannot manage in the
> core. So the arch code maps them to the nearest memory node.
Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> For the HP platforms, we can configure each cell with from 0% to 100%
> "cell local memory". When we configure with <100% CLM, the "missing
> percentages" are interleaved by hardware on a cache-line granularity to
> improve bandwidth at the expense of latency for numa-challenged
> applications [and OSes, but not our problem ;-)]. When we boot Linux on
> such a config, all of the real nodes have no memory--it all resides in a
> single interleaved pseudo-node.
>
> When we boot Linux on a 100% CLM configuration [== NUMA], we still have
> the interleaved pseudo-node. It contains a few hundred MB stolen from
> the real nodes to contain the DMA zone. [Interleaved memory resides at
> phys addr 0]. The memoryless-nodes patches, along with the zoneorder
> patches, support this config as well.
>
> Also, when we boot a NUMA config with the "mem=" command line,
> specifying less memory than actually exists, Linux takes the excluded
> memory "off the top" rather than distributing it across the nodes. This
> can result in memoryless nodes, as well.
>
This patch:
Preparation for memoryless node patches.
Provide a generic way to keep nodemasks describing various characteristics of
NUMA nodes.
Remove node_online_map and node_possible_map and realize the same
functionality using two node states: N_POSSIBLE and N_ONLINE.
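The idea of one mask per node state, indexed by an enum, can be sketched in user space as follows. This is only an illustration under stated assumptions: every sketch_* name is hypothetical, a nodemask is reduced to a single 64-bit word, and only the two states named in this description are modelled.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 64	/* assumption: at most 64 nodes, one word */

/* Only the two states named in the patch description are modelled. */
enum sketch_state {
	SKETCH_N_POSSIBLE,
	SKETCH_N_ONLINE,
	SKETCH_NR_NODE_STATES
};

/* One bitmask per state, instead of a separate global map per property. */
static uint64_t sketch_node_states[SKETCH_NR_NODE_STATES];

static int sketch_node_state(int node, enum sketch_state state)
{
	assert(node >= 0 && node < SKETCH_MAX_NODES);
	return (int)((sketch_node_states[state] >> node) & 1);
}

static void sketch_node_set_state(int node, enum sketch_state state)
{
	assert(node >= 0 && node < SKETCH_MAX_NODES);
	sketch_node_states[state] |= UINT64_C(1) << node;
}

static void sketch_node_clear_state(int node, enum sketch_state state)
{
	assert(node >= 0 && node < SKETCH_MAX_NODES);
	sketch_node_states[state] &= ~(UINT64_C(1) << node);
}

/* Number of nodes currently in the given state. */
static int sketch_num_node_state(enum sketch_state state)
{
	uint64_t bits = sketch_node_states[state];
	int count = 0;

	while (bits) {
		bits &= bits - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}
```

Adding a further per-node property in this scheme only needs a new enumerator before SKETCH_NR_NODE_STATES; the accessors stay unchanged.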
[Lee.Schermerhorn@hp.com: Initialize N_*_MEMORY and N_CPU masks for non-NUMA config]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Bob Picco <bob.picco@hp.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 15:25:27 +07:00
/*
 * Bitmasks that are kept for all the nodes.
 */
enum node_states {
2007-10-16 15:25:29 +07:00
	N_POSSIBLE,		/* The node could become online at some point */
	N_ONLINE,		/* The node is online */
	N_NORMAL_MEMORY,	/* The node has regular memory */
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,		/* The node has regular or high memory */
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
2012-12-13 04:52:00 +07:00
	N_MEMORY,		/* The node has memory(regular, high, movable) */
2007-10-16 15:25:36 +07:00
	N_CPU,		/* The node has one or more cpus */
2007-10-16 15:25:27 +07:00
	NR_NODE_STATES
};

2005-04-17 05:20:36 +07:00
/*
 * The following particular system nodemasks and operations
 * on them manage all possible and online nodes.
 */

2007-10-16 15:25:27 +07:00
extern nodemask_t node_states[NR_NODE_STATES];
2005-04-17 05:20:36 +07:00

#if MAX_NUMNODES > 1
2007-10-16 15:25:27 +07:00
static inline int node_state(int node, enum node_states state)
{
	return node_isset(node, node_states[state]);
}

static inline void node_set_state(int node, enum node_states state)
{
	__node_set(node, &node_states[state]);
}

static inline void node_clear_state(int node, enum node_states state)
{
	__node_clear(node, &node_states[state]);
}

static inline int num_node_state(enum node_states state)
{
	return nodes_weight(node_states[state]);
}

#define for_each_node_state(__node, __state) \
	for_each_node_mask((__node), node_states[__state])

#define first_online_node first_node(node_states[N_ONLINE])
2014-08-07 06:07:50 +07:00
#define first_memory_node first_node(node_states[N_MEMORY])
static inline int next_online_node(int nid)
{
	return next_node(nid, node_states[N_ONLINE]);
}
static inline int next_memory_node(int nid)
{
	return next_node(nid, node_states[N_MEMORY]);
}
2007-10-16 15:25:27 +07:00
2007-02-21 04:57:51 +07:00
extern int nr_node_ids;
2009-06-17 05:32:15 +07:00
extern int nr_online_nodes;

static inline void node_set_online(int nid)
{
	node_set_state(nid, N_ONLINE);
	nr_online_nodes = num_node_state(N_ONLINE);
}

static inline void node_set_offline(int nid)
{
	node_clear_state(nid, N_ONLINE);
	nr_online_nodes = num_node_state(N_ONLINE);
}
2011-07-27 06:08:30 +07:00
2005-04-17 05:20:36 +07:00
#else
2007-10-16 15:25:27 +07:00

static inline int node_state(int node, enum node_states state)
{
	return node == 0;
}

static inline void node_set_state(int node, enum node_states state)
{
}

static inline void node_clear_state(int node, enum node_states state)
{
}

static inline int num_node_state(enum node_states state)
{
	return 1;
}

#define for_each_node_state(node, __state) \
	for ( (node) = 0; (node) == 0; (node) = 1)

2006-03-27 16:15:57 +07:00
#define first_online_node 0
2014-08-07 06:07:50 +07:00
#define first_memory_node 0
2006-03-27 16:15:57 +07:00
#define next_online_node(nid) (MAX_NUMNODES)
2007-02-21 04:57:51 +07:00
#define nr_node_ids 1
2009-06-17 05:32:15 +07:00
#define nr_online_nodes 1
2007-10-16 15:25:27 +07:00
2009-06-17 05:32:15 +07:00
#define node_set_online(node) node_set_state((node), N_ONLINE)
#define node_set_offline(node) node_clear_state((node), N_ONLINE)
2011-07-27 06:08:30 +07:00

#endif

#if defined(CONFIG_NUMA) && (MAX_NUMNODES > 1)
extern int node_random(const nodemask_t *maskp);
#else
static inline int node_random(const nodemask_t *mask)
{
	return 0;
}
2005-04-17 05:20:36 +07:00
#endif

Memoryless nodes: Generic management of nodemasks for various purposes
Why do we need to support memoryless nodes?
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> For fujitsu, problem is called "empty" node.
>
> When ACPI's SRAT table includes "possible nodes", ia64 bootstrap(acpi_numa_init)
> creates nodes, which includes no memory, no cpu.
>
> I tried to remove empty-node in past, but that was denied.
> It was because we can hot-add cpu to the empty node.
> (node-hotplug triggered by cpu is not implemented now. and it will be ugly.)
>
>
> For HP, (Lee can comment on this later), they have memory-less-node.
> As far as I hear, HP's machine can have following configration.
>
> (example)
> Node0: CPU0 memory AAA MB
> Node1: CPU1 memory AAA MB
> Node2: CPU2 memory AAA MB
> Node3: CPU3 memory AAA MB
> Node4: Memory XXX GB
>
> AAA is very small value (below 16MB) and will be omitted by ia64 bootstrap.
> After boot, only Node 4 has valid memory (but have no cpu.)
>
> Maybe this is memory-interleave by firmware config.
Christoph Lameter <clameter@sgi.com> wrote:
> Future SGI platforms (actually also current one can have but nothing like
> that is deployed to my knowledge) have nodes with only cpus. Current SGI
> platforms have nodes with just I/O that we so far cannot manage in the
> core. So the arch code maps them to the nearest memory node.
Lee Schermerhorn <Lee.Schermerhorn@hp.com> wrote:
> For the HP platforms, we can configure each cell with from 0% to 100%
> "cell local memory". When we configure with <100% CLM, the "missing
> percentages" are interleaved by hardware on a cache-line granularity to
> improve bandwidth at the expense of latency for numa-challenged
> applications [and OSes, but not our problem ;-)]. When we boot Linux on
> such a config, all of the real nodes have no memory--it all resides in a
> single interleaved pseudo-node.
>
> When we boot Linux on a 100% CLM configuration [== NUMA], we still have
> the interleaved pseudo-node. It contains a few hundred MB stolen from
> the real nodes to contain the DMA zone. [Interleaved memory resides at
> phys addr 0]. The memoryless-nodes patches, along with the zoneorder
> patches, support this config as well.
>
> Also, when we boot a NUMA config with the "mem=" command line,
> specifying less memory than actually exists, Linux takes the excluded
> memory "off the top" rather than distributing it across the nodes. This
> can result in memoryless nodes, as well.
>
This patch:
Preparation for memoryless node patches.
Provide a generic way to keep nodemasks describing various characteristics of
NUMA nodes.
Remove the node_online_map and the node_possible_map and realize the same
functionality using two node states: N_POSSIBLE and N_ONLINE.
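The idea of keeping one nodemask per node state can be sketched in user space as follows. This is only an illustration under stated assumptions: the single-word `nodemask_t`, the 64-node limit, and the function bodies are simplified stand-ins, not the kernel implementation; only the names `node_states`, `N_POSSIBLE`, `N_ONLINE` and the wrapper macros mirror the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NUMNODES 64	/* assumption: every node fits in one 64-bit word */

/* Simplified stand-in for the kernel's nodemask_t (really a bitmap array). */
typedef struct { uint64_t bits; } nodemask_t;

enum node_states { N_POSSIBLE, N_ONLINE, NR_NODE_STATES };

/* One mask per state instead of separate online/possible maps. */
static nodemask_t node_states[NR_NODE_STATES];

static inline bool node_state(int node, enum node_states state)
{
	assert(node >= 0 && node < MAX_NUMNODES);
	return (node_states[state].bits >> node) & 1;
}

static inline void node_set_state(int node, enum node_states state)
{
	assert(node >= 0 && node < MAX_NUMNODES);
	node_states[state].bits |= UINT64_C(1) << node;
}

static inline void node_clear_state(int node, enum node_states state)
{
	assert(node >= 0 && node < MAX_NUMNODES);
	node_states[state].bits &= ~(UINT64_C(1) << node);
}

/* Count the nodes that are in the given state. */
static inline int num_node_state(enum node_states state)
{
	uint64_t w = node_states[state].bits;
	int n = 0;

	for (; w; w &= w - 1)	/* clear the lowest set bit each round */
		n++;
	return n;
}

/* The old per-map helpers become thin wrappers over the generic ones. */
#define node_online(node)	node_state((node), N_ONLINE)
#define node_possible(node)	node_state((node), N_POSSIBLE)
#define num_online_nodes()	num_node_state(N_ONLINE)
#define num_possible_nodes()	num_node_state(N_POSSIBLE)
#define node_set_online(node)	node_set_state((node), N_ONLINE)
#define node_set_offline(node)	node_clear_state((node), N_ONLINE)
```

Adding a further state in this scheme would only need a new enum entry; the generic helpers stay unchanged.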
[Lee.Schermerhorn@hp.com: Initialize N_*_MEMORY and N_CPU masks for non-NUMA config]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Tested-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Bob Picco <bob.picco@hp.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@skynet.ie>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 15:25:27 +07:00
|
|
|
#define node_online_map node_states[N_ONLINE]
|
|
|
|
#define node_possible_map node_states[N_POSSIBLE]
|
|
|
|
|
|
|
|
#define num_online_nodes() num_node_state(N_ONLINE)
|
|
|
|
#define num_possible_nodes() num_node_state(N_POSSIBLE)
|
|
|
|
#define node_online(node) node_state((node), N_ONLINE)
|
|
|
|
#define node_possible(node) node_state((node), N_POSSIBLE)
|
|
|
|
|
|
|
|
#define for_each_node(node) for_each_node_state(node, N_POSSIBLE)
|
|
|
|
#define for_each_online_node(node) for_each_node_state(node, N_ONLINE)
|
2005-04-17 05:20:36 +07:00
|
|
|
|
mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware
At first, init_task's mems_allowed is initialized like this:
init_task->mems_allowed == node_states[N_POSSIBLE]
And cpuset's top_cpuset mask is initialized like this:
top_cpuset->mems_allowed = node_states[N_HIGH_MEMORY]
Before 2.6.29:
policy's mems_allowed is initialized like this:
1. update task->mems_allowed from its cpuset->mems_allowed.
2. policy->mems_allowed = nodes_and(task->mems_allowed, user's mask)
The task's mems_allowed is updated with reference to top_cpuset's.
cpuset's mems_allowed is always aware of N_HIGH_MEMORY.
In 2.6.30: After commit 58568d2a8215cb6f55caf2332017d7bdff954e1c
("cpuset,mm: update tasks' mems_allowed in time"), policy's mems_allowed
is initialized like this:
1. policy->mems_allowed = nodes_and(task->mems_allowed, user's mask)
Here, if the task is in top_cpuset, task->mems_allowed is not updated from
init's. Assume the user executes a command such as
# numactl --interleave=all ...
policy->mems_allowed = nodes_and(N_POSSIBLE, ALL_SET_MASK)
Then policy's mems_allowed can include a possible node that has no pgdat.
MPOL's INTERLEAVE just scans the nodemask of task->mems_allowed and accesses
NODE_DATA(nid)->zonelist directly, even if NODE_DATA(nid)==NULL.
So what we need is to make policy->mems_allowed aware of
N_HIGH_MEMORY. This patch does that. But to do so, an extra nodemask would
be on the stack. Because I know cpumask has a new interface,
CPUMASK_ALLOC(), I added an equivalent for nodemask.
This patch follows the old behavior. But I feel this fix itself is just a
Band-Aid. A fundamental fix would have to take care of memory
hotplug, and that takes time. (task->mems_allowed should be N_HIGH_MEMORY, I
think.)
mpol_set_nodemask() should be aware of N_HIGH_MEMORY, and policy's nodemask
should include only online nodes.
In the old behavior, this was guaranteed by frequent references to cpuset's
code. Now most of them are removed and mempolicy has to check it by
itself.
To do the check, a few nodemask_t will be used for calculating the nodemask. But
the size of nodemask_t can be big and it's not good to allocate them on the stack.
Now cpumask_t has CPUMASK_ALLOC/FREE, an easy way to get a scratch area.
NODEMASK_ALLOC/FREE should exist as well.
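The masking problem described above can be illustrated with a small user-space sketch. Everything here is hypothetical: the one-word `nodemask_t`, `mask_and()`, and the two `policy_nodes_*()` helpers are stand-ins invented for the example, not the kernel's mpol_set_nodemask(). It only shows why intersecting with the set of nodes that have memory removes possible-but-empty nodes from the policy mask.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for nodemask_t: one bit per node, at most 64 nodes. */
typedef struct { uint64_t bits; } nodemask_t;

static inline nodemask_t mask_and(nodemask_t a, nodemask_t b)
{
	nodemask_t r = { a.bits & b.bits };
	return r;
}

/* Problem case: only the task's allowed nodes limit the user's mask. */
static inline nodemask_t policy_nodes_unfiltered(nodemask_t task_allowed,
						 nodemask_t user_mask)
{
	return mask_and(task_allowed, user_mask);
}

/* Fixed case: additionally restrict to nodes that actually have memory. */
static inline nodemask_t policy_nodes_filtered(nodemask_t task_allowed,
					       nodemask_t user_mask,
					       nodemask_t has_memory)
{
	return mask_and(mask_and(task_allowed, user_mask), has_memory);
}

/* Usage example: nodes 0-3 are possible, only nodes 0 and 1 have memory. */
static inline void policy_mask_example(void)
{
	nodemask_t possible = { 0xF };
	nodemask_t has_memory = { 0x3 };
	nodemask_t all = { ~UINT64_C(0) };	/* like --interleave=all */

	/* Without the filter, memoryless nodes 2 and 3 stay in the mask. */
	assert(policy_nodes_unfiltered(possible, all).bits == 0xF);
	/* With the filter, only nodes 0 and 1 remain. */
	assert(policy_nodes_filtered(possible, all, has_memory).bits == 0x3);
}
```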
[akpm@linux-foundation.org: cleanups & tweaks]
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Paul Menage <menage@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 05:07:33 +07:00
|
|
|
/*
|
2009-12-15 08:58:38 +07:00
|
|
|
* For nodemask scratch area.
|
|
|
|
* NODEMASK_ALLOC(type, name, gfp_flags) allocates an object with a specified type and
|
|
|
|
* name.
|
2009-08-07 05:07:33 +07:00
|
|
|
*/
|
2009-12-15 08:58:38 +07:00
|
|
|
#if NODES_SHIFT > 8 /* nodemask_t > 32 bytes */
|
|
|
|
#define NODEMASK_ALLOC(type, name, gfp_flags) \
|
|
|
|
type *name = kmalloc(sizeof(*name), gfp_flags)
|
|
|
|
#define NODEMASK_FREE(m) kfree(m)
|
2009-08-07 05:07:33 +07:00
|
|
|
#else
|
2010-03-11 06:22:42 +07:00
|
|
|
#define NODEMASK_ALLOC(type, name, gfp_flags) type _##name, *name = &_##name
|
2009-12-15 08:58:38 +07:00
|
|
|
#define NODEMASK_FREE(m) do {} while (0)
|
2009-08-07 05:07:33 +07:00
|
|
|
#endif
|
|
|
|
|
|
|
|
/* An example structure for using NODEMASK_ALLOC, used in mempolicy. */
|
|
|
|
struct nodemask_scratch {
|
|
|
|
nodemask_t mask1;
|
|
|
|
nodemask_t mask2;
|
|
|
|
};
|
|
|
|
|
2009-12-15 08:58:38 +07:00
|
|
|
#define NODEMASK_SCRATCH(x) \
|
|
|
|
NODEMASK_ALLOC(struct nodemask_scratch, x, \
|
|
|
|
GFP_KERNEL | __GFP_NORETRY)
|
2009-12-15 08:58:13 +07:00
|
|
|
#define NODEMASK_SCRATCH_FREE(x) NODEMASK_FREE(x)
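The allocate-or-use-stack pattern behind these macros can be tried out in user space with the sketch below. All `DEMO_`/`demo_` names are hypothetical; `malloc()`/`free()` stand in for `kmalloc()`/`kfree()`, there is no gfp_flags argument, and `DEMO_NODES_SHIFT` is set to 10 as an assumption so that the heap branch is the one compiled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_NODES_SHIFT 10	/* assumption: 1024 nodes, a 128-byte mask */
#define DEMO_MAX_NODES   (1 << DEMO_NODES_SHIFT)
#define DEMO_WORDS       (DEMO_MAX_NODES / 64)

typedef struct { uint64_t bits[DEMO_WORDS]; } demo_nodemask_t;

#if DEMO_NODES_SHIFT > 8	/* large mask: keep it off the stack */
#define DEMO_NODEMASK_ALLOC(type, name)	type *name = malloc(sizeof(*name))
#define DEMO_NODEMASK_FREE(m)		free(m)
#else				/* small mask: a local object is fine */
#define DEMO_NODEMASK_ALLOC(type, name)	type _##name, *name = &_##name
#define DEMO_NODEMASK_FREE(m)		do {} while (0)
#endif

struct demo_scratch {
	demo_nodemask_t mask1;
	demo_nodemask_t mask2;
};

static inline int demo_weight(const demo_nodemask_t *m)
{
	int n = 0;

	for (int i = 0; i < DEMO_WORDS; i++)
		for (uint64_t w = m->bits[i]; w; w &= w - 1)
			n++;
	return n;
}

/*
 * Count nodes set in both masks (*common) and in either mask (*either),
 * using scratch space for the intermediate masks.
 * Returns 0 on success, -1 if the scratch area could not be allocated.
 */
static inline int demo_overlap(const demo_nodemask_t *a,
			       const demo_nodemask_t *b,
			       int *common, int *either)
{
	DEMO_NODEMASK_ALLOC(struct demo_scratch, scratch);

	if (!scratch)
		return -1;
	for (int i = 0; i < DEMO_WORDS; i++) {
		scratch->mask1.bits[i] = a->bits[i] & b->bits[i];
		scratch->mask2.bits[i] = a->bits[i] | b->bits[i];
	}
	*common = demo_weight(&scratch->mask1);
	*either = demo_weight(&scratch->mask2);
	DEMO_NODEMASK_FREE(scratch);
	return 0;
}
```

The caller always works through a pointer, so the same code compiles unchanged for both branches. Only the heap variant can fail, which is why the NULL check is still written unconditionally.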
|
2009-08-07 05:07:33 +07:00
|
|
|
|
|
|
|
|
2005-04-17 05:20:36 +07:00
|
|
|
#endif /* __LINUX_NODEMASK_H */
|