Commit Graph

888077 Commits

Author SHA1 Message Date
Dmitry Safonov
61c5767603 selftests/timens: Add Time Namespace test for supported clocks
A test to check that all supported clocks work on host and inside
a new time namespace. Use both ways to get time: through VDSO and
by entering the kernel with implicit syscall.

Introduce a new timens directory in selftests framework for
the next timens tests.

Output on success:
 1..10
 ok 1 Passed for CLOCK_BOOTTIME (syscall)
 ok 2 Passed for CLOCK_BOOTTIME (vdso)
 ok 3 Passed for CLOCK_BOOTTIME_ALARM (syscall)
 ok 4 Passed for CLOCK_BOOTTIME_ALARM (vdso)
 ok 5 Passed for CLOCK_MONOTONIC (syscall)
 ok 6 Passed for CLOCK_MONOTONIC (vdso)
 ok 7 Passed for CLOCK_MONOTONIC_COARSE (syscall)
 ok 8 Passed for CLOCK_MONOTONIC_COARSE (vdso)
 ok 9 Passed for CLOCK_MONOTONIC_RAW (syscall)
 ok 10 Passed for CLOCK_MONOTONIC_RAW (vdso)
 # Pass 10 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0

Output with lack of permissions:
 1..10
 not ok 1 # SKIP need to run as root

Output without support of time namespaces:
 1..10
 not ok 1 # SKIP Time namespaces are not supported

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-29-dima@arista.com
2020-01-14 12:21:00 +01:00
Andrei Vagin
04a8682a71 fs/proc: Introduce /proc/pid/timens_offsets
API to set time namespace offsets for children processes, i.e.:
echo "$clockid $offset_sec $offset_nsec" > /proc/self/timens_offsets

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-28-dima@arista.com
2020-01-14 12:20:59 +01:00
Dmitry Safonov
70ddf65184 x86/vdso: Zap vvar pages when switching to a time namespace
The VVAR page layout depends on whether a task belongs to the root or
non-root time namespace. Whenever a task changes its namespace, the VVAR
page tables are cleared and then they will be re-faulted with a
corresponding layout.

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-27-dima@arista.com
2020-01-14 12:20:59 +01:00
Dmitry Safonov
e6b28ec65b x86/vdso: On timens page fault prefault also VVAR page
As timens page has offsets to data on VVAR page VVAR is going
to be accessed shortly. Set it up with timens in one page fault
as optimization.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-26-dima@arista.com
2020-01-14 12:20:59 +01:00
Dmitry Safonov
af34ebeb86 x86/vdso: Handle faults on timens page
If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page.

Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-25-dima@arista.com
2020-01-14 12:20:58 +01:00
Dmitry Safonov
afaa7b5ac7 time: Allocate per-timens vvar page
VDSO support for Time namespace needs to set up a page with the same
layout as VVAR. That timens page will be placed on position of VVAR page
inside namespace. That page contains time namespace clock offsets and it
has vdso_data->seq set to 1 to enforce the slow path and
vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace
handling path.

Allocate the timens page during namespace creation. Setup the offsets
when the first task enters the ns and freeze them to guarantee the pace
of monotonic/boottime clocks and to avoid breakage of applications.

The design decision is to have a global offset_lock which is used during
namespace offsets setup and to freeze offsets when the first task joins the
new time namespace. That is better in terms of memory usage compared to
having a per namespace mutex that's used only during the setup period.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Based-on-work-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-24-dima@arista.com
2020-01-14 12:20:58 +01:00
Dmitry Safonov
550a77a74c x86/vdso: Add time napespace page
To support time namespaces in the VDSO with a minimal impact on regular non
time namespace affected tasks, the namespace handling needs to be hidden in
a slow path.

The most obvious place is vdso_seq_begin(). If a task belongs to a time
namespace then the VVAR page which contains the system wide VDSO data is
replaced with a namespace specific page which has the same layout as the
VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the VDSO data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

Allocate the time namespace page among VVAR pages and place vdso_data on
it.  Provide __arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-23-dima@arista.com
2020-01-14 12:20:58 +01:00
Dmitry Safonov
64b302ab66 x86/vdso: Provide vdso_data offset on vvar_page
VDSO support for time namespaces needs to set up a page with the same
layout as VVAR. That timens page will be placed on position of VVAR page
inside namespace. That page has vdso_data->seq set to 1 to enforce
the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce
the time namespace handling path.

To prepare the time namespace page the kernel needs to know the vdso_data
offset.  Provide arch_get_vdso_data() helper for locating vdso_data on VVAR
page.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com
2020-01-14 12:20:57 +01:00
Thomas Gleixner
660fd04f93 lib/vdso: Prepare for time namespace support
To support time namespaces in the vdso with a minimal impact on regular non
time namespace affected tasks, the namespace handling needs to be hidden in
a slow path.

The most obvious place is vdso_seq_begin(). If a task belongs to a time
namespace then the VVAR page which contains the system wide vdso data is
replaced with a namespace specific page which has the same layout as the
VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the vdso data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

If VDSO time namespace support is disabled the whole magic is compiled out.

Initial testing shows that the disabled case is almost identical to the
host case which does not take the slow timens path. With the special timens
page installed the performance hit is constant time and in the range of
5-7%.

For the vdso functions which are not using the sequence count an
unconditional check for vdso_data->clock_mode is added which switches to
the real vdso when the clock_mode is VCLOCK_TIMENS.

[avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data
 pointer by CS_RAW offset.]

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com
2020-01-14 12:20:57 +01:00
Dmitry Safonov
6f74acfde2 x86/vdso: Restrict splitting VVAR VMA
Forbid splitting VVAR VMA resulting in a stricter ABI and reducing the
amount of corner-cases to consider while working further on VDSO time
namespace support.

As the offset from timens to VVAR page is computed compile-time, the pages
in VVAR should stay together and not being partically mremap()'ed.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-20-dima@arista.com
2020-01-14 12:20:56 +01:00
Dmitry Safonov
0efc8bb0bb fs/proc: Respect boottime inside time namespace for /proc/uptime
Make sure that /proc/uptime is adjusted to the tasks time namespace.

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-19-dima@arista.com
2020-01-14 12:20:56 +01:00
Andrei Vagin
1f9b37bfbb posix-timers: Make clock_nanosleep() time namespace aware
clock_nanosleep() accepts absolute values of expiration time, if the
TIMER_ABSTIME flag is set. This value is in the tasks time namespace,
which has to be converted to the host time namespace.

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-18-dima@arista.com
2020-01-14 12:20:55 +01:00
Andrei Vagin
ea2d1f7fce hrtimers: Prepare hrtimer_nanosleep() for time namespaces
clock_nanosleep() accepts absolute values of expiration time when
TIMER_ABSTIME flag is set. This absolute value is inside the task's
time namespace, and has to be converted to the host's time.

There is timens_ktime_to_host() helper for converting time, but
it accepts ktime argument.

As a preparation, make hrtimer_nanosleep() accept a clock value in ktime
instead of timespec64.

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-17-dima@arista.com
2020-01-14 12:20:55 +01:00
Andrei Vagin
0b9b9a3b16 alarmtimer: Make nanosleep() time namespace aware
clock_nanosleep() accepts absolute values of expiration time when the
TIMER_ABSTIME flag is set. This absolute value is inside the task's
time namespace and has to be converted to the host's time.

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-16-dima@arista.com
2020-01-14 12:20:55 +01:00
Andrei Vagin
7da8b3a44b posix-timers: Make timer_settime() time namespace aware
Wire timer_settime() syscall into time namespace virtualization.

sys_timer_settime() calls the ktime->timer_set() callback. Right now,
common_timer_set() is the only implementation for the callback.

The user-supplied expiry value is converted from timespec64 to ktime and
then timens_ktime_to_host() can be used to convert namespace's time to the
host time.

Inside a time namespace kernel's time differs by a fixed offset from a
user-supplied time, but only absolute values (TIMER_ABSTIME) must be
converted.

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-15-dima@arista.com
2020-01-14 12:20:54 +01:00
Andrei Vagin
6cd889d43c timerfd: Make timerfd_settime() time namespace aware
timerfd_settime() accepts an absolute value of the expiration time if
TFD_TIMER_ABSTIME is specified. This value is in the task's time namespace
and has to be converted to the host's time namespace.

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-14-dima@arista.com
2020-01-14 12:20:53 +01:00
Andrei Vagin
89dd8eecfe time: Add do_timens_ktime_to_host() helper
The helper subtracts namespace's clock offset from the given time
and ensures that the result is within [0, KTIME_MAX].

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-13-dima@arista.com
2020-01-14 12:20:53 +01:00
Andrei Vagin
5a590f35ad posix-clocks: Wire up clock_gettime() with timens offsets
Adjust monotonic and boottime clocks with per-timens offsets.  As the
result a process inside time namespace will see timers and clocks corrected
to offsets that were set when the namespace was created

Note that applications usually go through vDSO to get time, which is not
yet adjusted. Further changes will complete time namespace virtualisation
with vDSO support.

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-12-dima@arista.com
2020-01-14 12:20:52 +01:00
Andrei Vagin
198fa445d5 posix-timers: Use clock_get_ktime() in common_timer_get()
Now, when the clock_get_ktime() callback exists, the suboptimal
timespec64-based conversion can be removed from common_timer_get().

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-11-dima@arista.com
2020-01-14 12:20:52 +01:00
Andrei Vagin
9c71a2e8a7 posix-clocks: Introduce clock_get_ktime() callback
The callsite in common_timer_get() has already a comment:
    /*
     * The timespec64 based conversion is suboptimal, but it's not
     * worth to implement yet another callback.
     */
    kc->clock_get(timr->it_clock, &ts64);
    now = timespec64_to_ktime(ts64);

The upcoming support for time namespaces requires to have access to:

 - The time in a task's time namespace for sys_clock_gettime()
 - The time in the root name space for common_timer_get()

That adds a valid reason to finally implement a separate callback which
returns the time in ktime_t format.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-10-dima@arista.com
2020-01-14 12:20:51 +01:00
Andrei Vagin
2f58bf909a alarmtimer: Provide get_timespec() callback
The upcoming support for time namespaces requires to have access to:

  - The time in a task's time namespace for sys_clock_gettime()
  - The time in the root name space for common_timer_get()

Wire up alarm bases with get_timespec().

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-9-dima@arista.com
2020-01-14 12:20:51 +01:00
Andrei Vagin
41b3b8dffc alarmtimer: Rename gettime() callback to get_ktime()
The upcoming support for time namespaces requires to have access to:

  - The time in a tasks time namespace for sys_clock_gettime()
  - The time in the root name space for common_timer_get()

struct alarm_base needs to follow the same naming convention, so rename
.gettime() callback into get_ktime() as a preparation for introducing
get_timespec().

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-8-dima@arista.com
2020-01-14 12:20:50 +01:00
Andrei Vagin
eaf80194d0 posix-clocks: Rename .clock_get_timespec() callbacks accordingly
The upcoming support for time namespaces requires to have access to:

  - The time in a task's time namespace for sys_clock_gettime()
  - The time in the root name space for common_timer_get()

That adds a valid reason to finally implement a separate callback which
returns the time in ktime_t format in (struct k_clock).

As a preparation ground for introducing clock_get_ktime(), the original
callback clock_get() was renamed into clock_get_timespec().
Reflect the renaming into the callback implementations.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-7-dima@arista.com
2020-01-14 12:20:50 +01:00
Andrei Vagin
819a95fe3a posix-clocks: Rename the clock_get() callback to clock_get_timespec()
The upcoming support for time namespaces requires to have access to:

 - The time in a task's time namespace for sys_clock_gettime()
 - The time in the root name space for common_timer_get()

That adds a valid reason to finally implement a separate callback which
returns the time in ktime_t format, rather than in (struct timespec).

Rename the clock_get() callback to clock_get_timespec() as a preparation
for introducing clock_get_ktime().

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-6-dima@arista.com
2020-01-14 12:20:49 +01:00
Andrei Vagin
af993f58d6 time: Add timens_offsets to be used for tasks in time namespace
Introduce offsets for time namespace. They will contain an adjustment
needed to convert clocks to/from host's.

A new namespace is created with the same offsets as the time namespace
of the current process.

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-5-dima@arista.com
2020-01-14 12:20:49 +01:00
Andrei Vagin
769071ac9f ns: Introduce Time Namespace
Time Namespace isolates clock values.

The kernel provides access to several clocks CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.

CLOCK_REALTIME
      System-wide clock that measures real (i.e., wall-clock) time.

CLOCK_MONOTONIC
      Clock that cannot be set and represents monotonic time since
      some unspecified starting point.

CLOCK_BOOTTIME
      Identical to CLOCK_MONOTONIC, except it also includes any time
      that the system is suspended.

For many users, the time namespace means the ability to changes date and
time in a container (CLOCK_REALTIME). Providing per namespace notions of
CLOCK_REALTIME would be complex with a massive overhead, but has a dubious
value.

But in the context of checkpoint/restore functionality, monotonic and
boottime clocks become interesting. Both clocks are monotonic with
unspecified starting points. These clocks are widely used to measure time
slices and set timers. After restoring or migrating processes, it has to be
guaranteed that they never go backward. In an ideal case, the behavior of
these clocks should be the same as for a case when a whole system is
suspended. All this means that it is required to set CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace
offsets for clocks.

A time namespace is similar to a pid namespace in the way how it is
created: unshare(CLONE_NEWTIME) system call creates a new time namespace,
but doesn't set it to the current process. Then all children of the process
will be born in the new time namespace, or a process can use the setns()
system call to join a namespace.

This scheme allows setting clock offsets for a namespace, before any
processes appear in it.

All available clone flags have been used, so CLONE_NEWTIME uses the highest
bit of CSIGNAL. It means that it can be used only with the unshare() and
the clone3() system calls.

[ tglx: Adjusted paragraph about clone3() to reality and massaged the
  	changelog a bit. ]

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://criu.org/Time_namespace
Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
2020-01-14 12:20:48 +01:00
Andrei Vagin
c966533f8c lib/vdso: Mark do_hres() and do_coarse() as __always_inline
Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
(more clock_gettime() cycles - the better):

clock            | before     | after      | diff
----------------------------------------------------------
monotonic        |  153222105 |  166775025 | 8.8%
monotonic-coarse |  671557054 |  691513017 | 3.0%
monotonic-raw    |  147116067 |  161057395 | 9.5%
boottime         |  153446224 |  166962668 | 9.1%

The improvement for arm64 for monotonic and boottime is around 3.5%.

clock            | before     | after      | diff
==================================================
monotonic          17326692     17951770     3.6%
monotonic-coarse   43624027     44215292     1.3%
monotonic-raw      17541809     17554932     0.1%
boottime           17334982     17954361     3.5%

[ tglx: Avoid the goto ]

Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-3-dima@arista.com
2020-01-14 12:20:48 +01:00
Andrei Vagin
0898a16a36 lib/vdso: Add unlikely() hint into vdso_read_begin()
Place the branch with no concurrent write before the contended case.

Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
(more clock_gettime() cycles - the better):
        | before    | after
-----------------------------------
        | 150252214 | 153242367
        | 150301112 | 153324800
        | 150392773 | 153125401
        | 150373957 | 153399355
        | 150303157 | 153489417
        | 150365237 | 153494270
-----------------------------------
avg     | 150331408 | 153345935
diff %  | 2	    | 0
-----------------------------------
stdev % | 0.3	    | 0.1

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20191112012724.250792-2-dima@arista.com
2020-01-14 12:20:47 +01:00
Christophe Leroy
cdb7c5a9c8 lib/vdso: Avoid duplication in __cvdso_clock_getres()
VDSO_HRES and VDSO_RAW clocks are handled the same way.

Avoid the code duplication.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/fdf1a968a8f7edd61456f1689ac44082ebb19c15.1577111367.git.christophe.leroy@c-s.fr
2020-01-14 12:20:47 +01:00
Christophe Leroy
8463cf8052 lib/vdso: Let do_coarse() return 0 to simplify the callsite
do_coarse() is similar to do_hres() except that it never fails.

Change its type to int instead of void and let it always return success (0)
to simplify the call site.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/21e8afa38c02ca8672c2690307383507fe63b454.1577111367.git.christophe.leroy@c-s.fr
2020-01-14 12:20:46 +01:00
Vincenzo Frascino
0b5c12332d x86/vdso: Remove unused VDSO_HAS_32BIT_FALLBACK
VDSO_HAS_32BIT_FALLBACK has been removed from the core since
the architectures that support the generic vDSO library have
been converted to support the 32 bit fallbacks.

Remove unused VDSO_HAS_32BIT_FALLBACK from x86 vdso.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20190830135902.20861-9-vincenzo.frascino@arm.com
2020-01-14 12:20:46 +01:00
Vincenzo Frascino
de0209f53a mips: vdso: Remove unused VDSO_HAS_32BIT_FALLBACK
VDSO_HAS_32BIT_FALLBACK has been removed from the core since
the architectures that support the generic vDSO library have
been converted to support the 32 bit fallbacks.

Remove unused VDSO_HAS_32BIT_FALLBACK from mips vdso.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul Burton <paul.burton@mips.com>
Link: https://lore.kernel.org/r/20190830135902.20861-8-vincenzo.frascino@arm.com
2020-01-14 12:20:46 +01:00
Vincenzo Frascino
972188f3a2 arm64: compat: vdso: Remove unused VDSO_HAS_32BIT_FALLBACK
VDSO_HAS_32BIT_FALLBACK has been removed from the core since
the architectures that support the generic vDSO library have
been converted to support the 32 bit fallbacks.

Remove unused VDSO_HAS_32BIT_FALLBACK from arm64 compat vdso.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20190830135902.20861-7-vincenzo.frascino@arm.com
2020-01-14 12:20:45 +01:00
Vincenzo Frascino
a279235ddb lib/vdso: Remove checks on return value for 32 bit vDSO
Since all the architectures that support the generic vDSO library have
been converted to support the 32 bit fallbacks it is not required
anymore to check the return value of __cvdso_clock_get*time32_common()
before updating the old_timespec fields.

Remove the related checks from the generic vdso library.

References: c60a32ea4f ("lib/vdso/32: Provide legacy syscall fallbacks")
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20190830135902.20861-6-vincenzo.frascino@arm.com
2020-01-14 12:20:45 +01:00
Vincenzo Frascino
b767081c07 lib/vdso: Remove VDSO_HAS_32BIT_FALLBACK
VDSO_HAS_32BIT_FALLBACK was introduced to address a regression which
caused seccomp to deny access to the applications to clock_gettime64()
and clock_getres64() because they are not enabled in the existing
filters.

The purpose of VDSO_HAS_32BIT_FALLBACK was to simplify the conditional
implementation of __cvdso_clock_get*time32() variants.

Now that all the architectures that support the generic vDSO library
have been converted to support the 32 bit fallbacks the conditional
can be removed.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20190830135902.20861-5-vincenzo.frascino@arm.com

References: c60a32ea4f ("lib/vdso/32: Provide legacy syscall fallbacks")
2020-01-14 12:20:44 +01:00
Vincenzo Frascino
bf279849ad lib/vdso: Build 32 bit specific functions in the right context
clock_gettime32 and clock_getres_time32 should be compiled only with a
32 bit vdso library.

Exclude these symbols when BUILD_VDSO32 is not defined.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20190830135902.20861-3-vincenzo.frascino@arm.com
2020-01-14 12:20:44 +01:00
Thomas Gleixner
715f23b610 ARM: vdso: Set BUILD_VDSO32 and provide 32bit fallbacks
Setting BUILD_VDSO32 is required to expose the legacy 32bit interfaces in
the generic VDSO code which are going to be hidden behind an #ifdef
BUILD_VDSO32.

The 32bit fallbacks are necessary to remove the existing
VDSO_HAS_32BIT_FALLBACK hackery.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/87tv4zq9dc.fsf@nanos.tec.linutronix.de
2020-01-14 12:20:43 +01:00
Vincenzo Frascino
3b5584afee arm64: compat: vdso: Expose BUILD_VDSO32
clock_gettime32 and clock_getres_time32 should be compiled only with the
32 bit vdso library.

Expose BUILD_VDSO32 when arm64 compat is compiled, to provide an
indication to the generic library to include these symbols.

Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20190830135902.20861-2-vincenzo.frascino@arm.com
2020-01-14 12:20:43 +01:00
Thomas Gleixner
2e34d63d82 Merge branch 'timers/urgent' into timers/core
Pick up upstream VDSO fix before adding more VDSO changes.
2020-01-10 21:11:54 +01:00
Vincenzo Frascino
ffd08731b2 lib/vdso: Make __cvdso_clock_getres() static
Fix the following sparse warning in the generic vDSO library:

  linux/lib/vdso/gettimeofday.c:224:5: warning: symbol
  '__cvdso_clock_getres' was not declared. Should it be static?

Make it static and also mark it __maybe_unsed.

Fixes: 502a590a17 ("lib/vdso: Move fallback invocation to the callers")
Reported-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191128111719.8282-1-vincenzo.frascino@arm.com
2020-01-10 19:29:01 +01:00
Paul Cercueil
2707745533 time/sched_clock: Disable interrupts in sched_clock_register()
Instead of issueing a warning if sched_clock_register() is called from a
context where IRQs are enabled, the code now ensures that IRQs are indeed
disabled.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20200107010630.954648-1-paul@crapouillou.net
2020-01-09 18:50:18 +01:00
Arnd Bergmann
f35deaff1b time/posix-stubs: Provide compat itimer supoprt for alpha
Using compat_sys_getitimer and compat_sys_setitimer on alpha
causes a link failure in the Alpha tinyconfig and other configurations
that turn off CONFIG_POSIX_TIMERS.

Use the same #ifdef check for the stub version as well.

Fixes: 4c22ea2b91 ("y2038: use compat_{get,set}_itimer on alpha")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20191207191043.656328-1-arnd@arndb.de
2020-01-09 18:20:23 +01:00
Linus Torvalds
c79f46a282 Linux 5.5-rc5 2020-01-05 14:23:27 -08:00
Linus Torvalds
768fc661d1 RISC-V updates for v5.5-rc5
Several fixes for RISC-V:
 
 - Fix function graph trace support
 
 - Prefix the CSR IRQ_* macro names with "RV_", to avoid collisions
   with macros elsewhere in the Linux kernel tree named "IRQ_TIMER"
 
 - Use __pa_symbol() when computing the physical address of a kernel
   symbol, rather than __pa()
 
 - Mark the RISC-V port as supporting GCOV
 
 One DT addition:
 
 - Describe the L2 cache controller in the FU540 DT file
 
 One documentation update:
 
 - Add patch acceptance guideline documentation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAl4Re3wACgkQx4+xDQu9
 KkvPNA/6AmDvw90JX1Xm6jK2J/eYaUr29dJZ7KYdQ7HhbsixkJecTqjW3huyWfv6
 ynsqWA0e87iSGyV+zBshw6nxVyGgDZgRam5100eiP25ZiTR5cbR8e6JmVVplGdkH
 VOONhAM64Pknoy+lCL0emP01+qa27PqTCQomxDDIxgrpfgbR9PPjgpuA7B0gCOz1
 yhc6EjfbSRKLJjnowoSdvHcDvzU0oIPjMPxQz1gUmBmUKRp5Idn/xTtWG9p/M5T8
 6XPzh8hEfbgvi0ur0Dlz9hav2g5mSL2S5n4CPitAAOyrG/jDktlQgYcs7nSTAk7m
 vtU6oUZTMJ9nmMyPQspI+krW2Q4g5BUUVRwm3ckNe6sqwrs8TlXIeTQb8Yb+B4Xm
 3EaS5YvfMNoHjlHQZ814EPAFNEqROgI04O2Sl97+gqEpQUwC9qLSeXiVQHgxcW9/
 8UqlbY/9XvKnZq6btGAJElXAirbt5j+TWKCULyUKLu8mz3644ascIr2sBOkMSb61
 ZRZbxEbK8Avwf2etv7hap1Gd2j+ywlK0IziKOsAPBr0Pif0lAdawf1PjH491spVc
 //re3g/rZu3I2+NK4qKNCoxZwzT5w/kK8rKJJ7sw8KVB2kTVjV7Qrmktt5Zeh021
 PXPT/tUgLMvZW6RRCqPCXZru28qJxlLAhuQEtHC/KbSYcm2sSJc=
 =8HSd
 -----END PGP SIGNATURE-----

Merge tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:
 "Several fixes for RISC-V:

   - Fix function graph trace support

   - Prefix the CSR IRQ_* macro names with "RV_", to avoid collisions
     with macros elsewhere in the Linux kernel tree named "IRQ_TIMER"

   - Use __pa_symbol() when computing the physical address of a kernel
     symbol, rather than __pa()

   - Mark the RISC-V port as supporting GCOV

  One DT addition:

   - Describe the L2 cache controller in the FU540 DT file

  One documentation update:

   - Add patch acceptance guideline documentation"

* tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  Documentation: riscv: add patch acceptance guidelines
  riscv: prefix IRQ_ macro names with an RV_ namespace
  clocksource: riscv: add notrace to riscv_sched_clock
  riscv: ftrace: correct the condition logic in function graph tracer
  riscv: dts: Add DT support for SiFive L2 cache controller
  riscv: gcov: enable gcov for RISC-V
  riscv: mm: use __pa_symbol for kernel symbols
2020-01-05 11:15:31 -08:00
Paul Walmsley
0e194d9da1 Documentation: riscv: add patch acceptance guidelines
Formalize, in kernel documentation, the patch acceptance policy for
arch/riscv.  In summary, it states that as maintainers, we plan to
only accept patches for new modules or extensions that have been
frozen or ratified by the RISC-V Foundation.

We've been following these guidelines for the past few months.  In the
meantime, we've received quite a bit of feedback that it would be
helpful to have these guidelines formally documented.

Based on a suggestion from Matthew Wilcox, we also add a link to this
file to Documentation/process/index.rst, to make this document easier
to find.  The format of this document has also been changed to align
to the format outlined in the maintainer entry profiles, in accordance
with comments from Jon Corbet and Dan Williams.

Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Krste Asanovic <krste@berkeley.edu>
Cc: Andrew Waterman <waterman@eecs.berkeley.edu>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
2020-01-04 21:49:01 -08:00
Paul Walmsley
2f3035da40 riscv: prefix IRQ_ macro names with an RV_ namespace
"IRQ_TIMER", used in the arch/riscv CSR header file, is a sufficiently
generic macro name that it's used by several source files across the
Linux code base.  Some of these other files ultimately include the
arch/riscv CSR include file, causing collisions.  Fix by prefixing the
RISC-V csr.h IRQ_ macro names with an RV_ prefix.

Fixes: a4c3733d32 ("riscv: abstract out CSR names for supervisor vs machine mode")
Reported-by: Olof Johansson <olof@lixom.net>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2020-01-04 21:48:59 -08:00
Zong Li
9d05c18e8d clocksource: riscv: add notrace to riscv_sched_clock
When enabling ftrace graph tracer, it gets the tracing clock in
ftrace_push_return_trace().  Eventually, it invokes riscv_sched_clock()
to get the clock value.  If riscv_sched_clock() isn't marked with
'notrace', it will call ftrace_push_return_trace() and cause infinite
loop.

The result of failure as follow:

command: echo function_graph >current_tracer
[   46.176787] Unable to handle kernel paging request at virtual address ffffffe04fb38c48
[   46.177309] Oops [#1]
[   46.177478] Modules linked in:
[   46.177770] CPU: 0 PID: 256 Comm: $d Not tainted 5.5.0-rc1 #47
[   46.177981] epc: ffffffe00035e59a ra : ffffffe00035e57e sp : ffffffe03a7569b0
[   46.178216]  gp : ffffffe000d29b90 tp : ffffffe03a756180 t0 : ffffffe03a756968
[   46.178430]  t1 : ffffffe00087f408 t2 : ffffffe03a7569a0 s0 : ffffffe03a7569f0
[   46.178643]  s1 : ffffffe00087f408 a0 : 0000000ac054cda4 a1 : 000000000087f411
[   46.178856]  a2 : 0000000ac054cda4 a3 : 0000000000373ca0 a4 : ffffffe04fb38c48
[   46.179099]  a5 : 00000000153e22a8 a6 : 00000000005522ff a7 : 0000000000000005
[   46.179338]  s2 : ffffffe03a756a90 s3 : ffffffe00032811c s4 : ffffffe03a756a58
[   46.179570]  s5 : ffffffe000d29fe0 s6 : 0000000000000001 s7 : 0000000000000003
[   46.179809]  s8 : 0000000000000003 s9 : 0000000000000002 s10: 0000000000000004
[   46.180053]  s11: 0000000000000000 t3 : 0000003fc815749c t4 : 00000000000efc90
[   46.180293]  t5 : ffffffe000d29658 t6 : 0000000000040000
[   46.180482] status: 0000000000000100 badaddr: ffffffe04fb38c48 cause: 000000000000000f

Signed-off-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[paul.walmsley@sifive.com: cleaned up patch description]
Fixes: 92e0d143fd ("clocksource/drivers/riscv_timer: Provide the sched_clock")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2020-01-04 21:48:48 -08:00
Linus Torvalds
36487907f3 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "17 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  hexagon: define ioremap_uc
  ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
  ocfs2: call journal flush to mark journal as empty after journal recovery when mount
  mm/hugetlb: defer freeing of huge pages if in non-task context
  mm/gup: fix memory leak in __gup_benchmark_ioctl
  mm/oom: fix pgtables units mismatch in Killed process message
  fs/posix_acl.c: fix kernel-doc warnings
  hexagon: work around compiler crash
  hexagon: parenthesize registers in asm predicates
  fs/namespace.c: make to_mnt_ns() static
  fs/nsfs.c: include headers for missing declarations
  fs/direct-io.c: include fs/internal.h for missing prototype
  mm: move_pages: return valid node id in status if the page is already on the target node
  memcg: account security cred as well to kmemcg
  kcov: fix struct layout for kcov_remote_arg
  mm/zsmalloc.c: fix the migrated zspage statistics.
  mm/memory_hotplug: shrink zones when offlining memory
2020-01-04 19:38:51 -08:00
Linus Torvalds
a125bcda2d + Bug fixes
- performance regression: only get a label reference if the fast
     path check fails
   - fix aa_xattrs_match() may sleep while holding a RCU lock
   - fix bind mounts aborting with -ENOMEM
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE7cSDD705q2rFEEf7BS82cBjVw9gFAl4RPEUACgkQBS82cBjV
 w9gUwg/9EsFcegRD6T2NmiqUp7jY2KRjAf5JrKqZJvCbXFj9Mnurv9ug5A6b20JU
 SbyXU1CNIxqPTmwPMRF1YGtq/vuAV5UIgb14pjepmsvzF5A/xpgKJucR0gokc5PU
 6eFbPCQ4aXF83/d+/SiJfANP+oUY372b6cTzHKMouKWXBbeIG4F9vtVrPMEZlcjo
 NNxDtmHNeFdASV17uA+8OKseGluPzPlWvnpq5va/Uz4avmxdeaBQxqh2N891IUqO
 drRTpF6cgp/072cEoSrS+2dmliHOteS9785Vh4iLF4LVt8lsRjoT0Z66eGL175Ve
 TYwp7EfgIwhcUYLZgKjz7t+wp3l7Yw5retlzbzbfFTIxTfGCKunUpZIUwssI+QzW
 npMOU+0jTLZ9hVZICOicxzs40kHu8tR4dKYPHuYB2G3gNW8LGMEodVn2SBL1H6+2
 r4vUjaLIsyUHBpfQDjWHjMEN/gdSekyBvRhpnC5qNMdOTnTeHSNow88igBxmGNhA
 K9iymhYV0E1ZMw4KfGtyKZ2Zfd3E8F0ryH0cnavXYBYfvuVNs18M0TkaYwPb1+YH
 02SyEGRnnhxtIO+GwLve6pmdRT9edZgL6Pc3yCaJ/e/g9FIpFgPbX44IxuO+XrL/
 ku6NK9hXrpvk9Z/CAVUHYcRMSgQTz1lTsw9b9ooxgHQmsXuOUhM=
 =uw+S
 -----END PGP SIGNATURE-----

Merge tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor fixes from John Johansen:

 - performance regression: only get a label reference if the fast path
   check fails

 - fix aa_xattrs_match() may sleep while holding a RCU lock

 - fix bind mounts aborting with -ENOMEM

* tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
  apparmor: only get a label reference if the fast path check fails
  apparmor: fix bind mounts aborting with -ENOMEM
2020-01-04 19:28:30 -08:00
John Johansen
8c62ed27a1 apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
aa_xattrs_match() is unfortunately calling vfs_getxattr_alloc() from a
context protected by an rcu_read_lock. This can not be done as
vfs_getxattr_alloc() may sleep regardles of the gfp_t value being
passed to it.

Fix this by breaking the rcu_read_lock on the policy search when the
xattr match feature is requested and restarting the search if a policy
changes occur.

Fixes: 8e51f9087f ("apparmor: Add support for attaching profiles via xattr, presence and value")
Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: John Johansen <john.johansen@canonical.com>
2020-01-04 15:56:44 -08:00