linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-27 04:45:23 +07:00

Author	SHA1	Message	Date
Dmitry Safonov	61c5767603	selftests/timens: Add Time Namespace test for supported clocks A test to check that all supported clocks work on host and inside a new time namespace. Use both ways to get time: through VDSO and by entering the kernel with implicit syscall. Introduce a new timens directory in selftests framework for the next timens tests. Output on success: 1..10 ok 1 Passed for CLOCK_BOOTTIME (syscall) ok 2 Passed for CLOCK_BOOTTIME (vdso) ok 3 Passed for CLOCK_BOOTTIME_ALARM (syscall) ok 4 Passed for CLOCK_BOOTTIME_ALARM (vdso) ok 5 Passed for CLOCK_MONOTONIC (syscall) ok 6 Passed for CLOCK_MONOTONIC (vdso) ok 7 Passed for CLOCK_MONOTONIC_COARSE (syscall) ok 8 Passed for CLOCK_MONOTONIC_COARSE (vdso) ok 9 Passed for CLOCK_MONOTONIC_RAW (syscall) ok 10 Passed for CLOCK_MONOTONIC_RAW (vdso) # Pass 10 Fail 0 Xfail 0 Xpass 0 Skip 0 Error 0 Output with lack of permissions: 1..10 not ok 1 # SKIP need to run as root Output without support of time namespaces: 1..10 not ok 1 # SKIP Time namespaces are not supported Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-29-dima@arista.com	2020-01-14 12:21:00 +01:00
Andrei Vagin	04a8682a71	fs/proc: Introduce /proc/pid/timens_offsets API to set time namespace offsets for children processes, i.e.: echo "$clockid $offset_sec $offset_nsec" > /proc/self/timens_offsets Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-28-dima@arista.com	2020-01-14 12:20:59 +01:00
Dmitry Safonov	70ddf65184	x86/vdso: Zap vvar pages when switching to a time namespace The VVAR page layout depends on whether a task belongs to the root or non-root time namespace. Whenever a task changes its namespace, the VVAR page tables are cleared and then they will be re-faulted with a corresponding layout. Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-27-dima@arista.com	2020-01-14 12:20:59 +01:00
Dmitry Safonov	e6b28ec65b	x86/vdso: On timens page fault prefault also VVAR page As timens page has offsets to data on VVAR page VVAR is going to be accessed shortly. Set it up with timens in one page fault as optimization. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-26-dima@arista.com	2020-01-14 12:20:59 +01:00
Dmitry Safonov	af34ebeb86	x86/vdso: Handle faults on timens page If a task belongs to a time namespace then the VVAR page which contains the system wide VDSO data is replaced with a namespace specific page which has the same layout as the VVAR page. Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-25-dima@arista.com	2020-01-14 12:20:58 +01:00
Dmitry Safonov	afaa7b5ac7	time: Allocate per-timens vvar page VDSO support for Time namespace needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page contains time namespace clock offsets and it has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. Allocate the timens page during namespace creation. Setup the offsets when the first task enters the ns and freeze them to guarantee the pace of monotonic/boottime clocks and to avoid breakage of applications. The design decision is to have a global offset_lock which is used during namespace offsets setup and to freeze offsets when the first task joins the new time namespace. That is better in terms of memory usage compared to having a per namespace mutex that's used only during the setup period. Suggested-by: Andy Lutomirski <luto@kernel.org> Based-on-work-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-24-dima@arista.com	2020-01-14 12:20:58 +01:00
Dmitry Safonov	550a77a74c	x86/vdso: Add time napespace page To support time namespaces in the VDSO with a minimal impact on regular non time namespace affected tasks, the namespace handling needs to be hidden in a slow path. The most obvious place is vdso_seq_begin(). If a task belongs to a time namespace then the VVAR page which contains the system wide VDSO data is replaced with a namespace specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. The extra check in the case that vdso_data->seq is odd, e.g. a concurrent update of the VDSO data is in progress, is not really affecting regular tasks which are not part of a time namespace as the task is spin waiting for the update to finish and vdso_data->seq to become even again. If a time namespace task hits that code path, it invokes the corresponding time getter function which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock which is stored in the special VVAR page. Allocate the time namespace page among VVAR pages and place vdso_data on it. Provide __arch_get_timens_vdso_data() helper for VDSO code to get the code-relative position of VVARs on that special page. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-23-dima@arista.com	2020-01-14 12:20:58 +01:00
Dmitry Safonov	64b302ab66	x86/vdso: Provide vdso_data offset on vvar_page VDSO support for time namespaces needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. To prepare the time namespace page the kernel needs to know the vdso_data offset. Provide arch_get_vdso_data() helper for locating vdso_data on VVAR page. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com	2020-01-14 12:20:57 +01:00
Thomas Gleixner	660fd04f93	lib/vdso: Prepare for time namespace support To support time namespaces in the vdso with a minimal impact on regular non time namespace affected tasks, the namespace handling needs to be hidden in a slow path. The most obvious place is vdso_seq_begin(). If a task belongs to a time namespace then the VVAR page which contains the system wide vdso data is replaced with a namespace specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path. The extra check in the case that vdso_data->seq is odd, e.g. a concurrent update of the vdso data is in progress, is not really affecting regular tasks which are not part of a time namespace as the task is spin waiting for the update to finish and vdso_data->seq to become even again. If a time namespace task hits that code path, it invokes the corresponding time getter function which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock which is stored in the special VVAR page. If VDSO time namespace support is disabled the whole magic is compiled out. Initial testing shows that the disabled case is almost identical to the host case which does not take the slow timens path. With the special timens page installed the performance hit is constant time and in the range of 5-7%. For the vdso functions which are not using the sequence count an unconditional check for vdso_data->clock_mode is added which switches to the real vdso when the clock_mode is VCLOCK_TIMENS. [avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data pointer by CS_RAW offset.] Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com	2020-01-14 12:20:57 +01:00
Dmitry Safonov	6f74acfde2	x86/vdso: Restrict splitting VVAR VMA Forbid splitting VVAR VMA resulting in a stricter ABI and reducing the amount of corner-cases to consider while working further on VDSO time namespace support. As the offset from timens to VVAR page is computed compile-time, the pages in VVAR should stay together and not being partically mremap()'ed. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-20-dima@arista.com	2020-01-14 12:20:56 +01:00
Dmitry Safonov	0efc8bb0bb	fs/proc: Respect boottime inside time namespace for /proc/uptime Make sure that /proc/uptime is adjusted to the tasks time namespace. Co-developed-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-19-dima@arista.com	2020-01-14 12:20:56 +01:00
Andrei Vagin	1f9b37bfbb	posix-timers: Make clock_nanosleep() time namespace aware clock_nanosleep() accepts absolute values of expiration time, if the TIMER_ABSTIME flag is set. This value is in the tasks time namespace, which has to be converted to the host time namespace. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-18-dima@arista.com	2020-01-14 12:20:55 +01:00
Andrei Vagin	ea2d1f7fce	hrtimers: Prepare hrtimer_nanosleep() for time namespaces clock_nanosleep() accepts absolute values of expiration time when TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace, and has to be converted to the host's time. There is timens_ktime_to_host() helper for converting time, but it accepts ktime argument. As a preparation, make hrtimer_nanosleep() accept a clock value in ktime instead of timespec64. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-17-dima@arista.com	2020-01-14 12:20:55 +01:00
Andrei Vagin	0b9b9a3b16	alarmtimer: Make nanosleep() time namespace aware clock_nanosleep() accepts absolute values of expiration time when the TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace and has to be converted to the host's time. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-16-dima@arista.com	2020-01-14 12:20:55 +01:00
Andrei Vagin	7da8b3a44b	posix-timers: Make timer_settime() time namespace aware Wire timer_settime() syscall into time namespace virtualization. sys_timer_settime() calls the ktime->timer_set() callback. Right now, common_timer_set() is the only implementation for the callback. The user-supplied expiry value is converted from timespec64 to ktime and then timens_ktime_to_host() can be used to convert namespace's time to the host time. Inside a time namespace kernel's time differs by a fixed offset from a user-supplied time, but only absolute values (TIMER_ABSTIME) must be converted. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-15-dima@arista.com	2020-01-14 12:20:54 +01:00
Andrei Vagin	6cd889d43c	timerfd: Make timerfd_settime() time namespace aware timerfd_settime() accepts an absolute value of the expiration time if TFD_TIMER_ABSTIME is specified. This value is in the task's time namespace and has to be converted to the host's time namespace. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-14-dima@arista.com	2020-01-14 12:20:53 +01:00
Andrei Vagin	89dd8eecfe	time: Add do_timens_ktime_to_host() helper The helper subtracts namespace's clock offset from the given time and ensures that the result is within [0, KTIME_MAX]. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-13-dima@arista.com	2020-01-14 12:20:53 +01:00
Andrei Vagin	5a590f35ad	posix-clocks: Wire up clock_gettime() with timens offsets Adjust monotonic and boottime clocks with per-timens offsets. As the result a process inside time namespace will see timers and clocks corrected to offsets that were set when the namespace was created Note that applications usually go through vDSO to get time, which is not yet adjusted. Further changes will complete time namespace virtualisation with vDSO support. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-12-dima@arista.com	2020-01-14 12:20:52 +01:00
Andrei Vagin	198fa445d5	posix-timers: Use clock_get_ktime() in common_timer_get() Now, when the clock_get_ktime() callback exists, the suboptimal timespec64-based conversion can be removed from common_timer_get(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-11-dima@arista.com	2020-01-14 12:20:52 +01:00
Andrei Vagin	9c71a2e8a7	posix-clocks: Introduce clock_get_ktime() callback The callsite in common_timer_get() has already a comment: /* * The timespec64 based conversion is suboptimal, but it's not * worth to implement yet another callback. */ kc->clock_get(timr->it_clock, &ts64); now = timespec64_to_ktime(ts64); The upcoming support for time namespaces requires to have access to: - The time in a task's time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() That adds a valid reason to finally implement a separate callback which returns the time in ktime_t format. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-10-dima@arista.com	2020-01-14 12:20:51 +01:00
Andrei Vagin	2f58bf909a	alarmtimer: Provide get_timespec() callback The upcoming support for time namespaces requires to have access to: - The time in a task's time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() Wire up alarm bases with get_timespec(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-9-dima@arista.com	2020-01-14 12:20:51 +01:00
Andrei Vagin	41b3b8dffc	alarmtimer: Rename gettime() callback to get_ktime() The upcoming support for time namespaces requires to have access to: - The time in a tasks time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() struct alarm_base needs to follow the same naming convention, so rename .gettime() callback into get_ktime() as a preparation for introducing get_timespec(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-8-dima@arista.com	2020-01-14 12:20:50 +01:00
Andrei Vagin	eaf80194d0	posix-clocks: Rename .clock_get_timespec() callbacks accordingly The upcoming support for time namespaces requires to have access to: - The time in a task's time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() That adds a valid reason to finally implement a separate callback which returns the time in ktime_t format in (struct k_clock). As a preparation ground for introducing clock_get_ktime(), the original callback clock_get() was renamed into clock_get_timespec(). Reflect the renaming into the callback implementations. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-7-dima@arista.com	2020-01-14 12:20:50 +01:00
Andrei Vagin	819a95fe3a	posix-clocks: Rename the clock_get() callback to clock_get_timespec() The upcoming support for time namespaces requires to have access to: - The time in a task's time namespace for sys_clock_gettime() - The time in the root name space for common_timer_get() That adds a valid reason to finally implement a separate callback which returns the time in ktime_t format, rather than in (struct timespec). Rename the clock_get() callback to clock_get_timespec() as a preparation for introducing clock_get_ktime(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-6-dima@arista.com	2020-01-14 12:20:49 +01:00
Andrei Vagin	af993f58d6	time: Add timens_offsets to be used for tasks in time namespace Introduce offsets for time namespace. They will contain an adjustment needed to convert clocks to/from host's. A new namespace is created with the same offsets as the time namespace of the current process. Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-5-dima@arista.com	2020-01-14 12:20:49 +01:00
Andrei Vagin	769071ac9f	ns: Introduce Time Namespace Time Namespace isolates clock values. The kernel provides access to several clocks CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc. CLOCK_REALTIME System-wide clock that measures real (i.e., wall-clock) time. CLOCK_MONOTONIC Clock that cannot be set and represents monotonic time since some unspecified starting point. CLOCK_BOOTTIME Identical to CLOCK_MONOTONIC, except it also includes any time that the system is suspended. For many users, the time namespace means the ability to changes date and time in a container (CLOCK_REALTIME). Providing per namespace notions of CLOCK_REALTIME would be complex with a massive overhead, but has a dubious value. But in the context of checkpoint/restore functionality, monotonic and boottime clocks become interesting. Both clocks are monotonic with unspecified starting points. These clocks are widely used to measure time slices and set timers. After restoring or migrating processes, it has to be guaranteed that they never go backward. In an ideal case, the behavior of these clocks should be the same as for a case when a whole system is suspended. All this means that it is required to set CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace offsets for clocks. A time namespace is similar to a pid namespace in the way how it is created: unshare(CLONE_NEWTIME) system call creates a new time namespace, but doesn't set it to the current process. Then all children of the process will be born in the new time namespace, or a process can use the setns() system call to join a namespace. This scheme allows setting clock offsets for a namespace, before any processes appear in it. All available clone flags have been used, so CLONE_NEWTIME uses the highest bit of CSIGNAL. It means that it can be used only with the unshare() and the clone3() system calls. [ tglx: Adjusted paragraph about clone3() to reality and massaged the changelog a bit. ] Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://criu.org/Time_namespace Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com	2020-01-14 12:20:48 +01:00
Andrei Vagin	c966533f8c	lib/vdso: Mark do_hres() and do_coarse() as __always_inline Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (more clock_gettime() cycles - the better): clock \| before \| after \| diff ---------------------------------------------------------- monotonic \| 153222105 \| 166775025 \| 8.8% monotonic-coarse \| 671557054 \| 691513017 \| 3.0% monotonic-raw \| 147116067 \| 161057395 \| 9.5% boottime \| 153446224 \| 166962668 \| 9.1% The improvement for arm64 for monotonic and boottime is around 3.5%. clock \| before \| after \| diff ================================================== monotonic 17326692 17951770 3.6% monotonic-coarse 43624027 44215292 1.3% monotonic-raw 17541809 17554932 0.1% boottime 17334982 17954361 3.5% [ tglx: Avoid the goto ] Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191112012724.250792-3-dima@arista.com	2020-01-14 12:20:48 +01:00
Andrei Vagin	0898a16a36	lib/vdso: Add unlikely() hint into vdso_read_begin() Place the branch with no concurrent write before the contended case. Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz (more clock_gettime() cycles - the better): \| before \| after ----------------------------------- \| 150252214 \| 153242367 \| 150301112 \| 153324800 \| 150392773 \| 153125401 \| 150373957 \| 153399355 \| 150303157 \| 153489417 \| 150365237 \| 153494270 ----------------------------------- avg \| 150331408 \| 153345935 diff % \| 2 \| 0 ----------------------------------- stdev % \| 0.3 \| 0.1 Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20191112012724.250792-2-dima@arista.com	2020-01-14 12:20:47 +01:00
Christophe Leroy	cdb7c5a9c8	lib/vdso: Avoid duplication in __cvdso_clock_getres() VDSO_HRES and VDSO_RAW clocks are handled the same way. Avoid the code duplication. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/fdf1a968a8f7edd61456f1689ac44082ebb19c15.1577111367.git.christophe.leroy@c-s.fr	2020-01-14 12:20:47 +01:00
Christophe Leroy	8463cf8052	lib/vdso: Let do_coarse() return 0 to simplify the callsite do_coarse() is similar to do_hres() except that it never fails. Change its type to int instead of void and let it always return success (0) to simplify the call site. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/21e8afa38c02ca8672c2690307383507fe63b454.1577111367.git.christophe.leroy@c-s.fr	2020-01-14 12:20:46 +01:00
Vincenzo Frascino	0b5c12332d	x86/vdso: Remove unused VDSO_HAS_32BIT_FALLBACK VDSO_HAS_32BIT_FALLBACK has been removed from the core since the architectures that support the generic vDSO library have been converted to support the 32 bit fallbacks. Remove unused VDSO_HAS_32BIT_FALLBACK from x86 vdso. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20190830135902.20861-9-vincenzo.frascino@arm.com	2020-01-14 12:20:46 +01:00
Vincenzo Frascino	de0209f53a	mips: vdso: Remove unused VDSO_HAS_32BIT_FALLBACK VDSO_HAS_32BIT_FALLBACK has been removed from the core since the architectures that support the generic vDSO library have been converted to support the 32 bit fallbacks. Remove unused VDSO_HAS_32BIT_FALLBACK from mips vdso. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Paul Burton <paul.burton@mips.com> Link: https://lore.kernel.org/r/20190830135902.20861-8-vincenzo.frascino@arm.com	2020-01-14 12:20:46 +01:00
Vincenzo Frascino	972188f3a2	arm64: compat: vdso: Remove unused VDSO_HAS_32BIT_FALLBACK VDSO_HAS_32BIT_FALLBACK has been removed from the core since the architectures that support the generic vDSO library have been converted to support the 32 bit fallbacks. Remove unused VDSO_HAS_32BIT_FALLBACK from arm64 compat vdso. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20190830135902.20861-7-vincenzo.frascino@arm.com	2020-01-14 12:20:45 +01:00
Vincenzo Frascino	a279235ddb	lib/vdso: Remove checks on return value for 32 bit vDSO Since all the architectures that support the generic vDSO library have been converted to support the 32 bit fallbacks it is not required anymore to check the return value of __cvdso_clock_get*time32_common() before updating the old_timespec fields. Remove the related checks from the generic vdso library. References: `c60a32ea4f` ("lib/vdso/32: Provide legacy syscall fallbacks") Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20190830135902.20861-6-vincenzo.frascino@arm.com	2020-01-14 12:20:45 +01:00
Vincenzo Frascino	b767081c07	lib/vdso: Remove VDSO_HAS_32BIT_FALLBACK VDSO_HAS_32BIT_FALLBACK was introduced to address a regression which caused seccomp to deny access to the applications to clock_gettime64() and clock_getres64() because they are not enabled in the existing filters. The purpose of VDSO_HAS_32BIT_FALLBACK was to simplify the conditional implementation of __cvdso_clock_get*time32() variants. Now that all the architectures that support the generic vDSO library have been converted to support the 32 bit fallbacks the conditional can be removed. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20190830135902.20861-5-vincenzo.frascino@arm.com References: `c60a32ea4f` ("lib/vdso/32: Provide legacy syscall fallbacks")	2020-01-14 12:20:44 +01:00
Vincenzo Frascino	bf279849ad	lib/vdso: Build 32 bit specific functions in the right context clock_gettime32 and clock_getres_time32 should be compiled only with a 32 bit vdso library. Exclude these symbols when BUILD_VDSO32 is not defined. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20190830135902.20861-3-vincenzo.frascino@arm.com	2020-01-14 12:20:44 +01:00
Thomas Gleixner	715f23b610	ARM: vdso: Set BUILD_VDSO32 and provide 32bit fallbacks Setting BUILD_VDSO32 is required to expose the legacy 32bit interfaces in the generic VDSO code which are going to be hidden behind an #ifdef BUILD_VDSO32. The 32bit fallbacks are necessary to remove the existing VDSO_HAS_32BIT_FALLBACK hackery. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/87tv4zq9dc.fsf@nanos.tec.linutronix.de	2020-01-14 12:20:43 +01:00
Vincenzo Frascino	3b5584afee	arm64: compat: vdso: Expose BUILD_VDSO32 clock_gettime32 and clock_getres_time32 should be compiled only with the 32 bit vdso library. Expose BUILD_VDSO32 when arm64 compat is compiled, to provide an indication to the generic library to include these symbols. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20190830135902.20861-2-vincenzo.frascino@arm.com	2020-01-14 12:20:43 +01:00
Thomas Gleixner	2e34d63d82	Merge branch 'timers/urgent' into timers/core Pick up upstream VDSO fix before adding more VDSO changes.	2020-01-10 21:11:54 +01:00
Vincenzo Frascino	ffd08731b2	lib/vdso: Make __cvdso_clock_getres() static Fix the following sparse warning in the generic vDSO library: linux/lib/vdso/gettimeofday.c:224:5: warning: symbol '__cvdso_clock_getres' was not declared. Should it be static? Make it static and also mark it __maybe_unsed. Fixes: `502a590a17` ("lib/vdso: Move fallback invocation to the callers") Reported-by: Marc Gonzalez <marc.w.gonzalez@free.fr> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20191128111719.8282-1-vincenzo.frascino@arm.com	2020-01-10 19:29:01 +01:00
Paul Cercueil	2707745533	time/sched_clock: Disable interrupts in sched_clock_register() Instead of issueing a warning if sched_clock_register() is called from a context where IRQs are enabled, the code now ensures that IRQs are indeed disabled. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200107010630.954648-1-paul@crapouillou.net	2020-01-09 18:50:18 +01:00
Arnd Bergmann	f35deaff1b	time/posix-stubs: Provide compat itimer supoprt for alpha Using compat_sys_getitimer and compat_sys_setitimer on alpha causes a link failure in the Alpha tinyconfig and other configurations that turn off CONFIG_POSIX_TIMERS. Use the same #ifdef check for the stub version as well. Fixes: `4c22ea2b91` ("y2038: use compat_{get,set}_itimer on alpha") Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20191207191043.656328-1-arnd@arndb.de	2020-01-09 18:20:23 +01:00
Linus Torvalds	c79f46a282	Linux 5.5-rc5	2020-01-05 14:23:27 -08:00
Linus Torvalds	768fc661d1	RISC-V updates for v5.5-rc5 Several fixes for RISC-V: - Fix function graph trace support - Prefix the CSR IRQ_* macro names with "RV_", to avoid collisions with macros elsewhere in the Linux kernel tree named "IRQ_TIMER" - Use __pa_symbol() when computing the physical address of a kernel symbol, rather than __pa() - Mark the RISC-V port as supporting GCOV One DT addition: - Describe the L2 cache controller in the FU540 DT file One documentation update: - Add patch acceptance guideline documentation -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAl4Re3wACgkQx4+xDQu9 KkvPNA/6AmDvw90JX1Xm6jK2J/eYaUr29dJZ7KYdQ7HhbsixkJecTqjW3huyWfv6 ynsqWA0e87iSGyV+zBshw6nxVyGgDZgRam5100eiP25ZiTR5cbR8e6JmVVplGdkH VOONhAM64Pknoy+lCL0emP01+qa27PqTCQomxDDIxgrpfgbR9PPjgpuA7B0gCOz1 yhc6EjfbSRKLJjnowoSdvHcDvzU0oIPjMPxQz1gUmBmUKRp5Idn/xTtWG9p/M5T8 6XPzh8hEfbgvi0ur0Dlz9hav2g5mSL2S5n4CPitAAOyrG/jDktlQgYcs7nSTAk7m vtU6oUZTMJ9nmMyPQspI+krW2Q4g5BUUVRwm3ckNe6sqwrs8TlXIeTQb8Yb+B4Xm 3EaS5YvfMNoHjlHQZ814EPAFNEqROgI04O2Sl97+gqEpQUwC9qLSeXiVQHgxcW9/ 8UqlbY/9XvKnZq6btGAJElXAirbt5j+TWKCULyUKLu8mz3644ascIr2sBOkMSb61 ZRZbxEbK8Avwf2etv7hap1Gd2j+ywlK0IziKOsAPBr0Pif0lAdawf1PjH491spVc //re3g/rZu3I2+NK4qKNCoxZwzT5w/kK8rKJJ7sw8KVB2kTVjV7Qrmktt5Zeh021 PXPT/tUgLMvZW6RRCqPCXZru28qJxlLAhuQEtHC/KbSYcm2sSJc= =8HSd -----END PGP SIGNATURE----- Merge tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "Several fixes for RISC-V: - Fix function graph trace support - Prefix the CSR IRQ_* macro names with "RV_", to avoid collisions with macros elsewhere in the Linux kernel tree named "IRQ_TIMER" - Use __pa_symbol() when computing the physical address of a kernel symbol, rather than __pa() - Mark the RISC-V port as supporting GCOV One DT addition: - Describe the L2 cache controller in the FU540 DT file One documentation update: - Add patch acceptance guideline documentation" * tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: Documentation: riscv: add patch acceptance guidelines riscv: prefix IRQ_ macro names with an RV_ namespace clocksource: riscv: add notrace to riscv_sched_clock riscv: ftrace: correct the condition logic in function graph tracer riscv: dts: Add DT support for SiFive L2 cache controller riscv: gcov: enable gcov for RISC-V riscv: mm: use __pa_symbol for kernel symbols	2020-01-05 11:15:31 -08:00
Paul Walmsley	0e194d9da1	Documentation: riscv: add patch acceptance guidelines Formalize, in kernel documentation, the patch acceptance policy for arch/riscv. In summary, it states that as maintainers, we plan to only accept patches for new modules or extensions that have been frozen or ratified by the RISC-V Foundation. We've been following these guidelines for the past few months. In the meantime, we've received quite a bit of feedback that it would be helpful to have these guidelines formally documented. Based on a suggestion from Matthew Wilcox, we also add a link to this file to Documentation/process/index.rst, to make this document easier to find. The format of this document has also been changed to align to the format outlined in the maintainer entry profiles, in accordance with comments from Jon Corbet and Dan Williams. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Krste Asanovic <krste@berkeley.edu> Cc: Andrew Waterman <waterman@eecs.berkeley.edu> Cc: Matthew Wilcox <willy@infradead.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jonathan Corbet <corbet@lwn.net>	2020-01-04 21:49:01 -08:00
Paul Walmsley	2f3035da40	riscv: prefix IRQ_ macro names with an RV_ namespace "IRQ_TIMER", used in the arch/riscv CSR header file, is a sufficiently generic macro name that it's used by several source files across the Linux code base. Some of these other files ultimately include the arch/riscv CSR include file, causing collisions. Fix by prefixing the RISC-V csr.h IRQ_ macro names with an RV_ prefix. Fixes: `a4c3733d32` ("riscv: abstract out CSR names for supervisor vs machine mode") Reported-by: Olof Johansson <olof@lixom.net> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>	2020-01-04 21:48:59 -08:00
Zong Li	9d05c18e8d	clocksource: riscv: add notrace to riscv_sched_clock When enabling ftrace graph tracer, it gets the tracing clock in ftrace_push_return_trace(). Eventually, it invokes riscv_sched_clock() to get the clock value. If riscv_sched_clock() isn't marked with 'notrace', it will call ftrace_push_return_trace() and cause infinite loop. The result of failure as follow: command: echo function_graph >current_tracer [ 46.176787] Unable to handle kernel paging request at virtual address ffffffe04fb38c48 [ 46.177309] Oops [#1] [ 46.177478] Modules linked in: [ 46.177770] CPU: 0 PID: 256 Comm: $d Not tainted 5.5.0-rc1 #47 [ 46.177981] epc: ffffffe00035e59a ra : ffffffe00035e57e sp : ffffffe03a7569b0 [ 46.178216] gp : ffffffe000d29b90 tp : ffffffe03a756180 t0 : ffffffe03a756968 [ 46.178430] t1 : ffffffe00087f408 t2 : ffffffe03a7569a0 s0 : ffffffe03a7569f0 [ 46.178643] s1 : ffffffe00087f408 a0 : 0000000ac054cda4 a1 : 000000000087f411 [ 46.178856] a2 : 0000000ac054cda4 a3 : 0000000000373ca0 a4 : ffffffe04fb38c48 [ 46.179099] a5 : 00000000153e22a8 a6 : 00000000005522ff a7 : 0000000000000005 [ 46.179338] s2 : ffffffe03a756a90 s3 : ffffffe00032811c s4 : ffffffe03a756a58 [ 46.179570] s5 : ffffffe000d29fe0 s6 : 0000000000000001 s7 : 0000000000000003 [ 46.179809] s8 : 0000000000000003 s9 : 0000000000000002 s10: 0000000000000004 [ 46.180053] s11: 0000000000000000 t3 : 0000003fc815749c t4 : 00000000000efc90 [ 46.180293] t5 : ffffffe000d29658 t6 : 0000000000040000 [ 46.180482] status: 0000000000000100 badaddr: ffffffe04fb38c48 cause: 000000000000000f Signed-off-by: Zong Li <zong.li@sifive.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> [paul.walmsley@sifive.com: cleaned up patch description] Fixes: `92e0d143fd` ("clocksource/drivers/riscv_timer: Provide the sched_clock") Cc: stable@vger.kernel.org Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>	2020-01-04 21:48:48 -08:00
Linus Torvalds	36487907f3	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: hexagon: define ioremap_uc ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less ocfs2: call journal flush to mark journal as empty after journal recovery when mount mm/hugetlb: defer freeing of huge pages if in non-task context mm/gup: fix memory leak in __gup_benchmark_ioctl mm/oom: fix pgtables units mismatch in Killed process message fs/posix_acl.c: fix kernel-doc warnings hexagon: work around compiler crash hexagon: parenthesize registers in asm predicates fs/namespace.c: make to_mnt_ns() static fs/nsfs.c: include headers for missing declarations fs/direct-io.c: include fs/internal.h for missing prototype mm: move_pages: return valid node id in status if the page is already on the target node memcg: account security cred as well to kmemcg kcov: fix struct layout for kcov_remote_arg mm/zsmalloc.c: fix the migrated zspage statistics. mm/memory_hotplug: shrink zones when offlining memory	2020-01-04 19:38:51 -08:00
Linus Torvalds	a125bcda2d	+ Bug fixes - performance regression: only get a label reference if the fast path check fails - fix aa_xattrs_match() may sleep while holding a RCU lock - fix bind mounts aborting with -ENOMEM -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE7cSDD705q2rFEEf7BS82cBjVw9gFAl4RPEUACgkQBS82cBjV w9gUwg/9EsFcegRD6T2NmiqUp7jY2KRjAf5JrKqZJvCbXFj9Mnurv9ug5A6b20JU SbyXU1CNIxqPTmwPMRF1YGtq/vuAV5UIgb14pjepmsvzF5A/xpgKJucR0gokc5PU 6eFbPCQ4aXF83/d+/SiJfANP+oUY372b6cTzHKMouKWXBbeIG4F9vtVrPMEZlcjo NNxDtmHNeFdASV17uA+8OKseGluPzPlWvnpq5va/Uz4avmxdeaBQxqh2N891IUqO drRTpF6cgp/072cEoSrS+2dmliHOteS9785Vh4iLF4LVt8lsRjoT0Z66eGL175Ve TYwp7EfgIwhcUYLZgKjz7t+wp3l7Yw5retlzbzbfFTIxTfGCKunUpZIUwssI+QzW npMOU+0jTLZ9hVZICOicxzs40kHu8tR4dKYPHuYB2G3gNW8LGMEodVn2SBL1H6+2 r4vUjaLIsyUHBpfQDjWHjMEN/gdSekyBvRhpnC5qNMdOTnTeHSNow88igBxmGNhA K9iymhYV0E1ZMw4KfGtyKZ2Zfd3E8F0ryH0cnavXYBYfvuVNs18M0TkaYwPb1+YH 02SyEGRnnhxtIO+GwLve6pmdRT9edZgL6Pc3yCaJ/e/g9FIpFgPbX44IxuO+XrL/ ku6NK9hXrpvk9Z/CAVUHYcRMSgQTz1lTsw9b9ooxgHQmsXuOUhM= =uw+S -----END PGP SIGNATURE----- Merge tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor fixes from John Johansen: - performance regression: only get a label reference if the fast path check fails - fix aa_xattrs_match() may sleep while holding a RCU lock - fix bind mounts aborting with -ENOMEM * tag 'apparmor-pr-2020-01-04' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock apparmor: only get a label reference if the fast path check fails apparmor: fix bind mounts aborting with -ENOMEM	2020-01-04 19:28:30 -08:00
John Johansen	8c62ed27a1	apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock aa_xattrs_match() is unfortunately calling vfs_getxattr_alloc() from a context protected by an rcu_read_lock. This can not be done as vfs_getxattr_alloc() may sleep regardles of the gfp_t value being passed to it. Fix this by breaking the rcu_read_lock on the policy search when the xattr match feature is requested and restarting the search if a policy changes occur. Fixes: `8e51f9087f` ("apparmor: Add support for attaching profiles via xattr, presence and value") Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: John Johansen <john.johansen@canonical.com>	2020-01-04 15:56:44 -08:00

1 2 3 4 5 ...

888077 Commits