aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/include
AgeCommit message (Collapse)AuthorFilesLines
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman284-0/+284
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-20x86/mm: Limit mmap() of /dev/mem to valid physical addressesCraig Bergstrom1-0/+4
Currently, it is possible to mmap() any offset from /dev/mem. If a program mmaps() /dev/mem offsets outside of the addressable limits of a system, the page table can be corrupted by setting reserved bits. For example if you mmap() offset 0x0001000000000000 of /dev/mem on an x86_64 system with a 48-bit bus, the page fault handler will be called with error_code set to RSVD. The kernel then crashes with a page table corruption error. This change prevents this page table corruption on x86 by refusing to mmap offsets higher than the highest valid address in the system. Signed-off-by: Craig Bergstrom <craigb@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: dsafonov@virtuozzo.com Cc: kirill.shutemov@linux.intel.com Cc: mhocko@suse.com Cc: oleg@redhat.com Link: http://lkml.kernel.org/r/20171019192856.39672-1-craigb@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-18x86/mm: Remove debug/x86/tlb_defer_switch_to_init_mmAndy Lutomirski1-8/+12
Borislav thinks that we don't need this knob in a released kernel. Get rid of it. Requested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b956575bed91 ("x86/mm: Flush more aggressively in lazy TLB mode") Link: http://lkml.kernel.org/r/1fa72431924e81e86c164ff7881bf9240d1f1a6c.1508000261.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-18x86/mm: Tidy up "x86/mm: Flush more aggressively in lazy TLB mode"Andy Lutomirski1-1/+6
Due to timezones, commit: b956575bed91 ("x86/mm: Flush more aggressively in lazy TLB mode") was an outdated patch that well tested and fixed the bug but didn't address Borislav's review comments. Tidy it up: - The name "tlb_use_lazy_mode()" was highly confusing. Change it to "tlb_defer_switch_to_init_mm()", which describes what it actually means. - Move the static_branch crap into a helper. - Improve comments. Actually removing the debugfs option is in the next patch. Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b956575bed91 ("x86/mm: Flush more aggressively in lazy TLB mode") Link: http://lkml.kernel.org/r/154ef95428d4592596b6e98b0af1d2747d6cfbf8.1508000261.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-14Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds5-11/+32
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A landry list of fixes: - fix reboot breakage on some PCID-enabled system - fix crashes/hangs on some PCID-enabled systems - fix microcode loading on certain older CPUs - various unwinder fixes - extend an APIC quirk to more hardware systems and disable APIC related warning on virtualized systems - various Hyper-V fixes - a macro definition robustness fix - remove jprobes IRQ disabling - various mem-encryption fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Do the family check first x86/mm: Flush more aggressively in lazy TLB mode x86/apic: Update TSC_DEADLINE quirk with additional SKX stepping x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on hypervisors x86/mm: Disable various instrumentations of mm/mem_encrypt.c and mm/tlb.c x86/hyperv: Fix hypercalls with extended CPU ranges for TLB flushing x86/hyperv: Don't use percpu areas for pcpu_flush/pcpu_flush_ex structures x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUs x86/unwind: Disable unwinder warnings on 32-bit x86/unwind: Align stack pointer in unwinder dump x86/unwind: Use MSB for frame pointer encoding on 32-bit x86/unwind: Fix dereference of untrusted pointer x86/alternatives: Fix alt_max_short macro to really be a max() x86/mm/64: Fix reboot interaction with CR4.PCIDE kprobes/x86: Remove IRQ disabling from jprobe handlers kprobes/x86: Set up frame pointer in kprobe trampoline
2017-10-14Merge branch 'ras-urgent-for-linus' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Ingo Molnar: "A boot parameter fix, plus a header export fix" * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Hide mca_cfg RAS/CEC: Use the right length for "cec_disable"
2017-10-14x86/mm: Flush more aggressively in lazy TLB modeAndy Lutomirski2-7/+25
Since commit: 94b1b03b519b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking") x86's lazy TLB mode has been all the way lazy: when running a kernel thread (including the idle thread), the kernel keeps using the last user mm's page tables without attempting to maintain user TLB coherence at all. From a pure semantic perspective, this is fine -- kernel threads won't attempt to access user pages, so having stale TLB entries doesn't matter. Unfortunately, I forgot about a subtlety. By skipping TLB flushes, we also allow any paging-structure caches that may exist on the CPU to become incoherent. This means that we can have a paging-structure cache entry that references a freed page table, and the CPU is within its rights to do a speculative page walk starting at the freed page table. I can imagine this causing two different problems: - A speculative page walk starting from a bogus page table could read IO addresses. I haven't seen any reports of this causing problems. - A speculative page walk that involves a bogus page table can install garbage in the TLB. Such garbage would always be at a user VA, but some AMD CPUs have logic that triggers a machine check when it notices these bogus entries. I've seen a couple reports of this. Boris further explains the failure mode: > It is actually more of an optimization which assumes that paging-structure > entries are in WB DRAM: > > "TlbCacheDis: cacheable memory disable. Read-write. 0=Enables > performance optimization that assumes PML4, PDP, PDE, and PTE entries > are in cacheable WB-DRAM; memory type checks may be bypassed, and > addresses outside of WB-DRAM may result in undefined behavior or NB > protocol errors. 1=Disables performance optimization and allows PML4, > PDP, PDE and PTE entries to be in any memory type. Operating systems > that maintain page tables in memory types other than WB- DRAM must set > TlbCacheDis to insure proper operation." > > The MCE generated is an NB protocol error to signal that > > "Link: A specific coherent-only packet from a CPU was issued to an > IO link. This may be caused by software which addresses page table > structures in a memory type other than cacheable WB-DRAM without > properly configuring MSRC001_0015[TlbCacheDis]. This may occur, for > example, when page table structure addresses are above top of memory. In > such cases, the NB will generate an MCE if it sees a mismatch between > the memory operation generated by the core and the link type." > > I'm assuming coherent-only packets don't go out on IO links, thus the > error. To fix this, reinstate TLB coherence in lazy mode. With this patch applied, we do it in one of two ways: - If we have PCID, we simply switch back to init_mm's page tables when we enter a kernel thread -- this seems to be quite cheap except for the cost of serializing the CPU. - If we don't have PCID, then we set a flag and switch to init_mm the first time we would otherwise need to flush the TLB. The /sys/kernel/debug/x86/tlb_use_lazy_mode debug switch can be changed to override the default mode for benchmarking. In theory, we could optimize this better by only flushing the TLB in lazy CPUs when a page table is freed. Doing that would require auditing the mm code to make sure that all page table freeing goes through tlb_remove_page() as well as reworking some data structures to implement the improved flush logic. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Adam Borowski <kilobyte@angband.pl> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Biggers <ebiggers@google.com> Cc: Johannes Hirte <johannes.hirte@datenkhaos.de> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 94b1b03b519b ("x86/mm: Rework lazy TLB mode and TLB freshness tracking") Link: http://lkml.kernel.org/r/20171009170231.fkpraqokz6e4zeco@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUsVitaly Kuznetsov1-0/+1
hv_flush_pcpu_ex structures are not cleared between calls for performance reasons (they're variable size up to PAGE_SIZE each) but we must clear hv_vp_set.bank_contents part of it to avoid flushing unneeded vCPUs. The rest of the structure is formed correctly. To do the clearing in an efficient way stash the maximum possible vCPU number (this may differ from Linux CPU id). Reported-by: Jork Loeser <Jork.Loeser@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: devel@linuxdriverproject.org Link: http://lkml.kernel.org/r/20171006154854.18092-1-vkuznets@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-09x86/alternatives: Fix alt_max_short macro to really be a max()Mathias Krause2-4/+6
The alt_max_short() macro in asm/alternative.h does not work as intended, leading to nasty bugs. E.g. alt_max_short("1", "3") evaluates to 3, but alt_max_short("3", "1") evaluates to 1 -- not exactly the maximum of 1 and 3. In fact, I had to learn it the hard way by crashing my kernel in not so funny ways by attempting to make use of the ALTENATIVE_2 macro with alternatives where the first one was larger than the second one. According to [1] and commit dbe4058a6a44 ("x86/alternatives: Fix ALTERNATIVE_2 padding generation properly") the right handed side should read "-(-(a < b))" not "-(-(a - b))". Fix that, to make the macro work as intended. While at it, fix up the comments regarding the additional "-", too. It's not about gas' usage of s32 but brain dead logic of having a "true" value of -1 for the < operator ... *sigh* Btw., the one in asm/alternative-asm.h is correct. And, apparently, all current users of ALTERNATIVE_2() pass same sized alternatives, avoiding to hit the bug. [1] http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax Reviewed-and-tested-by: Borislav Petkov <bp@suse.de> Fixes: dbe4058a6a44 ("x86/alternatives: Fix ALTERNATIVE_2 padding generation properly") Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1507228213-13095-1-git-send-email-minipli@googlemail.com
2017-10-05x86/mce: Hide mca_cfgBorislav Petkov1-1/+0
Now that lguest is gone, put it in the internal header which should be used only by MCA/RAS code. Add missing header guards while at it. No functional change. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20171002092836.22971-3-bp@alien8.de
2017-10-04kvm/x86: Avoid async PF preempting the kernel incorrectlyBoqun Feng1-2/+2
Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call schedule() to reschedule in some cases. This could result in accidentally ending the current RCU read-side critical section early, causing random memory corruption in the guest, or otherwise preempting the currently running task inside between preempt_disable and preempt_enable. The difficulty to handle this well is because we don't know whether an async PF delivered in a preemptible section or RCU read-side critical section for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock() are both no-ops in that case. To cure this, we treat any async PF interrupting a kernel context as one that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing the schedule() path in that case. To do so, a second parameter for kvm_async_pf_task_wait() is introduced, so that we know whether it's called from a context interrupting the kernel, and the parameter is set properly in all the callsites. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: stable@vger.kernel.org Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-10-01Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds3-15/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "This contains the following fixes and improvements: - Avoid dereferencing an unprotected VMA pointer in the fault signal generation code - Fix inline asm call constraints for GCC 4.4 - Use existing register variable to retrieve the stack pointer instead of forcing the compiler to create another indirect access which results in excessive extra 'mov %rsp, %<dst>' instructions - Disable branch profiling for the memory encryption code to prevent an early boot crash - Fix a sparse warning caused by casting the __user annotation in __get_user_asm_u64() away - Fix an off by one error in the loop termination of the error patch in the x86 sysfs init code - Add missing CPU IDs to various Intel specific drivers to enable the functionality on recent hardware - More (init) constification in the numachip code" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Use register variable to get stack pointer value x86/mm: Disable branch profiling in mem_encrypt.c x86/asm: Fix inline asm call constraints for GCC 4.4 perf/x86/intel/uncore: Correct num_boxes for IIO and IRP perf/x86/intel/rapl: Add missing CPU IDs perf/x86/msr: Add missing CPU IDs perf/x86/intel/cstate: Add missing CPU IDs x86: Don't cast away the __user in __get_user_asm_u64() x86/sysfs: Fix off-by-one error in loop termination x86/mm: Fix fault error path using unsafe vma pointer x86/numachip: Add const and __initconst to numachip2_clockevent
2017-09-29Merge tag 'for-linus-4.14c-rc3-tag' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - avoid a warning when compiling with clang - consider read-only bits in xen-pciback when writing to a BAR - fix a boot crash of pv-domains * tag 'for-linus-4.14c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/mmu: Call xen_cleanhighmap() with 4MB aligned for page tables mapping xen-pciback: relax BAR sizing write value check x86/xen: clean up clang build warning
2017-09-29x86/asm: Use register variable to get stack pointer valueAndrey Ryabinin2-13/+2
Currently we use current_stack_pointer() function to get the value of the stack pointer register. Since commit: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang") ... we have a stack register variable declared. It can be used instead of current_stack_pointer() function which allows to optimize away some excessive "mov %rsp, %<dst>" instructions: -mov %rsp,%rdx -sub %rdx,%rax -cmp $0x3fff,%rax -ja ffffffff810722fd <ist_begin_non_atomic+0x2d> +sub %rsp,%rax +cmp $0x3fff,%rax +ja ffffffff810722fa <ist_begin_non_atomic+0x2a> Remove current_stack_pointer(), rename __asm_call_sp to current_stack_pointer and use it instead of the removed function. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170929141537.29167-1-aryabinin@virtuozzo.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-29x86/asm: Fix inline asm call constraints for GCC 4.4Josh Poimboeuf1-2/+4
The kernel test bot (run by Xiaolong Ye) reported that the following commit: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang") is causing double faults in a kernel compiled with GCC 4.4. Linus subsequently diagnosed the crash pattern and the buggy commit and found that the issue is with this code: register unsigned int __asm_call_sp asm("esp"); #define ASM_CALL_CONSTRAINT "+r" (__asm_call_sp) Even on a 64-bit kernel, it's using ESP instead of RSP. That causes GCC to produce the following bogus code: ffffffff8147461d: 89 e0 mov %esp,%eax ffffffff8147461f: 4c 89 f7 mov %r14,%rdi ffffffff81474622: 4c 89 fe mov %r15,%rsi ffffffff81474625: ba 20 00 00 00 mov $0x20,%edx ffffffff8147462a: 89 c4 mov %eax,%esp ffffffff8147462c: e8 bf 52 05 00 callq ffffffff814c98f0 <copy_user_generic_unrolled> Despite the absurdity of it backing up and restoring the stack pointer for no reason, the bug is actually the fact that it's only backing up and restoring the lower 32 bits of the stack pointer. The upper 32 bits are getting cleared out, corrupting the stack pointer. So change the '__asm_call_sp' register variable to be associated with the actual full-size stack pointer. This also requires changing the __ASM_SEL() macro to be based on the actual compiled arch size, rather than the CONFIG value, because CONFIG_X86_64 compiles some files with '-m32' (e.g., realmode and vdso). Otherwise Clang fails to build the kernel because it complains about the use of a 64-bit register (RSP) in a 32-bit file. Reported-and-Bisected-and-Tested-by: kernel test robot <xiaolong.ye@intel.com> Diagnosed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: LKP <lkp@01.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: f5caf621ee35 ("x86/asm: Fix inline asm call constraints for Clang") Link: http://lkml.kernel.org/r/20170928215826.6sdpmwtkiydiytim@treble Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26Merge branch 'WIP.x86/fpu' into x86/fpu, because it's readyIngo Molnar4-105/+40
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26x86/fpu: Introduce validate_xstate_header()Eric Biggers1-0/+4
Move validation of user-supplied xstate_header into a helper function, in preparation of calling it from both the ptrace and sigreturn syscall paths. The new function also considers it to be an error if *any* reserved bits are set, whereas before we were just clearing most of them silently. This should reduce the chance of bugs that fail to correctly validate user-supplied XSAVE areas. It also will expose any broken userspace programs that set the other reserved bits; this is desirable because such programs will lose compatibility with future CPUs and kernels if those bits are ever used for anything. (There shouldn't be any such programs, and in fact in the case where the compacted format is in use we were already validating xfeatures. But you never know...) Signed-off-by: Eric Biggers <ebiggers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170924105913.9157-2-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26x86/fpu: Rename fpu__activate_fpstate_read/write() to ↵Ingo Molnar1-2/+2
fpu__prepare_[read|write]() As per the new nomenclature we don't 'activate' the FPU state anymore, we initialize it. So drop the _activate_fpstate name from these functions, which were a bit of a mouthful anyway, and name them: fpu__prepare_read() fpu__prepare_write() Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26x86/fpu: Rename fpu__activate_curr() to fpu__initialize()Ingo Molnar1-1/+1
Rename this function to better express that it's all about initializing the FPU state of a task which goes hand in hand with the fpu::initialized field. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-33-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26x86/fpu: Rename fpu::fpstate_active to fpu::initializedIngo Molnar3-9/+9
The x86 FPU code used to have a complex state machine where both the FPU registers and the FPU state context could be 'active' (or inactive) independently of each other - which enabled features like lazy FPU restore. Much of this complexity is gone in the current code: now we basically can have FPU-less tasks (kernel threads) that don't use (and save/restore) FPU state at all, plus full FPU users that save/restore directly with no laziness whatsoever. But the fpu::fpstate_active still carries bits of the old complexity - meanwhile this flag has become a simple flag that shows whether the FPU context saving area in the thread struct is initialized and used, or not. Rename it to fpu::initialized to express this simplicity in the name as well. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-30-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-26x86/fpu: Remove fpu__current_fpstate_write_begin/end()Ingo Molnar1-2/+0
These functions are not used anymore, so remove them. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-29-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-25x86: Don't cast away the __user in __get_user_asm_u64()Ville Syrjälä1-1/+1
Don't cast away the __user in __get_user_asm_u64() on x86-32. Prevents sparse getting upset. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20170912164000.13745-1-ville.syrjala@linux.intel.com
2017-09-25x86/fpu: Reinitialize FPU registers if restoring FPU state failsEric Biggers1-36/+15
Userspace can change the FPU state of a task using the ptrace() or rt_sigreturn() system calls. Because reserved bits in the FPU state can cause the XRSTOR instruction to fail, the kernel has to carefully validate that no reserved bits or other invalid values are being set. Unfortunately, there have been bugs in this validation code. For example, we were not checking that the 'xcomp_bv' field in the xstate_header was 0. As-is, such bugs are exploitable to read the FPU registers of other processes on the system. To do so, an attacker can create a task, assign to it an invalid FPU state, then spin in a loop and monitor the values of the FPU registers. Because the task's FPU registers are not being restored, sometimes the FPU registers will have the values from another process. This is likely to continue to be a problem in the future because the validation done by the CPU instructions like XRSTOR is not immediately visible to kernel developers. Nor will invalid FPU states ever be encountered during ordinary use --- they will only be seen during fuzzing or exploits. There can even be reserved bits outside the xstate_header which are easy to forget about. For example, the MXCSR register contains reserved bits, which were not validated by the KVM_SET_XSAVE ioctl until commit a575813bfe4b ("KVM: x86: Fix load damaged SSEx MXCSR register"). Therefore, mitigate this class of vulnerability by restoring the FPU registers from init_fpstate if restoring from the task's state fails. We actually used to do this, but it was (perhaps unwisely) removed by commit 9ccc27a5d297 ("x86/fpu: Remove error return values from copy_kernel_to_*regs() functions"). This new patch is also a bit different. First, it only clears the registers, not also the bad in-memory state; this is simpler and makes it easier to make the mitigation cover all callers of __copy_kernel_to_fpregs(). Second, it does the register clearing in an exception handler so that no extra instructions are added to context switches. In fact, we *remove* instructions, since previously we were always zeroing the register containing 'err' even if CONFIG_X86_DEBUG_FPU was disabled. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Kevin Hao <haokexin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: kernel-hardening@lists.openwall.com Link: http://lkml.kernel.org/r/20170922174156.16780-4-ebiggers3@gmail.com Link: http://lkml.kernel.org/r/20170923130016.21448-27-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Turn WARN_ON() in context switch into WARN_ON_FPU()Andi Kleen1-1/+1
copy_xregs_to_kernel checks if the alternatives have been already patched. This WARN_ON() is always executed in every context switch. All the other checks in fpu internal.h are WARN_ON_FPU(), but this one is plain WARN_ON(). I assume it was forgotten to switch it. So switch it to WARN_ON_FPU() too to avoid some unnecessary code in the context switch, and a potentially expensive cache line miss for the global variable. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170329062605.4970-1-andi@firstfloor.org Link: http://lkml.kernel.org/r/20170923130016.21448-24-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Add FPU state copying quirk to handle XRSTOR failure on Intel ↵Rik van Riel1-0/+3
Skylake CPUs On Skylake CPUs I noticed that XRSTOR is unable to deal with states created by copyout_from_xsaves() if the xstate has only SSE/YMM state, and no FP state. That is, xfeatures had XFEATURE_MASK_SSE set, but not XFEATURE_MASK_FP. The reason is that part of the SSE/YMM state lives in the MXCSR and MXCSR_FLAGS fields of the FP state. Ensure that whenever we copy SSE or YMM state around, the MXCSR and MXCSR_FLAGS fields are also copied around. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170210085445.0f1cc708@annuminas.surriel.com Link: http://lkml.kernel.org/r/20170923130016.21448-22-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Remove struct fpu::fpregs_activeIngo Molnar3-32/+1
The previous changes paved the way for the removal of the fpu::fpregs_active state flag - we now only have the fpu::fpstate_active and fpu::last_cpu fields left. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-21-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Decouple fpregs_activate()/fpregs_deactivate() from fpu->fpregs_activeIngo Molnar1-6/+1
The fpregs_activate()/fpregs_deactivate() are currently called in such a pattern: if (!fpu->fpregs_active) fpregs_activate(fpu); ... if (fpu->fpregs_active) fpregs_deactivate(fpu); But note that it's actually safe to call them without checking the flag first. This further decouples the fpu->fpregs_active flag from actual FPU logic. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-20-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Change fpu->fpregs_active users to fpu->fpstate_activeIngo Molnar1-1/+3
We want to simplify the FPU state machine by eliminating fpu->fpregs_active, and we can do that because the two state flags (::fpregs_active and ::fpstate_active) are set essentially together. The old lazy FPU switching code used to make a distinction - but there's no lazy switching code anymore, we always switch in an 'eager' fashion. Do this by first changing all substantial uses of fpu->fpregs_active to fpu->fpstate_active and adding a few debug checks to double check our assumption is correct. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-19-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Simplify fpu->fpregs_active useIngo Molnar1-16/+1
The fpregs_active() inline function is pretty pointless - in almost all the callsites it can be replaced with a direct fpu->fpregs_active access. Do so and eliminate the extra layer of obfuscation. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-16-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Flip the parameter order in copy_*_to_xstate()Ingo Molnar1-2/+2
Make it more consistent with regular memcpy() semantics, where the destination argument comes first. No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-15-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Remove 'kbuf' parameter from the copy_user_to_xstate() APIIngo Molnar1-1/+1
No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-14-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Remove 'ubuf' parameter from the copy_kernel_to_xstate() APIIngo Molnar1-1/+1
No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-13-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Split copy_user_to_xstate() into copy_kernel_to_xstate() & ↵Ingo Molnar1-2/+2
copy_user_to_xstate() Similar to: x86/fpu: Split copy_xstate_to_user() into copy_xstate_to_kernel() & copy_xstate_to_user() No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-12-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Clarify parameter names in the copy_xstate_to_*() methodsIngo Molnar1-2/+2
Right now there's a confusing mixture of 'offset' and 'size' parameters: - __copy_xstate_to_*() input parameter 'end_pos' not not really an offset, but the full size of the copy to be performed. - input parameter 'count' to copy_xstate_to_*() shadows that of __copy_xstate_to_*()'s 'count' parameter name - but the roles are different: the first one is the total number of bytes to be copied, while the second one is a partial copy size. To unconfuse all this, use a consistent set of parameter names: - 'size' is the partial copy size within a single xstate component - 'size_total' is the total copy requested - 'offset_start' is the requested starting offset. - 'offset' is the offset within an xstate component. No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-9-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Clean up parameter order in the copy_xstate_to_*() APIsIngo Molnar1-2/+2
Parameter ordering is weird: int copy_xstate_to_kernel(unsigned int pos, unsigned int count, void *kbuf, struct xregs_state *xsave); int copy_xstate_to_user(unsigned int pos, unsigned int count, void __user *ubuf, struct xregs_state *xsave); 'pos' and 'count', which are attributes of the destination buffer, are listed before the destination buffer itself ... List them after the primary arguments instead. This makes the code more similar to regular memcpy() variant APIs. No change in functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-6-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Remove 'kbuf' parameter from the copy_xstate_to_user() APIsIngo Molnar1-1/+1
The 'kbuf' parameter is unused in the _user() side of the API, remove it. This simplifies the code and makes it easier to think about. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-5-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Remove 'ubuf' parameter from the copy_xstate_to_kernel() APIsIngo Molnar1-1/+1
The 'ubuf' parameter is unused in the _kernel() side of the API, remove it. This simplifies the code and makes it easier to think about. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-4-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Split copy_xstate_to_user() into copy_xstate_to_kernel() & ↵Ingo Molnar1-2/+2
copy_xstate_to_user() copy_xstate_to_user() is a weird API - in part due to a bad API inherited from the regset APIs. But don't propagate that bad API choice into the FPU code - so as a first step split the API into kernel and user buffer handling routines. (Also split the xstate_copyout() internal helper.) The split API is a dumb duplication that should be obviously correct, the real splitting will be done in the next patch. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-3-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-24x86/fpu: Rename copyin_to_xsaves()/copyout_from_xsaves() to ↵Ingo Molnar1-2/+2
copy_user_to_xstate()/copy_xstate_to_user() The 'copyin/copyout' nomenclature needlessly departs from what the modern FPU code uses, which is: copy_fpregs_to_fpstate() copy_fpstate_to_sigframe() copy_fregs_to_user() copy_fxregs_to_kernel() copy_fxregs_to_user() copy_kernel_to_fpregs() copy_kernel_to_fregs() copy_kernel_to_fxregs() copy_kernel_to_xregs() copy_user_to_fregs() copy_user_to_fxregs() copy_user_to_xregs() copy_xregs_to_kernel() copy_xregs_to_user() I.e. according to this pattern, the following rename should be done: copyin_to_xsaves() -> copy_user_to_xstate() copyout_from_xsaves() -> copy_xstate_to_user() or, if we want to be pedantic, denote that that the user-space format is ptrace: copyin_to_xsaves() -> copy_user_ptrace_to_xstate() copyout_from_xsaves() -> copy_xstate_to_user_ptrace() But I'd suggest the shorter, non-pedantic name. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Biggers <ebiggers3@gmail.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20170923130016.21448-2-mingo@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-23x86/asm: Fix inline asm call constraints for ClangJosh Poimboeuf9-36/+36
For inline asm statements which have a CALL instruction, we list the stack pointer as a constraint to convince GCC to ensure the frame pointer is set up first: static inline void foo() { register void *__sp asm(_ASM_SP); asm("call bar" : "+r" (__sp)) } Unfortunately, that pattern causes Clang to corrupt the stack pointer. The fix is easy: convert the stack pointer register variable to a global variable. It should be noted that the end result is different based on the GCC version. With GCC 6.4, this patch has exactly the same result as before: defconfig defconfig-nofp distro distro-nofp before 9820389 9491555 8816046 8516940 after 9820389 9491555 8816046 8516940 With GCC 7.2, however, GCC's behavior has changed. It now changes its behavior based on the conversion of the register variable to a global. That somehow convinces it to *always* set up the frame pointer before inserting *any* inline asm. (Therefore, listing the variable as an output constraint is a no-op and is no longer necessary.) It's a bit overkill, but the performance impact should be negligible. And in fact, there's a nice improvement with frame pointers disabled: defconfig defconfig-nofp distro distro-nofp before 9796316 9468236 9076191 8790305 after 9796957 9464267 9076381 8785949 So in summary, while listing the stack pointer as an output constraint is no longer necessary for newer versions of GCC, it's still needed for older versions. Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-21x86/xen: clean up clang build warningColin Ian King1-2/+2
In the case where sizeof(maddr) != sizeof(long) p is initialized and never read and clang throws a warning on this. Move declaration of p to clean up the clang build warning: warning: Value stored to 'p' during its initialization is never read Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-09-17x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware codeAndy Lutomirski1-2/+19
Putting the logical ASID into CR3's PCID bits directly means that we have two cases to consider separately: ASID == 0 and ASID != 0. This means that bugs that only hit in one of these cases trigger nondeterministically. There were some bugs like this in the past, and I think there's still one in current kernels. In particular, we have a number of ASID-unware code paths that save CR3, write some special value, and then restore CR3. This includes suspend/resume, hibernate, kexec, EFI, and maybe other things I've missed. This is currently dangerous: if ASID != 0, then this code sequence will leave garbage in the TLB tagged for ASID 0. We could potentially see corruption when switching back to ASID 0. In principle, an initialize_tlbstate_and_flush() call after these sequences would solve the problem, but EFI, at least, does not call this. (And it probably shouldn't -- initialize_tlbstate_and_flush() is rather expensive.) Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/cdc14bbe5d3c3ef2a562be09a6368ffe9bd947a6.1505663533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-17x86/mm: Factor out CR3-building codeAndy Lutomirski1-4/+11
Current, the code that assembles a value to load into CR3 is open-coded everywhere. Factor it out into helpers build_cr3() and build_cr3_noflush(). This makes one semantic change: __get_current_cr3_fast() was wrong on SME systems. No one noticed because the only caller is in the VMX code, and there are no CPUs with both SME and VMX. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <Thomas.Lendacky@amd.com> Link: http://lkml.kernel.org/r/ce350cf11e93e2842d14d0b95b0199c7d881f527.1505663533.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-2/+1
Pull more KVM updates from Paolo Bonzini: - PPC bugfixes - RCU splat fix - swait races fix - pointless userspace-triggerable BUG() fix - misc fixes for KVM_RUN corner cases - nested virt correctness fixes + one host DoS - some cleanups - clang build fix - fix AMD AVIC with default QEMU command line options - x86 bugfixes * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry kvm,mips: Fix potential swait_active() races kvm,powerpc: Serialize wq active checks in ops->vcpu_kick kvm: Serialize wq active checks in kvm_vcpu_wake_up() kvm,x86: Fix apf_task_wake_one() wq serialization kvm,lapic: Justify use of swait_active() kvm,async_pf: Use swq_has_sleeper() sched/wait: Add swq_has_sleeper() KVM: VMX: Do not BUG() on out-of-bounds guest IRQ KVM: Don't accept obviously wrong gsi values via KVM_IRQFD kvm: nVMX: Don't allow L2 to access the hardware CR8 KVM: trace events: update list of exit reasons KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously KVM: X86: Don't block vCPU if there is pending exception KVM: SVM: Add irqchip_split() checks before enabling AVIC KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv() KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu() KVM: x86: fix clang build ...
2017-09-13KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()Suravee Suthikulpanit1-1/+1
Modify struct kvm_x86_ops.arch.apicv_active() to take struct kvm_vcpu pointer as parameter in preparation to subsequent changes. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-13KVM: x86: Remove .get_pkru() from kvm_x86_opsJoerg Roedel1-1/+0
The commit 9dd21e104bc ('KVM: x86: simplify handling of PKRU') removed all users and providers of that call-back, but didn't remove it. Remove it now. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-09-13x86/hyper-v: Remove duplicated HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED definitionVitaly Kuznetsov1-6/+0
Commits: 7dcf90e9e032 ("PCI: hv: Use vPCI protocol version 1.2") 628f54cc6451 ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls") added the same definition and they came in through different trees. Fix the duplication. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: devel@linuxdriverproject.org Link: http://lkml.kernel.org/r/20170911150620.3998-1-vkuznets@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-13x86/paravirt: Remove no longer used paravirt functionsJuergen Gross5-76/+10
With removal of lguest some of the paravirt functions are no longer needed: ->read_cr4() ->store_idt() ->set_pmd_at() ->set_pud_at() ->pte_update() Remove them. Signed-off-by: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: boris.ostrovsky@oracle.com Cc: chrisw@sous-sol.org Cc: jeremy@goop.org Cc: rusty@rustcorp.com.au Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-12Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two fixes: dead code removal, plus a SME memory encryption fix on 32-bit kernels that crashed Xen guests" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Remove unused and undefined __generic_processor_info() declaration x86/mm: Make the SME mask a u64
2017-09-11Merge tag 'libnvdimm-for-4.14' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm from Dan Williams: "A rework of media error handling in the BTT driver and other updates. It has appeared in a few -next releases and collected some late- breaking build-error and warning fixups as a result. Summary: - Media error handling support in the Block Translation Table (BTT) driver is reworked to address sleeping-while-atomic locking and memory-allocation-context conflicts. - The dax_device lookup overhead for xfs and ext4 is moved out of the iomap hot-path to a mount-time lookup. - A new 'ecc_unit_size' sysfs attribute is added to advertise the read-modify-write boundary property of a persistent memory range. - Preparatory fix-ups for arm and powerpc pmem support are included along with other miscellaneous fixes" * tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits) libnvdimm, btt: fix format string warnings libnvdimm, btt: clean up warning and error messages ext4: fix null pointer dereference on sbi libnvdimm, nfit: move the check on nd_reserved2 to the endpoint dax: fix FS_DAX=n BLOCK=y compilation libnvdimm: fix integer overflow static analysis warning libnvdimm, nd_blk: remove mmio_flush_range() libnvdimm, btt: rework error clearing libnvdimm: fix potential deadlock while clearing errors libnvdimm, btt: cache sector_size in arena_info libnvdimm, btt: ensure that flags were also unchanged during a map_read libnvdimm, btt: refactor map entry operations with macros libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute ext4: perform dax_device lookup at mount ext2: perform dax_device lookup at mount xfs: perform dax_device lookup at mount dax: introduce a fs_dax_get_by_bdev() helper libnvdimm, btt: check memory allocation failure libnvdimm, label: fix index block size calculation ...

Privacy Policy