aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel
AgeCommit message (Collapse)AuthorFilesLines
2014-03-02Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes, most of them on the tooling side" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Fix strict alias issue for find_first_bit perf tools: fix BFD detection on opensuse perf: Fix hotplug splat perf/x86: Fix event scheduling perf symbols: Destroy unused symsrcs perf annotate: Check availability of annotate when processing samples
2014-02-27perf/x86: Fix event schedulingPeter Zijlstra1-0/+3
Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures, with perf WARN_ON()s triggering. He also provided traces of the failures. This is I think the relevant bit: > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1 > pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800) So we add the insn:p event (fd[23]). At this point we should have: n_events = 2, n_added = 1, n_txn = 1 > pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800) > pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00) We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]). These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so that's not visible. group_sched_in() pmu->start_txn() /* nop - BP pmu */ event_sched_in() event->pmu->add() So here we should end up with: 0: n_events = 3, n_added = 2, n_txn = 2 4: n_events = 4, n_added = 3, n_txn = 3 But seeing the below state on x86_pmu_enable(), the must have failed, because the 0 and 4 events aren't there anymore. Looking at group_sched_in(), since the BP is the leader, its event_sched_in() must have succeeded, for otherwise we would not have seen the sibling adds. But since neither 0 or 4 are in the below state; their event_sched_in() must have failed; but I don't see why, the complete state: 0,0,1:p,4 fits perfectly fine on a core2. However, since we try and schedule 4 it means the 0 event must have succeeded! Therefore the 4 event must have failed, its failure will have put group_sched_in() into the fail path, which will call: event_sched_out() event->pmu->del() on 0 and the BP event. Now x86_pmu_del() will reduce n_events; but it will not reduce n_added; giving what we see below: n_event = 2, n_added = 2, n_txn = 2 > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2 > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0 So the problem is that x86_pmu_del(), when called from a group_sched_in() that fails (for whatever reason), and without x86_pmu TXN support (because the leader is !x86_pmu), will corrupt the n_added state. Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Jones <davej@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-26x86, kaslr: export offset in VMCOREINFO ELF notesEugene Surovegin1-0/+2
Include kASLR offset in VMCOREINFO ELF notes to assist in debugging. [ hpa: pushing this for v3.14 to avoid having a kernel version with kASLR where we can't debug output. ] Signed-off-by: Eugene Surovegin <surovegin@google.com> Link: http://lkml.kernel.org/r/20140123173120.GA25474@www.outflux.net Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-23Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-20/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: - a bugfix which prevents a divide by 0 panic when the newly introduced try_msr_calibrate_tsc() fails - enablement of the Baytrail platform to utilize the newfangled msr based calibration * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: tsc: Add missing Baytrail frequency to the table x86, tsc: Fallback to normal calibration if fast MSR calibration fails
2014-02-22Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds5-26/+52
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixlets from all around the place" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/uncore: Fix IVT/SNB-EP uncore CBOX NID filter table perf/x86: Correctly use FEATURE_PDCM perf, nmi: Fix unknown NMI warning perf trace: Fix ioctl 'request' beautifier build problems on !(i386 || x86_64) arches perf trace: Add fallback definition of EFD_SEMAPHORE perf list: Fix checking for supported events on older kernels perf tools: Handle PERF_RECORD_HEADER_EVENT_TYPE properly perf probe: Do not add offset twice to uprobe address perf/x86: Fix Userspace RDPMC switch perf/x86/intel/p6: Add userspace RDPMC quirk for PPro
2014-02-21perf/x86/uncore: Fix IVT/SNB-EP uncore CBOX NID filter tableStephane Eranian1-1/+9
This patch updates the CBOX PMU filters mapping tables for SNB-EP and IVT (model 45 and 62 respectively). The NID umask always comes in addition to another umask. When set, the NID filter is applied. The current mapping tables were missing some code/umask combinations to account for the NID umask. This patch fixes that. Cc: mingo@elte.hu Cc: ak@linux.intel.com Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140219131018.GA24475@quad Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21perf/x86: Correctly use FEATURE_PDCMPeter Zijlstra1-4/+1
The current code simply assumes Intel Arch PerfMon v2+ to have the IA32_PERF_CAPABILITIES MSR; the SDM specifies that we should check CPUID[1].ECX[15] (aka, FEATURE_PDCM) instead. This was found by KVM which implements v2+ but didn't provide the capabilities MSR. Change the code to DTRT; KVM will also implement the MSR and return 0. Cc: pbonzini@redhat.com Reported-by: "Michael S. Tsirkin" <mst@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140203132903.GI8874@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-21perf, nmi: Fix unknown NMI warningMarkus Metzger1-4/+2
When using BTS on Core i7-4*, I get the below kernel warning. $ perf record -c 1 -e branches:u ls Message from syslogd@labpc1501 at Nov 11 15:49:25 ... kernel:[ 438.317893] Uhhuh. NMI received for unknown reason 31 on CPU 2. Message from syslogd@labpc1501 at Nov 11 15:49:25 ... kernel:[ 438.317920] Do you have a strange power saving mode enabled? Message from syslogd@labpc1501 at Nov 11 15:49:25 ... kernel:[ 438.317945] Dazed and confused, but trying to continue Make intel_pmu_handle_irq() take the full exit path when returning early. Cc: eranian@google.com Cc: peterz@infradead.org Cc: mingo@kernel.org Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392425048-5309-1-git-send-email-andi@firstfloor.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-20Merge branch 'fixes-for-v3.14' of ↵Linus Torvalds1-1/+3
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull DMA-mapping fixes from Marek Szyprowski: "This contains fixes for incorrect atomic test in dma-mapping subsystem for ARM and x86 architecture" * 'fixes-for-v3.14' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: x86: dma-mapping: fix GFP_ATOMIC macro usage ARM: dma-mapping: fix GFP_ATOMIC macro usage
2014-02-19x86: tsc: Add missing Baytrail frequency to the tableMika Westerberg1-1/+1
Intel Baytrail is based on Silvermont core so MSR_FSB_FREQ[2:0] == 0 means that the CPU reference clock runs at 83.3MHz. Add this missing frequency to the table. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Bin Gao <bin.gao@linux.intel.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1392810750-18660-2-git-send-email-mika.westerberg@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-19x86, tsc: Fallback to normal calibration if fast MSR calibration failsThomas Gleixner2-19/+16
If we cannot calibrate TSC via MSR based calibration try_msr_calibrate_tsc() stores zero to fast_calibrate and returns that to the caller. This value gets then propagated further to clockevents code resulting division by zero oops like the one below: divide error: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 3.13.0+ #47 task: ffff880075508000 ti: ffff880075506000 task.ti: ffff880075506000 RIP: 0010:[<ffffffff810aec14>] [<ffffffff810aec14>] clockevents_config.part.3+0x24/0xa0 RSP: 0000:ffff880075507e58 EFLAGS: 00010246 RAX: ffffffffffffffff RBX: ffff880079c0cd80 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffff RBP: ffff880075507e70 R08: 0000000000000001 R09: 00000000000000be R10: 00000000000000bd R11: 0000000000000003 R12: 000000000000b008 R13: 0000000000000008 R14: 000000000000b010 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880079c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff880079fff000 CR3: 0000000001c0b000 CR4: 00000000001006f0 Stack: ffff880079c0cd80 000000000000b008 0000000000000008 ffff880075507e88 ffffffff810aecb0 ffff880079c0cd80 ffff880075507e98 ffffffff81030168 ffff880075507ed8 ffffffff81d1104f 00000000000000c3 0000000000000000 Call Trace: [<ffffffff810aecb0>] clockevents_config_and_register+0x20/0x30 [<ffffffff81030168>] setup_APIC_timer+0xc8/0xd0 [<ffffffff81d1104f>] setup_boot_APIC_clock+0x4cc/0x4d8 [<ffffffff81d0f5de>] native_smp_prepare_cpus+0x3dd/0x3f0 [<ffffffff81d02ee9>] kernel_init_freeable+0xc3/0x205 [<ffffffff8177c910>] ? rest_init+0x90/0x90 [<ffffffff8177c91e>] kernel_init+0xe/0x120 [<ffffffff8178deec>] ret_from_fork+0x7c/0xb0 [<ffffffff8177c910>] ? rest_init+0x90/0x90 Prevent this from happening by: 1) Modifying try_msr_calibrate_tsc() to return calibration value or zero if it fails. 2) Check this return value in native_calibrate_tsc() and in case of zero fallback to use normal non-MSR based calibration. [mw: Added subject and changelog] Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bin Gao <bin.gao@linux.intel.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1392810750-18660-1-git-send-email-mika.westerberg@linux.intel.com Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-02-15Merge tag 'trace-fixes-v3.14-rc2' of ↵Linus Torvalds1-36/+47
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull twi tracing fixes from Steven Rostedt: "Two urgent fixes in the tracing utility. The first is a fix for the way the ring buffer stores timestamps. After a restructure of the code was done, the ring buffer timestamp logic missed the fact that the first event on a sub buffer is to have a zero delta, as the full timestamp is stored on the sub buffer itself. But because the delta was not cleared to zero, the timestamp for that event will be calculated as the real timestamp + the delta from the last timestamp. This can skew the timestamps of the events and have them say they happened when they didn't really happen. That's bad. The second fix is for modifying the function graph caller site. When the stop machine was removed from updating the function tracing code, it missed updating the function graph call site location. It is still modified as if it is being done via stop machine. But it's not. This can lead to a GPF and kernel crash if the function graph call site happens to lie between cache lines and one CPU is executing it while another CPU is doing the update. It would be a very hard condition to hit, but the result is severe enough to have it fixed ASAP" * tag 'trace-fixes-v3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/x86: Use breakpoints for converting function graph caller ring-buffer: Fix first commit on sub-buffer having non-zero delta
2014-02-13x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabledH. Peter Anvin1-1/+6
If SMAP support is not compiled into the kernel, don't enable SMAP in CR4 -- in fact, we should clear it, because the kernel doesn't contain the proper STAC/CLAC instructions for SMAP support. Found by Fengguang Wu's test system. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.7+
2014-02-12ftrace/x86: Use breakpoints for converting function graph callerSteven Rostedt (Red Hat)1-36/+47
When the conversion was made to remove stop machine and use the breakpoint logic instead, the modification of the function graph caller is still done directly as though it was being done under stop machine. As it is not converted via stop machine anymore, there is a possibility that the code could be layed across cache lines and if another CPU is accessing that function graph call when it is being updated, it could cause a General Protection Fault. Convert the update of the function graph caller to use the breakpoint method as well. Cc: H. Peter Anvin <hpa@zytor.com> Cc: stable@vger.kernel.org # 3.5+ Fixes: 08d636b6d4fb "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-11x86: dma-mapping: fix GFP_ATOMIC macro usageMarek Szyprowski1-1/+3
GFP_ATOMIC is not a single gfp flag, but a macro which expands to the other flags, where meaningful is the LACK of __GFP_WAIT flag. To check if caller wants to perform an atomic allocation, the code must test for a lack of the __GFP_WAIT flag. This patch fixes the issue introduced in v3.5-rc1. CC: stable@vger.kernel.org Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2014-02-09x86: Use preempt_disable_notrace() in cycles_2_ns()Steven Rostedt1-2/+2
When debug preempt is enabled, preempt_disable() can be traced by function and function graph tracing. There's a place in the function graph tracer that calls trace_clock() which eventually calls cycles_2_ns() outside of the recursion protection. When cycles_2_ns() calls preempt_disable() it gets traced and the graph tracer will go into a recursive loop causing a crash or worse, a triple fault. Simple fix is to use preempt_disable_notrace() in cycles_2_ns, which makes sense because the preempt_disable() tracing may use that code too, and it tracing it, even with recursion protection is rather pointless. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140204141315.2a968a72@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09perf/x86: Fix Userspace RDPMC switchPeter Zijlstra1-1/+1
The current code forgets to change the CR4 state on the current CPU. Use on_each_cpu() instead of smp_call_function(). Reported-by: Mark Davies <junk@eslaf.co.uk> Suggested-by: Mark Davies <junk@eslaf.co.uk> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: fweisbec@gmail.com Link: http://lkml.kernel.org/n/tip-69efsat90ibhnd577zy3z9gh@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09perf/x86/intel/p6: Add userspace RDPMC quirk for PProPeter Zijlstra3-16/+39
PPro machines can die hard when PCE gets enabled due to a CPU erratum. The safe way it so disable it by default and keep it disabled. See erratum 26 in: http://download.intel.com/design/archives/processors/pro/docs/24268935.pdf Reported-and-Tested-by: Mark Davies <junk@eslaf.co.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vince@deater.net> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140206170815.GW2936@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-07Merge tag 'efi-urgent' into x86/urgentH. Peter Anvin12-24/+47
* Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit). Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-06x86, microcode, AMD: Unify valid container checksBorislav Petkov1-14/+29
For additional coverage, BorisO and friends unknowlingly did swap AMD microcode with Intel microcode blobs in order to see what happens. What did happen on 32-bit was [ 5.722656] BUG: unable to handle kernel paging request at be3a6008 [ 5.722693] IP: [<c106d6b4>] load_microcode_amd+0x24/0x3f0 [ 5.722716] *pdpt = 0000000000000000 *pde = 0000000000000000 because there was a valid initrd there but without valid microcode in it and the container check happened *after* the relocated ramdisk handling on 32-bit, which was clearly wrong. While at it, take care of the ramdisk relocation on both 32- and 64-bit as it is done on both. Also, comment what we're doing because this code is a bit tricky. Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391460104-7261-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-01-31Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core debug changes from Ingo Molnar: "This contains mostly kernel debugging related updates: - make hung_task detection more configurable to distros - add final bits for x86 UV NMI debugging, with related KGDB changes - update the mailing-list of MAINTAINERS entries I'm involved with" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hung_task: Display every hung task warning sysctl: Add neg_one as a standard constraint x86/uv/nmi, kgdb/kdb: Fix UV NMI handler when KDB not configured x86/uv/nmi: Fix Sparse warnings kgdb/kdb: Fix no KDB config problem MAINTAINERS: Restore "L: linux-kernel@vger.kernel.org" entries
2014-01-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+32
Pull more KVM updates from Paolo Bonzini: "Second batch of KVM updates. Some minor x86 fixes, two s390 guest features that need some handling in the host, and all the PPC changes. The PPC changes include support for little-endian guests and enablement for new POWER8 features" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits) x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101 x86, kvm: cache the base of the KVM cpuid leaves kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef KVM: PPC: Book3S PR: Cope with doorbell interrupts KVM: PPC: Book3S HV: Add software abort codes for transactional memory KVM: PPC: Book3S HV: Add new state for transactional memory powerpc/Kconfig: Make TM select VSX and VMX KVM: PPC: Book3S HV: Basic little-endian guest support KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8 KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8 KVM: PPC: Book3S HV: Add handler for HV facility unavailable KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8 KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers KVM: PPC: Book3S HV: Don't set DABR on POWER8 kvm/ppc: IRQ disabling cleanup ...
2014-01-31Merge branch 'x86-asmlinkage-for-linus' of ↵Linus Torvalds2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asmlinkage (LTO) changes from Peter Anvin: "This patchset adds more infrastructure for link time optimization (LTO). This patchset was pulled into my tree late because of a miscommunication (part of the patchset was picked up by other maintainers). However, the patchset is strictly build-related and seems to be okay in testing" * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, asmlinkage, xen: Fix type of NMI x86, asmlinkage, xen, kvm: Make {xen,kvm}_lock_spinning global and visible x86: Use inline assembler instead of global register variable to get sp x86, asmlinkage, paravirt: Make paravirt thunks global x86, asmlinkage, paravirt: Don't rely on local assembler labels x86, asmlinkage, lguest: Fix C functions used by inline assembler
2014-01-31x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()Prarit Bhargava1-1/+8
Further discussion here: http://marc.info/?l=linux-kernel&m=139073901101034&w=2 kbuild, 0day kernel build service, outputs the warning: arch/x86/kernel/irq.c:333:1: warning: the frame size of 2056 bytes is larger than 2048 bytes [-Wframe-larger-than=] because check_irq_vectors_for_cpu_disable() allocates two cpumasks on the stack. Fix this by moving the two cpumasks to a global file context. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1390915331-27375-1-git-send-email-prarit@redhat.com Cc: Andi Kleen <ak@linux.intel.com> Cc: Michel Lespinasse <walken@google.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Yang Zhang <yang.z.zhang@Intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Janet Morgan <janet.morgan@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Ruiv Wang <ruiv.wang@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-30x86, asmlinkage, xen, kvm: Make {xen,kvm}_lock_spinning global and visibleAndi Kleen1-1/+1
These functions are called from inline assembler stubs, thus need to be global and visible. Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-7-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-30x86, asmlinkage, paravirt: Make paravirt thunks globalAndi Kleen1-4/+4
The paravirt thunks use a hack of using a static reference to a static function to reference that function from the top level statement. This assumes that gcc always generates static function names in a specific format, which is not necessarily true. Simply make these functions global and asmlinkage or __visible. This way the static __used variables are not needed and everything works. Functions with arguments are __visible to keep the register calling convention on 32bit. Changed in paravirt and in all users (Xen and vsmp) v2: Use __visible for functions with arguments Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Ido Yariv <ido@wizery.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1382458079-24450-5-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-01-29x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101Paolo Bonzini1-0/+5
When Hyper-V hypervisor leaves are present, KVM must relocate its own leaves at 0x40000100, because Windows does not look for Hyper-V leaves at indices other than 0x40000000. In this case, the KVM features are at 0x40000101, but the old code would always look at 0x40000001. Fix by using kvm_cpuid_base(). This also requires making the function non-inline, since kvm_cpuid_base() is static. Fixes: 1085ba7f552d84aa8ac0ae903fa8d0cc2ff9f79d Cc: stable@vger.kernel.org Cc: mtosatti@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-29x86, kvm: cache the base of the KVM cpuid leavesPaolo Bonzini1-0/+27
It is unnecessary to go through hypervisor_cpuid_base every time a leaf is found (which will be every time a feature is requested after the next patch). Fixes: 1085ba7f552d84aa8ac0ae903fa8d0cc2ff9f79d Cc: stable@vger.kernel.org Cc: mtosatti@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-01-28x86: revert wrong memblock current limit settingYinghai Lu1-1/+1
Dave reported big numa system booting is broken. It turns out that commit 5b6e529521d3 ("x86: memblock: set current limit to max low memory address") sets the limit to low wrongly. max_low_pfn_mapped is different from max_pfn_mapped. max_low_pfn_mapped is always under 4G. That will memblock_alloc_nid all go under 4G. Revert 5b6e529521d3 to fix a no-boot regression which was triggered by 457ff1de2d24 ("lib/swiotlb.c: use memblock apis for early memory allocations"). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reported-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-25Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A couple of regression fixes mostly hitting virtualized setups, but also some bare metal systems" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/x86/tsc: Initialize multiplier to 0 sched/clock: Fixup early initialization sched/preempt/x86: Fix voluntary preempt for x86 Revert "sched: Fix sleep time double accounting in enqueue entity"
2014-01-25Merge branch 'linus' into x86/urgentIngo Molnar53-312/+2070
Merge in the x86 changes to apply a fix. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridgeMel Gorman2-11/+4
There was a large ebizzy performance regression that was bisected to commit 611ae8e3 (x86/tlb: enable tlb flush range support for x86). The problem was related to the tlb_flushall_shift tuning for IvyBridge which was altered. The problem is that it is not clear if the tuning values for each CPU family is correct as the methodology used to tune the values is unclear. This patch uses a conservative tlb_flushall_shift value for all CPU families except IvyBridge so the decision can be revisited if any regression is found as a result of this change. IvyBridge is an exception as testing with one methodology determined that the value of 2 is acceptable. Details are in the changelog for the patch "x86: mm: Change tlb_flushall_shift for IvyBridge". One important aspect of this to watch out for is Xen. The original commit log mentioned large performance gains on Xen. It's possible Xen is more sensitive to this value if it flushes small ranges of pages more frequently than workloads on bare metal typically do. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-dyzMww3fqugnhbhgo6Gxmtkw@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25x86: mm: change tlb_flushall_shift for IvyBridgeMel Gorman1-1/+1
There was a large performance regression that was bisected to commit 611ae8e3 ("x86/tlb: enable tlb flush range support for x86"). This patch simply changes the default balance point between a local and global flush for IvyBridge. In the interest of allowing the tests to be reproduced, this patch was tested using mmtests 0.15 with the following configurations configs/config-global-dhp__tlbflush-performance configs/config-global-dhp__scheduler-performance configs/config-global-dhp__network-performance Results are from two machines Ivybridge 4 threads: Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz Ivybridge 8 threads: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz Page fault microbenchmark showed nothing interesting. Ebizzy was configured to run multiple iterations and threads. Thread counts ranged from 1 to NR_CPUS*2. For each thread count, it ran 100 iterations and each iteration lasted 10 seconds. Ivybridge 4 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 6395.44 ( 0.00%) 6789.09 ( 6.16%) Mean 2 7012.85 ( 0.00%) 8052.16 ( 14.82%) Mean 3 6403.04 ( 0.00%) 6973.74 ( 8.91%) Mean 4 6135.32 ( 0.00%) 6582.33 ( 7.29%) Mean 5 6095.69 ( 0.00%) 6526.68 ( 7.07%) Mean 6 6114.33 ( 0.00%) 6416.64 ( 4.94%) Mean 7 6085.10 ( 0.00%) 6448.51 ( 5.97%) Mean 8 6120.62 ( 0.00%) 6462.97 ( 5.59%) Ivybridge 8 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 7336.65 ( 0.00%) 7787.02 ( 6.14%) Mean 2 8218.41 ( 0.00%) 9484.13 ( 15.40%) Mean 3 7973.62 ( 0.00%) 8922.01 ( 11.89%) Mean 4 7798.33 ( 0.00%) 8567.03 ( 9.86%) Mean 5 7158.72 ( 0.00%) 8214.23 ( 14.74%) Mean 6 6852.27 ( 0.00%) 7952.45 ( 16.06%) Mean 7 6774.65 ( 0.00%) 7536.35 ( 11.24%) Mean 8 6510.50 ( 0.00%) 6894.05 ( 5.89%) Mean 12 6182.90 ( 0.00%) 6661.29 ( 7.74%) Mean 16 6100.09 ( 0.00%) 6608.69 ( 8.34%) Ebizzy hits the worst case scenario for TLB range flushing every time and it shows for these Ivybridge CPUs at least that the default choice is a poor on. The patch addresses the problem. Next was a tlbflush microbenchmark written by Alex Shi at http://marc.info/?l=linux-kernel&m=133727348217113 . It measures access costs while the TLB is being flushed. The expectation is that if there are always full TLB flushes that the benchmark would suffer and it benefits from range flushing There are 320 iterations of the test per thread count. The number of entries is randomly selected with a min of 1 and max of 512. To ensure a reasonably even spread of entries, the full range is broken up into 8 sections and a random number selected within that section. iteration 1, random number between 0-64 iteration 2, random number between 64-128 etc This is still a very weak methodology. When you do not know what are typical ranges, random is a reasonable choice but it can be easily argued that the opimisation was for smaller ranges and an even spread is not representative of any workload that matters. To improve this, we'd need to know the probability distribution of TLB flush range sizes for a set of workloads that are considered "common", build a synthetic trace and feed that into this benchmark. Even that is not perfect because it would not account for the time between flushes but there are limits of what can be reasonably done and still be doing something useful. If a representative synthetic trace is provided then this benchmark could be revisited and the shift values retuned. Ivybridge 4 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 10.50 ( 0.00%) 10.50 ( 0.03%) Mean 2 17.59 ( 0.00%) 17.18 ( 2.34%) Mean 3 22.98 ( 0.00%) 21.74 ( 5.41%) Mean 5 47.13 ( 0.00%) 46.23 ( 1.92%) Mean 8 43.30 ( 0.00%) 42.56 ( 1.72%) Ivybridge 8 threads 3.13.0-rc7 3.13.0-rc7 vanilla altshift-v3 Mean 1 9.45 ( 0.00%) 9.36 ( 0.93%) Mean 2 9.37 ( 0.00%) 9.70 ( -3.54%) Mean 3 9.36 ( 0.00%) 9.29 ( 0.70%) Mean 5 14.49 ( 0.00%) 15.04 ( -3.75%) Mean 8 41.08 ( 0.00%) 38.73 ( 5.71%) Mean 13 32.04 ( 0.00%) 31.24 ( 2.49%) Mean 16 40.05 ( 0.00%) 39.04 ( 2.51%) For both CPUs, average access time is reduced which is good as this is the benchmark that was used to tune the shift values in the first place albeit it is now known *how* the benchmark was used. The scheduler benchmarks were somewhat inconclusive. They showed gains and losses and makes me reconsider how stable those benchmarks really are or if something else might be interfering with the test results recently. Network benchmarks were inconclusive. Almost all results were flat except for netperf-udp tests on the 4 thread machine. These results were unstable and showed large variations between reboots. It is unknown if this is a recent problems but I've noticed before that netperf-udp results tend to vary. Based on these results, changing the default for Ivybridge seems like a logical choice. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Alex Shi <alex.shi@linaro.org> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25mm, x86: Account for TLB flushes only when debuggingMel Gorman1-2/+2
Bisection between 3.11 and 3.12 fingered commit 9824cf97 ("mm: vmstats: tlb flush counters") to cause overhead problems. The counters are undeniably useful but how often do we really need to debug TLB flush related issues? It does not justify taking the penalty everywhere so make it a debugging option. Signed-off-by: Mel Gorman <mgorman@suse.de> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Alex Shi <alex.shi@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25x86/uv/nmi: Fix Sparse warningsMike Travis1-1/+0
Make uv_register_nmi_notifier() and uv_handle_nmi_ping() static to address sparse warnings. Fix problem where uv_nmi_kexec_failed is unused when CONFIG_KEXEC is not defined. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Hedi Berriche <hedi@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Link: http://lkml.kernel.org/r/20140114162551.480872353@asylum.americas.sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25x86/AMD/NB: Fix amd_set_subcaches() parameter typeDan Carpenter1-1/+1
This is under CAP_SYS_ADMIN, but Smatch complains that mask comes from the user and the test for "mask > 0xf" can underflow. The fix is simple: amd_set_subcaches() should hand down not an 'int' but an 'unsigned long' like it was originally indended to do. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/20140121072209.GA22095@elgon.mountain Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-25x86/quirks: Add workaround for AMD F16h Erratum792Aravind Gopalakrishnan1-0/+37
The workaround for this Erratum is included in AGESA. But BIOSes spun only after Jan2014 will have the fix (atleast server versions of the chip). The erratum affects both embedded and server platforms and since we cannot say with certainity that ALL BIOSes on systems out in the field will have the fix, we should probably insulate ourselves in case BIOS does not do the right thing or someone is using old BIOSes. Refer to Revision Guide for AMD F16h models 00h-0fh, document 51810 Rev. 3.04, November2013 for details on the Erratum. Tested the patch on Fam16h server platform and it works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: <hmh@hmh.eng.br> Cc: <Kim.Naru@amd.com> Cc: <Suravee.Suthikulpanit@amd.com> Cc: <bp@suse.de> Cc: <sherry.hurwitz@amd.com> Link: http://lkml.kernel.org/r/1390515212-1824-1-git-send-email-Aravind.Gopalakrishnan@amd.com [ Minor edits. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-24Merge tag 'pm+acpi-3.14-rc1' of ↵Linus Torvalds3-8/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "As far as the number of commits goes, the top spot belongs to ACPI this time with cpufreq in the second position and a handful of PM core, PNP and cpuidle updates. They are fixes and cleanups mostly, as usual, with a couple of new features in the mix. The most visible change is probably that we will create struct acpi_device objects (visible in sysfs) for all devices represented in the ACPI tables regardless of their status and there will be a new sysfs attribute under those objects allowing user space to check that status via _STA. Consequently, ACPI device eject or generally hot-removal will not delete those objects, unless the table containing the corresponding namespace nodes is unloaded, which is extremely rare. Also ACPI container hotplug will be handled quite a bit differently and cpufreq will support CPU boost ("turbo") generically and not only in the acpi-cpufreq driver. Specifics: - ACPI core changes to make it create a struct acpi_device object for every device represented in the ACPI tables during all namespace scans regardless of the current status of that device. In accordance with this, ACPI hotplug operations will not delete those objects, unless the underlying ACPI tables go away. - On top of the above, new sysfs attribute for ACPI device objects allowing user space to check device status by triggering the execution of _STA for its ACPI object. From Srinivas Pandruvada. - ACPI core hotplug changes reducing code duplication, integrating the PCI root hotplug with the core and reworking container hotplug. - ACPI core simplifications making it use ACPI_COMPANION() in the code "glueing" ACPI device objects to "physical" devices. - ACPICA update to upstream version 20131218. This adds support for the DBG2 and PCCT tables to ACPICA, fixes some bugs and improves debug facilities. From Bob Moore, Lv Zheng and Betty Dall. - Init code change to carry out the early ACPI initialization earlier. That should allow us to use ACPI during the timekeeping initialization and possibly to simplify the EFI initialization too. From Chun-Yi Lee. - Clenups of the inclusions of ACPI headers in many places all over from Lv Zheng and Rashika Kheria (work in progress). - New helper for ACPI _DSM execution and rework of the code in drivers that uses _DSM to execute it via the new helper. From Jiang Liu. - New Win8 OSI blacklist entries from Takashi Iwai. - Assorted ACPI fixes and cleanups from Al Stone, Emil Goode, Hanjun Guo, Lan Tianyu, Masanari Iida, Oliver Neukum, Prarit Bhargava, Rashika Kheria, Tang Chen, Zhang Rui. - intel_pstate driver updates, including proper Baytrail support, from Dirk Brandewie and intel_pstate documentation from Ramkumar Ramachandra. - Generic CPU boost ("turbo") support for cpufreq from Lukasz Majewski. - powernow-k6 cpufreq driver fixes from Mikulas Patocka. - cpufreq core fixes and cleanups from Viresh Kumar, Jane Li, Mark Brown. - Assorted cpufreq drivers fixes and cleanups from Anson Huang, John Tobias, Paul Bolle, Paul Walmsley, Sachin Kamat, Shawn Guo, Viresh Kumar. - cpuidle cleanups from Bartlomiej Zolnierkiewicz. - Support for hibernation APM events from Bin Shi. - Hibernation fix to avoid bringing up nonboot CPUs with ACPI EC disabled during thaw transitions from Bjørn Mork. - PM core fixes and cleanups from Ben Dooks, Leonardo Potenza, Ulf Hansson. - PNP subsystem fixes and cleanups from Dmitry Torokhov, Levente Kurusa, Rashika Kheria. - New tool for profiling system suspend from Todd E Brandt and a cpupower tool cleanup from One Thousand Gnomes" * tag 'pm+acpi-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (153 commits) thermal: exynos: boost: Automatic enable/disable of BOOST feature (at Exynos4412) cpufreq: exynos4x12: Change L0 driver data to CPUFREQ_BOOST_FREQ Documentation: cpufreq / boost: Update BOOST documentation cpufreq: exynos: Extend Exynos cpufreq driver to support boost cpufreq / boost: Kconfig: Support for software-managed BOOST acpi-cpufreq: Adjust the code to use the common boost attribute cpufreq: Add boost frequency support in core intel_pstate: Add trace point to report internal state. cpufreq: introduce cpufreq_generic_get() routine ARM: SA1100: Create dummy clk_get_rate() to avoid build failures cpufreq: stats: create sysfs entries when cpufreq_stats is a module cpufreq: stats: free table and remove sysfs entry in a single routine cpufreq: stats: remove hotplug notifiers cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly cpufreq: speedstep: remove unused speedstep_get_state platform: introduce OF style 'modalias' support for platform bus PM / tools: new tool for suspend/resume performance optimization ACPI: fix module autoloading for ACPI enumerated devices ACPI: add module autoloading support for ACPI enumerated devices ACPI: fix create_modalias() return value handling ...
2014-01-23sched/x86/tsc: Initialize multiplier to 0Peter Zijlstra1-1/+1
Since we keep the clock value linearly continuous on frequency change, make sure the initial multiplier is 0, such that our initial value is 0. Without this we compute the initial value at whatever the TSC has managed to reach since power-on. Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Fixes: 20d1c86a57762 ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs") Cc: lenb@kernel.org Cc: rjw@rjwysocki.net Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: rui.zhang@intel.com Cc: jacob.jun.pan@linux.intel.com Cc: Mike Galbraith <bitbucket@online.de> Cc: hpa@zytor.com Cc: paulmck@linux.vnet.ibm.com Cc: John Stultz <john.stultz@linaro.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: dyoung@redhat.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140123094804.GP30183@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-23Merge tag 'pci-v3.14-changes' of ↵Linus Torvalds2-5/+3
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI changes for the v3.14 merge window: Resource management - Change pci_bus_region addresses to dma_addr_t (Bjorn Helgaas) - Support 64-bit AGP BARs (Bjorn Helgaas, Yinghai Lu) - Add pci_bus_address() to get bus address of a BAR (Bjorn Helgaas) - Use pci_resource_start() for CPU address of AGP BARs (Bjorn Helgaas) - Enforce bus address limits in resource allocation (Yinghai Lu) - Allocate 64-bit BARs above 4G when possible (Yinghai Lu) - Convert pcibios_resource_to_bus() to take pci_bus, not pci_dev (Yinghai Lu) PCI device hotplug - Major rescan/remove locking update (Rafael J. Wysocki) - Make ioapic builtin only (not modular) (Yinghai Lu) - Fix release/free issues (Yinghai Lu) - Clean up pciehp (Bjorn Helgaas) - Announce pciehp slot info during enumeration (Bjorn Helgaas) MSI - Add pci_msi_vec_count(), pci_msix_vec_count() (Alexander Gordeev) - Add pci_enable_msi_range(), pci_enable_msix_range() (Alexander Gordeev) - Deprecate "tri-state" interfaces: fail/success/fail+info (Alexander Gordeev) - Export MSI mode using attributes, not kobjects (Greg Kroah-Hartman) - Drop "irq" param from *_restore_msi_irqs() (DuanZhenzhong) SR-IOV - Clear NumVFs when disabling SR-IOV in sriov_init() (ethan.zhao) Virtualization - Add support for save/restore of extended capabilities (Alex Williamson) - Add Virtual Channel to save/restore support (Alex Williamson) - Never treat a VF as a multifunction device (Alex Williamson) - Add pci_try_reset_function(), et al (Alex Williamson) AER - Ignore non-PCIe error sources (Betty Dall) - Support ACPI HEST error sources for domains other than 0 (Betty Dall) - Consolidate HEST error source parsers (Bjorn Helgaas) - Add a TLP header print helper (Borislav Petkov) Freescale i.MX6 - Remove unnecessary code (Fabio Estevam) - Make reset-gpio optional (Marek Vasut) - Report "link up" only after link training completes (Marek Vasut) - Start link in Gen1 before negotiating for Gen2 mode (Marek Vasut) - Fix PCIe startup code (Richard Zhu) Marvell MVEBU - Remove duplicate of_clk_get_by_name() call (Andrew Lunn) - Drop writes to bridge Secondary Status register (Jason Gunthorpe) - Obey bridge PCI_COMMAND_MEM and PCI_COMMAND_IO bits (Jason Gunthorpe) - Support a bridge with no IO port window (Jason Gunthorpe) - Use max_t() instead of max(resource_size_t,) (Jingoo Han) - Remove redundant of_match_ptr (Sachin Kamat) - Call pci_ioremap_io() at startup instead of dynamically (Thomas Petazzoni) NVIDIA Tegra - Disable Gen2 for Tegra20 and Tegra30 (Eric Brower) Renesas R-Car - Add runtime PM support (Valentine Barshak) - Fix rcar_pci_probe() return value check (Wei Yongjun) Synopsys DesignWare - Fix crash in dw_msi_teardown_irq() (Bjørn Erik Nilsen) - Remove redundant call to pci_write_config_word() (Bjørn Erik Nilsen) - Fix missing MSI IRQs (Harro Haan) - Add dw_pcie prefix before cfg_read/write (Pratyush Anand) - Fix I/O transfers by using CPU (not realio) address (Pratyush Anand) - Whitespace cleanup (Jingoo Han) EISA - Call put_device() if device_register() fails (Levente Kurusa) - Revert EISA initialization breakage ((Bjorn Helgaas) Miscellaneous - Remove unused code, including PCIe 3.0 interfaces (Stephen Hemminger) - Prevent bus conflicts while checking for bridge apertures (Bjorn Helgaas) - Stop clearing bridge Secondary Status when setting up I/O aperture (Bjorn Helgaas) - Use dev_is_pci() to identify PCI devices (Yijing Wang) - Deprecate DEFINE_PCI_DEVICE_TABLE (Joe Perches) - Update documentation 00-INDEX (Erik Ekman)" * tag 'pci-v3.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (119 commits) Revert "EISA: Initialize device before its resources" Revert "EISA: Log device resources in dmesg" vfio-pci: Use pci "try" reset interface PCI: Check parent kobject in pci_destroy_dev() xen/pcifront: Use global PCI rescan-remove locking powerpc/eeh: Use global PCI rescan-remove locking PCI: Fix pci_check_and_unmask_intx() comment typos PCI: Add pci_try_reset_function(), pci_try_reset_slot(), pci_try_reset_bus() MPT / PCI: Use pci_stop_and_remove_bus_device_locked() platform / x86: Use global PCI rescan-remove locking PCI: hotplug: Use global PCI rescan-remove locking pcmcia: Use global PCI rescan-remove locking ACPI / hotplug / PCI: Use global PCI rescan-remove locking ACPI / PCI: Use global PCI rescan-remove locking in PCI root hotplug PCI: Add global pci_lock_rescan_remove() PCI: Cleanup pci.h whitespace PCI: Reorder so actual code comes before stubs PCI/AER: Support ACPI HEST AER error sources for PCI domains other than 0 ACPICA: Add helper macros to extract bus/segment numbers from HEST table. PCI: Make local functions static ...
2014-01-22x86/mm: memblock: switch to use NUMA_NO_NODEGrygorii Strashko2-2/+2
Update X86 code to use NUMA_NO_NODE instead of MAX_NUMNODES while calling memblock APIs, because memblock API will be changed to use NUMA_NO_NODE and will produce warning during boot otherwise. See: https://lkml.org/lkml/2013/12/9/898 Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-22x86: memblock: set current limit to max low memory addressSantosh Shilimkar1-1/+1
The memblock current limit value is used to limit early boot memory allocations below max low memory address by default, as the kernel can access only to the low memory. Hence, set memblock current limit value to the max mapped low memory address instead of max mapped memory address. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Tejun Heo <tj@kernel.org> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-20Merge tag 'driver-core-3.14-rc1' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core / sysfs patches from Greg KH: "Here's the big driver core and sysfs patch set for 3.14-rc1. There's a lot of work here moving sysfs logic out into a "kernfs" to allow other subsystems to also have a virtual filesystem with the same attributes of sysfs (handle device disconnect, dynamic creation / removal as needed / unneeded, etc) This is primarily being done for the cgroups filesystem, but the goal is to also move debugfs to it when it is ready, solving all of the known issues in that filesystem as well. The code isn't completed yet, but all should be stable now (there is a big section that was reverted due to problems found when testing) There's also some other smaller fixes, and a driver core addition that allows for a "collection" of objects, that the DRM people will be using soon (it's in this tree to make merges after -rc1 easier) All of this has been in linux-next with no reported issues" * tag 'driver-core-3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (113 commits) kernfs: associate a new kernfs_node with its parent on creation kernfs: add struct dentry declaration in kernfs.h kernfs: fix get_active failure handling in kernfs_seq_*() Revert "kernfs: fix get_active failure handling in kernfs_seq_*()" Revert "kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq" Revert "kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep()" Revert "kernfs: remove KERNFS_REMOVED" Revert "kernfs: restructure removal path to fix possible premature return" Revert "kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove()" Revert "kernfs: remove kernfs_addrm_cxt" Revert "kernfs: make kernfs_get_active() block if the node is deactivated but not removed" Revert "kernfs: implement kernfs_{de|re}activate[_self]()" Revert "kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers" Revert "pci: use device_remove_file_self() instead of device_schedule_callback()" Revert "scsi: use device_remove_file_self() instead of device_schedule_callback()" Revert "s390: use device_remove_file_self() instead of device_schedule_callback()" Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()" Revert "kernfs: remove unnecessary NULL check in __kernfs_remove()" kernfs: remove unnecessary NULL check in __kernfs_remove() drivers/base: provide an infrastructure for componentised subsystems ...
2014-01-20Merge branch 'x86/mpx' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+10
Pull x86 cpufeature and mpx updates from Peter Anvin: "This includes the basic infrastructure for MPX (Memory Protection Extensions) support, but does not include MPX support itself. It is, however, a prerequisite for KVM support for MPX, which I believe will be pushed later this merge window by the KVM team. This includes moving the functionality in futex_atomic_cmpxchg_inatomic() into a new function in uaccess.h so it can be reused - this will be used by the final MPX patches. The actual MPX functionality (map management and so on) will be pushed in a future merge window, when ready" * 'x86/mpx' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel/mpx: Remove unused LWP structure x86, mpx: Add MPX related opcodes to the x86 opcode map x86: replace futex_atomic_cmpxchg_inatomic() with user_atomic_cmpxchg_inatomic x86: add user_atomic_cmpxchg_inatomic at uaccess.h x86, xsave: Support eager-only xsave features, add MPX support x86, cpufeature: Define the Intel MPX feature flag
2014-01-20Merge branch 'x86-kaslr-for-linus' of ↵Linus Torvalds2-14/+26
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 kernel address space randomization support from Peter Anvin: "This enables kernel address space randomization for x86" * 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kaslr: Clarify RANDOMIZE_BASE_MAX_OFFSET x86, kaslr: Remove unused including <linux/version.h> x86, kaslr: Use char array to gain sizeof sanity x86, kaslr: Add a circular multiply for better bit diffusion x86, kaslr: Mix entropy sources together as needed x86/relocs: Add percpu fixup for GNU ld 2.23 x86, boot: Rename get_flags() and check_flags() to *_cpuflags() x86, kaslr: Raise the maximum virtual address to -1 GiB on x86_64 x86, kaslr: Report kernel offset on panic x86, kaslr: Select random position from e820 maps x86, kaslr: Provide randomness functions x86, kaslr: Return location from decompress_kernel x86, boot: Move CPU flags out of cpucheck x86, relocs: Add more per-cpu gold special cases
2014-01-20Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds3-0/+86
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull leftover x86 fixes from Ingo Molnar: "Two leftover fixes that did not make it into v3.13" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add check for number of available vectors before CPU down x86, cpu, amd: Add workaround for family 16h, erratum 793
2014-01-20Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds2-9/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - SCI reporting for other error types not only correctable ones - GHES cleanups - Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant - PCIe AER tracepoint severity levels fix - Error path correction for the mce device init - MCE timer fix - Add more flexibility to the error injection (EINJ) debugfs interface * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix mce_start_timer semantics ACPI, APEI, GHES: Cleanup ghes memory error handling ACPI, APEI: Cleanup alignment-aware accesses ACPI, APEI, GHES: Do not report only correctable errors with SCI ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface ACPI, eMCA: Combine eMCA/EDAC event reporting priority EDAC, sb_edac: Modify H/W event reporting policy EDAC: Add an edac_report parameter to EDAC PCI, AER: Fix severity usage in aer trace event x86, mce: Call put_device on device_register failure
2014-01-20Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds4-1/+365
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull Intel SoC changes from Ingo Molnar: "Improved Intel SoC platform support" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, tsc, apic: Unbreak static (MSR) calibration when CONFIG_X86_LOCAL_APIC=n x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs arch: x86: New MailBox support driver for Intel SOC's
2014-01-20Merge branch 'x86-microcode-for-linus' of ↵Linus Torvalds11-113/+185
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loader updates from Ingo Molnar: "There are two main changes in this tree: - AMD microcode early loading fixes - some microcode loader source files reorganization" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, microcode: Move to a proper location x86, microcode, AMD: Fix early ucode loading x86, microcode: Share native MSR accessing variants x86, ramdisk: Export relocated ramdisk VA
2014-01-20Merge branch 'x86-efi-kexec-for-linus' of ↵Linus Torvalds3-2/+346
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI changes from Ingo Molnar: "This consists of two main parts: - New static EFI runtime services virtual mapping layout which is groundwork for kexec support on EFI (Borislav Petkov) - EFI kexec support itself (Dave Young)" * 'x86-efi-kexec-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/efi: parse_efi_setup() build fix x86: ksysfs.c build fix x86/efi: Delete superfluous global variables x86: Reserve setup_data ranges late after parsing memmap cmdline x86: Export x86 boot_params to sysfs x86: Add xloadflags bit for EFI runtime support on kexec x86/efi: Pass necessary EFI data for kexec via setup_data efi: Export EFI runtime memory mapping to sysfs efi: Export more EFI table variables to sysfs x86/efi: Cleanup efi_enter_virtual_mode() function x86/efi: Fix off-by-one bug in EFI Boot Services reservation x86/efi: Add a wrapper function efi_map_region_fixed() x86/efi: Remove unused variables in __map_region() x86/efi: Check krealloc return value x86/efi: Runtime services virtual mapping x86/mm/cpa: Map in an arbitrary pgd x86/mm/pageattr: Add last levels of error path x86/mm/pageattr: Add a PUD error unwinding path x86/mm/pageattr: Add a PTE pagetable populating function x86/mm/pageattr: Add a PMD pagetable populating function ...

Privacy Policy