Scott Wood [Sat, 12 Oct 2013 00:22:39 +0000 (19:22 -0500)]
powerpc/fsl-book3e-64: Use paca for hugetlb TLB1 entry selection
This keeps usage coordinated for hugetlb and indirect entries, which
should make entry selection more predictable and probably improve overall
performance when mixing the two.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Scott Wood [Sat, 12 Oct 2013 00:22:38 +0000 (19:22 -0500)]
powerpc/e6500: TLB miss handler with hardware tablewalk support
There are a few things that make the existing hw tablewalk handlers
unsuitable for e6500:
- Indirect entries go in TLB1 (though the resulting direct entries go in
TLB0).
- It has threads, but no "tlbsrx." -- so we need a spinlock and
a normal "tlbsx". Because we need this lock, hardware tablewalk
is mandatory on e6500 unless we want to add spinlock+tlbsx to
the normal bolted TLB miss handler.
- TLB1 has no HES (nor next-victim hint) so we need software round robin
(TODO: integrate this round robin data with hugetlb/KVM)
- The existing tablewalk handlers map half of a page table at a time,
because IBM hardware has a fixed 1MiB indirect page size. e6500
has variable size indirect entries, with a minimum of 2MiB.
So we can't do the half-page indirect mapping, and even if we
could it would be less efficient than mapping the full page.
- Like on e5500, the linear mapping is bolted, so we don't need the
overhead of supporting nested tlb misses.
Note that hardware tablewalk does not work in rev1 of e6500.
We do not expect to support e6500 rev1 in mainline Linux.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Mihai Caraman <mihai.caraman@freescale.com>
Scott Wood [Sat, 12 Oct 2013 00:22:37 +0000 (19:22 -0500)]
powerpc: add barrier after writing kernel PTE
There is no barrier between something like ioremap() writing to
a PTE, and returning the value to a caller that may then store the
pointer in a place that is visible to other CPUs. Such callers
generally don't perform barriers of their own.
Even if callers of ioremap() and similar things did use barriers,
the most logical choise would be smp_wmb(), which is not
architecturally sufficient when BookE hardware tablewalk is used. A
full sync is specified by the architecture.
For userspace mappings, OTOH, we generally already have an lwsync due
to locking, and if we occasionally take a spurious fault due to not
having a full sync with hardware tablewalk, it will not be fatal
because we will retry rather than oops.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kevin Hao [Tue, 24 Dec 2013 07:12:12 +0000 (15:12 +0800)]
powerpc/fsl_booke: enable the relocatable for the kdump kernel
The RELOCATABLE is more flexible and without any alignment restriction.
And it is a superset of DYNAMIC_MEMSTART. So use it by default for
a kdump kernel.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kevin Hao [Tue, 24 Dec 2013 07:12:11 +0000 (15:12 +0800)]
powerpc/fsl_booke: smp support for booting a relocatable kernel above 64M
When booting above the 64M for a secondary cpu, we also face the
same issue as the boot cpu that the PAGE_OFFSET map two different
physical address for the init tlb and the final map. So we have to use
switch_to_as1/restore_to_as0 between the conversion of these two
maps. When restoring to as0 for a secondary cpu, we only need to
return to the caller. So add a new parameter for function
restore_to_as0 for this purpose.
Use LOAD_REG_ADDR_PIC to get the address of variables which may
be used before we set the final map in cams for the secondary cpu.
Move the setting of cams a bit earlier in order to avoid the
unnecessary using of LOAD_REG_ADDR_PIC.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kevin Hao [Tue, 24 Dec 2013 07:12:10 +0000 (15:12 +0800)]
powerpc/fsl_booke: make sure PAGE_OFFSET map to memstart_addr for relocatable kernel
This is always true for a non-relocatable kernel. Otherwise the kernel
would get stuck. But for a relocatable kernel, it seems a little
complicated. When booting a relocatable kernel, we just align the
kernel start addr to 64M and map the PAGE_OFFSET from there. The
relocation will base on this virtual address. But if this address
is not the same as the memstart_addr, we will have to change the
map of PAGE_OFFSET to the real memstart_addr and do another relocation
again.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
[scottwood@freescale.com: make offset long and non-negative in simple case]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kevin Hao [Tue, 24 Dec 2013 07:12:09 +0000 (15:12 +0800)]
powerpc/fsl_booke: introduce map_mem_in_cams_addr
Introduce this function so we can set both the physical and virtual
address for the map in cams. This will be used by the relocation code.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kevin Hao [Tue, 24 Dec 2013 07:12:08 +0000 (15:12 +0800)]
powerpc: introduce early_get_first_memblock_info
For a relocatable kernel since it can be loaded at any place, there
is no any relation between the kernel start addr and the memstart_addr.
So we can't calculate the memstart_addr from kernel start addr. And
also we can't wait to do the relocation after we get the real
memstart_addr from device tree because it is so late. So introduce
a new function we can use to get the first memblock address and size
in a very early stage (before machine_init).
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kevin Hao [Tue, 24 Dec 2013 07:12:07 +0000 (15:12 +0800)]
powerpc/fsl_booke: set the tlb entry for the kernel address in AS1
We use the tlb1 entries to map low mem to the kernel space. In the
current code, it assumes that the first tlb entry would cover the
kernel image. But this is not true for some special cases, such as
when we run a relocatable kernel above the 64M or set
CONFIG_KERNEL_START above 64M. So we choose to switch to address
space 1 before setting these tlb entries.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kevin Hao [Tue, 24 Dec 2013 07:12:06 +0000 (15:12 +0800)]
powerpc: enable the relocatable support for the fsl booke 32bit kernel
This is based on the codes in the head_44x.S. The difference is that
the init tlb size we used is 64M. With this patch we can only load the
kernel at address between memstart_addr ~ memstart_addr + 64M. We will
fix this restriction in the following patches.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kevin Hao [Tue, 24 Dec 2013 07:12:05 +0000 (15:12 +0800)]
powerpc: introduce macro LOAD_REG_ADDR_PIC
This is used to get the address of a variable when the kernel is not
running at the linked or relocated address.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kevin Hao [Tue, 24 Dec 2013 07:12:04 +0000 (15:12 +0800)]
powerpc/fsl_booke: introduce get_phys_addr function
Move the codes which translate a effective address to physical address
to a separate function. So it can be reused by other code.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Kevin Hao [Tue, 24 Dec 2013 07:12:03 +0000 (15:12 +0800)]
powerpc/fsl_booke: protect the access to MAS7
The e500v1 doesn't implement the MAS7, so we should avoid to access
this register on that implementations. In the current kernel, the
access to MAS7 are protected by either CONFIG_PHYS_64BIT or
MMU_FTR_BIG_PHYS. Since some code are executed before the code
patching, we have to use CONFIG_PHYS_64BIT in these cases.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Wang Dongsheng [Mon, 6 Jan 2014 05:23:31 +0000 (13:23 +0800)]
powerpc/mpic_timer: fix convert ticks to time subtraction overflow
In some cases tmp_sec may be greater than ticks, because in the process
of calculation ticks and tmp_sec will be rounded.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Wang Dongsheng [Mon, 6 Jan 2014 05:23:30 +0000 (13:23 +0800)]
powerpc/mpic_timer: fix the time is not accurate caused by GTCRR toggle bit
When the timer GTCCR toggle bit is inverted, we calculated the rest
of the time is not accurate. So we need to ignore this bit.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Wang Dongsheng [Wed, 18 Dec 2013 08:39:24 +0000 (16:39 +0800)]
powerpc/p1022ds: add a interrupt for rtc node
Add an external interrupt for rtc node.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Wang Dongsheng [Wed, 18 Dec 2013 08:39:23 +0000 (16:39 +0800)]
powerpc/p1022ds: fix rtc compatible string
RTC Hardware(ds3232) and rtc compatible string does not match.
Change "dallas,ds1339" to "dallas,ds3232".
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Anton Blanchard [Mon, 9 Dec 2013 05:03:10 +0000 (16:03 +1100)]
drivers/tty: ehv_bytechan fails to build as a module
ehv_bytechan is marked tristate but fails to build as a module:
drivers/tty/ehv_bytechan.c:363:1: error: type defaults to ‘int’ in declaration of ‘console_initcall’ [-Werror=implicit-int]
It doesn't make much sense for a console driver to be built as
a module, so change it to a bool.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Wang Dongsheng [Tue, 17 Dec 2013 08:17:02 +0000 (16:17 +0800)]
powerpc/85xx: add sysfs for pw20 state and altivec idle
Add a sys interface to enable/diable pw20 state or altivec idle, and
control the wait entry time.
Enable/Disable interface:
0, disable. 1, enable.
/sys/devices/system/cpu/cpuX/pw20_state
/sys/devices/system/cpu/cpuX/altivec_idle
Set wait time interface:(Nanosecond)
/sys/devices/system/cpu/cpuX/pw20_wait_time
/sys/devices/system/cpu/cpuX/altivec_idle_wait_time
Example: Base on TBfreq is 41MHZ.
1~48(ns): TB[63]
49~97(ns): TB[62]
98~195(ns): TB[61]
196~390(ns): TB[60]
391~780(ns): TB[59]
781~1560(ns): TB[58]
...
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
[scottwood@freescale.com: change ifdef]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Wang Dongsheng [Tue, 17 Dec 2013 08:17:01 +0000 (16:17 +0800)]
powerpc/85xx: add hardware automatically enter pw20 state
Using hardware features make core automatically enter PW20 state.
Set a TB count to hardware, the effective count begins when PW10
is entered. When the effective period has expired, the core will
proceed from PW10 to PW20 if no exit conditions have occurred during
the period.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Wang Dongsheng [Tue, 17 Dec 2013 08:17:00 +0000 (16:17 +0800)]
powerpc/85xx: add hardware automatically enter altivec idle state
Each core's AltiVec unit may be placed into a power savings mode
by turning off power to the unit. Core hardware will automatically
power down the AltiVec unit after no AltiVec instructions have
executed in N cycles. The AltiVec power-control is triggered by hardware.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Wang Dongsheng [Tue, 17 Dec 2013 08:16:59 +0000 (16:16 +0800)]
powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define
E6500 PVR and SPRN_PWRMGTCR0 will be used in subsequent pw20/altivec
idle patches.
Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Christian Engelmayer [Sun, 15 Dec 2013 18:39:26 +0000 (19:39 +0100)]
powerpc/sysdev: Fix a pci section mismatch for Book E
Moved the following functions out of the __init section:
arch/powerpc/sysdev/fsl_pci.c : fsl_add_bridge()
arch/powerpc/sysdev/indirect_pci.c : setup_indirect_pci()
Those are referenced by arch/powerpc/sysdev/fsl_pci.c : fsl_pci_probe() when
compiling for Book E support.
Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Zhao Qiang [Thu, 5 Dec 2013 06:00:03 +0000 (14:00 +0800)]
powerpc/p1010rdb-pa: modify phy interrupt.
It is not correct according to p1010rdb-pa user guide.
So modify it.
Signed-off-by: Zhao Qiang <B45475@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
LEROY Christophe [Fri, 22 Nov 2013 17:28:29 +0000 (18:28 +0100)]
powerpc 8xx: defconfig: slice by 4 is more efficient than the default slice by 8 on Powerpc 8xx.
On PPC_8xx, CRC32_SLICEBY4 is more efficient (almost twice) than CRC32_SLICEBY8,
as shown below:
With CRC32_SLICEBY8:
[ 1.109204] crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64
[ 1.114401] crc32: self tests passed, processed 225944 bytes in
15118910 nsec
[ 1.130655] crc32c: CRC_LE_BITS = 64
[ 1.134235] crc32c: self tests passed, processed 225944 bytes in
4479879 nsec
With CRC32_SLICEBY4:
[ 1.097129] crc32: CRC_LE_BITS = 32, CRC_BE BITS = 32
[ 1.101878] crc32: self tests passed, processed 225944 bytes in
8616242 nsec
[ 1.116298] crc32c: CRC_LE_BITS = 32
[ 1.119607] crc32c: self tests passed, processed 225944 bytes in
3289576 nsec
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Xie Xiaobo [Wed, 6 Nov 2013 09:08:03 +0000 (17:08 +0800)]
powerpc/85xx: Add TWR-P1025 board support
TWR-P1025 Overview
-----------------
512Mbyte DDR3 (on board DDR)
64MB Nor Flash
eTSEC1: Connected to RGMII PHY AR8035
eTSEC3: Connected to RGMII PHY AR8035
Two USB2.0 Type A
One microSD Card slot
One mini-PCIe slot
One mini-USB TypeB dual UART
Signed-off-by: Michael Johnston <michael.johnston@freescale.com>
Signed-off-by: Xie Xiaobo <X.Xie@freescale.com>
[scottwood@freescale.com: use pr_info rather than KERN_INFO]
Signed-off-by: Scott Wood <scottwood@freescale.com>
Xie Xiaobo [Wed, 6 Nov 2013 09:08:02 +0000 (17:08 +0800)]
powerpc/85xx: Add QE common init function
Define a QE init function in common file, and avoid
the same codes being duplicated in board files.
Signed-off-by: Xie Xiaobo <X.Xie@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Lijun Pan [Fri, 15 Nov 2013 23:28:20 +0000 (17:28 -0600)]
powerpc/85xx: Merge 85xx/p1023_defconfig into mpc85xx_smp and mpc85xx
mpc85xx_smp_defconfig and mpc85xx_defconfig already have CONFIG_P1023RDS=y.
Merge CONFIG_P1023RDB=y and other relevant configurations into
mpc85xx_smp_defconfig and mpc85_defconfig.
Signed-off-by: Lijun Pan <Lijun.Pan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Scott Wood [Thu, 2 Jan 2014 22:37:50 +0000 (16:37 -0600)]
powerpc/fsl-booke: Use SPRN_SPRGn rather than mfsprg/mtsprg
This fixes a build break that was probably introduced with the removal
of -Wa,-me500 (commit
f49596a4cf4753d13951608f24f939a59fdcc653), where
the assembler refuses to recognize SPRG4-7 with a generic PPC target.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Dongsheng Wang <dongsheng.wang@freescale.com>
Cc: Anton Vorontsov <avorontsov@mvista.com>
Reviewed-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Tested-by: Wang Dongsheng <dongsheng.wang@freescale.com>
Kevin Hao [Thu, 7 Nov 2013 07:17:17 +0000 (15:17 +0800)]
powerpc/85xx: don't init the mpic ipi for the SoC which has doorbell support
It makes no sense to initialize the mpic ipi for the SoC which has
doorbell support. So set the smp_85xx_ops.probe to NULL for this
case. Since the smp_85xx_ops.probe is also used in function
smp_85xx_setup_cpu() to check if we need to invoke
mpic_setup_this_cpu(), we introduce a new setup_cpu function
smp_85xx_basic_setup() to remove this dependency.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Zhao Qiang [Thu, 7 Nov 2013 02:29:29 +0000 (10:29 +0800)]
powerpc/p1010rdb:update mtd of nand to adapt to both old and new p1010rdb
P1010rdb-pa and p1010rdb-pb have different mtd of nand.
So update dts to adapt to both p1010rdb-pa and p1010rdb-pb.
Move the nand-mtd from p1010rdb.dtsi to p1010rdb-pa*.dts.
Remove nand-mtd for p1010rdb-pb, whick will use mtdparts
from u-boot instead of nand-mtd in device tree.
Signed-off-by: Zhao Qiang <B45475@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Zhao Qiang [Thu, 7 Nov 2013 02:29:28 +0000 (10:29 +0800)]
powerpc/p1010rdb:update dts to adapt to both old and new p1010rdb
P1010rdb-pa and p1010rdb-pb have different phy interrupts.
So update dts to adapt to both p1010rdb-pa and p1010rdb-pb.
Signed-off-by: Shengzhou Liu <Shengzhou.Liu@freescale.com>
Signed-off-by: Zhao Qiang <B45475@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Joseph Myers [Mon, 4 Nov 2013 16:55:05 +0000 (16:55 +0000)]
powerpc: fix e500 SPE float SIGFPE generation
The e500 SPE floating-point emulation code is called from
SPEFloatingPointException and SPEFloatingPointRoundException in
arch/powerpc/kernel/traps.c. Those functions have support for
generating SIGFPE, but do_spe_mathemu and speround_handler don't
generate a return value to indicate that this should be done. Such a
return value should depend on whether an exception is raised that has
been set via prctl to generate SIGFPE. This patch adds the relevant
logic in these functions so that SIGFPE is generated as expected by
the glibc testsuite.
Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Joseph Myers [Mon, 4 Nov 2013 16:54:46 +0000 (16:54 +0000)]
powerpc: fix e500 SPE float to integer and fixed-point conversions
The e500 SPE floating-point emulation code has several problems in how
it handles conversions to integer and fixed-point fractional types.
There are the following 20 relevant instructions. These can convert
to signed or unsigned 32-bit integers, either rounding towards zero
(as correct for C casts from floating-point to integer) or according
to the current rounding mode, or to signed or unsigned 32-bit
fixed-point values (values in the range [-1, 1) or [0, 1)). For
conversion from double precision there are also instructions to
convert to 64-bit integers, rounding towards zero, although as far as
I know those instructions are completely theoretical (they are only
defined for implementations that support both SPE and classic 64-bit,
and I'm not aware of any such hardware even though the architecture
definition permits that combination).
#define EFSCTUI 0x2d4
#define EFSCTSI 0x2d5
#define EFSCTUF 0x2d6
#define EFSCTSF 0x2d7
#define EFSCTUIZ 0x2d8
#define EFSCTSIZ 0x2da
#define EVFSCTUI 0x294
#define EVFSCTSI 0x295
#define EVFSCTUF 0x296
#define EVFSCTSF 0x297
#define EVFSCTUIZ 0x298
#define EVFSCTSIZ 0x29a
#define EFDCTUIDZ 0x2ea
#define EFDCTSIDZ 0x2eb
#define EFDCTUI 0x2f4
#define EFDCTSI 0x2f5
#define EFDCTUF 0x2f6
#define EFDCTSF 0x2f7
#define EFDCTUIZ 0x2f8
#define EFDCTSIZ 0x2fa
The emulation code, for the instructions that come in variants
rounding either towards zero or according to the current rounding
direction, uses "if (func & 0x4)" as a condition for using _FP_ROUND
(otherwise _FP_ROUND_ZERO is used). The condition is correct, but the
code it controls isn't. Whether _FP_ROUND or _FP_ROUND_ZERO is used
makes no difference, as the effect of those soft-fp macros is to round
an intermediate floating-point result using the low three bits (the
last one sticky) of the working format. As these operations are
dealing with a freshly unpacked floating-point input, those low bits
are zero and no rounding occurs. The emulation code then uses the
FP_TO_INT_* macros for the actual integer conversion, with the effect
of always rounding towards zero; for rounding according to the current
rounding direction, it should be using FP_TO_INT_ROUND_*.
The instructions in question have semantics defined (in the Power ISA
documents) for out-of-range values and NaNs: out-of-range values
saturate and NaNs are converted to zero. The emulation does nothing
to follow those semantics for NaNs (the soft-fp handling is to treat
them as infinities), and messes up the saturation semantics. For
single-precision conversion to integers, (((func & 0x3) != 0) || SB_s)
is the condition used for doing a signed conversion. The first part
is correct, but the second isn't: negative numbers should result in
saturation to 0 when converted to unsigned. Double-precision
conversion to 64-bit integers correctly uses ((func & 0x1) == 0).
Double-precision conversion to 32-bit integers uses (((func & 0x3) !=
0) || DB_s), with correct first part and incorrect second part. And
vector float conversion to integers uses (((func & 0x3) != 0) ||
SB0_s) (and similar for the other vector element), where the sign bit
check is again wrong.
The incorrect handling of negative numbers converted to unsigned was
introduced in commit
afc0a07d4a283599ac3a6a31d7454e9baaeccca0. The
rationale given there was a C testcase with cast from float to
unsigned int. Conversion of out-of-range floating-point numbers to
integer types in C is undefined behavior in the base standard, defined
in Annex F to produce an unspecified value. That is, the C testcase
used to justify that patch is incorrect - there is no ISO C
requirement for a particular value resulting from this conversion -
and in any case, the correct semantics for such emulation are the
semantics for the instruction (unsigned saturation, which is what it
does in hardware when the emulation is disabled).
The conversion to fixed-point values has its own problems. That code
doesn't try to do a full emulation; it relies on the trap handler only
being called for arguments that are infinities, NaNs, subnormal or out
of range. That's fine, but the logic ((vb.wp[1] >> 23) == 0xff &&
((vb.wp[1] & 0x7fffff) > 0)) for NaN detection won't detect negative
NaNs as being NaNs (the same applies for the double-precision case),
and subnormals are mapped to 0 rather than respecting the rounding
mode; the code should also explicitly raise the "invalid" exception.
The code for vectors works by executing the scalar float instruction
with the trapping disabled, meaning at least subnormals won't be
handled correctly.
As well as all those problems in the main emulation code, the rounding
handler - used to emulate rounding upward and downward when not
supported in hardware and when no higher priority exception occurred -
has its own problems.
* It gets called in some cases even for the instructions rounding to
zero, and then acts according to the current rounding mode when it
should just leave alone the truncated result provided by hardware.
* It presumes that the result is a single-precision, double-precision
or single-precision vector as appropriate for the instruction type,
determines the sign of the result accordingly, and then adjusts the
result based on that sign and the rounding mode.
- In the single-precision cases at least the sign determination for
an integer result is the same as for a floating-point result; in
the double-precision case, converted to 32-bit integer or fixed
point, the sign of a double-precision value is in the high part of
the register but it's the low part of the register that has the
result of the conversion.
- If the result is unsigned fixed-point, its sign may be wrongly
determined as negative (does not actually cause problems, because
inexact unsigned fixed-point results with the high bit set can
only appear when converting from double, in which case the sign
determination is instead wrongly using the high part of the
register).
- If the sign of the result is correctly determined as negative, any
adjustment required to change the truncated result to one correct
for the rounding mode should be in the opposite direction for
two's-complement integers as for sign-magnitude floating-point
values.
- And if the integer result is zero, the correct sign can only be
determined by examining the original operand, and not at all (as
far as I can tell) if the operand and result are the same
register.
This patch fixes all these problems (as far as possible, given the
inability to determine the correct sign in the rounding handler when
the truncated result is 0, the conversion is to a signed type and the
truncated result has overwritten the original operand). Conversion to
fixed-point now uses full emulation, and does not use "asm" in the
vector case; the semantics are exactly those of converting to integer
according to the current rounding direction, once the exponent has
been adjusted, so the code makes such an adjustment then uses the
FP_TO_INT_ROUND macros.
The testcase I used for verifying that the instructions (other than
the theoretical conversions to 64-bit integers) produce the correct
results is at <http://lkml.org/lkml/2013/10/8/708>.
Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Joseph Myers [Mon, 4 Nov 2013 16:54:15 +0000 (16:54 +0000)]
math-emu: fix floating-point to integer overflow detection
On overflow, the math-emu macro _FP_TO_INT_ROUND tries to saturate its
result (subject to the value of rsigned specifying the desired
overflow semantics). However, if the rounding step has the effect of
increasing the exponent so as to cause overflow (if the rounded result
is 1 larger than the largest positive value with the given number of
bits, allowing for signedness), the overflow does not get detected,
meaning that for unsigned results 0 is produced instead of the maximum
unsigned integer with the give number of bits, without an exception
being raised for overflow, and that for signed results the minimum
(negative) value is produced instead of the maximum (positive) value,
again without an exception. This patch makes the code check for
rounding increasing the exponent and adjusts the exponent value as
needed for the overflow check.
Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Joseph Myers [Mon, 4 Nov 2013 16:53:50 +0000 (16:53 +0000)]
math-emu: fix floating-point to integer unsigned saturation
The math-emu macros _FP_TO_INT and _FP_TO_INT_ROUND are supposed to
saturate their results for out-of-range arguments, except in the case
rsigned == 2 (when instead the low bits of the result are taken).
However, in the case rsigned == 0 (converting to unsigned integers),
they mistakenly produce 0 for positive results and the maximum
unsigned integer for negative results, the opposite of correct
unsigned saturation. This patch fixes the logic.
Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Joseph Myers [Mon, 4 Nov 2013 16:53:14 +0000 (16:53 +0000)]
powerpc: fix e500 SPE float rounding inexactness detection
The e500 SPE floating-point emulation code for the rounding modes
rounding to positive or negative infinity (which may not be
implemented in hardware) tries to avoid emulating rounding if the
result was inexact. However, it tests inexactness using the sticky
bit with the cumulative result of previous operations, rather than
with the non-sticky bits relating to the operation that generated the
interrupt. Furthermore, when a vector operation generates the
interrupt, it's possible that only one of the low and high parts is
inexact, and so only that part should have rounding emulated. This
results in incorrect rounding of exact results in these modes when the
sticky bit is set from a previous operation.
(I'm not sure why the rounding interrupts are generated at all when
the result is exact, but empirically the hardware does generate them.)
This patch checks for inexactness using the correct bits of SPEFSCR,
and ensures that rounding only occurs when the relevant part of the
result was actually inexact.
Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Joseph Myers [Tue, 10 Dec 2013 23:07:45 +0000 (23:07 +0000)]
powerpc: fix exception clearing in e500 SPE float emulation
The e500 SPE floating-point emulation code clears existing exceptions
(__FPU_FPSCR &= ~FP_EX_MASK;) before ORing in the exceptions from the
emulated operation. However, these exception bits are the "sticky",
cumulative exception bits, and should only be cleared by the user
program setting SPEFSCR, not implicitly by any floating-point
instruction (whether executed purely by the hardware or emulated).
The spurious clearing of these bits shows up as missing exceptions in
glibc testing.
Fixing this, however, is not as simple as just not clearing the bits,
because while the bits may be from previous floating-point operations
(in which case they should not be cleared), the processor can also set
the sticky bits itself before the interrupt for an exception occurs,
and this can happen in cases when IEEE 754 semantics are that the
sticky bit should not be set. Specifically, the "invalid" sticky bit
is set in various cases with non-finite operands, where IEEE 754
semantics do not involve raising such an exception, and the
"underflow" sticky bit is set in cases of exact underflow, whereas
IEEE 754 semantics are that this flag is set only for inexact
underflow. Thus, for correct emulation the kernel needs to know the
setting of these two sticky bits before the instruction being
emulated.
When a floating-point operation raises an exception, the kernel can
note the state of the sticky bits immediately afterwards. Some
<fenv.h> functions that affect the state of these bits, such as
fesetenv and feholdexcept, need to use prctl with PR_GET_FPEXC and
PR_SET_FPEXC anyway, and so it is natural to record the state of those
bits during that call into the kernel and so avoid any need for a
separate call into the kernel to inform it of a change to those bits.
Thus, the interface I chose to use (in this patch and the glibc port)
is that one of those prctl calls must be made after any userspace
change to those sticky bits, other than through a floating-point
operation that traps into the kernel anyway. feclearexcept and
fesetexceptflag duly make those calls, which would not be required
were it not for this issue.
The previous EGLIBC port, and the uClibc code copied from it, is
fundamentally broken as regards any use of prctl for floating-point
exceptions because it didn't use the PR_FP_EXC_SW_ENABLE bit in its
prctl calls (and did various worse things, such as passing a pointer
when prctl expected an integer). If you avoid anything where prctl is
used, the clearing of sticky bits still means it will never give
anything approximating correct exception semantics with existing
kernels. I don't believe the patch makes things any worse for
existing code that doesn't try to inform the kernel of changes to
sticky bits - such code may get incorrect exceptions in some cases,
but it would have done so anyway in other cases.
Signed-off-by: Joseph Myers <joseph@codesourcery.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Mihai Caraman [Thu, 8 Aug 2013 12:56:09 +0000 (15:56 +0300)]
powerpc/booke64: Add LRAT error exception handler
LRAT (Logical to Real Address Translation) present in MMU v2 provides hardware
translation from a logical page number (LPN) to a real page number (RPN) when
tlbwe is executed by a guest or when a page table translation occurs from a
guest virtual address.
Add LRAT error exception handler to Booke3E 64-bit kernel and the basic KVM
handler to avoid build breakage. This is a prerequisite for KVM LRAT support
that will follow.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Benjamin Herrenschmidt [Mon, 30 Dec 2013 04:19:31 +0000 (15:19 +1100)]
Merge branch 'merge' into next
Merge a pile of fixes that went into the "merge" branch (3.13-rc's) such
as Anton Little Endian fixes.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Sun, 15 Dec 2013 23:47:54 +0000 (10:47 +1100)]
powerpc: Fix endian issues in power7/8 machine check handler
The SLB save area is shared with the hypervisor and is defined
as big endian, so we need to byte swap on little endian builds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Mon, 30 Dec 2013 03:48:27 +0000 (14:48 +1100)]
Merge remote-tracking branch 'agust/merge' into merge
Anatolij writes:
Please pull two DTS fixes for MPC5125 tower board. Without
them the v3.13-rcX kernels do not boot.
Alistair Popple [Mon, 9 Dec 2013 07:17:03 +0000 (18:17 +1100)]
powerpc/iommu: Update the generic code to use dynamic iommu page sizes
This patch updates the generic iommu backend code to use the
it_page_shift field to determine the iommu page size instead of
using hardcoded values.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Alistair Popple [Mon, 9 Dec 2013 07:17:02 +0000 (18:17 +1100)]
powerpc/iommu: Add it_page_shift field to determine iommu page size
This patch adds a it_page_shift field to struct iommu_table and
initiliases it to 4K for all platforms.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Alistair Popple [Mon, 9 Dec 2013 07:17:01 +0000 (18:17 +1100)]
powerpc/iommu: Update constant names to reflect their hardcoded page size
The powerpc iommu uses a hardcoded page size of 4K. This patch changes
the name of the IOMMU_PAGE_* macros to reflect the hardcoded values. A
future patch will use the existing names to support dynamic page
sizes.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Opdenacker [Mon, 9 Dec 2013 05:27:40 +0000 (06:27 +0100)]
powerpc: Remove unused REDBOOT Kconfig parameter
This removes the REDBOOT Kconfig parameter,
which was no longer used anywhere in the source code
and Makefiles.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mahesh Salgaonkar [Mon, 9 Dec 2013 19:10:15 +0000 (00:40 +0530)]
powerpc: Fix "attempt to move .org backwards" error
With recent machine check patch series changes, The exception vectors
starting from 0x4300 are now overflowing with allyesconfig. Fix that by
moving machine_check_common and machine_check_handle_early code out of
that region to make enough room for exception vector area.
Fixes this build error reportes by Stephen:
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:958: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:959: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:983: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:984: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1003: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1013: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1014: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1015: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1016: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1017: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:1018: Error: attempt to move .org backwards
[Moved the code further down as it introduced link errors due to too long
relative branches to the masked interrupts handlers from the exception
prologs. Also removed the useless feature section --BenH
]
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Olof Johansson [Sat, 28 Dec 2013 21:01:47 +0000 (13:01 -0800)]
powerpc: Fix alignment of secondary cpu spin vars
Commit
5c0484e25ec0 ('powerpc: Endian safe trampoline') resulted in
losing proper alignment of the spinlock variables used when booting
secondary CPUs, causing some quite odd issues with failing to boot on
PA Semi-based systems.
This showed itself on ppc64_defconfig, but not on pasemi_defconfig,
so it had gone unnoticed when I initially tested the LE patch set.
Fix is to add explicit alignment instead of relying on good luck. :)
[ It appears that there is a different issue with PA Semi systems
however this fix is definitely correct so applying anyway -- BenH
]
Fixes: 5c0484e25ec0 ('powerpc: Endian safe trampoline')
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=67811
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 23 Dec 2013 01:19:51 +0000 (12:19 +1100)]
powerpc: Align p_end
p_end is an 8 byte value embedded in the text section. This means it
is only 4 byte aligned when it should be 8 byte aligned. Fix this
by adding an explicit alignment.
This fixes an issue where POWER7 little endian builds with
CONFIG_RELOCATABLE=y fail to boot.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Brian W Hart [Fri, 20 Dec 2013 19:06:01 +0000 (13:06 -0600)]
powernv/eeh: Add buffer for P7IOC hub error data
Prevent ioda_eeh_hub_diag() from clobbering itself when called by supplying
a per-PHB buffer for P7IOC hub diagnostic data. Take care to inform OPAL of
the correct size for the buffer.
[Small style change to the use of sizeof -- BenH]
Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com>
Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Brian W Hart [Thu, 19 Dec 2013 23:14:07 +0000 (17:14 -0600)]
powernv/eeh: Fix possible buffer overrun in ioda_eeh_phb_diag()
PHB diagnostic buffer may be smaller than PAGE_SIZE, especially when
PAGE_SIZE > 4KB.
Signed-off-by: Brian W Hart <hartb@linux.vnet.ibm.com>
Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Paul E. McKenney [Tue, 17 Dec 2013 22:29:57 +0000 (09:29 +1100)]
powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian
The powerpc 64-bit __copy_tofrom_user() function uses shifts to handle
unaligned invocations. However, these shifts were designed for
big-endian systems: On little-endian systems, they must shift in the
opposite direction.
This commit relies on the C preprocessor to insert the correct shifts
into the assembly code.
[ This is a rare but nasty LE issue. Most of the time we use the POWER7
optimised __copy_tofrom_user_power7 loop, but when it hits an exception
we fall back to the base __copy_tofrom_user loop. - Anton ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rajesh B Prathipati [Mon, 16 Dec 2013 07:58:22 +0000 (18:58 +1100)]
powerpc: Make unaligned accesses endian-safe for powerpc
The generic put_unaligned/get_unaligned macros were made endian-safe by
calling the appropriate endian dependent macros based on the endian type
of the powerpc processor.
Signed-off-by: Rajesh B Prathipati <rprathip@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling [Mon, 16 Dec 2013 04:12:43 +0000 (15:12 +1100)]
powerpc: Fix bad stack check in exception entry
In EXCEPTION_PROLOG_COMMON() we check to see if the stack pointer (r1)
is valid when coming from the kernel. If it's not valid, we die but
with a nice oops message.
Currently we allocate a stack frame (subtract INT_FRAME_SIZE) before we
check to see if the stack pointer is negative. Unfortunately, this
won't detect a bad stack where r1 is less than INT_FRAME_SIZE.
This patch fixes the check to compare the modified r1 with
-INT_FRAME_SIZE. With this, bad kernel stack pointers (including NULL
pointers) are correctly detected again.
Kudos to Paulus for finding this.
Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Matteo Facchinetti [Fri, 20 Dec 2013 09:16:22 +0000 (10:16 +0100)]
powerpc/512x: dts: disable MPC5125 usb module
At the moment the USB controller's pin muxing is not setup
correctly and causes a kernel panic upon system startup, so
disable the USB1 device tree node in the MPC5125 tower board
dts file.
The USB controller is connected to an USB3320 ULPI transceiver
and the device tree should receive an update to reflect correct
dependencies and required initialization data before the USB1
node can get re-enabled.
Signed-off-by: Matteo Facchinetti <matteo.facchinetti@sirius-es.it>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Gerhard Sittig [Tue, 10 Dec 2013 09:51:08 +0000 (10:51 +0100)]
powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node (5125)
the 'soc' node in the MPC5125 "tower" board .dts has an '#interrupt-cells'
property although this node is not an interrupt controller
remove this erroneously placed property because starting with v3.13-rc1
lookup and resolution of 'interrupts' specs for peripherals gets misled
(tries to use the 'soc' as the interrupt parent which fails), emits
'no irq domain found' WARN() messages and breaks the boot process
[ best viewed with 'git diff -U5' to have DT node names in the context ]
Cc: Anatolij Gustschin <agust@denx.de>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Benjamin Herrenschmidt [Fri, 13 Dec 2013 04:56:06 +0000 (15:56 +1100)]
powerpc/powernv: Fix OPAL LPC access in Little Endian
We are passing pointers to the firmware for reads, we need to properly
convert the result as OPAL is always BE.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Fri, 13 Dec 2013 04:53:43 +0000 (15:53 +1100)]
powerpc/powernv: Fix endian issue in opal_xscom_read
opal_xscom_read uses a pointer to return the data so we need
to byteswap it on LE builds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Thu, 12 Dec 2013 04:59:41 +0000 (15:59 +1100)]
powerpc: Fix endian issues in crash dump code
A couple more device tree properties that need byte swapping.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Thu, 12 Dec 2013 04:59:40 +0000 (15:59 +1100)]
powerpc/pseries: Fix endian issues in MSI code
The MSI code is miscalculating quotas in little endian mode.
Add required byteswaps to fix this.
Before we claimed a quota of 65536, after the patch we
see the correct value of 256.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Thu, 12 Dec 2013 04:59:39 +0000 (15:59 +1100)]
powerpc/pseries: Fix PCIE link speed endian issue
We need to byteswap ibm,pcie-link-speed-stats.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Thu, 12 Dec 2013 04:59:38 +0000 (15:59 +1100)]
powerpc/pseries: Fix endian issues in nvram code
The NVRAM code has a number of endian issues. I noticed a very
confused error log count:
RTAS:
100663330 -------- RTAS event begin --------
100663330 == 0x06000022. 0x6 LE error logs and 0x22 BE error logs.
The pstore code has similar issues - if we write an oops in one
endian and attempt to read it in another we get junk.
Make both of these formats big endian, and byteswap as required.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Thu, 12 Dec 2013 04:59:37 +0000 (15:59 +1100)]
powerpc/pseries: Fix endian issues in /proc/ppc64/lparcfg
Some obvious issues:
cat /proc/ppc64/lparcfg
...
partition_id=
16777216
...
partition_potential_processors=
268435456
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Thu, 12 Dec 2013 04:59:36 +0000 (15:59 +1100)]
powerpc: Fix topology core_id endian issue on LE builds
cpu_to_core_id() is missing a byteswap:
cat /sys/devices/system/cpu/cpu63/topology/core_id
201326592
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Thu, 12 Dec 2013 04:59:35 +0000 (15:59 +1100)]
powerpc: Fix endian issue in setup-common.c
During on LE boot we see:
Partition configured for
1073741824 cpus, operating system maximum is 2048.
Clearly missing a byteswap here.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ulrich Weigand [Thu, 12 Dec 2013 04:59:34 +0000 (15:59 +1100)]
powerpc: PTRACE_PEEKUSR always returns FPR0
There is a bug in using ptrace to access FPRs via PTRACE_PEEKUSR /
PTRACE_POKEUSR. In effect, trying to access any of the FPRs always
really accesses FPR0, which does seriously break debugging :-)
The problem seems to have been introduced by commit
3ad26e5c4459d
(Merge branch 'for-kvm' into next).
[ It is indeed a merge conflict between Paul's FPU/VSX state rework
and my LE patches - Anton ]
Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mahesh Salgaonkar [Mon, 9 Dec 2013 10:03:39 +0000 (15:33 +0530)]
powerpc: Fix up the kdump base cap to 128M
The current logic sets the kdump base to min of 2G or ppc64_rma_size/2.
On PowerNV kernel the first memory block 'memory@0' can be very large,
equal to the DIMM size with ppc64_rma_size value capped to 1G. Hence on
PowerNV, kdump base is set to 512M resulting kdump to fail while allocating
paca array. This is because, paca need its memory from RMA region capped
at 256M (see allocate_pacas()).
This patch lowers the kdump base cap to 128M so that kdump kernel can
successfully get memory below 256M for paca allocation.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Thadeu Lima de Souza Cascardo [Mon, 9 Dec 2013 16:41:01 +0000 (14:41 -0200)]
powernv: Fix VFIO support with PHB3
I have recently found out that no iommu_groups could be found under
/sys/ on a P8. That prevents PCI passthrough from working.
During my investigation, I found out there seems to be a missing
iommu_register_group for PHB3. The following patch seems to fix the
problem. After applying it, I see iommu_groups under
/sys/kernel/iommu_groups/, and can also bind vfio-pci to an adapter,
which gives me a device at /dev/vfio/.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anatolij Gustschin [Mon, 9 Dec 2013 21:15:44 +0000 (22:15 +0100)]
powerpc/52xx: Re-enable bestcomm driver in defconfigs
The bestcomm driver has been moved to drivers/dma, so to select
this driver by default additionally CONFIG_DMADEVICES has to be
enabled. Currently it is not enabled in the config despite existing
CONFIG_PPC_BESTCOMM=y in the config files. Fix it.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Olof Johansson [Mon, 9 Dec 2013 23:40:20 +0000 (15:40 -0800)]
powerpc/pasemi: Turn on devtmpfs in defconfig
At least some distros expect it these days; turn it on. Also, random
churn from doing a savedefconfig for the first time in a year or so.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cedric Le Goater [Wed, 4 Dec 2013 16:49:52 +0000 (17:49 +0100)]
offb: Add palette hack for little endian
The pseudo palette color entries need to be ajusted for little
endian.
This patch byteswaps the values in the pseudo palette depending
on the host endian order and the screen depth.
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cedric Le Goater [Wed, 4 Dec 2013 16:49:51 +0000 (17:49 +0100)]
offb: Little endian fixes
The "screen" properties : depth, width, height, linebytes need
to be converted to the host endian order when read from the device
tree.
The offb_init_palette_hacks() routine also made assumption on the
host endian order.
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Hong H. Pham [Sat, 7 Dec 2013 14:06:33 +0000 (09:06 -0500)]
powerpc: Fix PTE page address mismatch in pgtable ctor/dtor
In pte_alloc_one(), pgtable_page_ctor() is passed an address that has
not been converted by page_address() to the newly allocated PTE page.
When the PTE is freed, __pte_free_tlb() calls pgtable_page_dtor()
with an address to the PTE page that has been converted by page_address().
The mismatch in the PTE's page address causes pgtable_page_dtor() to access
invalid memory, so resources for that PTE (such as the page lock) is not
properly cleaned up.
On PPC32, only SMP kernels are affected.
On PPC64, only SMP kernels with 4K page size are affected.
This bug was introduced by commit
d614bb041209fd7cb5e4b35e11a7b2f6ee8f62b8
"powerpc: Move the pte free routines from common header".
On a preempt-rt kernel, a spinlock is dynamically allocated for each
PTE in pgtable_page_ctor(). When the PTE is freed, calling
pgtable_page_dtor() with a mismatched page address causes a memory leak,
as the pointer to the PTE's spinlock is bogus.
On mainline, there isn't any immediately obvious symptoms, but the
problem still exists here.
Fixes: d614bb041209fd7c "powerpc: Move the pte free routes from common header"
Cc: Paul Mackerras <paulus@samba.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-stable <stable@vger.kernel.org> # v3.10+
Signed-off-by: Hong H. Pham <hong.pham@windriver.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Ilia Mirkin [Sat, 7 Dec 2013 00:43:37 +0000 (19:43 -0500)]
powerpc/44x: Fix ocm_block allocation
Allocate enough memory for the ocm_block structure, not just a pointer
to it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Fri, 29 Nov 2013 03:55:22 +0000 (14:55 +1100)]
powerpc: Fix build break with PPC_EARLY_DEBUG_BOOTX=y
A kernel configured with PPC_EARLY_DEBUG_BOOTX=y but PPC_PMAC=n and
PPC_MAPLE=n will fail to link:
btext.c:(.text+0x2d0fc): undefined reference to `.rmci_off'
btext.c:(.text+0x2d214): undefined reference to `.rmci_on'
Fix it by making the build of rmci_on/off() depend on
PPC_EARLY_DEBUG_BOOTX, which also enable the only code that uses them.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Tue, 10 Dec 2013 00:24:10 +0000 (11:24 +1100)]
Merge remote-tracking branch 'agust/merge' into merge
Anatolij Gustschin says:
<<
Please pull a device tree fix for v3.13. The booting on mpc512x
is broken since v3.13-rc1, this patch repairs it.
>>
Mahesh Salgaonkar [Fri, 15 Nov 2013 04:20:57 +0000 (09:50 +0530)]
powerpc/powernv: Get FSP memory errors and plumb into memory poison infrastructure.
Get the memory errors reported by opal and plumb it into memory poison
infrastructure. This patch uses new messaging channel infrastructure to
pull the fsp memory errors to linux.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mahesh Salgaonkar [Fri, 15 Nov 2013 04:20:50 +0000 (09:50 +0530)]
powerpc/powernv: Add config option for hwpoisoning.
Add config option to enable generic memory hwpoisoning infrastructure for
ppc64.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Mon, 18 Nov 2013 09:28:13 +0000 (14:58 +0530)]
powerpc/mm: Enable _PAGE_NUMA for book3s
We steal the _PAGE_COHERENCE bit and use that for indicating NUMA ptes.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Mon, 18 Nov 2013 09:28:12 +0000 (14:58 +0530)]
powerpc/mm: Only check for _PAGE_PRESENT in set_pte/pmd functions
We want to make sure we don't use these function when updating a pte
or pmd entry that have a valid hpte entry, because these functions
don't invalidate them. So limit the check to _PAGE_PRESENT bit.
Numafault core changes use these functions for updating _PAGE_NUMA bits.
That should be ok because when _PAGE_NUMA is set we can be sure that
hpte entries are not present.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Mon, 18 Nov 2013 09:28:10 +0000 (14:58 +0530)]
powerpc/mm: Free up _PAGE_COHERENCE for numa fault use later
Set memory coherence always on hash64 config. If
a platform cannot have memory coherence always set they
can infer that from _PAGE_NO_CACHE and _PAGE_WRITETHRU
like in lpar. So we dont' really need a separate bit
for tracking _PAGE_COHERENCE.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Mon, 18 Nov 2013 09:28:09 +0000 (14:58 +0530)]
powerpc/mm: Use HPTE constants when updating hpte bits
Even though we have same value for linux PTE bits and hash PTE pits
use the hash pte bits wen updating hash pte
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Jeremy Kerr [Thu, 5 Dec 2013 03:42:40 +0000 (11:42 +0800)]
powerpc: Dynamically allocate slb_shadow from memblock
Currently, the slb_shadow buffer is our largest symbol:
[jk@pablo linux]$ nm --size-sort -r -S obj/vmlinux | head -1
c000000000da0000 0000000000040000 d slb_shadow
- we allocate 128 bytes per cpu; so 256k with NR_CPUS=2048. As we have
constant initialisers, it's allocated in .text, causing a larger vmlinux
image. We may also allocate unecessary slb_shadow buffers (> no. pacas),
since we use the build-time NR_CPUS rather than the run-time nr_cpu_ids.
We could move this to the bss, but then we still have the NR_CPUS vs
nr_cpu_ids potential for overallocation.
This change dynamically allocates the slb_shadow array, during
initialise_pacas(). At a cost of 104 bytes of text, we save 256k of
data:
[jk@pablo linux]$ size obj/vmlinux{.orig,}
text data bss dec hex filename
9202795 5244676 1169576 15617047 ee4c17 obj/vmlinux.orig
9202899 4982532 1169576 15355007 ea4c7f obj/vmlinux
Tested on pseries.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Jeremy Kerr [Thu, 5 Dec 2013 03:31:08 +0000 (11:31 +0800)]
powerpc: Make slb_shadow a local
The only external user of slb_shadow is the pseries lpar code, and it
can access through the paca array instead.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Yijing Wang [Thu, 5 Dec 2013 12:01:20 +0000 (20:01 +0800)]
powerpc/pci: Use dev_is_pci() to check whether it is pci device
Use PCI standard marco dev_is_pci() instead of directly compare
pci_bus_type to check whether it is pci device.
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Thu, 5 Dec 2013 18:38:22 +0000 (00:08 +0530)]
mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE.
On archs like ppc64 that don't use _PAGE_PROTNONE and also have
a separate page table outside linux pagetable, we just need to
make sure that when calling change_prot_numa we flush the
hardware page table entry so that next page access result in a numa
fault.
We still need to make sure we use the numa faulting logic only
when CONFIG_NUMA_BALANCING is set. This implies the migrate-on-fault
(Lazy migration) via mbind will only work if CONFIG_NUMA_BALANCING
is set.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gerhard Sittig [Tue, 3 Dec 2013 10:56:52 +0000 (11:56 +0100)]
powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node
the 'soc' node in the common .dtsi for MPC5121 has an '#interrupt-cells'
property although this node is not an interrupt controller
remove this erroneously placed property because starting with v3.13-rc1
lookup and resolution of 'interrupts' specs for peripherals gets misled,
emits 'no irq domain found' WARN() messages and breaks the boot process
irq: no irq domain found for /soc@
80000000 !
------------[ cut here ]------------
WARNING: at drivers/of/platform.c:171
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Tainted: G W
3.13.0-rc1-00001-g8a66234 #8
task:
df823bb0 ti:
df834000 task.ti:
df834000
NIP:
c02b5190 LR:
c02b5180 CTR:
c01cf4e0
REGS:
df835c50 TRAP: 0700 Tainted: G W (
3.13.0-rc1-00001-g8a66234)
MSR:
00029032 <EE,ME,IR,DR,RI> CR:
229a9d42 XER:
20000000
GPR00:
c02b5180 df835d00 df823bb0 00000000 00000000 df835b18 ffffffff 00000308
GPR08:
c0479cc0 c0480000 c0479cc0 00000308 00000308 00000000 c00040fc 00000000
GPR16:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 df850880
GPR24:
df84d670 00000000 00000001 df8561a0 dffffccc df85089c 00000020 00000001
NIP [
c02b5190] of_device_alloc+0xf4/0x1a0
LR [
c02b5180] of_device_alloc+0xe4/0x1a0
Call Trace:
[
df835d00] [
c02b5180] of_device_alloc+0xe4/0x1a0 (unreliable)
[
df835d50] [
c02b5278] of_platform_device_create_pdata+0x3c/0xc8
[
df835d70] [
c02b53fc] of_platform_bus_create+0xf8/0x170
[
df835dc0] [
c02b5448] of_platform_bus_create+0x144/0x170
[
df835e10] [
c02b55a8] of_platform_bus_probe+0x98/0xe8
[
df835e30] [
c0437508] mpc512x_init+0x28/0x1c4
[
df835e70] [
c0435de8] ppc_init+0x4c/0x60
[
df835e80] [
c0003b28] do_one_initcall+0x150/0x1a4
[
df835ef0] [
c0432048] kernel_init_freeable+0x114/0x1c0
[
df835f30] [
c0004114] kernel_init+0x18/0x124
[
df835f40] [
c000e910] ret_from_kernel_thread+0x5c/0x64
Instruction dump:
409effd4 57c9103a 57de2834 7c89f050 7f83e378 7c972214 7f45d378 48001f55
7c63d278 7c630034 5463d97e 687a0001 <
0f1a0000>
2f990000 387b0010 939b0098
---[ end trace
2257f10e5a20cbdd ]---
...
irq: no irq domain found for /soc@
80000000 !
fsl-diu-fb
80002100.display: could not get DIU IRQ
fsl-diu-fb: probe of
80002100.display failed with error -22
irq: no irq domain found for /soc@
80000000 !
mpc512x_dma
80014000.dma: Error mapping IRQ!
mpc512x_dma: probe of
80014000.dma failed with error -22
...
irq: no irq domain found for /soc@
80000000 !
fs_enet: probe of
80002800.ethernet failed with error -22
...
irq: no irq domain found for /soc@
80000000 !
mpc5121-rtc
80000a00.rtc: mpc5121_rtc_probe: could not request irq: 0
mpc5121-rtc: probe of
80000a00.rtc failed with error -22
...
Cc: Anatolij Gustschin <agust@denx.de>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: devicetree@vger.kernel.org
Signed-off-by: Gerhard Sittig <gsi@denx.de>
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Michael Neuling [Fri, 29 Nov 2013 03:22:48 +0000 (14:22 +1100)]
powernv: Remove get/set_rtc_time when they are not present
Currently we continue to poll get/set_rtc_time even when we know they
are not working.
This changes it so that if it fails at boot time we remove the ppc_md
get/set_rtc_time hooks so that we don't end up polling known broken
calls.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Fri, 29 Nov 2013 02:27:18 +0000 (13:27 +1100)]
powerpc: Add real mode cache inhibited IO accessors
These accessors allow us to do cache inhibited accesses when in real
mode. They should only be used in real mode.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Brian King [Mon, 25 Nov 2013 22:27:54 +0000 (16:27 -0600)]
powerpc: Increase EEH recovery timeout for SR-IOV
In order to support concurrent adapter firmware download
to SR-IOV adapters on pSeries, each VF will see an EEH event
where the slot will remain in the unavailable state for
the duration of the adapter firmware update, which can take
as long as 5 minutes. Extend the EEH recovery timeout to
account for this.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Fri, 22 Nov 2013 08:28:46 +0000 (16:28 +0800)]
powerpc/eeh: Output PHB diag-data
When hitting frozen PE or fenced PHB, it's always indicative to
have dumped PHB diag-data for further analysis and diagnosis.
However, we never dump that for the cases. The patch intends to
dump PHB diag-data at the backend of eeh_ops::get_log() for PowerNV
platform.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gavin Shan [Fri, 22 Nov 2013 08:28:45 +0000 (16:28 +0800)]
powerpc/powernv: Move PHB-diag dump functions around
Prior to the completion of PCI enumeration, we actively detects
EEH errors on PCI config cycles and dump PHB diag-data if necessary.
The EEH backend also dumps PHB diag-data in case of frozen PE or
fenced PHB. However, we are using different functions to dump the
PHB diag-data for those 2 cases.
The patch merges the functions for dumping PHB diag-data to one so
that we can avoid duplicate code. Also, we never dump PHB3 diag-data
during PCI config cycles with frozen PE. The patch fixes it as well.
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Alexey Kardashevskiy [Thu, 21 Nov 2013 06:43:14 +0000 (17:43 +1100)]
PPC: POWERNV: move iommu_add_device earlier
The current implementation of IOMMU on sPAPR does not use iommu_ops
and therefore does not call IOMMU API's bus_set_iommu() which
1) sets iommu_ops for a bus
2) registers a bus notifier
Instead, PCI devices are added to IOMMU groups from
subsys_initcall_sync(tce_iommu_init) which does basically the same
thing without using iommu_ops callbacks.
However Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158)
implements iommu_ops and when tce_iommu_init is called, every PCI device
is already added to some group so there is a conflict.
This patch does 2 things:
1. removes the loop in which PCI devices were added to groups and
adds explicit iommu_add_device() calls to add devices as soon as they get
the iommu_table pointer assigned to them.
2. moves a bus notifier to powernv code in order to avoid conflict with
the notifier from Freescale driver.
iommu_add_device() and iommu_del_device() are public now.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Vasant Hegde [Mon, 18 Nov 2013 11:09:22 +0000 (16:39 +0530)]
powerpc/powernv: Move SG list structure to header file
Move SG list and entry structure to header file so that
it can be used in other places as well.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mahesh Salgaonkar [Mon, 18 Nov 2013 10:05:58 +0000 (15:35 +0530)]
powerpc/powernv: Infrastructure to read opal messages in generic format.
Opal now has a new messaging infrastructure to push the messages to
linux in a generic format for different type of messages using only one
event bit. The format of the opal message is as below:
struct opal_msg {
uint32_t msg_type;
uint32_t reserved;
uint64_t params[8];
};
This patch allows clients to subscribe for notification for specific
message type. It is upto the subscriber to decipher the messages who showed
interested in receiving specific message type.
The interface to subscribe for notification is:
int opal_message_notifier_register(enum OpalMessageType msg_type,
struct notifier_block *nb)
The notifier will fetch the opal message when available and notify the
subscriber with message type and the opal message. It is subscribers
responsibility to copy the message data before returning from notifier
callback.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Geert Uytterhoeven [Tue, 12 Nov 2013 19:07:14 +0000 (20:07 +0100)]
powerpc/windfarm: Remove superfluous name casts
wf_sensor.name is "const char *"
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mahesh Salgaonkar [Wed, 30 Oct 2013 14:36:13 +0000 (20:06 +0530)]
powerpc/powernv: Machine check exception handling.
Add basic error handling in machine check exception handler.
- If MSR_RI isn't set, we can not recover.
- Check if disposition set to OpalMCE_DISPOSITION_RECOVERED.
- Check if address at fault is inside kernel address space, if not then send
SIGBUS to process if we hit exception when in userspace.
- If address at fault is not provided then and if we get a synchronous machine
check while in userspace then kill the task.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mahesh Salgaonkar [Wed, 30 Oct 2013 14:35:58 +0000 (20:05 +0530)]
powerpc/powernv: Remove machine check handling in OPAL.
Now that we are ready to handle machine check directly in linux, do not
register with firmware to handle machine check exception.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mahesh Salgaonkar [Wed, 30 Oct 2013 14:35:49 +0000 (20:05 +0530)]
powerpc/book3s: Queue up and process delayed MCE events.
When machine check real mode handler can not continue into host kernel
in V mode, it returns from the interrupt and we loose MCE event which
never gets logged. In such a situation queue up the MCE event so that
we can log it later when we get back into host kernel with r1 pointing to
kernel stack e.g. during syscall exit.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Mahesh Salgaonkar [Wed, 30 Oct 2013 14:35:40 +0000 (20:05 +0530)]
powerpc/book3s: Decode and save machine check event.
Now that we handle machine check in linux, the MCE decoding should also
take place in linux host. This info is crucial to log before we go down
in case we can not handle the machine check errors. This patch decodes
and populates a machine check event which contain high level meaning full
MCE information.
We do this in real mode C code with ME bit on. The MCE information is still
available on emergency stack (in pt_regs structure format). Even if we take
another exception at this point the MCE early handler will allocate a new
stack frame on top of current one. So when we return back here we still have
our MCE information safe on current stack.
We use per cpu buffer to save high level MCE information. Each per cpu buffer
is an array of machine check event structure indexed by per cpu counter
mce_nest_count. The mce_nest_count is incremented every time we enter
machine check early handler in real mode to get the current free slot
(index = mce_nest_count - 1). The mce_nest_count is decremented once the
MCE info is consumed by virtual mode machine exception handler.
This patch provides save_mce_event(), get_mce_event() and release_mce_event()
generic routines that can be used by machine check handlers to populate and
retrieve the event. The routine release_mce_event() will free the event slot so
that it can be reused. Caller can invoke get_mce_event() with a release flag
either to release the event slot immediately OR keep it so that it can be
fetched again. The event slot can be also released anytime by invoking
release_mce_event().
This patch also updates kvm code to invoke get_mce_event to retrieve generic
mce event rather than paca->opal_mce_evt.
The KVM code always calls get_mce_event() with release flags set to false so
that event is available for linus host machine
If machine check occurs while we are in guest, KVM tries to handle the error.
If KVM is able to handle MC error successfully, it enters the guest and
delivers the machine check to guest. If KVM is not able to handle MC error, it
exists the guest and passes the control to linux host machine check handler
which then logs MC event and decides how to handle it in linux host. In failure
case, KVM needs to make sure that the MC event is available for linux host to
consume. Hence KVM always calls get_mce_event() with release flags set to false
and later it invokes release_mce_event() only if it succeeds to handle error.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>