Releases: open-power/skiboot
v7.1
v6.2
skiboot-6.2
skiboot v6.2 was released on Friday December 14th 2018. It is the first
release of skiboot 6.2, which becomes the new stable release of skiboot
following the 6.1 release, first released July 11th 2018.
Skiboot 6.2 will mark the basis for op-build v2.2.
skiboot v6.2 contains all bug fixes as of [skiboot-6.0.14]{role="ref"},
and [skiboot-5.4.10]{role="ref"} (the currently maintained stable
releases).
For how the skiboot stable releases work, see [stable-rules]{role="ref"}
for details.
This release has been a longer cycle than typical for a variety of
reasons. It also contains a lot of cleanup work and minor bug fixes
(much like skiboot 6.1 did).
Over skiboot 6.1, we have the following changes:
General
Since v6.2-rc2:
-
i2c: Fix i2c request hang during opal init if timers are not checked
If an i2c request cannot go through the first time, because the bus
is found in error and needs a reset or it's locked by the OCC for
example, the underlying i2c implementation uses timers to manage
the request. However, during opal init, opal pollers may not be
called, depending on the context in which the i2c request is made.
If the pollers are not called, the timers are not checked and we can
end up with an i2c request which will not move forward and skiboot
hangs.
Fix it by explicitly checking the timers if we are waiting for an
i2c request to complete and it seems to be taking a while.
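
The hang described in the i2c entry above can be modelled with a small, self-contained sketch. Everything here (`fake_i2c_req`, `fake_check_timers()`, `fake_i2c_wait()`, the tick counter) is hypothetical and is not skiboot's i2c code; it only assumes that a request completes when its timer fires and that nothing else runs the timer while we wait.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a request that only completes when its timer fires. */
struct fake_i2c_req {
	bool     done;    /* set once the (simulated) timer callback runs */
	uint64_t expiry;  /* tick at which the retry timer becomes due */
};

static uint64_t fake_now; /* simulated clock, in ticks */

/* Stand-in for a timer check: run the callback once the timer is due. */
static void fake_check_timers(struct fake_i2c_req *req)
{
	if (!req->done && fake_now >= req->expiry)
		req->done = true;
}

/*
 * Wait for the request to complete. With poll_timers == false nothing
 * ever runs the timer, so the loop only ends at max_ticks (the "hang").
 * Returns the tick at which the wait ended.
 */
static uint64_t fake_i2c_wait(struct fake_i2c_req *req, bool poll_timers,
			      uint64_t max_ticks)
{
	while (!req->done && fake_now < max_ticks) {
		fake_now++;                     /* stands in for a short delay */
		if (poll_timers)
			fake_check_timers(req); /* the fix: check timers explicitly */
	}
	return fake_now;
}
```

With `poll_timers` set, a request whose timer is due at tick 5 completes at tick 5; without it, the wait runs to `max_ticks` and the request never completes. The real fix only starts checking timers once the wait "seems to be taking a while", which this sketch leaves out.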
Since v6.1:
-
cpu: Quieten OS endian switch messages
Users see these when loading an OS from Petitboot: ::
[ 119.486794100,5] OPAL: Switch to big-endian OS
[ 120.022302604,5] OPAL: Switch to little-endian OS
Which is expected and doesn't provide any information the user can
act on. Switch them to PR_INFO so they still appear in the log, but
not on the serial console.
-
Recognise signed VERSION partition
A few things need to change to support a signed VERSION partition:
- A signed VERSION partition will be 4K + SECURE_BOOT_HEADERS_SIZE (4K).
- The VERSION partition needs to be loaded after secure/trusted boot is set up, and therefore after nvram_init().
- Added to the trustedboot resources array.
This also moves the ipmi_dt_add_bmc_info() call to after
flash_dt_add_fw_version() since it adds info to
ibm,firmware-versions.
-
Run pollers in time_wait() when not booting
This only bit us hard with hiomap in one scenario.
Our OPAL API has been that OPAL_POLL_EVENTS may be needed to make
forward progress on ongoing operations, and the internal skiboot
API has been that time_wait() of a suitable time will run pollers
(on at least one CPU) to help ensure forward progress can be made.
In a perfect world, interrupts are used but they may:
a) be disabled, or
b) the thing we're doing can't use interrupts because computers are
generally terrible.
Back in 3db397e (circa 2015), we changed skiboot so that we'd
run pollers only on the boot CPU, and not if we held any locks. This
was to reduce the chance of programming code that could deadlock, as
well as to ensure that we didn't just thrash all the cachelines for
running pollers all over a large system during boot, or hard spin on
the same locks on all secondary CPUs.
The problem arises if the OS we're booting makes an OPAL call early
on, with interrupts disabled, that requires a poller to run to make
forward progress. An example of this would be OPAL_WRITE_NVRAM
early in Linux boot (where Linux sets up the partitions it wants),
something that occurs iff we've had to reformat NVRAM this boot
(i.e. first boot or corrupted NVRAM).
The hiomap implementation should arguably not rely on synchronous
IPMI messages, but this is a future improvement (as was for mbox
before it). The mbox-flash code solved this problem by spinning on
check_timers().
More generically though, the approach of running the pollers when no
longer booting means we behave more in line with what the API is
meant to be, rather than have this odd case of "time_wait() for a
condition that could also be tripped by an interrupt works fine
unless the OS is up and running but hasn't set interrupts up yet".
-
ipmi: Reduce ipmi_queue_msg_sync() polling loop time to 10ms
On a plain boot, this reduces the time spent in OPAL by ~170ms on
p9dsu. This is due to hiomap (currently) using synchronous IPMI
messages.
It will also significantly reduce latency on runtime flash
operations for hiomap, as we'll spend typically 10-20ms in OPAL
rather than 100-200ms. It's not an ideal solution to that, but
it's a quick and obvious win for jitter.
-
core/device: NULL pointer dereference fix
-
core/flash: NULL pointer dereference fixes
-
core/cpu: Call memset with proper cpu_thread offset
-
libflash: Add ipmi-hiomap, and prefer it for PNOR access
ipmi-hiomap implements the PNOR access control protocol formerly
known as "the mbox protocol" but uses IPMI instead of the AST LPC
mailbox as a transport. As there is no longer any mailbox involved
in this alternate implementation the old protocol name is quite
misleading, and so it has been renamed to "the hiomap protocol"
(Host I/O Mapping protocol). The same commands and events are used,
though this client-side implementation assumes v2 of the protocol is
supported by the BMC.
The code is a heavily-reworked copy of the mbox-flash source and is
introduced this way to allow for the mbox implementation's eventual
removal.
mbox-flash should in theory be renamed to mbox-hiomap for
consistency, but as it is on life-support effective immediately we
may as well just remove it entirely when the time is right.
-
opal/hmi: Handle early HMIs on thread0 when secondaries are still in OPAL.
When the primary thread receives a CORE level HMI for timer facility
errors while secondaries are still in OPAL, thread 0 ends up in
rendez-vous waiting for secondaries to get into hmi handling. This
is because OPAL runs with MSR(EE=0) and hence HMIs are delayed on
secondary threads until they are given to the Linux OS. Fix this by
adding a check for secondary state and forcing them into hmi handling by
queuing a job on secondary threads.
I have tested this by injecting an HDEC parity error very early during
Linux kernel boot. Recovery works fine for non-TB errors. But if TB
is bad at this very early stage we are already doomed.
Without this patch we see: ::
[ 285.046347408,7] OPAL: Start CPU 0x0843 (PIR 0x0843) -> 0x000000000000a83c
[ 285.051160609,7] OPAL: Start CPU 0x0844 (PIR 0x0844) -> 0x000000000000a83c
[ 285.055359021,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 285.055361439,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:0: TFMR(2e12002870e14000) Timer Facility Error
[ 286.232183823,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc1)
[ 287.409002056,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc1)
[ 289.073820164,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc1)
[ 290.250638683,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc2)
[ 291.427456821,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc2)
[ 293.092274807,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc2)
[ 294.269092904,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc3)
[ 295.445910944,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc3)
[ 297.110728970,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc3)
After this patch: ::
[ 259.401719351,7] OPAL: Start CPU 0x0841 (PIR 0x0841) -> 0x000000000000a83c
[ 259.406259572,7] OPAL: Start CPU 0x0842 (PIR 0x0842) -> 0x000000000000a83c
[ 259.410615534,7] OPAL: Start CPU 0x0843 (PIR 0x0843) -> 0x000000000000a83c
[ 259.415444519,7] OPAL: Start CPU 0x0844 (PIR 0x0844) -> 0x000000000000a83c
[ 259.419641401,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 259.419644124,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:0: TFMR(2e12002870e04000) Timer Facility Error
[ 259.419650678,7] HMI: Sending hmi job to thread 1
[ 259.419652744,7] HMI: Sending hmi job to thread 2
[ 259.419653051,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 259.419654725,7] HMI: Sending hmi job to thread 3
[ 259.419654916,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 259.419658025,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[ 259.419658406,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:2: TFMR(2e12002870e04000) Timer Facility Error
[ 259.419663095,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:3: TFMR(2e12002870e04000) Timer Facility Error
[ 259.419655234,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:1: TFMR(2e12002870e04000) Timer Facility Error
[ 259.425109779,7] OPAL: Start CPU 0x0845 (PIR 0x0845) -> 0x000000000000a83c
[ 259.429870681,7] OPAL: Start CPU 0x0846 (PIR 0x0846) -> 0x000000000000a83c
[ 259.434549250,7] OPAL: Start CPU 0x0847 (PIR 0x0847) -> 0x000000000000a83c
-
core/cpu: Fix memory allocation for job array
fixes: 7a3f307 cor...
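
As a rough illustration of the "Run pollers in time_wait() when not booting" entry above, the decision it describes can be written as a single predicate. This is a sketch under stated assumptions: `struct wait_ctx` and `should_run_pollers()` are hypothetical names, and the three inputs are simply the conditions named in the text (locks held, boot CPU, still booting), not skiboot's actual data structures.

```c
#include <stdbool.h>

/* Hypothetical state for the sketch; not skiboot's real data structures. */
struct wait_ctx {
	bool os_running;   /* true once we are no longer booting */
	bool is_boot_cpu;  /* true on the CPU that booted the firmware */
	int  locks_held;   /* number of locks held by the caller */
};

/* Decide whether a time_wait()-style delay should also run the pollers. */
static bool should_run_pollers(const struct wait_ctx *c)
{
	if (c->locks_held > 0)
		return false;   /* never poll with locks held (deadlock risk) */
	if (c->os_running)
		return true;    /* the change: poll on any CPU once booting is over */
	return c->is_boot_cpu;  /* during boot: boot CPU only, as before */
}
```

Under this model, an early OPAL call from a non-boot CPU with interrupts disabled still gets its pollers run once the OS is up, which is the case the entry says used to stall.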
v6.0.3
skiboot-6.0.3
skiboot 6.0.3 was released on Wednesday May 23rd, 2018. It replaces
:ref:skiboot-6.0.2 as the current stable release in the 6.0.x series.
It is recommended that 6.0.3 be used instead of any previous 6.0.x version.
Over :ref:skiboot-6.0.2, we have bug fixes related to i2c booting in
secure mode, and general functionality with a TPM present. These changes are:
-
p8-i2c: Remove force reset
Force reset was added as an attempt to work around some issues with TPM
devices locking up their I2C bus. In that particular case the problem
was that the device would hold the SCL line down permanently due to a
device firmware bug. The force reset doesn't actually do anything to
alleviate the situation here, it just happens to reset the internal
master state enough to make the I2C driver appear to work until
something tries to access the bus again.
On P9 systems with secure boot enabled there is the added problem
of the "diagnostic mode" not being supported on I2C masters A, B, C and
D. Diagnostic mode allows the SCL and SDA lines to be driven directly
by software. Without this, force reset is impossible to implement.
This patch removes the force reset functionality entirely since:
a) it doesn't do what it's supposed to, and
b) it's butt ugly code
Additionally, turn p8_i2c_reset_engine() into p8_i2c_reset_port().
There's no need to reset every port on a master in response to an
error that occurred on a specific port.
-
libstb/i2c-driver: Bump max timeout
We have observed some TPMs clock stretching the I2C bus for significant
amounts of time when processing commands. The same TPMs also have
errata that can result in permanently locking up a bus in response to
an I2C transaction they don't understand. Use an excessively long
timeout to prevent this in the field.
-
Add TPM timeout workaround
Set the default timeout for any bus containing a TPM to one second. This
is needed to work around a bug in the firmware of certain TPMs that will
clock stretch the I2C port for up to a second. Additionally, when the
TPM is clock stretching it responds to a STOP condition on the bus by
bricking itself. Clearing this error requires a hard power cycle of the
system since the TPM is powered by standby power.
v6.0.2
skiboot-6.0.2
skiboot 6.0.2 was released on Friday May 18th, 2018. It replaces
:ref:skiboot-6.0.1 as the current stable release in the 6.0.x series.
It is recommended that 6.0.2 be used instead of any previous 6.0.x version.
Over :ref:skiboot-6.0.1, we have one bug fix:
-
cpu: Clear PCR SPR in opal_reinit_cpus()
Currently if Linux boots with a non-zero PCR, things can go bad where
some early userspace programs can take illegal instructions. This is
being fixed in Linux, but in the meantime, we should clean up in
skiboot also.
This could exhibit itself as petitboot getting killed with SIGILL and
no boot devices showing up, but only in a situation where you've done
a kdump from a kernel running a p8 compat guest.
v6.0.1
skiboot-6.0.1
skiboot 6.0.1 was released on Wednesday May 16th, 2018. It replaces
:ref:skiboot-6.0 as the current stable release in the 6.0.x series.
It is recommended that 6.0.1 be used instead of any previous 6.0.x version
due to the bug fixes and debugging enhancements in it.
Over :ref:skiboot-6.0, we have two bug fixes:
-
OpenBMC: use 0x3a as OEM command for partial add esel.
This fixes the bug where skiboot would never send an eSEL to the BMC.
-
Add location code to NPU2 HMI logging
The current HMI error message does not specify where the HMI
error occurred.
The original error message was ::
NPU: FIR#0 FIR 0x0080100000000000 mask 0x009a48180f01ffff
The enhanced error message is ::
NPU2: [Loc: UOPWR.0000000-Node0-Proc0] P:0 FIR#0 FIR 0x0000100000000000 mask 0x009a48180f03ffff
v6.0
skiboot-6.0
skiboot v6.0 was released on Friday May 11th 2018. It is the first
release of skiboot 6.0, which is the new stable release of skiboot
following the 5.11 release, first released April 6th 2018.
Skiboot 6.0 is the basis for op-build v2.0 and is required for
POWER9 systems.
skiboot v6.0 contains all bug fixes as of :ref:skiboot-5.11,
:ref:skiboot-5.10.5, and :ref:skiboot-5.4.9 (the currently maintained
stable releases). We do not expect any further stable releases in the
5.10.x series, nor in the 5.11.x series.
For how the skiboot stable releases work, see :ref:stable-rules
for details.
Over skiboot-5.11, we have the following changes:
New Features
Since 6.0-rc1:
-
Update default stop-state-disable mask to cut only stop11
Stability improvements in microcode for stop4/stop5 are
available in upstream hcode images. Stop4 and stop5 can
be safely enabled by default.
Use ~0xE0000000 to cut all but stop0,1,2 in case there
are any issues with stop4/5.
Example: ::
nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF
Note that DD2.1 chips that have a frequency below 1867MHz possibly need to
run an hcode image different from the default in op-build (set
BR2_HCODE_LATEST_VERSION=y in your config).
-
ibm,firmware-versions: add hcode to device tree
op-build commit 736a08b996e292a449c4996edb264011dfe56a40
added hcode to the VERSION partition, let's parse it out
and let the user know.
-
ipmi: Add BMC firmware version to device tree
The BMC Get Device ID command gives BMC firmware version details. Let's add this
to the device tree. User space tools will use this information to display BMC
version details.
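
The relationship between `~0xE0000000` and the `0x1FFFFFFF` value passed to `nvram` above can be checked with a little bit arithmetic. The sketch below assumes that stop level N corresponds to bit N counted from the most significant bit of the 32-bit mask, which is what "~0xE0000000 ... all but stop0,1,2" suggests; the helper names are hypothetical and not part of skiboot.

```c
#include <stdint.h>

/*
 * Assumption: stop level N is bit N counted from the most significant
 * bit of the 32-bit mask (IBM bit numbering). Valid for level < 32.
 */
static uint32_t stop_level_bit(unsigned int level)
{
	return UINT32_C(0x80000000) >> level;
}

/* Build a disable mask that leaves only the levels listed in keep[] enabled. */
static uint32_t disable_all_but(const unsigned int *keep, unsigned int n)
{
	uint32_t enabled = 0;

	for (unsigned int i = 0; i < n; i++)
		enabled |= stop_level_bit(keep[i]);
	return ~enabled;
}

/* Non-zero if the given stop level is disabled by the mask. */
static int stop_level_disabled(uint32_t mask, unsigned int level)
{
	return (mask & stop_level_bit(level)) != 0;
}
```

Keeping levels 0, 1 and 2 gives `0x1FFFFFFF`, the value used in the example command, and that mask reports stop4 and stop11 as disabled.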
Since 5.11:
-
Disable stop states from OPAL
On ZZ, stop4,5,11 are enabled for PowerVM, even though doing
so may cause problems with OPAL due to bugs in hcode.
For other platforms, this isn't so much of an issue as
we can just control stop states by the MRW. However the
rebuild-the-world approach to changing values there is a bit
annoying if you just want to rule out a specific stop state
from being problematic.
Provide an nvram option to override what's disabled in OPAL.
The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2)
You can set an NVRAM override with: ::
nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFF
This nvram override will disable all stop states.
-
interrupts: Create an "interrupts" property in the OPAL node
Deprecate the old "opal-interrupts". It's still there, but the new
property follows the standard and allows us to specify whether an
interrupt is level or edge sensitive.
Similarly create "interrupt-names" whose content is identical to
"opal-interrupts-names".
-
SBE: Add timer support on POWER9
SBE on P9 provides a one-shot programmable timer facility. We can use this
to implement OPAL timers and hence limit the reliance on the Linux
heartbeat (similar to the HW timer facility provided by SLW on P8).
-
Add SBE driver support
SBE (Self Boot Engine) on P9 has two different jobs:
- Boot the chip up to the point the core is functional
- Provide various services like timer, scom, stash MPIPL, etc., at runtime
We will use SBE for various purposes like timer, MPIPL, etc.
-
opal:hmi: Add missing processor recovery reason string.
With this patch we now see the reason string printed for the CORE_WOF[43] bit. ::
[ 477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
[ 477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
[ 477.352242181,7] HMI: PC - Thread hang recovery
-
Add DIMM actual speed to device tree
Recent HDAT provides DIMM actual speed. Let's add this to the device tree.
-
Fix DIMM size property
Today we parse the vpd blob to get DIMM size information. This is limited
to FSP based systems. HDAT provides the DIMM size value. Let's use that to
populate the device tree, so that we can get size information on BMC based
systems as well.
-
PCI: Set slot power limit when supported
The PCIe slot capability can be implemented in a root or switch
downstream port to set the maximum power a card is allowed to draw
from the system. This patch adds support for setting the power limit
when the platform has defined one.
-
hdata/spira: parse vpd to add part-number and serial-number to xscom@ node
Expected by FWTS and associates our processor with the part/serial
number, which is obviously a good thing for one's own sanity.
Improved HMI Handling
^^^^^^^^^^^^^^^^^^^^^
-
opal/hmi: Add documentation for opal_handle_hmi2 call
-
opal/hmi: Generate hmi event for recovered HDEC parity error.
-
opal/hmi: check thread 0 tfmr to validate latched tfmr errors.
Due to P9 errata, HDEC parity and TB residue errors are latched for
non-zero threads 1-3 even if they are cleared. But these are not
latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr
value and ignore them on non-zero threads if they are not present on
thread 0.
-
opal/hmi: Print additional debug information in rendezvous.
-
opal/hmi: Fix handling of TFMR parity/corrupt error.
While testing TFMR parity/corrupt error it has been observed that HMIs are
delivered twice for this error:
- First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.
- Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0 with valid TB.
On second HMI we end up throwing "HMI: TB invalid without core error
reported" even though TB is in a valid state.
-
opal/hmi: Stop flooding HMI event for TOD errors.
Fix the issue where every thread on the chip sends an HMI event to the host for
TOD errors. TOD errors are reported to all the cores/threads on the chip.
Any one thread can fix the error and send the event. The rest of the threads
don't need to send an HMI event unnecessarily.
-
opal/hmi: Fix soft lockups during TOD errors
There are some TOD errors which do not affect working of TOD and TB. They
stay in valid state. Hence we don't need rendez vous for TOD errors that
do not affect TB working.
TOD errors that affect TOD/TB will report a global error on TFMR[44]
along with bit 51, and they will go in rendez vous path as expected.
But the TOD errors that do not affect the TB register set only TFMR bit 51.
The TFMR bit 51 is cleared when any single thread clears the TOD error.
Once cleared, the bit 51 is reflected to all the cores on that chip. Any
thread that reads the TFMR register after the error is cleared will see
TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through
the rendez-vous path and threads that see TFMR[51]=0 return doing
nothing. This ends up in soft lockups in the host kernel.
This patch fixes this issue by not considering TOD interrupt (TFMR[51])
as a core-global error and hence avoiding the rendez-vous path completely.
Instead threads that see TFMR[51]=1 will now take a different path that
just does the TOD error recovery.
-
opal/hmi: Do not send HMI event if no errors are found.
For TOD errors, all the cores in the chip get HMIs. Any one thread from any
core can fix the issue and TFMR will have error conditions cleared. The rest of
the threads need not take any action if TOD errors are already cleared. Hence
thread 0 of every core should get a fresh copy of TFMR before going ahead with
the recovery path. Initialize recover = -1, so that if no errors are found that
thread need not send an HMI event to Linux. This helps stop every thread from
flooding the host with HMI events even when no errors are found.
-
opal/hmi: Initialize the hmi event with old value of HMER.
Do this before we check for TFAC errors. Otherwise the event at host console
shows no error reported in HMER register.
Without this patch the console event shows HMER with all zeros ::
[ 216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
[ 216.753498] Error detail: Timer facility experienced an error
[ 216.753509] HMER: 0000000000000000
[ 216.753518] TFMR: 3c12000870e04000
After this patch it shows old HMER values on host console: ::
[ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
[ 2237.652651] Error detail: Timer facility experienced an error
[ 2237.652766] HMER: 0840000000000000
[ 2237.652837] TFMR: 3c12000870e04000
-
opal/hmi: Rework HMI handling of TFAC errors
This patch reworks the HMI handling for TFAC errors by introducing
4 rendez-vous points to improve the thread synchronization while handling
timebase errors that require all threads to clear dirty data from the TB/HDEC
register before clearing the errors.
-
opal/hmi: Don't bother passing HMER to pre-recovery cleanup
The test for TFAC error is now redundant so we remove it and
remove the HMER argument.
-
opal/hmi: Move timer related error handling to a separate function
Currently no functional change. This is a first step to completely
rewriting how these things are handled.
-
opal/hmi: Add a new opal_handle_hmi2 that returns direct info to Linux
It returns a 64-bit flags mask currently set to provide info
about which timer facilities were lost, and whether an event
was generated.
-
opal/hmi: Remove races in clearing HMER
Writing to HMER acts as an "AND". The current code writes back the
value we originally read with the bits we handled cleared. This is
racy: if a new bit gets set in HW after the original read, we'll end
up clearing it without handling it.
Instead, use an all 1's mask with only the bit handled cleared.
-
opal/hmi: Don't re-read HMER multiple times
We want to make sure all reporting and actions are based
upon the same snapshot of HMER...
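
The race described in the "Remove races in clearing HMER" entry above can be reproduced with a tiny software model. The sketch assumes only what the text states, namely that a write ANDs the written value into the register; the macro and function names are hypothetical, and the bit positions use IBM numbering (bit 0 is the most significant bit), consistent with the log lines in these notes where the bit 17 debug trigger shows up as `HMER = 0x0000400000000000`.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit value. */
#define BIT64_MSB0(n)  (UINT64_C(0x8000000000000000) >> (n))

/* Model of a register where a write ANDs the value into the contents. */
static uint64_t and_write(uint64_t current, uint64_t written)
{
	return current & written;
}

/* Racy: write back the earlier snapshot with the handled bit cleared. */
static uint64_t clear_value_racy(uint64_t snapshot, uint64_t handled)
{
	return snapshot & ~handled;
}

/* Safe: all ones except the handled bit, so other bits are untouched. */
static uint64_t clear_value_safe(uint64_t handled)
{
	return ~handled;
}
```

If bit 4 is handled from an earlier snapshot and bit 17 becomes set afterwards, writing the racy value leaves the register at zero and silently drops bit 17, while writing the safe value clears only bit 4.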
v6.0-rc2
skiboot-6.0-rc2
skiboot v6.0-rc2 was released on Wednesday May 9th 2018. It is the second
release candidate of skiboot 6.0, which will become the new stable release
of skiboot following the 5.11 release, first released April 6th 2018.
Skiboot 6.0 will mark the basis for op-build v2.0 and will be required for
POWER9 systems.
skiboot v6.0-rc2 contains all bug fixes as of :ref:skiboot-5.11,
:ref:skiboot-5.10.5, and :ref:skiboot-5.4.9 (the currently maintained
stable releases). Once 6.0 is released, we do not expect any further
stable releases in the 5.10.x series, nor in the 5.11.x series.
For how the skiboot stable releases work, see :ref:stable-rules
for details.
The current plan is to cut the final 6.0 in early May (maybe in a day or two
after this -rc if things look okay), with skiboot 6.0
being for all POWER8 and POWER9 platforms in op-build v2.0.
Over skiboot-6.0-rc1, we have the following changes:
-
Update default stop-state-disable mask to cut only stop11
Stability improvements in microcode for stop4/stop5 are
available in upstream hcode images. Stop4 and stop5 can
be safely enabled by default.
Use ~0xE0000000 to cut all but stop0,1,2 in case there
are any issues with stop4/5.
Example: ::
nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF
Note that DD2.1 chips that have a frequency below 1867MHz possibly need to
run an hcode image different from the default in op-build (set
BR2_HCODE_LATEST_VERSION=y in your config).
-
ibm,firmware-versions: add hcode to device tree
op-build commit 736a08b996e292a449c4996edb264011dfe56a40
added hcode to the VERSION partition, let's parse it out
and let the user know.
-
ipmi: Add BMC firmware version to device tree
The BMC Get Device ID command gives BMC firmware version details. Let's add this
to the device tree. User space tools will use this information to display BMC
version details.
-
mambo: Enable XER CA32 and OV32 bits on P9
POWER9 adds 32 bit carry and overflow bits to the XER, but we need to
set the relevant CTRL1 bit to enable them.
-
Makefile: Fix building natively on ppc64le
When on ppc64le and CROSS is not set by the environment, make assumes
ppc64 and sets a default CROSS. Check for ppc64le as well, so that
'make' works out of the box on ppc64le.
-
p9dsu: timeout for variant detection, default to 2uess
-
core/direct-controls: improve p9_stop_thread error handling
p9_stop_thread should fail the operation if it finds the thread was
already quiesced. This implies something else is doing direct controls
on the thread (e.g., pdbg) or there is some exceptional condition we
don't know how to deal with. Proceeding here would cause things to
trample on each other, for example the hard lockup watchdog trying to
send a sreset to the core while it is stopped for debugging with pdbg
will end in tears.
If p9_stop_thread times out waiting for the thread to quiesce, do
not hit it with a core_start direct control, because we don't know
what state things are in and doing more things at this point is worse
than doing nothing. There is no good recipe described in the workbook
to de-assert the core_stop control if it fails to quiesce the thread.
After timing out here, the thread may eventually quiesce and get
stuck, but that's simpler to debug than undefined behaviour.
-
core/direct-controls: fix p9_cont_thread for stopped/inactive threads
Firstly, p9_cont_thread should check that the thread actually was
quiesced before it tries to resume it. Anything could happen if we
try this from an arbitrary thread state.
Then when resuming a quiesced thread that is inactive or stopped (in
a stop idle state), we must not send a core_start direct control;
clear_maint must be used in these cases.
-
occ: Use major version number while checking the pstate table format
The minor version increments of the pstate table are backward
compatible. The minor version is changed when the pstate table
remains the same and the existing reserved bytes are used to point to
new data. So use only the major version number while parsing the pstate
table. This will allow old skiboot to parse the pstate table and
handle minor version updates.
-
hmi: Clear unknown debug trigger
On some systems, seeing hangs like this when Linux starts: ::
[ 170.027252763,5] OCC: All Chip Rdy after 0 ms
[ 170.062930145,5] INIT: Starting kernel at 0x20011000, fdt at 0x30ae0530 366247 bytes)
[ 171.238270428,5] OPAL: Switch to little-endian OS
If you look at the in memory skiboot console (or do
nvram -p ibm,skiboot --update-config log-level-driver=7) we see the console get
spammed with: ::
[ 5209.109790675,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
[ 5209.109792716,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
[ 5209.109794695,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
[ 5209.109796689,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
We're taking the debug trigger (bit 17) early on, before the
hmi_debug_trigger function in the kernel is set up.
This clears the HMI in Skiboot and reports to the kernel instead of
bringing down the machine.
-
core/hmi: assign flags=0 in case nothing set by handle_hmi_exception
Theoretically we could have returned junk to the OS in this parameter.
-
SLW: Fix mambo boot to use stop states
After commit 35c66b8 ("SLW: Move MAMBO simulator checks to
slw_init"), mambo boot no longer calls add_cpu_idle_state_properties()
and as such we never enable stop states.
After adding the call back, we get more testing coverage as well
as faster mambo SMT boots.
-
phb4: Hardware init updates
CFG Write Request Timeout was incorrectly set to informational and not
fatal for both non-CAPI and CAPI, so set it to fatal. This was a
mistake in the specification. Correcting this fixes a niche bug in
escalation (which is necessary on pre-DD2.2) that can cause a checkstop
due to an NCU timeout.
In addition, set the values in the timeout control registers to match.
This fixes an extremely rare and unreproducible bug, though the current
timings don't make sense since they're higher than the NCU timeout (16)
which will checkstop the machine anyway.
-
SLW: quieten 'Configuring self-restore' for DARN,NCU_SPEC_BAR and HRMOR
-
Experimental support for building with Clang
-
Improvements to testing and Travis CI
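
The "occ: Use major version number" entry above amounts to comparing only part of the version field. As a minimal sketch, assume the pstate table version is a single byte with the major number in the high nibble and the minor number in the low nibble; that layout and the helper names are assumptions made for illustration and are not taken from these notes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: high nibble = major version, low nibble = minor version. */
static uint8_t pstate_major(uint8_t version)
{
	return version >> 4;
}

/* Two tables share a parser when their major versions match. */
static bool pstate_same_format(uint8_t a, uint8_t b)
{
	return pstate_major(a) == pstate_major(b);
}
```

Under this assumption a parser written for version `0x90` would also accept `0x91` or `0x92`, but not `0xA0`.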
v6.0-rc1
skiboot-6.0-rc1
skiboot v6.0-rc1 was released on Tuesday May 1st 2018. It is the first
release candidate of skiboot 6.0, which will become the new stable
release of skiboot following the 5.11 release, first released April
6th 2018.
Skiboot 6.0 will mark the basis for op-build v2.0 and will be required
for POWER9 systems.
skiboot v6.0-rc1 contains all bug fixes as of skiboot-5.11,
skiboot-5.10.5, and skiboot-5.4.9 (the currently maintained stable
releases). Once 6.0 is released, we do not expect any further stable
releases in the 5.10.x series, nor in the 5.11.x series.
For how the skiboot stable releases work, see Skiboot stable tree
rules and releases for details.
The current plan is to cut the final 6.0 in early May, with skiboot
6.0 being for all POWER8 and POWER9 platforms in op-build v2.0.
Over skiboot-5.11, we have the following changes:
New Features
-
Disable stop states from OPAL
On ZZ, stop4,5,11 are enabled for PowerVM, even though doing so may
cause problems with OPAL due to bugs in hcode.
For other platforms, this isn’t so much of an issue as we can just
control stop states by the MRW. However the rebuild-the-world
approach to changing values there is a bit annoying if you just want
to rule out a specific stop state from being problematic.
Provide an nvram option to override what’s disabled in OPAL.
The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2)
You can set an NVRAM override with:
nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFF
This nvram override will disable all stop states.
-
interrupts: Create an “interrupts” property in the OPAL node
Deprecate the old “opal-interrupts”. It’s still there, but the new
property follows the standard and allows us to specify whether an
interrupt is level or edge sensitive.
Similarly create “interrupt-names” whose content is identical to
“opal-interrupts-names”.
-
SBE: Add timer support on POWER9
SBE on P9 provides a one-shot programmable timer facility. We can use
this to implement OPAL timers and hence limit the reliance on the
Linux heartbeat (similar to the HW timer facility provided by SLW on
P8).
-
Add SBE driver support
SBE (Self Boot Engine) on P9 has two different jobs:
- Boot the chip up to the point the core is functional
- Provide various services like timer, scom, stash MPIPL, etc., at runtime
We will use SBE for various purposes like timer, MPIPL, etc.
-
opal:hmi: Add missing processor recovery reason string.
With this patch we now see the reason string printed for the CORE_WOF[43]
bit.
[ 477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
[ 477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
[ 477.352242181,7] HMI: PC - Thread hang recovery
-
Add DIMM actual speed to device tree
Recent HDAT provides DIMM actual speed. Let’s add this to the device
tree.
-
Fix DIMM size property
Today we parse the vpd blob to get DIMM size information. This is
limited to FSP based systems. HDAT provides the DIMM size value. Let’s use
that to populate the device tree, so that we can get size information on
BMC based systems as well.
-
PCI: Set slot power limit when supported
The PCIe slot capability can be implemented in a root or switch
downstream port to set the maximum power a card is allowed to draw
from the system. This patch adds support for setting the power limit
when the platform has defined one.
-
hdata/spira: parse vpd to add part-number and serial-number to xscom@ node
Expected by FWTS and associates our processor with the part/serial
number, which is obviously a good thing for one’s own sanity.
Improved HMI Handling
-
opal/hmi: Add documentation for opal_handle_hmi2 call
-
opal/hmi: Generate hmi event for recovered HDEC parity error.
-
opal/hmi: check thread 0 tfmr to validate latched tfmr errors.
Due to P9 errata, HDEC parity and TB residue errors are latched for
non-zero threads 1-3 even if they are cleared. But these are not
latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0
tfmr value and ignore them on non-zero threads if they are not
present on thread 0. -
opal/hmi: Print additional debug information in rendezvous.
-
opal/hmi: Fix handling of TFMR parity/corrupt error.
While testing TFMR parity/corrupt error it has been observed that
HMIs are delivered twice for this error:
- First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.
- Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0
  with valid TB.

On second HMI we end up throwing “HMI: TB invalid without core error
reported” even though TB is in a valid state.
-
opal/hmi: Stop flooding HMI event for TOD errors.
Fix the issue where every thread on the chip sends HMI event to host
for TOD errors. TOD errors are reported to all the core/threads on
the chip. Any one thread can fix the error and send event. Rest of
the threads don’t need to send HMI event unnecessarily. -
opal/hmi: Fix soft lockups during TOD errors
There are some TOD errors which do not affect the working of TOD and TB;
they stay in a valid state. Hence we don’t need a rendez-vous for TOD
errors that do not affect TB working.

TOD errors that affect TOD/TB will report a global error on
TFMR[44] along with bit 51, and they will go through the rendez-vous
path as expected.

But TOD errors that do not affect the TB register set only TFMR
bit 51. TFMR bit 51 is cleared when any single thread clears the
TOD error. Once cleared, bit 51 is reflected to all the cores on
that chip. Any thread that reads the TFMR register after the error
is cleared will see TFMR bit 51 reset. Hence the threads that see
TFMR[51]=1 fall through to the rendez-vous path, and threads that see
TFMR[51]=0 return doing nothing. This ends up in soft lockups in the
host kernel.

This patch fixes the issue by not considering the TOD interrupt
(TFMR[51]) as a core-global error, hence avoiding the rendez-vous
path completely. Instead, threads that see TFMR[51]=1 will now take a
different path that just does the TOD error recovery.
-
opal/hmi: Do not send HMI event if no errors are found.
For TOD errors, all the cores in the chip get HMIs. Any one thread
from any core can fix the issue, and TFMR will have the error conditions
cleared. The rest of the threads need not take any action if the TOD
errors are already cleared. Hence thread 0 of every core should get a
fresh copy of TFMR before going ahead with the recovery path. Initialize
recover = -1, so that if no errors are found that thread need not send
an HMI event to Linux. This helps stop every thread flooding the host
with HMI events even when no errors are found.
-
opal/hmi: Initialize the hmi event with old value of HMER.
Do this before we check for TFAC errors. Otherwise the event at host
console shows no error reported in HMER register.

Without this patch the console event shows HMER with all zeros:
[ 216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
[ 216.753498] Error detail: Timer facility experienced an error
[ 216.753509] HMER: 0000000000000000
[ 216.753518] TFMR: 3c12000870e04000

After this patch it shows old HMER values on host console:
[ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
[ 2237.652651] Error detail: Timer facility experienced an error
[ 2237.652766] HMER: 0840000000000000
[ 2237.652837] TFMR: 3c12000870e04000
-
opal/hmi: Rework HMI handling of TFAC errors
This patch reworks the HMI handling for TFAC errors by introducing 4
rendez-vous points to improve thread synchronization while handling
timebase errors that require all threads to clear dirty data from the
TB/HDEC registers before clearing the errors.
-
opal/hmi: Don’t bother passing HMER to pre-recovery cleanup
The test for TFAC error is now redundant so we remove it and remove
the HMER argument. -
opal/hmi: Move timer related error handling to a separate function
Currently no functional change. This is a first step to completely
rewriting how these things are handled. -
opal/hmi: Add a new opal_handle_hmi2 that returns direct info to
Linux
It returns a 64-bit flags mask currently set to provide info about
which timer facilities were lost, and whether an event was
generated.
-
opal/hmi: Remove races in clearing HMER
Writing to HMER acts as an “AND”. The current code writes back the
value we originally read with the bits we handled cleared. This is
racy: if a new bit gets set in HW after the original read, we’ll end
up clearing it without handling it.

Instead, use an all 1’s mask with only the bit handled cleared.
-
opal/hmi: Don’t re-read HMER multiple times
We want to make sure all reporting and actions are based upon the
same snapshot of HMER in case bits get added by HW while we are in
OPAL.
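The HMER clearing race described under “Remove races in clearing HMER” can be illustrated with a toy model. This is only a sketch: the register is simulated by a struct whose write helper ANDs the written value in, and all names here (hmer_model, hmer_clear_racy, hmer_clear_safe, and the locally defined PPC_BIT) are made up for the illustration rather than taken from the skiboot sources.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit. Defined locally. */
#define PPC_BIT(n) (1ull << (63 - (n)))

/* Toy model of HMER: a write ANDs the written value into the register. */
struct hmer_model {
    uint64_t val;
};

static void hmer_write(struct hmer_model *h, uint64_t v)
{
    h->val &= v;
}

/*
 * Racy variant: writes back a stale snapshot with the handled bits
 * cleared. Any bit the hardware set after the snapshot is a 0 in the
 * written value, so it gets cleared without ever being handled.
 */
static void hmer_clear_racy(struct hmer_model *h, uint64_t snapshot,
                            uint64_t handled)
{
    hmer_write(h, snapshot & ~handled);
}

/*
 * Safe variant: all 1's with only the handled bits cleared, so bits
 * that were not handled survive the write.
 */
static void hmer_clear_safe(struct hmer_model *h, uint64_t handled)
{
    hmer_write(h, ~handled);
    assert((h->val & handled) == 0);
}
```

If bit 4 was handled from an earlier snapshot and the hardware raises bit 5 in the meantime, the racy variant leaves the model at zero, while the safe variant leaves bit 5 set for the next pass.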
libflash and ffspart
Many improvements to the ffspart utility and libflash have come in
this release, making ffspart suitable for building PNOR images that
are bit-identical to those produced by the existing tooling used by
op-build. The plan is to switch op-build to use this infrastructure in
the not too distant future.
-
libflash/blocklevel: Make read/write be ECC agnostic for callers
The blocklevel abstraction allows for regions of the backing store
to be marked as ECC protected so that blocklevel can decode/encode
the ECC bytes into the buffer automatically without the caller
having to be ECC aware.

Unfortunately this abstraction is far from perfect, this is only
useful if reads and w...
v5.10.5
skiboot-5.10.5
skiboot 5.10.5 was released on Tuesday April 24th, 2018. It replaces
skiboot-5.10.4 as the current stable release in the 5.10.x series.
It is recommended that 5.10.5 be used instead of any previous 5.10.x
version due to the bug fixes and debugging enhancements in it.
Over skiboot-5.10.4, we have four bug fixes:
-
npu2/hw-procedures: fence bricks on GPU reset
The NPU workbook defines a way of fencing a brick and getting the
brick out of fence state. We do have an implementation of bringing
the brick out of fenced/quiesced state. We do the latter in our
procedures, but to support run time reset we need to do the former.

The fencing ensures that access to memory behind the links will not
lead to HMIs, but instead SUEs will be populated in cache (in the
case of speculation). The expectation is then that prior to and
after reset, the operating system components will flush the cache
for the region of memory behind the GPU.

This patch does the following:
- Implements a npu2_dev_fence_brick() function to set/clear
  fence state
- Clears FIR bits prior to clearing the fence status
- Clears the fence status
- Takes the powerbus out of CQ fence much later now, in
  credits_check(), which is the last hardware procedure called
  after link training.
-
hdata/spira: parse vpd to add part-number and serial-number to
xscom@ node
Expected by FWTS and associates our processor with the part/serial
number, which is obviously a good thing for one’s own sanity.
-
hw/imc: Check for pause_microcode_at_boot() return status
pause_microcode_at_boot() loops through each chip’s ucode control
block and pauses the ucode if it is in the running state. But it does
not fail if any chip’s ucode is not initialised.

Add code to return a failure if the ucode is not initialised on any
chip. Since pause_microcode_at_boot() is called just before
attaching the IMC device nodes in imc_init(), add code to check the
function’s return value.
-
core/cpufeatures: Fix setting DARN and SCV HWCAP feature bits
DARN and SCV have been assigned AT_HWCAP2 (32-63) bits:
#define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */
#define PPC_FEATURE2_SCV 0x00100000 /* scv syscall */

A cpufeatures-aware OS will not advertise these to userspace without
this patch.
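To make the “AT_HWCAP2 (32-63) bits” remark in the DARN/SCV item concrete, here is a small sketch of how a combined bit number could map onto the two auxiliary vector words. It assumes a convention where bit numbers 0-31 land in AT_HWCAP and 32-63 in AT_HWCAP2, counted from the least significant bit; that convention and the helper names are assumptions for illustration, not taken from the patch. The two masks are the ones quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */
#define PPC_FEATURE2_SCV  0x00100000 /* scv syscall */

/* Which aux vector word a feature lives in, and its mask in that word. */
struct hwcap_slot {
    bool is_hwcap2;
    uint32_t mask;
};

/*
 * Assumed convention: combined bit numbers 0-31 select AT_HWCAP and
 * 32-63 select AT_HWCAP2, with bit 0 being the least significant bit.
 */
static struct hwcap_slot hwcap_slot_from_bit(unsigned int bit_nr)
{
    struct hwcap_slot s;

    assert(bit_nr < 64);
    s.is_hwcap2 = bit_nr >= 32;
    s.mask = 1u << (bit_nr % 32);
    return s;
}

/* Test a feature mask against an AT_HWCAP2 value. */
static bool hwcap2_has(uint32_t hwcap2, uint32_t feature)
{
    return (hwcap2 & feature) != 0;
}
```

Under that assumed convention, combined bit 53 corresponds to the DARN mask and bit 52 to the SCV mask. On Linux, userspace would typically obtain the value to test from getauxval(AT_HWCAP2).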
v5.11
skiboot-5.11
skiboot v5.11 was released on Friday April 6th 2018. It is the first
release of skiboot 5.11, which is now the new stable release of
skiboot following the 5.10 release, first released February 23rd 2018.
It is not expected to keep the 5.11 branch around for long, and
instead quickly move onto a 6.0, which will mark the basis for
op-build v2.0 and will be required for POWER9 systems.
It is expected that skiboot 6.0 will follow very shortly. Consider
5.11 more of a beta release to 6.0 than anything. For POWER9 systems
it should certainly be more solid than previous releases though.
skiboot v5.11 contains all bug fixes as of skiboot-5.10.4 and
skiboot-5.4.9 (the currently maintained stable releases). There may
be more 5.10.x stable releases, it will depend on demand.
For how the skiboot stable releases work, see Skiboot stable tree
rules and releases for details.
Over skiboot-5.10, we have the following changes:
New Platforms
-
Add VESNIN platform support
The Vesnin platform from YADRO is a 4 socket POWER8 system with up
to 8TB of memory with 460GB/s of memory bandwidth in only 2U. Many
kudos to the team from Yadro for submitting their code upstream!
New Features
-
fast-reboot: enable by default for POWER9
- Fast reboot is disabled if NPU2 is present or CAPI2/OpenCAPI is
used
-
PCI tunneled operations on PHB4
-
phb4: set PBCQ Tunnel BAR for tunneled operations
P9 supports PCI tunneled operations (atomics and as_notify) that
are initiated by devices.

A subset of the tunneled operations require a response that must
be sent back from the host to the device. For example, an atomic
compare and swap will return the compare status, as the swap will only
be performed in case of success. Similarly, as_notify reports whether
the target thread has been woken up or not, because the operation may
fail.

To enable tunneled operations, a device driver must tell the host
where it expects tunneled operation responses, by setting the PBCQ
Tunnel BAR Response register with a specific value within the
range of its BARs.

This register is currently initialized by enable_capi_mode(). But,
as tunneled operations may also operate in PCI mode, a new API is
required to set the PBCQ Tunnel BAR Response register without
switching to CAPI mode.

This patch provides two new OPAL calls to get/set the PBCQ Tunnel
BAR Response register.

Note: as there is only one PBCQ Tunnel BAR register, shared
between all the devices connected to the same PHB, only one of
these devices will be able to use tunneled operations at any
time.
-
phb4: set PHB CMPM registers for tunneled operations
P9 supports PCI tunneled operations (atomics and as_notify) that
require setting the PHB ASN Compare/Mask register with a 16-bit
indication.

This register is currently initialized by enable_capi_mode(). But,
as tunneled operations may also work in PCI mode, the ASN
Compare/Mask register should rather be initialized in
phb4_init_ioda3().

This patch also adds “ibm,phb-indications” to the device tree, to
tell Linux the values of CAPI, ASN, and NBW indications, when
supported.

Tunneled operations tested by IBM in CAPI mode, by Mellanox
Technologies in PCI mode.
-
-
Tie tm-suspend fw-feature and opal_reinit_cpus() together
Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED)
always returns OPAL_UNSUPPORTED.

This ties the tm suspend fw-feature to
opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when
tm suspend is disabled, we correctly report it to the kernel. For
backwards compatibility, it’s assumed tm suspend is available if the
fw-feature is not present.

Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N
DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED). DD2.0 and
below have TM disabled completely (not just suspend).

We are using opal_reinit_cpus() to determine this setting (rather
than the device tree/HDAT) as some future firmware may let us change
this dynamically after boot. That is not the case currently though.
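The backwards compatibility rule in the tm-suspend item above (suspend is assumed available when the fw-feature is missing) can be sketched as a small decision function. The enum, the helper names and the stand-in return codes below are all hypothetical, and the mapping onto a return value is just one plausible reading of “we correctly report it to the kernel”, not a copy of the skiboot code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of the TM_SUSPEND_ENABLED fw-feature. */
enum fw_feature_state {
    FW_FEATURE_ABSENT,   /* older firmware: feature not described at all */
    FW_FEATURE_DISABLED,
    FW_FEATURE_ENABLED,
};

/* Illustrative stand-ins for OPAL return codes, not the real header. */
enum {
    SKETCH_OPAL_SUCCESS = 0,
    SKETCH_OPAL_UNSUPPORTED = -7,
};

/* Suspend is treated as available unless firmware explicitly disables it. */
static bool tm_suspend_available(enum fw_feature_state s)
{
    assert(s == FW_FEATURE_ABSENT || s == FW_FEATURE_DISABLED ||
           s == FW_FEATURE_ENABLED);
    return s != FW_FEATURE_DISABLED;
}

/*
 * Assumed behaviour of a "reinit with TM suspend disabled" request:
 * it can only succeed when suspend really is disabled.
 */
static int reinit_tm_suspend_disabled(enum fw_feature_state s)
{
    return tm_suspend_available(s) ? SKETCH_OPAL_UNSUPPORTED
                                   : SKETCH_OPAL_SUCCESS;
}
```

The point of the sketch is the default: a missing fw-feature behaves like an enabled one, so older firmware keeps its existing behaviour.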
Power Management
-
SLW: Increase stop4-5 residency by 10x
Using the DGEMM benchmark we observed a 5-9% difference in throughput
between running with and without stop4/5. In this benchmark the GPU
waits on the CPU to wake up and provide the subsequent data block to
compute. The wakeup latency accumulates over the run and shows up as
a performance drop.

Linux enters stop4/5 more aggressively for its wakeup latency.
Increasing the residency from 1ms to 10ms makes the performance drop
<1%.
-
occ: Set up OCC messaging even if we fail to setup pstates
This means that we no longer hit this bug if we fail to get valid
pstates from the OCC.

[console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
[ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
[ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
[ 10.318805] Disabling lock debugging due to kernel taint
[ 10.318808] Severe Machine check interrupt [Not recovered]
[ 10.318812] NIP [000000003003e434]: 0x3003e434
[ 10.318813] Initiator: CPU
[ 10.318815] Error type: Real address [Load/Store (foreign)]
[ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
[ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
[ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
[ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
[ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
[ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1
mbox based platforms
For platforms using the mbox protocol for host flash access (all BMC
based OpenPOWER systems, most OpenBMC based systems) there have been
some hardening efforts in the event of the BMC being poorly behaved.
-
mbox: Reduce default BMC timeouts
Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin
for 70 seconds waiting for a BMC to come back. This also makes the
current default of 30 seconds a bit pointless: it is far too short
to be a worst case wait time but too long to avoid hitting
hardlockup detectors and wreaking havoc inside host Linux.

Just change it to three seconds so that host Linux will survive:
reads and writes will fail, but at least the host stays up.

Also refactored the waiting loop just a bit so that it’s easier to
read.
-
mbox: Harden against BMC daemon errors
Bugs present in the BMC daemon mean that skiboot gets presented with
mbox windows of size zero. These windows cannot be valid and skiboot
already detects these conditions.

Currently skiboot warns quite strongly about the occurrence of these
problems. The problem for skiboot is that it doesn’t take any
action. Initially I wanted to avoid putting policy like this into
skiboot, but since these bugs aren’t going away and skiboot barfing
is leading to lockups and ultimately the host going down, something
needs to be done.

I propose that when we detect the problem we fail the mbox call and
punt the problem back up to Linux. I don’t like it but at least it
will cause errors to cascade and won’t bring the host down. I’m not
sure how Linux is supposed to detect this or what it can even do but
this is better than a crash.

Diagnosing a failure to boot if skiboot itself fails to read flash
may be marginally more difficult with this patch. This is because
skiboot will now only print one warning about the zero sized window
rather than continuously spitting it out.
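The idea behind “mbox: Reduce default BMC timeouts” is a bounded wait: poll for the BMC, and give up once a short deadline passes so the host stays up. The sketch below shows that shape only. The callback types, the MBOX_TIMEOUT_MS name and the fake clock are invented for the example and say nothing about how the real waiting loop is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Three seconds, as in the note above. The macro name is hypothetical. */
#define MBOX_TIMEOUT_MS 3000

typedef bool (*poll_fn)(void *ctx);       /* true once the BMC has answered */
typedef uint64_t (*now_ms_fn)(void *ctx); /* monotonic time in milliseconds */

/* Returns 0 if the BMC answered in time, -1 once the deadline has passed. */
static int wait_for_bmc(poll_fn poll, now_ms_fn now_ms, void *ctx,
                        uint64_t timeout_ms)
{
    uint64_t start;

    assert(poll && now_ms);
    start = now_ms(ctx);
    for (;;) {
        if (poll(ctx))
            return 0;
        if (now_ms(ctx) - start >= timeout_ms)
            return -1; /* fail the request rather than hang the host */
    }
}

/* Fake BMC for demonstration: every clock read advances time by 100 ms. */
struct fake_bmc {
    uint64_t now;
    uint64_t ready_at;
};

static uint64_t fake_now(void *ctx)
{
    struct fake_bmc *b = ctx;

    b->now += 100;
    return b->now;
}

static bool fake_poll(void *ctx)
{
    struct fake_bmc *b = ctx;

    return b->now >= b->ready_at;
}
```

A fake BMC that becomes ready after 500 ms is picked up normally, while one that needs 70 seconds (a rebooting BMC, say) makes the call fail shortly after the three second mark.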
Fast Reboot Improvements
Around fast-reboot we have made several improvements to harden the
fast reboot code paths and resort to a full IPL if something doesn’t
look right.
-
core/fast-reboot: zero memory after fast reboot
This improves the security and predictability of the fast reboot
environment.

There can not be a secure fence between fast reboots, because a
malicious OS can modify the firmware itself. However a well-behaved
OS can have a reasonable expectation that OS memory regions it has
modified will be cleared upon fast reboot.

The memory is zeroed after all other CPUs come up from fast reboot,
just before the new kernel is loaded and booted into. This allows
image preloading to run concurrently, and will allow parallelisation
of the clearing in future.
-
core/fast-reboot: verify mem regions before fast reboot
Run the mem_region sanity checkers before proceeding with fast
reboot.

This is the beginning of proactive sanity checks on opal data for
fast reboot (which complements the reactive disable_fast_reboot
cases). This is encouraged to re-use and share any kind of debug
code and unit test code.
-
fast-reboot: occ: Only delete /ibm,opal/power-mgt nodes if they
exist
-
core/fast-reboot: disable fast reboot upon fundamental
entry/exit/locking errors
This disables fast reboot in several more cases where serious errors
like lock corruption or call re-entrancy are detected.
-
capp: Disable fast-reboot whenever enable_capi_mode() is called
This patch updates phb4_set_capi_mode() to disable fast-reboot
whenever enable_capi_mode() is called, irrespective to its...