Moved XSTSTDCSP, XSTSTDCDP and XSTSTDCQP to decodetree and moved some of
its decoding away from the helper as previously the DCMX, XB and BF were
calculated in the helper with the help of cpu_env, now that part was
moved to the decodetree with the rest.
xvtstdcsp:
rept loop master patch
8 12500 1,85393600 1,94683600 (+5.0%)
25 4000 1,78779800 1,92479000 (+7.7%)
100 1000 2,12775000 2,28895500 (+7.6%)
500 200 2,99655300 3,23102900 (+7.8%)
2500 40 6,89082200 7,44827500 (+8.1%)
8000 12 17,50585500 18,95152100 (+8.3%)
xvtstdcdp:
rept loop master patch
8 12500 1,39043100 1,33539800 (-4.0%)
25 4000 1,35731800 1,37347800 (+1.2%)
100 1000 1,51514800 1,56053000 (+3.0%)
500 200 2,21014400 2,47906000 (+12.2%)
2500 40 5,39488200 6,68766700 (+24.0%)
8000 12 13,98623900 18,17661900 (+30.0%)
xvtstdcdp:
rept loop master patch
8 12500 1,35123800 1,34455800 (-0.5%)
25 4000 1,36441200 1,36759600 (+0.2%)
100 1000 1,49763500 1,54138400 (+2.9%)
500 200 2,19020200 2,46196400 (+12.4%)
2500 40 5,39265700 6,68147900 (+23.9%)
8000 12 14,04163600 18,19669600 (+29.6%)
As some values are now decoded outside the helper and passed to it as an
argument the number of arguments of the helper increased, the number
of TCGop needed to load the arguments increased. I suspect that's why
the slow-down in the tests with a high REPT but low LOOP.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-12-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Moved XVTSTDCSP and XVTSTDCDP to decodetree an restructured the helper
to be simpler and do all decoding in the decodetree (so XB, XT and DCMX
are all calculated outside the helper).
Obs: The tests in this one are slightly different, these are the sum of
these instructions with all possible immediate and those instructions
are repeated 10 times.
xvtstdcsp:
rept loop master patch
8 12500 2,76402100 2,70699100 (-2.1%)
25 4000 2,64867100 2,67884100 (+1.1%)
100 1000 2,73806300 2,78701000 (+1.8%)
500 200 3,44666500 3,61027600 (+4.7%)
2500 40 5,85790200 6,47475500 (+10.5%)
8000 12 15,22102100 17,46062900 (+14.7%)
xvtstdcdp:
rept loop master patch
8 12500 2,11818000 1,61065300 (-24.0%)
25 4000 2,04573400 1,60132200 (-21.7%)
100 1000 2,13834100 1,69988100 (-20.5%)
500 200 2,73977000 2,48631700 (-9.3%)
2500 40 5,05067000 5,25914100 (+4.1%)
8000 12 14,60507800 15,93704900 (+9.1%)
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-11-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
In ppc emulation, exception flags are not cleared at the end of an
instruction. Instead, the next instruction is responsible to clear
it before its emulation. However, some helpers are not doing it,
causing an issue where the previously set exception flags are being
used and leading to incorrect values being set in FPSCR.
Fix this by clearing fp_status before doing the instruction 'real' work
for the following helpers that were missing this behavior:
- VSX_CVT_INT_TO_FP_VECTOR
- VSX_CVT_FP_TO_FP
- VSX_CVT_FP_TO_INT_VECTOR
- VSX_CVT_FP_TO_INT2
- VSX_CVT_FP_TO_INT
- VSX_CVT_FP_TO_FP_HP
- VSX_CVT_FP_TO_FP_VECTOR
- VSX_CMP
- VSX_ROUND
- xscvqpdp
- xscvdpsp[n]
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220906125523.38765-9-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
In 205eb5a89e we updated most VSX instructions to zero the
second doubleword, as is requested by PowerISA since v3.1.
However, VSX_MADD helper was left behind unchanged, while it
is also affected and should be fixed as well.
This patch applies the fix for MADD instructions.
Fixes: 205eb5a89e ("target/ppc: Change VSX instructions behavior to fill with zeros")
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220906125523.38765-6-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
When an overflow exception occurs and OE is set the intermediate result
should be adjusted (by subtracting from the exponent) to avoid rounding
to inf. The same applies to an underflow exceptionion and UE (but adding
to the exponent). To do this set the fp_status.rebias_overflow when OE
is set and fp_status.rebias_underflow when UE is set as the FPU will
recalculate in case of a overflow/underflow if the according rebias* is
set.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220805141522.412864-3-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Commit c29018cc73 added an env->fpscr OR operation using a ternary
that checks if 'error' is not zero:
env->fpscr |= error ? FP_FEX : 0;
However, in the current body of do_fpscr_check_status(), 'error' is
granted to be always non-zero at that point. The result is that Coverity
is less than pleased:
Control flow issues (DEADCODE)
Execution cannot reach the expression "0ULL" inside this statement:
"env->fpscr |= (error ? 1073...".
Remove the ternary and always make env->fpscr |= FP_FEX.
Cc: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Cc: Richard Henderson <richard.henderson@linaro.org>
Fixes: Coverity CID 1489442
Fixes: c29018cc73 ("target/ppc: Implemented xvf*ger*")
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Reviewed-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Message-Id: <20220602191048.137511-1-danielhb413@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
This patch fixes another not-so-clear situation in Power ISA
regarding the inexact bits in FPSCR. The ISA states that:
"""
When Overflow Exception is disabled (OE=0) and an
Overflow Exception occurs, the following actions are
taken:
...
2. Inexact Exception is set
XX <- 1
...
FI is set to 1
...
"""
However, when tested on a Power 9 hardware, some instructions that
trigger an OX don't set the FI bit:
xvcvdpsp(0x4050533fcdb7b95ff8d561c40bf90996) = FI: CLEARED -> CLEARED
xvnmsubmsp(0xf3c0c1fc8f3230, 0xbeaab9c5) = FI: CLEARED -> CLEARED
(just a few examples. Other instructions are also affected)
The root cause for this seems to be that only instructions that list
the bit FI in the "Special Registers Altered" should modify it.
QEMU is, today, not working like the hardware:
xvcvdpsp(0x4050533fcdb7b95ff8d561c40bf90996) = FI: CLEARED -> SET
xvnmsubmsp(0xf3c0c1fc8f3230, 0xbeaab9c5) = FI: CLEARED -> SET
(all tests assume FI is cleared beforehand)
Fix this by making float_overflow_excp() return float_flag_inexact
if it should update the inexact flags.
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Message-Id: <20220517161522.36132-3-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
According to Power ISA, the FI bit in FPSCR is non-sticky.
This means that if an instruction is said to modify the FI bit, then
it should be set or cleared depending on the result of the
instruction. Otherwise, it should be kept as was before.
However, the following inconsistency was found when comparing results
from the hardware (tested on both a Power 9 processor and in
Power 10 Mambo):
(FI bit is set before the execution of the instruction)
Hardware: xscmpeqdp(0xff..ff, 0xff..ff) = FI: SET -> SET
QEMU: xscmpeqdp(0xff..ff, 0xff..ff) = FI: SET -> CLEARED
As the FI bit is non-sticky, and xscmpeqdp does not list it as a field
that is changed by the instruction, it should not be changed after its
execution.
This is happening to multiple instructions in the vsx implementations.
If the ISA does not list the FI bit as altered for a particular
instruction, then it should be kept as it was before the instruction.
QEMU is not following this behavior. Affected instructions include:
- xv* (all vsx-vector instructions);
- xscmp*, xsmax*, xsmin*;
- xstdivdp and similars;
(to identify the affected instructions, just search in the ISA for
the instructions that does not list FI in "Special Registers Altered")
Most instructions use the function do_float_check_status() to commit
changes in the inexact flag. So the fix is to add a parameter to it
that will control if the bit FI should be changed or not.
All users of do_float_check_status() are then modified to provide this
argument, controlling if that specific instruction changes bit FI or
not.
Some macro helpers are responsible for both instructions that change
and instructions that aren't suposed to change FI. This seems to always
overlap with the sfprf flag. So, reuse this flag for this purpose when
applicable.
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220517161522.36132-2-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Power ISA v3.1 formalizes the previously undefined result in
words 1 and 3 to be a copy of the result in words 0 and 2.
This affects: xvcvsxdsp, xvcvuxdsp, xvcvdpsp.
And the previously undefined result in word 1 to be a copy of
the result in word 0.
This affects: xscvdpsp.
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220316200427.3410437-1-lucas.coutinho@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
ISA v3.1 changed some VSX instructions behavior by changing what the
other words/doubleword in the result should contain when the result is
only one word/doubleword. e.g. xsmaxdp operates on doubleword 0 and
saves the result also in doubleword 0.
Before, the second doubleword result was undefined according to the
ISA, but now it's stated that it should be zeroed.
Even tough the result was undefined before, hardware implementing these
instructions already filled these fields with 0s. Changing every ISA
version in QEMU to this behavior makes the results match what happens
in hardware.
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220204181944.65063-1-victor.colombo@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The non-signalling versions of VSX scalar convert to shorter/longer
precision insns doesn't silence SNaNs in the hardware. To better match
this behavior, use the non-arithmatic conversion of helper_todouble
instead of float32_to_float64. A test is added to prevent future
regressions.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20211228120310.1957990-1-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
PPC instruction xsmaxcdp, xsmincdp, xsmaxjdp, and xsminjdp are using
vector registers when they should be using VSX ones. This happens
because the instructions are using GEN_VSX_HELPER_R3, which adds 32
to the register numbers, effectively making them vector registers.
This patch fixes it by changing these instructions to use
GEN_VSX_HELPER_X3.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Victor Colombo <victor.colombo@eldorado.org.br>
Message-Id: <20211213120958.24443-2-victor.colombo@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
When computing the predicate "is this value currently formatted
for single precision", we do not want to round the value according
to the current rounding mode, nor perform a floating-point equality.
We want to see if the N bits that make up single-precision are the
only ones set within the register, and then a bitwise equality.
Fixes a bug in which a single-precision NaN is considered !SP,
because float64_eq(nan, nan) is always false.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211119160502.17432-35-richard.henderson@linaro.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
There is no double-rounding bug here, because the result is
merely an estimate to within 1 part in 256, but perform the
operation with float64r32_div for consistency.
Use float_flag_invalid_snan instead of recomputing the
snan-ness of the operand.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211119160502.17432-34-richard.henderson@linaro.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Now that vxsqrt and vxsnan are computed directly by softfloat,
we don't need to recompute it. Split out float_invalid_op_sqrt
to be used in several places. This fixes VSX_SQRT, which did
not order its tests correctly to eliminate NaN with sign set.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211119160502.17432-25-richard.henderson@linaro.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>