gcc-14.2: Fix ICE on aarch64

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464 for issue
description and list of patches to backport.

Signed-off-by: BtbN <btbn@btbn.de>
This commit is contained in:
BtbN 2024-08-10 16:48:55 +02:00 committed by Chris Packham
parent ed12fa6840
commit 5595edc370
6 changed files with 781 additions and 0 deletions

View File

@ -0,0 +1,139 @@
From cb547fed9177c2a28f376c881facfcf4b64e70a9 Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Thu, 13 Jun 2024 12:48:21 +0100
Subject: [PATCH 11/16] aarch64: Fix invalid nested subregs [PR115464]
The testcase extracts one arm_neon.h vector from a pair (one subreg)
and then reinterprets the result as an SVE vector (another subreg).
Each subreg makes sense individually, but we can't fold them together
into a single subreg: it's 32 bytes -> 16 bytes -> 16*N bytes,
but the interpretation of 32 bytes -> 16*N bytes depends on
whether N==1 or N>1.
Since the second subreg makes sense individually, simplify_subreg
should bail out rather than ICE on it. simplify_gen_subreg will
then do the same (because it already checks validate_subreg).
This leaves simplify_gen_subreg returning null, requiring the
caller to take appropriate action.
I think this is relatively likely to occur elsewhere, so the patch
adds a helper for forcing a subreg, allowing a temporary pseudo to
be created where necessary.
I'll follow up by using force_subreg in more places. This patch
is intended to be a minimal backportable fix for the PR.
gcc/
PR target/115464
* simplify-rtx.cc (simplify_context::simplify_subreg): Don't try
to fold two subregs together if their relationship isn't known
at compile time.
* explow.h (force_subreg): Declare.
* explow.cc (force_subreg): New function.
* config/aarch64/aarch64-sve-builtins-base.cc
(svset_neonq_impl::expand): Use it instead of simplify_gen_subreg.
gcc/testsuite/
PR target/115464
* gcc.target/aarch64/sve/acle/general/pr115464.c: New test.
(cherry picked from commit 0970ff46ba6330fc80e8736fc05b2eaeeae0b6a0)
---
gcc/config/aarch64/aarch64-sve-builtins-base.cc | 2 +-
gcc/explow.cc | 15 +++++++++++++++
gcc/explow.h | 2 ++
gcc/simplify-rtx.cc | 5 +++++
.../aarch64/sve/acle/general/pr115464.c | 13 +++++++++++++
5 files changed, 36 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 0d2edf3f19e..c9182594bc1 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1174,7 +1174,7 @@ public:
Advanced SIMD argument as an SVE vector. */
if (!BYTES_BIG_ENDIAN
&& is_undef (CALL_EXPR_ARG (e.call_expr, 0)))
- return simplify_gen_subreg (mode, e.args[1], GET_MODE (e.args[1]), 0);
+ return force_subreg (mode, e.args[1], GET_MODE (e.args[1]), 0);
rtx_vector_builder builder (VNx16BImode, 16, 2);
for (unsigned int i = 0; i < 16; i++)
diff --git a/gcc/explow.cc b/gcc/explow.cc
index 8e5f6b8e680..f6843398c4b 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -745,6 +745,21 @@ force_reg (machine_mode mode, rtx x)
return temp;
}
+/* Like simplify_gen_subreg, but force OP into a new register if the
+ subreg cannot be formed directly. */
+
+rtx
+force_subreg (machine_mode outermode, rtx op,
+ machine_mode innermode, poly_uint64 byte)
+{
+ rtx x = simplify_gen_subreg (outermode, op, innermode, byte);
+ if (x)
+ return x;
+
+ op = copy_to_mode_reg (innermode, op);
+ return simplify_gen_subreg (outermode, op, innermode, byte);
+}
+
/* If X is a memory ref, copy its contents to a new temp reg and return
that reg. Otherwise, return X. */
diff --git a/gcc/explow.h b/gcc/explow.h
index 16aa02cfb68..cbd1fcb7eb3 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -42,6 +42,8 @@ extern rtx copy_to_suggested_reg (rtx, rtx, machine_mode);
Args are mode (in case value is a constant) and the value. */
extern rtx force_reg (machine_mode, rtx);
+extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64);
+
/* Return given rtx, copied into a new temp reg if it was in memory. */
extern rtx force_not_mem (rtx);
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index dceaa13333c..729d408aa55 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7612,6 +7612,11 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op,
poly_uint64 innermostsize = GET_MODE_SIZE (innermostmode);
rtx newx;
+ /* Make sure that the relationship between the two subregs is
+ known at compile time. */
+ if (!ordered_p (outersize, innermostsize))
+ return NULL_RTX;
+
if (outermode == innermostmode
&& known_eq (byte, 0U)
&& known_eq (SUBREG_BYTE (op), 0))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c
new file mode 100644
index 00000000000..d728d1325ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464.c
@@ -0,0 +1,13 @@
+/* { dg-options "-O2" } */
+
+#include <arm_neon.h>
+#include <arm_sve.h>
+#include <arm_neon_sve_bridge.h>
+
+svuint16_t
+convolve4_4_x (uint16x8x2_t permute_tbl)
+{
+ return svset_neonq_u16 (svundef_u16 (), permute_tbl.val[1]);
+}
+
+/* { dg-final { scan-assembler {\tmov\tz0\.d, z1\.d\n} } } */
--
2.44.2

View File

@ -0,0 +1,114 @@
From 12d860b5b700b5218461a0b9e4a1a3ddb55eb211 Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Tue, 18 Jun 2024 12:22:30 +0100
Subject: [PATCH 12/16] aarch64: Use force_subreg in more places
This patch makes the aarch64 code use force_subreg instead of
simplify_gen_subreg in more places. The criteria were:
(1) The code is obviously specific to expand (where new pseudos
can be created).
(2) The value is obviously an rvalue rather than an lvalue.
(3) The offset wasn't a simple lowpart or highpart calculation;
a later patch will deal with those.
gcc/
* config/aarch64/aarch64-builtins.cc (aarch64_expand_fcmla_builtin):
Use force_subreg instead of simplify_gen_subreg.
* config/aarch64/aarch64-simd.md (ctz<mode>2): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc
(svget_impl::expand): Likewise.
(svget_neonq_impl::expand): Likewise.
* config/aarch64/aarch64-sve-builtins-functions.h
(multireg_permute::expand): Likewise.
(cherry picked from commit 1474a8eead4ab390e59ee014befa8c40346679f4)
---
gcc/config/aarch64/aarch64-builtins.cc | 4 ++--
gcc/config/aarch64/aarch64-simd.md | 4 ++--
gcc/config/aarch64/aarch64-sve-builtins-base.cc | 8 +++-----
gcc/config/aarch64/aarch64-sve-builtins-functions.h | 6 +++---
4 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 75d21de1401..b2e46a073a8 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2510,12 +2510,12 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
rtx temp2 = gen_reg_rtx (DImode);
temp1 = simplify_gen_subreg (d->mode, op2, quadmode,
subreg_lowpart_offset (d->mode, quadmode));
- temp1 = simplify_gen_subreg (V2DImode, temp1, d->mode, 0);
+ temp1 = force_subreg (V2DImode, temp1, d->mode, 0);
if (BYTES_BIG_ENDIAN)
emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const0_rtx));
else
emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const1_rtx));
- op2 = simplify_gen_subreg (d->mode, temp2, GET_MODE (temp2), 0);
+ op2 = force_subreg (d->mode, temp2, GET_MODE (temp2), 0);
/* And recalculate the index. */
lane -= nunits / 4;
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 33ab0741e87..5b9efe0b165 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -412,8 +412,8 @@
"TARGET_SIMD"
{
emit_insn (gen_bswap<mode>2 (operands[0], operands[1]));
- rtx op0_castsi2qi = simplify_gen_subreg(<VS:VSI2QI>mode, operands[0],
- <MODE>mode, 0);
+ rtx op0_castsi2qi = force_subreg (<VS:VSI2QI>mode, operands[0],
+ <MODE>mode, 0);
emit_insn (gen_aarch64_rbit<VS:vsi2qi> (op0_castsi2qi, op0_castsi2qi));
emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
DONE;
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index c9182594bc1..2c95da79572 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1121,9 +1121,8 @@ public:
expand (function_expander &e) const override
{
/* Fold the access into a subreg rvalue. */
- return simplify_gen_subreg (e.vector_mode (0), e.args[0],
- GET_MODE (e.args[0]),
- INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR);
+ return force_subreg (e.vector_mode (0), e.args[0], GET_MODE (e.args[0]),
+ INTVAL (e.args[1]) * BYTES_PER_SVE_VECTOR);
}
};
@@ -1157,8 +1156,7 @@ public:
e.add_fixed_operand (indices);
return e.generate_insn (icode);
}
- return simplify_gen_subreg (e.result_mode (), e.args[0],
- GET_MODE (e.args[0]), 0);
+ return force_subreg (e.result_mode (), e.args[0], GET_MODE (e.args[0]), 0);
}
};
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-functions.h b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
index 3b8e575e98e..7d06a57ff83 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-functions.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-functions.h
@@ -639,9 +639,9 @@ public:
{
machine_mode elt_mode = e.vector_mode (0);
rtx arg = e.args[0];
- e.args[0] = simplify_gen_subreg (elt_mode, arg, GET_MODE (arg), 0);
- e.args.safe_push (simplify_gen_subreg (elt_mode, arg, GET_MODE (arg),
- GET_MODE_SIZE (elt_mode)));
+ e.args[0] = force_subreg (elt_mode, arg, GET_MODE (arg), 0);
+ e.args.safe_push (force_subreg (elt_mode, arg, GET_MODE (arg),
+ GET_MODE_SIZE (elt_mode)));
}
return e.use_exact_insn (icode);
}
--
2.44.2

View File

@ -0,0 +1,167 @@
From eb49bbb886ef374eddb93e866c9c9f5f314c8014 Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Tue, 18 Jun 2024 12:22:31 +0100
Subject: [PATCH 13/16] aarch64: Add some uses of force_lowpart_subreg
This patch makes more use of force_lowpart_subreg, similarly
to the recent patch for force_subreg. The criteria were:
(1) The code is obviously specific to expand (where new pseudos
can be created).
(2) The value is obviously an rvalue rather than an lvalue.
gcc/
PR target/115464
* config/aarch64/aarch64-builtins.cc (aarch64_expand_fcmla_builtin)
(aarch64_expand_rwsr_builtin): Use force_lowpart_subreg instead of
simplify_gen_subreg and lowpart_subreg.
* config/aarch64/aarch64-sve-builtins-base.cc
(svset_neonq_impl::expand): Likewise.
* config/aarch64/aarch64-sve-builtins-sme.cc
(add_load_store_slice_operand): Likewise.
* config/aarch64/aarch64.cc (aarch64_sve_reinterpret): Likewise.
(aarch64_addti_scratch_regs, aarch64_subvti_scratch_regs): Likewise.
gcc/testsuite/
PR target/115464
* gcc.target/aarch64/sve/acle/general/pr115464_2.c: New test.
(cherry picked from commit 6bd4fbae45d11795a9a6f54b866308d4d7134def)
---
gcc/config/aarch64/aarch64-builtins.cc | 11 +++++------
gcc/config/aarch64/aarch64-sve-builtins-base.cc | 2 +-
gcc/config/aarch64/aarch64-sve-builtins-sme.cc | 2 +-
gcc/config/aarch64/aarch64.cc | 14 +++++---------
.../aarch64/sve/acle/general/pr115464_2.c | 11 +++++++++++
5 files changed, 23 insertions(+), 17 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c
diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index b2e46a073a8..264b9560709 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -2497,8 +2497,7 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
int lane = INTVAL (lane_idx);
if (lane < nunits / 4)
- op2 = simplify_gen_subreg (d->mode, op2, quadmode,
- subreg_lowpart_offset (d->mode, quadmode));
+ op2 = force_lowpart_subreg (d->mode, op2, quadmode);
else
{
/* Select the upper 64 bits, either a V2SF or V4HF, this however
@@ -2508,8 +2507,7 @@ aarch64_expand_fcmla_builtin (tree exp, rtx target, int fcode)
gen_highpart_mode generates code that isn't optimal. */
rtx temp1 = gen_reg_rtx (d->mode);
rtx temp2 = gen_reg_rtx (DImode);
- temp1 = simplify_gen_subreg (d->mode, op2, quadmode,
- subreg_lowpart_offset (d->mode, quadmode));
+ temp1 = force_lowpart_subreg (d->mode, op2, quadmode);
temp1 = force_subreg (V2DImode, temp1, d->mode, 0);
if (BYTES_BIG_ENDIAN)
emit_insn (gen_aarch64_get_lanev2di (temp2, temp1, const0_rtx));
@@ -2754,7 +2752,7 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int fcode)
case AARCH64_WSR64:
case AARCH64_WSRF64:
case AARCH64_WSR128:
- subreg = lowpart_subreg (sysreg_mode, input_val, mode);
+ subreg = force_lowpart_subreg (sysreg_mode, input_val, mode);
break;
case AARCH64_WSRF:
subreg = gen_lowpart_SUBREG (SImode, input_val);
@@ -2789,7 +2787,8 @@ aarch64_expand_rwsr_builtin (tree exp, rtx target, int fcode)
case AARCH64_RSR64:
case AARCH64_RSRF64:
case AARCH64_RSR128:
- return lowpart_subreg (TYPE_MODE (TREE_TYPE (exp)), target, sysreg_mode);
+ return force_lowpart_subreg (TYPE_MODE (TREE_TYPE (exp)),
+ target, sysreg_mode);
case AARCH64_RSRF:
subreg = gen_lowpart_SUBREG (SImode, target);
return gen_lowpart_SUBREG (SFmode, subreg);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 2c95da79572..3c970e9c5f8 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -1183,7 +1183,7 @@ public:
if (BYTES_BIG_ENDIAN)
return e.use_exact_insn (code_for_aarch64_sve_set_neonq (mode));
insn_code icode = code_for_vcond_mask (mode, mode);
- e.args[1] = lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1]));
+ e.args[1] = force_lowpart_subreg (mode, e.args[1], GET_MODE (e.args[1]));
e.add_output_operand (icode);
e.add_input_operand (icode, e.args[1]);
e.add_input_operand (icode, e.args[0]);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
index f4c91bcbb95..b66b35ae60b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
@@ -112,7 +112,7 @@ add_load_store_slice_operand (function_expander &e, insn_code icode,
rtx base = e.args[argno];
if (e.mode_suffix_id == MODE_vnum)
{
- rtx vnum = lowpart_subreg (SImode, e.args[vnum_argno], DImode);
+ rtx vnum = force_lowpart_subreg (SImode, e.args[vnum_argno], DImode);
base = simplify_gen_binary (PLUS, SImode, base, vnum);
}
e.add_input_operand (icode, base);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1beec94629d..a064aeecbc0 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3284,7 +3284,7 @@ aarch64_sve_reinterpret (machine_mode mode, rtx x)
/* can_change_mode_class must only return true if subregs and svreinterprets
have the same semantics. */
if (targetm.can_change_mode_class (GET_MODE (x), mode, FP_REGS))
- return lowpart_subreg (mode, x, GET_MODE (x));
+ return force_lowpart_subreg (mode, x, GET_MODE (x));
rtx res = gen_reg_rtx (mode);
x = force_reg (GET_MODE (x), x);
@@ -26979,9 +26979,8 @@ aarch64_addti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
rtx *high_in2)
{
*low_dest = gen_reg_rtx (DImode);
- *low_in1 = gen_lowpart (DImode, op1);
- *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
- subreg_lowpart_offset (DImode, TImode));
+ *low_in1 = force_lowpart_subreg (DImode, op1, TImode);
+ *low_in2 = force_lowpart_subreg (DImode, op2, TImode);
*high_dest = gen_reg_rtx (DImode);
*high_in1 = gen_highpart (DImode, op1);
*high_in2 = simplify_gen_subreg (DImode, op2, TImode,
@@ -27013,11 +27012,8 @@ aarch64_subvti_scratch_regs (rtx op1, rtx op2, rtx *low_dest,
rtx *high_in2)
{
*low_dest = gen_reg_rtx (DImode);
- *low_in1 = simplify_gen_subreg (DImode, op1, TImode,
- subreg_lowpart_offset (DImode, TImode));
-
- *low_in2 = simplify_gen_subreg (DImode, op2, TImode,
- subreg_lowpart_offset (DImode, TImode));
+ *low_in1 = force_lowpart_subreg (DImode, op1, TImode);
+ *low_in2 = force_lowpart_subreg (DImode, op2, TImode);
*high_dest = gen_reg_rtx (DImode);
*high_in1 = simplify_gen_subreg (DImode, op1, TImode,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c
new file mode 100644
index 00000000000..f561c34f732
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c
@@ -0,0 +1,11 @@
+/* { dg-options "-O2" } */
+
+#include <arm_neon.h>
+#include <arm_sve.h>
+#include <arm_neon_sve_bridge.h>
+
+svuint16_t
+convolve4_4_x (uint16x8x2_t permute_tbl, svuint16_t a)
+{
+ return svset_neonq_u16 (a, permute_tbl.val[1]);
+}
--
2.44.2

View File

@ -0,0 +1,121 @@
From 2dcdf9d026ed2e881b0bd8b378ed072e410490fe Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Tue, 18 Jun 2024 12:22:31 +0100
Subject: [PATCH 14/16] Add force_lowpart_subreg
optabs had a local function called lowpart_subreg_maybe_copy
that is very similar to the lowpart version of force_subreg.
This patch adds a force_lowpart_subreg wrapper around
force_subreg and uses it in optabs.cc.
The only difference between the old and new functions is that
the old one asserted success while the new one doesn't.
It's common not to assert elsewhere when taking subregs;
normally a null result is enough.
Later patches will make more use of the new function.
gcc/
* explow.h (force_lowpart_subreg): Declare.
* explow.cc (force_lowpart_subreg): New function.
* optabs.cc (lowpart_subreg_maybe_copy): Delete.
(expand_absneg_bit): Use force_lowpart_subreg instead of
lowpart_subreg_maybe_copy.
(expand_copysign_bit): Likewise.
(cherry picked from commit 5f40d1c0cc6ce91ef28d326b8707b3f05e6f239c)
---
gcc/explow.cc | 14 ++++++++++++++
gcc/explow.h | 1 +
gcc/optabs.cc | 24 ++----------------------
3 files changed, 17 insertions(+), 22 deletions(-)
diff --git a/gcc/explow.cc b/gcc/explow.cc
index f6843398c4b..5fdfa81f69b 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -760,6 +760,20 @@ force_subreg (machine_mode outermode, rtx op,
return simplify_gen_subreg (outermode, op, innermode, byte);
}
+/* Try to return an rvalue expression for the OUTERMODE lowpart of OP,
+ which has mode INNERMODE. Allow OP to be forced into a new register
+ if necessary.
+
+ Return null on failure. */
+
+rtx
+force_lowpart_subreg (machine_mode outermode, rtx op,
+ machine_mode innermode)
+{
+ auto byte = subreg_lowpart_offset (outermode, innermode);
+ return force_subreg (outermode, op, innermode, byte);
+}
+
/* If X is a memory ref, copy its contents to a new temp reg and return
that reg. Otherwise, return X. */
diff --git a/gcc/explow.h b/gcc/explow.h
index cbd1fcb7eb3..dd654649b06 100644
--- a/gcc/explow.h
+++ b/gcc/explow.h
@@ -43,6 +43,7 @@ extern rtx copy_to_suggested_reg (rtx, rtx, machine_mode);
extern rtx force_reg (machine_mode, rtx);
extern rtx force_subreg (machine_mode, rtx, machine_mode, poly_uint64);
+extern rtx force_lowpart_subreg (machine_mode, rtx, machine_mode);
/* Return given rtx, copied into a new temp reg if it was in memory. */
extern rtx force_not_mem (rtx);
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index ce91f94ed43..804c0dc73ba 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -3096,26 +3096,6 @@ expand_ffs (scalar_int_mode mode, rtx op0, rtx target)
return 0;
}
-/* Extract the OMODE lowpart from VAL, which has IMODE. Under certain
- conditions, VAL may already be a SUBREG against which we cannot generate
- a further SUBREG. In this case, we expect forcing the value into a
- register will work around the situation. */
-
-static rtx
-lowpart_subreg_maybe_copy (machine_mode omode, rtx val,
- machine_mode imode)
-{
- rtx ret;
- ret = lowpart_subreg (omode, val, imode);
- if (ret == NULL)
- {
- val = force_reg (imode, val);
- ret = lowpart_subreg (omode, val, imode);
- gcc_assert (ret != NULL);
- }
- return ret;
-}
-
/* Expand a floating point absolute value or negation operation via a
logical operation on the sign bit. */
@@ -3204,7 +3184,7 @@ expand_absneg_bit (enum rtx_code code, scalar_float_mode mode,
gen_lowpart (imode, op0),
immed_wide_int_const (mask, imode),
gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN);
- target = lowpart_subreg_maybe_copy (mode, temp, imode);
+ target = force_lowpart_subreg (mode, temp, imode);
set_dst_reg_note (get_last_insn (), REG_EQUAL,
gen_rtx_fmt_e (code, mode, copy_rtx (op0)),
@@ -4043,7 +4023,7 @@ expand_copysign_bit (scalar_float_mode mode, rtx op0, rtx op1, rtx target,
temp = expand_binop (imode, ior_optab, op0, op1,
gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN);
- target = lowpart_subreg_maybe_copy (mode, temp, imode);
+ target = force_lowpart_subreg (mode, temp, imode);
}
return target;
--
2.44.2

View File

@ -0,0 +1,194 @@
From d02fe5a6bfdfcae086e5374db3f8fd076df9b1a5 Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Tue, 18 Jun 2024 12:22:30 +0100
Subject: [PATCH 15/16] Make more use of force_subreg
This patch makes target-independent code use force_subreg instead
of simplify_gen_subreg in some places. The criteria were:
(1) The code is obviously specific to expand (where new pseudos
can be created), or at least would be invalid to call when
!can_create_pseudo_p () and temporaries are needed.
(2) The value is obviously an rvalue rather than an lvalue.
(3) The offset wasn't a simple lowpart or highpart calculation;
a later patch will deal with those.
Doing this should reduce the likelihood of bugs like PR115464
occuring in other situations.
gcc/
* expmed.cc (store_bit_field_using_insv): Use force_subreg
instead of simplify_gen_subreg.
(store_bit_field_1): Likewise.
(extract_bit_field_as_subreg): Likewise.
(extract_integral_bit_field): Likewise.
(emit_store_flag_1): Likewise.
* expr.cc (convert_move): Likewise.
(convert_modes): Likewise.
(emit_group_load_1): Likewise.
(emit_group_store): Likewise.
(expand_assignment): Likewise.
(cherry picked from commit d4047da6a070175aae7121c739d1cad6b08ff4b2)
---
gcc/expmed.cc | 22 ++++++++--------------
gcc/expr.cc | 27 ++++++++++++---------------
2 files changed, 20 insertions(+), 29 deletions(-)
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 19765311b95..bd190722de6 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -695,13 +695,7 @@ store_bit_field_using_insv (const extraction_insn *insv, rtx op0,
if we must narrow it, be sure we do it correctly. */
if (GET_MODE_SIZE (value_mode) < GET_MODE_SIZE (op_mode))
- {
- tmp = simplify_subreg (op_mode, value1, value_mode, 0);
- if (! tmp)
- tmp = simplify_gen_subreg (op_mode,
- force_reg (value_mode, value1),
- value_mode, 0);
- }
+ tmp = force_subreg (op_mode, value1, value_mode, 0);
else
{
tmp = gen_lowpart_if_possible (op_mode, value1);
@@ -800,7 +794,7 @@ store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
if (known_eq (bitnum, 0U)
&& known_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (op0))))
{
- sub = simplify_gen_subreg (GET_MODE (op0), value, fieldmode, 0);
+ sub = force_subreg (GET_MODE (op0), value, fieldmode, 0);
if (sub)
{
if (reverse)
@@ -1627,7 +1621,7 @@ extract_bit_field_as_subreg (machine_mode mode, rtx op0,
&& known_eq (bitsize, GET_MODE_BITSIZE (mode))
&& lowpart_bit_field_p (bitnum, bitsize, op0_mode)
&& TRULY_NOOP_TRUNCATION_MODES_P (mode, op0_mode))
- return simplify_gen_subreg (mode, op0, op0_mode, bytenum);
+ return force_subreg (mode, op0, op0_mode, bytenum);
return NULL_RTX;
}
@@ -1994,11 +1988,11 @@ extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
return convert_extracted_bit_field (target, mode, tmode, unsignedp);
}
/* If OP0 is a hard register, copy it to a pseudo before calling
- simplify_gen_subreg. */
+ force_subreg. */
if (REG_P (op0) && HARD_REGISTER_P (op0))
op0 = copy_to_reg (op0);
- op0 = simplify_gen_subreg (word_mode, op0, op0_mode.require (),
- bitnum / BITS_PER_WORD * UNITS_PER_WORD);
+ op0 = force_subreg (word_mode, op0, op0_mode.require (),
+ bitnum / BITS_PER_WORD * UNITS_PER_WORD);
op0_mode = word_mode;
bitnum %= BITS_PER_WORD;
}
@@ -5759,8 +5753,8 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx op0, rtx op1,
/* Do a logical OR or AND of the two words and compare the
result. */
- op00 = simplify_gen_subreg (word_mode, op0, int_mode, 0);
- op01 = simplify_gen_subreg (word_mode, op0, int_mode, UNITS_PER_WORD);
+ op00 = force_subreg (word_mode, op0, int_mode, 0);
+ op01 = force_subreg (word_mode, op0, int_mode, UNITS_PER_WORD);
tem = expand_binop (word_mode,
op1 == const0_rtx ? ior_optab : and_optab,
op00, op01, NULL_RTX, unsignedp,
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 9f66d479445..8ffa76b1bb8 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -302,7 +302,7 @@ convert_move (rtx to, rtx from, int unsignedp)
GET_MODE_BITSIZE (to_mode)));
if (VECTOR_MODE_P (to_mode))
- from = simplify_gen_subreg (to_mode, from, GET_MODE (from), 0);
+ from = force_subreg (to_mode, from, GET_MODE (from), 0);
else
to = simplify_gen_subreg (from_mode, to, GET_MODE (to), 0);
@@ -936,7 +936,7 @@ convert_modes (machine_mode mode, machine_mode oldmode, rtx x, int unsignedp)
{
gcc_assert (known_eq (GET_MODE_BITSIZE (mode),
GET_MODE_BITSIZE (oldmode)));
- return simplify_gen_subreg (mode, x, oldmode, 0);
+ return force_subreg (mode, x, oldmode, 0);
}
temp = gen_reg_rtx (mode);
@@ -3076,8 +3076,8 @@ emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
}
}
else if (CONSTANT_P (src) && GET_MODE (dst) != BLKmode
- && XVECLEN (dst, 0) > 1)
- tmps[i] = simplify_gen_subreg (mode, src, GET_MODE (dst), bytepos);
+ && XVECLEN (dst, 0) > 1)
+ tmps[i] = force_subreg (mode, src, GET_MODE (dst), bytepos);
else if (CONSTANT_P (src))
{
if (known_eq (bytelen, ssize))
@@ -3301,7 +3301,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
if (known_eq (rtx_to_poly_int64 (XEXP (XVECEXP (src, 0, start), 1)),
bytepos))
{
- temp = simplify_gen_subreg (outer, tmps[start], inner, 0);
+ temp = force_subreg (outer, tmps[start], inner, 0);
if (temp)
{
emit_move_insn (dst, temp);
@@ -3321,7 +3321,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
finish - 1), 1)),
bytepos))
{
- temp = simplify_gen_subreg (outer, tmps[finish - 1], inner, 0);
+ temp = force_subreg (outer, tmps[finish - 1], inner, 0);
if (temp)
{
emit_move_insn (dst, temp);
@@ -6195,11 +6195,9 @@ expand_assignment (tree to, tree from, bool nontemporal)
to_mode = GET_MODE_INNER (to_mode);
machine_mode from_mode = GET_MODE_INNER (GET_MODE (result));
rtx from_real
- = simplify_gen_subreg (to_mode, XEXP (result, 0),
- from_mode, 0);
+ = force_subreg (to_mode, XEXP (result, 0), from_mode, 0);
rtx from_imag
- = simplify_gen_subreg (to_mode, XEXP (result, 1),
- from_mode, 0);
+ = force_subreg (to_mode, XEXP (result, 1), from_mode, 0);
if (!from_real || !from_imag)
goto concat_store_slow;
emit_move_insn (XEXP (to_rtx, 0), from_real);
@@ -6215,8 +6213,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
if (MEM_P (result))
from_rtx = change_address (result, to_mode, NULL_RTX);
else
- from_rtx
- = simplify_gen_subreg (to_mode, result, from_mode, 0);
+ from_rtx = force_subreg (to_mode, result, from_mode, 0);
if (from_rtx)
{
emit_move_insn (XEXP (to_rtx, 0),
@@ -6228,10 +6225,10 @@ expand_assignment (tree to, tree from, bool nontemporal)
{
to_mode = GET_MODE_INNER (to_mode);
rtx from_real
- = simplify_gen_subreg (to_mode, result, from_mode, 0);
+ = force_subreg (to_mode, result, from_mode, 0);
rtx from_imag
- = simplify_gen_subreg (to_mode, result, from_mode,
- GET_MODE_SIZE (to_mode));
+ = force_subreg (to_mode, result, from_mode,
+ GET_MODE_SIZE (to_mode));
if (!from_real || !from_imag)
goto concat_store_slow;
emit_move_insn (XEXP (to_rtx, 0), from_real);
--
2.44.2

View File

@ -0,0 +1,46 @@
From 5468439a1f987b7d801c6c76d6c989e57af8916a Mon Sep 17 00:00:00 2001
From: Richard Sandiford <richard.sandiford@arm.com>
Date: Tue, 25 Jun 2024 09:41:21 +0100
Subject: [PATCH 16/16] Revert one of the force_subreg changes
One of the changes in g:d4047da6a070175aae7121c739d1cad6b08ff4b2
caused a regression in ft32-elf; see:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655418.html
for details. This change was different from the others in that the
original call was to simplify_subreg rather than simplify_lowpart_subreg.
The old code would therefore go on to do the force_reg for more cases
than the new code would.
gcc/
* expmed.cc (store_bit_field_using_insv): Revert earlier change
to use force_subreg instead of simplify_gen_subreg.
(cherry picked from commit b694bf417cdd7d0a4d78e9927bab6bc202b7df6c)
---
gcc/expmed.cc | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index bd190722de6..85ec2614a3f 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -695,7 +695,13 @@ store_bit_field_using_insv (const extraction_insn *insv, rtx op0,
if we must narrow it, be sure we do it correctly. */
if (GET_MODE_SIZE (value_mode) < GET_MODE_SIZE (op_mode))
- tmp = force_subreg (op_mode, value1, value_mode, 0);
+ {
+ tmp = simplify_subreg (op_mode, value1, value_mode, 0);
+ if (! tmp)
+ tmp = simplify_gen_subreg (op_mode,
+ force_reg (value_mode, value1),
+ value_mode, 0);
+ }
else
{
tmp = gen_lowpart_if_possible (op_mode, value1);
--
2.44.2