slinder1 (Contributor) commented Oct 22, 2025

Introduce new SPILL pseudos to allow CFI to be generated only for CSR
spills, and to make the unwind information accurate at the ISA-instruction level.

Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.

Co-authored-by: Scott Linder <scott.linder@amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com>
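To make the cost concrete, here is a hypothetical sketch (not code from this
patch) of what each *_CFI_SAVE pseudo boils down to: the spill, immediately
followed by a CFI_INSTRUCTION recording the new save rule. Every rule change
at a distinct code offset forces a DW_CFA_advance_location in the encoded
FDE, which is where the extra unwind-table size goes.

#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDwarf.h"
using namespace llvm;

// Record "Reg is saved at CFA + Offset" at exactly this point in the block.
static void describeSpill(MachineBasicBlock &MBB,
                          MachineBasicBlock::iterator MI, const DebugLoc &DL,
                          MCRegister Reg, int64_t Offset) {
  MachineFunction &MF = *MBB.getParent();
  const MCRegisterInfo &MCRI = *MF.getContext().getRegisterInfo();
  const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
  unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
      nullptr, MCRI.getDwarfRegNum(Reg, /*isEH=*/false), Offset));
  // The CFI_INSTRUCTION pins the unwind-table row change to the spill's
  // code offset; the plain (non-CFI) save pseudos omit it.
  BuildMI(MBB, MI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
      .addCFIIndex(CFIIndex);
}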

llvmbot (Member) commented Oct 22, 2025

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Scott Linder (slinder1)

Changes

Introduce new SPILL pseudos to allow CFI to be generated only for CSR
spills, and to make the unwind information accurate at the ISA-instruction level.

Other targets either generate slightly incorrect information or rely on
conventions for how spills are placed within the entry block. The
approach in this change produces larger unwind tables, with the
increased size being spent on additional DW_CFA_advance_location
instructions needed to describe the unwinding accurately.

Co-authored-by: Scott Linder <scott.linder@amd.com>
Co-authored-by: Venkata Ramanaiah Nalamothu <VenkataRamanaiah.Nalamothu@amd.com>


Patch is 4.01 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164724.diff

101 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.cpp (+93-6)
  • (modified) llvm/lib/Target/AMDGPU/SIFrameLowering.h (+22)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+92-45)
  • (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (+18-1)
  • (modified) llvm/lib/Target/AMDGPU/SIInstructions.td (+18)
  • (modified) llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp (+36-54)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+203-20)
  • (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.h (+11-4)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/assert-align.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/call-outgoing-stack-args.ll (+14-14)
  • (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/localizer.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/a-v-flat-atomicrmw.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/a-v-global-atomicrmw.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/abi-attribute-hints-undefined-behavior.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/accvgpr-spill-scc-clobber.mir (+2688)
  • (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+4-3)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn-call-whole-wave.ll (+14-14)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll (+5621-3349)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.256bit.ll (+5-5)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.320bit.ll (+26-26)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll (+1130-1130)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.576bit.ll (+114-114)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.640bit.ll (+609-33)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.704bit.ll (+649-57)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll (+736-128)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll (+878-262)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll (+1054-438)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll (+1229-613)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-cs-chain-preserve-cc.ll (+2)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-simplify-libcall-pow-codegen.ll (+154-154)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit-undefined-behavior2.ll (+19-18)
  • (modified) llvm/test/CodeGen/AMDGPU/av_spill_cross_bb_usage.mir (+12)
  • (modified) llvm/test/CodeGen/AMDGPU/bf16.ll (+214-208)
  • (modified) llvm/test/CodeGen/AMDGPU/branch-relax-spill.ll (+78-78)
  • (modified) llvm/test/CodeGen/AMDGPU/call-args-inreg.ll (+144-144)
  • (modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+68-68)
  • (modified) llvm/test/CodeGen/AMDGPU/call-graph-register-usage.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll (+60-60)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll (+96-96)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs-packed.ll (+23-23)
  • (modified) llvm/test/CodeGen/AMDGPU/callee-special-input-vgprs.ll (+23-23)
  • (modified) llvm/test/CodeGen/AMDGPU/cross-block-use-is-not-abi-copy.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/debug-frame.ll (+491-15)
  • (modified) llvm/test/CodeGen/AMDGPU/dwarf-multi-register-use-crash.ll (+49-33)
  • (modified) llvm/test/CodeGen/AMDGPU/dynamic-vgpr-reserve-stack-for-cwsr.ll (+8-10)
  • (modified) llvm/test/CodeGen/AMDGPU/eliminate-frame-index-s-mov-b32.mir (+96)
  • (modified) llvm/test/CodeGen/AMDGPU/fix-frame-reg-in-custom-csr-spills.ll (+5-5)
  • (modified) llvm/test/CodeGen/AMDGPU/frame-index.mir (+96)
  • (modified) llvm/test/CodeGen/AMDGPU/frame-setup-without-sgpr-to-vgpr-spills.ll (+7-22)
  • (modified) llvm/test/CodeGen/AMDGPU/function-args-inreg.ll (+15-14)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx-call-non-gfx-func.ll (+74-74)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll (+2269-2280)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-preserved-registers.ll (+118-121)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll (+139-83)
  • (modified) llvm/test/CodeGen/AMDGPU/global-alias.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll (+46-46)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-call.ll (+552-552)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-delay-alu-bug.ll (+3-4)
  • (modified) llvm/test/CodeGen/AMDGPU/insert-waitcnts-crash.ll (+9-8)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.readfirstlane.ll (+16-16)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.maximum.f64.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.minimum.f64.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll (+51-49)
  • (modified) llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll (+826-825)
  • (modified) llvm/test/CodeGen/AMDGPU/maximumnum.bf16.ll (+14-14)
  • (modified) llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/minimumnum.bf16.ll (+14-14)
  • (modified) llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll (+13-13)
  • (modified) llvm/test/CodeGen/AMDGPU/need-fp-from-vgpr-spills.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/nested-calls.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/no-source-locations-in-prologue.ll (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/pei-amdgpu-cs-chain-preserve.mir (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/pei-vgpr-block-spill-csr.mir (+320)
  • (modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+9-24)
  • (modified) llvm/test/CodeGen/AMDGPU/s-getpc-b64-remat.ll (+6-7)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spill-overlap-wwm-reserve.mir (+162-95)
  • (modified) llvm/test/CodeGen/AMDGPU/sgpr-spills-split-regalloc.ll (+6-21)
  • (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i64.v8i64.ll (+237-205)
  • (modified) llvm/test/CodeGen/AMDGPU/si-lower-sgpr-spills-vgpr-lanes-usage.mir (+12-9)
  • (modified) llvm/test/CodeGen/AMDGPU/si-lower-sgpr-spills.mir (+5)
  • (modified) llvm/test/CodeGen/AMDGPU/sibling-call.ll (+123-123)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-partial-csr-sgpr-live-ins.mir (+5)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-sgpr-csr-live-ins.mir (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-sgpr-to-virtual-vgpr.mir (+16)
  • (modified) llvm/test/CodeGen/AMDGPU/spill-vgpr-block.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/spill_more_than_wavesize_csr_sgprs.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/stack-realign.ll (+9-9)
  • (modified) llvm/test/CodeGen/AMDGPU/stacksave_stackrestore.ll (+11-11)
  • (modified) llvm/test/CodeGen/AMDGPU/strictfp_f16_abi_promote.ll (+27-27)
  • (modified) llvm/test/CodeGen/AMDGPU/swdev504645-global-fold.ll (+4-3)
  • (modified) llvm/test/CodeGen/AMDGPU/tail-call-inreg-arguments.error.ll (+5-5)
  • (modified) llvm/test/CodeGen/AMDGPU/tuple-allocation-failure.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/unfold-masked-merge-scalar-variablemask.ll (+18-20)
  • (modified) llvm/test/CodeGen/AMDGPU/unspill-vgpr-after-rewrite-vgpr-mfma.ll (+10-8)
  • (modified) llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll (+84-84)
  • (modified) llvm/test/CodeGen/AMDGPU/vgpr-tuple-allocation.ll (+44-37)
  • (modified) llvm/test/CodeGen/AMDGPU/wave32.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/whole-wave-functions.ll (+50-54)
  • (modified) llvm/test/CodeGen/AMDGPU/whole-wave-register-copy.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/whole-wave-register-spill.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/wwm-reserved-spill.ll (+4-4)
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index 5a0b1afbdfdff..bbde3c49f64c6 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -2244,17 +2244,49 @@ bool SIFrameLowering::allocateScavengingFrameIndexesNearIncomingSP(
   return true;
 }
 
+static bool isLiveIntoMBB(MCRegister Reg, MachineBasicBlock &MBB,
+                          const TargetRegisterInfo *TRI) {
+  for (MCRegAliasIterator R(Reg, TRI, true); R.isValid(); ++R) {
+    if (MBB.isLiveIn(*R)) {
+      return true;
+    }
+  }
+  return false;
+}
+
 bool SIFrameLowering::spillCalleeSavedRegisters(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
     ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
   MachineFunction *MF = MBB.getParent();
   const GCNSubtarget &ST = MF->getSubtarget<GCNSubtarget>();
-  if (!ST.useVGPRBlockOpsForCSR())
-    return false;
+  const SIInstrInfo *TII = ST.getInstrInfo();
+  const SIRegisterInfo *SITRI = static_cast<const SIRegisterInfo *>(TRI);
+
+  if (!ST.useVGPRBlockOpsForCSR()) {
+    for (const CalleeSavedInfo &CS : CSI) {
+      // Insert the spill to the stack frame.
+      unsigned Reg = CS.getReg();
+
+      if (CS.isSpilledToReg()) {
+        BuildMI(MBB, MI, DebugLoc(), TII->get(TargetOpcode::COPY),
+                CS.getDstReg())
+            .addReg(Reg, getKillRegState(true));
+      } else {
+        const TargetRegisterClass *RC = TRI->getMinimalPhysRegClass(
+            Reg, Reg == SITRI->getReturnAddressReg(*MF) ? MVT::i64 : MVT::i32);
+        // If this value was already livein, we probably have a direct use of
+        // the incoming register value, so don't kill at the spill point. This
+        // happens since we pass some special inputs (workgroup IDs) in the
+        // callee saved range.
+        const bool IsLiveIn = isLiveIntoMBB(Reg, MBB, TRI);
+        TII->storeRegToStackSlotCFI(MBB, MI, Reg, !IsLiveIn, CS.getFrameIdx(),
+                                    RC, TRI);
+      }
+    }
+    return true;
+  }
 
   MachineFrameInfo &FrameInfo = MF->getFrameInfo();
-  SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
-  const SIInstrInfo *TII = ST.getInstrInfo();
   SIMachineFunctionInfo *FuncInfo = MF->getInfo<SIMachineFunctionInfo>();
 
   const TargetRegisterClass *BlockRegClass =
@@ -2278,10 +2310,10 @@ bool SIFrameLowering::spillCalleeSavedRegisters(
                                  FrameInfo.getObjectAlign(FrameIndex));
 
     BuildMI(MBB, MI, MI->getDebugLoc(),
-            TII->get(AMDGPU::SI_BLOCK_SPILL_V1024_SAVE))
+            TII->get(AMDGPU::SI_BLOCK_SPILL_V1024_CFI_SAVE))
         .addReg(Reg, getKillRegState(false))
         .addFrameIndex(FrameIndex)
-        .addReg(MFI->getStackPtrOffsetReg())
+        .addReg(FuncInfo->getStackPtrOffsetReg())
         .addImm(0)
         .addImm(Mask)
         .addMemOperand(MMO);
@@ -2467,6 +2499,22 @@ MachineInstr *SIFrameLowering::buildCFI(MachineBasicBlock &MBB,
       .setMIFlag(flag);
 }
 
+MachineInstr *SIFrameLowering::buildCFIForVRegToVRegSpill(
+    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
+    const DebugLoc &DL, const Register Reg, const Register RegCopy) const {
+  MachineFunction &MF = *MBB.getParent();
+  const MCRegisterInfo &MCRI = *MF.getContext().getRegisterInfo();
+  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
+
+  unsigned MaskReg = MCRI.getDwarfRegNum(
+      ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC, false);
+  auto CFIInst = MCCFIInstruction::createLLVMVectorRegisterMask(
+      nullptr, MCRI.getDwarfRegNum(Reg, false),
+      MCRI.getDwarfRegNum(RegCopy, false), VGPRLaneBitSize, MaskReg,
+      ST.getWavefrontSize());
+  return buildCFI(MBB, MBBI, DL, std::move(CFIInst));
+}
+
 MachineInstr *SIFrameLowering::buildCFIForSGPRToVGPRSpill(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
     const DebugLoc &DL, const Register SGPR, const Register VGPR,
@@ -2515,6 +2563,34 @@ MachineInstr *SIFrameLowering::buildCFIForSGPRToVGPRSpill(
   return buildCFI(MBB, MBBI, DL, std::move(CFIInst));
 }
 
+MachineInstr *SIFrameLowering::buildCFIForSGPRToVMEMSpill(
+    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
+    const DebugLoc &DL, unsigned SGPR, int64_t Offset) const {
+  MachineFunction &MF = *MBB.getParent();
+  const MCRegisterInfo &MCRI = *MF.getContext().getRegisterInfo();
+  return buildCFI(MBB, MBBI, DL,
+                  llvm::MCCFIInstruction::createOffset(
+                      nullptr, MCRI.getDwarfRegNum(SGPR, false), Offset));
+}
+
+MachineInstr *SIFrameLowering::buildCFIForVGPRToVMEMSpill(
+    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
+    const DebugLoc &DL, unsigned VGPR, int64_t Offset) const {
+  const MachineFunction &MF = *MBB.getParent();
+  const MCRegisterInfo &MCRI = *MF.getContext().getRegisterInfo();
+  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
+
+  int DwarfVGPR = MCRI.getDwarfRegNum(VGPR, false);
+  assert(DwarfVGPR != -1);
+
+  unsigned MaskReg = MCRI.getDwarfRegNum(
+      ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC, false);
+  auto CFIInst = MCCFIInstruction::createLLVMVectorOffset(
+      nullptr, DwarfVGPR, VGPRLaneBitSize, MaskReg, ST.getWavefrontSize(),
+      Offset);
+  return buildCFI(MBB, MBBI, DL, std::move(CFIInst));
+}
+
 MachineInstr *SIFrameLowering::buildCFIForRegToSGPRPairSpill(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
     const DebugLoc &DL, const Register Reg, const Register SGPRPair) const {
@@ -2535,3 +2611,14 @@ MachineInstr *SIFrameLowering::buildCFIForRegToSGPRPairSpill(
       nullptr, DwarfReg, DwarfSGPR0, SGPRBitSize, DwarfSGPR1, SGPRBitSize);
   return buildCFI(MBB, MBBI, DL, std::move(CFIInst));
 }
+
+MachineInstr *
+SIFrameLowering::buildCFIForSameValue(MachineBasicBlock &MBB,
+                                      MachineBasicBlock::iterator MBBI,
+                                      const DebugLoc &DL, Register Reg) const {
+  const MachineFunction &MF = *MBB.getParent();
+  const MCRegisterInfo &MCRI = *MF.getContext().getRegisterInfo();
+  int DwarfReg = MCRI.getDwarfRegNum(Reg, /*isEH=*/false);
+  auto CFIInst = MCCFIInstruction::createSameValue(nullptr, DwarfReg);
+  return buildCFI(MBB, MBBI, DL, std::move(CFIInst));
+}
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.h b/llvm/lib/Target/AMDGPU/SIFrameLowering.h
index 20f608f2dfc24..2b716db0b7a22 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.h
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.h
@@ -120,6 +120,13 @@ class SIFrameLowering final : public AMDGPUFrameLowering {
            const DebugLoc &DL, const MCCFIInstruction &CFIInst,
            MachineInstr::MIFlag flag = MachineInstr::FrameSetup) const;
 
+  /// Create a CFI index describing a spill of the VGPR/AGPR \p Reg to another
+  /// VGPR/AGPR \p RegCopy and build a MachineInstr around it.
+  MachineInstr *buildCFIForVRegToVRegSpill(MachineBasicBlock &MBB,
+                                           MachineBasicBlock::iterator MBBI,
+                                           const DebugLoc &DL,
+                                           const Register Reg,
+                                           const Register RegCopy) const;
   /// Create a CFI index describing a spill of an SGPR to a single lane of
   /// a VGPR and build a MachineInstr around it.
   MachineInstr *buildCFIForSGPRToVGPRSpill(MachineBasicBlock &MBB,
@@ -134,10 +141,25 @@ class SIFrameLowering final : public AMDGPUFrameLowering {
       MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
       const DebugLoc &DL, Register SGPR,
       ArrayRef<SIRegisterInfo::SpilledReg> VGPRSpills) const;
+  /// Create a CFI index describing a spill of a SGPR to VMEM and
+  /// build a MachineInstr around it.
+  MachineInstr *buildCFIForSGPRToVMEMSpill(MachineBasicBlock &MBB,
+                                           MachineBasicBlock::iterator MBBI,
+                                           const DebugLoc &DL, unsigned SGPR,
+                                           int64_t Offset) const;
+  /// Create a CFI index describing a spill of a VGPR to VMEM and
+  /// build a MachineInstr around it.
+  MachineInstr *buildCFIForVGPRToVMEMSpill(MachineBasicBlock &MBB,
+                                           MachineBasicBlock::iterator MBBI,
+                                           const DebugLoc &DL, unsigned VGPR,
+                                           int64_t Offset) const;
   MachineInstr *buildCFIForRegToSGPRPairSpill(MachineBasicBlock &MBB,
                                               MachineBasicBlock::iterator MBBI,
                                               const DebugLoc &DL, Register Reg,
                                               Register SGPRPair) const;
+  MachineInstr *buildCFIForSameValue(MachineBasicBlock &MBB,
+                                     MachineBasicBlock::iterator MBBI,
+                                     const DebugLoc &DL, Register Reg) const;
   // Returns true if the function may need to reserve space on the stack for the
   // CWSR trap handler.
   bool mayReserveScratchForCWSR(const MachineFunction &MF) const;
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index d930a21c2d7f5..a097e721d142f 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -1530,22 +1530,26 @@ SIInstrInfo::getIndirectRegWriteMovRelPseudo(unsigned VecSize, unsigned EltSize,
   return get(getIndirectVGPRWriteMovRelPseudoOpc(VecSize));
 }
 
-static unsigned getSGPRSpillSaveOpcode(unsigned Size) {
+static unsigned getSGPRSpillSaveOpcode(unsigned Size, bool NeedsCFI) {
   switch (Size) {
   case 4:
-    return AMDGPU::SI_SPILL_S32_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_S32_CFI_SAVE : AMDGPU::SI_SPILL_S32_SAVE;
   case 8:
-    return AMDGPU::SI_SPILL_S64_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_S64_CFI_SAVE : AMDGPU::SI_SPILL_S64_SAVE;
   case 12:
-    return AMDGPU::SI_SPILL_S96_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_S96_CFI_SAVE : AMDGPU::SI_SPILL_S96_SAVE;
   case 16:
-    return AMDGPU::SI_SPILL_S128_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_S128_CFI_SAVE
+                    : AMDGPU::SI_SPILL_S128_SAVE;
   case 20:
-    return AMDGPU::SI_SPILL_S160_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_S160_CFI_SAVE
+                    : AMDGPU::SI_SPILL_S160_SAVE;
   case 24:
-    return AMDGPU::SI_SPILL_S192_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_S192_CFI_SAVE
+                    : AMDGPU::SI_SPILL_S192_SAVE;
   case 28:
-    return AMDGPU::SI_SPILL_S224_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_S224_CFI_SAVE
+                    : AMDGPU::SI_SPILL_S224_SAVE;
   case 32:
     return AMDGPU::SI_SPILL_S256_SAVE;
   case 36:
@@ -1557,69 +1561,90 @@ static unsigned getSGPRSpillSaveOpcode(unsigned Size) {
   case 48:
     return AMDGPU::SI_SPILL_S384_SAVE;
   case 64:
-    return AMDGPU::SI_SPILL_S512_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_S512_CFI_SAVE
+                    : AMDGPU::SI_SPILL_S512_SAVE;
   case 128:
-    return AMDGPU::SI_SPILL_S1024_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_S1024_CFI_SAVE
+                    : AMDGPU::SI_SPILL_S1024_SAVE;
   default:
     llvm_unreachable("unknown register size");
   }
 }
 
-static unsigned getVGPRSpillSaveOpcode(unsigned Size) {
+static unsigned getVGPRSpillSaveOpcode(unsigned Size, bool NeedsCFI) {
   switch (Size) {
   case 2:
     return AMDGPU::SI_SPILL_V16_SAVE;
   case 4:
-    return AMDGPU::SI_SPILL_V32_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V32_CFI_SAVE : AMDGPU::SI_SPILL_V32_SAVE;
   case 8:
-    return AMDGPU::SI_SPILL_V64_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V64_CFI_SAVE : AMDGPU::SI_SPILL_V64_SAVE;
   case 12:
-    return AMDGPU::SI_SPILL_V96_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V96_CFI_SAVE : AMDGPU::SI_SPILL_V96_SAVE;
   case 16:
-    return AMDGPU::SI_SPILL_V128_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V128_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V128_SAVE;
   case 20:
-    return AMDGPU::SI_SPILL_V160_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V160_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V160_SAVE;
   case 24:
-    return AMDGPU::SI_SPILL_V192_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V192_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V192_SAVE;
   case 28:
-    return AMDGPU::SI_SPILL_V224_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V224_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V224_SAVE;
   case 32:
-    return AMDGPU::SI_SPILL_V256_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V256_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V256_SAVE;
   case 36:
-    return AMDGPU::SI_SPILL_V288_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V288_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V288_SAVE;
   case 40:
-    return AMDGPU::SI_SPILL_V320_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V320_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V320_SAVE;
   case 44:
-    return AMDGPU::SI_SPILL_V352_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V352_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V352_SAVE;
   case 48:
-    return AMDGPU::SI_SPILL_V384_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V384_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V384_SAVE;
   case 64:
-    return AMDGPU::SI_SPILL_V512_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V512_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V512_SAVE;
   case 128:
-    return AMDGPU::SI_SPILL_V1024_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_V1024_CFI_SAVE
+                    : AMDGPU::SI_SPILL_V1024_SAVE;
   default:
     llvm_unreachable("unknown register size");
   }
 }
 
-static unsigned getAVSpillSaveOpcode(unsigned Size) {
+static unsigned getAVSpillSaveOpcode(unsigned Size, bool NeedsCFI) {
   switch (Size) {
   case 4:
-    return AMDGPU::SI_SPILL_AV32_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_AV32_CFI_SAVE
+                    : AMDGPU::SI_SPILL_AV32_SAVE;
   case 8:
-    return AMDGPU::SI_SPILL_AV64_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_AV64_CFI_SAVE
+                    : AMDGPU::SI_SPILL_AV64_SAVE;
   case 12:
-    return AMDGPU::SI_SPILL_AV96_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_AV96_CFI_SAVE
+                    : AMDGPU::SI_SPILL_AV96_SAVE;
   case 16:
-    return AMDGPU::SI_SPILL_AV128_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_AV128_CFI_SAVE
+                    : AMDGPU::SI_SPILL_AV128_SAVE;
   case 20:
-    return AMDGPU::SI_SPILL_AV160_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_AV160_CFI_SAVE
+                    : AMDGPU::SI_SPILL_AV160_SAVE;
   case 24:
-    return AMDGPU::SI_SPILL_AV192_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_AV192_CFI_SAVE
+                    : AMDGPU::SI_SPILL_AV192_SAVE;
   case 28:
-    return AMDGPU::SI_SPILL_AV224_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_AV224_CFI_SAVE
+                    : AMDGPU::SI_SPILL_AV224_SAVE;
   case 32:
-    return AMDGPU::SI_SPILL_AV256_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_AV256_CFI_SAVE
+                    : AMDGPU::SI_SPILL_AV256_SAVE;
   case 36:
     return AMDGPU::SI_SPILL_AV288_SAVE;
   case 40:
@@ -1629,9 +1654,11 @@ static unsigned getAVSpillSaveOpcode(unsigned Size) {
   case 48:
     return AMDGPU::SI_SPILL_AV384_SAVE;
   case 64:
-    return AMDGPU::SI_SPILL_AV512_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_AV512_CFI_SAVE
+                    : AMDGPU::SI_SPILL_AV512_SAVE;
   case 128:
-    return AMDGPU::SI_SPILL_AV1024_SAVE;
+    return NeedsCFI ? AMDGPU::SI_SPILL_AV1024_CFI_SAVE
+                    : AMDGPU::SI_SPILL_AV1024_SAVE;
   default:
     llvm_unreachable("unknown register size");
   }
@@ -1651,7 +1678,7 @@ static unsigned getWWMRegSpillSaveOpcode(unsigned Size,
 
 unsigned SIInstrInfo::getVectorRegSpillSaveOpcode(
     Register Reg, const TargetRegisterClass *RC, unsigned Size,
-    const SIMachineFunctionInfo &MFI) const {
+    const SIMachineFunctionInfo &MFI, bool NeedsCFI) const {
   bool IsVectorSuperClass = RI.isVectorSuperClass(RC);
 
   // Choose the right opcode if spilling a WWM register.
@@ -1660,16 +1687,16 @@ unsigned SIInstrInfo::getVectorRegSpillSaveOpcode(
 
   // TODO: Check if AGPRs are available
   if (ST.hasMAIInsts())
-    return getAVSpillSaveOpcode(Size);
+    return getAVSpillSaveOpcode(Size, NeedsCFI);
 
-  return getVGPRSpillSaveOpcode(Size);
+  return getVGPRSpillSaveOpcode(Size, NeedsCFI);
 }
 
-void SIInstrInfo::storeRegToStackSlot(
+void SIInstrInfo::storeRegToStackSlotImpl(
     MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, Register SrcReg,
     bool isKill, int FrameIndex, const TargetRegisterClass *RC,
-    const TargetRegisterInfo *TRI, Register VReg,
-    MachineInstr::MIFlag Flags) const {
+    const TargetRegisterInfo *TRI, Register VReg, MachineInstr::MIFlag Flags,
+    bool NeedsCFI) const {
   MachineFunction *MF = MBB.getParent();
   SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
   MachineFrameInfo &FrameInfo = MF->getFrameInfo();
@@ -1691,7 +1718,8 @@ void SIInstrInfo::storeRegToStackSlot(
 
     // We are only allowed to create one new instruction when spilling
     // registers, so we need to use pseudo instruction for spilling SGPRs.
-    const MCInstrDesc &OpDesc = get(getSGPRSpillSaveOpcode(SpillSize));
+    const MCInstrDesc &OpDesc =
+        get(getSGPRSpillSaveOpcode(SpillSize, NeedsCFI));
 
     // The SGPR spill/restore instructions only work on number sgprs, so we need
     // to make sure we are using the correct register class.
@@ -1710,8 +1738,8 @@ void SIInstrInfo::storeRegToStackSlot(
     return;
   }
 
-  unsigned Opcode =
-      getVectorRegSpillSaveOpcode(VReg ? VReg : SrcReg, RC, SpillSize, *MFI);
+  unsigned Opcode = getVectorRegSpillSaveOpcode(VReg ? VReg : SrcReg, RC,
+                                                SpillSize, *MFI, NeedsCFI);
   MFI->setHasSpilledVGPRs();
 
   BuildMI(MBB, MI, DL, get(Opcode))
@@ -1722,6 +1750,25 @@ void SIInstrInfo::storeRegToStackSlot(
     .addMemOperand(MMO);
 }
 
+void SIInstrInfo::storeRegToStackSlot(
+    MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, Register SrcReg,
+    bool isKill, int FrameIndex, const TargetRegisterClass *RC,
+    const TargetRegisterInfo *TRI, Register VReg,
+    MachineInstr::MIFlag Flags) const {
+  storeRegToStackSlotImpl(MBB, MI, SrcReg, isKill, FrameIndex, RC, TRI, VReg,
+                          Flags, false);
+}
+
+void SIInstrInfo::storeRegToStackSlotCFI(MachineBasicBlock &MBB,
+                                         MachineBasicBlock::iterator MI,
+                                         Register SrcReg, bool isKill,
+                                         int FrameIndex,
+                                         const TargetRegisterClass *RC,
+                                         const TargetRegisterInfo *TRI) const {
+  storeRegToStackSlotImpl(MBB, MI, SrcReg, isKill, FrameIndex, RC, TRI,
+                          Register(), MachineInstr::NoFlags, true);
+}
+
 static unsigned getSGPRSpillRestoreOpcode(unsigned Size) {
   switch (Size) {
   case 4:
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index 5fdeddaf3f736..9c0a80bbcecda 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -293,13 +293,29 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
                     MachineBasicBlock::iterator I, const DebugLoc &DL,
                     Register SrcReg, int Value)  const;
 
+private:
+  void storeRegToStackSlotImpl(MachineBasicBlock &MBB,
+                               MachineBasicBlock::iterator MI, Register SrcReg,
+                               bool isKill, int FrameIndex,
+                               const TargetRegisterClass *RC,
+                               const TargetRegisterInfo *TRI, Register VReg,
+                               MachineInstr::MIFlag Flags, bool NeedsCFI) const;
+
+public:
+  void storeRegToStackSlotCFI(MachineBasicBlock &MBB,
+                              MachineBasicBlock::iterator MI, Register SrcReg,
+                              bool isKill, int FrameIndex,
+                              const TargetRegisterClass *RC,
+                              const TargetRegisterInfo *TRI) const;
+
   bool getConstValDefinedInReg(const MachineInstr &MI, cons...
[truncated]

static bool isLiveIntoMBB(MCRegister Reg, MachineBasicBlock &MBB,
                          const TargetRegisterInfo *TRI) {
Contributor:

This shouldn't be here. I'm still unclear on what the rules are for when register aliases will appear in the live-in list. This also should account for the LaneMask.
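One way to account for the LaneMask (a sketch, not part of this patch, under
the assumption that any live lane of an aliasing live-in should count; a
complete fix would also map the mask onto Reg's sub-registers):

#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"

static bool isLiveIntoMBB(MCRegister Reg, const llvm::MachineBasicBlock &MBB,
                          const llvm::TargetRegisterInfo *TRI) {
  // MachineBasicBlock::liveins() exposes a LaneBitmask per live-in, so the
  // query need not walk every register alias explicitly.
  for (const llvm::MachineBasicBlock::RegisterMaskPair &LI : MBB.liveins()) {
    if (TRI->regsOverlap(LI.PhysReg, Reg) && LI.LaneMask.any())
      return true;
  }
  return false;
}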

; GCN-NEXT: buffer_store_dword v40, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, s[6:7]
; GCN-NEXT: v_writelane_b32 v40, s4, 2
; GCN-NEXT: v_writelane_b32 v40, s30, 0
Contributor:

I'm not sure why this results in all of these scheduling changes

slinder1 (Contributor, Author):

There are a few things leading to the scheduling changes, and I will split this change up into multiple patches to tease them apart so we can consider them independently.

The biggest issue is that postmisched, which runs after prologepilog, (conservatively) treats CFI_INSTRUCTIONs as scheduling boundaries, so e.g. sinking v_readlane_b32 s30, v40, 0 from its original spot in the prologue past the CFI_INSTRUCTION that describes it is not valid.
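For reference, the boundary behavior comes from the generic scheduler hook;
roughly (paraphrasing TargetInstrInfo::isSchedulingBoundary, not the exact
upstream code):

bool TargetInstrInfo::isSchedulingBoundary(const MachineInstr &MI,
                                           const MachineBasicBlock *MBB,
                                           const MachineFunction &MF) const {
  // Terminators and "positions" can't be scheduled around, and
  // MachineInstr::isPosition() is isLabel() || isCFIInstruction(), so every
  // CFI_INSTRUCTION fences the post-RA scheduler.
  if (MI.isTerminator() || MI.isPosition())
    return true;
  // ... further target-independent checks elided ...
  return false;
}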

This is why we force CFI emission in all cases for AMDGPU: we really want to avoid spurious scheduling changes that only occur when debug info is enabled. If enabling debug info can actually change program behavior, it isn't terribly useful. I do think we should move to making CFI-enabled the default, while still respecting nounwind for e.g. test cases.
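A sketch of that gating (hypothetical, not part of this patch): emit spill
CFI unconditionally unless the function can never need unwinding.

static bool shouldEmitSpillCFI(const llvm::MachineFunction &MF) {
  // Function::needsUnwindTableEntry() is hasUWTable() ||
  // !hasFnAttribute(Attribute::NoUnwind), so nounwind test cases stay
  // clean while everything else gets accurate CFI by default.
  return MF.getFunction().needsUnwindTableEntry();
}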

I dug up https://bugs.llvm.org/show_bug.cgi?id=37240 and https://reviews.llvm.org/D71350 concerning the larger problem, and AFAIK there isn't a great general solution yet. X86 seems to dodge the issue by not enabling a post-RA scheduler by default. ARM/AArch64 just seem to have the bug currently.

I spent a couple of days trying to sketch out what a better fix might look like, but it will be a large effort, and I would rather get the changes we have already been relying on for unwind info upstream first, if that is acceptable.
