rp2/machine_spi: Detect RX overrun when using DMA transfers. #18578

sfe-SparkFro · 2025-12-16T15:48:38Z

Summary

Testing

Before this change, the reproduction code in #18471 freezes within ~100 loop iterations. After this change, ~~the reproduction code does not freeze (at least, not within 10k loop iterations)~~ an OSError is raised instead of freezing.

Trade-offs and Alternatives

I don't know the original motivation for using DMA channels for large SPI transfers, but one downside of this change is that long interrupts could hold up SPI transfers. I don't know how problematic that really is, but users should be writing their interrupts to be efficient anyways.

~~If a user really want to use DMA channels for large SPI transfers, they can easily do this with the existing MicroPython API. Something like this should work (I've not tested it myself):~~

src_data = bytearray(1024)
DATA_REQUEST_INDEX = 24

spi = machine.SPI(0)

dma = rp2.DMA()
ctrl = dma.pack_ctrl(size=0, inc_write=False, treq_sel=DATA_REQUEST_INDEX)

dma.configure(
    read=src_data,
    write=spi,
    count=len(src_data),
    ctrl=ctrl,
    trigger=True
)

An alternative solution could be to enable to the user to select whether they want to use the DMA for large transfers, either when the SPI constructor is called, or when the read/write functions are called.

github-actions · 2025-12-16T15:57:03Z

Code size report:

Reference:  mpy-cross/mpconfigport: Explicitly disable native code loading. [8fb848f]
Comparison: rp2/machine_spi: Improve RX overrun handling. [merge of 95a4749]
  mpy-cross:    +0 +0.000% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:   +80 +0.009% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

ports/rp2/machine_spi.c

peterhinch · 2025-12-19T17:32:44Z

Is this a bug or a known limitation? If the bus arbitration for PSRAM can fail irretrievably in the face of concurrent access by two DMA channels, the fault will eventually occur in user code. In this case disabling DMA in the SPI driver is a workround rather than a fix. If, rather than a bug, there is an inherent limit on DMA data rates it should be documented.

sfe-SparkFro · 2025-12-21T00:17:56Z

Force pushed 403b109, which only disables DMA transfers if PSRAM is enabled. @dpgeorge Please have a look. Haven't tested yet, don't have hardware with me right now.

Is this a bug or a known limitation?

The fact that DMA transfers to/from PSRAM can stall other DMA channels is a known limitation. Section 4.4.3 of the RP2350 datasheet states this pretty clearly in the second paragraph:

...QMI serial transfers force lengthy bus stalls on the DMA... stalling the DMA prevents any other active DMA channels from making progress during this time, slowing overall DMA throughput.

However, the current MicroPython SPI implementation results in a bug where the processor can get stuck in an infinite loop, because it does nothing to consider the above limitation. IMO, it should at least do something to guarantee that it can't get stuck in an infinite loop.

In this case disabling DMA in the SPI driver is a workround rather than a fix.

Yes, outright disabling DMA transfers is certainly a workaround. Perhaps an alternative fix could be something like the following?

         dma_start_channel_mask((1u << chan_rx) | (1u << chan_tx));
-        dma_channel_wait_for_finish_blocking(chan_rx);
         dma_channel_wait_for_finish_blocking(chan_tx);

+        // Wait for SPI to finish, and all data to be received by RX DMA
+        while (spi_is_busy(self->spi_inst) || spi_is_readable(self->spi_inst)) {
+        }
+
+        // If RX DMA still has transfers pending, raise an error
+        if (dma_channel_hw_addr(chan_rx)->transfer_count) {
+            mp_raise_OSError(MP_EIO);
+        }
    }

nickovs · 2025-12-30T14:53:48Z

Since the cause is a limitation of the hardware, and the bug in the code is the result of the code not properly handling the implications of that hardware limitation, I don't think that killing all DMA support for SPI as a workaround is appropriate.

I would favour your later proposal of having the code raise an error if the RX DMA doesn't complete. That way as long as the user doesn't set up their code in a way that overwhelms the SPRAM bus they can still have the benefits of using DMA for their SPI transfers.

nickovs · 2025-12-30T16:05:28Z

@sfe-SparkFro On a related note, I just submitted PR #18622 to add support for the DMA controller's pacing timers. At the moment if you uses the rp2.DMA() class for a memory-to-memory move it will always run at full speed but with this new code you could update your failing example in #18471 to something like this:

...
dma_psram = rp2.DMA()
dma_psram_timer = rp2.DMATimer()
# Limit the transfer rate to 20 million transfers per second
dma_psram_timer.freq = 20_000_000
dma_psram_ctrl = dma_psram.pack_ctrl(size = 0, treq_sel=dma_psram_timer.treq_sel)
...

This way the SRAM to PSRAM transfer will be rate limited and the user can arrange not to swamp the bus.

It would be instructive if you could test this on a board with PSRAM to confirm that it can be used to fix the problem (but I still think the SPI code should be updated to raise an error rather than stall if the RX DMA doesn't complete).

Fixes bug where SPI can freeze if another DMA channel is tranferring to/from PSRAM. Signed-off-by: Dryw Wade <[email protected]>

sfe-SparkFro · 2025-12-30T18:15:21Z

See 30fb190. Now this PR continues to use DMA transfers when possible, but can gracefully detect the SPI RX FIFO overrun condition and raise an OSError if that occurs.

I would like to see a solution that allows the user to force software transfers to guarantee this OSError never gets raised. Perhaps it can be added as an argument in the constructor? Eg. machine.SPI(dma=false). Can save for a different PR, let's just get this fix through 🙂

Need to clear the overrun flag. RX overruns are not a problem if only writing. Signed-off-by: Dryw Wade <[email protected]>

sfe-SparkFro · 2025-12-30T18:31:32Z

Couple improvements in 95a4749. Forgot to clear the overrun flag, and realized that an OSError doesn't need to be raised if write_only == true since RX data is being ignored anyways. This PR passes all my tests now!

Also, with my test code, I realized RX overruns occur much more frequently when transferring from PSRAM to SRAM, almost every loop iteration:

dma_psram.config(
    read = psram_buffer,
    write = sram_buffer,
    ...
)

Can also easily test SPI read versus write by changing spi.write(sram_buffer) to spi.readinto(sram_buffer), which highlights that the OSError does not get raised with writes (but I confirmed overruns were still happening by adding a mp_printf()).

sfe-SparkFro · 2025-12-30T18:55:40Z

It would be instructive if you could test this on a board with PSRAM to confirm that it can be used to fix the problem

Thanks! Just tested it out. Pacing the PSRAM DMA at 20 million transfers per second does not help (my test code typically freezes on the first loop iteration now that I'm transferring from PSRAM to SRAM). If I drop it to 2 million, then it does help, but does not fix the problem (first run froze after ~600 loop iterations).

Maybe it can be fixed by further reducing the transfer rate, but that would really start throttling the DMA, and I'm not even sure that pacing the PSRAM DMA transfer can be guaranteed to fix this. When the DMA encounters a bus stall during a PSRAM transfer, the amount of time its stalled is variable. I'm guessing it's possible for a single very long bus stall to cause an RX FIFO overrun, in which case, no amount of DMA pacing fixes the problem. Makes it a lot less likely, but not guaranteed. Or at least, I'm guessing that's the case, but even if not, you're sacrificing DMA throughput. Which is why I'd like an option to force software SPI transfers, because it's guaranteed to not have overruns while allowing the PSRAM DMA channel to transfer as fast as the XIP subsystem allows.

sfe-SparkFro mentioned this pull request Dec 16, 2025

RP2350 SPI ST7789 driver sometimes freezes sparkfun/red_vision#7

Closed

dpgeorge added the port-rp2 label Dec 17, 2025

dpgeorge reviewed Dec 17, 2025

View reviewed changes

ports/rp2/machine_spi.c Show resolved Hide resolved

sfe-SparkFro force-pushed the rp2_spi_no_dma branch from 54171d2 to 403b109 Compare December 20, 2025 23:40

rp2/machine_spi: Detect RX overrun when using DMA transfers.

30fb190

Fixes bug where SPI can freeze if another DMA channel is tranferring to/from PSRAM. Signed-off-by: Dryw Wade <[email protected]>

sfe-SparkFro force-pushed the rp2_spi_no_dma branch from 403b109 to 30fb190 Compare December 30, 2025 18:01

sfe-SparkFro changed the title ~~rp2/machine_spi: Don't use DMA channels for SPI transfers.~~ rp2/machine_spi: Detect RX overrun when using DMA transfers. Dec 30, 2025

rp2/machine_spi: Improve RX overrun handling.

95a4749

Need to clear the overrun flag. RX overruns are not a problem if only writing. Signed-off-by: Dryw Wade <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

rp2/machine_spi: Detect RX overrun when using DMA transfers. #18578

rp2/machine_spi: Detect RX overrun when using DMA transfers. #18578

sfe-SparkFro commented Dec 16, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Dec 16, 2025 •

edited

Loading

Uh oh!

Uh oh!

peterhinch commented Dec 19, 2025

Uh oh!

sfe-SparkFro commented Dec 21, 2025

Uh oh!

nickovs commented Dec 30, 2025

Uh oh!

nickovs commented Dec 30, 2025

Uh oh!

sfe-SparkFro commented Dec 30, 2025

Uh oh!

sfe-SparkFro commented Dec 30, 2025 •

edited

Loading

Uh oh!

sfe-SparkFro commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

rp2/machine_spi: Detect RX overrun when using DMA transfers. #18578

Are you sure you want to change the base?

rp2/machine_spi: Detect RX overrun when using DMA transfers. #18578

Conversation

sfe-SparkFro commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Trade-offs and Alternatives

Uh oh!

github-actions bot commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

peterhinch commented Dec 19, 2025

Uh oh!

sfe-SparkFro commented Dec 21, 2025

Uh oh!

nickovs commented Dec 30, 2025

Uh oh!

nickovs commented Dec 30, 2025

Uh oh!

sfe-SparkFro commented Dec 30, 2025

Uh oh!

sfe-SparkFro commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sfe-SparkFro commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

sfe-SparkFro commented Dec 16, 2025 •

edited

Loading

github-actions bot commented Dec 16, 2025 •

edited

Loading

sfe-SparkFro commented Dec 30, 2025 •

edited

Loading