py: Implement PEP 750 t-strings using existing f-string parser (WIP) #18650
Draft: dpgeorge wants to merge 22 commits into micropython:master from dpgeorge:py-implement-tstrings
+3,999 −112
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##           master   #18650      +/-   ##
==========================================
+ Coverage   98.38%   98.41%   +0.03%
==========================================
  Files         171      172       +1
  Lines       22298    22606     +308
==========================================
+ Hits        21937    22247     +310
+ Misses        361      359       -2
```

View full report in Codecov by Sentry.
This saves about 4 bytes on ARM Cortex-M, and about 50-60 bytes on x86-64. It also allows the upcoming `vstr_ins_strn()` function to be inline as well, and have less of a code-size impact when used.

Signed-off-by: Damien George <[email protected]>

This is now an easy function to define as inline, so it does not impact code size unless it's used.

Signed-off-by: Damien George <[email protected]>
This check costs code size and execution time, and it's not necessary: all callers of this function already pass a non-zero value for `byte_len`. And even if `byte_len` were zero, the code would still perform correctly.

Signed-off-by: Damien George <[email protected]>
The null byte cannot exist in source code (per CPython), so use it to indicate the end of the input stream (instead of `(mp_uint_t)-1`). This allows the cache chars (chr0/1/2 and their saved versions) to be 8-bit bytes, making it clear that they are not `unichar` values. It also saves a bit of memory in the `mp_lexer_t` data structure. (And in a future commit allows the saved cache chars to be eliminated entirely by storing them in a vstr instead.)

In order to keep code size down, the frequently used `chr0` is still of type `uint32_t`. Having it 32-bit means that machine instructions to load it are smaller (changing `chr0` to `uint8_t` adds about +80 bytes to Thumb code).

Also add tests for invalid bytes in the input stream to make sure there are no regressions in this regard.

Signed-off-by: Damien George <[email protected]>
It turns out that it's relatively simple to support nested f-strings, which is what this commit implements.

The MicroPython f-string parser currently works as follows:

1. it extracts the f-string arguments (things in curly braces) into a temporary buffer (a vstr);
2. once the f-string ends (reaches its closing quote), the lexer switches to tokenizing the temporary buffer;
3. once the buffer is empty, it switches back to the stream.

The temporary buffer can easily hold f-strings itself (ie nested f-strings) and they can be re-parsed by the lexer using the same algorithm. The only thing stopping that from working is that the temporary buffer can't be reused for the nested f-string, because it's currently being parsed. This commit fixes that by adding a second temporary buffer, the "injection" buffer. That allows an arbitrary number of nestings with a simple modification to the original algorithm:

1. when an f-string is encountered, the string is parsed and its arguments are extracted into `fstring_args`;
2. when the f-string finishes, `fstring_args` is inserted at the current position in `inject_chrs` (which is the start of that buffer if no injection is ongoing);
3. `fstring_args` is now cleared and ready for any further f-strings (nested or not);
4. the lexer switches to `inject_chrs` if it's not already reading from it;
5. if an f-string appeared inside the f-string then it is in `inject_chrs` and can be processed as before, extracting its arguments into `fstring_args`, which can then be inserted again into `inject_chrs`;
6. once `inject_chrs` is exhausted (meaning that all levels of f-strings have been fully processed), the lexer switches back to tokenizing the stream.

Amazingly, this scheme supports arbitrary levels of nesting of f-strings using the same quote style.

This adds some code size and a bit more memory usage for the lexer. In particular, for a single (non-nested) f-string it now makes an extra copy of the `fstring_args` data when copying it across to `inject_chrs`. Otherwise, memory use only grows with the complexity of nested f-strings.

Signed-off-by: Damien George <[email protected]>
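The two-buffer scheme above can be sketched in Python. This is a hypothetical illustration, not the actual C implementation in `py/lexer.c`: `inject` stands in for `inject_chrs`, the extracted arguments play the role of `fstring_args`, and only double-quoted f-strings without escapes or format specs are handled. Rewritten text is spliced into the injection buffer so that nested f-strings are re-scanned by the same loop rather than by recursion.

```python
def transform(src):
    inject = []  # inject_chrs: characters pending re-tokenization
    pos = 0      # position in the real input stream

    def getc():
        nonlocal pos
        if inject:
            return inject.pop(0)
        if pos < len(src):
            pos += 1
            return src[pos - 1]
        return ""  # end of input

    out = []
    while True:
        c = getc()
        if c == "":
            break
        if c == "f":
            q = getc()
            if q != '"':
                # not an f-string: emit 'f' and push the lookahead back
                out.append(c)
                if q:
                    inject.insert(0, q)
                continue
            # inside an f-string: build the literal with {} placeholders
            # and extract the arguments (the role of fstring_args)
            lit, fargs = ['"'], []
            while True:
                c = getc()
                if c == '"':
                    lit.append('"')
                    break
                if c == "{":
                    depth, arg = 1, []
                    while depth:
                        c = getc()
                        if c == "{":
                            depth += 1
                        elif c == "}":
                            depth -= 1
                        if depth:
                            arg.append(c)
                    fargs.append("".join(arg))
                    lit.append("{}")
                else:
                    lit.append(c)
            # splice the rewritten text into the injection buffer; any
            # nested f-string inside the arguments is re-scanned here
            rewritten = "".join(lit) + ".format(" + ", ".join(fargs) + ")"
            inject[0:0] = list(rewritten)
        else:
            out.append(c)
    return "".join(out)
```

Note that `transform('f"{f"{y}"}"')` yields `'"{}".format("{}".format(y))'`, mirroring the claim above that nesting works even with the same quote style, because argument extraction only tracks brace depth.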
This way, the use of `lex->fstring_args` is fully self-contained within the string-literal parsing section of `mp_lexer_to_next()`.

Signed-off-by: Damien George <[email protected]>
Signed-off-by: Koudai Aono <[email protected]>
Signed-off-by: Koudai Aono <[email protected]>
Signed-off-by: Damien George <[email protected]>
Signed-off-by: Damien George <[email protected]>
Code size report:
Signed-off-by: Damien George <[email protected]>
This now works in MicroPython. Signed-off-by: Damien George <[email protected]>
Signed-off-by: Damien George <[email protected]>
Signed-off-by: Damien George <[email protected]>
Now OK in MicroPython. Signed-off-by: Damien George <[email protected]>
Signed-off-by: Damien George <[email protected]>
Not worth supporting. Signed-off-by: Damien George <[email protected]>
Not worth supporting. Signed-off-by: Damien George <[email protected]>
Reusing the existing f-string parser in the lexer. Signed-off-by: Damien George <[email protected]>
Signed-off-by: Damien George <[email protected]>
Signed-off-by: Damien George <[email protected]>
Signed-off-by: Damien George <[email protected]>
Summary
This is an alternative to #17557 which aims to implement t-strings in a more efficient way (less code size), leveraging the existing f-string parser in the lexer. It includes:
- changes to `py/lexer.c` to parse t-strings
- a `__template__()` function to construct t-string objects
- `Template` and `Interpolation` classes which implement all the functionality from PEP 750
- a `string` module with a `templatelib` sub-module, which contains the classes `Template` and `Interpolation`

This PR is built upon #18588.
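To show the shape of the objects a t-string evaluates to, here is a rough Python sketch of the PEP 750 interfaces named above. It is a simplified stand-in, not this PR's implementation: attribute names follow PEP 750, but validation and the conversion step (`!r`/`!s`/`!a`) are omitted.

```python
class Interpolation:
    def __init__(self, value, expression, conversion=None, format_spec=""):
        self.value = value            # the evaluated result of the expression
        self.expression = expression  # the source text inside the braces
        self.conversion = conversion  # None, or one of "r", "s", "a"
        self.format_spec = format_spec

class Template:
    def __init__(self, strings, interpolations):
        # static string parts; always one more than the interpolations
        self.strings = strings
        self.interpolations = interpolations

    def __iter__(self):
        # interleave static strings and interpolations, skipping
        # empty string parts
        for s, interp in zip(self.strings, self.interpolations):
            if s:
                yield s
            yield interp
        if self.strings[-1]:
            yield self.strings[-1]

# e.g. what a t-string like t"hello {name:5}" could conceptually
# evaluate to, and how a consumer might render it:
name = "world!"
t = Template(("hello ", ""), (Interpolation(name, "name", None, "5"),))
rendered = "".join(
    part if isinstance(part, str) else format(part.value, part.format_spec)
    for part in t
)
```

Unlike an f-string, the template defers formatting to the consumer, which is what allows libraries to inspect `expression` and `format_spec` before deciding how to render each value.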
The way it works is that the lexer/tokenizer converts an input t-string such as `t"hello {name:5}"` character by character into a call to the `__template__()` constructor. (For reference, the equivalent f-string would be converted to `"hello {:5}".format(name)`.)

Compared to #17557, which costs about +7400 bytes on stm32, this implementation costs +2844 bytes.
This is still a work-in-progress. It implements most of the t-string functionality, including nested t-strings and f-strings, but there are a few corner cases yet to tidy up. I don't see any show-stoppers, though, and code size should hopefully not grow much more.
Testing
All 16 tests from #17557 have been added here. So far 11 of them pass, and 1 is no longer relevant (testing runtime overflow limit which is no longer there).
Trade-offs and Alternatives
Being an alternative to #17557, it shows a different way to achieve the same end result. #17557 starts up a new parser instance each time a t-string is encountered and recursively parses the t-string, whereas the implementation here just transforms the input characters. After all, t-strings (and f-strings) are really just syntactic sugar.
This adds code size, but if t-strings are not used then there is very little execution overhead, all of which is contained within the lexer.
The changes to `py/lexer.c` are mildly complex, but not really much more complex than the existing f-string logic. It's just a different way of transforming the input stream.