Add logic to track rendering area of various PDF ops #19043

Merged
calixteman merged 1 commit into mozilla:master from nicolo-ribaudo:compute-bounding-boxes
Aug 22, 2025

@nicolo-ribaudo
Contributor

@nicolo-ribaudo nicolo-ribaudo commented Nov 14, 2024

I started working towards #6419. This PR introduces the logic to track where different elements of the PDF are rendered, and hooks it up to the debugger since @calixteman mentioned that it would be useful.

I'm marking this as draft because there are a few changes I need to make:

  • change the various methods in canvas.js to receive the index as a param, rather than returning a function that takes the index
  • clean up the "dependencies tracking", since currently it's all over the place. Ideally most of this logic should be self-contained in CanvasRecorder, so that when not recording it doesn't have a performance impact.
  • improve the dependency tracking (so far I'm only tracking some of them)
  • do not track extra dependencies (for example, a stroke path doesn't depend on the fill color)
  • track object dependencies
  • fix image dependencies tracking for transform (currently there is a .setTransform that makes it get lost)

However, I'd love to receive feedback on the direction.

Commit 1:

Add logic to track rendering area of various PDF ops

This commit is a first step towards #6419, and it can also help with
#13287. To support rendering part of a page, we will need to
first compute which ops can affect what is visible in that part of
the page.

This commit adds logic to track "groups of ops" with their respective
bounding boxes. Each group either corresponds to a single op or
to a range, and it can have dependencies earlier in the ops list that
are not contiguous to the range.

Consider the following example:

0. setFillRGBColor
1. beginText
2. showText "Hello"
3. endText
4. constructPath [...]
5. eoFill

here we have two groups: the text (range 1-3) and the path (range 4-5).
Each of them has a corresponding bounding box, and a dependency
on the op at index 0.
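
To make the grouping concrete, here is a minimal sketch of how the example above might be represented. The field names (`range`, `dependencies`, `bbox`), the helper `opsForGroup`, and the bounding box numbers are assumptions made for illustration; they are not the data structures used in this PR.

```javascript
// Hypothetical representation of the op groups from the example above.
// Field names and bbox numbers are illustrative, not taken from the PR.
const ops = [
  "setFillRGBColor", // 0
  "beginText", // 1
  "showText", // 2
  "endText", // 3
  "constructPath", // 4
  "eoFill", // 5
];

const groups = [
  // The text: ops 1-3, depends on the fill color set at index 0.
  {
    range: [1, 3],
    dependencies: [0],
    bbox: { minX: 10, minY: 10, maxX: 80, maxY: 25 },
  },
  // The path: ops 4-5, also depends on the fill color.
  {
    range: [4, 5],
    dependencies: [0],
    bbox: { minX: 100, minY: 40, maxX: 160, maxY: 90 },
  },
];

// Returns the sorted indices of every op needed to draw one group:
// its own contiguous range plus its (possibly non-contiguous) dependencies.
function opsForGroup(group) {
  const [start, end] = group.range;
  const indices = new Set(group.dependencies);
  for (let i = start; i <= end; i++) {
    indices.add(i);
  }
  return [...indices].sort((a, b) => a - b);
}

console.log(opsForGroup(groups[0]).map(i => ops[i]));
console.log(opsForGroup(groups[1]).map(i => ops[i]));
```

Keeping the dependencies separate from the range is what allows a group to point back at state-setting ops, such as the fill color, that sit far earlier in the ops list.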

This tracking happens when first rendering a PDF: we wrap the canvas
with a "canvas recorder" that has the same API, but with additional
methods to mark the start/end of a group.
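
The wrapping idea can be sketched roughly as follows. This is not the `CanvasRecorder` from this PR: `createRecorder`, `startGroup`, `endGroup` and `getGroups` are hypothetical names, only `fillRect`/`strokeRect` are tracked, and the current transform is ignored to keep the example short.

```javascript
// Hypothetical sketch: wraps a 2D-context-like object, forwards every call
// and property access, and records a bounding box for the rectangles drawn
// while a group is open. Transforms are ignored for simplicity.
function createRecorder(ctx) {
  const groups = []; // finished groups: { start, end, bbox }
  let current = null; // group currently being recorded, if any

  // Grows the current group's bbox so that it contains the given rectangle.
  function extend(x, y, w, h) {
    if (!current) {
      return;
    }
    const rect = {
      minX: Math.min(x, x + w),
      minY: Math.min(y, y + h),
      maxX: Math.max(x, x + w),
      maxY: Math.max(y, y + h),
    };
    const b = current.bbox;
    current.bbox = b
      ? {
          minX: Math.min(b.minX, rect.minX),
          minY: Math.min(b.minY, rect.minY),
          maxX: Math.max(b.maxX, rect.maxX),
          maxY: Math.max(b.maxY, rect.maxY),
        }
      : rect;
  }

  // Additional methods that the wrapped context does not have.
  const extra = {
    startGroup(opIndex) {
      current = { start: opIndex, end: opIndex, bbox: null };
    },
    endGroup(opIndex) {
      current.end = opIndex;
      groups.push(current);
      current = null;
    },
    getGroups() {
      return groups;
    },
  };

  return new Proxy(ctx, {
    get(target, prop) {
      if (Object.prototype.hasOwnProperty.call(extra, prop)) {
        return extra[prop];
      }
      const value = target[prop];
      if (typeof value !== "function") {
        return value;
      }
      // Forward method calls to the real context, recording rectangles.
      return (...args) => {
        if (prop === "fillRect" || prop === "strokeRect") {
          extend(...args);
        }
        return value.apply(target, args);
      };
    },
    set(target, prop, value) {
      target[prop] = value;
      return true;
    },
  });
}

// Usage with a stand-in object instead of a real canvas context.
const calls = [];
const fakeCtx = {
  fillStyle: "#000",
  fillRect(x, y, w, h) {
    calls.push(["fillRect", x, y, w, h]);
  },
};
const recorder = createRecorder(fakeCtx);
recorder.startGroup(4);
recorder.fillStyle = "red";
recorder.fillRect(10, 20, 30, 40);
recorder.fillRect(0, 50, 5, 5);
recorder.endGroup(5);
console.log(recorder.getGroups());
```

Because the wrapper exposes the same API as the underlying context, rendering code can keep calling it as usual; only the code that knows about op boundaries needs the extra group methods.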

Commit 2:

Hook up the ops bbox logic to the pdf debugger

When hovering over a step in the pdf debugger:

  • it highlights the steps in the same group
  • it highlights the steps that the group depends on
  • it highlights the group's bounding box on the PDF itself

This is an example of what the debugger integration looks like (note: I couldn't figure out how to make my cursor show up in the recording 😅 I'm moving it over the steps list):

Screen.Recording.2024-11-14.at.16.35.58.mov

By default it doesn't show all the bounding boxes, because on some PDFs that is too noisy. If you tick the checkbox, it shows all the boxes, and you can click a box to scroll the corresponding ops into view.

@nicolo-ribaudo
Contributor Author

master...nicolo-ribaudo:pdf.js:draw-page-portion-optimized is a branch merging this PR together with #19128. In the video below you can see that it first renders in the background a low-resolution image "the old way" taking 12 seconds, and then it renders the "detail view" on top taking only 1.4 seconds and only running one fifth of the PDF operations :)

Screen.Recording.2024-12-17.at.18.10.30.mp4

Still keeping this as draft because there are significant bugs (in the PDF I'm using for testing, it often skips rendering some pieces of text even if they are visible on screen, or it renders some paths with the wrong color), but it's nice to see some progress.

@bobsingor

Very good progress on this! This is a feature the community has been waiting on for a long time. Can't wait to see more progress on this.

@nicolo-ribaudo nicolo-ribaudo force-pushed the compute-bounding-boxes branch from 8184a06 to cad8d31 June 1, 2025 16:43
@nicolo-ribaudo
Contributor Author

nicolo-ribaudo commented Jun 2, 2025

Update!

  • I've reworked the dependency tracking to be based on PDF operations rather than on canvas operations. Doing it on canvas operations originally seemed cleaner, but it introduces a lot of complexity because each PDF op calls many canvas ops, and they read state from the canvas in a way that caused the tracking logic to lose track of where that state originally came from.
  • I've now hooked it up to the "detail view" logic, so that we record dependencies/bboxes while rendering the background page and then use that information when rendering the detail view.
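
As a rough illustration of how the recorded information could be used when rendering the detail view, the sketch below picks the ops to run for a given visible area. The `recorded` map, its field names and the helper names are hypothetical, and the sketch assumes each op's dependency list is already complete, so no transitive lookup is done.

```javascript
// Hypothetical recorded data: for each op that draws something, its bounding
// box in page coordinates and the indices of the earlier ops it depends on.
const recorded = new Map([
  [2, { bbox: { minX: 10, minY: 10, maxX: 80, maxY: 25 }, dependencies: [0, 1, 3] }],
  [4, { bbox: { minX: 100, minY: 40, maxX: 160, maxY: 90 }, dependencies: [0] }],
]);

// True when the two axis-aligned boxes overlap with a non-empty area.
function intersects(a, b) {
  return (
    a.minX < b.maxX && a.maxX > b.minX && a.minY < b.maxY && a.maxY > b.minY
  );
}

// Returns the sorted indices of the ops that must run to render `view`:
// every drawing op whose bbox overlaps the view, plus its dependencies.
// Everything else can be skipped.
function opsToRun(recordedOps, view) {
  const needed = new Set();
  for (const [index, { bbox, dependencies }] of recordedOps) {
    if (!intersects(bbox, view)) {
      continue;
    }
    needed.add(index);
    for (const dep of dependencies) {
      needed.add(dep);
    }
  }
  return [...needed].sort((a, b) => a - b);
}

// A detail view that only covers the right-hand part of the page.
const detailView = { minX: 90, minY: 0, maxX: 200, maxY: 100 };
console.log(opsToRun(recorded, detailView));
```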

This video shows how we are skipping some ops while rendering the detail view as we scroll around the page :)

Screen.Recording.2025-06-02.at.16.00.49.mov

The main missing task is that I have to properly hook this logic up to the reftests, maybe rendering a fraction of the page with the logic and checking that it matches the same fraction of the page with the unoptimized rendering. Once this is done, I can go through the failing tests one by one and add the missing tracking.
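
A comparison of that kind could look roughly like the sketch below, which works on raw RGBA buffers such as the ones exposed by `ImageData.data`. The helper names and the tiny 4×4 test page are made up for illustration, and this is not how the existing reftest harness is wired up.

```javascript
// Extracts the RGBA bytes of a rectangular region from a full-page buffer
// that is `width` pixels wide.
function cropRGBA(data, width, rect) {
  const { x, y, w, h } = rect;
  const out = new Uint8ClampedArray(w * h * 4);
  for (let row = 0; row < h; row++) {
    const srcStart = ((y + row) * width + x) * 4;
    out.set(data.subarray(srcStart, srcStart + w * 4), row * w * 4);
  }
  return out;
}

// Counts the pixels that differ between two same-sized RGBA buffers.
function countDifferentPixels(a, b) {
  if (a.length !== b.length) {
    throw new Error("Buffers must have the same size");
  }
  let different = 0;
  for (let i = 0; i < a.length; i += 4) {
    if (
      a[i] !== b[i] ||
      a[i + 1] !== b[i + 1] ||
      a[i + 2] !== b[i + 2] ||
      a[i + 3] !== b[i + 3]
    ) {
      different++;
    }
  }
  return different;
}

// Stand-in for the unoptimized rendering: a 4x4 page of gray pixels.
const pageWidth = 4;
const fullPage = new Uint8ClampedArray(pageWidth * 4 * 4);
for (let p = 0; p < 16; p++) {
  fullPage.set([p * 10, p * 10, p * 10, 255], p * 4);
}

// Stand-in for the optimized rendering of only the central 2x2 region.
const region = { x: 1, y: 1, w: 2, h: 2 };
const partial = cropRGBA(fullPage, pageWidth, region);

const mismatches = countDifferentPixels(
  cropRGBA(fullPage, pageWidth, region),
  partial
);
console.log(mismatches === 0 ? "regions match" : `${mismatches} pixels differ`);
```

An exact per-pixel comparison may turn out to be too strict in practice, for example if anti-aliasing differs at the region edges, so a real test might need a tolerance.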

@nicolo-ribaudo nicolo-ribaudo force-pushed the compute-bounding-boxes branch 3 times, most recently from c294316 to 740b221 June 9, 2025 17:36
@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @nicolo-ribaudo received. Current queue size: 0

Live output at: http://54.193.163.58:8877/b5954f4701f62b6/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/96c4987519a6873/output.txt

Total script time: 34.09 mins

  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: Passed

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/b5954f4701f62b6/output.txt

Total script time: 67.87 mins

  • Unit tests: FAILED
  • Integration Tests: FAILED
  • Regression tests: Passed

@nicolo-ribaudo
Contributor Author

Linux failures:

TEST-UNEXPECTED-FAIL | must write a string in a FreeText editor
  - Error: Attempted to use detached Frame '4b160de5-100d-4e4b-aba8-005695c18418'
      at BidiFrame.<anonymous> (file:///home/ubuntu/pdfjs/botio-files-pdfjs/private/96c4987519a6873/node_modules/puppeteer-core/lib/esm/puppeteer/util/decorators.js:99:23)
TEST-UNEXPECTED-FAIL | must check that text change can be undone/redone
  - TimeoutError: Navigation timeout of 30000 ms exceeded
      at file:///home/ubuntu/pdfjs/botio-files-pdfjs/private/96c4987519a6873/node_modules/puppeteer-core/lib/esm/puppeteer/common/util.js:228:19
TEST-UNEXPECTED-FAIL | must check that a freetext can be undone
  - TypeError: Cannot read properties of undefined (reading 'map')
      at closePages (file:///home/ubuntu/pdfjs/botio-files-pdfjs/private/96c4987519a6873/test/integration/test_utils.mjs:139:28)
  - Maybe caused by the failure above
TEST-UNEXPECTED-FAIL | must serialize invisible annotations
  - TimeoutError: Navigation timeout of 30000 ms exceeded
      at file:///home/ubuntu/pdfjs/botio-files-pdfjs/private/96c4987519a6873/node_modules/puppeteer-core/lib/esm/puppeteer/common/util.js:228:19
TEST-UNEXPECTED-FAIL | must delete an existing annotation
  - TypeError: Cannot read properties of undefined (reading 'map')
      at closePages (file:///home/ubuntu/pdfjs/botio-files-pdfjs/private/96c4987519a6873/test/integration/test_utils.mjs:139:28)
  - Maybe caused by the failure above
TEST-UNEXPECTED-FAIL | must delete an existing annotation with a popup
  - TypeError: Cannot read properties of undefined (reading 'map')
      at closePages (file:///home/ubuntu/pdfjs/botio-files-pdfjs/private/96c4987519a6873/test/integration/test_utils.mjs:139:28)
  - Maybe caused by the failure above

Windows failures (unit tests, it's a timeout):

TEST-UNEXPECTED-FAIL | caches image resources at the document/page level as expected (issue 11878) | in chrome

Windows failures (integration tests, all because of timeouts):

TEST-UNEXPECTED-FAIL | must check that a stamp can be undone
  - TypeError: Cannot read properties of undefined (reading 'map')
      at closePages (file:///home/ubuntu/pdfjs/botio-files-pdfjs/private/96c4987519a6873/test/integration/test_utils.mjs:139:28)
TEST-UNEXPECTED-FAIL | must check that a stamp can be undone
  - TimeoutError: Navigation timeout of 30000 ms exceeded
      at file:///C:/pdfjs/botio-files-pdfjs/private/b5954f4701f62b6/node_modules/puppeteer-core/lib/esm/puppeteer/common/util.js:228:19

The Windows unit test timeout is the same across the last two runs.

@nicolo-ribaudo
Contributor Author

/botio unittest

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Received

Command cmd_unittest from @nicolo-ribaudo received. Current queue size: 0

Live output at: http://54.193.163.58:8877/de6a45131497f86/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @nicolo-ribaudo received. Current queue size: 0

Live output at: http://54.241.84.105:8877/de3cbe6fa74fab4/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/de3cbe6fa74fab4/output.txt

Total script time: 2.49 mins

  • Unit Tests: Passed

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/de6a45131497f86/output.txt

Total script time: 8.65 mins

  • Unit Tests: FAILED

@nicolo-ribaudo nicolo-ribaudo force-pushed the compute-bounding-boxes branch from 44996da to c24487e August 22, 2025 16:24
This commit is a first step towards mozilla#6419, and it can also help with
mozilla#13287. To support rendering part of a page, we will need to
first compute which ops can affect what is visible in that part of
the page.

This commit adds logic to track operations with their respective
bounding boxes. Only operations that actually cause something to
be rendered have a bounding box and dependencies.

Consider the following example:
```
0. setFillRGBColor
1. beginText
2. showText "Hello"
3. endText
4. constructPath [...] -> eoFill
```
here we have two rendering operations: the showText op (2) and the
path (4). (2) depends on (0), (1) and (3), while (4) only depends on
(0). Both (2) and (4) have a bounding box.

This tracking happens when first rendering a PDF: we then use the
recorded information to optimize future partial renderings of a PDF, so
that we can skip operations that do not affect the PDF area on the
canvas.

All this logic only runs when the new `enableOptimizedPartialRendering`
preference, disabled by default, is enabled.

The bounding boxes and dependencies are also shown in the pdfBug
stepper. When hovering over a step now:
- it highlights the steps that it depends on
- it highlights on the PDF itself the bounding box
@nicolo-ribaudo nicolo-ribaudo force-pushed the compute-bounding-boxes branch from c24487e to 6a22da9 August 22, 2025 16:27
@nicolo-ribaudo
Contributor Author

/botio unittest

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @nicolo-ribaudo received. Current queue size: 0

Live output at: http://54.241.84.105:8877/fdc58ac905120ed/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Received

Command cmd_unittest from @nicolo-ribaudo received. Current queue size: 0

Live output at: http://54.193.163.58:8877/ccac5fd94e8b80f/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/fdc58ac905120ed/output.txt

Total script time: 2.40 mins

  • Unit Tests: Passed

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/ccac5fd94e8b80f/output.txt

Total script time: 7.87 mins

  • Unit Tests: FAILED

Contributor

@calixteman calixteman left a comment

LGTM. Thank you.

@calixteman calixteman merged commit 673f19b into mozilla:master Aug 22, 2025
9 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Closed in PDF.js quality Aug 22, 2025
@calixteman
Contributor

/botio makeref

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Received

Command cmd_makeref from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/86aa7b57d36ff5c/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/86a45629e3f18f6/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/86a45629e3f18f6/output.txt

Total script time: 17.42 mins

  • Make references: Passed
  • Check references: Passed

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Success

Full output at http://54.193.163.58:8877/86aa7b57d36ff5c/output.txt

Total script time: 30.19 mins

  • Make references: Passed
  • Check references: Passed

@nicolo-ribaudo nicolo-ribaudo deleted the compute-bounding-boxes branch August 22, 2025 19:26
@timvandermeij
Contributor

Nice work; thanks!

@nicolo-ribaudo
Contributor Author

I have a couple of follow up PRs coming.
