Skip to content

API/Core: Initial Table Scan Reporting support#5268

Merged
rdblue merged 1 commit into
apache:masterfrom
nastra:scan-reporting
Aug 3, 2022
Merged

API/Core: Initial Table Scan Reporting support#5268
rdblue merged 1 commit into
apache:masterfrom
nastra:scan-reporting

Conversation

@nastra
Copy link
Copy Markdown
Contributor

@nastra nastra commented Jul 13, 2022

The idea here is that we collect metrics during Table scans so that we
can better understand what exactly happens during a Table scan.

This is just an initial implementation that gathers/records a few metrics. The idea is to get early feedback on this approach.

Overview of proposed changes

  • a ScanReport that contains all relevant metrics from a Scan
  • a ScanReporter interface that can be used to collect and report different metrics during a Table scant. Engines can implement this API to customize how a ScanReport is being reported
  • a LoggingScanReporter is introduced that logs a ScanReport to the log file. We can think about defaulting to that one or using a no-op scan reporter
  • a new Catalog Property scan-reporter-impl is introduced, allowing a Catalog to specify which ScanReporter to use
  • a ScanMetrics that carries all relevant metrics during a Table scan. This class should be seen as something that is only used during a scan. At the end of a scan we'll want to create a ScanReport based on the collected metrics
  • a Timer and a DefaultTimer were introduced so that we can measure spent time for a table scan
  • some other changes, such as DefaultScanMetrics that use native Java Integer/Long counters and a DefaultTimer

Comment thread api/src/main/java/org/apache/iceberg/metrics/ScanMetricsContext.java Outdated
@nastra nastra changed the title Initial ScanMetricsContext / ScanReporting support Initial Table Scan Reporting support Jul 14, 2022
@nastra nastra force-pushed the scan-reporting branch 2 times, most recently from 845ae00 to b2fd00b Compare July 14, 2022 10:58
Comment thread api/src/main/java/org/apache/iceberg/metrics/Timer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/Timer.java
Comment thread api/src/main/java/org/apache/iceberg/metrics/Timer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/Timer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/DefaultTimer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/LongCounter.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/LongCounter.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/LongCounter.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/io/CloseableIterable.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/DefaultTimer.java Outdated
@nastra
Copy link
Copy Markdown
Contributor Author

nastra commented Jul 15, 2022

I applied all the review feedback and splitted out the introduction of a DefaultMetricsContext and the Timer API into #5286. Once #5286 is merged, I'll rebase this PR so that it only contains Table Scan related changes

Comment thread api/src/main/java/org/apache/iceberg/metrics/Timer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/Timer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/Timer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/Timer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/Timer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/DefaultTimer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/DefaultTimer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/DefaultTimer.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/ScanReport.java Outdated
Comment thread api/src/main/java/org/apache/iceberg/metrics/ScanReport.java Outdated
Comment thread core/src/main/java/org/apache/iceberg/ManifestGroup.java
Comment thread data/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java Outdated
@nastra nastra changed the title Initial Table Scan Reporting support API/Core: Initial Table Scan Reporting support Aug 1, 2022
@nastra nastra requested a review from rdblue August 1, 2022 12:40
@nastra nastra closed this Aug 1, 2022
@nastra nastra reopened this Aug 1, 2022
Comment thread data/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java Outdated
@nastra nastra requested a review from rdblue August 3, 2022 09:40
Comment thread data/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java Outdated
Comment thread data/src/test/java/org/apache/iceberg/TestScanPlanningAndReporting.java Outdated
The idea here is that we collect metrics during Table scans so that we
can better understand where time is spent during a scan and how many
files are being looked at.
@rdblue rdblue merged commit 305ab15 into apache:master Aug 3, 2022
@rdblue
Copy link
Copy Markdown
Contributor

rdblue commented Aug 3, 2022

Thanks @nastra! I just merged this.

@nastra nastra deleted the scan-reporting branch September 27, 2022 06:23
@findepi
Copy link
Copy Markdown
Member

findepi commented Dec 21, 2022

Just FYI, this seems to have inflated logs quite noticeably (trinodb/trino#15492).
I wonder whether INFO-level logging should be the default behavior.
Wouldn't it be better to have these logged at DEBUG by default?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants