[FLINK] Implement Iceberg lookup join functionality, and source code and junit test code.#15183
[FLINK] Implement Iceberg lookup join functionality, and source code and junit test code.#15183fightBoxing wants to merge 1 commit into
Conversation
…and junit test code.
mxm
left a comment
There was a problem hiding this comment.
Thanks for the PR @fightBoxing! A couple of comments:
- Please only target the newest version of Flink (currently 2.1). Remove any code for older versions. Backports need to be dealt later on.
- Please make sure the code is clean and compiles.
- Please make sure to read https://iceberg.apache.org/contribute/
- Comments should be in English
In general, this type of change may warrant a design document which should be reviewed before the code changes. It may be necessary to break up the changes into multiple PRs to make it easier to review the different components. I hope this makes sense. Thank you for your time and effort.
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Problem
In production environments, there is a common need to join streaming data with dimension data stored in Iceberg tables. The dimension data needs to be periodically refreshed to ensure join accuracy. Currently, Flink lacks native support for Iceberg lookup joins, forcing users to work around this limitation or use alternative solutions.
Solution
This PR implements Iceberg lookup join functionality for Flink, enabling efficient joins between streaming data and Iceberg dimension tables. The implementation includes:
IcebergLookupCache: A cache mechanism for storing and managing lookup data with TTL support
IcebergLookupReader: A reader component for loading and refreshing lookup data from Iceberg tables
IcebergTableSource enhancement: Updated to support lookup join operations
Configuration options: New config options for customizing lookup join behavior (cache size, refresh interval, etc.)
Integration tests: Comprehensive test coverage (IcebergLookupJoinITCase)
Changes
Added IcebergLookupCache for efficient caching of lookup data
Added IcebergLookupReader for reading lookup data from Iceberg tables
Added IcebergLookupJoinITCase for integration testing
Updated IcebergTableSource to support lookup join operations
Added configuration options in FlinkConfigOptions for lookup join settings
Updated build.gradle files for v1.16, v1.17, and v1.18
Benefits
Enables real-time joins with Iceberg dimension tables
Reduces data latency by avoiding frequent full table scans
Improves performance through intelligent caching strategies
Seamlessly integrates with existing Flink lookup join framework
Supports periodic data refresh to ensure data freshness
Testing
Added integration tests to validate lookup join functionality
Tested cache refresh mechanisms
Verified correctness of join results
Ensures backward compatibility