Spark: Support UPDATE statements with subqueries#2206
Conversation
 import org.apache.spark.sql.catalyst.utils.PlanUtils.isIcebergRelation

-object MergeIntoTablePredicateCheck extends (LogicalPlan => Unit) {
+object RowLevelOperationsPredicateCheck extends (LogicalPlan => Unit) {
I squashed multiple rules into one.
 scanPlan: LogicalPlan,
 assignments: Seq[Assignment],
-cond: Expression): LogicalPlan = {
+cond: Expression = Literal.TrueLiteral): LogicalPlan = {
When we process matched rows, we know all rows must be updated. No need for If expressions.
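A rough sketch of that idea in Python rather than the actual Catalyst code (the names here are made up for illustration): when the condition defaults to a true literal, each assignment can be emitted directly instead of being wrapped in a conditional expression.

```python
# Hypothetical illustration of defaulting the update condition to a
# true literal; not the PR's actual Scala code.

TRUE_LITERAL = True

def output_value(old_value, new_value, cond=TRUE_LITERAL):
    # With cond defaulting to the true literal, matched rows are
    # updated unconditionally and no If expression is needed.
    if cond is TRUE_LITERAL:
        return new_value
    # Otherwise the assignment stays conditional: If(cond, new, old).
    return new_value if cond else old_value

print(output_value("it", "hr"))              # unconditional update
print(output_value("it", "hr", cond=False))  # conditional, keeps old value
```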
 }

+@Test
+public void testUpdateWithInSubquery() {
I validated that the results match Postgres for some queries. I'll check the others before the PR is merged.
One thing you may want to check here is subqueries that use aliases as well.
Good point, let me add this together with self joins.
Added testUpdateWithSelfSubquery to cover this.
rdblue
left a comment
The logic in the rewrite looks correct to me.
@aokolnychyi A small request: is it possible for you to put the rewritten optimized plan in the PR description? Just so we can refer back to it quickly to get an idea of the rewrite.

@dilipbiswal, good idea. I'll add a few sample plans.

@dilipbiswal, added an example to the PR description.

Thanks for reviewing, @RussellSpitzer @dilipbiswal @rdblue!
This PR adds support for UPDATE statements with subqueries.
Let's consider an example.
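The semantics of an UPDATE with an IN subquery can be sketched as follows (a minimal illustration with plain Python lists, standing in for the actual Spark/Iceberg plan rewrite; the table and column names are hypothetical):

```python
# Semantics of, e.g.:
#
#   UPDATE t SET dep = 'hr' WHERE id IN (SELECT value FROM updated_ids)
#
# Rows whose id appears in the subquery result are rewritten with the
# new assignment; all other rows are carried over unchanged.

rows = [
    {"id": 1, "dep": "it"},
    {"id": 2, "dep": "it"},
    {"id": 3, "dep": "ops"},
]
updated_ids = [1, 3]  # result of the subquery

new_rows = [
    {**r, "dep": "hr"} if r["id"] in updated_ids else r
    for r in rows
]

print(new_rows)
```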