Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
fix: MySQL hash expression in StreamingMergeDiff was NULL-unsafe
CONCAT() returns NULL if any argument is NULL. Without COALESCE, any
row whose hash expression contained a NULL column produced a NULL hash.
In Phase 1 of the streaming merge, PHP compares hashes with !==; since
null !== null evaluates to false, a changed row where both source and
target shared the same NULL column(s) but differed in other columns
would silently pass through as identical — the update would be missed.

Also add CHAR(31) as the column separator (consistent with the pgsql
and sqlite drivers, which already use \x1f) to prevent cross-column
hash collisions where different value distributions produce the same
concatenated string.

Fix: wrap each column in COALESCE(CAST(col AS CHAR CHARACTER SET utf8), '')
and separate columns with CHAR(31) in the CONCAT call.

Updated testBuildHashExpressionMySQL to assert COALESCE and CHAR(31)
are both present in the generated expression."
  • Loading branch information
jasdeepkhalsa committed Mar 26, 2026
commit 84384b28f780e8737e8aa5418c88c5fdfb85f5fd
6 changes: 4 additions & 2 deletions src/DB/Data/StreamingMergeDiff.php
Original file line number Diff line number Diff line change
Expand Up @@ -224,7 +224,9 @@ private function streamAndMerge(string $table, array $key, array $hashCols): arr
/**
* Build a driver-appropriate SQL hash expression for the given columns.
*
* MySQL: SHA2(CONCAT(CAST(col AS CHAR CHARACTER SET utf8), …), 256)
* MySQL: SHA2(CONCAT(COALESCE(CAST(col AS CHAR CHARACTER SET utf8), ''), CHAR(31), …), 256)
* CHAR(31) is the unit-separator (0x1f); COALESCE ensures NULL columns
* do not propagate NULL through CONCAT (which would mask real changes).
* Postgres: md5(COALESCE(col::text, '') || E'\x1f' || …)
* SQLite: hex(COALESCE(CAST(col AS TEXT), '') || X'1f' || …)
*/
Expand All @@ -235,7 +237,7 @@ public function buildHashExpression(array $columns): string
}

return match ($this->driver) {
'mysql' => 'SHA2(CONCAT(' . implode(', ', array_map(fn($c) => "CAST(`$c` AS CHAR CHARACTER SET utf8)", $columns)) . '), 256)',
'mysql' => 'SHA2(CONCAT(' . implode(", CHAR(31), ", array_map(fn($c) => "COALESCE(CAST(`$c` AS CHAR CHARACTER SET utf8), '')", $columns)) . '), 256)',
'pgsql' => "md5(" . implode(" || E'\\x1f' || ", array_map(fn($c) => "COALESCE(\"$c\"::text, '')", $columns)) . ")",
'sqlite' => "hex(" . implode(" || X'1f' || ", array_map(fn($c) => "COALESCE(CAST(\"$c\" AS TEXT), '')", $columns)) . ")",
default => throw new DataException("Unsupported driver: {$this->driver}"),
Expand Down
7 changes: 7 additions & 0 deletions tests/DataDiff/StreamingMergeDiffTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -449,6 +449,13 @@ public function testBuildHashExpressionMySQL(): void
$expr = $merge->buildHashExpression(['col1', 'col2']);
$this->assertStringContainsString('SHA2(', $expr);
$this->assertStringContainsString('CONCAT(', $expr);
// NULL-safety: each column must be wrapped in COALESCE so a single NULL
// column does not collapse the entire hash expression to NULL and cause
// changed rows to be silently skipped.
$this->assertStringContainsString('COALESCE(', $expr);
// Column separator: CHAR(31) (unit separator, 0x1f) prevents cross-column
// hash collisions where different value distributions produce the same string.
$this->assertStringContainsString('CHAR(31)', $expr);
}

public function testBuildHashExpressionEmpty(): void
Expand Down
Loading