Description
Context
PR #5827 added dbt integration that creates Entity objects from dbt model columns.
Problem
The import does not validate that the entity column has a data type appropriate for use as an entity key. Entity keys should typically be:
- STRING / VARCHAR
- INT / INT64 / BIGINT
- UUID (if supported)
However, the code accepts any column type, including:
- FLOAT / DOUBLE (floating-point equality is unreliable, so joins on these keys can silently miss rows)
- BYTES (not suitable for entity keys)
- TIMESTAMP (rarely appropriate)
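To illustrate the FLOAT concern, here is a minimal sketch in plain Python (not Feast code; the dictionaries stand in for two tables joined on an entity key, and all names are made up for illustration):

```python
# Two float keys that "should" be equal can differ in their binary
# representation, so an equality join silently finds no match.
left = {0.1 + 0.2: "features for entity A"}  # key computed upstream
right_key = 0.3                               # same key as written elsewhere

float_join_hits = right_key in left           # 0.1 + 0.2 != 0.3 in binary floating point

# The same lookup with string keys compares exactly.
left_str = {"0.3": "features for entity A"}
string_join_hits = "0.3" in left_str
```

Here `float_join_hits` is `False` while `string_join_hits` is `True`, which is the failure mode a warning on FLOAT / DOUBLE entity columns is meant to surface.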
Current Behavior
    # In dbt_import.py:191-197
    if entity_column not in column_names:
        click.echo(warning)
        continue
    # No type checking!
Proposed Solution
Add validation and warning:
    entity_col = next((c for c in model.columns if c.name == entity_column), None)
    # data_type can be missing in dbt metadata, so skip the check when it is unset
    if entity_col and entity_col.data_type:
        # Drop length/precision arguments, e.g. VARCHAR(255) -> VARCHAR
        normalized_type = entity_col.data_type.upper().split("(")[0].strip()
        valid_entity_types = {'STRING', 'TEXT', 'VARCHAR', 'INT', 'INT32', 'INT64', 'INTEGER', 'BIGINT', 'UUID'}
        # Exact match: a substring test would also accept INTERVAL or ARRAY<STRING>
        if normalized_type not in valid_entity_types:
            click.echo(
                f"{Fore.YELLOW}Warning: Entity column '{entity_column}' has type "
                f"'{entity_col.data_type}' which may not be suitable for entity keys."
                f" Recommended types: STRING, INT64{Style.RESET_ALL}"
            )
Edge Cases to Handle
- FLOAT columns (should warn strongly)
- ARRAY columns (invalid for entities)
- Complex/nested types (invalid)
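One possible way to cover these edge cases is to classify the type string into severity levels before deciding what to print. The sketch below is an assumption, not existing Feast code: `classify_entity_type`, the type groups, and the severity labels are all hypothetical, and the type names would need adjusting for each supported warehouse.

```python
import re

# Hypothetical type groups; extend per warehouse dialect as needed.
VALID_TYPES = {"STRING", "TEXT", "VARCHAR", "INT", "INT32", "INT64",
               "INTEGER", "BIGINT", "UUID"}
FLOAT_TYPES = {"FLOAT", "FLOAT32", "FLOAT64", "DOUBLE", "REAL"}
NESTED_PREFIXES = ("ARRAY", "STRUCT", "MAP", "RECORD", "JSON")


def classify_entity_type(data_type):
    """Return 'ok', 'warn', 'warn_strong', 'invalid', or 'unknown'."""
    if not data_type:
        # No type recorded for the column, so nothing can be checked.
        return "unknown"
    upper = data_type.strip().upper()
    # Check nested types first so ARRAY<STRING> is not mistaken for STRING.
    if upper.startswith(NESTED_PREFIXES) or upper.endswith("[]"):
        return "invalid"
    # Keep only the base name: VARCHAR(255) -> VARCHAR, DOUBLE PRECISION -> DOUBLE.
    base = re.split(r"[(<\s]", upper, maxsplit=1)[0]
    if base in VALID_TYPES:
        return "ok"
    if base in FLOAT_TYPES:
        return "warn_strong"
    # Everything else (TIMESTAMP, BYTES, INTERVAL, ...) gets a plain warning.
    return "warn"
```

With this shape, the import loop could skip or reject `invalid` columns, print a stronger message for `warn_strong`, and keep the existing yellow warning for `warn`. Multi-word names such as `CHARACTER VARYING` fall through to `warn` in this sketch and would need an explicit alias.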