BigQuerySource.get_table_query_string() silently ignores query when table is also set #6200

@max36067

Summary

When both table and query are provided to BigQuerySource, get_table_query_string() always returns the backtick-quoted table name and silently ignores query. This makes it impossible to use a custom query (e.g., for deduplication) on the batch source of a PushSource, because PushSource requires table for offline writes via offline_write_batch().

Expected Behavior

When both table and query are set on a BigQuerySource:

  • Reads (get_table_query_string()) should use query — it's more specific and intentionally provided
  • Writes (offline_write_batch()) should continue using .table directly as the write destination
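
The two bullets above could be sketched roughly as follows. This is a minimal, self-contained illustration, not a patch against Feast: ProposedSourceSketch is a hypothetical stand-in for BigQuerySource that carries only the table and query fields, and write_destination() is an assumed helper that mimics offline_write_batch() reading .table directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposedSourceSketch:
    """Hypothetical stand-in for BigQuerySource (not the real Feast class)."""

    table: Optional[str] = None
    query: Optional[str] = None

    def get_table_query_string(self) -> str:
        # Reads: prefer the explicit query when one is provided.
        if self.query:
            return f"({self.query})"
        return f"`{self.table}`"

    def write_destination(self) -> str:
        # Writes: assumed helper; always targets the table, never the query.
        if not self.table:
            raise ValueError("a table is required as the write destination")
        return self.table


source = ProposedSourceSketch(
    table="project.dataset.my_table",
    query="SELECT * FROM `project.dataset.my_table`",
)
read_expr = source.get_table_query_string()
write_target = source.write_destination()
table_only_expr = ProposedSourceSketch(
    table="project.dataset.my_table"
).get_table_query_string()
```

Sources that set only table keep their current read behavior under this ordering; only the case where both fields are set changes.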

Current Behavior

get_table_query_string() in bigquery_source.py always prefers table:

def get_table_query_string(self) -> str:
    if self.table:
        return f"`{self.table}`"
    return f"({self.query})"

This means any custom query (e.g., deduplication logic) is silently ignored when table is also present.
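
A small reproduction of that branch order, without needing a BigQuery project, might look like the sketch below. CurrentSourceSketch is a hypothetical stand-in that copies only the method quoted above; it is not the real BigQuerySource, and the table and column names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CurrentSourceSketch:
    """Hypothetical stand-in mirroring the method quoted above."""

    table: Optional[str] = None
    query: Optional[str] = None

    def get_table_query_string(self) -> str:
        # Same branch order as the quoted snippet: table wins over query.
        if self.table:
            return f"`{self.table}`"
        return f"({self.query})"


dedup_query = (
    "SELECT * FROM `project.dataset.my_table` "
    "QUALIFY ROW_NUMBER() OVER (PARTITION BY entity_id, event_time) = 1"
)

both_set = CurrentSourceSketch(table="project.dataset.my_table", query=dedup_query)
query_only = CurrentSourceSketch(query=dedup_query)

# With both fields set, the deduplication query never reaches the read path.
both_result = both_set.get_table_query_string()
query_only_result = query_only.get_table_query_string()
```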

Use Case

Streaming (push) sources often produce duplicate rows in BigQuery. The natural solution is to keep table for push writes and add a deduplicating query for reads:

batch_source = BigQuerySource(
    name="my_batch_source",
    table="project.dataset.my_table",        # needed for push writes
    query="""
        SELECT * FROM `project.dataset.my_table`
        QUALIFY ROW_NUMBER() OVER (PARTITION BY entity_id, event_time) = 1
    """,                                      # needed for deduplicated reads
    timestamp_field="event_time",
)
push_source = PushSource(name="my_source", batch_source=batch_source)

But because get_table_query_string() ignores query when table is set, reads return duplicates. Removing table to force query usage is not an option either: it breaks offline_write_batch(), which accesses .table directly (bigquery.py:449).

Environment

  • Feast version: 0.58.0 (also confirmed unresolved on 0.61.0 / current main)
  • Offline store: BigQuery
