# Summary
This document contains [DuckDB's official documentation and guides](https://duckdb.org/) in a single-file easy-to-search form.
If you find any issues, please report them [as a GitHub issue](https://github.com/duckdb/duckdb-web/issues).
Contributions are very welcome in the form of [pull requests](https://github.com/duckdb/duckdb-web/pulls).
If you are considering submitting a contribution to the documentation, please consult our [contributor guide](https://github.com/duckdb/duckdb-web/blob/main/CONTRIBUTING.md).
Code repositories:
* DuckDB source code: [github.com/duckdb/duckdb](https://github.com/duckdb/duckdb)
* DuckDB documentation source code: [github.com/duckdb/duckdb-web](https://github.com/duckdb/duckdb-web)
# Connect {#connect}
## Connect {#docs:stable:connect:overview}
#### Connect or Create a Database {#docs:stable:connect:overview::connect-or-create-a-database}
To use DuckDB, you must first create a connection to a database. The exact syntax varies between the [client APIs](#docs:stable:clients:overview) but it typically involves passing an argument to configure persistence.
#### Persistence {#docs:stable:connect:overview::persistence}
DuckDB can operate in both persistent mode, where the data is saved to disk, and in in-memory mode, where the entire dataset is stored in the main memory.
> **Tip.** Both persistent and in-memory databases use spilling to disk to facilitate larger-than-memory workloads (i.e., out-of-core processing).
##### Persistent Database {#docs:stable:connect:overview::persistent-database}
To create or open a persistent database, set the path of the database file, e.g., `my_database.duckdb`, when creating the connection.
This path can point to an existing database or to a file that does not yet exist and DuckDB will open or create a database at that location as needed.
The file may have an arbitrary extension, but `.db` and `.duckdb` are two common choices, with `.ddb` also used occasionally.
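For example, an existing session can open or create a persistent database file with the `ATTACH` statement (a minimal sketch; the syntax for opening a database file directly at connection time depends on the client API):
```sql
-- Opens my_database.duckdb if it exists, or creates it otherwise.
ATTACH 'my_database.duckdb' AS my_db;
USE my_db;
```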
Starting with v0.10, DuckDB's storage format is [backwards-compatible](#docs:stable:internals:storage::backward-compatibility), i.e., DuckDB is able to read database files produced by older versions of DuckDB.
For example, DuckDB v0.10 can read and operate on files created by the previous DuckDB version, v0.9.
For more details on DuckDB's storage format, see the [storage page](#docs:stable:internals:storage).
##### In-Memory Database {#docs:stable:connect:overview::in-memory-database}
DuckDB can operate in in-memory mode. In most clients, this can be activated by passing the special value `:memory:` as the database file or omitting the database file argument. In in-memory mode, no data is persisted to disk; therefore, all data is lost when the process finishes.
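For illustration, the same special value also works with the `ATTACH` statement to add an extra in-memory database to a running session (a minimal sketch):
```sql
-- Adds a named in-memory database; its contents disappear when the process exits.
ATTACH ':memory:' AS memdb;
CREATE TABLE memdb.tmp_results (x INTEGER);
```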
## Concurrency {#docs:stable:connect:concurrency}
#### Handling Concurrency {#docs:stable:connect:concurrency::handling-concurrency}
DuckDB has two configurable options for concurrency:
1. One process can both read and write to the database.
2. Multiple processes can read from the database, but no processes can write ([`access_mode = 'READ_ONLY'`](#docs:stable:configuration:overview::configuration-reference)).
When using option 1, DuckDB supports multiple writer threads using a combination of [MVCC (Multi-Version Concurrency Control)](https://en.wikipedia.org/wiki/Multiversion_concurrency_control) and optimistic concurrency control (see [Concurrency within a Single Process](#::concurrency-within-a-single-process)), but all within that single writer process. The reason for this concurrency model is to allow for the caching of data in RAM for faster analytical queries, rather than going back and forth to disk during each query. It also allows the caching of function pointers, the database catalog, and other items so that subsequent queries on the same connection are faster.
> DuckDB is optimized for bulk operations, so executing many small transactions is not a primary design goal.
#### Concurrency within a Single Process {#docs:stable:connect:concurrency::concurrency-within-a-single-process}
DuckDB supports concurrency within a single process according to the following rules. As long as there are no write conflicts, multiple concurrent writes will succeed. Appends will never conflict, even on the same table. Multiple threads can also simultaneously update separate tables or separate subsets of the same table. Optimistic concurrency control comes into play when two threads attempt to edit (update or delete) the same row at the same time. In that situation, the second thread to attempt the edit will fail with a conflict error.
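As an illustration of these rules, the following sketch assumes a hypothetical table `tbl(id INTEGER, x INTEGER)` and two connections opened by the same process:
```sql
-- Connection 1:
BEGIN TRANSACTION;
UPDATE tbl SET x = x + 1 WHERE id = 1;

-- Connection 2, while connection 1 is still open:
BEGIN TRANSACTION;
INSERT INTO tbl VALUES (42, 0);          -- appends never conflict
UPDATE tbl SET x = x + 10 WHERE id = 2;  -- different row: succeeds
UPDATE tbl SET x = x + 10 WHERE id = 1;  -- same row: fails with a transaction conflict error
```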
#### Writing to DuckDB from Multiple Processes {#docs:stable:connect:concurrency::writing-to-duckdb-from-multiple-processes}
Writing to DuckDB from multiple processes is not supported automatically and is not a primary design goal (see [Handling Concurrency](#::handling-concurrency)).
If multiple processes must write to the same file, several design patterns are possible, but would need to be implemented in application logic. For example, each process could acquire a cross-process mutex lock, then open the database in read/write mode and close it when the query is complete. Instead of using a mutex lock, each process could instead retry the connection if another process is already connected to the database (being sure to close the connection upon query completion). Another alternative would be to do multi-process transactions on a MySQL, PostgreSQL, or SQLite database, and use DuckDB's [MySQL](#docs:stable:core_extensions:mysql), [PostgreSQL](#docs:stable:core_extensions:postgres), or [SQLite](#docs:stable:core_extensions:sqlite) extensions to execute analytical queries on that data periodically.
Additional options include writing data to Parquet files and using DuckDB's ability to [read multiple Parquet files](#docs:stable:data:parquet:overview), taking a similar approach with [CSV files](#docs:stable:data:csv:overview), or creating a web server to receive requests and manage reads and writes to DuckDB.
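For instance, in the Parquet-based pattern, each writer process can emit its own files into a shared directory (hypothetical path `exports/`) and a single DuckDB reader can query them all at once:
```sql
-- Read every Parquet file written by the individual processes.
SELECT * FROM read_parquet('exports/*.parquet');
```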
#### Optimistic Concurrency Control {#docs:stable:connect:concurrency::optimistic-concurrency-control}
DuckDB uses [optimistic concurrency control](https://en.wikipedia.org/wiki/Optimistic_concurrency_control), an approach generally considered to be the best fit for read-intensive analytical database systems, as it speeds up read query processing. As a result, any transactions that modify the same rows at the same time will cause a transaction conflict error:
```console
Transaction conflict: cannot update a table that has been altered!
```
> **Tip.** A common workaround when a transaction conflict is encountered is to rerun the transaction.
# Data Import and Export {#data}
## Importing Data {#docs:stable:data:overview}
The first step to using a database system is to insert data into that system.
DuckDB can directly connect to [many popular data sources](#docs:stable:data:data_sources) and offers several data ingestion methods that allow you to easily and efficiently fill up the database.
On this page, we provide an overview of these methods so you can select which one is best suited for your use case.
#### `INSERT` Statements {#docs:stable:data:overview::insert-statements}
`INSERT` statements are the standard way of loading data into a database system. They are suitable for quick prototyping, but should be avoided for bulk loading as they have significant per-row overhead.
```sql
INSERT INTO people VALUES (1, 'Mark');
```
For a more detailed description, see the [page on the `INSERT` statement](#docs:stable:data:insert).
#### File Loading: Relative Paths {#docs:stable:data:overview::file-loading-relative-paths}
Use the configuration option [`file_search_path`](#docs:stable:configuration:overview::local-configuration-options) to configure the “root directories” against which relative paths are expanded.
If `file_search_path` is not set, the working directory is used as the basis for relative paths.
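A minimal sketch, assuming two hypothetical directories `/data/raw` and `/data/staging`:
```sql
-- A comma-separated list of directories to search for files.
SET file_search_path = '/data/raw,/data/staging';
SELECT * FROM 'test.csv';  -- resolved against the directories listed above
```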
#### File Formats {#docs:stable:data:overview::file-formats}
##### CSV Loading {#docs:stable:data:overview::csv-loading}
Data can be efficiently loaded from CSV files using several methods. The simplest is to use the CSV file's name:
```sql
SELECT * FROM 'test.csv';
```
Alternatively, use the [`read_csv` function](#docs:stable:data:csv:overview) to pass along options:
```sql
SELECT * FROM read_csv('test.csv', header = false);
```
Or use the [`COPY` statement](#docs:stable:sql:statements:copy::copy--from):
```sql
COPY tbl FROM 'test.csv' (HEADER false);
```
It is also possible to read data directly from **compressed CSV files** (e.g., compressed with [gzip](https://www.gzip.org/)):
```sql
SELECT * FROM 'test.csv.gz';
```
DuckDB can create a table from the loaded data using the [`CREATE TABLE ... AS SELECT` statement](#docs:stable:sql:statements:create_table::create-table--as-select-ctas):
```sql
CREATE TABLE test AS
SELECT * FROM 'test.csv';
```
For more details, see the [page on CSV loading](#docs:stable:data:csv:overview).
##### Parquet Loading {#docs:stable:data:overview::parquet-loading}
Parquet files can be efficiently loaded and queried using their filename:
```sql
SELECT * FROM 'test.parquet';
```
Alternatively, use the [`read_parquet` function](#docs:stable:data:parquet:overview):
```sql
SELECT * FROM read_parquet('test.parquet');
```
Or use the [`COPY` statement](#docs:stable:sql:statements:copy::copy--from):
```sql
COPY tbl FROM 'test.parquet';
```
For more details, see the [page on Parquet loading](#docs:stable:data:parquet:overview).
##### JSON Loading {#docs:stable:data:overview::json-loading}
JSON files can be efficiently loaded and queried using their filename:
```sql
SELECT * FROM 'test.json';
```
Alternatively, use the [`read_json_auto` function](#docs:stable:data:json:overview):
```sql
SELECT * FROM read_json_auto('test.json');
```
Or use the [`COPY` statement](#docs:stable:sql:statements:copy::copy--from):
```sql
COPY tbl FROM 'test.json';
```
For more details, see the [page on JSON loading](#docs:stable:data:json:overview).
##### Returning the Filename {#docs:stable:data:overview::returning-the-filename}
Since DuckDB v1.3.0, the CSV, JSON and Parquet readers support the `filename` virtual column:
```sql
COPY (FROM (VALUES (42), (43)) t(x)) TO 'test.parquet';
SELECT *, filename FROM 'test.parquet';
```
#### Appender {#docs:stable:data:overview::appender}
In several APIs (C, C++, Go, Java, and Rust), the [Appender](#docs:stable:data:appender) can be used as an alternative for bulk data loading.
This class can be used to efficiently add rows to the database system without using SQL statements.
## Data Sources {#docs:stable:data:data_sources}
DuckDB can read from several data sources, including file formats, network protocols, and database systems:
* [AWS S3 buckets and storage with S3-compatible API](#docs:stable:core_extensions:httpfs:s3api)
* [Azure Blob Storage](#docs:stable:core_extensions:azure)
* [Blob files](#docs:stable:guides:file_formats:read_file::read_blob)
* [Cloudflare R2](#docs:stable:guides:network_cloud_storage:cloudflare_r2_import)
* [CSV](#docs:stable:data:csv:overview)
* [Delta Lake](#docs:stable:core_extensions:delta)
* [Excel](#docs:stable:core_extensions:excel)
* [httpfs](#docs:stable:core_extensions:httpfs:https)
* [Iceberg](#docs:stable:core_extensions:iceberg:overview)
* [JSON](#docs:stable:data:json:overview)
* [MySQL](#docs:stable:core_extensions:mysql)
* [Parquet](#docs:stable:data:parquet:overview)
* [PostgreSQL](#docs:stable:core_extensions:postgres)
* [SQLite](#docs:stable:core_extensions:sqlite)
* [Text files](#docs:stable:guides:file_formats:read_file::read_text)
## CSV Files {#data:csv}
### CSV Import {#docs:stable:data:csv:overview}
#### Examples {#docs:stable:data:csv:overview::examples}
The following examples use the [`flights.csv`](https://duckdb.org/data/flights.csv) file.
Read a CSV file from disk, auto-infer options:
```sql
SELECT * FROM 'flights.csv';
```
Use the `read_csv` function with custom options:
```sql
SELECT *
FROM read_csv('flights.csv',
delim = '|',
header = true,
columns = {
'FlightDate': 'DATE',
'UniqueCarrier': 'VARCHAR',
'OriginCityName': 'VARCHAR',
'DestCityName': 'VARCHAR'
});
```
Read a CSV from stdin, auto-infer options:
```batch
cat flights.csv | duckdb -c "SELECT * FROM read_csv('/dev/stdin')"
```
Read a CSV file into a table:
```sql
CREATE TABLE ontime (
FlightDate DATE,
UniqueCarrier VARCHAR,
OriginCityName VARCHAR,
DestCityName VARCHAR
);
COPY ontime FROM 'flights.csv';
```
Alternatively, create a table without specifying the schema manually using a [`CREATE TABLE ... AS SELECT` statement](#docs:stable:sql:statements:create_table::create-table--as-select-ctas):
```sql
CREATE TABLE ontime AS
SELECT * FROM 'flights.csv';
```
We can use the [`FROM`-first syntax](#docs:stable:sql:query_syntax:from::from-first-syntax) to omit `SELECT *`.
```sql
CREATE TABLE ontime AS
FROM 'flights.csv';
```
#### CSV Loading {#docs:stable:data:csv:overview::csv-loading}
CSV loading, i.e., importing CSV files to the database, is a very common, and yet surprisingly tricky, task. While CSVs seem simple on the surface, there are a lot of inconsistencies found within CSV files that can make loading them a challenge. CSV files come in many different varieties, are often corrupt, and do not have a schema. The CSV reader needs to cope with all of these different situations.
The DuckDB CSV reader can automatically infer which configuration flags to use by analyzing the CSV file using the [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer). This will work correctly in most situations, and should be the first option attempted. In rare situations where the CSV reader cannot figure out the correct configuration, it is possible to manually configure the CSV reader to correctly parse the CSV file. See the [auto detection page](#docs:stable:data:csv:auto_detection) for more information.
#### Parameters {#docs:stable:data:csv:overview::parameters}
Below are parameters that can be passed to the [`read_csv` function](#::csv-functions). Where meaningfully applicable, these parameters can also be passed to the [`COPY` statement](#docs:stable:sql:statements:copy::copy-to).
| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `all_varchar` | Skip type detection and assume all columns are of type `VARCHAR`. This option is only supported by the `read_csv` function. | `BOOL` | `false` |
| `allow_quoted_nulls` | Allow the conversion of quoted values to `NULL` values | `BOOL` | `true` |
| `auto_detect` | [Auto detect CSV parameters](#docs:stable:data:csv:auto_detection). | `BOOL` | `true` |
| `auto_type_candidates` | Types that the sniffer uses when detecting column types. The `VARCHAR` type is always included as a fallback option. See [example](#::auto_type_candidates-details). | `TYPE[]` | [default types](#::auto_type_candidates-details) |
| `buffer_size` | Size of the buffers used to read files, in bytes. Must be large enough to hold four lines and can significantly impact performance. | `BIGINT` | `16 * max_line_size` |
| `columns` | Column names and types, as a struct (e.g., `{'col1': 'INTEGER', 'col2': 'VARCHAR'}`). Using this option disables auto detection of the schema. | `STRUCT` | (empty) |
| `comment` | Character used to initiate comments. Lines starting with a comment character (optionally preceded by space characters) are completely ignored; other lines containing a comment character are parsed only up to that point. | `VARCHAR` | (empty) |
| `compression` | Method used to compress CSV files. By default this is detected automatically from the file extension (e.g., `t.csv.gz` will use gzip, `t.csv` will use `none`). Options are `none`, `gzip`, `zstd`. | `VARCHAR` | `auto` |
| `dateformat` | [Date format](#docs:stable:sql:functions:dateformat) used when parsing and writing dates. | `VARCHAR` | (empty) |
| `date_format` | Alias for `dateformat`; only available in the `COPY` statement. | `VARCHAR` | (empty) |
| `decimal_separator` | Decimal separator for numbers. | `VARCHAR` | `.` |
| `delim` | Delimiter character used to separate columns within each line, e.g., `,` `;` `\t`. The delimiter character can be up to 4 bytes, e.g., 🦆. Alias for `sep`. | `VARCHAR` | `,` |
| `delimiter` | Alias for `delim`; only available in the `COPY` statement. | `VARCHAR` | `,` |
| `escape` | String used to escape the `quote` character within quoted values. | `VARCHAR` | `"` |
| `encoding` | Encoding used by the CSV file. Options are `utf-8`, `utf-16`, `latin-1`. Not available in the `COPY` statement (which always uses `utf-8`). | `VARCHAR` | `utf-8` |
| `filename` | Add path of the containing file to each row, as a string column named `filename`. Relative or absolute paths are returned depending on the path or glob pattern provided to `read_csv`, not just filenames. Since DuckDB v1.3.0, the `filename` column is added automatically as a virtual column and this option is only kept for compatibility reasons. | `BOOL` | `false` |
| `force_not_null` | Do not match values in the specified columns against the `NULL` string. In the default case where the `NULL` string is empty, this means that empty values are read as zero-length strings instead of `NULL`s. | `VARCHAR[]` | `[]` |
| `header` | First line of each file contains the column names. | `BOOL` | `false` |
| `hive_partitioning` | Interpret the path as a [Hive partitioned path](#docs:stable:data:partitioning:hive_partitioning). | `BOOL` | (auto-detected) |
| `ignore_errors` | Ignore any parsing errors encountered. | `BOOL` | `false` |
| `max_line_size` or `maximum_line_size` | Maximum line size, in bytes. Not available in the `COPY` statement. | `BIGINT` | 2000000 |
| `names` or `column_names` | Column names, as a list. See [example](#docs:stable:data:csv:tips::provide-names-if-the-file-does-not-contain-a-header). | `VARCHAR[]` | (empty) |
| `new_line` | New line character(s). Options are `'\r'`, `'\n'`, or `'\r\n'`. The CSV parser only distinguishes between single-character and double-character line delimiters. Therefore, it does not differentiate between `'\r'` and `'\n'`. | `VARCHAR` | (empty) |
| `normalize_names` | Normalize column names. This removes any non-alphanumeric characters from them. Column names that are reserved SQL keywords are prefixed with an underscore character (`_`). | `BOOL` | `false` |
| `null_padding` | Pad the remaining columns on the right with `NULL` values when a line lacks columns. | `BOOL` | `false` |
| `nullstr` or `null` | Strings that represent a `NULL` value. | `VARCHAR` or `VARCHAR[]` | (empty) |
| `parallel` | Use the parallel CSV reader. | `BOOL` | `true` |
| `quote` | String used to quote values. | `VARCHAR` | `"` |
| `rejects_scan` | Name of the [temporary table where information on faulty scans is stored](#docs:stable:data:csv:reading_faulty_csv_files::reject-scans). | `VARCHAR` | `reject_scans` |
| `rejects_table` | Name of the [temporary table where information on faulty lines is stored](#docs:stable:data:csv:reading_faulty_csv_files::reject-errors). | `VARCHAR` | `reject_errors` |
| `rejects_limit` | Upper limit on the number of faulty lines per file that are recorded in the rejects table. Setting this to `0` means that no limit is applied. | `BIGINT` | `0` |
| `sample_size` | Number of sample lines for [auto detection of parameters](#docs:stable:data:csv:auto_detection). | `BIGINT` | 20480 |
| `sep` | Delimiter character used to separate columns within each line, e.g., `,` `;` `\t`. The delimiter character can be up to 4 bytes, e.g., 🦆. Alias for `delim`. | `VARCHAR` | `,` |
| `skip` | Number of lines to skip at the start of each file. | `BIGINT` | 0 |
| `store_rejects` | Skip any lines with errors and store them in the rejects table. | `BOOL` | `false` |
| `strict_mode` | Enforces the strictness level of the CSV Reader. When set to `true`, the parser will throw an error upon encountering any issues. When set to `false`, the parser will attempt to read structurally incorrect files. It is important to note that reading structurally incorrect files can cause ambiguity; therefore, this option should be used with caution. | `BOOL` | `true` |
| `thousands` | Character used to identify thousands separators in numeric values. It must be a single character and different from the `decimal_separator` option.| `VARCHAR` | (empty) |
| `timestampformat` | [Timestamp format](#docs:stable:sql:functions:dateformat) used when parsing and writing timestamps. | `VARCHAR` | (empty) |
| `timestamp_format` | Alias for `timestampformat`; only available in the `COPY` statement. | `VARCHAR` | (empty) |
| `types` or `dtypes` or `column_types` | Column types, as either a list (by position) or a struct (by name). See [example](#docs:stable:data:csv:tips::override-the-types-of-specific-columns). | `VARCHAR[]` or `STRUCT` | (empty) |
| `union_by_name` | Align columns from different files [by column name](#docs:stable:data:multiple_files:combining_schemas::union-by-name) instead of position. Using this option increases memory consumption. | `BOOL` | `false` |
> **Tip.** DuckDB's CSV reader natively supports `UTF-8` (default), `UTF-16`, and `Latin-1` encodings. Many additional `encoding` options are available through the `encodings` extension; for details, see [All Supported Encodings](#docs:stable:core_extensions:encodings::all-supported-encodings).
> To convert files with different encodings, we recommend using the [`iconv` command-line tool](https://linux.die.net/man/1/iconv).
>
> ```batch
> iconv -f ISO-8859-2 -t UTF-8 input.csv > input-utf-8.csv
> ```
##### `auto_type_candidates` Details {#docs:stable:data:csv:overview::auto_type_candidates-details}
The `auto_type_candidates` option lets you specify the data types that should be considered by the CSV reader for [column data type detection](#docs:stable:data:csv:auto_detection::type-detection).
Usage example:
```sql
SELECT * FROM read_csv('csv_file.csv', auto_type_candidates = ['BIGINT', 'DATE']);
```
The default value for the `auto_type_candidates` option is `['SQLNULL', 'BOOLEAN', 'BIGINT', 'DOUBLE', 'TIME', 'DATE', 'TIMESTAMP', 'VARCHAR']`.
#### CSV Functions {#docs:stable:data:csv:overview::csv-functions}
The `read_csv` function automatically attempts to figure out the correct configuration of the CSV reader using the [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer). It also automatically deduces types of columns. If the CSV file has a header, it will use the names found in that header to name the columns. Otherwise, the columns will be named `column0, column1, column2, ...`. An example with the [`flights.csv`](https://duckdb.org/data/flights.csv) file:
```sql
SELECT * FROM read_csv('flights.csv');
```
| FlightDate | UniqueCarrier | OriginCityName | DestCityName |
|------------|---------------|----------------|-----------------|
| 1988-01-01 | AA | New York, NY | Los Angeles, CA |
| 1988-01-02 | AA | New York, NY | Los Angeles, CA |
| 1988-01-03 | AA | New York, NY | Los Angeles, CA |
The path can either be a relative path (relative to the current working directory) or an absolute path.
We can use `read_csv` to create a persistent table as well:
```sql
CREATE TABLE ontime AS
SELECT * FROM read_csv('flights.csv');
DESCRIBE ontime;
```
| column_name | column_type | null | key | default | extra |
|----------------|-------------|------|------|---------|-------|
| FlightDate | DATE | YES | NULL | NULL | NULL |
| UniqueCarrier | VARCHAR | YES | NULL | NULL | NULL |
| OriginCityName | VARCHAR | YES | NULL | NULL | NULL |
| DestCityName | VARCHAR | YES | NULL | NULL | NULL |
The sample size used for the automatic detection of parameters can be adjusted with the `sample_size` option:
```sql
SELECT * FROM read_csv('flights.csv', sample_size = 20_000);
```
If we set `delim` / `sep`, `quote`, `escape`, or `header` explicitly, we can bypass the automatic detection of this particular parameter:
```sql
SELECT * FROM read_csv('flights.csv', header = true);
```
Multiple files can be read at once by providing a glob or a list of files. Refer to the [multiple files section](#docs:stable:data:multiple_files:overview) for more information.
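For example, assuming hypothetical files `flights1.csv` and `flights2.csv` with the same schema:
```sql
-- Glob pattern:
SELECT * FROM read_csv('flights*.csv');
-- Explicit list of files:
SELECT * FROM read_csv(['flights1.csv', 'flights2.csv']);
```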
#### Writing Using the `COPY` Statement {#docs:stable:data:csv:overview::writing-using-the-copy-statement}
The [`COPY` statement](#docs:stable:sql:statements:copy::copy-to) can be used to load data from a CSV file into a table. This statement has the same syntax as the one used in PostgreSQL. To load the data using the `COPY` statement, we must first create a table with the correct schema (which matches the order of the columns in the CSV file and uses types that fit the values in the CSV file). `COPY` detects the CSV's configuration options automatically.
```sql
CREATE TABLE ontime (
flightdate DATE,
uniquecarrier VARCHAR,
origincityname VARCHAR,
destcityname VARCHAR
);
COPY ontime FROM 'flights.csv';
SELECT * FROM ontime;
```
| flightdate | uniquecarrier | origincityname | destcityname |
|------------|---------------|----------------|-----------------|
| 1988-01-01 | AA | New York, NY | Los Angeles, CA |
| 1988-01-02 | AA | New York, NY | Los Angeles, CA |
| 1988-01-03 | AA | New York, NY | Los Angeles, CA |
If we want to manually specify the CSV format, we can do so using the configuration options of `COPY`.
```sql
CREATE TABLE ontime (flightdate DATE, uniquecarrier VARCHAR, origincityname VARCHAR, destcityname VARCHAR);
COPY ontime FROM 'flights.csv' (DELIMITER '|', HEADER);
SELECT * FROM ontime;
```
#### Reading Faulty CSV Files {#docs:stable:data:csv:overview::reading-faulty-csv-files}
DuckDB supports reading erroneous CSV files. For details, see the [Reading Faulty CSV Files page](#docs:stable:data:csv:reading_faulty_csv_files).
#### Order Preservation {#docs:stable:data:csv:overview::order-preservation}
The CSV reader respects the `preserve_insertion_order` [configuration option](#docs:stable:configuration:overview) to [preserve insertion order](#docs:stable:sql:dialect:order_preservation).
When `true` (the default), the order of the rows in the result set returned by the CSV reader is the same as the order of the corresponding lines read from the file(s).
When `false`, there is no guarantee that the order is preserved.
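The option can be toggled like any other configuration setting; a minimal sketch:
```sql
-- Allow DuckDB to return CSV rows in any order.
SET preserve_insertion_order = false;
SELECT * FROM 'flights.csv';
```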
#### Writing CSV Files {#docs:stable:data:csv:overview::writing-csv-files}
DuckDB can write CSV files using the [`COPY ... TO` statement](#docs:stable:sql:statements:copy::copy--to).
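A minimal sketch, exporting the `ontime` table from the examples above to a hypothetical output file:
```sql
COPY ontime TO 'flights_out.csv' (HEADER, DELIMITER '|');
```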
### CSV Auto Detection {#docs:stable:data:csv:auto_detection}
When using `read_csv`, the system tries to automatically infer how to read the CSV file using the [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer).
This step is necessary because CSV files are not self-describing and come in many different dialects. The auto-detection works roughly as follows:
* Detect the dialect of the CSV file (delimiter, quoting rule, escape)
* Detect the types of each of the columns
* Detect whether or not the file has a header row
By default, the system will try to auto-detect all options. However, options can be individually overridden by the user. This can be useful in case the system makes a mistake. For example, if the delimiter is chosen incorrectly, we can override it by calling `read_csv` with an explicit delimiter (e.g., `read_csv('file.csv', delim = '|')`).
#### Sample Size {#docs:stable:data:csv:auto_detection::sample-size}
The type detection works by operating on a sample of the file.
The size of the sample can be modified by setting the `sample_size` parameter.
The default sample size is 20,480 rows.
Setting the `sample_size` parameter to `-1` means the entire file is read for sampling:
```sql
SELECT * FROM read_csv('my_csv_file.csv', sample_size = -1);
```
The way sampling is performed depends on the type of file. If we are reading from a regular file on disk, we will jump into the file and try to sample from different locations in the file.
If we are reading from a file in which we cannot jump, such as a `.gz` compressed CSV file or `stdin`, samples are taken only from the beginning of the file.
#### `sniff_csv` Function {#docs:stable:data:csv:auto_detection::sniff_csv-function}
It is possible to run the CSV sniffer as a separate step using the `sniff_csv(filename)` function, which returns the detected CSV properties as a table with a single row.
The `sniff_csv` function accepts an optional `sample_size` parameter to configure the number of rows sampled.
```sql
FROM sniff_csv('my_file.csv');
FROM sniff_csv('my_file.csv', sample_size = 1000);
```
| Column name | Description | Example |
|--------------------|-----------------------------------------------|-------------------------------------------------------------------|
| `Delimiter` | Delimiter | `,` |
| `Quote` | Quote character | `"` |
| `Escape` | Escape | `\` |
| `NewLineDelimiter` | New-line delimiter | `\r\n` |
| `Comment` | Comment character | `#` |
| `SkipRows` | Number of rows skipped | 1 |
| `HasHeader` | Whether the CSV has a header | `true` |
| `Columns` | Column types encoded as a `LIST` of `STRUCT`s | `({'name': 'VARCHAR', 'age': 'BIGINT'})` |
| `DateFormat` | Date format | `%d/%m/%Y` |
| `TimestampFormat` | Timestamp Format | `%Y-%m-%dT%H:%M:%S.%f` |
| `UserArguments` | Arguments used to invoke `sniff_csv` | `sample_size = 1000` |
| `Prompt` | Prompt ready to be used to read the CSV | `FROM read_csv('my_file.csv', auto_detect=false, delim=',', ...)` |
##### Prompt {#docs:stable:data:csv:auto_detection::prompt}
The `Prompt` column contains a SQL command with the configurations detected by the sniffer.
```sql
-- use line mode in CLI to get the full command
.mode line
SELECT Prompt FROM sniff_csv('my_file.csv');
```
```text
Prompt = FROM read_csv('my_file.csv', auto_detect=false, delim=',', quote='"', escape='"', new_line='\n', skip=0, header=true, columns={...});
```
#### Detection Steps {#docs:stable:data:csv:auto_detection::detection-steps}
##### Dialect Detection {#docs:stable:data:csv:auto_detection::dialect-detection}
Dialect detection works by attempting to parse the samples using the set of considered values. The detected dialect is the dialect that has (1) a consistent number of columns for each row, and (2) the highest number of columns for each row.
The following dialects are considered for automatic dialect detection.
| Parameters | Considered values |
|------------|-----------------------|
| `delim` | `,` `\|` `;` `\t` |
| `quote` | `"` `'` (empty) |
| `escape` | `"` `'` `\` (empty) |
Consider the example file [`flights.csv`](https://duckdb.org/data/flights.csv):
```csv
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
1988-01-03|AA|New York, NY|Los Angeles, CA
```
In this file, the dialect detection works as follows:
* If we split by `|`, every row is split into `4` columns
* If we split by `,`, rows 2-4 are split into `3` columns, while the first row is split into `1` column
* If we split by `;`, every row is split into `1` column
* If we split by `\t`, every row is split into `1` column
In this example, the system selects `|` as the delimiter. All rows are split into the same number of columns, and there is more than one column per row, meaning the delimiter was actually found in the CSV file.
##### Type Detection {#docs:stable:data:csv:auto_detection::type-detection}
After detecting the dialect, the system will attempt to figure out the types of each of the columns. Note that this step is only performed if we are calling `read_csv`. In case of the `COPY` statement the types of the table that we are copying into will be used instead.
The type detection works by attempting to convert the values in each column to the candidate types. If the conversion is unsuccessful, the candidate type is removed from the set of candidate types for that column. After all samples have been handled, the remaining candidate type with the highest priority is chosen. The default set of candidate types is given below, in order of priority:
| Types |
|-------------|
| NULL |
| BOOLEAN |
| TIME |
| DATE |
| TIMESTAMP |
| TIMESTAMPTZ |
| BIGINT |
| DOUBLE |
| VARCHAR |
Everything can be cast to `VARCHAR`, therefore, this type has the lowest priority meaning that all columns are converted to `VARCHAR` as a fallback if they cannot be cast to anything else.
In [`flights.csv`](https://duckdb.org/data/flights.csv) the `FlightDate` column will be cast to a `DATE`, while the other columns will be cast to `VARCHAR`.
The set of candidate types that should be considered by the CSV reader can be specified explicitly using the [`auto_type_candidates`](#docs:stable:data:csv:overview::auto_type_candidates-details) option. `VARCHAR` as the fallback type will always be considered as a candidate type whether you specify it or not.
Here are all additional candidate types that may be specified using the `auto_type_candidates` option, in order of priority:
| Types |
|-----------|
| TINYINT |
| SMALLINT |
| INTEGER |
| DECIMAL |
| FLOAT |
Even though the set of data types that can be automatically detected may appear quite limited, the CSV reader can be configured to read arbitrarily complex types by using the `types` option described in the next section.
Type detection can be entirely disabled by using the `all_varchar` option. If this is set all columns will remain as `VARCHAR` (as they originally occur in the CSV file).
Note that using quote characters vs. no quote characters (e.g., `"42"` and `42`) does not make a difference for type detection.
Quoted fields will not be converted to `VARCHAR`; instead, the sniffer will try to find the type candidate with the highest priority.
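For example, to read every column of `flights.csv` as text without any type detection:
```sql
SELECT * FROM read_csv('flights.csv', all_varchar = true);
```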
###### Overriding Type Detection {#docs:stable:data:csv:auto_detection::overriding-type-detection}
The detected types can be individually overridden using the `types` option. This option accepts either of two forms:
* A list of type definitions (e.g., `types = ['INTEGER', 'VARCHAR', 'DATE']`). This overrides the types of the columns in order of occurrence in the CSV file.
* Alternatively, `types` takes a `name` → `type` map which overrides the types of individual columns (e.g., `types = {'quarter': 'INTEGER'}`).
The set of column types that may be specified using the `types` option is not as limited as the types available for the `auto_type_candidates` option: any valid type definition is acceptable to the `types` option. (To get a valid type definition, use the [`typeof()`](#docs:stable:sql:functions:utility::typeofexpression) function, or use the `column_type` column of the [`DESCRIBE`](#docs:stable:guides:meta:describe) result.)
The `sniff_csv()` function's `Columns` field returns a struct with column names and types that can be used as a basis for overriding types.
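A short sketch of both forms, using `flights.csv` (which has four columns):
```sql
-- Override all columns by position:
SELECT * FROM read_csv('flights.csv', types = ['DATE', 'VARCHAR', 'VARCHAR', 'VARCHAR']);
-- Override a single column by name:
SELECT * FROM read_csv('flights.csv', types = {'FlightDate': 'DATE'});
```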
#### Header Detection {#docs:stable:data:csv:auto_detection::header-detection}
Header detection works by checking if the candidate header row deviates from the other rows in the file in terms of types. For example, in [`flights.csv`](https://duckdb.org/data/flights.csv), we can see that the header row consists of only `VARCHAR` columns, whereas the values contain a `DATE` value for the `FlightDate` column. As such, the system defines the first row as the header row and extracts the column names from the header row.
In files that do not have a header row, the column names are generated as `column0`, `column1`, etc.
Note that headers cannot be detected correctly if all columns are of type `VARCHAR`, as in this case the system cannot distinguish the header row from the other rows in the file. In this case, the system assumes the file has a header. This can be overridden by setting the `header` option to `false`.
##### Dates and Timestamps {#docs:stable:data:csv:auto_detection::dates-and-timestamps}
DuckDB supports the [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601) by default for timestamps, dates, and times. Unfortunately, not all dates and times are formatted using this standard. For that reason, the CSV reader also supports the `dateformat` and `timestampformat` options. Using these options, the user can specify a [format string](#docs:stable:sql:functions:dateformat) that defines how the date or timestamp should be read.
As part of the auto-detection, the system tries to figure out if dates and times are stored in a different representation. This is not always possible, as there are ambiguities in the representation. For example, the date `01-02-2000` can be parsed as either January 2nd or February 1st. Often these ambiguities can be resolved. For example, if we later encounter the date `21-02-2000`, then we know that the format must have been `DD-MM-YYYY`. `MM-DD-YYYY` is no longer possible, as there is no 21st month.
If the ambiguities cannot be resolved by looking at the data the system has a list of preferences for which date format to use. If the system chooses incorrectly, the user can specify the `dateformat` and `timestampformat` options manually.
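For example, for a hypothetical `events.csv` file whose dates are written as `21-02-2000`, the format can be set explicitly:
```sql
SELECT *
FROM read_csv('events.csv', dateformat = '%d-%m-%Y');
```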
The system considers the following formats for dates (`dateformat`). Higher entries are chosen over lower entries in case of ambiguities (i.e., ISO 8601 is preferred over `MM-DD-YYYY`).
| dateformat |
|------------|
| ISO 8601 |
| %y-%m-%d |
| %Y-%m-%d |
| %d-%m-%y |
| %d-%m-%Y |
| %m-%d-%y |
| %m-%d-%Y |
The system considers the following formats for timestamps (`timestampformat`). Higher entries are chosen over lower entries in case of ambiguities.
| timestampformat |
|----------------------|
| ISO 8601 |
| %y-%m-%d %H:%M:%S |
| %Y-%m-%d %H:%M:%S |
| %d-%m-%y %H:%M:%S |
| %d-%m-%Y %H:%M:%S |
| %m-%d-%y %I:%M:%S %p |
| %m-%d-%Y %I:%M:%S %p |
| %Y-%m-%d %H:%M:%S.%f |
### Reading Faulty CSV Files {#docs:stable:data:csv:reading_faulty_csv_files}
CSV files can come in all shapes and forms, with some presenting many errors that make the process of cleanly reading them inherently difficult. To help users read these files, DuckDB supports detailed error messages, the ability to skip faulty lines, and the possibility of storing faulty lines in a temporary table to assist users with a data cleaning step.
#### Structural Errors {#docs:stable:data:csv:reading_faulty_csv_files::structural-errors}
DuckDB supports the detection and skipping of several different structural errors. In this section, we will go over each error with an example.
For the examples, consider the following table:
```sql
CREATE TABLE people (name VARCHAR, birth_date DATE);
```
DuckDB detects the following error types:
* `CAST`: Casting errors occur when a column in the CSV file cannot be cast to the expected schema value. For example, the line `Pedro,The 90s` would cause an error since the string `The 90s` cannot be cast to a date.
* `MISSING COLUMNS`: This error occurs if a line in the CSV file has fewer columns than expected. In our example, we expect two columns; therefore, a row with just one value, e.g., `Pedro`, would cause this error.
* `TOO MANY COLUMNS`: This error occurs if a line in the CSV has more columns than expected. In our example, any line with more than two columns would cause this error, e.g., `Pedro,01-01-1992,pdet`.
* `UNQUOTED VALUE`: Quoted values in CSV lines must always be unquoted at the end; if a quoted value remains quoted throughout, it will cause an error. For example, assuming our scanner uses `quote='"'`, the line `"pedro"holanda, 01-01-1992` would present an unquoted value error.
* `LINE SIZE OVER MAXIMUM`: DuckDB has a parameter that sets the maximum line size a CSV file can have, which by default is set to 2,097,152 bytes. Assuming our scanner is set to `max_line_size = 25`, the line `Pedro Holanda, 01-01-1992` would produce an error, as it exceeds 25 bytes.
* `INVALID ENCODING`: DuckDB's CSV reader supports the UTF-8, UTF-16, and Latin-1 encodings. Lines containing invalid characters will produce an error. For example, the line `pedro\xff\xff, 01-01-1992` would be problematic.
##### Anatomy of a CSV Error {#docs:stable:data:csv:reading_faulty_csv_files::anatomy-of-a-csv-error}
By default, when performing a CSV read, if any structural errors are encountered, the scanner will immediately stop the scanning process and throw the error to the user.
These errors are designed to provide as much information as possible to allow users to evaluate them directly in their CSV file.
This is an example for a full error message:
```console
Conversion Error:
CSV Error on Line: 5648
Original Line: Pedro,The 90s
Error when converting column "birth_date". date field value out of range: "The 90s", expected format is (DD-MM-YYYY)
Column date is being converted as type DATE
This type was auto-detected from the CSV file.
Possible solutions:
* Override the type for this column manually by setting the type explicitly, e.g., types={'birth_date': 'VARCHAR'}
* Set the sample size to a larger value to enable the auto-detection to scan more values, e.g., sample_size=-1
* Use a COPY statement to automatically derive types from an existing table.
file= people.csv
delimiter = , (Auto-Detected)
quote = " (Auto-Detected)
escape = " (Auto-Detected)
new_line = \r\n (Auto-Detected)
header = true (Auto-Detected)
skip_rows = 0 (Auto-Detected)
date_format = (DD-MM-YYYY) (Auto-Detected)
timestamp_format = (Auto-Detected)
null_padding=0
sample_size=20480
ignore_errors=false
all_varchar=0
```
The first block provides us with information regarding where the error occurred, including the line number, the original CSV line, and which field was problematic:
```console
Conversion Error:
CSV Error on Line: 5648
Original Line: Pedro,The 90s
Error when converting column "birth_date". date field value out of range: "The 90s", expected format is (DD-MM-YYYY)
```
The second block provides us with potential solutions:
```console
Column date is being converted as type DATE
This type was auto-detected from the CSV file.
Possible solutions:
* Override the type for this column manually by setting the type explicitly, e.g., types={'birth_date': 'VARCHAR'}
* Set the sample size to a larger value to enable the auto-detection to scan more values, e.g., sample_size=-1
* Use a COPY statement to automatically derive types from an existing table.
```
Since the type of this field was auto-detected, it suggests defining the field as a `VARCHAR` or fully utilizing the dataset for type detection.
Finally, the last block presents some of the options used in the scanner that can cause errors, indicating whether they were auto-detected or manually set by the user.
#### Using the `ignore_errors` Option {#docs:stable:data:csv:reading_faulty_csv_files::using-the-ignore_errors-option}
There are cases where CSV files may have multiple structural errors, and users simply wish to skip these and read the correct data. Reading erroneous CSV files is possible by utilizing the `ignore_errors` option. With this option set, rows containing data that would otherwise cause the CSV parser to generate an error will be ignored. In our example, we will demonstrate a CAST error, but note that any of the errors described in our Structural Error section would cause the faulty line to be skipped.
For example, consider the following CSV file, [`faulty.csv`](https://duckdb.org/data/faulty.csv):
```csv
Pedro,31
Oogie Boogie, three
```
If you read the CSV file, specifying that the first column is a `VARCHAR` and the second column is an `INTEGER`, loading the file would fail, as the string `three` cannot be converted to an `INTEGER`.
For example, the following query will throw a casting error.
```sql
FROM read_csv('faulty.csv', columns = {'name': 'VARCHAR', 'age': 'INTEGER'});
```
However, with `ignore_errors` set, the second row of the file is skipped, outputting only the complete first row. For example:
```sql
FROM read_csv(
'faulty.csv',
columns = {'name': 'VARCHAR', 'age': 'INTEGER'},
ignore_errors = true
);
```
Outputs:
| name | age |
|-------|-----|
| Pedro | 31 |
One should note that the CSV Parser is affected by the projection pushdown optimization. Hence, if we were to select only the name column, both rows would be considered valid, as the casting error on the age would never occur. For example:
```sql
SELECT name
FROM read_csv('faulty.csv', columns = {'name': 'VARCHAR', 'age': 'INTEGER'});
```
Outputs:
| name |
|--------------|
| Pedro |
| Oogie Boogie |
#### Retrieving Faulty CSV Lines {#docs:stable:data:csv:reading_faulty_csv_files::retrieving-faulty-csv-lines}
Being able to read faulty CSV files is important, but for many data cleaning operations, it is also necessary to know exactly which lines are corrupted and what errors the parser discovered on them. For scenarios like these, it is possible to use DuckDB's CSV Rejects Table feature.
By default, this feature creates two temporary tables.
1. `reject_scans`: Stores information regarding the parameters of the CSV scanner.
2. `reject_errors`: Stores information regarding each faulty CSV line and the CSV scanner in which it occurred.
Note that any of the errors described in our Structural Error section will be stored in the rejects tables. Also, if a line has multiple errors, multiple entries will be stored for the same line, one for each error.
##### Reject Scans {#docs:stable:data:csv:reading_faulty_csv_files::reject-scans}
The CSV Reject Scans Table returns the following information:
| Column name | Description | Type |
|:--|:-----|:-|
| `scan_id` | The internal ID used in DuckDB to represent that scanner | `UBIGINT` |
| `file_id` | A scanner might read multiple files, so `file_id` represents a unique file within a scanner | `UBIGINT` |
| `file_path` | The file path | `VARCHAR` |
| `delimiter` | The delimiter used, e.g., `;` | `VARCHAR` |
| `quote` | The quote character used, e.g., `"` | `VARCHAR` |
| `escape` | The escape character used, e.g., `"` | `VARCHAR` |
| `newline_delimiter` | The newline delimiter used, e.g., `\r\n` | `VARCHAR` |
| `skip_rows` | The number of rows skipped from the top of the file | `UINTEGER` |
| `has_header` | If the file has a header | `BOOLEAN` |
| `columns` | The schema of the file (i.e., all column names and types) | `VARCHAR` |
| `date_format` | The format used for date types | `VARCHAR` |
| `timestamp_format` | The format used for timestamp types| `VARCHAR` |
| `user_arguments` | Any extra scanner parameters manually set by the user | `VARCHAR` |
##### Reject Errors {#docs:stable:data:csv:reading_faulty_csv_files::reject-errors}
The CSV Reject Errors Table returns the following information:
| Column name | Description | Type |
|:--|:-----|:-|
| `scan_id` | The internal ID used in DuckDB to represent that scanner, used to join with reject scans tables | `UBIGINT` |
| `file_id` | The file_id represents a unique file in a scanner, used to join with reject scans tables | `UBIGINT` |
| `line` | Line number in the CSV file where the error occurred. | `UBIGINT` |
| `line_byte_position` | Byte position of the start of the line where the error occurred. | `UBIGINT` |
| `byte_position` | Byte position where the error occurred. | `UBIGINT` |
| `column_idx` | If the error happens in a specific column, the index of the column. | `UBIGINT` |
| `column_name` | If the error happens in a specific column, the name of the column. | `VARCHAR` |
| `error_type` | The type of the error that happened. | `ENUM` |
| `csv_line` | The original CSV line. | `VARCHAR` |
| `error_message` | The error message produced by DuckDB. | `VARCHAR` |
#### Parameters {#docs:stable:data:csv:reading_faulty_csv_files::parameters}
The parameters listed below are used in the `read_csv` function to configure the CSV Rejects Table.
| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `store_rejects` | If set to `true`, any errors in the file will be skipped and stored in the default rejects temporary tables. | `BOOLEAN` | `false` |
| `rejects_scan` | Name of the temporary table where information on the scans of faulty CSV files is stored. | `VARCHAR` | `reject_scans` |
| `rejects_table` | Name of the temporary table where information on the faulty lines of a CSV file is stored. | `VARCHAR` | `reject_errors` |
| `rejects_limit` | Upper limit on the number of faulty records from a CSV file that will be recorded in the rejects table. Setting this to `0` means that no limit is applied. | `BIGINT` | `0` |
To store the information of the faulty CSV lines in a rejects table, the user must simply set the `store_rejects` option to true. For example:
```sql
FROM read_csv(
'faulty.csv',
columns = {'name': 'VARCHAR', 'age': 'INTEGER'},
store_rejects = true
);
```
You can then query both the `reject_scans` and `reject_errors` tables, to retrieve information about the rejected tuples. For example:
```sql
FROM reject_scans;
```
Outputs:
| scan_id | file_id | file_path | delimiter | quote | escape | newline_delimiter | skip_rows | has_header | columns | date_format | timestamp_format | user_arguments |
|---------|---------|-----------------------------------|-----------|-------|--------|-------------------|-----------|-----------:|--------------------------------------|-------------|------------------|--------------------|
| 5 | 0 | faulty.csv | , | " | " | \n | 0 | false | {'name': 'VARCHAR','age': 'INTEGER'} | | | store_rejects=true |
```sql
FROM reject_errors;
```
Outputs:
| scan_id | file_id | line | line_byte_position | byte_position | column_idx | column_name | error_type | csv_line | error_message |
|---------|---------|------|--------------------|---------------|------------|-------------|------------|---------------------|------------------------------------------------------------------------------------|
| 5 | 0 | 2 | 10 | 23 | 2 | age | CAST | Oogie Boogie, three | Error when converting column "age". Could not convert string " three" to 'INTEGER' |
### CSV Import Tips {#docs:stable:data:csv:tips}
Below is a collection of tips to help when attempting to import complex CSV files. In the examples, we use the [`flights.csv`](https://duckdb.org/data/flights.csv) file.
#### Override the Header Flag if the Header Is Not Correctly Detected {#docs:stable:data:csv:tips::override-the-header-flag-if-the-header-is-not-correctly-detected}
If a file contains only string columns, the `header` auto-detection might fail. Provide the `header` option to override this behavior.
```sql
SELECT * FROM read_csv('flights.csv', header = true);
```
#### Provide Names if the File Does Not Contain a Header {#docs:stable:data:csv:tips::provide-names-if-the-file-does-not-contain-a-header}
If the file does not contain a header, names will be auto-generated by default. You can provide your own names with the `names` option.
```sql
SELECT * FROM read_csv('flights.csv', names = ['DateOfFlight', 'CarrierName']);
```
#### Override the Types of Specific Columns {#docs:stable:data:csv:tips::override-the-types-of-specific-columns}
The `types` flag can be used to override types of only certain columns by providing a struct of `name` → `type` mappings.
```sql
SELECT * FROM read_csv('flights.csv', types = {'FlightDate': 'DATE'});
```
#### Use `COPY` When Loading Data into a Table {#docs:stable:data:csv:tips::use-copy-when-loading-data-into-a-table}
The [`COPY` statement](#docs:stable:sql:statements:copy) copies data directly into a table. The CSV reader uses the schema of the table instead of auto-detecting types from the file. This speeds up the auto-detection, and prevents mistakes from being made during auto-detection.
```sql
COPY tbl FROM 'test.csv';
```
#### Use `union_by_name` When Loading Files with Different Schemas {#docs:stable:data:csv:tips::use-union_by_name-when-loading-files-with-different-schemas}
The `union_by_name` option can be used to unify the schema of files that have different or missing columns. For files that do not have certain columns, `NULL` values are filled in.
```sql
SELECT * FROM read_csv('flights*.csv', union_by_name = true);
```
To load data into _an existing table_ where the table has more columns than the CSV file, you can use the [`INSERT INTO ... BY NAME` clause](#docs:stable:sql:statements:insert::insert-into--by-name):
```sql
INSERT INTO tbl BY NAME
SELECT * FROM read_csv('input.csv');
```
#### Sample Size {#docs:stable:data:csv:tips::sample-size}
If the [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer) is not detecting the correct type, try increasing the sample size.
The option `sample_size = -1` forces the sniffer to read the entire file:
```sql
SELECT * FROM read_csv('my_csv_file.csv', sample_size = -1);
```
## JSON Files {#data:json}
### JSON Overview {#docs:stable:data:json:overview}
DuckDB supports SQL functions that are useful for reading values from existing JSON and creating new JSON data.
JSON is supported with the `json` extension, which is shipped with most DuckDB distributions and is auto-loaded on first use.
If you would like to install or load it manually, please consult the [“Installing and Loading” page](#docs:stable:data:json:installing_and_loading).
#### About JSON {#docs:stable:data:json:overview::about-json}
JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values).
While it is not a very efficient format for tabular data, it is very commonly used, especially as a data interchange format.
#### JSONPath and JSON Pointer Syntax {#docs:stable:data:json:overview::jsonpath-and-json-pointer-syntax}
DuckDB implements multiple interfaces for JSON extraction: [JSONPath](https://goessner.net/articles/JsonPath/) and [JSON Pointer](https://datatracker.ietf.org/doc/html/rfc6901). Both of them work with the arrow operator (`->`) and the `json_extract` function call.
Note that DuckDB only supports lookups in JSONPath, i.e., extracting fields with `.` or array elements with `[]`.
Arrays can be indexed from the back and both approaches support the wildcard `*`.
DuckDB does _not_ support the full JSONPath syntax because SQL is readily available for any further transformations.
> It's best to pick either the JSONPath or the JSON Pointer syntax and use it in your entire application.
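A minimal sketch contrasting the two syntaxes on the same JSON literal; both expressions extract the first array element:
```sql
SELECT '{"family": "anatidae", "species": ["duck", "goose"]}'->'$.species[0]' AS json_path,
       '{"family": "anatidae", "species": ["duck", "goose"]}'->'/species/0'   AS json_pointer;
```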
#### Indexing {#docs:stable:data:json:overview::indexing}
> **Warning.** Following [PostgreSQL's conventions](#docs:stable:sql:dialect:postgresql_compatibility), DuckDB uses 1-based indexing for its [`ARRAY`](#docs:stable:sql:data_types:array) and [`LIST`](#docs:stable:sql:data_types:list) data types but [0-based indexing for the JSON data type](https://www.postgresql.org/docs/17/functions-json.html#FUNCTIONS-JSON-PROCESSING).
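A small sketch of the difference; both expressions below return the first element:
```sql
SELECT '["a", "b", "c"]'->'$[0]' AS json_element,  -- JSON: 0-based
       (['a', 'b', 'c'])[1]      AS list_element;  -- LIST: 1-based
```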
#### Examples {#docs:stable:data:json:overview::examples}
##### Loading JSON {#docs:stable:data:json:overview::loading-json}
Read a JSON file from disk, auto-infer options:
```sql
SELECT * FROM 'todos.json';
```
Use the `read_json` function with custom options:
```sql
SELECT *
FROM read_json('todos.json',
format = 'array',
columns = {userId: 'UBIGINT',
id: 'UBIGINT',
title: 'VARCHAR',
completed: 'BOOLEAN'});
```
Read a JSON file from stdin, auto-infer options:
```batch
cat data/json/todos.json | duckdb -c "SELECT * FROM read_json('/dev/stdin')"
```
Read a JSON file into a table:
```sql
CREATE TABLE todos (userId UBIGINT, id UBIGINT, title VARCHAR, completed BOOLEAN);
COPY todos FROM 'todos.json' (AUTO_DETECT true);
```
Alternatively, create a table without specifying the schema manually with a [`CREATE TABLE ... AS SELECT` clause](#docs:stable:sql:statements:create_table::create-table--as-select-ctas):
```sql
CREATE TABLE todos AS
SELECT * FROM 'todos.json';
```
Since DuckDB v1.3.0, the JSON reader returns the `filename` virtual column:
```sql
SELECT filename, *
FROM 'todos-*.json';
```
##### Writing JSON {#docs:stable:data:json:overview::writing-json}
Write the result of a query to a JSON file:
```sql
COPY (SELECT * FROM todos) TO 'todos.json';
```
##### JSON Data Type {#docs:stable:data:json:overview::json-data-type}
Create a table with a column for storing JSON data and insert data into it:
```sql
CREATE TABLE example (j JSON);
INSERT INTO example VALUES
('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');
```
##### Retrieving JSON Data {#docs:stable:data:json:overview::retrieving-json-data}
Retrieve the family key's value:
```sql
SELECT j.family FROM example;
```
```text
"anatidae"
```
Extract the family key's value with a [JSONPath](https://goessner.net/articles/JsonPath/) expression as `JSON`:
```sql
SELECT j->'$.family' FROM example;
```
```text
"anatidae"
```
Extract the family key's value with a [JSONPath](https://goessner.net/articles/JsonPath/) expression as a `VARCHAR`:
```sql
SELECT j->>'$.family' FROM example;
```
```text
anatidae
```
##### Using Quotes for Special Characters {#docs:stable:data:json:overview::using-quotes-for-special-characters}
JSON object keys that contain the special `[` and `.` characters can be used by surrounding them with double quotes (`"`):
```sql
SELECT '{"d[u]._\"ck":42}'->'$."d[u]._\"ck"' AS v;
```
```text
42
```
### Creating JSON {#docs:stable:data:json:creating_json}
#### JSON Creation Functions {#docs:stable:data:json:creating_json::json-creation-functions}
The following functions are used to create JSON.
| Function | Description |
|:--|:----|
| `to_json(any)` | Create `JSON` from a value of `any` type. Our `LIST` is converted to a JSON array, and our `STRUCT` and `MAP` are converted to a JSON object. |
| `json_quote(any)` | Alias for `to_json`. |
| `array_to_json(list)` | Alias for `to_json` that only accepts `LIST`. |
| `row_to_json(list)` | Alias for `to_json` that only accepts `STRUCT`. |
| `json_array(any, ...)` | Create a JSON array from the values in the argument lists. |
| `json_object(key, value, ...)` | Create a JSON object from `key`, `value` pairs in the argument list. Requires an even number of arguments. |
| `json_merge_patch(json, json)` | Merge two JSON documents together. |
Examples:
```sql
SELECT to_json('duck');
```
```text
"duck"
```
```sql
SELECT to_json([1, 2, 3]);
```
```text
[1,2,3]
```
```sql
SELECT to_json({duck : 42});
```
```text
{"duck":42}
```
```sql
SELECT to_json(MAP(['duck'], [42]));
```
```text
{"duck":42}
```
```sql
SELECT json_array('duck', 42, 'goose', 123);
```
```text
["duck",42,"goose",123]
```
```sql
SELECT json_object('duck', 42, 'goose', 123);
```
```text
{"duck":42,"goose":123}
```
```sql
SELECT json_merge_patch('{"duck": 42}', '{"goose": 123}');
```
```text
{"goose":123,"duck":42}
```
### Loading JSON {#docs:stable:data:json:loading_json}
The DuckDB JSON reader can automatically infer which configuration flags to use by analyzing the JSON file. This will work correctly in most situations, and should be the first option attempted. In rare situations where the JSON reader cannot figure out the correct configuration, it is possible to manually configure the JSON reader to correctly parse the JSON file.
#### The `read_json` Function {#docs:stable:data:json:loading_json::the-read_json-function}
The `read_json` function is the simplest method of loading JSON files: it automatically attempts to figure out the correct configuration of the JSON reader. It also automatically deduces types of columns.
In the following example, we use the [`todos.json`](https://duckdb.org/data/json/todos.json) file,
```sql
SELECT *
FROM read_json('todos.json')
LIMIT 5;
```
| userId | id | title | completed |
|-------:|---:|-----------------------------------------------------------------|-----------|
| 1 | 1 | delectus aut autem | false |
| 1 | 2 | quis ut nam facilis et officia qui | false |
| 1 | 3 | fugiat veniam minus | false |
| 1 | 4 | et porro tempora | true |
| 1 | 5 | laboriosam mollitia et enim quasi adipisci quia provident illum | false |
We can use `read_json` to create a persistent table as well:
```sql
CREATE TABLE todos AS
SELECT *
FROM read_json('todos.json');
DESCRIBE todos;
```
| column_name | column_type | null | key | default | extra |
|-------------|-------------|------|------|---------|-------|
| userId | UBIGINT | YES | NULL | NULL | NULL |
| id | UBIGINT | YES | NULL | NULL | NULL |
| title | VARCHAR | YES | NULL | NULL | NULL |
| completed | BOOLEAN | YES | NULL | NULL | NULL |
If we specify types for a subset of columns, `read_json` excludes columns that we don't specify:
```sql
SELECT *
FROM read_json(
'todos.json',
columns = {userId: 'UBIGINT', completed: 'BOOLEAN'}
)
LIMIT 5;
```
Note that only the `userId` and `completed` columns are shown:
| userId | completed |
|-------:|----------:|
| 1 | false |
| 1 | false |
| 1 | false |
| 1 | true |
| 1 | false |
Multiple files can be read at once by providing a glob or a list of files. Refer to the [multiple files section](#docs:stable:data:multiple_files:overview) for more information.
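For example, a sketch assuming two hypothetical files with compatible schemas:
```sql
SELECT *
FROM read_json(['todos1.json', 'todos2.json']);
```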
#### Functions for Reading JSON Objects {#docs:stable:data:json:loading_json::functions-for-reading-json-objects}
The following table functions are used to read JSON:
| Function | Description |
|:---|:---|
| `read_json_objects(filename)` | Read a JSON object from `filename`, where `filename` can also be a list of files or a glob pattern. |
| `read_ndjson_objects(filename)` | Alias for `read_json_objects` with the parameter `format` set to `newline_delimited`. |
| `read_json_objects_auto(filename)` | Alias for `read_json_objects` with the parameter `format` set to `auto`. |
##### Parameters {#docs:stable:data:json:loading_json::parameters}
These functions have the following parameters:
| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `compression` | The compression type for the file. By default this will be detected automatically from the file extension (e.g., `t.json.gz` will use gzip, `t.json` will use none). Options are `none`, `gzip`, `zstd` and `auto_detect`. | `VARCHAR` | `auto_detect` |
| `filename` | Whether or not an extra `filename` column should be included in the result. Since DuckDB v1.3.0, the `filename` column is added automatically as a virtual column and this option is only kept for compatibility reasons. | `BOOL` | `false` |
| `format` | Can be one of `auto`, `unstructured`, `newline_delimited` and `array`. | `VARCHAR` | `array` |
| `hive_partitioning` | Whether or not to interpret the path as a [Hive partitioned path](#docs:stable:data:partitioning:hive_partitioning). | `BOOL` | (auto-detected) |
| `ignore_errors` | Whether to ignore parse errors (only possible when `format` is `newline_delimited`). | `BOOL` | `false` |
| `maximum_sample_files` | The maximum number of JSON files sampled for auto-detection. | `BIGINT` | `32` |
| `maximum_object_size` | The maximum size of a JSON object (in bytes). | `UINTEGER` | `16777216` |
The `format` parameter specifies how to read the JSON from a file.
With `unstructured`, the top-level JSON is read, e.g., for `birds.json`:
```json
{
    "duck": 42
}
{
    "goose": [1, 2, 3]
}
```
```sql
FROM read_json_objects('birds.json', format = 'unstructured');
```
will result in two objects being read:
```text
┌──────────────────────────────┐
│             json             │
│             json             │
├──────────────────────────────┤
│ {\n    "duck": 42\n}         │
│ {\n    "goose": [1, 2, 3]\n} │
└──────────────────────────────┘
```
With `newline_delimited`, [NDJSON](https://github.com/ndjson/ndjson-spec) is read, where each JSON is separated by a newline (` \n`), e.g., for `birds-nd.json`:
```json
{"duck": 42}
{"goose": [1, 2, 3]}
```
```sql
FROM read_json_objects('birds-nd.json', format = 'newline_delimited');
```
will also result in two objects being read:
```text
┌──────────────────────┐
│         json         │
│         json         │
├──────────────────────┤
│ {"duck": 42}         │
│ {"goose": [1, 2, 3]} │
└──────────────────────┘
```
With `array`, each array element is read, e.g., for `birds-array.json`:
```json
[
    {
        "duck": 42
    },
    {
        "goose": [1, 2, 3]
    }
]
```
```sql
FROM read_json_objects('birds-array.json', format = 'array');
```
will again result in two objects being read:
```text
┌──────────────────────────────────────┐
│                 json                 │
│                 json                 │
├──────────────────────────────────────┤
│ {\n        "duck": 42\n    }         │
│ {\n        "goose": [1, 2, 3]\n    } │
└──────────────────────────────────────┘
```
#### Functions for Reading JSON as a Table {#docs:stable:data:json:loading_json::functions-for-reading-json-as-a-table}
DuckDB also supports reading JSON as a table, using the following functions:
| Function | Description |
|:---------|:----------------|
| `read_json(filename)` | Read JSON from `filename`, where `filename` can also be a list of files, or a glob pattern. |
| `read_json_auto(filename)` | Alias for `read_json`. |
| `read_ndjson(filename)` | Alias for `read_json` with parameter `format` set to `newline_delimited`. |
| `read_ndjson_auto(filename)` | Alias for `read_json` with parameter `format` set to `newline_delimited`. |
##### Parameters {#docs:stable:data:json:loading_json::parameters}
Besides the `maximum_object_size`, `format`, `ignore_errors`, and `compression` parameters listed above, these functions accept the following additional parameters:
| Name | Description | Type | Default |
|:--|:------|:-|:-|
| `auto_detect` | Whether to auto-detect the names of the keys and data types of the values automatically | `BOOL` | `true` |
| `columns` | A struct that specifies the key names and value types contained within the JSON file (e.g., `{key1: 'INTEGER', key2: 'VARCHAR'}`). If `auto_detect` is enabled these will be inferred | `STRUCT` | `(empty)` |
| `dateformat` | Specifies the date format to use when parsing dates. See [Date Format](#docs:stable:sql:functions:dateformat) | `VARCHAR` | `iso` |
| `maximum_depth` | Maximum nesting depth to which the automatic schema detection detects types. Set to -1 to fully detect nested JSON types | `BIGINT` | `-1` |
| `records` | Can be one of `auto`, `true`, `false` | `VARCHAR` | `auto` |
| `sample_size` | Option to define number of sample objects for automatic JSON type detection. Set to -1 to scan the entire input file | `UBIGINT` | `20480` |
| `timestampformat` | Specifies the date format to use when parsing timestamps. See [Date Format](#docs:stable:sql:functions:dateformat) | `VARCHAR` | `iso`|
| `union_by_name` | Whether the schemas of multiple JSON files should be [unified](#docs:stable:data:multiple_files:combining_schemas) | `BOOL` | `false` |
| `map_inference_threshold` | Controls the threshold for number of columns whose schema will be auto-detected; if JSON schema auto-detection would infer a `STRUCT` type for a field that has _more_ than this threshold number of subfields, it infers a `MAP` type instead. Set to `-1` to disable `MAP` inference. | `BIGINT` | `200` |
| `field_appearance_threshold` | The JSON reader divides the number of appearances of each JSON field by the auto-detection sample size. If the average over the fields of an object is less than this threshold, it will default to using a `MAP` type with value type of merged field types. | `DOUBLE` | `0.1` |
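As a brief sketch of the `columns` and `dateformat` parameters (assuming a hypothetical newline-delimited file `events.json` whose `day` field uses a day-first date format):
```sql
SELECT *
FROM read_json('events.json',
    format = 'newline_delimited',
    columns = {event: 'VARCHAR', day: 'DATE'},
    dateformat = '%d/%m/%Y');
```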
Note that DuckDB can convert JSON arrays directly to its internal `LIST` type, and missing keys become `NULL`:
```sql
SELECT *
FROM read_json(
['birds1.json', 'birds2.json'],
columns = {duck: 'INTEGER', goose: 'INTEGER[]', swan: 'DOUBLE'}
);
```
| duck | goose | swan |
|-----:|-----------|-----:|
| 42 | [1, 2, 3] | NULL |
| 43 | [4, 5, 6] | 3.3 |
DuckDB can automatically detect the types like so:
```sql
SELECT goose, duck FROM read_json('*.json.gz');
SELECT goose, duck FROM '*.json.gz'; -- equivalent
```
DuckDB can read (and auto-detect) a variety of formats, specified with the `format` parameter.
Querying a JSON file that contains an `array`, e.g.:
```json
[
    {
        "duck": 42,
        "goose": 4.2
    },
    {
        "duck": 43,
        "goose": 4.3
    }
]
```
Can be queried exactly the same as a JSON file that contains `unstructured` JSON, e.g.:
```json
{
    "duck": 42,
    "goose": 4.2
}
{
    "duck": 43,
    "goose": 4.3
}
```
Both can be read with the same query, producing the following table:
```sql
SELECT *
FROM read_json('birds.json');
```
| duck | goose |
|-----:|------:|
| 42 | 4.2 |
| 43 | 4.3 |
If your JSON file does not contain “records”, i.e., it contains JSON values other than objects, DuckDB can still read it.
This is specified with the `records` parameter.
The `records` parameter specifies whether the JSON contains records that should be unpacked into individual columns.
DuckDB also attempts to auto-detect this.
For example, take the following file, `birds-records.json`:
```json
{"duck": 42, "goose": [1, 2, 3]}
{"duck": 43, "goose": [4, 5, 6]}
```
```sql
SELECT *
FROM read_json('birds-records.json');
```
The query results in two columns:
| duck | goose |
|-----:|:--------|
| 42 | [1,2,3] |
| 43 | [4,5,6] |
You can read the same file with `records` set to `false`, to get a single column, which is a `STRUCT` containing the data:
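```sql
SELECT *
FROM read_json('birds-records.json', records = false);
```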
| json |
|:-----|
| {'duck': 42, 'goose': [1,2,3]} |
| {'duck': 43, 'goose': [4,5,6]} |
For additional examples reading more complex data, please see the [“Shredding Deeply Nested JSON, One Vector at a Time” blog post](https://duckdb.org/2023/03/03/json).
#### Loading with the `COPY` Statement Using `FORMAT json` {#docs:stable:data:json:loading_json::loading-with-the-copy-statement-using-format-json}
When the `json` extension is installed, `FORMAT json` is supported for `COPY FROM`, `IMPORT DATABASE`, as well as `COPY TO` and `EXPORT DATABASE`. See the [`COPY` statement](#docs:stable:sql:statements:copy) and the [`IMPORT` / `EXPORT` clauses](#docs:stable:sql:statements:export).
By default, `COPY` expects newline-delimited JSON. If you prefer copying data to/from a JSON array, you can specify `ARRAY true`, e.g.,
```sql
COPY (SELECT * FROM range(5) r(i))
TO 'numbers.json' (ARRAY true);
```
will create the following file:
```json
[
{"i":0},
{"i":1},
{"i":2},
{"i":3},
{"i":4}
]
```
This can be read back to DuckDB as follows:
```sql
CREATE TABLE numbers (i BIGINT);
COPY numbers FROM 'numbers.json' (ARRAY true);
```
The format can also be detected automatically:
```sql
CREATE TABLE numbers (i BIGINT);
COPY numbers FROM 'numbers.json' (AUTO_DETECT true);
```
We can also create a table from the auto-detected schema:
```sql
CREATE TABLE numbers AS
FROM 'numbers.json';
```
##### Parameters {#docs:stable:data:json:loading_json::parameters}
| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `auto_detect` | Whether to auto-detect the names of the keys and data types of the values automatically | `BOOL` | `false` |
| `columns` | A struct that specifies the key names and value types contained within the JSON file (e.g., `{key1: 'INTEGER', key2: 'VARCHAR'}`). If `auto_detect` is enabled these will be inferred | `STRUCT` | `(empty)` |
| `compression` | The compression type for the file. By default this will be detected automatically from the file extension (e.g., `t.json.gz` will use gzip, `t.json` will use none). Options are `uncompressed`, `gzip`, `zstd` and `auto_detect`. | `VARCHAR` | `auto_detect` |
| `convert_strings_to_integers` | Whether strings representing integer values should be converted to a numerical type. | `BOOL` | `false` |
| `dateformat` | Specifies the date format to use when parsing dates. See [Date Format](#docs:stable:sql:functions:dateformat) | `VARCHAR` | `iso` |
| `filename` | Whether or not an extra `filename` column should be included in the result. | `BOOL` | `false` |
| `format` | Can be one of `auto`, `unstructured`, `newline_delimited` and `array` | `VARCHAR` | `array` |
| `hive_partitioning` | Whether or not to interpret the path as a [Hive partitioned path](#docs:stable:data:partitioning:hive_partitioning). | `BOOL` | `false` |
| `ignore_errors` | Whether to ignore parse errors (only possible when `format` is `newline_delimited`) | `BOOL` | `false` |
| `maximum_depth` | Maximum nesting depth to which the automatic schema detection detects types. Set to -1 to fully detect nested JSON types | `BIGINT` | `-1` |
| `maximum_object_size` | The maximum size of a JSON object (in bytes) | `UINTEGER` | `16777216` |
| `records` | Can be one of `auto`, `true`, `false` | `VARCHAR` | `auto` |
| `sample_size` | Option to define number of sample objects for automatic JSON type detection. Set to -1 to scan the entire input file | `UBIGINT` | `20480` |
| `timestampformat` | Specifies the date format to use when parsing timestamps. See [Date Format](#docs:stable:sql:functions:dateformat) | `VARCHAR` | `iso`|
| `union_by_name` | Whether the schemas of multiple JSON files should be [unified](#docs:stable:data:multiple_files:combining_schemas). | `BOOL` | `false` |
### Writing JSON {#docs:stable:data:json:writing_json}
The contents of tables or the result of queries can be written directly to a JSON file using the `COPY` statement.
For example:
```sql
CREATE TABLE cities AS
FROM (VALUES ('Amsterdam', 1), ('London', 2)) cities(name, id);
COPY cities TO 'cities.json';
```
This will result in `cities.json` with the following content:
```json
{"name":"Amsterdam","id":1}
{"name":"London","id":2}
```
See the [`COPY` statement](#docs:stable:sql:statements:copy::copy-to) for more information.
### JSON Type {#docs:stable:data:json:json_type}
DuckDB supports `json` via the `JSON` logical type.
The `JSON` logical type is interpreted as JSON, i.e., parsed, in JSON functions rather than interpreted as `VARCHAR`, i.e., a regular string (modulo the equality-comparison caveat at the bottom of this page).
All JSON creation functions return values of this type.
We also allow any of DuckDB's types to be cast to JSON, and JSON to be cast back to any of DuckDB's types. For example, to cast `JSON` to DuckDB's `STRUCT` type, run:
```sql
SELECT '{"duck": 42}'::JSON::STRUCT(duck INTEGER);
```
```text
{'duck': 42}
```
And back:
```sql
SELECT {duck: 42}::JSON;
```
```text
{"duck":42}
```
This works for our nested types as shown in the example, but also for non-nested types:
```sql
SELECT '2023-05-12'::DATE::JSON;
```
```text
"2023-05-12"
```
The only exception to this behavior is the cast from `VARCHAR` to `JSON`, which does not alter the data, but instead parses and validates the contents of the `VARCHAR` as JSON.
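For example, a small sketch; `TRY_CAST` is used here to show a failed validation without raising an error:
```sql
SELECT '{"duck": 42}'::JSON AS valid_json;          -- parsed and validated, contents kept as-is
SELECT TRY_CAST('{"duck": 42' AS JSON) AS invalid;  -- not valid JSON, so this yields NULL
```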
### JSON Processing Functions {#docs:stable:data:json:json_functions}
#### JSON Extraction Functions {#docs:stable:data:json:json_functions::json-extraction-functions}
There are several extraction functions; `json_extract` and `json_extract_string` have corresponding operators. The operators can only be used if the string is stored as the `JSON` logical type.
These functions support the same two location notations as the [JSON Scalar functions](#::json-scalar-functions).
| Function | Alias | Operator | Description |
| :-------------------------------- | :----------------------- | :------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `json_exists(json, path)` | | | Returns `true` if the supplied path exists in the `json`, and `false` otherwise. |
| `json_extract(json, path)` | `json_extract_path` | `->` | Extracts `JSON` from `json` at the given `path`. If `path` is a `LIST`, the result will be a `LIST` of `JSON`. |
| `json_extract_string(json, path)` | `json_extract_path_text` | `->>` | Extracts `VARCHAR` from `json` at the given `path`. If `path` is a `LIST`, the result will be a `LIST` of `VARCHAR`. |
| `json_value(json, path)` | | | Extracts `JSON` from `json` at the given `path`. If the `json` at the supplied path is not a scalar value, it will return `NULL`. |
Note that the arrow operator `->`, which is used for JSON extracts, has a low precedence as it is also used in [lambda functions](#docs:stable:sql:functions:lambda). Therefore, you need to surround the `->` operator with parentheses when expressing operations such as equality comparisons (` =`).
For example:
```sql
SELECT ((JSON '{"field": 42}')->'field') = 42;
```
> **Warning.** DuckDB's JSON data type uses [0-based indexing](#docs:stable:data:json:overview::indexing).
Examples:
```sql
CREATE TABLE example (j JSON);
INSERT INTO example VALUES
('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');
```
```sql
SELECT json_extract(j, '$.family') FROM example;
```
```text
"anatidae"
```
```sql
SELECT j->'$.family' FROM example;
```
```text
"anatidae"
```
```sql
SELECT j->'$.species[0]' FROM example;
```
```text
"duck"
```
```sql
SELECT j->'$.species[*]' FROM example;
```
```text
["duck", "goose", "swan", null]
```
```sql
SELECT j->>'$.species[*]' FROM example;
```
```text
[duck, goose, swan, null]
```
```sql
SELECT j->'$.species'->0 FROM example;
```
```text
"duck"
```
```sql
SELECT j->'species'->['/0', '/1'] FROM example;
```
```text
['"duck"', '"goose"']
```
```sql
SELECT json_extract_string(j, '$.family') FROM example;
```
```text
anatidae
```
```sql
SELECT j->>'$.family' FROM example;
```
```text
anatidae
```
```sql
SELECT j->>'$.species[0]' FROM example;
```
```text
duck
```
```sql
SELECT j->'species'->>0 FROM example;
```
```text
duck
```
```sql
SELECT j->'species'->>['/0', '/1'] FROM example;
```
```text
[duck, goose]
```
Note that DuckDB's JSON data type uses [0-based indexing](#docs:stable:data:json:overview::indexing).
If multiple values need to be extracted from the same JSON, it is more efficient to extract a list of paths.
The following query causes the JSON to be parsed twice, resulting in a slower query that uses more memory:
```sql
SELECT
json_extract(j, 'family') AS family,
json_extract(j, 'species') AS species
FROM example;
```
| family | species |
| ---------- | ---------------------------- |
| "anatidae" | ["duck","goose","swan",null] |
The following produces the same result but is faster and more memory-efficient:
```sql
WITH extracted AS (
SELECT json_extract(j, ['family', 'species']) AS extracted_list
FROM example
)
SELECT
extracted_list[1] AS family,
extracted_list[2] AS species
FROM extracted;
```
#### JSON Scalar Functions {#docs:stable:data:json:json_functions::json-scalar-functions}
The following scalar JSON functions can be used to gain information about the stored JSON values.
With the exception of `json_valid(json)`, all JSON functions produce an error when invalid JSON is supplied.
We support two kinds of notations to describe locations within JSON: [JSON Pointer](https://datatracker.ietf.org/doc/html/rfc6901) and JSONPath.
| Function | Description |
| :------------------------------------------ | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `json_array_length(json[, path])` | Return the number of elements in the JSON array `json`, or `0` if it is not a JSON array. If `path` is specified, return the number of elements in the JSON array at the given `path`. If `path` is a `LIST`, the result will be `LIST` of array lengths. |
| `json_contains(json_haystack, json_needle)` | Returns `true` if `json_needle` is contained in `json_haystack`. Both parameters are of JSON type, but `json_needle` can also be a numeric value or a string; however, the string must be wrapped in double quotes. |
| `json_keys(json[, path])` | Returns the keys of `json` as a `LIST` of `VARCHAR`, if `json` is a JSON object. If `path` is specified, return the keys of the JSON object at the given `path`. If `path` is a `LIST`, the result will be `LIST` of `LIST` of `VARCHAR`. |
| `json_structure(json)` | Return the structure of `json`. Defaults to `JSON` if the structure is inconsistent (e.g., incompatible types in an array). |
| `json_type(json[, path])` | Return the type of the supplied `json`, which is one of `ARRAY`, `BIGINT`, `BOOLEAN`, `DOUBLE`, `OBJECT`, `UBIGINT`, `VARCHAR`, and `NULL`. If `path` is specified, return the type of the element at the given `path`. If `path` is a `LIST`, the result will be `LIST` of types. |
| `json_valid(json)` | Return whether `json` is valid JSON. |
| `json(json)` | Parse and minify `json`. |
The JSON Pointer syntax separates each field with a `/`.
For example, to extract the first element of the array with key `duck`, you can do:
```sql
SELECT json_extract('{"duck": [1, 2, 3]}', '/duck/0');
```
```text
1
```
The JSONPath syntax always starts with `$`, separates fields with a `.`, and accesses array elements with `[i]`. Using the same example, we can do the following:
```sql
SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[0]');
```
```text
1
```
Note that DuckDB's JSON data type uses [0-based indexing](#docs:stable:data:json:overview::indexing).
JSONPath is more expressive, and can also access from the back of lists:
```sql
SELECT json_extract('{"duck": [1, 2, 3]}', '$.duck[#-1]');
```
```text
3
```
JSONPath also allows escaping syntax tokens, using double quotes:
```sql
SELECT json_extract('{"duck.goose": [1, 2, 3]}', '$."duck.goose"[1]');
```
```text
2
```
Examples using the [anatidae biological family](https://en.wikipedia.org/wiki/Anatidae):
```sql
CREATE TABLE example (j JSON);
INSERT INTO example VALUES
('{ "family": "anatidae", "species": [ "duck", "goose", "swan", null ] }');
```
```sql
SELECT json(j) FROM example;
```
```text
{"family":"anatidae","species":["duck","goose","swan",null]}
```
```sql
SELECT j.family FROM example;
```
```text
"anatidae"
```
```sql
SELECT j.species[0] FROM example;
```
```text
"duck"
```
```sql
SELECT json_valid(j) FROM example;
```
```text
true
```
```sql
SELECT json_valid('{');
```
```text
false
```
```sql
SELECT json_array_length('["duck", "goose", "swan", null]');
```
```text
4
```
```sql
SELECT json_array_length(j, 'species') FROM example;
```
```text
4
```
```sql
SELECT json_array_length(j, '/species') FROM example;
```
```text
4
```
```sql
SELECT json_array_length(j, '$.species') FROM example;
```
```text
4
```
```sql
SELECT json_array_length(j, ['$.species']) FROM example;
```
```text
[4]
```
```sql
SELECT json_type(j) FROM example;
```
```text
OBJECT
```
```sql
SELECT json_keys(j) FROM example;
```
```text
[family, species]
```
```sql
SELECT json_structure(j) FROM example;
```
```text
{"family":"VARCHAR","species":["VARCHAR"]}
```
```sql
SELECT json_structure('["duck", {"family": "anatidae"}]');
```
```text
["JSON"]
```
```sql
SELECT json_contains('{"key": "value"}', '"value"');
```
```text
true
```
```sql
SELECT json_contains('{"key": 1}', '1');
```
```text
true
```
```sql
SELECT json_contains('{"top_key": {"key": "value"}}', '{"key": "value"}');
```
```text
true
```
#### JSON Aggregate Functions {#docs:stable:data:json:json_functions::json-aggregate-functions}
There are three JSON aggregate functions.
| Function | Description |
| :------------------------------ | :--------------------------------------------------------------------- |
| `json_group_array(any)` | Return a JSON array with all values of `any` in the aggregation. |
| `json_group_object(key, value)` | Return a JSON object with all `key`, `value` pairs in the aggregation. |
| `json_group_structure(json)` | Return the combined `json_structure` of all `json` in the aggregation. |
Examples:
```sql
CREATE TABLE example1 (k VARCHAR, v INTEGER);
INSERT INTO example1 VALUES ('duck', 42), ('goose', 7);
```
```sql
SELECT json_group_array(v) FROM example1;
```
```text
[42, 7]
```
```sql
SELECT json_group_object(k, v) FROM example1;
```
```text
{"duck":42,"goose":7}
```
```sql
CREATE TABLE example2 (j JSON);
INSERT INTO example2 VALUES
('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');
```
```sql
SELECT json_group_structure(j) FROM example2;
```
```text
{"family":"VARCHAR","species":["VARCHAR"],"coolness":"DOUBLE","hair":"BOOLEAN"}
```
#### Transforming JSON to Nested Types {#docs:stable:data:json:json_functions::transforming-json-to-nested-types}
In many cases, it is inefficient to extract values from JSON one-by-one.
Instead, we can “extract” all values at once, transforming JSON to the nested types `LIST` and `STRUCT`.
| Function | Description |
| :--------------------------------------- | :--------------------------------------------------------------------- |
| `json_transform(json, structure)` | Transform `json` according to the specified `structure`. |
| `from_json(json, structure)` | Alias for `json_transform`. |
| `json_transform_strict(json, structure)` | Same as `json_transform`, but throws an error when type casting fails. |
| `from_json_strict(json, structure)` | Alias for `json_transform_strict`. |
The `structure` argument is JSON of the same form as returned by `json_structure`.
The `structure` argument can be modified to transform the JSON into the desired structure and types.
It is possible to extract fewer key/value pairs than are present in the JSON, and it is also possible to extract more: missing keys become `NULL`.
Examples:
```sql
CREATE TABLE example (j JSON);
INSERT INTO example VALUES
('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');
```
```sql
SELECT json_transform(j, '{"family": "VARCHAR", "coolness": "DOUBLE"}') FROM example;
```
```text
{'family': anatidae, 'coolness': 42.420000}
{'family': canidae, 'coolness': NULL}
```
```sql
SELECT json_transform(j, '{"family": "TINYINT", "coolness": "DECIMAL(4, 2)"}') FROM example;
```
```text
{'family': NULL, 'coolness': 42.42}
{'family': NULL, 'coolness': NULL}
```
```sql
SELECT json_transform_strict(j, '{"family": "TINYINT", "coolness": "DOUBLE"}') FROM example;
```
```console
Invalid Input Error: Failed to cast value: "anatidae"
```
#### JSON Table Functions {#docs:stable:data:json:json_functions::json-table-functions}
DuckDB implements two JSON table functions that take a JSON value and produce a table from it.
| Function | Description |
| :----------------------- | :------------------------------------------------------------------------------------------- |
| `json_each(json[, path])` | Traverse `json` and return one row for each element in the top-level array or object. |
| `json_tree(json[, path])` | Traverse `json` in depth-first fashion and return one row for each element in the structure. |
If the element is not an array or object, the element itself is returned.
If the optional `path` argument is supplied, traversal starts from the element at the given path instead of the root element.
The resulting table has the following columns:
| Field | Type | Description |
| :-------- | :----------------- | :------------------------------------------ |
| `key` | `VARCHAR` | Key of element relative to its parent |
| `value` | `JSON` | Value of element |
| `type` | `VARCHAR` | `json_type` (function) of this element |
| `atom` | `JSON` | `json_value` (function) of this element |
| `id` | `UBIGINT` | Element identifier, numbered by parse order |
| `parent` | `UBIGINT` | `id` of parent element |
| `fullkey` | `VARCHAR` | JSON path to element |
| `path` | `VARCHAR` | JSON path to parent element |
| `json` | `JSON` (Virtual) | The `json` parameter |
| `root` | `TEXT` (Virtual) | The `path` parameter |
| `rowid` | `BIGINT` (Virtual) | The row identifier |
These functions are analogous to [SQLite's functions with the same name](https://www.sqlite.org/json1.html#jeach).
Note that, because the `json_each` and `json_tree` functions refer to previous subqueries in the same FROM clause, they are [*lateral joins*](#docs:stable:sql:query_syntax:from::lateral-joins).
Examples:
```sql
CREATE TABLE example (j JSON);
INSERT INTO example VALUES
('{"family": "anatidae", "species": ["duck", "goose"], "coolness": 42.42}'),
('{"family": "canidae", "species": ["labrador", "bulldog"], "hair": true}');
```
```sql
SELECT je.*, je.rowid
FROM example AS e, json_each(e.j) AS je;
```
| key | value | type | atom | id | parent | fullkey | path | rowid |
| -------- | ---------------------- | ------- | ---------- | --: | ------ | ---------- | ---- | ----: |
| family | "anatidae" | VARCHAR | "anatidae" | 2 | NULL | $.family | $ | 0 |
| species | ["duck","goose"] | ARRAY | NULL | 4 | NULL | $.species | $ | 1 |
| coolness | 42.42 | DOUBLE | 42.42 | 8 | NULL | $.coolness | $ | 2 |
| family | "canidae" | VARCHAR | "canidae" | 2 | NULL | $.family | $ | 0 |
| species | ["labrador","bulldog"] | ARRAY | NULL | 4 | NULL | $.species | $ | 1 |
| hair | true | BOOLEAN | true | 8 | NULL | $.hair | $ | 2 |
```sql
SELECT je.*, je.rowid
FROM example AS e, json_each(e.j, '$.species') AS je;
```
| key | value | type | atom | id | parent | fullkey | path | rowid |
| --- | ---------- | ------- | ---------- | --: | ------ | ------------ | --------- | ----: |
| 0 | "duck" | VARCHAR | "duck" | 5 | NULL | $.species[0] | $.species | 0 |
| 1 | "goose" | VARCHAR | "goose" | 6 | NULL | $.species[1] | $.species | 1 |
| 0 | "labrador" | VARCHAR | "labrador" | 5 | NULL | $.species[0] | $.species | 0 |
| 1 | "bulldog" | VARCHAR | "bulldog" | 6 | NULL | $.species[1] | $.species | 1 |
```sql
SELECT je.key, je.value, je.type, je.id, je.parent, je.fullkey, je.rowid
FROM example AS e, json_tree(e.j) AS je;
```
| key | value | type | id | parent | fullkey | rowid |
| -------- | ----------------------------------------------------------------- | ------- | --: | ------ | ------------ | ----: |
| NULL | {"family":"anatidae","species":["duck","goose"],"coolness":42.42} | OBJECT | 0 | NULL | $ | 0 |
| family | "anatidae" | VARCHAR | 2 | 0 | $.family | 1 |
| species | ["duck","goose"] | ARRAY | 4 | 0 | $.species | 2 |
| 0 | "duck" | VARCHAR | 5 | 4 | $.species[0] | 3 |
| 1 | "goose" | VARCHAR | 6 | 4 | $.species[1] | 4 |
| coolness | 42.42 | DOUBLE | 8 | 0 | $.coolness | 5 |
| NULL | {"family":"canidae","species":["labrador","bulldog"],"hair":true} | OBJECT | 0 | NULL | $ | 0 |
| family | "canidae" | VARCHAR | 2 | 0 | $.family | 1 |
| species | ["labrador","bulldog"] | ARRAY | 4 | 0 | $.species | 2 |
| 0 | "labrador" | VARCHAR | 5 | 4 | $.species[0] | 3 |
| 1 | "bulldog" | VARCHAR | 6 | 4 | $.species[1] | 4 |
| hair | true | BOOLEAN | 8 | 0 | $.hair | 5 |
### JSON Format Settings {#docs:stable:data:json:format_settings}
The JSON extension can attempt to determine the format of a JSON file when setting `format` to `auto`.
Here are some example JSON files and the corresponding `format` settings that should be used.
In each of the below cases, the `format` setting was not needed, as DuckDB was able to infer it correctly, but it is included for illustrative purposes.
A query of this shape would work in each case:
```sql
SELECT *
FROM 'filename.json';
```
###### Format: `newline_delimited` {#docs:stable:data:json:format_settings::format-newline_delimited}
With `format = 'newline_delimited'`, newline-delimited JSON can be parsed.
Each line contains a single JSON value.
We use the example file [`records.json`](https://duckdb.org/data/records.json) with the following content:
```json
{"key1":"value1", "key2": "value1"}
{"key1":"value2", "key2": "value2"}
{"key1":"value3", "key2": "value3"}
```
```sql
SELECT *
FROM read_json('records.json', format = 'newline_delimited');
```
| key1 | key2 |
|--------|--------|
| value1 | value1 |
| value2 | value2 |
| value3 | value3 |
###### Format: `array` {#docs:stable:data:json:format_settings::format-array}
If the JSON file contains a JSON array of objects (pretty-printed or not), `array` may be used.
To demonstrate its use, we use the example file [`records-in-array.json`](https://duckdb.org/data/records-in-array.json):
```json
[
{"key1":"value1", "key2": "value1"},
{"key1":"value2", "key2": "value2"},
{"key1":"value3", "key2": "value3"}
]
```
```sql
SELECT *
FROM read_json('records-in-array.json', format = 'array');
```
| key1 | key2 |
|--------|--------|
| value1 | value1 |
| value2 | value2 |
| value3 | value3 |
###### Format: `unstructured` {#docs:stable:data:json:format_settings::format-unstructured}
If the JSON file contains JSON that is not newline-delimited or an array, `unstructured` may be used.
To demonstrate its use, we use the example file [`unstructured.json`](https://duckdb.org/data/unstructured.json):
```json
{
    "key1":"value1",
    "key2":"value1"
}
{
    "key1":"value2",
    "key2":"value2"
}
{
    "key1":"value3",
    "key2":"value3"
}
```
```sql
SELECT *
FROM read_json('unstructured.json', format = 'unstructured');
```
| key1 | key2 |
|--------|--------|
| value1 | value1 |
| value2 | value2 |
| value3 | value3 |
##### Records Settings {#docs:stable:data:json:format_settings::records-settings}
The JSON extension can attempt to determine whether a JSON file contains records when setting `records = auto`.
When `records = true`, the JSON extension expects JSON objects, and will unpack the fields of JSON objects into individual columns.
Continuing with the same example file, [`records.json`](https://duckdb.org/data/records.json):
```json
{"key1":"value1", "key2": "value1"}
{"key1":"value2", "key2": "value2"}
{"key1":"value3", "key2": "value3"}
```
```sql
SELECT *
FROM read_json('records.json', records = true);
```
| key1 | key2 |
|--------|--------|
| value1 | value1 |
| value2 | value2 |
| value3 | value3 |
When `records = false`, the JSON extension will not unpack the top-level objects and will create `STRUCT`s instead:
```sql
SELECT *
FROM read_json('records.json', records = false);
```
| json |
|----------------------------------|
| {'key1': value1, 'key2': value1} |
| {'key1': value2, 'key2': value2} |
| {'key1': value3, 'key2': value3} |
This is especially useful if we have non-object JSON, for example, [`arrays.json`](https://duckdb.org/data/arrays.json):
```json
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
```
```sql
SELECT *
FROM read_json('arrays.json', records = false);
```
| json |
|-----------|
| [1, 2, 3] |
| [4, 5, 6] |
| [7, 8, 9] |
### Installing and Loading the JSON extension {#docs:stable:data:json:installing_and_loading}
The `json` extension is shipped by default in DuckDB builds; otherwise, it will be transparently [autoloaded](#docs:stable:extensions:overview::autoloading-extensions) on first use. If you would like to install and load it manually, run:
```sql
INSTALL json;
LOAD json;
```
### SQL to/from JSON {#docs:stable:data:json:sql_to_and_from_json}
DuckDB provides functions to serialize and deserialize `SELECT` statements between SQL and JSON, as well as executing JSON serialized statements.
| Function | Type | Description |
|:------|:-|:---------|
| `json_deserialize_sql(json)` | Scalar | Deserialize one or many `json` serialized statements back to an equivalent SQL string. |
| `json_execute_serialized_sql(varchar)` | Table | Execute `json` serialized statements and return the resulting rows. Only one statement at a time is supported for now. |
| `json_serialize_sql(varchar, skip_default := boolean, skip_empty := boolean, skip_null := boolean, format := boolean)` | Scalar | Serialize a set of semicolon-separated (` ;`) select statements to an equivalent list of `json` serialized statements. |
| `PRAGMA json_execute_serialized_sql(varchar)` | Pragma | Pragma version of the `json_execute_serialized_sql` function. |
The `json_serialize_sql(varchar)` function takes four optional parameters, `skip_default`, `skip_empty`, `skip_null`, and `format`, that can be used to control the output of the serialized statements.
If you run the `json_execute_serialized_sql(varchar)` table function inside of a transaction the serialized statements will not be able to see any transaction local changes. This is because the statements are executed in a separate query context. You can use the `PRAGMA json_execute_serialized_sql(varchar)` pragma version to execute the statements in the same query context as the pragma, although with the limitation that the serialized JSON must be provided as a constant string, i.e., you cannot do `PRAGMA json_execute_serialized_sql(json_serialize_sql(...))`.
Note that these functions do not preserve syntactic sugar such as `FROM * SELECT ...`, so a statement round-tripped through `json_deserialize_sql(json_serialize_sql(...))` may not be identical to the original statement, but should always be semantically equivalent and produce the same output.
##### Examples {#docs:stable:data:json:sql_to_and_from_json::examples}
Simple example:
```sql
SELECT json_serialize_sql('SELECT 2');
```
```text
{"error":false,"statements":[{"node":{"type":"SELECT_NODE","modifiers":[],"cte_map":{"map":[]},"select_list":[{"class":"CONSTANT","type":"VALUE_CONSTANT","alias":"","query_location":7,"value":{"type":{"id":"INTEGER","type_info":null},"is_null":false,"value":2}}],"from_table":{"type":"EMPTY","alias":"","sample":null,"query_location":18446744073709551615},"where_clause":null,"group_expressions":[],"group_sets":[],"aggregate_handling":"STANDARD_HANDLING","having":null,"sample":null,"qualify":null},"named_param_map":[]}]}
```
Example with multiple statements and skip options:
```sql
SELECT json_serialize_sql('SELECT 1 + 2; SELECT a + b FROM tbl1', skip_empty := true, skip_null := true);
```
```text
{"error":false,"statements":[{"node":{"type":"SELECT_NODE","select_list":[{"class":"FUNCTION","type":"FUNCTION","query_location":9,"function_name":"+","children":[{"class":"CONSTANT","type":"VALUE_CONSTANT","query_location":7,"value":{"type":{"id":"INTEGER"},"is_null":false,"value":1}},{"class":"CONSTANT","type":"VALUE_CONSTANT","query_location":11,"value":{"type":{"id":"INTEGER"},"is_null":false,"value":2}}],"order_bys":{"type":"ORDER_MODIFIER"},"distinct":false,"is_operator":true,"export_state":false}],"from_table":{"type":"EMPTY","query_location":18446744073709551615},"aggregate_handling":"STANDARD_HANDLING"}},{"node":{"type":"SELECT_NODE","select_list":[{"class":"FUNCTION","type":"FUNCTION","query_location":23,"function_name":"+","children":[{"class":"COLUMN_REF","type":"COLUMN_REF","query_location":21,"column_names":["a"]},{"class":"COLUMN_REF","type":"COLUMN_REF","query_location":25,"column_names":["b"]}],"order_bys":{"type":"ORDER_MODIFIER"},"distinct":false,"is_operator":true,"export_state":false}],"from_table":{"type":"BASE_TABLE","query_location":32,"table_name":"tbl1"},"aggregate_handling":"STANDARD_HANDLING"}}]}
```
Skip the default values in the AST (e.g., `"distinct":false`):
```sql
SELECT json_serialize_sql('SELECT 1 + 2; SELECT a + b FROM tbl1', skip_default := true, skip_empty := true, skip_null := true);
```
```text
{"error":false,"statements":[{"node":{"type":"SELECT_NODE","select_list":[{"class":"FUNCTION","type":"FUNCTION","query_location":9,"function_name":"+","children":[{"class":"CONSTANT","type":"VALUE_CONSTANT","query_location":7,"value":{"type":{"id":"INTEGER"},"is_null":false,"value":1}},{"class":"CONSTANT","type":"VALUE_CONSTANT","query_location":11,"value":{"type":{"id":"INTEGER"},"is_null":false,"value":2}}],"order_bys":{"type":"ORDER_MODIFIER"},"is_operator":true}],"from_table":{"type":"EMPTY"},"aggregate_handling":"STANDARD_HANDLING"}},{"node":{"type":"SELECT_NODE","select_list":[{"class":"FUNCTION","type":"FUNCTION","query_location":23,"function_name":"+","children":[{"class":"COLUMN_REF","type":"COLUMN_REF","query_location":21,"column_names":["a"]},{"class":"COLUMN_REF","type":"COLUMN_REF","query_location":25,"column_names":["b"]}],"order_bys":{"type":"ORDER_MODIFIER"},"is_operator":true}],"from_table":{"type":"BASE_TABLE","query_location":32,"table_name":"tbl1"},"aggregate_handling":"STANDARD_HANDLING"}}]}
```
Example with a syntax error:
```sql
SELECT json_serialize_sql('TOTALLY NOT VALID SQL');
```
```text
{"error":true,"error_type":"parser","error_message":"syntax error at or near \"TOTALLY\"","error_subtype":"SYNTAX_ERROR","position":"0"}
```
Example with deserialize:
```sql
SELECT json_deserialize_sql(json_serialize_sql('SELECT 1 + 2'));
```
```text
SELECT (1 + 2)
```
Example with deserialize and syntax sugar, which is lost during the transformation:
```sql
SELECT json_deserialize_sql(json_serialize_sql('FROM x SELECT 1 + 2'));
```
```text
SELECT (1 + 2) FROM x
```
Example with execute:
```sql
SELECT * FROM json_execute_serialized_sql(json_serialize_sql('SELECT 1 + 2'));
```
```text
3
```
Example with error:
```sql
SELECT * FROM json_execute_serialized_sql(json_serialize_sql('TOTALLY NOT VALID SQL'));
```
```console
Parser Error:
Error parsing json: parser: syntax error at or near "TOTALLY"
```
### Caveats {#docs:stable:data:json:caveats}
#### Equality Comparison {#docs:stable:data:json:caveats::equality-comparison}
> **Warning.** Currently, equality comparison of JSON values can differ based on the context. In some cases, it is based on raw text comparison, while in other cases, it uses logical content comparison.
The following query returns true for all fields:
```sql
SELECT
a != b, -- Space is part of physical JSON content. Despite equal logical content, values are treated as not equal.
c != d, -- Same.
c[0] = d[0], -- Equality because space was removed from physical content of fields:
a = c[0], -- Indeed, field is equal to empty list without space...
b != c[0], -- ... but different from empty list with space.
FROM (
SELECT
'[]'::JSON AS a,
'[ ]'::JSON AS b,
'[[]]'::JSON AS c,
'[[ ]]'::JSON AS d
);
```
| (a != b) | (c != d) | (c[0] = d[0]) | (a = c[0]) | (b != c[0]) |
|----------|----------|---------------|------------|-------------|
| true | true | true | true | true |
## Multiple Files {#data:multiple_files}
### Reading Multiple Files {#docs:stable:data:multiple_files:overview}
DuckDB can read multiple files of different types (CSV, Parquet, JSON files) at the same time using either the glob syntax, or by providing a list of files to read.
See the [combining schemas](#docs:stable:data:multiple_files:combining_schemas) page for tips on reading files with different schemas.
#### CSV {#docs:stable:data:multiple_files:overview::csv}
Read all files with a name ending in `.csv` in the folder `dir`:
```sql
SELECT *
FROM 'dir/*.csv';
```
Read all files with a name ending in `.csv`, two directories deep:
```sql
SELECT *
FROM '*/*/*.csv';
```
Read all files with a name ending in `.csv`, at any depth in the folder `dir`:
```sql
SELECT *
FROM 'dir/**/*.csv';
```
Read the CSV files `flights1.csv` and `flights2.csv`:
```sql
SELECT *
FROM read_csv(['flights1.csv', 'flights2.csv']);
```
Read the CSV files `flights1.csv` and `flights2.csv`, unifying schemas by name and outputting a `filename` column:
```sql
SELECT *
FROM read_csv(['flights1.csv', 'flights2.csv'], union_by_name = true, filename = true);
```
#### Parquet {#docs:stable:data:multiple_files:overview::parquet}
Read all files that match the glob pattern:
```sql
SELECT *
FROM 'test/*.parquet';
```
Read three Parquet files and treat them as a single table:
```sql
SELECT *
FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
```
Read all Parquet files from two specific folders:
```sql
SELECT *
FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
```
Read all Parquet files that match the glob pattern at any depth:
```sql
SELECT *
FROM read_parquet('dir/**/*.parquet');
```
#### Multi-File Reads and Globs {#docs:stable:data:multiple_files:overview::multi-file-reads-and-globs}
DuckDB can also read a series of Parquet files and treat them as if they were a single table. Note that this only works if the Parquet files have the same schema. You can specify which Parquet files you want to read using a list parameter, glob pattern matching syntax, or a combination of both.
##### List Parameter {#docs:stable:data:multiple_files:overview::list-parameter}
The `read_parquet` function can accept a list of filenames as the input parameter.
Read three Parquet files and treat them as a single table:
```sql
SELECT *
FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
```
##### Glob Syntax {#docs:stable:data:multiple_files:overview::glob-syntax}
Any file name input to the `read_parquet` function can either be an exact filename, or use a glob syntax to read multiple files that match a pattern.
| Wildcard | Description |
|------------|-----------------------------------------------------------|
| `*` | Matches any number of any characters (including none) |
| `**` | Matches any number of subdirectories (including none) |
| `?` | Matches any single character |
| `[abc]` | Matches one character given in the bracket |
| `[a-z]` | Matches one character from the range given in the bracket |
Note that the `?` wildcard in globs is not supported for reads over S3 due to HTTP encoding issues.
Here is an example that reads all the files that end with `.parquet` located in the `test` folder:
Read all files that match the glob pattern:
```sql
SELECT *
FROM read_parquet('test/*.parquet');
```
##### List of Globs {#docs:stable:data:multiple_files:overview::list-of-globs}
The glob syntax and the list input parameter can be combined to scan files that meet one of multiple patterns.
Read all Parquet files from two specific folders:
```sql
SELECT *
FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
```
DuckDB can read multiple CSV files at the same time using either the glob syntax, or by providing a list of files to read.
#### Filename {#docs:stable:data:multiple_files:overview::filename}
The `filename` argument can be used to add an extra `filename` column to the result that indicates which row came from which file. For example:
```sql
SELECT *
FROM read_csv(['flights1.csv', 'flights2.csv'], union_by_name = true, filename = true);
```
| FlightDate | OriginCityName | DestCityName | UniqueCarrier | filename |
|------------|----------------|-----------------|---------------|--------------|
| 1988-01-01 | New York, NY | Los Angeles, CA | NULL | flights1.csv |
| 1988-01-02 | New York, NY | Los Angeles, CA | NULL | flights1.csv |
| 1988-01-03 | New York, NY | Los Angeles, CA | AA | flights2.csv |
#### Glob Function to Find Filenames {#docs:stable:data:multiple_files:overview::glob-function-to-find-filenames}
The glob pattern matching syntax can also be used to search for filenames using the `glob` table function.
It accepts one parameter: the path to search (which may include glob patterns).
Search the current directory for all files.
```sql
SELECT *
FROM glob('*');
```
| file |
|---------------|
| test.csv |
| test.json |
| test.parquet |
| test2.csv |
| test2.parquet |
| todos.json |
### Combining Schemas {#docs:stable:data:multiple_files:combining_schemas}
#### Examples {#docs:stable:data:multiple_files:combining_schemas::examples}
Read a set of CSV files combining columns by position:
```sql
SELECT * FROM read_csv('flights*.csv');
```
Read a set of CSV files combining columns by name:
```sql
SELECT * FROM read_csv('flights*.csv', union_by_name = true);
```
#### Combining Schemas {#docs:stable:data:multiple_files:combining_schemas::combining-schemas}
When reading from multiple files, we have to **combine schemas** from those files. That is because each file has its own schema that can differ from the other files. DuckDB offers two ways of unifying schemas of multiple files: **by column position** and **by column name**.
By default, DuckDB reads the schema of the first file provided, and then unifies columns in subsequent files by column position. This works correctly as long as all files have the same schema. If the schema of the files differs, you might want to use the `union_by_name` option to allow DuckDB to construct the schema by reading all of the names instead.
Below is an example of how both methods work.
#### Union by Position {#docs:stable:data:multiple_files:combining_schemas::union-by-position}
By default, DuckDB unifies the columns of these different files **by position**. This means that the first column in each file is combined together, as well as the second column in each file, etc. For example, consider the following two files.
[`flights1.csv`](https://duckdb.org/data/flights1.csv):
```csv
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
```
[`flights2.csv`](https://duckdb.org/data/flights2.csv):
```csv
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-03|AA|New York, NY|Los Angeles, CA
```
Reading the two files at the same time will produce the following result set:
| FlightDate | UniqueCarrier | OriginCityName | DestCityName |
|------------|---------------|----------------|-----------------|
| 1988-01-01 | AA | New York, NY | Los Angeles, CA |
| 1988-01-02 | AA | New York, NY | Los Angeles, CA |
| 1988-01-03 | AA | New York, NY | Los Angeles, CA |
This is equivalent to the SQL construct [`UNION ALL`](#docs:stable:sql:query_syntax:setops::union-all).
#### Union by Name {#docs:stable:data:multiple_files:combining_schemas::union-by-name}
If you are processing multiple files that have different schemas, perhaps because columns have been added or renamed, it might be desirable to unify the columns of different files **by name** instead. This can be done by providing the `union_by_name` option. For example, consider the following two files, where `flights4.csv` has an extra column (` UniqueCarrier`).
[`flights3.csv`](https://duckdb.org/data/flights3.csv):
```csv
FlightDate|OriginCityName|DestCityName
1988-01-01|New York, NY|Los Angeles, CA
1988-01-02|New York, NY|Los Angeles, CA
```
[`flights4.csv`](https://duckdb.org/data/flights4.csv):
```csv
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-03|AA|New York, NY|Los Angeles, CA
```
Reading these when unifying column names **by position** results in an error, as the two files have a different number of columns. When specifying the `union_by_name` option, the columns are correctly unified, and any missing values are set to `NULL`.
```sql
SELECT * FROM read_csv(['flights3.csv', 'flights4.csv'], union_by_name = true);
```
| FlightDate | OriginCityName | DestCityName | UniqueCarrier |
|------------|----------------|-----------------|---------------|
| 1988-01-01 | New York, NY | Los Angeles, CA | NULL |
| 1988-01-02 | New York, NY | Los Angeles, CA | NULL |
| 1988-01-03 | New York, NY | Los Angeles, CA | AA |
This is equivalent to the SQL construct [`UNION ALL BY NAME`](#docs:stable:sql:query_syntax:setops::union-all-by-name).
> Using the `union_by_name` option increases memory consumption.
## Parquet Files {#data:parquet}
### Reading and Writing Parquet Files {#docs:stable:data:parquet:overview}
#### Examples {#docs:stable:data:parquet:overview::examples}
Read a single Parquet file:
```sql
SELECT * FROM 'test.parquet';
```
Figure out which columns/types are in a Parquet file:
```sql
DESCRIBE SELECT * FROM 'test.parquet';
```
Create a table from a Parquet file:
```sql
CREATE TABLE test AS
SELECT * FROM 'test.parquet';
```
If the file does not end in `.parquet`, use the `read_parquet` function:
```sql
SELECT *
FROM read_parquet('test.parq');
```
Use a list parameter to read three Parquet files and treat them as a single table:
```sql
SELECT *
FROM read_parquet(['file1.parquet', 'file2.parquet', 'file3.parquet']);
```
Read all files that match the glob pattern:
```sql
SELECT *
FROM 'test/*.parquet';
```
Read all files that match the glob pattern, and include the `filename` virtual column that specifies which file each row came from (since DuckDB v1.3.0, this column is available by default without any configuration option):
```sql
SELECT *, filename
FROM read_parquet('test/*.parquet');
```
Use a list of globs to read all Parquet files from two specific folders:
```sql
SELECT *
FROM read_parquet(['folder1/*.parquet', 'folder2/*.parquet']);
```
Read over HTTPS:
```sql
SELECT *
FROM read_parquet('https://some.url/some_file.parquet');
```
Query the [metadata of a Parquet file](#docs:stable:data:parquet:metadata::parquet-metadata):
```sql
SELECT *
FROM parquet_metadata('test.parquet');
```
Query the [file metadata of a Parquet file](#docs:stable:data:parquet:metadata::parquet-file-metadata):
```sql
SELECT *
FROM parquet_file_metadata('test.parquet');
```
Query the [key-value metadata of a Parquet file](#docs:stable:data:parquet:metadata::parquet-key-value-metadata):
```sql
SELECT *
FROM parquet_kv_metadata('test.parquet');
```
Query the [schema of a Parquet file](#docs:stable:data:parquet:metadata::parquet-schema):
```sql
SELECT *
FROM parquet_schema('test.parquet');
```
Write the results of a query to a Parquet file using the default compression (Snappy):
```sql
COPY
(SELECT * FROM tbl)
TO 'result-snappy.parquet'
(FORMAT parquet);
```
Write the results from a query to a Parquet file with specific compression and row group size:
```sql
COPY
(FROM generate_series(100_000))
TO 'test.parquet'
(FORMAT parquet, COMPRESSION zstd, ROW_GROUP_SIZE 100_000);
```
Export the table contents of the entire database as Parquet files:
```sql
EXPORT DATABASE 'target_directory' (FORMAT parquet);
```
#### Parquet Files {#docs:stable:data:parquet:overview::parquet-files}
Parquet files are compressed columnar files that are efficient to load and process. DuckDB provides support for both reading and writing Parquet files in an efficient manner, as well as support for pushing filters and projections into the Parquet file scans.
> Parquet datasets differ based on the number of files, the size of individual files, the compression algorithm used, row group size, etc. These have a significant effect on performance. Please consult the [Performance Guide](#docs:stable:guides:performance:file_formats) for details.
#### `read_parquet` Function {#docs:stable:data:parquet:overview::read_parquet-function}
| Function | Description | Example |
|:--|:--|:-----|
| `read_parquet(path_or_list_of_paths)` | Read Parquet file(s) | `SELECT * FROM read_parquet('test.parquet');` |
| `parquet_scan(path_or_list_of_paths)` | Alias for `read_parquet` | `SELECT * FROM parquet_scan('test.parquet');` |
If your file ends in `.parquet`, the function syntax is optional. The system will automatically infer that you are reading a Parquet file:
```sql
SELECT * FROM 'test.parquet';
```
Multiple files can be read at once by providing a glob or a list of files. Refer to the [multiple files section](#docs:stable:data:multiple_files:overview) for more information.
##### Parameters {#docs:stable:data:parquet:overview::parameters}
There are a number of options exposed that can be passed to the `read_parquet` function or the [`COPY` statement](#docs:stable:sql:statements:copy).
| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `binary_as_string` | Parquet files generated by legacy writers do not correctly set the `UTF8` flag for strings, causing string columns to be loaded as `BLOB` instead. Set this to true to load binary columns as strings. | `BOOL` | `false` |
| `encryption_config` | Configuration for [Parquet encryption](#docs:stable:data:parquet:encryption). | `STRUCT` | - |
| `filename` | Whether or not an extra `filename` column should be included in the result. Since DuckDB v1.3.0, the `filename` column is added automatically as a virtual column and this option is only kept for compatibility reasons. | `BOOL` | `false` |
| `file_row_number` | Whether or not to include the `file_row_number` column. | `BOOL` | `false` |
| `hive_partitioning` | Whether or not to interpret the path as a [Hive partitioned path](#docs:stable:data:partitioning:hive_partitioning). | `BOOL` | (auto-detected) |
| `union_by_name` | Whether the columns of multiple schemas should be [unified by name](#docs:stable:data:multiple_files:combining_schemas), rather than by position. | `BOOL` | `false` |
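For illustration, several of these options can be combined in a single `read_parquet` call. This is a minimal sketch with a hypothetical glob pattern:
```sql
SELECT *
FROM read_parquet(
    'legacy/*.parquet',
    binary_as_string = true, -- load mis-flagged binary columns as strings
    file_row_number = true,  -- include the file_row_number column
    union_by_name = true     -- unify differing schemas by column name
);
```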
#### Partial Reading {#docs:stable:data:parquet:overview::partial-reading}
DuckDB supports projection pushdown into the Parquet file itself. That is to say, when querying a Parquet file, only the columns required for the query are read. This allows you to read only the part of the Parquet file that you are interested in. This will be done automatically by DuckDB.
DuckDB also supports filter pushdown into the Parquet reader. When you apply a filter to a column that is scanned from a Parquet file, the filter will be pushed down into the scan, and can even be used to skip parts of the file using the built-in zonemaps. Note that this will depend on whether or not your Parquet file contains zonemaps.
Filter and projection pushdown provide significant performance benefits. See [our blog post “Querying Parquet with Precision Using DuckDB”](https://duckdb.org/2021/06/25/querying-parquet) for more information.
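For example, in the following query (using a hypothetical file and column names), only the `col_a` and `col_b` columns are read, and the filter on `col_a` can be combined with the zonemaps to skip row groups that cannot contain matching rows:
```sql
SELECT col_a, col_b
FROM read_parquet('data.parquet')
WHERE col_a > 100; -- both the projection and the filter are pushed into the Parquet scan
```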
#### Inserts and Views {#docs:stable:data:parquet:overview::inserts-and-views}
You can also insert the data into a table or create a table from the Parquet file directly. This will load the data from the Parquet file and insert it into the database:
Insert the data from the Parquet file in the table:
```sql
INSERT INTO people
SELECT * FROM read_parquet('test.parquet');
```
Create a table directly from a Parquet file:
```sql
CREATE TABLE people AS
SELECT * FROM read_parquet('test.parquet');
```
If you wish to keep the data stored inside the Parquet file, but want to query the Parquet file directly, you can create a view over the `read_parquet` function. You can then query the Parquet file as if it were a built-in table:
Create a view over the Parquet file:
```sql
CREATE VIEW people AS
SELECT * FROM read_parquet('test.parquet');
```
Query the Parquet file:
```sql
SELECT * FROM people;
```
#### Writing to Parquet Files {#docs:stable:data:parquet:overview::writing-to-parquet-files}
DuckDB also has support for writing to Parquet files using the `COPY` statement syntax. See the [`COPY` Statement page](#docs:stable:sql:statements:copy) for details, including all possible parameters for the `COPY` statement.
Write a query to a Snappy-compressed Parquet file:
```sql
COPY
(SELECT * FROM tbl)
TO 'result-snappy.parquet'
(FORMAT parquet);
```
Write `tbl` to a zstd-compressed Parquet file:
```sql
COPY tbl
TO 'result-zstd.parquet'
(FORMAT parquet, COMPRESSION zstd);
```
Write `tbl` to a zstd-compressed Parquet file with the lowest compression level yielding the fastest compression:
```sql
COPY tbl
TO 'result-zstd.parquet'
(FORMAT parquet, COMPRESSION zstd, COMPRESSION_LEVEL 1);
```
Write to Parquet file with [key-value metadata](#docs:stable:data:parquet:metadata):
```sql
COPY (
SELECT
42 AS number,
true AS is_even
) TO 'kv_metadata.parquet' (
FORMAT parquet,
KV_METADATA {
number: 'Answer to life, universe, and everything',
is_even: 'not ''odd''' -- single quotes in values must be escaped
}
);
```
Write a CSV file to an uncompressed Parquet file:
```sql
COPY
'test.csv'
TO 'result-uncompressed.parquet'
(FORMAT parquet, COMPRESSION uncompressed);
```
Write a query to a Parquet file with zstd-compression and row group size:
```sql
COPY
(FROM generate_series(100_000))
TO 'row-groups-zstd.parquet'
(FORMAT parquet, COMPRESSION zstd, ROW_GROUP_SIZE 100_000);
```
Write data to an LZ4-compressed Parquet file:
```sql
COPY
(FROM generate_series(100_000))
TO 'result-lz4.parquet'
(FORMAT parquet, COMPRESSION lz4);
```
Or, equivalently:
```sql
COPY
(FROM generate_series(100_000))
TO 'result-lz4.parquet'
(FORMAT parquet, COMPRESSION lz4_raw);
```
Write data to a Brotli-compressed Parquet file:
```sql
COPY
(FROM generate_series(100_000))
TO 'result-brotli.parquet'
(FORMAT parquet, COMPRESSION brotli);
```
To configure the page size of Parquet file's dictionary pages, use the `STRING_DICTIONARY_PAGE_SIZE_LIMIT` option (default: 1 MB):
```sql
COPY
lineitem
TO 'lineitem-with-custom-dictionary-size.parquet'
(FORMAT parquet, STRING_DICTIONARY_PAGE_SIZE_LIMIT 100_000);
```
DuckDB's `EXPORT` command can be used to export an entire database to a series of Parquet files. See the [“`EXPORT` statement” page](#docs:stable:sql:statements:export) for more details:
Export the table contents of the entire database as Parquet:
```sql
EXPORT DATABASE 'target_directory' (FORMAT parquet);
```
#### Encryption {#docs:stable:data:parquet:overview::encryption}
DuckDB supports reading and writing [encrypted Parquet files](#docs:stable:data:parquet:encryption).
#### Supported Features {#docs:stable:data:parquet:overview::supported-features}
The list of supported Parquet features is available in the [Parquet documentation's “Implementation status” page](https://parquet.apache.org/docs/file-format/implementationstatus/).
#### Installing and Loading the Parquet Extension {#docs:stable:data:parquet:overview::installing-and-loading-the-parquet-extension}
The support for Parquet files is enabled via extension. The `parquet` extension is bundled with almost all clients. However, if your client does not bundle the `parquet` extension, the extension must be installed separately:
```sql
INSTALL parquet;
```
### Querying Parquet Metadata {#docs:stable:data:parquet:metadata}
#### Parquet Metadata {#docs:stable:data:parquet:metadata::parquet-metadata}
The `parquet_metadata` function can be used to query the metadata contained within a Parquet file, which reveals various internal details of the Parquet file such as the statistics of the different columns. This can be useful for figuring out what kind of skipping is possible in Parquet files, or even to obtain a quick overview of what the different columns contain:
```sql
SELECT *
FROM parquet_metadata('test.parquet');
```
Below is a table of the columns returned by `parquet_metadata`.
| Field | Type |
| -------------------------- | --------------- |
| file_name | VARCHAR |
| row_group_id | BIGINT |
| row_group_num_rows | BIGINT |
| row_group_num_columns | BIGINT |
| row_group_bytes | BIGINT |
| column_id | BIGINT |
| file_offset | BIGINT |
| num_values | BIGINT |
| path_in_schema | VARCHAR |
| type | VARCHAR |
| stats_min | VARCHAR |
| stats_max | VARCHAR |
| stats_null_count | BIGINT |
| stats_distinct_count | BIGINT |
| stats_min_value | VARCHAR |
| stats_max_value | VARCHAR |
| compression | VARCHAR |
| encodings | VARCHAR |
| index_page_offset | BIGINT |
| dictionary_page_offset | BIGINT |
| data_page_offset | BIGINT |
| total_compressed_size | BIGINT |
| total_uncompressed_size | BIGINT |
| key_value_metadata | MAP(BLOB, BLOB) |
| bloom_filter_offset | BIGINT |
| bloom_filter_length | BIGINT |
| min_is_exact | BOOLEAN |
| max_is_exact | BOOLEAN |
| row_group_compressed_bytes | BIGINT |
#### Parquet Schema {#docs:stable:data:parquet:metadata::parquet-schema}
The `parquet_schema` function can be used to query the internal schema contained within a Parquet file. Note that this is the schema as it is contained within the metadata of the Parquet file. If you want to figure out the column names and types contained within a Parquet file it is easier to use `DESCRIBE`.
Fetch the column names and column types:
```sql
DESCRIBE SELECT * FROM 'test.parquet';
```
Fetch the internal schema of a Parquet file:
```sql
SELECT *
FROM parquet_schema('test.parquet');
```
Below is a table of the columns returned by `parquet_schema`.
| Field | Type |
| --------------- | ------- |
| file_name | VARCHAR |
| name | VARCHAR |
| type | VARCHAR |
| type_length | VARCHAR |
| repetition_type | VARCHAR |
| num_children | BIGINT |
| converted_type | VARCHAR |
| scale | BIGINT |
| precision | BIGINT |
| field_id | BIGINT |
| logical_type | VARCHAR |
#### Parquet File Metadata {#docs:stable:data:parquet:metadata::parquet-file-metadata}
The `parquet_file_metadata` function can be used to query file-level metadata such as the format version and the encryption algorithm used:
```sql
SELECT *
FROM parquet_file_metadata('test.parquet');
```
Below is a table of the columns returned by `parquet_file_metadata`.
| Field | Type |
| --------------------------- | ------- |
| file_name | VARCHAR |
| created_by | VARCHAR |
| num_rows | BIGINT |
| num_row_groups | BIGINT |
| format_version | BIGINT |
| encryption_algorithm | VARCHAR |
| footer_signing_key_metadata | VARCHAR |
#### Parquet Key-Value Metadata {#docs:stable:data:parquet:metadata::parquet-key-value-metadata}
The `parquet_kv_metadata` function can be used to query custom metadata defined as key-value pairs:
```sql
SELECT *
FROM parquet_kv_metadata('test.parquet');
```
Below is a table of the columns returned by `parquet_kv_metadata`.
| Field | Type |
| --------- | ------- |
| file_name | VARCHAR |
| key | BLOB |
| value | BLOB |
#### Bloom Filters {#docs:stable:data:parquet:metadata::bloom-filters}
DuckDB [supports Bloom filters](https://duckdb.org/2025/03/07/parquet-bloom-filters-in-duckdb) for pruning the row groups that need to be read to answer highly selective queries.
Currently, Bloom filters are supported for the following types:
* Integer types: `TINYINT`, `UTINYINT`, `SMALLINT`, `USMALLINT`, `INTEGER`, `UINTEGER`, `BIGINT`, `UBIGINT`
* Floating point types: `FLOAT`, `DOUBLE`
* `VARCHAR`
* `BLOB`
The `parquet_bloom_probe(filename, column_name, value)` function shows which row groups can be excluded when filtering for a given value of a given column using the Bloom filter.
For example:
```sql
FROM parquet_bloom_probe('my_file.parquet', 'my_col', 500);
```
| file_name | row_group_id | bloom_filter_excludes |
| --------------- | -----------: | --------------------: |
| my_file.parquet | 0 | true |
| ... | ... | ... |
| my_file.parquet | 9 | false |
### Parquet Encryption {#docs:stable:data:parquet:encryption}
Starting with version 0.10.0, DuckDB supports reading and writing encrypted Parquet files.
DuckDB broadly follows the [Parquet Modular Encryption specification](https://github.com/apache/parquet-format/blob/master/Encryption.md) with some [limitations](#::limitations).
#### Reading and Writing Encrypted Files {#docs:stable:data:parquet:encryption::reading-and-writing-encrypted-files}
Using the `PRAGMA add_parquet_key` function, named encryption keys of 128, 192, or 256 bits can be added to a session. These keys are stored in-memory:
```sql
PRAGMA add_parquet_key('key128', '0123456789112345');
PRAGMA add_parquet_key('key192', '012345678911234501234567');
PRAGMA add_parquet_key('key256', '01234567891123450123456789112345');
PRAGMA add_parquet_key('key256base64', 'MDEyMzQ1Njc4OTExMjM0NTAxMjM0NTY3ODkxMTIzNDU=');
```
##### Writing Encrypted Parquet Files {#docs:stable:data:parquet:encryption::writing-encrypted-parquet-files}
After specifying the key (e.g., `key256`), files can be encrypted as follows:
```sql
COPY tbl TO 'tbl.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});
```
##### Reading Encrypted Parquet Files {#docs:stable:data:parquet:encryption::reading-encrypted-parquet-files}
An encrypted Parquet file using a specific key (e.g., `key256`) can then be read as follows:
```sql
COPY tbl FROM 'tbl.parquet' (ENCRYPTION_CONFIG {footer_key: 'key256'});
```
Or:
```sql
SELECT *
FROM read_parquet('tbl.parquet', encryption_config = {footer_key: 'key256'});
```
#### Limitations {#docs:stable:data:parquet:encryption::limitations}
DuckDB's Parquet encryption currently has the following limitations.
1. It is not compatible with the encryption of, e.g., PyArrow, until the missing details are implemented.
2. DuckDB encrypts the footer and all columns using the `footer_key`. The Parquet specification allows encryption of individual columns with different keys, e.g.:
```sql
COPY tbl TO 'tbl.parquet'
(ENCRYPTION_CONFIG {
footer_key: 'key256',
column_keys: {key256: ['col0', 'col1']}
});
```
However, this is unsupported at the moment and will cause an error to be thrown (for now):
```console
Not implemented Error: Parquet encryption_config column_keys not yet implemented
```
#### Performance Implications {#docs:stable:data:parquet:encryption::performance-implications}
Note that encryption has some performance implications.
Without encryption, reading/writing the `lineitem` table from [`TPC-H`](#docs:stable:core_extensions:tpch) at SF1, which is 6M rows and 15 columns, from/to a Parquet file takes 0.26 and 0.99 seconds, respectively.
With encryption, this takes 0.64 and 2.21 seconds, both approximately 2.5× slower than the unencrypted version.
### Parquet Tips {#docs:stable:data:parquet:tips}
Below is a collection of tips to help when dealing with Parquet files.
#### Tips for Reading Parquet Files {#docs:stable:data:parquet:tips::tips-for-reading-parquet-files}
##### Use `union_by_name` When Loading Files with Different Schemas {#docs:stable:data:parquet:tips::use-union_by_name-when-loading-files-with-different-schemas}
The `union_by_name` option can be used to unify the schema of files that have different or missing columns. For files that do not have certain columns, `NULL` values are filled in:
```sql
SELECT *
FROM read_parquet('flights*.parquet', union_by_name = true);
```
#### Tips for Writing Parquet Files {#docs:stable:data:parquet:tips::tips-for-writing-parquet-files}
Using a [glob pattern](#docs:stable:data:multiple_files:overview::glob-syntax) upon read or a [Hive partitioning](#docs:stable:data:partitioning:hive_partitioning) structure are good ways to transparently handle multiple files.
##### Enabling `PER_THREAD_OUTPUT` {#docs:stable:data:parquet:tips::enabling-per_thread_output}
If the final number of Parquet files is not important, writing one file per thread can significantly improve performance:
```sql
COPY
(FROM generate_series(10_000_000))
TO 'test.parquet'
(FORMAT parquet, PER_THREAD_OUTPUT);
```
##### Selecting a `ROW_GROUP_SIZE` {#docs:stable:data:parquet:tips::selecting-a-row_group_size}
The `ROW_GROUP_SIZE` parameter specifies the minimum number of rows in a Parquet row group, with a minimum value equal to DuckDB's vector size, 2,048, and a default of 122,880.
A Parquet row group is a partition of rows, consisting of a column chunk for each column in the dataset.
Compression algorithms are only applied per row group, so the larger the row group size, the more opportunities to compress the data.
On the other hand, larger row group sizes mean that each thread keeps more data in memory before flushing when streaming results.
Another argument for smaller row group sizes is that DuckDB can read Parquet row groups in parallel even within the same file and uses predicate pushdown to only scan the row groups whose metadata ranges match the `WHERE` clause of the query. However, there is some overhead associated with reading the metadata in each group.
A good rule of thumb is to ensure that the number of row groups per file is at least as large as the number of CPU threads used to query that file.
Adding more row groups beyond the thread count improves the speed of highly selective queries, but slows down queries that must scan the whole file, such as aggregations.
To write a query to a Parquet file with a different row group size, run:
```sql
COPY
(FROM generate_series(100_000))
TO 'row-groups.parquet'
(FORMAT parquet, ROW_GROUP_SIZE 100_000);
```
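To check whether a file offers enough parallelism for your setup, one option is to compare its row group count with the configured thread count. A minimal sketch, reusing the file written above together with `parquet_file_metadata` and the `threads` setting:
```sql
-- number of row groups in the file written above
SELECT num_row_groups
FROM parquet_file_metadata('row-groups.parquet');

-- number of threads DuckDB is configured to use
SELECT current_setting('threads') AS thread_count;
```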
##### The `ROW_GROUPS_PER_FILE` Option {#docs:stable:data:parquet:tips::the-row_groups_per_file-option}
The `ROW_GROUPS_PER_FILE` parameter creates a new Parquet file once the current one reaches the specified number of row groups.
```sql
COPY
(FROM generate_series(100_000))
TO 'output-directory'
(FORMAT parquet, ROW_GROUP_SIZE 20_000, ROW_GROUPS_PER_FILE 2);
```
> If multiple threads are active, the number of row groups in a file may slightly exceed the specified number of row groups to limit the amount of locking, similarly to the behavior of [`FILE_SIZE_BYTES`](#docs:stable:sql:statements:copy::copy--to-options).
> However, if `PER_THREAD_OUTPUT` is set, only one thread writes to each file, and it becomes accurate again.
See the [Performance Guide on “File Formats”](#docs:stable:guides:performance:file_formats::parquet-file-sizes) for more tips.
## Partitioning {#data:partitioning}
### Hive Partitioning {#docs:stable:data:partitioning:hive_partitioning}
#### Examples {#docs:stable:data:partitioning:hive_partitioning::examples}
Read data from a Hive partitioned dataset:
```sql
SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true);
```
Write a table to a Hive partitioned dataset:
```sql
COPY orders
TO 'orders' (FORMAT parquet, PARTITION_BY (year, month));
```
Note that the `PARTITION_BY` option cannot use expressions. You can produce columns on the fly using the following syntax:
```sql
COPY (SELECT *, year(timestamp) AS year, month(timestamp) AS month FROM services)
TO 'test' (PARTITION_BY (year, month));
```
When reading, the partition columns are read from the directory structure and
can be included or excluded depending on the `hive_partitioning` parameter.
```sql
FROM read_parquet('test/*/*/*.parquet', hive_partitioning = false); -- will not include year, month columns
FROM read_parquet('test/*/*/*.parquet', hive_partitioning = true); -- will include year, month partition columns
```
#### Hive Partitioning {#docs:stable:data:partitioning:hive_partitioning::hive-partitioning}
Hive partitioning is a [partitioning strategy](https://en.wikipedia.org/wiki/Partition_(database)) that is used to split a table into multiple files based on **partition keys**. The files are organized into folders. Within each folder, the **partition key** has a value that is determined by the name of the folder.
Below is an example of a Hive partitioned file hierarchy. The files are partitioned on two keys (`year` and `month`).
```text
orders
├── year=2021
│    ├── month=1
│    │   ├── file1.parquet
│    │   └── file2.parquet
│    └── month=2
│        └── file3.parquet
└── year=2022
     ├── month=11
     │   ├── file4.parquet
     │   └── file5.parquet
     └── month=12
         └── file6.parquet
```
Files stored in this hierarchy can be read using the `hive_partitioning` flag.
```sql
SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true);
```
When we specify the `hive_partitioning` flag, the values of the columns will be read from the directories.
##### Filter Pushdown {#docs:stable:data:partitioning:hive_partitioning::filter-pushdown}
Filters on the partition keys are automatically pushed down into the files. This way the system skips reading files that are not necessary to answer a query. For example, consider the following query on the above dataset:
```sql
SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true)
WHERE year = 2022
AND month = 11;
```
When executing this query, only the following files will be read:
```text
orders
└── year=2022
    └── month=11
        ├── file4.parquet
        └── file5.parquet
```
##### Auto-detection {#docs:stable:data:partitioning:hive_partitioning::auto-detection}
By default, the system tries to infer whether the provided files are in a Hive partitioned hierarchy and, if so, the `hive_partitioning` flag is enabled automatically. The auto-detection looks at the names of the folders and searches for a `'key' = 'value'` pattern. This behavior can be overridden using the `hive_partitioning` configuration option:
```sql
SET hive_partitioning = false;
```
##### Hive Types {#docs:stable:data:partitioning:hive_partitioning::hive-types}
`hive_types` is a way to specify the logical types of the hive partitions in a struct:
```sql
SELECT *
FROM read_parquet(
'dir/**/*.parquet',
hive_partitioning = true,
hive_types = {'release': DATE, 'orders': BIGINT}
);
```
`hive_types` will be auto-detected for the following types: `DATE`, `TIMESTAMP` and `BIGINT`. To switch off the auto-detection, the flag `hive_types_autocast = 0` can be set.
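For example, assuming the same hypothetical directory layout as above, a minimal sketch that switches off the auto-detected casts so that the partition values are read as plain `VARCHAR` columns:
```sql
SELECT *
FROM read_parquet(
    'dir/**/*.parquet',
    hive_partitioning = true,
    hive_types_autocast = 0
);
```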
##### Writing Partitioned Files {#docs:stable:data:partitioning:hive_partitioning::writing-partitioned-files}
See the [Partitioned Writes](#docs:stable:data:partitioning:partitioned_writes) section.
### Partitioned Writes {#docs:stable:data:partitioning:partitioned_writes}
#### Examples {#docs:stable:data:partitioning:partitioned_writes::examples}
Write a table to a Hive partitioned dataset of Parquet files:
```sql
COPY orders TO 'orders'
(FORMAT parquet, PARTITION_BY (year, month));
```
Write a table to a Hive partitioned dataset of CSV files, allowing overwrites:
```sql
COPY orders TO 'orders'
(FORMAT csv, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE);
```
Write a table to a Hive partitioned dataset of GZIP-compressed CSV files, setting explicit data files' extension:
```sql
COPY orders TO 'orders'
(FORMAT csv, PARTITION_BY (year, month), COMPRESSION gzip, FILE_EXTENSION 'csv.gz');
```
#### Partitioned Writes {#docs:stable:data:partitioning:partitioned_writes::partitioned-writes}
When the `PARTITION_BY` clause is specified for the [`COPY` statement](#docs:stable:sql:statements:copy), the files are written in a [Hive partitioned](#docs:stable:data:partitioning:hive_partitioning) folder hierarchy. The target is the name of the root directory (in the example above: `orders`). The files are written in-order in the file hierarchy. Currently, one file is written per thread to each directory.
```text
orders
├── year=2021
│    ├── month=1
│    │   ├── data_1.parquet
│    │   └── data_2.parquet
│    └── month=2
│        └── data_1.parquet
└── year=2022
     ├── month=11
     │   ├── data_1.parquet
     │   └── data_2.parquet
     └── month=12
         └── data_1.parquet
```
The values of the partitions are automatically extracted from the data. Note that it can be very expensive to write a large number of partitions, as many files will be created. The ideal partition count depends on how large your dataset is.
To limit the maximum number of files the system can keep open before flushing to disk when writing using `PARTITION_BY`, use the `partitioned_write_max_open_files` configuration option (default: 100):
```sql
SET partitioned_write_max_open_files = 10;
```
> **Best practice.** Writing data into many small partitions is expensive. It is generally recommended to have at least `100 MB` of data per partition.
##### Filename Pattern {#docs:stable:data:partitioning:partitioned_writes::filename-pattern}
By default, files will be named `data_0.parquet` or `data_0.csv`. With the flag `FILENAME_PATTERN` a pattern with `{i}` or `{uuid}` can be defined to create specific filenames:
* `{i}` will be replaced by an index
* `{uuid}` will be replaced by a 128-bit UUID
Write a table to a Hive partitioned dataset of .parquet files, with an index in the filename:
```sql
COPY orders TO 'orders'
(FORMAT parquet, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN 'orders_{i}');
```
Write a table to a Hive partitioned dataset of .parquet files, with unique filenames:
```sql
COPY orders TO 'orders'
(FORMAT parquet, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE, FILENAME_PATTERN 'file_{uuid}');
```
##### Overwriting {#docs:stable:data:partitioning:partitioned_writes::overwriting}
By default the partitioned write will not allow overwriting existing directories.
On a local file system, the `OVERWRITE` and `OVERWRITE_OR_IGNORE` options remove the existing directories.
On remote file systems, overwriting is not supported.
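As a minimal sketch (assuming the same `orders` table as in the examples above), a partitioned write that replaces an existing local directory looks as follows:
```sql
COPY orders TO 'orders'
    (FORMAT parquet, PARTITION_BY (year, month), OVERWRITE);
```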
##### Appending {#docs:stable:data:partitioning:partitioned_writes::appending}
To append to an existing Hive partitioned directory structure, use the `APPEND` option:
```sql
COPY orders TO 'orders'
(FORMAT parquet, PARTITION_BY (year, month), APPEND);
```
Using the `APPEND` option results in behavior similar to the `OVERWRITE_OR_IGNORE, FILENAME_PATTERN '{uuid}'` options,
but DuckDB performs an extra check for whether the file already exists and regenerates the UUID in the rare event that it does (to avoid clashes).
##### Handling Slashes in Columns {#docs:stable:data:partitioning:partitioned_writes::handling-slashes-in-columns}
To handle slashes in column names, use percent-encoding, implemented by the [`url_encode` function](#docs:stable:sql:functions:text::url_encodestring).
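For illustration, `url_encode` replaces reserved characters such as the slash with percent-escapes, which keeps them from being interpreted as directory separators:
```sql
SELECT url_encode('a/b') AS encoded; -- returns 'a%2Fb'
```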
## Appender {#docs:stable:data:appender}
The Appender can be used to load bulk data into a DuckDB database. It is currently available in the [C, C++, Go, Java, and Rust APIs](#::appender-support-in-other-clients). The Appender is tied to a connection, and will use the transaction context of that connection when appending. An Appender always appends to a single table in the database file.
In the [C++ API](#docs:stable:clients:cpp), the Appender works as follows:
```cpp
DuckDB db;
Connection con(db);
// create the table
con.Query("CREATE TABLE people (id INTEGER, name VARCHAR)");
// initialize the appender
Appender appender(con, "people");
```
The `AppendRow` function is the easiest way of appending data. It uses recursive templates to allow you to put all the values of a single row within one function call, as follows:
```cpp
appender.AppendRow(1, "Mark");
```
Rows can also be individually constructed using the `BeginRow`, `EndRow` and `Append` methods. This is done internally by `AppendRow`, and hence has the same performance characteristics.
```cpp
appender.BeginRow();
appender.Append(2);
appender.Append("Hannes");
appender.EndRow();
```
Any values added to the Appender are cached prior to being inserted into the database system
for performance reasons. That means that, while appending, the rows might not be immediately visible in the system. The cache is automatically flushed when the Appender goes out of scope or when `appender.Close()` is called. The cache can also be manually flushed using the `appender.Flush()` method. After either `Flush` or `Close` is called, all the data has been written to the database system.
#### Date, Time and Timestamps {#docs:stable:data:appender::date-time-and-timestamps}
While numbers and strings are rather self-explanatory, dates, times and timestamps require some explanation. They can be directly appended using the methods provided by `duckdb::Date`, `duckdb::Time` or `duckdb::Timestamp`. They can also be appended using the internal `duckdb::Value` type; however, this adds some additional overhead and should be avoided if possible.
Below is a short example:
```cpp
con.Query("CREATE TABLE dates (d DATE, t TIME, ts TIMESTAMP)");
Appender appender(con, "dates");
// construct the values using the Date/Time/Timestamp types
// (this is the most efficient approach)
appender.AppendRow(
Date::FromDate(1992, 1, 1),
Time::FromTime(1, 1, 1, 0),
Timestamp::FromDatetime(Date::FromDate(1992, 1, 1), Time::FromTime(1, 1, 1, 0))
);
// construct duckdb::Value objects
appender.AppendRow(
Value::DATE(1992, 1, 1),
Value::TIME(1, 1, 1, 0),
Value::TIMESTAMP(1992, 1, 1, 1, 1, 1, 0)
);
```
#### Commit Frequency {#docs:stable:data:appender::commit-frequency}
By default, the Appender performs a commit every 204,800 rows.
You can change this by explicitly using [transactions](#docs:stable:sql:statements:transactions) and surrounding your batches of `AppendRow` calls by `BEGIN TRANSACTION` and `COMMIT` statements.
#### Handling Constraint Violations {#docs:stable:data:appender::handling-constraint-violations}
If the Appender encounters a `PRIMARY KEY` conflict or a `UNIQUE` constraint violation, it fails and returns the following error:
```console
Constraint Error:
PRIMARY KEY or UNIQUE constraint violated: duplicate key "..."
```
In this case, the entire append operation fails and no rows are inserted.
#### Appender Support in Other Clients {#docs:stable:data:appender::appender-support-in-other-clients}
The Appender is also available in the following client APIs:
* [C](#docs:stable:clients:c:appender)
* [Go](#docs:stable:clients:go::appender)
* [Java (JDBC)](#docs:stable:clients:java::appender)
* [Julia](#docs:stable:clients:julia::appender-api)
* [Rust](#docs:stable:clients:rust::appender)
## INSERT Statements {#docs:stable:data:insert}
`INSERT` statements are the standard way of loading data into a relational database. When using `INSERT` statements, the values are supplied row-by-row. While simple, there is significant overhead involved in parsing and processing individual `INSERT` statements. This makes lots of individual row-by-row insertions very inefficient for bulk insertion.
> **Best practice.** As a rule-of-thumb, avoid using lots of individual row-by-row `INSERT` statements when inserting more than a few rows (i.e., avoid using `INSERT` statements as part of a loop). When bulk inserting data, try to maximize the amount of data that is inserted per statement.
If you must use `INSERT` statements to load data in a loop, avoid executing the statements in auto-commit mode. After every commit, the database is required to sync the changes made to disk to ensure no data is lost. In auto-commit mode every single statement will be wrapped in a separate transaction, meaning `fsync` will be called for every statement. This is typically unnecessary when bulk loading and will significantly slow down your program.
> **Tip.** If you absolutely must use `INSERT` statements in a loop to load data, wrap them in calls to `BEGIN TRANSACTION` and `COMMIT`.
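For example, assuming a `people` table like the one created below, the inserts issued by a loop can be wrapped in a single transaction so that the changes are only synced to disk once, at commit time:
```sql
BEGIN TRANSACTION;
INSERT INTO people VALUES (1, 'Mark');
INSERT INTO people VALUES (2, 'Hannes');
-- ... further INSERT statements issued by the loop ...
COMMIT;
```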
#### Syntax {#docs:stable:data:insert::syntax}
An example of using `INSERT INTO` to load data in a table is as follows:
```sql
CREATE TABLE people (id INTEGER, name VARCHAR);
INSERT INTO people VALUES (1, 'Mark'), (2, 'Hannes');
```
For a more detailed description together with a syntax diagram, see the [page on the `INSERT` statement](#docs:stable:sql:statements:insert).
# Lakehouse Formats {#docs:stable:lakehouse_formats}
Lakehouse formats, often referred to as open table formats, are specifications for storing data in object storage while maintaining some guarantees such as ACID transactions or keeping snapshot history. Over time, multiple lakehouse formats have emerged, each one with its own unique approach to managing its metadata (a.k.a. catalog). On this page, we go over the support that DuckDB offers for some of these formats, as well as some workarounds that you can use to get close to full interoperability with them while still using DuckDB.
#### DuckDB Lakehouse Support Matrix {#docs:stable:lakehouse_formats::duckdb-lakehouse-support-matrix}
DuckDB supports Iceberg, Delta and DuckLake as first-class citizens. The following matrix represents what DuckDB natively supports out of the box through core extensions.
| | DuckLake | Iceberg | Delta |
| ---------------------------- | :-------------------------------------------------------------------- | :---------------------------------------------------------------------- | :--------------------------------------------------------- |
| Extension | [`ducklake`](https://ducklake.select/docs/stable/duckdb/introduction) | [`iceberg`](#docs:stable:core_extensions:iceberg:overview) | [`delta`](#docs:stable:core_extensions:delta) |
| Read                         | ✅ | ✅ | ✅ |
| Write                        | ✅ | ✅ | ✅ |
| Deletes                      | ✅ | ❌ | ❌ |
| Updates                      | ✅ | ❌ | ❌ |
| Upserting                    | ✅ | ❌ | ❌ |
| Create table                 | ✅ | ✅ | ❌ |
| Create table with partitions | ✅ | ❌ | ❌ |
| Attaching to a catalog       | ✅ | ✅ | ✅ `*` |
| Rename table                 | ✅ | ❌ | ❌ |
| Rename columns               | ✅ | ❌ | ❌ |
| Add/drop columns             | ✅ | ❌ | ❌ |
| Alter column type            | ✅ | ❌ | ❌ |
| Compaction and maintenance   | ✅ | ❌ | ❌ |
| Encryption                   | ✅ | ❌ | ❌ |
| Manage table properties      | ✅ | ❌ | ❌ |
| Time travel                  | ✅ | ✅ | ✅ |
| Query table changes          | ✅ | ❌ | ❌ |
`*` Through the `uc_catalog` extension
DuckDB aims to build native extensions with minimal dependencies. The `iceberg` extension, for example, has no dependencies on third-party Iceberg libraries, which means all data and metadata operations are implemented natively in the DuckDB extension. For the `delta` extension, we use the [`delta-kernel-rs` project](https://github.com/delta-io/delta-kernel-rs), which is meant to be a lightweight platform for engines to build delta integrations that are as close to native as possible.
> **Why do native implementations matter?** Native implementations allow DuckDB to do more performance optimizations such as complex filter pushdowns (with file-level and row-group level pruning) and improve memory management.
#### Workarounds for Unsupported Features {#docs:stable:lakehouse_formats::workarounds-for-unsupported-features}
If the DuckDB core extension does not cover your use case, you can still use DuckDB to process the data and use an external library to help you with the unsupported operations. If you are using the Python client, there are some very good off-the-shelf libraries that can help you. These examples have one thing in common: they use Arrow as an efficient, zero-copy data interface with DuckDB.
##### Using PyIceberg with DuckDB {#docs:stable:lakehouse_formats::using-pyiceberg-with-duckdb}
In this example, we will use [PyIceberg](https://py.iceberg.apache.org/) to create and alter the schema of a table and DuckDB to read and write to the table.
```python
from pyiceberg.catalog import load_catalog
from pyiceberg.schema import Schema
from pyiceberg.types import (
    TimestampType,
    FloatType,
    DoubleType,
    StringType,
    NestedField,
    IntegerType,
)
import duckdb
# Create a table with PyIceberg
catalog = load_catalog(
"docs",
**{
"uri": "http://127.0.0.1:8181",
"s3.endpoint": "http://127.0.0.1:9000",
"py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
"s3.access-key-id": "admin",
"s3.secret-access-key": "password",
}
)
schema = Schema(
NestedField(field_id=1, name="datetime", field_type=TimestampType(), required=True),
NestedField(field_id=2, name="symbol", field_type=StringType(), required=True),
NestedField(field_id=3, name="bid", field_type=FloatType(), required=False),
NestedField(field_id=4, name="ask", field_type=DoubleType(), required=False)
)
catalog.create_table(
identifier="default.bids",
schema=schema,
)
# Write and read the table with DuckDB
with duckdb.connect() as conn:
conn.execute("""
CREATE SECRET (
TYPE S3,
KEY_ID 'admin',
SECRET 'password',
ENDPOINT '127.0.0.1:9000',
URL_STYLE 'path',
USE_SSL false
);
ATTACH '' AS my_datalake (
TYPE ICEBERG,
CLIENT_ID 'admin',
CLIENT_SECRET 'password',
ENDPOINT 'http://127.0.0.1:8181'
);
""")
conn.execute("""
INSERT INTO my_datalake.default.bids VALUES ('2024-01-01 10:00:00', 'AAPL', 150.0, 150.5);
""")
conn.sql("SELECT * FROM my_datalake.default.bids;").show()
# Alter schema with PyIceberg
table = catalog.load_table("default.bids")
with table.update_schema() as update:
update.add_column("retries", IntegerType(), "Number of retries to place the bid")
```
##### Using delta-rs with DuckDB {#docs:stable:lakehouse_formats::using-delta-rs-with-duckdb}
In this example, we create a Delta table with the `delta-rs` Python binding, then we use the `delta` extension of DuckDB to read it. We also showcase how to do other read operations with DuckDB, like reading the change data feed using the Arrow zero-copy integration. This operation can also be lazy if reading bigger data by using [Arrow Datasets](https://delta-io.github.io/delta-rs/integrations/delta-lake-arrow/).
```python
import deltalake as dl
import pyarrow as pa
import duckdb
# Create a delta table and read it with DuckDB Delta extension
dl.write_deltalake(
"tmp/some_table",
pa.table({
"id": [1, 2, 3],
"value": ["a", "b", "c"]
})
)
with duckdb.connect() as conn:
conn.execute("""
INSTALL delta;
LOAD delta;
""")
conn.sql("""
SELECT * FROM delta_scan('tmp/some_table')
""").show()
# Append some data and read the data change feed using the PyArrow integration
dl.write_deltalake(
"tmp/some_table",
pa.table({
"id": [4, 5],
"value": ["d", "e"]
}),
mode="append"
)
table = dl.DeltaTable("tmp/some_table").load_cdf(starting_version=1, ending_version=2)
with duckdb.connect() as conn:
conn.register("t", table)
conn.sql("SELECT * FROM t").show()
```
# Client APIs {#clients}
## Client Overview {#docs:stable:clients:overview}
DuckDB is an in-process database system and offers client APIs (also known as “drivers”) for several languages.
| Client API | Maintainer | Support tier | Version |
| ------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [C](#docs:stable:clients:c:overview) | The DuckDB team | Primary | [1.4.1](https://duckdb.org/install/index.html?version=stable&environment=cplusplus) |
| [Command Line Interface (CLI)](#docs:stable:clients:cli:overview) | The DuckDB team | Primary | [1.4.1](https://duckdb.org/install/index.html?version=stable&environment=cli) |
| [Java (JDBC)](#docs:stable:clients:java) | The DuckDB team | Primary | [{{ site.current_duckdb_java_short_version }}](https://central.sonatype.com/artifact/org.duckdb/duckdb_jdbc) |
| [Go](#docs:stable:clients:go) | The DuckDB team | Primary | [{{ site.current_duckdb_go_version }}](https://github.com/duckdb/duckdb-go#readme) |
| [Node.js (node-neo)](#docs:stable:clients:node_neo:overview) | [Jeff Raymakers](https://github.com/jraymakers) ([MotherDuck](https://motherduck.com/)) | Primary | [{{ site.current_duckdb_node_neo_version }}](https://www.npmjs.com/package/@duckdb/node-api) |
| [ODBC](#docs:stable:clients:odbc:overview) | The DuckDB team | Primary | [{{ site.current_duckdb_odbc_short_version }}](https://duckdb.org/install/index.html?version=stable&environment=odbc) |
| [Python](#docs:stable:clients:python:overview) | The DuckDB team | Primary | [1.4.1](https://pypi.org/project/duckdb/) |
| [R](#docs:stable:clients:r) | [Kirill Müller](https://github.com/krlmlr) and the DuckDB team | Primary | [{{ site.current_duckdb_r_version }}](https://cran.r-project.org/web/packages/duckdb/index.html) |
| [Rust](#docs:stable:clients:rust) | The DuckDB team | Primary | [{{ site.current_duckdb_rust_version }}](https://crates.io/crates/duckdb) |
| [WebAssembly (Wasm)](#docs:stable:clients:wasm:overview) | The DuckDB team | Primary | [{{ site.current_duckdb_wasm_version }}](https://github.com/duckdb/duckdb-wasm#readme) |
| [ADBC (Arrow)](#docs:stable:clients:adbc) | The DuckDB team | Secondary | [1.4.1](#docs:stable:clients:adbc) |
| [C# (.NET)](https://duckdb.net/) | [Giorgi](https://github.com/Giorgi) | Secondary | [{{ site.current_duckdb_csharp_version}}](https://www.nuget.org/packages?q=Tags%3A%22DuckDB%22+Author%3A%22Giorgi%22&includeComputedFrameworks=true&prerel=true&sortby=relevance) |
| [C++](#docs:stable:clients:cpp) | The DuckDB team | Secondary | [1.4.1](https://duckdb.org/install/index.html?version=stable&environment=cplusplus) |
| [Node.js (deprecated)](#docs:stable:clients:nodejs:overview) | The DuckDB team | Secondary | [{{ site.current_duckdb_nodejs_version }}](https://www.npmjs.com/package/duckdb) |
For a list of tertiary clients, see the [“Tertiary Clients” page](#docs:stable:clients:tertiary).
#### Support Tiers {#docs:stable:clients:overview::support-tiers}
There are three tiers of support for clients.
Primary clients are the first to receive new features and are covered by [community support](https://duckdblabs.com/community_support_policy).
Secondary clients receive new features but are not covered by community support.
Finally, there are no feature or support guarantees for tertiary clients.
> The DuckDB clients listed above are open-source and we welcome community contributions to these libraries.
> All primary and secondary clients are available under the MIT license.
> For tertiary clients, please consult the repository for the license.
We report the latest stable version for the clients in the primary and secondary support tiers.
#### Compatibility {#docs:stable:clients:overview::compatibility}
All DuckDB clients support the same DuckDB SQL syntax and use the same on-disk [database format](#docs:stable:internals:storage).
[DuckDB extensions](#docs:stable:extensions:overview) are also portable between clients with some exceptions (see [Wasm extensions](#docs:stable:clients:wasm:extensions::list-of-officially-available-extensions)).
## Tertiary Clients {#docs:stable:clients:tertiary}
| Client API | Maintainer | Support tier | Version |
| ------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------- |
| [Common Lisp](https://github.com/ak-coram/cl-duckdb) | [ak-coram](https://github.com/ak-coram) | Tertiary | |
| [Crystal](https://github.com/amauryt/crystal-duckdb) | [amauryt](https://github.com/amauryt) | Tertiary | |
| [Dart](#docs:stable:clients:dart) | [TigerEye](https://www.tigereye.com/) | Tertiary | |
| [Elixir](https://github.com/AlexR2D2/duckdbex) | [AlexR2D2](https://github.com/AlexR2D2/duckdbex) | Tertiary | |
| [Erlang](https://github.com/mmzeeman/educkdb) | [MM Zeeman](https://github.com/mmzeeman) | Tertiary | |
| [Haskell](https://github.com/tritlo/duckdb-haskell) | [Tritlo](https://github.com/tritlo) | Tertiary | |
| [Julia](#docs:stable:clients:julia) | The DuckDB team | Tertiary | |
| [PHP](#docs:stable:clients:php) | [satur-io](https://github.com/satur-io/duckdb-php) | Tertiary | |
| [Pyodide](https://github.com/duckdb/duckdb-pyodide) | The DuckDB team | Tertiary | |
| [Raku](https://raku.land/zef:bduggan/Duckie) | [bduggan](https://github.com/bduggan) | Tertiary | |
| [Ruby](https://suketa.github.io/ruby-duckdb/) | [suketa](https://github.com/suketa) | Tertiary | |
| [Scala](https://www.duck4s.com/docs/index.html) | [Salar Rahmanian](https://www.softinio.com) | Tertiary | |
| [Swift](#docs:stable:clients:swift) | The DuckDB team | Tertiary | |
| [Zig](https://github.com/karlseguin/zuckdb.zig) | [karlseguin](https://github.com/karlseguin) | Tertiary | |
## ADBC Client {#docs:stable:clients:adbc}
> The latest stable version of the DuckDB ADBC client is 1.4.1.
[Arrow Database Connectivity (ADBC)](https://arrow.apache.org/adbc/), similarly to ODBC and JDBC, is a C-style API that enables code portability between different database systems. This allows developers to effortlessly build applications that communicate with database systems without using code specific to that system. The main difference between ADBC and ODBC/JDBC is that ADBC uses [Arrow](https://arrow.apache.org/) to transfer data between the database system and the application. DuckDB has an ADBC driver, which takes advantage of the [zero-copy integration between DuckDB and Arrow](https://duckdb.org/2021/12/03/duck-arrow) to efficiently transfer data.
Please refer to the [ADBC documentation page](https://arrow.apache.org/adbc/current/) for a more extensive discussion on ADBC and a detailed API explanation.
#### Implemented Functionality {#docs:stable:clients:adbc::implemented-functionality}
The DuckDB-ADBC driver implements the full ADBC specification, with the exception of the `ConnectionReadPartition` and `StatementExecutePartitions` functions. Both of these functions exist to support systems that internally partition the query results, which does not apply to DuckDB.
In this section, we will describe the main functions that exist in ADBC, along with the arguments they take and provide examples for each function.
##### Database {#docs:stable:clients:adbc::database}
Set of functions that operate on a database.
| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `DatabaseNew` | Allocate a new (but uninitialized) database. | `(AdbcDatabase *database, AdbcError *error)` | `AdbcDatabaseNew(&adbc_database, &adbc_error)` |
| `DatabaseSetOption` | Set a char* option. | `(AdbcDatabase *database, const char *key, const char *value, AdbcError *error)` | `AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error)` |
| `DatabaseInit` | Finish setting options and initialize the database. | `(AdbcDatabase *database, AdbcError *error)` | `AdbcDatabaseInit(&adbc_database, &adbc_error)` |
| `DatabaseRelease` | Destroy the database. | `(AdbcDatabase *database, AdbcError *error)` | `AdbcDatabaseRelease(&adbc_database, &adbc_error)` |
##### Connection {#docs:stable:clients:adbc::connection}
A set of functions that create and destroy a connection to interact with a database.
| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `ConnectionNew` | Allocate a new (but uninitialized) connection. | `(AdbcConnection*, AdbcError*)` | `AdbcConnectionNew(&adbc_connection, &adbc_error)` |
| `ConnectionSetOption` | Options may be set before ConnectionInit. | `(AdbcConnection*, const char*, const char*, AdbcError*)` | `AdbcConnectionSetOption(&adbc_connection, ADBC_CONNECTION_OPTION_AUTOCOMMIT, ADBC_OPTION_VALUE_DISABLED, &adbc_error)` |
| `ConnectionInit` | Finish setting options and initialize the connection. | `(AdbcConnection*, AdbcDatabase*, AdbcError*)` | `AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error)` |
| `ConnectionRelease` | Destroy this connection. | `(AdbcConnection*, AdbcError*)` | `AdbcConnectionRelease(&adbc_connection, &adbc_error)` |
A set of functions that retrieve metadata about the database. In general, these functions will return Arrow objects, specifically an ArrowArrayStream.
| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `ConnectionGetObjects` | Get a hierarchical view of all catalogs, database schemas, tables, and columns. | `(AdbcConnection*, int, const char*, const char*, const char*, const char**, const char*, ArrowArrayStream*, AdbcError*)` | `AdbcDatabaseInit(&adbc_database, &adbc_error)` |
| `ConnectionGetTableSchema` | Get the Arrow schema of a table. | `(AdbcConnection*, const char*, const char*, const char*, ArrowSchema*, AdbcError*)` | `AdbcDatabaseRelease(&adbc_database, &adbc_error)` |
| `ConnectionGetTableTypes` | Get a list of table types in the database. | `(AdbcConnection*, ArrowArrayStream*, AdbcError*)` | `AdbcDatabaseNew(&adbc_database, &adbc_error)` |
A set of functions with transaction semantics for the connection. By default, all connections start with auto-commit mode on, but this can be turned off via the ConnectionSetOption function.
| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `ConnectionCommit` | Commit any pending transactions. | `(AdbcConnection*, AdbcError*)` | `AdbcConnectionCommit(&adbc_connection, &adbc_error)` |
| `ConnectionRollback` | Rollback any pending transactions. | `(AdbcConnection*, AdbcError*)` | `AdbcConnectionRollback(&adbc_connection, &adbc_error)` |
##### Statement {#docs:stable:clients:adbc::statement}
Statements hold state related to query execution. They represent both one-off queries and prepared statements. They can be reused; however, doing so will invalidate prior result sets from that statement.
The functions used to create, destroy, and set options for a statement:
| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `StatementNew` | Create a new statement for a given connection. | `(AdbcConnection*, AdbcStatement*, AdbcError*)` | `AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error)` |
| `StatementRelease` | Destroy a statement. | `(AdbcStatement*, AdbcError*)` | `AdbcStatementRelease(&adbc_statement, &adbc_error)` |
| `StatementSetOption` | Set a string option on a statement. | `(AdbcStatement*, const char*, const char*, AdbcError*)` | `StatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "TABLE_NAME", &adbc_error)` |
Functions related to query execution:
| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `StatementSetSqlQuery` | Set the SQL query to execute. The query can then be executed with StatementExecuteQuery. | `(AdbcStatement*, const char*, AdbcError*)` | `AdbcStatementSetSqlQuery(&adbc_statement, "SELECT * FROM TABLE", &adbc_error)` |
| `StatementSetSubstraitPlan` | Set a substrait plan to execute. The query can then be executed with StatementExecuteQuery. | `(AdbcStatement*, const uint8_t*, size_t, AdbcError*)` | `AdbcStatementSetSubstraitPlan(&adbc_statement, substrait_plan, length, &adbc_error)` |
| `StatementExecuteQuery` | Execute a statement and get the results. | `(AdbcStatement*, ArrowArrayStream*, int64_t*, AdbcError*)` | `AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error)` |
| `StatementPrepare` | Turn this statement into a prepared statement to be executed multiple times. | `(AdbcStatement*, AdbcError*)` | `AdbcStatementPrepare(&adbc_statement, &adbc_error)` |
Functions related to binding, used for bulk insertion or in prepared statements.
| Function name | Description | Arguments | Example |
|:---|:-|:---|:----|
| `StatementBindStream` | Bind Arrow Stream. This can be used for bulk inserts or prepared statements. | `(AdbcStatement*, ArrowArrayStream*, AdbcError*)` | `StatementBindStream(&adbc_statement, &input_data, &adbc_error)` |
#### Setting Up the DuckDB ADBC Driver {#docs:stable:clients:adbc::setting-up-the-duckdb-adbc-driver}
Before using DuckDB as an ADBC driver, you must install the `libduckdb` shared library on your system and make it available to your application. This library contains the core DuckDB engine that the ADBC driver interfaces with.
##### Downloading libduckdb {#docs:stable:clients:adbc::downloading-libduckdb}
Download the appropriate `libduckdb` library for your platform from the [DuckDB releases page](https://github.com/duckdb/duckdb/releases):
- **Linux**: `libduckdb-linux-amd64.zip` (contains `libduckdb.so`)
- **macOS**: `libduckdb-osx-universal.zip` (contains `libduckdb.dylib`)
- **Windows**: `libduckdb-windows-amd64.zip` (contains `duckdb.dll`)
Extract the archive to obtain the shared library file.
##### Installing the Library {#docs:stable:clients:adbc::installing-the-library}
###### Linux {#docs:stable:clients:adbc::linux}
1. Extract the `libduckdb.so` file from the downloaded archive
2. Make sure your code can use the library. You can:
- Either copy it to a system library directory (requires root access):
```bash
sudo cp libduckdb.so /usr/local/lib/
sudo ldconfig
```
- Or place it in a custom directory and add that directory to your `LD_LIBRARY_PATH`:
```bash
mkdir -p ~/lib
cp libduckdb.so ~/lib/
export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
```
###### macOS {#docs:stable:clients:adbc::macos}
1. Extract the `libduckdb.dylib` file from the downloaded archive
2. Make sure your code can use the library. You can:
- Either copy it to a system library directory:
```bash
sudo cp libduckdb.dylib /usr/local/lib/
```
- Or place it in a custom directory and add that directory to your `DYLD_LIBRARY_PATH`:
```bash
mkdir -p ~/lib
cp libduckdb.dylib ~/lib/
export DYLD_LIBRARY_PATH=~/lib:$DYLD_LIBRARY_PATH
```
###### Windows {#docs:stable:clients:adbc::windows}
1. Extract the `duckdb.dll` file from the downloaded archive
2. Place it in one of the following locations:
- The same directory as your application executable
- A directory listed in your `PATH` environment variable
- The Windows system directory (e.g., `C:\Windows\System32`)
##### Understanding Library Paths {#docs:stable:clients:adbc::understanding-library-paths}
The `LD_LIBRARY_PATH` (Linux) and `DYLD_LIBRARY_PATH` (macOS) are environment variables that tell the system where to look for shared libraries at runtime. When your application tries to load `libduckdb`, the system searches these paths to locate the library file.
##### Verifying Installation {#docs:stable:clients:adbc::verifying-installation}
You can verify that the library is properly installed and accessible:
**Linux/macOS:**
```bash
ldd path/to/your/application # Linux
otool -L path/to/your/application # macOS
```
#### Examples {#docs:stable:clients:adbc::examples}
Regardless of the programming language being used, two database options are required to use ADBC with DuckDB. The first is the `driver`, which takes a path to the DuckDB library (see [Setting Up the DuckDB ADBC Driver](#::setting-up-the-duckdb-adbc-driver) above for installation instructions). The second is the `entrypoint`, an exported function from the DuckDB-ADBC driver that initializes all the ADBC functions. Once these two options are configured, we can optionally set the `path` option, providing a path on disk to store our DuckDB database. If not set, an in-memory database is created. After configuring all the necessary options, we can proceed to initialize our database. Below is how to do so in various language environments.
##### C++ {#docs:stable:clients:adbc::c}
We begin our C++ example by declaring the essential variables for querying data through ADBC. These variables include Error, Database, Connection, Statement handling, and an Arrow Stream to transfer data between DuckDB and the application.
```cpp
AdbcError adbc_error;
AdbcDatabase adbc_database;
AdbcConnection adbc_connection;
AdbcStatement adbc_statement;
ArrowArrayStream arrow_stream;
```
We can then initialize our database variable. Before initializing the database, we need to set the `driver` and `entrypoint` options as mentioned above. Then we set the `path` option and initialize the database. The `driver` option should point to your installed `libduckdb` library; see [Setting Up the DuckDB ADBC Driver](#::setting-up-the-duckdb-adbc-driver) for installation instructions.
```cpp
AdbcDatabaseNew(&adbc_database, &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "driver", "path/to/libduckdb.dylib", &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "entrypoint", "duckdb_adbc_init", &adbc_error);
// By default, we start an in-memory database, but you can optionally define a path to store it on disk.
AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error);
AdbcDatabaseInit(&adbc_database, &adbc_error);
```
After initializing the database, we must create and initialize a connection to it.
```cpp
AdbcConnectionNew(&adbc_connection, &adbc_error);
AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error);
```
We can now initialize our statement and run queries through our connection. After `AdbcStatementExecuteQuery` is called, the `arrow_stream` is populated with the result.
```cpp
AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error);
AdbcStatementSetSqlQuery(&adbc_statement, "SELECT 42", &adbc_error);
int64_t rows_affected;
AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error);
arrow_stream.release(&arrow_stream);
```
Besides running queries, we can also ingest data via `arrow_streams`. For this, we need to set an option with the table name we want to insert into, bind the stream, and then execute the query.
```cpp
AdbcStatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "AnswerToEverything", &adbc_error);
AdbcStatementBindStream(&adbc_statement, &arrow_stream, &adbc_error);
AdbcStatementExecuteQuery(&adbc_statement, nullptr, nullptr, &adbc_error);
```
##### Python {#docs:stable:clients:adbc::python}
The first thing to do is to use `pip` to install the ADBC driver manager. You will also need to install `pyarrow` to directly access Apache Arrow-formatted result sets (e.g., when using `fetch_arrow_table`).
```bash
pip install adbc_driver_manager pyarrow
```
> For details on the `adbc_driver_manager` package, see the [`adbc_driver_manager` package documentation](https://arrow.apache.org/adbc/current/python/api/adbc_driver_manager.html).
As with C++, we need to provide initialization options consisting of the location of the libduckdb shared object and entrypoint function. Notice that the `path` argument for DuckDB is passed in through the `db_kwargs` dictionary.
```python
import adbc_driver_duckdb.dbapi
with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
cur.execute("SELECT 42")
# fetch a pyarrow table
tbl = cur.fetch_arrow_table()
print(tbl)
```
Alongside `fetch_arrow_table`, other methods from DBApi are also implemented on the cursor, such as `fetchone` and `fetchall`. Data can also be ingested via `arrow_streams`. We just need to set options on the statement to bind the stream of data and execute the query.
```python
import adbc_driver_duckdb.dbapi
import pyarrow

data = pyarrow.record_batch(
    [[1, 2, 3, 4], ["a", "b", "c", "d"]],
    names=["ints", "strs"],
)

with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
    cur.adbc_ingest("AnswerToEverything", data)
```
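As a minimal sketch of the plain DBAPI fetch methods, the rows ingested above can be read back with `fetchall` (the table name `AnswerToEverything` is the one used in the ingestion example):
```python
import adbc_driver_duckdb.dbapi

# Minimal sketch: read back the previously ingested table using plain DBAPI calls.
with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
    cur.execute("SELECT * FROM AnswerToEverything")
    for row in cur.fetchall():
        print(row)
```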
##### Go {#docs:stable:clients:adbc::go}
Make sure to install the `libduckdb` library first; see [Setting Up the DuckDB ADBC Driver](#::setting-up-the-duckdb-adbc-driver) for detailed installation instructions.
The following example uses an in-memory DuckDB database to modify in-memory Arrow RecordBatches via SQL queries:
{% raw %}
```go
package main
import (
"bytes"
"context"
"fmt"
"io"
"github.com/apache/arrow-adbc/go/adbc"
"github.com/apache/arrow-adbc/go/adbc/drivermgr"
"github.com/apache/arrow-go/v18/arrow"
"github.com/apache/arrow-go/v18/arrow/array"
"github.com/apache/arrow-go/v18/arrow/ipc"
"github.com/apache/arrow-go/v18/arrow/memory"
)
func _makeSampleArrowRecord() arrow.Record {
b := array.NewFloat64Builder(memory.DefaultAllocator)
b.AppendValues([]float64{1, 2, 3}, nil)
col := b.NewArray()
defer col.Release()
defer b.Release()
schema := arrow.NewSchema([]arrow.Field{{Name: "column1", Type: arrow.PrimitiveTypes.Float64}}, nil)
return array.NewRecord(schema, []arrow.Array{col}, int64(col.Len()))
}
type DuckDBSQLRunner struct {
ctx context.Context
conn adbc.Connection
db adbc.Database
}
func NewDuckDBSQLRunner(ctx context.Context) (*DuckDBSQLRunner, error) {
var drv drivermgr.Driver
db, err := drv.NewDatabase(map[string]string{
"driver": "duckdb",
"entrypoint": "duckdb_adbc_init",
"path": ":memory:",
})
if err != nil {
return nil, fmt.Errorf("failed to create new in-memory DuckDB database: %w", err)
}
conn, err := db.Open(ctx)
if err != nil {
return nil, fmt.Errorf("failed to open connection to new in-memory DuckDB database: %w", err)
}
return &DuckDBSQLRunner{ctx: ctx, conn: conn, db: db}, nil
}
func serializeRecord(record arrow.Record) (io.Reader, error) {
buf := new(bytes.Buffer)
wr := ipc.NewWriter(buf, ipc.WithSchema(record.Schema()))
if err := wr.Write(record); err != nil {
return nil, fmt.Errorf("failed to write record: %w", err)
}
if err := wr.Close(); err != nil {
return nil, fmt.Errorf("failed to close writer: %w", err)
}
return buf, nil
}
func (r *DuckDBSQLRunner) importRecord(sr io.Reader) error {
rdr, err := ipc.NewReader(sr)
if err != nil {
return fmt.Errorf("failed to create IPC reader: %w", err)
}
defer rdr.Release()
_, err = adbc.IngestStream(r.ctx, r.conn, rdr, "temp_table", adbc.OptionValueIngestModeCreate, adbc.IngestStreamOptions{})
return err
}
func (r *DuckDBSQLRunner) runSQL(sql string) ([]arrow.Record, error) {
stmt, err := r.conn.NewStatement()
if err != nil {
return nil, fmt.Errorf("failed to create new statement: %w", err)
}
defer stmt.Close()
if err := stmt.SetSqlQuery(sql); err != nil {
return nil, fmt.Errorf("failed to set SQL query: %w", err)
}
out, n, err := stmt.ExecuteQuery(r.ctx)
if err != nil {
return nil, fmt.Errorf("failed to execute query: %w", err)
}
defer out.Release()
result := make([]arrow.Record, 0, n)
for out.Next() {
rec := out.Record()
rec.Retain() // .Next() will release the record, so we need to retain it
result = append(result, rec)
}
if out.Err() != nil {
return nil, out.Err()
}
return result, nil
}
func (r *DuckDBSQLRunner) RunSQLOnRecord(record arrow.Record, sql string) ([]arrow.Record, error) {
serializedRecord, err := serializeRecord(record)
if err != nil {
return nil, fmt.Errorf("failed to serialize record: %w", err)
}
if err := r.importRecord(serializedRecord); err != nil {
return nil, fmt.Errorf("failed to import record: %w", err)
}
result, err := r.runSQL(sql)
if err != nil {
return nil, fmt.Errorf("failed to run SQL: %w", err)
}
if _, err := r.runSQL("DROP TABLE temp_table"); err != nil {
return nil, fmt.Errorf("failed to drop temp table after running query: %w", err)
}
return result, nil
}
func (r *DuckDBSQLRunner) Close() {
r.conn.Close()
r.db.Close()
}
func main() {
rec := _makeSampleArrowRecord()
fmt.Println(rec)
runner, err := NewDuckDBSQLRunner(context.Background())
if err != nil {
panic(err)
}
defer runner.Close()
resultRecords, err := runner.RunSQLOnRecord(rec, "SELECT column1+1 FROM temp_table")
if err != nil {
panic(err)
}
for _, resultRecord := range resultRecords {
fmt.Println(resultRecord)
resultRecord.Release()
}
}
```
{% endraw %}
Running it produces the following output:
```text
record:
schema:
fields: 1
- column1: type=float64
rows: 3
col[0][column1]: [1 2 3]
record:
schema:
fields: 1
- (column1 + 1): type=float64, nullable
rows: 3
col[0][(column1 + 1)]: [2 3 4]
```
## C {#clients:c}
### Overview {#docs:stable:clients:c:overview}
> The latest stable version of the DuckDB C API is 1.4.1.
DuckDB implements a custom C API modeled loosely after the SQLite C API. The API is contained in the `duckdb.h` header. Continue to [Startup & Shutdown](#docs:stable:clients:c:connect) to get started, or check out the [Full API overview](#docs:stable:clients:c:api).
We also provide a SQLite API wrapper, which means that if your application is programmed against the SQLite C API, you can re-link it to DuckDB and it should continue working. See the [`sqlite_api_wrapper`](https://github.com/duckdb/duckdb/tree/main/tools/sqlite3_api_wrapper) folder in our source repository for more information.
#### Installation {#docs:stable:clients:c:overview::installation}
The DuckDB C API can be installed as part of the `libduckdb` packages. Please see the [installation page](https://duckdb.org/install) for details.
### Startup & Shutdown {#docs:stable:clients:c:connect}
To use DuckDB, you must first initialize a `duckdb_database` handle using `duckdb_open()`. `duckdb_open()` takes as a parameter the database file to read and write from. The special value `NULL` (`nullptr`) can be used to create an **in-memory database**. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process).
With the `duckdb_database` handle, you can create one or many `duckdb_connection` using `duckdb_connect()`. While individual connections are thread-safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection to allow for the best parallel performance.
All `duckdb_connection`s have to be explicitly disconnected with `duckdb_disconnect()` and the `duckdb_database` has to be explicitly closed with `duckdb_close()` to avoid leaking memory and file handles.
#### Example {#docs:stable:clients:c:connect::example}
```c
duckdb_database db;
duckdb_connection con;
if (duckdb_open(NULL, &db) == DuckDBError) {
// handle error
}
if (duckdb_connect(db, &con) == DuckDBError) {
// handle error
}
// run queries...
// cleanup
duckdb_disconnect(&con);
duckdb_close(&db);
```
#### API Reference Overview {#docs:stable:clients:c:connect::api-reference-overview}
```c
duckdb_instance_cache duckdb_create_instance_cache();
duckdb_state duckdb_get_or_create_from_cache(duckdb_instance_cache instance_cache, const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_destroy_instance_cache(duckdb_instance_cache *instance_cache);
duckdb_state duckdb_open(const char *path, duckdb_database *out_database);
duckdb_state duckdb_open_ext(const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_close(duckdb_database *database);
duckdb_state duckdb_connect(duckdb_database database, duckdb_connection *out_connection);
void duckdb_interrupt(duckdb_connection connection);
duckdb_query_progress_type duckdb_query_progress(duckdb_connection connection);
void duckdb_disconnect(duckdb_connection *connection);
void duckdb_connection_get_client_context(duckdb_connection connection, duckdb_client_context *out_context);
void duckdb_connection_get_arrow_options(duckdb_connection connection, duckdb_arrow_options *out_arrow_options);
idx_t duckdb_client_context_get_connection_id(duckdb_client_context context);
void duckdb_destroy_client_context(duckdb_client_context *context);
void duckdb_destroy_arrow_options(duckdb_arrow_options *arrow_options);
const char *duckdb_library_version();
duckdb_value duckdb_get_table_names(duckdb_connection connection, const char *query, bool qualified);
```
###### `duckdb_create_instance_cache` {#docs:stable:clients:c:connect::duckdb_create_instance_cache}
Creates a new database instance cache.
The instance cache is necessary if a client/program (re)opens multiple databases to the same file within the same
process. Must be destroyed with 'duckdb_destroy_instance_cache'.
####### Return Value {#docs:stable:clients:c:connect::return-value}
The database instance cache.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
duckdb_instance_cache duckdb_create_instance_cache(
);
```
###### `duckdb_get_or_create_from_cache` {#docs:stable:clients:c:connect::duckdb_get_or_create_from_cache}
Creates a new database instance in the instance cache, or retrieves an existing database instance.
Must be closed with 'duckdb_close'.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
duckdb_state duckdb_get_or_create_from_cache(
duckdb_instance_cache instance_cache,
const char *path,
duckdb_database *out_database,
duckdb_config config,
char **out_error
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `instance_cache`: The instance cache in which to create the database, or from which to take the database.
* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open or retrieve an in-memory database.
* `out_database`: The resulting cached database.
* `config`: (Optional) configuration used to create the database.
* `out_error`: If set and the function returns `DuckDBError`, this contains the error message.
Note that the error message must be freed using `duckdb_free`.
####### Return Value {#docs:stable:clients:c:connect::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
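As an illustration, the following is a minimal sketch of opening (or reusing) a database through the instance cache; the file name `my_database.duckdb` is only an example and most error handling is omitted:
```c
duckdb_instance_cache cache = duckdb_create_instance_cache();
duckdb_database db;
char *error = NULL;
// open my_database.duckdb, or reuse the instance if it is already open in this process
if (duckdb_get_or_create_from_cache(cache, "my_database.duckdb", &db, NULL, &error) == DuckDBError) {
    // handle error; the error message must be freed with duckdb_free
    duckdb_free(error);
}
// run queries...
duckdb_close(&db);
duckdb_destroy_instance_cache(&cache);
```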
###### `duckdb_destroy_instance_cache` {#docs:stable:clients:c:connect::duckdb_destroy_instance_cache}
Destroys an existing database instance cache and de-allocates its memory.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
void duckdb_destroy_instance_cache(
duckdb_instance_cache *instance_cache
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `instance_cache`: The instance cache to destroy.
###### `duckdb_open` {#docs:stable:clients:c:connect::duckdb_open}
Creates a new database or opens an existing database file stored at the given path.
If no path is given a new in-memory database is created instead.
The database must be closed with 'duckdb_close'.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
duckdb_state duckdb_open(
const char *path,
duckdb_database *out_database
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open an in-memory database.
* `out_database`: The result database object.
####### Return Value {#docs:stable:clients:c:connect::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_open_ext` {#docs:stable:clients:c:connect::duckdb_open_ext}
Extended version of duckdb_open. Creates a new database or opens an existing database file stored at the given path.
The database must be closed with 'duckdb_close'.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
duckdb_state duckdb_open_ext(
const char *path,
duckdb_database *out_database,
duckdb_config config,
char **out_error
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open an in-memory database.
* `out_database`: The result database object.
* `config`: (Optional) configuration used to start up the database.
* `out_error`: If set and the function returns `DuckDBError`, this contains the error message.
Note that the error message must be freed using `duckdb_free`.
####### Return Value {#docs:stable:clients:c:connect::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_close` {#docs:stable:clients:c:connect::duckdb_close}
Closes the specified database and de-allocates all memory allocated for that database.
This should be called after you are done with any database allocated through `duckdb_open` or `duckdb_open_ext`.
Note that failing to call `duckdb_close` (in case of e.g., a program crash) will not cause data corruption.
Still, it is recommended to always correctly close a database object after you are done with it.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
void duckdb_close(
duckdb_database *database
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `database`: The database object to shut down.
###### `duckdb_connect` {#docs:stable:clients:c:connect::duckdb_connect}
Opens a connection to a database. Connections are required to query the database, and store transactional state
associated with the connection.
The instantiated connection should be closed using 'duckdb_disconnect'.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
duckdb_state duckdb_connect(
duckdb_database database,
duckdb_connection *out_connection
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `database`: The database file to connect to.
* `out_connection`: The result connection object.
####### Return Value {#docs:stable:clients:c:connect::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_interrupt` {#docs:stable:clients:c:connect::duckdb_interrupt}
Interrupts the currently running query.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
void duckdb_interrupt(
duckdb_connection connection
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `connection`: The connection to interrupt
###### `duckdb_query_progress` {#docs:stable:clients:c:connect::duckdb_query_progress}
Gets the progress of the running query.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
duckdb_query_progress_type duckdb_query_progress(
duckdb_connection connection
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `connection`: The working connection
####### Return Value {#docs:stable:clients:c:connect::return-value}
-1 if no progress or a percentage of the progress
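For illustration, here is a minimal sketch of polling the progress of a query running on connection `con` from another thread; it assumes the `percentage` field of `duckdb_query_progress_type`:
```c
// Minimal sketch: poll the progress of a query that is executing on `con` in another thread.
duckdb_query_progress_type progress = duckdb_query_progress(con);
if (progress.percentage < 0) {
    // no progress information is available
    printf("no progress available\n");
} else {
    printf("query progress: %.1f%%\n", progress.percentage);
}
```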
###### `duckdb_disconnect` {#docs:stable:clients:c:connect::duckdb_disconnect}
Closes the specified connection and de-allocates all memory allocated for that connection.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
void duckdb_disconnect(
duckdb_connection *connection
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `connection`: The connection to close.
###### `duckdb_connection_get_client_context` {#docs:stable:clients:c:connect::duckdb_connection_get_client_context}
Retrieves the client context of the connection.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
void duckdb_connection_get_client_context(
duckdb_connection connection,
duckdb_client_context *out_context
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `connection`: The connection.
* `out_context`: The client context of the connection. Must be destroyed with `duckdb_destroy_client_context`.
###### `duckdb_connection_get_arrow_options` {#docs:stable:clients:c:connect::duckdb_connection_get_arrow_options}
Retrieves the arrow options of the connection.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
void duckdb_connection_get_arrow_options(
duckdb_connection connection,
duckdb_arrow_options *out_arrow_options
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `connection`: The connection.
###### `duckdb_client_context_get_connection_id` {#docs:stable:clients:c:connect::duckdb_client_context_get_connection_id}
Returns the connection id of the client context.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
idx_t duckdb_client_context_get_connection_id(
duckdb_client_context context
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `context`: The client context.
####### Return Value {#docs:stable:clients:c:connect::return-value}
The connection id of the client context.
###### `duckdb_destroy_client_context` {#docs:stable:clients:c:connect::duckdb_destroy_client_context}
Destroys the client context and deallocates its memory.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
void duckdb_destroy_client_context(
duckdb_client_context *context
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `context`: The client context to destroy.
###### `duckdb_destroy_arrow_options` {#docs:stable:clients:c:connect::duckdb_destroy_arrow_options}
Destroys the arrow options and deallocates its memory.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
void duckdb_destroy_arrow_options(
duckdb_arrow_options *arrow_options
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `arrow_options`: The arrow options to destroy.
###### `duckdb_library_version` {#docs:stable:clients:c:connect::duckdb_library_version}
Returns the version of the linked DuckDB, with a version postfix for dev versions
Usually used for developing C extensions that must return this for a compatibility check.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
const char *duckdb_library_version(
);
```
###### `duckdb_get_table_names` {#docs:stable:clients:c:connect::duckdb_get_table_names}
Get the list of (fully qualified) table names of the query.
####### Syntax {#docs:stable:clients:c:connect::syntax}
```c
duckdb_value duckdb_get_table_names(
duckdb_connection connection,
const char *query,
bool qualified
);
```
####### Parameters {#docs:stable:clients:c:connect::parameters}
* `connection`: The connection for which to get the table names.
* `query`: The query for which to get the table names.
* `qualified`: If set to true, returns fully qualified table names (`catalog.schema.table`); otherwise, returns only the (unescaped) table names.
####### Return Value {#docs:stable:clients:c:connect::return-value}
A duckdb_value of type VARCHAR[] containing the (fully qualified) table names of the query. Must be destroyed
with duckdb_destroy_value.
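As an illustration, a minimal sketch that extracts the individual names from the returned `VARCHAR[]` value; it assumes the `duckdb_get_list_size`, `duckdb_get_list_child`, and `duckdb_get_varchar` value helpers, and the queried tables are hypothetical:
```c
// Minimal sketch: print the fully qualified table names referenced by a query.
duckdb_value names = duckdb_get_table_names(con, "SELECT * FROM my_table JOIN other_table USING (id)", true);
idx_t count = duckdb_get_list_size(names);
for (idx_t i = 0; i < count; i++) {
    duckdb_value name = duckdb_get_list_child(names, i);
    char *name_str = duckdb_get_varchar(name);
    printf("%s\n", name_str);
    duckdb_free(name_str);
    duckdb_destroy_value(&name);
}
duckdb_destroy_value(&names);
```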
### Configuration {#docs:stable:clients:c:config}
Configuration options can be provided to change different settings of the database system. Note that many of these
settings can be changed later on using [`PRAGMA` statements](#..:..:configuration:pragmas) as well. The configuration object
should be created, filled with values and passed to `duckdb_open_ext`.
#### Example {#docs:stable:clients:c:config::example}
```c
duckdb_database db;
duckdb_config config;
// create the configuration object
if (duckdb_create_config(&config) == DuckDBError) {
// handle error
}
// set some configuration options
duckdb_set_config(config, "access_mode", "READ_WRITE"); // or READ_ONLY
duckdb_set_config(config, "threads", "8");
duckdb_set_config(config, "max_memory", "8GB");
duckdb_set_config(config, "default_order", "DESC");
// open the database using the configuration
if (duckdb_open_ext(NULL, &db, config, NULL) == DuckDBError) {
// handle error
}
// cleanup the configuration object
duckdb_destroy_config(&config);
// run queries...
// cleanup
duckdb_close(&db);
```
#### API Reference Overview {#docs:stable:clients:c:config::api-reference-overview}
```c
duckdb_state duckdb_create_config(duckdb_config *out_config);
size_t duckdb_config_count();
duckdb_state duckdb_get_config_flag(size_t index, const char **out_name, const char **out_description);
duckdb_state duckdb_set_config(duckdb_config config, const char *name, const char *option);
void duckdb_destroy_config(duckdb_config *config);
```
###### `duckdb_create_config` {#docs:stable:clients:c:config::duckdb_create_config}
Initializes an empty configuration object that can be used to provide start-up options for the DuckDB instance
through `duckdb_open_ext`.
The duckdb_config must be destroyed using 'duckdb_destroy_config'
This will always succeed unless there is a malloc failure.
Note that `duckdb_destroy_config` should always be called on the resulting config, even if the function returns
`DuckDBError`.
####### Syntax {#docs:stable:clients:c:config::syntax}
```c
duckdb_state duckdb_create_config(
duckdb_config *out_config
);
```
####### Parameters {#docs:stable:clients:c:config::parameters}
* `out_config`: The result configuration object.
####### Return Value {#docs:stable:clients:c:config::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_config_count` {#docs:stable:clients:c:config::duckdb_config_count}
This returns the total amount of configuration options available for usage with `duckdb_get_config_flag`.
This should not be called in a loop as it internally loops over all the options.
####### Return Value {#docs:stable:clients:c:config::return-value}
The amount of config options available.
####### Syntax {#docs:stable:clients:c:config::syntax}
```c
size_t duckdb_config_count(
);
```
###### `duckdb_get_config_flag` {#docs:stable:clients:c:config::duckdb_get_config_flag}
Obtains a human-readable name and description of a specific configuration option. This can be used to e.g.
display configuration options. This will succeed unless `index` is out of range (i.e., `>= duckdb_config_count`).
The result name or description MUST NOT be freed.
####### Syntax {#docs:stable:clients:c:config::syntax}
```c
duckdb_state duckdb_get_config_flag(
size_t index,
const char **out_name,
const char **out_description
);
```
####### Parameters {#docs:stable:clients:c:config::parameters}
* `index`: The index of the configuration option (between 0 and `duckdb_config_count`)
* `out_name`: A name of the configuration flag.
* `out_description`: A description of the configuration flag.
####### Return Value {#docs:stable:clients:c:config::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
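Together, `duckdb_config_count` and `duckdb_get_config_flag` can be used to enumerate all available options. A minimal sketch:
```c
// Minimal sketch: print the name and description of every available configuration option.
size_t option_count = duckdb_config_count();
for (size_t index = 0; index < option_count; index++) {
    const char *name;
    const char *description;
    if (duckdb_get_config_flag(index, &name, &description) == DuckDBSuccess) {
        printf("%s: %s\n", name, description);
    }
}
```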
###### `duckdb_set_config` {#docs:stable:clients:c:config::duckdb_set_config}
Sets the specified option for the specified configuration. The configuration option is indicated by name.
To obtain a list of config options, see `duckdb_get_config_flag`.
In the source code, configuration options are defined in `config.cpp`.
This can fail if either the name is invalid, or if the value provided for the option is invalid.
####### Syntax {#docs:stable:clients:c:config::syntax}
```c
duckdb_state duckdb_set_config(
duckdb_config config,
const char *name,
const char *option
);
```
####### Parameters {#docs:stable:clients:c:config::parameters}
* `config`: The configuration object to set the option on.
* `name`: The name of the configuration flag to set.
* `option`: The value to set the configuration flag to.
####### Return Value {#docs:stable:clients:c:config::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_destroy_config` {#docs:stable:clients:c:config::duckdb_destroy_config}
Destroys the specified configuration object and de-allocates all memory allocated for the object.
####### Syntax {#docs:stable:clients:c:config::syntax}
```c
void duckdb_destroy_config(
duckdb_config *config
);
```
####### Parameters {#docs:stable:clients:c:config::parameters}
* `config`: The configuration object to destroy.
### Query {#docs:stable:clients:c:query}
The `duckdb_query` method allows SQL queries to be run in DuckDB from C. This method takes two parameters, a (null-terminated) SQL query string and a `duckdb_result` result pointer. The result pointer may be `NULL` if the application is not interested in the result set or if the query produces no result. After the result is consumed, the `duckdb_destroy_result` method should be used to clean up the result.
Elements can be extracted from the `duckdb_result` object using a variety of methods. The `duckdb_column_count` can be used to extract the number of columns. `duckdb_column_name` and `duckdb_column_type` can be used to extract the names and types of individual columns.
#### Example {#docs:stable:clients:c:query::example}
```c
duckdb_state state;
duckdb_result result;
// create a table
state = duckdb_query(con, "CREATE TABLE integers (i INTEGER, j INTEGER);", NULL);
if (state == DuckDBError) {
// handle error
}
// insert three rows into the table
state = duckdb_query(con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);", NULL);
if (state == DuckDBError) {
// handle error
}
// query rows again
state = duckdb_query(con, "SELECT * FROM integers", &result);
if (state == DuckDBError) {
// handle error
}
// handle the result
// ...
// destroy the result after we are done with it
duckdb_destroy_result(&result);
```
#### Value Extraction {#docs:stable:clients:c:query::value-extraction}
Values can be extracted using either the `duckdb_fetch_chunk` function, or using the `duckdb_value` convenience functions. The `duckdb_fetch_chunk` function directly hands you data chunks in DuckDB's native array format and can therefore be very fast. The `duckdb_value` functions perform bounds- and type-checking, and will automatically cast values to the desired type. This makes them more convenient and easier to use, at the expense of being slower.
See the [Types](#docs:stable:clients:c:types) page for more information.
> For optimal performance, use `duckdb_fetch_chunk` to extract data from the query result.
> The `duckdb_value` functions perform internal type-checking, bounds-checking and casting which makes them slower.
##### `duckdb_fetch_chunk` {#docs:stable:clients:c:query::duckdb_fetch_chunk}
Below is an end-to-end example that prints the above result to CSV format using the `duckdb_fetch_chunk` function.
Note that the function is NOT generic: we do need to know exactly what the types of the result columns are.
```c
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "CREATE TABLE integers (i INTEGER, j INTEGER);", NULL);
duckdb_query(con, "INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL);", NULL);
duckdb_query(con, "SELECT * FROM integers;", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
// get the number of rows from the data chunk
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the first column
duckdb_vector col1 = duckdb_data_chunk_get_vector(result, 0);
int32_t *col1_data = (int32_t *) duckdb_vector_get_data(col1);
uint64_t *col1_validity = duckdb_vector_get_validity(col1);
// get the second column
duckdb_vector col2 = duckdb_data_chunk_get_vector(result, 1);
int32_t *col2_data = (int32_t *) duckdb_vector_get_data(col2);
uint64_t *col2_validity = duckdb_vector_get_validity(col2);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (duckdb_validity_row_is_valid(col1_validity, row)) {
printf("%d", col1_data[row]);
} else {
printf("NULL");
}
printf(",");
if (duckdb_validity_row_is_valid(col2_validity, row)) {
printf("%d", col2_data[row]);
} else {
printf("NULL");
}
printf("\n");
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
```
This prints the following result:
```csv
3,4
5,6
7,NULL
```
##### `duckdb_value` {#docs:stable:clients:c:query::duckdb_value}
> **Deprecated.** The `duckdb_value` functions are deprecated and are scheduled for removal in a future release.
Below is an example that prints the above result to CSV format using the `duckdb_value_varchar` function.
Note that the function is generic: we do not need to know about the types of the individual result columns.
```c
// print the above result to CSV format using `duckdb_value_varchar`
idx_t row_count = duckdb_row_count(&result);
idx_t column_count = duckdb_column_count(&result);
for (idx_t row = 0; row < row_count; row++) {
for (idx_t col = 0; col < column_count; col++) {
if (col > 0) printf(",");
auto str_val = duckdb_value_varchar(&result, col, row);
printf("%s", str_val);
duckdb_free(str_val);
}
printf("\n");
}
```
#### API Reference Overview {#docs:stable:clients:c:query::api-reference-overview}
```c
duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result);
void duckdb_destroy_result(duckdb_result *result);
const char *duckdb_column_name(duckdb_result *result, idx_t col);
duckdb_type duckdb_column_type(duckdb_result *result, idx_t col);
duckdb_statement_type duckdb_result_statement_type(duckdb_result result);
duckdb_logical_type duckdb_column_logical_type(duckdb_result *result, idx_t col);
duckdb_arrow_options duckdb_result_get_arrow_options(duckdb_result *result);
idx_t duckdb_column_count(duckdb_result *result);
idx_t duckdb_row_count(duckdb_result *result);
idx_t duckdb_rows_changed(duckdb_result *result);
void *duckdb_column_data(duckdb_result *result, idx_t col);
bool *duckdb_nullmask_data(duckdb_result *result, idx_t col);
const char *duckdb_result_error(duckdb_result *result);
duckdb_error_type duckdb_result_error_type(duckdb_result *result);
```
###### `duckdb_query` {#docs:stable:clients:c:query::duckdb_query}
Executes a SQL query within a connection and stores the full (materialized) result in the out_result pointer.
If the query fails to execute, DuckDBError is returned and the error message can be retrieved by calling
`duckdb_result_error`.
Note that after running `duckdb_query`, `duckdb_destroy_result` must be called on the result object even if the
query fails, otherwise the error stored within the result will not be freed correctly.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
duckdb_state duckdb_query(
duckdb_connection connection,
const char *query,
duckdb_result *out_result
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `connection`: The connection to perform the query in.
* `query`: The SQL query to run.
* `out_result`: The query result.
####### Return Value {#docs:stable:clients:c:query::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_destroy_result` {#docs:stable:clients:c:query::duckdb_destroy_result}
Closes the result and de-allocates all memory allocated for that result.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
void duckdb_destroy_result(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result to destroy.
###### `duckdb_column_name` {#docs:stable:clients:c:query::duckdb_column_name}
Returns the column name of the specified column. The result should not need to be freed; the column names will
automatically be destroyed when the result is destroyed.
Returns `NULL` if the column is out of range.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
const char *duckdb_column_name(
duckdb_result *result,
idx_t col
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object to fetch the column name from.
* `col`: The column index.
####### Return Value {#docs:stable:clients:c:query::return-value}
The column name of the specified column.
###### `duckdb_column_type` {#docs:stable:clients:c:query::duckdb_column_type}
Returns the column type of the specified column.
Returns `DUCKDB_TYPE_INVALID` if the column is out of range.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
duckdb_type duckdb_column_type(
duckdb_result *result,
idx_t col
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object to fetch the column type from.
* `col`: The column index.
####### Return Value {#docs:stable:clients:c:query::return-value}
The column type of the specified column.
###### `duckdb_result_statement_type` {#docs:stable:clients:c:query::duckdb_result_statement_type}
Returns the statement type of the statement that was executed
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
duckdb_statement_type duckdb_result_statement_type(
duckdb_result result
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object to fetch the statement type from.
####### Return Value {#docs:stable:clients:c:query::return-value}
duckdb_statement_type value or DUCKDB_STATEMENT_TYPE_INVALID
###### `duckdb_column_logical_type` {#docs:stable:clients:c:query::duckdb_column_logical_type}
Returns the logical column type of the specified column.
The return type of this call should be destroyed with `duckdb_destroy_logical_type`.
Returns `NULL` if the column is out of range.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
duckdb_logical_type duckdb_column_logical_type(
duckdb_result *result,
idx_t col
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object to fetch the column type from.
* `col`: The column index.
####### Return Value {#docs:stable:clients:c:query::return-value}
The logical column type of the specified column.
###### `duckdb_result_get_arrow_options` {#docs:stable:clients:c:query::duckdb_result_get_arrow_options}
Returns the arrow options associated with the given result. These options define how the arrow arrays/schema should be produced.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
duckdb_arrow_options duckdb_result_get_arrow_options(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object to fetch arrow options from.
####### Return Value {#docs:stable:clients:c:query::return-value}
The arrow options associated with the given result. This must be destroyed with
`duckdb_destroy_arrow_options`.
###### `duckdb_column_count` {#docs:stable:clients:c:query::duckdb_column_count}
Returns the number of columns present in the result object.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
idx_t duckdb_column_count(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object.
####### Return Value {#docs:stable:clients:c:query::return-value}
The number of columns present in the result object.
###### `duckdb_row_count` {#docs:stable:clients:c:query::duckdb_row_count}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Returns the number of rows present in the result object.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
idx_t duckdb_row_count(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object.
####### Return Value {#docs:stable:clients:c:query::return-value}
The number of rows present in the result object.
###### `duckdb_rows_changed` {#docs:stable:clients:c:query::duckdb_rows_changed}
Returns the number of rows changed by the query stored in the result. This is relevant only for INSERT/UPDATE/DELETE
queries. For other queries the rows_changed will be 0.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
idx_t duckdb_rows_changed(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object.
####### Return Value {#docs:stable:clients:c:query::return-value}
The number of rows changed.
###### `duckdb_column_data` {#docs:stable:clients:c:query::duckdb_column_data}
> **Deprecated.** This method has been deprecated. Prefer using `duckdb_result_get_chunk` instead.
Returns the data of a specific column of a result in columnar format.
The function returns a dense array which contains the result data. The exact type stored in the array depends on the
corresponding duckdb_type (as provided by `duckdb_column_type`). For the exact type by which the data should be
accessed, see the comments in [the types section](#types) or the `DUCKDB_TYPE` enum.
For example, for a column of type `DUCKDB_TYPE_INTEGER`, rows can be accessed in the following manner:
```c
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
printf("Data for row %d: %d\n", row, data[row]);
```
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
void *duckdb_column_data(
duckdb_result *result,
idx_t col
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object to fetch the column data from.
* `col`: The column index.
####### Return Value {#docs:stable:clients:c:query::return-value}
The column data of the specified column.
###### `duckdb_nullmask_data` {#docs:stable:clients:c:query::duckdb_nullmask_data}
> **Deprecated.** This method has been deprecated. Prefer using `duckdb_result_get_chunk` instead.
Returns the nullmask of a specific column of a result in columnar format. The nullmask indicates for every row
whether or not the corresponding row is `NULL`. If a row is `NULL`, the values present in the array provided
by `duckdb_column_data` are undefined.
```c
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
bool *nullmask = duckdb_nullmask_data(&result, 0);
if (nullmask[row]) {
printf("Data for row %d: NULL\n", row);
} else {
printf("Data for row %d: %d\n", row, data[row]);
}
```
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
bool *duckdb_nullmask_data(
duckdb_result *result,
idx_t col
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object to fetch the nullmask from.
* `col`: The column index.
####### Return Value {#docs:stable:clients:c:query::return-value}
The nullmask of the specified column.
###### `duckdb_result_error` {#docs:stable:clients:c:query::duckdb_result_error}
Returns the error message contained within the result. The error is only set if `duckdb_query` returns `DuckDBError`.
The result of this function must not be freed. It will be cleaned up when `duckdb_destroy_result` is called.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
const char *duckdb_result_error(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object to fetch the error from.
####### Return Value {#docs:stable:clients:c:query::return-value}
The error of the result.
###### `duckdb_result_error_type` {#docs:stable:clients:c:query::duckdb_result_error_type}
Returns the result error type contained within the result. The error is only set if `duckdb_query` returns
`DuckDBError`.
####### Syntax {#docs:stable:clients:c:query::syntax}
```c
duckdb_error_type duckdb_result_error_type(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:query::parameters}
* `result`: The result object to fetch the error from.
####### Return Value {#docs:stable:clients:c:query::return-value}
The error type of the result.
### Data Chunks {#docs:stable:clients:c:data_chunk}
Data chunks represent a horizontal slice of a table. They hold a number of [vectors](#docs:stable:clients:c:vector) that can each hold up to `VECTOR_SIZE` rows. The vector size can be obtained through the `duckdb_vector_size` function and is configurable, but is usually set to `2048`.
Data chunks and vectors are what DuckDB uses natively to store and represent data. For this reason, the data chunk interface is the most efficient way of interfacing with DuckDB. Be aware, however, that correctly interfacing with DuckDB using the data chunk API does require knowledge of DuckDB's internal vector format.
Data chunks can be used in two manners:
* **Reading Data**: Data chunks can be obtained from query results using the `duckdb_fetch_chunk` method, or as input to a user-defined function. In this case, the [vector methods](#docs:stable:clients:c:vector) can be used to read individual values.
* **Writing Data**: Data chunks can be created using `duckdb_create_data_chunk`. The data chunk can then be filled with values and used in `duckdb_append_data_chunk` to write data to the database.
The primary manner of interfacing with data chunks is by obtaining the internal vectors of the data chunk using the `duckdb_data_chunk_get_vector` method. Afterwards, the [vector methods](#docs:stable:clients:c:vector) can be used to read from or write to the individual vectors.
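To make the writing path concrete, below is a minimal sketch that creates a data chunk with a single `INTEGER` column, fills it with three values, and appends it via the appender API; the connection `con` and the table `integers (i INTEGER)` are assumptions of this example:
```c
// Minimal sketch: fill a data chunk and append it to an existing table.
// Assumes a connection `con` and a table created with: CREATE TABLE integers (i INTEGER);
duckdb_logical_type types[1] = {duckdb_create_logical_type(DUCKDB_TYPE_INTEGER)};
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 1);
// write three values into the first (and only) vector
duckdb_vector vec = duckdb_data_chunk_get_vector(chunk, 0);
int32_t *data = (int32_t *) duckdb_vector_get_data(vec);
data[0] = 1;
data[1] = 2;
data[2] = 3;
duckdb_data_chunk_set_size(chunk, 3);
// append the chunk to the table
duckdb_appender appender;
duckdb_appender_create(con, NULL, "integers", &appender);
duckdb_append_data_chunk(appender, chunk);
duckdb_appender_destroy(&appender);
// clean up
duckdb_destroy_data_chunk(&chunk);
duckdb_destroy_logical_type(&types[0]);
```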
#### API Reference Overview {#docs:stable:clients:c:data_chunk::api-reference-overview}
```c
duckdb_data_chunk duckdb_create_data_chunk(duckdb_logical_type *types, idx_t column_count);
void duckdb_destroy_data_chunk(duckdb_data_chunk *chunk);
void duckdb_data_chunk_reset(duckdb_data_chunk chunk);
idx_t duckdb_data_chunk_get_column_count(duckdb_data_chunk chunk);
duckdb_vector duckdb_data_chunk_get_vector(duckdb_data_chunk chunk, idx_t col_idx);
idx_t duckdb_data_chunk_get_size(duckdb_data_chunk chunk);
void duckdb_data_chunk_set_size(duckdb_data_chunk chunk, idx_t size);
```
###### `duckdb_create_data_chunk` {#docs:stable:clients:c:data_chunk::duckdb_create_data_chunk}
Creates an empty data chunk with the specified column types.
The result must be destroyed with `duckdb_destroy_data_chunk`.
####### Syntax {#docs:stable:clients:c:data_chunk::syntax}
```c
duckdb_data_chunk duckdb_create_data_chunk(
duckdb_logical_type *types,
idx_t column_count
);
```
####### Parameters {#docs:stable:clients:c:data_chunk::parameters}
* `types`: An array of column types. Column types can not contain ANY and INVALID types.
* `column_count`: The number of columns.
####### Return Value {#docs:stable:clients:c:data_chunk::return-value}
The data chunk.
###### `duckdb_destroy_data_chunk` {#docs:stable:clients:c:data_chunk::duckdb_destroy_data_chunk}
Destroys the data chunk and de-allocates all memory allocated for that chunk.
####### Syntax {#docs:stable:clients:c:data_chunk::syntax}
```c
void duckdb_destroy_data_chunk(
duckdb_data_chunk *chunk
);
```
####### Parameters {#docs:stable:clients:c:data_chunk::parameters}
* `chunk`: The data chunk to destroy.
###### `duckdb_data_chunk_reset` {#docs:stable:clients:c:data_chunk::duckdb_data_chunk_reset}
Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0.
After calling this method, you must call `duckdb_vector_get_validity` and `duckdb_vector_get_data` to obtain current
data and validity pointers
####### Syntax {#docs:stable:clients:c:data_chunk::syntax}
```c
void duckdb_data_chunk_reset(
duckdb_data_chunk chunk
);
```
####### Parameters {#docs:stable:clients:c:data_chunk::parameters}
* `chunk`: The data chunk to reset.
###### `duckdb_data_chunk_get_column_count` {#docs:stable:clients:c:data_chunk::duckdb_data_chunk_get_column_count}
Retrieves the number of columns in a data chunk.
####### Syntax {#docs:stable:clients:c:data_chunk::syntax}
```c
idx_t duckdb_data_chunk_get_column_count(
duckdb_data_chunk chunk
);
```
####### Parameters {#docs:stable:clients:c:data_chunk::parameters}
* `chunk`: The data chunk to get the data from
####### Return Value {#docs:stable:clients:c:data_chunk::return-value}
The number of columns in the data chunk
###### `duckdb_data_chunk_get_vector` {#docs:stable:clients:c:data_chunk::duckdb_data_chunk_get_vector}
Retrieves the vector at the specified column index in the data chunk.
The pointer to the vector is valid for as long as the chunk is alive.
It does NOT need to be destroyed.
####### Syntax {#docs:stable:clients:c:data_chunk::syntax}
```c
duckdb_vector duckdb_data_chunk_get_vector(
duckdb_data_chunk chunk,
idx_t col_idx
);
```
####### Parameters {#docs:stable:clients:c:data_chunk::parameters}
* `chunk`: The data chunk to get the data from
####### Return Value {#docs:stable:clients:c:data_chunk::return-value}
The vector
###### `duckdb_data_chunk_get_size` {#docs:stable:clients:c:data_chunk::duckdb_data_chunk_get_size}
Retrieves the current number of tuples in a data chunk.
####### Syntax {#docs:stable:clients:c:data_chunk::syntax}
```c
idx_t duckdb_data_chunk_get_size(
duckdb_data_chunk chunk
);
```
####### Parameters {#docs:stable:clients:c:data_chunk::parameters}
* `chunk`: The data chunk to get the data from
####### Return Value {#docs:stable:clients:c:data_chunk::return-value}
The number of tuples in the data chunk
###### `duckdb_data_chunk_set_size` {#docs:stable:clients:c:data_chunk::duckdb_data_chunk_set_size}
Sets the current number of tuples in a data chunk.
####### Syntax {#docs:stable:clients:c:data_chunk::syntax}
```c
void duckdb_data_chunk_set_size(
duckdb_data_chunk chunk,
idx_t size
);
```
####### Parameters {#docs:stable:clients:c:data_chunk::parameters}
* `chunk`: The data chunk to set the size in
* `size`: The number of tuples in the data chunk
### Vectors {#docs:stable:clients:c:vector}
Vectors represent a horizontal slice of a column. They hold a number of values of a specific type, similar to an array. Vectors are the core data representation used in DuckDB. Vectors are typically stored within [data chunks](#docs:stable:clients:c:data_chunk).
The vector and data chunk interfaces are the most efficient way of interacting with DuckDB, allowing for the highest performance. However, the interfaces are also difficult to use and care must be taken when using them.
#### Vector Format {#docs:stable:clients:c:vector::vector-format}
Vectors are arrays of a specific data type. The logical type of a vector can be obtained using `duckdb_vector_get_column_type`. The type id of the logical type can then be obtained using `duckdb_get_type_id`.
Vectors themselves do not have sizes. Instead, the parent data chunk has a size (that can be obtained through `duckdb_data_chunk_get_size`). All vectors that belong to a data chunk have the same size.
##### Primitive Types {#docs:stable:clients:c:vector::primitive-types}
For primitive types, the underlying array can be obtained using the `duckdb_vector_get_data` method. The array can then be accessed using the correct native type. Below is a table that contains a mapping of the `duckdb_type` to the native type of the array.
| duckdb_type | NativeType |
|--------------------------|------------------|
| DUCKDB_TYPE_BOOLEAN | bool |
| DUCKDB_TYPE_TINYINT | int8_t |
| DUCKDB_TYPE_SMALLINT | int16_t |
| DUCKDB_TYPE_INTEGER | int32_t |
| DUCKDB_TYPE_BIGINT | int64_t |
| DUCKDB_TYPE_UTINYINT | uint8_t |
| DUCKDB_TYPE_USMALLINT | uint16_t |
| DUCKDB_TYPE_UINTEGER | uint32_t |
| DUCKDB_TYPE_UBIGINT | uint64_t |
| DUCKDB_TYPE_FLOAT | float |
| DUCKDB_TYPE_DOUBLE | double |
| DUCKDB_TYPE_TIMESTAMP | duckdb_timestamp |
| DUCKDB_TYPE_DATE | duckdb_date |
| DUCKDB_TYPE_TIME | duckdb_time |
| DUCKDB_TYPE_INTERVAL | duckdb_interval |
| DUCKDB_TYPE_HUGEINT | duckdb_hugeint |
| DUCKDB_TYPE_UHUGEINT | duckdb_uhugeint |
| DUCKDB_TYPE_VARCHAR | duckdb_string_t |
| DUCKDB_TYPE_BLOB | duckdb_string_t |
| DUCKDB_TYPE_TIMESTAMP_S | duckdb_timestamp |
| DUCKDB_TYPE_TIMESTAMP_MS | duckdb_timestamp |
| DUCKDB_TYPE_TIMESTAMP_NS | duckdb_timestamp |
| DUCKDB_TYPE_UUID | duckdb_hugeint |
| DUCKDB_TYPE_TIME_TZ | duckdb_time_tz |
| DUCKDB_TYPE_TIMESTAMP_TZ | duckdb_timestamp |
##### `NULL` Values {#docs:stable:clients:c:vector::null-values}
Any value in a vector can be `NULL`. When a value is `NULL`, the value contained within the primary array at that index is undefined (and can be uninitialized). The validity mask is a bitmask consisting of `uint64_t` elements. For every `64` values in the vector, one `uint64_t` element exists (rounded up). The validity mask has its bit set to 1 if the value is valid, or set to 0 if the value is invalid (i.e., `NULL`).
The bits of the bitmask can be read directly, or the slower helper method `duckdb_validity_row_is_valid` can be used to check whether or not a value is `NULL`.
The `duckdb_vector_get_validity` returns a pointer to the validity mask. Note that if all values in a vector are valid, this function **might** return `nullptr` in which case the validity mask does not need to be checked.
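For example, reading a validity bit directly (equivalent to what `duckdb_validity_row_is_valid` does) looks roughly as follows:
```c
// Minimal sketch: check the validity bit of `row` directly in the bitmask.
// A missing mask (nullptr) means that all values are valid.
bool row_is_valid(uint64_t *validity, idx_t row) {
    if (!validity) {
        return true;
    }
    uint64_t entry = validity[row / 64];
    return (entry >> (row % 64)) & 1;
}
```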
##### Strings {#docs:stable:clients:c:vector::strings}
String values are stored as a `duckdb_string_t`. This is a special struct that stores the string inline (if it is short, i.e., `<= 12 bytes`) or a pointer to the string data if it is longer than `12` bytes.
```c
typedef struct {
union {
struct {
uint32_t length;
char prefix[4];
char *ptr;
} pointer;
struct {
uint32_t length;
char inlined[12];
} inlined;
} value;
} duckdb_string_t;
```
The length can either be accessed directly, or the `duckdb_string_is_inlined` function can be used to check whether a string is inlined.
##### Decimals {#docs:stable:clients:c:vector::decimals}
Decimals are stored as integer values internally. The exact native type depends on the `width` of the decimal type, as shown in the following table:
| Width | NativeType |
|-------|----------------|
| <= 4 | int16_t |
| <= 9 | int32_t |
| <= 18 | int64_t |
| <= 38 | duckdb_hugeint |
The `duckdb_decimal_internal_type` can be used to obtain the internal type of the decimal.
Decimals are stored as integer values multiplied by `10^scale`. The scale of a decimal can be obtained using `duckdb_decimal_scale`. For example, a decimal value of `10.5` with type `DECIMAL(8, 3)` is stored internally as an `int32_t` value of `10500`. In order to obtain the correct decimal value, the value should be divided by the appropriate power-of-ten.
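As an illustration, here is a minimal sketch that converts a decimal value backed by `int32_t` (i.e., width between 5 and 9) to a `double` using the column's scale; the vector `vector` and row index `row` are assumed to come from a data chunk:
```c
// Minimal sketch: convert an int32_t-backed DECIMAL value to a double (requires <math.h> for pow).
duckdb_logical_type decimal_type = duckdb_vector_get_column_type(vector);
uint8_t scale = duckdb_decimal_scale(decimal_type);
int32_t *data = (int32_t *) duckdb_vector_get_data(vector);
double value = data[row] / pow(10, scale); // e.g., 10500 with scale 3 becomes 10.5
printf("%f\n", value);
duckdb_destroy_logical_type(&decimal_type);
```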
##### Enums {#docs:stable:clients:c:vector::enums}
Enums are stored as unsigned integer values internally. The exact native type depends on the size of the enum dictionary, as shown in the following table:
| Dictionary size | NativeType |
|-----------------|------------|
| <= 255 | uint8_t |
| <= 65535 | uint16_t |
| <= 4294967295 | uint32_t |
The `duckdb_enum_internal_type` can be used to obtain the internal type of the enum.
To obtain the actual string value of an enum entry, use the `duckdb_enum_dictionary_value` function, which returns the value corresponding to the given dictionary index. Note that the enum dictionary is the same for the entire column, and so only needs to be constructed once.
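Below is a minimal sketch of reading back the string value of an enum entry, assuming a `uint8_t`-backed enum vector (dictionary size <= 255) and a row index `row` from a data chunk:
```c
// Minimal sketch: look up the dictionary string for the enum entry at `row`.
duckdb_logical_type enum_type = duckdb_vector_get_column_type(vector);
uint8_t *entries = (uint8_t *) duckdb_vector_get_data(vector);
char *str_value = duckdb_enum_dictionary_value(enum_type, entries[row]);
printf("%s\n", str_value);
// the returned string must be freed, and the logical type destroyed
duckdb_free(str_value);
duckdb_destroy_logical_type(&enum_type);
```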
##### Structs {#docs:stable:clients:c:vector::structs}
Structs are nested types that contain any number of child types. Think of them like a `struct` in C. The way to access struct data using vectors is to access the child vectors recursively using the `duckdb_struct_vector_get_child` method.
The struct vector itself does not have any data (i.e., you should not use `duckdb_vector_get_data` method on the struct). **However**, the struct vector itself **does** have a validity mask. The reason for this is that the child elements of a struct can be `NULL`, but the struct **itself** can also be `NULL`.
##### Lists {#docs:stable:clients:c:vector::lists}
Lists are nested types that contain a single child type, repeated `x` times per row. Think of them like a variable-length array in C. The way to access list data using vectors is to access the child vector using the `duckdb_list_vector_get_child` method.
`duckdb_vector_get_data` must be used to get the offsets and lengths of the lists, stored as `duckdb_list_entry` structs, which can then be applied to the child vector.
```c
typedef struct {
uint64_t offset;
uint64_t length;
} duckdb_list_entry;
```
Note that both the list entries themselves **and** any children stored in the lists can be `NULL`. This must be checked using the validity mask again.
##### Arrays {#docs:stable:clients:c:vector::arrays}
Arrays are nested types that contain a single child type, repeated exactly `array_size` times per row. Think of them like a fixed-size array in C. Arrays work exactly the same as lists, **except** the length and offset of each entry is fixed. The fixed array size can be obtained by using `duckdb_array_type_array_size`. The data for entry `n` then resides at `offset = n * array_size`, and always has `length = array_size`.
Note that much like lists, arrays can still be `NULL`, which must be checked using the validity mask.
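To illustrate the fixed-offset formula, here is a minimal sketch that reads an array column; the element type `INTEGER`, the vector `array_col`, and the chunk size `row_count` are assumptions of this example, and validity checks are omitted for brevity:
```c
// Minimal sketch: read a fixed-size array column (e.g., INTEGER[3]) from `array_col`.
duckdb_logical_type array_type = duckdb_vector_get_column_type(array_col);
idx_t array_size = duckdb_array_type_array_size(array_type);
duckdb_vector child = duckdb_array_vector_get_child(array_col);
int32_t *child_data = (int32_t *) duckdb_vector_get_data(child);
for (idx_t row = 0; row < row_count; row++) {
    printf("[");
    for (idx_t i = 0; i < array_size; i++) {
        if (i > 0) {
            printf(", ");
        }
        // entry `row` starts at offset row * array_size and has length array_size
        printf("%d", child_data[row * array_size + i]);
    }
    printf("]\n");
}
duckdb_destroy_logical_type(&array_type);
```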
#### Examples {#docs:stable:clients:c:vector::examples}
Below are several full end-to-end examples of how to interact with vectors.
##### Example: Reading an int64 Vector with `NULL` Values {#docs:stable:clients:c:vector::example-reading-an-int64-vector-with-null-values}
```c
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%2=0 THEN NULL ELSE i END res_col FROM range(10) t(i)", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
// get the number of rows from the data chunk
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the first column
duckdb_vector res_col = duckdb_data_chunk_get_vector(result, 0);
// get the native array and the validity mask of the vector
int64_t *vector_data = (int64_t *) duckdb_vector_get_data(res_col);
uint64_t *vector_validity = duckdb_vector_get_validity(res_col);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (duckdb_validity_row_is_valid(vector_validity, row)) {
printf("%lld\n", vector_data[row]);
} else {
printf("NULL\n");
}
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
```
##### Example: Reading a String Vector {#docs:stable:clients:c:vector::example-reading-a-string-vector}
```c
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%2=0 THEN CONCAT('short_', i) ELSE CONCAT('longstringprefix', i) END FROM range(10) t(i)", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
// get the number of rows from the data chunk
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the first column
duckdb_vector res_col = duckdb_data_chunk_get_vector(result, 0);
// get the native array and the validity mask of the vector
duckdb_string_t *vector_data = (duckdb_string_t *) duckdb_vector_get_data(res_col);
uint64_t *vector_validity = duckdb_vector_get_validity(res_col);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (duckdb_validity_row_is_valid(vector_validity, row)) {
duckdb_string_t str = vector_data[row];
if (duckdb_string_is_inlined(str)) {
// use inlined string
printf("%.*s\n", str.value.inlined.length, str.value.inlined.inlined);
} else {
// follow string pointer
printf("%.*s\n", str.value.pointer.length, str.value.pointer.ptr);
}
} else {
printf("NULL\n");
}
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
```
##### Example: Reading a Struct Vector {#docs:stable:clients:c:vector::example-reading-a-struct-vector}
```c
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i%5=0 THEN NULL ELSE {'col1': i, 'col2': CASE WHEN i%2=0 THEN NULL ELSE 100 + i * 42 END} END FROM range(10) t(i)", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
// get the number of rows from the data chunk
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the struct column
duckdb_vector struct_col = duckdb_data_chunk_get_vector(result, 0);
uint64_t *struct_validity = duckdb_vector_get_validity(struct_col);
// get the child columns of the struct
duckdb_vector col1_vector = duckdb_struct_vector_get_child(struct_col, 0);
int64_t *col1_data = (int64_t *) duckdb_vector_get_data(col1_vector);
uint64_t *col1_validity = duckdb_vector_get_validity(col1_vector);
duckdb_vector col2_vector = duckdb_struct_vector_get_child(struct_col, 1);
int64_t *col2_data = (int64_t *) duckdb_vector_get_data(col2_vector);
uint64_t *col2_validity = duckdb_vector_get_validity(col2_vector);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (!duckdb_validity_row_is_valid(struct_validity, row)) {
// entire struct is NULL
printf("NULL\n");
continue;
}
// read col1
printf("{'col1': ");
if (!duckdb_validity_row_is_valid(col1_validity, row)) {
// col1 is NULL
printf("NULL");
} else {
printf("%lld", col1_data[row]);
}
printf(", 'col2': ");
if (!duckdb_validity_row_is_valid(col2_validity, row)) {
// col2 is NULL
printf("NULL");
} else {
printf("%lld", col2_data[row]);
}
printf("}\n");
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
```
##### Example: Reading a List Vector {#docs:stable:clients:c:vector::example-reading-a-list-vector}
```c
duckdb_database db;
duckdb_connection con;
duckdb_open(nullptr, &db);
duckdb_connect(db, &con);
duckdb_result res;
duckdb_query(con, "SELECT CASE WHEN i % 5 = 0 THEN NULL WHEN i % 2 = 0 THEN [i, i + 1] ELSE [i * 42, NULL, i * 84] END FROM range(10) t(i)", &res);
// iterate until result is exhausted
while (true) {
duckdb_data_chunk result = duckdb_fetch_chunk(res);
if (!result) {
// result is exhausted
break;
}
// get the number of rows from the data chunk
idx_t row_count = duckdb_data_chunk_get_size(result);
// get the list column
duckdb_vector list_col = duckdb_data_chunk_get_vector(result, 0);
duckdb_list_entry *list_data = (duckdb_list_entry *) duckdb_vector_get_data(list_col);
uint64_t *list_validity = duckdb_vector_get_validity(list_col);
// get the child column of the list
duckdb_vector list_child = duckdb_list_vector_get_child(list_col);
int64_t *child_data = (int64_t *) duckdb_vector_get_data(list_child);
uint64_t *child_validity = duckdb_vector_get_validity(list_child);
// iterate over the rows
for (idx_t row = 0; row < row_count; row++) {
if (!duckdb_validity_row_is_valid(list_validity, row)) {
// entire list is NULL
printf("NULL\n");
continue;
}
// read the list offsets for this row
duckdb_list_entry list = list_data[row];
printf("[");
for (idx_t child_idx = list.offset; child_idx < list.offset + list.length; child_idx++) {
if (child_idx > list.offset) {
printf(", ");
}
if (!duckdb_validity_row_is_valid(child_validity, child_idx)) {
// list child is NULL
printf("NULL");
} else {
printf("%lld", child_data[child_idx]);
}
}
printf("]\n");
}
duckdb_destroy_data_chunk(&result);
}
// clean-up
duckdb_destroy_result(&res);
duckdb_disconnect(&con);
duckdb_close(&db);
```
#### API Reference Overview {#docs:stable:clients:c:vector::api-reference-overview}
```c
duckdb_vector duckdb_create_vector(duckdb_logical_type type, idx_t capacity);
void duckdb_destroy_vector(duckdb_vector *vector);
duckdb_logical_type duckdb_vector_get_column_type(duckdb_vector vector);
void *duckdb_vector_get_data(duckdb_vector vector);
uint64_t *duckdb_vector_get_validity(duckdb_vector vector);
void duckdb_vector_ensure_validity_writable(duckdb_vector vector);
void duckdb_vector_assign_string_element(duckdb_vector vector, idx_t index, const char *str);
void duckdb_vector_assign_string_element_len(duckdb_vector vector, idx_t index, const char *str, idx_t str_len);
duckdb_vector duckdb_list_vector_get_child(duckdb_vector vector);
idx_t duckdb_list_vector_get_size(duckdb_vector vector);
duckdb_state duckdb_list_vector_set_size(duckdb_vector vector, idx_t size);
duckdb_state duckdb_list_vector_reserve(duckdb_vector vector, idx_t required_capacity);
duckdb_vector duckdb_struct_vector_get_child(duckdb_vector vector, idx_t index);
duckdb_vector duckdb_array_vector_get_child(duckdb_vector vector);
void duckdb_slice_vector(duckdb_vector vector, duckdb_selection_vector sel, idx_t len);
void duckdb_vector_copy_sel(duckdb_vector src, duckdb_vector dst, duckdb_selection_vector sel, idx_t src_count, idx_t src_offset, idx_t dst_offset);
void duckdb_vector_reference_value(duckdb_vector vector, duckdb_value value);
void duckdb_vector_reference_vector(duckdb_vector to_vector, duckdb_vector from_vector);
```
##### Validity Mask Functions {#docs:stable:clients:c:vector::validity-mask-functions}
```c
bool duckdb_validity_row_is_valid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_validity(uint64_t *validity, idx_t row, bool valid);
void duckdb_validity_set_row_invalid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_valid(uint64_t *validity, idx_t row);
```
###### `duckdb_create_vector` {#docs:stable:clients:c:vector::duckdb_create_vector}
Creates a flat vector. Must be destroyed with `duckdb_destroy_vector`.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
duckdb_vector duckdb_create_vector(
duckdb_logical_type type,
idx_t capacity
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `type`: The logical type of the vector.
* `capacity`: The capacity of the vector.
####### Return Value {#docs:stable:clients:c:vector::return-value}
The vector.
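For illustration, a minimal sketch that creates a standalone `BIGINT` vector, writes to it, and destroys it again. The variable names are illustrative; the capacity of 4 is arbitrary.
```c
// create a BIGINT vector with capacity for 4 values (sketch)
duckdb_logical_type bigint_type = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
duckdb_vector vector = duckdb_create_vector(bigint_type, 4);
// write to the vector through its data pointer
int64_t *data = (int64_t *) duckdb_vector_get_data(vector);
for (idx_t i = 0; i < 4; i++) {
    data[i] = (int64_t) i * 10;
}
// clean up
duckdb_destroy_vector(&vector);
duckdb_destroy_logical_type(&bigint_type);
```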
###### `duckdb_destroy_vector` {#docs:stable:clients:c:vector::duckdb_destroy_vector}
Destroys the vector and de-allocates its memory.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_destroy_vector(
duckdb_vector *vector
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: A pointer to the vector.
###### `duckdb_vector_get_column_type` {#docs:stable:clients:c:vector::duckdb_vector_get_column_type}
Retrieves the column type of the specified vector.
The result must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
duckdb_logical_type duckdb_vector_get_column_type(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector to get the type from
####### Return Value {#docs:stable:clients:c:vector::return-value}
The type of the vector
###### `duckdb_vector_get_data` {#docs:stable:clients:c:vector::duckdb_vector_get_data}
Retrieves the data pointer of the vector.
The data pointer can be used to read or write values from the vector.
How to read or write values depends on the type of the vector.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void *duckdb_vector_get_data(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector to get the data from
####### Return Value {#docs:stable:clients:c:vector::return-value}
The data pointer
###### `duckdb_vector_get_validity` {#docs:stable:clients:c:vector::duckdb_vector_get_validity}
Retrieves the validity mask pointer of the specified vector.
If all values are valid, this function MIGHT return NULL!
The validity mask is a bitset that signifies null-ness within the data chunk.
It is a series of uint64_t values, where each uint64_t value contains validity for 64 tuples.
The bit is set to 1 if the value is valid (i.e., not NULL) or 0 if the value is invalid (i.e., NULL).
Validity of a specific value can be obtained like this:
```c
idx_t entry_idx = row_idx / 64;
idx_t idx_in_entry = row_idx % 64;
bool is_valid = validity_mask[entry_idx] & (1ULL << idx_in_entry);
```
Alternatively, the (slower) `duckdb_validity_row_is_valid` function can be used.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
uint64_t *duckdb_vector_get_validity(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector to get the data from
####### Return Value {#docs:stable:clients:c:vector::return-value}
The pointer to the validity mask, or NULL if no validity mask is present
###### `duckdb_vector_ensure_validity_writable` {#docs:stable:clients:c:vector::duckdb_vector_ensure_validity_writable}
Ensures the validity mask is writable by allocating it.
After this function is called, `duckdb_vector_get_validity` will ALWAYS return non-NULL.
This allows NULL values to be written to the vector, regardless of whether a validity mask was present before.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_vector_ensure_validity_writable(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector to alter
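A short sketch of the typical pattern, assuming `vector` is a writable vector (e.g., obtained from an output data chunk):
```c
// make sure a validity mask exists before writing NULLs
duckdb_vector_ensure_validity_writable(vector);
uint64_t *validity = duckdb_vector_get_validity(vector); // now guaranteed to be non-NULL
// mark row 3 as NULL
duckdb_validity_set_row_invalid(validity, 3);
```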
###### `duckdb_vector_assign_string_element` {#docs:stable:clients:c:vector::duckdb_vector_assign_string_element}
Assigns a string element in the vector at the specified location.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_vector_assign_string_element(
duckdb_vector vector,
idx_t index,
const char *str
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector to alter
* `index`: The row position in the vector to assign the string to
* `str`: The null-terminated string
###### `duckdb_vector_assign_string_element_len` {#docs:stable:clients:c:vector::duckdb_vector_assign_string_element_len}
Assigns a string element in the vector at the specified location. You may also use this function to assign BLOBs.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_vector_assign_string_element_len(
duckdb_vector vector,
idx_t index,
const char *str,
idx_t str_len
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector to alter
* `index`: The row position in the vector to assign the string to
* `str`: The string
* `str_len`: The length of the string (in bytes)
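As a sketch, assuming `vector` is a writable `BLOB` vector, binary data containing embedded null bytes can be assigned as follows:
```c
// assign 5 bytes of binary data (including an embedded null byte) to row 0
const uint8_t blob_bytes[] = {0xDE, 0xAD, 0x00, 0xBE, 0xEF};
duckdb_vector_assign_string_element_len(vector, 0, (const char *) blob_bytes, sizeof(blob_bytes));
```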
###### `duckdb_list_vector_get_child` {#docs:stable:clients:c:vector::duckdb_list_vector_get_child}
Retrieves the child vector of a list vector.
The resulting vector is valid as long as the parent vector is valid.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
duckdb_vector duckdb_list_vector_get_child(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector
####### Return Value {#docs:stable:clients:c:vector::return-value}
The child vector
###### `duckdb_list_vector_get_size` {#docs:stable:clients:c:vector::duckdb_list_vector_get_size}
Returns the size of the child vector of the list.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
idx_t duckdb_list_vector_get_size(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector
####### Return Value {#docs:stable:clients:c:vector::return-value}
The size of the child list
###### `duckdb_list_vector_set_size` {#docs:stable:clients:c:vector::duckdb_list_vector_set_size}
Sets the total size of the underlying child-vector of a list vector.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
duckdb_state duckdb_list_vector_set_size(
duckdb_vector vector,
idx_t size
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The list vector.
* `size`: The size of the child list.
####### Return Value {#docs:stable:clients:c:vector::return-value}
The duckdb state. Returns DuckDBError if the vector is nullptr.
###### `duckdb_list_vector_reserve` {#docs:stable:clients:c:vector::duckdb_list_vector_reserve}
Sets the total capacity of the underlying child-vector of a list.
After calling this method, you must call `duckdb_vector_get_validity` and `duckdb_vector_get_data` again to obtain the current data and validity pointers.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
duckdb_state duckdb_list_vector_reserve(
duckdb_vector vector,
idx_t required_capacity
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The list vector.
* `required_capacity`: the total capacity to reserve.
####### Return Value {#docs:stable:clients:c:vector::return-value}
The duckdb state. Returns DuckDBError if the vector is nullptr.
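The following sketch shows the intended usage, assuming `list_vector` is a writable `LIST(BIGINT)` vector and the desired total child size is known up front:
```c
idx_t total_child_count = 100;
// reserve capacity and set the size of the child vector
duckdb_list_vector_reserve(list_vector, total_child_count);
duckdb_list_vector_set_size(list_vector, total_child_count);
// re-fetch the child data pointer after reserving, as required above
duckdb_vector child = duckdb_list_vector_get_child(list_vector);
int64_t *child_data = (int64_t *) duckdb_vector_get_data(child);
for (idx_t i = 0; i < total_child_count; i++) {
    child_data[i] = (int64_t) i;
}
```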
###### `duckdb_struct_vector_get_child` {#docs:stable:clients:c:vector::duckdb_struct_vector_get_child}
Retrieves the child vector of a struct vector.
The resulting vector is valid as long as the parent vector is valid.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
duckdb_vector duckdb_struct_vector_get_child(
duckdb_vector vector,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector
* `index`: The child index
####### Return Value {#docs:stable:clients:c:vector::return-value}
The child vector
###### `duckdb_array_vector_get_child` {#docs:stable:clients:c:vector::duckdb_array_vector_get_child}
Retrieves the child vector of an array vector.
The resulting vector is valid as long as the parent vector is valid.
The resulting vector has the size of the parent vector multiplied by the array size.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
duckdb_vector duckdb_array_vector_get_child(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector
####### Return Value {#docs:stable:clients:c:vector::return-value}
The child vector
###### `duckdb_slice_vector` {#docs:stable:clients:c:vector::duckdb_slice_vector}
Slice a vector with a selection vector.
The length of the selection vector must be less than or equal to the length of the vector.
Turns the vector into a dictionary vector.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_slice_vector(
duckdb_vector vector,
duckdb_selection_vector sel,
idx_t len
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The vector to slice.
* `sel`: The selection vector.
* `len`: The length of the selection vector.
###### `duckdb_vector_copy_sel` {#docs:stable:clients:c:vector::duckdb_vector_copy_sel}
Copy the src vector to the dst with a selection vector that identifies which indices to copy.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_vector_copy_sel(
duckdb_vector src,
duckdb_vector dst,
duckdb_selection_vector sel,
idx_t src_count,
idx_t src_offset,
idx_t dst_offset
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `src`: The vector to copy from.
* `dst`: The vector to copy to.
* `sel`: The selection vector. The length of the selection vector should not exceed the length of the `src` vector.
* `src_count`: The number of entries from the selection vector to copy, i.e., the effective length of the selection vector starting from index 0.
* `src_offset`: The offset in the selection vector to copy from (important: the actual number of items copied = `src_count` - `src_offset`).
* `dst_offset`: The offset in the dst vector to start copying to.
###### `duckdb_vector_reference_value` {#docs:stable:clients:c:vector::duckdb_vector_reference_value}
Copies the value from `value` to `vector`.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_vector_reference_value(
duckdb_vector vector,
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `vector`: The receiving vector.
* `value`: The value to copy into the vector.
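A minimal sketch, assuming `vector` is a `BIGINT` vector that should be populated from a single value:
```c
duckdb_value forty_two = duckdb_create_int64(42);
// copy the value into the vector
duckdb_vector_reference_value(vector, forty_two);
// the value is copied, so the source value can be destroyed afterwards
duckdb_destroy_value(&forty_two);
```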
###### `duckdb_vector_reference_vector` {#docs:stable:clients:c:vector::duckdb_vector_reference_vector}
Changes `to_vector` to reference `from_vector`. Afterwards, the two vectors share ownership of the data.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_vector_reference_vector(
duckdb_vector to_vector,
duckdb_vector from_vector
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `to_vector`: The receiving vector.
* `from_vector`: The vector to reference.
###### `duckdb_validity_row_is_valid` {#docs:stable:clients:c:vector::duckdb_validity_row_is_valid}
Returns whether or not a row is valid (i.e., not NULL) in the given validity mask.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
bool duckdb_validity_row_is_valid(
uint64_t *validity,
idx_t row
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `validity`: The validity mask, as obtained through `duckdb_vector_get_validity`
* `row`: The row index
####### Return Value {#docs:stable:clients:c:vector::return-value}
true if the row is valid, false otherwise
###### `duckdb_validity_set_row_validity` {#docs:stable:clients:c:vector::duckdb_validity_set_row_validity}
In a validity mask, sets a specific row to either valid or invalid.
Note that `duckdb_vector_ensure_validity_writable` should be called before calling `duckdb_vector_get_validity`,
to ensure that there is a validity mask to write to.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_validity_set_row_validity(
uint64_t *validity,
idx_t row,
bool valid
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `validity`: The validity mask, as obtained through `duckdb_vector_get_validity`.
* `row`: The row index
* `valid`: Whether to set the row to valid (true) or invalid (false)
###### `duckdb_validity_set_row_invalid` {#docs:stable:clients:c:vector::duckdb_validity_set_row_invalid}
In a validity mask, sets a specific row to invalid.
Equivalent to `duckdb_validity_set_row_validity` with valid set to false.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_validity_set_row_invalid(
uint64_t *validity,
idx_t row
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `validity`: The validity mask
* `row`: The row index
###### `duckdb_validity_set_row_valid` {#docs:stable:clients:c:vector::duckdb_validity_set_row_valid}
In a validity mask, sets a specific row to valid.
Equivalent to `duckdb_validity_set_row_validity` with valid set to true.
####### Syntax {#docs:stable:clients:c:vector::syntax}
```c
void duckdb_validity_set_row_valid(
uint64_t *validity,
idx_t row
);
```
####### Parameters {#docs:stable:clients:c:vector::parameters}
* `validity`: The validity mask
* `row`: The row index
### Values {#docs:stable:clients:c:value}
The value class represents a single value of any type.
#### API Reference Overview {#docs:stable:clients:c:value::api-reference-overview}
```c
void duckdb_destroy_value(duckdb_value *value);
duckdb_value duckdb_create_varchar(const char *text);
duckdb_value duckdb_create_varchar_length(const char *text, idx_t length);
duckdb_value duckdb_create_bool(bool input);
duckdb_value duckdb_create_int8(int8_t input);
duckdb_value duckdb_create_uint8(uint8_t input);
duckdb_value duckdb_create_int16(int16_t input);
duckdb_value duckdb_create_uint16(uint16_t input);
duckdb_value duckdb_create_int32(int32_t input);
duckdb_value duckdb_create_uint32(uint32_t input);
duckdb_value duckdb_create_uint64(uint64_t input);
duckdb_value duckdb_create_int64(int64_t val);
duckdb_value duckdb_create_hugeint(duckdb_hugeint input);
duckdb_value duckdb_create_uhugeint(duckdb_uhugeint input);
duckdb_value duckdb_create_bignum(duckdb_bignum input);
duckdb_value duckdb_create_decimal(duckdb_decimal input);
duckdb_value duckdb_create_float(float input);
duckdb_value duckdb_create_double(double input);
duckdb_value duckdb_create_date(duckdb_date input);
duckdb_value duckdb_create_time(duckdb_time input);
duckdb_value duckdb_create_time_ns(duckdb_time_ns input);
duckdb_value duckdb_create_time_tz_value(duckdb_time_tz value);
duckdb_value duckdb_create_timestamp(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_tz(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_s(duckdb_timestamp_s input);
duckdb_value duckdb_create_timestamp_ms(duckdb_timestamp_ms input);
duckdb_value duckdb_create_timestamp_ns(duckdb_timestamp_ns input);
duckdb_value duckdb_create_interval(duckdb_interval input);
duckdb_value duckdb_create_blob(const uint8_t *data, idx_t length);
duckdb_value duckdb_create_bit(duckdb_bit input);
duckdb_value duckdb_create_uuid(duckdb_uhugeint input);
bool duckdb_get_bool(duckdb_value val);
int8_t duckdb_get_int8(duckdb_value val);
uint8_t duckdb_get_uint8(duckdb_value val);
int16_t duckdb_get_int16(duckdb_value val);
uint16_t duckdb_get_uint16(duckdb_value val);
int32_t duckdb_get_int32(duckdb_value val);
uint32_t duckdb_get_uint32(duckdb_value val);
int64_t duckdb_get_int64(duckdb_value val);
uint64_t duckdb_get_uint64(duckdb_value val);
duckdb_hugeint duckdb_get_hugeint(duckdb_value val);
duckdb_uhugeint duckdb_get_uhugeint(duckdb_value val);
duckdb_bignum duckdb_get_bignum(duckdb_value val);
duckdb_decimal duckdb_get_decimal(duckdb_value val);
float duckdb_get_float(duckdb_value val);
double duckdb_get_double(duckdb_value val);
duckdb_date duckdb_get_date(duckdb_value val);
duckdb_time duckdb_get_time(duckdb_value val);
duckdb_time_ns duckdb_get_time_ns(duckdb_value val);
duckdb_time_tz duckdb_get_time_tz(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp_tz(duckdb_value val);
duckdb_timestamp_s duckdb_get_timestamp_s(duckdb_value val);
duckdb_timestamp_ms duckdb_get_timestamp_ms(duckdb_value val);
duckdb_timestamp_ns duckdb_get_timestamp_ns(duckdb_value val);
duckdb_interval duckdb_get_interval(duckdb_value val);
duckdb_logical_type duckdb_get_value_type(duckdb_value val);
duckdb_blob duckdb_get_blob(duckdb_value val);
duckdb_bit duckdb_get_bit(duckdb_value val);
duckdb_uhugeint duckdb_get_uuid(duckdb_value val);
char *duckdb_get_varchar(duckdb_value value);
duckdb_value duckdb_create_struct_value(duckdb_logical_type type, duckdb_value *values);
duckdb_value duckdb_create_list_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_array_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_map_value(duckdb_logical_type map_type, duckdb_value *keys, duckdb_value *values, idx_t entry_count);
duckdb_value duckdb_create_union_value(duckdb_logical_type union_type, idx_t tag_index, duckdb_value value);
idx_t duckdb_get_map_size(duckdb_value value);
duckdb_value duckdb_get_map_key(duckdb_value value, idx_t index);
duckdb_value duckdb_get_map_value(duckdb_value value, idx_t index);
bool duckdb_is_null_value(duckdb_value value);
duckdb_value duckdb_create_null_value();
idx_t duckdb_get_list_size(duckdb_value value);
duckdb_value duckdb_get_list_child(duckdb_value value, idx_t index);
duckdb_value duckdb_create_enum_value(duckdb_logical_type type, uint64_t value);
uint64_t duckdb_get_enum_value(duckdb_value value);
duckdb_value duckdb_get_struct_child(duckdb_value value, idx_t index);
char *duckdb_value_to_string(duckdb_value value);
```
###### `duckdb_destroy_value` {#docs:stable:clients:c:value::duckdb_destroy_value}
Destroys the value and de-allocates all memory allocated for that type.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
void duckdb_destroy_value(
duckdb_value *value
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: The value to destroy.
###### `duckdb_create_varchar` {#docs:stable:clients:c:value::duckdb_create_varchar}
Creates a value from a null-terminated string
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_varchar(
const char *text
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `text`: The null-terminated string
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_varchar_length` {#docs:stable:clients:c:value::duckdb_create_varchar_length}
Creates a value from a string
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_varchar_length(
const char *text,
idx_t length
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `text`: The text
* `length`: The length of the text
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_bool` {#docs:stable:clients:c:value::duckdb_create_bool}
Creates a value from a boolean
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_bool(
bool input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The boolean value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_int8` {#docs:stable:clients:c:value::duckdb_create_int8}
Creates a value from an int8_t (a tinyint)
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_int8(
int8_t input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The tinyint value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uint8` {#docs:stable:clients:c:value::duckdb_create_uint8}
Creates a value from a uint8_t (a utinyint)
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_uint8(
uint8_t input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The utinyint value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_int16` {#docs:stable:clients:c:value::duckdb_create_int16}
Creates a value from an int16_t (a smallint)
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_int16(
int16_t input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The smallint value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uint16` {#docs:stable:clients:c:value::duckdb_create_uint16}
Creates a value from a uint16_t (a usmallint)
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_uint16(
uint16_t input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The usmallint value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_int32` {#docs:stable:clients:c:value::duckdb_create_int32}
Creates a value from an int32_t (an integer)
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_int32(
int32_t input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The integer value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uint32` {#docs:stable:clients:c:value::duckdb_create_uint32}
Creates a value from a uint32_t (a uinteger)
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_uint32(
uint32_t input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The uinteger value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uint64` {#docs:stable:clients:c:value::duckdb_create_uint64}
Creates a value from a uint64_t (a ubigint)
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_uint64(
uint64_t input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The ubigint value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_int64` {#docs:stable:clients:c:value::duckdb_create_int64}
Creates a value from an int64
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_int64(
int64_t val
);
```
###### `duckdb_create_hugeint` {#docs:stable:clients:c:value::duckdb_create_hugeint}
Creates a value from a hugeint
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_hugeint(
duckdb_hugeint input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The hugeint value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uhugeint` {#docs:stable:clients:c:value::duckdb_create_uhugeint}
Creates a value from a uhugeint
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_uhugeint(
duckdb_uhugeint input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The uhugeint value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_bignum` {#docs:stable:clients:c:value::duckdb_create_bignum}
Creates a BIGNUM value from a duckdb_bignum
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_bignum(
duckdb_bignum input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The duckdb_bignum value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_decimal` {#docs:stable:clients:c:value::duckdb_create_decimal}
Creates a DECIMAL value from a duckdb_decimal
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_decimal(
duckdb_decimal input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The duckdb_decimal value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_float` {#docs:stable:clients:c:value::duckdb_create_float}
Creates a value from a float
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_float(
float input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The float value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_double` {#docs:stable:clients:c:value::duckdb_create_double}
Creates a value from a double
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_double(
double input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The double value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_date` {#docs:stable:clients:c:value::duckdb_create_date}
Creates a value from a date
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_date(
duckdb_date input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The date value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_time` {#docs:stable:clients:c:value::duckdb_create_time}
Creates a value from a time
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_time(
duckdb_time input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The time value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_time_ns` {#docs:stable:clients:c:value::duckdb_create_time_ns}
Creates a value from a time_ns
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_time_ns(
duckdb_time_ns input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The time value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_time_tz_value` {#docs:stable:clients:c:value::duckdb_create_time_tz_value}
Creates a value from a time_tz.
Not to be confused with `duckdb_create_time_tz`, which creates a duckdb_time_tz_t.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_time_tz_value(
duckdb_time_tz value
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: The time_tz value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_timestamp` {#docs:stable:clients:c:value::duckdb_create_timestamp}
Creates a TIMESTAMP value from a duckdb_timestamp
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_timestamp(
duckdb_timestamp input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The duckdb_timestamp value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_timestamp_tz` {#docs:stable:clients:c:value::duckdb_create_timestamp_tz}
Creates a TIMESTAMP_TZ value from a duckdb_timestamp
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_timestamp_tz(
duckdb_timestamp input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The duckdb_timestamp value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_timestamp_s` {#docs:stable:clients:c:value::duckdb_create_timestamp_s}
Creates a TIMESTAMP_S value from a duckdb_timestamp_s
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_timestamp_s(
duckdb_timestamp_s input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The duckdb_timestamp_s value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_timestamp_ms` {#docs:stable:clients:c:value::duckdb_create_timestamp_ms}
Creates a TIMESTAMP_MS value from a duckdb_timestamp_ms
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_timestamp_ms(
duckdb_timestamp_ms input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The duckdb_timestamp_ms value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_timestamp_ns` {#docs:stable:clients:c:value::duckdb_create_timestamp_ns}
Creates a TIMESTAMP_NS value from a duckdb_timestamp_ns
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_timestamp_ns(
duckdb_timestamp_ns input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The duckdb_timestamp_ns value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_interval` {#docs:stable:clients:c:value::duckdb_create_interval}
Creates a value from an interval
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_interval(
duckdb_interval input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The interval value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_blob` {#docs:stable:clients:c:value::duckdb_create_blob}
Creates a value from a blob
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_blob(
const uint8_t *data,
idx_t length
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `data`: The blob data
* `length`: The length of the blob data
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_bit` {#docs:stable:clients:c:value::duckdb_create_bit}
Creates a BIT value from a duckdb_bit
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_bit(
duckdb_bit input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The duckdb_bit value
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uuid` {#docs:stable:clients:c:value::duckdb_create_uuid}
Creates a UUID value from a uhugeint
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_uuid(
duckdb_uhugeint input
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `input`: The duckdb_uhugeint containing the UUID
####### Return Value {#docs:stable:clients:c:value::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_get_bool` {#docs:stable:clients:c:value::duckdb_get_bool}
Returns the boolean value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
bool duckdb_get_bool(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a boolean
####### Return Value {#docs:stable:clients:c:value::return-value}
A boolean, or false if the value cannot be converted
###### `duckdb_get_int8` {#docs:stable:clients:c:value::duckdb_get_int8}
Returns the int8_t value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
int8_t duckdb_get_int8(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a tinyint
####### Return Value {#docs:stable:clients:c:value::return-value}
An int8_t, or MinValue if the value cannot be converted
###### `duckdb_get_uint8` {#docs:stable:clients:c:value::duckdb_get_uint8}
Returns the uint8_t value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
uint8_t duckdb_get_uint8(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a utinyint
####### Return Value {#docs:stable:clients:c:value::return-value}
A uint8_t, or MinValue if the value cannot be converted
###### `duckdb_get_int16` {#docs:stable:clients:c:value::duckdb_get_int16}
Returns the int16_t value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
int16_t duckdb_get_int16(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a smallint
####### Return Value {#docs:stable:clients:c:value::return-value}
An int16_t, or MinValue if the value cannot be converted
###### `duckdb_get_uint16` {#docs:stable:clients:c:value::duckdb_get_uint16}
Returns the uint16_t value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
uint16_t duckdb_get_uint16(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a usmallint
####### Return Value {#docs:stable:clients:c:value::return-value}
A uint16_t, or MinValue if the value cannot be converted
###### `duckdb_get_int32` {#docs:stable:clients:c:value::duckdb_get_int32}
Returns the int32_t value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
int32_t duckdb_get_int32(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing an integer
####### Return Value {#docs:stable:clients:c:value::return-value}
An int32_t, or MinValue if the value cannot be converted
###### `duckdb_get_uint32` {#docs:stable:clients:c:value::duckdb_get_uint32}
Returns the uint32_t value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
uint32_t duckdb_get_uint32(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a uinteger
####### Return Value {#docs:stable:clients:c:value::return-value}
A uint32_t, or MinValue if the value cannot be converted
###### `duckdb_get_int64` {#docs:stable:clients:c:value::duckdb_get_int64}
Returns the int64_t value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
int64_t duckdb_get_int64(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a bigint
####### Return Value {#docs:stable:clients:c:value::return-value}
An int64_t, or MinValue if the value cannot be converted
###### `duckdb_get_uint64` {#docs:stable:clients:c:value::duckdb_get_uint64}
Returns the uint64_t value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
uint64_t duckdb_get_uint64(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a ubigint
####### Return Value {#docs:stable:clients:c:value::return-value}
A uint64_t, or MinValue if the value cannot be converted
###### `duckdb_get_hugeint` {#docs:stable:clients:c:value::duckdb_get_hugeint}
Returns the hugeint value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_hugeint duckdb_get_hugeint(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a hugeint
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_hugeint, or MinValue if the value cannot be converted
###### `duckdb_get_uhugeint` {#docs:stable:clients:c:value::duckdb_get_uhugeint}
Returns the uhugeint value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_uhugeint duckdb_get_uhugeint(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a uhugeint
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_uhugeint, or MinValue if the value cannot be converted
###### `duckdb_get_bignum` {#docs:stable:clients:c:value::duckdb_get_bignum}
Returns the duckdb_bignum value of the given value.
The `data` field must be destroyed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_bignum duckdb_get_bignum(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a BIGNUM
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_bignum. The `data` field must be destroyed with `duckdb_free`.
###### `duckdb_get_decimal` {#docs:stable:clients:c:value::duckdb_get_decimal}
Returns the duckdb_decimal value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_decimal duckdb_get_decimal(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a DECIMAL
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_decimal, or MinValue if the value cannot be converted
###### `duckdb_get_float` {#docs:stable:clients:c:value::duckdb_get_float}
Returns the float value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
float duckdb_get_float(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a float
####### Return Value {#docs:stable:clients:c:value::return-value}
A float, or NAN if the value cannot be converted
###### `duckdb_get_double` {#docs:stable:clients:c:value::duckdb_get_double}
Returns the double value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
double duckdb_get_double(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a double
####### Return Value {#docs:stable:clients:c:value::return-value}
A double, or NAN if the value cannot be converted
###### `duckdb_get_date` {#docs:stable:clients:c:value::duckdb_get_date}
Returns the date value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_date duckdb_get_date(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a date
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_date, or MinValue if the value cannot be converted
###### `duckdb_get_time` {#docs:stable:clients:c:value::duckdb_get_time}
Returns the time value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_time duckdb_get_time(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a time
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_time, or MinValue if the value cannot be converted
###### `duckdb_get_time_ns` {#docs:stable:clients:c:value::duckdb_get_time_ns}
Returns the time_ns value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_time_ns duckdb_get_time_ns(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a time_ns
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_time_ns, or MinValue if the value cannot be converted
###### `duckdb_get_time_tz` {#docs:stable:clients:c:value::duckdb_get_time_tz}
Returns the time_tz value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_time_tz duckdb_get_time_tz(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a time_tz
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_time_tz, or MinValue if the value cannot be converted
###### `duckdb_get_timestamp` {#docs:stable:clients:c:value::duckdb_get_timestamp}
Returns the TIMESTAMP value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_timestamp duckdb_get_timestamp(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a TIMESTAMP
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_timestamp, or MinValue if the value cannot be converted
###### `duckdb_get_timestamp_tz` {#docs:stable:clients:c:value::duckdb_get_timestamp_tz}
Returns the TIMESTAMP_TZ value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_timestamp duckdb_get_timestamp_tz(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a TIMESTAMP_TZ
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_timestamp, or MinValue if the value cannot be converted
###### `duckdb_get_timestamp_s` {#docs:stable:clients:c:value::duckdb_get_timestamp_s}
Returns the duckdb_timestamp_s value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_timestamp_s duckdb_get_timestamp_s(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a TIMESTAMP_S
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_timestamp_s, or MinValue if the value cannot be converted
###### `duckdb_get_timestamp_ms` {#docs:stable:clients:c:value::duckdb_get_timestamp_ms}
Returns the duckdb_timestamp_ms value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_timestamp_ms duckdb_get_timestamp_ms(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a TIMESTAMP_MS
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_timestamp_ms, or MinValue if the value cannot be converted
###### `duckdb_get_timestamp_ns` {#docs:stable:clients:c:value::duckdb_get_timestamp_ns}
Returns the duckdb_timestamp_ns value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_timestamp_ns duckdb_get_timestamp_ns(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a TIMESTAMP_NS
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_timestamp_ns, or MinValue if the value cannot be converted
###### `duckdb_get_interval` {#docs:stable:clients:c:value::duckdb_get_interval}
Returns the interval value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_interval duckdb_get_interval(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing an interval
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_interval, or MinValue if the value cannot be converted
###### `duckdb_get_value_type` {#docs:stable:clients:c:value::duckdb_get_value_type}
Returns the type of the given value. The type is valid as long as the value is not destroyed.
The type itself must not be destroyed.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_logical_type duckdb_get_value_type(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_logical_type.
###### `duckdb_get_blob` {#docs:stable:clients:c:value::duckdb_get_blob}
Returns the blob value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_blob duckdb_get_blob(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a blob
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_blob
###### `duckdb_get_bit` {#docs:stable:clients:c:value::duckdb_get_bit}
Returns the duckdb_bit value of the given value.
The `data` field must be destroyed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_bit duckdb_get_bit(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a BIT
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_bit
###### `duckdb_get_uuid` {#docs:stable:clients:c:value::duckdb_get_uuid}
Returns a duckdb_uhugeint representing the UUID value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_uhugeint duckdb_get_uuid(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `val`: A duckdb_value containing a UUID
####### Return Value {#docs:stable:clients:c:value::return-value}
A duckdb_uhugeint representing the UUID value
###### `duckdb_get_varchar` {#docs:stable:clients:c:value::duckdb_get_varchar}
Obtains a string representation of the given value.
The result must be destroyed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
char *duckdb_get_varchar(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: The value
####### Return Value {#docs:stable:clients:c:value::return-value}
The string value. This must be destroyed with `duckdb_free`.
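For example, a small sketch that prints a value's string representation and frees the returned string:
```c
duckdb_value val = duckdb_create_int32(42);
char *text = duckdb_get_varchar(val);
printf("%s\n", text); // prints: 42
// clean up
duckdb_free(text);
duckdb_destroy_value(&val);
```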
###### `duckdb_create_struct_value` {#docs:stable:clients:c:value::duckdb_create_struct_value}
Creates a struct value from a type and an array of values. Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_struct_value(
duckdb_logical_type type,
duckdb_value *values
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `type`: The type of the struct
* `values`: The values for the struct fields
####### Return Value {#docs:stable:clients:c:value::return-value}
The struct value, or nullptr, if any child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.
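A sketch of building a `STRUCT(i INTEGER, s VARCHAR)` value, using `duckdb_create_struct_type` from the logical types API. It assumes, as with the other creation functions, that the inputs are copied and must still be destroyed by the caller:
```c
duckdb_logical_type member_types[2] = {
    duckdb_create_logical_type(DUCKDB_TYPE_INTEGER),
    duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR)
};
const char *member_names[2] = {"i", "s"};
duckdb_logical_type struct_type = duckdb_create_struct_type(member_types, member_names, 2);
duckdb_value members[2] = {duckdb_create_int32(7), duckdb_create_varchar("hello")};
duckdb_value struct_val = duckdb_create_struct_value(struct_type, members);
// clean up (inputs are assumed to be copied into the struct value)
duckdb_destroy_value(&members[0]);
duckdb_destroy_value(&members[1]);
duckdb_destroy_value(&struct_val);
duckdb_destroy_logical_type(&struct_type);
duckdb_destroy_logical_type(&member_types[0]);
duckdb_destroy_logical_type(&member_types[1]);
```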
###### `duckdb_create_list_value` {#docs:stable:clients:c:value::duckdb_create_list_value}
Creates a list value from a child (element) type and an array of values of length `value_count`.
Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_list_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `type`: The type of the list
* `values`: The values for the list
* `value_count`: The number of values in the list
####### Return Value {#docs:stable:clients:c:value::return-value}
The list value, or nullptr, if the child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.
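A minimal sketch that builds a two-element `INTEGER` list; note that the type passed is the element type, not the list type:
```c
duckdb_logical_type element_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_value elements[2] = {duckdb_create_int32(1), duckdb_create_int32(2)};
duckdb_value list_val = duckdb_create_list_value(element_type, elements, 2);
// clean up
duckdb_destroy_value(&elements[0]);
duckdb_destroy_value(&elements[1]);
duckdb_destroy_value(&list_val);
duckdb_destroy_logical_type(&element_type);
```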
###### `duckdb_create_array_value` {#docs:stable:clients:c:value::duckdb_create_array_value}
Creates an array value from a child (element) type and an array of values of length `value_count`.
Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_array_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `type`: The type of the array
* `values`: The values for the array
* `value_count`: The number of values in the array
####### Return Value {#docs:stable:clients:c:value::return-value}
The array value, or nullptr, if the child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.
###### `duckdb_create_map_value` {#docs:stable:clients:c:value::duckdb_create_map_value}
Creates a map value from a map type and two arrays, one for the keys and one for the values, each of length
`entry_count`. Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_map_value(
duckdb_logical_type map_type,
duckdb_value *keys,
duckdb_value *values,
idx_t entry_count
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `map_type`: The map type
* `keys`: The keys of the map
* `values`: The values of the map
* `entry_count`: The number of entries (key-value pairs) in the map
####### Return Value {#docs:stable:clients:c:value::return-value}
The map value, or nullptr, if the parameters are invalid.
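A sketch that builds a single-entry `MAP(VARCHAR, INTEGER)` value, constructing the map type first via `duckdb_create_map_type` from the logical types API:
```c
duckdb_logical_type key_type = duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR);
duckdb_logical_type value_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_logical_type map_type = duckdb_create_map_type(key_type, value_type);
duckdb_value keys[1] = {duckdb_create_varchar("answer")};
duckdb_value values[1] = {duckdb_create_int32(42)};
duckdb_value map_val = duckdb_create_map_value(map_type, keys, values, 1);
// clean up
duckdb_destroy_value(&keys[0]);
duckdb_destroy_value(&values[0]);
duckdb_destroy_value(&map_val);
duckdb_destroy_logical_type(&map_type);
duckdb_destroy_logical_type(&key_type);
duckdb_destroy_logical_type(&value_type);
```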
###### `duckdb_create_union_value` {#docs:stable:clients:c:value::duckdb_create_union_value}
Creates a union value from a union type, a tag index, and a value.
Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_union_value(
duckdb_logical_type union_type,
idx_t tag_index,
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `union_type`: The union type
* `tag_index`: The index of the tag of the union
* `value`: The value of the union for that tag
####### Return Value {#docs:stable:clients:c:value::return-value}
The union value, or nullptr, if the parameters are invalid.
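A sketch that builds a `UNION(num INTEGER, str VARCHAR)` value holding its second member, using `duckdb_create_union_type` from the logical types API:
```c
duckdb_logical_type member_types[2] = {
    duckdb_create_logical_type(DUCKDB_TYPE_INTEGER),
    duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR)
};
const char *member_names[2] = {"num", "str"};
duckdb_logical_type union_type = duckdb_create_union_type(member_types, member_names, 2);
duckdb_value str_val = duckdb_create_varchar("hello");
// tag index 1 selects the "str" member
duckdb_value union_val = duckdb_create_union_value(union_type, 1, str_val);
// clean up
duckdb_destroy_value(&str_val);
duckdb_destroy_value(&union_val);
duckdb_destroy_logical_type(&union_type);
duckdb_destroy_logical_type(&member_types[0]);
duckdb_destroy_logical_type(&member_types[1]);
```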
###### `duckdb_get_map_size` {#docs:stable:clients:c:value::duckdb_get_map_size}
Returns the number of elements in a MAP value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
idx_t duckdb_get_map_size(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: The MAP value.
####### Return Value {#docs:stable:clients:c:value::return-value}
The number of elements in the map.
###### `duckdb_get_map_key` {#docs:stable:clients:c:value::duckdb_get_map_key}
Returns the MAP key at index as a duckdb_value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_get_map_key(
duckdb_value value,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: The MAP value.
* `index`: The index of the key.
####### Return Value {#docs:stable:clients:c:value::return-value}
The key as a duckdb_value.
###### `duckdb_get_map_value` {#docs:stable:clients:c:value::duckdb_get_map_value}
Returns the MAP value at index as a duckdb_value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_get_map_value(
duckdb_value value,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: The MAP value.
* `index`: The index of the value.
####### Return Value {#docs:stable:clients:c:value::return-value}
The value as a duckdb_value.
###### `duckdb_is_null_value` {#docs:stable:clients:c:value::duckdb_is_null_value}
Returns whether the value's type is SQLNULL or not.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
bool duckdb_is_null_value(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: The value to check.
####### Return Value {#docs:stable:clients:c:value::return-value}
True, if the value's type is SQLNULL, otherwise false.
###### `duckdb_create_null_value` {#docs:stable:clients:c:value::duckdb_create_null_value}
Creates a value of type SQLNULL.
####### Return Value {#docs:stable:clients:c:value::return-value}
The duckdb_value representing SQLNULL. This must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_null_value(
);
```
###### `duckdb_get_list_size` {#docs:stable:clients:c:value::duckdb_get_list_size}
Returns the number of elements in a LIST value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
idx_t duckdb_get_list_size(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: The LIST value.
####### Return Value {#docs:stable:clients:c:value::return-value}
The number of elements in the list.
###### `duckdb_get_list_child` {#docs:stable:clients:c:value::duckdb_get_list_child}
Returns the LIST child at index as a duckdb_value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_get_list_child(
duckdb_value value,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: The LIST value.
* `index`: The index of the child.
####### Return Value {#docs:stable:clients:c:value::return-value}
The child as a duckdb_value.
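Combined with `duckdb_get_list_size`, this allows iterating over a LIST value. A sketch, assuming `list_val` is a LIST value obtained earlier and that each returned child is destroyed like any other `duckdb_value`:
```c
idx_t count = duckdb_get_list_size(list_val);
for (idx_t i = 0; i < count; i++) {
    duckdb_value child = duckdb_get_list_child(list_val, i);
    // print the child's string representation
    char *text = duckdb_get_varchar(child);
    printf("%s\n", text);
    duckdb_free(text);
    duckdb_destroy_value(&child);
}
```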
###### `duckdb_create_enum_value` {#docs:stable:clients:c:value::duckdb_create_enum_value}
Creates an enum value from a type and a value. Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_create_enum_value(
duckdb_logical_type type,
uint64_t value
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `type`: The type of the enum
* `value`: The value for the enum
####### Return Value {#docs:stable:clients:c:value::return-value}
The enum value, or nullptr.
###### `duckdb_get_enum_value` {#docs:stable:clients:c:value::duckdb_get_enum_value}
Returns the enum value of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
uint64_t duckdb_get_enum_value(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: A duckdb_value containing an enum
####### Return Value {#docs:stable:clients:c:value::return-value}
A `uint64_t`, or `MinValue` if the value cannot be converted.
###### `duckdb_get_struct_child` {#docs:stable:clients:c:value::duckdb_get_struct_child}
Returns the STRUCT child at index as a duckdb_value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
duckdb_value duckdb_get_struct_child(
duckdb_value value,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: The STRUCT value.
* `index`: The index of the child.
####### Return Value {#docs:stable:clients:c:value::return-value}
The child as a duckdb_value.
###### `duckdb_value_to_string` {#docs:stable:clients:c:value::duckdb_value_to_string}
Returns the SQL string representation of the given value.
####### Syntax {#docs:stable:clients:c:value::syntax}
```c
char *duckdb_value_to_string(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:value::parameters}
* `value`: A duckdb_value.
####### Return Value {#docs:stable:clients:c:value::return-value}
The SQL string representation as a null-terminated string. The result must be freed with `duckdb_free`.
### Types {#docs:stable:clients:c:types}
DuckDB is a strongly typed database system. As such, every column has a single type specified. This type is constant
over the entire column. That is to say, a column that is labeled as an `INTEGER` column will only contain `INTEGER`
values.
DuckDB also supports columns of composite types. For example, it is possible to define an array of integers (`INTEGER[]`). It is also possible to define types as arbitrary structs (`ROW(i INTEGER, j VARCHAR)`). For that reason, native DuckDB type objects are not mere enums, but a class that can potentially be nested.
Types in the C API are modeled using an enum (`duckdb_type`) and a complex class (`duckdb_logical_type`). For most primitive types, e.g., integers or varchars, the enum is sufficient. For more complex types, such as lists, structs or decimals, the logical type must be used.
```c
typedef enum DUCKDB_TYPE {
DUCKDB_TYPE_INVALID = 0,
DUCKDB_TYPE_BOOLEAN = 1,
DUCKDB_TYPE_TINYINT = 2,
DUCKDB_TYPE_SMALLINT = 3,
DUCKDB_TYPE_INTEGER = 4,
DUCKDB_TYPE_BIGINT = 5,
DUCKDB_TYPE_UTINYINT = 6,
DUCKDB_TYPE_USMALLINT = 7,
DUCKDB_TYPE_UINTEGER = 8,
DUCKDB_TYPE_UBIGINT = 9,
DUCKDB_TYPE_FLOAT = 10,
DUCKDB_TYPE_DOUBLE = 11,
DUCKDB_TYPE_TIMESTAMP = 12,
DUCKDB_TYPE_DATE = 13,
DUCKDB_TYPE_TIME = 14,
DUCKDB_TYPE_INTERVAL = 15,
DUCKDB_TYPE_HUGEINT = 16,
DUCKDB_TYPE_UHUGEINT = 32,
DUCKDB_TYPE_VARCHAR = 17,
DUCKDB_TYPE_BLOB = 18,
DUCKDB_TYPE_DECIMAL = 19,
DUCKDB_TYPE_TIMESTAMP_S = 20,
DUCKDB_TYPE_TIMESTAMP_MS = 21,
DUCKDB_TYPE_TIMESTAMP_NS = 22,
DUCKDB_TYPE_ENUM = 23,
DUCKDB_TYPE_LIST = 24,
DUCKDB_TYPE_STRUCT = 25,
DUCKDB_TYPE_MAP = 26,
DUCKDB_TYPE_ARRAY = 33,
DUCKDB_TYPE_UUID = 27,
DUCKDB_TYPE_UNION = 28,
DUCKDB_TYPE_BIT = 29,
DUCKDB_TYPE_TIME_TZ = 30,
DUCKDB_TYPE_TIMESTAMP_TZ = 31,
} duckdb_type;
```
#### Functions {#docs:stable:clients:c:types::functions}
The enum type of a column in the result can be obtained using the `duckdb_column_type` function. The logical type of a column can be obtained using the `duckdb_column_logical_type` function.
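As a minimal sketch (assuming `duckdb.h` and `<stdio.h>` are included, `con` is an open `duckdb_connection`, and the query is purely illustrative), the enum and logical type of each result column can be inspected as follows; note that every returned logical type must be destroyed:

```c
duckdb_result res;
if (duckdb_query(con, "SELECT 42 AS i, 'hello' AS s", &res) == DuckDBError) {
    // handle error
}
idx_t column_count = duckdb_column_count(&res);
for (idx_t col = 0; col < column_count; col++) {
    // enum type id of the column (e.g., DUCKDB_TYPE_INTEGER)
    duckdb_type type_id = duckdb_column_type(&res, col);
    // full logical type; required for nested types such as LIST or STRUCT
    duckdb_logical_type logical_type = duckdb_column_logical_type(&res, col);
    // duckdb_get_type_id(logical_type) == type_id
    printf("column %llu (%s) has type id %d\n", (unsigned long long) col,
           duckdb_column_name(&res, col), (int) type_id);
    duckdb_destroy_logical_type(&logical_type);
}
duckdb_destroy_result(&res);
```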
##### `duckdb_value` {#docs:stable:clients:c:types::duckdb_value}
The `duckdb_value` functions will auto-cast values as required. For example, it is no problem to use
`duckdb_value_double` on a column of type `DUCKDB_TYPE_INTEGER`. The value will be auto-cast and returned as a double.
Note that in certain cases the cast may fail. For example, this can happen if we request a `duckdb_value_int8` and the value does not fit within an `int8` value. In this case, a default value will be returned (usually `0` or `nullptr`). The same default value will also be returned if the corresponding value is `NULL`.
The `duckdb_value_is_null` function can be used to check if a specific value is `NULL` or not.
The exception to the auto-cast rule is the `duckdb_value_varchar_internal` function. This function does not auto-cast and only works for `VARCHAR` columns. The reason this function exists is that the result does not need to be freed.
> `duckdb_value_varchar` and `duckdb_value_blob` require the result to be de-allocated using `duckdb_free`.
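A brief sketch of the auto-casting and NULL-checking behavior described above (assuming an open connection `con`; the inline `VALUES` query is only for illustration):

```c
duckdb_result res;
if (duckdb_query(con, "SELECT * FROM (VALUES (1), (NULL), (3)) t(i)", &res) == DuckDBError) {
    // handle error
}
idx_t row_count = duckdb_row_count(&res);
for (idx_t row = 0; row < row_count; row++) {
    if (duckdb_value_is_null(&res, 0, row)) {
        printf("NULL\n");
        continue;
    }
    // the column is INTEGER, but the value is auto-cast and returned as a double
    double d = duckdb_value_double(&res, 0, row);
    printf("%f\n", d);
}
duckdb_destroy_result(&res);
```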
##### `duckdb_fetch_chunk` {#docs:stable:clients:c:types::duckdb_fetch_chunk}
The `duckdb_fetch_chunk` function can be used to read data chunks from a DuckDB result set, and is the most efficient way of reading data from a DuckDB result using the C API. It is also the only way of reading data of certain types from a DuckDB result. For example, the `duckdb_value` functions do not support structural reading of composite types (lists or structs) or more complex types like enums and decimals.
For more information about data chunks, see the [documentation on data chunks](#docs:stable:clients:c:data_chunk).
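A minimal sketch of a chunk-reading loop (assuming an open connection `con`; the `range` query and the resulting `BIGINT` column layout are illustrative):

```c
duckdb_result res;
if (duckdb_query(con, "SELECT range AS i FROM range(10000)", &res) == DuckDBError) {
    // handle error
}
int64_t sum = 0;
while (1) {
    duckdb_data_chunk chunk = duckdb_fetch_chunk(res);
    if (!chunk) {
        break; // the result is exhausted
    }
    idx_t row_count = duckdb_data_chunk_get_size(chunk);
    duckdb_vector col = duckdb_data_chunk_get_vector(chunk, 0);
    int64_t *data = (int64_t *) duckdb_vector_get_data(col); // BIGINT column
    uint64_t *validity = duckdb_vector_get_validity(col);
    for (idx_t row = 0; row < row_count; row++) {
        if (validity && !duckdb_validity_row_is_valid(validity, row)) {
            continue; // skip NULL values
        }
        sum += data[row];
    }
    duckdb_destroy_data_chunk(&chunk);
}
printf("sum: %lld\n", (long long) sum);
duckdb_destroy_result(&res);
```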
#### API Reference Overview {#docs:stable:clients:c:types::api-reference-overview}
```c
duckdb_data_chunk duckdb_result_get_chunk(duckdb_result result, idx_t chunk_index);
bool duckdb_result_is_streaming(duckdb_result result);
idx_t duckdb_result_chunk_count(duckdb_result result);
duckdb_result_type duckdb_result_return_type(duckdb_result result);
```
##### Date Time Timestamp Helpers {#docs:stable:clients:c:types::date-time-timestamp-helpers}
```c
duckdb_date_struct duckdb_from_date(duckdb_date date);
duckdb_date duckdb_to_date(duckdb_date_struct date);
bool duckdb_is_finite_date(duckdb_date date);
duckdb_time_struct duckdb_from_time(duckdb_time time);
duckdb_time_tz duckdb_create_time_tz(int64_t micros, int32_t offset);
duckdb_time_tz_struct duckdb_from_time_tz(duckdb_time_tz micros);
duckdb_time duckdb_to_time(duckdb_time_struct time);
duckdb_timestamp_struct duckdb_from_timestamp(duckdb_timestamp ts);
duckdb_timestamp duckdb_to_timestamp(duckdb_timestamp_struct ts);
bool duckdb_is_finite_timestamp(duckdb_timestamp ts);
bool duckdb_is_finite_timestamp_s(duckdb_timestamp_s ts);
bool duckdb_is_finite_timestamp_ms(duckdb_timestamp_ms ts);
bool duckdb_is_finite_timestamp_ns(duckdb_timestamp_ns ts);
```
##### Hugeint Helpers {#docs:stable:clients:c:types::hugeint-helpers}
```c
double duckdb_hugeint_to_double(duckdb_hugeint val);
duckdb_hugeint duckdb_double_to_hugeint(double val);
```
##### Decimal Helpers {#docs:stable:clients:c:types::decimal-helpers}
```c
duckdb_decimal duckdb_double_to_decimal(double val, uint8_t width, uint8_t scale);
double duckdb_decimal_to_double(duckdb_decimal val);
```
##### Logical Type Interface {#docs:stable:clients:c:types::logical-type-interface}
```c
duckdb_logical_type duckdb_create_logical_type(duckdb_type type);
char *duckdb_logical_type_get_alias(duckdb_logical_type type);
void duckdb_logical_type_set_alias(duckdb_logical_type type, const char *alias);
duckdb_logical_type duckdb_create_list_type(duckdb_logical_type type);
duckdb_logical_type duckdb_create_array_type(duckdb_logical_type type, idx_t array_size);
duckdb_logical_type duckdb_create_map_type(duckdb_logical_type key_type, duckdb_logical_type value_type);
duckdb_logical_type duckdb_create_union_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_struct_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_enum_type(const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_decimal_type(uint8_t width, uint8_t scale);
duckdb_type duckdb_get_type_id(duckdb_logical_type type);
uint8_t duckdb_decimal_width(duckdb_logical_type type);
uint8_t duckdb_decimal_scale(duckdb_logical_type type);
duckdb_type duckdb_decimal_internal_type(duckdb_logical_type type);
duckdb_type duckdb_enum_internal_type(duckdb_logical_type type);
uint32_t duckdb_enum_dictionary_size(duckdb_logical_type type);
char *duckdb_enum_dictionary_value(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_list_type_child_type(duckdb_logical_type type);
duckdb_logical_type duckdb_array_type_child_type(duckdb_logical_type type);
idx_t duckdb_array_type_array_size(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_key_type(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_value_type(duckdb_logical_type type);
idx_t duckdb_struct_type_child_count(duckdb_logical_type type);
char *duckdb_struct_type_child_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_struct_type_child_type(duckdb_logical_type type, idx_t index);
idx_t duckdb_union_type_member_count(duckdb_logical_type type);
char *duckdb_union_type_member_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_union_type_member_type(duckdb_logical_type type, idx_t index);
void duckdb_destroy_logical_type(duckdb_logical_type *type);
duckdb_state duckdb_register_logical_type(duckdb_connection con, duckdb_logical_type type, duckdb_create_type_info info);
```
###### `duckdb_result_get_chunk` {#docs:stable:clients:c:types::duckdb_result_get_chunk}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Fetches a data chunk from the duckdb_result. This function should be called repeatedly until the result is exhausted.
The result must be destroyed with `duckdb_destroy_data_chunk`.
This function supersedes all `duckdb_value` functions, as well as the `duckdb_column_data` and `duckdb_nullmask_data`
functions. It results in significantly better performance, and should be preferred in newer code-bases.
If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be
mixed with the legacy result functions).
Use `duckdb_result_chunk_count` to figure out how many chunks there are in the result.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_data_chunk duckdb_result_get_chunk(
duckdb_result result,
idx_t chunk_index
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `result`: The result object to fetch the data chunk from.
* `chunk_index`: The chunk index to fetch from.
####### Return Value {#docs:stable:clients:c:types::return-value}
The resulting data chunk. Returns `NULL` if the chunk index is out of bounds.
###### `duckdb_result_is_streaming` {#docs:stable:clients:c:types::duckdb_result_is_streaming}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Checks if the type of the internal result is StreamQueryResult.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
bool duckdb_result_is_streaming(
duckdb_result result
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `result`: The result object to check.
####### Return Value {#docs:stable:clients:c:types::return-value}
Whether or not the result object is of type `StreamQueryResult`.
###### `duckdb_result_chunk_count` {#docs:stable:clients:c:types::duckdb_result_chunk_count}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Returns the number of data chunks present in the result.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
idx_t duckdb_result_chunk_count(
duckdb_result result
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `result`: The result object
####### Return Value {#docs:stable:clients:c:types::return-value}
Number of data chunks present in the result.
###### `duckdb_result_return_type` {#docs:stable:clients:c:types::duckdb_result_return_type}
Returns the return type of the given result, or `DUCKDB_RESULT_TYPE_INVALID` on error.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_result_type duckdb_result_return_type(
duckdb_result result
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `result`: The result object
####### Return Value {#docs:stable:clients:c:types::return-value}
The `duckdb_result_type` of the result.
###### `duckdb_from_date` {#docs:stable:clients:c:types::duckdb_from_date}
Decompose a `duckdb_date` object into year, month and day (stored as `duckdb_date_struct`).
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_date_struct duckdb_from_date(
duckdb_date date
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `date`: The date object, as obtained from a `DUCKDB_TYPE_DATE` column.
####### Return Value {#docs:stable:clients:c:types::return-value}
The `duckdb_date_struct` with the decomposed elements.
###### `duckdb_to_date` {#docs:stable:clients:c:types::duckdb_to_date}
Re-compose a `duckdb_date` from year, month and day (`duckdb_date_struct`).
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_date duckdb_to_date(
duckdb_date_struct date
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `date`: The year, month and day stored in a `duckdb_date_struct`.
####### Return Value {#docs:stable:clients:c:types::return-value}
The `duckdb_date` element.
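A short sketch of decomposing and re-composing a date (the concrete `days` value is only an example; a DATE is stored as days since 1970-01-01):

```c
duckdb_date d;
d.days = 19889; // 2024-06-15 (example value)
duckdb_date_struct parts = duckdb_from_date(d);
printf("%d-%02d-%02d\n", parts.year, parts.month, parts.day);
// re-compose the duckdb_date from its decomposed parts
duckdb_date roundtrip = duckdb_to_date(parts);
printf("%d days\n", roundtrip.days); // equals d.days
```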
###### `duckdb_is_finite_date` {#docs:stable:clients:c:types::duckdb_is_finite_date}
Test a `duckdb_date` to see if it is a finite value.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
bool duckdb_is_finite_date(
duckdb_date date
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `date`: The date object, as obtained from a `DUCKDB_TYPE_DATE` column.
####### Return Value {#docs:stable:clients:c:types::return-value}
True if the date is finite, false if it is ±infinity.
###### `duckdb_from_time` {#docs:stable:clients:c:types::duckdb_from_time}
Decompose a `duckdb_time` object into hour, minute, second and microsecond (stored as `duckdb_time_struct`).
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_time_struct duckdb_from_time(
duckdb_time time
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `time`: The time object, as obtained from a `DUCKDB_TYPE_TIME` column.
####### Return Value {#docs:stable:clients:c:types::return-value}
The `duckdb_time_struct` with the decomposed elements.
###### `duckdb_create_time_tz` {#docs:stable:clients:c:types::duckdb_create_time_tz}
Create a `duckdb_time_tz` object from micros and a timezone offset.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_time_tz duckdb_create_time_tz(
int64_t micros,
int32_t offset
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `micros`: The microsecond component of the time.
* `offset`: The timezone offset component of the time.
####### Return Value {#docs:stable:clients:c:types::return-value}
The `duckdb_time_tz` element.
###### `duckdb_from_time_tz` {#docs:stable:clients:c:types::duckdb_from_time_tz}
Decompose a `TIME_TZ` object into micros and a timezone offset.
Use `duckdb_from_time` to further decompose the micros into hour, minute, second and microsecond.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_time_tz_struct duckdb_from_time_tz(
duckdb_time_tz micros
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `micros`: The time object, as obtained from a `DUCKDB_TYPE_TIME_TZ` column.
###### `duckdb_to_time` {#docs:stable:clients:c:types::duckdb_to_time}
Re-compose a `duckdb_time` from hour, minute, second and microsecond (`duckdb_time_struct`).
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_time duckdb_to_time(
duckdb_time_struct time
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `time`: The hour, minute, second and microsecond in a `duckdb_time_struct`.
####### Return Value {#docs:stable:clients:c:types::return-value}
The `duckdb_time` element.
###### `duckdb_from_timestamp` {#docs:stable:clients:c:types::duckdb_from_timestamp}
Decompose a `duckdb_timestamp` object into a `duckdb_timestamp_struct`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_timestamp_struct duckdb_from_timestamp(
duckdb_timestamp ts
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `ts`: The ts object, as obtained from a `DUCKDB_TYPE_TIMESTAMP` column.
####### Return Value {#docs:stable:clients:c:types::return-value}
The `duckdb_timestamp_struct` with the decomposed elements.
###### `duckdb_to_timestamp` {#docs:stable:clients:c:types::duckdb_to_timestamp}
Re-compose a `duckdb_timestamp` from a duckdb_timestamp_struct.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_timestamp duckdb_to_timestamp(
duckdb_timestamp_struct ts
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `ts`: The de-composed elements in a `duckdb_timestamp_struct`.
####### Return Value {#docs:stable:clients:c:types::return-value}
The `duckdb_timestamp` element.
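A short sketch of the timestamp round trip, combined with a finiteness check (the microsecond value is only an example):

```c
duckdb_timestamp ts;
ts.micros = 1718409600000000LL; // 2024-06-15 00:00:00 in microseconds since 1970-01-01 (example value)
if (duckdb_is_finite_timestamp(ts)) {
    duckdb_timestamp_struct parts = duckdb_from_timestamp(ts);
    printf("%d-%02d-%02d %02d:%02d:%02d\n",
           parts.date.year, parts.date.month, parts.date.day,
           parts.time.hour, parts.time.min, parts.time.sec);
    // re-compose the timestamp from its decomposed parts
    duckdb_timestamp roundtrip = duckdb_to_timestamp(parts);
    printf("%lld\n", (long long) roundtrip.micros); // equals ts.micros
}
```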
###### `duckdb_is_finite_timestamp` {#docs:stable:clients:c:types::duckdb_is_finite_timestamp}
Test a `duckdb_timestamp` to see if it is a finite value.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
bool duckdb_is_finite_timestamp(
duckdb_timestamp ts
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `ts`: The duckdb_timestamp object, as obtained from a `DUCKDB_TYPE_TIMESTAMP` column.
####### Return Value {#docs:stable:clients:c:types::return-value}
True if the timestamp is finite, false if it is ±infinity.
###### `duckdb_is_finite_timestamp_s` {#docs:stable:clients:c:types::duckdb_is_finite_timestamp_s}
Test a `duckdb_timestamp_s` to see if it is a finite value.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
bool duckdb_is_finite_timestamp_s(
duckdb_timestamp_s ts
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `ts`: The duckdb_timestamp_s object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_S` column.
####### Return Value {#docs:stable:clients:c:types::return-value}
True if the timestamp is finite, false if it is ±infinity.
###### `duckdb_is_finite_timestamp_ms` {#docs:stable:clients:c:types::duckdb_is_finite_timestamp_ms}
Test a `duckdb_timestamp_ms` to see if it is a finite value.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
bool duckdb_is_finite_timestamp_ms(
duckdb_timestamp_ms ts
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `ts`: The duckdb_timestamp_ms object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_MS` column.
####### Return Value {#docs:stable:clients:c:types::return-value}
True if the timestamp is finite, false if it is ±infinity.
###### `duckdb_is_finite_timestamp_ns` {#docs:stable:clients:c:types::duckdb_is_finite_timestamp_ns}
Test a `duckdb_timestamp_ns` to see if it is a finite value.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
bool duckdb_is_finite_timestamp_ns(
duckdb_timestamp_ns ts
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `ts`: The duckdb_timestamp_ns object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_NS` column.
####### Return Value {#docs:stable:clients:c:types::return-value}
True if the timestamp is finite, false if it is ±infinity.
###### `duckdb_hugeint_to_double` {#docs:stable:clients:c:types::duckdb_hugeint_to_double}
Converts a duckdb_hugeint object (as obtained from a `DUCKDB_TYPE_HUGEINT` column) into a double.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
double duckdb_hugeint_to_double(
duckdb_hugeint val
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `val`: The hugeint value.
####### Return Value {#docs:stable:clients:c:types::return-value}
The converted `double` element.
###### `duckdb_double_to_hugeint` {#docs:stable:clients:c:types::duckdb_double_to_hugeint}
Converts a double value to a duckdb_hugeint object.
If the conversion fails because the double value is too big, the result will be 0.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_hugeint duckdb_double_to_hugeint(
double val
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `val`: The double value.
####### Return Value {#docs:stable:clients:c:types::return-value}
The converted `duckdb_hugeint` element.
###### `duckdb_double_to_decimal` {#docs:stable:clients:c:types::duckdb_double_to_decimal}
Converts a double value to a duckdb_decimal object.
If the conversion fails because the double value is too big, or the width/scale are invalid, the result will be 0.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_decimal duckdb_double_to_decimal(
double val,
uint8_t width,
uint8_t scale
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `val`: The double value.
* `width`: The width of the decimal type.
* `scale`: The scale of the decimal type.
####### Return Value {#docs:stable:clients:c:types::return-value}
The converted `duckdb_decimal` element.
###### `duckdb_decimal_to_double` {#docs:stable:clients:c:types::duckdb_decimal_to_double}
Converts a duckdb_decimal object (as obtained from a `DUCKDB_TYPE_DECIMAL` column) into a double.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
double duckdb_decimal_to_double(
duckdb_decimal val
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `val`: The decimal value.
####### Return Value {#docs:stable:clients:c:types::return-value}
The converted `double` element.
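A brief sketch of the hugeint and decimal helpers (the concrete values, width, and scale are arbitrary):

```c
// HUGEINT <-> double
duckdb_hugeint huge = duckdb_double_to_hugeint(1234567890.0);
double huge_back = duckdb_hugeint_to_double(huge);

// double -> DECIMAL(18, 3) -> double
duckdb_decimal dec = duckdb_double_to_decimal(123.456, 18, 3);
double dec_back = duckdb_decimal_to_double(dec);

printf("%f %f\n", huge_back, dec_back);
```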
###### `duckdb_create_logical_type` {#docs:stable:clients:c:types::duckdb_create_logical_type}
Creates a `duckdb_logical_type` from a primitive type.
The resulting logical type must be destroyed with `duckdb_destroy_logical_type`.
Returns an invalid logical type, if type is: `DUCKDB_TYPE_INVALID`, `DUCKDB_TYPE_DECIMAL`, `DUCKDB_TYPE_ENUM`,
`DUCKDB_TYPE_LIST`, `DUCKDB_TYPE_STRUCT`, `DUCKDB_TYPE_MAP`, `DUCKDB_TYPE_ARRAY`, or `DUCKDB_TYPE_UNION`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_create_logical_type(
duckdb_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The primitive type to create.
####### Return Value {#docs:stable:clients:c:types::return-value}
The logical type.
###### `duckdb_logical_type_get_alias` {#docs:stable:clients:c:types::duckdb_logical_type_get_alias}
Returns the alias of a duckdb_logical_type, if set, else `nullptr`.
The result must be destroyed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
char *duckdb_logical_type_get_alias(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type
####### Return Value {#docs:stable:clients:c:types::return-value}
The alias or `nullptr`
###### `duckdb_logical_type_set_alias` {#docs:stable:clients:c:types::duckdb_logical_type_set_alias}
Sets the alias of a duckdb_logical_type.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
void duckdb_logical_type_set_alias(
duckdb_logical_type type,
const char *alias
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type
* `alias`: The alias to set
###### `duckdb_create_list_type` {#docs:stable:clients:c:types::duckdb_create_list_type}
Creates a LIST type from its child type.
The return type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_create_list_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The child type of the list
####### Return Value {#docs:stable:clients:c:types::return-value}
The logical type.
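A minimal sketch of building a `LIST(INTEGER)` type and reading its child type back; note that every returned logical type is a separate object that must be destroyed:

```c
duckdb_logical_type int_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_logical_type list_type = duckdb_create_list_type(int_type);

// retrieve the child type again; this is a separate object that must also be destroyed
duckdb_logical_type child_type = duckdb_list_type_child_type(list_type);
// duckdb_get_type_id(child_type) == DUCKDB_TYPE_INTEGER

duckdb_destroy_logical_type(&child_type);
duckdb_destroy_logical_type(&list_type);
duckdb_destroy_logical_type(&int_type);
```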
###### `duckdb_create_array_type` {#docs:stable:clients:c:types::duckdb_create_array_type}
Creates an ARRAY type from its child type.
The return type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_create_array_type(
duckdb_logical_type type,
idx_t array_size
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The child type of the array.
* `array_size`: The number of elements in the array.
####### Return Value {#docs:stable:clients:c:types::return-value}
The logical type.
###### `duckdb_create_map_type` {#docs:stable:clients:c:types::duckdb_create_map_type}
Creates a MAP type from its key type and value type.
The return type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_create_map_type(
duckdb_logical_type key_type,
duckdb_logical_type value_type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `key_type`: The map's key type.
* `value_type`: The map's value type.
####### Return Value {#docs:stable:clients:c:types::return-value}
The logical type.
###### `duckdb_create_union_type` {#docs:stable:clients:c:types::duckdb_create_union_type}
Creates a UNION type from the passed arrays.
The return type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_create_union_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `member_types`: The array of union member types.
* `member_names`: The union member names.
* `member_count`: The number of union members.
####### Return Value {#docs:stable:clients:c:types::return-value}
The logical type.
###### `duckdb_create_struct_type` {#docs:stable:clients:c:types::duckdb_create_struct_type}
Creates a STRUCT type based on the member types and names.
The resulting type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_create_struct_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `member_types`: The array of types of the struct members.
* `member_names`: The array of names of the struct members.
* `member_count`: The number of members of the struct.
####### Return Value {#docs:stable:clients:c:types::return-value}
The logical type.
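A minimal sketch of creating a `STRUCT(i INTEGER, s VARCHAR)` type (assuming, as is the usual convention in this API, that the input member types remain owned by the caller and must still be destroyed):

```c
duckdb_logical_type member_types[2];
member_types[0] = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
member_types[1] = duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR);
const char *member_names[2] = {"i", "s"};

duckdb_logical_type struct_type = duckdb_create_struct_type(member_types, member_names, 2);
// duckdb_get_type_id(struct_type) == DUCKDB_TYPE_STRUCT

// assumption: the inputs are not consumed and must still be destroyed by the caller
duckdb_destroy_logical_type(&member_types[0]);
duckdb_destroy_logical_type(&member_types[1]);
duckdb_destroy_logical_type(&struct_type);
```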
###### `duckdb_create_enum_type` {#docs:stable:clients:c:types::duckdb_create_enum_type}
Creates an ENUM type from the passed member name array.
The resulting type should be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_create_enum_type(
const char **member_names,
idx_t member_count
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `member_names`: The array of names that the enum should consist of.
* `member_count`: The number of elements that were specified in the array.
####### Return Value {#docs:stable:clients:c:types::return-value}
The logical type.
###### `duckdb_create_decimal_type` {#docs:stable:clients:c:types::duckdb_create_decimal_type}
Creates a DECIMAL type with the specified width and scale.
The resulting type should be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_create_decimal_type(
uint8_t width,
uint8_t scale
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `width`: The width of the decimal type
* `scale`: The scale of the decimal type
####### Return Value {#docs:stable:clients:c:types::return-value}
The logical type.
###### `duckdb_get_type_id` {#docs:stable:clients:c:types::duckdb_get_type_id}
Retrieves the enum `duckdb_type` of a `duckdb_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_type duckdb_get_type_id(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type.
####### Return Value {#docs:stable:clients:c:types::return-value}
The `duckdb_type` id.
###### `duckdb_decimal_width` {#docs:stable:clients:c:types::duckdb_decimal_width}
Retrieves the width of a decimal type.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
uint8_t duckdb_decimal_width(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:types::return-value}
The width of the decimal type
###### `duckdb_decimal_scale` {#docs:stable:clients:c:types::duckdb_decimal_scale}
Retrieves the scale of a decimal type.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
uint8_t duckdb_decimal_scale(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:types::return-value}
The scale of the decimal type
###### `duckdb_decimal_internal_type` {#docs:stable:clients:c:types::duckdb_decimal_internal_type}
Retrieves the internal storage type of a decimal type.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_type duckdb_decimal_internal_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:types::return-value}
The internal type of the decimal type
###### `duckdb_enum_internal_type` {#docs:stable:clients:c:types::duckdb_enum_internal_type}
Retrieves the internal storage type of an enum type.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_type duckdb_enum_internal_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:types::return-value}
The internal type of the enum type
###### `duckdb_enum_dictionary_size` {#docs:stable:clients:c:types::duckdb_enum_dictionary_size}
Retrieves the dictionary size of the enum type.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
uint32_t duckdb_enum_dictionary_size(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:types::return-value}
The dictionary size of the enum type
###### `duckdb_enum_dictionary_value` {#docs:stable:clients:c:types::duckdb_enum_dictionary_value}
Retrieves the dictionary value at the specified position from the enum.
The result must be freed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
char *duckdb_enum_dictionary_value(
duckdb_logical_type type,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
* `index`: The index in the dictionary
####### Return Value {#docs:stable:clients:c:types::return-value}
The string value of the enum type. Must be freed with `duckdb_free`.
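A brief sketch of iterating an enum's dictionary (here `enum_type` is a hypothetical `duckdb_logical_type` with type id `DUCKDB_TYPE_ENUM`, e.g., obtained via `duckdb_column_logical_type`):

```c
// enum_type: a duckdb_logical_type whose type id is DUCKDB_TYPE_ENUM (assumed)
uint32_t dict_size = duckdb_enum_dictionary_size(enum_type);
for (idx_t i = 0; i < dict_size; i++) {
    char *entry = duckdb_enum_dictionary_value(enum_type, i);
    printf("%s\n", entry);
    duckdb_free(entry); // every returned string must be freed
}
```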
###### `duckdb_list_type_child_type` {#docs:stable:clients:c:types::duckdb_list_type_child_type}
Retrieves the child type of the given LIST type. Also accepts MAP types.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_list_type_child_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type, either LIST or MAP.
####### Return Value {#docs:stable:clients:c:types::return-value}
The child type of the LIST or MAP type.
###### `duckdb_array_type_child_type` {#docs:stable:clients:c:types::duckdb_array_type_child_type}
Retrieves the child type of the given ARRAY type.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_array_type_child_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type. Must be ARRAY.
####### Return Value {#docs:stable:clients:c:types::return-value}
The child type of the ARRAY type.
###### `duckdb_array_type_array_size` {#docs:stable:clients:c:types::duckdb_array_type_array_size}
Retrieves the array size of the given array type.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
idx_t duckdb_array_type_array_size(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:types::return-value}
The fixed number of elements the values of this array type can store.
###### `duckdb_map_type_key_type` {#docs:stable:clients:c:types::duckdb_map_type_key_type}
Retrieves the key type of the given map type.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_map_type_key_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:types::return-value}
The key type of the map type. Must be destroyed with `duckdb_destroy_logical_type`.
###### `duckdb_map_type_value_type` {#docs:stable:clients:c:types::duckdb_map_type_value_type}
Retrieves the value type of the given map type.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_map_type_value_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:types::return-value}
The value type of the map type. Must be destroyed with `duckdb_destroy_logical_type`.
###### `duckdb_struct_type_child_count` {#docs:stable:clients:c:types::duckdb_struct_type_child_count}
Returns the number of children of a struct type.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
idx_t duckdb_struct_type_child_count(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:types::return-value}
The number of children of a struct type.
###### `duckdb_struct_type_child_name` {#docs:stable:clients:c:types::duckdb_struct_type_child_name}
Retrieves the name of the struct child.
The result must be freed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
char *duckdb_struct_type_child_name(
duckdb_logical_type type,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
* `index`: The child index
####### Return Value {#docs:stable:clients:c:types::return-value}
The name of the struct type. Must be freed with `duckdb_free`.
###### `duckdb_struct_type_child_type` {#docs:stable:clients:c:types::duckdb_struct_type_child_type}
Retrieves the child type of the given struct type at the specified index.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_struct_type_child_type(
duckdb_logical_type type,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
* `index`: The child index
####### Return Value {#docs:stable:clients:c:types::return-value}
The child type of the struct type. Must be destroyed with `duckdb_destroy_logical_type`.
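A brief sketch of enumerating the children of a struct type (here `struct_type` is a hypothetical `duckdb_logical_type` with type id `DUCKDB_TYPE_STRUCT`):

```c
// struct_type: a duckdb_logical_type whose type id is DUCKDB_TYPE_STRUCT (assumed)
idx_t child_count = duckdb_struct_type_child_count(struct_type);
for (idx_t i = 0; i < child_count; i++) {
    char *name = duckdb_struct_type_child_name(struct_type, i);
    duckdb_logical_type child = duckdb_struct_type_child_type(struct_type, i);
    printf("child %llu: %s (type id %d)\n",
           (unsigned long long) i, name, (int) duckdb_get_type_id(child));
    duckdb_destroy_logical_type(&child); // child types must be destroyed
    duckdb_free(name);                   // child names must be freed
}
```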
###### `duckdb_union_type_member_count` {#docs:stable:clients:c:types::duckdb_union_type_member_count}
Returns the number of members that the union type has.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
idx_t duckdb_union_type_member_count(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type (union) object
####### Return Value {#docs:stable:clients:c:types::return-value}
The number of members of a union type.
###### `duckdb_union_type_member_name` {#docs:stable:clients:c:types::duckdb_union_type_member_name}
Retrieves the name of the union member.
The result must be freed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
char *duckdb_union_type_member_name(
duckdb_logical_type type,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
* `index`: The child index
####### Return Value {#docs:stable:clients:c:types::return-value}
The name of the union member. Must be freed with `duckdb_free`.
###### `duckdb_union_type_member_type` {#docs:stable:clients:c:types::duckdb_union_type_member_type}
Retrieves the child type of the given union member at the specified index.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_logical_type duckdb_union_type_member_type(
duckdb_logical_type type,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type object
* `index`: The child index
####### Return Value {#docs:stable:clients:c:types::return-value}
The child type of the union member. Must be destroyed with `duckdb_destroy_logical_type`.
###### `duckdb_destroy_logical_type` {#docs:stable:clients:c:types::duckdb_destroy_logical_type}
Destroys the logical type and de-allocates all memory allocated for that type.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
void duckdb_destroy_logical_type(
duckdb_logical_type *type
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `type`: The logical type to destroy.
###### `duckdb_register_logical_type` {#docs:stable:clients:c:types::duckdb_register_logical_type}
Registers a custom type within the given connection.
The type must have an alias.
####### Syntax {#docs:stable:clients:c:types::syntax}
```c
duckdb_state duckdb_register_logical_type(
duckdb_connection con,
duckdb_logical_type type,
duckdb_create_type_info info
);
```
####### Parameters {#docs:stable:clients:c:types::parameters}
* `con`: The connection to use
* `type`: The custom type to register
####### Return Value {#docs:stable:clients:c:types::return-value}
Whether or not the registration was successful.
### Prepared Statements {#docs:stable:clients:c:prepared}
A prepared statement is a parameterized query. The query is prepared with question marks (`?`) or dollar symbols (`$1`) indicating the parameters of the query. Values can then be bound to these parameters, after which the prepared statement can be executed using those parameters. A single query can be prepared once and executed many times.
Prepared statements are useful for:
* Easily supplying parameters to functions while avoiding string concatenation and SQL injection attacks.
* Speeding up queries that will be executed many times with different parameters.
DuckDB supports prepared statements in the C API with the `duckdb_prepare` method. The `duckdb_bind` family of functions is used to supply values for subsequent execution of the prepared statement using `duckdb_execute_prepared`. After we are done with the prepared statement it can be cleaned up using the `duckdb_destroy_prepare` method.
#### Example {#docs:stable:clients:c:prepared::example}
```c
duckdb_prepared_statement stmt;
duckdb_result result;
if (duckdb_prepare(con, "INSERT INTO integers VALUES ($1, $2)", &stmt) == DuckDBError) {
// handle error
}
duckdb_bind_int32(stmt, 1, 42); // the parameter index starts counting at 1!
duckdb_bind_int32(stmt, 2, 43);
// NULL as second parameter means no result set is requested
duckdb_execute_prepared(stmt, NULL);
duckdb_destroy_prepare(&stmt);
// we can also query result sets using prepared statements
if (duckdb_prepare(con, "SELECT * FROM integers WHERE i = ?", &stmt) == DuckDBError) {
// handle error
}
duckdb_bind_int32(stmt, 1, 42);
duckdb_execute_prepared(stmt, &result);
// do something with result
// clean up
duckdb_destroy_result(&result);
duckdb_destroy_prepare(&stmt);
```
After calling `duckdb_prepare`, the prepared statement parameters can be inspected using `duckdb_nparams` and `duckdb_param_type`. In case the prepare fails, the error can be obtained through `duckdb_prepare_error`.
It is not required that the `duckdb_bind` family of functions matches the prepared statement parameter type exactly. The values will be auto-cast to the required value as required. For example, calling `duckdb_bind_int8` on a parameter type of `DUCKDB_TYPE_INTEGER` will work as expected.
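A minimal sketch of inspecting a prepared statement after preparation (assuming an open connection `con` and the `integers` table from the example above; the named parameter `$my_val` is illustrative, and the 1-based indexing used for `duckdb_parameter_name` is assumed to match `duckdb_param_type`):

```c
duckdb_prepared_statement stmt;
if (duckdb_prepare(con, "SELECT * FROM integers WHERE i = $my_val", &stmt) == DuckDBError) {
    // the statement must be destroyed even if the prepare fails
    printf("prepare failed: %s\n", duckdb_prepare_error(stmt));
} else {
    idx_t nparams = duckdb_nparams(stmt);
    for (idx_t i = 1; i <= nparams; i++) { // parameter indices start counting at 1
        const char *name = duckdb_parameter_name(stmt, i);
        duckdb_type type = duckdb_param_type(stmt, i);
        printf("parameter %llu: %s (type id %d)\n",
               (unsigned long long) i, name ? name : "?", (int) type);
        duckdb_free((void *) name); // returned parameter names must be freed
    }
}
duckdb_destroy_prepare(&stmt);
```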
> **Warning.** Do **not** use prepared statements to insert large amounts of data into DuckDB. Instead it is recommended to use the [Appender](#docs:stable:clients:c:appender).
#### API Reference Overview {#docs:stable:clients:c:prepared::api-reference-overview}
```c
duckdb_state duckdb_prepare(duckdb_connection connection, const char *query, duckdb_prepared_statement *out_prepared_statement);
void duckdb_destroy_prepare(duckdb_prepared_statement *prepared_statement);
const char *duckdb_prepare_error(duckdb_prepared_statement prepared_statement);
idx_t duckdb_nparams(duckdb_prepared_statement prepared_statement);
const char *duckdb_parameter_name(duckdb_prepared_statement prepared_statement, idx_t index);
duckdb_type duckdb_param_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_logical_type duckdb_param_logical_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_state duckdb_clear_bindings(duckdb_prepared_statement prepared_statement);
duckdb_statement_type duckdb_prepared_statement_type(duckdb_prepared_statement statement);
idx_t duckdb_prepared_statement_column_count(duckdb_prepared_statement prepared_statement);
const char *duckdb_prepared_statement_column_name(duckdb_prepared_statement prepared_statement, idx_t col_idx);
duckdb_logical_type duckdb_prepared_statement_column_logical_type(duckdb_prepared_statement prepared_statement, idx_t col_idx);
duckdb_type duckdb_prepared_statement_column_type(duckdb_prepared_statement prepared_statement, idx_t col_idx);
```
###### `duckdb_prepare` {#docs:stable:clients:c:prepared::duckdb_prepare}
Create a prepared statement object from a query.
Note that after calling `duckdb_prepare`, the prepared statement should always be destroyed using
`duckdb_destroy_prepare`, even if the prepare fails.
If the prepare fails, `duckdb_prepare_error` can be called to obtain the reason why the prepare failed.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
duckdb_state duckdb_prepare(
duckdb_connection connection,
const char *query,
duckdb_prepared_statement *out_prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `connection`: The connection object
* `query`: The SQL query to prepare
* `out_prepared_statement`: The resulting prepared statement object
####### Return Value {#docs:stable:clients:c:prepared::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_destroy_prepare` {#docs:stable:clients:c:prepared::duckdb_destroy_prepare}
Closes the prepared statement and de-allocates all memory allocated for the statement.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
void duckdb_destroy_prepare(
duckdb_prepared_statement *prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `prepared_statement`: The prepared statement to destroy.
###### `duckdb_prepare_error` {#docs:stable:clients:c:prepared::duckdb_prepare_error}
Returns the error message associated with the given prepared statement.
If the prepared statement has no error message, this returns `nullptr` instead.
The error message should not be freed. It will be de-allocated when `duckdb_destroy_prepare` is called.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
const char *duckdb_prepare_error(
duckdb_prepared_statement prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `prepared_statement`: The prepared statement to obtain the error from.
####### Return Value {#docs:stable:clients:c:prepared::return-value}
The error message, or `nullptr` if there is none.
###### `duckdb_nparams` {#docs:stable:clients:c:prepared::duckdb_nparams}
Returns the number of parameters that can be provided to the given prepared statement.
Returns 0 if the query was not successfully prepared.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
idx_t duckdb_nparams(
duckdb_prepared_statement prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `prepared_statement`: The prepared statement to obtain the number of parameters for.
###### `duckdb_parameter_name` {#docs:stable:clients:c:prepared::duckdb_parameter_name}
Returns the name used to identify the parameter.
The returned string should be freed using `duckdb_free`.
Returns NULL if the index is out of range for the provided prepared statement.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
const char *duckdb_parameter_name(
duckdb_prepared_statement prepared_statement,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `prepared_statement`: The prepared statement for which to get the parameter name from.
###### `duckdb_param_type` {#docs:stable:clients:c:prepared::duckdb_param_type}
Returns the parameter type for the parameter at the given index.
Returns `DUCKDB_TYPE_INVALID` if the parameter index is out of range or the statement was not successfully prepared.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
duckdb_type duckdb_param_type(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `prepared_statement`: The prepared statement.
* `param_idx`: The parameter index.
####### Return Value {#docs:stable:clients:c:prepared::return-value}
The parameter type
###### `duckdb_param_logical_type` {#docs:stable:clients:c:prepared::duckdb_param_logical_type}
Returns the logical type for the parameter at the given index.
Returns `nullptr` if the parameter index is out of range or the statement was not successfully prepared.
The return type of this call should be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
duckdb_logical_type duckdb_param_logical_type(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `prepared_statement`: The prepared statement.
* `param_idx`: The parameter index.
####### Return Value {#docs:stable:clients:c:prepared::return-value}
The logical type of the parameter
###### `duckdb_clear_bindings` {#docs:stable:clients:c:prepared::duckdb_clear_bindings}
Clears the parameters bound to the prepared statement.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
duckdb_state duckdb_clear_bindings(
duckdb_prepared_statement prepared_statement
);
```
###### `duckdb_prepared_statement_type` {#docs:stable:clients:c:prepared::duckdb_prepared_statement_type}
Returns the statement type of the statement to be executed.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
duckdb_statement_type duckdb_prepared_statement_type(
duckdb_prepared_statement statement
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `statement`: The prepared statement.
####### Return Value {#docs:stable:clients:c:prepared::return-value}
The `duckdb_statement_type` value, or `DUCKDB_STATEMENT_TYPE_INVALID`.
###### `duckdb_prepared_statement_column_count` {#docs:stable:clients:c:prepared::duckdb_prepared_statement_column_count}
Returns the number of columns present in the result of the prepared statement. If any of the column types are invalid, the result will be 1.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
idx_t duckdb_prepared_statement_column_count(
duckdb_prepared_statement prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `prepared_statement`: The prepared statement.
####### Return Value {#docs:stable:clients:c:prepared::return-value}
The number of columns present in the result of the prepared statement.
###### `duckdb_prepared_statement_column_name` {#docs:stable:clients:c:prepared::duckdb_prepared_statement_column_name}
Returns the name of the specified column of the result of the prepared statement.
The returned string should be freed using `duckdb_free`.
Returns `nullptr` if the column is out of range.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
const char *duckdb_prepared_statement_column_name(
duckdb_prepared_statement prepared_statement,
idx_t col_idx
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `prepared_statement`: The prepared statement.
* `col_idx`: The column index.
####### Return Value {#docs:stable:clients:c:prepared::return-value}
The column name of the specified column.
###### `duckdb_prepared_statement_column_logical_type` {#docs:stable:clients:c:prepared::duckdb_prepared_statement_column_logical_type}
Returns the logical type of the specified column of the result of the prepared statement.
Returns `DUCKDB_TYPE_INVALID` if the column is out of range.
The return type of this call should be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
duckdb_logical_type duckdb_prepared_statement_column_logical_type(
duckdb_prepared_statement prepared_statement,
idx_t col_idx
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `prepared_statement`: The prepared statement to fetch the column type from.
* `col_idx`: The column index.
####### Return Value {#docs:stable:clients:c:prepared::return-value}
The logical type of the specified column.
###### `duckdb_prepared_statement_column_type` {#docs:stable:clients:c:prepared::duckdb_prepared_statement_column_type}
Returns the column type of the specified column of the result of the prepared statement.
Returns `DUCKDB_TYPE_INVALID` if the column is out of range.
####### Syntax {#docs:stable:clients:c:prepared::syntax}
```c
duckdb_type duckdb_prepared_statement_column_type(
duckdb_prepared_statement prepared_statement,
idx_t col_idx
);
```
####### Parameters {#docs:stable:clients:c:prepared::parameters}
* `prepared_statement`: The prepared statement to fetch the column type from.
* `col_idx`: The column index.
####### Return Value {#docs:stable:clients:c:prepared::return-value}
The type of the specified column.
### Appender {#docs:stable:clients:c:appender}
Appenders are the most efficient way of loading data into DuckDB from within the C interface, and are recommended for
fast data loading. The appender is much faster than using prepared statements or individual `INSERT INTO` statements.
Appends are made in row-wise format. For every column, a `duckdb_append_[type]` call should be made, after which
the row should be finished by calling `duckdb_appender_end_row`. After all rows have been appended,
`duckdb_appender_destroy` should be used to finalize the appender and clean up the resulting memory.
Note that `duckdb_appender_destroy` should always be called on the resulting appender, even if the function returns
`DuckDBError`.
#### Example {#docs:stable:clients:c:appender::example}
```c
duckdb_query(con, "CREATE TABLE people (id INTEGER, name VARCHAR)", NULL);
duckdb_appender appender;
if (duckdb_appender_create(con, NULL, "people", &appender) == DuckDBError) {
// handle error
}
// append the first row (1, Mark)
duckdb_append_int32(appender, 1);
duckdb_append_varchar(appender, "Mark");
duckdb_appender_end_row(appender);
// append the second row (2, Hannes)
duckdb_append_int32(appender, 2);
duckdb_append_varchar(appender, "Hannes");
duckdb_appender_end_row(appender);
// finish appending and flush all the rows to the table
duckdb_appender_destroy(&appender);
```
#### API Reference Overview {#docs:stable:clients:c:appender::api-reference-overview}
```c
duckdb_state duckdb_appender_create(duckdb_connection connection, const char *schema, const char *table, duckdb_appender *out_appender);
duckdb_state duckdb_appender_create_ext(duckdb_connection connection, const char *catalog, const char *schema, const char *table, duckdb_appender *out_appender);
duckdb_state duckdb_appender_create_query(duckdb_connection connection, const char *query, idx_t column_count, duckdb_logical_type *types, const char *table_name, const char **column_names, duckdb_appender *out_appender);
idx_t duckdb_appender_column_count(duckdb_appender appender);
duckdb_logical_type duckdb_appender_column_type(duckdb_appender appender, idx_t col_idx);
const char *duckdb_appender_error(duckdb_appender appender);
duckdb_error_data duckdb_appender_error_data(duckdb_appender appender);
duckdb_state duckdb_appender_flush(duckdb_appender appender);
duckdb_state duckdb_appender_close(duckdb_appender appender);
duckdb_state duckdb_appender_destroy(duckdb_appender *appender);
duckdb_state duckdb_appender_add_column(duckdb_appender appender, const char *name);
duckdb_state duckdb_appender_clear_columns(duckdb_appender appender);
duckdb_state duckdb_appender_begin_row(duckdb_appender appender);
duckdb_state duckdb_appender_end_row(duckdb_appender appender);
duckdb_state duckdb_append_default(duckdb_appender appender);
duckdb_state duckdb_append_default_to_chunk(duckdb_appender appender, duckdb_data_chunk chunk, idx_t col, idx_t row);
duckdb_state duckdb_append_bool(duckdb_appender appender, bool value);
duckdb_state duckdb_append_int8(duckdb_appender appender, int8_t value);
duckdb_state duckdb_append_int16(duckdb_appender appender, int16_t value);
duckdb_state duckdb_append_int32(duckdb_appender appender, int32_t value);
duckdb_state duckdb_append_int64(duckdb_appender appender, int64_t value);
duckdb_state duckdb_append_hugeint(duckdb_appender appender, duckdb_hugeint value);
duckdb_state duckdb_append_uint8(duckdb_appender appender, uint8_t value);
duckdb_state duckdb_append_uint16(duckdb_appender appender, uint16_t value);
duckdb_state duckdb_append_uint32(duckdb_appender appender, uint32_t value);
duckdb_state duckdb_append_uint64(duckdb_appender appender, uint64_t value);
duckdb_state duckdb_append_uhugeint(duckdb_appender appender, duckdb_uhugeint value);
duckdb_state duckdb_append_float(duckdb_appender appender, float value);
duckdb_state duckdb_append_double(duckdb_appender appender, double value);
duckdb_state duckdb_append_date(duckdb_appender appender, duckdb_date value);
duckdb_state duckdb_append_time(duckdb_appender appender, duckdb_time value);
duckdb_state duckdb_append_timestamp(duckdb_appender appender, duckdb_timestamp value);
duckdb_state duckdb_append_interval(duckdb_appender appender, duckdb_interval value);
duckdb_state duckdb_append_varchar(duckdb_appender appender, const char *val);
duckdb_state duckdb_append_varchar_length(duckdb_appender appender, const char *val, idx_t length);
duckdb_state duckdb_append_blob(duckdb_appender appender, const void *data, idx_t length);
duckdb_state duckdb_append_null(duckdb_appender appender);
duckdb_state duckdb_append_value(duckdb_appender appender, duckdb_value value);
duckdb_state duckdb_append_data_chunk(duckdb_appender appender, duckdb_data_chunk chunk);
```
###### `duckdb_appender_create` {#docs:stable:clients:c:appender::duckdb_appender_create}
Creates an appender object.
Note that the object must be destroyed with `duckdb_appender_destroy`.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_appender_create(
duckdb_connection connection,
const char *schema,
const char *table,
duckdb_appender *out_appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `connection`: The connection context to create the appender in.
* `schema`: The schema of the table to append to, or `nullptr` for the default schema.
* `table`: The table name to append to.
* `out_appender`: The resulting appender object.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_appender_create_ext` {#docs:stable:clients:c:appender::duckdb_appender_create_ext}
Creates an appender object.
Note that the object must be destroyed with `duckdb_appender_destroy`.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_appender_create_ext(
duckdb_connection connection,
const char *catalog,
const char *schema,
const char *table,
duckdb_appender *out_appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `connection`: The connection context to create the appender in.
* `catalog`: The catalog of the table to append to, or `nullptr` for the default catalog.
* `schema`: The schema of the table to append to, or `nullptr` for the default schema.
* `table`: The table name to append to.
* `out_appender`: The resulting appender object.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_appender_create_query` {#docs:stable:clients:c:appender::duckdb_appender_create_query}
Creates an appender object that executes the given query with any data appended to it.
Note that the object must be destroyed with `duckdb_appender_destroy`.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_appender_create_query(
duckdb_connection connection,
const char *query,
idx_t column_count,
duckdb_logical_type *types,
const char *table_name,
const char **column_names,
duckdb_appender *out_appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `connection`: The connection context to create the appender in.
* `query`: The query to execute; it can be an INSERT, DELETE, UPDATE, or MERGE INTO statement.
* `column_count`: The number of columns to append.
* `types`: The types of the columns to append.
* `table_name`: (optional) The table name used to refer to the appended data; defaults to "appended_data".
* `column_names`: (optional) The list of column names; defaults to "col1", "col2", ...
* `out_appender`: The resulting appender object.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
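As an illustrative sketch, the snippet below routes appended rows through an `INSERT` statement; the appended data is referred to by the default table name `appended_data` with default column names `col1`, `col2`, ... The target table `tbl` is hypothetical.

```c
#include "duckdb.h"

// a sketch under the assumptions above; not a definitive usage pattern
void append_via_query(duckdb_connection con) {
    duckdb_logical_type types[1] = { duckdb_create_logical_type(DUCKDB_TYPE_INTEGER) };
    duckdb_appender appender;
    duckdb_state state = duckdb_appender_create_query(
        con,
        "INSERT INTO tbl SELECT col1 + 1 FROM appended_data",
        1, types,
        NULL, // default table name: "appended_data"
        NULL, // default column names: "col1", "col2", ...
        &appender);
    duckdb_destroy_logical_type(&types[0]);
    if (state == DuckDBError) {
        return;
    }
    duckdb_append_int32(appender, 41); // arrives in tbl as 42
    duckdb_appender_end_row(appender);
    duckdb_appender_destroy(&appender);
}
```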
###### `duckdb_appender_column_count` {#docs:stable:clients:c:appender::duckdb_appender_column_count}
Returns the number of columns that belong to the appender.
If there is no active column list, this equals the number of the table's physical columns.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
idx_t duckdb_appender_column_count(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to get the column count from.
####### Return Value {#docs:stable:clients:c:appender::return-value}
The number of columns in the data chunks.
###### `duckdb_appender_column_type` {#docs:stable:clients:c:appender::duckdb_appender_column_type}
Returns the type of the column at the specified index. This is either a type in the active column list, or the same type
as a column in the receiving table.
Note: The resulting type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_logical_type duckdb_appender_column_type(
duckdb_appender appender,
idx_t col_idx
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to get the column type from.
* `col_idx`: The index of the column to get the type of.
####### Return Value {#docs:stable:clients:c:appender::return-value}
The `duckdb_logical_type` of the column.
###### `duckdb_appender_error` {#docs:stable:clients:c:appender::duckdb_appender_error}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release. Use `duckdb_appender_error_data` instead.
Returns the error message associated with the appender.
If the appender has no error message, this returns `nullptr` instead.
The error message should not be freed. It will be de-allocated when `duckdb_appender_destroy` is called.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
const char *duckdb_appender_error(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to get the error from.
####### Return Value {#docs:stable:clients:c:appender::return-value}
The error message, or `nullptr` if there is none.
###### `duckdb_appender_error_data` {#docs:stable:clients:c:appender::duckdb_appender_error_data}
Returns the error data associated with the appender.
Must be destroyed with `duckdb_destroy_error_data`.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_error_data duckdb_appender_error_data(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to get the error data from.
####### Return Value {#docs:stable:clients:c:appender::return-value}
The error data.
###### `duckdb_appender_flush` {#docs:stable:clients:c:appender::duckdb_appender_flush}
Flush the appender to the table, forcing the cache of the appender to be cleared. If flushing the data triggers a
constraint violation or any other error, then all data is invalidated, and this function returns `DuckDBError`.
After such a failure, it is no longer possible to append more values. Call `duckdb_appender_error_data` to obtain the error data,
followed by `duckdb_appender_destroy` to destroy the invalidated appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_appender_flush(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to flush.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_appender_close` {#docs:stable:clients:c:appender::duckdb_appender_close}
Closes the appender by flushing all intermediate states and closing it for further appends. If flushing the data
triggers a constraint violation or any other error, then all data is invalidated, and this function returns `DuckDBError`.
Call `duckdb_appender_error_data` to obtain the error data, followed by `duckdb_appender_destroy` to destroy the invalidated
appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_appender_close(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to flush and close.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_appender_destroy` {#docs:stable:clients:c:appender::duckdb_appender_destroy}
Closes the appender by flushing all intermediate states to the table and destroying it. By destroying it, this function
de-allocates all memory associated with the appender. If flushing the data triggers a constraint violation,
then all data is invalidated, and this function returns `DuckDBError`. Due to the destruction of the appender, it is no
longer possible to obtain the specific error message with `duckdb_appender_error`. Therefore, call `duckdb_appender_close`
before destroying the appender if you need insights into the specific error.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_appender_destroy(
duckdb_appender *appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to flush, close and destroy.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
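A minimal sketch of the shutdown sequence suggested above: close the appender first, so that a constraint violation can still be inspected through the error data, and destroy it afterwards in all cases.

```c
#include <stdio.h>
#include "duckdb.h"

// a minimal sketch; always destroy the appender, even if closing it failed
void finish_appender(duckdb_appender appender) {
    if (duckdb_appender_close(appender) == DuckDBError) {
        duckdb_error_data err = duckdb_appender_error_data(appender);
        fprintf(stderr, "appending failed: %s\n", duckdb_error_data_message(err));
        duckdb_destroy_error_data(&err);
    }
    duckdb_appender_destroy(&appender); // de-allocates all memory associated with the appender
}
```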
###### `duckdb_appender_add_column` {#docs:stable:clients:c:appender::duckdb_appender_add_column}
Appends a column to the active column list of the appender. Immediately flushes all previous data.
The active column list specifies all columns that are expected when flushing the data. Any non-active columns are filled
with their default values, or NULL.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_appender_add_column(
duckdb_appender appender,
const char *name
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to add the column to.
* `name`: The name of the column to add to the active column list.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
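For illustration, the sketch below restricts the active column list of an existing appender to two columns of a hypothetical table `people (id INTEGER, name VARCHAR, note VARCHAR DEFAULT 'n/a')`; the inactive `note` column is filled with its default value when the data is flushed.

```c
#include "duckdb.h"

// a minimal sketch, assuming the hypothetical table described above
void append_subset(duckdb_appender appender) {
    duckdb_appender_add_column(appender, "id");
    duckdb_appender_add_column(appender, "name");

    duckdb_append_int32(appender, 1);       // id
    duckdb_append_varchar(appender, "Ada"); // name; note receives its DEFAULT on flush
    duckdb_appender_end_row(appender);

    duckdb_appender_clear_columns(appender); // back to treating all columns as active
}
```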
###### `duckdb_appender_clear_columns` {#docs:stable:clients:c:appender::duckdb_appender_clear_columns}
Removes all columns from the active column list of the appender, resetting the appender to treat all columns as active.
Immediately flushes all previous data.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_appender_clear_columns(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to clear the columns from.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_appender_begin_row` {#docs:stable:clients:c:appender::duckdb_appender_begin_row}
A no-op function provided for backwards compatibility; it does nothing. Only `duckdb_appender_end_row` is required.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_appender_begin_row(
duckdb_appender appender
);
```
###### `duckdb_appender_end_row` {#docs:stable:clients:c:appender::duckdb_appender_end_row}
Finish the current row of appends. After end_row is called, the next row can be appended.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_appender_end_row(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_append_default` {#docs:stable:clients:c:appender::duckdb_append_default}
Append a DEFAULT value to the appender (NULL if no DEFAULT is available for the column).
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_default(
duckdb_appender appender
);
```
###### `duckdb_append_default_to_chunk` {#docs:stable:clients:c:appender::duckdb_append_default_to_chunk}
Append a DEFAULT value (NULL if no DEFAULT is available for the column) at the specified row and column of the chunk
created from the specified appender. The default value of the column must be a constant value. Non-deterministic expressions
like `nextval('seq')` or `random()` are not supported.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_default_to_chunk(
duckdb_appender appender,
duckdb_data_chunk chunk,
idx_t col,
idx_t row
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to get the default value from.
* `chunk`: The data chunk to append the default value to.
* `col`: The chunk column index to append the default value to.
* `row`: The chunk row index to append the default value to.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_append_bool` {#docs:stable:clients:c:appender::duckdb_append_bool}
Append a bool value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_bool(
duckdb_appender appender,
bool value
);
```
###### `duckdb_append_int8` {#docs:stable:clients:c:appender::duckdb_append_int8}
Append an int8_t value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_int8(
duckdb_appender appender,
int8_t value
);
```
###### `duckdb_append_int16` {#docs:stable:clients:c:appender::duckdb_append_int16}
Append an int16_t value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_int16(
duckdb_appender appender,
int16_t value
);
```
###### `duckdb_append_int32` {#docs:stable:clients:c:appender::duckdb_append_int32}
Append an int32_t value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_int32(
duckdb_appender appender,
int32_t value
);
```
###### `duckdb_append_int64` {#docs:stable:clients:c:appender::duckdb_append_int64}
Append an int64_t value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_int64(
duckdb_appender appender,
int64_t value
);
```
###### `duckdb_append_hugeint` {#docs:stable:clients:c:appender::duckdb_append_hugeint}
Append a duckdb_hugeint value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_hugeint(
duckdb_appender appender,
duckdb_hugeint value
);
```
###### `duckdb_append_uint8` {#docs:stable:clients:c:appender::duckdb_append_uint8}
Append a uint8_t value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_uint8(
duckdb_appender appender,
uint8_t value
);
```
###### `duckdb_append_uint16` {#docs:stable:clients:c:appender::duckdb_append_uint16}
Append a uint16_t value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_uint16(
duckdb_appender appender,
uint16_t value
);
```
###### `duckdb_append_uint32` {#docs:stable:clients:c:appender::duckdb_append_uint32}
Append a uint32_t value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_uint32(
duckdb_appender appender,
uint32_t value
);
```
###### `duckdb_append_uint64` {#docs:stable:clients:c:appender::duckdb_append_uint64}
Append a uint64_t value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_uint64(
duckdb_appender appender,
uint64_t value
);
```
###### `duckdb_append_uhugeint` {#docs:stable:clients:c:appender::duckdb_append_uhugeint}
Append a duckdb_uhugeint value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_uhugeint(
duckdb_appender appender,
duckdb_uhugeint value
);
```
###### `duckdb_append_float` {#docs:stable:clients:c:appender::duckdb_append_float}
Append a float value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_float(
duckdb_appender appender,
float value
);
```
###### `duckdb_append_double` {#docs:stable:clients:c:appender::duckdb_append_double}
Append a double value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_double(
duckdb_appender appender,
double value
);
```
###### `duckdb_append_date` {#docs:stable:clients:c:appender::duckdb_append_date}
Append a duckdb_date value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_date(
duckdb_appender appender,
duckdb_date value
);
```
###### `duckdb_append_time` {#docs:stable:clients:c:appender::duckdb_append_time}
Append a duckdb_time value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_time(
duckdb_appender appender,
duckdb_time value
);
```
###### `duckdb_append_timestamp` {#docs:stable:clients:c:appender::duckdb_append_timestamp}
Append a duckdb_timestamp value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_timestamp(
duckdb_appender appender,
duckdb_timestamp value
);
```
###### `duckdb_append_interval` {#docs:stable:clients:c:appender::duckdb_append_interval}
Append a duckdb_interval value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_interval(
duckdb_appender appender,
duckdb_interval value
);
```
###### `duckdb_append_varchar` {#docs:stable:clients:c:appender::duckdb_append_varchar}
Append a varchar value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_varchar(
duckdb_appender appender,
const char *val
);
```
###### `duckdb_append_varchar_length` {#docs:stable:clients:c:appender::duckdb_append_varchar_length}
Append a varchar value of the given length to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_varchar_length(
duckdb_appender appender,
const char *val,
idx_t length
);
```
###### `duckdb_append_blob` {#docs:stable:clients:c:appender::duckdb_append_blob}
Append a blob value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_blob(
duckdb_appender appender,
const void *data,
idx_t length
);
```
###### `duckdb_append_null` {#docs:stable:clients:c:appender::duckdb_append_null}
Append a NULL value to the appender (of any type).
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_null(
duckdb_appender appender
);
```
###### `duckdb_append_value` {#docs:stable:clients:c:appender::duckdb_append_value}
Append a duckdb_value to the appender.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_value(
duckdb_appender appender,
duckdb_value value
);
```
###### `duckdb_append_data_chunk` {#docs:stable:clients:c:appender::duckdb_append_data_chunk}
Appends a pre-filled data chunk to the specified appender.
Attempts casting if the data chunk types do not match the active appender types.
####### Syntax {#docs:stable:clients:c:appender::syntax}
```c
duckdb_state duckdb_append_data_chunk(
duckdb_appender appender,
duckdb_data_chunk chunk
);
```
####### Parameters {#docs:stable:clients:c:appender::parameters}
* `appender`: The appender to append to.
* `chunk`: The data chunk to append.
####### Return Value {#docs:stable:clients:c:appender::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
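The sketch below builds a small data chunk by hand and appends it in a single call; the assumed table layout (`INTEGER, VARCHAR`) is for illustration only.

```c
#include "duckdb.h"

// a minimal sketch, assuming a table created with:
//   CREATE TABLE tbl (i INTEGER, s VARCHAR);
void append_chunk(duckdb_appender appender) {
    duckdb_logical_type types[2] = {
        duckdb_create_logical_type(DUCKDB_TYPE_INTEGER),
        duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR)
    };
    duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 2);

    // fill two rows column-wise
    int32_t *ints = (int32_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(chunk, 0));
    ints[0] = 1;
    ints[1] = 2;
    duckdb_vector_assign_string_element(duckdb_data_chunk_get_vector(chunk, 1), 0, "foo");
    duckdb_vector_assign_string_element(duckdb_data_chunk_get_vector(chunk, 1), 1, "bar");
    duckdb_data_chunk_set_size(chunk, 2);

    duckdb_append_data_chunk(appender, chunk);

    duckdb_destroy_data_chunk(&chunk);
    duckdb_destroy_logical_type(&types[0]);
    duckdb_destroy_logical_type(&types[1]);
}
```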
### Table Functions {#docs:stable:clients:c:table_functions}
The table function API can be used to define a table function that can then be called from within DuckDB in the `FROM` clause of a query.
#### API Reference Overview {#docs:stable:clients:c:table_functions::api-reference-overview}
```c
duckdb_table_function duckdb_create_table_function();
void duckdb_destroy_table_function(duckdb_table_function *table_function);
void duckdb_table_function_set_name(duckdb_table_function table_function, const char *name);
void duckdb_table_function_add_parameter(duckdb_table_function table_function, duckdb_logical_type type);
void duckdb_table_function_add_named_parameter(duckdb_table_function table_function, const char *name, duckdb_logical_type type);
void duckdb_table_function_set_extra_info(duckdb_table_function table_function, void *extra_info, duckdb_delete_callback_t destroy);
void duckdb_table_function_set_bind(duckdb_table_function table_function, duckdb_table_function_bind_t bind);
void duckdb_table_function_set_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_local_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_function(duckdb_table_function table_function, duckdb_table_function_t function);
void duckdb_table_function_supports_projection_pushdown(duckdb_table_function table_function, bool pushdown);
duckdb_state duckdb_register_table_function(duckdb_connection con, duckdb_table_function function);
```
##### Table Function Bind {#docs:stable:clients:c:table_functions::table-function-bind}
```c
void *duckdb_bind_get_extra_info(duckdb_bind_info info);
void duckdb_table_function_get_client_context(duckdb_bind_info info, duckdb_client_context *out_context);
void duckdb_bind_add_result_column(duckdb_bind_info info, const char *name, duckdb_logical_type type);
idx_t duckdb_bind_get_parameter_count(duckdb_bind_info info);
duckdb_value duckdb_bind_get_parameter(duckdb_bind_info info, idx_t index);
duckdb_value duckdb_bind_get_named_parameter(duckdb_bind_info info, const char *name);
void duckdb_bind_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t destroy);
void duckdb_bind_set_cardinality(duckdb_bind_info info, idx_t cardinality, bool is_exact);
void duckdb_bind_set_error(duckdb_bind_info info, const char *error);
```
##### Table Function Init {#docs:stable:clients:c:table_functions::table-function-init}
```c
void *duckdb_init_get_extra_info(duckdb_init_info info);
void *duckdb_init_get_bind_data(duckdb_init_info info);
void duckdb_init_set_init_data(duckdb_init_info info, void *init_data, duckdb_delete_callback_t destroy);
idx_t duckdb_init_get_column_count(duckdb_init_info info);
idx_t duckdb_init_get_column_index(duckdb_init_info info, idx_t column_index);
void duckdb_init_set_max_threads(duckdb_init_info info, idx_t max_threads);
void duckdb_init_set_error(duckdb_init_info info, const char *error);
```
##### Table Function {#docs:stable:clients:c:table_functions::table-function}
```c
void *duckdb_function_get_extra_info(duckdb_function_info info);
void *duckdb_function_get_bind_data(duckdb_function_info info);
void *duckdb_function_get_init_data(duckdb_function_info info);
void *duckdb_function_get_local_init_data(duckdb_function_info info);
void duckdb_function_set_error(duckdb_function_info info, const char *error);
```
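To show how these pieces fit together, here is a minimal sketch of a complete table function. The name `my_range` and its behavior (producing the integers `0` to `n - 1` for a single `BIGINT` parameter) are invented for illustration, and error handling is omitted for brevity.

```c
#include <stdlib.h>
#include "duckdb.h"

// bind: declare one BIGINT output column and stash the requested row count
static void my_range_bind(duckdb_bind_info info) {
    duckdb_logical_type type = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_bind_add_result_column(info, "i", type);
    duckdb_destroy_logical_type(&type);

    duckdb_value count_val = duckdb_bind_get_parameter(info, 0);
    int64_t *count = malloc(sizeof(int64_t));
    *count = duckdb_get_int64(count_val);
    duckdb_destroy_value(&count_val);
    duckdb_bind_set_bind_data(info, count, free);
}

// init: per-scan state tracking how many rows have been emitted so far
static void my_range_init(duckdb_init_info info) {
    int64_t *pos = malloc(sizeof(int64_t));
    *pos = 0;
    duckdb_init_set_init_data(info, pos, free);
}

// main: fill the output chunk until all requested rows have been produced
static void my_range_function(duckdb_function_info info, duckdb_data_chunk output) {
    int64_t *count = (int64_t *) duckdb_function_get_bind_data(info);
    int64_t *pos = (int64_t *) duckdb_function_get_init_data(info);
    int64_t *out = (int64_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(output, 0));
    idx_t produced = 0;
    while (produced < duckdb_vector_size() && *pos < *count) {
        out[produced++] = (*pos)++;
    }
    duckdb_data_chunk_set_size(output, produced); // a size of 0 signals that the scan is done
}

// registration: afterwards, SELECT * FROM my_range(5) yields 0, 1, 2, 3, 4
void register_my_range(duckdb_connection con) {
    duckdb_table_function f = duckdb_create_table_function();
    duckdb_table_function_set_name(f, "my_range");
    duckdb_logical_type param = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_table_function_add_parameter(f, param);
    duckdb_destroy_logical_type(&param);
    duckdb_table_function_set_bind(f, my_range_bind);
    duckdb_table_function_set_init(f, my_range_init);
    duckdb_table_function_set_function(f, my_range_function);
    duckdb_register_table_function(con, f);
    duckdb_destroy_table_function(&f);
}
```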
###### `duckdb_create_table_function` {#docs:stable:clients:c:table_functions::duckdb_create_table_function}
Creates a new empty table function.
The return value should be destroyed with `duckdb_destroy_table_function`.
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The table function object.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
duckdb_table_function duckdb_create_table_function(
);
```
###### `duckdb_destroy_table_function` {#docs:stable:clients:c:table_functions::duckdb_destroy_table_function}
Destroys the given table function object.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_destroy_table_function(
duckdb_table_function *table_function
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `table_function`: The table function to destroy
###### `duckdb_table_function_set_name` {#docs:stable:clients:c:table_functions::duckdb_table_function_set_name}
Sets the name of the given table function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_table_function_set_name(
duckdb_table_function table_function,
const char *name
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `table_function`: The table function
* `name`: The name of the table function
###### `duckdb_table_function_add_parameter` {#docs:stable:clients:c:table_functions::duckdb_table_function_add_parameter}
Adds a parameter to the table function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_table_function_add_parameter(
duckdb_table_function table_function,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `table_function`: The table function.
* `type`: The parameter type. Cannot contain INVALID.
###### `duckdb_table_function_add_named_parameter` {#docs:stable:clients:c:table_functions::duckdb_table_function_add_named_parameter}
Adds a named parameter to the table function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_table_function_add_named_parameter(
duckdb_table_function table_function,
const char *name,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `table_function`: The table function.
* `name`: The parameter name.
* `type`: The parameter type. Cannot contain INVALID.
###### `duckdb_table_function_set_extra_info` {#docs:stable:clients:c:table_functions::duckdb_table_function_set_extra_info}
Assigns extra information to the table function that can be fetched during binding, etc.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_table_function_set_extra_info(
duckdb_table_function table_function,
void *extra_info,
duckdb_delete_callback_t destroy
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `table_function`: The table function
* `extra_info`: The extra information
* `destroy`: The callback that will be called to destroy the extra information (if any)
###### `duckdb_table_function_set_bind` {#docs:stable:clients:c:table_functions::duckdb_table_function_set_bind}
Sets the bind function of the table function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_table_function_set_bind(
duckdb_table_function table_function,
duckdb_table_function_bind_t bind
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `table_function`: The table function
* `bind`: The bind function
###### `duckdb_table_function_set_init` {#docs:stable:clients:c:table_functions::duckdb_table_function_set_init}
Sets the init function of the table function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_table_function_set_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `table_function`: The table function
* `init`: The init function
###### `duckdb_table_function_set_local_init` {#docs:stable:clients:c:table_functions::duckdb_table_function_set_local_init}
Sets the thread-local init function of the table function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_table_function_set_local_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `table_function`: The table function
* `init`: The init function
###### `duckdb_table_function_set_function` {#docs:stable:clients:c:table_functions::duckdb_table_function_set_function}
Sets the main function of the table function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_table_function_set_function(
duckdb_table_function table_function,
duckdb_table_function_t function
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `table_function`: The table function
* `function`: The function
###### `duckdb_table_function_supports_projection_pushdown` {#docs:stable:clients:c:table_functions::duckdb_table_function_supports_projection_pushdown}
Sets whether or not the given table function supports projection pushdown.
If this is set to true, the system will provide a list of all required columns in the `init` stage through
the `duckdb_init_get_column_count` and `duckdb_init_get_column_index` functions.
If this is set to false (the default), the system will expect all columns to be projected.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_table_function_supports_projection_pushdown(
duckdb_table_function table_function,
bool pushdown
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `table_function`: The table function
* `pushdown`: True if the table function supports projection pushdown, false otherwise.
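For illustration, a minimal sketch of an init callback that consumes the projection list, assuming pushdown was enabled with `duckdb_table_function_supports_projection_pushdown(table_function, true)`; the callback name is hypothetical.

```c
#include "duckdb.h"

// a sketch: translate positions in the output chunk back to physical column indices
static void my_pushdown_init(duckdb_init_info info) {
    idx_t projected = duckdb_init_get_column_count(info);
    for (idx_t i = 0; i < projected; i++) {
        // output column i of every result chunk must be filled from this physical column
        idx_t physical_column = duckdb_init_get_column_index(info, i);
        (void) physical_column; // a real scan would store this mapping in its init data
    }
}
```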
###### `duckdb_register_table_function` {#docs:stable:clients:c:table_functions::duckdb_register_table_function}
Register the table function object within the given connection.
The function requires at least a name, a bind function, an init function, and a main function.
If the function is incomplete or a function with this name already exists, `DuckDBError` is returned.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
duckdb_state duckdb_register_table_function(
duckdb_connection con,
duckdb_table_function function
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `con`: The connection to register it in.
* `function`: The function pointer
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
Whether or not the registration was successful.
###### `duckdb_bind_get_extra_info` {#docs:stable:clients:c:table_functions::duckdb_bind_get_extra_info}
Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void *duckdb_bind_get_extra_info(
duckdb_bind_info info
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The extra info
###### `duckdb_table_function_get_client_context` {#docs:stable:clients:c:table_functions::duckdb_table_function_get_client_context}
Retrieves the client context of the bind info of a table function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_table_function_get_client_context(
duckdb_bind_info info,
duckdb_client_context *out_context
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The bind info object of the table function.
* `out_context`: The client context of the bind info. Must be destroyed with `duckdb_destroy_client_context`.
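A brief sketch of retrieving the client context inside a bind callback, e.g., to read the connection id; the callback name is hypothetical.

```c
#include "duckdb.h"

// a minimal sketch; the connection id might be used for per-connection caching
static void my_bind_with_context(duckdb_bind_info info) {
    duckdb_client_context ctx;
    duckdb_table_function_get_client_context(info, &ctx);
    idx_t connection_id = duckdb_client_context_get_connection_id(ctx);
    (void) connection_id;
    duckdb_destroy_client_context(&ctx); // the context must always be destroyed
}
```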
###### `duckdb_bind_add_result_column` {#docs:stable:clients:c:table_functions::duckdb_bind_add_result_column}
Adds a result column to the output of the table function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_bind_add_result_column(
duckdb_bind_info info,
const char *name,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The table function's bind info.
* `name`: The column name.
* `type`: The logical column type.
###### `duckdb_bind_get_parameter_count` {#docs:stable:clients:c:table_functions::duckdb_bind_get_parameter_count}
Retrieves the number of regular (non-named) parameters to the function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
idx_t duckdb_bind_get_parameter_count(
duckdb_bind_info info
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The number of parameters
###### `duckdb_bind_get_parameter` {#docs:stable:clients:c:table_functions::duckdb_bind_get_parameter}
Retrieves the parameter at the given index.
The result must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
duckdb_value duckdb_bind_get_parameter(
duckdb_bind_info info,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
* `index`: The index of the parameter to get
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The value of the parameter. Must be destroyed with `duckdb_destroy_value`.
###### `duckdb_bind_get_named_parameter` {#docs:stable:clients:c:table_functions::duckdb_bind_get_named_parameter}
Retrieves a named parameter with the given name.
The result must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
duckdb_value duckdb_bind_get_named_parameter(
duckdb_bind_info info,
const char *name
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
* `name`: The name of the parameter
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The value of the parameter. Must be destroyed with `duckdb_destroy_value`.
###### `duckdb_bind_set_bind_data` {#docs:stable:clients:c:table_functions::duckdb_bind_set_bind_data}
Sets the user-provided bind data in the bind object of the table function.
This object can be retrieved again during execution.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_bind_set_bind_data(
duckdb_bind_info info,
void *bind_data,
duckdb_delete_callback_t destroy
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The bind info of the table function.
* `bind_data`: The bind data object.
* `destroy`: The callback to destroy the bind data (if any).
###### `duckdb_bind_set_cardinality` {#docs:stable:clients:c:table_functions::duckdb_bind_set_cardinality}
Sets the cardinality estimate for the table function, used for optimization.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_bind_set_cardinality(
duckdb_bind_info info,
idx_t cardinality,
bool is_exact
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The bind data object.
* `cardinality`: The estimated number of rows produced by the table function.
* `is_exact`: Whether the cardinality estimate is exact or an approximation.
###### `duckdb_bind_set_error` {#docs:stable:clients:c:table_functions::duckdb_bind_set_error}
Report that an error has occurred while calling bind on a table function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_bind_set_error(
duckdb_bind_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
* `error`: The error message
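As a short sketch, a bind callback can validate its parameters and abort binding with an error message; the parameter semantics here are invented for illustration.

```c
#include "duckdb.h"

// a minimal sketch: reject negative row counts during bind
static void my_validating_bind(duckdb_bind_info info) {
    duckdb_value v = duckdb_bind_get_parameter(info, 0);
    int64_t n = duckdb_get_int64(v);
    duckdb_destroy_value(&v);
    if (n < 0) {
        duckdb_bind_set_error(info, "the row count must be non-negative");
        return;
    }
    // ... continue binding (add result columns, set bind data) as usual
}
```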
###### `duckdb_init_get_extra_info` {#docs:stable:clients:c:table_functions::duckdb_init_get_extra_info}
Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void *duckdb_init_get_extra_info(
duckdb_init_info info
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The extra info
###### `duckdb_init_get_bind_data` {#docs:stable:clients:c:table_functions::duckdb_init_get_bind_data}
Gets the bind data set by `duckdb_bind_set_bind_data` during the bind.
Note that the bind data should be considered as read-only.
For tracking state, use the init data instead.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void *duckdb_init_get_bind_data(
duckdb_init_info info
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The bind data object
###### `duckdb_init_set_init_data` {#docs:stable:clients:c:table_functions::duckdb_init_set_init_data}
Sets the user-provided init data in the init object. This object can be retrieved again during execution.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_init_set_init_data(
duckdb_init_info info,
void *init_data,
duckdb_delete_callback_t destroy
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
* `init_data`: The init data object.
* `destroy`: The callback that will be called to destroy the init data (if any)
###### `duckdb_init_get_column_count` {#docs:stable:clients:c:table_functions::duckdb_init_get_column_count}
Returns the number of projected columns.
This function must be used if projection pushdown is enabled to figure out which columns to emit.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
idx_t duckdb_init_get_column_count(
duckdb_init_info info
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The number of projected columns.
###### `duckdb_init_get_column_index` {#docs:stable:clients:c:table_functions::duckdb_init_get_column_index}
Returns the column index of the projected column at the specified position.
This function must be used if projection pushdown is enabled to figure out which columns to emit.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
idx_t duckdb_init_get_column_index(
duckdb_init_info info,
idx_t column_index
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
* `column_index`: The index at which to get the projected column index, in the range `0..duckdb_init_get_column_count(info)`
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The column index of the projected column.
###### `duckdb_init_set_max_threads` {#docs:stable:clients:c:table_functions::duckdb_init_set_max_threads}
Sets how many threads can process this table function in parallel (default: 1).
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_init_set_max_threads(
duckdb_init_info info,
idx_t max_threads
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
* `max_threads`: The maximum number of threads that can process this table function
###### `duckdb_init_set_error` {#docs:stable:clients:c:table_functions::duckdb_init_set_error}
Report that an error has occurred while calling init.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_init_set_error(
duckdb_init_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
* `error`: The error message
###### `duckdb_function_get_extra_info` {#docs:stable:clients:c:table_functions::duckdb_function_get_extra_info}
Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void *duckdb_function_get_extra_info(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The extra info
###### `duckdb_function_get_bind_data` {#docs:stable:clients:c:table_functions::duckdb_function_get_bind_data}
Gets the table function's bind data set by `duckdb_bind_set_bind_data`.
Note that the bind data is read-only.
For tracking state, use the init data instead.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void *duckdb_function_get_bind_data(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The function info object.
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The bind data object.
###### `duckdb_function_get_init_data` {#docs:stable:clients:c:table_functions::duckdb_function_get_init_data}
Gets the init data set by `duckdb_init_set_init_data` during the init.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void *duckdb_function_get_init_data(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The init data object
###### `duckdb_function_get_local_init_data` {#docs:stable:clients:c:table_functions::duckdb_function_get_local_init_data}
Gets the thread-local init data set by `duckdb_init_set_init_data` during the local_init.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void *duckdb_function_get_local_init_data(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:table_functions::return-value}
The init data object
###### `duckdb_function_set_error` {#docs:stable:clients:c:table_functions::duckdb_function_set_error}
Report that an error has occurred while executing the function.
####### Syntax {#docs:stable:clients:c:table_functions::syntax}
```c
void duckdb_function_set_error(
duckdb_function_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:table_functions::parameters}
* `info`: The info object
* `error`: The error message
### Replacement Scans {#docs:stable:clients:c:replacement_scans}
The replacement scan API can be used to register a callback that is called when a table is read that does not exist in the catalog. For example, when a query such as `SELECT * FROM my_table` is executed and `my_table` does not exist, the replacement scan callback will be called with `my_table` as its parameter. The replacement scan can then insert a table function with a specific parameter to replace the read of the table.
#### API Reference Overview {#docs:stable:clients:c:replacement_scans::api-reference-overview}
```c
void duckdb_add_replacement_scan(duckdb_database db, duckdb_replacement_callback_t replacement, void *extra_data, duckdb_delete_callback_t delete_callback);
void duckdb_replacement_scan_set_function_name(duckdb_replacement_scan_info info, const char *function_name);
void duckdb_replacement_scan_add_parameter(duckdb_replacement_scan_info info, duckdb_value parameter);
void duckdb_replacement_scan_set_error(duckdb_replacement_scan_info info, const char *error);
```
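A minimal sketch of a replacement callback that redirects any unknown table name to a Parquet file of the same name; the `.parquet` naming convention and the use of the `parquet_scan` table function are assumptions of this sketch.

```c
#include <stdio.h>
#include "duckdb.h"

// a sketch: SELECT * FROM my_table becomes a scan of my_table.parquet
static void parquet_replacement(duckdb_replacement_scan_info info, const char *table_name, void *data) {
    (void) data; // no extra data is used here
    char path[1024];
    snprintf(path, sizeof(path), "%s.parquet", table_name);

    duckdb_replacement_scan_set_function_name(info, "parquet_scan");
    duckdb_value param = duckdb_create_varchar(path);
    duckdb_replacement_scan_add_parameter(info, param);
    duckdb_destroy_value(&param);
}

// registration on an open database handle `db`:
//   duckdb_add_replacement_scan(db, parquet_replacement, NULL, NULL);
```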
###### `duckdb_add_replacement_scan` {#docs:stable:clients:c:replacement_scans::duckdb_add_replacement_scan}
Add a replacement scan definition to the specified database.
####### Syntax {#docs:stable:clients:c:replacement_scans::syntax}
```c
void duckdb_add_replacement_scan(
duckdb_database db,
duckdb_replacement_callback_t replacement,
void *extra_data,
duckdb_delete_callback_t delete_callback
);
```
####### Parameters {#docs:stable:clients:c:replacement_scans::parameters}
* `db`: The database object to add the replacement scan to
* `replacement`: The replacement scan callback
* `extra_data`: Extra data that is passed back into the specified callback
* `delete_callback`: The delete callback to call on the extra data, if any
###### `duckdb_replacement_scan_set_function_name` {#docs:stable:clients:c:replacement_scans::duckdb_replacement_scan_set_function_name}
Sets the replacement function name. If this function is called in the replacement callback,
the replacement scan is performed. If it is not called, the replacement scan is not performed.
####### Syntax {#docs:stable:clients:c:replacement_scans::syntax}
```c
void duckdb_replacement_scan_set_function_name(
duckdb_replacement_scan_info info,
const char *function_name
);
```
####### Parameters {#docs:stable:clients:c:replacement_scans::parameters}
* `info`: The info object
* `function_name`: The function name to substitute.
###### `duckdb_replacement_scan_add_parameter` {#docs:stable:clients:c:replacement_scans::duckdb_replacement_scan_add_parameter}
Adds a parameter to the replacement scan function.
####### Syntax {#docs:stable:clients:c:replacement_scans::syntax}
```c
void duckdb_replacement_scan_add_parameter(
duckdb_replacement_scan_info info,
duckdb_value parameter
);
```
####### Parameters {#docs:stable:clients:c:replacement_scans::parameters}
* `info`: The info object
* `parameter`: The parameter to add.
###### `duckdb_replacement_scan_set_error` {#docs:stable:clients:c:replacement_scans::duckdb_replacement_scan_set_error}
Report that an error has occurred while executing the replacement scan.
####### Syntax {#docs:stable:clients:c:replacement_scans::syntax}
```c
void duckdb_replacement_scan_set_error(
duckdb_replacement_scan_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:replacement_scans::parameters}
* `info`: The info object
* `error`: The error message
### Complete API {#docs:stable:clients:c:api}
This page contains the reference for DuckDB's C API.
> The reference contains several deprecation notices. These concern methods whose long-term availability is not guaranteed as they may be removed in the future. That said, DuckDB's developers plan to carry out deprecations slowly, as several of the deprecated methods do not yet have a fully functional alternative. Therefore, they will not be removed before the alternative is available, and even then, there will be a grace period of a few minor versions before removing them. The reason that the methods are already deprecated in v1.0 is to denote that they are not part of the v1.0 stable API, which contains methods that are available long-term.
#### API Reference Overview {#docs:stable:clients:c:api::api-reference-overview}
##### Open Connect {#docs:stable:clients:c:api::open-connect}
```c
duckdb_instance_cache duckdb_create_instance_cache();
duckdb_state duckdb_get_or_create_from_cache(duckdb_instance_cache instance_cache, const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_destroy_instance_cache(duckdb_instance_cache *instance_cache);
duckdb_state duckdb_open(const char *path, duckdb_database *out_database);
duckdb_state duckdb_open_ext(const char *path, duckdb_database *out_database, duckdb_config config, char **out_error);
void duckdb_close(duckdb_database *database);
duckdb_state duckdb_connect(duckdb_database database, duckdb_connection *out_connection);
void duckdb_interrupt(duckdb_connection connection);
duckdb_query_progress_type duckdb_query_progress(duckdb_connection connection);
void duckdb_disconnect(duckdb_connection *connection);
void duckdb_connection_get_client_context(duckdb_connection connection, duckdb_client_context *out_context);
void duckdb_connection_get_arrow_options(duckdb_connection connection, duckdb_arrow_options *out_arrow_options);
idx_t duckdb_client_context_get_connection_id(duckdb_client_context context);
void duckdb_destroy_client_context(duckdb_client_context *context);
void duckdb_destroy_arrow_options(duckdb_arrow_options *arrow_options);
const char *duckdb_library_version();
duckdb_value duckdb_get_table_names(duckdb_connection connection, const char *query, bool qualified);
```
##### Configuration {#docs:stable:clients:c:api::configuration}
```c
duckdb_state duckdb_create_config(duckdb_config *out_config);
size_t duckdb_config_count();
duckdb_state duckdb_get_config_flag(size_t index, const char **out_name, const char **out_description);
duckdb_state duckdb_set_config(duckdb_config config, const char *name, const char *option);
void duckdb_destroy_config(duckdb_config *config);
```
##### Error Data {#docs:stable:clients:c:api::error-data}
```c
duckdb_error_data duckdb_create_error_data(duckdb_error_type type, const char *message);
void duckdb_destroy_error_data(duckdb_error_data *error_data);
duckdb_error_type duckdb_error_data_error_type(duckdb_error_data error_data);
const char *duckdb_error_data_message(duckdb_error_data error_data);
bool duckdb_error_data_has_error(duckdb_error_data error_data);
```
##### Query Execution {#docs:stable:clients:c:api::query-execution}
```c
duckdb_state duckdb_query(duckdb_connection connection, const char *query, duckdb_result *out_result);
void duckdb_destroy_result(duckdb_result *result);
const char *duckdb_column_name(duckdb_result *result, idx_t col);
duckdb_type duckdb_column_type(duckdb_result *result, idx_t col);
duckdb_statement_type duckdb_result_statement_type(duckdb_result result);
duckdb_logical_type duckdb_column_logical_type(duckdb_result *result, idx_t col);
duckdb_arrow_options duckdb_result_get_arrow_options(duckdb_result *result);
idx_t duckdb_column_count(duckdb_result *result);
idx_t duckdb_row_count(duckdb_result *result);
idx_t duckdb_rows_changed(duckdb_result *result);
void *duckdb_column_data(duckdb_result *result, idx_t col);
bool *duckdb_nullmask_data(duckdb_result *result, idx_t col);
const char *duckdb_result_error(duckdb_result *result);
duckdb_error_type duckdb_result_error_type(duckdb_result *result);
```
##### Result Functions {#docs:stable:clients:c:api::result-functions}
```c
duckdb_data_chunk duckdb_result_get_chunk(duckdb_result result, idx_t chunk_index);
bool duckdb_result_is_streaming(duckdb_result result);
idx_t duckdb_result_chunk_count(duckdb_result result);
duckdb_result_type duckdb_result_return_type(duckdb_result result);
```
##### Safe Fetch Functions {#docs:stable:clients:c:api::safe-fetch-functions}
```c
bool duckdb_value_boolean(duckdb_result *result, idx_t col, idx_t row);
int8_t duckdb_value_int8(duckdb_result *result, idx_t col, idx_t row);
int16_t duckdb_value_int16(duckdb_result *result, idx_t col, idx_t row);
int32_t duckdb_value_int32(duckdb_result *result, idx_t col, idx_t row);
int64_t duckdb_value_int64(duckdb_result *result, idx_t col, idx_t row);
duckdb_hugeint duckdb_value_hugeint(duckdb_result *result, idx_t col, idx_t row);
duckdb_uhugeint duckdb_value_uhugeint(duckdb_result *result, idx_t col, idx_t row);
duckdb_decimal duckdb_value_decimal(duckdb_result *result, idx_t col, idx_t row);
uint8_t duckdb_value_uint8(duckdb_result *result, idx_t col, idx_t row);
uint16_t duckdb_value_uint16(duckdb_result *result, idx_t col, idx_t row);
uint32_t duckdb_value_uint32(duckdb_result *result, idx_t col, idx_t row);
uint64_t duckdb_value_uint64(duckdb_result *result, idx_t col, idx_t row);
float duckdb_value_float(duckdb_result *result, idx_t col, idx_t row);
double duckdb_value_double(duckdb_result *result, idx_t col, idx_t row);
duckdb_date duckdb_value_date(duckdb_result *result, idx_t col, idx_t row);
duckdb_time duckdb_value_time(duckdb_result *result, idx_t col, idx_t row);
duckdb_timestamp duckdb_value_timestamp(duckdb_result *result, idx_t col, idx_t row);
duckdb_interval duckdb_value_interval(duckdb_result *result, idx_t col, idx_t row);
char *duckdb_value_varchar(duckdb_result *result, idx_t col, idx_t row);
duckdb_string duckdb_value_string(duckdb_result *result, idx_t col, idx_t row);
char *duckdb_value_varchar_internal(duckdb_result *result, idx_t col, idx_t row);
duckdb_string duckdb_value_string_internal(duckdb_result *result, idx_t col, idx_t row);
duckdb_blob duckdb_value_blob(duckdb_result *result, idx_t col, idx_t row);
bool duckdb_value_is_null(duckdb_result *result, idx_t col, idx_t row);
```
##### Helpers {#docs:stable:clients:c:api::helpers}
```c
void *duckdb_malloc(size_t size);
void duckdb_free(void *ptr);
idx_t duckdb_vector_size();
bool duckdb_string_is_inlined(duckdb_string_t string);
uint32_t duckdb_string_t_length(duckdb_string_t string);
const char *duckdb_string_t_data(duckdb_string_t *string);
```
##### Date Time Timestamp Helpers {#docs:stable:clients:c:api::date-time-timestamp-helpers}
```c
duckdb_date_struct duckdb_from_date(duckdb_date date);
duckdb_date duckdb_to_date(duckdb_date_struct date);
bool duckdb_is_finite_date(duckdb_date date);
duckdb_time_struct duckdb_from_time(duckdb_time time);
duckdb_time_tz duckdb_create_time_tz(int64_t micros, int32_t offset);
duckdb_time_tz_struct duckdb_from_time_tz(duckdb_time_tz micros);
duckdb_time duckdb_to_time(duckdb_time_struct time);
duckdb_timestamp_struct duckdb_from_timestamp(duckdb_timestamp ts);
duckdb_timestamp duckdb_to_timestamp(duckdb_timestamp_struct ts);
bool duckdb_is_finite_timestamp(duckdb_timestamp ts);
bool duckdb_is_finite_timestamp_s(duckdb_timestamp_s ts);
bool duckdb_is_finite_timestamp_ms(duckdb_timestamp_ms ts);
bool duckdb_is_finite_timestamp_ns(duckdb_timestamp_ns ts);
```
##### Hugeint Helpers {#docs:stable:clients:c:api::hugeint-helpers}
```c
double duckdb_hugeint_to_double(duckdb_hugeint val);
duckdb_hugeint duckdb_double_to_hugeint(double val);
```
##### Unsigned Hugeint Helpers {#docs:stable:clients:c:api::unsigned-hugeint-helpers}
```c
double duckdb_uhugeint_to_double(duckdb_uhugeint val);
duckdb_uhugeint duckdb_double_to_uhugeint(double val);
```
##### Decimal Helpers {#docs:stable:clients:c:api::decimal-helpers}
```c
duckdb_decimal duckdb_double_to_decimal(double val, uint8_t width, uint8_t scale);
double duckdb_decimal_to_double(duckdb_decimal val);
```
##### Prepared Statements {#docs:stable:clients:c:api::prepared-statements}
```c
duckdb_state duckdb_prepare(duckdb_connection connection, const char *query, duckdb_prepared_statement *out_prepared_statement);
void duckdb_destroy_prepare(duckdb_prepared_statement *prepared_statement);
const char *duckdb_prepare_error(duckdb_prepared_statement prepared_statement);
idx_t duckdb_nparams(duckdb_prepared_statement prepared_statement);
const char *duckdb_parameter_name(duckdb_prepared_statement prepared_statement, idx_t index);
duckdb_type duckdb_param_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_logical_type duckdb_param_logical_type(duckdb_prepared_statement prepared_statement, idx_t param_idx);
duckdb_state duckdb_clear_bindings(duckdb_prepared_statement prepared_statement);
duckdb_statement_type duckdb_prepared_statement_type(duckdb_prepared_statement statement);
idx_t duckdb_prepared_statement_column_count(duckdb_prepared_statement prepared_statement);
const char *duckdb_prepared_statement_column_name(duckdb_prepared_statement prepared_statement, idx_t col_idx);
duckdb_logical_type duckdb_prepared_statement_column_logical_type(duckdb_prepared_statement prepared_statement, idx_t col_idx);
duckdb_type duckdb_prepared_statement_column_type(duckdb_prepared_statement prepared_statement, idx_t col_idx);
```
##### Bind Values to Prepared Statements {#docs:stable:clients:c:api::bind-values-to-prepared-statements}
```c
duckdb_state duckdb_bind_value(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_value val);
duckdb_state duckdb_bind_parameter_index(duckdb_prepared_statement prepared_statement, idx_t *param_idx_out, const char *name);
duckdb_state duckdb_bind_boolean(duckdb_prepared_statement prepared_statement, idx_t param_idx, bool val);
duckdb_state duckdb_bind_int8(duckdb_prepared_statement prepared_statement, idx_t param_idx, int8_t val);
duckdb_state duckdb_bind_int16(duckdb_prepared_statement prepared_statement, idx_t param_idx, int16_t val);
duckdb_state duckdb_bind_int32(duckdb_prepared_statement prepared_statement, idx_t param_idx, int32_t val);
duckdb_state duckdb_bind_int64(duckdb_prepared_statement prepared_statement, idx_t param_idx, int64_t val);
duckdb_state duckdb_bind_hugeint(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_hugeint val);
duckdb_state duckdb_bind_uhugeint(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_uhugeint val);
duckdb_state duckdb_bind_decimal(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_decimal val);
duckdb_state duckdb_bind_uint8(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint8_t val);
duckdb_state duckdb_bind_uint16(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint16_t val);
duckdb_state duckdb_bind_uint32(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint32_t val);
duckdb_state duckdb_bind_uint64(duckdb_prepared_statement prepared_statement, idx_t param_idx, uint64_t val);
duckdb_state duckdb_bind_float(duckdb_prepared_statement prepared_statement, idx_t param_idx, float val);
duckdb_state duckdb_bind_double(duckdb_prepared_statement prepared_statement, idx_t param_idx, double val);
duckdb_state duckdb_bind_date(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_date val);
duckdb_state duckdb_bind_time(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_time val);
duckdb_state duckdb_bind_timestamp(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_timestamp val);
duckdb_state duckdb_bind_timestamp_tz(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_timestamp val);
duckdb_state duckdb_bind_interval(duckdb_prepared_statement prepared_statement, idx_t param_idx, duckdb_interval val);
duckdb_state duckdb_bind_varchar(duckdb_prepared_statement prepared_statement, idx_t param_idx, const char *val);
duckdb_state duckdb_bind_varchar_length(duckdb_prepared_statement prepared_statement, idx_t param_idx, const char *val, idx_t length);
duckdb_state duckdb_bind_blob(duckdb_prepared_statement prepared_statement, idx_t param_idx, const void *data, idx_t length);
duckdb_state duckdb_bind_null(duckdb_prepared_statement prepared_statement, idx_t param_idx);
```
##### Execute Prepared Statements {#docs:stable:clients:c:api::execute-prepared-statements}
```c
duckdb_state duckdb_execute_prepared(duckdb_prepared_statement prepared_statement, duckdb_result *out_result);
duckdb_state duckdb_execute_prepared_streaming(duckdb_prepared_statement prepared_statement, duckdb_result *out_result);
```
##### Extract Statements {#docs:stable:clients:c:api::extract-statements}
```c
idx_t duckdb_extract_statements(duckdb_connection connection, const char *query, duckdb_extracted_statements *out_extracted_statements);
duckdb_state duckdb_prepare_extracted_statement(duckdb_connection connection, duckdb_extracted_statements extracted_statements, idx_t index, duckdb_prepared_statement *out_prepared_statement);
const char *duckdb_extract_statements_error(duckdb_extracted_statements extracted_statements);
void duckdb_destroy_extracted(duckdb_extracted_statements *extracted_statements);
```
##### Pending Result Interface {#docs:stable:clients:c:api::pending-result-interface}
```c
duckdb_state duckdb_pending_prepared(duckdb_prepared_statement prepared_statement, duckdb_pending_result *out_result);
duckdb_state duckdb_pending_prepared_streaming(duckdb_prepared_statement prepared_statement, duckdb_pending_result *out_result);
void duckdb_destroy_pending(duckdb_pending_result *pending_result);
const char *duckdb_pending_error(duckdb_pending_result pending_result);
duckdb_pending_state duckdb_pending_execute_task(duckdb_pending_result pending_result);
duckdb_pending_state duckdb_pending_execute_check_state(duckdb_pending_result pending_result);
duckdb_state duckdb_execute_pending(duckdb_pending_result pending_result, duckdb_result *out_result);
bool duckdb_pending_execution_is_finished(duckdb_pending_state pending_state);
```
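The pending interface lets a client drive execution one task at a time. Below is a minimal sketch assuming `stmt` is a successfully prepared statement (see `duckdb_prepare`):
```c
// Minimal sketch: incrementally execute a prepared statement via the pending interface.
duckdb_pending_result pending;
if (duckdb_pending_prepared(stmt, &pending) == DuckDBError) {
    fprintf(stderr, "%s\n", duckdb_pending_error(pending));
}
duckdb_pending_state state;
do {
    state = duckdb_pending_execute_task(pending);   // executes one unit of work
} while (state != DUCKDB_PENDING_ERROR && !duckdb_pending_execution_is_finished(state));

if (state != DUCKDB_PENDING_ERROR) {
    duckdb_result result;
    duckdb_execute_pending(pending, &result);       // materializes the final result
    duckdb_destroy_result(&result);
}
duckdb_destroy_pending(&pending);
```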
##### Value Interface {#docs:stable:clients:c:api::value-interface}
```c
void duckdb_destroy_value(duckdb_value *value);
duckdb_value duckdb_create_varchar(const char *text);
duckdb_value duckdb_create_varchar_length(const char *text, idx_t length);
duckdb_value duckdb_create_bool(bool input);
duckdb_value duckdb_create_int8(int8_t input);
duckdb_value duckdb_create_uint8(uint8_t input);
duckdb_value duckdb_create_int16(int16_t input);
duckdb_value duckdb_create_uint16(uint16_t input);
duckdb_value duckdb_create_int32(int32_t input);
duckdb_value duckdb_create_uint32(uint32_t input);
duckdb_value duckdb_create_uint64(uint64_t input);
duckdb_value duckdb_create_int64(int64_t val);
duckdb_value duckdb_create_hugeint(duckdb_hugeint input);
duckdb_value duckdb_create_uhugeint(duckdb_uhugeint input);
duckdb_value duckdb_create_bignum(duckdb_bignum input);
duckdb_value duckdb_create_decimal(duckdb_decimal input);
duckdb_value duckdb_create_float(float input);
duckdb_value duckdb_create_double(double input);
duckdb_value duckdb_create_date(duckdb_date input);
duckdb_value duckdb_create_time(duckdb_time input);
duckdb_value duckdb_create_time_ns(duckdb_time_ns input);
duckdb_value duckdb_create_time_tz_value(duckdb_time_tz value);
duckdb_value duckdb_create_timestamp(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_tz(duckdb_timestamp input);
duckdb_value duckdb_create_timestamp_s(duckdb_timestamp_s input);
duckdb_value duckdb_create_timestamp_ms(duckdb_timestamp_ms input);
duckdb_value duckdb_create_timestamp_ns(duckdb_timestamp_ns input);
duckdb_value duckdb_create_interval(duckdb_interval input);
duckdb_value duckdb_create_blob(const uint8_t *data, idx_t length);
duckdb_value duckdb_create_bit(duckdb_bit input);
duckdb_value duckdb_create_uuid(duckdb_uhugeint input);
bool duckdb_get_bool(duckdb_value val);
int8_t duckdb_get_int8(duckdb_value val);
uint8_t duckdb_get_uint8(duckdb_value val);
int16_t duckdb_get_int16(duckdb_value val);
uint16_t duckdb_get_uint16(duckdb_value val);
int32_t duckdb_get_int32(duckdb_value val);
uint32_t duckdb_get_uint32(duckdb_value val);
int64_t duckdb_get_int64(duckdb_value val);
uint64_t duckdb_get_uint64(duckdb_value val);
duckdb_hugeint duckdb_get_hugeint(duckdb_value val);
duckdb_uhugeint duckdb_get_uhugeint(duckdb_value val);
duckdb_bignum duckdb_get_bignum(duckdb_value val);
duckdb_decimal duckdb_get_decimal(duckdb_value val);
float duckdb_get_float(duckdb_value val);
double duckdb_get_double(duckdb_value val);
duckdb_date duckdb_get_date(duckdb_value val);
duckdb_time duckdb_get_time(duckdb_value val);
duckdb_time_ns duckdb_get_time_ns(duckdb_value val);
duckdb_time_tz duckdb_get_time_tz(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp(duckdb_value val);
duckdb_timestamp duckdb_get_timestamp_tz(duckdb_value val);
duckdb_timestamp_s duckdb_get_timestamp_s(duckdb_value val);
duckdb_timestamp_ms duckdb_get_timestamp_ms(duckdb_value val);
duckdb_timestamp_ns duckdb_get_timestamp_ns(duckdb_value val);
duckdb_interval duckdb_get_interval(duckdb_value val);
duckdb_logical_type duckdb_get_value_type(duckdb_value val);
duckdb_blob duckdb_get_blob(duckdb_value val);
duckdb_bit duckdb_get_bit(duckdb_value val);
duckdb_uhugeint duckdb_get_uuid(duckdb_value val);
char *duckdb_get_varchar(duckdb_value value);
duckdb_value duckdb_create_struct_value(duckdb_logical_type type, duckdb_value *values);
duckdb_value duckdb_create_list_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_array_value(duckdb_logical_type type, duckdb_value *values, idx_t value_count);
duckdb_value duckdb_create_map_value(duckdb_logical_type map_type, duckdb_value *keys, duckdb_value *values, idx_t entry_count);
duckdb_value duckdb_create_union_value(duckdb_logical_type union_type, idx_t tag_index, duckdb_value value);
idx_t duckdb_get_map_size(duckdb_value value);
duckdb_value duckdb_get_map_key(duckdb_value value, idx_t index);
duckdb_value duckdb_get_map_value(duckdb_value value, idx_t index);
bool duckdb_is_null_value(duckdb_value value);
duckdb_value duckdb_create_null_value();
idx_t duckdb_get_list_size(duckdb_value value);
duckdb_value duckdb_get_list_child(duckdb_value value, idx_t index);
duckdb_value duckdb_create_enum_value(duckdb_logical_type type, uint64_t value);
uint64_t duckdb_get_enum_value(duckdb_value value);
duckdb_value duckdb_get_struct_child(duckdb_value value, idx_t index);
char *duckdb_value_to_string(duckdb_value value);
```
##### Logical Type Interface {#docs:stable:clients:c:api::logical-type-interface}
```c
duckdb_logical_type duckdb_create_logical_type(duckdb_type type);
char *duckdb_logical_type_get_alias(duckdb_logical_type type);
void duckdb_logical_type_set_alias(duckdb_logical_type type, const char *alias);
duckdb_logical_type duckdb_create_list_type(duckdb_logical_type type);
duckdb_logical_type duckdb_create_array_type(duckdb_logical_type type, idx_t array_size);
duckdb_logical_type duckdb_create_map_type(duckdb_logical_type key_type, duckdb_logical_type value_type);
duckdb_logical_type duckdb_create_union_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_struct_type(duckdb_logical_type *member_types, const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_enum_type(const char **member_names, idx_t member_count);
duckdb_logical_type duckdb_create_decimal_type(uint8_t width, uint8_t scale);
duckdb_type duckdb_get_type_id(duckdb_logical_type type);
uint8_t duckdb_decimal_width(duckdb_logical_type type);
uint8_t duckdb_decimal_scale(duckdb_logical_type type);
duckdb_type duckdb_decimal_internal_type(duckdb_logical_type type);
duckdb_type duckdb_enum_internal_type(duckdb_logical_type type);
uint32_t duckdb_enum_dictionary_size(duckdb_logical_type type);
char *duckdb_enum_dictionary_value(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_list_type_child_type(duckdb_logical_type type);
duckdb_logical_type duckdb_array_type_child_type(duckdb_logical_type type);
idx_t duckdb_array_type_array_size(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_key_type(duckdb_logical_type type);
duckdb_logical_type duckdb_map_type_value_type(duckdb_logical_type type);
idx_t duckdb_struct_type_child_count(duckdb_logical_type type);
char *duckdb_struct_type_child_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_struct_type_child_type(duckdb_logical_type type, idx_t index);
idx_t duckdb_union_type_member_count(duckdb_logical_type type);
char *duckdb_union_type_member_name(duckdb_logical_type type, idx_t index);
duckdb_logical_type duckdb_union_type_member_type(duckdb_logical_type type, idx_t index);
void duckdb_destroy_logical_type(duckdb_logical_type *type);
duckdb_state duckdb_register_logical_type(duckdb_connection con, duckdb_logical_type type, duckdb_create_type_info info);
```
##### Data Chunk Interface {#docs:stable:clients:c:api::data-chunk-interface}
```c
duckdb_data_chunk duckdb_create_data_chunk(duckdb_logical_type *types, idx_t column_count);
void duckdb_destroy_data_chunk(duckdb_data_chunk *chunk);
void duckdb_data_chunk_reset(duckdb_data_chunk chunk);
idx_t duckdb_data_chunk_get_column_count(duckdb_data_chunk chunk);
duckdb_vector duckdb_data_chunk_get_vector(duckdb_data_chunk chunk, idx_t col_idx);
idx_t duckdb_data_chunk_get_size(duckdb_data_chunk chunk);
void duckdb_data_chunk_set_size(duckdb_data_chunk chunk, idx_t size);
```
##### Vector Interface {#docs:stable:clients:c:api::vector-interface}
```c
duckdb_vector duckdb_create_vector(duckdb_logical_type type, idx_t capacity);
void duckdb_destroy_vector(duckdb_vector *vector);
duckdb_logical_type duckdb_vector_get_column_type(duckdb_vector vector);
void *duckdb_vector_get_data(duckdb_vector vector);
uint64_t *duckdb_vector_get_validity(duckdb_vector vector);
void duckdb_vector_ensure_validity_writable(duckdb_vector vector);
void duckdb_vector_assign_string_element(duckdb_vector vector, idx_t index, const char *str);
void duckdb_vector_assign_string_element_len(duckdb_vector vector, idx_t index, const char *str, idx_t str_len);
duckdb_vector duckdb_list_vector_get_child(duckdb_vector vector);
idx_t duckdb_list_vector_get_size(duckdb_vector vector);
duckdb_state duckdb_list_vector_set_size(duckdb_vector vector, idx_t size);
duckdb_state duckdb_list_vector_reserve(duckdb_vector vector, idx_t required_capacity);
duckdb_vector duckdb_struct_vector_get_child(duckdb_vector vector, idx_t index);
duckdb_vector duckdb_array_vector_get_child(duckdb_vector vector);
void duckdb_slice_vector(duckdb_vector vector, duckdb_selection_vector sel, idx_t len);
void duckdb_vector_copy_sel(duckdb_vector src, duckdb_vector dst, duckdb_selection_vector sel, idx_t src_count, idx_t src_offset, idx_t dst_offset);
void duckdb_vector_reference_value(duckdb_vector vector, duckdb_value value);
void duckdb_vector_reference_vector(duckdb_vector to_vector, duckdb_vector from_vector);
```
##### Validity Mask Functions {#docs:stable:clients:c:api::validity-mask-functions}
```c
bool duckdb_validity_row_is_valid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_validity(uint64_t *validity, idx_t row, bool valid);
void duckdb_validity_set_row_invalid(uint64_t *validity, idx_t row);
void duckdb_validity_set_row_valid(uint64_t *validity, idx_t row);
```
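The following minimal sketch builds a single-column `INTEGER` chunk, writes two values, and marks a third row as `NULL` via the validity mask; such a chunk could, for example, be passed to `duckdb_append_data_chunk`.
```c
// Minimal sketch: build a single-column INTEGER chunk with one NULL row.
duckdb_logical_type types[1] = {duckdb_create_logical_type(DUCKDB_TYPE_INTEGER)};
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 1);

duckdb_vector col = duckdb_data_chunk_get_vector(chunk, 0);
int32_t *data = (int32_t *) duckdb_vector_get_data(col);
data[0] = 1;
data[1] = 2;

duckdb_vector_ensure_validity_writable(col);                 // allocate the validity mask
uint64_t *validity = duckdb_vector_get_validity(col);
duckdb_validity_set_row_invalid(validity, 2);                // row 2 becomes NULL

duckdb_data_chunk_set_size(chunk, 3);                        // the chunk now holds 3 rows

duckdb_destroy_data_chunk(&chunk);
duckdb_destroy_logical_type(&types[0]);
```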
##### Scalar Functions {#docs:stable:clients:c:api::scalar-functions}
```c
duckdb_scalar_function duckdb_create_scalar_function();
void duckdb_destroy_scalar_function(duckdb_scalar_function *scalar_function);
void duckdb_scalar_function_set_name(duckdb_scalar_function scalar_function, const char *name);
void duckdb_scalar_function_set_varargs(duckdb_scalar_function scalar_function, duckdb_logical_type type);
void duckdb_scalar_function_set_special_handling(duckdb_scalar_function scalar_function);
void duckdb_scalar_function_set_volatile(duckdb_scalar_function scalar_function);
void duckdb_scalar_function_add_parameter(duckdb_scalar_function scalar_function, duckdb_logical_type type);
void duckdb_scalar_function_set_return_type(duckdb_scalar_function scalar_function, duckdb_logical_type type);
void duckdb_scalar_function_set_extra_info(duckdb_scalar_function scalar_function, void *extra_info, duckdb_delete_callback_t destroy);
void duckdb_scalar_function_set_bind(duckdb_scalar_function scalar_function, duckdb_scalar_function_bind_t bind);
void duckdb_scalar_function_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t destroy);
void duckdb_scalar_function_set_bind_data_copy(duckdb_bind_info info, duckdb_copy_callback_t copy);
void duckdb_scalar_function_bind_set_error(duckdb_bind_info info, const char *error);
void duckdb_scalar_function_set_function(duckdb_scalar_function scalar_function, duckdb_scalar_function_t function);
duckdb_state duckdb_register_scalar_function(duckdb_connection con, duckdb_scalar_function scalar_function);
void *duckdb_scalar_function_get_extra_info(duckdb_function_info info);
void *duckdb_scalar_function_bind_get_extra_info(duckdb_bind_info info);
void *duckdb_scalar_function_get_bind_data(duckdb_function_info info);
void duckdb_scalar_function_get_client_context(duckdb_bind_info info, duckdb_client_context *out_context);
void duckdb_scalar_function_set_error(duckdb_function_info info, const char *error);
duckdb_scalar_function_set duckdb_create_scalar_function_set(const char *name);
void duckdb_destroy_scalar_function_set(duckdb_scalar_function_set *scalar_function_set);
duckdb_state duckdb_add_scalar_function_to_set(duckdb_scalar_function_set set, duckdb_scalar_function function);
duckdb_state duckdb_register_scalar_function_set(duckdb_connection con, duckdb_scalar_function_set set);
idx_t duckdb_scalar_function_bind_get_argument_count(duckdb_bind_info info);
duckdb_expression duckdb_scalar_function_bind_get_argument(duckdb_bind_info info, idx_t index);
```
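As a rough sketch of the registration flow, the example below registers a scalar `add_one(INTEGER) -> INTEGER` function. It assumes the `duckdb_scalar_function_t` callback receives the input chunk and the output vector, and it skips `NULL` handling for brevity.
```c
// Minimal sketch: register a scalar add_one(INTEGER) -> INTEGER function.
// Validity handling is omitted; real code should propagate NULLs.
static void add_one_impl(duckdb_function_info info, duckdb_data_chunk input, duckdb_vector output) {
    idx_t count = duckdb_data_chunk_get_size(input);
    int32_t *in = (int32_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(input, 0));
    int32_t *out = (int32_t *) duckdb_vector_get_data(output);
    for (idx_t row = 0; row < count; row++) {
        out[row] = in[row] + 1;
    }
}

void register_add_one(duckdb_connection con) {
    duckdb_scalar_function fun = duckdb_create_scalar_function();
    duckdb_logical_type int_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
    duckdb_scalar_function_set_name(fun, "add_one");
    duckdb_scalar_function_add_parameter(fun, int_type);
    duckdb_scalar_function_set_return_type(fun, int_type);
    duckdb_scalar_function_set_function(fun, add_one_impl);
    duckdb_register_scalar_function(con, fun);
    duckdb_destroy_scalar_function(&fun);
    duckdb_destroy_logical_type(&int_type);
}
```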
##### Selection Vector Interface {#docs:stable:clients:c:api::selection-vector-interface}
```c
duckdb_selection_vector duckdb_create_selection_vector(idx_t size);
void duckdb_destroy_selection_vector(duckdb_selection_vector sel);
sel_t *duckdb_selection_vector_get_data_ptr(duckdb_selection_vector sel);
```
##### Aggregate Functions {#docs:stable:clients:c:api::aggregate-functions}
```c
duckdb_aggregate_function duckdb_create_aggregate_function();
void duckdb_destroy_aggregate_function(duckdb_aggregate_function *aggregate_function);
void duckdb_aggregate_function_set_name(duckdb_aggregate_function aggregate_function, const char *name);
void duckdb_aggregate_function_add_parameter(duckdb_aggregate_function aggregate_function, duckdb_logical_type type);
void duckdb_aggregate_function_set_return_type(duckdb_aggregate_function aggregate_function, duckdb_logical_type type);
void duckdb_aggregate_function_set_functions(duckdb_aggregate_function aggregate_function, duckdb_aggregate_state_size state_size, duckdb_aggregate_init_t state_init, duckdb_aggregate_update_t update, duckdb_aggregate_combine_t combine, duckdb_aggregate_finalize_t finalize);
void duckdb_aggregate_function_set_destructor(duckdb_aggregate_function aggregate_function, duckdb_aggregate_destroy_t destroy);
duckdb_state duckdb_register_aggregate_function(duckdb_connection con, duckdb_aggregate_function aggregate_function);
void duckdb_aggregate_function_set_special_handling(duckdb_aggregate_function aggregate_function);
void duckdb_aggregate_function_set_extra_info(duckdb_aggregate_function aggregate_function, void *extra_info, duckdb_delete_callback_t destroy);
void *duckdb_aggregate_function_get_extra_info(duckdb_function_info info);
void duckdb_aggregate_function_set_error(duckdb_function_info info, const char *error);
duckdb_aggregate_function_set duckdb_create_aggregate_function_set(const char *name);
void duckdb_destroy_aggregate_function_set(duckdb_aggregate_function_set *aggregate_function_set);
duckdb_state duckdb_add_aggregate_function_to_set(duckdb_aggregate_function_set set, duckdb_aggregate_function function);
duckdb_state duckdb_register_aggregate_function_set(duckdb_connection con, duckdb_aggregate_function_set set);
```
##### Table Functions {#docs:stable:clients:c:api::table-functions}
```c
duckdb_table_function duckdb_create_table_function();
void duckdb_destroy_table_function(duckdb_table_function *table_function);
void duckdb_table_function_set_name(duckdb_table_function table_function, const char *name);
void duckdb_table_function_add_parameter(duckdb_table_function table_function, duckdb_logical_type type);
void duckdb_table_function_add_named_parameter(duckdb_table_function table_function, const char *name, duckdb_logical_type type);
void duckdb_table_function_set_extra_info(duckdb_table_function table_function, void *extra_info, duckdb_delete_callback_t destroy);
void duckdb_table_function_set_bind(duckdb_table_function table_function, duckdb_table_function_bind_t bind);
void duckdb_table_function_set_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_local_init(duckdb_table_function table_function, duckdb_table_function_init_t init);
void duckdb_table_function_set_function(duckdb_table_function table_function, duckdb_table_function_t function);
void duckdb_table_function_supports_projection_pushdown(duckdb_table_function table_function, bool pushdown);
duckdb_state duckdb_register_table_function(duckdb_connection con, duckdb_table_function function);
```
##### Table Function Bind {#docs:stable:clients:c:api::table-function-bind}
```c
void *duckdb_bind_get_extra_info(duckdb_bind_info info);
void duckdb_table_function_get_client_context(duckdb_bind_info info, duckdb_client_context *out_context);
void duckdb_bind_add_result_column(duckdb_bind_info info, const char *name, duckdb_logical_type type);
idx_t duckdb_bind_get_parameter_count(duckdb_bind_info info);
duckdb_value duckdb_bind_get_parameter(duckdb_bind_info info, idx_t index);
duckdb_value duckdb_bind_get_named_parameter(duckdb_bind_info info, const char *name);
void duckdb_bind_set_bind_data(duckdb_bind_info info, void *bind_data, duckdb_delete_callback_t destroy);
void duckdb_bind_set_cardinality(duckdb_bind_info info, idx_t cardinality, bool is_exact);
void duckdb_bind_set_error(duckdb_bind_info info, const char *error);
```
##### Table Function Init {#docs:stable:clients:c:api::table-function-init}
```c
void *duckdb_init_get_extra_info(duckdb_init_info info);
void *duckdb_init_get_bind_data(duckdb_init_info info);
void duckdb_init_set_init_data(duckdb_init_info info, void *init_data, duckdb_delete_callback_t destroy);
idx_t duckdb_init_get_column_count(duckdb_init_info info);
idx_t duckdb_init_get_column_index(duckdb_init_info info, idx_t column_index);
void duckdb_init_set_max_threads(duckdb_init_info info, idx_t max_threads);
void duckdb_init_set_error(duckdb_init_info info, const char *error);
```
##### Table Function {#docs:stable:clients:c:api::table-function}
```c
void *duckdb_function_get_extra_info(duckdb_function_info info);
void *duckdb_function_get_bind_data(duckdb_function_info info);
void *duckdb_function_get_init_data(duckdb_function_info info);
void *duckdb_function_get_local_init_data(duckdb_function_info info);
void duckdb_function_set_error(duckdb_function_info info, const char *error);
```
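A sketch of a complete, if toy, table function that emits the integers 0 to 4 as a single `BIGINT` column. It assumes the bind/init/function callbacks take the corresponding info objects as shown, and it uses `duckdb_malloc`, `duckdb_free`, and `duckdb_vector_size` from elsewhere in this API.
```c
// Minimal sketch: a table function five_rows() producing one BIGINT column with 5 rows.
static void five_rows_bind(duckdb_bind_info info) {
    duckdb_logical_type t = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_bind_add_result_column(info, "i", t);
    duckdb_destroy_logical_type(&t);
}

static void five_rows_init(duckdb_init_info info) {
    idx_t *pos = (idx_t *) duckdb_malloc(sizeof(idx_t));    // per-scan cursor
    *pos = 0;
    duckdb_init_set_init_data(info, pos, duckdb_free);
}

static void five_rows_func(duckdb_function_info info, duckdb_data_chunk output) {
    idx_t *pos = (idx_t *) duckdb_function_get_init_data(info);
    int64_t *col = (int64_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(output, 0));
    idx_t count = 0;
    while (*pos < 5 && count < duckdb_vector_size()) {
        col[count++] = (int64_t) (*pos)++;
    }
    duckdb_data_chunk_set_size(output, count);               // size 0 signals completion
}

void register_five_rows(duckdb_connection con) {
    duckdb_table_function tf = duckdb_create_table_function();
    duckdb_table_function_set_name(tf, "five_rows");
    duckdb_table_function_set_bind(tf, five_rows_bind);
    duckdb_table_function_set_init(tf, five_rows_init);
    duckdb_table_function_set_function(tf, five_rows_func);
    duckdb_register_table_function(con, tf);
    duckdb_destroy_table_function(&tf);
}
```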
##### Replacement Scans {#docs:stable:clients:c:api::replacement-scans}
```c
void duckdb_add_replacement_scan(duckdb_database db, duckdb_replacement_callback_t replacement, void *extra_data, duckdb_delete_callback_t delete_callback);
void duckdb_replacement_scan_set_function_name(duckdb_replacement_scan_info info, const char *function_name);
void duckdb_replacement_scan_add_parameter(duckdb_replacement_scan_info info, duckdb_value parameter);
void duckdb_replacement_scan_set_error(duckdb_replacement_scan_info info, const char *error);
```
##### Profiling Info {#docs:stable:clients:c:api::profiling-info}
```c
duckdb_profiling_info duckdb_get_profiling_info(duckdb_connection connection);
duckdb_value duckdb_profiling_info_get_value(duckdb_profiling_info info, const char *key);
duckdb_value duckdb_profiling_info_get_metrics(duckdb_profiling_info info);
idx_t duckdb_profiling_info_get_child_count(duckdb_profiling_info info);
duckdb_profiling_info duckdb_profiling_info_get_child(duckdb_profiling_info info, idx_t index);
```
##### Appender {#docs:stable:clients:c:api::appender}
```c
duckdb_state duckdb_appender_create(duckdb_connection connection, const char *schema, const char *table, duckdb_appender *out_appender);
duckdb_state duckdb_appender_create_ext(duckdb_connection connection, const char *catalog, const char *schema, const char *table, duckdb_appender *out_appender);
duckdb_state duckdb_appender_create_query(duckdb_connection connection, const char *query, idx_t column_count, duckdb_logical_type *types, const char *table_name, const char **column_names, duckdb_appender *out_appender);
idx_t duckdb_appender_column_count(duckdb_appender appender);
duckdb_logical_type duckdb_appender_column_type(duckdb_appender appender, idx_t col_idx);
const char *duckdb_appender_error(duckdb_appender appender);
duckdb_error_data duckdb_appender_error_data(duckdb_appender appender);
duckdb_state duckdb_appender_flush(duckdb_appender appender);
duckdb_state duckdb_appender_close(duckdb_appender appender);
duckdb_state duckdb_appender_destroy(duckdb_appender *appender);
duckdb_state duckdb_appender_add_column(duckdb_appender appender, const char *name);
duckdb_state duckdb_appender_clear_columns(duckdb_appender appender);
duckdb_state duckdb_appender_begin_row(duckdb_appender appender);
duckdb_state duckdb_appender_end_row(duckdb_appender appender);
duckdb_state duckdb_append_default(duckdb_appender appender);
duckdb_state duckdb_append_default_to_chunk(duckdb_appender appender, duckdb_data_chunk chunk, idx_t col, idx_t row);
duckdb_state duckdb_append_bool(duckdb_appender appender, bool value);
duckdb_state duckdb_append_int8(duckdb_appender appender, int8_t value);
duckdb_state duckdb_append_int16(duckdb_appender appender, int16_t value);
duckdb_state duckdb_append_int32(duckdb_appender appender, int32_t value);
duckdb_state duckdb_append_int64(duckdb_appender appender, int64_t value);
duckdb_state duckdb_append_hugeint(duckdb_appender appender, duckdb_hugeint value);
duckdb_state duckdb_append_uint8(duckdb_appender appender, uint8_t value);
duckdb_state duckdb_append_uint16(duckdb_appender appender, uint16_t value);
duckdb_state duckdb_append_uint32(duckdb_appender appender, uint32_t value);
duckdb_state duckdb_append_uint64(duckdb_appender appender, uint64_t value);
duckdb_state duckdb_append_uhugeint(duckdb_appender appender, duckdb_uhugeint value);
duckdb_state duckdb_append_float(duckdb_appender appender, float value);
duckdb_state duckdb_append_double(duckdb_appender appender, double value);
duckdb_state duckdb_append_date(duckdb_appender appender, duckdb_date value);
duckdb_state duckdb_append_time(duckdb_appender appender, duckdb_time value);
duckdb_state duckdb_append_timestamp(duckdb_appender appender, duckdb_timestamp value);
duckdb_state duckdb_append_interval(duckdb_appender appender, duckdb_interval value);
duckdb_state duckdb_append_varchar(duckdb_appender appender, const char *val);
duckdb_state duckdb_append_varchar_length(duckdb_appender appender, const char *val, idx_t length);
duckdb_state duckdb_append_blob(duckdb_appender appender, const void *data, idx_t length);
duckdb_state duckdb_append_null(duckdb_appender appender);
duckdb_state duckdb_append_value(duckdb_appender appender, duckdb_value value);
duckdb_state duckdb_append_data_chunk(duckdb_appender appender, duckdb_data_chunk chunk);
```
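A minimal sketch of the row-wise appender flow, assuming an open connection `con` and an existing table `people(id INTEGER, name VARCHAR)` (both are assumptions for the example):
```c
// Minimal sketch: append two rows; a NULL schema targets the default schema.
duckdb_appender appender;
if (duckdb_appender_create(con, NULL, "people", &appender) == DuckDBError) {
    fprintf(stderr, "%s\n", duckdb_appender_error(appender));
}
duckdb_append_int32(appender, 1);
duckdb_append_varchar(appender, "Mark");
duckdb_appender_end_row(appender);

duckdb_append_int32(appender, 2);
duckdb_append_null(appender);                 // NULL name
duckdb_appender_end_row(appender);

duckdb_appender_destroy(&appender);           // flushes, closes, and frees the appender
```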
##### Table Description {#docs:stable:clients:c:api::table-description}
```c
duckdb_state duckdb_table_description_create(duckdb_connection connection, const char *schema, const char *table, duckdb_table_description *out);
duckdb_state duckdb_table_description_create_ext(duckdb_connection connection, const char *catalog, const char *schema, const char *table, duckdb_table_description *out);
void duckdb_table_description_destroy(duckdb_table_description *table_description);
const char *duckdb_table_description_error(duckdb_table_description table_description);
duckdb_state duckdb_column_has_default(duckdb_table_description table_description, idx_t index, bool *out);
char *duckdb_table_description_get_column_name(duckdb_table_description table_description, idx_t index);
```
##### Arrow Interface {#docs:stable:clients:c:api::arrow-interface}
```c
duckdb_error_data duckdb_to_arrow_schema(duckdb_arrow_options arrow_options, duckdb_logical_type *types, const char **names, idx_t column_count, struct ArrowSchema *out_schema);
duckdb_error_data duckdb_data_chunk_to_arrow(duckdb_arrow_options arrow_options, duckdb_data_chunk chunk, struct ArrowArray *out_arrow_array);
duckdb_error_data duckdb_schema_from_arrow(duckdb_connection connection, struct ArrowSchema *schema, duckdb_arrow_converted_schema *out_types);
duckdb_error_data duckdb_data_chunk_from_arrow(duckdb_connection connection, struct ArrowArray *arrow_array, duckdb_arrow_converted_schema converted_schema, duckdb_data_chunk *out_chunk);
void duckdb_destroy_arrow_converted_schema(duckdb_arrow_converted_schema *arrow_converted_schema);
duckdb_state duckdb_query_arrow(duckdb_connection connection, const char *query, duckdb_arrow *out_result);
duckdb_state duckdb_query_arrow_schema(duckdb_arrow result, duckdb_arrow_schema *out_schema);
duckdb_state duckdb_prepared_arrow_schema(duckdb_prepared_statement prepared, duckdb_arrow_schema *out_schema);
void duckdb_result_arrow_array(duckdb_result result, duckdb_data_chunk chunk, duckdb_arrow_array *out_array);
duckdb_state duckdb_query_arrow_array(duckdb_arrow result, duckdb_arrow_array *out_array);
idx_t duckdb_arrow_column_count(duckdb_arrow result);
idx_t duckdb_arrow_row_count(duckdb_arrow result);
idx_t duckdb_arrow_rows_changed(duckdb_arrow result);
const char *duckdb_query_arrow_error(duckdb_arrow result);
void duckdb_destroy_arrow(duckdb_arrow *result);
void duckdb_destroy_arrow_stream(duckdb_arrow_stream *stream_p);
duckdb_state duckdb_execute_prepared_arrow(duckdb_prepared_statement prepared_statement, duckdb_arrow *out_result);
duckdb_state duckdb_arrow_scan(duckdb_connection connection, const char *table_name, duckdb_arrow_stream arrow);
duckdb_state duckdb_arrow_array_scan(duckdb_connection connection, const char *table_name, duckdb_arrow_schema arrow_schema, duckdb_arrow_array arrow_array, duckdb_arrow_stream *out_stream);
```
##### Threading Information {#docs:stable:clients:c:api::threading-information}
```c
void duckdb_execute_tasks(duckdb_database database, idx_t max_tasks);
duckdb_task_state duckdb_create_task_state(duckdb_database database);
void duckdb_execute_tasks_state(duckdb_task_state state);
idx_t duckdb_execute_n_tasks_state(duckdb_task_state state, idx_t max_tasks);
void duckdb_finish_execution(duckdb_task_state state);
bool duckdb_task_state_is_finished(duckdb_task_state state);
void duckdb_destroy_task_state(duckdb_task_state state);
bool duckdb_execution_is_finished(duckdb_connection con);
```
##### Streaming Result Interface {#docs:stable:clients:c:api::streaming-result-interface}
```c
duckdb_data_chunk duckdb_stream_fetch_chunk(duckdb_result result);
duckdb_data_chunk duckdb_fetch_chunk(duckdb_result result);
```
##### Cast Functions {#docs:stable:clients:c:api::cast-functions}
```c
duckdb_cast_function duckdb_create_cast_function();
void duckdb_cast_function_set_source_type(duckdb_cast_function cast_function, duckdb_logical_type source_type);
void duckdb_cast_function_set_target_type(duckdb_cast_function cast_function, duckdb_logical_type target_type);
void duckdb_cast_function_set_implicit_cast_cost(duckdb_cast_function cast_function, int64_t cost);
void duckdb_cast_function_set_function(duckdb_cast_function cast_function, duckdb_cast_function_t function);
void duckdb_cast_function_set_extra_info(duckdb_cast_function cast_function, void *extra_info, duckdb_delete_callback_t destroy);
void *duckdb_cast_function_get_extra_info(duckdb_function_info info);
duckdb_cast_mode duckdb_cast_function_get_cast_mode(duckdb_function_info info);
void duckdb_cast_function_set_error(duckdb_function_info info, const char *error);
void duckdb_cast_function_set_row_error(duckdb_function_info info, const char *error, idx_t row, duckdb_vector output);
duckdb_state duckdb_register_cast_function(duckdb_connection con, duckdb_cast_function cast_function);
void duckdb_destroy_cast_function(duckdb_cast_function *cast_function);
```
##### Expression Interface {#docs:stable:clients:c:api::expression-interface}
```c
void duckdb_destroy_expression(duckdb_expression *expr);
duckdb_logical_type duckdb_expression_return_type(duckdb_expression expr);
bool duckdb_expression_is_foldable(duckdb_expression expr);
duckdb_error_data duckdb_expression_fold(duckdb_client_context context, duckdb_expression expr, duckdb_value *out_value);
```
###### `duckdb_create_instance_cache` {#docs:stable:clients:c:api::duckdb_create_instance_cache}
Creates a new database instance cache.
The instance cache is necessary if a client/program (re)opens multiple databases to the same file within the same
process. Must be destroyed with `duckdb_destroy_instance_cache`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The database instance cache.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_instance_cache duckdb_create_instance_cache(
);
```
###### `duckdb_get_or_create_from_cache` {#docs:stable:clients:c:api::duckdb_get_or_create_from_cache}
Creates a new database instance in the instance cache, or retrieves an existing database instance.
Must be closed with `duckdb_close`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_get_or_create_from_cache(
duckdb_instance_cache instance_cache,
const char *path,
duckdb_database *out_database,
duckdb_config config,
char **out_error
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `instance_cache`: The instance cache in which to create the database, or from which to take the database.
* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open or retrieve an in-memory database.
* `out_database`: The resulting cached database.
* `config`: (Optional) configuration used to create the database.
* `out_error`: If set and the function returns `DuckDBError`, this contains the error message.
Note that the error message must be freed using `duckdb_free`.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
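A minimal sketch of the cache lifecycle; the file name is only an example, and passing `NULL` or `:memory:` as the path opens an in-memory database instead:
```c
// Minimal sketch: share one database instance between multiple opens of the same file.
duckdb_instance_cache cache = duckdb_create_instance_cache();
duckdb_database db;
char *error = NULL;
if (duckdb_get_or_create_from_cache(cache, "my_database.duckdb", &db, NULL, &error) == DuckDBError) {
    fprintf(stderr, "%s\n", error);
    duckdb_free(error);
}
// ... use the database via duckdb_connect ...
duckdb_close(&db);
duckdb_destroy_instance_cache(&cache);
```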
###### `duckdb_destroy_instance_cache` {#docs:stable:clients:c:api::duckdb_destroy_instance_cache}
Destroys an existing database instance cache and de-allocates its memory.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_instance_cache(
duckdb_instance_cache *instance_cache
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `instance_cache`: The instance cache to destroy.
###### `duckdb_open` {#docs:stable:clients:c:api::duckdb_open}
Creates a new database or opens an existing database file stored at the given path.
If no path is given, a new in-memory database is created instead.
The database must be closed with `duckdb_close`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_open(
const char *path,
duckdb_database *out_database
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open an in-memory database.
* `out_database`: The result database object.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_open_ext` {#docs:stable:clients:c:api::duckdb_open_ext}
Extended version of `duckdb_open`. Creates a new database or opens an existing database file stored at the given path.
The database must be closed with `duckdb_close`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_open_ext(
const char *path,
duckdb_database *out_database,
duckdb_config config,
char **out_error
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `path`: Path to the database file on disk. Both `nullptr` and `:memory:` open an in-memory database.
* `out_database`: The result database object.
* `config`: (Optional) configuration used to start up the database.
* `out_error`: If set and the function returns `DuckDBError`, this contains the error message.
Note that the error message must be freed using `duckdb_free`.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_close` {#docs:stable:clients:c:api::duckdb_close}
Closes the specified database and de-allocates all memory allocated for that database.
This should be called after you are done with any database allocated through `duckdb_open` or `duckdb_open_ext`.
Note that failing to call `duckdb_close` (e.g., in case of a program crash) will not cause data corruption.
Still, it is recommended to always correctly close a database object after you are done with it.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_close(
duckdb_database *database
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `database`: The database object to shut down.
###### `duckdb_connect` {#docs:stable:clients:c:api::duckdb_connect}
Opens a connection to a database. Connections are required to query the database and store the transactional state
associated with the connection.
The instantiated connection should be closed using `duckdb_disconnect`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_connect(
duckdb_database database,
duckdb_connection *out_connection
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `database`: The database object to connect to.
* `out_connection`: The result connection object.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
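A minimal sketch of the basic open/connect/disconnect/close lifecycle, assuming `duckdb.h` is included:
```c
// Minimal sketch: open an in-memory database, connect, then tear everything down.
duckdb_database db;
duckdb_connection con;
if (duckdb_open(NULL, &db) == DuckDBError) {          // NULL path = in-memory database
    // handle the error
}
if (duckdb_connect(db, &con) == DuckDBError) {
    // handle the error
}
// ... run queries on `con` ...
duckdb_disconnect(&con);
duckdb_close(&db);
```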
###### `duckdb_interrupt` {#docs:stable:clients:c:api::duckdb_interrupt}
Interrupts the running query.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_interrupt(
duckdb_connection connection
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection to interrupt.
###### `duckdb_query_progress` {#docs:stable:clients:c:api::duckdb_query_progress}
Gets the progress of the running query.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_query_progress_type duckdb_query_progress(
duckdb_connection connection
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The working connection.
####### Return Value {#docs:stable:clients:c:api::return-value}
-1 if no progress is available, otherwise the progress as a percentage.
###### `duckdb_disconnect` {#docs:stable:clients:c:api::duckdb_disconnect}
Closes the specified connection and de-allocates all memory allocated for that connection.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_disconnect(
duckdb_connection *connection
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection to close.
###### `duckdb_connection_get_client_context` {#docs:stable:clients:c:api::duckdb_connection_get_client_context}
Retrieves the client context of the connection.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_connection_get_client_context(
duckdb_connection connection,
duckdb_client_context *out_context
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection.
* `out_context`: The client context of the connection. Must be destroyed with `duckdb_destroy_client_context`.
###### `duckdb_connection_get_arrow_options` {#docs:stable:clients:c:api::duckdb_connection_get_arrow_options}
Retrieves the arrow options of the connection.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_connection_get_arrow_options(
duckdb_connection connection,
duckdb_arrow_options *out_arrow_options
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection.
* `out_arrow_options`: The arrow options of the connection. Must be destroyed with `duckdb_destroy_arrow_options`.
###### `duckdb_client_context_get_connection_id` {#docs:stable:clients:c:api::duckdb_client_context_get_connection_id}
Returns the connection id of the client context.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_client_context_get_connection_id(
duckdb_client_context context
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `context`: The client context.
####### Return Value {#docs:stable:clients:c:api::return-value}
The connection id of the client context.
###### `duckdb_destroy_client_context` {#docs:stable:clients:c:api::duckdb_destroy_client_context}
Destroys the client context and deallocates its memory.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_client_context(
duckdb_client_context *context
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `context`: The client context to destroy.
###### `duckdb_destroy_arrow_options` {#docs:stable:clients:c:api::duckdb_destroy_arrow_options}
Destroys the arrow options and deallocates its memory.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_arrow_options(
duckdb_arrow_options *arrow_options
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `arrow_options`: The arrow options to destroy.
###### `duckdb_library_version` {#docs:stable:clients:c:api::duckdb_library_version}
Returns the version of the linked DuckDB, with a version postfix for dev versions.
Usually used for developing C extensions that must return this for a compatibility check.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_library_version(
);
```
###### `duckdb_get_table_names` {#docs:stable:clients:c:api::duckdb_get_table_names}
Gets the list of (fully qualified) table names of the query.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_get_table_names(
duckdb_connection connection,
const char *query,
bool qualified
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection for which to get the table names.
* `query`: The query for which to get the table names.
* `qualified`: If set to true, returns fully qualified table names (catalog.schema.table); otherwise, returns only the
(unescaped) table names.
####### Return Value {#docs:stable:clients:c:api::return-value}
A `duckdb_value` of type `VARCHAR[]` containing the (fully qualified) table names of the query. Must be destroyed
with `duckdb_destroy_value`.
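A minimal sketch that prints the table names using the list helpers from the value interface; the query text is only an example and `con` is assumed to be an open connection:
```c
// Minimal sketch: list the tables referenced by a query.
duckdb_value names = duckdb_get_table_names(con, "SELECT * FROM a JOIN b USING (id)", true);
idx_t count = duckdb_get_list_size(names);
for (idx_t i = 0; i < count; i++) {
    duckdb_value name = duckdb_get_list_child(names, i);
    char *text = duckdb_get_varchar(name);
    printf("table: %s\n", text);
    duckdb_free(text);
    duckdb_destroy_value(&name);
}
duckdb_destroy_value(&names);
```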
###### `duckdb_create_config` {#docs:stable:clients:c:api::duckdb_create_config}
Initializes an empty configuration object that can be used to provide start-up options for the DuckDB instance
through `duckdb_open_ext`.
The `duckdb_config` must be destroyed using `duckdb_destroy_config`.
This will always succeed unless there is a malloc failure.
Note that `duckdb_destroy_config` should always be called on the resulting config, even if the function returns
`DuckDBError`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_create_config(
duckdb_config *out_config
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `out_config`: The result configuration object.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_config_count` {#docs:stable:clients:c:api::duckdb_config_count}
This returns the total number of configuration options available for usage with `duckdb_get_config_flag`.
This should not be called in a loop as it internally loops over all the options.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of configuration options available.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
size_t duckdb_config_count(
);
```
###### `duckdb_get_config_flag` {#docs:stable:clients:c:api::duckdb_get_config_flag}
Obtains a human-readable name and description of a specific configuration option. This can be used, e.g., to
display configuration options. This will succeed unless `index` is out of range (i.e., `>= duckdb_config_count`).
The result name or description MUST NOT be freed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_get_config_flag(
size_t index,
const char **out_name,
const char **out_description
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `index`: The index of the configuration option (between 0 and `duckdb_config_count`).
* `out_name`: The name of the configuration flag.
* `out_description`: A description of the configuration flag.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_set_config` {#docs:stable:clients:c:api::duckdb_set_config}
Sets the specified option for the specified configuration. The configuration option is indicated by name.
To obtain a list of config options, see `duckdb_get_config_flag`.
In the source code, configuration options are defined in `config.cpp`.
This can fail if either the name is invalid, or if the value provided for the option is invalid.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_set_config(
duckdb_config config,
const char *name,
const char *option
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `config`: The configuration object to set the option on.
* `name`: The name of the configuration flag to set.
* `option`: The value to set the configuration flag to.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_destroy_config` {#docs:stable:clients:c:api::duckdb_destroy_config}
Destroys the specified configuration object and de-allocates all memory allocated for the object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_config(
duckdb_config *config
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `config`: The configuration object to destroy.
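A minimal sketch combining the configuration functions with `duckdb_open_ext`; the option names `access_mode` and `threads` are regular DuckDB configuration options, and the file name is only an example:
```c
// Minimal sketch: open a database read-only via a configuration object.
duckdb_config config;
duckdb_database db;
char *error = NULL;
if (duckdb_create_config(&config) == DuckDBError) {
    // malloc failure
}
duckdb_set_config(config, "access_mode", "READ_ONLY");
duckdb_set_config(config, "threads", "4");
if (duckdb_open_ext("my_database.duckdb", &db, config, &error) == DuckDBError) {
    fprintf(stderr, "%s\n", error);
    duckdb_free(error);
}
duckdb_destroy_config(&config);   // always destroy the config, even on failure
// ... use the database, then duckdb_close(&db) ...
```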
###### `duckdb_create_error_data` {#docs:stable:clients:c:api::duckdb_create_error_data}
Creates a `duckdb_error_data` object.
Must be destroyed with `duckdb_destroy_error_data`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_error_data duckdb_create_error_data(
duckdb_error_type type,
const char *message
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The error type.
* `message`: The error message.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error data.
###### `duckdb_destroy_error_data` {#docs:stable:clients:c:api::duckdb_destroy_error_data}
Destroys the error data and deallocates its memory.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_error_data(
duckdb_error_data *error_data
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `error_data`: The error data to destroy.
###### `duckdb_error_data_error_type` {#docs:stable:clients:c:api::duckdb_error_data_error_type}
Returns the `duckdb_error_type` of the error data.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_error_type duckdb_error_data_error_type(
duckdb_error_data error_data
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `error_data`: The error data.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error type.
###### `duckdb_error_data_message` {#docs:stable:clients:c:api::duckdb_error_data_message}
Returns the error message of the error data. Must not be freed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_error_data_message(
duckdb_error_data error_data
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `error_data`: The error data.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error message.
###### `duckdb_error_data_has_error` {#docs:stable:clients:c:api::duckdb_error_data_has_error}
Returns whether the error data contains an error or not.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_error_data_has_error(
duckdb_error_data error_data
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `error_data`: The error data.
####### Return Value {#docs:stable:clients:c:api::return-value}
True if the error data contains an error, else false.
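A minimal sketch of the error-data lifecycle; `DUCKDB_ERROR_INVALID_INPUT` is assumed here to be a member of `duckdb_error_type`:
```c
// Minimal sketch: create, inspect, and destroy error data.
duckdb_error_data err = duckdb_create_error_data(DUCKDB_ERROR_INVALID_INPUT, "unexpected value");
if (duckdb_error_data_has_error(err)) {
    printf("error type %d: %s\n", (int) duckdb_error_data_error_type(err),
           duckdb_error_data_message(err));
}
duckdb_destroy_error_data(&err);
```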
###### `duckdb_query` {#docs:stable:clients:c:api::duckdb_query}
Executes a SQL query within a connection and stores the full (materialized) result in the out_result pointer.
If the query fails to execute, DuckDBError is returned and the error message can be retrieved by calling
`duckdb_result_error`.
Note that after running `duckdb_query`, `duckdb_destroy_result` must be called on the result object even if the
query fails, otherwise the error stored within the result will not be freed correctly.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_query(
duckdb_connection connection,
const char *query,
duckdb_result *out_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection to perform the query in.
* `query`: The SQL query to run.
* `out_result`: The query result.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
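A minimal sketch, assuming an open connection `con`:
```c
// Minimal sketch: run a query and always destroy the result, even on failure.
duckdb_result result;
if (duckdb_query(con, "SELECT 21 * 2 AS answer", &result) == DuckDBError) {
    fprintf(stderr, "%s\n", duckdb_result_error(&result));
}
duckdb_destroy_result(&result);
```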
###### `duckdb_destroy_result` {#docs:stable:clients:c:api::duckdb_destroy_result}
Closes the result and de-allocates all memory allocated for that result.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_result(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result to destroy.
###### `duckdb_column_name` {#docs:stable:clients:c:api::duckdb_column_name}
Returns the column name of the specified column. The result does not need to be freed; the column names will
automatically be destroyed when the result is destroyed.
Returns `NULL` if the column is out of range.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_column_name(
duckdb_result *result,
idx_t col
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the column name from.
* `col`: The column index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The column name of the specified column.
###### `duckdb_column_type` {#docs:stable:clients:c:api::duckdb_column_type}
Returns the column type of the specified column.
Returns `DUCKDB_TYPE_INVALID` if the column is out of range.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_type duckdb_column_type(
duckdb_result *result,
idx_t col
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the column type from.
* `col`: The column index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The column type of the specified column.
###### `duckdb_result_statement_type` {#docs:stable:clients:c:api::duckdb_result_statement_type}
Returns the statement type of the statement that was executed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_statement_type duckdb_result_statement_type(
duckdb_result result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the statement type from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_statement_type` value, or `DUCKDB_STATEMENT_TYPE_INVALID`.
###### `duckdb_column_logical_type` {#docs:stable:clients:c:api::duckdb_column_logical_type}
Returns the logical column type of the specified column.
The return type of this call should be destroyed with `duckdb_destroy_logical_type`.
Returns `NULL` if the column is out of range.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_column_logical_type(
duckdb_result *result,
idx_t col
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the column type from.
* `col`: The column index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical column type of the specified column.
###### `duckdb_result_get_arrow_options` {#docs:stable:clients:c:api::duckdb_result_get_arrow_options}
Returns the Arrow options associated with the given result. These options define how the Arrow arrays/schema
should be produced.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_arrow_options duckdb_result_get_arrow_options(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch arrow options from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The arrow options associated with the given result. This must be destroyed with
`duckdb_destroy_arrow_options`.
###### `duckdb_column_count` {#docs:stable:clients:c:api::duckdb_column_count}
Returns the number of columns present in the result object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_column_count(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of columns present in the result object.
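For example, the following sketch prints the name and type id of each column of a materialized `result`:
```c
// Minimal sketch: iterate the column metadata of a result.
idx_t columns = duckdb_column_count(&result);
for (idx_t col = 0; col < columns; col++) {
    printf("%s: type %d\n", duckdb_column_name(&result, col),
           (int) duckdb_column_type(&result, col));
}
```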
###### `duckdb_row_count` {#docs:stable:clients:c:api::duckdb_row_count}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Returns the number of rows present in the result object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_row_count(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of rows present in the result object.
###### `duckdb_rows_changed` {#docs:stable:clients:c:api::duckdb_rows_changed}
Returns the number of rows changed by the query stored in the result. This is relevant only for INSERT/UPDATE/DELETE
queries. For other queries, the `rows_changed` value will be 0.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_rows_changed(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of rows changed.
###### `duckdb_column_data` {#docs:stable:clients:c:api::duckdb_column_data}
> **Deprecated.** This method has been deprecated. Prefer using `duckdb_result_get_chunk` instead.
Returns the data of a specific column of a result in columnar format.
The function returns a dense array which contains the result data. The exact type stored in the array depends on the
corresponding duckdb_type (as provided by `duckdb_column_type`). For the exact type by which the data should be
accessed, see the comments in [the types section](#types) or the `DUCKDB_TYPE` enum.
For example, for a column of type `DUCKDB_TYPE_INTEGER`, rows can be accessed in the following manner:
```c
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
idx_t row = 0; // the row index to read
printf("Data for row %llu: %d\n", (unsigned long long) row, data[row]);
```
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_column_data(
duckdb_result *result,
idx_t col
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the column data from.
* `col`: The column index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The column data of the specified column.
###### `duckdb_nullmask_data` {#docs:stable:clients:c:api::duckdb_nullmask_data}
> **Deprecated.** This method has been deprecated. Prefer using `duckdb_result_get_chunk` instead.
Returns the nullmask of a specific column of a result in columnar format. The nullmask indicates for every row
whether or not the corresponding row is `NULL`. If a row is `NULL`, the values present in the array provided
by `duckdb_column_data` are undefined.
```c
int32_t *data = (int32_t *) duckdb_column_data(&result, 0);
bool *nullmask = duckdb_nullmask_data(&result, 0);
idx_t row = 0; // the row index to read
if (nullmask[row]) {
    printf("Data for row %llu: NULL\n", (unsigned long long) row);
} else {
    printf("Data for row %llu: %d\n", (unsigned long long) row, data[row]);
}
```
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool *duckdb_nullmask_data(
duckdb_result *result,
idx_t col
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the nullmask from.
* `col`: The column index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The nullmask of the specified column.
###### `duckdb_result_error` {#docs:stable:clients:c:api::duckdb_result_error}
Returns the error message contained within the result. The error is only set if `duckdb_query` returns `DuckDBError`.
The result of this function must not be freed. It will be cleaned up when `duckdb_destroy_result` is called.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_result_error(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the error from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error of the result.
###### `duckdb_result_error_type` {#docs:stable:clients:c:api::duckdb_result_error_type}
Returns the result error type contained within the result. The error is only set if `duckdb_query` returns
`DuckDBError`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_error_type duckdb_result_error_type(
duckdb_result *result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the error from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error type of the result.
###### `duckdb_result_get_chunk` {#docs:stable:clients:c:api::duckdb_result_get_chunk}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Fetches a data chunk from the duckdb_result. This function should be called repeatedly until the result is exhausted.
The result must be destroyed with `duckdb_destroy_data_chunk`.
This function supersedes all `duckdb_value` functions, as well as the `duckdb_column_data` and `duckdb_nullmask_data`
functions. It results in significantly better performance, and should be preferred in newer code-bases.
If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be
mixed with the legacy result functions).
Use `duckdb_result_chunk_count` to figure out how many chunks there are in the result.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_data_chunk duckdb_result_get_chunk(
duckdb_result result,
idx_t chunk_index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the data chunk from.
* `chunk_index`: The chunk index to fetch from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The resulting data chunk. Returns `NULL` if the chunk index is out of bounds.
###### `duckdb_result_is_streaming` {#docs:stable:clients:c:api::duckdb_result_is_streaming}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Checks if the type of the internal result is StreamQueryResult.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_result_is_streaming(
duckdb_result result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to check.
####### Return Value {#docs:stable:clients:c:api::return-value}
Whether or not the result object is of type `StreamQueryResult`.
###### `duckdb_result_chunk_count` {#docs:stable:clients:c:api::duckdb_result_chunk_count}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Returns the number of data chunks present in the result.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_result_chunk_count(
duckdb_result result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object
####### Return Value {#docs:stable:clients:c:api::return-value}
Number of data chunks present in the result.
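A minimal sketch iterating a materialized `result` with this (deprecated) API; newer code can instead call `duckdb_fetch_chunk` in a loop until it returns `NULL`:
```c
// Minimal sketch: visit every data chunk of a materialized result.
idx_t chunks = duckdb_result_chunk_count(result);
for (idx_t i = 0; i < chunks; i++) {
    duckdb_data_chunk chunk = duckdb_result_get_chunk(result, i);
    printf("chunk %llu has %llu rows\n",
           (unsigned long long) i, (unsigned long long) duckdb_data_chunk_get_size(chunk));
    duckdb_destroy_data_chunk(&chunk);
}
```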
###### `duckdb_result_return_type` {#docs:stable:clients:c:api::duckdb_result_return_type}
Returns the return type of the given result, or `DUCKDB_RETURN_TYPE_INVALID` on error.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_result_type duckdb_result_return_type(
duckdb_result result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object
####### Return Value {#docs:stable:clients:c:api::return-value}
The return type of the result.
###### `duckdb_value_boolean` {#docs:stable:clients:c:api::duckdb_value_boolean}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The boolean value at the specified location, or false if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_value_boolean(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_int8` {#docs:stable:clients:c:api::duckdb_value_int8}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The int8_t value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
int8_t duckdb_value_int8(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_int16` {#docs:stable:clients:c:api::duckdb_value_int16}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The int16_t value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
int16_t duckdb_value_int16(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_int32` {#docs:stable:clients:c:api::duckdb_value_int32}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The int32_t value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
int32_t duckdb_value_int32(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_int64` {#docs:stable:clients:c:api::duckdb_value_int64}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The int64_t value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
int64_t duckdb_value_int64(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_hugeint` {#docs:stable:clients:c:api::duckdb_value_hugeint}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb_hugeint value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_hugeint duckdb_value_hugeint(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_uhugeint` {#docs:stable:clients:c:api::duckdb_value_uhugeint}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb_uhugeint value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_uhugeint duckdb_value_uhugeint(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_decimal` {#docs:stable:clients:c:api::duckdb_value_decimal}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb_decimal value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_decimal duckdb_value_decimal(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_uint8` {#docs:stable:clients:c:api::duckdb_value_uint8}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The uint8_t value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint8_t duckdb_value_uint8(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_uint16` {#docs:stable:clients:c:api::duckdb_value_uint16}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The uint16_t value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint16_t duckdb_value_uint16(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_uint32` {#docs:stable:clients:c:api::duckdb_value_uint32}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The uint32_t value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint32_t duckdb_value_uint32(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_uint64` {#docs:stable:clients:c:api::duckdb_value_uint64}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The uint64_t value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint64_t duckdb_value_uint64(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_float` {#docs:stable:clients:c:api::duckdb_value_float}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The float value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
float duckdb_value_float(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_double` {#docs:stable:clients:c:api::duckdb_value_double}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The double value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
double duckdb_value_double(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_date` {#docs:stable:clients:c:api::duckdb_value_date}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb_date value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_date duckdb_value_date(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_time` {#docs:stable:clients:c:api::duckdb_value_time}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb_time value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_time duckdb_value_time(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_timestamp` {#docs:stable:clients:c:api::duckdb_value_timestamp}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb_timestamp value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_timestamp duckdb_value_timestamp(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_interval` {#docs:stable:clients:c:api::duckdb_value_interval}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb_interval value at the specified location, or 0 if the value cannot be converted.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_interval duckdb_value_interval(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_varchar` {#docs:stable:clients:c:api::duckdb_value_varchar}
> **Deprecated.** This method has been deprecated. Use duckdb_value_string instead. This function does not work correctly if the string contains null
bytes.
####### Return Value {#docs:stable:clients:c:api::return-value}
The text value at the specified location as a null-terminated string, or nullptr if the value cannot be
converted. The result must be freed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
char *duckdb_value_varchar(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_string` {#docs:stable:clients:c:api::duckdb_value_string}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
No support for nested types or other complex types.
The resulting field `string.data` must be freed with `duckdb_free`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The string value at the specified location. Attempts to cast the result value to string.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_string duckdb_value_string(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_varchar_internal` {#docs:stable:clients:c:api::duckdb_value_varchar_internal}
> **Deprecated.** This method has been deprecated. Use duckdb_value_string_internal instead. This function does not work correctly if the string contains
null bytes.
####### Return Value {#docs:stable:clients:c:api::return-value}
The char* value at the specified location. ONLY works on VARCHAR columns and does not auto-cast.
If the column is NOT a VARCHAR column this function will return NULL.
The result must NOT be freed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
char *duckdb_value_varchar_internal(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_string_internal` {#docs:stable:clients:c:api::duckdb_value_string_internal}
> **Deprecated.** This method has been deprecated. This function does not work correctly if the string contains
null bytes.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_string` value at the specified location. ONLY works on VARCHAR columns and does not auto-cast.
If the column is NOT a VARCHAR column this function will return NULL.
The result must NOT be freed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_string duckdb_value_string_internal(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_blob` {#docs:stable:clients:c:api::duckdb_value_blob}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb_blob value at the specified location. Returns a blob with `blob.data` set to `nullptr` if the
value cannot be converted. The resulting field `blob.data` must be freed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_blob duckdb_value_blob(
duckdb_result *result,
idx_t col,
idx_t row
);
```
###### `duckdb_value_is_null` {#docs:stable:clients:c:api::duckdb_value_is_null}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
####### Return Value {#docs:stable:clients:c:api::return-value}
Returns true if the value at the specified index is NULL, and false otherwise.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_value_is_null(
duckdb_result *result,
idx_t col,
idx_t row
);
```
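As a rough illustration of these deprecated accessors, the sketch below walks an already-materialized result, checking for NULLs before converting values and freeing the strings it receives. The column layout (an INTEGER followed by a VARCHAR) is an assumption made purely for the example.
```c
#include <stdio.h>
#include "duckdb.h"

// Print an already-materialized result using the deprecated per-value accessors.
// Assumes column 0 is an INTEGER and column 1 is a VARCHAR.
void print_result(duckdb_result *result) {
	idx_t row_count = duckdb_row_count(result); // also deprecated, shown for symmetry
	for (idx_t row = 0; row < row_count; row++) {
		if (duckdb_value_is_null(result, 0, row)) {
			printf("NULL");
		} else {
			printf("%d", duckdb_value_int32(result, 0, row));
		}
		// duckdb_value_varchar allocates; the returned string must be freed with duckdb_free
		char *text = duckdb_value_varchar(result, 1, row);
		printf(" | %s\n", text ? text : "NULL");
		duckdb_free(text);
	}
}
```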
###### `duckdb_malloc` {#docs:stable:clients:c:api::duckdb_malloc}
Allocate `size` bytes of memory using the duckdb internal malloc function. Any memory allocated in this manner
should be freed using `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_malloc(
size_t size
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `size`: The number of bytes to allocate.
####### Return Value {#docs:stable:clients:c:api::return-value}
A pointer to the allocated memory region.
###### `duckdb_free` {#docs:stable:clients:c:api::duckdb_free}
Free a value returned from `duckdb_malloc`, `duckdb_value_varchar`, `duckdb_value_blob`, or
`duckdb_value_string`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_free(
void *ptr
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `ptr`: The memory region to de-allocate.
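A minimal sketch pairing the two allocation functions. Note that strings and blobs handed out by the deprecated value accessors must be released the same way, with `duckdb_free` rather than the system `free`.
```c
#include <string.h>
#include "duckdb.h"

// Allocate a scratch buffer with DuckDB's allocator and release it again.
void scratch_example(void) {
	char *buffer = (char *) duckdb_malloc(64);
	if (!buffer) {
		return; // allocation failed
	}
	strcpy(buffer, "hello from duckdb_malloc");
	duckdb_free(buffer); // never use the system free() for this pointer
}
```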
###### `duckdb_vector_size` {#docs:stable:clients:c:api::duckdb_vector_size}
The internal vector size used by DuckDB.
This is the amount of tuples that will fit into a data chunk created by `duckdb_create_data_chunk`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The vector size.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_vector_size(
);
```
###### `duckdb_string_is_inlined` {#docs:stable:clients:c:api::duckdb_string_is_inlined}
Whether or not the duckdb_string_t value is inlined.
This means that the data of the string does not have a separate allocation.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_string_is_inlined(
duckdb_string_t string
);
```
###### `duckdb_string_t_length` {#docs:stable:clients:c:api::duckdb_string_t_length}
Get the string length of a string_t
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint32_t duckdb_string_t_length(
duckdb_string_t string
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `string`: The string to get the length of.
####### Return Value {#docs:stable:clients:c:api::return-value}
The length.
###### `duckdb_string_t_data` {#docs:stable:clients:c:api::duckdb_string_t_data}
Get a pointer to the string data of a string_t
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_string_t_data(
duckdb_string_t *string
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `string`: The string to get the pointer to.
####### Return Value {#docs:stable:clients:c:api::return-value}
The pointer.
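A short sketch of reading a VARCHAR vector with these helpers, assuming a data chunk whose first column is a VARCHAR (e.g., obtained via `duckdb_fetch_chunk`). Since `duckdb_string_t` values are not null-terminated, the length and data pointer are always used together.
```c
#include <stdio.h>
#include "duckdb.h"

// Walk the VARCHAR column 0 of one data chunk.
void print_varchar_column(duckdb_data_chunk chunk) {
	duckdb_vector vec = duckdb_data_chunk_get_vector(chunk, 0);
	duckdb_string_t *strings = (duckdb_string_t *) duckdb_vector_get_data(vec);
	uint64_t *validity = duckdb_vector_get_validity(vec);
	idx_t count = duckdb_data_chunk_get_size(chunk);
	for (idx_t row = 0; row < count; row++) {
		if (!duckdb_validity_row_is_valid(validity, row)) {
			printf("NULL\n");
			continue;
		}
		// duckdb_string_t is not null-terminated: pair the data pointer with the length
		uint32_t length = duckdb_string_t_length(strings[row]);
		const char *data = duckdb_string_t_data(&strings[row]);
		printf("%.*s (inlined: %s)\n", (int) length, data,
		       duckdb_string_is_inlined(strings[row]) ? "yes" : "no");
	}
}
```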
###### `duckdb_from_date` {#docs:stable:clients:c:api::duckdb_from_date}
Decompose a `duckdb_date` object into year, month and day (stored as a `duckdb_date_struct`).
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_date_struct duckdb_from_date(
duckdb_date date
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `date`: The date object, as obtained from a `DUCKDB_TYPE_DATE` column.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_date_struct` with the decomposed elements.
###### `duckdb_to_date` {#docs:stable:clients:c:api::duckdb_to_date}
Re-compose a `duckdb_date` from year, month and day (`duckdb_date_struct`).
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_date duckdb_to_date(
duckdb_date_struct date
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `date`: The year, month and day stored in a `duckdb_date_struct`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_date` element.
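A small sketch that round-trips a date through its decomposed form, shifting it by one year along the way:
```c
#include <stdio.h>
#include "duckdb.h"

// Decompose a duckdb_date, adjust it, and re-compose it.
void shift_date_example(duckdb_date date) {
	duckdb_date_struct parts = duckdb_from_date(date);
	printf("%04d-%02d-%02d\n", parts.year, parts.month, parts.day);
	parts.year += 1; // e.g., move the date one year ahead
	duckdb_date next_year = duckdb_to_date(parts);
	(void) next_year; // bind or store the re-composed date as needed
}
```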
###### `duckdb_is_finite_date` {#docs:stable:clients:c:api::duckdb_is_finite_date}
Test a `duckdb_date` to see if it is a finite value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_is_finite_date(
duckdb_date date
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `date`: The date object, as obtained from a `DUCKDB_TYPE_DATE` column.
####### Return Value {#docs:stable:clients:c:api::return-value}
True if the date is finite, false if it is ±infinity.
###### `duckdb_from_time` {#docs:stable:clients:c:api::duckdb_from_time}
Decompose a `duckdb_time` object into hour, minute, second and microsecond (stored as `duckdb_time_struct`).
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_time_struct duckdb_from_time(
duckdb_time time
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `time`: The time object, as obtained from a `DUCKDB_TYPE_TIME` column.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_time_struct` with the decomposed elements.
###### `duckdb_create_time_tz` {#docs:stable:clients:c:api::duckdb_create_time_tz}
Create a `duckdb_time_tz` object from micros and a timezone offset.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_time_tz duckdb_create_time_tz(
int64_t micros,
int32_t offset
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `micros`: The microsecond component of the time.
* `offset`: The timezone offset component of the time.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_time_tz` element.
###### `duckdb_from_time_tz` {#docs:stable:clients:c:api::duckdb_from_time_tz}
Decompose a TIME_TZ object into micros and a timezone offset.
Use `duckdb_from_time` to further decompose the micros into hour, minute, second and microsecond.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_time_tz_struct duckdb_from_time_tz(
duckdb_time_tz micros
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `micros`: The time object, as obtained from a `DUCKDB_TYPE_TIME_TZ` column.
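A brief sketch combining `duckdb_create_time_tz` and `duckdb_from_time_tz`, assuming the offset is expressed in seconds and the micros count from midnight (as in DuckDB's internal representation):
```c
#include <stdio.h>
#include "duckdb.h"

// Build a TIME_TZ value for 14:30:00 at UTC+02:00 and decompose it again.
void time_tz_example(void) {
	int64_t micros = ((14 * 60 + 30) * 60) * (int64_t) 1000000; // 14:30:00 since midnight
	int32_t offset = 2 * 60 * 60;                               // +02:00, in seconds
	duckdb_time_tz value = duckdb_create_time_tz(micros, offset);
	duckdb_time_tz_struct parts = duckdb_from_time_tz(value);
	printf("%02d:%02d:%02d offset %d s\n",
	       parts.time.hour, parts.time.min, parts.time.sec, parts.offset);
}
```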
###### `duckdb_to_time` {#docs:stable:clients:c:api::duckdb_to_time}
Re-compose a `duckdb_time` from hour, minute, second and microsecond (`duckdb_time_struct`).
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_time duckdb_to_time(
duckdb_time_struct time
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `time`: The hour, minute, second and microsecond in a `duckdb_time_struct`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_time` element.
###### `duckdb_from_timestamp` {#docs:stable:clients:c:api::duckdb_from_timestamp}
Decompose a `duckdb_timestamp` object into a `duckdb_timestamp_struct`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_timestamp_struct duckdb_from_timestamp(
duckdb_timestamp ts
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `ts`: The ts object, as obtained from a `DUCKDB_TYPE_TIMESTAMP` column.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_timestamp_struct` with the decomposed elements.
###### `duckdb_to_timestamp` {#docs:stable:clients:c:api::duckdb_to_timestamp}
Re-compose a `duckdb_timestamp` from a duckdb_timestamp_struct.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_timestamp duckdb_to_timestamp(
duckdb_timestamp_struct ts
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `ts`: The de-composed elements in a `duckdb_timestamp_struct`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_timestamp` element.
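A short sketch that decomposes a timestamp, prints its parts, and re-composes it, skipping infinite values:
```c
#include <stdio.h>
#include "duckdb.h"

// Round-trip a duckdb_timestamp through its decomposed form.
void timestamp_roundtrip(duckdb_timestamp ts) {
	if (!duckdb_is_finite_timestamp(ts)) {
		printf("infinite timestamp\n");
		return;
	}
	duckdb_timestamp_struct parts = duckdb_from_timestamp(ts);
	printf("%04d-%02d-%02d %02d:%02d:%02d.%06d\n",
	       parts.date.year, parts.date.month, parts.date.day,
	       parts.time.hour, parts.time.min, parts.time.sec, parts.time.micros);
	duckdb_timestamp back = duckdb_to_timestamp(parts);
	(void) back; // identical to ts for finite values
}
```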
###### `duckdb_is_finite_timestamp` {#docs:stable:clients:c:api::duckdb_is_finite_timestamp}
Test a `duckdb_timestamp` to see if it is a finite value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_is_finite_timestamp(
duckdb_timestamp ts
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `ts`: The duckdb_timestamp object, as obtained from a `DUCKDB_TYPE_TIMESTAMP` column.
####### Return Value {#docs:stable:clients:c:api::return-value}
True if the timestamp is finite, false if it is ±infinity.
###### `duckdb_is_finite_timestamp_s` {#docs:stable:clients:c:api::duckdb_is_finite_timestamp_s}
Test a `duckdb_timestamp_s` to see if it is a finite value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_is_finite_timestamp_s(
duckdb_timestamp_s ts
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `ts`: The duckdb_timestamp_s object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_S` column.
####### Return Value {#docs:stable:clients:c:api::return-value}
True if the timestamp is finite, false if it is ±infinity.
###### `duckdb_is_finite_timestamp_ms` {#docs:stable:clients:c:api::duckdb_is_finite_timestamp_ms}
Test a `duckdb_timestamp_ms` to see if it is a finite value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_is_finite_timestamp_ms(
duckdb_timestamp_ms ts
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `ts`: The duckdb_timestamp_ms object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_MS` column.
####### Return Value {#docs:stable:clients:c:api::return-value}
True if the timestamp is finite, false if it is ±infinity.
###### `duckdb_is_finite_timestamp_ns` {#docs:stable:clients:c:api::duckdb_is_finite_timestamp_ns}
Test a `duckdb_timestamp_ns` to see if it is a finite value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_is_finite_timestamp_ns(
duckdb_timestamp_ns ts
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `ts`: The duckdb_timestamp_ns object, as obtained from a `DUCKDB_TYPE_TIMESTAMP_NS` column.
####### Return Value {#docs:stable:clients:c:api::return-value}
True if the timestamp is finite, false if it is ±infinity.
###### `duckdb_hugeint_to_double` {#docs:stable:clients:c:api::duckdb_hugeint_to_double}
Converts a duckdb_hugeint object (as obtained from a `DUCKDB_TYPE_HUGEINT` column) into a double.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
double duckdb_hugeint_to_double(
duckdb_hugeint val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: The hugeint value.
####### Return Value {#docs:stable:clients:c:api::return-value}
The converted `double` element.
###### `duckdb_double_to_hugeint` {#docs:stable:clients:c:api::duckdb_double_to_hugeint}
Converts a double value to a duckdb_hugeint object.
If the conversion fails because the double value is too big, the result will be 0.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_hugeint duckdb_double_to_hugeint(
double val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: The double value.
####### Return Value {#docs:stable:clients:c:api::return-value}
The converted `duckdb_hugeint` element.
###### `duckdb_uhugeint_to_double` {#docs:stable:clients:c:api::duckdb_uhugeint_to_double}
Converts a duckdb_uhugeint object (as obtained from a `DUCKDB_TYPE_UHUGEINT` column) into a double.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
double duckdb_uhugeint_to_double(
duckdb_uhugeint val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: The uhugeint value.
####### Return Value {#docs:stable:clients:c:api::return-value}
The converted `double` element.
###### `duckdb_double_to_uhugeint` {#docs:stable:clients:c:api::duckdb_double_to_uhugeint}
Converts a double value to a duckdb_uhugeint object.
If the conversion fails because the double value is too big, the result will be 0.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_uhugeint duckdb_double_to_uhugeint(
double val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: The double value.
####### Return Value {#docs:stable:clients:c:api::return-value}
The converted `duckdb_uhugeint` element.
###### `duckdb_double_to_decimal` {#docs:stable:clients:c:api::duckdb_double_to_decimal}
Converts a double value to a duckdb_decimal object.
If the conversion fails because the double value is too big, or the width/scale are invalid, the result will be 0.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_decimal duckdb_double_to_decimal(
double val,
uint8_t width,
uint8_t scale
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: The double value.
* `width`: The width of the target decimal.
* `scale`: The scale of the target decimal.
####### Return Value {#docs:stable:clients:c:api::return-value}
The converted `duckdb_decimal` element.
###### `duckdb_decimal_to_double` {#docs:stable:clients:c:api::duckdb_decimal_to_double}
Converts a duckdb_decimal object (as obtained from a `DUCKDB_TYPE_DECIMAL` column) into a double.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
double duckdb_decimal_to_double(
duckdb_decimal val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: The decimal value.
####### Return Value {#docs:stable:clients:c:api::return-value}
The converted `double` element.
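A minimal sketch of these conversion helpers round-tripping a double through DECIMAL and HUGEINT representations; the width and scale (18, 3) are arbitrary values chosen for the example:
```c
#include <stdio.h>
#include "duckdb.h"

// Convert a double into DECIMAL(18, 3) and HUGEINT representations and back.
void conversion_example(void) {
	double input = 123456.789;
	duckdb_decimal dec = duckdb_double_to_decimal(input, 18, 3);
	duckdb_hugeint huge = duckdb_double_to_hugeint(input);
	printf("decimal back: %f\n", duckdb_decimal_to_double(dec));
	printf("hugeint back: %f\n", duckdb_hugeint_to_double(huge));
}
```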
###### `duckdb_prepare` {#docs:stable:clients:c:api::duckdb_prepare}
Create a prepared statement object from a query.
Note that after calling `duckdb_prepare`, the prepared statement should always be destroyed using
`duckdb_destroy_prepare`, even if the prepare fails.
If the prepare fails, `duckdb_prepare_error` can be called to obtain the reason why the prepare failed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_prepare(
duckdb_connection connection,
const char *query,
duckdb_prepared_statement *out_prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection object
* `query`: The SQL query to prepare
* `out_prepared_statement`: The resulting prepared statement object
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
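A minimal sketch of the prepare/error-handling pattern described above; the query and table name are placeholders:
```c
#include <stdio.h>
#include "duckdb.h"

// Prepare a (hypothetical) query and report errors. The prepared statement is
// destroyed in every code path, even when preparation fails.
duckdb_state prepare_example(duckdb_connection con) {
	duckdb_prepared_statement stmt;
	if (duckdb_prepare(con, "SELECT * FROM my_table WHERE id = ?", &stmt) == DuckDBError) {
		fprintf(stderr, "prepare failed: %s\n", duckdb_prepare_error(stmt));
		duckdb_destroy_prepare(&stmt); // required even on failure
		return DuckDBError;
	}
	// ... bind parameters and execute here ...
	duckdb_destroy_prepare(&stmt);
	return DuckDBSuccess;
}
```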
###### `duckdb_destroy_prepare` {#docs:stable:clients:c:api::duckdb_destroy_prepare}
Closes the prepared statement and de-allocates all memory allocated for the statement.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_prepare(
duckdb_prepared_statement *prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement to destroy.
###### `duckdb_prepare_error` {#docs:stable:clients:c:api::duckdb_prepare_error}
Returns the error message associated with the given prepared statement.
If the prepared statement has no error message, this returns `nullptr` instead.
The error message should not be freed. It will be de-allocated when `duckdb_destroy_prepare` is called.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_prepare_error(
duckdb_prepared_statement prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement to obtain the error from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error message, or `nullptr` if there is none.
###### `duckdb_nparams` {#docs:stable:clients:c:api::duckdb_nparams}
Returns the number of parameters that can be provided to the given prepared statement.
Returns 0 if the query was not successfully prepared.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_nparams(
duckdb_prepared_statement prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement to obtain the number of parameters for.
###### `duckdb_parameter_name` {#docs:stable:clients:c:api::duckdb_parameter_name}
Returns the name used to identify the parameter.
The returned string should be freed using `duckdb_free`.
Returns NULL if the index is out of range for the provided prepared statement.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_parameter_name(
duckdb_prepared_statement prepared_statement,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement for which to get the parameter name from.
###### `duckdb_param_type` {#docs:stable:clients:c:api::duckdb_param_type}
Returns the parameter type for the parameter at the given index.
Returns `DUCKDB_TYPE_INVALID` if the parameter index is out of range or the statement was not successfully prepared.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_type duckdb_param_type(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement.
* `param_idx`: The parameter index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The parameter type
###### `duckdb_param_logical_type` {#docs:stable:clients:c:api::duckdb_param_logical_type}
Returns the logical type for the parameter at the given index.
Returns `nullptr` if the parameter index is out of range or the statement was not successfully prepared.
The return type of this call should be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_param_logical_type(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement.
* `param_idx`: The parameter index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical type of the parameter
###### `duckdb_clear_bindings` {#docs:stable:clients:c:api::duckdb_clear_bindings}
Clears the parameters bound to the prepared statement.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_clear_bindings(
duckdb_prepared_statement prepared_statement
);
```
###### `duckdb_prepared_statement_type` {#docs:stable:clients:c:api::duckdb_prepared_statement_type}
Returns the statement type of the statement to be executed
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_statement_type duckdb_prepared_statement_type(
duckdb_prepared_statement statement
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `statement`: The prepared statement.
####### Return Value {#docs:stable:clients:c:api::return-value}
duckdb_statement_type value or DUCKDB_STATEMENT_TYPE_INVALID
###### `duckdb_prepared_statement_column_count` {#docs:stable:clients:c:api::duckdb_prepared_statement_column_count}
Returns the number of columns present in the result of the prepared statement. If any of the column types are invalid,
the result will be 1.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_prepared_statement_column_count(
duckdb_prepared_statement prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of columns present in the result of the prepared statement.
###### `duckdb_prepared_statement_column_name` {#docs:stable:clients:c:api::duckdb_prepared_statement_column_name}
Returns the name of the specified column of the result of the prepared_statement.
The returned string should be freed using `duckdb_free`.
Returns `nullptr` if the column is out of range.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_prepared_statement_column_name(
duckdb_prepared_statement prepared_statement,
idx_t col_idx
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement.
* `col_idx`: The column index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The column name of the specified column.
###### `duckdb_prepared_statement_column_logical_type` {#docs:stable:clients:c:api::duckdb_prepared_statement_column_logical_type}
Returns the column type of the specified column of the result of the prepared_statement.
Returns `DUCKDB_TYPE_INVALID` if the column is out of range.
The return type of this call should be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_prepared_statement_column_logical_type(
duckdb_prepared_statement prepared_statement,
idx_t col_idx
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement to fetch the column type from.
* `col_idx`: The column index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical type of the specified column.
###### `duckdb_prepared_statement_column_type` {#docs:stable:clients:c:api::duckdb_prepared_statement_column_type}
Returns the column type of the specified column of the result of the prepared_statement.
Returns `DUCKDB_TYPE_INVALID` if the column is out of range.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_type duckdb_prepared_statement_column_type(
duckdb_prepared_statement prepared_statement,
idx_t col_idx
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement to fetch the column type from.
* `col_idx`: The column index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The type of the specified column.
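A short sketch that uses these introspection functions to print the result schema of an already-prepared statement:
```c
#include <stdio.h>
#include "duckdb.h"

// Print the column names and type ids of a prepared statement's result.
void describe_prepared(duckdb_prepared_statement stmt) {
	idx_t count = duckdb_prepared_statement_column_count(stmt);
	for (idx_t col = 0; col < count; col++) {
		const char *name = duckdb_prepared_statement_column_name(stmt, col);
		duckdb_type type = duckdb_prepared_statement_column_type(stmt, col);
		printf("column %llu: %s (type id %d)\n",
		       (unsigned long long) col, name ? name : "?", (int) type);
		duckdb_free((void *) name); // the returned name must be freed
	}
}
```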
###### `duckdb_bind_value` {#docs:stable:clients:c:api::duckdb_bind_value}
Binds a value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_value(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_value val
);
```
###### `duckdb_bind_parameter_index` {#docs:stable:clients:c:api::duckdb_bind_parameter_index}
Retrieve the index of the parameter for the prepared statement, identified by name
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_parameter_index(
duckdb_prepared_statement prepared_statement,
idx_t *param_idx_out,
const char *name
);
```
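A brief sketch of binding by parameter name, assuming the statement was prepared from a query that uses a named parameter such as `$name` (the query text itself is hypothetical):
```c
#include <stdio.h>
#include "duckdb.h"

// Resolve a named parameter (e.g., from "SELECT * FROM people WHERE name = $name")
// and bind to it by its 1-based index.
duckdb_state bind_by_name(duckdb_prepared_statement stmt) {
	idx_t param_idx;
	if (duckdb_bind_parameter_index(stmt, &param_idx, "name") == DuckDBError) {
		return DuckDBError; // no parameter called "name"
	}
	// The name can also be read back; the returned string must be freed.
	const char *name = duckdb_parameter_name(stmt, param_idx);
	printf("binding parameter %llu (%s)\n", (unsigned long long) param_idx, name);
	duckdb_free((void *) name);
	return duckdb_bind_varchar(stmt, param_idx, "Ada");
}
```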
###### `duckdb_bind_boolean` {#docs:stable:clients:c:api::duckdb_bind_boolean}
Binds a bool value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_boolean(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
bool val
);
```
###### `duckdb_bind_int8` {#docs:stable:clients:c:api::duckdb_bind_int8}
Binds an int8_t value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_int8(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int8_t val
);
```
###### `duckdb_bind_int16` {#docs:stable:clients:c:api::duckdb_bind_int16}
Binds an int16_t value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_int16(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int16_t val
);
```
###### `duckdb_bind_int32` {#docs:stable:clients:c:api::duckdb_bind_int32}
Binds an int32_t value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_int32(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int32_t val
);
```
###### `duckdb_bind_int64` {#docs:stable:clients:c:api::duckdb_bind_int64}
Binds an int64_t value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_int64(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
int64_t val
);
```
###### `duckdb_bind_hugeint` {#docs:stable:clients:c:api::duckdb_bind_hugeint}
Binds a duckdb_hugeint value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_hugeint(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_hugeint val
);
```
###### `duckdb_bind_uhugeint` {#docs:stable:clients:c:api::duckdb_bind_uhugeint}
Binds a duckdb_uhugeint value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_uhugeint(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_uhugeint val
);
```
###### `duckdb_bind_decimal` {#docs:stable:clients:c:api::duckdb_bind_decimal}
Binds a duckdb_decimal value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_decimal(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_decimal val
);
```
###### `duckdb_bind_uint8` {#docs:stable:clients:c:api::duckdb_bind_uint8}
Binds a uint8_t value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_uint8(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint8_t val
);
```
###### `duckdb_bind_uint16` {#docs:stable:clients:c:api::duckdb_bind_uint16}
Binds a uint16_t value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_uint16(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint16_t val
);
```
###### `duckdb_bind_uint32` {#docs:stable:clients:c:api::duckdb_bind_uint32}
Binds a uint32_t value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_uint32(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint32_t val
);
```
###### `duckdb_bind_uint64` {#docs:stable:clients:c:api::duckdb_bind_uint64}
Binds a uint64_t value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_uint64(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
uint64_t val
);
```
###### `duckdb_bind_float` {#docs:stable:clients:c:api::duckdb_bind_float}
Binds a float value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_float(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
float val
);
```
###### `duckdb_bind_double` {#docs:stable:clients:c:api::duckdb_bind_double}
Binds a double value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_double(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
double val
);
```
###### `duckdb_bind_date` {#docs:stable:clients:c:api::duckdb_bind_date}
Binds a duckdb_date value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_date(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_date val
);
```
###### `duckdb_bind_time` {#docs:stable:clients:c:api::duckdb_bind_time}
Binds a duckdb_time value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_time(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_time val
);
```
###### `duckdb_bind_timestamp` {#docs:stable:clients:c:api::duckdb_bind_timestamp}
Binds a duckdb_timestamp value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_timestamp(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_timestamp val
);
```
###### `duckdb_bind_timestamp_tz` {#docs:stable:clients:c:api::duckdb_bind_timestamp_tz}
Binds a duckdb_timestamp value, interpreted as a TIMESTAMP WITH TIME ZONE, to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_timestamp_tz(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_timestamp val
);
```
###### `duckdb_bind_interval` {#docs:stable:clients:c:api::duckdb_bind_interval}
Binds a duckdb_interval value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_interval(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
duckdb_interval val
);
```
###### `duckdb_bind_varchar` {#docs:stable:clients:c:api::duckdb_bind_varchar}
Binds a null-terminated varchar value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_varchar(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
const char *val
);
```
###### `duckdb_bind_varchar_length` {#docs:stable:clients:c:api::duckdb_bind_varchar_length}
Binds a varchar value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_varchar_length(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
const char *val,
idx_t length
);
```
###### `duckdb_bind_blob` {#docs:stable:clients:c:api::duckdb_bind_blob}
Binds a blob value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_blob(
duckdb_prepared_statement prepared_statement,
idx_t param_idx,
const void *data,
idx_t length
);
```
###### `duckdb_bind_null` {#docs:stable:clients:c:api::duckdb_bind_null}
Binds a NULL value to the prepared statement at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_bind_null(
duckdb_prepared_statement prepared_statement,
idx_t param_idx
);
```
###### `duckdb_execute_prepared` {#docs:stable:clients:c:api::duckdb_execute_prepared}
Executes the prepared statement with the given bound parameters, and returns a materialized query result.
This method can be called multiple times for each prepared statement, and the parameters can be modified
between calls to this function.
Note that the result must be freed with `duckdb_destroy_result`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_execute_prepared(
duckdb_prepared_statement prepared_statement,
duckdb_result *out_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement to execute.
* `out_result`: The query result.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
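A minimal sketch of re-using a prepared statement with different bindings; the underlying query (an INSERT with two parameters) is assumed for illustration:
```c
#include "duckdb.h"

// Execute the same prepared statement twice with different bindings.
// Assumes `stmt` was prepared from something like "INSERT INTO events VALUES (?, ?)".
duckdb_state run_twice(duckdb_prepared_statement stmt) {
	duckdb_result result;
	duckdb_bind_int32(stmt, 1, 1);
	duckdb_bind_varchar(stmt, 2, "first");
	if (duckdb_execute_prepared(stmt, &result) == DuckDBError) {
		duckdb_destroy_result(&result); // the error message lives in the result
		return DuckDBError;
	}
	duckdb_destroy_result(&result); // always free the materialized result

	// Re-bind and execute again; no need to re-prepare.
	duckdb_bind_int32(stmt, 1, 2);
	duckdb_bind_varchar(stmt, 2, "second");
	duckdb_state state = duckdb_execute_prepared(stmt, &result);
	duckdb_destroy_result(&result);
	return state;
}
```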
###### `duckdb_execute_prepared_streaming` {#docs:stable:clients:c:api::duckdb_execute_prepared_streaming}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Executes the prepared statement with the given bound parameters, and returns an optionally-streaming query result.
To determine if the resulting query was in fact streamed, use `duckdb_result_is_streaming`.
This method can be called multiple times for each prepared statement, and the parameters can be modified
between calls to this function.
Note that the result must be freed with `duckdb_destroy_result`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_execute_prepared_streaming(
duckdb_prepared_statement prepared_statement,
duckdb_result *out_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement to execute.
* `out_result`: The query result.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_extract_statements` {#docs:stable:clients:c:api::duckdb_extract_statements}
Extract all statements from a query.
Note that after calling `duckdb_extract_statements`, the extracted statements should always be destroyed using
`duckdb_destroy_extracted`, even if no statements were extracted.
If the extract fails, `duckdb_extract_statements_error` can be called to obtain the reason why the extract failed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_extract_statements(
duckdb_connection connection,
const char *query,
duckdb_extracted_statements *out_extracted_statements
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection object
* `query`: The SQL query to extract
* `out_extracted_statements`: The resulting extracted statements object
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of extracted statements or 0 on failure.
###### `duckdb_prepare_extracted_statement` {#docs:stable:clients:c:api::duckdb_prepare_extracted_statement}
Prepare an extracted statement.
Note that after calling `duckdb_prepare_extracted_statement`, the prepared statement should always be destroyed using
`duckdb_destroy_prepare`, even if the prepare fails.
If the prepare fails, `duckdb_prepare_error` can be called to obtain the reason why the prepare failed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_prepare_extracted_statement(
duckdb_connection connection,
duckdb_extracted_statements extracted_statements,
idx_t index,
duckdb_prepared_statement *out_prepared_statement
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection object
* `extracted_statements`: The extracted statements object
* `index`: The index of the extracted statement to prepare
* `out_prepared_statement`: The resulting prepared statement object
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_extract_statements_error` {#docs:stable:clients:c:api::duckdb_extract_statements_error}
Returns the error message contained within the extracted statements.
The result of this function must not be freed. It will be cleaned up when `duckdb_destroy_extracted` is called.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_extract_statements_error(
duckdb_extracted_statements extracted_statements
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `extracted_statements`: The extracted statements to fetch the error from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error of the extracted statements.
###### `duckdb_destroy_extracted` {#docs:stable:clients:c:api::duckdb_destroy_extracted}
De-allocates all memory allocated for the extracted statements.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_extracted(
duckdb_extracted_statements *extracted_statements
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `extracted_statements`: The extracted statements to destroy.
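Putting the extraction functions together, a rough sketch that splits a small multi-statement script (the SQL shown is only an example), prepares and executes each statement, and cleans up in every code path:
```c
#include <stdio.h>
#include "duckdb.h"

// Split a multi-statement string and run each statement in turn.
void run_script(duckdb_connection con) {
	duckdb_extracted_statements stmts;
	const char *script = "CREATE TABLE t (i INTEGER); INSERT INTO t VALUES (42);";
	idx_t n = duckdb_extract_statements(con, script, &stmts);
	if (n == 0) {
		fprintf(stderr, "extract failed: %s\n", duckdb_extract_statements_error(stmts));
		duckdb_destroy_extracted(&stmts); // required even when nothing was extracted
		return;
	}
	for (idx_t i = 0; i < n; i++) {
		duckdb_prepared_statement stmt;
		duckdb_result result;
		if (duckdb_prepare_extracted_statement(con, stmts, i, &stmt) == DuckDBSuccess) {
			duckdb_execute_prepared(stmt, &result);
			duckdb_destroy_result(&result);
		}
		duckdb_destroy_prepare(&stmt); // required even when the prepare failed
	}
	duckdb_destroy_extracted(&stmts);
}
```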
###### `duckdb_pending_prepared` {#docs:stable:clients:c:api::duckdb_pending_prepared}
Executes the prepared statement with the given bound parameters, and returns a pending result.
The pending result represents an intermediate structure for a query that is not yet fully executed.
The pending result can be used to incrementally execute a query, returning control to the client between tasks.
Note that after calling `duckdb_pending_prepared`, the pending result should always be destroyed using
`duckdb_destroy_pending`, even if this function returns DuckDBError.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_pending_prepared(
duckdb_prepared_statement prepared_statement,
duckdb_pending_result *out_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement to execute.
* `out_result`: The pending query result.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_pending_prepared_streaming` {#docs:stable:clients:c:api::duckdb_pending_prepared_streaming}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Executes the prepared statement with the given bound parameters, and returns a pending result.
This pending result will create a streaming duckdb_result when executed.
The pending result represents an intermediate structure for a query that is not yet fully executed.
Note that after calling `duckdb_pending_prepared_streaming`, the pending result should always be destroyed using
`duckdb_destroy_pending`, even if this function returns DuckDBError.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_pending_prepared_streaming(
duckdb_prepared_statement prepared_statement,
duckdb_pending_result *out_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement to execute.
* `out_result`: The pending query result.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_destroy_pending` {#docs:stable:clients:c:api::duckdb_destroy_pending}
Closes the pending result and de-allocates all memory allocated for the result.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_pending(
duckdb_pending_result *pending_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `pending_result`: The pending result to destroy.
###### `duckdb_pending_error` {#docs:stable:clients:c:api::duckdb_pending_error}
Returns the error message contained within the pending result.
The result of this function must not be freed. It will be cleaned up when `duckdb_destroy_pending` is called.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_pending_error(
duckdb_pending_result pending_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `pending_result`: The pending result to fetch the error from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error of the pending result.
###### `duckdb_pending_execute_task` {#docs:stable:clients:c:api::duckdb_pending_execute_task}
Executes a single task within the query, returning whether or not the query is ready.
If this returns DUCKDB_PENDING_RESULT_READY, the duckdb_execute_pending function can be called to obtain the result.
If this returns DUCKDB_PENDING_RESULT_NOT_READY, the duckdb_pending_execute_task function should be called again.
If this returns DUCKDB_PENDING_ERROR, an error occurred during execution.
The error message can be obtained by calling duckdb_pending_error on the pending_result.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_pending_state duckdb_pending_execute_task(
duckdb_pending_result pending_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `pending_result`: The pending result to execute a task within.
####### Return Value {#docs:stable:clients:c:api::return-value}
The state of the pending result after the execution.
###### `duckdb_pending_execute_check_state` {#docs:stable:clients:c:api::duckdb_pending_execute_check_state}
If this returns DUCKDB_PENDING_RESULT_READY, the duckdb_execute_pending function can be called to obtain the result.
If this returns DUCKDB_PENDING_RESULT_NOT_READY, the duckdb_pending_execute_check_state function should be called again.
If this returns DUCKDB_PENDING_ERROR, an error occurred during execution.
The error message can be obtained by calling duckdb_pending_error on the pending_result.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_pending_state duckdb_pending_execute_check_state(
duckdb_pending_result pending_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `pending_result`: The pending result.
####### Return Value {#docs:stable:clients:c:api::return-value}
The state of the pending result.
###### `duckdb_execute_pending` {#docs:stable:clients:c:api::duckdb_execute_pending}
Fully execute a pending query result, returning the final query result.
If duckdb_pending_execute_task has been called until DUCKDB_PENDING_RESULT_READY was returned, this will return fast.
Otherwise, all remaining tasks must be executed first.
Note that the result must be freed with `duckdb_destroy_result`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_execute_pending(
duckdb_pending_result pending_result,
duckdb_result *out_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `pending_result`: The pending result to execute.
* `out_result`: The result object.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_pending_execution_is_finished` {#docs:stable:clients:c:api::duckdb_pending_execution_is_finished}
Returns whether a duckdb_pending_state is finished executing. For example if `pending_state` is
DUCKDB_PENDING_RESULT_READY, this function will return true.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_pending_execution_is_finished(
duckdb_pending_state pending_state
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `pending_state`: The pending state on which to decide whether to finish execution.
####### Return Value {#docs:stable:clients:c:api::return-value}
Boolean indicating whether the pending execution should be considered finished.
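A sketch of the incremental execution loop built from the pending-result functions above; the caller regains control between tasks and can interleave its own work:
```c
#include <stdio.h>
#include "duckdb.h"

// Drive a prepared statement to completion one task at a time.
duckdb_state run_incrementally(duckdb_prepared_statement stmt, duckdb_result *out_result) {
	duckdb_pending_result pending;
	if (duckdb_pending_prepared(stmt, &pending) == DuckDBError) {
		fprintf(stderr, "pending error: %s\n", duckdb_pending_error(pending));
		duckdb_destroy_pending(&pending); // required even on error
		return DuckDBError;
	}
	while (1) {
		duckdb_pending_state state = duckdb_pending_execute_task(pending);
		if (state == DUCKDB_PENDING_ERROR) {
			fprintf(stderr, "execution error: %s\n", duckdb_pending_error(pending));
			duckdb_destroy_pending(&pending);
			return DuckDBError;
		}
		if (duckdb_pending_execution_is_finished(state)) {
			break;
		}
		// ... control returns here between tasks; do other work if desired ...
	}
	duckdb_state result_state = duckdb_execute_pending(pending, out_result);
	duckdb_destroy_pending(&pending);
	return result_state;
}
```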
###### `duckdb_destroy_value` {#docs:stable:clients:c:api::duckdb_destroy_value}
Destroys the value and de-allocates all memory allocated for that type.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_value(
duckdb_value *value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: The value to destroy.
###### `duckdb_create_varchar` {#docs:stable:clients:c:api::duckdb_create_varchar}
Creates a value from a null-terminated string
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_varchar(
const char *text
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `text`: The null-terminated string
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
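A minimal sketch that creates a VARCHAR value, binds it to the first parameter of an already-prepared statement with `duckdb_bind_value`, and destroys the value again:
```c
#include "duckdb.h"

// Create a VARCHAR duckdb_value, bind it, and release it.
duckdb_state bind_text(duckdb_prepared_statement stmt) {
	duckdb_value text = duckdb_create_varchar("hello world");
	duckdb_state state = duckdb_bind_value(stmt, 1, text);
	duckdb_destroy_value(&text); // the statement keeps its own copy of the value
	return state;
}
```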
###### `duckdb_create_varchar_length` {#docs:stable:clients:c:api::duckdb_create_varchar_length}
Creates a value from a string
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_varchar_length(
const char *text,
idx_t length
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `text`: The text
* `length`: The length of the text
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_bool` {#docs:stable:clients:c:api::duckdb_create_bool}
Creates a value from a boolean
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_bool(
bool input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The boolean value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_int8` {#docs:stable:clients:c:api::duckdb_create_int8}
Creates a value from an int8_t (a tinyint)
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_int8(
int8_t input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The tinyint value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uint8` {#docs:stable:clients:c:api::duckdb_create_uint8}
Creates a value from a uint8_t (a utinyint)
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_uint8(
uint8_t input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The utinyint value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_int16` {#docs:stable:clients:c:api::duckdb_create_int16}
Creates a value from an int16_t (a smallint)
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_int16(
int16_t input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The smallint value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uint16` {#docs:stable:clients:c:api::duckdb_create_uint16}
Creates a value from a uint16_t (a usmallint)
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_uint16(
uint16_t input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The usmallint value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_int32` {#docs:stable:clients:c:api::duckdb_create_int32}
Creates a value from an int32_t (an integer)
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_int32(
int32_t input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The integer value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uint32` {#docs:stable:clients:c:api::duckdb_create_uint32}
Creates a value from a uint32_t (a uinteger)
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_uint32(
uint32_t input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The uinteger value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uint64` {#docs:stable:clients:c:api::duckdb_create_uint64}
Creates a value from a uint64_t (a ubigint)
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_uint64(
uint64_t input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The ubigint value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_int64` {#docs:stable:clients:c:api::duckdb_create_int64}
Creates a value from an int64
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_int64(
int64_t val
);
```
###### `duckdb_create_hugeint` {#docs:stable:clients:c:api::duckdb_create_hugeint}
Creates a value from a hugeint
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_hugeint(
duckdb_hugeint input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The hugeint value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uhugeint` {#docs:stable:clients:c:api::duckdb_create_uhugeint}
Creates a value from a uhugeint
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_uhugeint(
duckdb_uhugeint input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The uhugeint value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_bignum` {#docs:stable:clients:c:api::duckdb_create_bignum}
Creates a BIGNUM value from a duckdb_bignum
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_bignum(
duckdb_bignum input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The duckdb_bignum value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_decimal` {#docs:stable:clients:c:api::duckdb_create_decimal}
Creates a DECIMAL value from a duckdb_decimal
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_decimal(
duckdb_decimal input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The duckdb_decimal value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_float` {#docs:stable:clients:c:api::duckdb_create_float}
Creates a value from a float
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_float(
float input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The float value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_double` {#docs:stable:clients:c:api::duckdb_create_double}
Creates a value from a double
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_double(
double input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The double value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_date` {#docs:stable:clients:c:api::duckdb_create_date}
Creates a value from a date
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_date(
duckdb_date input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The date value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_time` {#docs:stable:clients:c:api::duckdb_create_time}
Creates a value from a time
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_time(
duckdb_time input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The time value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_time_ns` {#docs:stable:clients:c:api::duckdb_create_time_ns}
Creates a value from a time_ns
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_time_ns(
duckdb_time_ns input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The time value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_time_tz_value` {#docs:stable:clients:c:api::duckdb_create_time_tz_value}
Creates a value from a time_tz.
Not to be confused with `duckdb_create_time_tz`, which creates a duckdb_time_tz_t.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_time_tz_value(
duckdb_time_tz value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: The time_tz value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_timestamp` {#docs:stable:clients:c:api::duckdb_create_timestamp}
Creates a TIMESTAMP value from a duckdb_timestamp
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_timestamp(
duckdb_timestamp input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The duckdb_timestamp value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_timestamp_tz` {#docs:stable:clients:c:api::duckdb_create_timestamp_tz}
Creates a TIMESTAMP_TZ value from a duckdb_timestamp
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_timestamp_tz(
duckdb_timestamp input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The duckdb_timestamp value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_timestamp_s` {#docs:stable:clients:c:api::duckdb_create_timestamp_s}
Creates a TIMESTAMP_S value from a duckdb_timestamp_s
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_timestamp_s(
duckdb_timestamp_s input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The duckdb_timestamp_s value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_timestamp_ms` {#docs:stable:clients:c:api::duckdb_create_timestamp_ms}
Creates a TIMESTAMP_MS value from a duckdb_timestamp_ms
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_timestamp_ms(
duckdb_timestamp_ms input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The duckdb_timestamp_ms value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_timestamp_ns` {#docs:stable:clients:c:api::duckdb_create_timestamp_ns}
Creates a TIMESTAMP_NS value from a duckdb_timestamp_ns
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_timestamp_ns(
duckdb_timestamp_ns input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The duckdb_timestamp_ns value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_interval` {#docs:stable:clients:c:api::duckdb_create_interval}
Creates a value from an interval
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_interval(
duckdb_interval input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The interval value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_blob` {#docs:stable:clients:c:api::duckdb_create_blob}
Creates a value from a blob
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_blob(
const uint8_t *data,
idx_t length
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `data`: The blob data
* `length`: The length of the blob data
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_bit` {#docs:stable:clients:c:api::duckdb_create_bit}
Creates a BIT value from a duckdb_bit
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_bit(
duckdb_bit input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The duckdb_bit value
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_create_uuid` {#docs:stable:clients:c:api::duckdb_create_uuid}
Creates a UUID value from a uhugeint
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_uuid(
duckdb_uhugeint input
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `input`: The duckdb_uhugeint containing the UUID
####### Return Value {#docs:stable:clients:c:api::return-value}
The value. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_get_bool` {#docs:stable:clients:c:api::duckdb_get_bool}
Returns the boolean value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_get_bool(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a boolean
####### Return Value {#docs:stable:clients:c:api::return-value}
A boolean, or false if the value cannot be converted
###### `duckdb_get_int8` {#docs:stable:clients:c:api::duckdb_get_int8}
Returns the int8_t value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
int8_t duckdb_get_int8(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a tinyint
####### Return Value {#docs:stable:clients:c:api::return-value}
An int8_t, or MinValue if the value cannot be converted
###### `duckdb_get_uint8` {#docs:stable:clients:c:api::duckdb_get_uint8}
Returns the uint8_t value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint8_t duckdb_get_uint8(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a utinyint
####### Return Value {#docs:stable:clients:c:api::return-value}
A uint8_t, or MinValue if the value cannot be converted
###### `duckdb_get_int16` {#docs:stable:clients:c:api::duckdb_get_int16}
Returns the int16_t value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
int16_t duckdb_get_int16(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a smallint
####### Return Value {#docs:stable:clients:c:api::return-value}
An int16_t, or MinValue if the value cannot be converted
###### `duckdb_get_uint16` {#docs:stable:clients:c:api::duckdb_get_uint16}
Returns the uint16_t value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint16_t duckdb_get_uint16(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a usmallint
####### Return Value {#docs:stable:clients:c:api::return-value}
A uint16_t, or MinValue if the value cannot be converted
###### `duckdb_get_int32` {#docs:stable:clients:c:api::duckdb_get_int32}
Returns the int32_t value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
int32_t duckdb_get_int32(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing an integer
####### Return Value {#docs:stable:clients:c:api::return-value}
An int32_t, or MinValue if the value cannot be converted
###### `duckdb_get_uint32` {#docs:stable:clients:c:api::duckdb_get_uint32}
Returns the uint32_t value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint32_t duckdb_get_uint32(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a uinteger
####### Return Value {#docs:stable:clients:c:api::return-value}
A uint32_t, or MinValue if the value cannot be converted
###### `duckdb_get_int64` {#docs:stable:clients:c:api::duckdb_get_int64}
Returns the int64_t value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
int64_t duckdb_get_int64(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a bigint
####### Return Value {#docs:stable:clients:c:api::return-value}
An int64_t, or MinValue if the value cannot be converted
###### `duckdb_get_uint64` {#docs:stable:clients:c:api::duckdb_get_uint64}
Returns the uint64_t value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint64_t duckdb_get_uint64(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a ubigint
####### Return Value {#docs:stable:clients:c:api::return-value}
A uint64_t, or MinValue if the value cannot be converted
###### `duckdb_get_hugeint` {#docs:stable:clients:c:api::duckdb_get_hugeint}
Returns the hugeint value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_hugeint duckdb_get_hugeint(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a hugeint
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_hugeint, or MinValue if the value cannot be converted
###### `duckdb_get_uhugeint` {#docs:stable:clients:c:api::duckdb_get_uhugeint}
Returns the uhugeint value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_uhugeint duckdb_get_uhugeint(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a uhugeint
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_uhugeint, or MinValue if the value cannot be converted
###### `duckdb_get_bignum` {#docs:stable:clients:c:api::duckdb_get_bignum}
Returns the duckdb_bignum value of the given value.
The `data` field must be destroyed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_bignum duckdb_get_bignum(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a BIGNUM
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_bignum. The `data` field must be destroyed with `duckdb_free`.
###### `duckdb_get_decimal` {#docs:stable:clients:c:api::duckdb_get_decimal}
Returns the duckdb_decimal value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_decimal duckdb_get_decimal(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a DECIMAL
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_decimal, or MinValue if the value cannot be converted
###### `duckdb_get_float` {#docs:stable:clients:c:api::duckdb_get_float}
Returns the float value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
float duckdb_get_float(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a float
####### Return Value {#docs:stable:clients:c:api::return-value}
A float, or NAN if the value cannot be converted
###### `duckdb_get_double` {#docs:stable:clients:c:api::duckdb_get_double}
Returns the double value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
double duckdb_get_double(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a double
####### Return Value {#docs:stable:clients:c:api::return-value}
A double, or NAN if the value cannot be converted
###### `duckdb_get_date` {#docs:stable:clients:c:api::duckdb_get_date}
Returns the date value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_date duckdb_get_date(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a date
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_date, or MinValue if the value cannot be converted
###### `duckdb_get_time` {#docs:stable:clients:c:api::duckdb_get_time}
Returns the time value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_time duckdb_get_time(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a time
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_time, or MinValue if the value cannot be converted
###### `duckdb_get_time_ns` {#docs:stable:clients:c:api::duckdb_get_time_ns}
Returns the time_ns value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_time_ns duckdb_get_time_ns(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a time_ns
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_time_ns, or MinValue if the value cannot be converted
###### `duckdb_get_time_tz` {#docs:stable:clients:c:api::duckdb_get_time_tz}
Returns the time_tz value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_time_tz duckdb_get_time_tz(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a time_tz
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_time_tz, or MinValue if the value cannot be converted
###### `duckdb_get_timestamp` {#docs:stable:clients:c:api::duckdb_get_timestamp}
Returns the TIMESTAMP value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_timestamp duckdb_get_timestamp(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a TIMESTAMP
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_timestamp, or MinValue if the value cannot be converted
###### `duckdb_get_timestamp_tz` {#docs:stable:clients:c:api::duckdb_get_timestamp_tz}
Returns the TIMESTAMP_TZ value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_timestamp duckdb_get_timestamp_tz(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a TIMESTAMP_TZ
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_timestamp, or MinValue if the value cannot be converted
###### `duckdb_get_timestamp_s` {#docs:stable:clients:c:api::duckdb_get_timestamp_s}
Returns the duckdb_timestamp_s value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_timestamp_s duckdb_get_timestamp_s(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a TIMESTAMP_S
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_timestamp_s, or MinValue if the value cannot be converted
###### `duckdb_get_timestamp_ms` {#docs:stable:clients:c:api::duckdb_get_timestamp_ms}
Returns the duckdb_timestamp_ms value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_timestamp_ms duckdb_get_timestamp_ms(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a TIMESTAMP_MS
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_timestamp_ms, or MinValue if the value cannot be converted
###### `duckdb_get_timestamp_ns` {#docs:stable:clients:c:api::duckdb_get_timestamp_ns}
Returns the duckdb_timestamp_ns value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_timestamp_ns duckdb_get_timestamp_ns(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a TIMESTAMP_NS
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_timestamp_ns, or MinValue if the value cannot be converted
###### `duckdb_get_interval` {#docs:stable:clients:c:api::duckdb_get_interval}
Returns the interval value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_interval duckdb_get_interval(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing an interval
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_interval, or MinValue if the value cannot be converted
###### `duckdb_get_value_type` {#docs:stable:clients:c:api::duckdb_get_value_type}
Returns the type of the given value. The type is valid as long as the value is not destroyed.
The type itself must not be destroyed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_get_value_type(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_logical_type.
###### `duckdb_get_blob` {#docs:stable:clients:c:api::duckdb_get_blob}
Returns the blob value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_blob duckdb_get_blob(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a blob
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_blob
###### `duckdb_get_bit` {#docs:stable:clients:c:api::duckdb_get_bit}
Returns the duckdb_bit value of the given value.
The `data` field must be destroyed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_bit duckdb_get_bit(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a BIT
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_bit
###### `duckdb_get_uuid` {#docs:stable:clients:c:api::duckdb_get_uuid}
Returns a duckdb_uhugeint representing the UUID value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_uhugeint duckdb_get_uuid(
duckdb_value val
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `val`: A duckdb_value containing a UUID
####### Return Value {#docs:stable:clients:c:api::return-value}
A duckdb_uhugeint representing the UUID value
###### `duckdb_get_varchar` {#docs:stable:clients:c:api::duckdb_get_varchar}
Obtains a string representation of the given value.
The result must be destroyed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
char *duckdb_get_varchar(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: The value
####### Return Value {#docs:stable:clients:c:api::return-value}
The string value. This must be destroyed with `duckdb_free`.
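As a quick illustration of the scalar value functions above, the following sketch (a minimal standalone program, assuming `duckdb.h` is on the include path) creates an INTEGER value, renders it as a string with `duckdb_get_varchar`, reads it back with `duckdb_get_int32`, and releases all allocations.
```c
#include <stdio.h>
#include "duckdb.h"

int main(void) {
    // Create an INTEGER value.
    duckdb_value v = duckdb_create_int32(42);

    // Obtain a string representation; the string must be freed with duckdb_free.
    char *text = duckdb_get_varchar(v);
    printf("as varchar: %s\n", text);

    // Read the value back as an int32_t.
    printf("as int32:   %d\n", duckdb_get_int32(v));

    // Release the string and destroy the value.
    duckdb_free(text);
    duckdb_destroy_value(&v);
    return 0;
}
```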
###### `duckdb_create_struct_value` {#docs:stable:clients:c:api::duckdb_create_struct_value}
Creates a struct value from a type and an array of values. Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_struct_value(
duckdb_logical_type type,
duckdb_value *values
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The type of the struct
* `values`: The values for the struct fields
####### Return Value {#docs:stable:clients:c:api::return-value}
The struct value, or nullptr, if any child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.
###### `duckdb_create_list_value` {#docs:stable:clients:c:api::duckdb_create_list_value}
Creates a list value from a child (element) type and an array of values of length `value_count`.
Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_list_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The type of the list
* `values`: The values for the list
* `value_count`: The number of values in the list
####### Return Value {#docs:stable:clients:c:api::return-value}
The list value, or nullptr, if the child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.
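For example, the following fragment builds the LIST value `[1, 2, 3]`. It uses `duckdb_create_logical_type` and `duckdb_destroy_logical_type`, documented further down in this section, and assumes that the input element values and the element type remain owned by the caller and are therefore destroyed separately.
```c
// Element type and element values for the list [1, 2, 3].
duckdb_logical_type elem_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_value elems[3];
for (int i = 0; i < 3; i++) {
    elems[i] = duckdb_create_int32(i + 1);
}

// Create the LIST value itself.
duckdb_value list_val = duckdb_create_list_value(elem_type, elems, 3);

// Assumption: the inputs are not consumed and must be cleaned up by the caller.
for (int i = 0; i < 3; i++) {
    duckdb_destroy_value(&elems[i]);
}
duckdb_destroy_logical_type(&elem_type);

// ... use list_val, then destroy it ...
duckdb_destroy_value(&list_val);
```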
###### `duckdb_create_array_value` {#docs:stable:clients:c:api::duckdb_create_array_value}
Creates an array value from a child (element) type and an array of values of length `value_count`.
Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_array_value(
duckdb_logical_type type,
duckdb_value *values,
idx_t value_count
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The type of the array
* `values`: The values for the array
* `value_count`: The number of values in the array
####### Return Value {#docs:stable:clients:c:api::return-value}
The array value, or nullptr, if the child type is `DUCKDB_TYPE_ANY` or `DUCKDB_TYPE_INVALID`.
###### `duckdb_create_map_value` {#docs:stable:clients:c:api::duckdb_create_map_value}
Creates a map value from a map type and two arrays, one for the keys and one for the values, each of length
`entry_count`. Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_map_value(
duckdb_logical_type map_type,
duckdb_value *keys,
duckdb_value *values,
idx_t entry_count
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `map_type`: The map type
* `keys`: The keys of the map
* `values`: The values of the map
* `entry_count`: The number of entries (key-value pairs) in the map
####### Return Value {#docs:stable:clients:c:api::return-value}
The map value, or nullptr, if the parameters are invalid.
###### `duckdb_create_union_value` {#docs:stable:clients:c:api::duckdb_create_union_value}
Creates a union value from a union type, a tag index, and a value.
Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_union_value(
duckdb_logical_type union_type,
idx_t tag_index,
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `union_type`: The union type
* `tag_index`: The index of the tag of the union
* `value`: The value of the union for that tag
####### Return Value {#docs:stable:clients:c:api::return-value}
The union value, or nullptr, if the parameters are invalid.
###### `duckdb_get_map_size` {#docs:stable:clients:c:api::duckdb_get_map_size}
Returns the number of elements in a MAP value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_get_map_size(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: The MAP value.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of elements in the map.
###### `duckdb_get_map_key` {#docs:stable:clients:c:api::duckdb_get_map_key}
Returns the MAP key at index as a duckdb_value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_get_map_key(
duckdb_value value,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: The MAP value.
* `index`: The index of the key.
####### Return Value {#docs:stable:clients:c:api::return-value}
The key as a duckdb_value.
###### `duckdb_get_map_value` {#docs:stable:clients:c:api::duckdb_get_map_value}
Returns the MAP value at index as a duckdb_value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_get_map_value(
duckdb_value value,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: The MAP value.
* `index`: The index of the value.
####### Return Value {#docs:stable:clients:c:api::return-value}
The value as a duckdb_value.
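Putting the MAP functions together, the sketch below builds a `MAP(INTEGER, INTEGER)` value and iterates over its entries. It uses `duckdb_create_map_type`, documented further down in this section, and assumes that the input types and values, as well as the keys and values returned by `duckdb_get_map_key` / `duckdb_get_map_value`, must be destroyed by the caller like any other value.
```c
// Build the MAP(INTEGER, INTEGER) value {1: 10, 2: 20}.
duckdb_logical_type key_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_logical_type value_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_logical_type map_type = duckdb_create_map_type(key_type, value_type);

duckdb_value keys[2] = {duckdb_create_int32(1), duckdb_create_int32(2)};
duckdb_value vals[2] = {duckdb_create_int32(10), duckdb_create_int32(20)};
duckdb_value map_val = duckdb_create_map_value(map_type, keys, vals, 2);

// Iterate over the entries.
idx_t entry_count = duckdb_get_map_size(map_val);
for (idx_t i = 0; i < entry_count; i++) {
    duckdb_value k = duckdb_get_map_key(map_val, i);
    duckdb_value v = duckdb_get_map_value(map_val, i);
    printf("%d -> %d\n", duckdb_get_int32(k), duckdb_get_int32(v));
    duckdb_destroy_value(&k);
    duckdb_destroy_value(&v);
}

// Cleanup (inputs are assumed to stay caller-owned).
duckdb_destroy_value(&map_val);
for (int i = 0; i < 2; i++) {
    duckdb_destroy_value(&keys[i]);
    duckdb_destroy_value(&vals[i]);
}
duckdb_destroy_logical_type(&map_type);
duckdb_destroy_logical_type(&value_type);
duckdb_destroy_logical_type(&key_type);
```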
###### `duckdb_is_null_value` {#docs:stable:clients:c:api::duckdb_is_null_value}
Returns whether the value's type is SQLNULL or not.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_is_null_value(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: The value to check.
####### Return Value {#docs:stable:clients:c:api::return-value}
True, if the value's type is SQLNULL, otherwise false.
###### `duckdb_create_null_value` {#docs:stable:clients:c:api::duckdb_create_null_value}
Creates a value of type SQLNULL.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_null_value(
);
```
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb_value representing SQLNULL. This must be destroyed with `duckdb_destroy_value`.
###### `duckdb_get_list_size` {#docs:stable:clients:c:api::duckdb_get_list_size}
Returns the number of elements in a LIST value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_get_list_size(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: The LIST value.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of elements in the list.
###### `duckdb_get_list_child` {#docs:stable:clients:c:api::duckdb_get_list_child}
Returns the LIST child at index as a duckdb_value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_get_list_child(
duckdb_value value,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: The LIST value.
* `index`: The index of the child.
####### Return Value {#docs:stable:clients:c:api::return-value}
The child as a duckdb_value.
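A short fragment reading a LIST value back element by element. It assumes `list_val` holds a LIST of INTEGERs (for instance one created with `duckdb_create_list_value` above) and that the children returned by `duckdb_get_list_child` must be destroyed like any other value.
```c
idx_t list_size = duckdb_get_list_size(list_val);
for (idx_t i = 0; i < list_size; i++) {
    duckdb_value child = duckdb_get_list_child(list_val, i);
    if (duckdb_is_null_value(child)) {
        printf("element %llu: NULL\n", (unsigned long long)i);
    } else {
        printf("element %llu: %d\n", (unsigned long long)i, duckdb_get_int32(child));
    }
    duckdb_destroy_value(&child);
}
```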
###### `duckdb_create_enum_value` {#docs:stable:clients:c:api::duckdb_create_enum_value}
Creates an enum value from a type and a value. Must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_create_enum_value(
duckdb_logical_type type,
uint64_t value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The type of the enum
* `value`: The value for the enum
####### Return Value {#docs:stable:clients:c:api::return-value}
The enum value, or nullptr.
###### `duckdb_get_enum_value` {#docs:stable:clients:c:api::duckdb_get_enum_value}
Returns the enum value of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint64_t duckdb_get_enum_value(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: A duckdb_value containing an enum
####### Return Value {#docs:stable:clients:c:api::return-value}
A uint64_t, or MinValue if the value cannot be converted
###### `duckdb_get_struct_child` {#docs:stable:clients:c:api::duckdb_get_struct_child}
Returns the STRUCT child at index as a duckdb_value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_get_struct_child(
duckdb_value value,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: The STRUCT value.
* `index`: The index of the child.
####### Return Value {#docs:stable:clients:c:api::return-value}
The child as a duckdb_value.
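The struct functions combine in the same way. The sketch below builds a `STRUCT(i INTEGER, j INTEGER)` value and reads back its second field. It uses `duckdb_create_struct_type`, documented further down in this section, and again assumes that input values and types stay caller-owned.
```c
// Describe the struct type STRUCT(i INTEGER, j INTEGER).
duckdb_logical_type member_types[2] = {
    duckdb_create_logical_type(DUCKDB_TYPE_INTEGER),
    duckdb_create_logical_type(DUCKDB_TYPE_INTEGER)
};
const char *member_names[2] = {"i", "j"};
duckdb_logical_type struct_type = duckdb_create_struct_type(member_types, member_names, 2);

// Build the value {i: 1, j: 2} and read back field index 1 (j).
duckdb_value fields[2] = {duckdb_create_int32(1), duckdb_create_int32(2)};
duckdb_value struct_val = duckdb_create_struct_value(struct_type, fields);
duckdb_value j = duckdb_get_struct_child(struct_val, 1);
printf("j = %d\n", duckdb_get_int32(j));

// Cleanup.
duckdb_destroy_value(&j);
duckdb_destroy_value(&struct_val);
for (int k = 0; k < 2; k++) {
    duckdb_destroy_value(&fields[k]);
    duckdb_destroy_logical_type(&member_types[k]);
}
duckdb_destroy_logical_type(&struct_type);
```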
###### `duckdb_value_to_string` {#docs:stable:clients:c:api::duckdb_value_to_string}
Returns the SQL string representation of the given value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
char *duckdb_value_to_string(
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `value`: A duckdb_value.
####### Return Value {#docs:stable:clients:c:api::return-value}
The SQL string representation as a null-terminated string. The result must be freed with `duckdb_free`.
###### `duckdb_create_logical_type` {#docs:stable:clients:c:api::duckdb_create_logical_type}
Creates a `duckdb_logical_type` from a primitive type.
The resulting logical type must be destroyed with `duckdb_destroy_logical_type`.
Returns an invalid logical type, if type is: `DUCKDB_TYPE_INVALID`, `DUCKDB_TYPE_DECIMAL`, `DUCKDB_TYPE_ENUM`,
`DUCKDB_TYPE_LIST`, `DUCKDB_TYPE_STRUCT`, `DUCKDB_TYPE_MAP`, `DUCKDB_TYPE_ARRAY`, or `DUCKDB_TYPE_UNION`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_create_logical_type(
duckdb_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The primitive type to create.
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical type.
###### `duckdb_logical_type_get_alias` {#docs:stable:clients:c:api::duckdb_logical_type_get_alias}
Returns the alias of a duckdb_logical_type, if set, else `nullptr`.
The result must be destroyed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
char *duckdb_logical_type_get_alias(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type
####### Return Value {#docs:stable:clients:c:api::return-value}
The alias or `nullptr`
###### `duckdb_logical_type_set_alias` {#docs:stable:clients:c:api::duckdb_logical_type_set_alias}
Sets the alias of a duckdb_logical_type.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_logical_type_set_alias(
duckdb_logical_type type,
const char *alias
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type
* `alias`: The alias to set
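A small sketch of setting and reading back an alias on a logical type; as stated above, the alias string returned by `duckdb_logical_type_get_alias` is assumed to be freed by the caller.
```c
duckdb_logical_type t = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
duckdb_logical_type_set_alias(t, "my_bigint");

char *alias = duckdb_logical_type_get_alias(t);
if (alias) {
    printf("alias: %s\n", alias); // prints: alias: my_bigint
    duckdb_free(alias);
}
duckdb_destroy_logical_type(&t);
```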
###### `duckdb_create_list_type` {#docs:stable:clients:c:api::duckdb_create_list_type}
Creates a LIST type from its child type.
The return type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_create_list_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The child type of the list
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical type.
###### `duckdb_create_array_type` {#docs:stable:clients:c:api::duckdb_create_array_type}
Creates an ARRAY type from its child type.
The return type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_create_array_type(
duckdb_logical_type type,
idx_t array_size
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The child type of the array.
* `array_size`: The number of elements in the array.
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical type.
###### `duckdb_create_map_type` {#docs:stable:clients:c:api::duckdb_create_map_type}
Creates a MAP type from its key type and value type.
The return type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_create_map_type(
duckdb_logical_type key_type,
duckdb_logical_type value_type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `key_type`: The map's key type.
* `value_type`: The map's value type.
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical type.
###### `duckdb_create_union_type` {#docs:stable:clients:c:api::duckdb_create_union_type}
Creates a UNION type from the passed arrays.
The return type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_create_union_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `member_types`: The array of union member types.
* `member_names`: The union member names.
* `member_count`: The number of union members.
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical type.
###### `duckdb_create_struct_type` {#docs:stable:clients:c:api::duckdb_create_struct_type}
Creates a STRUCT type based on the member types and names.
The resulting type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_create_struct_type(
duckdb_logical_type *member_types,
const char **member_names,
idx_t member_count
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `member_types`: The array of types of the struct members.
* `member_names`: The array of names of the struct members.
* `member_count`: The number of members of the struct.
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical type.
###### `duckdb_create_enum_type` {#docs:stable:clients:c:api::duckdb_create_enum_type}
Creates an ENUM type from the passed member name array.
The resulting type should be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_create_enum_type(
const char **member_names,
idx_t member_count
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `member_names`: The array of names that the enum should consist of.
* `member_count`: The number of elements that were specified in the array.
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical type.
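For example, the following fragment creates an ENUM type and prints its dictionary using `duckdb_enum_dictionary_size` and `duckdb_enum_dictionary_value`, both documented further down in this section.
```c
const char *moods[3] = {"sad", "ok", "happy"};
duckdb_logical_type enum_type = duckdb_create_enum_type(moods, 3);

uint32_t dict_size = duckdb_enum_dictionary_size(enum_type);
for (idx_t i = 0; i < dict_size; i++) {
    char *entry = duckdb_enum_dictionary_value(enum_type, i);
    printf("%llu: %s\n", (unsigned long long)i, entry);
    duckdb_free(entry);
}
duckdb_destroy_logical_type(&enum_type);
```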
###### `duckdb_create_decimal_type` {#docs:stable:clients:c:api::duckdb_create_decimal_type}
Creates a DECIMAL type with the specified width and scale.
The resulting type should be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_create_decimal_type(
uint8_t width,
uint8_t scale
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `width`: The width of the decimal type
* `scale`: The scale of the decimal type
####### Return Value {#docs:stable:clients:c:api::return-value}
The logical type.
###### `duckdb_get_type_id` {#docs:stable:clients:c:api::duckdb_get_type_id}
Retrieves the enum `duckdb_type` of a `duckdb_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_type duckdb_get_type_id(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_type` id.
###### `duckdb_decimal_width` {#docs:stable:clients:c:api::duckdb_decimal_width}
Retrieves the width of a decimal type.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint8_t duckdb_decimal_width(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:api::return-value}
The width of the decimal type
###### `duckdb_decimal_scale` {#docs:stable:clients:c:api::duckdb_decimal_scale}
Retrieves the scale of a decimal type.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint8_t duckdb_decimal_scale(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:api::return-value}
The scale of the decimal type
###### `duckdb_decimal_internal_type` {#docs:stable:clients:c:api::duckdb_decimal_internal_type}
Retrieves the internal storage type of a decimal type.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_type duckdb_decimal_internal_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:api::return-value}
The internal type of the decimal type
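As an illustration of the decimal inspection functions, the fragment below creates a `DECIMAL(18, 3)` type and prints its properties. A width of 18 is expected to be stored as a 64-bit integer internally (`DUCKDB_TYPE_BIGINT`), though the exact mapping is an implementation detail.
```c
duckdb_logical_type dec_type = duckdb_create_decimal_type(18, 3);

printf("type id:  %d\n", (int)duckdb_get_type_id(dec_type));            // DUCKDB_TYPE_DECIMAL
printf("width:    %u\n", (unsigned)duckdb_decimal_width(dec_type));     // 18
printf("scale:    %u\n", (unsigned)duckdb_decimal_scale(dec_type));     // 3
printf("internal: %d\n", (int)duckdb_decimal_internal_type(dec_type));  // expected: DUCKDB_TYPE_BIGINT

duckdb_destroy_logical_type(&dec_type);
```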
###### `duckdb_enum_internal_type` {#docs:stable:clients:c:api::duckdb_enum_internal_type}
Retrieves the internal storage type of an enum type.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_type duckdb_enum_internal_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:api::return-value}
The internal type of the enum type
###### `duckdb_enum_dictionary_size` {#docs:stable:clients:c:api::duckdb_enum_dictionary_size}
Retrieves the dictionary size of the enum type.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint32_t duckdb_enum_dictionary_size(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:api::return-value}
The dictionary size of the enum type
###### `duckdb_enum_dictionary_value` {#docs:stable:clients:c:api::duckdb_enum_dictionary_value}
Retrieves the dictionary value at the specified position from the enum.
The result must be freed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
char *duckdb_enum_dictionary_value(
duckdb_logical_type type,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
* `index`: The index in the dictionary
####### Return Value {#docs:stable:clients:c:api::return-value}
The string value of the enum type. Must be freed with `duckdb_free`.
###### `duckdb_list_type_child_type` {#docs:stable:clients:c:api::duckdb_list_type_child_type}
Retrieves the child type of the given LIST type. Also accepts MAP types.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_list_type_child_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type, either LIST or MAP.
####### Return Value {#docs:stable:clients:c:api::return-value}
The child type of the LIST or MAP type.
###### `duckdb_array_type_child_type` {#docs:stable:clients:c:api::duckdb_array_type_child_type}
Retrieves the child type of the given ARRAY type.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_array_type_child_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type. Must be ARRAY.
####### Return Value {#docs:stable:clients:c:api::return-value}
The child type of the ARRAY type.
###### `duckdb_array_type_array_size` {#docs:stable:clients:c:api::duckdb_array_type_array_size}
Retrieves the array size of the given array type.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_array_type_array_size(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:api::return-value}
The fixed number of elements the values of this array type can store.
###### `duckdb_map_type_key_type` {#docs:stable:clients:c:api::duckdb_map_type_key_type}
Retrieves the key type of the given map type.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_map_type_key_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:api::return-value}
The key type of the map type. Must be destroyed with `duckdb_destroy_logical_type`.
###### `duckdb_map_type_value_type` {#docs:stable:clients:c:api::duckdb_map_type_value_type}
Retrieves the value type of the given map type.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_map_type_value_type(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:api::return-value}
The value type of the map type. Must be destroyed with `duckdb_destroy_logical_type`.
###### `duckdb_struct_type_child_count` {#docs:stable:clients:c:api::duckdb_struct_type_child_count}
Returns the number of children of a struct type.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_struct_type_child_count(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of children of a struct type.
###### `duckdb_struct_type_child_name` {#docs:stable:clients:c:api::duckdb_struct_type_child_name}
Retrieves the name of the struct child.
The result must be freed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
char *duckdb_struct_type_child_name(
duckdb_logical_type type,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
* `index`: The child index
####### Return Value {#docs:stable:clients:c:api::return-value}
The name of the struct type. Must be freed with `duckdb_free`.
###### `duckdb_struct_type_child_type` {#docs:stable:clients:c:api::duckdb_struct_type_child_type}
Retrieves the child type of the given struct type at the specified index.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_struct_type_child_type(
duckdb_logical_type type,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
* `index`: The child index
####### Return Value {#docs:stable:clients:c:api::return-value}
The child type of the struct type. Must be destroyed with `duckdb_destroy_logical_type`.
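The three struct-type functions above are typically used together to walk a struct's members, as in this sketch (assuming `struct_type` is a STRUCT logical type obtained elsewhere):
```c
idx_t child_count = duckdb_struct_type_child_count(struct_type);
for (idx_t i = 0; i < child_count; i++) {
    char *name = duckdb_struct_type_child_name(struct_type, i);
    duckdb_logical_type child_type = duckdb_struct_type_child_type(struct_type, i);
    printf("member %s has type id %d\n", name, (int)duckdb_get_type_id(child_type));
    duckdb_free(name);
    duckdb_destroy_logical_type(&child_type);
}
```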
###### `duckdb_union_type_member_count` {#docs:stable:clients:c:api::duckdb_union_type_member_count}
Returns the number of members that the union type has.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_union_type_member_count(
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type (union) object
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of members of a union type.
###### `duckdb_union_type_member_name` {#docs:stable:clients:c:api::duckdb_union_type_member_name}
Retrieves the name of the union member.
The result must be freed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
char *duckdb_union_type_member_name(
duckdb_logical_type type,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
* `index`: The child index
####### Return Value {#docs:stable:clients:c:api::return-value}
The name of the union member. Must be freed with `duckdb_free`.
###### `duckdb_union_type_member_type` {#docs:stable:clients:c:api::duckdb_union_type_member_type}
Retrieves the child type of the given union member at the specified index.
The result must be freed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_union_type_member_type(
duckdb_logical_type type,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type object
* `index`: The child index
####### Return Value {#docs:stable:clients:c:api::return-value}
The child type of the union member. Must be destroyed with `duckdb_destroy_logical_type`.
###### `duckdb_destroy_logical_type` {#docs:stable:clients:c:api::duckdb_destroy_logical_type}
Destroys the logical type and de-allocates all memory allocated for that type.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_logical_type(
duckdb_logical_type *type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type to destroy.
###### `duckdb_register_logical_type` {#docs:stable:clients:c:api::duckdb_register_logical_type}
Registers a custom type within the given connection.
The type must have an alias.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_register_logical_type(
duckdb_connection con,
duckdb_logical_type type,
duckdb_create_type_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `con`: The connection to use
* `type`: The custom type to register
####### Return Value {#docs:stable:clients:c:api::return-value}
Whether or not the registration was successful.
###### `duckdb_create_data_chunk` {#docs:stable:clients:c:api::duckdb_create_data_chunk}
Creates an empty data chunk with the specified column types.
The result must be destroyed with `duckdb_destroy_data_chunk`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_data_chunk duckdb_create_data_chunk(
duckdb_logical_type *types,
idx_t column_count
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `types`: An array of column types. Column types cannot contain ANY or INVALID types.
* `column_count`: The number of columns.
####### Return Value {#docs:stable:clients:c:api::return-value}
The data chunk.
###### `duckdb_destroy_data_chunk` {#docs:stable:clients:c:api::duckdb_destroy_data_chunk}
Destroys the data chunk and de-allocates all memory allocated for that chunk.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_data_chunk(
duckdb_data_chunk *chunk
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `chunk`: The data chunk to destroy.
###### `duckdb_data_chunk_reset` {#docs:stable:clients:c:api::duckdb_data_chunk_reset}
Resets a data chunk, clearing the validity masks and setting the cardinality of the data chunk to 0.
After calling this method, you must call `duckdb_vector_get_validity` and `duckdb_vector_get_data` to obtain the current data and validity pointers.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_data_chunk_reset(
duckdb_data_chunk chunk
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `chunk`: The data chunk to reset.
###### `duckdb_data_chunk_get_column_count` {#docs:stable:clients:c:api::duckdb_data_chunk_get_column_count}
Retrieves the number of columns in a data chunk.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_data_chunk_get_column_count(
duckdb_data_chunk chunk
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `chunk`: The data chunk to get the data from
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of columns in the data chunk
###### `duckdb_data_chunk_get_vector` {#docs:stable:clients:c:api::duckdb_data_chunk_get_vector}
Retrieves the vector at the specified column index in the data chunk.
The pointer to the vector is valid for as long as the chunk is alive.
It does NOT need to be destroyed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_vector duckdb_data_chunk_get_vector(
duckdb_data_chunk chunk,
idx_t col_idx
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `chunk`: The data chunk to get the data from
* `col_idx`: The column index of the vector to retrieve
####### Return Value {#docs:stable:clients:c:api::return-value}
The vector
###### `duckdb_data_chunk_get_size` {#docs:stable:clients:c:api::duckdb_data_chunk_get_size}
Retrieves the current number of tuples in a data chunk.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_data_chunk_get_size(
duckdb_data_chunk chunk
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `chunk`: The data chunk to get the data from
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of tuples in the data chunk
###### `duckdb_data_chunk_set_size` {#docs:stable:clients:c:api::duckdb_data_chunk_set_size}
Sets the current number of tuples in a data chunk.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_data_chunk_set_size(
duckdb_data_chunk chunk,
idx_t size
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `chunk`: The data chunk to set the size in
* `size`: The number of tuples in the data chunk
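The data chunk functions above are usually combined as follows. This sketch builds a two-column `(INTEGER, VARCHAR)` chunk with three rows; it assumes the chunk's default capacity is large enough and uses `duckdb_vector_assign_string_element`, which is documented further down in this section.
```c
// Create an empty chunk with an INTEGER and a VARCHAR column.
duckdb_logical_type types[2] = {
    duckdb_create_logical_type(DUCKDB_TYPE_INTEGER),
    duckdb_create_logical_type(DUCKDB_TYPE_VARCHAR)
};
duckdb_data_chunk chunk = duckdb_create_data_chunk(types, 2);

// Column 0: write int32 values through the raw data pointer.
duckdb_vector col0 = duckdb_data_chunk_get_vector(chunk, 0);
int32_t *col0_data = (int32_t *)duckdb_vector_get_data(col0);
// Column 1: write strings through the string assignment helper.
duckdb_vector col1 = duckdb_data_chunk_get_vector(chunk, 1);

const char *names[3] = {"alice", "bob", "carol"};
for (idx_t row = 0; row < 3; row++) {
    col0_data[row] = (int32_t)(row + 1);
    duckdb_vector_assign_string_element(col1, row, names[row]);
}

// Set the number of rows that were actually written.
duckdb_data_chunk_set_size(chunk, 3);

// Cleanup.
duckdb_destroy_data_chunk(&chunk);
duckdb_destroy_logical_type(&types[0]);
duckdb_destroy_logical_type(&types[1]);
```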
###### `duckdb_create_vector` {#docs:stable:clients:c:api::duckdb_create_vector}
Creates a flat vector. Must be destroyed with `duckdb_destroy_vector`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_vector duckdb_create_vector(
duckdb_logical_type type,
idx_t capacity
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `type`: The logical type of the vector.
* `capacity`: The capacity of the vector.
####### Return Value {#docs:stable:clients:c:api::return-value}
The vector.
###### `duckdb_destroy_vector` {#docs:stable:clients:c:api::duckdb_destroy_vector}
Destroys the vector and de-allocates its memory.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_vector(
duckdb_vector *vector
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: A pointer to the vector.
###### `duckdb_vector_get_column_type` {#docs:stable:clients:c:api::duckdb_vector_get_column_type}
Retrieves the column type of the specified vector.
The result must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_vector_get_column_type(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector to get the data from
####### Return Value {#docs:stable:clients:c:api::return-value}
The type of the vector
###### `duckdb_vector_get_data` {#docs:stable:clients:c:api::duckdb_vector_get_data}
Retrieves the data pointer of the vector.
The data pointer can be used to read or write values from the vector.
How to read or write values depends on the type of the vector.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_vector_get_data(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector to get the data from
####### Return Value {#docs:stable:clients:c:api::return-value}
The data pointer
###### `duckdb_vector_get_validity` {#docs:stable:clients:c:api::duckdb_vector_get_validity}
Retrieves the validity mask pointer of the specified vector.
If all values are valid, this function MIGHT return NULL!
The validity mask is a bitset that signifies null-ness within the data chunk.
It is a series of uint64_t values, where each uint64_t value contains validity for 64 tuples.
The bit is set to 1 if the value is valid (i.e., not NULL) or 0 if the value is invalid (i.e., NULL).
Validity of a specific value can be obtained like this:
```c
idx_t entry_idx = row_idx / 64;
idx_t idx_in_entry = row_idx % 64;
bool is_valid = validity_mask[entry_idx] & (1ULL << idx_in_entry);
```
Alternatively, the (slower) `duckdb_validity_row_is_valid` function can be used.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
uint64_t *duckdb_vector_get_validity(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector to get the data from
####### Return Value {#docs:stable:clients:c:api::return-value}
The pointer to the validity mask, or NULL if no validity mask is present
###### `duckdb_vector_ensure_validity_writable` {#docs:stable:clients:c:api::duckdb_vector_ensure_validity_writable}
Ensures the validity mask is writable by allocating it.
After this function is called, `duckdb_vector_get_validity` will ALWAYS return non-NULL.
This allows NULL values to be written to the vector, regardless of whether a validity mask was present before.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_vector_ensure_validity_writable(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector to alter
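A short fragment that marks a row as NULL by hand, following the bit layout described under `duckdb_vector_get_validity`; it assumes `vector` is a writable vector, e.g., one taken from a chunk you created. DuckDB also ships dedicated validity helpers (such as `duckdb_validity_set_row_invalid`, documented elsewhere in this API) that achieve the same effect.
```c
// Make sure a validity mask exists, then clear the bit for row 5.
duckdb_vector_ensure_validity_writable(vector);
uint64_t *validity = duckdb_vector_get_validity(vector); // non-NULL after the call above

idx_t row_idx = 5;
idx_t entry_idx = row_idx / 64;
idx_t idx_in_entry = row_idx % 64;
validity[entry_idx] &= ~(1ULL << idx_in_entry); // a cleared bit marks the row as NULL
```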
###### `duckdb_vector_assign_string_element` {#docs:stable:clients:c:api::duckdb_vector_assign_string_element}
Assigns a string element in the vector at the specified location.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_vector_assign_string_element(
duckdb_vector vector,
idx_t index,
const char *str
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector to alter
* `index`: The row position in the vector to assign the string to
* `str`: The null-terminated string
###### `duckdb_vector_assign_string_element_len` {#docs:stable:clients:c:api::duckdb_vector_assign_string_element_len}
Assigns a string element in the vector at the specified location. You may also use this function to assign BLOBs.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_vector_assign_string_element_len(
duckdb_vector vector,
idx_t index,
const char *str,
idx_t str_len
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector to alter
* `index`: The row position in the vector to assign the string to
* `str`: The string
* `str_len`: The length of the string (in bytes)
###### `duckdb_list_vector_get_child` {#docs:stable:clients:c:api::duckdb_list_vector_get_child}
Retrieves the child vector of a list vector.
The resulting vector is valid as long as the parent vector is valid.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_vector duckdb_list_vector_get_child(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector
####### Return Value {#docs:stable:clients:c:api::return-value}
The child vector
###### `duckdb_list_vector_get_size` {#docs:stable:clients:c:api::duckdb_list_vector_get_size}
Returns the size of the child vector of the list.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_list_vector_get_size(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector
####### Return Value {#docs:stable:clients:c:api::return-value}
The size of the child list
###### `duckdb_list_vector_set_size` {#docs:stable:clients:c:api::duckdb_list_vector_set_size}
Sets the total size of the underlying child-vector of a list vector.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_list_vector_set_size(
duckdb_vector vector,
idx_t size
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The list vector.
* `size`: The size of the child list.
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb state. Returns DuckDBError if the vector is nullptr.
###### `duckdb_list_vector_reserve` {#docs:stable:clients:c:api::duckdb_list_vector_reserve}
Sets the total capacity of the underlying child-vector of a list vector.
After calling this method, you must call `duckdb_vector_get_validity` and `duckdb_vector_get_data` again to obtain the current data and validity pointers.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_list_vector_reserve(
duckdb_vector vector,
idx_t required_capacity
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The list vector.
* `required_capacity`: the total capacity to reserve.
####### Return Value {#docs:stable:clients:c:api::return-value}
The duckdb state. Returns DuckDBError if the vector is nullptr.
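As a rough sketch of how the list-vector functions above fit together, the following fills every row of a LIST(BIGINT) vector with `[0, 1, 2]`. It assumes the `duckdb_list_entry` offset/length struct and `duckdb_vector_get_data` from elsewhere in this API; the helper name is made up, and the vector and row count are assumed to come from the caller.
```c
#include "duckdb.h"

/* Illustrative sketch: make every row of a LIST(BIGINT) vector contain [0, 1, 2]. */
void fill_list_vector(duckdb_vector list_vec, idx_t row_count) {
    const idx_t per_row = 3;
    idx_t total = row_count * per_row;

    /* Reserve capacity first; fetch child pointers only afterwards,
       since reserving may reallocate the child vector. */
    duckdb_list_vector_reserve(list_vec, total);
    duckdb_list_vector_set_size(list_vec, total);

    duckdb_vector child = duckdb_list_vector_get_child(list_vec);
    int64_t *child_data = (int64_t *) duckdb_vector_get_data(child);

    /* The list vector's own data is an array of duckdb_list_entry (offset, length). */
    duckdb_list_entry *entries = (duckdb_list_entry *) duckdb_vector_get_data(list_vec);
    for (idx_t row = 0; row < row_count; row++) {
        entries[row].offset = row * per_row;
        entries[row].length = per_row;
        for (idx_t i = 0; i < per_row; i++) {
            child_data[row * per_row + i] = (int64_t) i;
        }
    }
}
```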
###### `duckdb_struct_vector_get_child` {#docs:stable:clients:c:api::duckdb_struct_vector_get_child}
Retrieves the child vector of a struct vector.
The resulting vector is valid as long as the parent vector is valid.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_vector duckdb_struct_vector_get_child(
duckdb_vector vector,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector
* `index`: The child index
####### Return Value {#docs:stable:clients:c:api::return-value}
The child vector
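A short sketch of writing through struct children follows, assuming a vector of type STRUCT(i BIGINT, s VARCHAR); the field layout and the helper name are assumptions made for illustration.
```c
#include "duckdb.h"

/* Illustrative sketch: populate a STRUCT(i BIGINT, s VARCHAR) vector.
   Child 0 is assumed to be the BIGINT field, child 1 the VARCHAR field. */
void fill_struct_vector(duckdb_vector struct_vec, idx_t row_count) {
    duckdb_vector i_child = duckdb_struct_vector_get_child(struct_vec, 0);
    duckdb_vector s_child = duckdb_struct_vector_get_child(struct_vec, 1);

    int64_t *i_data = (int64_t *) duckdb_vector_get_data(i_child);
    for (idx_t row = 0; row < row_count; row++) {
        i_data[row] = (int64_t) row;
        duckdb_vector_assign_string_element(s_child, row, "struct row");
    }
}
```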
###### `duckdb_array_vector_get_child` {#docs:stable:clients:c:api::duckdb_array_vector_get_child}
Retrieves the child vector of an array vector.
The resulting vector is valid as long as the parent vector is valid.
The resulting vector has the size of the parent vector multiplied by the array size.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_vector duckdb_array_vector_get_child(
duckdb_vector vector
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector
####### Return Value {#docs:stable:clients:c:api::return-value}
The child vector
###### `duckdb_slice_vector` {#docs:stable:clients:c:api::duckdb_slice_vector}
Slice a vector with a selection vector.
The length of the selection vector must be less than or equal to the length of the vector.
Turns the vector into a dictionary vector.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_slice_vector(
duckdb_vector vector,
duckdb_selection_vector sel,
idx_t len
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The vector to slice.
* `sel`: The selection vector.
* `len`: The length of the selection vector.
###### `duckdb_vector_copy_sel` {#docs:stable:clients:c:api::duckdb_vector_copy_sel}
Copy the src vector to the dst with a selection vector that identifies which indices to copy.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_vector_copy_sel(
duckdb_vector src,
duckdb_vector dst,
duckdb_selection_vector sel,
idx_t src_count,
idx_t src_offset,
idx_t dst_offset
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `src`: The vector to copy from.
* `dst`: The vector to copy to.
* `sel`: The selection vector. The length of the selection vector should not exceed the length of the src vector.
* `src_count`: The number of entries of the selection vector to copy, i.e., the effective length of the selection vector starting from index 0.
* `src_offset`: The offset in the selection vector to copy from (note: the actual number of items copied is `src_count - src_offset`).
* `dst_offset`: The offset in the dst vector to start copying to.
###### `duckdb_vector_reference_value` {#docs:stable:clients:c:api::duckdb_vector_reference_value}
Copies the value from `value` to `vector`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_vector_reference_value(
duckdb_vector vector,
duckdb_value value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `vector`: The receiving vector.
* `value`: The value to copy into the vector.
###### `duckdb_vector_reference_vector` {#docs:stable:clients:c:api::duckdb_vector_reference_vector}
Changes `to_vector` to reference `from_vector`. Afterwards, the two vectors share ownership of the data.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_vector_reference_vector(
duckdb_vector to_vector,
duckdb_vector from_vector
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `to_vector`: The receiving vector.
* `from_vector`: The vector to reference.
###### `duckdb_validity_row_is_valid` {#docs:stable:clients:c:api::duckdb_validity_row_is_valid}
Returns whether or not a row is valid (i.e., not NULL) in the given validity mask.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_validity_row_is_valid(
uint64_t *validity,
idx_t row
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `validity`: The validity mask, as obtained through `duckdb_vector_get_validity`
* `row`: The row index
####### Return Value {#docs:stable:clients:c:api::return-value}
true if the row is valid, false otherwise
###### `duckdb_validity_set_row_validity` {#docs:stable:clients:c:api::duckdb_validity_set_row_validity}
In a validity mask, sets a specific row to either valid or invalid.
Note that `duckdb_vector_ensure_validity_writable` should be called before calling `duckdb_vector_get_validity`,
to ensure that there is a validity mask to write to.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_validity_set_row_validity(
uint64_t *validity,
idx_t row,
bool valid
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `validity`: The validity mask, as obtained through `duckdb_vector_get_validity`.
* `row`: The row index
* `valid`: Whether or not to set the row to valid, or invalid
###### `duckdb_validity_set_row_invalid` {#docs:stable:clients:c:api::duckdb_validity_set_row_invalid}
In a validity mask, sets a specific row to invalid.
Equivalent to `duckdb_validity_set_row_validity` with valid set to false.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_validity_set_row_invalid(
uint64_t *validity,
idx_t row
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `validity`: The validity mask
* `row`: The row index
###### `duckdb_validity_set_row_valid` {#docs:stable:clients:c:api::duckdb_validity_set_row_valid}
In a validity mask, sets a specific row to valid.
Equivalent to `duckdb_validity_set_row_validity` with valid set to true.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_validity_set_row_valid(
uint64_t *validity,
idx_t row
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `validity`: The validity mask
* `row`: The row index
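The validity helpers above are typically combined as in the following sketch, which marks every odd row of a vector as NULL; the helper name is made up, and the vector and row count are assumed to come from the caller.
```c
#include "duckdb.h"

/* Illustrative sketch: set every odd row of a vector to NULL. */
void null_odd_rows(duckdb_vector vec, idx_t row_count) {
    /* Allocate the validity mask first; without this, duckdb_vector_get_validity may return NULL. */
    duckdb_vector_ensure_validity_writable(vec);
    uint64_t *validity = duckdb_vector_get_validity(vec);
    for (idx_t row = 0; row < row_count; row++) {
        if (row % 2 == 1) {
            duckdb_validity_set_row_invalid(validity, row);
        }
    }
    /* Reading the mask back: */
    bool first_is_valid = duckdb_validity_row_is_valid(validity, 0);
    (void) first_is_valid;
}
```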
###### `duckdb_create_scalar_function` {#docs:stable:clients:c:api::duckdb_create_scalar_function}
Creates a new empty scalar function.
The return value must be destroyed with `duckdb_destroy_scalar_function`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The scalar function object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_scalar_function duckdb_create_scalar_function(
);
```
###### `duckdb_destroy_scalar_function` {#docs:stable:clients:c:api::duckdb_destroy_scalar_function}
Destroys the given scalar function object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_scalar_function(
duckdb_scalar_function *scalar_function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `scalar_function`: The scalar function to destroy
###### `duckdb_scalar_function_set_name` {#docs:stable:clients:c:api::duckdb_scalar_function_set_name}
Sets the name of the given scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_name(
duckdb_scalar_function scalar_function,
const char *name
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `scalar_function`: The scalar function
* `name`: The name of the scalar function
###### `duckdb_scalar_function_set_varargs` {#docs:stable:clients:c:api::duckdb_scalar_function_set_varargs}
Sets the parameters of the given scalar function to varargs. Does not require adding parameters with `duckdb_scalar_function_add_parameter`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_varargs(
duckdb_scalar_function scalar_function,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `scalar_function`: The scalar function.
* `type`: The type of the arguments. Cannot contain INVALID.
###### `duckdb_scalar_function_set_special_handling` {#docs:stable:clients:c:api::duckdb_scalar_function_set_special_handling}
Sets the scalar function's null-handling behavior to special.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_special_handling(
duckdb_scalar_function scalar_function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `scalar_function`: The scalar function.
###### `duckdb_scalar_function_set_volatile` {#docs:stable:clients:c:api::duckdb_scalar_function_set_volatile}
Sets the Function Stability of the scalar function to VOLATILE, indicating that the function should be re-run for every row.
This limits the optimizations that can be performed for the function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_volatile(
duckdb_scalar_function scalar_function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `scalar_function`: The scalar function.
###### `duckdb_scalar_function_add_parameter` {#docs:stable:clients:c:api::duckdb_scalar_function_add_parameter}
Adds a parameter to the scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_add_parameter(
duckdb_scalar_function scalar_function,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `scalar_function`: The scalar function.
* `type`: The parameter type. Cannot contain INVALID.
###### `duckdb_scalar_function_set_return_type` {#docs:stable:clients:c:api::duckdb_scalar_function_set_return_type}
Sets the return type of the scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_return_type(
duckdb_scalar_function scalar_function,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `scalar_function`: The scalar function
* `type`: Cannot contain INVALID or ANY.
###### `duckdb_scalar_function_set_extra_info` {#docs:stable:clients:c:api::duckdb_scalar_function_set_extra_info}
Assigns extra information to the scalar function that can be fetched during binding, etc.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_extra_info(
duckdb_scalar_function scalar_function,
void *extra_info,
duckdb_delete_callback_t destroy
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `scalar_function`: The scalar function
* `extra_info`: The extra information
* `destroy`: The callback that will be called to destroy the extra information (if any)
###### `duckdb_scalar_function_set_bind` {#docs:stable:clients:c:api::duckdb_scalar_function_set_bind}
Sets the (optional) bind function of the scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_bind(
duckdb_scalar_function scalar_function,
duckdb_scalar_function_bind_t bind
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `scalar_function`: The scalar function.
* `bind`: The bind function.
###### `duckdb_scalar_function_set_bind_data` {#docs:stable:clients:c:api::duckdb_scalar_function_set_bind_data}
Sets the user-provided bind data in the bind object of the scalar function.
The bind data object can be retrieved again during execution.
In most cases, you also need to set the copy callback of your bind data via `duckdb_scalar_function_set_bind_data_copy`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_bind_data(
duckdb_bind_info info,
void *bind_data,
duckdb_delete_callback_t destroy
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The bind info of the scalar function.
* `bind_data`: The bind data object.
* `destroy`: The callback to destroy the bind data (if any).
###### `duckdb_scalar_function_set_bind_data_copy` {#docs:stable:clients:c:api::duckdb_scalar_function_set_bind_data_copy}
Sets the copy-callback for the user-provided bind data in the bind object of the scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_bind_data_copy(
duckdb_bind_info info,
duckdb_copy_callback_t copy
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The bind info of the scalar function.
* `copy`: The callback to copy the bind data (if any).
###### `duckdb_scalar_function_bind_set_error` {#docs:stable:clients:c:api::duckdb_scalar_function_bind_set_error}
Report that an error has occurred while calling bind on a scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_bind_set_error(
duckdb_bind_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The bind info object.
* `error`: The error message.
###### `duckdb_scalar_function_set_function` {#docs:stable:clients:c:api::duckdb_scalar_function_set_function}
Sets the main function of the scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_function(
duckdb_scalar_function scalar_function,
duckdb_scalar_function_t function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `scalar_function`: The scalar function
* `function`: The function
###### `duckdb_register_scalar_function` {#docs:stable:clients:c:api::duckdb_register_scalar_function}
Register the scalar function object within the given connection.
The function requires at least a name, a function and a return type.
If the function is incomplete or a function with this name already exists, DuckDBError is returned.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_register_scalar_function(
duckdb_connection con,
duckdb_scalar_function scalar_function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `con`: The connection to register it in.
* `scalar_function`: The function pointer
####### Return Value {#docs:stable:clients:c:api::return-value}
Whether or not the registration was successful.
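To show how the scalar-function builder calls above fit together, here is a hedged sketch that registers a `times_two(BIGINT) -> BIGINT` function on an open connection. It assumes the `duckdb_scalar_function_t` callback signature `(duckdb_function_info, duckdb_data_chunk, duckdb_vector)` and the data-chunk and logical-type helpers from elsewhere in this API; the function names are made up and NULL handling is omitted for brevity.
```c
#include "duckdb.h"

/* Execute callback: multiply the single BIGINT argument by two (NULL handling omitted). */
static void times_two(duckdb_function_info info, duckdb_data_chunk input, duckdb_vector output) {
    (void) info;
    idx_t count = duckdb_data_chunk_get_size(input);
    int64_t *in = (int64_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(input, 0));
    int64_t *out = (int64_t *) duckdb_vector_get_data(output);
    for (idx_t row = 0; row < count; row++) {
        out[row] = 2 * in[row];
    }
}

/* Build and register the function on an open connection `con`. */
static duckdb_state register_times_two(duckdb_connection con) {
    duckdb_scalar_function f = duckdb_create_scalar_function();
    duckdb_scalar_function_set_name(f, "times_two");

    duckdb_logical_type bigint = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_scalar_function_add_parameter(f, bigint);
    duckdb_scalar_function_set_return_type(f, bigint);
    duckdb_destroy_logical_type(&bigint);

    duckdb_scalar_function_set_function(f, times_two);
    duckdb_state state = duckdb_register_scalar_function(con, f);
    duckdb_destroy_scalar_function(&f);
    return state;
}
```
After registration, the function can be used in SQL, e.g., `SELECT times_two(21);`.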
###### `duckdb_scalar_function_get_extra_info` {#docs:stable:clients:c:api::duckdb_scalar_function_get_extra_info}
Retrieves the extra info of the function as set in `duckdb_scalar_function_set_extra_info`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_scalar_function_get_extra_info(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The extra info.
###### `duckdb_scalar_function_bind_get_extra_info` {#docs:stable:clients:c:api::duckdb_scalar_function_bind_get_extra_info}
Retrieves the extra info of the function as set in the bind info.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_scalar_function_bind_get_extra_info(
duckdb_bind_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The extra info.
###### `duckdb_scalar_function_get_bind_data` {#docs:stable:clients:c:api::duckdb_scalar_function_get_bind_data}
Gets the scalar function's bind data set by `duckdb_scalar_function_set_bind_data`.
Note that the bind data is read-only.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_scalar_function_get_bind_data(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The function info.
####### Return Value {#docs:stable:clients:c:api::return-value}
The bind data object.
###### `duckdb_scalar_function_get_client_context` {#docs:stable:clients:c:api::duckdb_scalar_function_get_client_context}
Retrieves the client context of the bind info of a scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_get_client_context(
duckdb_bind_info info,
duckdb_client_context *out_context
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The bind info object of the scalar function.
* `out_context`: The client context of the bind info. Must be destroyed with `duckdb_destroy_client_context`.
###### `duckdb_scalar_function_set_error` {#docs:stable:clients:c:api::duckdb_scalar_function_set_error}
Report that an error has occurred while executing the scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_scalar_function_set_error(
duckdb_function_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object.
* `error`: The error message
###### `duckdb_create_scalar_function_set` {#docs:stable:clients:c:api::duckdb_create_scalar_function_set}
Creates a new empty scalar function set.
The return value must be destroyed with `duckdb_destroy_scalar_function_set`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The scalar function set object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_scalar_function_set duckdb_create_scalar_function_set(
const char *name
);
```
###### `duckdb_destroy_scalar_function_set` {#docs:stable:clients:c:api::duckdb_destroy_scalar_function_set}
Destroys the given scalar function set object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_scalar_function_set(
duckdb_scalar_function_set *scalar_function_set
);
```
###### `duckdb_add_scalar_function_to_set` {#docs:stable:clients:c:api::duckdb_add_scalar_function_to_set}
Adds the scalar function as a new overload to the scalar function set.
Returns DuckDBError if the function could not be added, for example if the overload already exists.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_add_scalar_function_to_set(
duckdb_scalar_function_set set,
duckdb_scalar_function function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `set`: The scalar function set
* `function`: The function to add
###### `duckdb_register_scalar_function_set` {#docs:stable:clients:c:api::duckdb_register_scalar_function_set}
Register the scalar function set within the given connection.
The set requires at least a single valid overload.
If the set is incomplete or a function with this name already exists, DuckDBError is returned.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_register_scalar_function_set(
duckdb_connection con,
duckdb_scalar_function_set set
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `con`: The connection to register it in.
* `set`: The function set to register
####### Return Value {#docs:stable:clients:c:api::return-value}
Whether or not the registration was successful.
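As a sketch of overload handling, the following registers a `plus_one` set with a BIGINT and a DOUBLE overload. It assumes the set keeps its own copy of each added overload (matching the create/destroy pattern used throughout this API), so the local handles are destroyed after adding; all names are made up and NULL handling is omitted.
```c
#include "duckdb.h"

static void plus_one_bigint(duckdb_function_info info, duckdb_data_chunk input, duckdb_vector output) {
    (void) info;
    idx_t count = duckdb_data_chunk_get_size(input);
    int64_t *in = (int64_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(input, 0));
    int64_t *out = (int64_t *) duckdb_vector_get_data(output);
    for (idx_t i = 0; i < count; i++) out[i] = in[i] + 1;
}

static void plus_one_double(duckdb_function_info info, duckdb_data_chunk input, duckdb_vector output) {
    (void) info;
    idx_t count = duckdb_data_chunk_get_size(input);
    double *in = (double *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(input, 0));
    double *out = (double *) duckdb_vector_get_data(output);
    for (idx_t i = 0; i < count; i++) out[i] = in[i] + 1.0;
}

static duckdb_scalar_function make_overload(duckdb_type t, duckdb_scalar_function_t fun) {
    duckdb_scalar_function f = duckdb_create_scalar_function();
    duckdb_scalar_function_set_name(f, "plus_one");
    duckdb_logical_type type = duckdb_create_logical_type(t);
    duckdb_scalar_function_add_parameter(f, type);
    duckdb_scalar_function_set_return_type(f, type);
    duckdb_destroy_logical_type(&type);
    duckdb_scalar_function_set_function(f, fun);
    return f;
}

/* Collect both overloads in a set named "plus_one" and register it on connection `con`. */
static duckdb_state register_plus_one(duckdb_connection con) {
    duckdb_scalar_function_set set = duckdb_create_scalar_function_set("plus_one");
    duckdb_scalar_function f1 = make_overload(DUCKDB_TYPE_BIGINT, plus_one_bigint);
    duckdb_scalar_function f2 = make_overload(DUCKDB_TYPE_DOUBLE, plus_one_double);
    duckdb_add_scalar_function_to_set(set, f1);
    duckdb_add_scalar_function_to_set(set, f2);
    duckdb_destroy_scalar_function(&f1);
    duckdb_destroy_scalar_function(&f2);
    duckdb_state state = duckdb_register_scalar_function_set(con, set);
    duckdb_destroy_scalar_function_set(&set);
    return state;
}
```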
###### `duckdb_scalar_function_bind_get_argument_count` {#docs:stable:clients:c:api::duckdb_scalar_function_bind_get_argument_count}
Returns the number of input arguments of the scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_scalar_function_bind_get_argument_count(
duckdb_bind_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The bind info.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of input arguments.
###### `duckdb_scalar_function_bind_get_argument` {#docs:stable:clients:c:api::duckdb_scalar_function_bind_get_argument}
Returns the input argument at the given index of the scalar function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_expression duckdb_scalar_function_bind_get_argument(
duckdb_bind_info info,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The bind info.
* `index`: The argument index.
####### Return Value {#docs:stable:clients:c:api::return-value}
The input argument at index. Must be destroyed with `duckdb_destroy_expression`.
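The bind-related functions above are typically used from a bind callback like the following sketch, which records the argument count in a small bind-data struct. It assumes the `duckdb_scalar_function_bind_t` signature is `void (*)(duckdb_bind_info)`; the struct and function names are hypothetical.
```c
#include "duckdb.h"
#include <stdlib.h>

/* Hypothetical bind data for this sketch. */
typedef struct {
    idx_t argument_count;
} my_bind_data;

static void my_bind_data_free(void *data) {
    free(data);
}

/* Bind callback: validate the arguments and stash bind data for execution. */
static void my_bind(duckdb_bind_info info) {
    idx_t argc = duckdb_scalar_function_bind_get_argument_count(info);
    if (argc == 0) {
        duckdb_scalar_function_bind_set_error(info, "at least one argument is required");
        return;
    }
    my_bind_data *data = (my_bind_data *) malloc(sizeof(my_bind_data));
    data->argument_count = argc;
    /* If the bind data must be copyable, also register a copy callback via
       duckdb_scalar_function_set_bind_data_copy. */
    duckdb_scalar_function_set_bind_data(info, data, my_bind_data_free);
}
```
The callback is attached when building the function via `duckdb_scalar_function_set_bind(f, my_bind);`, and the data is read back during execution with `duckdb_scalar_function_get_bind_data`.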
###### `duckdb_create_selection_vector` {#docs:stable:clients:c:api::duckdb_create_selection_vector}
Creates a new selection vector of size `size`.
Must be destroyed with `duckdb_destroy_selection_vector`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_selection_vector duckdb_create_selection_vector(
idx_t size
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `size`: The size of the selection vector.
####### Return Value {#docs:stable:clients:c:api::return-value}
The selection vector.
###### `duckdb_destroy_selection_vector` {#docs:stable:clients:c:api::duckdb_destroy_selection_vector}
Destroys the selection vector and de-allocates its memory.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_selection_vector(
duckdb_selection_vector sel
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `sel`: The selection vector.
###### `duckdb_selection_vector_get_data_ptr` {#docs:stable:clients:c:api::duckdb_selection_vector_get_data_ptr}
Access the data pointer of a selection vector.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
sel_t *duckdb_selection_vector_get_data_ptr(
duckdb_selection_vector sel
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `sel`: The selection vector.
####### Return Value {#docs:stable:clients:c:api::return-value}
The data pointer.
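A small sketch tying the selection-vector functions to `duckdb_vector_copy_sel` and `duckdb_slice_vector` described earlier: copy rows 0, 2, and 4 of a source vector into the first three rows of a destination vector. The helper name is made up, and both vectors are assumed to be of the same fixed-width type with sufficient capacity.
```c
#include "duckdb.h"

/* Illustrative sketch: copy rows 0, 2 and 4 of `src` into rows 0..2 of `dst`. */
void copy_even_rows(duckdb_vector src, duckdb_vector dst) {
    duckdb_selection_vector sel = duckdb_create_selection_vector(3);
    sel_t *sel_data = duckdb_selection_vector_get_data_ptr(sel);
    sel_data[0] = 0;
    sel_data[1] = 2;
    sel_data[2] = 4;

    /* src_count = 3 entries, starting at offset 0 in the selection vector,
       writing into dst starting at offset 0. */
    duckdb_vector_copy_sel(src, dst, sel, 3, 0, 0);

    /* duckdb_slice_vector(src, sel, 3) would instead turn `src` itself into
       a dictionary vector over the same selection. */
    duckdb_destroy_selection_vector(sel);
}
```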
###### `duckdb_create_aggregate_function` {#docs:stable:clients:c:api::duckdb_create_aggregate_function}
Creates a new empty aggregate function.
The return value should be destroyed with `duckdb_destroy_aggregate_function`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The aggregate function object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_aggregate_function duckdb_create_aggregate_function(
);
```
###### `duckdb_destroy_aggregate_function` {#docs:stable:clients:c:api::duckdb_destroy_aggregate_function}
Destroys the given aggregate function object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_aggregate_function(
duckdb_aggregate_function *aggregate_function
);
```
###### `duckdb_aggregate_function_set_name` {#docs:stable:clients:c:api::duckdb_aggregate_function_set_name}
Sets the name of the given aggregate function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_aggregate_function_set_name(
duckdb_aggregate_function aggregate_function,
const char *name
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `aggregate_function`: The aggregate function
* `name`: The name of the aggregate function
###### `duckdb_aggregate_function_add_parameter` {#docs:stable:clients:c:api::duckdb_aggregate_function_add_parameter}
Adds a parameter to the aggregate function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_aggregate_function_add_parameter(
duckdb_aggregate_function aggregate_function,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `aggregate_function`: The aggregate function.
* `type`: The parameter type. Cannot contain INVALID.
###### `duckdb_aggregate_function_set_return_type` {#docs:stable:clients:c:api::duckdb_aggregate_function_set_return_type}
Sets the return type of the aggregate function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_aggregate_function_set_return_type(
duckdb_aggregate_function aggregate_function,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `aggregate_function`: The aggregate function.
* `type`: The return type. Cannot contain INVALID or ANY.
###### `duckdb_aggregate_function_set_functions` {#docs:stable:clients:c:api::duckdb_aggregate_function_set_functions}
Sets the main functions of the aggregate function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_aggregate_function_set_functions(
duckdb_aggregate_function aggregate_function,
duckdb_aggregate_state_size state_size,
duckdb_aggregate_init_t state_init,
duckdb_aggregate_update_t update,
duckdb_aggregate_combine_t combine,
duckdb_aggregate_finalize_t finalize
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `aggregate_function`: The aggregate function
* `state_size`: The state size callback
* `state_init`: The state initialization callback
* `update`: The callback that updates the states with new input rows
* `combine`: The callback that combines two states
* `finalize`: The callback that finalizes the states into result values
###### `duckdb_aggregate_function_set_destructor` {#docs:stable:clients:c:api::duckdb_aggregate_function_set_destructor}
Sets the state destructor callback of the aggregate function (optional).
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_aggregate_function_set_destructor(
duckdb_aggregate_function aggregate_function,
duckdb_aggregate_destroy_t destroy
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `aggregate_function`: The aggregate function
* `destroy`: state destroy callback
###### `duckdb_register_aggregate_function` {#docs:stable:clients:c:api::duckdb_register_aggregate_function}
Register the aggregate function object within the given connection.
The function requires at least a name, functions and a return type.
If the function is incomplete or a function with this name already exists, DuckDBError is returned.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_register_aggregate_function(
duckdb_connection con,
duckdb_aggregate_function aggregate_function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `con`: The connection to register it in.
* `aggregate_function`: The aggregate function to register.
####### Return Value {#docs:stable:clients:c:api::return-value}
Whether or not the registration was successful.
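Here is a hedged end-to-end sketch of the aggregate API above: a `my_sum(BIGINT) -> BIGINT` aggregate. It assumes the callback typedefs from `duckdb.h` (`duckdb_aggregate_state_size` returning the state size, init receiving a single state, update receiving one state pointer per input row, combine merging `count` source states into target states, and finalize writing `count` results starting at `offset`); all names are made up and NULL handling is omitted.
```c
#include "duckdb.h"

typedef struct {
    int64_t sum;
} my_sum_state;

static idx_t my_sum_state_size(duckdb_function_info info) {
    (void) info;
    return sizeof(my_sum_state);
}

static void my_sum_init(duckdb_function_info info, duckdb_aggregate_state state) {
    (void) info;
    ((my_sum_state *) state)->sum = 0;
}

static void my_sum_update(duckdb_function_info info, duckdb_data_chunk input, duckdb_aggregate_state *states) {
    (void) info;
    idx_t count = duckdb_data_chunk_get_size(input);
    int64_t *in = (int64_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(input, 0));
    for (idx_t row = 0; row < count; row++) {
        ((my_sum_state *) states[row])->sum += in[row];
    }
}

static void my_sum_combine(duckdb_function_info info, duckdb_aggregate_state *source, duckdb_aggregate_state *target, idx_t count) {
    (void) info;
    for (idx_t i = 0; i < count; i++) {
        ((my_sum_state *) target[i])->sum += ((my_sum_state *) source[i])->sum;
    }
}

static void my_sum_finalize(duckdb_function_info info, duckdb_aggregate_state *source, duckdb_vector result, idx_t count, idx_t offset) {
    (void) info;
    int64_t *out = (int64_t *) duckdb_vector_get_data(result);
    for (idx_t i = 0; i < count; i++) {
        out[offset + i] = ((my_sum_state *) source[i])->sum;
    }
}

/* Wire the callbacks together and register the aggregate on connection `con`. */
static duckdb_state register_my_sum(duckdb_connection con) {
    duckdb_aggregate_function f = duckdb_create_aggregate_function();
    duckdb_aggregate_function_set_name(f, "my_sum");

    duckdb_logical_type bigint = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_aggregate_function_add_parameter(f, bigint);
    duckdb_aggregate_function_set_return_type(f, bigint);
    duckdb_destroy_logical_type(&bigint);

    duckdb_aggregate_function_set_functions(f, my_sum_state_size, my_sum_init, my_sum_update, my_sum_combine, my_sum_finalize);
    duckdb_state state = duckdb_register_aggregate_function(con, f);
    duckdb_destroy_aggregate_function(&f);
    return state;
}
```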
###### `duckdb_aggregate_function_set_special_handling` {#docs:stable:clients:c:api::duckdb_aggregate_function_set_special_handling}
Sets the NULL handling of the aggregate function to SPECIAL_HANDLING.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_aggregate_function_set_special_handling(
duckdb_aggregate_function aggregate_function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `aggregate_function`: The aggregate function
###### `duckdb_aggregate_function_set_extra_info` {#docs:stable:clients:c:api::duckdb_aggregate_function_set_extra_info}
Assigns extra information to the aggregate function that can be fetched during binding, etc.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_aggregate_function_set_extra_info(
duckdb_aggregate_function aggregate_function,
void *extra_info,
duckdb_delete_callback_t destroy
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `aggregate_function`: The aggregate function
* `extra_info`: The extra information
* `destroy`: The callback that will be called to destroy the extra information (if any)
###### `duckdb_aggregate_function_get_extra_info` {#docs:stable:clients:c:api::duckdb_aggregate_function_get_extra_info}
Retrieves the extra info of the function as set in `duckdb_aggregate_function_set_extra_info`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_aggregate_function_get_extra_info(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:api::return-value}
The extra info
###### `duckdb_aggregate_function_set_error` {#docs:stable:clients:c:api::duckdb_aggregate_function_set_error}
Report that an error has occurred while executing the aggregate function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_aggregate_function_set_error(
duckdb_function_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `error`: The error message
###### `duckdb_create_aggregate_function_set` {#docs:stable:clients:c:api::duckdb_create_aggregate_function_set}
Creates a new empty aggregate function set.
The return value should be destroyed with `duckdb_destroy_aggregate_function_set`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The aggregate function set object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_aggregate_function_set duckdb_create_aggregate_function_set(
const char *name
);
```
###### `duckdb_destroy_aggregate_function_set` {#docs:stable:clients:c:api::duckdb_destroy_aggregate_function_set}
Destroys the given aggregate function set object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_aggregate_function_set(
duckdb_aggregate_function_set *aggregate_function_set
);
```
###### `duckdb_add_aggregate_function_to_set` {#docs:stable:clients:c:api::duckdb_add_aggregate_function_to_set}
Adds the aggregate function as a new overload to the aggregate function set.
Returns DuckDBError if the function could not be added, for example if the overload already exists.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_add_aggregate_function_to_set(
duckdb_aggregate_function_set set,
duckdb_aggregate_function function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `set`: The aggregate function set
* `function`: The function to add
###### `duckdb_register_aggregate_function_set` {#docs:stable:clients:c:api::duckdb_register_aggregate_function_set}
Register the aggregate function set within the given connection.
The set requires at least a single valid overload.
If the set is incomplete or a function with this name already exists, DuckDBError is returned.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_register_aggregate_function_set(
duckdb_connection con,
duckdb_aggregate_function_set set
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `con`: The connection to register it in.
* `set`: The function set to register
####### Return Value {#docs:stable:clients:c:api::return-value}
Whether or not the registration was successful.
###### `duckdb_create_table_function` {#docs:stable:clients:c:api::duckdb_create_table_function}
Creates a new empty table function.
The return value should be destroyed with `duckdb_destroy_table_function`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The table function object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_table_function duckdb_create_table_function(
);
```
###### `duckdb_destroy_table_function` {#docs:stable:clients:c:api::duckdb_destroy_table_function}
Destroys the given table function object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_table_function(
duckdb_table_function *table_function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_function`: The table function to destroy
###### `duckdb_table_function_set_name` {#docs:stable:clients:c:api::duckdb_table_function_set_name}
Sets the name of the given table function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_function_set_name(
duckdb_table_function table_function,
const char *name
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_function`: The table function
* `name`: The name of the table function
###### `duckdb_table_function_add_parameter` {#docs:stable:clients:c:api::duckdb_table_function_add_parameter}
Adds a parameter to the table function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_function_add_parameter(
duckdb_table_function table_function,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_function`: The table function.
* `type`: The parameter type. Cannot contain INVALID.
###### `duckdb_table_function_add_named_parameter` {#docs:stable:clients:c:api::duckdb_table_function_add_named_parameter}
Adds a named parameter to the table function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_function_add_named_parameter(
duckdb_table_function table_function,
const char *name,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_function`: The table function.
* `name`: The parameter name.
* `type`: The parameter type. Cannot contain INVALID.
###### `duckdb_table_function_set_extra_info` {#docs:stable:clients:c:api::duckdb_table_function_set_extra_info}
Assigns extra information to the table function that can be fetched during binding, etc.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_function_set_extra_info(
duckdb_table_function table_function,
void *extra_info,
duckdb_delete_callback_t destroy
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_function`: The table function
* `extra_info`: The extra information
* `destroy`: The callback that will be called to destroy the extra information (if any)
###### `duckdb_table_function_set_bind` {#docs:stable:clients:c:api::duckdb_table_function_set_bind}
Sets the bind function of the table function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_function_set_bind(
duckdb_table_function table_function,
duckdb_table_function_bind_t bind
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_function`: The table function
* `bind`: The bind function
###### `duckdb_table_function_set_init` {#docs:stable:clients:c:api::duckdb_table_function_set_init}
Sets the init function of the table function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_function_set_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_function`: The table function
* `init`: The init function
###### `duckdb_table_function_set_local_init` {#docs:stable:clients:c:api::duckdb_table_function_set_local_init}
Sets the thread-local init function of the table function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_function_set_local_init(
duckdb_table_function table_function,
duckdb_table_function_init_t init
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_function`: The table function
* `init`: The init function
###### `duckdb_table_function_set_function` {#docs:stable:clients:c:api::duckdb_table_function_set_function}
Sets the main function of the table function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_function_set_function(
duckdb_table_function table_function,
duckdb_table_function_t function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_function`: The table function
* `function`: The function
###### `duckdb_table_function_supports_projection_pushdown` {#docs:stable:clients:c:api::duckdb_table_function_supports_projection_pushdown}
Sets whether or not the given table function supports projection pushdown.
If this is set to true, the system will provide a list of all required columns in the `init` stage through
the `duckdb_init_get_column_count` and `duckdb_init_get_column_index` functions.
If this is set to false (the default), the system will expect all columns to be projected.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_function_supports_projection_pushdown(
duckdb_table_function table_function,
bool pushdown
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_function`: The table function
* `pushdown`: True if the table function supports projection pushdown, false otherwise.
###### `duckdb_register_table_function` {#docs:stable:clients:c:api::duckdb_register_table_function}
Register the table function object within the given connection.
The function requires at least a name, a bind function, an init function and a main function.
If the function is incomplete or a function with this name already exists, DuckDBError is returned.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_register_table_function(
duckdb_connection con,
duckdb_table_function function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `con`: The connection to register it in.
* `function`: The function pointer
####### Return Value {#docs:stable:clients:c:api::return-value}
Whether or not the registration was successful.
###### `duckdb_bind_get_extra_info` {#docs:stable:clients:c:api::duckdb_bind_get_extra_info}
Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_bind_get_extra_info(
duckdb_bind_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:api::return-value}
The extra info
###### `duckdb_table_function_get_client_context` {#docs:stable:clients:c:api::duckdb_table_function_get_client_context}
Retrieves the client context of the bind info of a table function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_function_get_client_context(
duckdb_bind_info info,
duckdb_client_context *out_context
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The bind info object of the table function.
* `out_context`: The client context of the bind info. Must be destroyed with `duckdb_destroy_client_context`.
###### `duckdb_bind_add_result_column` {#docs:stable:clients:c:api::duckdb_bind_add_result_column}
Adds a result column to the output of the table function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_bind_add_result_column(
duckdb_bind_info info,
const char *name,
duckdb_logical_type type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The table function's bind info.
* `name`: The column name.
* `type`: The logical column type.
###### `duckdb_bind_get_parameter_count` {#docs:stable:clients:c:api::duckdb_bind_get_parameter_count}
Retrieves the number of regular (non-named) parameters to the function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_bind_get_parameter_count(
duckdb_bind_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of parameters
###### `duckdb_bind_get_parameter` {#docs:stable:clients:c:api::duckdb_bind_get_parameter}
Retrieves the parameter at the given index.
The result must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_bind_get_parameter(
duckdb_bind_info info,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `index`: The index of the parameter to get
####### Return Value {#docs:stable:clients:c:api::return-value}
The value of the parameter. Must be destroyed with `duckdb_destroy_value`.
###### `duckdb_bind_get_named_parameter` {#docs:stable:clients:c:api::duckdb_bind_get_named_parameter}
Retrieves a named parameter with the given name.
The result must be destroyed with `duckdb_destroy_value`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_bind_get_named_parameter(
duckdb_bind_info info,
const char *name
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `name`: The name of the parameter
####### Return Value {#docs:stable:clients:c:api::return-value}
The value of the parameter. Must be destroyed with `duckdb_destroy_value`.
###### `duckdb_bind_set_bind_data` {#docs:stable:clients:c:api::duckdb_bind_set_bind_data}
Sets the user-provided bind data in the bind object of the table function.
This object can be retrieved again during execution.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_bind_set_bind_data(
duckdb_bind_info info,
void *bind_data,
duckdb_delete_callback_t destroy
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The bind info of the table function.
* `bind_data`: The bind data object.
* `destroy`: The callback to destroy the bind data (if any).
###### `duckdb_bind_set_cardinality` {#docs:stable:clients:c:api::duckdb_bind_set_cardinality}
Sets the cardinality estimate for the table function, used for optimization.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_bind_set_cardinality(
duckdb_bind_info info,
idx_t cardinality,
bool is_exact
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The bind info of the table function.
* `cardinality`: The cardinality estimate.
* `is_exact`: Whether the cardinality estimate is exact or an approximation.
###### `duckdb_bind_set_error` {#docs:stable:clients:c:api::duckdb_bind_set_error}
Report that an error has occurred while calling bind on a table function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_bind_set_error(
duckdb_bind_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `error`: The error message
###### `duckdb_init_get_extra_info` {#docs:stable:clients:c:api::duckdb_init_get_extra_info}
Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_init_get_extra_info(
duckdb_init_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:api::return-value}
The extra info
###### `duckdb_init_get_bind_data` {#docs:stable:clients:c:api::duckdb_init_get_bind_data}
Gets the bind data set by `duckdb_bind_set_bind_data` during the bind.
Note that the bind data should be considered as read-only.
For tracking state, use the init data instead.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_init_get_bind_data(
duckdb_init_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:api::return-value}
The bind data object
###### `duckdb_init_set_init_data` {#docs:stable:clients:c:api::duckdb_init_set_init_data}
Sets the user-provided init data in the init object. This object can be retrieved again during execution.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_init_set_init_data(
duckdb_init_info info,
void *init_data,
duckdb_delete_callback_t destroy
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `init_data`: The init data object.
* `destroy`: The callback that will be called to destroy the init data (if any)
###### `duckdb_init_get_column_count` {#docs:stable:clients:c:api::duckdb_init_get_column_count}
Returns the number of projected columns.
This function must be used if projection pushdown is enabled to figure out which columns to emit.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_init_get_column_count(
duckdb_init_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of projected columns.
###### `duckdb_init_get_column_index` {#docs:stable:clients:c:api::duckdb_init_get_column_index}
Returns the column index of the projected column at the specified position.
This function must be used if projection pushdown is enabled to figure out which columns to emit.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_init_get_column_index(
duckdb_init_info info,
idx_t column_index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `column_index`: The index at which to get the projected column index, in the range 0 to `duckdb_init_get_column_count(info)`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The column index of the projected column.
###### `duckdb_init_set_max_threads` {#docs:stable:clients:c:api::duckdb_init_set_max_threads}
Sets how many threads can process this table function in parallel (default: 1).
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_init_set_max_threads(
duckdb_init_info info,
idx_t max_threads
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `max_threads`: The maximum amount of threads that can process this table function
###### `duckdb_init_set_error` {#docs:stable:clients:c:api::duckdb_init_set_error}
Report that an error has occurred while calling init.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_init_set_error(
duckdb_init_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `error`: The error message
###### `duckdb_function_get_extra_info` {#docs:stable:clients:c:api::duckdb_function_get_extra_info}
Retrieves the extra info of the function as set in `duckdb_table_function_set_extra_info`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_function_get_extra_info(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:api::return-value}
The extra info
###### `duckdb_function_get_bind_data` {#docs:stable:clients:c:api::duckdb_function_get_bind_data}
Gets the table function's bind data set by `duckdb_bind_set_bind_data`.
Note that the bind data is read-only.
For tracking state, use the init data instead.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_function_get_bind_data(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The function info object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The bind data object.
###### `duckdb_function_get_init_data` {#docs:stable:clients:c:api::duckdb_function_get_init_data}
Gets the init data set by `duckdb_init_set_init_data` during the init.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_function_get_init_data(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:api::return-value}
The init data object
###### `duckdb_function_get_local_init_data` {#docs:stable:clients:c:api::duckdb_function_get_local_init_data}
Gets the thread-local init data set by `duckdb_init_set_init_data` during the local_init.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_function_get_local_init_data(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
####### Return Value {#docs:stable:clients:c:api::return-value}
The init data object
###### `duckdb_function_set_error` {#docs:stable:clients:c:api::duckdb_function_set_error}
Report that an error has occurred while executing the function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_function_set_error(
duckdb_function_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `error`: The error message
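To see how the table-function pieces above (bind, init, execute, and the registration calls) come together, here is a hedged sketch of a `my_range(n)` table function that emits the numbers `0` to `n - 1`. It assumes the callback signatures `void (*)(duckdb_bind_info)`, `void (*)(duckdb_init_info)`, and `void (*)(duckdb_function_info, duckdb_data_chunk)`, plus `duckdb_get_int64`, `duckdb_vector_size`, and `duckdb_data_chunk_set_size` from elsewhere in this API; all struct and function names are made up.
```c
#include "duckdb.h"
#include <stdlib.h>

typedef struct { int64_t rows_to_produce; } my_range_bind_data;
typedef struct { int64_t rows_emitted; } my_range_init_data;

static void my_range_bind(duckdb_bind_info info) {
    duckdb_value param = duckdb_bind_get_parameter(info, 0);
    my_range_bind_data *bind = (my_range_bind_data *) malloc(sizeof(my_range_bind_data));
    bind->rows_to_produce = duckdb_get_int64(param);
    duckdb_destroy_value(&param);

    duckdb_logical_type bigint = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_bind_add_result_column(info, "i", bigint);
    duckdb_destroy_logical_type(&bigint);

    duckdb_bind_set_bind_data(info, bind, free);
}

static void my_range_init(duckdb_init_info info) {
    my_range_init_data *init = (my_range_init_data *) malloc(sizeof(my_range_init_data));
    init->rows_emitted = 0;
    duckdb_init_set_init_data(info, init, free);
}

static void my_range_function(duckdb_function_info info, duckdb_data_chunk output) {
    my_range_bind_data *bind = (my_range_bind_data *) duckdb_function_get_bind_data(info);
    my_range_init_data *init = (my_range_init_data *) duckdb_function_get_init_data(info);

    int64_t *out = (int64_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(output, 0));
    idx_t capacity = duckdb_vector_size();
    idx_t count = 0;
    while (count < capacity && init->rows_emitted < bind->rows_to_produce) {
        out[count++] = init->rows_emitted++;
    }
    /* A chunk size of 0 signals that the scan is finished. */
    duckdb_data_chunk_set_size(output, count);
}

/* Register my_range(BIGINT) on connection `con`. */
static duckdb_state register_my_range(duckdb_connection con) {
    duckdb_table_function f = duckdb_create_table_function();
    duckdb_table_function_set_name(f, "my_range");

    duckdb_logical_type bigint = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_table_function_add_parameter(f, bigint);
    duckdb_destroy_logical_type(&bigint);

    duckdb_table_function_set_bind(f, my_range_bind);
    duckdb_table_function_set_init(f, my_range_init);
    duckdb_table_function_set_function(f, my_range_function);

    duckdb_state state = duckdb_register_table_function(con, f);
    duckdb_destroy_table_function(&f);
    return state;
}
```
After registration, `SELECT * FROM my_range(5);` produces the values 0 through 4.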
###### `duckdb_add_replacement_scan` {#docs:stable:clients:c:api::duckdb_add_replacement_scan}
Add a replacement scan definition to the specified database.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_add_replacement_scan(
duckdb_database db,
duckdb_replacement_callback_t replacement,
void *extra_data,
duckdb_delete_callback_t delete_callback
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `db`: The database object to add the replacement scan to
* `replacement`: The replacement scan callback
* `extra_data`: Extra data that is passed back into the specified callback
* `delete_callback`: The delete callback to call on the extra data, if any
###### `duckdb_replacement_scan_set_function_name` {#docs:stable:clients:c:api::duckdb_replacement_scan_set_function_name}
Sets the replacement function name. If this function is called in the replacement callback,
the replacement scan is performed. If it is not called, no replacement scan is performed.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_replacement_scan_set_function_name(
duckdb_replacement_scan_info info,
const char *function_name
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `function_name`: The function name to substitute.
###### `duckdb_replacement_scan_add_parameter` {#docs:stable:clients:c:api::duckdb_replacement_scan_add_parameter}
Adds a parameter to the replacement scan function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_replacement_scan_add_parameter(
duckdb_replacement_scan_info info,
duckdb_value parameter
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `parameter`: The parameter to add.
###### `duckdb_replacement_scan_set_error` {#docs:stable:clients:c:api::duckdb_replacement_scan_set_error}
Report that an error has occurred while executing the replacement scan.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_replacement_scan_set_error(
duckdb_replacement_scan_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object
* `error`: The error message
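As a hedged sketch of the replacement scan API above, the following callback maps any table name ending in `.csv` to a call of the built-in `read_csv` table function with the name as its argument. It assumes the `duckdb_replacement_callback_t` signature `(duckdb_replacement_scan_info, const char *, void *)` and `duckdb_create_varchar` from elsewhere in this API; the function name is made up.
```c
#include "duckdb.h"
#include <string.h>

/* Replacement callback: redirect names ending in ".csv" to read_csv(name). */
static void csv_replacement(duckdb_replacement_scan_info info, const char *table_name, void *data) {
    (void) data;
    size_t len = strlen(table_name);
    if (len < 4 || strcmp(table_name + len - 4, ".csv") != 0) {
        /* Not calling duckdb_replacement_scan_set_function_name means
           no replacement is performed for this table name. */
        return;
    }
    duckdb_replacement_scan_set_function_name(info, "read_csv");
    duckdb_value param = duckdb_create_varchar(table_name);
    duckdb_replacement_scan_add_parameter(info, param);
    duckdb_destroy_value(&param);
}
```
The callback is installed on an open database with `duckdb_add_replacement_scan(db, csv_replacement, NULL, NULL);`.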
###### `duckdb_get_profiling_info` {#docs:stable:clients:c:api::duckdb_get_profiling_info}
Returns the root node of the profiling information. Returns nullptr if profiling is not enabled.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_profiling_info duckdb_get_profiling_info(
duckdb_connection connection
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: A connection object.
####### Return Value {#docs:stable:clients:c:api::return-value}
A profiling information object.
###### `duckdb_profiling_info_get_value` {#docs:stable:clients:c:api::duckdb_profiling_info_get_value}
Returns the value of the metric of the current profiling info node. Returns nullptr if the metric does not exist or is not enabled.
Currently, the value holds a string, which can be retrieved by calling `duckdb_get_varchar(duckdb_value value)`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_profiling_info_get_value(
duckdb_profiling_info info,
const char *key
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: A profiling information object.
* `key`: The name of the requested metric.
####### Return Value {#docs:stable:clients:c:api::return-value}
The value of the metric. Must be freed with `duckdb_destroy_value`
###### `duckdb_profiling_info_get_metrics` {#docs:stable:clients:c:api::duckdb_profiling_info_get_metrics}
Returns the key-value metric map of this profiling node as a MAP duckdb_value.
The individual elements are accessible via the duckdb_value MAP functions.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_value duckdb_profiling_info_get_metrics(
duckdb_profiling_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: A profiling information object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The key-value metric map as a MAP duckdb_value.
###### `duckdb_profiling_info_get_child_count` {#docs:stable:clients:c:api::duckdb_profiling_info_get_child_count}
Returns the number of children in the current profiling info node.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_profiling_info_get_child_count(
duckdb_profiling_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: A profiling information object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of children in the current node.
###### `duckdb_profiling_info_get_child` {#docs:stable:clients:c:api::duckdb_profiling_info_get_child}
Returns the child node at the specified index.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_profiling_info duckdb_profiling_info_get_child(
duckdb_profiling_info info,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: A profiling information object.
* `index`: The index of the child node.
####### Return Value {#docs:stable:clients:c:api::return-value}
The child node at the specified index.
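The profiling functions above can be combined into a small tree walk, sketched below, which prints the metric map of every node. It assumes the MAP value accessors (`duckdb_get_map_size`, `duckdb_get_map_key`, `duckdb_get_map_value`), `duckdb_get_varchar`, and `duckdb_free` from elsewhere in this API, and that profiling was enabled beforehand (e.g., via `PRAGMA enable_profiling = 'no_output'`) and a query was run on the connection.
```c
#include "duckdb.h"
#include <stdio.h>

/* Recursively print the metrics of a profiling node and its children. */
static void print_profiling_node(duckdb_profiling_info node, int depth) {
    duckdb_value metrics = duckdb_profiling_info_get_metrics(node);
    idx_t metric_count = duckdb_get_map_size(metrics);
    for (idx_t i = 0; i < metric_count; i++) {
        duckdb_value key = duckdb_get_map_key(metrics, i);
        duckdb_value val = duckdb_get_map_value(metrics, i);
        char *key_str = duckdb_get_varchar(key);
        char *val_str = duckdb_get_varchar(val);
        printf("%*s%s = %s\n", depth * 2, "", key_str, val_str);
        duckdb_free(key_str);
        duckdb_free(val_str);
        duckdb_destroy_value(&key);
        duckdb_destroy_value(&val);
    }
    duckdb_destroy_value(&metrics);

    idx_t children = duckdb_profiling_info_get_child_count(node);
    for (idx_t i = 0; i < children; i++) {
        print_profiling_node(duckdb_profiling_info_get_child(node, i), depth + 1);
    }
}

/* Usage:
   duckdb_profiling_info root = duckdb_get_profiling_info(con);
   if (root) {
       print_profiling_node(root, 0);
   }
*/
```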
###### `duckdb_appender_create` {#docs:stable:clients:c:api::duckdb_appender_create}
Creates an appender object.
Note that the object must be destroyed with `duckdb_appender_destroy`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_appender_create(
duckdb_connection connection,
const char *schema,
const char *table,
duckdb_appender *out_appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection context to create the appender in.
* `schema`: The schema of the table to append to, or `nullptr` for the default schema.
* `table`: The table name to append to.
* `out_appender`: The resulting appender object.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_appender_create_ext` {#docs:stable:clients:c:api::duckdb_appender_create_ext}
Creates an appender object.
Note that the object must be destroyed with `duckdb_appender_destroy`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_appender_create_ext(
duckdb_connection connection,
const char *catalog,
const char *schema,
const char *table,
duckdb_appender *out_appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection context to create the appender in.
* `catalog`: The catalog of the table to append to, or `nullptr` for the default catalog.
* `schema`: The schema of the table to append to, or `nullptr` for the default schema.
* `table`: The table name to append to.
* `out_appender`: The resulting appender object.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
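A hedged sketch of the appender workflow: create an appender for a hypothetical table `people(id BIGINT, name VARCHAR)` in the default catalog and schema, append two rows, and destroy it. The per-value append calls (`duckdb_append_int64`, `duckdb_append_varchar`, `duckdb_appender_end_row`) are documented further below; error handling is kept minimal.
```c
#include "duckdb.h"

/* Append two rows to an existing table people(id BIGINT, name VARCHAR). */
static duckdb_state append_people(duckdb_connection con) {
    duckdb_appender appender;
    if (duckdb_appender_create_ext(con, NULL, NULL, "people", &appender) == DuckDBError) {
        return DuckDBError;
    }

    duckdb_append_int64(appender, 1);
    duckdb_append_varchar(appender, "Ada");
    duckdb_appender_end_row(appender);

    duckdb_append_int64(appender, 2);
    duckdb_append_varchar(appender, "Grace");
    duckdb_appender_end_row(appender);

    /* Destroying the appender flushes any remaining rows and releases it. */
    return duckdb_appender_destroy(&appender);
}
```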
###### `duckdb_appender_create_query` {#docs:stable:clients:c:api::duckdb_appender_create_query}
Creates an appender object that executes the given query with any data appended to it.
Note that the object must be destroyed with `duckdb_appender_destroy`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_appender_create_query(
duckdb_connection connection,
const char *query,
idx_t column_count,
duckdb_logical_type *types,
const char *table_name,
const char **column_names,
duckdb_appender *out_appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection context to create the appender in.
* `query`: The query to execute, can be an INSERT, DELETE, UPDATE or MERGE INTO statement.
* `column_count`: The number of columns to append.
* `types`: The types of the columns to append.
* `table_name`: (optionally) the table name used to refer to the appended data, defaults to "appended_data".
* `column_names`: (optionally) the list of column names, defaults to "col1", "col2", ...
* `out_appender`: The resulting appender object.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_appender_column_count` {#docs:stable:clients:c:api::duckdb_appender_column_count}
Returns the number of columns that belong to the appender.
If there is no active column list, then this equals the number of physical columns of the table.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_appender_column_count(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to get the column count from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of columns in the data chunks.
###### `duckdb_appender_column_type` {#docs:stable:clients:c:api::duckdb_appender_column_type}
Returns the type of the column at the specified index. This is either a type in the active column list, or the same type
as a column in the receiving table.
Note: The resulting type must be destroyed with `duckdb_destroy_logical_type`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_appender_column_type(
duckdb_appender appender,
idx_t col_idx
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to get the column type from.
* `col_idx`: The index of the column to get the type of.
####### Return Value {#docs:stable:clients:c:api::return-value}
The `duckdb_logical_type` of the column.
###### `duckdb_appender_error` {#docs:stable:clients:c:api::duckdb_appender_error}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Use duckdb_appender_error_data instead.
Returns the error message associated with the appender.
If the appender has no error message, this returns `nullptr` instead.
The error message should not be freed. It will be de-allocated when `duckdb_appender_destroy` is called.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_appender_error(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to get the error from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error message, or `nullptr` if there is none.
###### `duckdb_appender_error_data` {#docs:stable:clients:c:api::duckdb_appender_error_data}
Returns the error data associated with the appender.
Must be destroyed with duckdb_destroy_error_data.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_error_data duckdb_appender_error_data(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to get the error data from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error data.
###### `duckdb_appender_flush` {#docs:stable:clients:c:api::duckdb_appender_flush}
Flush the appender to the table, forcing the cache of the appender to be cleared. If flushing the data triggers a
constraint violation or any other error, then all data is invalidated, and this function returns DuckDBError.
It is not possible to append more values. Call duckdb_appender_error_data to obtain the error data followed by
duckdb_appender_destroy to destroy the invalidated appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_appender_flush(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to flush.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
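A hedged sketch of handling a flush failure via error data, assuming `appender` is a valid appender and that `duckdb_error_data_message` is available to read the message:
```c
if (duckdb_appender_flush(appender) == DuckDBError) {
    duckdb_error_data err = duckdb_appender_error_data(appender);
    fprintf(stderr, "flush failed: %s\n", duckdb_error_data_message(err));
    duckdb_destroy_error_data(&err);
    // the appender is invalidated at this point and can only be destroyed
    duckdb_appender_destroy(&appender);
}
```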
###### `duckdb_appender_close` {#docs:stable:clients:c:api::duckdb_appender_close}
Closes the appender by flushing all intermediate states and closing it for further appends. If flushing the data
triggers a constraint violation or any other error, then all data is invalidated, and this function returns DuckDBError.
Call duckdb_appender_error_data to obtain the error data followed by duckdb_appender_destroy to destroy the invalidated
appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_appender_close(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to flush and close.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_appender_destroy` {#docs:stable:clients:c:api::duckdb_appender_destroy}
Closes the appender by flushing all intermediate states to the table and destroying it. By destroying it, this function
de-allocates all memory associated with the appender. If flushing the data triggers a constraint violation,
then all data is invalidated, and this function returns DuckDBError. Due to the destruction of the appender, it is no
longer possible to obtain the specific error message with duckdb_appender_error. Therefore, call duckdb_appender_close
before destroying the appender, if you need insights into the specific error.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_appender_destroy(
duckdb_appender *appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to flush, close and destroy.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_appender_add_column` {#docs:stable:clients:c:api::duckdb_appender_add_column}
Appends a column to the active column list of the appender. Immediately flushes all previous data.
The active column list specifies all columns that are expected when flushing the data. Any non-active columns are filled
with their default values, or NULL.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_appender_add_column(
duckdb_appender appender,
const char *name
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to add the column to.
* `name`: The name of the column to add to the active column list.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_appender_clear_columns` {#docs:stable:clients:c:api::duckdb_appender_clear_columns}
Removes all columns from the active column list of the appender, resetting the appender to treat all columns as active.
Immediately flushes all previous data.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_appender_clear_columns(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to clear the columns from.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
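As an illustration of the active column list, a sketch reusing the hypothetical `people` table from above:
```c
// restrict the appender to the "name" column; "age" is filled with its default value or NULL
duckdb_appender_clear_columns(appender);
duckdb_appender_add_column(appender, "name");
duckdb_append_varchar(appender, "Grace");
duckdb_appender_end_row(appender);
```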
###### `duckdb_appender_begin_row` {#docs:stable:clients:c:api::duckdb_appender_begin_row}
A no-op function, provided for backwards compatibility. Does nothing. Only `duckdb_appender_end_row` is required.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_appender_begin_row(
duckdb_appender appender
);
```
###### `duckdb_appender_end_row` {#docs:stable:clients:c:api::duckdb_appender_end_row}
Finish the current row of appends. After end_row is called, the next row can be appended.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_appender_end_row(
duckdb_appender appender
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_append_default` {#docs:stable:clients:c:api::duckdb_append_default}
Append a DEFAULT value (NULL if no DEFAULT is available for the column) to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_default(
duckdb_appender appender
);
```
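For example, a row where only the first column is supplied and the second takes its DEFAULT (a sketch, assuming the hypothetical `people` table where `age` has a DEFAULT or is nullable):
```c
duckdb_append_varchar(appender, "Lin");
duckdb_append_default(appender);  // age receives its DEFAULT value, or NULL if none is defined
duckdb_appender_end_row(appender);
```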
###### `duckdb_append_default_to_chunk` {#docs:stable:clients:c:api::duckdb_append_default_to_chunk}
Append a DEFAULT value (NULL if no DEFAULT is available for the column) at the specified row and column of the chunk created
from the specified appender. The default value of the column must be a constant value. Non-deterministic expressions
like nextval('seq') or random() are not supported.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_default_to_chunk(
duckdb_appender appender,
duckdb_data_chunk chunk,
idx_t col,
idx_t row
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to get the default value from.
* `chunk`: The data chunk to append the default value to.
* `col`: The chunk column index to append the default value to.
* `row`: The chunk row index to append the default value to.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_append_bool` {#docs:stable:clients:c:api::duckdb_append_bool}
Append a bool value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_bool(
duckdb_appender appender,
bool value
);
```
###### `duckdb_append_int8` {#docs:stable:clients:c:api::duckdb_append_int8}
Append an int8_t value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_int8(
duckdb_appender appender,
int8_t value
);
```
###### `duckdb_append_int16` {#docs:stable:clients:c:api::duckdb_append_int16}
Append an int16_t value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_int16(
duckdb_appender appender,
int16_t value
);
```
###### `duckdb_append_int32` {#docs:stable:clients:c:api::duckdb_append_int32}
Append an int32_t value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_int32(
duckdb_appender appender,
int32_t value
);
```
###### `duckdb_append_int64` {#docs:stable:clients:c:api::duckdb_append_int64}
Append an int64_t value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_int64(
duckdb_appender appender,
int64_t value
);
```
###### `duckdb_append_hugeint` {#docs:stable:clients:c:api::duckdb_append_hugeint}
Append a duckdb_hugeint value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_hugeint(
duckdb_appender appender,
duckdb_hugeint value
);
```
###### `duckdb_append_uint8` {#docs:stable:clients:c:api::duckdb_append_uint8}
Append a uint8_t value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_uint8(
duckdb_appender appender,
uint8_t value
);
```
###### `duckdb_append_uint16` {#docs:stable:clients:c:api::duckdb_append_uint16}
Append a uint16_t value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_uint16(
duckdb_appender appender,
uint16_t value
);
```
###### `duckdb_append_uint32` {#docs:stable:clients:c:api::duckdb_append_uint32}
Append a uint32_t value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_uint32(
duckdb_appender appender,
uint32_t value
);
```
###### `duckdb_append_uint64` {#docs:stable:clients:c:api::duckdb_append_uint64}
Append a uint64_t value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_uint64(
duckdb_appender appender,
uint64_t value
);
```
###### `duckdb_append_uhugeint` {#docs:stable:clients:c:api::duckdb_append_uhugeint}
Append a duckdb_uhugeint value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_uhugeint(
duckdb_appender appender,
duckdb_uhugeint value
);
```
###### `duckdb_append_float` {#docs:stable:clients:c:api::duckdb_append_float}
Append a float value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_float(
duckdb_appender appender,
float value
);
```
###### `duckdb_append_double` {#docs:stable:clients:c:api::duckdb_append_double}
Append a double value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_double(
duckdb_appender appender,
double value
);
```
###### `duckdb_append_date` {#docs:stable:clients:c:api::duckdb_append_date}
Append a duckdb_date value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_date(
duckdb_appender appender,
duckdb_date value
);
```
###### `duckdb_append_time` {#docs:stable:clients:c:api::duckdb_append_time}
Append a duckdb_time value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_time(
duckdb_appender appender,
duckdb_time value
);
```
###### `duckdb_append_timestamp` {#docs:stable:clients:c:api::duckdb_append_timestamp}
Append a duckdb_timestamp value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_timestamp(
duckdb_appender appender,
duckdb_timestamp value
);
```
###### `duckdb_append_interval` {#docs:stable:clients:c:api::duckdb_append_interval}
Append a duckdb_interval value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_interval(
duckdb_appender appender,
duckdb_interval value
);
```
###### `duckdb_append_varchar` {#docs:stable:clients:c:api::duckdb_append_varchar}
Append a varchar value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_varchar(
duckdb_appender appender,
const char *val
);
```
###### `duckdb_append_varchar_length` {#docs:stable:clients:c:api::duckdb_append_varchar_length}
Append a varchar value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_varchar_length(
duckdb_appender appender,
const char *val,
idx_t length
);
```
###### `duckdb_append_blob` {#docs:stable:clients:c:api::duckdb_append_blob}
Append a blob value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_blob(
duckdb_appender appender,
const void *data,
idx_t length
);
```
###### `duckdb_append_null` {#docs:stable:clients:c:api::duckdb_append_null}
Append a NULL value to the appender (of any type).
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_null(
duckdb_appender appender
);
```
###### `duckdb_append_value` {#docs:stable:clients:c:api::duckdb_append_value}
Append a duckdb_value to the appender.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_value(
duckdb_appender appender,
duckdb_value value
);
```
###### `duckdb_append_data_chunk` {#docs:stable:clients:c:api::duckdb_append_data_chunk}
Appends a pre-filled data chunk to the specified appender.
Attempts casting if the data chunk types do not match the active appender types.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_append_data_chunk(
duckdb_appender appender,
duckdb_data_chunk chunk
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `appender`: The appender to append to.
* `chunk`: The data chunk to append.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
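A sketch of building a single-column INTEGER chunk and appending it in one call, assuming `appender` targets a table whose first column is an INTEGER:
```c
duckdb_logical_type int_type = duckdb_create_logical_type(DUCKDB_TYPE_INTEGER);
duckdb_data_chunk chunk = duckdb_create_data_chunk(&int_type, 1);
// write one value into the first (and only) vector
int32_t *data = (int32_t *) duckdb_vector_get_data(duckdb_data_chunk_get_vector(chunk, 0));
data[0] = 42;
duckdb_data_chunk_set_size(chunk, 1);
duckdb_append_data_chunk(appender, chunk);
duckdb_destroy_data_chunk(&chunk);
duckdb_destroy_logical_type(&int_type);
```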
###### `duckdb_table_description_create` {#docs:stable:clients:c:api::duckdb_table_description_create}
Creates a table description object. Note that `duckdb_table_description_destroy` should always be called on the
resulting table_description, even if the function returns `DuckDBError`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_table_description_create(
duckdb_connection connection,
const char *schema,
const char *table,
duckdb_table_description *out
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection context.
* `schema`: The schema of the table, or `nullptr` for the default schema.
* `table`: The table name.
* `out`: The resulting table description object.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_table_description_create_ext` {#docs:stable:clients:c:api::duckdb_table_description_create_ext}
Creates a table description object. Note that `duckdb_table_description_destroy` must be called on the resulting
table_description, even if the function returns `DuckDBError`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_table_description_create_ext(
duckdb_connection connection,
const char *catalog,
const char *schema,
const char *table,
duckdb_table_description *out
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection context.
* `catalog`: The catalog (database) name of the table, or `nullptr` for the default catalog.
* `schema`: The schema of the table, or `nullptr` for the default schema.
* `table`: The table name.
* `out`: The resulting table description object.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_table_description_destroy` {#docs:stable:clients:c:api::duckdb_table_description_destroy}
Destroy the TableDescription object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_table_description_destroy(
duckdb_table_description *table_description
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_description`: The table_description to destroy.
###### `duckdb_table_description_error` {#docs:stable:clients:c:api::duckdb_table_description_error}
Returns the error message associated with the given table_description.
If the table_description has no error message, this returns `nullptr` instead.
The error message should not be freed. It will be de-allocated when `duckdb_table_description_destroy` is called.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_table_description_error(
duckdb_table_description table_description
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_description`: The table_description to get the error from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error message, or `nullptr` if there is none.
###### `duckdb_column_has_default` {#docs:stable:clients:c:api::duckdb_column_has_default}
Check if the column at the given index of the table has a DEFAULT expression.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_column_has_default(
duckdb_table_description table_description,
idx_t index,
bool *out
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_description`: The table_description to query.
* `index`: The index of the column to query.
* `out`: The out-parameter used to store the result.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_table_description_get_column_name` {#docs:stable:clients:c:api::duckdb_table_description_get_column_name}
Obtain the column name at 'index'.
The out result must be destroyed with `duckdb_free`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
char *duckdb_table_description_get_column_name(
duckdb_table_description table_description,
idx_t index
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `table_description`: The table_description to query.
* `index`: The index of the column to query.
####### Return Value {#docs:stable:clients:c:api::return-value}
The column name.
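Combining the table description functions, a sketch that inspects the first column of the hypothetical `people` table:
```c
duckdb_table_description desc;
if (duckdb_table_description_create(con, NULL, "people", &desc) == DuckDBSuccess) {
    bool has_default = false;
    duckdb_column_has_default(desc, 0, &has_default);
    char *name = duckdb_table_description_get_column_name(desc, 0);
    printf("column 0: %s (has default: %s)\n", name, has_default ? "yes" : "no");
    duckdb_free(name);
}
// must be called even if creation returned DuckDBError
duckdb_table_description_destroy(&desc);
```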
###### `duckdb_to_arrow_schema` {#docs:stable:clients:c:api::duckdb_to_arrow_schema}
Transforms a DuckDB schema into an Arrow schema.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_error_data duckdb_to_arrow_schema(
duckdb_arrow_options arrow_options,
duckdb_logical_type *types,
const char **names,
idx_t column_count,
struct ArrowSchema *out_schema
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `arrow_options`: The Arrow settings used to produce the Arrow data.
* `types`: The DuckDB logical types for each column in the schema.
* `names`: The names for each column in the schema.
* `column_count`: The number of columns that exist in the schema.
* `out_schema`: The resulting arrow schema. Must be destroyed with `out_schema->release(out_schema)`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error data. Must be destroyed with `duckdb_destroy_error_data`.
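A sketch of checking the returned error data, assuming `arrow_options`, `types`, `names`, and `column_count` have been prepared elsewhere, that the Arrow C data interface header is included, and that `duckdb_error_data_has_error` / `duckdb_error_data_message` are available:
```c
struct ArrowSchema out_schema;
duckdb_error_data err = duckdb_to_arrow_schema(arrow_options, types, names, column_count, &out_schema);
if (duckdb_error_data_has_error(err)) {
    fprintf(stderr, "conversion failed: %s\n", duckdb_error_data_message(err));
} else {
    // ... use out_schema, then release it ...
    out_schema.release(&out_schema);
}
duckdb_destroy_error_data(&err);
```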
###### `duckdb_data_chunk_to_arrow` {#docs:stable:clients:c:api::duckdb_data_chunk_to_arrow}
Transforms a DuckDB data chunk into an Arrow array.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_error_data duckdb_data_chunk_to_arrow(
duckdb_arrow_options arrow_options,
duckdb_data_chunk chunk,
struct ArrowArray *out_arrow_array
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `arrow_options`: The Arrow settings used to produce the Arrow data.
* `chunk`: The DuckDB data chunk to convert.
* `out_arrow_array`: The output Arrow structure that will hold the converted data. Must be released with
`out_arrow_array->release(out_arrow_array)`
####### Return Value {#docs:stable:clients:c:api::return-value}
The error data. Must be destroyed with `duckdb_destroy_error_data`.
###### `duckdb_schema_from_arrow` {#docs:stable:clients:c:api::duckdb_schema_from_arrow}
Transforms an Arrow Schema into a DuckDB Schema.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_error_data duckdb_schema_from_arrow(
duckdb_connection connection,
struct ArrowSchema *schema,
duckdb_arrow_converted_schema *out_types
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection to get the transformation settings from.
* `schema`: The input Arrow schema. Must be released with `schema->release(schema)`.
* `out_types`: The Arrow converted schema with extra information about the arrow types. Must be destroyed with
`duckdb_destroy_arrow_converted_schema`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error data. Must be destroyed with `duckdb_destroy_error_data`.
###### `duckdb_data_chunk_from_arrow` {#docs:stable:clients:c:api::duckdb_data_chunk_from_arrow}
Transforms an Arrow array into a DuckDB data chunk. The data chunk will retain ownership of the underlying Arrow data.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_error_data duckdb_data_chunk_from_arrow(
duckdb_connection connection,
struct ArrowArray *arrow_array,
duckdb_arrow_converted_schema converted_schema,
duckdb_data_chunk *out_chunk
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection to get the transformation settings from.
* `arrow_array`: The input Arrow array. Data ownership is passed on to DuckDB's data chunk; the underlying object
does not need to be released and no longer owns the data.
* `converted_schema`: The Arrow converted schema with extra information about the arrow types.
* `out_chunk`: The resulting DuckDB data chunk. Must be destroyed by duckdb_destroy_data_chunk.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error data. Must be destroyed with `duckdb_destroy_error_data`.
###### `duckdb_destroy_arrow_converted_schema` {#docs:stable:clients:c:api::duckdb_destroy_arrow_converted_schema}
Destroys the arrow converted schema and de-allocates all memory allocated for that arrow converted schema.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_arrow_converted_schema(
duckdb_arrow_converted_schema *arrow_converted_schema
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `arrow_converted_schema`: The arrow converted schema to destroy.
###### `duckdb_query_arrow` {#docs:stable:clients:c:api::duckdb_query_arrow}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Executes a SQL query within a connection and stores the full (materialized) result in an arrow structure.
If the query fails to execute, DuckDBError is returned and the error message can be retrieved by calling
`duckdb_query_arrow_error`.
Note that after running `duckdb_query_arrow`, `duckdb_destroy_arrow` must be called on the result object even if the
query fails, otherwise the error stored within the result will not be freed correctly.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_query_arrow(
duckdb_connection connection,
const char *query,
duckdb_arrow *out_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection to perform the query in.
* `query`: The SQL query to run.
* `out_result`: The query result.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_query_arrow_schema` {#docs:stable:clients:c:api::duckdb_query_arrow_schema}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Fetch the internal arrow schema from the arrow result. Remember to call release on the respective
ArrowSchema object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_query_arrow_schema(
duckdb_arrow result,
duckdb_arrow_schema *out_schema
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result to fetch the schema from.
* `out_schema`: The output schema.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_prepared_arrow_schema` {#docs:stable:clients:c:api::duckdb_prepared_arrow_schema}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Fetch the internal arrow schema from the prepared statement. Remember to call release on the respective
ArrowSchema object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_prepared_arrow_schema(
duckdb_prepared_statement prepared,
duckdb_arrow_schema *out_schema
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared`: The prepared statement to fetch the schema from.
* `out_schema`: The output schema.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_result_arrow_array` {#docs:stable:clients:c:api::duckdb_result_arrow_array}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Convert a data chunk into an arrow struct array. Remember to call release on the respective
ArrowArray object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_result_arrow_array(
duckdb_result result,
duckdb_data_chunk chunk,
duckdb_arrow_array *out_array
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object the data chunk has been fetched from.
* `chunk`: The data chunk to convert.
* `out_array`: The output array.
###### `duckdb_query_arrow_array` {#docs:stable:clients:c:api::duckdb_query_arrow_array}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Fetch an internal arrow struct array from the arrow result. Remember to call release on the respective
ArrowArray object.
This function can be called multiple times to fetch the next chunks. Each call frees the previous out_array,
so consume the out_array before calling this function again.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_query_arrow_array(
duckdb_arrow result,
duckdb_arrow_array *out_array
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result to fetch the array from.
* `out_array`: The output array.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_arrow_column_count` {#docs:stable:clients:c:api::duckdb_arrow_column_count}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Returns the number of columns present in the arrow result object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_arrow_column_count(
duckdb_arrow result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of columns present in the result object.
###### `duckdb_arrow_row_count` {#docs:stable:clients:c:api::duckdb_arrow_row_count}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Returns the number of rows present in the arrow result object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_arrow_row_count(
duckdb_arrow result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of rows present in the result object.
###### `duckdb_arrow_rows_changed` {#docs:stable:clients:c:api::duckdb_arrow_rows_changed}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Returns the number of rows changed by the query stored in the arrow result. This is relevant only for
INSERT/UPDATE/DELETE queries. For other queries the rows_changed will be 0.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_arrow_rows_changed(
duckdb_arrow result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of rows changed.
###### `duckdb_query_arrow_error` {#docs:stable:clients:c:api::duckdb_query_arrow_error}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Returns the error message contained within the result. The error is only set if `duckdb_query_arrow` returns
`DuckDBError`.
The error message should not be freed. It will be de-allocated when `duckdb_destroy_arrow` is called.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
const char *duckdb_query_arrow_error(
duckdb_arrow result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the error from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error of the result.
###### `duckdb_destroy_arrow` {#docs:stable:clients:c:api::duckdb_destroy_arrow}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Closes the result and de-allocates all memory allocated for the arrow result.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_arrow(
duckdb_arrow *result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result to destroy.
###### `duckdb_destroy_arrow_stream` {#docs:stable:clients:c:api::duckdb_destroy_arrow_stream}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Releases the arrow array stream and de-allocates its memory.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_arrow_stream(
duckdb_arrow_stream *stream_p
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `stream_p`: The arrow array stream to destroy.
###### `duckdb_execute_prepared_arrow` {#docs:stable:clients:c:api::duckdb_execute_prepared_arrow}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Executes the prepared statement with the given bound parameters, and returns an arrow query result.
Note that after running `duckdb_execute_prepared_arrow`, `duckdb_destroy_arrow` must be called on the result object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_execute_prepared_arrow(
duckdb_prepared_statement prepared_statement,
duckdb_arrow *out_result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `prepared_statement`: The prepared statement to execute.
* `out_result`: The query result.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_arrow_scan` {#docs:stable:clients:c:api::duckdb_arrow_scan}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Scans the Arrow stream and creates a view with the given name.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_arrow_scan(
duckdb_connection connection,
const char *table_name,
duckdb_arrow_stream arrow
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection on which to execute the scan.
* `table_name`: Name of the temporary view to create.
* `arrow`: Arrow stream wrapper.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_arrow_array_scan` {#docs:stable:clients:c:api::duckdb_arrow_array_scan}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Scans the Arrow array and creates a view with the given name.
Note that after running `duckdb_arrow_array_scan`, `duckdb_destroy_arrow_stream` must be called on the out stream.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_arrow_array_scan(
duckdb_connection connection,
const char *table_name,
duckdb_arrow_schema arrow_schema,
duckdb_arrow_array arrow_array,
duckdb_arrow_stream *out_stream
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `connection`: The connection on which to execute the scan.
* `table_name`: Name of the temporary view to create.
* `arrow_schema`: Arrow schema wrapper.
* `arrow_array`: Arrow array wrapper.
* `out_stream`: Output array stream that wraps around the passed schema, for releasing/deleting once done.
####### Return Value {#docs:stable:clients:c:api::return-value}
`DuckDBSuccess` on success or `DuckDBError` on failure.
###### `duckdb_execute_tasks` {#docs:stable:clients:c:api::duckdb_execute_tasks}
Execute DuckDB tasks on this thread.
Will return after `max_tasks` have been executed, or if there are no more tasks present.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_execute_tasks(
duckdb_database database,
idx_t max_tasks
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `database`: The database object to execute tasks for.
* `max_tasks`: The maximum number of tasks to execute.
###### `duckdb_create_task_state` {#docs:stable:clients:c:api::duckdb_create_task_state}
Creates a task state that can be used with duckdb_execute_tasks_state to execute tasks until
`duckdb_finish_execution` is called on the state.
`duckdb_destroy_state` must be called on the result.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_task_state duckdb_create_task_state(
duckdb_database database
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `database`: The database object to create the task state for
####### Return Value {#docs:stable:clients:c:api::return-value}
The task state that can be used with duckdb_execute_tasks_state.
###### `duckdb_execute_tasks_state` {#docs:stable:clients:c:api::duckdb_execute_tasks_state}
Execute DuckDB tasks on this thread.
The thread will keep on executing tasks forever, until duckdb_finish_execution is called on the state.
Multiple threads can share the same duckdb_task_state.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_execute_tasks_state(
duckdb_task_state state
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `state`: The task state of the executor
###### `duckdb_execute_n_tasks_state` {#docs:stable:clients:c:api::duckdb_execute_n_tasks_state}
Execute DuckDB tasks on this thread.
The thread will keep on executing tasks until either duckdb_finish_execution is called on the state,
max_tasks tasks have been executed or there are no more tasks to be executed.
Multiple threads can share the same duckdb_task_state.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
idx_t duckdb_execute_n_tasks_state(
duckdb_task_state state,
idx_t max_tasks
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `state`: The task state of the executor.
* `max_tasks`: The maximum number of tasks to execute.
####### Return Value {#docs:stable:clients:c:api::return-value}
The number of tasks that were actually executed.
###### `duckdb_finish_execution` {#docs:stable:clients:c:api::duckdb_finish_execution}
Finish execution on a specific task.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_finish_execution(
duckdb_task_state state
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `state`: The task state on which to finish execution.
###### `duckdb_task_state_is_finished` {#docs:stable:clients:c:api::duckdb_task_state_is_finished}
Check if the provided duckdb_task_state has finished execution
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_task_state_is_finished(
duckdb_task_state state
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `state`: The task state to inspect
####### Return Value {#docs:stable:clients:c:api::return-value}
Whether or not duckdb_finish_execution has been called on the task state
###### `duckdb_destroy_task_state` {#docs:stable:clients:c:api::duckdb_destroy_task_state}
Destroys the task state returned from duckdb_create_task_state.
Note that this should not be called while there is an active duckdb_execute_tasks_state running
on the task state.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_task_state(
duckdb_task_state state
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `state`: The task state to clean up
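A sketch of how these task functions can be combined, assuming `db` is an open `duckdb_database`; the worker call would normally run on separate threads:
```c
duckdb_task_state state = duckdb_create_task_state(db);
// on each worker thread: blocks and executes tasks until duckdb_finish_execution is called
duckdb_execute_tasks_state(state);
// on the coordinating thread, once the work is done:
duckdb_finish_execution(state);
// only after all workers have returned:
duckdb_destroy_task_state(state);
```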
###### `duckdb_execution_is_finished` {#docs:stable:clients:c:api::duckdb_execution_is_finished}
Returns true if the execution of the current query is finished.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_execution_is_finished(
duckdb_connection con
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `con`: The connection on which to check
###### `duckdb_stream_fetch_chunk` {#docs:stable:clients:c:api::duckdb_stream_fetch_chunk}
> **Warning.** Deprecation notice. This method is scheduled for removal in a future release.
Fetches a data chunk from the (streaming) duckdb_result. This function should be called repeatedly until the result is
exhausted.
The result must be destroyed with `duckdb_destroy_data_chunk`.
This function can only be used on duckdb_results created with 'duckdb_pending_prepared_streaming'
If this function is used, none of the other result functions can be used and vice versa (i.e., this function cannot be
mixed with the legacy result functions or the materialized result functions).
It is not known beforehand how many chunks will be returned by this result.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_data_chunk duckdb_stream_fetch_chunk(
duckdb_result result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the data chunk from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The resulting data chunk. Returns `NULL` if the result has an error.
###### `duckdb_fetch_chunk` {#docs:stable:clients:c:api::duckdb_fetch_chunk}
Fetches a data chunk from a duckdb_result. This function should be called repeatedly until the result is exhausted.
The result must be destroyed with `duckdb_destroy_data_chunk`.
It is not known beforehand how many chunks will be returned by this result.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_data_chunk duckdb_fetch_chunk(
duckdb_result result
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `result`: The result object to fetch the data chunk from.
####### Return Value {#docs:stable:clients:c:api::return-value}
The resulting data chunk. Returns `NULL` if the result has an error.
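A sketch of the typical consumption loop, assuming an open connection `con`:
```c
duckdb_result res;
if (duckdb_query(con, "SELECT range AS i FROM range(1000)", &res) == DuckDBSuccess) {
    duckdb_data_chunk chunk;
    while ((chunk = duckdb_fetch_chunk(res)) != NULL) {
        idx_t rows = duckdb_data_chunk_get_size(chunk);
        printf("fetched a chunk with %llu rows\n", (unsigned long long) rows);
        duckdb_destroy_data_chunk(&chunk);
    }
}
duckdb_destroy_result(&res);
```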
###### `duckdb_create_cast_function` {#docs:stable:clients:c:api::duckdb_create_cast_function}
Creates a new cast function object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The cast function object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_cast_function duckdb_create_cast_function(
);
```
###### `duckdb_cast_function_set_source_type` {#docs:stable:clients:c:api::duckdb_cast_function_set_source_type}
Sets the source type of the cast function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_cast_function_set_source_type(
duckdb_cast_function cast_function,
duckdb_logical_type source_type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `cast_function`: The cast function object.
* `source_type`: The source type to set.
###### `duckdb_cast_function_set_target_type` {#docs:stable:clients:c:api::duckdb_cast_function_set_target_type}
Sets the target type of the cast function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_cast_function_set_target_type(
duckdb_cast_function cast_function,
duckdb_logical_type target_type
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `cast_function`: The cast function object.
* `target_type`: The target type to set.
###### `duckdb_cast_function_set_implicit_cast_cost` {#docs:stable:clients:c:api::duckdb_cast_function_set_implicit_cast_cost}
Sets the "cost" of implicitly casting the source type to the target type using this function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_cast_function_set_implicit_cast_cost(
duckdb_cast_function cast_function,
int64_t cost
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `cast_function`: The cast function object.
* `cost`: The cost to set.
###### `duckdb_cast_function_set_function` {#docs:stable:clients:c:api::duckdb_cast_function_set_function}
Sets the actual cast function to use.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_cast_function_set_function(
duckdb_cast_function cast_function,
duckdb_cast_function_t function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `cast_function`: The cast function object.
* `function`: The function to set.
###### `duckdb_cast_function_set_extra_info` {#docs:stable:clients:c:api::duckdb_cast_function_set_extra_info}
Assigns extra information to the cast function that can be fetched during execution, etc.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_cast_function_set_extra_info(
duckdb_cast_function cast_function,
void *extra_info,
duckdb_delete_callback_t destroy
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `cast_function`: The cast function object.
* `extra_info`: The extra information.
* `destroy`: The callback that will be called to destroy the extra information (if any).
###### `duckdb_cast_function_get_extra_info` {#docs:stable:clients:c:api::duckdb_cast_function_get_extra_info}
Retrieves the extra info of the function as set in `duckdb_cast_function_set_extra_info`.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void *duckdb_cast_function_get_extra_info(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The extra info.
###### `duckdb_cast_function_get_cast_mode` {#docs:stable:clients:c:api::duckdb_cast_function_get_cast_mode}
Get the cast execution mode from the given function info.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_cast_mode duckdb_cast_function_get_cast_mode(
duckdb_function_info info
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object.
####### Return Value {#docs:stable:clients:c:api::return-value}
The cast mode.
###### `duckdb_cast_function_set_error` {#docs:stable:clients:c:api::duckdb_cast_function_set_error}
Report that an error has occurred while executing the cast function.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_cast_function_set_error(
duckdb_function_info info,
const char *error
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object.
* `error`: The error message.
###### `duckdb_cast_function_set_row_error` {#docs:stable:clients:c:api::duckdb_cast_function_set_row_error}
Report that an error has occurred while executing the cast function, setting the corresponding output row to NULL.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_cast_function_set_row_error(
duckdb_function_info info,
const char *error,
idx_t row,
duckdb_vector output
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `info`: The info object.
* `error`: The error message.
* `row`: The index of the row within the output vector to set to NULL.
* `output`: The output vector.
###### `duckdb_register_cast_function` {#docs:stable:clients:c:api::duckdb_register_cast_function}
Registers a cast function within the given connection.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_state duckdb_register_cast_function(
duckdb_connection con,
duckdb_cast_function cast_function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `con`: The connection to use.
* `cast_function`: The cast function to register.
####### Return Value {#docs:stable:clients:c:api::return-value}
Whether or not the registration was successful.
###### `duckdb_destroy_cast_function` {#docs:stable:clients:c:api::duckdb_destroy_cast_function}
Destroys the cast function object.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_cast_function(
duckdb_cast_function *cast_function
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `cast_function`: The cast function object.
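Putting the cast function API together, a speculative sketch that registers a cast from BIGINT to DOUBLE. The callback signature is assumed to be `bool (*)(duckdb_function_info, idx_t, duckdb_vector, duckdb_vector)`, `register_example_cast` is a hypothetical helper, and a production cast would also handle validity masks and report errors via `duckdb_cast_function_set_error`:
```c
// assumed callback shape: return true on success, false on failure
static bool bigint_to_double_cast(duckdb_function_info info, idx_t count,
                                  duckdb_vector input, duckdb_vector output) {
    (void) info;
    int64_t *in = (int64_t *) duckdb_vector_get_data(input);
    double *out = (double *) duckdb_vector_get_data(output);
    for (idx_t i = 0; i < count; i++) {
        out[i] = (double) in[i];
    }
    return true;
}

// hypothetical helper that wires the cast into a connection
void register_example_cast(duckdb_connection con) {
    duckdb_cast_function cast = duckdb_create_cast_function();
    duckdb_logical_type src = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_logical_type dst = duckdb_create_logical_type(DUCKDB_TYPE_DOUBLE);
    duckdb_cast_function_set_source_type(cast, src);
    duckdb_cast_function_set_target_type(cast, dst);
    duckdb_cast_function_set_implicit_cast_cost(cast, 10);
    duckdb_cast_function_set_function(cast, bigint_to_double_cast);
    duckdb_register_cast_function(con, cast);
    duckdb_destroy_cast_function(&cast);
    duckdb_destroy_logical_type(&src);
    duckdb_destroy_logical_type(&dst);
}
```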
###### `duckdb_destroy_expression` {#docs:stable:clients:c:api::duckdb_destroy_expression}
Destroys the expression and de-allocates its memory.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
void duckdb_destroy_expression(
duckdb_expression *expr
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `expr`: A pointer to the expression.
###### `duckdb_expression_return_type` {#docs:stable:clients:c:api::duckdb_expression_return_type}
Returns the return type of an expression.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_logical_type duckdb_expression_return_type(
duckdb_expression expr
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `expr`: The expression.
####### Return Value {#docs:stable:clients:c:api::return-value}
The return type. Must be destroyed with `duckdb_destroy_logical_type`.
###### `duckdb_expression_is_foldable` {#docs:stable:clients:c:api::duckdb_expression_is_foldable}
Returns whether the expression is foldable into a value or not.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
bool duckdb_expression_is_foldable(
duckdb_expression expr
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `expr`: The expression.
####### Return Value {#docs:stable:clients:c:api::return-value}
True if the expression is foldable, else false.
###### `duckdb_expression_fold` {#docs:stable:clients:c:api::duckdb_expression_fold}
Folds an expression, creating a folded value.
####### Syntax {#docs:stable:clients:c:api::syntax}
```c
duckdb_error_data duckdb_expression_fold(
duckdb_client_context context,
duckdb_expression expr,
duckdb_value *out_value
);
```
####### Parameters {#docs:stable:clients:c:api::parameters}
* `context`: The client context.
* `expr`: The expression. Must be foldable.
* `out_value`: The folded value, if folding was successful. Must be destroyed with `duckdb_destroy_value`.
####### Return Value {#docs:stable:clients:c:api::return-value}
The error data. Must be destroyed with `duckdb_destroy_error_data`.
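For instance, inside an extension callback that receives a `duckdb_client_context` `context` and a `duckdb_expression` `expr` (both assumed to come from elsewhere), folding might look like this sketch:
```c
if (duckdb_expression_is_foldable(expr)) {
    duckdb_value folded;
    duckdb_error_data err = duckdb_expression_fold(context, expr, &folded);
    if (!duckdb_error_data_has_error(err)) {
        // inspect the constant, e.g., as an int64 if the return type is BIGINT
        int64_t constant = duckdb_get_int64(folded);
        (void) constant;
        duckdb_destroy_value(&folded);
    }
    duckdb_destroy_error_data(&err);
}
```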
## C++ API {#docs:stable:clients:cpp}
> The latest stable version of the DuckDB C++ API is 1.4.1.
> **Warning.** DuckDB's C++ API is internal.
> It is not guaranteed to be stable and can change without notice.
> If you would like to build an application on DuckDB, we recommend using the [C API](#docs:stable:clients:c:overview).
#### Installation {#docs:stable:clients:cpp::installation}
The DuckDB C++ API can be installed as part of the `libduckdb` packages. Please see the [installation page](https://duckdb.org/install) for details.
#### Basic API Usage {#docs:stable:clients:cpp::basic-api-usage}
DuckDB implements a custom C++ API. This is built around the abstractions of a database instance (`DuckDB` class), multiple `Connection`s to the database instance, and `QueryResult` instances as the result of queries. The header file for the C++ API is `duckdb.hpp`.
##### Startup & Shutdown {#docs:stable:clients:cpp::startup--shutdown}
To use DuckDB, you must first initialize a `DuckDB` instance using its constructor. `DuckDB()` takes as parameter the database file to read and write from. The special value `nullptr` can be used to create an **in-memory database**. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process). The second parameter to the `DuckDB` constructor is an optional `DBConfig` object. In `DBConfig`, you can set various database parameters, for example the read/write mode or memory limits. The `DuckDB` constructor may throw exceptions, for example if the database file is not usable.
With the `DuckDB` instance, you can create one or many `Connection` instances using the `Connection()` constructor. While connections should be thread-safe, they will be locked during querying. It is therefore recommended that each thread uses its own connection if you are in a multithreaded environment.
```cpp
DuckDB db(nullptr);
Connection con(db);
```
##### Querying {#docs:stable:clients:cpp::querying}
Connections expose the `Query()` method to send a SQL query string to DuckDB from C++. `Query()` fully materializes the query result as a `MaterializedQueryResult` in memory before returning, at which point the query result can be consumed. There is also a streaming API for queries, see further below.
```cpp
// create a table
con.Query("CREATE TABLE integers (i INTEGER, j INTEGER)");
// insert three rows into the table
con.Query("INSERT INTO integers VALUES (3, 4), (5, 6), (7, NULL)");
auto result = con.Query("SELECT * FROM integers");
if (result->HasError()) {
cerr << result->GetError() << endl;
} else {
cout << result->ToString() << endl;
}
```
The `MaterializedQueryResult` instance first contains two fields that indicate whether the query was successful. `Query` will not throw exceptions under normal circumstances. Instead, invalid queries or other issues will lead to the `success` Boolean field in the query result instance being set to `false`. In this case an error message may be available in `error` as a string. If successful, other fields are set: the type of statement that was just executed (e.g., `StatementType::INSERT_STATEMENT`) is contained in `statement_type`. The high-level ("Logical type"/"SQL type") types of the result set columns are in `types`. The names of the result columns are in the `names` string vector. In case multiple result sets are returned, for example because the query string contained multiple statements, the result sets can be chained using the `next` field.
DuckDB also supports prepared statements in the C++ API with the `Prepare()` method. This returns an instance of `PreparedStatement`. This instance can be used to execute the prepared statement with parameters. Below is an example:
```cpp
std::unique_ptr<PreparedStatement> prepare = con.Prepare("SELECT count(*) FROM a WHERE i = $1");
std::unique_ptr<QueryResult> result = prepare->Execute(12);
```
> **Warning.** Do **not** use prepared statements to insert large amounts of data into DuckDB. See the [data import documentation](#docs:stable:data:overview) for better options.
##### UDF API {#docs:stable:clients:cpp::udf-api}
The UDF API allows the definition of user-defined functions. It is exposed in `duckdb::Connection` through the methods `CreateScalarFunction()`, `CreateVectorizedFunction()`, and variants.
These methods create UDFs in the temporary schema (`TEMP_SCHEMA`) of the owner connection, which is the only connection allowed to use and change them.
###### CreateScalarFunction {#docs:stable:clients:cpp::createscalarfunction}
The user can code an ordinary scalar function and invoke `CreateScalarFunction()` to register it; afterwards, the UDF can be used in a `SELECT` statement, for instance:
```cpp
bool bigger_than_four(int value) {
return value > 4;
}
connection.CreateScalarFunction<bool, int>("bigger_than_four", &bigger_than_four);
connection.Query("SELECT bigger_than_four(i) FROM (VALUES(3), (5)) tbl(i)")->Print();
```
The `CreateScalarFunction()` methods automatically create vectorized scalar UDFs, so they are as efficient as built-in functions. There are two variants of this method interface:
**1.**
```cpp
template <typename TR, typename... Args>
void CreateScalarFunction(string name, TR (*udf_func)(Args...))
```
- template parameters:
- **TR** is the return type of the UDF function;
- **Args** are the argument types of the UDF function, up to 3 (this method supports at most ternary functions);
- **name**: the name under which to register the UDF function;
- **udf_func**: a pointer to the UDF function.
This method automatically discovers from the template typenames the corresponding LogicalTypes:
- `bool` → `LogicalType::BOOLEAN`
- `int8_t` → `LogicalType::TINYINT`
- `int16_t` → `LogicalType::SMALLINT`
- `int32_t` → `LogicalType::INTEGER`
- `int64_t` → `LogicalType::BIGINT`
- `float` → `LogicalType::FLOAT`
- `double` → `LogicalType::DOUBLE`
- `string_t` → `LogicalType::VARCHAR`
In DuckDB, some primitive types, e.g., `int32_t`, are mapped to several `LogicalType`s: `INTEGER`, `TIME`, and `DATE`. To disambiguate, users can use the following overloaded method.
**2.**
```cpp
template <typename TR, typename... Args>
void CreateScalarFunction(string name, vector<LogicalType> args, LogicalType ret_type, TR (*udf_func)(Args...))
```
An example of use would be:
```cpp
int32_t udf_date(int32_t a) {
return a;
}
con.Query("CREATE TABLE dates (d DATE)");
con.Query("INSERT INTO dates VALUES ('1992-01-01')");
con.CreateScalarFunction<int32_t, int32_t>("udf_date", {LogicalType::DATE}, LogicalType::DATE, &udf_date);
con.Query("SELECT udf_date(d) FROM dates")->Print();
```
- template parameters:
- **TR** is the return type of the UDF function;
- **Args** are the argument types of the UDF function, up to 3 (this method supports at most ternary functions);
- **name**: the name under which to register the UDF function;
- **args**: the LogicalType arguments that the function uses, which should match the template Args types;
- **ret_type**: the LogicalType returned by the function, which should match the template TR type;
- **udf_func**: a pointer to the UDF function.
This function checks the template types against the LogicalTypes passed as arguments; they must match as follows:
- LogicalTypeId::BOOLEAN → bool
- LogicalTypeId::TINYINT → int8_t
- LogicalTypeId::SMALLINT → int16_t
- LogicalTypeId::DATE, LogicalTypeId::TIME, LogicalTypeId::INTEGER → int32_t
- LogicalTypeId::BIGINT, LogicalTypeId::TIMESTAMP → int64_t
- LogicalTypeId::FLOAT, LogicalTypeId::DOUBLE, LogicalTypeId::DECIMAL → double
- LogicalTypeId::VARCHAR, LogicalTypeId::CHAR, LogicalTypeId::BLOB → string_t
- LogicalTypeId::VARBINARY → blob_t
###### CreateVectorizedFunction {#docs:stable:clients:cpp::createvectorizedfunction}
The `CreateVectorizedFunction()` methods register a vectorized UDF such as:
```cpp
/*
* This vectorized function copies the input values to the result vector
*/
template <typename TYPE>
static void udf_vectorized(DataChunk &args, ExpressionState &state, Vector &result) {
// set the result vector type
result.vector_type = VectorType::FLAT_VECTOR;
// get a raw array from the result
auto result_data = FlatVector::GetData<TYPE>(result);
// get the solely input vector
auto &input = args.data[0];
// now get an orrified vector
VectorData vdata;
input.Orrify(args.size(), vdata);
// get a raw array from the orrified input
auto input_data = (TYPE *)vdata.data;
// handling the data
for (idx_t i = 0; i < args.size(); i++) {
auto idx = vdata.sel->get_index(i);
if ((*vdata.nullmask)[idx]) {
continue;
}
result_data[i] = input_data[idx];
}
}
con.Query("CREATE TABLE integers (i INTEGER)");
con.Query("INSERT INTO integers VALUES (1), (2), (3), (999)");
con.CreateVectorizedFunction<int, int>("udf_vectorized_int", &udf_vectorized<int>);
con.Query("SELECT udf_vectorized_int(i) FROM integers")->Print();
```
The vectorized UDF is a function of the type _scalar_function_t_:
```cpp
typedef std::function<void(DataChunk &args, ExpressionState &expr, Vector &result)> scalar_function_t;
```
- **args** is a [DataChunk](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/common/types/data_chunk.hpp) that holds a set of input vectors for the UDF that all have the same length;
- **expr** is an [ExpressionState](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/execution/expression_executor_state.hpp) that provides information to the query's expression state;
- **result**: is a [Vector](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/common/types/vector.hpp) to store the result values.
There are different vector types to handle in a Vectorized UDF:
- ConstantVector;
- DictionaryVector;
- FlatVector;
- ListVector;
- StringVector;
- StructVector;
- SequenceVector.
The general API of the `CreateVectorizedFunction()` method is as follows:
**1.**
```cpp
template <typename TR, typename... Args>
void CreateVectorizedFunction(string name, scalar_function_t udf_func, LogicalType varargs = LogicalType::INVALID)
```
- template parameters:
- **TR** is the return type of the UDF function;
- **Args** are the argument types of the UDF function, up to 3.
- **name** is the name under which to register the UDF function;
- **udf_func** is a _vectorized_ UDF function;
- **varargs** is the type of varargs to support, or `LogicalType::INVALID` (the default value) if the function does not accept variable-length arguments.
This method automatically discovers from the template typenames the corresponding LogicalTypes:
- bool → LogicalType::BOOLEAN
- int8_t → LogicalType::TINYINT
- int16_t → LogicalType::SMALLINT
- int32_t → LogicalType::INTEGER
- int64_t → LogicalType::BIGINT
- float → LogicalType::FLOAT
- double → LogicalType::DOUBLE
- string_t → LogicalType::VARCHAR
**2.**
```cpp
template<typename TR, typename... Args>
void CreateVectorizedFunction(string name, vector<LogicalType> args, LogicalType ret_type, scalar_function_t udf_func, LogicalType varargs = LogicalType::INVALID)
```
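As a sketch of how this second form might be invoked, the vectorized UDF from the example above could be registered with explicitly specified LogicalTypes (again assuming the same connection `con`; the explicit types must match the template types):
```cpp
con.CreateVectorizedFunction<int, int>("udf_vectorized_int", {LogicalType::INTEGER},
                                       LogicalType::INTEGER, &udf_vectorized<int>);
```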
## CLI {#clients:cli}
### Command Line Client {#docs:stable:clients:cli:overview}
> The latest stable version of the DuckDB command line client is 1.4.1.
#### Installation {#docs:stable:clients:cli:overview::installation}
The DuckDB CLI (Command Line Interface) is a single, dependency-free executable. It is precompiled for Windows, Mac, and Linux for both the stable version and for nightly builds produced by GitHub Actions. Please see the [installation page](https://duckdb.org/install) under the CLI tab for download links.
The DuckDB CLI is based on the SQLite command line shell, so CLI-client-specific functionality is similar to what is described in the [SQLite documentation](https://www.sqlite.org/cli.html) (although DuckDB's SQL syntax follows PostgreSQL conventions with a [few exceptions](#docs:stable:sql:dialect:postgresql_compatibility)).
> DuckDB has a [tldr page](https://tldr.inbrowser.app/pages/common/duckdb), which summarizes the most common uses of the CLI client.
> If you have [tldr](https://github.com/tldr-pages/tldr) installed, you can display it by running `tldr duckdb`.
#### Getting Started {#docs:stable:clients:cli:overview::getting-started}
Once the CLI executable has been downloaded, unzip it and save it to any directory.
Navigate to that directory in a terminal and enter the command `duckdb` to run the executable.
If in a PowerShell or POSIX shell environment, use the command `./duckdb` instead.
#### Usage {#docs:stable:clients:cli:overview::usage}
The typical usage of the `duckdb` command is the following:
```batch
duckdb ⟨OPTIONS⟩ ⟨FILENAME⟩
```
##### Options {#docs:stable:clients:cli:overview::options}
The `⟨OPTIONS⟩`{:.language-sql .highlight} part encodes [arguments for the CLI client](#docs:stable:clients:cli:arguments). Common options include:
* `-csv`: sets the output mode to CSV
* `-json`: sets the output mode to JSON
* `-readonly`: open the database in read-only mode (see [concurrency in DuckDB](#docs:stable:connect:concurrency::handling-concurrency))
For a full list of options, see the [command line arguments page](#docs:stable:clients:cli:arguments).
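For example, to open an existing database in read-only mode with CSV output (assuming `my_database.duckdb` already exists):
```batch
duckdb -readonly -csv my_database.duckdb
```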
##### In-Memory vs. Persistent Database {#docs:stable:clients:cli:overview::in-memory-vs-persistent-database}
When no `⟨FILENAME⟩`{:.language-sql .highlight} argument is provided, the DuckDB CLI will open a temporary [in-memory database](#docs:stable:connect:overview::in-memory-database).
You will see DuckDB's version number, information on the connection, and a prompt starting with `D`.
```batch
duckdb
```
```text
DuckDB v1.4.1 ({{ site.current_duckdb_codename }}) b390a7c376
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D
```
To open or create a [persistent database](#docs:stable:connect:overview::persistent-database), simply include a path as a command line argument:
```batch
duckdb my_database.duckdb
```
##### Running SQL Statements in the CLI {#docs:stable:clients:cli:overview::running-sql-statements-in-the-cli}
Once the CLI has been opened, enter a SQL statement followed by a semicolon, then hit enter and it will be executed. Results will be displayed in a table in the terminal. If a semicolon is omitted, hitting enter will allow for multi-line SQL statements to be entered.
```sql
SELECT 'quack' AS my_column;
```
| my_column |
|-----------|
| quack |
The CLI supports all of DuckDB's rich [SQL syntax](#docs:stable:sql:introduction) including `SELECT`, `CREATE`, and `ALTER` statements.
##### Editor Features {#docs:stable:clients:cli:overview::editor-features}
The CLI supports [autocompletion](#docs:stable:clients:cli:autocomplete), and has sophisticated [editor features](#docs:stable:clients:cli:editing) and [syntax highlighting](#docs:stable:clients:cli:syntax_highlighting) on certain platforms.
##### Exiting the CLI {#docs:stable:clients:cli:overview::exiting-the-cli}
To exit the CLI, press `Ctrl`+`D` if your platform supports it. Otherwise, press `Ctrl`+`C` or use the `.exit` command. If you used a persistent database, DuckDB will automatically checkpoint (save the latest edits to disk) and close. This will remove the `.wal` file (the [write-ahead log](https://en.wikipedia.org/wiki/Write-ahead_logging)) and consolidate all of your data into the single-file database.
##### Dot Commands {#docs:stable:clients:cli:overview::dot-commands}
In addition to SQL syntax, special [dot commands](#docs:stable:clients:cli:dot_commands) may be entered into the CLI client. To use one of these commands, begin the line with a period (`.`) immediately followed by the name of the command you wish to execute. Additional arguments to the command are entered, space separated, after the command. If an argument must contain a space, either single or double quotes may be used to wrap that parameter. Dot commands must be entered on a single line, and no whitespace may occur before the period. No semicolon is required at the end of the line.
Frequently-used configurations can be stored in the file `~/.duckdbrc`, which will be loaded when starting the CLI client. See the [Configuring the CLI](#::configuring-the-cli) section below for further information on these options.
> **Tip.** To prevent the DuckDB CLI client from reading the `~/.duckdbrc` file, start it as follows:
> ```batch
> duckdb -init /dev/null
> ```
Below, we summarize a few important dot commands. To see all available commands, see the [dot commands page](#docs:stable:clients:cli:dot_commands) or use the `.help` command.
###### Opening Database Files {#docs:stable:clients:cli:overview::opening-database-files}
In addition to connecting to a database when opening the CLI, a new database connection can be made by using the `.open` command. If no additional parameters are supplied, a new in-memory database connection is created. This database will not be persisted when the CLI connection is closed.
```text
.open
```
The `.open` command optionally accepts several options, but the final parameter can be used to indicate a path to a persistent database (or where one should be created). The special string `:memory:` can also be used to open a temporary in-memory database.
```text
.open persistent.duckdb
```
> **Warning.** `.open` closes the current database.
> To keep the current database, while adding a new database, use the [`ATTACH` statement](#docs:stable:sql:statements:attach).
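For example, to add a second database to the current session without closing the one that is already open (the file name is only illustrative):
```sql
ATTACH 'other_database.duckdb' AS other_db;
```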
One important option accepted by `.open` is the `--readonly` flag. This disallows any editing of the database. To open in read only mode, the database must already exist. This also means that a new in-memory database can't be opened in read only mode since in-memory databases are created upon connection.
```text
.open --readonly preexisting.duckdb
```
###### Output Formats {#docs:stable:clients:cli:overview::output-formats}
The `.mode` [dot command](#docs:stable:clients:cli:dot_commands::mode) may be used to change the appearance of the tables returned in the terminal output.
These include the default `duckbox` mode, `csv` and `json` mode for ingestion by other tools, `markdown` and `latex` for documents, and `insert` mode for generating SQL statements.
###### Writing Results to a File {#docs:stable:clients:cli:overview::writing-results-to-a-file}
By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be modified using either the `.output` or `.once` commands.
For details, see the documentation for the [output dot command](#docs:stable:clients:cli:dot_commands::output-writing-results-to-a-file).
###### Reading SQL from a File {#docs:stable:clients:cli:overview::reading-sql-from-a-file}
The DuckDB CLI can read both SQL commands and dot commands from an external file instead of the terminal using the `.read` command. This allows for a number of commands to be run in sequence and allows command sequences to be saved and reused.
The `.read` command requires only one argument: the path to the file containing the SQL and/or commands to execute. After running the commands in the file, control will revert back to the terminal. Output from the execution of that file is governed by the same `.output` and `.once` commands that have been discussed previously. This allows the output to be displayed back to the terminal, as in the first example below, or out to another file, as in the second example.
In this example, the file `select_example.sql` is located in the same directory as duckdb.exe and contains the following SQL statement:
```sql
SELECT *
FROM generate_series(5);
```
To execute it from the CLI, the `.read` command is used.
```text
.read select_example.sql
```
The output below is returned to the terminal by default. The formatting of the table can be adjusted using the `.output` or `.once` commands.
```text
| generate_series |
|----------------:|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
```
Multiple commands, including both SQL and dot commands, can also be run in a single `.read` command. In this example, the file `write_markdown_to_file.sql` is located in the same directory as duckdb.exe and contains the following commands:
```sql
.mode markdown
.output series.md
SELECT *
FROM generate_series(5);
```
To execute it from the CLI, the `.read` command is used as before.
```text
.read write_markdown_to_file.sql
```
In this case, no output is returned to the terminal. Instead, the file `series.md` is created (or replaced if it already existed) with the markdown-formatted results shown here:
```text
| generate_series |
|----------------:|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
```
#### Configuring the CLI {#docs:stable:clients:cli:overview::configuring-the-cli}
Several dot commands can be used to configure the CLI.
On startup, the CLI reads and executes all commands in the file `~/.duckdbrc`, including dot commands and SQL statements.
This allows you to store the configuration state of the CLI.
You may also point to a different initialization file using the `-init` flag.
##### Setting a Custom Prompt {#docs:stable:clients:cli:overview::setting-a-custom-prompt}
As an example, a file in the same directory as the DuckDB CLI named `prompt.sql` will change the DuckDB prompt to be a duck head and run a SQL statement.
Note that the duck head is built with Unicode characters and does not work in all terminal environments (e.g., in Windows, unless running with WSL and using the Windows Terminal).
```text
.prompt '⚫◗ '
```
To invoke that file on initialization, use this command:
```batch
duckdb -init prompt.sql
```
This outputs:
```text
-- Loading resources from prompt.sql
v⟨version⟩ ⟨git_hash⟩
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
⚫◗
```
#### Non-Interactive Usage {#docs:stable:clients:cli:overview::non-interactive-usage}
To read/process a file and exit immediately, redirect the file contents in to `duckdb`:
```batch
duckdb < select_example.sql
```
To execute a command with SQL text passed in directly from the command line, call `duckdb` with two arguments: the database location (or `:memory:`), and a string with the SQL statement to execute.
```batch
duckdb :memory: "SELECT 42 AS the_answer"
```
#### Loading Extensions {#docs:stable:clients:cli:overview::loading-extensions}
To load extensions, use DuckDB's SQL `INSTALL` and `LOAD` commands as you would other SQL statements.
```sql
INSTALL fts;
LOAD fts;
```
For details, see the [Extension docs](#docs:stable:extensions:overview).
#### Reading from stdin and Writing to stdout {#docs:stable:clients:cli:overview::reading-from-stdin-and-writing-to-stdout}
When in a Unix environment, it can be useful to pipe data between multiple commands.
DuckDB is able to read data from stdin as well as write to stdout using the file locations of stdin (`/dev/stdin`) and stdout (`/dev/stdout`) within SQL commands, as pipes act very similarly to file handles.
This command will create an example CSV:
```sql
COPY (SELECT 42 AS woot UNION ALL SELECT 43 AS woot) TO 'test.csv' (HEADER);
```
First, read a file and pipe it to the `duckdb` CLI executable. As arguments to the DuckDB CLI, pass in the location of the database to open, in this case, an in-memory database, and a SQL command that utilizes `/dev/stdin` as a file location.
```batch
cat test.csv | duckdb -c "SELECT * FROM read_csv('/dev/stdin')"
```
| woot |
|-----:|
| 42 |
| 43 |
To write back to stdout, the copy command can be used with the `/dev/stdout` file location.
```batch
cat test.csv | \
duckdb -c "COPY (SELECT * FROM read_csv('/dev/stdin')) TO '/dev/stdout' WITH (FORMAT csv, HEADER)"
```
```csv
woot
42
43
```
#### Reading Environment Variables {#docs:stable:clients:cli:overview::reading-environment-variables}
The `getenv` function can read environment variables.
##### Examples {#docs:stable:clients:cli:overview::examples}
To retrieve the home directory's path from the `HOME` environment variable, use:
```sql
SELECT getenv('HOME') AS home;
```
| home |
|------------------|
| /Users/user_name |
The output of the `getenv` function can be used to set [configuration options](#docs:stable:configuration:overview). For example, to set the `NULL` order based on the environment variable `DEFAULT_NULL_ORDER`, use:
```sql
SET default_null_order = getenv('DEFAULT_NULL_ORDER');
```
##### Restrictions for Reading Environment Variables {#docs:stable:clients:cli:overview::restrictions-for-reading-environment-variables}
The `getenv` function can only be run when the [`enable_external_access`](#docs:stable:configuration:overview::configuration-reference) option is set to `true` (the default setting).
It is only available in the CLI client and is not supported in other DuckDB clients.
#### Prepared Statements {#docs:stable:clients:cli:overview::prepared-statements}
The DuckDB CLI supports executing [prepared statements](#docs:stable:sql:query_syntax:prepared_statements) in addition to regular `SELECT` statements.
To create and execute a prepared statement in the CLI client, use the `PREPARE` clause and the `EXECUTE` statement.
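As a minimal sketch using a positional parameter (the statement name `greet` is arbitrary):
```sql
PREPARE greet AS SELECT 'Hello, ' || $1 AS greeting;
EXECUTE greet('duck');
```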
#### Query Completion ETA {#docs:stable:clients:cli:overview::query-completion-eta}
DuckDB's CLI provides time-to-completion estimates for running queries and displays the total execution time upon completion.
When executing queries in the DuckDB CLI, the progress bar displays an estimated time remaining until completion. This estimate is based on statistical modeling ([Kalman filtering](https://en.wikipedia.org/wiki/Kalman_filter)), which delivers more accurate predictions than simple linear extrapolation.
##### How It Works {#docs:stable:clients:cli:overview::how-it-works}
DuckDB calculates the estimated time to completion through the following process:
1. Progress Monitoring: DuckDB's internal progress API reports the estimated completion percentage for the running query
2. Statistical Filtering: A Kalman filter smooths noisy progress measurements and accounts for execution variability
3. Continuous Refinement: The system continuously updates predicted completion time as new progress data becomes available, improving accuracy throughout execution
The Kalman filter adapts to changing execution conditions such as memory pressure, I/O bottlenecks, or network delays. This adaptive approach means estimated completion times may not always decrease linearly: estimates can increase when query execution becomes less predictable.
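To make the idea concrete, below is a deliberately simplified, self-contained sketch of a scalar Kalman filter that smooths noisy progress-rate observations and derives a remaining-time estimate from them. It is purely illustrative and not DuckDB's implementation; the structure, names, and noise constants are invented for the example.
```cpp
#include <cstdio>

// Illustrative only: a 1-D Kalman filter tracking the query's progress rate
// (fraction completed per second). The noise constants are arbitrary values
// chosen for this sketch, not DuckDB's tuning.
struct RateFilter {
	double rate = 0.0;               // estimated progress rate
	double variance = 1.0;           // uncertainty of the estimate
	double process_noise = 1e-4;     // how quickly the true rate may drift
	double measurement_noise = 1e-2; // how noisy each observation is

	void Observe(double observed_rate) {
		variance += process_noise;                                // predict: uncertainty grows
		double gain = variance / (variance + measurement_noise);  // Kalman gain
		rate += gain * (observed_rate - rate);                    // correct towards observation
		variance *= (1.0 - gain);                                 // uncertainty shrinks
	}
};

int main() {
	RateFilter filter;
	double progress = 0.0;
	// Pretend the progress API reported these noisy per-second increments.
	const double increments[] = {0.02, 0.05, 0.03, 0.01, 0.04, 0.03};
	for (double inc : increments) {
		progress += inc;
		filter.Observe(inc);
		double eta_seconds = filter.rate > 0 ? (1.0 - progress) / filter.rate : -1.0;
		std::printf("progress %5.1f%%, estimated remaining ~%.0f s\n",
		            progress * 100.0, eta_seconds);
	}
	return 0;
}
```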
##### Factors Affecting The Accuracy of Query Completion ETA {#docs:stable:clients:cli:overview::factors-affecting-the-accuracy-of-query-completion-eta}
Completion time estimates may be less reliable under these conditions:
System resource constraints:
* Memory pressure causing disk swapping
* High CPU load from competing processes
* Disk I/O bottlenecks
Query execution characteristics:
* Variable execution phases (initial setup versus main processing)
* Network-dependent operations with inconsistent latency
* Queries with unpredictable branching logic
* Operations on remote data sources
* External function calls
* Highly skewed data distributions
### Command Line Arguments {#docs:stable:clients:cli:arguments}
The table below summarizes DuckDB's command line options.
To list all command line options, use the command:
```batch
duckdb -help
```
For a list of dot commands available in the CLI shell, see the [Dot Commands page](#docs:stable:clients:cli:dot_commands).
| Argument | Description |
| ----------------- | ------------------------------------------------------------------------------------------------------------- |
| `-append` | Append the database to the end of the file |
| `-ascii` | Set [output mode](#docs:stable:clients:cli:output_formats) to `ascii` |
| `-bail` | Stop after hitting an error |
| `-batch` | Force batch I/O |
| `-box` | Set [output mode](#docs:stable:clients:cli:output_formats) to `box` |
| `-column` | Set [output mode](#docs:stable:clients:cli:output_formats) to `column` |
| `-cmd COMMAND` | Run `COMMAND` before reading `stdin` |
| `-c COMMAND` | Run `COMMAND` and exit |
| `-csv` | Set [output mode](#docs:stable:clients:cli:output_formats) to `csv` |
| `-echo` | Print commands before execution |
| `-f FILENAME` | Run the script in `FILENAME` and exit. Note that the `~/.duckdbrc` is read and executed first (if it exists) |
| `-init FILENAME` | Run the script in `FILENAME` upon startup (instead of `~/.duckdbrc`) |
| `-header` | Turn headers on |
| `-help` | Show this message |
| `-html` | Set [output mode](#docs:stable:clients:cli:output_formats) to HTML |
| `-interactive` | Force interactive I/O |
| `-json` | Set [output mode](#docs:stable:clients:cli:output_formats) to `json` |
| `-line` | Set [output mode](#docs:stable:clients:cli:output_formats) to `line` |
| `-list` | Set [output mode](#docs:stable:clients:cli:output_formats) to `list` |
| `-markdown` | Set [output mode](#docs:stable:clients:cli:output_formats) to `markdown` |
| `-newline SEP` | Set output row separator. Default: `\n` |
| `-nofollow` | Refuse to open symbolic links to database files |
| `-noheader` | Turn headers off |
| `-no-stdin` | Exit after processing options instead of reading stdin |
| `-nullvalue TEXT` | Set text string for `NULL` values. Default: `NULL` |
| `-quote` | Set [output mode](#docs:stable:clients:cli:output_formats) to `quote` |
| `-readonly` | Open the database read-only. This option also supports attaching to remote databases via HTTPS |
| `-s COMMAND` | Run `COMMAND` and exit |
| `-separator SEP` | Set output column separator to `SEP`. Default: `|` |
| `-table` | Set [output mode](#docs:stable:clients:cli:output_formats) to `table` |
| `-ui` | Loads and starts the [DuckDB UI](#docs:stable:core_extensions:ui). If the UI is not yet installed, it installs the `ui` extension |
| `-unsigned`        | Allow loading of [unsigned extensions](#docs:stable:extensions:overview::unsigned-extensions). This option is intended to be used for developing extensions. Consult the [Securing DuckDB page](#docs:stable:operations_manual:securing_duckdb:securing_extensions) for guidelines on how to set up DuckDB in a secure manner |
| `-version` | Show DuckDB version |
#### Passing a Sequence of Arguments {#docs:stable:clients:cli:arguments::passing-a-sequence-of-arguments}
Note that the CLI arguments are processed in order, similarly to the behavior of the SQLite CLI.
For example:
```batch
duckdb -csv -c 'SELECT 42 AS hello' -json -c 'SELECT 84 AS world'
```
Returns the following:
```text
hello
42
[{"world":84}]
```
### Dot Commands {#docs:stable:clients:cli:dot_commands}
Dot commands are available in the DuckDB CLI client. To use one of these commands, begin the line with a period (`.`) immediately followed by the name of the command you wish to execute. Additional arguments to the command are entered, space separated, after the command. If an argument must contain a space, either single or double quotes may be used to wrap that parameter. Dot commands must be entered on a single line, and no whitespace may occur before the period. No semicolon is required at the end of the line. To see available commands, use the `.help` command.
#### List of Dot Commands {#docs:stable:clients:cli:dot_commands::list-of-dot-commands}
| Command | Description |
|---------|-------------|
| `.bail ⟨on/off⟩`{:.language-sql .highlight} | Stop after hitting an error. Default: `off` |
| `.binary ⟨on/off⟩`{:.language-sql .highlight} | Turn binary output `on` or `off`. Default: `off` |
| `.cd ⟨DIRECTORY⟩`{:.language-sql .highlight} | Change the working directory to `DIRECTORY` |
| `.changes ⟨on/off⟩`{:.language-sql .highlight} | Show number of rows changed by SQL |
| `.columns`{:.language-sql .highlight} | Column-wise rendering of query results |
| `.constant ⟨COLOR⟩`{:.language-sql .highlight} | Sets the syntax highlighting color used for constant values |
| `.constantcode ⟨CODE⟩`{:.language-sql .highlight} | Sets the syntax highlighting terminal code used for constant values |
| `.databases`{:.language-sql .highlight} | List names and files of attached databases |
| `.echo ⟨on/off⟩`{:.language-sql .highlight} | Turn command echo `on` or `off` |
| `.exit ⟨CODE⟩`{:.language-sql .highlight} | Exit this program with return-code `CODE` |
| `.headers ⟨on/off⟩`{:.language-sql .highlight} | Turn display of headers `on` or `off`. Does not apply to duckbox mode |
| `.help ⟨-all⟩ ⟨PATTERN⟩`{:.language-sql .highlight} | Show help text for `PATTERN` |
| `.highlight ⟨on/off⟩`{:.language-sql .highlight} | Toggle syntax highlighting in the shell `on` / `off`. See the [query syntax highlighting section](#::configuring-the-query-syntax-highlighter) for more details |
| `.highlight_colors ⟨COMPONENT⟩ ⟨COLOR⟩`{:.language-sql .highlight} | Configure the color of each component (duckbox only). See the [result syntax highlighting section](#::configuring-the-result-syntax-highlighter) for more details |
| `.highlight_results ⟨on/off⟩`{:.language-sql .highlight} | Toggle highlighting in result tables `on` / `off` (duckbox only). See the [result syntax highlighting section](#::configuring-the-result-syntax-highlighter) for more details |
| `.import ⟨FILE⟩ ⟨TABLE⟩`{:.language-sql .highlight} | Import data from `FILE` into `TABLE` |
| `.indexes ⟨TABLE⟩`{:.language-sql .highlight} | Show names of indexes |
| `.keyword ⟨COLOR⟩`{:.language-sql .highlight} | Sets the syntax highlighting color used for keywords |
| `.keywordcode ⟨CODE⟩`{:.language-sql .highlight} | Sets the syntax highlighting terminal code used for keywords |
| `.large_number_rendering ⟨all/footer/off⟩`{:.language-sql .highlight} | Toggle readable rendering of large numbers (duckbox only, default: `footer`) |
| `.log ⟨FILE/off⟩`{:.language-sql .highlight} | Turn logging `on` or `off`. `FILE` can be `stderr` / `stdout` |
| `.maxrows ⟨COUNT⟩`{:.language-sql .highlight} | Sets the maximum number of rows for display. Only for [duckbox mode](#docs:stable:clients:cli:output_formats) |
| `.maxwidth ⟨COUNT⟩`{:.language-sql .highlight} | Sets the maximum width in characters. 0 defaults to terminal width. Only for [duckbox mode](#docs:stable:clients:cli:output_formats) |
| `.mode ⟨MODE⟩ ⟨TABLE⟩`{:.language-sql .highlight} | Set [output mode](#docs:stable:clients:cli:output_formats) |
| `.multiline`{:.language-sql .highlight} | Set multi-line mode (default) |
| `.nullvalue ⟨STRING⟩`{:.language-sql .highlight} | Use `STRING` in place of `NULL` values. Default: `NULL` |
| `.once ⟨OPTIONS⟩ ⟨FILE⟩`{:.language-sql .highlight} | Output for the next SQL command only to `FILE` |
| `.open ⟨OPTIONS⟩ ⟨FILE⟩`{:.language-sql .highlight} | Close existing database and reopen `FILE` |
| `.output ⟨FILE⟩`{:.language-sql .highlight} | Send output to `FILE` or `stdout` if `FILE` is omitted |
| `.print ⟨STRING...⟩`{:.language-sql .highlight} | Print literal `STRING` |
| `.prompt ⟨MAIN⟩ ⟨CONTINUE⟩`{:.language-sql .highlight} | Replace the standard prompts |
| `.quit`{:.language-sql .highlight} | Exit this program |
| `.read ⟨FILE⟩`{:.language-sql .highlight} | Read input from `FILE` |
| `.rows`{:.language-sql .highlight} | Row-wise rendering of query results (default) |
| `.safe_mode`{:.language-sql .highlight} | Activates [safe mode](#docs:stable:clients:cli:safe_mode) |
| `.schema ⟨PATTERN⟩`{:.language-sql .highlight} | Show the `CREATE` statements matching `PATTERN` |
| `.separator ⟨COL⟩ ⟨ROW⟩`{:.language-sql .highlight} | Change the column and row separators |
| `.shell ⟨CMD⟩ ⟨ARGS...⟩`{:.language-sql .highlight} | Run `CMD` with `ARGS...` in a system shell |
| `.show`{:.language-sql .highlight} | Show the current values for various settings |
| `.singleline`{:.language-sql .highlight} | Set single-line mode |
| `.system ⟨CMD⟩ ⟨ARGS...⟩`{:.language-sql .highlight} | Run `CMD` with `ARGS...` in a system shell |
| `.tables ⟨TABLE⟩`{:.language-sql .highlight} | List names of tables [matching `LIKE` pattern](#docs:stable:sql:functions:pattern_matching) `TABLE` |
| `.timer ⟨on/off⟩`{:.language-sql .highlight} | Turn SQL timer `on` or `off`. SQL statements separated by `;` but _not_ separated via newline are measured together |
| `.width ⟨NUM1⟩ ⟨NUM2⟩ ...`{:.language-sql .highlight} | Set minimum column widths for columnar output |
#### Using the `.help` Command {#docs:stable:clients:cli:dot_commands::using-the-help-command}
The `.help` text may be filtered by passing in a text string as the first argument.
```sql
.help m
```
```sql
.maxrows COUNT Sets the maximum number of rows for display (default: 40). Only for duckbox mode.
.maxwidth COUNT Sets the maximum width in characters. 0 defaults to terminal width. Only for duckbox mode.
.mode MODE ?TABLE? Set output mode
```
#### `.output`: Writing Results to a File {#docs:stable:clients:cli:dot_commands::output-writing-results-to-a-file}
By default, the DuckDB CLI sends results to the terminal's standard output. However, this can be modified using either the `.output` or `.once` commands. Pass in the desired output file location as a parameter. The `.once` command will only output the next set of results and then revert to standard out, but `.output` will redirect all subsequent output to that file location. Note that each result will overwrite the entire file at that destination. To revert back to standard output, enter `.output` with no file parameter.
In this example, the output format is changed to `markdown`, the destination is identified as a Markdown file, and then DuckDB will write the output of the SQL statement to that file. Output is then reverted to standard output using `.output` with no parameter.
```sql
.mode markdown
.output my_results.md
SELECT 'taking flight' AS output_column;
.output
SELECT 'back to the terminal' AS displayed_column;
```
The file `my_results.md` will then contain:
```text
| output_column |
|---------------|
| taking flight |
```
The terminal will then display:
```text
| displayed_column |
|----------------------|
| back to the terminal |
```
A common output format is CSV, or comma separated values. DuckDB supports [SQL syntax to export data as CSV or Parquet](#docs:stable:sql:statements:copy::copy-to), but the CLI-specific commands may be used to write a CSV instead if desired.
```sql
.mode csv
.once my_output_file.csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;
```
The file `my_output_file.csv` will then contain:
```csv
col_1,col_2
1,2
10,20
```
By passing special options (flags) to the `.once` command, query results can also be sent to a temporary file and automatically opened in the user's default program. Use either the `-e` flag for a text file (opened in the default text editor), or the `-x` flag for a CSV file (opened in the default spreadsheet editor). This is useful for more detailed inspection of query results, especially if there is a relatively large result set. The `.excel` command is equivalent to `.once -x`.
```sql
.once -e
SELECT 'quack' AS hello;
```
The results then open in the default text file editor of the system, for example:

> **Tip.** macOS users can copy the results to their clipboards using [`pbcopy`](https://ss64.com/mac/pbcopy.html) by using `.once` to output to `pbcopy` via a pipe: `.once |pbcopy`
>
> Combining this with the `.headers off` and `.mode lines` options can be particularly effective.
#### Querying the Database Schema {#docs:stable:clients:cli:dot_commands::querying-the-database-schema}
All DuckDB clients support [querying the database schema with SQL](#docs:stable:sql:meta:information_schema), but the CLI has additional [dot commands](#docs:stable:clients:cli:dot_commands) that can make it easier to understand the contents of a database.
The `.tables` command will return a list of tables in the database. It has an optional argument that will filter the results according to a [`LIKE` pattern](#docs:stable:sql:functions:pattern_matching::like).
```sql
CREATE TABLE swimmers AS SELECT 'duck' AS animal;
CREATE TABLE fliers AS SELECT 'duck' AS animal;
CREATE TABLE walkers AS SELECT 'duck' AS animal;
.tables
```
```text
fliers swimmers walkers
```
For example, to filter to only tables that contain an `l`, use the `LIKE` pattern `%l%`.
```sql
.tables %l%
```
```text
fliers walkers
```
The `.schema` command will show all of the SQL statements used to define the schema of the database.
```text
.schema
```
```sql
CREATE TABLE fliers (animal VARCHAR);
CREATE TABLE swimmers (animal VARCHAR);
CREATE TABLE walkers (animal VARCHAR);
```
#### Syntax Highlighters {#docs:stable:clients:cli:dot_commands::syntax-highlighters}
The DuckDB CLI client has a syntax highlighter for the SQL queries and another for the duckbox-formatted result tables.
#### Configuring the Query Syntax Highlighter {#docs:stable:clients:cli:dot_commands::configuring-the-query-syntax-highlighter}
By default the shell includes support for syntax highlighting.
The CLI's syntax highlighter can be configured using the following commands.
To turn off the highlighter:
```text
.highlight off
```
To turn on the highlighter:
```text
.highlight on
```
To configure the color used to highlight constants:
```text
.constant [red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta|brightcyan|brightwhite]
```
```text
.constantcode ⟨terminal_code⟩
```
For example:
```text
.constantcode \033[31m
```
To configure the color used to highlight keywords:
```text
.keyword [red|green|yellow|blue|magenta|cyan|white|brightblack|brightred|brightgreen|brightyellow|brightblue|brightmagenta|brightcyan|brightwhite]
```
```text
.keywordcode ⟨terminal_code⟩
```
For example:
```text
.keywordcode \033[31m
```
#### Configuring the Result Syntax Highlighter {#docs:stable:clients:cli:dot_commands::configuring-the-result-syntax-highlighter}
By default, the result highlighting makes a few small modifications:
- Column names are rendered in bold
- `NULL` values are grayed out
- Layout elements are grayed out
The highlighting of each of the components can be customized using the `.highlight_colors` command.
For example:
```sql
.highlight_colors layout red
.highlight_colors column_type yellow
.highlight_colors column_name yellow bold_underline
.highlight_colors numeric_value cyan underline
.highlight_colors temporal_value red bold
.highlight_colors string_value green bold
.highlight_colors footer gray
```
The result highlighting can be disabled using `.highlight_results off`.
#### Shorthands {#docs:stable:clients:cli:dot_commands::shorthands}
DuckDB's CLI allows using shorthands for dot commands.
Once a sequence of characters can be unambiguously completed to a dot command or an argument, the CLI (silently) autocompletes it.
For example:
```text
.mo ma
```
Is equivalent to:
```text
.mode markdown
```
> **Tip.** Avoid using shorthands in SQL scripts to improve readability and to ensure that the scripts are future-proof.
#### Importing Data from CSV {#docs:stable:clients:cli:dot_commands::importing-data-from-csv}
> **Deprecated.** This feature is only included for compatibility reasons and may be removed in the future.
> Use the [`read_csv` function or the `COPY` statement](#docs:stable:data:csv:overview) to load CSV files.
DuckDB supports [SQL syntax to directly query or import CSV files](#docs:stable:data:csv:overview), but the CLI-specific commands may be used to import a CSV instead if desired. The `.import` command takes two arguments and also supports several options. The first argument is the path to the CSV file, and the second is the name of the DuckDB table to create. Since DuckDB requires stricter typing than SQLite (upon which the DuckDB CLI is based), the destination table must be created before using the `.import` command. To automatically detect the schema and create a table from a CSV, see the [`read_csv` examples in the import docs](#docs:stable:data:csv:overview).
In this example, a CSV file is generated by changing to CSV mode and setting an output file location:
```sql
.mode csv
.output import_example.csv
SELECT 1 AS col_1, 2 AS col_2 UNION ALL SELECT 10 AS col_1, 20 AS col_2;
```
Now that the CSV has been written, a table can be created with the desired schema and the CSV can be imported. The output is reset to the terminal to avoid continuing to edit the output file specified above. The `--skip N` option is used to ignore the first row of data since it is a header row and the table has already been created with the correct column names.
```text
.mode csv
.output
CREATE TABLE test_table (col_1 INTEGER, col_2 INTEGER);
.import import_example.csv test_table --skip 1
```
Note that the `.import` command utilizes the current `.mode` and `.separator` settings when identifying the structure of the data to import. The `--csv` option can be used to override that behavior.
```text
.import import_example.csv test_table --skip 1 --csv
```
### Output Formats {#docs:stable:clients:cli:output_formats}
The `.mode` [dot command](#docs:stable:clients:cli:dot_commands) may be used to change the appearance of the tables returned in the terminal output. In addition to customizing the appearance, these modes have additional benefits. This can be useful for presenting DuckDB output elsewhere by redirecting the terminal [output to a file](#docs:stable:clients:cli:dot_commands::output-writing-results-to-a-file). Using the `insert` mode will build a series of SQL statements that can be used to insert the data at a later point.
The `markdown` mode is particularly useful for building documentation and the `latex` mode is useful for writing academic papers.
#### List of Output Formats {#docs:stable:clients:cli:output_formats::list-of-output-formats}
| Mode | Description |
| ------------------------------------------- | -------------------------------------------------------------- |
| `ascii` | Columns/rows delimited by 0x1F and 0x1E |
| `box` | Tables using unicode box-drawing characters |
| `csv` | Comma-separated values |
| `column` | Output in columns (See `.width`) |
| `duckbox` | Tables with extensive features (default) |
| `html` | HTML `<table>` code |
| `insert ⟨TABLE⟩`{:.language-sql .highlight} | SQL insert statements for `⟨TABLE⟩`{:.language-sql .highlight} |
| `json` | Results in a JSON array |
| `jsonlines` | Results in NDJSON (newline-delimited JSON) |
| `latex` | LaTeX tabular environment code |
| `line` | One value per line |
| `list` | Values delimited by `|` |
| `markdown` | Markdown table format |
| `quote` | Escape answers as for SQL |
| `table` | ASCII-art table |
| `tabs` | Tab-separated values |
| `tcl` | TCL list elements |
| `trash` | No output |
#### Changing the Output Format {#docs:stable:clients:cli:output_formats::changing-the-output-format}
Use the `.mode` dot command without an argument to query the output mode currently in use.
```sql
.mode
```
```text
current output mode: duckbox
```
Use the `.mode` dot command with an argument to set the output format.
```sql
.mode markdown
SELECT 'quacking intensifies' AS incoming_ducks;
```
```text
| incoming_ducks |
|----------------------|
| quacking intensifies |
```
The output appearance can also be adjusted with the `.separator` command. If using an export mode that relies on a separator (`csv` or `tabs`, for example), the separator will be reset when the mode is changed. For example, `.mode csv` will set the separator to a comma (`,`). Using `.separator "|"` will then convert the output to be pipe-separated.
```sql
.mode csv
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;
```
```csv
col_1,col_2
1,2
10,20
```
```sql
.separator "|"
SELECT 1 AS col_1, 2 AS col_2
UNION ALL
SELECT 10 AS col_1, 20 AS col_2;
```
```csv
col_1|col_2
1|2
10|20
```
#### `duckbox` Mode {#docs:stable:clients:cli:output_formats::duckbox-mode}
By default, DuckDB renders query results in `duckbox` mode, which is a feature-rich ASCII-art style output format.
The duckbox mode supports the `large_number_rendering` option, which allows human-readable rendering of large numbers. It has three levels:
- `off` – All numbers are printed using regular formatting.
- `footer` (default) – Large numbers are augmented with the human-readable format. Only applies to single-row results.
- `all` – All large numbers are replaced with the human-readable format.
See the following examples:
```sql
.large_number_rendering off
SELECT pi() * 1_000_000_000 AS x;
```
```text
┌────────────────────┐
│         x          │
│       double       │
├────────────────────┤
│  3141592653.589793 │
└────────────────────┘
```
```sql
.large_number_rendering footer
SELECT pi() * 1_000_000_000 AS x;
```
```text
┌────────────────────┐
│         x          │
│       double       │
├────────────────────┤
│  3141592653.589793 │
│     (3.14 billion) │
└────────────────────┘
```
```sql
.large_number_rendering all
SELECT pi() * 1_000_000_000 AS x;
```
```text
┌──────────────┐
│      x       │
│    double    │
├──────────────┤
│ 3.14 billion │
└──────────────┘
```
### Editing {#docs:stable:clients:cli:editing}
> The linenoise-based CLI editor is currently only available for macOS and Linux.
DuckDB's CLI uses a line-editing library based on [linenoise](https://github.com/antirez/linenoise), which has shortcuts that are based on [Emacs mode of readline](https://readline.kablamo.org/emacs.html). Below is a list of available commands.
#### Moving {#docs:stable:clients:cli:editing::moving}
| Key | Action |
|-----------------|------------------------------------------------------------------------|
| `Left` | Move back a character |
| `Right` | Move forward a character |
| `Up` | Move up a line. When on the first line, move to previous history entry |
| `Down` | Move down a line. When on last line, move to next history entry |
| `Home` | Move to beginning of buffer |
| `End` | Move to end of buffer |
| `Ctrl`+`Left` | Move back a word |
| `Ctrl`+`Right` | Move forward a word |
| `Ctrl`+`A` | Move to beginning of buffer |
| `Ctrl`+`B` | Move back a character |
| `Ctrl`+`E` | Move to end of buffer |
| `Ctrl`+`F` | Move forward a character |
| `Alt`+`Left` | Move back a word |
| `Alt`+`Right` | Move forward a word |
#### History {#docs:stable:clients:cli:editing::history}
| Key | Action |
|------------|--------------------------------|
| `Ctrl`+`P` | Move to previous history entry |
| `Ctrl`+`N` | Move to next history entry |
| `Ctrl`+`R` | Search the history |
| `Ctrl`+`S` | Search the history |
| `Alt`+`<` | Move to first history entry |
| `Alt`+`>` | Move to last history entry |
| `Alt`+`N` | Search the history |
| `Alt`+`P` | Search the history |
#### Changing Text {#docs:stable:clients:cli:editing::changing-text}
| Key | Action |
|-------------------|----------------------------------------------------------|
| `Backspace` | Delete previous character |
| `Delete` | Delete next character |
| `Ctrl`+`D` | Delete next character. When buffer is empty, end editing |
| `Ctrl`+`H` | Delete previous character |
| `Ctrl`+`K` | Delete everything after the cursor |
| `Ctrl`+`T` | Swap current and next character |
| `Ctrl`+`U` | Delete all text |
| `Ctrl`+`W` | Delete previous word |
| `Alt`+`C` | Convert next word to titlecase |
| `Alt`+`D` | Delete next word |
| `Alt`+`L` | Convert next word to lowercase |
| `Alt`+`R` | Delete all text |
| `Alt`+`T` | Swap current and next word |
| `Alt`+`U` | Convert next word to uppercase |
| `Alt`+`Backspace` | Delete previous word |
| `Alt`+`\` | Delete spaces around cursor |
#### Completing {#docs:stable:clients:cli:editing::completing}
| Key | Action |
|---------------|--------------------------------------------------------|
| `Tab` | Autocomplete. When autocompleting, cycle to next entry |
| `Shift`+`Tab` | When autocompleting, cycle to previous entry |
| `Esc`+`Esc` | When autocompleting, revert autocompletion |
#### Miscellaneous {#docs:stable:clients:cli:editing::miscellaneous}
| Key | Action |
|------------|------------------------------------------------------------------------------------|
| `Enter` | Execute query. If query is not complete, insert a newline at the end of the buffer |
| `Ctrl`+`J` | Execute query. If query is not complete, insert a newline at the end of the buffer |
| `Ctrl`+`C` | Cancel editing of current query |
| `Ctrl`+`G` | Cancel editing of current query |
| `Ctrl`+`L` | Clear screen |
| `Ctrl`+`O` | Cancel editing of current query |
| `Ctrl`+`X` | Insert a newline after the cursor |
| `Ctrl`+`Z` | Suspend CLI and return to shell, use `fg` to re-open |
#### External Editor Mode {#docs:stable:clients:cli:editing::external-editor-mode}
Use `.edit` or `\e` to open a query in an external text editor.
* When entered alone, it opens the previous command for editing.
* When used inside a multi-line command, it opens the current command in the editor.
The editor is taken from the first set environment variable among `DUCKDB_EDITOR`, `EDITOR` or `VISUAL` (in that order). If none are set, `vi` is used.
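For example, to use a particular editor for a single session (assuming a POSIX shell and that `nano` is installed), set the environment variable when launching the CLI:
```batch
DUCKDB_EDITOR=nano duckdb
```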
> This feature is only available in the linenoise-based CLI editor, which is currently supported on macOS and Linux.
#### Using Read-Line {#docs:stable:clients:cli:editing::using-read-line}
If you prefer, you can use [`rlwrap`](https://github.com/hanslub42/rlwrap) to use read-line directly with the shell. Then, use `Shift`+`Enter` to insert a newline and `Enter` to execute the query:
```batch
rlwrap --substitute-prompt="D " duckdb -batch
```
### Safe Mode {#docs:stable:clients:cli:safe_mode}
The DuckDB CLI client supports "safe mode".
In safe mode, the CLI is prevented from accessing external files other than the database file that it was initially connected to and prevented from interacting with the host file system.
This has the following effects:
* The following [dot commands](#docs:stable:clients:cli:dot_commands) are disabled:
* `.cd`
* `.excel`
* `.import`
* `.log`
* `.once`
* `.open`
* `.output`
* `.read`
* `.sh`
* `.system`
* Auto-complete no longer scans the file system for files to suggest as auto-complete targets.
* The [`getenv` function](#docs:stable:clients:cli:overview::reading-environment-variables) is disabled.
* The [`enable_external_access` option](#docs:stable:configuration:overview::configuration-reference) is set to `false`. This implies that:
* `ATTACH` cannot attach to a database in a file.
* `COPY` cannot read to or write from files.
* Functions such as `read_csv`, `read_parquet`, `read_json`, etc. cannot read from an external source.
Once safe mode is activated, it cannot be deactivated in the same DuckDB CLI session.
For more information on running DuckDB in secure environments, see the ["Securing DuckDB" page](#docs:stable:operations_manual:securing_duckdb:overview).
### Autocomplete {#docs:stable:clients:cli:autocomplete}
The shell offers context-aware autocompletion of SQL queries through the [`autocomplete` extension](#docs:stable:core_extensions:autocomplete). Autocompletion is triggered by pressing `Tab`.
Multiple autocomplete suggestions can be present. You can cycle forwards through the suggestions by repeatedly pressing `Tab`, or backwards using `Shift`+`Tab`. Autocompletion can be reverted by pressing `ESC` twice.
The shell autocompletes four different groups:
* Keywords
* Table names and table functions
* Column names and scalar functions
* File names
The shell looks at the position in the SQL statement to determine which of these autocompletions to trigger. For example:
```sql
SELECT s
```
```text
student_id
```
```sql
SELECT student_id F
```
```text
FROM
```
```sql
SELECT student_id FROM g
```
```text
grades
```
```sql
SELECT student_id FROM 'd
```
```text
'data/
```
```sql
SELECT student_id FROM 'data/
```
```text
'data/grades.csv
```
### Syntax Highlighting {#docs:stable:clients:cli:syntax_highlighting}
> Syntax highlighting in the CLI is currently only available for macOS and Linux.
SQL queries that are written in the shell are automatically highlighted using syntax highlighting.

There are several components of a query that are highlighted in different colors. The colors can be configured using [dot commands](#docs:stable:clients:cli:dot_commands).
Syntax highlighting can also be disabled entirely using the `.highlight off` command.
Below is a list of components that can be configured.
| Type | Command | Default color |
|-------------------------|-------------|-----------------|
| Keywords | `.keyword` | `green` |
| Constants and literals  | `.constant` | `yellow`        |
| Comments | `.comment` | `brightblack` |
| Errors | `.error` | `red` |
| Continuation | `.cont` | `brightblack` |
| Continuation (Selected) | `.cont_sel` | `green` |
The components can be configured using either a supported color name (e.g., `.keyword red`), or by directly providing a terminal code to use for rendering (e.g., `.keywordcode \033[31m`). Below is a list of supported color names and their corresponding terminal codes.
| Color | Terminal code |
|---------------|---------------|
| red | `\033[31m` |
| green | `\033[32m` |
| yellow | `\033[33m` |
| blue | `\033[34m` |
| magenta | `\033[35m` |
| cyan | `\033[36m` |
| white | `\033[37m` |
| brightblack | `\033[90m` |
| brightred | `\033[91m` |
| brightgreen | `\033[92m` |
| brightyellow | `\033[93m` |
| brightblue | `\033[94m` |
| brightmagenta | `\033[95m` |
| brightcyan | `\033[96m` |
| brightwhite | `\033[97m` |
For example, here is an alternative set of syntax highlighting colors:
```text
.keyword brightred
.constant brightwhite
.comment cyan
.error yellow
.cont blue
.cont_sel brightblue
```
If you wish to start up the CLI with a different set of colors every time, you can place these commands in the `~/.duckdbrc` file that is loaded on start-up of the CLI.
#### Error Highlighting {#docs:stable:clients:cli:syntax_highlighting::error-highlighting}
The shell has support for highlighting certain errors. In particular, mismatched brackets and unclosed quotes are highlighted in red (or another color if specified). This highlighting is automatically disabled for large queries. In addition, it can be disabled manually using the `.render_errors off` command.
### Known Issues {#docs:stable:clients:cli:known_issues}
#### Incorrect Memory Values on Old Linux Distributions and WSL 2 {#docs:stable:clients:cli:known_issues::incorrect-memory-values-on-old-linux-distributions-and-wsl-2}
On Windows Subsystem for Linux 2 (WSL2), when querying the `max_memory` or `memory_limit` from the `duckdb_settings`, the values may be inaccurate on certain Ubuntu versions (e.g., 20.04 and 24.04). The issue also occurs on older distributions such as Red Hat Enterprise Linux 8 (RHEL 8):
Example:
```sql
FROM duckdb_settings() WHERE name LIKE '%mem%';
```
The output contains values larger than 1000 PiB:
```text
┌───────────────┬────────────┬──────────────────────────────────────────────┬────────────┬─────────┐
│     name      │   value    │                 description                  │ input_type │  scope  │
│    varchar    │  varchar   │                   varchar                    │  varchar   │ varchar │
├───────────────┼────────────┼──────────────────────────────────────────────┼────────────┼─────────┤
│ max_memory    │ 1638.3 PiB │ The maximum memory of the system (e.g. 1GB)  │ VARCHAR    │ GLOBAL  │
│ memory_limit  │ 1638.3 PiB │ The maximum memory of the system (e.g. 1GB)  │ VARCHAR    │ GLOBAL  │
└───────────────┴────────────┴──────────────────────────────────────────────┴────────────┴─────────┘
```
## Dart Client {#docs:stable:clients:dart}
> The latest stable version of the DuckDB Dart client is {{ site.current_duckdb_dart_version }}.
DuckDB.Dart is the native Dart API for [DuckDB](https://duckdb.org/).
#### Installation {#docs:stable:clients:dart::installation}
DuckDB.Dart can be installed from [pub.dev](https://pub.dev/packages/dart_duckdb). Please see the [API Reference](https://pub.dev/documentation/dart_duckdb/latest/) for details.
##### Use This Package as a Library {#docs:stable:clients:dart::use-this-package-as-a-library}
###### Depend on It {#docs:stable:clients:dart::depend-on-it}
Add the dependency with Flutter:
```batch
flutter pub add dart_duckdb
```
This will add a line like this to your package's `pubspec.yaml` (and run an implicit `flutter pub get`):
```yaml
dependencies:
  dart_duckdb: ^1.1.3
```
Alternatively, your editor might support `flutter pub get`. Check the docs for your editor to learn more.
###### Import It {#docs:stable:clients:dart::import-it}
Now in your Dart code, you can import it:
```dart
import 'package:dart_duckdb/dart_duckdb.dart';
```
#### Usage Examples {#docs:stable:clients:dart::usage-examples}
See the example projects in the [`duckdb-dart` repository](https://github.com/TigerEyeLabs/duckdb-dart/):
* [`cli`](https://github.com/TigerEyeLabs/duckdb-dart/tree/main/examples/cli): command-line application
* [`duckdbexplorer`](https://github.com/TigerEyeLabs/duckdb-dart/tree/main/examples/duckdbexplorer): GUI application which builds for desktop operating systems as well as Android and iOS.
Here are some common code snippets for DuckDB.Dart:
##### Querying an In-Memory Database {#docs:stable:clients:dart::querying-an-in-memory-database}
```dart
import 'package:dart_duckdb/dart_duckdb.dart';
void main() {
  final db = duckdb.open(":memory:");
  final connection = duckdb.connect(db);

  connection.execute('''
    CREATE TABLE users (id INTEGER, name VARCHAR, age INTEGER);
    INSERT INTO users VALUES (1, 'Alice', 30), (2, 'Bob', 25);
  ''');

  final result = connection.query("SELECT * FROM users WHERE age > 28").fetchAll();
  for (final row in result) {
    print(row);
  }

  connection.dispose();
  db.dispose();
}
```
##### Queries on Background Isolates {#docs:stable:clients:dart::queries-on-background-isolates}
```dart
import 'dart:isolate';

import 'package:dart_duckdb/dart_duckdb.dart';

Future<void> main() async {
  final db = duckdb.open(":memory:");
  final connection = duckdb.connect(db);

  await Isolate.spawn(backgroundTask, db.transferrable);

  connection.dispose();
  db.dispose();
}

void backgroundTask(TransferableDatabase transferableDb) {
  final connection = duckdb.connectWithTransferred(transferableDb);
  // Access database ...
  // fetch is needed to send the data back to the main isolate
}
```
## Go Client {#docs:stable:clients:go}
> The DuckDB Go client's project recently moved from [`github.com/marcboeker/go-duckdb`](https://github.com/marcboeker/go-duckdb) to [`github.com/duckdb/duckdb-go`](https://github.com/duckdb/duckdb-go) starting with `v2.5.0`. Please follow the [migration guide](https://github.com/duckdb/duckdb-go#migration-from-marcboekergo-duckdb) to update to the new repository.
The DuckDB Go client, `duckdb-go`, allows using DuckDB via the `database/sql` interface.
For examples on how to use this interface, see the [official documentation](https://pkg.go.dev/database/sql) and [tutorial](https://go.dev/doc/tutorial/database-access).
#### Installation {#docs:stable:clients:go::installation}
To install the `duckdb-go` client, run:
```batch
go get github.com/duckdb/duckdb-go/v2
```
#### Importing {#docs:stable:clients:go::importing}
To import the DuckDB Go package, add the following entries to your imports:
```go
import (
"database/sql"
_ "github.com/duckdb/duckdb-go/v2"
)
```
#### Appender {#docs:stable:clients:go::appender}
The DuckDB Go client supports the [DuckDB Appender API](#docs:stable:data:appender) for bulk inserts. You can obtain a new Appender by supplying a DuckDB connection to `NewAppenderFromConn()`. For example:
```go
connector, err := duckdb.NewConnector("test.db", nil)
if err != nil {
	...
}
conn, err := connector.Connect(context.Background())
if err != nil {
	...
}
defer conn.Close()

// Retrieve appender from connection (note that you have to create the table 'test' beforehand).
appender, err := duckdb.NewAppenderFromConn(conn, "", "test")
if err != nil {
	...
}
defer appender.Close()

err = appender.AppendRow(...)
if err != nil {
	...
}

// Optional, if you want to access the appended rows immediately.
err = appender.Flush()
if err != nil {
	...
}
```
#### Examples {#docs:stable:clients:go::examples}
##### Simple Example {#docs:stable:clients:go::simple-example}
An example for using the Go API is as follows:
```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
	"log"

	_ "github.com/duckdb/duckdb-go/v2"
)

func main() {
	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	_, err = db.Exec(`CREATE TABLE people (id INTEGER, name VARCHAR)`)
	if err != nil {
		log.Fatal(err)
	}
	_, err = db.Exec(`INSERT INTO people VALUES (42, 'John')`)
	if err != nil {
		log.Fatal(err)
	}

	var (
		id   int
		name string
	)
	row := db.QueryRow(`SELECT id, name FROM people`)
	err = row.Scan(&id, &name)
	if errors.Is(err, sql.ErrNoRows) {
		log.Println("no rows")
	} else if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("id: %d, name: %s\n", id, name)
}
```
##### More Examples {#docs:stable:clients:go::more-examples}
For more examples, see the [examples in the `duckdb-go` repository](https://github.com/duckdb/duckdb-go/tree/main/examples).
#### Acknowledgements {#docs:stable:clients:go::acknowledgements}
We would like to thank [Marc Boeker](https://github.com/marcboeker) for the initial implementation of the DuckDB Go client.
## Java JDBC Client {#docs:stable:clients:java}
> The latest stable version of the DuckDB Java (JDBC) client is {{ site.current_duckdb_java_short_version }}.
#### Installation {#docs:stable:clients:java::installation}
The DuckDB Java JDBC API can be installed from [Maven Central](https://search.maven.org/artifact/org.duckdb/duckdb_jdbc). Please see the [installation page](https://duckdb.org/install) for details.
#### Basic API Usage {#docs:stable:clients:java::basic-api-usage}
DuckDB's JDBC API implements the main parts of the standard Java Database Connectivity (JDBC) API, version 4.1. Describing JDBC is beyond the scope of this page, see the [official documentation](https://docs.oracle.com/javase/tutorial/jdbc/basics/index.html) for details. Below we focus on the DuckDB-specific parts.
Refer to the externally hosted [API Reference](https://javadoc.io/doc/org.duckdb/duckdb_jdbc) for more information about our extensions to the JDBC specification, or the below [Arrow Methods](#::arrow-methods).
##### Startup & Shutdown {#docs:stable:clients:java::startup--shutdown}
In JDBC, database connections are created through the standard `java.sql.DriverManager` class.
The driver should auto-register in the `DriverManager`. If that does not work for some reason, you can enforce registration using the following statement:
```java
Class.forName("org.duckdb.DuckDBDriver");
```
To create a DuckDB connection, call `DriverManager` with the `jdbc:duckdb:` JDBC URL prefix, like so:
```java
import java.sql.Connection;
import java.sql.DriverManager;
Connection conn = DriverManager.getConnection("jdbc:duckdb:");
```
To use DuckDB-specific features such as the [Appender](#::appender), cast the object to a `DuckDBConnection`:
```java
import java.sql.DriverManager;
import org.duckdb.DuckDBConnection;
DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
```
When using the `jdbc:duckdb:` URL alone, an **in-memory database** is created. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Java program). If you would like to access or create a persistent database, append its file path to the `jdbc:duckdb:` prefix. For example, if your database is stored in `/tmp/my_database`, use the JDBC URL `jdbc:duckdb:/tmp/my_database` to create a connection to it.
It is possible to open a DuckDB database file in **read-only** mode. This is for example useful if multiple Java processes want to read the same database file at the same time. To open an existing database file in read-only mode, set the connection property `duckdb.read_only` like so:
```java
Properties readOnlyProperty = new Properties();
readOnlyProperty.setProperty("duckdb.read_only", "true");
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database", readOnlyProperty);
```
Additional connections can be created using the `DriverManager`. A more efficient mechanism is to call the `DuckDBConnection#duplicate()` method:
```java
Connection conn2 = ((DuckDBConnection) conn).duplicate();
```
Multiple connections are allowed, but mixing read-write and read-only connections is unsupported.
##### Configuring Connections {#docs:stable:clients:java::configuring-connections}
Configuration options can be provided to change different settings of the database system. Note that many of these
settings can be changed later on using [`PRAGMA` statements](#docs:stable:configuration:pragmas) as well.
```java
Properties connectionProperties = new Properties();
connectionProperties.setProperty("temp_directory", "/path/to/temp/dir/");
Connection conn = DriverManager.getConnection("jdbc:duckdb:/tmp/my_database", connectionProperties);
```
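Many settings can also be adjusted on an existing connection by issuing a `SET` (or `PRAGMA`) statement through a regular JDBC `Statement`. A minimal sketch:
```java
import java.sql.Statement;
try (Statement stmt = conn.createStatement()) {
    // change the memory limit of the running database instance
    stmt.execute("SET memory_limit = '1GB'");
}
```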
##### Querying {#docs:stable:clients:java::querying}
DuckDB supports the standard JDBC methods to send queries and retrieve result sets. First, a `Statement` object has to be created from the `Connection`; this object can then be used to send queries using `execute` and `executeQuery`. `execute()` is meant for queries where no results are expected, like `CREATE TABLE` or `UPDATE`, while `executeQuery()` is meant for queries that produce results (e.g., `SELECT`). Two examples follow; see also the JDBC [`Statement`](https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html) and [`ResultSet`](https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html) documentation.
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
Connection conn = DriverManager.getConnection("jdbc:duckdb:");
// create a table
Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)");
// insert two items into the table
stmt.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)");
try (ResultSet rs = stmt.executeQuery("SELECT * FROM items")) {
while (rs.next()) {
System.out.println(rs.getString(1));
System.out.println(rs.getInt(3));
}
}
stmt.close();
```
```text
jeans
1
hammer
2
```
DuckDB also supports prepared statements as per the JDBC API:
```java
import java.sql.PreparedStatement;
try (PreparedStatement stmt = conn.prepareStatement("INSERT INTO items VALUES (?, ?, ?);")) {
stmt.setString(1, "chainsaw");
stmt.setDouble(2, 500.0);
stmt.setInt(3, 42);
stmt.execute();
// more calls to execute() possible
}
```
> **Warning.** Do *not* use prepared statements to insert large amounts of data into DuckDB. See the [data import documentation](#docs:stable:data:overview) for better options.
##### Arrow Methods {#docs:stable:clients:java::arrow-methods}
Refer to the [API Reference](https://javadoc.io/doc/org.duckdb/duckdb_jdbc/latest/org/duckdb/DuckDBResultSet.html#arrowExportStream(java.lang.Object,long)) for the type signatures.
###### Arrow Export {#docs:stable:clients:java::arrow-export}
The following demonstrates exporting an Arrow stream and consuming it using the Java Arrow bindings:
```java
import java.sql.DriverManager;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.duckdb.DuckDBResultSet;
try (var conn = DriverManager.getConnection("jdbc:duckdb:");
var stmt = conn.prepareStatement("SELECT * FROM generate_series(2000)");
var resultset = (DuckDBResultSet) stmt.executeQuery();
var allocator = new RootAllocator()) {
try (var reader = (ArrowReader) resultset.arrowExportStream(allocator, 256)) {
while (reader.loadNextBatch()) {
System.out.println(reader.getVectorSchemaRoot().getVector("generate_series"));
}
}
stmt.close();
}
```
###### Arrow Import {#docs:stable:clients:java::arrow-import}
The following demonstrates importing an Arrow stream from the Java Arrow bindings into DuckDB:
```java
import java.sql.DriverManager;
import org.apache.arrow.c.ArrowArrayStream;
import org.apache.arrow.c.Data;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowStreamReader;
import org.duckdb.DuckDBConnection;
import org.duckdb.DuckDBResultSet;
// Arrow binding
try (var allocator = new RootAllocator();
ArrowStreamReader reader = null; // should not be null of course
var arrow_array_stream = ArrowArrayStream.allocateNew(allocator)) {
Data.exportArrayStream(allocator, reader, arrow_array_stream);
// DuckDB setup
try (var conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:")) {
conn.registerArrowStream("asdf", arrow_array_stream);
// run a query
try (var stmt = conn.createStatement();
var rs = (DuckDBResultSet) stmt.executeQuery("SELECT count(*) FROM asdf")) {
while (rs.next()) {
System.out.println(rs.getInt(1));
}
}
}
}
```
##### Streaming Results {#docs:stable:clients:java::streaming-results}
Result streaming is opt-in in the JDBC driver: set the `jdbc_stream_results` config to `true` before running a query. The easiest way to do that is to pass it in the `Properties` object.
```java
Properties props = new Properties();
props.setProperty(DuckDBDriver.JDBC_STREAM_RESULTS, String.valueOf(true));
Connection conn = DriverManager.getConnection("jdbc:duckdb:", props);
```
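With this property set, results on the connection are streamed as they are consumed rather than fully materialized up front. Reading them uses the usual JDBC calls (a minimal sketch, using the `Statement` and `ResultSet` imports shown earlier):
```java
try (Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT * FROM range(10000000)")) {
    while (rs.next()) {
        // process rows one by one; chunks are fetched lazily
    }
}
```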
##### Appender {#docs:stable:clients:java::appender}
The [Appender](#docs:stable:data:appender) is available in the DuckDB JDBC driver via the `org.duckdb.DuckDBAppender` class.
The constructor of the class requires the schema name and the table name it is applied to.
The Appender is flushed when the `close()` method is called.
Example:
```java
import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;
DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
try (var stmt = conn.createStatement()) {
stmt.execute("CREATE TABLE tbl (x BIGINT, y FLOAT, s VARCHAR)"
);
// using try-with-resources to automatically close the appender at the end of the scope
try (var appender = conn.createAppender(DuckDBConnection.DEFAULT_SCHEMA, "tbl")) {
appender.beginRow();
appender.append(10);
appender.append(3.2);
appender.append("hello");
appender.endRow();
appender.beginRow();
appender.append(20);
appender.append(-8.1);
appender.append("world");
appender.endRow();
}
}
```
##### Batch Writer {#docs:stable:clients:java::batch-writer}
The DuckDB JDBC driver offers batch write functionality.
The batch writer supports prepared statements to mitigate the overhead of query parsing.
> The preferred method for bulk inserts is to use the [Appender](#::appender) due to its higher performance.
> However, when using the Appender is not possible, the batch writer is available as alternative.
###### Batch Writer with Prepared Statements {#docs:stable:clients:java::batch-writer-with-prepared-statements}
```java
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;
DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
Statement createStmt = conn.createStatement();
createStmt.execute("CREATE TABLE test (x INTEGER, y INTEGER, z INTEGER)");
createStmt.close();
PreparedStatement stmt = conn.prepareStatement("INSERT INTO test (x, y, z) VALUES (?, ?, ?);");
stmt.setObject(1, 1);
stmt.setObject(2, 2);
stmt.setObject(3, 3);
stmt.addBatch();
stmt.setObject(1, 4);
stmt.setObject(2, 5);
stmt.setObject(3, 6);
stmt.addBatch();
stmt.executeBatch();
stmt.close();
```
###### Batch Writer with Vanilla Statements {#docs:stable:clients:java::batch-writer-with-vanilla-statements}
The batch writer also supports vanilla SQL statements:
```java
import java.sql.DriverManager;
import java.sql.Statement;
import org.duckdb.DuckDBConnection;
DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:");
Statement stmt = conn.createStatement();
stmt.execute("CREATE TABLE test (x INTEGER, y INTEGER, z INTEGER)");
stmt.addBatch("INSERT INTO test (x, y, z) VALUES (1, 2, 3);");
stmt.addBatch("INSERT INTO test (x, y, z) VALUES (4, 5, 6);");
stmt.executeBatch();
stmt.close();
```
#### Troubleshooting {#docs:stable:clients:java::troubleshooting}
##### Driver Class Not Found {#docs:stable:clients:java::driver-class-not-found}
If the Java application is unable to find the DuckDB driver, it may throw the following error:
```console
Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:duckdb:
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:706)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:252)
...
```
And when trying to load the class manually, it may result in this error:
```console
Exception in thread "main" java.lang.ClassNotFoundException: org.duckdb.DuckDBDriver
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:375)
...
```
These errors stem from the DuckDB Maven/Gradle dependency not being detected. To ensure that it is detected, force refresh the Maven configuration in your IDE.
## Julia Client {#docs:stable:clients:julia}
The DuckDB Julia package provides a high-performance front-end for DuckDB. Much like SQLite, DuckDB runs in-process within the Julia client, and provides a DBInterface front-end.
The package also supports multi-threaded execution, using Julia threads/tasks for this purpose. If you wish to run queries in parallel, you must launch Julia with multi-threading support, e.g., by setting the `JULIA_NUM_THREADS` environment variable.
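For example, you can start Julia with `JULIA_NUM_THREADS=4 julia` (or `julia --threads=4`) and verify the thread count from within the session; a minimal sketch:
```julia
# must print a value > 1 for queries to actually run in parallel
println(Threads.nthreads())
```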
#### Installation {#docs:stable:clients:julia::installation}
Install DuckDB as follows:
```julia
using Pkg
Pkg.add("DuckDB")
```
Alternatively, enter the package manager using the `]` key, and issue the following command:
```julia
pkg> add DuckDB
```
#### Basics {#docs:stable:clients:julia::basics}
```julia
using DuckDB
# create a new in-memory database
con = DBInterface.connect(DuckDB.DB, ":memory:")
# create a table
DBInterface.execute(con, "CREATE TABLE integers (i INTEGER)")
# insert data by executing a prepared statement
stmt = DBInterface.prepare(con, "INSERT INTO integers VALUES(?)")
DBInterface.execute(stmt, [42])
# query the database
results = DBInterface.execute(con, "SELECT 42 a")
print(results)
```
Some SQL statements, such as `PIVOT` and `IMPORT DATABASE`, are executed as multiple prepared statements and will error when run via `DuckDB.execute()`. They can instead be run with `DuckDB.query()`, which always returns a materialized result.
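A minimal sketch, assuming `DuckDB.query` takes a connection and a SQL string, analogous to `DuckDB.execute`:
```julia
using DuckDB
con = DBInterface.connect(DuckDB.DB)
DBInterface.execute(con, "CREATE TABLE cities (name VARCHAR, year INTEGER, population INTEGER)")
DBInterface.execute(con, "INSERT INTO cities VALUES ('Amsterdam', 2000, 1005), ('Amsterdam', 2010, 1065)")
# PIVOT expands into multiple prepared statements, so run it via DuckDB.query
results = DuckDB.query(con, "PIVOT cities ON year USING sum(population)")
print(results)
```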
#### Scanning DataFrames {#docs:stable:clients:julia::scanning-dataframes}
The DuckDB Julia package also provides support for querying Julia DataFrames. Note that the DataFrames are directly read by DuckDB; they are not inserted or copied into the database itself.
If you wish to load data from a DataFrame into a DuckDB table you can run a `CREATE TABLE ... AS` or `INSERT INTO` query.
```julia
using DuckDB
using DataFrames
# create a new in-memory database
con = DBInterface.connect(DuckDB.DB)
# create a DataFrame
df = DataFrame(a = [1, 2, 3], b = [42, 84, 42])
# register it as a view in the database
DuckDB.register_data_frame(con, df, "my_df")
# run a SQL query over the DataFrame
results = DBInterface.execute(con, "SELECT * FROM my_df")
print(results)
```
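Continuing the example above, a `CREATE TABLE ... AS` statement over the registered view copies the DataFrame's contents into a DuckDB table:
```julia
# materialize the registered view into a DuckDB table
DBInterface.execute(con, "CREATE TABLE my_table AS SELECT * FROM my_df")
results = DBInterface.execute(con, "SELECT * FROM my_table")
print(results)
```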
#### Appender API {#docs:stable:clients:julia::appender-api}
The DuckDB Julia package also supports the [Appender API](#docs:stable:data:appender), which is much faster than using prepared statements or individual `INSERT INTO` statements. Appends are made in row-wise format. For every column, an `append()` call should be made, after which the row should be finished by calling `end_row()`. After all rows have been appended, `close()` should be used to finalize the Appender and clean up the resulting memory.
```julia
using DuckDB, DataFrames, Dates
db = DuckDB.DB()
# create a table
DBInterface.execute(db,
"CREATE OR REPLACE TABLE data (id INTEGER PRIMARY KEY, value FLOAT, timestamp TIMESTAMP, date DATE)")
# create data to insert
len = 100
df = DataFrames.DataFrame(
id = collect(1:len),
value = rand(len),
timestamp = Dates.now() + Dates.Second.(1:len),
date = Dates.today() + Dates.Day.(1:len)
)
# append data by row
appender = DuckDB.Appender(db, "data")
for i in eachrow(df)
for j in i
DuckDB.append(appender, j)
end
DuckDB.end_row(appender)
end
# close the appender after all rows
DuckDB.close(appender)
```
#### Concurrency {#docs:stable:clients:julia::concurrency}
Within a Julia process, tasks are able to concurrently read and write to the database, as long as each task maintains its own connection to the database. In the example below, a single task is spawned to periodically read the database and many tasks are spawned to write to the database using both [`INSERT` statements](#docs:stable:sql:statements:insert) as well as the [Appender API](#docs:stable:data:appender).
```julia
using Dates, DataFrames, DuckDB
db = DuckDB.DB()
DBInterface.connect(db)
DBInterface.execute(db, "CREATE OR REPLACE TABLE data (date TIMESTAMP, id INTEGER)")
function run_reader(db)
# create a DuckDB connection specifically for this task
conn = DBInterface.connect(db)
while true
println(DBInterface.execute(conn,
"SELECT id, count(date) AS count, max(date) AS max_date
FROM data GROUP BY id ORDER BY id") |> DataFrames.DataFrame)
Threads.sleep(1)
end
DBInterface.close(conn)
end
# spawn one reader task
Threads.@spawn run_reader(db)
function run_inserter(db, id)
# create a DuckDB connection specifically for this task
conn = DBInterface.connect(db)
for i in 1:1000
Threads.sleep(0.01)
DuckDB.execute(conn, "INSERT INTO data VALUES (current_timestamp, ?)", [id]);
end
DBInterface.close(conn)
end
# spawn many insert tasks
for i in 1:100
Threads.@spawn run_inserter(db, 1)
end
function run_appender(db, id)
# create a DuckDB connection specifically for this task
appender = DuckDB.Appender(db, "data")
for i in 1:1000
Threads.sleep(0.01)
row = (Dates.now(Dates.UTC), id)
for j in row
DuckDB.append(appender, j);
end
DuckDB.end_row(appender);
end
DuckDB.close(appender);
end
# spawn many appender tasks
for i in 1:100
Threads.@spawn run_appender(db, 2)
end
```
#### Original Julia Connector {#docs:stable:clients:julia::original-julia-connector}
Credits to kimmolinna for the [original DuckDB Julia connector](https://github.com/kimmolinna/DuckDB.jl).
## Node.js (Deprecated) {#clients:nodejs}
### Node.js API {#docs:stable:clients:nodejs:overview}
> The latest stable version of the DuckDB Node.js (deprecated) client is {{ site.current_duckdb_nodejs_version }}.
> **Deprecated.** The old DuckDB Node.js package is deprecated.
> Please use the [DuckDB Node Neo package](#docs:stable:clients:node_neo:overview) instead.
The [`duckdb`](https://www.npmjs.com/package/duckdb) package provides a Node.js API for DuckDB.
The API for this client is somewhat compatible with the SQLite Node.js client, which eases the transition.
#### Initializing {#docs:stable:clients:nodejs:overview::initializing}
Load the package and create a database object:
```js
const duckdb = require('duckdb');
const db = new duckdb.Database(':memory:'); // or a file name for a persistent DB
```
All options described in the [Database configuration](#docs:stable:configuration:overview::configuration-reference) reference can optionally be supplied to the `Database` constructor as the second argument. An optional third argument can be supplied as a callback to get feedback on the given options.
```js
const db = new duckdb.Database(':memory:', {
"access_mode": "READ_WRITE",
"max_memory": "512MB",
"threads": "4"
}, (err) => {
if (err) {
console.error(err);
}
});
```
#### Running a Query {#docs:stable:clients:nodejs:overview::running-a-query}
The following code snippet runs a simple query using the `Database.all()` method.
```js
db.all('SELECT 42 AS fortytwo', function(err, res) {
if (err) {
console.warn(err);
return;
}
console.log(res[0].fortytwo)
});
```
Other available methods are `each`, where the callback is invoked for each row, `run`, which executes a single statement without returning results, and `exec`, which can execute several SQL commands at once but also does not return results. All these methods work with prepared statements, taking the values for the parameters as additional arguments. For example:
```js
db.all('SELECT ?::INTEGER AS fortytwo, ?::VARCHAR AS hello', 42, 'Hello, World', function(err, res) {
if (err) {
console.warn(err);
return;
}
console.log(res[0].fortytwo)
console.log(res[0].hello)
});
```
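For instance, a sketch combining `exec` (several statements, no results) with `each` (per-row callback); this assumes the per-row callback receives `(err, row)`, following the SQLite client convention:
```js
db.exec(`CREATE TABLE people (name VARCHAR, age INTEGER);
         INSERT INTO people VALUES ('Alice', 42), ('Bob', 43);`, function(err) {
  if (err) {
    console.warn(err);
    return;
  }
  db.each('SELECT name, age FROM people', function(err, row) {
    if (err) {
      console.warn(err);
      return;
    }
    // called once per result row
    console.log(row.name, row.age);
  });
});
```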
#### Connections {#docs:stable:clients:nodejs:overview::connections}
A database can have multiple `Connection`s, those are created using `db.connect()`.
```js
const con = db.connect();
```
You can create multiple connections, each with their own transaction context.
`Connection` objects also contain shorthands to directly call `run()`, `all()` and `each()` with parameters and callbacks, respectively, for example:
```js
con.all('SELECT 42 AS fortytwo', function(err, res) {
if (err) {
console.warn(err);
return;
}
console.log(res[0].fortytwo)
});
```
#### Prepared Statements {#docs:stable:clients:nodejs:overview::prepared-statements}
From connections, you can create prepared statements (and only that) using `con.prepare()`:
```js
const stmt = con.prepare('SELECT ?::INTEGER AS fortytwo');
```
To execute this statement, you can call for example `all()` on the `stmt` object:
```js
stmt.all(42, function(err, res) {
if (err) {
console.warn(err);
} else {
console.log(res[0].fortytwo)
}
});
```
You can also execute the prepared statement multiple times. This is for example useful to fill a table with data:
```js
con.run('CREATE TABLE a (i INTEGER)');
const stmt = con.prepare('INSERT INTO a VALUES (?)');
for (let i = 0; i < 10; i++) {
stmt.run(i);
}
stmt.finalize();
con.all('SELECT * FROM a', function(err, res) {
if (err) {
console.warn(err);
} else {
console.log(res)
}
});
```
`prepare()` can also take a callback which gets the prepared statement as an argument:
```js
const stmt = con.prepare('SELECT ?::INTEGER AS fortytwo', function(err, stmt) {
stmt.all(42, function(err, res) {
if (err) {
console.warn(err);
} else {
console.log(res[0].fortytwo)
}
});
});
```
#### Inserting Data via Arrow {#docs:stable:clients:nodejs:overview::inserting-data-via-arrow}
[Apache Arrow](#docs:stable:guides:python:sql_on_arrow) can be used to insert data into DuckDB without making a copy:
```js
const arrow = require('apache-arrow');
const db = new duckdb.Database(':memory:');
const jsonData = [
{"userId":1,"id":1,"title":"delectus aut autem","completed":false},
{"userId":1,"id":2,"title":"quis ut nam facilis et officia qui","completed":false}
];
// note; doesn't work on Windows yet
db.exec(` INSTALL arrow; LOAD arrow;`, (err) => {
if (err) {
console.warn(err);
return;
}
const arrowTable = arrow.tableFromJSON(jsonData);
db.register_buffer("jsonDataTable", [arrow.tableToIPC(arrowTable)], true, (err, res) => {
if (err) {
console.warn(err);
return;
}
// `SELECT * FROM jsonDataTable` would return the entries in `jsonData`
});
});
```
#### Loading Unsigned Extensions {#docs:stable:clients:nodejs:overview::loading-unsigned-extensions}
To load [unsigned extensions](#docs:stable:extensions:overview::unsigned-extensions), instantiate the database as follows:
```js
db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"});
```
### Node.js API {#docs:stable:clients:nodejs:reference}
#### Modules {#docs:stable:clients:nodejs:reference::modules}
duckdb
#### Typedefs {#docs:stable:clients:nodejs:reference::typedefs}
ColumnInfo : object
TypeInfo : object
DuckDbError : object
HTTPError : object
#### duckdb {#docs:stable:clients:nodejs:reference::duckdb}
**Summary**: DuckDB is an embeddable SQL OLAP Database Management System
* [duckdb](#::module_duckdb)
* [~Connection](#::module_duckdb..Connection)
* [.run(sql, ...params, callback)](#::module_duckdb..Connection+run) ⇒ void
* [.all(sql, ...params, callback)](#::module_duckdb..Connection+all) ⇒ void
* [.arrowIPCAll(sql, ...params, callback)](#::module_duckdb..Connection+arrowIPCAll) ⇒ void
* [.arrowIPCStream(sql, ...params, callback)](#::module_duckdb..Connection+arrowIPCStream) ⇒
* [.each(sql, ...params, callback)](#::module_duckdb..Connection+each) ⇒ void
* [.stream(sql, ...params)](#::module_duckdb..Connection+stream)
* [.register_udf(name, return_type, fun)](#::module_duckdb..Connection+register_udf) ⇒ void
* [.prepare(sql, ...params, callback)](#::module_duckdb..Connection+prepare) ⇒ Statement
* [.exec(sql, ...params, callback)](#::module_duckdb..Connection+exec) ⇒ void
* [.register_udf_bulk(name, return_type, callback)](#::module_duckdb..Connection+register_udf_bulk) ⇒ void
* [.unregister_udf(name, return_type, callback)](#::module_duckdb..Connection+unregister_udf) ⇒ void
* [.register_buffer(name, array, force, callback)](#::module_duckdb..Connection+register_buffer) ⇒ void
* [.unregister_buffer(name, callback)](#::module_duckdb..Connection+unregister_buffer) ⇒ void
* [.close(callback)](#::module_duckdb..Connection+close) ⇒ void
* [~Statement](#::module_duckdb..Statement)
* [.sql](#::module_duckdb..Statement+sql) ⇒
* [.get()](#::module_duckdb..Statement+get)
* [.run(sql, ...params, callback)](#::module_duckdb..Statement+run) ⇒ void
* [.all(sql, ...params, callback)](#::module_duckdb..Statement+all) ⇒ void
* [.arrowIPCAll(sql, ...params, callback)](#::module_duckdb..Statement+arrowIPCAll) ⇒ void
* [.each(sql, ...params, callback)](#::module_duckdb..Statement+each) ⇒ void
* [.finalize(sql, ...params, callback)](#::module_duckdb..Statement+finalize) ⇒ void
* [.stream(sql, ...params)](#::module_duckdb..Statement+stream)
* [.columns()](#::module_duckdb..Statement+columns) ⇒ [Array.<ColumnInfo>](#::ColumnInfo)
* [~QueryResult](#::module_duckdb..QueryResult)
* [.nextChunk()](#::module_duckdb..QueryResult+nextChunk) ⇒
* [.nextIpcBuffer()](#::module_duckdb..QueryResult+nextIpcBuffer) ⇒
* [.asyncIterator()](#::module_duckdb..QueryResult+asyncIterator)
* [~Database](#::module_duckdb..Database)
* [.close(callback)](#::module_duckdb..Database+close) ⇒ void
* [.close_internal(callback)](#::module_duckdb..Database+close_internal) ⇒ void
* [.wait(callback)](#::module_duckdb..Database+wait) ⇒ void
* [.serialize(callback)](#::module_duckdb..Database+serialize) ⇒ void
* [.parallelize(callback)](#::module_duckdb..Database+parallelize) ⇒ void
* [.connect(path)](#::module_duckdb..Database+connect) ⇒ Connection
* [.interrupt(callback)](#::module_duckdb..Database+interrupt) ⇒ void
* [.prepare(sql)](#::module_duckdb..Database+prepare) ⇒ Statement
* [.run(sql, ...params, callback)](#::module_duckdb..Database+run) ⇒ void
* [.scanArrowIpc(sql, ...params, callback)](#::module_duckdb..Database+scanArrowIpc) ⇒ void
* [.each(sql, ...params, callback)](#::module_duckdb..Database+each) ⇒ void
* [.stream(sql, ...params)](#::module_duckdb..Database+stream)
* [.all(sql, ...params, callback)](#::module_duckdb..Database+all) ⇒ void
* [.arrowIPCAll(sql, ...params, callback)](#::module_duckdb..Database+arrowIPCAll) ⇒ void
* [.arrowIPCStream(sql, ...params, callback)](#::module_duckdb..Database+arrowIPCStream) ⇒ void
* [.exec(sql, ...params, callback)](#::module_duckdb..Database+exec) ⇒ void
* [.register_udf(name, return_type, fun)](#::module_duckdb..Database+register_udf) ⇒ this
* [.register_buffer(name)](#::module_duckdb..Database+register_buffer) ⇒ this
* [.unregister_buffer(name)](#::module_duckdb..Database+unregister_buffer) ⇒ this
* [.unregister_udf(name)](#::module_duckdb..Database+unregister_udf) ⇒ this
* [.registerReplacementScan(fun)](#::module_duckdb..Database+registerReplacementScan) ⇒ this
* [.tokenize(text)](#::module_duckdb..Database+tokenize) ⇒ ScriptTokens
* [.get()](#::module_duckdb..Database+get)
* [~TokenType](#::module_duckdb..TokenType)
* [~ERROR](#::module_duckdb..ERROR) : number
* [~OPEN_READONLY](#::module_duckdb..OPEN_READONLY) : number
* [~OPEN_READWRITE](#::module_duckdb..OPEN_READWRITE) : number
* [~OPEN_CREATE](#::module_duckdb..OPEN_CREATE) : number
* [~OPEN_FULLMUTEX](#::module_duckdb..OPEN_FULLMUTEX) : number
* [~OPEN_SHAREDCACHE](#::module_duckdb..OPEN_SHAREDCACHE) : number
* [~OPEN_PRIVATECACHE](#::module_duckdb..OPEN_PRIVATECACHE) : number
##### duckdb~Connection {#docs:stable:clients:nodejs:reference::duckdbconnection}
**Kind**: inner class of [duckdb](#::module_duckdb)
* [~Connection](#::module_duckdb..Connection)
* [.run(sql, ...params, callback)](#::module_duckdb..Connection+run) ⇒ void
* [.all(sql, ...params, callback)](#::module_duckdb..Connection+all) ⇒ void
* [.arrowIPCAll(sql, ...params, callback)](#::module_duckdb..Connection+arrowIPCAll) ⇒ void
* [.arrowIPCStream(sql, ...params, callback)](#::module_duckdb..Connection+arrowIPCStream) ⇒
* [.each(sql, ...params, callback)](#::module_duckdb..Connection+each) ⇒ void
* [.stream(sql, ...params)](#::module_duckdb..Connection+stream)
* [.register_udf(name, return_type, fun)](#::module_duckdb..Connection+register_udf) ⇒ void
* [.prepare(sql, ...params, callback)](#::module_duckdb..Connection+prepare) ⇒ Statement
* [.exec(sql, ...params, callback)](#::module_duckdb..Connection+exec) ⇒ void
* [.register_udf_bulk(name, return_type, callback)](#::module_duckdb..Connection+register_udf_bulk) ⇒ void
* [.unregister_udf(name, return_type, callback)](#::module_duckdb..Connection+unregister_udf) ⇒ void
* [.register_buffer(name, array, force, callback)](#::module_duckdb..Connection+register_buffer) ⇒ void
* [.unregister_buffer(name, callback)](#::module_duckdb..Connection+unregister_buffer) ⇒ void
* [.close(callback)](#::module_duckdb..Connection+close) ⇒ void
###### connection.run(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::connectionrunsql-params-callback--codevoidcode}
Run a SQL statement and trigger a callback when done
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### connection.all(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::connectionallsql-params-callback--codevoidcode}
Run a SQL query and triggers the callback once for all result rows
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### connection.arrowIPCAll(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::connectionarrowipcallsql-params-callback--codevoidcode}
Run a SQL query and serialize the result into the Apache Arrow IPC format (requires arrow extension to be loaded)
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### connection.arrowIPCStream(sql, ...params, callback) ⇒ {#docs:stable:clients:nodejs:reference::connectionarrowipcstreamsql-params-callback-}
Run a SQL query, returns a IpcResultStreamIterator that allows streaming the result into the Apache Arrow IPC format
(requires arrow extension to be loaded)
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
**Returns**: Promise
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### connection.each(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::connectioneachsql-params-callback--codevoidcode}
Runs a SQL query and triggers the callback for each result row
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### connection.stream(sql, ...params) {#docs:stable:clients:nodejs:reference::connectionstreamsql-params}
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
###### connection.register\_udf(name, return_type, fun) ⇒ void {#docs:stable:clients:nodejs:reference::connectionregister_udfname-return_type-fun--codevoidcode}
Register a User Defined Function
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
**Note**: this follows the wasm udfs somewhat but is simpler because we can pass data much more cleanly
| Param |
| --- |
| name |
| return_type |
| fun |
###### connection.prepare(sql, ...params, callback) ⇒ Statement {#docs:stable:clients:nodejs:reference::connectionpreparesql-params-callback--codestatementcode}
Prepare a SQL query for execution
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### connection.exec(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::connectionexecsql-params-callback--codevoidcode}
Execute a SQL query
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### connection.register\_udf\_bulk(name, return_type, callback) ⇒ void {#docs:stable:clients:nodejs:reference::connectionregister_udf_bulkname-return_type-callback--codevoidcode}
Register a User Defined Function
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param |
| --- |
| name |
| return_type |
| callback |
###### connection.unregister\_udf(name, return_type, callback) ⇒ void {#docs:stable:clients:nodejs:reference::connectionunregister_udfname-return_type-callback--codevoidcode}
Unregister a User Defined Function
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param |
| --- |
| name |
| return_type |
| callback |
###### connection.register\_buffer(name, array, force, callback) ⇒ void {#docs:stable:clients:nodejs:reference::connectionregister_buffername-array-force-callback--codevoidcode}
Register a Buffer to be scanned using the Apache Arrow IPC scanner
(requires arrow extension to be loaded)
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param |
| --- |
| name |
| array |
| force |
| callback |
###### connection.unregister\_buffer(name, callback) ⇒ void {#docs:stable:clients:nodejs:reference::connectionunregister_buffername-callback--codevoidcode}
Unregister the Buffer
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param |
| --- |
| name |
| callback |
###### connection.close(callback) ⇒ void {#docs:stable:clients:nodejs:reference::connectionclosecallback--codevoidcode}
Closes connection
**Kind**: instance method of [Connection](#::module_duckdb..Connection)
| Param |
| --- |
| callback |
##### duckdb~Statement {#docs:stable:clients:nodejs:reference::duckdbstatement}
**Kind**: inner class of [duckdb](#::module_duckdb)
* [~Statement](#::module_duckdb..Statement)
* [.sql](#::module_duckdb..Statement+sql) ⇒
* [.get()](#::module_duckdb..Statement+get)
* [.run(sql, ...params, callback)](#::module_duckdb..Statement+run) ⇒ void
* [.all(sql, ...params, callback)](#::module_duckdb..Statement+all) ⇒ void
* [.arrowIPCAll(sql, ...params, callback)](#::module_duckdb..Statement+arrowIPCAll) ⇒ void
* [.each(sql, ...params, callback)](#::module_duckdb..Statement+each) ⇒ void
* [.finalize(sql, ...params, callback)](#::module_duckdb..Statement+finalize) ⇒ void
* [.stream(sql, ...params)](#::module_duckdb..Statement+stream)
* [.columns()](#::module_duckdb..Statement+columns) ⇒ [Array.<ColumnInfo>](#::ColumnInfo)
###### statement.sql ⇒ {#docs:stable:clients:nodejs:reference::statementsql-}
**Kind**: instance property of [Statement](#::module_duckdb..Statement)
**Returns**: sql contained in statement
**Field**:
###### statement.get() {#docs:stable:clients:nodejs:reference::statementget}
Not implemented
**Kind**: instance method of [Statement](#::module_duckdb..Statement)
###### statement.run(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::statementrunsql-params-callback--codevoidcode}
**Kind**: instance method of [Statement](#::module_duckdb..Statement)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### statement.all(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::statementallsql-params-callback--codevoidcode}
**Kind**: instance method of [Statement](#::module_duckdb..Statement)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### statement.arrowIPCAll(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::statementarrowipcallsql-params-callback--codevoidcode}
**Kind**: instance method of [Statement](#::module_duckdb..Statement)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### statement.each(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::statementeachsql-params-callback--codevoidcode}
**Kind**: instance method of [Statement](#::module_duckdb..Statement)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### statement.finalize(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::statementfinalizesql-params-callback--codevoidcode}
**Kind**: instance method of [Statement](#::module_duckdb..Statement)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### statement.stream(sql, ...params) {#docs:stable:clients:nodejs:reference::statementstreamsql-params}
**Kind**: instance method of [Statement](#::module_duckdb..Statement)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
###### statement.columns() ⇒ [Array.<ColumnInfo>](#::ColumnInfo) {#docs:stable:clients:nodejs:reference::statementcolumns--codearrayltcolumninfogtcodecolumninfo}
**Kind**: instance method of [Statement](#::module_duckdb..Statement)
**Returns**: [Array.<ColumnInfo>](#::ColumnInfo) - - Array of column names and types
##### duckdb~QueryResult {#docs:stable:clients:nodejs:reference::duckdbqueryresult}
**Kind**: inner class of [duckdb](#::module_duckdb)
* [~QueryResult](#::module_duckdb..QueryResult)
* [.nextChunk()](#::module_duckdb..QueryResult+nextChunk) ⇒
* [.nextIpcBuffer()](#::module_duckdb..QueryResult+nextIpcBuffer) ⇒
* [.asyncIterator()](#::module_duckdb..QueryResult+asyncIterator)
###### queryResult.nextChunk() ⇒ {#docs:stable:clients:nodejs:reference::queryresultnextchunk-}
**Kind**: instance method of [QueryResult](#::module_duckdb..QueryResult)
**Returns**: data chunk
###### queryResult.nextIpcBuffer() ⇒ {#docs:stable:clients:nodejs:reference::queryresultnextipcbuffer-}
Function to fetch the next result blob of an Arrow IPC Stream in a zero-copy way.
(requires arrow extension to be loaded)
**Kind**: instance method of [QueryResult](#::module_duckdb..QueryResult)
**Returns**: data chunk
###### queryResult.asyncIterator() {#docs:stable:clients:nodejs:reference::queryresultasynciterator}
**Kind**: instance method of [QueryResult](#::module_duckdb..QueryResult)
##### duckdb~Database {#docs:stable:clients:nodejs:reference::duckdbdatabase}
Main database interface
**Kind**: inner property of [duckdb](#::module_duckdb)
| Param | Description |
| --- | --- |
| path | path to database file or :memory: for in-memory database |
| access_mode | access mode |
| config | the configuration object |
| callback | callback function |
* [~Database](#::module_duckdb..Database)
* [.close(callback)](#::module_duckdb..Database+close) ⇒ void
* [.close_internal(callback)](#::module_duckdb..Database+close_internal) ⇒ void
* [.wait(callback)](#::module_duckdb..Database+wait) ⇒ void
* [.serialize(callback)](#::module_duckdb..Database+serialize) ⇒ void
* [.parallelize(callback)](#::module_duckdb..Database+parallelize) ⇒ void
* [.connect(path)](#::module_duckdb..Database+connect) ⇒ Connection
* [.interrupt(callback)](#::module_duckdb..Database+interrupt) ⇒ void
* [.prepare(sql)](#::module_duckdb..Database+prepare) ⇒ Statement
* [.run(sql, ...params, callback)](#::module_duckdb..Database+run) ⇒ void
* [.scanArrowIpc(sql, ...params, callback)](#::module_duckdb..Database+scanArrowIpc) ⇒ void
* [.each(sql, ...params, callback)](#::module_duckdb..Database+each) ⇒ void
* [.stream(sql, ...params)](#::module_duckdb..Database+stream)
* [.all(sql, ...params, callback)](#::module_duckdb..Database+all) ⇒ void
* [.arrowIPCAll(sql, ...params, callback)](#::module_duckdb..Database+arrowIPCAll) ⇒ void
* [.arrowIPCStream(sql, ...params, callback)](#::module_duckdb..Database+arrowIPCStream) ⇒ void
* [.exec(sql, ...params, callback)](#::module_duckdb..Database+exec) ⇒ void
* [.register_udf(name, return_type, fun)](#::module_duckdb..Database+register_udf) ⇒ this
* [.register_buffer(name)](#::module_duckdb..Database+register_buffer) ⇒ this
* [.unregister_buffer(name)](#::module_duckdb..Database+unregister_buffer) ⇒ this
* [.unregister_udf(name)](#::module_duckdb..Database+unregister_udf) ⇒ this
* [.registerReplacementScan(fun)](#::module_duckdb..Database+registerReplacementScan) ⇒ this
* [.tokenize(text)](#::module_duckdb..Database+tokenize) ⇒ ScriptTokens
* [.get()](#::module_duckdb..Database+get)
###### database.close(callback) ⇒ void {#docs:stable:clients:nodejs:reference::databaseclosecallback--codevoidcode}
Closes database instance
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| callback |
###### database.close\_internal(callback) ⇒ void {#docs:stable:clients:nodejs:reference::databaseclose_internalcallback--codevoidcode}
Internal method. Do not use; call Connection#close instead.
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| callback |
###### database.wait(callback) ⇒ void {#docs:stable:clients:nodejs:reference::databasewaitcallback--codevoidcode}
Triggers callback when all scheduled database tasks have completed.
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| callback |
###### database.serialize(callback) ⇒ void {#docs:stable:clients:nodejs:reference::databaseserializecallback--codevoidcode}
Currently a no-op. Provided for SQLite compatibility
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| callback |
###### database.parallelize(callback) ⇒ void {#docs:stable:clients:nodejs:reference::databaseparallelizecallback--codevoidcode}
Currently a no-op. Provided for SQLite compatibility
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| callback |
###### database.connect(path) ⇒ Connection {#docs:stable:clients:nodejs:reference::databaseconnectpath--codeconnectioncode}
Create a new database connection
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param | Description |
| --- | --- |
| path | the database to connect to, either a file path, or `:memory:` |
###### database.interrupt(callback) ⇒ void {#docs:stable:clients:nodejs:reference::databaseinterruptcallback--codevoidcode}
Supposed to interrupt queries, but currently does not do anything.
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| callback |
###### database.prepare(sql) ⇒ Statement {#docs:stable:clients:nodejs:reference::databasepreparesql--codestatementcode}
Prepare a SQL query for execution
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| sql |
###### database.run(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::databaserunsql-params-callback--codevoidcode}
Convenience method for Connection#run using a built-in default connection
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### database.scanArrowIpc(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::databasescanarrowipcsql-params-callback--codevoidcode}
Convenience method for Connection#scanArrowIpc using a built-in default connection
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### database.each(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::databaseeachsql-params-callback--codevoidcode}
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### database.stream(sql, ...params) {#docs:stable:clients:nodejs:reference::databasestreamsql-params}
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
###### database.all(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::databaseallsql-params-callback--codevoidcode}
Convenience method for Connection#all using a built-in default connection
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### database.arrowIPCAll(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::databasearrowipcallsql-params-callback--codevoidcode}
Convenience method for Connection#arrowIPCAll using a built-in default connection
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### database.arrowIPCStream(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::databasearrowipcstreamsql-params-callback--codevoidcode}
Convenience method for Connection#arrowIPCStream using a built-in default connection
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### database.exec(sql, ...params, callback) ⇒ void {#docs:stable:clients:nodejs:reference::databaseexecsql-params-callback--codevoidcode}
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param | Type |
| --- | --- |
| sql | |
| ...params | \* |
| callback | |
###### database.register\_udf(name, return_type, fun) ⇒ this {#docs:stable:clients:nodejs:reference::databaseregister_udfname-return_type-fun--codethiscode}
Register a User Defined Function
Convenience method for Connection#register_udf
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| name |
| return_type |
| fun |
###### database.register\_buffer(name) ⇒ this {#docs:stable:clients:nodejs:reference::databaseregister_buffername--codethiscode}
Register a buffer containing serialized data to be scanned from DuckDB.
Convenience method for Connection#register_buffer
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| name |
###### database.unregister\_buffer(name) ⇒ this {#docs:stable:clients:nodejs:reference::databaseunregister_buffername--codethiscode}
Unregister a Buffer
Convenience method for Connection#unregister_buffer
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| name |
###### database.unregister\_udf(name) ⇒ this {#docs:stable:clients:nodejs:reference::databaseunregister_udfname--codethiscode}
Unregister a UDF
Convenience method for Connection#unregister_udf
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| name |
###### database.registerReplacementScan(fun) ⇒ this {#docs:stable:clients:nodejs:reference::databaseregisterreplacementscanfun--codethiscode}
Register a table replacement scan function
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param | Description |
| --- | --- |
| fun | Replacement scan function |
###### database.tokenize(text) ⇒ ScriptTokens {#docs:stable:clients:nodejs:reference::databasetokenizetext--codescripttokenscode}
Return positions and types of tokens in given text
**Kind**: instance method of [Database](#::module_duckdb..Database)
| Param |
| --- |
| text |
###### database.get() {#docs:stable:clients:nodejs:reference::databaseget}
Not implemented
**Kind**: instance method of [Database](#::module_duckdb..Database)
##### duckdb~TokenType {#docs:stable:clients:nodejs:reference::duckdbtokentype}
Types of tokens returned by `tokenize`.
**Kind**: inner property of [duckdb](#::module_duckdb)
##### duckdb~ERROR : number {#docs:stable:clients:nodejs:reference::duckdberror--codenumbercode}
Check that errno attribute equals this to check for a duckdb error
**Kind**: inner constant of [duckdb](#::module_duckdb)
##### duckdb~OPEN\_READONLY : number {#docs:stable:clients:nodejs:reference::duckdbopen_readonly--codenumbercode}
Open database in readonly mode
**Kind**: inner constant of [duckdb](#::module_duckdb)
##### duckdb~OPEN\_READWRITE : number {#docs:stable:clients:nodejs:reference::duckdbopen_readwrite--codenumbercode}
Currently ignored
**Kind**: inner constant of [duckdb](#::module_duckdb)
##### duckdb~OPEN\_CREATE : number {#docs:stable:clients:nodejs:reference::duckdbopen_create--codenumbercode}
Currently ignored
**Kind**: inner constant of [duckdb](#::module_duckdb)
##### duckdb~OPEN\_FULLMUTEX : number {#docs:stable:clients:nodejs:reference::duckdbopen_fullmutex--codenumbercode}
Currently ignored
**Kind**: inner constant of [duckdb](#::module_duckdb)
##### duckdb~OPEN\_SHAREDCACHE : number {#docs:stable:clients:nodejs:reference::duckdbopen_sharedcache--codenumbercode}
Currently ignored
**Kind**: inner constant of [duckdb](#::module_duckdb)
##### duckdb~OPEN\_PRIVATECACHE : number {#docs:stable:clients:nodejs:reference::duckdbopen_privatecache--codenumbercode}
Currently ignored
**Kind**: inner constant of [duckdb](#::module_duckdb)
#### ColumnInfo : object {#docs:stable:clients:nodejs:reference::columninfo--codeobjectcode}
**Kind**: global typedef
**Properties**
| Name | Type | Description |
| --- | --- | --- |
| name | string | Column name |
| type | [TypeInfo](#::TypeInfo) | Column type |
#### TypeInfo : object {#docs:stable:clients:nodejs:reference::typeinfo--codeobjectcode}
**Kind**: global typedef
**Properties**
| Name | Type | Description |
| --- | --- | --- |
| id | string | Type ID |
| [alias] | string | SQL type alias |
| sql_type | string | SQL type name |
#### DuckDbError : object {#docs:stable:clients:nodejs:reference::duckdberror--codeobjectcode}
**Kind**: global typedef
**Properties**
| Name | Type | Description |
| --- | --- | --- |
| errno | number | -1 for DuckDB errors |
| message | string | Error message |
| code | string | 'DUCKDB_NODEJS_ERROR' for DuckDB errors |
| errorType | string | DuckDB error type code (eg, HTTP, IO, Catalog) |
#### HTTPError : object {#docs:stable:clients:nodejs:reference::httperror--codeobjectcode}
**Kind**: global typedef
**Extends**: [DuckDbError](#::DuckDbError)
**Properties**
| Name | Type | Description |
| --- | --- | --- |
| statusCode | number | HTTP response status code |
| reason | string | HTTP response reason |
| response | string | HTTP response body |
| headers | object | HTTP headers |
## Node.js (Neo) {#clients:node_neo}
### Node.js Client (Neo) {#docs:stable:clients:node_neo:overview}
> The latest stable version of the DuckDB Node.js (Neo) client is {{ site.current_duckdb_node_neo_version }}.
An API for using [DuckDB](https://duckdb.org/index.html) in [Node.js](https://nodejs.org/).
The primary package, [@duckdb/node-api](https://www.npmjs.com/package/@duckdb/node-api), is a high-level API meant for applications.
It depends on low-level bindings that adhere closely to [DuckDB's C API](#docs:stable:clients:c:overview),
available separately as [@duckdb/node-bindings](https://www.npmjs.com/package/@duckdb/node-bindings).
#### Features {#docs:stable:clients:node_neo:overview::features}
##### Main Differences from [duckdb-node](https://www.npmjs.com/package/duckdb) {#docs:stable:clients:node_neo:overview::main-differences-from-duckdb-nodehttpswwwnpmjscompackageduckdb}
* Native support for Promises; no need for separate [duckdb-async](https://www.npmjs.com/package/duckdb-async) wrapper.
* DuckDB-specific API; not based on the [SQLite Node API](https://www.npmjs.com/package/sqlite3).
* Lossless & efficient support for values of all [DuckDB data types](#docs:stable:sql:data_types:overview).
* Wraps [released DuckDB binaries](https://github.com/duckdb/duckdb/releases) instead of rebuilding DuckDB.
* Built on [DuckDB's C API](#docs:stable:clients:c:overview); exposes more functionality.
##### Roadmap {#docs:stable:clients:node_neo:overview::roadmap}
Some features are not yet complete:
* Binding and appending the MAP and UNION data types
* Appending default values row-by-row
* User-defined types & functions
* Profiling info
* Table description
* APIs for Arrow
See the [issues list on GitHub](https://github.com/duckdb/duckdb-node-neo/issues)
for the most up-to-date roadmap.
##### Supported Platforms {#docs:stable:clients:node_neo:overview::supported-platforms}
* Linux arm64
* Linux x64
* Mac OS X (Darwin) arm64 (Apple Silicon)
* Mac OS X (Darwin) x64 (Intel)
* Windows (Win32) x64
#### Examples {#docs:stable:clients:node_neo:overview::examples}
##### Get Basic Information {#docs:stable:clients:node_neo:overview::get-basic-information}
```ts
import duckdb from '@duckdb/node-api';
console.log(duckdb.version());
console.log(duckdb.configurationOptionDescriptions());
```
##### Connect {#docs:stable:clients:node_neo:overview::connect}
```ts
import { DuckDBConnection } from '@duckdb/node-api';
const connection = await DuckDBConnection.create();
```
This uses the default instance.
For advanced usage, you can create instances explicitly.
##### Create Instance {#docs:stable:clients:node_neo:overview::create-instance}
```ts
import { DuckDBInstance } from '@duckdb/node-api';
```
Create with an in-memory database:
```ts
const instance = await DuckDBInstance.create(':memory:');
```
Equivalent to the above:
```ts
const instance = await DuckDBInstance.create();
```
Read from and write to a database file, which is created if needed:
```ts
const instance = await DuckDBInstance.create('my_duckdb.db');
```
Set [configuration options](#docs:stable:configuration:overview):
```ts
const instance = await DuckDBInstance.create('my_duckdb.db', {
threads: '4'
});
```
##### Instance Cache {#docs:stable:clients:node_neo:overview::instance-cache}
Multiple instances in the same process should not
attach the same database.
To prevent this, an instance cache can be used:
```ts
const instance = await DuckDBInstance.fromCache('my_duckdb.db');
```
This uses the default instance cache. For advanced usage, you can create
instance caches explicitly:
```ts
import { DuckDBInstanceCache } from '@duckdb/node-api';
const cache = new DuckDBInstanceCache();
const instance = await cache.getOrCreateInstance('my_duckdb.db');
```
##### Connect to Instance {#docs:stable:clients:node_neo:overview::connect-to-instance}
```ts
const connection = await instance.connect();
```
##### Disconnect {#docs:stable:clients:node_neo:overview::disconnect}
Connections will be disconnected automatically soon after their reference
is dropped, but you can also disconnect explicitly if and when you want:
```ts
connection.disconnectSync();
```
or, equivalently:
```ts
connection.closeSync();
```
##### Run SQL {#docs:stable:clients:node_neo:overview::run-sql}
```ts
const result = await connection.run('from test_all_types()');
```
##### Parameterize SQL {#docs:stable:clients:node_neo:overview::parameterize-sql}
```ts
const prepared = await connection.prepare('select $1, $2, $3');
prepared.bindVarchar(1, 'duck');
prepared.bindInteger(2, 42);
prepared.bindList(3, listValue([10, 11, 12]), LIST(INTEGER));
const result = await prepared.run();
```
or:
```ts
const prepared = await connection.prepare('select $a, $b, $c');
prepared.bind({
'a': 'duck',
'b': 42,
'c': listValue([10, 11, 12]),
}, {
'a': VARCHAR,
'b': INTEGER,
'c': LIST(INTEGER),
});
const result = await prepared.run();
```
or even:
```ts
const result = await connection.run('select $a, $b, $c', {
'a': 'duck',
'b': 42,
'c': listValue([10, 11, 12]),
}, {
'a': VARCHAR,
'b': INTEGER,
'c': LIST(INTEGER),
});
```
Unspecified types will be inferred:
```ts
const result = await connection.run('select $a, $b, $c', {
'a': 'duck',
'b': 42,
'c': listValue([10, 11, 12]),
});
```
##### Specifying Values {#docs:stable:clients:node_neo:overview::specifying-values}
Values of many data types are represented using one of the JS primitives
`boolean`, `number`, `bigint`, or `string`.
Also, any type can have `null` values.
Values of some data types need to be constructed using special functions.
These are:
| Type | Function |
| ---- | -------- |
| `ARRAY` | `arrayValue` |
| `BIT` | `bitValue` |
| `BLOB` | `blobValue` |
| `DATE` | `dateValue` |
| `DECIMAL` | `decimalValue` |
| `INTERVAL` | `intervalValue` |
| `LIST` | `listValue` |
| `MAP` | `mapValue` |
| `STRUCT` | `structValue` |
| `TIME` | `timeValue` |
| `TIMETZ` | `timeTZValue` |
| `TIMESTAMP` | `timestampValue` |
| `TIMESTAMPTZ` | `timestampTZValue` |
| `TIMESTAMP_S` | `timestampSecondsValue` |
| `TIMESTAMP_MS` | `timestampMillisValue` |
| `TIMESTAMP_NS` | `timestampNanosValue` |
| `UNION` | `unionValue` |
| `UUID` | `uuidValue` |
##### Stream Results {#docs:stable:clients:node_neo:overview::stream-results}
Streaming results evaluate lazily when rows are read.
```ts
const result = await connection.stream('from range(10_000)');
```
##### Inspect Result Metadata {#docs:stable:clients:node_neo:overview::inspect-result-metadata}
Get column names and types:
```ts
const columnNames = result.columnNames();
const columnTypes = result.columnTypes();
```
##### Read Result Data {#docs:stable:clients:node_neo:overview::read-result-data}
Run and read all data:
```ts
const reader = await connection.runAndReadAll('from test_all_types()');
const rows = reader.getRows();
// OR: const columns = reader.getColumns();
```
Stream and read up to (at least) some number of rows:
```ts
const reader = await connection.streamAndReadUntil(
'from range(5000)',
1000
);
const rows = reader.getRows();
// rows.length === 2048. (Rows are read in chunks of 2048.)
```
Read rows incrementally:
```ts
const reader = await connection.streamAndRead('from range(5000)');
reader.readUntil(2000);
// reader.currentRowCount === 2048 (Rows are read in chunks of 2048.)
// reader.done === false
reader.readUntil(4000);
// reader.currentRowCount === 4096
// reader.done === false
reader.readUntil(6000);
// reader.currentRowCount === 5000
// reader.done === true
```
##### Get Result Data {#docs:stable:clients:node_neo:overview::get-result-data}
Result data can be retrieved in a variety of forms:
```ts
const reader = await connection.runAndReadAll(
'from range(3) select range::int as i, 10 + i as n'
);
const rows = reader.getRows();
// [ [0, 10], [1, 11], [2, 12] ]
const rowObjects = reader.getRowObjects();
// [ { i: 0, n: 10 }, { i: 1, n: 11 }, { i: 2, n: 12 } ]
const columns = reader.getColumns();
// [ [0, 1, 2], [10, 11, 12] ]
const columnsObject = reader.getColumnsObject();
// { i: [0, 1, 2], n: [10, 11, 12] }
```
##### Convert Result Data {#docs:stable:clients:node_neo:overview::convert-result-data}
By default, data values that cannot be represented as JS built-ins
are returned as specialized JS objects; see `Inspect Data Values` below.
To retrieve data in a different form, such as JS built-ins or values that
can be losslessly serialized to JSON, use the `JS` or `Json` forms of the
above result data methods.
Custom converters can be supplied as well. See the implementations of
[JSDuckDBValueConverter](https://github.com/duckdb/duckdb-node-neo/blob/main/api/src/JSDuckDBValueConverter.ts)
and [JsonDuckDBValueConverters](https://github.com/duckdb/duckdb-node-neo/blob/main/api/src/JsonDuckDBValueConverter.ts)
for how to do this.
Examples (using the `Json` forms):
```ts
const reader = await connection.runAndReadAll(
'from test_all_types() select bigint, date, interval limit 2'
);
const rows = reader.getRowsJson();
// [
// [
// "-9223372036854775808",
// "5877642-06-25 (BC)",
// { "months": 0, "days": 0, "micros": "0" }
// ],
// [
// "9223372036854775807",
// "5881580-07-10",
// { "months": 999, "days": 999, "micros": "999999999" }
// ]
// ]
const rowObjects = reader.getRowObjectsJson();
// [
// {
// "bigint": "-9223372036854775808",
// "date": "5877642-06-25 (BC)",
// "interval": { "months": 0, "days": 0, "micros": "0" }
// },
// {
// "bigint": "9223372036854775807",
// "date": "5881580-07-10",
// "interval": { "months": 999, "days": 999, "micros": "999999999" }
// }
// ]
const columns = reader.getColumnsJson();
// [
// [ "-9223372036854775808", "9223372036854775807" ],
// [ "5877642-06-25 (BC)", "5881580-07-10" ],
// [
// { "months": 0, "days": 0, "micros": "0" },
// { "months": 999, "days": 999, "micros": "999999999" }
// ]
// ]
const columnsObject = reader.getColumnsObjectJson();
// {
// "bigint": [ "-9223372036854775808", "9223372036854775807" ],
// "date": [ "5877642-06-25 (BC)", "5881580-07-10" ],
// "interval": [
// { "months": 0, "days": 0, "micros": "0" },
// { "months": 999, "days": 999, "micros": "999999999" }
// ]
// }
```
These methods handle nested types as well:
```ts
const reader = await connection.runAndReadAll(
'from test_all_types() select int_array, struct, map, "union" limit 2'
);
const rows = reader.getRowsJson();
// [
// [
// [],
// { "a": null, "b": null },
// [],
// { "tag": "name", "value": "Frank" }
// ],
// [
// [ 42, 999, null, null, -42],
// { "a": 42, "b": "ð¦ð¦ð¦ð¦ð¦ð¦" },
// [
// { "key": "key1", "value": "ð¦ð¦ð¦ð¦ð¦ð¦" },
// { "key": "key2", "value": "goose" }
// ],
// { "tag": "age", "value": 5 }
// ]
// ]
const rowObjects = reader.getRowObjectsJson();
// [
// {
// "int_array": [],
// "struct": { "a": null, "b": null },
// "map": [],
// "union": { "tag": "name", "value": "Frank" }
// },
// {
// "int_array": [ 42, 999, null, null, -42 ],
// "struct": { "a": 42, "b": "ð¦ð¦ð¦ð¦ð¦ð¦" },
// "map": [
// { "key": "key1", "value": "ð¦ð¦ð¦ð¦ð¦ð¦" },
// { "key": "key2", "value": "goose" }
// ],
// "union": { "tag": "age", "value": 5 }
// }
// ]
const columns = reader.getColumnsJson();
// [
// [
// [],
// [42, 999, null, null, -42]
// ],
// [
// { "a": null, "b": null },
// { "a": 42, "b": "ð¦ð¦ð¦ð¦ð¦ð¦" }
// ],
// [
// [],
// [
// { "key": "key1", "value": "ð¦ð¦ð¦ð¦ð¦ð¦" },
// { "key": "key2", "value": "goose"}
// ]
// ],
// [
// { "tag": "name", "value": "Frank" },
// { "tag": "age", "value": 5 }
// ]
// ]
const columnsObject = reader.getColumnsObjectJson();
// {
// "int_array": [
// [],
// [42, 999, null, null, -42]
// ],
// "struct": [
// { "a": null, "b": null },
// { "a": 42, "b": "ð¦ð¦ð¦ð¦ð¦ð¦" }
// ],
// "map": [
// [],
// [
// { "key": "key1", "value": "ð¦ð¦ð¦ð¦ð¦ð¦" },
// { "key": "key2", "value": "goose" }
// ]
// ],
// "union": [
// { "tag": "name", "value": "Frank" },
// { "tag": "age", "value": 5 }
// ]
// }
```
Column names and types can also be serialized to JSON:
```ts
const columnNamesAndTypes = reader.columnNamesAndTypesJson();
// {
// "columnNames": [
// "int_array",
// "struct",
// "map",
// "union"
// ],
// "columnTypes": [
// {
// "typeId": 24,
// "valueType": {
// "typeId": 4
// }
// },
// {
// "typeId": 25,
// "entryNames": [
// "a",
// "b"
// ],
// "entryTypes": [
// {
// "typeId": 4
// },
// {
// "typeId": 17
// }
// ]
// },
// {
// "typeId": 26,
// "keyType": {
// "typeId": 17
// },
// "valueType": {
// "typeId": 17
// }
// },
// {
// "typeId": 28,
// "memberTags": [
// "name",
// "age"
// ],
// "memberTypes": [
// {
// "typeId": 17
// },
// {
// "typeId": 3
// }
// ]
// }
// ]
// }
const columnNameAndTypeObjects = reader.columnNameAndTypeObjectsJson();
// [
// {
// "columnName": "int_array",
// "columnType": {
// "typeId": 24,
// "valueType": {
// "typeId": 4
// }
// }
// },
// {
// "columnName": "struct",
// "columnType": {
// "typeId": 25,
// "entryNames": [
// "a",
// "b"
// ],
// "entryTypes": [
// {
// "typeId": 4
// },
// {
// "typeId": 17
// }
// ]
// }
// },
// {
// "columnName": "map",
// "columnType": {
// "typeId": 26,
// "keyType": {
// "typeId": 17
// },
// "valueType": {
// "typeId": 17
// }
// }
// },
// {
// "columnName": "union",
// "columnType": {
// "typeId": 28,
// "memberTags": [
// "name",
// "age"
// ],
// "memberTypes": [
// {
// "typeId": 17
// },
// {
// "typeId": 3
// }
// ]
// }
// }
// ]
```
##### Fetch Chunks {#docs:stable:clients:node_neo:overview::fetch-chunks}
Fetch all chunks:
```ts
const chunks = await result.fetchAllChunks();
```
Fetch one chunk at a time:
```ts
const chunks = [];
while (true) {
const chunk = await result.fetchChunk();
// Last chunk will have zero rows.
if (chunk.rowCount === 0) {
break;
}
chunks.push(chunk);
}
```
For materialized (non-streaming) results, chunks can be read by index:
```ts
const rowCount = result.rowCount;
const chunkCount = result.chunkCount;
for (let i = 0; i < chunkCount; i++) {
const chunk = result.getChunk(i);
// ...
}
```
Get chunk data:
```ts
const rows = chunk.getRows();
const rowObjects = chunk.getRowObjects(result.deduplicatedColumnNames());
const columns = chunk.getColumns();
const columnsObject =
chunk.getColumnsObject(result.deduplicatedColumnNames());
```
Get chunk data (one value at a time):
```ts
const columns = [];
const columnCount = chunk.columnCount;
for (let columnIndex = 0; columnIndex < columnCount; columnIndex++) {
const columnValues = [];
const columnVector = chunk.getColumnVector(columnIndex);
const itemCount = columnVector.itemCount;
for (let itemIndex = 0; itemIndex < itemCount; itemIndex++) {
const value = columnVector.getItem(itemIndex);
columnValues.push(value);
}
columns.push(columnValues);
}
```
##### Inspect Data Types {#docs:stable:clients:node_neo:overview::inspect-data-types}
```ts
import { DuckDBTypeId } from '@duckdb/node-api';
if (columnType.typeId === DuckDBTypeId.ARRAY) {
const arrayValueType = columnType.valueType;
const arrayLength = columnType.length;
}
if (columnType.typeId === DuckDBTypeId.DECIMAL) {
const decimalWidth = columnType.width;
const decimalScale = columnType.scale;
}
if (columnType.typeId === DuckDBTypeId.ENUM) {
const enumValues = columnType.values;
}
if (columnType.typeId === DuckDBTypeId.LIST) {
const listValueType = columnType.valueType;
}
if (columnType.typeId === DuckDBTypeId.MAP) {
const mapKeyType = columnType.keyType;
const mapValueType = columnType.valueType;
}
if (columnType.typeId === DuckDBTypeId.STRUCT) {
const structEntryNames = columnType.names;
const structEntryTypes = columnType.valueTypes;
}
if (columnType.typeId === DuckDBTypeId.UNION) {
const unionMemberTags = columnType.memberTags;
const unionMemberTypes = columnType.memberTypes;
}
// For the JSON type (https://duckdb.org/docs/data/json/json_type)
if (columnType.alias === 'JSON') {
const json = JSON.parse(columnValue);
}
```
Every type implements `toString`.
The result is both human-friendly and readable by DuckDB in an appropriate expression.
```ts
const typeString = columnType.toString();
```
##### Inspect Data Values {#docs:stable:clients:node_neo:overview::inspect-data-values}
```ts
import { DuckDBTypeId } from '@duckdb/node-api';
if (columnType.typeId === DuckDBTypeId.ARRAY) {
const arrayItems = columnValue.items; // array of values
const arrayString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.BIT) {
const bools = columnValue.toBools(); // array of booleans
const bits = columnValue.toBits(); // array of 0s and 1s
const bitString = columnValue.toString(); // string of '0's and '1's
}
if (columnType.typeId === DuckDBTypeId.BLOB) {
const blobBytes = columnValue.bytes; // Uint8Array
const blobString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.DATE) {
const dateDays = columnValue.days;
const dateString = columnValue.toString();
const { year, month, day } = columnValue.toParts();
}
if (columnType.typeId === DuckDBTypeId.DECIMAL) {
const decimalWidth = columnValue.width;
const decimalScale = columnValue.scale;
// Scaled-up value. Represented number is value/(10^scale).
const decimalValue = columnValue.value; // bigint
const decimalString = columnValue.toString();
const decimalDouble = columnValue.toDouble();
}
if (columnType.typeId === DuckDBTypeId.INTERVAL) {
const intervalMonths = columnValue.months;
const intervalDays = columnValue.days;
const intervalMicros = columnValue.micros; // bigint
const intervalString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.LIST) {
const listItems = columnValue.items; // array of values
const listString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.MAP) {
const mapEntries = columnValue.entries; // array of { key, value }
const mapString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.STRUCT) {
// { name1: value1, name2: value2, ... }
const structEntries = columnValue.entries;
const structString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.TIMESTAMP_MS) {
const timestampMillis = columnValue.milliseconds; // bigint
const timestampMillisString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.TIMESTAMP_NS) {
const timestampNanos = columnValue.nanoseconds; // bigint
const timestampNanosString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.TIMESTAMP_S) {
const timestampSecs = columnValue.seconds; // bigint
const timestampSecsString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.TIMESTAMP_TZ) {
const timestampTZMicros = columnValue.micros; // bigint
const timestampTZString = columnValue.toString();
const {
date: { year, month, day },
time: { hour, min, sec, micros },
} = columnValue.toParts();
}
if (columnType.typeId === DuckDBTypeId.TIMESTAMP) {
const timestampMicros = columnValue.micros; // bigint
const timestampString = columnValue.toString();
const {
date: { year, month, day },
time: { hour, min, sec, micros },
} = columnValue.toParts();
}
if (columnType.typeId === DuckDBTypeId.TIME_TZ) {
const timeTZMicros = columnValue.micros; // bigint
const timeTZOffset = columnValue.offset;
const timeTZString = columnValue.toString();
const {
time: { hour, min, sec, micros },
offset,
} = columnValue.toParts();
}
if (columnType.typeId === DuckDBTypeId.TIME) {
const timeMicros = columnValue.micros; // bigint
const timeString = columnValue.toString();
const { hour, min, sec, micros } = columnValue.toParts();
}
if (columnType.typeId === DuckDBTypeId.UNION) {
const unionTag = columnValue.tag;
const unionValue = columnValue.value;
const unionValueString = columnValue.toString();
}
if (columnType.typeId === DuckDBTypeId.UUID) {
const uuidHugeint = columnValue.hugeint; // bigint
const uuidString = columnValue.toString();
}
// other possible values are: null, boolean, number, bigint, or string
```
##### Displaying Timezones {#docs:stable:clients:node_neo:overview::displaying-timezones}
Converting a TIMESTAMP_TZ value to a string depends on a timezone offset.
By default, this is set to the offset for the local timezone when the Node
process is started.
To change it, set the `timezoneOffsetInMinutes`
property of `DuckDBTimestampTZValue`:
```ts
DuckDBTimestampTZValue.timezoneOffsetInMinutes = -8 * 60;
const pst = DuckDBTimestampTZValue.Epoch.toString();
// 1969-12-31 16:00:00-08
DuckDBTimestampTZValue.timezoneOffsetInMinutes = +1 * 60;
const cet = DuckDBTimestampTZValue.Epoch.toString();
// 1970-01-01 01:00:00+01
```
Note that the timezone offset used for this string
conversion is distinct from the `TimeZone` setting of DuckDB.
The following sets this offset to match the `TimeZone` setting of DuckDB:
```ts
const reader = await connection.runAndReadAll(
`select (timezone(current_timestamp) / 60)::int`
);
DuckDBTimestampTZValue.timezoneOffsetInMinutes =
reader.getColumns()[0][0];
```
##### Append To Table {#docs:stable:clients:node_neo:overview::append-to-table}
```ts
await connection.run(
`create or replace table target_table(i integer, v varchar)`
);
const appender = await connection.createAppender('target_table');
appender.appendInteger(42);
appender.appendVarchar('duck');
appender.endRow();
appender.appendInteger(123);
appender.appendVarchar('mallard');
appender.endRow();
appender.flushSync();
appender.appendInteger(17);
appender.appendVarchar('goose');
appender.endRow();
appender.closeSync(); // also flushes
```
##### Append Data Chunk {#docs:stable:clients:node_neo:overview::append-data-chunk}
```ts
await connection.run(
`create or replace table target_table(i integer, v varchar)`
);
const appender = await connection.createAppender('target_table');
const chunk = DuckDBDataChunk.create([INTEGER, VARCHAR]);
chunk.setColumns([
[42, 123, 17],
['duck', 'mallard', 'goose'],
]);
// OR:
// chunk.setRows([
// [42, 'duck'],
// [123, 'mallard'],
// [17, 'goose'],
// ]);
appender.appendDataChunk(chunk);
appender.flushSync();
```
See "Specifying Values" above for how to supply values to the appender.
##### Extract Statements {#docs:stable:clients:node_neo:overview::extract-statements}
```ts
const extractedStatements = await connection.extractStatements(`
create or replace table numbers as from range(?);
from numbers where range < ?;
drop table numbers;
`);
const parameterValues = [10, 7];
const statementCount = extractedStatements.count;
for (let stmtIndex = 0; stmtIndex < statementCount; stmtIndex++) {
const prepared = await extractedStatements.prepare(stmtIndex);
let parameterCount = prepared.parameterCount;
for (let paramIndex = 1; paramIndex <= parameterCount; paramIndex++) {
prepared.bindInteger(paramIndex, parameterValues.shift());
}
const result = await prepared.run();
// ...
}
```
##### Control Evaluation of Tasks {#docs:stable:clients:node_neo:overview::control-evaluation-of-tasks}
```ts
import { DuckDBPendingResultState } from '@duckdb/node-api';
async function sleep(ms) {
return new Promise((resolve) => {
setTimeout(resolve, ms);
});
}
const prepared = await connection.prepare('from range(10_000_000)');
const pending = prepared.start();
while (pending.runTask() !== DuckDBPendingResultState.RESULT_READY) {
console.log('not ready');
await sleep(1);
}
console.log('ready');
const result = await pending.getResult();
// ...
```
##### Ways to Run SQL {#docs:stable:clients:node_neo:overview::ways-to-run-sql}
```ts
// Run to completion but don't yet retrieve any rows.
// Optionally take values to bind to SQL parameters,
// and (optionally) types of those parameters,
// either as an array (for positional parameters),
// or an object keyed by parameter name.
const result = await connection.run(sql);
const result = await connection.run(sql, values);
const result = await connection.run(sql, values, types);
// Run to completion but don't yet retrieve any rows.
// Wrap in a DuckDBDataReader for convenient data retrieval.
const reader = await connection.runAndRead(sql);
const reader = await connection.runAndRead(sql, values);
const reader = await connection.runAndRead(sql, values, types);
// Run to completion, wrap in a reader, and read all rows.
const reader = await connection.runAndReadAll(sql);
const reader = await connection.runAndReadAll(sql, values);
const reader = await connection.runAndReadAll(sql, values, types);
// Run to completion, wrap in a reader, and read at least
// the given number of rows. (Rows are read in chunks, so more than
// the target may be read.)
const reader = await connection.runAndReadUntil(sql, targetRowCount);
const reader =
  await connection.runAndReadUntil(sql, targetRowCount, values);
const reader =
  await connection.runAndReadUntil(sql, targetRowCount, values, types);
// Create a streaming result and don't yet retrieve any rows.
const result = await connection.stream(sql);
const result = await connection.stream(sql, values);
const result = await connection.stream(sql, values, types);
// Create a streaming result and don't yet retrieve any rows.
// Wrap in a DuckDBDataReader for convenient data retrieval.
const reader = await connection.streamAndRead(sql);
const reader = await connection.streamAndRead(sql, values);
const reader = await connection.streamAndRead(sql, values, types);
// Create a streaming result, wrap in a reader, and read all rows.
const reader = await connection.streamAndReadAll(sql);
const reader = await connection.streamAndReadAll(sql, values);
const reader = await connection.streamAndReadAll(sql, values, types);
// Create a streaming result, wrap in a reader, and read at least
// the given number of rows.
const reader = await connection.streamAndReadUntil(sql, targetRowCount);
const reader =
await connection.streamAndReadUntil(sql, targetRowCount, values);
const reader =
await connection.streamAndReadUntil(sql, targetRowCount, values, types);
// Prepared Statements
// Prepare a possibly-parametered SQL statement to run later.
const prepared = await connection.prepare(sql);
// Bind values to the parameters.
prepared.bind(values);
prepared.bind(values, types);
// Run the prepared statement. These mirror the methods on the connection.
const result = prepared.run();
const reader = prepared.runAndRead();
const reader = prepared.runAndReadAll();
const reader = prepared.runAndReadUntil(targetRowCount);
const result = prepared.stream();
const reader = prepared.streamAndRead();
const reader = prepared.streamAndReadAll();
const reader = prepared.streamAndReadUntil(targetRowCount);
// Pending Results
// Create a pending result.
const pending = await connection.start(sql);
const pending = await connection.start(sql, values);
const pending = await connection.start(sql, values, types);
// Create a pending, streaming result.
const pending = await connection.startStream(sql);
const pending = await connection.startStream(sql, values);
const pending = await connection.startStream(sql, values, types);
// Create a pending result from a prepared statement.
const pending = await prepared.start();
const pending = await prepared.startStream();
while (pending.runTask() !== DuckDBPendingResultState.RESULT_READY) {
// optionally sleep or do other work between tasks
}
// Retrieve the result. If not yet READY, will run until it is.
const result = await pending.getResult();
const reader = await pending.read();
const reader = await pending.readAll();
const reader = await pending.readUntil(targetRowCount);
```
##### Ways to Get Result Data {#docs:stable:clients:node_neo:overview::ways-to-get-result-data}
```ts
// From a result
// Asynchronously retrieve data for all rows:
const columns = await result.getColumns();
const columnsJson = await result.getColumnsJson();
const columnsObject = await result.getColumnsObject();
const columnsObjectJson = await result.getColumnsObjectJson();
const rows = await result.getRows();
const rowsJson = await result.getRowsJson();
const rowObjects = await result.getRowObjects();
const rowObjectsJson = await result.getRowObjectsJson();
// From a reader
// First, (asynchronously) read some rows:
await reader.readAll();
// or:
await reader.readUntil(targetRowCount);
// Then, (synchronously) get result data for the rows read:
const columns = reader.getColumns();
const columnsJson = reader.getColumnsJson();
const columnsObject = reader.getColumnsObject();
const columnsObjectJson = reader.getColumnsObjectJson();
const rows = reader.getRows();
const rowsJson = reader.getRowsJson();
const rowObjects = reader.getRowObjects();
const rowObjectsJson = reader.getRowObjectsJson();
// Individual values can also be read directly:
const value = reader.value(columnIndex, rowIndex);
// Using chunks
// If desired, one or more chunks can be fetched from a result:
const chunk = await result.fetchChunk();
const chunks = await result.fetchAllChunks();
// And then data can be retrieved from each chunk:
const columnValues = chunk.getColumnValues(columnIndex);
const columns = chunk.getColumns();
const rowValues = chunk.getRowValues(rowIndex);
const rows = chunk.getRows();
// Or, values can be visited:
chunk.visitColumnValues(columnIndex,
(value, rowIndex, columnIndex, type) => { /* ... */ }
);
chunk.visitColumns((column, columnIndex, type) => { /* ... */ });
chunk.visitColumnMajor(
(value, rowIndex, columnIndex, type) => { /* ... */ }
);
chunk.visitRowValues(rowIndex,
(value, rowIndex, columnIndex, type) => { /* ... */ }
);
chunk.visitRows((row, rowIndex) => { /* ... */ });
chunk.visitRowMajor(
(value, rowIndex, columnIndex, type) => { /* ... */ }
);
// Or converted:
// The `converter` argument implements `DuckDBValueConverter`,
// which has the single method convertValue(value, type).
const columnValues = chunk.convertColumnValues(columnIndex, converter);
const columns = chunk.convertColumns(converter);
const rowValues = chunk.convertRowValues(rowIndex, converter);
const rows = chunk.convertRows(converter);
// The reader abstracts these low-level chunk manipulations
// and is recommended for most cases.
```
## ODBC {#clients:odbc}
### ODBC API Overview {#docs:stable:clients:odbc:overview}
> The latest stable version of the DuckDB ODBC client is 1.4.1.0.
ODBC (Open Database Connectivity) is a C-style API that provides access to different flavors of database management systems (DBMSs).
The ODBC API consists of the Driver Manager (DM) and the ODBC drivers.
The Driver Manager is part of the system library, e.g., unixODBC, which manages the communications between the user applications and the ODBC drivers.
Typically, applications are linked against the DM, which uses the Data Source Name (DSN) to look up the correct ODBC driver.
The ODBC driver is a DBMS implementation of the ODBC API, which handles all the internals of that DBMS.
The DM maps user application calls of ODBC functions to the correct ODBC driver that performs the specified function and returns the proper values.
#### DuckDB ODBC Driver {#docs:stable:clients:odbc:overview::duckdb-odbc-driver}
DuckDB supports ODBC version 3.0 according to the [Core Interface Conformance](https://docs.microsoft.com/en-us/sql/odbc/reference/develop-app/core-interface-conformance?view=sql-server-ver15).
The ODBC driver is available for all operating systems. Visit the [installation page](https://duckdb.org/install) for direct links.
### ODBC API on Linux {#docs:stable:clients:odbc:linux}
#### Driver Manager {#docs:stable:clients:odbc:linux::driver-manager}
A driver manager is required to manage communication between applications and the ODBC driver.
We tested and support `unixODBC`, which is a complete ODBC driver manager for Linux.
Users can install it from the command line.
On Debian-based distributions (Ubuntu, Mint, etc.), run:
```batch
sudo apt-get install unixodbc odbcinst
```
On Fedora-based distributions (Amazon Linux, RHEL, CentOS, etc.), run:
```batch
sudo yum install unixODBC
```
#### Setting Up the Driver {#docs:stable:clients:odbc:linux::setting-up-the-driver}
1. Download the ODBC Linux Asset corresponding to your architecture:
* [x86_64 (AMD64)](https://github.com/duckdb/duckdb-odbc/releases/download/v1.4.1.0/duckdb_odbc-linux-amd64.zip)
* [arm64 (AArch64)](https://github.com/duckdb/duckdb-odbc/releases/download/v1.4.1.0/duckdb_odbc-linux-aarch64.zip)
2. The package contains the following files:
* `libduckdb_odbc.so`: the DuckDB driver.
* `unixodbc_setup.sh`: a setup script to aid the configuration on Linux.
To extract them, run:
```batch
mkdir duckdb_odbc && unzip duckdb_odbc-linux-amd64.zip -d duckdb_odbc
```
3. The `unixodbc_setup.sh` script performs the configuration of the DuckDB ODBC Driver. It is based on the unixODBC package, which provides commands such as `odbcinst` and `isql` to handle the ODBC setup and testing.
Run the following commands with either option `-u` or `-s` to configure DuckDB ODBC.
The `-u` option uses the user's home directory to set up the ODBC init files.
```batch
./unixodbc_setup.sh -u
```
The `-s` option changes the system-level files that are visible to all users; because of that, it requires root privileges.
```batch
sudo ./unixodbc_setup.sh -s
```
The `--help` option prints the usage of `unixodbc_setup.sh`.
```batch
./unixodbc_setup.sh --help
```
```text
Usage: ./unixodbc_setup.sh [options]
Example: ./unixodbc_setup.sh -u -db ~/database_path -D ~/driver_path/libduckdb_odbc.so
Level:
-s: System-level, using 'sudo' to configure DuckDB ODBC at the system-level, changing the files: /etc/odbc[inst].ini
-u: User-level, configuring the DuckDB ODBC at the user-level, changing the files: ~/.odbc[inst].ini.
Options:
-db database_path: the DuckDB database file path, the default is ':memory:' if not provided.
-D driver_path: the driver file path (i.e., the path for libduckdb_odbc.so), the default is using the base script directory
```
4. The ODBC setup on Linux is based on the `.odbc.ini` and `.odbcinst.ini` files.
These files can be placed in the user's home directory (`/home/⟨username⟩`) or in the system `/etc` directory.
The Driver Manager prioritizes the user configuration files over the system files.
For the details of the configuration parameters, see the [ODBC configuration page](#docs:stable:clients:odbc:configuration).
### ODBC API on Windows {#docs:stable:clients:odbc:windows}
Using the DuckDB ODBC API on Windows requires the following steps:
1. Microsoft Windows requires an ODBC Driver Manager to manage communication between applications and the ODBC drivers.
The Driver Manager on Windows is provided in the DLL file `odbccp32.dll`, along with other files and tools.
For detailed information check out the [Common ODBC Component Files](https://docs.microsoft.com/en-us/previous-versions/windows/desktop/odbc/dn170563(v=vs.85)).
2. DuckDB releases the ODBC driver as an asset. For Windows, download it from the [Windows ODBC asset (x86_64/AMD64)](https://github.com/duckdb/duckdb-odbc/releases/download/v1.4.1.0/duckdb_odbc-windows-amd64.zip).
3. The archive contains the following artifacts:
* `duckdb_odbc.dll`: the DuckDB driver compiled for Windows.
* `duckdb_odbc_setup.dll`: a setup DLL used by the Windows ODBC Data Source Administrator tool.
* `odbc_install.exe`: an installation script to aid the configuration on Windows.
Decompress the archive to a directory (e.g., `duckdb_odbc`).
4. The `odbc_install.exe` binary performs the configuration of the DuckDB ODBC Driver on Windows. It depends on the `Odbccp32.dll` that provides functions to configure the ODBC registry entries.
Inside the permanent directory (e.g., `duckdb_odbc`), double-click `odbc_install.exe`.
Windows administrator privileges are required; non-administrator users will see a User Account Control prompt.
5. `odbc_install.exe` adds a default DSN configuration into the ODBC registries with a default database `:memory:`.
##### DSN Windows Setup {#docs:stable:clients:odbc:windows::dsn-windows-setup}
After the installation, it is possible to change the default DSN configuration or add a new one using the Windows ODBC Data Source Administrator tool `odbcad32.exe`.
It can also be launched through the Windows Start menu.
##### Default DuckDB DSN {#docs:stable:clients:odbc:windows::default-duckdb-dsn}
The newly installed DSN is visible on the ***System DSN*** tab in the Windows ODBC Data Source Administrator tool.
##### Changing DuckDB DSN {#docs:stable:clients:odbc:windows::changing-duckdb-dsn}
When selecting the default DSN (i.e., `DuckDB`) or adding a new configuration, a setup window is displayed.
This window allows you to set the DSN and the database file path associated with that DSN.
#### More Detailed Windows Setup {#docs:stable:clients:odbc:windows::more-detailed-windows-setup}
There are two ways to configure the ODBC driver, either by altering the registry keys as detailed below,
or by connecting with [`SQLDriverConnect`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqldriverconnect-function?view=sql-server-ver16).
A combination of the two is also possible.
Furthermore, the ODBC driver supports all the [configuration options](#docs:stable:configuration:overview)
included in DuckDB.
> If a configuration is set in both the connection string passed to `SQLDriverConnect` and in the `odbc.ini` file,
> the one passed to `SQLDriverConnect` will take precedence.
For the details of the configuration parameters, see the [ODBC configuration page](#docs:stable:clients:odbc:configuration).
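For illustration only (the parameter names below simply mirror the `odbc.ini` example on the configuration page, and the database path is hypothetical), a `SQLDriverConnect` connection string passes the same settings as semicolon-separated `key=value` pairs:
```text
Driver=DuckDB Driver;Database=C:\data\my_database.duckdb;access_mode=read_only
```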
##### Registry Keys {#docs:stable:clients:odbc:windows::registry-keys}
The ODBC setup on Windows is based on registry keys (see [Registry Entries for ODBC Components](https://docs.microsoft.com/en-us/sql/odbc/reference/install/registry-entries-for-odbc-components?view=sql-server-ver15)).
The ODBC entries can be placed at the current user registry key (`HKCU`) or the system registry key (`HKLM`).
We have tested and used the system entries based on `HKLM->SOFTWARE->ODBC`.
The `odbc_install.exe` installer changes this entry, which has two subkeys: `ODBC.INI` and `ODBCINST.INI`.
The `ODBC.INI` subkey is where users usually insert DSN registry entries for the drivers.
For example, the DSN registry entries for DuckDB are located under the `DuckDB` subkey of `ODBC.INI`.
The `ODBCINST.INI` contains one entry for each ODBC driver and other keys predefined for [Windows ODBC configuration](https://docs.microsoft.com/en-us/sql/odbc/reference/install/registry-entries-for-odbc-components?view=sql-server-ver15).
##### Updating the ODBC Driver {#docs:stable:clients:odbc:windows::updating-the-odbc-driver}
When a new version of the ODBC driver is released, installing the new version will overwrite the existing one.
However, the installer doesn't always update the version number in the registry.
To ensure the correct version is used,
check that `HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBCINST.INI\DuckDB Driver` has the most recent version,
and `HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\DuckDB\Driver` has the correct path to the new driver.
### ODBC API on macOS {#docs:stable:clients:odbc:macos}
1. A driver manager is required to manage communication between applications and the ODBC driver. DuckDB supports `unixODBC`, which is a complete ODBC driver manager for macOS and Linux. Users can install it from the command line via [Homebrew](https://brew.sh/):
```batch
brew install unixodbc
```
2. DuckDB releases a universal [ODBC driver for macOS](https://github.com/duckdb/duckdb-odbc/releases/download/v1.4.1.0/duckdb_odbc-osx-universal.zip) (supporting both Intel and Apple Silicon CPUs). To download it, run:
```batch
wget https://github.com/duckdb/duckdb-odbc/releases/download/v1.4.1.0/duckdb_odbc-osx-universal.zip
```
3. The archive contains the `libduckdb_odbc.dylib` artifact. To extract it to a directory, run:
```batch
mkdir duckdb_odbc && unzip duckdb_odbc-osx-universal.zip -d duckdb_odbc
```
4. There are two ways to configure the ODBC driver, either by initializing via the configuration files, or by connecting with [`SQLDriverConnect`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqldriverconnect-function?view=sql-server-ver16).
A combination of the two is also possible.
Furthermore, the ODBC driver supports all the [configuration options](#docs:stable:configuration:overview) included in DuckDB.
> If a configuration is set in both the connection string passed to `SQLDriverConnect` and in the `odbc.ini` file,
> the one passed to `SQLDriverConnect` will take precedence.
For the details of the configuration parameters, see the [ODBC configuration page](#docs:stable:clients:odbc:configuration).
5. After the configuration, you can validate the installation using an ODBC client. unixODBC provides a command line tool called `isql`.
Use the DSN defined in `odbc.ini` as a parameter of `isql`.
```batch
isql DuckDB
```
```text
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| echo [string] |
| quit |
| |
+---------------------------------------+
```
```sql
SQL> SELECT 42;
```
```text
+------------+
| 42 |
+------------+
| 42 |
+------------+
SQLRowCount returns -1
1 rows fetched
```
### ODBC Configuration {#docs:stable:clients:odbc:configuration}
This page documents the files used for the ODBC configuration, [`odbc.ini`](#::odbcini-and-odbcini) and [`odbcinst.ini`](#::odbcinstini-and-odbcinstini).
These are either placed in the home directory as dotfiles (`.odbc.ini` and `.odbcinst.ini`, respectively) or in a system directory.
For platform-specific details, see the pages for [Linux](#docs:stable:clients:odbc:linux), [macOS](#docs:stable:clients:odbc:macos), and [Windows](#docs:stable:clients:odbc:windows).
#### `odbc.ini` and `.odbc.ini` {#docs:stable:clients:odbc:configuration::odbcini-and-odbcini}
The `odbc.ini` file contains the DSNs for the drivers, which can have specific knobs.
An example of `odbc.ini` with DuckDB:
```ini
[DuckDB]
Driver = DuckDB Driver
Database = :memory:
access_mode = read_only
```
The lines correspond to the following parameters:
* `[DuckDB]`: the name between the brackets is the DSN for DuckDB.
* `Driver`: Describes the driver's name, as well as where to find the configurations in the `odbcinst.ini`.
* `Database`: Describes the database name used by DuckDB; it can also be a file path to a `.db` file in the system.
* `access_mode`: The mode in which to connect to the database.
#### `odbcinst.ini` and `.odbcinst.ini` {#docs:stable:clients:odbc:configuration::odbcinstini-and-odbcinstini}
The `odbcinst.ini` file contains general configurations for the ODBC installed drivers in the system.
A driver section starts with the driver name between brackets, followed by the configuration knobs specific to that driver.
An example of `odbcinst.ini` with DuckDB:
```ini
[ODBC]
Trace = yes
TraceFile = /tmp/odbctrace
[DuckDB Driver]
Driver = /path/to/libduckdb_odbc.dylib
```
The lines correspond to the following parameters:
* `[ODBC]`: The DM configuration section.
* `Trace`: Enables the ODBC trace file using the option `yes`.
* `TraceFile`: The absolute system file path for the ODBC trace file.
* `[DuckDB Driver]`: The section of the DuckDB installed driver.
* `Driver`: The absolute system file path of the DuckDB driver. Change to match your configuration.
## PHP Client {#docs:stable:clients:php}
> The DuckDB PHP client is a [tertiary client](#docs:stable:clients:overview) and is maintained by a third party.
A client API for PHP.
Focused on performance, it uses the official C API internally through [FFI](https://www.php.net/manual/en/book.ffi.php), achieving good benchmark results.
This library is more than just a wrapper for the C API; it introduces custom, PHP-friendly methods to simplify working with DuckDB. It is compatible with Linux, Windows, and macOS, requiring PHP version 8.3 or higher.
#### Install {#docs:stable:clients:php::install}
```batch
composer require satur.io/duckdb
```
#### Documentation {#docs:stable:clients:php::documentation}
Full documentation is available at [https://duckdb-php.readthedocs.io/](https://duckdb-php.readthedocs.io/).
#### Quick Start {#docs:stable:clients:php::quick-start}
```php
DuckDB::sql("SELECT 'quack' as my_column")->print();
```
```text
-------------------
| my_column |
-------------------
| quack |
-------------------
```
The function we used here, `DuckDB::sql()`, performs the query in a new
in-memory database, which is destroyed after the result is retrieved.
This is not the most common use case; let's see how to get a persistent connection.
##### Connection {#docs:stable:clients:php::connection}
```php
$duckDB = DuckDB::create('duck.db'); // or DuckDB::create() for in-memory database
$duckDB->query('CREATE TABLE test (i INTEGER, b BOOL, f FLOAT);');
$duckDB->query('INSERT INTO test VALUES (3, true, 1.1), (5, true, 1.2), (3, false, 1.1), (3, null, 1.2);');
$duckDB->query('SELECT * FROM test')->print();
```
As you probably guessed, `DuckDB::create()` establishes a connection to the specified database,
creating the database first if it does not exist yet.
After that, we can use the `query` function to run our queries.
> Notice the difference between the static method `sql` and the non-static method `query`.
> While the first one always creates and destroys a new in-memory database, the second one
> uses a previously established connection and should be the preferred option in most cases.
In addition, the library also provides prepared statements for binding parameters to our query.
##### Prepared Statements {#docs:stable:clients:php::prepared-statements}
```php
$duckDB = DuckDB::create();
$duckDB->query('CREATE TABLE test (i INTEGER, b BOOL, f FLOAT);');
$duckDB->query('INSERT INTO test VALUES (3, true, 1.1), (5, true, 1.2), (3, false, 1.1), (3, null, 1.2);');
$boolPreparedStatement = $duckDB->preparedStatement('SELECT * FROM test WHERE b = $1');
$boolPreparedStatement->bindParam(1, true);
$result = $boolPreparedStatement->execute();
$result->print();
$intPreparedStatement = $duckDB->preparedStatement('SELECT * FROM test WHERE i = ?');
$intPreparedStatement->bindParam(1, 3);
$result = $intPreparedStatement->execute();
$result->print();
```
##### Appenders {#docs:stable:clients:php::appenders}
Appenders are the preferred method to load data into DuckDB. See the [Appender page](#docs:stable:clients:c:appender)
for more information.
```php
$duckDB = DuckDB::create();
$result = $duckDB->query('CREATE TABLE people (id INTEGER, name VARCHAR);');
$appender = $duckDB->appender('people');
for ($i = 0; $i < 100; ++$i) {
$appender->append(rand(1, 100000));
$appender->append('string-'.rand(1, 100));
$appender->endRow();
}
$appender->flush();
```
##### DuckDB-Powerful {#docs:stable:clients:php::duckdb-powerful}
DuckDB provides some amazing features. For example,
you can query remote files directly.
Let's use an aggregate function to calculate the average of a column
in a remote Parquet file:
```php
DuckDB::sql(
'SELECT "Reporting Year", avg("Gas Produced, MCF") as "AVG Gas Produced"
FROM "https://github.com/plotly/datasets/raw/refs/heads/master/oil-and-gas.parquet"
WHERE "Reporting Year" BETWEEN 1985 AND 1990
GROUP BY "Reporting Year";'
)->print();
```
```text
--------------------------------------
| Reporting Year | AVG Gas Produce |
--------------------------------------
| 1985 | 2461.4047344111 |
| 1986 | 6060.8575605681 |
| 1987 | 5047.5813074014 |
| 1988 | 4763.4090541633 |
| 1989 | 4175.2989758837 |
| 1990 | 3706.9404742437 |
--------------------------------------
```
Or summarize a remote CSV file:
```php
DuckDB::sql('SUMMARIZE TABLE "https://blobs.duckdb.org/data/Star_Trek-Season_1.csv";')->print();
```
```text
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| column_name | column_type | min | max | approx_unique | avg | std | q25 | q50 | q75 | count | null_percentage |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| season_num | BIGINT | 1 | 1 | 1 | 1.0 | 0.0 | 1 | 1 | 1 | 30 | 0 |
| episode_num | BIGINT | 0 | 29 | 29 | 14.5 | 8.8034084308295 | 7 | 14 | 22 | 30 | 0 |
| aired_date | DATE | 1965-02-28 | 1967-04-13 | 35 | | | 1966-10-20 | 1966-12-22 | 1967-02-16 | 30 | 0 |
| cnt_kirk_hookup | BIGINT | 0 | 2 | 3 | 0.3333333333333 | 0.6064784348631 | 0 | 0 | 1 | 30 | 0 |
...
```
#### Requirements {#docs:stable:clients:php::requirements}
- Linux, macOS, or Windows
- x64 platform
- PHP >= 8.3
- ext-ffi
##### Recommended {#docs:stable:clients:php::recommended}
- ext-bcmath - Needed for big integers (> PHP_INT_MAX)
- ext-zend-opcache - For better performance
#### Type Support {#docs:stable:clients:php::type-support}
From version 1.2.0 on, the library supports all DuckDB data types.
| DuckDB Type | SQL Type | PHP Type |
|--------------------------|--------------|--------------------------------------|
| DUCKDB_TYPE_BOOLEAN | BOOLEAN | bool |
| DUCKDB_TYPE_TINYINT | TINYINT | int |
| DUCKDB_TYPE_SMALLINT | SMALLINT | int |
| DUCKDB_TYPE_INTEGER | INTEGER | int |
| DUCKDB_TYPE_BIGINT | BIGINT | int |
| DUCKDB_TYPE_UTINYINT | UTINYINT | int |
| DUCKDB_TYPE_USMALLINT | USMALLINT | int |
| DUCKDB_TYPE_UINTEGER | UINTEGER | int |
| DUCKDB_TYPE_UBIGINT | UBIGINT | Saturio\DuckDB\Type\Math\LongInteger |
| DUCKDB_TYPE_FLOAT | FLOAT | float |
| DUCKDB_TYPE_DOUBLE | DOUBLE | float |
| DUCKDB_TYPE_TIMESTAMP | TIMESTAMP | Saturio\DuckDB\Type\Timestamp |
| DUCKDB_TYPE_DATE | DATE | Saturio\DuckDB\Type\Date |
| DUCKDB_TYPE_TIME | TIME | Saturio\DuckDB\Type\Time |
| DUCKDB_TYPE_INTERVAL | INTERVAL | Saturio\DuckDB\Type\Interval |
| DUCKDB_TYPE_HUGEINT | HUGEINT | Saturio\DuckDB\Type\Math\LongInteger |
| DUCKDB_TYPE_UHUGEINT | UHUGEINT | Saturio\DuckDB\Type\Math\LongInteger |
| DUCKDB_TYPE_VARCHAR | VARCHAR | string |
| DUCKDB_TYPE_BLOB | BLOB | Saturio\DuckDB\Type\Blob |
| DUCKDB_TYPE_TIMESTAMP_S | TIMESTAMP_S | Saturio\DuckDB\Type\Timestamp |
| DUCKDB_TYPE_TIMESTAMP_MS | TIMESTAMP_MS | Saturio\DuckDB\Type\Timestamp |
| DUCKDB_TYPE_TIMESTAMP_NS | TIMESTAMP_NS | Saturio\DuckDB\Type\Timestamp |
| DUCKDB_TYPE_UUID | UUID | Saturio\DuckDB\Type\UUID |
| DUCKDB_TYPE_TIME_TZ | TIMETZ | Saturio\DuckDB\Type\Time |
| DUCKDB_TYPE_TIMESTAMP_TZ | TIMESTAMPTZ | Saturio\DuckDB\Type\Timestamp |
| DUCKDB_TYPE_DECIMAL | DECIMAL | float |
| DUCKDB_TYPE_ENUM | ENUM | string |
| DUCKDB_TYPE_LIST | LIST | array |
| DUCKDB_TYPE_STRUCT | STRUCT | array |
| DUCKDB_TYPE_ARRAY | ARRAY | array |
| DUCKDB_TYPE_MAP | MAP | array |
| DUCKDB_TYPE_UNION | UNION | mixed |
| DUCKDB_TYPE_BIT | BIT | string |
| DUCKDB_TYPE_BIGNUM | BIGNUM | string |
| DUCKDB_TYPE_SQLNULL | NULL | null |
## Python {#clients:python}
### Python API {#docs:stable:clients:python:overview}
> The latest stable version of the DuckDB Python client is 1.4.1.
#### Installation {#docs:stable:clients:python:overview::installation}
The DuckDB Python API can be installed using [pip](https://pip.pypa.io): `pip install duckdb`. It is also possible to install DuckDB using conda: `conda install python-duckdb -c conda-forge`. Please see the [installation page](https://duckdb.org/install) for details.
**Python version:**
DuckDB requires Python 3.9 or newer.
#### Basic API Usage {#docs:stable:clients:python:overview::basic-api-usage}
The most straightforward manner of running SQL queries using DuckDB is to use the `duckdb.sql` command.
```python
import duckdb
duckdb.sql("SELECT 42").show()
```
This will run queries using an **in-memory database** that is stored globally inside the Python module. The result of the query is returned as a **Relation**. A relation is a symbolic representation of the query. The query is not executed until the result is fetched or requested to be printed to the screen.
Relations can be referenced in subsequent queries by storing them inside variables, and using them as tables. This way queries can be constructed incrementally.
```python
import duckdb
r1 = duckdb.sql("SELECT 42 AS i")
duckdb.sql("SELECT i * 2 AS k FROM r1").show()
```
#### Data Input {#docs:stable:clients:python:overview::data-input}
DuckDB can ingest data from a wide variety of formats, both on-disk and in-memory. See the [data ingestion page](#docs:stable:clients:python:data_ingestion) for more information.
```python
import duckdb
duckdb.read_csv("example.csv") # read a CSV file into a Relation
duckdb.read_parquet("example.parquet") # read a Parquet file into a Relation
duckdb.read_json("example.json") # read a JSON file into a Relation
duckdb.sql("SELECT * FROM 'example.csv'") # directly query a CSV file
duckdb.sql("SELECT * FROM 'example.parquet'") # directly query a Parquet file
duckdb.sql("SELECT * FROM 'example.json'") # directly query a JSON file
```
##### DataFrames {#docs:stable:clients:python:overview::dataframes}
DuckDB can directly query Pandas DataFrames, Polars DataFrames and Arrow tables.
Note that these are read-only, i.e., editing these tables via [`INSERT`](#docs:stable:sql:statements:insert) or [`UPDATE` statements](#docs:stable:sql:statements:update) is not possible.
###### Pandas {#docs:stable:clients:python:overview::pandas}
To directly query a Pandas DataFrame, run:
```python
import duckdb
import pandas as pd
pandas_df = pd.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM pandas_df")
```
```text
┌───────┐
│   a   │
│ int64 │
├───────┤
│    42 │
└───────┘
```
###### Polars {#docs:stable:clients:python:overview::polars}
To directly query a Polars DataFrame, run:
```python
import duckdb
import polars as pl
polars_df = pl.DataFrame({"a": [42]})
duckdb.sql("SELECT * FROM polars_df")
```
```text
┌───────┐
│   a   │
│ int64 │
├───────┤
│    42 │
└───────┘
```
###### PyArrow {#docs:stable:clients:python:overview::pyarrow}
To directly query a PyArrow table, run:
```python
import duckdb
import pyarrow as pa
arrow_table = pa.Table.from_pydict({"a": [42]})
duckdb.sql("SELECT * FROM arrow_table")
```
```text
┌───────┐
│   a   │
│ int64 │
├───────┤
│    42 │
└───────┘
```
#### Result Conversion {#docs:stable:clients:python:overview::result-conversion}
DuckDB supports converting query results efficiently to a variety of formats. See the [result conversion page](#docs:stable:clients:python:conversion) for more information.
```python
import duckdb
duckdb.sql("SELECT 42").fetchall() # Python objects
duckdb.sql("SELECT 42").df() # Pandas DataFrame
duckdb.sql("SELECT 42").pl() # Polars DataFrame
duckdb.sql("SELECT 42").arrow() # Arrow Table
duckdb.sql("SELECT 42").fetchnumpy() # NumPy Arrays
```
#### Writing Data to Disk {#docs:stable:clients:python:overview::writing-data-to-disk}
DuckDB supports writing Relation objects directly to disk in a variety of formats. The [`COPY` statement](#docs:stable:sql:statements:copy) can be used to write data to disk using SQL as an alternative.
```python
import duckdb
duckdb.sql("SELECT 42").write_parquet("out.parquet") # Write to a Parquet file
duckdb.sql("SELECT 42").write_csv("out.csv") # Write to a CSV file
duckdb.sql("COPY (SELECT 42) TO 'out.parquet'") # Copy to a Parquet file
```
#### Connection Options {#docs:stable:clients:python:overview::connection-options}
Applications can open a new DuckDB connection via the `duckdb.connect()` method.
##### Using an In-Memory Database {#docs:stable:clients:python:overview::using-an-in-memory-database}
When using DuckDB through `duckdb.sql()`, it operates on an **in-memory** database, i.e., no tables are persisted on disk.
Invoking the `duckdb.connect()` method without arguments returns a connection, which also uses an in-memory database:
```python
import duckdb
con = duckdb.connect()
con.sql("SELECT 42 AS x").show()
```
##### Persistent Storage {#docs:stable:clients:python:overview::persistent-storage}
The `duckdb.connect(dbname)` method creates a connection to a **persistent** database.
Any data written to that connection will be persisted, and can be reloaded by reconnecting to the same file, both from Python and from other DuckDB clients.
```python
import duckdb
# create a connection to a file called 'file.db'
con = duckdb.connect("file.db")
# create a table and load data into it
con.sql("CREATE TABLE test (i INTEGER)")
con.sql("INSERT INTO test VALUES (42)")
# query the table
con.table("test").show()
# explicitly close the connection
con.close()
# Note: connections are also closed implicitly when they go out of scope
```
You can also use a context manager to ensure that the connection is closed:
```python
import duckdb
with duckdb.connect("file.db") as con:
con.sql("CREATE TABLE test (i INTEGER)")
con.sql("INSERT INTO test VALUES (42)")
con.table("test").show()
# the context manager closes the connection automatically
```
##### Configuration {#docs:stable:clients:python:overview::configuration}
The `duckdb.connect()` method accepts a `config` dictionary, where [configuration options](#docs:stable:configuration:overview::configuration-reference) can be specified. For example:
```python
import duckdb
con = duckdb.connect(config = {'threads': 1})
```
##### Connection Object and Module {#docs:stable:clients:python:overview::connection-object-and-module}
The connection object and the `duckdb` module can be used interchangeably, as they support the same methods. The only difference is that when using the `duckdb` module, a global in-memory database is used.
> If you are developing a package designed for others to use, and use DuckDB in the package, it is recommended that you create connection objects instead of using the methods on the `duckdb` module. That is because the `duckdb` module uses a shared global database, which can cause hard-to-debug issues if used from within multiple different packages.
##### Using Connections in Parallel Python Programs {#docs:stable:clients:python:overview::using-connections-in-parallel-python-programs}
The `DuckDBPyConnection` object is not thread-safe. If you would like to write to the same database from multiple threads, create a cursor for each thread with the [`DuckDBPyConnection.cursor()` method](#docs:stable:clients:python:reference:index::duckdb.DuckDBPyConnection.cursor).
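A minimal sketch of this pattern (the table name and thread count are illustrative): each thread derives its own cursor from the shared connection instead of using the connection directly.
```python
import duckdb
import threading

con = duckdb.connect("file.db")
con.sql("CREATE TABLE IF NOT EXISTS items (i INTEGER)")

def insert_value(value: int) -> None:
    # Each thread uses its own cursor, not the shared connection object.
    cursor = con.cursor()
    cursor.execute("INSERT INTO items VALUES (?)", [value])
    cursor.close()

threads = [threading.Thread(target=insert_value, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```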
#### Loading and Installing Extensions {#docs:stable:clients:python:overview::loading-and-installing-extensions}
DuckDB's Python API provides functions for installing and loading [extensions](#docs:stable:extensions:overview), which perform the equivalent operations to running the `INSTALL` and `LOAD` SQL commands, respectively. An example that installs and loads the [`spatial` extension](#docs:stable:core_extensions:spatial:overview) is as follows:
```python
import duckdb
con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")
```
##### Community Extensions {#docs:stable:clients:python:overview::community-extensions}
To load [community extensions](#community_extensions:index), use the `repository="community"` argument with the `install_extension` method.
For example, install and load the `h3` community extension as follows:
```python
import duckdb
con = duckdb.connect()
con.install_extension("h3", repository="community")
con.load_extension("h3")
```
##### Unsigned Extensions {#docs:stable:clients:python:overview::unsigned-extensions}
To load [unsigned extensions](#docs:stable:extensions:overview::unsigned-extensions), use the `config = {"allow_unsigned_extensions": "true"}` argument with the `duckdb.connect()` method.
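For example (the extension path below is hypothetical):
```python
import duckdb

con = duckdb.connect(config={"allow_unsigned_extensions": "true"})
# A locally built, unsigned extension can then be loaded, e.g., via SQL:
con.sql("LOAD '/path/to/my_extension.duckdb_extension'")  # hypothetical path
```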
### Data Ingestion {#docs:stable:clients:python:data_ingestion}
This page contains examples for data ingestion to Python using DuckDB. First, import the DuckDB module:
```python
import duckdb
```
Then, proceed with any of the following sections.
#### CSV Files {#docs:stable:clients:python:data_ingestion::csv-files}
CSV files can be read using the `read_csv` function, called either from within Python or directly from within SQL. By default, the `read_csv` function attempts to auto-detect the CSV settings by sampling from the provided file.
Read from a file using fully auto-detected settings:
```python
duckdb.read_csv("example.csv")
```
Read multiple CSV files from a folder:
```python
duckdb.read_csv("folder/*.csv")
```
Specify options on how the CSV is formatted internally:
```python
duckdb.read_csv("example.csv", header = False, sep = ",")
```
Override types of the first two columns:
```python
duckdb.read_csv("example.csv", dtype = ["int", "varchar"])
```
Directly read a CSV file from within SQL:
```python
duckdb.sql("SELECT * FROM 'example.csv'")
```
Call `read_csv` from within SQL:
```python
duckdb.sql("SELECT * FROM read_csv('example.csv')")
```
See the [CSV Import](#docs:stable:data:csv:overview) page for more information.
#### Parquet Files {#docs:stable:clients:python:data_ingestion::parquet-files}
Parquet files can be read using the `read_parquet` function, called either from within Python or directly from within SQL.
Read from a single Parquet file:
```python
duckdb.read_parquet("example.parquet")
```
Read multiple Parquet files from a folder:
```python
duckdb.read_parquet("folder/*.parquet")
```
Read a Parquet file over [https](#docs:stable:core_extensions:httpfs:overview):
```python
duckdb.read_parquet("https://some.url/some_file.parquet")
```
Read a list of Parquet files:
```python
duckdb.read_parquet(["file1.parquet", "file2.parquet", "file3.parquet"])
```
Directly read a Parquet file from within SQL:
```python
duckdb.sql("SELECT * FROM 'example.parquet'")
```
Call `read_parquet` from within SQL:
```python
duckdb.sql("SELECT * FROM read_parquet('example.parquet')")
```
See the [Parquet Loading](#docs:stable:data:parquet:overview) page for more information.
#### JSON Files {#docs:stable:clients:python:data_ingestion::json-files}
JSON files can be read using the `read_json` function, called either from within Python or directly from within SQL. By default, the `read_json` function will automatically detect if a file contains newline-delimited JSON or regular JSON, and will detect the schema of the objects stored within the JSON file.
Read from a single JSON file:
```python
duckdb.read_json("example.json")
```
Read multiple JSON files from a folder:
```python
duckdb.read_json("folder/*.json")
```
Directly read a JSON file from within SQL:
```python
duckdb.sql("SELECT * FROM 'example.json'")
```
Call `read_json` from within SQL:
```python
duckdb.sql("SELECT * FROM read_json_auto('example.json')")
```
#### Directly Accessing DataFrames and Arrow Objects {#docs:stable:clients:python:data_ingestion::directly-accessing-dataframes-and-arrow-objects}
DuckDB is automatically able to query certain Python variables by referring to their variable name (as if it was a table).
These types include the following: Pandas DataFrame, Polars DataFrame, Polars LazyFrame, NumPy arrays, [relations](#docs:stable:clients:python:relational_api), and Arrow objects.
Only variables that are visible to Python code at the location of the `sql()` or `execute()` call can be used in this manner.
Accessing these variables is made possible by [replacement scans](#docs:stable:clients:c:replacement_scans). To disable replacement scans entirely, use:
```sql
SET python_enable_replacements = false;
```
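For illustration, a small sketch of the effect of this setting (the DataFrame is just an example):
```python
import duckdb
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
con = duckdb.connect()

# Resolved via a replacement scan:
print(con.sql("SELECT count(*) FROM df").fetchall())  # [(3,)]

con.sql("SET python_enable_replacements = false")
# The same query would now fail to resolve `df`, since replacement scans are disabled:
# con.sql("SELECT count(*) FROM df")
```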
DuckDB supports querying multiple types of Apache Arrow objects including [tables](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html), [datasets](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html), [RecordBatchReaders](https://arrow.apache.org/docs/python/generated/pyarrow.ipc.RecordBatchStreamReader.html), and [scanners](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html). See the Python [guides](#docs:stable:python:overview) for more examples.
```python
import duckdb
import pandas as pd
test_df = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
print(duckdb.sql("SELECT * FROM test_df").fetchall())
```
```text
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
```
DuckDB also supports “registering” a DataFrame or Arrow object as a virtual table, comparable to a SQL `VIEW`. This is useful when querying a DataFrame/Arrow object that is stored in another way (as a class variable, or a value in a dictionary). Below is a Pandas example of manually registering such a DataFrame:
```python
import duckdb
import pandas as pd
my_dictionary = {}
my_dictionary["test_df"] = pd.DataFrame.from_dict({"i": [1, 2, 3, 4], "j": ["one", "two", "three", "four"]})
duckdb.register("test_df_view", my_dictionary["test_df"])
print(duckdb.sql("SELECT * FROM test_df_view").fetchall())
```
```text
[(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
```
You can also create a persistent table in DuckDB from the contents of the DataFrame (or the view):
```python
# create a new table from the contents of a DataFrame
con.execute("CREATE TABLE test_df_table AS SELECT * FROM test_df")
# insert into an existing table from the contents of a DataFrame
con.execute("INSERT INTO test_df_table SELECT * FROM test_df")
```
##### Pandas DataFrames – `object` Columns {#docs:stable:clients:python:data_ingestion::pandas-dataframes--object-columns}
`pandas.DataFrame` columns of an `object` dtype require some special care, since this dtype can store values of arbitrary type.
To convert these columns to DuckDB, we first go through an analyze phase before converting the values.
In this analyze phase, a sample of the column's rows is analyzed to determine the target type.
This sample size is by default set to 1000.
If the type picked during the analyze step is incorrect, this will result in `Invalid Input Error: Failed to cast value`, in which case you will need to increase the sample size.
The sample size can be changed by setting the `pandas_analyze_sample` config option.
```python
# example setting the sample size to 100k
duckdb.execute("SET GLOBAL pandas_analyze_sample = 100_000")
```
##### Registering Objects {#docs:stable:clients:python:data_ingestion::registering-objects}
You can register Python objects as DuckDB tables using the [`DuckDBPyConnection.register()` function](#docs:stable:clients:python:reference:index::duckdb.DuckDBPyConnection.register).
The precedence of objects with the same name is as follows:
* Objects explicitly registered via `DuckDBPyConnection.register()`
* Native DuckDB tables and views
* [Replacement scans](#docs:stable:clients:c:replacement_scans)
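A small sketch of this precedence (the names are illustrative): an explicitly registered object shadows a replacement scan of a local variable with the same name.
```python
import duckdb
import pandas as pd

con = duckdb.connect()
numbers = pd.DataFrame({"x": [1]})   # reachable via a replacement scan
other = pd.DataFrame({"x": [2]})

con.register("numbers", other)       # explicit registration takes precedence
print(con.sql("SELECT x FROM numbers").fetchall())  # [(2,)]

con.unregister("numbers")            # falls back to the replacement scan
print(con.sql("SELECT x FROM numbers").fetchall())  # [(1,)]
```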
### Conversion between DuckDB and Python {#docs:stable:clients:python:conversion}
This page documents the rules for converting [Python objects to DuckDB](#::object-conversion-python-object-to-duckdb) and [DuckDB results to Python](#::result-conversion-duckdb-results-to-python).
#### Object Conversion: Python Object to DuckDB {#docs:stable:clients:python:conversion::object-conversion-python-object-to-duckdb}
This is a mapping of Python object types to DuckDB [Logical Types](#docs:stable:sql:data_types:overview):
* `None` → `NULL`
* `bool` → `BOOLEAN`
* `datetime.timedelta` → `INTERVAL`
* `str` → `VARCHAR`
* `bytearray` → `BLOB`
* `memoryview` → `BLOB`
* `decimal.Decimal` → `DECIMAL` / `DOUBLE`
* `uuid.UUID` → `UUID`
The rest of the conversion rules are as follows.
##### `int` {#docs:stable:clients:python:conversion::int}
Since Python integers can be of arbitrary size, there is no one-to-one conversion for `int`.
Instead, these casts are tried in order until one succeeds:
* `BIGINT`
* `INTEGER`
* `UBIGINT`
* `UINTEGER`
* `DOUBLE`
When using the DuckDB Value class, it's possible to set a target type, which will influence the conversion.
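As a sketch of how this can be steered (assuming that `duckdb.Value`, which pairs a Python value with a DuckDB type, can be passed as a prepared statement parameter):
```python
import duckdb
from duckdb import Value
from duckdb.typing import TINYINT

# a bare Python int goes through the cast order above and lands on BIGINT
print(duckdb.execute("SELECT typeof(?)", [42]).fetchone())
# expected: ('BIGINT',)

# wrapping the value pins the conversion to the requested target type
print(duckdb.execute("SELECT typeof(?)", [Value(42, TINYINT)]).fetchone())
# expected: ('TINYINT',)
```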
##### `float` {#docs:stable:clients:python:conversion::float}
These casts are tried in order until one succeeds:
* `DOUBLE`
* `FLOAT`
##### `datetime.datetime` {#docs:stable:clients:python:conversion::datetimedatetime}
For `datetime`, we check `pandas.isnull` if it is available and return `NULL` if it returns `True`.
We check against `datetime.datetime.min` and `datetime.datetime.max` to convert to `-inf` and `+inf` respectively.
If the `datetime` has tzinfo, we will use `TIMESTAMPTZ`, otherwise it becomes `TIMESTAMP`.
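For example, the following sketch shows the effect of `tzinfo` on the resulting type:
```python
import duckdb
from datetime import datetime, timezone

naive = datetime(2024, 1, 1, 12, 0, 0)
aware = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# a naive datetime maps to TIMESTAMP, a tz-aware one to TIMESTAMP WITH TIME ZONE
print(duckdb.execute("SELECT typeof(?), typeof(?)", [naive, aware]).fetchone())
# expected: ('TIMESTAMP', 'TIMESTAMP WITH TIME ZONE')
```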
##### `datetime.time` {#docs:stable:clients:python:conversion::datetimetime}
If the `time` has tzinfo, we will use `TIMETZ`, otherwise it becomes `TIME`.
##### `datetime.date` {#docs:stable:clients:python:conversion::datetimedate}
`date` converts to the `DATE` type.
We check against `datetime.date.min` and `datetime.date.max` to convert to `-inf` and `+inf` respectively.
##### `bytes` {#docs:stable:clients:python:conversion::bytes}
`bytes` converts to `BLOB` by default; when it is used to construct a Value object of type `BITSTRING`, it maps to `BITSTRING` instead.
##### `list` {#docs:stable:clients:python:conversion::list}
`list` becomes a `LIST` type of the "most permissive" type of its children, for example:
```python
my_list_value = [
    12345,
    "test"
]
```
This list will become `VARCHAR[]`, because `12345` can be converted to `VARCHAR` but `test` cannot be converted to `INTEGER`:
```sql
[12345, test]
```
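The promotion can be observed by passing the list as a prepared statement parameter; a minimal sketch:
```python
import duckdb

my_list_value = [12345, "test"]

# the mixed list is promoted to the "most permissive" child type
print(duckdb.execute("SELECT typeof(?)", [my_list_value]).fetchone())
# expected: ('VARCHAR[]',)
```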
##### `dict` {#docs:stable:clients:python:conversion::dict}
The `dict` object can convert to either `STRUCT(...)` or `MAP(..., ...)` depending on its structure.
If the dict has a structure similar to:
```python
import duckdb
my_map_dict = {
    "key": [
        1, 2, 3
    ],
    "value": [
        "one", "two", "three"
    ]
}
duckdb.values(my_map_dict)
```
Then we'll convert it to a `MAP` of key-value pairs of the two lists zipped together.
The example above becomes a `MAP(INTEGER, VARCHAR)`:
```text
┌─────────────────────────┐
│ {1=one, 2=two, 3=three} │
│  map(integer, varchar)  │
├─────────────────────────┤
│ {1=one, 2=two, 3=three} │
└─────────────────────────┘
```
If the dict is returned by a [function](#docs:stable:clients:python:function), the function will return a `MAP`, therefore the function's `return_type` has to be specified.
Providing a return type which cannot be converted to a `MAP` will raise an error:
```python
import duckdb
duckdb_conn = duckdb.connect()
def get_map() -> dict[str, list[str] | list[int]]:
    return {
        "key": [
            1, 2, 3
        ],
        "value": [
            "one", "two", "three"
        ]
    }
duckdb_conn.create_function("get_map", get_map, return_type=dict[int, str])
duckdb_conn.sql("select get_map()").show()
duckdb_conn.create_function("get_map_error", get_map)
duckdb_conn.sql("select get_map_error()").show()
```
```text
┌─────────────────────────┐
│        get_map()        │
│  map(bigint, varchar)   │
├─────────────────────────┤
│ {1=one, 2=two, 3=three} │
└─────────────────────────┘
ConversionException: Conversion Error: Type VARCHAR can't be cast as UNION(u1 VARCHAR[], u2 BIGINT[]). VARCHAR can't be implicitly cast to any of the union member types: VARCHAR[], BIGINT[]
```
> The names of the fields matter and the two lists need to have the same size.
Otherwise we'll try to convert it to a `STRUCT`.
```python
import duckdb
my_struct_dict = {
    1: "one",
    "2": 2,
    "three": [1, 2, 3],
    False: True
}
duckdb.values(my_struct_dict)
```
Becomes:
```text
┌──────────────────────────────────────────────────────────────────────┐
│       {'1': 'one', '2': 2, 'three': [1, 2, 3], 'False': true}        │
│  struct("1" varchar, "2" integer, three integer[], "false" boolean)  │
├──────────────────────────────────────────────────────────────────────┤
│ {'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}                │
└──────────────────────────────────────────────────────────────────────┘
```
If the dict is returned by a [function](#docs:stable:clients:python:function),
the function will return a `MAP`, due to [automatic conversion](#docs:stable:clients:python:types::dictkey_type-value_type).
To return a `STRUCT`, the `return_type` has to be provided:
```python
import duckdb
from duckdb.typing import BOOLEAN, INTEGER, VARCHAR
from duckdb import list_type, struct_type
duckdb_conn = duckdb.connect()
my_struct_dict = {
    1: "one",
    "2": 2,
    "three": [1, 2, 3],
    False: True
}

def get_struct() -> dict[str | int | bool, str | int | list[int] | bool]:
    return my_struct_dict

duckdb_conn.create_function("get_struct_as_map", get_struct)
duckdb_conn.sql("select get_struct_as_map()").show()

duckdb_conn.create_function("get_struct", get_struct, return_type=struct_type({
    1: VARCHAR,
    "2": INTEGER,
    "three": list_type(duckdb.typing.INTEGER),
    False: BOOLEAN
}))
duckdb_conn.sql("select get_struct()").show()
```
```text
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                          get_struct_as_map()                                          │
│  map(union(u1 varchar, u2 bigint, u3 boolean), union(u1 varchar, u2 bigint, u3 bigint[], u4 boolean)) │
├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {1=one, 2=2, three=[1, 2, 3], false=true}                                                             │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│                             get_struct()                             │
│  struct("1" varchar, "2" integer, three integer[], "false" boolean)  │
├──────────────────────────────────────────────────────────────────────┤
│ {'1': one, '2': 2, 'three': [1, 2, 3], 'False': true}                │
└──────────────────────────────────────────────────────────────────────┘
```
> Every `key` of the dictionary is converted to a string.
##### `tuple` {#docs:stable:clients:python:conversion::tuple}
`tuple` converts to `LIST` by default; when it is used to construct a Value object of type `STRUCT`, it will convert to `STRUCT` instead.
##### `numpy.ndarray` and `numpy.datetime64` {#docs:stable:clients:python:conversion::numpyndarray-and-numpydatetime64}
`ndarray` and `datetime64` are converted by calling `tolist()` and converting the result of that.
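As a sketch, passing an `ndarray` as a parameter therefore behaves like passing the equivalent Python list:
```python
import duckdb
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int64)

# converted via tolist(), so the array arrives as a LIST of BIGINT values
print(duckdb.execute("SELECT typeof(?)", [arr]).fetchone())
# expected: ('BIGINT[]',)
```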
#### Result Conversion: DuckDB Results to Python {#docs:stable:clients:python:conversion::result-conversion-duckdb-results-to-python}
DuckDB's Python client provides multiple additional methods that can be used to efficiently retrieve data.
##### NumPy {#docs:stable:clients:python:conversion::numpy}
* `fetchnumpy()` fetches the data as a dictionary of NumPy arrays
##### Pandas {#docs:stable:clients:python:conversion::pandas}
* `df()` fetches the data as a Pandas DataFrame
* `fetchdf()` is an alias of `df()`
* `fetch_df()` is an alias of `df()`
* `fetch_df_chunk(vector_multiple)` fetches a portion of the results into a DataFrame. The number of rows returned in each chunk is the vector size (2048 by default) * vector_multiple (1 by default).
##### Apache Arrow {#docs:stable:clients:python:conversion::apache-arrow}
* `arrow()` fetches the data as an [Arrow table](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html)
* `fetch_arrow_table()` is an alias of `arrow()`
* `fetch_record_batch(chunk_size)` returns an [Arrow record batch reader](https://arrow.apache.org/docs/python/generated/pyarrow.ipc.RecordBatchStreamReader.html) with `chunk_size` rows per batch
##### Polars {#docs:stable:clients:python:conversion::polars}
* `pl()` fetches the data as a Polars DataFrame
##### Examples {#docs:stable:clients:python:conversion::examples}
Below are some examples using this functionality. See the [Python guides](#docs:stable:python:overview) for more examples.
Fetch as Pandas DataFrame:
```python
df = con.execute("SELECT * FROM items").fetchdf()
print(df)
```
```text
item value count
0 jeans 20.0 1
1 hammer 42.2 2
2 laptop 2000.0 1
3 chainsaw 500.0 10
4 iphone 300.0 2
```
Fetch as dictionary of NumPy arrays:
```python
arr = con.execute("SELECT * FROM items").fetchnumpy()
print(arr)
```
```text
{'item': masked_array(data=['jeans', 'hammer', 'laptop', 'chainsaw', 'iphone'],
mask=[False, False, False, False, False],
fill_value='?',
dtype=object), 'value': masked_array(data=[20.0, 42.2, 2000.0, 500.0, 300.0],
mask=[False, False, False, False, False],
fill_value=1e+20), 'count': masked_array(data=[1, 2, 1, 10, 2],
mask=[False, False, False, False, False],
fill_value=999999,
dtype=int32)}
```
Fetch as an Arrow table. Converting to Pandas afterwards just for pretty printing:
```python
tbl = con.execute("SELECT * FROM items").fetch_arrow_table()
print(tbl.to_pandas())
```
```text
item value count
0 jeans 20.00 1
1 hammer 42.20 2
2 laptop 2000.00 1
3 chainsaw 500.00 10
4 iphone 300.00 2
```
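The streaming variants are useful when the result does not comfortably fit in memory. Below is a minimal sketch, reusing the `items` table from above, that reads the result as Arrow record batches with roughly two rows per batch:
```python
# stream the result as Arrow record batches instead of materializing everything at once
reader = con.execute("SELECT * FROM items").fetch_record_batch(2)
for batch in reader:
    # each element is a pyarrow.RecordBatch
    print(batch.num_rows, batch.schema.names)
```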
### Python DB API {#docs:stable:clients:python:dbapi}
The standard DuckDB Python API provides a SQL interface compliant with the [DB-API 2.0 specification described by PEP 249](https://www.python.org/dev/peps/pep-0249/), similar to the [SQLite Python API](https://docs.python.org/3.7/library/sqlite3.html).
#### Connection {#docs:stable:clients:python:dbapi::connection}
To use the module, you must first create a `DuckDBPyConnection` object that represents a connection to a database.
This is done through the [`duckdb.connect`](#docs:stable:clients:python:reference:index::duckdb.connect) method.
The `config` keyword argument can be used to provide a `dict` containing key-value pairs that reference [settings](#docs:stable:configuration:overview::configuration-reference) understood by DuckDB.
##### In-Memory Connection {#docs:stable:clients:python:dbapi::in-memory-connection}
The special value `:memory:` can be used to create an **in-memory database**. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the Python process).
###### Named In-memory Connections {#docs:stable:clients:python:dbapi::named-in-memory-connections}
The special value `:memory:` can also be postfixed with a name, for example: `:memory:conn3`.
When a name is provided, subsequent `duckdb.connect` calls will create a new connection to the same database, sharing the catalogs (views, tables, macros etc.).
Using `:memory:` without a name will always create a new and separate database instance.
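For illustration, here is a minimal sketch (the name `conn3` is arbitrary) showing that two connections to the same named in-memory database share the catalog, while a plain `:memory:` connection does not:
```python
import duckdb

con1 = duckdb.connect(":memory:conn3")
con1.execute("CREATE TABLE shared_tbl AS SELECT 42 AS a")

# a second connection to the same named in-memory database sees the table
con2 = duckdb.connect(":memory:conn3")
print(con2.sql("SELECT * FROM shared_tbl").fetchall())  # [(42,)]

# an unnamed :memory: connection is a separate database and does not see shared_tbl
con3 = duckdb.connect(":memory:")
```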
##### Default Connection {#docs:stable:clients:python:dbapi::default-connection}
By default, we create an (unnamed) **in-memory database** that lives inside the `duckdb` module.
Every method of `DuckDBPyConnection` is also available on the `duckdb` module; these methods use this default connection.
The special value `:default:` can be used to get this default connection.
##### File-Based Connection {#docs:stable:clients:python:dbapi::file-based-connection}
If the `database` is a file path, a connection to a persistent database is established.
If the file does not exist, it will be created (the file extension is irrelevant and can be `.db`, `.duckdb`, or anything else).
###### `read_only` Connections {#docs:stable:clients:python:dbapi::read_only-connections}
If you would like to connect in read-only mode, you can set the `read_only` flag to `True`. If the file does not exist, it is **not** created when connecting in read-only mode.
Read-only mode is required if multiple Python processes want to access the same database file at the same time.
```python
import duckdb
duckdb.execute("CREATE TABLE tbl AS SELECT 42 a")
con = duckdb.connect(":default:")
con.sql("SELECT * FROM tbl")
# or
duckdb.default_connection().sql("SELECT * FROM tbl")
```
```text
┌───────┐
│   a   │
│ int32 │
├───────┤
│    42 │
└───────┘
```
```python
import duckdb
# to start an in-memory database
con = duckdb.connect(database = ":memory:")
# to use a database file (not shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = False)
# to use a database file (shared between processes)
con = duckdb.connect(database = "my-db.duckdb", read_only = True)
# to explicitly get the default connection
con = duckdb.connect(database = ":default:")
```
If you want to create a second connection to an existing database, you can use the `cursor()` method. This can be useful, for example, to allow parallel threads to run queries independently. A single connection is thread-safe but is locked for the duration of the queries, effectively serializing database access in this case.
Connections are closed implicitly when they go out of scope or if they are explicitly closed using `close()`. Once the last connection to a database instance is closed, the database instance is closed as well.
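As a sketch (the file and table names are illustrative), a second connection to the same database instance can be obtained as follows:
```python
import duckdb

con = duckdb.connect("my-db.duckdb")
con.execute("CREATE TABLE IF NOT EXISTS items (item VARCHAR)")

# cursor() creates an additional, independent connection to the same database instance,
# e.g., for use in another thread
cur = con.cursor()
print(cur.execute("SELECT count(*) FROM items").fetchone())

cur.close()
con.close()
```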
#### Querying {#docs:stable:clients:python:dbapi::querying}
SQL queries can be sent to DuckDB using the `execute()` method of connections. Once a query has been executed, results can be retrieved using the `fetchone` and `fetchall` methods on the connection. `fetchall` will retrieve all results and complete the transaction. `fetchone` will retrieve a single row of results each time that it is invoked until no more results are available. The transaction will only close once `fetchone` is called and there are no more results remaining (the return value will be `None`). As an example, in the case of a query only returning a single row, `fetchone` should be called once to retrieve the results and a second time to close the transaction. Below are some short examples:
```python
# create a table
con.execute("CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
con.execute("INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
# retrieve the items again
con.execute("SELECT * FROM items")
print(con.fetchall())
# [('jeans', Decimal('20.00'), 1), ('hammer', Decimal('42.20'), 2)]
# retrieve the items one at a time
con.execute("SELECT * FROM items")
print(con.fetchone())
# ('jeans', Decimal('20.00'), 1)
print(con.fetchone())
# ('hammer', Decimal('42.20'), 2)
print(con.fetchone()) # This closes the transaction. Any subsequent calls to .fetchone will return None
# None
```
The `description` property of the connection object contains the column names as per the standard.
##### Prepared Statements {#docs:stable:clients:python:dbapi::prepared-statements}
DuckDB also supports [prepared statements](#docs:stable:sql:query_syntax:prepared_statements) in the API with the `execute` and `executemany` methods. The values may be passed as an additional parameter after a query that contains `?` or `$1` (dollar symbol and a number) placeholders. Using the `?` notation adds the values in the same sequence as passed within the Python parameter. Using the `$` notation allows for values to be reused within the SQL statement based on the number and index of the value found within the Python parameter. Values are converted according to the [conversion rules](#docs:stable:clients:python:conversion::object-conversion-python-object-to-duckdb).
Here are some examples. First, insert a row using a [prepared statement](#docs:stable:sql:query_syntax:prepared_statements):
```python
con.execute("INSERT INTO items VALUES (?, ?, ?)", ["laptop", 2000, 1])
```
Second, insert several rows using a [prepared statement](#docs:stable:sql:query_syntax:prepared_statements):
```python
con.executemany("INSERT INTO items VALUES (?, ?, ?)", [["chainsaw", 500, 10], ["iphone", 300, 2]])
```
Query the database using a [prepared statement](#docs:stable:sql:query_syntax:prepared_statements):
```python
con.execute("SELECT item FROM items WHERE value > ?", [400])
print(con.fetchall())
```
```text
[('laptop',), ('chainsaw',)]
```
Query using the `$` notation for a [prepared statement](#docs:stable:sql:query_syntax:prepared_statements) and reused values:
```python
con.execute("SELECT $1, $1, $2", ["duck", "goose"])
print(con.fetchall())
```
```text
[('duck', 'duck', 'goose')]
```
> **Warning.** Do *not* use `executemany` to insert large amounts of data into DuckDB. See the [data ingestion page](#docs:stable:clients:python:data_ingestion) for better options.
#### Named Parameters {#docs:stable:clients:python:dbapi::named-parameters}
Besides the standard unnamed parameters, like `$1`, `$2` etc., it's also possible to supply named parameters, like `$my_parameter`.
When using named parameters, you have to provide a dictionary mapping of `str` to value in the `parameters` argument.
An example use is the following:
```python
import duckdb
res = duckdb.execute("""
    SELECT
        $my_param,
        $other_param,
        $also_param
    """,
    {
        "my_param": 5,
        "other_param": "DuckDB",
        "also_param": [42]
    }
).fetchall()
print(res)
```
```text
[(5, 'DuckDB', [42])]
```
### Relational API {#docs:stable:clients:python:relational_api}
The Relational API is an alternative API that can be used to incrementally construct queries.
The API is centered around `DuckDBPyRelation` nodes. The relations can be seen as symbolic representations of SQL queries.
#### Lazy Evaluation {#docs:stable:clients:python:relational_api::lazy-evaluation}
The relations do not hold any data, and nothing is executed, until [a method that triggers execution](#::output) is called.
For example, we create a relation that represents 1 billion rows:
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("from range(1_000_000_000)")
```
At this point, `rel` does not hold any data and no data has been retrieved from the database.
By calling `rel.show()` or simply printing `rel` in the terminal, the first 10,000 rows are fetched.
If there are more than 10,000 rows, the output window will show `>9999 rows` (as the number of rows in the relation is unknown without executing the query).
By calling an [output](#::output) method, the data is retrieved and stored in the specified format:
```python
rel.to_table("example_rel")
# 100% ██████████████████████████████████████████████████████████████
```
#### Relation Creation {#docs:stable:clients:python:relational_api::relation-creation-}
This section contains the details on how a relation is created. The methods are [lazily evaluated](#::lazy-evaluation).
| Name | Description |
|:--|:-------|
| [`from_arrow`](#::from_arrow) | Create a relation object from an Arrow object |
| [`from_csv_auto`](#::from_csv_auto) | Create a relation object from the CSV file in 'name' |
| [`from_df`](#::from_df) | Create a relation object from the DataFrame in df |
| [`from_parquet`](#::from_parquet) | Create a relation object from the Parquet files |
| [`from_query`](#::from_query) | Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
| [`query`](#::query) | Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
| [`read_csv`](#::read_csv) | Create a relation object from the CSV file in 'name' |
| [`read_json`](#::read_json) | Create a relation object from the JSON file in 'name' |
| [`read_parquet`](#::read_parquet) | Create a relation object from the Parquet files |
| [`sql`](#::sql) | Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is. |
| [`table`](#::table) | Create a relation object for the named table |
| [`table_function`](#::table_function) | Create a relation object from the named table function with given parameters |
| [`values`](#::values) | Create a relation object from the passed values |
| [`view`](#::view) | Create a relation object for the named view |
###### `from_arrow` {#docs:stable:clients:python:relational_api::from_arrow}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
from_arrow(self: _duckdb.DuckDBPyConnection, arrow_object: object) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object from an Arrow object
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **arrow_object** : pyarrow.Table, pyarrow.RecordBatch
Arrow object to create a relation from
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
import pyarrow as pa
ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])
duckdb_conn = duckdb.connect()
rel = duckdb_conn.from_arrow(example_table)
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────┬─────────┐
│  id  │  text   │
│ int8 │ varchar │
├──────┼─────────┤
│    1 │ a       │
└──────┴─────────┘
```
----
###### `from_csv_auto` {#docs:stable:clients:python:relational_api::from_csv_auto}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
from_csv_auto(self: _duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object from the CSV file in 'name'
**Aliases**: [`read_csv`](#::read_csv)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **path_or_buffer** : Union[str, StringIO, TextIOBase]
Path to the CSV file or buffer to read from.
- **header** : Optional[bool], Optional[int]
Row number(s) to use as the column names, or None if no header.
- **compression** : Optional[str]
Compression type (e.g., 'gzip', 'bz2').
- **sep** : Optional[str]
Delimiter to use; defaults to comma.
- **delimiter** : Optional[str]
Alternative delimiter to use.
- **dtype** : Optional[Dict[str, str]], Optional[List[str]]
Data types for columns.
- **na_values** : Optional[str], Optional[List[str]]
Additional strings to recognize as NA/NaN.
- **skiprows** : Optional[int]
Number of rows to skip at the start.
- **quotechar** : Optional[str]
Character used to quote fields.
- **escapechar** : Optional[str]
Character used to escape delimiter or quote characters.
- **encoding** : Optional[str]
Encoding to use for UTF when reading/writing.
- **parallel** : Optional[bool]
Enable parallel reading.
- **date_format** : Optional[str]
Format to parse dates.
- **timestamp_format** : Optional[str]
Format to parse timestamps.
- **sample_size** : Optional[int]
Number of rows to sample for schema inference.
- **all_varchar** : Optional[bool]
Treat all columns as VARCHAR.
- **normalize_names** : Optional[bool]
Normalize column names to lowercase.
- **null_padding** : Optional[bool]
Enable null padding for rows with missing columns.
- **names** : Optional[List[str]]
List of column names to use.
- **lineterminator** : Optional[str]
Character to break lines on.
- **columns** : Optional[Dict[str, str]]
Column mapping for schema.
- **auto_type_candidates** : Optional[List[str]]
List of columns for automatic type inference.
- **max_line_size** : Optional[int]
Maximum line size in bytes.
- **ignore_errors** : Optional[bool]
Ignore parsing errors.
- **store_rejects** : Optional[bool]
Store rejected rows.
- **rejects_table** : Optional[str]
Table name to store rejected rows.
- **rejects_scan** : Optional[str]
Scan to use for rejects.
- **rejects_limit** : Optional[int]
Limit number of rejects stored.
- **force_not_null** : Optional[List[str]]
List of columns to force as NOT NULL.
- **buffer_size** : Optional[int]
Buffer size in bytes.
- **decimal** : Optional[str]
Character to recognize as decimal point.
- **allow_quoted_nulls** : Optional[bool]
Allow quoted NULL values.
- **filename** : Optional[bool], Optional[str]
Add filename column or specify filename.
- **hive_partitioning** : Optional[bool]
Enable Hive-style partitioning.
- **union_by_name** : Optional[bool]
Union files by column name instead of position.
- **hive_types** : Optional[Dict[str, str]]
Hive types for columns.
- **hive_types_autocast** : Optional[bool]
Automatically cast Hive types.
- **connection** : DuckDBPyConnection
DuckDB connection to use.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import csv
import duckdb
duckdb_conn = duckdb.connect()
with open('code_example.csv', 'w', newline='') as csvfile:
    fieldnames = ['id', 'text']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'id': '1', 'text': 'a'})
rel = duckdb_conn.from_csv_auto("code_example.csv")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┬─────────┐
│  id   │  text   │
│ int64 │ varchar │
├───────┼─────────┤
│     1 │ a       │
└───────┴─────────┘
```
----
###### `from_df` {#docs:stable:clients:python:relational_api::from_df}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
from_df(self: _duckdb.DuckDBPyConnection, df: pandas.DataFrame) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object from the DataFrame in df
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **df** : pandas.DataFrame
A pandas DataFrame to be converted into a DuckDB relation.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
import pandas as pd
df = pd.DataFrame(data = {'id': [1], "text":["a"]})
duckdb_conn = duckdb.connect()
rel = duckdb_conn.from_df(df)
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┬─────────┐
│  id   │  text   │
│ int64 │ varchar │
├───────┼─────────┤
│     1 │ a       │
└───────┴─────────┘
```
----
###### `from_parquet` {#docs:stable:clients:python:relational_api::from_parquet}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
from_parquet(*args, **kwargs)
Overloaded function.
1. from_parquet(self: _duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
2. from_parquet(self: _duckdb.DuckDBPyConnection, file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object from the Parquet files
**Aliases**: [`read_parquet`](#::read_parquet)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **file_glob** : str
File path or glob pattern pointing to Parquet files to be read.
- **binary_as_string** : bool, default: False
Interpret binary columns as strings instead of blobs.
- **file_row_number** : bool, default: False
Add a column containing the row number within each file.
- **filename** : bool, default: False
Add a column containing the name of the file each row came from.
- **hive_partitioning** : bool, default: False
Enable automatic detection of Hive-style partitions in file paths.
- **union_by_name** : bool, default: False
Union Parquet files by matching column names instead of positions.
- **compression** : object
Optional compression codec to use when reading the Parquet files.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
import pyarrow as pa
import pyarrow.parquet as pq
ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])
pq.write_table(example_table, "code_example.parquet")
duckdb_conn = duckdb.connect()
rel = duckdb_conn.from_parquet("code_example.parquet")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────┬─────────┐
│  id  │  text   │
│ int8 │ varchar │
├──────┼─────────┤
│    1 │ a       │
└──────┴─────────┘
```
----
###### `from_query` {#docs:stable:clients:python:relational_api::from_query}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
from_query(self: _duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
**Aliases**: [`query`](#::query), [`sql`](#::sql)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **query** : object
The SQL query or subquery to be executed and converted into a relation.
- **alias** : str, default: ''
Optional alias name to assign to the resulting relation.
- **params** : object
Optional query parameters to be used in the SQL query.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.from_query("from range(1,2) tbl(id)")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
└───────┘
```
----
###### `query` {#docs:stable:clients:python:relational_api::query}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
query(self: _duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
**Aliases**: [`from_query`](#::from_query), [`sql`](#::sql)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **query** : object
The SQL query or subquery to be executed and converted into a relation.
- **alias** : str, default: ''
Optional alias name to assign to the resulting relation.
- **params** : object
Optional query parameters to be used in the SQL query.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.query("from range(1,2) tbl(id)")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
└───────┘
```
----
###### `read_csv` {#docs:stable:clients:python:relational_api::read_csv}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
read_csv(self: _duckdb.DuckDBPyConnection, path_or_buffer: object, **kwargs) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object from the CSV file in 'name'
**Aliases**: [`from_csv_auto`](#::from_csv_auto)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **path_or_buffer** : Union[str, StringIO, TextIOBase]
Path to the CSV file or buffer to read from.
- **header** : Optional[bool], Optional[int]
Row number(s) to use as the column names, or None if no header.
- **compression** : Optional[str]
Compression type (e.g., 'gzip', 'bz2').
- **sep** : Optional[str]
Delimiter to use; defaults to comma.
- **delimiter** : Optional[str]
Alternative delimiter to use.
- **dtype** : Optional[Dict[str, str]], Optional[List[str]]
Data types for columns.
- **na_values** : Optional[str], Optional[List[str]]
Additional strings to recognize as NA/NaN.
- **skiprows** : Optional[int]
Number of rows to skip at the start.
- **quotechar** : Optional[str]
Character used to quote fields.
- **escapechar** : Optional[str]
Character used to escape delimiter or quote characters.
- **encoding** : Optional[str]
Encoding to use for UTF when reading/writing.
- **parallel** : Optional[bool]
Enable parallel reading.
- **date_format** : Optional[str]
Format to parse dates.
- **timestamp_format** : Optional[str]
Format to parse timestamps.
- **sample_size** : Optional[int]
Number of rows to sample for schema inference.
- **all_varchar** : Optional[bool]
Treat all columns as VARCHAR.
- **normalize_names** : Optional[bool]
Normalize column names to lowercase.
- **null_padding** : Optional[bool]
Enable null padding for rows with missing columns.
- **names** : Optional[List[str]]
List of column names to use.
- **lineterminator** : Optional[str]
Character to break lines on.
- **columns** : Optional[Dict[str, str]]
Column mapping for schema.
- **auto_type_candidates** : Optional[List[str]]
List of columns for automatic type inference.
- **max_line_size** : Optional[int]
Maximum line size in bytes.
- **ignore_errors** : Optional[bool]
Ignore parsing errors.
- **store_rejects** : Optional[bool]
Store rejected rows.
- **rejects_table** : Optional[str]
Table name to store rejected rows.
- **rejects_scan** : Optional[str]
Scan to use for rejects.
- **rejects_limit** : Optional[int]
Limit number of rejects stored.
- **force_not_null** : Optional[List[str]]
List of columns to force as NOT NULL.
- **buffer_size** : Optional[int]
Buffer size in bytes.
- **decimal** : Optional[str]
Character to recognize as decimal point.
- **allow_quoted_nulls** : Optional[bool]
Allow quoted NULL values.
- **filename** : Optional[bool], Optional[str]
Add filename column or specify filename.
- **hive_partitioning** : Optional[bool]
Enable Hive-style partitioning.
- **union_by_name** : Optional[bool]
Union files by column name instead of position.
- **hive_types** : Optional[Dict[str, str]]
Hive types for columns.
- **hive_types_autocast** : Optional[bool]
Automatically cast Hive types.
- **connection** : DuckDBPyConnection
DuckDB connection to use.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import csv
import duckdb
duckdb_conn = duckdb.connect()
with open('code_example.csv', 'w', newline='') as csvfile:
    fieldnames = ['id', 'text']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'id': '1', 'text': 'a'})
rel = duckdb_conn.read_csv("code_example.csv")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┬─────────┐
│  id   │  text   │
│ int64 │ varchar │
├───────┼─────────┤
│     1 │ a       │
└───────┴─────────┘
```
----
###### `read_json` {#docs:stable:clients:python:relational_api::read_json}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
read_json(self: _duckdb.DuckDBPyConnection, path_or_buffer: object, *, columns: typing.Optional[object] = None, sample_size: typing.Optional[object] = None, maximum_depth: typing.Optional[object] = None, records: typing.Optional[str] = None, format: typing.Optional[str] = None, date_format: typing.Optional[object] = None, timestamp_format: typing.Optional[object] = None, compression: typing.Optional[object] = None, maximum_object_size: typing.Optional[object] = None, ignore_errors: typing.Optional[object] = None, convert_strings_to_integers: typing.Optional[object] = None, field_appearance_threshold: typing.Optional[object] = None, map_inference_threshold: typing.Optional[object] = None, maximum_sample_files: typing.Optional[object] = None, filename: typing.Optional[object] = None, hive_partitioning: typing.Optional[object] = None, union_by_name: typing.Optional[object] = None, hive_types: typing.Optional[object] = None, hive_types_autocast: typing.Optional[object] = None) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object from the JSON file in 'name'
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **path_or_buffer** : object
File path or file-like object containing JSON data to be read.
- **columns** : object
Optional list of column names to project from the JSON data.
- **sample_size** : object
Number of rows to sample for inferring JSON schema.
- **maximum_depth** : object
Maximum depth to which JSON objects should be parsed.
- **records** : str
Format string specifying whether JSON is in records mode.
- **format** : str
Format of the JSON data (e.g., 'auto', 'newline_delimited').
- **date_format** : object
Format string for parsing date fields.
- **timestamp_format** : object
Format string for parsing timestamp fields.
- **compression** : object
Compression codec used on the JSON data (e.g., 'gzip').
- **maximum_object_size** : object
Maximum size in bytes for individual JSON objects.
- **ignore_errors** : object
If True, skip over JSON records with parsing errors.
- **convert_strings_to_integers** : object
If True, attempt to convert strings to integers where appropriate.
- **field_appearance_threshold** : object
Threshold for inferring optional fields in nested JSON.
- **map_inference_threshold** : object
Threshold for inferring maps from JSON object patterns.
- **maximum_sample_files** : object
Maximum number of files to sample for schema inference.
- **filename** : object
If True, include a column with the source filename for each row.
- **hive_partitioning** : object
If True, enable Hive partitioning based on directory structure.
- **union_by_name** : object
If True, align JSON columns by name instead of position.
- **hive_types** : object
If True, use Hive types from directory structure for schema.
- **hive_types_autocast** : object
If True, automatically cast data types to match Hive types.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
import json
with open("code_example.json", mode="w") as f:
json.dump([{'id': 1, "text":"a"}], f)
duckdb_conn = duckdb.connect()
rel = duckdb_conn.read_json("code_example.json")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┬─────────┐
│  id   │  text   │
│ int64 │ varchar │
├───────┼─────────┤
│     1 │ a       │
└───────┴─────────┘
```
----
###### `read_parquet` {#docs:stable:clients:python:relational_api::read_parquet}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
read_parquet(*args, **kwargs)
Overloaded function.
1. read_parquet(self: _duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
2. read_parquet(self: _duckdb.DuckDBPyConnection, file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, *, file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object from the Parquet files
**Aliases**: [`from_parquet`](#::from_parquet)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **file_glob** : str
File path or glob pattern pointing to Parquet files to be read.
- **binary_as_string** : bool, default: False
Interpret binary columns as strings instead of blobs.
- **file_row_number** : bool, default: False
Add a column containing the row number within each file.
- **filename** : bool, default: False
Add a column containing the name of the file each row came from.
- **hive_partitioning** : bool, default: False
Enable automatic detection of Hive-style partitions in file paths.
- **union_by_name** : bool, default: False
Union Parquet files by matching column names instead of positions.
- **compression** : object
Optional compression codec to use when reading the Parquet files.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
import pyarrow as pa
import pyarrow.parquet as pq
ids = pa.array([1], type=pa.int8())
texts = pa.array(['a'], type=pa.string())
example_table = pa.table([ids, texts], names=["id", "text"])
pq.write_table(example_table, "code_example.parquet")
duckdb_conn = duckdb.connect()
rel = duckdb_conn.read_parquet("code_example.parquet")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────┬─────────┐
│  id  │  text   │
│ int8 │ varchar │
├──────┼─────────┤
│    1 │ a       │
└──────┴─────────┘
```
----
###### `sql` {#docs:stable:clients:python:relational_api::sql}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
sql(self: _duckdb.DuckDBPyConnection, query: object, *, alias: str = '', params: object = None) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
**Aliases**: [`from_query`](#::from_query), [`query`](#::query)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **query** : object
The SQL query or subquery to be executed and converted into a relation.
- **alias** : str, default: ''
Optional alias name to assign to the resulting relation.
- **params** : object
Optional query parameters to be used in the SQL query.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("from range(1,2) tbl(id)")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
└───────┘
```
----
###### `table` {#docs:stable:clients:python:relational_api::table}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
table(self: _duckdb.DuckDBPyConnection, table_name: str) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object for the named table
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **table_name** : str
Name of the table to create a relation from.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
duckdb_conn.sql("create table code_example as select * from range(1,2) tbl(id)")
rel = duckdb_conn.table("code_example")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
└───────┘
```
----
###### `table_function` {#docs:stable:clients:python:relational_api::table_function}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
table_function(self: _duckdb.DuckDBPyConnection, name: str, parameters: object = None) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object from the named table function with given parameters
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **name** : str
Name of the table function to call.
- **parameters** : object
Optional parameters to pass to the table function.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
duckdb_conn.sql("""
create macro get_record_for(x) as table
select x*range from range(1,2)
""")
rel = duckdb_conn.table_function(name="get_record_for", parameters=[1])
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────┐
│ (1 * "range") │
│     int64     │
├───────────────┤
│             1 │
└───────────────┘
```
----
###### `values` {#docs:stable:clients:python:relational_api::values}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
values(self: _duckdb.DuckDBPyConnection, *args) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object from the passed values
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.values([1, 'a'])
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┬─────────┐
│ col0  │  col1   │
│ int32 │ varchar │
├───────┼─────────┤
│     1 │ a       │
└───────┴─────────┘
```
----
###### `view` {#docs:stable:clients:python:relational_api::view}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
view(self: _duckdb.DuckDBPyConnection, view_name: str) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create a relation object for the named view
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **view_name** : str
Name of the view to create a relation from.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
duckdb_conn.sql("create table code_example as select * from range(1,2) tbl(id)")
rel = duckdb_conn.view("code_example")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┐
│  id   │
│ int64 │
├───────┤
│     1 │
└───────┘
```
#### Relation Definition Details {#docs:stable:clients:python:relational_api::relation-definition-details-}
This section contains the details on how to inspect a relation.
| Name | Description |
|:--|:-------|
| [`alias`](#::alias) | Get the name of the current alias |
| [`columns`](#::columns) | Return a list containing the names of the columns of the relation. |
| [`describe`](#::describe) | Gives basic statistics (e.g., min, max) and if NULL exists for each column of the relation. |
| [`description`](#::description) | Return the description of the result |
| [`dtypes`](#::dtypes) | Return a list containing the types of the columns of the relation. |
| [`explain`](#::explain) | explain(self: _duckdb.DuckDBPyRelation, type: _duckdb.ExplainType = 'standard') -> str |
| [`query`](#::query-1) | Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object |
| [`set_alias`](#::set_alias) | Rename the relation object to new alias |
| [`shape`](#::shape) | Tuple of # of rows, # of columns in relation. |
| [`show`](#::show) | Display a summary of the data |
| [`sql_query`](#::sql_query) | Get the SQL query that is equivalent to the relation |
| [`type`](#::type) | Get the type of the relation. |
| [`types`](#::types) | Return a list containing the types of the columns of the relation. |
###### `alias` {#docs:stable:clients:python:relational_api::alias}
####### Description {#docs:stable:clients:python:relational_api::description}
Get the name of the current alias
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.alias
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
unnamed_relation_43c808c247431be5
```
----
###### `columns` {#docs:stable:clients:python:relational_api::columns}
####### Description {#docs:stable:clients:python:relational_api::description}
Return a list containing the names of the columns of the relation.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.columns
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
['id', 'description', 'value', 'created_timestamp']
```
----
###### `describe` {#docs:stable:clients:python:relational_api::describe}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
describe(self: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Gives basic statistics (e.g., min, max) and if NULL exists for each column of the relation.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.describe()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────┬──────────────────────────────────────┬─────────────────┬────────────────────┬────────────────────────────┐
│  aggr   │                  id                  │   description   │       value        │     created_timestamp      │
│ varchar │               varchar                │     varchar     │       double       │          varchar           │
├─────────┼──────────────────────────────────────┼─────────────────┼────────────────────┼────────────────────────────┤
│ count   │ 9                                    │ 9               │                9.0 │ 9                          │
│ mean    │ NULL                                 │ NULL            │                5.0 │ NULL                       │
│ stddev  │ NULL                                 │ NULL            │ 2.7386127875258306 │ NULL                       │
│ min     │ 08fdcbf8-4e53-4290-9e81-423af263b518 │ value is even   │                1.0 │ 2025-04-09 15:41:20.642+02 │
│ max     │ fb10390e-fad5-4694-91cb-e82728cb6f9f │ value is uneven │                9.0 │ 2025-04-09 15:49:20.642+02 │
│ median  │ NULL                                 │ NULL            │                5.0 │ NULL                       │
└─────────┴──────────────────────────────────────┴─────────────────┴────────────────────┴────────────────────────────┘
```
----
###### `description` {#docs:stable:clients:python:relational_api::description}
####### Description {#docs:stable:clients:python:relational_api::description}
Return the description of the result
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.description
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
[('id', 'UUID', None, None, None, None, None),
('description', 'STRING', None, None, None, None, None),
('value', 'NUMBER', None, None, None, None, None),
('created_timestamp', 'DATETIME', None, None, None, None, None)]
```
----
###### `dtypes` {#docs:stable:clients:python:relational_api::dtypes}
####### Description {#docs:stable:clients:python:relational_api::description}
Return a list containing the types of the columns of the relation.
**Aliases**: [`types`](#::types)
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.dtypes
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
[UUID, VARCHAR, BIGINT, TIMESTAMP WITH TIME ZONE]
```
----
###### `explain` {#docs:stable:clients:python:relational_api::explain}
####### Description {#docs:stable:clients:python:relational_api::description}
explain(self: _duckdb.DuckDBPyRelation, type: _duckdb.ExplainType = 'standard') -> str
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.explain()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             id            │
│        description        │
│           value           │
│     created_timestamp     │
│                           │
│          ~9 Rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           RANGE           │
│    ────────────────────   │
│      Function: RANGE      │
│                           │
│          ~9 Rows          │
└───────────────────────────┘
```
----
###### `query` {#docs:stable:clients:python:relational_api::query}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
query(self: _duckdb.DuckDBPyRelation, virtual_table_name: str, sql_query: str) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **virtual_table_name** : str
The name to assign to the current relation when referenced in the SQL query.
- **sql_query** : str
The SQL query string that uses the virtual table name to query the relation.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.query(virtual_table_name="rel_view", sql_query="from rel")
duckdb_conn.sql("show rel_view")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬──────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│    column_name    │       column_type        │  null   │   key   │ default │  extra  │
│      varchar      │         varchar          │ varchar │ varchar │ varchar │ varchar │
├───────────────────┼──────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ id                │ UUID                     │ YES     │ NULL    │ NULL    │ NULL    │
│ description       │ VARCHAR                  │ YES     │ NULL    │ NULL    │ NULL    │
│ value             │ BIGINT                   │ YES     │ NULL    │ NULL    │ NULL    │
│ created_timestamp │ TIMESTAMP WITH TIME ZONE │ YES     │ NULL    │ NULL    │ NULL    │
└───────────────────┴──────────────────────────┴─────────┴─────────┴─────────┴─────────┘
```
----
###### `set_alias` {#docs:stable:clients:python:relational_api::set_alias}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
set_alias(self: _duckdb.DuckDBPyRelation, alias: str) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Rename the relation object to new alias
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **alias** : str
The alias name to assign to the relation.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.set_alias('abc').select('abc.id')
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
In the SQL query, the alias will be `abc`
```
----
###### `shape` {#docs:stable:clients:python:relational_api::shape}
####### Description {#docs:stable:clients:python:relational_api::description}
Tuple of # of rows, # of columns in relation.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.shape
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
(9, 4)
```
----
###### `show` {#docs:stable:clients:python:relational_api::show}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
show(self: _duckdb.DuckDBPyRelation, *, max_width: typing.Optional[typing.SupportsInt] = None, max_rows: typing.Optional[typing.SupportsInt] = None, max_col_width: typing.Optional[typing.SupportsInt] = None, null_value: typing.Optional[str] = None, render_mode: object = None) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Display a summary of the data
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **max_width** : int
Maximum display width for the entire output in characters.
- **max_rows** : int
Maximum number of rows to display.
- **max_col_width** : int
Maximum number of characters to display per column.
- **null_value** : str
String to display in place of NULL values.
- **render_mode** : object
Render mode for displaying the output.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 642ea3d7-793d-4867-a759-91c1226c25a0 │ value is uneven │     1 │ 2025-04-09 15:41:20.642+02 │
│ 6817dd31-297c-40a8-8e40-8521f00b2d08 │ value is even   │     2 │ 2025-04-09 15:42:20.642+02 │
│ 45143f9a-e16e-4e59-91b2-3a0800eed6d6 │ value is uneven │     3 │ 2025-04-09 15:43:20.642+02 │
│ fb10390e-fad5-4694-91cb-e82728cb6f9f │ value is even   │     4 │ 2025-04-09 15:44:20.642+02 │
│ 111ced5c-9155-418e-b087-c331b814db90 │ value is uneven │     5 │ 2025-04-09 15:45:20.642+02 │
│ 66a870a6-aef0-4085-87d5-5d1b35d21c66 │ value is even   │     6 │ 2025-04-09 15:46:20.642+02 │
│ a7e8e796-bca0-44cd-a269-1d71090fb5cc │ value is uneven │     7 │ 2025-04-09 15:47:20.642+02 │
│ 74908d48-7f2d-4bdd-9c92-1e7920b115b5 │ value is even   │     8 │ 2025-04-09 15:48:20.642+02 │
│ 08fdcbf8-4e53-4290-9e81-423af263b518 │ value is uneven │     9 │ 2025-04-09 15:49:20.642+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```
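The display parameters can be combined; a small sketch (with a hypothetical two-column relation) that limits the rendered rows and replaces `NULL`s:
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as id, if(range % 2 = 0, 'even', null) as note from range(1, 100)")
# render at most 4 rows and display NULL values as 'n/a'
rel.show(max_rows=4, null_value="n/a")
```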
----
###### `sql_query` {#docs:stable:clients:python:relational_api::sql_query}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
sql_query(self: _duckdb.DuckDBPyRelation) -> str
```
####### Description {#docs:stable:clients:python:relational_api::description}
Get the SQL query that is equivalent to the relation
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.sql_query()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```sql
SELECT
gen_random_uuid() AS id,
concat('value is ', CASE WHEN ((mod("range", 2) = 0)) THEN ('even') ELSE 'uneven' END) AS description,
"range" AS "value",
(now() + CAST(concat("range", ' ', 'minutes') AS INTERVAL)) AS created_timestamp
FROM "range"(1, 10)
```
----
###### `type` {#docs:stable:clients:python:relational_api::type}
####### Description {#docs:stable:clients:python:relational_api::description}
Get the type of the relation.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.type
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
QUERY_RELATION
```
----
###### `types` {#docs:stable:clients:python:relational_api::types}
####### Description {#docs:stable:clients:python:relational_api::description}
Return a list containing the types of the columns of the relation.
**Aliases**: [`dtypes`](#::dtypes)
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.types
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
[UUID, VARCHAR, BIGINT, TIMESTAMP WITH TIME ZONE]
```
#### Transformation {#docs:stable:clients:python:relational_api::transformation-}
This section contains the methods which can be used to chain queries. The methods are [lazily evaluated](#::lazy-evaluation).
| Name | Description |
|:--|:-------|
| [`aggregate`](#::aggregate) | Compute the aggregate aggr_expr by the optional groups group_expr on the relation |
| [`apply`](#::apply) | Compute the function of a single column or a list of columns by the optional groups on the relation |
| [`cross`](#::cross) | Create cross/cartesian product of two relational objects |
| [`except_`](#::except_) | Create the set except of this relation object with another relation object in other_rel |
| [`filter`](#::filter) | Filter the relation object by the filter in filter_expr |
| [`insert`](#::insert) | Inserts the given values into the relation |
| [`insert_into`](#::insert_into) | Inserts the relation object into an existing table named table_name |
| [`intersect`](#::intersect) | Create the set intersection of this relation object with another relation object in other_rel |
| [`join`](#::join) | Join the relation object with another relation object in other_rel using the join condition expression in join_condition. Types supported are 'inner', 'left', 'right', 'outer', 'semi' and 'anti' |
| [`limit`](#::limit) | Only retrieve the first n rows from this relation object, starting at offset |
| [`map`](#::map) | Calls the passed function on the relation |
| [`order`](#::order) | Reorder the relation object by order_expr |
| [`project`](#::project) | Project the relation object by the projection in project_expr |
| [`select`](#::select) | Project the relation object by the projection in project_expr |
| [`sort`](#::sort) | Reorder the relation object by the provided expressions |
| [`union`](#::union) | Create the set union of this relation object with another relation object in other_rel |
| [`update`](#::update) | Update the given relation with the provided expressions |
###### `aggregate` {#docs:stable:clients:python:relational_api::aggregate}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
aggregate(self: _duckdb.DuckDBPyRelation, aggr_expr: object, group_expr: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Compute the aggregate aggr_expr by the optional groups group_expr on the relation
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **aggr_expr** : str, list[Expression]
The list of columns and aggregation functions.
- **group_expr** : str, default: ''
The list of columns to group by. If no groups are provided, `group by all` is applied.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.aggregate('max(value)')
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────┐
│ max("value") │
│    int64     │
├──────────────┤
│            9 │
└──────────────┘
```
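Passing `group_expr` groups the aggregation; a minimal sketch reusing a simplified version of the relation above:
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
    select
        concat('value is ', case when mod(range, 2) = 0 then 'even' else 'uneven' end) as description,
        range as value
    from range(1, 10)
""")
# one row per description, with the maximum value of each group
rel.aggregate("description, max(value) as max_value", "description")
```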
----
###### `apply` {#docs:stable:clients:python:relational_api::apply}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
apply(self: _duckdb.DuckDBPyRelation, function_name: str, function_aggr: str, group_expr: str = '', function_parameter: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Compute the function of a single column or a list of columns by the optional groups on the relation
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **function_name** : str
Name of the function to apply over the relation.
- **function_aggr** : str
The list of columns to apply the function over.
- **group_expr** : str, default: ''
Optional SQL expression for grouping.
- **function_parameter** : str, default: ''
Optional parameters to pass into the function.
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.apply(
function_name="count",
function_aggr="id",
group_expr="description",
projected_columns="description"
)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────────┐
│   description   │ count(id) │
│     varchar     │   int64   │
├─────────────────┼───────────┤
│ value is uneven │         5 │
│ value is even   │         4 │
└─────────────────┴───────────┘
```
----
###### `cross` {#docs:stable:clients:python:relational_api::cross}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
cross(self: _duckdb.DuckDBPyRelation, other_rel: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create cross/cartesian product of two relational objects
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **other_rel** : duckdb.duckdb.DuckDBPyRelation
Another relation to perform a cross product with.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.cross(other_rel=rel.set_alias("other_rel"))
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌────────────────────────────┬─────────────────┬───────┬───────────────────────────┬──────────────────────────────────────┬─────────────────┬───────┬───────────────────────────┐
│             id             │   description   │ value │     created_timestamp     │                  id                  │   description   │ value │     created_timestamp     │
│            uuid            │     varchar     │ int64 │ timestamp with time zone  │                 uuid                 │     varchar     │ int64 │ timestamp with time zone  │
├────────────────────────────┼─────────────────┼───────┼───────────────────────────┼──────────────────────────────────────┼─────────────────┼───────┼───────────────────────────┤
│ cb2b453f-1a06-4f5e-abe1-b… │ value is uneven │     1 │ 2025-04-10 09:53:29.78+02 │ cb2b453f-1a06-4f5e-abe1-bfd413581bcf │ value is uneven │     1 │ 2025-04-10 09:53:29.78+02 │
...
```
----
###### `except_` {#docs:stable:clients:python:relational_api::except_}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
except_(self: _duckdb.DuckDBPyRelation, other_rel: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create the set except of this relation object with another relation object in other_rel
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **other_rel** : duckdb.duckdb.DuckDBPyRelation
The relation to subtract from the current relation (set difference).
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.except_(other_rel=rel.set_alias("other_rel"))
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
The relation query is executed twice, therefore generating different ids and timestamps:
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ f69ed6dd-a7fe-4de2-b6af-1c2418096d69 │ value is uneven │     3 │ 2025-04-10 11:43:05.711+02 │
│ 08ad11dc-a9c2-4aaa-9272-760b27ad1f5d │ value is uneven │     7 │ 2025-04-10 11:47:05.711+02 │
...
```
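Because the example relation is regenerated on every execution, the difference still contains rows; with deterministic inputs the set semantics are easier to see, as in this small sketch:
```python
import duckdb
duckdb_conn = duckdb.connect()
rel1 = duckdb_conn.sql("select range as id from range(1, 10)")
rel2 = duckdb_conn.sql("select range as id from range(5, 10)")
# rows of rel1 that are not present in rel2, i.e., ids 1 to 4
rel1.except_(other_rel=rel2)
```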
----
###### `filter` {#docs:stable:clients:python:relational_api::filter}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
filter(self: _duckdb.DuckDBPyRelation, filter_expr: object) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Filter the relation object by the filter in filter_expr
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **filter_expr** : str, Expression
The filter expression to apply over the relation.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.filter("value = 2")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ b0684ab7-fcbf-41c5-8e4a-a51bdde86926 │ value is even   │     2 │ 2025-04-10 09:54:29.78+02  │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```
----
###### `insert` {#docs:stable:clients:python:relational_api::insert}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
insert(self: _duckdb.DuckDBPyRelation, values: object) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Inserts the given values into the relation
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **values** : object
A tuple of values matching the relation column list, to be inserted.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
from datetime import datetime
from uuid import uuid4
duckdb_conn = duckdb.connect()
duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
).to_table("code_example")
rel = duckdb_conn.table("code_example")
rel.insert(
(
uuid4(),
'value is even',
10,
datetime.now()
)
)
rel.filter("value = 10")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬───────────────────────────────┐
│                  id                  │   description   │ value │       created_timestamp       │
│                 uuid                 │     varchar     │ int64 │   timestamp with time zone    │
├──────────────────────────────────────┼─────────────────┼───────┼───────────────────────────────┤
│ c6dfab87-fae6-4213-8f76-1b96a8d179f6 │ value is even   │    10 │ 2025-04-10 10:02:24.652218+02 │
└──────────────────────────────────────┴─────────────────┴───────┴───────────────────────────────┘
```
----
###### `insert_into` {#docs:stable:clients:python:relational_api::insert_into}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
insert_into(self: _duckdb.DuckDBPyRelation, table_name: str) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Inserts the relation object into an existing table named table_name
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **table_name** : str
The table name to insert the data into. The relation must respect the column order of the table.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
from datetime import datetime
from uuid import uuid4
duckdb_conn = duckdb.connect()
duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
).to_table("code_example")
rel = duckdb_conn.values(
[
uuid4(),
'value is even',
10,
datetime.now()
]
)
rel.insert_into("code_example")
duckdb_conn.table("code_example").filter("value = 10")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬───────────────────────────────┐
│                  id                  │   description   │ value │       created_timestamp       │
│                 uuid                 │     varchar     │ int64 │   timestamp with time zone    │
├──────────────────────────────────────┼─────────────────┼───────┼───────────────────────────────┤
│ 271c5ddd-c1d5-4638-b5a0-d8c7dc9e8220 │ value is even   │    10 │ 2025-04-10 14:29:18.616379+02 │
└──────────────────────────────────────┴─────────────────┴───────┴───────────────────────────────┘
```
----
###### `intersect` {#docs:stable:clients:python:relational_api::intersect}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
intersect(self: _duckdb.DuckDBPyRelation, other_rel: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create the set intersection of this relation object with another relation object in other_rel
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **other_rel** : duckdb.duckdb.DuckDBPyRelation
The relation to intersect with the current relation (set intersection).
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.intersect(other_rel=rel.set_alias("other_rel"))
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
The relation query is executed once with `rel` and once with `other_rel`,
therefore generating different ids and timestamps:
┌──────┬─────────────┬───────┬──────────────────────────┐
│  id  │ description │ value │    created_timestamp     │
│ uuid │   varchar   │ int64 │ timestamp with time zone │
├──────┴─────────────┴───────┴──────────────────────────┤
│                         0 rows                         │
└────────────────────────────────────────────────────────┘
```
----
###### `join` {#docs:stable:clients:python:relational_api::join}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
join(self: _duckdb.DuckDBPyRelation, other_rel: _duckdb.DuckDBPyRelation, condition: object, how: str = 'inner') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Join the relation object with another relation object in other_rel using the join condition expression in join_condition. Types supported are 'inner', 'left', 'right', 'outer', 'semi' and 'anti'
Depending on how the `condition` parameter is provided, the JOIN clause generated is:
- `USING`
```python
import duckdb
duckdb_conn = duckdb.connect()
rel1 = duckdb_conn.sql("select range as id, concat('dummy 1', range) as text from range(1,10)")
rel2 = duckdb_conn.sql("select range as id, concat('dummy 2', range) as text from range(5,7)")
rel1.join(rel2, condition="id", how="inner").sql_query()
```
with the following SQL:
```sql
SELECT *
FROM (
SELECT "range" AS id,
concat('dummy 1', "range") AS "text"
FROM "range"(1, 10)
) AS unnamed_relation_41bc15e744037078
INNER JOIN (
SELECT "range" AS id,
concat('dummy 2', "range") AS "text"
FROM "range"(5, 7)
) AS unnamed_relation_307e245965aa2c2b
USING (id)
```
- `ON`
```python
import duckdb
duckdb_conn = duckdb.connect()
rel1 = duckdb_conn.sql("select range as id, concat('dummy 1', range) as text from range(1,10)")
rel2 = duckdb_conn.sql("select range as id, concat('dummy 2', range) as text from range(5,7)")
rel1.join(rel2, condition=f"{rel1.alias}.id = {rel2.alias}.id", how="inner").sql_query()
```
with the following SQL:
```sql
SELECT *
FROM (
SELECT "range" AS id,
concat('dummy 1', "range") AS "text"
FROM "range"(1, 10)
) AS unnamed_relation_41bc15e744037078
INNER JOIN (
SELECT "range" AS id,
concat('dummy 2', "range") AS "text"
FROM "range"(5, 7)
) AS unnamed_relation_307e245965aa2c2b
ON ((unnamed_relation_41bc15e744037078.id = unnamed_relation_307e245965aa2c2b.id))
```
> `NATURAL`, `POSITIONAL` and `ASOF` joins are not provided by the relational API.
> `CROSS` joins are provided through the [cross method](#::cross).
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **other_rel** : duckdb.duckdb.DuckDBPyRelation
The relation to join with the current relation.
- **condition** : object
The join condition, typically a SQL expression or the name of a column that is present in both relations.
- **how** : str, default: 'inner'
The type of join to perform: 'inner', 'left', 'right', 'outer', 'semi' and 'anti'.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.set_alias("rel").join(
other_rel=rel.set_alias("other_rel"),
condition="rel.id = other_rel.id",
how="left"
)
rel.count("*")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│            9 │
└──────────────┘
```
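The `how` parameter selects the join type; for example, a semi join keeps only the rows of the left relation that have a match in the right one. A small sketch with two deterministic relations:
```python
import duckdb
duckdb_conn = duckdb.connect()
rel1 = duckdb_conn.sql("select range as id from range(1, 10)")
rel2 = duckdb_conn.sql("select range as id from range(5, 7)")
# keep the rows of rel1 whose id also appears in rel2 (ids 5 and 6)
rel1.join(rel2, condition="id", how="semi")
```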
----
###### `limit` {#docs:stable:clients:python:relational_api::limit}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
limit(self: _duckdb.DuckDBPyRelation, n: typing.SupportsInt, offset: typing.SupportsInt = 0) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Only retrieve the first n rows from this relation object, starting at offset
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **n** : int
The maximum number of rows to return.
- **offset** : int, default: 0
The number of rows to skip before starting to return rows.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.limit(1)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 4135597b-29e7-4cb9-a443-41f3d54f25df │ value is uneven │     1 │ 2025-04-10 10:52:03.678+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```
----
###### `map` {#docs:stable:clients:python:relational_api::map}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
map(self: _duckdb.DuckDBPyRelation, map_function: collections.abc.Callable, *, schema: typing.Optional[object] = None) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Calls the passed function on the relation
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **map_function** : Callable
A Python function that takes a DataFrame and returns a transformed DataFrame.
- **schema** : object, default: None
Optional schema describing the structure of the output relation.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
from pandas import DataFrame
def multiply_by_2(df: DataFrame):
df["id"] = df["id"] * 2
return df
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as id, 'dummy' as text from range(1,3)")
rel.map(multiply_by_2, schema={"id": int, "text": str})
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┬─────────┐
│  id   │  text   │
│ int64 │ varchar │
├───────┼─────────┤
│     2 │ dummy   │
│     4 │ dummy   │
└───────┴─────────┘
```
----
###### `order` {#docs:stable:clients:python:relational_api::order}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
order(self: _duckdb.DuckDBPyRelation, order_expr: str) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Reorder the relation object by order_expr
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **order_expr** : str
SQL expression defining the ordering of the result rows.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.order("value desc").limit(1, offset=4)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 55899131-e3d3-463c-a215-f65cb8aef3bf │ value is uneven │     5 │ 2025-04-10 10:56:03.678+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```
----
###### `project` {#docs:stable:clients:python:relational_api::project}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
project(self: _duckdb.DuckDBPyRelation, *args, groups: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Project the relation object by the projection in project_expr
**Aliases**: [`select`](#::select)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.project("description").limit(1)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┐
│   description   │
│     varchar     │
├─────────────────┤
│ value is uneven │
└─────────────────┘
```
----
###### `select` {#docs:stable:clients:python:relational_api::select}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
select(self: _duckdb.DuckDBPyRelation, *args, groups: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Project the relation object by the projection in project_expr
**Aliases**: [`project`](#::project)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select("description").limit(1)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┐
│   description   │
│     varchar     │
├─────────────────┤
│ value is uneven │
└─────────────────┘
```
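Besides SQL strings, expression objects can be passed as well; a small sketch using `ColumnExpression` (also used in the [`update`](#::update) example):
```python
import duckdb
from duckdb import ColumnExpression
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as id, concat('dummy ', range) as description from range(1, 4)")
# select a column through an expression object instead of a SQL string
rel.select(ColumnExpression("description"))
```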
----
###### `sort` {#docs:stable:clients:python:relational_api::sort}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
sort(self: _duckdb.DuckDBPyRelation, *args) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Reorder the relation object by the provided expressions
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.sort("description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 5e0dfa8c-de4d-4ccd-8cff-450dabb86bde │ value is even   │     6 │ 2025-04-10 16:52:15.605+02 │
│ 95f1ad48-facf-4a84-a971-0a4fecce68c7 │ value is even   │     2 │ 2025-04-10 16:48:15.605+02 │
...
```
----
###### `union` {#docs:stable:clients:python:relational_api::union}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
union(self: _duckdb.DuckDBPyRelation, union_rel: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Create the set union of this relation object with another relation object in other_rel
> The union is a `union all`. To retrieve distinct values, apply [distinct](#::distinct).
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **union_rel** : duckdb.duckdb.DuckDBPyRelation
The relation to union with the current relation (set union).
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.union(union_rel=rel)
rel.count("*")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│           18 │
└──────────────┘
```
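Since the result keeps duplicates, chaining [`distinct`](#::distinct) removes them; a minimal sketch:
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as value from range(1, 10)")
# union all yields 18 rows; distinct reduces them to the 9 unique values
rel.union(union_rel=rel).distinct().count("*")
```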
----
###### `update` {#docs:stable:clients:python:relational_api::update}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
update(self: _duckdb.DuckDBPyRelation, set: object, *, condition: object = None) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Update the given relation with the provided expressions
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **set** : object
Mapping of columns to new values for the update operation.
- **condition** : object, default: None
Optional condition to filter which rows to update.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
from duckdb import ColumnExpression
duckdb_conn = duckdb.connect()
duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
).to_table("code_example")
rel = duckdb_conn.table("code_example")
rel.update(set={"description":None}, condition=ColumnExpression("value") == 1)
# the update is executed on the table, but it is not reflected in the relation;
# the relation has to be recreated to retrieve the modified data
rel = duckdb_conn.table("code_example")
rel.show()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 66dcaa14-f4a6-4a55-af3b-7f6aa23ab4ad │ NULL            │     1 │ 2025-04-10 16:54:49.317+02 │
│ c6a18a42-67fb-4c95-827b-c966f2f95b88 │ value is even   │     2 │ 2025-04-10 16:55:49.317+02 │
...
```
#### Functions {#docs:stable:clients:python:relational_api::functions-}
This section contains the functions which can be applied to a relation in order to obtain a (scalar) result. The functions are [lazily evaluated](#::lazy-evaluation).
| Name | Description |
|:--|:-------|
| [`any_value`](#::any_value) | Returns the first non-null value from a given column |
| [`arg_max`](#::arg_max) | Finds the row with the maximum value for a value column and returns the value of that row for an argument column |
| [`arg_min`](#::arg_min) | Finds the row with the minimum value for a value column and returns the value of that row for an argument column |
| [`avg`](#::avg) | Computes the average on a given column |
| [`bit_and`](#::bit_and) | Computes the bitwise AND of all bits present in a given column |
| [`bit_or`](#::bit_or) | Computes the bitwise OR of all bits present in a given column |
| [`bit_xor`](#::bit_xor) | Computes the bitwise XOR of all bits present in a given column |
| [`bitstring_agg`](#::bitstring_agg) | Computes a bitstring with bits set for each distinct value in a given column |
| [`bool_and`](#::bool_and) | Computes the logical AND of all values present in a given column |
| [`bool_or`](#::bool_or) | Computes the logical OR of all values present in a given column |
| [`count`](#::count) | Computes the number of elements present in a given column |
| [`cume_dist`](#::cume_dist) | Computes the cumulative distribution within the partition |
| [`dense_rank`](#::dense_rank) | Computes the dense rank within the partition |
| [`distinct`](#::distinct) | Retrieve distinct rows from this relation object |
| [`favg`](#::favg) | Computes the average of all values present in a given column using a more accurate floating point summation (Kahan Sum) |
| [`first`](#::first) | Returns the first value of a given column |
| [`first_value`](#::first_value) | Computes the first value within the group or partition |
| [`fsum`](#::fsum) | Computes the sum of all values present in a given column using a more accurate floating point summation (Kahan Sum) |
| [`geomean`](#::geomean) | Computes the geometric mean over all values present in a given column |
| [`histogram`](#::histogram) | Computes the histogram over all values present in a given column |
| [`lag`](#::lag) | Computes the lag within the partition |
| [`last`](#::last) | Returns the last value of a given column |
| [`last_value`](#::last_value) | Computes the last value within the group or partition |
| [`lead`](#::lead) | Computes the lead within the partition |
| [`list`](#::list) | Returns a list containing all values present in a given column |
| [`max`](#::max) | Returns the maximum value present in a given column |
| [`mean`](#::mean) | Computes the average on a given column |
| [`median`](#::median) | Computes the median over all values present in a given column |
| [`min`](#::min) | Returns the minimum value present in a given column |
| [`mode`](#::mode) | Computes the mode over all values present in a given column |
| [`n_tile`](#::n_tile) | Divides the partition as equally as possible into num_buckets |
| [`nth_value`](#::nth_value) | Computes the nth value within the partition |
| [`percent_rank`](#::percent_rank) | Computes the relative rank within the partition |
| [`product`](#::product) | Returns the product of all values present in a given column |
| [`quantile`](#::quantile) | Computes the exact quantile value for a given column |
| [`quantile_cont`](#::quantile_cont) | Computes the interpolated quantile value for a given column |
| [`quantile_disc`](#::quantile_disc) | Computes the exact quantile value for a given column |
| [`rank`](#::rank) | Computes the rank within the partition |
| [`rank_dense`](#::rank_dense) | Computes the dense rank within the partition |
| [`row_number`](#::row_number) | Computes the row number within the partition |
| [`select_dtypes`](#::select_dtypes) | Select columns from the relation, by filtering based on type(s) |
| [`select_types`](#::select_types) | Select columns from the relation, by filtering based on type(s) |
| [`std`](#::std) | Computes the sample standard deviation for a given column |
| [`stddev`](#::stddev) | Computes the sample standard deviation for a given column |
| [`stddev_pop`](#::stddev_pop) | Computes the population standard deviation for a given column |
| [`stddev_samp`](#::stddev_samp) | Computes the sample standard deviation for a given column |
| [`string_agg`](#::string_agg) | Concatenates the values present in a given column with a separator |
| [`sum`](#::sum) | Computes the sum of all values present in a given column |
| [`unique`](#::unique) | Returns the distinct values in a column. |
| [`value_counts`](#::value_counts) | Computes the number of elements present in a given column, also projecting the original column |
| [`var`](#::var) | Computes the sample variance for a given column |
| [`var_pop`](#::var_pop) | Computes the population variance for a given column |
| [`var_samp`](#::var_samp) | Computes the sample variance for a given column |
| [`variance`](#::variance) | Computes the sample variance for a given column |
###### `any_value` {#docs:stable:clients:python:relational_api::any_value}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
any_value(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Returns the first non-null value from a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name from which to retrieve any value.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`.
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.any_value('id')
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┐
│            any_value(id)             │
│                 uuid                 │
├──────────────────────────────────────┤
│ 642ea3d7-793d-4867-a759-91c1226c25a0 │
└──────────────────────────────────────┘
```
----
###### `arg_max` {#docs:stable:clients:python:relational_api::arg_max}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
arg_max(self: _duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Finds the row with the maximum value for a value column and returns the value of that row for an argument column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **arg_column** : str
The column name for which to find the argument maximizing the value.
- **value_column** : str
The column name containing values used to determine the maximum.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`.
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.arg_max(arg_column="value", value_column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────────────────────────┐
│   description   │ arg_max("value", "value") │
│     varchar     │           int64           │
├─────────────────┼───────────────────────────┤
│ value is uneven │                         9 │
│ value is even   │                         8 │
└─────────────────┴───────────────────────────┘
```
----
###### `arg_min` {#docs:stable:clients:python:relational_api::arg_min}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
arg_min(self: _duckdb.DuckDBPyRelation, arg_column: str, value_column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Finds the row with the minimum value for a value column and returns the value of that row for an argument column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **arg_column** : str
The column name for which to find the argument minimizing the value.
- **value_column** : str
The column name containing values used to determine the minimum.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.arg_min(arg_column="value", value_column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────────────────────────┐
│   description   │ arg_min("value", "value") │
│     varchar     │           int64           │
├─────────────────┼───────────────────────────┤
│ value is even   │                         2 │
│ value is uneven │                         1 │
└─────────────────┴───────────────────────────┘
```
----
###### `avg` {#docs:stable:clients:python:relational_api::avg}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
avg(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the average on a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the average on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.avg('value')
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────┐
│ avg("value") │
│    double    │
├──────────────┤
│          5.0 │
└──────────────┘
```
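The `window_spec` parameter turns the aggregate into a window function; a small sketch computing a running average per group, assuming the same relation definition as above:
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
    select
        concat('value is ', case when mod(range, 2) = 0 then 'even' else 'uneven' end) as description,
        range as value
    from range(1, 10)
""")
# running average of value within each description group, ordered by value
rel.avg(
    column="value",
    window_spec="over (partition by description order by value)",
    projected_columns="description, value"
)
```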
----
###### `bit_and` {#docs:stable:clients:python:relational_api::bit_and}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
bit_and(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the bitwise AND of all bits present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to perform the bitwise AND aggregation on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, value::bit as value_bit")
rel.bit_and(column="value_bit", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│   description   │                        bit_and(value_bit)                        │
│     varchar     │                               bit                                │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000000001 │
│ value is even   │ 0000000000000000000000000000000000000000000000000000000000000000 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
```
----
###### `bit_or` {#docs:stable:clients:python:relational_api::bit_or}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
bit_or(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the bitwise OR of all bits present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to perform the bitwise OR aggregation on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, value::bit as value_bit")
rel.bit_or(column="value_bit", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│   description   │                        bit_or(value_bit)                         │
│     varchar     │                               bit                                │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000001111 │
│ value is even   │ 0000000000000000000000000000000000000000000000000000000000001110 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
```
----
###### `bit_xor` {#docs:stable:clients:python:relational_api::bit_xor}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
bit_xor(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the bitwise XOR of all bits present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to perform the bitwise XOR aggregation on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, value::bit as value_bit")
rel.bit_xor(column="value_bit", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────────────────────────────────────────────────────────┐
│   description   │                        bit_xor(value_bit)                        │
│     varchar     │                               bit                                │
├─────────────────┼──────────────────────────────────────────────────────────────────┤
│ value is even   │ 0000000000000000000000000000000000000000000000000000000000001000 │
│ value is uneven │ 0000000000000000000000000000000000000000000000000000000000001001 │
└─────────────────┴──────────────────────────────────────────────────────────────────┘
```
----
###### `bitstring_agg` {#docs:stable:clients:python:relational_api::bitstring_agg}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
bitstring_agg(self: _duckdb.DuckDBPyRelation, column: str, min: typing.Optional[object] = None, max: typing.Optional[object] = None, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes a bitstring with bits set for each distinct value in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to aggregate as a bitstring.
- **min** : object, default: None
Optional minimum value of the range over which the bitstring is built.
- **max** : object, default: None
Optional maximum value of the range over which the bitstring is built.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.bitstring_agg(column="value", groups="description", projected_columns="description", min=1, max=9)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬────────────────────────┐
│   description   │ bitstring_agg("value") │
│     varchar     │          bit           │
├─────────────────┼────────────────────────┤
│ value is uneven │ 101010101              │
│ value is even   │ 010101010              │
└─────────────────┴────────────────────────┘
```
----
###### `bool_and` {#docs:stable:clients:python:relational_api::bool_and}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
bool_and(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the logical AND of all values present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to perform the boolean AND aggregation on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, mod(value,2)::boolean as uneven")
rel.bool_and(column="uneven", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────────┐
│   description   │ bool_and(uneven) │
│     varchar     │     boolean      │
├─────────────────┼──────────────────┤
│ value is even   │ false            │
│ value is uneven │ true             │
└─────────────────┴──────────────────┘
```
----
###### `bool_or` {#docs:stable:clients:python:relational_api::bool_or}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
bool_or(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the logical OR of all values present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to perform the boolean OR aggregation on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel = rel.select("description, mod(value,2)::boolean as uneven")
rel.bool_or(column="uneven", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬─────────────────┐
│   description   │ bool_or(uneven) │
│     varchar     │     boolean     │
├─────────────────┼─────────────────┤
│ value is even   │ false           │
│ value is uneven │ true            │
└─────────────────┴─────────────────┘
```
----
###### `count` {#docs:stable:clients:python:relational_api::count}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
count(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the number of elements present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to perform count on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.count("id")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────┐
│ count(id) │
│   int64   │
├───────────┤
│         9 │
└───────────┘
```
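As with the other aggregate methods, `groups` and `projected_columns` can be combined for a per-group count; a short sketch:

```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
    select
        range as value,
        case when mod(range, 2) = 0 then 'even' else 'uneven' end as description
    from range(1, 10)
""")
# One output row per description, containing the number of rows in that group.
rel.count("value", groups="description", projected_columns="description")
```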
----
###### `cume_dist` {#docs:stable:clients:python:relational_api::cume_dist}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
cume_dist(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the cumulative distribution within the partition
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **window_spec** : str
The window specification for the window function, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.cume_dist(window_spec="over (partition by description order by value)", projected_columns="description, value")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────┬──────────────────────────────────────────────────────────────┐
│   description   │ value │ cume_dist() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                            double                            │
├─────────────────┼───────┼──────────────────────────────────────────────────────────────┤
│ value is uneven │     1 │                                                          0.2 │
│ value is uneven │     3 │                                                          0.4 │
│ value is uneven │     5 │                                                          0.6 │
│ value is uneven │     7 │                                                          0.8 │
│ value is uneven │     9 │                                                          1.0 │
│ value is even   │     2 │                                                         0.25 │
│ value is even   │     4 │                                                          0.5 │
│ value is even   │     6 │                                                         0.75 │
│ value is even   │     8 │                                                          1.0 │
└─────────────────┴───────┴──────────────────────────────────────────────────────────────┘
```
----
###### `dense_rank` {#docs:stable:clients:python:relational_api::dense_rank}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
dense_rank(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the dense rank within the partition
**Aliases**: [`rank_dense`](#::rank_dense)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **window_spec** : str
The window specification for the window function, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.dense_rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│   description   │ value │ dense_rank() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                             int64                             │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is even   │     2 │                                                             1 │
│ value is even   │     4 │                                                             2 │
│ value is even   │     6 │                                                             3 │
│ value is even   │     8 │                                                             4 │
│ value is uneven │     1 │                                                             1 │
│ value is uneven │     3 │                                                             2 │
│ value is uneven │     5 │                                                             3 │
│ value is uneven │     7 │                                                             4 │
│ value is uneven │     9 │                                                             5 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
```
----
###### `distinct` {#docs:stable:clients:python:relational_api::distinct}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
distinct(self: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Retrieve distinct rows from this relation object
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range from range(1,4)")
rel = rel.union(union_rel=rel)
rel.distinct().order("range")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────┐
│ range │
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
└───────┘
```
----
###### `favg` {#docs:stable:clients:python:relational_api::favg}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
favg(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the average of all values present in a given column using a more accurate floating point summation (Kahan Sum)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the average on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.favg(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────────────┐
│   description   │ favg("value") │
│     varchar     │    double     │
├─────────────────┼───────────────┤
│ value is uneven │           5.0 │
│ value is even   │           5.0 │
└─────────────────┴───────────────┘
```
----
###### `first` {#docs:stable:clients:python:relational_api::first}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
first(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Returns the first value of a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name from which to retrieve the first value.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.first(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────────┐
│   description   │ "first"("value") │
│     varchar     │      int64       │
├─────────────────┼──────────────────┤
│ value is even   │                2 │
│ value is uneven │                1 │
└─────────────────┴──────────────────┘
```
----
###### `first_value` {#docs:stable:clients:python:relational_api::first_value}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
first_value(self: _duckdb.DuckDBPyRelation, column: str, window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the first value within the group or partition
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name from which to retrieve the first value.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.first_value(column="value", window_spec="over (partition by description order by value)", projected_columns="description").distinct()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────────────────────────────────────────────────────────────────────┐
│   description   │ first_value("value") OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │                                 int64                                 │
├─────────────────┼───────────────────────────────────────────────────────────────────────┤
│ value is even   │                                                                     2 │
│ value is uneven │                                                                     1 │
└─────────────────┴───────────────────────────────────────────────────────────────────────┘
```
----
###### `fsum` {#docs:stable:clients:python:relational_api::fsum}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
fsum(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the sum of all values present in a given column using a more accurate floating point summation (Kahan Sum)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the sum on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fsum(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────────────┐
│   description   │ fsum("value") │
│     varchar     │    double     │
├─────────────────┼───────────────┤
│ value is even   │          20.0 │
│ value is uneven │          25.0 │
└─────────────────┴───────────────┘
```
----
###### `geomean` {#docs:stable:clients:python:relational_api::geomean}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
geomean(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the geometric mean over all values present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the geometric mean on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.geomean(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────────────────┐
│   description   │ geomean("value")  │
│     varchar     │      double       │
├─────────────────┼───────────────────┤
│ value is uneven │ 3.936283427035351 │
│ value is even   │ 4.426727678801287 │
└─────────────────┴───────────────────┘
```
----
###### `histogram` {#docs:stable:clients:python:relational_api::histogram}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
histogram(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the histogram over all values present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the histogram on.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.histogram(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────────────────────────┐
│   description   │    histogram("value")     │
│     varchar     │   map(bigint, ubigint)    │
├─────────────────┼───────────────────────────┤
│ value is uneven │ {1=1, 3=1, 5=1, 7=1, 9=1} │
│ value is even   │ {2=1, 4=1, 6=1, 8=1}      │
└─────────────────┴───────────────────────────┘
```
----
###### `lag` {#docs:stable:clients:python:relational_api::lag}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
lag(self: _duckdb.DuckDBPyRelation, column: str, window_spec: str, offset: typing.SupportsInt = 1, default_value: str = 'NULL', ignore_nulls: bool = False, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the lag within the partition
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to apply the lag function on.
- **window_spec** : str
The window specification for the window function, provided as `over (partition by ... order by ...)`
- **offset** : int, default: 1
The number of rows to lag behind.
- **default_value** : str, default: 'NULL'
The default value to return when the lag offset goes out of bounds.
- **ignore_nulls** : bool, default: False
Whether to ignore NULL values when computing the lag.
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.lag(column="description", window_spec="over (order by value)", projected_columns="description, value")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────┬───────────────────────────────────────────────────┐
│   description   │ value │ lag(description, 1, NULL) OVER (ORDER BY "value") │
│     varchar     │ int64 │                      varchar                      │
├─────────────────┼───────┼───────────────────────────────────────────────────┤
│ value is uneven │     1 │ NULL                                              │
│ value is even   │     2 │ value is uneven                                   │
│ value is uneven │     3 │ value is even                                     │
│ value is even   │     4 │ value is uneven                                   │
│ value is uneven │     5 │ value is even                                     │
│ value is even   │     6 │ value is uneven                                   │
│ value is uneven │     7 │ value is even                                     │
│ value is even   │     8 │ value is uneven                                   │
│ value is uneven │     9 │ value is even                                     │
└─────────────────┴───────┴───────────────────────────────────────────────────┘
```
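The `offset` and `default_value` arguments map onto the corresponding arguments of the SQL `lag` function; judging by its `'NULL'` default, `default_value` is passed through as a SQL expression rather than a Python value. A sketch that looks two rows back and substitutes `-1` where the offset runs off the start of the window:

```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as value from range(1, 10)")
# Look two rows back; the first two rows have no lagged value and get -1 instead.
rel.lag(
    column="value",
    window_spec="over (order by value)",
    offset=2,
    default_value="-1",
    projected_columns="value",
)
```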
----
###### `last` {#docs:stable:clients:python:relational_api::last}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
last(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Returns the last value of a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name from which to retrieve the last value.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.last(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬─────────────────┐
│   description   │ "last"("value") │
│     varchar     │      int64      │
├─────────────────┼─────────────────┤
│ value is even   │               8 │
│ value is uneven │               9 │
└─────────────────┴─────────────────┘
```
----
###### `last_value` {#docs:stable:clients:python:relational_api::last_value}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
last_value(self: _duckdb.DuckDBPyRelation, column: str, window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the last value within the group or partition
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name from which to retrieve the last value within the window.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.last_value(column="value", window_spec="over (order by description)", projected_columns="description").distinct()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬─────────────────────────────────────────────────┐
│   description   │ last_value("value") OVER (ORDER BY description) │
│     varchar     │                      int64                      │
├─────────────────┼─────────────────────────────────────────────────┤
│ value is uneven │                                               9 │
│ value is even   │                                               8 │
└─────────────────┴─────────────────────────────────────────────────┘
```
----
###### `lead` {#docs:stable:clients:python:relational_api::lead}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
lead(self: _duckdb.DuckDBPyRelation, column: str, window_spec: str, offset: typing.SupportsInt = 1, default_value: str = 'NULL', ignore_nulls: bool = False, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the lead within the partition
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to apply the lead function on.
- **window_spec** : str
The window specification for the window function, provided as `over (partition by ... order by ...)`
- **offset** : int, default: 1
The number of rows to lead ahead.
- **default_value** : str, default: 'NULL'
The default value to return when the lead offset goes out of bounds.
- **ignore_nulls** : bool, default: False
Whether to ignore NULL values when computing the lead.
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.lead(column="description", window_spec="over (order by value)", projected_columns="description, value")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────┬────────────────────────────────────────────────────┐
│   description   │ value │ lead(description, 1, NULL) OVER (ORDER BY "value") │
│     varchar     │ int64 │                      varchar                       │
├─────────────────┼───────┼────────────────────────────────────────────────────┤
│ value is uneven │     1 │ value is even                                      │
│ value is even   │     2 │ value is uneven                                    │
│ value is uneven │     3 │ value is even                                      │
│ value is even   │     4 │ value is uneven                                    │
│ value is uneven │     5 │ value is even                                      │
│ value is even   │     6 │ value is uneven                                    │
│ value is uneven │     7 │ value is even                                      │
│ value is even   │     8 │ value is uneven                                    │
│ value is uneven │     9 │ NULL                                               │
└─────────────────┴───────┴────────────────────────────────────────────────────┘
```
----
###### `list` {#docs:stable:clients:python:relational_api::list}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
list(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Returns a list containing all values present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to aggregate values into a list.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.list(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬─────────────────┐
│   description   │  list("value")  │
│     varchar     │     int64[]     │
├─────────────────┼─────────────────┤
│ value is even   │ [2, 4, 6, 8]    │
│ value is uneven │ [1, 3, 5, 7, 9] │
└─────────────────┴─────────────────┘
```
----
###### `max` {#docs:stable:clients:python:relational_api::max}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
max(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Returns the maximum value present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the maximum value of.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.max(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────┐
│   description   │ max("value") │
│     varchar     │    int64     │
├─────────────────┼──────────────┤
│ value is even   │            8 │
│ value is uneven │            9 │
└─────────────────┴──────────────┘
```
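Because `max` also accepts a `window_spec`, it can be evaluated as a window function instead of a grouped aggregate, for example to compute a running maximum per group. A sketch (the other aggregate methods with a `window_spec` parameter can be used the same way):

```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
    select
        range as value,
        case when mod(range, 2) = 0 then 'even' else 'uneven' end as description
    from range(1, 10)
""")
# Running maximum of "value" within each description group.
rel.max(
    column="value",
    window_spec="over (partition by description order by value)",
    projected_columns="description, value",
)
```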
----
###### `mean` {#docs:stable:clients:python:relational_api::mean}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
mean(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the average on a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the mean value of.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.mean(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────┐
│   description   │ avg("value") │
│     varchar     │    double    │
├─────────────────┼──────────────┤
│ value is even   │          5.0 │
│ value is uneven │          5.0 │
└─────────────────┴──────────────┘
```
----
###### `median` {#docs:stable:clients:python:relational_api::median}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
median(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the median over all values present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the median value of.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.median(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬─────────────────┐
│   description   │ median("value") │
│     varchar     │     double      │
├─────────────────┼─────────────────┤
│ value is even   │             5.0 │
│ value is uneven │             5.0 │
└─────────────────┴─────────────────┘
```
----
###### `min` {#docs:stable:clients:python:relational_api::min}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
min(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Returns the minimum value present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the min value of.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.min(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────┐
│   description   │ min("value") │
│     varchar     │    int64     │
├─────────────────┼──────────────┤
│ value is uneven │            1 │
│ value is even   │            2 │
└─────────────────┴──────────────┘
```
----
###### `mode` {#docs:stable:clients:python:relational_api::mode}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
mode(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the mode over all values present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the mode (most frequent value) of.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.mode(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬─────────────────┐
│   description   │ "mode"("value") │
│     varchar     │      int64      │
├─────────────────┼─────────────────┤
│ value is uneven │               1 │
│ value is even   │               2 │
└─────────────────┴─────────────────┘
```
----
###### `n_tile` {#docs:stable:clients:python:relational_api::n_tile}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
n_tile(self: _duckdb.DuckDBPyRelation, window_spec: str, num_buckets: typing.SupportsInt, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Divides the partition as equally as possible into num_buckets
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **window_spec** : str
The window specification for the window function, provided as `over (partition by ... order by ...)`
- **num_buckets** : int
The number of buckets to divide the rows into.
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.n_tile(window_spec="over (partition by description)", num_buckets=2, projected_columns="description, value")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────┬──────────────────────────────────────────┐
│   description   │ value │ ntile(2) OVER (PARTITION BY description) │
│     varchar     │ int64 │                  int64                   │
├─────────────────┼───────┼──────────────────────────────────────────┤
│ value is uneven │     1 │                                        1 │
│ value is uneven │     3 │                                        1 │
│ value is uneven │     5 │                                        1 │
│ value is uneven │     7 │                                        2 │
│ value is uneven │     9 │                                        2 │
│ value is even   │     2 │                                        1 │
│ value is even   │     4 │                                        1 │
│ value is even   │     6 │                                        2 │
│ value is even   │     8 │                                        2 │
└─────────────────┴───────┴──────────────────────────────────────────┘
```
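Without an `order by` in the window specification, the assignment of rows to buckets is not deterministic; adding one makes the bucketing reproducible. A small sketch:

```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as value from range(1, 10)")
# Two buckets, assigned in ascending order of "value":
# values 1 to 5 land in bucket 1, values 6 to 9 in bucket 2.
rel.n_tile(
    window_spec="over (order by value)",
    num_buckets=2,
    projected_columns="value",
)
```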
----
###### `nth_value` {#docs:stable:clients:python:relational_api::nth_value}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
nth_value(self: _duckdb.DuckDBPyRelation, column: str, window_spec: str, offset: typing.SupportsInt, ignore_nulls: bool = False, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the nth value within the partition
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name from which to retrieve the nth value within the window.
- **window_spec** : str
The window specification for the window function, provided as `over (partition by ... order by ...)`
- **offset** : int
The position of the value to retrieve within the window (1-based index).
- **ignore_nulls** : bool, default: False
Whether to ignore NULL values when computing the nth value.
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.nth_value(column="value", window_spec="over (partition by description)", projected_columns="description", offset=1)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────────────────────────────────────────────────────┐
│   description   │ nth_value("value", 1) OVER (PARTITION BY description) │
│     varchar     │                         int64                         │
├─────────────────┼───────────────────────────────────────────────────────┤
│ value is even   │                                                     2 │
│ value is even   │                                                     2 │
│ value is even   │                                                     2 │
│ value is even   │                                                     2 │
│ value is uneven │                                                     1 │
│ value is uneven │                                                     1 │
│ value is uneven │                                                     1 │
│ value is uneven │                                                     1 │
│ value is uneven │                                                     1 │
└─────────────────┴───────────────────────────────────────────────────────┘
```
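The `offset` is 1-based. A sketch retrieving the second value of each partition; note that with no `order by` in the window specification the frame is the whole partition, whereas adding one would shrink the default frame so that the first row of each partition sees `NULL`:

```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
    select
        range as value,
        case when mod(range, 2) = 0 then 'even' else 'uneven' end as description
    from range(1, 10)
""")
# Second value of each partition, in the order the rows arrive (no order by given).
rel.nth_value(
    column="value",
    window_spec="over (partition by description)",
    offset=2,
    projected_columns="description",
).distinct()
```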
----
###### `percent_rank` {#docs:stable:clients:python:relational_api::percent_rank}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
percent_rank(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the relative rank within the partition
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **window_spec** : str
The window specification for the window function, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.percent_rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────┬─────────────────────────────────────────────────────────────────┐
│   description   │ value │ percent_rank() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                              double                             │
├─────────────────┼───────┼─────────────────────────────────────────────────────────────────┤
│ value is even   │     2 │                                                             0.0 │
│ value is even   │     4 │                                              0.3333333333333333 │
│ value is even   │     6 │                                              0.6666666666666666 │
│ value is even   │     8 │                                                             1.0 │
│ value is uneven │     1 │                                                             0.0 │
│ value is uneven │     3 │                                                            0.25 │
│ value is uneven │     5 │                                                             0.5 │
│ value is uneven │     7 │                                                            0.75 │
│ value is uneven │     9 │                                                             1.0 │
└─────────────────┴───────┴─────────────────────────────────────────────────────────────────┘
```
----
###### `product` {#docs:stable:clients:python:relational_api::product}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
product(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Returns the product of all values present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the product of.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.product(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────────┐
│   description   │ product("value") │
│     varchar     │      double      │
├─────────────────┼──────────────────┤
│ value is uneven │            945.0 │
│ value is even   │            384.0 │
└─────────────────┴──────────────────┘
```
----
###### `quantile` {#docs:stable:clients:python:relational_api::quantile}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
quantile(self: _duckdb.DuckDBPyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the exact quantile value for a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to compute the quantile for.
- **q** : object, default: 0.5
The quantile value to compute (e.g., 0.5 for median).
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.quantile(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────────────────────────┐
│   description   │ quantile_disc("value", 0.500000) │
│     varchar     │              int64               │
├─────────────────┼──────────────────────────────────┤
│ value is uneven │                                5 │
│ value is even   │                                4 │
└─────────────────┴──────────────────────────────────┘
```
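The `q` argument selects the quantile; the default `0.5` is the median, while, e.g., `0.25` gives the first quartile. A short sketch:

```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as value from range(1, 10)")
# Exact (discrete) first quartile of "value".
rel.quantile(column="value", q=0.25)
```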
----
###### `quantile_cont` {#docs:stable:clients:python:relational_api::quantile_cont}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
quantile_cont(self: _duckdb.DuckDBPyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the interpolated quantile value for a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to compute the continuous quantile for.
- **q** : object, default: 0.5
The quantile value to compute (e.g., 0.5 for median).
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.quantile_cont(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────────────────────────┐
│   description   │ quantile_cont("value", 0.500000) │
│     varchar     │              double              │
├─────────────────┼──────────────────────────────────┤
│ value is even   │                              5.0 │
│ value is uneven │                              5.0 │
└─────────────────┴──────────────────────────────────┘
```
----
###### `quantile_disc` {#docs:stable:clients:python:relational_api::quantile_disc}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
quantile_disc(self: _duckdb.DuckDBPyRelation, column: str, q: object = 0.5, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the exact quantile value for a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to compute the discrete quantile for.
- **q** : object, default: 0.5
The quantile value to compute (e.g., 0.5 for median).
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.quantile_disc(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬──────────────────────────────────┐
│   description   │ quantile_disc("value", 0.500000) │
│     varchar     │              int64               │
├─────────────────┼──────────────────────────────────┤
│ value is even   │                                4 │
│ value is uneven │                                5 │
└─────────────────┴──────────────────────────────────┘
```
----
###### `rank` {#docs:stable:clients:python:relational_api::rank}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
rank(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the rank within the partition
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **window_spec** : str
The window specification for the window function, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.rank(window_spec="over (partition by description order by value)", projected_columns="description, value")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────┬──────────────────────────────────────────────────────────┐
│   description   │ value │ rank() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                          int64                           │
├─────────────────┼───────┼──────────────────────────────────────────────────────────┤
│ value is uneven │     1 │                                                        1 │
│ value is uneven │     3 │                                                        2 │
│ value is uneven │     5 │                                                        3 │
│ value is uneven │     7 │                                                        4 │
│ value is uneven │     9 │                                                        5 │
│ value is even   │     2 │                                                        1 │
│ value is even   │     4 │                                                        2 │
│ value is even   │     6 │                                                        3 │
│ value is even   │     8 │                                                        4 │
└─────────────────┴───────┴──────────────────────────────────────────────────────────┘
```
----
###### `rank_dense` {#docs:stable:clients:python:relational_api::rank_dense}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
rank_dense(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the dense rank within the partition
**Aliases**: [`dense_rank`](#::dense_rank)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **window_spec** : str
The window specification for the window function, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.rank_dense(window_spec="over (partition by description order by value)", projected_columns="description, value")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│   description   │ value │ dense_rank() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                             int64                             │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is uneven │     1 │                                                             1 │
│ value is uneven │     3 │                                                             2 │
│ value is uneven │     5 │                                                             3 │
│ value is uneven │     7 │                                                             4 │
│ value is uneven │     9 │                                                             5 │
│ value is even   │     2 │                                                             1 │
│ value is even   │     4 │                                                             2 │
│ value is even   │     6 │                                                             3 │
│ value is even   │     8 │                                                             4 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
```
----
###### `row_number` {#docs:stable:clients:python:relational_api::row_number}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
row_number(self: _duckdb.DuckDBPyRelation, window_spec: str, projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the row number within the partition
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **window_spec** : str
The window specification for the window function, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.row_number(window_spec="over (partition by description order by value)", projected_columns="description, value")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┬───────┬───────────────────────────────────────────────────────────────┐
│   description   │ value │ row_number() OVER (PARTITION BY description ORDER BY "value") │
│     varchar     │ int64 │                             int64                             │
├─────────────────┼───────┼───────────────────────────────────────────────────────────────┤
│ value is uneven │     1 │                                                             1 │
│ value is uneven │     3 │                                                             2 │
│ value is uneven │     5 │                                                             3 │
│ value is uneven │     7 │                                                             4 │
│ value is uneven │     9 │                                                             5 │
│ value is even   │     2 │                                                             1 │
│ value is even   │     4 │                                                             2 │
│ value is even   │     6 │                                                             3 │
│ value is even   │     8 │                                                             4 │
└─────────────────┴───────┴───────────────────────────────────────────────────────────────┘
```
----
###### `select_dtypes` {#docs:stable:clients:python:relational_api::select_dtypes}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
select_dtypes(self: _duckdb.DuckDBPyRelation, types: object) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Select columns from the relation by filtering based on type(s)
**Aliases**: [`select_types`](#::select_types)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **types** : object
Data type(s) to select columns by. Can be a single type or a collection of types.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select_dtypes(types=[duckdb.typing.VARCHAR]).distinct()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌─────────────────┐
│   description   │
│     varchar     │
├─────────────────┤
│ value is even   │
│ value is uneven │
└─────────────────┘
```
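`types` also accepts a single type instead of a list. A sketch that keeps only the `BIGINT` columns (here both generated columns qualify):

```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as value, range * 2 as doubled from range(1, 4)")
# Both "value" and "doubled" are BIGINT, so both columns are kept.
rel.select_dtypes(types=duckdb.typing.BIGINT)
```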
----
###### `select_types` {#docs:stable:clients:python:relational_api::select_types}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
select_types(self: _duckdb.DuckDBPyRelation, types: object) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Select columns from the relation by filtering based on type(s)
**Aliases**: [`select_dtypes`](#::select_dtypes)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **types** : object
Data type(s) to select columns by. Can be a single type or a collection of types.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select_types(types=[duckdb.typing.VARCHAR]).distinct()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┐
│    description    │
│      varchar      │
├───────────────────┤
│ value is even     │
│ value is uneven   │
└───────────────────┘
```
----
###### `std` {#docs:stable:clients:python:relational_api::std}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
std(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the sample standard deviation for a given column
**Aliases**: [`stddev`](#::stddev), [`stddev_samp`](#::stddev_samp)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the standard deviation for.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.std(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬──────────────────────┐
│    description    │ stddev_samp("value") │
│      varchar      │        double        │
├───────────────────┼──────────────────────┤
│ value is uneven   │   3.1622776601683795 │
│ value is even     │    2.581988897471611 │
└───────────────────┴──────────────────────┘
```
----
###### `stddev` {#docs:stable:clients:python:relational_api::stddev}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
stddev(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the sample standard deviation for a given column
**Aliases**: [`std`](#::std), [`stddev_samp`](#::stddev_samp)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the standard deviation for.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.stddev(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬──────────────────────┐
│    description    │ stddev_samp("value") │
│      varchar      │        double        │
├───────────────────┼──────────────────────┤
│ value is even     │    2.581988897471611 │
│ value is uneven   │   3.1622776601683795 │
└───────────────────┴──────────────────────┘
```
----
###### `stddev_pop` {#docs:stable:clients:python:relational_api::stddev_pop}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
stddev_pop(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the population standard deviation for a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the standard deviation for.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.stddev_pop(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬─────────────────────┐
│    description    │ stddev_pop("value") │
│      varchar      │       double        │
├───────────────────┼─────────────────────┤
│ value is even     │    2.23606797749979 │
│ value is uneven   │  2.8284271247461903 │
└───────────────────┴─────────────────────┘
```
----
###### `stddev_samp` {#docs:stable:clients:python:relational_api::stddev_samp}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
stddev_samp(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the sample standard deviation for a given column
**Aliases**: [`stddev`](#::stddev), [`std`](#::std)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the standard deviation for.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.stddev_samp(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬──────────────────────┐
│    description    │ stddev_samp("value") │
│      varchar      │        double        │
├───────────────────┼──────────────────────┤
│ value is even     │    2.581988897471611 │
│ value is uneven   │   3.1622776601683795 │
└───────────────────┴──────────────────────┘
```
----
###### `string_agg` {#docs:stable:clients:python:relational_api::string_agg}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
string_agg(self: _duckdb.DuckDBPyRelation, column: str, sep: str = ',', groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Concatenates the values present in a given column with a separator
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to concatenate values from.
- **sep** : str, default: ','
Separator string to use between concatenated values.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.string_agg(column="value", sep=",", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬──────────────────────────┐
│    description    │ string_agg("value", ',') │
│      varchar      │         varchar          │
├───────────────────┼──────────────────────────┤
│ value is even     │ 2,4,6,8                  │
│ value is uneven   │ 1,3,5,7,9                │
└───────────────────┴──────────────────────────┘
```
----
###### `sum` {#docs:stable:clients:python:relational_api::sum}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
sum(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the sum of all values present in a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the sum for.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.sum(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬──────────────┐
│    description    │ sum("value") │
│      varchar      │    int128    │
├───────────────────┼──────────────┤
│ value is even     │           20 │
│ value is uneven   │           25 │
└───────────────────┴──────────────┘
```
----
###### `unique` {#docs:stable:clients:python:relational_api::unique}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
unique(self: _duckdb.DuckDBPyRelation, unique_aggr: str) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Returns the distinct values in a column.
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **unique_aggr** : str
The column to get the distinct values for.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.unique(unique_aggr="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┐
│    description    │
│      varchar      │
├───────────────────┤
│ value is even     │
│ value is uneven   │
└───────────────────┘
```
----
###### `value_counts` {#docs:stable:clients:python:relational_api::value_counts}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
value_counts(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the number of elements present in a given column, also projecting the original column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to count values from.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.value_counts(column="description", groups="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬────────────────────┐
│    description    │ count(description) │
│      varchar      │       int64        │
├───────────────────┼────────────────────┤
│ value is uneven   │                  5 │
│ value is even     │                  4 │
└───────────────────┴────────────────────┘
```
----
###### `var` {#docs:stable:clients:python:relational_api::var}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
var(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the sample variance for a given column
**Aliases**: [`variance`](#::variance), [`var_samp`](#::var_samp)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the sample variance for.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.var(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬───────────────────┐
│    description    │ var_samp("value") │
│      varchar      │      double       │
├───────────────────┼───────────────────┤
│ value is even     │ 6.666666666666667 │
│ value is uneven   │              10.0 │
└───────────────────┴───────────────────┘
```
----
###### `var_pop` {#docs:stable:clients:python:relational_api::var_pop}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
var_pop(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the population variance for a given column
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the population variance for.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.var_pop(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬──────────────────┐
│    description    │ var_pop("value") │
│      varchar      │      double      │
├───────────────────┼──────────────────┤
│ value is even     │              5.0 │
│ value is uneven   │              8.0 │
└───────────────────┴──────────────────┘
```
----
###### `var_samp` {#docs:stable:clients:python:relational_api::var_samp}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
var_samp(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the sample variance for a given column
**Aliases**: [`variance`](#::variance), [`var`](#::var)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the sample variance for.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.var_samp(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬───────────────────┐
│    description    │ var_samp("value") │
│      varchar      │      double       │
├───────────────────┼───────────────────┤
│ value is even     │ 6.666666666666667 │
│ value is uneven   │              10.0 │
└───────────────────┴───────────────────┘
```
----
###### `variance` {#docs:stable:clients:python:relational_api::variance}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
variance(self: _duckdb.DuckDBPyRelation, column: str, groups: str = '', window_spec: str = '', projected_columns: str = '') -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Computes the sample variance for a given column
**Aliases**: [`var`](#::var), [`var_samp`](#::var_samp)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **column** : str
The column name to calculate the sample variance for.
- **groups** : str, default: ''
Comma-separated list of columns to include in the `group by`.
- **window_spec** : str, default: ''
Optional window specification for window functions, provided as `over (partition by ... order by ...)`
- **projected_columns** : str, default: ''
Comma-separated list of columns to include in the result.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.variance(column="value", groups="description", projected_columns="description")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌───────────────────┬───────────────────┐
│    description    │ var_samp("value") │
│      varchar      │      double       │
├───────────────────┼───────────────────┤
│ value is even     │ 6.666666666666667 │
│ value is uneven   │              10.0 │
└───────────────────┴───────────────────┘
```
#### Output {#docs:stable:clients:python:relational_api::output-}
This section lists the functions that trigger SQL execution and retrieve the data; a short usage sketch follows the table below.
| Name | Description |
|:--|:-------|
| [`arrow`](#::arrow) | Execute and return an Arrow Record Batch Reader that yields all rows |
| [`close`](#::close) | Closes the result |
| [`create`](#::create) | Creates a new table named table_name with the contents of the relation object |
| [`create_view`](#::create_view) | Creates a view named view_name that refers to the relation object |
| [`df`](#::df) | Execute and fetch all rows as a pandas DataFrame |
| [`execute`](#::execute) | Transform the relation into a result set |
| [`fetch_arrow_reader`](#::fetch_arrow_reader) | Execute and return an Arrow Record Batch Reader that yields all rows |
| [`fetch_arrow_table`](#::fetch_arrow_table) | Execute and fetch all rows as an Arrow Table |
| [`fetch_df_chunk`](#::fetch_df_chunk) | Execute and fetch a chunk of the rows |
| [`fetchall`](#::fetchall) | Execute and fetch all rows as a list of tuples |
| [`fetchdf`](#::fetchdf) | Execute and fetch all rows as a pandas DataFrame |
| [`fetchmany`](#::fetchmany) | Execute and fetch the next set of rows as a list of tuples |
| [`fetchnumpy`](#::fetchnumpy) | Execute and fetch all rows as a Python dict mapping each column to a NumPy array |
| [`fetchone`](#::fetchone) | Execute and fetch a single row as a tuple |
| [`pl`](#::pl) | Execute and fetch all rows as a Polars DataFrame |
| [`record_batch`](#::record_batch) | Execute and return an Arrow Record Batch Reader that yields all rows |
| [`tf`](#::tf) | Fetch a result as dict of TensorFlow Tensors |
| [`to_arrow_table`](#::to_arrow_table) | Execute and fetch all rows as an Arrow Table |
| [`to_csv`](#::to_csv) | Write the relation object to a CSV file in 'file_name' |
| [`to_df`](#::to_df) | Execute and fetch all rows as a pandas DataFrame |
| [`to_parquet`](#::to_parquet) | Write the relation object to a Parquet file in 'file_name' |
| [`to_table`](#::to_table) | Creates a new table named table_name with the contents of the relation object |
| [`to_view`](#::to_view) | Creates a view named view_name that refers to the relation object |
| [`torch`](#::torch) | Fetch a result as dict of PyTorch Tensors |
| [`write_csv`](#::write_csv) | Write the relation object to a CSV file in 'file_name' |
| [`write_parquet`](#::write_parquet) | Write the relation object to a Parquet file in 'file_name' |
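As a minimal sketch of how these output functions are used (an illustrative example; it assumes pandas is installed for `df()`): the relation is defined once, and each output call executes it and materializes the result in a different format.
```python
import duckdb

duckdb_conn = duckdb.connect()

# Defining a relation does not execute the query yet.
rel = duckdb_conn.sql("select range as value from range(1, 4)")

# Each output call below executes the relation and materializes the result.
rows = rel.fetchall()  # list of tuples: [(1,), (2,), (3,)]
df = rel.df()          # pandas DataFrame with a single "value" column

print(rows)
print(df)
```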
###### `arrow` {#docs:stable:clients:python:relational_api::arrow}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
arrow(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and return an Arrow Record Batch Reader that yields all rows
**Aliases**: [`fetch_arrow_table`](#::fetch_arrow_table), [`to_arrow_table`](#::to_arrow_table)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **batch_size** : int, default: 1000000
The batch size of writing the data to the Arrow table
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
pa_table = rel.arrow()
pa_table
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["3ac9e0ba-8390-4a02-ad72-33b1caea6354","8b844392-1404-4bbc-b731-120f42c8ca27","ca5584ca-8e97-4fca-a295-ae3c16c32f5b","926d071e-5f64-488f-ae02-d19e315f9f5c","aabeedf0-5783-4eff-9963-b3967a6ea5d8","1f20db9a-bee8-4b65-b7e8-e7c36b5b8fee","795c678e-3524-4b52-96ec-7b48c24eeab1","9ffbd403-169f-4fe4-bc41-09751066f1f1","8fdb0a60-29f0-4f5b-afcc-c736a03cd083"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:07:12.614000Z,2025-04-10 09:08:12.614000Z,2025-04-10 09:09:12.614000Z,2025-04-10 09:10:12.614000Z,2025-04-10 09:11:12.614000Z,2025-04-10 09:12:12.614000Z,2025-04-10 09:13:12.614000Z,2025-04-10 09:14:12.614000Z,2025-04-10 09:15:12.614000Z]]
```
----
###### `close` {#docs:stable:clients:python:relational_api::close}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
close(self: _duckdb.DuckDBPyRelation) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Closes the result
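####### Example {#docs:stable:clients:python:relational_api::example}
A minimal sketch of closing a result: the relation is executed to materialize a result set, and `close` releases it.
```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("select range as value from range(1, 10)")

rel.execute()  # materialize a result set for the relation
rel.close()    # release the result set
```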
----
###### `create` {#docs:stable:clients:python:relational_api::create}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
create(self: _duckdb.DuckDBPyRelation, table_name: str) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Creates a new table named table_name with the contents of the relation object
**Aliases**: [`to_table`](#::to_table)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **table_name** : str
The name of the table to be created. There shouldn't be any other table with the same name.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.create("table_code_example")
duckdb_conn.table("table_code_example").limit(1)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │     1 │ 2025-04-10 11:07:12.614+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```
----
###### `create_view` {#docs:stable:clients:python:relational_api::create_view}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
create_view(self: _duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Creates a view named view_name that refers to the relation object
**Aliases**: [`to_view`](#::to_view)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **view_name** : str
The name of the view to be created.
- **replace** : bool, default: True
If the view should be created with `CREATE OR REPLACE`. When set to `False`, there shouldn't be another view with the same `view_name`.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.create_view("view_code_example", replace=True)
duckdb_conn.table("view_code_example").limit(1)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │     1 │ 2025-04-10 11:07:12.614+02 │
└──────────────────────────────────────┴─────────────────┴───────┴────────────────────────────┘
```
----
###### `df` {#docs:stable:clients:python:relational_api::df}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
df(self: _duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch all rows as a pandas DataFrame
**Aliases**: [`fetchdf`](#::fetchdf), [`to_df`](#::to_df)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **date_as_object** : bool, default: False
If the date columns should be interpreted as Python date objects.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.df()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
id description value created_timestamp
0 3ac9e0ba-8390-4a02-ad72-33b1caea6354 value is uneven 1 2025-04-10 11:07:12.614000+02:00
1 8b844392-1404-4bbc-b731-120f42c8ca27 value is even 2 2025-04-10 11:08:12.614000+02:00
2 ca5584ca-8e97-4fca-a295-ae3c16c32f5b value is uneven 3 2025-04-10 11:09:12.614000+02:00
...
```
----
###### `execute` {#docs:stable:clients:python:relational_api::execute}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
execute(self: _duckdb.DuckDBPyRelation) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Transform the relation into a result set
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.execute()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
┌──────────────────────────────────────┬─────────────────┬───────┬────────────────────────────┐
│                  id                  │   description   │ value │     created_timestamp      │
│                 uuid                 │     varchar     │ int64 │  timestamp with time zone  │
├──────────────────────────────────────┼─────────────────┼───────┼────────────────────────────┤
│ 3ac9e0ba-8390-4a02-ad72-33b1caea6354 │ value is uneven │     1 │ 2025-04-10 11:07:12.614+02 │
│ 8b844392-1404-4bbc-b731-120f42c8ca27 │ value is even   │     2 │ 2025-04-10 11:08:12.614+02 │
│ ca5584ca-8e97-4fca-a295-ae3c16c32f5b │ value is uneven │     3 │ 2025-04-10 11:09:12.614+02 │
```
----
###### `fetch_arrow_reader` {#docs:stable:clients:python:relational_api::fetch_arrow_reader}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
fetch_arrow_reader(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.RecordBatchReader
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and return an Arrow Record Batch Reader that yields all rows
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **batch_size** : int, default: 1000000
The batch size for fetching the data.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
pa_reader = rel.fetch_arrow_reader(batch_size=1)
pa_reader.read_next_batch()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd"]
description: ["value is even"]
value: [2]
created_timestamp: [2025-04-10 09:25:51.259000Z]
```
----
###### `fetch_arrow_table` {#docs:stable:clients:python:relational_api::fetch_arrow_table}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
fetch_arrow_table(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.Table
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch all rows as an Arrow Table
**Aliases**: [`arrow`](#::arrow), [`to_arrow_table`](#::to_arrow_table)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **batch_size** : int, default: 1000000
The batch size for fetching the data.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetch_arrow_table()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["1587b4b0-3023-49fe-82cf-06303ca136ac","e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd","3f8ad67a-290f-4a22-b41b-0173b8e45afa","9a4e37ef-d8bd-46dd-ab01-51cf4973549f","12baa624-ebc9-45ae-b73e-6f4029e31d2d","56d41292-53cc-48be-a1b8-e1f5d6ca5581","1accca18-c950-47c1-9108-aef8afbd5249","56d8db75-72c4-4d40-90d2-a3c840579c37","e19f6201-8646-401c-b019-e37c42c39632"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:24:51.259000Z,2025-04-10 09:25:51.259000Z,2025-04-10 09:26:51.259000Z,2025-04-10 09:27:51.259000Z,2025-04-10 09:28:51.259000Z,2025-04-10 09:29:51.259000Z,2025-04-10 09:30:51.259000Z,2025-04-10 09:31:51.259000Z,2025-04-10 09:32:51.259000Z]]
```
----
###### `fetch_df_chunk` {#docs:stable:clients:python:relational_api::fetch_df_chunk}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
fetch_df_chunk(self: _duckdb.DuckDBPyRelation, vectors_per_chunk: typing.SupportsInt = 1, *, date_as_object: bool = False) -> pandas.DataFrame
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch a chunk of the rows
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **vectors_per_chunk** : int, default: 1
Number of data chunks to be processed before converting to dataframe.
- **date_as_object** : bool, default: False
If the date columns should be interpreted as Python date objects.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetch_df_chunk()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
id description value created_timestamp
0 1587b4b0-3023-49fe-82cf-06303ca136ac value is uneven 1 2025-04-10 11:24:51.259000+02:00
1 e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd value is even 2 2025-04-10 11:25:51.259000+02:00
2 3f8ad67a-290f-4a22-b41b-0173b8e45afa value is uneven 3 2025-04-10 11:26:51.259000+02:00
...
```
----
###### `fetchall` {#docs:stable:clients:python:relational_api::fetchall}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
fetchall(self: _duckdb.DuckDBPyRelation) -> list
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch all rows as a list of tuples
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.limit(1).fetchall()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
[(UUID('1587b4b0-3023-49fe-82cf-06303ca136ac'),
'value is uneven',
1,
datetime.datetime(2025, 4, 10, 11, 24, 51, 259000, tzinfo=))]
```
----
###### `fetchdf` {#docs:stable:clients:python:relational_api::fetchdf}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
fetchdf(self: _duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch all rows as a pandas DataFrame
**Aliases**: [`df`](#::df), [`to_df`](#::to_df)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **date_as_object** : bool, default: False
If the date columns should be interpreted as Python date objects.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetchdf()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
id description value created_timestamp
0 1587b4b0-3023-49fe-82cf-06303ca136ac value is uneven 1 2025-04-10 11:24:51.259000+02:00
1 e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd value is even 2 2025-04-10 11:25:51.259000+02:00
2 3f8ad67a-290f-4a22-b41b-0173b8e45afa value is uneven 3 2025-04-10 11:26:51.259000+02:00
...
```
----
###### `fetchmany` {#docs:stable:clients:python:relational_api::fetchmany}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
fetchmany(self: _duckdb.DuckDBPyRelation, size: typing.SupportsInt = 1) -> list
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch the next set of rows as a list of tuples
>Warning Executing any other operation while retrieving data from an [aggregate](#::aggregate) relation will close the result set.
>```python
>import duckdb
>
>duckdb_conn = duckdb.connect()
>
>rel = duckdb_conn.sql("""
> select
> gen_random_uuid() as id,
> concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
> range as value,
> now() + concat(range,' ', 'minutes')::interval as created_timestamp
> from range(1, 10)
> """
>)
>
>agg_rel = rel.aggregate("value")
>
>while res := agg_rel.fetchmany(size=1):
> print(res)
> rel.show()
>```
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **size** : int, default: 1
The number of records to be fetched.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
while res := rel.fetchmany(size=1):
print(res)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
[(UUID('cf4c5e32-d0aa-4699-a3ee-0092e900f263'), 'value is uneven', 1, datetime.datetime(2025, 4, 30, 16, 23, 5, 310000, tzinfo=))]
[(UUID('cec335ac-24ac-49a3-ae9a-bb35f71fc88d'), 'value is even', 2, datetime.datetime(2025, 4, 30, 16, 24, 5, 310000, tzinfo=))]
[(UUID('2423295d-9bb0-453c-a385-21bdacba03b6'), 'value is uneven', 3, datetime.datetime(2025, 4, 30, 16, 25, 5, 310000, tzinfo=))]
[(UUID('88806b21-192d-41e7-a293-c789aad636ba'), 'value is even', 4, datetime.datetime(2025, 4, 30, 16, 26, 5, 310000, tzinfo=))]
[(UUID('05837a28-dacf-4121-88a6-a374aefb8a07'), 'value is uneven', 5, datetime.datetime(2025, 4, 30, 16, 27, 5, 310000, tzinfo=))]
[(UUID('b9c1f7e9-6156-4554-b80e-67d3b5d810bb'), 'value is even', 6, datetime.datetime(2025, 4, 30, 16, 28, 5, 310000, tzinfo=))]
[(UUID('4709c7fa-d286-4864-bb48-69748b447157'), 'value is uneven', 7, datetime.datetime(2025, 4, 30, 16, 29, 5, 310000, tzinfo=))]
[(UUID('30e48457-b103-4fa5-95cf-1c7f0143335b'), 'value is even', 8, datetime.datetime(2025, 4, 30, 16, 30, 5, 310000, tzinfo=))]
[(UUID('036b7f4b-bd78-4ffb-a351-964d93f267b7'), 'value is uneven', 9, datetime.datetime(2025, 4, 30, 16, 31, 5, 310000, tzinfo=))]
```
----
###### `fetchnumpy` {#docs:stable:clients:python:relational_api::fetchnumpy}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
fetchnumpy(self: _duckdb.DuckDBPyRelation) -> dict
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch all rows as a Python dict mapping each column to a NumPy array
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.fetchnumpy()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
{'id': array([UUID('1587b4b0-3023-49fe-82cf-06303ca136ac'),
UUID('e4ab8cb4-4609-40cb-ad7e-4304ed5ed4bd'),
UUID('3f8ad67a-290f-4a22-b41b-0173b8e45afa'),
UUID('9a4e37ef-d8bd-46dd-ab01-51cf4973549f'),
UUID('12baa624-ebc9-45ae-b73e-6f4029e31d2d'),
UUID('56d41292-53cc-48be-a1b8-e1f5d6ca5581'),
UUID('1accca18-c950-47c1-9108-aef8afbd5249'),
UUID('56d8db75-72c4-4d40-90d2-a3c840579c37'),
UUID('e19f6201-8646-401c-b019-e37c42c39632')], dtype=object),
'description': array(['value is uneven', 'value is even', 'value is uneven',
'value is even', 'value is uneven', 'value is even',
'value is uneven', 'value is even', 'value is uneven'],
dtype=object),
'value': array([1, 2, 3, 4, 5, 6, 7, 8, 9]),
'created_timestamp': array(['2025-04-10T09:24:51.259000', '2025-04-10T09:25:51.259000',
'2025-04-10T09:26:51.259000', '2025-04-10T09:27:51.259000',
'2025-04-10T09:28:51.259000', '2025-04-10T09:29:51.259000',
'2025-04-10T09:30:51.259000', '2025-04-10T09:31:51.259000',
'2025-04-10T09:32:51.259000'], dtype='datetime64[us]')}
```
----
###### `fetchone` {#docs:stable:clients:python:relational_api::fetchone}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
fetchone(self: _duckdb.DuckDBPyRelation) -> typing.Optional[tuple]
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch a single row as a tuple
>Warning Executing any other operation while retrieving data from an [aggregate](#::aggregate) relation will close the result set.
>```python
>import duckdb
>
>duckdb_conn = duckdb.connect()
>
>rel = duckdb_conn.sql("""
> select
> gen_random_uuid() as id,
> concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
> range as value,
> now() + concat(range,' ', 'minutes')::interval as created_timestamp
> from range(1, 10)
> """
>)
>
>agg_rel = rel.aggregate("value")
>
>while res := agg_rel.fetchone():
> print(res)
> rel.show()
>```
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
while res := rel.fetchone():
print(res)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
(UUID('fe036411-f4c7-4f52-9ddd-80cd2bb56613'), 'value is uneven', 1, datetime.datetime(2025, 4, 30, 12, 59, 8, 912000, tzinfo=))
(UUID('466c9b43-e9f0-4237-8f26-155f259a5b59'), 'value is even', 2, datetime.datetime(2025, 4, 30, 13, 0, 8, 912000, tzinfo=))
(UUID('5755cf16-a94f-41ef-a16d-21e856d71f9f'), 'value is uneven', 3, datetime.datetime(2025, 4, 30, 13, 1, 8, 912000, tzinfo=))
(UUID('05b52c93-bd68-45e1-b02a-a08d682c33d5'), 'value is even', 4, datetime.datetime(2025, 4, 30, 13, 2, 8, 912000, tzinfo=))
(UUID('cf61ef13-2840-4541-900d-f493767d7622'), 'value is uneven', 5, datetime.datetime(2025, 4, 30, 13, 3, 8, 912000, tzinfo=))
(UUID('033e7c68-e800-4ee8-9787-6cf50aabc27b'), 'value is even', 6, datetime.datetime(2025, 4, 30, 13, 4, 8, 912000, tzinfo=))
(UUID('8b8d6545-ff54-45d6-b69a-97edb63dfe43'), 'value is uneven', 7, datetime.datetime(2025, 4, 30, 13, 5, 8, 912000, tzinfo=))
(UUID('7da79dfe-b29c-462b-a414-9d5e3cc80139'), 'value is even', 8, datetime.datetime(2025, 4, 30, 13, 6, 8, 912000, tzinfo=))
(UUID('f83ffff2-33b9-4f86-9d14-46974b546bab'), 'value is uneven', 9, datetime.datetime(2025, 4, 30, 13, 7, 8, 912000, tzinfo=))
```
----
###### `pl` {#docs:stable:clients:python:relational_api::pl}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
pl(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000, *, lazy: bool = False) -> duckdb::PolarsDataFrame
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch all rows as a Polars DataFrame
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **batch_size** : int, default: 1000000
The number of records to be fetched per batch.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.pl(batch_size=1)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
shape: (9, 4)
┌─────────────────────────────────┬─────────────────┬───────┬────────────────────────────────┐
│ id                              ┆ description     ┆ value ┆ created_timestamp              │
│ ---                             ┆ ---             ┆ ---   ┆ ---                            │
│ str                             ┆ str             ┆ i64   ┆ datetime[μs, Europe/Amsterdam] │
╞═════════════════════════════════╪═════════════════╪═══════╪════════════════════════════════╡
│ b2f92c3c-9372-49f3-897f-2c86fc… ┆ value is uneven ┆ 1     ┆ 2025-04-10 11:49:51.886 CEST   │
```
----
###### `record_batch` {#docs:stable:clients:python:relational_api::record_batch}
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and return an Arrow Record Batch Reader that yields all rows; the underlying signature is `record_batch(self: object, batch_size: typing.SupportsInt = 1000000) -> object`
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **batch_size** : int, default: 1000000
The batch size for fetching the data.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
pa_batch = rel.record_batch(batch_size=1)
pa_batch.read_next_batch()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
pyarrow.RecordBatch
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: ["908cf67c-a086-4b94-9017-2089a83e4a6c"]
description: ["value is uneven"]
value: [1]
created_timestamp: [2025-04-10 09:52:55.249000Z]
```
----
###### `tf` {#docs:stable:clients:python:relational_api::tf}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
tf(self: _duckdb.DuckDBPyRelation) -> dict
```
####### Description {#docs:stable:clients:python:relational_api::description}
Fetch a result as dict of TensorFlow Tensors
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select("description, value").tf()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
{'description': ,
'value': }
```
----
###### `to_arrow_table` {#docs:stable:clients:python:relational_api::to_arrow_table}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
to_arrow_table(self: _duckdb.DuckDBPyRelation, batch_size: typing.SupportsInt = 1000000) -> pyarrow.lib.Table
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch all rows as an Arrow Table
**Aliases**: [`fetch_arrow_table`](#::fetch_arrow_table), [`arrow`](#::arrow)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **batch_size** : int, default: 1000000
The batch size for fetching the data.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_arrow_table()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
pyarrow.Table
id: string
description: string
value: int64
created_timestamp: timestamp[us, tz=Europe/Amsterdam]
----
id: [["86b2011d-3818-426f-a41e-7cd5c7321f79","07fa4f89-0bba-4049-9acd-c933332a66d5","f2f1479e-f582-4fe4-b82f-9b753b69634c","529d3c63-5961-4adb-b0a8-8249188fc82a","aa9eea7d-7fac-4dcf-8f32-4a0b5d64f864","4852aa32-03f2-40d3-8006-b8213904775a","c0127203-f2e3-4925-9810-655bc02a3c19","2a1356ba-5707-44d6-a492-abd0a67e5efb","800a1c24-231c-4dae-bd68-627654c8a110"]]
description: [["value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven","value is even","value is uneven"]]
value: [[1,2,3,4,5,6,7,8,9]]
created_timestamp: [[2025-04-10 09:54:24.015000Z,2025-04-10 09:55:24.015000Z,2025-04-10 09:56:24.015000Z,2025-04-10 09:57:24.015000Z,2025-04-10 09:58:24.015000Z,2025-04-10 09:59:24.015000Z,2025-04-10 10:00:24.015000Z,2025-04-10 10:01:24.015000Z,2025-04-10 10:02:24.015000Z]]
```
----
###### `to_csv` {#docs:stable:clients:python:relational_api::to_csv}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
to_csv(self: _duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Write the relation object to a CSV file in 'file_name'
**Aliases**: [`write_csv`](#::write_csv)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **file_name** : str
The name of the output CSV file.
- **sep** : str, default: ','
Field delimiter for the output file.
- **na_rep** : str, default: ''
Missing data representation.
- **header** : bool, default: True
Whether to write column headers.
- **quotechar** : str, default: '"'
Character used to quote fields containing special characters.
- **escapechar** : str, default: None
Character used to escape the delimiter if quoting is set to QUOTE_NONE.
- **date_format** : str, default: None
Custom format string for DATE values.
- **timestamp_format** : str, default: None
Custom format string for TIMESTAMP values.
- **quoting** : int, default: csv.QUOTE_MINIMAL
Control field quoting behavior (e.g., QUOTE_MINIMAL, QUOTE_ALL).
- **encoding** : str, default: 'utf-8'
Character encoding for the output file.
- **compression** : str, default: auto
Compression type (e.g., 'gzip', 'bz2', 'zstd').
- **overwrite** : bool, default: False
When true, all existing files inside targeted directories will be removed (not supported on remote filesystems). Only has an effect when used with `partition_by`.
- **per_thread_output** : bool, default: False
When `true`, write one file per thread, rather than one file in total. This allows for faster parallel writing.
- **use_tmp_file** : bool, default: False
Write to a temporary file before renaming to final name to avoid partial writes.
- **partition_by** : list[str], default: None
List of column names to partition output by (creates folder structure).
- **write_partition_columns** : bool, default: False
Whether or not to write partition columns into files. Only has an effect when used with `partition_by`.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_csv("code_example.csv")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
The data is exported to a CSV file, named code_example.csv
```
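Several of the keyword options documented above can be combined in one call. The following sketch is illustrative only; the output file and directory names are placeholders.
```python
import duckdb

duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
    select
        range as value,
        case when mod(range, 2) = 0 then 'even' else 'uneven' end as parity
    from range(1, 10)
    """
)

# Custom delimiter and gzip compression (illustrative file name).
rel.to_csv("code_example_compressed.csv.gz", sep=";", compression="gzip")

# One directory per parity value; overwrite clears existing files in the target directories.
rel.to_csv("code_example_partitioned", partition_by=["parity"], overwrite=True)
```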
----
###### `to_df` {#docs:stable:clients:python:relational_api::to_df}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
to_df(self: _duckdb.DuckDBPyRelation, *, date_as_object: bool = False) -> pandas.DataFrame
```
####### Description {#docs:stable:clients:python:relational_api::description}
Execute and fetch all rows as a pandas DataFrame
**Aliases**: [`fetchdf`](#::fetchdf), [`df`](#::df)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **date_as_object** : bool, default: False
If the date columns should be interpreted as Python date objects.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_df()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
id description value created_timestamp
0 e1f79925-60fd-4ee2-ae67-5eff6b0543d1 value is uneven 1 2025-04-10 11:56:04.452000+02:00
1 caa619d4-d79c-4c00-b82e-9319b086b6f8 value is even 2 2025-04-10 11:57:04.452000+02:00
2 64c68032-99b9-4e8f-b4a3-6c522d5419b3 value is uneven 3 2025-04-10 11:58:04.452000+02:00
...
```
----
###### `to_parquet` {#docs:stable:clients:python:relational_api::to_parquet}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
to_parquet(self: _duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None, field_ids: object = None, row_group_size_bytes: object = None, row_group_size: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None, append: object = None) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Write the relation object to a Parquet file in 'file_name'
**Aliases**: [`write_parquet`](#::write_parquet)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **file_name** : str
The name of the output Parquet file.
- **compression** : str, default: 'snappy'
The compression format to use (`uncompressed`, `snappy`, `gzip`, `zstd`, `brotli`, `lz4`, `lz4_raw`).
- **field_ids** : STRUCT
The field_id for each column. Pass `auto` to attempt to infer automatically.
- **row_group_size_bytes** : int, default: row_group_size * 1024
The target size of each row group. You can pass either a human-readable string, e.g., 2MB, or an integer, i.e., the number of bytes. This option is only used when you have issued `SET preserve_insertion_order = false;`, otherwise, it is ignored.
- **row_group_size** : int, default: 122880
The target size, i.e., number of rows, of each row group.
- **overwrite** : bool, default: False
If True, overwrite the file if it exists.
- **per_thread_output** : bool, default: False
When `True`, write one file per thread, rather than one file in total. This allows for faster parallel writing.
- **use_tmp_file** : bool, default: False
Write to a temporary file before renaming to final name to avoid partial writes.
- **partition_by** : list[str], default: None
List of column names to partition output by (creates folder structure).
- **write_partition_columns** : bool, default: False
Whether or not to write partition columns into files. Only has an effect when used with `partition_by`.
- **append** : bool, default: False
When `True`, if a generated filename pattern already exists, the path is regenerated to ensure no existing files are overwritten. Only has an effect when used with `partition_by`.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_parquet("code_example.parquet")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
The data is exported to a Parquet file, named code_example.parquet
```
----
###### `to_table` {#docs:stable:clients:python:relational_api::to_table}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
to_table(self: _duckdb.DuckDBPyRelation, table_name: str) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Creates a new table named table_name with the contents of the relation object
**Aliases**: [`create`](#::create)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **table_name** : str
The name of the table to be created. There shouldn't be any other table with the same name.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_table("table_code_example")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
A table, named table_code_example, is created with the data of the relation
```
----
###### `to_view` {#docs:stable:clients:python:relational_api::to_view}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
to_view(self: _duckdb.DuckDBPyRelation, view_name: str, replace: bool = True) -> _duckdb.DuckDBPyRelation
```
####### Description {#docs:stable:clients:python:relational_api::description}
Creates a view named view_name that refers to the relation object
**Aliases**: [`create_view`](#::create_view)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **view_name** : str
The name of the view to be created.
- **replace** : bool, default: True
If the view should be created with `CREATE OR REPLACE`. When set to `False`, there shouldn't be another view with the same `view_name`.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.to_view("view_code_example", replace=True)
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
A view, named view_code_example, is created with the query definition of the relation
```
----
###### `torch` {#docs:stable:clients:python:relational_api::torch}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
torch(self: _duckdb.DuckDBPyRelation) -> dict
```
####### Description {#docs:stable:clients:python:relational_api::description}
Fetch a result as dict of PyTorch Tensors
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.select("value").torch()
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
{'value': tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])}
```
----
###### `write_csv` {#docs:stable:clients:python:relational_api::write_csv}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
write_csv(self: _duckdb.DuckDBPyRelation, file_name: str, *, sep: object = None, na_rep: object = None, header: object = None, quotechar: object = None, escapechar: object = None, date_format: object = None, timestamp_format: object = None, quoting: object = None, encoding: object = None, compression: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Write the relation object to a CSV file in 'file_name'
**Aliases**: [`to_csv`](#::to_csv)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **file_name** : str
The name of the output CSV file.
- **sep** : str, default: ','
Field delimiter for the output file.
- **na_rep** : str, default: ''
Missing data representation.
- **header** : bool, default: True
Whether to write column headers.
- **quotechar** : str, default: '"'
Character used to quote fields containing special characters.
- **escapechar** : str, default: None
Character used to escape the delimiter if quoting is set to QUOTE_NONE.
- **date_format** : str, default: None
Custom format string for DATE values.
- **timestamp_format** : str, default: None
Custom format string for TIMESTAMP values.
- **quoting** : int, default: csv.QUOTE_MINIMAL
Control field quoting behavior (e.g., QUOTE_MINIMAL, QUOTE_ALL).
- **encoding** : str, default: 'utf-8'
Character encoding for the output file.
- **compression** : str, default: auto
Compression type (e.g., 'gzip', 'bz2', 'zstd').
- **overwrite** : bool, default: False
When true, all existing files inside targeted directories will be removed (not supported on remote filesystems). Only has an effect when used with `partition_by`.
- **per_thread_output** : bool, default: False
When `true`, write one file per thread, rather than one file in total. This allows for faster parallel writing.
- **use_tmp_file** : bool, default: False
Write to a temporary file before renaming to final name to avoid partial writes.
- **partition_by** : list[str], default: None
List of column names to partition output by (creates folder structure).
- **write_partition_columns** : bool, default: False
Whether or not to write partition columns into files. Only has an effect when used with `partition_by`.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.write_csv("code_example.csv")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
The data is exported to a CSV file, named code_example.csv
```
----
###### `write_parquet` {#docs:stable:clients:python:relational_api::write_parquet}
####### Signature {#docs:stable:clients:python:relational_api::signature}
```python
write_parquet(self: _duckdb.DuckDBPyRelation, file_name: str, *, compression: object = None, field_ids: object = None, row_group_size_bytes: object = None, row_group_size: object = None, overwrite: object = None, per_thread_output: object = None, use_tmp_file: object = None, partition_by: object = None, write_partition_columns: object = None, append: object = None) -> None
```
####### Description {#docs:stable:clients:python:relational_api::description}
Write the relation object to a Parquet file in 'file_name'
**Aliases**: [`to_parquet`](#::to_parquet)
####### Parameters {#docs:stable:clients:python:relational_api::parameters}
- **file_name** : str
The name of the output Parquet file.
- **compression** : str, default: 'snappy'
The compression format to use (`uncompressed`, `snappy`, `gzip`, `zstd`, `brotli`, `lz4`, `lz4_raw`).
- **field_ids** : STRUCT
The field_id for each column. Pass `auto` to attempt to infer automatically.
- **row_group_size_bytes** : int, default: row_group_size * 1024
The target size of each row group. You can pass either a human-readable string, e.g., 2MB, or an integer, i.e., the number of bytes. This option is only used when you have issued `SET preserve_insertion_order = false;`, otherwise, it is ignored.
- **row_group_size** : int, default: 122880
The target size, i.e., number of rows, of each row group.
- **overwrite** : bool, default: False
If True, overwrite the file if it exists.
- **per_thread_output** : bool, default: False
When `True`, write one file per thread, rather than one file in total. This allows for faster parallel writing.
- **use_tmp_file** : bool, default: False
Write to a temporary file before renaming to final name to avoid partial writes.
- **partition_by** : list[str], default: None
List of column names to partition output by (creates folder structure).
- **write_partition_columns** : bool, default: False
Whether or not to write partition columns into files. Only has an effect when used with `partition_by`.
- **append** : bool, default: False
When `True`, if a generated filename pattern already exists, the path is regenerated to ensure no existing files are overwritten. Only has an effect when used with `partition_by`.
####### Example {#docs:stable:clients:python:relational_api::example}
```python
import duckdb
duckdb_conn = duckdb.connect()
rel = duckdb_conn.sql("""
select
gen_random_uuid() as id,
concat('value is ', case when mod(range,2)=0 then 'even' else 'uneven' end) as description,
range as value,
now() + concat(range,' ', 'minutes')::interval as created_timestamp
from range(1, 10)
"""
)
rel.write_parquet("code_example.parquet")
```
####### Result {#docs:stable:clients:python:relational_api::result}
```text
The data is exported to a Parquet file, named code_example.parquet
```
### Python Function API {#docs:stable:clients:python:function}
You can create a DuckDB user-defined function (UDF) from a Python function so it can be used in SQL queries.
Similarly to regular [functions](#docs:stable:sql:functions:overview), they need to have a name, a return type and parameter types.
Here is an example using a Python function that calls a third-party library.
```python
import duckdb
from duckdb.typing import VARCHAR
from faker import Faker
def generate_random_name():
fake = Faker()
return fake.name()
duckdb.create_function("random_name", generate_random_name, [], VARCHAR)
res = duckdb.sql("SELECT random_name()").fetchall()
print(res)
```
```text
[('Gerald Ashley',)]
```
#### Creating Functions {#docs:stable:clients:python:function::creating-functions}
To register a Python UDF, use the `create_function` method from a DuckDB connection. Here is the syntax:
```python
import duckdb
con = duckdb.connect()
con.create_function(name, function, parameters, return_type)
```
The `create_function` method takes the following parameters:
1. `name`: A string representing the unique name of the UDF within the connection catalog.
2. `function`: The Python function you wish to register as a UDF.
3. `parameters`: Scalar functions can operate on one or more columns. This parameter takes a list of column types used as input.
4. `return_type`: Scalar functions return one element per row. This parameter specifies the return type of the function.
5. `type` (optional): DuckDB supports both native Python types and PyArrow Arrays. By default, `type = 'native'` is assumed, but you can specify `type = 'arrow'` to use PyArrow Arrays. In general, using an Arrow UDF will be much more efficient than native because it will be able to operate in batches.
6. `null_handling` (optional): By default, `NULL` values are automatically handled as `NULL`-in `NULL`-out. Users can specify a desired behavior for `NULL` values by setting `null_handling = 'special'`.
7. `exception_handling` (optional): By default, when an exception is thrown from the Python function, it will be re-thrown in Python. Users can disable this behavior, and instead return `NULL`, by setting this parameter to `'return_null'`.
8. `side_effects` (optional): By default, functions are expected to produce the same result for the same input. If the result of a function is impacted by any type of randomness, `side_effects` must be set to `True`.
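The optional keyword parameters can be combined freely. Below is a minimal sketch, using a hypothetical `shout` UDF, that spells out the defaults and the most common overrides described above:
```python
import duckdb
from duckdb.typing import VARCHAR

con = duckdb.connect()

def shout(x):
    # With null_handling="special" we receive NULL values and must handle them ourselves
    if x is None:
        return None
    return x.upper()

con.create_function(
    "shout",                           # name in the connection catalog
    shout,                             # the Python callable
    [VARCHAR],                         # parameter types
    VARCHAR,                           # return type
    type="native",                     # 'native' (default) or 'arrow'
    null_handling="special",           # opt out of NULL-in NULL-out
    exception_handling="return_null",  # return NULL instead of re-throwing
    side_effects=False,                # the function is pure
)

print(con.sql("SELECT shout('hello'), shout(NULL)").fetchall())
```
This should print `[('HELLO', None)]`: the uppercased string for the non-`NULL` input, and `NULL` for the `NULL` input, handled inside the function.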
To unregister a UDF, you can call the `remove_function` method with the UDF name:
```python
con.remove_function(name)
```
##### Using Partial Functions {#docs:stable:clients:python:function::using-partial-functions}
DuckDB UDFs can also be created with [Python partial functions](https://docs.python.org/3/library/functools.html#functools.partial).
In the example below, we show how a custom logger returns the concatenation of the execution datetime in ISO format, always followed by
the argument passed at UDF creation and the input parameter provided to the function call:
```python
from datetime import datetime
import duckdb
import functools
def get_datetime_iso_format() -> str:
return datetime.now().isoformat()
def logger_udf(func, arg1: str, arg2: int) -> str:
return ' '.join([func(), arg1, str(arg2)])
with duckdb.connect() as con:
con.sql("select * from range(10) tbl(id)").to_table("example_table")
con.create_function(
'custom_logger',
functools.partial(logger_udf, get_datetime_iso_format, 'logging data')
)
rel = con.sql("SELECT custom_logger(id) from example_table;")
rel.show()
con.create_function(
'another_custom_logger',
functools.partial(logger_udf, get_datetime_iso_format, ':')
)
rel = con.sql("SELECT another_custom_logger(id) from example_table;")
rel.show()
```
```text
┌─────────────────────────────────────────────┐
│              custom_logger(id)              │
│                   varchar                   │
├─────────────────────────────────────────────┤
│ 2025-03-27T12:07:56.811251 logging data 0   │
│ 2025-03-27T12:07:56.811264 logging data 1   │
│ 2025-03-27T12:07:56.811266 logging data 2   │
│ 2025-03-27T12:07:56.811268 logging data 3   │
│ 2025-03-27T12:07:56.811269 logging data 4   │
│ 2025-03-27T12:07:56.811270 logging data 5   │
│ 2025-03-27T12:07:56.811271 logging data 6   │
│ 2025-03-27T12:07:56.811272 logging data 7   │
│ 2025-03-27T12:07:56.811274 logging data 8   │
│ 2025-03-27T12:07:56.811275 logging data 9   │
├─────────────────────────────────────────────┤
│                   10 rows                   │
└─────────────────────────────────────────────┘
┌────────────────────────────────┐
│   another_custom_logger(id)    │
│            varchar             │
├────────────────────────────────┤
│ 2025-03-27T12:07:56.812106 : 0 │
│ 2025-03-27T12:07:56.812116 : 1 │
│ 2025-03-27T12:07:56.812118 : 2 │
│ 2025-03-27T12:07:56.812119 : 3 │
│ 2025-03-27T12:07:56.812121 : 4 │
│ 2025-03-27T12:07:56.812122 : 5 │
│ 2025-03-27T12:07:56.812123 : 6 │
│ 2025-03-27T12:07:56.812124 : 7 │
│ 2025-03-27T12:07:56.812126 : 8 │
│ 2025-03-27T12:07:56.812127 : 9 │
├────────────────────────────────┤
│            10 rows             │
└────────────────────────────────┘
```
#### Type Annotation {#docs:stable:clients:python:function::type-annotation}
When the function has type annotations, it's often possible to leave out all of the optional parameters.
Using `DuckDBPyType`, we can implicitly convert many known types to DuckDB's type system.
For example:
```python
import duckdb
def my_function(x: int) -> str:
return x
duckdb.create_function("my_func", my_function)
print(duckdb.sql("SELECT my_func(42)"))
```
```text
┌─────────────┐
│ my_func(42) │
│   varchar   │
├─────────────┤
│ 42          │
└─────────────┘
```
If only the parameter list types can be inferred, you'll need to pass in `None` as `parameters`.
#### `NULL` Handling {#docs:stable:clients:python:function::null-handling}
By default, when a function receives a `NULL` value, it instantly returns `NULL` as part of the default `NULL` handling.
When this is not desired, you need to explicitly set the `null_handling` parameter to `"special"`.
```python
import duckdb
from duckdb.typing import BIGINT
def dont_intercept_null(x):
return 5
duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT)
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
```
```text
[(None,)]
```
With `null_handling="special"`:
```python
import duckdb
from duckdb.typing import BIGINT
def dont_intercept_null(x):
return 5
duckdb.create_function("dont_intercept", dont_intercept_null, [BIGINT], BIGINT, null_handling="special")
res = duckdb.sql("SELECT dont_intercept(NULL)").fetchall()
print(res)
```
```text
[(5,)]
```
> **Warning.** Always use `null_handling="special"` when the function can return `NULL`.
```python
import duckdb
from duckdb.typing import VARCHAR
def return_str_or_none(x: str) -> str | None:
if not x:
return None
return x
duckdb.create_function(
"return_str_or_none",
return_str_or_none,
[VARCHAR],
VARCHAR,
null_handling="special"
)
res = duckdb.sql("SELECT return_str_or_none('')").fetchall()
print(res)
```
```text
[(None,)]
```
#### Exception Handling {#docs:stable:clients:python:function::exception-handling}
By default, when an exception is thrown from the Python function, it will be forwarded (re-thrown) in Python.
If you want to disable this behavior and instead return `NULL`, set the `exception_handling` parameter to `"return_null"`.
```python
import duckdb
from duckdb.typing import BIGINT
def will_throw():
raise ValueError("ERROR")
duckdb.create_function("throws", will_throw, [], BIGINT)
try:
res = duckdb.sql("SELECT throws()").fetchall()
except duckdb.InvalidInputException as e:
print(e)
duckdb.create_function("doesnt_throw", will_throw, [], BIGINT, exception_handling="return_null")
res = duckdb.sql("SELECT doesnt_throw()").fetchall()
print(res)
```
```console
Invalid Input Error: Python exception occurred while executing the UDF: ValueError: ERROR
At:
...(5): will_throw
...(9):
```
```text
[(None,)]
```
#### Side Effects {#docs:stable:clients:python:function::side-effects}
By default, DuckDB will assume the created function is a *pure* function, meaning it produces the same output when given the same input.
If your function does not follow that rule, for example when it makes use of randomness, then you will need to mark the function as having `side_effects`.
For example, this function will produce a new count for every invocation:
```python
def count() -> int:
    old = count.counter
    count.counter += 1
    return old

count.counter = 0
```
If we create this function without marking it as having side effects, the result will be the following:
```python
import duckdb

con = duckdb.connect()
con.create_function("my_counter", count, side_effects=False)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
```
```text
[(0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,), (0,)]
```
This is obviously not the desired result. When we add `side_effects=True`, the result is as we would expect:
```python
con.remove_function("my_counter")
count.counter = 0
con.create_function("my_counter", count, side_effects=True)
res = con.sql("SELECT my_counter() FROM range(10)").fetchall()
print(res)
```
```text
[(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)]
```
#### Python Function Types {#docs:stable:clients:python:function::python-function-types}
Currently, two function types are supported: `native` (default) and `arrow`.
##### Arrow {#docs:stable:clients:python:function::arrow}
If the function is expected to receive Arrow arrays, set the `type` parameter to `'arrow'`.
This lets the system know to provide Arrow arrays of up to `STANDARD_VECTOR_SIZE` tuples to the function, and to expect an array with the same number of tuples to be returned from the function.
In general, using an Arrow UDF will be much more efficient than a native one because it can operate in batches.
```python
import duckdb
import pyarrow as pa
from duckdb.typing import VARCHAR
from pyarrow import compute as pc
def mirror(strings: pa.Array, sep: pa.Array) -> pa.Array:
assert isinstance(strings, pa.ChunkedArray)
assert isinstance(sep, pa.ChunkedArray)
return pc.binary_join_element_wise(strings, pc.ascii_reverse(strings), sep)
duckdb.create_function(
"mirror",
mirror,
[VARCHAR, VARCHAR],
return_type=VARCHAR,
type="arrow",
)
duckdb.sql(
"CREATE OR REPLACE TABLE strings AS SELECT 'hello' AS str UNION ALL SELECT 'world' AS str;"
)
print(duckdb.sql("SELECT mirror(str, '|') FROM strings;").fetchall())
```
```text
[('hello|olleh',), ('world|dlrow',)]
```
##### Native {#docs:stable:clients:python:function::native}
When the function type is set to `native`, the function will be provided with a single tuple at a time and is expected to return only a single value.
This can be useful for interacting with Python libraries that don't operate on Arrow, such as `faker`:
```python
import duckdb
from duckdb.typing import DATE
from faker import Faker
def random_date():
fake = Faker()
return fake.date_between()
duckdb.create_function(
"random_date",
random_date,
parameters=[],
return_type=DATE,
type="native",
)
res = duckdb.sql("SELECT random_date()").fetchall()
print(res)
```
```text
[(datetime.date(2019, 5, 15),)]
```
### Types API {#docs:stable:clients:python:types}
The `DuckDBPyType` class represents a type instance of our [data types](#docs:stable:sql:data_types:overview).
#### Converting from Other Types {#docs:stable:clients:python:types::converting-from-other-types}
To make the API as easy to use as possible, we have added implicit conversions from existing type objects to a DuckDBPyType instance.
This means that wherever a DuckDBPyType object is expected, it is also possible to provide any of the options listed below.
##### Python Built-Ins {#docs:stable:clients:python:types::python-built-ins}
The table below shows the mapping of Python built-in types to DuckDB types.
| Built-in types | DuckDB type |
|:---------------|:------------|
| bool | BOOLEAN |
| bytearray | BLOB |
| bytes | BLOB |
| float | DOUBLE |
| int | BIGINT |
| str | VARCHAR |
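As a quick illustration of the table above, built-in type objects can be converted explicitly (a minimal sketch; the comments show the mappings we would expect from the table):
```python
import duckdb

# Built-in Python types convert implicitly to DuckDB types
print(duckdb.typing.DuckDBPyType(str))    # VARCHAR
print(duckdb.typing.DuckDBPyType(int))    # BIGINT
print(duckdb.typing.DuckDBPyType(float))  # DOUBLE
print(duckdb.typing.DuckDBPyType(bytes))  # BLOB
```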
##### Numpy DTypes {#docs:stable:clients:python:types::numpy-dtypes}
The table below shows the mapping of NumPy dtypes to DuckDB types.
| Type | DuckDB type |
|:------------|:------------|
| bool | BOOLEAN |
| float32 | FLOAT |
| float64 | DOUBLE |
| int16 | SMALLINT |
| int32 | INTEGER |
| int64 | BIGINT |
| int8 | TINYINT |
| uint16 | USMALLINT |
| uint32 | UINTEGER |
| uint64 | UBIGINT |
| uint8 | UTINYINT |
##### Nested Types {#docs:stable:clients:python:types::nested-types}
###### `list[child_type]` {#docs:stable:clients:python:types::listchild_type}
`list` type objects map to a `LIST` type of the child type, which can also be arbitrarily nested.
```python
import duckdb
from typing import Union
duckdb.typing.DuckDBPyType(list[dict[Union[str, int], str]])
```
```text
MAP(UNION(u1 VARCHAR, u2 BIGINT), VARCHAR)[]
```
###### `dict[key_type, value_type]` {#docs:stable:clients:python:types::dictkey_type-value_type}
`dict` type objects map to a `MAP` type of the key type and the value type.
```python
import duckdb
print(duckdb.typing.DuckDBPyType(dict[str, int]))
```
```text
MAP(VARCHAR, BIGINT)
```
###### `{'a': field_one, 'b': field_two, ..., 'n': field_n}` {#docs:stable:clients:python:types::a-field_one-b-field_two--n-field_n}
`dict` objects map to a `STRUCT` composed of the keys and values of the dict.
```python
import duckdb
print(duckdb.typing.DuckDBPyType({'a': str, 'b': int}))
```
```text
STRUCT(a VARCHAR, b BIGINT)
```
###### `Union[type_1, ... type_n]` {#docs:stable:clients:python:types::uniontype_1--type_n}
`typing.Union` objects map to a `UNION` type of the provided types.
```python
import duckdb
from typing import Union
print(duckdb.typing.DuckDBPyType(Union[int, str, bool, bytearray]))
```
```text
UNION(u1 BIGINT, u2 VARCHAR, u3 BOOLEAN, u4 BLOB)
```
##### Creation Functions {#docs:stable:clients:python:types::creation-functions}
For the built-in types, you can use the constants defined in `duckdb.typing`:
| DuckDB type |
|:---------------|
| BIGINT |
| BIT |
| BLOB |
| BOOLEAN |
| DATE |
| DOUBLE |
| FLOAT |
| HUGEINT |
| INTEGER |
| INTERVAL |
| SMALLINT |
| SQLNULL |
| TIME_TZ |
| TIME |
| TIMESTAMP_MS |
| TIMESTAMP_NS |
| TIMESTAMP_S |
| TIMESTAMP_TZ |
| TIMESTAMP |
| TINYINT |
| UBIGINT |
| UHUGEINT |
| UINTEGER |
| USMALLINT |
| UTINYINT |
| UUID |
| VARCHAR |
For the complex types there are methods available on the `DuckDBPyConnection` object or the `duckdb` module.
Anywhere a `DuckDBPyType` is accepted, we will also accept one of the type objects that can implicitly convert to a `DuckDBPyType`.
###### `list_type` | `array_type` {#docs:stable:clients:python:types::list_type--array_type}
Parameters:
* `child_type: DuckDBPyType`
###### `struct_type` | `row_type` {#docs:stable:clients:python:types::struct_type--row_type}
Parameters:
* `fields: Union[list[DuckDBPyType], dict[str, DuckDBPyType]]`
###### `map_type` {#docs:stable:clients:python:types::map_type}
Parameters:
* `key_type: DuckDBPyType`
* `value_type: DuckDBPyType`
###### `decimal_type` {#docs:stable:clients:python:types::decimal_type}
Parameters:
* `width: int`
* `scale: int`
###### `union_type` {#docs:stable:clients:python:types::union_type}
Parameters:
* `members: Union[list[DuckDBPyType], dict[str, DuckDBPyType]]`
###### `string_type` {#docs:stable:clients:python:types::string_type}
Parameters:
* `collation: Optional[str]`
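Below is a minimal sketch of these creation functions, using the `duckdb.typing` constants listed above; the comments indicate the type representations we would expect to be printed:
```python
import duckdb
from duckdb.typing import BIGINT, DOUBLE, INTEGER, VARCHAR

con = duckdb.connect()

print(con.list_type(INTEGER))                             # INTEGER[]
print(con.array_type(INTEGER, 3))                         # INTEGER[3]
print(con.struct_type({"name": VARCHAR, "age": BIGINT}))  # STRUCT(name VARCHAR, age BIGINT)
print(con.map_type(VARCHAR, DOUBLE))                      # MAP(VARCHAR, DOUBLE)
print(con.decimal_type(18, 3))                            # DECIMAL(18,3)
print(con.union_type({"num": BIGINT, "text": VARCHAR}))   # UNION(num BIGINT, text VARCHAR)
```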
### Expression API {#docs:stable:clients:python:expression}
The `Expression` class represents an instance of an [expression](#docs:stable:sql:expressions:overview).
#### Why Would I Use the Expression API? {#docs:stable:clients:python:expression::why-would-i-use-the-expression-api}
Using this API makes it possible to dynamically build up expressions, which are typically created by the parser from the query string.
This allows you to bypass the parser and gives you more fine-grained control over the expressions that are used.
Below is a list of currently supported expressions that can be created through the API.
#### Column Expression {#docs:stable:clients:python:expression::column-expression}
This expression references a column by name.
```python
import duckdb
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
```
Selecting a single column:
```python
col = duckdb.ColumnExpression('a')
duckdb.df(df).select(col).show()
```
```text
┌───────┐
│   a   │
│ int64 │
├───────┤
│     1 │
│     2 │
│     3 │
│     4 │
└───────┘
```
Selecting multiple columns:
```python
col_list = [
duckdb.ColumnExpression('a') * 10,
duckdb.ColumnExpression('b').isnull(),
duckdb.ColumnExpression('c') + 5
]
duckdb.df(df).select(*col_list).show()
```
```text
┌──────────┬─────────────┬─────────┐
│ (a * 10) │ (b IS NULL) │ (c + 5) │
│  int64   │   boolean   │  int64  │
├──────────┼─────────────┼─────────┤
│       10 │ false       │      47 │
│       20 │ true        │      26 │
│       30 │ false       │      18 │
│       40 │ false       │      19 │
└──────────┴─────────────┴─────────┘
```
#### Star Expression {#docs:stable:clients:python:expression::star-expression}
This expression selects all columns of the input source.
Optionally it's possible to provide an `exclude` list to filter out columns of the table.
This `exclude` list can contain either strings or Expressions.
```python
import duckdb
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
star = duckdb.StarExpression(exclude = ['b'])
duckdb.df(df).select(star).show()
```
```text
┌───────┬───────┐
│   a   │   c   │
│ int64 │ int64 │
├───────┼───────┤
│     1 │    42 │
│     2 │    21 │
│     3 │    13 │
│     4 │    14 │
└───────┴───────┘
```
#### Constant Expression {#docs:stable:clients:python:expression::constant-expression}
This expression contains a single value.
```python
import duckdb
import pandas as pd
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
const = duckdb.ConstantExpression('hello')
duckdb.df(df).select(const).show()
```
```text
┌─────────┐
│ 'hello' │
│ varchar │
├─────────┤
│ hello   │
│ hello   │
│ hello   │
│ hello   │
└─────────┘
```
#### Case Expression {#docs:stable:clients:python:expression::case-expression}
This expression contains a `CASE WHEN (...) THEN (...) ELSE (...) END` expression.
By default, `ELSE` is `NULL`, and it can be set using `.otherwise(value = ...)`.
Additional `WHEN (...) THEN (...)` blocks can be added with `.when(condition = ..., value = ...)`.
```python
import duckdb
import pandas as pd
from duckdb import (
ConstantExpression,
ColumnExpression,
CaseExpression
)
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
hello = ConstantExpression('hello')
world = ConstantExpression('world')
case = \
CaseExpression(condition = ColumnExpression('b') == False, value = world) \
.otherwise(hello)
duckdb.df(df).select(case).show()
```
```text
┌─────────────────────────────────────────────────────────┐
│ CASE WHEN ((b = false)) THEN ('world') ELSE 'hello' END │
│                         varchar                         │
├─────────────────────────────────────────────────────────┤
│ hello                                                   │
│ hello                                                   │
│ world                                                   │
│ hello                                                   │
└─────────────────────────────────────────────────────────┘
```
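The example above only sets the `ELSE` branch via `.otherwise(...)`. Below is a minimal sketch that also chains an additional `.when(...)` branch; the `label` alias is purely illustrative:
```python
import duckdb
import pandas as pd
from duckdb import CaseExpression, ColumnExpression, ConstantExpression

df = pd.DataFrame({'a': [1, 2, 3, 4]})

# CASE WHEN a = 1 THEN 'one' WHEN a = 2 THEN 'two' ELSE 'many' END
case_expr = (
    CaseExpression(condition = ColumnExpression('a') == 1, value = ConstantExpression('one'))
    .when(condition = ColumnExpression('a') == 2, value = ConstantExpression('two'))
    .otherwise(ConstantExpression('many'))
)
duckdb.df(df).select(case_expr.alias('label')).show()
```
This should yield `one`, `two`, `many`, `many` for the four rows.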
#### Function Expression {#docs:stable:clients:python:expression::function-expression}
This expression contains a function call.
It can be constructed by providing the function name and an arbitrary number of Expressions as arguments.
```python
import duckdb
import pandas as pd
from duckdb import (
ConstantExpression,
ColumnExpression,
FunctionExpression
)
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
multiply_by_2 = FunctionExpression('multiply', ColumnExpression('a'), ConstantExpression(2))
duckdb.df(df).select(multiply_by_2).show()
```
```text
┌────────────────┐
│ multiply(a, 2) │
│     int64      │
├────────────────┤
│              2 │
│              4 │
│              6 │
│              8 │
└────────────────┘
```
#### SQL Expression {#docs:stable:clients:python:expression::sql-expression}
This expression contains any valid SQL expression.
```python
import duckdb
import pandas as pd
from duckdb import SQLExpression
df = pd.DataFrame({
'a': [1, 2, 3, 4],
'b': [True, None, False, True],
'c': [42, 21, 13, 14]
})
duckdb.df(df).filter(
SQLExpression("b is true")
).select(
SQLExpression("a").alias("selecting_column_a"),
SQLExpression("case when a = 1 then 1 else 0 end").alias("selecting_case_expression"),
SQLExpression("1").alias("constant_numeric_column"),
SQLExpression("'hello'").alias("constant_text_column")
).aggregate(
aggr_expr=[
SQLExpression("SUM(selecting_column_a)").alias("sum_a"),
"selecting_case_expression" ,
"constant_numeric_column",
"constant_text_column"
],
).show()
```
```text
┌────────┬───────────────────────────┬─────────────────────────┬──────────────────────┐
│ sum_a  │ selecting_case_expression │ constant_numeric_column │ constant_text_column │
│ int128 │           int32           │          int32          │       varchar        │
├────────┼───────────────────────────┼─────────────────────────┼──────────────────────┤
│      4 │                         0 │                       1 │ hello                │
│      1 │                         1 │                       1 │ hello                │
└────────┴───────────────────────────┴─────────────────────────┴──────────────────────┘
```
#### Common Operations {#docs:stable:clients:python:expression::common-operations}
The Expression class also contains many operations that can be applied to any Expression type.
| Operation | Description |
|--------------------------------|-----------------------------------------------------------------------------------------------------------------------------|
| `.alias(name: str)` | Applies an alias to the expression |
| `.cast(type: DuckDBPyType)` | Applies a cast to the provided type on the expression |
| `.isin(*exprs: Expression)` | Creates an [`IN` expression](#docs:stable:sql:expressions:in::in) against the provided expressions as the list |
| `.isnotin(*exprs: Expression)` | Creates a [`NOT IN` expression](#docs:stable:sql:expressions:in::not-in) against the provided expressions as the list |
| `.isnotnull()` | Checks whether the expression is not `NULL` |
| `.isnull()` | Checks whether the expression is `NULL` |
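A minimal sketch combining a few of these operations on a small DataFrame; the column aliases `a_text` and `a_in_1_3` are illustrative:
```python
import duckdb
import pandas as pd
from duckdb.typing import VARCHAR

df = pd.DataFrame({'a': [1, 2, 3, 4]})

a = duckdb.ColumnExpression('a')
duckdb.df(df).select(
    a.cast(VARCHAR).alias('a_text'),             # cast to VARCHAR, then alias
    a.isin(
        duckdb.ConstantExpression(1),
        duckdb.ConstantExpression(3)
    ).alias('a_in_1_3')                          # a IN (1, 3)
).show()
```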
##### Order Operations {#docs:stable:clients:python:expression::order-operations}
When expressions are provided to `DuckDBPyRelation.order()`, the following order operations can be applied.
| Operation | Description |
|--------------------------------|------------------------------------------------------------------------------------|
| `.asc()` | Indicates that this expression should be sorted in ascending order |
| `.desc()` | Indicates that this expression should be sorted in descending order |
| `.nulls_first()` | Indicates that the nulls in this expression should precede the non-null values |
| `.nulls_last()` | Indicates that the nulls in this expression should come after the non-null values |
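A minimal sketch, assuming the relation's `sort()` method is used to apply the ordered expressions (the `NULL` comes from the `None` entry in the DataFrame):
```python
import duckdb
import pandas as pd

df = pd.DataFrame({'a': ['banana', None, 'apple', 'cherry']})

col = duckdb.ColumnExpression('a')
# Sort ascending and push the NULL row to the end
duckdb.df(df).sort(col.asc().nulls_last()).show()
```
This should list `apple`, `banana`, `cherry` and place the `NULL` row last.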
### Spark API {#docs:stable:clients:python:spark_api}
The DuckDB Spark API implements the [PySpark API](https://spark.apache.org/docs/3.5.0/api/python/reference/index.html), allowing you to use the familiar Spark API to interact with DuckDB.
All statements are translated to DuckDB's internal plans using our [relational API](#docs:stable:clients:python:relational_api) and executed using DuckDB's query engine.
> **Warning.** The DuckDB Spark API is currently experimental and features are still missing. We are very interested in feedback. Please report any functionality that you are missing, either through [Discord](https://discord.duckdb.org) or on [GitHub](https://github.com/duckdb/duckdb/issues).
#### Example {#docs:stable:clients:python:spark_api::example}
```python
from duckdb.experimental.spark.sql import SparkSession as session
from duckdb.experimental.spark.sql.functions import lit, col
import pandas as pd
spark = session.builder.getOrCreate()
pandas_df = pd.DataFrame({
'age': [34, 45, 23, 56],
'name': ['Joan', 'Peter', 'John', 'Bob']
})
df = spark.createDataFrame(pandas_df)
df = df.withColumn(
'location', lit('Seattle')
)
res = df.select(
col('age'),
col('location')
).collect()
print(res)
```
```text
[
Row(age=34, location='Seattle'),
Row(age=45, location='Seattle'),
Row(age=23, location='Seattle'),
Row(age=56, location='Seattle')
]
```
#### Contribution Guidelines {#docs:stable:clients:python:spark_api::contribution-guidelines}
Contributions to the experimental Spark API are welcome.
When making a contribution, please follow these guidelines:
* Instead of using temporary files, use our `pytest` testing framework.
* When adding new functions, ensure that method signatures comply with those in the [PySpark API](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/index.html).
### Python Client API {#docs:stable:clients:python:reference:index}
class duckdb. BinaryValue ( object : Any )
Bases: Value
class duckdb. BinderException
Bases: ProgrammingError
class duckdb. BitValue ( object : Any )
Bases: Value
class duckdb. BlobValue ( object : Any )
Bases: Value
class duckdb. BooleanValue ( object : Any )
Bases: Value
class duckdb. CSVLineTerminator
Bases: pybind11_object
Members:
LINE_FEED
CARRIAGE_RETURN_LINE_FEED
property name
class duckdb. CatalogException
Bases: ProgrammingError
class duckdb. ConnectionException
Bases: OperationalError
class duckdb. ConstraintException
Bases: IntegrityError
class duckdb. ConversionException
Bases: DataError
class duckdb. DBAPITypeObject ( types : list [ DuckDBPyType ] )
Bases: object
class duckdb. DataError
Bases: DatabaseError
class duckdb. DateValue ( object : Any )
Bases: Value
class duckdb. DecimalValue ( object : Any , width : int , scale : int )
Bases: Value
class duckdb. DoubleValue ( object : Any )
Bases: Value
class duckdb. DuckDBPyConnection
Bases: pybind11_object
append ( self : _duckdb.DuckDBPyConnection , table_name : str , df : pandas.DataFrame , * , by_name : bool = False ) → _duckdb.DuckDBPyConnection
Append the passed DataFrame to the named table
array_type ( self : _duckdb.DuckDBPyConnection , type : _duckdb.typing.DuckDBPyType , size : SupportsInt ) → _duckdb.typing.DuckDBPyType
Create an array type object of ‘type’
arrow ( self : _duckdb.DuckDBPyConnection , rows_per_batch : SupportsInt = 1000000 ) → pyarrow.lib.RecordBatchReader
Fetch an Arrow RecordBatchReader following execute()
begin ( self : _duckdb.DuckDBPyConnection ) → _duckdb.DuckDBPyConnection
Start a new transaction
checkpoint ( self : _duckdb.DuckDBPyConnection ) → _duckdb.DuckDBPyConnection
Synchronizes data in the write-ahead log (WAL) to the database data file (no-op for in-memory connections)
close ( self : _duckdb.DuckDBPyConnection ) → None
Close the connection
commit ( self : _duckdb.DuckDBPyConnection ) → _duckdb.DuckDBPyConnection
Commit changes performed within a transaction
create_function ( self: _duckdb.DuckDBPyConnection , name: str , function: collections.abc.Callable , parameters: object = None , return_type: _duckdb.typing.DuckDBPyType = None , * , type: _duckdb.functional.PythonUDFType = <PythonUDFType.NATIVE: 0> , null_handling: _duckdb.functional.FunctionNullHandling = <FunctionNullHandling.DEFAULT: 0> , exception_handling: _duckdb.PythonExceptionHandling = <PythonExceptionHandling.DEFAULT: 0> , side_effects: bool = False ) → _duckdb.DuckDBPyConnection
Create a DuckDB function out of the passed-in Python function so it can be used in queries
cursor ( self : _duckdb.DuckDBPyConnection ) → _duckdb.DuckDBPyConnection
Create a duplicate of the current connection
decimal_type ( self : _duckdb.DuckDBPyConnection , width : SupportsInt , scale : SupportsInt ) → _duckdb.typing.DuckDBPyType
Create a decimal type with ‘width’ and ‘scale’
property description
Get result set attributes, mainly column names
df ( self : _duckdb.DuckDBPyConnection , * , date_as_object : bool = False ) → pandas.DataFrame
Fetch a result as DataFrame following execute()
dtype ( self : _duckdb.DuckDBPyConnection , type_str : str ) → _duckdb.typing.DuckDBPyType
Create a type object by parsing the ‘type_str’ string
duplicate ( self : _duckdb.DuckDBPyConnection ) → _duckdb.DuckDBPyConnection
Create a duplicate of the current connection
enum_type ( self : _duckdb.DuckDBPyConnection , name : str , type : _duckdb.typing.DuckDBPyType , values : list ) → _duckdb.typing.DuckDBPyType
Create an enum type of underlying ‘type’, consisting of the list of ‘values’
execute ( self : _duckdb.DuckDBPyConnection , query : object , parameters : object = None ) → _duckdb.DuckDBPyConnection
Execute the given SQL query, optionally using prepared statements with parameters set
executemany ( self : _duckdb.DuckDBPyConnection , query : object , parameters : object = None ) → _duckdb.DuckDBPyConnection
Execute the given prepared statement multiple times using the list of parameter sets in parameters
extract_statements ( self : _duckdb.DuckDBPyConnection , query : str ) → list
Parse the query string and extract the Statement object(s) produced
fetch_arrow_table ( self : _duckdb.DuckDBPyConnection , rows_per_batch : SupportsInt = 1000000 ) → pyarrow.lib.Table
Fetch a result as Arrow table following execute()
fetch_df ( self : _duckdb.DuckDBPyConnection , * , date_as_object : bool = False ) → pandas.DataFrame
Fetch a result as DataFrame following execute()
fetch_df_chunk ( self : _duckdb.DuckDBPyConnection , vectors_per_chunk : SupportsInt = 1 , * , date_as_object : bool = False ) → pandas.DataFrame
Fetch a chunk of the result as DataFrame following execute()
fetch_record_batch ( self : _duckdb.DuckDBPyConnection , rows_per_batch : SupportsInt = 1000000 ) → pyarrow.lib.RecordBatchReader
Fetch an Arrow RecordBatchReader following execute()
fetchall ( self : _duckdb.DuckDBPyConnection ) → list
Fetch all rows from a result following execute
fetchdf ( self : _duckdb.DuckDBPyConnection , * , date_as_object : bool = False ) → pandas.DataFrame
Fetch a result as DataFrame following execute()
fetchmany ( self : _duckdb.DuckDBPyConnection , size : SupportsInt = 1 ) → list
Fetch the next set of rows from a result following execute
fetchnumpy ( self : _duckdb.DuckDBPyConnection ) → dict
Fetch a result as list of NumPy arrays following execute
fetchone ( self : _duckdb.DuckDBPyConnection ) → Optional [ tuple ]
Fetch a single row from a result following execute
filesystem_is_registered ( self : _duckdb.DuckDBPyConnection , name : str ) → bool
Check if a filesystem with the provided name is currently registered
from_arrow ( self : _duckdb.DuckDBPyConnection , arrow_object : object ) → _duckdb.DuckDBPyRelation
Create a relation object from an Arrow object
from_csv_auto ( self : _duckdb.DuckDBPyConnection , path_or_buffer : object , ** kwargs ) → _duckdb.DuckDBPyRelation
Create a relation object from the CSV file in ‘name’
from_df ( self : _duckdb.DuckDBPyConnection , df : pandas.DataFrame ) → _duckdb.DuckDBPyRelation
Create a relation object from the DataFrame in df
from_parquet ( * args , ** kwargs )
Overloaded function.
from_parquet(self: _duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, * , file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
from_parquet(self: _duckdb.DuckDBPyConnection, file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, * , file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
from_query ( self : _duckdb.DuckDBPyConnection , query : object , * , alias : str = '' , params : object = None ) → _duckdb.DuckDBPyRelation
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
get_table_names ( self : _duckdb.DuckDBPyConnection , query : str , * , qualified : bool = False ) → set [ str ]
Extract the required table names from a query
install_extension ( self : _duckdb.DuckDBPyConnection , extension : str , * , force_install : bool = False , repository : object = None , repository_url : object = None , version : object = None ) → None
Install an extension by name, with an optional version and/or repository to get the extension from
interrupt ( self : _duckdb.DuckDBPyConnection ) → None
Interrupt pending operations
list_filesystems ( self : _duckdb.DuckDBPyConnection ) → list
List registered filesystems, including builtin ones
list_type ( self : _duckdb.DuckDBPyConnection , type : _duckdb.typing.DuckDBPyType ) → _duckdb.typing.DuckDBPyType
Create a list type object of ‘type’
load_extension ( self : _duckdb.DuckDBPyConnection , extension : str ) → None
Load an installed extension
map_type ( self : _duckdb.DuckDBPyConnection , key : _duckdb.typing.DuckDBPyType , value : _duckdb.typing.DuckDBPyType ) → _duckdb.typing.DuckDBPyType
Create a map type object from ‘key_type’ and ‘value_type’
pl ( self : _duckdb.DuckDBPyConnection , rows_per_batch : SupportsInt = 1000000 , * , lazy : bool = False ) → duckdb::PolarsDataFrame
Fetch a result as Polars DataFrame following execute()
query ( self : _duckdb.DuckDBPyConnection , query : object , * , alias : str = '' , params : object = None ) → _duckdb.DuckDBPyRelation
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
query_progress ( self : _duckdb.DuckDBPyConnection ) → float
Query progress of pending operation
read_csv ( self : _duckdb.DuckDBPyConnection , path_or_buffer : object , ** kwargs ) → _duckdb.DuckDBPyRelation
Create a relation object from the CSV file in ‘name’
read_json ( self : _duckdb.DuckDBPyConnection , path_or_buffer : object , * , columns : Optional [ object ] = None , sample_size : Optional [ object ] = None , maximum_depth : Optional [ object ] = None , records : Optional [ str ] = None , format : Optional [ str ] = None , date_format : Optional [ object ] = None , timestamp_format : Optional [ object ] = None , compression : Optional [ object ] = None , maximum_object_size : Optional [ object ] = None , ignore_errors : Optional [ object ] = None , convert_strings_to_integers : Optional [ object ] = None , field_appearance_threshold : Optional [ object ] = None , map_inference_threshold : Optional [ object ] = None , maximum_sample_files : Optional [ object ] = None , filename : Optional [ object ] = None , hive_partitioning : Optional [ object ] = None , union_by_name : Optional [ object ] = None , hive_types : Optional [ object ] = None , hive_types_autocast : Optional [ object ] = None ) → _duckdb.DuckDBPyRelation
Create a relation object from the JSON file in ‘name’
read_parquet ( * args , ** kwargs )
Overloaded function.
read_parquet(self: _duckdb.DuckDBPyConnection, file_glob: str, binary_as_string: bool = False, * , file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_glob
read_parquet(self: _duckdb.DuckDBPyConnection, file_globs: collections.abc.Sequence[str], binary_as_string: bool = False, * , file_row_number: bool = False, filename: bool = False, hive_partitioning: bool = False, union_by_name: bool = False, compression: object = None) -> _duckdb.DuckDBPyRelation
Create a relation object from the Parquet files in file_globs
register ( self : _duckdb.DuckDBPyConnection , view_name : str , python_object : object ) → _duckdb.DuckDBPyConnection
Register the passed Python Object value for querying with a view
register_filesystem ( self : _duckdb.DuckDBPyConnection , filesystem : fsspec.AbstractFileSystem ) → None
Register a fsspec compliant filesystem
remove_function ( self : _duckdb.DuckDBPyConnection , name : str ) → _duckdb.DuckDBPyConnection
Remove a previously created function
rollback ( self : _duckdb.DuckDBPyConnection ) → _duckdb.DuckDBPyConnection
Roll back changes performed within a transaction
row_type ( self : _duckdb.DuckDBPyConnection , fields : object ) → _duckdb.typing.DuckDBPyType
Create a struct type object from ‘fields’
property rowcount
Get result set row count
sql ( self : _duckdb.DuckDBPyConnection , query : object , * , alias : str = '' , params : object = None ) → _duckdb.DuckDBPyRelation
Run a SQL query. If it is a SELECT statement, create a relation object from the given SQL query, otherwise run the query as-is.
sqltype ( self : _duckdb.DuckDBPyConnection , type_str : str ) → _duckdb.typing.DuckDBPyType
Create a type object by parsing the ‘type_str’ string
string_type ( self : _duckdb.DuckDBPyConnection , collation : str = '' ) → _duckdb.typing.DuckDBPyType
Create a string type with an optional collation
struct_type ( self : _duckdb.DuckDBPyConnection , fields : object ) → _duckdb.typing.DuckDBPyType
Create a struct type object from ‘fields’
table ( self : _duckdb.DuckDBPyConnection , table_name : str ) → _duckdb.DuckDBPyRelation
Create a relation object for the named table
table_function ( self : _duckdb.DuckDBPyConnection , name : str , parameters : object = None ) → _duckdb.DuckDBPyRelation
Create a relation object from the named table function with given parameters
tf ( self : _duckdb.DuckDBPyConnection ) → dict
Fetch a result as dict of TensorFlow Tensors following execute()
torch ( self : _duckdb.DuckDBPyConnection ) → dict
Fetch a result as dict of PyTorch Tensors following execute()
type ( self : _duckdb.DuckDBPyConnection , type_str : str ) → _duckdb.typing.DuckDBPyType
Create a type object by parsing the ‘type_str’ string
union_type ( self : _duckdb.DuckDBPyConnection , members : object ) → _duckdb.typing.DuckDBPyType
Create a union type object from ‘members’
unregister ( self : _duckdb.DuckDBPyConnection , view_name : str ) → _duckdb.DuckDBPyConnection
Unregister the view name
unregister_filesystem ( self : _duckdb.DuckDBPyConnection , name : str ) → None
Unregister a filesystem
values ( self : _duckdb.DuckDBPyConnection , * args ) → _duckdb.DuckDBPyRelation
Create a relation object from the passed values
view ( self : _duckdb.DuckDBPyConnection , view_name : str ) → _duckdb.DuckDBPyRelation
Create a relation object for the named view
class duckdb. DuckDBPyRelation
Bases: pybind11_object
Detailed examples can be found at Relational API page.
aggregate ( self : _duckdb.DuckDBPyRelation , aggr_expr : object , group_expr : str = '' ) → _duckdb.DuckDBPyRelation
Compute the aggregate aggr_expr by the optional groups group_expr on the relation
Detailed examples can be found at Relational API page .
alias
Get the name of the current alias
Detailed examples can be found at Relational API page .
any_value ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Returns the first non-null value from a given column
Detailed examples can be found at Relational API page .
apply ( self : _duckdb.DuckDBPyRelation , function_name : str , function_aggr : str , group_expr : str = '' , function_parameter : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Compute the function of a single column or a list of columns by the optional groups on the relation
Detailed examples can be found at Relational API page .
arg_max ( self : _duckdb.DuckDBPyRelation , arg_column : str , value_column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Finds the row with the maximum value for a value column and returns the value of that row for an argument column
Detailed examples can be found at Relational API page .
arg_min ( self : _duckdb.DuckDBPyRelation , arg_column : str , value_column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Finds the row with the minimum value for a value column and returns the value of that row for an argument column
Detailed examples can be found at Relational API page .
arrow ( self : _duckdb.DuckDBPyRelation , batch_size : SupportsInt = 1000000 ) → pyarrow.lib.RecordBatchReader
Execute and return an Arrow Record Batch Reader that yields all rows
Detailed examples can be found at Relational API page .
avg ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the average on a given column
Detailed examples can be found at Relational API page .
bit_and ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the bitwise AND of all bits present in a given column
Detailed examples can be found at Relational API page .
bit_or ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the bitwise OR of all bits present in a given column
Detailed examples can be found at Relational API page .
bit_xor ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the bitwise XOR of all bits present in a given column
Detailed examples can be found at Relational API page .
bitstring_agg ( self : _duckdb.DuckDBPyRelation , column : str , min : Optional [ object ] = None , max : Optional [ object ] = None , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes a bitstring with bits set for each distinct value in a given column
Detailed examples can be found at Relational API page .
bool_and ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the logical AND of all values present in a given column
Detailed examples can be found at Relational API page .
bool_or ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the logical OR of all values present in a given column
Detailed examples can be found at Relational API page .
close ( self : _duckdb.DuckDBPyRelation ) → None
Closes the result
Detailed examples can be found at Relational API page .
columns
Return a list containing the names of the columns of the relation.
Detailed examples can be found at Relational API page .
count ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the number of elements present in a given column
Detailed examples can be found at Relational API page .
create ( self : _duckdb.DuckDBPyRelation , table_name : str ) → None
Creates a new table named table_name with the contents of the relation object
Detailed examples can be found at Relational API page .
create_view ( self : _duckdb.DuckDBPyRelation , view_name : str , replace : bool = True ) → _duckdb.DuckDBPyRelation
Creates a view named view_name that refers to the relation object
Detailed examples can be found at Relational API page .
cross ( self : _duckdb.DuckDBPyRelation , other_rel : _duckdb.DuckDBPyRelation ) → _duckdb.DuckDBPyRelation
Create cross/cartesian product of two relational objects
Detailed examples can be found at Relational API page .
cume_dist ( self : _duckdb.DuckDBPyRelation , window_spec : str , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the cumulative distribution within the partition
Detailed examples can be found at Relational API page .
dense_rank ( self : _duckdb.DuckDBPyRelation , window_spec : str , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the dense rank within the partition
Detailed examples can be found at Relational API page .
describe ( self : _duckdb.DuckDBPyRelation ) → _duckdb.DuckDBPyRelation
Gives basic statistics (e.g., min, max) and if NULL exists for each column of the relation.
Detailed examples can be found at Relational API page .
description
Return the description of the result
Detailed examples can be found at Relational API page .
df ( self : _duckdb.DuckDBPyRelation , * , date_as_object : bool = False ) → pandas.DataFrame
Execute and fetch all rows as a pandas DataFrame
Detailed examples can be found at Relational API page .
distinct ( self : _duckdb.DuckDBPyRelation ) → _duckdb.DuckDBPyRelation
Retrieve distinct rows from this relation object
Detailed examples can be found at Relational API page .
dtypes
Return a list containing the types of the columns of the relation.
Detailed examples can be found at Relational API page .
except_ ( self : _duckdb.DuckDBPyRelation , other_rel : _duckdb.DuckDBPyRelation ) → _duckdb.DuckDBPyRelation
Create the set except of this relation object with another relation object in other_rel
Detailed examples can be found at Relational API page .
execute ( self : _duckdb.DuckDBPyRelation ) → _duckdb.DuckDBPyRelation
Transform the relation into a result set
Detailed examples can be found at Relational API page .
explain ( self : _duckdb.DuckDBPyRelation , type : _duckdb.ExplainType = 'standard' ) → str
Detailed examples can be found at Relational API page .
favg ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the average of all values present in a given column using a more accurate floating point summation (Kahan Sum)
Detailed examples can be found at Relational API page .
fetch_arrow_reader ( self : _duckdb.DuckDBPyRelation , batch_size : SupportsInt = 1000000 ) → pyarrow.lib.RecordBatchReader
Execute and return an Arrow Record Batch Reader that yields all rows
Detailed examples can be found at Relational API page .
fetch_arrow_table ( self : _duckdb.DuckDBPyRelation , batch_size : SupportsInt = 1000000 ) → pyarrow.lib.Table
Execute and fetch all rows as an Arrow Table
Detailed examples can be found at Relational API page .
fetch_df_chunk ( self : _duckdb.DuckDBPyRelation , vectors_per_chunk : SupportsInt = 1 , * , date_as_object : bool = False ) → pandas.DataFrame
Execute and fetch a chunk of the rows
Detailed examples can be found at Relational API page .
fetch_record_batch ( self : _duckdb.DuckDBPyRelation , rows_per_batch : SupportsInt = 1000000 ) → pyarrow.lib.RecordBatchReader
Execute and return an Arrow Record Batch Reader that yields all rows
Detailed examples can be found at Relational API page .
fetchall ( self : _duckdb.DuckDBPyRelation ) → list
Execute and fetch all rows as a list of tuples
Detailed examples can be found at Relational API page .
fetchdf ( self : _duckdb.DuckDBPyRelation , * , date_as_object : bool = False ) → pandas.DataFrame
Execute and fetch all rows as a pandas DataFrame
Detailed examples can be found at Relational API page .
fetchmany ( self : _duckdb.DuckDBPyRelation , size : SupportsInt = 1 ) → list
Execute and fetch the next set of rows as a list of tuples
Detailed examples can be found at Relational API page .
fetchnumpy ( self : _duckdb.DuckDBPyRelation ) → dict
Execute and fetch all rows as a Python dict mapping each column name to a NumPy array
Detailed examples can be found at Relational API page .
fetchone ( self : _duckdb.DuckDBPyRelation ) → Optional [ tuple ]
Execute and fetch a single row as a tuple
Detailed examples can be found at Relational API page .
filter ( self : _duckdb.DuckDBPyRelation , filter_expr : object ) → _duckdb.DuckDBPyRelation
Filter the relation object by the filter in filter_expr
Detailed examples can be found at Relational API page .
first ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Returns the first value of a given column
Detailed examples can be found at Relational API page .
first_value ( self : _duckdb.DuckDBPyRelation , column : str , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the first value within the group or partition
Detailed examples can be found at Relational API page .
fsum ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the sum of all values present in a given column using a more accurate floating point summation (Kahan Sum)
Detailed examples can be found at Relational API page .
geomean ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the geometric mean over all values present in a given column
Detailed examples can be found at Relational API page .
histogram ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the histogram over all values present in a given column
Detailed examples can be found at Relational API page .
insert ( self : _duckdb.DuckDBPyRelation , values : object ) → None
Inserts the given values into the relation
Detailed examples can be found at Relational API page .
insert_into ( self : _duckdb.DuckDBPyRelation , table_name : str ) → None
Inserts the relation object into an existing table named table_name
Detailed examples can be found at Relational API page .
intersect ( self : _duckdb.DuckDBPyRelation , other_rel : _duckdb.DuckDBPyRelation ) → _duckdb.DuckDBPyRelation
Create the set intersection of this relation object with another relation object in other_rel
Detailed examples can be found at Relational API page .
join ( self : _duckdb.DuckDBPyRelation , other_rel : _duckdb.DuckDBPyRelation , condition : object , how : str = 'inner' ) → _duckdb.DuckDBPyRelation
Join the relation object with another relation object in other_rel using the join condition expression in join_condition. Types supported are ‘inner’, ‘left’, ‘right’, ‘outer’, ‘semi’ and ‘anti’
Detailed examples can be found at Relational API page .
lag ( self : _duckdb.DuckDBPyRelation , column : str , window_spec : str , offset : SupportsInt = 1 , default_value : str = 'NULL' , ignore_nulls : bool = False , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the lag within the partition
Detailed examples can be found at Relational API page .
last ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Returns the last value of a given column
Detailed examples can be found at Relational API page .
last_value ( self : _duckdb.DuckDBPyRelation , column : str , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the last value within the group or partition
Detailed examples can be found at Relational API page .
lead ( self : _duckdb.DuckDBPyRelation , column : str , window_spec : str , offset : SupportsInt = 1 , default_value : str = 'NULL' , ignore_nulls : bool = False , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the lead within the partition
Detailed examples can be found at Relational API page .
limit ( self : _duckdb.DuckDBPyRelation , n : SupportsInt , offset : SupportsInt = 0 ) → _duckdb.DuckDBPyRelation
Only retrieve the first n rows from this relation object, starting at offset
Detailed examples can be found at Relational API page .
list ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Returns a list containing all values present in a given column
Detailed examples can be found at Relational API page .
map ( self : _duckdb.DuckDBPyRelation , map_function : collections.abc.Callable , * , schema : Optional [ object ] = None ) → _duckdb.DuckDBPyRelation
Calls the passed function on the relation
Detailed examples can be found at Relational API page .
max ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Returns the maximum value present in a given column
Detailed examples can be found at Relational API page .
mean ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the average on a given column
Detailed examples can be found at Relational API page .
median ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the median over all values present in a given column
Detailed examples can be found at Relational API page .
min ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Returns the minimum value present in a given column
Detailed examples can be found at Relational API page .
mode ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the mode over all values present in a given column
Detailed examples can be found at Relational API page .
n_tile ( self : _duckdb.DuckDBPyRelation , window_spec : str , num_buckets : SupportsInt , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Divides the partition as equally as possible into num_buckets
Detailed examples can be found at Relational API page .
nth_value ( self : _duckdb.DuckDBPyRelation , column : str , window_spec : str , offset : SupportsInt , ignore_nulls : bool = False , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the nth value within the partition
Detailed examples can be found at Relational API page .
order ( self : _duckdb.DuckDBPyRelation , order_expr : str ) → _duckdb.DuckDBPyRelation
Reorder the relation object by order_expr
Detailed examples can be found at Relational API page .
percent_rank ( self : _duckdb.DuckDBPyRelation , window_spec : str , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the relative rank within the partition
Detailed examples can be found at Relational API page .
pl ( self : _duckdb.DuckDBPyRelation , batch_size : SupportsInt = 1000000 , * , lazy : bool = False ) → duckdb::PolarsDataFrame
Execute and fetch all rows as a Polars DataFrame
Detailed examples can be found at Relational API page .
product ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Returns the product of all values present in a given column
Detailed examples can be found at Relational API page .
project ( self : _duckdb.DuckDBPyRelation , * args , groups : str = '' ) → _duckdb.DuckDBPyRelation
Project the relation object by the projection in project_expr
Detailed examples can be found at Relational API page .
quantile ( self : _duckdb.DuckDBPyRelation , column : str , q : object = 0.5 , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the exact quantile value for a given column
Detailed examples can be found at Relational API page .
quantile_cont ( self : _duckdb.DuckDBPyRelation , column : str , q : object = 0.5 , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the interpolated quantile value for a given column
Detailed examples can be found at Relational API page .
quantile_disc ( self : _duckdb.DuckDBPyRelation , column : str , q : object = 0.5 , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the exact quantile value for a given column
Detailed examples can be found at Relational API page .
query ( self : _duckdb.DuckDBPyRelation , virtual_table_name : str , sql_query : str ) → _duckdb.DuckDBPyRelation
Run the given SQL query in sql_query on the view named virtual_table_name that refers to the relation object
Detailed examples can be found at Relational API page .
rank ( self : _duckdb.DuckDBPyRelation , window_spec : str , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the rank within the partition
Detailed examples can be found at Relational API page .
rank_dense ( self : _duckdb.DuckDBPyRelation , window_spec : str , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the dense rank within the partition
Detailed examples can be found at Relational API page .
record_batch ( self : object , batch_size : SupportsInt = 1000000 ) → object
Detailed examples can be found at Relational API page .
row_number ( self : _duckdb.DuckDBPyRelation , window_spec : str , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the row number within the partition
Detailed examples can be found at Relational API page .
select ( self : _duckdb.DuckDBPyRelation , * args , groups : str = '' ) → _duckdb.DuckDBPyRelation
Project the relation object by the projection in project_expr
Detailed examples can be found at Relational API page .
select_dtypes ( self : _duckdb.DuckDBPyRelation , types : object ) → _duckdb.DuckDBPyRelation
Select columns from the relation, by filtering based on type(s)
Detailed examples can be found at Relational API page .
select_types ( self : _duckdb.DuckDBPyRelation , types : object ) → _duckdb.DuckDBPyRelation
Select columns from the relation, by filtering based on type(s)
Detailed examples can be found at Relational API page .
set_alias ( self : _duckdb.DuckDBPyRelation , alias : str ) → _duckdb.DuckDBPyRelation
Rename the relation object to new alias
Detailed examples can be found at Relational API page .
shape
Tuple of # of rows, # of columns in relation.
Detailed examples can be found at Relational API page .
show ( self : _duckdb.DuckDBPyRelation , * , max_width : Optional [ SupportsInt ] = None , max_rows : Optional [ SupportsInt ] = None , max_col_width : Optional [ SupportsInt ] = None , null_value : Optional [ str ] = None , render_mode : object = None ) → None
Display a summary of the data
Detailed examples can be found at Relational API page .
sort ( self : _duckdb.DuckDBPyRelation , * args ) → _duckdb.DuckDBPyRelation
Reorder the relation object by the provided expressions
Detailed examples can be found at Relational API page .
sql_query ( self : _duckdb.DuckDBPyRelation ) → str
Get the SQL query that is equivalent to the relation
Detailed examples can be found at Relational API page .
std ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the sample standard deviation for a given column
Detailed examples can be found at Relational API page .
stddev ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the sample standard deviation for a given column
Detailed examples can be found at Relational API page .
stddev_pop ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the population standard deviation for a given column
Detailed examples can be found at Relational API page .
stddev_samp ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the sample standard deviation for a given column
Detailed examples can be found at Relational API page .
string_agg ( self : _duckdb.DuckDBPyRelation , column : str , sep : str = ',' , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Concatenates the values present in a given column with a separator
Detailed examples can be found at Relational API page .
sum ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the sum of all values present in a given column
Detailed examples can be found at Relational API page .
tf ( self : _duckdb.DuckDBPyRelation ) → dict
Fetch a result as dict of TensorFlow Tensors
Detailed examples can be found at Relational API page .
to_arrow_table ( self : _duckdb.DuckDBPyRelation , batch_size : SupportsInt = 1000000 ) → pyarrow.lib.Table
Execute and fetch all rows as an Arrow Table
Detailed examples can be found at Relational API page .
to_csv ( self : _duckdb.DuckDBPyRelation , file_name : str , * , sep : object = None , na_rep : object = None , header : object = None , quotechar : object = None , escapechar : object = None , date_format : object = None , timestamp_format : object = None , quoting : object = None , encoding : object = None , compression : object = None , overwrite : object = None , per_thread_output : object = None , use_tmp_file : object = None , partition_by : object = None , write_partition_columns : object = None ) → None
Write the relation object to a CSV file in ‘file_name’
Detailed examples can be found at Relational API page .
to_df ( self : _duckdb.DuckDBPyRelation , * , date_as_object : bool = False ) → pandas.DataFrame
Execute and fetch all rows as a pandas DataFrame
Detailed examples can be found at Relational API page .
to_parquet ( self : _duckdb.DuckDBPyRelation , file_name : str , * , compression : object = None , field_ids : object = None , row_group_size_bytes : object = None , row_group_size : object = None , overwrite : object = None , per_thread_output : object = None , use_tmp_file : object = None , partition_by : object = None , write_partition_columns : object = None , append : object = None ) → None
Write the relation object to a Parquet file in ‘file_name’
Detailed examples can be found at Relational API page .
to_table ( self : _duckdb.DuckDBPyRelation , table_name : str ) → None
Creates a new table named table_name with the contents of the relation object
Detailed examples can be found at Relational API page .
to_view ( self : _duckdb.DuckDBPyRelation , view_name : str , replace : bool = True ) → _duckdb.DuckDBPyRelation
Creates a view named view_name that refers to the relation object
Detailed examples can be found at Relational API page .
torch ( self : _duckdb.DuckDBPyRelation ) → dict
Fetch a result as dict of PyTorch Tensors
Detailed examples can be found at Relational API page .
type
Get the type of the relation.
Detailed examples can be found at Relational API page .
types
Return a list containing the types of the columns of the relation.
Detailed examples can be found at Relational API page .
union ( self : _duckdb.DuckDBPyRelation , union_rel : _duckdb.DuckDBPyRelation ) → _duckdb.DuckDBPyRelation
Create the set union of this relation object with another relation object in other_rel
Detailed examples can be found at Relational API page .
unique ( self : _duckdb.DuckDBPyRelation , unique_aggr : str ) → _duckdb.DuckDBPyRelation
Returns the distinct values in a column.
Detailed examples can be found at Relational API page .
update ( self : _duckdb.DuckDBPyRelation , set : object , * , condition : object = None ) → None
Update the given relation with the provided expressions
Detailed examples can be found at Relational API page .
value_counts ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' ) → _duckdb.DuckDBPyRelation
Computes the number of elements present in a given column, also projecting the original column
Detailed examples can be found at Relational API page .
var ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the sample variance for a given column
Detailed examples can be found at Relational API page .
var_pop ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the population variance for a given column
Detailed examples can be found at Relational API page .
var_samp ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the sample variance for a given column
Detailed examples can be found at Relational API page .
variance ( self : _duckdb.DuckDBPyRelation , column : str , groups : str = '' , window_spec : str = '' , projected_columns : str = '' ) → _duckdb.DuckDBPyRelation
Computes the sample variance for a given column
Detailed examples can be found at Relational API page .
write_csv ( self : _duckdb.DuckDBPyRelation , file_name : str , * , sep : object = None , na_rep : object = None , header : object = None , quotechar : object = None , escapechar : object = None , date_format : object = None , timestamp_format : object = None , quoting : object = None , encoding : object = None , compression : object = None , overwrite : object = None , per_thread_output : object = None , use_tmp_file : object = None , partition_by : object = None , write_partition_columns : object = None ) → None
Write the relation object to a CSV file in ‘file_name’
Detailed examples can be found at Relational API page .
write_parquet ( self : _duckdb.DuckDBPyRelation , file_name : str , * , compression : object = None , field_ids : object = None , row_group_size_bytes : object = None , row_group_size : object = None , overwrite : object = None , per_thread_output : object = None , use_tmp_file : object = None , partition_by : object = None , write_partition_columns : object = None , append : object = None ) → None
Write the relation object to a Parquet file in ‘file_name’
Detailed examples can be found at Relational API page .
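The listing above is generated from the Python bindings; as a quick orientation, here is a minimal sketch that chains a few of these relation methods together. It assumes an in-memory connection and, for the final `df()` call, that pandas is installed; the table produced by `range(10)` is only an illustration.
```python
import duckdb

con = duckdb.connect()  # in-memory database
rel = con.sql("SELECT range AS x, range % 2 AS parity FROM range(10)")

filtered = rel.filter("x > 3")                     # keep rows where x > 3
totals = filtered.sum("x", groups="parity",
                      projected_columns="parity")  # grouped aggregate
totals.show()                                      # display a summary of the data
print(totals.fetchall())                           # fetch all rows as a list of tuples
df = rel.limit(3).df()                             # first 3 rows as a pandas DataFrame
```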
class duckdb. Error
Bases: Exception
class duckdb. ExpectedResultType
Bases: pybind11_object
Members:
QUERY_RESULT
CHANGED_ROWS
NOTHING
property name
class duckdb. ExplainType
Bases: pybind11_object
Members:
STANDARD
ANALYZE
property name
class duckdb. Expression
Bases: pybind11_object
alias ( self : _duckdb.Expression , arg0 : str ) → _duckdb.Expression
Create a copy of this expression with the given alias.
Parameters:
name: The alias to use for the expression, this will affect how it can be referenced.
Returns:
Expression: self with an alias.
asc ( self : _duckdb.Expression ) → _duckdb.Expression
Set the order by modifier to ASCENDING.
between ( self : _duckdb.Expression , lower : _duckdb.Expression , upper : _duckdb.Expression ) → _duckdb.Expression
cast ( self : _duckdb.Expression , type : _duckdb.typing.DuckDBPyType ) → _duckdb.Expression
Create a CastExpression to type from self
Parameters:
type: The type to cast to
Returns:
CastExpression: self::type
collate ( self : _duckdb.Expression , collation : str ) → _duckdb.Expression
desc ( self : _duckdb.Expression ) → _duckdb.Expression
Set the order by modifier to DESCENDING.
get_name ( self : _duckdb.Expression ) → str
Return the stringified version of the expression.
Returns:
str: The string representation.
isin ( self : _duckdb.Expression , * args ) → _duckdb.Expression
Return an IN expression comparing self to the input arguments.
Returns:
DuckDBPyExpression: The compare IN expression
isnotin ( self : _duckdb.Expression , * args ) → _duckdb.Expression
Return a NOT IN expression comparing self to the input arguments.
Returns:
DuckDBPyExpression: The compare NOT IN expression
isnotnull ( self : _duckdb.Expression ) → _duckdb.Expression
Create a binary IS NOT NULL expression from self
Returns:
DuckDBPyExpression: self IS NOT NULL
isnull ( self : _duckdb.Expression ) → _duckdb.Expression
Create a binary IS NULL expression from self
Returns:
DuckDBPyExpression: self IS NULL
nulls_first ( self : _duckdb.Expression ) → _duckdb.Expression
Set the NULL order by modifier to NULLS FIRST.
nulls_last ( self : _duckdb.Expression ) → _duckdb.Expression
Set the NULL order by modifier to NULLS LAST.
otherwise ( self : _duckdb.Expression , value : _duckdb.Expression ) → _duckdb.Expression
Add an ELSE <value> clause to the CaseExpression.
Parameters:
value: The value to use if none of the WHEN conditions are met.
Returns:
CaseExpression: self with an ELSE clause.
show ( self : _duckdb.Expression ) → None
Print the stringified version of the expression.
when ( self : _duckdb.Expression , condition : _duckdb.Expression , value : _duckdb.Expression ) → _duckdb.Expression
Add an additional WHEN <condition> THEN <value> clause to the CaseExpression.
Parameters:
condition: The condition that must be met.
value: The value to use if the condition is met.
Returns:
CaseExpression: self with an additional WHEN clause.
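As a rough illustration of the `Expression` methods listed above (`alias`, `isin`, `when`, `otherwise`, ...), the sketch below assumes the `ColumnExpression`, `ConstantExpression`, and `CaseExpression` constructors exported by the `duckdb` module, which are not part of this listing.
```python
import duckdb
from duckdb import CaseExpression, ColumnExpression, ConstantExpression

con = duckdb.connect()
rel = con.sql("SELECT range AS x FROM range(5)")

x = ColumnExpression("x")
# CASE WHEN x > 2 THEN 'big' ELSE 'small' END AS size
size = (
    CaseExpression(x > ConstantExpression(2), ConstantExpression("big"))
    .otherwise(ConstantExpression("small"))
    .alias("size")
)
print(rel.select(x, size).fetchall())

# WHERE x IN (1, 3)
print(rel.filter(x.isin(ConstantExpression(1), ConstantExpression(3))).fetchall())
```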
class duckdb. FatalException
Bases: DatabaseError
class duckdb. FloatValue ( object : Any )
Bases: Value
class duckdb. HTTPException
Bases: IOException
Thrown when an error occurs in the httpfs extension, or whilst downloading an extension.
class duckdb. HugeIntegerValue ( object : Any )
Bases: Value
class duckdb. IOException
Bases: OperationalError
class duckdb. IntegerValue ( object : Any )
Bases: Value
class duckdb. IntegrityError
Bases: DatabaseError
class duckdb. InternalError
Bases: DatabaseError
class duckdb. InternalException
Bases: InternalError
class duckdb. InterruptException
Bases: DatabaseError
class duckdb. IntervalValue ( object : Any )
Bases: Value
class duckdb. InvalidInputException
Bases: ProgrammingError
class duckdb. InvalidTypeException
Bases: ProgrammingError
class duckdb. LongValue ( object : Any )
Bases: Value
class duckdb. NotImplementedException
Bases: NotSupportedError
class duckdb. NotSupportedError
Bases: DatabaseError
class duckdb. NullValue
Bases: Value
class duckdb. OperationalError
Bases: DatabaseError
class duckdb. OutOfMemoryException
Bases: OperationalError
class duckdb. OutOfRangeException
Bases: DataError
class duckdb. ParserException
Bases: ProgrammingError
class duckdb. PermissionException
Bases: DatabaseError
class duckdb. ProgrammingError
Bases: DatabaseError
class duckdb. PythonExceptionHandling
Bases: pybind11_object
Members:
DEFAULT
RETURN_NULL
property name
class duckdb. RenderMode
Bases: pybind11_object
Members:
ROWS
COLUMNS
property name
class duckdb. SequenceException
Bases: DatabaseError
class duckdb. SerializationException
Bases: OperationalError
class duckdb. ShortValue ( object : Any )
Bases: Value
class duckdb. Statement
Bases: pybind11_object
property expected_result_type
Get the expected type of result produced by this statement, actual type may vary depending on the statement.
property named_parameters
Get the map of named parameters this statement has.
property query
Get the query equivalent to this statement.
property type
Get the type of the statement.
class duckdb. StatementType
Bases: pybind11_object
Members:
INVALID
SELECT
INSERT
UPDATE
CREATE
DELETE
PREPARE
EXECUTE
ALTER
TRANSACTION
COPY
ANALYZE
VARIABLE_SET
CREATE_FUNC
EXPLAIN
DROP
EXPORT
PRAGMA
VACUUM
CALL
SET
LOAD
RELATION
EXTENSION
LOGICAL_PLAN
ATTACH
DETACH
MULTI
COPY_DATABASE
MERGE_INTO
property name
class duckdb. StringValue ( object : Any )
Bases: Value
class duckdb. SyntaxException
Bases: ProgrammingError
class duckdb. TimeTimeZoneValue ( object : Any )
Bases: Value
class duckdb. TimeValue ( object : Any )
Bases: Value
class duckdb. TimestampMilisecondValue ( object : Any )
Bases: Value
class duckdb. TimestampNanosecondValue ( object : Any )
Bases: Value
class duckdb. TimestampSecondValue ( object : Any )
Bases: Value
class duckdb. TimestampTimeZoneValue ( object : Any )
Bases: Value
class duckdb. TimestampValue ( object : Any )
Bases: Value
class duckdb. TransactionException
Bases: OperationalError
class duckdb. TypeMismatchException
Bases: DataError
class duckdb. UUIDValue ( object : Any )
Bases: Value
class duckdb. UnsignedBinaryValue ( object : Any )
Bases: Value
class duckdb. UnsignedIntegerValue ( object : Any )
Bases: Value
class duckdb. UnsignedLongValue ( object : Any )
Bases: Value
class duckdb. UnsignedShortValue ( object : Any )
Bases: Value
class duckdb. Value ( object : Any , type : DuckDBPyType )
Bases: object
class duckdb. Warning
Bases: Exception
class duckdb. token_type
Bases: pybind11_object
Members:
identifier
numeric_const
string_const
operator
keyword
comment
property name
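The exception classes above follow the Python DB API hierarchy, with `duckdb.Error` as the common base class. A minimal sketch of catching them (the misspelled query is only an illustration):
```python
import duckdb

con = duckdb.connect()
try:
    con.sql("SELEC 42")              # typo on purpose
except duckdb.ParserException as e:  # a ProgrammingError subclass
    print("parser error:", e)
except duckdb.Error as e:            # base class for DuckDB errors
    print("other DuckDB error:", e)
```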
### Troubleshooting {#docs:stable:clients:python:known_issues}
#### Troubleshooting {#docs:stable:clients:python:known_issues::troubleshooting}
##### Running `EXPLAIN` Renders Newlines {#docs:stable:clients:python:known_issues::running-explain-renders-newlines}
In Python, the output of the [`EXPLAIN` statement](#docs:stable:guides:meta:explain) contains hard line breaks (`\n`):
```python
In [1]: import duckdb
...: duckdb.sql("EXPLAIN SELECT 42 AS x")
```
```text
Out[1]:
┌───────────────┬──────────────────────────────────────────────────────────────────────────────────────────┐
│  explain_key  │                                      explain_value                                      │
│    varchar    │                                         varchar                                         │
├───────────────┼──────────────────────────────────────────────────────────────────────────────────────────┤
│ physical_plan │ ┌───────────────────────────┐\n│         PROJECTION        │\n│    ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ … │
└───────────────┴──────────────────────────────────────────────────────────────────────────────────────────┘
```
To work around this, `print` the output of the `explain()` function:
```python
In [2]: print(duckdb.sql("SELECT 42 AS x").explain())
```
```text
Out[2]:
┌───────────────────────────┐
│         PROJECTION        │
│    ─ ─ ─ ─ ─ ─ ─ ─ ─ ─    │
│             x             │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         DUMMY_SCAN        │
└───────────────────────────┘
```
Please also check out the [Jupyter guide](#docs:stable:guides:python:jupyter) for tips on using Jupyter with JupySQL.
##### Crashes and Errors on Windows {#docs:stable:clients:python:known_issues::crashes-and-errors-on-windows}
When importing DuckDB on Windows, the Python runtime may crash or return an error upon import or first use:
```python
import duckdb
duckdb.sql("...")
```
```console
ImportError: DLL load failed while importing duckdb: The specified module could not be found.
```
```console
Windows fatal exception: access violation
Current thread 0x0000311c (most recent call first):
File "", line 1 in
```
```console
Process finished with exit code -1073741819 (0xC0000005)
```
The problem is likely caused by using an outdated Microsoft Visual C++ (MSVC) Redistributable package.
The solution is to install the [latest MSVC Redistributable package](https://learn.microsoft.com/en-US/cpp/windows/latest-supported-vc-redist).
Alternatively, you can instruct `pip` to compile the package from source as follows:
```batch
python3 -m pip install duckdb --no-binary duckdb
```
#### Known Issues {#docs:stable:clients:python:known_issues::known-issues}
Unfortunately, there are some issues that are either beyond our control or very elusive / hard to track down.
Below is a list of issues that you may need to be aware of, depending on your workflow.
##### Numpy Import Multithreading {#docs:stable:clients:python:known_issues::numpy-import-multithreading}
When making use of multithreading and fetching results either directly as NumPy arrays or indirectly through a pandas DataFrame, it might be necessary to ensure that `numpy.core.multiarray` is imported.
If this module has not been imported from the main thread and a different thread attempts to import it during execution, this causes either a deadlock or a crash.
To avoid this, it's recommended to `import numpy.core.multiarray` before starting up threads.
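A minimal sketch of that workaround, assuming a worker thread that fetches results as NumPy arrays via a per-thread cursor:
```python
import threading

import duckdb
import numpy.core.multiarray  # noqa: F401 -- import from the main thread first

con = duckdb.connect()

def fetch_in_thread():
    # a cursor gives this thread its own connection handle;
    # the NumPy fetch is safe because the module was already imported above
    cur = con.cursor()
    print(cur.sql("SELECT 42 AS x").fetchnumpy())

t = threading.Thread(target=fetch_in_thread)
t.start()
t.join()
```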
##### `DESCRIBE` and `SUMMARIZE` Return Empty Tables in Jupyter {#docs:stable:clients:python:known_issues::describe-and-summarize-return-empty-tables-in-jupyter}
The `DESCRIBE` and `SUMMARIZE` statements return an empty table:
```python
%sql
CREATE OR REPLACE TABLE tbl AS (SELECT 42 AS x);
DESCRIBE tbl;
```
To work around this, wrap them into a subquery:
```python
%sql
CREATE OR REPLACE TABLE tbl AS (SELECT 42 AS x);
FROM (DESCRIBE tbl);
```
##### Protobuf Error for JupySQL in IPython {#docs:stable:clients:python:known_issues::protobuf-error-for-jupysql-in-ipython}
Loading the JupySQL extension in IPython fails:
```python
In [1]: %load_ext sql
```
```console
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (unknown location)
```
The solution is to fix the `protobuf` package. This may require uninstalling conflicting packages, e.g.:
```python
%pip uninstall tensorflow
%pip install protobuf
```
## R Client {#docs:stable:clients:r}
> The latest stable version of the DuckDB R client is {{ site.current_duckdb_r_version }}.
#### Installation {#docs:stable:clients:r::installation}
##### `duckdb`: R Client {#docs:stable:clients:r::duckdb-r-client}
The DuckDB R client can be installed using the following command:
```r
install.packages("duckdb")
```
Please see the [installation page](https://duckdb.org/install) for details.
##### `duckplyr`: dplyr Client {#docs:stable:clients:r::duckplyr-dplyr-client}
DuckDB offers a [dplyr](https://dplyr.tidyverse.org/)-compatible API via the `duckplyr` package. It can be installed using `install.packages("duckplyr")`. For details, see the [`duckplyr` documentation](https://tidyverse.github.io/duckplyr/).
#### Reference Manual {#docs:stable:clients:r::reference-manual}
The reference manual for the DuckDB R client is available at [r.duckdb.org](https://r.duckdb.org).
#### Basic Client Usage {#docs:stable:clients:r::basic-client-usage}
The standard DuckDB R client implements the [DBI interface](https://cran.r-project.org/package=DBI) for R. If you are not familiar with DBI yet, see the [Using DBI page](https://solutions.rstudio.com/db/r-packages/DBI/) for an introduction.
##### Startup & Shutdown {#docs:stable:clients:r::startup--shutdown}
To use DuckDB, you must first create a connection object that represents the database. The connection object takes as parameter the database file to read and write from. If the database file does not exist, it will be created (the file extension may be `.db`, `.duckdb`, or anything else). The special value `:memory:` (the default) can be used to create an **in-memory database**. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the R process). If you would like to connect to an existing database in read-only mode, set the `read_only` flag to `TRUE`. Read-only mode is required if multiple R processes want to access the same database file at the same time.
```r
library("duckdb")
# to start an in-memory database
con <- dbConnect(duckdb())
# or
con <- dbConnect(duckdb(), dbdir = ":memory:")
# to use a database file (not shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = FALSE)
# to use a database file (shared between processes)
con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = TRUE)
```
Connections are closed implicitly when they go out of scope or if they are explicitly closed using `dbDisconnect()`. To shut down the database instance associated with the connection, use `dbDisconnect(con, shutdown = TRUE)`.
##### Querying {#docs:stable:clients:r::querying}
DuckDB supports the standard DBI methods to send queries and retrieve result sets. `dbExecute()` is meant for queries where no results are expected, like `CREATE TABLE` or `UPDATE`, and `dbGetQuery()` is meant for queries that produce results (e.g., `SELECT`). Below is an example.
```r
# create a table
dbExecute(con, "CREATE TABLE items (item VARCHAR, value DECIMAL(10, 2), count INTEGER)")
# insert two items into the table
dbExecute(con, "INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
# retrieve the items again
res <- dbGetQuery(con, "SELECT * FROM items")
print(res)
# item value count
# 1 jeans 20.0 1
# 2 hammer 42.2 2
```
DuckDB also supports prepared statements in the R client with the `dbExecute` and `dbGetQuery` methods. Here is an example:
```r
# prepared statement parameters are given as a list
dbExecute(con, "INSERT INTO items VALUES (?, ?, ?)", list('laptop', 2000, 1))
# if you want to reuse a prepared statement multiple times, use dbSendStatement() and dbBind()
stmt <- dbSendStatement(con, "INSERT INTO items VALUES (?, ?, ?)")
dbBind(stmt, list('iphone', 300, 2))
dbBind(stmt, list('android', 3.5, 1))
dbClearResult(stmt)
# query the database using a prepared statement
res <- dbGetQuery(con, "SELECT item FROM items WHERE value > ?", list(400))
print(res)
# item
# 1 laptop
```
> **Warning.** Do **not** use prepared statements to insert large amounts of data into DuckDB. See below for better options.
#### Efficient Transfer {#docs:stable:clients:r::efficient-transfer}
To write an R data frame into DuckDB, use the standard DBI function `dbWriteTable()`. This creates a table in DuckDB and populates it with the data frame contents. For example:
```r
dbWriteTable(con, "iris_table", iris)
res <- dbGetQuery(con, "SELECT * FROM iris_table LIMIT 1")
print(res)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
```
It is also possible to "register" an R data frame as a virtual table, comparable to a SQL `VIEW`. This *does not actually transfer data* into DuckDB yet. Below is an example:
```r
duckdb_register(con, "iris_view", iris)
res <- dbGetQuery(con, "SELECT * FROM iris_view LIMIT 1")
print(res)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1 5.1 3.5 1.4 0.2 setosa
```
> DuckDB keeps a reference to the R data frame after registration. This prevents the data frame from being garbage-collected. The reference is cleared when the connection is closed, but can also be cleared manually using the `duckdb_unregister()` method.
Also refer to the [data import documentation](#docs:stable:data:overview) for more options of efficiently importing data.
#### dbplyr {#docs:stable:clients:r::dbplyr}
DuckDB also plays well with the [dbplyr](https://CRAN.R-project.org/package=dbplyr) / [dplyr](https://dplyr.tidyverse.org) packages for programmatic query construction from R. Here is an example:
```r
library("duckdb")
library("dplyr")
con <- dbConnect(duckdb())
duckdb_register(con, "flights", nycflights13::flights)
tbl(con, "flights") |>
group_by(dest) |>
summarise(delay = mean(dep_time, na.rm = TRUE)) |>
collect()
```
When using dbplyr, CSV and Parquet files can be read using the `dplyr::tbl` function.
```r
# Establish a CSV for the sake of this example
write.csv(mtcars, "mtcars.csv")
# Summarize the dataset in DuckDB to avoid reading the entire CSV into R's memory
tbl(con, "mtcars.csv") |>
group_by(cyl) |>
summarise(across(disp:wt, .fns = mean)) |>
collect()
```
```r
# Establish a set of Parquet files
dbExecute(con, "COPY flights TO 'dataset' (FORMAT parquet, PARTITION_BY (year, month))")
# Summarize the dataset in DuckDB to avoid reading 12 Parquet files into R's memory
tbl(con, "read_parquet('dataset/**/*.parquet', hive_partitioning = true)") |>
filter(month == "3") |>
summarise(delay = mean(dep_time, na.rm = TRUE)) |>
collect()
```
#### Memory Limit {#docs:stable:clients:r::memory-limit}
You can use the [`memory_limit` configuration option](#docs:stable:configuration:pragmas) to limit the memory use of DuckDB, e.g.:
```sql
SET memory_limit = '2GB';
```
Note that this limit is only applied to the memory DuckDB uses and it does not affect the memory use of other R libraries.
Therefore, the total memory used by the R process may be higher than the configured `memory_limit`.
#### Troubleshooting {#docs:stable:clients:r::troubleshooting}
##### Warning When Installing on macOS {#docs:stable:clients:r::warning-when-installing-on-macos}
On macOS, installing DuckDB may result in a warning `unable to load shared object '.../R_X11.so'`:
```console
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 0x0006): Library not loaded: /opt/X11/lib/libSM.6.dylib
Referenced from: <31EADEB5-0A17-3546-9944-9B3747071FE8> /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/modules/R_X11.so
Reason: tried: '/opt/X11/lib/libSM.6.dylib' (no such file) ...
```
Note that this is just a warning, so the simplest solution is to ignore it. Alternatively, you can install DuckDB from the [R-universe](https://r-universe.dev/search):
```r
install.packages("duckdb", repos = c("https://duckdb.r-universe.dev", "https://cloud.r-project.org"))
```
You may also install the optional [`xquartz` dependency via Homebrew](https://formulae.brew.sh/cask/xquartz).
## Rust Client {#docs:stable:clients:rust}
> The latest stable version of the DuckDB Rust client is {{ site.current_duckdb_rust_version }}.
#### Installation {#docs:stable:clients:rust::installation}
The DuckDB Rust client can be installed from [crates.io](https://crates.io/crates/duckdb). Please see the [docs.rs](http://docs.rs/duckdb) for details.
#### Basic API Usage {#docs:stable:clients:rust::basic-api-usage}
duckdb-rs is an ergonomic wrapper based on the [DuckDB C API](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb.h); please refer to the [README](https://github.com/duckdb/duckdb-rs) for details.
##### Startup & Shutdown {#docs:stable:clients:rust::startup--shutdown}
To use duckdb, you must first initialize a `Connection` handle using `Connection::open()`. `Connection::open()` takes as parameter the database file to read and write from. If the database file does not exist, it will be created (the file extension may be `.db`, `.duckdb`, or anything else). You can also use `Connection::open_in_memory()` to create an **in-memory database**. Note that for an in-memory database no data is persisted to disk (i.e., all data is lost when you exit the process).
```rust
use duckdb::{params, Connection, Result};
let conn = Connection::open_in_memory()?;
```
The `Connection` will automatically close the underlying database connection for you when it goes out of scope (via `Drop`). You can also explicitly close the `Connection` with `conn.close()`. There is not much difference between these in the typical case, but if an error occurs, the explicit close gives you the chance to handle it.
##### Querying {#docs:stable:clients:rust::querying}
SQL queries can be sent to DuckDB using the `execute()` method of a connection, or you can prepare a statement and then query on that.
```rust
#[derive(Debug)]
struct Person {
id: i32,
name: String,
data: Option<Vec<u8>>,
}
conn.execute(
"INSERT INTO person (name, data) VALUES (?, ?)",
params![me.name, me.data],
)?;
let mut stmt = conn.prepare("SELECT id, name, data FROM person")?;
let person_iter = stmt.query_map([], |row| {
Ok(Person {
id: row.get(0)?,
name: row.get(1)?,
data: row.get(2)?,
})
})?;
for person in person_iter {
println!("Found person {:?}", person.unwrap());
}
```
#### Appender {#docs:stable:clients:rust::appender}
The Rust client supports the [DuckDB Appender API](#docs:stable:data:appender) for bulk inserts. For example:
```rust
fn insert_rows(conn: &Connection) -> Result<()> {
let mut app = conn.appender("foo")?;
app.append_rows([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])?;
Ok(())
}
```
## Swift Client {#docs:stable:clients:swift}
DuckDB has a Swift client. See the [announcement post](https://duckdb.org/2023/04/21/swift) for details.
#### Instantiating DuckDB {#docs:stable:clients:swift::instantiating-duckdb}
DuckDB supports both in-memory and persistent databases.
To work with an in-memory database, run:
```swift
let database = try Database(store: .inMemory)
```
To work with a persistent database, run:
```swift
let database = try Database(store: .file(at: "test.db"))
```
Queries can be issued through a database connection.
```swift
let connection = try database.connect()
```
DuckDB supports multiple connections per database.
#### Application Example {#docs:stable:clients:swift::application-example}
The rest of the page is based on the example of our [announcement post](https://duckdb.org/2023/04/21/swift), which uses raw data from [NASA's Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu) loaded directly into DuckDB.
##### Creating an Application-Specific Type {#docs:stable:clients:swift::creating-an-application-specific-type}
We first create an application-specific type that we'll use to house our database and connection and through which we'll eventually define our app-specific queries.
```swift
import DuckDB
final class ExoplanetStore {
let database: Database
let connection: Connection
init(database: Database, connection: Connection) {
self.database = database
self.connection = connection
}
}
```
##### Loading a CSV File {#docs:stable:clients:swift::loading-a-csv-file}
We load the data from [NASA's Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu):
```text
wget "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv" -O downloaded_exoplanets.csv
```
Once we have our CSV downloaded locally, we can use the following SQL command to load it as a new table to DuckDB:
```sql
CREATE TABLE exoplanets AS
SELECT * FROM read_csv('downloaded_exoplanets.csv');
```
Let's package this up as a new asynchronous factory method on our `ExoplanetStore` type:
```swift
import DuckDB
import Foundation
final class ExoplanetStore {
// Factory method to create and prepare a new ExoplanetStore
static func create() async throws -> ExoplanetStore {
// Create our database and connection as described above
let database = try Database(store: .inMemory)
let connection = try database.connect()
// Download the CSV from the exoplanet archive
let (csvFileURL, _) = try await URLSession.shared.download(
from: URL(string: "https://exoplanetarchive.ipac.caltech.edu/TAP/sync?query=select+pl_name+,+disc_year+from+pscomppars&format=csv")!)
// Issue our first query to DuckDB
try connection.execute("""
CREATE TABLE exoplanets AS
SELECT * FROM read_csv('\(csvFileURL.path)');
""")
// Create our pre-populated ExoplanetStore instance
return ExoplanetStore(
database: database,
connection: connection
)
}
// Let's make the initializer we defined previously
// private. This prevents anyone accidentally instantiating
// the store without having pre-loaded our Exoplanet CSV
// into the database
private init(database: Database, connection: Connection) {
...
}
}
```
##### Querying the Database {#docs:stable:clients:swift::querying-the-database}
The following example queries DuckDB from within Swift via an async function. This means the caller won't be blocked while the query is executing. We'll then cast the result columns to Swift native types using DuckDB's `ResultSet` `cast(to:)` family of methods, before finally wrapping them up in a `DataFrame` from the TabularData framework.
```swift
...
import TabularData
extension ExoplanetStore {
// Retrieves the number of exoplanets discovered by year
func groupedByDiscoveryYear() async throws -> DataFrame {
// Issue the query we described above
let result = try connection.query("""
SELECT disc_year, count(disc_year) AS Count
FROM exoplanets
GROUP BY disc_year
ORDER BY disc_year
""")
// Cast our DuckDB columns to their native Swift
// equivalent types
let discoveryYearColumn = result[0].cast(to: Int.self)
let countColumn = result[1].cast(to: Int.self)
// Use our DuckDB columns to instantiate TabularData
// columns and populate a TabularData DataFrame
return DataFrame(columns: [
TabularData.Column(discoveryYearColumn).eraseToAnyColumn(),
TabularData.Column(countColumn).eraseToAnyColumn(),
])
}
}
```
##### Complete Project {#docs:stable:clients:swift::complete-project}
For the complete example project, clone the [DuckDB Swift repository](https://github.com/duckdb/duckdb-swift) and open up the runnable app project located in [`Examples/SwiftUI/ExoplanetExplorer.xcodeproj`](https://github.com/duckdb/duckdb-swift/tree/main/Examples/SwiftUI/ExoplanetExplorer.xcodeproj).
## Wasm {#clients:wasm}
### DuckDB Wasm {#docs:stable:clients:wasm:overview}
> The latest stable version of the DuckDB WebAssembly client is {{ site.current_duckdb_wasm_version }}.
DuckDB has been compiled to WebAssembly, so it can run inside any browser on any device.
DuckDB-Wasm offers a layered API; it can be embedded as a [JavaScript + WebAssembly library](https://www.npmjs.com/package/@duckdb/duckdb-wasm), as a [Web shell](https://www.npmjs.com/package/@duckdb/duckdb-wasm-shell), or [built from source](https://github.com/duckdb/duckdb-wasm) according to your needs.
#### Getting Started with DuckDB-Wasm {#docs:stable:clients:wasm:overview::getting-started-with-duckdb-wasm}
A great starting point is to read the [DuckDB-Wasm launch blog post](https://duckdb.org/2021/10/29/duckdb-wasm)!
Another great resource is the [GitHub repository](https://github.com/duckdb/duckdb-wasm).
For details, see the full [DuckDB-Wasm API Documentation](https://shell.duckdb.org/docs/modules/index.html).
#### Limitations {#docs:stable:clients:wasm:overview::limitations}
* By default, the WebAssembly client only uses a single thread.
* The WebAssembly client has a limited amount of memory available. [WebAssembly limits the amount of available memory to 4 GB](https://v8.dev/blog/4gb-wasm-memory) and browsers may impose even stricter limits.
### Deploying DuckDB-Wasm {#docs:stable:clients:wasm:deploying_duckdb_wasm}
A DuckDB-Wasm deployment needs to access the following components:
* the DuckDB-Wasm main library component, distributed as TypeScript and compiled to JavaScript code
* the DuckDB-Wasm Worker component, compiled to JavaScript code, possibly instantiated multiple times for threaded environments
* the DuckDB-Wasm module, compiled as a WebAssembly file and instantiated by the browser
* any relevant DuckDB-Wasm extension
#### Main Library Component {#docs:stable:clients:wasm:deploying_duckdb_wasm::main-library-component}
This is distributed as either TypeScript code or CommonJS JavaScript code in the `duckdb-wasm` npm package. It can be bundled together with a given application, served from a same-origin (sub-)domain and included at runtime, or served from a third-party CDN such as jsDelivr.
This component does need some form of transpilation and can't be served as-is, since it needs to know the location of the follow-up files in order to be functional.
Details will depend on your setup; examples can be found at .
An example deployment could be , which transpiles the main library component together with the shell code (first approach), or the `bare-browser` example at .
#### JS Worker Component {#docs:stable:clients:wasm:deploying_duckdb_wasm::js-worker-component}
This is distributed as a JavaScript file in 3 different flavors, `mvp`, `eh`, and `threads`. It needs to be served as-is, and the main library component needs to be informed of its actual location.
There are 3 variants for 3 different `platforms`:
* `mvp` targets the WebAssembly 1.0 spec
* `eh` targets the WebAssembly 1.0 spec WITH Wasm-level exception handling added, which improves performance
* `threads` targets the WebAssembly spec WITH exception handling and threading constructs
You could serve all 3 and feature-detect, or serve a single variant and instruct the duckdb-wasm library which one to use.
#### Wasm Worker Component {#docs:stable:clients:wasm:deploying_duckdb_wasm::wasm-worker-component}
Same as the JS Worker component: there are 3 different flavors, `mvp`, `eh`, and `threads`, each one needed by the corresponding JS component. These WebAssembly modules need to be served as-is at an arbitrary [sub-]domain that is reachable from the main one.
#### DuckDB Extensions {#docs:stable:clients:wasm:deploying_duckdb_wasm::duckdb-extensions}
DuckDB extensions for DuckDB-Wasm, similar to the native case, are served signed at the default extension endpoint: `https://extensions.duckdb.org`.
If you are deploying duckdb-wasm, you can consider mirroring the relevant extensions at a different endpoint, possibly allowing for air-tight deployments on internal networks.
```sql
SET custom_extension_repository = '⟨https://some.endpoint.org/path/to/repository⟩';
```
This changes the default extension repository from the public `https://extensions.duckdb.org` to the one specified. Note that extensions are still signed, so the best path is downloading and serving the extensions with a structure similar to the original repository. See additional notes at .
Community extensions are served at , and they are signed with a different key, so they can be disabled with a one-way SQL statement such as:
```sql
SET allow_community_extensions = false;
```
This will allow loading **only** core DuckDB extensions. Note that the failure happens at `LOAD` time, not at `INSTALL` time.
Please review for general information about extensions.
#### Security Considerations {#docs:stable:clients:wasm:deploying_duckdb_wasm::security-considerations}
> **Warning.** Deploying DuckDB-Wasm with access to your own data means whoever has access to SQL can access the data that DuckDB-Wasm can access. Also, DuckDB-Wasm in its default settings can access remote endpoints, so it can have visible effects on the external world even from within the sandbox.
### Instantiation {#docs:stable:clients:wasm:instantiation}
DuckDB-Wasm has multiple ways to be instantiated depending on the use case.
#### `cdn(jsdelivr)` {#docs:stable:clients:wasm:instantiation::cdnjsdelivr}
```ts
import * as duckdb from '@duckdb/duckdb-wasm';
const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
const worker_url = URL.createObjectURL(
new Blob([`importScripts("${bundle.mainWorker}");`], {type: 'text/javascript'})
);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(worker_url);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
URL.revokeObjectURL(worker_url);
```
#### `webpack` {#docs:stable:clients:wasm:instantiation::webpack}
```ts
import * as duckdb from '@duckdb/duckdb-wasm';
import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm';
import duckdb_wasm_next from '@duckdb/duckdb-wasm/dist/duckdb-eh.wasm';
const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
mvp: {
mainModule: duckdb_wasm,
mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js', import.meta.url).toString(),
},
eh: {
mainModule: duckdb_wasm_next,
mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js', import.meta.url).toString(),
},
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
```
#### `vite` {#docs:stable:clients:wasm:instantiation::vite}
```ts
import * as duckdb from '@duckdb/duckdb-wasm';
import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm?url';
import mvp_worker from '@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js?url';
import duckdb_wasm_eh from '@duckdb/duckdb-wasm/dist/duckdb-eh.wasm?url';
import eh_worker from '@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js?url';
const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
mvp: {
mainModule: duckdb_wasm,
mainWorker: mvp_worker,
},
eh: {
mainModule: duckdb_wasm_eh,
mainWorker: eh_worker,
},
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
```
#### Statically Served {#docs:stable:clients:wasm:instantiation::statically-served}
It is possible to manually download the files from .
```ts
import * as duckdb from '@duckdb/duckdb-wasm';
const MANUAL_BUNDLES: duckdb.DuckDBBundles = {
mvp: {
mainModule: 'change/me/../duckdb-mvp.wasm',
mainWorker: 'change/me/../duckdb-browser-mvp.worker.js',
},
eh: {
mainModule: 'change/m/../duckdb-eh.wasm',
mainWorker: 'change/m/../duckdb-browser-eh.worker.js',
},
};
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
```
### Data Ingestion {#docs:stable:clients:wasm:data_ingestion}
DuckDB-Wasm has multiple ways to import data, depending on the format of the data.
There are two steps to import data into DuckDB.
First, the data file is imported into a local file system using register functions ([registerEmptyFileBuffer](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#registerEmptyFileBuffer), [registerFileBuffer](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#registerFileBuffer), [registerFileHandle](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#registerFileHandle), [registerFileText](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#registerFileText), [registerFileURL](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#registerFileURL)).
Then, the data file is imported into DuckDB using insert functions ([insertArrowFromIPCStream](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#insertArrowFromIPCStream), [insertArrowTable](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#insertArrowTable), [insertCSVFromPath](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#insertCSVFromPath), [insertJSONFromPath](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#insertJSONFromPath)) or directly via a `FROM` clause in a SQL query (using extensions like Parquet or [Wasm-flavored httpfs](#::httpfs-wasm-flavored)).
[Insert statements](#docs:stable:data:insert) can also be used to import data.
#### Data Import {#docs:stable:clients:wasm:data_ingestion::data-import}
##### Open & Close Connection {#docs:stable:clients:wasm:data_ingestion::open--close-connection}
```ts
// Create a new connection
const c = await db.connect();
// ... import data
// Close the connection to release memory
await c.close();
```
##### Apache Arrow {#docs:stable:clients:wasm:data_ingestion::apache-arrow}
```ts
// Data can be inserted from an existing arrow.Table
// More examples: https://arrow.apache.org/docs/js/
import { tableFromArrays } from 'apache-arrow';
// EOS signal according to Arrow IPC streaming format
// See https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
const EOS = new Uint8Array([255, 255, 255, 255, 0, 0, 0, 0]);
const arrowTable = tableFromArrays({
id: [1, 2, 3],
name: ['John', 'Jane', 'Jack'],
age: [20, 21, 22],
});
await c.insertArrowTable(arrowTable, { name: 'arrow_table' });
// Write EOS
await c.insertArrowTable(EOS, { name: 'arrow_table' });
// ..., from a raw Arrow IPC stream
const streamResponse = await fetch(`someapi`);
const streamReader = streamResponse.body.getReader();
const streamInserts = [];
while (true) {
const { value, done } = await streamReader.read();
if (done) break;
streamInserts.push(c.insertArrowFromIPCStream(value, { name: 'streamed' }));
}
// Write EOS
streamInserts.push(c.insertArrowFromIPCStream(EOS, { name: 'streamed' }));
await Promise.all(streamInserts);
```
##### CSV {#docs:stable:clients:wasm:data_ingestion::csv}
```ts
// ..., from CSV files
// (interchangeable: registerFile{Text,Buffer,URL,Handle})
// arrow is used below for the column types
import * as arrow from 'apache-arrow';
const csvContent = '1|foo\n2|bar\n';
await db.registerFileText(`data.csv`, csvContent);
// ... with typed insert options
await c.insertCSVFromPath('data.csv', {
schema: 'main',
name: 'foo',
detect: false,
header: false,
delimiter: '|',
columns: {
col1: new arrow.Int32(),
col2: new arrow.Utf8(),
},
});
```
##### JSON {#docs:stable:clients:wasm:data_ingestion::json}
```ts
// ..., from JSON documents in row-major format
const jsonRowContent = [
{ "col1": 1, "col2": "foo" },
{ "col1": 2, "col2": "bar" },
];
await db.registerFileText(
'rows.json',
JSON.stringify(jsonRowContent),
);
await c.insertJSONFromPath('rows.json', { name: 'rows' });
// ... or column-major format
const jsonColContent = {
"col1": [1, 2],
"col2": ["foo", "bar"]
};
await db.registerFileText(
'columns.json',
JSON.stringify(jsonColContent),
);
await c.insertJSONFromPath('columns.json', { name: 'columns' });
// From API
const streamResponse = await fetch(`someapi/content.json`);
await db.registerFileBuffer('file.json', new Uint8Array(await streamResponse.arrayBuffer()));
await c.insertJSONFromPath('file.json', { name: 'JSONContent' });
```
##### Parquet {#docs:stable:clients:wasm:data_ingestion::parquet}
```ts
// from Parquet files
// (DuckDBDataProtocol is exported by @duckdb/duckdb-wasm)
import { DuckDBDataProtocol } from '@duckdb/duckdb-wasm';
// ...Local
const pickedFile: File = letUserPickFile();
await db.registerFileHandle('local.parquet', pickedFile, DuckDBDataProtocol.BROWSER_FILEREADER, true);
// ...Remote
await db.registerFileURL('remote.parquet', 'https://origin/remote.parquet', DuckDBDataProtocol.HTTP, false);
// ... Using Fetch
const res = await fetch('https://origin/remote.parquet');
await db.registerFileBuffer('buffer.parquet', new Uint8Array(await res.arrayBuffer()));
// ..., by specifying URLs in the SQL text
await c.query(`
CREATE TABLE direct AS
SELECT * FROM 'https://origin/remote.parquet'
`);
// ..., or by executing raw insert statements
await c.query(`
INSERT INTO existing_table
VALUES (1, 'foo'), (2, 'bar')`);
```
##### httpfs (Wasm-Flavored) {#docs:stable:clients:wasm:data_ingestion::httpfs-wasm-flavored}
```ts
// ..., by specifying URLs in the SQL text
await c.query(`
CREATE TABLE direct AS
SELECT * FROM 'https://origin/remote.parquet'
`);
```
> **Tip.** If you encounter a Network Error (`Failed to execute 'send' on 'XMLHttpRequest'`) when you try to query files from S3, configure the CORS headers in your S3 bucket's permissions. For example:
```json
[
{
"AllowedHeaders": [
"*"
],
"AllowedMethods": [
"GET",
"HEAD"
],
"AllowedOrigins": [
"*"
],
"ExposeHeaders": [],
"MaxAgeSeconds": 3000
}
]
```
##### Insert Statement {#docs:stable:clients:wasm:data_ingestion::insert-statement}
```ts
// ..., or by executing raw insert statements
await c.query(`
INSERT INTO existing_table
VALUES (1, 'foo'), (2, 'bar')`);
```
### Query {#docs:stable:clients:wasm:query}
DuckDB-Wasm provides functions for querying data. Queries are run sequentially.
First, a connection needs to be created by calling [connect](https://shell.duckdb.org/docs/classes/index.AsyncDuckDB.html#connect). Then, queries can be run by calling [query](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#query) or [send](https://shell.duckdb.org/docs/classes/index.AsyncDuckDBConnection.html#send).
#### Query Execution {#docs:stable:clients:wasm:query::query-execution}
```ts
// Create a new connection
const conn = await db.connect();
// Either materialize the query result
await conn.query<{ v: arrow.Int }>(`
SELECT * FROM generate_series(1, 100) t(v)
`);
// ..., or fetch the result chunks lazily
for await (const batch of await conn.send<{ v: arrow.Int }>(`
SELECT * FROM generate_series(1, 100) t(v)
`)) {
// ...
}
// Close the connection to release memory
await conn.close();
```
#### Prepared Statements {#docs:stable:clients:wasm:query::prepared-statements}
```ts
// Create a new connection
const conn = await db.connect();
// Prepare query
const stmt = await conn.prepare(`SELECT v + ? FROM generate_series(0, 10_000) t(v);`);
// ... and run the query with materialized results
await stmt.query(234);
// ... or result chunks
for await (const batch of await stmt.send(234)) {
// ...
}
// Close the statement to release memory
await stmt.close();
// Closing the connection will release statements as well
await conn.close();
```
#### Arrow Table to JSON {#docs:stable:clients:wasm:query::arrow-table-to-json}
```ts
// Create a new connection
const conn = await db.connect();
// Query
const arrowResult = await conn.query<{ v: arrow.Int }>(`
SELECT * FROM generate_series(1, 100) t(v)
`);
// Convert arrow table to json
const result = arrowResult.toArray().map((row) => row.toJSON());
// Close the connection to release memory
await conn.close();
```
#### Export Parquet {#docs:stable:clients:wasm:query::export-parquet}
```ts
// Create a new connection
const conn = await db.connect();
// Export Parquet
// query() is awaited so the COPY completes before the file is read back
await conn.query(`COPY (SELECT * FROM tbl) TO 'result-snappy.parquet' (FORMAT parquet);`);
const parquet_buffer = await db.copyFileToBuffer('result-snappy.parquet');
// Generate a download link
const link = URL.createObjectURL(new Blob([parquet_buffer]));
// Close the connection to release memory
await conn.close();
```
### Extensions {#docs:stable:clients:wasm:extensions}
DuckDB-Wasm's (dynamic) extension loading is modeled after regular DuckDB's extension loading, with a few relevant differences due to the different platform.
#### Format {#docs:stable:clients:wasm:extensions::format}
Extensions in DuckDB are binaries to be dynamically loaded via `dlopen`. A cryptographic signature is appended to the binary.
An extension in DuckDB-Wasm is a regular Wasm file to be dynamically loaded via Emscripten's `dlopen`. A cryptographic signature is appended to the Wasm file as a WebAssembly custom section called `duckdb_signature`.
This ensures the file remains a valid WebAssembly file.
> Currently, we require this custom section to be the last one, but this could potentially be relaxed in the future.
#### `INSTALL` and `LOAD` {#docs:stable:clients:wasm:extensions::install-and-load}
In native embeddings of DuckDB, `INSTALL` fetches the extension, decompresses it from `gzip`, and stores it on local disk.
In native embeddings of DuckDB, `LOAD` (optionally) performs signature checks *and* dynamically loads the binary alongside the main DuckDB binary.
In DuckDB-Wasm, `INSTALL` is a no-op, given there is no durable cross-session storage. The `LOAD` operation fetches (and decompresses on the fly), performs signature checks, *and* dynamically loads the extension via the Emscripten implementation of `dlopen`.
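For illustration, here is a minimal sketch using the `icu` extension from the list below; the SQL is the same as in native DuckDB, only the underlying behavior differs:
```sql
INSTALL icu; -- no-op in DuckDB-Wasm: there is no durable cross-session storage
LOAD icu;    -- fetches the Wasm build of the extension, checks its signature, and loads it
```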
#### Autoloading {#docs:stable:clients:wasm:extensions::autoloading}
[Autoloading](#docs:stable:extensions:overview), i.e., the possibility for DuckDB to add extension functionality on-the-fly, is enabled by default in DuckDB-Wasm.
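As a small sketch (assuming a file `rows.json` has been registered as in the data ingestion examples above), simply using a JSON function is enough for the `json` extension to be loaded on the fly:
```sql
-- no explicit INSTALL/LOAD needed: the json extension is autoloaded on first use
SELECT * FROM read_json_auto('rows.json');
```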
#### List of Officially Available Extensions {#docs:stable:clients:wasm:extensions::list-of-officially-available-extensions}
| Extension name | Description | Aliases |
| ----------------------------------------------------------------------- | ---------------------------------------------------------------- | --------------- |
| [autocomplete](#docs:stable:core_extensions:autocomplete) | Adds support for autocomplete in the shell | |
| [excel](#docs:stable:core_extensions:excel) | Adds support for Excel-like format strings | |
| [fts](#docs:stable:core_extensions:full_text_search) | Adds support for Full-Text Search Indexes | |
| [icu](#docs:stable:core_extensions:icu) | Adds support for time zones and collations using the ICU library | |
| [inet](#docs:stable:core_extensions:inet) | Adds support for IP-related data types and functions | |
| [json](#docs:stable:data:json:overview) | Adds support for JSON operations | |
| [parquet](#docs:stable:data:parquet:overview) | Adds support for reading and writing Parquet files | |
| [sqlite](#docs:stable:core_extensions:sqlite) | Adds support for reading SQLite database files | sqlite, sqlite3 |
| [sqlsmith](#docs:stable:core_extensions:sqlsmith) | | |
| [tpcds](#docs:stable:core_extensions:tpcds) | Adds TPC-DS data generation and query support | |
| [tpch](#docs:stable:core_extensions:tpch) | Adds TPC-H data generation and query support | |
WebAssembly is essentially an additional platform, and platform-specific limitations may prevent some extensions from matching their native capabilities or may require them to behave differently. We document the relevant differences for DuckDB-hosted extensions here.
##### HTTPFS {#docs:stable:clients:wasm:extensions::httpfs}
The HTTPFS extension is, at the moment, not available in DuckDB-Wasm. HTTP(S) requests need to go through an additional layer, the browser, which adds both differences and some restrictions compared to what is possible natively.
Instead, DuckDB-Wasm has a separate implementation that for most purposes is interchangeable, but does not support all use cases (as it must follow security rules imposed by the browser, such as CORS).
Due to this CORS restriction, any requests for data made using the HTTPFS extension must be to websites that allow (using CORS headers) the website hosting the DuckDB-Wasm instance to access that data.
The [MDN website](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS) is a great resource for more information regarding CORS.
#### Extension Signing {#docs:stable:clients:wasm:extensions::extension-signing}
As with regular DuckDB extensions, DuckDB-Wasm extensions are by default checked on `LOAD` to verify that the signature confirms the extension has not been tampered with.
Extension signature verification can be disabled via a configuration option.
Signing is a property of the binary itself, so copying a DuckDB extension (say to serve it from a different location) will still keep a valid signature (e.g., for local development).
#### Fetching DuckDB-Wasm Extensions {#docs:stable:clients:wasm:extensions::fetching-duckdb-wasm-extensions}
Official DuckDB extensions are served at `extensions.duckdb.org`, and this is also the default value for the `default_extension_repository` option.
When installing extensions, a relevant URL will be built that will look like `extensions.duckdb.org/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.gz`.
DuckDB-Wasm extensions are fetched only on load, and the URL will look like: `extensions.duckdb.org/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm`.
Note that an additional `duckdb-wasm` is added to the folder structure, and the file is served as a `.wasm` file.
DuckDB-Wasm extensions are served pre-compressed using Brotli compression. When fetched from a browser, extensions are transparently decompressed. If you want to fetch a `duckdb-wasm` extension manually, you can use `curl --compressed extensions.duckdb.org/<...>/icu.duckdb_extension.wasm`.
#### Serving Extensions from a Third-Party Repository {#docs:stable:clients:wasm:extensions::serving-extensions-from-a-third-party-repository}
As with regular DuckDB, if you use `SET custom_extension_repository = some.url.com`, subsequent loads will be attempted at `some.url.com/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/$name.duckdb_extension.wasm`.
Note that GET requests on the extensions need to be [CORS enabled](https://www.w3.org/wiki/CORS_Enabled) for a browser to allow the connection.
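For example (the repository URL below is a hypothetical placeholder), switching repositories before loading an extension might look like:
```sql
SET custom_extension_repository = 'https://some.url.com';
-- subsequent loads are attempted at:
-- some.url.com/duckdb-wasm/$duckdb_version_hash/$duckdb_platform/icu.duckdb_extension.wasm
LOAD icu;
```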
#### Tooling {#docs:stable:clients:wasm:extensions::tooling}
Both DuckDB-Wasm and its extensions have been compiled using the latest packaged Emscripten toolchain.
# SQL {#sql}
## SQL Introduction {#docs:stable:sql:introduction}
Here we provide an overview of how to perform simple operations in SQL.
This tutorial is only intended to give you an introduction and is in no way a complete tutorial on SQL.
This tutorial is adapted from the [PostgreSQL tutorial](https://www.postgresql.org/docs/current/tutorial-sql-intro.html).
> DuckDB's SQL dialect closely follows the conventions of the PostgreSQL dialect.
> The few exceptions to this are listed on the [PostgreSQL compatibility page](#docs:stable:sql:dialect:postgresql_compatibility).
In the examples that follow, we assume that you have installed the DuckDB Command Line Interface (CLI) shell. See the [installation page](https://duckdb.org/install) for information on how to install the CLI.
#### Concepts {#docs:stable:sql:introduction::concepts}
DuckDB is a relational database management system (RDBMS). That means it is a system for managing data stored in relations. A relation is essentially a mathematical term for a table.
Each table is a named collection of rows. Each row of a given table has the same set of named columns, and each column is of a specific data type. Tables themselves are stored inside schemas, and a collection of schemas constitutes the entire database that you can access.
#### Creating a New Table {#docs:stable:sql:introduction::creating-a-new-table}
You can create a new table by specifying the table name, along with all column names and their types:
```sql
CREATE TABLE weather (
city VARCHAR,
temp_lo INTEGER, -- minimum temperature on a day
temp_hi INTEGER, -- maximum temperature on a day
prcp FLOAT,
date DATE
);
```
You can enter this into the shell with the line breaks; the command is not terminated until a semicolon is entered.
White space (i.e., spaces, tabs, and newlines) can be used freely in SQL commands. That means you can type the command aligned differently than above, or even all on one line. Two dash characters (`--`) introduce comments. Whatever follows them is ignored up to the end of the line. SQL is case-insensitive about keywords and identifiers. When returning identifiers, [their original cases are preserved](#docs:stable:sql:dialect:keywords_and_identifiers::rules-for-case-sensitivity).
In the SQL command, we first specify the type of command that we want to perform: `CREATE TABLE`. After that follows the parameters for the command. First, the table name, `weather`, is given. Then the column names and column types follow.
`city VARCHAR` specifies that the table has a column called `city` that is of type `VARCHAR`. `VARCHAR` specifies a data type that can store text of arbitrary length. The temperature fields are stored in an `INTEGER` type, a type that stores integer numbers (i.e., whole numbers without a decimal point). `FLOAT` columns store single precision floating-point numbers (i.e., numbers with a decimal point). `DATE` stores a date (i.e., year, month, day combination). `DATE` only stores the specific day, not a time associated with that day.
DuckDB supports the standard SQL types `INTEGER`, `SMALLINT`, `FLOAT`, `DOUBLE`, `DECIMAL`, `CHAR(n)`, `VARCHAR(n)`, `DATE`, `TIME` and `TIMESTAMP`.
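As a brief illustration of the case rules mentioned above (the table is still empty at this point, so no rows are returned):
```sql
-- keywords and identifiers are case-insensitive:
SELECT City, TEMP_LO
FROM Weather; -- equivalent to: SELECT city, temp_lo FROM weather;
```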
The second example will store cities and their associated geographical location:
```sql
CREATE TABLE cities (
name VARCHAR,
lat DECIMAL,
lon DECIMAL
);
```
Finally, it should be mentioned that if you don't need a table any longer or want to recreate it differently you can remove it using the following command:
```sql
DROP TABLE ⟨table_name⟩;
```
#### Populating a Table with Rows {#docs:stable:sql:introduction::populating-a-table-with-rows}
The insert statement is used to populate a table with rows:
```sql
INSERT INTO weather
VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27');
```
Constants that are not numeric values (e.g., text and dates) must be surrounded by single quotes (`''`), as in the example. Input dates for the date type must be formatted as `'YYYY-MM-DD'`.
We can insert into the `cities` table in the same manner.
```sql
INSERT INTO cities
VALUES ('San Francisco', -194.0, 53.0);
```
The syntax used so far requires you to remember the order of the columns. An alternative syntax allows you to list the columns explicitly:
```sql
INSERT INTO weather (city, temp_lo, temp_hi, prcp, date)
VALUES ('San Francisco', 43, 57, 0.0, '1994-11-29');
```
You can list the columns in a different order if you wish or even omit some columns, e.g., if the `prcp` is unknown:
```sql
INSERT INTO weather (date, city, temp_hi, temp_lo)
VALUES ('1994-11-29', 'Hayward', 54, 37);
```
> **Tip.** Many developers consider explicitly listing the columns better style than relying on the order implicitly.
Please enter all the commands shown above so you have some data to work with in the following sections.
Alternatively, you can use the `COPY` statement. This is faster for large amounts of data because the `COPY` command is optimized for bulk loading while allowing less flexibility than `INSERT`. An example with [`weather.csv`](https://duckdb.org/data/weather.csv) would be:
```sql
COPY weather
FROM 'weather.csv';
```
The file name for the source file must refer to a file that is available on the machine running the process. There are many other ways of loading data into DuckDB; see the [corresponding documentation section](#docs:stable:data:overview) for more information.
#### Querying a Table {#docs:stable:sql:introduction::querying-a-table}
To retrieve data from a table, the table is queried. A SQL `SELECT` statement is used to do this. The statement is divided into a select list (the part that lists the columns to be returned), a table list (the part that lists the tables from which to retrieve the data), and an optional qualification (the part that specifies any restrictions). For example, to retrieve all the rows of table weather, type:
```sql
SELECT *
FROM weather;
```
Here `*` is a shorthand for "all columns". So the same result would be had with:
```sql
SELECT city, temp_lo, temp_hi, prcp, date
FROM weather;
```
The output should be:
| city | temp_lo | temp_hi | prcp | date |
|---------------|--------:|--------:|-----:|------------|
| San Francisco | 46 | 50 | 0.25 | 1994-11-27 |
| San Francisco | 43 | 57 | 0.0 | 1994-11-29 |
| Hayward | 37 | 54 | NULL | 1994-11-29 |
You can write expressions, not just simple column references, in the select list. For example, you can do:
```sql
SELECT city, (temp_hi + temp_lo) / 2 AS temp_avg, date
FROM weather;
```
This should give:
| city | temp_avg | date |
|---------------|---------:|------------|
| San Francisco | 48.0 | 1994-11-27 |
| San Francisco | 50.0 | 1994-11-29 |
| Hayward | 45.5 | 1994-11-29 |
Notice how the `AS` clause is used to relabel the output column. (The `AS` clause is optional.)
A query can be "qualified" by adding a `WHERE` clause that specifies which rows are wanted. The `WHERE` clause contains a Boolean (truth value) expression, and only rows for which the Boolean expression is true are returned. The usual Boolean operators (`AND`, `OR`, and `NOT`) are allowed in the qualification. For example, the following retrieves the weather of San Francisco on rainy days:
```sql
SELECT *
FROM weather
WHERE city = 'San Francisco'
AND prcp > 0.0;
```
Result:
| city | temp_lo | temp_hi | prcp | date |
|---------------|--------:|--------:|-----:|------------|
| San Francisco | 46 | 50 | 0.25 | 1994-11-27 |
You can request that the results of a query be returned in sorted order:
```sql
SELECT *
FROM weather
ORDER BY city;
```
| city | temp_lo | temp_hi | prcp | date |
|---------------|--------:|--------:|-----:|------------|
| Hayward | 37 | 54 | NULL | 1994-11-29 |
| San Francisco | 43 | 57 | 0.0 | 1994-11-29 |
| San Francisco | 46 | 50 | 0.25 | 1994-11-27 |
In this example, the sort order isn't fully specified, and so you might get the San Francisco rows in either order. But you'd always get the results shown above if you do:
```sql
SELECT *
FROM weather
ORDER BY city, temp_lo;
```
You can request that duplicate rows be removed from the result of a query:
```sql
SELECT DISTINCT city
FROM weather;
```
| city |
|---------------|
| San Francisco |
| Hayward |
Here again, the result row ordering might vary. You can ensure consistent results by using `DISTINCT` and `ORDER BY` together:
```sql
SELECT DISTINCT city
FROM weather
ORDER BY city;
```
#### Joins between Tables {#docs:stable:sql:introduction::joins-between-tables}
Thus far, our queries have only accessed one table at a time. Queries can access multiple tables at once, or access the same table in such a way that multiple rows of the table are being processed at the same time. A query that accesses multiple rows of the same or different tables at one time is called a join query. As an example, say you wish to list all the weather records together with the location of the associated city. To do that, we need to compare the city column of each row of the `weather` table with the name column of all rows in the `cities` table, and select the pairs of rows where these values match.
This would be accomplished by the following query:
```sql
SELECT *
FROM weather, cities
WHERE city = name;
```
| city | temp_lo | temp_hi | prcp | date | name | lat | lon |
|---------------|--------:|--------:|-----:|------------|---------------|---------:|-------:|
| San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | -194.000 | 53.000 |
| San Francisco | 43 | 57 | 0.0 | 1994-11-29 | San Francisco | -194.000 | 53.000 |
Observe two things about the result set:
* There is no result row for the city of Hayward. This is because there is no matching entry in the `cities` table for Hayward, so the join ignores the unmatched rows in the `weather` table. We will see shortly how this can be fixed.
* There are two columns containing the city name. This is correct because the lists of columns from the `weather` and `cities` tables are concatenated. In practice this is undesirable, though, so you will probably want to list the output columns explicitly rather than using `*`:
```sql
SELECT city, temp_lo, temp_hi, prcp, date, lon, lat
FROM weather, cities
WHERE city = name;
```
| city | temp_lo | temp_hi | prcp | date | lon | lat |
|---------------|--------:|--------:|-----:|------------|-------:|---------:|
| San Francisco | 46 | 50 | 0.25 | 1994-11-27 | 53.000 | -194.000 |
| San Francisco | 43 | 57 | 0.0 | 1994-11-29 | 53.000 | -194.000 |
Since the columns all had different names, the parser automatically found which table they belong to. If there were duplicate column names in the two tables you'd need to qualify the column names to show which one you meant, as in:
```sql
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.lon, cities.lat
FROM weather, cities
WHERE cities.name = weather.city;
```
It is widely considered good style to qualify all column names in a join query, so that the query won't fail if a duplicate column name is later added to one of the tables.
Join queries of the kind seen thus far can also be written in this alternative form:
```sql
SELECT *
FROM weather
INNER JOIN cities ON weather.city = cities.name;
```
This syntax is not as commonly used as the one above, but we show it here to help you understand the following topics.
Now we will figure out how we can get the Hayward records back in. What we want the query to do is to scan the `weather` table and for each row to find the matching `cities` row(s). If no matching row is found we want some "empty values" to be substituted for the `cities` table's columns. This kind of query is called an outer join. (The joins we have seen so far are inner joins.) The command looks like this:
```sql
SELECT *
FROM weather
LEFT OUTER JOIN cities ON weather.city = cities.name;
```
| city | temp_lo | temp_hi | prcp | date | name | lat | lon |
|---------------|--------:|--------:|-----:|------------|---------------|---------:|-------:|
| San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | -194.000 | 53.000 |
| San Francisco | 43 | 57 | 0.0 | 1994-11-29 | San Francisco | -194.000 | 53.000 |
| Hayward | 37 | 54 | NULL | 1994-11-29 | NULL | NULL | NULL |
This query is called a left outer join because the table mentioned on the left of the join operator will have each of its rows in the output at least once, whereas the table on the right will only have those rows output that match some row of the left table. When outputting a left-table row for which there is no right-table match, empty (null) values are substituted for the right-table columns.
#### Aggregate Functions {#docs:stable:sql:introduction::aggregate-functions}
Like most other relational database products, DuckDB supports aggregate functions. An aggregate function computes a single result from multiple input rows. For example, there are aggregates to compute the `count`, `sum`, `avg` (average), `max` (maximum) and `min` (minimum) over a set of rows.
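For instance (a quick illustration over the `weather` table populated above), several aggregates can be combined in a single query:
```sql
SELECT count(*), avg(temp_lo), min(prcp)
FROM weather;
```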
As an example, we can find the highest low-temperature reading anywhere with:
```sql
SELECT max(temp_lo)
FROM weather;
```
| max(temp_lo) |
|-------------:|
| 46 |
If we wanted to know what city (or cities) that reading occurred in, we might try:
```sql
SELECT city
FROM weather
WHERE temp_lo = max(temp_lo);
```
But this will not work since the aggregate max cannot be used in the `WHERE` clause:
```console
Binder Error:
WHERE clause cannot contain aggregates!
```
This restriction exists because the `WHERE` clause determines which rows will be included in the aggregate calculation; so obviously it has to be evaluated before aggregate functions are computed.
However, as is often the case the query can be restated to accomplish the desired result, here by using a subquery:
```sql
SELECT city
FROM weather
WHERE temp_lo = (SELECT max(temp_lo) FROM weather);
```
| city |
|---------------|
| San Francisco |
This is OK because the subquery is an independent computation that computes its own aggregate separately from what is happening in the outer query.
Aggregates are also very useful in combination with `GROUP BY` clauses. For example, we can get the maximum low temperature observed in each city with:
```sql
SELECT city, max(temp_lo)
FROM weather
GROUP BY city;
```
| city | max(temp_lo) |
|---------------|--------------|
| San Francisco | 46 |
| Hayward | 37 |
This gives us one output row per city. Each aggregate result is computed over the table rows matching that city. We can filter these grouped rows using `HAVING`:
```sql
SELECT city, max(temp_lo)
FROM weather
GROUP BY city
HAVING max(temp_lo) < 40;
```
| city | max(temp_lo) |
|---------|-------------:|
| Hayward | 37 |
This gives us the same results, but only for the cities that have all `temp_lo` values below 40. Finally, if we only care about cities whose names begin with `S`, we can use the `LIKE` operator:
```sql
SELECT city, max(temp_lo)
FROM weather
WHERE city LIKE 'S%' -- (1)
GROUP BY city
HAVING max(temp_lo) < 40;
```
More information about the `LIKE` operator can be found in the [pattern matching page](#docs:stable:sql:functions:pattern_matching).
It is important to understand the interaction between aggregates and SQL's `WHERE` and `HAVING` clauses. The fundamental difference between `WHERE` and `HAVING` is this: `WHERE` selects input rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate computation), whereas `HAVING` selects group rows after groups and aggregates are computed. Thus, the `WHERE` clause must not contain aggregate functions; it makes no sense to try to use an aggregate to determine which rows will be inputs to the aggregates. On the other hand, the `HAVING` clause always contains aggregate functions.
In the previous example, we can apply the city name restriction in `WHERE`, since it needs no aggregate. This is more efficient than adding the restriction to `HAVING`, because we avoid doing the grouping and aggregate calculations for all rows that fail the `WHERE` check.
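For comparison, the same filter can be written in the `HAVING` clause (this is allowed here because `city` is a grouping column), but then every city is grouped and aggregated before the name filter is applied:
```sql
SELECT city, max(temp_lo)
FROM weather
GROUP BY city
HAVING max(temp_lo) < 40 AND city LIKE 'S%';
```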
#### Updates {#docs:stable:sql:introduction::updates}
You can update existing rows using the `UPDATE` command. Suppose you discover the temperature readings are all off by 2 degrees after November 28. You can correct the data as follows:
```sql
UPDATE weather
SET temp_hi = temp_hi - 2, temp_lo = temp_lo - 2
WHERE date > '1994-11-28';
```
Look at the new state of the data:
```sql
SELECT *
FROM weather;
```
| city | temp_lo | temp_hi | prcp | date |
|---------------|--------:|--------:|-----:|------------|
| San Francisco | 46 | 50 | 0.25 | 1994-11-27 |
| San Francisco | 41 | 55 | 0.0 | 1994-11-29 |
| Hayward | 35 | 52 | NULL | 1994-11-29 |
#### Deletions {#docs:stable:sql:introduction::deletions}
Rows can be removed from a table using the `DELETE` command. Suppose you are no longer interested in the weather of Hayward. Then you can do the following to delete those rows from the table:
```sql
DELETE FROM weather
WHERE city = 'Hayward';
```
All weather records belonging to Hayward are removed.
```sql
SELECT *
FROM weather;
```
| city | temp_lo | temp_hi | prcp | date |
|---------------|--------:|--------:|-----:|------------|
| San Francisco | 46 | 50 | 0.25 | 1994-11-27 |
| San Francisco | 41 | 55 | 0.0 | 1994-11-29 |
One should be cautious when issuing statements of the following form:
```sql
DELETE FROM ⟨table_name⟩;
```
> **Warning.** Without a qualification, `DELETE` will remove all rows from the given table, leaving it empty. The system will not request confirmation before doing this.
## Statements {#sql:statements}
### Statements Overview {#docs:stable:sql:statements:overview}
### ANALYZE Statement {#docs:stable:sql:statements:analyze}
The `ANALYZE` statement recomputes the statistics on DuckDB's tables.
#### Usage {#docs:stable:sql:statements:analyze::usage}
The statistics recomputed by the `ANALYZE` statement are only used for [join order optimization](https://blobs.duckdb.org/papers/tom-ebergen-msc-thesis-join-order-optimization-with-almost-no-statistics.pdf). It is therefore recommended to recompute these statistics for improved join orders, especially after performing large updates (inserts and/or deletes).
To recompute the statistics, run:
```sql
ANALYZE;
```
### ALTER DATABASE Statement {#docs:stable:sql:statements:alter_database}
The `ALTER DATABASE` command modifies a DuckDB database. This command can be used to change a database's name without needing to detach and reattach it.
#### Syntax {#docs:stable:sql:statements:alter_database::syntax}
#### `RENAME DATABASE` {#docs:stable:sql:statements:alter_database::rename-database}
The following scenarios are not supported when renaming a DuckDB database with `ALTER DATABASE`:
* Renaming a database to certain reserved keywords such as `system` or `temp`.
* Renaming a database to the same name as a database that is already attached (including in-memory databases).
Rename a database from `old_name` to `new_name`:
```sql
ALTER DATABASE old_name RENAME TO new_name;
```
Rename a database only if it exists:
```sql
ALTER DATABASE IF EXISTS non_existent RENAME TO something_else;
```
### ALTER TABLE Statement {#docs:stable:sql:statements:alter_table}
The `ALTER TABLE` statement changes the schema of an existing table in the catalog.
#### Examples {#docs:stable:sql:statements:alter_table::examples}
```sql
CREATE TABLE integers (i INTEGER, j INTEGER);
```
Add a new column named `k` to the table `integers`; it will be filled with the default value `NULL`:
```sql
ALTER TABLE integers
ADD COLUMN k INTEGER;
```
Add a new column named `l` to the table `integers`; it will be filled with the default value 10:
```sql
ALTER TABLE integers
ADD COLUMN l INTEGER DEFAULT 10;
```
Drop the column `k` from the table `integers`:
```sql
ALTER TABLE integers
DROP k;
```
Change the type of the column `i` to the type `VARCHAR` using a standard cast:
```sql
ALTER TABLE integers
ALTER i TYPE VARCHAR;
```
Change the type of the column `i` to the type `VARCHAR`, using the specified expression to convert the data for each row:
```sql
ALTER TABLE integers
ALTER i SET DATA TYPE VARCHAR USING concat(i, '_', j);
```
Set the default value of a column:
```sql
ALTER TABLE integers
ALTER COLUMN i SET DEFAULT 10;
```
Drop the default value of a column:
```sql
ALTER TABLE integers
ALTER COLUMN i DROP DEFAULT;
```
Make a column not nullable:
```sql
ALTER TABLE integers
ALTER COLUMN i SET NOT NULL;
```
Drop the not-`NULL` constraint:
```sql
ALTER TABLE integers
ALTER COLUMN i DROP NOT NULL;
```
Rename a table:
```sql
ALTER TABLE integers
RENAME TO integers_old;
```
Rename a column of a table:
```sql
ALTER TABLE integers
RENAME i TO ii;
```
Add a primary key to a column of a table:
```sql
ALTER TABLE integers
ADD PRIMARY KEY (i);
```
#### Syntax {#docs:stable:sql:statements:alter_table::syntax}
`ALTER TABLE` changes the schema of an existing table.
All the changes made by `ALTER TABLE` fully respect the transactional semantics, i.e., they will not be visible to other transactions until committed, and can be fully reverted through a rollback.
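For instance (a small sketch), a schema change can be reverted by rolling back the enclosing transaction:
```sql
BEGIN TRANSACTION;
ALTER TABLE integers ADD COLUMN m INTEGER;
ROLLBACK;
-- the column m does not exist after the rollback
```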
#### `RENAME TABLE` {#docs:stable:sql:statements:alter_table::rename-table}
Rename a table:
```sql
ALTER TABLE integers
RENAME TO integers_old;
```
The `RENAME TO` clause renames an entire table, changing its name in the schema. Note that any views that rely on the table are **not** automatically updated.
#### `RENAME COLUMN` {#docs:stable:sql:statements:alter_table::rename-column}
To rename a column of a table, use the `RENAME` or `RENAME COLUMN` clauses:
```sql
ALTER TABLE integers
RENAME COLUMN i TO j;
```
```sql
ALTER TABLE integers
RENAME i TO j;
```
The `RENAME [COLUMN]` clause renames a single column within a table. Any constraints that rely on this name (e.g., `CHECK` constraints) are automatically updated. However, note that any views that rely on this column name are **not** automatically updated.
#### `ADD COLUMN` {#docs:stable:sql:statements:alter_table::add-column}
To add a column to a table, use the `ADD` or `ADD COLUMN` clause.
For example, to add a new column named `k` to the table `integers`, filled with the default value `NULL`:
```sql
ALTER TABLE integers
ADD COLUMN k INTEGER;
```
Or:
```sql
ALTER TABLE integers
ADD k INTEGER;
```
Add a new column named `l` to the table `integers`; it will be filled with the default value 10:
```sql
ALTER TABLE integers
ADD COLUMN l INTEGER DEFAULT 10;
```
The `ADD [COLUMN]` clause can be used to add a new column of a specified type to a table. The new column will be filled with the specified default value, or `NULL` if none is specified.
#### `DROP COLUMN` {#docs:stable:sql:statements:alter_table::drop-column}
To drop a column from a table, use the `DROP` or `DROP COLUMN` clause.
For example, to drop the column `k` from the table `integers`:
```sql
ALTER TABLE integers
DROP COLUMN k;
```
Or:
```sql
ALTER TABLE integers
DROP k;
```
The `DROP [COLUMN]` clause can be used to remove a column from a table. Note that columns can only be removed if they do not have any indexes that rely on them. This includes any indexes created as part of a `PRIMARY KEY` or `UNIQUE` constraint. Columns that are part of multi-column check constraints cannot be dropped either.
If you attempt to drop a column with an index on it, DuckDB will return the following error message:
```console
Dependency Error:
Cannot alter entry "..." because there are entries that depend on it.
```
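For instance (a minimal sketch), dropping a column that backs a `PRIMARY KEY` index fails with this error:
```sql
CREATE TABLE tbl_pk (i INTEGER PRIMARY KEY, j INTEGER);
ALTER TABLE tbl_pk
DROP COLUMN i; -- rejected: the primary key index depends on column i
```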
#### `[SET [DATA]] TYPE` {#docs:stable:sql:statements:alter_table::set-data-type}
Change the type of the column `i` to the type `VARCHAR` using a standard cast:
```sql
ALTER TABLE integers
ALTER i TYPE VARCHAR;
```
> Instead of
> `ALTER ⟨column_name⟩ TYPE ⟨type⟩`{:.language-sql .highlight}, you can also use the equivalent
> `ALTER ⟨column_name⟩ SET TYPE ⟨type⟩`{:.language-sql .highlight} and the
> `ALTER ⟨column_name⟩ SET DATA TYPE ⟨type⟩`{:.language-sql .highlight} clauses.
Change the type of the column `i` to the type `VARCHAR`, using the specified expression to convert the data for each row:
```sql
ALTER TABLE integers
ALTER i SET DATA TYPE VARCHAR USING concat(i, '_', j);
```
The `[SET [DATA]] TYPE` clause changes the type of a column in a table. Any data present in the column is converted according to the provided expression in the `USING` clause, or, if the `USING` clause is absent, cast to the new data type. Note that columns can only have their type changed if they do not have any indexes that rely on them and are not part of any `CHECK` constraints.
##### Handling Structs {#docs:stable:sql:statements:alter_table::handling-structs}
There are two options to change the sub-schema of a [`STRUCT`](#docs:stable:sql:data_types:struct)-typed column.
###### `ALTER TABLE` with `struct_insert` {#docs:stable:sql:statements:alter_table::alter-table-with-struct_insert}
You can use `ALTER TABLE` with the `struct_insert` function.
For example:
```sql
CREATE TABLE tbl (col STRUCT(i INTEGER));
ALTER TABLE tbl
ALTER col TYPE USING struct_insert(col, a := 42, b := NULL::VARCHAR);
```
###### `ALTER TABLE` with `ADD COLUMN` / `DROP COLUMN` / `RENAME COLUMN` {#docs:stable:sql:statements:alter_table::alter-table-with-add-column--drop-column--rename-column}
Starting with DuckDB v1.3.0, `ALTER TABLE` supports the
[`ADD COLUMN`, `DROP COLUMN` and `RENAME COLUMN` clauses](#docs:stable:sql:data_types:struct::updating-the-schema)
to update the sub-schema of a `STRUCT`.
#### `SET` / `DROP DEFAULT` {#docs:stable:sql:statements:alter_table::set--drop-default}
Set the default value of a column:
```sql
ALTER TABLE integers
ALTER COLUMN i SET DEFAULT 10;
```
Drop the default value of a column:
```sql
ALTER TABLE integers
ALTER COLUMN i DROP DEFAULT;
```
The `SET/DROP DEFAULT` clause modifies the `DEFAULT` value of an existing column. Note that this does not modify any existing data in the column. Dropping the default is equivalent to setting the default value to NULL.
> **Warning.** At the moment DuckDB will not allow you to alter a table if there are any dependencies. That means that if you have an index on a column you will first need to drop the index, alter the table, and then recreate the index. Otherwise, you will get a `Dependency Error`.
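A sketch of that workaround, assuming a hypothetical index named `idx_integers_i` exists on the column:
```sql
DROP INDEX idx_integers_i;
ALTER TABLE integers
ALTER i TYPE VARCHAR;
CREATE INDEX idx_integers_i ON integers (i);
```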
#### `ADD PRIMARY KEY` {#docs:stable:sql:statements:alter_table::add-primary-key}
Add a primary key to a column of a table:
```sql
ALTER TABLE integers
ADD PRIMARY KEY (i);
```
Add a primary key to multiple columns of a table:
```sql
ALTER TABLE integers
ADD PRIMARY KEY (i, j);
```
#### `ADD` / `DROP CONSTRAINT` {#docs:stable:sql:statements:alter_table::add--drop-constraint}
> `ADD CONSTRAINT` and `DROP CONSTRAINT` clauses are not yet supported in DuckDB.
#### Limitations {#docs:stable:sql:statements:alter_table::limitations}
`ALTER COLUMN` fails if values of conflicting types have occurred in the table at any point, even if they have been deleted:
```sql
CREATE TABLE tbl (col VARCHAR);
INSERT INTO tbl
VALUES ('asdf'), ('42');
DELETE FROM tbl
WHERE col = 'asdf';
ALTER TABLE tbl
ALTER COLUMN col TYPE INTEGER;
```
```console
Conversion Error:
Could not convert string 'asdf' to INT32
```
Currently, this is expected behavior.
As a workaround, you can create a copy of the table:
```sql
CREATE OR REPLACE TABLE tbl AS FROM tbl;
```
### ALTER VIEW Statement {#docs:stable:sql:statements:alter_view}
The `ALTER VIEW` statement changes the schema of an existing view in the catalog.
#### Examples {#docs:stable:sql:statements:alter_view::examples}
Rename a view:
```sql
ALTER VIEW view1 RENAME TO view2;
```
`ALTER VIEW` changes the schema of an existing view. All the changes made by `ALTER VIEW` fully respect the transactional semantics, i.e., they will not be visible to other transactions until committed, and can be fully reverted through a rollback. Note that other views that rely on the renamed view are **not** automatically updated.
### ATTACH and DETACH Statements {#docs:stable:sql:statements:attach}
DuckDB allows attaching to and detaching from database files.
#### Examples {#docs:stable:sql:statements:attach::examples}
Attach the database `file.db` with the alias inferred from the name (`file`):
```sql
ATTACH 'file.db';
```
Attach the database `file.db` with an explicit alias (`file_db`):
```sql
ATTACH 'file.db' AS file_db;
```
Attach the database `file.db` in read only mode:
```sql
ATTACH 'file.db' (READ_ONLY);
```
Attach the database `file.db` with a block size of 16 kB:
```sql
ATTACH 'file.db' (BLOCK_SIZE 16_384);
```
Attach the database `file.db` with a row group size of 100 rows:
```sql
ATTACH 'file.db' (ROW_GROUP_SIZE 100);
```
Attach a SQLite database for reading and writing (see the [`sqlite` extension](#docs:stable:core_extensions:sqlite) for more information):
```sql
ATTACH 'sqlite_file.db' AS sqlite_db (TYPE sqlite);
```
Attach the database `file.db` if inferred database alias `file` does not yet exist:
```sql
ATTACH IF NOT EXISTS 'file.db';
```
Attach the database `file.db` if explicit database alias `file_db` does not yet exist:
```sql
ATTACH IF NOT EXISTS 'file.db' AS file_db;
```
Attach the database `file2.db` as alias `file_db` detaching and replacing the existing alias if it exists:
```sql
ATTACH OR REPLACE 'file2.db' AS file_db;
```
Create a table in the attached database with alias `file`:
```sql
CREATE TABLE file.new_table (i INTEGER);
```
Detach the database with alias `file`:
```sql
DETACH file;
```
Show a list of all attached databases:
```sql
SHOW DATABASES;
```
Change the default database that is used to the database `file`:
```sql
USE file;
```
#### `ATTACH` {#docs:stable:sql:statements:attach::attach}
The `ATTACH` statement adds a new database file to the catalog that can be read from and written to.
Note that attachment definitions are not persisted between sessions: when a new session is launched, you have to re-attach to all databases.
##### `ATTACH` Syntax {#docs:stable:sql:statements:attach::attach-syntax}
`ATTACH` allows DuckDB to operate on multiple database files, and allows for transfer of data between different database files.
`ATTACH` supports HTTP and S3 endpoints. For these, it creates a read-only connection by default.
Therefore, the following two commands are equivalent:
```sql
ATTACH 'https://blobs.duckdb.org/databases/stations.duckdb' AS stations_db;
ATTACH 'https://blobs.duckdb.org/databases/stations.duckdb' AS stations_db (READ_ONLY);
```
Similarly, the following two commands connecting to S3 are equivalent:
```sql
ATTACH 's3://duckdb-blobs/databases/stations.duckdb' AS stations_db;
ATTACH 's3://duckdb-blobs/databases/stations.duckdb' AS stations_db (READ_ONLY);
```
##### Explicit Storage Versions {#docs:stable:sql:statements:attach::explicit-storage-versions}
[DuckDB v1.2.0 introduced the `STORAGE_VERSION` option](https://duckdb.org/2025/02/05/announcing-duckdb-120#explicit-storage-versions), which allows explicitly specifying the storage version.
Using this, you can opt in to newer, forwards-incompatible features:
```sql
ATTACH 'file.db' (STORAGE_VERSION 'v1.2.0');
```
This setting specifies the minimum DuckDB version that should be able to read the database file. When database files are written with this option, the resulting files cannot be opened by DuckDB versions older than the specified version. They can be read by the specified version and all newer versions of DuckDB.
For more details, see the ["Storage" page](#docs:stable:internals:storage::explicit-storage-versions).
##### Database Encryption {#docs:stable:sql:statements:attach::database-encryption}
DuckDB supports database encryption. By default, it uses [AES encryption](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard) with a key length of 256 bits using the recommended [GCM](https://en.wikipedia.org/wiki/Galois/Counter_Mode) mode. The encryption covers the main database file, the write-ahead-log (WAL) file, and even temporary files. To attach to an encrypted database, use the `ATTACH` statement with an `ENCRYPTION_KEY`.
```sql
ATTACH 'encrypted.db' AS enc_db (ENCRYPTION_KEY 'quack_quack');
```
To encrypt data, DuckDB can use either the built-in `mbedtls` library or the OpenSSL library from the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview). Note that the OpenSSL versions are much faster due to hardware acceleration, so make sure to load the `httpfs` extension for good encryption performance:
```sql
LOAD httpfs;
ATTACH 'encrypted.db' AS enc_db (ENCRYPTION_KEY 'quack_quack'); -- will be faster thanks to httpfs
```
To change the AES mode to [CBC](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_block_chaining_(CBC)) or [CTR](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Counter_(CTR)), use the `ENCRYPTION_CIPHER` option:
```sql
ATTACH 'encrypted.db' AS enc_db (ENCRYPTION_KEY 'quack_quack', ENCRYPTION_CIPHER 'CBC');
ATTACH 'encrypted.db' AS enc_db (ENCRYPTION_KEY 'quack_quack', ENCRYPTION_CIPHER 'CTR');
```
Database encryption implies using [storage version](#::explicit-storage-versions) 1.4.0 or later.
##### Options {#docs:stable:sql:statements:attach::options}
Zero or more options may be provided within parentheses following the `ATTACH` statement. Parameter values can be passed in with or without wrapping in single quotes, and arbitrary expressions may be used for parameter values (see the example below the table).
| Name | Description | Type | Default value |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------- | --------- | ------------- |
| `ACCESS_MODE` | Access mode of the database (`AUTOMATIC`, `READ_ONLY`, or `READ_WRITE`). | `VARCHAR` | `automatic` |
| `COMPRESS` | Whether the database is compressed. Only applicable for in-memory databases. | `VARCHAR` | `false` |
| `TYPE` | The file type (`DUCKDB` or `SQLITE`), or deduced from the input string literal (MySQL, PostgreSQL). | `VARCHAR` | `DUCKDB` |
| `BLOCK_SIZE` | The block size of a new database file. Must be a power of two and within [16384, 262144]. Cannot be set for existing files. | `UBIGINT` | `262144` |
| `ROW_GROUP_SIZE` | The row group size of a new database file. | `UBIGINT` | `122880` |
| `STORAGE_VERSION` | The version of the storage used. | `VARCHAR` | `v1.0.0` |
| `ENCRYPTION_KEY` | The encryption key used for encrypting the database. | `VARCHAR` | - |
| `ENCRYPTION_CIPHER` | The encryption cipher used for encrypting the database (`CBC`, `CTR`, or `GCM`). | `VARCHAR` | - |
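For example, the `ACCESS_MODE` option from the table above can be passed with or without quotes (the file names here are placeholders):
```sql
ATTACH 'file1.db' (ACCESS_MODE 'READ_ONLY');
ATTACH 'file2.db' (ACCESS_MODE READ_ONLY);
```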
#### `DETACH` {#docs:stable:sql:statements:attach::detach}
The `DETACH` statement allows previously attached database files to be closed and detached, releasing any locks held on the database file.
Note that it is not possible to detach from the default database: if you would like to do so, issue the [`USE` statement](#docs:stable:sql:statements:use) to change the default database to another one. For example, if you are connected to a persistent database, you may change to an in-memory database by issuing:
```sql
ATTACH ':memory:' AS memory_db;
USE memory_db;
```
> **Warning.** Closing the connection, e.g., invoking the [`close()` function in Python](#docs:stable:clients:python:dbapi::connection), does not release the locks held on the database files as the file handles are held by the main DuckDB instance (in Python's case, the `duckdb` module).
##### `DETACH` Syntax {#docs:stable:sql:statements:attach::detach-syntax}
#### Name Qualification {#docs:stable:sql:statements:attach::name-qualification}
The fully qualified name of catalog objects contains the _catalog_, the _schema_ and the _name_ of the object. For example:
Attach the database `new_db`:
```sql
ATTACH 'new_db.db';
```
Create the schema `my_schema` in the database `new_db`:
```sql
CREATE SCHEMA new_db.my_schema;
```
Create the table `my_table` in the schema `my_schema`:
```sql
CREATE TABLE new_db.my_schema.my_table (col INTEGER);
```
Refer to the column `col` inside the table `my_table`:
```sql
SELECT new_db.my_schema.my_table.col FROM new_db.my_schema.my_table;
```
Note that often the fully qualified name is not required. When a name is not fully qualified, the system looks for which entries to reference using the _catalog search path_. The default catalog search path includes the system catalog, the temporary catalog and the initially attached database together with the `main` schema.
Also note the rules on [identifiers and database names in particular](#docs:stable:sql:dialect:keywords_and_identifiers::database-names).
##### Default Database and Schema {#docs:stable:sql:statements:attach::default-database-and-schema}
When a table is created without any qualifications, the table is created in the default schema of the default database. The default database is the database that is launched when the system is created, and the default schema is `main`.
Create the table `my_table` in the default database:
```sql
CREATE TABLE my_table (col INTEGER);
```
##### Changing the Default Database and Schema {#docs:stable:sql:statements:attach::changing-the-default-database-and-schema}
The default database and schema can be changed using the `USE` command.
Set the default database schema to `new_db.main`:
```sql
USE new_db;
```
Set the default database schema to `new_db.my_schema`:
```sql
USE new_db.my_schema;
```
##### Resolving Conflicts {#docs:stable:sql:statements:attach::resolving-conflicts}
When providing only a single qualification, the system can interpret this as _either_ a catalog _or_ a schema, as long as there are no conflicts. For example:
```sql
ATTACH 'new_db.db';
CREATE SCHEMA my_schema;
```
Creates the table `new_db.main.tbl`:
```sql
CREATE TABLE new_db.tbl (i INTEGER);
```
Creates the table `default_db.my_schema.tbl`:
```sql
CREATE TABLE my_schema.tbl (i INTEGER);
```
If we create a conflict (i.e., we have both a schema and a catalog with the same name) the system requests that a fully qualified path is used instead:
```sql
CREATE SCHEMA new_db;
CREATE TABLE new_db.tbl (i INTEGER);
```
```console
Binder Error:
Ambiguous reference to catalog or schema "new_db" - use a fully qualified path like "memory.new_db"
```
##### Changing the Catalog Search Path {#docs:stable:sql:statements:attach::changing-the-catalog-search-path}
The catalog search path can be adjusted by setting the `search_path` configuration option, which uses a comma-separated list of values that will be on the search path. The following example demonstrates searching in two databases:
```sql
ATTACH ':memory:' AS db1;
ATTACH ':memory:' AS db2;
CREATE TABLE db1.tbl1 (i INTEGER);
CREATE TABLE db2.tbl2 (j INTEGER);
```
Reference the tables using their fully qualified name:
```sql
SELECT * FROM db1.tbl1;
SELECT * FROM db2.tbl2;
```
Or set the search path and reference the tables using their name:
```sql
SET search_path = 'db1,db2';
SELECT * FROM tbl1;
SELECT * FROM tbl2;
```
#### Transactional Semantics {#docs:stable:sql:statements:attach::transactional-semantics}
When running queries on multiple databases, the system opens separate transactions per database. The transactions are started _lazily_ by default: when a given database is referenced for the first time in a query, a transaction for that database will be started. `SET immediate_transaction_mode = true` can be toggled to change this behavior to eagerly start transactions in all attached databases instead.
While multiple transactions can be active at a time, the system only supports _writing_ to a single attached database in a single transaction. If you try to write to multiple attached databases in a single transaction, the following error will be thrown:
```console
Attempting to write to database "db2" in a transaction that has already modified database "db1" -
a single transaction can only write to a single attached database.
```
The reason for this restriction is that the system does not maintain atomicity for transactions across attached databases. Transactions are only atomic _within_ each database file. By restricting the global transaction to write to only a single database file the atomicity guarantees are maintained.
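A minimal sketch that triggers this error, reusing the `db1.tbl1` and `db2.tbl2` tables from the search path example above:
```sql
BEGIN TRANSACTION;
INSERT INTO db1.tbl1 VALUES (1);
INSERT INTO db2.tbl2 VALUES (2); -- error: the transaction has already modified db1
ROLLBACK;
```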
### CALL Statement {#docs:stable:sql:statements:call}
The `CALL` statement invokes the given table function and returns the results.
#### Examples {#docs:stable:sql:statements:call::examples}
Invoke the 'duckdb_functions' table function:
```sql
CALL duckdb_functions();
```
Invoke the 'pragma_table_info' table function:
```sql
CALL pragma_table_info('pg_am');
```
Select only the functions where the name starts with `ST_`:
```sql
SELECT function_name, parameters, parameter_types, return_type
FROM duckdb_functions()
WHERE function_name LIKE 'ST_%';
```
#### Syntax {#docs:stable:sql:statements:call::syntax}
### CHECKPOINT Statement {#docs:stable:sql:statements:checkpoint}
The `CHECKPOINT` statement synchronizes data in the write-ahead log (WAL) to the database data file.
#### Examples {#docs:stable:sql:statements:checkpoint::examples}
Synchronize data in the default database:
```sql
CHECKPOINT;
```
Synchronize data in the specified database:
```sql
CHECKPOINT file_db;
```
Abort any in-progress transactions to synchronize the data:
```sql
FORCE CHECKPOINT;
```
#### Checkpointing In-Memory Tables {#docs:stable:sql:statements:checkpoint::checkpointing-in-memory-tables}
Starting with v1.4.0, in-memory tables support checkpointing. This has two key benefits:
* In-memory tables also support compression. This is disabled by default; you can turn it on using:
```sql
ATTACH ':memory:' AS memory_compressed (COMPRESS);
USE memory_compressed;
```
* Checkpointing triggers vacuuming deleted rows, allowing space to be reclaimed after deletes/truncation.
#### Syntax {#docs:stable:sql:statements:checkpoint::syntax}
Checkpoint operations happen automatically based on the WAL size (see [Configuration](#docs:stable:configuration:overview)). This
statement is for manual checkpoint actions.
#### Behavior {#docs:stable:sql:statements:checkpoint::behavior}
The default `CHECKPOINT` command will fail if there are any running transactions. Including `FORCE` will abort any
transactions and execute the checkpoint operation.
Also see the related [`PRAGMA` option](#docs:stable:configuration:pragmas::force-checkpoint) for further behavior modification.
##### Reclaiming Space {#docs:stable:sql:statements:checkpoint::reclaiming-space}
When performing a checkpoint (automatic or otherwise), the space occupied by deleted rows is partially reclaimed. Note that this does not remove all deleted rows, but rather merges row groups that have a significant amount of deletes together. In the current implementation this requires ~25% of rows to be deleted in adjacent row groups.
Before v1.4.0, checkpointing has no effect in in-memory mode and hence does not reclaim space after deletes in in-memory databases. Starting with v1.4.0, in-memory tables support checkpointing (see [Checkpointing In-Memory Tables](#docs:stable:sql:statements:checkpoint::checkpointing-in-memory-tables) above).
> **Warning.** The [`VACUUM` statement](#docs:stable:sql:statements:vacuum) does _not_ trigger vacuuming deletes and hence does not reclaim space.
### COMMENT ON Statement {#docs:stable:sql:statements:comment_on}
The `COMMENT ON` statement allows adding metadata to catalog entries (tables, columns, etc.).
It follows the [PostgreSQL syntax](https://www.postgresql.org/docs/16/sql-comment.html).
#### Examples {#docs:stable:sql:statements:comment_on::examples}
Create a comment on a `TABLE`:
```sql
COMMENT ON TABLE test_table IS 'very nice table';
```
Create a comment on a `COLUMN`:
```sql
COMMENT ON COLUMN test_table.test_table_column IS 'very nice column';
```
Create a comment on a `VIEW`:
```sql
COMMENT ON VIEW test_view IS 'very nice view';
```
Create a comment on an `INDEX`:
```sql
COMMENT ON INDEX test_index IS 'very nice index';
```
Create a comment on a `SEQUENCE`:
```sql
COMMENT ON SEQUENCE test_sequence IS 'very nice sequence';
```
Create a comment on a `TYPE`:
```sql
COMMENT ON TYPE test_type IS 'very nice type';
```
Create a comment on a `MACRO`:
```sql
COMMENT ON MACRO test_macro IS 'very nice macro';
```
Create a comment on a `MACRO TABLE`:
```sql
COMMENT ON MACRO TABLE test_table_macro IS 'very nice table macro';
```
To unset a comment, set it to `NULL`, e.g.:
```sql
COMMENT ON TABLE test_table IS NULL;
```
#### Reading Comments {#docs:stable:sql:statements:comment_on::reading-comments}
Comments can be read by querying the `comment` column of the respective [metadata functions](#docs:stable:sql:meta:duckdb_table_functions):
List comments on `TABLE`s:
```sql
SELECT comment FROM duckdb_tables();
```
List comments on `COLUMN`s:
```sql
SELECT comment FROM duckdb_columns();
```
List comments on `VIEW`s:
```sql
SELECT comment FROM duckdb_views();
```
List comments on `INDEX`es:
```sql
SELECT comment FROM duckdb_indexes();
```
List comments on `SEQUENCE`s:
```sql
SELECT comment FROM duckdb_sequences();
```
List comments on `TYPE`s:
```sql
SELECT comment FROM duckdb_types();
```
List comments on `MACRO`s:
```sql
SELECT comment FROM duckdb_functions();
```
List comments on `MACRO TABLE`s:
```sql
SELECT comment FROM duckdb_functions();
```
#### Limitations {#docs:stable:sql:statements:comment_on::limitations}
The `COMMENT ON` statement currently has the following limitations:
* It is not possible to comment on schemas or databases.
* It is not possible to comment on things that have a dependency (e.g., a table with an index).
#### Syntax {#docs:stable:sql:statements:comment_on::syntax}
### COPY Statement {#docs:stable:sql:statements:copy}
#### Examples {#docs:stable:sql:statements:copy::examples}
Read a CSV file into the `lineitem` table, using auto-detected CSV options:
```sql
COPY lineitem FROM 'lineitem.csv';
```
Read a CSV file into the `lineitem` table, using manually specified CSV options:
```sql
COPY lineitem FROM 'lineitem.csv' (DELIMITER '|');
```
Read a Parquet file into the `lineitem` table:
```sql
COPY lineitem FROM 'lineitem.pq' (FORMAT parquet);
```
Read a JSON file into the `lineitem` table, using auto-detected options:
```sql
COPY lineitem FROM 'lineitem.json' (FORMAT json, AUTO_DETECT true);
```
Read a CSV file into the `lineitem` table, using double quotes:
```sql
COPY lineitem FROM "lineitem.csv";
```
Read a CSV file into the `lineitem` table, omitting quotes:
```sql
COPY lineitem FROM lineitem.csv;
```
Write a table to a CSV file:
```sql
COPY lineitem TO 'lineitem.csv' (FORMAT csv, DELIMITER '|', HEADER);
```
Write a table to a CSV file, using double quotes:
```sql
COPY lineitem TO "lineitem.csv";
```
Write a table to a CSV file, omitting quotes:
```sql
COPY lineitem TO lineitem.csv;
```
Write the result of a query to a Parquet file:
```sql
COPY (SELECT l_orderkey, l_partkey FROM lineitem) TO 'lineitem.parquet' (COMPRESSION zstd);
```
Copy the entire content of database `db1` to database `db2`:
```sql
COPY FROM DATABASE db1 TO db2;
```
Copy only the schema (catalog elements) but not any data:
```sql
COPY FROM DATABASE db1 TO db2 (SCHEMA);
```
#### Overview {#docs:stable:sql:statements:copy::overview}
`COPY` moves data between DuckDB and external files. `COPY ... FROM` imports data into DuckDB from an external file. `COPY ... TO` writes data from DuckDB to an external file. The `COPY` command can be used for `CSV`, `PARQUET` and `JSON` files.
#### `COPY ... FROM` {#docs:stable:sql:statements:copy::copy--from}
`COPY ... FROM` imports data from an external file into an existing table. The data is appended to whatever data is already in the table. The number of columns in the file must match the number of columns in the target table, and the contents of the columns must be convertible to the column types of the table. If this is not possible, an error is thrown.
If a list of columns is specified, `COPY` will only copy the data in the specified columns from the file. If there are any columns in the table that are not in the column list, `COPY ... FROM` will insert the default values for those columns.
Copy the contents of a comma-separated file `test.csv` without a header into the table `test`:
```sql
COPY test FROM 'test.csv';
```
Copy the contents of a comma-separated file with a header into the `category` table:
```sql
COPY category FROM 'categories.csv' (HEADER);
```
Copy the contents of `lineitem.tbl` into the `lineitem` table, where the contents are delimited by a pipe character (`|`):
```sql
COPY lineitem FROM 'lineitem.tbl' (DELIMITER '|');
```
Copy the contents of `lineitem.tbl` into the `lineitem` table, where the delimiter, quote character, and presence of a header are automatically detected:
```sql
COPY lineitem FROM 'lineitem.tbl' (AUTO_DETECT true);
```
Read the contents of a comma-separated file `names.csv` into the `name` column of the `category` table. Any other columns of this table are filled with their default value:
```sql
COPY category(name) FROM 'names.csv';
```
Read the contents of a Parquet file `lineitem.parquet` into the `lineitem` table:
```sql
COPY lineitem FROM 'lineitem.parquet' (FORMAT parquet);
```
Read the contents of a newline-delimited JSON file `lineitem.ndjson` into the `lineitem` table:
```sql
COPY lineitem FROM 'lineitem.ndjson' (FORMAT json);
```
Read the contents of a JSON file `lineitem.json` into the `lineitem` table:
```sql
COPY lineitem FROM 'lineitem.json' (FORMAT json, ARRAY true);
```
An expression may be used as the source of a `COPY ... FROM` command if it is placed within parentheses.
Read the contents of a file whose path is stored in a variable into the `lineitem` table:
```sql
SET VARIABLE source_file = 'lineitem.json';
COPY lineitem FROM (getvariable('source_file'));
```
Read the contents of a file provided as parameter of a prepared statement into the `lineitem` table:
```sql
PREPARE v1 AS COPY lineitem FROM ($1);
EXECUTE v1('lineitem.json');
```
##### Syntax {#docs:stable:sql:statements:copy::syntax}
> To ensure compatibility with PostgreSQL, DuckDB accepts `COPY ... FROM` statements that do not fully comply with the railroad diagram shown here. For example, the following is a valid statement:
>
> ```sql
> COPY tbl FROM 'tbl.csv' WITH DELIMITER '|' CSV HEADER;
> ```
#### `COPY ... TO` {#docs:stable:sql:statements:copy::copy--to}
`COPY ... TO` exports data from DuckDB to an external CSV, Parquet, JSON or BLOB file. It has mostly the same set of options as `COPY ... FROM`; however, in the case of `COPY ... TO`, the options specify how the file should be written to disk. Any file created by `COPY ... TO` can be copied back into the database by using `COPY ... FROM` with a similar set of options.
The `COPY ... TO` function can be called specifying either a table name or a query. When a table name is specified, the contents of the entire table will be written into the resulting file. When a query is specified, the query is executed and the result of the query is written to the resulting file.
Copy the contents of the `lineitem` table to a CSV file with a header:
```sql
COPY lineitem TO 'lineitem.csv';
```
Copy the contents of the `lineitem` table to the file `lineitem.tbl`, where the columns are delimited by a pipe character (`|`), including a header line:
```sql
COPY lineitem TO 'lineitem.tbl' (DELIMITER '|');
```
Use tab separators to create a TSV file without a header:
```sql
COPY lineitem TO 'lineitem.tsv' (DELIMITER '\t', HEADER false);
```
Copy the `l_orderkey` column of the `lineitem` table to the file `orderkey.tbl`:
```sql
COPY lineitem(l_orderkey) TO 'orderkey.tbl' (DELIMITER '|');
```
Copy the result of a query to the file `query.csv`, including a header with column names:
```sql
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.csv' (DELIMITER ',');
```
Copy the result of a query to the Parquet file `query.parquet`:
```sql
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.parquet' (FORMAT parquet);
```
Copy the result of a query to the newline-delimited JSON file `query.ndjson`:
```sql
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.ndjson' (FORMAT json);
```
Copy the result of a query to the JSON file `query.json`:
```sql
COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.json' (FORMAT json, ARRAY true);
```
Return the files and their column statistics that were written as part of the `COPY` statement:
```sql
COPY (SELECT l_orderkey, l_comment FROM lineitem) TO 'lineitem_part.parquet' (RETURN_STATS);
```
| filename | count | file_size_bytes | footer_size_bytes | column_statistics | partition_keys |
|-----------------------|-------:|----------------:|------------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|
| lineitem_part.parquet | 600572 | 8579141 | 1445 | {'"l_comment"'={column_size_bytes=7642227, max=zzle. slyly, min=' Tiresias above the blit', null_count=0}, '"l_orderkey"'={column_size_bytes=935457, max=600000, min=1, null_count=0}} | NULL |
Note: for nested columns (e.g., structs) the column statistics are defined for each part. For example, if we have a column `name STRUCT(field1 INTEGER, field2 INTEGER)` the column statistics will have stats for `name.field1` and `name.field2`.
An expression may be used as the target of a `COPY ... TO` command if it is placed within parentheses.
Copy the result of a query to a file whose path is stored in a variable:
```sql
SET VARIABLE target_file = 'target_file.parquet';
COPY (SELECT 'hello world') TO (getvariable('target_file'));
```
Copy to a file provided as parameter of a prepared statement:
```sql
PREPARE v1 AS COPY (SELECT 42 AS i) TO $1;
EXECUTE v1('file.csv');
```
Expressions may be used for options as well. Copy to a file using a format stored in a variable:
```sql
SET VARIABLE my_format = 'parquet';
COPY (SELECT 42 i) TO 'file' (FORMAT getvariable('my_format'));
```
##### `COPY ... TO` Options {#docs:stable:sql:statements:copy::copy--to-options}
Zero or more copy options may be provided as a part of the copy operation. The `WITH` specifier is optional, but if any options are specified, the parentheses are required. Parameter values can be passed in with or without wrapping in single quotes. Arbitrary expressions may be used for parameter values.
Any option that is a Boolean can be enabled or disabled in multiple ways. You can write `true`, `ON`, or `1` to enable the option, and `false`, `OFF`, or `0` to disable it. The `BOOLEAN` value can also be omitted, e.g., by only passing `(HEADER)`, in which case `true` is assumed.
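For instance, the following statements are equivalent ways of enabling the header (assuming a table `tbl`):
```sql
COPY tbl TO 'out.csv' (HEADER);
COPY tbl TO 'out.csv' (HEADER true);
COPY tbl TO 'out.csv' (HEADER 1);
```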
With few exceptions, the below options are applicable to all formats written with `COPY`.
| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `FORMAT` | Specifies the copy function to use. The default is selected from the file extension (e.g., `.parquet` results in a Parquet file being written/read). If the file extension is unknown, `CSV` is selected. Vanilla DuckDB provides `CSV`, `PARQUET` and `JSON` but additional copy functions can be added by [`extensions`](#docs:stable:extensions:overview). | `VARCHAR` | `auto` |
| `USE_TMP_FILE` | Whether or not to write to a temporary file first if the original file exists (`target.csv.tmp`). This prevents overwriting an existing file with a broken file in case the writing is cancelled. | `BOOL` | `auto` |
| `OVERWRITE_OR_IGNORE` | Whether or not to allow overwriting files if they already exist. Only has an effect when used with `PARTITION_BY`. | `BOOL` | `false` |
| `OVERWRITE` | When `true`, all existing files inside targeted directories will be removed (not supported on remote filesystems). Only has an effect when used with `PARTITION_BY`. | `BOOL` | `false` |
| `APPEND` | When `true`, in the event a filename pattern is generated that already exists, the path will be regenerated to ensure no existing files are overwritten. Only has an effect when used with `PARTITION_BY`. | `BOOL` | `false` |
| `FILENAME_PATTERN` | Set a pattern to use for the filename, can optionally contain `{uuid}` / `{uuidv4}` or `{uuidv7}` to be filled in with a generated [UUID](#docs:stable:sql:data_types:numeric::universally-unique-identifiers-uuids) (v4 or v7, respectively), and `{i}`, which is replaced by an incrementing index. Only has an effect when used with `PARTITION_BY`. | `VARCHAR` | `auto` |
| `FILE_EXTENSION` | Set the file extension that should be assigned to the generated file(s). | `VARCHAR` | `auto` |
| `PER_THREAD_OUTPUT` | When `true`, the `COPY` command generates one file per thread, rather than one file in total. This allows for faster parallel writing. | `BOOL` | `false` |
| `FILE_SIZE_BYTES` | If this parameter is set, the `COPY` process creates a directory which will contain the exported files. If a file exceeds the set limit (specified as bytes such as `1000` or in human-readable format such as `1k`), the process creates a new file in the directory. This parameter works in combination with `PER_THREAD_OUTPUT`. Note that the size is used as an approximation, and files can be occasionally slightly over the limit. | `VARCHAR` or `BIGINT` | (empty) |
| `PARTITION_BY` | The columns to partition by using a Hive partitioning scheme, see the [partitioned writes section](#docs:stable:data:partitioning:partitioned_writes). | `VARCHAR[]` | (empty) |
| `PRESERVE_ORDER` | Whether or not to [preserve order](#docs:stable:sql:dialect:order_preservation) during the copy operation. Defaults to the value of the `preserve_insertion_order` [configuration option](#docs:stable:configuration:overview). | `BOOL`| (*) |
| `RETURN_FILES` | Whether or not to include the created filepath(s) (as a `files VARCHAR[]` column) in the query result. | `BOOL` | `false` |
| `RETURN_STATS` | Whether or not to return the files and their column statistics that were written as part of the `COPY` statement. | `BOOL`| `false` |
| `WRITE_PARTITION_COLUMNS` | Whether or not to write partition columns into files. Only has an effect when used with `PARTITION_BY`. | `BOOL` | `false` |
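As a sketch of how several of the options above combine (assuming an `orders` table with a `year` column), the following writes one Hive-partitioned directory per year and fills `{uuid}` into each generated filename:
```sql
COPY orders TO 'orders_out' (
    FORMAT parquet,
    PARTITION_BY (year),
    OVERWRITE_OR_IGNORE,
    FILENAME_PATTERN 'orders_{uuid}'
);
```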
##### Syntax {#docs:stable:sql:statements:copy::syntax}
> To ensure compatibility with PostgreSQL, DuckDB accepts `COPY ... TO` statements that do not fully comply with the railroad diagram shown here. For example, the following is a valid statement:
>
> ```sql
> COPY (SELECT 42 AS x, 84 AS y) TO 'out.csv' WITH DELIMITER '|' CSV HEADER;
> ```
#### `COPY FROM DATABASE ... TO` {#docs:stable:sql:statements:copy::copy-from-database--to}
The `COPY FROM DATABASE ... TO` statement copies the entire content from one attached database to another attached database. This includes the schema, including constraints, indexes, sequences, macros, and the data itself.
```sql
ATTACH 'db1.db' AS db1;
CREATE TABLE db1.tbl AS SELECT 42 AS x, 3 AS y;
CREATE MACRO db1.two_x_plus_y(x, y) AS 2 * x + y;
ATTACH 'db2.db' AS db2;
COPY FROM DATABASE db1 TO db2;
SELECT db2.two_x_plus_y(x, y) AS z FROM db2.tbl;
```
| z |
|---:|
| 87 |
To only copy the **schema** of `db1` to `db2` but omit copying the data, add `SCHEMA` to the statement:
```sql
COPY FROM DATABASE db1 TO db2 (SCHEMA);
```
##### Syntax {#docs:stable:sql:statements:copy::syntax}
#### Format-Specific Options {#docs:stable:sql:statements:copy::format-specific-options}
##### CSV Options {#docs:stable:sql:statements:copy::csv-options}
The below options are applicable when writing CSV files.
| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `COMPRESSION` | The compression type for the file. By default this will be detected automatically from the file extension (e.g., `file.csv.gz` will use `gzip`, `file.csv.zst` will use `zstd`, and `file.csv` will use `none`). Options are `none`, `gzip`, `zstd`. | `VARCHAR` | `auto` |
| `DATEFORMAT` | Specifies the date format to use when writing dates. See [Date Format](#docs:stable:sql:functions:dateformat). | `VARCHAR` | (empty) |
| `DELIM` or `SEP` | The character that is written to separate columns within each row. | `VARCHAR` | `,` |
| `ESCAPE` | The character that should appear before a character that matches the `quote` value. | `VARCHAR` | `"` |
| `FORCE_QUOTE` | The list of columns to always add quotes to, even if not required. | `VARCHAR[]` | `[]` |
| `HEADER` | Whether or not to write a header for the CSV file. | `BOOL` | `true` |
| `NULLSTR` | The string that is written to represent a `NULL` value. | `VARCHAR` | (empty) |
| `PREFIX` | Prefixes the CSV file with a specified string. This option must be used in conjunction with `SUFFIX` and requires `HEADER` to be set to `false`.| `VARCHAR` | (empty) |
| `SUFFIX` | Appends a specified string as a suffix to the CSV file. This option must be used in conjunction with `PREFIX` and requires `HEADER` to be set to `false`.| `VARCHAR` | (empty) |
| `QUOTE` | The quoting character to be used when a data value is quoted. | `VARCHAR` | `"` |
| `TIMESTAMPFORMAT` | Specifies the date format to use when writing timestamps. See [Date Format](#docs:stable:sql:functions:dateformat). | `VARCHAR` | (empty) |
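A sketch combining several of these options (using the `lineitem` table from the earlier examples):
```sql
COPY lineitem TO 'lineitem.csv.gz' (
    DELIM '|',
    NULLSTR 'NULL',
    DATEFORMAT '%Y-%m-%d',
    COMPRESSION gzip
);
```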
##### Parquet Options {#docs:stable:sql:statements:copy::parquet-options}
The below options are applicable when writing Parquet files.
| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `COMPRESSION` | The compression format to use (`uncompressed`, `snappy`, `gzip`, `zstd`, `brotli`, `lz4`, `lz4_raw`). | `VARCHAR` | `snappy` |
| `COMPRESSION_LEVEL` | Compression level, set between 1 (lowest compression, fastest) and 22 (highest compression, slowest). Only supported for zstd compression. | `BIGINT` | `3` |
| `FIELD_IDS` | The `field_id` for each column. Pass `auto` to attempt to infer automatically. | `STRUCT` | (empty) |
| `ROW_GROUP_SIZE_BYTES` | The target size of each row group. You can pass either a human-readable string, e.g., `2MB`, or an integer, i.e., the number of bytes. This option is only used when you have issued `SET preserve_insertion_order = false;`, otherwise, it is ignored. | `BIGINT` | `row_group_size * 1024` |
| `ROW_GROUP_SIZE` | The target size, i.e., number of rows, of each row group. | `BIGINT` | 122880 |
| `ROW_GROUPS_PER_FILE` | Create a new Parquet file if the current one has a specified number of row groups. If multiple threads are active, the number of row groups in a file may slightly exceed the specified number of row groups to limit the amount of locking, similarly to the behaviour of `FILE_SIZE_BYTES`. However, if `per_thread_output` is set, only one thread writes to each file, and it becomes accurate again. | `BIGINT` | (empty) |
| `PARQUET_VERSION` | The Parquet version to use (`V1`, `V2`). | `VARCHAR` | `V1` |
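For example, a sketch that writes the `lineitem` table with zstd compression at a higher level and smaller row groups (the values are illustrative):
```sql
COPY lineitem TO 'lineitem_zstd.parquet' (
    FORMAT parquet,
    COMPRESSION zstd,
    COMPRESSION_LEVEL 9,
    ROW_GROUP_SIZE 50_000
);
```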
Some examples of `FIELD_IDS` are as follows.
Assign `field_ids` automatically:
```sql
COPY
(SELECT 128 AS i)
TO 'my.parquet'
(FIELD_IDS 'auto');
```
Sets the `field_id` of column `i` to 42:
```sql
COPY
(SELECT 128 AS i)
TO 'my.parquet'
(FIELD_IDS {i: 42});
```
Sets the `field_id` of column `i` to 42, and column `j` to 43:
```sql
COPY
(SELECT 128 AS i, 256 AS j)
TO 'my.parquet'
(FIELD_IDS {i: 42, j: 43});
```
Sets the `field_id` of column `my_struct` to 42, and column `i` (nested inside `my_struct`) to 43:
```sql
COPY
(SELECT {i: 128} AS my_struct)
TO 'my.parquet'
(FIELD_IDS {my_struct: {__duckdb_field_id: 42, i: 43}});
```
Sets the `field_id` of column `my_list` to 42, and column `element` (default name of list child) to 43:
```sql
COPY
(SELECT [128, 256] AS my_list)
TO 'my.parquet'
(FIELD_IDS {my_list: {__duckdb_field_id: 42, element: 43}});
```
Sets the `field_id` of column `my_map` to 42, and columns `key` and `value` (default names of map children) to 43 and 44:
```sql
COPY
(SELECT MAP {'key1' : 128, 'key2': 256} my_map)
TO 'my.parquet'
(FIELD_IDS {my_map: {__duckdb_field_id: 42, key: 43, value: 44}});
```
##### JSON Options {#docs:stable:sql:statements:copy::json-options}
The below options are applicable when writing `JSON` files.
| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `ARRAY` | Whether to write a JSON array. If `true`, a JSON array of records is written, if `false`, newline-delimited JSON is written | `BOOL` | `false` |
| `COMPRESSION` | The compression type for the file. By default this will be detected automatically from the file extension (e.g., `file.json.gz` will use `gzip`, `file.json.zst` will use `zstd`, and `file.json` will use `none`). Options are `none`, `gzip`, `zstd`. | `VARCHAR` | `auto` |
| `DATEFORMAT` | Specifies the date format to use when writing dates. See [Date Format](#docs:stable:sql:functions:dateformat). | `VARCHAR` | (empty) |
| `TIMESTAMPFORMAT` | Specifies the date format to use when writing timestamps. See [Date Format](#docs:stable:sql:functions:dateformat). | `VARCHAR` | (empty) |
Sets the value of column `hello` to `QUACK!` and outputs the results to `quack.json`:
```sql
COPY (SELECT 'QUACK!' AS hello) TO 'quack.json';
--RETURNS: {"hello":"QUACK!"}
```
Sets the value of column `num_list` to `[1,2,3]` and outputs the results to `numbers.json`:
```sql
COPY (SELECT [1, 2, 3] AS num_list) TO 'numbers.json';
--RETURNS: {"num_list":[1,2,3]}
```
Sets the value of column `compression_type` to `gzip_explicit` and outputs the results to `explicit_compression.json` with explicitly specified `gzip` compression:
```sql
COPY (SELECT 'gzip_explicit' AS compression_type) TO 'explicit_compression.json' (FORMAT json, COMPRESSION 'GZIP');
-- RETURNS: {"compression_type":"gzip_explicit"}
```
Writes the rows as a single JSON array to `array_true.json` (`ARRAY true`):
```sql
COPY (SELECT 1 AS id, 'Alice' AS name, [1, 2, 3] AS numbers
UNION ALL
SELECT 2, 'Bob', [4, 5, 6] AS numbers)
TO 'array_true.json' (FORMAT json, ARRAY true);
-- RETURNS:
/*
[
{"id":1,"name":"Alice","numbers":[1,2,3]},
{"id":2,"name":"Bob","numbers":[1,2,3]}
]
*/
```
Writes the rows as newline-delimited JSON to `array_false.json` (`ARRAY false`):
```sql
COPY (SELECT 1 AS id, 'Alice' AS name, [1, 2, 3] AS numbers
UNION ALL
SELECT 2, 'Bob', [4, 5, 6] AS numbers)
TO 'array_false.json' (FORMAT json, ARRAY false);
-- RETURNS:
/*
{"id":1,"name":"Alice","numbers":[1,2,3]}
{"id":2,"name":"Bob","numbers":[4,5,6]}
*/
```
##### BLOB Options {#docs:stable:sql:statements:copy::blob-options}
The `BLOB` format option allows you to select a single column of a DuckDB table into a `.blob` file.
The column must be cast to the `BLOB` data type. For details on typecasting, see the
[Casting Operations Matrix](#docs:preview:sql:data_types:typecasting::Casting-Operations-Matrix).
The below options are applicable when writing `BLOB` files.
| Name | Description | Type | Default |
|:--|:-----|:-|:-|
| `COMPRESSION` | The compression type for the file. By default this will be detected automatically from the file extension (e.g., `file.blob.gz` will use `gzip`, `file.blob.zst` will use `zstd`, and `file.blob` will use `none`). Options are `none`, `gzip`, `zstd`. | `VARCHAR` | `auto` |
Type casts the string value `foo` to the `BLOB` data type and outputs the results to `blob_output.blob`:
```sql
COPY (SELECT 'foo'::BLOB) TO 'blob_output.blob' (FORMAT BLOB);
```
Type casts the string value `foo` to the `BLOB` data type and outputs the results to `blob_output_gzip.blob` with `gzip` compression:
```sql
COPY (SELECT 'foo'::BLOB) TO 'blob_output_gzip.blob' (FORMAT BLOB, COMPRESSION 'GZIP');
```
#### Limitations {#docs:stable:sql:statements:copy::limitations}
`COPY` does not support copying between tables. To copy between tables, use an [`INSERT statement`](#docs:stable:sql:statements:insert):
```sql
INSERT INTO tbl2
FROM tbl1;
```
### CREATE MACRO Statement {#docs:stable:sql:statements:create_macro}
The `CREATE MACRO` statement can create a scalar or table macro (function) in the catalog.
For a scalar macro, `CREATE MACRO` is followed by the name of the macro, and optionally parameters within a set of parentheses. The keyword `AS` is next, followed by the text of the macro. By design, a scalar macro may only return a single value.
For a table macro, the syntax is similar to a scalar macro except `AS` is replaced with `AS TABLE`. A table macro may return a table of arbitrary size and shape.
> If a `MACRO` is temporary, it is only usable within the same database connection and is deleted when the connection is closed.
#### Examples {#docs:stable:sql:statements:create_macro::examples}
##### Scalar Macros {#docs:stable:sql:statements:create_macro::scalar-macros}
Create a macro that adds two expressions (`a` and `b`):
```sql
CREATE MACRO add(a, b) AS a + b;
```
Create a macro, replacing possible existing definitions:
```sql
CREATE OR REPLACE MACRO add(a, b) AS a + b;
```
Create a macro if it does not already exist, else do nothing:
```sql
CREATE MACRO IF NOT EXISTS add(a, b) AS a + b;
```
Create a macro for a `CASE` expression:
```sql
CREATE MACRO ifelse(a, b, c) AS CASE WHEN a THEN b ELSE c END;
```
Create a macro that does a subquery:
```sql
CREATE MACRO one() AS (SELECT 1);
```
Macros are schema-dependent, and have an alias, `FUNCTION`:
```sql
CREATE FUNCTION main.my_avg(x) AS sum(x) / count(x);
```
Create a macro with a default parameter:
```sql
CREATE MACRO add_default(a, b := 5) AS a + b;
```
Create a macro `arr_append` (with a functionality equivalent to `array_append`):
```sql
CREATE MACRO arr_append(l, e) AS list_concat(l, list_value(e));
```
Create a macro with a typed parameter:
```sql
CREATE MACRO is_maximal(a INTEGER) AS a = 2^31 - 1;
```
##### Table Macros {#docs:stable:sql:statements:create_macro::table-macros}
Create a table macro without parameters:
```sql
CREATE MACRO static_table() AS TABLE
SELECT 'Hello' AS column1, 'World' AS column2;
```
Create a table macro with parameters (that can be of any type):
```sql
CREATE MACRO dynamic_table(col1_value, col2_value) AS TABLE
SELECT col1_value AS column1, col2_value AS column2;
```
Create a table macro that returns multiple rows. It will be replaced if it already exists, and it is temporary (will be automatically deleted when the connection ends):
```sql
CREATE OR REPLACE TEMP MACRO dynamic_table(col1_value, col2_value) AS TABLE
SELECT col1_value AS column1, col2_value AS column2
UNION ALL
SELECT 'Hello' AS col1_value, 456 AS col2_value;
```
Pass an argument as a list:
```sql
CREATE MACRO get_users(i) AS TABLE
SELECT * FROM users WHERE uid IN (SELECT unnest(i));
```
An example for how to use the `get_users` table macro is the following:
```sql
CREATE TABLE users AS
SELECT *
FROM (VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Carl'), (4, 'Dan'), (5, 'Eve')) t(uid, name);
SELECT * FROM get_users([1, 5]);
```
To define macros on arbitrary tables, use the [`query_table` function](#docs:stable:guides:sql_features:query_and_query_table_functions). For example, the following macro computes a column-wise checksum on a table:
```sql
CREATE MACRO checksum(tbl) AS TABLE
SELECT bit_xor(md5_number(COLUMNS(*)::VARCHAR))
FROM query_table(tbl);
CREATE TABLE tbl AS SELECT unnest([42, 43]) AS x, 100 AS y;
SELECT * FROM checksum('tbl');
```
#### Overloading {#docs:stable:sql:statements:create_macro::overloading}
It is possible to overload a macro based on the types or the number of its parameters; this works for both scalar and table macros.
By providing overloads we can have both `add_x(a, b)` and `add_x(a, b, c)` with different function bodies.
```sql
CREATE MACRO add_x
(a, b) AS a + b,
(a, b, c) AS a + b + c;
```
```sql
SELECT
add_x(21, 42) AS two_args,
add_x(21, 42, 21) AS three_args;
```
| two_args | three_args |
|----------|------------|
| 63 | 84 |
```sql
CREATE OR REPLACE MACRO is_maximal
(a TINYINT) AS a = 2^7 - 1,
(a INT) AS a = 2^31 - 1;
```
```sql
SELECT
is_maximal(127::TINYINT) AS tiny,
is_maximal(127) AS regular;
```
| tiny | regular |
|----------|------------|
| true | false |
#### Syntax {#docs:stable:sql:statements:create_macro::syntax}
Macros allow you to create shortcuts for combinations of expressions. The following definition fails because `b` is neither a parameter of the macro nor a known column:
```sql
CREATE MACRO add(a) AS a + b;
```
```console
Binder Error:
Referenced column "b" not found in FROM clause!
```
This works:
```sql
CREATE MACRO add(a, b) AS a + b;
```
Usage example:
```sql
SELECT add(1, 2) AS x;
```
| x |
|--:|
| 3 |
However, this fails:
```sql
SELECT add('hello', 3);
```
```console
Binder Error:
Could not choose a best candidate function for the function call "add(STRING_LITERAL, INTEGER_LITERAL)". In order to select one, please add explicit type casts.
Candidate functions:
add(DATE, INTEGER) -> DATE
add(INTEGER, INTEGER) -> INTEGER
```
Macros can have default parameters.
`b` is a default parameter:
```sql
CREATE MACRO add_default(a, b := 5) AS a + b;
```
The following will result in 42:
```sql
SELECT add_default(37);
```
The order of named parameters does not matter:
```sql
CREATE MACRO triple_add(a, b := 5, c := 10) AS a + b + c;
```
```sql
SELECT triple_add(40, c := 1, b := 1) AS x;
```
| x |
|---:|
| 42 |
When macros are used, they are expanded (i.e., replaced with the original expression), and the parameters within the expanded expression are replaced with the supplied arguments. Step by step:
The `add` macro we defined above is used in a query:
```sql
SELECT add(40, 2) AS x;
```
Internally, `add` is replaced with its definition of `a + b`:
```sql
SELECT a + b AS x;
```
Then, the parameters are replaced by the supplied arguments:
```sql
SELECT 40 + 2 AS x;
```
#### Limitations {#docs:stable:sql:statements:create_macro::limitations}
##### Using Subquery Macros {#docs:stable:sql:statements:create_macro::using-subquery-macros}
Table macros as well as scalar macros defined using scalar subqueries cannot be used in the arguments of table functions. DuckDB will return the following error:
```console
Binder Error:
Table function cannot contain subqueries
```
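A minimal sketch of the failing pattern, reusing the `one()` subquery macro defined above as an argument to a table function:
```sql
CREATE MACRO one() AS (SELECT 1);
-- raises the Binder Error shown above
SELECT * FROM range(one());
```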
##### Overloads {#docs:stable:sql:statements:create_macro::overloads}
Overloads for macro functions have to be defined when the macro is created; it is not possible to define a macro with the same name twice without first removing the first definition.
##### Recursive Functions {#docs:stable:sql:statements:create_macro::recursive-functions}
Defining recursive functions is not supported.
For example, the following macro, which is supposed to compute the *n*th number of the Fibonacci sequence, fails:
```sql
CREATE OR REPLACE FUNCTION fibo(n) AS (SELECT 1);
CREATE OR REPLACE FUNCTION fibo(n) AS (
CASE
WHEN n <= 1 THEN 1
ELSE fibo(n - 1)
END
);
SELECT fibo(3);
```
```console
Binder Error:
Max expression depth limit of 1000 exceeded. Use "SET max_expression_depth TO x" to increase the maximum expression depth.
```
##### Function Chaining on the First Function Does Not Work {#docs:stable:sql:statements:create_macro::function-chaining-on-the-first-function-does-not-work}
Macros do not support the dot operator for function chaining on the first function.
To illustrate this, see an example with the `lower` function, which works:
```sql
CREATE OR REPLACE MACRO low(s) AS lower(s);
SELECT low('AA');
```
However, rewriting `lower(s)` to use function chaining does not work:
```sql
CREATE OR REPLACE MACRO low(s) AS s.lower();
SELECT low('AA');
```
```console
Binder Error:
Referenced column "s" not found in FROM clause!
```
### CREATE SCHEMA Statement {#docs:stable:sql:statements:create_schema}
The `CREATE SCHEMA` statement creates a schema in the catalog. The default schema is `main`.
#### Examples {#docs:stable:sql:statements:create_schema::examples}
Create a schema:
```sql
CREATE SCHEMA s1;
```
Create a schema if it does not exist yet:
```sql
CREATE SCHEMA IF NOT EXISTS s2;
```
Create a schema or replace a schema if it exists:
```sql
CREATE OR REPLACE SCHEMA s2;
```
Create table in the schemas:
```sql
CREATE TABLE s1.t (id INTEGER PRIMARY KEY, other_id INTEGER);
CREATE TABLE s2.t (id INTEGER PRIMARY KEY, j VARCHAR);
```
Compute a join between tables from two schemas:
```sql
SELECT *
FROM s1.t s1t, s2.t s2t
WHERE s1t.other_id = s2t.id;
```
#### Syntax {#docs:stable:sql:statements:create_schema::syntax}
### CREATE SECRET Statement {#docs:stable:sql:statements:create_secret}
The `CREATE SECRET` statement creates a new secret in the [Secrets Manager](#docs:stable:configuration:secrets_manager).
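For example, a minimal sketch that creates an S3 secret with placeholder credentials (the secret name, key, and region are illustrative) and then drops it again:
```sql
CREATE SECRET my_s3_secret (
    TYPE s3,
    KEY_ID 'my_key_id',        -- placeholder
    SECRET 'my_secret_value',  -- placeholder
    REGION 'us-east-1'
);
DROP SECRET my_s3_secret;
```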
##### Syntax for `CREATE SECRET` {#docs:stable:sql:statements:create_secret::syntax-for-create-secret}
##### Syntax for `DROP SECRET` {#docs:stable:sql:statements:create_secret::syntax-for-drop-secret}
### CREATE SEQUENCE Statement {#docs:stable:sql:statements:create_sequence}
The `CREATE SEQUENCE` statement creates a new sequence number generator.
##### Examples {#docs:stable:sql:statements:create_sequence::examples}
Generate an ascending sequence starting from 1:
```sql
CREATE SEQUENCE serial;
```
Generate sequence from a given start number:
```sql
CREATE SEQUENCE serial START 101;
```
Generate odd numbers using `INCREMENT BY`:
```sql
CREATE SEQUENCE serial START WITH 1 INCREMENT BY 2;
```
Generate a descending sequence starting from 99:
```sql
CREATE SEQUENCE serial START WITH 99 INCREMENT BY -1 MAXVALUE 99;
```
By default, cycles are not allowed. For example, for the following sequence, calling `nextval` after the maximum value has been reached results in an error:
```sql
CREATE SEQUENCE serial START WITH 1 MAXVALUE 10;
```
```console
Sequence Error:
nextval: reached maximum value of sequence "serial" (10)
```
`CYCLE` allows cycling through the same sequence repeatedly:
```sql
CREATE SEQUENCE serial START WITH 1 MAXVALUE 10 CYCLE;
```
##### Creating and Dropping Sequences {#docs:stable:sql:statements:create_sequence::creating-and-dropping-sequences}
Sequences can be created and dropped similarly to other catalog items.
Overwrite an existing sequence:
```sql
CREATE OR REPLACE SEQUENCE serial;
```
Only create sequence if no such sequence exists yet:
```sql
CREATE SEQUENCE IF NOT EXISTS serial;
```
Remove sequence:
```sql
DROP SEQUENCE serial;
```
Remove sequence if exists:
```sql
DROP SEQUENCE IF EXISTS serial;
```
##### Using Sequences for Primary Keys {#docs:stable:sql:statements:create_sequence::using-sequences-for-primary-keys}
Sequences can be used as `DEFAULT` values in [`CREATE TABLE` statements](#docs:stable:sql:statements:create_table).
The example below uses a sequence to create an integer [primary key](#docs:stable:sql:constraints::primary-key-and-unique-constraint):
```sql
CREATE SEQUENCE id_sequence START 1;
CREATE TABLE tbl (id INTEGER PRIMARY KEY DEFAULT nextval('id_sequence'), s VARCHAR);
INSERT INTO tbl (s) VALUES ('hello'), ('world');
SELECT * FROM tbl;
```
The script results in the following table:
| id | s |
|---:|-------|
| 1 | hello |
| 2 | world |
Sequences can also be added using the [`ALTER TABLE` statement](#docs:stable:sql:statements:alter_table). The following example adds an `id` column and fills it with values generated by the sequence:
```sql
CREATE TABLE tbl (s VARCHAR);
INSERT INTO tbl VALUES ('hello'), ('world');
CREATE SEQUENCE id_sequence START 1;
ALTER TABLE tbl ADD COLUMN id INTEGER DEFAULT nextval('id_sequence');
SELECT * FROM tbl;
```
This script results in the same table as the previous example.
##### Selecting the Next Value {#docs:stable:sql:statements:create_sequence::selecting-the-next-value}
To select the next number from a sequence, use `nextval`:
```sql
CREATE SEQUENCE serial START 1;
SELECT nextval('serial') AS nextval;
```
| nextval |
|--------:|
| 1 |
Using this sequence in an `INSERT` command:
```sql
INSERT INTO distributors VALUES (nextval('serial'), 'nothing');
```
##### Selecting the Current Value {#docs:stable:sql:statements:create_sequence::selecting-the-current-value}
You may also view the current number from the sequence. Note that the `nextval` function must have already been called before calling `currval`, otherwise a Serialization Error (`sequence is not yet defined in this session`) will be thrown.
```sql
CREATE SEQUENCE serial START 1;
SELECT nextval('serial') AS nextval;
SELECT currval('serial') AS currval;
```
| currval |
|--------:|
| 1 |
##### Syntax {#docs:stable:sql:statements:create_sequence::syntax}
`CREATE SEQUENCE` creates a new sequence number generator.
If a schema name is given then the sequence is created in the specified schema. Otherwise it is created in the current schema. Temporary sequences exist in a special schema, so a schema name may not be given when creating a temporary sequence. The sequence name must be distinct from the name of any other sequence in the same schema.
After a sequence is created, you use the function `nextval` to operate on the sequence.
#### Parameters {#docs:stable:sql:statements:create_sequence::parameters}
| Name | Description |
|:--|:-----|
| `CYCLE` or `NO CYCLE` | The `CYCLE` option allows the sequence to wrap around when the `maxvalue` or `minvalue` has been reached by an ascending or descending sequence respectively. If the limit is reached, the next number generated will be the `minvalue` or `maxvalue`, respectively. If `NO CYCLE` is specified, any calls to `nextval` after the sequence has reached its maximum value will return an error. If neither `CYCLE` nor `NO CYCLE` is specified, `NO CYCLE` is the default. |
| `increment` | The optional clause `INCREMENT BY increment` specifies which value is added to the current sequence value to create a new value. A positive value will make an ascending sequence, a negative one a descending sequence. The default value is 1. |
| `maxvalue` | The optional clause `MAXVALUE maxvalue` determines the maximum value for the sequence. If this clause is not supplied or `NO MAXVALUE` is specified, then default values will be used. The defaults are 2^63 - 1 and -1 for ascending and descending sequences, respectively. |
| `minvalue` | The optional clause `MINVALUE minvalue` determines the minimum value a sequence can generate. If this clause is not supplied or `NO MINVALUE` is specified, then defaults will be used. The defaults are 1 and -(2^63 - 1) for ascending and descending sequences, respectively. |
| `name` | The name (optionally schema-qualified) of the sequence to be created. |
| `start` | The optional clause `START WITH start` allows the sequence to begin anywhere. The default starting value is `minvalue` for ascending sequences and `maxvalue` for descending ones. |
| `TEMPORARY` or `TEMP` | If specified, the sequence object is created only for this session, and is automatically dropped on session exit. Existing permanent sequences with the same name are not visible (in this session) while the temporary sequence exists, unless they are referenced with schema-qualified names. |
> Sequences are based on `BIGINT` arithmetic, so the range cannot exceed the range of an eight-byte integer (-9223372036854775808 to 9223372036854775807).
#### Limitations {#docs:stable:sql:statements:create_sequence::limitations}
Due to limitations in DuckDB's dependency manager, `DROP SEQUENCE` will fail in some corner cases.
For example, deleting a column that uses a sequence should allow the sequence to be dropped but this currently returns an error:
```sql
CREATE SEQUENCE id_sequence START 1;
CREATE TABLE tbl (id INTEGER DEFAULT nextval('id_sequence'), s VARCHAR);
ALTER TABLE tbl DROP COLUMN id;
DROP SEQUENCE id_sequence;
```
```console
Dependency Error:
Cannot drop entry "id_sequence" because there are entries that depend on it.
table "tbl" depends on index "id_sequence".
Use DROP...CASCADE to drop all dependents.
```
This can be worked around by using the `CASCADE` modifier.
The following command drops the sequence:
```sql
DROP SEQUENCE id_sequence CASCADE;
```
### CREATE TABLE Statement {#docs:stable:sql:statements:create_table}
The `CREATE TABLE` statement creates a table in the catalog.
#### Examples {#docs:stable:sql:statements:create_table::examples}
Create a table with two integer columns (`i` and `j`):
```sql
CREATE TABLE t1 (i INTEGER, j INTEGER);
```
Create a table with a primary key:
```sql
CREATE TABLE t1 (id INTEGER PRIMARY KEY, j VARCHAR);
```
Create a table with a composite primary key:
```sql
CREATE TABLE t1 (id INTEGER, j VARCHAR, PRIMARY KEY (id, j));
```
Create a table with various different types, constraints, and default values:
```sql
CREATE TABLE t1 (
i INTEGER NOT NULL DEFAULT 0,
decimalnr DOUBLE CHECK (decimalnr < 10),
date DATE UNIQUE,
time TIMESTAMP
);
```
Create table with `CREATE TABLE ... AS SELECT` (CTAS):
```sql
CREATE TABLE t1 AS
SELECT 42 AS i, 84 AS j;
```
Create a table from a CSV file (automatically detecting column names and types):
```sql
CREATE TABLE t1 AS
SELECT *
FROM read_csv('path/file.csv');
```
We can use the `FROM`-first syntax to omit `SELECT *`:
```sql
CREATE TABLE t1 AS
FROM read_csv('path/file.csv');
```
Copy the schema of `t2` to `t1`:
```sql
CREATE TABLE t1 AS
FROM t2
LIMIT 0;
```
Note that only the column names and types are copied from `t2`; other pieces of information (indexes, constraints, default values, etc.) are not copied.
#### Temporary Tables {#docs:stable:sql:statements:create_table::temporary-tables}
Temporary tables are session scoped, meaning that only the specific connection that created them can access them and once the connection to DuckDB is closed they will be automatically dropped (similar to PostgreSQL, for example).
They can be created using the `CREATE TEMP TABLE` or the `CREATE TEMPORARY TABLE` statement (see diagram below) and are part of the `temp.main` schema. While discouraged, their names can overlap with the names of the regular database tables. In these cases, temporary tables take priority in name resolution and full qualification is required to refer to a regular table, e.g., `memory.main.t1`.
Temporary tables reside in memory rather than on disk even when connecting to a persistent DuckDB, but if the `temp_directory` [configuration](#docs:stable:configuration:overview) is set, data will be spilled to disk if memory becomes constrained.
Create a temporary table from a CSV file (automatically detecting column names and types):
```sql
CREATE TEMP TABLE t1 AS
SELECT *
FROM read_csv('path/file.csv');
```
Allow temporary tables to off-load excess memory to disk:
```sql
SET temp_directory = '/path/to/directory/';
```
#### `CREATE OR REPLACE` {#docs:stable:sql:statements:create_table::create-or-replace}
The `CREATE OR REPLACE` syntax allows a new table to be created or for an existing table to be overwritten by the new table. This is shorthand for dropping the existing table and then creating the new one.
Create a table with two integer columns (`i` and `j`) even if `t1` already exists:
```sql
CREATE OR REPLACE TABLE t1 (i INTEGER, j INTEGER);
```
#### `IF NOT EXISTS` {#docs:stable:sql:statements:create_table::if-not-exists}
The `IF NOT EXISTS` syntax will only proceed with the creation of the table if it does not already exist. If the table already exists, no action will be taken and the existing table will remain in the database.
Create a table with two integer columns (`i` and `j`) only if `t1` does not exist yet:
```sql
CREATE TABLE IF NOT EXISTS t1 (i INTEGER, j INTEGER);
```
#### `CREATE TABLE ... AS SELECT` (CTAS) {#docs:stable:sql:statements:create_table::create-table--as-select-ctas}
DuckDB supports the `CREATE TABLE ... AS SELECT` syntax, also known as "CTAS":
```sql
CREATE TABLE nums AS
SELECT i
FROM range(0, 3) t(i);
```
This syntax can be used in combination with the [CSV reader](#docs:stable:data:csv:overview), the shorthand to read directly from CSV files without specifying a function, the [`FROM`-first syntax](#docs:stable:sql:query_syntax:from), and the [HTTP(S) support](#docs:stable:core_extensions:httpfs:https), yielding concise SQL commands such as the following:
```sql
CREATE TABLE flights AS
FROM 'https://duckdb.org/data/flights.csv';
```
The CTAS construct also works with the `OR REPLACE` modifier, yielding `CREATE OR REPLACE TABLE ... AS` statements:
```sql
CREATE OR REPLACE TABLE flights AS
FROM 'https://duckdb.org/data/flights.csv';
```
##### Copying the Schema {#docs:stable:sql:statements:create_table::copying-the-schema}
You can create a copy of the table's schema (column names and types only) as follows:
```sql
CREATE TABLE t1 AS
FROM t2
WITH NO DATA;
```
Or:
```sql
CREATE TABLE t1 AS
FROM t2
LIMIT 0;
```
It is not possible to create tables using CTAS statements with constraints (primary keys, check constraints, etc.).
#### Check Constraints {#docs:stable:sql:statements:create_table::check-constraints}
A `CHECK` constraint is an expression that must be satisfied by the values of every row in the table.
```sql
CREATE TABLE t1 (
id INTEGER PRIMARY KEY,
percentage INTEGER CHECK (0 <= percentage AND percentage <= 100)
);
INSERT INTO t1 VALUES (1, 5);
INSERT INTO t1 VALUES (2, -1);
```
```console
Constraint Error:
CHECK constraint failed: t1
```
```sql
INSERT INTO t1 VALUES (3, 101);
```
```console
Constraint Error:
CHECK constraint failed: t1
```
```sql
CREATE TABLE t2 (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER CHECK (x < y));
INSERT INTO t2 VALUES (1, 5, 10);
INSERT INTO t2 VALUES (2, 5, 3);
```
```console
Constraint Error:
CHECK constraint failed: t2
```
`CHECK` constraints can also be added as part of the `CONSTRAINTS` clause:
```sql
CREATE TABLE t3 (
id INTEGER PRIMARY KEY,
x INTEGER,
y INTEGER,
CONSTRAINT x_smaller_than_y CHECK (x < y)
);
INSERT INTO t3 VALUES (1, 5, 10);
INSERT INTO t3 VALUES (2, 5, 3);
```
```console
Constraint Error:
CHECK constraint failed: t3
```
#### Foreign Key Constraints {#docs:stable:sql:statements:create_table::foreign-key-constraints}
A `FOREIGN KEY` is a column (or set of columns) that references another table's primary key. Foreign keys check referential integrity, i.e., the referred primary key must exist in the other table upon insertion.
```sql
CREATE TABLE t1 (id INTEGER PRIMARY KEY, j VARCHAR);
CREATE TABLE t2 (
id INTEGER PRIMARY KEY,
t1_id INTEGER,
FOREIGN KEY (t1_id) REFERENCES t1 (id)
);
```
Example:
```sql
INSERT INTO t1 VALUES (1, 'a');
INSERT INTO t2 VALUES (1, 1);
INSERT INTO t2 VALUES (2, 2);
```
```console
Constraint Error:
Violates foreign key constraint because key "id: 2" does not exist in the referenced table
```
Foreign keys can be defined on composite primary keys:
```sql
CREATE TABLE t3 (id INTEGER, j VARCHAR, PRIMARY KEY (id, j));
CREATE TABLE t4 (
id INTEGER PRIMARY KEY, t3_id INTEGER, t3_j VARCHAR,
FOREIGN KEY (t3_id, t3_j) REFERENCES t3(id, j)
);
```
Example:
```sql
INSERT INTO t3 VALUES (1, 'a');
INSERT INTO t4 VALUES (1, 1, 'a');
INSERT INTO t4 VALUES (2, 1, 'b');
```
```console
Constraint Error:
Violates foreign key constraint because key "id: 1, j: b" does not exist in the referenced table
```
Foreign keys can also be defined on unique columns:
```sql
CREATE TABLE t5 (id INTEGER UNIQUE, j VARCHAR);
CREATE TABLE t6 (
id INTEGER PRIMARY KEY,
t5_id INTEGER,
FOREIGN KEY (t5_id) REFERENCES t5(id)
);
```
##### Limitations {#docs:stable:sql:statements:create_table::limitations}
Foreign keys have the following limitations.
Foreign keys with cascading deletes (`FOREIGN KEY ... REFERENCES ... ON DELETE CASCADE`) are not supported.
Inserting into tables with self-referencing foreign keys is currently not supported and will result in the following error:
```console
Constraint Error:
Violates foreign key constraint because key "..." does not exist in the referenced table.
```
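A sketch of the unsupported pattern (table and column names are illustrative):
```sql
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    manager_id INTEGER REFERENCES employees (id)
);
-- raises the constraint error above
INSERT INTO employees VALUES (1, 1);
```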
#### Generated Columns {#docs:stable:sql:statements:create_table::generated-columns}
The `[type] [GENERATED ALWAYS] AS (expr) [VIRTUAL|STORED]` syntax will create a generated column. The data in this kind of column is generated from its expression, which can reference other (regular or generated) columns of the table. Since they are produced by calculations, these columns cannot be inserted into directly.
DuckDB can infer the type of the generated column based on the expression's return type. This allows you to leave out the type when declaring a generated column. It is possible to explicitly set a type, but insertions into the referenced columns might fail if the type can not be cast to the type of the generated column.
Generated columns come in two varieties: `VIRTUAL` and `STORED`.
The data of virtual generated columns is not stored on disk, instead it is computed from the expression every time the column is referenced (through a select statement).
The data of stored generated columns is stored on disk and is computed every time the data of their dependencies change (through an `INSERT` / `UPDATE` / `DROP` statement).
Currently, only the `VIRTUAL` kind is supported, and it is also the default option if the last field is left blank.
The simplest syntax for a generated column:
The type is derived from the expression, and the variant defaults to `VIRTUAL`:
```sql
CREATE TABLE t1 (x FLOAT, two_x AS (2 * x));
```
Fully specifying the same generated column for completeness:
```sql
CREATE TABLE t1 (x FLOAT, two_x FLOAT GENERATED ALWAYS AS (2 * x) VIRTUAL);
```
#### Syntax {#docs:stable:sql:statements:create_table::syntax}
### CREATE VIEW Statement {#docs:stable:sql:statements:create_view}
The `CREATE VIEW` statement defines a new view in the catalog.
#### Examples {#docs:stable:sql:statements:create_view::examples}
Create a simple view:
```sql
CREATE VIEW view1 AS SELECT * FROM tbl;
```
Create a view or replace it if a view with that name already exists:
```sql
CREATE OR REPLACE VIEW view1 AS SELECT 42;
```
Create a view and replace the column names:
```sql
CREATE VIEW view1(a) AS SELECT 42;
```
The SQL query behind an existing view can be read using the [`duckdb_views()` function](#docs:stable:sql:meta:duckdb_table_functions::duckdb_views) like this:
```sql
SELECT sql FROM duckdb_views() WHERE view_name = 'view1';
```
#### Syntax {#docs:stable:sql:statements:create_view::syntax}
`CREATE VIEW` defines a view of a query. The view is not physically materialized. Instead, the query is run every time the view is referenced in a query.
`CREATE OR REPLACE VIEW` is similar, but if a view of the same name already exists, it is replaced.
If a schema name is given then the view is created in the specified schema. Otherwise, it is created in the current schema. Temporary views exist in a special schema, so a schema name cannot be given when creating a temporary view. The name of the view must be distinct from the name of any other view or table in the same schema.
### CREATE TYPE Statement {#docs:stable:sql:statements:create_type}
The `CREATE TYPE` statement defines a new type in the catalog.
#### Examples {#docs:stable:sql:statements:create_type::examples}
Create a simple `ENUM` type:
```sql
CREATE TYPE mood AS ENUM ('happy', 'sad', 'curious');
```
Create a simple `STRUCT` type:
```sql
CREATE TYPE many_things AS STRUCT(k INTEGER, l VARCHAR);
```
Create a simple `UNION` type:
```sql
CREATE TYPE one_thing AS UNION(number INTEGER, string VARCHAR);
```
Create a type alias:
```sql
CREATE TYPE x_index AS INTEGER;
```
#### Syntax {#docs:stable:sql:statements:create_type::syntax}
The `CREATE TYPE` clause defines a new data type available to this DuckDB instance.
These new types can then be inspected in the [`duckdb_types` table](#docs:stable:sql:meta:duckdb_table_functions::duckdb_types).
#### Limitations {#docs:stable:sql:statements:create_type::limitations}
* Extending types to support custom operators (such as the PostgreSQL `&&` operator) is not possible via plain SQL.
Instead, it requires adding additional C++ code. To do this, create an [extension](#docs:stable:extensions:overview).
* The `CREATE TYPE` clause does not support the `OR REPLACE` modifier.
### DELETE Statement {#docs:stable:sql:statements:delete}
The `DELETE` statement removes rows from the table identified by the table-name.
If the `WHERE` clause is not present, all records in the table are deleted.
If a `WHERE` clause is supplied, then only those rows for which the `WHERE` clause results in true are deleted. Rows for which the expression is false or `NULL` are retained.
#### Examples {#docs:stable:sql:statements:delete::examples}
Remove the rows matching the condition `i = 2` from the database:
```sql
DELETE FROM tbl WHERE i = 2;
```
Delete all rows in the table `tbl`:
```sql
DELETE FROM tbl;
```
##### `USING` Clause {#docs:stable:sql:statements:delete::using-clause}
The `USING` clause allows deleting based on the content of other tables or subqueries.
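For example, a sketch with illustrative tables `users` and `banned_users`:
```sql
DELETE FROM users
USING banned_users
WHERE users.id = banned_users.user_id;
```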
##### `RETURNING` Clause {#docs:stable:sql:statements:delete::returning-clause}
The `RETURNING` clause allows returning the deleted values. It uses the same syntax as the `SELECT` clause except the `DISTINCT` modifier is not supported.
```sql
CREATE TABLE employees (name VARCHAR, age INTEGER);
INSERT INTO employees VALUES ('Kat', 32);
DELETE FROM employees RETURNING name, 2025 - age AS approx_birthyear;
```
| name | approx_birthyear |
|------|-----------------:|
| Kat | 1993 |
#### Syntax {#docs:stable:sql:statements:delete::syntax}
#### The `TRUNCATE` Statement {#docs:stable:sql:statements:delete::thetruncate-statement}
The `TRUNCATE` statement removes all rows from a table, acting as an alias for `DELETE FROM` without a `WHERE` clause:
```sql
TRUNCATE tbl;
```
#### Limitations on Reclaiming Memory and Disk Space {#docs:stable:sql:statements:delete::limitations-on-reclaiming-memory-and-disk-space}
Running `DELETE` does not mean space is reclaimed. In general, rows are only marked as deleted. DuckDB reclaims space upon [performing a `CHECKPOINT`](#docs:stable:sql:statements:checkpoint). [`VACUUM`](#docs:stable:sql:statements:vacuum) currently does not reclaim space.
### DESCRIBE Statement {#docs:stable:sql:statements:describe}
The `DESCRIBE` statement shows the schema of a table, view or query.
#### Usage {#docs:stable:sql:statements:describe::usage}
```sql
DESCRIBE tbl;
```
In order to summarize a query, prepend `DESCRIBE` to a query.
```sql
DESCRIBE SELECT * FROM tbl;
```
#### Alias {#docs:stable:sql:statements:describe::alias}
The `SHOW` statement is an alias for `DESCRIBE`.
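For example, the following is equivalent to `DESCRIBE tbl;`:
```sql
SHOW tbl;
```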
#### See Also {#docs:stable:sql:statements:describe::see-also}
For more examples, see the [guide on `DESCRIBE`](#docs:stable:guides:meta:describe).
### DROP Statement {#docs:stable:sql:statements:drop}
The `DROP` statement removes a catalog entry added previously with the `CREATE` command.
#### Examples {#docs:stable:sql:statements:drop::examples}
Delete the table with the name `tbl`:
```sql
DROP TABLE tbl;
```
Drop the view with the name `view1`; do not throw an error if the view does not exist:
```sql
DROP VIEW IF EXISTS view1;
```
Drop function `fn`:
```sql
DROP FUNCTION fn;
```
Drop index `idx`:
```sql
DROP INDEX idx;
```
Drop schema `sch`:
```sql
DROP SCHEMA sch;
```
Drop sequence `seq`:
```sql
DROP SEQUENCE seq;
```
Drop macro `mcr`:
```sql
DROP MACRO mcr;
```
Drop macro table `mt`:
```sql
DROP MACRO TABLE mt; -- the `TABLE` is optional since v1.4.0
```
Drop type `typ`:
```sql
DROP TYPE typ;
```
#### Syntax {#docs:stable:sql:statements:drop::syntax}
#### Dependencies of Dropped Objects {#docs:stable:sql:statements:drop::dependencies-of-dropped-objects}
DuckDB performs limited dependency tracking for some object types.
By default or if the `RESTRICT` clause is provided, the entry will not be dropped if there are any other objects that depend on it.
If the `CASCADE` clause is provided then all the objects that are dependent on the object will be dropped as well.
```sql
CREATE SCHEMA myschema;
CREATE TABLE myschema.t1 (i INTEGER);
DROP SCHEMA myschema;
```
```console
Dependency Error:
Cannot drop entry "myschema" because there are entries that depend on it.
table "t1" depends on schema "myschema".
Use DROP...CASCADE to drop all dependents.
```
The `CASCADE` modifier drops both `myschema` and `myschema.t1`:
```sql
CREATE SCHEMA myschema;
CREATE TABLE myschema.t1 (i INTEGER);
DROP SCHEMA myschema CASCADE;
```
The following dependencies are tracked and thus will raise an error if the user tries to drop the depending object without the `CASCADE` modifier.
| Depending object type | Dependent object type |
|--|--|
| `SCHEMA` | `FUNCTION` |
| `SCHEMA` | `INDEX` |
| `SCHEMA` | `MACRO TABLE` |
| `SCHEMA` | `MACRO` |
| `SCHEMA` | `SCHEMA` |
| `SCHEMA` | `SEQUENCE` |
| `SCHEMA` | `TABLE` |
| `SCHEMA` | `TYPE` |
| `SCHEMA` | `VIEW` |
| `TABLE` | `INDEX` |
#### Limitations {#docs:stable:sql:statements:drop::limitations}
##### Dependencies on Views {#docs:stable:sql:statements:drop::dependencies-on-views}
Currently, dependencies are not tracked for views. For example, if a view is created that references a table and the table is dropped, then the view will be in an invalid state:
```sql
CREATE TABLE tbl (i INTEGER);
CREATE VIEW view1 AS
SELECT i FROM tbl;
DROP TABLE tbl RESTRICT;
SELECT * FROM view1;
```
This returns the following error message:
```console
Catalog Error:
Table with name tbl does not exist!
```
#### Limitations on Reclaiming Disk Space {#docs:stable:sql:statements:drop::limitations-on-reclaiming-disk-space}
Running `DROP TABLE` frees the memory used by the table, but it does not always reduce the size of the database file.
Even if the file size does not decrease, the blocks previously used by the table are marked as free.
For example, if we have a 2 GB file and we drop a 1 GB table, the file might still be 2 GB, but it should have 1 GB of free blocks in it.
To check this, use the following `PRAGMA` and check the number of `free_blocks` in the output:
```sql
PRAGMA database_size;
```
For instructions on reclaiming space after dropping a table, refer to the [“Reclaiming space” page](#docs:stable:operations_manual:footprint_of_duckdb:reclaiming_space).
### EXPORT and IMPORT DATABASE Statements {#docs:stable:sql:statements:export}
The `EXPORT DATABASE` command allows you to export the contents of the database to a specific directory. The `IMPORT DATABASE` command allows you to then read the contents again.
#### Examples {#docs:stable:sql:statements:export::examples}
Export the database to the target directory 'target_directory' as CSV files:
```sql
EXPORT DATABASE 'target_directory';
```
Export to directory 'target_directory', using the given options for the CSV serialization:
```sql
EXPORT DATABASE 'target_directory' (FORMAT csv, DELIMITER '|');
```
Export to directory 'target_directory', tables serialized as Parquet:
```sql
EXPORT DATABASE 'target_directory' (FORMAT parquet);
```
Export to directory 'target_directory', tables serialized as Parquet, compressed with ZSTD, with a row_group_size of 100,000:
```sql
EXPORT DATABASE 'target_directory' (
FORMAT parquet,
COMPRESSION zstd,
ROW_GROUP_SIZE 100_000
);
```
Reload the database again:
```sql
IMPORT DATABASE 'source_directory';
```
Alternatively, use a `PRAGMA`:
```sql
PRAGMA import_database('source_directory');
```
For details regarding the writing of Parquet files, see the [Parquet Files page in the Data Import section](#docs:stable:data:parquet:overview::writing-to-parquet-files) and the [`COPY` Statement page](#docs:stable:sql:statements:copy).
#### `EXPORT DATABASE` {#docs:stable:sql:statements:export::export-database}
The `EXPORT DATABASE` command exports the full contents of the database, including schema information, tables, views and sequences, to a specific directory that can then be loaded again. The created directory will be structured as follows:
```text
target_directory/schema.sql
target_directory/load.sql
target_directory/t_1.csv
...
target_directory/t_n.csv
```
The `schema.sql` file contains the schema statements that are found in the database. It contains any `CREATE SCHEMA`, `CREATE TABLE`, `CREATE VIEW` and `CREATE SEQUENCE` commands that are necessary to re-construct the database.
The `load.sql` file contains a set of `COPY` statements that can be used to read the data from the CSV files again. The file contains a single `COPY` statement for every table found in the schema.
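As an illustration, for a table `t_1` exported as CSV, `load.sql` might contain a statement along these lines (the exact table names and options depend on your schema and the chosen export settings):
```sql
-- hypothetical excerpt of a generated load.sql
COPY t_1 FROM 'target_directory/t_1.csv' (FORMAT csv, DELIMITER ',', HEADER true);
```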
##### Syntax {#docs:stable:sql:statements:export::syntax}
#### `IMPORT DATABASE` {#docs:stable:sql:statements:export::import-database}
The database can be reloaded by using the `IMPORT DATABASE` command again, or manually by running `schema.sql` followed by `load.sql` to re-load the data.
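For example, in the DuckDB CLI the manual route might look like the following, assuming the export lives in `source_directory`:
```text
.read source_directory/schema.sql
.read source_directory/load.sql
```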
##### Syntax {#docs:stable:sql:statements:export::syntax}
### INSERT Statement {#docs:stable:sql:statements:insert}
The `INSERT` statement inserts new data into a table.
##### Examples {#docs:stable:sql:statements:insert::examples}
Insert the values 1, 2, 3 into `tbl`:
```sql
INSERT INTO tbl
VALUES (1), (2), (3);
```
Insert the result of a query into a table:
```sql
INSERT INTO tbl
SELECT * FROM other_tbl;
```
Insert values into the `i` column, inserting the default value into other columns:
```sql
INSERT INTO tbl (i)
VALUES (1), (2), (3);
```
Explicitly insert the default value into a column:
```sql
INSERT INTO tbl (i)
VALUES (1), (DEFAULT), (3);
```
Assuming `tbl` has a primary key/unique constraint, do nothing on conflict:
```sql
INSERT OR IGNORE INTO tbl (i)
VALUES (1);
```
Or update the table with the new values instead:
```sql
INSERT OR REPLACE INTO tbl (i)
VALUES (1);
```
##### Syntax {#docs:stable:sql:statements:insert::syntax}
`INSERT INTO` inserts new rows into a table. One can insert one or more rows specified by value expressions, or zero or more rows resulting from a query.
#### Insert Column Order {#docs:stable:sql:statements:insert::insert-column-order}
It's possible to provide an optional insert column order, which can either be `BY POSITION` (the default) or `BY NAME`.
Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or `NULL` if there is none.
If the expression for any column is not of the correct data type, automatic type conversion will be attempted.
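For example, a value of a different type is implicitly cast to the column's type where possible. A minimal sketch (the table name `tbl2` is used here purely for illustration):
```sql
CREATE TABLE tbl2 (a INTEGER, b INTEGER);
-- the string '5' is automatically cast to INTEGER before insertion
INSERT INTO tbl2 VALUES ('5', 42);
```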
##### `INSERT INTO ... [BY POSITION]` {#docs:stable:sql:statements:insert::insert-into--by-position}
The order that values are inserted into the columns of the table is determined by the order that the columns were declared in.
That is, the values supplied by the `VALUES` clause or query are associated with the column list left-to-right.
This is the default option, that can be explicitly specified using the `BY POSITION` option.
For example:
```sql
CREATE TABLE tbl (a INTEGER, b INTEGER);
INSERT INTO tbl
VALUES (5, 42);
```
Specifying `BY POSITION` is optional and is equivalent to the default behavior:
```sql
INSERT INTO tbl
BY POSITION
VALUES (5, 42);
```
To use a different order, column names can be provided as part of the target, for example:
```sql
CREATE TABLE tbl (a INTEGER, b INTEGER);
INSERT INTO tbl (b, a)
VALUES (5, 42);
```
Adding `BY POSITION` results in the same behavior:
```sql
INSERT INTO tbl
BY POSITION (b, a)
VALUES (5, 42);
```
This will insert `5` into `b` and `42` into `a`.
##### `INSERT INTO ... BY NAME` {#docs:stable:sql:statements:insert::insert-into--by-name}
Using the `BY NAME` modifier, the names of the column list of the `SELECT` statement are matched against the column names of the table to determine the order that values should be inserted into the table. This allows inserting even in cases when the order of the columns in the table differs from the order of the values in the `SELECT` statement or certain columns are missing.
For example:
```sql
CREATE TABLE tbl (a INTEGER, b INTEGER);
INSERT INTO tbl BY NAME (SELECT 42 AS b, 32 AS a);
INSERT INTO tbl BY NAME (SELECT 22 AS b);
SELECT * FROM tbl;
```
| a | b |
|-----:|---:|
| 32 | 42 |
| NULL | 22 |
It's important to note that when using `INSERT INTO ... BY NAME`, the column names specified in the `SELECT` statement must match the column names in the table. If a column name is misspelled or does not exist in the table, an error will occur. Columns that are missing from the `SELECT` statement will be filled with the default value.
#### `ON CONFLICT` Clause {#docs:stable:sql:statements:insert::on-conflict-clause}
An `ON CONFLICT` clause can be used to perform a certain action on conflicts that arise from `UNIQUE` or `PRIMARY KEY` constraints.
An example for such a conflict is shown in the following example:
```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl
VALUES (1, 42);
INSERT INTO tbl
VALUES (1, 84);
```
This raises an error:
```console
Constraint Error:
Duplicate key "i: 1" violates primary key constraint.
```
The table will contain the row that was first inserted:
```sql
SELECT * FROM tbl;
```
| i | j |
|--:|---:|
| 1 | 42 |
These error messages can be avoided by explicitly handling conflicts.
DuckDB supports two such clauses: [`ON CONFLICT DO NOTHING`](#::do-nothing-clause) and [`ON CONFLICT DO UPDATE SET ...`](#::do-update-clause-upsert).
##### `DO NOTHING` Clause {#docs:stable:sql:statements:insert::do-nothing-clause}
The `DO NOTHING` clause causes the error(s) to be ignored, and the values are not inserted or updated.
For example:
```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl
VALUES (1, 42);
INSERT INTO tbl
VALUES (1, 84)
ON CONFLICT DO NOTHING;
```
These statements finish successfully and leave the table with the single row `(1, 42)`.
###### `INSERT OR IGNORE INTO` {#docs:stable:sql:statements:insert::insert-or-ignore-into}
The `INSERT OR IGNORE INTO ...` statement is a shorter syntax alternative to `INSERT INTO ... ON CONFLICT DO NOTHING`.
For example, the following statements are equivalent:
```sql
INSERT OR IGNORE INTO tbl
VALUES (1, 84);
INSERT INTO tbl
VALUES (1, 84) ON CONFLICT DO NOTHING;
```
##### `DO UPDATE` Clause (Upsert) {#docs:stable:sql:statements:insert::do-update-clause-upsert}
The `DO UPDATE` clause causes the `INSERT` to turn into an `UPDATE` on the conflicting row(s) instead.
The `SET` expressions that follow determine how these rows are updated. The expressions can use the special virtual table `EXCLUDED`, which contains the conflicting values for the row.
Optionally you can provide an additional `WHERE` clause that can exclude certain rows from the update.
The conflicts that don't meet this condition are ignored instead.
Because we need a way to refer to both the **to-be-inserted** tuple and the **existing** tuple, we introduce the special `EXCLUDED` qualifier.
When the `EXCLUDED` qualifier is provided, the reference refers to the **to-be-inserted** tuple, otherwise, it refers to the **existing** tuple.
This special qualifier can be used within the `WHERE` clauses and `SET` expressions of the `ON CONFLICT` clause.
```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl VALUES (1, 42);
INSERT INTO tbl VALUES (1, 52), (1, 62) ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
```
###### Examples {#docs:stable:sql:statements:insert::examples}
An example using `DO UPDATE` is the following:
```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl
VALUES (1, 42);
INSERT INTO tbl
VALUES (1, 84)
ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
SELECT * FROM tbl;
```
| i | j |
|--:|---:|
| 1 | 84 |
Rearranging columns and using `BY NAME` is also possible:
```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl
VALUES (1, 42);
INSERT INTO tbl (j, i)
VALUES (168, 1)
ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
INSERT INTO tbl
BY NAME (SELECT 1 AS i, 336 AS j)
ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
SELECT * FROM tbl;
```
| i | j |
|--:|----:|
| 1 | 336 |
###### `INSERT OR REPLACE INTO` {#docs:stable:sql:statements:insert::insert-or-replace-into}
The `INSERT OR REPLACE INTO ...` statement is a shorter syntax alternative to `INSERT INTO ... DO UPDATE SET c1 = EXCLUDED.c1, c2 = EXCLUDED.c2, ...`.
That is, it updates every column of the **existing** row to the new values of the **to-be-inserted** row.
For example, given the following input table:
```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER);
INSERT INTO tbl
VALUES (1, 42);
```
These statements are equivalent:
```sql
INSERT OR REPLACE INTO tbl
VALUES (1, 84);
INSERT INTO tbl
VALUES (1, 84)
ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
INSERT INTO tbl (j, i)
VALUES (84, 1)
ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
INSERT INTO tbl BY NAME
(SELECT 84 AS j, 1 AS i)
ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
```
###### Limitations {#docs:stable:sql:statements:insert::limitations}
When the `ON CONFLICT ... DO UPDATE` clause is used and a conflict occurs, DuckDB internally assigns `NULL` values to the row's columns that are unaffected by the conflict, then re-assigns their values. If the affected columns use a `NOT NULL` constraint, this will trigger a `NOT NULL constraint failed` error. For example:
```sql
CREATE TABLE t1 (id INTEGER PRIMARY KEY, val1 DOUBLE, val2 DOUBLE NOT NULL);
CREATE TABLE t2 (id INTEGER PRIMARY KEY, val1 DOUBLE);
INSERT INTO t1
VALUES (1, 2, 3);
INSERT INTO t2
VALUES (1, 5);
INSERT INTO t1 BY NAME (SELECT id, val1 FROM t2)
ON CONFLICT DO UPDATE
SET val1 = EXCLUDED.val1;
```
This fails with the following error:
```console
Constraint Error:
NOT NULL constraint failed: t1.val2
```
###### Composite Primary Key {#docs:stable:sql:statements:insert::composite-primary-key}
When multiple columns need to be part of the uniqueness constraint, use a single `PRIMARY KEY` clause including all relevant columns:
```sql
CREATE TABLE t1 (id1 INTEGER, id2 INTEGER, val1 DOUBLE, PRIMARY KEY(id1, id2));
INSERT OR REPLACE INTO t1
VALUES (1, 2, 3);
INSERT OR REPLACE INTO t1
VALUES (1, 2, 4);
```
##### Defining a Conflict Target {#docs:stable:sql:statements:insert::defining-a-conflict-target}
A conflict target may be provided as `ON CONFLICT (conflict_target)`. This is a group of columns that an index or uniqueness/key constraint is defined on. If the conflict target is omitted, the `PRIMARY KEY` constraint(s) on the table are targeted.
Specifying a conflict target is optional unless using a [`DO UPDATE`](#::do-update-clause-upsert) and there are multiple unique/primary key constraints on the table.
```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j INTEGER UNIQUE, k INTEGER);
INSERT INTO tbl
VALUES (1, 20, 300);
SELECT * FROM tbl;
```
| i | j | k |
|--:|---:|----:|
| 1 | 20 | 300 |
```sql
INSERT INTO tbl
VALUES (1, 40, 700)
ON CONFLICT (i) DO UPDATE SET k = 2 * EXCLUDED.k;
```
| i | j | k |
|--:|---:|-----:|
| 1 | 20 | 1400 |
```sql
INSERT INTO tbl
VALUES (1, 20, 900)
ON CONFLICT (j) DO UPDATE SET k = 5 * EXCLUDED.k;
```
| i | j | k |
|--:|---:|-----:|
| 1 | 20 | 4500 |
When a conflict target is provided, you can further filter the affected rows with a `WHERE` clause that must be met by all conflicts.
```sql
INSERT INTO tbl
VALUES (1, 40, 700)
ON CONFLICT (i) DO UPDATE SET k = 2 * EXCLUDED.k WHERE k < 100;
```
#### `RETURNING` Clause {#docs:stable:sql:statements:insert::returning-clause}
The `RETURNING` clause may be used to return the contents of the rows that were inserted. This can be useful if some columns are calculated upon insert. For example, if the table contains an automatically incrementing primary key, then the `RETURNING` clause will include the automatically created primary key. This is also useful in the case of generated columns.
Some or all columns can be explicitly chosen to be returned and they may optionally be renamed using aliases. Arbitrary non-aggregating expressions may also be returned instead of simply returning a column. All columns can be returned using the `*` expression, and columns or expressions can be returned in addition to all columns returned by the `*`.
For example:
```sql
CREATE TABLE t1 (i INTEGER);
INSERT INTO t1
SELECT 42
RETURNING *;
```
| i |
|---:|
| 42 |
A more complex example that includes an expression in the `RETURNING` clause:
```sql
CREATE TABLE t2 (i INTEGER, j INTEGER);
INSERT INTO t2
SELECT 2 AS i, 3 AS j
RETURNING *, i * j AS i_times_j;
```
| i | j | i_times_j |
|--:|--:|----------:|
| 2 | 3 | 6 |
The next example shows a situation where the `RETURNING` clause is more helpful. First, a table is created with a primary key column. Then a sequence is created to allow for that primary key to be incremented as new rows are inserted. When we insert into the table, we do not already know the values generated by the sequence, so it is valuable to return them. For additional information, see the [`CREATE SEQUENCE` page](#docs:stable:sql:statements:create_sequence).
```sql
CREATE TABLE t3 (i INTEGER PRIMARY KEY, j INTEGER);
CREATE SEQUENCE 't3_key';
INSERT INTO t3
SELECT nextval('t3_key') AS i, 42 AS j
UNION ALL
SELECT nextval('t3_key') AS i, 43 AS j
RETURNING *;
```
| i | j |
|--:|---:|
| 1 | 42 |
| 2 | 43 |
### LOAD / INSTALL Statements {#docs:stable:sql:statements:load_and_install}
#### `INSTALL` {#docs:stable:sql:statements:load_and_install::install}
The `INSTALL` statement downloads an extension so it can be loaded into a DuckDB session.
##### Examples {#docs:stable:sql:statements:load_and_install::examples}
Install the [`httpfs`](#docs:stable:core_extensions:httpfs:overview) extension:
```sql
INSTALL httpfs;
```
Install the [`h3` community extension](#community_extensions:extensions:h3):
```sql
INSTALL h3 FROM community;
```
##### Syntax {#docs:stable:sql:statements:load_and_install::syntax}
#### `LOAD` {#docs:stable:sql:statements:load_and_install::load}
The `LOAD` statement loads an installed DuckDB extension into the current session.
##### Examples {#docs:stable:sql:statements:load_and_install::examples}
Load the [`httpfs`](#docs:stable:core_extensions:httpfs:overview) extension:
```sql
LOAD httpfs;
```
Load the [`spatial`](#docs:stable:core_extensions:spatial:overview) extension:
```sql
LOAD spatial;
```
##### Syntax {#docs:stable:sql:statements:load_and_install::syntax}
### MERGE INTO Statement {#docs:stable:sql:statements:merge_into}
The `MERGE INTO` statement is an alternative to `INSERT INTO ... ON CONFLICT` that does not require a primary key, since it allows a custom match condition. This makes it a useful alternative for upsert use cases (`INSERT` + `UPDATE`) when the destination table does not have a primary key constraint.
#### Examples {#docs:stable:sql:statements:merge_into::examples}
First, let's create a simple table.
```sql
CREATE TABLE people (id INTEGER, name VARCHAR, salary FLOAT);
INSERT INTO people VALUES (1, 'John', 92_000.0), (2, 'Anna', 100_000.0);
```
The simplest upsert would be updating or inserting a whole row.
```sql
MERGE INTO people
USING (
SELECT
unnest([3, 1]) AS id,
unnest(['Sarah', 'John']) AS name,
unnest([95_000.0, 105_000.0]) AS salary
) AS upserts
ON (upserts.id = people.id)
WHEN MATCHED THEN UPDATE
WHEN NOT MATCHED THEN INSERT;
FROM people
ORDER BY id;
```
| id | name | salary |
|---:|-------|---------:|
| 1 | John | 105000.0 |
| 2 | Anna | 100000.0 |
| 3 | Sarah | 95000.0 |
In the previous example we are updating the whole row if `id` matches. However, it is also a common pattern to receive a _change set_ with some keys and the changed value. This is a good use for `SET`. If the match condition uses a column that has the same name in the source and destination, the keyword `USING` can be used in the match condition.
```sql
MERGE INTO people
USING (
SELECT
1 AS id,
98_000.0 AS salary
) AS salary_updates
USING (id)
WHEN MATCHED THEN UPDATE SET salary = salary_updates.salary;
FROM people
ORDER BY id;
```
| id | name | salary |
|---:|-------|---------:|
| 1 | John | 98000.0 |
| 2 | Anna | 100000.0 |
| 3 | Sarah | 95000.0 |
Another common pattern is to receive a _delete set_ of rows, which may only contain ids of rows to be deleted.
```sql
MERGE INTO people
USING (
SELECT
1 AS id,
) AS deletes
USING (id)
WHEN MATCHED THEN DELETE;
FROM people
ORDER BY id;
```
| id | name | salary |
|---:|-------|---------:|
| 2 | Anna | 100000.0 |
| 3 | Sarah | 95000.0 |
`MERGE INTO` also supports more complex conditions. For example, for a given _delete set_ we can decide to only remove rows whose `salary` is greater than or equal to a certain amount.
```sql
MERGE INTO people
USING (
SELECT
unnest([3, 2]) AS id,
) AS deletes
USING (id)
WHEN MATCHED AND people.salary >= 100_000.0 THEN DELETE;
FROM people
ORDER BY id;
```
| id | name | salary |
|---:|-------|--------:|
| 3 | Sarah | 95000.0 |
If needed, DuckDB also supports multiple `UPDATE` and `DELETE` conditions. The `RETURNING` clause can be used to indicate which rows were affected by the `MERGE` statement.
```sql
-- Let's get John back in!
INSERT INTO people VALUES (1, 'John', 105_000.0);
MERGE INTO people
USING (
SELECT
unnest([3, 1]) AS id,
unnest([89_000.0, 70_000.0]) AS salary
) AS upserts
USING (id)
WHEN MATCHED AND people.salary < 100_000.0 THEN UPDATE SET salary = upserts.salary
-- Second update or delete condition
WHEN MATCHED AND people.salary > 100_000.0 THEN DELETE
WHEN NOT MATCHED THEN INSERT BY NAME
RETURNING merge_action, *;
```
| merge_action | id | name | salary |
|--------------|---:|-------|---------:|
| UPDATE | 3 | Sarah | 89000.0 |
| DELETE | 1 | John | 105000.0 |
In some cases, you may want to perform a different action specifically when the source doesn't contain a matching row. For example, if rows that are not present in the source shouldn't be present in the target:
```sql
CREATE TABLE source AS
SELECT unnest([1,2]) AS id;
MERGE INTO source
USING (SELECT 1 AS id) target
USING (id)
WHEN MATCHED THEN UPDATE
WHEN NOT MATCHED BY SOURCE THEN DELETE
RETURNING merge_action, *;
```
| merge_action | id |
|--------------|---:|
| UPDATE | 1 |
| DELETE | 2 |
It is also possible to specify `WHEN NOT MATCHED BY TARGET`. However, as you might expect, it behaves the same as `WHEN NOT MATCHED`, since conditions are evaluated against the target by default.
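For completeness, here is a minimal sketch using `WHEN NOT MATCHED BY TARGET` (the id `4` and the values for `name` and `salary` are hypothetical):
```sql
MERGE INTO people
USING (SELECT 4 AS id, 'Lena' AS name, 110_000.0 AS salary) AS new_hires
USING (id)
WHEN NOT MATCHED BY TARGET THEN INSERT;
```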
#### Syntax {#docs:stable:sql:statements:merge_into::syntax}
### PIVOT Statement {#docs:stable:sql:statements:pivot}
The `PIVOT` statement allows distinct values within a column to be separated into their own columns.
The values within those new columns are calculated using an aggregate function on the subset of rows that match each distinct value.
DuckDB implements both the SQL Standard `PIVOT` syntax and a simplified `PIVOT` syntax that automatically detects the columns to create while pivoting.
`PIVOT_WIDER` may also be used in place of the `PIVOT` keyword.
For details on how the `PIVOT` statement is implemented, see the [Pivot Internals site](#docs:stable:internals:pivot::pivot).
> The [`UNPIVOT` statement](#docs:stable:sql:statements:unpivot) is the inverse of the `PIVOT` statement.
#### Simplified `PIVOT` Syntax {#docs:stable:sql:statements:pivot::simplified-pivot-syntax}
The full syntax diagram is below, but the simplified `PIVOT` syntax can be summarized using spreadsheet pivot table naming conventions as:
```sql
PIVOT ⟨dataset⟩
ON ⟨columns⟩
USING ⟨values⟩
GROUP BY ⟨rows⟩
ORDER BY ⟨columns_with_order_directions⟩
LIMIT ⟨number_of_rows⟩;
```
The `ON`, `USING`, and `GROUP BY` clauses are each optional, but they may not all be omitted.
##### Example Data {#docs:stable:sql:statements:pivot::example-data}
All examples use the dataset produced by the queries below:
```sql
CREATE TABLE cities (
country VARCHAR, name VARCHAR, year INTEGER, population INTEGER
);
INSERT INTO cities VALUES
('NL', 'Amsterdam', 2000, 1005),
('NL', 'Amsterdam', 2010, 1065),
('NL', 'Amsterdam', 2020, 1158),
('US', 'Seattle', 2000, 564),
('US', 'Seattle', 2010, 608),
('US', 'Seattle', 2020, 738),
('US', 'New York City', 2000, 8015),
('US', 'New York City', 2010, 8175),
('US', 'New York City', 2020, 8772);
```
```sql
SELECT *
FROM cities;
```
| country | name | year | population |
|---------|---------------|-----:|-----------:|
| NL | Amsterdam | 2000 | 1005 |
| NL | Amsterdam | 2010 | 1065 |
| NL | Amsterdam | 2020 | 1158 |
| US | Seattle | 2000 | 564 |
| US | Seattle | 2010 | 608 |
| US | Seattle | 2020 | 738 |
| US | New York City | 2000 | 8015 |
| US | New York City | 2010 | 8175 |
| US | New York City | 2020 | 8772 |
##### `PIVOT ON` and `USING` {#docs:stable:sql:statements:pivot::pivot-on-and-using}
Use the `PIVOT` statement below to create a separate column for each year and calculate the total population in each.
The `ON` clause specifies which column(s) to split into separate columns.
It is equivalent to the columns parameter in a spreadsheet pivot table.
The `USING` clause determines how to aggregate the values that are split into separate columns.
This is equivalent to the values parameter in a spreadsheet pivot table.
If the `USING` clause is not included, it defaults to `count(*)`.
```sql
PIVOT cities
ON year
USING sum(population);
```
| country | name | 2000 | 2010 | 2020 |
|---------|---------------|-----:|-----:|-----:|
| NL | Amsterdam | 1005 | 1065 | 1158 |
| US | Seattle | 564 | 608 | 738 |
| US | New York City | 8015 | 8175 | 8772 |
In the above example, the `sum` aggregate is always operating on a single value.
If we only want to change the orientation of how the data is displayed without aggregating, use the `first` aggregate function.
In this example, we are pivoting numeric values, but the `first` function works very well for pivoting out a text column.
(This is something that is difficult to do in a spreadsheet pivot table, but easy in DuckDB!)
This query produces a result that is identical to the one above:
```sql
PIVOT cities
ON year
USING first(population);
```
> **Note.** The SQL syntax permits [`FILTER` clauses](#docs:stable:sql:query_syntax:filter) with aggregate functions in the `USING` clause.
> In DuckDB, the `PIVOT` statement currently does not support these and they are silently ignored.
##### `PIVOT ON`, `USING`, and `GROUP BY` {#docs:stable:sql:statements:pivot::pivot-on-using-and-group-by}
By default, the `PIVOT` statement retains all columns not specified in the `ON` or `USING` clauses.
To include only certain columns and further aggregate, specify columns in the `GROUP BY` clause.
This is equivalent to the rows parameter of a spreadsheet pivot table.
In the below example, the `name` column is no longer included in the output, and the data is aggregated up to the `country` level.
```sql
PIVOT cities
ON year
USING sum(population)
GROUP BY country;
```
| country | 2000 | 2010 | 2020 |
|---------|-----:|-----:|-----:|
| NL | 1005 | 1065 | 1158 |
| US | 8579 | 8783 | 9510 |
##### `IN` Filter for `ON` Clause {#docs:stable:sql:statements:pivot::in-filter-for-on-clause}
To only create a separate column for specific values within a column in the `ON` clause, use an optional `IN` expression.
Let's say for example that we wanted to forget about the year 2020 for no particular reason...
```sql
PIVOT cities
ON year IN (2000, 2010)
USING sum(population)
GROUP BY country;
```
| country | 2000 | 2010 |
|---------|-----:|-----:|
| NL | 1005 | 1065 |
| US | 8579 | 8783 |
##### Multiple Expressions per Clause {#docs:stable:sql:statements:pivot::multiple-expressions-per-clause}
Multiple columns can be specified in the `ON` and `GROUP BY` clauses, and multiple aggregate expressions can be included in the `USING` clause.
###### Multiple `ON` Columns and `ON` Expressions {#docs:stable:sql:statements:pivot::multiple-on-columns-and-on-expressions}
Multiple columns can be pivoted out into their own columns.
DuckDB will find the distinct values in each `ON` clause column and create one new column for all combinations of those values (a Cartesian product).
In the below example, all combinations of unique countries and unique cities receive their own column.
Some combinations may not be present in the underlying data, so those columns are populated with `NULL` values.
```sql
PIVOT cities
ON country, name
USING sum(population);
```
| year | NL_Amsterdam | NL_New York City | NL_Seattle | US_Amsterdam | US_New York City | US_Seattle |
|-----:|-------------:|------------------|------------|--------------|-----------------:|-----------:|
| 2000 | 1005 | NULL | NULL | NULL | 8015 | 564 |
| 2010 | 1065 | NULL | NULL | NULL | 8175 | 608 |
| 2020 | 1158 | NULL | NULL | NULL | 8772 | 738 |
To pivot only the combinations of values that are present in the underlying data, use an expression in the `ON` clause.
Multiple expressions and/or columns may be provided.
Here, `country` and `name` are concatenated together and the resulting concatenations each receive their own column.
Any arbitrary non-aggregating expression may be used.
In this case, concatenating with an underscore is used to imitate the naming convention the `PIVOT` clause uses when multiple `ON` columns are provided (like in the prior example).
```sql
PIVOT cities
ON country || '_' || name
USING sum(population);
```
| year | NL_Amsterdam | US_New York City | US_Seattle |
|-----:|-------------:|-----------------:|-----------:|
| 2000 | 1005 | 8015 | 564 |
| 2010 | 1065 | 8175 | 608 |
| 2020 | 1158 | 8772 | 738 |
###### Multiple `USING` Expressions {#docs:stable:sql:statements:pivot::multiple-using-expressions}
An alias may also be included for each expression in the `USING` clause.
It will be appended to the generated column names after an underscore (`_`).
This makes the column naming convention much cleaner when multiple expressions are included in the `USING` clause.
In this example, both the `sum` and `max` of the population column are calculated for each year and are split into separate columns.
```sql
PIVOT cities
ON year
USING sum(population) AS total, max(population) AS max
GROUP BY country;
```
| country | 2000_total | 2000_max | 2010_total | 2010_max | 2020_total | 2020_max |
|---------|-----------:|---------:|-----------:|---------:|-----------:|---------:|
| US | 8579 | 8015 | 8783 | 8175 | 9510 | 8772 |
| NL | 1005 | 1005 | 1065 | 1065 | 1158 | 1158 |
###### Multiple `GROUP BY` Columns {#docs:stable:sql:statements:pivot::multiple-group-by-columns}
Multiple `GROUP BY` columns may also be provided.
Note that column names must be used rather than column positions (1, 2, etc.), and that expressions are not supported in the `GROUP BY` clause.
```sql
PIVOT cities
ON year
USING sum(population)
GROUP BY country, name;
```
| country | name | 2000 | 2010 | 2020 |
|---------|---------------|-----:|-----:|-----:|
| NL | Amsterdam | 1005 | 1065 | 1158 |
| US | Seattle | 564 | 608 | 738 |
| US | New York City | 8015 | 8175 | 8772 |
##### Using `PIVOT` within a `SELECT` Statement {#docs:stable:sql:statements:pivot::using-pivot-within-a-select-statement}
The `PIVOT` statement may be included within a `SELECT` statement as a CTE ([a Common Table Expression, or `WITH` clause](#docs:stable:sql:query_syntax:with)), or a subquery.
This allows for a `PIVOT` to be used alongside other SQL logic, as well as for multiple `PIVOT`s to be used in one query.
No `SELECT` is needed within the CTE; the `PIVOT` keyword can be thought of as taking its place.
```sql
WITH pivot_alias AS (
PIVOT cities
ON year
USING sum(population)
GROUP BY country
)
SELECT * FROM pivot_alias;
```
A `PIVOT` may be used in a subquery and must be wrapped in parentheses.
Note that this behavior is different than the SQL Standard Pivot, as illustrated in subsequent examples.
```sql
SELECT *
FROM (
PIVOT cities
ON year
USING sum(population)
GROUP BY country
) pivot_alias;
```
##### Multiple `PIVOT` Statements {#docs:stable:sql:statements:pivot::multiple-pivot-statements}
Each `PIVOT` can be treated as if it were a `SELECT` node, so they can be joined together or manipulated in other ways.
For example, if two `PIVOT` statements share the same `GROUP BY` expression, they can be joined together using the columns in the `GROUP BY` clause into a wider pivot.
```sql
SELECT *
FROM (PIVOT cities ON year USING sum(population) GROUP BY country) year_pivot
JOIN (PIVOT cities ON name USING sum(population) GROUP BY country) name_pivot
USING (country);
```
| country | 2000 | 2010 | 2020 | Amsterdam | New York City | Seattle |
|---------|-----:|-----:|-----:|----------:|--------------:|--------:|
| NL | 1005 | 1065 | 1158 | 3228 | NULL | NULL |
| US | 8579 | 8783 | 9510 | NULL | 24962 | 1910 |
#### Simplified `PIVOT` Full Syntax Diagram {#docs:stable:sql:statements:pivot::simplified-pivot-full-syntax-diagram}
Below is the full syntax diagram of the `PIVOT` statement.
#### SQL Standard `PIVOT` Syntax {#docs:stable:sql:statements:pivot::sql-standard-pivot-syntax}
The full syntax diagram is below, but the SQL Standard `PIVOT` syntax can be summarized as:
```sql
SELECT *
FROM ⟨dataset⟩
PIVOT (
    ⟨values⟩
    FOR
        ⟨column_1⟩ IN (⟨in_list⟩)
        ⟨column_2⟩ IN (⟨in_list⟩)
        ...
    GROUP BY ⟨rows⟩
);
```
Unlike the simplified syntax, the `IN` clause must be specified for each column to be pivoted.
If you are interested in dynamic pivoting, the simplified syntax is recommended.
Note that no commas separate the expressions in the `FOR` clause, but that `value` and `GROUP BY` expressions must be comma-separated!
#### Examples {#docs:stable:sql:statements:pivot::examples}
This example uses a single value expression, a single column expression, and a single row expression:
```sql
SELECT *
FROM cities
PIVOT (
sum(population)
FOR
year IN (2000, 2010, 2020)
GROUP BY country
);
```
| country | 2000 | 2010 | 2020 |
|---------|-----:|-----:|-----:|
| NL | 1005 | 1065 | 1158 |
| US | 8579 | 8783 | 9510 |
This example is somewhat contrived, but serves as an example of using multiple value expressions and multiple columns in the `FOR` clause.
```sql
SELECT *
FROM cities
PIVOT (
sum(population) AS total,
count(population) AS count
FOR
year IN (2000, 2010)
country IN ('NL', 'US')
);
```
| name | 2000_NL_total | 2000_NL_count | 2000_US_total | 2000_US_count | 2010_NL_total | 2010_NL_count | 2010_US_total | 2010_US_count |
|--|-:|-:|-:|-:|-:|-:|-:|-:|
| Amsterdam | 1005 | 1 | NULL | 0 | 1065 | 1 | NULL | 0 |
| Seattle | NULL | 0 | 564 | 1 | NULL | 0 | 608 | 1 |
| New York City | NULL | 0 | 8015 | 1 | NULL | 0 | 8175 | 1 |
##### SQL Standard `PIVOT` Full Syntax Diagram {#docs:stable:sql:statements:pivot::sql-standard-pivot-full-syntax-diagram}
Below is the full syntax diagram of the SQL Standard version of the `PIVOT` statement.
#### Limitations {#docs:stable:sql:statements:pivot::limitations}
`PIVOT` currently only accepts an aggregate function in the `USING` clause; arbitrary expressions are not allowed.
For example, the following query attempts to get the population as the number of people instead of thousands of people (i.e., instead of 564, get 564000):
```sql
PIVOT cities
ON year
USING sum(population) * 1000;
```
However, it fails with the following error:
```console
Catalog Error:
* is not an aggregate function
```
To work around this limitation, perform the `PIVOT` with the aggregation only, then use the [`COLUMNS` expression](#docs:stable:sql:expressions:star::columns-expression):
```sql
SELECT country, name, 1000 * COLUMNS(* EXCLUDE (country, name))
FROM (
PIVOT cities
ON year
USING sum(population)
);
```
### Profiling Queries {#docs:stable:sql:statements:profiling}
DuckDB supports profiling queries via the `EXPLAIN` and `EXPLAIN ANALYZE` statements.
#### `EXPLAIN` {#docs:stable:sql:statements:profiling::explain}
To see the query plan of a query without executing it, run:
```sql
EXPLAIN ⟨query⟩;
```
The output of `EXPLAIN` contains the estimated cardinalities for each operator.
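For example, assuming the table `tbl` with columns `i` and `j` used elsewhere on this page, the following shows the estimated plan without running the aggregation:
```sql
EXPLAIN SELECT i, sum(j) FROM tbl GROUP BY i;
```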
#### `EXPLAIN ANALYZE` {#docs:stable:sql:statements:profiling::explain-analyze}
To profile a query, run:
```sql
EXPLAIN ANALYZE ⟨query⟩;
```
The `EXPLAIN ANALYZE` statement runs the query, and shows the actual cardinalities for each operator,
as well as the cumulative wall-clock time spent in each operator.
### SELECT Statement {#docs:stable:sql:statements:select}
The `SELECT` statement retrieves rows from the database.
##### Examples {#docs:stable:sql:statements:select::examples}
Select all columns from the table `tbl`:
```sql
SELECT * FROM tbl;
```
Select the column `j` from `tbl` for rows where `i` equals 3:
```sql
SELECT j FROM tbl WHERE i = 3;
```
Perform an aggregate grouped by the column `i`:
```sql
SELECT i, sum(j) FROM tbl GROUP BY i;
```
Select only the top 3 rows from the `tbl`:
```sql
SELECT * FROM tbl ORDER BY i DESC LIMIT 3;
```
Join two tables together using the `USING` clause:
```sql
SELECT * FROM t1 JOIN t2 USING (a, b);
```
Use column indexes to select the first and third column from the table `tbl`:
```sql
SELECT #1, #3 FROM tbl;
```
Select all unique cities from the addresses table:
```sql
SELECT DISTINCT city FROM addresses;
```
Return a `STRUCT` by using a row variable:
```sql
SELECT d
FROM (SELECT 1 AS a, 2 AS b) d;
```
##### Syntax {#docs:stable:sql:statements:select::syntax}
The `SELECT` statement retrieves rows from the database. The canonical order of a `SELECT` statement is as follows, with less common clauses being indented:
```sql
SELECT ⟨select_list⟩
FROM ⟨tables⟩
    USING SAMPLE ⟨sample_expression⟩
WHERE ⟨condition⟩
GROUP BY ⟨groups⟩
HAVING ⟨group_filter⟩
    WINDOW ⟨window_expression⟩
    QUALIFY ⟨qualify_filter⟩
ORDER BY ⟨order_expression⟩
LIMIT ⟨n⟩;
```
Optionally, the `SELECT` statement can be prefixed with a [`WITH` clause](#docs:stable:sql:query_syntax:with).
As the `SELECT` statement is so complex, we have split up the syntax diagrams into several parts. The full syntax diagram can be found at the bottom of the page.
#### `SELECT` Clause {#docs:stable:sql:statements:select::select-clause}
The [`SELECT` clause](#docs:stable:sql:query_syntax:select) specifies the list of columns that will be returned by the query. While it appears first in the statement, *logically* the expressions here are executed only at the end. The `SELECT` clause can contain arbitrary expressions that transform the output, as well as aggregates and window functions. The `DISTINCT` keyword ensures that only unique tuples are returned.
> Column names are case-insensitive. See the [Rules for Case Sensitivity](#docs:stable:sql:dialect:keywords_and_identifiers::rules-for-case-sensitivity) for more details.
#### `FROM` Clause {#docs:stable:sql:statements:select::from-clause}
The [`FROM` clause](#docs:stable:sql:query_syntax:from) specifies the *source* of the data on which the remainder of the query should operate. Logically, the `FROM` clause is where the query starts execution. The `FROM` clause can contain a single table, a combination of multiple tables that are joined together, or another `SELECT` query inside a subquery node.
#### `SAMPLE` Clause {#docs:stable:sql:statements:select::sample-clause}
The [`SAMPLE` clause](#docs:stable:sql:query_syntax:sample) allows you to run the query on a sample from the base table. This can significantly speed up processing of queries, at the expense of accuracy in the result. Samples can also be used to quickly see a snapshot of the data when exploring a dataset. The `SAMPLE` clause is applied right after anything in the `FROM` clause (i.e., after any joins, but before the where clause or any aggregates). See the [Samples](#docs:stable:sql:samples) page for more information.
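For example, to run a query on roughly 10% of the rows of a table `tbl` (a sketch; the sampled result varies between runs):
```sql
SELECT *
FROM tbl USING SAMPLE 10%;
```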
#### `WHERE` Clause {#docs:stable:sql:statements:select::where-clause}
The [`WHERE` clause](#docs:stable:sql:query_syntax:where) specifies any filters to apply to the data. This allows you to select only a subset of the data in which you are interested. Logically the `WHERE` clause is applied immediately after the `FROM` clause.
#### `GROUP BY` and `HAVING` Clauses {#docs:stable:sql:statements:select::group-by-and-having-clauses}
The [`GROUP BY` clause](#docs:stable:sql:query_syntax:groupby) specifies which grouping columns should be used to perform any aggregations in the `SELECT` clause. If the `GROUP BY` clause is specified, the query is always an aggregate query, even if no aggregations are present in the `SELECT` clause.
#### `WINDOW` Clause {#docs:stable:sql:statements:select::window-clause}
The [`WINDOW` clause](#docs:stable:sql:query_syntax:window) allows you to specify named windows that can be used within window functions. These are useful when you have multiple window functions, as they allow you to avoid repeating the same window clause.
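For example, assuming a table `tbl` with columns `i` and `j`, a named window can be reused by several window functions (a minimal sketch):
```sql
SELECT
    i,
    sum(j) OVER w AS total_j,
    avg(j) OVER w AS avg_j
FROM tbl
WINDOW w AS (PARTITION BY i ORDER BY j);
```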
#### `QUALIFY` Clause {#docs:stable:sql:statements:select::qualify-clause}
The [`QUALIFY` clause](#docs:stable:sql:query_syntax:qualify) is used to filter the result of [`WINDOW` functions](#docs:stable:sql:functions:window_functions).
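For example, to keep only the row with the smallest `j` per value of `i` (a sketch, assuming the same `tbl`):
```sql
SELECT i, j
FROM tbl
QUALIFY row_number() OVER (PARTITION BY i ORDER BY j) = 1;
```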
#### `ORDER BY`, `LIMIT` and `OFFSET` Clauses {#docs:stable:sql:statements:select::order-by-limit-and-offset-clauses}
[`ORDER BY`](#docs:stable:sql:query_syntax:orderby), [`LIMIT` and `OFFSET`](#docs:stable:sql:query_syntax:limit) are output modifiers.
Logically they are applied at the very end of the query.
The `ORDER BY` clause sorts the rows on the sorting criteria in either ascending or descending order.
The `LIMIT` clause restricts the number of rows fetched, while the `OFFSET` clause indicates at which position to start reading the values.
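For example, to skip the first 10 rows and then fetch the next 5 in ascending order of `i` (a sketch, assuming the same `tbl`):
```sql
SELECT *
FROM tbl
ORDER BY i
LIMIT 5
OFFSET 10;
```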
#### `VALUES` List {#docs:stable:sql:statements:select::values-list}
[A `VALUES` list](#docs:stable:sql:query_syntax:values) is a set of values that is supplied instead of a `SELECT` statement.
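For example, a standalone `VALUES` list can be queried directly (a minimal sketch):
```sql
VALUES (1, 'one'), (2, 'two');
```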
#### Row IDs {#docs:stable:sql:statements:select::row-ids}
For each table, the [`rowid` pseudocolumn](https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns008.htm) returns the row identifiers based on the physical storage.
```sql
CREATE TABLE t (id INTEGER, content VARCHAR);
INSERT INTO t VALUES (42, 'hello'), (43, 'world');
SELECT rowid, id, content FROM t;
```
| rowid | id | content |
|------:|---:|---------|
| 0 | 42 | hello |
| 1 | 43 | world |
In the current storage, these identifiers are contiguous unsigned integers (0, 1, ...) if no rows were deleted.
Deletions introduce gaps in the rowids which may be reclaimed later:
```sql
CREATE OR REPLACE TABLE t AS (FROM range(10) r(i));
DELETE FROM t WHERE i % 2 = 0;
SELECT rowid FROM t;
```
| rowid |
|------:|
| 1 |
| 3 |
| 5 |
| 7 |
| 9 |
The `rowid` values are stable within a transaction.
> **Best practice.** It is strongly advised to avoid using rowids as identifiers.
> If there is a user-defined column named `rowid`, it shadows the `rowid` pseudocolumn.
#### Common Table Expressions {#docs:stable:sql:statements:select::common-table-expressions}
#### Full Syntax Diagram {#docs:stable:sql:statements:select::full-syntax-diagram}
Below is the full syntax diagram of the `SELECT` statement:
### SET and RESET Statements {#docs:stable:sql:statements:set}
The `SET` statement modifies the provided DuckDB configuration option at the specified scope.
#### Examples {#docs:stable:sql:statements:set::examples}
Update the `memory_limit` configuration value:
```sql
SET memory_limit = '10GB';
```
Configure the system to use `1` thread:
```sql
SET threads = 1;
```
Or use the `TO` keyword:
```sql
SET threads TO 1;
```
Change configuration option to default value:
```sql
RESET threads;
```
Retrieve configuration value:
```sql
SELECT current_setting('threads');
```
Set the default collation for the session:
```sql
SET SESSION default_collation = 'nocase';
```
##### Set a Global Variable {#docs:stable:sql:statements:set::set-a-global-variable}
Set the default sort order globally:
```sql
SET GLOBAL sort_order = 'desc';
```
Set the default threads globally:
```sql
SET GLOBAL threads = 4;
```
#### Syntax {#docs:stable:sql:statements:set::syntax}
`SET` updates a DuckDB configuration option to the provided value.
#### `RESET` {#docs:stable:sql:statements:set::reset}
The `RESET` statement changes the given DuckDB configuration option to the default value.
#### Scopes {#docs:stable:sql:statements:set::scopes}
Configuration options can have different scopes:
* `GLOBAL`: Configuration value is used (or reset) across the entire DuckDB instance.
* `SESSION`: Configuration value is used (or reset) only for the current session attached to a DuckDB instance.
* `LOCAL`: Not yet implemented.
When not specified, the default scope for the configuration option is used. For most options this is `GLOBAL`.
#### Configuration {#docs:stable:sql:statements:set::configuration}
See the [Configuration](#docs:stable:configuration:overview) page for the full list of configuration options.
### SET VARIABLE and RESET VARIABLE Statements {#docs:stable:sql:statements:set_variable}
DuckDB supports the definition of SQL-level variables using the `SET VARIABLE` and `RESET VARIABLE` statements.
##### Variable Scopes {#docs:stable:sql:statements:set_variable::variable-scopes}
DuckDB supports two levels of variable scopes:
| Scope | Description |
|---|---|
| `SESSION` | Variables with a `SESSION` scope are local to you and only affect the current session. |
| `GLOBAL` | Variables with a `GLOBAL` scope are specific [configuration option variables](https://duckdb.org/docs/stable/configuration/overview.html#global-configuration-options) that affect the entire DuckDB instance and all sessions. For example, see [Set a Global Variable](#docs:stable:sql:statements:set::set-a-global-variable). |
#### `SET VARIABLE` {#docs:stable:sql:statements:set_variable::set-variable}
The `SET VARIABLE` statement assigns a value to a variable, which can be accessed using the `getvariable` call:
```sql
SET VARIABLE my_var = 30;
SELECT 20 + getvariable('my_var') AS total;
```
| total |
|------:|
| 50 |
If `SET VARIABLE` is invoked on an existing variable, it will overwrite its value:
```sql
SET VARIABLE my_var = 30;
SET VARIABLE my_var = 100;
SELECT 20 + getvariable('my_var') AS total;
```
| total |
|------:|
| 120 |
Variables can have different types:
```sql
SET VARIABLE my_date = DATE '2018-07-13';
SET VARIABLE my_string = 'Hello world';
SET VARIABLE my_map = MAP {'k1': 10, 'k2': 20};
```
Variables can also be assigned to results of queries:
```sql
-- write some CSV files
COPY (SELECT 42 AS a) TO 'test1.csv';
COPY (SELECT 84 AS a) TO 'test2.csv';
-- add a list of CSV files to a table
CREATE TABLE csv_files (file VARCHAR);
INSERT INTO csv_files VALUES ('test1.csv'), ('test2.csv');
-- initialize a variable with the list of csv files
SET VARIABLE list_of_files = (SELECT list(file) FROM csv_files);
-- read the CSV files
SELECT * FROM read_csv(getvariable('list_of_files'), filename := True);
```
| a | filename |
|-----:|------------:|
| 42 | test1.csv |
| 84 | test2.csv |
If a variable is not set, the `getvariable` function returns `NULL`:
```sql
SELECT getvariable('undefined_var') AS result;
```
| result |
|--------|
| NULL |
The `getvariable` function can also be used in a [`COLUMNS` expression](#docs:stable:sql:expressions:star::columns-expression):
```sql
SET VARIABLE column_to_exclude = 'col1';
CREATE TABLE tbl AS SELECT 12 AS col0, 34 AS col1, 56 AS col2;
SELECT COLUMNS(c -> c != getvariable('column_to_exclude')) FROM tbl;
```
| col0 | col2 |
|-----:|-----:|
| 12 | 56 |
##### Syntax {#docs:stable:sql:statements:set_variable::syntax}
#### `RESET VARIABLE` {#docs:stable:sql:statements:set_variable::reset-variable}
The `RESET VARIABLE` statement unsets a variable.
```sql
SET VARIABLE my_var = 30;
RESET VARIABLE my_var;
SELECT getvariable('my_var') AS my_var;
```
| my_var |
|--------|
| NULL |
##### Syntax {#docs:stable:sql:statements:set_variable::syntax}
### SUMMARIZE Statement {#docs:stable:sql:statements:summarize}
The `SUMMARIZE` statement returns summary statistics for a table, view or a query.
#### Usage {#docs:stable:sql:statements:summarize::usage}
```sql
SUMMARIZE tbl;
```
In order to summarize a query, prepend `SUMMARIZE` to a query.
```sql
SUMMARIZE SELECT * FROM tbl;
```
#### See Also {#docs:stable:sql:statements:summarize::see-also}
For more examples, see the [guide on `SUMMARIZE`](#docs:stable:guides:meta:summarize).
### Transaction Management {#docs:stable:sql:statements:transactions}
DuckDB supports [ACID database transactions](https://en.wikipedia.org/wiki/Database_transaction).
Transactions provide isolation, i.e., changes made by a transaction are not visible from concurrent transactions until it is committed.
A transaction can also be aborted, which discards any changes it made so far.
#### Statements {#docs:stable:sql:statements:transactions::statements}
DuckDB provides the following statements for transaction management.
##### Starting a Transaction {#docs:stable:sql:statements:transactions::starting-a-transaction}
To start a transaction, run:
```sql
BEGIN TRANSACTION;
```
##### Committing a Transaction {#docs:stable:sql:statements:transactions::committing-a-transaction}
You can commit a transaction to make it visible to other transactions and to write it to persistent storage (if using DuckDB in persistent mode).
To commit a transaction, run:
```sql
COMMIT;
```
If you are not in an active transaction, the `COMMIT` statement will fail.
##### Rolling Back a Transaction {#docs:stable:sql:statements:transactions::rolling-back-a-transaction}
You can abort a transaction.
This operation, also known as rolling back, will discard any changes the transaction made to the database.
To abort a transaction, run:
```sql
ROLLBACK;
```
You can also use the abort command, which has an identical behavior:
```sql
ABORT;
```
If you are not in an active transaction, the `ROLLBACK` and `ABORT` statements will fail.
##### Example {#docs:stable:sql:statements:transactions::example}
We illustrate the use of transactions through a simple example.
```sql
CREATE TABLE person (name VARCHAR, age BIGINT);
BEGIN TRANSACTION;
INSERT INTO person VALUES ('Ada', 52);
COMMIT;
BEGIN TRANSACTION;
DELETE FROM person WHERE name = 'Ada';
INSERT INTO person VALUES ('Bruce', 39);
ROLLBACK;
SELECT * FROM person;
```
The first transaction (inserting 'Ada') was committed, but the second (deleting 'Ada' and inserting 'Bruce') was aborted.
Therefore, the resulting table will only contain `<'Ada', 52>`.
### UNPIVOT Statement {#docs:stable:sql:statements:unpivot}
The `UNPIVOT` statement allows multiple columns to be stacked into fewer columns.
In the basic case, multiple columns are stacked into two columns: a `NAME` column (which contains the name of the source column) and a `VALUE` column (which contains the value from the source column).
DuckDB implements both the SQL Standard `UNPIVOT` syntax and a simplified `UNPIVOT` syntax.
Both can utilize a [`COLUMNS` expression](#docs:stable:sql:expressions:star::columns) to automatically detect the columns to unpivot.
`PIVOT_LONGER` may also be used in place of the `UNPIVOT` keyword.
For details on how the `UNPIVOT` statement is implemented, see the [Pivot Internals site](#docs:stable:internals:pivot::unpivot).
> The [`PIVOT` statement](#docs:stable:sql:statements:pivot) is the inverse of the `UNPIVOT` statement.
#### Simplified `UNPIVOT` Syntax {#docs:stable:sql:statements:unpivot::simplified-unpivot-syntax}
The full syntax diagram is below, but the simplified `UNPIVOT` syntax can be summarized using spreadsheet pivot table naming conventions as:
```sql
UNPIVOT ⟨dataset⟩
ON ⟨column(s)⟩
INTO
    NAME ⟨name_column_name⟩
    VALUE ⟨value_column_name(s)⟩
ORDER BY ⟨column(s)_with_order_direction(s)⟩
LIMIT ⟨number_of_rows⟩;
```
##### Example Data {#docs:stable:sql:statements:unpivot::example-data}
All examples use the dataset produced by the queries below:
```sql
CREATE OR REPLACE TABLE monthly_sales
(empid INTEGER, dept TEXT, Jan INTEGER, Feb INTEGER, Mar INTEGER, Apr INTEGER, May INTEGER, Jun INTEGER);
INSERT INTO monthly_sales VALUES
(1, 'electronics', 1, 2, 3, 4, 5, 6),
(2, 'clothes', 10, 20, 30, 40, 50, 60),
(3, 'cars', 100, 200, 300, 400, 500, 600);
```
```sql
FROM monthly_sales;
```
| empid | dept | Jan | Feb | Mar | Apr | May | Jun |
|------:|-------------|----:|----:|----:|----:|----:|----:|
| 1 | electronics | 1 | 2 | 3 | 4 | 5 | 6 |
| 2 | clothes | 10 | 20 | 30 | 40 | 50 | 60 |
| 3 | cars | 100 | 200 | 300 | 400 | 500 | 600 |
##### `UNPIVOT` Manually {#docs:stable:sql:statements:unpivot::unpivot-manually}
The most typical `UNPIVOT` transformation is to take already pivoted data and re-stack it into a column each for the name and value.
In this case, all months will be stacked into a `month` column and a `sales` column.
```sql
UNPIVOT monthly_sales
ON jan, feb, mar, apr, may, jun
INTO
NAME month
VALUE sales;
```
| empid | dept | month | sales |
|------:|-------------|-------|------:|
| 1 | electronics | Jan | 1 |
| 1 | electronics | Feb | 2 |
| 1 | electronics | Mar | 3 |
| 1 | electronics | Apr | 4 |
| 1 | electronics | May | 5 |
| 1 | electronics | Jun | 6 |
| 2 | clothes | Jan | 10 |
| 2 | clothes | Feb | 20 |
| 2 | clothes | Mar | 30 |
| 2 | clothes | Apr | 40 |
| 2 | clothes | May | 50 |
| 2 | clothes | Jun | 60 |
| 3 | cars | Jan | 100 |
| 3 | cars | Feb | 200 |
| 3 | cars | Mar | 300 |
| 3 | cars | Apr | 400 |
| 3 | cars | May | 500 |
| 3 | cars | Jun | 600 |
##### `UNPIVOT` Dynamically Using Columns Expression {#docs:stable:sql:statements:unpivot::unpivot-dynamically-using-columns-expression}
In many cases, the number of columns to unpivot is not easy to predetermine ahead of time.
In the case of this dataset, the query above would have to change each time a new month is added.
The [`COLUMNS` expression](#docs:stable:sql:expressions:star::columns-expression) can be used to select all columns that are not `empid` or `dept`.
This enables dynamic unpivoting that will work regardless of how many months are added.
The query below returns identical results to the one above.
```sql
UNPIVOT monthly_sales
ON COLUMNS(* EXCLUDE (empid, dept))
INTO
NAME month
VALUE sales;
```
| empid | dept | month | sales |
|------:|-------------|-------|------:|
| 1 | electronics | Jan | 1 |
| 1 | electronics | Feb | 2 |
| 1 | electronics | Mar | 3 |
| 1 | electronics | Apr | 4 |
| 1 | electronics | May | 5 |
| 1 | electronics | Jun | 6 |
| 2 | clothes | Jan | 10 |
| 2 | clothes | Feb | 20 |
| 2 | clothes | Mar | 30 |
| 2 | clothes | Apr | 40 |
| 2 | clothes | May | 50 |
| 2 | clothes | Jun | 60 |
| 3 | cars | Jan | 100 |
| 3 | cars | Feb | 200 |
| 3 | cars | Mar | 300 |
| 3 | cars | Apr | 400 |
| 3 | cars | May | 500 |
| 3 | cars | Jun | 600 |
##### `UNPIVOT` into Multiple Value Columns {#docs:stable:sql:statements:unpivot::unpivot-into-multiple-value-columns}
The `UNPIVOT` statement has additional flexibility: more than 2 destination columns are supported.
This can be useful when the goal is to reduce the extent to which a dataset is pivoted, but not completely stack all pivoted columns.
To demonstrate this, the query below will generate a dataset with a separate column for the number of each month within the quarter (month 1, 2, or 3), and a separate row for each quarter.
Since there are fewer quarters than months, this does make the dataset longer, but not as long as the above.
To accomplish this, multiple sets of columns are included in the `ON` clause.
The `q1` and `q2` aliases are optional.
The number of columns in each set of columns in the `ON` clause must match the number of columns in the `VALUE` clause.
```sql
UNPIVOT monthly_sales
ON (jan, feb, mar) AS q1, (apr, may, jun) AS q2
INTO
NAME quarter
VALUE month_1_sales, month_2_sales, month_3_sales;
```
| empid | dept | quarter | month_1_sales | month_2_sales | month_3_sales |
|------:|-------------|---------|--------------:|--------------:|--------------:|
| 1 | electronics | q1 | 1 | 2 | 3 |
| 1 | electronics | q2 | 4 | 5 | 6 |
| 2 | clothes | q1 | 10 | 20 | 30 |
| 2 | clothes | q2 | 40 | 50 | 60 |
| 3 | cars | q1 | 100 | 200 | 300 |
| 3 | cars | q2 | 400 | 500 | 600 |
##### Using `UNPIVOT` within a `SELECT` Statement {#docs:stable:sql:statements:unpivot::using-unpivot-within-a-select-statement}
The `UNPIVOT` statement may be included within a `SELECT` statement as a CTE ([a Common Table Expression, or WITH clause](#docs:stable:sql:query_syntax:with)), or a subquery.
This allows for an `UNPIVOT` to be used alongside other SQL logic, as well as for multiple `UNPIVOT`s to be used in one query.
No `SELECT` is needed within the CTE; the `UNPIVOT` keyword can be thought of as taking its place.
```sql
WITH unpivot_alias AS (
UNPIVOT monthly_sales
ON COLUMNS(* EXCLUDE (empid, dept))
INTO
NAME month
VALUE sales
)
SELECT * FROM unpivot_alias;
```
An `UNPIVOT` may be used in a subquery and must be wrapped in parentheses.
Note that this behavior is different than the SQL Standard Unpivot, as illustrated in subsequent examples.
```sql
SELECT *
FROM (
UNPIVOT monthly_sales
ON COLUMNS(* EXCLUDE (empid, dept))
INTO
NAME month
VALUE sales
) unpivot_alias;
```
##### Expressions within `UNPIVOT` Statements {#docs:stable:sql:statements:unpivot::expressions-within-unpivot-statements}
DuckDB allows expressions within the `UNPIVOT` statements, provided that they only involve a single column. These can be used to perform computations as well as [explicit casts](#docs:stable:sql:data_types:typecasting::explicit-casting). For example:
```sql
UNPIVOT
(SELECT 42 AS col1, 'woot' AS col2)
ON
(col1 * 2)::VARCHAR,
col2;
```
| name | value |
|------|-------|
| col1 | 84 |
| col2 | woot |
##### Simplified `UNPIVOT` Full Syntax Diagram {#docs:stable:sql:statements:unpivot::simplified-unpivot-full-syntax-diagram}
Below is the full syntax diagram of the `UNPIVOT` statement.
#### SQL Standard `UNPIVOT` Syntax {#docs:stable:sql:statements:unpivot::sql-standard-unpivot-syntax}
The full syntax diagram is below, but the SQL Standard `UNPIVOT` syntax can be summarized as:
```sql
FROM [dataset]
UNPIVOT [INCLUDE NULLS] (
[value-column-name(s)]
FOR [name-column-name] IN [column(s)]
);
```
Note that only one column can be included in the `name-column-name` expression.
##### SQL Standard `UNPIVOT` Manually {#docs:stable:sql:statements:unpivot::sql-standard-unpivot-manually}
To complete the basic `UNPIVOT` operation using the SQL standard syntax, only a few additions are needed.
```sql
FROM monthly_sales UNPIVOT (
sales
FOR month IN (jan, feb, mar, apr, may, jun)
);
```
| empid | dept | month | sales |
|------:|-------------|-------|------:|
| 1 | electronics | Jan | 1 |
| 1 | electronics | Feb | 2 |
| 1 | electronics | Mar | 3 |
| 1 | electronics | Apr | 4 |
| 1 | electronics | May | 5 |
| 1 | electronics | Jun | 6 |
| 2 | clothes | Jan | 10 |
| 2 | clothes | Feb | 20 |
| 2 | clothes | Mar | 30 |
| 2 | clothes | Apr | 40 |
| 2 | clothes | May | 50 |
| 2 | clothes | Jun | 60 |
| 3 | cars | Jan | 100 |
| 3 | cars | Feb | 200 |
| 3 | cars | Mar | 300 |
| 3 | cars | Apr | 400 |
| 3 | cars | May | 500 |
| 3 | cars | Jun | 600 |
##### SQL Standard `UNPIVOT` Dynamically Using the `COLUMNS` Expression {#docs:stable:sql:statements:unpivot::sql-standard-unpivot-dynamically-using-the-columns-expression}
The [`COLUMNS` expression](#docs:stable:sql:expressions:star::columns) can be used to determine the `IN` list of columns dynamically.
This will continue to work even if additional `month` columns are added to the dataset.
It produces the same result as the query above.
```sql
FROM monthly_sales UNPIVOT (
sales
FOR month IN (columns(* EXCLUDE (empid, dept)))
);
```
##### SQL Standard `UNPIVOT` into Multiple Value Columns {#docs:stable:sql:statements:unpivot::sql-standard-unpivot-into-multiple-value-columns}
The `UNPIVOT` statement has additional flexibility: more than 2 destination columns are supported.
This can be useful when the goal is to reduce the extent to which a dataset is pivoted, but not completely stack all pivoted columns.
To demonstrate this, the query below will generate a dataset with a separate column for the number of each month within the quarter (month 1, 2, or 3), and a separate row for each quarter.
Since there are fewer quarters than months, this does make the dataset longer, but not as long as the above.
To accomplish this, multiple columns are included in the `value-column-name` portion of the `UNPIVOT` statement.
Multiple sets of columns are included in the `IN` clause.
The `q1` and `q2` aliases are optional.
The number of columns in each set of columns in the `IN` clause must match the number of columns in the `value-column-name` portion.
```sql
FROM monthly_sales
UNPIVOT (
(month_1_sales, month_2_sales, month_3_sales)
FOR quarter IN (
(jan, feb, mar) AS q1,
(apr, may, jun) AS q2
)
);
```
| empid | dept | quarter | month_1_sales | month_2_sales | month_3_sales |
|------:|-------------|---------|--------------:|--------------:|--------------:|
| 1 | electronics | q1 | 1 | 2 | 3 |
| 1 | electronics | q2 | 4 | 5 | 6 |
| 2 | clothes | q1 | 10 | 20 | 30 |
| 2 | clothes | q2 | 40 | 50 | 60 |
| 3 | cars | q1 | 100 | 200 | 300 |
| 3 | cars | q2 | 400 | 500 | 600 |
##### SQL Standard `UNPIVOT` Full Syntax Diagram {#docs:stable:sql:statements:unpivot::sql-standard-unpivot-full-syntax-diagram}
Below is the full syntax diagram of the SQL Standard version of the `UNPIVOT` statement.
### UPDATE Statement {#docs:stable:sql:statements:update}
The `UPDATE` statement modifies the values of rows in a table.
#### Examples {#docs:stable:sql:statements:update::examples}
For every row where `i` is `NULL`, set the value to 0 instead:
```sql
UPDATE tbl
SET i = 0
WHERE i IS NULL;
```
Set all values of `i` to 1 and all values of `j` to 2:
```sql
UPDATE tbl
SET i = 1, j = 2;
```
#### Syntax {#docs:stable:sql:statements:update::syntax}
`UPDATE` changes the values of the specified columns in all rows that satisfy the condition. Only the columns to be modified need be mentioned in the `SET` clause; columns not explicitly modified retain their previous values.
#### Update from Other Table {#docs:stable:sql:statements:update::update-from-other-table}
A table can be updated based upon values from another table. This can be done by specifying a table in a `FROM` clause, or using a sub-select statement. Both approaches have the benefit of completing the `UPDATE` operation in bulk for increased performance.
```sql
CREATE OR REPLACE TABLE original AS
SELECT 1 AS key, 'original value' AS value
UNION ALL
SELECT 2 AS key, 'original value 2' AS value;
CREATE OR REPLACE TABLE new AS
SELECT 1 AS key, 'new value' AS value
UNION ALL
SELECT 2 AS key, 'new value 2' AS value;
SELECT *
FROM original;
```
| key | value |
|-----|------------------|
| 1 | original value |
| 2 | original value 2 |
```sql
UPDATE original
SET value = new.value
FROM new
WHERE original.key = new.key;
```
Or:
```sql
UPDATE original
SET value = (
SELECT
new.value
FROM new
WHERE original.key = new.key
);
```
```sql
SELECT *
FROM original;
```
| key | value |
|-----|-------------|
| 1 | new value |
| 2 | new value 2 |
#### Update from Same Table {#docs:stable:sql:statements:update::update-from-same-table}
The only difference between this case and the above is that a different table alias must be specified on both the target table and the source table.
In this example `AS true_original` and `AS new` are both required.
```sql
UPDATE original AS true_original
SET value = (
SELECT
new.value || ' a change!' AS value
FROM original AS new
WHERE true_original.key = new.key
);
```
#### Update Using Joins {#docs:stable:sql:statements:update::update-using-joins}
To select the rows to update, `UPDATE` statements can use the `FROM` clause and express joins via the `WHERE` clause. For example:
```sql
CREATE TABLE city (name VARCHAR, revenue BIGINT, country_code VARCHAR);
CREATE TABLE country (code VARCHAR, name VARCHAR);
INSERT INTO city VALUES ('Paris', 700, 'FR'), ('Lyon', 200, 'FR'), ('Brussels', 400, 'BE');
INSERT INTO country VALUES ('FR', 'France'), ('BE', 'Belgium');
```
To increase the revenue of all cities in France, join the `city` and the `country` tables, and filter on the latter:
```sql
UPDATE city
SET revenue = revenue + 100
FROM country
WHERE city.country_code = country.code
AND country.name = 'France';
```
```sql
SELECT *
FROM city;
```
| name | revenue | country_code |
|----------|--------:|--------------|
| Paris | 800 | FR |
| Lyon | 300 | FR |
| Brussels | 400 | BE |
#### Upsert (Insert or Update) {#docs:stable:sql:statements:update::upsert-insert-or-update}
See the [Insert documentation](#docs:stable:sql:statements:insert::on-conflict-clause) for details.
### USE Statement {#docs:stable:sql:statements:use}
The `USE` statement selects a database and optional schema, or just a schema to use as the default.
#### Examples {#docs:stable:sql:statements:use::examples}
```sql
-- Sets the 'memory' database as the default. Will use the 'main' schema implicitly
-- or error if it does not exist.
USE memory;
-- Sets the 'duck.main' database and schema as the default.
USE duck.main;
-- Sets the 'main' schema of the currently selected database as the default, in this case 'duck.main'.
USE main;
```
#### Syntax {#docs:stable:sql:statements:use::syntax}
The `USE` statement sets a default database, schema or database/schema combination to use for
future operations. For instance, tables created without providing a fully qualified
table name will be created in the default database.
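As a brief sketch of this behavior (the attached file name `other.duckdb` is just an example):
```sql
-- 'other.duckdb' is an example file name
ATTACH 'other.duckdb' AS other;
-- Unqualified names now resolve against the 'other' database
USE other;
-- Creates other.main.t1
CREATE TABLE t1 (id INTEGER);
```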
### VACUUM Statement {#docs:stable:sql:statements:vacuum}
The `VACUUM` statement only has basic support in DuckDB and is mostly provided for PostgreSQL compatibility.
Some variants, such as invoking it for a specific column, recompute the distinct statistics (the number of distinct entries) if they have become stale due to updates.
> **Warning.** The behavior of `VACUUM` is not consistent with PostgreSQL semantics and it is likely going to change in the future.
#### Examples {#docs:stable:sql:statements:vacuum::examples}
No-op:
```sql
VACUUM;
```
No-op:
```sql
VACUUM ANALYZE;
```
Calling `VACUUM` on a given table-column pair rebuilds statistics for the table and column:
```sql
VACUUM my_table(my_column);
```
Rebuild statistics for the table and column:
```sql
VACUUM ANALYZE my_table(my_column);
```
The following operation is not supported:
```sql
VACUUM FULL;
```
```console
Not implemented Error:
Full vacuum option
```
#### Reclaiming Space {#docs:stable:sql:statements:vacuum::reclaiming-space}
The `VACUUM` statement does not reclaim space.
For instructions on reclaiming space, refer to the [“Reclaiming space” page](#docs:stable:operations_manual:footprint_of_duckdb:reclaiming_space).
#### Syntax {#docs:stable:sql:statements:vacuum::syntax}
## Query Syntax {#sql:query_syntax}
### SELECT Clause {#docs:stable:sql:query_syntax:select}
The `SELECT` clause specifies the list of columns that will be returned by the query. While it appears first in the query, *logically* the expressions here are executed only at the end. The `SELECT` clause can contain arbitrary expressions that transform the output, as well as aggregates and window functions.
#### Examples {#docs:stable:sql:query_syntax:select::examples}
Select all columns from the table called `tbl`:
```sql
SELECT * FROM tbl;
```
Perform arithmetic on the columns in a table, and provide an alias:
```sql
SELECT col1 + col2 AS res, sqrt(col1) AS root FROM tbl;
```
Use prefix aliases:
```sql
SELECT
res: col1 + col2,
root: sqrt(col1)
FROM tbl;
```
Select all unique cities from the `addresses` table:
```sql
SELECT DISTINCT city FROM addresses;
```
Return the total number of rows in the `addresses` table:
```sql
SELECT count(*) FROM addresses;
```
Select all columns except the city column from the `addresses` table:
```sql
SELECT * EXCLUDE (city) FROM addresses;
```
Select all columns from the `addresses` table, but replace `city` with `lower(city)`:
```sql
SELECT * REPLACE (lower(city) AS city) FROM addresses;
```
Select all columns matching the given regular expression from the table:
```sql
SELECT COLUMNS('number\d+') FROM addresses;
```
Compute a function on all given columns of a table:
```sql
SELECT min(COLUMNS(*)) FROM addresses;
```
To select columns with spaces or special characters, use double quotes (`"`):
```sql
SELECT "Some Column Name" FROM tbl;
```
#### Syntax {#docs:stable:sql:query_syntax:select::syntax}
#### `SELECT` List {#docs:stable:sql:query_syntax:select::select-list}
The `SELECT` clause contains a list of expressions that specify the result of a query. The select list can refer to any columns in the `FROM` clause, and combine them using expressions. As the output of a SQL query is a table, every expression in the `SELECT` clause also has a name. The expressions can be explicitly named using the `AS` clause (e.g., `expr AS name`). If a name is not provided by the user the expressions are named automatically by the system.
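For instance, in the following sketch the first expression is named explicitly via `AS`, while the second receives a system-generated name:
```sql
SELECT 21 * 2 AS answer, 21 * 2;
```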
> Column names are case-insensitive. See the [Rules for Case Sensitivity](#docs:stable:sql:dialect:keywords_and_identifiers::rules-for-case-sensitivity) for more details.
##### Star Expressions {#docs:stable:sql:query_syntax:select::star-expressions}
Select all columns from the table called `tbl`:
```sql
SELECT *
FROM tbl;
```
Select all columns matching the given regular expression from the table:
```sql
SELECT COLUMNS('number\d+')
FROM addresses;
```
The [star expression](#docs:stable:sql:expressions:star) is a special expression that expands to *multiple expressions* based on the contents of the `FROM` clause. In the simplest case, `*` expands to **all** expressions in the `FROM` clause. Columns can also be selected using regular expressions or lambda functions. See the [star expression page](#docs:stable:sql:expressions:star) for more details.
##### `DISTINCT` Clause {#docs:stable:sql:query_syntax:select::distinct-clause}
Select all unique cities from the addresses table:
```sql
SELECT DISTINCT city
FROM addresses;
```
The `DISTINCT` clause can be used to return **only** the unique rows in the result, so that any duplicate rows are filtered out.
> Queries starting with `SELECT DISTINCT` run deduplication, which is an expensive operation. Therefore, only use `DISTINCT` if necessary.
##### `DISTINCT ON` Clause {#docs:stable:sql:query_syntax:select::distinct-on-clause}
Select only the highest population city for each country:
```sql
SELECT DISTINCT ON(country) city, population
FROM cities
ORDER BY population DESC;
```
The `DISTINCT ON` clause returns only one row per unique value in the set of expressions as defined in the `ON` clause. If an `ORDER BY` clause is present, the row that is returned is the first row that is encountered as per the `ORDER BY` criteria. If an `ORDER BY` clause is not present, the first row that is encountered is not defined and can be any row in the table.
> When querying large datasets, using `DISTINCT` on all columns can be expensive. Therefore, consider using `DISTINCT ON` on a column (or a set of columns) which guarantees a sufficient degree of uniqueness for your results. For example, using `DISTINCT ON` on the key column(s) of a table guarantees full uniqueness.
##### Aggregates {#docs:stable:sql:query_syntax:select::aggregates}
Return the total number of rows in the addresses table:
```sql
SELECT count(*)
FROM addresses;
```
Return the total number of rows in the addresses table grouped by city:
```sql
SELECT city, count(*)
FROM addresses
GROUP BY city;
```
[Aggregate functions](#docs:stable:sql:functions:aggregates) are special functions that *combine* multiple rows into a single value. When aggregate functions are present in the `SELECT` clause, the query is turned into an aggregate query. In an aggregate query, **all** expressions must either be part of an aggregate function, or part of a group (as specified by the [`GROUP BY clause`](#docs:stable:sql:query_syntax:groupby)).
##### Window Functions {#docs:stable:sql:query_syntax:select::window-functions}
Generate a `row_number` column containing incremental identifiers for each row:
```sql
SELECT row_number() OVER ()
FROM sales;
```
Compute the difference between the current amount, and the previous amount, by order of time:
```sql
SELECT amount - lag(amount) OVER (ORDER BY time)
FROM sales;
```
[Window functions](#docs:stable:sql:functions:window_functions) are special functions that allow the computation of values relative to *other rows* in a result. Window functions are marked by the `OVER` clause which contains the *window specification*. The window specification defines the frame or context in which the window function is computed. See the [window functions page](#docs:stable:sql:functions:window_functions) for more information.
##### `unnest` Function {#docs:stable:sql:query_syntax:select::unnest-function}
Unnest an array by one level:
```sql
SELECT unnest([1, 2, 3]);
```
Unnest a struct by one level:
```sql
SELECT unnest({'a': 42, 'b': 84});
```
The [`unnest`](#docs:stable:sql:query_syntax:unnest) function is a special function that can be used together with [arrays](#docs:stable:sql:data_types:array), [lists](#docs:stable:sql:data_types:list), or [structs](#docs:stable:sql:data_types:struct). The unnest function strips one level of nesting from the type. For example, `INTEGER[]` is transformed into `INTEGER`. `STRUCT(a INTEGER, b INTEGER)` is transformed into `a INTEGER, b INTEGER`. The unnest function can be used to transform nested types into regular scalar types, which makes them easier to operate on.
### FROM and JOIN Clauses {#docs:stable:sql:query_syntax:from}
The `FROM` clause specifies the *source* of the data on which the remainder of the query should operate. Logically, the `FROM` clause is where the query starts execution. The `FROM` clause can contain a single table, a combination of multiple tables that are joined together using `JOIN` clauses, or another `SELECT` query inside a subquery node. DuckDB also has an optional `FROM`-first syntax which enables you to also query without a `SELECT` statement.
#### Examples {#docs:stable:sql:query_syntax:from::examples}
Select all columns from the table called `tbl`:
```sql
SELECT *
FROM tbl;
```
Select all columns from the table using the `FROM`-first syntax:
```sql
FROM tbl
SELECT *;
```
Select all columns using the `FROM`-first syntax and omitting the `SELECT` clause:
```sql
FROM tbl;
```
Select all columns from the table called `tbl` through an alias `tn`:
```sql
SELECT tn.*
FROM tbl tn;
```
Use a prefix alias:
```sql
SELECT tn.*
FROM tn: tbl;
```
Select all columns from the table `tbl` in the schema `schema_name`:
```sql
SELECT *
FROM schema_name.tbl;
```
Select the column `i` from the table function `range`, where the first column of the range function is renamed to `i`:
```sql
SELECT t.i
FROM range(100) AS t(i);
```
Select all columns from the CSV file called `test.csv`:
```sql
SELECT *
FROM 'test.csv';
```
Select all columns from a subquery:
```sql
SELECT *
FROM (SELECT * FROM tbl);
```
Select the entire row of the table as a struct:
```sql
SELECT t
FROM t;
```
Select the entire row of the subquery as a struct (i.e., a single column):
```sql
SELECT t
FROM (SELECT unnest(generate_series(41, 43)) AS x, 'hello' AS y) t;
```
Join two tables together:
```sql
SELECT *
FROM tbl
JOIN other_table
ON tbl.key = other_table.key;
```
Select a 10% sample from a table:
```sql
SELECT *
FROM tbl
TABLESAMPLE 10%;
```
Select a sample of 10 rows from a table:
```sql
SELECT *
FROM tbl
TABLESAMPLE 10 ROWS;
```
Use the `FROM`-first syntax with `WHERE` clause and aggregation:
```sql
FROM range(100) AS t(i)
SELECT sum(t.i)
WHERE i % 2 = 0;
```
##### Table Functions {#docs:stable:sql:query_syntax:from::table-functions}
Some functions in DuckDB return entire tables rather than individual values. These functions are accordingly called _table functions_ and can be used in the `FROM` clause like regular table references.
Examples include [`read_csv`](#docs:stable:data:csv:overview::csv-functions), [`read_parquet`](#docs:stable:data:parquet:overview::read_parquet-function), [`range`](#docs:stable:sql:functions:list::rangestart-stop-step), [`generate_series`](#docs:stable:sql:functions:list::generate_seriesstart-stop-step), [`repeat`](#docs:stable:sql:functions:utility::repeat_rowvarargs-num_rows), [`unnest`](#docs:stable:sql:query_syntax:unnest), and [`glob`](#docs:stable:sql:functions:utility::globsearch_path) (note that some of the examples here can be used as both scalar and table functions).
For example,
```sql
SELECT *
FROM 'test.csv';
```
is implicitly translated to a call of the `read_csv` table function:
```sql
SELECT *
FROM read_csv('test.csv');
```
All table functions support the `WITH ORDINALITY` modifier, which extends the returned table by an integer column `ordinality` that enumerates the generated rows starting at `1`.
```sql
SELECT *
FROM read_csv('test.csv') WITH ORDINALITY;
```
Note that the same result could be achieved using the [`row_number` window function](#docs:stable:sql:functions:window_functions::row_numberorder-by-ordering).
In the presence of [joins](#::joins), however, `WITH ORDINALITY` allows enumerating one side of the join instead of the final result set, without having to resort to sub-queries.
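As a rough sketch of this (assuming a hypothetical table `tbl` with an integer column `i`), the generated series is enumerated rather than the join result:
```sql
-- tbl is a hypothetical table with an integer column i;
-- the ordinality column numbers the rows of the series, not the joined output
SELECT ordinality, generate_series, tbl.i
FROM generate_series(10, 30, 10) WITH ORDINALITY
JOIN tbl ON generate_series = tbl.i;
```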
#### Joins {#docs:stable:sql:query_syntax:from::joins}
Joins are a fundamental relational operation used to connect two tables or relations horizontally.
The relations are referred to as the _left_ and _right_ sides of the join
based on how they are written in the join clause.
Each result row has the columns from both relations.
A join uses a rule to match pairs of rows from each relation.
Often this is a predicate, but there are other implied rules that may be specified.
##### Outer Joins {#docs:stable:sql:query_syntax:from::outer-joins}
Rows that do not have any matches can still be returned if an `OUTER` join is specified.
Outer joins can be one of:
* `LEFT` (All rows from the left relation appear at least once)
* `RIGHT` (All rows from the right relation appear at least once)
* `FULL` (All rows from both relations appear at least once)
A join that is not `OUTER` is `INNER` (only rows that get paired are returned).
When an unpaired row is returned, the attributes from the other table are set to `NULL`.
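As a minimal sketch using hypothetical `orders` and `customers` tables, a `LEFT OUTER JOIN` keeps every order and fills the customer columns with `NULL` where no match exists:
```sql
-- orders and customers are hypothetical tables used only for illustration
SELECT o.order_id, o.amount, c.name
FROM orders AS o
LEFT OUTER JOIN customers AS c ON o.customer_id = c.customer_id;
```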
##### Cross Product Joins (Cartesian Product) {#docs:stable:sql:query_syntax:from::cross-product-joins-cartesian-product}
The simplest type of join is a `CROSS JOIN`.
There are no conditions for this type of join,
and it just returns all the possible pairs.
Return all pairs of rows:
```sql
SELECT a.*, b.*
FROM a
CROSS JOIN b;
```
This is equivalent to omitting the `JOIN` clause:
```sql
SELECT a.*, b.*
FROM a, b;
```
##### Conditional Joins {#docs:stable:sql:query_syntax:from::conditional-joins}
Most joins are specified by a predicate that connects
attributes from one side to attributes from the other side.
The conditions can be explicitly specified using an `ON` clause
with the join (clearer) or implied by the `WHERE` clause (old-fashioned).
We use the `l_regions` and the `l_nations` tables from the TPC-H schema:
```sql
CREATE TABLE l_regions (
r_regionkey INTEGER NOT NULL PRIMARY KEY,
r_name CHAR(25) NOT NULL,
r_comment VARCHAR(152)
);
CREATE TABLE l_nations (
n_nationkey INTEGER NOT NULL PRIMARY KEY,
n_name CHAR(25) NOT NULL,
n_regionkey INTEGER NOT NULL,
n_comment VARCHAR(152),
FOREIGN KEY (n_regionkey) REFERENCES l_regions(r_regionkey)
);
```
Return the regions for the nations:
```sql
SELECT n.*, r.*
FROM l_nations n
JOIN l_regions r ON (n_regionkey = r_regionkey);
```
If the column names are the same and are required to be equal,
then the simpler `USING` syntax can be used:
```sql
CREATE TABLE l_regions (regionkey INTEGER NOT NULL PRIMARY KEY,
name CHAR(25) NOT NULL,
comment VARCHAR(152));
CREATE TABLE l_nations (nationkey INTEGER NOT NULL PRIMARY KEY,
name CHAR(25) NOT NULL,
regionkey INTEGER NOT NULL,
comment VARCHAR(152),
FOREIGN KEY (regionkey) REFERENCES l_regions(regionkey));
```
Return the regions for the nations:
```sql
SELECT n.*, r.*
FROM l_nations n
JOIN l_regions r USING (regionkey);
```
The expressions do not have to be equalities; any predicate can be used.
Return the pairs of jobs where one ran longer but cost less:
```sql
SELECT s1.t_id, s2.t_id
FROM west s1, west s2
WHERE s1.time > s2.time
AND s1.cost < s2.cost;
```
##### Natural Joins {#docs:stable:sql:query_syntax:from::natural-joins}
Natural joins join two tables based on attributes that share the same name.
For example, take the following example with cities, airport codes and airport names. Note that both tables are intentionally incomplete, i.e., they do not have a matching pair in the other table.
```sql
CREATE TABLE city_airport (city_name VARCHAR, iata VARCHAR);
CREATE TABLE airport_names (iata VARCHAR, airport_name VARCHAR);
INSERT INTO city_airport VALUES
('Amsterdam', 'AMS'),
('Rotterdam', 'RTM'),
('Eindhoven', 'EIN'),
('Groningen', 'GRQ');
INSERT INTO airport_names VALUES
('AMS', 'Amsterdam Airport Schiphol'),
('RTM', 'Rotterdam The Hague Airport'),
('MST', 'Maastricht Aachen Airport');
```
To join the tables on their shared [`IATA`](https://en.wikipedia.org/wiki/IATA_airport_code) attributes, run:
```sql
SELECT *
FROM city_airport
NATURAL JOIN airport_names;
```
This produces the following result:
| city_name | iata | airport_name |
|-----------|------|-----------------------------|
| Amsterdam | AMS | Amsterdam Airport Schiphol |
| Rotterdam | RTM | Rotterdam The Hague Airport |
Note that only rows where the same `iata` attribute was present in both tables were included in the result.
We can also express this query using the vanilla `JOIN` clause with the `USING` keyword:
```sql
SELECT *
FROM city_airport
JOIN airport_names
USING (iata);
```
##### Semi and Anti Joins {#docs:stable:sql:query_syntax:from::semi-and-anti-joins}
Semi joins return rows from the left table that have at least one match in the right table.
Anti joins return rows from the left table that have _no_ matches in the right table.
When using a semi or anti join, the result will never have more rows than the left-hand side table.
Semi joins provide the same logic as the [`IN` operator](#docs:stable:sql:expressions:in) statement.
Anti joins provide the same logic as the `NOT IN` operator, except anti joins ignore `NULL` values from the right table.
###### Semi Join Example {#docs:stable:sql:query_syntax:from::semi-join-example}
Return a list of city-airport code pairs from the `city_airport` table where the airport name **is available** in the `airport_names` table:
```sql
SELECT *
FROM city_airport
SEMI JOIN airport_names
USING (iata);
```
| city_name | iata |
|-----------|------|
| Amsterdam | AMS |
| Rotterdam | RTM |
This query is equivalent to:
```sql
SELECT *
FROM city_airport
WHERE iata IN (SELECT iata FROM airport_names);
```
###### Anti Join Example {#docs:stable:sql:query_syntax:from::anti-join-example}
Return a list of city-airport code pairs from the `city_airport` table where the airport name **is not available** in the `airport_names` table:
```sql
SELECT *
FROM city_airport
ANTI JOIN airport_names
USING (iata);
```
| city_name | iata |
|-----------|------|
| Eindhoven | EIN |
| Groningen | GRQ |
This query is equivalent to:
```sql
SELECT *
FROM city_airport
WHERE iata NOT IN (SELECT iata FROM airport_names WHERE iata IS NOT NULL);
```
##### Lateral Joins {#docs:stable:sql:query_syntax:from::lateral-joins}
The `LATERAL` keyword allows subqueries in the `FROM` clause to refer to previous subqueries. This feature is also known as a _lateral join_.
```sql
SELECT *
FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);
```
| i | j |
|--:|--:|
| 0 | 1 |
| 2 | 3 |
| 1 | 2 |
Lateral joins are a generalization of correlated subqueries, as they can return multiple values per input value rather than only a single value.
```sql
SELECT *
FROM
generate_series(0, 1) t(i),
LATERAL (SELECT i + 10 UNION ALL SELECT i + 100) t2(j);
```
| i | j |
|--:|----:|
| 0 | 10 |
| 1 | 11 |
| 0 | 100 |
| 1 | 101 |
It may be helpful to think of `LATERAL` as a loop where we iterate through the rows of the first subquery and use each row as input to the second (`LATERAL`) subquery.
In the examples above, we iterate through table `t` and refer to its column `i` from the definition of table `t2`. The rows of `t2` form column `j` in the result.
It is possible to refer to multiple attributes from the `LATERAL` subquery. Using the table from the first example:
```sql
CREATE TABLE t1 AS
SELECT *
FROM range(3) t(i), LATERAL (SELECT i + 1) t2(j);
SELECT *
FROM t1, LATERAL (SELECT i + j) t2(k)
ORDER BY ALL;
```
| i | j | k |
|--:|--:|--:|
| 0 | 1 | 1 |
| 1 | 2 | 3 |
| 2 | 3 | 5 |
> DuckDB detects when `LATERAL` joins should be used, making the use of the `LATERAL` keyword optional.
##### Positional Joins {#docs:stable:sql:query_syntax:from::positional-joins}
When working with data frames or other embedded tables of the same size,
the rows may have a natural correspondence based on their physical order.
In scripting languages, this is easily expressed using a loop:
```cpp
for (i = 0; i < n; i++) {
f(t1.a[i], t2.b[i]);
}
```
It is difficult to express this in standard SQL because
relational tables are not ordered, but imported tables such as [data frames](#docs:stable:clients:python:data_ingestion::pandas-dataframes--object-columns)
or disk files (like [CSVs](#docs:stable:data:csv:overview) or [Parquet files](#docs:stable:data:parquet:overview)) do have a natural ordering.
Connecting them using this ordering is called a _positional join:_
```sql
CREATE TABLE t1 (x INTEGER);
CREATE TABLE t2 (s VARCHAR);
INSERT INTO t1 VALUES (1), (2), (3);
INSERT INTO t2 VALUES ('a'), ('b');
SELECT *
FROM t1
POSITIONAL JOIN t2;
```
| x | s |
|--:|------|
| 1 | a |
| 2 | b |
| 3 | NULL |
Positional joins are always `FULL OUTER` joins, i.e., missing values (the last values in the shorter column) are set to `NULL`.
##### As-Of Joins {#docs:stable:sql:query_syntax:from::as-of-joins}
A common operation when working with temporal or similarly-ordered data
is to find the nearest (first) event in a reference table (such as prices).
This is called an _as-of join:_
Attach prices to stock trades:
```sql
SELECT t.*, p.price
FROM trades t
ASOF JOIN prices p
ON t.symbol = p.symbol AND t.when >= p.when;
```
The `ASOF` join requires at least one inequality condition on the ordering field.
The inequality can be any inequality condition (`>=`, `>`, `<=`, `<`)
on any data type, but the most common form is `>=` on a temporal type.
Any other conditions must be equalities (or `NOT DISTINCT`).
This means that the left/right order of the tables is significant.
`ASOF` joins each left side row with at most one right side row.
It can be specified as an `OUTER` join to find unpaired rows
(e.g., trades without prices or prices which have no trades).
Attach prices or NULLs to stock trades:
```sql
SELECT *
FROM trades t
ASOF LEFT JOIN prices p
ON t.symbol = p.symbol
AND t.when >= p.when;
```
`ASOF` joins can also specify join conditions on matching column names with the `USING` syntax,
but the *last* attribute in the list must be the inequality,
which will be greater than or equal to (`>=`):
```sql
SELECT *
FROM trades t
ASOF JOIN prices p USING (symbol, "when");
```
The above query returns `symbol`, `trades.when`, and `price` (but not `prices.when`):
if you combine `USING` with a `SELECT *` like this,
the query returns the left side (probe) column values for the matches,
not the right side (build) column values.
To get the `prices` times in the example, you will need to list the columns explicitly:
```sql
SELECT t.symbol, t.when AS trade_when, p.when AS price_when, price
FROM trades t
ASOF LEFT JOIN prices p USING (symbol, "when");
```
##### Self-Joins {#docs:stable:sql:query_syntax:from::self-joins}
DuckDB allows self-joins for all types of joins.
Note that tables need to be aliased; using the same table name without aliases will result in an error:
```sql
CREATE TABLE t (x INTEGER);
SELECT * FROM t JOIN t USING(x);
```
```console
Binder Error:
Duplicate alias "t" in query!
```
Adding the aliases allows the query to parse successfully:
```sql
SELECT * FROM t AS t1 JOIN t AS t2 USING(x);
```
##### Shorthands in the `JOIN` Clause {#docs:stable:sql:query_syntax:from::shorthands-in-the-join-clause}
You can specify column names in the `JOIN` clause:
```sql
CREATE TABLE t1 (x INTEGER);
CREATE TABLE t2 (y INTEGER);
INSERT INTO t1 VALUES (1), (2), (4);
INSERT INTO t2 VALUES (2), (3);
SELECT * FROM t1 NATURAL JOIN t2 t2(x);
```
| x |
|--:|
| 2 |
You can also use the `VALUES` clause in the `JOIN` clause:
```sql
SELECT * FROM t1 NATURAL JOIN (VALUES (2), (4)) _(x);
```
| x |
|--:|
| 2 |
| 4 |
#### `FROM`-First Syntax {#docs:stable:sql:query_syntax:from::from-first-syntax}
DuckDB's SQL supports the `FROM`-first syntax, i.e., it allows putting the `FROM` clause before the `SELECT` clause or completely omitting the `SELECT` clause. We use the following example to demonstrate it:
```sql
CREATE TABLE tbl AS
SELECT *
FROM (VALUES ('a'), ('b')) t1(s), range(1, 3) t2(i);
```
##### `FROM`-First Syntax with a `SELECT` Clause {#docs:stable:sql:query_syntax:from::from-first-syntax-with-a-select-clause}
The following statement demonstrates the use of the `FROM`-first syntax:
```sql
FROM tbl
SELECT i, s;
```
This is equivalent to:
```sql
SELECT i, s
FROM tbl;
```
| i | s |
|--:|---|
| 1 | a |
| 2 | a |
| 1 | b |
| 2 | b |
##### `FROM`-First Syntax without a `SELECT` Clause {#docs:stable:sql:query_syntax:from::from-first-syntax-without-a-select-clause}
The following statement demonstrates the use of the optional `SELECT` clause:
```sql
FROM tbl;
```
This is equivalent to:
```sql
SELECT *
FROM tbl;
```
| s | i |
|---|--:|
| a | 1 |
| a | 2 |
| b | 1 |
| b | 2 |
#### Syntax {#docs:stable:sql:query_syntax:from::syntax}
### WHERE Clause {#docs:stable:sql:query_syntax:where}
The `WHERE` clause specifies any filters to apply to the data. This allows you to select only a subset of the data in which you are interested. Logically the `WHERE` clause is applied immediately after the `FROM` clause.
#### Examples {#docs:stable:sql:query_syntax:where::examples}
Select all rows where the `id` is equal to 3:
```sql
SELECT *
FROM tbl
WHERE id = 3;
```
Select all rows that match the given **case-sensitive** `LIKE` expression:
```sql
SELECT *
FROM tbl
WHERE name LIKE '%mark%';
```
Select all rows that match the given **case-insensitive** expression formulated with the `ILIKE` operator:
```sql
SELECT *
FROM tbl
WHERE name ILIKE '%mark%';
```
Select all rows that match the given composite expression:
```sql
SELECT *
FROM tbl
WHERE id = 3 OR id = 7;
```
#### Syntax {#docs:stable:sql:query_syntax:where::syntax}
### GROUP BY Clause {#docs:stable:sql:query_syntax:groupby}
The `GROUP BY` clause specifies which grouping columns should be used to perform any aggregations in the `SELECT` clause.
If the `GROUP BY` clause is specified, the query is always an aggregate query, even if no aggregations are present in the `SELECT` clause.
When a `GROUP BY` clause is specified, all tuples that have matching data in the grouping columns (i.e., all tuples that belong to the same group) will be combined.
The values of the grouping columns themselves are unchanged, and any other columns can be combined using an [aggregate function](#docs:stable:sql:functions:aggregates) (such as `count`, `sum`, `avg`, etc).
#### `GROUP BY ALL` {#docs:stable:sql:query_syntax:groupby::group-by-all}
Use `GROUP BY ALL` to `GROUP BY` all columns in the `SELECT` statement that are not wrapped in aggregate functions.
This simplifies the syntax by allowing the columns list to be maintained in a single location, and prevents bugs by keeping the `SELECT` granularity aligned to the `GROUP BY` granularity (e.g., it prevents duplication).
See examples below and additional examples in the [“Friendlier SQL with DuckDB” blog post](https://duckdb.org/2022/05/04/friendlier-sql#group-by-all).
#### Multiple Dimensions {#docs:stable:sql:query_syntax:groupby::multiple-dimensions}
Normally, the `GROUP BY` clause groups along a single dimension.
Using the [`GROUPING SETS`, `CUBE` or `ROLLUP` clauses](#docs:stable:sql:query_syntax:grouping_sets) it is possible to group along multiple dimensions.
See the [`GROUPING SETS`](#docs:stable:sql:query_syntax:grouping_sets) page for more information.
#### Examples {#docs:stable:sql:query_syntax:groupby::examples}
Count the number of entries in the `addresses` table that belong to each different city:
```sql
SELECT city, count(*)
FROM addresses
GROUP BY city;
```
Compute the average income per city per street_name:
```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY city, street_name;
```
##### `GROUP BY ALL` Examples {#docs:stable:sql:query_syntax:groupby::group-by-all-examples}
Group by city and street_name to remove any duplicate values:
```sql
SELECT city, street_name
FROM addresses
GROUP BY ALL;
```
Compute the average income per city per street_name. Since income is wrapped in an aggregate function, do not include it in the `GROUP BY`:
```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY ALL;
-- GROUP BY city, street_name:
```
#### Syntax {#docs:stable:sql:query_syntax:groupby::syntax}
### GROUPING SETS {#docs:stable:sql:query_syntax:grouping_sets}
`GROUPING SETS`, `ROLLUP` and `CUBE` can be used in the `GROUP BY` clause to perform a grouping over multiple dimensions within the same query.
Note that this syntax is not compatible with [`GROUP BY ALL`](#docs:stable:sql:query_syntax:groupby::group-by-all).
#### Examples {#docs:stable:sql:query_syntax:grouping_sets::examples}
Compute the average income along the provided four different dimensions:
```sql
-- the syntax () denotes the empty set (i.e., computing an ungrouped aggregate)
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY GROUPING SETS ((city, street_name), (city), (street_name), ());
```
Compute the average income along the same dimensions:
```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY CUBE (city, street_name);
```
Compute the average income along the dimensions `(city, street_name)`, `(city)` and `()`:
```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY ROLLUP (city, street_name);
```
#### Description {#docs:stable:sql:query_syntax:grouping_sets::description}
`GROUPING SETS` perform the same aggregate across different `GROUP BY clauses` in a single query.
```sql
CREATE TABLE students (course VARCHAR, type VARCHAR);
INSERT INTO students (course, type)
VALUES
('CS', 'Bachelor'), ('CS', 'Bachelor'), ('CS', 'PhD'), ('Math', 'Masters'),
('CS', NULL), ('CS', NULL), ('Math', NULL);
```
```sql
SELECT course, type, count(*)
FROM students
GROUP BY GROUPING SETS ((course, type), course, type, ());
```
| course | type | count_star() |
|--------|----------|-------------:|
| Math | NULL | 1 |
| NULL | NULL | 7 |
| CS | PhD | 1 |
| CS | Bachelor | 2 |
| Math | Masters | 1 |
| CS | NULL | 2 |
| Math | NULL | 2 |
| CS | NULL | 5 |
| NULL | NULL | 3 |
| NULL | Masters | 1 |
| NULL | Bachelor | 2 |
| NULL | PhD | 1 |
In the above query, we group across four different sets: `course, type`, `course`, `type` and `()` (the empty group). The result contains `NULL` for a group which is not in the grouping set for the result, i.e., the above query is equivalent to the following set of `GROUP BY` queries combined via `UNION ALL`:
```sql
-- Group by course, type:
SELECT course, type, count(*)
FROM students
GROUP BY course, type
UNION ALL
-- Group by type:
SELECT NULL AS course, type, count(*)
FROM students
GROUP BY type
UNION ALL
-- Group by course:
SELECT course, NULL AS type, count(*)
FROM students
GROUP BY course
UNION ALL
-- Group by nothing:
SELECT NULL AS course, NULL AS type, count(*)
FROM students;
```
`CUBE` and `ROLLUP` are syntactic sugar to easily produce commonly used grouping sets.
The `ROLLUP` clause will produce all “sub-groups” of a grouping set, e.g., `ROLLUP (country, city, zip)` produces the grouping sets `(country, city, zip), (country, city), (country), ()`. This can be useful for producing different levels of detail of a `GROUP BY` clause. This produces `n+1` grouping sets, where `n` is the number of terms in the `ROLLUP` clause.
`CUBE` produces grouping sets for all combinations of the inputs, e.g., `CUBE (country, city, zip)` will produce `(country, city, zip), (country, city), (country, zip), (city, zip), (country), (city), (zip), ()`. This produces `2^n` grouping sets.
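For example, applying `ROLLUP` to the `students` table above is shorthand for the grouping sets `(course, type)`, `(course)` and `()`:
```sql
-- Equivalent to GROUP BY GROUPING SETS ((course, type), (course), ())
SELECT course, type, count(*)
FROM students
GROUP BY ROLLUP (course, type);
```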
#### Identifying Grouping Sets with `GROUPING_ID()` {#docs:stable:sql:query_syntax:grouping_sets::identifying-grouping-sets-with-grouping_id}
The super-aggregate rows generated by `GROUPING SETS`, `ROLLUP` and `CUBE` can often be identified by `NULL`-values returned for the respective column in the grouping. But if the columns used in the grouping can themselves contain actual `NULL`-values, then it can be challenging to distinguish whether the value in the resultset is a “real” `NULL`-value coming out of the data itself, or a `NULL`-value generated by the grouping construct. The `GROUPING_ID()` or `GROUPING()` function is designed to identify which groups generated the super-aggregate rows in the result.
`GROUPING_ID()` is an aggregate function that takes the column expressions that make up the grouping(s). It returns a `BIGINT` value. The return value is `0` for the rows that are not super-aggregate rows. But for the super-aggregate rows, it returns an integer value that identifies the combination of expressions that make up the group for which the super-aggregate is generated. At this point, an example might help. Consider the following query:
```sql
WITH days AS (
SELECT
year("generate_series") AS y,
quarter("generate_series") AS q,
month("generate_series") AS m
FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
SELECT y, q, m, GROUPING_ID(y, q, m) AS "grouping_id()"
FROM days
GROUP BY GROUPING SETS (
(y, q, m),
(y, q),
(y),
()
)
ORDER BY y, q, m;
```
These are the results:
| y | q | m | grouping_id() |
|-----:|-----:|-----:|--------------:|
| 2023 | 1 | 1 | 0 |
| 2023 | 1 | 2 | 0 |
| 2023 | 1 | 3 | 0 |
| 2023 | 1 | NULL | 1 |
| 2023 | 2 | 4 | 0 |
| 2023 | 2 | 5 | 0 |
| 2023 | 2 | 6 | 0 |
| 2023 | 2 | NULL | 1 |
| 2023 | 3 | 7 | 0 |
| 2023 | 3 | 8 | 0 |
| 2023 | 3 | 9 | 0 |
| 2023 | 3 | NULL | 1 |
| 2023 | 4 | 10 | 0 |
| 2023 | 4 | 11 | 0 |
| 2023 | 4 | 12 | 0 |
| 2023 | 4 | NULL | 1 |
| 2023 | NULL | NULL | 3 |
| NULL | NULL | NULL | 7 |
In this example, the lowest level of grouping is at the month level, defined by the grouping set `(y, q, m)`. Result rows corresponding to that level are simply aggregate rows and the `GROUPING_ID(y, q, m)` function returns `0` for those. The grouping set `(y, q)` results in super-aggregate rows over the month level, leaving a `NULL`-value for the `m` column, and for which `GROUPING_ID(y, q, m)` returns `1`. The grouping set `(y)` results in super-aggregate rows over the quarter level, leaving `NULL`-values for the `m` and `q` column, for which `GROUPING_ID(y, q, m)` returns `3`. Finally, the `()` grouping set results in one super-aggregate row for the entire resultset, leaving `NULL`-values for `y`, `q` and `m` and for which `GROUPING_ID(y, q, m)` returns `7`.
To understand the relationship between the return value and the grouping set, you can think of `GROUPING_ID(y, q, m)` writing to a bitfield, where the first bit corresponds to the last expression passed to `GROUPING_ID()`, the second bit to the one-but-last expression passed to `GROUPING_ID()`, and so on. This may become clearer by casting `GROUPING_ID()` to `BIT`:
```sql
WITH days AS (
SELECT
year("generate_series") AS y,
quarter("generate_series") AS q,
month("generate_series") AS m
FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
SELECT
y, q, m,
GROUPING_ID(y, q, m) AS "grouping_id(y, q, m)",
right(GROUPING_ID(y, q, m)::BIT::VARCHAR, 3) AS "y_q_m_bits"
FROM days
GROUP BY GROUPING SETS (
(y, q, m),
(y, q),
(y),
()
)
ORDER BY y, q, m;
```
Which returns these results:
| y | q | m | grouping_id(y, q, m) | y_q_m_bits |
|-----:|-----:|-----:|---------------------:|------------|
| 2023 | 1 | 1 | 0 | 000 |
| 2023 | 1 | 2 | 0 | 000 |
| 2023 | 1 | 3 | 0 | 000 |
| 2023 | 1 | NULL | 1 | 001 |
| 2023 | 2 | 4 | 0 | 000 |
| 2023 | 2 | 5 | 0 | 000 |
| 2023 | 2 | 6 | 0 | 000 |
| 2023 | 2 | NULL | 1 | 001 |
| 2023 | 3 | 7 | 0 | 000 |
| 2023 | 3 | 8 | 0 | 000 |
| 2023 | 3 | 9 | 0 | 000 |
| 2023 | 3 | NULL | 1 | 001 |
| 2023 | 4 | 10 | 0 | 000 |
| 2023 | 4 | 11 | 0 | 000 |
| 2023 | 4 | 12 | 0 | 000 |
| 2023 | 4 | NULL | 1 | 001 |
| 2023 | NULL | NULL | 3 | 011 |
| NULL | NULL | NULL | 7 | 111 |
Note that the number of expressions passed to `GROUPING_ID()`, and the order in which they are passed, are independent of the actual group definitions appearing in the `GROUPING SETS` clause (or the groups implied by `ROLLUP` and `CUBE`). As long as the expressions passed to `GROUPING_ID()` appear somewhere in the `GROUPING SETS` clause, `GROUPING_ID()` will set a bit corresponding to the position of the expression whenever that expression is rolled up to a super-aggregate.
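To illustrate, here is a sketch that reuses the `days` CTE from above but reverses the argument order. Following the bitfield interpretation, the `(y, q)` super-aggregate rows now return `4` (binary `100`) instead of `1`, the `(y)` rows return `6` instead of `3`, and the grand-total row still returns `7`:
```sql
WITH days AS (
    SELECT
        year("generate_series") AS y,
        quarter("generate_series") AS q,
        month("generate_series") AS m
    FROM generate_series(DATE '2023-01-01', DATE '2023-12-31', INTERVAL 1 DAY)
)
-- Same grouping sets as above, only the GROUPING_ID arguments are reversed
SELECT y, q, m, GROUPING_ID(m, q, y) AS "grouping_id(m, q, y)"
FROM days
GROUP BY GROUPING SETS (
    (y, q, m),
    (y, q),
    (y),
    ()
)
ORDER BY y, q, m;
```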
#### Syntax {#docs:stable:sql:query_syntax:grouping_sets::syntax}
### HAVING Clause {#docs:stable:sql:query_syntax:having}
The `HAVING` clause can be used after the `GROUP BY` clause to provide filter criteria *after* the grouping has been completed. In terms of syntax the `HAVING` clause is identical to the `WHERE` clause, but while the `WHERE` clause occurs before the grouping, the `HAVING` clause occurs after the grouping.
#### Examples {#docs:stable:sql:query_syntax:having::examples}
Count the number of entries in the `addresses` table that belong to each different `city`, filtering out cities with a count below 50:
```sql
SELECT city, count(*)
FROM addresses
GROUP BY city
HAVING count(*) >= 50;
```
Compute the average income per city per `street_name`, keeping only the groups whose average `income` is more than twice the median `income`:
```sql
SELECT city, street_name, avg(income)
FROM addresses
GROUP BY city, street_name
HAVING avg(income) > 2 * median(income);
```
#### Syntax {#docs:stable:sql:query_syntax:having::syntax}
### ORDER BY Clause {#docs:stable:sql:query_syntax:orderby}
`ORDER BY` is an output modifier. Logically it is applied near the very end of the query (just prior to [`LIMIT`](#docs:stable:sql:query_syntax:limit) or [`OFFSET`](#docs:stable:sql:query_syntax:limit), if present).
The `ORDER BY` clause sorts the rows on the sorting criteria in either ascending or descending order.
In addition, every order clause can specify whether `NULL` values should be moved to the beginning or to the end.
The `ORDER BY` clause may contain one or more expressions, separated by commas.
An error will be thrown if no expressions are included, since the `ORDER BY` clause should be removed in that situation.
The expressions may begin with either an arbitrary scalar expression (which could be a column name), a column position number (where the indexing starts from 1), or the keyword `ALL`.
Each expression can optionally be followed by an order modifier (`ASC` or `DESC`, default is `ASC`), and/or a `NULL` order modifier (`NULLS FIRST` or `NULLS LAST`, default is `NULLS LAST`).
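For instance, a column position can be used instead of repeating an expression (a sketch against the `addresses` example table defined in the examples below):
```sql
-- Order by the second entry of the SELECT list (city), descending, with NULLs first
SELECT address, city
FROM addresses
ORDER BY 2 DESC NULLS FIRST;
```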
#### `ORDER BY ALL` {#docs:stable:sql:query_syntax:orderby::order-by-all}
The `ALL` keyword indicates that the output should be sorted by every column in order from left to right.
The direction of this sort may be modified using either `ORDER BY ALL ASC` or `ORDER BY ALL DESC` and/or `NULLS FIRST` or `NULLS LAST`.
Note that `ALL` may not be used in combination with other expressions in the `ORDER BY` clause; it must be by itself.
See examples below.
#### `NULL` Order Modifier {#docs:stable:sql:query_syntax:orderby::null-order-modifier}
By default, DuckDB sorts `ASC` and `NULLS LAST`, i.e., the values are sorted in ascending order and `NULL` values are placed last.
This is identical to the default sort order of PostgreSQL.
The default sort order can be changed with the following configuration options.
Use the `default_null_order` option to change the default `NULL` sorting order to either `NULLS_FIRST`, `NULLS_LAST`, `NULLS_FIRST_ON_ASC_LAST_ON_DESC` or `NULLS_LAST_ON_ASC_FIRST_ON_DESC`:
```sql
SET default_null_order = 'NULLS_FIRST';
```
Use the `default_order` to change the direction of the default sorting order to either `DESC` or `ASC`:
```sql
SET default_order = 'DESC';
```
#### Collations {#docs:stable:sql:query_syntax:orderby::collations}
Text is sorted using the binary comparison collation by default, which means values are sorted on their binary UTF-8 values.
While this works well for ASCII text (e.g., for English language data), the sorting order can be incorrect for other languages.
For this purpose, DuckDB provides collations.
For more information on collations, see the [Collation page](#docs:stable:sql:expressions:collations).
#### Examples {#docs:stable:sql:query_syntax:orderby::examples}
All examples use this example table:
```sql
CREATE OR REPLACE TABLE addresses AS
SELECT '123 Quack Blvd' AS address, 'DuckTown' AS city, '11111' AS zip
UNION ALL
SELECT '111 Duck Duck Goose Ln', 'DuckTown', '11111'
UNION ALL
SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111'
UNION ALL
SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111-0001';
```
Select the addresses, ordered by city name using the default `NULL` order and default order:
```sql
SELECT *
FROM addresses
ORDER BY city;
```
Select the addresses, ordered by city name in descending order with nulls at the end:
```sql
SELECT *
FROM addresses
ORDER BY city DESC NULLS LAST;
```
Order by city and then by zip code, both using the default orderings:
```sql
SELECT *
FROM addresses
ORDER BY city, zip;
```
Order by city using German collation rules:
```sql
SELECT *
FROM addresses
ORDER BY city COLLATE DE;
```
##### `ORDER BY ALL` Examples {#docs:stable:sql:query_syntax:orderby::order-by-all-examples}
Order from left to right (by address, then by city, then by zip) in ascending order:
```sql
SELECT *
FROM addresses
ORDER BY ALL;
```
| address | city | zip |
|------------------------|-----------|------------|
| 111 Duck Duck Goose Ln | Duck Town | 11111 |
| 111 Duck Duck Goose Ln | Duck Town | 11111-0001 |
| 111 Duck Duck Goose Ln | DuckTown | 11111 |
| 123 Quack Blvd | DuckTown | 11111 |
Order from left to right (by address, then by city, then by zip) in descending order:
```sql
SELECT *
FROM addresses
ORDER BY ALL DESC;
```
| address | city | zip |
|------------------------|-----------|------------|
| 123 Quack Blvd | DuckTown | 11111 |
| 111 Duck Duck Goose Ln | DuckTown | 11111 |
| 111 Duck Duck Goose Ln | Duck Town | 11111-0001 |
| 111 Duck Duck Goose Ln | Duck Town | 11111 |
#### Syntax {#docs:stable:sql:query_syntax:orderby::syntax}
### LIMIT and OFFSET Clauses {#docs:stable:sql:query_syntax:limit}
`LIMIT` is an output modifier. Logically it is applied at the very end of the query. The `LIMIT` clause restricts the number of rows fetched. The `OFFSET` clause indicates at which position to start reading the values, i.e., the first `OFFSET` values are ignored.
Note that while `LIMIT` can be used without an `ORDER BY` clause, the results might not be deterministic without the `ORDER BY` clause. This can still be useful, however, for example when you want to inspect a quick snapshot of the data.
#### Examples {#docs:stable:sql:query_syntax:limit::examples}
Select the first 5 rows from the addresses table:
```sql
SELECT *
FROM addresses
LIMIT 5;
```
Select the 5 rows from the addresses table, starting at position 5 (i.e., ignoring the first 5 rows):
```sql
SELECT *
FROM addresses
LIMIT 5
OFFSET 5;
```
Select the top 5 cities with the highest population:
```sql
SELECT city, count(*) AS population
FROM addresses
GROUP BY city
ORDER BY population DESC
LIMIT 5;
```
Select 10% of the rows from the addresses table:
```sql
SELECT *
FROM addresses
LIMIT 10%;
```
#### Syntax {#docs:stable:sql:query_syntax:limit::syntax}
### SAMPLE Clause {#docs:stable:sql:query_syntax:sample}
The `SAMPLE` clause allows you to run the query on a sample from the base table. This can significantly speed up processing of queries, at the expense of accuracy in the result. Samples can also be used to quickly see a snapshot of the data when exploring a dataset. The sample clause is applied right after anything in the `FROM` clause (i.e., after any joins, but before the `WHERE` clause or any aggregates). See the [`SAMPLE`](#docs:stable:sql:samples) page for more information.
#### Examples {#docs:stable:sql:query_syntax:sample::examples}
Select a sample of 1% of the addresses table using default (system) sampling:
```sql
SELECT *
FROM addresses
USING SAMPLE 1%;
```
Select a sample of 1% of the addresses table using bernoulli sampling:
```sql
SELECT *
FROM addresses
USING SAMPLE 1% (bernoulli);
```
Select a sample of 10 rows from the subquery:
```sql
SELECT *
FROM (SELECT * FROM addresses)
USING SAMPLE 10 ROWS;
```
#### Syntax {#docs:stable:sql:query_syntax:sample::syntax}
### Unnesting {#docs:stable:sql:query_syntax:unnest}
#### Examples {#docs:stable:sql:query_syntax:unnest::examples}
Unnest a list, generating 3 rows (1, 2, 3):
```sql
SELECT unnest([1, 2, 3]);
```
Unnesting a struct, generating two columns (a, b):
```sql
SELECT unnest({'a': 42, 'b': 84});
```
Recursive unnest of a list of structs:
```sql
SELECT unnest([{'a': 42, 'b': 84}, {'a': 100, 'b': NULL}], recursive := true);
```
Limit depth of recursive unnest using `max_depth`:
```sql
SELECT unnest([[[1, 2], [3, 4]], [[5, 6], [7, 8, 9], []], [[10, 11]]], max_depth := 2);
```
The `unnest` special function is used to unnest lists or structs by one level. The function can be used as a regular scalar function, but only in the `SELECT` clause. Invoking `unnest` with the `recursive` parameter will unnest lists and structs of multiple levels. The depth of unnesting can be limited using the `max_depth` parameter (which assumes `recursive` unnesting by default).
##### Unnesting Lists {#docs:stable:sql:query_syntax:unnest::unnesting-lists}
Unnest a list, generating 3 rows (1, 2, 3):
```sql
SELECT unnest([1, 2, 3]);
```
Unnest a list, generating 3 rows ((1, 10), (2, 10), (3, 10)):
```sql
SELECT unnest([1, 2, 3]), 10;
```
Unnest two lists of different sizes, generating 3 rows ((1, 10), (2, 11), (3, NULL)):
```sql
SELECT unnest([1, 2, 3]), unnest([10, 11]);
```
Unnest a list column from a subquery:
```sql
SELECT unnest(l) + 10 FROM (VALUES ([1, 2, 3]), ([4, 5])) tbl(l);
```
Empty result:
```sql
SELECT unnest([]);
```
Empty result:
```sql
SELECT unnest(NULL);
```
Using `unnest` on a list emits one row per list entry. Regular scalar expressions in the same `SELECT` clause are repeated for every emitted row. When multiple lists are unnested in the same `SELECT` clause, the lists are unnested side-by-side. If one list is longer than the other, the shorter list is padded with `NULL` values.
Empty and `NULL` lists both unnest to zero rows.
##### Unnesting Structs {#docs:stable:sql:query_syntax:unnest::unnesting-structs}
Unnesting a struct, generating two columns (a, b):
```sql
SELECT unnest({'a': 42, 'b': 84});
```
Unnesting a struct, generating two columns (a, b):
```sql
SELECT unnest({'a': 42, 'b': {'x': 84}});
```
`unnest` on a struct will emit one column per entry in the struct.
##### Recursive Unnest {#docs:stable:sql:query_syntax:unnest::recursive-unnest}
Unnesting a list of lists recursively, generating 5 rows (1, 2, 3, 4, 5):
```sql
SELECT unnest([[1, 2, 3], [4, 5]], recursive := true);
```
Unnesting a list of structs recursively, generating two rows of two columns (a, b):
```sql
SELECT unnest([{'a': 42, 'b': 84}, {'a': 100, 'b': NULL}], recursive := true);
```
Unnesting a struct, generating two columns (a, b):
```sql
SELECT unnest({'a': [1, 2, 3], 'b': 88}, recursive := true);
```
Calling `unnest` with the `recursive` setting will fully unnest lists, followed by fully unnesting structs. This can be useful to fully flatten columns that contain lists within lists, or lists of structs. Note that lists *within* structs are not unnested.
##### Setting the Maximum Depth of Unnesting {#docs:stable:sql:query_syntax:unnest::setting-the-maximum-depth-of-unnesting}
The `max_depth` parameter allows limiting the maximum depth of recursive unnesting (which is assumed by default and does not have to be specified separately).
For example, unnesting to a `max_depth` of 2 yields the following:
```sql
SELECT unnest([[[1, 2], [3, 4]], [[5, 6], [7, 8, 9], []], [[10, 11]]], max_depth := 2) AS x;
```
| x |
|-----------|
| [1, 2] |
| [3, 4] |
| [5, 6] |
| [7, 8, 9] |
| [] |
| [10, 11] |
Meanwhile, unnesting to `max_depth` of 3 results in:
```sql
SELECT unnest([[[1, 2], [3, 4]], [[5, 6], [7, 8, 9], []], [[10, 11]]], max_depth := 3) AS x;
```
| x |
|---:|
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
| 11 |
##### Keeping Track of List Entry Positions {#docs:stable:sql:query_syntax:unnest::keeping-track-of-list-entry-positions}
To keep track of each entry's position within the original list, `unnest` may be combined with [`generate_subscripts`](#docs:stable:sql:functions:list::generate_subscripts):
```sql
SELECT unnest(l) AS x, generate_subscripts(l, 1) AS index
FROM (VALUES ([1, 2, 3]), ([4, 5])) tbl(l);
```
| x | index |
|--:|------:|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1 |
| 5 | 2 |
### WITH Clause {#docs:stable:sql:query_syntax:with}
The `WITH` clause allows you to specify common table expressions (CTEs).
Regular (non-recursive) common table expressions are essentially views that are limited in scope to a particular query.
CTEs can reference each other and can be nested. [Recursive CTEs](#::recursive-ctes) can reference themselves.
#### Basic CTE Examples {#docs:stable:sql:query_syntax:with::basic-cte-examples}
Create a CTE called `cte` and use it in the main query:
```sql
WITH cte AS (SELECT 42 AS x)
SELECT * FROM cte;
```
| x |
|---:|
| 42 |
Create two CTEs `cte1` and `cte2`, where the second CTE references the first CTE:
```sql
WITH
cte1 AS (SELECT 42 AS i),
cte2 AS (SELECT i * 100 AS x FROM cte1)
SELECT * FROM cte2;
```
| x |
|-----:|
| 4200 |
You can specify column names for CTEs:
```sql
WITH cte(j) AS (SELECT 42 AS i)
FROM cte;
```
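As noted above, CTEs can also be nested; a minimal sketch:
```sql
WITH outer_cte AS (
    -- A CTE defined inside the body of another CTE
    WITH inner_cte AS (SELECT 42 AS x)
    SELECT x * 2 AS y FROM inner_cte
)
SELECT * FROM outer_cte;
```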
#### CTE Materialization {#docs:stable:sql:query_syntax:with::cte-materialization}
DuckDB handles CTEs as _materialized_ by default, meaning that the CTE is evaluated
once and the result is stored in a temporary table. However, under certain conditions,
DuckDB can _inline_ the CTE into the main query, which means that the CTE is not
materialized and its definition is duplicated in each place it is referenced.
Inlining is done using the following heuristics:
- The CTE is not referenced more than once.
- The CTE does not contain a `VOLATILE` function.
- The CTE is using `AS NOT MATERIALIZED` and does not use `AS MATERIALIZED`.
- The CTE does not perform a grouped aggregation.
Materialization can be explicitly activated by defining the CTE using `AS MATERIALIZED` and disabled by using `AS NOT MATERIALIZED`. Note that inlining is not always possible, even if the heuristics are met. For example, if the CTE contains a `read_csv` function, it cannot be inlined.
Take the following query for example, which invokes the same CTE three times:
```sql
WITH t(x) AS (⟨complex_query⟩)
SELECT *
FROM
t AS t1,
t AS t2,
t AS t3;
```
Inlining duplicates the definition of `t` for each reference which results in the following query:
```sql
SELECT *
FROM
(⟨complex_query⟩) AS t1(x),
(⟨complex_query⟩) AS t2(x),
(⟨complex_query⟩) AS t3(x);
```
If `complex_query` is expensive, materializing it with the `MATERIALIZED` keyword can improve performance. In this case, `complex_query` is evaluated only once.
```sql
WITH t(x) AS MATERIALIZED (⟨complex_query⟩)
SELECT *
FROM
t AS t1,
t AS t2,
t AS t3;
```
If one wants to disable materialization, use `NOT MATERIALIZED`:
```sql
WITH t(x) AS NOT MATERIALIZED (⟨complex_query⟩)
SELECT *
FROM
t AS t1,
t AS t2,
t AS t3;
```
Generally, it is not recommended to use explicit materialization hints, as DuckDB's query optimizer is capable of deciding when to materialize or inline a CTE based on the query structure and the heuristics mentioned above. However, in some cases, it may be beneficial to use `MATERIALIZED` or `NOT MATERIALIZED` to control the behavior explicitly.
#### Recursive CTEs {#docs:stable:sql:query_syntax:with::recursive-ctes}
`WITH RECURSIVE` allows the definition of CTEs which can refer to themselves. Note that the query must be formulated in a way that ensures termination, otherwise, it may run into an infinite loop.
##### Example: Fibonacci Sequence {#docs:stable:sql:query_syntax:with::example-fibonacci-sequence}
`WITH RECURSIVE` can be used to make recursive calculations. For example, here is how `WITH RECURSIVE` could be used to calculate the first ten Fibonacci numbers:
```sql
WITH RECURSIVE FibonacciNumbers (
RecursionDepth, FibonacciNumber, NextNumber
) AS (
-- Base case
SELECT
0 AS RecursionDepth,
0 AS FibonacciNumber,
1 AS NextNumber
UNION ALL
-- Recursive step
SELECT
fib.RecursionDepth + 1 AS RecursionDepth,
fib.NextNumber AS FibonacciNumber,
fib.FibonacciNumber + fib.NextNumber AS NextNumber
FROM
FibonacciNumbers fib
WHERE
fib.RecursionDepth + 1 < 10
)
SELECT
fn.RecursionDepth AS FibonacciNumberIndex,
fn.FibonacciNumber
FROM
FibonacciNumbers fn;
```
| FibonacciNumberIndex | FibonacciNumber |
|---------------------:|----------------:|
| 0 | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 5 |
| 6 | 8 |
| 7 | 13 |
| 8 | 21 |
| 9 | 34 |
##### Example: Tree Traversal {#docs:stable:sql:query_syntax:with::example-tree-traversal}
`WITH RECURSIVE` can be used to traverse trees. For example, take a hierarchy of tags:
```sql
CREATE TABLE tag (id INTEGER, name VARCHAR, subclassof INTEGER);
INSERT INTO tag VALUES
(1, 'U2', 5),
(2, 'Blur', 5),
(3, 'Oasis', 5),
(4, '2Pac', 6),
(5, 'Rock', 7),
(6, 'Rap', 7),
(7, 'Music', 9),
(8, 'Movies', 9),
(9, 'Art', NULL);
```
The following query returns the path from the node `Oasis` to the root of the tree (`Art`).
```sql
WITH RECURSIVE tag_hierarchy(id, source, path) AS (
SELECT id, name, [name] AS path
FROM tag
WHERE subclassof IS NULL
UNION ALL
SELECT tag.id, tag.name, list_prepend(tag.name, tag_hierarchy.path)
FROM tag, tag_hierarchy
WHERE tag.subclassof = tag_hierarchy.id
)
SELECT path
FROM tag_hierarchy
WHERE source = 'Oasis';
```
| path |
|---------------------------|
| [Oasis, Rock, Music, Art] |
##### Graph Traversal {#docs:stable:sql:query_syntax:with::graph-traversal}
The `WITH RECURSIVE` clause can be used to express graph traversal on arbitrary graphs. However, if the graph has cycles, the query must perform cycle detection to prevent infinite loops.
One way to achieve this is to store the path of a traversal in a [list](#docs:stable:sql:data_types:list) and, before extending the path with a new edge, check whether its endpoint has been visited before (see the example later).
Take the following directed graph from the [LDBC Graphalytics benchmark](https://arxiv.org/pdf/2011.15028.pdf):
```sql
CREATE TABLE edge (node1id INTEGER, node2id INTEGER);
INSERT INTO edge VALUES
(1, 3), (1, 5), (2, 4), (2, 5), (2, 10), (3, 1),
(3, 5), (3, 8), (3, 10), (5, 3), (5, 4), (5, 8),
(6, 3), (6, 4), (7, 4), (8, 1), (9, 4);
```
Note that the graph contains directed cycles, e.g., between nodes 1, 5 and 8.
###### Enumerate All Paths from a Node {#docs:stable:sql:query_syntax:with::enumerate-all-paths-from-a-node}
The following query returns **all paths** starting in node 1:
```sql
WITH RECURSIVE paths(startNode, endNode, path) AS (
SELECT -- Define the path as the first edge of the traversal
node1id AS startNode,
node2id AS endNode,
[node1id, node2id] AS path
FROM edge
WHERE startNode = 1
UNION ALL
SELECT -- Concatenate new edge to the path
paths.startNode AS startNode,
node2id AS endNode,
array_append(path, node2id) AS path
FROM paths
JOIN edge ON paths.endNode = node1id
-- Prevent adding a repeated node to the path.
-- This ensures that no cycles occur.
WHERE list_position(paths.path, node2id) IS NULL
)
SELECT startNode, endNode, path
FROM paths
ORDER BY length(path), path;
```
| startNode | endNode | path |
|----------:|--------:|---------------|
| 1 | 3 | [1, 3] |
| 1 | 5 | [1, 5] |
| 1 | 5 | [1, 3, 5] |
| 1 | 8 | [1, 3, 8] |
| 1 | 10 | [1, 3, 10] |
| 1 | 3 | [1, 5, 3] |
| 1 | 4 | [1, 5, 4] |
| 1 | 8 | [1, 5, 8] |
| 1 | 4 | [1, 3, 5, 4] |
| 1 | 8 | [1, 3, 5, 8] |
| 1 | 8 | [1, 5, 3, 8] |
| 1 | 10 | [1, 5, 3, 10] |
Note that the result of this query is not restricted to shortest paths, e.g., for node 5, the results include paths `[1, 5]` and `[1, 3, 5]`.
###### Enumerate Unweighted Shortest Paths from a Node {#docs:stable:sql:query_syntax:with::enumerate-unweighted-shortest-paths-from-a-node}
In most cases, enumerating all paths is not practical or feasible. Instead, only the **(unweighted) shortest paths** are of interest. To find these, the second half of the `WITH RECURSIVE` query should be adjusted such that it only includes a node if it has not yet been visited. This is implemented by using a subquery that checks if any of the previous paths includes the node:
```sql
WITH RECURSIVE paths(startNode, endNode, path) AS (
SELECT -- Define the path as the first edge of the traversal
node1id AS startNode,
node2id AS endNode,
[node1id, node2id] AS path
FROM edge
WHERE startNode = 1
UNION ALL
SELECT -- Concatenate new edge to the path
paths.startNode AS startNode,
node2id AS endNode,
array_append(path, node2id) AS path
FROM paths
JOIN edge ON paths.endNode = node1id
-- Prevent adding a node that was visited previously by any path.
-- This ensures that (1) no cycles occur and (2) only nodes that
-- were not visited by previous (shorter) paths are added to a path.
WHERE NOT EXISTS (
FROM paths previous_paths
WHERE list_contains(previous_paths.path, node2id)
)
)
SELECT startNode, endNode, path
FROM paths
ORDER BY length(path), path;
```
| startNode | endNode | path |
|----------:|--------:|------------|
| 1 | 3 | [1, 3] |
| 1 | 5 | [1, 5] |
| 1 | 8 | [1, 3, 8] |
| 1 | 10 | [1, 3, 10] |
| 1 | 4 | [1, 5, 4] |
| 1 | 8 | [1, 5, 8] |
###### Enumerate Unweighted Shortest Paths between Two Nodes {#docs:stable:sql:query_syntax:with::enumerate-unweighted-shortest-paths-between-two-nodes}
`WITH RECURSIVE` can also be used to find **all (unweighted) shortest paths between two nodes**. To ensure that the recursive query is stopped as soon as we reach the end node, we use a [window function](#docs:stable:sql:functions:window_functions) which checks whether the end node is among the newly added nodes.
The following query returns all unweighted shortest paths between nodes 1 (start node) and 8 (end node):
```sql
WITH RECURSIVE paths(startNode, endNode, path, endReached) AS (
SELECT -- Define the path as the first edge of the traversal
node1id AS startNode,
node2id AS endNode,
[node1id, node2id] AS path,
(node2id = 8) AS endReached
FROM edge
WHERE startNode = 1
UNION ALL
SELECT -- Concatenate new edge to the path
paths.startNode AS startNode,
node2id AS endNode,
array_append(path, node2id) AS path,
max(CASE WHEN node2id = 8 THEN 1 ELSE 0 END)
OVER (ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS endReached
FROM paths
JOIN edge ON paths.endNode = node1id
WHERE NOT EXISTS (
FROM paths previous_paths
WHERE list_contains(previous_paths.path, node2id)
)
AND paths.endReached = 0
)
SELECT startNode, endNode, path
FROM paths
WHERE endNode = 8
ORDER BY length(path), path;
```
| startNode | endNode | path |
|----------:|--------:|-----------|
| 1 | 8 | [1, 3, 8] |
| 1 | 8 | [1, 5, 8] |
#### Recursive CTEs with `USING KEY` {#docs:stable:sql:query_syntax:with::recursive-ctes-with-using-key}
`USING KEY` alters the behavior of a regular recursive CTE.
In each iteration, a regular recursive CTE appends result rows to the union table, which ultimately defines the overall result of the CTE. In contrast, a CTE with `USING KEY` has the ability to update rows that have been placed in the union table in an earlier iteration: if the current iteration produces a row with key `k`, it replaces a row with the same key `k` in the union table (like a dictionary). If no such row exists in the union table yet, the new row is appended to the union table as usual.
This allows a CTE to exercise fine-grained control over the union table contents. Avoiding the append-only behavior can lead to significantly smaller union table sizes. This helps query runtime, memory consumption, and makes it feasible to access the union table while the iteration is still ongoing (this is impossible for regular recursive CTEs): in a CTE `WITH RECURSIVE T(...) USING KEY ...`, table `T` denotes the rows added by the last iteration (as is usual for recursive CTEs), while table `recurring.T` denotes the union table built so far. References to `recurring.T` allow for the elegant and idiomatic translation of rather complex algorithms into readable SQL code.
##### Example: `USING KEY` {#docs:stable:sql:query_syntax:with::example-using-key}
This is a recursive CTE where `USING KEY` has a key column (`a`) and a payload column (`b`).
The payload columns correspond to the columns to be overwritten.
In the first iteration we have two different keys, `1` and `2`.
These two keys will generate two new rows, `(1, 3)` and `(2, 4)`.
In the next iteration we produce a new key, `3`, which generates a new row.
We also generate the row `(2, 3)`, where `2` is a key that already exists from the previous iteration.
This will overwrite the old payload `4` with the new payload `3`.
```sql
WITH RECURSIVE tbl(a, b) USING KEY (a) AS (
SELECT a, b
FROM (VALUES (1, 3), (2, 4)) t(a, b)
UNION
SELECT a + 1, b
FROM tbl
WHERE a < 3
)
SELECT *
FROM tbl;
```
| a | b |
|--:|--:|
| 1 | 3 |
| 2 | 3 |
| 3 | 3 |
#### Using `VALUES` {#docs:stable:sql:query_syntax:with::using-values}
You can use the `VALUES` clause for the initial (anchor) part of the CTE:
```sql
WITH RECURSIVE tbl(a, b) USING KEY (a) AS (
VALUES (1, 3), (2, 4)
UNION
SELECT a + 1, b
FROM tbl
WHERE a < 3
)
SELECT *
FROM tbl;
```
##### Example: `USING KEY` References Union Table {#docs:stable:sql:query_syntax:with::example-using-key-references-union-table}
As well as using the union table as a dictionary, we can now reference it in queries. This allows you to use results from not just the previous iteration, but also earlier ones. This new feature makes certain algorithms easier to implement.
One example is the connected components algorithm. For each node, the algorithm determines the node with the lowest ID to which it is connected. To achieve this, we use the entries in the union table to track the lowest ID found for a node. If a new incoming row contains a lower ID, we update this value.
```sql
CREATE TABLE nodes (id INTEGER);
INSERT INTO nodes VALUES (1), (2), (3), (4), (5), (6), (7), (8);
CREATE TABLE edges (node1id INTEGER, node2id INTEGER);
INSERT INTO edges VALUES
(1, 3), (2, 3), (3, 7), (7, 8), (5, 4), (6, 4);
```
```sql
WITH RECURSIVE connected_components(id, comp) USING KEY (id) AS (
SELECT n.id, n.id AS comp
FROM nodes AS n
UNION (
SELECT DISTINCT ON (previous_iter.id) previous_iter.id, initial_iter.comp
FROM
recurring.connected_components AS previous_iter,
connected_components AS initial_iter,
edges AS e
WHERE ((e.node1id, e.node2id) = (previous_iter.id, initial_iter.id)
OR (e.node2id, e.node1id) = (previous_iter.id, initial_iter.id))
AND initial_iter.comp < previous_iter.comp
ORDER BY initial_iter.id ASC, previous_iter.comp ASC)
)
TABLE connected_components
ORDER BY id;
```
| id | comp |
|---:|-----:|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 4 |
| 5 | 4 |
| 6 | 4 |
| 7 | 1 |
| 8 | 1 |
#### Limitations {#docs:stable:sql:query_syntax:with::limitations}
DuckDB does not support mutually recursive CTEs. See the [related issue and discussion in the DuckDB repository](https://github.com/duckdb/duckdb/issues/14716#issuecomment-2467952456).
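For illustration, a mutually referencing pair of CTEs such as the following hypothetical example (the names `evens` and `odds` are made up) is rejected; the exact error message depends on the DuckDB version:
```sql
-- Not supported: evens refers to odds and odds refers to evens.
WITH RECURSIVE
    evens(n) AS (SELECT 0 UNION ALL SELECT n + 1 FROM odds WHERE n < 10),
    odds(n) AS (SELECT n + 1 FROM evens WHERE n < 10)
SELECT * FROM evens;
```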
#### Syntax {#docs:stable:sql:query_syntax:with::syntax}
### WINDOW Clause {#docs:stable:sql:query_syntax:window}
The `WINDOW` clause allows you to specify named windows that can be used within [window functions](#docs:stable:sql:functions:window_functions). These are useful when you have multiple window functions, as they allow you to avoid repeating the same window clause.
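For example, here is a minimal sketch that reuses a single named window for two window functions; it queries the `duckdb_functions()` table function, which is also used in the `QUALIFY` examples below:
```sql
SELECT
    schema_name,
    function_name,
    row_number() OVER my_window AS function_rank,   -- rank within the schema by function name
    count(*) OVER my_window AS running_count        -- running count over the same named window
FROM duckdb_functions()
WINDOW my_window AS (PARTITION BY schema_name ORDER BY function_name);
```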
#### Syntax {#docs:stable:sql:query_syntax:window::syntax}
### QUALIFY Clause {#docs:stable:sql:query_syntax:qualify}
The `QUALIFY` clause is used to filter the results of [`WINDOW` functions](#docs:stable:sql:functions:window_functions). This filtering of results is similar to how a [`HAVING` clause](#docs:stable:sql:query_syntax:having) filters the results of aggregate functions applied based on the [`GROUP BY` clause](#docs:stable:sql:query_syntax:groupby).
The `QUALIFY` clause avoids the need for a subquery or [`WITH` clause](#docs:stable:sql:query_syntax:with) to perform this filtering (much like `HAVING` avoids a subquery). An example using a `WITH` clause instead of `QUALIFY` is included below the `QUALIFY` examples.
Note that this is filtering based on [`WINDOW` functions](#docs:stable:sql:functions:window_functions), not necessarily based on the [`WINDOW` clause](#docs:stable:sql:query_syntax:window). The `WINDOW` clause is optional and can be used to simplify the creation of multiple `WINDOW` function expressions.
In a `SELECT` statement, the `QUALIFY` clause is specified after the [`WINDOW` clause](#docs:stable:sql:query_syntax:window) (the `WINDOW` clause itself does not need to be present) and before the [`ORDER BY` clause](#docs:stable:sql:query_syntax:orderby).
#### Examples {#docs:stable:sql:query_syntax:qualify::examples}
Each of the following examples produces the same output, shown below.
Filter based on a window function defined in the `QUALIFY` clause:
```sql
SELECT
schema_name,
function_name,
-- In this example the function_rank column in the select clause is for reference
row_number() OVER (PARTITION BY schema_name ORDER BY function_name) AS function_rank
FROM duckdb_functions()
QUALIFY
row_number() OVER (PARTITION BY schema_name ORDER BY function_name) < 3;
```
Filter based on a window function defined in the `SELECT` clause:
```sql
SELECT
schema_name,
function_name,
row_number() OVER (PARTITION BY schema_name ORDER BY function_name) AS function_rank
FROM duckdb_functions()
QUALIFY
function_rank < 3;
```
Filter based on a window function defined in the `QUALIFY` clause, but using the `WINDOW` clause:
```sql
SELECT
schema_name,
function_name,
-- In this example the function_rank column in the select clause is for reference
row_number() OVER my_window AS function_rank
FROM duckdb_functions()
WINDOW
my_window AS (PARTITION BY schema_name ORDER BY function_name)
QUALIFY
row_number() OVER my_window < 3;
```
Filter based on a window function defined in the `SELECT` clause, but using the `WINDOW` clause:
```sql
SELECT
schema_name,
function_name,
row_number() OVER my_window AS function_rank
FROM duckdb_functions()
WINDOW
my_window AS (PARTITION BY schema_name ORDER BY function_name)
QUALIFY
function_rank < 3;
```
Equivalent query based on a `WITH` clause (without a `QUALIFY` clause):
```sql
WITH ranked_functions AS (
SELECT
schema_name,
function_name,
row_number() OVER (PARTITION BY schema_name ORDER BY function_name) AS function_rank
FROM duckdb_functions()
)
SELECT
*
FROM ranked_functions
WHERE
function_rank < 3;
```
| schema_name | function_name | function_rank |
|:---|:---|:---|
| main | !__postfix | 1 |
| main | !~~ | 2 |
| pg_catalog | col_description | 1 |
| pg_catalog | format_pg_type | 2 |
#### Syntax {#docs:stable:sql:query_syntax:qualify::syntax}
### VALUES Clause {#docs:stable:sql:query_syntax:values}
The `VALUES` clause is used to specify a fixed number of rows. The `VALUES` clause can be used as a stand-alone statement, as part of the `FROM` clause, or as input to an `INSERT INTO` statement.
#### Examples {#docs:stable:sql:query_syntax:values::examples}
Generate two rows and directly return them:
```sql
VALUES ('Amsterdam', 1), ('London', 2);
```
Generate two rows as part of a `FROM` clause, and rename the columns:
```sql
SELECT *
FROM (VALUES ('Amsterdam', 1), ('London', 2)) cities(name, id);
```
Generate two rows and insert them into a table:
```sql
INSERT INTO cities
VALUES ('Amsterdam', 1), ('London', 2);
```
Create a table directly from a `VALUES` clause:
```sql
CREATE TABLE cities AS
SELECT *
FROM (VALUES ('Amsterdam', 1), ('London', 2)) cities(name, id);
```
#### Syntax {#docs:stable:sql:query_syntax:values::syntax}
### FILTER Clause {#docs:stable:sql:query_syntax:filter}
The `FILTER` clause may optionally follow an aggregate function in a `SELECT` statement. This will filter the rows of data that are fed into the aggregate function in the same way that a `WHERE` clause filters rows, but localized to the specific aggregate function.
There are multiple types of situations where this is useful, including when evaluating multiple aggregates with different filters, and when creating a pivoted view of a dataset. `FILTER` provides a cleaner syntax for pivoting data when compared with the more traditional `CASE WHEN` approach discussed below.
Some aggregate functions do not filter out `NULL` values, so using a `FILTER` clause can return valid results in cases where the `CASE WHEN` approach does not. This occurs with the functions `first` and `last`, which are desirable in a non-aggregating pivot operation where the goal is simply to re-orient the data into columns rather than re-aggregate it. `FILTER` also improves `NULL` handling when using the `list` and `array_agg` functions: the `CASE WHEN` approach includes `NULL` values in the list result, while the `FILTER` clause removes them.
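For instance, a minimal sketch of this difference for the `list` aggregate, using a small generated series:
```sql
SELECT
    list(i) FILTER (i % 2 = 1) AS odds_filter,         -- [1, 3, 5]: filtered rows never reach the aggregate
    list(CASE WHEN i % 2 = 1 THEN i END) AS odds_case  -- [1, NULL, 3, NULL, 5]: NULLs are kept in the list
FROM generate_series(1, 5) tbl(i);
```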
#### Examples {#docs:stable:sql:query_syntax:filter::examples}
Return the following:
* The total number of rows
* The number of rows where `i <= 5`
* The number of rows where `i` is odd
```sql
SELECT
count() AS total_rows,
count() FILTER (i <= 5) AS lte_five,
count() FILTER (i % 2 = 1) AS odds
FROM generate_series(1, 10) tbl(i);
```
| total_rows | lte_five | odds |
|:---|:---|:---|
| 10 | 5 | 5 |
> Simply counting rows that satisfy a condition can also be achieved without a `FILTER` clause, using the Boolean `sum` aggregate function, e.g., `sum(i <= 5)`.
Different aggregate functions may be used, and multiple `WHERE` expressions are also permitted:
```sql
SELECT
sum(i) FILTER (i <= 5) AS lte_five_sum,
median(i) FILTER (i % 2 = 1) AS odds_median,
median(i) FILTER (i % 2 = 1 AND i <= 5) AS odds_lte_five_median
FROM generate_series(1, 10) tbl(i);
```
| lte_five_sum | odds_median | odds_lte_five_median |
|:---|:---|:---|
| 15 | 5.0 | 3.0 |
The `FILTER` clause can also be used to pivot data from rows into columns. This is a static pivot, as columns must be defined prior to runtime in SQL. However, this kind of statement can be dynamically generated in a host programming language to leverage DuckDB's SQL engine for rapid, larger-than-memory pivoting.
First generate an example dataset:
```sql
CREATE TEMP TABLE stacked_data AS
SELECT
i,
CASE WHEN i <= rows * 0.25 THEN 2022
WHEN i <= rows * 0.5 THEN 2023
WHEN i <= rows * 0.75 THEN 2024
WHEN i <= rows * 0.875 THEN 2025
ELSE NULL
END AS year
FROM (
SELECT
i,
count(*) OVER () AS rows
FROM generate_series(1, 100_000_000) tbl(i)
) tbl;
```
“Pivot” the data out by year (move each year out to a separate column):
```sql
SELECT
count(i) FILTER (year = 2022) AS "2022",
count(i) FILTER (year = 2023) AS "2023",
count(i) FILTER (year = 2024) AS "2024",
count(i) FILTER (year = 2025) AS "2025",
count(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;
```
This syntax produces the same results as the `FILTER` clauses above:
```sql
SELECT
count(CASE WHEN year = 2022 THEN i END) AS "2022",
count(CASE WHEN year = 2023 THEN i END) AS "2023",
count(CASE WHEN year = 2024 THEN i END) AS "2024",
count(CASE WHEN year = 2025 THEN i END) AS "2025",
count(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;
```
| 2022 | 2023 | 2024 | 2025 | NULLs |
|:---|:---|:---|:---|:---|
| 25000000 | 25000000 | 25000000 | 12500000 | 12500000 |
However, the `CASE WHEN` approach will not work as expected when using an aggregate function that does not ignore `NULL` values. The `first` function falls into this category, so `FILTER` is preferred in this case.
“Pivot” the data out by year (move each year out to a separate column):
```sql
SELECT
first(i) FILTER (year = 2022) AS "2022",
first(i) FILTER (year = 2023) AS "2023",
first(i) FILTER (year = 2024) AS "2024",
first(i) FILTER (year = 2025) AS "2025",
first(i) FILTER (year IS NULL) AS "NULLs"
FROM stacked_data;
```
| 2022 | 2023 | 2024 | 2025 | NULLs |
|:---|:---|:---|:---|:---|
| 1474561 | 25804801 | 50749441 | 76431361 | 87500001 |
This will produce `NULL` values whenever the first evaluation of the `CASE WHEN` clause returns a `NULL`:
```sql
SELECT
first(CASE WHEN year = 2022 THEN i END) AS "2022",
first(CASE WHEN year = 2023 THEN i END) AS "2023",
first(CASE WHEN year = 2024 THEN i END) AS "2024",
first(CASE WHEN year = 2025 THEN i END) AS "2025",
first(CASE WHEN year IS NULL THEN i END) AS "NULLs"
FROM stacked_data;
```
| 2022 | 2023 | 2024 | 2025 | NULLs |
|:---|:---|:---|:---|:---|
| 1228801 | NULL | NULL | NULL | NULL |
#### Aggregate Function Syntax (Including `FILTER` Clause) {#docs:stable:sql:query_syntax:filter::aggregate-function-syntax-including-filter-clause}
### Set Operations {#docs:stable:sql:query_syntax:setops}
Set operations allow queries to be combined according to [set operation semantics](https://en.wikipedia.org/wiki/Set_(mathematics)#Basic_operations). Set operations refer to the [`UNION [ALL]`](#union), [`INTERSECT [ALL]`](#intersect) and [`EXCEPT [ALL]`](#except) clauses. The vanilla variants use set semantics, i.e., they eliminate duplicates, while the variants with `ALL` use bag semantics.
Traditional set operations unify queries **by column position**, and require the to-be-combined queries to have the same number of input columns. If the columns are not of the same type, casts may be added. The result will use the column names from the first query.
DuckDB also supports [`UNION [ALL] BY NAME`](#union-all-by-name), which joins columns by name instead of by position. `UNION BY NAME` does not require the inputs to have the same number of columns. `NULL` values will be added in case of missing columns.
#### `UNION` {#docs:stable:sql:query_syntax:setops::union}
The `UNION` clause can be used to combine rows from multiple queries. The queries are required to return the same number of columns. [Implicit casting](https://duckdb.org/docs/sql/data_types/typecasting#implicit-casting) to one of the returned types is performed to combine columns of different types where necessary. If this is not possible, the `UNION` clause throws an error.
##### Vanilla `UNION` (Set Semantics) {#docs:stable:sql:query_syntax:setops::vanilla-union-set-semantics}
The vanilla `UNION` clause follows set semantics, therefore it performs duplicate elimination, i.e., only unique rows will be included in the result.
```sql
SELECT * FROM range(2) t1(x)
UNION
SELECT * FROM range(3) t2(x);
```
| x |
|--:|
| 2 |
| 1 |
| 0 |
##### `UNION ALL` (Bag Semantics) {#docs:stable:sql:query_syntax:setops::union-all-bag-semantics}
`UNION ALL` returns all rows of both queries following bag semantics, i.e., *without* duplicate elimination.
```sql
SELECT * FROM range(2) t1(x)
UNION ALL
SELECT * FROM range(3) t2(x);
```
| x |
|--:|
| 0 |
| 1 |
| 0 |
| 1 |
| 2 |
##### `UNION [ALL] BY NAME` {#docs:stable:sql:query_syntax:setops::union-all-by-name}
The `UNION [ALL] BY NAME` clause can be used to combine rows from different tables by name, instead of by position. `UNION BY NAME` does not require both queries to have the same number of columns. Any columns that are only found in one of the queries are filled with `NULL` values for the other query.
Take the following tables for example:
```sql
CREATE TABLE capitals (city VARCHAR, country VARCHAR);
INSERT INTO capitals VALUES
('Amsterdam', 'NL'),
('Berlin', 'Germany');
CREATE TABLE weather (city VARCHAR, degrees INTEGER, date DATE);
INSERT INTO weather VALUES
('Amsterdam', 10, '2022-10-14'),
('Seattle', 8, '2022-10-12');
```
```sql
SELECT * FROM capitals
UNION BY NAME
SELECT * FROM weather;
```
| city | country | degrees | date |
|-----------|---------|--------:|------------|
| Seattle | NULL | 8 | 2022-10-12 |
| Amsterdam | NL | NULL | NULL |
| Berlin | Germany | NULL | NULL |
| Amsterdam | NULL | 10 | 2022-10-14 |
`UNION BY NAME` follows set semantics (therefore it performs duplicate elimination), whereas `UNION ALL BY NAME` follows bag semantics.
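As a small sketch of the difference:
```sql
SELECT 'Amsterdam' AS city
UNION ALL BY NAME
SELECT 'Amsterdam' AS city;
-- returns two identical rows; with UNION BY NAME, only one row remains
```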
#### `INTERSECT` {#docs:stable:sql:query_syntax:setops::intersect}
The `INTERSECT` clause can be used to select all rows that occur in the result of **both** queries.
##### Vanilla `INTERSECT` (Set Semantics) {#docs:stable:sql:query_syntax:setops::vanilla-intersect-set-semantics}
Vanilla `INTERSECT` performs duplicate elimination, so only unique rows are returned.
```sql
SELECT * FROM range(2) t1(x)
INTERSECT
SELECT * FROM range(6) t2(x);
```
| x |
|--:|
| 0 |
| 1 |
##### `INTERSECT ALL` (Bag Semantics) {#docs:stable:sql:query_syntax:setops::intersect-all-bag-semantics}
`INTERSECT ALL` follows bag semantics, so duplicates are returned.
```sql
SELECT unnest([5, 5, 6, 6, 6, 6, 7, 8]) AS x
INTERSECT ALL
SELECT unnest([5, 6, 6, 7, 7, 9]);
```
| x |
|--:|
| 5 |
| 6 |
| 6 |
| 7 |
#### `EXCEPT` {#docs:stable:sql:query_syntax:setops::except}
The `EXCEPT` clause can be used to select all rows that **only** occur in the left query.
##### Vanilla `EXCEPT` (Set Semantics) {#docs:stable:sql:query_syntax:setops::vanilla-except-set-semantics}
Vanilla `EXCEPT` follows set semantics, therefore, it performs duplicate elimination, so only unique rows are returned.
```sql
SELECT * FROM range(5) t1(x)
EXCEPT
SELECT * FROM range(2) t2(x);
```
| x |
|--:|
| 2 |
| 3 |
| 4 |
##### `EXCEPT ALL` (Bag Semantics) {#docs:stable:sql:query_syntax:setops::except-all-bag-semantics}
`EXCEPT ALL` uses bag semantics:
```sql
SELECT unnest([5, 5, 6, 6, 6, 6, 7, 8]) AS x
EXCEPT ALL
SELECT unnest([5, 6, 6, 7, 7, 9]);
```
| x |
|--:|
| 5 |
| 8 |
| 6 |
| 6 |
#### Syntax {#docs:stable:sql:query_syntax:setops::syntax}
### Prepared Statements {#docs:stable:sql:query_syntax:prepared_statements}
DuckDB supports prepared statements where parameters are substituted when the query is executed.
This can improve readability and is useful for preventing [SQL injections](https://en.wikipedia.org/wiki/SQL_injection).
#### Syntax {#docs:stable:sql:query_syntax:prepared_statements::syntax}
There are three syntaxes for denoting parameters in prepared statements:
auto-incremented (`?`),
positional (`$1`),
and named (`$param`).
Note that not all clients support all of these syntaxes, e.g., the [JDBC client](#docs:stable:clients:java) only supports auto-incremented parameters in prepared statements.
##### Example Dataset {#docs:stable:sql:query_syntax:prepared_statements::example-dataset}
In the following, we introduce the three different syntaxes and illustrate them with examples using the following table.
```sql
CREATE TABLE person (name VARCHAR, age BIGINT);
INSERT INTO person VALUES ('Alice', 37), ('Ana', 35), ('Bob', 41), ('Bea', 25);
```
In our example query, we'll look for people whose name starts with a `B` and are at least 40 years old.
This will return a single row `<'Bob', 41>`.
##### Auto-Incremented Parameters: `?` {#docs:stable:sql:query_syntax:prepared_statements::auto-incremented-parameters-}
DuckDB supports prepared statements with auto-incremented indexing,
i.e., the position of the parameters in the query corresponds to their position in the execution statement.
For example:
```sql
PREPARE query_person AS
SELECT *
FROM person
WHERE starts_with(name, ?)
AND age >= ?;
```
Using the CLI client, the statement is executed as follows.
```sql
EXECUTE query_person('B', 40);
```
##### Positional Parameters: `$1` {#docs:stable:sql:query_syntax:prepared_statements::positional-parameters-1}
Prepared statements can use positional parameters, where parameters are denoted with an integer (`$1`, `$2`).
For example:
```sql
PREPARE query_person AS
SELECT *
FROM person
WHERE starts_with(name, $2)
AND age >= $1;
```
Using the CLI client, the statement is executed as follows.
Note that the first parameter corresponds to `$1`, the second to `$2`, and so on.
```sql
EXECUTE query_person(40, 'B');
```
##### Named Parameters: `$parameter` {#docs:stable:sql:query_syntax:prepared_statements::named-parameters-parameter}
DuckDB also supports named parameters, where parameters are denoted with `$parameter_name`.
For example:
```sql
PREPARE query_person AS
SELECT *
FROM person
WHERE starts_with(name, $name_start_letter)
AND age >= $minimum_age;
```
Using the CLI client, the statement is executed as follows.
```sql
EXECUTE query_person(name_start_letter := 'B', minimum_age := 40);
```
#### Dropping Prepared Statements: `DEALLOCATE` {#docs:stable:sql:query_syntax:prepared_statements::dropping-prepared-statements-deallocate}
To drop a prepared statement, use the `DEALLOCATE` statement:
```sql
DEALLOCATE query_person;
```
Alternatively, use:
```sql
DEALLOCATE PREPARE query_person;
```
## Data Types {#sql:data_types}
### Data Types {#docs:stable:sql:data_types:overview}
#### General-Purpose Data Types {#docs:stable:sql:data_types:overview::general-purpose-data-types}
The table below shows all the built-in general-purpose data types. The alternatives listed in the aliases column can be used to refer to these types as well, however, note that the aliases are not part of the SQL standard and hence might not be accepted by other database engines.
| Name | Aliases | Description |
| :------------------------- | :--------------------------------- | :--------------------------------------------------------------------------------------------------------- |
| `BIGINT` | `INT8`, `LONG` | Signed eight-byte integer |
| `BIT` | `BITSTRING` | String of 1s and 0s |
| `BLOB` | `BYTEA`, `BINARY`, `VARBINARY` | Variable-length binary data |
| `BIGNUM` | | Variable-length integer |
| `BOOLEAN` | `BOOL`, `LOGICAL` | Logical Boolean (`true` / `false`) |
| `DATE` | | Calendar date (year, month, day) |
| `DECIMAL(prec, scale)` | `NUMERIC(prec, scale)` | Fixed-precision number with the given width (precision) and scale, defaults to `prec = 18` and `scale = 3` |
| `DOUBLE` | `FLOAT8` | Double precision floating-point number (8 bytes) |
| `FLOAT` | `FLOAT4`, `REAL` | Single precision floating-point number (4 bytes) |
| `HUGEINT` | | Signed sixteen-byte integer |
| `INTEGER` | `INT4`, `INT`, `SIGNED` | Signed four-byte integer |
| `INTERVAL` | | Date / time delta |
| `JSON` | | JSON object (via the [`json` extension](#docs:stable:data:json:overview)) |
| `SMALLINT` | `INT2`, `SHORT` | Signed two-byte integer |
| `TIME` | | Time of day (no time zone) |
| `TIMESTAMP WITH TIME ZONE` | `TIMESTAMPTZ` | Combination of time and date that uses the current time zone |
| `TIMESTAMP` | `DATETIME` | Combination of time and date |
| `TINYINT` | `INT1` | Signed one-byte integer |
| `UBIGINT` | | Unsigned eight-byte integer |
| `UHUGEINT` | | Unsigned sixteen-byte integer |
| `UINTEGER` | | Unsigned four-byte integer |
| `USMALLINT` | | Unsigned two-byte integer |
| `UTINYINT` | | Unsigned one-byte integer |
| `UUID` | | UUID data type |
| `VARCHAR` | `CHAR`, `BPCHAR`, `TEXT`, `STRING` | Variable-length character string |
Implicit and explicit typecasting is possible between numerous types, see the [Typecasting](#docs:stable:sql:data_types:typecasting) page for details.
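For a brief sketch of both kinds of casts (see the linked page for the full rules):
```sql
SELECT 1 + 2.5 AS implicit_sum;            -- the integer operand is implicitly cast; the result is 3.5
SELECT CAST('2024-06-01' AS DATE) AS d1,   -- explicit cast using CAST
       '2024-06-01'::DATE AS d2;           -- explicit cast using the :: shorthand
```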
#### Nested / Composite Types {#docs:stable:sql:data_types:overview::nested--composite-types}
DuckDB supports five nested data types: `ARRAY`, `LIST`, `MAP`, `STRUCT`, and `UNION`. Each supports different use cases and has a different structure.
| Name | Description | Rules when used in a column | Build from values | Define in DDL/CREATE |
|:-|:---|:---|:--|:--|
| [`ARRAY`](#docs:stable:sql:data_types:array) | An ordered, fixed-length sequence of data values of the same type. | Each row must have the same data type within each instance of the `ARRAY` and the same number of elements. | `[1, 2, 3]` | `INTEGER[3]` |
| [`LIST`](#docs:stable:sql:data_types:list) | An ordered sequence of data values of the same type. | Each row must have the same data type within each instance of the `LIST`, but can have any number of elements. | `[1, 2, 3]` | `INTEGER[]` |
| [`MAP`](#docs:stable:sql:data_types:map) | A dictionary of multiple named values, each key having the same type and each value having the same type. Keys and values can be any type and can be different types from one another. | Rows may have different keys. | `map([1, 2], ['a', 'b'])` | `MAP(INTEGER, VARCHAR)` |
| [`STRUCT`](#docs:stable:sql:data_types:struct) | A dictionary of multiple named values, where each key is a string, but the value can be a different type for each key. | Each row must have the same keys. | `{'i': 42, 'j': 'a'}` | `STRUCT(i INTEGER, j VARCHAR)` |
| [`UNION`](#docs:stable:sql:data_types:union) | A union of multiple alternative data types, storing one of them in each value at a time. A union also contains a discriminator “tag” value to inspect and access the currently set member type. | Rows may be set to different member types of the union. | `union_value(num := 2)` | `UNION(num INTEGER, text VARCHAR)` |
##### Rules for Case Sensitivity {#docs:stable:sql:data_types:overview::rules-for-case-sensitivity}
The keys of `MAP`s are case-sensitive, while keys of `UNION`s and `STRUCT`s are case-insensitive.
For examples, see the [Rules for Case Sensitivity section](#docs:stable:sql:dialect:overview::case-sensitivity-of-keys-in-nested-data-structures).
##### Updating Values of Nested Types {#docs:stable:sql:data_types:overview::updating-values-of-nested-types}
When performing _updates_ on values of nested types, DuckDB performs a _delete_ operation followed by an _insert_ operation.
When used in a table with ART indexes (either via explicit indexes or primary keys/unique constraints), this can lead to [unexpected constraint violations](#docs:stable:sql:indexes::constraint-checking-in-update-statements).
#### Nesting {#docs:stable:sql:data_types:overview::nesting}
`ARRAY`, `LIST`, `MAP`, `STRUCT`, and `UNION` types can be arbitrarily nested to any depth, so long as the type rules are observed.
Struct with `LIST`s:
```sql
SELECT {'birds': ['duck', 'goose', 'heron'], 'aliens': NULL, 'amphibians': ['frog', 'toad']};
```
Struct with list of `MAP`s:
```sql
SELECT {'test': [MAP([1, 5], [42.1, 45]), MAP([1, 5], [42.1, 45])]};
```
A list of `UNION`s:
```sql
SELECT [union_value(num := 2), union_value(str := 'ABC')::UNION(str VARCHAR, num INTEGER)];
```
#### Performance Implications {#docs:stable:sql:data_types:overview::performance-implications}
The choice of data types can have a strong effect on performance. Please consult the [Performance Guide](#docs:stable:guides:performance:schema) for details.
### Array Type {#docs:stable:sql:data_types:array}
An `ARRAY` column stores fixed-sized arrays. All fields in the column must have the same length and the same underlying type. Arrays are typically used to store arrays of numbers, but can contain any uniform data type, including `ARRAY`, [`LIST`](#docs:stable:sql:data_types:list) and [`STRUCT`](#docs:stable:sql:data_types:struct) types.
Arrays can be used to store vectors such as [word embeddings](https://en.wikipedia.org/wiki/Word_embedding) or image embeddings.
To store variable-length lists, use the [`LIST` type](#docs:stable:sql:data_types:list). See the [data types overview](#docs:stable:sql:data_types:overview) for a comparison between nested data types.
> The `ARRAY` type in PostgreSQL allows variable-length fields. DuckDB's `ARRAY` type is fixed-length.
#### Creating Arrays {#docs:stable:sql:data_types:array::creating-arrays}
Arrays can be created using the [`array_value(expr, ...)` function](#docs:stable:sql:functions:array::array_valueindex).
Construct with the `array_value` function:
```sql
SELECT array_value(1, 2, 3);
```
You can always implicitly cast an array to a list (and use list functions, like `list_extract`, `[i]`):
```sql
SELECT array_value(1, 2, 3)[2];
```
You can cast from a list to an array (the dimensions have to match):
```sql
SELECT [3, 2, 1]::INTEGER[3];
```
Arrays can be nested:
```sql
SELECT array_value(array_value(1, 2), array_value(3, 4), array_value(5, 6));
```
Arrays can store structs:
```sql
SELECT array_value({'a': 1, 'b': 2}, {'a': 3, 'b': 4});
```
#### Defining an Array Field {#docs:stable:sql:data_types:array::defining-an-array-field}
Arrays can be created using the `⟨TYPE_NAME⟩[⟨LENGTH⟩]`{:.language-sql .highlight} syntax. For example, to create an array field for 3 integers, run:
```sql
CREATE TABLE array_table (id INTEGER, arr INTEGER[3]);
INSERT INTO array_table VALUES (10, [1, 2, 3]), (20, [4, 5, 6]);
```
#### Retrieving Values from Arrays {#docs:stable:sql:data_types:array::retrieving-values-from-arrays}
Retrieving one or more values from an array can be accomplished using brackets and slicing notation, or through [list functions](#docs:stable:sql:functions:list::list-functions) like `list_extract` and `array_extract`.
Using the example table from [Defining an Array Field](#::defining-an-array-field), the following queries for extracting the first element of an array are equivalent:
```sql
SELECT id, arr[1] AS element FROM array_table;
SELECT id, list_extract(arr, 1) AS element FROM array_table;
SELECT id, array_extract(arr, 1) AS element FROM array_table;
```
| id | element |
|---:|--------:|
| 10 | 1 |
| 20 | 4 |
Using the slicing notation returns a `LIST`:
```sql
SELECT id, arr[1:2] AS elements FROM array_table;
```
| id | elements |
|---:|----------|
| 10 | [1, 2] |
| 20 | [4, 5] |
#### Functions {#docs:stable:sql:data_types:array::functions}
All [`LIST` functions](#docs:stable:sql:functions:list) work with the `ARRAY` type. Additionally, several `ARRAY`-native functions are also supported.
See the [`ARRAY` functions](#docs:stable:sql:functions:array::array-native-functions).
#### Examples {#docs:stable:sql:data_types:array::examples}
Create sample data:
```sql
CREATE TABLE x (i INTEGER, v FLOAT[3]);
CREATE TABLE y (i INTEGER, v FLOAT[3]);
INSERT INTO x VALUES (1, array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT));
INSERT INTO y VALUES (1, array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT));
```
Compute cross product:
```sql
SELECT array_cross_product(x.v, y.v)
FROM x, y
WHERE x.i = y.i;
```
Compute cosine similarity:
```sql
SELECT array_cosine_similarity(x.v, y.v)
FROM x, y
WHERE x.i = y.i;
```
#### Ordering {#docs:stable:sql:data_types:array::ordering}
The ordering of `ARRAY` instances is defined using a lexicographical order. `NULL` values compare greater than all other values and are considered equal to each other.
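A minimal sketch of this ordering; the expected order, following the rules above, is noted in the comment:
```sql
SELECT a
FROM (VALUES
    (array_value(1, 2, NULL)),
    (array_value(1, 2, 3)),
    (array_value(1, 1, 9))
) t(a)
ORDER BY a;
-- expected order: [1, 1, 9], [1, 2, 3], [1, 2, NULL]
```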
#### See Also {#docs:stable:sql:data_types:array::see-also}
For more functions, see [List Functions](#docs:stable:sql:functions:list).
### Bitstring Type {#docs:stable:sql:data_types:bitstring}
| Name | Aliases | Description |
|:---|:---|:---|
| `BITSTRING` | `BIT` | Variable-length strings of 1s and 0s |
Bitstrings are strings of 1s and 0s. Bitstring data is of variable length: a bitstring value requires 1 byte for each group of 8 bits, plus a fixed amount of space to store metadata.
By default, bitstrings are not padded with zeroes.
Bitstrings can be very large, having the same size restrictions as `BLOB`s.
#### Creating a Bitstring {#docs:stable:sql:data_types:bitstring::creating-a-bitstring}
A string encoding a bitstring can be cast to a `BITSTRING`:
```sql
SELECT '101010'::BITSTRING AS b;
```
| b |
|--------|
| 101010 |
Creating a `BITSTRING` with a predefined length is possible with the `bitstring` function. The resulting bitstring is left-padded with zeroes.
```sql
SELECT bitstring('0101011', 12) AS b;
```
| b |
|--------------|
| 000000101011 |
Numeric values (integer and float values) can also be converted to a `BITSTRING` via casting. For example:
```sql
SELECT 123::BITSTRING AS b;
```
| b |
|----------------------------------|
| 00000000000000000000000001111011 |
#### Functions {#docs:stable:sql:data_types:bitstring::functions}
See [Bitstring Functions](#docs:stable:sql:functions:bitstring).
### Blob Type {#docs:stable:sql:data_types:blob}
| Name | Aliases | Description |
|:---|:---|:---|
| `BLOB` | `BYTEA`, `BINARY`, `VARBINARY` | Variable-length binary data |
The blob (**B**inary **L**arge **OB**ject) type represents an arbitrary binary object stored in the database system. The blob type can contain any type of binary data with no restrictions. What the actual bytes represent is opaque to the database system.
Create a `BLOB` value with a single byte (170):
```sql
SELECT '\xAA'::BLOB;
```
Create a `BLOB` value with three bytes (170, 171, 172):
```sql
SELECT '\xAA\xAB\xAC'::BLOB;
```
Create a `BLOB` value with two bytes (65, 66):
```sql
SELECT 'AB'::BLOB;
```
Blobs are typically used to store non-textual objects that the database does not provide explicit support for, such as images. While blobs can hold objects up to 4 GB in size, typically it is not recommended to store very large objects within the database system. In many situations it is better to store the large file on the file system, and store the path to the file in the database system in a `VARCHAR` field.
#### Functions {#docs:stable:sql:data_types:blob::functions}
See [Blob Functions](#docs:stable:sql:functions:blob).
### Boolean Type {#docs:stable:sql:data_types:boolean}
| Name | Aliases | Description |
|:---|:---|:---|
| `BOOLEAN` | `BOOL` | Logical Boolean (`true` / `false`) |
The `BOOLEAN` type represents a statement of truth (“true” or “false”). In SQL, the `BOOLEAN` field can also have a third state, “unknown”, which is represented by the SQL `NULL` value.
Select the three possible values of a `BOOLEAN` column:
```sql
SELECT true, false, NULL::BOOLEAN;
```
Boolean values can be explicitly created using the literals `true` and `false`. However, they are most often created as a result of comparisons or conjunctions. For example, the comparison `i > 10` results in a Boolean value. Boolean values can be used in the `WHERE` and `HAVING` clauses of a SQL statement to filter out tuples from the result. In this case, tuples for which the predicate evaluates to `true` will pass the filter, and tuples for which the predicate evaluates to `false` or `NULL` will be filtered out. Consider the following example:
Create a table with the values 5, 15 and `NULL`:
```sql
CREATE TABLE integers (i INTEGER);
INSERT INTO integers VALUES (5), (15), (NULL);
```
Select all entries where `i > 10`:
```sql
SELECT * FROM integers WHERE i > 10;
```
In this case, 5 and `NULL` are filtered out (`5 > 10` is `false` and `NULL > 10` is `NULL`):
| i |
|---:|
| 15 |
#### Conjunctions {#docs:stable:sql:data_types:boolean::conjunctions}
The `AND` / `OR` conjunctions can be used to combine Boolean values.
Below is the truth table for the `AND` conjunction (i.e., `x AND y`).
| `X` | `X AND true` | `X AND false` | `X AND NULL` |
|-------|-------|-------|-------|
| true | true | false | NULL |
| false | false | false | false |
| NULL | NULL | false | NULL |
Below is the truth table for the `OR` conjunction (i.e., `x OR y`).
| `X` | `X OR true` | `X OR false` | `X OR NULL` |
|-------|------|-------|------|
| true | true | true | true |
| false | true | false | NULL |
| NULL | true | NULL | NULL |
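These rules can be checked directly:
```sql
SELECT
    true AND NULL AS t_and_null,   -- NULL
    false AND NULL AS f_and_null,  -- false
    true OR NULL AS t_or_null,     -- true
    false OR NULL AS f_or_null;    -- NULL
```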
#### Expressions {#docs:stable:sql:data_types:boolean::expressions}
See [Logical Operators](#docs:stable:sql:expressions:logical_operators) and [Comparison Operators](#docs:stable:sql:expressions:comparison_operators).
### Date Types {#docs:stable:sql:data_types:date}
| Name | Aliases | Description |
|:-------|:--------|:--------------------------------|
| `DATE` | | Calendar date (year, month, day) |
A date specifies a combination of year, month and day. DuckDB follows the SQL standard's lead by counting dates exclusively in the Gregorian calendar, even for years before that calendar was in use. Dates can be created using the `DATE` keyword, where the data must be formatted according to the ISO 8601 format (`YYYY-MM-DD`).
```sql
SELECT DATE '1992-09-20';
```
#### Special Values {#docs:stable:sql:data_types:date::special-values}
There are also three special date values that can be used on input:
| Input string | Description |
|:-------------|:----------------------------------|
| epoch | 1970-01-01 (Unix system day zero) |
| infinity | Later than all other dates |
| -infinity | Earlier than all other dates |
The values `infinity` and `-infinity` are specially represented inside the system and will be displayed unchanged;
but `epoch` is simply a notational shorthand that will be converted to the date value when read.
```sql
SELECT
'-infinity'::DATE AS negative,
'epoch'::DATE AS epoch,
'infinity'::DATE AS positive;
```
| negative | epoch | positive |
|-----------|------------|----------|
| -infinity | 1970-01-01 | infinity |
#### Functions {#docs:stable:sql:data_types:date::functions}
See [Date Functions](#docs:stable:sql:functions:date).
### Enum Data Type {#docs:stable:sql:data_types:enum}
| Name | Description |
|:--|:-----|
| `ENUM` | Dictionary representing all possible string values of a column |
The enum type represents a dictionary data structure with all possible unique values of a column. For example, a column storing the days of the week can be an enum holding all possible days. Enums are particularly interesting for string columns with low cardinality (i.e., fewer distinct values). This is because the column only stores a numerical reference to the string in the enum dictionary, resulting in immense savings in disk storage and faster query performance.
#### Creating Enums {#docs:stable:sql:data_types:enum::creating-enums}
You can create an enum using hardcoded values:
```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
-- This statement will fail since enums cannot hold NULL values:
-- CREATE TYPE mood AS ENUM ('sad', NULL);
-- This statement will fail since enum values must be unique:
-- CREATE TYPE mood AS ENUM ('sad', 'sad');
```
You can create enums in a specific schema:
```sql
CREATE SCHEMA my_schema;
CREATE TYPE my_schema.mood AS ENUM ('sad', 'ok', 'happy');
```
Anonymous enums can be created on the fly during [casting](#docs:preview:sql:expressions:cast):
```sql
SELECT 'clubs'::ENUM ('spades', 'hearts', 'diamonds', 'clubs');
```
You can also create an enum using a `SELECT` statement that returns a single column of `VARCHAR`s.
The set of values from the select statement will be deduplicated automatically,
and `NULL` values will be ignored:
```sql
CREATE TYPE region AS ENUM (SELECT region FROM sales_data);
```
If you are importing data from a file, you can create an enum for a `VARCHAR` column before importing:
```sql
CREATE TYPE region AS ENUM (SELECT region FROM 'sales_data.csv');
CREATE TABLE sales_data (amount INTEGER, region region);
COPY sales_data FROM 'sales_data.csv';
```
#### Using Enums {#docs:stable:sql:data_types:enum::using-enums}
Enum values are case-sensitive, so 'maltese' and 'Maltese' are considered different values:
```sql
CREATE TYPE breed AS ENUM ('maltese', 'Maltese');
-- Will return false
SELECT 'maltese'::breed = 'Maltese'::breed;
-- Will error
SELECT 'MALTESE'::breed;
```
After an enum has been created, it can be used anywhere a standard built-in type is used.
For example, we can create a table with a column that references the enum.
```sql
CREATE TABLE person (
name TEXT,
current_mood mood
);
INSERT INTO person VALUES
('Pedro', 'happy'),
('Mark', NULL),
('Pagliacci', 'sad'),
('Mr. Mackey', 'ok');
```
The following query will fail since the `mood` type does not have a `quackity-quack` value.
```sql
INSERT INTO person VALUES ('Hannes', 'quackity-quack');
```
#### Enums vs. Strings {#docs:stable:sql:data_types:enum::enums-vs-strings}
DuckDB enums are automatically cast to `VARCHAR` types whenever necessary.
This characteristic allows for comparisons between different enums, or an enum and a `VARCHAR` column.
It also allows for an enum to be used in any `VARCHAR` function. For example:
```sql
SELECT current_mood, regexp_matches(current_mood, '.*a.*') AS contains_a FROM person;
```
| current_mood | contains_a |
|:-------------|:-----------|
| happy | true |
| NULL | NULL |
| sad | true |
| ok | false |
When comparing two different enum types, DuckDB will cast both to strings and perform a string comparison:
```sql
CREATE TYPE new_mood AS ENUM ('happy', 'anxious');
SELECT * FROM person
WHERE current_mood = 'happy'::new_mood;
-- Equivalent to `WHERE current_mood::VARCHAR = 'happy'::VARCHAR`
```
| name | current_mood |
|:----------|:-------------|
| Pedro | happy |
When comparing an enum to a `VARCHAR`, DuckDB will cast the enum to `VARCHAR` and perform a string comparison:
```sql
SELECT * FROM person
WHERE current_mood = name;
-- Equivalent to `WHERE current_mood::VARCHAR = name`
-- No rows returned
```
When comparing against a constant string, DuckDB will perform an optimization
and `try_cast(⟨constant string⟩, enum_type)`{:.language-sql .highlight} so that physically
we are doing an integer comparison instead of a string comparison
(but logically it is still a string comparison):
```sql
SELECT * FROM person
WHERE current_mood = 'sad';
-- Equivalent to `WHERE current_mood::VARCHAR = 'sad'`
```
| name | current_mood |
|:----------|:-------------|
| Pagliacci | sad |
> **Warning.** This means that comparing against a random (non-equivalent) string always results in `false` (and does not error):
```sql
SELECT * FROM person
WHERE current_mood = 'bogus';
-- Equivalent to `WHERE current_mood::VARCHAR = 'bogus'`
-- No rows returned
```
If you want to enforce type-safety, cast to the enum explicitly:
```sql
SELECT * FROM person
WHERE current_mood = 'bogus'::mood;
-- Conversion Error: Could not convert string 'bogus' to UINT8
```
#### Ordering of Enums {#docs:stable:sql:data_types:enum::ordering-of-enums}
Enum values are ordered according to their order in the enum's definition. For example:
```sql
CREATE TYPE priority AS ENUM ('low', 'medium', 'high');
SELECT 'low'::priority < 'high'::priority AS comp;
-- note that 'low'::VARCHAR < 'high'::VARCHAR is false!
```
| comp |
|-----:|
| true |
```sql
SELECT unnest(['medium'::priority, 'high'::priority, 'low'::priority]) AS m
ORDER BY m;
```
| m |
|:-------|
| low |
| medium |
| high |
> **Warning.** If you compare an enum to a non-enum (e.g., a `VARCHAR` or a different enum type),
the enum will first be cast to a string (as described in the previous section),
and the comparison will be done lexicographically as with strings:
```sql
CREATE TABLE tasks (name TEXT, priority_level priority);
INSERT INTO tasks VALUES ('a', 'low'), ('b', 'medium'), ('c', 'high');
-- WARNING!
-- Equivalent to `WHERE priority_level::VARCHAR >= 'medium'`
SELECT * FROM tasks
WHERE priority_level >= 'medium';
-- Misses the 'high' priority task!
```
| name | priority_level |
|:-----|:----------------|
| b | medium |
So, if you want to, e.g., get all priorities at or above `medium`, explicitly cast to the enum type:
```sql
SELECT * FROM tasks
WHERE priority_level >= 'medium'::priority;
```
| name | priority_level |
|:-----|:----------------|
| b | medium |
| c | high |
#### Functions {#docs:stable:sql:data_types:enum::functions}
See [Enum Functions](#docs:preview:sql:functions:enum).
For example, show the available values of the `mood` enum using the `enum_range` function:
```sql
SELECT enum_range(NULL::mood) AS my_enum_range;
```
| my_enum_range |
|--------------------|
| `[sad, ok, happy]` |
#### Enum Removal {#docs:stable:sql:data_types:enum::enum-removal}
Enum types are stored in the catalog, and a catalog dependency is added to each table that uses them. It is possible to drop an enum from the catalog using the following command:
```sql
DROP TYPE ⟨enum_name⟩;
```
Currently, it is possible to drop enums that are used in tables without affecting the tables.
> **Warning.** This behavior of the enum removal feature is subject to change. In future releases, it is expected that any dependent columns must be removed before dropping the enum, or the enum must be dropped with the additional `CASCADE` parameter.
### Interval Type {#docs:stable:sql:data_types:interval}
`INTERVAL`s represent periods of time that can be added to or subtracted from `DATE`, `TIMESTAMP`, `TIMESTAMPTZ`, or `TIME` values.
| Name | Description |
|:---|:---|
| `INTERVAL` | Period of time |
An `INTERVAL` can be constructed by providing amounts together with units.
Units that aren't *months*, *days*, or *microseconds* are converted to equivalent amounts in the next smaller of these three basis units.
```sql
SELECT
INTERVAL 1 YEAR, -- single unit using YEAR keyword; stored as 12 months
INTERVAL (random() * 10) YEAR, -- parentheses necessary for variable amounts;
-- stored as integer number of months
INTERVAL '1 month 1 day', -- string type necessary for multiple units; stored as (1 month, 1 day)
'16 months'::INTERVAL, -- string cast supported; stored as 16 months
'48:00:00'::INTERVAL, -- HH:MM:SS string supported; stored as (48 * 60 * 60 * 1e6 microseconds)
;
```
> **Warning.** Decimal values are truncated to integers when used with unit keywords (unless the unit is `SECONDS` or `MILLISECONDS`).
>
> ```sql
> SELECT INTERVAL '1.5' YEARS;
> -- Returns 12 months; equivalent to `to_years(CAST(trunc(1.5) AS INTEGER))`
> ```
>
> For more precision, include the unit in the string or use a more granular unit; e.g., `INTERVAL '1.5 years'` or `INTERVAL 18 MONTHS`.
Three independent basis units are necessary because a month does not correspond to a fixed number of days (February has fewer days than March) and a day does not correspond to a fixed number of microseconds (days can be 25 or 23 hours long because of daylight saving time).
The division into components makes the `INTERVAL` class suitable for adding or subtracting specific time units to a date. For example, we can generate a table with the first day of every month using the following SQL query:
```sql
SELECT DATE '2000-01-01' + INTERVAL (i) MONTH
FROM range(12) t(i);
```
When `INTERVAL`s are deconstructed via the `datepart` function, the *months* component is additionally split into years and months, and the *microseconds* component is split into hours, minutes, and microseconds. The *days* component is not split into additional units. To demonstrate this, the following query generates an `INTERVAL` called `period` by summing random amounts of the three basis units. It then extracts the aforementioned six parts from `period`, adds them back together, and confirms that the result is always equal to the original `period`.
```sql
SELECT
period = list_reduce(
[INTERVAL (datepart(part, period) || part) FOR part IN
['year', 'month', 'day', 'hour', 'minute', 'microsecond']
],
(i1, i2) -> i1 + i2
) -- always true
FROM (
VALUES (
INTERVAL (random() * 123_456_789_123) MICROSECONDS
+ INTERVAL (random() * 12_345) DAYS
+ INTERVAL (random() * 12_345) MONTHS
)
) _(period);
```
> **Warning.** The *microseconds* component is split only into hours, minutes, and microseconds, rather than hours, minutes, *seconds*, and microseconds.
The following table describes how these parts are extracted by `datepart`, expressed as formulas over the three basis units.
| Part | Formula |
|----------------------|--------------------------------------------------|
| `year` | `#months // 12` |
| `month` | `#months % 12` |
| `day` | `#days` |
| `hour` | `#microseconds // (60 * 60 * 1_000_000)` |
| `minute` | `(#microseconds // (60 * 1_000_000)) % 60` |
| `microsecond` | `#microseconds % (60 * 1_000_000)` |
Additionally, `datepart` may be used to extract centuries, decades, quarters, seconds, and milliseconds from `INTERVAL`s. However, these parts are not required when reassembling the original `INTERVAL`. In fact, if the previous query additionally extracted any of these additional parts, then the sum of the extracted parts would generally be larger than the original `period`.
| Part | Formula |
|----------------------|--------------------------------------------------|
| `century` | `datepart('year', interval) // 100` |
| `decade` | `datepart('year', interval) // 10` |
| `quarter` | `datepart('month', interval) // 3 + 1` |
| `second` | `datepart('microsecond', interval) // 1_000_000` |
| `millisecond` | `datepart('microsecond', interval) // 1_000` |
> All units use 0-based indexing, except for quarters, which use 1-based indexing.
For example:
```sql
SELECT
datepart('decade', INTERVAL 12 YEARS), -- returns 1
datepart('year', INTERVAL 12 YEARS), -- returns 12
datepart('second', INTERVAL 1_234 MILLISECONDS), -- returns 1
datepart('microsecond', INTERVAL 1_234 MILLISECONDS), -- returns 1_234_000
;
```
#### Arithmetic with Timestamps, Dates and Intervals {#docs:stable:sql:data_types:interval::arithmetic-with-timestamps-dates-and-intervals}
`INTERVAL`s can be added to and subtracted from `TIMESTAMP`s, `TIMESTAMPTZ`s, `DATE`s, and `TIME`s using the `+` and `-` operators.
```sql
SELECT
DATE '2000-01-01' + INTERVAL 1 YEAR,
TIMESTAMP '2000-01-01 01:33:30' - INTERVAL '1 month 13 hours',
TIME '02:00:00' - INTERVAL '3 days 23 hours', -- wraps; equals TIME '03:00:00'
;
```
> Adding an `INTERVAL` to a `DATE` returns a `TIMESTAMP` even when the `INTERVAL` has no microseconds component. The result is the same as if the `DATE` was cast to a `TIMESTAMP` (which sets the time component to `00:00:00`) before adding the `INTERVAL`.
Conversely, subtracting two `TIMESTAMP`s or two `TIMESTAMPTZ`s from one another creates an `INTERVAL` describing the difference between the timestamps with only the *days and microseconds* components. For example:
```sql
SELECT
TIMESTAMP '2000-02-06 12:00:00' - TIMESTAMP '2000-01-01 11:00:00', -- 36 days 1 hour
TIMESTAMP '2000-02-01' + (TIMESTAMP '2000-02-01' - TIMESTAMP '2000-01-01'), -- '2000-03-03', NOT '2000-03-01'
;
```
Subtracting two `DATE`s from one another does not create an `INTERVAL` but rather returns the number of days between the given dates as an integer value.
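For example, the date difference corresponding to the timestamp example above is a plain integer:
```sql
SELECT DATE '2000-02-06' - DATE '2000-01-01' AS days_between; -- 36
```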
> **Warning.** Extracting a part of the `INTERVAL` difference between two `TIMESTAMP`s is not equivalent to computing the number of partition boundaries between the two `TIMESTAMP`s for the corresponding unit, as computed by the `datediff` function:
>
> ```sql
> SELECT
> datediff('day', TIMESTAMP '2020-01-01 01:00:00', TIMESTAMP '2020-01-02 00:00:00'), -- 1
> datepart('day', TIMESTAMP '2020-01-02 00:00:00' - TIMESTAMP '2020-01-01 01:00:00'), -- 0
> ;
> ```
#### Equality and Comparison {#docs:stable:sql:data_types:interval::equality-and-comparison}
For equality and ordering comparisons only, the total number of microseconds in an `INTERVAL` is computed by converting the days basis unit to `24 * 60 * 60 * 1e6` microseconds and the months basis unit to 30 days, or `30 * 24 * 60 * 60 * 1e6` microseconds.
As a result, `INTERVAL`s can compare equal even when they are functionally different, and the ordering of `INTERVAL`s is not always preserved when they are added to dates or timestamps.
For example:
* `INTERVAL 30 DAYS = INTERVAL 1 MONTH`
* but `DATE '2020-01-01' + INTERVAL 30 DAYS != DATE '2020-01-01' + INTERVAL 1 MONTH`.
and
* `INTERVAL '30 days 12 hours' > INTERVAL 1 MONTH`
* but `DATE '2020-01-01' + INTERVAL '30 days 12 hours' < DATE '2020-01-01' + INTERVAL 1 MONTH`.
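For example, the first pair of bullet points can be verified directly:
```sql
SELECT
    INTERVAL 30 DAYS = INTERVAL 1 MONTH AS intervals_equal,                                       -- true
    DATE '2020-01-01' + INTERVAL 30 DAYS = DATE '2020-01-01' + INTERVAL 1 MONTH AS results_equal; -- false
```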
#### Functions {#docs:stable:sql:data_types:interval::functions}
See the [Date Part Functions page](#docs:stable:sql:functions:datepart) for a list of available date parts for use with an `INTERVAL`.
See the [Interval Operators page](#docs:stable:sql:functions:interval) for functions that operate on intervals.
### List Type {#docs:stable:sql:data_types:list}
A `LIST` column encodes lists of values. Fields in the column can have values with different lengths, but they must all have the same underlying type. `LIST`s are typically used to store arrays of numbers, but can contain any uniform data type, including other `LIST`s and `STRUCT`s.
`LIST`s are similar to PostgreSQL's `ARRAY` type. DuckDB uses the `LIST` terminology, but some [`array_` functions](#docs:stable:sql:functions:list) are provided for PostgreSQL compatibility.
See the [data types overview](#docs:stable:sql:data_types:overview) for a comparison between nested data types.
> For storing fixed-length lists, DuckDB uses the [`ARRAY` type](#docs:stable:sql:data_types:array).
#### Creating Lists {#docs:stable:sql:data_types:list::creating-lists}
Lists can be created using the [`list_value(expr, ...)`](#docs:stable:sql:functions:list::list_valueany-) function or the equivalent bracket notation `[expr, ...]`. The expressions can be constants or arbitrary expressions. To create a list from a table column, use the [`list`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) aggregate function.
List of integers:
```sql
SELECT [1, 2, 3];
```
List of strings with a `NULL` value:
```sql
SELECT ['duck', 'goose', NULL, 'heron'];
```
List of lists with `NULL` values:
```sql
SELECT [['duck', 'goose', 'heron'], NULL, ['frog', 'toad'], []];
```
Create a list with the list_value function:
```sql
SELECT list_value(1, 2, 3);
```
Create a table with an `INTEGER` list column and a `VARCHAR` list column:
```sql
CREATE TABLE list_table (int_list INTEGER[], varchar_list VARCHAR[]);
```
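The `list` aggregate function mentioned above collects a column into a single list. A minimal sketch, using a hypothetical `birds` table:
```sql
CREATE TABLE birds (name VARCHAR);
INSERT INTO birds VALUES ('duck'), ('goose'), ('heron');
SELECT list(name) AS names FROM birds; -- a single list containing all names
```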
#### Retrieving from Lists {#docs:stable:sql:data_types:list::retrieving-from-lists}
Retrieving one or more values from a list can be accomplished using brackets and slicing notation, or through [list functions](#docs:stable:sql:functions:list) like `list_extract`. Multiple equivalent functions are provided as aliases for compatibility with systems that refer to lists as arrays; for example, the function `array_slice`.
| Example | Result |
|:-----------------------------------------|:-----------|
| SELECT ['a', 'b', 'c'][3] | 'c' |
| SELECT ['a', 'b', 'c'][-1] | 'c' |
| SELECT ['a', 'b', 'c'][2 + 1] | 'c' |
| SELECT list_extract(['a', 'b', 'c'], 3) | 'c' |
| SELECT ['a', 'b', 'c'][1:2] | ['a', 'b'] |
| SELECT ['a', 'b', 'c'][:2] | ['a', 'b'] |
| SELECT ['a', 'b', 'c'][-2:] | ['b', 'c'] |
| SELECT list_slice(['a', 'b', 'c'], 2, 3) | ['b', 'c'] |
#### Comparison and Ordering {#docs:stable:sql:data_types:list::comparison-and-ordering}
The `LIST` type can be compared using all the [comparison operators](#docs:stable:sql:expressions:comparison_operators).
These comparisons can be used in [logical expressions](#docs:stable:sql:expressions:logical_operators)
such as `WHERE` and `HAVING` clauses, and return [`BOOLEAN` values](#docs:stable:sql:data_types:boolean).
The `LIST` ordering is defined positionally using the following rules, where `min_len = min(len(l1), len(l2))`.
* **Equality.** `l1` and `l2` are equal, if for each `i` in `[1, min_len]`: `l1[i] = l2[i]`.
* **Less Than**. For the first index `i` in `[1, min_len]` where `l1[i] != l2[i]`:
If `l1[i] < l2[i]`, `l1` is less than `l2`.
`NULL` values are compared following PostgreSQL's semantics.
Lower nesting levels are used for tie-breaking.
Here are some queries returning `true` for the comparison.
```sql
SELECT [1, 2] < [1, 3] AS result;
```
```sql
SELECT [[1], [2, 4, 5]] < [[2]] AS result;
```
```sql
SELECT [ ] < [1] AS result;
```
These queries return `false`.
```sql
SELECT [ ] < [ ] AS result;
```
```sql
SELECT [1, 2] < [1] AS result;
```
These queries return `NULL`.
```sql
SELECT [1, 2] < [1, NULL, 4] AS result;
```
#### Functions {#docs:stable:sql:data_types:list::functions}
See [List Functions](#docs:stable:sql:functions:list).
### Literal Types {#docs:stable:sql:data_types:literal_types}
DuckDB has special literal types for representing `NULL`, integer and string literals in queries. These have their own binding and conversion rules.
> Prior to DuckDB version 0.10.0, integer and string literals behaved identically to the `INTEGER` and `VARCHAR` types.
#### Null Literals {#docs:stable:sql:data_types:literal_types::null-literals}
The `NULL` literal is denoted with the keyword `NULL`. The `NULL` literal can be implicitly converted to any other type.
#### Integer Literals {#docs:stable:sql:data_types:literal_types::integer-literals}
Integer literals are denoted as a sequence of one or more decimal digits. At runtime, these result in values of the `INTEGER_LITERAL` type. `INTEGER_LITERAL` types can be implicitly converted to any [integer type](#docs:stable:sql:data_types:numeric::integer-types) in which the value fits. For example, the integer literal `42` can be implicitly converted to a `TINYINT`, but the integer literal `1000` cannot be.
> DuckDB does not support hexadecimal or binary literals directly. However, strings in hexadecimal or binary notation with `0x` or `0b` prefixes, respectively, can be cast to integer types, e.g., `'0xFF'::INT = 255` or `'0b101'::INT = 5`.
#### Other Numeric Literals {#docs:stable:sql:data_types:literal_types::other-numeric-literals}
Non-integer numeric literals can be denoted with decimal notation, using the period character (`.`) to separate the integer part and the decimal part of the number.
Either the integer part or the decimal part may be omitted:
```sql
SELECT 1.5; -- 1.5
SELECT .50; -- 0.5
SELECT 2.; -- 2.0
```
Non-integer numeric literals can also be denoted using [_E notation_](https://en.wikipedia.org/wiki/Scientific_notation#E_notation). In E notation, an integer or decimal literal is followed by an exponent part, which is denoted by `e` or `E` followed by a literal integer indicating the exponent.
The exponential part indicates that the preceding value should be multiplied by 10 raised to the power of the exponent:
```sql
SELECT 1e2; -- 100
SELECT 6.02214e23; -- Avogadro's constant
SELECT 1e-10; -- 1 ångström
```
#### Underscores in Numeric Literals {#docs:stable:sql:data_types:literal_types::underscores-in-numeric-literals}
DuckDB's SQL dialect allows using the underscore character `_` in numeric literals as an optional separator. The rules for using underscores are as follows:
* Underscores are allowed in integer, decimal, hexadecimal and binary notation.
* Underscores cannot be the first or last character in a literal.
* Underscores must have an integer or numeric part on either side of them, i.e., there cannot be multiple underscores in a row, and an underscore cannot appear immediately before or after a decimal point or exponent.
Examples:
```sql
SELECT 100_000_000; -- 100000000
SELECT '0xFF_FF'::INTEGER; -- 65535
SELECT 1_2.1_2E0_1; -- 121.2
SELECT '0b0_1_0_1'::INTEGER; -- 5
```
#### String Literals {#docs:stable:sql:data_types:literal_types::string-literals}
String literals are delimited using single quotes (`'`, apostrophe) and result in `STRING_LITERAL` values.
Note that double quotes (`"`) cannot be used as the string delimiter character: instead, double quotes are used to delimit [quoted identifiers](#docs:stable:sql:dialect:keywords_and_identifiers::identifiers).
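To illustrate the distinction (the subquery and alias are arbitrary):
```sql
SELECT 'hello' AS greeting;                          -- 'hello' is a string literal
SELECT "greeting" FROM (SELECT 'hello' AS greeting); -- "greeting" is a quoted identifier
```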
##### Implicit String Literal Concatenation {#docs:stable:sql:data_types:literal_types::implicit-string-literal-concatenation}
Consecutive single-quoted string literals separated only by whitespace that contains at least one newline are implicitly concatenated:
```sql
SELECT 'Hello'
' '
'World' AS greeting;
```
is equivalent to:
```sql
SELECT 'Hello'
|| ' '
|| 'World' AS greeting;
```
They both return the following result:
| greeting |
|-------------|
| Hello World |
Note that implicit concatenation only works if there is at least one newline between the literals. Using adjacent string literals separated by whitespace without a newline results in a syntax error:
```sql
SELECT 'Hello' ' ' 'World' AS greeting;
```
```console
Parser Error:
syntax error at or near "' '"
LINE 1: SELECT 'Hello' ' ' 'World' AS greeting;
^
```
Also note that implicit concatenation only works with single-quoted string literals, and does not work with other kinds of string values.
##### Implicit String Conversion {#docs:stable:sql:data_types:literal_types::implicit-string-conversion}
`STRING_LITERAL` instances can be implicitly converted to _any_ other type.
For example, we can compare string literals with dates:
```sql
SELECT d > '1992-01-01' AS result
FROM (VALUES (DATE '1992-01-01')) t(d);
```
| result |
|:-------|
| false |
However, we cannot compare `VARCHAR` values with dates.
```sql
SELECT d > '1992-01-01'::VARCHAR
FROM (VALUES (DATE '1992-01-01')) t(d);
```
```console
Binder Error:
Cannot compare values of type DATE and type VARCHAR - an explicit cast is required
```
##### Escape String Literals {#docs:stable:sql:data_types:literal_types::escape-string-literals}
To escape a single quote (apostrophe) character in a string literal, use `''`. For example, `SELECT '''' AS s` returns `'`.
To enable some common escape sequences, such as `\n` for the newline character, prefix a string literal with `e` (or `E`).
```sql
SELECT e'Hello\nworld' AS msg;
```
```text
┌──────────────┐
│     msg      │
│   varchar    │
├──────────────┤
│ Hello\nworld │
└──────────────┘
```
The following backslash escape sequences are supported:
| Escape sequence | Name | ASCII code |
|:--|:--|--:|
| `\b` | backspace | 8 |
| `\f` | form feed | 12 |
| `\n` | newline | 10 |
| `\r` | carriage return | 13 |
| `\t` | tab | 9 |
##### Dollar-Quoted String Literals {#docs:stable:sql:data_types:literal_types::dollar-quoted-string-literals}
DuckDB supports dollar-quoted string literals, which are surrounded by double-dollar symbols (`$$`):
```sql
SELECT $$Hello
world$$ AS msg;
```
```text
┌──────────────┐
│     msg      │
│   varchar    │
├──────────────┤
│ Hello\nworld │
└──────────────┘
```
```sql
SELECT $$The price is $9.95$$ AS msg;
```
| msg |
|--------------------|
| The price is $9.95 |
Furthermore, you can insert an alphanumeric tag between the double-dollar symbols, which allows regular double-dollar symbols to be used *within* the string literal:
```sql
SELECT $tag$ this string can contain newlines,
'single quotes',
"double quotes",
and $$dollar quotes$$ $tag$ AS msg;
```
```text
┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              msg                                               │
│                                            varchar                                             │
├────────────────────────────────────────────────────────────────────────────────────────────────┤
│  this string can contain newlines,\n'single quotes',\n"double quotes",\nand $$dollar quotes$$  │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
```
[Implicit concatenation](#::implicit-string-literal-concatenation) only works for single-quoted string literals, not with dollar-quoted ones.
### Map Type {#docs:stable:sql:data_types:map}
`MAP`s are similar to `STRUCT`s in that they are an ordered list of key-value pairs. However, `MAP`s do not need to have the same keys present for each row, and thus are suitable for use cases where the schema is unknown beforehand or varies per row.
`MAP`s must have a single type for all keys, and a single type for all values. Keys and values can be any type, and the type of the keys does not need to match the type of the values (e.g., a `MAP` of `VARCHAR` to `INT` is valid). `MAP`s may not have duplicate keys. `MAP`s return `NULL` if a key is not found rather than throwing an error as structs do.
In contrast, `STRUCT`s must have string keys, but each value may have a different type. See the [data types overview](#docs:stable:sql:data_types:overview) for a comparison between nested data types.
To construct a `MAP`, use the bracket syntax preceded by the `MAP` keyword.
#### Creating Maps {#docs:stable:sql:data_types:map::creating-maps}
A map with `VARCHAR` keys and `INTEGER` values. This returns `{key1=10, key2=20, key3=30}`:
```sql
SELECT MAP {'key1': 10, 'key2': 20, 'key3': 30};
```
Alternatively use the `map_from_entries` function. This returns `{key1=10, key2=20, key3=30}`:
```sql
SELECT map_from_entries([('key1', 10), ('key2', 20), ('key3', 30)]);
```
A map can be also created using two lists: keys and values. This returns `{key1=10, key2=20, key3=30}`:
```sql
SELECT MAP(['key1', 'key2', 'key3'], [10, 20, 30]);
```
A map can also use `INTEGER` keys and `NUMERIC` values. This returns `{1=42.001, 5=-32.100}`:
```sql
SELECT MAP {1: 42.001, 5: -32.1};
```
Keys and/or values can also be nested types. This returns `{[a, b]=[1.1, 2.2], [c, d]=[3.3, 4.4]}`:
```sql
SELECT MAP {['a', 'b']: [1.1, 2.2], ['c', 'd']: [3.3, 4.4]};
```
Create a table with a map column that has `INTEGER` keys and `DOUBLE` values:
```sql
CREATE TABLE tbl (col MAP(INTEGER, DOUBLE));
```
#### Retrieving from Maps {#docs:stable:sql:data_types:map::retrieving-from-maps}
`MAP` values can be retrieved using the `map_extract_value` function or bracket notation:
```sql
SELECT MAP {'key1': 5, 'key2': 43}['key1'];
```
```text
5
```
If the key has the wrong type, an error is thrown. If it has the correct type but is merely not contained in the map, a `NULL` value is returned:
```sql
SELECT MAP {'key1': 5, 'key2': 43}['key3'];
```
```text
NULL
```
The `map_extract` function (and its synonym `element_at`) can be used to retrieve a value wrapped in a list; it returns an empty list if the key is not contained in the map:
```sql
SELECT map_extract(MAP {'key1': 5, 'key2': 43}, 'key1');
```
```text
[5]
```
```sql
SELECT map_extract(MAP {'key1': 5, 'key2': 43}, 'key3');
```
```text
[]
```
#### Comparison Operators {#docs:stable:sql:data_types:map::comparison-operators}
Nested types can be compared using all the [comparison operators](#docs:stable:sql:expressions:comparison_operators).
These comparisons can be used in [logical expressions](#docs:stable:sql:expressions:logical_operators)
for both `WHERE` and `HAVING` clauses, as well as for creating [Boolean values](#docs:stable:sql:data_types:boolean).
The ordering is defined positionally in the same way that words can be ordered in a dictionary.
`NULL` values compare greater than all other values and are considered equal to each other.
At the top level, `NULL` nested values obey standard SQL `NULL` comparison rules:
comparing a `NULL` nested value to a non-`NULL` nested value produces a `NULL` result.
Comparing nested value _members_, however, uses the internal nested value rules for `NULL`s,
and a `NULL` nested value member will compare above a non-`NULL` nested value member.
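A minimal illustration of these two rules, using struct values:
```sql
-- Top-level comparison with NULL follows standard SQL semantics: the result is NULL
SELECT {'k': 1} = NULL;
-- A NULL member compares above a non-NULL member: this returns true
SELECT {'k': 1} < {'k': NULL};
```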
#### Functions {#docs:stable:sql:data_types:map::functions}
See [Map Functions](#docs:stable:sql:functions:map).
### NULL Values {#docs:stable:sql:data_types:nulls}
`NULL` values are special values that are used to represent missing data in SQL. Columns of any type can contain `NULL` values. Logically, a `NULL` value can be seen as “the value of this field is unknown”.
A `NULL` value can be inserted into any field that does not have the `NOT NULL` qualifier:
```sql
CREATE TABLE integers (i INTEGER);
INSERT INTO integers VALUES (NULL);
```
`NULL` values have special semantics in many parts of the query as well as in many functions:
> Any comparison with a `NULL` value returns `NULL`, including `NULL = NULL`.
You can use `IS NOT DISTINCT FROM` to perform an equality comparison where `NULL` values compare equal to each other. Use `IS (NOT) NULL` to check if a value is `NULL`.
```sql
SELECT NULL = NULL;
```
```text
NULL
```
```sql
SELECT NULL IS NOT DISTINCT FROM NULL;
```
```text
true
```
```sql
SELECT NULL IS NULL;
```
```text
true
```
#### NULL and Functions {#docs:stable:sql:data_types:nulls::null-and-functions}
A function that is given a `NULL` input argument **usually** returns `NULL`.
```sql
SELECT cos(NULL);
```
```text
NULL
```
The `coalesce` function is an exception to this: it takes any number of arguments, and returns for each row the first argument that is not `NULL`. If all arguments are `NULL`, `coalesce` also returns `NULL`.
```sql
SELECT coalesce(NULL, NULL, 1);
```
```text
1
```
```sql
SELECT coalesce(10, 20);
```
```text
10
```
```sql
SELECT coalesce(NULL, NULL);
```
```text
NULL
```
The `ifnull` function is a two-argument version of `coalesce`.
```sql
SELECT ifnull(NULL, 'default_string');
```
```text
default_string
```
```sql
SELECT ifnull(1, 'default_string');
```
```text
1
```
#### `NULL` and `AND` / `OR` {#docs:stable:sql:data_types:nulls::null-and-and--or}
`NULL` values have special behavior when used with `AND` and `OR`.
For details, see the [Boolean Type documentation](#docs:stable:sql:data_types:boolean).
#### `NULL` and `IN` / `NOT IN` {#docs:stable:sql:data_types:nulls::null-and-in--not-in}
The behavior of `... IN ⟨something with a NULL⟩`{:.language-sql .highlight} is different from `... IN ⟨something with no NULLs⟩`{:.language-sql .highlight}.
For details, see the [`IN` documentation](#docs:stable:sql:expressions:in).
#### `NULL` and Aggregate Functions {#docs:stable:sql:data_types:nulls::null-and-aggregate-functions}
`NULL` values are ignored in most aggregate functions.
Aggregate functions that do not ignore `NULL` values include: `first`, `last`, `list`, and `array_agg`. To exclude `NULL` values from those aggregate functions, the [`FILTER` clause](#docs:stable:sql:query_syntax:filter) can be used.
```sql
CREATE TABLE integers (i INTEGER);
INSERT INTO integers VALUES (1), (10), (NULL);
```
```sql
SELECT min(i) FROM integers;
```
```text
1
```
```sql
SELECT max(i) FROM integers;
```
```text
10
```
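Continuing with the `integers` table above, a `FILTER` clause excludes the `NULL` value from the `list` aggregate:
```sql
SELECT
    list(i) AS with_nulls,                            -- includes the NULL value
    list(i) FILTER (WHERE i IS NOT NULL) AS no_nulls  -- NULL excluded
FROM integers;
```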
### Numeric Types {#docs:stable:sql:data_types:numeric}
#### Fixed-Width Integer Types {#docs:stable:sql:data_types:numeric::fixed-width-integer-types}
The types `TINYINT`, `SMALLINT`, `INTEGER`, `BIGINT` and `HUGEINT` store whole numbers, that is, numbers without fractional components, of various ranges. Attempts to store values outside of the allowed range will result in an error.
The types `UTINYINT`, `USMALLINT`, `UINTEGER`, `UBIGINT` and `UHUGEINT` store whole unsigned numbers. Attempts to store negative numbers or values outside of the allowed range will result in an error.
| Name | Aliases | Min | Max | Size in bytes |
| :---------- | :------------------------------- | ------: | --------: | ------------: |
| `TINYINT`   | `INT1`                           | -2^7    | 2^7 - 1   | 1 |
| `SMALLINT`  | `INT2`, `INT16`, `SHORT`         | -2^15   | 2^15 - 1  | 2 |
| `INTEGER`   | `INT4`, `INT32`, `INT`, `SIGNED` | -2^31   | 2^31 - 1  | 4 |
| `BIGINT`    | `INT8`, `INT64`, `LONG`          | -2^63   | 2^63 - 1  | 8 |
| `HUGEINT`   | `INT128`                         | -2^127  | 2^127 - 1 | 16 |
| `UTINYINT` | `UINT8` | 0 | 2^8 - 1 | 1 |
| `USMALLINT` | `UINT16` | 0 | 2^16 - 1 | 2 |
| `UINTEGER` | `UINT32` | 0 | 2^32 - 1 | 4 |
| `UBIGINT` | `UINT64` | 0 | 2^64 - 1 | 8 |
| `UHUGEINT` | `UINT128` | 0 | 2^128 - 1 | 16 |
> `INT8` is a 64-bit integer, and is not the signed equivalent of `UINT8`, an unsigned, 8-bit integer. The type aliases `INT1`, `INT2`, `INT4` and `INT8` for signed integers were inherited from PostgreSQL, where digits in these names indicate their size in *bytes*, whereas the type aliases for their unsigned equivalents, `UINT8`, `UINT16`, `UINT32` and `UINT64`, indicate their size in *bits* following the C/C++ convention.
The `INTEGER` type is the common choice, as it offers the best balance between range, storage size, and performance. The `SMALLINT` type is generally only used if disk space is at a premium. The `BIGINT` and `HUGEINT` types are designed to be used when the range of the `INTEGER` type is insufficient.
#### Variable-Length Integers {#docs:stable:sql:data_types:numeric::variable-length-integers}
The previously mentioned integer types all have a fixed storage size regardless of the value stored: a `UTINYINT` always takes 1 byte, a `SMALLINT` 2 bytes, etc.
But sometimes you need numbers that are even bigger than what is supported by a `HUGEINT`! In these situations, you can use the `BIGNUM` type, which stores positive numbers in a similar fashion as other integer types, but uses three additional bytes to store the required size and a sign bit. A number with `N` decimal digits requires approximately `0.415 * N + 3` bytes when stored in a `BIGNUM`.
Unlike variable-length integer implementations in other systems, there are limits to `BIGNUM`: the maximal and minimal representable values are approximately `±4.27e20201778`. Those are numbers with 20,201,779 decimal digits and storing a single such number requires 8 megabytes.
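As a minimal sketch, assuming string casts to `BIGNUM` behave like casts to the other integer types:
```sql
-- a value beyond the HUGEINT maximum of 2^127 - 1
SELECT '170141183460469231731687303715884105728000'::BIGNUM AS big;
```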
#### Fixed-Point Decimals {#docs:stable:sql:data_types:numeric::fixed-point-decimals}
The data type `DECIMAL(WIDTH, SCALE)` (also available under the alias `NUMERIC(WIDTH, SCALE)`) represents an exact fixed-point decimal value. When creating a value of type `DECIMAL`, the `WIDTH` and `SCALE` can be specified to define which size of decimal values can be held in the field. The `WIDTH` determines how many digits can be held in total, and the `SCALE` determines the number of digits after the decimal point. For example, the type `DECIMAL(3, 2)` can fit the value `1.23`, but cannot fit the value `12.3` or the value `1.234`. If no `WIDTH` and `SCALE` are specified, the default is `DECIMAL(18, 3)`.
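For example, the width and scale constraints can be observed directly:
```sql
SELECT 1.23::DECIMAL(3, 2); -- fits: 3 digits in total, 2 after the decimal point
SELECT 12.3::DECIMAL(3, 2); -- fails: the value requires a width larger than 3
```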
Addition, subtraction, and multiplication of two fixed-point decimals returns another fixed-point decimal with the required `WIDTH` and `SCALE` to contain the exact result, or throws an error if the required `WIDTH` would exceed the maximal supported `WIDTH`, which is currently 38.
Division of fixed-point decimals does not typically produce numbers with finite decimal expansion. Therefore, DuckDB uses approximate [floating-point arithmetic](#::floating-point-types) for all divisions that involve fixed-point decimals and accordingly returns floating-point data types.
Internally, decimals are represented as integers depending on their specified `WIDTH`.
| Width | Internal | Size (bytes) |
| :---- | :------- | -----------: |
| 1-4 | `INT16` | 2 |
| 5-9 | `INT32` | 4 |
| 10-18 | `INT64` | 8 |
| 19-38 | `INT128` | 16 |
Performance can be impacted by using larger decimals than required. In particular, decimal values with a width above 19 are slow, as arithmetic involving the `INT128` type is much more expensive than operations involving the `INT32` or `INT64` types. It is therefore recommended to stick with a `WIDTH` of 18 or below, unless there is a good reason why this is insufficient.
#### Floating-Point Types {#docs:stable:sql:data_types:numeric::floating-point-types}
The data types `FLOAT` and `DOUBLE` are variable-precision numeric types. In practice, these types are usually implementations of IEEE Standard 754 for Binary Floating-Point Arithmetic (single and double precision, respectively), to the extent that the underlying processor, operating system, and compiler support it.
| Name | Aliases | Description |
| :------- | :--------------- | :----------------------------------------------- |
| `FLOAT` | `FLOAT4`, `REAL` | Single precision floating-point number (4 bytes) |
| `DOUBLE` | `FLOAT8` | Double precision floating-point number (8 bytes) |
As with fixed-point data types, conversions from literals or casts from other data types to floating-point types store inputs that cannot be represented exactly as approximations. However, it can be harder to predict which inputs are affected by this. For example, it is not surprising that `1.3::DECIMAL(1, 0) - 0.7::DECIMAL(1, 0) != 0.6::DECIMAL(1, 0)`, but it may be surprising that `1.3::FLOAT - 0.7::FLOAT != 0.6::FLOAT`.
Additionally, whereas multiplication, addition, and subtraction of fixed-point decimal data types is exact, these operations are only approximate on floating-point binary data types.
For more complex mathematical operations, however, floating-point arithmetic is used internally and more precise results can be obtained if intermediate steps are _not_ cast to fixed point formats of the same width as in- and outputs. For example, `(10::FLOAT / 3::FLOAT)::FLOAT * 3 = 10` whereas `(10::DECIMAL(18, 3) / 3::DECIMAL(18, 3))::DECIMAL(18, 3) * 3 = 9.999`.
In general, we advise that:
- If you require exact storage of numbers with a known number of decimal digits and require exact additions, subtractions, and multiplications (such as for monetary amounts), use the [`DECIMAL` data type](#::fixed-point-decimals) or its `NUMERIC` alias instead.
- If you want to do fast or complicated calculations, the floating-point data types may be more appropriate. However, if you use the results for anything important, you should evaluate your implementation carefully for corner cases (ranges, infinities, underflows, invalid operations) that may be handled differently from what you expect and you should familiarize yourself with common floating-point pitfalls. The article [âWhat Every Computer Scientist Should Know About Floating-Point Arithmeticâ by David Goldberg](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) and the [floating point series on Bruce Dawson's blog](https://randomascii.wordpress.com/2017/06/19/sometimes-floating-point-math-is-perfect/) provide excellent starting points.
On most platforms, the `FLOAT` type has a range of at least 1E-37 to 1E+37 with a precision of at least 6 decimal digits. The `DOUBLE` type typically has a range of around 1E-307 to 1E+308 with a precision of at least 15 digits. Positive numbers outside of these ranges (and negative numbers outside the mirrored ranges) may cause errors on some platforms but will usually be converted to zero or infinity, respectively.
In addition to ordinary numeric values, the floating-point types have several special values representing IEEE 754 special values:
- `Infinity`: infinity
- `-Infinity`: negative infinity
- `NaN`: not a number
On machines with the required CPU/FPU support, DuckDB follows the IEEE 754 specification regarding these special values, with two exceptions:
- `NaN` compares equal to `NaN` and greater than any other floating point number.
- Some floating point functions, like `sqrt` / `sin` / `asin` throw errors rather than return `NaN` for values outside their ranges of definition.
To insert these values as literals in a SQL command, you must put quotes around them, you may abbreviate `Infinity` as `Inf`, and you may use any capitalization. For example:
```sql
SELECT
sqrt(2) > '-inf',
'nan' > sqrt(2);
```
| `(sqrt(2) > '-inf')` | `('nan' > sqrt(2))` |
| -------------------: | ------------------: |
| true | true |
#### Universally Unique Identifiers (`UUID`s) {#docs:stable:sql:data_types:numeric::universally-unique-identifiers--uuids}
DuckDB supports [universally unique identifiers (UUIDs)](https://en.wikipedia.org/wiki/Universally_unique_identifier) through the `UUID` type.
These use 128 bits and are represented internally as `HUGEINT` values.
When printed, they are shown with lowercase hexadecimal characters, separated by dashes as follows: `⟨12345678⟩-⟨1234⟩-⟨1234⟩-⟨1234⟩-⟨1234567890ab⟩`{:.language-sql .highlight} (using 36 characters in total including the dashes). For example, `4ac7a9e9-607c-4c8a-84f3-843f0191e3fd` is a valid UUID.
DuckDB supports generating UUIDv4 and [UUIDv7](https://uuid7.com/) identifiers.
To retrieve the version of a UUID value, use the [`uuid_extract_version` function](#docs:stable:sql:functions:utility::uuid_extract_versionuuid).
##### UUIDv4 {#docs:stable:sql:data_types:numeric::uuidv4}
To generate a UUIDv4 value, use the
[`uuid()` function](#docs:stable:sql:functions:utility::uuid) or its aliases,
the [`uuidv4()`](#docs:stable:sql:functions:utility::uuidv4) and [`gen_random_uuid()`](#docs:stable:sql:functions:utility::gen_random_uuid)
functions.
##### UUIDv7 {#docs:stable:sql:data_types:numeric::uuidv7}
To generate a UUIDv7 value, use the [`uuidv7()`](#docs:stable:sql:functions:utility::uuidv7) function.
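For example, combining the generator functions with `uuid_extract_version`:
```sql
SELECT
    uuid_extract_version(uuidv4()) AS v4, -- 4
    uuid_extract_version(uuidv7()) AS v7; -- 7
```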
To retrieve the timestamp from a UUIDv7 value, use the [`uuid_extract_timestamp` function](#docs:stable:sql:functions:utility::uuid_extract_timestampuuidv7):
```sql
SELECT uuid_extract_timestamp(uuidv7()) AS ts;
```
| ts |
| ------------------------- |
| 2025-04-19 15:51:20.07+00 |
#### Functions {#docs:stable:sql:data_types:numeric::functions}
See [Numeric Functions and Operators](#docs:stable:sql:functions:numeric).
### Struct Data Type {#docs:stable:sql:data_types:struct}
Conceptually, a `STRUCT` column contains an ordered list of columns called “entries”. The entries are referenced by name using strings. This document refers to those entry names as keys. Each row in the `STRUCT` column must have the same keys. The names of the struct entries are part of the *schema*. Each row in a `STRUCT` column must have the same layout. The names of the struct entries are case-insensitive.
`STRUCT`s are typically used to nest multiple columns into a single column, and the nested column can be of any type, including other `STRUCT`s and `LIST`s.
`STRUCT`s are similar to PostgreSQL's `ROW` type. The key difference is that DuckDB `STRUCT`s require the same keys in each row of a `STRUCT` column. This allows DuckDB to provide significantly improved performance by fully utilizing its vectorized execution engine, and also enforces type consistency for improved correctness. DuckDB includes a `row` function as a special way to produce a `STRUCT`, but does not have a `ROW` data type. See an example below and the [`STRUCT` functions documentation](#docs:stable:sql:functions:struct) for details.
See the [data types overview](#docs:stable:sql:data_types:overview) for a comparison between nested data types.
##### Creating Structs {#docs:stable:sql:data_types:struct::creating-structs}
Structs can be created using the [`struct_pack(name := expr, ...)`](#docs:stable:sql:functions:struct) function, the equivalent array notation `{'name': expr, ...}`, using a row variable, or using the `row` function.
Create a struct using the `struct_pack` function. Note the lack of single quotes around the keys and the use of the `:=` operator:
```sql
SELECT struct_pack(key1 := 'value1', key2 := 42) AS s;
```
Create a struct using the array notation:
```sql
SELECT {'key1': 'value1', 'key2': 42} AS s;
```
Create a struct using a row variable:
```sql
SELECT d AS s FROM (SELECT 'value1' AS key1, 42 AS key2) d;
```
Create a struct of integers:
```sql
SELECT {'x': 1, 'y': 2, 'z': 3} AS s;
```
Create a struct of strings with a `NULL` value:
```sql
SELECT {'yes': 'duck', 'maybe': 'goose', 'huh': NULL, 'no': 'heron'} AS s;
```
Create a struct with a different type for each key:
```sql
SELECT {'key1': 'string', 'key2': 1, 'key3': 12.345} AS s;
```
Create a struct of structs with `NULL` values:
```sql
SELECT {
'birds': {'yes': 'duck', 'maybe': 'goose', 'huh': NULL, 'no': 'heron'},
'aliens': NULL,
'amphibians': {'yes': 'frog', 'maybe': 'salamander', 'huh': 'dragon', 'no': 'toad'}
} AS s;
```
##### Adding or Updating Fields of Structs {#docs:stable:sql:data_types:struct::adding-or-updating-fields-of-structs}
To add new fields or update existing ones, you can use `struct_update`:
```sql
SELECT struct_update({'a': 1, 'b': 2}, b := 3, c := 4) AS s;
```
Alternatively, `struct_insert` also allows adding new fields but not updating existing ones.
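For example (the field names are arbitrary):
```sql
SELECT struct_insert({'a': 1, 'b': 2}, c := 4) AS s; -- {'a': 1, 'b': 2, 'c': 4}
```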
##### Retrieving from Structs {#docs:stable:sql:data_types:struct::retrieving-from-structs}
Retrieving a value from a struct can be accomplished using dot notation, bracket notation, or through [struct functions](#docs:stable:sql:functions:struct) like `struct_extract`.
Use dot notation to retrieve the value at a key's location. In the following query, the subquery generates a struct column `a`, which we then query with `a.x`.
```sql
SELECT a.x FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS a);
```
If a key contains a space, simply wrap it in double quotes (`"`).
```sql
SELECT a."x space" FROM (SELECT {'x space': 1, 'y': 2, 'z': 3} AS a);
```
Bracket notation may also be used. Note that this uses single quotes (`'`) since the goal is to specify a certain string key, and only constant expressions may be used inside the brackets (no column references or other non-constant expressions):
```sql
SELECT a['x space'] FROM (SELECT {'x space': 1, 'y': 2, 'z': 3} AS a);
```
The `struct_extract` function is also equivalent. This returns 1:
```sql
SELECT struct_extract({'x space': 1, 'y': 2, 'z': 3}, 'x space');
```
###### `unnest` / `STRUCT.*` {#docs:stable:sql:data_types:struct::unnest--struct}
Rather than retrieving a single key from a struct, the `unnest` special function can be used to retrieve all keys from a struct as separate columns.
This is particularly useful when a prior operation creates a struct of unknown shape, or if a query must handle any potential struct keys:
```sql
SELECT unnest(a)
FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS a);
```
| x | y | z |
|--:|--:|--:|
| 1 | 2 | 3 |
The same can be achieved with the star notation (`*`), which additionally allows [modifications of the returned columns](#docs:stable:sql:expressions:star):
```sql
SELECT a.* EXCLUDE ('y')
FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS a);
```
| x | z |
|--:|--:|
| 1 | 3 |
> **Warning.** The star notation is currently limited to top-level struct columns and non-aggregate expressions.
##### Dot Notation Order of Operations {#docs:stable:sql:data_types:struct::dot-notation-order-of-operations}
Referring to structs with dot notation can be ambiguous with referring to schemas and tables. In general, DuckDB looks for columns first, then for struct keys within columns. DuckDB resolves references in these orders, using the first match to occur:
###### No Dots {#docs:stable:sql:data_types:struct::no-dots}
```sql
SELECT part1
FROM tbl;
```
1. `part1` is a column
###### One Dot {#docs:stable:sql:data_types:struct::one-dot}
```sql
SELECT part1.part2
FROM tbl;
```
1. `part1` is a table, `part2` is a column
2. `part1` is a column, `part2` is a property of that column
###### Two (or More) Dots {#docs:stable:sql:data_types:struct::two-or-more-dots}
```sql
SELECT part1.part2.part3
FROM tbl;
```
1. `part1` is a schema, `part2` is a table, `part3` is a column
2. `part1` is a table, `part2` is a column, `part3` is a property of that column
3. `part1` is a column, `part2` is a property of that column, `part3` is a property of that column
Any extra parts (e.g., `.part4.part5`, etc.) are always treated as properties.
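As a small illustration of these rules, using a hypothetical table `tbl` with a struct column:
```sql
CREATE TABLE tbl (part1 STRUCT(part2 INTEGER));
INSERT INTO tbl VALUES ({'part2': 42});
SELECT part1.part2 FROM tbl;     -- part1 is a column, part2 is a key within that column
SELECT tbl.part1.part2 FROM tbl; -- tbl is a table, part1 a column, part2 a property
```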
##### Creating Structs with the `row` Function {#docs:stable:sql:data_types:struct::creating-structs-with-the-row-function}
The `row` function can be used to automatically convert multiple columns to a single struct column.
When using `row`, the keys will be empty strings, allowing for easy insertion into a table with a struct column.
Columns, however, cannot be initialized with the `row` function, and must be explicitly named.
For example, inserting values into a struct column using the `row` function:
```sql
CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER));
INSERT INTO t1 VALUES (row('a', 42));
SELECT * FROM t1;
```
The table will contain a single entry:
```text
{'v': a, 'i': 42}
```
The following produces the same result as above:
```sql
CREATE TABLE t1 AS (
SELECT row('a', 42)::STRUCT(v VARCHAR, i INTEGER)
);
```
Initializing a struct column with the `row` function will fail:
```sql
CREATE TABLE t2 AS SELECT row('a');
```
```console
Invalid Input Error:
A table cannot be created from an unnamed struct
```
When casting between structs, at least one field name has to match. Therefore, the following query will fail:
```sql
SELECT a::STRUCT(y INTEGER) AS b
FROM
(SELECT {'x': 42} AS a);
```
```console
Binder Error:
STRUCT to STRUCT cast must have at least one matching member
```
A workaround for this is to use [`struct_pack`](#::creating-structs) instead:
```sql
SELECT struct_pack(y := a.x) AS b
FROM
(SELECT {'x': 42} AS a);
```
The `row` function can be used to return unnamed structs. For example:
```sql
SELECT row(x, x + 1, y) FROM (SELECT 1 AS x, 'a' AS y) AS s;
```
This produces `(1, 2, a)`.
If using multiple expressions when creating a struct, the `row` function is optional. The following query returns the same result as the previous one:
```sql
SELECT (x, x + 1, y) AS s FROM (SELECT 1 AS x, 'a' AS y);
```
#### Comparison and Ordering {#docs:stable:sql:data_types:struct::comparison-and-ordering}
The `STRUCT` type can be compared using all the [comparison operators](#docs:stable:sql:expressions:comparison_operators).
These comparisons can be used in [logical expressions](#docs:stable:sql:expressions:logical_operators)
such as `WHERE` and `HAVING` clauses, and return [`BOOLEAN` values](#docs:stable:sql:data_types:boolean).
Comparisons are done in lexicographical order, with individual entries being compared as usual except that `NULL` values are treated as larger than all other values.
Specifically:
* If all values of `s1` and `s2` compare equal, then `s1` and `s2` compare equal.
* Otherwise, if `s1.value[i] < s2.value[i] OR s2.value[i] IS NULL` for the first index `i` where `s1.value[i] != s2.value[i]`, then `s1` is less than `s2`, and vice versa.
Structs of different types are implicitly cast to a struct type with the union of the involved keys, following the rules for [combination casting](#docs:stable:sql:data_types:typecasting::structs).
The following queries return `true`:
```sql
SELECT {'k1': 0, 'k2': 0} < {'k1': 1, 'k2': 0};
```
```sql
SELECT {'k1': 'hello'} < {'k1': 'world'};
```
```sql
SELECT {'k1': 0, 'k2': 0} < {'k1': 0, 'k2': NULL};
```
```sql
SELECT {'k1': 0} < {'k2': 0};
```
```sql
SELECT {'k1': 0, 'k2': 0} < {'k2': 0, 'k3': 0};
```
```sql
SELECT {'k1': 1, 'k2': 0} > {'k3': 0, 'k1': 0};
```
The following queries return `false`:
```sql
SELECT {'k1': 1, 'k2': 0} < {'k1': 0, 'k2': 1};
```
```sql
SELECT {'k1': [0]} < {'k1': [0, 0]};
```
```sql
SELECT {'k1': 1} > {'k2': 0};
```
```sql
SELECT {'k1': 0, 'k2': 0} < {'k3': 0, 'k1': 1};
```
```sql
SELECT {'k1': 1, 'k2': 0} > {'k2': 0, 'k3': 0};
```
#### Updating the Schema {#docs:stable:sql:data_types:struct::updating-the-schema}
Starting with DuckDB v1.3.0, it's possible to update the sub-schema of structs
using the [`ALTER TABLE` clause](#docs:stable:sql:statements:alter_table).
To follow the examples, initialize the `test` table as follows:
```sql
CREATE TABLE test (s STRUCT(i INTEGER, j INTEGER));
INSERT INTO test VALUES (ROW(1, 1)), (ROW(2, 2));
```
##### Adding a Field {#docs:stable:sql:data_types:struct::adding-a-field}
Add field `k INTEGER` to struct `s` in table `test`:
```sql
ALTER TABLE test ADD COLUMN s.k INTEGER;
FROM test;
```
```text
┌─────────────────────────────────────────┐
│                    s                    │
│ struct(i integer, j integer, k integer) │
├─────────────────────────────────────────┤
│ {'i': 1, 'j': 1, 'k': NULL}             │
│ {'i': 2, 'j': 2, 'k': NULL}             │
└─────────────────────────────────────────┘
```
##### Dropping a Field {#docs:stable:sql:data_types:struct::dropping-a-field}
Drop field `i` from struct `s` in table `test`:
```sql
ALTER TABLE test DROP COLUMN s.i;
FROM test;
```
```text
┌──────────────────────────────┐
│              s               │
│ struct(j integer, k integer) │
├──────────────────────────────┤
│ {'j': 1, 'k': NULL}          │
│ {'j': 2, 'k': NULL}          │
└──────────────────────────────┘
```
##### Renaming a Field {#docs:stable:sql:data_types:struct::renaming-a-field}
Rename field `j` of struct `s` to `v1` in table `test`:
```sql
ALTER TABLE test RENAME s.j TO v1;
FROM test;
```
```text
┌───────────────────────────────┐
│               s               │
│ struct(v1 integer, k integer) │
├───────────────────────────────┤
│ {'v1': 1, 'k': NULL}          │
│ {'v1': 2, 'k': NULL}          │
└───────────────────────────────┘
```
#### Functions {#docs:stable:sql:data_types:struct::functions}
See [Struct Functions](#docs:stable:sql:functions:struct).
### Text Types {#docs:stable:sql:data_types:text}
In DuckDB, strings can be stored in the `VARCHAR` field.
The field allows storage of Unicode characters. Internally, the data is encoded as UTF-8.
| Name | Aliases | Description |
|:---|:---|:---|
| `VARCHAR` | `CHAR`, `BPCHAR`, `STRING`, `TEXT` | Variable-length character string |
| `VARCHAR(n)` | `CHAR(n)`, `BPCHAR(n)`, `STRING(n)`, `TEXT(n)` | Variable-length character string. The maximum length `n` has no effect and is only provided for compatibility |
#### Specifying a Length Limit {#docs:stable:sql:data_types:text::specifying-a-length-limit}
Specifying the length for the `VARCHAR`, `STRING`, and `TEXT` types is not required and has no effect on the system. Specifying the length will not improve performance or reduce storage space of the strings in the database. These variants are supported for compatibility with other systems that do require a length to be specified for strings.
If you wish to restrict the number of characters in a `VARCHAR` column for data integrity reasons, the `CHECK` constraint should be used, for example:
```sql
CREATE TABLE strings (
val VARCHAR CHECK (length(val) <= 10) -- val has a maximum length of 10
);
```
#### Specifying a Compression Type {#docs:stable:sql:data_types:text::specifying-a-compression-type}
You can specify a compression type for a string with the `USING COMPRESSION` clause.
For example, to apply zstd compression, run:
```sql
CREATE TABLE tbl (s VARCHAR USING COMPRESSION zstd);
```
#### Text Type Values {#docs:stable:sql:data_types:text::text-type-values}
Values of the text type are character strings, also known as string values or simply strings. At runtime, string values are constructed in one of the following ways:
* referencing columns whose declared or implied type is the text data type
* [string literals](#docs:stable:sql:data_types:literal_types::string-literals)
* [casting](#docs:stable:sql:expressions:cast::explicit-casting) expressions to a text type
* applying a [string operator](#docs:stable:sql:functions:text::text-functions-and-operators), or invoking a function that returns a text type value
#### Strings with Special Characters {#docs:stable:sql:data_types:text::strings-with-special-characters}
To use special characters in a string, use [escape string literals](#docs:stable:sql:data_types:literal_types::escape-string-literals) or [dollar-quoted string literals](#docs:stable:sql:data_types:literal_types::dollar-quoted-string-literals). Alternatively, you can use concatenation and the [`chr` character function](#docs:stable:sql:functions:text):
```sql
SELECT 'Hello' || chr(10) || 'world' AS msg;
```
```text
┌──────────────┐
│     msg      │
│   varchar    │
├──────────────┤
│ Hello\nworld │
└──────────────┘
```
#### Functions {#docs:stable:sql:data_types:text::functions}
See [Text Functions](#docs:stable:sql:functions:text) and [Pattern Matching](#docs:stable:sql:functions:pattern_matching).
### Time Types {#docs:stable:sql:data_types:time}
The `TIME` and `TIMETZ` types specify the hour, minute, second, and microsecond of a day.
| Name | Aliases | Description |
| :-------- | :----------------------- | :--------------------------------- |
| `TIME` | `TIME WITHOUT TIME ZONE` | Time of day |
| `TIMETZ` | `TIME WITH TIME ZONE` | Time of day, with time zone offset |
| `TIME_NS` | | Time of day, nanosecond precision |
Instances can be created using the type name as a keyword, where the data must be formatted according to the ISO 8601 format (`hh:mm:ss[.zzzzzz[zzz]][+-TT[:tt]]`).
```sql
SELECT TIME '1992-09-20 11:30:00.123456';
```
```text
11:30:00.123456
```
```sql
SELECT TIMETZ '1992-09-20 11:30:00.123456';
```
```text
11:30:00.123456+00
```
```sql
SELECT TIMETZ '1992-09-20 11:30:00.123456-02:00';
```
```text
13:30:00.123456+00
```
```sql
SELECT TIMETZ '1992-09-20 11:30:00.123456+05:30';
```
```text
06:00:00.123456+00
```
```sql
SELECT '15:30:00.123456789'::TIME_NS;
```
```text
15:30:00.123456789
```
`TIME_NS` values can also be read from Parquet when the type is [`TIME` with unit `NANOS`](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#time).
> **Warning.** The `TIME` type should only be used in rare cases, where the date part of the timestamp can be disregarded.
> Most applications should use the [`TIMESTAMP` types](#docs:stable:sql:data_types:timestamp) to represent their timestamps.
### Timestamp Types {#docs:stable:sql:data_types:timestamp}
Timestamps represent points in time. As such, they combine [`DATE`](#docs:stable:sql:data_types:date) and [`TIME`](#docs:stable:sql:data_types:time) information.
They can be created using the type name followed by a string formatted according to the ISO 8601 format, `YYYY-MM-DD hh:mm:ss[.zzzzzzzzz][+-TT[:tt]]`, which is also the format we use in this documentation. Decimal places beyond the supported precision are ignored.
#### Timestamp Types {#docs:stable:sql:data_types:timestamp::timestamp-types}
| Name | Aliases | Description |
|:---|:---|:---|
| `TIMESTAMP_NS` | | Naive timestamp with nanosecond precision |
| `TIMESTAMP` | `DATETIME`, `TIMESTAMP WITHOUT TIME ZONE` | Naive timestamp with microsecond precision |
| `TIMESTAMP_MS` | | Naive timestamp with millisecond precision |
| `TIMESTAMP_S` | | Naive timestamp with second precision |
| `TIMESTAMPTZ` | `TIMESTAMP WITH TIME ZONE` | Time zone aware timestamp with microsecond precision |
> **Warning.** Since there is not currently a `TIMESTAMP_NS WITH TIME ZONE` data type, external columns with nanosecond precision and `WITH TIME ZONE` semantics, e.g., [Parquet timestamp columns with `isAdjustedToUTC=true`](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#instant-semantics-timestamps-normalized-to-utc), are converted to `TIMESTAMP WITH TIME ZONE` and thus lose precision when read using DuckDB.
```sql
SELECT TIMESTAMP_NS '1992-09-20 11:30:00.123456789';
```
```text
1992-09-20 11:30:00.123456789
```
```sql
SELECT TIMESTAMP '1992-09-20 11:30:00.123456789';
```
```text
1992-09-20 11:30:00.123456
```
```sql
SELECT TIMESTAMP_MS '1992-09-20 11:30:00.123456789';
```
```text
1992-09-20 11:30:00.123
```
```sql
SELECT TIMESTAMP_S '1992-09-20 11:30:00.123456789';
```
```text
1992-09-20 11:30:00
```
```sql
SELECT TIMESTAMPTZ '1992-09-20 11:30:00.123456789';
```
```text
1992-09-20 11:30:00.123456+00
```
```sql
SELECT TIMESTAMPTZ '1992-09-20 12:30:00.123456789+01:00';
```
```text
1992-09-20 11:30:00.123456+00
```
DuckDB distinguishes timestamps `WITHOUT TIME ZONE` and `WITH TIME ZONE` (of which the only current representative is `TIMESTAMP WITH TIME ZONE`).
Despite the name, a `TIMESTAMP WITH TIME ZONE` does not store time zone information. Instead, it only stores the `INT64` number of non-leap microseconds since the Unix epoch `1970-01-01 00:00:00+00`, and thus unambiguously identifies a point in absolute time, or [*instant*](#docs:stable:sql:data_types:timestamp::instants). The reason for the labels *time zone aware* and `WITH TIME ZONE` is that timestamp arithmetic, [*binning*](#docs:stable:sql:data_types:timestamp::temporal-binning), and string formatting for this type are performed in a [configured time zone](#docs:stable:sql:data_types:timestamp::time-zone-support), which defaults to the system time zone and is just `UTC+00:00` in the examples above.
The corresponding `TIMESTAMP WITHOUT TIME ZONE` stores the same `INT64`, but arithmetic, binning, and string formatting follow the straightforward rules of Coordinated Universal Time (UTC) without offsets or time zones. Accordingly, `TIMESTAMP`s could be interpreted as UTC timestamps, but more commonly they are used to represent *local* observations of time recorded in an unspecified time zone, and operations on these types can be interpreted as simply manipulating tuple fields following nominal temporal logic.
It is a common data cleaning problem to disambiguate such observations, which may also be stored in raw strings without time zone specification or UTC offsets, into unambiguous `TIMESTAMP WITH TIME ZONE` instants. One possible solution to this is to append UTC offsets to strings, followed by an explicit cast to `TIMESTAMP WITH TIME ZONE`. Alternatively, a `TIMESTAMP WITHOUT TIME ZONE` may be created first and then be combined with a time zone specification to obtain a time zone aware `TIMESTAMP WITH TIME ZONE`.
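A minimal sketch of both approaches, using an arbitrary example timestamp (the `timezone` function used in the second statement is provided by the ICU extension and is described below):
```sql
-- Append a UTC offset to the raw string, then cast:
SELECT ('2021-07-01 12:00:00' || '+02:00')::TIMESTAMPTZ AS from_offset;
-- Create a naive timestamp first, then attach a time zone name:
SELECT timezone('Europe/Berlin', TIMESTAMP '2021-07-01 12:00:00') AS from_name;
```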
#### Conversion between Strings and Naïve / Time Zone-Aware Timestamps {#docs:stable:sql:data_types:timestamp::conversion-between-strings-and-nave--time-zone-aware-timestamps}
The conversion between strings *without* UTC offsets or IANA time zone names and `WITHOUT TIME ZONE` types is unambiguous and straightforward.
The conversion between strings *with* UTC offsets or time zone names and `WITH TIME ZONE` types is also unambiguous, but requires the `ICU` extension to handle time zone names.
When strings *without* UTC offsets or time zone names are converted to a `WITH TIME ZONE` type, the string is interpreted in the configured time zone.
When strings with UTC offsets are passed to a `WITHOUT TIME ZONE` type, the offsets or timezone specifications are ignored.
When strings with time zone names other than `UTC` are passed to a `WITHOUT TIME ZONE` type, an error is thrown.
Finally, when `WITH TIME ZONE` and `WITHOUT TIME ZONE` types are converted to each other via explicit or implicit casts, the translation uses the configured time zone. To use an alternative time zone, the `timezone` function provided by the `ICU` extension may be used:
```sql
SELECT
timezone('America/Denver', TIMESTAMP '2001-02-16 20:38:40') AS aware1,
timezone('America/Denver', TIMESTAMPTZ '2001-02-16 04:38:40') AS naive1,
timezone('UTC', TIMESTAMP '2001-02-16 20:38:40+00:00') AS aware2,
timezone('UTC', TIMESTAMPTZ '2001-02-16 04:38:40 Europe/Berlin') AS naive2;
```
| aware1 | naive1 | aware2 | naive2 |
|------------------------|---------------------|------------------------|---------------------|
| 2001-02-17 04:38:40+01 | 2001-02-15 20:38:40 | 2001-02-16 21:38:40+01 | 2001-02-16 03:38:40 |
Note that `TIMESTAMP`s are displayed without time zone specification in the results, following ISO 8601 rules for local times, while time-zone aware `TIMESTAMPTZ`s are displayed with the UTC offset of the configured time zone, which is `'Europe/Berlin'` in the example. The UTC offsets of `'America/Denver'` and `'Europe/Berlin'` at all involved instants are `-07:00` and `+01:00`, respectively.
#### Special Values {#docs:stable:sql:data_types:timestamp::special-values}
Three special strings can be used to create timestamps:
| Input string | Description |
|:-------------|:-------------------------------------------------|
| `epoch` | 1970-01-01 00:00:00[+00] (Unix system time zero) |
| `infinity` | Later than all other timestamps |
| `-infinity` | Earlier than all other timestamps |
The values `infinity` and `-infinity` are special cased and are displayed unchanged, whereas the value `epoch` is simply a notational shorthand that is converted to the corresponding timestamp value when read.
```sql
SELECT '-infinity'::TIMESTAMP, 'epoch'::TIMESTAMP, 'infinity'::TIMESTAMP;
```
| Negative | Epoch | Positive |
|:----------|:--------------------|:---------|
| -infinity | 1970-01-01 00:00:00 | infinity |
#### Functions {#docs:stable:sql:data_types:timestamp::functions}
See [Timestamp Functions](#docs:stable:sql:functions:timestamp).
#### Time Zones {#docs:stable:sql:data_types:timestamp::time-zones}
To understand time zones and the `WITH TIME ZONE` types, it helps to start with two concepts: *instants* and *temporal binning*.
##### Instants {#docs:stable:sql:data_types:timestamp::instants}
An instant is a point in absolute time, usually given as a count of some time increment from a fixed point in time (called the *epoch*). This is similar to how positions on the earth's surface are given using latitude and longitude relative to the equator and the Greenwich Meridian. In DuckDB, the fixed point is the Unix epoch `1970-01-01 00:00:00+00:00`, and the increment is in seconds, milliseconds, microseconds, or nanoseconds, depending on the specific data type.
##### Temporal Binning {#docs:stable:sql:data_types:timestamp::temporal-binning}
Binning is a common practice with continuous data: A range of possible values is broken up into contiguous subsets and the binning operation maps actual values to the *bin* they fall into. *Temporal binning* is simply applying this practice to instants; for example, by binning instants into years, months, and days.
*(Figure: instants binned into years, months, and days.)*
Temporal binning rules are complex, and generally come in two sets: *time zones* and *calendars*.
For most tasks, the calendar will just be the widely used Gregorian calendar,
but time zones apply locale-specific rules and can vary widely.
For example, here is what binning for the `'America/Los_Angeles'` time zone looks like near the epoch:
*(Figure: calendar binning for the `'America/Los_Angeles'` time zone near the epoch.)*
The most common temporal binning problem occurs when daylight saving time changes.
The example below contains a daylight saving time change where the "hour" bin is two hours long.
To distinguish the two hours, another range of bins containing the offset from UTC is needed:
*(Figure: a daylight saving time change, with an additional range of bins showing the UTC offset.)*
##### Time Zone Support {#docs:stable:sql:data_types:timestamp::time-zone-support}
The `TIMESTAMPTZ` type can be binned into calendar and clock bins using a suitable extension.
The built-in [ICU extension](#docs:stable:core_extensions:icu) implements all the binning and arithmetic functions using the
[International Components for Unicode](https://icu.unicode.org) time zone and calendar functions.
To set the time zone to use, first load the ICU extension. The ICU extension comes pre-bundled with several DuckDB clients (including Python, R, JDBC, and ODBC), so this step can be skipped in those cases. In other cases you might first need to install and load the ICU extension.
```sql
INSTALL icu;
LOAD icu;
```
Next, use the `SET TimeZone` command:
```sql
SET TimeZone = 'America/Los_Angeles';
```
Time binning operations for `TIMESTAMPTZ` will then be implemented using the given time zone.
A list of available time zones can be pulled from the `pg_timezone_names()` table function:
```sql
SELECT
name,
abbrev,
utc_offset
FROM pg_timezone_names()
ORDER BY
name;
```
You can also find a reference table of [available time zones](#docs:stable:sql:data_types:timezones).
#### Calendar Support {#docs:stable:sql:data_types:timestamp::calendar-support}
The [ICU extension](#docs:stable:core_extensions:icu) also supports non-Gregorian calendars using the `SET Calendar` command.
Note that the `INSTALL` and `LOAD` steps are only required if the DuckDB client does not bundle the ICU extension.
```sql
INSTALL icu;
LOAD icu;
SET Calendar = 'japanese';
```
Time binning operations for `TIMESTAMPTZ` will then be implemented using the given calendar.
In this example, the `era` part will now report the Japanese imperial era number.
A list of available calendars can be pulled from the `icu_calendar_names()` table function:
```sql
SELECT name
FROM icu_calendar_names()
ORDER BY 1;
```
#### Settings {#docs:stable:sql:data_types:timestamp::settings}
The current values of the `TimeZone` and `Calendar` settings are determined by ICU when it starts up.
They can be queried using the `duckdb_settings()` table function:
```sql
SELECT *
FROM duckdb_settings()
WHERE name = 'TimeZone';
```
| name | value | description | input_type |
|----------|------------------|-----------------------|------------|
| TimeZone | Europe/Amsterdam | The current time zone | VARCHAR |
```sql
SELECT *
FROM duckdb_settings()
WHERE name = 'Calendar';
```
| name | value | description | input_type |
|----------|-----------|----------------------|------------|
| Calendar | gregorian | The current calendar | VARCHAR |
> If you find that your binning operations are not behaving as you expect, check the `TimeZone` and `Calendar` values and adjust them if needed.
### Time Zone Reference List {#docs:stable:sql:data_types:timezones}
An up-to-date version of this list can be pulled from the `pg_timezone_names()` table function:
```sql
SELECT name, abbrev
FROM pg_timezone_names()
ORDER BY name;
```
| name | abbrev |
|----------------------------------|----------------------------------|
| ACT | ACT |
| AET | AET |
| AGT | AGT |
| ART | ART |
| AST | AST |
| Africa/Abidjan | Iceland |
| Africa/Accra | Iceland |
| Africa/Addis_Ababa | EAT |
| Africa/Algiers | Africa/Algiers |
| Africa/Asmara | EAT |
| Africa/Asmera | EAT |
| Africa/Bamako | Iceland |
| Africa/Bangui | Africa/Bangui |
| Africa/Banjul | Iceland |
| Africa/Bissau | Africa/Bissau |
| Africa/Blantyre | CAT |
| Africa/Brazzaville | Africa/Brazzaville |
| Africa/Bujumbura | CAT |
| Africa/Cairo | ART |
| Africa/Casablanca | Africa/Casablanca |
| Africa/Ceuta | Africa/Ceuta |
| Africa/Conakry | Iceland |
| Africa/Dakar | Iceland |
| Africa/Dar_es_Salaam | EAT |
| Africa/Djibouti | EAT |
| Africa/Douala | Africa/Douala |
| Africa/El_Aaiun | Africa/El_Aaiun |
| Africa/Freetown | Iceland |
| Africa/Gaborone | CAT |
| Africa/Harare | CAT |
| Africa/Johannesburg | Africa/Johannesburg |
| Africa/Juba | Africa/Juba |
| Africa/Kampala | EAT |
| Africa/Khartoum | Africa/Khartoum |
| Africa/Kigali | CAT |
| Africa/Kinshasa | Africa/Kinshasa |
| Africa/Lagos | Africa/Lagos |
| Africa/Libreville | Africa/Libreville |
| Africa/Lome | Iceland |
| Africa/Luanda | Africa/Luanda |
| Africa/Lubumbashi | CAT |
| Africa/Lusaka | CAT |
| Africa/Malabo | Africa/Malabo |
| Africa/Maputo | CAT |
| Africa/Maseru | Africa/Maseru |
| Africa/Mbabane | Africa/Mbabane |
| Africa/Mogadishu | EAT |
| Africa/Monrovia | Africa/Monrovia |
| Africa/Nairobi | EAT |
| Africa/Ndjamena | Africa/Ndjamena |
| Africa/Niamey | Africa/Niamey |
| Africa/Nouakchott | Iceland |
| Africa/Ouagadougou | Iceland |
| Africa/Porto-Novo | Africa/Porto-Novo |
| Africa/Sao_Tome | Africa/Sao_Tome |
| Africa/Timbuktu | Iceland |
| Africa/Tripoli | Libya |
| Africa/Tunis | Africa/Tunis |
| Africa/Windhoek | Africa/Windhoek |
| America/Adak | America/Adak |
| America/Anchorage | AST |
| America/Anguilla | PRT |
| America/Antigua | PRT |
| America/Araguaina | America/Araguaina |
| America/Argentina/Buenos_Aires | AGT |
| America/Argentina/Catamarca | America/Argentina/Catamarca |
| America/Argentina/ComodRivadavia | America/Argentina/ComodRivadavia |
| America/Argentina/Cordoba | America/Argentina/Cordoba |
| America/Argentina/Jujuy | America/Argentina/Jujuy |
| America/Argentina/La_Rioja | America/Argentina/La_Rioja |
| America/Argentina/Mendoza | America/Argentina/Mendoza |
| America/Argentina/Rio_Gallegos | America/Argentina/Rio_Gallegos |
| America/Argentina/Salta | America/Argentina/Salta |
| America/Argentina/San_Juan | America/Argentina/San_Juan |
| America/Argentina/San_Luis | America/Argentina/San_Luis |
| America/Argentina/Tucuman | America/Argentina/Tucuman |
| America/Argentina/Ushuaia | America/Argentina/Ushuaia |
| America/Aruba | PRT |
| America/Asuncion | America/Asuncion |
| America/Atikokan | EST |
| America/Atka | America/Atka |
| America/Bahia | America/Bahia |
| America/Bahia_Banderas | America/Bahia_Banderas |
| America/Barbados | America/Barbados |
| America/Belem | America/Belem |
| America/Belize | America/Belize |
| America/Blanc-Sablon | PRT |
| America/Boa_Vista | America/Boa_Vista |
| America/Bogota | America/Bogota |
| America/Boise | America/Boise |
| America/Buenos_Aires | AGT |
| America/Cambridge_Bay | America/Cambridge_Bay |
| America/Campo_Grande | America/Campo_Grande |
| America/Cancun | America/Cancun |
| America/Caracas | America/Caracas |
| America/Catamarca | America/Catamarca |
| America/Cayenne | America/Cayenne |
| America/Cayman | EST |
| America/Chicago | CST |
| America/Chihuahua | America/Chihuahua |
| America/Ciudad_Juarez | America/Ciudad_Juarez |
| America/Coral_Harbour | EST |
| America/Cordoba | America/Cordoba |
| America/Costa_Rica | America/Costa_Rica |
| America/Creston | MST |
| America/Cuiaba | America/Cuiaba |
| America/Curacao | PRT |
| America/Danmarkshavn | America/Danmarkshavn |
| America/Dawson | America/Dawson |
| America/Dawson_Creek | America/Dawson_Creek |
| America/Denver | Navajo |
| America/Detroit | America/Detroit |
| America/Dominica | PRT |
| America/Edmonton | America/Edmonton |
| America/Eirunepe | America/Eirunepe |
| America/El_Salvador | America/El_Salvador |
| America/Ensenada | America/Ensenada |
| America/Fort_Nelson | America/Fort_Nelson |
| America/Fort_Wayne | IET |
| America/Fortaleza | America/Fortaleza |
| America/Glace_Bay | America/Glace_Bay |
| America/Godthab | America/Godthab |
| America/Goose_Bay | America/Goose_Bay |
| America/Grand_Turk | America/Grand_Turk |
| America/Grenada | PRT |
| America/Guadeloupe | PRT |
| America/Guatemala | America/Guatemala |
| America/Guayaquil | America/Guayaquil |
| America/Guyana | America/Guyana |
| America/Halifax | America/Halifax |
| America/Havana | Cuba |
| America/Hermosillo | America/Hermosillo |
| America/Indiana/Indianapolis | IET |
| America/Indiana/Knox | America/Indiana/Knox |
| America/Indiana/Marengo | America/Indiana/Marengo |
| America/Indiana/Petersburg | America/Indiana/Petersburg |
| America/Indiana/Tell_City | America/Indiana/Tell_City |
| America/Indiana/Vevay | America/Indiana/Vevay |
| America/Indiana/Vincennes | America/Indiana/Vincennes |
| America/Indiana/Winamac | America/Indiana/Winamac |
| America/Indianapolis | IET |
| America/Inuvik | America/Inuvik |
| America/Iqaluit | America/Iqaluit |
| America/Jamaica | Jamaica |
| America/Jujuy | America/Jujuy |
| America/Juneau | America/Juneau |
| America/Kentucky/Louisville | America/Kentucky/Louisville |
| America/Kentucky/Monticello | America/Kentucky/Monticello |
| America/Knox_IN | America/Knox_IN |
| America/Kralendijk | PRT |
| America/La_Paz | America/La_Paz |
| America/Lima | America/Lima |
| America/Los_Angeles | PST |
| America/Louisville | America/Louisville |
| America/Lower_Princes | PRT |
| America/Maceio | America/Maceio |
| America/Managua | America/Managua |
| America/Manaus | America/Manaus |
| America/Marigot | PRT |
| America/Martinique | America/Martinique |
| America/Matamoros | America/Matamoros |
| America/Mazatlan | America/Mazatlan |
| America/Mendoza | America/Mendoza |
| America/Menominee | America/Menominee |
| America/Merida | America/Merida |
| America/Metlakatla | America/Metlakatla |
| America/Mexico_City | America/Mexico_City |
| America/Miquelon | America/Miquelon |
| America/Moncton | America/Moncton |
| America/Monterrey | America/Monterrey |
| America/Montevideo | America/Montevideo |
| America/Montreal | America/Montreal |
| America/Montserrat | PRT |
| America/Nassau | America/Nassau |
| America/New_York | EST5EDT |
| America/Nipigon | America/Nipigon |
| America/Nome | America/Nome |
| America/Noronha | America/Noronha |
| America/North_Dakota/Beulah | America/North_Dakota/Beulah |
| America/North_Dakota/Center | America/North_Dakota/Center |
| America/North_Dakota/New_Salem | America/North_Dakota/New_Salem |
| America/Nuuk | America/Nuuk |
| America/Ojinaga | America/Ojinaga |
| America/Panama | EST |
| America/Pangnirtung | America/Pangnirtung |
| America/Paramaribo | America/Paramaribo |
| America/Phoenix | MST |
| America/Port-au-Prince | America/Port-au-Prince |
| America/Port_of_Spain | PRT |
| America/Porto_Acre | America/Porto_Acre |
| America/Porto_Velho | America/Porto_Velho |
| America/Puerto_Rico | PRT |
| America/Punta_Arenas | America/Punta_Arenas |
| America/Rainy_River | America/Rainy_River |
| America/Rankin_Inlet | America/Rankin_Inlet |
| America/Recife | America/Recife |
| America/Regina | America/Regina |
| America/Resolute | America/Resolute |
| America/Rio_Branco | America/Rio_Branco |
| America/Rosario | America/Rosario |
| America/Santa_Isabel | America/Santa_Isabel |
| America/Santarem | America/Santarem |
| America/Santiago | America/Santiago |
| America/Santo_Domingo | America/Santo_Domingo |
| America/Sao_Paulo | BET |
| America/Scoresbysund | America/Scoresbysund |
| America/Shiprock | Navajo |
| America/Sitka | America/Sitka |
| America/St_Barthelemy | PRT |
| America/St_Johns | CNT |
| America/St_Kitts | PRT |
| America/St_Lucia | PRT |
| America/St_Thomas | PRT |
| America/St_Vincent | PRT |
| America/Swift_Current | America/Swift_Current |
| America/Tegucigalpa | America/Tegucigalpa |
| America/Thule | America/Thule |
| America/Thunder_Bay | America/Thunder_Bay |
| America/Tijuana | America/Tijuana |
| America/Toronto | America/Toronto |
| America/Tortola | PRT |
| America/Vancouver | America/Vancouver |
| America/Virgin | PRT |
| America/Whitehorse | America/Whitehorse |
| America/Winnipeg | America/Winnipeg |
| America/Yakutat | America/Yakutat |
| America/Yellowknife | America/Yellowknife |
| Antarctica/Casey | Antarctica/Casey |
| Antarctica/Davis | Antarctica/Davis |
| Antarctica/DumontDUrville | Antarctica/DumontDUrville |
| Antarctica/Macquarie | Antarctica/Macquarie |
| Antarctica/Mawson | Antarctica/Mawson |
| Antarctica/McMurdo | NZ |
| Antarctica/Palmer | Antarctica/Palmer |
| Antarctica/Rothera | Antarctica/Rothera |
| Antarctica/South_Pole | NZ |
| Antarctica/Syowa | Antarctica/Syowa |
| Antarctica/Troll | Antarctica/Troll |
| Antarctica/Vostok | Antarctica/Vostok |
| Arctic/Longyearbyen | Arctic/Longyearbyen |
| Asia/Aden | Asia/Aden |
| Asia/Almaty | Asia/Almaty |
| Asia/Amman | Asia/Amman |
| Asia/Anadyr | Asia/Anadyr |
| Asia/Aqtau | Asia/Aqtau |
| Asia/Aqtobe | Asia/Aqtobe |
| Asia/Ashgabat | Asia/Ashgabat |
| Asia/Ashkhabad | Asia/Ashkhabad |
| Asia/Atyrau | Asia/Atyrau |
| Asia/Baghdad | Asia/Baghdad |
| Asia/Bahrain | Asia/Bahrain |
| Asia/Baku | Asia/Baku |
| Asia/Bangkok | Asia/Bangkok |
| Asia/Barnaul | Asia/Barnaul |
| Asia/Beirut | Asia/Beirut |
| Asia/Bishkek | Asia/Bishkek |
| Asia/Brunei | Asia/Brunei |
| Asia/Calcutta | IST |
| Asia/Chita | Asia/Chita |
| Asia/Choibalsan | Asia/Choibalsan |
| Asia/Chongqing | CTT |
| Asia/Chungking | CTT |
| Asia/Colombo | Asia/Colombo |
| Asia/Dacca | BST |
| Asia/Damascus | Asia/Damascus |
| Asia/Dhaka | BST |
| Asia/Dili | Asia/Dili |
| Asia/Dubai | Asia/Dubai |
| Asia/Dushanbe | Asia/Dushanbe |
| Asia/Famagusta | Asia/Famagusta |
| Asia/Gaza | Asia/Gaza |
| Asia/Harbin | CTT |
| Asia/Hebron | Asia/Hebron |
| Asia/Ho_Chi_Minh | VST |
| Asia/Hong_Kong | Hongkong |
| Asia/Hovd | Asia/Hovd |
| Asia/Irkutsk | Asia/Irkutsk |
| Asia/Istanbul | Turkey |
| Asia/Jakarta | Asia/Jakarta |
| Asia/Jayapura | Asia/Jayapura |
| Asia/Jerusalem | Israel |
| Asia/Kabul | Asia/Kabul |
| Asia/Kamchatka | Asia/Kamchatka |
| Asia/Karachi | PLT |
| Asia/Kashgar | Asia/Kashgar |
| Asia/Kathmandu | Asia/Kathmandu |
| Asia/Katmandu | Asia/Katmandu |
| Asia/Khandyga | Asia/Khandyga |
| Asia/Kolkata | IST |
| Asia/Krasnoyarsk | Asia/Krasnoyarsk |
| Asia/Kuala_Lumpur | Singapore |
| Asia/Kuching | Asia/Kuching |
| Asia/Kuwait | Asia/Kuwait |
| Asia/Macao | Asia/Macao |
| Asia/Macau | Asia/Macau |
| Asia/Magadan | Asia/Magadan |
| Asia/Makassar | Asia/Makassar |
| Asia/Manila | Asia/Manila |
| Asia/Muscat | Asia/Muscat |
| Asia/Nicosia | Asia/Nicosia |
| Asia/Novokuznetsk | Asia/Novokuznetsk |
| Asia/Novosibirsk | Asia/Novosibirsk |
| Asia/Omsk | Asia/Omsk |
| Asia/Oral | Asia/Oral |
| Asia/Phnom_Penh | Asia/Phnom_Penh |
| Asia/Pontianak | Asia/Pontianak |
| Asia/Pyongyang | Asia/Pyongyang |
| Asia/Qatar | Asia/Qatar |
| Asia/Qostanay | Asia/Qostanay |
| Asia/Qyzylorda | Asia/Qyzylorda |
| Asia/Rangoon | Asia/Rangoon |
| Asia/Riyadh | Asia/Riyadh |
| Asia/Saigon | VST |
| Asia/Sakhalin | Asia/Sakhalin |
| Asia/Samarkand | Asia/Samarkand |
| Asia/Seoul | ROK |
| Asia/Shanghai | CTT |
| Asia/Singapore | Singapore |
| Asia/Srednekolymsk | Asia/Srednekolymsk |
| Asia/Taipei | ROC |
| Asia/Tashkent | Asia/Tashkent |
| Asia/Tbilisi | Asia/Tbilisi |
| Asia/Tehran | Iran |
| Asia/Tel_Aviv | Israel |
| Asia/Thimbu | Asia/Thimbu |
| Asia/Thimphu | Asia/Thimphu |
| Asia/Tokyo | JST |
| Asia/Tomsk | Asia/Tomsk |
| Asia/Ujung_Pandang | Asia/Ujung_Pandang |
| Asia/Ulaanbaatar | Asia/Ulaanbaatar |
| Asia/Ulan_Bator | Asia/Ulan_Bator |
| Asia/Urumqi | Asia/Urumqi |
| Asia/Ust-Nera | Asia/Ust-Nera |
| Asia/Vientiane | Asia/Vientiane |
| Asia/Vladivostok | Asia/Vladivostok |
| Asia/Yakutsk | Asia/Yakutsk |
| Asia/Yangon | Asia/Yangon |
| Asia/Yekaterinburg | Asia/Yekaterinburg |
| Asia/Yerevan | NET |
| Atlantic/Azores | Atlantic/Azores |
| Atlantic/Bermuda | Atlantic/Bermuda |
| Atlantic/Canary | Atlantic/Canary |
| Atlantic/Cape_Verde | Atlantic/Cape_Verde |
| Atlantic/Faeroe | Atlantic/Faeroe |
| Atlantic/Faroe | Atlantic/Faroe |
| Atlantic/Jan_Mayen | Atlantic/Jan_Mayen |
| Atlantic/Madeira | Atlantic/Madeira |
| Atlantic/Reykjavik | Iceland |
| Atlantic/South_Georgia | Atlantic/South_Georgia |
| Atlantic/St_Helena | Iceland |
| Atlantic/Stanley | Atlantic/Stanley |
| Australia/ACT | AET |
| Australia/Adelaide | Australia/Adelaide |
| Australia/Brisbane | Australia/Brisbane |
| Australia/Broken_Hill | Australia/Broken_Hill |
| Australia/Canberra | AET |
| Australia/Currie | Australia/Currie |
| Australia/Darwin | ACT |
| Australia/Eucla | Australia/Eucla |
| Australia/Hobart | Australia/Hobart |
| Australia/LHI | Australia/LHI |
| Australia/Lindeman | Australia/Lindeman |
| Australia/Lord_Howe | Australia/Lord_Howe |
| Australia/Melbourne | Australia/Melbourne |
| Australia/NSW | AET |
| Australia/North | ACT |
| Australia/Perth | Australia/Perth |
| Australia/Queensland | Australia/Queensland |
| Australia/South | Australia/South |
| Australia/Sydney | AET |
| Australia/Tasmania | Australia/Tasmania |
| Australia/Victoria | Australia/Victoria |
| Australia/West | Australia/West |
| Australia/Yancowinna | Australia/Yancowinna |
| BET | BET |
| BST | BST |
| Brazil/Acre | Brazil/Acre |
| Brazil/DeNoronha | Brazil/DeNoronha |
| Brazil/East | BET |
| Brazil/West | Brazil/West |
| CAT | CAT |
| CET | CET |
| CNT | CNT |
| CST | CST |
| CST6CDT | CST |
| CTT | CTT |
| Canada/Atlantic | Canada/Atlantic |
| Canada/Central | Canada/Central |
| Canada/East-Saskatchewan | Canada/East-Saskatchewan |
| Canada/Eastern | Canada/Eastern |
| Canada/Mountain | Canada/Mountain |
| Canada/Newfoundland | CNT |
| Canada/Pacific | Canada/Pacific |
| Canada/Saskatchewan | Canada/Saskatchewan |
| Canada/Yukon | Canada/Yukon |
| Chile/Continental | Chile/Continental |
| Chile/EasterIsland | Chile/EasterIsland |
| Cuba | Cuba |
| EAT | EAT |
| ECT | ECT |
| EET | EET |
| EST | EST |
| EST5EDT | EST5EDT |
| Egypt | ART |
| Eire | Eire |
| Etc/GMT | GMT |
| Etc/GMT+0 | GMT |
| Etc/GMT+1 | Etc/GMT+1 |
| Etc/GMT+10 | Etc/GMT+10 |
| Etc/GMT+11 | Etc/GMT+11 |
| Etc/GMT+12 | Etc/GMT+12 |
| Etc/GMT+2 | Etc/GMT+2 |
| Etc/GMT+3 | Etc/GMT+3 |
| Etc/GMT+4 | Etc/GMT+4 |
| Etc/GMT+5 | Etc/GMT+5 |
| Etc/GMT+6 | Etc/GMT+6 |
| Etc/GMT+7 | Etc/GMT+7 |
| Etc/GMT+8 | Etc/GMT+8 |
| Etc/GMT+9 | Etc/GMT+9 |
| Etc/GMT-0 | GMT |
| Etc/GMT-1 | Etc/GMT-1 |
| Etc/GMT-10 | Etc/GMT-10 |
| Etc/GMT-11 | Etc/GMT-11 |
| Etc/GMT-12 | Etc/GMT-12 |
| Etc/GMT-13 | Etc/GMT-13 |
| Etc/GMT-14 | Etc/GMT-14 |
| Etc/GMT-2 | Etc/GMT-2 |
| Etc/GMT-3 | Etc/GMT-3 |
| Etc/GMT-4 | Etc/GMT-4 |
| Etc/GMT-5 | Etc/GMT-5 |
| Etc/GMT-6 | Etc/GMT-6 |
| Etc/GMT-7 | Etc/GMT-7 |
| Etc/GMT-8 | Etc/GMT-8 |
| Etc/GMT-9 | Etc/GMT-9 |
| Etc/GMT0 | GMT |
| Etc/Greenwich | GMT |
| Etc/UCT | UCT |
| Etc/UTC | UCT |
| Etc/Universal | UCT |
| Etc/Zulu | UCT |
| Europe/Amsterdam | CET |
| Europe/Andorra | Europe/Andorra |
| Europe/Astrakhan | Europe/Astrakhan |
| Europe/Athens | EET |
| Europe/Belfast | GB |
| Europe/Belgrade | Europe/Belgrade |
| Europe/Berlin | Europe/Berlin |
| Europe/Bratislava | Europe/Bratislava |
| Europe/Brussels | CET |
| Europe/Bucharest | Europe/Bucharest |
| Europe/Budapest | Europe/Budapest |
| Europe/Busingen | Europe/Busingen |
| Europe/Chisinau | Europe/Chisinau |
| Europe/Copenhagen | Europe/Copenhagen |
| Europe/Dublin | Eire |
| Europe/Gibraltar | Europe/Gibraltar |
| Europe/Guernsey | GB |
| Europe/Helsinki | Europe/Helsinki |
| Europe/Isle_of_Man | GB |
| Europe/Istanbul | Turkey |
| Europe/Jersey | GB |
| Europe/Kaliningrad | Europe/Kaliningrad |
| Europe/Kiev | Europe/Kiev |
| Europe/Kirov | Europe/Kirov |
| Europe/Kyiv | Europe/Kyiv |
| Europe/Lisbon | WET |
| Europe/Ljubljana | Europe/Ljubljana |
| Europe/London | GB |
| Europe/Luxembourg | CET |
| Europe/Madrid | Europe/Madrid |
| Europe/Malta | Europe/Malta |
| Europe/Mariehamn | Europe/Mariehamn |
| Europe/Minsk | Europe/Minsk |
| Europe/Monaco | ECT |
| Europe/Moscow | W-SU |
| Europe/Nicosia | Europe/Nicosia |
| Europe/Oslo | Europe/Oslo |
| Europe/Paris | ECT |
| Europe/Podgorica | Europe/Podgorica |
| Europe/Prague | Europe/Prague |
| Europe/Riga | Europe/Riga |
| Europe/Rome | Europe/Rome |
| Europe/Samara | Europe/Samara |
| Europe/San_Marino | Europe/San_Marino |
| Europe/Sarajevo | Europe/Sarajevo |
| Europe/Saratov | Europe/Saratov |
| Europe/Simferopol | Europe/Simferopol |
| Europe/Skopje | Europe/Skopje |
| Europe/Sofia | Europe/Sofia |
| Europe/Stockholm | Europe/Stockholm |
| Europe/Tallinn | Europe/Tallinn |
| Europe/Tirane | Europe/Tirane |
| Europe/Tiraspol | Europe/Tiraspol |
| Europe/Ulyanovsk | Europe/Ulyanovsk |
| Europe/Uzhgorod | Europe/Uzhgorod |
| Europe/Vaduz | Europe/Vaduz |
| Europe/Vatican | Europe/Vatican |
| Europe/Vienna | Europe/Vienna |
| Europe/Vilnius | Europe/Vilnius |
| Europe/Volgograd | Europe/Volgograd |
| Europe/Warsaw | Poland |
| Europe/Zagreb | Europe/Zagreb |
| Europe/Zaporozhye | Europe/Zaporozhye |
| Europe/Zurich | Europe/Zurich |
| Factory | Factory |
| GB | GB |
| GB-Eire | GB |
| GMT | GMT |
| GMT+0 | GMT |
| GMT-0 | GMT |
| GMT0 | GMT |
| Greenwich | GMT |
| HST | HST |
| Hongkong | Hongkong |
| IET | IET |
| IST | IST |
| Iceland | Iceland |
| Indian/Antananarivo | EAT |
| Indian/Chagos | Indian/Chagos |
| Indian/Christmas | Indian/Christmas |
| Indian/Cocos | Indian/Cocos |
| Indian/Comoro | EAT |
| Indian/Kerguelen | Indian/Kerguelen |
| Indian/Mahe | Indian/Mahe |
| Indian/Maldives | Indian/Maldives |
| Indian/Mauritius | Indian/Mauritius |
| Indian/Mayotte | EAT |
| Indian/Reunion | Indian/Reunion |
| Iran | Iran |
| Israel | Israel |
| JST | JST |
| Jamaica | Jamaica |
| Japan | JST |
| Kwajalein | Kwajalein |
| Libya | Libya |
| MET | CET |
| MIT | MIT |
| MST | MST |
| MST7MDT | Navajo |
| Mexico/BajaNorte | Mexico/BajaNorte |
| Mexico/BajaSur | Mexico/BajaSur |
| Mexico/General | Mexico/General |
| NET | NET |
| NST | NZ |
| NZ | NZ |
| NZ-CHAT | NZ-CHAT |
| Navajo | Navajo |
| PLT | PLT |
| PNT | MST |
| PRC | CTT |
| PRT | PRT |
| PST | PST |
| PST8PDT | PST |
| Pacific/Apia | MIT |
| Pacific/Auckland | NZ |
| Pacific/Bougainville | Pacific/Bougainville |
| Pacific/Chatham | NZ-CHAT |
| Pacific/Chuuk | Pacific/Chuuk |
| Pacific/Easter | Pacific/Easter |
| Pacific/Efate | Pacific/Efate |
| Pacific/Enderbury | Pacific/Enderbury |
| Pacific/Fakaofo | Pacific/Fakaofo |
| Pacific/Fiji | Pacific/Fiji |
| Pacific/Funafuti | Pacific/Funafuti |
| Pacific/Galapagos | Pacific/Galapagos |
| Pacific/Gambier | Pacific/Gambier |
| Pacific/Guadalcanal | SST |
| Pacific/Guam | Pacific/Guam |
| Pacific/Honolulu | HST |
| Pacific/Johnston | HST |
| Pacific/Kanton | Pacific/Kanton |
| Pacific/Kiritimati | Pacific/Kiritimati |
| Pacific/Kosrae | Pacific/Kosrae |
| Pacific/Kwajalein | Kwajalein |
| Pacific/Majuro | Pacific/Majuro |
| Pacific/Marquesas | Pacific/Marquesas |
| Pacific/Midway | Pacific/Midway |
| Pacific/Nauru | Pacific/Nauru |
| Pacific/Niue | Pacific/Niue |
| Pacific/Norfolk | Pacific/Norfolk |
| Pacific/Noumea | Pacific/Noumea |
| Pacific/Pago_Pago | Pacific/Pago_Pago |
| Pacific/Palau | Pacific/Palau |
| Pacific/Pitcairn | Pacific/Pitcairn |
| Pacific/Pohnpei | SST |
| Pacific/Ponape | SST |
| Pacific/Port_Moresby | Pacific/Port_Moresby |
| Pacific/Rarotonga | Pacific/Rarotonga |
| Pacific/Saipan | Pacific/Saipan |
| Pacific/Samoa | Pacific/Samoa |
| Pacific/Tahiti | Pacific/Tahiti |
| Pacific/Tarawa | Pacific/Tarawa |
| Pacific/Tongatapu | Pacific/Tongatapu |
| Pacific/Truk | Pacific/Truk |
| Pacific/Wake | Pacific/Wake |
| Pacific/Wallis | Pacific/Wallis |
| Pacific/Yap | Pacific/Yap |
| Poland | Poland |
| Portugal | WET |
| ROC | ROC |
| ROK | ROK |
| SST | SST |
| Singapore | Singapore |
| SystemV/AST4 | SystemV/AST4 |
| SystemV/AST4ADT | SystemV/AST4ADT |
| SystemV/CST6 | SystemV/CST6 |
| SystemV/CST6CDT | SystemV/CST6CDT |
| SystemV/EST5 | SystemV/EST5 |
| SystemV/EST5EDT | SystemV/EST5EDT |
| SystemV/HST10 | SystemV/HST10 |
| SystemV/MST7 | SystemV/MST7 |
| SystemV/MST7MDT | SystemV/MST7MDT |
| SystemV/PST8 | SystemV/PST8 |
| SystemV/PST8PDT | SystemV/PST8PDT |
| SystemV/YST9 | SystemV/YST9 |
| SystemV/YST9YDT | SystemV/YST9YDT |
| Turkey | Turkey |
| UCT | UCT |
| US/Alaska | AST |
| US/Aleutian | US/Aleutian |
| US/Arizona | MST |
| US/Central | CST |
| US/East-Indiana | IET |
| US/Eastern | EST5EDT |
| US/Hawaii | HST |
| US/Indiana-Starke | US/Indiana-Starke |
| US/Michigan | US/Michigan |
| US/Mountain | Navajo |
| US/Pacific | PST |
| US/Pacific-New | PST |
| US/Samoa | US/Samoa |
| UTC | UCT |
| Universal | UCT |
| VST | VST |
| W-SU | W-SU |
| WET | WET |
| Zulu | UCT |
### Union Type {#docs:stable:sql:data_types:union}
A `UNION` *type* (not to be confused with the SQL [`UNION` operator](#docs:stable:sql:query_syntax:setops::union-all-by-name)) is a nested type capable of holding one of multiple "alternative" values, much like the `union` in C. The main difference is that these `UNION` types are *tagged unions* and thus always carry a discriminator "tag" which signals which alternative is currently held, even if the inner value itself is null. `UNION` types are therefore more similar to C++17's `std::variant`, Rust's `Enum`, or the "sum type" present in most functional languages.
`UNION` types must always have at least one member, and while they can contain multiple members of the same type, the tag names must be unique. `UNION` types can have at most 256 members.
Under the hood, `UNION` types are implemented on top of `STRUCT` types, and simply keep the "tag" as the first entry.
`UNION` values can be created with the [`union_value(tag := expr)`](#docs:stable:sql:functions:union) function or by [casting from a member type](#::casting-to-unions).
#### Example {#docs:stable:sql:data_types:union::example}
Create a table with a `UNION` column:
```sql
CREATE TABLE tbl1 (u UNION(num INTEGER, str VARCHAR));
INSERT INTO tbl1 VALUES (1), ('two'), (union_value(str := 'three'));
```
Any type can be implicitly cast to a `UNION` containing the type. Any `UNION` can also be implicitly cast to another `UNION` if the source `UNION` members are a subset of the target's (if the cast is unambiguous).
`UNION` uses the member types' `VARCHAR` cast functions when casting to `VARCHAR`:
```sql
SELECT u FROM tbl1;
```
| u |
|-------|
| 1 |
| two |
| three |
Select all the `str` members:
```sql
SELECT union_extract(u, 'str') AS str
FROM tbl1;
```
| str |
|-------|
| NULL |
| two |
| three |
Alternatively, you can use 'dot syntax' similarly to [`STRUCT`s](#docs:stable:sql:data_types:struct).
```sql
SELECT u.str
FROM tbl1;
```
| str |
|-------|
| NULL |
| two |
| three |
Select the currently active tag from the `UNION` as an `ENUM`.
```sql
SELECT union_tag(u) AS t
FROM tbl1;
```
| t |
|-----|
| num |
| str |
| str |
#### Union Casts {#docs:stable:sql:data_types:union::union-casts}
Compared to other nested types, `UNION`s allow a set of implicit casts to facilitate unintrusive and natural usage when working with their members as "subtypes".
However, these casts have been designed with two principles in mind: to avoid ambiguity and to avoid casts that could lead to loss of information. This prevents `UNION`s from being completely "transparent", while still allowing `UNION` types to have a "supertype" relationship with their members.
Thus, `UNION` types can't be implicitly cast to any of their member types in general, since the information in the other members not matching the target type would be "lost". If you want to coerce a `UNION` into one of its members, you should use the `union_extract` function explicitly instead.
The only exception to this is when casting a `UNION` to `VARCHAR`, in which case the members will all use their corresponding `VARCHAR` casts. Since everything can be cast to `VARCHAR`, this is "safe" in a sense.
##### Casting to Unions {#docs:stable:sql:data_types:union::casting-to-unions}
A type can always be implicitly cast to a `UNION` if it can be implicitly cast to one of the `UNION` member types.
* If there are multiple candidates, the built-in implicit casting priority rules determine the target type. For example, a `FLOAT` → `UNION(i INTEGER, v VARCHAR)` cast will always cast the `FLOAT` to the `INTEGER` member before `VARCHAR`.
* If the cast is still ambiguous, i.e., there are multiple candidates with the same implicit casting priority, an error is raised. This usually happens when the `UNION` contains multiple members of the same type, e.g., a `FLOAT` → `UNION(i INTEGER, num INTEGER)` cast is always ambiguous.
So how do we disambiguate if we want to create a `UNION` with multiple members of the same type? By using the `union_value` function, which takes a keyword argument specifying the tag. For example, `union_value(num := 2::INTEGER)` will create a `UNION` with a single member of type `INTEGER` with the tag `num`. This can then be used to disambiguate in an explicit (or implicit, read on below!) `UNION` to `UNION` cast, like `CAST(union_value(b := 2) AS UNION(a INTEGER, b INTEGER))`.
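Written out as a runnable snippet (`union_tag` is used here only to confirm which member ends up holding the value):
```sql
SELECT CAST(union_value(b := 2) AS UNION(a INTEGER, b INTEGER)) AS u;
SELECT union_tag(CAST(union_value(b := 2) AS UNION(a INTEGER, b INTEGER))) AS tag;  -- b
```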
##### Casting between Unions {#docs:stable:sql:data_types:union::casting-between-unions}
`UNION` types can be cast between each other if the source type is a "subset" of the target type. In other words, all the tags in the source `UNION` must be present in the target `UNION`, and all the types of the matching tags must be implicitly castable between source and target. In essence, this means that `UNION` types are covariant with respect to their members.
| Ok | Source                 | Target                 | Comments                               |
|----|------------------------|------------------------|----------------------------------------|
| ✅ | `UNION(a A, b B)`      | `UNION(a A, b B, c C)` |                                        |
| ✅ | `UNION(a A, b B)`      | `UNION(a A, b C)`      | if `B` can be implicitly cast to `C`   |
| ❌ | `UNION(a A, b B, c C)` | `UNION(a A, b B)`      |                                        |
| ❌ | `UNION(a A, b B)`      | `UNION(a A, b C)`      | if `B` can't be implicitly cast to `C` |
| ❌ | `UNION(A, B, D)`       | `UNION(A, B, C)`       |                                        |
#### Comparison and Sorting {#docs:stable:sql:data_types:union::comparison-and-sorting}
Since `UNION` types are implemented on top of `STRUCT` types internally, they can be used with all the comparison operators as well as in both `WHERE` and `HAVING` clauses with the [same semantics as `STRUCT`s](#docs:stable:sql:data_types:struct::comparison-operators). The "tag" is always stored as the first struct entry, which ensures that the `UNION` types are compared and ordered by "tag" first.
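As a sketch, reusing `tbl1` from the example above, ordering on the `UNION` column groups rows by tag before comparing the member values:
```sql
SELECT u
FROM tbl1
ORDER BY u;  -- the row tagged 'num' sorts before the rows tagged 'str'
```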
#### Functions {#docs:stable:sql:data_types:union::functions}
See [Union Functions](#docs:stable:sql:functions:union).
### Typecasting {#docs:stable:sql:data_types:typecasting}
Typecasting is an operation that converts a value in one particular data type to the closest corresponding value in another data type.
Like other SQL engines, DuckDB supports both implicit and explicit typecasting.
#### Explicit Casting {#docs:stable:sql:data_types:typecasting::explicit-casting}
Explicit typecasting is performed by using a `CAST` expression. For example, `CAST(col AS VARCHAR)` or `col::VARCHAR` explicitly cast the column `col` to `VARCHAR`. See the [cast page](#docs:stable:sql:expressions:cast) for more information.
#### Implicit Casting {#docs:stable:sql:data_types:typecasting::implicit-casting}
In many situations, the system will add casts by itself. This is called *implicit* casting and happens, for example, when a function is called with an argument that does not match the type of the function but can be cast to the required type.
Implicit casts can only be added for a limited set of type combinations, and are generally only possible when the cast cannot fail. For example, an implicit cast can be added from `INTEGER` to `DOUBLE`, but not from `DOUBLE` to `INTEGER`.
Consider the function `sin(DOUBLE)`. This function takes as input argument a column of type `DOUBLE`, however, it can be called with an integer as well: `sin(1)`. The integer is converted into a double before being passed to the `sin` function.
> **Tip.** To check whether a type can be implicitly cast to another type, use the [`can_cast_implicitly` function](#docs:stable:sql:functions:utility::can_cast_implicitlysource_value-target_value).
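For example, the following queries confirm that an implicit cast exists from `INTEGER` to `DOUBLE` but not the other way around:
```sql
SELECT can_cast_implicitly(1::INTEGER, 1::DOUBLE);  -- true
SELECT can_cast_implicitly(1::DOUBLE, 1::INTEGER);  -- false
```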
##### Combination Casting {#docs:stable:sql:data_types:typecasting::combination-casting}
When values of different types need to be combined into an unspecified joint parent type, the system will perform implicit casts to an automatically selected parent type. For example, `list_value(1::INT64, 1::UINT64)` creates a list of type `INT128[]`. The implicit casts performed in this situation are sometimes more lenient than regular implicit casts. For example, a `BOOL` value may be cast to `INT` (with `true` mapping to `1` and `false` to `0`) even though this is not possible for regular implicit casts.
This *combination casting* occurs for comparisons (`=` / `<` / `>`), set operations (`UNION` / `EXCEPT` / `INTERSECT`), and nested type constructors (`list_value` / `[...]` / `MAP`).
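A short sketch of both behaviors (the exact spelling of the list's type name may vary, e.g., `HUGEINT[]` for `INT128[]`):
```sql
SELECT typeof(list_value(1::INT64, 1::UINT64)) AS joint_type;  -- a list of 128-bit integers
SELECT 1 = true AS bool_vs_int;                                -- true: combination casting applies
```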
#### Casting Operations Matrix {#docs:stable:sql:data_types:typecasting::casting-operations-matrix}
Values of a particular data type cannot always be cast to any arbitrary target data type. The only exception is the `NULL` value, which can always be converted between types.
The following matrix describes which conversions are supported.
When implicit casting is allowed, it implies that explicit casting is also possible.
*(Casting operations matrix figure omitted.)*
Even when a casting operation is supported for a given source and target data type, it is not guaranteed to succeed at runtime.
> **Deprecated.** Prior to version 0.10.0, DuckDB allowed any type to be implicitly cast to `VARCHAR` during function binding.
> Version 0.10.0 introduced a [breaking change which no longer allows implicit casts to `VARCHAR`](https://duckdb.org/2024/02/13/announcing-duckdb-0100#breaking-sql-changes).
> The [`old_implicit_casting` configuration option](#docs:stable:configuration:pragmas::implicit-casting-to-varchar) setting can be used to revert to the old behavior.
> However, please note that this flag will be deprecated in the future.
##### Lossy Casts {#docs:stable:sql:data_types:typecasting::lossy-casts}
Casting operations that result in loss of precision are allowed. For example, it is possible to explicitly cast a numeric type with fractional digits, such as `DECIMAL`, `FLOAT` or `DOUBLE`, to an integral type like `INTEGER` or `BIGINT`. The number will be rounded.
```sql
SELECT CAST(3.1 AS INTEGER); -- 3
SELECT CAST(3.5 AS INTEGER); -- 4
SELECT CAST(-1.7 AS INTEGER); -- -2
```
##### Overflows {#docs:stable:sql:data_types:typecasting::overflows}
Casting operations that would result in a value overflow throw an error. For example, the value `999` is too large to be represented by the `TINYINT` data type. Therefore, an attempt to cast that value to that type results in a runtime error:
```sql
SELECT CAST(999 AS TINYINT);
```
```console
Conversion Error:
Type INT32 with value 999 can't be cast because the value is out of range for the destination type INT8
```
So even though the cast operation from `INTEGER` to `TINYINT` is supported, it is not possible for this particular value. [TRY_CAST](#docs:stable:sql:expressions:cast) can be used to convert the value into `NULL` instead of throwing an error.
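For example, wrapping the overflowing cast above in `TRY_CAST` yields `NULL` instead of an error:
```sql
SELECT TRY_CAST(999 AS TINYINT);  -- NULL
```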
##### Varchar {#docs:stable:sql:data_types:typecasting::varchar}
The [`VARCHAR`](#docs:stable:sql:data_types:text) type acts as a universal target: any arbitrary value of any arbitrary type can always be cast to the `VARCHAR` type. This type is also used for displaying values in the shell.
```sql
SELECT CAST(42.5 AS VARCHAR);
```
Casting from `VARCHAR` to another data type is supported, but can raise an error at runtime if DuckDB cannot parse and convert the provided text to the target data type.
```sql
SELECT CAST('NotANumber' AS INTEGER);
```
In general, casting to `VARCHAR` is a lossless operation and any type can be cast back to the original type after being converted into text.
```sql
SELECT CAST(CAST([1, 2, 3] AS VARCHAR) AS INTEGER[]);
```
##### Literal Types {#docs:stable:sql:data_types:typecasting::literal-types}
Integer literals (such as `42`) and string literals (such as `'string'`) have special implicit casting rules. See the [literal types page](#docs:stable:sql:data_types:literal_types) for more information.
##### Lists / Arrays {#docs:stable:sql:data_types:typecasting::lists--arrays}
Lists can be explicitly cast to other lists using the same casting rules. The cast is applied to the children of the list. For example, if we convert an `INTEGER[]` list to a `VARCHAR[]` list, the child `INTEGER` elements are individually cast to `VARCHAR` and a new list is constructed.
```sql
SELECT CAST([1, 2, 3] AS VARCHAR[]);
```
##### Arrays {#docs:stable:sql:data_types:typecasting::arrays}
Arrays follow the same casting rules as lists. In addition, arrays can be implicitly cast to lists of the same type. For example, an `INTEGER[3]` array can be implicitly cast to an `INTEGER[]` list.
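A small sketch: `array_value(1, 2, 3)` produces an `INTEGER[3]` array, which is implicitly cast to an `INTEGER[]` list when passed to a list function:
```sql
SELECT list_contains(array_value(1, 2, 3), 2);  -- true
```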
##### Structs {#docs:stable:sql:data_types:typecasting::structs}
Structs can be cast to other structs as long as they share at least one field.
> The rationale behind this requirement is to help avoid unintended errors. If two structs do not have any fields in common, then the cast was likely not intended.
```sql
SELECT CAST({'a': 42} AS STRUCT(a VARCHAR));
```
Fields that exist in the target struct, but that do not exist in the source struct, default to `NULL`.
```sql
SELECT CAST({'a': 42} AS STRUCT(a VARCHAR, b VARCHAR));
```
Fields that only exist in the source struct are ignored.
```sql
SELECT CAST({'a': 42, 'b': 43} AS STRUCT(a VARCHAR));
```
The fields of the source struct can also appear in a different order; they are reshuffled to match the target struct based on their names.
```sql
SELECT CAST({'a': 42, 'b': 84} AS STRUCT(b VARCHAR, a VARCHAR));
```
For [combination casting](#docs:stable:sql:data_types:typecasting::combination-casting), the fields of the resulting struct are the superset of all fields of the input structs.
This logic also applies recursively to potentially nested structs.
```sql
SELECT {'outer1': {'inner1': 42, 'inner2': 42}} AS c
UNION
SELECT {'outer1': {'inner2': 'hello', 'inner3': 'world'}, 'outer2': '100'} AS c;
```
```sql
SELECT [{'a': 42}, {'b': 84}];
```
##### Unions {#docs:stable:sql:data_types:typecasting::unions}
Union casting rules can be found on the [`UNION type page`](#docs:stable:sql:data_types:union::casting-to-unions).
## Expressions {#sql:expressions}
### Expressions {#docs:stable:sql:expressions:overview}
An expression is a combination of values, operators and functions. Expressions are highly composable, and range from very simple to arbitrarily complex. They can be found in many different parts of SQL statements. In this section, we provide the different types of operators and functions that can be used within expressions.
### CASE Expression {#docs:stable:sql:expressions:case}
The `CASE` expression performs a switch based on a condition. The basic form is identical to the ternary condition used in many programming languages (`CASE WHEN cond THEN a ELSE b END` is equivalent to `cond ? a : b`). With a single condition this can be expressed with `IF(cond, a, b)`.
```sql
CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;
SELECT i, CASE WHEN i > 2 THEN 1 ELSE 0 END AS test
FROM integers;
```
| i | test |
|--:|-----:|
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
This is equivalent to:
```sql
SELECT i, IF(i > 2, 1, 0) AS test
FROM integers;
```
The `WHEN cond THEN expr` part of the `CASE` expression can be chained: whenever any of the conditions returns true for a single tuple, the corresponding expression is evaluated and returned.
```sql
CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;
SELECT i, CASE WHEN i = 1 THEN 10 WHEN i = 2 THEN 20 ELSE 0 END AS test
FROM integers;
```
| i | test |
|--:|-----:|
| 1 | 10 |
| 2 | 20 |
| 3 | 0 |
The `ELSE` clause of the `CASE` expression is optional. If no `ELSE` clause is provided and none of the conditions match, the `CASE` expression will return `NULL`.
```sql
CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;
SELECT i, CASE WHEN i = 1 THEN 10 END AS test
FROM integers;
```
| i | test |
|--:|-----:|
| 1 | 10 |
| 2 | NULL |
| 3 | NULL |
It is also possible to provide an individual expression after the `CASE` but before the `WHEN`. When this is done, the `CASE` expression is effectively transformed into a switch statement.
```sql
CREATE OR REPLACE TABLE integers AS SELECT unnest([1, 2, 3]) AS i;
SELECT i, CASE i WHEN 1 THEN 10 WHEN 2 THEN 20 WHEN 3 THEN 30 END AS test
FROM integers;
```
| i | test |
|--:|-----:|
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
This is equivalent to:
```sql
SELECT i, CASE WHEN i = 1 THEN 10 WHEN i = 2 THEN 20 WHEN i = 3 THEN 30 END AS test
FROM integers;
```
### Casting {#docs:stable:sql:expressions:cast}
Casting refers to the operation of converting a value in a particular data type to the corresponding value in another data type.
Casting can occur either implicitly or explicitly. The syntax described here performs an explicit cast. More information on casting can be found on the [typecasting page](#docs:stable:sql:data_types:typecasting).
#### Explicit Casting {#docs:stable:sql:expressions:cast::explicit-casting}
The standard SQL syntax for explicit casting is `CAST(expr AS TYPENAME)`, where `TYPENAME` is a name (or alias) of one of [DuckDB's data types](#docs:stable:sql:data_types:overview). DuckDB also supports the shorthand `expr::TYPENAME`, which is also present in PostgreSQL.
```sql
SELECT CAST(i AS VARCHAR) AS i
FROM generate_series(1, 3) tbl(i);
```
| i |
|---|
| 1 |
| 2 |
| 3 |
```sql
SELECT i::DOUBLE AS i
FROM generate_series(1, 3) tbl(i);
```
| i |
|----:|
| 1.0 |
| 2.0 |
| 3.0 |
##### Casting Rules {#docs:stable:sql:expressions:cast::casting-rules}
Not all casts are possible. For example, it is not possible to convert an `INTEGER` to a `DATE`. Casts may also throw errors when the cast could not be successfully performed. For example, trying to cast the string `'hello'` to an `INTEGER` will result in an error being thrown.
```sql
SELECT CAST('hello' AS INTEGER);
```
```console
Conversion Error:
Could not convert string 'hello' to INT32
```
The exact behavior of the cast depends on the source and destination types. For example, when casting from `VARCHAR` to any other type, DuckDB attempts to parse the string as the target type.
##### `TRY_CAST` {#docs:stable:sql:expressions:cast::try_cast}
`TRY_CAST` can be used when the preferred behavior is not to throw an error, but instead to return a `NULL` value. `TRY_CAST` will never throw an error, and will instead return `NULL` if a cast is not possible.
```sql
SELECT TRY_CAST('hello' AS INTEGER) AS i;
```
| i |
|------|
| NULL |
#### `cast_to_type` Function {#docs:stable:sql:expressions:cast::cast_to_type-function}
The `cast_to_type` function allows generating a cast from an expression to the type of another column.
For example:
```sql
SELECT cast_to_type('42', NULL::INTEGER) AS result;
```
```text
┌────────┐
│ result │
│ int32  │
├────────┤
│   42   │
└────────┘
```
This function is primarily useful in [macros](#docs:stable:guides:snippets:sharing_macros), as it allows you to maintain types.
This helps with making generic macros that operate on different types. For example, the following macro adds to a number if the input is an `INTEGER`:
```sql
CREATE TABLE tbl (i INT, s VARCHAR);
INSERT INTO tbl VALUES (42, 'hello world');
CREATE MACRO conditional_add(col, nr) AS
CASE
WHEN typeof(col) == 'INTEGER' THEN cast_to_type(col::INTEGER + nr, col)
ELSE col
END;
SELECT conditional_add(COLUMNS(*), 100) FROM tbl;
```
```text
┌───────┬─────────────┐
│   i   │      s      │
│ int32 │   varchar   │
├───────┼─────────────┤
│  142  │ hello world │
└───────┴─────────────┘
```
Note that the `CASE` statement needs to return the same type in all code paths. We can perform the addition on any input column by adding a cast to the desired type, but we need to cast the result of the addition back to the source type to make the binding work.
### Collations {#docs:stable:sql:expressions:collations}
Collations provide rules for how text should be sorted or compared in the execution engine. Collations are useful for localization, as the rules for how text should be ordered are different for different languages or for different countries. These orderings are often incompatible with one another. For example, in English the letter `y` comes between `x` and `z`. However, in Lithuanian the letter `y` comes between the `i` and `j`. For that reason, different collations are supported. The user must choose which collation they want to use when performing sorting and comparison operations.
By default, the `BINARY` collation is used. That means that strings are ordered and compared based only on their binary contents. This makes sense for standard ASCII characters (i.e., the letters A-Z and numbers 0-9), but generally does not make much sense for special unicode characters. It is, however, by far the fastest method of performing ordering and comparisons. Hence it is recommended to stick with the `BINARY` collation unless required otherwise.
> The `BINARY` collation is also available under the aliases `C` and `POSIX`.
> **Warning.** Collation support in DuckDB has [some known limitations](https://github.com/duckdb/duckdb/issues?q=is%3Aissue+is%3Aopen+collation+) and has [several planned improvements](https://github.com/duckdb/duckdb/issues/604).
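As a quick illustration of the default `BINARY` ordering, uppercase ASCII letters sort before all lowercase letters because comparisons operate on byte values:
```sql
SELECT s
FROM (SELECT unnest(['banana', 'Apple', 'apple']) AS s)
ORDER BY s;  -- Apple, apple, banana
```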
#### Using Collations {#docs:stable:sql:expressions:collations::using-collations}
In the stand-alone installation of DuckDB three collations are included: `NOCASE`, `NOACCENT` and `NFC`. The `NOCASE` collation compares characters as equal regardless of their casing. The `NOACCENT` collation compares characters as equal regardless of their accents. The `NFC` collation performs NFC-normalized comparisons, see [Unicode normalization](https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization) for more information.
```sql
SELECT 'hello' = 'hElLO';
```
```text
false
```
```sql
SELECT 'hello' COLLATE NOCASE = 'hElLO';
```
```text
true
```
```sql
SELECT 'hello' = 'hëllo';
```
```text
false
```
```sql
SELECT 'hello' COLLATE NOACCENT = 'hëllo';
```
```text
true
```
Collations can be combined by chaining them using the dot operator. Note, however, that not all collations can be combined together. In general, the `NOCASE` collation can be combined with any other collator, but most other collations cannot be combined.
```sql
SELECT 'hello' COLLATE NOCASE = 'hElLÖ';
```
```text
false
```
```sql
SELECT 'hello' COLLATE NOACCENT = 'hElLÖ';
```
```text
false
```
```sql
SELECT 'hello' COLLATE NOCASE.NOACCENT = 'hElLÖ';
```
```text
true
```
#### Default Collations {#docs:stable:sql:expressions:collations::default-collations}
The collations we have seen so far have all been specified *per expression*. It is also possible to specify a default collator, either on the global database level or on a base table column. The `PRAGMA` `default_collation` can be used to specify the global default collator. This is the collator that will be used if no other one is specified.
```sql
SET default_collation = NOCASE;
SELECT 'hello' = 'HeLlo';
```
```text
true
```
Collations can also be specified per-column when creating a table. When that column is then used in a comparison, the per-column collation is used to perform that comparison.
```sql
CREATE TABLE names (name VARCHAR COLLATE NOACCENT);
INSERT INTO names VALUES ('hännes');
```
```sql
SELECT name
FROM names
WHERE name = 'hannes';
```
```text
hännes
```
Be careful here, however, as different collations cannot be combined. This can be problematic when you want to compare columns that have a different collation specified.
```sql
SELECT name
FROM names
WHERE name = 'hannes' COLLATE NOCASE;
```
```console
ERROR: Cannot combine types with different collation!
```
```sql
CREATE TABLE other_names (name VARCHAR COLLATE NOCASE);
INSERT INTO other_names VALUES ('HÄNNES');
```
```sql
SELECT names.name AS name, other_names.name AS other_name
FROM names, other_names
WHERE names.name = other_names.name;
```
```console
ERROR: Cannot combine types with different collation!
```
We need to manually overwrite the collation:
```sql
SELECT names.name AS name, other_names.name AS other_name
FROM names, other_names
WHERE names.name COLLATE NOACCENT.NOCASE = other_names.name COLLATE NOACCENT.NOCASE;
```
| name | other_name |
|--------|------------|
| hännes | HÄNNES     |
#### ICU Collations {#docs:stable:sql:expressions:collations::icu-collations}
The collations we have seen so far are not region-dependent, and do not follow any specific regional rules. If you wish to follow the rules of a specific region or language, you will need to use one of the ICU collations. For that, you need to [load the ICU extension](#docs:stable:core_extensions:icu::installing-and-loading).
Loading this extension adds a number of language- and region-specific collations to your database. These can be queried using the `PRAGMA collations` command, or by querying the `pragma_collations` function.
```sql
PRAGMA collations;
SELECT list(collname) FROM pragma_collations();
```
```text
[af, am, ar, ar_sa, as, az, be, bg, bn, bo, br, bs, ca, ceb, chr, cs, cy, da, de, de_at, dsb, dz, ee, el, en, en_us, eo, es, et, fa, fa_af, ff, fi, fil, fo, fr, fr_ca, fy, ga, gl, gu, ha, haw, he, he_il, hi, hr, hsb, hu, hy, icu_noaccent, id, id_id, ig, is, it, ja, ka, kk, kl, km, kn, ko, kok, ku, ky, lb, lkt, ln, lo, lt, lv, mk, ml, mn, mr, ms, mt, my, nb, nb_no, ne, nfc, nl, nn, noaccent, nocase, om, or, pa, pa_in, pl, ps, pt, ro, ru, sa, se, si, sk, sl, smn, sq, sr, sr_ba, sr_me, sr_rs, sv, sw, ta, te, th, tk, to, tr, ug, uk, ur, uz, vi, wae, wo, xh, yi, yo, yue, yue_cn, zh, zh_cn, zh_hk, zh_mo, zh_sg, zh_tw, zu]
```
These collations can then be used in the same way as the built-in collations described above. They can also be combined with the `NOCASE` collation. For example, to use the German collation rules, you could use the following code snippet:
```sql
CREATE TABLE strings (s VARCHAR COLLATE DE);
INSERT INTO strings VALUES ('Gabel'), ('Göbel'), ('Goethe'), ('Goldmann'), ('Göthe'), ('Götz');
SELECT * FROM strings ORDER BY s;
```
```text
"Gabel", "Göbel", "Goethe", "Goldmann", "Göthe", "Götz"
```
### Comparisons {#docs:stable:sql:expressions:comparison_operators}
#### Comparison Operators {#docs:stable:sql:expressions:comparison_operators::comparison-operators}
The table below shows the standard comparison operators.
Whenever either of the input arguments is `NULL`, the output of the comparison is `NULL`.
| Operator | Description | Example | Result |
|:---|:---|:---|:---|
| `<` | less than | `2 < 3` | `true` |
| `>` | greater than | `2 > 3` | `false` |
| `<=` | less than or equal to | `2 <= 3` | `true` |
| `>=` | greater than or equal to | `4 >= NULL` | `NULL` |
| `=` or `==` | equal | `NULL = NULL` | `NULL` |
| `<>` or `!=` | not equal | `2 <> 2` | `false` |
The table below shows the standard distinction operators.
These operators treat `NULL` values as equal.
| Operator | Description | Example | Result |
|:---|:---|:---|:-|
| `IS DISTINCT FROM` | not equal, including `NULL` | `2 IS DISTINCT FROM NULL` | `true` |
| `IS NOT DISTINCT FROM` | equal, including `NULL` | `NULL IS NOT DISTINCT FROM NULL` | `true` |
##### Combination Casting {#docs:stable:sql:expressions:comparison_operators::combination-casting}
When performing comparison on different types, DuckDB performs [Combination Casting](#docs:stable:sql:data_types:typecasting::combination-casting).
These casts were introduced to make interactive querying more convenient and are in line with the casts performed by several programming languages but are often not compatible with PostgreSQL's behavior. For example, the following expressions evaluate and return `true` in DuckDB but fail in PostgreSQL.
```sql
SELECT 1 = true;
SELECT 1 = '1.1';
```
> It is not possible to enforce stricter type-checking for DuckDB's comparison operators. If you require stricter type-checking, we recommend creating a [macro](#docs:stable:sql:statements:create_macro) with the [`typeof` function](#docs:stable:sql:functions:utility::typeofexpression) or implementing a [user-defined function](#docs:stable:clients:python:function).
#### `BETWEEN` and `IS [NOT] NULL` {#docs:stable:sql:expressions:comparison_operators::between-and-is-not-null}
Besides the standard comparison operators there are also the `BETWEEN` and `IS (NOT) NULL` operators. These behave much like operators, but have special syntax mandated by the SQL standard. They are shown in the table below.
Note that `BETWEEN` and `NOT BETWEEN` are only equivalent to the examples below in the cases where `a`, `x`, and `y` are all of the same type, as `BETWEEN` will cast all of its inputs to the same type.
| Predicate | Description |
|:---|:---|
| `a BETWEEN x AND y` | equivalent to `x <= a AND a <= y` |
| `a NOT BETWEEN x AND y` | equivalent to `x > a OR a > y` |
| `expression IS NULL` | `true` if expression is `NULL`, `false` otherwise |
| `expression ISNULL` | alias for `IS NULL` (non-standard) |
| `expression IS NOT NULL` | `false` if expression is `NULL`, `true` otherwise |
| `expression NOTNULL` | alias for `IS NOT NULL` (non-standard) |
> For the expression `BETWEEN x AND y`, `x` is used as the lower bound and `y` is used as the upper bound. Therefore, if `x > y`, the result will always be `false`.
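For example:
```sql
SELECT 5 BETWEEN 1 AND 10;  -- true
SELECT 5 BETWEEN 10 AND 1;  -- false: the lower bound (10) exceeds the upper bound (1)
```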
### IN Operator {#docs:stable:sql:expressions:in}
The `IN` operator checks containment of the left expression inside the _collection_ on the right hand side (RHS).
Supported collections on the RHS are tuples, lists, maps and subqueries that return a single column.
#### `IN (val1, val2, ...)` (Tuple) {#docs:stable:sql:expressions:in::in-val1-val2--tuple}
The `IN` operator on a tuple `(val1, val2, ...)` returns `true` if the expression is present in the RHS, `false` if the expression is not in the RHS and the RHS has no `NULL` values, or `NULL` if the expression is not in the RHS and the RHS has `NULL` values.
```sql
SELECT 'Math' IN ('CS', 'Math');
```
```text
true
```
```sql
SELECT 'English' IN ('CS', 'Math');
```
```text
false
```
```sql
SELECT 'Math' IN ('CS', 'Math', NULL);
```
```text
true
```
```sql
SELECT 'English' IN ('CS', 'Math', NULL);
```
```text
NULL
```
#### `IN [val1, val2, ...]` (List) {#docs:stable:sql:expressions:in::in-val1-val2--list}
The `IN` operator works on lists according to the semantics used in Python.
Unlike for the [`IN tuple` operator](#::in-val1-val2--tuple), the presence of `NULL` values on the right hand side of the expression does not make a difference in the result:
```sql
SELECT 'Math' IN ['CS', 'Math', NULL];
```
```text
true
```
```sql
SELECT 'English' IN ['CS', 'Math', NULL];
```
```text
false
```
#### `IN` Map {#docs:stable:sql:expressions:in::in-map}
The `IN` operator works on [maps](#docs:stable:sql:data_types:map) according to the semantics used in Python, i.e., it checks for the presence of keys (not values):
```sql
SELECT 'key1' IN MAP {'key1': 50, 'key2': 75};
```
```text
true
```
```sql
SELECT 'key3' IN MAP {'key1': 50, 'key2': 75};
```
```text
false
```
#### `IN` Subquery {#docs:stable:sql:expressions:in::in-subquery}
The `IN` operator works with [subqueries](#docs:stable:sql:expressions:subqueries) that return a single column.
For example:
```sql
SELECT 42 IN (SELECT unnest([32, 42, 52]) AS x);
```
```text
true
```
If the subquery returns more than one column, a Binder Error is thrown:
```sql
SELECT 42 IN (SELECT unnest([32, 42, 52]) AS x, 'a' AS y);
```
```console
Binder Error:
Subquery returns 2 columns - expected 1
```
#### `IN` String {#docs:stable:sql:expressions:in::in-string}
The `IN` operator can be used as a shorthand for the [`contains` string function](#docs:stable:sql:functions:text::containsstring-search_string).
For example:
```sql
SELECT 'Hello' IN 'Hello World';
```
```text
true
```
#### `NOT IN` {#docs:stable:sql:expressions:in::not-in}
`NOT IN` can be used to check if an element is not present in the set.
`x NOT IN y` is equivalent to `NOT (x IN y)`.
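For example, the `NULL` semantics of the tuple form carry over to `NOT IN`:
```sql
SELECT 'English' NOT IN ('CS', 'Math');        -- true
SELECT 'English' NOT IN ('CS', 'Math', NULL);  -- NULL
```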
### Logical Operators {#docs:stable:sql:expressions:logical_operators}
The following logical operators are available: `AND`, `OR` and `NOT`. SQL uses a three-valued logic system with `true`, `false` and `NULL`. Note that logical operators involving `NULL` do not always evaluate to `NULL`. For example, `NULL AND false` will evaluate to `false`, and `NULL OR true` will evaluate to `true`. Below are the complete truth tables.
##### Binary Operators: `AND` and `OR` {#docs:stable:sql:expressions:logical_operators::binary-operators-and-and-or}
| `a` | `b` | `a AND b` | `a OR b` |
|:---|:---|:---|:---|
| true | true | true | true |
| true | false | false | true |
| true | NULL | NULL | true |
| false | false | false | false |
| false | NULL | false | NULL |
| NULL | NULL | NULL | NULL|
##### Unary Operator: NOT {#docs:stable:sql:expressions:logical_operators::unary-operator-not}
| `a` | `NOT a` |
|:---|:---|
| true | false |
| false | true |
| NULL | NULL |
The operators `AND` and `OR` are commutative, that is, you can switch the left and right operand without affecting the result.
### Star Expression {#docs:stable:sql:expressions:star}
#### Syntax {#docs:stable:sql:expressions:star::syntax}
The `*` expression can be used in a `SELECT` statement to select all columns that are projected in the `FROM` clause.
```sql
SELECT *
FROM tbl;
```
##### `TABLE.*` and `STRUCT.*` {#docs:stable:sql:expressions:star::table-and-struct}
The `*` expression can be prepended by a table name to select only columns from that table.
```sql
SELECT tbl.*
FROM tbl
JOIN other_tbl USING (id);
```
Similarly, the `*` expression can also be used to retrieve all keys from a struct as separate columns.
This is particularly useful when a prior operation creates a struct of unknown shape, or if a query must handle any potential struct keys.
See the [`STRUCT` data type](#docs:stable:sql:data_types:struct) and [`STRUCT` functions](#docs:stable:sql:functions:struct) pages for more details on working with structs.
For example:
```sql
SELECT st.* FROM (SELECT {'x': 1, 'y': 2, 'z': 3} AS st);
```
| x | y | z |
|--:|--:|--:|
| 1 | 2 | 3 |
##### `EXCLUDE` Clause {#docs:stable:sql:expressions:star::exclude-clause}
`EXCLUDE` allows you to exclude specific columns from the `*` expression.
```sql
SELECT * EXCLUDE (col)
FROM tbl;
```
##### `REPLACE` Clause {#docs:stable:sql:expressions:star::replace-clause}
`REPLACE` allows you to replace specific columns with alternative expressions.
```sql
SELECT * REPLACE (col1 / 1_000 AS col1, col2 / 1_000 AS col2)
FROM tbl;
```
##### `RENAME` Clause {#docs:stable:sql:expressions:star::rename-clause}
`RENAME` allows you to rename specific columns.
```sql
SELECT * RENAME (col1 AS height, col2 AS width)
FROM tbl;
```
##### Column Filtering via Pattern Matching Operators {#docs:stable:sql:expressions:star::column-filtering-via-pattern-matching-operators}
The [pattern matching operators](#docs:stable:sql:functions:pattern_matching) `LIKE`, `GLOB`, `SIMILAR TO` and their variants allow you to select columns by matching their names to patterns.
```sql
SELECT * LIKE 'col%'
FROM tbl;
```
```sql
SELECT * GLOB 'col*'
FROM tbl;
```
```sql
SELECT * SIMILAR TO 'col.'
FROM tbl;
```
#### `COLUMNS` Expression {#docs:stable:sql:expressions:star::columns-expression}
The `COLUMNS` expression is similar to the regular star expression, but additionally allows you to execute the same expression on the resulting columns.
```sql
CREATE TABLE numbers (id INTEGER, number INTEGER);
INSERT INTO numbers VALUES (1, 10), (2, 20), (3, NULL);
SELECT min(COLUMNS(*)), count(COLUMNS(*)) FROM numbers;
```
| id | number | id | number |
|---:|-------:|---:|-------:|
| 1 | 10 | 3 | 2 |
```sql
SELECT
min(COLUMNS(* REPLACE (number + id AS number))),
count(COLUMNS(* EXCLUDE (number)))
FROM numbers;
```
| id | min(number := (number + id)) | id |
|---:|-----------------------------:|---:|
| 1 | 11 | 3 |
`COLUMNS` expressions can also be combined, as long as they contain the same star expression:
```sql
SELECT COLUMNS(*) + COLUMNS(*) FROM numbers;
```
| id | number |
|---:|-------:|
| 2 | 20 |
| 4 | 40 |
| 6 | NULL |
##### `COLUMNS` Expression in a `WHERE` Clause {#docs:stable:sql:expressions:star::columns-expression-in-a-where-clause}
`COLUMNS` expressions can also be used in `WHERE` clauses. The conditions are applied to all columns and are combined using the logical `AND` operator.
```sql
SELECT *
FROM (
SELECT 0 AS x, 1 AS y, 2 AS z
UNION ALL
SELECT 1 AS x, 2 AS y, 3 AS z
UNION ALL
SELECT 2 AS x, 3 AS y, 4 AS z
)
WHERE COLUMNS(*) > 1; -- equivalent to: x > 1 AND y > 1 AND z > 1
```
| x | y | z |
|--:|--:|--:|
| 2 | 3 | 4 |
##### Regular Expressions in a `COLUMNS` Expression {#docs:stable:sql:expressions:star::regular-expressions-in-a-columns-expression}
`COLUMNS` expressions don't currently support the pattern matching operators, but they do support regular expression matching by simply passing a string constant in place of the star:
```sql
SELECT COLUMNS('(id|numbers?)') FROM numbers;
```
| id | number |
|---:|-------:|
| 1 | 10 |
| 2 | 20 |
| 3 | NULL |
##### Renaming Columns with Regular Expressions in a `COLUMNS` Expression {#docs:stable:sql:expressions:star::renaming-columns-with-regular-expressions-in-a-columns-expression}
The matches of capture groups in regular expressions can be used to rename matching columns.
The capture groups are one-indexed; `\0` is the original column name.
For example, to select the first three letters of column names, run:
```sql
SELECT COLUMNS('(\w{3}).*') AS '\1' FROM numbers;
```
| id | num |
|---:|-----:|
| 1 | 10 |
| 2 | 20 |
| 3 | NULL |
To remove a colon (`:`) character in the middle of a column name, run:
```sql
CREATE TABLE tbl ("Foo:Bar" INTEGER, "Foo:Baz" INTEGER, "Foo:Qux" INTEGER);
SELECT COLUMNS('(\w*):(\w*)') AS '\1\2' FROM tbl;
```
To add the original column name to the expression alias, run:
```sql
SELECT min(COLUMNS(*)) AS "min_\0" FROM numbers;
```
| min_id | min_number |
|-------:|-----------:|
| 1 | 10 |
##### `COLUMNS` Lambda Function {#docs:stable:sql:expressions:star::columns-lambda-function}
`COLUMNS` also supports passing in a lambda function. The lambda function will be evaluated for all columns present in the `FROM` clause, and only columns that match the lambda function will be returned. This allows the execution of arbitrary expressions in order to select and rename columns.
```sql
SELECT COLUMNS(c -> c LIKE '%num%') FROM numbers;
```
| number |
|-------:|
| 10 |
| 20 |
| NULL |
##### `COLUMNS` List {#docs:stable:sql:expressions:star::columns-list}
`COLUMNS` also supports passing in a list of column names.
```sql
SELECT COLUMNS(['id', 'num']) FROM numbers;
```
| id | num |
|---:|-----:|
| 1 | 10 |
| 2 | 20 |
| 3 | NULL |
#### `*COLUMNS` Unpacked Columns {#docs:stable:sql:expressions:star::columns-unpacked-columns}
The `*COLUMNS` clause is a variation of `COLUMNS` that supports all of the previously mentioned capabilities.
The difference is in how the expression expands.
`*COLUMNS` will expand in-place, much like the [iterable unpacking behavior in Python](https://peps.python.org/pep-3132/), which inspired the `*` syntax.
This implies that the expression expands into the parent expression.
An example that shows this difference between `COLUMNS` and `*COLUMNS`:
With `COLUMNS`:
```sql
SELECT coalesce(COLUMNS(['a', 'b', 'c'])) AS result
FROM (SELECT NULL AS a, 42 AS b, true AS c);
```
| result | result | result |
|--------|-------:|-------:|
| NULL | 42 | true |
With `*COLUMNS`, the expression expands in its parent expression `coalesce`, resulting in a single result column:
```sql
SELECT coalesce(*COLUMNS(['a', 'b', 'c'])) AS result
FROM (SELECT NULL AS a, 42 AS b, true AS c);
```
| result |
|-------:|
| 42 |
`*COLUMNS` also works with the `(*)` argument:
```sql
SELECT coalesce(*COLUMNS(*)) AS result
FROM (SELECT NULL AS a, 42 AS b, true AS c);
```
| result |
|-------:|
| 42 |
### Subqueries {#docs:stable:sql:expressions:subqueries}
Subqueries are parenthesized query expressions that appear as part of a larger, outer query. Subqueries are usually based on `SELECT ... FROM`, but in DuckDB other query constructs such as [`PIVOT`](#docs:stable:sql:statements:pivot) can also appear as a subquery.
#### Scalar Subquery {#docs:stable:sql:expressions:subqueries::scalar-subquery}
Scalar subqueries are subqueries that return a single value. They can be used anywhere an expression can be used. If a scalar subquery returns more than a single value, an error is raised (unless `scalar_subquery_error_on_multiple_rows` is set to `false`, in which case an arbitrary row is selected).
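As an illustrative sketch of relaxing this behavior via the setting mentioned above:
```sql
SET scalar_subquery_error_on_multiple_rows = false;
SELECT (SELECT unnest([1, 2, 3])) AS one_value; -- returns one of the values instead of raising an error
```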
Consider the following table:
##### Grades {#docs:stable:sql:expressions:subqueries::grades}
| grade | course |
|---:|:---|
| 7 | Math |
| 9 | Math |
| 8 | CS |
```sql
CREATE TABLE grades (grade INTEGER, course VARCHAR);
INSERT INTO grades VALUES (7, 'Math'), (9, 'Math'), (8, 'CS');
```
We can run the following query to obtain the minimum grade:
```sql
SELECT min(grade) FROM grades;
```
| min(grade) |
|-----------:|
| 7 |
By using a scalar subquery in the `WHERE` clause, we can figure out for which course this grade was obtained:
```sql
SELECT course FROM grades WHERE grade = (SELECT min(grade) FROM grades);
```
| course |
|--------|
| Math |
#### `ARRAY` Subqueries {#docs:stable:sql:expressions:subqueries::array-subqueries}
Subqueries that return multiple values can be wrapped with `ARRAY` to collect all results in a list.
```sql
SELECT ARRAY(SELECT grade FROM grades) AS all_grades;
```
| all_grades |
|-----------:|
| [7, 9, 8] |
#### Subquery Comparisons: `ALL`, `ANY` and `SOME` {#docs:stable:sql:expressions:subqueries::subquery-comparisons-all-any-and-some}
In the section on [scalar subqueries](#::scalar-subquery), a scalar expression was compared directly to a subquery using the equality [comparison operator](#docs:stable:sql:expressions:comparison_operators::comparison-operators) (`=`).
Such direct comparisons only make sense with scalar subqueries.
Scalar expressions can still be compared to single-column subqueries returning multiple rows by specifying a quantifier. Available quantifiers are `ALL`, `ANY` and `SOME`. The quantifiers `ANY` and `SOME` are equivalent.
##### `ALL` {#docs:stable:sql:expressions:subqueries::all}
The `ALL` quantifier specifies that the comparison as a whole evaluates to `true` when the individual comparison results of _the expression at the left hand side of the comparison operator_ with each of the values from _the subquery at the right hand side of the comparison operator_ **all** evaluate to `true`:
```sql
SELECT 6 <= ALL (SELECT grade FROM grades) AS adequate;
```
returns:
| adequate |
|----------|
| true |
because 6 is less than or equal to each of the subquery results 7, 8 and 9.
However, the following query
```sql
SELECT 8 >= ALL (SELECT grade FROM grades) AS excellent;
```
returns
| excellent |
|-----------|
| false |
because 8 is not greater than or equal to the subquery result 9. And thus, because not all comparisons evaluate to `true`, `>= ALL` as a whole evaluates to `false`.
##### `ANY` {#docs:stable:sql:expressions:subqueries::any}
The `ANY` quantifier specifies that the comparison as a whole evaluates to `true` when at least one of the individual comparison results evaluates to `true`.
For example:
```sql
SELECT 5 >= ANY (SELECT grade FROM grades) AS fail;
```
returns
| fail |
|-------|
| false |
because no result of the subquery is less than or equal to 5.
The quantifier `SOME` may be used instead of `ANY`: `ANY` and `SOME` are interchangeable.
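As a sketch, the previous query can be written equivalently with `SOME`:
```sql
SELECT 5 >= SOME (SELECT grade FROM grades) AS fail;
```
This returns the same result as the `ANY` variant above.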
#### `EXISTS` {#docs:stable:sql:expressions:subqueries::exists}
The `EXISTS` operator tests for the existence of any row inside the subquery. It returns `true` when the subquery returns one or more records, and `false` otherwise. The `EXISTS` operator is generally the most useful as a *correlated* subquery to express semijoin operations. However, it can be used as an uncorrelated subquery as well.
For example, we can use it to figure out if there are any grades present for a given course:
```sql
SELECT EXISTS (FROM grades WHERE course = 'Math') AS math_grades_present;
```
| math_grades_present |
|--------------------:|
| true |
```sql
SELECT EXISTS (FROM grades WHERE course = 'History') AS history_grades_present;
```
| history_grades_present |
|-----------------------:|
| false |
> The subqueries in the examples above make use of the fact that you can omit the `SELECT *` in DuckDB thanks to the [`FROM`-first syntax](#docs:stable:sql:query_syntax:from). The `SELECT` clause is required in subqueries by other SQL systems but cannot fulfil any purpose in `EXISTS` and `NOT EXISTS` subqueries.
##### `NOT EXISTS` {#docs:stable:sql:expressions:subqueries::not-exists}
The `NOT EXISTS` operator tests for the absence of any row inside the subquery. It returns `true` when the subquery returns an empty result, and `false` otherwise. The `NOT EXISTS` operator is generally the most useful as a *correlated* subquery to express antijoin operations. For example, to find Person nodes without an interest:
```sql
CREATE TABLE Person (id BIGINT, name VARCHAR);
CREATE TABLE interest (PersonId BIGINT, topic VARCHAR);
INSERT INTO Person VALUES (1, 'Jane'), (2, 'Joe');
INSERT INTO interest VALUES (2, 'Music');
SELECT *
FROM Person
WHERE NOT EXISTS (FROM interest WHERE interest.PersonId = Person.id);
```
| id | name |
|---:|------|
| 1 | Jane |
> DuckDB automatically detects when a `NOT EXISTS` query expresses an antijoin operation. There is no need to manually rewrite such queries to use `LEFT OUTER JOIN ... WHERE ... IS NULL`.
#### `IN` Operator {#docs:stable:sql:expressions:subqueries::in-operator}
The `IN` operator checks containment of the left expression inside the result defined by the subquery or the set of expressions on the right hand side (RHS). The `IN` operator returns true if the expression is present in the RHS, false if the expression is not in the RHS and the RHS has no `NULL` values, or `NULL` if the expression is not in the RHS and the RHS has `NULL` values.
We can use the `IN` operator in a similar manner as we used the `EXISTS` operator:
```sql
SELECT 'Math' IN (SELECT course FROM grades) AS math_grades_present;
```
| math_grades_present |
|--------------------:|
| true |
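To illustrate the `NULL` behavior described above with a small, self-contained sketch:
```sql
SELECT 'History' IN (SELECT course FROM grades);     -- false: no match and no NULLs on the RHS
SELECT 'History' IN (SELECT unnest(['Math', NULL])); -- NULL: no match, but the RHS contains a NULL
```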
#### Correlated Subqueries {#docs:stable:sql:expressions:subqueries::correlated-subqueries}
All the subqueries presented here so far have been **uncorrelated** subqueries, where the subqueries themselves are entirely self-contained and can be run without the parent query. There exists a second type of subqueries called **correlated** subqueries. For correlated subqueries, the subquery uses values from the parent subquery.
Conceptually, the subqueries are run once for every single row in the parent query. Perhaps a simple way of envisioning this is that the correlated subquery is a **function** that is applied to every row in the source dataset.
For example, suppose that we want to find the minimum grade for every course. We could do that as follows:
```sql
SELECT *
FROM grades grades_parent
WHERE grade =
(SELECT min(grade)
FROM grades
WHERE grades.course = grades_parent.course);
```
| grade | course |
|------:|--------|
| 7 | Math |
| 8 | CS |
The subquery uses a column from the parent query (`grades_parent.course`). Conceptually, we can see the subquery as a function where the correlated column is a parameter to that function:
```sql
SELECT min(grade)
FROM grades
WHERE course = ?;
```
Now when we execute this function for each of the rows, we can see that for `Math` this will return `7`, and for `CS` it will return `8`. We then compare it against the grade for that actual row. As a result, the row `(Math, 9)` will be filtered out, as `9 <> 7`.
#### Returning Each Row of the Subquery as a Struct {#docs:stable:sql:expressions:subqueries::returning-each-row-of-the-subquery-as-a-struct}
Using the name of a subquery in the `SELECT` clause (without referring to a specific column) turns each row of the subquery into a struct whose fields correspond to the columns of the subquery. For example:
```sql
SELECT t
FROM (SELECT unnest(generate_series(41, 43)) AS x, 'hello' AS y) t;
```
| t |
|-----------------------|
| {'x': 41, 'y': hello} |
| {'x': 42, 'y': hello} |
| {'x': 43, 'y': hello} |
### TRY expression {#docs:stable:sql:expressions:try}
The `TRY` expression ensures that errors caused by the input rows in the child (scalar) expression result in `NULL` for those rows, instead of causing the query to throw an error.
> The `TRY` expression was inspired by the [`TRY_CAST` expression](#docs:stable:sql:expressions:cast::try_cast).
#### Examples {#docs:stable:sql:expressions:try::examples}
The following calls return errors when invoked without the `TRY` expression.
When they are wrapped in a `TRY` expression, they return `NULL`:
##### Casting {#docs:stable:sql:expressions:try::casting}
###### Without `TRY` {#docs:stable:sql:expressions:try::without-try}
```sql
SELECT 'abc'::INTEGER;
```
```console
Conversion Error:
Could not convert string 'abc' to INT32
```
###### With `TRY` {#docs:stable:sql:expressions:try::with-try}
```sql
SELECT TRY('abc'::INTEGER);
```
```text
NULL
```
##### Logarithm on Zero {#docs:stable:sql:expressions:try::logarithm-on-zero}
###### Without `TRY` {#docs:stable:sql:expressions:try::without-try}
```sql
SELECT ln(0);
```
```console
Out of Range Error:
cannot take logarithm of zero
```
###### With `TRY` {#docs:stable:sql:expressions:try::with-try}
```sql
SELECT TRY(ln(0));
```
```text
NULL
```
##### Casting Multiple Rows {#docs:stable:sql:expressions:try::casting-multiple-rows}
###### Without `TRY` {#docs:stable:sql:expressions:try::without-try}
```sql
WITH cte AS (FROM (VALUES ('123'), ('test'), ('235')) t(a))
SELECT a::INTEGER AS x FROM cte;
```
```console
Conversion Error:
Could not convert string 'test' to INT32
```
###### With `TRY` {#docs:stable:sql:expressions:try::with-try}
```sql
WITH cte AS (FROM (VALUES ('123'), ('test'), ('235')) t(a))
SELECT TRY(a::INTEGER) AS x FROM cte;
```
| x |
|-----:|
| 123 |
| NULL |
| 235 |
#### Limitations {#docs:stable:sql:expressions:try::limitations}
`TRY` cannot be used in combination with a volatile function or with a [scalar subquery](#docs:stable:sql:expressions:subqueries::scalar-subquery).
For example:
```sql
SELECT TRY(random());
```
```console
Binder Error:
TRY can not be used in combination with a volatile function
```
## Functions {#sql:functions}
### Functions {#docs:stable:sql:functions:overview}
#### Function Syntax {#docs:stable:sql:functions:overview::function-syntax}
#### Function Chaining via the Dot Operator {#docs:stable:sql:functions:overview::function-chaining-via-the-dot-operator}
DuckDB supports the dot syntax for function chaining. This allows the function call `fn(arg1, arg2, arg3, ...)` to be rewritten as `arg1.fn(arg2, arg3, ...)`. For example, take the following use of the [`replace` function](#docs:stable:sql:functions:text::replacestring-source-target):
```sql
SELECT replace(goose_name, 'goose', 'duck') AS duck_name
FROM unnest(['African goose', 'Faroese goose', 'Hungarian goose', 'Pomeranian goose']) breed(goose_name);
```
This can be rewritten as follows:
```sql
SELECT goose_name.replace('goose', 'duck') AS duck_name
FROM unnest(['African goose', 'Faroese goose', 'Hungarian goose', 'Pomeranian goose']) breed(goose_name);
```
##### Using with Literals and Arrays {#docs:stable:sql:functions:overview::using-with-literals-and-arrays}
To apply function chaining to literals or to the result of an array access operation, you must surround the argument with parentheses, e.g.:
```sql
SELECT ('hello world').replace(' ', '_');
```
```sql
SELECT (2).sqrt();
```
```sql
SELECT (m[1]).map_entries()
FROM (VALUES ([MAP {'hello': 42}, MAP {'world': 42}])) t(m);
```
In the absence of these parentheses, DuckDB will return a `Parser Error` for the function call:
```console
Parser Error:
syntax error at or near "("
```
##### Limitations {#docs:stable:sql:functions:overview::limitations}
Function chaining via the dot operator is limited to *scalar* functions and is not supported for *table* functions.
For example, the following call returns a `Parser Error`:
```sql
SELECT * FROM ('my_file.parquet').read_parquet(); -- does not work
```
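Instead, the table function can be called directly:
```sql
SELECT * FROM read_parquet('my_file.parquet');
```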
#### Query Functions {#docs:stable:sql:functions:overview::query-functions}
The `duckdb_functions()` table function shows the list of functions currently built into the system.
```sql
SELECT DISTINCT ON(function_name)
function_name,
function_type,
return_type,
parameters,
parameter_types,
description
FROM duckdb_functions()
WHERE function_type = 'scalar'
AND function_name LIKE 'b%'
ORDER BY function_name;
```
| function_name | function_type | return_type | parameters | parameter_types | description |
|---------------|---------------|-------------|------------------------|----------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| bar | scalar | VARCHAR | [x, min, max, width] | [DOUBLE, DOUBLE, DOUBLE, DOUBLE] | Draws a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80 |
| base64 | scalar | VARCHAR | [blob] | [BLOB] | Convert a blob to a base64 encoded string |
| bin | scalar | VARCHAR | [value] | [VARCHAR] | Converts the value to binary representation |
| bit_count | scalar | TINYINT | [x] | [TINYINT] | Returns the number of bits that are set |
| bit_length | scalar | BIGINT | [col0] | [VARCHAR] | NULL |
| bit_position | scalar | INTEGER | [substring, bitstring] | [BIT, BIT] | Returns first starting index of the specified substring within bits, or zero if it is not present. The first (leftmost) bit is indexed 1 |
| bitstring | scalar | BIT | [bitstring, length] | [VARCHAR, INTEGER] | Pads the bitstring until the specified length |
> Currently, the description and parameter names of functions are not available in the `duckdb_functions()` function.
### Aggregate Functions {#docs:stable:sql:functions:aggregates}
#### Examples {#docs:stable:sql:functions:aggregates::examples}
Produce a single row containing the sum of the `amount` column:
```sql
SELECT sum(amount)
FROM sales;
```
Produce one row per unique region, containing the sum of `amount` for each group:
```sql
SELECT region, sum(amount)
FROM sales
GROUP BY region;
```
Return only the regions that have a sum of `amount` higher than 100:
```sql
SELECT region
FROM sales
GROUP BY region
HAVING sum(amount) > 100;
```
Return the number of unique values in the `region` column:
```sql
SELECT count(DISTINCT region)
FROM sales;
```
Return two values: the total sum of `amount` and the sum of `amount` excluding rows where the region is `north`, using the [`FILTER` clause](#docs:stable:sql:query_syntax:filter):
```sql
SELECT sum(amount), sum(amount) FILTER (region != 'north')
FROM sales;
```
Return a list of all regions in order of the `amount` column:
```sql
SELECT list(region ORDER BY amount DESC)
FROM sales;
```
Return the amount of the first sale using the `first()` aggregate function:
```sql
SELECT first(amount ORDER BY date ASC)
FROM sales;
```
#### Syntax {#docs:stable:sql:functions:aggregates::syntax}
Aggregates are functions that *combine* multiple rows into a single value. Aggregates are different from scalar functions and window functions because they change the cardinality of the result. As such, aggregates can only be used in the `SELECT` and `HAVING` clauses of a SQL query.
##### `DISTINCT` Clause in Aggregate Functions {#docs:stable:sql:functions:aggregates::distinct-clause-in-aggregate-functions}
When the `DISTINCT` clause is provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination with the `count` aggregate to get the number of distinct elements; but it can be used together with any aggregate function in the system.
There are some aggregates that are insensitive to duplicate values (e.g., `min` and `max`) and for them this clause is parsed and ignored.
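For illustration, a minimal self-contained sketch:
```sql
SELECT sum(x) AS total, sum(DISTINCT x) AS distinct_total
FROM (VALUES (1), (1), (2)) t(x);
-- total = 4, distinct_total = 3
```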
##### `ORDER BY` Clause in Aggregate Functions {#docs:stable:sql:functions:aggregates::order-by-clause-in-aggregate-functions}
An `ORDER BY` clause can be provided after the last argument of the function call. Note the lack of the comma separator before the clause.
```sql
SELECT ⟨aggregate_function⟩(⟨arg⟩, ⟨sep⟩ ORDER BY ⟨ordering_criteria⟩);
```
This clause ensures that the values being aggregated are sorted before applying the function.
Most aggregate functions are order-insensitive, and for them this clause is parsed and discarded.
However, there are some order-sensitive aggregates that can have non-deterministic results without ordering, e.g., `first`, `last`, `list` and `string_agg` / `group_concat` / `listagg`.
These can be made deterministic by ordering the arguments.
For example:
```sql
CREATE TABLE tbl AS
SELECT s FROM range(1, 4) r(s);
SELECT string_agg(s, ', ' ORDER BY s DESC) AS countdown
FROM tbl;
```
| countdown |
|-----------|
| 3, 2, 1 |
##### Handling `NULL` Values {#docs:stable:sql:functions:aggregates::handling-null-values}
All general aggregate functions ignore `NULL`s, except for [`list`](#::listarg) ([`array_agg`](#::listarg)), [`first`](#::firstarg) ([`arbitrary`](#::firstarg)) and [`last`](#::lastarg).
To exclude `NULL`s from `list`, you can use a [`FILTER` clause](#docs:stable:sql:query_syntax:filter).
To ignore `NULL`s from `first`, you can use the [`any_value` aggregate](#::any_valuearg).
All general aggregate functions except [`count`](#::countarg) return `NULL` on empty groups.
In particular, [`list`](#::listarg) does *not* return an empty list, [`sum`](#::sumarg) does *not* return zero, and [`string_agg`](#::string_aggarg-sep) does *not* return an empty string in this case.
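A small sketch illustrating these rules (assuming the default `NULLS LAST` ordering):
```sql
-- NULLs are ignored by count and sum but kept by list:
SELECT count(x) AS cnt, sum(x) AS total, list(x ORDER BY x) AS vals
FROM (VALUES (1), (NULL)) t(x);
-- cnt = 1, total = 1, vals = [1, NULL]
-- On an empty group, only count returns a non-NULL result:
SELECT count(x) AS cnt, sum(x) AS total, list(x) AS vals
FROM (VALUES (1)) t(x)
WHERE false;
-- cnt = 0, total = NULL, vals = NULL
```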
#### General Aggregate Functions {#docs:stable:sql:functions:aggregates::general-aggregate-functions}
The table below shows the available general aggregate functions.
| Function | Description |
|:--|:--------|
| [`any_value(arg)`](#::any_valuearg) | Returns the first non-null value from `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_max(arg, val)`](#::arg_maxarg-val) | Finds the row with the maximum `val` and calculates the `arg` expression at that row. Rows where the value of the `arg` or `val` expression is `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_max(arg, val, n)`](#::arg_maxarg-val-n) | The generalized case of [`arg_max`](#::arg_maxarg-val) for `n` values: returns a `LIST` containing the `arg` expressions for the top `n` rows ordered by `val` descending. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_max_null(arg, val)`](#::arg_max_nullarg-val) | Finds the row with the maximum `val` and calculates the `arg` expression at that row. Rows where the `val` expression evaluates to `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_min(arg, val)`](#::arg_minarg-val) | Finds the row with the minimum `val` and calculates the `arg` expression at that row. Rows where the value of the `arg` or `val` expression is `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_min(arg, val, n)`](#::arg_minarg-val-n) | Returns a `LIST` containing the `arg` expressions for the "bottom" `n` rows ordered by `val` ascending. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`arg_min_null(arg, val)`](#::arg_min_nullarg-val) | Finds the row with the minimum `val` and calculates the `arg` expression at that row. Rows where the `val` expression evaluates to `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`avg(arg)`](#::avgarg) | Calculates the average of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`bit_and(arg)`](#::bit_andarg) | Returns the bitwise AND of all bits in a given expression. |
| [`bit_or(arg)`](#::bit_orarg) | Returns the bitwise OR of all bits in a given expression. |
| [`bit_xor(arg)`](#::bit_xorarg) | Returns the bitwise XOR of all bits in a given expression. |
| [`bitstring_agg(arg)`](#::bitstring_aggarg) | Returns a bitstring whose length corresponds to the range of the non-null (integer) values, with bits set at the location of each (distinct) value. |
| [`bool_and(arg)`](#::bool_andarg) | Returns `true` if every input value is `true`, otherwise `false`. |
| [`bool_or(arg)`](#::bool_orarg) | Returns `true` if any input value is `true`, otherwise `false`. |
| [`count()`](#::count) | Returns the number of rows. |
| [`count(arg)`](#::countarg) | Returns the number of rows where `arg` is not `NULL`. |
| [`countif(arg)`](#::countifarg) | Returns the number of rows where `arg` is `true`. |
| [`favg(arg)`](#::favgarg) | Calculates the average using a more accurate floating point summation (Kahan Sum). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`first(arg)`](#::firstarg) | Returns the first value (null or non-null) from `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`fsum(arg)`](#::fsumarg) | Calculates the sum using a more accurate floating point summation (Kahan Sum). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`geometric_mean(arg)`](#::geometric_meanarg) | Calculates the geometric mean of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`histogram(arg)`](#::histogramarg) | Returns a `MAP` of key-value pairs representing buckets and counts. |
| [`histogram(arg, boundaries)`](#::histogramarg-boundaries) | Returns a `MAP` of key-value pairs representing the provided upper `boundaries` and counts of elements in the corresponding bins (left-open and right-closed partitions) of the datatype. A boundary at the largest value of the datatype is automatically added when elements larger than all provided `boundaries` appear, see [`is_histogram_other_bin`](#docs:stable:sql:functions:utility::is_histogram_other_binarg). Boundaries may be provided, e.g., via [`equi_width_bins`](#docs:stable:sql:functions:utility::equi_width_binsminmaxbincountnice). |
| [`histogram_exact(arg, elements)`](#::histogram_exactarg-elements) | Returns a `MAP` of key-value pairs representing the requested elements and their counts. A catch-all element specific to the data-type is automatically added to count other elements when they appear, see [`is_histogram_other_bin`](#docs:stable:sql:functions:utility::is_histogram_other_binarg). |
| [`histogram_values(source, boundaries)`](#::histogram_valuessource-col_name-technique-bin_count) | Returns the upper boundaries of the bins and their counts. |
| [`last(arg)`](#::lastarg) | Returns the last value of a column. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`list(arg)`](#::listarg) | Returns a `LIST` containing all the values of a column. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`max(arg)`](#::maxarg) | Returns the maximum value present in `arg`. This function is [unaffected by distinctness](#::distinct-clause-in-aggregate-functions). |
| [`max(arg, n)`](#::maxarg-n) | Returns a `LIST` containing the `arg` values for the "top" `n` rows ordered by `arg` descending. |
| [`min(arg)`](#::minarg) | Returns the minimum value present in `arg`. This function is [unaffected by distinctness](#::distinct-clause-in-aggregate-functions). |
| [`min(arg, n)`](#::minarg-n) | Returns a `LIST` containing the `arg` values for the "bottom" `n` rows ordered by `arg` ascending. |
| [`product(arg)`](#::productarg) | Calculates the product of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`string_agg(arg)`](#::string_aggarg-sep) | Concatenates the column string values with a comma separator (`,`). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`string_agg(arg, sep)`](#::string_aggarg-sep) | Concatenates the column string values with a separator. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`sum(arg)`](#::sumarg) | Calculates the sum of all non-null values in `arg` / counts `true` values when `arg` is boolean. The floating-point versions of this function are [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`weighted_avg(arg, weight)`](#::weighted_avgarg-weight) | Calculates the weighted average of all non-null values in `arg`, where each value is scaled by its corresponding `weight`. If `weight` is `NULL`, the corresponding `arg` value will be skipped. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
###### `any_value(arg)` {#docs:stable:sql:functions:aggregates::any_valuearg}
| | |
|:--|:--------|
| **Description** |Returns the first non-`NULL` value from `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `any_value(A)` |
###### `arg_max(arg, val)` {#docs:stable:sql:functions:aggregates::arg_maxarg-val}
| | |
|:--|:--------|
| **Description** |Finds the row with the maximum `val` and calculates the `arg` expression at that row. Rows where the value of the `arg` or `val` expression is `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_max(A, B)` |
| **Alias(es)** | `argmax(arg, val)`, `max_by(arg, val)` |
###### `arg_max(arg, val, n)` {#docs:stable:sql:functions:aggregates::arg_maxarg-val-n}
| | |
|:--|:--------|
| **Description** |The generalized case of [`arg_max`](#::arg_maxarg-val) for `n` values: returns a `LIST` containing the `arg` expressions for the top `n` rows ordered by `val` descending. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_max(A, B, 2)` |
| **Alias(es)** | `argmax(arg, val, n)`, `max_by(arg, val, n)` |
###### `arg_max_null(arg, val)` {#docs:stable:sql:functions:aggregates::arg_max_nullarg-val}
| | |
|:--|:--------|
| **Description** |Finds the row with the maximum `val` and calculates the `arg` expression at that row. Rows where the `val` expression evaluates to `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_max_null(A, B)` |
###### `arg_min(arg, val)` {#docs:stable:sql:functions:aggregates::arg_minarg-val}
| | |
|:--|:--------|
| **Description** |Finds the row with the minimum `val` and calculates the `arg` expression at that row. Rows where the value of the `arg` or `val` expression is `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_min(A, B)` |
| **Alias(es)** | `argmin(arg, val)`, `min_by(arg, val)` |
###### `arg_min(arg, val, n)` {#docs:stable:sql:functions:aggregates::arg_minarg-val-n}
| | |
|:--|:--------|
| **Description** |The generalized case of [`arg_min`](#::arg_minarg-val) for `n` values: returns a `LIST` containing the `arg` expressions for the top `n` rows ordered by `val` descending. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_min(A, B, 2)` |
| **Alias(es)** | `argmin(arg, val, n)`, `min_by(arg, val, n)` |
###### `arg_min_null(arg, val)` {#docs:stable:sql:functions:aggregates::arg_min_nullarg-val}
| | |
|:--|:--------|
| **Description** |Finds the row with the minimum `val` and calculates the `arg` expression at that row. Rows where the `val` expression evaluates to `NULL` are ignored. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `arg_min_null(A, B)` |
###### `avg(arg)` {#docs:stable:sql:functions:aggregates::avgarg}
| | |
|:--|:--------|
| **Description** |Calculates the average of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `avg(A)` |
| **Alias(es)** | `mean` |
###### `bit_and(arg)` {#docs:stable:sql:functions:aggregates::bit_andarg}
| | |
|:--|:--------|
| **Description** |Returns the bitwise `AND` of all bits in a given expression. |
| **Example** | `bit_and(A)` |
###### `bit_or(arg)` {#docs:stable:sql:functions:aggregates::bit_orarg}
| | |
|:--|:--------|
| **Description** |Returns the bitwise `OR` of all bits in a given expression. |
| **Example** | `bit_or(A)` |
###### `bit_xor(arg)` {#docs:stable:sql:functions:aggregates::bit_xorarg}
| | |
|:--|:--------|
| **Description** |Returns the bitwise `XOR` of all bits in a given expression. |
| **Example** | `bit_xor(A)` |
###### `bitstring_agg(arg)` {#docs:stable:sql:functions:aggregates::bitstring_aggarg}
| | |
|:--|:--------|
| **Description** |Returns a bitstring whose length corresponds to the range of the non-null (integer) values, with bits set at the location of each (distinct) value. |
| **Example** | `bitstring_agg(A)` |
###### `bool_and(arg)` {#docs:stable:sql:functions:aggregates::bool_andarg}
| | |
|:--|:--------|
| **Description** |Returns `true` if every input value is `true`, otherwise `false`. |
| **Example** | `bool_and(A)` |
###### `bool_or(arg)` {#docs:stable:sql:functions:aggregates::bool_orarg}
| | |
|:--|:--------|
| **Description** |Returns `true` if any input value is `true`, otherwise `false`. |
| **Example** | `bool_or(A)` |
###### `count()` {#docs:stable:sql:functions:aggregates::count}
| | |
|:--|:--------|
| **Description** |Returns the number of rows. |
| **Example** | `count()` |
| **Alias(es)** | `count(*)` |
###### `count(arg)` {#docs:stable:sql:functions:aggregates::countarg}
| | |
|:--|:--------|
| **Description** |Returns the number of rows where `arg` is not `NULL`. |
| **Example** | `count(A)` |
###### `countif(arg)` {#docs:stable:sql:functions:aggregates::countifarg}
| | |
|:--|:--------|
| **Description** |Returns the number of rows where `arg` is `true`. |
| **Example** | `countif(A)` |
###### `favg(arg)` {#docs:stable:sql:functions:aggregates::favgarg}
| | |
|:--|:--------|
| **Description** |Calculates the average using a more accurate floating point summation (Kahan Sum). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `favg(A)` |
###### `first(arg)` {#docs:stable:sql:functions:aggregates::firstarg}
| | |
|:--|:--------|
| **Description** |Returns the first value (null or non-null) from `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `first(A)` |
| **Alias(es)** | `arbitrary(A)` |
###### `fsum(arg)` {#docs:stable:sql:functions:aggregates::fsumarg}
| | |
|:--|:--------|
| **Description** |Calculates the sum using a more accurate floating point summation (Kahan Sum). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `fsum(A)` |
| **Alias(es)** | `sumkahan`, `kahan_sum` |
###### `geometric_mean(arg)` {#docs:stable:sql:functions:aggregates::geometric_meanarg}
| | |
|:--|:--------|
| **Description** |Calculates the geometric mean of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `geometric_mean(A)` |
| **Alias(es)** | `geomean(A)` |
###### `histogram(arg)` {#docs:stable:sql:functions:aggregates::histogramarg}
| | |
|:--|:--------|
| **Description** |Returns a `MAP` of key-value pairs representing buckets and counts. |
| **Example** | `histogram(A)` |
###### `histogram(arg, boundaries)` {#docs:stable:sql:functions:aggregates::histogramarg-boundaries}
| | |
|:--|:--------|
| **Description** |Returns a `MAP` of key-value pairs representing the provided upper `boundaries` and counts of elements in the corresponding bins (left-open and right-closed partitions) of the datatype. A boundary at the largest value of the datatype is automatically added when elements larger than all provided `boundaries` appear, see [`is_histogram_other_bin`](#docs:stable:sql:functions:utility::is_histogram_other_binarg). Boundaries may be provided, e.g., via [`equi_width_bins`](#docs:stable:sql:functions:utility::equi_width_binsminmaxbincountnice). |
| **Example** | `histogram(A, [0, 1, 10])` |
###### `histogram_exact(arg, elements)` {#docs:stable:sql:functions:aggregates::histogram_exactarg-elements}
| | |
|:--|:--------|
| **Description** |Returns a `MAP` of key-value pairs representing the requested elements and their counts. A catch-all element specific to the data-type is automatically added to count other elements when they appear, see [`is_histogram_other_bin`](#docs:stable:sql:functions:utility::is_histogram_other_binarg). |
| **Example** | `histogram_exact(A, [0, 1, 10])` |
###### `histogram_values(source, col_name, technique, bin_count)` {#docs:stable:sql:functions:aggregates::histogram_valuessource-col_name-technique-bin_count}
| | |
|:--|:--------|
| **Description** |Returns the upper boundaries of the bins and their counts. |
| **Example** | `histogram_values(integers, i, bin_count := 2)` |
###### `last(arg)` {#docs:stable:sql:functions:aggregates::lastarg}
| | |
|:--|:--------|
| **Description** |Returns the last value of a column. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `last(A)` |
###### `list(arg)` {#docs:stable:sql:functions:aggregates::listarg}
| | |
|:--|:--------|
| **Description** |Returns a `LIST` containing all the values of a column. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `list(A)` |
| **Alias(es)** | `array_agg` |
###### `max(arg)` {#docs:stable:sql:functions:aggregates::maxarg}
| | |
|:--|:--------|
| **Description** |Returns the maximum value present in `arg`. This function is [unaffected by distinctness](#::distinct-clause-in-aggregate-functions). |
| **Example** | `max(A)` |
###### `max(arg, n)` {#docs:stable:sql:functions:aggregates::maxarg-n}
| | |
|:--|:--------|
| **Description** | Returns a `LIST` containing the `arg` values for the "top" `n` rows ordered by `arg` descending. |
| **Example** | `max(A, 2)` |
###### `min(arg)` {#docs:stable:sql:functions:aggregates::minarg}
| | |
|:--|:--------|
| **Description** |Returns the minimum value present in `arg`. This function is [unaffected by distinctness](#::distinct-clause-in-aggregate-functions). |
| **Example** | `min(A)` |
###### `min(arg, n)` {#docs:stable:sql:functions:aggregates::minarg-n}
| | |
|:--|:--------|
| **Description** |Returns a `LIST` containing the `arg` values for the "bottom" `n` rows ordered by `arg` ascending. |
| **Example** | `min(A, 2)` |
###### `product(arg)` {#docs:stable:sql:functions:aggregates::productarg}
| | |
|:--|:--------|
| **Description** |Calculates the product of all non-null values in `arg`. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `product(A)` |
###### `string_agg(arg)` {#docs:stable:sql:functions:aggregates::string_aggarg}
| | |
|:--|:--------|
| **Description** |Concatenates the column string values with a comma separator (`,`). This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `string_agg(S)` |
| **Alias(es)** | `group_concat(arg)`, `listagg(arg)` |
###### `string_agg(arg, sep)` {#docs:stable:sql:functions:aggregates::string_aggarg-sep}
| | |
|:--|:--------|
| **Description** |Concatenates the column string values with a separator. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `string_agg(S, ',')` |
| **Alias(es)** | `group_concat(arg, sep)`, `listagg(arg, sep)` |
###### `sum(arg)` {#docs:stable:sql:functions:aggregates::sumarg}
| | |
|:--|:--------|
| **Description** |Calculates the sum of all non-null values in `arg` / counts `true` values when `arg` is boolean. The floating-point versions of this function are [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `sum(A)` |
###### `weighted_avg(arg, weight)` {#docs:stable:sql:functions:aggregates::weighted_avgarg-weight}
| | |
|:--|:--------|
| **Description** |Calculates the weighted average of all non-null values in `arg`, where each value is scaled by its corresponding `weight`. If `weight` is `NULL`, the value will be skipped. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Example** | `weighted_avg(A, W)` |
| **Alias(es)** | `wavg(arg, weight)` |
#### Approximate Aggregates {#docs:stable:sql:functions:aggregates::approximate-aggregates}
The table below shows the available approximate aggregate functions.
| Function | Description | Example |
|:---|:---|:---|
| `approx_count_distinct(x)` | Calculates the approximate count of distinct elements using HyperLogLog. | `approx_count_distinct(A)` |
| `approx_quantile(x, pos)` | Calculates the approximate quantile using T-Digest. | `approx_quantile(A, 0.5)` |
| `approx_top_k(arg, k)` | Calculates a `LIST` of the `k` approximately most frequent values of `arg` using Filtered Space-Saving. | |
| `reservoir_quantile(x, quantile, sample_size = 8192)` | Calculates the approximate quantile using reservoir sampling; the sample size is optional and defaults to 8192. | `reservoir_quantile(A, 0.5, 1024)` |
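For illustration, a sketch comparing an exact and an approximate distinct count:
```sql
SELECT
    count(DISTINCT i)        AS exact_distinct,
    approx_count_distinct(i) AS approx_distinct
FROM range(1000) t(i);
-- exact_distinct = 1000; approx_distinct is close to 1000 but may deviate slightly
```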
#### Statistical Aggregates {#docs:stable:sql:functions:aggregates::statistical-aggregates}
The table below shows the available statistical aggregate functions.
They all ignore `NULL` values (in the case of a single input column `x`), or pairs where either input is `NULL` (in the case of two input columns `y` and `x`).
| Function | Description |
|:--|:--------|
| [`corr(y, x)`](#::corry-x) | The correlation coefficient. |
| [`covar_pop(y, x)`](#::covar_popy-x) | The population covariance, which does not include bias correction. |
| [`covar_samp(y, x)`](#::covar_sampy-x) | The sample covariance, which includes Bessel's bias correction. |
| [`entropy(x)`](#::entropyx) | The log-2 entropy. |
| [`kurtosis_pop(x)`](#::kurtosis_popx) | The excess kurtosis (Fisher's definition) without bias correction. |
| [`kurtosis(x)`](#::kurtosisx) | The excess kurtosis (Fisher's definition) with bias correction according to the sample size. |
| [`mad(x)`](#::madx) | The median absolute deviation. Temporal types return a positive `INTERVAL`. |
| [`median(x)`](#::medianx) | The middle value of the set. For even value counts, quantitative values are averaged and ordinal values return the lower value. |
| [`mode(x)`](#::modex)| The most frequent value. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| [`quantile_cont(x, pos)`](#::quantile_contx-pos) | The interpolated `pos`-quantile of `x` for `0 <= pos <= 1`. Returns the `pos * (n_nonnull_values - 1)`th (zero-indexed, in the specified order) value of `x` or an interpolation between the adjacent values if the index is not an integer. Intuitively, arranges the values of `x` as equispaced *points* on a line, starting at 0 and ending at 1, and returns the (interpolated) value at `pos`. This is Type 7 in Hyndman & Fan (1996). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding interpolated quantiles. |
| [`quantile_disc(x, pos)`](#::quantile_discx-pos) | The discrete `pos`-quantile of `x` for `0 <= pos <= 1`. Returns the `greatest(ceil(pos * n_nonnull_values) - 1, 0)`th (zero-indexed, in the specified order) value of `x`. Intuitively, assigns to each value of `x` an equisized *sub-interval* (left-open and right-closed except for the initial interval) of the interval `[0, 1]`, and picks the value of the sub-interval that contains `pos`. This is Type 1 in Hyndman & Fan (1996). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding discrete quantiles. |
| [`regr_avgx(y, x)`](#::regr_avgxy-x) | The average of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| [`regr_avgy(y, x)`](#::regr_avgyy-x) | The average of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| [`regr_count(y, x)`](#::regr_county-x) | The number of non-`NULL` pairs. |
| [`regr_intercept(y, x)`](#::regr_intercepty-x) | The intercept of the univariate linear regression line, where x is the independent variable and y is the dependent variable. |
| [`regr_r2(y, x)`](#::regr_r2y-x) | The squared Pearson correlation coefficient between y and x. Also: The coefficient of determination in a linear regression, where x is the independent variable and y is the dependent variable. |
| [`regr_slope(y, x)`](#::regr_slopey-x) | The slope of the linear regression line, where x is the independent variable and y is the dependent variable. |
| [`regr_sxx(y, x)`](#::regr_sxxy-x) | The sample variance, which includes Bessel's bias correction, of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| [`regr_sxy(y, x)`](#::regr_sxyy-x) | The sample covariance, which includes Bessel's bias correction. |
| [`regr_syy(y, x)`](#::regr_syyy-x) | The sample variance, which includes Bessel's bias correction, of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| [`skewness(x)`](#::skewnessx) | The skewness. |
| [`sem(x)`](#::semx) | The standard error of the mean. |
| [`stddev_pop(x)`](#::stddev_popx) | The population standard deviation. |
| [`stddev_samp(x)`](#::stddev_sampx) | The sample standard deviation. |
| [`var_pop(x)`](#::var_popx) | The population variance, which does not include bias correction. |
| [`var_samp(x)`](#::var_sampx) | The sample variance, which includes Bessel's bias correction. |
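As a sketch of the difference between the interpolated and discrete quantiles:
```sql
SELECT
    quantile_cont(x, 0.5) AS interpolated_median,
    quantile_disc(x, 0.5) AS discrete_median
FROM (VALUES (1), (2), (3), (4)) t(x);
-- interpolated_median = 2.5, discrete_median = 2
```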
###### `corr(y, x)` {#docs:stable:sql:functions:aggregates::corry-x}
| | |
|:--|:--------|
| **Description** |The correlation coefficient. |
| **Formula** | `covar_pop(y, x) / (stddev_pop(x) * stddev_pop(y))` |
###### `covar_pop(y, x)` {#docs:stable:sql:functions:aggregates::covar_popy-x}
| | |
|:--|:--------|
| **Description** |The population covariance, which does not include bias correction. |
| **Formula** | `(sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / regr_count(y, x)`, `covar_samp(y, x) * (1 - 1 / regr_count(y, x))` |
###### `covar_samp(y, x)` {#docs:stable:sql:functions:aggregates::covar_sampy-x}
| | |
|:--|:--------|
| **Description** |The sample covariance, which includes Bessel's bias correction. |
| **Formula** | `(sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / (regr_count(y, x) - 1)`, `covar_pop(y, x) / (1 - 1 / regr_count(y, x))` |
| **Alias(es)** | `regr_sxy(y, x)` |
###### `entropy(x)` {#docs:stable:sql:functions:aggregates::entropyx}
| | |
|:--|:--------|
| **Description** |The log-2 entropy. |
| **Formula** | - |
###### `kurtosis_pop(x)` {#docs:stable:sql:functions:aggregates::kurtosis_popx}
| | |
|:--|:--------|
| **Description** |The excess kurtosis (Fisher's definition) without bias correction. |
| **Formula** | - |
###### `kurtosis(x)` {#docs:stable:sql:functions:aggregates::kurtosisx}
| | |
|:--|:--------|
| **Description** |The excess kurtosis (Fisher's definition) with bias correction according to the sample size. |
| **Formula** | - |
###### `mad(x)` {#docs:stable:sql:functions:aggregates::madx}
| | |
|:--|:--------|
| **Description** |The median absolute deviation. Temporal types return a positive `INTERVAL`. |
| **Formula** | `median(abs(x - median(x)))` |
###### `median(x)` {#docs:stable:sql:functions:aggregates::medianx}
| | |
|:--|:--------|
| **Description** |The middle value of the set. For even value counts, quantitative values are averaged and ordinal values return the lower value. |
| **Formula** | `quantile_cont(x, 0.5)` |
###### `mode(x)` {#docs:stable:sql:functions:aggregates::modex}
| | |
|:--|:--------|
| **Description** |The most frequent value. This function is [affected by ordering](#::order-by-clause-in-aggregate-functions). |
| **Formula** | - |
###### `quantile_cont(x, pos)` {#docs:stable:sql:functions:aggregates::quantile_contx-pos}
| | |
|:--|:--------|
| **Description** |The interpolated `pos`-quantile of `x` for `0 <= pos <= 1`. Returns the `pos * (n_nonnull_values - 1)`th (zero-indexed, in the specified order) value of `x` or an interpolation between the adjacent values if the index is not an integer. Intuitively, arranges the values of `x` as equispaced *points* on a line, starting at 0 and ending at 1, and returns the (interpolated) value at `pos`. This is Type 7 in Hyndman & Fan (1996). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding interpolated quantiles. |
| **Formula** | - |
###### `quantile_disc(x, pos)` {#docs:stable:sql:functions:aggregates::quantile_discx-pos}
| | |
|:--|:--------|
| **Description** |The discrete `pos`-quantile of `x` for `0 <= pos <= 1`. Returns the `greatest(ceil(pos * n_nonnull_values) - 1, 0)`th (zero-indexed, in the specified order) value of `x`. Intuitively, assigns to each value of `x` an equisized *sub-interval* (left-open and right-closed except for the initial interval) of the interval `[0, 1]`, and picks the value of the sub-interval that contains `pos`. This is Type 1 in Hyndman & Fan (1996). If `pos` is a `LIST` of `FLOAT`s, then the result is a `LIST` of the corresponding discrete quantiles. |
| **Formula** | - |
| **Alias(es)** | `quantile` |
###### `regr_avgx(y, x)` {#docs:stable:sql:functions:aggregates::regr_avgxy-x}
| | |
|:--|:--------|
| **Description** |The average of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| **Formula** | - |
###### `regr_avgy(y, x)` {#docs:stable:sql:functions:aggregates::regr_avgyy-x}
| | |
|:--|:--------|
| **Description** |The average of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| **Formula** | - |
###### `regr_count(y, x)` {#docs:stable:sql:functions:aggregates::regr_county-x}
| | |
|:--|:--------|
| **Description** |The number of non-`NULL` pairs. |
| **Formula** | - |
###### `regr_intercept(y, x)` {#docs:stable:sql:functions:aggregates::regr_intercepty-x}
| | |
|:--|:--------|
| **Description** |The intercept of the univariate linear regression line, where x is the independent variable and y is the dependent variable. |
| **Formula** | `regr_avgy(y, x) - regr_slope(y, x) * regr_avgx(y, x)` |
###### `regr_r2(y, x)` {#docs:stable:sql:functions:aggregates::regr_r2y-x}
| | |
|:--|:--------|
| **Description** |The squared Pearson correlation coefficient between y and x. Also: The coefficient of determination in a linear regression, where x is the independent variable and y is the dependent variable. |
| **Formula** | - |
###### `regr_slope(y, x)` {#docs:stable:sql:functions:aggregates::regr_slopey-x}
| | |
|:--|:--------|
| **Description** |The slope of the linear regression line, where x is the independent variable and y is the dependent variable. |
| **Formula** | `regr_sxy(y, x) / regr_sxx(y, x)` |
| **Alias(es)** | - |
###### `regr_sxx(y, x)` {#docs:stable:sql:functions:aggregates::regr_sxxy-x}
| | |
|:--|:--------|
| **Description** |The sample variance, which includes Bessel's bias correction, of the independent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| **Formula** | - |
###### `regr_sxy(y, x)` {#docs:stable:sql:functions:aggregates::regr_sxyy-x}
| | |
|:--|:--------|
| **Description** |The sample covariance, which includes Bessel's bias correction. |
| **Formula** | `(sum(x*y) - sum(x) * sum(y) / regr_count(y, x)) / (regr_count(y, x) - 1)`, `covar_pop(y, x) / (1 - 1 / regr_count(y, x))` |
| **Alias(es)** | `covar_samp(y, x)` |
###### `regr_syy(y, x)` {#docs:stable:sql:functions:aggregates::regr_syyy-x}
| | |
|:--|:--------|
| **Description** |The sample variance, which includes Bessel's bias correction, of the dependent variable for non-`NULL` pairs, where x is the independent variable and y is the dependent variable. |
| **Formula** | - |
###### `sem(x)` {#docs:stable:sql:functions:aggregates::semx}
| | |
|:--|:--------|
| **Description** |The standard error of the mean. |
| **Formula** | - |
###### `skewness(x)` {#docs:stable:sql:functions:aggregates::skewnessx}
| | |
|:--|:--------|
| **Description** |The skewness. |
| **Formula** | - |
###### `stddev_pop(x)` {#docs:stable:sql:functions:aggregates::stddev_popx}
| | |
|:--|:--------|
| **Description** |The population standard deviation. |
| **Formula** | `sqrt(var_pop(x))` |
###### `stddev_samp(x)` {#docs:stable:sql:functions:aggregates::stddev_sampx}
| | |
|:--|:--------|
| **Description** |The sample standard deviation. |
| **Formula** | `sqrt(var_samp(x))`|
| **Alias(es)** | `stddev(x)`|
###### `var_pop(x)` {#docs:stable:sql:functions:aggregates::var_popx}
| | |
|:--|:--------|
| **Description** |The population variance, which does not include bias correction. |
| **Formula** | `(sum(x^2) - sum(x)^2 / count(x)) / count(x)`, `var_samp(x) * (1 - 1 / count(x))` |
###### `var_samp(x)` {#docs:stable:sql:functions:aggregates::var_sampx}
| | |
|:--|:--------|
| **Description** |The sample variance, which includes Bessel's bias correction. |
| **Formula** | `(sum(x^2) - sum(x)^2 / count(x)) / (count(x) - 1)`, `var_pop(x) / (1 - 1 / count(x))` |
| **Alias(es)** | `variance(x)` |
#### Ordered Set Aggregate Functions {#docs:stable:sql:functions:aggregates::ordered-set-aggregate-functions}
The table below shows the available “ordered set” aggregate functions.
These functions are specified using the `WITHIN GROUP (ORDER BY sort_expression)` syntax,
and they are converted to an equivalent aggregate function that takes the ordering expression
as the first argument.
| Function | Equivalent |
|:---|:---|
| mode() WITHIN GROUP (ORDER BY column [(ASC\|DESC)]) | mode(column ORDER BY column [(ASC\|DESC)]) |
| percentile_cont(fraction) WITHIN GROUP (ORDER BY column [(ASC\|DESC)]) | quantile_cont(column, fraction ORDER BY column [(ASC\|DESC)]) |
| percentile_cont(fractions) WITHIN GROUP (ORDER BY column [(ASC\|DESC)]) | quantile_cont(column, fractions ORDER BY column [(ASC\|DESC)]) |
| percentile_disc(fraction) WITHIN GROUP (ORDER BY column [(ASC\|DESC)]) | quantile_disc(column, fraction ORDER BY column [(ASC\|DESC)]) |
| percentile_disc(fractions) WITHIN GROUP (ORDER BY column [(ASC\|DESC)]) | quantile_disc(column, fractions ORDER BY column [(ASC\|DESC)]) |
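For instance, the following two queries are equivalent (a minimal sketch, assuming a hypothetical `exam_scores(score)` table):
```sql
-- Ordered set syntax:
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY score) AS median_score
FROM exam_scores;
-- Equivalent native aggregate:
SELECT quantile_cont(score, 0.5) AS median_score
FROM exam_scores;
```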
#### Miscellaneous Aggregate Functions {#docs:stable:sql:functions:aggregates::miscellaneous-aggregate-functions}
| Function | Description | Alias |
|:--|:---|:--|
| `grouping()` | For queries with `GROUP BY` and either [`ROLLUP` or `GROUPING SETS`](#docs:stable:sql:query_syntax:grouping_sets::identifying-grouping-sets-with-grouping_id): Returns an integer identifying which of the argument expressions were used to group on to create the current super-aggregate row (see the example below). | `grouping_id()` |
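A minimal sketch, assuming a hypothetical `sales(region, product, amount)` table:
```sql
SELECT region, product, sum(amount) AS total, grouping(region, product) AS grp
FROM sales
GROUP BY ROLLUP (region, product);
-- grp is 0 for fully grouped rows, 1 where product was rolled up,
-- and 3 for the grand total row (both expressions rolled up).
```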
### Array Functions {#docs:stable:sql:functions:array}
All [`LIST` functions](#docs:stable:sql:functions:list) work with the [`ARRAY` data type](#docs:stable:sql:data_types:array). Additionally, several `ARRAY`-native functions are also supported.
#### Array-Native Functions {#docs:stable:sql:functions:array::array-native-functions}
| Function | Description |
|:--|:-------|
| [`array_cosine_distance(array1, array2)`](#::array_cosine_distancearray1-array2) | Computes the cosine distance between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| [`array_cosine_similarity(array1, array2)`](#::array_cosine_similarityarray1-array2) | Computes the cosine similarity between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| [`array_cross_product(array, array)`](#::array_cross_productarray-array) | Computes the cross product of two arrays of size 3. The array elements can not be `NULL`. |
| [`array_distance(array1, array2)`](#::array_distancearray1-array2) | Computes the distance between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| [`array_dot_product(array1, array2)`](#::array_inner_productarray1-array2) | Alias for `array_inner_product`. |
| [`array_inner_product(array1, array2)`](#::array_inner_productarray1-array2) | Computes the inner product between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| [`array_negative_dot_product(array1, array2)`](#::array_negative_inner_productarray1-array2) | Alias for `array_negative_inner_product`. |
| [`array_negative_inner_product(array1, array2)`](#::array_negative_inner_productarray1-array2) | Computes the negative inner product between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| [`array_value(arg, ...)`](#::array_valuearg-) | Creates an `ARRAY` containing the argument values. |
###### `array_cosine_distance(array1, array2)` {#docs:stable:sql:functions:array::array_cosine_distancearray1-array2}
| | |
|:--|:--------|
| **Description** |Computes the cosine distance between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| **Example** | `array_cosine_distance(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `0.007416606` |
###### `array_cosine_similarity(array1, array2)` {#docs:stable:sql:functions:array::array_cosine_similarityarray1-array2}
| | |
|:--|:--------|
| **Description** |Computes the cosine similarity between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| **Example** | `array_cosine_similarity(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `0.9925834` |
###### `array_cross_product(array, array)` {#docs:stable:sql:functions:array::array_cross_productarray-array}
| | |
|:--|:--------|
| **Description** |Computes the cross product of two arrays of size 3. The array elements can not be `NULL`. |
| **Example** | `array_cross_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `[-1.0, 2.0, -1.0]` |
###### `array_distance(array1, array2)` {#docs:stable:sql:functions:array::array_distancearray1-array2}
| | |
|:--|:--------|
| **Description** |Computes the distance between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| **Example** | `array_distance(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `1.7320508` |
###### `array_inner_product(array1, array2)` {#docs:stable:sql:functions:array::array_inner_productarray1-array2}
| | |
|:--|:--------|
| **Description** |Computes the inner product between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| **Example** | `array_inner_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `20.0` |
| **Alias** | `array_dot_product` |
###### `array_negative_inner_product(array1, array2)` {#docs:stable:sql:functions:array::array_negative_inner_productarray1-array2}
| | |
|:--|:--------|
| **Description** |Computes the negative inner product between two arrays of the same size. The array elements can not be `NULL`. The arrays can have any size as long as the size is the same for both arguments. |
| **Example** | `array_negative_inner_product(array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT), array_value(2.0::FLOAT, 3.0::FLOAT, 4.0::FLOAT))` |
| **Result** | `-20.0` |
| **Alias** | `array_negative_dot_product` |
###### `array_value(arg, ...)` {#docs:stable:sql:functions:array::array_valuearg-}
| | |
|:--|:--------|
| **Description** |Creates an `ARRAY` containing the argument values. |
| **Example** | `array_value(1.0::FLOAT, 2.0::FLOAT, 3.0::FLOAT)` |
| **Result** | `[1.0, 2.0, 3.0]` |
### Bitstring Functions {#docs:stable:sql:functions:bitstring}
This section describes functions and operators for examining and manipulating [`BITSTRING`](#docs:stable:sql:data_types:bitstring) values.
Bitstrings must be of equal length when performing the bitwise operations AND, OR, and XOR. When bit shifting, the original length of the string is preserved.
#### Bitstring Operators {#docs:stable:sql:functions:bitstring::bitstring-operators}
The table below shows the available mathematical operators for `BIT` type.
| Operator | Description | Example | Result |
|:---|:---|:---|---:|
| `&` | Bitwise AND | `'10101'::BITSTRING & '10001'::BITSTRING` | `10001` |
| `\|` | Bitwise OR | `'1011'::BITSTRING \| '0001'::BITSTRING` | `1011` |
| `xor` | Bitwise XOR | `xor('101'::BITSTRING, '001'::BITSTRING)` | `100` |
| `~` | Bitwise NOT | `~('101'::BITSTRING)` | `010` |
| `<<` | Bitwise shift left | `'1001011'::BITSTRING << 3` | `1011000` |
| `>>` | Bitwise shift right | `'1001011'::BITSTRING >> 3` | `0001001` |
#### Bitstring Functions {#docs:stable:sql:functions:bitstring::bitstring-functions}
The table below shows the available scalar functions for `BIT` type.
| Name | Description |
|:--|:-------|
| [`bit_count(bitstring)`](#::bit_countbitstring) | Returns the number of set bits in the bitstring. |
| [`bit_length(bitstring)`](#::bit_lengthbitstring) | Returns the number of bits in the bitstring. |
| [`bit_position(substring, bitstring)`](#::bit_positionsubstring-bitstring) | Returns first starting index of the specified substring within bits, or zero if it's not present. The first (leftmost) bit is indexed 1. |
| [`bitstring(bitstring, length)`](#::bitstringbitstring-length) | Returns a bitstring of determined length. |
| [`get_bit(bitstring, index)`](#::get_bitbitstring-index) | Extracts the nth bit from bitstring; the first (leftmost) bit is indexed 0. |
| [`length(bitstring)`](#::lengthbitstring) | Alias for `bit_length`. |
| [`octet_length(bitstring)`](#::octet_lengthbitstring) | Returns the number of bytes in the bitstring. |
| [`set_bit(bitstring, index, new_value)`](#::set_bitbitstring-index-new_value) | Sets the nth bit in bitstring to `new_value`; the first (leftmost) bit is indexed 0. Returns a new bitstring. |
###### `bit_count(bitstring)` {#docs:stable:sql:functions:bitstring::bit_countbitstring}
| | |
|:--|:--------|
| **Description** |Returns the number of set bits in the bitstring. |
| **Example** | `bit_count('1101011'::BITSTRING)` |
| **Result** | `5` |
###### `bit_length(bitstring)` {#docs:stable:sql:functions:bitstring::bit_lengthbitstring}
| | |
|:--|:--------|
| **Description** |Returns the number of bits in the bitstring. |
| **Example** | `bit_length('1101011'::BITSTRING)` |
| **Result** | `7` |
###### `bit_position(substring, bitstring)` {#docs:stable:sql:functions:bitstring::bit_positionsubstring-bitstring}
| | |
|:--|:--------|
| **Description** |Returns first starting index of the specified substring within bits, or zero if it's not present. The first (leftmost) bit is indexed 1. |
| **Example** | `bit_position('010'::BITSTRING, '1110101'::BITSTRING)` |
| **Result** | `4` |
###### `bitstring(bitstring, length)` {#docs:stable:sql:functions:bitstring::bitstringbitstring-length}
| | |
|:--|:--------|
| **Description** |Returns a bitstring of determined length. |
| **Example** | `bitstring('1010'::BITSTRING, 7)` |
| **Result** | `0001010` |
###### `get_bit(bitstring, index)` {#docs:stable:sql:functions:bitstring::get_bitbitstring-index}
| | |
|:--|:--------|
| **Description** |Extracts the nth bit from bitstring; the first (leftmost) bit is indexed 0. |
| **Example** | `get_bit('0110010'::BITSTRING, 2)` |
| **Result** | `1` |
###### `length(bitstring)` {#docs:stable:sql:functions:bitstring::lengthbitstring}
| | |
|:--|:--------|
| **Description** |Alias for `bit_length`. |
| **Example** | `length('1101011'::BITSTRING)` |
| **Result** | `7` |
###### `octet_length(bitstring)` {#docs:stable:sql:functions:bitstring::octet_lengthbitstring}
| | |
|:--|:--------|
| **Description** |Returns the number of bytes in the bitstring. |
| **Example** | `octet_length('1101011'::BITSTRING)` |
| **Result** | `1` |
###### `set_bit(bitstring, index, new_value)` {#docs:stable:sql:functions:bitstring::set_bitbitstring-index-new_value}
| | |
|:--|:--------|
| **Description** |Sets the nth bit in bitstring to `new_value`; the first (leftmost) bit is indexed 0. Returns a new bitstring. |
| **Example** | `set_bit('0110010'::BITSTRING, 2, 0)` |
| **Result** | `0100010` |
#### Bitstring Aggregate Functions {#docs:stable:sql:functions:bitstring::bitstring-aggregate-functions}
These aggregate functions are available for `BIT` type.
| Name | Description |
|:--|:-------|
| [`bit_and(arg)`](#::bit_andarg) | Returns the bitwise AND operation performed on all bitstrings in a given expression. |
| [`bit_or(arg)`](#::bit_orarg) | Returns the bitwise OR operation performed on all bitstrings in a given expression. |
| [`bit_xor(arg)`](#::bit_xorarg) | Returns the bitwise XOR operation performed on all bitstrings in a given expression. |
| [`bitstring_agg(arg)`](#::bitstring_aggarg) | Returns a bitstring with bits set for each distinct position defined in `arg`. |
| [`bitstring_agg(arg, min, max)`](#::bitstring_aggarg-min-max) | Returns a bitstring with bits set for each distinct position defined in `arg`. All positions must be within the range [`min`, `max`] or an `Out of Range Error` will be thrown. |
###### `bit_and(arg)` {#docs:stable:sql:functions:bitstring::bit_andarg}
| | |
|:--|:--------|
| **Description** |Returns the bitwise AND operation performed on all bitstrings in a given expression. |
| **Example** | `bit_and(A)` |
###### `bit_or(arg)` {#docs:stable:sql:functions:bitstring::bit_orarg}
| | |
|:--|:--------|
| **Description** |Returns the bitwise OR operation performed on all bitstrings in a given expression. |
| **Example** | `bit_or(A)` |
###### `bit_xor(arg)` {#docs:stable:sql:functions:bitstring::bit_xorarg}
| | |
|:--|:--------|
| **Description** |Returns the bitwise XOR operation performed on all bitstrings in a given expression. |
| **Example** | `bit_xor(A)` |
###### `bitstring_agg(arg)` {#docs:stable:sql:functions:bitstring::bitstring_aggarg}
| | |
|:--|:--------|
| **Description** |The `bitstring_agg` function takes any integer type as input and returns a bitstring with bits set for each distinct value. The left-most bit represents the smallest value in the column and the right-most bit the maximum value. If possible, the min and max are retrieved from the column statistics. Otherwise, it is also possible to provide the min and max values. |
| **Example** | `bitstring_agg(A)` |
> **Tip.** The combination of `bit_count` and `bitstring_agg` can be used as an alternative to `count(DISTINCT ...)`, with possible performance improvements in cases of low cardinality and dense values.
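For example (a minimal sketch, assuming a hypothetical `orders(customer_id)` table containing small integer IDs):
```sql
-- Sets one bit per distinct customer_id and counts the set bits.
SELECT bit_count(bitstring_agg(customer_id)) AS distinct_customers
FROM orders;
-- Returns the same count as:
SELECT count(DISTINCT customer_id) AS distinct_customers
FROM orders;
```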
###### `bitstring_agg(arg, min, max)` {#docs:stable:sql:functions:bitstring::bitstring_aggarg-min-max}
| | |
|:--|:--------|
| **Description** |Returns a bitstring with bits set for each distinct position defined in `arg`. All positions must be within the range [`min`, `max`] or an `Out of Range Error` will be thrown. |
| **Example** | `bitstring_agg(A, 1, 42)` |
### Blob Functions {#docs:stable:sql:functions:blob}
This section describes functions and operators for examining and manipulating [`BLOB` values](#docs:stable:sql:data_types:blob).
| Function | Description |
|:--|:-------|
| [`arg1 \|\| arg2`](#::arg1--arg2) | Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:stable:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:stable:sql:functions:list::list_concatlist_1--list_n). |
| [`base64(blob)`](#::to_base64blob) | Alias for `to_base64`. |
| [`decode(blob)`](#::decodeblob) | Converts `blob` to `VARCHAR`. Fails if `blob` is not valid UTF-8. |
| [`encode(string)`](#::encodestring) | Converts the `string` to `BLOB`. Converts UTF-8 characters into literal encoding. |
| [`from_base64(string)`](#::from_base64string) | Converts a base64 encoded `string` to a `BLOB`. |
| [`from_binary(value)`](#::unbinvalue) | Alias for `unbin`. |
| [`from_hex(value)`](#::unhexvalue) | Alias for `unhex`. |
| [`hex(blob)`](#::hexblob) | Converts `blob` to `VARCHAR` using hexadecimal encoding. |
| [`md5(blob)`](#::md5blob) | Returns the MD5 hash of the `blob` as a `VARCHAR`. |
| [`md5_number(blob)`](#::md5_numberblob) | Returns the MD5 hash of the `blob` as a `HUGEINT`. |
| [`octet_length(blob)`](#::octet_lengthblob) | Number of bytes in `blob`. |
| [`read_blob(source)`](#::read_blobsource) | Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `BLOB`. See the [`read_blob` guide](#docs:stable:guides:file_formats:read_file::read_blob) for more details. |
| [`repeat(blob, count)`](#::repeatblob-count) | Repeats the `blob` `count` number of times. |
| [`sha1(blob)`](#::sha1blob) | Returns a `VARCHAR` with the SHA-1 hash of the `blob`. |
| [`sha256(blob)`](#::sha256blob) | Returns a `VARCHAR` with the SHA-256 hash of the `blob`. |
| [`to_base64(blob)`](#::to_base64blob) | Converts a `blob` to a base64 encoded string. |
| [`to_hex(blob)`](#::hexblob) | Alias for `hex`. |
| [`unbin(value)`](#::unbinvalue) | Converts a `value` from binary representation to a blob. |
| [`unhex(value)`](#::unhexvalue) | Converts a `value` from hexadecimal representation to a blob. |
###### `arg1 || arg2` {#docs:stable:sql:functions:blob::arg1--arg2}
| | |
|:--|:--------|
| **Description** |Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:stable:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:stable:sql:functions:list::list_concatlist_1--list_n). |
| **Example 1** | `'Duck' \|\| 'DB'` |
| **Result** | `DuckDB` |
| **Example 2** | `[1, 2, 3] \|\| [4, 5, 6]` |
| **Result** | `[1, 2, 3, 4, 5, 6]` |
| **Example 3** | `'\xAA'::BLOB \|\| '\xBB'::BLOB` |
| **Result** | `\xAA\xBB` |
###### `decode(blob)` {#docs:stable:sql:functions:blob::decodeblob}
| | |
|:--|:--------|
| **Description** |Converts `blob` to `VARCHAR`. Fails if `blob` is not valid UTF-8. |
| **Example** | `decode('\xC3\xBC'::BLOB)` |
| **Result** | `ü` |
###### `encode(string)` {#docs:stable:sql:functions:blob::encodestring}
| | |
|:--|:--------|
| **Description** |Converts the `string` to `BLOB`. Converts UTF-8 characters into literal encoding. |
| **Example** | `encode('my_string_with_ü')` |
| **Result** | `my_string_with_\xC3\xBC` |
###### `from_base64(string)` {#docs:stable:sql:functions:blob::from_base64string}
| | |
|:--|:--------|
| **Description** |Converts a base64 encoded `string` to a `BLOB`. |
| **Example** | `from_base64('QQ==')` |
| **Result** | `A` |
###### `hex(blob)` {#docs:stable:sql:functions:blob::hexblob}
| | |
|:--|:--------|
| **Description** |Converts `blob` to `VARCHAR` using hexadecimal encoding. |
| **Example** | `hex('\xAA\xBB'::BLOB)` |
| **Result** | `AABB` |
| **Alias** | `to_hex` |
###### `md5(blob)` {#docs:stable:sql:functions:blob::md5blob}
| | |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `blob` as a `VARCHAR`. |
| **Example** | `md5('\xAA\xBB'::BLOB)` |
| **Result** | `58cea1f6b2b06520613e09af90dc1c47` |
###### `md5_number(blob)` {#docs:stable:sql:functions:blob::md5_numberblob}
| | |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `blob` as a `HUGEINT`. |
| **Example** | `md5_number('\xAA\xBB'::BLOB)` |
| **Result** | `94525045605907259200829535064523132504` |
###### `octet_length(blob)` {#docs:stable:sql:functions:blob::octet_lengthblob}
| | |
|:--|:--------|
| **Description** |Number of bytes in `blob`. |
| **Example** | `octet_length('\xAA\xBB'::BLOB)` |
| **Result** | `2` |
###### `read_blob(source)` {#docs:stable:sql:functions:blob::read_blobsource}
| | |
|:--|:--------|
| **Description** |Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `BLOB`. See the [`read_blob` guide](#docs:stable:guides:file_formats:read_file::read_blob) for more details. |
| **Example** | `read_blob('hello.bin')` |
| **Result** | `hello\x0A` |
###### `repeat(blob, count)` {#docs:stable:sql:functions:blob::repeatblob-count}
| | |
|:--|:--------|
| **Description** |Repeats the `blob` `count` number of times. |
| **Example** | `repeat('\xAA\xBB'::BLOB, 5)` |
| **Result** | `\xAA\xBB\xAA\xBB\xAA\xBB\xAA\xBB\xAA\xBB` |
###### `sha1(blob)` {#docs:stable:sql:functions:blob::sha1blob}
| | |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-1 hash of the `blob`. |
| **Example** | `sha1('\xAA\xBB'::BLOB)` |
| **Result** | `65b1e351a6cbfeb41c927222bc9ef53aad3396b0` |
###### `sha256(blob)` {#docs:stable:sql:functions:blob::sha256blob}
| | |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-256 hash of the `blob`. |
| **Example** | `sha256('\xAA\xBB'::BLOB)` |
| **Result** | `d798d1fac6bd4bb1c11f50312760351013379a0ab6f0a8c0af8a506b96b2525a` |
###### `to_base64(blob)` {#docs:stable:sql:functions:blob::to_base64blob}
| | |
|:--|:--------|
| **Description** |Converts a `blob` to a base64 encoded string. |
| **Example** | `to_base64('A'::BLOB)` |
| **Result** | `QQ==` |
| **Alias** | `base64` |
###### `unbin(value)` {#docs:stable:sql:functions:blob::unbinvalue}
| | |
|:--|:--------|
| **Description** |Converts a `value` from binary representation to a blob. |
| **Example** | `unbin('0110')` |
| **Result** | `\x06` |
| **Alias** | `from_binary` |
###### `unhex(value)` {#docs:stable:sql:functions:blob::unhexvalue}
| | |
|:--|:--------|
| **Description** |Converts a `value` from hexadecimal representation to a blob. |
| **Example** | `unhex('2A')` |
| **Result** | `*` |
| **Alias** | `from_hex` |
### Date Format Functions {#docs:stable:sql:functions:dateformat}
The `strftime` and `strptime` functions can be used to convert between [`DATE`](#docs:stable:sql:data_types:date) / [`TIMESTAMP`](#docs:stable:sql:data_types:timestamp) values and strings. This is often required when parsing CSV files, displaying output to the user or transferring information between programs. Because there are many possible date representations, these functions accept a [format string](#::format-specifiers) that describes how the date or timestamp should be structured.
#### `strftime` Examples {#docs:stable:sql:functions:dateformat::strftime-examples}
The [`strftime(timestamp, format)`](#docs:stable:sql:functions:timestamp::strftimetimestamp-format) converts timestamps or dates to strings according to the specified pattern.
```sql
SELECT strftime(DATE '1992-03-02', '%d/%m/%Y');
```
```text
02/03/1992
```
```sql
SELECT strftime(TIMESTAMP '1992-03-02 20:32:45', '%A, %-d %B %Y - %I:%M:%S %p');
```
```text
Monday, 2 March 1992 - 08:32:45 PM
```
#### `strptime` Examples {#docs:stable:sql:functions:dateformat::strptime-examples}
The [`strptime(text, format)` function](#docs:stable:sql:functions:timestamp::strptimetext-format) converts strings to timestamps according to the specified pattern.
```sql
SELECT strptime('02/03/1992', '%d/%m/%Y');
```
```text
1992-03-02 00:00:00
```
```sql
SELECT strptime('Monday, 2 March 1992 - 08:32:45 PM', '%A, %-d %B %Y - %I:%M:%S %p');
```
```text
1992-03-02 20:32:45
```
The `strptime` function throws an error on failure:
```sql
SELECT strptime('02/50/1992', '%d/%m/%Y') AS x;
```
```console
Invalid Input Error: Could not parse string "02/50/1992" according to format specifier "%d/%m/%Y"
02/50/1992
^
Error: Month out of range, expected a value between 1 and 12
```
To return `NULL` on failure, use the [`try_strptime` function](#docs:stable:sql:functions:timestamp::try_strptimetext-format):
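```sql
SELECT try_strptime('02/50/1992', '%d/%m/%Y') AS x;
```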
```text
NULL
```
#### CSV Parsing {#docs:stable:sql:functions:dateformat::csv-parsing}
The date formats can also be specified during CSV parsing, either in the [`COPY` statement](#docs:stable:sql:statements:copy) or in the `read_csv` function. This can be done by either specifying a `DATEFORMAT` or a `TIMESTAMPFORMAT` (or both). `DATEFORMAT` will be used for converting dates, and `TIMESTAMPFORMAT` will be used for converting timestamps. Below are some examples for how to use this.
In a `COPY` statement:
```sql
COPY dates FROM 'test.csv' (DATEFORMAT '%d/%m/%Y', TIMESTAMPFORMAT '%A, %-d %B %Y - %I:%M:%S %p');
```
In a `read_csv` function:
```sql
SELECT *
FROM read_csv('test.csv', dateformat = '%m/%d/%Y', timestampformat = '%A, %-d %B %Y - %I:%M:%S %p');
```
#### Format Specifiers {#docs:stable:sql:functions:dateformat::format-specifiers}
Below is a full list of all available format specifiers.
| Specifier | Description | Example |
|:-|:------|:---|
| `%a` | Abbreviated weekday name. | Sun, Mon, ... |
| `%A` | Full weekday name. | Sunday, Monday, ... |
| `%b` | Abbreviated month name. | Jan, Feb, ..., Dec |
| `%B` | Full month name. | January, February, ... |
| `%c` | ISO date and time representation | 1992-03-02 10:30:20 |
| `%d` | Day of the month as a zero-padded decimal. | 01, 02, ..., 31 |
| `%-d` | Day of the month as a decimal number. | 1, 2, ..., 31 |
| `%f` | Microsecond as a decimal number, zero-padded on the left. | 000000 - 999999 |
| `%g` | Millisecond as a decimal number, zero-padded on the left. | 000 - 999 |
| `%G` | ISO 8601 year with century representing the year that contains the greater part of the ISO week (see `%V`). | 0001, 0002, ..., 2013, 2014, ..., 9998, 9999 |
| `%H` | Hour (24-hour clock) as a zero-padded decimal number. | 00, 01, ..., 23 |
| `%-H` | Hour (24-hour clock) as a decimal number. | 0, 1, ..., 23 |
| `%I` | Hour (12-hour clock) as a zero-padded decimal number. | 01, 02, ..., 12 |
| `%-I` | Hour (12-hour clock) as a decimal number. | 1, 2, ... 12 |
| `%j` | Day of the year as a zero-padded decimal number. | 001, 002, ..., 366 |
| `%-j` | Day of the year as a decimal number. | 1, 2, ..., 366 |
| `%m` | Month as a zero-padded decimal number. | 01, 02, ..., 12 |
| `%-m` | Month as a decimal number. | 1, 2, ..., 12 |
| `%M` | Minute as a zero-padded decimal number. | 00, 01, ..., 59 |
| `%-M` | Minute as a decimal number. | 0, 1, ..., 59 |
| `%n` | Nanosecond as a decimal number, zero-padded on the left. | 000000000 - 999999999 |
| `%p` | Locale's AM or PM. | AM, PM |
| `%S` | Second as a zero-padded decimal number. | 00, 01, ..., 59 |
| `%-S` | Second as a decimal number. | 0, 1, ..., 59 |
| `%u` | ISO 8601 weekday as a decimal number where 1 is Monday. | 1, 2, ..., 7 |
| `%U` | Week number of the year. Week 01 starts on the first Sunday of the year, so there can be week 00. Note that this is not compliant with the week date standard in ISO-8601. | 00, 01, ..., 53 |
| `%V` | ISO 8601 week as a decimal number with Monday as the first day of the week. Week 01 is the week containing Jan 4. Note that `%V` is incompatible with year directive `%Y`. Use the ISO year `%G` instead. | 01, ..., 53 |
| `%w` | Weekday as a decimal number. | 0, 1, ..., 6 |
| `%W` | Week number of the year. Week 01 starts on the first Monday of the year, so there can be week 00. Note that this is not compliant with the week date standard in ISO-8601. | 00, 01, ..., 53 |
| `%x` | ISO date representation | 1992-03-02 |
| `%X` | ISO time representation | 10:30:20 |
| `%y` | Year without century as a zero-padded decimal number. | 00, 01, ..., 99 |
| `%-y` | Year without century as a decimal number. | 0, 1, ..., 99 |
| `%Y` | Year with century as a decimal number. | 2013, 2019 etc. |
| `%z` | [Time offset from UTC](https://en.wikipedia.org/wiki/ISO_8601#Time_offsets_from_UTC) in the form ±HH:MM, ±HHMM, or ±HH. | -0700 |
| `%Z` | Time zone name. | Europe/Amsterdam |
| `%%` | A literal `%` character. | % |
### Date Functions {#docs:stable:sql:functions:date}
This section describes functions and operators for examining and manipulating [`DATE`](#docs:stable:sql:data_types:date) values.
#### Date Operators {#docs:stable:sql:functions:date::date-operators}
The table below shows the available mathematical operators for `DATE` types.
| Operator | Description | Example | Result |
|:-|:--|:---|:--|
| `+` | addition of days (integers) | `DATE '1992-03-22' + 5` | `1992-03-27` |
| `+` | addition of an `INTERVAL` | `DATE '1992-03-22' + INTERVAL 5 DAY` | `1992-03-27 00:00:00` |
| `+` | addition of a variable `INTERVAL` | `SELECT DATE '1992-03-22' + INTERVAL (d.days) DAY FROM (VALUES (5), (11)) d(days)` | `1992-03-27 00:00:00` and `1992-04-02 00:00:00` |
| `-` | subtraction of `DATE`s | `DATE '1992-03-27' - DATE '1992-03-22'` | `5` |
| `-` | subtraction of an `INTERVAL` | `DATE '1992-03-27' - INTERVAL 5 DAY` | `1992-03-22 00:00:00` |
| `-` | subtraction of a variable `INTERVAL` | `SELECT DATE '1992-03-27' - INTERVAL (d.days) DAY FROM (VALUES (5), (11)) d(days)` | `1992-03-22 00:00:00` and `1992-03-16 00:00:00` |
Adding to or subtracting from [infinite values](#docs:stable:sql:data_types:date::special-values) produces the same infinite value.
#### Date Functions {#docs:stable:sql:functions:date::date-functions}
The table below shows the available functions for `DATE` types.
Dates can also be manipulated with the [timestamp functions](#docs:stable:sql:functions:timestamp) through type promotion.
| Name | Description |
|:--|:-------|
| [`current_date`](#::current_date) | Current date (at start of current transaction) in the local time zone. Note that parentheses should be omitted from the function call. |
| [`date_add(date, interval)`](#::date_adddate-interval) | Add the interval to the date and return a `DATETIME` value. |
| [`date_diff(part, startdate, enddate)`](#::date_diffpart-startdate-enddate) | The number of [`part`](#docs:stable:sql:functions:datepart) boundaries between `startdate` and `enddate`, inclusive of the larger date and exclusive of the smaller date. |
| [`date_part(part, date)`](#::date_partpart-date) | Get [subfield](#docs:stable:sql:functions:datepart) (equivalent to `extract`). |
| [`date_sub(part, startdate, enddate)`](#::date_subpart-startdate-enddate) | The signed length of the interval between `startdate` and `enddate`, truncated to whole multiples of [`part`](#docs:stable:sql:functions:datepart). |
| [`date_trunc(part, date)`](#::date_truncpart-date) | Truncate to specified [precision](#docs:stable:sql:functions:datepart). |
| [`dayname(date)`](#::daynamedate) | The (English) name of the weekday. |
| [`extract(part from date)`](#::extractpart-from-date) | Get [subfield](#docs:stable:sql:functions:datepart) from a date. |
| [`greatest(date, date)`](#::greatestdate-date) | The later of two dates. |
| [`isfinite(date)`](#::isfinitedate) | Returns true if the date is finite, false otherwise. |
| [`isinf(date)`](#::isinfdate) | Returns true if the date is infinite, false otherwise. |
| [`julian(date)`](#::juliandate) | Extract the Julian Day number from a date. |
| [`last_day(date)`](#::last_daydate) | The last day of the corresponding month in the date. |
| [`least(date, date)`](#::leastdate-date) | The earlier of two dates. |
| [`make_date(year, month, day)`](#::make_dateyear-month-day) | The date for the given parts. |
| [`monthname(date)`](#::monthnamedate) | The (English) name of the month. |
| [`strftime(date, format)`](#::strftimedate-format) | Converts a date to a string according to the [format string](#docs:stable:sql:functions:dateformat). |
| [`time_bucket(bucket_width, date[, offset])`](#::time_bucketbucket_width-date-offset) | Truncate `date` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03[ + offset]`. Note that `2000-01-03` is a Monday. |
| [`time_bucket(bucket_width, date[, origin])`](#::time_bucketbucket_width-date-origin) | Truncate `date` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01` when `bucket_width` is a number of months or coarser units, else `2000-01-03`. Note that `2000-01-03` is a Monday. |
| [`today()`](#::today) | Current date (start of current transaction) in UTC. |
###### `current_date` {#docs:stable:sql:functions:date::current_date}
| | |
|:--|:--------|
| **Description** |Current date (at start of current transaction) in the local time zone. Note that parentheses should be omitted from the function call. |
| **Example** | `current_date` |
| **Result** | `2022-10-08` |
###### `date_add(date, interval)` {#docs:stable:sql:functions:date::date_adddate-interval}
| | |
|:--|:--------|
| **Description** |Add the interval to the date and return a `DATETIME` value. |
| **Example** | `date_add(DATE '1992-09-15', INTERVAL 2 MONTH)` |
| **Result** | `1992-11-15 00:00:00` |
###### `date_diff(part, startdate, enddate)` {#docs:stable:sql:functions:date::date_diffpart-startdate-enddate}
| | |
|:--|:--------|
| **Description** |The number of [`part`](#docs:stable:sql:functions:datepart) boundaries between `startdate` and `enddate`, inclusive of the larger date and exclusive of the smaller date. |
| **Example** | `date_diff('month', DATE '1992-09-15', DATE '1992-11-14')` |
| **Result** | `2` |
| **Alias** | `datediff` |
###### `date_part(part, date)` {#docs:stable:sql:functions:date::date_partpart-date}
| | |
|:--|:--------|
| **Description** |Get the [subfield](#docs:stable:sql:functions:datepart) (equivalent to `extract`). |
| **Example** | `date_part('year', DATE '1992-09-20')` |
| **Result** | `1992` |
| **Alias** | `datepart` |
###### `date_sub(part, startdate, enddate)` {#docs:stable:sql:functions:date::date_subpart-startdate-enddate}
| | |
|:--|:--------|
| **Description** |The signed length of the interval between `startdate` and `enddate`, truncated to whole multiples of [`part`](#docs:stable:sql:functions:datepart). |
| **Example** | `date_sub('month', DATE '1992-09-15', DATE '1992-11-14')` |
| **Result** | `1` |
| **Alias** | `datesub` |
###### `date_trunc(part, date)` {#docs:stable:sql:functions:date::date_truncpart-date}
| | |
|:--|:--------|
| **Description** |Truncate to specified [precision](#docs:stable:sql:functions:datepart). |
| **Example** | `date_trunc('month', DATE '1992-03-07')` |
| **Result** | `1992-03-01` |
| **Alias** | `datetrunc` |
###### `dayname(date)` {#docs:stable:sql:functions:date::daynamedate}
| | |
|:--|:--------|
| **Description** |The (English) name of the weekday. |
| **Example** | `dayname(DATE '1992-09-20')` |
| **Result** | `Sunday` |
###### `extract(part from date)` {#docs:stable:sql:functions:date::extractpart-from-date}
| | |
|:--|:--------|
| **Description** |Get [subfield](#docs:stable:sql:functions:datepart) from a date. |
| **Example** | `extract('year' FROM DATE '1992-09-20')` |
| **Result** | `1992` |
###### `greatest(date, date)` {#docs:stable:sql:functions:date::greatestdate-date}
| | |
|:--|:--------|
| **Description** |The later of two dates. |
| **Example** | `greatest(DATE '1992-09-20', DATE '1992-03-07')` |
| **Result** | `1992-09-20` |
###### `isfinite(date)` {#docs:stable:sql:functions:date::isfinitedate}
| | |
|:--|:--------|
| **Description** |Returns `true` if the date is finite, false otherwise. |
| **Example** | `isfinite(DATE '1992-03-07')` |
| **Result** | `true` |
###### `isinf(date)` {#docs:stable:sql:functions:date::isinfdate}
| | |
|:--|:--------|
| **Description** |Returns `true` if the date is infinite, false otherwise. |
| **Example** | `isinf(DATE '-infinity')` |
| **Result** | `true` |
###### `julian(date)` {#docs:stable:sql:functions:date::juliandate}
| | |
|:--|:--------|
| **Description** |Extract the Julian Day number from a date. |
| **Example** | `julian(DATE '1992-09-20')` |
| **Result** | `2448886.0` |
###### `last_day(date)` {#docs:stable:sql:functions:date::last_daydate}
| | |
|:--|:--------|
| **Description** |The last day of the corresponding month in the date. |
| **Example** | `last_day(DATE '1992-09-20')` |
| **Result** | `1992-09-30` |
###### `least(date, date)` {#docs:stable:sql:functions:date::leastdate-date}
| | |
|:--|:--------|
| **Description** |The earlier of two dates. |
| **Example** | `least(DATE '1992-09-20', DATE '1992-03-07')` |
| **Result** | `1992-03-07` |
###### `make_date(year, month, day)` {#docs:stable:sql:functions:date::make_dateyear-month-day}
| | |
|:--|:--------|
| **Description** |The date for the given parts. |
| **Example** | `make_date(1992, 9, 20)` |
| **Result** | `1992-09-20` |
###### `monthname(date)` {#docs:stable:sql:functions:date::monthnamedate}
| | |
|:--|:--------|
| **Description** |The (English) name of the month. |
| **Example** | `monthname(DATE '1992-09-20')` |
| **Result** | `September` |
###### `strftime(date, format)` {#docs:stable:sql:functions:date::strftimedate-format}
| | |
|:--|:--------|
| **Description** |Converts a date to a string according to the [format string](#docs:stable:sql:functions:dateformat). |
| **Example** | `strftime(DATE '1992-01-01', '%a, %-d %B %Y')` |
| **Result** | `Wed, 1 January 1992` |
###### `time_bucket(bucket_width, date[, offset])` {#docs:stable:sql:functions:date::time_bucketbucket_width-date-offset}
| | |
|:--|:--------|
| **Description** |Truncate `date` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03[ + offset]`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '2 months', DATE '1992-04-20', INTERVAL '1 month')` |
| **Result** | `1992-04-01` |
###### `time_bucket(bucket_width, date[, origin])` {#docs:stable:sql:functions:date::time_bucketbucket_width-date-origin}
| | |
|:--|:--------|
| **Description** |Truncate `date` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01` when `bucket_width` is a number of months or coarser units, else `2000-01-03`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '2 weeks', DATE '1992-04-20', DATE '1992-04-01')` |
| **Result** | `1992-04-15` |
###### `today()` {#docs:stable:sql:functions:date::today}
| | |
|:--|:--------|
| **Description** |Current date (start of current transaction) in UTC. |
| **Example** | `today()` |
| **Result** | `2022-10-08` |
#### Date Part Extraction Functions {#docs:stable:sql:functions:date::date-part-extraction-functions}
There are also dedicated extraction functions to get the [subfields](#docs:stable:sql:functions:datepart::part-functions).
A few examples include extracting the day from a date, or the day of the week from a date.
Functions applied to infinite dates will either return the same infinite dates
(e.g., `greatest`) or `NULL` (e.g., `date_part`) depending on what “makes sense”.
In general, if the function needs to examine the parts of the infinite date, the result will be `NULL`.
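For example (a minimal sketch):
```sql
SELECT
    greatest(DATE 'infinity', DATE '1992-09-20') AS g, -- infinity
    date_part('year', DATE 'infinity') AS y;           -- NULL
```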
### Date Part Functions {#docs:stable:sql:functions:datepart}
The `date_part`, `date_trunc`, and `date_diff` functions can be used to extract or manipulate parts of temporal types such as [`TIMESTAMP`](#docs:stable:sql:data_types:timestamp), [`TIMESTAMPTZ`](#docs:stable:sql:data_types:timestamp), [`DATE`](#docs:stable:sql:data_types:date) and [`INTERVAL`](#docs:stable:sql:data_types:interval).
The parts to be extracted or manipulated are specified by one of the strings in the tables below.
The example column provides the corresponding parts of the timestamp `2021-08-03 11:59:44.123456`.
Only the entries of the first table can be extracted from `INTERVAL`s or used to construct them.
> Except for `julian` and `epoch`, which return `DOUBLE`s, all parts are extracted as integers. Since there are no infinite integer values in DuckDB, `NULL`s are returned for infinite timestamps.
#### Part Specifiers Usable as Date Part Specifiers and in Intervals {#docs:stable:sql:functions:datepart::part-specifiers-usable-as-date-part-specifiers-and-in-intervals}
| Specifier | Description | Synonyms | Example |
|:--|:--|:---|--:|
| `century` | Gregorian century | `cent`, `centuries`, `c` | `21` |
| `day` | Gregorian day | `days`, `d`, `dayofmonth` | `3` |
| `decade` | Gregorian decade | `dec`, `decades`, `decs` | `202` |
| `hour` | Hours | `hr`, `hours`, `hrs`, `h` | `11` |
| `microseconds` | Sub-minute microseconds | `microsecond`, `us`, `usec`, `usecs`, `usecond`, `useconds` | `44123456` |
| `millennium` | Gregorian millennium | `mil`, `millenniums`, `millenia`, `mils`, `millenium` | `3` |
| `milliseconds` | Sub-minute milliseconds | `millisecond`, `ms`, `msec`, `msecs`, `msecond`, `mseconds` | `44123` |
| `minute` | Minutes | `min`, `minutes`, `mins`, `m` | `59` |
| `month` | Gregorian month | `mon`, `months`, `mons` | `8` |
| `quarter` | Quarter of the year (1-4) | `quarters` | `3` |
| `second` | Seconds | `sec`, `seconds`, `secs`, `s` | `44` |
| `year` | Gregorian year | `yr`, `y`, `years`, `yrs` | `2021` |
#### Part Specifiers Only Usable as Date Part Specifiers {#docs:stable:sql:functions:datepart::part-specifiers-only-usable-as-date-part-specifiers}
| Specifier | Description | Synonyms | Example |
|:--|:--|:---|--:|
| `dayofweek` | Day of the week (Sunday = 0, Saturday = 6) | `weekday`, `dow` | `2` |
| `dayofyear` | Day of the year (1-365/366) | `doy` | `215` |
| `epoch` | Seconds since 1970-01-01 | | `1627991984.123456` |
| `era` | Gregorian era (CE/AD, BCE/BC) | | `1` |
| `isodow` | ISO day of the week (Monday = 1, Sunday = 7) | | `2` |
| `isoyear` | ISO Year number (Starts on Monday of week containing Jan 4th) | | `2021` |
| `julian` | Julian Day number. | | `2459430.4998162435` |
| `timezone_hour` | Time zone offset hour portion | | `0` |
| `timezone_minute` | Time zone offset minute portion | | `0` |
| `timezone` | Time zone offset in seconds | | `0` |
| `week` | Week number | `weeks`, `w` | `31` |
| `yearweek` | ISO year and week number in `YYYYWW` format | | `202131` |
Note that the time zone parts are all zero unless a time zone extension such as [ICU](#docs:stable:core_extensions:icu)
has been installed to support `TIMESTAMP WITH TIME ZONE`.
#### Part Functions {#docs:stable:sql:functions:datepart::part-functions}
There are dedicated extraction functions to get certain subfields:
| Name | Description |
|:--|:-------|
| [`century(date)`](#::centurydate) | Century. |
| [`day(date)`](#::daydate) | Day. |
| [`dayofmonth(date)`](#::dayofmonthdate) | Day (synonym). |
| [`dayofweek(date)`](#::dayofweekdate) | Numeric weekday (Sunday = 0, Saturday = 6). |
| [`dayofyear(date)`](#::dayofyeardate) | Day of the year (starts from 1, i.e., January 1 = 1). |
| [`decade(date)`](#::decadedate) | Decade (year / 10). |
| [`epoch(date)`](#::epochdate) | Seconds since 1970-01-01. |
| [`era(date)`](#::eradate) | Calendar era. |
| [`hour(date)`](#::hourdate) | Hours. |
| [`isodow(date)`](#::isodowdate) | Numeric ISO weekday (Monday = 1, Sunday = 7). |
| [`isoyear(date)`](#::isoyeardate) | ISO Year number (Starts on Monday of week containing Jan 4th). |
| [`julian(date)`](#::juliandate) | `DOUBLE` Julian Day number. |
| [`microsecond(date)`](#::microseconddate) | Sub-minute microseconds. |
| [`millennium(date)`](#::millenniumdate) | Millennium. |
| [`millisecond(date)`](#::milliseconddate) | Sub-minute milliseconds. |
| [`minute(date)`](#::minutedate) | Minutes. |
| [`month(date)`](#::monthdate) | Month. |
| [`quarter(date)`](#::quarterdate) | Quarter. |
| [`second(date)`](#::seconddate) | Seconds. |
| [`timezone_hour(date)`](#::timezone_hourdate) | Time zone offset hour portion. |
| [`timezone_minute(date)`](#::timezone_minutedate) | Time zone offset minutes portion. |
| [`timezone(date)`](#::timezonedate) | Time zone offset in seconds. |
| [`week(date)`](#::weekdate) | ISO Week. |
| [`weekday(date)`](#::weekdaydate) | Numeric weekday synonym (Sunday = 0, Saturday = 6). |
| [`weekofyear(date)`](#::weekofyeardate) | ISO Week (synonym). |
| [`year(date)`](#::yeardate) | Year. |
| [`yearweek(date)`](#::yearweekdate) | `BIGINT` of combined ISO Year number and 2-digit version of ISO Week number. |
###### `century(date)` {#docs:stable:sql:functions:datepart::centurydate}
| | |
|:--|:--------|
| **Description** |Century. |
| **Example** | `century(DATE '1992-02-15')` |
| **Result** | `20` |
###### `day(date)` {#docs:stable:sql:functions:datepart::daydate}
| | |
|:--|:--------|
| **Description** |Day. |
| **Example** | `day(DATE '1992-02-15')` |
| **Result** | `15` |
###### `dayofmonth(date)` {#docs:stable:sql:functions:datepart::dayofmonthdate}
| | |
|:--|:--------|
| **Description** |Day (synonym). |
| **Example** | `dayofmonth(DATE '1992-02-15')` |
| **Result** | `15` |
###### `dayofweek(date)` {#docs:stable:sql:functions:datepart::dayofweekdate}
| | |
|:--|:--------|
| **Description** |Numeric weekday (Sunday = 0, Saturday = 6). |
| **Example** | `dayofweek(DATE '1992-02-15')` |
| **Result** | `6` |
###### `dayofyear(date)` {#docs:stable:sql:functions:datepart::dayofyeardate}
| | |
|:--|:--------|
| **Description** |Day of the year (starts from 1, i.e., January 1 = 1). |
| **Example** | `dayofyear(DATE '1992-02-15')` |
| **Result** | `46` |
###### `decade(date)` {#docs:stable:sql:functions:datepart::decadedate}
| | |
|:--|:--------|
| **Description** |Decade (year / 10). |
| **Example** | `decade(DATE '1992-02-15')` |
| **Result** | `199` |
###### `epoch(date)` {#docs:stable:sql:functions:datepart::epochdate}
| | |
|:--|:--------|
| **Description** |Seconds since 1970-01-01. |
| **Example** | `epoch(DATE '1992-02-15')` |
| **Result** | `698112000` |
###### `era(date)` {#docs:stable:sql:functions:datepart::eradate}
| | |
|:--|:--------|
| **Description** |Calendar era. |
| **Example** | `era(DATE '0044-03-15 (BC)')` |
| **Result** | `0` |
###### `hour(date)` {#docs:stable:sql:functions:datepart::hourdate}
| | |
|:--|:--------|
| **Description** |Hours. |
| **Example** | `hour(timestamp '2021-08-03 11:59:44.123456')` |
| **Result** | `11` |
###### `isodow(date)` {#docs:stable:sql:functions:datepart::isodowdate}
| | |
|:--|:--------|
| **Description** |Numeric ISO weekday (Monday = 1, Sunday = 7). |
| **Example** | `isodow(DATE '1992-02-15')` |
| **Result** | `6` |
###### `isoyear(date)` {#docs:stable:sql:functions:datepart::isoyeardate}
| | |
|:--|:--------|
| **Description** |ISO Year number (Starts on Monday of week containing Jan 4th). |
| **Example** | `isoyear(DATE '2022-01-01')` |
| **Result** | `2021` |
###### `julian(date)` {#docs:stable:sql:functions:datepart::juliandate}
| | |
|:--|:--------|
| **Description** |`DOUBLE` Julian Day number. |
| **Example** | `julian(DATE '1992-09-20')` |
| **Result** | `2448886.0` |
###### `microsecond(date)` {#docs:stable:sql:functions:datepart::microseconddate}
| | |
|:--|:--------|
| **Description** |Sub-minute microseconds. |
| **Example** | `microsecond(timestamp '2021-08-03 11:59:44.123456')` |
| **Result** | `44123456` |
###### `millennium(date)` {#docs:stable:sql:functions:datepart::millenniumdate}
| | |
|:--|:--------|
| **Description** |Millennium. |
| **Example** | `millennium(DATE '1992-02-15')` |
| **Result** | `2` |
###### `millisecond(date)` {#docs:stable:sql:functions:datepart::milliseconddate}
| | |
|:--|:--------|
| **Description** |Sub-minute milliseconds. |
| **Example** | `millisecond(timestamp '2021-08-03 11:59:44.123456')` |
| **Result** | `44123` |
###### `minute(date)` {#docs:stable:sql:functions:datepart::minutedate}
| | |
|:--|:--------|
| **Description** |Minutes. |
| **Example** | `minute(timestamp '2021-08-03 11:59:44.123456')` |
| **Result** | `59` |
###### `month(date)` {#docs:stable:sql:functions:datepart::monthdate}
| | |
|:--|:--------|
| **Description** |Month. |
| **Example** | `month(DATE '1992-02-15')` |
| **Result** | `2` |
###### `quarter(date)` {#docs:stable:sql:functions:datepart::quarterdate}
| | |
|:--|:--------|
| **Description** |Quarter. |
| **Example** | `quarter(DATE '1992-02-15')` |
| **Result** | `1` |
###### `second(date)` {#docs:stable:sql:functions:datepart::seconddate}
| | |
|:--|:--------|
| **Description** |Seconds. |
| **Example** | `second(timestamp '2021-08-03 11:59:44.123456')` |
| **Result** | `44` |
###### `timezone_hour(date)` {#docs:stable:sql:functions:datepart::timezone_hourdate}
| | |
|:--|:--------|
| **Description** |Time zone offset hour portion. |
| **Example** | `timezone_hour(DATE '1992-02-15')` |
| **Result** | `0` |
###### `timezone_minute(date)` {#docs:stable:sql:functions:datepart::timezone_minutedate}
| | |
|:--|:--------|
| **Description** |Time zone offset minutes portion. |
| **Example** | `timezone_minute(DATE '1992-02-15')` |
| **Result** | `0` |
###### `timezone(date)` {#docs:stable:sql:functions:datepart::timezonedate}
| | |
|:--|:--------|
| **Description** |Time zone offset in minutes. |
| **Example** | `timezone(DATE '1992-02-15')` |
| **Result** | `0` |
###### `week(date)` {#docs:stable:sql:functions:datepart::weekdate}
| | |
|:--|:--------|
| **Description** |ISO Week. |
| **Example** | `week(DATE '1992-02-15')` |
| **Result** | `7` |
###### `weekday(date)` {#docs:stable:sql:functions:datepart::weekdaydate}
| | |
|:--|:--------|
| **Description** |Numeric weekday synonym (Sunday = 0, Saturday = 6). |
| **Example** | `weekday(DATE '1992-02-15')` |
| **Result** | `6` |
###### `weekofyear(date)` {#docs:stable:sql:functions:datepart::weekofyeardate}
| | |
|:--|:--------|
| **Description** |ISO Week (synonym). |
| **Example** | `weekofyear(DATE '1992-02-15')` |
| **Result** | `7` |
###### `year(date)` {#docs:stable:sql:functions:datepart::yeardate}
| | |
|:--|:--------|
| **Description** |Year. |
| **Example** | `year(DATE '1992-02-15')` |
| **Result** | `1992` |
###### `yearweek(date)` {#docs:stable:sql:functions:datepart::yearweekdate}
| | |
|:--|:--------|
| **Description** |`BIGINT` of combined ISO Year number and 2-digit version of ISO Week number. |
| **Example** | `yearweek(DATE '1992-02-15')` |
| **Result** | `199207` |
### Enum Functions {#docs:stable:sql:functions:enum}
This section describes functions and operators for examining and manipulating [`ENUM` values](#docs:stable:sql:data_types:enum).
The examples assume an enum type created as:
```sql
CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy', 'anxious');
```
These functions can take `NULL` or a specific value of the type as argument(s).
With the exception of `enum_range_boundary`, the result depends only on the type of the argument and not on its value.
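For example, passing a specific value yields the same result as passing a typed `NULL` (a minimal sketch):
```sql
SELECT enum_range('ok'::mood) AS from_value,
       enum_range(NULL::mood) AS from_null;
-- Both columns contain [sad, ok, happy, anxious].
```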
| Name | Description |
|:--|:-------|
| [`enum_code(enum_value)`](#::enum_codeenum_value) | Returns the numeric value backing the given enum value. |
| [`enum_first(enum)`](#::enum_firstenum) | Returns the first value of the input enum type. |
| [`enum_last(enum)`](#::enum_lastenum) | Returns the last value of the input enum type. |
| [`enum_range(enum)`](#::enum_rangeenum) | Returns all values of the input enum type as an array. |
| [`enum_range_boundary(enum, enum)`](#::enum_range_boundaryenum-enum) | Returns the range between the two given enum values as an array. |
###### `enum_code(enum_value)` {#docs:stable:sql:functions:enum::enum_codeenum_value}
| | |
|:--|:--------|
| **Description** |Returns the numeric value backing the given enum value. |
| **Example** | `enum_code('happy'::mood)` |
| **Result** | `2` |
###### `enum_first(enum)` {#docs:stable:sql:functions:enum::enum_firstenum}
| | |
|:--|:--------|
| **Description** |Returns the first value of the input enum type. |
| **Example** | `enum_first(NULL::mood)` |
| **Result** | `sad` |
###### `enum_last(enum)` {#docs:stable:sql:functions:enum::enum_lastenum}
| | |
|:--|:--------|
| **Description** |Returns the last value of the input enum type. |
| **Example** | `enum_last(NULL::mood)` |
| **Result** | `anxious` |
###### `enum_range(enum)` {#docs:stable:sql:functions:enum::enum_rangeenum}
| | |
|:--|:--------|
| **Description** |Returns all values of the input enum type as an array. |
| **Example** | `enum_range(NULL::mood)` |
| **Result** | `[sad, ok, happy, anxious]` |
###### `enum_range_boundary(enum, enum)` {#docs:stable:sql:functions:enum::enum_range_boundaryenum-enum}
| | |
|:--|:--------|
| **Description** |Returns the range between the two given enum values as an array. The values must be of the same enum type. When the first parameter is `NULL`, the result starts with the first value of the enum type. When the second parameter is `NULL`, the result ends with the last value of the enum type. |
| **Example** | `enum_range_boundary(NULL, 'happy'::mood)` |
| **Result** | `[sad, ok, happy]` |
### Interval Functions {#docs:stable:sql:functions:interval}
This section describes functions and operators for examining and manipulating [`INTERVAL`](#docs:stable:sql:data_types:interval) values.
#### Interval Operators {#docs:stable:sql:functions:interval::interval-operators}
The table below shows the available mathematical operators for `INTERVAL` types.
| Operator | Description | Example | Result |
|:-|:--|:----|:--|
| `+` | Addition of an `INTERVAL` | `INTERVAL 1 HOUR + INTERVAL 5 HOUR` | `INTERVAL 6 HOUR` |
| `+` | Addition to a `DATE` | `DATE '1992-03-22' + INTERVAL 5 DAY` | `1992-03-27` |
| `+` | Addition to a `TIMESTAMP` | `TIMESTAMP '1992-03-22 01:02:03' + INTERVAL 5 DAY` | `1992-03-27 01:02:03` |
| `+` | Addition to a `TIME` | `TIME '01:02:03' + INTERVAL 5 HOUR` | `06:02:03` |
| `-` | Subtraction of an `INTERVAL` | `INTERVAL 5 HOUR - INTERVAL 1 HOUR` | `INTERVAL 4 HOUR` |
| `-` | Subtraction from a `DATE` | `DATE '1992-03-27' - INTERVAL 5 DAY` | `1992-03-22` |
| `-` | Subtraction from a `TIMESTAMP` | `TIMESTAMP '1992-03-27 01:02:03' - INTERVAL 5 DAY` | `1992-03-22 01:02:03` |
| `-` | Subtraction from a `TIME` | `TIME '06:02:03' - INTERVAL 5 HOUR` | `01:02:03` |
#### Interval Functions {#docs:stable:sql:functions:interval::interval-functions}
The table below shows the available scalar functions for `INTERVAL` types.
| Name | Description |
|:--|:-------|
| [`date_part(part, interval)`](#::date_partpart-interval) | Extract [datepart component](#docs:stable:sql:functions:datepart) (equivalent to `extract`). See [`INTERVAL`](#docs:stable:sql:data_types:interval) for the sometimes surprising rules governing this extraction. |
| [`datepart(part, interval)`](#::datepartpart-interval) | Alias of `date_part`. |
| [`extract(part FROM interval)`](#::extractpart-from-interval) | Alias of `date_part`. |
| [`epoch(interval)`](#::epochinterval) | Get total number of seconds, as double precision floating point number, in interval. |
| [`to_centuries(integer)`](#::to_centuriesinteger) | Construct a century interval. |
| [`to_days(integer)`](#::to_daysinteger) | Construct a day interval. |
| [`to_decades(integer)`](#::to_decadesinteger) | Construct a decade interval. |
| [`to_hours(integer)`](#::to_hoursinteger) | Construct an hour interval. |
| [`to_microseconds(integer)`](#::to_microsecondsinteger) | Construct a microsecond interval. |
| [`to_millennia(integer)`](#::to_millenniainteger) | Construct a millennium interval. |
| [`to_milliseconds(integer)`](#::to_millisecondsinteger) | Construct a millisecond interval. |
| [`to_minutes(integer)`](#::to_minutesinteger) | Construct a minute interval. |
| [`to_months(integer)`](#::to_monthsinteger) | Construct a month interval. |
| [`to_quarters(integer)`](#::to_quartersinteger) | Construct an interval of `integer` quarters. |
| [`to_seconds(integer)`](#::to_secondsinteger) | Construct a second interval. |
| [`to_weeks(integer)`](#::to_weeksinteger) | Construct a week interval. |
| [`to_years(integer)`](#::to_yearsinteger) | Construct a year interval. |
> Only the documented [date part components](#docs:stable:sql:functions:datepart) are defined for intervals.
###### `date_part(part, interval)` {#docs:stable:sql:functions:interval::date_partpart-interval}
| | |
|:--|:--------|
| **Description** |Extract [datepart component](#docs:stable:sql:functions:datepart) (equivalent to `extract`). See [`INTERVAL`](#docs:stable:sql:data_types:interval) for the sometimes surprising rules governing this extraction. |
| **Example** | `date_part('year', INTERVAL '14 months')` |
| **Result** | `1` |
###### `datepart(part, interval)` {#docs:stable:sql:functions:interval::datepartpart-interval}
| | |
|:--|:--------|
| **Description** |Alias of `date_part`. |
| **Example** | `datepart('year', INTERVAL '14 months')` |
| **Result** | `1` |
###### `extract(part FROM interval)` {#docs:stable:sql:functions:interval::extractpart-from-interval}
| | |
|:--|:--------|
| **Description** |Alias of `date_part`. |
| **Example** | `extract('month' FROM INTERVAL '14 months')` |
| **Result** | `2` |
###### `epoch(interval)` {#docs:stable:sql:functions:interval::epochinterval}
| | |
|:--|:--------|
| **Description** |Get total number of seconds, as double precision floating point number, in interval. |
| **Example** | `epoch(INTERVAL 5 HOUR)` |
| **Result** | `18000.0` |
###### `to_centuries(integer)` {#docs:stable:sql:functions:interval::to_centuriesinteger}
| | |
|:--|:--------|
| **Description** |Construct a century interval. |
| **Example** | `to_centuries(5)` |
| **Result** | `INTERVAL 500 YEAR` |
###### `to_days(integer)` {#docs:stable:sql:functions:interval::to_daysinteger}
| | |
|:--|:--------|
| **Description** |Construct a day interval. |
| **Example** | `to_days(5)` |
| **Result** | `INTERVAL 5 DAY` |
###### `to_decades(integer)` {#docs:stable:sql:functions:interval::to_decadesinteger}
| | |
|:--|:--------|
| **Description** |Construct a decade interval. |
| **Example** | `to_decades(5)` |
| **Result** | `INTERVAL 50 YEAR` |
###### `to_hours(integer)` {#docs:stable:sql:functions:interval::to_hoursinteger}
| | |
|:--|:--------|
| **Description** |Construct an hour interval. |
| **Example** | `to_hours(5)` |
| **Result** | `INTERVAL 5 HOUR` |
###### `to_microseconds(integer)` {#docs:stable:sql:functions:interval::to_microsecondsinteger}
| | |
|:--|:--------|
| **Description** |Construct a microsecond interval. |
| **Example** | `to_microseconds(5)` |
| **Result** | `INTERVAL 5 MICROSECOND` |
###### `to_millennia(integer)` {#docs:stable:sql:functions:interval::to_millenniainteger}
| | |
|:--|:--------|
| **Description** |Construct a millennium interval. |
| **Example** | `to_millennia(5)` |
| **Result** | `INTERVAL 5000 YEAR` |
###### `to_milliseconds(integer)` {#docs:stable:sql:functions:interval::to_millisecondsinteger}
| | |
|:--|:--------|
| **Description** |Construct a millisecond interval. |
| **Example** | `to_milliseconds(5)` |
| **Result** | `INTERVAL 5 MILLISECOND` |
###### `to_minutes(integer)` {#docs:stable:sql:functions:interval::to_minutesinteger}
| | |
|:--|:--------|
| **Description** |Construct a minute interval. |
| **Example** | `to_minutes(5)` |
| **Result** | `INTERVAL 5 MINUTE` |
###### `to_months(integer)` {#docs:stable:sql:functions:interval::to_monthsinteger}
| | |
|:--|:--------|
| **Description** |Construct a month interval. |
| **Example** | `to_months(5)` |
| **Result** | `INTERVAL 5 MONTH` |
###### `to_quarters(integer)` {#docs:stable:sql:functions:interval::to_quartersinteger}
| | |
|:--|:--------|
| **Description** |Construct an interval of `integer` quarters. |
| **Example** | `to_quarters(5)` |
| **Result** | `INTERVAL 1 YEAR 3 MONTHS` |
###### `to_seconds(integer)` {#docs:stable:sql:functions:interval::to_secondsinteger}
| | |
|:--|:--------|
| **Description** |Construct a second interval. |
| **Example** | `to_seconds(5)` |
| **Result** | `INTERVAL 5 SECOND` |
###### `to_weeks(integer)` {#docs:stable:sql:functions:interval::to_weeksinteger}
| | |
|:--|:--------|
| **Description** |Construct a week interval. |
| **Example** | `to_weeks(5)` |
| **Result** | `INTERVAL 35 DAY` |
###### `to_years(integer)` {#docs:stable:sql:functions:interval::to_yearsinteger}
| | |
|:--|:--------|
| **Description** |Construct a year interval. |
| **Example** | `to_years(5)` |
| **Result** | `INTERVAL 5 YEAR` |
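The constructors above can be combined with ordinary interval arithmetic. The following is a minimal sketch; the specific values are illustrative only:
```sql
-- Combine interval constructors via interval addition
SELECT to_years(2) + to_months(3) + to_days(10) AS total_interval;
-- Expected result: 2 years 3 months 10 days
```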
### Lambda Functions {#docs:stable:sql:functions:lambda}
> **Deprecated.** DuckDB 1.3.0 deprecated the old lambda single arrow syntax (`x -> x + 1`)
> in favor of the Python-style syntax (`lambda x : x + 1`).
>
> DuckDB 1.3.0 also introduces a new setting to configure the lambda syntax.
>
> ```sql
> SET lambda_syntax = 'DEFAULT';
> SET lambda_syntax = 'ENABLE_SINGLE_ARROW';
> SET lambda_syntax = 'DISABLE_SINGLE_ARROW';
> ```
>
> Currently, `DEFAULT` enables both syntax styles, i.e.,
> the old single arrow syntax and the Python-style syntax.
>
> DuckDB 1.4.0 will be the last release supporting the single arrow syntax without explicitly enabling it.
>
> DuckDB 1.5.0 disables the single arrow syntax by default.
>
> DuckDB 1.6.0 removes the `lambda_syntax` flag and fully deprecates the single arrow syntax,
> so the old behavior will no longer be possible.
Lambda functions enable the use of more complex and flexible expressions in queries.
DuckDB supports several scalar functions that operate on [`LIST`s](#docs:stable:sql:data_types:list) and
accept lambda functions as parameters
in the form `lambda ⟨parameter1⟩, ⟨parameter2⟩, ... : ⟨expression⟩`{:.language-sql .highlight}.
If the lambda function has only one parameter, then the parentheses can be omitted.
The parameters can have any names.
For example, the following are all valid lambda functions:
* `lambda param : param > 1`{:.language-sql .highlight}
* `lambda s : contains(concat(s, 'DB'), 'duck')`{:.language-sql .highlight}
* `lambda acc, x : acc + x`{:.language-sql .highlight}
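For instance, a lambda like the first one above can be passed directly to `list_filter`. This is a minimal sketch (the literal list is only illustrative), assuming a DuckDB version where the Python-style syntax is enabled (the default since 1.3.0):
```sql
-- Pass a Python-style lambda to list_filter
SELECT list_filter([0, 1, 2, 3], lambda param : param > 1) AS filtered;
-- Returns [2, 3]
```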
#### Scalar Functions That Accept Lambda Functions {#docs:stable:sql:functions:lambda::scalar-functions-that-accept-lambda-functions}
| Function | Description |
|:--|:-------|
| [`apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`array_apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`array_filter(list, lambda(x))`](#::list_filterlist-lambdax) | Alias for `list_filter`. |
| [`array_reduce(list, lambda(x, y)[, initial_value])`](#list_reducelist-lambdax-y-initial_value) | Alias for `list_reduce`. |
| [`array_transform(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`filter(list, lambda(x))`](#::list_filterlist-lambdax) | Alias for `list_filter`. |
| [`list_apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`list_filter(list, lambda(x))`](#::list_filterlist-lambdax) | Constructs a list from those elements of the input `list` for which the `lambda` function returns `true`. DuckDB must be able to cast the `lambda` function's return type to `BOOL`. The return type of `list_filter` is the same as the input list's. See [`list_filter` examples](#docs:stable:sql:functions:lambda::list_filter-examples). |
| [`list_reduce(list, lambda(x, y)[, initial_value])`](#list_reducelist-lambdax-y-initial_value) | Reduces all elements of the input `list` into a single scalar value by executing the `lambda` function on a running result and the next list element. The `lambda` function has an optional `initial_value` argument. See [`list_reduce` examples](#docs:stable:sql:functions:lambda::list_reduce-examples). |
| [`list_transform(list, lambda(x))`](#::list_transformlist-lambdax) | Returns a list that is the result of applying the `lambda` function to each element of the input `list`. The return type is defined by the return type of the `lambda` function. See [`list_transform` examples](#docs:stable:sql:functions:lambda::list_transform-examples). |
| [`reduce(list, lambda(x, y)[, initial_value])`](#list_reducelist-lambdax-y-initial_value) | Alias for `list_reduce`. |
###### `list_filter(list, lambda(x))` {#docs:stable:sql:functions:lambda::list_filterlist-lambdax}
| | |
|:--|:--------|
| **Description** |Constructs a list from those elements of the input `list` for which the `lambda` function returns `true`. DuckDB must be able to cast the `lambda` function's return type to `BOOL`. The return type of `list_filter` is the same as the input list's. See [`list_filter` examples](#docs:stable:sql:functions:lambda::list_filter-examples). |
| **Example** | `list_filter([3, 4, 5], lambda x : x > 4)` |
| **Result** | `[5]` |
| **Aliases** | `array_filter`, `filter` |
###### `list_reduce(list, lambda(x, y)[, initial_value])` {#docs:stable:sql:functions:lambda::list_reducelist-lambdax-y-initial_value}
| | |
|:--|:--------|
| **Description** |Reduces all elements of the input `list` into a single scalar value by executing the `lambda` function on a running result and the next list element. The `lambda` function has an optional `initial_value` argument. See [`list_reduce` examples](#docs:stable:sql:functions:lambda::list_reduce-examples). |
| **Example** | `list_reduce([1, 2, 3], lambda x, y : x + y)` |
| **Result** | `6` |
| **Aliases** | `array_reduce`, `reduce` |
###### `list_transform(list, lambda(x))` {#docs:stable:sql:functions:lambda::list_transformlist-lambdax}
| | |
|:--|:--------|
| **Description** |Returns a list that is the result of applying the `lambda` function to each element of the input `list`. The return type is defined by the return type of the `lambda` function. See [`list_transform` examples](#docs:stable:sql:functions:lambda::list_transform-examples). |
| **Example** | `list_transform([1, 2, 3], lambda x : x + 1)` |
| **Result** | `[2, 3, 4]` |
| **Aliases** | `apply`, `array_apply`, `array_transform`, `list_apply` |
#### Nesting Lambda Functions {#docs:stable:sql:functions:lambda::nesting-lambda-functions}
All scalar functions can be arbitrarily nested. For example, the following nested lambda functions compute the squares of all even list elements:
```sql
SELECT list_transform(
list_filter([0, 1, 2, 3, 4, 5], lambda x: x % 2 = 0),
lambda y: y * y
);
```
```text
[0, 4, 16]
```
A nested lambda function that adds each element of the first list to the sum of the second list:
```sql
SELECT list_transform(
[1, 2, 3],
lambda x :
list_reduce([4, 5, 6], lambda a, b: a + b) + x
);
```
```text
[16, 17, 18]
```
#### Scoping {#docs:stable:sql:functions:lambda::scoping}
Lambda functions conform to the following scoping rules, in order of precedence:
* inner lambda parameters
* outer lambda parameters
* column names
* macro parameters
```sql
CREATE TABLE tbl (x INTEGER);
INSERT INTO tbl VALUES (10);
SELECT list_apply(
[1, 2],
lambda x: list_apply([4], lambda x: x + tbl.x)[1] + x
)
FROM tbl;
```
```text
[15, 16]
```
#### Indexes as Parameters {#docs:stable:sql:functions:lambda::indexes-as-parameters}
All lambda functions accept an optional extra parameter that represents the index of the current element.
This is always the last parameter of the lambda function (e.g., `i` in `(x, i)`), and is 1-based (i.e., the first element has index 1).
Get all elements that are larger than their index:
```sql
SELECT list_filter([1, 3, 1, 5], lambda x, i: x > i);
```
```text
[3, 5]
```
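The index parameter is also available in `list_transform`. The following is a minimal sketch (assuming the usual implicit cast of the integer index inside `concat`):
```sql
-- Prefix each element with its 1-based index
SELECT list_transform(['a', 'b', 'c'], lambda x, i: concat(i, ': ', x)) AS labeled;
-- Expected result: [1: a, 2: b, 3: c]
```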
#### Examples {#docs:stable:sql:functions:lambda::examples}
##### `list_transform` Examples {#docs:stable:sql:functions:lambda::list_transform-examples}
Incrementing each list element by one:
```sql
SELECT list_transform([1, 2, NULL, 3], lambda x: x + 1);
```
```text
[2, 3, NULL, 4]
```
Transforming strings:
```sql
SELECT list_transform(['Duck', 'Goose', 'Sparrow'], lambda s: concat(s, 'DB'));
```
```text
[DuckDB, GooseDB, SparrowDB]
```
Combining lambda functions with other functions:
```sql
SELECT list_transform([5, NULL, 6], lambda x: coalesce(x, 0) + 1);
```
```text
[6, 1, 7]
```
##### `list_filter` Examples {#docs:stable:sql:functions:lambda::list_filter-examples}
Filter out negative values:
```sql
SELECT list_filter([5, -6, NULL, 7], lambda x: x > 0);
```
```text
[5, 7]
```
Keep elements divisible by both 2 and 5:
```sql
SELECT list_filter(
list_filter([2, 4, 3, 1, 20, 10, 3, 30], lambda x: x % 2 = 0),
lambda y: y % 5 = 0
);
```
```text
[20, 10, 30]
```
In combination with `range(...)` to construct lists:
```sql
SELECT list_filter([1, 2, 3, 4], lambda x: x > #1) FROM range(4);
```
```text
[1, 2, 3, 4]
[2, 3, 4]
[3, 4]
[4]
```
##### `list_reduce` Examples {#docs:stable:sql:functions:lambda::list_reduce-examples}
Sum of all list elements:
```sql
SELECT list_reduce([1, 2, 3, 4], lambda acc, x: acc + x);
```
```text
10
```
Only add up list elements if they are greater than 2:
```sql
SELECT list_reduce(
list_filter([1, 2, 3, 4], lambda x: x > 2),
lambda acc, x: acc + x
);
```
```text
7
```
Concatenate all list elements:
```sql
SELECT list_reduce(['DuckDB', 'is', 'awesome'], lambda acc, x: concat(acc, ' ', x));
```
```text
DuckDB is awesome
```
Concatenate elements with the index without an initial value:
```sql
SELECT list_reduce(
['a', 'b', 'c', 'd'],
lambda x, y, i: x || ' - ' || CAST(i AS VARCHAR) || ' - ' || y
);
```
```text
a - 2 - b - 3 - c - 4 - d
```
Concatenate elements with the index with an initial value:
```sql
SELECT list_reduce(
['a', 'b', 'c', 'd'],
lambda x, y, i: x || ' - ' || CAST(i AS VARCHAR) || ' - ' || y, 'INITIAL'
);
```
```text
INITIAL - 1 - a - 2 - b - 3 - c - 4 - d
```
#### Limitations {#docs:stable:sql:functions:lambda::limitations}
Subqueries in lambda expressions are currently not supported.
For example:
```sql
SELECT list_apply([1, 2, 3], lambda x: (SELECT 42) + x);
```
```console
Binder Error:
subqueries in lambda expressions are not supported
```
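A possible workaround (a sketch, not an official recommendation) is to evaluate the subquery in the `FROM` clause and reference the resulting column inside the lambda, which the scoping rules above allow:
```sql
-- Hoist the subquery out of the lambda and reference its result as a column
SELECT list_apply([1, 2, 3], lambda x: c + x) AS shifted
FROM (SELECT 42 AS c);
-- Expected result: [43, 44, 45]
```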
### List Functions {#docs:stable:sql:functions:list}
| Function | Description |
|:--|:-------|
| [`list[index]`](#listindex) | Extracts a single list element using a (1-based) `index`. |
| [`list[begin[:end][:step]]`](#listbeginendstep) | Extracts a sublist using [slice conventions](#docs:stable:sql:functions:list::slicing). Negative values are accepted. |
| [`list1 && list2`](#::list_has_anylist1-list2) | Alias for `list_has_any`. |
| [`list1 <-> list2`](#::list_distancelist1-list2) | Alias for `list_distance`. |
| [`list1 <=> list2`](#::list_cosine_distancelist1-list2) | Alias for `list_cosine_distance`. |
| [`list1 <@ list2`](#::list_has_alllist1-list2) | Alias for `list_has_all`. |
| [`list1 @> list2`](#::list_has_alllist1-list2) | Alias for `list_has_all`. |
| [`arg1 || arg2`](#::arg1--arg2) | Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:stable:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:stable:sql:functions:list::list_concatlist_1--list_n). |
| [`aggregate(list, function_name, ...)`](#::list_aggregatelist-function_name-) | Alias for `list_aggregate`. |
| [`apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`array_aggr(list, function_name, ...)`](#::list_aggregatelist-function_name-) | Alias for `list_aggregate`. |
| [`array_aggregate(list, function_name, ...)`](#::list_aggregatelist-function_name-) | Alias for `list_aggregate`. |
| [`array_append(list, element)`](#::list_appendlist-element) | Alias for `list_append`. |
| [`array_apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`array_cat(list_1, ..., list_n)`](#::list_concatlist_1--list_n) | Alias for `list_concat`. |
| [`array_concat(list_1, ..., list_n)`](#::list_concatlist_1--list_n) | Alias for `list_concat`. |
| [`array_contains(list, element)`](#::list_containslist-element) | Alias for `list_contains`. |
| [`array_distinct(list)`](#::list_distinctlist) | Alias for `list_distinct`. |
| [`array_extract(list, index)`](#::array_extractlist-index) | Extracts the `index`th (1-based) value from the `list`. |
| [`array_filter(list, lambda(x))`](#::list_filterlist-lambdax) | Alias for `list_filter`. |
| [`array_grade_up(list[, col1][, col2])`](#list_grade_uplist-col1-col2) | Alias for `list_grade_up`. |
| [`array_has(list, element)`](#::list_containslist-element) | Alias for `list_contains`. |
| [`array_has_all(list1, list2)`](#::list_has_alllist1-list2) | Alias for `list_has_all`. |
| [`array_has_any(list1, list2)`](#::list_has_anylist1-list2) | Alias for `list_has_any`. |
| [`array_indexof(list, element)`](#::list_positionlist-element) | Alias for `list_position`. |
| [`array_intersect(list1, list2)`](#::list_intersectlist1-list2) | Alias for `list_intersect`. |
| [`array_length(list)`](#::lengthlist) | Alias for `length`. |
| [`array_pop_back(list)`](#::array_pop_backlist) | Returns the `list` without the last element. |
| [`array_pop_front(list)`](#::array_pop_frontlist) | Returns the `list` without the first element. |
| [`array_position(list, element)`](#::list_positionlist-element) | Alias for `list_position`. |
| [`array_prepend(element, list)`](#::list_prependelement-list) | Alias for `list_prepend`. |
| [`array_push_back(list, element)`](#::list_appendlist-element) | Alias for `list_append`. |
| [`array_push_front(list, element)`](#::array_push_frontlist-element) | Prepends `element` to `list`. |
| [`array_reduce(list, lambda(x,y)[, initial_value])`](#list_reducelist-lambdaxy-initial_value) | Alias for `list_reduce`. |
| [`array_resize(list, size[[, value]])`](#list_resizelist-size-value) | Alias for `list_resize`. |
| [`array_reverse(list)`](#::list_reverselist) | Alias for `list_reverse`. |
| [`array_reverse_sort(list[, col1])`](#list_reverse_sortlist-col1) | Alias for `list_reverse_sort`. |
| [`array_select(value_list, index_list)`](#::list_selectvalue_list-index_list) | Alias for `list_select`. |
| [`array_slice(list, begin, end)`](#::list_slicelist-begin-end) | Alias for `list_slice`. |
| [`array_slice(list, begin, end, step)`](#::list_slicelist-begin-end-step) | Alias for `list_slice`. |
| [`array_sort(list[, col1][, col2])`](#list_sortlist-col1-col2) | Alias for `list_sort`. |
| [`array_to_string(list, delimiter)`](#::array_to_stringlist-delimiter) | Concatenates list/array elements using an optional `delimiter`. |
| [`array_to_string_comma_default(array)`](#::array_to_string_comma_defaultarray) | Concatenates list/array elements with a comma delimiter. |
| [`array_transform(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`array_unique(list)`](#::list_uniquelist) | Alias for `list_unique`. |
| [`array_where(value_list, mask_list)`](#::list_wherevalue_list-mask_list) | Alias for `list_where`. |
| [`array_zip(list_1, ..., list_n[, truncate])`](#list_ziplist_1--list_n-truncate) | Alias for `list_zip`. |
| [`char_length(list)`](#::lengthlist) | Alias for `length`. |
| [`character_length(list)`](#::lengthlist) | Alias for `length`. |
| [`concat(value, ...)`](#::concatvalue-) | Concatenates multiple strings or lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| [`contains(list, element)`](#::containslist-element) | Returns `true` if the `list` contains the `element`. |
| [`filter(list, lambda(x))`](#::list_filterlist-lambdax) | Alias for `list_filter`. |
| [`flatten(nested_list)`](#::flattennested_list) | [Flattens](#::flattening) a nested list by one level. |
| [`generate_series(start[, stop][, step])`](#generate_seriesstart-stop-step) | Creates a list of values between `start` and `stop`; the `stop` parameter is inclusive. |
| [`grade_up(list[, col1][, col2])`](#list_grade_uplist-col1-col2) | Alias for `list_grade_up`. |
| [`len(list)`](#::lengthlist) | Alias for `length`. |
| [`length(list)`](#::lengthlist) | Returns the length of the `list`. |
| [`list_aggr(list, function_name, ...)`](#::list_aggregatelist-function_name-) | Alias for `list_aggregate`. |
| [`list_aggregate(list, function_name, ...)`](#::list_aggregatelist-function_name-) | Executes the aggregate function `function_name` on the elements of `list`. See the [List Aggregates](#::list-aggregates) section for more details. |
| [`list_any_value(list)`](#::list_any_valuelist) | Applies aggregate function [`any_value`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_append(list, element)`](#::list_appendlist-element) | Appends `element` to `list`. |
| [`list_apply(list, lambda(x))`](#::list_transformlist-lambdax) | Alias for `list_transform`. |
| [`list_approx_count_distinct(list)`](#::list_approx_count_distinctlist) | Applies aggregate function [`approx_count_distinct`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_avg(list)`](#::list_avglist) | Applies aggregate function [`avg`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_bit_and(list)`](#::list_bit_andlist) | Applies aggregate function [`bit_and`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_bit_or(list)`](#::list_bit_orlist) | Applies aggregate function [`bit_or`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_bit_xor(list)`](#::list_bit_xorlist) | Applies aggregate function [`bit_xor`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_bool_and(list)`](#::list_bool_andlist) | Applies aggregate function [`bool_and`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_bool_or(list)`](#::list_bool_orlist) | Applies aggregate function [`bool_or`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_cat(list_1, ..., list_n)`](#::list_concatlist_1--list_n) | Alias for `list_concat`. |
| [`list_concat(list_1, ..., list_n)`](#::list_concatlist_1--list_n) | Concatenates lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| [`list_contains(list, element)`](#::list_containslist-element) | Returns true if the list contains the element. |
| [`list_cosine_distance(list1, list2)`](#::list_cosine_distancelist1-list2) | Computes the cosine distance between two same-sized lists. |
| [`list_cosine_similarity(list1, list2)`](#::list_cosine_similaritylist1-list2) | Computes the cosine similarity between two same-sized lists. |
| [`list_count(list)`](#::list_countlist) | Applies aggregate function [`count`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_distance(list1, list2)`](#::list_distancelist1-list2) | Calculates the Euclidean distance between two points with coordinates given in two inputs lists of equal length. |
| [`list_distinct(list)`](#::list_distinctlist) | Removes all duplicates and `NULL` values from a list. Does not preserve the original order. |
| [`list_dot_product(list1, list2)`](#::list_inner_productlist1-list2) | Alias for `list_inner_product`. |
| [`list_element(list, index)`](#::list_extractlist-index) | Alias for `list_extract`. |
| [`list_entropy(list)`](#::list_entropylist) | Applies aggregate function [`entropy`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_extract(list, index)`](#::list_extractlist-index) | Extract the `index`th (1-based) value from the list. |
| [`list_filter(list, lambda(x))`](#::list_filterlist-lambdax) | Constructs a list from those elements of the input `list` for which the `lambda` function returns `true`. DuckDB must be able to cast the `lambda` function's return type to `BOOL`. The return type of `list_filter` is the same as the input list's. See [`list_filter` examples](#docs:stable:sql:functions:lambda::list_filter-examples). |
| [`list_first(list)`](#::list_firstlist) | Applies aggregate function [`first`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_grade_up(list[, col1][, col2])`](#list_grade_uplist-col1-col2) | Works like [`list_sort`](#::list_sortlist-col1-col2), but the results are the indexes that correspond to the position in the original list instead of the actual values. |
| [`list_has(list, element)`](#::list_containslist-element) | Alias for `list_contains`. |
| [`list_has_all(list1, list2)`](#::list_has_alllist1-list2) | Returns true if all elements of list2 are in list1. NULLs are ignored. |
| [`list_has_any(list1, list2)`](#::list_has_anylist1-list2) | Returns true if the lists have any element in common. NULLs are ignored. |
| [`list_histogram(list)`](#::list_histogramlist) | Applies aggregate function [`histogram`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_indexof(list, element)`](#::list_positionlist-element) | Alias for `list_position`. |
| [`list_inner_product(list1, list2)`](#::list_inner_productlist1-list2) | Computes the inner product between two same-sized lists. |
| [`list_intersect(list1, list2)`](#::list_intersectlist1-list2) | Returns a list of all the elements that exist in both `list1` and `list2`, without duplicates. |
| [`list_kurtosis(list)`](#::list_kurtosislist) | Applies aggregate function [`kurtosis`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_kurtosis_pop(list)`](#::list_kurtosis_poplist) | Applies aggregate function [`kurtosis_pop`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_last(list)`](#::list_lastlist) | Applies aggregate function [`last`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_mad(list)`](#::list_madlist) | Applies aggregate function [`mad`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_max(list)`](#::list_maxlist) | Applies aggregate function [`max`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_median(list)`](#::list_medianlist) | Applies aggregate function [`median`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_min(list)`](#::list_minlist) | Applies aggregate function [`min`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_mode(list)`](#::list_modelist) | Applies aggregate function [`mode`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_negative_dot_product(list1, list2)`](#::list_negative_inner_productlist1-list2) | Alias for `list_negative_inner_product`. |
| [`list_negative_inner_product(list1, list2)`](#::list_negative_inner_productlist1-list2) | Computes the negative inner product between two same-sized lists. |
| [`list_pack(arg, ...)`](#::list_valuearg-) | Alias for `list_value`. |
| [`list_position(list, element)`](#::list_positionlist-element) | Returns the index of the `element` if the `list` contains the `element`. If the `element` is not found, it returns `NULL`. |
| [`list_prepend(element, list)`](#::list_prependelement-list) | Prepends `element` to `list`. |
| [`list_product(list)`](#::list_productlist) | Applies aggregate function [`product`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_reduce(list, lambda(x,y)[, initial_value])`](#list_reducelist-lambdaxy-initial_value) | Reduces all elements of the input `list` into a single scalar value by executing the `lambda` function on a running result and the next list element. The `lambda` function has an optional `initial_value` argument. See [`list_reduce` examples](#docs:stable:sql:functions:lambda::list_reduce-examples). |
| [`list_resize(list, size[[, value]])`](#list_resizelist-size-value) | Resizes the `list` to contain `size` elements. Initializes new elements with `value` or `NULL` if `value` is not set. |
| [`list_reverse(list)`](#::list_reverselist) | Reverses the `list`. |
| [`list_reverse_sort(list[, col1])`](#list_reverse_sortlist-col1) | Sorts the elements of the list in reverse order. See the [Sorting Lists](#::sorting-lists) section for more details about sorting order and `NULL` values. |
| [`list_select(value_list, index_list)`](#::list_selectvalue_list-index_list) | Returns a list based on the elements selected by the `index_list`. |
| [`list_sem(list)`](#::list_semlist) | Applies aggregate function [`sem`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_skewness(list)`](#::list_skewnesslist) | Applies aggregate function [`skewness`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_slice(list, begin, end)`](#::list_slicelist-begin-end) | Extracts a sublist or substring using [slice conventions](#docs:stable:sql:functions:list::slicing). Negative values are accepted. |
| [`list_slice(list, begin, end, step)`](#::list_slicelist-begin-end-step) | `list_slice` with an added `step` parameter. |
| [`list_sort(list[, col1][, col2])`](#list_sortlist-col1-col2) | Sorts the elements of the list. See the [Sorting Lists](#::sorting-lists) section for more details about sorting order and `NULL` values. |
| [`list_stddev_pop(list)`](#::list_stddev_poplist) | Applies aggregate function [`stddev_pop`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_stddev_samp(list)`](#::list_stddev_samplist) | Applies aggregate function [`stddev_samp`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_string_agg(list)`](#::list_string_agglist) | Applies aggregate function [`string_agg`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_sum(list)`](#::list_sumlist) | Applies aggregate function [`sum`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_transform(list, lambda(x))`](#::list_transformlist-lambdax) | Returns a list that is the result of applying the `lambda` function to each element of the input `list`. The return type is defined by the return type of the `lambda` function. See [`list_transform` examples](#docs:stable:sql:functions:lambda::list_transform-examples). |
| [`list_unique(list)`](#::list_uniquelist) | Counts the unique elements of a `list`. |
| [`list_value(arg, ...)`](#::list_valuearg-) | Creates a LIST containing the argument values. |
| [`list_var_pop(list)`](#::list_var_poplist) | Applies aggregate function [`var_pop`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_var_samp(list)`](#::list_var_samplist) | Applies aggregate function [`var_samp`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| [`list_where(value_list, mask_list)`](#::list_wherevalue_list-mask_list) | Returns a list with the `BOOLEAN`s in `mask_list` applied as a mask to the `value_list`. |
| [`list_zip(list_1, ..., list_n[, truncate])`](#list_ziplist_1--list_n-truncate) | Zips n `LIST`s to a new `LIST` whose length will be that of the longest list. Its elements are structs of n elements from each list `list_1`, ..., `list_n`; missing elements are replaced with `NULL`. If `truncate` is set, all lists are truncated to the smallest list length. |
| [`range(start[, stop][, step])`](#rangestart-stop-step) | Creates a list of values between `start` and `stop`; the `stop` parameter is exclusive. |
| [`reduce(list, lambda(x,y)[, initial_value])`](#list_reducelist-lambdaxy-initial_value) | Alias for `list_reduce`. |
| [`repeat(list, count)`](#::repeatlist-count) | Repeats the `list` `count` number of times. |
| [`unnest(list)`](#::unnestlist) | Unnests a list by one level. Note that this is a special function that alters the cardinality of the result. See the [unnest page](#docs:stable:sql:query_syntax:unnest) for more details. |
| [`unpivot_list(arg, ...)`](#::unpivot_listarg-) | Identical to `list_value`, but generated as part of `UNPIVOT` for better error messages. |
###### `list[index]` {#docs:stable:sql:functions:list::listindex}
| | |
|:--|:--------|
| **Description** |Extracts a single list element using a (1-based) `index`. |
| **Example** | `[4, 5, 6][3]` |
| **Result** | `6` |
| **Alias** | `list_extract` |
###### `list[begin[:end][:step]]` {#docs:stable:sql:functions:list::listbeginendstep}
| | |
|:--|:--------|
| **Description** |Extracts a sublist using [slice conventions](#docs:stable:sql:functions:list::slicing). Negative values are accepted. |
| **Example** | `[4, 5, 6][3]` |
| **Result** | `6` |
| **Alias** | `list_slice` |
###### `arg1 || arg2` {#docs:stable:sql:functions:list::arg1--arg2}
| | |
|:--|:--------|
| **Description** |Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:stable:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:stable:sql:functions:list::list_concatlist_1--list_n). |
| **Example 1** | `'Duck' || 'DB'` |
| **Result** | `DuckDB` |
| **Example 2** | `[1, 2, 3] || [4, 5, 6]` |
| **Result** | `[1, 2, 3, 4, 5, 6]` |
| **Example 3** | `'\xAA'::BLOB || '\xBB'::BLOB` |
| **Result** | `\xAA\xBB` |
###### `array_extract(list, index)` {#docs:stable:sql:functions:list::array_extractlist-index}
| | |
|:--|:--------|
| **Description** |Extracts the `index`th (1-based) value from the `list`. |
| **Example** | `array_extract([4, 5, 6], 3)` |
| **Result** | `6` |
###### `array_pop_back(list)` {#docs:stable:sql:functions:list::array_pop_backlist}
| | |
|:--|:--------|
| **Description** |Returns the `list` without the last element. |
| **Example** | `array_pop_back([4, 5, 6])` |
| **Result** | `[4, 5]` |
###### `array_pop_front(list)` {#docs:stable:sql:functions:list::array_pop_frontlist}
| | |
|:--|:--------|
| **Description** |Returns the `list` without the first element. |
| **Example** | `array_pop_front([4, 5, 6])` |
| **Result** | `[5, 6]` |
###### `array_push_front(list, element)` {#docs:stable:sql:functions:list::array_push_frontlist-element}
| | |
|:--|:--------|
| **Description** |Prepends `element` to `list`. |
| **Example** | `array_push_front([4, 5, 6], 3)` |
| **Result** | `[3, 4, 5, 6]` |
###### `array_to_string(list, delimiter)` {#docs:stable:sql:functions:list::array_to_stringlist-delimiter}
| | |
|:--|:--------|
| **Description** |Concatenates list/array elements using an optional `delimiter`. |
| **Example 1** | `array_to_string([1, 2, 3], '-')` |
| **Result** | `1-2-3` |
| **Example 2** | `array_to_string(['aa', 'bb', 'cc'], '')` |
| **Result** | `aabbcc` |
###### `array_to_string_comma_default(array)` {#docs:stable:sql:functions:list::array_to_string_comma_defaultarray}
| | |
|:--|:--------|
| **Description** |Concatenates list/array elements with a comma delimiter. |
| **Example** | `array_to_string_comma_default(['Banana', 'Apple', 'Melon'])` |
| **Result** | `Banana,Apple,Melon` |
###### `concat(value, ...)` {#docs:stable:sql:functions:list::concatvalue-}
| | |
|:--|:--------|
| **Description** |Concatenates multiple strings or lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| **Example 1** | `concat('Hello', ' ', 'World')` |
| **Result** | `Hello World` |
| **Example 2** | `concat([1, 2, 3], NULL, [4, 5, 6])` |
| **Result** | `[1, 2, 3, 4, 5, 6]` |
###### `contains(list, element)` {#docs:stable:sql:functions:list::containslist-element}
| | |
|:--|:--------|
| **Description** |Returns `true` if the `list` contains the `element`. |
| **Example** | `contains([1, 2, NULL], 1)` |
| **Result** | `true` |
###### `flatten(nested_list)` {#docs:stable:sql:functions:list::flattennested_list}
| | |
|:--|:--------|
| **Description** |[Flattens](#::flattening) a nested list by one level. |
| **Example** | `flatten([[1, 2, 3], [4, 5]])` |
| **Result** | `[1, 2, 3, 4, 5]` |
###### `generate_series(start[, stop][, step])` {#docs:stable:sql:functions:list::generate_seriesstart-stop-step}
| | |
|:--|:--------|
| **Description** |Creates a list of values between `start` and `stop`; the `stop` parameter is inclusive. |
| **Example** | `generate_series(2, 5, 3)` |
| **Result** | `[2, 5]` |
###### `length(list)` {#docs:stable:sql:functions:list::lengthlist}
| | |
|:--|:--------|
| **Description** |Returns the length of the `list`. |
| **Example** | `length([1,2,3])` |
| **Result** | `3` |
| **Aliases** | `char_length`, `character_length`, `len` |
###### `list_aggregate(list, function_name, ...)` {#docs:stable:sql:functions:list::list_aggregatelist-function_name-}
| | |
|:--|:--------|
| **Description** |Executes the aggregate function `function_name` on the elements of `list`. See the [List Aggregates](#::list-aggregates) section for more details. |
| **Example** | `list_aggregate([1, 2, NULL], 'min')` |
| **Result** | `1` |
| **Aliases** | `aggregate`, `array_aggr`, `array_aggregate`, `list_aggr` |
###### `list_any_value(list)` {#docs:stable:sql:functions:list::list_any_valuelist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`any_value`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_any_value([3,3,9])` |
| **Result** | `3` |
###### `list_append(list, element)` {#docs:stable:sql:functions:list::list_appendlist-element}
| | |
|:--|:--------|
| **Description** |Appends `element` to `list`. |
| **Example** | `list_append([2, 3], 4)` |
| **Result** | `[2, 3, 4]` |
| **Aliases** | `array_append`, `array_push_back` |
###### `list_approx_count_distinct(list)` {#docs:stable:sql:functions:list::list_approx_count_distinctlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`approx_count_distinct`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_approx_count_distinct([3,3,9])` |
| **Result** | `2` |
###### `list_avg(list)` {#docs:stable:sql:functions:list::list_avglist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`avg`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_avg([3,3,9])` |
| **Result** | `5.0` |
###### `list_bit_and(list)` {#docs:stable:sql:functions:list::list_bit_andlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`bit_and`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_bit_and([3,3,9])` |
| **Result** | `1` |
###### `list_bit_or(list)` {#docs:stable:sql:functions:list::list_bit_orlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`bit_or`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_bit_or([3,3,9])` |
| **Result** | `11` |
###### `list_bit_xor(list)` {#docs:stable:sql:functions:list::list_bit_xorlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`bit_xor`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_bit_xor([3,3,9])` |
| **Result** | `9` |
###### `list_bool_and(list)` {#docs:stable:sql:functions:list::list_bool_andlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`bool_and`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_bool_and([true, false])` |
| **Result** | `false` |
###### `list_bool_or(list)` {#docs:stable:sql:functions:list::list_bool_orlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`bool_or`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_bool_or([true, false])` |
| **Result** | `true` |
###### `list_concat(list_1, ..., list_n)` {#docs:stable:sql:functions:list::list_concatlist_1--list_n}
| | |
|:--|:--------|
| **Description** |Concatenates lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| **Example** | `list_concat([2, 3], [4, 5, 6], [7])` |
| **Result** | `[2, 3, 4, 5, 6, 7]` |
| **Aliases** | `list_cat`, `array_concat`, `array_cat` |
###### `list_contains(list, element)` {#docs:stable:sql:functions:list::list_containslist-element}
| | |
|:--|:--------|
| **Description** |Returns true if the list contains the element. |
| **Example** | `list_contains([1, 2, NULL], 1)` |
| **Result** | `true` |
| **Aliases** | `array_contains`, `array_has`, `list_has` |
###### `list_cosine_distance(list1, list2)` {#docs:stable:sql:functions:list::list_cosine_distancelist1-list2}
| | |
|:--|:--------|
| **Description** |Computes the cosine distance between two same-sized lists. |
| **Example** | `list_cosine_distance([1, 2, 3], [1, 2, 3])` |
| **Result** | `0.0` |
| **Alias** | `<=>` |
###### `list_cosine_similarity(list1, list2)` {#docs:stable:sql:functions:list::list_cosine_similaritylist1-list2}
| | |
|:--|:--------|
| **Description** |Computes the cosine similarity between two same-sized lists. |
| **Example** | `list_cosine_similarity([1, 2, 3], [1, 2, 3])` |
| **Result** | `1.0` |
###### `list_count(list)` {#docs:stable:sql:functions:list::list_countlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`count`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_count([3,3,9])` |
| **Result** | `3` |
###### `list_distance(list1, list2)` {#docs:stable:sql:functions:list::list_distancelist1-list2}
| | |
|:--|:--------|
| **Description** |Calculates the Euclidean distance between two points with coordinates given in two inputs lists of equal length. |
| **Example** | `list_distance([1, 2, 3], [1, 2, 5])` |
| **Result** | `2.0` |
| **Alias** | `<->` |
###### `list_distinct(list)` {#docs:stable:sql:functions:list::list_distinctlist}
| | |
|:--|:--------|
| **Description** |Removes all duplicates and `NULL` values from a list. Does not preserve the original order. |
| **Example** | `list_distinct([1, 1, NULL, -3, 1, 5])` |
| **Result** | `[5, -3, 1]` |
| **Alias** | `array_distinct` |
###### `list_entropy(list)` {#docs:stable:sql:functions:list::list_entropylist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`entropy`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_entropy([3,3,9])` |
| **Result** | `0.9182958340544893` |
###### `list_extract(list, index)` {#docs:stable:sql:functions:list::list_extractlist-index}
| | |
|:--|:--------|
| **Description** |Extract the `index`th (1-based) value from the list. |
| **Example** | `list_extract([4, 5, 6], 3)` |
| **Result** | `6` |
| **Alias** | `list_element` |
###### `list_filter(list, lambda(x))` {#docs:stable:sql:functions:list::list_filterlist-lambdax}
| | |
|:--|:--------|
| **Description** |Constructs a list from those elements of the input `list` for which the `lambda` function returns `true`. DuckDB must be able to cast the `lambda` function's return type to `BOOL`. The return type of `list_filter` is the same as the input list's. See [`list_filter` examples](#docs:stable:sql:functions:lambda::list_filter-examples). |
| **Example** | `list_filter([3, 4, 5], lambda x : x > 4)` |
| **Result** | `[5]` |
| **Aliases** | `array_filter`, `filter` |
###### `list_first(list)` {#docs:stable:sql:functions:list::list_firstlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`first`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_first([3,3,9])` |
| **Result** | `3` |
###### `list_grade_up(list[, col1][, col2])` {#docs:stable:sql:functions:list::list_grade_uplist-col1-col2}
| | |
|:--|:--------|
| **Description** |Works like [`list_sort`](#::list_sortlist-col1-col2), but the results are the indexes that correspond to the position in the original list instead of the actual values. |
| **Example** | `list_grade_up([3, 6, 1, 2])` |
| **Result** | `[3, 4, 1, 2]` |
| **Aliases** | `array_grade_up`, `grade_up` |
###### `list_has_all(list1, list2)` {#docs:stable:sql:functions:list::list_has_alllist1-list2}
| | |
|:--|:--------|
| **Description** |Returns true if all elements of list2 are in list1. NULLs are ignored. |
| **Example** | `list_has_all([1, 2, 3], [2, 3])` |
| **Result** | `true` |
| **Aliases** | `<@`, `@>`, `array_has_all` |
###### `list_has_any(list1, list2)` {#docs:stable:sql:functions:list::list_has_anylist1-list2}
| | |
|:--|:--------|
| **Description** |Returns true if the lists have any element in common. NULLs are ignored. |
| **Example** | `list_has_any([1, 2, 3], [2, 3, 4])` |
| **Result** | `true` |
| **Aliases** | `&&`, `array_has_any` |
###### `list_histogram(list)` {#docs:stable:sql:functions:list::list_histogramlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`histogram`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_histogram([3,3,9])` |
| **Result** | `{3=2, 9=1}` |
###### `list_inner_product(list1, list2)` {#docs:stable:sql:functions:list::list_inner_productlist1-list2}
| | |
|:--|:--------|
| **Description** |Computes the inner product between two same-sized lists. |
| **Example** | `list_inner_product([1, 2, 3], [1, 2, 3])` |
| **Result** | `14.0` |
| **Alias** | `list_dot_product` |
###### `list_intersect(list1, list2)` {#docs:stable:sql:functions:list::list_intersectlist1-list2}
| | |
|:--|:--------|
| **Description** |Returns a list of all the elements that exist in both `list1` and `list2`, without duplicates. |
| **Example** | `list_intersect([1, 2, 3], [2, 3, 4])` |
| **Result** | `[3, 2]` |
| **Alias** | `array_intersect` |
###### `list_kurtosis(list)` {#docs:stable:sql:functions:list::list_kurtosislist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`kurtosis`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_kurtosis([3,3,9])` |
| **Result** | `NULL` |
###### `list_kurtosis_pop(list)` {#docs:stable:sql:functions:list::list_kurtosis_poplist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`kurtosis_pop`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_kurtosis_pop([3,3,9])` |
| **Result** | `-1.4999999999999978` |
###### `list_last(list)` {#docs:stable:sql:functions:list::list_lastlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`last`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_last([3,3,9])` |
| **Result** | `9` |
###### `list_mad(list)` {#docs:stable:sql:functions:list::list_madlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`mad`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_mad([3,3,9])` |
| **Result** | `0.0` |
###### `list_max(list)` {#docs:stable:sql:functions:list::list_maxlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`max`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_max([3,3,9])` |
| **Result** | `9` |
###### `list_median(list)` {#docs:stable:sql:functions:list::list_medianlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`median`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_median([3,3,9])` |
| **Result** | `3.0` |
###### `list_min(list)` {#docs:stable:sql:functions:list::list_minlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`min`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_min([3,3,9])` |
| **Result** | `3` |
###### `list_mode(list)` {#docs:stable:sql:functions:list::list_modelist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`mode`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_mode([3,3,9])` |
| **Result** | `3` |
###### `list_negative_inner_product(list1, list2)` {#docs:stable:sql:functions:list::list_negative_inner_productlist1-list2}
| | |
|:--|:--------|
| **Description** |Computes the negative inner product between two same-sized lists. |
| **Example** | `list_negative_inner_product([1, 2, 3], [1, 2, 3])` |
| **Result** | `-14.0` |
| **Alias** | `list_negative_dot_product` |
###### `list_position(list, element)` {#docs:stable:sql:functions:list::list_positionlist-element}
| | |
|:--|:--------|
| **Description** |Returns the index of the `element` if the `list` contains the `element`. If the `element` is not found, it returns `NULL`. |
| **Example** | `list_position([1, 2, NULL], 2)` |
| **Result** | `2` |
| **Aliases** | `array_indexof`, `array_position`, `list_indexof` |
###### `list_prepend(element, list)` {#docs:stable:sql:functions:list::list_prependelement-list}
| | |
|:--|:--------|
| **Description** |Prepends `element` to `list`. |
| **Example** | `list_prepend(3, [4, 5, 6])` |
| **Result** | `[3, 4, 5, 6]` |
| **Alias** | `array_prepend` |
###### `list_product(list)` {#docs:stable:sql:functions:list::list_productlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`product`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_product([3,3,9])` |
| **Result** | `81.0` |
###### `list_reduce(list, lambda(x,y)[, initial_value])` {#docs:stable:sql:functions:list::list_reducelist-lambdaxy-initial_value}
| | |
|:--|:--------|
| **Description** |Reduces all elements of the input `list` into a single scalar value by executing the `lambda` function on a running result and the next list element. The `lambda` function has an optional `initial_value` argument. See [`list_reduce` examples](#docs:stable:sql:functions:lambda::list_reduce-examples). |
| **Example** | `list_reduce([1, 2, 3], lambda x, y : x + y)` |
| **Result** | `6` |
| **Aliases** | `array_reduce`, `reduce` |
###### `list_resize(list, size[[, value]])` {#docs:stable:sql:functions:list::list_resizelist-size-value}
| | |
|:--|:--------|
| **Description** |Resizes the `list` to contain `size` elements. Initializes new elements with `value` or `NULL` if `value` is not set. |
| **Example** | `list_resize([1, 2, 3], 5, 0)` |
| **Result** | `[1, 2, 3, 0, 0]` |
| **Alias** | `array_resize` |
###### `list_reverse(list)` {#docs:stable:sql:functions:list::list_reverselist}
| | |
|:--|:--------|
| **Description** |Reverses the `list`. |
| **Example** | `list_reverse([3, 6, 1, 2])` |
| **Result** | `[2, 1, 6, 3]` |
| **Alias** | `array_reverse` |
###### `list_reverse_sort(list[, col1])` {#docs:stable:sql:functions:list::list_reverse_sortlist-col1}
| | |
|:--|:--------|
| **Description** |Sorts the elements of the list in reverse order. See the [Sorting Lists](#::sorting-lists) section for more details about sorting order and `NULL` values. |
| **Example** | `list_reverse_sort([3, 6, 1, 2])` |
| **Result** | `[6, 3, 2, 1]` |
| **Alias** | `array_reverse_sort` |
###### `list_select(value_list, index_list)` {#docs:stable:sql:functions:list::list_selectvalue_list-index_list}
| | |
|:--|:--------|
| **Description** |Returns a list based on the elements selected by the `index_list`. |
| **Example** | `list_select([10, 20, 30, 40], [1, 4])` |
| **Result** | `[10, 40]` |
| **Alias** | `array_select` |
###### `list_sem(list)` {#docs:stable:sql:functions:list::list_semlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`sem`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_sem([3,3,9])` |
| **Result** | `1.6329931618554523` |
###### `list_skewness(list)` {#docs:stable:sql:functions:list::list_skewnesslist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`skewness`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_skewness([3,3,9])` |
| **Result** | `1.7320508075688796` |
###### `list_slice(list, begin, end)` {#docs:stable:sql:functions:list::list_slicelist-begin-end}
| | |
|:--|:--------|
| **Description** |Extracts a sublist or substring using [slice conventions](#docs:stable:sql:functions:list::slicing). Negative values are accepted. |
| **Example** | `list_slice([4, 5, 6], 2, 3)` |
| **Result** | `[5, 6]` |
| **Alias** | `array_slice` |
###### `list_slice(list, begin, end, step)` {#docs:stable:sql:functions:list::list_slicelist-begin-end-step}
| | |
|:--|:--------|
| **Description** |`list_slice` with an added `step` parameter. |
| **Example** | `list_slice([4, 5, 6], 1, 3, 2)` |
| **Result** | `[4, 6]` |
| **Alias** | `array_slice` |
###### `list_sort(list[, col1][, col2])` {#docs:stable:sql:functions:list::list_sortlist-col1-col2}
| | |
|:--|:--------|
| **Description** |Sorts the elements of the list. See the [Sorting Lists](#::sorting-lists) section for more details about sorting order and `NULL` values. |
| **Example** | `list_sort([3, 6, 1, 2])` |
| **Result** | `[1, 2, 3, 6]` |
| **Alias** | `array_sort` |
###### `list_stddev_pop(list)` {#docs:stable:sql:functions:list::list_stddev_poplist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`stddev_pop`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_stddev_pop([3,3,9])` |
| **Result** | `2.8284271247461903` |
###### `list_stddev_samp(list)` {#docs:stable:sql:functions:list::list_stddev_samplist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`stddev_samp`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_stddev_samp([3,3,9])` |
| **Result** | `3.4641016151377544` |
###### `list_string_agg(list)` {#docs:stable:sql:functions:list::list_string_agglist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`string_agg`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_string_agg([3,3,9])` |
| **Result** | `3,3,9` |
###### `list_sum(list)` {#docs:stable:sql:functions:list::list_sumlist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`sum`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_sum([3,3,9])` |
| **Result** | `15` |
###### `list_transform(list, lambda(x))` {#docs:stable:sql:functions:list::list_transformlist-lambdax}
| | |
|:--|:--------|
| **Description** |Returns a list that is the result of applying the `lambda` function to each element of the input `list`. The return type is defined by the return type of the `lambda` function. See [`list_transform` examples](#docs:stable:sql:functions:lambda::list_transform-examples). |
| **Example** | `list_transform([1, 2, 3], lambda x : x + 1)` |
| **Result** | `[2, 3, 4]` |
| **Aliases** | `apply`, `array_apply`, `array_transform`, `list_apply` |
###### `list_unique(list)` {#docs:stable:sql:functions:list::list_uniquelist}
| | |
|:--|:--------|
| **Description** |Counts the unique elements of a `list`. |
| **Example** | `list_unique([1, 1, NULL, -3, 1, 5])` |
| **Result** | `3` |
| **Alias** | `array_unique` |
###### `list_value(arg, ...)` {#docs:stable:sql:functions:list::list_valuearg-}
| | |
|:--|:--------|
| **Description** |Creates a LIST containing the argument values. |
| **Example** | `list_value(4, 5, 6)` |
| **Result** | `[4, 5, 6]` |
| **Alias** | `list_pack` |
###### `list_var_pop(list)` {#docs:stable:sql:functions:list::list_var_poplist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`var_pop`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_var_pop([3,3,9])` |
| **Result** | `8.0` |
###### `list_var_samp(list)` {#docs:stable:sql:functions:list::list_var_samplist}
| | |
|:--|:--------|
| **Description** |Applies aggregate function [`var_samp`](#docs:stable:sql:functions:aggregates::general-aggregate-functions) to the `list`. |
| **Example** | `list_var_samp([3,3,9])` |
| **Result** | `12.0` |
###### `list_where(value_list, mask_list)` {#docs:stable:sql:functions:list::list_wherevalue_list-mask_list}
| | |
|:--|:--------|
| **Description** |Returns a list with the `BOOLEAN`s in `mask_list` applied as a mask to the `value_list`. |
| **Example** | `list_where([10, 20, 30, 40], [true, false, false, true])` |
| **Result** | `[10, 40]` |
| **Alias** | `array_where` |
###### `list_zip(list_1, ..., list_n[, truncate])` {#docs:stable:sql:functions:list::list_ziplist_1--list_n-truncate}
| | |
|:--|:--------|
| **Description** |Zips n `LIST`s to a new `LIST` whose length will be that of the longest list. Its elements are structs of n elements from each list `list_1`, ..., `list_n`; missing elements are replaced with `NULL`. If `truncate` is set, all lists are truncated to the smallest list length. |
| **Example 1** | `list_zip([1, 2], [3, 4], [5, 6])` |
| **Result** | `[(1, 3, 5), (2, 4, 6)]` |
| **Example 2** | `list_zip([1, 2], [3, 4], [5, 6, 7])` |
| **Result** | `[(1, 3, 5), (2, 4, 6), (NULL, NULL, 7)]` |
| **Example 3** | `list_zip([1, 2], [3, 4], [5, 6, 7], true)` |
| **Result** | `[(1, 3, 5), (2, 4, 6)]` |
| **Alias** | `array_zip` |
###### `range(start[, stop][, step])` {#docs:stable:sql:functions:list::rangestart-stop-step}
| | |
|:--|:--------|
| **Description** |Creates a list of values between `start` and `stop`; the `stop` parameter is exclusive. |
| **Example** | `range(2, 5, 3)` |
| **Result** | `[2]` |
###### `repeat(list, count)` {#docs:stable:sql:functions:list::repeatlist-count}
| | |
|:--|:--------|
| **Description** |Repeats the `list` `count` number of times. |
| **Example** | `repeat([1, 2, 3], 5)` |
| **Result** | `[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]` |
###### `unnest(list)` {#docs:stable:sql:functions:list::unnestlist}
| | |
|:--|:--------|
| **Description** |Unnests a list by one level. Note that this is a special function that alters the cardinality of the result. See the [unnest page](#docs:stable:sql:query_syntax:unnest) for more details. |
| **Example** | `unnest([1, 2, 3])` |
| **Result** | Multiple rows: `1`, `2`, `3` |
###### `unpivot_list(arg, ...)` {#docs:stable:sql:functions:list::unpivot_listarg-}
| | |
|:--|:--------|
| **Description** |Identical to `list_value`, but generated as part of `UNPIVOT` for better error messages. |
| **Example** | `unpivot_list(4, 5, 6)` |
| **Result** | `[4, 5, 6]` |
#### List Operators {#docs:stable:sql:functions:list::list-operators}
The following operators are supported for lists:
| Operator | Description | Example | Result |
|-|--|---|-|
| `&&` | Alias for [`list_has_any`](#::list_has_anylist1-list2). | `[1, 2, 3, 4, 5] && [2, 5, 5, 6]` | `true` |
| `@>` | Alias for [`list_has_all`](#::list_has_alllist1-list2), where the list on the **right** of the operator is the sublist. | `[1, 2, 3, 4] @> [3, 4, 3]` | `true` |
| `<@` | Alias for [`list_has_all`](#::list_has_alllist1-list2), where the list on the **left** of the operator is the sublist. | `[1, 4] <@ [1, 2, 3, 4]` | `true` |
| `||` | Similar to [`list_concat`](#::list_concatlist_1--list_n), except any `NULL` input results in `NULL`. | `[1, 2, 3] || [4, 5, 6]` | `[1, 2, 3, 4, 5, 6]` |
| `<=>` | Alias for [`list_cosine_distance`](#::list_cosine_distancelist1-list2). | `[1, 2, 3] <=> [1, 2, 5]` | `0.007416606` |
| `<->` | Alias for [`list_distance`](#::list_distancelist1-list2). | `[1, 2, 3] <-> [1, 2, 5]` | `2.0` |
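As a quick illustration (a minimal sketch using literal lists), the operators can be combined freely within a single expression:
```sql
-- Containment, overlap, and concatenation with list operators
SELECT [1, 2, 3, 4] @> [2, 3] AS contains_all,
       [1, 2, 3, 4] && [4, 5] AS has_overlap,
       [1, 2] || [3, 4]       AS concatenated;
-- Expected: true, true, [1, 2, 3, 4]
```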
#### List Comprehension {#docs:stable:sql:functions:list::list-comprehension}
Python-style list comprehension can be used to compute expressions over elements in a list. For example:
```sql
SELECT [lower(x) FOR x IN strings] AS strings
FROM (VALUES (['Hello', '', 'World'])) t(strings);
```
| strings |
|------------------|
| [hello, , world] |
```sql
SELECT [upper(x) FOR x IN strings IF len(x) > 0] AS strings
FROM (VALUES (['Hello', '', 'World'])) t(strings);
```
| strings |
|----------------|
| [HELLO, WORLD] |
List comprehensions can also use the position of the list elements by adding a second variable.
In the following example, we use `x, i`, where `x` is the value and `i` is the position:
```sql
SELECT [4, 5, 6] AS l, [x FOR x, i IN l IF i != 2] AS filtered;
```
| l | filtered |
|-----------|----------|
| [4, 5, 6] | [4, 6] |
Under the hood, `[f(x) FOR x IN y IF g(x)]` is translated to `list_transform(list_filter(y, x -> g(x)), x -> f(x))`.
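For instance, the following query places a comprehension next to its hand-written `list_filter`/`list_transform` equivalent; both columns produce the same list:
```sql
SELECT
    [x + 1 FOR x IN l IF x > 1] AS comprehension,
    list_transform(list_filter(l, x -> x > 1), x -> x + 1) AS rewritten
FROM (VALUES ([1, 2, 3])) t(l);
```
| comprehension | rewritten |
|---------------|-----------|
| [3, 4]        | [3, 4]    |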
#### Range Functions {#docs:stable:sql:functions:list::range-functions}
DuckDB offers two range functions, [`range(start, stop, step)`](#::range) and [`generate_series(start, stop, step)`](#::generate_series), and their variants with default arguments for `stop` and `step`. The two functions differ in whether their `stop` argument is inclusive, as documented below.
##### `range` {#docs:stable:sql:functions:list::range}
The `range` function creates a list of values in the range between `start` and `stop`.
The `start` parameter is inclusive, while the `stop` parameter is exclusive.
The default value of `start` is 0 and the default value of `step` is 1.
Based on the number of arguments, the following variants of `range` exist.
###### `range(stop)` {#docs:stable:sql:functions:list::rangestop}
```sql
SELECT range(5);
```
```text
[0, 1, 2, 3, 4]
```
###### `range(start, stop)` {#docs:stable:sql:functions:list::rangestart-stop}
```sql
SELECT range(2, 5);
```
```text
[2, 3, 4]
```
###### `range(start, stop, step)` {#docs:stable:sql:functions:list::rangestart-stop-step}
```sql
SELECT range(2, 5, 3);
```
```text
[2]
```
##### `generate_series` {#docs:stable:sql:functions:list::generate_series}
The `generate_series` function creates a list of values in the range between `start` and `stop`.
Both the `start` and the `stop` parameters are inclusive.
The default value of `start` is 0 and the default value of `step` is 1.
Based on the number of arguments, the following variants of `generate_series` exist.
###### `generate_series(stop)` {#docs:stable:sql:functions:list::generate_seriesstop}
```sql
SELECT generate_series(5);
```
```text
[0, 1, 2, 3, 4, 5]
```
###### `generate_series(start, stop)` {#docs:stable:sql:functions:list::generate_seriesstart-stop}
```sql
SELECT generate_series(2, 5);
```
```text
[2, 3, 4, 5]
```
###### `generate_series(start, stop, step)` {#docs:stable:sql:functions:list::generate_seriesstart-stop-step}
```sql
SELECT generate_series(2, 5, 3);
```
```text
[2, 5]
```
###### `generate_subscripts(arr, dim)` {#docs:stable:sql:functions:list::generate_subscriptsarr-dim}
The `generate_subscripts(arr, dim)` function generates indexes along the `dim`th dimension of array `arr`.
```sql
SELECT generate_subscripts([4, 5, 6], 1) AS i;
```
| i |
|--:|
| 1 |
| 2 |
| 3 |
##### Date Ranges {#docs:stable:sql:functions:list::date-ranges}
Date ranges are also supported for `TIMESTAMP` and `TIMESTAMP WITH TIME ZONE` values.
Note that for these types, the `stop` and `step` arguments have to be specified explicitly (a default value is not provided).
###### `range` for Date Ranges {#docs:stable:sql:functions:list::range-for-date-ranges}
```sql
SELECT *
FROM range(DATE '1992-01-01', DATE '1992-03-01', INTERVAL '1' MONTH);
```
| range |
|---------------------|
| 1992-01-01 00:00:00 |
| 1992-02-01 00:00:00 |
###### `generate_series` for Date Ranges {#docs:stable:sql:functions:list::generate_series-for-date-ranges}
```sql
SELECT *
FROM generate_series(DATE '1992-01-01', DATE '1992-03-01', INTERVAL '1' MONTH);
```
| generate_series |
|---------------------|
| 1992-01-01 00:00:00 |
| 1992-02-01 00:00:00 |
| 1992-03-01 00:00:00 |
#### Slicing {#docs:stable:sql:functions:list::slicing}
The function [`list_slice`](#::list_slicelist-begin-end) can be used to extract a sublist from a list. The following variants exist:
* `list_slice(list, begin, end)`
* `list_slice(list, begin, end, step)`
* `array_slice(list, begin, end)`
* `array_slice(list, begin, end, step)`
* `list[begin:end]`
* `list[begin:end:step]`
The arguments are as follows:
* `list`
* Is the list to be sliced
* `begin`
* Is the index of the first element to be included in the slice
* When `begin < 0` the index is counted from the end of the list
* When `begin < 0` and `-begin > length`, `begin` is clamped to the beginning of the list
* When `begin > length`, the result is an empty list
* **Bracket Notation:** When `begin` is omitted, it defaults to the beginning of the list
* `end`
* Is the index of the last element to be included in the slice
* When `end < 0` the index is counted from the end of the list
* When `end > length`, end is clamped to `length`
* When `end < begin`, the result is an empty list
* **Bracket Notation:** When `end` is omitted, it defaults to the end of the list. When `end` is omitted and a `step` is provided, `end` must be replaced with a `-`
* `step` *(optional)*
* Is the step size between elements in the slice
* When `step < 0` the slice is reversed, and `begin` and `end` are swapped
* Must be non-zero
Examples:
```sql
SELECT list_slice([1, 2, 3, 4, 5], 2, 4);
```
```text
[2, 3, 4]
```
```sql
SELECT ([1, 2, 3, 4, 5])[2:4:2];
```
```text
[2, 4]
```
```sql
SELECT([1, 2, 3, 4, 5])[4:2:-2];
```
```text
[4, 2]
```
```sql
SELECT ([1, 2, 3, 4, 5])[:];
```
```text
[1, 2, 3, 4, 5]
```
```sql
SELECT ([1, 2, 3, 4, 5])[:-:2];
```
```text
[1, 3, 5]
```
```sql
SELECT ([1, 2, 3, 4, 5])[:-:-2];
```
```text
[5, 3, 1]
```
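A small sketch combining the negative-index and clamping rules listed above (following the semantics described for `begin` and `end`):
```sql
SELECT
    list_slice([1, 2, 3, 4, 5], -3, -1) AS from_end, -- indexes counted from the end of the list
    list_slice([1, 2, 3, 4, 5], -10, 2) AS clamped;  -- begin clamped to the start of the list
```
| from_end  | clamped |
|-----------|---------|
| [3, 4, 5] | [1, 2]  |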
#### List Aggregates {#docs:stable:sql:functions:list::list-aggregates}
The function [`list_aggregate`](#::list_aggregatelist-function_name-) allows the execution of arbitrary existing aggregate functions on the elements of a list. Its first argument is the list (column), and its second argument is the aggregate function name, e.g., `min`, `histogram` or `sum`.
`list_aggregate` accepts additional arguments after the aggregate function name. These extra arguments are passed directly to the aggregate function named by the second argument of `list_aggregate`.
Order-sensitive aggregate functions are applied in the order of the list. The `ORDER BY`, `DISTINCT` and `FILTER` clauses are not supported by `list_aggregate`.
They may instead be emulated using `list_sort`, `list_grade_up`, `list_select`, `list_distinct` and `list_filter`.
```sql
SELECT list_aggregate([1, 2, -4, NULL], 'min');
```
```text
-4
```
```sql
SELECT list_aggregate([2, 4, 8, 42], 'sum');
```
```text
56
```
```sql
SELECT list_aggregate([[1, 2], [NULL], [2, 10, 3]], 'last');
```
```text
[2, 10, 3]
```
```sql
SELECT list_aggregate([2, 4, 8, 42], 'string_agg', '|');
```
```text
2|4|8|42
```
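Since `ORDER BY` is not supported inside `list_aggregate`, an order-sensitive aggregate such as `string_agg` can be made deterministic by sorting the list first, for example:
```sql
SELECT list_aggregate(list_sort([42, 8, 2, 4]), 'string_agg', '|');
```
```text
2|4|8|42
```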
##### `list_*` Rewrite Functions {#docs:stable:sql:functions:list::list_-rewrite-functions}
The following is a list of existing rewrites. Rewrites simplify the use of the list aggregate function by only taking the list (column) as their argument. `list_avg`, `list_var_samp`, `list_var_pop`, `list_stddev_pop`, `list_stddev_samp`, `list_sem`, `list_approx_count_distinct`, `list_bit_xor`, `list_bit_or`, `list_bit_and`, `list_bool_and`, `list_bool_or`, `list_count`, `list_entropy`, `list_last`, `list_first`, `list_kurtosis`, `list_kurtosis_pop`, `list_min`, `list_max`, `list_product`, `list_skewness`, `list_sum`, `list_string_agg`, `list_mode`, `list_median`, `list_mad` and `list_histogram`.
```sql
SELECT list_min([1, 2, -4, NULL]);
```
```text
-4
```
```sql
SELECT list_sum([2, 4, 8, 42]);
```
```text
56
```
```sql
SELECT list_last([[1, 2], [NULL], [2, 10, 3]]);
```
```text
[2, 10, 3]
```
###### `array_to_string` {#docs:stable:sql:functions:list::array_to_string}
Concatenates list/array elements using an optional delimiter.
```sql
SELECT array_to_string([1, 2, 3], '-') AS str;
```
```text
1-2-3
```
This is equivalent to the following SQL:
```sql
SELECT list_aggr([1, 2, 3], 'string_agg', '-') AS str;
```
```text
1-2-3
```
#### Sorting Lists {#docs:stable:sql:functions:list::sorting-lists}
The function `list_sort` sorts the elements of a list either in ascending or descending order.
In addition, it allows specifying whether `NULL` values should be moved to the beginning or to the end of the list.
It has the same sorting behavior as DuckDB's `ORDER BY` clause.
Therefore, (nested) values compare the same in `list_sort` as in `ORDER BY`.
By default, if no modifiers are provided, DuckDB sorts `ASC NULLS FIRST`.
I.e., the values are sorted in ascending order and `NULL` values are placed first.
This is identical to the default sort order of SQLite.
The default sort order can be changed using [`PRAGMA` statements](#docs:stable:sql:query_syntax:orderby).
`list_sort` leaves it open to the user whether they want to use the default sort order or a custom order.
`list_sort` takes up to two additional optional parameters.
The second parameter provides the sort order and can be either `ASC` or `DESC`.
The third parameter provides the `NULL` order and can be either `NULLS FIRST` or `NULLS LAST`.
This query uses the default sort order and the default `NULL` order.
```sql
SELECT list_sort([1, 3, NULL, 5, NULL, -5]);
```
```text
[NULL, NULL, -5, 1, 3, 5]
```
This query provides the sort order.
The `NULL` order uses the configurable default value.
```sql
SELECT list_sort([1, 3, NULL, 2], 'ASC');
```
```text
[NULL, 1, 2, 3]
```
This query provides both the sort order and the `NULL` order.
```sql
SELECT list_sort([1, 3, NULL, 2], 'DESC', 'NULLS FIRST');
```
```text
[NULL, 3, 2, 1]
```
`list_reverse_sort` has an optional second parameter providing the `NULL` sort order.
It can be either `NULLS FIRST` or `NULLS LAST`.
This query uses the default `NULL` sort order.
```sql
SELECT list_sort([1, 3, NULL, 5, NULL, -5]);
```
```text
[NULL, NULL, -5, 1, 3, 5]
```
This query provides the `NULL` sort order.
```sql
SELECT list_reverse_sort([1, 3, NULL, 2], 'NULLS LAST');
```
```text
[3, 2, 1, NULL]
```
#### Flattening {#docs:stable:sql:functions:list::flattening}
The flatten function is a scalar function that converts a list of lists into a single list by concatenating each sub-list together.
Note that this only flattens one level at a time, not all levels of sub-lists.
Convert a list of lists into a single list:
```sql
SELECT
flatten([
[1, 2],
[3, 4]
]);
```
```text
[1, 2, 3, 4]
```
If the list has multiple levels of lists, only the first level of sub-lists is concatenated into a single list:
```sql
SELECT
flatten([
[
[1, 2],
[3, 4],
],
[
[5, 6],
[7, 8],
]
]);
```
```text
[[1, 2], [3, 4], [5, 6], [7, 8]]
```
In general, the input to the flatten function should be a list of lists (not a single-level list).
However, the flatten function has specific behavior when handling empty lists and `NULL` values.
If the input list is empty, return an empty list:
```sql
SELECT flatten([]);
```
```text
[]
```
If the entire input to flatten is `NULL`, return `NULL`:
```sql
SELECT flatten(NULL);
```
```text
NULL
```
If a list whose only entry is `NULL` is flattened, return an empty list:
```sql
SELECT flatten([NULL]);
```
```text
[]
```
If the sub-list in a list of lists only contains `NULL`, do not modify the sub-list:
```sql
-- (Note the extra set of brackets vs. the prior example)
SELECT flatten([[NULL]]);
```
```text
[NULL]
```
Even if the only contents of each sub-list is `NULL`, the sub-lists are still concatenated together. Note that no de-duplication occurs when flattening; see the `list_distinct` function for de-duplication:
```sql
SELECT flatten([[NULL], [NULL]]);
```
```text
[NULL, NULL]
```
#### Lambda Functions {#docs:stable:sql:functions:list::lambda-functions}
DuckDB supports lambda functions in the form `(parameter1, parameter2, ...) -> expression`.
For details, see the [lambda functions page](#docs:stable:sql:functions:lambda).
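For instance, a lambda can be passed to `list_transform` (a minimal example; see the lambda functions page for the full syntax and further lambda-accepting functions):
```sql
SELECT list_transform([1, 2, 3], x -> x * 2) AS doubled;
```
```text
[2, 4, 6]
```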
#### Related Functions {#docs:stable:sql:functions:list::related-functions}
* The [aggregate functions](#docs:stable:sql:functions:aggregates) `list` and `histogram` produce lists and lists of structs.
* The [`unnest` function](#docs:stable:sql:query_syntax:unnest) is used to unnest a list by one level.
### Map Functions {#docs:stable:sql:functions:map}
| Name | Description |
|:--|:-------|
| [`cardinality(map)`](#::cardinalitymap) | Return the size of the map (or the number of entries in the map). |
| [`element_at(map, key)`](#::element_atmap-key) | Return the value for a given `key` as a list, or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| [`map_concat(maps...)`](#::map_concatmaps) | Returns a map created from merging the input `maps`. On key collision the value is taken from the last map with that key. |
| [`map_contains(map, key)`](#::map_containsmap-key) | Checks if a map contains a given key. |
| [`map_contains_entry(map, key, value)`](#::map_contains_entrymap-key-value) | Check if a map contains a given key-value pair. |
| [`map_contains_value(map, value)`](#::map_contains_valuemap-value) | Checks if a map contains a given value. |
| [`map_entries(map)`](#::map_entriesmap) | Return a list of struct(k, v) for each key-value pair in the map. |
| [`map_extract(map, key)`](#::map_extractmap-key) | Return the value for a given `key` as a list, or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| [`map_extract_value(map, key)`](#::map_extract_valuemap-key) | Returns the value for a given `key` or `NULL` if the `key` is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| [`map_from_entries(STRUCT(k, v)[])`](#map_from_entriesstructk-v) | Returns a map created from the entries of the array. |
| [`map_keys(map)`](#::map_keysmap) | Return a list of all keys in the map. |
| [`map_values(map)`](#::map_valuesmap) | Return a list of all values in the map. |
| [`map()`](#::map) | Returns an empty map. |
| [`map[entry]`](#mapentry) | Returns the value for a given `key` or `NULL` if the `key` is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
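For example, a map literal can be combined with the extraction functions above (a small sketch; the result types follow the descriptions in the table):
```sql
SELECT
    (MAP {'key1': 10, 'key2': 20})['key2'] AS bracket_value,             -- the value itself
    map_extract(MAP {'key1': 10, 'key2': 20}, 'key2') AS extracted_list; -- the value wrapped in a list
```
| bracket_value | extracted_list |
|--------------:|----------------|
| 20 | [20] |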
###### `cardinality(map)` {#docs:stable:sql:functions:map::cardinalitymap}
| | |
|:--|:--------|
| **Description** |Return the size of the map (or the number of entries in the map). |
| **Example** | `cardinality(map([4, 2], ['a', 'b']))` |
| **Result** | `2` |
###### `element_at(map, key)` {#docs:stable:sql:functions:map::element_atmap-key}
| | |
|:--|:--------|
| **Description** |Return the value for a given `key` as a list, or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| **Example** | `element_at(map([100, 5], [42, 43]), 100)` |
| **Result** | `[42]` |
| **Aliases** | `map_extract(map, key)` |
###### `map_concat(maps...)` {#docs:stable:sql:functions:map::map_concatmaps}
| | |
|:--|:--------|
| **Description** |Returns a map created from merging the input `maps`. On key collision the value is taken from the last map with that key. |
| **Example** | `map_concat(MAP {'key1': 10, 'key2': 20}, MAP {'key3': 30}, MAP {'key2': 5})` |
| **Result** | `{key1=10, key2=5, key3=30}` |
###### `map_contains(map, key)` {#docs:stable:sql:functions:map::map_containsmap-key}
| | |
|:--|:--------|
| **Description** |Checks if a map contains a given key. |
| **Example** | `map_contains(MAP {'key1': 10, 'key2': 20, 'key3': 30}, 'key2')` |
| **Result** | `true` |
###### `map_contains_entry(map, key, value)` {#docs:stable:sql:functions:map::map_contains_entrymap-key-value}
| | |
|:--|:--------|
| **Description** |Check if a map contains a given key-value pair. |
| **Example** | `map_contains_entry(MAP {'key1': 10, 'key2': 20, 'key3': 30}, 'key2', 20)` |
| **Result** | `true` |
###### `map_contains_value(map, value)` {#docs:stable:sql:functions:map::map_contains_valuemap-value}
| | |
|:--|:--------|
| **Description** |Checks if a map contains a given value. |
| **Example** | `map_contains_value(MAP {'key1': 10, 'key2': 20, 'key3': 30}, 20)` |
| **Result** | `true` |
###### `map_entries(map)` {#docs:stable:sql:functions:map::map_entriesmap}
| | |
|:--|:--------|
| **Description** |Return a list of struct(k, v) for each key-value pair in the map. |
| **Example** | `map_entries(map([100, 5], [42, 43]))` |
| **Result** | `[{'key': 100, 'value': 42}, {'key': 5, 'value': 43}]` |
###### `map_extract(map, key)` {#docs:stable:sql:functions:map::map_extractmap-key}
| | |
|:--|:--------|
| **Description** |Return the value for a given `key` as a list, or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| **Example** | `map_extract(map([100, 5], [42, 43]), 100)` |
| **Result** | `[42]` |
| **Aliases** | `element_at(map, key)` |
###### `map_extract_value(map, key)` {#docs:stable:sql:functions:map::map_extract_valuemap-key}
| | |
|:--|:--------|
| **Description** |Returns the value for a given `key` or `NULL` if the `key` is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| **Example** | `map_extract_value(map([100, 5], [42, 43]), 100);` |
| **Result** | `42` |
| **Aliases** | `map[key]` |
###### `map_from_entries(STRUCT(k, v)[])` {#docs:stable:sql:functions:map::map_from_entriesstructk-v}
| | |
|:--|:--------|
| **Description** |Returns a map created from the entries of the array. |
| **Example** | `map_from_entries([{k: 5, v: 'val1'}, {k: 3, v: 'val2'}])` |
| **Result** | `{5=val1, 3=val2}` |
###### `map_keys(map)` {#docs:stable:sql:functions:map::map_keysmap}
| | |
|:--|:--------|
| **Description** |Return a list of all keys in the map. |
| **Example** | `map_keys(map([100, 5], [42,43]))` |
| **Result** | `[100, 5]` |
###### `map_values(map)` {#docs:stable:sql:functions:map::map_valuesmap}
| | |
|:--|:--------|
| **Description** |Return a list of all values in the map. |
| **Example** | `map_values(map([100, 5], [42, 43]))` |
| **Result** | `[42, 43]` |
###### `map()` {#docs:stable:sql:functions:map::map}
| | |
|:--|:--------|
| **Description** |Returns an empty map. |
| **Example** | `map()` |
| **Result** | `{}` |
###### `map[entry]` {#docs:stable:sql:functions:map::mapentry}
| | |
|:--|:--------|
| **Description** |Returns the value for a given `key` or `NULL` if the `key` is not contained in the map. The type of the key provided in the second parameter must match the type of the map's keys; else, an error is thrown. |
| **Example** | `map([100, 5], ['a', 'b'])[100]` |
| **Result** | `a` |
| **Aliases** | `map_extract_value(map, key)` |
### Nested Functions {#docs:stable:sql:functions:nested}
There are five [nested data types](#docs:stable:sql:data_types:overview::nested--composite-types):
| Name | Type page | Functions page |
|--|---|---|
| `ARRAY` | [`ARRAY` type](#docs:stable:sql:data_types:array) | [`ARRAY` functions](#docs:stable:sql:functions:array) |
| `LIST` | [`LIST` type](#docs:stable:sql:data_types:list) | [`LIST` functions](#docs:stable:sql:functions:list) |
| `MAP` | [`MAP` type](#docs:stable:sql:data_types:map) | [`MAP` functions](#docs:stable:sql:functions:map) |
| `STRUCT` | [`STRUCT` type](#docs:stable:sql:data_types:struct) | [`STRUCT` functions](#docs:stable:sql:functions:struct) |
| `UNION` | [`UNION` type](#docs:stable:sql:data_types:union) | [`UNION` functions](#docs:stable:sql:functions:union) |
### Numeric Functions {#docs:stable:sql:functions:numeric}
#### Numeric Operators {#docs:stable:sql:functions:numeric::numeric-operators}
The table below shows the available mathematical operators for [numeric types](#docs:stable:sql:data_types:numeric).
| Operator | Description | Example | Result |
|-|-----|--|-|
| `+` | Addition | `2 + 3` | `5` |
| `-` | Subtraction | `2 - 3` | `-1` |
| `*` | Multiplication | `2 * 3` | `6` |
| `/` | Float division | `5 / 2` | `2.5` |
| `//` | Division | `5 // 2` | `2` |
| `%` | Modulo (remainder) | `5 % 4` | `1` |
| `**` | Exponent | `3 ** 4` | `81` |
| `^` | Exponent (alias for `**`) | `3 ^ 4` | `81` |
| `&` | Bitwise AND | `91 & 15` | `11` |
| `|` | Bitwise OR | `32 | 3` | `35` |
| `<<` | Bitwise shift left | `1 << 4` | `16` |
| `>>` | Bitwise shift right | `8 >> 2` | `2` |
| `~` | Bitwise negation | `~15` | `-16` |
| `!` | Factorial of `x` | `4!` | `24` |
##### Division and Modulo Operators {#docs:stable:sql:functions:numeric::division-and-modulo-operators}
There are two division operators: `/` and `//`.
They are equivalent when at least one of the operands is a `FLOAT` or a `DOUBLE`.
When both operands are integers, `/` performs floating point division (`5 / 2 = 2.5`), while `//` performs integer division (`5 // 2 = 2`).
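For example, applying both operators to the same integer operands:
```sql
SELECT 5 / 2 AS float_division, 5 // 2 AS integer_division;
```
| float_division | integer_division |
|---------------:|-----------------:|
| 2.5 | 2 |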
##### Supported Types {#docs:stable:sql:functions:numeric::supported-types}
The modulo, bitwise, negation, and factorial operators work only on integral data types,
whereas the others are available for all numeric data types.
#### Numeric Functions {#docs:stable:sql:functions:numeric::numeric-functions}
The table below shows the available mathematical functions.
| Name | Description |
|:--|:-------|
| [`@(x)`](#::x) | Absolute value. Parentheses are optional if `x` is a column name. |
| [`abs(x)`](#::absx) | Absolute value. |
| [`acos(x)`](#::acosx) | Computes the inverse cosine of `x`. |
| [`acosh(x)`](#::acoshx) | Computes the inverse hyperbolic cosine of `x`. |
| [`add(x, y)`](#::addx-y) | Alias for `x + y`. |
| [`asin(x)`](#::asinx) | Computes the inverse sine of `x`. |
| [`asinh(x)`](#::asinhx) | Computes the inverse hyperbolic sine of `x`. |
| [`atan(x)`](#::atanx) | Computes the inverse tangent of `x`. |
| [`atanh(x)`](#::atanhx) | Computes the inverse hyperbolic tangent of `x`. |
| [`atan2(y, x)`](#::atan2y-x) | Computes the inverse tangent of `(y, x)`. |
| [`bit_count(x)`](#::bit_countx) | Returns the number of bits that are set. |
| [`cbrt(x)`](#::cbrtx) | Returns the cube root of the number. |
| [`ceil(x)`](#::ceilx) | Rounds the number up. |
| [`ceiling(x)`](#::ceilingx) | Rounds the number up. Alias of `ceil`. |
| [`cos(x)`](#::cosx) | Computes the cosine of `x`. |
| [`cot(x)`](#::cotx) | Computes the cotangent of `x`. |
| [`degrees(x)`](#::degreesx) | Converts radians to degrees. |
| [`divide(x, y)`](#::dividex-y) | Alias for `x // y`. |
| [`even(x)`](#::evenx) | Round to next even number by rounding away from zero. |
| [`exp(x)`](#::expx) | Computes `e ** x`. |
| [`factorial(x)`](#::factorialx) | See the `!` operator. Computes the product of the current integer and all integers below it. |
| [`fdiv(x, y)`](#::fdivx-y) | Performs integer division (`x // y`) but returns a `DOUBLE` value. |
| [`floor(x)`](#::floorx) | Rounds the number down. |
| [`fmod(x, y)`](#::fmodx-y) | Calculates the modulo value. Always returns a `DOUBLE` value. |
| [`gamma(x)`](#::gammax) | Interpolation of the factorial of `x - 1`. Fractional inputs are allowed. |
| [`gcd(x, y)`](#::gcdx-y) | Computes the greatest common divisor of `x` and `y`. |
| [`greatest_common_divisor(x, y)`](#::greatest_common_divisorx-y) | Computes the greatest common divisor of `x` and `y`. |
| [`greatest(x1, x2, ...)`](#::greatestx1-x2-) | Selects the largest value. |
| [`isfinite(x)`](#::isfinitex) | Returns true if the floating point value is finite, false otherwise. |
| [`isinf(x)`](#::isinfx) | Returns true if the floating point value is infinite, false otherwise. |
| [`isnan(x)`](#::isnanx) | Returns true if the floating point value is not a number, false otherwise. |
| [`lcm(x, y)`](#::lcmx-y) | Computes the least common multiple of `x` and `y`. |
| [`least_common_multiple(x, y)`](#::least_common_multiplex-y) | Computes the least common multiple of `x` and `y`. |
| [`least(x1, x2, ...)`](#::leastx1-x2-) | Selects the smallest value. |
| [`lgamma(x)`](#::lgammax) | Computes the log of the `gamma` function. |
| [`ln(x)`](#::lnx) | Computes the natural logarithm of `x`. |
| [`log(x)`](#::logx) | Computes the base-10 logarithm of `x`. |
| [`log10(x)`](#::log10x) | Alias of `log`. Computes the base-10 logarithm of `x`. |
| [`log2(x)`](#::log2x) | Computes the base-2 log of `x`. |
| [`multiply(x, y)`](#::multiplyx-y) | Alias for `x * y`. |
| [`nextafter(x, y)`](#::nextafterx-y) | Return the next floating point value after `x` in the direction of `y`. |
| [`pi()`](#::pi) | Returns the value of pi. |
| [`pow(x, y)`](#::powx-y) | Computes `x` to the power of `y`. |
| [`power(x, y)`](#::powerx-y) | Alias of `pow`. Computes `x` to the power of `y`. |
| [`radians(x)`](#::radiansx) | Converts degrees to radians. |
| [`random()`](#::random) | Returns a random number `x` in the range `0.0 <= x < 1.0`. |
| [`round_even(v NUMERIC, s INTEGER)`](#::round_evenv-numeric-s-integer) | Alias of `roundbankers(v, s)`. Round to `s` decimal places using the [_rounding half to even_ rule](https://en.wikipedia.org/wiki/Rounding#Rounding_half_to_even). Values `s < 0` are allowed. |
| [`round(v NUMERIC, s INTEGER)`](#::roundv-numeric-s-integer) | Round to `s` decimal places. Values `s < 0` are allowed. |
| [`setseed(x)`](#::setseedx) | Sets the seed to be used for the random function. |
| [`sign(x)`](#::signx) | Returns the sign of `x` as -1, 0 or 1. |
| [`signbit(x)`](#::signbitx) | Returns whether the signbit is set or not. |
| [`sin(x)`](#::sinx) | Computes the sin of `x`. |
| [`sqrt(x)`](#::sqrtx) | Returns the square root of the number. |
| [`subtract(x, y)`](#::subtractx-y) | Alias for `x - y`. |
| [`tan(x)`](#::tanx) | Computes the tangent of `x`. |
| [`trunc(x)`](#::truncx) | Truncates the number. |
| [`xor(x, y)`](#::xorx-y) | Bitwise XOR. |
###### `@(x)` {#docs:stable:sql:functions:numeric::x}
| | |
|:--|:--------|
| **Description** |Absolute value. Parentheses are optional if `x` is a column name. |
| **Example** | `@(-17.4)` |
| **Result** | `17.4` |
| **Alias** | `abs` |
###### `abs(x)` {#docs:stable:sql:functions:numeric::absx}
| | |
|:--|:--------|
| **Description** |Absolute value. |
| **Example** | `abs(-17.4)` |
| **Result** | `17.4` |
| **Alias** | `@` |
###### `acos(x)` {#docs:stable:sql:functions:numeric::acosx}
| | |
|:--|:--------|
| **Description** |Computes the inverse cosine of `x`. |
| **Example** | `acos(0.5)` |
| **Result** | `1.0471975511965976` |
###### `acosh(x)` {#docs:stable:sql:functions:numeric::acoshx}
| | |
|:--|:--------|
| **Description** |Computes the inverse hyperbolic cosine of `x`. |
| **Example** | `acosh(1.5)` |
| **Result** | `0.9624236501192069` |
###### `add(x, y)` {#docs:stable:sql:functions:numeric::addx-y}
| | |
|:--|:--------|
| **Description** |Alias for `x + y`. |
| **Example** | `add(2, 3)` |
| **Result** | `5` |
###### `asin(x)` {#docs:stable:sql:functions:numeric::asinx}
| | |
|:--|:--------|
| **Description** |Computes the inverse sine of `x`. |
| **Example** | `asin(0.5)` |
| **Result** | `0.5235987755982989` |
###### `asinh(x)` {#docs:stable:sql:functions:numeric::asinhx}
| | |
|:--|:--------|
| **Description** |Computes the inverse hyperbolic sine of `x`. |
| **Example** | `asinh(0.5)` |
| **Result** | `0.48121182505960347` |
###### `atan(x)` {#docs:stable:sql:functions:numeric::atanx}
| | |
|:--|:--------|
| **Description** |Computes the inverse tangent of `x`. |
| **Example** | `atan(0.5)` |
| **Result** | `0.4636476090008061` |
###### `atanh(x)` {#docs:stable:sql:functions:numeric::atanhx}
| | |
|:--|:--------|
| **Description** |Computes the inverse hyperbolic tangent of `x`. |
| **Example** | `atanh(0.5)` |
| **Result** | `0.5493061443340549` |
###### `atan2(y, x)` {#docs:stable:sql:functions:numeric::atan2y-x}
| | |
|:--|:--------|
| **Description** |Computes the inverse tangent of `(y, x)`. |
| **Example** | `atan2(0.5, 0.5)` |
| **Result** | `0.7853981633974483` |
###### `bit_count(x)` {#docs:stable:sql:functions:numeric::bit_countx}
| | |
|:--|:--------|
| **Description** |Returns the number of bits that are set. |
| **Example** | `bit_count(31)` |
| **Result** | `5` |
###### `cbrt(x)` {#docs:stable:sql:functions:numeric::cbrtx}
| | |
|:--|:--------|
| **Description** |Returns the cube root of the number. |
| **Example** | `cbrt(8)` |
| **Result** | `2` |
###### `ceil(x)` {#docs:stable:sql:functions:numeric::ceilx}
| | |
|:--|:--------|
| **Description** |Rounds the number up. |
| **Example** | `ceil(17.4)` |
| **Result** | `18` |
###### `ceiling(x)` {#docs:stable:sql:functions:numeric::ceilingx}
| | |
|:--|:--------|
| **Description** |Rounds the number up. Alias of `ceil`. |
| **Example** | `ceiling(17.4)` |
| **Result** | `18` |
###### `cos(x)` {#docs:stable:sql:functions:numeric::cosx}
| | |
|:--|:--------|
| **Description** |Computes the cosine of `x`. |
| **Example** | `cos(pi() / 3)` |
| **Result** | `0.5000000000000001` |
###### `cot(x)` {#docs:stable:sql:functions:numeric::cotx}
| | |
|:--|:--------|
| **Description** |Computes the cotangent of `x`. |
| **Example** | `cot(0.5)` |
| **Result** | `1.830487721712452` |
###### `degrees(x)` {#docs:stable:sql:functions:numeric::degreesx}
| | |
|:--|:--------|
| **Description** |Converts radians to degrees. |
| **Example** | `degrees(pi())` |
| **Result** | `180` |
###### `divide(x, y)` {#docs:stable:sql:functions:numeric::dividex-y}
| | |
|:--|:--------|
| **Description** |Alias for `x // y`. |
| **Example** | `divide(5, 2)` |
| **Result** | `2` |
###### `even(x)` {#docs:stable:sql:functions:numeric::evenx}
| | |
|:--|:--------|
| **Description** |Round to next even number by rounding away from zero. |
| **Example** | `even(2.9)` |
| **Result** | `4` |
###### `exp(x)` {#docs:stable:sql:functions:numeric::expx}
| | |
|:--|:--------|
| **Description** |Computes `e ** x`. |
| **Example** | `exp(0.693)` |
| **Result** | `2` |
###### `factorial(x)` {#docs:stable:sql:functions:numeric::factorialx}
| | |
|:--|:--------|
| **Description** |See the `!` operator. Computes the product of the current integer and all integers below it. |
| **Example** | `factorial(4)` |
| **Result** | `24` |
###### `fdiv(x, y)` {#docs:stable:sql:functions:numeric::fdivx-y}
| | |
|:--|:--------|
| **Description** |Performs integer division (`x // y`) but returns a `DOUBLE` value. |
| **Example** | `fdiv(5, 2)` |
| **Result** | `2.0` |
###### `floor(x)` {#docs:stable:sql:functions:numeric::floorx}
| | |
|:--|:--------|
| **Description** |Rounds the number down. |
| **Example** | `floor(17.4)` |
| **Result** | `17` |
###### `fmod(x, y)` {#docs:stable:sql:functions:numeric::fmodx-y}
| | |
|:--|:--------|
| **Description** |Calculates the modulo value. Always returns a `DOUBLE` value. |
| **Example** | `fmod(5, 2)` |
| **Result** | `1.0` |
###### `gamma(x)` {#docs:stable:sql:functions:numeric::gammax}
| | |
|:--|:--------|
| **Description** |Interpolation of the factorial of `x - 1`. Fractional inputs are allowed. |
| **Example** | `gamma(5.5)` |
| **Result** | `52.34277778455352` |
###### `gcd(x, y)` {#docs:stable:sql:functions:numeric::gcdx-y}
| | |
|:--|:--------|
| **Description** |Computes the greatest common divisor of `x` and `y`. |
| **Example** | `gcd(42, 57)` |
| **Result** | `3` |
###### `greatest_common_divisor(x, y)` {#docs:stable:sql:functions:numeric::greatest_common_divisorx-y}
| | |
|:--|:--------|
| **Description** |Computes the greatest common divisor of `x` and `y`. |
| **Example** | `greatest_common_divisor(42, 57)` |
| **Result** | `3` |
###### `greatest(x1, x2, ...)` {#docs:stable:sql:functions:numeric::greatestx1-x2-}
| | |
|:--|:--------|
| **Description** |Selects the largest value. |
| **Example** | `greatest(3, 2, 4, 4)` |
| **Result** | `4` |
###### `isfinite(x)` {#docs:stable:sql:functions:numeric::isfinitex}
| | |
|:--|:--------|
| **Description** |Returns true if the floating point value is finite, false otherwise. |
| **Example** | `isfinite(5.5)` |
| **Result** | `true` |
###### `isinf(x)` {#docs:stable:sql:functions:numeric::isinfx}
| | |
|:--|:--------|
| **Description** |Returns true if the floating point value is infinite, false otherwise. |
| **Example** | `isinf('Infinity'::float)` |
| **Result** | `true` |
###### `isnan(x)` {#docs:stable:sql:functions:numeric::isnanx}
| | |
|:--|:--------|
| **Description** |Returns true if the floating point value is not a number, false otherwise. |
| **Example** | `isnan('NaN'::float)` |
| **Result** | `true` |
###### `lcm(x, y)` {#docs:stable:sql:functions:numeric::lcmx-y}
| | |
|:--|:--------|
| **Description** |Computes the least common multiple of `x` and `y`. |
| **Example** | `lcm(42, 57)` |
| **Result** | `798` |
###### `least_common_multiple(x, y)` {#docs:stable:sql:functions:numeric::least_common_multiplex-y}
| | |
|:--|:--------|
| **Description** |Computes the least common multiple of `x` and `y`. |
| **Example** | `least_common_multiple(42, 57)` |
| **Result** | `798` |
###### `least(x1, x2, ...)` {#docs:stable:sql:functions:numeric::leastx1-x2-}
| | |
|:--|:--------|
| **Description** |Selects the smallest value. |
| **Example** | `least(3, 2, 4, 4)` |
| **Result** | `2` |
###### `lgamma(x)` {#docs:stable:sql:functions:numeric::lgammax}
| | |
|:--|:--------|
| **Description** |Computes the log of the `gamma` function. |
| **Example** | `lgamma(2)` |
| **Result** | `0` |
###### `ln(x)` {#docs:stable:sql:functions:numeric::lnx}
| | |
|:--|:--------|
| **Description** |Computes the natural logarithm of `x`. |
| **Example** | `ln(2)` |
| **Result** | `0.693` |
###### `log(x)` {#docs:stable:sql:functions:numeric::logx}
| | |
|:--|:--------|
| **Description** |Computes the base-10 log of `x`. |
| **Example** | `log(100)` |
| **Result** | `2` |
###### `log10(x)` {#docs:stable:sql:functions:numeric::log10x}
| | |
|:--|:--------|
| **Description** |Alias of `log`. Computes the base-10 log of `x`. |
| **Example** | `log10(1000)` |
| **Result** | `3` |
###### `log2(x)` {#docs:stable:sql:functions:numeric::log2x}
| | |
|:--|:--------|
| **Description** |Computes the base-2 log of `x`. |
| **Example** | `log2(8)` |
| **Result** | `3` |
###### `multiply(x, y)` {#docs:stable:sql:functions:numeric::multiplyx-y}
| | |
|:--|:--------|
| **Description** |Alias for `x * y`. |
| **Example** | `multiply(2, 3)` |
| **Result** | `6` |
###### `nextafter(x, y)` {#docs:stable:sql:functions:numeric::nextafterx-y}
| | |
|:--|:--------|
| **Description** |Return the next floating point value after `x` in the direction of `y`. |
| **Example** | `nextafter(1::float, 2::float)` |
| **Result** | `1.0000001` |
###### `pi()` {#docs:stable:sql:functions:numeric::pi}
| | |
|:--|:--------|
| **Description** |Returns the value of pi. |
| **Example** | `pi()` |
| **Result** | `3.141592653589793` |
###### `pow(x, y)` {#docs:stable:sql:functions:numeric::powx-y}
| | |
|:--|:--------|
| **Description** |Computes `x` to the power of `y`. |
| **Example** | `pow(2, 3)` |
| **Result** | `8` |
###### `power(x, y)` {#docs:stable:sql:functions:numeric::powerx-y}
| | |
|:--|:--------|
| **Description** |Alias of `pow`. Computes `x` to the power of `y`. |
| **Example** | `power(2, 3)` |
| **Result** | `8` |
###### `radians(x)` {#docs:stable:sql:functions:numeric::radiansx}
| | |
|:--|:--------|
| **Description** |Converts degrees to radians. |
| **Example** | `radians(90)` |
| **Result** | `1.5707963267948966` |
###### `random()` {#docs:stable:sql:functions:numeric::random}
| | |
|:--|:--------|
| **Description** |Returns a random number `x` in the range `0.0 <= x < 1.0`. |
| **Example** | `random()` |
| **Result** | various |
###### `round_even(v NUMERIC, s INTEGER)` {#docs:stable:sql:functions:numeric::round_evenv-numeric-s-integer}
| | |
|:--|:--------|
| **Description** |Alias of `roundbankers(v, s)`. Round to `s` decimal places using the [_rounding half to even_ rule](https://en.wikipedia.org/wiki/Rounding#Rounding_half_to_even). Values `s < 0` are allowed. |
| **Example** | `round_even(24.5, 0)` |
| **Result** | `24.0` |
###### `round(v NUMERIC, s INTEGER)` {#docs:stable:sql:functions:numeric::roundv-numeric-s-integer}
| | |
|:--|:--------|
| **Description** |Round to `s` decimal places. Values `s < 0` are allowed. |
| **Example** | `round(42.4332, 2)` |
| **Result** | `42.43` |
###### `setseed(x)` {#docs:stable:sql:functions:numeric::setseedx}
| | |
|:--|:--------|
| **Description** |Sets the seed to be used for the random function. |
| **Example** | `setseed(0.42)` |
###### `sign(x)` {#docs:stable:sql:functions:numeric::signx}
| | |
|:--|:--------|
| **Description** |Returns the sign of `x` as -1, 0 or 1. |
| **Example** | `sign(-349)` |
| **Result** | `-1` |
###### `signbit(x)` {#docs:stable:sql:functions:numeric::signbitx}
| | |
|:--|:--------|
| **Description** |Returns whether the signbit is set or not. |
| **Example** | `signbit(-1.0)` |
| **Result** | `true` |
###### `sin(x)` {#docs:stable:sql:functions:numeric::sinx}
| | |
|:--|:--------|
| **Description** |Computes the sin of `x`. |
| **Example** | `sin(pi() / 6)` |
| **Result** | `0.49999999999999994` |
###### `sqrt(x)` {#docs:stable:sql:functions:numeric::sqrtx}
| | |
|:--|:--------|
| **Description** |Returns the square root of the number. |
| **Example** | `sqrt(9)` |
| **Result** | `3` |
###### `subtract(x, y)` {#docs:stable:sql:functions:numeric::subtractx-y}
| | |
|:--|:--------|
| **Description** |Alias for `x - y`. |
| **Example** | `subtract(2, 3)` |
| **Result** | `-1` |
###### `tan(x)` {#docs:stable:sql:functions:numeric::tanx}
| | |
|:--|:--------|
| **Description** |Computes the tangent of `x`. |
| **Example** | `tan(pi() / 4)` |
| **Result** | `0.9999999999999999` |
###### `trunc(x)` {#docs:stable:sql:functions:numeric::truncx}
| | |
|:--|:--------|
| **Description** |Truncates the number. |
| **Example** | `trunc(17.4)` |
| **Result** | `17` |
###### `xor(x, y)` {#docs:stable:sql:functions:numeric::xorx-y}
| | |
|:--|:--------|
| **Description** |Bitwise XOR. |
| **Example** | `xor(17, 5)` |
| **Result** | `20` |
### Pattern Matching {#docs:stable:sql:functions:pattern_matching}
There are four separate approaches to pattern matching provided by DuckDB:
the traditional SQL [`LIKE` operator](#::like),
the more recent [`SIMILAR TO` operator](#::similar-to) (added in SQL:1999),
a [`GLOB` operator](#::glob),
and POSIX-style [regular expressions](#::regular-expressions).
#### `LIKE` {#docs:stable:sql:functions:pattern_matching::like}
The `LIKE` expression returns `true` if the string matches the supplied pattern. (As expected, the `NOT LIKE` expression returns `false` if `LIKE` returns `true`, and vice versa. An equivalent expression is `NOT (string LIKE pattern)`.)
If the pattern does not contain percent signs or underscores, then the pattern only represents the string itself; in that case `LIKE` acts like the equals operator. An underscore (`_`) in the pattern stands for (matches) any single character; a percent sign (`%`) matches any sequence of zero or more characters.
`LIKE` pattern matching always covers the entire string. Therefore, if it's desired to match a sequence anywhere within a string, the pattern must start and end with a percent sign.
Some examples:
```sql
SELECT 'abc' LIKE 'abc'; -- true
SELECT 'abc' LIKE 'a%'; -- true
SELECT 'abc' LIKE '_b_'; -- true
SELECT 'abc' LIKE 'c'; -- false
SELECT 'abc' LIKE 'c%'; -- false
SELECT 'abc' LIKE '%c'; -- true
SELECT 'abc' NOT LIKE '%c'; -- false
```
The keyword `ILIKE` can be used instead of `LIKE` to make the match case-insensitive according to the active locale:
```sql
SELECT 'abc' ILIKE '%C'; -- true
```
```sql
SELECT 'abc' NOT ILIKE '%C'; -- false
```
To search within a string for a character that is a wildcard (`%` or `_`), the pattern must use an `ESCAPE` clause and an escape character to indicate the wildcard should be treated as a literal character instead of a wildcard. See an example below.
Additionally, the function `like_escape` has the same functionality as a `LIKE` expression with an `ESCAPE` clause, but using function syntax. See the [Text Functions Docs](#docs:stable:sql:functions:text) for details.
Search for strings with 'a' then a literal percent sign then 'c':
```sql
SELECT 'a%c' LIKE 'a$%c' ESCAPE '$'; -- true
SELECT 'azc' LIKE 'a$%c' ESCAPE '$'; -- false
```
Case-insensitive ILIKE with ESCAPE:
```sql
SELECT 'A%c' ILIKE 'a$%c' ESCAPE '$'; -- true
```
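The `like_escape` function mentioned above can express the same match in function form (a sketch, assuming the `like_escape(string, like_specifier, escape_character)` signature documented on the Text Functions page):
```sql
SELECT like_escape('a%c', 'a$%c', '$'); -- true
```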
There are also symbolic operators that can be used in place of the `LIKE`-style keywords. These enhance PostgreSQL compatibility.
| PostgreSQL-style | `LIKE`-style |
| :--------------- | :----------- |
| `~~` | `LIKE` |
| `!~~` | `NOT LIKE` |
| `~~*` | `ILIKE` |
| `!~~*` | `NOT ILIKE` |
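For example, the symbolic forms behave like their keyword counterparts:
```sql
SELECT 'abc' ~~ 'a%';  -- true, same as LIKE
SELECT 'abc' !~~ 'a%'; -- false, same as NOT LIKE
SELECT 'abc' ~~* 'A%'; -- true, same as ILIKE
```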
#### `SIMILAR TO` {#docs:stable:sql:functions:pattern_matching::similar-to}
The `SIMILAR TO` operator returns true or false depending on whether its pattern matches the given string. It is similar to `LIKE`, except that it interprets the pattern using a [regular expression](#docs:stable:sql:functions:regular_expressions). Like `LIKE`, the `SIMILAR TO` operator succeeds only if its pattern matches the entire string; this is unlike common regular expression behavior where the pattern can match any part of the string.
A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression. As with `LIKE`, pattern characters match string characters exactly unless they are special characters in the regular expression language, but regular expressions use different special characters than `LIKE` does.
Some examples:
```sql
SELECT 'abc' SIMILAR TO 'abc'; -- true
SELECT 'abc' SIMILAR TO 'a'; -- false
SELECT 'abc' SIMILAR TO '.*(b|d).*'; -- true
SELECT 'abc' SIMILAR TO '(b|c).*'; -- false
SELECT 'abc' NOT SIMILAR TO 'abc'; -- false
```
> In PostgreSQL, `~` is equivalent to `SIMILAR TO`
> and `!~` is equivalent to `NOT SIMILAR TO`.
> In DuckDB, these equivalences do not hold currently,
> see the [PostgreSQL compatibility page](#docs:stable:sql:dialect:postgresql_compatibility).
#### Globbing {#docs:stable:sql:functions:pattern_matching::globbing}
DuckDB supports file name expansion, also known as globbing, for discovering files.
DuckDB's glob syntax uses the question mark (`?`) wildcard to match any single character and the asterisk (`*`) to match zero or more characters.
In addition, you can use the bracket syntax (`[...]`) to match any single character contained within the brackets, or within the character range specified by the brackets. An exclamation mark (`!`) may be used inside the first bracket to search for a character that is not contained within the brackets.
To learn more, visit the [“glob (programming)” Wikipedia page](https://en.wikipedia.org/wiki/Glob_(programming)).
##### `GLOB` {#docs:stable:sql:functions:pattern_matching::glob}
The `GLOB` operator returns `true` if the string matches the `GLOB` pattern and `false` otherwise. The `GLOB` operator is most commonly used when searching for filenames that follow a specific pattern (for example, a specific file extension).
Some examples:
```sql
SELECT 'best.txt' GLOB '*.txt'; -- true
SELECT 'best.txt' GLOB '????.txt'; -- true
SELECT 'best.txt' GLOB '?.txt'; -- false
SELECT 'best.txt' GLOB '[abc]est.txt'; -- true
SELECT 'best.txt' GLOB '[a-z]est.txt'; -- true
```
The bracket syntax is case-sensitive:
```sql
SELECT 'Best.txt' GLOB '[a-z]est.txt'; -- false
SELECT 'Best.txt' GLOB '[a-zA-Z]est.txt'; -- true
```
The `!` applies to all characters within the brackets:
```sql
SELECT 'Best.txt' GLOB '[!a-zA-Z]est.txt'; -- false
```
To negate a GLOB operator, negate the entire expression:
```sql
SELECT NOT 'best.txt' GLOB '*.txt'; -- false
```
Three tildes (`~~~`) may also be used in place of the `GLOB` keyword.
| GLOB-style | Symbolic-style |
| :--------- | :------------- |
| `GLOB` | `~~~` |
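For example:
```sql
SELECT 'best.txt' ~~~ '*.txt'; -- true, same as GLOB
```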
##### Glob Function to Find Filenames {#docs:stable:sql:functions:pattern_matching::glob-function-to-find-filenames}
The glob pattern matching syntax can also be used to search for filenames using the `glob` table function.
It accepts one parameter: the path to search (which may include glob patterns).
Search the current directory for all files:
```sql
SELECT * FROM glob('*');
```
| file |
| ------------- |
| duckdb.exe |
| test.csv |
| test.json |
| test.parquet |
| test2.csv |
| test2.parquet |
| todos.json |
##### Globbing Semantics {#docs:stable:sql:functions:pattern_matching::globbing-semantics}
DuckDB's globbing implementation follows the semantics of [Python's `glob`](https://docs.python.org/3/library/glob.html) and not the `glob` used in the shell.
A notable difference is the behavior of the `**/` construct: `**/⟨filename⟩` will not return a file with `⟨filename⟩` in the top-level directory.
For example, with a `README.md` file present in the directory, the following query finds it:
```sql
SELECT * FROM glob('README.md');
```
| file |
| --------- |
| README.md |
However, the following query returns an empty result:
```sql
SELECT * FROM glob('**/README.md');
```
Meanwhile, the globbing of Bash, Zsh, etc. finds the file using the same syntax:
```bash
ls **/README.md
```
```text
README.md
```
#### Regular Expressions {#docs:stable:sql:functions:pattern_matching::regular-expressions}
DuckDB's regular expression support is documented on the [Regular Expressions page](#docs:stable:sql:functions:regular_expressions).
DuckDB supports some PostgreSQL-style operators for regular expression matching:
| PostgreSQL-style | Equivalent expression |
| :--------------- | :------------------------------------------------------------------------------------------------------- |
| `~` | [`regexp_full_match`](#docs:stable:sql:functions:text::regexp_full_matchstring-regex) |
| `!~` | `NOT` [`regexp_full_match`](#docs:stable:sql:functions:text::regexp_full_matchstring-regex) |
| `~*` | (not supported) |
| `!~*` | (not supported) |
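For example, since `~` performs a full match (unlike `regexp_matches`):
```sql
SELECT 'abc' ~ 'a.*'; -- true
SELECT 'abc' ~ 'a';   -- false, the entire string must match
SELECT 'abc' !~ 'a';  -- true
```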
### Regular Expressions {#docs:stable:sql:functions:regular_expressions}
DuckDB offers [pattern matching operators](#docs:stable:sql:functions:pattern_matching)
([`LIKE`](#docs:stable:sql:functions:pattern_matching::like),
[`SIMILAR TO`](#docs:stable:sql:functions:pattern_matching::similar-to),
[`GLOB`](#docs:stable:sql:functions:pattern_matching::glob)),
as well as support for regular expressions via functions.
#### Regular Expression Syntax {#docs:stable:sql:functions:regular_expressions::regular-expression-syntax}
DuckDB uses the [RE2 library](https://github.com/google/re2) as its regular expression engine. For the regular expression syntax, see the [RE2 docs](https://github.com/google/re2/wiki/Syntax).
#### Functions {#docs:stable:sql:functions:regular_expressions::functions}
All functions accept an optional set of [options](#::options-for-regular-expression-functions).
| Name | Description |
|:--|:-------|
| [`regexp_extract(string, pattern[, group = 0][, options])`](#regexp_extractstring-pattern-group--0-options) | If `string` contains the regexp `pattern`, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| [`regexp_extract(string, pattern, name_list[, options])`](#regexp_extractstring-pattern-name_list-options) | If `string` contains the regexp `pattern`, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. |
| [`regexp_extract_all(string, regex[, group = 0][, options])`](#regexp_extract_allstring-regex-group--0-options) | Finds non-overlapping occurrences of `regex` in `string` and returns the corresponding values of `group`. |
| [`regexp_full_match(string, regex[, options])`](#regexp_full_matchstring-regex-options) | Returns `true` if the entire `string` matches the `regex`. |
| [`regexp_matches(string, pattern[, options])`](#regexp_matchesstring-pattern-options) | Returns `true` if `string` contains the regexp `pattern`, `false` otherwise. |
| [`regexp_replace(string, pattern, replacement[, options])`](#regexp_replacestring-pattern-replacement-options) | If `string` contains the regexp `pattern`, replaces the matching part with `replacement`. By default, only the first occurrence is replaced. A set of optional [`options`](#::options-for-regular-expression-functions), including the global flag `g`, can be set. |
| [`regexp_split_to_array(string, regex[, options])`](#regexp_split_to_arraystring-regex-options) | Alias of `string_split_regex`. Splits the `string` along the `regex`. |
| [`regexp_split_to_table(string, regex[, options])`](#regexp_split_to_tablestring-regex-options) | Splits the `string` along the `regex` and returns a row for each part. |
###### `regexp_extract(string, pattern[, group = 0][, options])` {#docs:stable:sql:functions:regular_expressions::regexp_extractstring-pattern-group--0-options}
| | |
|:--|:--------|
| **Description** |If `string` contains the regexp `pattern`, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract('abc', '([a-z])(b)', 1)` |
| **Result** | `a` |
###### `regexp_extract(string, pattern, name_list[, options])` {#docs:stable:sql:functions:regular_expressions::regexp_extractstring-pattern-name_list-options}
| | |
|:--|:--------|
| **Description** |If `string` contains the regexp `pattern`, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract('2023-04-15', '(\d+)-(\d+)-(\d+)', ['y', 'm', 'd'])` |
| **Result** | `{'y':'2023', 'm':'04', 'd':'15'}` |
###### `regexp_extract_all(string, regex[, group = 0][, options])` {#docs:stable:sql:functions:regular_expressions::regexp_extract_allstring-regex-group--0-options}
| | |
|:--|:--------|
| **Description** |Finds non-overlapping occurrences of `regex` in `string` and returns the corresponding values of `group`. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2)` |
| **Result** | `[33, 14]` |
###### `regexp_full_match(string, regex[, options])` {#docs:stable:sql:functions:regular_expressions::regexp_full_matchstring-regex-options}
| | |
|:--|:--------|
| **Description** |Returns `true` if the entire `string` matches the `regex`. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_full_match('anabanana', '(an)*')` |
| **Result** | `false` |
###### `regexp_matches(string, pattern[, options])` {#docs:stable:sql:functions:regular_expressions::regexp_matchesstring-pattern-options}
| | |
|:--|:--------|
| **Description** |Returns `true` if `string` contains the regexp `pattern`, `false` otherwise. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_matches('anabanana', '(an)*')` |
| **Result** | `true` |
###### `regexp_replace(string, pattern, replacement[, options])` {#docs:stable:sql:functions:regular_expressions::regexp_replacestring-pattern-replacement-options}
| | |
|:--|:--------|
| **Description** |If `string` contains the regexp `pattern`, replaces the matching part with `replacement`. By default, only the first occurrence is replaced. A set of optional [`options`](#::options-for-regular-expression-functions), including the global flag `g`, can be set. |
| **Example** | `regexp_replace('hello', '[lo]', '-')` |
| **Result** | `he-lo` |
###### `regexp_split_to_array(string, regex[, options])` {#docs:stable:sql:functions:regular_expressions::regexp_split_to_arraystring-regex-options}
| | |
|:--|:--------|
| **Description** |Alias of `string_split_regex`. Splits the `string` along the `regex`. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_split_to_array('hello world; 42', ';? ')` |
| **Result** | `['hello', 'world', '42']` |
###### `regexp_split_to_table(string, regex[, options])` {#docs:stable:sql:functions:regular_expressions::regexp_split_to_tablestring-regex-options}
| | |
|:--|:--------|
| **Description** |Splits the `string` along the `regex` and returns a row for each part. A set of optional [`options`](#::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_split_to_table('hello world; 42', ';? ')` |
| **Result** | Three rows: `'hello'`, `'world'`, `'42'` |
The `regexp_matches` function is similar to the `SIMILAR TO` operator, however, it does not require the entire string to match. Instead, `regexp_matches` returns `true` if the string merely contains the pattern (unless the special tokens `^` and `$` are used to anchor the regular expression to the start and end of the string). Below are some examples:
```sql
SELECT regexp_matches('abc', 'abc'); -- true
SELECT regexp_matches('abc', '^abc$'); -- true
SELECT regexp_matches('abc', 'a'); -- true
SELECT regexp_matches('abc', '^a$'); -- false
SELECT regexp_matches('abc', '.*(b|d).*'); -- true
SELECT regexp_matches('abc', '(b|c).*'); -- true
SELECT regexp_matches('abc', '^(b|c).*'); -- false
SELECT regexp_matches('abc', '(?i)A'); -- true
SELECT regexp_matches('abc', 'A', 'i'); -- true
```
#### Options for Regular Expression Functions {#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions}
The regex functions support the following `options`.
| Option | Description |
|:---|:---|
| `'c'` | Case-sensitive matching |
| `'i'` | Case-insensitive matching |
| `'l'` | Match literals instead of regular expression tokens |
| `'m'`, `'n'`, `'p'` | Newline sensitive matching |
| `'g'` | Global replace, only available for `regexp_replace` |
| `'s'` | Non-newline sensitive matching |
For example:
```sql
SELECT regexp_matches('abcd', 'ABC', 'c'); -- false
SELECT regexp_matches('abcd', 'ABC', 'i'); -- true
SELECT regexp_matches('ab^/$cd', '^/$', 'l'); -- true
SELECT regexp_matches(E'hello\nworld', 'hello.world', 'p'); -- false
SELECT regexp_matches(E'hello\nworld', 'hello.world', 's'); -- true
```
##### Using `regexp_matches` {#docs:stable:sql:functions:regular_expressions::using-regexp_matches}
The `regexp_matches` function will be optimized to the `LIKE` operator when possible. To achieve best performance, the `'c'` option (case-sensitive matching) should be passed if applicable. Note that by default the [`RE2` library](#::regular-expression-syntax) doesn't match the `.` character to newline.
| Original | Optimized equivalent |
|:---|:---|
| `regexp_matches('hello world', '^hello', 'c')` | `prefix('hello world', 'hello')` |
| `regexp_matches('hello world', 'world$', 'c')` | `suffix('hello world', 'world')` |
| `regexp_matches('hello world', 'hello.world', 'c')` | `LIKE 'hello_world'` |
| `regexp_matches('hello world', 'he.*rld', 'c')` | `LIKE '%he%rld'` |
##### Using `regexp_replace` {#docs:stable:sql:functions:regular_expressions::using-regexp_replace}
The `regexp_replace` function can be used to replace the part of a string that matches the regexp pattern with a replacement string. The notation `\d` (where `d` is a number indicating the group) can be used to refer to groups captured in the regular expression in the replacement string. Note that by default, `regexp_replace` only replaces the first occurrence of the regular expression. To replace all occurrences, use the global replace (`g`) flag.
Some examples for using `regexp_replace`:
```sql
SELECT regexp_replace('abc', '(b|c)', 'X'); -- aXc
SELECT regexp_replace('abc', '(b|c)', 'X', 'g'); -- aXX
SELECT regexp_replace('abc', '(b|c)', '\1\1\1\1'); -- abbbbc
SELECT regexp_replace('abc', '(.*)c', '\1e'); -- abe
SELECT regexp_replace('abc', '(a)(b)', '\2\1'); -- bac
```
##### Using `regexp_extract` {#docs:stable:sql:functions:regular_expressions::using-regexp_extract}
The `regexp_extract` function is used to extract a part of a string that matches the regexp pattern.
A specific capturing group within the pattern can be extracted using the `group` parameter. If `group` is not specified, it defaults to 0, extracting the first match with the whole pattern.
```sql
SELECT regexp_extract('abc', '.b.'); -- abc
SELECT regexp_extract('abc', '.b.', 0); -- abc
SELECT regexp_extract('abc', '.b.', 1); -- (empty)
SELECT regexp_extract('abc', '([a-z])(b)', 1); -- a
SELECT regexp_extract('abc', '([a-z])(b)', 2); -- b
```
The `regexp_extract` function also supports a `name_list` argument, which is a `LIST` of strings. Using `name_list`, the `regexp_extract` will return the corresponding capture groups as fields of a `STRUCT`:
```sql
SELECT regexp_extract('2023-04-15', '(\d+)-(\d+)-(\d+)', ['y', 'm', 'd']);
```
```text
{'y': 2023, 'm': 04, 'd': 15}
```
```sql
SELECT regexp_extract('2023-04-15 07:59:56', '^(\d+)-(\d+)-(\d+) (\d+):(\d+):(\d+)', ['y', 'm', 'd']);
```
```text
{'y': 2023, 'm': 04, 'd': 15}
```
```sql
SELECT regexp_extract('duckdb_0_7_1', '^(\w+)_(\d+)_(\d+)', ['tool', 'major', 'minor', 'fix']);
```
```console
Binder Error:
Not enough group names in regexp_extract
```
If the number of column names is less than the number of capture groups, then only the first groups are returned.
If the number of column names is greater, then an error is generated.
#### Limitations {#docs:stable:sql:functions:regular_expressions::limitations}
Regular expressions only support 9 capture groups: `\1`, `\2`, `\3`, ..., `\9`.
Capture groups with two or more digits are not supported.
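For example, single-digit backreferences work in `regexp_replace` replacements, while two-digit group references are not available (a small sketch):
```sql
SELECT regexp_replace('abc', '(a)(b)(c)', '\3\2\1'); -- cba
```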
### Struct Functions {#docs:stable:sql:functions:struct}
| Name | Description |
|:--|:-------|
| [`struct.entry`](#::structentry) | Dot notation that serves as an alias for `struct_extract` from named `STRUCT`s. |
| [`struct[entry]`](#structentry) | Bracket notation that serves as an alias for `struct_extract` from named `STRUCT`s. |
| [`struct[idx]`](#structidx) | Bracket notation that serves as an alias for `struct_extract` from unnamed `STRUCT`s (tuples), using an index (1-based). |
| [`row(any, ...)`](#::rowany-) | Create an unnamed `STRUCT` (tuple) containing the argument values. |
| [`struct_concat(structs...)`](#::struct_concatstructs) | Merge the multiple `structs` into a single `STRUCT`. |
| [`struct_extract(struct, 'entry')`](#::struct_extractstruct-entry) | Extract the named entry from the `STRUCT`. |
| [`struct_extract(struct, idx)`](#::struct_extractstruct-idx) | Extract the entry from an unnamed `STRUCT` (tuple) using an index (1-based). |
| [`struct_extract_at(struct, idx)`](#::struct_extract_atstruct-idx) | Extract the entry from a `STRUCT` (tuple) using an index (1-based). |
| [`struct_insert(struct, name := any, ...)`](#::struct_insertstruct-name--any-) | Add field(s) to an existing `STRUCT`. |
| [`struct_pack(name := any, ...)`](#::struct_packname--any-) | Create a `STRUCT` containing the argument values. The entry name will be the bound variable name. |
| [`struct_update(struct, name := any, ...)`](#::struct_updatestruct-name--any-) | Add or update field(s) of an existing `STRUCT`. |
###### `struct.entry` {#docs:stable:sql:functions:struct::structentry}
| | |
|:--|:--------|
| **Description** |Dot notation that serves as an alias for `struct_extract` from named `STRUCT`s. |
| **Example** | `({'i': 3, 's': 'string'}).i` |
| **Result** | `3` |
###### `struct[entry]` {#docs:stable:sql:functions:struct::structentry}
| | |
|:--|:--------|
| **Description** |Bracket notation that serves as an alias for `struct_extract` from named `STRUCT`s. |
| **Example** | `({'i': 3, 's': 'string'})['i']` |
| **Result** | `3` |
###### `struct[idx]` {#docs:stable:sql:functions:struct::structidx}
| | |
|:--|:--------|
| **Description** |Bracket notation that serves as an alias for `struct_extract` from unnamed `STRUCT`s (tuples), using an index (1-based). |
| **Example** | `(row(42, 84))[1]` |
| **Result** | `42` |
###### `row(any, ...)` {#docs:stable:sql:functions:struct::rowany-}
| | |
|:--|:--------|
| **Description** |Create an unnamed `STRUCT` (tuple) containing the argument values. |
| **Example** | `row(i, i % 4, i / 4)` |
| **Result** | `(10, 2, 2.5)` |
###### `struct_concat(structs...)` {#docs:stable:sql:functions:struct::struct_concatstructs}
| | |
|:--|:--------|
| **Description** |Merge the multiple `structs` into a single `STRUCT`. |
| **Example** | `struct_concat(struct_pack(i := 4), struct_pack(s := 'string'))` |
| **Result** | `{'i': 4, 's': string}` |
###### `struct_extract(struct, 'entry')` {#docs:stable:sql:functions:struct::struct_extractstruct-entry}
| | |
|:--|:--------|
| **Description** |Extract the named entry from the `STRUCT`. |
| **Example** | `struct_extract({'i': 3, 'v2': 3, 'v3': 0}, 'i')` |
| **Result** | `3` |
###### `struct_extract(struct, idx)` {#docs:stable:sql:functions:struct::struct_extractstruct-idx}
| | |
|:--|:--------|
| **Description** |Extract the entry from an unnamed `STRUCT` (tuple) using an index (1-based). |
| **Example** | `struct_extract(row(42, 84), 1)` |
| **Result** | `42` |
###### `struct_extract_at(struct, idx)` {#docs:stable:sql:functions:struct::struct_extract_atstruct-idx}
| | |
|:--|:--------|
| **Description** |Extract the entry from a `STRUCT` (tuple) using an index (1-based). |
| **Example** | `struct_extract_at({'v1': 10, 'v2': 20, 'v3': 3}, 2)` |
| **Result** | `20` |
###### `struct_insert(struct, name := any, ...)` {#docs:stable:sql:functions:struct::struct_insertstruct-name--any-}
| | |
|:--|:--------|
| **Description** |Add field(s) to an existing `STRUCT`. |
| **Example** | `struct_insert({'a': 1}, b := 2)` |
| **Result** | `{'a': 1, 'b': 2}` |
###### `struct_pack(name := any, ...)` {#docs:stable:sql:functions:struct::struct_packname--any-}
| | |
|:--|:--------|
| **Description** |Create a `STRUCT` containing the argument values. The entry name will be the bound variable name. |
| **Example** | `struct_pack(i := 4, s := 'string')` |
| **Result** | `{'i': 4, 's': string}` |
###### `struct_update(struct, name := any, ...)` {#docs:stable:sql:functions:struct::struct_updatestruct-name--any-}
| | |
|:--|:--------|
| **Description** |Add or update field(s) of an existing `STRUCT`. |
| **Example** | `struct_update({'a': 1, 'b': 2}, b := 3, c := 4)` |
| **Result** | `{'a': 1, 'b': 3, 'c': 4}` |
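As a brief combined sketch of the struct helpers above (expected results shown as comments):
```sql
SELECT (struct_pack(i := 4, s := 'string')).i;   -- 4
SELECT struct_insert({'a': 1}, b := 2);          -- {'a': 1, 'b': 2}
SELECT struct_extract(row(42, 84), 2);           -- 84
SELECT struct_update({'a': 1, 'b': 2}, b := 3);  -- {'a': 1, 'b': 3}
```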
### Text Functions {#docs:stable:sql:functions:text}
#### Text Functions and Operators {#docs:stable:sql:functions:text::text-functions-and-operators}
This section describes functions and operators for examining and manipulating [`STRING` values](#docs:stable:sql:data_types:text).
| Function | Description |
|:--|:-------|
| [`string[index]`](#stringindex) | Extracts a single character using a (1-based) `index`. |
| [`string[begin:end]`](#stringbeginend) | Extracts a string using [slice conventions](#docs:stable:sql:functions:list::slicing) similar to Python. Missing `begin` or `end` arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted. |
| [`string LIKE target`](#::string-like-target) | Returns `true` if the `string` matches the like specifier (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)). |
| [`string SIMILAR TO regex`](#::string-similar-to-regex) | Returns `true` if the `string` matches the `regex` (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)). |
| [`string ^@ search_string`](#::starts_withstring-search_string) | Alias for `starts_with`. |
| [`arg1 || arg2`](#::arg1--arg2) | Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:stable:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:stable:sql:functions:list::list_concatlist_1--list_n). |
| [`array_extract(string, index)`](#::array_extractstring-index) | Extracts a single character from a `string` using a (1-based) `index`. |
| [`array_slice(list, begin, end)`](#::array_slicelist-begin-end) | Extracts a sublist or substring using [slice conventions](#docs:stable:sql:functions:list::slicing). Negative values are accepted. |
| [`ascii(string)`](#::asciistring) | Returns an integer that represents the Unicode code point of the first character of the `string`. |
| [`bar(x, min, max[, width])`](#barx-min-max-width) | Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80. |
| [`base64(blob)`](#::to_base64blob) | Alias for `to_base64`. |
| [`bin(string)`](#::binstring) | Converts the `string` to binary representation. |
| [`bit_length(string)`](#::bit_lengthstring) | Number of bits in a `string`. |
| [`char_length(string)`](#::lengthstring) | Alias for `length`. |
| [`character_length(string)`](#::lengthstring) | Alias for `length`. |
| [`chr(code_point)`](#::chrcode_point) | Returns the character corresponding to the ASCII code value or Unicode code point. |
| [`concat(value, ...)`](#::concatvalue-) | Concatenates multiple strings or lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| [`concat_ws(separator, string, ...)`](#::concat_wsseparator-string-) | Concatenates many strings, separated by `separator`. `NULL` inputs are skipped. |
| [`contains(string, search_string)`](#::containsstring-search_string) | Returns `true` if `search_string` is found within `string`. Note that [collations](#docs:stable:sql:expressions:collations) are not supported. |
| [`ends_with(string, search_string)`](#::suffixstring-search_string) | Alias for `suffix`. |
| [`format(format, ...)`](#::formatformat-) | Formats a string using the [fmt syntax](#::fmt-syntax). |
| [`formatReadableDecimalSize(integer)`](#::formatreadabledecimalsizeinteger) | Converts `integer` to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.). |
| [`formatReadableSize(integer)`](#::format_bytesinteger) | Alias for `format_bytes`. |
| [`format_bytes(integer)`](#::format_bytesinteger) | Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). |
| [`from_base64(string)`](#::from_base64string) | Converts a base64 encoded `string` to a character string (`BLOB`). |
| [`from_binary(value)`](#::unbinvalue) | Alias for `unbin`. |
| [`from_hex(value)`](#::unhexvalue) | Alias for `unhex`. |
| [`greatest(arg1, ...)`](#::greatestarg1-) | Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and [collations](#docs:stable:sql:expressions:collations) are not supported. |
| [`hash(value, ...)`](#::hashvalue-) | Returns a `UBIGINT` with the hash of the `value`. Note that this is not a cryptographic hash. |
| [`hex(string)`](#::hexstring) | Converts the `string` to hexadecimal representation. |
| [`ilike_escape(string, like_specifier, escape_character)`](#::ilike_escapestring-like_specifier-escape_character) | Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| [`instr(string, search_string)`](#::instrstring-search_string) | Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. |
| [`lcase(string)`](#::lowerstring) | Alias for `lower`. |
| [`least(arg1, ...)`](#::leastarg1-) | Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and [collations](#docs:stable:sql:expressions:collations) are not supported. |
| [`left(string, count)`](#::leftstring-count) | Extracts the left-most count characters. |
| [`left_grapheme(string, count)`](#::left_graphemestring-count) | Extracts the left-most count grapheme clusters. |
| [`len(string)`](#::lengthstring) | Alias for `length`. |
| [`length(string)`](#::lengthstring) | Number of characters in `string`. |
| [`length_grapheme(string)`](#::length_graphemestring) | Number of grapheme clusters in `string`. |
| [`like_escape(string, like_specifier, escape_character)`](#::like_escapestring-like_specifier-escape_character) | Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| [`lower(string)`](#::lowerstring) | Converts `string` to lower case. |
| [`lpad(string, count, character)`](#::lpadstring-count-character) | Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. |
| [`ltrim(string[, characters])`](#ltrimstring-characters) | Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`. |
| [`md5(string)`](#::md5string) | Returns the MD5 hash of the `string` as a `VARCHAR`. |
| [`md5_number(string)`](#::md5_numberstring) | Returns the MD5 hash of the `string` as a `HUGEINT`. |
| [`md5_number_lower(string)`](#::md5_number_lowerstring) | Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| [`md5_number_upper(string)`](#::md5_number_upperstring) | Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| [`nfc_normalize(string)`](#::nfc_normalizestring) | Converts `string` to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. |
| [`not_ilike_escape(string, like_specifier, escape_character)`](#::not_ilike_escapestring-like_specifier-escape_character) | Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| [`not_like_escape(string, like_specifier, escape_character)`](#::not_like_escapestring-like_specifier-escape_character) | Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| [`ord(string)`](#::unicodestring) | Alias for `unicode`. |
| [`parse_dirname(path[, separator])`](#parse_dirnamepath-separator) | Returns the top-level directory name from the given `path`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| [`parse_dirpath(path[, separator])`](#parse_dirpathpath-separator) | Returns the head of the `path` (the pathname until the last slash) similarly to Python's [`os.path.dirname`](https://docs.python.org/3.7/library/os.path.html#os.path.dirname). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| [`parse_filename(string[, trim_extension][, separator])`](#parse_filenamestring-trim_extension-separator) | Returns the last component of the `path` similarly to Python's [`os.path.basename`](https://docs.python.org/3.7/library/os.path.html#os.path.basename) function. If `trim_extension` is `true`, the file extension will be removed (defaults to `false`). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| [`parse_path(path[, separator])`](#parse_pathpath-separator) | Returns a list of the components (directories and filename) in the `path` similarly to Python's [`pathlib.parts`](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parts) function. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| [`position(search_string IN string)`](#::positionsearch_string-in-string) | Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. |
| [`position(string, search_string)`](#::instrstring-search_string) | Alias for `instr`. |
| [`prefix(string, search_string)`](#::prefixstring-search_string) | Returns `true` if `string` starts with `search_string`. |
| [`printf(format, ...)`](#::printfformat-) | Formats a `string` using [printf syntax](#::printf-syntax). |
| [`read_text(source)`](#::read_textsource) | Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8 an error is thrown suggesting to use `read_blob` instead. See the [`read_text` guide](#docs:stable:guides:file_formats:read_file::read_text) for more details. |
| [`regexp_escape(string)`](#::regexp_escapestring) | Escapes special patterns to turn `string` into a regular expression similarly to Python's [`re.escape` function](https://docs.python.org/3/library/re.html#re.escape). |
| [`regexp_extract(string, regex[, group][, options])`](#regexp_extractstring-regex-group-options) | If `string` contains the `regex` pattern, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_extract(string, regex, name_list[, options])`](#regexp_extractstring-regex-name_list-options) | If `string` contains the `regex` pattern, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_extract_all(string, regex[, group][, options])`](#regexp_extract_allstring-regex-group-options) | Finds non-overlapping occurrences of the `regex` in the `string` and returns the corresponding values of the capturing `group`. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_full_match(string, regex[, col2])`](#regexp_full_matchstring-regex-col2) | Returns `true` if the entire `string` matches the `regex`. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_matches(string, regex[, options])`](#regexp_matchesstring-regex-options) | Returns `true` if `string` contains the `regex`, `false` otherwise. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_replace(string, regex, replacement[, options])`](#regexp_replacestring-regex-replacement-options) | If `string` contains the `regex`, replaces the matching part with `replacement`. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`regexp_split_to_array(string, regex[, options])`](#string_split_regexstring-regex-options) | Alias for `string_split_regex`. |
| [`regexp_split_to_table(string, regex)`](#::regexp_split_to_tablestring-regex) | Splits the `string` along the `regex` and returns a row for each part. |
| [`repeat(string, count)`](#::repeatstring-count) | Repeats the `string` `count` number of times. |
| [`replace(string, source, target)`](#::replacestring-source-target) | Replaces any occurrences of the `source` with `target` in `string`. |
| [`reverse(string)`](#::reversestring) | Reverses the `string`. |
| [`right(string, count)`](#::rightstring-count) | Extract the right-most `count` characters. |
| [`right_grapheme(string, count)`](#::right_graphemestring-count) | Extracts the right-most `count` grapheme clusters. |
| [`rpad(string, count, character)`](#::rpadstring-count-character) | Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. |
| [`rtrim(string[, characters])`](#rtrimstring-characters) | Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`. |
| [`sha1(value)`](#::sha1value) | Returns a `VARCHAR` with the SHA-1 hash of the `value`. |
| [`sha256(value)`](#::sha256value) | Returns a `VARCHAR` with the SHA-256 hash of the `value`. |
| [`split(string, separator)`](#::string_splitstring-separator) | Alias for `string_split`. |
| [`split_part(string, separator, index)`](#::split_partstring-separator-index) | Splits the `string` along the `separator` and returns the data at the (1-based) `index` of the list. If the `index` is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior). |
| [`starts_with(string, search_string)`](#::starts_withstring-search_string) | Returns `true` if `string` begins with `search_string`. |
| [`str_split(string, separator)`](#::string_splitstring-separator) | Alias for `string_split`. |
| [`str_split_regex(string, regex[, options])`](#string_split_regexstring-regex-options) | Alias for `string_split_regex`. |
| [`string_split(string, separator)`](#::string_splitstring-separator) | Splits the `string` along the `separator`. |
| [`string_split_regex(string, regex[, options])`](#string_split_regexstring-regex-options) | Splits the `string` along the `regex`. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| [`string_to_array(string, separator)`](#::string_splitstring-separator) | Alias for `string_split`. |
| [`strip_accents(string)`](#::strip_accentsstring) | Strips accents from `string`. |
| [`strlen(string)`](#::strlenstring) | Number of bytes in `string`. |
| [`strpos(string, search_string)`](#::instrstring-search_string) | Alias for `instr`. |
| [`substr(string, start[, length])`](#substringstring-start-length) | Alias for `substring`. |
| [`substring(string, start[, length])`](#substringstring-start-length) | Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`. |
| [`substring_grapheme(string, start[, length])`](#substring_graphemestring-start-length) | Extracts substring starting from grapheme clusters `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` grapheme clusters instead. Note that a `start` value of `1` refers to the `first` character of the `string`. |
| [`suffix(string, search_string)`](#::suffixstring-search_string) | Returns `true` if `string` ends with `search_string`. Note that [collations](#docs:stable:sql:expressions:collations) are not supported. |
| [`to_base(number, radix[, min_length])`](#to_basenumber-radix-min_length) | Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`. |
| [`to_base64(blob)`](#::to_base64blob) | Converts a `blob` to a base64 encoded string. |
| [`to_binary(string)`](#::binstring) | Alias for `bin`. |
| [`to_hex(string)`](#::hexstring) | Alias for `hex`. |
| [`translate(string, from, to)`](#::translatestring-from-to) | Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted. |
| [`trim(string[, characters])`](#trimstring-characters) | Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`. |
| [`ucase(string)`](#::upperstring) | Alias for `upper`. |
| [`unbin(value)`](#::unbinvalue) | Converts a `value` from binary representation to a blob. |
| [`unhex(value)`](#::unhexvalue) | Converts a `value` from hexadecimal representation to a blob. |
| [`unicode(string)`](#::unicodestring) | Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`. |
| [`upper(string)`](#::upperstring) | Converts `string` to upper case. |
| [`url_decode(string)`](#::url_decodestring) | Decodes a URL from a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). |
| [`url_encode(string)`](#::url_encodestring) | Encodes a URL to a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). |
###### `string[index]` {#docs:stable:sql:functions:text::stringindex}
| | |
|:--|:--------|
| **Description** |Extracts a single character using a (1-based) `index`. |
| **Example** | `'DuckDB'[4]` |
| **Result** | `k` |
| **Alias** | `array_extract` |
###### `string[begin:end]` {#docs:stable:sql:functions:text::stringbeginend}
| | |
|:--|:--------|
| **Description** |Extracts a string using [slice conventions](#docs:stable:sql:functions:list::slicing) similar to Python. Missing `begin` or `end` arguments are interpreted as the beginning or end of the list respectively. Negative values are accepted. |
| **Example** | `'DuckDB'[:4]` |
| **Result** | `Duck` |
| **Alias** | `array_slice` |
###### `string LIKE target` {#docs:stable:sql:functions:text::string-like-target}
| | |
|:--|:--------|
| **Description** |Returns `true` if the `string` matches the like specifier (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)). |
| **Example** | `'hello' LIKE '%lo'` |
| **Result** | `true` |
###### `string SIMILAR TO regex` {#docs:stable:sql:functions:text::string-similar-to-regex}
| | |
|:--|:--------|
| **Description** |Returns `true` if the `string` matches the `regex` (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)). |
| **Example** | `'hello' SIMILAR TO 'l+'` |
| **Result** | `false` |
| **Alias** | `regexp_full_match` |
###### `arg1 || arg2` {#docs:stable:sql:functions:text::arg1--arg2}
| | |
|:--|:--------|
| **Description** |Concatenates two strings, lists, or blobs. Any `NULL` input results in `NULL`. See also [`concat(arg1, arg2, ...)`](#docs:stable:sql:functions:text::concatvalue-) and [`list_concat(list1, list2, ...)`](#docs:stable:sql:functions:list::list_concatlist_1--list_n). |
| **Example 1** | `'Duck' || 'DB'` |
| **Result** | `DuckDB` |
| **Example 2** | `[1, 2, 3] || [4, 5, 6]` |
| **Result** | `[1, 2, 3, 4, 5, 6]` |
| **Example 3** | `'\xAA'::BLOB || '\xBB'::BLOB` |
| **Result** | `\xAA\xBB` |
###### `array_extract(string, index)` {#docs:stable:sql:functions:text::array_extractstring-index}
| | |
|:--|:--------|
| **Description** |Extracts a single character from a `string` using a (1-based) `index`. |
| **Example** | `array_extract('DuckDB', 2)` |
| **Result** | `u` |
###### `array_slice(list, begin, end)` {#docs:stable:sql:functions:text::array_slicelist-begin-end}
| | |
|:--|:--------|
| **Description** |Extracts a sublist or substring using [slice conventions](#docs:stable:sql:functions:list::slicing). Negative values are accepted. |
| **Example 1** | `array_slice('DuckDB', 3, 4)` |
| **Result** | `ck` |
| **Example 2** | `array_slice('DuckDB', 3, NULL)` |
| **Result** | `NULL` |
| **Example 3** | `array_slice('DuckDB', 0, -3)` |
| **Result** | `Duck` |
| **Alias** | `list_slice` |
###### `ascii(string)` {#docs:stable:sql:functions:text::asciistring}
| | |
|:--|:--------|
| **Description** |Returns an integer that represents the Unicode code point of the first character of the `string`. |
| **Example** | `ascii('Ω')` |
| **Result** | `937` |
###### `bar(x, min, max[, width])` {#docs:stable:sql:functions:text::barx-min-max-width}
| | |
|:--|:--------|
| **Description** |Draws a band whose width is proportional to (`x - min`) and equal to `width` characters when `x` = `max`. `width` defaults to 80. |
| **Example** | `bar(5, 0, 20, 10)` |
| **Result** | `██▌` |
###### `bin(string)` {#docs:stable:sql:functions:text::binstring}
| | |
|:--|:--------|
| **Description** |Converts the `string` to binary representation. |
| **Example** | `bin('Aa')` |
| **Result** | `0100000101100001` |
| **Alias** | `to_binary` |
###### `bit_length(string)` {#docs:stable:sql:functions:text::bit_lengthstring}
| | |
|:--|:--------|
| **Description** |Number of bits in a `string`. |
| **Example** | `bit_length('abc')` |
| **Result** | `24` |
###### `chr(code_point)` {#docs:stable:sql:functions:text::chrcode_point}
| | |
|:--|:--------|
| **Description** |Returns the character corresponding to the ASCII code value or Unicode code point. |
| **Example** | `chr(65)` |
| **Result** | `A` |
###### `concat(value, ...)` {#docs:stable:sql:functions:text::concatvalue-}
| | |
|:--|:--------|
| **Description** |Concatenates multiple strings or lists. `NULL` inputs are skipped. See also [operator `||`](#::arg1--arg2). |
| **Example 1** | `concat('Hello', ' ', 'World')` |
| **Result** | `Hello World` |
| **Example 2** | `concat([1, 2, 3], NULL, [4, 5, 6])` |
| **Result** | `[1, 2, 3, 4, 5, 6]` |
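A short sketch contrasting the `NULL` handling of the `||` operator and `concat` (expected results shown as comments):
```sql
SELECT 'Duck' || NULL;              -- NULL
SELECT concat('Duck', NULL, 'DB');  -- DuckDB
```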
###### `concat_ws(separator, string, ...)` {#docs:stable:sql:functions:text::concat_wsseparator-string-}
| | |
|:--|:--------|
| **Description** |Concatenates many strings, separated by `separator`. `NULL` inputs are skipped. |
| **Example** | `concat_ws(', ', 'Banana', 'Apple', 'Melon')` |
| **Result** | `Banana, Apple, Melon` |
###### `contains(string, search_string)` {#docs:stable:sql:functions:text::containsstring-search_string}
| | |
|:--|:--------|
| **Description** |Returns `true` if `search_string` is found within `string`. |
| **Example** | `contains('abc', 'a')` |
| **Result** | `true` |
###### `format(format, ...)` {#docs:stable:sql:functions:text::formatformat-}
| | |
|:--|:--------|
| **Description** |Formats a string using the [fmt syntax](#::fmt-syntax). |
| **Example** | `format('Benchmark "{}" took {} seconds', 'CSV', 42)` |
| **Result** | `Benchmark "CSV" took 42 seconds` |
###### `formatReadableDecimalSize(integer)` {#docs:stable:sql:functions:text::formatreadabledecimalsizeinteger}
| | |
|:--|:--------|
| **Description** |Converts `integer` to a human-readable representation using units based on powers of 10 (KB, MB, GB, etc.). |
| **Example** | `formatReadableDecimalSize(16_000)` |
| **Result** | `16.0 kB` |
###### `format_bytes(integer)` {#docs:stable:sql:functions:text::format_bytesinteger}
| | |
|:--|:--------|
| **Description** |Converts `integer` to a human-readable representation using units based on powers of 2 (KiB, MiB, GiB, etc.). |
| **Example** | `format_bytes(16_000)` |
| **Result** | `15.6 KiB` |
| **Alias** | `formatReadableSize` |
###### `from_base64(string)` {#docs:stable:sql:functions:text::from_base64string}
| | |
|:--|:--------|
| **Description** |Converts a base64 encoded `string` to a character string (`BLOB`). |
| **Example** | `from_base64('QQ==')` |
| **Result** | `A` |
###### `greatest(arg1, ...)` {#docs:stable:sql:functions:text::greatestarg1-}
| | |
|:--|:--------|
| **Description** |Returns the largest value in lexicographical order. Note that lowercase characters are considered larger than uppercase characters and [collations](#docs:stable:sql:expressions:collations) are not supported. |
| **Example 1** | `greatest(42, 84)` |
| **Result** | `84` |
| **Example 2** | `greatest('abc', 'bcd', 'cde', 'EFG')` |
| **Result** | `cde` |
###### `hash(value, ...)` {#docs:stable:sql:functions:text::hashvalue-}
| | |
|:--|:--------|
| **Description** |Returns a `UBIGINT` with the hash of the `value`. Note that this is not a cryptographic hash. |
| **Example** | `hash('🦆')` |
| **Result** | `4164431626903154684` |
###### `hex(string)` {#docs:stable:sql:functions:text::hexstring}
| | |
|:--|:--------|
| **Description** |Converts the `string` to hexadecimal representation. |
| **Example** | `hex('Hello')` |
| **Result** | `48656C6C6F` |
| **Alias** | `to_hex` |
###### `ilike_escape(string, like_specifier, escape_character)` {#docs:stable:sql:functions:text::ilike_escapestring-like_specifier-escape_character}
| | |
|:--|:--------|
| **Description** |Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| **Example** | `ilike_escape('A%c', 'a$%C', '$')` |
| **Result** | `true` |
###### `instr(string, search_string)` {#docs:stable:sql:functions:text::instrstring-search_string}
| | |
|:--|:--------|
| **Description** |Returns location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. |
| **Example** | `instr('test test', 'es')` |
| **Result** | `2` |
| **Aliases** | `position`, `strpos` |
###### `least(arg1, ...)` {#docs:stable:sql:functions:text::leastarg1-}
| | |
|:--|:--------|
| **Description** |Returns the smallest value in lexicographical order. Note that uppercase characters are considered smaller than lowercase characters and [collations](#docs:stable:sql:expressions:collations) are not supported. |
| **Example 1** | `least(42, 84)` |
| **Result** | `42` |
| **Example 2** | `least('abc', 'bcd', 'cde', 'EFG')` |
| **Result** | `EFG` |
###### `left(string, count)` {#docs:stable:sql:functions:text::leftstring-count}
| | |
|:--|:--------|
| **Description** |Extracts the left-most count characters. |
| **Example** | `left('Hello🦆', 2)` |
| **Result** | `He` |
###### `left_grapheme(string, count)` {#docs:stable:sql:functions:text::left_graphemestring-count}
| | |
|:--|:--------|
| **Description** |Extracts the left-most count grapheme clusters. |
| **Example** | `left_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)` |
| **Result** | `🤦🏼‍♂️` |
###### `length(string)` {#docs:stable:sql:functions:text::lengthstring}
| | |
|:--|:--------|
| **Description** |Number of characters in `string`. |
| **Example** | `length('Hello🦆')` |
| **Result** | `6` |
| **Aliases** | `char_length`, `character_length`, `len` |
###### `length_grapheme(string)` {#docs:stable:sql:functions:text::length_graphemestring}
| | |
|:--|:--------|
| **Description** |Number of grapheme clusters in `string`. |
| **Example** | `length_grapheme('🤦🏼‍♂️🤦🏽‍♀️')` |
| **Result** | `2` |
###### `like_escape(string, like_specifier, escape_character)` {#docs:stable:sql:functions:text::like_escapestring-like_specifier-escape_character}
| | |
|:--|:--------|
| **Description** |Returns `true` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| **Example** | `like_escape('a%c', 'a$%c', '$')` |
| **Result** | `true` |
###### `lower(string)` {#docs:stable:sql:functions:text::lowerstring}
| | |
|:--|:--------|
| **Description** |Converts `string` to lower case. |
| **Example** | `lower('Hello')` |
| **Result** | `hello` |
| **Alias** | `lcase` |
###### `lpad(string, count, character)` {#docs:stable:sql:functions:text::lpadstring-count-character}
| | |
|:--|:--------|
| **Description** |Pads the `string` with the `character` on the left until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. |
| **Example** | `lpad('hello', 8, '>')` |
| **Result** | `>>>hello` |
###### `ltrim(string[, characters])` {#docs:stable:sql:functions:text::ltrimstring-characters}
| | |
|:--|:--------|
| **Description** |Removes any occurrences of any of the `characters` from the left side of the `string`. `characters` defaults to `space`. |
| **Example 1** | `ltrim(' test ')` |
| **Result** | `test ` |
| **Example 2** | `ltrim('>>>>test<<', '><')` |
| **Result** | `test<<` |
###### `md5(string)` {#docs:stable:sql:functions:text::md5string}
| | |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `string` as a `VARCHAR`. |
| **Example** | `md5('abc')` |
| **Result** | `900150983cd24fb0d6963f7d28e17f72` |
###### `md5_number(string)` {#docs:stable:sql:functions:text::md5_numberstring}
| | |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `string` as a `HUGEINT`. |
| **Example** | `md5_number('abc')` |
| **Result** | `152195979970564155685860391459828531600` |
###### `md5_number_lower(string)` {#docs:stable:sql:functions:text::md5_number_lowerstring}
| | |
|:--|:--------|
| **Description** |Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| **Example** | `md5_number_lower('abc')` |
| **Result** | `8250560606382298838` |
###### `md5_number_upper(string)` {#docs:stable:sql:functions:text::md5_number_upperstring}
| | |
|:--|:--------|
| **Description** |Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| **Example** | `md5_number_upper('abc')` |
| **Result** | `12704604231530709392` |
###### `nfc_normalize(string)` {#docs:stable:sql:functions:text::nfc_normalizestring}
| | |
|:--|:--------|
| **Description** |Converts `string` to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. |
| **Example** | `nfc_normalize('ardèch')` |
| **Result** | `ardèch` |
###### `not_ilike_escape(string, like_specifier, escape_character)` {#docs:stable:sql:functions:text::not_ilike_escapestring-like_specifier-escape_character}
| | |
|:--|:--------|
| **Description** |Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)) using case-insensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| **Example** | `not_ilike_escape('A%c', 'a$%C', '$')` |
| **Result** | `false` |
###### `not_like_escape(string, like_specifier, escape_character)` {#docs:stable:sql:functions:text::not_like_escapestring-like_specifier-escape_character}
| | |
|:--|:--------|
| **Description** |Returns `false` if the `string` matches the `like_specifier` (see [Pattern Matching](#docs:stable:sql:functions:pattern_matching)) using case-sensitive matching. `escape_character` is used to search for wildcard characters in the `string`. |
| **Example** | `not_like_escape('a%c', 'a$%c', '$')` |
| **Result** | `false` |
###### `parse_dirname(path[, separator])` {#docs:stable:sql:functions:text::parse_dirnamepath-separator}
| | |
|:--|:--------|
| **Description** |Returns the top-level directory name from the given `path`. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| **Example** | `parse_dirname('path/to/file.csv', 'system')` |
| **Result** | `path` |
###### `parse_dirpath(path[, separator])` {#docs:stable:sql:functions:text::parse_dirpathpath-separator}
| | |
|:--|:--------|
| **Description** |Returns the head of the `path` (the pathname until the last slash) similarly to Python's [`os.path.dirname`](https://docs.python.org/3.7/library/os.path.html#os.path.dirname). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| **Example** | `parse_dirpath('path/to/file.csv', 'forward_slash')` |
| **Result** | `path/to` |
###### `parse_filename(string[, trim_extension][, separator])` {#docs:stable:sql:functions:text::parse_filenamestring-trim_extension-separator}
| | |
|:--|:--------|
| **Description** |Returns the last component of the `path` similarly to Python's [`os.path.basename`](https://docs.python.org/3.7/library/os.path.html#os.path.basename) function. If `trim_extension` is `true`, the file extension will be removed (defaults to `false`). `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| **Example** | `parse_filename('path/to/file.csv', true, 'forward_slash')` |
| **Result** | `file` |
###### `parse_path(path[, separator])` {#docs:stable:sql:functions:text::parse_pathpath-separator}
| | |
|:--|:--------|
| **Description** |Returns a list of the components (directories and filename) in the `path` similarly to Python's [`pathlib.parts`](https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.parts) function. `separator` options: `system`, `both_slash` (default), `forward_slash`, `backslash`. |
| **Example** | `parse_path('path/to/file.csv', 'system')` |
| **Result** | `[path, to, file.csv]` |
###### `position(search_string IN string)` {#docs:stable:sql:functions:text::positionsearch_string-in-string}
| | |
|:--|:--------|
| **Description** |Return location of first occurrence of `search_string` in `string`, counting from 1. Returns 0 if no match found. |
| **Example** | `position('b' IN 'abc')` |
| **Result** | `2` |
| **Aliases** | `instr`, `strpos` |
###### `prefix(string, search_string)` {#docs:stable:sql:functions:text::prefixstring-search_string}
| | |
|:--|:--------|
| **Description** |Returns `true` if `string` starts with `search_string`. |
| **Example** | `prefix('abc', 'ab')` |
| **Result** | `true` |
###### `printf(format, ...)` {#docs:stable:sql:functions:text::printfformat-}
| | |
|:--|:--------|
| **Description** |Formats a `string` using [printf syntax](#::printf-syntax). |
| **Example** | `printf('Benchmark "%s" took %d seconds', 'CSV', 42)` |
| **Result** | `Benchmark "CSV" took 42 seconds` |
###### `read_text(source)` {#docs:stable:sql:functions:text::read_textsource}
| | |
|:--|:--------|
| **Description** |Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8 an error is thrown suggesting to use `read_blob` instead. See the [`read_text` guide](#docs:stable:guides:file_formats:read_file::read_text) for more details. |
| **Example** | `read_text('hello.txt')` |
| **Result** | `hello\n` |
###### `regexp_escape(string)` {#docs:stable:sql:functions:text::regexp_escapestring}
| | |
|:--|:--------|
| **Description** |Escapes special patterns to turn `string` into a regular expression similarly to Python's [`re.escape` function](https://docs.python.org/3/library/re.html#re.escape). |
| **Example** | `regexp_escape('https://duckdb.org')` |
| **Result** | `https\:\/\/duckdb\.org` |
###### `regexp_extract(string, regex[, group][, options])` {#docs:stable:sql:functions:text::regexp_extractstring-regex-group-options}
| | |
|:--|:--------|
| **Description** |If `string` contains the `regex` pattern, returns the capturing group specified by optional parameter `group`; otherwise, returns the empty string. The `group` must be a constant value. If no `group` is given, it defaults to 0. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract('ABC', '([a-z])(b)', 1, 'i')` |
| **Result** | `A` |
###### `regexp_extract(string, regex, name_list[, options])` {#docs:stable:sql:functions:text::regexp_extractstring-regex-name_list-options}
| | |
|:--|:--------|
| **Description** |If `string` contains the `regex` pattern, returns the capturing groups as a struct with corresponding names from `name_list`; otherwise, returns a struct with the same keys and empty strings as values. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract('John Doe', '([a-z]+) ([a-z]+)', ['first_name', 'last_name'], 'i')` |
| **Result** | `{'first_name': John, 'last_name': Doe}` |
###### `regexp_extract_all(string, regex[, group][, options])` {#docs:stable:sql:functions:text::regexp_extract_allstring-regex-group-options}
| | |
|:--|:--------|
| **Description** |Finds non-overlapping occurrences of the `regex` in the `string` and returns the corresponding values of the capturing `group`. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_extract_all('Peter: 33, Paul:14', '(\w+):\s*(\d+)', 2)` |
| **Result** | `[33, 14]` |
###### `regexp_full_match(string, regex[, col2])` {#docs:stable:sql:functions:text::regexp_full_matchstring-regex-col2}
| | |
|:--|:--------|
| **Description** |Returns `true` if the entire `string` matches the `regex`. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_full_match('anabanana', '(an)*')` |
| **Result** | `false` |
###### `regexp_matches(string, regex[, options])` {#docs:stable:sql:functions:text::regexp_matchesstring-regex-options}
| | |
|:--|:--------|
| **Description** |Returns `true` if `string` contains the `regex`, `false` otherwise. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_matches('anabanana', '(an)*')` |
| **Result** | `true` |
###### `regexp_replace(string, regex, replacement[, options])` {#docs:stable:sql:functions:text::regexp_replacestring-regex-replacement-options}
| | |
|:--|:--------|
| **Description** |If `string` contains the `regex`, replaces the matching part with `replacement`. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `regexp_replace('hello', '[lo]', '-')` |
| **Result** | `he-lo` |
###### `regexp_split_to_table(string, regex)` {#docs:stable:sql:functions:text::regexp_split_to_tablestring-regex}
| | |
|:--|:--------|
| **Description** |Splits the `string` along the `regex` and returns a row for each part. |
| **Example** | `regexp_split_to_table('hello world; 42', ';? ')` |
| **Result** | Multiple rows: `'hello'`, `'world'`, `'42'` |
###### `repeat(string, count)` {#docs:stable:sql:functions:text::repeatstring-count}
| | |
|:--|:--------|
| **Description** |Repeats the `string` `count` number of times. |
| **Example** | `repeat('A', 5)` |
| **Result** | `AAAAA` |
###### `replace(string, source, target)` {#docs:stable:sql:functions:text::replacestring-source-target}
| | |
|:--|:--------|
| **Description** |Replaces any occurrences of the `source` with `target` in `string`. |
| **Example** | `replace('hello', 'l', '-')` |
| **Result** | `he--o` |
###### `reverse(string)` {#docs:stable:sql:functions:text::reversestring}
| | |
|:--|:--------|
| **Description** |Reverses the `string`. |
| **Example** | `reverse('hello')` |
| **Result** | `olleh` |
###### `right(string, count)` {#docs:stable:sql:functions:text::rightstring-count}
| | |
|:--|:--------|
| **Description** |Extract the right-most `count` characters. |
| **Example** | `right('Hello🦆', 3)` |
| **Result** | `lo🦆` |
###### `right_grapheme(string, count)` {#docs:stable:sql:functions:text::right_graphemestring-count}
| | |
|:--|:--------|
| **Description** |Extracts the right-most `count` grapheme clusters. |
| **Example** | `right_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1)` |
| **Result** | `🤦🏽‍♀️` |
###### `rpad(string, count, character)` {#docs:stable:sql:functions:text::rpadstring-count-character}
| | |
|:--|:--------|
| **Description** |Pads the `string` with the `character` on the right until it has `count` characters. Truncates the `string` on the right if it has more than `count` characters. |
| **Example** | `rpad('hello', 10, '<')` |
| **Result** | `hello<<<<<` |
###### `rtrim(string[, characters])` {#docs:stable:sql:functions:text::rtrimstring-characters}
| | |
|:--|:--------|
| **Description** |Removes any occurrences of any of the `characters` from the right side of the `string`. `characters` defaults to `space`. |
| **Example 1** | `rtrim('test ')` |
| **Result** | `test` |
| **Example 2** | `rtrim('>>>>test<<', '><')` |
| **Result** | `>>>>test` |
###### `sha1(value)` {#docs:stable:sql:functions:text::sha1value}
| | |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-1 hash of the `value`. |
| **Example** | `sha1('🦆')` |
| **Result** | `949bf843dc338be348fb9525d1eb535d31241d76` |
###### `sha256(value)` {#docs:stable:sql:functions:text::sha256value}
| | |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-256 hash of the `value`. |
| **Example** | `sha256('🦆')` |
| **Result** | `d7a5c5e0d1d94c32218539e7e47d4ba9c3c7b77d61332fb60d633dde89e473fb` |
###### `split_part(string, separator, index)` {#docs:stable:sql:functions:text::split_partstring-separator-index}
| | |
|:--|:--------|
| **Description** |Splits the `string` along the `separator` and returns the data at the (1-based) `index` of the list. If the `index` is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior). |
| **Example** | `split_part('a;b;c', ';', 2)` |
| **Result** | `b` |
###### `starts_with(string, search_string)` {#docs:stable:sql:functions:text::starts_withstring-search_string}
| | |
|:--|:--------|
| **Description** |Returns `true` if `string` begins with `search_string`. |
| **Example** | `starts_with('abc', 'a')` |
| **Result** | `true` |
| **Alias** | `^@` |
###### `string_split(string, separator)` {#docs:stable:sql:functions:text::string_splitstring-separator}
| | |
|:--|:--------|
| **Description** |Splits the `string` along the `separator`. |
| **Example** | `string_split('hello-world', '-')` |
| **Result** | `[hello, world]` |
| **Aliases** | `split`, `str_split`, `string_to_array` |
###### `string_split_regex(string, regex[, options])` {#docs:stable:sql:functions:text::string_split_regexstring-regex-options}
| | |
|:--|:--------|
| **Description** |Splits the `string` along the `regex`. A set of optional [regex `options`](#docs:stable:sql:functions:regular_expressions::options-for-regular-expression-functions) can be set. |
| **Example** | `string_split_regex('hello world; 42', ';? ')` |
| **Result** | `[hello, world, 42]` |
| **Aliases** | `regexp_split_to_array`, `str_split_regex` |
###### `strip_accents(string)` {#docs:stable:sql:functions:text::strip_accentsstring}
| | |
|:--|:--------|
| **Description** |Strips accents from `string`. |
| **Example** | `strip_accents('mühleisen')` |
| **Result** | `muhleisen` |
###### `strlen(string)` {#docs:stable:sql:functions:text::strlenstring}
| | |
|:--|:--------|
| **Description** |Number of bytes in `string`. |
| **Example** | `strlen('🦆')` |
| **Result** | `4` |
###### `substring(string, start[, length])` {#docs:stable:sql:functions:text::substringstring-start-length}
| | |
|:--|:--------|
| **Description** |Extracts substring starting from character `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` characters instead. Note that a `start` value of `1` refers to the first character of the `string`. |
| **Example 1** | `substring('Hello', 2)` |
| **Result** | `ello` |
| **Example 2** | `substring('Hello', 2, 2)` |
| **Result** | `el` |
| **Alias** | `substr` |
###### `substring_grapheme(string, start[, length])` {#docs:stable:sql:functions:text::substring_graphemestring-start-length}
| | |
|:--|:--------|
| **Description** |Extracts substring starting from grapheme clusters `start` up to the end of the string. If optional argument `length` is set, extracts a substring of `length` grapheme clusters instead. Note that a `start` value of `1` refers to the `first` character of the `string`. |
| **Example 1** | `substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3)` |
| **Result** | `🤦🏽‍♀️🦆` |
| **Example 2** | `substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3, 2)` |
| **Result** | `🤦🏽‍♀️🦆` |
###### `suffix(string, search_string)` {#docs:stable:sql:functions:text::suffixstring-search_string}
| | |
|:--|:--------|
| **Description** |Returns `true` if `string` ends with `search_string`. Note that [collations](#docs:stable:sql:expressions:collations) are not supported. |
| **Example** | `suffix('abc', 'bc')` |
| **Result** | `true` |
| **Alias** | `ends_with` |
###### `to_base(number, radix[, min_length])` {#docs:stable:sql:functions:text::to_basenumber-radix-min_length}
| | |
|:--|:--------|
| **Description** |Converts `number` to a string in the given base `radix`, optionally padding with leading zeros to `min_length`. |
| **Example** | `to_base(42, 16, 5)` |
| **Result** | `0002A` |
###### `to_base64(blob)` {#docs:stable:sql:functions:text::to_base64blob}
| | |
|:--|:--------|
| **Description** |Converts a `blob` to a base64 encoded string. |
| **Example** | `to_base64('A'::BLOB)` |
| **Result** | `QQ==` |
| **Alias** | `base64` |
###### `translate(string, from, to)` {#docs:stable:sql:functions:text::translatestring-from-to}
| | |
|:--|:--------|
| **Description** |Replaces each character in `string` that matches a character in the `from` set with the corresponding character in the `to` set. If `from` is longer than `to`, occurrences of the extra characters in `from` are deleted. |
| **Example** | `translate('12345', '143', 'ax')` |
| **Result** | `a2x5` |
###### `trim(string[, characters])` {#docs:stable:sql:functions:text::trimstring-characters}
| | |
|:--|:--------|
| **Description** |Removes any occurrences of any of the `characters` from either side of the `string`. `characters` defaults to `space`. |
| **Example 1** | `trim(' test ')` |
| **Result** | `test` |
| **Example 2** | `trim('>>>>test<<', '><')` |
| **Result** | `test` |
###### `unbin(value)` {#docs:stable:sql:functions:text::unbinvalue}
| | |
|:--|:--------|
| **Description** |Converts a `value` from binary representation to a blob. |
| **Example** | `unbin('0110')` |
| **Result** | `\x06` |
| **Alias** | `from_binary` |
###### `unhex(value)` {#docs:stable:sql:functions:text::unhexvalue}
| | |
|:--|:--------|
| **Description** |Converts a `value` from hexadecimal representation to a blob. |
| **Example** | `unhex('2A')` |
| **Result** | `*` |
| **Alias** | `from_hex` |
###### `unicode(string)` {#docs:stable:sql:functions:text::unicodestring}
| | |
|:--|:--------|
| **Description** |Returns an `INTEGER` representing the `unicode` codepoint of the first character in the `string`. |
| **Example** | `[unicode('âbcd'), unicode('â'), unicode(''), unicode(NULL)]` |
| **Result** | `[226, 226, -1, NULL]` |
| **Alias** | `ord` |
###### `upper(string)` {#docs:stable:sql:functions:text::upperstring}
| | |
|:--|:--------|
| **Description** |Converts `string` to upper case. |
| **Example** | `upper('Hello')` |
| **Result** | `HELLO` |
| **Alias** | `ucase` |
###### `url_decode(string)` {#docs:stable:sql:functions:text::url_decodestring}
| | |
|:--|:--------|
| **Description** |Decodes a URL from a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). |
| **Example** | `url_decode('https%3A%2F%2Fduckdb.org%2Fwhy_duckdb%23portable')` |
| **Result** | `https://duckdb.org/why_duckdb#portable` |
###### `url_encode(string)` {#docs:stable:sql:functions:text::url_encodestring}
| | |
|:--|:--------|
| **Description** |Encodes a URL to a representation using [Percent-Encoding](https://datatracker.ietf.org/doc/html/rfc3986#section-2.1). |
| **Example** | `url_encode('this string has/ special+ characters>')` |
| **Result** | `this%20string%20has%2F%20special%2B%20characters%3E` |
#### Text Similarity Functions {#docs:stable:sql:functions:text::text-similarity-functions}
These functions are used to measure the similarity of two strings using various [similarity measures](https://en.wikipedia.org/wiki/Similarity_measure).
| Function | Description |
|:--|:-------|
| [`damerau_levenshtein(s1, s2)`](#::damerau_levenshteins1-s2) | Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., `a` and `A`) are considered different. |
| [`editdist3(s1, s2)`](#::levenshteins1-s2) | Alias for `levenshtein`. |
| [`hamming(s1, s2)`](#::hammings1-s2) | The Hamming distance between two strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different. |
| [`jaccard(s1, s2)`](#::jaccards1-s2) | The Jaccard similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. |
| [`jaro_similarity(s1, s2[, score_cutoff])`](#jaro_similaritys1-s2-score_cutoff) | The Jaro similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. |
| [`jaro_winkler_similarity(s1, s2[, score_cutoff])`](#jaro_winkler_similaritys1-s2-score_cutoff) | The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. |
| [`levenshtein(s1, s2)`](#::levenshteins1-s2) | The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different. |
| [`mismatches(s1, s2)`](#::hammings1-s2) | Alias for `hamming`. |
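For a quick side-by-side comparison, the following query contrasts a distance measure with two similarity measures on the same inputs; the expected results are taken from the per-function examples below.
```sql
SELECT
    hamming('duck', 'luck') AS hamming_distance,                 -- 1
    levenshtein('duck', 'luck') AS levenshtein_distance,         -- 1
    jaccard('duck', 'luck') AS jaccard_similarity,               -- 0.6
    jaro_winkler_similarity('duck', 'duckdb') AS jw_similarity;  -- 0.9333333333333333
```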
###### `damerau_levenshtein(s1, s2)` {#docs:stable:sql:functions:text::damerau_levenshteins1-s2}
| | |
|:--|:--------|
| **Description** |Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Characters of different cases (e.g., `a` and `A`) are considered different. |
| **Example** | `damerau_levenshtein('duckdb', 'udckbd')` |
| **Result** | `2` |
###### `hamming(s1, s2)` {#docs:stable:sql:functions:text::hammings1-s2}
| | |
|:--|:--------|
| **Description** |The Hamming distance between two strings, i.e., the number of positions with different characters for two strings of equal length. Strings must be of equal length. Characters of different cases (e.g., `a` and `A`) are considered different. |
| **Example** | `hamming('duck', 'luck')` |
| **Result** | `1` |
| **Alias** | `mismatches` |
###### `jaccard(s1, s2)` {#docs:stable:sql:functions:text::jaccards1-s2}
| | |
|:--|:--------|
| **Description** |The Jaccard similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. |
| **Example** | `jaccard('duck', 'luck')` |
| **Result** | `0.6` |
###### `jaro_similarity(s1, s2[, score_cutoff])` {#docs:stable:sql:functions:text::jaro_similaritys1-s2-score_cutoff}
| | |
|:--|:--------|
| **Description** |The Jaro similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. |
| **Example** | `jaro_similarity('duck', 'duckdb')` |
| **Result** | `0.8888888888888888` |
###### `jaro_winkler_similarity(s1, s2[, score_cutoff])` {#docs:stable:sql:functions:text::jaro_winkler_similaritys1-s2-score_cutoff}
| | |
|:--|:--------|
| **Description** |The Jaro-Winkler similarity between two strings. Characters of different cases (e.g., `a` and `A`) are considered different. Returns a number between 0 and 1. For similarity < `score_cutoff`, 0 is returned instead. `score_cutoff` defaults to 0. |
| **Example** | `jaro_winkler_similarity('duck', 'duckdb')` |
| **Result** | `0.9333333333333333` |
###### `levenshtein(s1, s2)` {#docs:stable:sql:functions:text::levenshteins1-s2}
| | |
|:--|:--------|
| **Description** |The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Characters of different cases (e.g., `a` and `A`) are considered different. |
| **Example** | `levenshtein('duck', 'db')` |
| **Result** | `3` |
| **Alias** | `editdist3` |
#### Formatters {#docs:stable:sql:functions:text::formatters}
##### `fmt` Syntax {#docs:stable:sql:functions:text::fmt-syntax}
The `format(format, parameters...)` function formats strings, loosely following the syntax of the [{fmt} open-source formatting library](https://fmt.dev/latest/syntax/).
Format without additional parameters:
```sql
SELECT format('Hello world'); -- Hello world
```
Format a string using {}:
```sql
SELECT format('The answer is {}', 42); -- The answer is 42
```
Format a string using positional arguments:
```sql
SELECT format('I''d rather be {1} than {0}.', 'right', 'happy'); -- I'd rather be happy than right.
```
###### Format Specifiers {#docs:stable:sql:functions:text::format-specifiers}
| Specifier | Description | Example |
|:-|:------|:---|
| `{:d}` | integer | `654321` |
| `{:E}` | scientific notation | `3.141593E+00` |
| `{:f}` | float | `4.560000` |
| `{:o}` | octal | `2375761` |
| `{:s}` | string | `asd` |
| `{:x}` | hexadecimal | `9fbf1` |
| `{:tX}` | integer, `X` is the thousand separator | `654 321` |
###### Formatting Types {#docs:stable:sql:functions:text::formatting-types}
Integers:
```sql
SELECT format('{} + {} = {}', 3, 5, 3 + 5); -- 3 + 5 = 8
```
Booleans:
```sql
SELECT format('{} != {}', true, false); -- true != false
```
Format datetime values:
```sql
SELECT format('{}', DATE '1992-01-01'); -- 1992-01-01
SELECT format('{}', TIME '12:01:00'); -- 12:01:00
SELECT format('{}', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00
```
Format BLOB:
```sql
SELECT format('{}', BLOB '\x00hello'); -- \x00hello
```
Pad integers with 0s:
```sql
SELECT format('{:04d}', 33); -- 0033
```
Create timestamps from integers:
```sql
SELECT format('{:02d}:{:02d}:{:02d} {}', 12, 3, 16, 'AM'); -- 12:03:16 AM
```
Convert to hexadecimal:
```sql
SELECT format('{:x}', 123_456_789); -- 75bcd15
```
Convert to binary:
```sql
SELECT format('{:b}', 123_456_789); -- 111010110111100110100010101
```
###### Print Numbers with Thousand Separators {#docs:stable:sql:functions:text::print-numbers-with-thousand-separators}
Integers:
```sql
SELECT format('{:,}', 123_456_789); -- 123,456,789
SELECT format('{:t.}', 123_456_789); -- 123.456.789
SELECT format('{:''}', 123_456_789); -- 123'456'789
SELECT format('{:_}', 123_456_789); -- 123_456_789
SELECT format('{:t }', 123_456_789); -- 123 456 789
SELECT format('{:tX}', 123_456_789); -- 123X456X789
```
Float, double and decimal:
```sql
SELECT format('{:,f}', 123456.789); -- 123,456.78900
SELECT format('{:,.2f}', 123456.789); -- 123,456.79
SELECT format('{:t..2f}', 123456.789); -- 123.456,79
```
##### `printf` Syntax {#docs:stable:sql:functions:text::printf-syntax}
The `printf(format, parameters...)` function formats strings using the [`printf` syntax](https://cplusplus.com/reference/cstdio/printf/).
Format without additional parameters:
```sql
SELECT printf('Hello world');
```
```text
Hello world
```
Format a string using arguments in a given order:
```sql
SELECT printf('The answer to %s is %d', 'life', 42);
```
```text
The answer to life is 42
```
Format a string using positional arguments `%position$formatter`, e.g., the second parameter as a string is encoded as `%2$s`:
```sql
SELECT printf('I''d rather be %2$s than %1$s.', 'right', 'happy');
```
```text
I'd rather be happy than right.
```
###### Format Specifiers {#docs:stable:sql:functions:text::format-specifiers}
| Specifier | Description | Example |
|:-|:------|:---|
| `%c` | character code to character | `a` |
| `%d` | integer | `654321` |
| `%Xd` | integer with thousand separator `X` from `,`, `.`, `''`, `_` | `654_321` |
| `%E` | scientific notation | `3.141593E+00` |
| `%f` | float | `4.560000` |
| `%hd` | integer | `654321` |
| `%hhd` | integer | `654321` |
| `%lld` | integer | `654321` |
| `%o` | octal | `2375761` |
| `%s` | string | `asd` |
| `%x` | hexadecimal | `9fbf1` |
###### Formatting Types {#docs:stable:sql:functions:text::formatting-types}
Integers:
```sql
SELECT printf('%d + %d = %d', 3, 5, 3 + 5); -- 3 + 5 = 8
```
Booleans:
```sql
SELECT printf('%s != %s', true, false); -- true != false
```
Format datetime values:
```sql
SELECT printf('%s', DATE '1992-01-01'); -- 1992-01-01
SELECT printf('%s', TIME '12:01:00'); -- 12:01:00
SELECT printf('%s', TIMESTAMP '1992-01-01 12:01:00'); -- 1992-01-01 12:01:00
```
Format BLOB:
```sql
SELECT printf('%s', BLOB '\x00hello'); -- \x00hello
```
Pad integers with 0s:
```sql
SELECT printf('%04d', 33); -- 0033
```
Create timestamps from integers:
```sql
SELECT printf('%02d:%02d:%02d %s', 12, 3, 16, 'AM'); -- 12:03:16 AM
```
Convert to hexadecimal:
```sql
SELECT printf('%x', 123_456_789); -- 75bcd15
```
Convert to binary:
```sql
SELECT printf('%b', 123_456_789); -- 111010110111100110100010101
```
###### Thousand Separators {#docs:stable:sql:functions:text::thousand-separators}
Integers:
```sql
SELECT printf('%,d', 123_456_789); -- 123,456,789
SELECT printf('%.d', 123_456_789); -- 123.456.789
SELECT printf('%''d', 123_456_789); -- 123'456'789
SELECT printf('%_d', 123_456_789); -- 123_456_789
```
Float, double and decimal:
```sql
SELECT printf('%,f', 123456.789); -- 123,456.789000
SELECT printf('%,.2f', 123456.789); -- 123,456.79
```
### Time Functions {#docs:stable:sql:functions:time}
This section describes functions and operators for examining and manipulating [`TIME` values](#docs:stable:sql:data_types:time).
#### Time Operators {#docs:stable:sql:functions:time::time-operators}
The table below shows the available mathematical operators for `TIME` types.
| Operator | Description | Example | Result |
|:-|:---|:----|:--|
| `+` | addition of an `INTERVAL` | `TIME '01:02:03' + INTERVAL 5 HOUR` | `06:02:03` |
| `-` | subtraction of an `INTERVAL` | `TIME '06:02:03' - INTERVAL 5 HOUR` | `01:02:03` |
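For example, both operators can be used in a single query (results as in the table above):
```sql
SELECT
    TIME '01:02:03' + INTERVAL 5 HOUR AS shifted_later,   -- 06:02:03
    TIME '06:02:03' - INTERVAL 5 HOUR AS shifted_earlier; -- 01:02:03
```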
#### Time Functions {#docs:stable:sql:functions:time::time-functions}
The table below shows the available scalar functions for `TIME` types.
| Name | Description |
|:--|:-------|
| [`current_time`](#::current_time) | Current time (start of current transaction) in the local time zone. Note that parentheses should be omitted from the function call. |
| [`date_diff(part, starttime, endtime)`](#::date_diffpart-starttime-endtime) | The number of [`part`](#docs:stable:sql:functions:datepart) boundaries between `starttime` and `endtime`, inclusive of the larger time and exclusive of the smaller time. |
| [`date_part(part, time)`](#::date_partpart-time) | Get [subfield](#docs:stable:sql:functions:datepart) (equivalent to `extract`). |
| [`date_sub(part, starttime, endtime)`](#::date_subpart-starttime-endtime) | The signed length of the interval between `starttime` and `endtime`, truncated to whole multiples of [`part`](#docs:stable:sql:functions:datepart). |
| [`extract(part FROM time)`](#::extractpart-from-time) | Get subfield from a time. |
| [`get_current_time()`](#::get_current_time) | Current time (start of current transaction) in UTC. |
| [`make_time(bigint, bigint, double)`](#::make_timebigint-bigint-double) | The time for the given parts. |
The only [date parts](#docs:stable:sql:functions:datepart) that are defined for times are `epoch`, `hours`, `minutes`, `seconds`, `milliseconds` and `microseconds`.
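For example, extracting parts from a `TIME` value (expected results taken from the function entries below):
```sql
SELECT
    date_part('minute', TIME '14:21:13') AS minute,  -- 21
    extract('hour' FROM TIME '14:21:13') AS hour;    -- 14
```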
###### `current_time` {#docs:stable:sql:functions:time::current_time}
| | |
|:--|:--------|
| **Description** |Current time (start of current transaction) in the local time zone. Note that parentheses should be omitted from the function call. |
| **Example** | `current_time` |
| **Result** | `10:31:58.578` |
| **Alias** | `get_current_time()` |
###### `date_diff(part, starttime, endtime)` {#docs:stable:sql:functions:time::date_diffpart-starttime-endtime}
| | |
|:--|:--------|
| **Description** |The number of [`part`](#docs:stable:sql:functions:datepart) boundaries between `starttime` and `endtime`, inclusive of the larger time and exclusive of the smaller time. |
| **Example** | `date_diff('hour', TIME '01:02:03', TIME '06:01:03')` |
| **Result** | `5` |
| **Alias** | `datediff` |
###### `date_part(part, time)` {#docs:stable:sql:functions:time::date_partpart-time}
| | |
|:--|:--------|
| **Description** |Get [subfield](#docs:stable:sql:functions:datepart) (equivalent to `extract`). |
| **Example** | `date_part('minute', TIME '14:21:13')` |
| **Result** | `21` |
| **Alias** | `datepart` |
###### `date_sub(part, starttime, endtime)` {#docs:stable:sql:functions:time::date_subpart-starttime-endtime}
| | |
|:--|:--------|
| **Description** |The signed length of the interval between `starttime` and `endtime`, truncated to whole multiples of [`part`](#docs:stable:sql:functions:datepart). |
| **Example** | `date_sub('hour', TIME '01:02:03', TIME '06:01:03')` |
| **Result** | `4` |
| **Alias** | `datesub` |
###### `extract(part FROM time)` {#docs:stable:sql:functions:time::extractpart-from-time}
| | |
|:--|:--------|
| **Description** |Get subfield from a time. |
| **Example** | `extract('hour' FROM TIME '14:21:13')` |
| **Result** | `14` |
###### `get_current_time()` {#docs:stable:sql:functions:time::get_current_time}
| | |
|:--|:--------|
| **Description** |Current time (start of current transaction) in UTC. |
| **Example** | `get_current_time()` |
| **Result** | `10:31:58.578` |
| **Alias** | `current_time` |
###### `make_time(bigint, bigint, double)` {#docs:stable:sql:functions:time::make_timebigint-bigint-double}
| | |
|:--|:--------|
| **Description** |The time for the given parts. |
| **Example** | `make_time(13, 34, 27.123456)` |
| **Result** | `13:34:27.123456` |
### Timestamp Functions {#docs:stable:sql:functions:timestamp}
This section describes functions and operators for examining and manipulating [`TIMESTAMP` values](#docs:stable:sql:data_types:timestamp).
See also the related [`TIMESTAMPTZ` functions](#docs:stable:sql:functions:timestamptz).
#### Timestamp Operators {#docs:stable:sql:functions:timestamp::timestamp-operators}
The table below shows the available mathematical operators for `TIMESTAMP` types.
| Operator | Description | Example | Result |
|:-|:--|:----|:--|
| `+` | addition of an `INTERVAL` | `TIMESTAMP '1992-03-22 01:02:03' + INTERVAL 5 DAY` | `1992-03-27 01:02:03` |
| `-` | subtraction of `TIMESTAMP`s | `TIMESTAMP '1992-03-27' - TIMESTAMP '1992-03-22'` | `5 days` |
| `-` | subtraction of an `INTERVAL` | `TIMESTAMP '1992-03-27 01:02:03' - INTERVAL 5 DAY` | `1992-03-22 01:02:03` |
Adding to or subtracting from [infinite values](#docs:stable:sql:data_types:timestamp::special-values) produces the same infinite value.
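For example:
```sql
SELECT
    TIMESTAMP '1992-03-22 01:02:03' + INTERVAL 5 DAY AS five_days_later,  -- 1992-03-27 01:02:03
    TIMESTAMP '1992-03-27' - TIMESTAMP '1992-03-22' AS difference;        -- 5 days
```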
#### Scalar Timestamp Functions {#docs:stable:sql:functions:timestamp::scalar-timestamp-functions}
The table below shows the available scalar functions for `TIMESTAMP` values.
| Name | Description |
|:--|:-------|
| [`age(timestamp, timestamp)`](#::agetimestamp-timestamp) | Subtract arguments, resulting in the time difference between the two timestamps. |
| [`age(timestamp)`](#::agetimestamp) | Subtract from current_date. |
| [`century(timestamp)`](#::centurytimestamp) | Extracts the century of a timestamp. |
| [`current_localtimestamp()`](#::current_localtimestamp) | Returns the current timestamp (at the start of the transaction). |
| [`date_diff(part, starttimestamp, endtimestamp)`](#::date_diffpart-starttimestamp-endtimestamp) | The number of [`part`](#docs:stable:sql:functions:datepart) boundaries between `starttimestamp` and `endtimestamp`, inclusive of the larger timestamp and exclusive of the smaller timestamp. |
| [`date_part([part, ...], timestamp)`](#date_partpart--timestamp) | Get the listed [subfields](#docs:stable:sql:functions:datepart) as a `struct`. The list must be constant. |
| [`date_part(part, timestamp)`](#::date_partpart-timestamp) | Get [subfield](#docs:stable:sql:functions:datepart) (equivalent to `extract`). |
| [`date_sub(part, starttimestamp, endtimestamp)`](#::date_subpart-starttimestamp-endtimestamp) | The signed length of the interval between `starttimestamp` and `endtimestamp`, truncated to whole multiples of [`part`](#docs:stable:sql:functions:datepart). |
| [`date_trunc(part, timestamp)`](#::date_truncpart-timestamp) | Truncate to specified [precision](#docs:stable:sql:functions:datepart). |
| [`dayname(timestamp)`](#::daynametimestamp) | The (English) name of the weekday. |
| [`epoch_ms(timestamp)`](#::epoch_mstimestamp) | Returns the total number of milliseconds since the epoch. |
| [`epoch_ns(timestamp)`](#::epoch_nstimestamp) | Returns the total number of nanoseconds since the epoch. |
| [`epoch_us(timestamp)`](#::epoch_ustimestamp) | Returns the total number of microseconds since the epoch. |
| [`epoch(timestamp)`](#::epochtimestamp) | Returns the total number of seconds since the epoch. |
| [`extract(field FROM timestamp)`](#::extractfield-from-timestamp) | Get [subfield](#docs:stable:sql:functions:datepart) from a timestamp. |
| [`greatest(timestamp, timestamp)`](#::greatesttimestamp-timestamp) | The later of two timestamps. |
| [`isfinite(timestamp)`](#::isfinitetimestamp) | Returns true if the timestamp is finite, false otherwise. |
| [`isinf(timestamp)`](#::isinftimestamp) | Returns true if the timestamp is infinite, false otherwise. |
| [`julian(timestamp)`](#::juliantimestamp) | Extract the Julian Day number from a timestamp. |
| [`last_day(timestamp)`](#::last_daytimestamp) | The last day of the month. |
| [`least(timestamp, timestamp)`](#::leasttimestamp-timestamp) | The earlier of two timestamps. |
| [`make_timestamp(bigint, bigint, bigint, bigint, bigint, double)`](#::make_timestampbigint-bigint-bigint-bigint-bigint-double) | The timestamp for the given parts. |
| [`make_timestamp(microseconds)`](#::make_timestampmicroseconds) | Converts microseconds since the epoch to a timestamp. |
| [`make_timestamp_ms(milliseconds)`](#::make_timestamp_msmilliseconds) | Converts milliseconds since the epoch to a timestamp. |
| [`make_timestamp_ns(nanoseconds)`](#::make_timestamp_nsnanoseconds) | Converts nanoseconds since the epoch to a timestamp. |
| [`monthname(timestamp)`](#::monthnametimestamp) | The (English) name of the month. |
| [`strftime(timestamp, format)`](#::strftimetimestamp-format) | Converts timestamp to string according to the [format string](#docs:stable:sql:functions:dateformat::format-specifiers). |
| [`strptime(text, format-list)`](#::strptimetext-format-list) | Converts the string `text` to timestamp applying the [format strings](#docs:stable:sql:functions:dateformat) in the list until one succeeds. Throws an error on failure. To return `NULL` on failure, use [`try_strptime`](#::try_strptimetext-format-list). |
| [`strptime(text, format)`](#::strptimetext-format) | Converts the string `text` to timestamp according to the [format string](#docs:stable:sql:functions:dateformat::format-specifiers). Throws an error on failure. To return `NULL` on failure, use [`try_strptime`](#::try_strptimetext-format). |
| [`time_bucket(bucket_width, timestamp[, offset])`](#time_bucketbucket_width-timestamp-offset) | Truncate `timestamp` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01 00:00:00[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00[ + offset]`. Note that `2000-01-03` is a Monday. |
| [`time_bucket(bucket_width, timestamp[, origin])`](#time_bucketbucket_width-timestamp-origin) | Truncate `timestamp` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01 00:00:00` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00`. Note that `2000-01-03` is a Monday. |
| [`try_strptime(text, format-list)`](#::try_strptimetext-format-list) | Converts the string `text` to timestamp applying the [format strings](#docs:stable:sql:functions:dateformat) in the list until one succeeds. Returns `NULL` on failure. |
| [`try_strptime(text, format)`](#::try_strptimetext-format) | Converts the string `text` to timestamp according to the [format string](#docs:stable:sql:functions:dateformat::format-specifiers). Returns `NULL` on failure. |
There are also dedicated extraction functions to get the [subfields](#docs:stable:sql:functions:datepart).
Functions applied to infinite dates will either return the same infinite dates
(e.g., `greatest`) or `NULL` (e.g., `date_part`) depending on what "makes sense".
In general, if the function needs to examine the parts of the infinite date, the result will be `NULL`.
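For example, following the rule above, `greatest` passes an infinite timestamp through unchanged, while `date_part` must examine its parts and therefore returns `NULL`:
```sql
SELECT
    greatest(TIMESTAMP 'infinity', TIMESTAMP '1992-03-22') AS still_infinite,  -- infinity
    date_part('year', TIMESTAMP 'infinity') AS no_year;                        -- NULL
```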
###### `age(timestamp, timestamp)` {#docs:stable:sql:functions:timestamp::agetimestamp-timestamp}
| | |
|:--|:--------|
| **Description** |Subtract arguments, resulting in the time difference between the two timestamps. |
| **Example** | `age(TIMESTAMP '2001-04-10', TIMESTAMP '1992-09-20')` |
| **Result** | `8 years 6 months 20 days` |
###### `age(timestamp)` {#docs:stable:sql:functions:timestamp::agetimestamp}
| | |
|:--|:--------|
| **Description** |Subtract from current_date. |
| **Example** | `age(TIMESTAMP '1992-09-20')` |
| **Result** | `29 years 1 month 27 days 12:39:00.844` |
###### `century(timestamp)` {#docs:stable:sql:functions:timestamp::centurytimestamp}
| | |
|:--|:--------|
| **Description** |Extracts the century of a timestamp. |
| **Example** | `century(TIMESTAMP '1992-03-22')` |
| **Result** | `20` |
###### `current_localtimestamp()` {#docs:stable:sql:functions:timestamp::current_localtimestamp}
| | |
|:--|:--------|
| **Description** |Returns the current timestamp (at the start of the transaction). |
| **Example** | `current_localtimestamp()` |
| **Result** | `2024-11-30 13:28:48.895` |
###### `date_diff(part, starttimestamp, endtimestamp)` {#docs:stable:sql:functions:timestamp::date_diffpart-starttimestamp-endtimestamp}
| | |
|:--|:--------|
| **Description** |The signed number of [`part`](#docs:stable:sql:functions:datepart) boundaries between `starttimestamp` and `endtimestamp`, inclusive of the larger timestamp and exclusive of the smaller timestamp. |
| **Example** | `date_diff('hour', TIMESTAMP '1992-09-30 23:59:59', TIMESTAMP '1992-10-01 01:58:00')` |
| **Result** | `2` |
###### `date_part([part, ...], timestamp)` {#docs:stable:sql:functions:timestamp::date_partpart--timestamp}
| | |
|:--|:--------|
| **Description** |Get the listed [subfields](#docs:stable:sql:functions:datepart) as a `struct`. The list must be constant. |
| **Example** | `date_part(['year', 'month', 'day'], TIMESTAMP '1992-09-20 20:38:40')` |
| **Result** | `{year: 1992, month: 9, day: 20}` |
###### `date_part(part, timestamp)` {#docs:stable:sql:functions:timestamp::date_partpart-timestamp}
| | |
|:--|:--------|
| **Description** |Get [subfield](#docs:stable:sql:functions:datepart) (equivalent to `extract`). |
| **Example** | `date_part('minute', TIMESTAMP '1992-09-20 20:38:40')` |
| **Result** | `38` |
###### `date_sub(part, starttimestamp, endtimestamp)` {#docs:stable:sql:functions:timestamp::date_subpart-starttimestamp-endtimestamp}
| | |
|:--|:--------|
| **Description** |The signed length of the interval between `starttimestamp` and `endtimestamp`, truncated to whole multiples of [`part`](#docs:stable:sql:functions:datepart). |
| **Example** | `date_sub('hour', TIMESTAMP '1992-09-30 23:59:59', TIMESTAMP '1992-10-01 01:58:00')` |
| **Result** | `1` |
###### `date_trunc(part, timestamp)` {#docs:stable:sql:functions:timestamp::date_truncpart-timestamp}
| | |
|:--|:--------|
| **Description** |Truncate to specified [precision](#docs:stable:sql:functions:datepart). |
| **Example** | `date_trunc('hour', TIMESTAMP '1992-09-20 20:38:40')` |
| **Result** | `1992-09-20 20:00:00` |
###### `dayname(timestamp)` {#docs:stable:sql:functions:timestamp::daynametimestamp}
| | |
|:--|:--------|
| **Description** |The (English) name of the weekday. |
| **Example** | `dayname(TIMESTAMP '1992-03-22')` |
| **Result** | `Sunday` |
###### `epoch_ms(timestamp)` {#docs:stable:sql:functions:timestamp::epoch_mstimestamp}
| | |
|:--|:--------|
| **Description** |Returns the total number of milliseconds since the epoch. |
| **Example** | `epoch_ms(TIMESTAMP '2021-08-03 11:59:44.123456')` |
| **Result** | `1627991984123` |
###### `epoch_ns(timestamp)` {#docs:stable:sql:functions:timestamp::epoch_nstimestamp}
| | |
|:--|:--------|
| **Description** |Returns the total number of nanoseconds since the epoch. |
| **Example** | `epoch_ns(TIMESTAMP '2021-08-03 11:59:44.123456')` |
| **Result** | `1627991984123456000` |
###### `epoch_us(timestamp)` {#docs:stable:sql:functions:timestamp::epoch_ustimestamp}
| | |
|:--|:--------|
| **Description** |Returns the total number of microseconds since the epoch. |
| **Example** | `epoch_us(TIMESTAMP '2021-08-03 11:59:44.123456')` |
| **Result** | `1627991984123456` |
###### `epoch(timestamp)` {#docs:stable:sql:functions:timestamp::epochtimestamp}
| | |
|:--|:--------|
| **Description** |Returns the total number of seconds since the epoch. |
| **Example** | `epoch('2022-11-07 08:43:04'::TIMESTAMP);` |
| **Result** | `1667810584` |
###### `extract(field FROM timestamp)` {#docs:stable:sql:functions:timestamp::extractfield-from-timestamp}
| | |
|:--|:--------|
| **Description** |Get [subfield](#docs:stable:sql:functions:datepart) from a timestamp. |
| **Example** | `extract('hour' FROM TIMESTAMP '1992-09-20 20:38:48')` |
| **Result** | `20` |
###### `greatest(timestamp, timestamp)` {#docs:stable:sql:functions:timestamp::greatesttimestamp-timestamp}
| | |
|:--|:--------|
| **Description** |The later of two timestamps. |
| **Example** | `greatest(TIMESTAMP '1992-09-20 20:38:48', TIMESTAMP '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-09-20 20:38:48` |
###### `isfinite(timestamp)` {#docs:stable:sql:functions:timestamp::isfinitetimestamp}
| | |
|:--|:--------|
| **Description** |Returns true if the timestamp is finite, false otherwise. |
| **Example** | `isfinite(TIMESTAMP '1992-03-07')` |
| **Result** | `true` |
###### `isinf(timestamp)` {#docs:stable:sql:functions:timestamp::isinftimestamp}
| | |
|:--|:--------|
| **Description** |Returns true if the timestamp is infinite, false otherwise. |
| **Example** | `isinf(TIMESTAMP '-infinity')` |
| **Result** | `true` |
###### `julian(timestamp)` {#docs:stable:sql:functions:timestamp::juliantimestamp}
| | |
|:--|:--------|
| **Description** |Extract the Julian Day number from a timestamp. |
| **Example** | `julian(TIMESTAMP '1992-03-22 01:02:03.1234')` |
| **Result** | `2448704.043091706` |
###### `last_day(timestamp)` {#docs:stable:sql:functions:timestamp::last_daytimestamp}
| | |
|:--|:--------|
| **Description** |The last day of the month. |
| **Example** | `last_day(TIMESTAMP '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-03-31` |
###### `least(timestamp, timestamp)` {#docs:stable:sql:functions:timestamp::leasttimestamp-timestamp}
| | |
|:--|:--------|
| **Description** |The earlier of two timestamps. |
| **Example** | `least(TIMESTAMP '1992-09-20 20:38:48', TIMESTAMP '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-03-22 01:02:03.1234` |
###### `make_timestamp(bigint, bigint, bigint, bigint, bigint, double)` {#docs:stable:sql:functions:timestamp::make_timestampbigint-bigint-bigint-bigint-bigint-double}
| | |
|:--|:--------|
| **Description** |The timestamp for the given parts. |
| **Example** | `make_timestamp(1992, 9, 20, 13, 34, 27.123456)` |
| **Result** | `1992-09-20 13:34:27.123456` |
###### `make_timestamp(microseconds)` {#docs:stable:sql:functions:timestamp::make_timestampmicroseconds}
| | |
|:--|:--------|
| **Description** |Converts microseconds since the epoch to a timestamp. |
| **Example** | `make_timestamp(1667810584123456)` |
| **Result** | `2022-11-07 08:43:04.123456` |
###### `make_timestamp_ms(milliseconds)` {#docs:stable:sql:functions:timestamp::make_timestamp_msmilliseconds}
| | |
|:--|:--------|
| **Description** |Converts milliseconds since the epoch to a timestamp. |
| **Example** | `make_timestamp_ms(1667810584123)` |
| **Result** | `2022-11-07 08:43:04.123` |
###### `make_timestamp_ns(nanoseconds)` {#docs:stable:sql:functions:timestamp::make_timestamp_nsnanoseconds}
| | |
|:--|:--------|
| **Description** |Converts nanoseconds since the epoch to a timestamp. |
| **Example** | `make_timestamp_ns(1667810584123456789)` |
| **Result** | `2022-11-07 08:43:04.123456789` |
###### `monthname(timestamp)` {#docs:stable:sql:functions:timestamp::monthnametimestamp}
| | |
|:--|:--------|
| **Description** |The (English) name of the month. |
| **Example** | `monthname(TIMESTAMP '1992-09-20')` |
| **Result** | `September` |
###### `strftime(timestamp, format)` {#docs:stable:sql:functions:timestamp::strftimetimestamp-format}
| | |
|:--|:--------|
| **Description** |Converts timestamp to string according to the [format string](#docs:stable:sql:functions:dateformat::format-specifiers). |
| **Example** | `strftime(timestamp '1992-01-01 20:38:40', '%a, %-d %B %Y - %I:%M:%S %p')` |
| **Result** | `Wed, 1 January 1992 - 08:38:40 PM` |
###### `strptime(text, format-list)` {#docs:stable:sql:functions:timestamp::strptimetext-format-list}
| | |
|:--|:--------|
| **Description** |Converts the string `text` to timestamp applying the [format strings](#docs:stable:sql:functions:dateformat) in the list until one succeeds. Throws an error on failure. To return `NULL` on failure, use [`try_strptime`](#::try_strptimetext-format-list). |
| **Example** | `strptime('4/15/2023 10:56:00', ['%d/%m/%Y %H:%M:%S', '%m/%d/%Y %H:%M:%S'])` |
| **Result** | `2023-04-15 10:56:00` |
###### `strptime(text, format)` {#docs:stable:sql:functions:timestamp::strptimetext-format}
| | |
|:--|:--------|
| **Description** |Converts the string `text` to timestamp according to the [format string](#docs:stable:sql:functions:dateformat::format-specifiers). Throws an error on failure. To return `NULL` on failure, use [`try_strptime`](#::try_strptimetext-format). |
| **Example** | `strptime('Wed, 1 January 1992 - 08:38:40 PM', '%a, %-d %B %Y - %I:%M:%S %p')` |
| **Result** | `1992-01-01 20:38:40` |
###### `time_bucket(bucket_width, timestamp[, offset])` {#docs:stable:sql:functions:timestamp::time_bucketbucket_width-timestamp-offset}
| | |
|:--|:--------|
| **Description** |Truncate `timestamp` to a grid of width `bucket_width`. The grid includes `2000-01-01 00:00:00[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00[ + offset]`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '10 minutes', TIMESTAMP '1992-04-20 15:26:00-07', INTERVAL '5 minutes')` |
| **Result** | `1992-04-20 15:25:00` |
###### `time_bucket(bucket_width, timestamp[, origin])` {#docs:stable:sql:functions:timestamp::time_bucketbucket_width-timestamp-origin}
| | |
|:--|:--------|
| **Description** |Truncate `timestamp` to a grid of width `bucket_width`. The grid includes the `origin` timestamp, which defaults to `2000-01-01 00:00:00` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '2 weeks', TIMESTAMP '1992-04-20 15:26:00', TIMESTAMP '1992-04-01 00:00:00')` |
| **Result** | `1992-04-15 00:00:00` |
###### `try_strptime(text, format-list)` {#docs:stable:sql:functions:timestamp::try_strptimetext-format-list}
| | |
|:--|:--------|
| **Description** |Converts the string `text` to timestamp applying the [format strings](#docs:stable:sql:functions:dateformat) in the list until one succeeds. Returns `NULL` on failure. |
| **Example** | `try_strptime('4/15/2023 10:56:00', ['%d/%m/%Y %H:%M:%S', '%m/%d/%Y %H:%M:%S'])` |
| **Result** | `2023-04-15 10:56:00` |
###### `try_strptime(text, format)` {#docs:stable:sql:functions:timestamp::try_strptimetext-format}
| | |
|:--|:--------|
| **Description** |Converts the string `text` to timestamp according to the [format string](#docs:stable:sql:functions:dateformat::format-specifiers). Returns `NULL` on failure. |
| **Example** | `try_strptime('Wed, 1 January 1992 - 08:38:40 PM', '%a, %-d %B %Y - %I:%M:%S %p')` |
| **Result** | `1992-01-01 20:38:40` |
#### Timestamp Table Functions {#docs:stable:sql:functions:timestamp::timestamp-table-functions}
The table below shows the available table functions for `TIMESTAMP` types.
| Name | Description |
|:--|:-------|
| [`generate_series(timestamp, timestamp, interval)`](#::generate_seriestimestamp-timestamp-interval) | Generate a table of timestamps in the closed range, stepping by the interval. |
| [`range(timestamp, timestamp, interval)`](#::rangetimestamp-timestamp-interval) | Generate a table of timestamps in the half open range, stepping by the interval. |
> Infinite values are not allowed as table function bounds.
###### `generate_series(timestamp, timestamp, interval)` {#docs:stable:sql:functions:timestamp::generate_seriestimestamp-timestamp-interval}
| | |
|:--|:--------|
| **Description** |Generate a table of timestamps in the closed range, stepping by the interval. |
| **Example** | `generate_series(TIMESTAMP '2001-04-10', TIMESTAMP '2001-04-11', INTERVAL 30 MINUTE)` |
###### `range(timestamp, timestamp, interval)` {#docs:stable:sql:functions:timestamp::rangetimestamp-timestamp-interval}
| | |
|:--|:--------|
| **Description** |Generate a table of timestamps in the half open range, stepping by the interval. |
| **Example** | `range(TIMESTAMP '2001-04-10', TIMESTAMP '2001-04-11', INTERVAL 30 MINUTE)` |
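Both table functions are typically used in the `FROM` clause; the only difference is whether the ending timestamp is included:
```sql
-- Closed range: includes both 2001-04-10 00:00:00 and 2001-04-11 00:00:00
SELECT * FROM generate_series(TIMESTAMP '2001-04-10', TIMESTAMP '2001-04-11', INTERVAL 30 MINUTE);
-- Half-open range: stops before 2001-04-11 00:00:00
SELECT * FROM range(TIMESTAMP '2001-04-10', TIMESTAMP '2001-04-11', INTERVAL 30 MINUTE);
```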
### Timestamp with Time Zone Functions {#docs:stable:sql:functions:timestamptz}
This section describes functions and operators for examining and manipulating [`TIMESTAMP WITH TIME ZONE`
(or `TIMESTAMPTZ`) values](#docs:stable:sql:data_types:timestamp). See also the related [`TIMESTAMP` functions](#docs:stable:sql:functions:timestamp).
Time zone support is provided by the built-in [ICU extension](#docs:stable:core_extensions:icu).
In the examples below, the current time zone is presumed to be `America/Los_Angeles`
using the Gregorian calendar.
#### Built-In Timestamp with Time Zone Functions {#docs:stable:sql:functions:timestamptz::built-in-timestamp-with-time-zone-functions}
The table below shows the available scalar functions for `TIMESTAMPTZ` values.
Since these functions do not involve binning or display,
they are always available.
| Name | Description |
|:--|:-------|
| [`current_timestamp`](#::current_timestamp) | Current date and time (start of current transaction). |
| [`get_current_timestamp()`](#::get_current_timestamp) | Current date and time (start of current transaction). |
| [`greatest(timestamptz, timestamptz)`](#::greatesttimestamptz-timestamptz) | The later of two timestamps. |
| [`isfinite(timestamptz)`](#::isfinitetimestamptz) | Returns true if the timestamp with time zone is finite, false otherwise. |
| [`isinf(timestamptz)`](#::isinftimestamptz) | Returns true if the timestamp with time zone is infinite, false otherwise. |
| [`least(timestamptz, timestamptz)`](#::leasttimestamptz-timestamptz) | The earlier of two timestamps. |
| [`now()`](#::now) | Current date and time (start of current transaction). |
| [`timetz_byte_comparable(timetz)`](#::timetz_byte_comparabletimetz) | Converts a `TIME WITH TIME ZONE` to a `UBIGINT` sort key. |
| [`to_timestamp(double)`](#::to_timestampdouble) | Converts seconds since the epoch to a timestamp with time zone. |
| [`transaction_timestamp()`](#::transaction_timestamp) | Current date and time (start of current transaction). |
###### `current_timestamp` {#docs:stable:sql:functions:timestamptz::current_timestamp}
| | |
|:--|:--------|
| **Description** |Current date and time (start of current transaction). |
| **Example** | `current_timestamp` |
| **Result** | `2022-10-08 12:44:46.122-07` |
###### `get_current_timestamp()` {#docs:stable:sql:functions:timestamptz::get_current_timestamp}
| | |
|:--|:--------|
| **Description** |Current date and time (start of current transaction). |
| **Example** | `get_current_timestamp()` |
| **Result** | `2022-10-08 12:44:46.122-07` |
###### `greatest(timestamptz, timestamptz)` {#docs:stable:sql:functions:timestamptz::greatesttimestamptz-timestamptz}
| | |
|:--|:--------|
| **Description** |The later of two timestamps. |
| **Example** | `greatest(TIMESTAMPTZ '1992-09-20 20:38:48', TIMESTAMPTZ '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-09-20 20:38:48-07` |
###### `isfinite(timestamptz)` {#docs:stable:sql:functions:timestamptz::isfinitetimestamptz}
| | |
|:--|:--------|
| **Description** |Returns true if the timestamp with time zone is finite, false otherwise. |
| **Example** | `isfinite(TIMESTAMPTZ '1992-03-07')` |
| **Result** | `true` |
###### `isinf(timestamptz)` {#docs:stable:sql:functions:timestamptz::isinftimestamptz}
| | |
|:--|:--------|
| **Description** |Returns true if the timestamp with time zone is infinite, false otherwise. |
| **Example** | `isinf(TIMESTAMPTZ '-infinity')` |
| **Result** | `true` |
###### `least(timestamptz, timestamptz)` {#docs:stable:sql:functions:timestamptz::leasttimestamptz-timestamptz}
| | |
|:--|:--------|
| **Description** |The earlier of two timestamps. |
| **Example** | `least(TIMESTAMPTZ '1992-09-20 20:38:48', TIMESTAMPTZ '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-03-22 01:02:03.1234-08` |
###### `now()` {#docs:stable:sql:functions:timestamptz::now}
| | |
|:--|:--------|
| **Description** |Current date and time (start of current transaction). |
| **Example** | `now()` |
| **Result** | `2022-10-08 12:44:46.122-07` |
###### `timetz_byte_comparable(timetz)` {#docs:stable:sql:functions:timestamptz::timetz_byte_comparabletimetz}
| | |
|:--|:--------|
| **Description** |Converts a `TIME WITH TIME ZONE` to a `UBIGINT` sort key. |
| **Example** | `timetz_byte_comparable('18:18:16.21-07:00'::TIMETZ)` |
| **Result** | `2494691656335442799` |
###### `to_timestamp(double)` {#docs:stable:sql:functions:timestamptz::to_timestampdouble}
| | |
|:--|:--------|
| **Description** |Converts seconds since the epoch to a timestamp with time zone. |
| **Example** | `to_timestamp(1284352323.5)` |
| **Result** | `2010-09-13 04:32:03.5+00` |
###### `transaction_timestamp()` {#docs:stable:sql:functions:timestamptz::transaction_timestamp}
| | |
|:--|:--------|
| **Description** |Current date and time (start of current transaction). |
| **Example** | `transaction_timestamp()` |
| **Result** | `2022-10-08 12:44:46.122-07` |
#### Timestamp with Time Zone Strings {#docs:stable:sql:functions:timestamptz::timestamp-with-time-zone-strings}
With no time zone extension loaded, `TIMESTAMPTZ` values will be cast to and from strings
using offset notation.
This will let you specify an instant correctly without access to time zone information.
For portability, `TIMESTAMPTZ` values will always be displayed using GMT offsets:
```sql
SELECT '2022-10-08 13:13:34-07'::TIMESTAMPTZ;
```
```text
2022-10-08 20:13:34+00
```
If a time zone extension such as ICU is loaded, then a time zone can be parsed from a string
and cast to a representation in the local time zone:
```sql
SELECT '2022-10-08 13:13:34 Europe/Amsterdam'::TIMESTAMPTZ::VARCHAR;
```
```text
2022-10-08 04:13:34-07 -- the offset will differ based on your local time zone
```
#### ICU Timestamp with Time Zone Operators {#docs:stable:sql:functions:timestamptz::icu-timestamp-with-time-zone-operators}
The table below shows the available mathematical operators for `TIMESTAMP WITH TIME ZONE` values
provided by the ICU extension.
| Operator | Description | Example | Result |
|:-|:--|:----|:--|
| `+` | addition of an `INTERVAL` | `TIMESTAMPTZ '1992-03-22 01:02:03' + INTERVAL 5 DAY` | `1992-03-27 01:02:03` |
| `-` | subtraction of `TIMESTAMPTZ`s | `TIMESTAMPTZ '1992-03-27' - TIMESTAMPTZ '1992-03-22'` | `5 days` |
| `-` | subtraction of an `INTERVAL` | `TIMESTAMPTZ '1992-03-27 01:02:03' - INTERVAL 5 DAY` | `1992-03-22 01:02:03` |
Adding to or subtracting from [infinite values](#docs:stable:sql:data_types:timestamp::special-values) produces the same infinite value.
#### ICU Timestamp with Time Zone Functions {#docs:stable:sql:functions:timestamptz::icu-timestamp-with-time-zone-functions}
The table below shows the ICU provided scalar functions for `TIMESTAMP WITH TIME ZONE` values.
| Name | Description |
|:--|:-------|
| [`age(timestamptz, timestamptz)`](#::agetimestamptz-timestamptz) | Subtract arguments, resulting in the time difference between the two timestamps. |
| [`age(timestamptz)`](#::agetimestamptz) | Subtract from current_date. |
| [`date_diff(part, starttimestamptz, endtimestamptz)`](#::date_diffpart-starttimestamptz-endtimestamptz) | The number of [`part`](#docs:stable:sql:functions:datepart) boundaries between `starttimestamptz` and `endtimestamptz`, inclusive of the larger timestamp and exclusive of the smaller timestamp. |
| [`date_part([part, ...], timestamp)`](#date_partpart--timestamptz) | Get the listed [subfields](#docs:stable:sql:functions:datepart) as a `struct`. The list must be constant. |
| [`date_part(part, timestamp)`](#::date_partpart-timestamptz) | Get [subfield](#docs:stable:sql:functions:datepart) (equivalent to `extract`). |
| [`date_sub(part, starttimestamptz, endtimestamptz)`](#::date_subpart-starttimestamptz-endtimestamptz) | The signed length of the interval between `starttimestamptz` and `endtimestamptz`, truncated to whole multiples of [`part`](#docs:stable:sql:functions:datepart). |
| [`date_trunc(part, timestamptz)`](#::date_truncpart-timestamptz) | Truncate to specified [precision](#docs:stable:sql:functions:datepart). |
| [`epoch_ns(timestamptz)`](#::epoch_nstimestamptz) | Converts a timestamptz to nanoseconds since the epoch. |
| [`epoch_us(timestamptz)`](#::epoch_ustimestamptz) | Converts a timestamptz to microseconds since the epoch. |
| [`extract(field FROM timestamptz)`](#::extractfield-from-timestamptz) | Get [subfield](#docs:stable:sql:functions:datepart) from a `TIMESTAMP WITH TIME ZONE`. |
| [`last_day(timestamptz)`](#::last_daytimestamptz) | The last day of the month. |
| [`make_timestamptz(bigint, bigint, bigint, bigint, bigint, double, string)`](#::make_timestamptzbigint-bigint-bigint-bigint-bigint-double-string) | The `TIMESTAMP WITH TIME ZONE` for the given parts and time zone. |
| [`make_timestamptz(bigint, bigint, bigint, bigint, bigint, double)`](#::make_timestamptzbigint-bigint-bigint-bigint-bigint-double) | The `TIMESTAMP WITH TIME ZONE` for the given parts in the current time zone. |
| [`make_timestamptz(microseconds)`](#::make_timestamptzmicroseconds) | The `TIMESTAMP WITH TIME ZONE` for the given µs since the epoch. |
| [`strftime(timestamptz, format)`](#::strftimetimestamptz-format) | Converts a `TIMESTAMP WITH TIME ZONE` value to string according to the [format string](#docs:stable:sql:functions:dateformat::format-specifiers). |
| [`strptime(text, format)`](#::strptimetext-format) | Converts string to `TIMESTAMP WITH TIME ZONE` according to the [format string](#docs:stable:sql:functions:dateformat::format-specifiers) if `%Z` is specified. |
| [`time_bucket(bucket_width, timestamptz[, offset])`](#time_bucketbucket_width-timestamptz-offset) | Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01 00:00:00+00:00[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00+00:00[ + offset]`. Note that `2000-01-03` is a Monday. |
| [`time_bucket(bucket_width, timestamptz[, origin])`](#time_bucketbucket_width-timestamptz-origin) | Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01 00:00:00+00:00` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00+00:00`. Note that `2000-01-03` is a Monday. |
| [`time_bucket(bucket_width, timestamptz[, timezone])`](#time_bucketbucket_width-timestamptz-timezone) | Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01 00:00:00` in the provided `timezone` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00` in the provided `timezone`. The default timezone is `'UTC'`. Note that `2000-01-03` is a Monday. |
###### `age(timestamptz, timestamptz)` {#docs:stable:sql:functions:timestamptz::agetimestamptz-timestamptz}
| | |
|:--|:--------|
| **Description** |Subtract arguments, resulting in the time difference between the two timestamps. |
| **Example** | `age(TIMESTAMPTZ '2001-04-10', TIMESTAMPTZ '1992-09-20')` |
| **Result** | `8 years 6 months 20 days` |
###### `age(timestamptz)` {#docs:stable:sql:functions:timestamptz::agetimestamptz}
| | |
|:--|:--------|
| **Description** |Subtract from current_date. |
| **Example** | `age(TIMESTAMP '1992-09-20')` |
| **Result** | `29 years 1 month 27 days 12:39:00.844` |
###### `date_diff(part, starttimestamptz, endtimestamptz)` {#docs:stable:sql:functions:timestamptz::date_diffpart-starttimestamptz-endtimestamptz}
| | |
|:--|:--------|
| **Description** |The signed number of [`part`](#docs:stable:sql:functions:datepart) boundaries between `starttimestamptz` and `endtimestamptz`, inclusive of the larger timestamp and exclusive of the smaller timestamp. |
| **Example** | `date_diff('hour', TIMESTAMPTZ '1992-09-30 23:59:59', TIMESTAMPTZ '1992-10-01 01:58:00')` |
| **Result** | `2` |
###### `date_part([part, ...], timestamptz)` {#docs:stable:sql:functions:timestamptz::date_partpart--timestamptz}
| | |
|:--|:--------|
| **Description** |Get the listed [subfields](#docs:stable:sql:functions:datepart) as a `struct`. The list must be constant. |
| **Example** | `date_part(['year', 'month', 'day'], TIMESTAMPTZ '1992-09-20 20:38:40-07')` |
| **Result** | `{year: 1992, month: 9, day: 20}` |
###### `date_part(part, timestamptz)` {#docs:stable:sql:functions:timestamptz::date_partpart-timestamptz}
| | |
|:--|:--------|
| **Description** |Get [subfield](#docs:stable:sql:functions:datepart) (equivalent to `extract`). |
| **Example** | `date_part('minute', TIMESTAMPTZ '1992-09-20 20:38:40')` |
| **Result** | `38` |
###### `date_sub(part, starttimestamptz, endtimestamptz)` {#docs:stable:sql:functions:timestamptz::date_subpart-starttimestamptz-endtimestamptz}
| | |
|:--|:--------|
| **Description** |The signed length of the interval between `starttimestamptz` and `endtimestamptz`, truncated to whole multiples of [`part`](#docs:stable:sql:functions:datepart). |
| **Example** | `date_sub('hour', TIMESTAMPTZ '1992-09-30 23:59:59', TIMESTAMPTZ '1992-10-01 01:58:00')` |
| **Result** | `1` |
###### `date_trunc(part, timestamptz)` {#docs:stable:sql:functions:timestamptz::date_truncpart-timestamptz}
| | |
|:--|:--------|
| **Description** |Truncate to specified [precision](#docs:stable:sql:functions:datepart). |
| **Example** | `date_trunc('hour', TIMESTAMPTZ '1992-09-20 20:38:40')` |
| **Result** | `1992-09-20 20:00:00` |
###### `epoch_ns(timestamptz)` {#docs:stable:sql:functions:timestamptz::epoch_nstimestamptz}
| | |
|:--|:--------|
| **Description** |Converts a timestamptz to nanoseconds since the epoch. |
| **Example** | `epoch_ns('2022-11-07 08:43:04.123456+00'::TIMESTAMPTZ);` |
| **Result** | `1667810584123456000` |
###### `epoch_us(timestamptz)` {#docs:stable:sql:functions:timestamptz::epoch_ustimestamptz}
| | |
|:--|:--------|
| **Description** |Converts a timestamptz to microseconds since the epoch. |
| **Example** | `epoch_us('2022-11-07 08:43:04.123456+00'::TIMESTAMPTZ);` |
| **Result** | `1667810584123456` |
###### `extract(field FROM timestamptz)` {#docs:stable:sql:functions:timestamptz::extractfield-from-timestamptz}
| | |
|:--|:--------|
| **Description** |Get [subfield](#docs:stable:sql:functions:datepart) from a `TIMESTAMP WITH TIME ZONE`. |
| **Example** | `extract('hour' FROM TIMESTAMPTZ '1992-09-20 20:38:48')` |
| **Result** | `20` |
###### `last_day(timestamptz)` {#docs:stable:sql:functions:timestamptz::last_daytimestamptz}
| | |
|:--|:--------|
| **Description** |The last day of the month. |
| **Example** | `last_day(TIMESTAMPTZ '1992-03-22 01:02:03.1234')` |
| **Result** | `1992-03-31` |
###### `make_timestamptz(bigint, bigint, bigint, bigint, bigint, double, string)` {#docs:stable:sql:functions:timestamptz::make_timestamptzbigint-bigint-bigint-bigint-bigint-double-string}
| | |
|:--|:--------|
| **Description** |The `TIMESTAMP WITH TIME ZONE` for the given parts and time zone. |
| **Example** | `make_timestamptz(1992, 9, 20, 15, 34, 27.123456, 'CET')` |
| **Result** | `1992-09-20 06:34:27.123456-07` |
###### `make_timestamptz(bigint, bigint, bigint, bigint, bigint, double)` {#docs:stable:sql:functions:timestamptz::make_timestamptzbigint-bigint-bigint-bigint-bigint-double}
| | |
|:--|:--------|
| **Description** |The `TIMESTAMP WITH TIME ZONE` for the given parts in the current time zone. |
| **Example** | `make_timestamptz(1992, 9, 20, 13, 34, 27.123456)` |
| **Result** | `1992-09-20 13:34:27.123456-07` |
###### `make_timestamptz(microseconds)` {#docs:stable:sql:functions:timestamptz::make_timestamptzmicroseconds}
| | |
|:--|:--------|
| **Description** |The `TIMESTAMP WITH TIME ZONE` for the given µs since the epoch. |
| **Example** | `make_timestamptz(1667810584123456)` |
| **Result** | `2022-11-07 16:43:04.123456-08` |
###### `strftime(timestamptz, format)` {#docs:stable:sql:functions:timestamptz::strftimetimestamptz-format}
| | |
|:--|:--------|
| **Description** |Converts a `TIMESTAMP WITH TIME ZONE` value to string according to the [format string](#docs:stable:sql:functions:dateformat::format-specifiers). |
| **Example** | `strftime(timestamptz '1992-01-01 20:38:40', '%a, %-d %B %Y - %I:%M:%S %p')` |
| **Result** | `Wed, 1 January 1992 - 08:38:40 PM` |
###### `strptime(text, format)` {#docs:stable:sql:functions:timestamptz::strptimetext-format}
| | |
|:--|:--------|
| **Description** |Converts string to `TIMESTAMP WITH TIME ZONE` according to the [format string](#docs:stable:sql:functions:dateformat::format-specifiers) if `%Z` is specified. |
| **Example** | `strptime('Wed, 1 January 1992 - 08:38:40 PST', '%a, %-d %B %Y - %H:%M:%S %Z')` |
| **Result** | `1992-01-01 08:38:40-08` |
###### `time_bucket(bucket_width, timestamptz[, offset])` {#docs:stable:sql:functions:timestamptz::time_bucketbucket_width-timestamptz-offset}
| | |
|:--|:--------|
| **Description** |Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at `2000-01-01 00:00:00+00:00[ + offset]` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00+00:00[ + offset]`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '10 minutes', TIMESTAMPTZ '1992-04-20 15:26:00-07', INTERVAL '5 minutes')` |
| **Result** | `1992-04-20 15:25:00-07` |
###### `time_bucket(bucket_width, timestamptz[, origin])` {#docs:stable:sql:functions:timestamptz::time_bucketbucket_width-timestamptz-origin}
| | |
|:--|:--------|
| **Description** |Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01 00:00:00+00:00` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00+00:00`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '2 weeks', TIMESTAMPTZ '1992-04-20 15:26:00-07', TIMESTAMPTZ '1992-04-01 00:00:00-07')` |
| **Result** | `1992-04-15 00:00:00-07` |
###### `time_bucket(bucket_width, timestamptz[, timezone])` {#docs:stable:sql:functions:timestamptz::time_bucketbucket_width-timestamptz-timezone}
| | |
|:--|:--------|
| **Description** |Truncate `timestamptz` to a grid of width `bucket_width`. The grid is anchored at the `origin` timestamp, which defaults to `2000-01-01 00:00:00` in the provided `timezone` when `bucket_width` is a number of months or coarser units, else `2000-01-03 00:00:00` in the provided `timezone`. The default timezone is `'UTC'`. Note that `2000-01-03` is a Monday. |
| **Example** | `time_bucket(INTERVAL '2 days', TIMESTAMPTZ '1992-04-20 15:26:00-07', 'Europe/Berlin')` |
| **Result** | `1992-04-19 15:00:00-07` (=`1992-04-20 00:00:00 Europe/Berlin`) |
There are also dedicated extraction functions to get the [subfields](#docs:stable:sql:functions:datepart).
#### ICU Timestamp Table Functions {#docs:stable:sql:functions:timestamptz::icu-timestamp-table-functions}
The table below shows the available table functions for `TIMESTAMP WITH TIME ZONE` types.
| Name | Description |
|:--|:-------|
| [`generate_series(timestamptz, timestamptz, interval)`](#::generate_seriestimestamptz-timestamptz-interval) | Generate a table of timestamps in the closed range (including both the starting timestamp and the ending timestamp), stepping by the interval. |
| [`range(timestamptz, timestamptz, interval)`](#::rangetimestamptz-timestamptz-interval) | Generate a table of timestamps in the half open range (including the starting timestamp, but stopping before the ending timestamp), stepping by the interval. |
> Infinite values are not allowed as table function bounds.
###### `generate_series(timestamptz, timestamptz, interval)` {#docs:stable:sql:functions:timestamptz::generate_seriestimestamptz-timestamptz-interval}
| | |
|:--|:--------|
| **Description** |Generate a table of timestamps in the closed range (including both the starting timestamp and the ending timestamp), stepping by the interval. |
| **Example** | `generate_series(TIMESTAMPTZ '2001-04-10', TIMESTAMPTZ '2001-04-11', INTERVAL 30 MINUTE)` |
###### `range(timestamptz, timestamptz, interval)` {#docs:stable:sql:functions:timestamptz::rangetimestamptz-timestamptz-interval}
| | |
|:--|:--------|
| **Description** |Generate a table of timestamps in the half open range (including the starting timestamp, but stopping before the ending timestamp), stepping by the interval. |
| **Example** | `range(TIMESTAMPTZ '2001-04-10', TIMESTAMPTZ '2001-04-11', INTERVAL 30 MINUTE)` |
#### ICU Timestamp Without Time Zone Functions {#docs:stable:sql:functions:timestamptz::icu-timestamp-without-time-zone-functions}
The table below shows the ICU provided scalar functions that operate on plain `TIMESTAMP` values.
These functions assume that the `TIMESTAMP` is a "local timestamp".
A local timestamp is effectively a way of encoding the part values from a time zone into a single value.
They should be used with caution because the produced values can contain gaps and ambiguities due to daylight saving time.
Often the same functionality can be implemented more reliably using the `struct` variant of the `date_part` function.
| Name | Description |
|:--|:-------|
| [`current_localtime()`](#::current_localtime) | Returns a `TIME` whose GMT bin values correspond to local time in the current time zone. |
| [`current_localtimestamp()`](#::current_localtimestamp) | Returns a `TIMESTAMP` whose GMT bin values correspond to local date and time in the current time zone. |
| [`localtime`](#::localtime) | Synonym for the `current_localtime()` function call. |
| [`localtimestamp`](#::localtimestamp) | Synonym for the `current_localtimestamp()` function call. |
| [`timezone(text, timestamp)`](#::timezonetext-timestamp) | Use the [date parts](#docs:stable:sql:functions:datepart) of the timestamp in GMT to construct a timestamp in the given time zone. Effectively, the argument is a "local" time. |
| [`timezone(text, timestamptz)`](#::timezonetext-timestamptz) | Use the [date parts](#docs:stable:sql:functions:datepart) of the timestamp in the given time zone to construct a timestamp. Effectively, the result is a "local" time. |
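To make the "local" encoding concrete, the two `timezone` variants convert in opposite directions (expected results taken from the function entries below, assuming `America/Los_Angeles` as the current time zone):
```sql
-- Interpret the plain timestamp as local time in Denver and return the corresponding instant:
SELECT timezone('America/Denver', TIMESTAMP '2001-02-16 20:38:40');      -- 2001-02-16 19:38:40-08
-- Render the instant as the local (part-value) time in Denver:
SELECT timezone('America/Denver', TIMESTAMPTZ '2001-02-16 20:38:40-05'); -- 2001-02-16 18:38:40
```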
###### `current_localtime()` {#docs:stable:sql:functions:timestamptz::current_localtime}
| | |
|:--|:--------|
| **Description** |Returns a `TIME` whose GMT bin values correspond to local time in the current time zone. |
| **Example** | `current_localtime()` |
| **Result** | `08:47:56.497` |
###### `current_localtimestamp()` {#docs:stable:sql:functions:timestamptz::current_localtimestamp}
| | |
|:--|:--------|
| **Description** |Returns a `TIMESTAMP` whose GMT bin values correspond to local date and time in the current time zone. |
| **Example** | `current_localtimestamp()` |
| **Result** | `2022-12-17 08:47:56.497` |
###### `localtime` {#docs:stable:sql:functions:timestamptz::localtime}
| | |
|:--|:--------|
| **Description** |Synonym for the `current_localtime()` function call. |
| **Example** | `localtime` |
| **Result** | `08:47:56.497` |
###### `localtimestamp` {#docs:stable:sql:functions:timestamptz::localtimestamp}
| | |
|:--|:--------|
| **Description** |Synonym for the `current_localtimestamp()` function call. |
| **Example** | `localtimestamp` |
| **Result** | `2022-12-17 08:47:56.497` |
###### `timezone(text, timestamp)` {#docs:stable:sql:functions:timestamptz::timezonetext-timestamp}
| | |
|:--|:--------|
| **Description** |Use the [date parts](#docs:stable:sql:functions:datepart) of the timestamp in GMT to construct a timestamp in the given time zone. Effectively, the argument is a "local" time. |
| **Example** | `timezone('America/Denver', TIMESTAMP '2001-02-16 20:38:40')` |
| **Result** | `2001-02-16 19:38:40-08` |
###### `timezone(text, timestamptz)` {#docs:stable:sql:functions:timestamptz::timezonetext-timestamptz}
| | |
|:--|:--------|
| **Description** |Use the [date parts](#docs:stable:sql:functions:datepart) of the timestamp in the given time zone to construct a timestamp. Effectively, the result is a "local" time. |
| **Example** | `timezone('America/Denver', TIMESTAMPTZ '2001-02-16 20:38:40-05')` |
| **Result** | `2001-02-16 18:38:40` |
#### At Time Zone {#docs:stable:sql:functions:timestamptz::at-time-zone}
The `AT TIME ZONE` syntax is syntactic sugar for the (two argument) `timezone` function listed above:
```sql
SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'America/Denver' AS ts;
```
```text
2001-02-16 19:38:40-08
```
```sql
SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE 'America/Denver' AS ts;
```
```text
2001-02-16 18:38:40
```
Note that numeric timezones are not allowed:
```sql
SELECT TIMESTAMP '2001-02-16 20:38:40-05' AT TIME ZONE '0200' AS ts;
```
```console
Not implemented Error: Unknown TimeZone '0200'
```
#### Infinities {#docs:stable:sql:functions:timestamptz::infinities}
Functions applied to infinite dates will either return the same infinite dates
(e.g., `greatest`) or `NULL` (e.g., `date_part`) depending on what "makes sense".
In general, if the function needs to examine the parts of the infinite temporal value,
the result will be `NULL`.
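As a minimal sketch of this behavior (the literals are illustrative), `greatest` should pass the infinite value through, while `date_part`, which needs to examine the parts, should return `NULL`:
```sql
-- greatest() keeps the infinite timestamp; date_part() on an infinite timestamp yields NULL.
SELECT greatest(TIMESTAMPTZ 'infinity', TIMESTAMPTZ '2022-12-17 08:47:56+01') AS still_infinite,
       date_part('year', TIMESTAMPTZ 'infinity') AS parts_needed;
```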
#### Calendars {#docs:stable:sql:functions:timestamptz::calendars}
The ICU extension also supports [non-Gregorian calendars](#docs:stable:sql:data_types:timestamp::calendar-support).
If such a calendar is current, then the display and binning operations will use that calendar.
### Union Functions {#docs:stable:sql:functions:union}
| Name | Description |
|:--|:-------|
| [`union.tag`](#::uniontag) | Dot notation serves as an alias for `union_extract`. |
| [`union_extract(union, 'tag')`](#::union_extractunion-tag) | Extract the value with the named tag from the union. `NULL` if the tag is not currently selected. |
| [`union_value(tag := any)`](#::union_valuetag--any) | Create a single member `UNION` containing the argument value. The tag of the value will be the bound variable name. |
| [`union_tag(union)`](#::union_tagunion) | Retrieve the currently selected tag of the union as an [Enum](#docs:stable:sql:data_types:enum). |
###### `union.tag` {#docs:stable:sql:functions:union::uniontag}
| | |
|:--|:--------|
| **Description** |Dot notation serves as an alias for `union_extract`. |
| **Example** | `(union_value(k := 'hello')).k` |
| **Result** | `string` |
###### `union_extract(union, 'tag')` {#docs:stable:sql:functions:union::union_extractunion-tag}
| | |
|:--|:--------|
| **Description** |Extract the value with the named tag from the union. `NULL` if the tag is not currently selected. |
| **Example** | `union_extract(s, 'k')` |
| **Result** | `hello` |
###### `union_value(tag := any)` {#docs:stable:sql:functions:union::union_valuetag--any}
| | |
|:--|:--------|
| **Description** |Create a single member `UNION` containing the argument value. The tag of the value will be the bound variable name. |
| **Example** | `union_value(k := 'hello')` |
| **Result** | `'hello'::UNION(k VARCHAR)` |
###### `union_tag(union)` {#docs:stable:sql:functions:union::union_tagunion}
| | |
|:--|:--------|
| **Description** |Retrieve the currently selected tag of the union as an [Enum](#docs:stable:sql:data_types:enum). |
| **Example** | `union_tag(union_value(k := 'foo'))` |
| **Result** | `'k'` |
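The following minimal sketch shows these functions working together on a hypothetical table `tbl` with a `UNION` column:
```sql
-- Create a table with a UNION column and populate it with two differently tagged members.
CREATE TABLE tbl (u UNION(num INTEGER, str VARCHAR));
INSERT INTO tbl VALUES (union_value(num := 2)), (union_value(str := 'ABC'));
-- Inspect the tag and extract members via dot notation and union_extract.
SELECT union_tag(u) AS tag, u.str AS str_member, union_extract(u, 'num') AS num_member
FROM tbl;
```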
### Utility Functions {#docs:stable:sql:functions:utility}
#### Scalar Utility Functions {#docs:stable:sql:functions:utility::scalar-utility-functions}
The functions below are difficult to categorize into specific function types and are broadly useful.
| Name | Description |
|:--|:-------|
| [`alias(column)`](#::aliascolumn) | Return the name of the column. |
| [`can_cast_implicitly(source_value, target_value)`](#::can_cast_implicitlysource_value-target_value) | Whether the type of the source value can be implicitly cast to the type of the target value. |
| [`checkpoint(database)`](#::checkpointdatabase) | Synchronize the write-ahead log (WAL) with the file of the (optionally specified) database without interrupting transactions. |
| [`coalesce(expr, ...)`](#::coalesceexpr-) | Return the first expression that evaluates to a non-`NULL` value. Accepts 1 or more parameters. Each expression can be a column, literal value, function result, or many others. |
| [`constant_or_null(arg1, arg2)`](#::constant_or_nullarg1-arg2) | If `arg2` is `NULL`, return `NULL`. Otherwise, return `arg1`. |
| [`count_if(x)`](#::count_ifx) | Aggregate function; rows contribute 1 if `x` is `true` or a non-zero number, else 0. |
| [`create_sort_key(parameters...)`](#::create_sort_keyparameters) | Constructs a binary-comparable sort key based on a set of input parameters and sort qualifiers. |
| [`current_catalog()`](#::current_catalog) | Return the name of the currently active catalog. Default is memory. |
| [`current_database()`](#::current_database) | Return the name of the currently active database. |
| [`current_query()`](#::current_query) | Return the current query as a string. |
| [`current_schema()`](#::current_schema) | Return the name of the currently active schema. Default is main. |
| [`current_schemas(boolean)`](#::current_schemasboolean) | Return list of schemas. Pass a parameter of `true` to include implicit schemas. |
| [`current_setting('setting_name')`](#::current_settingsetting_name) | Return the current value of the configuration setting. |
| [`currval('sequence_name')`](#::currvalsequence_name) | Return the current value of the sequence. Note that `nextval` must be called at least once prior to calling `currval`. |
| [`error(message)`](#::errormessage) | Throws the given error `message`. |
| [`equi_width_bins(min, max, bincount, nice := false)`](#::equi_width_binsmin-max-bincount-nice--false) | Returns the upper boundaries of a partition of the interval `[min, max]` into `bincount` equal-sized subintervals (for use with, e.g., [`histogram`](#docs:stable:sql:functions:aggregates::histogramargboundaries)). If `nice = true`, then `min`, `max`, and `bincount` may be adjusted to produce more aesthetically pleasing results. |
| [`force_checkpoint(database)`](#::force_checkpointdatabase) | Synchronize the write-ahead log (WAL) with the file of the (optionally specified) database, interrupting transactions. |
| [`gen_random_uuid()`](#::gen_random_uuid) | Return a random UUID similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| [`getenv(var)`](#::getenvvar) | Returns the value of the environment variable `var`. Only available in the [command line client](#docs:stable:clients:cli:overview). |
| [`hash(value)`](#::hashvalue) | Returns a `UBIGINT` with a hash of `value`. The used hash function may change across DuckDB versions.|
| [`icu_sort_key(string, collator)`](#::icu_sort_keystring-collator) | Surrogate [sort key](https://unicode-org.github.io/icu/userguide/collation/architecture.html#sort-keys) used to sort special characters according to the specific locale. Collator parameter is optional. Only available when the ICU extension is installed. |
| [`if(a, b, c)`](#::ifa-b-c) | Ternary conditional operator. |
| [`ifnull(expr, other)`](#::ifnullexpr-other) | A two-argument version of coalesce. |
| [`is_histogram_other_bin(arg)`](#::is_histogram_other_binarg) | Returns `true` when `arg` is the "catch-all element" of its datatype for the purpose of the [`histogram_exact`](#docs:stable:sql:functions:aggregates::histogram_exactargelements) function, which is equal to the "right-most boundary" of its datatype for the purpose of the [`histogram`](#docs:stable:sql:functions:aggregates::histogramargboundaries) function. |
| [`md5(string)`](#::md5string) | Returns the MD5 hash of the `string` as a `VARCHAR`. |
| [`md5_number(string)`](#::md5_numberstring) | Returns the MD5 hash of the `string` as a `UHUGEINT`. |
| [`md5_number_lower(string)`](#::md5_number_lowerstring) | Returns the lower 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| [`md5_number_upper(string)`](#::md5_number_upperstring) | Returns the upper 64-bit segment of the MD5 hash of the `string` as a `UBIGINT`. |
| [`nextval('sequence_name')`](#::nextvalsequence_name) | Return the next value of the sequence. |
| [`nullif(a, b)`](#::nullifa-b) | Return `NULL` if `a = b`, else return `a`. Equivalent to `CASE WHEN a = b THEN NULL ELSE a END`. |
| [`pg_typeof(expression)`](#::pg_typeofexpression) | Returns the lower case name of the data type of the result of the expression. For PostgreSQL compatibility. |
| [`query(` *`query_string`*`)`](#::queryquery_string) | Table function that parses and executes the query defined in *`query_string`*. Only constant strings are allowed. Warning: this function allows invoking arbitrary queries, potentially altering the database state. |
| [`query_table(` *`tbl_name`*`)`](#::query_tabletbl_name) | Table function that returns the table given in *`tbl_name`*. |
| [`query_table(` *`tbl_names`*`, [`*`by_name`*`])`](#query_tabletbl_names-by_name) | Table function that returns the union of tables given in *`tbl_names`*. If the optional *`by_name`* parameter is set to `true`, it uses [`UNION ALL BY NAME`](#docs:stable:sql:query_syntax:setops::union-all-by-name) semantics. |
| [`read_blob(source)`](#::read_blobsource) | Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `BLOB`. See the [`read_blob` guide](#docs:stable:guides:file_formats:read_file::read_blob) for more details. |
| [`read_text(source)`](#::read_textsource) | Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8 an error is thrown suggesting to use `read_blob` instead. See the [`read_text` guide](#docs:stable:guides:file_formats:read_file::read_text) for more details. |
| [`sha1(string)`](#::sha1string) | Returns a `VARCHAR` with the SHA-1 hash of the `string`. |
| [`sha256(string)`](#::sha256string) | Returns a `VARCHAR` with the SHA-256 hash of the `string`. |
| [`stats(expression)`](#::statsexpression) | Returns a string with statistics about the expression. Expression can be a column, constant, or SQL expression. |
| [`txid_current()`](#::txid_current) | Returns the current transaction's identifier, a `BIGINT` value. It will assign a new one if the current transaction does not have one already. |
| [`typeof(expression)`](#::typeofexpression) | Returns the name of the data type of the result of the expression. |
| [`uuid()`](#::uuid) | Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| [`uuidv4()`](#::uuidv4) | Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| [`uuidv7()`](#::uuidv7) | Return a random UUIDv7 similar to this: `81964ebe-00b1-7e1d-b0f9-43c29b6fb8f5`. |
| [`uuid_extract_timestamp(uuidv7)`](#::uuid_extract_timestampuuidv7) | Extracts `TIMESTAMP WITH TIME ZONE` from a UUIDv7 value. |
| [`uuid_extract_version(uuid)`](#::uuid_extract_versionuuid) | Extracts the UUID version (`4` or `7`). |
| [`version()`](#::version) | Return the currently active version of DuckDB. |
###### `alias(column)` {#docs:stable:sql:functions:utility::aliascolumn}
| | |
|:--|:--------|
| **Description** |Return the name of the column. |
| **Example** | `alias(column1)` |
| **Result** | `column1` |
###### `can_cast_implicitly(source_value, target_value)` {#docs:stable:sql:functions:utility::can_cast_implicitlysource_value-target_value}
| | |
|:--|:--------|
| **Description** |Whether the type of the source value can be implicitly cast to the type of the target value. |
| **Example** | `can_cast_implicitly(1::BIGINT, 1::SMALLINT)` |
| **Result** | `false` |
###### `checkpoint(database)` {#docs:stable:sql:functions:utility::checkpointdatabase}
| | |
|:--|:--------|
| **Description** |Synchronize the write-ahead log (WAL) with the file of the (optionally specified) database without interrupting transactions. |
| **Example** | `checkpoint(my_db)` |
| **Result** | success Boolean |
###### `coalesce(expr, ...)` {#docs:stable:sql:functions:utility::coalesceexpr-}
| | |
|:--|:--------|
| **Description** |Return the first expression that evaluates to a non-`NULL` value. Accepts 1 or more parameters. Each expression can be a column, literal value, function result, or many others. |
| **Example** | `coalesce(NULL, NULL, 'default_string')` |
| **Result** | `default_string` |
###### `constant_or_null(arg1, arg2)` {#docs:stable:sql:functions:utility::constant_or_nullarg1-arg2}
| | |
|:--|:--------|
| **Description** |If `arg2` is `NULL`, return `NULL`. Otherwise, return `arg1`. |
| **Example** | `constant_or_null(42, NULL)` |
| **Result** | `NULL` |
###### `count_if(x)` {#docs:stable:sql:functions:utility::count_ifx}
| | |
|:--|:--------|
| **Description** |Aggregate function; rows contribute 1 if `x` is `true` or a non-zero number, else 0. |
| **Example** | `count_if(42)` |
| **Result** | 1 |
###### `create_sort_key(parameters...)` {#docs:stable:sql:functions:utility::create_sort_keyparameters}
| | |
|:--|:--------|
| **Description** |Constructs a binary-comparable sort key based on a set of input parameters and sort qualifiers. |
| **Example** | `create_sort_key('abc', 'ASC NULLS FIRST');` |
| **Result** | `\x02bcd\x00` |
###### `current_catalog()` {#docs:stable:sql:functions:utility::current_catalog}
| | |
|:--|:--------|
| **Description** |Return the name of the currently active catalog. Default is memory. |
| **Example** | `current_catalog()` |
| **Result** | `memory` |
###### `current_database()` {#docs:stable:sql:functions:utility::current_database}
| | |
|:--|:--------|
| **Description** |Return the name of the currently active database. |
| **Example** | `current_database()` |
| **Result** | `memory` |
###### `current_query()` {#docs:stable:sql:functions:utility::current_query}
| | |
|:--|:--------|
| **Description** |Return the current query as a string. |
| **Example** | `current_query()` |
| **Result** | `SELECT current_query();` |
###### `current_schema()` {#docs:stable:sql:functions:utility::current_schema}
| | |
|:--|:--------|
| **Description** |Return the name of the currently active schema. Default is main. |
| **Example** | `current_schema()` |
| **Result** | `main` |
###### `current_schemas(boolean)` {#docs:stable:sql:functions:utility::current_schemasboolean}
| | |
|:--|:--------|
| **Description** |Return list of schemas. Pass a parameter of `true` to include implicit schemas. |
| **Example** | `current_schemas(true)` |
| **Result** | `['temp', 'main', 'pg_catalog']` |
###### `current_setting('setting_name')` {#docs:stable:sql:functions:utility::current_settingsetting_name}
| | |
|:--|:--------|
| **Description** |Return the current value of the configuration setting. |
| **Example** | `current_setting('access_mode')` |
| **Result** | `automatic` |
###### `currval('sequence_name')` {#docs:stable:sql:functions:utility::currvalsequence_name}
| | |
|:--|:--------|
| **Description** |Return the current value of the sequence. Note that `nextval` must be called at least once prior to calling `currval`. |
| **Example** | `currval('my_sequence_name')` |
| **Result** | `1` |
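As a minimal sketch (the sequence name is illustrative), `nextval` must advance the sequence before `currval` can report its current value:
```sql
CREATE SEQUENCE my_sequence_name;
SELECT nextval('my_sequence_name') AS next_value;    -- returns 1 for a freshly created sequence
SELECT currval('my_sequence_name') AS current_value; -- returns the value last produced by nextval, here 1
```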
###### `error(message)` {#docs:stable:sql:functions:utility::errormessage}
| | |
|:--|:--------|
| **Description** |Throws the given error `message`. |
| **Example** | `error('access_mode')` |
###### `equi_width_bins(min, max, bincount, nice := false)` {#docs:stable:sql:functions:utility::equi_width_binsmin-max-bincount-nice--false}
| | |
|:--|:--------|
| **Description** |Returns the upper boundaries of a partition of the interval `[min, max]` into `bincount` equal-sized subintervals (for use with, e.g., [`histogram`](#docs:stable:sql:functions:aggregates::histogramargboundaries)). If `nice = true`, then `min`, `max`, and `bincount` may be adjusted to produce more aesthetically pleasing results. |
| **Example** | `equi_width_bins(0.1, 2.7, 4, true)` |
| **Result** | `[0.5, 1.0, 1.5, 2.0, 2.5, 3.0]` |
###### `force_checkpoint(database)` {#docs:stable:sql:functions:utility::force_checkpointdatabase}
| | |
|:--|:--------|
| **Description** |Synchronize the write-ahead log (WAL) with the file of the (optionally specified) database, interrupting transactions. |
| **Example** | `force_checkpoint(my_db)` |
| **Result** | success Boolean |
###### `gen_random_uuid()` {#docs:stable:sql:functions:utility::gen_random_uuid}
| | |
|:--|:--------|
| **Description** |Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| **Example** | `gen_random_uuid()` |
| **Result** | various |
###### `getenv(var)` {#docs:stable:sql:functions:utility::getenvvar}
| | |
|:--|:--------|
| **Description** |Returns the value of the environment variable `var`. Only available in the [command line client](#docs:stable:clients:cli:overview). |
| **Example** | `getenv('HOME')` |
| **Result** | `/path/to/user/home` |
###### `hash(value)` {#docs:stable:sql:functions:utility::hashvalue}
| | |
|:--|:--------|
| **Description** |Returns a `UBIGINT` with the hash of the `value`. The used hash function may change across DuckDB versions. |
| **Example** | `hash('🦆')` |
| **Result** | `2595805878642663834` |
###### `icu_sort_key(string, collator)` {#docs:stable:sql:functions:utility::icu_sort_keystring-collator}
| | |
|:--|:--------|
| **Description** |Surrogate [sort key](https://unicode-org.github.io/icu/userguide/collation/architecture.html#sort-keys) used to sort special characters according to the specific locale. Collator parameter is optional. Only available when the ICU extension is installed. |
| **Example** | `icu_sort_key('ö', 'DE')` |
| **Result** | `460145960106` |
###### `if(a, b, c)` {#docs:stable:sql:functions:utility::ifa-b-c}
| | |
|:--|:--------|
| **Description** |Ternary conditional operator; returns `b` if `a` is `true`, else returns `c`. Equivalent to `CASE WHEN a THEN b ELSE c END`. |
| **Example** | `if(2 > 1, 3, 4)` |
| **Result** | `3` |
###### `ifnull(expr, other)` {#docs:stable:sql:functions:utility::ifnullexpr-other}
| | |
|:--|:--------|
| **Description** |A two-argument version of coalesce. |
| **Example** | `ifnull(NULL, 'default_string')` |
| **Result** | `default_string` |
###### `is_histogram_other_bin(arg)` {#docs:stable:sql:functions:utility::is_histogram_other_binarg}
| | |
|:--|:--------|
| **Description** |Returns `true` when `arg` is the "catch-all element" of its datatype for the purpose of the [`histogram_exact`](#docs:stable:sql:functions:aggregates::histogram_exactargelements) function, which is equal to the "right-most boundary" of its datatype for the purpose of the [`histogram`](#docs:stable:sql:functions:aggregates::histogramargboundaries) function. |
| **Example** | `is_histogram_other_bin('')` |
| **Result** | `true` |
###### `md5(string)` {#docs:stable:sql:functions:utility::md5string}
| | |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `string` as a `VARCHAR`. |
| **Example** | `md5('abc')` |
| **Result** | `900150983cd24fb0d6963f7d28e17f72` |
###### `md5_number(string)` {#docs:stable:sql:functions:utility::md5_numberstring}
| | |
|:--|:--------|
| **Description** |Returns the MD5 hash of the `string` as a `UHUGEINT`. |
| **Example** | `md5_number('abc')` |
| **Result** | `152195979970564155685860391459828531600` |
###### `md5_number_lower(string)` {#docs:stable:sql:functions:utility::md5_number_lowerstring}
| | |
|:--|:--------|
| **Description** |Returns the lower 8 bytes of the MD5 hash of `string` as a `UBIGINT`. |
| **Example** | `md5_number_lower('abc')` |
| **Result** | `8250560606382298838` |
###### `md5_number_upper(string)` {#docs:stable:sql:functions:utility::md5_number_upperstring}
| | |
|:--|:--------|
| **Description** |Returns the upper 8 bytes of the MD5 hash of `string` as a `UBIGINT`. |
| **Example** | `md5_number_upper('abc')` |
| **Result** | `12704604231530709392` |
###### `nextval('sequence_name')` {#docs:stable:sql:functions:utility::nextvalsequence_name}
| | |
|:--|:--------|
| **Description** |Return the next value of the sequence. |
| **Example** | `nextval('my_sequence_name')` |
| **Result** | `2` |
###### `nullif(a, b)` {#docs:stable:sql:functions:utility::nullifa-b}
| | |
|:--|:--------|
| **Description** |Return `NULL` if `a = b`, else return `a`. Equivalent to `CASE WHEN a = b THEN NULL ELSE a END`. |
| **Example** | `nullif(1+1, 2)` |
| **Result** | `NULL` |
###### `pg_typeof(expression)` {#docs:stable:sql:functions:utility::pg_typeofexpression}
| | |
|:--|:--------|
| **Description** |Returns the lower case name of the data type of the result of the expression. For PostgreSQL compatibility. |
| **Example** | `pg_typeof('abc')` |
| **Result** | `varchar` |
###### `query(query_string)` {#docs:stable:sql:functions:utility::queryquery_string}
| | |
|:--|:--------|
| **Description** |Table function that parses and executes the query defined in `query_string`. Only constant strings are allowed. Warning: this function allows invoking arbitrary queries, potentially altering the database state. |
| **Example** | `query('SELECT 42 AS x')` |
| **Result** | `42` |
###### `query_table(tbl_name)` {#docs:stable:sql:functions:utility::query_tabletbl_name}
| | |
|:--|:--------|
| **Description** |Table function that returns the table given in `tbl_name`. |
| **Example** | `query_table('t1')` |
| **Result** | (the rows of `t1`) |
###### `query_table(tbl_names, [by_name])` {#docs:stable:sql:functions:utility::query_tabletbl_names-by_name}
| | |
|:--|:--------|
| **Description** |Table function that returns the union of tables given in `tbl_names`. If the optional `by_name` parameter is set to `true`, it uses [`UNION ALL BY NAME`](#docs:stable:sql:query_syntax:setops::union-all-by-name) semantics. |
| **Example** | `query_table(['t1', 't2'])` |
| **Result** | (the union of the two tables) |
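A minimal sketch combining `query_table` with a SQL variable (the table and variable names are illustrative, and SQL variables assume a DuckDB version that supports `SET VARIABLE`):
```sql
CREATE TABLE t1 AS SELECT 42 AS x;
SET VARIABLE table_name = 't1';
-- Resolve the table to query at runtime from the variable.
SELECT * FROM query_table(getvariable('table_name'));
```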
###### `read_blob(source)` {#docs:stable:sql:functions:utility::read_blobsource}
| | |
|:--|:--------|
| **Description** |Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `BLOB`. See the [`read_blob` guide](#docs:stable:guides:file_formats:read_file::read_blob) for more details. |
| **Example** | `read_blob('hello.bin')` |
| **Result** | `hello\x0A` |
###### `read_text(source)` {#docs:stable:sql:functions:utility::read_textsource}
| | |
|:--|:--------|
| **Description** |Returns the content from `source` (a filename, a list of filenames, or a glob pattern) as a `VARCHAR`. The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8 an error is thrown suggesting to use `read_blob` instead. See the [`read_text` guide](#docs:stable:guides:file_formats:read_file::read_text) for more details. |
| **Example** | `read_text('hello.txt')` |
| **Result** | `hello\n` |
###### `sha1(string)` {#docs:stable:sql:functions:utility::sha1string}
| | |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-1 hash of the `string`. |
| **Example** | `sha1('🦆')` |
| **Result** | `949bf843dc338be348fb9525d1eb535d31241d76` |
###### `sha256(string)` {#docs:stable:sql:functions:utility::sha256string}
| | |
|:--|:--------|
| **Description** |Returns a `VARCHAR` with the SHA-256 hash of the `string`. |
| **Example** | `sha256('🦆')` |
| **Result** | `d7a5c5e0d1d94c32218539e7e47d4ba9c3c7b77d61332fb60d633dde89e473fb` |
###### `stats(expression)` {#docs:stable:sql:functions:utility::statsexpression}
| | |
|:--|:--------|
| **Description** |Returns a string with statistics about the expression. Expression can be a column, constant, or SQL expression. |
| **Example** | `stats(5)` |
| **Result** | `'[Min: 5, Max: 5][Has Null: false]'` |
###### `txid_current()` {#docs:stable:sql:functions:utility::txid_current}
| | |
|:--|:--------|
| **Description** |Returns the current transaction's identifier, a `BIGINT` value. It will assign a new one if the current transaction does not have one already. |
| **Example** | `txid_current()` |
| **Result** | various |
###### `typeof(expression)` {#docs:stable:sql:functions:utility::typeofexpression}
| | |
|:--|:--------|
| **Description** |Returns the name of the data type of the result of the expression. |
| **Example** | `typeof('abc')` |
| **Result** | `VARCHAR` |
###### `uuid()` {#docs:stable:sql:functions:utility::uuid}
| | |
|:--|:--------|
| **Description** |Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| **Example** | `uuid()` |
| **Result** | various |
###### `uuidv4()` {#docs:stable:sql:functions:utility::uuidv4}
| | |
|:--|:--------|
| **Description** |Return a random UUID (UUIDv4) similar to this: `eeccb8c5-9943-b2bb-bb5e-222f4e14b687`. |
| **Example** | `uuidv4()` |
| **Result** | various |
###### `uuidv7()` {#docs:stable:sql:functions:utility::uuidv7}
| | |
|:--|:--------|
| **Description** |Return a random UUIDv7 similar to this: `81964ebe-00b1-7e1d-b0f9-43c29b6fb8f5`. |
| **Example** | `uuidv7()` |
| **Result** | various |
###### `uuid_extract_timestamp(uuidv7)` {#docs:stable:sql:functions:utility::uuid_extract_timestampuuidv7}
| | |
|:--|:--------|
| **Description** |Extracts `TIMESTAMP WITH TIME ZONE` from a UUIDv7 value. |
| **Example** | `uuid_extract_timestamp(uuidv7())` |
| **Result** | `2025-04-19 15:51:20.07+00` |
###### `uuid_extract_version(uuid)` {#docs:stable:sql:functions:utility::uuid_extract_versionuuid}
| | |
|:--|:--------|
| **Description** |Extracts the UUID version (`4` or `7`). |
| **Example** | `uuid_extract_version(uuidv7())` |
| **Result** | `7` |
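A minimal sketch tying the UUIDv7 functions together (the results vary per invocation):
```sql
-- Generate one UUIDv7, then read back its version and embedded timestamp.
SELECT u AS id, uuid_extract_version(u) AS version, uuid_extract_timestamp(u) AS created_at
FROM (SELECT uuidv7() AS u);
```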
###### `version()` {#docs:stable:sql:functions:utility::version}
| | |
|:--|:--------|
| **Description** |Return the currently active version of DuckDB. |
| **Example** | `version()` |
| **Result** | various |
#### Utility Table Functions {#docs:stable:sql:functions:utility::utility-table-functions}
A table function is used in place of a table in a `FROM` clause.
| Name | Description |
|:--|:-------|
| [`glob(search_path)`](#::globsearch_path) | Return filenames found at the location indicated by the *search_path* in a single column named `file`. The *search_path* may contain [glob pattern matching syntax](#docs:stable:sql:functions:pattern_matching). |
| [`repeat_row(varargs, num_rows)`](#::repeat_rowvarargs-num_rows) | Returns a table with `num_rows` rows, each containing the fields defined in `varargs`. |
###### `glob(search_path)` {#docs:stable:sql:functions:utility::globsearch_path}
| | |
|:--|:--------|
| **Description** |Return filenames found at the location indicated by the *search_path* in a single column named `file`. The *search_path* may contain [glob pattern matching syntax](#docs:stable:sql:functions:pattern_matching). |
| **Example** | `glob('*')` |
| **Result** | (table of filenames) |
###### `repeat_row(varargs, num_rows)` {#docs:stable:sql:functions:utility::repeat_rowvarargs-num_rows}
| | |
|:--|:--------|
| **Description** |Returns a table with `num_rows` rows, each containing the fields defined in `varargs`. |
| **Example** | `repeat_row(1, 2, 'foo', num_rows = 3)` |
| **Result** | 3 rows of `1, 2, 'foo'` |
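A minimal sketch of both table functions used in a `FROM` clause (the glob pattern is illustrative and assumes CSV files exist in the working directory):
```sql
-- List matching filenames in a single column named "file".
SELECT file FROM glob('*.csv');
-- Produce three identical rows with the given field values.
SELECT * FROM repeat_row(1, 2, 'foo', num_rows = 3);
```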
### Window Functions {#docs:stable:sql:functions:window_functions}
DuckDB supports [window functions](https://en.wikipedia.org/wiki/Window_function_(SQL)), which can use multiple rows to calculate a value for each row.
Window functions are [blocking operators](#docs:stable:guides:performance:how_to_tune_workloads::blocking-operators), i.e., they require their entire input to be buffered, making them one of the most memory-intensive operators in SQL.
Window functions have been available in SQL since [SQL:2003](https://en.wikipedia.org/wiki/SQL:2003) and are supported by major SQL database systems.
#### Examples {#docs:stable:sql:functions:window_functions::examples}
Generate a `row_number` column to enumerate rows:
```sql
SELECT row_number() OVER ()
FROM sales;
```
> **Tip.** If you only need a number for each row in a table, you can use the [`rowid` pseudocolumn](#docs:stable:sql:statements:select::row-ids).
Generate a `row_number` column to enumerate rows, ordered by `time`:
```sql
SELECT row_number() OVER (ORDER BY time)
FROM sales;
```
Generate a `row_number` column to enumerate rows, ordered by `time` and partitioned by `region`:
```sql
SELECT row_number() OVER (PARTITION BY region ORDER BY time)
FROM sales;
```
Compute the difference between the current and the previous-by-`time` `amount`:
```sql
SELECT amount - lag(amount) OVER (ORDER BY time)
FROM sales;
```
Compute the percentage of the total `amount` of sales per `region` for each row:
```sql
SELECT amount / sum(amount) OVER (PARTITION BY region)
FROM sales;
```
#### Syntax {#docs:stable:sql:functions:window_functions::syntax}
Window functions can only be used in the `SELECT` clause. To share `OVER` specifications between functions, use the statement's [`WINDOW` clause](#docs:stable:sql:query_syntax:window) and the `OVER ⟨window_name⟩` syntax.
#### General-Purpose Window Functions {#docs:stable:sql:functions:window_functions::general-purpose-window-functions}
The table below shows the available general window functions.
| Name | Description |
|:--|:-------|
| [`cume_dist([ORDER BY ordering])`](#cume_distorder-by-ordering) | The cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows. |
| [`dense_rank()`](#::dense_rank) | The rank of the current row *without gaps;* this function counts peer groups. |
| [`fill(expr [ ORDER BY ordering])`](#fillexpr-order-by-ordering) | Fill in missing values using linear interpolation with `ORDER BY` as the X-axis. |
| [`first_value(expr[ ORDER BY ordering][ IGNORE NULLS])`](#first_valueexpr-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is the first row (with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. |
| [`lag(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])`](#lagexpr-offset-default-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is `offset` rows (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) before the current row within the window frame; if there is no such row, instead return `default` (which must be of the same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and `default` to `NULL`. |
| [`last_value(expr[ ORDER BY ordering][ IGNORE NULLS])`](#last_valueexpr-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is the last row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. |
| [`lead(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])`](#leadexpr-offset-default-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the row that is `offset` rows after the current row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) within the window frame; if there is no such row, instead return `default` (which must be of the same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and `default` to `NULL`. |
| [`nth_value(expr, nth[ ORDER BY ordering][ IGNORE NULLS])`](#nth_valueexpr-nth-order-by-ordering-ignore-nulls) | Returns `expr` evaluated at the nth row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame (counting from 1); `NULL` if no such row. |
| [`ntile(num_buckets[ ORDER BY ordering])`](#ntilenum_buckets-order-by-ordering) | An integer ranging from 1 to `num_buckets`, dividing the partition as equally as possible. |
| [`percent_rank([ORDER BY ordering])`](#percent_rankorder-by-ordering) | The relative rank of the current row: `(rank() - 1) / (total partition rows - 1)`. |
| [`rank([ORDER BY ordering])`](#rankorder-by-ordering) | The rank of the current row *with gaps;* same as `row_number` of its first peer. |
| [`row_number([ORDER BY ordering])`](#row_numberorder-by-ordering) | The number of the current row within the partition, counting from 1. |
###### `cume_dist([ORDER BY ordering])` {#docs:stable:sql:functions:window_functions::cume_distorder-by-ordering}
| | |
|:--|:--------|
| **Description** |The cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows. If an `ORDER BY` clause is specified, the distribution is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | `DOUBLE` |
| **Example** | `cume_dist()` |
###### `dense_rank()` {#docs:stable:sql:functions:window_functions::dense_rank}
| | |
|:--|:--------|
| **Description** |The rank of the current row *without gaps;* this function counts peer groups. |
| **Return type** | `BIGINT` |
| **Example** | `dense_rank()` |
| **Aliases** | `rank_dense()` |
###### `fill(expr[ ORDER BY ordering])` {#docs:stable:sql:functions:window_functions::fillexpr-order-by-ordering}
| | |
|:--|:--------|
| **Description** |Replaces `NULL` values of `expr` with a linear interpolation based on the closest non-`NULL` values and the sort values. Both values must support arithmetic and there must be only one ordering key. For missing values at the ends, linear extrapolation is used. Failure to interpolate results in the `NULL` value being retained. |
| **Return type** | Same type as `expr` |
| **Example** | `fill(column)` |
###### `first_value(expr[ ORDER BY ordering][ IGNORE NULLS])` {#docs:stable:sql:functions:window_functions::first_valueexpr-order-by-ordering-ignore-nulls}
| | |
|:--|:--------|
| **Description** |Returns `expr` evaluated at the row that is the first row (with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. If an `ORDER BY` clause is specified, the first row number is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | Same type as `expr` |
| **Example** | `first_value(column)` |
###### `lag(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])` {#docs:stable:sql:functions:window_functions::lagexpr-offset-default-order-by-ordering-ignore-nulls}
| | |
|:--|:--------|
| **Description** |Returns `expr` evaluated at the row that is `offset` rows (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) before the current row within the window frame; if there is no such row, instead return `default` (which must be of the same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and `default` to `NULL`. If an `ORDER BY` clause is specified, the lagged row number is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | Same type as `expr` |
| **Example** | `lag(column, 3, 0)` |
###### `last_value(expr[ ORDER BY ordering][ IGNORE NULLS])` {#docs:stable:sql:functions:window_functions::last_valueexpr-order-by-ordering-ignore-nulls}
| | |
|:--|:--------|
| **Description** |Returns `expr` evaluated at the row that is the last row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame. If an `ORDER BY` clause is specified, the last row is determined within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | Same type as `expr` |
| **Example** | `last_value(column)` |
###### `lead(expr[, offset[, default]][ ORDER BY ordering][ IGNORE NULLS])` {#docs:stable:sql:functions:window_functions::leadexpr-offset-default-order-by-ordering-ignore-nulls}
| | |
|:--|:--------|
| **Description** |Returns `expr` evaluated at the row that is `offset` rows after the current row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) within the window frame; if there is no such row, instead return `default` (which must be of the same type as `expr`). Both `offset` and `default` are evaluated with respect to the current row. If omitted, `offset` defaults to `1` and `default` to `NULL`. If an `ORDER BY` clause is specified, the leading row number is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | Same type as `expr` |
| **Example** | `lead(column, 3, 0)` |
###### `nth_value(expr, nth[ ORDER BY ordering][ IGNORE NULLS])` {#docs:stable:sql:functions:window_functions::nth_valueexpr-nth-order-by-ordering-ignore-nulls}
| | |
|:--|:--------|
| **Description** |Returns `expr` evaluated at the nth row (among rows with a non-null value of `expr` if `IGNORE NULLS` is set) of the window frame (counting from 1); `NULL` if no such row. If an `ORDER BY` clause is specified, the nth row number is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | Same type as `expr` |
| **Example** | `nth_value(column, 2)` |
###### `ntile(num_buckets[ ORDER BY ordering])` {#docs:stable:sql:functions:window_functions::ntilenum_buckets-order-by-ordering}
| | |
|:--|:--------|
| **Description** |An integer ranging from 1 to `num_buckets`, dividing the partition as equally as possible. If an `ORDER BY` clause is specified, the ntile is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | `BIGINT` |
| **Example** | `ntile(4)` |
###### `percent_rank([ORDER BY ordering])` {#docs:stable:sql:functions:window_functions::percent_rankorder-by-ordering}
| | |
|:--|:--------|
| **Description** |The relative rank of the current row: `(rank() - 1) / (total partition rows - 1)`. If an `ORDER BY` clause is specified, the relative rank is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | `DOUBLE` |
| **Example** | `percent_rank()` |
###### `rank([ORDER BY ordering])` {#docs:stable:sql:functions:window_functions::rankorder-by-ordering}
| | |
|:--|:--------|
| **Description** |The rank of the current row *with gaps*; same as `row_number` of its first peer. If an `ORDER BY` clause is specified, the rank is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | `BIGINT` |
| **Example** | `rank()` |
###### `row_number([ORDER BY ordering])` {#docs:stable:sql:functions:window_functions::row_numberorder-by-ordering}
| | |
|:--|:--------|
| **Description** |The number of the current row within the partition, counting from 1. If an `ORDER BY` clause is specified, the row number is computed within the frame using the provided ordering instead of the frame ordering. |
| **Return type** | `BIGINT` |
| **Example** | `row_number()` |
#### Aggregate Window Functions {#docs:stable:sql:functions:window_functions::aggregate-window-functions}
All [aggregate functions](#docs:stable:sql:functions:aggregates) can be used in a windowing context, including the optional [`FILTER` clause](#docs:stable:sql:query_syntax:filter).
The `first` and `last` aggregate functions are shadowed by the respective general-purpose window functions, with the minor consequence that the `FILTER` clause is not available for these but `IGNORE NULLS` is.
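Reusing the `sales` table from the examples above, a minimal sketch of a windowed aggregate with a `FILTER` clause (the region value is illustrative):
```sql
-- Running total over time, counting only rows from one region.
SELECT time, region, amount,
       sum(amount) FILTER (WHERE region = 'east') OVER (ORDER BY time) AS east_running_total
FROM sales;
```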
#### DISTINCT Arguments {#docs:stable:sql:functions:window_functions::distinct-arguments}
All aggregate window functions support using a `DISTINCT` clause for the arguments. When the `DISTINCT` clause is
provided, only distinct values are considered in the computation of the aggregate. This is typically used in combination
with the `COUNT` aggregate to get the number of distinct elements; but it can be used together with any aggregate
function in the system. There are some aggregates that are insensitive to duplicate values (e.g., `min`, `max`) and for
them this clause is parsed and ignored.
```sql
-- Count the number of distinct users at a given point in time
SELECT count(DISTINCT name) OVER (ORDER BY time) FROM sales;
-- Concatenate those distinct users into a list
SELECT list(DISTINCT name) OVER (ORDER BY time) FROM sales;
```
#### ORDER BY Arguments {#docs:stable:sql:functions:window_functions::order-by-arguments}
All aggregate window functions support using an `ORDER BY` argument clause that is *different* from the window ordering.
When the `ORDER BY` argument clause is provided, the values being aggregated are sorted before applying the function.
Usually this is not important, but there are some order-sensitive aggregates that can have indeterminate results (e.g.,
`mode`, `list` and `string_agg`). These can be made deterministic by ordering the arguments. For order-insensitive
aggregates, this clause is parsed and ignored.
```sql
-- Compute the modal value up to each time, breaking ties in favour of the most recent value.
SELECT mode(value ORDER BY time DESC) OVER (ORDER BY time) FROM sales;
```
The SQL standard does not provide for using `ORDER BY` with general-purpose window functions, but we have extended all
of these functions (except `dense_rank`) to accept this syntax and use framing to restrict the range that the secondary
ordering applies to.
```sql
-- Compare each athlete's time in an event with the best time to date
SELECT event, date, athlete, time,
first_value(time ORDER BY time DESC) OVER w AS record_time,
first_value(athlete ORDER BY time DESC) OVER w AS record_athlete,
FROM meet_results
WINDOW w AS (PARTITION BY event ORDER BY datetime)
ORDER BY ALL;
```
Note that there is no comma separating the arguments from the `ORDER BY` clause.
#### Nulls {#docs:stable:sql:functions:window_functions::nulls}
All [general-purpose window functions](#::general-purpose-window-functions) that accept `IGNORE NULLS` respect nulls by default. This default behavior can optionally be made explicit via `RESPECT NULLS`.
In contrast, all [aggregate window functions](#::aggregate-window-functions) (except for `list` and its aliases, which can be made to ignore nulls via a `FILTER`) ignore nulls and do not accept `RESPECT NULLS`. For example, `sum(column) OVER (ORDER BY time) AS cumulativeColumn` computes a cumulative sum where rows with a `NULL` value of `column` have the same value of `cumulativeColumn` as the row that precedes them.
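A minimal sketch of the `IGNORE NULLS` behavior, assuming a hypothetical `readings(t, value)` table with occasional `NULL` measurements:
```sql
-- Carry the most recent non-NULL value forward; the default frame ends at the current row.
SELECT t, value,
       last_value(value IGNORE NULLS) OVER (ORDER BY t) AS last_non_null
FROM readings;
```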
#### Evaluation {#docs:stable:sql:functions:window_functions::evaluation}
Windowing works by breaking a relation up into independent *partitions*,
*ordering* those partitions,
and then computing a new column for each row as a function of the nearby values.
Some window functions depend only on the partition boundary and the ordering,
but a few (including all the aggregates) also use a *frame*.
Frames are specified as a number of rows on either side (*preceding* or *following*) of the *current row*.
The distance can be specified as a number of *rows*,
as a *range* of values using the partition's ordering value and a distance,
or as a number of *groups* (sets of rows with the same sort value).
The full syntax is shown in the diagram at the top of the page,
and this diagram visually illustrates the computation environment.
##### Partition and Ordering {#docs:stable:sql:functions:window_functions::partition-and-ordering}
Partitioning breaks the relation up into independent, unrelated pieces.
Partitioning is optional, and if none is specified then the entire relation is treated as a single partition.
Window functions cannot access values outside of the partition containing the row they are being evaluated at.
Ordering is also optional, but without it the results of [general-purpose window functions](#::general-purpose-window-functions) and [order-sensitive aggregate functions](#docs:stable:sql:functions:aggregates::order-by-clause-in-aggregate-functions), and the order of [framing](#::framing) are not well-defined.
Each partition is ordered using the same ordering clause.
Here is a table of power generation data, available as a CSV file ([`power-plant-generation-history.csv`](https://duckdb.org/data/power-plant-generation-history.csv)). To load the data, run:
```sql
CREATE TABLE "Generation History" AS
FROM 'power-plant-generation-history.csv';
```
After partitioning by plant and ordering by date, it will have this layout:
| Plant | Date | MWh |
|:---|:---|---:|
| Boston | 2019-01-02 | 564337 |
| Boston | 2019-01-03 | 507405 |
| Boston | 2019-01-04 | 528523 |
| Boston | 2019-01-05 | 469538 |
| Boston | 2019-01-06 | 474163 |
| Boston | 2019-01-07 | 507213 |
| Boston | 2019-01-08 | 613040 |
| Boston | 2019-01-09 | 582588 |
| Boston | 2019-01-10 | 499506 |
| Boston | 2019-01-11 | 482014 |
| Boston | 2019-01-12 | 486134 |
| Boston | 2019-01-13 | 531518 |
| Worcester | 2019-01-02 | 118860 |
| Worcester | 2019-01-03 | 101977 |
| Worcester | 2019-01-04 | 106054 |
| Worcester | 2019-01-05 | 92182 |
| Worcester | 2019-01-06 | 94492 |
| Worcester | 2019-01-07 | 99932 |
| Worcester | 2019-01-08 | 118854 |
| Worcester | 2019-01-09 | 113506 |
| Worcester | 2019-01-10 | 96644 |
| Worcester | 2019-01-11 | 93806 |
| Worcester | 2019-01-12 | 98963 |
| Worcester | 2019-01-13 | 107170 |
In what follows,
we shall use this table (or small sections of it) to illustrate various pieces of window function evaluation.
The simplest window function is `row_number()`.
This function just computes the 1-based row number within the partition using the query:
```sql
SELECT
"Plant",
"Date",
row_number() OVER (PARTITION BY "Plant" ORDER BY "Date") AS "Row"
FROM "Generation History"
ORDER BY 1, 2;
```
The result will be the following:
| Plant | Date | Row |
|:---|:---|---:|
| Boston | 2019-01-02 | 1 |
| Boston | 2019-01-03 | 2 |
| Boston | 2019-01-04 | 3 |
| ... | ... | ... |
| Worcester | 2019-01-02 | 1 |
| Worcester | 2019-01-03 | 2 |
| Worcester | 2019-01-04 | 3 |
| ... | ... | ... |
Note that even though the function is computed with an `ORDER BY` clause,
the result does not have to be sorted,
so the `SELECT` also needs to be explicitly sorted if that is desired.
##### Framing {#docs:stable:sql:functions:window_functions::framing}
Framing specifies a set of rows relative to each row where the function is evaluated.
The distance from the current row is given as an expression either `PRECEDING` or `FOLLOWING` the current row in the order specified by the `ORDER BY` clause in the `OVER` specification.
This distance can either be specified as an integral number of `ROWS` or `GROUPS`,
or as a `RANGE` delta expression. It is invalid for a frame to start after it ends.
For a `RANGE` specification, there must be only one ordering expression and it must support subtraction unless only the sentinel boundary values `UNBOUNDED PRECEDING` / `UNBOUNDED FOLLOWING` / `CURRENT ROW` are used.
Using the [`EXCLUDE` clause](#::exclude-clause), rows comparing equal to the current row in the specified ordering expression (so-called peers) can be excluded from the frame.
The default frame is unbounded (i.e., the entire partition) when no `ORDER BY` clause is present and `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` when an `ORDER BY` clause is present. By default, the `CURRENT ROW` boundary value (but not the `CURRENT ROW` in the `EXCLUDE` clause) means the current row and all its peers when `RANGE` or `GROUPS` framing is used, but it means only the current row when `ROWS` framing is used.
###### `ROWS` Framing {#docs:stable:sql:functions:window_functions::rows-framing}
Here is a simple `ROWS` frame query, using an aggregate function:
```sql
SELECT points,
sum(points) OVER (
ROWS BETWEEN 1 PRECEDING
AND 1 FOLLOWING) AS we
FROM results;
```
This query computes the `sum` of each point and the points on either side of it.
Notice that at the edge of the partition, there are only two values added together.
This is because frames are cropped to the edge of the partition.
###### `RANGE` Framing {#docs:stable:sql:functions:window_functions::range-framing}
Returning to the power data, suppose the data is noisy.
We might want to compute a 7 day moving average for each plant to smooth out the noise.
To do this, we can use this window query:
```sql
SELECT "Plant", "Date",
avg("MWh") OVER (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING)
AS "MWh 7-day Moving Average"
FROM "Generation History"
ORDER BY 1, 2;
```
This query partitions the data by `Plant` (to keep the different power plants' data separate),
orders each plant's partition by `Date` (to put the energy measurements next to each other),
and uses a `RANGE` frame of three days on either side of each day for the `avg`
(to handle any missing days).
This is the result:
| Plant | Date | MWh 7-day Moving Average |
|:---|:---|---:|
| Boston | 2019-01-02 | 517450.75 |
| Boston | 2019-01-03 | 508793.20 |
| Boston | 2019-01-04 | 508529.83 |
| ... | ... | ... |
| Boston | 2019-01-13 | 499793.00 |
| Worcester | 2019-01-02 | 104768.25 |
| Worcester | 2019-01-03 | 102713.00 |
| Worcester | 2019-01-04 | 102249.50 |
| ... | ... | ... |
###### `GROUPS` Framing {#docs:stable:sql:functions:window_functions::groups-framing}
The third type of framing counts *groups* of rows relative to the current row.
A *group* in this framing is a set of values with identical `ORDER BY` values.
If we assume that power is being generated on every day,
we can use `GROUPS` framing to compute the moving average of all power generated in the system
without having to resort to date arithmetic:
```sql
SELECT "Date", "Plant",
avg("MWh") OVER (
ORDER BY "Date" ASC
GROUPS BETWEEN 3 PRECEDING
AND 3 FOLLOWING)
AS "MWh 7-day Moving Average"
FROM "Generation History"
ORDER BY 1, 2;
```
| Date | Plant | MWh 7-day Moving Average |
|------------|-----------|-------------------------:|
| 2019-01-02 | Boston | 311109.500 |
| 2019-01-02 | Worcester | 311109.500 |
| 2019-01-03 | Boston | 305753.100 |
| 2019-01-03 | Worcester | 305753.100 |
| 2019-01-04 | Boston | 305389.667 |
| 2019-01-04 | Worcester | 305389.667 |
| ... | ... | ... |
| 2019-01-12 | Boston | 309184.900 |
| 2019-01-12 | Worcester | 309184.900 |
| 2019-01-13 | Boston | 299469.375 |
| 2019-01-13 | Worcester | 299469.375 |
Notice how the values for each date are the same.
###### `EXCLUDE` Clause {#docs:stable:sql:functions:window_functions::exclude-clause}
`EXCLUDE` is an optional modifier to the frame clause for excluding rows around the `CURRENT ROW`.
This is useful when you want to compute some aggregate value of nearby rows
to see how the current row compares to it.
In the following example, we want to know how an athlete's time in an event compares to
the average of all the times recorded for their event within ±10 days:
```sql
SELECT
event,
date,
athlete,
avg(time) OVER w AS recent,
FROM results
WINDOW w AS (
PARTITION BY event
ORDER BY date
RANGE BETWEEN INTERVAL 10 DAYS PRECEDING AND INTERVAL 10 DAYS FOLLOWING
EXCLUDE CURRENT ROW
)
ORDER BY event, date, athlete;
```
There are four options for `EXCLUDE` that specify how to treat the current row:
* `CURRENT ROW`: exclude just the current row
* `GROUP`: exclude the current row and all its "peers" (rows that have the same `ORDER BY` value)
* `TIES`: exclude all peer rows, but _not_ the current row (this makes a hole on either side)
* `NO OTHERS`: don't exclude anything (the default)
Exclusion is implemented for windowed aggregates as well as for the `first`, `last`, and `nth_value` functions.
##### `WINDOW` Clauses {#docs:stable:sql:functions:window_functions::window-clauses}
Multiple different `OVER` clauses can be specified in the same `SELECT`, and each will be computed separately.
Often, however, we want to use the same layout for multiple window functions.
The `WINDOW` clause can be used to define a *named* window that can be shared between multiple window functions:
```sql
SELECT "Plant", "Date",
min("MWh") OVER seven AS "MWh 7-day Moving Minimum",
avg("MWh") OVER seven AS "MWh 7-day Moving Average",
max("MWh") OVER seven AS "MWh 7-day Moving Maximum"
FROM "Generation History"
WINDOW seven AS (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING)
ORDER BY 1, 2;
```
The three window functions will also share the data layout, which will improve performance.
Multiple windows can be defined in the same `WINDOW` clause by comma-separating them:
```sql
SELECT "Plant", "Date",
min("MWh") OVER seven AS "MWh 7-day Moving Minimum",
avg("MWh") OVER seven AS "MWh 7-day Moving Average",
max("MWh") OVER seven AS "MWh 7-day Moving Maximum",
min("MWh") OVER three AS "MWh 3-day Moving Minimum",
avg("MWh") OVER three AS "MWh 3-day Moving Average",
max("MWh") OVER three AS "MWh 3-day Moving Maximum"
FROM "Generation History"
WINDOW
seven AS (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING),
three AS (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 1 DAYS PRECEDING
AND INTERVAL 1 DAYS FOLLOWING)
ORDER BY 1, 2;
```
The queries above do not use a number of clauses commonly found in `SELECT` statements, like
`WHERE`, `GROUP BY`, etc. For more complex queries, you can find where the `WINDOW` clause falls in
the canonical order of the [`SELECT` statement](#docs:stable:sql:statements:select).
##### Filtering the Results of Window Functions Using `QUALIFY` {#docs:stable:sql:functions:window_functions::filtering-the-results-of-window-functions-using-qualify}
Window functions are executed after the [`WHERE`](#docs:stable:sql:query_syntax:where) and [`HAVING`](#docs:stable:sql:query_syntax:having) clauses have already been evaluated, so it is not possible to use these clauses to filter the results of window functions.
The [`QUALIFY` clause](#docs:stable:sql:query_syntax:qualify) avoids the need for a subquery or [`WITH` clause](#docs:stable:sql:query_syntax:with) to perform this filtering.
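For example, a minimal sketch that keeps only the most recent row per plant from the `Generation History` table used in the evaluation examples above:
```sql
SELECT "Plant", "Date", "MWh"
FROM "Generation History"
QUALIFY row_number() OVER (PARTITION BY "Plant" ORDER BY "Date" DESC) = 1;
```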
##### Box and Whisker Queries {#docs:stable:sql:functions:window_functions::box-and-whisker-queries}
All aggregates can be used as windowing functions, including the complex statistical functions.
These function implementations have been optimized for windowing,
and we can use the window syntax to write queries that generate the data for moving box-and-whisker plots:
```sql
SELECT "Plant", "Date",
min("MWh") OVER seven AS "MWh 7-day Moving Minimum",
quantile_cont("MWh", [0.25, 0.5, 0.75]) OVER seven
AS "MWh 7-day Moving IQR",
max("MWh") OVER seven AS "MWh 7-day Moving Maximum",
FROM "Generation History"
WINDOW seven AS (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING)
ORDER BY 1, 2;
```
## Constraints {#docs:stable:sql:constraints}
In SQL, constraints can be specified for tables. Constraints enforce certain properties over data that is inserted into a table. Constraints can be specified along with the schema of the table as part of the [`CREATE TABLE` statement](#docs:stable:sql:statements:create_table). In certain cases, constraints can also be added to a table using the [`ALTER TABLE` statement](#docs:stable:sql:statements:alter_table), but this is not currently supported for all constraints.
> **Warning.** Constraints have a strong impact on performance: they slow down loading and updates but speed up certain queries. Please consult the [Performance Guide](#docs:stable:guides:performance:schema::constraints) for details.
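For instance, a minimal sketch of adding a constraint after table creation (adding a `NOT NULL` constraint is one of the supported cases):
```sql
CREATE TABLE students (name VARCHAR);
-- Add a NOT NULL constraint to an existing column.
ALTER TABLE students ALTER COLUMN name SET NOT NULL;
```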
#### Syntax {#docs:stable:sql:constraints::syntax}
#### Check Constraint {#docs:stable:sql:constraints::check-constraint}
Check constraints allow you to specify an arbitrary Boolean expression. Any rows that *do not* satisfy this expression violate the constraint. For example, we could enforce that the `name` column does not contain spaces using the following `CHECK` constraint.
```sql
CREATE TABLE students (name VARCHAR CHECK (NOT contains(name, ' ')));
INSERT INTO students VALUES ('this name contains spaces');
```
```console
Constraint Error:
CHECK constraint failed on table students with expression CHECK((NOT contains("name", ' ')))
```
#### Not Null Constraint {#docs:stable:sql:constraints::not-null-constraint}
A not-null constraint specifies that the column cannot contain any `NULL` values. By default, all columns in tables are nullable. Adding `NOT NULL` to a column definition enforces that a column cannot contain `NULL` values.
```sql
CREATE TABLE students (name VARCHAR NOT NULL);
INSERT INTO students VALUES (NULL);
```
```console
Constraint Error:
NOT NULL constraint failed: students.name
```
#### Primary Key and Unique Constraint {#docs:stable:sql:constraints::primary-key-and-unique-constraint}
Primary key or unique constraints define a column, or set of columns, that are a unique identifier for a row in the table. The constraint enforces that the specified columns are *unique* within a table, i.e., that at most one row contains the given values for the set of columns.
```sql
CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR);
INSERT INTO students VALUES (1, 'Student 1');
INSERT INTO students VALUES (1, 'Student 2');
```
```console
Constraint Error:
Duplicate key "id: 1" violates primary key constraint
```
```sql
CREATE TABLE students (id INTEGER, name VARCHAR, PRIMARY KEY (id, name));
INSERT INTO students VALUES (1, 'Student 1');
INSERT INTO students VALUES (1, 'Student 2');
INSERT INTO students VALUES (1, 'Student 1');
```
```console
Constraint Error:
Duplicate key "id: 1, name: Student 1" violates primary key constraint
```
In order to enforce this property efficiently, an [ART index is automatically created](#docs:stable:sql:indexes) for every primary key or unique constraint that is defined in the table.
Primary key constraints and unique constraints are identical except for two points:
* A table can only have one primary key constraint defined, but many unique constraints
* A primary key constraint also enforces the keys to not be `NULL`.
```sql
CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR, email VARCHAR UNIQUE);
INSERT INTO students VALUES (1, 'Student 1', 'student1@example.com');
INSERT INTO students VALUES (2, 'Student 2', 'student1@example.com');
```
```console
Constraint Error:
Duplicate key "email: [email protected] " violates unique constraint.
```
```sql
INSERT INTO students(id, name) VALUES (3, 'Student 3');
INSERT INTO students(name, email) VALUES ('Student 3', 'student3@example.com');
```
```console
Constraint Error:
NOT NULL constraint failed: students.id
```
> **Warning.** Indexes have certain limitations that might result in constraints being evaluated too eagerly, leading to constraint errors such as `violates primary key constraint` and `violates unique constraint`. See the [indexes section for more details](#docs:stable:sql:indexes::index-limitations).
#### Foreign Keys {#docs:stable:sql:constraints::foreign-keys}
Foreign keys define a column, or set of columns, that refer to a primary key or unique constraint from *another* table. The constraint enforces that the key exists in the other table.
```sql
CREATE TABLE students (id INTEGER PRIMARY KEY, name VARCHAR);
CREATE TABLE subjects (id INTEGER PRIMARY KEY, name VARCHAR);
CREATE TABLE exams (
exam_id INTEGER PRIMARY KEY,
subject_id INTEGER REFERENCES subjects(id),
student_id INTEGER REFERENCES students(id),
grade INTEGER
);
INSERT INTO students VALUES (1, 'Student 1');
INSERT INTO subjects VALUES (1, 'CS 101');
INSERT INTO exams VALUES (1, 1, 1, 10);
INSERT INTO exams VALUES (2, 1, 2, 10);
```
```console
Constraint Error:
Violates foreign key constraint because key "id: 2" does not exist in the referenced table
```
In order to enforce this property efficiently, an [ART index is automatically created](#docs:stable:sql:indexes) for every foreign key constraint that is defined in the table.
> **Warning.** Indexes have certain limitations that might result in constraints being evaluated too eagerly, leading to constraint errors such as `violates primary key constraint` and `violates unique constraint`. See the [indexes section for more details](#docs:stable:sql:indexes::index-limitations).
## Indexes {#docs:stable:sql:indexes}
#### Index Types {#docs:stable:sql:indexes::index-types}
DuckDB has two built-in index types. Indexes can also be defined via [extensions](#docs:stable:extensions:overview).
##### Min-Max Index (Zonemap) {#docs:stable:sql:indexes::min-max-index-zonemap}
A [min-max index](https://en.wikipedia.org/wiki/Block_Range_Index) (also known as zonemap or block range index) is _automatically created_ for columns of all [general-purpose data types](#docs:stable:sql:data_types:overview).
##### Adaptive Radix Tree (ART) {#docs:stable:sql:indexes::adaptive-radix-tree-art}
An [Adaptive Radix Tree (ART)](https://db.in.tum.de/~leis/papers/ART.pdf) is mainly used to ensure primary key constraints and to speed up point queries and highly selective queries (i.e., those returning < 0.1% of the rows). ART indexes can be created manually using the `CREATE INDEX` statement, and they are created automatically for columns with a `UNIQUE` or `PRIMARY KEY` constraint.
> **Warning.** ART indexes must currently be able to fit in memory during index creation. Avoid creating ART indexes if the index does not fit in memory during index creation.
##### Indexes Defined by Extensions {#docs:stable:sql:indexes::indexes-defined-by-extensions}
DuckDB supports [R-trees for spatial indexing](#docs:stable:core_extensions:spatial:r-tree_indexes) via the `spatial` extension.
#### Persistence {#docs:stable:sql:indexes::persistence}
Both min-max indexes and ART indexes are persisted on disk.
#### `CREATE INDEX` and `DROP INDEX` Statements {#docs:stable:sql:indexes::create-index-and-drop-index-statements}
To create an [ART index](#::adaptive-radix-tree-art), use the [`CREATE INDEX` statement](#docs:stable:sql:statements:create_index::create-index).
To drop an [ART index](#::adaptive-radix-tree-art), use the [`DROP INDEX` statement](#docs:stable:sql:statements:create_index::drop-index).
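For example, a minimal sketch (the table and index names `t` and `idx_t_id` below are placeholders):
```sql
CREATE TABLE t (id INTEGER, s VARCHAR);
-- Create an ART index on the id column, then drop it again.
CREATE INDEX idx_t_id ON t (id);
DROP INDEX idx_t_id;
```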
#### Limitations of ART Indexes {#docs:stable:sql:indexes::limitations-of-art-indexes}
ART indexes create a secondary copy of the data in a second location, which complicates processing, particularly when combined with transactions. Certain limitations apply when it comes to modifying data that is also stored in secondary indexes.
> As expected, indexes have a strong effect on performance, slowing down loading and updates, but speeding up certain queries. Please consult the [Performance Guide](#docs:stable:guides:performance:indexing) for details.
##### Constraint Checking in `UPDATE` Statements {#docs:stable:sql:indexes::constraint-checking-in-update-statements}
`UPDATE` statements on indexed columns and columns that cannot be updated in place are transformed into a `DELETE` of the original row followed by an `INSERT` of the updated row.
This rewrite has performance implications, particularly for wide tables, as entire rows are rewritten instead of only the affected columns.
Additionally, it causes the following constraint-checking limitation of `UPDATE` statements.
The same limitation exists in other DBMSs, like PostgreSQL.
In the example below, note how the number of rows exceeds DuckDB's standard vector size, which is 2048.
The `UPDATE` statement is rewritten into a `DELETE`, followed by an `INSERT`.
This rewrite happens per chunk of data (2048 rows) moving through DuckDB's processing pipeline.
When updating `i = 2047` to `i = 2048`, we do not yet know that 2048 becomes 2049, and so forth.
That is because we have not yet seen that chunk.
Thus, we throw a constraint violation.
```sql
CREATE TABLE my_table (i INTEGER PRIMARY KEY);
INSERT INTO my_table SELECT range FROM range(3_000);
UPDATE my_table SET i = i + 1;
```
```console
Constraint Error:
Duplicate key "i: 2048" violates primary key constraint.
```
A workaround is to split the `UPDATE` into a `DELETE ... RETURNING ...` followed by an `INSERT`,
with some additional logic to (temporarily) store the result of the `DELETE`.
All statements should be run inside a transaction via `BEGIN`, and eventually `COMMIT`.
Here is an example of how that could look in the command line client.
```sql
CREATE TABLE my_table (i INTEGER PRIMARY KEY);
INSERT INTO my_table SELECT range FROM range(3_000);
BEGIN;
CREATE TEMP TABLE tmp AS SELECT i FROM my_table;
DELETE FROM my_table;
INSERT INTO my_table SELECT i FROM tmp;
DROP TABLE tmp;
COMMIT;
```
In other clients, you might be able to fetch the result of `DELETE ... RETURNING ...`.
Then, you can use that result in a subsequent `INSERT ...` statement,
or potentially make use of DuckDB's `Appender` (if available in the client).
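As a sketch, the same temp-table workaround can also apply the intended increment while rewriting the table (reusing the `my_table` example from above):
```sql
BEGIN;
CREATE TEMP TABLE tmp AS SELECT i FROM my_table;
DELETE FROM my_table;
-- Insert the incremented values in one pass, avoiding the per-chunk UPDATE rewrite.
INSERT INTO my_table SELECT i + 1 FROM tmp;
DROP TABLE tmp;
COMMIT;
```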
##### Over-Eager Constraint Checking in Foreign Keys {#docs:stable:sql:indexes::over-eager-constraint-checking-in-foreign-keys}
This limitation occurs if you meet the following conditions:
* A table has a `FOREIGN KEY` constraint.
* There is an `UPDATE` on the corresponding `PRIMARY KEY` table, which DuckDB rewrites into a `DELETE` followed by an `INSERT`.
* The to-be-deleted row exists in the foreign key table.
If these hold, you'll encounter an unexpected constraint violation:
```sql
CREATE TABLE pk_table (id INTEGER PRIMARY KEY, payload VARCHAR[]);
INSERT INTO pk_table VALUES (1, ['hello']);
CREATE TABLE fk_table (id INTEGER REFERENCES pk_table(id));
INSERT INTO fk_table VALUES (1);
UPDATE pk_table SET payload = ['world'] WHERE id = 1;
```
```console
Constraint Error:
Violates foreign key constraint because key "id: 1" is still referenced by a foreign key in a different table. If this is an unexpected constraint violation, please refer to our foreign key limitations in the documentation
```
The reason for this is that DuckDB does not yet support “looking ahead”.
During the `INSERT`, it is unaware it will reinsert the foreign key value as part of the `UPDATE` rewrite.
##### Constraint Checking After Delete With Concurrent Transactions {#docs:stable:sql:indexes::constraint-checking-after-delete-with-concurrent-transactions}
When a delete is committed on a table with an index, data can only be removed from the index once no further transactions exist that refer to the deleted entry. This means that, for indexes which enforce constraints, constraint checking can fail for a subsequent transaction that inserts a record with the same key as the deleted record while a concurrent transaction still references the deleted record. Note that constraint violations are only relevant for primary key, foreign key, and `UNIQUE` indexes.
There are two main ways that constraint checking can fail:
###### Over-Eager Unique Constraint Checking {#docs:stable:sql:indexes::over-eager-unique-constraint-checking}
For uniqueness constraints, inserts can fail when they should succeed:
```cpp
// Assume "someTable" is a table with an index enforcing uniqueness
tx1 = duckdbTxStart()
someRecord = duckdb(tx1, "select * from someTable using sample 1 rows")
tx2 = duckdbTxStart()
duckdbDelete(tx2, someRecord)
duckdbTxCommit(tx2)
// At this point someRecord is deleted, but the ART index is not updated, so the following would fail with a constraint error:
// tx3 = duckdbTxStart()
// duckdbInsert(tx3, someRecord)
// duckdbTxCommit(tx3)
duckdbTxCommit(tx1) // Following this, the above insert would succeed because the ART index was allowed to update
```
###### Under-Eager Foreign Key Constraint Checking {#docs:stable:sql:indexes::under-eager-foreign-key-constraint-checking}
For foreign key constraints, inserts can succeed when they should fail:
```cpp
// Setup: Create a primary table with UUID primary key and a secondary table with foreign key reference
primaryId = generateNewGUID()
conn = duckdbConnectInMemory()
// Create tables and insert initial record in primary table
duckdb(conn, "CREATE TABLE primary_table (id UUID PRIMARY KEY)")
duckdb(conn, "CREATE TABLE secondary_table (primary_id UUID, FOREIGN KEY (primary_id) REFERENCES primary_table(id))")
duckdbInsert(conn, "primary_table", {id: primaryId})
// Start transaction tx1 which will read from primary_table
tx1 = duckdbTxStart(conn)
readRecord = duckdb(tx1, "SELECT id FROM primary_table LIMIT 1")
// Note: tx1 remains open, holding locks/resources
// Outside of tx1, delete the record from primary_table
duckdbDelete(conn, "primary_table", {id: primaryId})
// Try to insert into secondary_table with foreign key reference to the now-deleted primary record
// This succeeds because tx1 is still open and the constraint isn't fully enforced yet
duckdbInsert(conn, "secondary_table", {primary_id: primaryId})
// Commit tx1, releasing any locks/resources
duckdbTxCommit(tx1)
// Verify the primary record is indeed deleted
count = duckdb(conn, "SELECT count() FROM primary_table WHERE id = $primaryId", {primaryId: primaryId})
assert(count == 0, "Record should be deleted")
// Verify the secondary record with the foreign key reference exists, an inconsistent state
count = duckdb(conn, "SELECT count() FROM secondary_table WHERE primary_id = $primaryId", {primaryId: primaryId})
assert(count == 1, "Foreign key reference should exist")
```
## Meta Queries {#sql:meta}
### Information Schema {#docs:stable:sql:meta:information_schema}
The views in the `information_schema` are SQL-standard views that describe the catalog entries of the database. These views can be filtered to obtain information about a specific column or table.
DuckDB's implementation is based on [PostgreSQL's information schema](https://www.postgresql.org/docs/16/infoschema-columns.html).
#### Tables {#docs:stable:sql:meta:information_schema::tables}
##### `character_sets`: Character Sets {#docs:stable:sql:meta:information_schema::character_sets-character-sets}
| Column | Description | Type | Example |
|--------|-------------|------|---------|
| `character_set_catalog` | Currently not implemented; always `NULL`. | `VARCHAR` | `NULL` |
| `character_set_schema` | Currently not implemented; always `NULL`. | `VARCHAR` | `NULL` |
| `character_set_name` | Name of the character set, currently implemented as showing the name of the database encoding. | `VARCHAR` | `'UTF8'` |
| `character_repertoire` | Character repertoire, showing `UCS` if the encoding is `UTF8`, else just the encoding name. | `VARCHAR` | `'UCS'` |
| `form_of_use` | Character encoding form, same as the database encoding. | `VARCHAR` | `'UTF8'` |
| `default_collate_catalog`| Name of the database containing the default collation (always the current database). | `VARCHAR` | `'my_db'` |
| `default_collate_schema` | Name of the schema containing the default collation. | `VARCHAR` | `'pg_catalog'` |
| `default_collate_name` | Name of the default collation. | `VARCHAR` | `'ucs_basic'` |
##### `columns`: Columns {#docs:stable:sql:meta:information_schema::columns-columns}
The view that describes the catalog information for columns is `information_schema.columns`. It lists the columns present in the database and has the following layout:
| Column | Description | Type | Example |
|:--|:---|:-|:-|
| `table_catalog` | Name of the database containing the table (always the current database). | `VARCHAR` | `'my_db'` |
| `table_schema` | Name of the schema containing the table. | `VARCHAR` | `'main'` |
| `table_name` | Name of the table. | `VARCHAR` | `'widgets'` |
| `column_name` | Name of the column. | `VARCHAR` | `'price'` |
| `ordinal_position` | Ordinal position of the column within the table (count starts at 1). | `INTEGER` | `5` |
| `column_default` | Default expression of the column. |`VARCHAR`| `1.99` |
| `is_nullable` | `YES` if the column is possibly nullable, `NO` if it is known not nullable. |`VARCHAR`| `'YES'` |
| `data_type` | Data type of the column. |`VARCHAR`| `'DECIMAL(18, 2)'` |
| `character_maximum_length` | If `data_type` identifies a character or bit string type, the declared maximum length; `NULL` for all other data types or if no maximum length was declared. |`INTEGER`| `255` |
| `character_octet_length` | If `data_type` identifies a character type, the maximum possible length in octets (bytes) of a datum; `NULL` for all other data types. The maximum octet length depends on the declared character maximum length (see above) and the character encoding. |`INTEGER`| `1073741824` |
| `numeric_precision` | If `data_type` identifies a numeric type, this column contains the (declared or implicit) precision of the type for this column. The precision indicates the number of significant digits. For all other data types, this column is `NULL`. |`INTEGER`| `18` |
| `numeric_scale` | If `data_type` identifies a numeric type, this column contains the (declared or implicit) scale of the type for this column. The scale indicates the number of digits to the right of the decimal point. For all other data types, this column is `NULL`. |`INTEGER`| `2` |
| `datetime_precision` | If `data_type` identifies a date, time, timestamp, or interval type, this column contains the (declared or implicit) fractional seconds precision of the type for this column, that is, the number of decimal digits maintained following the decimal point in the seconds value. No fractional seconds are currently supported in DuckDB. For all other data types, this column is `NULL`. |`INTEGER`| `0` |
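For example, to list the columns of a single table (here the placeholder table `widgets` from the examples above), the view can be filtered and ordered by position:
```sql
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'main'
  AND table_name = 'widgets'
ORDER BY ordinal_position;
```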
##### `constraint_column_usage`: Constraint Column Usage {#docs:stable:sql:meta:information_schema::constraint_column_usage-constraint-column-usage}
This view describes all columns in the current database that are used by some constraint. For a check constraint, this view identifies the columns that are used in the check expression. For a not-null constraint, this view identifies the column that the constraint is defined on. For a foreign key constraint, this view identifies the columns that the foreign key references. For a unique or primary key constraint, this view identifies the constrained columns.
| Column | Description | Type | Example |
|--------|-------------|------|---------|
| `table_catalog` | Name of the database that contains the table that contains the column that is used by some constraint (always the current database) |`VARCHAR`| `'my_db'` |
| `table_schema` | Name of the schema that contains the table that contains the column that is used by some constraint |`VARCHAR`| `'main'` |
| `table_name` | Name of the table that contains the column that is used by some constraint |`VARCHAR`| `'widgets'` |
| `column_name` | Name of the column that is used by some constraint |`VARCHAR`| `'price'` |
| `constraint_catalog` | Name of the database that contains the constraint (always the current database) |`VARCHAR`| `'my_db'` |
| `constraint_schema` | Name of the schema that contains the constraint |`VARCHAR`| `'main'` |
| `constraint_name` | Name of the constraint |`VARCHAR`| `'exam_id_students_id_fkey'` |
##### `key_column_usage`: Key Column Usage {#docs:stable:sql:meta:information_schema::key_column_usage-key-column-usage}
| Column | Description | Type | Example |
|--------|-------------|------|---------|
| `constraint_catalog` | Name of the database that contains the constraint (always the current database). | `VARCHAR` | `'my_db'` |
| `constraint_schema` | Name of the schema that contains the constraint. | `VARCHAR` | `'main'` |
| `constraint_name` | Name of the constraint. | `VARCHAR` | `'exams_exam_id_fkey'` |
| `table_catalog` | Name of the database that contains the table that contains the column that is restricted by this constraint (always the current database). | `VARCHAR` | `'my_db'` |
| `table_schema` | Name of the schema that contains the table that contains the column that is restricted by this constraint. | `VARCHAR` | `'main'` |
| `table_name` | Name of the table that contains the column that is restricted by this constraint. | `VARCHAR` | `'exams'` |
| `column_name` | Name of the column that is restricted by this constraint. | `VARCHAR` | `'exam_id'` |
| `ordinal_position` | Ordinal position of the column within the constraint key (count starts at 1). | `INTEGER` | `1` |
| `position_in_unique_constraint` | For a foreign-key constraint, ordinal position of the referenced column within its unique constraint (count starts at `1`); otherwise `NULL`. | `INTEGER` | `1` |
##### `referential_constraints`: Referential Constraints {#docs:stable:sql:meta:information_schema::referential_constraints-referential-constraints}
| Column | Description | Type | Example |
|--------|-------------|------|---------|
| `constraint_catalog` | Name of the database containing the constraint (always the current database). | `VARCHAR` | `'my_db'` |
| `constraint_schema` | Name of the schema containing the constraint. | `VARCHAR` | `main` |
| `constraint_name` | Name of the constraint. | `VARCHAR` | `exam_id_students_id_fkey` |
| `unique_constraint_catalog` | Name of the database that contains the unique or primary key constraint that the foreign key constraint references. | `VARCHAR` | `'my_db'` |
| `unique_constraint_schema` | Name of the schema that contains the unique or primary key constraint that the foreign key constraint references. | `VARCHAR` | `'main'` |
| `unique_constraint_name` | Name of the unique or primary key constraint that the foreign key constraint references. | `VARCHAR` | `'students_id_pkey'` |
| `match_option` | Match option of the foreign key constraint. Always `NONE`. | `VARCHAR` | `NONE` |
| `update_rule` | Update rule of the foreign key constraint. Always `NO ACTION`. | `VARCHAR` | `NO ACTION` |
| `delete_rule` | Delete rule of the foreign key constraint. Always `NO ACTION`. | `VARCHAR` | `NO ACTION` |
##### `schemata`: Database, Catalog and Schema {#docs:stable:sql:meta:information_schema::schemata-database-catalog-and-schema}
The top level catalog view is `information_schema.schemata`. It lists the catalogs and the schemas present in the database and has the following layout:
| Column | Description | Type | Example |
|:--|:---|:-|:-|
| `catalog_name` | Name of the database that the schema is contained in. | `VARCHAR` | `'my_db'` |
| `schema_name` | Name of the schema. | `VARCHAR` | `'main'` |
| `schema_owner` | Name of the owner of the schema. Not yet implemented. | `VARCHAR` | `'duckdb'` |
| `default_character_set_catalog` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
| `default_character_set_schema` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
| `default_character_set_name` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
| `sql_path` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
##### `tables`: Tables and Views {#docs:stable:sql:meta:information_schema::tables-tables-and-views}
The view that describes the catalog information for tables and views is `information_schema.tables`. It lists the tables present in the database and has the following layout:
| Column | Description | Type | Example |
|:--|:---|:-|:-|
| `table_catalog` | The catalog the table or view belongs to. | `VARCHAR` | `'my_db'` |
| `table_schema` | The schema the table or view belongs to. | `VARCHAR` | `'main'` |
| `table_name` | The name of the table or view. | `VARCHAR` | `'widgets'` |
| `table_type` | The type of table. One of: `BASE TABLE`, `LOCAL TEMPORARY`, `VIEW`. | `VARCHAR` | `'BASE TABLE'` |
| `self_referencing_column_name` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
| `reference_generation` | Applies to a feature not available in DuckDB. | `VARCHAR` | `NULL` |
| `user_defined_type_catalog` | If the table is a typed table, the name of the database that contains the underlying data type (always the current database), else `NULL`. Currently unimplemented. | `VARCHAR` | `NULL` |
| `user_defined_type_schema` | If the table is a typed table, the name of the schema that contains the underlying data type, else `NULL`. Currently unimplemented. | `VARCHAR` | `NULL` |
| `user_defined_type_name` | If the table is a typed table, the name of the underlying data type, else `NULL`. Currently unimplemented. | `VARCHAR` | `NULL` |
| `is_insertable_into` | `YES` if the table is insertable into, `NO` if not (Base tables are always insertable into, views not necessarily.)| `VARCHAR` | `'YES'` |
| `is_typed` | `YES` if the table is a typed table, `NO` if not. | `VARCHAR` | `'NO'` |
| `commit_action` | Not yet implemented. | `VARCHAR` | `'NO'` |
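For example, to list the tables and views in the `main` schema:
```sql
SELECT table_name, table_type
FROM information_schema.tables
WHERE table_schema = 'main'
ORDER BY table_name;
```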
##### `table_constraints`: Table Constraints {#docs:stable:sql:meta:information_schema::table_constraints-table-constraints}
| Column | Description | Type | Example |
|--------|-------------|------|---------|
| `constraint_catalog` | Name of the database that contains the constraint (always the current database). | `VARCHAR` | `'my_db'` |
| `constraint_schema` | Name of the schema that contains the constraint. | `VARCHAR` | `'main'` |
| `constraint_name` | Name of the constraint. | `VARCHAR` | `'exams_exam_id_fkey'` |
| `table_catalog` | Name of the database that contains the table (always the current database). | `VARCHAR` | `'my_db'` |
| `table_schema` | Name of the schema that contains the table. | `VARCHAR` | `'main'` |
| `table_name` | Name of the table. | `VARCHAR` | `'exams'` |
| `constraint_type` | Type of the constraint: `CHECK`, `FOREIGN KEY`, `PRIMARY KEY`, or `UNIQUE`. | `VARCHAR` | `'FOREIGN KEY'` |
| `is_deferrable` | `YES` if the constraint is deferrable, `NO` if not. | `VARCHAR` | `'NO'` |
| `initially_deferred` | `YES` if the constraint is deferrable and initially deferred, `NO` if not. | `VARCHAR` | `'NO'` |
| `enforced` | Always `YES`. | `VARCHAR` | `'YES'` |
| `nulls_distinct` | If the constraint is a unique constraint, then `YES` if the constraint treats `NULL`s as distinct or `NO` if it treats `NULL`s as not distinct, otherwise `NULL` for other types of constraints. | `VARCHAR` | `'YES'` |
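For example, to get an overview of the constraints defined in the current database:
```sql
SELECT table_name, constraint_name, constraint_type
FROM information_schema.table_constraints
ORDER BY table_name, constraint_name;
```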
#### Catalog Functions {#docs:stable:sql:meta:information_schema::catalog-functions}
Several functions are also provided to see details about the catalogs and schemas that are configured in the database.
| Function | Description | Example | Result |
|:--|:---|:--|:--|
| `current_catalog()` | Return the name of the currently active catalog. Default is memory. | `current_catalog()` | `'memory'` |
| `current_schema()` | Return the name of the currently active schema. Default is main. | `current_schema()` | `'main'` |
| `current_schemas(boolean)` | Return list of schemas. Pass a parameter of `true` to include implicit schemas. | `current_schemas(true)` | `['temp', 'main', 'pg_catalog']` |
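For example, to inspect the active catalog, the active schema, and the full schema search path in one query:
```sql
SELECT
    current_catalog() AS catalog_name,
    current_schema() AS schema_name,
    current_schemas(true) AS schema_list;
```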
### DuckDB_% Metadata Functions {#docs:stable:sql:meta:duckdb_table_functions}
DuckDB offers a collection of table functions that provide metadata about the current database. These functions reside in the `main` schema and their names are prefixed with `duckdb_`.
The resultset returned by a `duckdb_` table function may be used just like an ordinary table or view. For example, you can use a `duckdb_` function call in the `FROM` clause of a `SELECT` statement, and you may refer to the columns of its returned resultset elsewhere in the statement, for example in the `WHERE` clause.
Table functions are still functions, and you should write parentheses after the function name to call it and obtain its returned resultset:
```sql
SELECT * FROM duckdb_settings();
```
Alternatively, you may also execute table functions using the `CALL` syntax:
```sql
CALL duckdb_settings();
```
In this case too, the parentheses are mandatory.
> For some of the `duckdb_%` functions, there is also an identically named view available, which also resides in the `main` schema. Typically, these views do a `SELECT` on the `duckdb_` table function with the same name, while filtering out those objects that are marked as internal. We mention it here, because if you accidentally omit the parentheses in your `duckdb_` table function call, you might still get a result, but from the identically named view.
Example:
The `duckdb_views()` _table function_ returns all views, including those marked internal:
```sql
SELECT * FROM duckdb_views();
```
The `duckdb_views` _view_ returns views that are not marked as internal:
```sql
SELECT * FROM duckdb_views;
```
#### `duckdb_columns` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_columns}
The `duckdb_columns()` function provides metadata about the columns available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains the column object. | `VARCHAR` |
| `database_oid` | Internal identifier of the database that contains the column object. | `BIGINT` |
| `schema_name` | The SQL name of the schema that contains the table object that defines this column. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object that contains the table of the column. | `BIGINT` |
| `table_name` | The SQL name of the table that defines the column. | `VARCHAR` |
| `table_oid` | Internal identifier (name) of the table object that defines the column. | `BIGINT` |
| `column_name` | The SQL name of the column. | `VARCHAR` |
| `column_index` | The unique position of the column within its table. | `INTEGER` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:stable:sql:statements:comment_on). | `VARCHAR` |
| `internal` | `true` if this column is built-in, `false` if it is user-defined. | `BOOLEAN` |
| `column_default` | The default value of the column (expressed in SQL). | `VARCHAR` |
| `is_nullable` | `true` if the column can hold `NULL` values; `false` if the column cannot hold `NULL`-values. | `BOOLEAN` |
| `data_type` | The name of the column datatype. | `VARCHAR` |
| `data_type_id` | The internal identifier of the column data type. | `BIGINT` |
| `character_maximum_length` | Always `NULL`. DuckDB [text types](#docs:stable:sql:data_types:text) do not enforce a value length restriction based on a length type parameter. | `INTEGER` |
| `numeric_precision` | The number of units (in the base indicated by `numeric_precision_radix`) used for storing column values. For integral and approximate numeric types, this is the number of bits. For decimal types, this is the number of digit positions. | `INTEGER` |
| `numeric_precision_radix` | The number-base of the units in the `numeric_precision` column. For integral and approximate numeric types, this is `2`, indicating the precision is expressed as a number of bits. For the `decimal` type this is `10`, indicating the precision is expressed as a number of decimal positions. | `INTEGER` |
| `numeric_scale` | Applicable to `decimal` type. Indicates the maximum number of fractional digits (i.e., the number of digits that may appear after the decimal separator). | `INTEGER` |
The [`information_schema.columns`](#docs:stable:sql:meta:information_schema::columns-columns) system view provides a more standardized way to obtain metadata about database columns, but the `duckdb_columns` function also returns metadata about DuckDB internal objects. (In fact, `information_schema.columns` is implemented as a query on top of `duckdb_columns()`.)
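For example, to list the user-defined columns only:
```sql
SELECT table_name, column_name, data_type
FROM duckdb_columns()
WHERE NOT internal
ORDER BY table_name, column_index;
```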
#### `duckdb_constraints` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_constraints}
The `duckdb_constraints()` function provides metadata about the constraints available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains the constraint. | `VARCHAR` |
| `database_oid` | Internal identifier of the database that contains the constraint. | `BIGINT` |
| `schema_name` | The SQL name of the schema that contains the table on which the constraint is defined. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object that contains the table on which the constraint is defined. | `BIGINT` |
| `table_name` | The SQL name of the table on which the constraint is defined. | `VARCHAR` |
| `table_oid` | Internal identifier (name) of the table object on which the constraint is defined. | `BIGINT` |
| `constraint_index` | Indicates the position of the constraint as it appears in its table definition. | `BIGINT` |
| `constraint_type` | Indicates the type of constraint. Applicable values are `CHECK`, `FOREIGN KEY`, `PRIMARY KEY`, `NOT NULL`, `UNIQUE`. | `VARCHAR` |
| `constraint_text` | The definition of the constraint expressed as a SQL phrase. (Not necessarily a complete or syntactically valid DDL statement.) | `VARCHAR` |
| `expression` | If the constraint is a check constraint, the definition of the condition being checked, otherwise `NULL`. | `VARCHAR` |
| `constraint_column_indexes` | An array of table column indexes referring to the columns that appear in the constraint definition. | `BIGINT[]` |
| `constraint_column_names` | An array of table column names appearing in the constraint definition. | `VARCHAR[]` |
| `constraint_name` | The name of the constraint. | `VARCHAR` |
| `referenced_table` | The table referenced by the constraint. | `VARCHAR` |
| `referenced_column_names` | The column names referenced by the constraint. | `VARCHAR[]` |
The [`information_schema.referential_constraints`](#docs:stable:sql:meta:information_schema::referential_constraints-referential-constraints) and [`information_schema.table_constraints`](#docs:stable:sql:meta:information_schema::table_constraints-table-constraints) system views provide a more standardized way to obtain metadata about constraints, but the `duckdb_constraints` function also returns metadata about DuckDB internal objects. (In fact, `information_schema.referential_constraints` and `information_schema.table_constraints` are implemented as queries on top of `duckdb_constraints()`.)
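For example, to list the key constraints together with the columns they cover:
```sql
SELECT table_name, constraint_type, constraint_column_names
FROM duckdb_constraints()
WHERE constraint_type IN ('PRIMARY KEY', 'FOREIGN KEY', 'UNIQUE');
```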
#### `duckdb_databases` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_databases}
The `duckdb_databases()` function lists the databases that are accessible from within the current DuckDB process.
Apart from the database associated at startup, the list also includes databases that were [attached](#docs:stable:sql:statements:attach) to the DuckDB process later on.
| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database, or the alias if the database was attached using an alias. | `VARCHAR` |
| `database_oid` | The internal identifier of the database. | `VARCHAR` |
| `path` | The file path associated with the database. | `VARCHAR` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:stable:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key-value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `internal` | `true` indicates a system or built-in database. `false` indicates a user-defined database. | `BOOLEAN` |
| `type` | The type of RDBMS implemented by the attached database. For DuckDB databases, the value is `duckdb`. | `VARCHAR` |
| `readonly` | Denotes whether the database is read-only. | `BOOLEAN` |
#### `duckdb_dependencies` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_dependencies}
The `duckdb_dependencies()` function provides metadata about the dependencies available in the DuckDB instance.
| Column | Description | Type |
|:--|:------|:-|
| `classid` | Always 0. | `BIGINT` |
| `objid` | The internal id of the object. | `BIGINT` |
| `objsubid` | Always 0. | `INTEGER` |
| `refclassid` | Always 0. | `BIGINT` |
| `refobjid` | The internal id of the dependent object. | `BIGINT` |
| `refobjsubid` | Always 0. | `INTEGER` |
| `deptype` | The type of dependency: either regular (`n`) or automatic (`a`). | `VARCHAR` |
#### `duckdb_extensions` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_extensions}
The `duckdb_extensions()` function provides metadata about the extensions available in the DuckDB instance.
| Column | Description | Type |
|:--|:------|:-|
| `extension_name` | The name of the extension. | `VARCHAR` |
| `loaded` | `true` if the extension is loaded, `false` if it's not loaded. | `BOOLEAN` |
| `installed` | `true` if the extension is installed, `false` if it's not installed. | `BOOLEAN` |
| `install_path` | `(BUILT-IN)` if the extension is built-in, otherwise, the filesystem path where the binary that implements the extension resides. | `VARCHAR` |
| `description` | Human readable text that describes the extension's functionality. | `VARCHAR` |
| `aliases` | List of alternative names for this extension. | `VARCHAR[]` |
| `extension_version` | The version of the extension (`vX.Y.Z` for stable versions and a 6-character hash for unstable versions). | `VARCHAR` |
| `install_mode` | The installation mode that was used to install the extension: `UNKNOWN`, `REPOSITORY`, `CUSTOM_PATH`, `STATICALLY_LINKED`, `NOT_INSTALLED`, `NULL`. | `VARCHAR` |
| `installed_from` | Name of the repository the extension was installed from, e.g., `community` or `core_nightly`. The empty string denotes the `core` repository. | `VARCHAR` |
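For example, to check which extensions are installed and loaded:
```sql
SELECT extension_name, installed, loaded, install_mode
FROM duckdb_extensions()
ORDER BY extension_name;
```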
#### `duckdb_functions` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_functions}
The `duckdb_functions()` function provides metadata about the functions (including macros) available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this function. | `VARCHAR` |
| `database_oid` | Internal identifier of the database containing the index. | `BIGINT` |
| `schema_name` | The SQL name of the schema where the function resides. | `VARCHAR` |
| `function_name` | The SQL name of the function. | `VARCHAR` |
| `function_type` | The function kind. One of: `table`, `scalar`, `aggregate`, `pragma`, `macro`. | `VARCHAR` |
| `description` | Description of this function (always `NULL`). | `VARCHAR` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:stable:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key-value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `return_type` | The logical data type name of the returned value. Applicable for scalar and aggregate functions. | `VARCHAR` |
| `parameters` | If the function has parameters, the list of parameter names. | `VARCHAR[]` |
| `parameter_types` | If the function has parameters, a list of logical data type names corresponding to the parameter list. | `VARCHAR[]` |
| `varargs` | The name of the data type in case the function has a variable number of arguments, or `NULL` if the function does not have a variable number of arguments. | `VARCHAR` |
| `macro_definition` | If this is a [macro](#docs:stable:sql:statements:create_macro), the SQL expression that defines it. | `VARCHAR` |
| `has_side_effects` | `false` if this is a pure function. `true` if this function changes the database state (like the sequence functions `nextval()` and `currval()`). | `BOOLEAN` |
| `internal` | `true` if the function is built-in (defined by DuckDB or an extension), `false` if it was defined using the [`CREATE MACRO` statement](#docs:stable:sql:statements:create_macro). | `BOOLEAN` |
| `function_oid` | The internal identifier for this function. | `BIGINT` |
| `examples` | Examples of using the function. Used to generate the documentation. | `VARCHAR[]` |
| `stability` | The stability of the function (`CONSISTENT`, `VOLATILE`, `CONSISTENT_WITHIN_QUERY`, or `NULL`). | `VARCHAR` |
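For example, to look up the available overloads of the `contains` function used earlier on this page:
```sql
SELECT function_name, function_type, parameter_types, return_type
FROM duckdb_functions()
WHERE function_name = 'contains';
```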
#### `duckdb_indexes` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_indexes}
The `duckdb_indexes()` function provides metadata about secondary indexes available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this index. | `VARCHAR` |
| `database_oid` | Internal identifier of the database containing the index. | `BIGINT` |
| `schema_name` | The SQL name of the schema that contains the table with the secondary index. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object. | `BIGINT` |
| `index_name` | The SQL name of this secondary index. | `VARCHAR` |
| `index_oid` | The object identifier of this index. | `BIGINT` |
| `table_name` | The name of the table with the index. | `VARCHAR` |
| `table_oid` | Internal identifier (name) of the table object. | `BIGINT` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:stable:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key-value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `is_unique` | `true` if the index was created with the `UNIQUE` modifier, `false` if it was not. | `BOOLEAN` |
| `is_primary` | Always `false`. | `BOOLEAN` |
| `expressions` | Always `NULL`. | `VARCHAR` |
| `sql` | The definition of the index, expressed as a `CREATE INDEX` SQL statement. | `VARCHAR` |
Note that `duckdb_indexes` only provides metadata about secondary indexes, i.e., those indexes created by explicit [`CREATE INDEX`](#docs:stable:sql:indexes::create-index) statements. Primary keys, foreign keys, and `UNIQUE` constraints are maintained using indexes, but their details are included in the `duckdb_constraints()` function.
#### `duckdb_keywords` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_keywords}
The `duckdb_keywords()` function provides metadata about DuckDB's keywords and reserved words.
| Column | Description | Type |
|:-|:---|:-|
| `keyword_name` | The keyword. | `VARCHAR` |
| `keyword_category` | Indicates the category of the keyword. Values are `column_name`, `reserved`, `type_function` and `unreserved`. | `VARCHAR` |
#### `duckdb_log_contexts` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_log_contexts}
The `duckdb_log_contexts()` function provides information on the contexts of DuckDB log entries.
| Column | Description | Type |
|:-|:---|:-|
| `context_id` | The identifier of the context. The `context_id` column in the [`duckdb_logs`](#::duckdb_logs) table is a foreign key that points to this column. | `UBIGINT` |
| `scope` | The scope of the context (e.g., `connection`, `database`, or `file_opener`). | `VARCHAR` |
| `connection_id` | The identifier of the connection. | `UBIGINT` |
| `transaction_id` | The identifier of the transaction. | `UBIGINT` |
| `query_id` | The identifier of the query. | `UBIGINT` |
| `thread_id` | The identifier of the thread. | `UBIGINT` |
#### `duckdb_logs` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_logs}
The `duckdb_logs()` function returns a table of DuckDB log entries.
| Column | Description | Type |
|:-|:---|:-|
| `context_id` | The identifier of the context of the log entry. Foreign key to the [`duckdb_log_contexts`](#::duckdb_log_contexts) table. | `UBIGINT` |
| `timestamp` | The timestamp of the log entry. | `TIMESTAMP` |
| `type` | The type of the log entry. | `VARCHAR` |
| `log_level` | The level of the log entry (`TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR`, or `FATAL`). | `VARCHAR` |
| `message` | The message of the log entry. | `VARCHAR` |
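As a sketch, log entries can be combined with their contexts via the `context_id` column (this assumes logging has been enabled so that entries exist):
```sql
SELECT l.timestamp, l.log_level, l.message, c.scope
FROM duckdb_logs() AS l
JOIN duckdb_log_contexts() AS c USING (context_id)
ORDER BY l.timestamp;
```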
#### `duckdb_memory` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_memory}
The `duckdb_memory()` function provides metadata about DuckDB's buffer manager.
| Column | Description | Type |
|:-|:---|:-|
| `tag` | The memory tag. It has one of the following values: `BASE_TABLE`, `HASH_TABLE`, `PARQUET_READER`, `CSV_READER`, `ORDER_BY`, `ART_INDEX`, `COLUMN_DATA`, `METADATA`, `OVERFLOW_STRINGS`, `IN_MEMORY_TABLE`, `ALLOCATOR`, `EXTENSION`. | `VARCHAR` |
| `memory_usage_bytes` | The memory used (in bytes). | `BIGINT` |
| `temporary_storage_bytes` | The disk storage used (in bytes). | `BIGINT` |
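For example, to see which components currently use the most memory:
```sql
SELECT tag, memory_usage_bytes, temporary_storage_bytes
FROM duckdb_memory()
ORDER BY memory_usage_bytes DESC;
```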
#### `duckdb_optimizers` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_optimizers}
The `duckdb_optimizers()` function provides metadata about the optimization rules (e.g., `expression_rewriter`, `filter_pushdown`) available in the DuckDB instance.
These can be selectively turned off using [`PRAGMA disabled_optimizers`](#docs:stable:configuration:pragmas::selectively-disabling-optimizers).
| Column | Description | Type |
|:-|:---|:-|
| `name` | The name of the optimization rule. | `VARCHAR` |
#### `duckdb_prepared_statements` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_prepared_statements}
The `duckdb_prepared_statements()` function provides metadata about the [prepared statements](#docs:stable:sql:query_syntax:prepared_statements) that exist in the current DuckDB session.
| Column | Description | Type |
|:-|:---|:-|
| `name` | The name of the prepared statement. | `VARCHAR` |
| `statement` | The SQL statement. | `VARCHAR` |
| `parameter_types` | The expected parameter types for the statement's parameters. Currently returns `UNKNOWN` for all parameters. | `VARCHAR[]` |
| `result_types` | The types of the columns in the table returned by the prepared statement. | `VARCHAR[]` |
#### `duckdb_schemas` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_schemas}
The `duckdb_schemas()` function provides metadata about the schemas available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `oid` | Internal identifier of the schema object. | `BIGINT` |
| `database_name` | The name of the database that contains this schema. | `VARCHAR` |
| `database_oid` | Internal identifier of the database containing the schema. | `BIGINT` |
| `schema_name` | The SQL name of the schema. | `VARCHAR` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:stable:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key-value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `internal` | `true` if this is an internal (built-in) schema, `false` if this is a user-defined schema. | `BOOLEAN` |
| `sql` | Always `NULL`. | `VARCHAR` |
The [`information_schema.schemata`](#docs:stable:sql:meta:information_schema::schemata-database-catalog-and-schema) system view provides a more standardized way to obtain metadata about database schemas.
#### `duckdb_secret_types` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_secret_types}
The `duckdb_secret_types()` function lists the secret types that are supported in the current DuckDB session.
| Column | Description | Type |
|:-|:---|:-|
| `type` | The name of the secret type, e.g., `s3`. | `VARCHAR` |
| `default_provider` | The default secret provider, e.g., `config`. | `VARCHAR` |
| `extension` | The extension that registered the secret type, e.g., `aws`. | `VARCHAR` |
#### `duckdb_secrets` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_secrets}
The `duckdb_secrets()` function provides metadata about the secrets available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `name` | The name of the secret. | `VARCHAR` |
| `type` | The type of the secret, e.g., `S3`, `GCS`, `R2`, `AZURE`. | `VARCHAR` |
| `provider` | The provider of the secret. | `VARCHAR` |
| `persistent` | Denotes whether the secret is persistent. | `BOOLEAN` |
| `storage` | The backend for storing the secret. | `VARCHAR` |
| `scope` | The scope of the secret. | `VARCHAR[]` |
| `secret_string` | Returns the content of the secret as a string. Sensitive pieces of information, e.g., the access key, are redacted. | `VARCHAR` |
#### `duckdb_sequences` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_sequences}
The `duckdb_sequences()` function provides metadata about the sequences available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this sequence. | `VARCHAR` |
| `database_oid` | Internal identifier of the database containing the sequence. | `BIGINT` |
| `schema_name` | The SQL name of the schema that contains the sequence object. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object that contains the sequence object. | `BIGINT` |
| `sequence_name` | The SQL name that identifies the sequence within the schema. | `VARCHAR` |
| `sequence_oid` | The internal identifier of this sequence object. | `BIGINT` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:stable:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key-value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `temporary` | Whether this sequence is temporary. Temporary sequences are transient and only visible within the current connection. | `BOOLEAN` |
| `start_value` | The initial value of the sequence. This value will be returned when `nextval()` is called for the very first time on this sequence. | `BIGINT` |
| `min_value` | The minimum value of the sequence. | `BIGINT` |
| `max_value` | The maximum value of the sequence. | `BIGINT` |
| `increment_by` | The value that is added to the current value of the sequence to draw the next value from the sequence. | `BIGINT` |
| `cycle` | Whether the sequence should start over when drawing the next value would result in a value outside the range. | `BOOLEAN` |
| `last_value` | `NULL` if no value was ever drawn from the sequence using `nextval(...)`. `1` if a value was drawn. | `BIGINT` |
| `sql` | The definition of this object, expressed as SQL DDL-statement. | `VARCHAR` |
Attributes like `temporary`, `start_value` etc. correspond to the various options available in the [`CREATE SEQUENCE`](#docs:stable:sql:statements:create_sequence) statement and are documented there in full. Note that the attributes will always be filled out in the `duckdb_sequences` resultset, even if they were not explicitly specified in the `CREATE SEQUENCE` statement.
> 1. The column name `last_value` suggests that it contains the last value that was drawn from the sequence, but that is not the case. It is either `NULL` (if no value has ever been drawn from the sequence) or `1` (if a value has been drawn at least once).
>
> 2. If the sequence cycles, then the sequence will start over from the boundary of its range, not necessarily from the value specified as start value.
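The following sketch illustrates the `last_value` behavior described in the note (the sequence name `seq` is a placeholder):
```sql
CREATE SEQUENCE seq;
SELECT last_value FROM duckdb_sequences() WHERE sequence_name = 'seq'; -- NULL: no value drawn yet
SELECT nextval('seq');
SELECT last_value FROM duckdb_sequences() WHERE sequence_name = 'seq'; -- 1: a value has been drawn
```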
#### `duckdb_settings` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_settings}
The `duckdb_settings()` function provides metadata about the settings available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `name` | Name of the setting. | `VARCHAR` |
| `value` | Current value of the setting. | `VARCHAR` |
| `description` | A description of the setting. | `VARCHAR` |
| `input_type` | The logical datatype of the setting's value. | `VARCHAR` |
| `scope` | The scope of the setting (`LOCAL` or `GLOBAL`). | `VARCHAR` |
The various settings are described in the [configuration page](#docs:stable:configuration:overview).
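For example, to inspect a few individual settings:
```sql
SELECT name, value, scope
FROM duckdb_settings()
WHERE name IN ('memory_limit', 'threads');
```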
#### `duckdb_tables` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_tables}
The `duckdb_tables()` function provides metadata about the base tables available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this table. | `VARCHAR` |
| `database_oid` | Internal identifier of the database containing the table. | `BIGINT` |
| `schema_name` | The SQL name of the schema that contains the base table. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object that contains the base table. | `BIGINT` |
| `table_name` | The SQL name of the base table. | `VARCHAR` |
| `table_oid` | Internal identifier of the base table object. | `BIGINT` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:stable:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key-value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `internal` | `false` if this is a user-defined table. | `BOOLEAN` |
| `temporary` | Whether this is a temporary table. Temporary tables are not persisted and only visible within the current connection. | `BOOLEAN` |
| `has_primary_key` | `true` if this table object defines a `PRIMARY KEY`. | `BOOLEAN` |
| `estimated_size` | The estimated number of rows in the table. | `BIGINT` |
| `column_count` | The number of columns defined by this object. | `BIGINT` |
| `index_count` | The number of indexes associated with this table. This number includes all secondary indexes, as well as internal indexes generated to maintain `PRIMARY KEY` and/or `UNIQUE` constraints. | `BIGINT` |
| `check_constraint_count` | The number of check constraints active on columns within the table. | `BIGINT` |
| `sql` | The definition of this object, expressed as SQL [`CREATE TABLE`-statement](#docs:stable:sql:statements:create_table). | `VARCHAR` |
The [`information_schema.tables`](#docs:stable:sql:meta:information_schema::tables-tables-and-views) system view provides a more standardized way to obtain metadata about database tables that also includes views. But the resultset returned by `duckdb_tables` contains a few columns that are not included in `information_schema.tables`.
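For example, to get a quick size overview of all base tables:
```sql
SELECT database_name, schema_name, table_name, estimated_size, column_count
FROM duckdb_tables()
ORDER BY database_name, schema_name, table_name;
```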
#### `duckdb_temporary_files` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_temporary_files}
The `duckdb_temporary_files()` function provides metadata about the temporary files DuckDB has written to disk, to offload data from memory. This function mostly exists for debugging and testing purposes.
| Column | Description | Type |
|:-|:---|:-|
| `path` | The name of the temporary file. | `VARCHAR` |
| `size` | The size in bytes of the temporary file. | `BIGINT` |
#### `duckdb_types` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_types}
The `duckdb_types()` function provides metadata about the data types available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this schema. | `VARCHAR` |
| `database_oid` | Internal identifier of the database that contains the data type. | `BIGINT` |
| `schema_name` | The SQL name of the schema containing the type definition. Always `main`. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object. | `BIGINT` |
| `type_name` | The name or alias of this data type. | `VARCHAR` |
| `type_oid` | The internal identifier of the data type object. If `NULL`, then this is an alias of the type (as identified by the value in the `logical_type` column). | `BIGINT` |
| `type_size` | The number of bytes required to represent a value of this type in memory. | `BIGINT` |
| `logical_type` | The 'canonical' name of this data type. The same `logical_type` may be referenced by several types having different `type_name`s. | `VARCHAR` |
| `type_category` | The category to which this type belongs. Data types within the same category generally expose similar behavior when values of this type are used in expression. For example, the `NUMERIC` type_category includes integers, decimals, and floating point numbers. | `VARCHAR` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:stable:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key-value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `internal` | Whether this is an internal (built-in) or a user object. | `BOOLEAN` |
| `labels` | Labels for categorizing types. Used for generating the documentation. | `VARCHAR[]` |
#### `duckdb_variables` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_variables}
The `duckdb_variables()` function provides metadata about the variables available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `name` | The name of the variable, e.g., `x`. | `VARCHAR` |
| `value` | The value of the variable, e.g., `12`. | `VARCHAR` |
| `type` | The type of the variable, e.g., `INTEGER`. | `VARCHAR` |
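For example, after defining a variable with [`SET VARIABLE`](#docs:stable:sql:statements:set::set-variable), it shows up in the function's result:
```sql
SET VARIABLE my_var = 42;
SELECT name, value, type FROM duckdb_variables();
```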
#### `duckdb_views` {#docs:stable:sql:meta:duckdb_table_functions::duckdb_views}
The `duckdb_views()` function provides metadata about the views available in the DuckDB instance.
| Column | Description | Type |
|:-|:---|:-|
| `database_name` | The name of the database that contains this view. | `VARCHAR` |
| `database_oid` | Internal identifier of the database that contains this view. | `BIGINT` |
| `schema_name` | The SQL name of the schema where the view resides. | `VARCHAR` |
| `schema_oid` | Internal identifier of the schema object that contains the view. | `BIGINT` |
| `view_name` | The SQL name of the view object. | `VARCHAR` |
| `view_oid` | The internal identifier of this view object. | `BIGINT` |
| `comment` | A comment created by the [`COMMENT ON` statement](#docs:stable:sql:statements:comment_on). | `VARCHAR` |
| `tags` | A map of string key-value pairs. | `MAP(VARCHAR, VARCHAR)` |
| `internal` | `true` if this is an internal (built-in) view, `false` if this is a user-defined view. | `BOOLEAN` |
| `temporary` | `true` if this is a temporary view. Temporary views are not persistent and are only visible within the current connection. | `BOOLEAN` |
| `column_count` | The number of columns defined by this view object. | `BIGINT` |
| `sql` | The definition of this object, expressed as SQL DDL-statement. | `VARCHAR` |
The [`information_schema.tables`](#docs:stable:sql:meta:information_schema::tables-tables-and-views) system view provides a more standardized way to obtain metadata about database views that also includes base tables. But the resultset returned by `duckdb_views` also contains definitions of internal view objects, as well as a few columns that are not included in `information_schema.tables`.
## DuckDB's SQL Dialect {#sql:dialect}
### Overview {#docs:stable:sql:dialect:overview}
DuckDB's SQL dialect is based on PostgreSQL.
DuckDB tries to closely match PostgreSQL's semantics; however, some use cases require slightly different behavior.
For example, interchangeability with data frame libraries necessitates [order preservation of inserts](#docs:stable:sql:dialect:order_preservation) to be supported by default.
These differences are documented in the pages below.
### Indexing {#docs:stable:sql:dialect:indexing}
DuckDB uses 1-based indexing except for [JSON objects](#docs:stable:data:json:overview), which use 0-based indexing.
#### Examples {#docs:stable:sql:dialect:indexing::examples}
The index origin is 1 for strings, lists, etc.
```sql
SELECT list[1] AS element
FROM (SELECT ['first', 'second', 'third'] AS list);
```
```text
┌─────────┐
│ element │
│ varchar │
├─────────┤
│ first   │
└─────────┘
```
The index origin is 0 for JSON objects.
```sql
SELECT json[1] AS element
FROM (SELECT '["first", "second", "third"]'::JSON AS json);
```
```text
┌──────────┐
│ element  │
│   json   │
├──────────┤
│ "second" │
└──────────┘
```
### Friendly SQL {#docs:stable:sql:dialect:friendly_sql}
DuckDB offers several advanced SQL features and syntactic sugar to make SQL queries more concise. We refer to these colloquially as “friendly SQL”.
> Several of these features are also supported in other systems while some are (currently) exclusive to DuckDB.
#### Clauses {#docs:stable:sql:dialect:friendly_sql::clauses}
* Creating tables and inserting data:
* [`CREATE OR REPLACE TABLE`](#docs:stable:sql:statements:create_table::create-or-replace): avoid `DROP TABLE IF EXISTS` statements in scripts.
* [`CREATE TABLE ... AS SELECT` (CTAS)](#docs:stable:sql:statements:create_table::create-table--as-select-ctas): create a new table from the output of a query without manually defining a schema.
* [`INSERT INTO ... BY NAME`](#docs:stable:sql:statements:insert::insert-into--by-name): this variant of the `INSERT` statement allows using column names instead of positions.
* [`INSERT OR IGNORE INTO ...`](#docs:stable:sql:statements:insert::insert-or-ignore-into): insert the rows that do not result in a conflict due to `UNIQUE` or `PRIMARY KEY` constraints.
* [`INSERT OR REPLACE INTO ...`](#docs:stable:sql:statements:insert::insert-or-replace-into): insert the rows that do not result in a conflict due to `UNIQUE` or `PRIMARY KEY` constraints. For those that result in a conflict, replace the columns of the existing row with the values of the to-be-inserted row.
* Describing tables and computing statistics:
* [`DESCRIBE`](#docs:stable:guides:meta:describe): provides a succinct summary of the schema of a table or query.
* [`SUMMARIZE`](#docs:stable:guides:meta:summarize): returns summary statistics for a table or query.
* Making SQL clauses more compact and readable:
* [`FROM`-first syntax with an optional `SELECT` clause](#docs:stable:sql:query_syntax:from::from-first-syntax): DuckDB allows queries in the form of `FROM tbl` which selects all columns (performing a `SELECT *` statement).
* [`GROUP BY ALL`](#docs:stable:sql:query_syntax:groupby::group-by-all): omit the group-by columns by inferring them from the list of attributes in the `SELECT` clause.
* [`ORDER BY ALL`](#docs:stable:sql:query_syntax:orderby::order-by-all): shorthand to order on all columns (e.g., to ensure deterministic results).
* [`SELECT * EXCLUDE`](#docs:stable:sql:expressions:star::exclude-clause): the `EXCLUDE` option allows excluding specific columns from the `*` expression.
* [`SELECT * REPLACE`](#docs:stable:sql:expressions:star::replace-clause): the `REPLACE` option allows replacing specific columns with different expressions in a `*` expression.
* [`UNION BY NAME`](#docs:stable:sql:query_syntax:setops::union-all-by-name): perform the `UNION` operation along the names of columns (instead of relying on positions).
* [Prefix aliases in the `SELECT` and `FROM` clauses](#docs:stable:sql:query_syntax:select): write `x: 42` instead of `42 AS x` for improved readability.
* [Specifying a percentage of the table size for the `LIMIT` clause](#docs:stable:sql:query_syntax:limit): write `LIMIT 10%` to return 10% of the query results.
* Transforming tables:
* [`PIVOT`](#docs:stable:sql:statements:pivot) to turn long tables to wide tables.
* [`UNPIVOT`](#docs:stable:sql:statements:unpivot) to turn wide tables to long tables.
* Defining SQL-level variables:
* [`SET VARIABLE`](#docs:stable:sql:statements:set::set-variable)
* [`RESET VARIABLE`](#docs:stable:sql:statements:set::reset-variable)
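As a small sketch of how several of these clauses combine (the `people` table and its contents are hypothetical examples, not taken from the linked pages):
```sql
-- create (or overwrite) a table from a query without declaring a schema
CREATE OR REPLACE TABLE people AS
    SELECT * FROM (VALUES ('Alice', 42), ('Bob', 7)) t(name, age);

-- insert by column name rather than by position
INSERT INTO people BY NAME
    SELECT 39 AS age, 'Carol' AS name;

-- FROM-first syntax: equivalent to SELECT * FROM people
FROM people;
```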
#### Query Features {#docs:stable:sql:dialect:friendly_sql::query-features}
* [Column aliases in `WHERE`, `GROUP BY`, and `HAVING`](https://duckdb.org/2022/05/04/friendlier-sql#column-aliases-in-where--group-by--having). (Note that column aliases cannot be used in the `ON` clause of [`JOIN` clauses](#docs:stable:sql:query_syntax:from::joins).)
* [`COLUMNS()` expression](#docs:stable:sql:expressions:star::columns-expression) can be used to execute the same expression on multiple columns (see the sketch after this list):
* [with regular expressions](https://duckdb.org/2023/08/23/even-friendlier-sql#columns-with-regular-expressions)
* [with `EXCLUDE` and `REPLACE`](https://duckdb.org/2023/08/23/even-friendlier-sql#columns-with-exclude-and-replace)
* [with lambda functions](https://duckdb.org/2023/08/23/even-friendlier-sql#columns-with-lambda-functions)
* Reusable column aliases (also known as “lateral column aliases”), e.g.: `SELECT i + 1 AS j, j + 2 AS k FROM range(0, 3) t(i)`
* Advanced aggregation features for analytical (OLAP) queries:
* [`FILTER` clause](#docs:stable:sql:query_syntax:filter)
* [`GROUPING SETS`, `GROUP BY CUBE`, `GROUP BY ROLLUP` clauses](#docs:stable:sql:query_syntax:grouping_sets)
* [`count()` shorthand](#docs:stable:sql:functions:aggregates) for `count(*)`
* [`IN` operator for lists and maps](#docs:stable:sql:expressions:in)
* [Specifying column names for common table expressions (`WITH`)](#docs:stable:sql:query_syntax:with::basic-cte-examples)
* [Specifying column names in the `JOIN` clause](#docs:stable:sql:query_syntax:from::shorthands-in-the-join-clause)
* [Using `VALUES` in the `JOIN` clause](#docs:stable:sql:query_syntax:from::shorthands-in-the-join-clause)
* [Using `VALUES` in the anchor part of common table expressions](#docs:stable:sql:query_syntax:with::using-values)
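A small sketch of the `COLUMNS()` expression and reusable column aliases; the `measurements` table and its values are hypothetical:
```sql
CREATE TABLE measurements (id INTEGER, temperature DOUBLE, humidity DOUBLE);
INSERT INTO measurements VALUES (1, 20.5, 0.43), (2, 21.0, 0.47);

-- apply the same aggregate to every column except id
SELECT max(COLUMNS(* EXCLUDE (id))) FROM measurements;

-- reusable ("lateral") column aliases: delta is referenced within the same SELECT
SELECT temperature - 20.0 AS delta, delta * 2 AS double_delta FROM measurements;
```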
#### Literals and Identifiers {#docs:stable:sql:dialect:friendly_sql::literals-and-identifiers}
* [Case-insensitivity while maintaining case of entities in the catalog](#docs:stable:sql:dialect:keywords_and_identifiers::case-sensitivity-of-identifiers)
* [Deduplicating identifiers](#docs:stable:sql:dialect:keywords_and_identifiers::deduplicating-identifiers)
* [Underscores as digit separators in numeric literals](#docs:stable:sql:dialect:keywords_and_identifiers::numeric-literals)
#### Data Types {#docs:stable:sql:dialect:friendly_sql::data-types}
* [`MAP` data type](#docs:stable:sql:data_types:map)
* [`UNION` data type](#docs:stable:sql:data_types:union)
#### Data Import {#docs:stable:sql:dialect:friendly_sql::data-import}
* [Auto-detecting the headers and schema of CSV files](#docs:stable:data:csv:auto_detection)
* Directly querying [CSV files](#docs:stable:data:csv:overview) and [Parquet files](#docs:stable:data:parquet:overview)
* [Replacement scans](#docs:stable:guides:glossary):
* You can load from files using the syntax `FROM 'my.csv'`, `FROM 'my.csv.gz'`, `FROM 'my.parquet'`, etc.
* In Python, you can [access Pandas data frames using `FROM df`](#docs:stable:guides:python:export_pandas).
* [Filename expansion (globbing)](#docs:stable:sql:functions:pattern_matching::globbing), e.g.: `FROM 'my-data/part-*.parquet'`
#### Functions and Expressions {#docs:stable:sql:dialect:friendly_sql::functions-and-expressions}
* [Dot operator for function chaining](#docs:stable:sql:functions:overview::function-chaining-via-the-dot-operator): `SELECT ('hello').upper()`
* String formatters:
the [`format()` function with the `fmt` syntax](#docs:stable:sql:functions:text::fmt-syntax) and
the [`printf()` function](#docs:stable:sql:functions:text::printf-syntax)
* [List comprehensions](https://duckdb.org/2023/08/23/even-friendlier-sql#list-comprehensions)
* [List slicing](https://duckdb.org/2022/05/04/friendlier-sql#string-slicing) and indexing from the back (`[-1]`)
* [String slicing](https://duckdb.org/2022/05/04/friendlier-sql#string-slicing)
* [`STRUCT.*` notation](https://duckdb.org/2022/05/04/friendlier-sql#struct-dot-notation)
* [Creating `LIST` using square brackets](#docs:stable:sql:data_types:list::creating-lists)
* [Simple `LIST` and `STRUCT` creation](https://duckdb.org/2022/05/04/friendlier-sql#simple-list-and-struct-creation)
* [Updating the schema of `STRUCT`s](#docs:stable:sql:data_types:struct::updating-the-schema)
#### Join Types {#docs:stable:sql:dialect:friendly_sql::join-types}
* [`ASOF` joins](#docs:stable:sql:query_syntax:from::as-of-joins)
* [`LATERAL` joins](#docs:stable:sql:query_syntax:from::lateral-joins)
* [`POSITIONAL` joins](#docs:stable:sql:query_syntax:from::positional-joins)
#### Trailing Commas {#docs:stable:sql:dialect:friendly_sql::trailing-commas}
DuckDB allows [trailing commas](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Trailing_commas),
both when listing entities (e.g., column and table names) and when constructing [`LIST` items](#docs:stable:sql:data_types:list::creating-lists).
For example, the following query works:
```sql
SELECT
42 AS x,
['a', 'b', 'c',] AS y,
'hello world' AS z,
;
```
#### "Top-N in Group" Queries {#docs:stable:sql:dialect:friendly_sql::top-n-in-group-queries}
Computing the "top-N rows in a group" ordered by some criteria is a common task in SQL that unfortunately often requires a complex query involving window functions and/or subqueries.
To aid in this, DuckDB provides the aggregate functions [`max(arg, n)`](#docs:stable:sql:functions:aggregates::maxarg-n), [`min(arg, n)`](#docs:stable:sql:functions:aggregates::minarg-n), [`arg_max(arg, val, n)`](#docs:stable:sql:functions:aggregates::arg_maxarg-val-n), [`arg_min(arg, val, n)`](#docs:stable:sql:functions:aggregates::arg_minarg-val-n), [`max_by(arg, val, n)`](#docs:stable:sql:functions:aggregates::max_byarg-val-n) and [`min_by(arg, val, n)`](#docs:stable:sql:functions:aggregates::min_byarg-val-n) to efficiently return the "top" `n` rows in a group based on a specific column in either ascending or descending order.
For example, let's use the following table:
```sql
SELECT * FROM t1;
```
```text
┌─────────┬───────┐
│   grp   │  val  │
│ varchar │ int32 │
├─────────┼───────┤
│ a       │     2 │
│ a       │     1 │
│ b       │     5 │
│ b       │     4 │
│ a       │     3 │
│ b       │     6 │
└─────────┴───────┘
```
We want to get a list of the top-3 `val` values in each group `grp`. The conventional way to do this is to use a window function in a subquery:
```sql
SELECT array_agg(rs.val), rs.grp
FROM
(SELECT val, grp, row_number() OVER (PARTITION BY grp ORDER BY val DESC) AS rid
FROM t1 ORDER BY val DESC) AS rs
WHERE rid < 4
GROUP BY rs.grp;
```
```text
┌────────────────────┬─────────┐
│ array_agg(rs.val)  │   grp   │
│      int32[]       │ varchar │
├────────────────────┼─────────┤
│ [3, 2, 1]          │ a       │
│ [6, 5, 4]          │ b       │
└────────────────────┴─────────┘
```
But in DuckDB, we can do this much more concisely (and efficiently!):
```sql
SELECT max(val, 3) FROM t1 GROUP BY grp;
```
```text
┌─────────────┐
│ max(val, 3) │
│   int32[]   │
├─────────────┤
│ [3, 2, 1]   │
│ [6, 5, 4]   │
└─────────────┘
```
#### Related Blog Posts {#docs:stable:sql:dialect:friendly_sql::related-blog-posts}
* [“Friendlier SQL with DuckDB”](https://duckdb.org/2022/05/04/friendlier-sql) blog post
* [“Even Friendlier SQL with DuckDB”](https://duckdb.org/2023/08/23/even-friendlier-sql) blog post
* [“SQL Gymnastics: Bending SQL into Flexible New Shapes”](https://duckdb.org/2024/03/01/sql-gymnastics) blog post
### Keywords and Identifiers {#docs:stable:sql:dialect:keywords_and_identifiers}
#### Identifiers {#docs:stable:sql:dialect:keywords_and_identifiers::identifiers}
Similarly to other SQL dialects and programming languages, identifiers in DuckDB's SQL are subject to several rules.
* Unquoted identifiers need to conform to a number of rules:
* They must not be a reserved keyword (see [`duckdb_keywords()`](#docs:stable:sql:meta:duckdb_table_functions::duckdb_keywords)), e.g., `SELECT 123 AS SELECT` will fail.
* They must not start with a number or special character, e.g., `SELECT 123 AS 1col` is invalid.
* They cannot contain whitespaces (including tabs and newline characters).
* Identifiers can be quoted using double-quote characters (`"`). Quoted identifiers can use any keyword, whitespace or special character, e.g., `"SELECT"` and `" § 🦆 ¶ "` are valid identifiers.
* Double quotes can be escaped by repeating the quote character, e.g., to create an identifier named `IDENTIFIER "X"`, use `"IDENTIFIER ""X"""`, as shown in the example below.
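For example, both of the following queries use quoted identifiers and are valid:
```sql
-- a reserved keyword used as a quoted identifier
SELECT 'quack' AS "SELECT";
-- escaping a double quote inside an identifier
SELECT 42 AS "IDENTIFIER ""X""";
```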
##### Deduplicating Identifiers {#docs:stable:sql:dialect:keywords_and_identifiers::deduplicating-identifiers}
In some cases, duplicate identifiers can occur, e.g., column names may conflict when unnesting a nested data structure.
In these cases, DuckDB automatically deduplicates column names by renaming them according to the following rules:
* For a column named `⟨name⟩`, the first instance is not renamed.
* Subsequent instances are renamed to `⟨name⟩_⟨count⟩`, where `⟨count⟩` starts at 1.
For example:
```sql
SELECT *
FROM (SELECT unnest({'a': 42, 'b': {'a': 88, 'b': 99}}, recursive := true));
```
| a | a_1 | b |
|---:|----:|---:|
| 42 | 88 | 99 |
#### Database Names {#docs:stable:sql:dialect:keywords_and_identifiers::database-names}
Database names are subject to the rules for [identifiers](#::identifiers).
Additionally, it is best practice to avoid DuckDB's two internal [database schema names](#docs:stable:sql:meta:duckdb_table_functions::duckdb_databases), `system` and `temp`.
By default, persistent databases are named after their filename without the extension.
Therefore, the filenames `system.db` and `temp.db` (as well as `system.duckdb` and `temp.duckdb`) result in the database names `system` and `temp`, respectively.
If you need to attach to a database that has one of these names, use an alias, e.g.:
```sql
ATTACH 'temp.db' AS temp2;
USE temp2;
```
#### Rules for Case-Sensitivity {#docs:stable:sql:dialect:keywords_and_identifiers::rules-for-case-sensitivity}
##### Keywords and Function Names {#docs:stable:sql:dialect:keywords_and_identifiers::keywords-and-function-names}
SQL keywords and function names are case-insensitive in DuckDB.
For example, the following two queries are equivalent:
```sql
select COS(Pi()) as CosineOfPi;
SELECT cos(pi()) AS CosineOfPi;
```
| CosineOfPi |
|-----------:|
| -1.0 |
##### Case-Sensitivity of Identifiers {#docs:stable:sql:dialect:keywords_and_identifiers::case-sensitivity-of-identifiers}
Identifiers in DuckDB are always case-insensitive, similarly to PostgreSQL.
However, unlike PostgreSQL (and some other major SQL implementations), DuckDB also treats quoted identifiers as case-insensitive.
**Comparison of identifiers:**
Case-insensitivity is implemented using an ASCII-based comparison:
`col_A` and `col_a` are equal, but `col_á` is not equal to them.
```sql
SELECT col_A FROM (SELECT 'x' AS col_a); -- succeeds
SELECT col_á FROM (SELECT 'x' AS col_a); -- fails
```
**Preserving cases:**
While DuckDB treats identifiers in a case-insensitive manner, it preserves the case of these identifiers.
That is, each character's case (uppercase/lowercase) is maintained as originally specified by the user even if a query uses different cases when referring to the identifier.
For example:
```sql
CREATE TABLE tbl AS SELECT cos(pi()) AS CosineOfPi;
SELECT cosineofpi FROM tbl;
```
| CosineOfPi |
|-----------:|
| -1.0 |
To change this behavior, set the `preserve_identifier_case` [configuration option](#docs:stable:configuration:overview::configuration-reference) to `false`.
##### Case-Sensitivity of Keys in Nested Data Structures {#docs:stable:sql:dialect:keywords_and_identifiers::case-sensitivity-of-keys-in-nested-data-structures}
The keys of `MAP`s are case-sensitive:
```sql
SELECT MAP(['key1'], [1]) = MAP(['KEY1'], [1]) AS equal;
```
```text
false
```
The keys of `UNION`s and `STRUCT`s are case-insensitive:
```sql
SELECT {'key1': 1} = {'KEY1': 1} AS equal;
```
```text
true
```
```sql
SELECT union_value(key1 := 1) = union_value(KEY1 := 1) as equal;
```
```text
true
```
###### Handling Conflicts {#docs:stable:sql:dialect:keywords_and_identifiers::handling-conflicts}
In case of a conflict, when the same identifier is spelt with different cases, one will be selected randomly. For example:
```sql
CREATE TABLE t1 (idfield INTEGER, x INTEGER);
CREATE TABLE t2 (IdField INTEGER, y INTEGER);
INSERT INTO t1 VALUES (1, 123);
INSERT INTO t2 VALUES (1, 456);
SELECT * FROM t1 NATURAL JOIN t2;
```
| idfield | x | y |
|--------:|----:|----:|
| 1 | 123 | 456 |
###### Disabling Preserving Cases {#docs:stable:sql:dialect:keywords_and_identifiers::disabling-preserving-cases}
With the `preserve_identifier_case` [configuration option](#docs:stable:configuration:overview::configuration-reference) set to `false`, all identifiers are turned into lowercase:
```sql
SET preserve_identifier_case = false;
CREATE TABLE tbl AS SELECT cos(pi()) AS CosineOfPi;
SELECT CosineOfPi FROM tbl;
```
| cosineofpi |
|-----------:|
| -1.0 |
### Order Preservation {#docs:stable:sql:dialect:order_preservation}
For many operations, DuckDB preserves the order of rows, similarly to data frame libraries such as Pandas.
#### Example {#docs:stable:sql:dialect:order_preservation::example}
Take the following table for example:
```sql
CREATE TABLE tbl AS
SELECT *
FROM (VALUES (1, 'a'), (2, 'b'), (3, 'c')) t(x, y);
SELECT *
FROM tbl;
```
| x | y |
|--:|---|
| 1 | a |
| 2 | b |
| 3 | c |
Let's take the following query that returns the rows where `x` is an odd number:
```sql
SELECT *
FROM tbl
WHERE x % 2 == 1;
```
| x | y |
|--:|---|
| 1 | a |
| 3 | c |
Because the row `(1, 'a')` occurs before `(3, 'c')` in the original table, it is guaranteed to come before that row in this table too.
#### Clauses {#docs:stable:sql:dialect:order_preservation::clauses}
The following clauses guarantee that the original row order is preserved:
* `COPY` (see [Insertion Order](#::insertion-order))
* `FROM` with a single table
* `LIMIT`
* `OFFSET`
* `SELECT`
* `UNION ALL`
* `WHERE`
* Window functions with an empty `OVER` clause
* Common table expressions and table subqueries, as long as they only contain the aforementioned components
> **Tip.** `row_number() OVER ()` allows turning the original row order into an explicit column that can be referenced in the operations that don't preserve row order by default. On materialized tables, the `rowid` pseudo-column can be used to the same effect.
The following operations **do not** guarantee that the row order is preserved:
* `FROM` with multiple tables and/or subqueries
* `JOIN`
* `UNION`
* `USING SAMPLE`
* `GROUP BY` (in particular, the output order is undefined and the order in which rows are fed into [order-sensitive aggregate functions](https://duckdb.org/docs/sql/functions/aggregates.html#order-by-clause-in-aggregate-functions) is undefined unless explicitly specified in the aggregate function)
* `ORDER BY` (specifically, `ORDER BY` may not use a [stable algorithm](https://en.m.wikipedia.org/wiki/Stable_algorithm))
* Scalar subqueries
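As a minimal sketch of the tip above (the `events` table is a hypothetical example), the original row order can be turned into an explicit column and restored later:
```sql
CREATE TABLE events AS
    SELECT * FROM (VALUES ('a'), ('c'), ('b')) t(payload);

-- option 1: materialized tables expose the rowid pseudo-column
SELECT payload FROM events ORDER BY rowid;

-- option 2: capture the original order explicitly with a window function
SELECT payload
FROM (SELECT payload, row_number() OVER () AS original_order FROM events)
ORDER BY original_order;
```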
#### Insertion Order {#docs:stable:sql:dialect:order_preservation::insertion-order}
By default, the following components preserve insertion order:
* [CSV reader](#docs:stable:data:csv:overview::order-preservation) (`read_csv` function)
* [JSON reader](#docs:stable:data:json:overview::order-preservation) (`read_json` function)
* [Parquet reader](#docs:stable:data:parquet:overview::order-preservation) (`read_parquet` function)
Preservation of insertion order is controlled by the `preserve_insertion_order` [configuration option](#docs:stable:configuration:overview).
This setting is `true` by default, indicating that the order should be preserved.
To change this setting, use:
```sql
SET preserve_insertion_order = false;
```
### PostgreSQL Compatibility {#docs:stable:sql:dialect:postgresql_compatibility}
DuckDB's SQL dialect closely follows the conventions of the PostgreSQL dialect.
The few exceptions to this are listed on this page.
#### Floating-Point Arithmetic {#docs:stable:sql:dialect:postgresql_compatibility::floating-point-arithmetic}
DuckDB and PostgreSQL handle floating-point arithmetic differently for division by zero. DuckDB conforms to the [IEEE Standard for Floating-Point Arithmetic (IEEE 754)](https://en.wikipedia.org/wiki/IEEE_754) for both division by zero and operations involving infinity values. PostgreSQL returns an error for division by zero but aligns with IEEE 754 for handling infinity values. To show the differences, run the following SQL queries:
```sql
SELECT 1.0 / 0.0 AS x;
SELECT 0.0 / 0.0 AS x;
SELECT -1.0 / 0.0 AS x;
SELECT 'Infinity'::FLOAT / 'Infinity'::FLOAT AS x;
SELECT 1.0 / 'Infinity'::FLOAT AS x;
SELECT 'Infinity'::FLOAT - 'Infinity'::FLOAT AS x;
SELECT 'Infinity'::FLOAT - 1.0 AS x;
```
| Expression | PostgreSQL | DuckDB | IEEE 754 |
| :---------------------- | ---------: | --------: | --------: |
| 1.0 / 0.0 | error | Infinity | Infinity |
| 0.0 / 0.0 | error | NaN | NaN |
| -1.0 / 0.0 | error | -Infinity | -Infinity |
| 'Infinity' / 'Infinity' | NaN | NaN | NaN |
| 1.0 / 'Infinity' | 0.0 | 0.0 | 0.0 |
| 'Infinity' - 'Infinity' | NaN | NaN | NaN |
| 'Infinity' - 1.0 | Infinity | Infinity | Infinity |
#### Division on Integers {#docs:stable:sql:dialect:postgresql_compatibility::division-on-integers}
When computing division on integers, PostgreSQL performs integer division, while DuckDB performs float division:
```sql
SELECT 1 / 2 AS x;
```
PostgreSQL returns `0`, while DuckDB returns `0.5`.
To perform integer division in DuckDB, use the `//` operator:
```sql
SELECT 1 // 2 AS x;
```
This returns `0`.
#### `UNION` of Boolean and Integer Values {#docs:stable:sql:dialect:postgresql_compatibility::union-of-boolean-and-integer-values}
The following query fails in PostgreSQL but successfully completes in DuckDB:
```sql
SELECT true AS x
UNION
SELECT 2;
```
PostgreSQL returns an error:
```console
ERROR: UNION types boolean and integer cannot be matched
```
DuckDB performs an enforced cast, therefore, it completes the query and returns the following:
| x |
| ---: |
| 1 |
| 2 |
#### Implicit Casting on Equality Checks {#docs:stable:sql:dialect:postgresql_compatibility::implicit-casting-on-equality-checks}
DuckDB performs implicit casting on equality checks, e.g., converting strings to numeric and boolean values.
Therefore, there are several instances, where PostgreSQL throws an error while DuckDB successfully computes the result:
| Expression | PostgreSQL | DuckDB |
| :------------ | ---------- | ------ |
| '1.1' = 1 | error | true |
| '1.1' = 1.1 | true | true |
| 1 = 1.1 | false | false |
| true = 'true' | true | true |
| true = 1 | error | true |
| 'true' = 1 | error | error |
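These behaviors can be checked directly in DuckDB; the following query simply mirrors two rows of the table above:
```sql
SELECT '1.1' = 1 AS string_vs_numeric, true = 1 AS boolean_vs_integer;
-- both columns return true in DuckDB
```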
#### Case Sensitivity for Quoted Identifiers {#docs:stable:sql:dialect:postgresql_compatibility::case-sensitivity-for-quoted-identifiers}
PostgreSQL is case-insensitive. It achieves case-insensitivity by lowercasing unquoted identifiers within SQL, whereas quoting preserves the case. For example, the following commands create a table named `mytable` but then query for `MyTaBLe`, which does not exist because the quotes preserve the case:
```sql
CREATE TABLE MyTaBLe (x INTEGER);
SELECT * FROM "MyTaBLe";
```
```console
ERROR: relation "MyTaBLe" does not exist
```
PostgreSQL does not only treat quoted identifiers as case-sensitive; it treats all identifiers as case-sensitive. For example, this also does not work:
```sql
CREATE TABLE "PreservedCase" (x INTEGER);
SELECT * FROM PreservedCase;
```
```console
ERROR: relation "preservedcase" does not exist
```
Therefore, case-insensitivity in PostgreSQL only works if you never use quoted identifiers with different cases.
For DuckDB, this behavior was problematic when interfacing with other tools (e.g., Parquet, Pandas) that are case-sensitive by default, since all identifiers would be lowercased all the time.
Therefore, DuckDB achieves case insensitivity by making identifiers fully case insensitive throughout the system but [_preserving their case_](#docs:stable:sql:dialect:keywords_and_identifiers::rules-for-case-sensitivity).
In DuckDB, the scripts above complete successfully:
```sql
CREATE TABLE MyTaBLe (x INTEGER);
SELECT * FROM "MyTaBLe";
CREATE TABLE "PreservedCase" (x INTEGER);
SELECT * FROM PreservedCase;
SELECT tbl FROM duckdb_tables();
```
| tbl |
| ------------- |
| MyTaBLe |
| PreservedCase |
PostgreSQL's behavior of lowercasing identifiers is accessible using the [`preserve_identifier_case` option](#docs:stable:configuration:overview::local-configuration-options):
```sql
SET preserve_identifier_case = false;
CREATE TABLE MyTaBLe (x INTEGER);
SELECT tbl FROM duckdb_tables();
```
| tbl |
| ---------- |
| mytable |
However, the case insensitive matching in the system for identifiers cannot be turned off.
#### Using Double Equality Sign for Comparison {#docs:stable:sql:dialect:postgresql_compatibility::using-double-equality-sign-for-comparison}
DuckDB supports both `=` and `==` for equality comparison, while PostgreSQL only supports `=`.
```sql
SELECT 1 == 1 AS t;
```
DuckDB returns `true`, while PostgreSQL returns:
```console
postgres=# SELECT 1 == 1 AS t;
ERROR: operator does not exist: integer == integer
LINE 1: SELECT 1 == 1 AS t;
```
Note that the use of `==` is not encouraged due to its limited portability.
#### Vacuuming Tables {#docs:stable:sql:dialect:postgresql_compatibility::vacuuming-tables}
In PostgreSQL, the `VACUUM` statement garbage-collects and analyzes tables.
In DuckDB, the [`VACUUM` statement](#docs:stable:sql:statements:vacuum) is only used to rebuild statistics.
For instructions on reclaiming space, refer to the [“Reclaiming space” page](#docs:stable:operations_manual:footprint_of_duckdb:reclaiming_space).
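A minimal sketch; see the linked `VACUUM` statement page for the exact syntax and options:
```sql
VACUUM;          -- accepted for PostgreSQL compatibility; does not reclaim space
VACUUM ANALYZE;  -- rebuilds table statistics
```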
#### Strings {#docs:stable:sql:dialect:postgresql_compatibility::strings}
Since version 1.3.0, DuckDB escapes characters such as `'` in strings serialized in nested data structures.
PostgreSQL does not do this.
For an example, run:
```sql
SELECT ARRAY[''''];
```
PostgreSQL returns:
```text
{'}
```
DuckDB returns:
```text
['\'']
```
#### Functions {#docs:stable:sql:dialect:postgresql_compatibility::functions}
##### `regexp_extract` Function {#docs:stable:sql:dialect:postgresql_compatibility::regexp_extract-function}
Unlike PostgreSQL's `regexp_substr` function, DuckDB's `regexp_extract` returns empty strings instead of `NULL`s when there is no match.
##### `to_date` Function {#docs:stable:sql:dialect:postgresql_compatibility::to_date-function}
DuckDB does not support the [`to_date` PostgreSQL date formatting function](https://www.postgresql.org/docs/17/functions-formatting.html).
Instead, please use the [`strptime` function](#docs:stable:sql:functions:dateformat::strptime-examples).
##### `date_part` Function {#docs:stable:sql:dialect:postgresql_compatibility::date_part-function}
Most parts extracted by the [`date_part` function](#docs:stable:sql:functions:datepart) are returned as integers. Since there are no infinite integer values in DuckDB, `NULL`s are returned for infinite timestamps.
#### Resolution of Type Names in the Schema {#docs:stable:sql:dialect:postgresql_compatibility::resolution-of-type-names-in-the-schema}
For [`CREATE TABLE` statements](#docs:stable:sql:statements:create_table), DuckDB attempts to resolve type names in the schema where a table is created. For example:
```sql
CREATE SCHEMA myschema;
CREATE TYPE myschema.mytype AS ENUM ('as', 'df');
CREATE TABLE myschema.mytable (v mytype);
```
PostgreSQL returns an error on the last statement:
```console
ERROR: type "mytype" does not exist
LINE 1: CREATE TABLE myschema.mytable (v mytype);
```
DuckDB runs the statement and creates the table successfully, confirmed by the following query:
```sql
DESCRIBE myschema.mytable;
```
| column_name | column_type | null | key | default | extra |
| ----------- | ---------------- | ---- | ---- | ------- | ----- |
| v | ENUM('as', 'df') | YES | NULL | NULL | NULL |
#### Exploiting Functional Dependencies for `GROUP BY` {#docs:stable:sql:dialect:postgresql_compatibility::exploiting-functional-dependencies-for-group-by}
PostgreSQL can exploit functional dependencies, such as `i -> j` in the following query:
```sql
CREATE TABLE tbl (i INTEGER, j INTEGER, PRIMARY KEY (i));
SELECT j
FROM tbl
GROUP BY i;
```
PostgreSQL runs the query.
DuckDB fails:
```console
Binder Error:
column "j" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(j)" if the exact value of "j" is not important.
```
To work around this, add the other attributes or use the [`GROUP BY ALL` clause](https://duckdb.org/docs/sql/query_syntax/groupby#group-by-all).
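For example, reusing `tbl` from the snippet above, either of the following runs in DuckDB:
```sql
-- aggregate the functionally dependent column ...
SELECT any_value(j) AS j FROM tbl GROUP BY i;
-- ... or group by all non-aggregated columns
SELECT j FROM tbl GROUP BY ALL;
```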
#### Behavior of Regular Expression Match Operators {#docs:stable:sql:dialect:postgresql_compatibility::behavior-of-regular-expression-match-operators}
PostgreSQL supports the [POSIX regular expression matching operators](#docs:stable:sql:functions:pattern_matching) `~` (case-sensitive partial regex matching) and `~*` (case-insensitive partial regex matching) as well as their negated variants, `!~` and `!~*`, respectively.
In DuckDB, `~` is equivalent to [`regexp_full_match`](#docs:stable:sql:functions:text::regexp_full_matchstring-regex) and `!~` is equivalent to `NOT regexp_full_match`.
The operators `~*` and `!~*` are not supported.
The table below shows that these operators behave quite differently in PostgreSQL and DuckDB.
We therefore recommend avoiding the POSIX regular expression matching operators in DuckDB.
| Expression | PostgreSQL | DuckDB |
| :------------------ | ---------- | ------ |
| `'aaa' ~ '(a|b)'` | true | false |
| `'AAA' ~* '(a|b)'` | true | error |
| `'aaa' !~ '(a|b)'` | false | true |
| `'AAA' !~* '(a|b)'` | false | error |
### SQL Quirks {#docs:stable:sql:dialect:sql_quirks}
Like all programming languages and libraries, DuckDB has its share of idiosyncrasies and inconsistencies.
Some are vestiges of our feathered friend's evolution; others are inevitable because we strive to adhere to the [SQL Standard](https://blog.ansi.org/sql-standard-iso-iec-9075-2023-ansi-x3-135/) and specifically to PostgreSQL's dialect (see the [“PostgreSQL Compatibility”](#docs:stable:sql:dialect:postgresql_compatibility) page for exceptions).
The rest may simply come down to different preferences, or we may even agree on what _should_ be done but just haven't gotten around to it yet.
Acknowledging these quirks is the best we can do, which is why we have compiled below a list of examples.
#### Aggregating Empty Groups {#docs:stable:sql:dialect:sql_quirks::aggregating-empty-groups}
On empty groups, the aggregate functions `sum`, `list`, and `string_agg` all return `NULL` instead of `0`, `[]` and `''`, respectively. This is dictated by the SQL Standard and obeyed by all SQL implementations we know. This behavior is inherited by the list aggregate [`list_sum`](#docs:stable:sql:functions:list::list_-rewrite-functions), but not by the DuckDB original [`list_dot_product`](#docs:stable:sql:functions:list::list_dot_productlist1-list2) which returns `0` on empty lists.
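A minimal illustration of this behavior (the one-row subquery is a hypothetical input that the `WHERE false` filter reduces to zero rows):
```sql
-- the aggregates see zero rows, so each returns NULL
SELECT sum(i), list(i), string_agg(s, ',')
FROM (SELECT 42 AS i, 'a' AS s)
WHERE false;
```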
#### 0 vs. 1-Based Indexing {#docs:stable:sql:dialect:sql_quirks::0-vs-1-based-indexing}
To comply with standard SQL, one-based indexing is used almost everywhere, e.g., for array and string indexing and slicing, and for window functions (`row_number`, `rank`, `dense_rank`). However, similarly to PostgreSQL, [JSON features use zero-based indexing](#docs:stable:data:json:overview::indexing).
#### Types {#docs:stable:sql:dialect:sql_quirks::types}
##### `UINT8` vs. `INT8` {#docs:stable:sql:dialect:sql_quirks::uint8-vs-int8}
`UINT8` and `INT8` are aliases to integer types of different widths:
* `UINT8` corresponds to `UTINYINT` because it's an _8-bit_ unsigned integer
* `INT8` corresponds to `BIGINT` because it's an _8-byte_ signed integer
Explanation: the `n` in the numeric types `INTn` and `UINTn` denotes the width of the number in either bytes or bits.
`INT1`, `INT2`, `INT4` correspond to the number of bytes, while `INT16`, `INT32`, and `INT64` correspond to the number of bits.
The same applies to `UINT` values.
However, the value `n = 8` is a valid choice for both the number of bits and bytes.
For unsigned values, `UINT8` corresponds to `UTINYINT` (8 bits).
For signed values, `INT8` corresponds to `BIGINT` (8 bytes).
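This mapping can be checked with the `typeof` function:
```sql
SELECT typeof(1::UINT8) AS uint8_alias, typeof(1::INT8) AS int8_alias;
-- uint8_alias = UTINYINT, int8_alias = BIGINT
```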
#### Expressions {#docs:stable:sql:dialect:sql_quirks::expressions}
##### Results That May Surprise You {#docs:stable:sql:dialect:sql_quirks::results-that-may-surprise-you}
| Expression | Result | Note |
|----------------------------|---------|-------------------------------------------------------------------------------|
| `-2^2` | `4.0` | PostgreSQL compatibility means the unary minus has higher precedence than the exponentiation operator. Use additional parentheses, e.g., `-(2^2)` or the [`pow` function](#docs:stable:sql:functions:numeric::powx-y), e.g., `-pow(2, 2)`, to avoid mistakes. |
| `'t' = true` | `true` | Compatible with PostgreSQL. |
| `1 = '1'` | `true` | Compatible with PostgreSQL. |
| `1 = ' 1'` | `true` | Compatible with PostgreSQL. |
| `1 = '01'` | `true` | Compatible with PostgreSQL. |
| `1 = ' 01 '` | `true` | Compatible with PostgreSQL. |
| `1 = true` | `true` | Not compatible with PostgreSQL. |
| `1 = '1.1'` | `true` | Not compatible with PostgreSQL. |
| `1 IN (0, NULL)` | `NULL` | Makes sense if you think of the `NULL`s in the input and output as `UNKNOWN`. |
| `1 in [0, NULL]` | `false` | |
| `concat('abc', NULL)` | `abc` | Compatible with PostgreSQL. `list_concat` behaves similarly. |
| `'abc' || NULL` | `NULL` | |
##### `NaN` Values {#docs:stable:sql:dialect:sql_quirks::nan-values}
Both `'NaN'::FLOAT = 'NaN'::FLOAT` and `'NaN'::FLOAT > 3` evaluate to `true`. This violates IEEE 754, but it gives floating-point data types a total order, like all other data types (beware the consequences for `greatest` / `least`).
##### `age` Function {#docs:stable:sql:dialect:sql_quirks::age-function}
`age(x)` is `current_date - x` instead of `current_timestamp - x`. Another quirk inherited from PostgreSQL.
##### Extract Functions {#docs:stable:sql:dialect:sql_quirks::extract-functions}
`list_extract` / `map_extract` return `NULL` on non-existing keys. `struct_extract` throws an error because keys of structs are like columns.
#### Clauses {#docs:stable:sql:dialect:sql_quirks::clauses}
##### Automatic Column Deduplication in `SELECT` {#docs:stable:sql:dialect:sql_quirks::automatic-column-deduplication-in-select}
Column names are deduplicated with the first occurrence shadowing the others:
```sql
CREATE TABLE tbl AS SELECT 1 AS a;
SELECT a FROM (SELECT *, 2 AS a FROM tbl);
```
| a |
|--:|
| 1 |
##### Case Insensitivity for `SELECT`ing Columns {#docs:stable:sql:dialect:sql_quirks::case-insensitivity-for-selecting-columns}
Due to case-insensitivity, it's not possible to use `SELECT a FROM 'file.parquet'` when a column called `A` appears before the desired column `a` in `file.parquet`.
##### `USING SAMPLE` {#docs:stable:sql:dialect:sql_quirks::using-sample}
The `USING SAMPLE` clause is syntactically placed after the `WHERE` and `GROUP BY` clauses (same as the `LIMIT` clause) but is semantically applied before both (unlike the `LIMIT` clause).
## Samples {#docs:stable:sql:samples}
Samples are used to randomly select a subset of a dataset.
##### Examples {#docs:stable:sql:samples::examples}
Select a sample of exactly 5 rows from `tbl` using `reservoir` sampling:
```sql
SELECT *
FROM tbl
USING SAMPLE 5;
```
Select a sample of *approximately* 10% of the table using `system` sampling:
```sql
SELECT *
FROM tbl
USING SAMPLE 10%;
```
> **Warning.** By default, when you specify a percentage, each [*vector*](#docs:stable:internals:vector) is included in the sample with that probability. If your table contains fewer than ~10k rows, it makes sense to specify the `bernoulli` sampling option instead, which applies the probability to each row independently. Even then, you'll sometimes get more and sometimes less than the specified percentage of the number of rows, but it is much less likely that you get no rows at all. To get exactly 10% of rows (up to rounding), you must use the `reservoir` sampling option.
Select a sample of *approximately* 10% of the table using `bernoulli` sampling:
```sql
SELECT *
FROM tbl
USING SAMPLE 10 PERCENT (bernoulli);
```
Select a sample of *exactly* 10% (up to rounding) of the table using `reservoir` sampling:
```sql
SELECT *
FROM tbl
USING SAMPLE 10 PERCENT (reservoir);
```
Select a sample of *exactly* 50 rows of the table using reservoir sampling with a fixed seed (100):
```sql
SELECT *
FROM tbl
USING SAMPLE reservoir(50 ROWS)
REPEATABLE (100);
```
Select a sample of *approximately* 20% of the table using `system` sampling with a fixed seed (377):
```sql
SELECT *
FROM tbl
USING SAMPLE 20% (system, 377);
```
Select a sample of *approximately* 20% of `tbl` **before** the join with `tbl2`:
```sql
SELECT *
FROM tbl TABLESAMPLE reservoir(20%), tbl2
WHERE tbl.i = tbl2.i;
```
Select a sample of *approximately* 20% of `tbl` **after** the join with `tbl2`:
```sql
SELECT *
FROM tbl, tbl2
WHERE tbl.i = tbl2.i
USING SAMPLE reservoir(20%);
```
##### Syntax {#docs:stable:sql:samples::syntax}
Samples allow you to randomly extract a subset of a dataset. Samples are useful for exploring a dataset faster, as often you might not be interested in the exact answers to queries, but only in rough indications of what the data looks like and what is in the data. Samples allow you to get approximate answers to queries faster, as they reduce the amount of data that needs to pass through the query engine.
DuckDB supports three different types of sampling methods: `reservoir`, `bernoulli` and `system`. By default, DuckDB uses `reservoir` sampling when an exact number of rows is sampled, and `system` sampling when a percentage is specified. The sampling methods are described in detail below.
Samples require a *sample size*, which is an indication of how many elements will be sampled from the total population. Samples can either be given as a percentage (`10%` or `10 PERCENT`) or as a fixed number of rows (`10` or `10 ROWS`). All three sampling methods support sampling over a percentage, but **only** reservoir sampling supports sampling a fixed number of rows.
Samples are probabilistic, that is to say, samples can differ between runs *unless* the seed is explicitly specified. Specifying the seed *only* guarantees that the sample is the same if multi-threading is not enabled (i.e., `SET threads = 1`). When multiple threads run over a sample, samples are not necessarily consistent even with a fixed seed.
##### `reservoir` {#docs:stable:sql:samples::reservoir}
Reservoir sampling is a stream sampling technique that selects a random sample by keeping a *reservoir* of size equal to the sample size, and randomly replacing elements as more elements come in. Reservoir sampling allows us to specify *exactly* how many elements we want in the resulting sample (by selecting the size of the reservoir). As a result, reservoir sampling *always* outputs the same number of elements, unlike system and bernoulli sampling.
Reservoir sampling is only recommended for small sample sizes, and is not recommended for use with percentages. That is because reservoir sampling needs to materialize the entire sample and randomly replace tuples within the materialized sample. The larger the sample size, the higher the performance hit incurred by this process.
Reservoir sampling also incurs an additional performance penalty when multi-processing is used, since the reservoir is to be shared amongst the different threads to ensure unbiased sampling. This is not a big problem when the reservoir is very small, but becomes costly when the sample is large.
> **Best practice.** Avoid using reservoir sampling with large sample sizes if possible.
> Reservoir sampling requires the entire sample to be materialized in memory.
##### `bernoulli` {#docs:stable:sql:samples::bernoulli}
Bernoulli sampling can only be used when a sampling percentage is specified. It is rather straightforward: every row in the underlying table is included with a chance equal to the specified percentage. As a result, bernoulli sampling can return a different number of tuples even if the same percentage is specified. The *expected* number of rows is equal to the specified percentage of the table, but there will be some *variance*.
Because bernoulli sampling is completely independent (there is no shared state), there is no penalty for using bernoulli sampling together with multiple threads.
##### `system` {#docs:stable:sql:samples::system}
System sampling is a variant of bernoulli sampling with one crucial difference: every *vector* is included with a chance equal to the sampling percentage. This is a form of cluster sampling. System sampling is more efficient than bernoulli sampling, as no per-tuple selections have to be performed.
The *expected* number of rows is still equal to the specified percentage of the table, but the *variance* is `vectorSize` times higher. As such, system sampling is not suitable for datasets with fewer than ~10k rows, where it can happen that all rows will be filtered out, or all the data will be included, even when you ask for `50 PERCENT`.
#### Table Samples {#docs:stable:sql:samples::table-samples}
The `TABLESAMPLE` and `USING SAMPLE` clauses are identical in terms of syntax and effect, with one important difference: tablesamples sample directly from the table for which they are specified, whereas the `USING SAMPLE` clause samples after the entire `FROM` clause has been resolved. This is relevant when there are joins present in the query plan.
The `TABLESAMPLE` clause is essentially equivalent to creating a subquery with the `USING SAMPLE` clause, i.e., the following two queries are identical:
Sample 20% of `tbl` **before** the join:
```sql
SELECT *
FROM
tbl TABLESAMPLE reservoir(20%),
tbl2
WHERE tbl.i = tbl2.i;
```
Sample 20% of `tbl` **before** the join:
```sql
SELECT *
FROM
(SELECT * FROM tbl USING SAMPLE reservoir(20%)) tbl,
tbl2
WHERE tbl.i = tbl2.i;
```
Sample 20% **after** the join (i.e., sample 20% of the join result):
```sql
SELECT *
FROM tbl, tbl2
WHERE tbl.i = tbl2.i
USING SAMPLE reservoir(20%);
```
# Configuration {#configuration}
## Configuration {#docs:stable:configuration:overview}
DuckDB has a number of configuration options that can be used to change the behavior of the system.
The configuration options can be set using either the [`SET` statement](#docs:stable:sql:statements:set) or the [`PRAGMA` statement](#docs:stable:configuration:pragmas).
They can be reset to their original values using the [`RESET` statement](#docs:stable:sql:statements:set::reset).
The values of configuration options can be queried via the [`current_setting()` scalar function](#docs:stable:sql:functions:utility) or using the [`duckdb_settings()` table function](#docs:stable:sql:meta:duckdb_table_functions::duckdb_settings). For example:
```sql
SELECT current_setting('memory_limit') AS memlimit;
```
Or:
```sql
SELECT value AS memlimit
FROM duckdb_settings()
WHERE name = 'memory_limit';
```
#### Examples {#docs:stable:configuration:overview::examples}
Set the memory limit of the system to 10 GB.
```sql
SET memory_limit = '10GB';
```
Configure the system to use 1 thread.
```sql
SET threads TO 1;
```
Enable printing of a progress bar during long-running queries.
```sql
SET enable_progress_bar = true;
```
Set the default null order to `NULLS LAST`.
```sql
SET default_null_order = 'nulls_last';
```
Return the current value of a specific setting.
```sql
SELECT current_setting('threads') AS threads;
```
| threads |
|--------:|
| 10 |
Query a specific setting.
```sql
SELECT *
FROM duckdb_settings()
WHERE name = 'threads';
```
| name | value | description | input_type | scope |
|---------|-------|-------------------------------------------------|------------|--------|
| threads | 1 | The number of total threads used by the system. | BIGINT | GLOBAL |
Show a list of all available settings.
```sql
SELECT *
FROM duckdb_settings();
```
Reset the memory limit of the system back to the default.
```sql
RESET memory_limit;
```
#### Secrets Manager {#docs:stable:configuration:overview::secrets-manager}
DuckDB has a [Secrets manager](#docs:stable:sql:statements:create_secret), which provides a unified user interface for secrets across all backends (e.g., AWS S3) that use them.
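For example, a sketch of creating an S3 secret; the secret name and all values below are placeholders:
```sql
CREATE SECRET my_s3_secret (
    TYPE s3,
    KEY_ID 'placeholder_key_id',
    SECRET 'placeholder_secret_key',
    REGION 'us-east-1'
);
```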
#### Configuration Reference {#docs:stable:configuration:overview::configuration-reference}
Configuration options come with different default [scopes](#docs:stable:sql:statements:set::scopes): `GLOBAL` and `LOCAL`. The scope can also be stated explicitly in the `SET` statement (see the sketch below). The tables below list all available configuration options by scope.
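A minimal sketch of explicitly scoped `SET` statements (the option names are just examples):
```sql
-- explicitly set an option in the global scope
SET GLOBAL threads = 4;
-- explicitly set an option in the local (per-session) scope
SET SESSION default_collation = 'nocase';
```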
##### Global Configuration Options {#docs:stable:configuration:overview::global-configuration-options}
| Name | Description | Type | Default value |
|----|--------|--|---|
| `Calendar` | The current calendar | `VARCHAR` | System (locale) calendar |
| `TimeZone` | The current time zone | `VARCHAR` | System (locale) timezone |
| `access_mode` | Access mode of the database (`AUTOMATIC`, `READ_ONLY` or `READ_WRITE`) | `VARCHAR` | `automatic` |
| `allocator_background_threads` | Whether to enable the allocator background thread. | `BOOLEAN` | `false` |
| `allocator_bulk_deallocation_flush_threshold` | If a bulk deallocation larger than this occurs, flush outstanding allocations. | `VARCHAR` | `512.0 MiB` |
| `allocator_flush_threshold` | Peak allocation threshold at which to flush the allocator after completing a task. | `VARCHAR` | `128.0 MiB` |
| `allow_community_extensions` | Allow to load community built extensions | `BOOLEAN` | `true` |
| `allow_extensions_metadata_mismatch` | Allow to load extensions with not compatible metadata | `BOOLEAN` | `false` |
| `allow_persistent_secrets` | Allow the creation of persistent secrets, that are stored and loaded on restarts | `BOOLEAN` | `true` |
| `allow_unredacted_secrets` | Allow printing unredacted secrets | `BOOLEAN` | `false` |
| `allow_unsigned_extensions` | Allow to load extensions with invalid or missing signatures | `BOOLEAN` | `false` |
| `allowed_directories` | List of directories/prefixes that are ALWAYS allowed to be queried - even when enable_external_access is false | `VARCHAR[]` | `[]` |
| `allowed_paths` | List of files that are ALWAYS allowed to be queried - even when enable_external_access is false | `VARCHAR[]` | `[]` |
| `arrow_large_buffer_size` | Whether Arrow buffers for strings, blobs, uuids and bits should be exported using large buffers | `BOOLEAN` | `false` |
| `arrow_lossless_conversion` | Whenever a DuckDB type does not have a clear native or canonical extension match in Arrow, export the types with a duckdb.type_name extension name. | `BOOLEAN` | `false` |
| `arrow_output_list_view` | Whether export to Arrow format should use ListView as the physical layout for LIST columns | `BOOLEAN` | `false` |
| `arrow_output_version` | Whether strings should be produced by DuckDB in Utf8View format instead of Utf8 | `VARCHAR` | `1.0` |
| `asof_loop_join_threshold` | The maximum number of rows we need on the left side of an ASOF join to use a nested loop join | `UBIGINT` | `64` |
| `auto_fallback_to_full_download` | Allows automatically falling back to full file downloads when possible. | `BOOLEAN` | `true` |
| `autoinstall_extension_repository` | Overrides the custom endpoint for extension installation on autoloading | `VARCHAR` | |
| `autoinstall_known_extensions` | Whether known extensions are allowed to be automatically installed when a query depends on them | `BOOLEAN` | `true` |
| `autoload_known_extensions` | Whether known extensions are allowed to be automatically loaded when a query depends on them | `BOOLEAN` | `true` |
| `binary_as_string` | In Parquet files, interpret binary data as a string. | `BOOLEAN` | `false` |
| `ca_cert_file` | Path to a custom certificate file for self-signed certificates. | `VARCHAR` | |
| `catalog_error_max_schemas` | The maximum number of schemas the system will scan for "did you mean..." style errors in the catalog | `UBIGINT` | `100` |
| `checkpoint_threshold`, `wal_autocheckpoint` | The WAL size threshold at which to automatically trigger a checkpoint (e.g., 1GB) | `VARCHAR` | `16.0 MiB` |
| `custom_extension_repository` | Overrides the custom endpoint for remote extension installation | `VARCHAR` | |
| `custom_user_agent` | Metadata from DuckDB callers | `VARCHAR` | |
| `default_block_size` | The default block size for new duckdb database files (new as-in, they do not yet exist). | `UBIGINT` | `262144` |
| `default_collation` | The collation setting used when none is specified | `VARCHAR` | |
| `default_null_order`, `null_order` | NULL ordering used when none is specified (`NULLS_FIRST` or `NULLS_LAST`) | `VARCHAR` | `NULLS_LAST` |
| `default_order` | The order type used when none is specified (`ASC` or `DESC`) | `VARCHAR` | `ASCENDING` |
| `default_secret_storage` | Allows switching the default storage for secrets | `VARCHAR` | `local_file` |
| `disable_database_invalidation` | Disables invalidating the database instance when encountering a fatal error. Should be used with great care, as DuckDB cannot guarantee correct behavior after a fatal error. | `BOOLEAN` | `false` |
| `disable_parquet_prefetching` | Disable the prefetching mechanism in Parquet | `BOOLEAN` | `false` |
| `disable_timestamptz_casts` | Disable casting from timestamp to timestamptz | `BOOLEAN` | `false` |
| `disabled_compression_methods` | Disable a specific set of compression methods (comma separated) | `VARCHAR` | |
| `disabled_filesystems` | Disable specific file systems preventing access (e.g., LocalFileSystem) | `VARCHAR` | |
| `disabled_log_types` | Sets the list of disabled loggers | `VARCHAR` | |
| `duckdb_api` | DuckDB API surface | `VARCHAR` | `cli` |
| `dynamic_or_filter_threshold` | The maximum amount of OR filters we generate dynamically from a hash join | `UBIGINT` | `50` |
| `enable_curl_server_cert_verification` | Enable server side certificate verification for CURL backend. | `BOOLEAN` | `true` |
| `enable_external_access` | Allow the database to access external state (through e.g., loading/installing modules, COPY TO/FROM, CSV readers, pandas replacement scans, etc) | `BOOLEAN` | `true` |
| `enable_external_file_cache` | Allow the database to cache external files (e.g., Parquet) in memory. | `BOOLEAN` | `true` |
| `enable_fsst_vectors` | Allow scans on FSST compressed segments to emit compressed vectors to utilize late decompression | `BOOLEAN` | `false` |
| `enable_geoparquet_conversion` | Attempt to decode/encode geometry data in/as GeoParquet files if the spatial extension is present. | `BOOLEAN` | `true` |
| `enable_http_metadata_cache` | Whether or not the global http metadata is used to cache HTTP metadata | `BOOLEAN` | `false` |
| `enable_logging` | Enables the logger | `BOOLEAN` | `0` |
| `enable_macro_dependencies` | Enable created MACROs to create dependencies on the referenced objects (such as tables) | `BOOLEAN` | `false` |
| `enable_object_cache` | [PLACEHOLDER] Legacy setting - does nothing | `BOOLEAN` | `false` |
| `enable_server_cert_verification` | Enable server side certificate verification. | `BOOLEAN` | `false` |
| `enable_view_dependencies` | Enable created VIEWs to create dependencies on the referenced objects (such as tables) | `BOOLEAN` | `false` |
| `enabled_log_types` | Sets the list of enabled loggers | `VARCHAR` | |
| `experimental_metadata_reuse` | EXPERIMENTAL: Re-use row group and table metadata when checkpointing. | `BOOLEAN` | `false` |
| `extension_directory` | Set the directory to store extensions in | `VARCHAR` | |
| `external_threads` | The number of external threads that work on DuckDB tasks. | `UBIGINT` | `1` |
| `force_download` | Forces upfront download of file | `BOOLEAN` | `false` |
| `http_keep_alive` | Keep alive connections. Setting this to false can help when running into connection failures | `BOOLEAN` | `true` |
| `http_proxy_password` | Password for HTTP proxy | `VARCHAR` | |
| `http_proxy_username` | Username for HTTP proxy | `VARCHAR` | |
| `http_proxy` | HTTP proxy host | `VARCHAR` | |
| `http_retries` | HTTP retries on I/O error | `UBIGINT` | `3` |
| `http_retry_backoff` | Backoff factor for exponentially increasing retry wait time | `FLOAT` | `4` |
| `http_retry_wait_ms` | Time between retries | `UBIGINT` | `100` |
| `http_timeout` | HTTP timeout read/write/connection/retry (in seconds) | `UBIGINT` | `30` |
| `httpfs_client_implementation` | Select which is the HTTPUtil implementation to be used | `VARCHAR` | `default` |
| `ieee_floating_point_ops` | Use IEEE 754-compliant floating point operations (returning NAN instead of errors/NULL). | `BOOLEAN` | `true` |
| `immediate_transaction_mode` | Whether transactions should be started lazily when needed, or immediately when BEGIN TRANSACTION is called | `BOOLEAN` | `false` |
| `index_scan_max_count` | The maximum index scan count sets a threshold for index scans. If fewer than MAX(index_scan_max_count, index_scan_percentage * total_row_count) rows match, we perform an index scan instead of a table scan. | `UBIGINT` | `2048` |
| `index_scan_percentage` | The index scan percentage sets a threshold for index scans. If fewer than MAX(index_scan_max_count, index_scan_percentage * total_row_count) rows match, we perform an index scan instead of a table scan. | `DOUBLE` | `0.001` |
| `integer_division` | Whether or not the / operator defaults to integer division, or to floating point division | `BOOLEAN` | `false` |
| `late_materialization_max_rows` | The maximum amount of rows in the LIMIT/SAMPLE for which we trigger late materialization | `UBIGINT` | `50` |
| `lock_configuration` | Whether or not the configuration can be altered | `BOOLEAN` | `false` |
| `logging_level` | The log level which will be recorded in the log | `VARCHAR` | `INFO` |
| `logging_mode` | Determines which types of log messages are logged | `VARCHAR` | `LEVEL_ONLY` |
| `logging_storage` | Set the logging storage (memory/stdout/file/) | `VARCHAR` | `memory` |
| `max_memory`, `memory_limit` | The maximum memory of the system (e.g., 1GB) | `VARCHAR` | 80% of RAM |
| `max_temp_directory_size` | The maximum amount of data stored inside the 'temp_directory' (when set) (e.g., 1GB) | `VARCHAR` | `90% of available disk space` |
| `max_vacuum_tasks` | The maximum vacuum tasks to schedule during a checkpoint. | `UBIGINT` | `100` |
| `merge_join_threshold` | The maximum number of rows on either table to choose a merge join | `UBIGINT` | `1000` |
| `nested_loop_join_threshold` | The maximum number of rows on either table to choose a nested loop join | `UBIGINT` | `5` |
| `old_implicit_casting` | Allow implicit casting to/from VARCHAR | `BOOLEAN` | `false` |
| `order_by_non_integer_literal` | Allow ordering by non-integer literals - ordering by such literals has no effect. | `BOOLEAN` | `false` |
| `ordered_aggregate_threshold` | The number of rows to accumulate before sorting, used for tuning | `UBIGINT` | `262144` |
| `parquet_metadata_cache` | Cache Parquet metadata - useful when reading the same files multiple times | `BOOLEAN` | `false` |
| `partitioned_write_flush_threshold` | The threshold in number of rows after which we flush a thread state when writing using `PARTITION_BY` | `UBIGINT` | `524288` |
| `partitioned_write_max_open_files` | The maximum amount of files the system can keep open before flushing to disk when writing using `PARTITION_BY` | `UBIGINT` | `100` |
| `password` | The password to use. Ignored for legacy compatibility. | `VARCHAR` | NULL |
| `perfect_ht_threshold` | Threshold in bytes for when to use a perfect hash table | `UBIGINT` | `12` |
| `pin_threads` | Whether to pin threads to cores (Linux only, default AUTO: on when there are more than 64 cores) | `VARCHAR` | `auto` |
| `pivot_filter_threshold` | The threshold to switch from using filtered aggregates to LIST with a dedicated pivot operator | `UBIGINT` | `20` |
| `pivot_limit` | The maximum number of pivot columns in a pivot statement | `UBIGINT` | `100000` |
| `prefer_range_joins` | Force use of range joins with mixed predicates | `BOOLEAN` | `false` |
| `prefetch_all_parquet_files` | Use the prefetching mechanism for all types of parquet files | `BOOLEAN` | `false` |
| `preserve_identifier_case` | Whether or not to preserve the identifier case, instead of always lowercasing all non-quoted identifiers | `BOOLEAN` | `true` |
| `preserve_insertion_order` | Whether or not to preserve insertion order. If set to false the system is allowed to re-order any results that do not contain ORDER BY clauses. | `BOOLEAN` | `true` |
| `produce_arrow_string_view` | Whether Arrow strings should be produced by DuckDB in Utf8View format instead of Utf8 | `BOOLEAN` | `false` |
| `s3_access_key_id` | S3 Access Key ID | `VARCHAR` | NULL |
| `s3_endpoint` | S3 Endpoint | `VARCHAR` | NULL |
| `s3_kms_key_id` | S3 KMS Key ID | `VARCHAR` | NULL |
| `s3_region` | S3 Region | `VARCHAR` | NULL |
| `s3_requester_pays` | S3 use requester pays mode | `BOOLEAN` | `false` |
| `s3_secret_access_key` | S3 Access Key | `VARCHAR` | NULL |
| `s3_session_token` | S3 Session Token | `VARCHAR` | NULL |
| `s3_uploader_max_filesize` | S3 Uploader max filesize (between 50GB and 5TB) | `VARCHAR` | `800GB` |
| `s3_uploader_max_parts_per_file` | S3 Uploader max parts per file (between 1 and 10000) | `UBIGINT` | `10000` |
| `s3_uploader_thread_limit` | S3 Uploader global thread limit | `UBIGINT` | `50` |
| `s3_url_compatibility_mode` | Disable Globs and Query Parameters on S3 URLs | `BOOLEAN` | `false` |
| `s3_url_style` | S3 URL style | `VARCHAR` | `vhost` |
| `s3_use_ssl` | S3 use SSL | `BOOLEAN` | `true` |
| `scalar_subquery_error_on_multiple_rows` | When a scalar subquery returns multiple rows - return a random row instead of returning an error. | `BOOLEAN` | `true` |
| `scheduler_process_partial` | Partially process tasks before rescheduling - allows for more scheduler fairness between separate queries | `BOOLEAN` | `false` |
| `secret_directory` | Set the directory to which persistent secrets are stored | `VARCHAR` | `~/.duckdb/stored_secrets` |
| `storage_compatibility_version` | Serialize on checkpoint with compatibility for a given duckdb version | `VARCHAR` | `v0.10.2` |
| `temp_directory` | Set the directory to which to write temp files | `VARCHAR` | `⟨database_name⟩.tmp` or `.tmp` (in in-memory mode) |
| `temp_file_encryption` | Encrypt all temporary files if database is encrypted | `BOOLEAN` | `false` |
| `threads`, `worker_threads` | The number of total threads used by the system. | `BIGINT` | # CPU cores |
| `unsafe_disable_etag_checks` | Disable checks on ETag consistency | `BOOLEAN` | `false` |
| `user`, `username` | The username to use. Ignored for legacy compatibility. | `VARCHAR` | NULL |
| `variant_legacy_encoding` | Enables the Parquet reader to identify a Variant structurally. | `BOOLEAN` | `false` |
| `zstd_min_string_length` | The (average) length at which to enable ZSTD compression, defaults to 4096 | `UBIGINT` | `4096` |
##### Local Configuration Options {#docs:stable:configuration:overview::local-configuration-options}
| Name | Description | Type | Default value |
|----|--------|--|---|
| `custom_profiling_settings` | Accepts a `JSON` enabling custom metrics | `VARCHAR` | `{"TOTAL_BYTES_WRITTEN": "true", "TOTAL_BYTES_READ": "true", "ROWS_RETURNED": "true", "LATENCY": "true", "RESULT_SET_SIZE": "true", "OPERATOR_TIMING": "true", "OPERATOR_ROWS_SCANNED": "true", "CUMULATIVE_ROWS_SCANNED": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_TYPE": "true", "OPERATOR_NAME": "true", "CPU_TIME": "true", "EXTRA_INFO": "true", "SYSTEM_PEAK_BUFFER_MEMORY": "true", "BLOCKED_THREAD_TIME": "true", "CUMULATIVE_CARDINALITY": "true", "SYSTEM_PEAK_TEMP_DIR_SIZE": "true", "QUERY_NAME": "true"}` |
| `enable_http_logging` | (deprecated) Enables HTTP logging | `BOOLEAN` | `true` |
| `enable_profiling` | Enables profiling and sets the output format (`JSON`, `QUERY_TREE`, `QUERY_TREE_OPTIMIZER`) | `VARCHAR` | NULL |
| `enable_progress_bar_print` | Controls the printing of the progress bar when `enable_progress_bar` is true | `BOOLEAN` | `true` |
| `enable_progress_bar` | Enables the progress bar, printing progress to the terminal for long queries | `BOOLEAN` | `true` |
| `errors_as_json` | Output error messages as structured `JSON` instead of as a raw string | `BOOLEAN` | `false` |
| `explain_output` | Output of EXPLAIN statements (`ALL`, `OPTIMIZED_ONLY`, `PHYSICAL_ONLY`) | `VARCHAR` | `physical_only` |
| `file_search_path` | A comma separated list of directories to search for input files | `VARCHAR` | |
| `home_directory` | Sets the home directory used by the system | `VARCHAR` | |
| `http_logging_output` | (deprecated) The file to which HTTP logging output should be saved, or empty to print to the terminal | `VARCHAR` | |
| `lambda_syntax` | Configures the use of the deprecated single arrow operator (->) for lambda functions. | `VARCHAR` | `DEFAULT` |
| `log_query_path` | Specifies the path to which queries should be logged (default: NULL, queries are not logged) | `VARCHAR` | NULL |
| `max_expression_depth` | The maximum expression depth limit in the parser. WARNING: increasing this setting and using very deep expressions might lead to stack overflow errors. | `UBIGINT` | `1000` |
| `profile_output`, `profiling_output` | The file to which profile output should be saved, or empty to print to the terminal | `VARCHAR` | |
| `profiling_coverage` | The profiling coverage (`SELECT` or `ALL`) | `VARCHAR` | `SELECT` |
| `profiling_mode` | The profiling mode (`STANDARD` or `DETAILED`) | `VARCHAR` | NULL |
| `progress_bar_time` | Sets the time (in milliseconds) a query needs to take before we start printing a progress bar | `BIGINT` | `2000` |
| `schema` | Sets the default search schema. Equivalent to setting `search_path` to a single value. | `VARCHAR` | `main` |
| `search_path` | Sets the default catalog search path as a comma-separated list of values | `VARCHAR` | |
| `streaming_buffer_size` | The maximum amount of memory to buffer when fetching from a streaming result (e.g., 1GB) | `VARCHAR` | `976.5 KiB` |
## Pragmas {#docs:stable:configuration:pragmas}
The `PRAGMA` statement is a SQL extension adopted by DuckDB from SQLite. `PRAGMA` statements can be issued in a similar manner to regular SQL statements. `PRAGMA` commands may alter the internal state of the database engine, and can influence the subsequent execution or behavior of the engine.
`PRAGMA` statements that assign a value to an option can also be issued using the [`SET` statement](#docs:stable:sql:statements:set) and the value of an option can be retrieved using `SELECT current_setting(option_name)`.
For DuckDB's built in configuration options, see the [Configuration Reference](#docs:stable:configuration:overview::configuration-reference).
DuckDB [extensions](#docs:stable:extensions:overview) may register additional configuration options.
These are documented in the respective extensions' documentation pages.
This page contains the supported `PRAGMA` settings.
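For example, the following minimal sketch shows both syntaxes and how to read back the value of an option, using the built-in `memory_limit` option:
```sql
-- Setting an option via PRAGMA or SET is equivalent:
PRAGMA memory_limit = '1GB';
SET memory_limit = '1GB';

-- Reading the current value of an option:
SELECT current_setting('memory_limit');
```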
#### Metadata {#docs:stable:configuration:pragmas::metadata}
###### Schema Information {#docs:stable:configuration:pragmas::schema-information}
List all databases:
```sql
PRAGMA database_list;
```
List all tables:
```sql
PRAGMA show_tables;
```
List all tables, with extra information, similarly to [`DESCRIBE`](#docs:stable:guides:meta:describe):
```sql
PRAGMA show_tables_expanded;
```
To list all functions:
```sql
PRAGMA functions;
```
For queries targeting non-existent schemas, DuckDB generates “did you mean...” style error messages.
When there are thousands of attached databases, these errors can take a long time to generate.
To limit the number of schemas DuckDB looks through, use the `catalog_error_max_schemas` option:
```sql
SET catalog_error_max_schemas = 10;
```
###### Table Information {#docs:stable:configuration:pragmas::table-information}
Get info for a specific table:
```sql
PRAGMA table_info('table_name');
CALL pragma_table_info('table_name');
```
`table_info` returns information about the columns of the table with name `table_name`. The exact format of the table returned is given below:
```sql
cid INTEGER, -- cid of the column
name VARCHAR, -- name of the column
type VARCHAR, -- type of the column
notnull BOOLEAN, -- if the column is marked as NOT NULL
dflt_value VARCHAR, -- default value of the column, or NULL if not specified
pk BOOLEAN -- part of the primary key or not
```
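As a minimal illustration, using a hypothetical table `t1` (the table name and columns are examples only):
```sql
CREATE TABLE t1 (id INTEGER PRIMARY KEY, name VARCHAR DEFAULT 'unknown');

-- Both forms return one row per column with the cid, name, type,
-- notnull, dflt_value, and pk fields described above:
PRAGMA table_info('t1');
CALL pragma_table_info('t1');
```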
###### Database Size {#docs:stable:configuration:pragmas::database-size}
Get the file and memory size of each database:
```sql
PRAGMA database_size;
CALL pragma_database_size();
```
`database_size` returns information about the file and memory size of each database. The column types of the returned results are given below:
```sql
database_name VARCHAR, -- database name
database_size VARCHAR, -- total block count times the block size
block_size BIGINT, -- database block size
total_blocks BIGINT, -- total blocks in the database
used_blocks BIGINT, -- used blocks in the database
free_blocks BIGINT, -- free blocks in the database
wal_size VARCHAR, -- write ahead log size
memory_usage VARCHAR, -- memory used by the database buffer manager
memory_limit VARCHAR -- maximum memory allowed for the database
```
###### Storage Information {#docs:stable:configuration:pragmas::storage-information}
To get storage information:
```sql
PRAGMA storage_info('table_name');
CALL pragma_storage_info('table_name');
```
This call returns the following information for the given table:
| Name | Type | Description |
|----------------|-----------|-------------------------------------------------------|
| `row_group_id` | `BIGINT` | |
| `column_name` | `VARCHAR` | |
| `column_id` | `BIGINT` | |
| `column_path` | `VARCHAR` | |
| `segment_id` | `BIGINT` | |
| `segment_type` | `VARCHAR` | |
| `start` | `BIGINT` | The start row id of this chunk |
| `count` | `BIGINT` | The amount of entries in this storage chunk |
| `compression`  | `VARCHAR` | Compression type used for this column; see the [“Lightweight Compression in DuckDB” blog post](https://duckdb.org/2022/10/28/lightweight-compression) |
| `stats` | `VARCHAR` | |
| `has_updates` | `BOOLEAN` | |
| `persistent` | `BOOLEAN` | `false` if temporary table |
| `block_id` | `BIGINT` | Empty unless persistent |
| `block_offset` | `BIGINT` | Empty unless persistent |
See [Storage](#docs:stable:internals:storage) for more information.
###### Show Databases {#docs:stable:configuration:pragmas::show-databases}
The following statement is equivalent to the [`SHOW DATABASES` statement](#docs:stable:sql:statements:attach):
```sql
PRAGMA show_databases;
```
#### Resource Management {#docs:stable:configuration:pragmas::resource-management}
###### Memory Limit {#docs:stable:configuration:pragmas::memory-limit}
Set the memory limit for the buffer manager:
```sql
SET memory_limit = '1GB';
```
> **Warning.** The specified memory limit is only applied to the buffer manager.
> For most queries, the buffer manager handles the majority of the data processed.
> However, certain in-memory data structures such as [vectors](#docs:stable:internals:vector) and query results are allocated outside of the buffer manager.
> Additionally, [aggregate functions](#docs:stable:sql:functions:aggregates) with complex state (e.g., `list`, `mode`, `quantile`, `string_agg`, and `approx` functions) use memory outside of the buffer manager.
> Therefore, the actual memory consumption can be higher than the specified memory limit.
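To check the configured limit and the buffer manager's current usage, you can, for instance, combine `current_setting` with the `database_size` pragma described above:
```sql
SET memory_limit = '1GB';

-- The memory_limit and memory_usage columns reflect the buffer manager:
CALL pragma_database_size();

-- Alternatively, read back the configured limit directly:
SELECT current_setting('memory_limit');
```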
###### Threads {#docs:stable:configuration:pragmas::threads}
Set the amount of threads for parallel query execution:
```sql
SET threads = 4;
```
#### Collations {#docs:stable:configuration:pragmas::collations}
List all available collations:
```sql
PRAGMA collations;
```
Set the default collation to one of the available ones:
```sql
SET default_collation = 'nocase';
```
#### Default Ordering for NULLs {#docs:stable:configuration:pragmas::default-ordering-for-nulls}
Set the default ordering for NULLs to be either `NULLS_FIRST`, `NULLS_LAST`, `NULLS_FIRST_ON_ASC_LAST_ON_DESC` or `NULLS_LAST_ON_ASC_FIRST_ON_DESC`:
```sql
SET default_null_order = 'NULLS_FIRST';
SET default_null_order = 'NULLS_LAST_ON_ASC_FIRST_ON_DESC';
```
Set the default result set ordering direction to `ASCENDING` or `DESCENDING`:
```sql
SET default_order = 'ASCENDING';
SET default_order = 'DESCENDING';
```
#### Ordering by Non-Integer Literals {#docs:stable:configuration:pragmas::ordering-by-non-integer-literals}
By default, ordering by non-integer literals is not allowed:
```sql
SELECT 42 ORDER BY 'hello world';
```
```console
Binder Error: ORDER BY non-integer literal has no effect.
```
To allow this behavior, use the `order_by_non_integer_literal` option:
```sql
SET order_by_non_integer_literal = true;
```
#### Implicit Casting to `VARCHAR` {#docs:stable:configuration:pragmas::implicit-casting-to-varchar}
Prior to version 0.10.0, DuckDB would automatically allow any type to be implicitly cast to `VARCHAR` during function binding. As a result, it was possible to, e.g., compute the substring of an integer without using an explicit cast. For version v0.10.0 and later, an explicit cast is needed instead. To revert to the old behavior that performs implicit casting, set the `old_implicit_casting` variable to `true`:
```sql
SET old_implicit_casting = true;
```
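For illustration, with this setting enabled, an expression like the following binds again by implicitly casting the integer argument to `VARCHAR` (a sketch of the pre-0.10.0 behavior):
```sql
SET old_implicit_casting = true;

-- 42 is implicitly cast to the string '42', so this returns '4':
SELECT substring(42, 1, 1);
```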
#### Python: Scan All Dataframes {#docs:stable:configuration:pragmas::python-scan-all-dataframes}
Prior to version 1.1.0, DuckDB's [replacement scan mechanism](#docs:stable:clients:c:replacement_scans) in Python scanned the global Python namespace. To revert to this old behavior, use the following setting:
```sql
SET python_scan_all_frames = true;
```
#### Information on DuckDB {#docs:stable:configuration:pragmas::information-on-duckdb}
###### Version {#docs:stable:configuration:pragmas::version}
Show DuckDB version:
```sql
PRAGMA version;
CALL pragma_version();
```
###### Platform {#docs:stable:configuration:pragmas::platform}
`platform` returns an identifier for the platform the current DuckDB executable has been compiled for, e.g., `osx_arm64`.
The format of this identifier matches the platform name as described in the [extension loading explainer](#docs:stable:extensions:extension_distribution::platforms):
```sql
PRAGMA platform;
CALL pragma_platform();
```
###### User Agent {#docs:stable:configuration:pragmas::user-agent}
The following statement returns the user agent information, e.g., `duckdb/v0.10.0(osx_arm64)`:
```sql
PRAGMA user_agent;
```
###### Metadata Information {#docs:stable:configuration:pragmas::metadata-information}
The following statement returns information on the metadata store (`block_id`, `total_blocks`, `free_blocks`, and `free_list`):
```sql
PRAGMA metadata_info;
```
#### Progress Bar {#docs:stable:configuration:pragmas::progress-bar}
Show progress bar when running queries:
```sql
PRAGMA enable_progress_bar;
```
Or:
```sql
PRAGMA enable_print_progress_bar;
```
Don't show a progress bar for running queries:
```sql
PRAGMA disable_progress_bar;
```
Or:
```sql
PRAGMA disable_print_progress_bar;
```
#### EXPLAIN Output {#docs:stable:configuration:pragmas::explain-output}
The output of [`EXPLAIN`](#docs:stable:sql:statements:profiling) can be configured to show only the physical plan.
The default configuration of `EXPLAIN`:
```sql
SET explain_output = 'physical_only';
```
To only show the optimized query plan:
```sql
SET explain_output = 'optimized_only';
```
To show all query plans:
```sql
SET explain_output = 'all';
```
#### Profiling {#docs:stable:configuration:pragmas::profiling}
##### Enable Profiling {#docs:stable:configuration:pragmas::enable-profiling}
The following query enables profiling with the default format, `query_tree`.
Independent of the format, `enable_profiling` is **mandatory** to enable profiling.
```sql
PRAGMA enable_profiling;
PRAGMA enable_profile;
```
##### Profiling Coverage {#docs:stable:configuration:pragmas::profiling-coverage}
By default, the profiling coverage is set to `SELECT`.
`SELECT` runs the profiler for each operator in the physical plan of a `SELECT` statement.
```sql
SET profiling_coverage = 'SELECT';
```
By default, the profiler **does not** emit profiling information for other statement types (`INSERT INTO`, `ATTACH`, etc.).
To run the profiler for all statement types, change this setting to `ALL`.
```sql
SET profiling_coverage = 'ALL';
```
##### Profiling Format {#docs:stable:configuration:pragmas::profiling-format}
The format of `enable_profiling` can be specified as `query_tree`, `json`, `query_tree_optimizer`, or `no_output`.
Each format prints its output to the configured output, except `no_output`.
The default format is `query_tree`.
It prints the physical query plan and the metrics of each operator in the tree.
```sql
SET enable_profiling = 'query_tree';
```
Alternatively, `json` returns the physical query plan as JSON:
```sql
SET enable_profiling = 'json';
```
> **Tip.** To visualize query plans, consider using the [DuckDB execution plan visualizer](https://db.cs.uni-tuebingen.de/explain/) developed by the [Database Systems Research Group at the University of Tübingen](https://github.com/DBatUTuebingen).
To return the physical query plan, including optimizer and planner metrics:
```sql
SET enable_profiling = 'query_tree_optimizer';
```
Database drivers and other applications can also access profiling information through API calls, in which case users can disable any other output.
Even though the parameter reads `no_output`, it is essential to note that this **only** affects printing to the configurable output.
When accessing profiling information through API calls, it is still crucial to enable profiling:
```sql
SET enable_profiling = 'no_output';
```
##### Profiling Output {#docs:stable:configuration:pragmas::profiling-output}
By default, DuckDB prints profiling information to the standard output.
However, if you prefer to write the profiling information to a file, you can use `PRAGMA` `profiling_output` to specify a filepath.
> **Warning.** The file contents will be overwritten for every newly issued query.
> Hence, the file will only contain the profiling information of the last run query:
```sql
SET profiling_output = '/path/to/file.json';
SET profile_output = '/path/to/file.json';
```
##### Profiling Mode {#docs:stable:configuration:pragmas::profiling-mode}
By default, a limited amount of profiling information is provided (`standard`).
```sql
SET profiling_mode = 'standard';
```
For more details, use the detailed profiling mode by setting `profiling_mode` to `detailed`.
The output of this mode includes profiling of the planner and optimizer stages.
```sql
SET profiling_mode = 'detailed';
```
##### Custom Metrics {#docs:stable:configuration:pragmas::custom-metrics}
By default, profiling enables all metrics except those activated by detailed profiling.
Using the `custom_profiling_settings` `PRAGMA`, each metric, including those from detailed profiling, can be individually enabled or disabled.
This `PRAGMA` accepts a JSON object with metric names as keys and Boolean values to toggle them on or off.
Settings specified by this `PRAGMA` override the default behavior.
> **Note.** This only affects the metrics when `enable_profiling` is set to `json` or `no_output`.
> The `query_tree` and `query_tree_optimizer` formats always use a default set of metrics.
In the following example, the `CPU_TIME` metric is disabled.
The `EXTRA_INFO`, `OPERATOR_CARDINALITY`, and `OPERATOR_TIMING` metrics are enabled.
```sql
SET custom_profiling_settings = '{"CPU_TIME": "false", "EXTRA_INFO": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_TIMING": "true"}';
```
The profiling documentation contains an overview of the available [metrics](#docs:stable:dev:profiling::metrics).
##### Disable Profiling {#docs:stable:configuration:pragmas::disable-profiling}
To disable profiling:
```sql
PRAGMA disable_profiling;
PRAGMA disable_profile;
```
#### Query Optimization {#docs:stable:configuration:pragmas::query-optimization}
###### Optimizer {#docs:stable:configuration:pragmas::optimizer}
To disable the query optimizer:
```sql
PRAGMA disable_optimizer;
```
To enable the query optimizer:
```sql
PRAGMA enable_optimizer;
```
###### Selectively Disabling Optimizers {#docs:stable:configuration:pragmas::selectively-disabling-optimizers}
The `disabled_optimizers` option allows selectively disabling optimization steps.
For example, to disable `filter_pushdown` and `statistics_propagation`, run:
```sql
SET disabled_optimizers = 'filter_pushdown,statistics_propagation';
```
The available optimizations can be queried using the [`duckdb_optimizers()` table function](#docs:stable:sql:meta:duckdb_table_functions::duckdb_optimizers).
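For example, to list all available optimizers before deciding which ones to disable:
```sql
SELECT * FROM duckdb_optimizers();
```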
To re-enable the optimizers, run:
```sql
SET disabled_optimizers = '';
```
> **Warning.** The `disabled_optimizers` option should only be used for debugging performance issues and should be avoided in production.
#### Logging {#docs:stable:configuration:pragmas::logging}
Set a path for query logging:
```sql
SET log_query_path = '/tmp/duckdb_log/';
```
Disable query logging:
```sql
SET log_query_path = '';
```
#### Full-Text Search Indexes {#docs:stable:configuration:pragmas::full-text-search-indexes}
The `create_fts_index` and `drop_fts_index` options are only available when the [`fts` extension](#docs:stable:core_extensions:full_text_search) is loaded. Their usage is documented on the [Full-Text Search extension page](#docs:stable:core_extensions:full_text_search).
#### Verification {#docs:stable:configuration:pragmas::verification}
###### Verification of External Operators {#docs:stable:configuration:pragmas::verification-of-external-operators}
Enable verification of external operators:
```sql
PRAGMA verify_external;
```
Disable verification of external operators:
```sql
PRAGMA disable_verify_external;
```
###### Verification of Round-Trip Capabilities {#docs:stable:configuration:pragmas::verification-of-round-trip-capabilities}
Enable verification of round-trip capabilities for supported logical plans:
```sql
PRAGMA verify_serializer;
```
Disable verification of round-trip capabilities:
```sql
PRAGMA disable_verify_serializer;
```
#### Object Cache {#docs:stable:configuration:pragmas::object-cache}
Enable caching of objects, e.g., for Parquet metadata:
```sql
PRAGMA enable_object_cache;
```
Disable caching of objects:
```sql
PRAGMA disable_object_cache;
```
#### Checkpointing {#docs:stable:configuration:pragmas::checkpointing}
###### Compression {#docs:stable:configuration:pragmas::compression}
During checkpointing, the existing column data plus any new changes are compressed.
There are a couple of pragmas to influence which compression functions are considered.
####### Force Compression {#docs:stable:configuration:pragmas::force-compression}
Prefer using this compression method over any other method if possible:
```sql
PRAGMA force_compression = 'bitpacking';
```
####### Disabled Compression Methods {#docs:stable:configuration:pragmas::disabled-compression-methods}
Avoid using any of the compression methods given in the comma-separated list:
```sql
PRAGMA disabled_compression_methods = 'fsst,rle';
```
###### Force Checkpoint {#docs:stable:configuration:pragmas::force-checkpoint}
When [`CHECKPOINT`](#docs:stable:sql:statements:checkpoint) is called but no changes have been made, force a checkpoint regardless:
```sql
PRAGMA force_checkpoint;
```
###### Checkpoint on Shutdown {#docs:stable:configuration:pragmas::checkpoint-on-shutdown}
Run a `CHECKPOINT` on successful shutdown and delete the WAL, to leave only a single database file behind:
```sql
PRAGMA enable_checkpoint_on_shutdown;
```
Don't run a `CHECKPOINT` on shutdown:
```sql
PRAGMA disable_checkpoint_on_shutdown;
```
#### Temp Directory for Spilling Data to Disk {#docs:stable:configuration:pragmas::temp-directory-for-spilling-data-to-disk}
By default, DuckDB uses a temporary directory named `⟨database_file_name⟩.tmp`{:.language-sql .highlight} to spill to disk, located in the same directory as the database file. To change this, use:
```sql
SET temp_directory = '/path/to/temp_dir.tmp/';
```
#### Returning Errors as JSON {#docs:stable:configuration:pragmas::returning-errors-as-json}
The `errors_as_json` option can be set to obtain error information in raw JSON format. For certain errors, extra information or decomposed information is provided for easier machine processing. For example:
```sql
SET errors_as_json = true;
```
Then, running a query that results in an error produces a JSON output:
```sql
SELECT * FROM nonexistent_tbl;
```
```json
{
"exception_type":"Catalog",
"exception_message":"Table with name nonexistent_tbl does not exist!\nDid you mean \"temp.information_schema.tables\"?",
"name":"nonexistent_tbl",
"candidates":"temp.information_schema.tables",
"position":"14",
"type":"Table",
"error_subtype":"MISSING_ENTRY"
}
```
#### IEEE Floating-Point Operation Semantics {#docs:stable:configuration:pragmas::ieee-floating-point-operation-semantics}
DuckDB follows IEEE floating-point operation semantics. If you would like to turn this off, run:
```sql
SET ieee_floating_point_ops = false;
```
In this case, floating point division by zero (e.g., `1.0 / 0.0`, `0.0 / 0.0` and `-1.0 / 0.0`) will all return `NULL`.
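Under the default IEEE semantics, these divisions instead produce special floating-point values; a small sketch (the exact rendering of the values may differ per client):
```sql
SET ieee_floating_point_ops = true;  -- the default

SELECT 1.0::DOUBLE / 0.0 AS pos_inf,    -- Infinity
       -1.0::DOUBLE / 0.0 AS neg_inf,   -- -Infinity
       0.0::DOUBLE / 0.0 AS nan_value;  -- NaN
```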
#### Query Verification (for Development) {#docs:stable:configuration:pragmas::query-verification-for-development}
The following `PRAGMA`s are mostly used for development and internal testing.
Enable query verification:
```sql
PRAGMA enable_verification;
```
Disable query verification:
```sql
PRAGMA disable_verification;
```
Enable force parallel query processing:
```sql
PRAGMA verify_parallelism;
```
Disable force parallel query processing:
```sql
PRAGMA disable_verify_parallelism;
```
#### Block Sizes {#docs:stable:configuration:pragmas::block-sizes}
When persisting a database to disk, DuckDB writes to a dedicated file containing a list of blocks holding the data.
In the case of a file that only holds very little data, e.g., a small table, the default block size of 256 kB might not be ideal.
Therefore, DuckDB's storage format supports different block sizes.
There are a few constraints on possible block size values.
* Must be a power of two.
* Must be greater than or equal to 16384 (16 kB).
* Must be less than or equal to 262144 (256 kB).
You can set the default block size for all new DuckDB files created by an instance like so:
```sql
SET default_block_size = '16384';
```
It is also possible to set the block size on a per-file basis, see [`ATTACH`](#docs:stable:sql:statements:attach) for details.
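For example, a per-file block size can be specified roughly as follows when attaching a new database file (a sketch; see the `ATTACH` page for the authoritative syntax):
```sql
-- Create a new database file that uses 16 kB blocks:
ATTACH 'small.db' (BLOCK_SIZE 16384);
```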
## Secrets Manager {#docs:stable:configuration:secrets_manager}
The **Secrets manager** provides a unified user interface for secrets across all backends that use them. Secrets can be scoped, so different storage prefixes can have different secrets, which allows, for example, joining data across organizations in a single query. Secrets can also be persisted, so that they do not need to be specified every time DuckDB is launched.
> **Warning.** Persistent secrets are stored in unencrypted binary format on the disk.
#### Types of Secrets {#docs:stable:configuration:secrets_manager::types-of-secrets}
Secrets are typed; their type identifies which service they are for.
Most secret types are not included in DuckDB by default; instead, they are registered by extensions.
Currently, the following secret types are available:
| Secret type | Service / protocol | Extension |
| ------------- | -------------------- | --------------------------------------------------------------------------------- |
| `azure` | Azure Blob Storage | [`azure`](#docs:stable:core_extensions:azure) |
| `ducklake` | DuckLake | [`ducklake`](https://ducklake.select/docs/stable/duckdb/usage/connecting#secrets) |
| `gcs` | Google Cloud Storage | [`httpfs`](#docs:stable:core_extensions:httpfs:s3api) |
| `http` | HTTP and HTTPS | [`httpfs`](#docs:stable:core_extensions:httpfs:https) |
| `huggingface` | Hugging Face | [`httpfs`](#docs:stable:core_extensions:httpfs:hugging_face) |
| `iceberg` | Iceberg REST Catalog | [`httpfs`](#docs:stable:core_extensions:httpfs:s3api), [`iceberg`](#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs) |
| `mysql` | MySQL | [`mysql`](#docs:stable:core_extensions:mysql) |
| `postgres` | PostgreSQL | [`postgres`](#docs:stable:core_extensions:postgres) |
| `r2` | Cloudflare R2 | [`httpfs`](#docs:stable:core_extensions:httpfs:s3api) |
| `s3` | AWS S3 | [`httpfs`](#docs:stable:core_extensions:httpfs:s3api) |
For each type, there are one or more “secret providers” that specify how the secret is created. Secrets can also have an optional scope, which is a file path prefix that the secret applies to. When fetching a secret for a path, the secret scopes are compared to the path and the matching secret is returned. If multiple secrets match, the one with the longest prefix is chosen.
#### Creating a Secret {#docs:stable:configuration:secrets_manager::creating-a-secret}
Secrets can be created using the [`CREATE SECRET` SQL statement](#docs:stable:sql:statements:create_secret).
Secrets can be **temporary** or **persistent**. Temporary secrets are used by default and are stored in memory for the life span of the DuckDB instance, similar to how settings worked previously. Persistent secrets are stored in **unencrypted binary format** in the `~/.duckdb/stored_secrets` directory. On startup of DuckDB, persistent secrets are read from this directory and automatically loaded.
##### Secret Providers {#docs:stable:configuration:secrets_manager::secret-providers}
To create a secret, a **Secret Provider** needs to be used. A Secret Provider is a mechanism through which a secret is generated. To illustrate this, for the `S3`, `GCS`, `R2`, and `AZURE` secret types, DuckDB currently supports two providers: `CONFIG` and `credential_chain`. The `CONFIG` provider requires the user to pass all configuration information into the `CREATE SECRET`, whereas the `credential_chain` provider will automatically try to fetch credentials. When no Secret Provider is specified, the `CONFIG` provider is used. For more details on how to create secrets using different providers check out the respective pages on [httpfs](#docs:stable:core_extensions:httpfs:overview::configuration-and-authentication-using-secrets) and [azure](#docs:stable:core_extensions:azure::authentication-with-secret).
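For example, a secret that uses the `credential_chain` provider can be created roughly as follows (a sketch; provider-specific parameters are documented on the pages linked above):
```sql
-- Let the provider fetch credentials automatically instead of passing them in:
CREATE SECRET aws_credentials (
    TYPE s3,
    PROVIDER credential_chain
);
```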
##### Temporary Secrets {#docs:stable:configuration:secrets_manager::temporary-secrets}
To create a temporary unscoped secret to access S3, we can now use the following:
```sql
CREATE SECRET my_secret (
TYPE s3,
KEY_ID 'my_secret_key',
SECRET 'my_secret_value',
REGION 'my_region'
);
```
Note that we implicitly use the default `CONFIG` secret provider here.
##### Persistent Secrets {#docs:stable:configuration:secrets_manager::persistent-secrets}
In order to persist secrets between DuckDB database instances, we can now use the `CREATE PERSISTENT SECRET` command, e.g.:
```sql
CREATE PERSISTENT SECRET my_persistent_secret (
TYPE s3,
KEY_ID 'my_secret_key',
SECRET 'my_secret_value'
);
```
By default, this will write the secret (unencrypted) to the `~/.duckdb/stored_secrets` directory. To change the secrets directory, issue:
```sql
SET secret_directory = 'path/to/my_secrets_dir';
```
Note that setting the value of the `home_directory` configuration option has no effect on the location of the secrets.
#### Deleting Secrets {#docs:stable:configuration:secrets_manager::deleting-secrets}
Secrets can be deleted using the [`DROP SECRET` statement](#docs:stable:sql:statements:create_secret::syntax-for-drop-secret), e.g.:
```sql
DROP PERSISTENT SECRET my_persistent_secret;
```
#### Creating Multiple Secrets for the Same Service Type {#docs:stable:configuration:secrets_manager::creating-multiple-secrets-for-the-same-service-type}
If two secrets exist for a service type, the scope can be used to decide which one should be used. For example:
```sql
CREATE SECRET secret1 (
TYPE s3,
KEY_ID 'my_secret_key1',
SECRET 'my_secret_value1',
SCOPE 's3://⟨my-bucket⟩'
);
```
```sql
CREATE SECRET secret2 (
TYPE s3,
KEY_ID 'my_secret_key2',
SECRET 'my_secret_value2',
SCOPE 's3://⟨my-other-bucket⟩'
);
```
Now, if the user queries something from `s3://⟨my-other-bucket⟩/something`, secret `secret2` will be chosen automatically for that request. To see which secret is being used, the `which_secret` scalar function can be used, which takes a path and a secret type as parameters:
```sql
FROM which_secret('s3://⟨my-other-bucket⟩/file.parquet', 's3');
```
#### Listing Secrets {#docs:stable:configuration:secrets_manager::listing-secrets}
Secrets can be listed using the built-in [`duckdb_secrets()` table function](#docs:stable:sql:meta:duckdb_table_functions::duckdb_secrets):
```sql
FROM duckdb_secrets();
```
Sensitive information will be redacted.
# Extensions {#extensions}
## Extensions {#docs:stable:extensions:overview}
DuckDB has a flexible extension mechanism that allows for dynamically loading extensions.
Extensions can enhance DuckDB's functionality by providing support for additional file formats, introducing new types, and domain-specific functionality.
> Extensions are loadable on all clients (e.g., Python and R).
> Extensions distributed via the Core and Community repositories are built and tested on macOS, Windows and Linux. All operating systems are supported for both the AMD64 and the ARM64 architectures.
#### Listing Extensions {#docs:stable:extensions:overview::listing-extensions}
To get a list of extensions, use the `duckdb_extensions` function:
```sql
SELECT extension_name, installed, description
FROM duckdb_extensions();
```
| extension_name | installed | description |
|-------------------|-----------|--------------------------------------------------------------|
| arrow | false | A zero-copy data integration between Apache Arrow and DuckDB |
| autocomplete | false | Adds support for autocomplete in the shell |
| ... | ... | ... |
This list shows which extensions are available, which extensions are installed, at which version, where they are installed, and more.
The list includes most, but not all, available core extensions. For the full list, we maintain a [list of core extensions](#docs:stable:core_extensions:overview).
#### Built-In Extensions {#docs:stable:extensions:overview::built-in-extensions}
DuckDB's binary distribution comes standard with a few built-in extensions. They are statically linked into the binary and can be used as is.
For example, to use the built-in [`json` extension](#docs:stable:data:json:overview) to read a JSON file:
```sql
SELECT *
FROM 'test.json';
```
To make the DuckDB distribution lightweight, only a few essential extensions are built-in, varying slightly per distribution. Which extension is built-in on which platform is documented in the [list of core extensions](#docs:stable:core_extensions:overview::default-extensions).
#### Installing More Extensions {#docs:stable:extensions:overview::installing-more-extensions}
To make an extension that is not built-in available in DuckDB, two steps need to happen:
1. **Extension installation** is the process of downloading the extension binary and verifying its metadata. During installation, DuckDB stores the downloaded extension and some metadata in a local directory. From this directory, DuckDB can then load the extension whenever it needs to. This means that installation needs to happen only once.
2. **Extension loading** is the process of dynamically loading the binary into a DuckDB instance. DuckDB will search the local extension directory for the installed extension, then load it to make its features available. This means that every time DuckDB is restarted, all extensions that are used need to be (re)loaded.
> Extension installation and loading are subject to a few [limitations](#docs:stable:extensions:installing_extensions::limitations).
There are two main methods of making DuckDB perform the **installation** and **loading** steps for an installable extension: **explicitly** and through **autoloading**.
##### Explicit `INSTALL` and `LOAD` {#docs:stable:extensions:overview::explicit-install-and-load}
In DuckDB, extensions can also be explicitly installed and loaded. Both non-autoloadable and autoloadable extensions can be installed this way.
To explicitly install and load an extension, DuckDB has the dedicated SQL statements `LOAD` and `INSTALL`. For example,
to install and load the [`spatial` extension](#docs:stable:core_extensions:spatial:overview), run:
```sql
INSTALL spatial;
LOAD spatial;
```
With these statements, DuckDB will ensure the spatial extension is installed (ignoring the `INSTALL` statement if it is already installed), then proceed
to `LOAD` the spatial extension (again ignoring the statement if it is already loaded).
###### Extension Repository {#docs:stable:extensions:overview::extension-repository}
Optionally a repository can be provided where the extension should be installed from, by appending `FROM ⟨repository⟩`{:.language-sql .highlight} to the `INSTALL` / `FORCE INSTALL` command.
This repository can either be an alias, such as [`community`](#community_extensions:index), or it can be a direct URL, provided as a single-quoted string.
After installing/loading an extension, the [`duckdb_extensions` function](#::listing-extensions) can be used to get more information.
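For example, to check whether the `spatial` extension is installed and loaded (this assumes the `installed` and `loaded` columns of `duckdb_extensions()`):
```sql
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'spatial';
```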
##### Autoloading Extensions {#docs:stable:extensions:overview::autoloading-extensions}
For many of DuckDB's core extensions, explicitly loading and installing extensions is not necessary. DuckDB contains an autoloading mechanism
which can install and load the core extensions as soon as they are used in a query. For example, when running:
```sql
SELECT *
FROM 'https://raw.githubusercontent.com/duckdb/duckdb-web/main/data/weather.csv';
```
DuckDB will automatically install and load the [`httpfs`](#docs:stable:core_extensions:httpfs:overview) extension. No explicit `INSTALL` or `LOAD` statements are required.
Not all extensions can be autoloaded. This can have various reasons: some extensions make several changes to the running DuckDB instance, making autoloading technically not (yet) possible. For others, it is preferred to have users opt-in to the extension explicitly before use due to the way they modify behavior in DuckDB.
To see which extensions can be autoloaded, check the [core extensions list](#docs:stable:core_extensions:overview).
##### Community Extensions {#docs:stable:extensions:overview::community-extensions}
DuckDB supports installing third-party [community extensions](#community_extensions:index). For example, you can install the [`avro` community extension](#community_extensions:extensions:avro) via:
```sql
INSTALL avro FROM community;
```
Community extensions are contributed by community members but they are built, [signed](#docs:stable:extensions:extension_distribution::signed-extensions), and distributed in a centralized repository.
#### Updating Extensions {#docs:stable:extensions:overview::updating-extensions}
While built-in extensions are tied to a DuckDB release due to their nature of being built into the DuckDB binary, installable extensions
can and do receive updates. To ensure all currently installed extensions are on the most recent version, call:
```sql
UPDATE EXTENSIONS;
```
For more details on extension versions, refer to the [Extension Versioning page](#docs:stable:extensions:versioning_of_extensions).
#### Developing Extensions {#docs:stable:extensions:overview::developing-extensions}
The same API that the core extensions use is available for developing extensions. This allows users to extend the functionality of DuckDB such that it suits their domain the best.
A template for creating extensions is available in the [`extension-template` repository](https://github.com/duckdb/extension-template/). This template also holds some documentation on how to get started building your own extension.
#### Working with Extensions {#docs:stable:extensions:overview::working-with-extensions}
See the [installation instructions](#docs:stable:extensions:installing_extensions) and the [advanced installation methods page](#docs:stable:extensions:advanced_installation_methods).
## Installing Extensions {#docs:stable:extensions:installing_extensions}
To install core DuckDB extensions, use the `INSTALL` command.
For example:
```sql
INSTALL httpfs;
```
This installs the extension from the default repository (`core`).
#### Extension Repositories {#docs:stable:extensions:installing_extensions::extension-repositories}
By default, DuckDB extensions are installed from a single repository containing extensions built and signed by the core DuckDB team.
This ensures the stability and security of the core set of extensions.
These extensions live in the default `core` repository, which points to `http://extensions.duckdb.org`.
Besides the core repository, DuckDB also supports installing extensions from other repositories. For example, the `core_nightly` repository contains nightly builds for core extensions
that are built for the latest stable release of DuckDB. This allows users to try out new features in extensions before they are officially published.
##### Installing Extensions from Different Repositories {#docs:stable:extensions:installing_extensions::installing-extensions-from-different-repositories}
To install extensions from the default repository (`core`), run:
```sql
INSTALL httpfs;
```
To explicitly install an extension from the core repository, run:
```sql
INSTALL httpfs FROM core;
-- or
INSTALL httpfs FROM 'http://extensions.duckdb.org';
```
To install an extension from the core nightly repository:
```sql
INSTALL spatial FROM core_nightly;
-- or
INSTALL spatial FROM 'http://nightly-extensions.duckdb.org';
```
To install an extension from a custom repository:
```sql
INSTALL ⟨custom_extension⟩ FROM 'https://my-custom-extension-repository';
```
Alternatively, the `custom_extension_repository` setting can be used to change the default repository used by DuckDB:
```sql
SET custom_extension_repository = 'http://nightly-extensions.duckdb.org';
```
DuckDB contains the following predefined repositories:
| Alias | URL | Description |
|:----------------------|:-----------------------------------------|:---------------------------------------------------------------------------------------|
| `core` | `http://extensions.duckdb.org` | DuckDB core extensions |
| `core_nightly` | `http://nightly-extensions.duckdb.org` | Nightly builds for `core` |
| `community` | `http://community-extensions.duckdb.org` | DuckDB community extensions |
| `local_build_debug` | `./build/debug/repository` | Repository created when building DuckDB from source in debug mode (for development) |
| `local_build_release` | `./build/release/repository` | Repository created when building DuckDB from source in release mode (for development) |
#### Working with Multiple Repositories {#docs:stable:extensions:installing_extensions::working-with-multiple-repositories}
When working with extensions from different repositories, especially mixing `core` and `core_nightly`, it is important to know the origins and version of the different extensions.
For this reason, DuckDB keeps track of this in the extension installation metadata.
For example:
```sql
INSTALL httpfs FROM core;
INSTALL aws FROM core_nightly;
SELECT extension_name, extension_version, installed_from, install_mode
FROM duckdb_extensions();
```
This outputs:
| extension_name | extension_version | installed_from | install_mode |
|:----------------|:-------------------|:---------------|:-------------|
| httpfs | 62d61a417f | core | REPOSITORY |
| aws | 42c78d3 | core_nightly | REPOSITORY |
| ... | ... | ... | ... |
#### Force Installing to Upgrade Extensions {#docs:stable:extensions:installing_extensions::force-installing-to-upgrade-extensions}
When DuckDB installs an extension, it is copied to a local directory to be cached and avoid future network traffic.
Any subsequent calls to `INSTALL ⟨extension_name⟩`{:.language-sql .highlight} will use the local version instead of downloading the extension again.
To force re-downloading the extension, run:
```sql
FORCE INSTALL extension_name;
```
Force installing can also be used to overwrite an extension with an extension of the same name from another repository.
For example, first, `spatial` is installed from the core repository:
```sql
INSTALL spatial;
```
Then, to overwrite this installation with the `spatial` extension from the `core_nightly` repository:
```sql
FORCE INSTALL spatial FROM core_nightly;
```
##### Switching between Repositories {#docs:stable:extensions:installing_extensions::switching-between-repositories}
To switch repositories for an extension, use the `FORCE INSTALL` command.
For example, if you have installed `httpfs` from the `core_nightly` repository but would like to switch back to using `core`, run:
```sql
FORCE INSTALL httpfs FROM core;
```
#### Installing Extensions through Client APIs {#docs:stable:extensions:installing_extensions::installing-extensions-through-client-apis}
For many clients, using SQL to load and install extensions is the preferred method. However, some clients have a dedicated
API to install and load extensions. For example, the [Python client](#docs:stable:clients:python:overview::loading-and-installing-extensions) has dedicated `install_extension(name: str)` and `load_extension(name: str)` methods. For more details on a specific client API, refer
to the [Client API documentation](#docs:stable:clients:overview).
#### Installation Location {#docs:stable:extensions:installing_extensions::installation-location}
By default, extensions are installed under the user's home directory:
```text
~/.duckdb/extensions/⟨duckdb_version⟩/⟨platform_name⟩/
```
For stable DuckDB releases, the `⟨duckdb_version⟩`{:.language-sql .highlight} will be equal to the version tag of that release. For nightly DuckDB builds, it will be equal
to the short git hash of the build. So for example, the extensions for DuckDB version v0.10.3 on macOS ARM64 (Apple Silicon) are installed to `~/.duckdb/extensions/v0.10.3/osx_arm64/`.
An example installation path for a nightly DuckDB build could be `~/.duckdb/extensions/fc2e4b26a6/linux_amd64`.
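To verify where a given extension was installed, you can, for example, inspect the `install_path` column of `duckdb_extensions()` (assuming this column is available in your DuckDB version):
```sql
SELECT extension_name, install_path
FROM duckdb_extensions()
WHERE installed;
```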
To change the default location where DuckDB stores its extensions, use the `extension_directory` configuration option:
```sql
SET extension_directory = '/path/to/your/extension/directory';
```
Note that setting the value of the `home_directory` configuration option has no effect on the location of the extensions.
#### Uninstalling Extensions {#docs:stable:extensions:installing_extensions::uninstalling-extensions}
Currently, DuckDB does not provide a command to uninstall extensions.
To uninstall an extension, navigate to the extension's [Installation Location](#::installation-location) and remove its `.duckdb_extension` binary file.
For example:
```batch
rm ~/.duckdb/extensions/v1.2.1/osx_arm64/excel.duckdb_extension
```
#### Sharing Extensions between Clients {#docs:stable:extensions:installing_extensions::sharing-extensions-between-clients}
The shared installation location allows extensions to be shared between the client APIs _of the same DuckDB version_, as long as they share the same `platform` or ABI. For example, if an extension is installed with version 1.2.1 of the CLI client on macOS, it is available from the Python, R, etc. client libraries provided that they have access to the user's home directory and use DuckDB version 1.2.1.
#### Limitations {#docs:stable:extensions:installing_extensions::limitations}
DuckDB's extension mechanism has the following limitations:
* Extensions cannot be unloaded.
* Extensions cannot be reloaded. If you [update extensions](#docs:stable:sql:statements:update_extensions), restart the DuckDB process to use newer extensions.
## Advanced Installation Methods {#docs:stable:extensions:advanced_installation_methods}
#### Downloading Extensions Directly from S3 {#docs:stable:extensions:advanced_installation_methods::downloading-extensions-directly-from-s3}
Downloading an extension directly can be helpful when building a [Lambda service](https://aws.amazon.com/pm/lambda/) or container that uses DuckDB.
DuckDB extensions are stored in public S3 buckets, but the directory structure of those buckets is not searchable.
As a result, a direct URL to the file must be used.
To download an extension file directly, use the following format:
```text
http://extensions.duckdb.org/v⟨duckdb_version⟩/⟨platform_name⟩/⟨extension_name⟩.duckdb_extension.gz
```
For example:
```text
http://extensions.duckdb.org/v1.4.1/windows_amd64/json.duckdb_extension.gz
```
#### Installing an Extension from an Explicit Path {#docs:stable:extensions:advanced_installation_methods::installing-an-extension-from-an-explicit-path}
The `INSTALL` command can be used with the path to a `.duckdb_extension` file:
```sql
INSTALL 'path/to/httpfs.duckdb_extension';
```
Note that compressed `.duckdb_extension.gz` files need to be decompressed beforehand. It is also possible to specify remote paths.
#### Loading an Extension from an Explicit Path {#docs:stable:extensions:advanced_installation_methods::loading-an-extension-from-an-explicit-path}
`LOAD` can be used with the path to a `.duckdb_extension`.
For example, if the file was available at the (relative) path `path/to/httpfs.duckdb_extension`, you can load it as follows:
```sql
LOAD 'path/to/httpfs.duckdb_extension';
```
This will skip any currently installed extensions and load the specified extension directly.
Note that using remote paths for compressed files is currently not possible.
#### Building and Installing Extensions from Source {#docs:stable:extensions:advanced_installation_methods::building-and-installing-extensions-from-source}
For building and installing extensions from source, see the [Building DuckDB guide](#docs:stable:dev:building:overview).
##### Statically Linking Extensions {#docs:stable:extensions:advanced_installation_methods::statically-linking-extensions}
To statically link extensions, follow the [developer documentation's “Using extension config files” section](https://github.com/duckdb/duckdb/blob/main/extension/README.md#using-extension-config-files).
## Extension Distribution {#docs:stable:extensions:extension_distribution}
#### Platforms {#docs:stable:extensions:extension_distribution::platforms}
Extension binaries are distributed for several platforms (see below).
For platforms where packages for certain extensions are not available, users can build them from source and [install the resulting binaries manually](#docs:stable:extensions:advanced_installation_methods::installing-an-extension-from-an-explicit-path).
All official extensions are distributed for the following platforms.
| Platform name | Operating system | Architecture | CPU types | Used by |
|--------------------|------------------|-----------------|--------------------------------|----------------------------|
| `linux_amd64` | Linux | x86_64 (AMD64) | | Node.js packages, etc. |
| `linux_arm64` | Linux | AArch64 (ARM64) | AWS Graviton, Snapdragon, etc. | All packages |
| `osx_amd64` | macOS | x86_64 (AMD64) | Intel | All packages |
| `osx_arm64` | macOS | AArch64 (ARM64) | Apple Silicon M1, M2, etc. | All packages |
| `windows_amd64` | Windows | x86_64 (AMD64) | Intel, AMD, etc. | All packages |
Some extensions are distributed for the following platforms:
* `windows_amd64_mingw`
* `wasm_eh` and `wasm_mvp` (see [DuckDB-Wasm's extensions](#docs:stable:clients:wasm:extensions))
For platforms outside the ones listed above, we do not officially distribute extensions (e.g., `linux_arm64_android`).
#### Extensions Signing {#docs:stable:extensions:extension_distribution::extensions-signing}
##### Signed Extensions {#docs:stable:extensions:extension_distribution::signed-extensions}
Extensions can be signed with a cryptographic key.
By default, DuckDB uses its built-in public keys to verify the integrity of extensions before loading them.
All core and community extensions are signed by the DuckDB team.
Signing extensions simplifies their distribution: they can be distributed over HTTP without the need for HTTPS,
which is itself supported through an extension ([`httpfs`](#docs:stable:core_extensions:httpfs:overview)).
##### Unsigned Extensions {#docs:stable:extensions:extension_distribution::unsigned-extensions}
> **Warning.**
> Only load unsigned extensions from sources you trust.
> Avoid loading unsigned extensions over HTTP.
> Consult the [Securing DuckDB page](#docs:stable:operations_manual:securing_duckdb:securing_extensions) for guidelines on how to set up DuckDB in a secure manner.
If you wish to load your own extensions or extensions from third parties, you will need to enable the `allow_unsigned_extensions` flag.
To load unsigned extensions using the [CLI client](#docs:stable:clients:cli:overview), pass the `-unsigned` flag to it on startup:
```batch
duckdb -unsigned
```
Now any extension can be loaded, signed or not:
```sql
LOAD './some/local/ext.duckdb_extension';
```
For client APIs, the `allow_unsigned_extensions` database configuration option needs to be set; see the respective [Client API docs](#docs:stable:clients:overview).
For example, for the Python client, see the [Loading and Installing Extensions section in the Python API documentation](#docs:stable:clients:python:overview::loading-and-installing-extensions).
#### Binary Compatibility {#docs:stable:extensions:extension_distribution::binary-compatibility}
To avoid binary compatibility issues, the binary extensions distributed by DuckDB are tied both to a specific DuckDB version and a [platform](#::platforms).
This means that DuckDB can automatically detect binary compatibility between it and a loadable extension.
When trying to load an extension that was compiled for a different version or platform, DuckDB will throw an error and refuse to load the extension.
#### Creating a Custom Repository {#docs:stable:extensions:extension_distribution::creating-a-custom-repository}
You can create a custom DuckDB extension repository.
A DuckDB repository is an HTTP, HTTPS, S3, or local file-based directory that serves the extension files in a specific structure.
This structure is described in the [“Downloading Extensions Directly from S3” section](#docs:stable:extensions:advanced_installation_methods::downloading-extensions-directly-from-s3), and is the same
for local paths and remote servers, for example:
```text
base_repository_path_or_url
└── v1.0.0
    └── osx_arm64
        ├── autocomplete.duckdb_extension
        ├── httpfs.duckdb_extension
        ├── icu.duckdb_extension
        ├── inet.duckdb_extension
        ├── json.duckdb_extension
        ├── parquet.duckdb_extension
        ├── tpcds.duckdb_extension
        └── tpch.duckdb_extension
```
See the [`extension-template` repository](https://github.com/duckdb/extension-template/) for all necessary code and scripts
to set up a repository.
When installing an extension from a custom repository, DuckDB will search for both a gzipped and non-gzipped version. For example:
```sql
INSTALL icu FROM '⟨custom_repository⟩';
```
The execution of this statement will first look for `icu.duckdb_extension.gz`, then for `icu.duckdb_extension` in the repository's directory structure.
If the custom repository is served over HTTPS or S3, the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview) is required. DuckDB will attempt to [autoload](#docs:stable:extensions:overview::autoloading-extensions)
the `httpfs` extension when installing over HTTPS or S3.
## Versioning of Extensions {#docs:stable:extensions:versioning_of_extensions}
#### Extension Versioning {#docs:stable:extensions:versioning_of_extensions::extension-versioning}
Most software has some sort of version number. Version numbers serve a few important goals:
* Tie a binary to a specific state of the source code
* Allow determining the expected feature set
* Allow determining the state of the APIs
* Allow efficient processing of bug reports (e.g., bug `#1337` was introduced in version `v3.4.5`)
* Allow determining chronological order of releases (e.g., version `v1.2.3` is older than `v1.2.4`)
* Give an indication of expected stability (e.g., `v0.0.1` is likely not very stable, whereas `v13.11.0` probably is stable)
Just like [DuckDB itself](#release_calendar), DuckDB extensions have their own version number. To ensure consistent semantics
of these version numbers across the various extensions, DuckDB's [Core Extensions](#docs:stable:core_extensions:overview) use
a versioning scheme that prescribes how extensions should be versioned. The versioning scheme for Core Extensions is made up of 3 different stability levels: **unstable**, **pre-release**, and **stable**.
Let's go over each of the 3 levels and describe their format:
##### Unstable Extensions {#docs:stable:extensions:versioning_of_extensions::unstable-extensions}
Unstable extensions are extensions that can't (or don't want to) give any guarantees regarding their current stability,
or their goals of becoming stable. Unstable extensions are tagged with the **short git hash** of the extension.
For example, at the time of writing, the `vss` extension is an unstable extension with version `690bfc5`.
What to expect from an extension that has a version number in the **unstable** format?
* The state of the source code of the extension can be found by looking up the hash in the extension repository
* Functionality may change or be removed completely with every release
* This extension's API could change with every release
* This extension may not follow a structured release cycle, new (breaking) versions can be pushed at any time
##### Pre-Release Extensions {#docs:stable:extensions:versioning_of_extensions::pre-release-extensions}
Pre-release extensions are the next step up from Unstable extensions. They are tagged with version in the **[SemVer](https://semver.org/)** format, more specifically, those in the `v0.y.z` format.
In semantic versioning, versions starting with `v0` have a special meaning: they indicate that the more strict semantics of regular (`>v1.0.0`) versions do not yet apply. It basically means that an extension is working
towards becoming a stable extension, but is not quite there yet.
For example, at the time of writing, the `delta` extension is a pre-release extension with version `v0.1.0`.
What to expect from an extension that has a version number in the **pre-release** format?
* The extension is compiled from the source code corresponding to the tag.
* Semantic Versioning semantics apply. See the [Semantic Versioning](https://semver.org/) specification for details.
* The extension follows a release cycle where new features are tested in nightly builds before being grouped into a release and pushed to the `core` repository.
* Release notes describing what has been added each release should be available to make it easy to understand the difference between versions.
##### Stable Extensions {#docs:stable:extensions:versioning_of_extensions::stable-extensions}
Stable extensions are the final step of extension stability. This is denoted by using a **stable SemVer** of format `vx.y.z` where `x>0`.
For example, at the time of writing, the `parquet` extension is a stable extension with version `v1.0.0`.
What to expect from an extension that has a version number in the **stable** format? Essentially the same as for pre-release extensions, but now the stricter SemVer semantics apply: the API of the extension should be stable and will only change in backwards-incompatible ways when the major version is bumped.
See the SemVer specification for details.
#### Release Cycle of Pre-Release and Stable Core Extensions {#docs:stable:extensions:versioning_of_extensions::release-cycle-of-pre-release-and-stable-core-extensions}
In general, the release cycle of an extension depends on its stability level. **Unstable** extensions are often in
sync with DuckDB's release cycle, but may also be quietly updated between DuckDB releases. **Pre-release** and **stable**
extensions follow their own release cycle, which may or may not coincide with DuckDB releases. To find out more about the release cycle of a specific
extension, refer to the documentation or GitHub page of the respective extension. Generally, **pre-release** and **stable** extensions will document
their releases as GitHub releases, an example of which you can see in the [`delta` extension](https://github.com/duckdb/duckdb-delta/releases).
Finally, there is a small exception: all [in-tree](#docs:stable:extensions:advanced_installation_methods::in-tree-vs-out-of-tree) extensions simply
follow DuckDB's release cycle.
#### Nightly Builds {#docs:stable:extensions:versioning_of_extensions::nightly-builds}
Just like DuckDB itself, DuckDB's core extensions have nightly or dev builds that can be used to try out features before they are officially released.
This can be useful when your workflow depends on a new feature, or when you need to confirm that your stack is compatible with the upcoming version.
Nightly builds for extensions are slightly complicated by the fact that, currently, DuckDB extension binaries are tightly bound to a single DuckDB version. Because of this tight connection,
there is a potential risk of a combinatorial explosion. Therefore, not all combinations of nightly extension build and nightly DuckDB build are available.
In general, there are 2 ways of using nightly builds: using a nightly DuckDB build and using a stable DuckDB build. Let's go over the differences between the two:
##### From Stable DuckDB {#docs:stable:extensions:versioning_of_extensions::from-stable-duckdb}
In most cases, users will be interested in a nightly build of a specific extension, but do not necessarily want to switch to using the nightly build of DuckDB itself. This allows using a specific bleeding-edge
feature while limiting the exposure to unstable code.
To achieve this, Core Extensions tend to regularly push builds to the [`core_nightly` repository](#docs:stable:extensions:installing_extensions::extension-repositories). Let's look at an example:
First we install a [**stable DuckDB build**](https://duckdb.org/install/index.html).
Then we can install and load a **nightly** extension like this:
```sql
INSTALL aws FROM core_nightly;
LOAD aws;
```
In this example we are using the latest **nightly** build of the `aws` extension with the latest **stable** version of DuckDB.
##### From Nightly DuckDB {#docs:stable:extensions:versioning_of_extensions::from-nightly-duckdb}
When DuckDB CI produces a nightly binary of DuckDB itself, the binaries are distributed with a set of extensions that are pinned at a specific version. This extension version will be tested for that specific build of DuckDB, but might not be the latest dev build. Let's look at an example:
First, we install a [**nightly DuckDB build**](https://duckdb.org/install/index.html). Then, we can install and load the `aws` extension as expected:
```sql
INSTALL aws;
LOAD aws;
```
#### Updating Extensions {#docs:stable:extensions:versioning_of_extensions::updating-extensions}
DuckDB has a dedicated statement that will automatically update all extensions to their latest version. The output will
give the user information on which extensions were updated to/from which version. For example:
```sql
UPDATE EXTENSIONS;
```
| extension_name | repository | update_result | previous_version | current_version |
|:---------------|:-------------|:----------------------|:-----------------|:----------------|
| httpfs | core | NO_UPDATE_AVAILABLE | 70fd6a8a24 | 70fd6a8a24 |
| delta | core | UPDATED | d9e5cc1 | 04c61e4 |
| azure | core | NO_UPDATE_AVAILABLE | 49b63dc | 49b63dc |
| aws | core_nightly | NO_UPDATE_AVAILABLE | 42c78d3 | 42c78d3 |
Note that DuckDB will look for updates in the source repository for each extension. So if an extension was installed from
`core_nightly`, it will be updated with the latest nightly build.
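If you want to switch an extension from one repository to another (for example, from `core_nightly` back to `core`), reinstalling it from the desired repository changes the source used for future updates. A minimal sketch, assuming the `FORCE INSTALL` syntax described on the [Installing Extensions page](#docs:stable:extensions:installing_extensions::extension-repositories):
```sql
FORCE INSTALL aws FROM core;
LOAD aws;
```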
The update statement can also be provided with a list of specific extensions to update:
```sql
UPDATE EXTENSIONS (httpfs, azure);
```
| extension_name | repository | update_result | previous_version | current_version |
|:---------------|:-------------|:----------------------|:-----------------|:----------------|
| httpfs | core | NO_UPDATE_AVAILABLE | 70fd6a8a24 | 70fd6a8a24 |
| azure | core | NO_UPDATE_AVAILABLE | 49b63dc | 49b63dc |
#### Target DuckDB Version {#docs:stable:extensions:versioning_of_extensions::target-duckdb-version}
Currently, when extensions are compiled, they are tied to a specific version of DuckDB. What this means is that, for example, an extension binary compiled for version 0.10.3 does not work for version 1.0.0. In most cases, this will not cause any issues and is fully transparent; DuckDB will automatically ensure it installs the correct binary for its version. For extension developers, this means that they must ensure that new binaries are created whenever a new version of DuckDB is released. However, note that DuckDB provides an [extension template](https://github.com/duckdb/extension-template) that makes this fairly simple.
#### In-Tree vs. Out-of-Tree {#docs:stable:extensions:versioning_of_extensions::in-tree-vs-out-of-tree}
Originally, DuckDB extensions lived exclusively in the DuckDB main repository, `github.com/duckdb/duckdb`. These extensions are called in-tree. Later, extensions were separated
into their own repositories; these are called out-of-tree extensions.
While from a user's perspective, there are generally no noticeable differences, there are some minor differences related to versioning:
* in-tree extensions use the version of DuckDB instead of having their own version
* in-tree extensions do not have dedicated release notes, their changes are reflected in the regular [DuckDB release notes](https://github.com/duckdb/duckdb/releases)
* core out-of-tree extensions tend to live in repositories named `github.com/duckdb/duckdb-⟨extension_name⟩`{:.language-sql .highlight}, but the name may vary. See the [full list](#docs:stable:core_extensions:overview) of core extensions for details.
## Troubleshooting of Extensions {#docs:stable:extensions:troubleshooting}
You might have been directed to this page by a DuckDB error message similar to the following:
```sql
INSTALL non_existing;
```
```console
HTTP Error:
Failed to download extension "non_existing" at URL "http://extensions.duckdb.org/v1.4.0/osx_arm64/non_existing.duckdb_extension.gz" (HTTP 404)
Candidate extensions: "inet", "encodings", "core_functions", "sqlite_scanner", "postgres_scanner"
For more info, visit https://duckdb.org/docs/stable/extensions/troubleshooting?version=v1.4.0&platform=osx_arm64&extension=non_existing
```
There are multiple scenarios in which an extension might not be available from a given extension repository at a given time:
* The extension has not been uploaded yet; some delay after a release date is to be expected. Consider checking the issues at [`duckdb/duckdb`](https://github.com/duckdb/duckdb) or [`duckdb/community-extensions`](https://github.com/duckdb/community-extensions), or creating one yourself.
* The extension is available, but in a different repository. Try, for example, `INSTALL ⟨extension_name⟩ FROM core;`, `INSTALL ⟨extension_name⟩ FROM community;`, or `INSTALL ⟨extension_name⟩ FROM core_nightly;` (see the [Installing Extensions page](#docs:stable:extensions:installing_extensions::extension-repositories)).
* There are networking issues: the extension exists at the endpoint but is not reachable from your local DuckDB instance. In this case, you can try visiting the URL from the error message directly in a browser.
If you are on a development version of DuckDB, that is, any version for which `PRAGMA version` returns a `library_version` that does not start with a `v`, then extensions might no longer be available in the default extension repository.
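To check the DuckDB version and platform that appear in the error message URL (the `version` and `platform` query parameters above), you can run the following statements in your local DuckDB:
```sql
PRAGMA version;   -- returns library_version and source_id
PRAGMA platform;  -- returns the platform, e.g., osx_arm64
```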
When in doubt, consider raising an issue on [`duckdb/duckdb`](https://github.com/duckdb/duckdb).
# Core Extensions {#core_extensions}
## Core Extensions {#docs:stable:core_extensions:overview}
#### List of Core Extensions {#docs:stable:core_extensions:overview::list-of-core-extensions}
| Name | GitHub | Description | Stage | Aliases |
| :---------------------------------------------------------------------- | -------------------------------------------------------------------------------- | :--------------------------------------------------------------------------------- | :----------- | :---------------------- |
| [autocomplete](#docs:stable:core_extensions:autocomplete) | | Adds support for autocomplete in the shell | stable | |
| [avro](#docs:stable:core_extensions:avro) | [GitHub ](https://github.com/duckdb/duckdb-avro) | Add support for reading Avro files | stable | |
| [aws](#docs:stable:core_extensions:aws) | [GitHub ](https://github.com/duckdb/duckdb-aws) | Provides features that depend on the AWS SDK | stable | |
| [azure](#docs:stable:core_extensions:azure) | [GitHub ](https://github.com/duckdb/duckdb-azure) | Adds a filesystem abstraction for Azure blob storage to DuckDB | stable | |
| [delta](#docs:stable:core_extensions:delta) | [GitHub ](https://github.com/duckdb/duckdb-delta) | Adds support for Delta Lake | experimental | |
| [ducklake](#docs:stable:core_extensions:ducklake) | [GitHub ](https://github.com/duckdb/ducklake) | Adds support for DuckLake | experimental | |
| [encodings](#docs:stable:core_extensions:encodings) | [GitHub ](https://github.com/duckdb/duckdb-encodings) | Adds support for encodings available in the ICU data repository | experimental | |
| [excel](#docs:stable:core_extensions:excel) | [GitHub ](https://github.com/duckdb/duckdb-excel) | Adds support for reading and writing Excel files | experimental | |
| [fts](#docs:stable:core_extensions:full_text_search) | [GitHub ](https://github.com/duckdb/duckdb-fts) | Adds support for full-text search indexes | experimental | |
| [httpfs](#docs:stable:core_extensions:httpfs:overview) | [GitHub ](https://github.com/duckdb/duckdb-httpfs) | Adds support for reading and writing files over an HTTP(S) or S3 connection | stable | http, https, s3 |
| [iceberg](#docs:stable:core_extensions:iceberg:overview) | [GitHub ](https://github.com/duckdb/duckdb-iceberg) | Adds support for Apache Iceberg | experimental | |
| [icu](#docs:stable:core_extensions:icu) | | Adds support for time zones and collations using the ICU library | stable | |
| [inet](#docs:stable:core_extensions:inet) | [GitHub ](https://github.com/duckdb/duckdb-inet) | Adds support for IP-related data types and functions | experimental | |
| [jemalloc](#docs:stable:core_extensions:jemalloc) | | Overwrites the system allocator with jemalloc | stable | |
| [json](#docs:stable:data:json:overview) | | Adds support for JSON operations | stable | |
| [mysql](#docs:stable:core_extensions:mysql) | [GitHub ](https://github.com/duckdb/duckdb-mysql) | Adds support for reading from and writing to a MySQL database | stable | mysql_scanner |
| [parquet](#docs:stable:data:parquet:overview) | | Adds support for reading and writing Parquet files | stable | |
| [postgres](#docs:stable:core_extensions:postgres) | [GitHub ](https://github.com/duckdb/duckdb-postgres) | Adds support for reading from and writing to a PostgreSQL database | stable | postgres_scanner |
| [spatial](#docs:stable:core_extensions:spatial:overview) | [GitHub ](https://github.com/duckdb/duckdb-spatial) | Geospatial extension that adds support for working with spatial data and functions | experimental | |
| [sqlite](#docs:stable:core_extensions:sqlite) | [GitHub ](https://github.com/duckdb/duckdb-sqlite) | Adds support for reading from and writing to SQLite database files | stable | sqlite_scanner, sqlite3 |
| [tpcds](#docs:stable:core_extensions:tpcds) | | Adds TPC-DS data generation and query support | experimental | |
| [tpch](#docs:stable:core_extensions:tpch) | | Adds TPC-H data generation and query support | stable | |
| [ui](#docs:stable:core_extensions:ui) | [GitHub ](https://github.com/duckdb/duckdb-ui) | Adds local UI for DuckDB | experimental | |
| [vss](#docs:stable:core_extensions:vss) | [GitHub ](https://github.com/duckdb/duckdb-vss) | Adds support for vector similarity search queries | experimental | |
The **Stage** column shows the lifecycle stage of the extension following the convention of the [lifecycle stages used in tidyverse](https://lifecycle.r-lib.org/articles/stages.html).
#### Default Extensions {#docs:stable:core_extensions:overview::default-extensions}
Different DuckDB clients ship a different set of extensions.
We summarize the main distributions in the table below.
| Name | CLI | Python | R | Java | Node.js |
| ----------------------------------------------------------------------- | --- | ------ | --- | ---- | ------- |
| [autocomplete](#docs:stable:core_extensions:autocomplete) | yes | | | | |
| [icu](#docs:stable:core_extensions:icu) | yes | yes | | yes | yes |
| [json](#docs:stable:data:json:overview) | yes | yes | | yes | yes |
| [parquet](#docs:stable:data:parquet:overview) | yes | yes | yes | yes | yes |
The jemalloc extension's availability is based on the operating system.
Please check the [jemalloc page](#docs:stable:core_extensions:jemalloc) for details.
## AutoComplete Extension {#docs:stable:core_extensions:autocomplete}
The `autocomplete` extension adds support for autocomplete in the [CLI client](#docs:stable:clients:cli:overview).
The extension is shipped by default with the CLI client.
#### Behavior {#docs:stable:core_extensions:autocomplete::behavior}
For the behavior of the `autocomplete` extension, see the [documentation of the CLI client](#docs:stable:clients:cli:autocomplete).
#### Functions {#docs:stable:core_extensions:autocomplete::functions}
| Function | Description |
|:----------------------------------|:-----------------------------------------------------|
| `sql_auto_complete(query_string)` | Attempts autocompletion on the given `query_string`. |
#### Example {#docs:stable:core_extensions:autocomplete::example}
```sql
SELECT *
FROM sql_auto_complete('SEL');
```
Returns:
| suggestion | suggestion_start |
|-------------|------------------|
| SELECT | 0 |
| DELETE | 0 |
| INSERT | 0 |
| CALL | 0 |
| LOAD | 0 |
| CALL | 0 |
| ALTER | 0 |
| BEGIN | 0 |
| EXPORT | 0 |
| CREATE | 0 |
| PREPARE | 0 |
| EXECUTE | 0 |
| EXPLAIN | 0 |
| ROLLBACK | 0 |
| DESCRIBE | 0 |
| SUMMARIZE | 0 |
| CHECKPOINT | 0 |
| DEALLOCATE | 0 |
| UPDATE | 0 |
| DROP | 0 |
## Avro Extension {#docs:stable:core_extensions:avro}
The `avro` extension enables DuckDB to read [Apache Avro](https://avro.apache.org) files.
> The `avro` extension was [released as a community extension in late 2024](https://duckdb.org/2024/12/09/duckdb-avro-extension) and became a core extension in early 2025.
#### The `read_avro` Function {#docs:stable:core_extensions:avro::the-read_avro-function}
The extension adds a single DuckDB function, `read_avro`. This function can be used like so:
```sql
FROM read_avro('⟨some_file⟩.avro');
```
This function will expose the contents of the Avro file as a DuckDB table. You can then use any arbitrary SQL constructs to further transform this table.
#### File IO {#docs:stable:core_extensions:avro::file-io}
The `read_avro` function is integrated into DuckDB's file system abstraction, meaning you can read Avro files directly from, e.g., HTTP or S3 sources. For example:
```sql
FROM read_avro('http://blobs.duckdb.org/data/userdata1.avro');
FROM read_avro('s3://⟨your-bucket⟩/⟨some_file⟩.avro');
```
should "just" work.
You can also *glob* multiple files in a single read call or pass a list of files to the function:
```sql
FROM read_avro('some_file_*.avro');
FROM read_avro(['some_file_1.avro', 'some_file_2.avro']);
```
If the filenames somehow contain valuable information (as is unfortunately all-too-common), you can pass the `filename` argument to `read_avro`:
```sql
FROM read_avro('some_file_*.avro', filename=true);
```
This will result in an additional column in the result set that contains the actual filename of the Avro file.
#### Schema Conversion {#docs:stable:core_extensions:avro::schema-conversion}
This extension automatically translates the Avro Schema to the DuckDB schema. *All* Avro types can be translated, except for *recursive type definitions*, which DuckDB does not support.
The type mapping is very straightforward except for Avro's "unique" way of handling `NULL`. Unlike other systems, Avro does not treat `NULL` as a possible value in a range of e.g. `INTEGER` but instead represents `NULL` as a union of the actual type with a special `NULL` type. This is different from DuckDB, where any value can be `NULL`. Of course, DuckDB also supports `UNION` types, but using them here would be quite cumbersome to work with.
This extension *simplifies* the Avro schema where possible: An Avro union of any type and the special null type is simplified to just the non-null type. For example, an Avro record of the union type `["int","null"]` becomes a DuckDB `INTEGER`, which just happens to be `NULL` sometimes. Similarly, an Avro union that contains only a single type is converted to the type it contains. For example, an Avro record of the union type `["int"]` also becomes a DuckDB `INTEGER`.
The extension also "flattens" the Avro schema. Avro defines tables as root-level "record" fields, which are the same as DuckDB `STRUCT` fields. For more convenient handling, this extension turns the entries of a single top-level record into top-level columns.
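To inspect the DuckDB schema that results from this simplification and flattening, you can describe the output of a `read_avro` call. A small sketch (the file name is a placeholder):
```sql
DESCRIBE SELECT * FROM read_avro('⟨some_file⟩.avro');
```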
#### Implementation {#docs:stable:core_extensions:avro::implementation}
Internally, this extension uses the "official" [Apache Avro C API](https://avro.apache.org/docs/++version++/api/c/), albeit with some minor patching to allow reading of Avro files from memory.
#### Limitations and Future Plans {#docs:stable:core_extensions:avro::limitations-and-future-plans}
* This extension currently does not make use of **parallelism** when reading a single (large) Avro file or when reading a list of files. Adding support for parallelism in the latter case is on the roadmap.
* There is currently no support for either projection or filter **pushdown**, but this is planned for a later stage.
* There is currently no support for the Wasm or the Windows-MinGW builds of DuckDB due to issues with the Avro library dependency (sigh again). We plan to fix this eventually.
* As mentioned above, DuckDB cannot express the recursive type definitions that Avro has; this is unlikely to ever change.
* There is no support for providing a separate Avro schema file. This is unlikely to change; all Avro files we have seen so far had their schema embedded.
* There is currently no support for the `union_by_name` flag that other readers in DuckDB support. This is planned for the future.
## AWS Extension {#docs:stable:core_extensions:aws}
The `aws` extension adds functionality (e.g., authentication) on top of the `httpfs` extension's [S3 capabilities](#docs:stable:core_extensions:httpfs:overview::s3-api), using the AWS SDK.
#### Installing and Loading {#docs:stable:core_extensions:aws::installing-and-loading}
The `aws` extension will be transparently [autoloaded](#docs:stable:core_extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
```sql
INSTALL aws;
LOAD aws;
```
> In most cases, the `aws` extension works in conjunction with the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview).
##### `config` Provider {#docs:stable:core_extensions:aws::config-provider}
The default provider, `config` (i.e., user-configured), allows access to the S3 bucket by manually providing a key. For example:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER config,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
REGION '⟨us-east-1⟩'
);
```
> **Tip.** If you get an IO Error (`Connection error for HTTP HEAD`), configure the endpoint explicitly via `ENDPOINT 's3.⟨your_region⟩.amazonaws.com'`{:.language-sql .highlight}.
Now, to query using the above secret, simply query any `s3://` prefixed file:
```sql
SELECT *
FROM 's3://⟨your-bucket⟩/⟨your_file⟩.parquet';
```
##### `credential_chain` Provider {#docs:stable:core_extensions:aws::credential_chain-provider}
The `credential_chain` provider allows automatically fetching credentials using mechanisms provided by the AWS SDK. For example, to use the AWS SDK default provider:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain
);
```
Again, to query a file using the above secret, simply query any `s3://` prefixed file.
DuckDB also allows specifying a specific chain using the `CHAIN` keyword. This takes a semicolon-separated list (`a;b;c`) of providers that will be tried in order. For example:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain,
CHAIN 'env;config'
);
```
The possible values for `CHAIN` are the following:
* [`config`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html)
* [`sts`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_s_t_s_assume_role_web_identity_credentials_provider.html)
* [`sso`](https://aws.amazon.com/what-is/sso/)
* [`env`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html)
* [`instance`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html)
* [`process`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_process_credentials_provider.html)
The `credential_chain` provider also allows overriding the automatically fetched config. For example, to automatically load credentials, and then override the region, run:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain,
CHAIN config,
REGION '⟨eu-west-1⟩'
);
```
##### Validation {#docs:stable:core_extensions:aws::validation}
Since v1.4.0, the AWS `credential_chain` provider will look for any required credentials at `CREATE SECRET` time, failing if they are absent or unavailable.
Since v1.4.1, this behavior may be configured via the `VALIDATION` option as follows:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain,
VALIDATION 'exists'
);
```
Two validation modes are supported:
* `exists` (default) requires credentials to be present.
* `none` allows `CREATE SECRET` to succeed for a `credential_chain` with no available credentials (see the example after the note below).
> `VALIDATION 'exists'` validates only the __presence__ of a credential, __not its operational readiness__. Thus, no attempt is made to
> convert into an access token, or perform a read, write, etc.
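For example, to create a `credential_chain` secret even when no credentials are currently available (e.g., when they will only be provided later in your environment), validation can be turned off:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain,
VALIDATION 'none'
);
```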
##### Auto-Refresh {#docs:stable:core_extensions:aws::auto-refresh}
Some AWS endpoints require periodic refreshing of the credentials.
This can be specified with the `REFRESH auto` option:
```sql
CREATE SECRET env_test (
TYPE s3,
PROVIDER credential_chain,
REFRESH auto
);
```
#### Legacy Features {#docs:stable:core_extensions:aws::legacy-features}
> **Deprecated.** The `load_aws_credentials` function is deprecated.
Prior to version 0.10.0, DuckDB did not have a [Secrets manager](#docs:stable:sql:statements:create_secret). To load the credentials automatically, the AWS extension provided
a special function that loads the AWS credentials using the [legacy authentication method](#docs:stable:core_extensions:httpfs:s3api_legacy_authentication).
| Function | Type | Description |
|---|---|-------|
| `load_aws_credentials` | `PRAGMA` function | Loads the AWS credentials through the [AWS Default Credentials Provider Chain](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials-chain.html) |
##### Load AWS Credentials (Legacy) {#docs:stable:core_extensions:aws::load-aws-credentials-legacy}
To load the AWS credentials, run:
```sql
CALL load_aws_credentials();
```
| loaded_access_key_id | loaded_secret_access_key | loaded_session_token | loaded_region |
|----------------------|--------------------------|----------------------|---------------|
| AKIAIOSFODNN7EXAMPLE | `` | NULL | us-east-2 |
The function takes a string parameter to specify a specific profile:
```sql
CALL load_aws_credentials('minio-testing-2');
```
| loaded_access_key_id | loaded_secret_access_key | loaded_session_token | loaded_region |
|----------------------|--------------------------|----------------------|---------------|
| minio_duckdb_user_2 | `` | NULL | NULL |
There are several parameters to tweak the behavior of the call:
```sql
CALL load_aws_credentials('minio-testing-2', set_region = false, redact_secret = false);
```
| loaded_access_key_id | loaded_secret_access_key | loaded_session_token | loaded_region |
|----------------------|------------------------------|----------------------|---------------|
| minio_duckdb_user_2 | minio_duckdb_user_password_2 | NULL | NULL |
## Azure Extension {#docs:stable:core_extensions:azure}
The `azure` extension is a loadable extension that adds a filesystem abstraction for the [Azure Blob storage](https://azure.microsoft.com/en-us/products/storage/blobs) to DuckDB.
#### Installing and Loading {#docs:stable:core_extensions:azure::installing-and-loading}
The `azure` extension will be transparently [autoloaded](#docs:stable:core_extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
```sql
INSTALL azure;
LOAD azure;
```
#### Usage {#docs:stable:core_extensions:azure::usage}
Once the [authentication](#::authentication) is set up, you can query Azure storage as follows:
##### Azure Blob Storage {#docs:stable:core_extensions:azure::azure-blob-storage}
Allowed URI schemes: `az` or `azure`
```sql
SELECT count(*)
FROM 'az://⟨my_container⟩/⟨path⟩/⟨my_file⟩.⟨parquet_or_csv⟩';
```
Globs are also supported:
```sql
SELECT *
FROM 'az://⟨my_container⟩/⟨path⟩/*.csv';
```
```sql
SELECT *
FROM 'az://⟨my_container⟩/⟨path⟩/**';
```
Or with a fully qualified path syntax:
```sql
SELECT count(*)
FROM 'az://⟨my_storage_account⟩.blob.core.windows.net/⟨my_container⟩/⟨path⟩/⟨my_file⟩.⟨parquet_or_csv⟩';
```
```sql
SELECT *
FROM 'az://⟨my_storage_account⟩.blob.core.windows.net/⟨my_container⟩/⟨path⟩/*.csv';
```
##### Azure Data Lake Storage (ADLS) {#docs:stable:core_extensions:azure::azure-data-lake-storage-adls}
Allowed URI schemes: `abfss`
```sql
SELECT count(*)
FROM 'abfss://⟨my_filesystem⟩/⟨path⟩/⟨my_file⟩.⟨parquet_or_csv⟩';
```
Globs are also supported:
```sql
SELECT *
FROM 'abfss://⟨my_filesystem⟩/⟨path⟩/*.csv';
```
```sql
SELECT *
FROM 'abfss://⟨my_filesystem⟩/⟨path⟩/**';
```
Or with a fully qualified path syntax:
```sql
SELECT count(*)
FROM 'abfss://⟨my_storage_account⟩.dfs.core.windows.net/⟨my_filesystem⟩/⟨path⟩/⟨my_file⟩.⟨parquet_or_csv⟩';
```
```sql
SELECT *
FROM 'abfss://⟨my_storage_account⟩.dfs.core.windows.net/⟨my_filesystem⟩/⟨path⟩/*.csv';
```
#### Configuration {#docs:stable:core_extensions:azure::configuration}
Use the following [configuration options](#docs:stable:configuration:overview) to configure how the extension reads remote files:
| Name | Description | Type | Default |
|:---|:---|:---|:---|
| `azure_http_stats` | Include HTTP info from Azure Storage in the [`EXPLAIN ANALYZE` statement](#docs:stable:dev:profiling). | `BOOLEAN` | `false` |
| `azure_read_transfer_concurrency` | Maximum number of threads the Azure client can use for a single parallel read. If `azure_read_transfer_chunk_size` is less than `azure_read_buffer_size` then setting this > 1 will allow the Azure client to do concurrent requests to fill the buffer. | `BIGINT` | `5` |
| `azure_read_transfer_chunk_size` | Maximum size in bytes that the Azure client will read in a single request. It is recommended that this is a factor of `azure_read_buffer_size`. | `BIGINT` | `1024*1024` |
| `azure_read_buffer_size` | Size of the read buffer. It is recommended that this is evenly divisible by `azure_read_transfer_chunk_size`. | `UBIGINT` | `1024*1024` |
| `azure_transport_option_type` | Underlying [adapter](https://github.com/Azure/azure-sdk-for-cpp/blob/main/doc/HttpTransportAdapter.md) to use in the Azure SDK. Valid values are: `default` or `curl`. | `VARCHAR` | `default` |
| `azure_context_caching` | Enable/disable the caching of the underlying Azure SDK HTTP connection in the DuckDB connection context when performing queries. If you suspect that this is causing some side effect, you can try to disable it by setting it to false (not recommended). | `BOOLEAN` | `true` |
> Setting `azure_transport_option_type` explicitly to `curl` will have the following effect:
> * On Linux, this may solve certificate issues (`Error: Invalid Error: Fail to get a new connection for: https://storage_account_name.blob.core.windows.net/. Problem with the SSL CA cert (path? access rights?)`), because when `curl` is specified, the extension will try to find the certificate bundle in various paths (which is not done by *curl* by default and might be wrong due to static linking).
> * On Windows, this replaces the default adapter (*WinHTTP*), allowing you to use all *curl* capabilities (for example, using a SOCKS proxy).
> * On all operating systems, it will honor the following environment variables:
> * `CURL_CA_INFO`: Path to a PEM encoded file containing the certificate authorities sent to libcurl. Note that this option is known to only work on Linux and might throw if set on other platforms.
> * `CURL_CA_PATH`: Path to a directory which holds PEM encoded files, containing the certificate authorities sent to libcurl.
Example:
```sql
SET azure_http_stats = false;
SET azure_read_transfer_concurrency = 5;
SET azure_read_transfer_chunk_size = 1_048_576;
SET azure_read_buffer_size = 1_048_576;
```
#### Authentication {#docs:stable:core_extensions:azure::authentication}
The Azure extension has two ways to configure the authentication. The preferred way is to use Secrets.
##### Authentication with Secret {#docs:stable:core_extensions:azure::authentication-with-secret}
Multiple [Secret Providers](#docs:stable:configuration:secrets_manager::secret-providers) are available for the Azure extension:
* If you need to define different secrets for different storage accounts, use the [`SCOPE` configuration](#docs:stable:configuration:secrets_manager::creating-multiple-secrets-for-the-same-service-type). Note that the `SCOPE` requires a trailing slash (`SCOPE 'azure://some_container/'`); see the example after this list.
* If you use a fully qualified path, then the `ACCOUNT_NAME` attribute is optional.
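For instance, to restrict a secret to a single container, a `SCOPE` can be added to its definition. A sketch (the connection string and container name are placeholders):
```sql
CREATE SECRET azure_scoped (
TYPE azure,
CONNECTION_STRING '⟨value⟩',
SCOPE 'azure://⟨my_container⟩/'
);
```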
###### `CONFIG` Provider {#docs:stable:core_extensions:azure::config-provider}
The default provider, `CONFIG` (i.e., user-configured), allows access to the storage account using a connection string or anonymously. For example:
```sql
CREATE SECRET secret1 (
TYPE azure,
CONNECTION_STRING '⟨value⟩'
);
```
If you do not use authentication, you still need to specify the storage account name. For example:
```sql
CREATE SECRET secret2 (
TYPE azure,
PROVIDER config,
ACCOUNT_NAME '⟨storage_account_name⟩'
);
```
The default `PROVIDER` is `CONFIG`.
###### `credential_chain` Provider {#docs:stable:core_extensions:azure::credential_chain-provider}
The `credential_chain` provider allows connecting using credentials automatically fetched by the Azure SDK via the Azure credential chain.
By default, the `DefaultAzureCredential` chain is used, which tries credentials according to the order specified by the [Azure documentation](https://learn.microsoft.com/en-us/javascript/api/@azure/identity/defaultazurecredential?view=azure-node-latest#@azure-identity-defaultazurecredential-constructor).
For example:
```sql
CREATE SECRET secret3 (
TYPE azure,
PROVIDER credential_chain,
ACCOUNT_NAME '⟨storage_account_name⟩'
);
```
DuckDB also allows specifying a specific chain using the `CHAIN` keyword. This takes a semicolon-separated list (`a;b;c`) of providers that will be tried in order. For example:
```sql
CREATE SECRET secret4 (
TYPE azure,
PROVIDER credential_chain,
CHAIN 'cli;env',
ACCOUNT_NAME '⟨storage_account_name⟩'
);
```
The possible values for `CHAIN` are the following:
* [`cli`](https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli)
* [`managed_identity`](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview)
* [`workload_identity`](https://learn.microsoft.com/en-us/entra/workload-id/workload-identities-overview)
* [`env`](https://github.com/Azure/azure-sdk-for-cpp/blob/azure-identity_1.6.0/sdk/identity/azure-identity/README.md#environment-variables)
* [`default`](https://github.com/Azure/azure-sdk-for-cpp/blob/azure-identity_1.6.0/sdk/identity/azure-identity/README.md#defaultazurecredential)
If no explicit `CHAIN` is provided, the default is [`default`](https://github.com/Azure/azure-sdk-for-cpp/blob/azure-identity_1.6.0/sdk/identity/azure-identity/README.md#defaultazurecredential).
###### Managed Identity {#docs:stable:core_extensions:azure::managed-identity}
Managed Identity (MI) can be used gracefully and automatically via the `credential_chain`. In typical
cases where the executor has a single MI available, no configuration is needed.
If your execution environment has multiple Identities, use the `MANAGED_IDENTITY` provider and specify
which identity to use. This provider allows identity specification via one of
`CLIENT_ID`, `OBJECT_ID` or `RESOURCE_ID`, e.g.:
```sql
CREATE SECRET secret1 (
TYPE AZURE,
PROVIDER MANAGED_IDENTITY,
ACCOUNT_NAME '⟨storage account name⟩',
CLIENT_ID '⟨user-assigned managed identity client id⟩'
);
```
The provider may be used without specifying an ID; if only a single ID is available, this provider
will function identically to the `credential_chain` provider and use the single available ID. If
multiple IDs are available, the behavior is undefined (or, more precisely, defined by the Azure SDK);
therefore, we recommend explicitly setting the identity in this situation.
###### `SERVICE_PRINCIPAL` Provider {#docs:stable:core_extensions:azure::service_principal-provider}
The `SERVICE_PRINCIPAL` provider allows connecting using an [Azure Service Principal (SPN)](https://learn.microsoft.com/en-us/entra/architecture/service-accounts-principal).
Either with a secret:
```sql
CREATE SECRET azure_spn (
TYPE azure,
PROVIDER service_principal,
TENANT_ID '⟨tenant_id⟩',
CLIENT_ID '⟨client_id⟩',
CLIENT_SECRET '⟨client_secret⟩',
ACCOUNT_NAME '⟨storage_account_name⟩'
);
```
Or with a certificate:
```sql
CREATE SECRET azure_spn_cert (
TYPE azure,
PROVIDER service_principal,
TENANT_ID '⟨tenant_id⟩',
CLIENT_ID '⟨client_id⟩',
CLIENT_CERTIFICATE_PATH '⟨client_cert_path⟩',
ACCOUNT_NAME '⟨storage_account_name⟩'
);
```
###### Configuring a Proxy {#docs:stable:core_extensions:azure::configuring-a-proxy}
To configure proxy information when using secrets, you can add `HTTP_PROXY`, `PROXY_USER_NAME`, and `PROXY_PASSWORD` in the secret definition. For example:
```sql
CREATE SECRET secret5 (
TYPE azure,
CONNECTION_STRING '⟨value⟩',
HTTP_PROXY 'http://localhost:3128',
PROXY_USER_NAME 'john',
PROXY_PASSWORD 'doe'
);
```
> * When using secrets, the `HTTP_PROXY` environment variable will still be honored unless you provide an explicit value for it.
> * When using secrets, the variables set via the deprecated *Authentication with variables* method will be ignored.
> * For the Azure `credential_chain` provider, the actual token is fetched at query time, not when the secret is created.
##### Authentication with Variables (Deprecated) {#docs:stable:core_extensions:azure::authentication-with-variables-deprecated}
```sql
SET variable_name = variable_value;
```
Where `variable_name` can be one of the following:
| Name | Description | Type | Default |
|:---|:---|:---|:---|
| `azure_storage_connection_string` | Azure connection string, used for authenticating and configuring Azure requests. | `STRING` | - |
| `azure_account_name` | Azure account name. When set, the extension will attempt to automatically detect credentials (not used if you pass the connection string). | `STRING` | - |
| `azure_endpoint` | Override the Azure endpoint for when the Azure credential providers are used. | `STRING` | `blob.core.windows.net` |
| `azure_credential_chain`| Ordered list of Azure credential providers, in string format separated by `;`. For example: `'cli;managed_identity;env'`. See the list of possible values in the [`credential_chain` provider section](#::credential_chain-provider). Not used if you pass the connection string. | `STRING` | - |
| `azure_http_proxy` | Proxy to use when logging in and performing requests to Azure. | `STRING` | `HTTP_PROXY` environment variable (if set). |
| `azure_proxy_user_name` | HTTP proxy username if needed. | `STRING` | - |
| `azure_proxy_password` | HTTP proxy password if needed. | `STRING` | - |
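For example, to authenticate with a connection string using this deprecated method (the value is a placeholder):
```sql
SET azure_storage_connection_string = '⟨value⟩';
```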
#### Additional Information {#docs:stable:core_extensions:azure::additional-information}
##### Logging {#docs:stable:core_extensions:azure::logging}
The Azure extension relies on the Azure SDK to connect to Azure Blob storage and supports printing the SDK logs to the console.
To control the log level, set the [`AZURE_LOG_LEVEL`](https://github.com/Azure/azure-sdk-for-cpp/blob/main/sdk/core/azure-core/README.md#sdk-log-messages) environment variable.
For instance, verbose logs can be enabled in Python as follows:
```python
import os
import duckdb
os.environ["AZURE_LOG_LEVEL"] = "verbose"
duckdb.sql("CREATE SECRET myaccount (TYPE azure, PROVIDER credential_chain, SCOPE 'az://myaccount.blob.core.windows.net/')")
duckdb.sql("SELECT count(*) FROM 'az://myaccount.blob.core.windows.net/path/to/blob.parquet'")
```
##### Difference between ADLS and Blob Storage {#docs:stable:core_extensions:azure::difference-between-adls-and-blob-storage}
Even though ADLS implements similar functionality to Blob storage, there are some important performance benefits to using the ADLS endpoint for globbing, especially when using (complex) glob patterns.
To demonstrate, let's look at an example of how a glob is performed internally using the Blob and ADLS endpoints, respectively.
Using the following filesystem:
```text
root
├── l_receipmonth=1997-10
│   ├── l_shipmode=AIR
│   │   └── data_0.csv
│   ├── l_shipmode=SHIP
│   │   └── data_0.csv
│   └── l_shipmode=TRUCK
│       └── data_0.csv
├── l_receipmonth=1997-11
│   ├── l_shipmode=AIR
│   │   └── data_0.csv
│   ├── l_shipmode=SHIP
│   │   └── data_0.csv
│   └── l_shipmode=TRUCK
│       └── data_0.csv
└── l_receipmonth=1997-12
    ├── l_shipmode=AIR
    │   └── data_0.csv
    ├── l_shipmode=SHIP
    │   └── data_0.csv
    └── l_shipmode=TRUCK
        └── data_0.csv
```
The following query is performed through the Blob endpoint:
```sql
SELECT count(*)
FROM 'az://root/l_receipmonth=1997-*/l_shipmode=SHIP/*.csv';
```
It will perform the following steps:
* List all the files with the prefix `root/l_receipmonth=1997-`
* `root/l_receipmonth=1997-10/l_shipmode=SHIP/data_0.csv`
* `root/l_receipmonth=1997-10/l_shipmode=AIR/data_0.csv`
* `root/l_receipmonth=1997-10/l_shipmode=TRUCK/data_0.csv`
* `root/l_receipmonth=1997-11/l_shipmode=SHIP/data_0.csv`
* `root/l_receipmonth=1997-11/l_shipmode=AIR/data_0.csv`
* `root/l_receipmonth=1997-11/l_shipmode=TRUCK/data_0.csv`
* `root/l_receipmonth=1997-12/l_shipmode=SHIP/data_0.csv`
* `root/l_receipmonth=1997-12/l_shipmode=AIR/data_0.csv`
* `root/l_receipmonth=1997-12/l_shipmode=TRUCK/data_0.csv`
* Filter the result with the requested pattern `root/l_receipmonth=1997-*/l_shipmode=SHIP/*.csv`
* `root/l_receipmonth=1997-10/l_shipmode=SHIP/data_0.csv`
* `root/l_receipmonth=1997-11/l_shipmode=SHIP/data_0.csv`
* `root/l_receipmonth=1997-12/l_shipmode=SHIP/data_0.csv`
Meanwhile, the same query can be performed through the datalake endpoint as follows:
```sql
SELECT count(*)
FROM 'abfss://root/l_receipmonth=1997-*/l_shipmode=SHIP/*.csv';
```
This will perform the following steps:
* List all directories in `root/`
* `root/l_receipmonth=1997-10`
* `root/l_receipmonth=1997-11`
* `root/l_receipmonth=1997-12`
* Filter and list subdirectories: `root/l_receipmonth=1997-10`, `root/l_receipmonth=1997-11`, `root/l_receipmonth=1997-12`
* `root/l_receipmonth=1997-10/l_shipmode=SHIP`
* `root/l_receipmonth=1997-10/l_shipmode=AIR`
* `root/l_receipmonth=1997-10/l_shipmode=TRUCK`
* `root/l_receipmonth=1997-11/l_shipmode=SHIP`
* `root/l_receipmonth=1997-11/l_shipmode=AIR`
* `root/l_receipmonth=1997-11/l_shipmode=TRUCK`
* `root/l_receipmonth=1997-12/l_shipmode=SHIP`
* `root/l_receipmonth=1997-12/l_shipmode=AIR`
* `root/l_receipmonth=1997-12/l_shipmode=TRUCK`
* Filter and list subdirectories: `root/l_receipmonth=1997-10/l_shipmode=SHIP`, `root/l_receipmonth=1997-11/l_shipmode=SHIP`, `root/l_receipmonth=1997-12/l_shipmode=SHIP`
* `root/l_receipmonth=1997-10/l_shipmode=SHIP/data_0.csv`
* `root/l_receipmonth=1997-11/l_shipmode=SHIP/data_0.csv`
* `root/l_receipmonth=1997-12/l_shipmode=SHIP/data_0.csv`
As you can see, because the Blob endpoint does not support the notion of directories, the filter can only be applied after listing all files with the common prefix, whereas the ADLS endpoint lists directories level by level and filters at each level. Especially with higher partition/directory counts, the performance difference can be very significant.
## Delta Extension {#docs:stable:core_extensions:delta}
The `delta` extension adds support for the [Delta Lake open-source storage format](https://delta.io/). It is built using the [Delta Kernel](https://github.com/delta-incubator/delta-kernel-rs). The extension offers **read support** for Delta tables, both local and remote.
For implementation details, see the [announcement blog post](https://duckdb.org/2024/06/10/delta).
> **Warning.** The `delta` extension is currently experimental and is [only supported on given platforms](#::supported-duckdb-versions-and-platforms).
#### Installing and Loading {#docs:stable:core_extensions:delta::installing-and-loading}
The `delta` extension will be transparently [autoloaded](#docs:stable:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
```sql
INSTALL delta;
LOAD delta;
```
#### Usage {#docs:stable:core_extensions:delta::usage}
To scan a local Delta table, run:
```sql
SELECT *
FROM delta_scan('file:///some/path/on/local/machine');
```
##### Reading from an S3 Bucket {#docs:stable:core_extensions:delta::reading-from-an-s3-bucket}
To scan a Delta table in an [S3 bucket](#docs:stable:core_extensions:httpfs:s3api), run:
```sql
SELECT *
FROM delta_scan('s3://some/delta/table');
```
For authenticating to S3 buckets, DuckDB [Secrets](#docs:stable:configuration:secrets_manager) are supported:
```sql
CREATE SECRET (
TYPE s3,
PROVIDER credential_chain
);
SELECT *
FROM delta_scan('s3://some/delta/table/with/auth');
```
To scan public buckets on S3, you may need to pass the correct region by creating a secret containing the region of your public S3 bucket:
```sql
CREATE SECRET (
TYPE s3,
REGION 'my-region'
);
SELECT *
FROM delta_scan('s3://some/public/table/in/my-region');
```
##### Reading from Azure Blob Storage {#docs:stable:core_extensions:delta::reading-from-azure-blob-storage}
To scan a Delta table in an [Azure Blob Storage bucket](#docs:stable:core_extensions:azure::azure-blob-storage), run:
```sql
SELECT *
FROM delta_scan('az://my-container/my-table');
```
For authenticating to Azure Blob Storage, DuckDB [Secrets](#docs:stable:configuration:secrets_manager) are supported:
```sql
CREATE SECRET (
TYPE azure,
PROVIDER credential_chain
);
SELECT *
FROM delta_scan('az://my-container/my-table-with-auth');
```
#### Features {#docs:stable:core_extensions:delta::features}
While the `delta` extension is still experimental, many (scanning) features and optimizations are already supported:
* multithreaded scans and Parquet metadata reading
* data skipping/filter pushdown
* skipping row groups in file (based on Parquet metadata)
* skipping complete files (based on Delta partition information)
* projection pushdown
* scanning tables with deletion vectors
* all primitive types
* structs
* S3 support with secrets
More optimizations are going to be released in the future.
#### Supported DuckDB Versions and Platforms {#docs:stable:core_extensions:delta::supported-duckdb-versions-and-platforms}
The `delta` extension requires DuckDB version 0.10.3 or newer.
The `delta` extension currently only supports the following platforms:
* Linux (x86_64 and ARM64): `linux_amd64` and `linux_arm64`
* macOS Intel and Apple Silicon: `osx_amd64` and `osx_arm64`
* Windows AMD64: `windows_amd64`
Support for the [other DuckDB platforms](#docs:stable:extensions:extension_distribution::platforms) is work-in-progress.
## DuckLake {#docs:stable:core_extensions:ducklake}
> DuckLake was released in May 2025.
> Read the [announcement blog post](https://duckdb.org/2025/05/27/ducklake).
The `ducklake` extension adds support for attaching to databases stored in the [DuckLake format](http://ducklake.select/).
The complete documentation of this extension is available at the [DuckLake website](https://ducklake.select/docs/stable/duckdb/introduction).
#### Installing and Loading {#docs:stable:core_extensions:ducklake::installing-and-loading}
To install `ducklake`, run:
```sql
INSTALL ducklake;
```
The `ducklake` extension will be transparently [autoloaded](#docs:stable:core_extensions:overview::autoloading-extensions) on first use in an `ATTACH` clause.
If you would like to load it manually, run:
```sql
LOAD ducklake;
```
#### Usage {#docs:stable:core_extensions:ducklake::usage}
```sql
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake (DATA_PATH 'data_files');
USE my_ducklake;
```
#### Tables {#docs:stable:core_extensions:ducklake::tables}
In DuckDB, the `ducklake` extension stores the [catalog tables](http://ducklake.select/docs/stable/specification/tables/overview) for a DuckLake named `my_ducklake` in the `__ducklake_metadata_⟨my_ducklake⟩`{:.language-sql .highlight} catalog.
#### Functions {#docs:stable:core_extensions:ducklake::functions}
Note that DuckLake registers several functions.
These should be called with the catalog name as the first argument, e.g.:
```sql
FROM ducklake_snapshots('my_ducklake');
```
```text
┌─────────────┬────────────────────────────┬────────────────┬──────────────────────────┐
│ snapshot_id │       snapshot_time        │ schema_version │         changes          │
│    int64    │  timestamp with time zone  │     int64      │ map(varchar, varchar[])  │
├─────────────┼────────────────────────────┼────────────────┼──────────────────────────┤
│           0 │ 2025-05-26 11:41:10.838+02 │              0 │ {schemas_created=[main]} │
└─────────────┴────────────────────────────┴────────────────┴──────────────────────────┘
```
##### `ducklake_snapshots` {#docs:stable:core_extensions:ducklake::ducklake_snapshots}
Returns the snapshots stored in the DuckLake catalog named `catalog`.
| Parameter name | Parameter type | Named parameter | Description |
| -------------- | -------------- | --------------- | ----------- |
| `catalog` | `VARCHAR` | no | |
The information is encoded into a table with the following schema:
| Column name | Column type |
| ---------------- | -------------------------- |
| `snapshot_id` | `BIGINT` |
| `snapshot_time` | `TIMESTAMP WITH TIME ZONE` |
| `schema_version` | `BIGINT` |
| `changes` | `MAP(VARCHAR, VARCHAR[])` |
##### `ducklake_table_info` {#docs:stable:core_extensions:ducklake::ducklake_table_info}
The `ducklake_table_info` function returns information on the tables stored in the DuckLake catalog named `catalog`.
| Parameter name | Parameter type | Named parameter | Description |
| -------------- | -------------- | --------------- | ----------- |
| `catalog` | `VARCHAR` | no | |
The information is encoded into a table with the following schema:
| Column name | Column type |
| ------------------------ | ----------- |
| `table_name` | `VARCHAR` |
| `schema_id` | `BIGINT` |
| `table_id` | `BIGINT` |
| `table_uuid` | `UUID` |
| `file_count` | `BIGINT` |
| `file_size_bytes` | `BIGINT` |
| `delete_file_count` | `BIGINT` |
| `delete_file_size_bytes` | `BIGINT` |
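For example, for the `my_ducklake` catalog attached above, the table-level statistics can be listed as follows:
```sql
FROM ducklake_table_info('my_ducklake');
```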
##### `ducklake_table_insertions` {#docs:stable:core_extensions:ducklake::ducklake_table_insertions}
The `ducklake_table_insertions` function returns the rows inserted in a given table between snapshots of given versions or timestamps.
The function has two variants, depending on whether `start_snapshot` and `end_snapshot` have types `BIGINT` or `TIMESTAMP WITH TIME ZONE`.
| Parameter name | Parameter type | Named parameter | Description |
| ---------------- | ------------------------------------- | --------------- | ----------- |
| `catalog` | `VARCHAR` | no | |
| `schema_name` | `VARCHAR` | no | |
| `table_name` | `VARCHAR` | no | |
| `start_snapshot` | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no | |
| `end_snapshot` | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no | |
The schema of the table returned by the function is equivalent to that of the table `table_name`.
##### `ducklake_table_deletions` {#docs:stable:core_extensions:ducklake::ducklake_table_deletions}
The `ducklake_table_deletions` function returns the rows deleted from a given table between snapshots of given versions or timestamps.
The function has two variants, depending on whether `start_snapshot` and `end_snapshot` have types `BIGINT` or `TIMESTAMP WITH TIME ZONE`.
| Parameter name | Parameter type | Named parameter | Description |
| ---------------- | ------------------------------------- | --------------- | ----------- |
| `catalog` | `VARCHAR` | no | |
| `schema_name` | `VARCHAR` | no | |
| `table_name` | `VARCHAR` | no | |
| `start_snapshot` | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no | |
| `end_snapshot` | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no | |
The schema of the table returned by the function is equivalent to that of the table `table_name`.
##### `ducklake_table_changes` {#docs:stable:core_extensions:ducklake::ducklake_table_changes}
The `ducklake_table_changes` function returns the rows changed in a given table between snapshots of given versions or timestamps.
The function has two variants, depending on whether `start_snapshot` and `end_snapshot` have types `BIGINT` or `TIMESTAMP WITH TIME ZONE`.
| Parameter name | Parameter type | Named parameter | Description |
| ---------------- | ------------------------------------- | --------------- | ----------- |
| `catalog` | `VARCHAR` | no | |
| `schema_name` | `VARCHAR` | no | |
| `table_name` | `VARCHAR` | no | |
| `start_snapshot` | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no | |
| `end_snapshot` | `BIGINT` / `TIMESTAMP WITH TIME ZONE` | no | |
The schema of the table returned by the function contains the following three columns plus the schema of the table `table_name`.
| Column name | Column type | Description |
| ------------- | ----------- | ---------------------------------------- |
| `snapshot_id` | `BIGINT` | |
| `rowid` | `BIGINT` | |
| `change_type` | `VARCHAR` | The type of change: `insert` or `delete` |
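For example, the following sketch uses the `BIGINT` variant to list the changes made to a hypothetical table `⟨my_table⟩` in the `main` schema of `my_ducklake` between snapshots 0 and 2:
```sql
FROM ducklake_table_changes('my_ducklake', 'main', '⟨my_table⟩', 0, 2);
```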
#### Commands {#docs:stable:core_extensions:ducklake::commands}
##### `ducklake_cleanup_old_files` {#docs:stable:core_extensions:ducklake::ducklake_cleanup_old_files}
The `ducklake_cleanup_old_files` function cleans up old files in the DuckLake denoted by `catalog`.
Upon success, it returns a table with a single column (`Success`) and 0 rows.
| Parameter name | Parameter type | Named parameter | Description |
| -------------- | -------------------------- | --------------- | ----------- |
| `catalog` | `VARCHAR` | no | |
| `cleanup_all` | `BOOLEAN` | yes | |
| `dry_run` | `BOOLEAN` | yes | |
| `older_than` | `TIMESTAMP WITH TIME ZONE` | yes | |
##### `ducklake_expire_snapshots` {#docs:stable:core_extensions:ducklake::ducklake_expire_snapshots}
The `ducklake_expire_snapshots` function expires snapshots with the versions specified by the `versions` parameter or the ones older than the `older_than` parameter.
Upon success, it returns a table with a single column (`Success`) and 0 rows.
| Parameter name | Parameter type | Named parameter | Description |
| -------------- | -------------------------- | --------------- | ----------- |
| `catalog` | `VARCHAR` | no | |
| `versions` | `UBIGINT[]` | yes | |
| `older_than` | `TIMESTAMP WITH TIME ZONE` | yes | |
##### `ducklake_merge_adjacent_files` {#docs:stable:core_extensions:ducklake::ducklake_merge_adjacent_files}
The `ducklake_merge_adjacent_files` function merges adjacent files in the storage.
Upon success, it returns a table with a single column (`Success`) and 0 rows.
| Parameter name | Parameter type | Named parameter | Description |
| -------------- | -------------- | --------------- | ----------- |
| `catalog` | `VARCHAR` | no | |
#### Compatibility Matrix {#docs:stable:core_extensions:ducklake::compatibility-matrix}
The DuckLake specification and the `ducklake` DuckDB extension are currently released together. This may not be the case in the future, where the specification and the extension may have different release cadences. It can also be the case that the extension needs a DuckDB core update, therefore DuckDB versions are also included in this compatibility matrix.
| DuckDB | DuckLake Extension | DuckLake Spec |
|--------|--------------------|---------------|
| 1.4.x | 0.3 | 0.3 |
| 1.3.x | 0.2 | 0.2 |
| 1.3.x | 0.1 | 0.1 |
## Encodings Extension {#docs:stable:core_extensions:encodings}
> **Warning.** The Encodings extension will be available with the release of DuckDB v1.3.0.
The `encodings` extension adds support for reading CSVs using more than 1,000 character encodings.
- For a complete list of supported `encodings`, see [All Supported Encodings](#::all-supported-encodings).
- For detailed information on character encoding, see the [ICU data repository](https://github.com/unicode-org/icu-data/tree/main/charset/data/ucm).
##### Installing and Loading {#docs:stable:core_extensions:encodings::installing-and-loading}
```sql
INSTALL encodings;
LOAD encodings;
```
#### Usage {#docs:stable:core_extensions:encodings::usage}
Specify the `encoding` option when reading from files.
To read a `.csv` file with `shift_jis` encoding:
```sql
FROM read_csv('my_shift_jis.csv', encoding = 'shift_jis');
```
To read a `.csv` file with `windows-1251-2000` encoding:
```sql
FROM read_csv('my_windows_1251_2000.csv', encoding = 'windows-1251-2000');
```
To read a `.csv` file with `windows-1252-2000` encoding:
```sql
FROM read_csv('my_windows_1252_2000.csv', encoding = 'windows-1252-2000');
```
To read a `.csv` file with `EUC_CN` encoding:
```sql
FROM read_csv('my_euc_cn.csv', encoding = 'EUC_CN');
```
#### All Supported Encodings {#docs:stable:core_extensions:encodings::all-supported-encodings}
The following table is an alphabetized list of all `encoding` values supported by DuckDB using the `encodings` core extension:
| Id | Encoding |
|-----:|:------------------------------|
|1|`5601`|
|2|`8859_1`|
|3|`8859_10`|
|4|`8859_15`|
|5|`8859_2`|
|6|`8859_3`|
|7|`8859_4`|
|8|`8859_5`|
|9|`8859_6`|
|10|`8859_7`|
|11|`8859_8`|
|12|`8859_9`|
|13|`aix-IBM_udcJP-4.3.6`|
|14|`ANSI_X3.110`|
|15|`ascii`|
|16|`ASMO_449`|
|17|`BALTIC`|
|18|`big5`|
|19|`CNS11643.1986_1`|
|20|`CNS11643.1986_2`|
|21|`CNS-11643-1992`|
|22|`cp037`|
|23|`cp1026`|
|24|`CP1250`|
|25|`CP1251`|
|26|`CP1252`|
|27|`CP1253`|
|28|`CP1254`|
|29|`CP1255`|
|30|`CP1256`|
|31|`CP1257`|
|32|`CP1258`|
|33|`cp273`|
|34|`cp424`|
|35|`cp437`|
|36|`cp500`|
|37|`CP737`|
|38|`CP775`|
|39|`cp850`|
|40|`cp852`|
|41|`cp855`|
|42|`cp857`|
|43|`cp860`|
|44|`cp861`|
|45|`cp862`|
|46|`cp863`|
|47|`cp864`|
|48|`cp865`|
|49|`cp866`|
|50|`cp869`|
|51|`cp949`|
|52|`CSN_369103`|
|53|`CWI`|
|54|`DEC_MCS`|
|55|`EBCDIC_AT_DE`|
|56|`EBCDIC_AT_DE_A`|
|57|`EBCDIC_CA_FR`|
|58|`EBCDIC_DK_NO`|
|59|`EBCDIC_DK_NO_A`|
|60|`EBCDIC_ES`|
|61|`EBCDIC_ES_A`|
|62|`EBCDIC_ES_S`|
|63|`EBCDIC_FI_SE`|
|64|`EBCDIC_FI_SE_A`|
|65|`EBCDIC_FR`|
|66|`EBCDIC_IS_FRISS`|
|67|`EBCDIC_IT`|
|68|`EBCDIC_PT`|
|69|`EBCDIC_UK`|
|70|`EBCDIC_US`|
|71|`EUC_CN`|
|72|`EUC_JP`|
|73|`EUC_KR`|
|74|`EUC_TW`|
|75|`euc-jp-2007`|
|76|`eucTH`|
|77|`euc-tw-2014`|
|78|`gb18030`|
|79|`glibc-ANSI_X3.110-2.3.3`|
|80|`glibc-ARMSCII_8-2.3.3`|
|81|`glibc-BIG5-2.3.3`|
|82|`glibc-BIG5HKSCS-2.3.3`|
|83|`glibc-BS_4730-2.3.3`|
|84|`glibc-CP10007-2.3.3`|
|85|`glibc-CP1125-2.3.3`|
|86|`glibc-CP932-2.3.3`|
|87|`glibc-CSA_Z243.4_1985_1-2.3.3`|
|88|`glibc-CSA_Z243.4_1985_2-2.3.3`|
|89|`glibc-DIN_66003-2.3.3`|
|90|`glibc-DS_2089-2.3.3`|
|91|`glibc-ECMA_CYRILLIC-2.3.3`|
|92|`glibc-ES-2.3.3`|
|93|`glibc-ES2-2.3.3`|
|94|`glibc-EUC_CN-2.3.3`|
|95|`glibc-EUC_JP_MS-2.3.3`|
|96|`glibc-EUC_JP-2.3.3`|
|97|`glibc-EUC_KR-2.3.3`|
|98|`glibc-GB_1988_80-2.3.3`|
|99|`glibc-GBK-2.3.3`|
|100|`glibc-GEORGIAN_ACADEMY-2.3.3`|
|101|`glibc-GEORGIAN_PS-2.3.3`|
|102|`glibc-IBM1046-2.3.3`|
|103|`glibc-IBM1124-2.3.3`|
|104|`glibc-IBM1129-2.3.3`|
|105|`glibc-IBM1132-2.3.3`|
|106|`glibc-IBM1133-2.3.3`|
|107|`glibc-IBM1160-2.3.3`|
|108|`glibc-IBM1161-2.3.3`|
|109|`glibc-IBM1162-2.3.3`|
|110|`glibc-IBM1163-2.3.3`|
|111|`glibc-IBM1164-2.3.3`|
|112|`glibc-IBM856-2.3.3`|
|113|`glibc-IBM864-2.3.3`|
|114|`glibc-IBM866NAV-2.3.3`|
|115|`glibc-IBM870-2.3.3`|
|116|`glibc-IBM874-2.3.3`|
|117|`glibc-IBM922-2.3.3`|
|118|`glibc-IBM943-2.3.3`|
|119|`glibc-ISIRI_3342-2.3.3`|
|120|`glibc-ISO_5428-2.3.3`|
|121|`glibc-ISO_6937-2.3.3`|
|122|`glibc-ISO_8859_13-2.3.3`|
|123|`glibc-ISO_8859_16-2.3.3`|
|124|`glibc-ISO_8859_7-2.3.3`|
|125|`glibc-ISO_8859_8-2.3.3`|
|126|`glibc-ISO_IR_209-2.3.3`|
|127|`glibc-IT-2.3.3`|
|128|`glibc-JIS_C6220_1969_RO-2.3.3`|
|129|`glibc-JIS_C6229_1984_B-2.3.3`|
|130|`glibc-JOHAB-2.3.3`|
|131|`glibc-JUS_I.B1.002-2.3.3`|
|132|`glibc-KOI8_R-2.3.3`|
|133|`glibc-KOI8_T-2.3.3`|
|134|`glibc-KOI8_U-2.3.3`|
|135|`glibc-KSC5636-2.3.3`|
|136|`glibc-MAC_SAMI-2.3.3`|
|137|`glibc-MACINTOSH-2.3.3`|
|138|`glibc-MSZ_7795.3-2.3.3`|
|139|`glibc-NC_NC00_10-2.3.3`|
|140|`glibc-NF_Z_62_010_1973-2.3.3`|
|141|`glibc-NF_Z_62_010-2.3.3`|
|142|`glibc-NS_4551_1-2.3.3`|
|143|`glibc-NS_4551_2-2.3.3`|
|144|`glibc-PT154-2.3.3`|
|145|`glibc-PT-2.3.3`|
|146|`glibc-PT2-2.3.3`|
|147|`glibc-RK1048-2.3.3`|
|148|`glibc-SEN_850200_B-2.3.3`|
|149|`glibc-SEN_850200_C-2.3.3`|
|150|`glibc-SJIS-2.3.3`|
|151|`glibc-T.61_8BIT-2.3.3`|
|152|`glibc-UHC-2.3.3`|
|153|`glibc-VISCII-2.3.3`|
|154|`glibc-WIN_SAMI_2-2.3.3`|
|155|`GOST_19768_74`|
|156|`GREEK_CCITT`|
|157|`GREEK7`|
|158|`GREEK7_OLD`|
|159|`HP_ROMAN8`|
|160|`hpux-big5-11.11`|
|161|`hpux-cp1140-11.11`|
|162|`hpux-cp1141-11.11`|
|163|`hpux-cp1142-11.11`|
|164|`hpux-cp1143-11.11`|
|165|`hpux-cp1144-11.11`|
|166|`hpux-cp1145-11.11`|
|167|`hpux-cp1146-11.11`|
|168|`hpux-cp1147-11.11`|
|169|`hpux-cp1148-11.11`|
|170|`hpux-cp1149-11.11`|
|171|`hpux-cp1250-11.11`|
|172|`hpux-cp1251-11.11`|
|173|`hpux-cp1252-11.11`|
|174|`hpux-cp1253-11.11`|
|175|`hpux-cp1254-11.11`|
|176|`hpux-cp1255-11.11`|
|177|`hpux-cp1256-11.11`|
|178|`hpux-cp1257-11.11`|
|179|`hpux-cp1258-11.11`|
|180|`hpux-cp437-11.11`|
|181|`hpux-cp737-11.11`|
|182|`hpux-cp775-11.11`|
|183|`hpux-cp850-11.11`|
|184|`hpux-cp852-11.11`|
|185|`hpux-cp855-11.11`|
|186|`hpux-cp857-11.11`|
|187|`hpux-cp860-11.11`|
|188|`hpux-cp861-11.11`|
|189|`hpux-cp862-11.11`|
|190|`hpux-cp863-11.11`|
|191|`hpux-cp864-11.11`|
|192|`hpux-cp865-11.11`|
|193|`hpux-cp866-11.11`|
|194|`hpux-cp869-11.11`|
|195|`hpux-cp874-11.11`|
|196|`hpux-eucJP0201-11.11`|
|197|`hpux-eucJP-11.11`|
|198|`hpux-eucJPMS-11.11`|
|199|`hpux-eucKR-11.11`|
|200|`hpux-eucTW-11.11`|
|201|`hpux-greee-11.11`|
|202|`hpux-hkbig5-11.11`|
|203|`hpux-hp15CN-11.11`|
|204|`hpux-iso87-11.11`|
|205|`hpux-roc15-11.11`|
|206|`hpux-sjis0201-11.11`|
|207|`hpux-sjis-11.11`|
|208|`hpux-sjisMS-11.11`|
|209|`IBM_1046`|
|210|`IBM_1124`|
|211|`IBM_1129`|
|212|`IBM_1252`|
|213|`IBM_850`|
|214|`IBM_856`|
|215|`IBM_858`|
|216|`IBM_932`|
|217|`IBM_943`|
|218|`IBM_eucJP`|
|219|`IBM_eucKR`|
|220|`IBM_eucTW`|
|221|`IBM_udcJP_GR`|
|222|`IBM038`|
|223|`IBM1004`|
|224|`ibm-1004_P100-1995`|
|225|`ibm-1006_P100-1995`|
|226|`ibm-1006_X100-1995`|
|227|`ibm-1008_P100-1995`|
|228|`ibm-1008_X100-1995`|
|229|`ibm-1009_P100-1995`|
|230|`ibm-1010_P100-1995`|
|231|`ibm-1011_P100-1995`|
|232|`ibm-1012_P100-1995`|
|233|`ibm-1013_P100-1995`|
|234|`ibm-1014_P100-1995`|
|235|`ibm-1015_P100-1995`|
|236|`ibm-1016_P100-1995`|
|237|`ibm-1017_P100-1995`|
|238|`ibm-1018_P100-1995`|
|239|`ibm-1019_P100-1995`|
|240|`ibm-1020_P100-2003`|
|241|`ibm-1021_P100-2003`|
|242|`ibm-1023_P100-2003`|
|243|`ibm-1025_P100-1995`|
|244|`ibm-1026_P100-1995`|
|245|`ibm-1027_P100-1995`|
|246|`ibm-1040_P100-1995`|
|247|`ibm-1041_P100-1995`|
|248|`ibm-1042_P100-1995`|
|249|`ibm-1043_P100-1995`|
|250|`ibm-1046_X110-1999`|
|251|`IBM1047`|
|252|`ibm-1047_P100-1995`|
|253|`ibm-1051_P100-1999`|
|254|`ibm-1088_P100-1995`|
|255|`ibm-1089_P100-1995`|
|256|`ibm-1097_P100-1995`|
|257|`ibm-1097_X100-1995`|
|258|`ibm-1098_P100-1995`|
|259|`ibm-1098_X100-1995`|
|260|`ibm-1100_P100-2003`|
|261|`ibm-1101_P100-2003`|
|262|`ibm-1102_P100-2003`|
|263|`ibm-1103_P100-2003`|
|264|`ibm-1104_P100-2003`|
|265|`ibm-1105_P100-2003`|
|266|`ibm-1106_P100-2003`|
|267|`ibm-1107_P100-2003`|
|268|`ibm-1112_P100-1995`|
|269|`ibm-1114_P100-1995`|
|270|`ibm-1114_P100-2001`|
|271|`ibm-1115_P100-1995`|
|272|`ibm-1122_P100-1999`|
|273|`ibm-1123_P100-1995`|
|274|`ibm-1124_X100-1996`|
|275|`ibm-1125_P100-1997`|
|276|`ibm-1126_P100_P100-1997_U3`|
|277|`ibm-1126_P100-1997`|
|278|`ibm-1127_P100-2004`|
|279|`ibm-1129_P100-1997`|
|280|`ibm-1130_P100-1997`|
|281|`ibm-1131_P100-1997`|
|282|`ibm-1132_P100-1997`|
|283|`ibm-1132_P100-1998`|
|284|`ibm-1133_P100-1997`|
|285|`ibm-1137_P100-1999`|
|286|`ibm-1137_PMOD-1999`|
|287|`ibm-1140_P100-1997`|
|288|`ibm-1141_P100-1997`|
|289|`ibm-1142_P100-1997`|
|290|`ibm-1143_P100-1997`|
|291|`ibm-1144_P100-1997`|
|292|`ibm-1145_P100-1997`|
|293|`ibm-1146_P100-1997`|
|294|`ibm-1147_P100-1997`|
|295|`ibm-1148_P100-1997`|
|296|`ibm-1149_P100-1997`|
|297|`ibm-1153_P100-1999`|
|298|`ibm-1154_P100-1999`|
|299|`ibm-1155_P100-1999`|
|300|`ibm-1156_P100-1999`|
|301|`ibm-1157_P100-1999`|
|302|`ibm-1158_P100-1999`|
|303|`ibm-1159_P100-1999`|
|304|`ibm-1160_P100-1999`|
|305|`ibm-1161_P100-1999`|
|306|`ibm-1162_P100-1999`|
|307|`ibm-1163_P100-1999`|
|308|`ibm-1164_P100-1999`|
|309|`ibm-1165_P101-2000`|
|310|`ibm-1166_P100-2002`|
|311|`ibm-1167_P100-2002`|
|312|`ibm-1168_P100-2002`|
|313|`ibm-1174_X100-2007`|
|314|`ibm-1250_P100-1999`|
|315|`ibm-1251_P100-1995`|
|316|`ibm-1252_P100-2000`|
|317|`ibm-1253_P100-1995`|
|318|`ibm-1254_P100-1995`|
|319|`ibm-1255_P100-1995`|
|320|`ibm-1256_P110-1997`|
|321|`ibm-1257_P100-1995`|
|322|`ibm-1258_P100-1997`|
|323|`ibm-12712_P100-1998`|
|324|`ibm-1275_P100-1995`|
|325|`ibm-1275_X100-1995`|
|326|`ibm-1276_P100-1995`|
|327|`ibm-1277_P100-1995`|
|328|`ibm-1280_P100-1996`|
|329|`ibm-1281_P100-1996`|
|330|`ibm-1282_P100-1996`|
|331|`ibm-1283_P100-1996`|
|332|`ibm-1284_P100-1996`|
|333|`ibm-1285_P100-1996`|
|334|`ibm-13121_P100-1995`|
|335|`ibm-13124_P100-1995`|
|336|`ibm-13124_P10A-1995`|
|337|`ibm-13125_P100-1997`|
|338|`ibm-13140_P101-2000`|
|339|`ibm-13143_P101-2000`|
|340|`ibm-13145_P101-2000`|
|341|`ibm-13156_P101-2000`|
|342|`ibm-13157_P101-2000`|
|343|`ibm-13162_P101-2000`|
|344|`ibm-13218_P100-1996`|
|345|`ibm-1350_P110-1997`|
|346|`ibm-1351_P110-1997`|
|347|`ibm-1362_P100-1997`|
|348|`ibm-1362_P110-1999`|
|349|`ibm-1363_P100-1997`|
|350|`ibm-1363_P10A-1997`|
|351|`ibm-1363_P10B-1998`|
|352|`ibm-1363_P110-1999`|
|353|`ibm-1363_P11A-1999`|
|354|`ibm-1363_P11B-1999`|
|355|`ibm-1363_P11C-2006`|
|356|`ibm-1364_P100-2007`|
|357|`ibm-1364_P110-2007`|
|358|`ibm-13676_P102-2001`|
|359|`ibm-1370_P100-1999`|
|360|`ibm-1370_X100-1999`|
|361|`ibm-1371_P100-1999`|
|362|`ibm-1371_X100-1999`|
|363|`ibm-1373_P100-2002`|
|364|`ibm-1374_P100_P100-2005_MS`|
|365|`ibm-1374_P100-2005`|
|366|`ibm-1375_P100-2004`|
|367|`ibm-1375_P100-2006`|
|368|`ibm-1375_P100-2007`|
|369|`ibm-1375_P100-2008`|
|370|`ibm-1375_X100-2004`|
|371|`ibm-1377_P100_P100-2006_U3`|
|372|`ibm-1377_P100-2006`|
|373|`ibm-1377_P100-2008`|
|374|`ibm-1380_P100-1995`|
|375|`ibm-1380_X100-1995`|
|376|`ibm-1381_P110-1999`|
|377|`ibm-1381_X110-1999`|
|378|`ibm-1382_P100-1995`|
|379|`ibm-1382_X100-1995`|
|380|`ibm-1383_P110-1999`|
|381|`ibm-1383_X110-1999`|
|382|`ibm-1385_P100-1997`|
|383|`ibm-1385_P100-2005`|
|384|`ibm-1386_P100-2001`|
|385|`ibm-1386_P110-1997`|
|386|`ibm-1388_P100-2024`|
|387|`ibm-1388_P103-2001`|
|388|`ibm-1388_P110-2000`|
|389|`ibm-1390_P100-1999`|
|390|`ibm-1390_P110-2003`|
|391|`ibm-1399_P100-1999`|
|392|`ibm-1399_P110-2003`|
|393|`ibm-16684_P100-1999`|
|394|`ibm-16684_P110-2003`|
|395|`ibm-16804_X110-1999`|
|396|`ibm-17221_P100-2001`|
|397|`ibm-17240_P101-2000`|
|398|`ibm-17248_X110-1999`|
|399|`ibm-20780_P100-1999`|
|400|`ibm-21344_P101-2000`|
|401|`ibm-21427_P100-1999`|
|402|`ibm-21427_X100-1999`|
|403|`ibm-25546_P100-1997`|
|404|`IBM256`|
|405|`ibm-256_P100-1995`|
|406|`ibm-259_P100-1995`|
|407|`ibm-259_X100-1995`|
|408|`ibm-273_P100-1999`|
|409|`IBM274`|
|410|`ibm-274_P100-2000`|
|411|`IBM275`|
|412|`ibm-275_P100-1995`|
|413|`IBM277`|
|414|`ibm-277_P100-1999`|
|415|`IBM278`|
|416|`ibm-278_P100-1999`|
|417|`IBM280`|
|418|`ibm-280_P100-1999`|
|419|`IBM281`|
|420|`ibm-282_P100-1995`|
|421|`IBM284`|
|422|`ibm-284_P100-1999`|
|423|`IBM285`|
|424|`ibm-285_P100-1999`|
|425|`ibm-286_P100-2003`|
|426|`ibm-28709_P100-1995`|
|427|`IBM290`|
|428|`ibm-290_P100-1995`|
|429|`ibm-293_P100-1995`|
|430|`ibm-293_X100-1995`|
|431|`IBM297`|
|432|`ibm-297_P100-1999`|
|433|`ibm-300_P110-1997`|
|434|`ibm-300_P120-2006`|
|435|`ibm-300_X110-1997`|
|436|`ibm-301_P110-1997`|
|437|`ibm-301_X110-1997`|
|438|`ibm-33058_P100-2000`|
|439|`ibm-33722_P120-1999`|
|440|`ibm-33722_P12A_P12A-2004_U2`|
|441|`ibm-33722_P12A_P12A-2009_U2`|
|442|`ibm-33722_P12A-1999`|
|443|`ibm-367_P100-1995`|
|444|`ibm-37_P100-1999`|
|445|`IBM420`|
|446|`ibm-420_X110-1999`|
|447|`ibm-420_X120-1999`|
|448|`IBM423`|
|449|`ibm-423_P100-1995`|
|450|`ibm-424_P100-1995`|
|451|`ibm-425_P101-2000`|
|452|`ibm-437_P100-1995`|
|453|`ibm-4517_P100-2005`|
|454|`ibm-4899_P100-1998`|
|455|`ibm-4904_P101-2000`|
|456|`ibm-4909_P100-1999`|
|457|`ibm-4930_P100-1997`|
|458|`ibm-4930_P110-1999`|
|459|`ibm-4933_P100-1996`|
|460|`ibm-4933_P100-2002`|
|461|`ibm-4944_P101-2000`|
|462|`ibm-4945_P101-2000`|
|463|`ibm-4948_P100-1995`|
|464|`ibm-4951_P100-1995`|
|465|`ibm-4952_P100-1995`|
|466|`ibm-4954_P101-2000`|
|467|`ibm-4955_P101-2000`|
|468|`ibm-4956_P101-2000`|
|469|`ibm-4957_P101-2000`|
|470|`ibm-4958_P101-2000`|
|471|`ibm-4959_P101-2000`|
|472|`ibm-4960_P100-1995`|
|473|`ibm-4960_X100-1995`|
|474|`ibm-4961_P101-2000`|
|475|`ibm-4962_P101-2000`|
|476|`ibm-4963_P101-2000`|
|477|`ibm-4971_P100-1999`|
|478|`ibm-500_P100-1999`|
|479|`ibm-5012_P100-1999`|
|480|`ibm-5026_P120-1999`|
|481|`ibm-5026_X120-1999`|
|482|`ibm-5035_P120_P12A-2005_U2`|
|483|`ibm-5035_P120-1999`|
|484|`ibm-5035_X120-1999`|
|485|`ibm-5039_P110-1996`|
|486|`ibm-5039_P11A-1998`|
|487|`ibm-5048_P100-1995`|
|488|`ibm-5049_P100-1995`|
|489|`ibm-5050_P120-1999`|
|490|`ibm-5050_P12A-1999`|
|491|`ibm-5067_P100-1995`|
|492|`ibm-5104_X110-1999`|
|493|`ibm-5123_P100-1999`|
|494|`ibm-5142_P100-1995`|
|495|`ibm-5210_P100-1999`|
|496|`ibm-5233_P100-2011`|
|497|`ibm-5346_P100-1998`|
|498|`ibm-5347_P100-1998`|
|499|`ibm-5348_P100-1997`|
|500|`ibm-5349_P100-1998`|
|501|`ibm-5350_P100-1998`|
|502|`ibm-5351_P100-1998`|
|503|`ibm-5352_P100-1998`|
|504|`ibm-5353_P100-1998`|
|505|`ibm-5354_P100-1998`|
|506|`ibm-53685_P101-2000`|
|507|`ibm-54191_P100-2006`|
|508|`ibm-5470_P100_P100-2005_MS`|
|509|`ibm-5470_P100-2005`|
|510|`ibm-5471_P100-2006`|
|511|`ibm-5471_P100-2007`|
|512|`ibm-5473_P100-2006`|
|513|`ibm-5478_P100-1995`|
|514|`ibm-5486_P100-1999`|
|515|`ibm-5487_P100-2001`|
|516|`ibm-5488_P100-2001`|
|517|`ibm-5495_P100-1999`|
|518|`ibm-62383_P100-2007`|
|519|`ibm-720_P100-1997`|
|520|`ibm-737_P100-1997`|
|521|`ibm-775_P100-1996`|
|522|`ibm-803_P100-1999`|
|523|`ibm-806_P100-1998`|
|524|`ibm-808_P100-1999`|
|525|`ibm-813_P100-1995`|
|526|`ibm-819_P100-1999`|
|527|`ibm-833_P100-1995`|
|528|`ibm-834_P100-1995`|
|529|`ibm-834_X100-1995`|
|530|`ibm-835_P100-1995`|
|531|`ibm-835_X100-1995`|
|532|`ibm-836_P100-1995`|
|533|`ibm-837_P100-1995`|
|534|`ibm-837_P100-2011`|
|535|`ibm-837_X100-1995`|
|536|`ibm-838_P100-1995`|
|537|`ibm-848_P100-1999`|
|538|`ibm-8482_P100-1999`|
|539|`ibm-849_P100-1999`|
|540|`ibm-850_P100-1999`|
|541|`IBM851`|
|542|`ibm-851_P100-1995`|
|543|`ibm-852_P100-1999`|
|544|`ibm-855_P100-1995`|
|545|`ibm-856_P100-1995`|
|546|`ibm-857_P100-1995`|
|547|`ibm-858_P100-1997`|
|548|`ibm-859_P100-1999`|
|549|`ibm-860_P100-1995`|
|550|`ibm-861_P100-1995`|
|551|`ibm-8612_P100-1995`|
|552|`ibm-8612_X110-1995`|
|553|`ibm-862_P100-1995`|
|554|`ibm-863_P100-1995`|
|555|`ibm-864_X110-1999`|
|556|`ibm-864_X120-2012`|
|557|`ibm-865_P100-1995`|
|558|`ibm-866_P100-1995`|
|559|`ibm-867_P100-1998`|
|560|`IBM868`|
|561|`ibm-868_P100-1995`|
|562|`ibm-868_X100-1995`|
|563|`ibm-869_P100-1995`|
|564|`IBM870`|
|565|`ibm-870_P100-1999`|
|566|`IBM871`|
|567|`ibm-871_P100-1999`|
|568|`ibm-872_P100-1999`|
|569|`IBM874`|
|570|`ibm-874_P100-1995`|
|571|`IBM875`|
|572|`ibm-875_P100-1995`|
|573|`ibm-878_P100-1996`|
|574|`IBM880`|
|575|`ibm-880_P100-1995`|
|576|`IBM891`|
|577|`ibm-891_P100-1995`|
|578|`ibm-895_P100-1995`|
|579|`ibm-896_P100-1995`|
|580|`ibm-897_P100-1995`|
|581|`ibm-9005_X100-2005`|
|582|`ibm-9005_X110-2007`|
|583|`ibm-901_P100-1999`|
|584|`ibm-902_P100-1999`|
|585|`ibm-9027_P100-1999`|
|586|`ibm-9027_X100-1999`|
|587|`IBM903`|
|588|`ibm-903_P100-1995`|
|589|`ibm-9030_P100-1995`|
|590|`IBM904`|
|591|`ibm-904_P100-1995`|
|592|`ibm-9042_P101-2000`|
|593|`ibm-9044_P100-1999`|
|594|`ibm-9048_P100-1998`|
|595|`ibm-9049_P100-1999`|
|596|`IBM905`|
|597|`ibm-905_P100-1995`|
|598|`ibm-9056_P100-1995`|
|599|`ibm-9061_P100-1999`|
|600|`ibm-9064_P101-2000`|
|601|`ibm-9066_P100-1995`|
|602|`ibm-9067_X100-2005`|
|603|`ibm-912_P100-1999`|
|604|`ibm-913_P100-2000`|
|605|`ibm-914_P100-1995`|
|606|`ibm-9145_P110-1997`|
|607|`ibm-9145_X110-1997`|
|608|`ibm-915_P100-1995`|
|609|`ibm-916_P100-1995`|
|610|`IBM918`|
|611|`ibm-918_P100-1995`|
|612|`ibm-918_X100-1995`|
|613|`ibm-920_P100-1995`|
|614|`ibm-921_P100-1995`|
|615|`ibm-922_P100-1999`|
|616|`ibm-923_P100-1998`|
|617|`ibm-9238_X110-1999`|
|618|`ibm-924_P100-1998`|
|619|`ibm-926_P100-2000`|
|620|`ibm-927_P100-1995`|
|621|`ibm-927_X100-1995`|
|622|`ibm-928_P100-1995`|
|623|`ibm-930_P120_P12A-2006_U2`|
|624|`ibm-930_P120-1999`|
|625|`ibm-930_X120-1999`|
|626|`ibm-9306_P101-2000`|
|627|`ibm-931_P120-1999`|
|628|`ibm-931_X120-1999`|
|629|`ibm-932_P120-1999`|
|630|`ibm-932_P12A_P12A-2000_U2`|
|631|`ibm-932_P12A-1999`|
|632|`ibm-933_P110-1999`|
|633|`ibm-933_X110-1999`|
|634|`ibm-935_P110-1999`|
|635|`ibm-935_X110-1999`|
|636|`ibm-937_P110-1999`|
|637|`ibm-937_X110-1999`|
|638|`ibm-939_P120_P12A-2005_U2`|
|639|`ibm-939_P120-1999`|
|640|`ibm-939_X120-1999`|
|641|`ibm-941_P120-1996`|
|642|`ibm-941_P12A-1996`|
|643|`ibm-941_P130-2001`|
|644|`ibm-941_P13A-2001`|
|645|`ibm-941_X110-1996`|
|646|`ibm-941_X11A-1996`|
|647|`ibm-942_P120-1999`|
|648|`ibm-942_P12A_P12A-2000_U2`|
|649|`ibm-942_P12A-1999`|
|650|`ibm-943_P130-1999`|
|651|`ibm-943_P14A-1999`|
|652|`ibm-943_P15A-2003`|
|653|`ibm-944_P100-1995`|
|654|`ibm-944_X100-1995`|
|655|`ibm-9444_P100_P100-2005_MS`|
|656|`ibm-9444_P100-2001`|
|657|`ibm-9444_P100-2005`|
|658|`ibm-9447_P100-2002`|
|659|`ibm-9448_X100-2005`|
|660|`ibm-9449_P100-2002`|
|661|`ibm-946_P100-1995`|
|662|`ibm-947_P100-1995`|
|663|`ibm-947_X100-1995`|
|664|`ibm-948_P110-1999`|
|665|`ibm-948_X110-1999`|
|666|`ibm-949_P110-1999`|
|667|`ibm-949_P11A-1999`|
|668|`ibm-949_X110-1999`|
|669|`ibm-950_P110-1999`|
|670|`ibm-950_X110-1999`|
|671|`ibm-951_P100-1995`|
|672|`ibm-951_X100-1995`|
|673|`ibm-952_P110-1997`|
|674|`ibm-953_P100-2000`|
|675|`ibm-954_P101-2007`|
|676|`ibm-955_P110-1997`|
|677|`ibm-9577_P100-2001`|
|678|`ibm-9580_P110-1999`|
|679|`ibm-960_P100-2000`|
|680|`ibm-963_P100-1995`|
|681|`ibm-964_P110-1999`|
|682|`ibm-964_X110-1999`|
|683|`ibm-970_P110_P110-2006_U2`|
|684|`ibm-970_P110-1999`|
|685|`ibm-971_P100-1995`|
|686|`IEC_P27_1`|
|687|`INIS`|
|688|`INIS_8`|
|689|`INIS_CYRILLIC`|
|690|`ISO_10367_BOX`|
|691|`ISO_5427`|
|692|`ISO_5427_EXT`|
|693|`ISO_5428`|
|694|`ISO_8859_1`|
|695|`ISO_8859_10`|
|696|`ISO_8859_11`|
|697|`ISO_8859_13`|
|698|`ISO_8859_14`|
|699|`ISO_8859_15`|
|700|`ISO_8859_2`|
|701|`ISO_8859_3`|
|702|`ISO_8859_4`|
|703|`ISO_8859_5`|
|704|`ISO_8859_6`|
|705|`ISO_8859_7`|
|706|`ISO_8859_8`|
|707|`ISO_8859_9`|
|708|`ISO_IR_197`|
|709|`ISO646_US`|
|710|`iso81`|
|711|`iso815`|
|712|`iso82`|
|713|`iso85`|
|714|`iso86`|
|715|`iso87`|
|716|`iso88`|
|717|`ISO8859_1`|
|718|`iso-8859_10-1998`|
|719|`iso-8859_11-2001`|
|720|`iso-8859_1-1998`|
|721|`iso-8859_13-1998`|
|722|`iso-8859_14-1998`|
|723|`ISO8859_15`|
|724|`iso-8859_15-1999`|
|725|`iso-8859_16-2001`|
|726|`ISO8859_2`|
|727|`iso-8859_2-1999`|
|728|`ISO8859_3`|
|729|`iso-8859_3-1999`|
|730|`ISO8859_4`|
|731|`iso-8859_4-1998`|
|732|`ISO8859_5`|
|733|`iso-8859_5-1999`|
|734|`ISO8859_6`|
|735|`iso-8859_6-1999`|
|736|`ISO8859_7`|
|737|`iso-8859_7-1987`|
|738|`iso-8859_7-2003`|
|739|`ISO8859_8`|
|740|`iso-8859_8-1999`|
|741|`ISO8859_9`|
|742|`iso-8859_9-1999`|
|743|`iso89`|
|744|`java-ASCII-1.3_P`|
|745|`java-Big5-1.3_P`|
|746|`java-Cp037-1.3_P`|
|747|`java-Cp1006-1.3_P`|
|748|`java-Cp1025-1.3_P`|
|749|`java-Cp1026-1.3_P`|
|750|`java-Cp1097-1.3_P`|
|751|`java-Cp1098-1.3_P`|
|752|`java-Cp1112-1.3_P`|
|753|`java-Cp1122-1.3_P`|
|754|`java-Cp1123-1.3_P`|
|755|`java-Cp1124-1.3_P`|
|756|`java-Cp1250-1.3_P`|
|757|`java-Cp1251-1.3_P`|
|758|`java-Cp1252-1.3_P`|
|759|`java-Cp1253-1.3_P`|
|760|`java-Cp1254-1.3_P`|
|761|`java-Cp1255-1.3_P`|
|762|`java-Cp1256-1.3_P`|
|763|`java-Cp1257-1.3_P`|
|764|`java-Cp1258-1.3_P`|
|765|`java-Cp1381-1.3_P`|
|766|`java-Cp1383-1.3_P`|
|767|`java-Cp273-1.3_P`|
|768|`java-Cp277-1.3_P`|
|769|`java-Cp278-1.3_P`|
|770|`java-Cp280-1.3_P`|
|771|`java-Cp284-1.3_P`|
|772|`java-Cp285-1.3_P`|
|773|`java-Cp297-1.3_P`|
|774|`java-Cp33722-1.3_P`|
|775|`java-Cp420-1.3_P`|
|776|`java-Cp424-1.3_P`|
|777|`java-Cp437-1.3_P`|
|778|`java-Cp500-1.3_P`|
|779|`java-Cp737-1.3_P`|
|780|`java-Cp775-1.3_P`|
|781|`java-Cp838-1.3_P`|
|782|`java-Cp850-1.3_P`|
|783|`java-Cp852-1.3_P`|
|784|`java-Cp855-1.3_P`|
|785|`java-Cp856-1.3_P`|
|786|`java-Cp857-1.3_P`|
|787|`java-Cp860-1.3_P`|
|788|`java-Cp861-1.3_P`|
|789|`java-Cp862-1.3_P`|
|790|`java-Cp863-1.3_P`|
|791|`java-Cp864-1.3_P`|
|792|`java-Cp865-1.3_P`|
|793|`java-Cp866-1.3_P`|
|794|`java-Cp868-1.3_P`|
|795|`java-Cp869-1.3_P`|
|796|`java-Cp870-1.3_P`|
|797|`java-Cp871-1.3_P`|
|798|`java-Cp874-1.3_P`|
|799|`java-Cp875-1.3_P`|
|800|`java-Cp918-1.3_P`|
|801|`java-Cp921-1.3_P`|
|802|`java-Cp922-1.3_P`|
|803|`java-Cp930-1.3_P`|
|804|`java-Cp933-1.3_P`|
|805|`java-Cp935-1.3_P`|
|806|`java-Cp937-1.3_P`|
|807|`java-Cp939-1.3_P`|
|808|`java-Cp942-1.3_P`|
|809|`java-Cp942C-1.3_P`|
|810|`java-Cp943-1.2.2`|
|811|`java-Cp943C-1.3_P`|
|812|`java-Cp948-1.3_P`|
|813|`java-Cp949-1.3_P`|
|814|`java-Cp949C-1.3_P`|
|815|`java-Cp950-1.3_P`|
|816|`java-Cp964-1.3_P`|
|817|`java-Cp970-1.3_P`|
|818|`java-EUC_CN-1.3_P`|
|819|`java-EUC_JP-1.3_P`|
|820|`java-EUC_KR-1.3_P`|
|821|`java-EUC_TW-1.3_P`|
|822|`java-ISO2022JP-1.3_P`|
|823|`java-ISO2022KR-1.3_P`|
|824|`java-ISO8859_1-1.3_P`|
|825|`java-ISO8859_13-1.3_P`|
|826|`java-ISO8859_2-1.3_P`|
|827|`java-ISO8859_3-1.3_P`|
|828|`java-ISO8859_4-1.3_P`|
|829|`java-ISO8859_5-1.3_P`|
|830|`java-ISO8859_6-1.3_P`|
|831|`java-ISO8859_7-1.3_P`|
|832|`java-ISO8859_8-1.3_P`|
|833|`java-ISO8859_9-1.3_P`|
|834|`java-Johab-1.3_P`|
|835|`java-KOI8_R-1.3_P`|
|836|`java-MS874-1.3_P`|
|837|`java-MS932-1.3_P`|
|838|`java-MS949-1.3_P`|
|839|`java-SJIS-1.3_P`|
|840|`java-TIS620-1.3_P`|
|841|`JISX0201.1976_0`|
|842|`JISX0201.1976_GR`|
|843|`JISX0208.1983_0`|
|844|`JISX0208.1983_GR`|
|845|`KOI_8`|
|846|`KOI8_R`|
|847|`KOI8_U`|
|848|`KSC5601.1987_0`|
|849|`LATIN_GREEK`|
|850|`LATIN_GREEK_1`|
|851|`latin-1`|
|852|`MAC_IS`|
|853|`mac_roman`|
|854|`MAC_UK`|
|855|`macos-0_1-10.2`|
|856|`macos-0_2-10.2`|
|857|`macos-1024-10.2`|
|858|`macos-1040-10.2`|
|859|`macos-1049-10.2`|
|860|`macos-1057-10.2`|
|861|`macos-1059-10.2`|
|862|`macos-1280-10.2`|
|863|`macos-1281-10.2`|
|864|`macos-1282-10.2`|
|865|`macos-1283-10.2`|
|866|`macos-1284-10.2`|
|867|`macos-1285-10.2`|
|868|`macos-1286-10.2`|
|869|`macos-1287-10.2`|
|870|`macos-1288-10.2`|
|871|`macos-1536-10.2`|
|872|`macos-21-10.5`|
|873|`macos-2562-10.2`|
|874|`macos-2563-10.2`|
|875|`macos-2566-10.2`|
|876|`macos-2817-10.2`|
|877|`macos-29-10.2`|
|878|`macos-3074-10.2`|
|879|`macos-33-10.5`|
|880|`macos-34-10.2`|
|881|`macos-35-10.2`|
|882|`macos-36_1-10.2`|
|883|`macos-36_2-10.2`|
|884|`macos-37_2-10.2`|
|885|`macos-37_3-10.2`|
|886|`macos-37_4-10.2`|
|887|`macos-37_5-10.2`|
|888|`macos-38_1-10.2`|
|889|`macos-38_2-10.2`|
|890|`macos-513-10.2`|
|891|`macos-514-10.2`|
|892|`macos-515-10.2`|
|893|`macos-516-10.2`|
|894|`macos-517-10.2`|
|895|`macos-518-10.2`|
|896|`macos-519-10.2`|
|897|`macos-520-10.2`|
|898|`macos-521-10.2`|
|899|`macos-527-10.2`|
|900|`macos-6_2-10.4`|
|901|`macos-6-10.2`|
|902|`macos-7_1-10.2`|
|903|`macos-7_2-10.2`|
|904|`macos-7_3-10.2`|
|905|`NATS_DANO`|
|906|`NATS_SEFI`|
|907|`osd-EBCDIC-DF03-IRV`|
|908|`osd-EBCDIC-DF04-1`|
|909|`osd-EBCDIC-DF04-15`|
|910|`PCK`|
|911|`roma8`|
|912|`shift_jis`|
|913|`solaris-zh_HK.hkscs-5.9`|
|914|`solaris-zh_TW_big5-2.7`|
|915|`thai8`|
|916|`TIS_620`|
|917|`utf-16`|
|918|`utf-8`|
|919|`windows-10000-2000`|
|920|`windows-10001-2000`|
|921|`windows-10002-2000`|
|922|`windows-10003-2000`|
|923|`windows-10004-2000`|
|924|`windows-10005-2000`|
|925|`windows-10006-2000`|
|926|`windows-10007-2000`|
|927|`windows-10008-2000`|
|928|`windows-10010-2000`|
|929|`windows-10017-2000`|
|930|`windows-10021-2000`|
|931|`windows-10029-2000`|
|932|`windows-10079-2000`|
|933|`windows-10081-2000`|
|934|`windows-10082-2000`|
|935|`windows-1026-2000`|
|936|`windows-1047-2000`|
|937|`windows-1140-2000`|
|938|`windows-1141-2000`|
|939|`windows-1142-2000`|
|940|`windows-1143-2000`|
|941|`windows-1144-2000`|
|942|`windows-1145-2000`|
|943|`windows-1146-2000`|
|944|`windows-1147-2000`|
|945|`windows-1148-2000`|
|946|`windows-1149-2000`|
|947|`windows-1250-2000`|
|948|`windows-1251-2000`|
|949|`windows-1252-2000`|
|950|`windows-1253-2000`|
|951|`windows-1254-2000`|
|952|`windows-1255-2000`|
|953|`windows-1256-2000`|
|954|`windows-1257-2000`|
|955|`windows-1258_db-2013`|
|956|`windows-1258-2000`|
|957|`windows-1361-2000`|
|958|`windows-20000-2000`|
|959|`windows-20001-2000`|
|960|`windows-20002-2000`|
|961|`windows-20003-2000`|
|962|`windows-20004-2000`|
|963|`windows-20005-2000`|
|964|`windows-20105-2000`|
|965|`windows-20106-2000`|
|966|`windows-20107-2000`|
|967|`windows-20108-2000`|
|968|`windows-20127-2000`|
|969|`windows-20261-2000`|
|970|`windows-20269-2000`|
|971|`windows-20273-2000`|
|972|`windows-20277-2000`|
|973|`windows-20278-2000`|
|974|`windows-20280-2000`|
|975|`windows-20284-2000`|
|976|`windows-20285-2000`|
|977|`windows-20290-2000`|
|978|`windows-20297-2000`|
|979|`windows-20420-2000`|
|980|`windows-20423-2000`|
|981|`windows-20424-2000`|
|982|`windows-20833-2000`|
|983|`windows-20838-2000`|
|984|`windows-20866-2000`|
|985|`windows-20871-2000`|
|986|`windows-20880-2000`|
|987|`windows-20905-2000`|
|988|`windows-20924-2000`|
|989|`windows-20932-2000`|
|990|`windows-20936-2000`|
|991|`windows-20949-2000`|
|992|`windows-21025-2000`|
|993|`windows-21027-2000`|
|994|`windows-21866-2000`|
|995|`windows-28591-2000`|
|996|`windows-28592-2000`|
|997|`windows-28593-2000`|
|998|`windows-28594-2000`|
|999|`windows-28595-2000`|
|1000|`windows-28596-2000`|
|1001|`windows-28597-2000`|
|1002|`windows-28598-2000`|
|1003|`windows-28599-2000`|
|1004|`windows-28603-vista`|
|1005|`windows-28605-2000`|
|1006|`windows-37-2000`|
|1007|`windows-38598-2000`|
|1008|`windows-437-2000`|
|1009|`windows-500-2000`|
|1010|`windows-51932-2006`|
|1011|`windows-51936-2000`|
|1012|`windows-51949-2000`|
|1013|`windows-708-2000`|
|1014|`windows-720-2000`|
|1015|`windows-737-2000`|
|1016|`windows-775-2000`|
|1017|`windows-850-2000`|
|1018|`windows-852-2000`|
|1019|`windows-855-2000`|
|1020|`windows-857-2000`|
|1021|`windows-858-2000`|
|1022|`windows-860-2000`|
|1023|`windows-861-2000`|
|1024|`windows-862-2000`|
|1025|`windows-863-2000`|
|1026|`windows-864-2000`|
|1027|`windows-865-2000`|
|1028|`windows-866-2000`|
|1029|`windows-869-2000`|
|1030|`windows-870-2000`|
|1031|`windows-874-2000`|
|1032|`windows-875-2000`|
|1033|`windows-932-2000`|
|1034|`windows-936-2000`|
|1035|`windows-949-2000`|
|1036|`windows-950_hkscs-2001`|
|1037|`windows-950-2000`|
|1038|`zh_CN.euc`|
|1039|`zh_CN.gbk`|
|1040|`zh_CN_cp935`|
|1041|`zh_TW_cp937`|
|1042|`zh_TW_euc`|
## Excel Extension {#docs:stable:core_extensions:excel}
The `excel` extension provides functions to format numbers per Excel's formatting rules by wrapping the [i18npool library](https://www.openoffice.org/l10n/i18n_framework/index.html) and, as of DuckDB 1.2, also provides functionality to read and write Excel (`.xlsx`) files. However, `.xls` files are not supported.
Previously, reading and writing Excel files was handled through the [`spatial` extension](#docs:stable:core_extensions:spatial:overview), which coincidentally included support for XLSX files through one of its dependencies, but this capability may be removed from the spatial extension in the future. Additionally, the `excel` extension is more efficient and provides more control over the import/export process. See the [Excel Import](#docs:stable:guides:file_formats:excel_import) and [Excel Export](#docs:stable:guides:file_formats:excel_export) pages for instructions.
#### Installing and Loading {#docs:stable:core_extensions:excel::installing-and-loading}
The `excel` extension will be transparently [autoloaded](#docs:stable:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
```sql
INSTALL excel;
LOAD excel;
```
#### Excel Scalar Functions {#docs:stable:core_extensions:excel::excel-scalar-functions}
| Function | Description |
| :---------------------------------- | :------------------------------------------------------------------- |
| `excel_text(number, format_string)` | Format the given `number` per the rules given in the `format_string` |
| `text(number, format_string)` | Alias for `excel_text` |
#### Examples {#docs:stable:core_extensions:excel::examples}
```sql
SELECT excel_text(1_234_567.897, 'h:mm AM/PM') AS timestamp;
```
| timestamp |
| --------- |
| 9:31 PM |
```sql
SELECT excel_text(1_234_567.897, 'h AM/PM') AS timestamp;
```
| timestamp |
| --------- |
| 9 PM |
#### Reading XLSX Files {#docs:stable:core_extensions:excel::reading-xlsx-files}
Reading a `.xlsx` file is as simple as `SELECT`ing from it directly, e.g.:
```sql
SELECT *
FROM 'test.xlsx';
```
| a | b |
| --: | --: |
| 1.0 | 2.0 |
| 3.0 | 4.0 |
However, if you want to set additional options to control the import process, you can use the `read_xlsx` function instead. The following named parameters are supported.
| Option | Type | Default | Description |
| ------------------ | --------- | ------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `header` | `BOOLEAN` | _automatically inferred_ | Whether to treat the first row as containing the names of the resulting columns. |
| `sheet` | `VARCHAR` | _automatically inferred_ | The name of the sheet in the xlsx file to read. Default is the first sheet. |
| `all_varchar` | `BOOLEAN` | `false` | Whether to read all cells as containing `VARCHAR`s. |
| `ignore_errors` | `BOOLEAN` | `false` | Whether to ignore errors and silently replace cells that can't be cast to the corresponding inferred column type with `NULL`. |
| `range` | `VARCHAR` | _automatically inferred_ | The range of cells to read, in spreadsheet notation. For example, `A1:B2` reads the cells from A1 to B2. If not specified, the resulting range is inferred as the rectangular region of cells between the first row of consecutive non-empty cells and the first empty row spanning the same columns. |
| `stop_at_empty` | `BOOLEAN` | _automatically inferred_ | Whether to stop reading the file when an empty row is encountered. If an explicit `range` option is provided, this is `false` by default, otherwise `true`. |
| `empty_as_varchar` | `BOOLEAN` | `false` | Whether to treat empty cells as `VARCHAR` instead of `DOUBLE` when trying to automatically infer column types. |
```sql
SELECT *
FROM read_xlsx('test.xlsx', header = true);
```
| a | b |
| --: | --: |
| 1.0 | 2.0 |
| 3.0 | 4.0 |
Alternatively, the `COPY` statement with the `XLSX` format option can be used to import an Excel file into an existing table, in which case the types of the columns in the target table will be used to coerce the types of the cells in the Excel file.
```sql
CREATE TABLE test (a DOUBLE, b DOUBLE);
COPY test FROM 'test.xlsx' WITH (FORMAT xlsx, HEADER);
SELECT * FROM test;
```
##### Type and Range Inference {#docs:stable:core_extensions:excel::type-and-range-inference}
Because Excel itself only really stores numbers or strings in cells, and does not enforce that all cells in a column are of the same type, the `excel` extension has to do some guesswork to "infer" and decide the types of the columns when importing an Excel sheet. While almost all columns are inferred as either `DOUBLE` or `VARCHAR`, there are some caveats:
* `TIMESTAMP`, `TIME`, `DATE` and `BOOLEAN` types are inferred when possible based on the _format_ applied to the cell.
* Text cells containing `TRUE` and `FALSE` are inferred as `BOOLEAN`.
* Empty cells are considered to be `DOUBLE` by default, unless the `empty_as_varchar` option is set to `true`, in which case they are typed as `VARCHAR`.
If the `all_varchar` option is set to `true`, none of the above applies and all cells are read as `VARCHAR`.
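For example, a minimal sketch of reading the `test.xlsx` file from earlier with type inference disabled entirely:
```sql
-- Read every cell as text; no DOUBLE/TIMESTAMP/BOOLEAN inference is attempted.
SELECT *
FROM read_xlsx('test.xlsx', all_varchar = true);
```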
When no types are specified explicitly (e.g., when using the `read_xlsx` function instead of `COPY ... FROM '⟨file⟩.xlsx'`),
the types of the resulting columns are inferred based on the first "data" row in the sheet, that is:
* If no explicit range is given:
  * The first row after the header, if a header is found or forced by the `header` option
  * The first non-empty row in the sheet, if no header is found or forced
* If an explicit range is given:
  * The second row of the range, if a header is found in the first row or forced by the `header` option
  * The first row of the range, if no header is found or forced
This can sometimes lead to issues if the first "data row" is not representative of the rest of the sheet (e.g., it contains empty cells), in which case the `ignore_errors` or `empty_as_varchar` options can be used to work around this.
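A sketch of such a workaround, combining the two options documented above on the same example file:
```sql
-- Treat empty cells as VARCHAR and replace any cell that cannot be cast with NULL.
SELECT *
FROM read_xlsx('test.xlsx',
    empty_as_varchar = true,
    ignore_errors = true);
```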
However, when the `COPY ... FROM '⟨file⟩.xlsx'` syntax is used, no type inference is done and the types of the resulting columns are determined by the types of the columns in the table being copied to. All cells are simply converted by casting from `DOUBLE` or `VARCHAR` to the target column type.
#### Writing XLSX Files {#docs:stable:core_extensions:excel::writing-xlsx-files}
Writing `.xlsx` files is supported using the `COPY` statement with `XLSX` given as the format. The following additional parameters are supported.
| Option | Type | Default | Description |
| ----------------- | --------- | --------- | ------------------------------------------------------------------------------------ |
| `header` | `BOOLEAN` | `false` | Whether to write the column names as the first row in the sheet |
| `sheet` | `VARCHAR` | `Sheet1` | The name of the sheet in the xlsx file to write. |
| `sheet_row_limit` | `INTEGER` | `1048576` | The maximum number of rows in a sheet. An error is thrown if this limit is exceeded. |
> **Warning.** Many tools only support a maximum of 1,048,576 rows in a sheet, so increasing the `sheet_row_limit` may render the resulting file unreadable by other software.
These are passed as options to the `COPY` statement after the `FORMAT`, e.g.:
```sql
CREATE TABLE test AS
SELECT *
FROM (VALUES (1, 2), (3, 4)) AS t(a, b);
COPY test TO 'test.xlsx' WITH (FORMAT xlsx, HEADER true);
```
##### Type Conversions {#docs:stable:core_extensions:excel::type-conversions}
Because XLSX files only really support storing numbers or strings (the equivalent of `VARCHAR` and `DOUBLE`), the following type conversions are applied when writing XLSX files; a short example follows the list.
* Numeric types are cast to `DOUBLE` when writing to an XLSX file.
* Temporal types (`TIMESTAMP`, `DATE`, `TIME`, etc.) are converted to Excel "serial" numbers, that is, the number of days since 1900-01-01 for dates and the fraction of a day for times. These are then styled with a "number format" so that they appear as dates or times when opened in Excel.
* `TIMESTAMP_TZ` and `TIME_TZ` are cast to UTC `TIMESTAMP` and `TIME` respectively, with the timezone information being lost.
* `BOOLEAN`s are converted to `1` and `0`, with a "number format" applied to make them appear as `TRUE` and `FALSE` in Excel.
* All other types are cast to `VARCHAR` and then written as text cells.
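As an illustrative sketch (the table and column names below are made up for this example), writing a table with a `DATE` and a `BOOLEAN` column triggers the conversions described above:
```sql
CREATE TABLE events (name VARCHAR, happened_on DATE, confirmed BOOLEAN);
INSERT INTO events VALUES ('launch', DATE '2024-03-01', true);
-- happened_on is written as an Excel serial number with a date "number format";
-- confirmed is written as 1/0 styled to display as TRUE/FALSE.
COPY events TO 'events.xlsx' WITH (FORMAT xlsx, HEADER true);
```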
## Full-Text Search Extension {#docs:stable:core_extensions:full_text_search}
Full-Text Search is an extension to DuckDB that allows for search through strings, similar to [SQLite's FTS5 extension](https://www.sqlite.org/fts5.html).
#### Installing and Loading {#docs:stable:core_extensions:full_text_search::installing-and-loading}
The `fts` extension will be transparently [autoloaded](#docs:stable:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
```sql
INSTALL fts;
LOAD fts;
```
#### Usage {#docs:stable:core_extensions:full_text_search::usage}
The extension adds two `PRAGMA` statements to DuckDB: one to create, and one to drop an index. Additionally, a scalar macro `stem` is added, which is used internally by the extension.
##### `PRAGMA create_fts_index` {#docs:stable:core_extensions:full_text_search::pragma-create_fts_index}
```python
create_fts_index(input_table, input_id, *input_values, stemmer = 'porter',
stopwords = 'english', ignore = '(\\.|[^a-z])+',
strip_accents = 1, lower = 1, overwrite = 0)
```
`PRAGMA` that creates an FTS index for the specified table.
| Name | Type | Description |
|:--|:--|:----------|
| `input_table` | `VARCHAR` | Qualified name of specified table, e.g., `'table_name'` or `'main.table_name'` |
| `input_id` | `VARCHAR` | Column name of document identifier, e.g., `'document_identifier'` |
| `input_values...` | `VARCHAR` | Column names of the text fields to be indexed (vararg), e.g., `'text_field_1'`, `'text_field_2'`, ..., `'text_field_N'`, or `'\*'` for all columns in input_table of type `VARCHAR` |
| `stemmer` | `VARCHAR` | The type of stemmer to be used. One of `'arabic'`, `'basque'`, `'catalan'`, `'danish'`, `'dutch'`, `'english'`, `'finnish'`, `'french'`, `'german'`, `'greek'`, `'hindi'`, `'hungarian'`, `'indonesian'`, `'irish'`, `'italian'`, `'lithuanian'`, `'nepali'`, `'norwegian'`, `'porter'`, `'portuguese'`, `'romanian'`, `'russian'`, `'serbian'`, `'spanish'`, `'swedish'`, `'tamil'`, `'turkish'`, or `'none'` if no stemming is to be used. Defaults to `'porter'` |
| `stopwords` | `VARCHAR` | Qualified name of table containing a single `VARCHAR` column containing the desired stopwords, or `'none'` if no stopwords are to be used. Defaults to `'english'` for a pre-defined list of 571 English stopwords |
| `ignore` | `VARCHAR` | Regular expression of patterns to be ignored. Defaults to `'(\\.|[^a-z])+'`, ignoring all escaped and non-alphabetic lowercase characters |
| `strip_accents` | `BOOLEAN` | Whether to remove accents (e.g., convert `á` to `a`). Defaults to `1` |
| `lower` | `BOOLEAN` | Whether to convert all text to lowercase. Defaults to `1` |
| `overwrite` | `BOOLEAN` | Whether to overwrite an existing index on a table. Defaults to `0` |
This `PRAGMA` builds the index under a newly created schema. The schema will be named after the input table: if an index is created on table `'main.table_name'`, then the schema will be named `'fts_main_table_name'`.
##### `PRAGMA drop_fts_index` {#docs:stable:core_extensions:full_text_search::pragma-drop_fts_index}
```python
drop_fts_index(input_table)
```
Drops an FTS index for the specified table.
| Name | Type | Description |
|:--|:--|:-----------|
| `input_table` | `VARCHAR` | Qualified name of input table, e.g., `'table_name'` or `'main.table_name'` |
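For example, to drop the index built on the `documents` table used in the example below (a minimal sketch):
```sql
PRAGMA drop_fts_index('documents');
```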
##### `match_bm25` Function {#docs:stable:core_extensions:full_text_search::match_bm25-function}
```python
match_bm25(input_id, query_string, fields := NULL, k := 1.2, b := 0.75, conjunctive := 0)
```
When an index is built, this retrieval macro is created that can be used to search the index.
| Name | Type | Description |
|:--|:--|:----------|
| `input_id` | `VARCHAR` | Column name of document identifier, e.g., `'document_identifier'` |
| `query_string` | `VARCHAR` | The string to search the index for |
| `fields` | `VARCHAR` | Comma-separated list of fields to search in, e.g., `'text_field_2, text_field_N'`. Defaults to `NULL` to search all indexed fields |
| `k` | `DOUBLE` | Parameter _k1_ in the Okapi BM25 retrieval model. Defaults to `1.2` |
| `b` | `DOUBLE` | Parameter _b_ in the Okapi BM25 retrieval model. Defaults to `0.75` |
| `conjunctive` | `BOOLEAN` | Whether to make the query conjunctive, i.e., all terms in the query string must be present in order for a document to be retrieved |
##### `stem` Function {#docs:stable:core_extensions:full_text_search::stem-function}
```python
stem(input_string, stemmer)
```
Reduces words to their base form. Used internally by the extension.
| Name | Type | Description |
|:--|:--|:----------|
| `input_string` | `VARCHAR` | The column or constant to be stemmed. |
| `stemmer` | `VARCHAR` | The type of stemmer to be used. One of `'arabic'`, `'basque'`, `'catalan'`, `'danish'`, `'dutch'`, `'english'`, `'finnish'`, `'french'`, `'german'`, `'greek'`, `'hindi'`, `'hungarian'`, `'indonesian'`, `'irish'`, `'italian'`, `'lithuanian'`, `'nepali'`, `'norwegian'`, `'porter'`, `'portuguese'`, `'romanian'`, `'russian'`, `'serbian'`, `'spanish'`, `'swedish'`, `'tamil'`, `'turkish'`, or `'none'` if no stemming is to be used. |
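A minimal sketch of calling the macro directly; the exact output depends on the stemmer chosen:
```sql
SELECT stem('searching', 'porter') AS stemmed_word;
```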
#### Example Usage {#docs:stable:core_extensions:full_text_search::example-usage}
Create a table and fill it with text data:
```sql
CREATE TABLE documents (
document_identifier VARCHAR,
text_content VARCHAR,
author VARCHAR,
doc_version INTEGER
);
INSERT INTO documents
VALUES ('doc1',
'The mallard is a dabbling duck that breeds throughout the temperate.',
'Hannes Mühleisen',
3),
('doc2',
'The cat is a domestic species of small carnivorous mammal.',
'Laurens Kuiper',
2
);
```
Build the index, and make both the `text_content` and `author` columns searchable.
```sql
PRAGMA create_fts_index(
'documents', 'document_identifier', 'text_content', 'author'
);
```
Search the `author` field index for documents that are authored by `Muhleisen`. This retrieves `doc1`:
```sql
SELECT document_identifier, text_content, score
FROM (
SELECT *, fts_main_documents.match_bm25(
document_identifier,
'Muhleisen',
fields := 'author'
) AS score
FROM documents
) sq
WHERE score IS NOT NULL
AND doc_version > 2
ORDER BY score DESC;
```
| document_identifier | text_content | score |
|---------------------|----------------------------------------------------------------------|------:|
| doc1 | The mallard is a dabbling duck that breeds throughout the temperate. | 0.0 |
Search for documents about `small cats`. This retrieves `doc2`:
```sql
SELECT document_identifier, text_content, score
FROM (
SELECT *, fts_main_documents.match_bm25(
document_identifier,
'small cats'
) AS score
FROM documents
) sq
WHERE score IS NOT NULL
ORDER BY score DESC;
```
| document_identifier | text_content | score |
|---------------------|------------------------------------------------------------|------:|
| doc2 | The cat is a domestic species of small carnivorous mammal. | 0.0 |
> **Warning.** The FTS index does not update automatically when the input table changes.
> As a workaround, recreate the index to refresh it.
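A sketch of that workaround, rebuilding the index from the example above with the `overwrite` option so the existing index is replaced:
```sql
PRAGMA create_fts_index(
    'documents', 'document_identifier', 'text_content', 'author',
    overwrite = 1
);
```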
## httpfs (HTTP and S3) {#core_extensions:httpfs}
### httpfs Extension for HTTP and S3 Support {#docs:stable:core_extensions:httpfs:overview}
The `httpfs` extension is an autoloadable extension implementing a file system that allows reading and writing remote files.
For plain HTTP(S), only file reading is supported. For object storage using the S3 API, the `httpfs` extension supports reading/writing/[globbing](#docs:stable:sql:functions:pattern_matching::globbing) files.
#### Installation and Loading {#docs:stable:core_extensions:httpfs:overview::installation-and-loading}
The `httpfs` extension will be, by default, autoloaded on first use of any functionality exposed by this extension.
To manually install and load the `httpfs` extension, run:
```sql
INSTALL httpfs;
LOAD httpfs;
```
#### HTTP(S) {#docs:stable:core_extensions:httpfs:overview::https}
The `httpfs` extension supports connecting to [HTTP(S) endpoints](#docs:stable:core_extensions:httpfs:https).
#### S3 API {#docs:stable:core_extensions:httpfs:overview::s3-api}
The `httpfs` extension supports connecting to [S3 API endpoints](#docs:stable:core_extensions:httpfs:s3api).
### HTTP(S) Support {#docs:stable:core_extensions:httpfs:https}
With the `httpfs` extension, it is possible to directly query files over the HTTP(S) protocol. This works for all files supported by DuckDB or its various extensions, and provides read-only access.
```sql
SELECT *
FROM 'https://domain.tld/file.extension';
```
#### Partial Reading {#docs:stable:core_extensions:httpfs:https::partial-reading}
CSV files are generally downloaded in their entirety, due to the row-based nature of the format.
For Parquet files, DuckDB supports [partial reading](#docs:stable:data:parquet:overview::partial-reading), i.e., it can use a combination of the Parquet metadata and [HTTP range requests](https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests) to only download the parts of the file that are actually required by the query. For example, the following query will only read the Parquet metadata and the data for the `column_a` column:
```sql
SELECT column_a
FROM 'https://domain.tld/file.parquet';
```
In some cases, no actual data needs to be read at all, as the query only requires reading the Parquet metadata:
```sql
SELECT count(*)
FROM 'https://domain.tld/file.parquet';
```
#### Scanning Multiple Files {#docs:stable:core_extensions:httpfs:https::scanning-multiple-files}
Scanning multiple files over HTTP(S) is also supported:
```sql
SELECT *
FROM read_parquet([
'https://domain.tld/file1.parquet',
'https://domain.tld/file2.parquet'
]);
```
#### Authenticating {#docs:stable:core_extensions:httpfs:https::authenticating}
To authenticate for an HTTP(S) endpoint, create an `HTTP` secret using the [Secrets Manager](#docs:stable:configuration:secrets_manager):
```sql
CREATE SECRET http_auth (
TYPE http,
BEARER_TOKEN '⟨token⟩'
);
```
Or:
```sql
CREATE SECRET http_auth (
TYPE http,
EXTRA_HTTP_HEADERS MAP {
'Authorization': 'Bearer ⟨token⟩'
}
);
```
#### HTTP Proxy {#docs:stable:core_extensions:httpfs:https::http-proxy}
DuckDB supports HTTP proxies.
You can add an HTTP proxy using the [Secrets Manager](#docs:stable:configuration:secrets_manager):
```sql
CREATE SECRET http_proxy (
TYPE http,
HTTP_PROXY '⟨http_proxy_url⟩',
HTTP_PROXY_USERNAME '⟨username⟩',
HTTP_PROXY_PASSWORD '⟨password⟩'
);
```
Alternatively, you can add it via [configuration options](#docs:stable:configuration:pragmas):
```sql
SET http_proxy = '⟨http_proxy_url⟩';
SET http_proxy_username = '⟨username⟩';
SET http_proxy_password = '⟨password⟩';
```
#### Using a Custom Certificate File {#docs:stable:core_extensions:httpfs:https::using-a-custom-certificate-file}
To use the `httpfs` extension with a custom certificate file, set the following [configuration options](#docs:stable:configuration:pragmas) prior to loading the extension:
```sql
LOAD httpfs;
SET ca_cert_file = '⟨certificate_file⟩';
SET enable_server_cert_verification = true;
```
### Hugging Face Support {#docs:stable:core_extensions:httpfs:hugging_face}
The `httpfs` extension introduces support for the `hf://` protocol to access datasets hosted in [Hugging Face](https://huggingface.co/) repositories.
See the [announcement blog post](https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb) for details.
#### Usage {#docs:stable:core_extensions:httpfs:hugging_face::usage}
Hugging Face repositories can be queried using the following URL pattern:
```text
hf://datasets/⟨my_username⟩/⟨my_dataset⟩/⟨path_to_file⟩
```
For example, to read a CSV file, you can use the following query:
```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';
```
Where:
* `datasets-examples` is the name of the user/organization
* `doc-formats-csv-1` is the name of the dataset repository
* `data.csv` is the file path in the repository
The result of the query is:
| kind | sound |
|---------|-------|
| dog | woof |
| cat | meow |
| pokemon | pika |
| human | hello |
To read a JSONL file, you can run:
```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-jsonl-1/data.jsonl';
```
Finally, for reading a Parquet file, use the following query:
```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-parquet-1/data/train-00000-of-00001.parquet';
```
Each of these commands reads the data from the specified file format and displays it in a structured tabular format. Choose the appropriate command based on the file format you are working with.
#### Creating a Local Table {#docs:stable:core_extensions:httpfs:hugging_face::creating-a-local-table}
To avoid accessing the remote endpoint for every query, you can save the data in a DuckDB table by running a [`CREATE TABLE ... AS` command](#docs:stable:sql:statements:create_table::create-table--as-select-ctas). For example:
```sql
CREATE TABLE data AS
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';
```
Then, simply query the `data` table as follows:
```sql
SELECT *
FROM data;
```
#### Multiple Files {#docs:stable:core_extensions:httpfs:hugging_face::multiple-files}
To query all files under a specific directory, you can use a [glob pattern](#docs:stable:data:multiple_files:overview::multi-file-reads-and-globs). For example:
```sql
SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet';
```
| count |
|------:|
| 173 |
By using glob patterns, you can efficiently handle large datasets and perform comprehensive queries across multiple files, simplifying your data inspections and processing tasks.
Here is how you can look for questions that contain the word "planet" in astronomy:
```sql
SELECT count(*) AS count
FROM 'hf://datasets/cais/mmlu/astronomy/*.parquet'
WHERE question LIKE '%planet%';
```
| count |
|------:|
| 21 |
#### Versioning and Revisions {#docs:stable:core_extensions:httpfs:hugging_face::versioning-and-revisions}
In Hugging Face repositories, dataset versions or revisions are different dataset updates. Each version is a snapshot at a specific time, allowing you to track changes and improvements. In git terms, it can be understood as a branch or specific commit.
You can query different dataset versions/revisions by using the following URL:
```text
hf://datasets/⟨my_username⟩/⟨my_dataset⟩@⟨my_branch⟩/⟨path_to_file⟩
```
For example:
```sql
SELECT *
FROM 'hf://datasets/datasets-examples/doc-formats-csv-1@~parquet/**/*.parquet';
```
| kind | sound |
|---------|-------|
| dog | woof |
| cat | meow |
| pokemon | pika |
| human | hello |
The previous query will read all Parquet files under the `~parquet` revision. This is a special branch where Hugging Face automatically generates the Parquet files of every dataset to enable efficient scanning.
#### Authentication {#docs:stable:core_extensions:httpfs:hugging_face::authentication}
Configure your Hugging Face Token in the DuckDB Secrets Manager to access private or gated datasets.
First, visit [Hugging Face Settings → Tokens](https://huggingface.co/settings/tokens) to obtain your access token.
Second, set it in your DuckDB session using DuckDB's [Secrets Manager](#docs:stable:configuration:secrets_manager). DuckDB supports two providers for managing secrets:
##### `CONFIG` {#docs:stable:core_extensions:httpfs:hugging_face::config}
The user must pass all configuration information into the `CREATE SECRET` statement. To create a secret using the `CONFIG` provider, use the following command:
```sql
CREATE SECRET hf_token (
TYPE huggingface,
TOKEN 'your_hf_token'
);
```
##### `credential_chain` {#docs:stable:core_extensions:httpfs:hugging_face::credential_chain}
Automatically tries to fetch credentials. For the Hugging Face token, it will try to get it from `~/.cache/huggingface/token`. To create a secret using the `credential_chain` provider, use the following command:
```sql
CREATE SECRET hf_token (
TYPE huggingface,
PROVIDER credential_chain
);
```
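Once a secret is in place, private or gated datasets are queried just like public ones; a sketch using a hypothetical repository path:
```sql
SELECT *
FROM 'hf://datasets/⟨my_username⟩/⟨my_private_dataset⟩/data.csv';
```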
### S3 API Support {#docs:stable:core_extensions:httpfs:s3api}
The `httpfs` extension supports reading/writing/[globbing](#::globbing) files on object storage servers using the S3 API. S3 offers a standard API to read and write remote files, whereas regular HTTP servers, predating S3, do not offer a common write API. DuckDB conforms to the S3 API, which is now common among industry storage providers.
#### Platforms {#docs:stable:core_extensions:httpfs:s3api::platforms}
The `httpfs` filesystem is tested with [AWS S3](https://aws.amazon.com/s3/), [Minio](https://min.io/), [Google Cloud](https://cloud.google.com/storage/docs/interoperability), and [lakeFS](https://docs.lakefs.io/integrations/duckdb.html). Other services that implement the S3 API (such as [Cloudflare R2](https://www.cloudflare.com/en-gb/developer-platform/r2/)) should also work, but not all features may be supported.
The following table shows which parts of the S3 API are required for each `httpfs` feature.
| Feature | Required S3 API features |
|:---|:---|
| Public file reads | HTTP Range requests |
| Private file reads | Secret key or session token authentication |
| File glob | [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) |
| File writes | [Multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) |
#### Configuration and Authentication {#docs:stable:core_extensions:httpfs:s3api::configuration-and-authentication}
The preferred way to configure and authenticate to S3 endpoints is to use [secrets](#docs:stable:sql:statements:create_secret). Multiple secret providers are available.
To migrate from the [deprecated S3 API](#docs:stable:core_extensions:httpfs:s3api_legacy_authentication), use a defined secret with a profile.
See the ["Loading a Secret Based on a Profile" section](#::loading-a-secret-based-on-a-profile).
##### `config` Provider {#docs:stable:core_extensions:httpfs:s3api::config-provider}
The default provider, `config` (i.e., user-configured), allows access to the S3 bucket by manually providing a key. For example:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER config,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
REGION '⟨us-east-1⟩'
);
```
> **Tip.** If you get an IO Error (`Connection error for HTTP HEAD`), configure the endpoint explicitly via `ENDPOINT 's3.⟨your_region⟩.amazonaws.com'`.
Now, to query using the above secret, simply query any `s3://` prefixed file:
```sql
SELECT *
FROM 's3://⟨your-bucket⟩/⟨your_file⟩.parquet';
```
##### `credential_chain` Provider {#docs:stable:core_extensions:httpfs:s3api::credential_chain-provider}
The `credential_chain` provider allows automatically fetching credentials using mechanisms provided by the AWS SDK. For example, to use the AWS SDK default provider:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain
);
```
Again, to query a file using the above secret, simply query any `s3://` prefixed file.
DuckDB also allows specifying a specific chain using the `CHAIN` keyword. This takes a semicolon-separated list (`a;b;c`) of providers that will be tried in order. For example:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain,
CHAIN 'env;config'
);
```
The possible values for `CHAIN` are the following:
* [`config`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_profile_config_file_a_w_s_credentials_provider.html)
* [`sts`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_s_t_s_assume_role_web_identity_credentials_provider.html)
* [`sso`](https://aws.amazon.com/what-is/sso/)
* [`env`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_environment_a_w_s_credentials_provider.html)
* [`instance`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_instance_profile_credentials_provider.html)
* [`process`](https://sdk.amazonaws.com/cpp/api/LATEST/aws-cpp-sdk-core/html/class_aws_1_1_auth_1_1_process_credentials_provider.html)
The `credential_chain` provider also allows overriding the automatically fetched config. For example, to automatically load credentials, and then override the region, run:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain,
CHAIN config,
REGION '⟨eu-west-1⟩'
);
```
###### Loading a Secret Based on a Profile {#docs:stable:core_extensions:httpfs:s3api::loading-a-secret-based-on-a-profile}
To load credentials for a specific profile, rather than the default profile selected via the `AWS_PROFILE` environment variable or AWS SDK precedence, run:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain,
CHAIN config,
PROFILE '⟨my_profile⟩'
);
```
This approach is equivalent to the [deprecated S3 API's](#docs:stable:core_extensions:httpfs:s3api_legacy_authentication) `load_aws_credentials('⟨my_profile⟩')` method.
##### Overview of S3 Secret Parameters {#docs:stable:core_extensions:httpfs:s3api::overview-of-s3-secret-parameters}
Below is a complete list of the supported parameters that can be used for both the `config` and `credential_chain` providers:
| Name | Description | Secret | Type | Default |
|:------------------------------|:--------------------------------------------------------------------------------------|:------------------|:----------|:--------------------------------------------|
| `ENDPOINT` | Specify a custom S3 endpoint | `S3`, `GCS`, `R2` | `STRING` | `s3.amazonaws.com` for `S3`, |
| `KEY_ID` | The ID of the key to use | `S3`, `GCS`, `R2` | `STRING` | - |
| `REGION` | The region for which to authenticate (should match the region of the bucket to query) | `S3`, `GCS`, `R2` | `STRING` | `us-east-1` |
| `SECRET` | The secret of the key to use | `S3`, `GCS`, `R2` | `STRING` | - |
| `SESSION_TOKEN` | Optionally, a session token can be passed to use temporary credentials | `S3`, `GCS`, `R2` | `STRING` | - |
| `URL_COMPATIBILITY_MODE` | Can help when URLs contain problematic characters | `S3`, `GCS`, `R2` | `BOOLEAN` | `true` |
| `URL_STYLE` | Either `vhost` or `path` | `S3`, `GCS`, `R2` | `STRING` | `vhost` for `S3`, `path` for `R2` and `GCS` |
| `USE_SSL` | Whether to use HTTPS or HTTP | `S3`, `GCS`, `R2` | `BOOLEAN` | `true` |
| `ACCOUNT_ID` | The R2 account ID to use for generating the endpoint URL | `R2` | `STRING` | - |
| `KMS_KEY_ID` | AWS KMS (Key Management Service) key for Server Side Encryption S3 | `S3` | `STRING` | - |
| `REQUESTER_PAYS` | Allows use of "requester pays" S3 buckets | `S3` | `BOOLEAN` | `false` |
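As an illustrative sketch combining several of these parameters, the following configures a secret for an S3-compatible server at a hypothetical custom endpoint (e.g., a locally hosted MinIO instance):
```sql
CREATE OR REPLACE SECRET minio_secret (
    TYPE s3,
    PROVIDER config,
    KEY_ID '⟨minio_access_key⟩',
    SECRET '⟨minio_secret_key⟩',
    ENDPOINT '⟨localhost:9000⟩',
    URL_STYLE 'path',
    USE_SSL false
);
```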
##### Platform-Specific Secret Types {#docs:stable:core_extensions:httpfs:s3api::platform-specific-secret-types}
###### S3 Secrets {#docs:stable:core_extensions:httpfs:s3api::s3-secrets}
The httpfs extension supports [Server Side Encryption via the AWS Key Management Service (KMS) on S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) using the `KMS_KEY_ID` option:
```sql
CREATE OR REPLACE SECRET secret (
TYPE s3,
PROVIDER credential_chain,
CHAIN config,
REGION '⟨eu-west-1⟩',
KMS_KEY_ID 'arn:aws:kms:⟨region⟩:⟨account_id⟩:⟨key⟩/⟨key_id⟩',
SCOPE 's3://⟨bucket-sub-path⟩'
);
```
###### R2 Secrets {#docs:stable:core_extensions:httpfs:s3api::r2-secrets}
While [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2) uses the regular S3 API, DuckDB has a special Secret type, `R2`, to make configuring it a bit simpler:
```sql
CREATE OR REPLACE SECRET secret (
TYPE r2,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
ACCOUNT_ID '⟨my_account_id⟩'
);
```
Note the addition of `ACCOUNT_ID`, which is used to generate the correct endpoint URL for you. `R2` secrets can use both the `CONFIG` and `credential_chain` providers. However, since DuckDB uses an AWS client internally, when using `credential_chain`, the client will search for AWS credentials in the standard AWS credential locations (environment variables, credential files, etc.). Therefore, your R2 credentials must be made available as AWS environment variables (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) for the credential chain to work properly. Finally, `R2` secrets are only available when using URLs starting with `r2://`, for example:
```sql
SELECT *
FROM read_parquet('r2://⟨some-file-that-uses-an-r2-secret⟩.parquet');
```
###### GCS Secrets {#docs:stable:core_extensions:httpfs:s3api::gcs-secrets}
While [Google Cloud Storage](https://cloud.google.com/storage) is accessed by DuckDB using the S3 API, DuckDB has a special Secret type, `GCS`, to make configuring it a bit simpler:
```sql
CREATE OR REPLACE SECRET secret (
TYPE gcs,
KEY_ID '⟨my_hmac_access_id⟩',
SECRET '⟨my_hmac_secret_key⟩'
);
```
**Important**: The `KEY_ID` and `SECRET` values must be HMAC keys generated specifically for Google Cloud Storage interoperability. These are not the same as regular GCP service account keys or access tokens. You can create HMAC keys by following the [Google Cloud documentation for managing HMAC keys](https://cloud.google.com/storage/docs/authentication/managing-hmackeys).
Note that the above secret will automatically have the correct Google Cloud Storage endpoint configured. `GCS` secrets can use both the `CONFIG` and `credential_chain` providers. However, since DuckDB uses an AWS client internally, when using `credential_chain`, the client will search for AWS credentials in the standard AWS credential locations (environment variables, credential files, etc.). Therefore, your GCS HMAC keys must be made available as AWS environment variables (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) for the credential chain to work properly. Finally, `GCS` secrets are only available when using URLs starting with `gcs://` or `gs://`, for example:
```sql
SELECT *
FROM read_parquet('gcs://⟨some/file/that/uses/a/gcs/secret⟩.parquet');
```
#### Reading {#docs:stable:core_extensions:httpfs:s3api::reading}
Reading files from S3 is now as simple as:
```sql
SELECT *
FROM 's3://⟨your-bucket⟩/⟨filename⟩.⟨extension⟩';
```
##### Partial Reading {#docs:stable:core_extensions:httpfs:s3api::partial-reading}
The `httpfs` extension supports [partial reading](#docs:stable:core_extensions:httpfs:https::partial-reading) from S3 buckets.
##### Reading Multiple Files {#docs:stable:core_extensions:httpfs:s3api::reading-multiple-files}
Reading multiple files is also possible, for example:
```sql
SELECT *
FROM read_parquet([
's3://⟨your-bucket⟩/⟨filename-1⟩.parquet',
's3://⟨your-bucket⟩/⟨filename-2⟩.parquet'
]);
```
##### Globbing {#docs:stable:core_extensions:httpfs:s3api::globbing}
File [globbing](#docs:stable:sql:functions:pattern_matching::globbing) is implemented using the ListObjectsV2 API call and allows the use of filesystem-like glob patterns to match multiple files, for example:
```sql
SELECT *
FROM read_parquet('s3://⟨your-bucket⟩/*.parquet');
```
This query matches all files in the root of the bucket with the [Parquet extension](#docs:stable:data:parquet:overview).
Several matching features are supported, such as `*` to match any number of any character, `?` for any single character, or `[0-9]` for a single character in a range of characters:
```sql
SELECT count(*) FROM read_parquet('s3://⟨your-bucket⟩/folder*/100?/t[0-9].parquet');
```
A useful feature when using globs is the `filename` option, which adds a column named `filename` that encodes the file that a particular row originated from:
```sql
SELECT *
FROM read_parquet('s3://⟨your-bucket⟩/*.parquet', filename = true);
```
This could for example result in:
| column_a | column_b | filename |
|:---|:---|:---|
| 1 | examplevalue1 | s3://bucket-name/file1.parquet |
| 2 | examplevalue1 | s3://bucket-name/file2.parquet |
##### Hive Partitioning {#docs:stable:core_extensions:httpfs:s3api::hive-partitioning}
DuckDB also offers support for the [Hive partitioning scheme](#docs:stable:data:partitioning:hive_partitioning), which is available when using HTTP(S) and S3 endpoints.
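For example, assuming a bucket laid out as `s3://⟨your-bucket⟩/orders/year=2024/month=6/⟨file⟩.parquet` (a hypothetical layout), the partition columns can be read from the paths:
```sql
SELECT *
FROM read_parquet('s3://⟨your-bucket⟩/orders/*/*/*.parquet', hive_partitioning = true);
```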
#### Writing {#docs:stable:core_extensions:httpfs:s3api::writing}
Writing to S3 uses the multipart upload API. This allows DuckDB to robustly upload files at high speed. Writing to S3 works for both CSV and Parquet:
```sql
COPY table_name TO 's3://⟨your-bucket⟩/⟨filename⟩.⟨extension⟩';
```
Partitioned copy to S3 also works:
```sql
COPY table TO 's3://⟨your-bucket⟩/partitioned' (
FORMAT parquet,
PARTITION_BY (⟨part_col_a⟩, ⟨part_col_b⟩)
);
```
An automatic check is performed for existing files/directories, which is currently quite conservative (and on S3 adds a bit of latency). To disable this check and force writing, add the `OVERWRITE_OR_IGNORE` flag:
```sql
COPY table TO 's3://⟨your-bucket⟩/partitioned' (
FORMAT parquet,
PARTITION_BY (⟨part_col_a⟩, ⟨part_col_b⟩),
OVERWRITE_OR_IGNORE true
);
```
The naming scheme of the written files looks like this:
```sql
s3://⟨your-bucket⟩/partitioned/part_col_a=⟨val⟩/part_col_b=⟨val⟩/data_⟨thread_number⟩.parquet
```
##### Configuration {#docs:stable:core_extensions:httpfs:s3api::configuration}
Some additional configuration options exist for the S3 upload, though the default values should suffice for most use cases.
| Name | Description |
|:---|:---|
| `s3_uploader_max_parts_per_file` | Used for part size calculation, see [AWS docs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html) |
| `s3_uploader_max_filesize` | Used for part size calculation, see [AWS docs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html) |
| `s3_uploader_thread_limit` | Maximum number of uploader threads |
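These are regular DuckDB settings; for example, to cap the number of upload threads (the value shown is illustrative, not a recommendation):
```sql
SET s3_uploader_thread_limit = 8;
```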
### Legacy Authentication Scheme for S3 API {#docs:stable:core_extensions:httpfs:s3api_legacy_authentication}
Prior to version 0.10.0, DuckDB did not have a [Secrets manager](#docs:stable:sql:statements:create_secret). Hence, the configuration of and authentication to S3 endpoints was handled via variables. This page documents the legacy authentication scheme for the S3 API.
> **Warning.** This page describes a legacy method to store secrets as DuckDB settings.
> This increases the risk of accidentally leaking secrets (e.g., by printing their values).
> Therefore, avoid using these methods for storing secrets.
> The recommended way to configure and authenticate to S3 endpoints is to use [secrets](#docs:stable:core_extensions:httpfs:s3api::configuration-and-authentication).
#### Legacy Authentication Scheme {#docs:stable:core_extensions:httpfs:s3api_legacy_authentication::legacy-authentication-scheme}
To be able to read from or write to S3, the correct region should be set:
```sql
SET s3_region = 'us-east-1';
```
Optionally, the endpoint can be configured in case a non-AWS object storage server is used:
```sql
SET s3_endpoint = '⟨domain⟩.⟨tld⟩:⟨port⟩';
```
If the endpoint is not SSL-enabled then run:
```sql
SET s3_use_ssl = false;
```
Switching between [path-style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#path-style-access) and [vhost-style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#virtual-hosted-style-access) URLs is possible using:
```sql
SET s3_url_style = 'path';
```
However, note that this may also require updating the endpoint. For example, for AWS S3 the endpoint needs to be changed to `s3.⟨region⟩.amazonaws.com`.
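For instance, switching to path-style URLs for AWS S3 in the `eu-west-1` region (shown only as an example) would look like:
```sql
SET s3_url_style = 'path';
SET s3_endpoint = 's3.eu-west-1.amazonaws.com';
```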
After configuring the correct endpoint and region, public files can be read. To also read private files, authentication credentials can be added:
```sql
SET s3_access_key_id = '⟨aws_access_key_id⟩';
SET s3_secret_access_key = '⟨aws_secret_access_key⟩';
```
Alternatively, temporary S3 credentials are also supported. They require setting an additional session token:
```sql
SET s3_session_token = '⟨aws_session_token⟩';
```
The [`aws` extension](#docs:stable:core_extensions:aws) allows for loading AWS credentials.
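For example, the (now deprecated) `load_aws_credentials` function provided by that extension fills in the settings above from your local AWS configuration:
```sql
CALL load_aws_credentials();
```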
#### Per-Request Configuration {#docs:stable:core_extensions:httpfs:s3api_legacy_authentication::per-request-configuration}
Aside from the global S3 configuration described above, specific configuration values can be used on a per-request basis. This allows for use of multiple sets of credentials, regions, etc. These are used by including them on the S3 URI as query parameters. All the individual configuration values listed above can be set as query parameters. For instance:
```sql
SELECT *
FROM 's3://bucket/file.parquet?s3_access_key_id=accessKey&s3_secret_access_key=secretKey';
```
Multiple configurations per query are also allowed:
```sql
SELECT *
FROM 's3://bucket/file.parquet?s3_access_key_id=accessKey1&s3_secret_access_key=secretKey1' t1
INNER JOIN 's3://bucket/file.csv?s3_access_key_id=accessKey2&s3_secret_access_key=secretKey2' t2;
```
#### Configuration {#docs:stable:core_extensions:httpfs:s3api_legacy_authentication::configuration}
Some additional configuration options exist for the S3 upload, though the default values should suffice for most use cases.
Additionally, most of the configuration options can be set via environment variables:
| DuckDB setting | Environment variable | Note |
|:-----------------------|:---------------------------|:-----------------------------------------|
| `s3_region` | `AWS_REGION` | Takes priority over `AWS_DEFAULT_REGION` |
| `s3_region` | `AWS_DEFAULT_REGION` | |
| `s3_access_key_id` | `AWS_ACCESS_KEY_ID` | |
| `s3_secret_access_key` | `AWS_SECRET_ACCESS_KEY` | |
| `s3_session_token` | `AWS_SESSION_TOKEN` | |
| `s3_endpoint` | `DUCKDB_S3_ENDPOINT` | |
| `s3_use_ssl` | `DUCKDB_S3_USE_SSL` | |
| `s3_requester_pays` | `DUCKDB_S3_REQUESTER_PAYS` | |
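For example, a sketch of supplying the configuration through the environment before starting DuckDB (the values are placeholders):
```bash
export AWS_ACCESS_KEY_ID="⟨aws_access_key_id⟩"
export AWS_SECRET_ACCESS_KEY="⟨aws_secret_access_key⟩"
export AWS_DEFAULT_REGION="us-east-1"
duckdb
```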
## Iceberg {#core_extensions:iceberg}
### Iceberg Extension {#docs:stable:core_extensions:iceberg:overview}
The `iceberg` extension implements support for the [Apache Iceberg open table format](https://iceberg.apache.org/).
This page covers the basic usage of the extension without attaching to an Iceberg catalog. For full support, including write support, see [how to attach Iceberg REST catalogs](#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs).
#### Installing and Loading {#docs:stable:core_extensions:iceberg:overview::installing-and-loading}
To install the `iceberg` extension, run:
```sql
INSTALL iceberg;
```
Note that the `iceberg` extension is not autoloadable.
Therefore, you need to load it before using it:
```sql
LOAD iceberg;
```
#### Updating the Extension {#docs:stable:core_extensions:iceberg:overview::updating-the-extension}
The `iceberg` extension often receives updates between DuckDB releases.
To make sure that you have the latest version, [update your extensions](#docs:stable:sql:statements:update_extensions):
```sql
UPDATE EXTENSIONS;
```
#### Usage {#docs:stable:core_extensions:iceberg:overview::usage}
To test the examples, download the [`iceberg_data.zip`](https://duckdb.org/data/iceberg_data.zip) file and unzip it.
##### Common Parameters {#docs:stable:core_extensions:iceberg:overview::common-parameters}
| Parameter | Type | Default | Description |
| ---------------------------- | ----------- | ------------------------------------------ | ---------------------------------------------------------- |
| `allow_moved_paths` | `BOOLEAN` | `false` | Allows scanning Iceberg tables that are moved |
| `metadata_compression_codec` | `VARCHAR`   | `''`                                       | Treats metadata files as gzip-compressed when set to `'gzip'` |
| `snapshot_from_id` | `UBIGINT` | `NULL` | Access snapshot with a specific `id` |
| `snapshot_from_timestamp` | `TIMESTAMP` | `NULL` | Access snapshot with a specific `timestamp` |
| `version`                    | `VARCHAR`   | `'?'`                                      | Provides an explicit version string, a hint file name, or `'?'` to guess the version |
| `version_name_format` | `VARCHAR` | `'v%s%s.metadata.json,%s%s.metadata.json'` | Controls how versions are converted to metadata file names |
##### Querying Individual Tables {#docs:stable:core_extensions:iceberg:overview::querying-individual-tables}
```sql
SELECT count(*)
FROM iceberg_scan('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
```
| count_star() |
|-------------:|
| 51793 |
> The `allow_moved_paths` option ensures that some path resolution is performed,
> which allows scanning Iceberg tables that are moved.
You can also specify the current metadata file directly in the query; in practice, this path may be resolved from the catalog prior to the query.
To do so, navigate to the `data/iceberg` directory and run:
```sql
SELECT count(*)
FROM iceberg_scan('lineitem_iceberg/metadata/v1.metadata.json');
```
| count_star() |
|-------------:|
| 60175 |
The `iceberg` extension works together with the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview) or the [`azure` extension](#docs:stable:core_extensions:azure) to access Iceberg tables in object stores such as S3 or Azure Blob Storage.
```sql
SELECT count(*)
FROM iceberg_scan('s3://bucketname/lineitem_iceberg/metadata/v1.metadata.json');
```
##### Access Iceberg Metadata {#docs:stable:core_extensions:iceberg:overview::access-iceberg-metadata}
To access Iceberg Metadata, you can use the `iceberg_metadata` function:
```sql
SELECT *
FROM iceberg_metadata('data/iceberg/lineitem_iceberg', allow_moved_paths = true);
```
| manifest_path | manifest_sequence_number | manifest_content | status | content | file_path | file_format | record_count |
|------------------------------------------------------------------------|--------------------------|------------------|---------|----------|------------------------------------------------------------------------------------|-------------|--------------|
| lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b232e5ee23d3-m1.avro | 2 | DATA | ADDED | EXISTING | lineitem_iceberg/data/00041-414-f3c73457-bbd6-4b92-9c15-17b241171b16-00001.parquet | PARQUET | 51793 |
| lineitem_iceberg/metadata/10eaca8a-1e1c-421e-ad6d-b232e5ee23d3-m0.avro | 2 | DATA | DELETED | EXISTING | lineitem_iceberg/data/00000-411-0792dcfe-4e25-4ca3-8ada-175286069a47-00001.parquet | PARQUET | 60175 |
##### Visualizing Snapshots {#docs:stable:core_extensions:iceberg:overview::visualizing-snapshots}
To visualize the snapshots in an Iceberg table, use the `iceberg_snapshots` function:
```sql
SELECT *
FROM iceberg_snapshots('data/iceberg/lineitem_iceberg');
```
| sequence_number | snapshot_id | timestamp_ms | manifest_list |
|-----------------|---------------------|-------------------------|------------------------------------------------------------------------------------------------|
| 1 | 3776207205136740581 | 2023-02-15 15:07:54.504 | lineitem_iceberg/metadata/snap-3776207205136740581-1-cf3d0be5-cf70-453d-ad8f-48fdc412e608.avro |
| 2 | 7635660646343998149 | 2023-02-15 15:08:14.73 | lineitem_iceberg/metadata/snap-7635660646343998149-1-10eaca8a-1e1c-421e-ad6d-b232e5ee23d3.avro |
> `iceberg_snapshots` does not take `allow_moved_paths`, `snapshot_from_id` or `snapshot_from_timestamp` as parameters.
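To read the table as of one of these snapshots, pass `snapshot_from_id` (or `snapshot_from_timestamp`) to `iceberg_scan`; a sketch using the first snapshot id from the table above:
```sql
SELECT count(*)
FROM iceberg_scan(
'data/iceberg/lineitem_iceberg',
allow_moved_paths = true,
snapshot_from_id = 3776207205136740581
);
```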
##### Selecting Metadata Versions {#docs:stable:core_extensions:iceberg:overview::selecting-metadata-versions}
By default, the `iceberg` extension will look for a `version-hint.text` file to identify the proper metadata version to use. This can be overridden by explicitly supplying a version number via the `version` parameter to the functions of the `iceberg` extension:
```sql
SELECT *
FROM iceberg_snapshots(
'data/iceberg/lineitem_iceberg',
version = '1',
);
```
By default, `iceberg` functions will look for both `v{version}.metadata.json` and `{version}.metadata.json` files, or `v{version}.gz.metadata.json` and `{version}.gz.metadata.json` when `metadata_compression_codec = 'gzip'` is specified.
Other compression codecs are not supported.
If any text file is provided through the `version` parameter, it is opened and treated as a version hint file:
```sql
SELECT *
FROM iceberg_snapshots(
'data/iceberg/lineitem_iceberg',
version = 'version-hint.txt',
);
```
The `iceberg` extension will open this file and use the **entire content** of the file as a provided version number.
Note that the entire content of the `version-hint.txt` file will be treated as a literal version name, with no encoding, escaping or trimming. This includes any whitespace or unsafe characters, which are passed verbatim into the filename formatting logic described below.
##### Working with Alternative Metadata Naming Conventions {#docs:stable:core_extensions:iceberg:overview::working-with-alternative-metadata-naming-conventions}
The `iceberg` extension can handle different metadata naming conventions by specifying them as a comma-delimited list of format strings via the `version_name_format` parameter. Each format string must contain two `%s` parameters. The first is the location of the version number in the metadata filename and the second is the location of the filename extension determined by the `metadata_compression_codec`. The behavior described above is provided by the default value of `'v%s%s.metadata.json,%s%s.metadata.json'`.
If you had an alternatively named metadata file, e.g., `rev-2.metadata.json.gz`, the table can be read via the following statement:
```sql
SELECT *
FROM iceberg_snapshots(
'data/iceberg/alternative_metadata_gz_naming',
version = '2',
version_name_format = 'rev-%s.metadata.json%s',
metadata_compression_codec = 'gzip',
);
```
##### “Guessing” Metadata Versions {#docs:stable:core_extensions:iceberg:overview::guessing-metadata-versions}
By default, either a table version number or a `version-hint.text` **must** be provided for the `iceberg` extension to read a table. This is typically provided by an external data catalog. In the event neither is present, the `iceberg` extension can attempt to guess the latest version by passing `?` as the `version` parameter:
```sql
SELECT count(*)
FROM iceberg_scan(
'data/iceberg/lineitem_iceberg_no_hint',
version = '?',
allow_moved_paths = true
);
```
The “latest” version is assumed to be the filename that is lexicographically largest when sorting the filenames. Collations are not considered. This behavior is not enabled by default as it may potentially violate ACID constraints. It can be enabled by setting `unsafe_enable_version_guessing` to `true`. When this is set, `iceberg` functions will attempt to guess the latest version by default before failing.
```sql
SET unsafe_enable_version_guessing = true;
SELECT count(*)
FROM iceberg_scan(
'data/iceberg/lineitem_iceberg_no_hint',
allow_moved_paths = true
);
```
#### Limitations {#docs:stable:core_extensions:iceberg:overview::limitations}
The following are currently not supported:
- Updates and deletes
- Inserts into v3 Iceberg specification tables
- Reads from v3 tables with v2 data types
- The geometry data type
For the operations that are not supported when attached to an Iceberg catalog, see the [list of unsupported operations](#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::unsupported-operations).
### Iceberg REST Catalogs {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs}
The `iceberg` extension supports attaching Iceberg REST Catalogs. Before attaching an Iceberg REST Catalog, you must install the `iceberg` extension by following the instructions located in the [overview](#docs:stable:core_extensions:iceberg:overview).
If you are attaching to an Iceberg REST Catalog managed by Amazon, please see the instructions for attaching to [Amazon S3 Tables](#docs:stable:core_extensions:iceberg:amazon_s3_tables) or [Amazon SageMaker Lakehouse](#docs:stable:core_extensions:iceberg:amazon_sagemaker_lakehouse).
For all other Iceberg REST Catalogs, you can follow the instructions below. Please see the [Examples](#::specific-catalog-examples) section for questions about specific catalogs.
Most Iceberg REST Catalogs authenticate via OAuth2. You can use the existing DuckDB secret workflow to store login credentials for the OAuth2 service.
```sql
CREATE SECRET iceberg_secret (
TYPE iceberg,
CLIENT_ID '⟨admin⟩',
CLIENT_SECRET '⟨password⟩',
OAUTH2_SERVER_URI '⟨http://irc_host_url.com/v1/oauth/tokens⟩'
);
```
If you already have a Bearer token, you can pass it directly to your `CREATE SECRET` statement:
```sql
CREATE SECRET iceberg_secret (
TYPE iceberg,
TOKEN '⟨bearer_token⟩'
);
```
You can attach the Iceberg catalog with the following [`ATTACH`](#docs:stable:sql:statements:attach) statement.
```sql
LOAD httpfs;
ATTACH '⟨warehouse⟩' AS iceberg_catalog (
TYPE iceberg,
SECRET iceberg_secret, -- pass a specific secret name to prevent ambiguity
ENDPOINT '⟨https://rest_endpoint.com⟩'
);
```
To see the available tables, run:
```sql
SHOW ALL TABLES;
```
##### ATTACH OPTIONS {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::attach-options}
A REST Catalog with OAuth2 authorization can also be attached with just an `ATTACH` statement. See the complete list of `ATTACH` options for a REST catalog below.
| Parameter | Type | Default | Description |
|-------------------------------|-----------|----------|-------------------------------------------------------------------------------------------------------|
| `ENDPOINT_TYPE` | `VARCHAR` | `NULL` | Used for attaching S3Tables or Glue catalogs. Allowed values are 'GLUE' and 'S3_TABLES' |
| `ENDPOINT` | `VARCHAR` | `NULL` | URL endpoint to communicate with the REST Catalog. Cannot be used in conjunction with `ENDPOINT_TYPE` |
| `SECRET` | `VARCHAR` | `NULL` | Name of secret used to communicate with the REST Catalog |
| `CLIENT_ID` | `VARCHAR` | `NULL` | CLIENT_ID used for Secret |
| `CLIENT_SECRET` | `VARCHAR` | `NULL` | CLIENT_SECRET needed for Secret |
| `DEFAULT_REGION` | `VARCHAR` | `NULL` | A Default region to use when communicating with the storage layer |
| `OAUTH2_SERVER_URI` | `VARCHAR` | `NULL` | OAuth2 server url for getting a Bearer Token |
| `AUTHORIZATION_TYPE`          | `VARCHAR` | `OAUTH2` | Pass `SigV4` for catalogs that require SigV4 authorization, `none` for catalogs that do not require authentication |
| `SUPPORT_NESTED_NAMESPACES` | `BOOLEAN` | `true` | Option for catalogs that support nested namespaces. |
| `SUPPORT_STAGE_CREATE` | `BOOLEAN` | `false` | Option for catalogs that do not support stage create. |
The following options can only be passed to a `CREATE SECRET` statement, and they require `AUTHORIZATION_TYPE` to be `OAUTH2`
| Parameter | Type | Default | Description |
|---------------------|-----------|---------|-----------------------------------------------------|
| `OAUTH2_GRANT_TYPE` | `VARCHAR` | `NULL` | Grant Type when requesting an OAuth Token |
| `OAUTH2_SCOPE` | `VARCHAR` | `NULL` | Requested scope for the returned OAuth Access Token |
##### Supported Operations {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::supported-operations}
The DuckDB Iceberg extension supports the following operations when a REST catalog is attached:
- `CREATE/DROP SCHEMA`
- `CREATE/DROP TABLE`
- `INSERT INTO`
- `SELECT`
Since these operations are supported, the following would also work:
```sql
COPY FROM DATABASE duckdb_db TO iceberg_datalake;
-- Or
COPY FROM DATABASE iceberg_datalake TO duckdb_db;
```
This functionality enables deep copies between Iceberg and DuckDB storage.
##### Metadata Operations {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::metadata-operations}
The functions `iceberg_metadata` and `iceberg_snapshots` are also available to use with an Iceberg REST catalog using a fully qualified path, e.g.
```sql
SELECT * FROM iceberg_metadata(my_datalake.default.t);
-- Or
SELECT * FROM iceberg_snapshots(my_datalake.default.t);
```
This functionality enables the user to grab a `snapshot_from_id` to do **time-traveling**.
```sql
SELECT * FROM my_datalake.default.t AT (VERSION => ⟨SNAPSHOT_ID⟩);
-- Or using a timestamp
SELECT * FROM my_datalake.default.t AT (TIMESTAMP => TIMESTAMP '2025-09-22 12:32:43.217');
```
##### Interoperability with DuckLake {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::interoperability-with-ducklake}
The DuckDB Iceberg extension exposes a function that performs a metadata-only copy of the Iceberg metadata to DuckLake, which enables users to query Iceberg tables as if they were DuckLake tables.
```sql
-- Given that we have an Iceberg catalog attached aliased to iceberg_datalake
ATTACH 'ducklake:my_ducklake.ducklake' AS my_ducklake;
CALL iceberg_to_ducklake('iceberg_datalake', 'my_ducklake');
```
It is also possible to skip a set of tables by providing the `skip_tables` parameter.
```sql
CALL iceberg_to_ducklake('iceberg_datalake', 'my_ducklake', skip_tables := ['table_to_skip']);
```
##### Unsupported Operations {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::unsupported-operations}
The following operations are not supported by the Iceberg DuckDB extension:
- `UPDATE`
- `DELETE`
- `MERGE INTO`
- `ALTER TABLE`
#### Specific Catalog Examples {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::specific-catalog-examples}
##### R2 Catalog {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::r2-catalog}
To attach to a [Cloudflare R2](https://developers.cloudflare.com/r2/data-catalog/)-managed catalog, follow the steps below.
```sql
CREATE SECRET r2_secret (
TYPE iceberg,
TOKEN '⟨r2_token⟩'
);
```
You can create a token by following the [create an API token](https://developers.cloudflare.com/r2/data-catalog/get-started/#3-create-an-api-token) steps in getting started.
Then, attach the catalog with the following commands.
```sql
ATTACH '⟨warehouse⟩' AS my_r2_catalog (
TYPE iceberg,
ENDPOINT '⟨catalog-uri⟩'
);
```
The variables for `warehouse` and `catalog-uri` will be available under the settings of the desired R2 Object Storage Catalog (R2 Object Store > Catalog name > Settings).
##### Polaris {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::polaris}
To attach to a [Polaris](https://polaris.apache.org) catalog, use the following commands.
```sql
CREATE SECRET polaris_secret (
TYPE iceberg,
CLIENT_ID '⟨admin⟩',
CLIENT_SECRET '⟨password⟩'
);
```
```sql
ATTACH 'quickstart_catalog' AS polaris_catalog (
TYPE iceberg,
ENDPOINT '⟨polaris_rest_catalog_endpoint⟩'
);
```
##### Lakekeeper {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::lakekeeper}
To attach to a [Lakekeeper](https://docs.lakekeeper.io) catalog, use the following commands.
```sql
CREATE SECRET lakekeeper_secret (
TYPE iceberg,
CLIENT_ID '⟨admin⟩',
CLIENT_SECRET '⟨password⟩',
OAUTH2_SCOPE '⟨scope⟩',
OAUTH2_SERVER_URI '⟨lakekeeper_oauth_url⟩'
);
```
```sql
ATTACH '⟨warehouse⟩' AS lakekeeper_catalog (
TYPE iceberg,
ENDPOINT '⟨lakekeeper_irc_url⟩',
SECRET '⟨lakekeeper_secret⟩'
);
```
#### Limitations {#docs:stable:core_extensions:iceberg:iceberg_rest_catalogs::limitations}
Reading from Iceberg REST Catalogs backed by remote storage that is not S3 or S3Tables is not yet supported.
### Amazon S3 Tables {#docs:stable:core_extensions:iceberg:amazon_s3_tables}
> Support for S3 Tables is currently experimental.
The `iceberg` extension supports reading Iceberg tables stored in [Amazon S3 Tables](https://aws.amazon.com/s3/features/tables/).
#### Requirements {#docs:stable:core_extensions:iceberg:amazon_s3_tables::requirements}
Install the following extensions:
```sql
INSTALL aws;
INSTALL httpfs;
INSTALL iceberg;
```
#### Connecting to Amazon S3 Tables {#docs:stable:core_extensions:iceberg:amazon_s3_tables::connecting-to-amazon-s3-tables}
You can let DuckDB detect your AWS credentials and configuration based on the default profile in your `~/.aws` directory by creating the following secret using the [Secrets Manager](#docs:stable:configuration:secrets_manager):
```sql
CREATE SECRET (
TYPE s3,
PROVIDER credential_chain
);
```
Alternatively, you can set the values manually:
```sql
CREATE SECRET (
TYPE s3,
KEY_ID '⟨key_id⟩',
SECRET '⟨secret⟩',
REGION '⟨region⟩'
);
```
Then, connect to the catalog using your S3 Tables ARN (available in the AWS Management Console) and the `ENDPOINT_TYPE s3_tables` option:
```sql
ATTACH '⟨s3_tables_arn⟩' AS s3_tables (
TYPE iceberg,
ENDPOINT_TYPE s3_tables
);
```
To check whether the attachment worked, list all tables:
```sql
SHOW ALL TABLES;
```
You can query a table as follows:
```sql
SELECT count(*)
FROM s3_tables.⟨namespace_name⟩.⟨table_name⟩;
```
### Amazon SageMaker Lakehouse (AWS Glue) {#docs:stable:core_extensions:iceberg:amazon_sagemaker_lakehouse}
> Support for Amazon SageMaker Lakehouse (AWS Glue) is currently experimental.
The `iceberg` extension supports reading Iceberg tables through the [Amazon SageMaker Lakehouse (a.k.a. AWS Glue)](https://aws.amazon.com/sagemaker/lakehouse/) catalog.
#### Requirements {#docs:stable:core_extensions:iceberg:amazon_sagemaker_lakehouse::requirements}
To use it, install the following extensions:
```sql
INSTALL aws;
INSTALL httpfs;
INSTALL iceberg;
```
> If you want to switch back to using extensions from the `core` repository,
> follow the [extension documentation](#docs:stable:extensions:installing_extensions::force-installing-to-upgrade-extensions).
#### Connecting to Amazon SageMaker Lakehouse (AWS Glue) {#docs:stable:core_extensions:iceberg:amazon_sagemaker_lakehouse::connecting-to-amazon-sagemaker-lakehouse-aws-glue}
Create an S3 secret using the [Secrets Manager](#docs:stable:configuration:secrets_manager):
```sql
CREATE SECRET (
TYPE s3,
PROVIDER credential_chain,
CHAIN sts,
ASSUME_ROLE_ARN 'arn:aws:iam::⟨account_id⟩:role/⟨role⟩',
REGION 'us-east-2'
);
```
In this example we use an STS token, but [other authentication methods are supported](#docs:stable:core_extensions:aws).
Then, connect to the catalog:
```sql
ATTACH '⟨account_id⟩' AS glue_catalog (
TYPE iceberg,
ENDPOINT 'glue.⟨REGION⟩.amazonaws.com/iceberg',
AUTHORIZATION_TYPE 'sigv4'
);
```
Or alternatively:
```sql
ATTACH '⟨account_id⟩' AS glue_catalog (
TYPE iceberg,
ENDPOINT_TYPE 'glue'
);
```
To check whether the attachment worked, list all tables:
```sql
SHOW ALL TABLES;
```
You can query a table as follows:
```sql
SELECT count(*)
FROM glue_catalog.⟨namespace_name⟩.⟨table_name⟩;
```
### Troubleshooting {#docs:stable:core_extensions:iceberg:troubleshooting}
#### Limitations {#docs:stable:core_extensions:iceberg:troubleshooting::limitations}
* Writing Iceberg tables is currently not supported.
* Reading tables with deletes is not yet supported.
#### Curl Request Fails {#docs:stable:core_extensions:iceberg:troubleshooting::curl-request-fails}
##### Problem {#docs:stable:core_extensions:iceberg:troubleshooting::problem}
When trying to attach to an Iceberg REST Catalog endpoint, DuckDB returns the following error:
```console
IO Error:
Curl Request to '/v1/oauth/tokens' failed with error: 'URL using bad/illegal format or missing URL'
```
##### Solution {#docs:stable:core_extensions:iceberg:troubleshooting::solution}
Make sure that you have the latest Iceberg extension installed:
```batch
duckdb
```
```plsql
FORCE INSTALL iceberg FROM core_nightly;
```
Exit DuckDB and start a new session:
```batch
duckdb
```
```plsql
LOAD iceberg;
```
#### HTTP Error 403 {#docs:stable:core_extensions:iceberg:troubleshooting::http-error-403}
##### Problem {#docs:stable:core_extensions:iceberg:troubleshooting::problem}
When trying to list the tables in a remote-attached catalog, DuckDB returns the following error:
```sql
SHOW ALL TABLES;
```
```console
Failed to query https://s3tables.us-east-2.amazonaws.com/iceberg/v1/arn:aws:s3tables:... http error 403 thrown.
Message: {"message":"The security token included in the request is invalid."}
```
##### Solution {#docs:stable:core_extensions:iceberg:troubleshooting::solution}
Use the `duckdb_secrets()` function to check whether DuckDB loaded the required credentials:
```sql
.mode line
FROM duckdb_secrets();
```
If you do not see your credentials, set them manually using the following secret:
```sql
CREATE SECRET (
TYPE s3,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
REGION '⟨us-east-1⟩'
);
```
## ICU Extension {#docs:stable:core_extensions:icu}
The `icu` extension contains an easy-to-use version of the collation/timezone part of the [ICU library](https://github.com/unicode-org/icu).
#### Installing and Loading {#docs:stable:core_extensions:icu::installing-and-loading}
The `icu` extension will be transparently [autoloaded](#docs:stable:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
```sql
INSTALL icu;
LOAD icu;
```
#### Features {#docs:stable:core_extensions:icu::features}
The `icu` extension introduces the following features:
* [Region-dependent collations](#docs:stable:sql:expressions:collations)
* [Time zones](#docs:stable:sql:data_types:timezones), used for [timestamp data types](#docs:stable:sql:data_types:timestamp) and [timestamp functions](#docs:stable:sql:functions:timestamptz)
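For example, once the extension is loaded you can change the session time zone and compare strings using a region-dependent collation (the values below are illustrative):
```sql
SET TimeZone = 'America/New_York';
SELECT 'Gabel' < 'Göbel' COLLATE de AS german_order;
```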
## inet Extension {#docs:stable:core_extensions:inet}
The `inet` extension defines the `INET` data type for storing [IPv4](https://en.wikipedia.org/wiki/Internet_Protocol_version_4) and [IPv6](https://en.wikipedia.org/wiki/IPv6) Internet addresses. It supports the [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) for subnet masks (e.g., `198.51.100.0/22`, `2001:db8:3c4d::/48`).
#### Installing and Loading {#docs:stable:core_extensions:inet::installing-and-loading}
The `inet` extension will be transparently [autoloaded](#docs:stable:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
```sql
INSTALL inet;
LOAD inet;
```
#### Examples {#docs:stable:core_extensions:inet::examples}
```sql
SELECT '127.0.0.1'::INET AS ipv4, '2001:db8:3c4d::/48'::INET AS ipv6;
```
| ipv4 | ipv6 |
|-----------|--------------------|
| 127.0.0.1 | 2001:db8:3c4d::/48 |
```sql
CREATE TABLE tbl (id INTEGER, ip INET);
INSERT INTO tbl VALUES
(1, '192.168.0.0/16'),
(2, '127.0.0.1'),
(3, '8.8.8.8'),
(4, 'fe80::/10'),
(5, '2001:db8:3c4d:15::1a2f:1a2b');
SELECT * FROM tbl;
```
| id | ip |
|---:|-----------------------------|
| 1 | 192.168.0.0/16 |
| 2 | 127.0.0.1 |
| 3 | 8.8.8.8 |
| 4 | fe80::/10 |
| 5 | 2001:db8:3c4d:15::1a2f:1a2b |
#### Operations on `INET` Values {#docs:stable:core_extensions:inet::operations-on-inet-values}
`INET` values can be compared naturally, and IPv4 will sort before IPv6. Additionally, IP addresses can be modified by adding or subtracting integers.
```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('127.0.0.1'::INET + 10),
('fe80::10'::INET - 9),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b');
SELECT cidr FROM tbl ORDER BY cidr ASC;
```
| cidr |
|-----------------------------|
| 127.0.0.1 |
| 127.0.0.11 |
| 2001:db8:3c4d:15::1a2f:1a2b |
| fe80::7 |
#### `host` Function {#docs:stable:core_extensions:inet::host-function}
The host component of an `INET` value can be extracted using the `HOST()` function.
```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.0.0/16'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, host(cidr) FROM tbl;
```
| cidr | host(cidr) |
|--------------------------------|-----------------------------|
| 192.168.0.0/16 | 192.168.0.0 |
| 127.0.0.1 | 127.0.0.1 |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | 2001:db8:3c4d:15::1a2f:1a2b |
#### `netmask` Function {#docs:stable:core_extensions:inet::netmask-function}
Computes the network mask for the address's network.
```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.1.5/24'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, netmask(cidr) FROM tbl;
```
| cidr | netmask(cidr) |
|--------------------------------|------------------------------------|
| 192.168.1.5/24 | 255.255.255.0/24 |
| 127.0.0.1 | 255.255.255.255 |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | ffff:ffff:ffff:ffff:ffff:ffff::/96 |
#### `network` Function {#docs:stable:core_extensions:inet::network-function}
Returns the network part of the address, zeroing out whatever is to the right of the netmask.
```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.1.5/24'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, network(cidr) FROM tbl;
```
| cidr | network(cidr) |
|--------------------------------|------------------------------------|
| 192.168.1.5/24 | 192.168.1.0/24 |
| 127.0.0.1                      | 127.0.0.1                          |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | 2001:db8:3c4d:15::/96              |
#### `broadcast` Function {#docs:stable:core_extensions:inet::broadcast-function}
Computes the broadcast address for the address's network.
```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.1.5/24'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, broadcast(cidr) FROM tbl;
```
| cidr | broadcast(cidr) |
|--------------------------------|------------------------------------|
| 192.168.1.5/24                 | 192.168.1.255/24                   |
| 127.0.0.1 | 127.0.0.1 |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | 2001:db8:3c4d:15::/96 |
#### `<<=` Predicate {#docs:stable:core_extensions:inet::-predicate}
Is subnet contained by or equal to subnet?
```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.1.0/24'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, INET '192.168.1.5/32' <<= cidr FROM tbl;
```
| cidr | (CAST('192.168.1.5/32' AS INET) <<= cidr) |
|--------------------------------|---------------------------------------------|
| 192.168.1.0/24                 | true                                        |
| 127.0.0.1 | false |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | false |
#### `>>=` Predicate {#docs:stable:core_extensions:inet::-predicate}
Does subnet contain or equal subnet?
```sql
CREATE TABLE tbl (cidr INET);
INSERT INTO tbl VALUES
('192.168.1.0/24'),
('127.0.0.1'),
('2001:db8:3c4d:15::1a2f:1a2b/96');
SELECT cidr, INET '192.168.0.0/16' >>= cidr FROM tbl;
```
| cidr | (CAST('192.168.0.0/16' AS INET) >>= cidr) |
|--------------------------------|---------------------------------------------|
| 192.168.1.0/24                 | true                                        |
| 127.0.0.1 | false |
| 2001:db8:3c4d:15::1a2f:1a2b/96 | false |
#### HTML Escape and Unescape Functions {#docs:stable:core_extensions:inet::html-escape-and-unescape-functions}
```sql
SELECT html_escape('&');
```
```text
┌──────────────────┐
│ html_escape('&') │
│     varchar      │
├──────────────────┤
│ &amp;            │
└──────────────────┘
```
```sql
SELECT html_unescape('&amp;');
```
```text
┌────────────────────────┐
│ html_unescape('&amp;') │
│        varchar         │
├────────────────────────┤
│ &                      │
└────────────────────────┘
```
## jemalloc Extension {#docs:stable:core_extensions:jemalloc}
The `jemalloc` extension replaces the system's memory allocator with [jemalloc](https://jemalloc.net/).
Unlike other DuckDB extensions, the `jemalloc` extension is statically linked and cannot be installed or loaded during runtime.
#### Operating System Support {#docs:stable:core_extensions:jemalloc::operating-system-support}
The availability of the `jemalloc` extension depends on the operating system.
##### Linux {#docs:stable:core_extensions:jemalloc::linux}
Linux distributions of DuckDB ship with the `jemalloc` extension.
To disable the `jemalloc` extension, [build DuckDB from source](#docs:stable:dev:building:overview) and set the `SKIP_EXTENSIONS` flag as follows:
```batch
GEN=ninja SKIP_EXTENSIONS="jemalloc" make
```
##### macOS {#docs:stable:core_extensions:jemalloc::macos}
The macOS version of DuckDB does not ship with the `jemalloc` extension but can be [built from source](#docs:stable:dev:building:macos) to include it:
```batch
GEN=ninja BUILD_JEMALLOC=1 make
```
##### Windows {#docs:stable:core_extensions:jemalloc::windows}
On Windows, this extension is not available.
#### Configuration {#docs:stable:core_extensions:jemalloc::configuration}
##### Environment Variables {#docs:stable:core_extensions:jemalloc::environment-variables}
The jemalloc allocator in DuckDB can be configured via the [`MALLOC_CONF` environment variable](https://jemalloc.net/jemalloc.3.html#environment).
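For example, a sketch of tuning jemalloc's decay times via `MALLOC_CONF` when launching DuckDB (the option values are illustrative, not recommendations):
```batch
MALLOC_CONF="dirty_decay_ms:30000,muzzy_decay_ms:30000" duckdb
```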
##### Background Threads {#docs:stable:core_extensions:jemalloc::background-threads}
By default, jemalloc's [background threads](https://jemalloc.net/jemalloc.3.html#background_thread) are disabled. To enable them, use the following configuration option:
```sql
SET allocator_background_threads = true;
```
Background threads asynchronously purge outstanding allocations so that this doesn't have to be done synchronously by the foreground threads. This improves allocation performance, and should be noticeable in allocation-heavy workloads, especially on many-core CPUs.
## MySQL Extension {#docs:stable:core_extensions:mysql}
The `mysql` extension allows DuckDB to directly read and write data from/to a running MySQL instance. The data can be queried directly from the underlying MySQL database. Data can be loaded from MySQL tables into DuckDB tables, or vice versa.
#### Installing and Loading {#docs:stable:core_extensions:mysql::installing-and-loading}
To install the `mysql` extension, run:
```sql
INSTALL mysql;
```
The extension is loaded automatically upon first use. If you prefer to load it manually, run:
```sql
LOAD mysql;
```
#### Reading Data from MySQL {#docs:stable:core_extensions:mysql::reading-data-from-mysql}
To make a MySQL database accessible to DuckDB, use the `ATTACH` command with the `mysql` or the `mysql_scanner` type:
```sql
ATTACH 'host=localhost user=root port=0 database=mysql' AS mysqldb (TYPE mysql);
USE mysqldb;
```
##### Configuration {#docs:stable:core_extensions:mysql::configuration}
The connection string determines the parameters for how to connect to MySQL as a set of `key=value` pairs. Any options not provided are replaced by their default values, as per the table below. Connection information can also be specified with [environment variables](https://dev.mysql.com/doc/refman/8.3/en/environment-variables.html). If no option is provided explicitly, the MySQL extension tries to read it from an environment variable.
| Setting | Default | Environment variable |
|-------------|----------------|----------------------|
| database | NULL | MYSQL_DATABASE |
| host | localhost | MYSQL_HOST |
| password | | MYSQL_PWD |
| port | 0 | MYSQL_TCP_PORT |
| socket | NULL | MYSQL_UNIX_PORT |
| user | _current user_ | MYSQL_USER |
| ssl_mode | preferred | |
| ssl_ca | | |
| ssl_capath | | |
| ssl_cert | | |
| ssl_cipher | | |
| ssl_crl | | |
| ssl_crlpath | | |
| ssl_key | | |
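For example, a sketch of supplying the connection through the environment variables listed above and then attaching with an empty connection string (the values are placeholders):
```bash
export MYSQL_HOST=localhost
export MYSQL_TCP_PORT=3306
export MYSQL_USER=root
export MYSQL_PWD="⟨password⟩"
export MYSQL_DATABASE=mysql
duckdb
```
Inside DuckDB, running `ATTACH '' AS mysql_db (TYPE mysql);` then picks these values up.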
##### Configuring via Secrets {#docs:stable:core_extensions:mysql::configuring-via-secrets}
MySQL connection information can also be specified with [secrets](https://duckdb.org/docs/configuration/secrets_manager). The following syntax can be used to create a secret.
```sql
CREATE SECRET (
TYPE mysql,
HOST '127.0.0.1',
PORT 0,
DATABASE mysql,
USER 'mysql',
PASSWORD ''
);
```
The information from the secret will be used when `ATTACH` is called. We can leave the connection string empty to use all of the information stored in the secret.
```sql
ATTACH '' AS mysql_db (TYPE mysql);
```
We can use the connection string to override individual options. For example, to connect to a different database while still using the same credentials, we can override only the database name in the following manner.
```sql
ATTACH 'database=my_other_db' AS mysql_db (TYPE mysql);
```
By default, created secrets are temporary. Secrets can be persisted using the [`CREATE PERSISTENT SECRET` command](#docs:stable:configuration:secrets_manager::persistent-secrets). Persistent secrets can be used across sessions.
###### Managing Multiple Secrets {#docs:stable:core_extensions:mysql::managing-multiple-secrets}
Named secrets can be used to manage connections to multiple MySQL database instances. Secrets can be given a name upon creation.
```sql
CREATE SECRET mysql_secret_one (
TYPE mysql,
HOST '127.0.0.1',
PORT 0,
DATABASE mysql,
USER 'mysql',
PASSWORD ''
);
```
The secret can then be explicitly referenced using the `SECRET` parameter in the `ATTACH`.
```sql
ATTACH '' AS mysql_db_one (TYPE mysql, SECRET mysql_secret_one);
```
##### SSL Connections {#docs:stable:core_extensions:mysql::ssl-connections}
The [`ssl` connection parameters](https://dev.mysql.com/doc/refman/8.4/en/using-encrypted-connections.html) can be used to make SSL connections. Below is a description of the supported parameters.
| Setting | Description |
|-------------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| ssl_mode | The security state to use for the connection to the server: `disabled, required, verify_ca, verify_identity or preferred` (default: `preferred`) |
| ssl_ca | The path name of the Certificate Authority (CA) certificate file |
| ssl_capath | The path name of the directory that contains trusted SSL CA certificate files |
| ssl_cert | The path name of the client public key certificate file |
| ssl_cipher | The list of permissible ciphers for SSL encryption |
| ssl_crl | The path name of the file containing certificate revocation lists |
| ssl_crlpath | The path name of the directory that contains files containing certificate revocation lists |
| ssl_key | The path name of the client private key file |
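For example, a sketch of a connection string that verifies the server certificate against a CA file (the host and path are placeholders):
```sql
ATTACH 'host=⟨mysql-host⟩ user=root database=mysql ssl_mode=verify_ca ssl_ca=⟨/path/to/ca.pem⟩' AS mysql_db (TYPE mysql);
```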
##### Reading MySQL Tables {#docs:stable:core_extensions:mysql::reading-mysql-tables}
The tables in the MySQL database can be read as if they were normal DuckDB tables, but the underlying data is read directly from MySQL at query time.
```sql
SHOW ALL TABLES;
```
| name |
|-----------------|
| signed_integers |
```sql
SELECT * FROM signed_integers;
```
| t | s | m | i | b |
|-----:|-------:|---------:|------------:|---------------------:|
| -128 | -32768 | -8388608 | -2147483648 | -9223372036854775808 |
| 127 | 32767 | 8388607 | 2147483647 | 9223372036854775807 |
| NULL | NULL | NULL | NULL | NULL |
It might be desirable to create a copy of the MySQL databases in DuckDB to prevent the system from re-reading the tables from MySQL continuously, particularly for large tables.
Data can be copied over from MySQL to DuckDB using standard SQL, for example:
```sql
CREATE TABLE duckdb_table AS FROM mysqlscanner.mysql_table;
```
#### Writing Data to MySQL {#docs:stable:core_extensions:mysql::writing-data-to-mysql}
In addition to reading data from MySQL, the extension allows you to create tables, ingest data into MySQL, and make other modifications to a MySQL database using standard SQL queries.
This allows you to use DuckDB to, for example, export data that is stored in a MySQL database to Parquet, or read data from a Parquet file into MySQL.
Below is a brief example of how to create a new table in MySQL and load data into it.
```sql
ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE mysql);
CREATE TABLE mysql_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO mysql_db.tbl VALUES (42, 'DuckDB');
```
Many operations on MySQL tables are supported. All these operations directly modify the MySQL database, and the result of subsequent operations can then be read using MySQL.
Note that if modifications are not desired, `ATTACH` can be run with the `READ_ONLY` property which prevents making modifications to the underlying database. For example:
```sql
ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE mysql, READ_ONLY);
```
#### Supported Operations {#docs:stable:core_extensions:mysql::supported-operations}
Below is a list of supported operations.
##### `CREATE TABLE` {#docs:stable:core_extensions:mysql::create-table}
```sql
CREATE TABLE mysql_db.tbl (id INTEGER, name VARCHAR);
```
##### `INSERT INTO` {#docs:stable:core_extensions:mysql::insert-into}
```sql
INSERT INTO mysql_db.tbl VALUES (42, 'DuckDB');
```
##### `SELECT` {#docs:stable:core_extensions:mysql::select}
```sql
SELECT * FROM mysql_db.tbl;
```
| id | name |
|---:|--------|
| 42 | DuckDB |
##### `COPY` {#docs:stable:core_extensions:mysql::copy}
```sql
COPY mysql_db.tbl TO 'data.parquet';
COPY mysql_db.tbl FROM 'data.parquet';
```
You may also create a full copy of the database using the [`COPY FROM DATABASE` statement](#docs:stable:sql:statements:copy::copy-from-database--to):
```sql
COPY FROM DATABASE mysql_db TO my_duckdb_db;
```
##### `UPDATE` {#docs:stable:core_extensions:mysql::update}
```sql
UPDATE mysql_db.tbl
SET name = 'Woohoo'
WHERE id = 42;
```
##### `DELETE` {#docs:stable:core_extensions:mysql::delete}
```sql
DELETE FROM mysql_db.tbl
WHERE id = 42;
```
##### `ALTER TABLE` {#docs:stable:core_extensions:mysql::alter-table}
```sql
ALTER TABLE mysql_db.tbl
ADD COLUMN k INTEGER;
```
##### `DROP TABLE` {#docs:stable:core_extensions:mysql::drop-table}
```sql
DROP TABLE mysql_db.tbl;
```
##### `CREATE VIEW` {#docs:stable:core_extensions:mysql::create-view}
```sql
CREATE VIEW mysql_db.v1 AS SELECT 42;
```
##### `CREATE SCHEMA` and `DROP SCHEMA` {#docs:stable:core_extensions:mysql::create-schema-and-drop-schema}
```sql
CREATE SCHEMA mysql_db.s1;
CREATE TABLE mysql_db.s1.integers (i INTEGER);
INSERT INTO mysql_db.s1.integers VALUES (42);
SELECT * FROM mysql_db.s1.integers;
```
| i |
|---:|
| 42 |
```sql
DROP SCHEMA mysql_db.s1;
```
##### Transactions {#docs:stable:core_extensions:mysql::transactions}
```sql
CREATE TABLE mysql_db.tmp (i INTEGER);
BEGIN;
INSERT INTO mysql_db.tmp VALUES (42);
SELECT * FROM mysql_db.tmp;
```
This returns:
| i |
|---:|
| 42 |
```sql
ROLLBACK;
SELECT * FROM mysql_db.tmp;
```
This returns an empty table.
> The DDL statements are not transactional in MySQL.
#### Running SQL Queries in MySQL {#docs:stable:core_extensions:mysql::running-sql-queries-in-mysql}
##### The `mysql_query` Table Function {#docs:stable:core_extensions:mysql::the-mysql_query-table-function}
The `mysql_query` table function allows you to run arbitrary read queries within an attached database. `mysql_query` takes the name of the attached MySQL database to execute the query in, as well as the SQL query to execute. The result of the query is returned. Single-quote strings are escaped by repeating the single quote twice.
```sql
mysql_query(attached_database::VARCHAR, query::VARCHAR)
```
For example:
```sql
ATTACH 'host=localhost database=mysql' AS mysqldb (TYPE mysql);
SELECT * FROM mysql_query('mysqldb', 'SELECT * FROM cars LIMIT 3');
```
##### The `mysql_execute` Function {#docs:stable:core_extensions:mysql::the-mysql_execute-function}
The `mysql_execute` function allows running arbitrary queries within MySQL, including statements that update the schema and content of the database.
```sql
ATTACH 'host=localhost database=mysql' AS mysqldb (TYPE mysql);
CALL mysql_execute('mysqldb', 'CREATE TABLE my_table (i INTEGER)');
```
#### Settings {#docs:stable:core_extensions:mysql::settings}
| Name | Description | Default |
|--------------------------------------|----------------------------------------------------------------|-----------|
| `mysql_bit1_as_boolean` | Whether or not to convert `BIT(1)` columns to `BOOLEAN` | `true` |
| `mysql_debug_show_queries` | DEBUG SETTING: print all queries sent to MySQL to stdout | `false` |
| `mysql_experimental_filter_pushdown` | Whether or not to use filter pushdown (currently experimental) | `false` |
| `mysql_tinyint1_as_boolean` | Whether or not to convert `TINYINT(1)` columns to `BOOLEAN` | `true` |
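These are regular DuckDB settings; for example, to enable the experimental filter pushdown for the current session:
```sql
SET mysql_experimental_filter_pushdown = true;
```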
#### Schema Cache {#docs:stable:core_extensions:mysql::schema-cache}
To avoid having to continuously fetch schema data from MySQL, DuckDB keeps schema information (such as the names of tables, their columns, etc.) cached. If changes are made to the schema through a different connection to the MySQL instance, such as new columns being added to a table, the cached schema information might be outdated. In this case, the function `mysql_clear_cache` can be executed to clear the internal caches.
```sql
CALL mysql_clear_cache();
```
## PostgreSQL Extension {#docs:stable:core_extensions:postgres}
The `postgres` extension allows DuckDB to directly read and write data from a running PostgreSQL database instance. The data can be queried directly from the underlying PostgreSQL database. Data can be loaded from PostgreSQL tables into DuckDB tables, or vice versa. See the [official announcement](https://duckdb.org/2022/09/30/postgres-scanner) for implementation details and background.
#### Installing and Loading {#docs:stable:core_extensions:postgres::installing-and-loading}
The `postgres` extension will be transparently [autoloaded](#docs:stable:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
```sql
INSTALL postgres;
LOAD postgres;
```
#### Connecting {#docs:stable:core_extensions:postgres::connecting}
To make a PostgreSQL database accessible to DuckDB, use the `ATTACH` command with the `postgres` or `postgres_scanner` type.
To connect to the `public` schema of the PostgreSQL instance running on localhost in read-write mode, run:
```sql
ATTACH '' AS postgres_db (TYPE postgres);
```
To connect to the PostgreSQL instance with the given parameters in read-only mode, run:
```sql
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS db (TYPE postgres, READ_ONLY);
```
By default, all schemas are attached. When working with large instances, it can be useful to only attach a specific schema. This can be accomplished using the `SCHEMA` option.
```sql
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS db (TYPE postgres, SCHEMA 'public');
```
##### Configuration {#docs:stable:core_extensions:postgres::configuration}
The `ATTACH` command takes as input either a [`libpq` connection string](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING)
or a [PostgreSQL URI](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING-URIS).
Below are some example connection strings and commonly used parameters. A full list of available parameters can be found in the [PostgreSQL documentation](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS).
```text
dbname=postgresscanner
host=localhost port=5432 dbname=mydb connect_timeout=10
```
| Name | Description | Default |
| ---------- | ------------------------------------ | -------------- |
| `dbname` | Database name | [user] |
| `host` | Name of host to connect to | `localhost` |
| `hostaddr` | Host IP address | `localhost` |
| `passfile` | Name of file passwords are stored in | `~/.pgpass` |
| `password` | PostgreSQL password | (empty) |
| `port` | Port number | `5432` |
| `user` | PostgreSQL user name | _current user_ |
An example URI is `postgresql://username@hostname/dbname`.
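For example, a sketch of attaching via such a URI (the user, host, and database name are placeholders):
```sql
ATTACH 'postgresql://⟨username⟩@⟨hostname⟩/⟨dbname⟩' AS postgres_db (TYPE postgres);
```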
##### Configuring via Secrets {#docs:stable:core_extensions:postgres::configuring-via-secrets}
PostgreSQL connection information can also be specified with [secrets](https://duckdb.org/docs/configuration/secrets_manager). The following syntax can be used to create a secret.
```sql
CREATE SECRET (
TYPE postgres,
HOST '127.0.0.1',
PORT 5432,
DATABASE postgres,
USER 'postgres',
PASSWORD ''
);
```
The information from the secret will be used when `ATTACH` is called. We can leave the PostgreSQL connection string empty to use all of the information stored in the secret.
```sql
ATTACH '' AS postgres_db (TYPE postgres);
```
We can use the PostgreSQL connection string to override individual options. For example, to connect to a different database while still using the same credentials, we can override only the database name in the following manner.
```sql
ATTACH 'dbname=my_other_db' AS postgres_db (TYPE postgres);
```
By default, created secrets are temporary. Secrets can be persisted using the [`CREATE PERSISTENT SECRET` command](#docs:stable:configuration:secrets_manager::persistent-secrets). Persistent secrets can be used across sessions.
###### Managing Multiple Secrets {#docs:stable:core_extensions:postgres::managing-multiple-secrets}
Named secrets can be used to manage connections to multiple PostgreSQL database instances. Secrets can be given a name upon creation.
```sql
CREATE SECRET postgres_secret_one (
TYPE postgres,
HOST '127.0.0.1',
PORT 5432,
DATABASE postgres,
USER 'postgres',
PASSWORD ''
);
```
The secret can then be explicitly referenced using the `SECRET` parameter in the `ATTACH`.
```sql
ATTACH '' AS postgres_db_one (TYPE postgres, SECRET postgres_secret_one);
```
##### Configuring via Environment Variables {#docs:stable:core_extensions:postgres::configuring-via-environment-variables}
PostgreSQL connection information can also be specified with [environment variables](https://www.postgresql.org/docs/current/libpq-envars.html).
This can be useful in a production environment where the connection information is managed externally
and passed in to the environment.
```bash
export PGPASSWORD="secret"
export PGHOST=localhost
export PGUSER=owner
export PGDATABASE=mydatabase
```
Then, to connect, start the `duckdb` process and run:
```sql
ATTACH '' AS p (TYPE postgres);
```
#### Usage {#docs:stable:core_extensions:postgres::usage}
The tables in the PostgreSQL database can be read as if they were normal DuckDB tables, but the underlying data is read directly from PostgreSQL at query time.
```sql
SHOW ALL TABLES;
```
| name |
| ----- |
| uuids |
```sql
SELECT * FROM uuids;
```
| u |
| ------------------------------------ |
| 6d3d2541-710b-4bde-b3af-4711738636bf |
| NULL |
| 00000000-0000-0000-0000-000000000001 |
| ffffffff-ffff-ffff-ffff-ffffffffffff |
It might be desirable to create a copy of the PostgreSQL database in DuckDB to prevent the system from continuously re-reading the tables from PostgreSQL, particularly for large tables.
Data can be copied over from PostgreSQL to DuckDB using standard SQL, for example:
```sql
CREATE TABLE duckdb_table AS FROM postgres_db.postgres_tbl;
```
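Since this is plain SQL, the copy can also be restricted to a subset of the data. A small sketch, with hypothetical table and column names:
```sql
-- Copy only recent rows from a (hypothetical) orders table
CREATE TABLE recent_orders AS
    SELECT *
    FROM postgres_db.orders
    WHERE order_date >= DATE '2024-01-01';
```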
#### Writing Data to PostgreSQL {#docs:stable:core_extensions:postgres::writing-data-to-postgresql}
In addition to reading data from PostgreSQL, the extension allows you to create tables, ingest data into PostgreSQL and make other modifications to a PostgreSQL database using standard SQL queries.
This allows you to use DuckDB to, for example, export data that is stored in a PostgreSQL database to Parquet, or read data from a Parquet file into PostgreSQL.
Below is a brief example of how to create a new table in PostgreSQL and load data into it.
```sql
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE postgres);
CREATE TABLE postgres_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO postgres_db.tbl VALUES (42, 'DuckDB');
```
Many operations on PostgreSQL tables are supported. All these operations directly modify the PostgreSQL database, and the result of subsequent operations can then be read using PostgreSQL.
Note that if modifications are not desired, `ATTACH` can be run with the `READ_ONLY` property which prevents making modifications to the underlying database. For example:
```sql
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE postgres, READ_ONLY);
```
Below is a list of supported operations.
##### `CREATE TABLE` {#docs:stable:core_extensions:postgres::create-table}
```sql
CREATE TABLE postgres_db.tbl (id INTEGER, name VARCHAR);
```
##### `INSERT INTO` {#docs:stable:core_extensions:postgres::insert-into}
```sql
INSERT INTO postgres_db.tbl VALUES (42, 'DuckDB');
```
##### `SELECT` {#docs:stable:core_extensions:postgres::select}
```sql
SELECT * FROM postgres_db.tbl;
```
| id | name |
| ---: | ------ |
| 42 | DuckDB |
##### `COPY` {#docs:stable:core_extensions:postgres::copy}
You can copy tables back and forth between PostgreSQL and DuckDB:
```sql
COPY postgres_db.tbl TO 'data.parquet';
COPY postgres_db.tbl FROM 'data.parquet';
```
These copies use [PostgreSQL binary wire encoding](https://www.postgresql.org/docs/current/sql-copy.html).
DuckDB can also write data in this encoding to a file, which you can then load into PostgreSQL using a client of your choosing if you prefer to manage the connection yourself:
```sql
COPY 'data.parquet' TO 'pg.bin' WITH (FORMAT postgres_binary);
```
The file produced is equivalent to what you would get by copying the data into PostgreSQL with DuckDB and then dumping it from PostgreSQL with `psql` or another client:
DuckDB:
```sql
COPY postgres_db.tbl FROM 'data.parquet';
```
PostgreSQL:
```sql
\copy tbl TO 'data.bin' WITH (FORMAT BINARY);
```
You may also create a full copy of the database using the [`COPY FROM DATABASE` statement](#docs:stable:sql:statements:copy::copy-from-database--to):
```sql
COPY FROM DATABASE postgres_db TO my_duckdb_db;
```
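Note that the target must be an attached DuckDB database. A minimal sketch, assuming a local database file named `my_duckdb_db.duckdb`:
```sql
ATTACH 'my_duckdb_db.duckdb' AS my_duckdb_db;
COPY FROM DATABASE postgres_db TO my_duckdb_db;
```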
##### `UPDATE` {#docs:stable:core_extensions:postgres::update}
```sql
UPDATE postgres_db.tbl
SET name = 'Woohoo'
WHERE id = 42;
```
##### `DELETE` {#docs:stable:core_extensions:postgres::delete}
```sql
DELETE FROM postgres_db.tbl
WHERE id = 42;
```
##### `ALTER TABLE` {#docs:stable:core_extensions:postgres::alter-table}
```sql
ALTER TABLE postgres_db.tbl
ADD COLUMN k INTEGER;
```
##### `DROP TABLE` {#docs:stable:core_extensions:postgres::drop-table}
```sql
DROP TABLE postgres_db.tbl;
```
##### `CREATE VIEW` {#docs:stable:core_extensions:postgres::create-view}
```sql
CREATE VIEW postgres_db.v1 AS SELECT 42;
```
##### `CREATE SCHEMA` / `DROP SCHEMA` {#docs:stable:core_extensions:postgres::create-schema--drop-schema}
```sql
CREATE SCHEMA postgres_db.s1;
CREATE TABLE postgres_db.s1.integers (i INTEGER);
INSERT INTO postgres_db.s1.integers VALUES (42);
SELECT * FROM postgres_db.s1.integers;
```
| i |
| ---: |
| 42 |
```sql
DROP SCHEMA postgres_db.s1;
```
##### `DETACH` {#docs:stable:core_extensions:postgres::detach}
```sql
DETACH postgres_db;
```
##### Transactions {#docs:stable:core_extensions:postgres::transactions}
```sql
CREATE TABLE postgres_db.tmp (i INTEGER);
BEGIN;
INSERT INTO postgres_db.tmp VALUES (42);
SELECT * FROM postgres_db.tmp;
```
This returns:
| i |
| ---: |
| 42 |
```sql
ROLLBACK;
SELECT * FROM postgres_db.tmp;
```
This returns an empty table.
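Conversely, committing the transaction makes the change permanent. A minimal sketch continuing the example above:
```sql
BEGIN;
INSERT INTO postgres_db.tmp VALUES (42);
COMMIT;
SELECT * FROM postgres_db.tmp;
```
This time the inserted row remains visible after the transaction ends.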
#### Running SQL Queries in PostgreSQL {#docs:stable:core_extensions:postgres::running-sql-queries-in-postgresql}
##### The `postgres_query` Table Function {#docs:stable:core_extensions:postgres::the-postgres_query-table-function}
The `postgres_query` table function allows you to run arbitrary read queries within an attached database. `postgres_query` takes the name of the attached PostgreSQL database to execute the query in, as well as the SQL query to execute. The result of the query is returned. Single quotes inside the query string are escaped by doubling them (an example is shown below).
```sql
postgres_query(attached_database::VARCHAR, query::VARCHAR)
```
For example:
```sql
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE postgres);
SELECT * FROM postgres_query('postgres_db', 'SELECT * FROM cars LIMIT 3');
```
| brand | model | color |
| ------------ | ---------- | ----- |
| Ferrari | Testarossa | red |
| Aston Martin | DB2 | blue |
| Bentley | Mulsanne | gray |
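To illustrate the quoting rule mentioned above, a query containing a string literal doubles the single quotes inside the outer string. This sketch reuses the `cars` table and its `color` column from the example above:
```sql
SELECT *
FROM postgres_query('postgres_db', 'SELECT * FROM cars WHERE color = ''red''');
```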
##### The `postgres_execute` Function {#docs:stable:core_extensions:postgres::the-postgres_execute-function}
The `postgres_execute` function allows running arbitrary queries within PostgreSQL, including statements that update the schema and content of the database.
```sql
ATTACH 'dbname=postgresscanner' AS postgres_db (TYPE postgres);
CALL postgres_execute('postgres_db', 'CREATE TABLE my_table (i INTEGER)');
```
#### Settings {#docs:stable:core_extensions:postgres::settings}
The extension exposes the following configuration parameters.
| Name | Description | Default |
| --------------------------------- | ---------------------------------------------------------------------------- | ------- |
| `pg_array_as_varchar` | Read PostgreSQL arrays as varchar - enables reading mixed dimensional arrays | `false` |
| `pg_connection_cache` | Whether or not to use the connection cache | `true` |
| `pg_connection_limit` | The maximum amount of concurrent PostgreSQL connections | `64` |
| `pg_debug_show_queries` | DEBUG SETTING: print all queries sent to PostgreSQL to stdout | `false` |
| `pg_experimental_filter_pushdown` | Whether or not to use filter pushdown (currently experimental) | `true` |
| `pg_pages_per_task` | The amount of pages per task | `1000` |
| `pg_use_binary_copy` | Whether or not to use BINARY copy to read data | `true` |
| `pg_null_byte_replacement` | When writing NULL bytes to Postgres, replace them with the given character | `NULL` |
| `pg_use_ctid_scan` | Whether or not to parallelize scanning using table ctids | `true` |
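These parameters are regular DuckDB settings and can be changed with `SET`. For example, to print the queries sent to PostgreSQL while debugging (a sketch; remember to turn it off afterwards):
```sql
SET pg_debug_show_queries = true;
```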
#### Schema Cache {#docs:stable:core_extensions:postgres::schema-cache}
To avoid having to continuously fetch schema data from PostgreSQL, DuckDB keeps schema information (such as the names of tables and their columns) cached. If changes are made to the schema through a different connection to the PostgreSQL instance, such as new columns being added to a table, the cached schema information might be outdated. In this case, the function `pg_clear_cache` can be executed to clear the internal caches.
```sql
CALL pg_clear_cache();
```
> **Deprecated.** The old `postgres_attach` function is deprecated. It is recommended to switch over to the new `ATTACH` syntax.
## Spatial {#core_extensions:spatial}
### Spatial Extension {#docs:stable:core_extensions:spatial:overview}
The `spatial` extension provides support for geospatial data processing in DuckDB.
For an overview of the extension, see our [blog post](https://duckdb.org/2023/04/28/spatial).
#### Installing and Loading {#docs:stable:core_extensions:spatial:overview::installing-and-loading}
To install the `spatial` extension, run:
```sql
INSTALL spatial;
```
Note that the `spatial` extension is not autoloadable.
Therefore, you need to load it before using it:
```sql
LOAD spatial;
```
#### The `GEOMETRY` Type {#docs:stable:core_extensions:spatial:overview::the-geometry-type}
The core of the spatial extension is the `GEOMETRY` type. If you're unfamiliar with geospatial data and GIS tooling, this type probably works very differently from what you'd expect.
On the surface, the `GEOMETRY` type is a binary representation of "geometry" data made up of sets of vertices (pairs of X and Y `double` precision floats). But what makes it somewhat special is that it's actually used to store one of several different geometry subtypes. These are `POINT`, `LINESTRING`, `POLYGON`, as well as their "collection" equivalents, `MULTIPOINT`, `MULTILINESTRING` and `MULTIPOLYGON`. Lastly, there is `GEOMETRYCOLLECTION`, which can contain any of the other subtypes, as well as other `GEOMETRYCOLLECTION`s recursively.
This may seem strange at first, since DuckDB already has types like `LIST`, `STRUCT` and `UNION` which could be used in a similar way, but the design and behavior of the `GEOMETRY` type is actually based on the [Simple Features](https://en.wikipedia.org/wiki/Simple_Features) geometry model, which is a standard used by many other databases and GIS software.
The spatial extension also includes a couple of experimental, non-standard explicit geometry types, such as `POINT_2D`, `LINESTRING_2D`, `POLYGON_2D` and `BOX_2D`, that are based on DuckDB's native nested types, such as `STRUCT` and `LIST`. Since these have a fixed and predictable internal memory layout, it is theoretically possible to optimize a lot of geospatial algorithms to be much faster when operating on these types than on the `GEOMETRY` type. However, only a couple of functions in the spatial extension have been explicitly specialized for these types so far. All of these new types are implicitly castable to `GEOMETRY`, but with a small conversion cost, so the `GEOMETRY` type is still the recommended type to use for now if you are planning to work with a lot of different spatial functions.
`GEOMETRY` is not currently capable of storing additional geometry types such as curved geometries or triangle networks. Additionally, the `GEOMETRY` type does not store SRID information on a per value basis. These limitations may be addressed in the future.
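As a small illustration of the implicit cast described above, a `POINT_2D` value created with `ST_Point2D` can be used wherever a `GEOMETRY` is expected (the cast is made explicit here for clarity):
```sql
LOAD spatial;

-- ST_Point2D produces the experimental POINT_2D type,
-- which is castable to GEOMETRY at a small conversion cost
SELECT ST_AsText(ST_Point2D(1, 2)::GEOMETRY);
-- POINT (1 2)
```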
### Spatial Functions {#docs:stable:core_extensions:spatial:functions}
#### Function Index {#docs:stable:core_extensions:spatial:functions::function-index-}
**[Scalar Functions](#::scalar-functions)**
| Function | Summary |
| --- | --- |
| [`DuckDB_PROJ_Compiled_Version`](#::duckdb_proj_compiled_version) | Returns a text description of the PROJ library version that this instance of DuckDB was compiled against. |
| [`DuckDB_Proj_Version`](#::duckdb_proj_version) | Returns a text description of the PROJ library version that is being used by this instance of DuckDB. |
| [`ST_Affine`](#::st_affine) | Applies an affine transformation to a geometry. |
| [`ST_Area`](#::st_area) | Compute the area of a geometry. |
| [`ST_Area_Spheroid`](#::st_area_spheroid) | Returns the area of a geometry in square meters, using an ellipsoidal model of the earth |
| [`ST_AsGeoJSON`](#::st_asgeojson) | Returns the geometry as a GeoJSON fragment |
| [`ST_AsHEXWKB`](#::st_ashexwkb) | Returns the geometry as a HEXWKB string |
| [`ST_AsSVG`](#::st_assvg) | Convert the geometry into a SVG fragment or path |
| [`ST_AsText`](#::st_astext) | Returns the geometry as a WKT string |
| [`ST_AsWKB`](#::st_aswkb) | Returns the geometry as a WKB (Well-Known-Binary) blob |
| [`ST_Azimuth`](#::st_azimuth) | Returns the azimuth (a clockwise angle measured from north) of two points, in radians. |
| [`ST_Boundary`](#::st_boundary) | Returns the "boundary" of a geometry |
| [`ST_Buffer`](#::st_buffer) | Returns a buffer around the input geometry at the target distance |
| [`ST_BuildArea`](#::st_buildarea) | Creates a polygonal geometry by attempting to "fill in" the input geometry. |
| [`ST_Centroid`](#::st_centroid) | Returns the centroid of a geometry |
| [`ST_Collect`](#::st_collect) | Collects a list of geometries into a collection geometry. |
| [`ST_CollectionExtract`](#::st_collectionextract) | Extracts geometries from a GeometryCollection into a typed multi geometry. |
| [`ST_ConcaveHull`](#::st_concavehull) | Returns the 'concave' hull of the input geometry, containing all of the source input's points, and which can be used to create polygons from points. The ratio parameter dictates the level of concavity; 1.0 returns the convex hull; and 0 indicates to return the most concave hull possible. Set allowHoles to a non-zero value to allow output containing holes. |
| [`ST_Contains`](#::st_contains) | Returns true if the first geometry contains the second geometry |
| [`ST_ContainsProperly`](#::st_containsproperly) | Returns true if the first geometry \"properly\" contains the second geometry |
| [`ST_ConvexHull`](#::st_convexhull) | Returns the convex hull enclosing the geometry |
| [`ST_CoverageInvalidEdges`](#::st_coverageinvalidedges) | Returns the invalid edges in a polygonal coverage, which are edges that are not shared by two polygons. |
| [`ST_CoverageSimplify`](#::st_coveragesimplify) | Simplify the edges in a polygonal coverage, preserving the coverage by ensuring that there are no seams between the resulting simplified polygons. |
| [`ST_CoverageUnion`](#::st_coverageunion) | Union all geometries in a polygonal coverage into a single geometry. |
| [`ST_CoveredBy`](#::st_coveredby) | Returns true if geom1 is "covered by" geom2 |
| [`ST_Covers`](#::st_covers) | Returns true if the geom1 "covers" geom2 |
| [`ST_Crosses`](#::st_crosses) | Returns true if geom1 "crosses" geom2 |
| [`ST_DWithin`](#::st_dwithin) | Returns if two geometries are within a target distance of each other |
| [`ST_DWithin_GEOS`](#::st_dwithin_geos) | Returns if two geometries are within a target distance of each other |
| [`ST_DWithin_Spheroid`](#::st_dwithin_spheroid) | Returns if two POINT_2D points are within a target distance in meters, using an ellipsoidal model of the earth's surface |
| [`ST_Difference`](#::st_difference) | Returns the "difference" between two geometries |
| [`ST_Dimension`](#::st_dimension) | Returns the "topological dimension" of a geometry. |
| [`ST_Disjoint`](#::st_disjoint) | Returns true if the geometries are disjoint |
| [`ST_Distance`](#::st_distance) | Returns the planar distance between two geometries |
| [`ST_Distance_GEOS`](#::st_distance_geos) | Returns the planar distance between two geometries |
| [`ST_Distance_Sphere`](#::st_distance_sphere) | Returns the haversine (great circle) distance between two geometries. |
| [`ST_Distance_Spheroid`](#::st_distance_spheroid) | Returns the distance between two geometries in meters using an ellipsoidal model of the earth's surface |
| [`ST_Dump`](#::st_dump) | Dumps a geometry into a list of sub-geometries and their "path" in the original geometry. |
| [`ST_EndPoint`](#::st_endpoint) | Returns the end point of a LINESTRING. |
| [`ST_Envelope`](#::st_envelope) | Returns the minimum bounding rectangle of a geometry as a polygon geometry |
| [`ST_Equals`](#::st_equals) | Returns true if the geometries are "equal" |
| [`ST_Extent`](#::st_extent) | Returns the minimal bounding box enclosing the input geometry |
| [`ST_Extent_Approx`](#::st_extent_approx) | Returns the approximate bounding box of a geometry, if available. |
| [`ST_ExteriorRing`](#::st_exteriorring) | Returns the exterior ring (shell) of a polygon geometry. |
| [`ST_FlipCoordinates`](#::st_flipcoordinates) | Returns a new geometry with the coordinates of the input geometry "flipped" so that x = y and y = x |
| [`ST_Force2D`](#::st_force2d) | Forces the vertices of a geometry to have X and Y components |
| [`ST_Force3DM`](#::st_force3dm) | Forces the vertices of a geometry to have X, Y and M components |
| [`ST_Force3DZ`](#::st_force3dz) | Forces the vertices of a geometry to have X, Y and Z components |
| [`ST_Force4D`](#::st_force4d) | Forces the vertices of a geometry to have X, Y, Z and M components |
| [`ST_GeomFromGeoJSON`](#::st_geomfromgeojson) | Deserializes a GEOMETRY from a GeoJSON fragment. |
| [`ST_GeomFromHEXEWKB`](#::st_geomfromhexewkb) | Deserialize a GEOMETRY from a HEX(E)WKB encoded string |
| [`ST_GeomFromHEXWKB`](#::st_geomfromhexwkb) | Deserialize a GEOMETRY from a HEX(E)WKB encoded string |
| [`ST_GeomFromText`](#::st_geomfromtext) | Deserialize a GEOMETRY from a WKT encoded string |
| [`ST_GeomFromWKB`](#::st_geomfromwkb) | Deserializes a GEOMETRY from a WKB encoded blob |
| [`ST_GeometryType`](#::st_geometrytype) | Returns a 'GEOMETRY_TYPE' enum identifying the input geometry type. Possible enum return types are: `POINT`, `LINESTRING`, `POLYGON`, `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON`, and `GEOMETRYCOLLECTION`. |
| [`ST_HasM`](#::st_hasm) | Check if the input geometry has M values. |
| [`ST_HasZ`](#::st_hasz) | Check if the input geometry has Z values. |
| [`ST_Hilbert`](#::st_hilbert) | Encodes the X and Y values as the hilbert curve index for a curve covering the given bounding box. |
| [`ST_Intersection`](#::st_intersection) | Returns the intersection of two geometries |
| [`ST_Intersects`](#::st_intersects) | Returns true if the geometries intersect |
| [`ST_Intersects_Extent`](#::st_intersects_extent) | Returns true if the extent of two geometries intersects |
| [`ST_IsClosed`](#::st_isclosed) | Check if a geometry is 'closed' |
| [`ST_IsEmpty`](#::st_isempty) | Returns true if the geometry is "empty". |
| [`ST_IsRing`](#::st_isring) | Returns true if the geometry is a ring (both ST_IsClosed and ST_IsSimple). |
| [`ST_IsSimple`](#::st_issimple) | Returns true if the geometry is simple |
| [`ST_IsValid`](#::st_isvalid) | Returns true if the geometry is valid |
| [`ST_Length`](#::st_length) | Returns the length of the input line geometry |
| [`ST_Length_Spheroid`](#::st_length_spheroid) | Returns the length of the input geometry in meters, using an ellipsoidal model of the earth |
| [`ST_LineInterpolatePoint`](#::st_lineinterpolatepoint) | Returns a point interpolated along a line at a fraction of total 2D length. |
| [`ST_LineInterpolatePoints`](#::st_lineinterpolatepoints) | Returns a multi-point interpolated along a line at a fraction of total 2D length. |
| [`ST_LineMerge`](#::st_linemerge) | "Merges" the input line geometry, optionally taking direction into account. |
| [`ST_LineString2DFromWKB`](#::st_linestring2dfromwkb) | Deserialize a LINESTRING_2D from a WKB encoded blob |
| [`ST_LineSubstring`](#::st_linesubstring) | Returns a substring of a line between two fractions of total 2D length. |
| [`ST_M`](#::st_m) | Returns the M coordinate of a point geometry |
| [`ST_MMax`](#::st_mmax) | Returns the maximum M coordinate of a geometry |
| [`ST_MMin`](#::st_mmin) | Returns the minimum M coordinate of a geometry |
| [`ST_MakeEnvelope`](#::st_makeenvelope) | Create a rectangular polygon from min/max coordinates |
| [`ST_MakeLine`](#::st_makeline) | Create a LINESTRING from a list of POINT geometries |
| [`ST_MakePolygon`](#::st_makepolygon) | Create a POLYGON from a LINESTRING shell |
| [`ST_MakeValid`](#::st_makevalid) | Returns a valid representation of the geometry |
| [`ST_MaximumInscribedCircle`](#::st_maximuminscribedcircle) | Returns the maximum inscribed circle of the input geometry, optionally with a tolerance. |
| [`ST_MinimumRotatedRectangle`](#::st_minimumrotatedrectangle) | Returns the minimum rotated rectangle that bounds the input geometry, finding the surrounding box that has the lowest area by using a rotated rectangle, rather than taking the lowest and highest coordinate values as per ST_Envelope(). |
| [`ST_Multi`](#::st_multi) | Turns a single geometry into a multi geometry. |
| [`ST_NGeometries`](#::st_ngeometries) | Returns the number of component geometries in a collection geometry. |
| [`ST_NInteriorRings`](#::st_ninteriorrings) | Returns the number of interior rings of a polygon |
| [`ST_NPoints`](#::st_npoints) | Returns the number of vertices within a geometry |
| [`ST_Node`](#::st_node) | Returns a "noded" MultiLinestring, produced by combining a collection of input linestrings and adding additional vertices where they intersect. |
| [`ST_Normalize`](#::st_normalize) | Returns the "normalized" representation of the geometry |
| [`ST_NumGeometries`](#::st_numgeometries) | Returns the number of component geometries in a collection geometry. |
| [`ST_NumInteriorRings`](#::st_numinteriorrings) | Returns the number of interior rings of a polygon |
| [`ST_NumPoints`](#::st_numpoints) | Returns the number of vertices within a geometry |
| [`ST_Overlaps`](#::st_overlaps) | Returns true if the geometries overlap |
| [`ST_Perimeter`](#::st_perimeter) | Returns the length of the perimeter of the geometry |
| [`ST_Perimeter_Spheroid`](#::st_perimeter_spheroid) | Returns the length of the perimeter in meters using an ellipsoidal model of the earth's surface |
| [`ST_Point`](#::st_point) | Creates a GEOMETRY point |
| [`ST_Point2D`](#::st_point2d) | Creates a POINT_2D |
| [`ST_Point2DFromWKB`](#::st_point2dfromwkb) | Deserialize a POINT_2D from a WKB encoded blob |
| [`ST_Point3D`](#::st_point3d) | Creates a POINT_3D |
| [`ST_Point4D`](#::st_point4d) | Creates a POINT_4D |
| [`ST_PointN`](#::st_pointn) | Returns the n'th vertex from the input geometry as a point geometry |
| [`ST_PointOnSurface`](#::st_pointonsurface) | Returns a point guaranteed to lie on the surface of the geometry |
| [`ST_Points`](#::st_points) | Collects all the vertices in the geometry into a MULTIPOINT |
| [`ST_Polygon2DFromWKB`](#::st_polygon2dfromwkb) | Deserialize a POLYGON_2D from a WKB encoded blob |
| [`ST_Polygonize`](#::st_polygonize) | Returns a polygonized representation of the input geometries |
| [`ST_QuadKey`](#::st_quadkey) | Compute the [quadkey](https://learn.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system) for a given lon/lat point at a given level. |
| [`ST_ReducePrecision`](#::st_reduceprecision) | Returns the geometry with all vertices reduced to the given precision |
| [`ST_RemoveRepeatedPoints`](#::st_removerepeatedpoints) | Remove repeated points from a LINESTRING. |
| [`ST_Reverse`](#::st_reverse) | Returns the geometry with the order of its vertices reversed |
| [`ST_ShortestLine`](#::st_shortestline) | Returns the shortest line between two geometries |
| [`ST_Simplify`](#::st_simplify) | Returns a simplified version of the geometry |
| [`ST_SimplifyPreserveTopology`](#::st_simplifypreservetopology) | Returns a simplified version of the geometry that preserves topology |
| [`ST_StartPoint`](#::st_startpoint) | Returns the start point of a LINESTRING. |
| [`ST_TileEnvelope`](#::st_tileenvelope) | The `ST_TileEnvelope` scalar function generates tile envelope rectangular polygons from specified zoom level and tile indices. |
| [`ST_Touches`](#::st_touches) | Returns true if the geometries touch |
| [`ST_Transform`](#::st_transform) | Transforms a geometry between two coordinate systems |
| [`ST_Union`](#::st_union) | Returns the union of two geometries |
| [`ST_VoronoiDiagram`](#::st_voronoidiagram) | Returns the Voronoi diagram of the supplied MultiPoint geometry |
| [`ST_Within`](#::st_within) | Returns true if the first geometry is within the second |
| [`ST_WithinProperly`](#::st_withinproperly) | Returns true if the first geometry is \"properly\" contained by the second geometry |
| [`ST_X`](#::st_x) | Returns the X coordinate of a point geometry |
| [`ST_XMax`](#::st_xmax) | Returns the maximum X coordinate of a geometry |
| [`ST_XMin`](#::st_xmin) | Returns the minimum X coordinate of a geometry |
| [`ST_Y`](#::st_y) | Returns the Y coordinate of a point geometry |
| [`ST_YMax`](#::st_ymax) | Returns the maximum Y coordinate of a geometry |
| [`ST_YMin`](#::st_ymin) | Returns the minimum Y coordinate of a geometry |
| [`ST_Z`](#::st_z) | Returns the Z coordinate of a point geometry |
| [`ST_ZMFlag`](#::st_zmflag) | Returns a flag indicating the presence of Z and M values in the input geometry. |
| [`ST_ZMax`](#::st_zmax) | Returns the maximum Z coordinate of a geometry |
| [`ST_ZMin`](#::st_zmin) | Returns the minimum Z coordinate of a geometry |
**[Aggregate Functions](#::aggregate-functions)**
| Function | Summary |
| --- | --- |
| [`ST_CoverageInvalidEdges_Agg`](#::st_coverageinvalidedges_agg) | Returns the invalid edges of a coverage geometry |
| [`ST_CoverageSimplify_Agg`](#::st_coveragesimplify_agg) | Simplifies a set of geometries while maintaining coverage |
| [`ST_CoverageUnion_Agg`](#::st_coverageunion_agg) | Unions a set of geometries while maintaining coverage |
| [`ST_Envelope_Agg`](#::st_envelope_agg) | Alias for [ST_Extent_Agg](#::st_extent_agg). |
| [`ST_Extent_Agg`](#::st_extent_agg) | Computes the minimal-bounding-box polygon containing the set of input geometries |
| [`ST_Intersection_Agg`](#::st_intersection_agg) | Computes the intersection of a set of geometries |
| [`ST_MemUnion_Agg`](#::st_memunion_agg) | Computes the union of a set of input geometries. |
| [`ST_Union_Agg`](#::st_union_agg) | Computes the union of a set of input geometries |
**[Macro Functions](#::macro-functions)**
| Function | Summary |
| --- | --- |
| [`ST_Rotate`](#::st_rotate) | Alias of ST_RotateZ |
| [`ST_RotateX`](#::st_rotatex) | Rotates a geometry around the X axis. This is a shorthand macro for calling ST_Affine. |
| [`ST_RotateY`](#::st_rotatey) | Rotates a geometry around the Y axis. This is a shorthand macro for calling ST_Affine. |
| [`ST_RotateZ`](#::st_rotatez) | Rotates a geometry around the Z axis. This is a shorthand macro for calling ST_Affine. |
| [`ST_Scale`](#::st_scale) | |
| [`ST_TransScale`](#::st_transscale) | Translates and then scales a geometry in X and Y direction. This is a shorthand macro for calling ST_Affine. |
| [`ST_Translate`](#::st_translate) | |
**[Table Functions](#::table-functions)**
| Function | Summary |
| --- | --- |
| [`ST_Drivers`](#::st_drivers) | Returns the list of supported GDAL drivers and file formats |
| [`ST_GeneratePoints`](#::st_generatepoints) | Generates a set of random points within the specified bounding box. |
| [`ST_Read`](#::st_read) | Read and import a variety of geospatial file formats using the GDAL library. |
| [`ST_ReadOSM`](#::st_readosm) | The `ST_ReadOsm()` table function enables reading compressed OpenStreetMap data directly from a `.osm.pbf` file. |
| [`ST_ReadSHP`](#::st_readshp) | Read a Shapefile without relying on the GDAL library |
| [`ST_Read_Meta`](#::st_read_meta) | Read the metadata from a variety of geospatial file formats using the GDAL library. |
----
#### Scalar Functions {#docs:stable:core_extensions:spatial:functions::scalar-functions}
##### DuckDB_PROJ_Compiled_Version {#docs:stable:core_extensions:spatial:functions::duckdb_proj_compiled_version}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
VARCHAR DuckDB_PROJ_Compiled_Version ()
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a text description of the PROJ library version that this instance of DuckDB was compiled against.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT duckdb_proj_compiled_version();
┌────────────────────────────────┐
│ duckdb_proj_compiled_version() │
│            varchar             │
├────────────────────────────────┤
│ Rel. 9.1.1, December 1st, 2022 │
└────────────────────────────────┘
```
----
##### DuckDB_Proj_Version {#docs:stable:core_extensions:spatial:functions::duckdb_proj_version}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
VARCHAR DuckDB_Proj_Version ()
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a text description of the PROJ library version that is being used by this instance of DuckDB.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT duckdb_proj_version();
┌───────────────────────┐
│ duckdb_proj_version() │
│        varchar        │
├───────────────────────┤
│ 9.1.1                 │
└───────────────────────┘
```
----
##### ST_Affine {#docs:stable:core_extensions:spatial:functions::st_affine}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_Affine (geom GEOMETRY, a DOUBLE, b DOUBLE, c DOUBLE, d DOUBLE, e DOUBLE, f DOUBLE, g DOUBLE, h DOUBLE, i DOUBLE, xoff DOUBLE, yoff DOUBLE, zoff DOUBLE)
GEOMETRY ST_Affine (geom GEOMETRY, a DOUBLE, b DOUBLE, d DOUBLE, e DOUBLE, xoff DOUBLE, yoff DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Applies an affine transformation to a geometry.
For the 2D variant, the transformation matrix is defined as follows:
```text
| a b xoff |
| d e yoff |
| 0 0 1 |
```
For the 3D variant, the transformation matrix is defined as follows:
```text
| a b c xoff |
| d e f yoff |
| g h i zoff |
| 0 0 0 1 |
```
The transformation is applied to all vertices of the geometry.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Translate a point by (2, 3)
SELECT ST_Affine(ST_Point(1, 1),
1, 0, -- a, b
0, 1, -- d, e
2, 3); -- xoff, yoff
----
POINT (3 4)
-- Scale a geometry by factor 2 in X and Y
SELECT ST_Affine(ST_Point(1, 1),
2, 0, 0, -- a, b, c
0, 2, 0, -- d, e, f
0, 0, 1, -- g, h, i
0, 0, 0); -- xoff, yoff, zoff
----
POINT (2 2)
```
----
##### ST_Area {#docs:stable:core_extensions:spatial:functions::st_area}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_Area (geom GEOMETRY)
DOUBLE ST_Area (polygon POLYGON_2D)
DOUBLE ST_Area (linestring LINESTRING_2D)
DOUBLE ST_Area (point POINT_2D)
DOUBLE ST_Area (box BOX_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Compute the area of a geometry.
Returns `0.0` for any geometry that is not a `POLYGON`, `MULTIPOLYGON` or `GEOMETRYCOLLECTION` containing polygon
geometries.
The area is in the same units as the spatial reference system of the geometry.
The `POINT_2D` and `LINESTRING_2D` overloads of this function always return `0.0` but are included for completeness.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
select ST_Area('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry);
-- 1.0
```
----
##### ST_Area_Spheroid {#docs:stable:core_extensions:spatial:functions::st_area_spheroid}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_Area_Spheroid (geom GEOMETRY)
DOUBLE ST_Area_Spheroid (poly POLYGON_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the area of a geometry in square meters, using an ellipsoidal model of the earth.
The input geometry is assumed to be in the [EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (WGS84), with [latitude, longitude] axis order and the area is returned in square meters. This function uses the [GeographicLib](https://geographiclib.sourceforge.io/) library, calculating the area using an ellipsoidal model of the earth. This is a highly accurate method for calculating the area of a polygon taking the curvature of the earth into account, but is also the slowest.
Returns `0.0` for any geometry that is not a `POLYGON`, `MULTIPOLYGON` or `GEOMETRYCOLLECTION` containing polygon geometries.
----
##### ST_AsGeoJSON {#docs:stable:core_extensions:spatial:functions::st_asgeojson}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
JSON ST_AsGeoJSON (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the geometry as a GeoJSON fragment
This does not return a complete GeoJSON document, only the geometry fragment.
To construct a complete GeoJSON document or feature, look into using the DuckDB JSON extension in conjunction with this function.
This function supports geometries with Z values, but not M values. M values are ignored.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
select ST_AsGeoJSON('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry);
----
{"type":"Polygon","coordinates":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]]}
-- Convert a geometry into a full GeoJSON feature (requires the JSON extension to be loaded)
SELECT CAST({
type: 'Feature',
geometry: ST_AsGeoJSON(ST_Point(1,2)),
properties: {
name: 'my_point'
}
} AS JSON);
----
{"type":"Feature","geometry":{"type":"Point","coordinates":[1.0,2.0]},"properties":{"name":"my_point"}}
```
----
##### ST_AsHEXWKB {#docs:stable:core_extensions:spatial:functions::st_ashexwkb}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
VARCHAR ST_AsHEXWKB (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the geometry as a HEXWKB string
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_AsHexWKB('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry);
----
01030000000100000005000000000000000000000000000...
```
----
##### ST_AsSVG {#docs:stable:core_extensions:spatial:functions::st_assvg}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
VARCHAR ST_AsSVG (geom GEOMETRY, relative BOOLEAN, precision INTEGER)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Convert the geometry into a SVG fragment or path
The SVG fragment is returned as a string. The fragment is a path element that can be used in an SVG document.
The second boolean argument specifies whether the path should be relative or absolute.
The third argument specifies the maximum number of digits to use for the coordinates.
Points are formatted as cx/cy using absolute coordinates or x/y using relative coordinates.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_AsSVG('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY, false, 15);
----
M 0 0 L 0 -1 1 -1 1 0 Z
```
----
##### ST_AsText {#docs:stable:core_extensions:spatial:functions::st_astext}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
VARCHAR ST_AsText (geom GEOMETRY)
VARCHAR ST_AsText (point POINT_2D)
VARCHAR ST_AsText (linestring LINESTRING_2D)
VARCHAR ST_AsText (polygon POLYGON_2D)
VARCHAR ST_AsText (box BOX_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the geometry as a WKT string
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_AsText(ST_MakeEnvelope(0, 0, 1, 1));
----
POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))
```
----
##### ST_AsWKB {#docs:stable:core_extensions:spatial:functions::st_aswkb}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
WKB_BLOB ST_AsWKB (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the geometry as a WKB (Well-Known-Binary) blob
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_AsWKB('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY)::BLOB;
----
\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05...
```
----
##### ST_Azimuth {#docs:stable:core_extensions:spatial:functions::st_azimuth}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_Azimuth (origin GEOMETRY, target GEOMETRY)
DOUBLE ST_Azimuth (origin POINT_2D, target POINT_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the azimuth (a clockwise angle measured from north) of two points, in radians.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT degrees(ST_Azimuth(ST_Point(0, 0), ST_Point(0, 1)));
----
90.0
```
----
##### ST_Boundary {#docs:stable:core_extensions:spatial:functions::st_boundary}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Boundary (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the "boundary" of a geometry
----
##### ST_Buffer {#docs:stable:core_extensions:spatial:functions::st_buffer}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_Buffer (geom GEOMETRY, distance DOUBLE)
GEOMETRY ST_Buffer (geom GEOMETRY, distance DOUBLE, num_triangles INTEGER)
GEOMETRY ST_Buffer (geom GEOMETRY, distance DOUBLE, num_triangles INTEGER, cap_style VARCHAR, join_style VARCHAR, mitre_limit DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a buffer around the input geometry at the target distance
`geom` is the input geometry.
`distance` is the target distance for the buffer, using the same units as the input geometry.
`num_triangles` represents how many triangles will be produced to approximate a quarter circle. The larger the number, the smoother the resulting geometry. The default value is 8.
`cap_style` must be one of "CAP_ROUND", "CAP_FLAT", "CAP_SQUARE". This parameter is case-insensitive.
`join_style` must be one of "JOIN_ROUND", "JOIN_MITRE", "JOIN_BEVEL". This parameter is case-insensitive.
`mitre_limit` only applies when `join_style` is "JOIN_MITRE". It is the ratio of the distance from the corner to the mitre point to the corner radius. The default value is 1.0.
This is a planar operation and will not take into account the curvature of the earth.
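The following sketch shows both the simple form and the fully parameterized form described above:
```sql
-- Default styling: a roughly circular polygon around the point
SELECT ST_Buffer(ST_Point(0, 0), 1.0);

-- Flat caps and mitred joins around a line, using 8 quarter-circle segments
SELECT ST_Buffer(
    ST_GeomFromText('LINESTRING(0 0, 10 0)'),
    2.0, 8, 'CAP_FLAT', 'JOIN_MITRE', 1.0
);
```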
----
##### ST_BuildArea {#docs:stable:core_extensions:spatial:functions::st_buildarea}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_BuildArea (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Creates a polygonal geometry by attempting to "fill in" the input geometry.
Unlike ST_Polygonize, this function does not fill in holes.
----
##### ST_Centroid {#docs:stable:core_extensions:spatial:functions::st_centroid}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_Centroid (geom GEOMETRY)
POINT_2D ST_Centroid (point POINT_2D)
POINT_2D ST_Centroid (linestring LINESTRING_2D)
POINT_2D ST_Centroid (polygon POLYGON_2D)
POINT_2D ST_Centroid (box BOX_2D)
POINT_2D ST_Centroid (box BOX_2DF)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the centroid of a geometry
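For example, the centroid of the unit square is its center point:
```sql
SELECT ST_Centroid('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY);
-- POINT (0.5 0.5)
```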
----
##### ST_Collect {#docs:stable:core_extensions:spatial:functions::st_collect}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Collect (geoms GEOMETRY[])
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Collects a list of geometries into a collection geometry.
- If all geometries are `POINT`s, a `MULTIPOINT` is returned.
- If all geometries are `LINESTRING`s, a `MULTILINESTRING` is returned.
- If all geometries are `POLYGON`s, a `MULTIPOLYGON` is returned.
- Otherwise, if the input collection contains a mix of geometry types, a `GEOMETRYCOLLECTION` is returned.
Empty and `NULL` geometries are ignored. If all geometries are empty or `NULL`, a `GEOMETRYCOLLECTION EMPTY` is returned.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- With all POINT's, a MULTIPOINT is returned
SELECT ST_Collect([ST_Point(1, 2), ST_Point(3, 4)]);
----
MULTIPOINT (1 2, 3 4)
-- With mixed geometry types, a GEOMETRYCOLLECTION is returned
SELECT ST_Collect([ST_Point(1, 2), ST_GeomFromText('LINESTRING(3 4, 5 6)')]);
----
GEOMETRYCOLLECTION (POINT (1 2), LINESTRING (3 4, 5 6))
-- Note that the empty geometry is ignored, so the result is a MULTIPOINT
SELECT ST_Collect([ST_Point(1, 2), NULL, ST_GeomFromText('GEOMETRYCOLLECTION EMPTY')]);
----
MULTIPOINT (1 2)
-- If all geometries are empty or NULL, a GEOMETRYCOLLECTION EMPTY is returned
SELECT ST_Collect([NULL, ST_GeomFromText('GEOMETRYCOLLECTION EMPTY')]);
----
GEOMETRYCOLLECTION EMPTY
-- Tip: You can use the `ST_Collect` function together with the `list()` aggregate function to collect multiple rows of geometries into a single geometry collection:
CREATE TABLE points (geom GEOMETRY);
INSERT INTO points VALUES (ST_Point(1, 2)), (ST_Point(3, 4));
SELECT ST_Collect(list(geom)) FROM points;
----
MULTIPOINT (1 2, 3 4)
```
----
##### ST_CollectionExtract {#docs:stable:core_extensions:spatial:functions::st_collectionextract}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_CollectionExtract (geom GEOMETRY, type INTEGER)
GEOMETRY ST_CollectionExtract (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Extracts geometries from a GeometryCollection into a typed multi geometry.
If the input geometry is a GeometryCollection, the function will return a multi geometry, determined by the `type` parameter.
- If `type` = 1, returns a MultiPoint containing all the Points in the collection
- If `type` = 2, returns a MultiLineString containing all the LineStrings in the collection
- If `type` = 3, returns a MultiPolygon containing all the Polygons in the collection
If no `type` parameter is provided, the function will return a multi geometry matching the highest "surface dimension"
of the contained geometries. E.g. if the collection contains only Points, a MultiPoint will be returned. But if the
collection contains both Points and LineStrings, a MultiLineString will be returned. Similarly, if the collection
contains Polygons, a MultiPolygon will be returned. Contained geometries of a lower surface dimension will be ignored.
If the input geometry contains nested GeometryCollections, their geometries will be extracted recursively and included
into the final multi geometry as well.
If the input geometry is not a GeometryCollection, the function will return the input geometry as is.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
select st_collectionextract('MULTIPOINT(1 2,3 4)'::geometry, 1);
-- MULTIPOINT (1 2, 3 4)
```
----
##### ST_ConcaveHull {#docs:stable:core_extensions:spatial:functions::st_concavehull}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_ConcaveHull (geom GEOMETRY, ratio DOUBLE, allowHoles BOOLEAN)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the 'concave' hull of the input geometry, containing all of the source input's points, and which can be used to create polygons from points. The ratio parameter dictates the level of concavity; 1.0 returns the convex hull; and 0 indicates to return the most concave hull possible. Set allowHoles to a non-zero value to allow output containing holes.
----
##### ST_Contains {#docs:stable:core_extensions:spatial:functions::st_contains}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
BOOLEAN ST_Contains (geom1 POLYGON_2D, geom2 POINT_2D)
BOOLEAN ST_Contains (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the first geometry contains the second geometry
In contrast to `ST_ContainsProperly`, this function will also return true if `geom2` is contained strictly on the boundary of `geom1`.
A geometry always `ST_Contains` itself, but does not `ST_ContainsProperly` itself.
----
##### ST_ContainsProperly {#docs:stable:core_extensions:spatial:functions::st_containsproperly}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_ContainsProperly (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the first geometry \"properly\" contains the second geometry
In contrast to `ST_Contains`, this function does not return true if `geom2` is contained strictly on the boundary of `geom1`.
A geometry always `ST_Contains` itself, but does not `ST_ContainsProperly` itself.
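A small sketch of the self-containment rule stated above:
```sql
-- A geometry contains itself, but does not "properly" contain itself
SELECT
    ST_Contains(g, g)         AS contains_self,          -- true
    ST_ContainsProperly(g, g) AS contains_self_properly  -- false
FROM (SELECT 'POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY AS g);
```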
----
##### ST_ConvexHull {#docs:stable:core_extensions:spatial:functions::st_convexhull}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_ConvexHull (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the convex hull enclosing the geometry
----
##### ST_CoverageInvalidEdges {#docs:stable:core_extensions:spatial:functions::st_coverageinvalidedges}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_CoverageInvalidEdges (geoms GEOMETRY[], tolerance DOUBLE)
GEOMETRY ST_CoverageInvalidEdges (geoms GEOMETRY[])
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the invalid edges in a polygonal coverage, which are edges that are not shared by two polygons.
Returns NULL if the input is not a polygonal coverage, or if the input is valid.
Tolerance is 0 by default.
----
##### ST_CoverageSimplify {#docs:stable:core_extensions:spatial:functions::st_coveragesimplify}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_CoverageSimplify (geoms GEOMETRY[], tolerance DOUBLE, simplify_boundary BOOLEAN)
GEOMETRY ST_CoverageSimplify (geoms GEOMETRY[], tolerance DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Simplify the edges in a polygonal coverage, preserving the coverage by ensuring that there are no seams between the resulting simplified polygons.
By default, the boundary of the coverage is also simplified, but this can be controlled with the optional third 'simplify_boundary' parameter.
----
##### ST_CoverageUnion {#docs:stable:core_extensions:spatial:functions::st_coverageunion}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_CoverageUnion (geoms GEOMETRY[])
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Union all geometries in a polygonal coverage into a single geometry.
This may be faster than using `ST_Union`, but may use more memory.
----
##### ST_CoveredBy {#docs:stable:core_extensions:spatial:functions::st_coveredby}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_CoveredBy (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if geom1 is "covered by" geom2
----
##### ST_Covers {#docs:stable:core_extensions:spatial:functions::st_covers}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_Covers (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the geom1 "covers" geom2
----
##### ST_Crosses {#docs:stable:core_extensions:spatial:functions::st_crosses}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_Crosses (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if geom1 "crosses" geom2
----
##### ST_DWithin {#docs:stable:core_extensions:spatial:functions::st_dwithin}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_DWithin (geom1 GEOMETRY, geom2 GEOMETRY, distance DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns if two geometries are within a target distance of each other
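For example, two points that are exactly 5 units apart (a sketch):
```sql
SELECT ST_DWithin('POINT(0 0)'::GEOMETRY, 'POINT(3 4)'::GEOMETRY, 6.0); -- true
SELECT ST_DWithin('POINT(0 0)'::GEOMETRY, 'POINT(3 4)'::GEOMETRY, 2.0); -- false
```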
----
##### ST_DWithin_GEOS {#docs:stable:core_extensions:spatial:functions::st_dwithin_geos}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_DWithin_GEOS (geom1 GEOMETRY, geom2 GEOMETRY, distance DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns if two geometries are within a target distance of each other
----
##### ST_DWithin_Spheroid {#docs:stable:core_extensions:spatial:functions::st_dwithin_spheroid}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_DWithin_Spheroid (p1 POINT_2D, p2 POINT_2D, distance DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns if two POINT_2D points are within a target distance in meters, using an ellipsoidal model of the earth's surface
The input geometry is assumed to be in the [EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (WGS84), with [latitude, longitude] axis order, and the distance limit is expected to be in meters. This function uses the [GeographicLib](https://geographiclib.sourceforge.io/) library to solve the [inverse geodesic problem](https://en.wikipedia.org/wiki/Geodesics_on_an_ellipsoid#Solution_of_the_direct_and_inverse_problems), calculating the distance between two points using an ellipsoidal model of the earth. This is a highly accurate method for calculating the distance between two arbitrary points taking the curvature of the earth's surface into account, but is also the slowest.
----
##### ST_Difference {#docs:stable:core_extensions:spatial:functions::st_difference}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Difference (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the "difference" between two geometries
----
##### ST_Dimension {#docs:stable:core_extensions:spatial:functions::st_dimension}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
INTEGER ST_Dimension (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the "topological dimension" of a geometry.
- For POINT and MULTIPOINT geometries, returns `0`
- For LINESTRING and MULTILINESTRING, returns `1`
- For POLYGON and MULTIPOLYGON, returns `2`
- For GEOMETRYCOLLECTION, returns the maximum dimension of the contained geometries, or 0 if the collection is empty
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
select st_dimension('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry);
----
2
```
----
##### ST_Disjoint {#docs:stable:core_extensions:spatial:functions::st_disjoint}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_Disjoint (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the geometries are disjoint
----
##### ST_Distance {#docs:stable:core_extensions:spatial:functions::st_distance}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_Distance (point1 POINT_2D, point2 POINT_2D)
DOUBLE ST_Distance (point POINT_2D, linestring LINESTRING_2D)
DOUBLE ST_Distance (linestring LINESTRING_2D, point POINT_2D)
DOUBLE ST_Distance (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the planar distance between two geometries
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_Distance('POINT (0 0)'::GEOMETRY, 'POINT (3 4)'::GEOMETRY);
----
5.0
-- Z coordinates are ignored
SELECT ST_Distance('POINT Z (0 0 0)'::GEOMETRY, 'POINT Z (3 4 5)'::GEOMETRY);
----
5.0
```
----
##### ST_Distance_GEOS {#docs:stable:core_extensions:spatial:functions::st_distance_geos}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
DOUBLE ST_Distance_GEOS (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the planar distance between two geometries
----
##### ST_Distance_Sphere {#docs:stable:core_extensions:spatial:functions::st_distance_sphere}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_Distance_Sphere (geom1 GEOMETRY, geom2 GEOMETRY)
DOUBLE ST_Distance_Sphere (point1 POINT_2D, point2 POINT_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the haversine (great circle) distance between two geometries.
- Only supports POINT geometries.
- Returns the distance in meters.
- The input is expected to be in WGS84 (EPSG:4326) coordinates, using a [latitude, longitude] axis order.
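A sketch mirroring the `ST_Distance_Spheroid` example further below, using the great-circle approximation instead:
```sql
-- Distance between JFK and AMS in meters, coordinates in [latitude, longitude]
SELECT ST_Distance_Sphere(
    ST_Point(40.6446, -73.7797),
    ST_Point(52.3130, 4.7725)
);
-- A value close to the ellipsoidal result shown for ST_Distance_Spheroid (~5863 km)
```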
----
##### ST_Distance_Spheroid {#docs:stable:core_extensions:spatial:functions::st_distance_spheroid}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
DOUBLE ST_Distance_Spheroid (p1 POINT_2D, p2 POINT_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the distance between two geometries in meters, using an ellipsoidal model of the earth's surface
The input geometry is assumed to be in the [EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (WGS84), with [latitude, longitude] axis order, and the distance is returned in meters. This function uses the [GeographicLib](https://geographiclib.sourceforge.io/) library to solve the [inverse geodesic problem](https://en.wikipedia.org/wiki/Geodesics_on_an_ellipsoid#Solution_of_the_direct_and_inverse_problems), calculating the distance between two points using an ellipsoidal model of the earth. This is a highly accurate method for calculating the distance between two arbitrary points taking the curvature of the earth's surface into account, but is also the slowest.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Note: the coordinates are in WGS84 and [latitude, longitude] axis order
-- What's the distance between New York and Amsterdam (JFK and AMS airports)?
SELECT st_distance_spheroid(
st_point(40.6446, -73.7797),
st_point(52.3130, 4.7725)
);
----
5863418.7459356235
-- Roughly 5863km!
```
----
##### ST_Dump {#docs:stable:core_extensions:spatial:functions::st_dump}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
STRUCT(geom GEOMETRY, path INTEGER[])[] ST_Dump (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Dumps a geometry into a list of sub-geometries and their "path" in the original geometry.
You can use the `UNNEST(res, recursive := true)` function to explode the resulting list of structs into multiple rows.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
select st_dump('MULTIPOINT(1 2,3 4)'::geometry);
----
[{'geom': 'POINT(1 2)', 'path': [0]}, {'geom': 'POINT(3 4)', 'path': [1]}]
select unnest(st_dump('MULTIPOINT(1 2,3 4)'::geometry), recursive := true);
-- ┌──────────────┬─────────┐
-- │     geom     │  path   │
-- │   geometry   │ int32[] │
-- ├──────────────┼─────────┤
-- │ POINT (1 2)  │ [1]     │
-- │ POINT (3 4)  │ [2]     │
-- └──────────────┴─────────┘
```
----
##### ST_EndPoint {#docs:stable:core_extensions:spatial:functions::st_endpoint}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_EndPoint (geom GEOMETRY)
POINT_2D ST_EndPoint (line LINESTRING_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the end point of a LINESTRING.
----
##### ST_Envelope {#docs:stable:core_extensions:spatial:functions::st_envelope}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Envelope (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the minimum bounding rectangle of a geometry as a polygon geometry
----
##### ST_Equals {#docs:stable:core_extensions:spatial:functions::st_equals}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_Equals (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the geometries are "equal"
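###### Example {#docs:stable:core_extensions:spatial:functions::example}
As an illustration, two linestrings that trace the same path in opposite directions are spatially "equal":
```sql
SELECT ST_Equals(
    ST_GeomFromText('LINESTRING(0 0, 1 1)'),
    ST_GeomFromText('LINESTRING(1 1, 0 0)')
);
----
true
```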
----
##### ST_Extent {#docs:stable:core_extensions:spatial:functions::st_extent}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
BOX_2D ST_Extent (geom GEOMETRY)
BOX_2D ST_Extent (wkb WKB_BLOB)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the minimal bounding box enclosing the input geometry
----
##### ST_Extent_Approx {#docs:stable:core_extensions:spatial:functions::st_extent_approx}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOX_2DF ST_Extent_Approx (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the approximate bounding box of a geometry, if available.
This function is only really used internally, and returns the cached bounding box of the geometry if it exists.
This function may be removed or renamed in the future.
----
##### ST_ExteriorRing {#docs:stable:core_extensions:spatial:functions::st_exteriorring}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_ExteriorRing (geom GEOMETRY)
LINESTRING_2D ST_ExteriorRing (polygon POLYGON_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the exterior ring (shell) of a polygon geometry.
----
##### ST_FlipCoordinates {#docs:stable:core_extensions:spatial:functions::st_flipcoordinates}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_FlipCoordinates (geom GEOMETRY)
POINT_2D ST_FlipCoordinates (point POINT_2D)
LINESTRING_2D ST_FlipCoordinates (linestring LINESTRING_2D)
POLYGON_2D ST_FlipCoordinates (polygon POLYGON_2D)
BOX_2D ST_FlipCoordinates (box BOX_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a new geometry with the coordinates of the input geometry "flipped", i.e., with the x and y values swapped.
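###### Example {#docs:stable:core_extensions:spatial:functions::example}
A minimal illustration of the coordinate swap described above:
```sql
SELECT ST_FlipCoordinates(ST_Point(1, 2));
----
POINT (2 1)
```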
----
##### ST_Force2D {#docs:stable:core_extensions:spatial:functions::st_force2d}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Force2D (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Forces the vertices of a geometry to have X and Y components.
This function will drop any Z and M values from the input geometry, if present. If the input geometry is already 2D, it will be returned as is.
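###### Example {#docs:stable:core_extensions:spatial:functions::example}
For illustration, dropping the Z value from a 3D point (result rendered as WKT):
```sql
SELECT ST_Force2D(ST_GeomFromText('POINT Z(1 2 3)'));
----
POINT (1 2)
```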
----
##### ST_Force3DM {#docs:stable:core_extensions:spatial:functions::st_force3dm}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Force3DM (geom GEOMETRY, m DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Forces the vertices of a geometry to have X, Y and M components.
The following cases apply:
- If the input geometry has a Z component but no M component, the Z component will be replaced with the new M value.
- If the input geometry has a M component but no Z component, it will be returned as is.
- If the input geometry has both a Z component and a M component, the Z component will be removed.
- Otherwise, if the input geometry has neither a Z nor an M component, the new M value will be added to the vertices of the input geometry.
----
##### ST_Force3DZ {#docs:stable:core_extensions:spatial:functions::st_force3dz}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Force3DZ (geom GEOMETRY, z DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Forces the vertices of a geometry to have X, Y and Z components.
The following cases apply:
- If the input geometry has a M component but no Z component, the M component will be replaced with the new Z value.
- If the input geometry has a Z component but no M component, it will be returned as is.
- If the input geometry has both a Z component and a M component, the M component will be removed.
- Otherwise, if the input geometry has neither a Z nor an M component, the new Z value will be added to the vertices of the input geometry.
----
##### ST_Force4D {#docs:stable:core_extensions:spatial:functions::st_force4d}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Force4D (geom GEOMETRY, z DOUBLE, m DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Forces the vertices of a geometry to have X, Y, Z and M components.
The following cases apply:
- If the input geometry has a Z component but no M component, the new M value will be added to the vertices of the input geometry.
- If the input geometry has a M component but no Z component, the new Z value will be added to the vertices of the input geometry.
- If the input geometry has both a Z component and a M component, the geometry will be returned as is.
- Otherwise, if the input geometry has neither a Z nor an M component, the new Z and M values will be added to the vertices of the input geometry.
----
##### ST_GeomFromGeoJSON {#docs:stable:core_extensions:spatial:functions::st_geomfromgeojson}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_GeomFromGeoJSON (geojson JSON)
GEOMETRY ST_GeomFromGeoJSON (geojson VARCHAR)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Deserializes a GEOMETRY from a GeoJSON fragment.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_GeomFromGeoJSON('{"type":"Point","coordinates":[1.0,2.0]}');
----
POINT (1 2)
```
----
##### ST_GeomFromHEXEWKB {#docs:stable:core_extensions:spatial:functions::st_geomfromhexewkb}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_GeomFromHEXEWKB (hexwkb VARCHAR)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Deserialize a GEOMETRY from a HEX(E)WKB encoded string
DuckDB spatial doesn't currently differentiate between `WKB` and `EWKB`, so `ST_GeomFromHEXWKB` and `ST_GeomFromHEXEWKB` are just aliases of each other.
----
##### ST_GeomFromHEXWKB {#docs:stable:core_extensions:spatial:functions::st_geomfromhexwkb}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_GeomFromHEXWKB (hexwkb VARCHAR)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Deserialize a GEOMETRY from a HEX(E)WKB encoded string
DuckDB spatial doesn't currently differentiate between `WKB` and `EWKB`, so `ST_GeomFromHEXWKB` and `ST_GeomFromHEXEWKB` are just aliases of each other.
----
##### ST_GeomFromText {#docs:stable:core_extensions:spatial:functions::st_geomfromtext}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_GeomFromText (wkt VARCHAR)
GEOMETRY ST_GeomFromText (wkt VARCHAR, ignore_invalid BOOLEAN)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Deserialize a GEOMETRY from a WKT encoded string
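###### Example {#docs:stable:core_extensions:spatial:functions::example}
A minimal illustrative round-trip from WKT text to a GEOMETRY value:
```sql
SELECT ST_GeomFromText('POINT(1 2)');
----
POINT (1 2)
```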
----
##### ST_GeomFromWKB {#docs:stable:core_extensions:spatial:functions::st_geomfromwkb}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_GeomFromWKB (wkb WKB_BLOB)
GEOMETRY ST_GeomFromWKB (blob BLOB)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Deserializes a GEOMETRY from a WKB encoded blob
----
##### ST_GeometryType {#docs:stable:core_extensions:spatial:functions::st_geometrytype}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
ANY ST_GeometryType (geom GEOMETRY)
ANY ST_GeometryType (point POINT_2D)
ANY ST_GeometryType (linestring LINESTRING_2D)
ANY ST_GeometryType (polygon POLYGON_2D)
ANY ST_GeometryType (wkb WKB_BLOB)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a 'GEOMETRY_TYPE' enum identifying the input geometry type. Possible enum return types are: `POINT`, `LINESTRING`, `POLYGON`, `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON`, and `GEOMETRYCOLLECTION`.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT DISTINCT ST_GeometryType(ST_GeomFromText('POINT(1 1)'));
----
POINT
```
----
##### ST_HasM {#docs:stable:core_extensions:spatial:functions::st_hasm}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
BOOLEAN ST_HasM (geom GEOMETRY)
BOOLEAN ST_HasM (wkb WKB_BLOB)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Check if the input geometry has M values.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- HasM for a 2D geometry
SELECT ST_HasM(ST_GeomFromText('POINT(1 1)'));
----
false
-- HasM for a 3DZ geometry
SELECT ST_HasM(ST_GeomFromText('POINT Z(1 1 1)'));
----
false
-- HasM for a 3DM geometry
SELECT ST_HasM(ST_GeomFromText('POINT M(1 1 1)'));
----
true
-- HasM for a 4D geometry
SELECT ST_HasM(ST_GeomFromText('POINT ZM(1 1 1 1)'));
----
true
```
----
##### ST_HasZ {#docs:stable:core_extensions:spatial:functions::st_hasz}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
BOOLEAN ST_HasZ (geom GEOMETRY)
BOOLEAN ST_HasZ (wkb WKB_BLOB)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Check if the input geometry has Z values.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- HasZ for a 2D geometry
SELECT ST_HasZ(ST_GeomFromText('POINT(1 1)'));
----
false
-- HasZ for a 3DZ geometry
SELECT ST_HasZ(ST_GeomFromText('POINT Z(1 1 1)'));
----
true
-- HasZ for a 3DM geometry
SELECT ST_HasZ(ST_GeomFromText('POINT M(1 1 1)'));
----
false
-- HasZ for a 4D geometry
SELECT ST_HasZ(ST_GeomFromText('POINT ZM(1 1 1 1)'));
----
true
```
----
##### ST_Hilbert {#docs:stable:core_extensions:spatial:functions::st_hilbert}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
UINTEGER ST_Hilbert (x DOUBLE, y DOUBLE, bounds BOX_2D)
UINTEGER ST_Hilbert (geom GEOMETRY, bounds BOX_2D)
UINTEGER ST_Hilbert (geom GEOMETRY)
UINTEGER ST_Hilbert (box BOX_2D, bounds BOX_2D)
UINTEGER ST_Hilbert (box BOX_2DF, bounds BOX_2DF)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Encodes the X and Y values as the Hilbert curve index for a curve covering the given bounding box.
If a geometry is provided, the center of the approximate bounding box is used as the point to encode.
If no bounding box is provided, the Hilbert curve index is mapped to the full range of a single-precision float.
For the BOX_2D and BOX_2DF variants, the center of the box is used as the point to encode.
----
##### ST_Intersection {#docs:stable:core_extensions:spatial:functions::st_intersection}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Intersection (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the intersection of two geometries
----
##### ST_Intersects {#docs:stable:core_extensions:spatial:functions::st_intersects}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
BOOLEAN ST_Intersects (box1 BOX_2D, box2 BOX_2D)
BOOLEAN ST_Intersects (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the geometries intersect
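###### Example {#docs:stable:core_extensions:spatial:functions::example}
As an illustration, a point lying inside a polygon intersects it:
```sql
SELECT ST_Intersects(
    ST_GeomFromText('POINT(0.5 0.5)'),
    ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))')
);
----
true
```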
----
##### ST_Intersects_Extent {#docs:stable:core_extensions:spatial:functions::st_intersects_extent}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_Intersects_Extent (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the extent of two geometries intersects
----
##### ST_IsClosed {#docs:stable:core_extensions:spatial:functions::st_isclosed}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_IsClosed (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Check if a geometry is 'closed'
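###### Example {#docs:stable:core_extensions:spatial:functions::example}
For illustration, assuming a LINESTRING counts as closed when its first and last vertices coincide:
```sql
SELECT ST_IsClosed(ST_GeomFromText('LINESTRING(0 0, 1 0, 1 1, 0 0)'));
----
true
SELECT ST_IsClosed(ST_GeomFromText('LINESTRING(0 0, 1 0, 1 1)'));
----
false
```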
----
##### ST_IsEmpty {#docs:stable:core_extensions:spatial:functions::st_isempty}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
BOOLEAN ST_IsEmpty (geom GEOMETRY)
BOOLEAN ST_IsEmpty (linestring LINESTRING_2D)
BOOLEAN ST_IsEmpty (polygon POLYGON_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the geometry is "empty".
----
##### ST_IsRing {#docs:stable:core_extensions:spatial:functions::st_isring}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_IsRing (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the geometry is a ring (both ST_IsClosed and ST_IsSimple).
----
##### ST_IsSimple {#docs:stable:core_extensions:spatial:functions::st_issimple}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_IsSimple (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the geometry is simple
----
##### ST_IsValid {#docs:stable:core_extensions:spatial:functions::st_isvalid}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_IsValid (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the geometry is valid
----
##### ST_Length {#docs:stable:core_extensions:spatial:functions::st_length}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_Length (geom GEOMETRY)
DOUBLE ST_Length (linestring LINESTRING_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the length of the input line geometry
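###### Example {#docs:stable:core_extensions:spatial:functions::example}
A minimal sketch using the hypotenuse of a 3-4-5 triangle as the line:
```sql
SELECT ST_Length(ST_GeomFromText('LINESTRING(0 0, 3 4)'));
----
5.0
```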
----
##### ST_Length_Spheroid {#docs:stable:core_extensions:spatial:functions::st_length_spheroid}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_Length_Spheroid (geom GEOMETRY)
DOUBLE ST_Length_Spheroid (line LINESTRING_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the length of the input geometry in meters, using an ellipsoidal model of the earth.
The input geometry is assumed to be in the [EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (WGS84), with [latitude, longitude] axis order and the length is returned in meters. This function uses the [GeographicLib](https://geographiclib.sourceforge.io/) library, calculating the length using an ellipsoidal model of the earth. This is a highly accurate method for calculating the length of a line geometry taking the curvature of the earth into account, but is also the slowest.
Returns `0.0` for any geometry that is not a `LINESTRING`, `MULTILINESTRING` or `GEOMETRYCOLLECTION` containing line geometries.
----
##### ST_LineInterpolatePoint {#docs:stable:core_extensions:spatial:functions::st_lineinterpolatepoint}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_LineInterpolatePoint (line GEOMETRY, fraction DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a point interpolated along a line at a fraction of total 2D length.
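###### Example {#docs:stable:core_extensions:spatial:functions::example}
For illustration, the point halfway along a straight line (result rendered as WKT):
```sql
SELECT ST_LineInterpolatePoint(ST_GeomFromText('LINESTRING(0 0, 10 0)'), 0.5);
----
POINT (5 0)
```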
----
##### ST_LineInterpolatePoints {#docs:stable:core_extensions:spatial:functions::st_lineinterpolatepoints}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_LineInterpolatePoints (line GEOMETRY, fraction DOUBLE, repeat BOOLEAN)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a multi-point interpolated along a line at a fraction of total 2D length.
If `repeat` is false, the result is a single point (equivalent to `ST_LineInterpolatePoint`);
otherwise, the result is a multi-point with points repeated at the given fraction interval.
----
##### ST_LineMerge {#docs:stable:core_extensions:spatial:functions::st_linemerge}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_LineMerge (geom GEOMETRY)
GEOMETRY ST_LineMerge (geom GEOMETRY, preserve_direction BOOLEAN)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
"Merges" the input line geometry, optionally taking direction into account.
----
##### ST_LineString2DFromWKB {#docs:stable:core_extensions:spatial:functions::st_linestring2dfromwkb}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_LineString2DFromWKB (linestring LINESTRING_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Deserialize a LINESTRING_2D from a WKB encoded blob
----
##### ST_LineSubstring {#docs:stable:core_extensions:spatial:functions::st_linesubstring}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_LineSubstring (line GEOMETRY, start_fraction DOUBLE, end_fraction DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a substring of a line between two fractions of total 2D length.
----
##### ST_M {#docs:stable:core_extensions:spatial:functions::st_m}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
DOUBLE ST_M (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the M coordinate of a point geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_M(ST_Point(1, 2, 3, 4))
```
----
##### ST_MMax {#docs:stable:core_extensions:spatial:functions::st_mmax}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
DOUBLE ST_MMax (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the maximum M coordinate of a geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_MMax(ST_Point(1, 2, 3, 4))
```
----
##### ST_MMin {#docs:stable:core_extensions:spatial:functions::st_mmin}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
DOUBLE ST_MMin (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the minimum M coordinate of a geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_MMin(ST_Point(1, 2, 3, 4))
```
----
##### ST_MakeEnvelope {#docs:stable:core_extensions:spatial:functions::st_makeenvelope}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_MakeEnvelope (min_x DOUBLE, min_y DOUBLE, max_x DOUBLE, max_y DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Create a rectangular polygon from min/max coordinates
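###### Example {#docs:stable:core_extensions:spatial:functions::example}
A minimal sketch (the exact vertex order of the returned rectangle may differ):
```sql
SELECT ST_MakeEnvelope(0, 0, 2, 1);
----
POLYGON ((0 0, 0 1, 2 1, 2 0, 0 0))
```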
----
##### ST_MakeLine {#docs:stable:core_extensions:spatial:functions::st_makeline}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_MakeLine (geoms GEOMETRY[])
GEOMETRY ST_MakeLine (start GEOMETRY, end GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Create a LINESTRING from a list of POINT geometries
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_MakeLine([ST_Point(0, 0), ST_Point(1, 1)]);
----
LINESTRING(0 0, 1 1)
```
----
##### ST_MakePolygon {#docs:stable:core_extensions:spatial:functions::st_makepolygon}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_MakePolygon (shell GEOMETRY)
GEOMETRY ST_MakePolygon (shell GEOMETRY, holes GEOMETRY[])
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Create a POLYGON from a LINESTRING shell
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_MakePolygon(ST_LineString([ST_Point(0, 0), ST_Point(1, 0), ST_Point(1, 1), ST_Point(0, 0)]));
```
----
##### ST_MakeValid {#docs:stable:core_extensions:spatial:functions::st_makevalid}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_MakeValid (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a valid representation of the geometry
----
##### ST_MaximumInscribedCircle {#docs:stable:core_extensions:spatial:functions::st_maximuminscribedcircle}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
STRUCT(center GEOMETRY, nearest GEOMETRY, radius DOUBLE) ST_MaximumInscribedCircle (geom GEOMETRY)
STRUCT(center GEOMETRY, nearest GEOMETRY, radius DOUBLE) ST_MaximumInscribedCircle (geom GEOMETRY, tolerance DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the maximum inscribed circle of the input geometry, optionally with a tolerance.
By default, the tolerance is computed as `max(width, height) / 1000`.
The return value is a struct with the center of the circle, the nearest point to the center on the boundary of the geometry, and the radius of the circle.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Find the maximum inscribed circle of a square
SELECT ST_MaximumInscribedCircle(
ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))')
);
----
{'center': POINT (5 5), 'nearest': POINT (5 0), 'radius': 5.0}
```
----
##### ST_MinimumRotatedRectangle {#docs:stable:core_extensions:spatial:functions::st_minimumrotatedrectangle}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_MinimumRotatedRectangle (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the minimum rotated rectangle that bounds the input geometry, finding the surrounding box that has the lowest area by using a rotated rectangle, rather than taking the lowest and highest coordinate values as per ST_Envelope().
----
##### ST_Multi {#docs:stable:core_extensions:spatial:functions::st_multi}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Multi (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Turns a single geometry into a multi geometry.
If the geometry is already a multi geometry, it is returned as is.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_Multi(ST_GeomFromText('POINT(1 2)'));
----
MULTIPOINT (1 2)
SELECT ST_Multi(ST_GeomFromText('LINESTRING(1 1, 2 2)'));
----
MULTILINESTRING ((1 1, 2 2))
SELECT ST_Multi(ST_GeomFromText('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'));
----
MULTIPOLYGON (((0 0, 0 1, 1 1, 1 0, 0 0)))
```
----
##### ST_NGeometries {#docs:stable:core_extensions:spatial:functions::st_ngeometries}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
INTEGER ST_NGeometries (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the number of component geometries in a collection geometry.
If the input geometry is not a collection, this function returns 0 or 1 depending on whether the geometry is empty.
----
##### ST_NInteriorRings {#docs:stable:core_extensions:spatial:functions::st_ninteriorrings}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
INTEGER ST_NInteriorRings (geom GEOMETRY)
INTEGER ST_NInteriorRings (polygon POLYGON_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the number of interior rings of a polygon
----
##### ST_NPoints {#docs:stable:core_extensions:spatial:functions::st_npoints}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
UINTEGER ST_NPoints (geom GEOMETRY)
UBIGINT ST_NPoints (point POINT_2D)
UBIGINT ST_NPoints (linestring LINESTRING_2D)
UBIGINT ST_NPoints (polygon POLYGON_2D)
UBIGINT ST_NPoints (box BOX_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the number of vertices within a geometry
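###### Example {#docs:stable:core_extensions:spatial:functions::example}
As an illustration, counting the vertices of a two-point linestring:
```sql
SELECT ST_NPoints(ST_GeomFromText('LINESTRING(1 1, 2 2)'));
----
2
```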
----
##### ST_Node {#docs:stable:core_extensions:spatial:functions::st_node}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Node (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a "noded" MultiLinestring, produced by combining a collection of input linestrings and adding additional vertices where they intersect.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Create a noded multilinestring from two intersecting lines
SELECT ST_Node(
ST_GeomFromText('MULTILINESTRING((0 0, 2 2), (0 2, 2 0))')
);
----
MULTILINESTRING ((0 0, 1 1), (1 1, 2 2), (0 2, 1 1), (1 1, 2 0))
```
----
##### ST_Normalize {#docs:stable:core_extensions:spatial:functions::st_normalize}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Normalize (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the "normalized" representation of the geometry
----
##### ST_NumGeometries {#docs:stable:core_extensions:spatial:functions::st_numgeometries}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
INTEGER ST_NumGeometries (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the number of component geometries in a collection geometry.
If the input geometry is not a collection, this function returns 0 or 1 depending on whether the geometry is empty.
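###### Example {#docs:stable:core_extensions:spatial:functions::example}
For illustration, counting the members of a two-element multi-point:
```sql
SELECT ST_NumGeometries(ST_GeomFromText('MULTIPOINT(1 1, 2 2)'));
----
2
```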
----
##### ST_NumInteriorRings {#docs:stable:core_extensions:spatial:functions::st_numinteriorrings}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
INTEGER ST_NumInteriorRings (geom GEOMETRY)
INTEGER ST_NumInteriorRings (polygon POLYGON_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the number of interior rings of a polygon
----
##### ST_NumPoints {#docs:stable:core_extensions:spatial:functions::st_numpoints}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
UINTEGER ST_NumPoints (geom GEOMETRY)
UBIGINT ST_NumPoints (point POINT_2D)
UBIGINT ST_NumPoints (linestring LINESTRING_2D)
UBIGINT ST_NumPoints (polygon POLYGON_2D)
UBIGINT ST_NumPoints (box BOX_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the number of vertices within a geometry
----
##### ST_Overlaps {#docs:stable:core_extensions:spatial:functions::st_overlaps}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_Overlaps (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the geometries overlap
----
##### ST_Perimeter {#docs:stable:core_extensions:spatial:functions::st_perimeter}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_Perimeter (geom GEOMETRY)
DOUBLE ST_Perimeter (polygon POLYGON_2D)
DOUBLE ST_Perimeter (box BOX_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the length of the perimeter of the geometry
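###### Example {#docs:stable:core_extensions:spatial:functions::example}
A minimal sketch using a unit square, whose perimeter is 4:
```sql
SELECT ST_Perimeter(ST_GeomFromText('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'));
----
4.0
```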
----
##### ST_Perimeter_Spheroid {#docs:stable:core_extensions:spatial:functions::st_perimeter_spheroid}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_Perimeter_Spheroid (geom GEOMETRY)
DOUBLE ST_Perimeter_Spheroid (poly POLYGON_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the length of the perimeter in meters, using an ellipsoidal model of the earth's surface.
The input geometry is assumed to be in the [EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (WGS84), with [latitude, longitude] axis order and the length is returned in meters. This function uses the [GeographicLib](https://geographiclib.sourceforge.io/) library, calculating the perimeter using an ellipsoidal model of the earth. This is a highly accurate method for calculating the perimeter of a polygon taking the curvature of the earth into account, but is also the slowest.
Returns `0.0` for any geometry that is not a `POLYGON`, `MULTIPOLYGON` or `GEOMETRYCOLLECTION` containing polygon geometries.
----
##### ST_Point {#docs:stable:core_extensions:spatial:functions::st_point}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Point (x DOUBLE, y DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Creates a GEOMETRY point
----
##### ST_Point2D {#docs:stable:core_extensions:spatial:functions::st_point2d}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
POINT_2D ST_Point2D (x DOUBLE, y DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Creates a POINT_2D
----
##### ST_Point2DFromWKB {#docs:stable:core_extensions:spatial:functions::st_point2dfromwkb}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Point2DFromWKB (point POINT_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Deserialize a POINT_2D from a WKB encoded blob
----
##### ST_Point3D {#docs:stable:core_extensions:spatial:functions::st_point3d}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
POINT_3D ST_Point3D (x DOUBLE, y DOUBLE, z DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Creates a POINT_3D
----
##### ST_Point4D {#docs:stable:core_extensions:spatial:functions::st_point4d}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
POINT_4D ST_Point4D (x DOUBLE, y DOUBLE, z DOUBLE, m DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Creates a POINT_4D
----
##### ST_PointN {#docs:stable:core_extensions:spatial:functions::st_pointn}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_PointN (geom GEOMETRY, index INTEGER)
POINT_2D ST_PointN (linestring LINESTRING_2D, index INTEGER)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the n'th vertex from the input geometry as a point geometry
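###### Example {#docs:stable:core_extensions:spatial:functions::example}
An illustrative query, assuming 1-based vertex indexing (as in PostGIS):
```sql
SELECT ST_PointN(ST_GeomFromText('LINESTRING(0 0, 1 1, 2 2)'), 2);
----
POINT (1 1)
```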
----
##### ST_PointOnSurface {#docs:stable:core_extensions:spatial:functions::st_pointonsurface}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_PointOnSurface (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a point guaranteed to lie on the surface of the geometry
----
##### ST_Points {#docs:stable:core_extensions:spatial:functions::st_points}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Points (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Collects all the vertices in the geometry into a MULTIPOINT
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
select st_points('LINESTRING(1 1, 2 2)'::geometry);
----
MULTIPOINT (1 1, 2 2)
select st_points('MULTIPOLYGON Z EMPTY'::geometry);
----
MULTIPOINT Z EMPTY
```
----
##### ST_Polygon2DFromWKB {#docs:stable:core_extensions:spatial:functions::st_polygon2dfromwkb}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Polygon2DFromWKB (polygon POLYGON_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Deserialize a POLYGON_2D from a WKB encoded blob
----
##### ST_Polygonize {#docs:stable:core_extensions:spatial:functions::st_polygonize}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Polygonize (geometries GEOMETRY[])
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a polygonized representation of the input geometries
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Create a polygon from a closed linestring ring
SELECT ST_Polygonize([
ST_GeomFromText('LINESTRING(0 0, 0 10, 10 10, 10 0, 0 0)')
]);
----
GEOMETRYCOLLECTION (POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0)))
```
----
##### ST_QuadKey {#docs:stable:core_extensions:spatial:functions::st_quadkey}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
VARCHAR ST_QuadKey (longitude DOUBLE, latitude DOUBLE, level INTEGER)
VARCHAR ST_QuadKey (point GEOMETRY, level INTEGER)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Compute the [quadkey](https://learn.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system) for a given lon/lat point at a given level.
Note that the parameter order is __longitude__, __latitude__.
`level` has to be between 1 and 23, inclusive.
The input coordinates will be clamped to the lon/lat bounds of the earth (longitude between -180 and 180, latitude between -85.05112878 and 85.05112878).
The geometry overload throws an error if the input geometry is not a `POINT`.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_QuadKey(st_point(11.08, 49.45), 10);
----
1333203202
```
----
##### ST_ReducePrecision {#docs:stable:core_extensions:spatial:functions::st_reduceprecision}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_ReducePrecision (geom GEOMETRY, precision DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the geometry with all vertices reduced to the given precision
----
##### ST_RemoveRepeatedPoints {#docs:stable:core_extensions:spatial:functions::st_removerepeatedpoints}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
LINESTRING_2D ST_RemoveRepeatedPoints (line LINESTRING_2D)
LINESTRING_2D ST_RemoveRepeatedPoints (line LINESTRING_2D, tolerance DOUBLE)
GEOMETRY ST_RemoveRepeatedPoints (geom GEOMETRY)
GEOMETRY ST_RemoveRepeatedPoints (geom GEOMETRY, tolerance DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Remove repeated points from a LINESTRING.
----
##### ST_Reverse {#docs:stable:core_extensions:spatial:functions::st_reverse}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Reverse (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the geometry with the order of its vertices reversed
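###### Example {#docs:stable:core_extensions:spatial:functions::example}
A minimal illustration of reversing a linestring's vertex order:
```sql
SELECT ST_Reverse(ST_GeomFromText('LINESTRING(0 0, 1 1, 2 2)'));
----
LINESTRING (2 2, 1 1, 0 0)
```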
----
##### ST_ShortestLine {#docs:stable:core_extensions:spatial:functions::st_shortestline}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_ShortestLine (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the shortest line between two geometries
----
##### ST_Simplify {#docs:stable:core_extensions:spatial:functions::st_simplify}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Simplify (geom GEOMETRY, tolerance DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a simplified version of the geometry
----
##### ST_SimplifyPreserveTopology {#docs:stable:core_extensions:spatial:functions::st_simplifypreservetopology}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_SimplifyPreserveTopology (geom GEOMETRY, tolerance DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a simplified version of the geometry that preserves topology
----
##### ST_StartPoint {#docs:stable:core_extensions:spatial:functions::st_startpoint}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_StartPoint (geom GEOMETRY)
POINT_2D ST_StartPoint (line LINESTRING_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the start point of a LINESTRING.
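###### Example {#docs:stable:core_extensions:spatial:functions::example}
As an illustration (result rendered as WKT):
```sql
SELECT ST_StartPoint(ST_GeomFromText('LINESTRING(0 0, 1 1, 2 2)'));
----
POINT (0 0)
```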
----
##### ST_TileEnvelope {#docs:stable:core_extensions:spatial:functions::st_tileenvelope}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_TileEnvelope (tile_zoom INTEGER, tile_x INTEGER, tile_y INTEGER)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
The `ST_TileEnvelope` scalar function generates tile envelope rectangular polygons from specified zoom level and tile indices.
This is used in MVT generation to select the features corresponding to the tile extent. The envelope is in the Web Mercator
coordinate reference system (EPSG:3857). The tile pyramid starts at zoom level 0, corresponding to a single tile for the
world. Each zoom level doubles the number of tiles in each direction, such that zoom level 1 is 2 tiles wide by 2 tiles high,
zoom level 2 is 4 tiles wide by 4 tiles high, and so on. Tile indices start at `[x=0, y=0]` at the top left, and increase
down and right. For example, at zoom level 2, the top right tile is `[x=3, y=0]`, the bottom left tile is `[x=0, y=3]`, and
the bottom right is `[x=3, y=3]`.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_TileEnvelope(2, 3, 1);
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                          st_tileenvelope(2, 3, 1)                                           │
│                                                   geometry                                                  │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ POLYGON ((1.00188E+07 0, 1.00188E+07 1.00188E+07, 2.00375E+07 1.00188E+07, 2.00375E+07 0, 1.00188E+07 0))   │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
----
##### ST_Touches {#docs:stable:core_extensions:spatial:functions::st_touches}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_Touches (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the geometries touch
----
##### ST_Transform {#docs:stable:core_extensions:spatial:functions::st_transform}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
BOX_2D ST_Transform (box BOX_2D, source_crs VARCHAR, target_crs VARCHAR)
BOX_2D ST_Transform (box BOX_2D, source_crs VARCHAR, target_crs VARCHAR, always_xy BOOLEAN)
POINT_2D ST_Transform (point POINT_2D, source_crs VARCHAR, target_crs VARCHAR)
POINT_2D ST_Transform (point POINT_2D, source_crs VARCHAR, target_crs VARCHAR, always_xy BOOLEAN)
GEOMETRY ST_Transform (geom GEOMETRY, source_crs VARCHAR, target_crs VARCHAR)
GEOMETRY ST_Transform (geom GEOMETRY, source_crs VARCHAR, target_crs VARCHAR, always_xy BOOLEAN)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Transforms a geometry between two coordinate systems
The source and target coordinate systems can be specified using any format that the [PROJ library](https://proj.org) supports.
The third optional `always_xy` parameter can be used to force the input and output geometries to be interpreted as having an [easting, northing] coordinate axis order regardless of what the source and target coordinate system definitions say. This is particularly useful when transforming to/from the [WGS84/EPSG:4326](https://en.wikipedia.org/wiki/World_Geodetic_System) coordinate system (what most people think of when they hear "longitude"/"latitude" or "GPS coordinates"), which is defined as having a [latitude, longitude] axis order even though [longitude, latitude] is commonly used in practice (e.g., in [GeoJSON](https://tools.ietf.org/html/rfc7946)). More details are available in the [PROJ documentation](https://proj.org/en/9.3/faq.html#why-is-the-axis-ordering-in-proj-not-consistent).
DuckDB spatial vendors its own static copy of the PROJ database of coordinate systems, so if you have your own installation of PROJ on your system, the available coordinate systems may differ from what's available in other GIS software.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Transform a geometry from EPSG:4326 to EPSG:3857 (WGS84 to WebMercator)
-- Note that since WGS84 is defined as having a [latitude, longitude] axis order
-- we follow the standard and provide the input geometry using that axis order,
-- but the output will be [easting, northing] because that is what's defined by
-- WebMercator.
SELECT
ST_Transform(
st_point(52.373123, 4.892360),
'EPSG:4326',
'EPSG:3857'
);
----
POINT (544615.0239773799 6867874.103539125)
-- Alternatively, let's say we got our input point from e.g. a GeoJSON file,
-- which uses WGS84 but with [longitude, latitude] axis order. We can use the
-- `always_xy` parameter to force the input geometry to be interpreted as having
-- an [easting, northing] axis order instead, even though the source coordinate
-- reference system definition (WGS84) says otherwise.
SELECT
ST_Transform(
-- note the axis order is reversed here
st_point(4.892360, 52.373123),
'EPSG:4326',
'EPSG:3857',
always_xy := true
);
----
POINT (544615.0239773799 6867874.103539125)
-- Transform a geometry from OSGB36 British National Grid (EPSG:27700) to WGS84 (EPSG:4326)
-- The standard transform is often accurate only to the first few decimal places,
-- which can translate into an error of roughly 10 m, and possibly much more
SELECT ST_Transform(bng, 'EPSG:27700', 'EPSG:4326', always_xy := true) AS without_grid_file
FROM (SELECT ST_GeomFromText('POINT( 170370.718 11572.405 )') AS bng);
----
POINT (-5.202992651563592 49.96007490162923)
-- By using an official NTv2 grid file, we can reduce the error to around the 9th decimal place,
-- which in theory is below a millimetre, though in practice your coordinates are unlikely to be that precise
-- British National Grid "NTv2 format files" download available here:
-- https://www.ordnancesurvey.co.uk/products/os-net/for-developers
SELECT ST_Transform(bng
, '+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +units=m +no_defs +nadgrids=/full/path/to/OSTN15-NTv2/OSTN15_NTv2_OSGBtoETRS.gsb +type=crs'
, 'EPSG:4326', always_xy := true) AS with_grid_file
FROM (SELECT ST_GeomFromText('POINT( 170370.718 11572.405 )') AS bng) t;
----
POINT (-5.203046090608746 49.96006137018598)
```
----
##### ST_Union {#docs:stable:core_extensions:spatial:functions::st_union}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Union (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the union of two geometries
----
##### ST_VoronoiDiagram {#docs:stable:core_extensions:spatial:functions::st_voronoidiagram}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_VoronoiDiagram (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the Voronoi diagram of the supplied MultiPoint geometry
----
##### ST_Within {#docs:stable:core_extensions:spatial:functions::st_within}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
BOOLEAN ST_Within (geom1 POINT_2D, geom2 POLYGON_2D)
BOOLEAN ST_Within (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the first geometry is within the second
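###### Example {#docs:stable:core_extensions:spatial:functions::example}
For illustration, a point inside a polygon is within it:
```sql
SELECT ST_Within(
    ST_GeomFromText('POINT(0.5 0.5)'),
    ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))')
);
----
true
```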
----
##### ST_WithinProperly {#docs:stable:core_extensions:spatial:functions::st_withinproperly}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
BOOLEAN ST_WithinProperly (geom1 GEOMETRY, geom2 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns true if the first geometry is "properly" contained by the second geometry.
This function behaves the same as `ST_ContainsProperly`, but with the arguments swapped.
----
##### ST_X {#docs:stable:core_extensions:spatial:functions::st_x}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_X (geom GEOMETRY)
DOUBLE ST_X (point POINT_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the X coordinate of a point geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_X(ST_Point(1, 2))
```
----
##### ST_XMax {#docs:stable:core_extensions:spatial:functions::st_xmax}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_XMax (geom GEOMETRY)
DOUBLE ST_XMax (point POINT_2D)
DOUBLE ST_XMax (line LINESTRING_2D)
DOUBLE ST_XMax (polygon POLYGON_2D)
DOUBLE ST_XMax (box BOX_2D)
FLOAT ST_XMax (box BOX_2DF)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the maximum X coordinate of a geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_XMax(ST_Point(1, 2))
```
----
##### ST_XMin {#docs:stable:core_extensions:spatial:functions::st_xmin}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_XMin (geom GEOMETRY)
DOUBLE ST_XMin (point POINT_2D)
DOUBLE ST_XMin (line LINESTRING_2D)
DOUBLE ST_XMin (polygon POLYGON_2D)
DOUBLE ST_XMin (box BOX_2D)
FLOAT ST_XMin (box BOX_2DF)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the minimum X coordinate of a geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_XMin(ST_Point(1, 2))
```
----
##### ST_Y {#docs:stable:core_extensions:spatial:functions::st_y}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_Y (geom GEOMETRY)
DOUBLE ST_Y (point POINT_2D)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the Y coordinate of a point geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_Y(ST_Point(1, 2))
```
----
##### ST_YMax {#docs:stable:core_extensions:spatial:functions::st_ymax}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_YMax (geom GEOMETRY)
DOUBLE ST_YMax (point POINT_2D)
DOUBLE ST_YMax (line LINESTRING_2D)
DOUBLE ST_YMax (polygon POLYGON_2D)
DOUBLE ST_YMax (box BOX_2D)
FLOAT ST_YMax (box BOX_2DF)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the maximum Y coordinate of a geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_YMax(ST_Point(1, 2))
```
----
##### ST_YMin {#docs:stable:core_extensions:spatial:functions::st_ymin}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
DOUBLE ST_YMin (geom GEOMETRY)
DOUBLE ST_YMin (point POINT_2D)
DOUBLE ST_YMin (line LINESTRING_2D)
DOUBLE ST_YMin (polygon POLYGON_2D)
DOUBLE ST_YMin (box BOX_2D)
FLOAT ST_YMin (box BOX_2DF)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the minimum Y coordinate of a geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_YMin(ST_Point(1, 2))
```
----
##### ST_Z {#docs:stable:core_extensions:spatial:functions::st_z}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
DOUBLE ST_Z (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the Z coordinate of a point geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_Z(ST_Point(1, 2, 3))
```
----
##### ST_ZMFlag {#docs:stable:core_extensions:spatial:functions::st_zmflag}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
UTINYINT ST_ZMFlag (geom GEOMETRY)
UTINYINT ST_ZMFlag (wkb WKB_BLOB)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns a flag indicating the presence of Z and M values in the input geometry.
- `0` = no Z or M values
- `1` = M values only
- `2` = Z values only
- `3` = Z and M values
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- ZMFlag for a 2D geometry
SELECT ST_ZMFlag(ST_GeomFromText('POINT(1 1)'));
----
0
-- ZMFlag for a 3DZ geometry
SELECT ST_ZMFlag(ST_GeomFromText('POINT Z(1 1 1)'));
----
2
-- ZMFlag for a 3DM geometry
SELECT ST_ZMFlag(ST_GeomFromText('POINT M(1 1 1)'));
----
1
-- ZMFlag for a 4D geometry
SELECT ST_ZMFlag(ST_GeomFromText('POINT ZM(1 1 1 1)'));
----
3
```
----
##### ST_ZMax {#docs:stable:core_extensions:spatial:functions::st_zmax}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
DOUBLE ST_ZMax (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the maximum Z coordinate of a geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_ZMax(ST_Point(1, 2, 3))
```
----
##### ST_ZMin {#docs:stable:core_extensions:spatial:functions::st_zmin}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
DOUBLE ST_ZMin (geom GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the minimum Z coordinate of a geometry
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_ZMin(ST_Point(1, 2, 3))
```
----
#### Aggregate Functions {#docs:stable:core_extensions:spatial:functions::aggregate-functions}
##### ST_CoverageInvalidEdges_Agg {#docs:stable:core_extensions:spatial:functions::st_coverageinvalidedges_agg}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_CoverageInvalidEdges_Agg (col0 GEOMETRY)
GEOMETRY ST_CoverageInvalidEdges_Agg (col0 GEOMETRY, col1 DOUBLE)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the invalid edges of a coverage geometry
----
##### ST_CoverageSimplify_Agg {#docs:stable:core_extensions:spatial:functions::st_coveragesimplify_agg}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_CoverageSimplify_Agg (col0 GEOMETRY, col1 DOUBLE)
GEOMETRY ST_CoverageSimplify_Agg (col0 GEOMETRY, col1 DOUBLE, col2 BOOLEAN)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Simplifies a set of geometries while maintaining coverage
----
##### ST_CoverageUnion_Agg {#docs:stable:core_extensions:spatial:functions::st_coverageunion_agg}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_CoverageUnion_Agg (col0 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Unions a set of geometries while maintaining coverage
----
##### ST_Envelope_Agg {#docs:stable:core_extensions:spatial:functions::st_envelope_agg}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Envelope_Agg (col0 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Alias for [ST_Extent_Agg](#::st_extent_agg).
Computes the minimal-bounding-box polygon containing the set of input geometries.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_Extent_Agg(geom) FROM UNNEST([ST_Point(1,1), ST_Point(5,5)]) AS _(geom);
-- POLYGON ((1 1, 1 5, 5 5, 5 1, 1 1))
```
----
##### ST_Extent_Agg {#docs:stable:core_extensions:spatial:functions::st_extent_agg}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Extent_Agg (col0 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Computes the minimal-bounding-box polygon containing the set of input geometries
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT ST_Extent_Agg(geom) FROM UNNEST([ST_Point(1,1), ST_Point(5,5)]) AS _(geom);
-- POLYGON ((1 1, 1 5, 5 5, 5 1, 1 1))
```
----
##### ST_Intersection_Agg {#docs:stable:core_extensions:spatial:functions::st_intersection_agg}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Intersection_Agg (col0 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Computes the intersection of a set of geometries
----
##### ST_MemUnion_Agg {#docs:stable:core_extensions:spatial:functions::st_memunion_agg}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_MemUnion_Agg (col0 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Computes the union of a set of input geometries.
"Slower, but might be more memory efficient than ST_UnionAgg as each geometry is merged into the union individually rather than all at once.
----
##### ST_Union_Agg {#docs:stable:core_extensions:spatial:functions::st_union_agg}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Union_Agg (col0 GEOMETRY)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Computes the union of a set of input geometries
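###### Example {#docs:stable:core_extensions:spatial:functions::example}
A minimal sketch, following the pattern of the `ST_Extent_Agg` example above (the exact vertex order of the resulting polygon depends on the union algorithm, so no result is shown):
```sql
SELECT ST_Union_Agg(geom)
FROM UNNEST([
    ST_GeomFromText('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))'),
    ST_GeomFromText('POLYGON((1 1, 3 1, 3 3, 1 3, 1 1))')
]) AS _(geom);
```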
----
#### Macro Functions {#docs:stable:core_extensions:spatial:functions::macro-functions}
##### ST_Rotate {#docs:stable:core_extensions:spatial:functions::st_rotate}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_Rotate (geom GEOMETRY, radians double)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Alias of ST_RotateZ
----
##### ST_RotateX {#docs:stable:core_extensions:spatial:functions::st_rotatex}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_RotateX (geom GEOMETRY, radians double)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Rotates a geometry around the X axis. This is a shorthand macro for calling ST_Affine.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Rotate a 3D point 90 degrees (π/2 radians) around the X-axis
SELECT ST_RotateX(ST_GeomFromText('POINT Z(0 1 0)'), pi()/2);
----
POINT Z (0 0 1)
```
----
##### ST_RotateY {#docs:stable:core_extensions:spatial:functions::st_rotatey}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_RotateY (geom GEOMETRY, radians double)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Rotates a geometry around the Y axis. This is a shorthand macro for calling ST_Affine.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Rotate a 3D point 90 degrees (π/2 radians) around the Y-axis
SELECT ST_RotateY(ST_GeomFromText('POINT Z(1 0 0)'), pi()/2);
----
POINT Z (0 0 -1)
```
----
##### ST_RotateZ {#docs:stable:core_extensions:spatial:functions::st_rotatez}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_RotateZ (geom GEOMETRY, radians double)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Rotates a geometry around the Z axis. This is a shorthand macro for calling ST_Affine.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Rotate a point 90 degrees (π/2 radians) around the Z-axis
SELECT ST_RotateZ(ST_Point(1, 0), pi()/2);
----
POINT (0 1)
```
----
##### ST_Scale {#docs:stable:core_extensions:spatial:functions::st_scale}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_Scale (geom GEOMETRY, xs double, ys double, zs double)
GEOMETRY ST_Scale (geom GEOMETRY, xs double, ys double)
```
----
##### ST_TransScale {#docs:stable:core_extensions:spatial:functions::st_transscale}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
GEOMETRY ST_TransScale (geom GEOMETRY, dx double, dy double, xs double, ys double)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Translates and then scales a geometry in X and Y direction. This is a shorthand macro for calling ST_Affine.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Translate by (1, 2) then scale by (2, 3)
SELECT ST_TransScale(ST_Point(1, 1), 1, 2, 2, 3);
----
POINT (4 9)
```
----
##### ST_Translate {#docs:stable:core_extensions:spatial:functions::st_translate}
###### Signatures {#docs:stable:core_extensions:spatial:functions::signatures}
```sql
GEOMETRY ST_Translate (geom GEOMETRY, dx double, dy double, dz double)
GEOMETRY ST_Translate (geom GEOMETRY, dx double, dy double)
```
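A minimal usage sketch, with arbitrarily chosen offsets:
```sql
-- Translate a point by (1, 2); expected to yield POINT (2 3)
SELECT ST_Translate(ST_Point(1, 1), 1.0, 2.0);
```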
----
#### Table Functions {#docs:stable:core_extensions:spatial:functions::table-functions}
##### ST_Drivers {#docs:stable:core_extensions:spatial:functions::st_drivers}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
ST_Drivers ()
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Returns the list of supported GDAL drivers and file formats.
Note that not all of these drivers have been tested thoroughly.
Some may require additional options to be passed to work as expected.
If you run into any issues, please first consult the [GDAL documentation](https://gdal.org/drivers/vector/index.html).
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT * FROM ST_Drivers();
```
----
##### ST_GeneratePoints {#docs:stable:core_extensions:spatial:functions::st_generatepoints}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
ST_GeneratePoints (col0 BOX_2D, col1 BIGINT)
ST_GeneratePoints (col0 BOX_2D, col1 BIGINT, col2 BIGINT)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Generates a set of random points within the specified bounding box.
Takes a bounding box (min_x, min_y, max_x, max_y), a count of points to generate, and optionally a seed for the random number generator.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT * FROM ST_GeneratePoints({min_x: 0, min_y:0, max_x:10, max_y:10}::BOX_2D, 5, 42);
```
----
##### ST_Read {#docs:stable:core_extensions:spatial:functions::st_read}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
ST_Read (col0 VARCHAR, keep_wkb BOOLEAN, max_batch_size INTEGER, sequential_layer_scan BOOLEAN, layer VARCHAR, sibling_files VARCHAR[], spatial_filter WKB_BLOB, spatial_filter_box BOX_2D, allowed_drivers VARCHAR[], open_options VARCHAR[])
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Read and import a variety of geospatial file formats using the GDAL library.
The `ST_Read` table function is based on the [GDAL](https://gdal.org/index.html) translator library and enables reading spatial data from a variety of geospatial vector file formats as if they were DuckDB tables.
> See [ST_Drivers](#::st_drivers) for a list of supported file formats and drivers.
Except for the `path` parameter, all parameters are optional.
| Parameter | Type | Description |
| --------- | -----| ----------- |
| `path` | VARCHAR | The path to the file to read. Mandatory |
| `sequential_layer_scan` | BOOLEAN | If set to true, the table function will scan through all layers sequentially and return the first layer that matches the given layer name. This is required for some drivers to work properly, e.g., the OSM driver. |
| `spatial_filter` | WKB_BLOB | If set to a WKB blob, the table function will only return rows that intersect with the given WKB geometry. Some drivers may support efficient spatial filtering natively, in which case it will be pushed down. Otherwise the filtering is done by GDAL which may be much slower. |
| `open_options` | VARCHAR[] | A list of key-value pairs that are passed to the GDAL driver to control the opening of the file. E.g., the GeoJSON driver supports a FLATTEN_NESTED_ATTRIBUTES=YES option to flatten nested attributes. |
| `layer` | VARCHAR | The name of the layer to read from the file. If NULL, the first layer is returned. Can also be a layer index (starting at 0). |
| `allowed_drivers` | VARCHAR[] | A list of GDAL driver names that are allowed to be used to open the file. If empty, all drivers are allowed. |
| `sibling_files` | VARCHAR[] | A list of sibling files that are required to open the file. E.g., the ESRI Shapefile driver requires a `.shx` file to be present, although most of the time these can be discovered automatically. |
| `spatial_filter_box` | BOX_2D | If set to a BOX_2D, the table function will only return rows that intersect with the given bounding box. Similar to spatial_filter. |
| `keep_wkb` | BOOLEAN | If set, the table function will return geometries in a `wkb_geometry` column with the type `WKB_BLOB` (which can be cast to `BLOB`) instead of `GEOMETRY`. This is useful if you want to use DuckDB with more exotic geometry subtypes that DuckDB spatial doesn't support representing in the `GEOMETRY` type yet. |
Note that GDAL is single-threaded, so this table function will not be able to make full use of parallelism.
By using `ST_Read`, the spatial extension also provides "replacement scans" for common geospatial file formats, allowing you to query files of these formats as if they were tables directly.
```sql
SELECT * FROM './path/to/some/shapefile/dataset.shp';
```
In practice, this is just syntactic sugar for calling `ST_Read`, so there is no difference in performance. If you want to pass additional options, use the `ST_Read` table function directly.
The following formats are currently recognized by their file extension:
| Format | Extension |
| ------ | --------- |
| ESRI ShapeFile | .shp |
| GeoPackage | .gpkg |
| FlatGeoBuf | .fgb |
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Read a Shapefile
SELECT * FROM ST_Read('some/file/path/filename.shp');
-- Read a GeoJSON file
CREATE TABLE my_geojson_table AS SELECT * FROM ST_Read('some/file/path/filename.json');
```
----
##### ST_ReadOSM {#docs:stable:core_extensions:spatial:functions::st_readosm}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
ST_ReadOSM (col0 VARCHAR)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
The `ST_ReadOSM()` table function enables reading compressed OpenStreetMap data directly from a `.osm.pbf` file.
This function uses multithreading and zero-copy protobuf parsing, which makes it a lot faster than using the `ST_Read()` OSM driver. However, it only outputs the raw OSM data (nodes, ways, relations) without constructing any geometries. For simple node entities (such as points of interest) you can trivially construct `POINT` geometries, and it is also possible to construct `LINESTRING` and `POLYGON` geometries by manually joining refs and nodes together in SQL, although available memory is usually a limiting factor.
The `ST_ReadOSM()` function also provides a "replacement scan" to enable reading from a file directly as if it were a table. This is just syntax sugar for calling `ST_ReadOSM()` though. Example:
```sql
SELECT * FROM 'tmp/data/germany.osm.pbf' LIMIT 5;
```
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
SELECT *
FROM ST_ReadOSM('tmp/data/germany.osm.pbf')
WHERE tags['highway'] != []
LIMIT 5;
----
┌──────────────────────┬────────┬──────────────────────┬─────────┬────────────────────┬────────────┬───────────┬───────────────────────┐
│         kind         │   id   │         tags         │  refs   │        lat         │    lon     │ ref_roles │       ref_types       │
│ enum('node', 'way'…  │ int64  │ map(varchar, varch…  │ int64[] │       double       │   double   │ varchar[] │ enum('node', 'way', … │
├──────────────────────┼────────┼──────────────────────┼─────────┼────────────────────┼────────────┼───────────┼───────────────────────┤
│ node                 │ 122351 │ {bicycle=yes, butt…  │         │ 53.5492951         │ 9.977553   │           │                       │
│ node                 │ 122397 │ {crossing=no, high…  │         │ 53.520990100000006 │ 10.0156924 │           │                       │
│ node                 │ 122493 │ {TMC:cid_58:tabcd_…  │         │ 53.129614600000004 │ 8.1970173  │           │                       │
│ node                 │ 123566 │ {highway=traffic_s…  │         │ 54.617268200000005 │ 8.9718171  │           │                       │
│ node                 │ 125801 │ {TMC:cid_58:tabcd_…  │         │ 53.070685000000005 │ 8.7819939  │           │                       │
└──────────────────────┴────────┴──────────────────────┴─────────┴────────────────────┴────────────┴───────────┴───────────────────────┘
```
----
##### ST_ReadSHP {#docs:stable:core_extensions:spatial:functions::st_readshp}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
ST_ReadSHP (col0 VARCHAR, encoding VARCHAR)
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Read a Shapefile without relying on the GDAL library
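A minimal usage sketch (the file path is hypothetical):
```sql
-- Read a Shapefile directly, without going through GDAL
SELECT * FROM ST_ReadSHP('some/file/path/filename.shp');
```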
----
##### ST_Read_Meta {#docs:stable:core_extensions:spatial:functions::st_read_meta}
###### Signature {#docs:stable:core_extensions:spatial:functions::signature}
```sql
ST_Read_Meta (col0 VARCHAR)
ST_Read_Meta (col0 VARCHAR[])
```
###### Description {#docs:stable:core_extensions:spatial:functions::description}
Read the metadata from a variety of geospatial file formats using the GDAL library.
The `ST_Read_Meta` table function accompanies the `ST_Read` table function, but instead of reading the contents of a file, this function scans the metadata instead.
Since the data model of the underlying GDAL library is quite flexible, most of the interesting metadata is within the returned `layers` column, which is a somewhat complex nested structure of DuckDB `STRUCT` and `LIST` types.
###### Example {#docs:stable:core_extensions:spatial:functions::example}
```sql
-- Find the coordinate reference system authority name and code for the first layer's first geometry column in the file
SELECT
layers[1].geometry_fields[1].crs.auth_name as name,
layers[1].geometry_fields[1].crs.auth_code as code
FROM st_read_meta('../../tmp/data/amsterdam_roads.fgb');
```
----
### R-Tree Indexes {#docs:stable:core_extensions:spatial:r-tree_indexes}
As of DuckDB v1.1.0 the [`spatial` extension](#docs:stable:core_extensions:spatial:overview) provides basic support for spatial indexing through the R-tree extension index type.
#### Why Should I Use an R-Tree Index? {#docs:stable:core_extensions:spatial:r-tree_indexes::why-should-i-use-an-r-tree-index}
When working with geospatial datasets, it is very common that you want to filter rows based on their spatial relationship with a specific region of interest. Unfortunately, even though DuckDB's vectorized execution engine is pretty fast, this sort of operation does not scale very well to large datasets as it always requires a full table scan to check every row in the table. However, by indexing a table with an R-tree, it is possible to accelerate these types of queries significantly.
#### How Do R-Tree Indexes Work? {#docs:stable:core_extensions:spatial:r-tree_indexes::how-do-r-tree-indexes-work}
An R-tree is a balanced tree data structure that stores the approximate _minimum bounding rectangle_ of each geometry (and the internal ID of the corresponding row) in the leaf nodes, and the bounding rectangle enclosing all of the child nodes in each internal node.
> The _minimum bounding rectangle_ (MBR) of a geometry is the smallest rectangle that completely encloses the geometry. Usually when we talk about the bounding rectangle of a geometry (or the bounding "box" in the context of 2D geometry), we mean the minimum bounding rectangle. Additionally, we tend to assume that bounding boxes/rectangles are _axis-aligned,_ i.e., the rectangle is **not** rotated: the sides are always parallel to the coordinate axes. The MBR of a point is the point itself.
By traversing the R-tree from top to bottom, it is possible to very quickly search an R-tree-indexed table for only those rows where the indexed geometry column intersects a specific region of interest, as you can skip searching entire sub-trees whose parent nodes' bounding rectangles don't intersect the query region at all. Once the leaf nodes are reached, only the specific rows whose geometries intersect the query region have to be fetched from disk, and the often much more expensive exact spatial predicate check (and any other filters) only have to be executed for these rows.
#### What Are the Limitations of R-Tree Indexes in DuckDB? {#docs:stable:core_extensions:spatial:r-tree_indexes::what-are-the-limitations-of-r-tree-indexes-in-duckdb}
Before you get started using the R-tree index, there are some limitations to be aware of:
- The R-tree index is only supported for the `GEOMETRY` data type.
- The R-tree index will only be used to perform "index scans" when the table is filtered (using a `WHERE` clause) with one of the following spatial predicate functions (as they all imply intersection): `ST_Equals`, `ST_Intersects`, `ST_Touches`, `ST_Crosses`, `ST_Within`, `ST_Contains`, `ST_Overlaps`, `ST_Covers`, `ST_CoveredBy`, `ST_ContainsProperly`.
- One of the arguments to the spatial predicate function must be a "constant" (i.e., an expression whose result is known at query planning time). This is because the query planner needs to know the bounding box of the query region _before_ the query itself is executed in order to use the R-tree index scan.
In the future, we want to enable R-tree indexes to be used to accelerate additional predicate functions and more complex queries such as spatial joins.
#### How to Use R-Tree Indexes in DuckDB {#docs:stable:core_extensions:spatial:r-tree_indexes::how-to-use-r-tree-indexes-in-duckdb}
To create an R-tree index, simply use the `CREATE INDEX` statement with the `USING RTREE` clause, passing the geometry column to index within the parentheses. For example:
```sql
-- Create a table with a geometry column
CREATE TABLE my_table (geom GEOMETRY);
-- Create an R-tree index on the geometry column
CREATE INDEX my_idx ON my_table USING RTREE (geom);
```
You can also pass in additional options when creating an R-tree index using the `WITH` clause to control the behavior of the R-tree index. For example, to specify the maximum number of entries per node in the R-tree, you can use the `max_node_capacity` option:
```sql
CREATE INDEX my_idx ON my_table USING RTREE (geom) WITH (max_node_capacity = 16);
```
The performance impact of tweaking these options is highly dependent on the system DuckDB is running on, the spatial distribution of the dataset, and the query patterns of your specific workload. The defaults should be good enough for most use cases, but if you want to experiment with different parameters, see the [full list of options here](#::options).
#### Example {#docs:stable:core_extensions:spatial:r-tree_indexes::example}
Here is an example that shows how to create an R-tree index on a geometry column and where we can see that the `RTREE_INDEX_SCAN` operator is used when the table is filtered with a spatial predicate:
```sql
INSTALL spatial;
LOAD spatial;
-- Create a table with 10_000 random points
CREATE TABLE t1 AS SELECT point::GEOMETRY AS geom
FROM st_generatepoints({min_x: 0, min_y: 0, max_x: 100, max_y: 100}::BOX_2D, 10_000, 1337);
-- Create an index on the table.
CREATE INDEX my_idx ON t1 USING RTREE (geom);
-- Perform a query with a "spatial predicate" on the indexed geometry column
-- Note how the second argument, the ST_MakeEnvelope call, is a "constant" in this case
SELECT count(*) FROM t1 WHERE ST_Within(geom, ST_MakeEnvelope(45, 45, 65, 65));
```
```text
390
```
We can check for ourselves that an R-tree index scan is used by using the `EXPLAIN` statement:
```sql
EXPLAIN SELECT count(*) FROM t1 WHERE ST_Within(geom, ST_MakeEnvelope(45, 45, 65, 65));
```
```text
┌─────────────────────────────┐
│     UNGROUPED_AGGREGATE     │
│    ────────────────────     │
│         Aggregates:         │
│        count_star()         │
└──────────────┬──────────────┘
┌──────────────┴──────────────┐
│           FILTER            │
│    ────────────────────     │
│   ST_Within(geom, '...')    │
│                             │
│         ~2000 Rows          │
└──────────────┬──────────────┘
┌──────────────┴──────────────┐
│      RTREE_INDEX_SCAN       │
│    ────────────────────     │
│   t1 (RTREE INDEX SCAN :    │
│           my_idx)           │
│                             │
│      Projections: geom      │
│                             │
│        ~10000 Rows          │
└─────────────────────────────┘
```
#### Performance Considerations {#docs:stable:core_extensions:spatial:r-tree_indexes::performance-considerations}
##### Bulk Loading & Maintenance {#docs:stable:core_extensions:spatial:r-tree_indexes::bulk-loading--maintenance}
Creating an R-tree on top of an already populated table is much faster than first creating the index and then inserting the data. This is because the R-tree periodically has to rebalance itself and perform a somewhat costly splitting operation when a node reaches max capacity after an insert, potentially causing additional splits to cascade up the tree. When the R-tree index is created on an already populated table, however, a special bottom-up "bulk loading algorithm" (Sort-Tile-Recursive) is used, which divides all entries into an already balanced tree, as the total number of required nodes can be computed from the beginning.
Additionally, using the bulk loading algorithm tends to create an R-tree with a better structure (less overlap between bounding boxes), which usually leads to better query performance. If you find that the performance of querying the R-tree starts to deteriorate after a large number of updates or deletions, dropping and re-creating the index might produce a higher-quality R-tree.
##### Memory Usage {#docs:stable:core_extensions:spatial:r-tree_indexes::memory-usage}
Like DuckDB's built-in ART index, all the associated buffers containing the R-tree will be lazily loaded from disk (when running DuckDB in disk-backed mode), but they are currently never unloaded unless the index is dropped. This means that if you end up scanning the entire index, the entire index will be loaded into memory and stay there for the duration of the database connection. However, all memory used by the R-tree index (even during bulk loading) is tracked by DuckDB and counts towards the memory limit set by the `memory_limit` configuration parameter.
##### Tuning {#docs:stable:core_extensions:spatial:r-tree_indexes::tuning}
Depending on your specific workload, you might want to experiment with the `max_node_capacity` and `min_node_capacity` options to change the structure of the R-tree and how it responds to insertions and deletions; see the [full list of options here](#::options). In general, a tree with a higher total number of nodes (i.e., a lower `max_node_capacity`) _may_ result in a more granular structure that enables more aggressive pruning of sub-trees during query execution, but it will also require more memory to store the tree itself and be more punishing when querying larger regions, as more internal nodes will have to be traversed.
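For example, a sketch of creating an index with tuned node capacities (the values here are arbitrary):
```sql
CREATE INDEX my_tuned_idx ON my_table USING RTREE (geom)
WITH (max_node_capacity = 64, min_node_capacity = 16);
```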
#### Options {#docs:stable:core_extensions:spatial:r-tree_indexes::options}
The following options can be passed to the `WITH` clause when creating an R-tree index (e.g., `CREATE INDEX my_idx ON my_table USING RTREE (geom) WITH (⟨option⟩ = ⟨value⟩);`{:.language-sql .highlight}):
| Option | Description | Default |
|---------------------|------------------------------------------------------|---------------------------|
| `max_node_capacity` | The maximum number of entries per node in the R-tree | `128` |
| `min_node_capacity` | The minimum number of entries per node in the R-tree | `0.4 * max_node_capacity` |
\* Should a node fall below the minimum number of entries after a deletion, the node will be dissolved and all the entries reinserted from the top of the tree. This is a common operation in R-tree implementations to prevent the tree from becoming too unbalanced.
#### R-Tree Table Functions {#docs:stable:core_extensions:spatial:r-tree_indexes::r-tree-table-functions}
The `rtree_index_dump(VARCHAR)` table function can be used to return all the nodes within an R-tree index, which might come in handy when debugging, profiling, or otherwise inspecting the structure of the index. The function takes the name of the R-tree index as an argument and returns a table with the following columns:
| Column name | Type | Description |
|-------------|------------|-------------------------------------------------------------------------------|
| `level` | `INTEGER` | The level of the node in the R-tree. The root node has level 0 |
| `bounds` | `BOX_2DF` | The bounding box of the node |
| `row_id` | `ROW_TYPE` | If this is a leaf node, the `rowid` of the row in the table, otherwise `NULL` |
Example:
```sql
-- Create a table with 64 random points
CREATE TABLE t1 AS SELECT point::GEOMETRY AS geom
FROM st_generatepoints({min_x: 0, min_y: 0, max_x: 100, max_y: 100}::BOX_2D, 64, 1337);
-- Create an R-tree index on the geometry column (with a low max_node_capacity for demonstration purposes)
CREATE INDEX my_idx ON t1 USING RTREE (geom) WITH (max_node_capacity = 4);
-- Inspect the R-tree index. Notice how the area of the bounding boxes of the branch nodes
-- decreases as we go deeper into the tree.
SELECT
level,
bounds::GEOMETRY AS geom,
CASE WHEN row_id IS NULL THEN st_area(geom) ELSE NULL END AS area,
row_id,
CASE WHEN row_id IS NULL THEN 'branch' ELSE 'leaf' END AS kind
FROM rtree_index_dump('my_idx')
ORDER BY area DESC;
```
```text
┌───────┬──────────────────────────────┬────────────────────┬────────┬─────────┐
│ level │             geom             │        area        │ row_id │  kind   │
│ int32 │           geometry           │       double       │ int64  │ varchar │
├───────┼──────────────────────────────┼────────────────────┼────────┼─────────┤
│     0 │ POLYGON ((2.17285037040710…  │  3286.396482226409 │        │ branch  │
│     0 │ POLYGON ((6.00962591171264…  │  3193.725100864862 │        │ branch  │
│     0 │ POLYGON ((0.74995160102844…  │  3099.921458393704 │        │ branch  │
│     0 │ POLYGON ((14.6168870925903…  │ 2322.2760491675654 │        │ branch  │
│     1 │ POLYGON ((2.17285037040710…  │  604.1520104388514 │        │ branch  │
│     1 │ POLYGON ((26.6022186279296…  │  569.1665467030252 │        │ branch  │
│     1 │ POLYGON ((35.7942314147949…  │ 435.24662436250037 │        │ branch  │
│     1 │ POLYGON ((62.2643051147460…  │ 396.39027683023596 │        │ branch  │
│     1 │ POLYGON ((59.5225715637207…  │ 386.09153403820187 │        │ branch  │
│     1 │ POLYGON ((82.3060836791992…  │ 369.15115640929434 │        │ branch  │
│     · │              ·               │          ·         │      · │    ·    │
│     · │              ·               │          ·         │      · │    ·    │
│     · │              ·               │          ·         │      · │    ·    │
│     2 │ POLYGON ((20.5411434173584…  │                    │     35 │ leaf    │
│     2 │ POLYGON ((14.6168870925903…  │                    │     36 │ leaf    │
│     2 │ POLYGON ((43.7271652221679…  │                    │     39 │ leaf    │
│     2 │ POLYGON ((53.4629211425781…  │                    │     44 │ leaf    │
│     2 │ POLYGON ((26.6022186279296…  │                    │     62 │ leaf    │
│     2 │ POLYGON ((53.1732063293457…  │                    │     63 │ leaf    │
│     2 │ POLYGON ((78.1427154541015…  │                    │     10 │ leaf    │
│     2 │ POLYGON ((75.1728591918945…  │                    │     15 │ leaf    │
│     2 │ POLYGON ((62.2643051147460…  │                    │     42 │ leaf    │
│     2 │ POLYGON ((80.5032577514648…  │                    │     49 │ leaf    │
├───────┴──────────────────────────────┴────────────────────┴────────┴─────────┤
│ 84 rows (20 shown)                                                 5 columns │
└───────────────────────────────────────────────────────────────────────────────┘
```
### GDAL Integration {#docs:stable:core_extensions:spatial:gdal}
The spatial extension integrates the [GDAL](https://gdal.org/en/latest/) translator library to read and write spatial data from a variety of geospatial vector file formats. See the documentation for the [`st_read` table function](#docs:stable:core_extensions:spatial:functions::st_read) for how to make use of this in practice.
In order to spare users from having to set up and install additional dependencies on their system, the spatial extension bundles its own copy of the GDAL library. This also means that spatial's version of GDAL may not be the latest version available or provide support for all of the file formats that a system-wide GDAL installation otherwise would. Refer to the section on the [`st_drivers` table function](#docs:stable:core_extensions:spatial:functions::st_drivers) to inspect which GDAL drivers are currently available.
#### GDAL Based `COPY` Function {#docs:stable:core_extensions:spatial:gdal::gdal-based-copy-function}
The spatial extension not only enables _importing_ geospatial file formats (through the `ST_Read` function), it also enables _exporting_ DuckDB tables to different geospatial vector formats through a GDAL-based `COPY` function.
For example, to export a table to a GeoJSON file, with generated bounding boxes, you can use the following query:
```sql
COPY ⟨table⟩ TO 'some/file/path/filename.geojson'
WITH (FORMAT gdal, DRIVER 'GeoJSON', LAYER_CREATION_OPTIONS 'WRITE_BBOX=YES', SRS 'EPSG:4326');
```
Available options:
* `FORMAT`: The only required option; it must be set to `GDAL` to use the GDAL-based copy function.
* `DRIVER`: The GDAL driver to use for the export. Use `ST_Drivers()` to list the names of all available drivers.
* `LAYER_CREATION_OPTIONS`: A list of options to pass to the GDAL driver. See the GDAL docs for the driver you are using for a list of available options.
* `SRS`: Sets a spatial reference system as metadata to use for the export. This can be a WKT string, an EPSG code, or a proj-string; basically anything you would normally be able to pass to GDAL. Note that this will **not** perform any reprojection of the input geometry, it just sets the metadata if the target driver supports it.
#### Limitations {#docs:stable:core_extensions:spatial:gdal::limitations}
Note that only vector based drivers are supported by the GDAL integration. Reading and writing raster formats are not supported.
## SQLite Extension {#docs:stable:core_extensions:sqlite}
The SQLite extension allows DuckDB to directly read and write data from a SQLite database file. The data can be queried directly from the underlying SQLite tables. Data can be loaded from SQLite tables into DuckDB tables, or vice versa.
#### Installing and Loading {#docs:stable:core_extensions:sqlite::installing-and-loading}
The `sqlite` extension will be transparently [autoloaded](#docs:stable:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
```sql
INSTALL sqlite;
LOAD sqlite;
```
#### Usage {#docs:stable:core_extensions:sqlite::usage}
To make a SQLite file accessible to DuckDB, use the `ATTACH` statement with the `sqlite` or `sqlite_scanner` type. Attached SQLite databases support both read and write operations.
For example, to attach to the [`sakila.db` file](https://github.com/duckdb/sqlite_scanner/raw/main/data/db/sakila.db), run:
```sql
ATTACH 'sakila.db' (TYPE sqlite);
USE sakila;
```
The tables in the file can be read as if they were normal DuckDB tables, but the underlying data is read directly from the SQLite tables in the file at query time.
```sql
SHOW TABLES;
```
| name |
|------------------------|
| actor |
| address |
| category |
| city |
| country |
| customer |
| customer_list |
| film |
| film_actor |
| film_category |
| film_list |
| film_text |
| inventory |
| language |
| payment |
| rental |
| sales_by_film_category |
| sales_by_store |
| staff |
| staff_list |
| store |
You can query the tables using SQL, e.g., using the example queries from [`sakila-examples.sql`](https://github.com/duckdb/sqlite_scanner/blob/main/data/sql/sakila-examples.sql):
```sql
SELECT
cat.name AS category_name,
sum(ifnull(pay.amount, 0)) AS revenue
FROM category cat
LEFT JOIN film_category flm_cat
ON cat.category_id = flm_cat.category_id
LEFT JOIN film fil
ON flm_cat.film_id = fil.film_id
LEFT JOIN inventory inv
ON fil.film_id = inv.film_id
LEFT JOIN rental ren
ON inv.inventory_id = ren.inventory_id
LEFT JOIN payment pay
ON ren.rental_id = pay.rental_id
GROUP BY cat.name
ORDER BY revenue DESC
LIMIT 5;
```
#### Data Types {#docs:stable:core_extensions:sqlite::data-types}
SQLite is a [weakly typed database system](https://www.sqlite.org/datatype3.html). As such, when storing data in a SQLite table, types are not enforced. The following is valid SQL in SQLite:
```sql
CREATE TABLE numbers (i INTEGER);
INSERT INTO numbers VALUES ('hello');
```
DuckDB is a strongly typed database system; as such, it requires all columns to have defined types, and the system rigorously checks data for correctness.
When querying SQLite, DuckDB must deduce a specific column type mapping. DuckDB follows SQLite's [type affinity rules](https://www.sqlite.org/datatype3.html#type_affinity) with a few extensions.
1. If the declared type contains the string `INT`, then it is translated into the type `BIGINT`.
2. If the declared type of the column contains any of the strings `CHAR`, `CLOB`, or `TEXT` then it is translated into `VARCHAR`.
3. If the declared type for a column contains the string `BLOB` or if no type is specified then it is translated into `BLOB`.
4. If the declared type for a column contains any of the strings `REAL`, `FLOA`, `DOUB`, `DEC` or `NUM` then it is translated into `DOUBLE`.
5. If the declared type is `DATE`, then it is translated into `DATE`.
6. If the declared type contains the string `TIME`, then it is translated into `TIMESTAMP`.
7. If none of the above apply, then it is translated into `VARCHAR`.
As DuckDB enforces the corresponding columns to contain only correctly typed values, we cannot load the string 'hello' into a column of type `BIGINT`. As such, an error is thrown when reading from the `numbers` table above:
```console
Mismatch Type Error: Invalid type in column "i": column was declared as integer, found "hello" of type "text" instead.
```
This error can be avoided by setting the `sqlite_all_varchar` option:
```sql
SET GLOBAL sqlite_all_varchar = true;
```
When set, this option overrides the type conversion rules described above and always converts the SQLite columns into `VARCHAR` columns. Note that this setting must be applied *before* the SQLite database is attached.
#### Opening SQLite Databases Directly {#docs:stable:core_extensions:sqlite::opening-sqlite-databases-directly}
SQLite databases can also be opened directly and can be used transparently instead of a DuckDB database file. In any client, when connecting, a path to a SQLite database file can be provided and the SQLite database will be opened instead.
For example, with the shell, a SQLite database can be opened as follows:
```batch
duckdb sakila.db
```
```sql
SELECT first_name
FROM actor
LIMIT 3;
```
| first_name |
|------------|
| PENELOPE |
| NICK |
| ED |
#### Writing Data to SQLite {#docs:stable:core_extensions:sqlite::writing-data-to-sqlite}
In addition to reading data from SQLite, the extension also allows you to create new SQLite database files, create tables, ingest data into SQLite and make other modifications to SQLite database files using standard SQL queries.
This allows you to use DuckDB to, for example, export data that is stored in a SQLite database to Parquet, or read data from a Parquet file into SQLite.
Below is a brief example of how to create a new SQLite database and load data into it.
```sql
ATTACH 'new_sqlite_database.db' AS sqlite_db (TYPE sqlite);
CREATE TABLE sqlite_db.tbl (id INTEGER, name VARCHAR);
INSERT INTO sqlite_db.tbl VALUES (42, 'DuckDB');
```
The resulting SQLite database can then be read directly with SQLite.
```batch
sqlite3 new_sqlite_database.db
```
```sql
SQLite version 3.39.5 2022-10-14 20:58:05
sqlite> SELECT * FROM tbl;
```
```text
id name
-- ------
42 DuckDB
```
Many operations on SQLite tables are supported. All these operations directly modify the SQLite database, and the result of subsequent operations can then be read using SQLite.
#### Concurrency {#docs:stable:core_extensions:sqlite::concurrency}
DuckDB can read or modify a SQLite database while DuckDB or SQLite reads or modifies the same database from a different thread or a separate process. More than one thread or process can read the SQLite database at the same time, but only a single thread or process can write to the database at one time. Database locking is handled by the SQLite library, not DuckDB. Within the same process, SQLite uses mutexes. When accessed from different processes, SQLite uses file system locks. The locking mechanisms also depend on SQLite configuration, like WAL mode. Refer to the [SQLite documentation on locking](https://www.sqlite.org/lockingv3.html) for more information.
> **Warning.** Linking multiple copies of the SQLite library into the same application can lead to application errors. See [sqlite_scanner Issue #82](https://github.com/duckdb/sqlite_scanner/issues/82) for more information.
#### Settings {#docs:stable:core_extensions:sqlite::settings}
The extension exposes the following configuration parameters.
| Name | Description | Default |
| --------------------------------- | ---------------------------------------------------------------------------- | ------- |
| `sqlite_debug_show_queries` | DEBUG SETTING: print all queries sent to SQLite to stdout | `false` |
#### Supported Operations {#docs:stable:core_extensions:sqlite::supported-operations}
Below is a list of supported operations.
##### `CREATE TABLE` {#docs:stable:core_extensions:sqlite::create-table}
```sql
CREATE TABLE sqlite_db.tbl (id INTEGER, name VARCHAR);
```
##### `INSERT INTO` {#docs:stable:core_extensions:sqlite::insert-into}
```sql
INSERT INTO sqlite_db.tbl VALUES (42, 'DuckDB');
```
##### `SELECT` {#docs:stable:core_extensions:sqlite::select}
```sql
SELECT * FROM sqlite_db.tbl;
```
| id | name |
|---:|--------|
| 42 | DuckDB |
##### `COPY` {#docs:stable:core_extensions:sqlite::copy}
```sql
COPY sqlite_db.tbl TO 'data.parquet';
COPY sqlite_db.tbl FROM 'data.parquet';
```
##### `UPDATE` {#docs:stable:core_extensions:sqlite::update}
```sql
UPDATE sqlite_db.tbl SET name = 'Woohoo' WHERE id = 42;
```
##### `DELETE` {#docs:stable:core_extensions:sqlite::delete}
```sql
DELETE FROM sqlite_db.tbl WHERE id = 42;
```
##### `ALTER TABLE` {#docs:stable:core_extensions:sqlite::alter-table}
```sql
ALTER TABLE sqlite_db.tbl ADD COLUMN k INTEGER;
```
##### `DROP TABLE` {#docs:stable:core_extensions:sqlite::drop-table}
```sql
DROP TABLE sqlite_db.tbl;
```
##### `CREATE VIEW` {#docs:stable:core_extensions:sqlite::create-view}
```sql
CREATE VIEW sqlite_db.v1 AS SELECT 42;
```
##### Transactions {#docs:stable:core_extensions:sqlite::transactions}
```sql
CREATE TABLE sqlite_db.tmp (i INTEGER);
```
```sql
BEGIN;
INSERT INTO sqlite_db.tmp VALUES (42);
SELECT * FROM sqlite_db.tmp;
```
| i |
|---:|
| 42 |
```sql
ROLLBACK;
SELECT * FROM sqlite_db.tmp;
```
| i |
|--:|
| |
> **Deprecated.** The old `sqlite_attach` function is deprecated. It is recommended to switch over to the new [`ATTACH` syntax](#docs:stable:sql:statements:attach).
## TPC-DS Extension {#docs:stable:core_extensions:tpcds}
The `tpcds` extension implements the data generator and queries for the [TPC-DS benchmark](https://www.tpc.org/tpcds/).
#### Installing and Loading {#docs:stable:core_extensions:tpcds::installing-and-loading}
The `tpcds` extension will be transparently [autoloaded](#docs:stable:extensions:overview::autoloading-extensions) on first use from the official extension repository.
If you would like to install and load it manually, run:
```sql
INSTALL tpcds;
LOAD tpcds;
```
#### Usage {#docs:stable:core_extensions:tpcds::usage}
To generate data for scale factor 1, use:
```sql
CALL dsdgen(sf = 1);
```
To run a query, e.g., query 8, use:
```sql
PRAGMA tpcds(8);
```
| s_store_name | sum(ss_net_profit) |
|--------------|-------------------:|
| able | -10354620.18 |
| ation | -10576395.52 |
| bar | -10625236.01 |
| ese | -10076698.16 |
| ought | -10994052.78 |
#### Generating the Schema {#docs:stable:core_extensions:tpcds::generating-the-schema}
It's possible to generate the schema of TPC-DS without any data by setting the scale factor to 0:
```sql
CALL dsdgen(sf = 0);
```
#### Limitations {#docs:stable:core_extensions:tpcds::limitations}
The `tpcds(⟨query_id⟩)`{:.language-sql .highlight} function runs a fixed TPC-DS query with pre-defined bind parameters (a.k.a. substitution parameters).
It is not possible to change the query parameters using the `tpcds` extension.
## TPC-H Extension {#docs:stable:core_extensions:tpch}
The `tpch` extension implements the data generator and queries for the [TPC-H benchmark](https://www.tpc.org/tpch/).
#### Installing and Loading {#docs:stable:core_extensions:tpch::installing-and-loading}
The `tpch` extension is shipped by default in some DuckDB builds, otherwise it will be transparently [autoloaded](#docs:stable:extensions:overview::autoloading-extensions) on first use.
If you would like to install and load it manually, run:
```sql
INSTALL tpch;
LOAD tpch;
```
#### Usage {#docs:stable:core_extensions:tpch::usage}
##### Generating Data {#docs:stable:core_extensions:tpch::generating-data}
To generate data for scale factor 1, use:
```sql
CALL dbgen(sf = 1);
```
Calling `dbgen` does not clean up existing TPC-H tables.
To clean up existing tables, use `DROP TABLE` before running `dbgen`:
```sql
DROP TABLE IF EXISTS customer;
DROP TABLE IF EXISTS lineitem;
DROP TABLE IF EXISTS nation;
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS part;
DROP TABLE IF EXISTS partsupp;
DROP TABLE IF EXISTS region;
DROP TABLE IF EXISTS supplier;
```
##### Running a Query {#docs:stable:core_extensions:tpch::running-a-query}
To run a query, e.g., query 4, use:
```sql
PRAGMA tpch(4);
```
| o_orderpriority | order_count |
| --------------- | ----------: |
| 1-URGENT | 10594 |
| 2-HIGH | 10476 |
| 3-MEDIUM | 10410 |
| 4-NOT SPECIFIED | 10556 |
| 5-LOW | 10487 |
##### Listing Queries {#docs:stable:core_extensions:tpch::listing-queries}
To list all 22 queries, run:
```sql
FROM tpch_queries();
```
This function returns a table with columns `query_nr` and `query`.
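For example, to retrieve the text of a single query (here, query 4), you can filter this table:
```sql
SELECT query
FROM tpch_queries()
WHERE query_nr = 4;
```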
##### Listing Expected Answers {#docs:stable:core_extensions:tpch::listing-expected-answers}
To produce the expected results for all queries on scale factors 0.01, 0.1, and 1, run:
```sql
FROM tpch_answers();
```
This function returns a table with columns `query_nr`, `scale_factor`, and `answer`.
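For example, to look up the expected answer of query 4 at scale factor 1:
```sql
SELECT answer
FROM tpch_answers()
WHERE query_nr = 4 AND scale_factor = 1;
```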
#### Generating the Schema {#docs:stable:core_extensions:tpch::generating-the-schema}
It's possible to generate the schema of TPC-H without any data by setting the scale factor to 0:
```sql
CALL dbgen(sf = 0);
```
#### Data Generator Parameters {#docs:stable:core_extensions:tpch::data-generator-parameters}
The data generator function `dbgen` has the following parameters:
| Name | Type | Description |
| ----------- | ---------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `catalog` | `VARCHAR` | Target catalog |
| `children` | `UINTEGER` | Number of partitions |
| `overwrite` | `BOOLEAN` | (Not used) |
| `sf` | `DOUBLE` | Scale factor |
| `step` | `UINTEGER` | Defines the partition to be generated, indexed from 0 to `children` - 1. Must be defined when the `children` argument is defined |
| `suffix` | `VARCHAR` | Append the `suffix` to table names |
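For example, a sketch combining the parameters above, generating scale factor 1 into tables with a suffix appended to their names (the suffix value is arbitrary):
```sql
-- Produces tables such as lineitem_sf1, orders_sf1, etc.
CALL dbgen(sf = 1, suffix = '_sf1');
```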
#### Pre-Generated Datasets {#docs:stable:core_extensions:tpch::pre-generated-datasets}
Pre-generated DuckDB databases for TPC-H are available for download:
* [`tpch-sf1.db`](https://blobs.duckdb.org/data/tpch-sf1.db) (250 MB)
* [`tpch-sf3.db`](https://blobs.duckdb.org/data/tpch-sf3.db) (754 MB)
* [`tpch-sf10.db`](https://blobs.duckdb.org/data/tpch-sf10.db) (2.5 GB)
* [`tpch-sf30.db`](https://blobs.duckdb.org/data/tpch-sf30.db) (7.6 GB)
* [`tpch-sf100.db`](https://blobs.duckdb.org/data/tpch-sf100.db) (26 GB)
* [`tpch-sf300.db`](https://blobs.duckdb.org/data/tpch-sf300.db) (78 GB)
* [`tpch-sf1000.db`](https://blobs.duckdb.org/data/tpch-sf1000.db) (265 GB)
* [`tpch-sf3000.db`](https://blobs.duckdb.org/data/tpch-sf3000.db) (796 GB)
#### Resource Usage of the Data Generator {#docs:stable:core_extensions:tpch::resource-usage-of-the-data-generator}
Generating TPC-H datasets for large scale factors takes a significant amount of time.
Additionally, _if the generation is performed in a single step,_ it requires a large amount of memory.
The following table gives an estimate on the resources required to produce DuckDB database files containing the generated TPC-H dataset using 128 threads.
| Scale factor | Database size | Generation time | Single-step generation's memory usage |
| -----------: | ------------: | --------------: | ------------------------------------: |
| 100 | 26 GB | 17 minutes | 71 GB |
| 300 | 78 GB | 51 minutes | 211 GB |
| 1,000 | 265 GB | 2 h 53 minutes | 647 GB |
| 3,000 | 796 GB | 8 h 30 minutes | 1799 GB |
The numbers shown above were achieved by running the `dbgen` function in a single step, for example:
```sql
CALL dbgen(sf = 300);
```
If you have a limited amount of memory available, you can run the `dbgen` function in steps.
For example, you may generate SF300 in 10 steps:
```sql
CALL dbgen(sf = 300, children = 10, step = 0);
CALL dbgen(sf = 300, children = 10, step = 1);
...
CALL dbgen(sf = 300, children = 10, step = 9);
```
#### Limitation {#docs:stable:core_extensions:tpch::limitation}
The `tpch(⟨query_id⟩)`{:.language-sql .highlight} function runs a fixed TPC-H query with pre-defined bind parameters (a.k.a. substitution parameters). It is not possible to change the query parameters using the `tpch` extension. To run the queries with the parameters prescribed by the TPC-H benchmark, use a TPC-H framework implementation.
## UI Extension {#docs:stable:core_extensions:ui}
The `ui` extension adds a user interface for your local DuckDB instance.
The UI is built and maintained by [MotherDuck](https://motherduck.com/).
An overview of its features can be found
in the [MotherDuck documentation](https://motherduck.com/docs/getting-started/motherduck-quick-tour/).
#### Requirements {#docs:stable:core_extensions:ui::requirements}
* An environment with a browser.
* Any DuckDB client except Wasm, v1.2.1 or later.
#### Usage {#docs:stable:core_extensions:ui::usage}
To start the UI from the command line:
```batch
duckdb -ui
```
To start the UI from SQL:
```sql
CALL start_ui();
```
Running either of these will open the UI in your default browser.
The UI connects to the DuckDB instance it was started from,
so any data you've already loaded will be available.
Since this instance is a native process (not Wasm), it can leverage all
the resources of your local environment: all cores, memory, and files.
Closing this instance will cause the UI to stop working.
The UI is served from an HTTP server embedded in DuckDB.
To start this server without launching the browser, run:
```sql
CALL start_ui_server();
```
You can then load the UI in your browser by navigating to
`http://localhost:4213`.
To stop the HTTP server, run:
```sql
CALL stop_ui_server();
```
#### Local Query Execution {#docs:stable:core_extensions:ui::local-query-execution}
By default, the DuckDB UI runs your queries fully locally: your queries and data never leave your computer.
If you would like to use [MotherDuck](https://motherduck.com/) through the UI, you have to opt-in explicitly and sign into MotherDuck.
#### Configuration {#docs:stable:core_extensions:ui::configuration}
##### Local Port {#docs:stable:core_extensions:ui::local-port}
The local port of the HTTP server can be configured with a SQL command like:
```sql
SET ui_local_port = 4213;
```
The environment variable `ui_local_port` can also be used.
The default port is 4213. (Why? 4 = D, 21 = U, 3 = C)
##### Remote URL {#docs:stable:core_extensions:ui::remote-url}
The local HTTP server fetches the files for the UI from a remote HTTP
server so they can be kept up-to-date.
The default URL for the remote server is `https://ui.duckdb.org`.
An alternate remote URL can be configured with a SQL command like:
```sql
SET ui_remote_url = 'https://ui.duckdb.org';
```
The environment variable `ui_remote_url` can also be used.
This setting is available mainly for testing purposes.
Be sure you trust any URL you configure, as the application can access
the data you load into DuckDB.
Because of this risk, the setting is only respected
if `allow_unsigned_extensions` is enabled.
##### Polling Interval {#docs:stable:core_extensions:ui::polling-interval}
The UI extension polls for some information on a background thread.
It watches for changes to the list of attached databases,
and it detects when you connect to MotherDuck.
These checks take very little time to complete, so the default polling
interval is short (284 milliseconds).
You can configure it with a SQL command like:
```sql
SET ui_polling_interval = 284;
```
The environment variable `ui_polling_interval` can also be used.
Setting the polling interval to 0 will disable polling entirely.
This is not recommended, as the list of databases in the UI could get
out of date, and some ways of connecting to MotherDuck will not work
properly.
#### Tips {#docs:stable:core_extensions:ui::tips}
##### Opening a CSV File with the DuckDB UI {#docs:stable:core_extensions:ui::opening-a-csv-file-with-the-duckdb-ui}
Using the [DuckDB CLI client](#docs:stable:clients:cli:overview),
you can start the UI with a CSV available as a view using the [`-cmd` argument](#docs:stable:clients:cli:arguments):
```batch
duckdb -cmd "CREATE VIEW â¨view_nameâ© AS FROM 'â¨filenameâ©.csv';" -ui
```
##### Running the UI in Read-Only Mode {#docs:stable:core_extensions:ui::running-the-ui-in-read-only-mode}
The DuckDB UI uses DuckDB tables as storage internally (e.g., for saving notebooks).
Therefore, running the UI directly on a read-only database [is not supported](https://github.com/duckdb/duckdb-ui/issues/61):
```batch
duckdb -ui -readonly read_only_test.db
```
In the UI, this results in:
```console
Catalog Error: SET schema: No catalog + schema named "memory.main" found.
```
To work around this, run the UI on another database file:
```batch
duckdb -ui ui_catalog.db
```
Then, open a notebook and attach to the database:
```sql
ATTACH 'test.db' (READ_ONLY) AS my_db;
USE my_db;
```
#### Limitations {#docs:stable:core_extensions:ui::limitations}
* The UI currently does not support the ARM-based Windows platforms (`windows_arm64` and `windows_arm64_mingw`).
## Vector Similarity Search Extension {#docs:stable:core_extensions:vss}
The `vss` extension is an experimental extension for DuckDB that adds indexing support to accelerate vector similarity search queries using DuckDB's new fixed-size `ARRAY` type.
See the [announcement blog post](https://duckdb.org/2024/05/03/vector-similarity-search-vss) and the ["What's New in the Vector Similarity Search Extension?" post](https://duckdb.org/2024/10/23/whats-new-in-the-vss-extension).
#### Usage {#docs:stable:core_extensions:vss::usage}
To create a new HNSW (Hierarchical Navigable Small Worlds) index on a table with an `ARRAY` column, use the `CREATE INDEX` statement with the `USING HNSW` clause. For example:
```sql
INSTALL vss;
LOAD vss;
CREATE TABLE my_vector_table (vec FLOAT[3]);
INSERT INTO my_vector_table
SELECT array_value(a, b, c)
FROM range(1, 10) ra(a), range(1, 10) rb(b), range(1, 10) rc(c);
CREATE INDEX my_hnsw_index ON my_vector_table USING HNSW (vec);
```
The index will then be used to accelerate queries that use an `ORDER BY` clause evaluating one of the supported distance metric functions against the indexed columns and a constant vector, followed by a `LIMIT` clause. For example:
```sql
SELECT *
FROM my_vector_table
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
```
Additionally, the overloaded `min_by(col, arg, n)` can also be accelerated with the `HNSW` index if the `arg` argument is a matching distance metric function. This can be used to do quick one-shot nearest neighbor searches. For example, to get the top 3 rows with the closest vectors to `[1, 2, 3]`:
```sql
SELECT min_by(my_vector_table, array_distance(vec, [1, 2, 3]::FLOAT[3]), 3 ORDER BY vec) AS result
FROM my_vector_table;
```
```text
[{'vec': [1.0, 2.0, 3.0]}, {'vec': [2.0, 2.0, 3.0]}, {'vec': [1.0, 2.0, 4.0]}]
```
Note how we pass the table name as the first argument to [`min_by`](#docs:stable:sql:functions:aggregates::min_byarg-val-n) to return a struct containing the entire matched row.
We can verify that the index is being used by checking the `EXPLAIN` output and looking for the `HNSW_INDEX_SCAN` node in the plan:
```sql
EXPLAIN
SELECT *
FROM my_vector_table
ORDER BY array_distance(vec, [1, 2, 3]::FLOAT[3])
LIMIT 3;
```
```text
┌─────────────────────────────┐
│         PROJECTION          │
│    ────────────────────     │
│             #0              │
└──────────────┬──────────────┘
┌──────────────┴──────────────┐
│         PROJECTION          │
│    ────────────────────     │
│             vec             │
│ array_distance(vec, [1.0, 2 │
│          .0, 3.0])          │
└──────────────┬──────────────┘
┌──────────────┴──────────────┐
│       HNSW_INDEX_SCAN       │
│    ────────────────────     │
│    t1 (HNSW INDEX SCAN :    │
│           my_idx)           │
│    ────────────────────     │
│             vec             │
│    ────────────────────     │
│            EC: 3            │
└─────────────────────────────┘
```
By default, the HNSW index will be created using the Euclidean distance `l2sq` (L2-norm squared) metric, matching DuckDB's `array_distance` function, but other distance metrics can be used by specifying the `metric` option during index creation. For example:
```sql
CREATE INDEX my_hnsw_cosine_index
ON my_vector_table
USING HNSW (vec)
WITH (metric = 'cosine');
```
The following table shows the supported distance metrics and their corresponding DuckDB functions:
| Metric | Function | Description |
|----------|--------------------------------|----------------------------|
| `l2sq` | `array_distance` | Euclidean distance |
| `cosine` | `array_cosine_distance` | Cosine similarity distance |
| `ip` | `array_negative_inner_product` | Negative inner product |
Note that while each `HNSW` index only applies to a single column, you can create multiple `HNSW` indexes on the same table, each individually indexing a different column. Additionally, you can also create multiple `HNSW` indexes on the same column, each supporting a different distance metric.
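For example, a sketch of creating two indexes on the same column, each with a different metric (the index names are arbitrary):
```sql
CREATE INDEX vec_l2_idx ON my_vector_table USING HNSW (vec);
CREATE INDEX vec_cosine_idx ON my_vector_table USING HNSW (vec) WITH (metric = 'cosine');
```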
#### Index Options {#docs:stable:core_extensions:vss::index-options}
Besides the `metric` option, the `HNSW` index creation statement also supports the following options to control the hyperparameters of the index construction and search process:
| Option | Default | Description |
|-------|--:|----------------------------|
| `ef_construction` | 128 | The number of candidate vertices to consider during the construction of the index. A higher value will result in a more accurate index, but will also increase the time it takes to build the index. |
| `ef_search` | 64 | The number of candidate vertices to consider during the search phase of the index. A higher value will result in a more accurate index, but will also increase the time it takes to perform a search. |
| `M` | 16 | The maximum number of neighbors to keep for each vertex in the graph. A higher value will result in a more accurate index, but will also increase the time it takes to build the index. |
| `M0` | 2 * `M` | The base connectivity, or the number of neighbors to keep for each vertex in the zero-th level of the graph. A higher value will result in a more accurate index, but will also increase the time it takes to build the index. |
Additionally, you can also override the `ef_search` parameter set at index construction time by setting the `SET hnsw_ef_search = ⟨int⟩`{:.language-sql .highlight} configuration option at runtime. This can be useful if you want to trade search performance for accuracy or vice versa on a per-connection basis. You can also unset the override by calling `RESET hnsw_ef_search`{:.language-sql .highlight}.
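For example, a sketch of temporarily raising the search effort for the current connection (the value is arbitrary):
```sql
SET hnsw_ef_search = 200;
-- ... run accuracy-sensitive queries here ...
RESET hnsw_ef_search;
```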
#### Persistence {#docs:stable:core_extensions:vss::persistence}
Due to some known issues related to persistence of custom extension indexes, the `HNSW` index can only be created on tables in in-memory databases by default, unless the `SET hnsw_enable_experimental_persistence = ⟨bool⟩`{:.language-sql .highlight} configuration option is set to `true`.
The reasoning for locking this feature behind an experimental flag is that "WAL" recovery is not yet properly implemented for custom indexes, meaning that if a crash occurs or the database is shut down unexpectedly while there are uncommitted changes to a `HNSW`-indexed table, you can end up with **data loss or corruption of the index**.
If you enable this option and experience an unexpected shutdown, you can try to recover the index by first starting DuckDB separately, loading the `vss` extension and then `ATTACH`ing the database file, which ensures that the `HNSW` index functionality is available during WAL-playback, allowing DuckDB's recovery process to proceed without issues. But we still recommend that you do not use this feature in production environments.
With the `hnsw_enable_experimental_persistence` option enabled, the index will be persisted into the DuckDB database file (if you run DuckDB with a disk-backed database file), which means that after a database restart, the index can be loaded back into memory from disk instead of having to be re-created. With that in mind, there are no incremental updates to persistent index storage, so every time DuckDB performs a checkpoint the entire index will be serialized to disk, overwriting the previous version. Similarly, after a restart of the database, the index will be deserialized back into main memory in its entirety, although this is deferred until you first access the table associated with the index. Depending on how large the index is, the deserialization process may take some time, but it should still be faster than simply dropping and re-creating the index.
#### Inserts, Updates, Deletes and Re-Compaction {#docs:stable:core_extensions:vss::inserts-updates-deletes-and-re-compaction}
The HNSW index does support inserting, updating and deleting rows from the table after index creation. However, there are two things to keep in mind:
* It's faster to create the index after the table has been populated with data as the initial bulk load can make better use of parallelism on large tables.
* Deletes are not immediately reflected in the index, but are instead "marked" as deleted, which can cause the index to grow stale over time and negatively impact query quality and performance.
To remedy the last point, you can call the `PRAGMA hnsw_compact_index('⟨index_name⟩')`{:.language-sql .highlight} pragma function to trigger a re-compaction of the index, pruning deleted items, or re-create the index after a significant number of updates.
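For example, using the index name from the earlier example:
```sql
PRAGMA hnsw_compact_index('my_hnsw_index');
```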
#### Bonus: Vector Similarity Search Joins {#docs:stable:core_extensions:vss::bonus-vector-similarity-search-joins}
The `vss` extension also provides a couple of table macros to simplify matching multiple vectors against each other, so-called "fuzzy joins". These are:
* `vss_join(left_table, right_table, left_col, right_col, k, metric := 'l2sq')`
* `vss_match(right_table", left_col, right_col, k, metric := 'l2sq')`
These **do not** currently make use of the `HNSW` index but are provided as convenience utility functions for users who are OK with performing brute-force vector similarity searches without having to write out the join logic themselves. In the future, these might become targets for index-based optimizations as well.
These functions can be used as follows:
```sql
CREATE TABLE haystack (id int, vec FLOAT[3]);
CREATE TABLE needle (search_vec FLOAT[3]);
INSERT INTO haystack
SELECT row_number() OVER (), array_value(a, b, c)
FROM range(1, 10) ra(a), range(1, 10) rb(b), range(1, 10) rc(c);
INSERT INTO needle
VALUES ([5, 5, 5]), ([1, 1, 1]);
SELECT *
FROM vss_join(needle, haystack, search_vec, vec, 3) res;
```
```text
┌───────┬─────────────────────────────────┬─────────────────────────────────────┐
│ score │            left_tbl             │              right_tbl              │
│ float │   struct(search_vec float[3])   │  struct(id integer, vec float[3])   │
├───────┼─────────────────────────────────┼─────────────────────────────────────┤
│   0.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 365, 'vec': [5.0, 5.0, 5.0]} │
│   1.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 364, 'vec': [5.0, 4.0, 5.0]} │
│   1.0 │ {'search_vec': [5.0, 5.0, 5.0]} │ {'id': 356, 'vec': [4.0, 5.0, 5.0]} │
│   0.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 1, 'vec': [1.0, 1.0, 1.0]}   │
│   1.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 10, 'vec': [2.0, 1.0, 1.0]}  │
│   1.0 │ {'search_vec': [1.0, 1.0, 1.0]} │ {'id': 2, 'vec': [1.0, 2.0, 1.0]}   │
└───────┴─────────────────────────────────┴─────────────────────────────────────┘
```
Alternatively, we can use the `vss_match` macro as a "lateral join" to get the matches already grouped by the left table.
Note that this requires us to specify the left table first, and then the `vss_match` macro which references the search column from the left
table (in this case, `search_vec`):
```sql
SELECT *
FROM needle, vss_match(haystack, search_vec, vec, 3) res;
```
```text
┌──────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│    search_vec    │                                                                                matches                                                                                                │
│     float[3]     │                                                     struct(score float, "row" struct(id integer, vec float[3]))[]                                                                    │
├──────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ [5.0, 5.0, 5.0]  │ [{'score': 0.0, 'row': {'id': 365, 'vec': [5.0, 5.0, 5.0]}}, {'score': 1.0, 'row': {'id': 364, 'vec': [5.0, 4.0, 5.0]}}, {'score': 1.0, 'row': {'id': 356, 'vec': [4.0, 5.0, 5.0]}}]  │
│ [1.0, 1.0, 1.0]  │ [{'score': 0.0, 'row': {'id': 1, 'vec': [1.0, 1.0, 1.0]}}, {'score': 1.0, 'row': {'id': 10, 'vec': [2.0, 1.0, 1.0]}}, {'score': 1.0, 'row': {'id': 2, 'vec': [1.0, 2.0, 1.0]}}]       │
└──────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
#### Limitations {#docs:stable:core_extensions:vss::limitations}
* Only vectors consisting of `FLOAT`s (32-bit, single precision) are supported at the moment.
* The index itself is not buffer managed and must be able to fit into RAM.
* The size of the index in memory does not count towards DuckDB's `memory_limit` configuration parameter.
* `HNSW` indexes can only be created on tables in in-memory databases, unless the `SET hnsw_enable_experimental_persistence = ⟨bool⟩`{:.language-sql .highlight} configuration option is set to `true`, see [Persistence](#::persistence) for more information.
* The vector join table macros (`vss_join` and `vss_match`) do not require or make use of the `HNSW` index.
# Guides {#guides}
## Guides {#docs:stable:guides:overview}
The guides section contains compact how-to guides that are focused on achieving a single goal.
For API references and examples, see the rest of the documentation.
Note that there are many tools using DuckDB, which are not covered in the official guides.
To find a list of these tools, check out the [Awesome DuckDB repository](https://github.com/davidgasquez/awesome-duckdb).
> **Tip.** For a short introductory tutorial, check out the ["Analyzing Railway Traffic in the Netherlands"](https://duckdb.org/2024/05/31/analyzing-railway-traffic-in-the-netherlands) tutorial.
#### Data Import and Export {#docs:stable:guides:overview::data-import-and-export}
* [Data import overview](#docs:stable:guides:file_formats:overview)
* [File access with the `file:` protocol](#docs:stable:guides:file_formats:file_access)
##### CSV Files {#docs:stable:guides:overview::csv-files}
* [How to load a CSV file into a table](#docs:stable:guides:file_formats:csv_import)
* [How to export a table to a CSV file](#docs:stable:guides:file_formats:csv_export)
##### Parquet Files {#docs:stable:guides:overview::parquet-files}
* [How to load a Parquet file into a table](#docs:stable:guides:file_formats:parquet_import)
* [How to export a table to a Parquet file](#docs:stable:guides:file_formats:parquet_export)
* [How to run a query directly on a Parquet file](#docs:stable:guides:file_formats:query_parquet)
##### HTTP(S), S3 and GCP {#docs:stable:guides:overview::https-s3-and-gcp}
* [How to load a Parquet file directly from HTTP(S)](#docs:stable:guides:network_cloud_storage:http_import)
* [How to load a Parquet file directly from S3](#docs:stable:guides:network_cloud_storage:s3_import)
* [How to export a Parquet file to S3](#docs:stable:guides:network_cloud_storage:s3_export)
* [How to load a Parquet file from S3 Express One](#docs:stable:guides:network_cloud_storage:s3_express_one)
* [How to load a Parquet file directly from GCS](#docs:stable:guides:network_cloud_storage:gcs_import)
* [How to load a Parquet file directly from Cloudflare R2](#docs:stable:guides:network_cloud_storage:cloudflare_r2_import)
* [How to load an Iceberg table directly from S3](#docs:stable:guides:network_cloud_storage:s3_iceberg_import)
##### JSON Files {#docs:stable:guides:overview::json-files}
* [How to load a JSON file into a table](#docs:stable:guides:file_formats:json_import)
* [How to export a table to a JSON file](#docs:stable:guides:file_formats:json_export)
##### Excel Files with the Spatial Extension {#docs:stable:guides:overview::excel-files-with-the-spatial-extension}
* [How to load an Excel file into a table](#docs:stable:guides:file_formats:excel_import)
* [How to export a table to an Excel file](#docs:stable:guides:file_formats:excel_export)
##### Querying Other Database Systems {#docs:stable:guides:overview::querying-other-database-systems}
* [How to directly query a MySQL database](#docs:stable:guides:database_integration:mysql)
* [How to directly query a PostgreSQL database](#docs:stable:guides:database_integration:postgres)
* [How to directly query a SQLite database](#docs:stable:guides:database_integration:sqlite)
##### Directly Reading Files {#docs:stable:guides:overview::directly-reading-files}
* [How to directly read a binary file](#docs:stable:guides:file_formats:read_file::read_blob)
* [How to directly read a text file](#docs:stable:guides:file_formats:read_file::read_text)
#### Performance {#docs:stable:guides:overview::performance}
* [My workload is slow (troubleshooting guide)](#docs:stable:guides:performance:my_workload_is_slow)
* [How to design the schema for optimal performance](#docs:stable:guides:performance:schema)
* [What is the ideal hardware environment for DuckDB](#docs:stable:guides:performance:environment)
* [What performance implications do Parquet files and (compressed) CSV files have](#docs:stable:guides:performance:file_formats)
* [How to tune workloads](#docs:stable:guides:performance:how_to_tune_workloads)
* [Benchmarks](#docs:stable:guides:performance:benchmarks)
#### Meta Queries {#docs:stable:guides:overview::meta-queries}
* [How to list all tables](#docs:stable:guides:meta:list_tables)
* [How to view the schema of the result of a query](#docs:stable:guides:meta:describe)
* [How to quickly get a feel for a dataset using summarize](#docs:stable:guides:meta:summarize)
* [How to view the query plan of a query](#docs:stable:guides:meta:explain)
* [How to profile a query](#docs:stable:guides:meta:explain_analyze)
#### ODBC {#docs:stable:guides:overview::odbc}
* [How to set up an ODBC application (and more!)](#docs:stable:guides:odbc:general)
#### Python Client {#docs:stable:guides:overview::python-client}
* [How to install the Python client](#docs:stable:guides:python:install)
* [How to execute SQL queries](#docs:stable:guides:python:execute_sql)
* [How to easily query DuckDB in Jupyter Notebooks](#docs:stable:guides:python:jupyter)
* [How to easily query DuckDB in marimo Notebooks](#docs:stable:guides:python:marimo)
* [How to use Multiple Python Threads with DuckDB](#docs:stable:guides:python:multiple_threads)
* [How to use fsspec filesystems with DuckDB](#docs:stable:guides:python:filesystems)
##### Pandas {#docs:stable:guides:overview::pandas}
* [How to execute SQL on a Pandas DataFrame](#docs:stable:guides:python:sql_on_pandas)
* [How to create a table from a Pandas DataFrame](#docs:stable:guides:python:import_pandas)
* [How to export data to a Pandas DataFrame](#docs:stable:guides:python:export_pandas)
##### Apache Arrow {#docs:stable:guides:overview::apache-arrow}
* [How to execute SQL on Apache Arrow](#docs:stable:guides:python:sql_on_arrow)
* [How to create a DuckDB table from Apache Arrow](#docs:stable:guides:python:import_arrow)
* [How to export data to Apache Arrow](#docs:stable:guides:python:export_arrow)
##### Relational API {#docs:stable:guides:overview::relational-api}
* [How to query Pandas DataFrames with the Relational API](#docs:stable:guides:python:relational_api_pandas)
##### Python Library Integrations {#docs:stable:guides:overview::python-library-integrations}
* [How to use Ibis to query DuckDB with or without SQL](#docs:stable:guides:python:ibis)
* [How to use DuckDB with Polars DataFrames via Apache Arrow](#docs:stable:guides:python:polars)
#### SQL Features {#docs:stable:guides:overview::sql-features}
* [Friendly SQL](#docs:stable:sql:dialect:friendly_sql)
* [As-of join](#docs:stable:guides:sql_features:asof_join)
* [Full-text search](#docs:stable:guides:sql_features:full_text_search)
* [`query` and `query_table` functions](#docs:stable:guides:sql_features:query_and_query_table_functions)
#### SQL Editors and IDEs {#docs:stable:guides:overview::sql-editors-and-ides}
* [How to set up the DBeaver SQL IDE](#docs:stable:guides:sql_editors:dbeaver)
#### Data Viewers {#docs:stable:guides:overview::data-viewers}
* [How to visualize DuckDB databases with Tableau](#docs:stable:guides:data_viewers:tableau)
* [How to draw command-line plots with DuckDB and YouPlot](#docs:stable:guides:data_viewers:youplot)
## Data Viewers {#guides:data_viewers}
### Tableau – A Data Visualization Tool {#docs:stable:guides:data_viewers:tableau}
[Tableau](https://www.tableau.com/) is a popular commercial data visualization tool.
In addition to a large number of built-in connectors,
it also provides generic database connectivity via ODBC and JDBC connectors.
Tableau has two main versions: Desktop and Online (Server).
* For Desktop, connecting to a DuckDB database is similar to working in an embedded environment like Python.
* For Online, since DuckDB is in-process, the data needs to be either on the server itself
or in a remote data bucket that is accessible from the server.
#### Database Creation {#docs:stable:guides:data_viewers:tableau::database-creation}
When using a DuckDB database file,
the datasets do not actually need to be imported into DuckDB tables;
it suffices to create views of the data.
For example, this will create a view of the `h2oai` Parquet test file in the current DuckDB code base:
```sql
CREATE VIEW h2oai AS (
FROM read_parquet('/Users/username/duckdb/data/parquet-testing/h2oai/h2oai_group_small.parquet')
);
```
Note that you should use full path names to local files so that they can be found from inside Tableau.
Also note that you will need to use a version of the driver that is compatible with (i.e., from the same release as)
the database format used by the DuckDB tool (e.g., Python module, command line) that was used to create the file.
#### Installing the JDBC Driver {#docs:stable:guides:data_viewers:tableau::installing-the-jdbc-driver}
Tableau provides documentation on how to [install a JDBC driver](https://help.tableau.com/current/pro/desktop/en-gb/jdbc_tableau.htm)
for Tableau to use.
> Tableau (both the Desktop and Server versions) needs to be restarted any time you add or modify drivers.
##### Driver Links {#docs:stable:guides:data_viewers:tableau::driver-links}
The link here is for a recent version of the JDBC driver that is compatible with Tableau.
If you wish to connect to a database file,
you will need to make sure the file was created with a file-compatible version of DuckDB.
Also, check that only one version of the driver is installed, as there are multiple filenames in use.
Download the [JAR file](https://repo1.maven.org/maven2/org/duckdb/duckdb_jdbc/1.4.1.0/duckdb_jdbc-1.4.1.0.jar).
* macOS: Copy it to `~/Library/Tableau/Drivers/`
* Windows: Copy it to `C:\Program Files\Tableau\Drivers`
* Linux: Copy it to `/opt/tableau/tableau_driver/jdbc`.
#### Using the PostgreSQL Dialect {#docs:stable:guides:data_viewers:tableau::using-the-postgresql-dialect}
If you just want to do something simple, you can try connecting directly to the JDBC driver
and using the Tableau-provided PostgreSQL dialect.
1. Create a DuckDB file containing your views and/or data.
2. Launch Tableau.
3. Under Connect > To a Server > More…, click on "Other Databases (JDBC)". This will bring up the connection dialogue box. For the URL, enter `jdbc:duckdb:/User/username/path/to/database.db`. For the Dialect, choose PostgreSQL. The rest of the fields can be ignored.
However, some functionality, such as the `median` and `percentile` aggregate functions, will be missing.
To make the data source connection more compatible with the PostgreSQL dialect,
please use the DuckDB taco connector as described below.
#### Installing the Tableau DuckDB Connector {#docs:stable:guides:data_viewers:tableau::installing-the-tableau-duckdb-connector}
While it is possible to use the Tableau-provided PostgreSQL dialect to communicate with the DuckDB JDBC driver,
we strongly recommend using the [DuckDB "taco" connector](https://github.com/motherduckdb/duckdb-tableau-connector).
This connector has been fully tested against the Tableau dialect generator
and [is more compatible](https://github.com/motherduckdb/duckdb-tableau-connector/blob/main/tableau_connectors/duckdb_jdbc/dialect.tdd)
than the provided PostgreSQL dialect.
The documentation on how to install and use the connector is in its repository,
but essentially you will need the
[`duckdb_jdbc.taco`](https://github.com/motherduckdb/duckdb-tableau-connector/raw/main/packaged-connector/duckdb_jdbc-v1.0.0-signed.taco) file.
(Despite what the Tableau documentation says, the real security risk is in the JDBC driver code,
not the small amount of JavaScript in the Taco.)
##### Server (Online) {#docs:stable:guides:data_viewers:tableau::server-online}
On Linux, copy the Taco file to `/opt/tableau/connectors`.
On Windows, copy the Taco file to `C:\Program Files\Tableau\Connectors`.
Then issue these commands to disable signature validation:
```batch
tsm configuration set -k native_api.disable_verify_connector_plugin_signature -v true
```
```batch
tsm pending-changes apply
```
The last command will restart the server with the new settings.
##### macOS {#docs:stable:guides:data_viewers:tableau::macos}
Copy the Taco file to the `/Users/[User]/Documents/My Tableau Repository/Connectors` folder.
Then launch Tableau Desktop from the Terminal with the command line argument to disable signature validation:
```batch
/Applications/Tableau\ Desktop\ ⟨year⟩.⟨quarter⟩.app/Contents/MacOS/Tableau -DDisableVerifyConnectorPluginSignature=true
```
You can also package this up with AppleScript by using the following script:
```applescript
do shell script "\"/Applications/Tableau Desktop 2023.2.app/Contents/MacOS/Tableau\" -DDisableVerifyConnectorPluginSignature=true"
quit
```
Create this file with the [Script Editor](https://support.apple.com/guide/script-editor/welcome/mac)
(located in `/Applications/Utilities`)
and [save it as a packaged application](https://support.apple.com/guide/script-editor/save-a-script-as-an-app-scpedt1072/mac).
You can then double-click it to launch Tableau.
You will need to change the application name in the script when you get upgrades.
##### Windows Desktop {#docs:stable:guides:data_viewers:tableau::windows-desktop}
Copy the Taco file to the `C:\Users\[Windows User]\Documents\My Tableau Repository\Connectors` directory.
Then launch Tableau Desktop from a shell with the `-DDisableVerifyConnectorPluginSignature=true` argument
to disable signature validation.
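As a sketch, assuming a default installation path (the exact folder name depends on your Tableau version and install location):
```batch
"C:\Program Files\Tableau\Tableau ⟨year⟩.⟨quarter⟩\bin\tableau.exe" -DDisableVerifyConnectorPluginSignature=true
```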
#### Output {#docs:stable:guides:data_viewers:tableau::output}
Once loaded, you can run queries against your data!
For example, you can run the first H2O.ai benchmark query from the Parquet test file.
### CLI Charting with YouPlot {#docs:stable:guides:data_viewers:youplot}
DuckDB can be used with CLI graphing tools to quickly pipe input to stdout to graph your data in one line.
[YouPlot](https://github.com/red-data-tools/YouPlot) is a Ruby-based CLI tool for drawing visually pleasing plots on the terminal. It can accept input from other programs by piping data from `stdin`. It takes tab-separated (or delimiter of your choice) data and can easily generate various types of plots including bar, line, histogram and scatter.
With DuckDB, you can write to the console (`stdout`) by using the `TO '/dev/stdout'` command. And you can also write comma-separated values by using `WITH (FORMAT csv, HEADER)`.
#### Installing YouPlot {#docs:stable:guides:data_viewers:youplot::installing-youplot}
Installation instructions for YouPlot can be found on the main [YouPlot repository](https://github.com/red-data-tools/YouPlot#installation). If you're on a Mac, you can use:
```batch
brew install youplot
```
Run `uplot --help` to ensure you've installed it successfully!
#### Piping DuckDB Queries to stdout {#docs:stable:guides:data_viewers:youplot::piping-duckdb-queries-to-stdout}
By combining the [`COPY ... TO`](#docs:stable:sql:statements:copy::copy-to) statement with a CSV output file, data can be read from any format supported by DuckDB and piped to YouPlot. There are three important steps to doing this.
1. As an example, this is how to read all data from `input.json`:
```batch
duckdb -s "SELECT * FROM read_json_auto('input.json')"
```
2. To prepare the data for YouPlot, write a simple aggregate:
```batch
duckdb -s "SELECT date, sum(purchases) AS total_purchases FROM read_json_auto('input.json') GROUP BY 1 ORDER BY 2 DESC LIMIT 10"
```
3. Finally, wrap the `SELECT` in the `COPY ... TO` statement with an output location of `/dev/stdout`.
The syntax looks like this:
```sql
COPY (â¨queryâ©) TO '/dev/stdout' WITH (FORMAT csv, HEADER);
```
The full DuckDB command below outputs the query in CSV format with a header:
```batch
duckdb -s "COPY (SELECT date, sum(purchases) AS total_purchases FROM read_json_auto('input.json') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT csv, HEADER)"
```
#### Connecting DuckDB to YouPlot {#docs:stable:guides:data_viewers:youplot::connecting-duckdb-to-youplot}
Finally, the data can now be piped to YouPlot! Let's assume we have an `input.json` file with dates and the number of purchases made by somebody on each date. Using the query above, we'll pipe the data to the `uplot` command to draw a plot of the Top 10 Purchase Dates.
```batch
duckdb -s "COPY (SELECT date, sum(purchases) AS total_purchases FROM read_json_auto('input.json') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT csv, HEADER)" \
| uplot bar -d, -H -t "Top 10 Purchase Dates"
```
This tells `uplot` to draw a bar plot, use a comma-separated delimiter (`-d,`), that the data has a header (`-H`), and give the plot a title (`-t`).
#### Bonus Round! stdin + stdout {#docs:stable:guides:data_viewers:youplot::bonus-round-stdin--stdout}
Maybe you're piping some data through `jq`. Maybe you're downloading a JSON file from somewhere. You can also tell DuckDB to read the data from another process by changing the filename to `/dev/stdin`.
Let's combine this with a quick `curl` from GitHub to see what a certain user has been up to lately.
```batch
curl -sL "https://api.github.com/users/dacort/events?per_page=100" \
| duckdb -s "COPY (SELECT type, count(*) AS event_count FROM read_json_auto('/dev/stdin') GROUP BY 1 ORDER BY 2 DESC LIMIT 10) TO '/dev/stdout' WITH (FORMAT csv, HEADER)" \
| uplot bar -d, -H -t "GitHub Events for @dacort"
```
## Database Integration {#guides:database_integration}
### Database Integration {#docs:stable:guides:database_integration:overview}
### MySQL Import {#docs:stable:guides:database_integration:mysql}
To run a query directly on a running MySQL database, the [`mysql` extension](#docs:stable:core_extensions:mysql) is required.
#### Installation and Loading {#docs:stable:guides:database_integration:mysql::installation-and-loading}
The extension can be installed using the `INSTALL` SQL command. This only needs to be run once.
```sql
INSTALL mysql;
```
To load the `mysql` extension for usage, use the `LOAD` SQL command:
```sql
LOAD mysql;
```
#### Usage {#docs:stable:guides:database_integration:mysql::usage}
After the `mysql` extension is installed, you can attach to a MySQL database using the following command:
```sql
ATTACH 'host=localhost user=root port=0 database=mysqlscanner' AS mysql_db (TYPE mysql, READ_ONLY);
USE mysql_db;
```
The string used by `ATTACH` is a PostgreSQL-style connection string (_not_ a MySQL connection string!). It is a list of connection arguments provided in `{key}={value}` format. Below is a list of valid arguments. Any options not provided are replaced by their default values.
| Setting | Default |
|------------|--------------|
| `database` | `NULL` |
| `host` | `localhost` |
| `password` | |
| `port` | `0` |
| `socket` | `NULL` |
| `user` | current user |
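For example, a connection that sets the user and password explicitly might look like this (a sketch with placeholder values):
```sql
ATTACH 'host=localhost port=3306 user=root password=⟨password⟩ database=mydb' AS mysql_db (TYPE mysql);
```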
You can directly read and write the MySQL database:
```sql
CREATE TABLE tbl (id INTEGER, name VARCHAR);
INSERT INTO tbl VALUES (42, 'DuckDB');
```
For a list of supported operations, see the [MySQL extension documentation](#docs:stable:core_extensions:mysql::supported-operations).
### PostgreSQL Import {#docs:stable:guides:database_integration:postgres}
To run a query directly on a running PostgreSQL database, the [`postgres` extension](#docs:stable:core_extensions:postgres) is required.
#### Installation and Loading {#docs:stable:guides:database_integration:postgres::installation-and-loading}
The extension can be installed using the `INSTALL` SQL command. This only needs to be run once.
```sql
INSTALL postgres;
```
To load the `postgres` extension for usage, use the `LOAD` SQL command:
```sql
LOAD postgres;
```
#### Usage {#docs:stable:guides:database_integration:postgres::usage}
After the `postgres` extension is installed, tables can be queried from PostgreSQL using the `postgres_scan` function:
```sql
-- Scan the table "mytable" from the schema "public" in the database "mydb"
SELECT * FROM postgres_scan('host=localhost port=5432 dbname=mydb', 'public', 'mytable');
```
The first parameter to the `postgres_scan` function is the [PostgreSQL connection string](https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING), a list of connection arguments provided in `{key}={value}` format. Below is a list of valid arguments.
| Name | Description | Default |
| ---------- | ------------------------------------ | -------------- |
| `host` | Name of host to connect to | `localhost` |
| `hostaddr` | Host IP address | `localhost` |
| `port` | Port number | `5432` |
| `user` | PostgreSQL user name | [OS user name] |
| `password` | PostgreSQL password | |
| `dbname` | Database name | [user] |
| `passfile` | Name of file passwords are stored in | `~/.pgpass` |
Alternatively, the entire database can be attached using the `ATTACH` command. This allows you to query all tables stored within the PostgreSQL database as if it was a regular database.
```sql
-- Attach the PostgreSQL database using the given connection string
ATTACH 'host=localhost port=5432 dbname=mydb' AS test (TYPE postgres);
-- The table "tbl_name" can now be queried as if it is a regular table
SELECT * FROM test.tbl_name;
-- Switch the active database to "test"
USE test;
-- List all tables in the file
SHOW TABLES;
```
For more information see the [PostgreSQL extension documentation](#docs:stable:core_extensions:postgres).
### SQLite Import {#docs:stable:guides:database_integration:sqlite}
To run a query directly on a SQLite file, the `sqlite` extension is required.
#### Installation and Loading {#docs:stable:guides:database_integration:sqlite::installation-and-loading}
The extension can be installed using the `INSTALL` SQL command. This only needs to be run once.
```sql
INSTALL sqlite;
```
To load the `sqlite` extension for usage, use the `LOAD` SQL command:
```sql
LOAD sqlite;
```
#### Usage {#docs:stable:guides:database_integration:sqlite::usage}
After the SQLite extension is installed, tables can be queried from SQLite using the `sqlite_scan` function:
```sql
-- Scan the table "tbl_name" from the SQLite file "test.db"
SELECT * FROM sqlite_scan('test.db', 'tbl_name');
```
Alternatively, the entire file can be attached using the `ATTACH` command. This allows you to query all tables stored within a SQLite database file as if they were a regular database.
```sql
-- Attach the SQLite file "test.db"
ATTACH 'test.db' AS test (TYPE sqlite);
-- The table "tbl_name" can now be queried as if it is a regular table
SELECT * FROM test.tbl_name;
-- Switch the active database to "test"
USE test;
-- List all tables in the file
SHOW TABLES;
```
For more information see the [SQLite extension documentation](#docs:stable:core_extensions:sqlite).
## File Formats {#guides:file_formats}
### File Formats {#docs:stable:guides:file_formats:overview}
### CSV Import {#docs:stable:guides:file_formats:csv_import}
To read data from a CSV file, use the `read_csv` function in the `FROM` clause of a query:
```sql
SELECT * FROM read_csv('input.csv');
```
Alternatively, you can omit the `read_csv` function and let DuckDB infer it from the extension:
```sql
SELECT * FROM 'input.csv';
```
To create a new table using the result from a query, use [`CREATE TABLE ... AS SELECT` statement](#docs:stable:sql:statements:create_table::create-table--as-select-ctas):
```sql
CREATE TABLE new_tbl AS
SELECT * FROM read_csv('input.csv');
```
We can use DuckDB's [optional `FROM`-first syntax](#docs:stable:sql:query_syntax:from) to omit `SELECT *`:
```sql
CREATE TABLE new_tbl AS
FROM read_csv('input.csv');
```
To load data into an existing table from a query, use `INSERT INTO` from a `SELECT` statement:
```sql
INSERT INTO tbl
SELECT * FROM read_csv('input.csv');
```
Alternatively, the `COPY` statement can also be used to load data from a CSV file into an existing table:
```sql
COPY tbl FROM 'input.csv';
```
For additional options, see the [CSV import reference](#docs:stable:data:csv:overview) and the [`COPY` statement documentation](#docs:stable:sql:statements:copy).
### CSV Export {#docs:stable:guides:file_formats:csv_export}
To export the data from a table to a CSV file, use the `COPY` statement:
```sql
COPY tbl TO 'output.csv' (HEADER, DELIMITER ',');
```
The result of queries can also be directly exported to a CSV file:
```sql
COPY (SELECT * FROM tbl) TO 'output.csv' (HEADER, DELIMITER ',');
```
For additional options, see the [`COPY` statement documentation](#docs:stable:sql:statements:copy::csv-options).
### Directly Reading Files {#docs:stable:guides:file_formats:read_file}
DuckDB allows directly reading files via the [`read_text`](#::read_text) and [`read_blob`](#::read_blob) functions.
These functions accept a filename, a list of filenames or a glob pattern, and output the content of each file as a `VARCHAR` or `BLOB`, respectively, as well as additional metadata such as the file size and last modified time.
#### `read_text` {#docs:stable:guides:file_formats:read_file::read_text}
The `read_text` table function reads from the selected source(s) to a `VARCHAR`. Each file results in a single row with the `content` field holding the entire content of the respective file.
```sql
SELECT size, parse_path(filename), content
FROM read_text('test/sql/table_function/files/*.txt');
```
| size | parse_path(filename) | content |
|-----:|-----------------------------------------------|------------------|
| 12 | [test, sql, table_function, files, one.txt] | Hello World! |
| 2 | [test, sql, table_function, files, three.txt] | 42 |
| 10 | [test, sql, table_function, files, two.txt] | Foo Bar\nFöö Bär |
The file content is first validated to be valid UTF-8. If `read_text` attempts to read a file with invalid UTF-8, an error is thrown suggesting to use [`read_blob`](#::read_blob) instead.
#### `read_blob` {#docs:stable:guides:file_formats:read_file::read_blob}
The `read_blob` table function reads from the selected source(s) to a `BLOB`:
```sql
SELECT size, content, filename
FROM read_blob('test/sql/table_function/files/*');
```
| size | content | filename |
|-----:|--------------------------------------------------------------|-----------------------------------------|
| 178 | PK\x03\x04\x0A\x00\x00\x00\x00\x00\xACi=X\x14t\xCE\xC7\x0A… | test/sql/table_function/files/four.blob |
| 12 | Hello World! | test/sql/table_function/files/one.txt |
| 2 | 42 | test/sql/table_function/files/three.txt |
| 10 | F\xC3\xB6\xC3\xB6 B\xC3\xA4r | test/sql/table_function/files/two.txt |
#### Schema {#docs:stable:guides:file_formats:read_file::schema}
The schemas of the tables returned by `read_text` and `read_blob` are identical:
```sql
DESCRIBE FROM read_text('README.md');
```
| column_name | column_type | null | key | default | extra |
|---------------|-------------|------|------|---------|-------|
| filename | VARCHAR | YES | NULL | NULL | NULL |
| content | VARCHAR | YES | NULL | NULL | NULL |
| size | BIGINT | YES | NULL | NULL | NULL |
| last_modified | TIMESTAMP | YES | NULL | NULL | NULL |
#### Hive Partitioning {#docs:stable:guides:file_formats:read_file::hive-partitioning}
Data can be read from [Hive partitioned](#docs:stable:data:partitioning:hive_partitioning) datasets.
```sql
SELECT *
FROM read_blob('data/parquet-testing/hive-partitioning/simple/**/*.parquet')
WHERE part IN ('a', 'b') AND date >= '2012-01-01';
```
| filename | content | size | last_modified | date | part |
|---------------------------------------|-------------------------------|------|------------------------|------------|---------|
| …/part=a/date=2012-01-01/test.parquet | PAR1\x15\x00\x15\x14\x15\x18… | 266 | 2024-11-12 02:23:20+00 | 2012-01-01 | a |
| …/part=b/date=2013-01-01/test.parquet | PAR1\x15\x00\x15\x14\x15\x18… | 266 | 2024-11-12 02:23:20+00 | 2013-01-01 | b |
#### Handling Missing Metadata {#docs:stable:guides:file_formats:read_file::handling-missing-metadata}
In cases where the underlying filesystem is unable to provide some of this data (e.g., because HTTPFS can't always return a valid timestamp), the cell is set to `NULL` instead.
#### Support for Projection Pushdown {#docs:stable:guides:file_formats:read_file::support-for-projection-pushdown}
The table functions also utilize projection pushdown to avoid computing properties unnecessarily. For example, you can glob a directory full of huge files to get the file size in the `size` column; as long as you omit the `content` column, the file contents are never read into DuckDB.
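A minimal sketch (the glob path is hypothetical): because `content` is not selected, only file-level metadata is fetched.
```sql
SELECT filename, size
FROM read_blob('data/big_files/*.bin');
```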
### Excel Import {#docs:stable:guides:file_formats:excel_import}
DuckDB supports reading Excel `.xlsx` files; however, `.xls` files are not supported.
#### Importing Excel Sheets {#docs:stable:guides:file_formats:excel_import::importing-excel-sheets}
Use the `read_xlsx` function in the `FROM` clause of a query:
```sql
SELECT * FROM read_xlsx('test_excel.xlsx');
```
Alternatively, you can omit the `read_xlsx` function and let DuckDB infer it from the extension:
```sql
SELECT * FROM 'test_excel.xlsx';
```
However, if you want to be able to pass options to control the import behavior, you should use the `read_xlsx` function.
One such option is the `sheet` parameter, which allows specifying the name of the Excel worksheet:
```sql
SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');
```
By default, the first sheet is loaded if no sheet is specified.
#### Importing a Specific Range {#docs:stable:guides:file_formats:excel_import::importing-a-specific-range}
To select a specific range of cells, use the `range` parameter with a string in the format `A1:B2`, where `A1` is the top-left cell and `B2` is the bottom-right cell:
```sql
SELECT * FROM read_xlsx('test_excel.xlsx', range = 'A1:B2');
```
This can also be used to, e.g., skip the first 5 rows:
```sql
SELECT * FROM read_xlsx('test_excel.xlsx', range = 'A5:Z');
```
Or skip the first 5 columns:
```sql
SELECT * FROM read_xlsx('test_excel.xlsx', range = 'E:Z');
```
If no range parameter is provided, the range is automatically inferred as the rectangular region of cells between the first row of consecutive non-empty cells and the first empty row spanning the same columns.
By default, if no range is provided, DuckDB will stop reading the Excel file when it encounters an empty row. When a range is provided, the default is to read until the end of the range. This behavior can be controlled with the `stop_at_empty` parameter:
```sql
-- Read the first 100 rows, or until the first empty row, whichever comes first
SELECT * FROM read_xlsx('test_excel.xlsx', range = '1:100', stop_at_empty = true);
-- Always read the whole sheet, even if it contains empty rows
SELECT * FROM read_xlsx('test_excel.xlsx', stop_at_empty = false);
```
#### Creating a New Table {#docs:stable:guides:file_formats:excel_import::creating-a-new-table}
To create a new table using the result from a query, use `CREATE TABLE ... AS` from a `SELECT` statement:
```sql
CREATE TABLE new_tbl AS
SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');
```
#### Loading to an Existing Table {#docs:stable:guides:file_formats:excel_import::loading-to-an-existing-table}
To load data into an existing table from a query, use `INSERT INTO` from a `SELECT` statement:
```sql
INSERT INTO tbl
SELECT * FROM read_xlsx('test_excel.xlsx', sheet = 'Sheet1');
```
Alternatively, you can use the `COPY` statement with the `XLSX` format option to import an Excel file into an existing table:
```sql
COPY tbl FROM 'test_excel.xlsx' (FORMAT xlsx, SHEET 'Sheet1');
```
When using the `COPY` statement to load an Excel file into an existing table, the types of the columns in the target table will be used to coerce the types of the cells in the Excel sheet.
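For example, with a target table whose column types are already declared (the names and types here are hypothetical), the Excel cells are coerced to those types on import:
```sql
CREATE TABLE events (id INTEGER, happened_at TIMESTAMP, note VARCHAR);
COPY events FROM 'test_excel.xlsx' (FORMAT xlsx, SHEET 'Sheet1');
```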
#### Importing a Sheet with/without a Header {#docs:stable:guides:file_formats:excel_import::importing-a-sheet-withwithout-a-header}
To treat the first row as containing the names of the resulting columns, use the `header` parameter:
```sql
SELECT * FROM read_xlsx('test_excel.xlsx', header = true);
```
By default, the first row is treated as a header if all the cells in the first row (within the inferred or supplied range) are non-empty strings. To disable this behavior, set `header` to `false`.
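For example, to force the first row to be read as data rather than as a header:
```sql
SELECT * FROM read_xlsx('test_excel.xlsx', header = false);
```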
#### Detecting Types {#docs:stable:guides:file_formats:excel_import::detecting-types}
When not importing into an existing table, DuckDB will attempt to infer the types of the columns in the Excel sheet based on their contents and/or "number format".
- `TIMESTAMP`, `TIME`, `DATE` and `BOOLEAN` types are inferred when possible based on the "number format" applied to the cell.
- Text cells containing `TRUE` and `FALSE` are inferred as `BOOLEAN`.
- Empty cells are considered to be of type `DOUBLE` by default.
- Otherwise cells are inferred as `VARCHAR` or `DOUBLE` based on their contents.
This behavior can be adjusted in the following ways.
To treat all empty cells as `VARCHAR` instead of `DOUBLE`, set `empty_as_varchar` to `true`:
```sql
SELECT * FROM read_xlsx('test_excel.xlsx', empty_as_varchar = true);
```
To disable type inference completely and treat all cells as `VARCHAR`, set `all_varchar` to `true`:
```sql
SELECT * FROM read_xlsx('test_excel.xlsx', all_varchar = true);
```
Additionally, if the `ignore_errors` parameter is set to `true`, DuckDB will silently replace cells that can't be cast to the corresponding inferred column type with `NULL`s.
```sql
SELECT * FROM read_xlsx('test_excel.xlsx', ignore_errors = true);
```
#### See Also {#docs:stable:guides:file_formats:excel_import::see-also}
DuckDB can also [export Excel files](#docs:stable:guides:file_formats:excel_export).
For additional details on Excel support, see the [excel extension page](#docs:stable:core_extensions:excel).
### Excel Export {#docs:stable:guides:file_formats:excel_export}
DuckDB supports exporting data to Excel `.xlsx` files via the `excel` extension. Please note that `.xls` files are not supported.
To install and load the extension, run:
```sql
INSTALL excel;
LOAD excel;
```
#### Exporting Excel Sheets {#docs:stable:guides:file_formats:excel_export::exporting-excel-sheets}
To export a table to an Excel file, use the `COPY` statement with the `FORMAT xlsx` option:
```sql
COPY tbl TO 'output.xlsx' WITH (FORMAT xlsx);
```
The result of a query can also be directly exported to an Excel file:
```sql
COPY (SELECT * FROM tbl) TO 'output.xlsx' WITH (FORMAT xlsx);
```
Or:
```sql
COPY (SELECT * FROM tbl) TO 'output.xlsx';
```
To write the column names as the first row in the Excel file, use the `HEADER` option:
```sql
COPY tbl TO 'output.xlsx' WITH (FORMAT xlsx, HEADER true);
```
To name the worksheet in the resulting Excel file, use the `SHEET` option:
```sql
COPY tbl TO 'output.xlsx' WITH (FORMAT xlsx, SHEET 'Sheet1');
```
#### Type Conversions {#docs:stable:guides:file_formats:excel_export::type-conversions}
Because Excel only really supports storing numbers or strings (the equivalent of `DOUBLE` and `VARCHAR`), the following type conversions are automatically applied when writing XLSX files:
* Numeric types are cast to `DOUBLE`.
* Temporal types (`TIMESTAMP`, `DATE`, `TIME`, etc.) are converted to Excel "serial" numbers, that is, the number of days since 1900-01-01 for dates and the fraction of a day for times. These are then styled with a "number format" so that they appear as dates or times when opened in Excel.
* `TIMESTAMP_TZ` and `TIME_TZ` are cast to UTC `TIMESTAMP` and `TIME` respectively, with the timezone information being lost.
* `BOOLEAN`s are converted to `1` and `0`, with a "number format" applied to make them appear as `TRUE` and `FALSE` in Excel.
* All other types are cast to `VARCHAR` and then written as text cells.
You can, of course, also explicitly cast columns to a different type before exporting them to Excel:
```sql
COPY (SELECT CAST(a AS VARCHAR), b FROM tbl) TO 'output.xlsx' WITH (FORMAT xlsx);
```
#### See Also {#docs:stable:guides:file_formats:excel_export::see-also}
DuckDB can also [import Excel files](#docs:stable:guides:file_formats:excel_import).
For additional details on Excel support, see the [`excel` extension page](#docs:stable:core_extensions:excel).
### JSON Import {#docs:stable:guides:file_formats:json_import}
To read data from a JSON file, use the `read_json_auto` function in the `FROM` clause of a query:
```sql
SELECT *
FROM read_json_auto('input.json');
```
To create a new table using the result from a query, use `CREATE TABLE AS` from a `SELECT` statement:
```sql
CREATE TABLE new_tbl AS
SELECT *
FROM read_json_auto('input.json');
```
To load data into an existing table from a query, use `INSERT INTO` from a `SELECT` statement:
```sql
INSERT INTO tbl
SELECT *
FROM read_json_auto('input.json');
```
Alternatively, the `COPY` statement can also be used to load data from a JSON file into an existing table:
```sql
COPY tbl FROM 'input.json';
```
For additional options, see the [JSON Loading reference](#docs:stable:data:json:overview) and the [`COPY` statement documentation](#docs:stable:sql:statements:copy).
### JSON Export {#docs:stable:guides:file_formats:json_export}
To export the data from a table to a JSON file, use the `COPY` statement:
```sql
COPY tbl TO 'output.json';
```
The result of queries can also be directly exported to a JSON file:
```sql
COPY (SELECT * FROM range(3) tbl(n)) TO 'output.json';
```
```text
{"n":0}
{"n":1}
{"n":2}
```
The JSON export writes JSON lines by default, standardized as [Newline-delimited JSON](https://en.wikipedia.org/wiki/JSON_streaming#NDJSON).
The `ARRAY` option can be used to write a single JSON array object instead.
```sql
COPY (SELECT * FROM range(3) tbl(n)) TO 'output.json' (ARRAY);
```
```text
[
{"n":0},
{"n":1},
{"n":2}
]
```
For additional options, see the [`COPY` statement documentation](#docs:stable:sql:statements:copy).
### Parquet Import {#docs:stable:guides:file_formats:parquet_import}
To read data from a Parquet file, use the `read_parquet` function in the `FROM` clause of a query:
```sql
SELECT * FROM read_parquet('input.parquet');
```
Alternatively, you can omit the `read_parquet` function and let DuckDB infer it from the extension:
```sql
SELECT * FROM 'input.parquet';
```
To create a new table using the result from a query, use [`CREATE TABLE ... AS SELECT` statement](#docs:stable:sql:statements:create_table::create-table--as-select-ctas):
```sql
CREATE TABLE new_tbl AS
SELECT * FROM read_parquet('input.parquet');
```
To load data into an existing table from a query, use `INSERT INTO` from a `SELECT` statement:
```sql
INSERT INTO tbl
SELECT * FROM read_parquet('input.parquet');
```
Alternatively, the `COPY` statement can also be used to load data from a Parquet file into an existing table:
```sql
COPY tbl FROM 'input.parquet' (FORMAT parquet);
```
#### Adjusting the Schema on the Fly {#docs:stable:guides:file_formats:parquet_import::adjusting-the-schema-on-the-fly}
You can load a Parquet file into a slightly different schema (e.g., different number of columns, more relaxed types) using the following trick.
Suppose we have a Parquet file with two columns, `c1` and `c2`:
```sql
COPY (FROM (VALUES (42, 43)) t(c1, c2))
TO 'f.parquet';
```
If we want to add another column `c3` that is not present in the file, we can run:
```sql
FROM (VALUES(NULL::VARCHAR, NULL, NULL)) t(c1, c2, c3)
WHERE false
UNION ALL BY NAME
FROM 'f.parquet';
```
The first `FROM` clause generates an empty table with *three* columns, where `c1` is a `VARCHAR`.
Then, we use `UNION ALL BY NAME` to union it with the Parquet file. The result here is:
```text
┌─────────┬───────┬───────┐
│   c1    │  c2   │  c3   │
│ varchar │ int32 │ int32 │
├─────────┼───────┼───────┤
│ 42      │    43 │  NULL │
└─────────┴───────┴───────┘
```
For additional options, see the [Parquet loading reference](#docs:stable:data:parquet:overview).
### Parquet Export {#docs:stable:guides:file_formats:parquet_export}
To export the data from a table to a Parquet file, use the `COPY` statement:
```sql
COPY tbl TO 'output.parquet' (FORMAT parquet);
```
The result of queries can also be directly exported to a Parquet file:
```sql
COPY (SELECT * FROM tbl) TO 'output.parquet' (FORMAT parquet);
```
The flags for setting compression, row group size, etc. are listed in the [Reading and Writing Parquet files](#docs:stable:data:parquet:overview) page.
### Querying Parquet Files {#docs:stable:guides:file_formats:query_parquet}
To run a query directly on a Parquet file, use the `read_parquet` function in the `FROM` clause of a query.
```sql
SELECT * FROM read_parquet('input.parquet');
```
The Parquet file will be processed in parallel. Filters will be automatically pushed down into the Parquet scan, and only the relevant columns will be read.
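For example, a query that selects only some columns and applies a filter (the column names here are hypothetical) reads just those columns and pushes the predicate into the scan:
```sql
SELECT col_a
FROM read_parquet('input.parquet')
WHERE col_b > 42;
```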
For more information see the blog post ["Querying Parquet with Precision using DuckDB"](https://duckdb.org/2021/06/25/querying-parquet).
### File Access with the file: Protocol {#docs:stable:guides:file_formats:file_access}
DuckDB supports using the `file:` protocol. It currently supports the following formats:
* `file:/some/path` (host omitted completely)
* `file:///some/path` (empty host)
* `file://localhost/some/path` (`localhost` as host)
Note that the following formats are *not* supported because they are non-standard:
* `file:some/relative/path` (relative path)
* `file://some/path` (double-slash path)
Additionally, the `file:` protocol currently does not support remote (non-localhost) hosts.
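For example, the absolute-path forms listed above can be used directly in readers such as `read_csv` (a sketch with a placeholder path):
```sql
SELECT * FROM read_csv('file:///some/path/input.csv');
```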
## Network and Cloud Storage {#guides:network_cloud_storage}
### Network and Cloud Storage {#docs:stable:guides:network_cloud_storage:overview}
### HTTP Parquet Import {#docs:stable:guides:network_cloud_storage:http_import}
To load a Parquet file over HTTP(S), the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview) is required. This can be installed using the `INSTALL` SQL command. This only needs to be run once.
```sql
INSTALL httpfs;
```
To load the `httpfs` extension for usage, use the `LOAD` SQL command:
```sql
LOAD httpfs;
```
After the `httpfs` extension is set up, Parquet files can be read over `http(s)`:
```sql
SELECT * FROM read_parquet('https://⟨domain⟩/path/to/file.parquet');
```
For example:
```sql
SELECT * FROM read_parquet('https://duckdb.org/data/prices.parquet');
```
The function `read_parquet` can be omitted if the URL ends with `.parquet`:
```sql
SELECT * FROM read_parquet('https://duckdb.org/data/holdings.parquet');
```
Moreover, the `read_parquet` function itself can also be omitted thanks to DuckDB's [replacement scan mechanism](#docs:stable:clients:c:replacement_scans):
```sql
SELECT * FROM 'https://duckdb.org/data/holdings.parquet';
```
### S3 Parquet Import {#docs:stable:guides:network_cloud_storage:s3_import}
#### Prerequisites {#docs:stable:guides:network_cloud_storage:s3_import::prerequisites}
To load a Parquet file from S3, the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview) is required. This can be installed using the `INSTALL` SQL command. This only needs to be run once.
```sql
INSTALL httpfs;
```
To load the `httpfs` extension for usage, use the `LOAD` SQL command:
```sql
LOAD httpfs;
```
#### Credentials and Configuration {#docs:stable:guides:network_cloud_storage:s3_import::credentials-and-configuration}
After loading the `httpfs` extension, set up the credentials and S3 region to read data:
```sql
CREATE SECRET (
TYPE s3,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
REGION '⟨us-east-1⟩'
);
```
> **Tip.** If you get an IO Error (`Connection error for HTTP HEAD`), configure the endpoint explicitly via `ENDPOINT 's3.⟨your_region⟩.amazonaws.com'`{:.language-sql .highlight}.
Alternatively, use the [`aws` extension](#docs:stable:core_extensions:aws) to retrieve the credentials automatically:
```sql
CREATE SECRET (
TYPE s3,
PROVIDER credential_chain
);
```
#### Querying {#docs:stable:guides:network_cloud_storage:s3_import::querying}
After the `httpfs` extension is set up and the S3 configuration is set correctly, Parquet files can be read from S3 using the following command:
```sql
SELECT * FROM read_parquet('s3://⟨bucket⟩/⟨file⟩');
```
#### Google Cloud Storage (GCS) and Cloudflare R2 {#docs:stable:guides:network_cloud_storage:s3_import::google-cloud-storage-gcs-and-cloudflare-r2}
DuckDB can also handle [Google Cloud Storage (GCS)](#docs:stable:guides:network_cloud_storage:gcs_import) and [Cloudflare R2](#docs:stable:guides:network_cloud_storage:cloudflare_r2_import) via the S3 API.
See the relevant guides for details.
### S3 Parquet Export {#docs:stable:guides:network_cloud_storage:s3_export}
To write a Parquet file to S3, the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview) is required. This can be installed using the `INSTALL` SQL command. This only needs to be run once.
```sql
INSTALL httpfs;
```
To load the `httpfs` extension for usage, use the `LOAD` SQL command:
```sql
LOAD httpfs;
```
After loading the `httpfs` extension, set up the credentials to write data. Note that the `region` parameter should match the region of the bucket you want to access.
```sql
CREATE SECRET (
TYPE s3,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
REGION '⟨us-east-1⟩'
);
```
> **Tip.** If you get an IO Error (`Connection error for HTTP HEAD`), configure the endpoint explicitly via `ENDPOINT 's3.⟨your_region⟩.amazonaws.com'`{:.language-sql .highlight}.
Alternatively, use the [`aws` extension](#docs:stable:core_extensions:aws) to retrieve the credentials automatically:
```sql
CREATE SECRET (
TYPE s3,
PROVIDER credential_chain
);
```
After the `httpfs` extension is set up and the S3 credentials are correctly configured, Parquet files can be written to S3 using the following command:
```sql
COPY ⟨table_name⟩ TO 's3://⟨s3-bucket⟩/⟨filename⟩.parquet';
```
Similarly, Google Cloud Storage (GCS) is supported through the Interoperability API.
You need to create [HMAC keys](https://console.cloud.google.com/storage/settings;tab=interoperability) and provide the credentials as follows:
```sql
CREATE SECRET (
TYPE gcs,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩'
);
```
After setting up the GCS credentials, you can export using:
```sql
COPY ⟨table_name⟩ TO 'gs://⟨gcs_bucket⟩/⟨filename⟩.parquet';
```
### S3 Iceberg Import {#docs:stable:guides:network_cloud_storage:s3_iceberg_import}
#### Prerequisites {#docs:stable:guides:network_cloud_storage:s3_iceberg_import::prerequisites}
To load an Iceberg file from S3, both the [`httpfs`](#docs:stable:core_extensions:httpfs:overview) and [`iceberg`](#docs:stable:core_extensions:iceberg:overview) extensions are required. They can be installed using the `INSTALL` SQL command. The extensions only need to be installed once.
```sql
INSTALL httpfs;
INSTALL iceberg;
```
To load the extensions for usage, use the `LOAD` command:
```sql
LOAD httpfs;
LOAD iceberg;
```
#### Credentials {#docs:stable:guides:network_cloud_storage:s3_iceberg_import::credentials}
After loading the extensions, set up the credentials and S3 region to read data. You may either use an access key and secret, or a token.
```sql
CREATE SECRET (
TYPE s3,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
REGION '⟨us-east-1⟩'
);
```
Alternatively, use the [`aws` extension](#docs:stable:core_extensions:aws) to retrieve the credentials automatically:
```sql
CREATE SECRET (
TYPE s3,
PROVIDER credential_chain
);
```
#### Loading Iceberg Tables from S3 {#docs:stable:guides:network_cloud_storage:s3_iceberg_import::loading-iceberg-tables-from-s3}
After the extensions are set up and the S3 credentials are correctly configured, Iceberg tables can be read from S3 using the following command:
```sql
SELECT *
FROM iceberg_scan('s3://⟨bucket⟩/⟨iceberg_table_folder⟩/metadata/⟨id⟩.metadata.json');
```
Note that you need to link directly to the manifest file. Otherwise you'll get an error like this:
```console
IO Error:
Cannot open file "s3://bucket/iceberg_table_folder/metadata/version-hint.text": No such file or directory
```
### S3 Express One {#docs:stable:guides:network_cloud_storage:s3_express_one}
In late 2023, AWS [announced](https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-s3-express-one-zone-storage-class/) the [S3 Express One Zone](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-express-one-zone.html), a high-speed variant of traditional S3 buckets.
DuckDB can read S3 Express One buckets using the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview).
#### Credentials and Configuration {#docs:stable:guides:network_cloud_storage:s3_express_one::credentials-and-configuration}
The configuration of S3 Express One buckets is similar to [regular S3 buckets](#docs:stable:guides:network_cloud_storage:s3_import) with one exception:
we have to specify the endpoint according to the following pattern:
```sql
s3express-⟨availability_zone⟩.⟨region⟩.amazonaws.com
```
where the `⟨availability_zone⟩`{:.language-sql .highlight} (e.g., `use1-az5`) can be obtained from the S3 Express One bucket's configuration page and the `⟨region⟩`{:.language-sql .highlight} is the AWS region (e.g., `us-east-1`).
For example, to allow DuckDB to use an S3 Express One bucket, configure the [Secrets manager](#docs:stable:sql:statements:create_secret) as follows:
```sql
CREATE SECRET (
TYPE s3,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
REGION '⟨us-east-1⟩',
ENDPOINT 's3express-⟨use1-az5⟩.⟨us-east-1⟩.amazonaws.com'
);
```
#### Instance Location {#docs:stable:guides:network_cloud_storage:s3_express_one::instance-location}
For best performance, make sure that the EC2 instance is in the same availability zone as the S3 Express One bucket you are querying.
To determine the mapping between zone names and zone IDs, use the `aws ec2 describe-availability-zones` command.
* Zone name to zone ID mapping:
```bash
aws ec2 describe-availability-zones --output json \
| jq -r '.AvailabilityZones[] | select(.ZoneName == "us-east-1f") | .ZoneId'
```
```text
use1-az5
```
* Zone ID to zone name mapping:
```bash
aws ec2 describe-availability-zones --output json \
| jq -r '.AvailabilityZones[] | select(.ZoneId == "use1-az5") | .ZoneName'
```
```text
us-east-1f
```
#### Querying {#docs:stable:guides:network_cloud_storage:s3_express_one::querying}
You can query the S3 Express One bucket like any other S3 bucket:
```sql
SELECT *
FROM 's3://express-bucket-name--use1-az5--x-s3/my-file.parquet';
```
#### Performance {#docs:stable:guides:network_cloud_storage:s3_express_one::performance}
We ran two experiments on a `c7gd.12xlarge` instance using the [LDBC SF300 Comments `creationDate` Parquet file](https://blobs.duckdb.org/data/ldbc-sf300-comments-creationDate.parquet) (also used in the [microbenchmarks of the performance guide](#docs:stable:guides:performance:benchmarks::data-sets)).
| Experiment | File size | Runtime |
|:-----|--:|--:|
| Loading only from Parquet | 4.1 GB | 3.5 s |
| Creating local table from Parquet | 4.1 GB | 5.1 s |
The "loading only" variant runs the load as part of an [`EXPLAIN ANALYZE`](#docs:stable:guides:meta:explain_analyze) statement to measure the runtime without creating a local table, while the "creating local table" variant uses [`CREATE TABLE ... AS SELECT`](#docs:stable:sql:statements:create_table::create-table--as-select-ctas) to create a persistent table on the local disk.
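As a sketch, the two setups correspond roughly to the following statements (the bucket name is a placeholder):
```sql
-- Loading only: measure the scan without materializing a local table.
EXPLAIN ANALYZE
SELECT *
FROM 's3://⟨express-bucket⟩--use1-az5--x-s3/ldbc-sf300-comments-creationDate.parquet';

-- Creating a local table from the same file.
CREATE TABLE comments AS
SELECT *
FROM 's3://⟨express-bucket⟩--use1-az5--x-s3/ldbc-sf300-comments-creationDate.parquet';
```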
### Google Cloud Storage Import {#docs:stable:guides:network_cloud_storage:gcs_import}
#### Prerequisites {#docs:stable:guides:network_cloud_storage:gcs_import::prerequisites}
Google Cloud Storage (GCS) can be used via the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview).
This can be installed with the `INSTALL httpfs` SQL command. This only needs to be run once.
#### Credentials and Configuration {#docs:stable:guides:network_cloud_storage:gcs_import::credentials-and-configuration}
You need to create [HMAC keys](https://console.cloud.google.com/storage/settings;tab=interoperability) and declare them:
```sql
CREATE SECRET (
TYPE gcs,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩'
);
```
#### Querying {#docs:stable:guides:network_cloud_storage:gcs_import::querying}
After setting up the GCS credentials, you can query the GCS data using:
```sql
SELECT *
FROM read_parquet('gs://⟨gcs_bucket⟩/⟨file.parquet⟩');
```
#### Attaching to a Database {#docs:stable:guides:network_cloud_storage:gcs_import::attaching-to-a-database}
You can [attach to a database file](#docs:stable:guides:network_cloud_storage:duckdb_over_https_or_s3) in read-only mode:
```sql
LOAD httpfs;
ATTACH 'gs://⟨gcs_bucket⟩/⟨file.duckdb⟩' AS ⟨duckdb_database⟩ (READ_ONLY);
```
> Databases in Google Cloud Storage can only be attached in read-only mode.
### Cloudflare R2 Import {#docs:stable:guides:network_cloud_storage:cloudflare_r2_import}
#### Prerequisites {#docs:stable:guides:network_cloud_storage:cloudflare_r2_import::prerequisites}
For Cloudflare R2, the [S3 Compatibility API](https://developers.cloudflare.com/r2/api/s3/api/) allows you to use DuckDB's S3 support to read and write from R2 buckets.
This requires the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview), which can be installed using the `INSTALL` SQL command. This only needs to be run once.
#### Credentials and Configuration {#docs:stable:guides:network_cloud_storage:cloudflare_r2_import::credentials-and-configuration}
You will need to [generate an S3 auth token](https://developers.cloudflare.com/r2/api/s3/tokens/) and create an `R2` secret in DuckDB:
```sql
CREATE SECRET (
TYPE r2,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
ACCOUNT_ID '⟨your-33-character-hexadecimal-account-ID⟩'
);
```
#### Querying {#docs:stable:guides:network_cloud_storage:cloudflare_r2_import::querying}
After setting up the R2 credentials, you can query the R2 data using DuckDB's built-in methods, such as `read_csv` or `read_parquet`:
```sql
SELECT * FROM read_parquet('r2://⟨r2-bucket-name⟩/⟨file⟩');
```
### Attach to a DuckDB Database over HTTPS or S3 {#docs:stable:guides:network_cloud_storage:duckdb_over_https_or_s3}
You can establish a read-only connection to a DuckDB instance via HTTPS or the S3 API.
#### Prerequisites {#docs:stable:guides:network_cloud_storage:duckdb_over_https_or_s3::prerequisites}
This guide requires the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview), which can be installed using the `INSTALL httpfs` SQL command. This only needs to be run once.
#### Attaching to a Database over HTTPS {#docs:stable:guides:network_cloud_storage:duckdb_over_https_or_s3::attaching-to-a-database-over-https}
To connect to a DuckDB database via HTTPS, use the [`ATTACH` statement](#docs:stable:sql:statements:attach) as follows:
```sql
ATTACH 'https://blobs.duckdb.org/databases/stations.duckdb' AS stations_db;
```
> Since DuckDB version 1.1, the `ATTACH` statement creates a read-only connection to HTTP endpoints.
> In prior versions, it is necessary to use the `READ_ONLY` flag.
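On DuckDB versions prior to 1.1, the attach statement would look as follows (a sketch using the same database file):
```sql
ATTACH 'https://blobs.duckdb.org/databases/stations.duckdb' AS stations_db (READ_ONLY);
```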
Then, the database can be queried using:
```sql
SELECT count(*) AS num_stations
FROM stations_db.stations;
```
| num_stations |
|-------------:|
| 578 |
#### Attaching to a Database over the S3 API {#docs:stable:guides:network_cloud_storage:duckdb_over_https_or_s3::attaching-to-a-database-over-the-s3-api}
To connect to a DuckDB database via the S3 API, [configure the authentication](#docs:stable:guides:network_cloud_storage:s3_import::credentials-and-configuration) for your bucket (if required).
Then, use the [`ATTACH` statement](#docs:stable:sql:statements:attach) as follows:
```sql
ATTACH 's3://duckdb-blobs/databases/stations.duckdb' AS stations_db;
```
> Since DuckDB version 1.1, the `ATTACH` statement creates a read-only connection to HTTP endpoints.
> In prior versions, it is necessary to use the `READ_ONLY` flag.
The database can be queried using:
```sql
SELECT count(*) AS num_stations
FROM stations_db.stations;
```
| num_stations |
|-------------:|
| 578 |
> Connecting to S3-compatible APIs such as [Google Cloud Storage (`gs://`)](#docs:stable:guides:network_cloud_storage:gcs_import::attaching-to-a-database) is also supported.
#### Limitations {#docs:stable:guides:network_cloud_storage:duckdb_over_https_or_s3::limitations}
* Only read-only connections are allowed, writing the database via the HTTPS protocol or the S3 API is not possible.
### Fastly Object Storage Import {#docs:stable:guides:network_cloud_storage:fastly_object_storage_import}
#### Prerequisites {#docs:stable:guides:network_cloud_storage:fastly_object_storage_import::prerequisites}
For Fastly Object Storage, the [S3 Compatibility API](https://docs.fastly.com/products/object-storage) allows you to use DuckDB's S3 support to read and write from Fastly buckets.
This requires the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview), which can be installed using the `INSTALL` SQL command. This only needs to be run once.
#### Credentials and Configuration {#docs:stable:guides:network_cloud_storage:fastly_object_storage_import::credentials-and-configuration}
You will need to [generate an S3 auth token](https://docs.fastly.com/en/guides/working-with-object-storage#creating-an-object-storage-access-key) and create an `S3` secret in DuckDB:
```sql
CREATE SECRET my_secret (
TYPE s3,
KEY_ID '⟨AKIAIOSFODNN7EXAMPLE⟩',
SECRET '⟨wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY⟩',
URL_STYLE 'path',
REGION '⟨us-east⟩',
ENDPOINT '⟨us-east⟩.object.fastlystorage.app' -- see note below
);
```
* The `ENDPOINT` needs to point to the [Fastly endpoint for the region](https://docs.fastly.com/en/guides/working-with-object-storage#working-with-the-s3-compatible-api) you want to use (e.g., `eu-central.object.fastlystorage.app`).
* `REGION` must match the region used in `ENDPOINT`.
* `URL_STYLE` needs to use `path`.
#### Querying {#docs:stable:guides:network_cloud_storage:fastly_object_storage_import::querying}
After setting up the Fastly Object Storage credentials, you can query the data there using DuckDB's built-in methods, such as `read_csv` or `read_parquet`:
```sql
SELECT * FROM 's3://⟨fastly-bucket-name⟩/⟨file⟩.csv';
SELECT * FROM read_parquet('s3://⟨fastly-bucket-name⟩/⟨file⟩.parquet');
```
## Meta Queries {#guides:meta}
### Describe {#docs:stable:guides:meta:describe}
#### Describing a Table {#docs:stable:guides:meta:describe::describing-a-table}
In order to view the schema of a table, use the `DESCRIBE` statement (or its aliases `DESC` and `SHOW`) followed by the table name.
```sql
CREATE TABLE tbl (i INTEGER PRIMARY KEY, j VARCHAR);
DESCRIBE tbl;
SHOW tbl; -- equivalent to DESCRIBE tbl;
```
| column_name | column_type | null | key | default | extra |
|-------------|-------------|------|------|---------|-------|
| i | INTEGER | NO | PRI | NULL | NULL |
| j | VARCHAR | YES | NULL | NULL | NULL |
#### Describing a Query {#docs:stable:guides:meta:describe::describing-a-query}
In order to view the schema of the result of a query, prepend `DESCRIBE` to a query.
```sql
DESCRIBE SELECT * FROM tbl;
```
| column_name | column_type | null | key | default | extra |
|-------------|-------------|------|------|---------|-------|
| i | INTEGER | YES | NULL | NULL | NULL |
| j | VARCHAR | YES | NULL | NULL | NULL |
Note that there are subtle differences: compared to the result when [describing a table](#::describing-a-table), nullability (`null`) and key information (`key`) are lost.
#### Using `DESCRIBE` in a Subquery {#docs:stable:guides:meta:describe::using-describe-in-a-subquery}
`DESCRIBE` can be used in a subquery. This allows creating a table from the description, for example:
```sql
CREATE TABLE tbl_description AS SELECT * FROM (DESCRIBE tbl);
```
#### Describing Remote Tables {#docs:stable:guides:meta:describe::describing-remote-tables}
It is possible to describe remote tables via the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview) using the `DESCRIBE TABLE` statement. For example:
```sql
DESCRIBE TABLE 'https://blobs.duckdb.org/data/Star_Trek-Season_1.csv';
```
| column_name | column_type | null | key | default | extra |
|-----------------------------------------|-------------|------|------|---------|-------|
| season_num | BIGINT | YES | NULL | NULL | NULL |
| episode_num | BIGINT | YES | NULL | NULL | NULL |
| aired_date | DATE | YES | NULL | NULL | NULL |
| cnt_kirk_hookups | BIGINT | YES | NULL | NULL | NULL |
| cnt_downed_redshirts | BIGINT | YES | NULL | NULL | NULL |
| bool_aliens_almost_took_over_planet | BIGINT | YES | NULL | NULL | NULL |
| bool_aliens_almost_took_over_enterprise | BIGINT | YES | NULL | NULL | NULL |
| cnt_vulcan_nerve_pinch | BIGINT | YES | NULL | NULL | NULL |
| cnt_warp_speed_orders | BIGINT | YES | NULL | NULL | NULL |
| highest_warp_speed_issued | BIGINT | YES | NULL | NULL | NULL |
| bool_hand_phasers_fired | BIGINT | YES | NULL | NULL | NULL |
| bool_ship_phasers_fired | BIGINT | YES | NULL | NULL | NULL |
| bool_ship_photon_torpedos_fired | BIGINT | YES | NULL | NULL | NULL |
| cnt_transporter_pax | BIGINT | YES | NULL | NULL | NULL |
| cnt_damn_it_jim_quote | BIGINT | YES | NULL | NULL | NULL |
| cnt_im_givin_her_all_shes_got_quote | BIGINT | YES | NULL | NULL | NULL |
| cnt_highly_illogical_quote | BIGINT | YES | NULL | NULL | NULL |
| bool_enterprise_saved_the_day | BIGINT | YES | NULL | NULL | NULL |
### EXPLAIN: Inspect Query Plans {#docs:stable:guides:meta:explain}
```sql
EXPLAIN SELECT * FROM tbl;
```
The `EXPLAIN` statement displays the physical plan, i.e., the query plan that will get executed,
and is enabled by prepending the query with `EXPLAIN`.
The physical plan is a tree of operators that are executed in a specific order to produce the result of the query.
To generate an efficient physical plan, the query optimizer transforms the initial physical plan into a better, semantically equivalent one.
To demonstrate, see the example below:
```sql
CREATE TABLE students (name VARCHAR, sid INTEGER);
CREATE TABLE exams (eid INTEGER, subject VARCHAR, sid INTEGER);
INSERT INTO students VALUES ('Mark', 1), ('Joe', 2), ('Matthew', 3);
INSERT INTO exams VALUES (10, 'Physics', 1), (20, 'Chemistry', 2), (30, 'Literature', 3);
EXPLAIN
SELECT name
FROM students
JOIN exams USING (sid)
WHERE name LIKE 'Ma%';
```
```text
┌─────────────────────────────┐
│┌───────────────────────────┐│
││       Physical Plan       ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│            name           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         HASH_JOIN         │
│    ────────────────────   │
│           INNER           │
│         sid = sid         ├──────────────┐
│                           │              │
│           EC: 1           │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││           FILTER          │
│    ────────────────────   ││    ────────────────────   │
│           exams           ││     prefix(name, 'Ma')    │
│            sid            ││                           │
│                           ││           EC: 1           │
│           EC: 3           ││                           │
└───────────────────────────┘└─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │          SEQ_SCAN         │
                             │    ────────────────────   │
                             │          students         │
                             │            sid            │
                             │            name           │
                             │                           │
                             │ Filters: name>=Ma AND name│
                             │ <Mb AND name IS NOT NULL  │
                             │                           │
                             │           EC: 1           │
                             └───────────────────────────┘
```
## ODBC {#guides:odbc}
### ODBC 101: A Duck Themed Guide to ODBC {#docs:stable:guides:odbc:general}
> There are links throughout this page to the official [Microsoft ODBC documentation](https://learn.microsoft.com/en-us/sql/odbc/reference/odbc-programmer-s-reference?view=sql-server-ver16), which is a great resource for learning more about ODBC.
#### General Concepts {#docs:stable:guides:odbc:general::general-concepts}
* [Handles](#::handles)
* [Connecting](#::connecting)
* [Error Handling and Diagnostics](#::error-handling-and-diagnostics)
* [Buffers and Binding](#::buffers-and-binding)
##### Handles {#docs:stable:guides:odbc:general::handles}
A [handle](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/handles?view=sql-server-ver16) is a pointer to a specific ODBC object which is used to interact with the database. There are several different types of handles, each with a different purpose: the environment handle, the connection handle, the statement handle, and the descriptor handle. Handles are allocated using the [`SQLAllocHandle`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlallochandle-function?view=sql-server-ver16) function, which takes as input the type of handle to allocate and a pointer to the handle; the driver then creates a new handle of the specified type and returns it to the application.
The DuckDB ODBC driver has the following handle types.
###### Environment {#docs:stable:guides:odbc:general::environment}
| | |
|:--|:--------|
| **Handle name** | [Environment](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/environment-handles?view=sql-server-ver16) |
| **Type name** | `SQL_HANDLE_ENV` |
| **Description** | Manages the environment settings for ODBC operations, and provides a global context in which to access data. |
| **Use case** | Initializing ODBC, managing driver behavior, resource allocation |
| **Additional information** | Must be [allocated](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/allocating-the-environment-handle?view=sql-server-ver16) once per application upon starting, and freed at the end. |
###### Connection {#docs:stable:guides:odbc:general::connection}
| | |
|:--|:--------|
| **Handle name** | [Connection](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/connection-handles?view=sql-server-ver16) |
| **Type name** | `SQL_HANDLE_DBC` |
| **Description** | Represents a connection to a data source. Used to establish, manage, and terminate connections. Defines both the driver and the data source to use within the driver. |
| **Use case** | Establishing a connection to a database, managing the connection state |
| **Additional information** | Multiple connection handles can be [created](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/allocating-a-connection-handle-odbc?view=sql-server-ver16) as needed, allowing simultaneous connections to multiple data sources. *Note:* Allocating a connection handle does not establish a connection; the handle must be allocated first and is then used once the connection has been established. |
###### Statement {#docs:stable:guides:odbc:general::statement}
| | |
|:--|:--------|
| **Handle name** | [Statement](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/statement-handles?view=sql-server-ver16) |
| **Type name** | `SQL_HANDLE_STMT` |
| **Description** | Handles the execution of SQL statements, as well as the returned result sets. |
| **Use case** | Executing SQL queries, fetching result sets, managing statement options |
| **Additional information** | To facilitate the execution of concurrent queries, multiple handles can be [allocated](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/allocating-a-statement-handle-odbc?view=sql-server-ver16) per connection. |
###### Descriptor {#docs:stable:guides:odbc:general::descriptor}
| | |
|:--|:--------|
| **Handle name** | [Descriptor](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/descriptor-handles?view=sql-server-ver16) |
| **Type name** | `SQL_HANDLE_DESC` |
| **Description** | Describes the attributes of a data structure or parameter, and allows the application to specify the structure of data to be bound/retrieved. |
| **Use case** | Describing table structures, result sets, binding columns to application buffers |
| **Additional information** | Used in situations where data structures need to be explicitly defined, for example during parameter binding or result set fetching. They are automatically allocated when a statement is allocated, but can also be allocated explicitly. |
##### Connecting {#docs:stable:guides:odbc:general::connecting}
The first step is to connect to the data source so that the application can perform database operations. First the application must allocate an environment handle, and then a connection handle. The connection handle is then used to connect to the data source. There are two functions which can be used to connect to a data source, [`SQLDriverConnect`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqldriverconnect-function?view=sql-server-ver16) and [`SQLConnect`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlconnect-function?view=sql-server-ver16). The former is used to connect to a data source using a connection string, while the latter is used to connect to a data source using a DSN.
###### Connection String {#docs:stable:guides:odbc:general::connection-string}
A [connection string](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/connection-strings?view=sql-server-ver16) is a string which contains the information needed to connect to a data source. It is formatted as a semicolon-separated list of key-value pairs; however, DuckDB currently only utilizes the DSN and ignores the rest of the parameters.
###### DSN {#docs:stable:guides:odbc:general::dsn}
A DSN (_Data Source Name_) is a string that identifies a database. It can be a file path, URL, or a database name. For example: `C:\Users\me\duckdb.db` and `DuckDB` are both valid DSNs. More information on DSNs can be found on the [“Choosing a Data Source or Driver” page of the SQL Server documentation](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/choosing-a-data-source-or-driver?view=sql-server-ver16).
##### Error Handling and Diagnostics {#docs:stable:guides:odbc:general::error-handling-and-diagnostics}
All functions in ODBC return a code which represents the success or failure of the function. This allows for easy error handling, as the application can simply check the return code of each function call to determine if it was successful. When unsuccessful, the application can then use the [`SQLGetDiagRec`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlgetdiagrec-function?view=sql-server-ver16) function to retrieve the error information. The following table defines the [return codes](https://learn.microsoft.com/en-us/sql/odbc/reference/develop-app/return-codes-odbc?view=sql-server-ver16):
| Return code | Description |
|-------------------------|----------------------------------------------------|
| `SQL_SUCCESS` | The function completed successfully |
| `SQL_SUCCESS_WITH_INFO` | The function completed successfully, but additional information is available, including a warning |
| `SQL_ERROR` | The function failed |
| `SQL_INVALID_HANDLE` | The handle provided was invalid, indicating a programming error, e.g., a handle that was not allocated before use or that is of the wrong type |
| `SQL_NO_DATA` | The function completed successfully, but no more data is available |
| `SQL_NEED_DATA` | More data is needed, such as when parameter data is sent at execution time, or additional connection information is required |
| `SQL_STILL_EXECUTING` | A function that was asynchronously executed is still executing |
##### Buffers and Binding {#docs:stable:guides:odbc:general::buffers-and-binding}
A buffer is a block of memory used to store data. Buffers are used to store data retrieved from the database, or to send data to the database. Buffers are allocated by the application, and then bound to a column in a result set, or a parameter in a query, using the [`SQLBindCol`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlbindcol-function?view=sql-server-ver16) and [`SQLBindParameter`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqlbindparameter-function?view=sql-server-ver16) functions. When the application fetches a row from the result set, or executes a query, the data is stored in the buffer. When the application sends a query to the database, the data in the buffer is sent to the database.
#### Setting up an Application {#docs:stable:guides:odbc:general::setting-up-an-application}
The following is a step-by-step guide to setting up an application that uses ODBC to connect to a database, execute a query, and fetch the results in `C++`.
> To install the driver, as well as anything else you will need, follow these [instructions](#docs:stable:clients:odbc:overview).
##### 1. Include the SQL Header Files {#docs:stable:guides:odbc:general::1-include-the-sql-header-files}
The first step is to include the SQL header files:
```cpp
#include <sql.h>
#include <sqlext.h>
```
These files contain the definitions of the ODBC functions, as well as the data types used by ODBC. To use these header files, you must have the `unixodbc` package installed:
On macOS:
```batch
brew install unixodbc
```
On Ubuntu and Debian:
```batch
sudo apt-get install -y unixodbc-dev
```
On Fedora, CentOS, and Red Hat:
```batch
sudo yum install -y unixODBC-devel
```
Remember to include the header file location in your `CFLAGS`.
For `MAKEFILE`:
```make
CFLAGS=-I/usr/local/include
# or
CFLAGS=-I/opt/homebrew/Cellar/unixodbc/2.3.11/include
```
For `CMAKE`:
```cmake
include_directories(/usr/local/include)
# or
include_directories(/opt/homebrew/Cellar/unixodbc/2.3.11/include)
```
You also have to link the library in your `CMAKE` or `MAKEFILE`.
For `CMAKE`:
```cmake
target_link_libraries(ODBC_application /path/to/duckdb_odbc/libduckdb_odbc.dylib)
```
For `MAKEFILE`:
```make
LDLIBS=-L/path/to/duckdb_odbc/ -lduckdb_odbc
```
##### 2. Define the ODBC Handles and Connect to the Database {#docs:stable:guides:odbc:general::2-define-the-odbc-handles-and-connect-to-the-database}
###### 2.a. Connecting with SQLConnect {#docs:stable:guides:odbc:general::2a-connecting-with-sqlconnect}
Then set up the ODBC handles, allocate them, and connect to the database. First the environment handle is allocated, then the environment is set to ODBC version 3, then the connection handle is allocated, and finally the connection is made to the database. The following code snippet shows how to do this:
```cpp
SQLHANDLE env;
SQLHANDLE dbc;
SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (void*)SQL_OV_ODBC3, 0);
SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
std::string dsn = "DSN=duckdbmemory";
SQLConnect(dbc, (SQLCHAR*)dsn.c_str(), SQL_NTS, NULL, 0, NULL, 0);
std::cout << "Connected!" << std::endl;
```
###### 2.b. Connecting with SQLDriverConnect {#docs:stable:guides:odbc:general::2b-connecting-with-sqldriverconnect}
Alternatively, you can connect to the ODBC driver using [`SQLDriverConnect`](https://learn.microsoft.com/en-us/sql/odbc/reference/syntax/sqldriverconnect-function?view=sql-server-ver16).
`SQLDriverConnect` accepts a connection string in which you can configure the database using any of the available [DuckDB configuration options](#docs:stable:configuration:overview).
```cpp
SQLHANDLE env;
SQLHANDLE dbc;
SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (void*)SQL_OV_ODBC3, 0);
SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
SQLCHAR str[1024];
SQLSMALLINT strl;
std::string dsn = "DSN=DuckDB;access_mode=READ_ONLY";
SQLDriverConnect(dbc, nullptr, (SQLCHAR*)dsn.c_str(), SQL_NTS, str, sizeof(str), &strl, SQL_DRIVER_COMPLETE);
std::cout << "Connected!" << std::endl;
```
##### 3. Adding a Query {#docs:stable:guides:odbc:general::3-adding-a-query}
Now that the application is set up, we can add a query to it. First, we need to allocate a statement handle:
```cpp
SQLHANDLE stmt;
SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
```
Then we can execute a query:
```cpp
SQLExecDirect(stmt, (SQLCHAR*)"SELECT * FROM integers", SQL_NTS);
```
##### 4. Fetching Results {#docs:stable:guides:odbc:general::4-fetching-results}
Now that we have executed a query, we can fetch the results. First, we need to bind the columns in the result set to buffers:
```cpp
SQLLEN int_val;
SQLLEN null_val;
SQLBindCol(stmt, 1, SQL_C_SLONG, &int_val, 0, &null_val);
```
Then we can fetch the results:
```cpp
SQLFetch(stmt);
```
##### 5. Go Wild {#docs:stable:guides:odbc:general::5-go-wild}
Now that we have the results, we can do whatever we want with them. For example, we can print them:
```cpp
std::cout << "Value: " << int_val << std::endl;
```
or do any other processing we want, as well as execute more queries and perform other operations on the database, such as inserting, updating, or deleting data.
##### 6. Free the Handles and Disconnecting {#docs:stable:guides:odbc:general::6-free-the-handles-and-disconnecting}
Finally, we need to free the handles and disconnect from the database. First, we need to free the statement handle:
```cpp
SQLFreeHandle(SQL_HANDLE_STMT, stmt);
```
Then we need to disconnect from the database:
```cpp
SQLDisconnect(dbc);
```
And finally, we need to free the connection handle and the environment handle:
```cpp
SQLFreeHandle(SQL_HANDLE_DBC, dbc);
SQLFreeHandle(SQL_HANDLE_ENV, env);
```
Freeing the connection and environment handles can only be done after the connection to the database has been closed. Trying to free them before disconnecting from the database will result in an error.
#### Sample Application {#docs:stable:guides:odbc:general::sample-application}
The following is a sample application that includes a `cpp` file that connects to the database, executes a query, fetches the results, and prints them. It also disconnects from the database and frees the handles, and includes a function to check the return value of ODBC functions. It also includes a `CMakeLists.txt` file that can be used to build the application.
##### Sample `.cpp` File {#docs:stable:guides:odbc:general::sample-cpp-file}
```cpp
#include <iostream>
#include <sql.h>
#include <sqlext.h>
void check_ret(SQLRETURN ret, std::string msg) {
if (ret != SQL_SUCCESS && ret != SQL_SUCCESS_WITH_INFO) {
std::cout << ret << ": " << msg << " failed" << std::endl;
exit(1);
}
if (ret == SQL_SUCCESS_WITH_INFO) {
std::cout << ret << ": " << msg << " succeeded with info" << std::endl;
}
}
int main() {
SQLHANDLE env;
SQLHANDLE dbc;
SQLRETURN ret;
ret = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
check_ret(ret, "SQLAllocHandle(env)");
ret = SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (void*)SQL_OV_ODBC3, 0);
check_ret(ret, "SQLSetEnvAttr");
ret = SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
check_ret(ret, "SQLAllocHandle(dbc)");
std::string dsn = "DSN=duckdbmemory";
ret = SQLConnect(dbc, (SQLCHAR*)dsn.c_str(), SQL_NTS, NULL, 0, NULL, 0);
check_ret(ret, "SQLConnect");
std::cout << "Connected!" << std::endl;
SQLHANDLE stmt;
ret = SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
check_ret(ret, "SQLAllocHandle(stmt)");
ret = SQLExecDirect(stmt, (SQLCHAR*)"SELECT * FROM integers", SQL_NTS);
check_ret(ret, "SQLExecDirect(SELECT * FROM integers)");
SQLLEN int_val;
SQLLEN null_val;
ret = SQLBindCol(stmt, 1, SQL_C_SLONG, &int_val, 0, &null_val);
check_ret(ret, "SQLBindCol");
ret = SQLFetch(stmt);
check_ret(ret, "SQLFetch");
std::cout << "Value: " << int_val << std::endl;
ret = SQLFreeHandle(SQL_HANDLE_STMT, stmt);
check_ret(ret, "SQLFreeHandle(stmt)");
ret = SQLDisconnect(dbc);
check_ret(ret, "SQLDisconnect");
ret = SQLFreeHandle(SQL_HANDLE_DBC, dbc);
check_ret(ret, "SQLFreeHandle(dbc)");
ret = SQLFreeHandle(SQL_HANDLE_ENV, env);
check_ret(ret, "SQLFreeHandle(env)");
}
```
##### Sample `CMakeLists.txt` File {#docs:stable:guides:odbc:general::sample-cmakeliststxt-file}
```cmake
cmake_minimum_required(VERSION 3.25)
project(ODBC_Tester_App)
set(CMAKE_CXX_STANDARD 17)
include_directories(/opt/homebrew/Cellar/unixodbc/2.3.11/include)
add_executable(ODBC_Tester_App main.cpp)
target_link_libraries(ODBC_Tester_App /duckdb_odbc/libduckdb_odbc.dylib)
```
## Performance {#guides:performance}
### Performance Guide {#docs:stable:guides:performance:overview}
DuckDB aims to automatically achieve high performance by using well-chosen default configurations and having a forgiving architecture. Of course, there are still opportunities for tuning the system for specific workloads. The Performance Guide's pages contain guidelines and tips for achieving good performance when loading and processing data with DuckDB.
The guides include several microbenchmarks. You may find details about these on the [Benchmarks page](#docs:stable:guides:performance:benchmarks).
### Environment {#docs:stable:guides:performance:environment}
The environment where DuckDB is run has an obvious impact on performance. This page focuses on the effects of the hardware configuration and the operating system used.
#### Hardware Configuration {#docs:stable:guides:performance:environment::hardware-configuration}
##### CPU {#docs:stable:guides:performance:environment::cpu}
DuckDB works efficiently on both AMD64 (x86_64) and ARM64 (AArch64) CPU architectures.
##### Memory {#docs:stable:guides:performance:environment::memory}
> **Best practice.** Aim for 1-4 GB memory per thread.
###### Minimum Required Memory {#docs:stable:guides:performance:environment::minimum-required-memory}
As a rule of thumb, DuckDB requires a _minimum_ of 125 MB of memory per thread.
For example, if you use 8 threads, you need at least 1 GB of memory.
If you are working in a memory-constrained environment, consider [limiting the number of threads](#docs:stable:configuration:pragmas::threads), e.g., by issuing:
```sql
SET threads = 4;
```
###### Memory for Ideal Performance {#docs:stable:guides:performance:environment::memory-for-ideal-performance}
The amount of memory required for ideal performance depends on several factors, including the dataset size and the queries to execute.
Maybe surprisingly, the _queries_ have a larger effect on the memory requirement.
Workloads containing large joins over many-to-many tables yield large intermediate results and thus require more memory if their evaluation is to fit entirely in memory.
As an approximation, aggregation-heavy workloads require 1-2 GB memory per thread and join-heavy workloads require 3-4 GB memory per thread.
###### Larger-than-Memory Workloads {#docs:stable:guides:performance:environment::larger-than-memory-workloads}
DuckDB can process larger-than-memory workloads by spilling to disk.
This is possible thanks to _out-of-core_ support for grouping, joining, sorting and windowing operators.
Note that larger-than-memory workloads can be processed both in persistent mode and in in-memory mode as DuckDB still spills to disk in both modes.
##### Local Disk {#docs:stable:guides:performance:environment::local-disk}
**Disk type.**
DuckDB's disk-based mode is designed to work best with SSD and NVMe disks. While HDDs are supported, they will result in low performance, especially for write operations.
**Disk-based vs. in-memory storage.**
Counter-intuitively, using a disk-based DuckDB instance can be faster than an in-memory instance due to compression.
Read more in the [“How to Tune Workloads” page](#docs:stable:guides:performance:how_to_tune_workloads::persistent-vs-in-memory-tables).
**File systems.**
On Linux, [DuckDB performs best with the XFS file system](https://www.phoronix.com/review/linux-615-filesystems/5) but it also performs reasonably well with other file systems such as ext4.
On Windows, we recommend using NTFS and avoiding FAT32.
> **Note.** DuckDB databases have built-in checksums, so integrity checks from the file system are not required to prevent data corruption.
##### Network-Attached Disks {#docs:stable:guides:performance:environment::network-attached-disks}
**Cloud disks.** DuckDB runs well on network-backed cloud disks such as [AWS EBS](https://aws.amazon.com/ebs/) for both read-only and read-write workloads.
**Network-attached storage.**
Network-attached storage can serve DuckDB for read-only workloads.
However, _it is not recommended to run DuckDB in read-write mode on network-attached storage (NAS)._
These setups include [NFS](https://en.wikipedia.org/wiki/Network_File_System),
network drives such as [SMB](https://en.wikipedia.org/wiki/Server_Message_Block) and
[Samba](https://en.wikipedia.org/wiki/Samba_(software)).
Based on user reports, running read-write workloads on network-attached storage can result in slow and unpredictable performance,
as well as spurious errors caused by the underlying file system.
> **Warning.** Avoid running DuckDB in read-write mode on network-attached storage.
> **Best practice.** Fast disks are important if your workload is larger than memory and/or fast data loading is important. Only use network-backed disks if they are reliable (e.g., cloud disks) and guarantee high IO.
#### Operating System {#docs:stable:guides:performance:environment::operating-system}
We recommend using the latest stable version of operating systems: macOS, Windows, and Linux are all well-tested and DuckDB can run on them with high performance.
##### Linux {#docs:stable:guides:performance:environment::linux}
DuckDB runs on all mainstream Linux distributions released in the last ~5 years.
If you don't have a particular preference, we recommend using Ubuntu Linux LTS due to its stability and the fact that most of DuckDB's Linux test suite jobs run on Ubuntu workers.
###### glibc vs. musl libc {#docs:stable:guides:performance:environment::glibc-vs-musl-libc}
DuckDB can be built with both [glibc](https://www.gnu.org/software/libc/) (default) and [musl libc](https://www.musl-libc.org/) (see the [build guide](#docs:stable:dev:building:linux)).
However, note that DuckDB binaries built with musl libc have lower performance.
In practice, this can lead to a slowdown of more than 5× on compute-intensive workloads.
Therefore, it's recommended to use a Linux distribution with glibc for performance-oriented workloads when running DuckDB.
#### Memory Allocator {#docs:stable:guides:performance:environment::memory-allocator}
If you have a many-core CPU running on a system where DuckDB ships with [`jemalloc`](#docs:stable:core_extensions:jemalloc) as the default memory allocator, consider [enabling the allocator's background threads](#docs:stable:core_extensions:jemalloc::background-threads).
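As a minimal sketch, enabling them is a single configuration change; the `allocator_background_threads` setting shown here only takes effect on builds where jemalloc is the active allocator:
```sql
SET allocator_background_threads = true;
```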
### Data Import {#docs:stable:guides:performance:import}
#### Recommended Import Methods {#docs:stable:guides:performance:import::recommended-import-methods}
When importing data from other systems to DuckDB, there are several considerations to take into account.
We recommend importing using the following order:
1. For systems which are supported by a DuckDB scanner extension, it's preferable to use the scanner. DuckDB currently offers scanners for [MySQL](#docs:stable:guides:database_integration:mysql), [PostgreSQL](#docs:stable:guides:database_integration:postgres), and [SQLite](#docs:stable:guides:database_integration:sqlite).
2. If there is a bulk export feature in the data source system, export the data to Parquet or CSV format, then load it using DuckDB's [Parquet](#docs:stable:guides:file_formats:parquet_import) or [CSV loader](#docs:stable:guides:file_formats:csv_import).
3. If the approaches above are not applicable, consider using the DuckDB [appender](#docs:stable:data:appender), currently available in the C, C++, Go, Java, and Rust APIs.
#### Methods to Avoid {#docs:stable:guides:performance:import::methods-to-avoid}
If possible, avoid looping row-by-row (tuple-at-a-time) in favor of bulk operations.
Performing row-by-row inserts (even with prepared statements) is detrimental to performance and will result in slow load times.
> **Best practice.** Unless your data is small (<100k rows), avoid using inserts in loops.
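To make the contrast concrete, the following sketch compares the two patterns; the table and file names (`tbl`, `rows.csv`) are hypothetical:
```sql
-- Avoid: one INSERT per row, issued from an application-level loop.
-- INSERT INTO tbl VALUES (1, 'a');
-- INSERT INTO tbl VALUES (2, 'b');
-- ...

-- Prefer: a single bulk operation that reads the entire file at once.
INSERT INTO tbl
    SELECT * FROM read_csv('rows.csv');
-- or, equivalently:
COPY tbl FROM 'rows.csv';
```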
### Schema {#docs:stable:guides:performance:schema}
#### Types {#docs:stable:guides:performance:schema::types}
It is important to use the correct type for encoding columns (e.g., `BIGINT`, `DATE`, `DATETIME`). While it is always possible to use string types (`VARCHAR`, etc.) to encode more specific values, this is not recommended. Strings use more space and are slower to process in operations such as filtering, joins, and aggregation.
When loading CSV files, you may leverage the CSV reader's [auto-detection mechanism](#docs:stable:data:csv:auto_detection) to get the correct types for CSV inputs.
If you run in a memory-constrained environment, using smaller data types (e.g., `TINYINT`) can reduce the amount of memory and disk space required to complete a query. DuckDB's [bitpacking compression](https://duckdb.org/2022/10/28/lightweight-compression#bit-packing) means small values stored in larger data types will not take up larger sizes on disk, but they will take up more memory during processing.
> **Best practice.** Use the most restrictive types possible when creating columns. Avoid using strings for encoding more specific data items.
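To make this concrete, here is a small, hypothetical schema that follows the best practice by preferring specific types over `VARCHAR` columns:
```sql
CREATE TABLE sales (
    sale_id BIGINT,       -- instead of VARCHAR '1234567'
    sold_at TIMESTAMP,    -- instead of VARCHAR '2024-06-01 12:00:00'
    quantity SMALLINT,    -- a small value range fits a narrow integer type
    price DECIMAL(18, 2)  -- exact numeric instead of a formatted string
);
```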
##### Microbenchmark: Using Timestamps {#docs:stable:guides:performance:schema::microbenchmark-using-timestamps}
We illustrate the difference in aggregation speed using the [`creationDate` column of the LDBC Comment table on scale factor 300](https://blobs.duckdb.org/data/ldbc-sf300-comments-creationDate.parquet). This table has approx. 554 million unordered timestamp values. We run a simple aggregation query that returns the average day-of-the-month from the timestamps in two configurations.
First, we use a `DATETIME` to encode the values and run the query using the [`extract` datetime function](#docs:stable:sql:functions:timestamp):
```sql
SELECT avg(extract('day' FROM creationDate)) FROM Comment;
```
Second, we use the `VARCHAR` type and use string operations:
```sql
SELECT avg(CAST(creationDate[9:10] AS INTEGER)) FROM Comment;
```
The results of the microbenchmark are as follows:
| Column type | Storage size | Query time |
| ----------- | -----------: | ---------: |
| `DATETIME` | 3.3 GB | 0.9 s |
| `VARCHAR` | 5.2 GB | 3.9 s |
The results show that using the `DATETIME` value yields smaller storage sizes and faster processing.
##### Microbenchmark: Joining on Strings {#docs:stable:guides:performance:schema::microbenchmark-joining-on-strings}
We illustrate the difference caused by joining on different types by computing a self-join on the [LDBC Comment table at scale factor 100](https://blobs.duckdb.org/data/ldbc-sf100-comments.tar.zst). The table has 64-bit integer identifiers used as the `id` attribute of each row. We perform the following join operation:
```sql
SELECT count(*) AS count
FROM Comment c1
JOIN Comment c2 ON c1.ParentCommentId = c2.id;
```
In the first experiment, we use the correct (most restrictive) types, i.e., both the `id` and the `ParentCommentId` columns are defined as `BIGINT`.
In the second experiment, we define all columns with the `VARCHAR` type.
While the results of the queries are the same for both experiments, their runtimes vary significantly.
The results below show that joining on `BIGINT` columns is approx. 1.8× faster than performing the same join on `VARCHAR`-typed columns encoding the same value.
| Join column payload type | Join column schema type | Example value | Query time |
| ------------------------ | ----------------------- | ------------------ | ---------: |
| `BIGINT` | `BIGINT` | `70368755640078` | 1.2 s |
| `BIGINT` | `VARCHAR` | `'70368755640078'` | 2.1 s |
> **Best practice.** Avoid representing numeric values as strings, especially if you intend to perform operations such as joins on them.
#### Constraints {#docs:stable:guides:performance:schema::constraints}
DuckDB allows defining [constraints](#docs:stable:sql:constraints) such as `UNIQUE`, `PRIMARY KEY`, and `FOREIGN KEY`. These constraints can be beneficial for ensuring data integrity but they have a negative effect on load performance as they necessitate building indexes and performing checks. Moreover, they _very rarely improve the performance of queries_ as DuckDB does not rely on these indexes for join and aggregation operators (see [indexing](#docs:stable:guides:performance:indexing) for more details).
> **Best practice.** Do not define constraints unless your goal is to ensure data integrity.
##### Microbenchmark: The Effect of Primary Keys {#docs:stable:guides:performance:schema::microbenchmark-the-effect-of-primary-keys}
We illustrate the effect of using primary keys with the [LDBC Comment table at scale factor 300](https://blobs.duckdb.org/data/ldbc-sf300-comments.tar.zst).
This table has approx. 554 million entries.
In the first experiment, we create the schema *without* a primary key, then load the data.
In the second experiment, we create the schema *with* a primary key, then load the data.
In the third case, we create the schema *without* a primary key, load the data and then add the primary key constraint.
In all cases, we take the data from `.csv.gz` files, and measure the time required to perform the loading.
| Operation | Execution time |
|-----------------------------------------------|---------------:|
| Load with primary key | 461.6 s |
| Load without primary key | 121.0 s |
| Load without primary key then add primary key | 242.0 s |
For this dataset, primary keys will only have a (small) positive effect on highly selective queries such as when filtering on a single identifier.
Defining primary keys (or indexes) will not have an effect on join and aggregation operators.
> **Best practice.** For best bulk load performance, avoid primary key constraints.
> If they are required, define them after the bulk loading step.
### Indexing {#docs:stable:guides:performance:indexing}
DuckDB has two types of indexes: zonemaps and ART indexes.
#### Zonemaps {#docs:stable:guides:performance:indexing::zonemaps}
DuckDB automatically creates [zonemaps](https://en.wikipedia.org/wiki/Block_Range_Index) (also known as min-max indexes) for the columns of all [general-purpose data types](#docs:stable:sql:data_types:overview::general-purpose-data-types).
Operations like predicate pushdown into scan operators and computing aggregations use zonemaps.
If a filter criterion (like `WHERE column1 = 123`) is in use, DuckDB can skip any row group whose min-max range does not contain that filter value (e.g., it can omit a block with a min-max range of 1000 to 2000 when comparing for `= 123` or `< 400`).
##### The Effect of Ordering on Zonemaps {#docs:stable:guides:performance:indexing::the-effect-of-ordering-on-zonemaps}
The more ordered the data within a column, the more valuable the zonemap indexes will be.
For example, a column could contain a random number on every row in the worst case.
Then, DuckDB will likely be unable to skip any row groups.
If you query specific columns with selective filters, it is best to pre-order data by those columns when inserting it.
Even an imperfect ordering will still be helpful.
The best case of ordered data commonly arises with `DATETIME` columns.
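One simple way to achieve this is to sort the data when materializing it; the sketch below assumes a hypothetical `events` table that is frequently filtered on its `created_at` column:
```sql
-- Physically cluster the rows by the commonly filtered column,
-- so the zonemaps on created_at become highly selective.
CREATE TABLE events_ordered AS
    SELECT * FROM events ORDER BY created_at;
```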
##### Microbenchmark: The Effect of Ordering {#docs:stable:guides:performance:indexing::microbenchmark-the-effect-of-ordering}
For an example, let's repeat the [microbenchmark for timestamps](#docs:stable:guides:performance:schema::microbenchmark-using-timestamps) with an ordered timestamp column using an ascending order vs. an unordered one.
| Column type | Ordered | Storage size | Query time |
|---|---|--:|--:|
| `DATETIME` | yes | 1.3 GB | 0.6 s |
| `DATETIME` | no | 3.3 GB | 0.9 s |
The results show that simply keeping the column order allows for improved compression, yielding a 2.5× smaller storage size.
It also allows the computation to be 1.5× faster.
##### Ordered Integers {#docs:stable:guides:performance:indexing::ordered-integers}
Another practical way to exploit ordering is to use the `INTEGER` type with automatic increments rather than `UUID` for columns queried using selective filters.
In a scenario where a table contains out-of-order `UUID`s, DuckDB has to scan many row groups to find a specific `UUID` value.
An ordered `INTEGER` column allows skipping all row groups except those containing the value.
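One way to obtain such a column is a sequence-backed integer key; the names in the sketch below are hypothetical:
```sql
CREATE SEQUENCE event_id_seq;
CREATE TABLE events (
    id INTEGER DEFAULT nextval('event_id_seq'),  -- monotonically increasing, zonemap-friendly
    payload VARCHAR
);
```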
#### ART Indexes {#docs:stable:guides:performance:indexing::art-indexes}
DuckDB allows defining [Adaptive Radix Tree (ART) indexes](https://db.in.tum.de/~leis/papers/ART.pdf) in two ways.
First, such an index is created implicitly for columns with `PRIMARY KEY`, `FOREIGN KEY`, and `UNIQUE` [constraints](#docs:stable:guides:performance:schema::constraints).
Second, explicitly running the [`CREATE INDEX`](#docs:stable:sql:indexes) statement creates an ART index on the target column(s).
The tradeoffs of having an ART index on a column are as follows:
1. ART indexes enable constraint checking during changes (inserts, updates, and deletes).
2. Changes on indexed tables perform worse than their non-indexed counterparts.
That is because of index maintenance for these operations.
3. For some use cases, _single-column ART indexes_ improve the performance of highly selective queries using the indexed column.
An ART index does not affect the performance of join, aggregation, and sorting queries.
##### ART Index Scans {#docs:stable:guides:performance:indexing::art-index-scans}
ART index scans probe a single-column ART index for the requested data instead of scanning a table sequentially.
Probing can improve the performance of some queries.
DuckDB will try to use an index scan for equality and `IN(...)` conditions.
It also pushes dynamic filters, e.g., from hash joins, into the scan, allowing dynamic index scans on these filters.
Indexes are only eligible for index scans if they index a single column without expressions.
E.g., the following index is eligible for index scans:
```sql
CREATE INDEX idx ON tbl (col1);
```
E.g., the following two indexes are **NOT** eligible for index scans:
```sql
CREATE INDEX idx_multi_column ON tbl (col1, col2);
CREATE INDEX idx_expr ON tbl (col1 + 1);
```
The default threshold for index scans is `MAX(2048, 0.001 * table_cardinality)`.
You can configure this threshold via the `index_scan_percentage` and `index_scan_max_count` settings, or disable index scans by setting these values to zero.
When in doubt, use [`EXPLAIN ANALYZE`](#docs:stable:guides:meta:explain_analyze) to verify that your query plan uses the index scan.
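For example, assuming the setting names mentioned above, index scans can be disabled entirely as follows; afterwards, `EXPLAIN ANALYZE` should no longer show index scans in the plan:
```sql
SET index_scan_percentage = 0;
SET index_scan_max_count = 0;
```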
##### Indexes and Memory {#docs:stable:guides:performance:indexing::indexes-and-memory}
DuckDB registers index memory through its buffer manager.
However, these index buffers are not yet buffer-managed.
That means DuckDB does not yet destroy any index buffers if it has to evict memory.
Thus, indexes can take up a significant portion of DuckDB's available memory, potentially affecting the performance of memory-intensive queries.
Re-attaching (`DETACH` + `ATTACH`) the database containing indexes can mitigate this effect, as we deserialize index memory lazily.
Disabling index scans and re-attaching after changes can further decrease the impact of indexes on DuckDB's available memory.
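A re-attach cycle is a two-statement sketch, where `⟨my_db⟩` stands for the attached database that holds the indexes:
```sql
DETACH ⟨my_db⟩;
ATTACH '⟨my_db⟩.duckdb' AS ⟨my_db⟩;
```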
##### Indexes and Opening Databases {#docs:stable:guides:performance:indexing::indexes-and-opening-databases}
Indexes are serialized to disk and deserialized lazily, i.e., when reopening the database.
Operations using the index will only load the required parts of the index.
Therefore, having an index will not cause any slowdowns when opening an existing database.
> **Best practice.** We recommend following these guidelines:
>
> * Only use primary keys, foreign keys, or unique constraints, if these are necessary for enforcing constraints on your data.
> * Do not define explicit indexes unless you have highly selective queries and enough memory available.
> * If you define an ART index, do so after bulk loading the data to the table. Adding an index prior to loading, either explicitly or via primary/foreign keys, is [detrimental to load performance](#docs:stable:guides:performance:schema::microbenchmark-the-effect-of-primary-keys).
### Join Operations {#docs:stable:guides:performance:join_operations}
#### How to Force a Join Order {#docs:stable:guides:performance:join_operations::how-to-force-a-join-order}
DuckDB has a cost-based query optimizer, which uses statistics in the base tables (stored in a DuckDB database or Parquet files) to estimate the cardinality of operations.
##### Turn off the Join Order Optimizer {#docs:stable:guides:performance:join_operations::turn-off-the-join-order-optimizer}
To turn off the join order optimizer, set the following [`PRAGMA`s](#docs:stable:configuration:pragmas):
```sql
SET disabled_optimizers = 'join_order,build_side_probe_side';
```
This disables both the join order optimizer and left/right swapping for joins.
This way, DuckDB builds a left-deep join tree following the order of `JOIN` clauses.
```sql
SELECT ...
FROM ...
JOIN ... -- this join is performed first
JOIN ...; -- this join is performed second
```
Once the query in question has been executed, turn back the optimizers with the following command:
```sql
SET disabled_optimizers = '';
```
##### Create Temporary Tables {#docs:stable:guides:performance:join_operations::create-temporary-tables}
To force a particular join order, you can break up the query into multiple queries, with each one creating a temporary table:
```sql
CREATE OR REPLACE TEMPORARY TABLE t1 AS
...;
-- join on the result of the first query, t1
CREATE OR REPLACE TEMPORARY TABLE t2 AS
SELECT * FROM t1 ...;
-- compute the final result using t2
SELECT * FROM t2 ...;
```
To clean up, drop the interim tables:
```sql
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
```
### File Formats {#docs:stable:guides:performance:file_formats}
#### Handling Parquet Files {#docs:stable:guides:performance:file_formats::handling-parquet-files}
DuckDB has advanced support for Parquet files, which includes [directly querying Parquet files](https://duckdb.org/2021/06/25/querying-parquet).
When deciding on whether to query these files directly or to first load them to the database, you need to consider several factors.
##### Reasons for Querying Parquet Files {#docs:stable:guides:performance:file_formats::reasons-for-querying-parquet-files}
**Availability of basic statistics:** Parquet files use a columnar storage format and contain basic statistics such as [zonemaps](#docs:stable:guides:performance:indexing::zonemaps). Thanks to these features, DuckDB can leverage optimizations such as projection and filter pushdown on Parquet files. Therefore, workloads that combine projection, filtering, and aggregation tend to perform quite well when run on Parquet files.
**Storage considerations:** Loading the data from Parquet files will require approximately the same amount of space for the DuckDB database file. Therefore, if the available disk space is constrained, it is worth running the queries directly on Parquet files.
##### Reasons against Querying Parquet Files {#docs:stable:guides:performance:file_formats::reasons-against-querying-parquet-files}
**Lack of advanced statistics:** The DuckDB database format has the [hyperloglog statistics](https://en.wikipedia.org/wiki/HyperLogLog) that Parquet files do not have. These improve the accuracy of cardinality estimates, and are especially important if the queries contain a large number of join operators.
> **Tip.** If you find that DuckDB produces a suboptimal join order on Parquet files, try loading the Parquet files to DuckDB tables. The improved statistics likely help obtain a better join order.
**Repeated queries:** If you plan to run multiple queries on the same dataset, it is worth loading the data into DuckDB. The queries will always be somewhat faster, which over time amortizes the initial load time.
**High decompression times:** Some Parquet files are compressed using heavyweight compression algorithms such as gzip. In these cases, querying the Parquet files will necessitate an expensive decompression time every time the file is accessed. Meanwhile, lightweight compression methods like Snappy, LZ4, and zstd, are faster to decompress. You may use the [`parquet_metadata` function](#docs:stable:data:parquet:metadata::parquet-metadata) to find out the compression algorithm used.
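For instance, the compression codec of each column chunk can be inspected with a query along these lines (the file name is a placeholder):
```sql
SELECT DISTINCT compression
FROM parquet_metadata('⟨file⟩.parquet');
```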
###### Microbenchmark: Running TPC-H on a DuckDB Database vs. Parquet {#docs:stable:guides:performance:file_formats::microbenchmark-running-tpc-h-on-a-duckdb-database-vs-parquet}
The queries on the [TPC-H benchmark](#docs:stable:core_extensions:tpch) run approximately 1.1-5.0× slower on Parquet files than on a DuckDB database.
> **Best practice.** If you have the storage space available, and have a join-heavy workload and/or plan to run many queries on the same dataset, load the Parquet files into the database first. The compression algorithm and the row group sizes in the Parquet files have a large effect on performance: study these using the [`parquet_metadata` function](#docs:stable:data:parquet:metadata::parquet-metadata).
##### The Effect of Row Group Sizes {#docs:stable:guides:performance:file_formats::the-effect-of-row-group-sizes}
DuckDB works best on Parquet files with row groups of 100K-1M rows each. The reason for this is that DuckDB can only [parallelize over row groups](#docs:stable:guides:performance:how_to_tune_workloads::parallelism-multi-core-processing), so if a Parquet file has a single giant row group, it can only be processed by a single thread. You can use the [`parquet_metadata` function](#docs:stable:data:parquet:metadata::parquet-metadata) to figure out how many row groups a Parquet file has. When writing Parquet files, use the [`row_group_size`](#docs:stable:sql:statements:copy::parquet-options) option.
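When producing Parquet files with DuckDB, the row group size can be set directly on the `COPY` statement; `tbl` and the output path below are placeholders:
```sql
COPY tbl TO 'output.parquet' (FORMAT parquet, ROW_GROUP_SIZE 100000);
```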
###### Microbenchmark: Running Aggregation Query at Different Row Group Sizes {#docs:stable:guides:performance:file_formats::microbenchmark-running-aggregation-query-at-different-row-group-sizes}
We run a simple aggregation query over Parquet files using different row group sizes, selected between 960 and 1,966,080. The results are as follows.
| Row group size | Execution time |
|---------------:|---------------:|
| 960 | 8.77 s |
| 1920 | 8.95 s |
| 3840 | 4.33 s |
| 7680 | 2.35 s |
| 15360 | 1.58 s |
| 30720 | 1.17 s |
| 61440 | 0.94 s |
| 122880 | 0.87 s |
| 245760 | 0.93 s |
| 491520 | 0.95 s |
| 983040 | 0.97 s |
| 1966080 | 0.88 s |
The results show that row group sizes <5,000 have a strongly detrimental effect, making runtimes more than 5-10× larger than with ideally sized row groups, while row group sizes between 5,000 and 20,000 are still 1.5-2.5× slower than the best performance. Above a row group size of 100,000, the differences are small: the gap is about 10% between the best and the worst runtime.
##### Parquet File Sizes {#docs:stable:guides:performance:file_formats::parquet-file-sizes}
DuckDB can also parallelize across multiple Parquet files. It is advisable to have at least as many total row groups across all files as there are CPU threads. For example, with a machine having 10 threads, both 10 files with 1 row group or 1 file with 10 row groups will achieve full parallelism. It is also beneficial to keep the size of individual Parquet files moderate.
> **Best practice.** The ideal range is between 100 MB and 10 GB per individual Parquet file.
##### Hive Partitioning for Filter Pushdown {#docs:stable:guides:performance:file_formats::hive-partitioning-for-filter-pushdown}
When querying many files with filter conditions, performance can be improved by using a [Hive-format folder structure](#docs:stable:data:partitioning:hive_partitioning) to partition the data along the columns used in the filter condition. DuckDB will only need to read the folders and files that meet the filter criteria. This can be especially helpful when querying remote files.
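As a sketch, assuming a hypothetical layout such as `orders/year=2024/month=06/data.parquet`, a filtered read only touches the matching partition folders:
```sql
SELECT *
FROM read_parquet('orders/*/*/*.parquet', hive_partitioning = true)
WHERE year = 2024;
```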
##### More Tips on Reading and Writing Parquet Files {#docs:stable:guides:performance:file_formats::more-tips-on-reading-and-writing-parquet-files}
For tips on reading and writing Parquet files, see the [Parquet Tips page](#docs:stable:data:parquet:tips).
#### Loading CSV Files {#docs:stable:guides:performance:file_formats::loading-csv-files}
CSV files are often distributed in compressed format such as GZIP archives (`.csv.gz`). DuckDB can decompress these files on the fly. In fact, this is typically faster than decompressing the files first and then loading them, due to reduced IO.
| Load method | Load time |
|---|--:|
| Load from GZIP-compressed CSV files (`.csv.gz`) | 107.1 s |
| Decompressing (using parallel `gunzip`) and loading from decompressed CSV files | 121.3 s |
##### Loading Many Small CSV Files {#docs:stable:guides:performance:file_formats::loading-many-small-csv-files}
The [CSV reader](#docs:stable:data:csv:overview) runs the [CSV sniffer](https://duckdb.org/2023/10/27/csv-sniffer) on all files. For many small files, this may cause an unnecessarily high overhead.
A potential optimization to speed this up is to turn the sniffer off. Assuming that all files have the same CSV dialect and column names/types, get the sniffer options as follows:
```sql
.mode line
SELECT Prompt FROM sniff_csv('part-0001.csv');
```
```text
Prompt = FROM read_csv('file_path.csv', auto_detect=false, delim=',', quote='"', escape='"', new_line='\n', skip=0, header=true, columns={'hello': 'BIGINT', 'world': 'VARCHAR'});
```
Then, you can adjust the `read_csv` command, e.g., by applying [filename expansion (globbing)](#docs:stable:sql:functions:pattern_matching::globbing), and run it with the rest of the options detected by the sniffer:
```sql
FROM read_csv('part-*.csv', auto_detect=false, delim=',', quote='"', escape='"', new_line='\n', skip=0, header=true, columns={'hello': 'BIGINT', 'world': 'VARCHAR'});
```
### Tuning Workloads {#docs:stable:guides:performance:how_to_tune_workloads}
#### The `preserve_insertion_order` Option {#docs:stable:guides:performance:how_to_tune_workloads::the-preserve_insertion_order-option}
When importing or exporting datasets (from/to the Parquet or CSV formats) that are much larger than the available memory, an out-of-memory error may occur:
```console
Out of Memory Error: failed to allocate data of size ... (.../... used)
```
In these cases, consider setting the [`preserve_insertion_order` configuration option](#docs:stable:configuration:overview) to `false`:
```sql
SET preserve_insertion_order = false;
```
This allows the system to re-order any results that do not contain `ORDER BY` clauses, potentially reducing memory usage.
#### Parallelism (Multi-Core Processing) {#docs:stable:guides:performance:how_to_tune_workloads::parallelism-multi-core-processing}
##### The Effect of Row Groups on Parallelism {#docs:stable:guides:performance:how_to_tune_workloads::the-effect-of-row-groups-on-parallelism}
DuckDB parallelizes the workload based on _[row groups](#docs:stable:internals:storage::row-groups),_ i.e., groups of rows that are stored together at the storage level.
The default row group size in DuckDB's database format is 122,880 rows.
Parallelism starts at the level of row groups, therefore, for a query to run on _k_ threads, it needs to scan at least _k_ \* 122,880 rows.
The row group size can be specified as an option of the `ATTACH` statement:
```sql
ATTACH '/tmp/somefile.db' AS db (ROW_GROUP_SIZE 16384);
```
The [performance considerations when choosing `ROW_GROUP_SIZE` for Parquet files](#docs:stable:data:parquet:tips::selecting-a-row_group_size) apply verbatim to DuckDB's own database format.
##### Too Many Threads {#docs:stable:guides:performance:how_to_tune_workloads::too-many-threads}
Note that in certain cases DuckDB may launch _too many threads_ (e.g., due to HyperThreading), which can lead to slowdowns. In these cases, it's worth manually limiting the number of threads using [`SET threads = X`](#docs:stable:configuration:pragmas::threads).
#### Larger-than-Memory Workloads (Out-of-Core Processing) {#docs:stable:guides:performance:how_to_tune_workloads::larger-than-memory-workloads-out-of-core-processing}
A key strength of DuckDB is support for larger-than-memory workloads, i.e., it is able to process datasets that are larger than the available system memory (also known as _out-of-core processing_).
It can also run queries where the intermediate results cannot fit into memory.
This section explains the prerequisites, scope, and known limitations of larger-than-memory processing in DuckDB.
##### Spilling to Disk {#docs:stable:guides:performance:how_to_tune_workloads::spilling-to-disk}
Larger-than-memory workloads are supported by spilling to disk.
With the default configuration, DuckDB creates the `⟨database_file_name⟩.tmp` temporary directory (in persistent mode) or the `.tmp` directory (in in-memory mode). This directory can be changed using the [`temp_directory` configuration option](#docs:stable:configuration:pragmas::temp-directory-for-spilling-data-to-disk), e.g.:
```sql
SET temp_directory = '/path/to/temp_dir.tmp/';
```
##### Blocking Operators {#docs:stable:guides:performance:how_to_tune_workloads::blocking-operators}
Some operators cannot output a single row until the last row of their input has been seen.
These are called _blocking operators_ as they require their entire input to be buffered,
and are the most memory-intensive operators in relational database systems.
The main blocking operators are the following:
- _grouping:_ [`GROUP BY`](#docs:stable:sql:query_syntax:groupby)
- _joining:_ [`JOIN`](#docs:stable:sql:query_syntax:from::joins)
- _sorting:_ [`ORDER BY`](#docs:stable:sql:query_syntax:orderby)
- _windowing:_ [`OVER ... (PARTITION BY ... ORDER BY ...)`](#docs:stable:sql:functions:window_functions)
DuckDB supports larger-than-memory processing for all of these operators.
##### Limitations {#docs:stable:guides:performance:how_to_tune_workloads::limitations}
DuckDB strives to always complete workloads even if they are larger-than-memory.
That said, there are some limitations at the moment:
- If multiple blocking operators appear in the same query, DuckDB may still throw an out-of-memory exception due to the complex interplay of these operators.
- Some [aggregate functions](#docs:stable:sql:functions:aggregates), such as `list()` and `string_agg()`, do not support offloading to disk.
- [Aggregate functions that use sorting](#docs:stable:sql:functions:aggregates::order-by-clause-in-aggregate-functions) are holistic, i.e., they need all inputs before the aggregation can start. As DuckDB cannot yet offload some complex intermediate aggregate states to disk, these functions can cause an out-of-memory exception when run on large datasets.
- The `PIVOT` operation [internally uses the `list()` function](#docs:stable:sql:statements:pivot::internals), therefore it is subject to the same limitation.
#### Profiling {#docs:stable:guides:performance:how_to_tune_workloads::profiling}
If your queries are not performing as well as expected, it's worth studying their query plans:
- Use [`EXPLAIN`](#docs:stable:guides:meta:explain) to print the physical query plan without running the query.
- Use [`EXPLAIN ANALYZE`](#docs:stable:guides:meta:explain_analyze) to run and profile the query. This shows the CPU time spent in each step of the query. Note that due to multi-threading, the sum of the individual steps' times can exceed the total query processing time.
Query plans can point to the root of performance issues. A few general directions:
- Avoid nested loop joins in favor of hash joins.
- A scan that does not include a filter pushdown for a filter condition that is later applied performs unnecessary IO. Try rewriting the query to apply a pushdown.
- Bad join orders where the cardinality of an operator explodes to billions of tuples should be avoided at all costs.
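As a minimal sketch of both commands from the Python client (the table and query are made up for illustration):
```python
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE tbl AS SELECT range AS i FROM range(1000000)")

# Print the physical plan without running the query ...
print(con.sql("EXPLAIN SELECT count(*) FROM tbl WHERE i % 2 = 0"))
# ... then run the query and profile the time spent in each operator.
print(con.sql("EXPLAIN ANALYZE SELECT count(*) FROM tbl WHERE i % 2 = 0"))
```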
#### Prepared Statements {#docs:stable:guides:performance:how_to_tune_workloads::prepared-statements}
[Prepared statements](#docs:stable:sql:query_syntax:prepared_statements) can improve performance when running the same query many times, but with different parameters. When a statement is prepared, it completes several of the initial portions of the query execution process (parsing, planning, etc.) and caches their output. When it is executed, those steps can be skipped, improving performance. This is beneficial mostly for repeatedly running small queries (with a runtime of < 100ms) with different sets of parameters.
Note that it is not a primary design goal for DuckDB to quickly execute many small queries concurrently. Rather, it is optimized for running larger, less frequent queries.
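A minimal sketch of preparing a statement once and executing it with different parameters, using the Python client (the table and parameter values are made up):
```python
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE tbl AS SELECT range AS i FROM range(1000000)")

# Prepare the statement once ...
con.execute("PREPARE count_above AS SELECT count(*) AS n FROM tbl WHERE i > $1")

# ... then execute it repeatedly with different parameter values.
for threshold in (10, 1000, 500000):
    print(con.execute(f"EXECUTE count_above({threshold})").fetchone())
```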
#### Querying Remote Files {#docs:stable:guides:performance:how_to_tune_workloads::querying-remote-files}
DuckDB uses synchronous IO when reading remote files. This means that each DuckDB thread can make at most one HTTP request at a time. If a query must make many small requests over the network, increasing DuckDB's [`threads` setting](#docs:stable:configuration:pragmas::threads) to larger than the total number of CPU cores (approximately 2–5 times the number of CPU cores) can improve parallelism and performance.
##### Avoid Reading Unnecessary Data {#docs:stable:guides:performance:how_to_tune_workloads::avoid-reading-unnecessary-data}
The main bottleneck in workloads reading remote files is likely to be the IO. This means that minimizing the amount of unnecessarily read data can be highly beneficial.
Some basic SQL tricks can help with this:
- Avoid `SELECT *`. Instead, only select columns that are actually used. DuckDB will try to only download the data it actually needs.
- Apply filters on remote Parquet files when possible. DuckDB can use these filters to reduce the amount of data that is scanned.
- Either [sort](#docs:stable:sql:query_syntax:orderby) or [partition](#docs:stable:data:partitioning:partitioned_writes) data by columns that are regularly used for filters: this increases the effectiveness of the filters in reducing IO.
To inspect how much remote data is transferred for a query, [`EXPLAIN ANALYZE`](#docs:stable:guides:meta:explain_analyze) can be used to print out the total number of requests and total data transferred for queries on remote files.
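For example, a minimal sketch of these guidelines using the Python client and the NYC taxi Parquet file referenced later in this guide: only a single column is selected and a filter is pushed into the remote scan.
```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

# Only the selected column (and, where possible, only the row groups matching
# the filter) is transferred, instead of downloading the entire remote file.
con.sql("""
    SELECT trip_distance
    FROM 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet'
    WHERE trip_distance > 100
""").show()
```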
##### Caching {#docs:stable:guides:performance:how_to_tune_workloads::caching}
Starting with version 1.3.0, DuckDB supports caching remote data. To inspect the content of the external file cache, run:
```sql
FROM duckdb_external_file_cache();
```
#### Best Practices for Using Connections {#docs:stable:guides:performance:how_to_tune_workloads::best-practices-for-using-connections}
DuckDB will perform best when reusing the same database connection many times. Disconnecting and reconnecting on every query will incur some overhead, which can reduce performance when running many small queries. DuckDB also caches some data and metadata in memory, and that cache is lost when the last open connection is closed. Frequently, a single connection will work best, but a connection pool may also be used.
Using multiple connections can parallelize some operations, although it is typically not necessary. DuckDB does attempt to parallelize as much as possible within each individual query, but it is not possible to parallelize in all cases. Making multiple connections can process more operations concurrently. This can be more helpful if DuckDB is not CPU limited, but instead bottlenecked by another resource like network transfer speed.
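A minimal sketch contrasting the two patterns in the Python client (`my_database.duckdb` is a placeholder file name):
```python
import duckdb

# Preferred: reuse one connection for many queries, so DuckDB keeps its
# caches and metadata in memory between queries.
con = duckdb.connect("my_database.duckdb")
for i in range(1000):
    con.execute("SELECT 1 + ?", [i]).fetchall()
con.close()

# Works, but pays the connection overhead on every query and discards the caches.
for i in range(1000):
    con = duckdb.connect("my_database.duckdb")
    con.execute("SELECT 1 + ?", [i]).fetchall()
    con.close()
```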
#### Persistent vs. In-Memory Tables {#docs:stable:guides:performance:how_to_tune_workloads::persistent-vs-in-memory-tables}
DuckDB supports [lightweight compression techniques](https://duckdb.org/2022/10/28/lightweight-compression). By default, compression is only applied on persistent (on-disk) databases and not on in-memory tables.
In some cases, this can result in counter-intuitive performance results where queries are faster on on-disk tables than on in-memory ones. As an example, let's take Q1 of the [TPC-H workload](#docs:stable:core_extensions:tpch) on the SF30 dataset:
```sql
CALL dbgen(sf = 30);
.timer on
PRAGMA tpch(1);
```
We run this script using three DuckDB prompts:
| Database setup | DuckDB prompt | Execution time |
| --------------------------- | ----------------------------------------------------------- | -------------: |
| In-memory DB (uncompressed) | `duckdb` | 4.22 s |
| In-memory DB (compressed) | `duckdb -cmd "ATTACH ':memory:' AS db (COMPRESS); USE db;"` | 0.55 s |
| Persistent DB (compressed) | `duckdb tpch-sf30.db` | 0.56 s |
We can observe that the compressed databases are about 8× faster than the uncompressed in-memory database.
### My Workload Is Slow {#docs:stable:guides:performance:my_workload_is_slow}
If you find that your workload in DuckDB is slow, we recommend performing the following checks. More detailed instructions are linked for each point.
1. Do you have enough memory? DuckDB works best if you have [1-4 GB memory per thread](#docs:stable:guides:performance:environment::cpu-and-memory).
1. Are you using a fast disk? Network-attached disks (such as cloud block storage) cause write-intensive and [larger-than-memory](#docs:stable:guides:performance:how_to_tune_workloads::spilling-to-disk) workloads to slow down. For running such workloads in cloud environments, it is recommended to use instance-attached storage (NVMe SSDs).
1. Are you using indexes or constraints (primary key, unique, etc.)? If possible, try [disabling them](#docs:stable:guides:performance:schema::indexing), which boosts load and update performance.
1. Are you using the correct types? For example, [use `TIMESTAMP` to encode datetime values](#docs:stable:guides:performance:schema::types).
1. Are you reading from Parquet files? If so, do they have [row group sizes between 100k and 1M](#docs:stable:guides:performance:file_formats::the-effect-of-row-group-sizes) and file sizes between 100 MB and 10 GB?
1. Does the query plan look right? Study it with [`EXPLAIN`](#docs:stable:guides:performance:how_to_tune_workloads::profiling).
1. Is the workload running [in parallel](#docs:stable:guides:performance:how_to_tune_workloads::parallelism)? Use `htop` or the operating system's task manager to observe this.
1. Is DuckDB using too many threads? Try [limiting the amount of threads](#docs:stable:guides:performance:how_to_tune_workloads::parallelism-multi-core-processing).
Are you aware of other common issues? If so, please click the _Report content issue_ link below and describe them along with their workarounds.
### Benchmarks {#docs:stable:guides:performance:benchmarks}
For several of the recommendations in our performance guide, we use microbenchmarks to back up our claims. For these benchmarks, we use datasets from the [TPC-H benchmark](#docs:stable:core_extensions:tpch) and the [LDBC Social Network Benchmark's BI workload](https://github.com/ldbc/ldbc_snb_bi/blob/main/snb-bi-pre-generated-data-sets.md#compressed-csvs-in-the-composite-merged-fk-format).
#### Datasets {#docs:stable:guides:performance:benchmarks::datasets}
Some of the microbenchmarks use the [LDBC BI SF300 dataset's Comment table](https://blobs.duckdb.org/data/ldbc-sf300-comments.tar.zst) (20 GB `.tar.zst` archive, 21 GB when decompressed into `.csv.gz` files),
while others use the same table's [`creationDate` column](https://blobs.duckdb.org/data/ldbc-sf300-comments-creationDate.parquet) (4 GB `.parquet` file).
The TPC datasets used in the benchmark are generated with the DuckDB [tpch extension](#docs:stable:core_extensions:tpch).
#### A Note on Benchmarks {#docs:stable:guides:performance:benchmarks::a-note-on-benchmarks}
Running [fair benchmarks is difficult](https://hannes.muehleisen.org/publications/DBTEST2018-performance-testing.pdf), especially when performing system-to-system comparison.
When running benchmarks on DuckDB, please make sure you are using the latest version (preferably the [preview build](https://duckdb.org/install/index.html?version=main)).
If in doubt about your benchmark results, feel free to contact us at `[email protected]`.
#### Disclaimer on Benchmarks {#docs:stable:guides:performance:benchmarks::disclaimer-on-benchmarks}
Note that the benchmark results presented in this guide do not constitute official TPC or LDBC benchmark results. Instead, they merely use the datasets of and some queries provided by the TPC-H and the LDBC BI benchmark frameworks, and omit other parts of the workloads such as updates.
### Working with Huge Databases {#docs:stable:guides:performance:working_with_huge_databases}
This page contains information for working with huge DuckDB database files.
While most DuckDB databases are well below 1 TB,
in our [2024 user survey](https://duckdb.org/2024/10/04/duckdb-user-survey-analysis#dataset-sizes), 1% of respondents used DuckDB files of 2 TB or more (corresponding to roughly 10 TB of CSV files).
DuckDB's [native database format](#docs:stable:internals:storage) supports huge database files without any practical restrictions, however, there are a few things to keep in mind when working with huge database files.
1. Object storage systems have lower limits on file sizes than block-based storage systems. For example, [AWS S3 limits the file size to 5 TB](https://aws.amazon.com/s3/faqs/).
2. Checkpointing a DuckDB database can be slow. For example, checkpointing after adding a few rows to a table in the [TPC-H](#docs:stable:core_extensions:tpch) SF1000 database takes approximately 5 seconds.
3. On block-based storage, the file system has a big effect on performance when working with large files. On Linux, DuckDB performs best with XFS on large files.
For storing large amounts of data, consider using the [DuckLake lakehouse format](https://ducklake.select/).
## Python {#guides:python}
### Installing the Python Client {#docs:stable:guides:python:install}
#### Installing via Pip {#docs:stable:guides:python:install::installing-via-pip}
The latest release of the Python client can be installed using `pip`.
```batch
pip install duckdb
```
The pre-release Python client (known as the “preview” or “nightly” build) can be installed using `--pre`.
```batch
pip install duckdb --upgrade --pre
```
#### Installing from Source {#docs:stable:guides:python:install::installing-from-source}
The latest Python client can be installed from source from the [`tools/pythonpkg` directory in the DuckDB GitHub repository](https://github.com/duckdb/duckdb/tree/main/tools/pythonpkg).
```bash
BUILD_PYTHON=1 GEN=ninja make
cd tools/pythonpkg
python setup.py install
```
For detailed instructions on how to compile DuckDB from source, see the [Building guide](#docs:stable:dev:building:python).
### Executing SQL in Python {#docs:stable:guides:python:execute_sql}
SQL queries can be executed using the `duckdb.sql` function.
```python
import duckdb
duckdb.sql("SELECT 42").show()
```
By default this will create a relation object. The result can be converted to various formats using the result conversion functions. For example, the `fetchall` method can be used to convert the result to Python objects.
```python
results = duckdb.sql("SELECT 42").fetchall()
print(results)
```
```text
[(42,)]
```
Several other result conversions exist. For example, you can use `df` to convert the result to a Pandas DataFrame.
```python
results = duckdb.sql("SELECT 42").df()
print(results)
```
```text
42
0 42
```
By default, a global in-memory connection will be used. Any data stored in memory will be lost after shutting down the program. A connection to a persistent database can be created using the `connect` function.
After connecting, SQL queries can be executed using the `sql` command.
```python
con = duckdb.connect("file.db")
con.sql("CREATE TABLE integers (i INTEGER)")
con.sql("INSERT INTO integers VALUES (42)")
con.sql("SELECT * FROM integers").show()
```
### Jupyter Notebooks {#docs:stable:guides:python:jupyter}
DuckDB's Python client can be used directly in Jupyter notebooks with no additional configuration if desired.
However, additional libraries can be used to simplify SQL query development.
This guide will describe how to utilize those additional libraries.
See other guides in the Python section for how to use DuckDB and Python together.
In this example, we use the [JupySQL](https://github.com/ploomber/jupysql) package. This example workflow is also available as a [Google Colab notebook](https://colab.research.google.com/drive/1bNfU8xRTu8MQJnCbyyDRxvptklLb0ExH?usp=sharing).
#### Library Installation {#docs:stable:guides:python:jupyter::library-installation}
Four additional libraries improve the DuckDB experience in Jupyter notebooks.
1. [jupysql](https://github.com/ploomber/jupysql): Convert a Jupyter code cell into a SQL cell
2. [Pandas](https://github.com/pandas-dev/pandas): Clean table visualizations and compatibility with other analysis
3. [matplotlib](https://github.com/matplotlib/matplotlib): Plotting with Python
4. [duckdb-engine (DuckDB SQLAlchemy driver)](https://github.com/Mause/duckdb_engine): Used by SQLAlchemy to connect to DuckDB (optional)
Run these `pip install` commands from the command line if Jupyter Notebook is not yet installed. Otherwise, see the Google Colab link above for an in-notebook example:
```batch
pip install duckdb
```
Install Jupyter Notebook:
```batch
pip install notebook
```
Or JupyterLab:
```batch
pip install jupyterlab
```
Install supporting libraries:
```batch
pip install jupysql pandas matplotlib duckdb-engine
```
#### Library Import and Configuration {#docs:stable:guides:python:jupyter::library-import-and-configuration}
Open a Jupyter Notebook and import the relevant libraries.
Set configurations on jupysql to directly output data to Pandas and to simplify the output that is printed to the notebook.
```python
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False
```
##### Connecting to DuckDB Natively {#docs:stable:guides:python:jupyter::connecting-to-duckdb-natively}
To connect to DuckDB, run:
```python
import duckdb
import pandas as pd
%load_ext sql
conn = duckdb.connect()
%sql conn --alias duckdb
```
> **Warning.** [Variables](#docs:stable:sql:statements:set_variable) are not recognized within a native DuckDB connection.
##### Connecting to DuckDB via SQLAlchemy {#docs:stable:guides:python:jupyter::connecting-to-duckdb-via-sqlalchemy}
Alternatively, you can connect to DuckDB via SQLAlchemy using `duckdb_engine`. See the [performance and feature differences](https://jupysql.ploomber.io/en/latest/tutorials/duckdb-native-sqlalchemy.html).
```python
import duckdb
import pandas as pd
# No need to import duckdb_engine
# jupysql will auto-detect the driver needed based on the connection string!
# Import jupysql Jupyter extension to create SQL cells
%load_ext sql
```
Either connect to a new [in-memory DuckDB](#docs:stable:clients:python:dbapi::in-memory-connection), the [default connection](#docs:stable:clients:python:dbapi::default-connection) or a file-backed database:
```sql
%sql duckdb:///:memory:
```
```sql
%sql duckdb:///:default:
```
```sql
%sql duckdb:///path/to/file.db
```
> The `%sql` command and `duckdb.sql` share the same [default connection](#docs:stable:clients:python:dbapi) if you provide `duckdb:///:default:` as the SQLAlchemy connection string.
#### Querying DuckDB {#docs:stable:guides:python:jupyter::querying-duckdb}
Single line SQL queries can be run using `%sql` at the start of a line. Query results will be displayed as a Pandas DataFrame.
```sql
%sql SELECT 'Off and flying!' AS a_duckdb_column;
```
An entire Jupyter cell can be used as a SQL cell by placing `%%sql` at the start of the cell. Query results will be displayed as a Pandas DataFrame.
```sql
%%sql
SELECT
schema_name,
function_name
FROM duckdb_functions()
ORDER BY ALL DESC
LIMIT 5;
```
To store the query results in a Python variable, use `<<` as an assignment operator.
This can be used with both the `%sql` and `%%sql` Jupyter magics.
```sql
%sql res << SELECT 'Off and flying!' AS a_duckdb_column;
```
If the `%config SqlMagic.autopandas = True` option is set, the variable is a Pandas DataFrame; otherwise, it is a `ResultSet` that can be converted to Pandas with the `DataFrame()` function.
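A minimal sketch of the latter case (assuming the `sql` extension is loaded as shown above; the column name is arbitrary):
```python
%config SqlMagic.autopandas = False
res = %sql SELECT 42 AS the_answer
df = res.DataFrame()  # convert the jupysql ResultSet into a Pandas DataFrame
```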
#### Querying Pandas Dataframes {#docs:stable:guides:python:jupyter::querying-pandas-dataframes}
DuckDB is able to find and query any dataframe stored as a variable in the Jupyter notebook.
```python
input_df = pd.DataFrame.from_dict({"i": [1, 2, 3],
"j": ["one", "two", "three"]})
```
The dataframe being queried can be specified just like any other table in the `FROM` clause.
```sql
%sql output_df << SELECT sum(i) AS total_i FROM input_df;
```
> **Warning.** When using the SQLAlchemy connection, and DuckDB >= 1.1.0, make sure to run `%sql SET python_scan_all_frames=true`, to make Pandas dataframes queryable.
#### Visualizing DuckDB Data {#docs:stable:guides:python:jupyter::visualizing-duckdb-data}
The most common way to plot datasets in Python is to load them using Pandas and then use matplotlib or seaborn for plotting.
This approach requires loading all data into memory which is highly inefficient.
The plotting module in JupySQL runs computations in the SQL engine.
This delegates memory management to the engine and ensures that intermediate computations do not keep eating up memory, efficiently plotting massive datasets.
##### Boxplot & Histogram {#docs:stable:guides:python:jupyter::boxplot--histogram}
To create a boxplot, call `%sqlplot boxplot`, passing the name of the table and the column to plot.
In this case, the name of the table is the path of the locally stored Parquet file.
```python
from urllib.request import urlretrieve
_ = urlretrieve(
"https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet",
"yellow_tripdata_2021-01.parquet",
)
%sqlplot boxplot --table yellow_tripdata_2021-01.parquet --column trip_distance
```

##### Install and Load DuckDB httpfs Extension {#docs:stable:guides:python:jupyter::install-and-load-duckdb-httpfs-extension}
DuckDB's [httpfs extension](#docs:stable:core_extensions:httpfs:overview) allows Parquet and CSV files to be queried remotely over http.
These examples query a Parquet file that contains historical taxi data from NYC.
Using the Parquet format allows DuckDB to only pull the rows and columns into memory that are needed rather than downloading the entire file.
DuckDB can be used to process local [Parquet files](#docs:stable:data:parquet:overview) as well, which may be desirable if querying the entire Parquet file, or running multiple queries that require large subsets of the file.
```sql
%%sql
INSTALL httpfs;
LOAD httpfs;
```
Now, create a query that filters by the 90th percentile.
Note the use of the `--save` and `--no-execute` options.
This tells JupySQL to store the query, but skips execution. It will be referenced in the next plotting call.
```sql
%%sql --save short_trips --no-execute
SELECT *
FROM 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet'
WHERE trip_distance < 6.3
```
To create a histogram, call `%sqlplot histogram` and pass the name of the table, the column to plot, and the number of bins.
This uses `--with short_trips` so JupySQL uses the query defined previously and therefore only plots a subset of the data.
```python
%sqlplot histogram --table short_trips --column trip_distance --bins 10 --with short_trips
```

#### Summary {#docs:stable:guides:python:jupyter::summary}
You now have the ability to alternate between SQL and Pandas in a simple and highly performant way! You can plot massive datasets directly through the engine (avoiding both the download of the entire file and loading all of it into Pandas in memory). Dataframes can be read as tables in SQL, and SQL results can be output into Dataframes. Happy analyzing!
An alternative to `jupysql` is [`magic_duckdb`](https://github.com/iqmo-org/magic_duckdb).
### marimo Notebooks {#docs:stable:guides:python:marimo}
[marimo](https://github.com/marimo-team/marimo) is an open-source reactive
notebook for Python and SQL that's tightly integrated with DuckDB's Python
client, letting you mix and match Python and SQL in a single git-versionable
notebook. Unlike traditional notebooks, when you run a cell or interact with a
UI element, marimo automatically (or lazily) runs affected cells, keeping code
and outputs consistent. Its integration with DuckDB makes it well-suited to
interactively working with data, and its representation as a Python file makes
it simple to run notebooks as scripts.
#### Installation {#docs:stable:guides:python:marimo::installation}
To get started, install marimo and DuckDB from your terminal:
```batch
pip install "marimo[sql]" # or uv add "marimo[sql]"
```
Install supporting libraries:
```batch
pip install "polars[pyarrow]" # or uv add "polars[pyarrow]"
```
Run a tutorial:
```batch
marimo tutorial sql
```
#### SQL in marimo {#docs:stable:guides:python:marimo::sql-in-marimo}
Create a notebook from your terminal with `marimo edit notebook.py`. Create SQL
cells in one of three ways:
1. Right-click the **+** button and pick **SQL cell**
2. Convert any empty cell to SQL via the cell menu
3. Hit the SQL button at the bottom of your notebook

In marimo, SQL cells give the appearance of writing SQL while being serialized as standard Python code using the `mo.sql()` function, which keeps your notebook as pure Python code without requiring special syntax or magic commands.
```python
df = mo.sql(f"SELECT 'Off and flying!' AS a_duckdb_column")
```
This is because marimo stores notebooks as pure Python, [for many reasons](https://marimo.io/blog/python-not-json), such as git-friendly diffs and running notebooks as Python scripts.
The SQL statement itself is an f-string, letting you interpolate Python values into the query with `{}` (shown later). In particular, this means your SQL queries can depend on the values of UI elements or other Python values, all part of marimo's dataflow graph.
> **Warning.** Heads up!
> If you have user-generated content going into the SQL queries, be sure to sanitize your inputs to prevent SQL injection.
#### Connecting a Custom DuckDB Connection {#docs:stable:guides:python:marimo::connecting-a-custom-duckdb-connection}
To connect to a custom DuckDB connection instead of using the default global connection, create a cell and create a DuckDB connection as a Python variable:
```python
import duckdb
# Create a DuckDB connection
conn = duckdb.connect("path/to/my/duckdb.db")
```
marimo automatically discovers the connection and lets you select it in the SQL cell's connection dropdown.

Custom connection
#### Database, Schema, and Table Auto-Discovery {#docs:stable:guides:python:marimo::database-schema-and-table-auto-discovery}
marimo introspects connections and displays the databases, schemas, tables, and columns in the Data Sources panel. This panel lets you quickly navigate your schemas to pull tables and columns into your SQL queries.

Data Sources Panel
#### Reference a Local Dataframe {#docs:stable:guides:python:marimo::reference-a-local-dataframe}
Reference a local dataframe in your SQL cell by using the name of the
Python variable that holds the dataframe. If you have a database connection
with a table of the same name, the database table will be used instead.
```python
import polars as pl
df = pl.DataFrame({"column": [1, 2, 3]})
```
```sql
SELECT * FROM df WHERE column > 2
```
#### Reference the Output of a SQL Cell {#docs:stable:guides:python:marimo::reference-the-output-of-a-sql-cell}
Defining a non-private (non-underscored) output variable in the SQL cell allows you to reference the resulting dataframe in other Python and SQL cells.

Reference the SQL result in Python
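A minimal sketch (each snippet would live in its own notebook cell; `answer_df` is an arbitrary name):
```python
import marimo as mo

# SQL cell: marimo serializes it as a call to mo.sql(); the non-underscored
# variable `answer_df` holds the result of the query.
answer_df = mo.sql(f"SELECT 42 AS answer")

# Any other cell can now reference `answer_df` like a regular dataframe.
answer_df["answer"]
```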
#### Reactive SQL Cells {#docs:stable:guides:python:marimo::reactive-sql-cells}
marimo allows you to create reactive SQL cells that automatically update when their dependencies change. **Working with expensive queries or large datasets?** You can configure marimo's runtime to be “lazy”. By doing so, dependent cells are only marked as stale, letting the user choose when they should be re-run.
```python
digits = mo.ui.slider(label="Digits", start=100, stop=10000, step=200)
digits
```
```sql
CREATE TABLE random_data AS
SELECT i AS id, random() AS random_value,
FROM range({digits.value}) AS t(i);
SELECT * FROM random_data;
```
Interacting with UI elements, like a slider, makes your data more tangible.

#### DuckDB-Powered OLAP Analytics in marimo {#docs:stable:guides:python:marimo::duckdb-powered-olap-analytics-in-marimo}
marimo provides several features that work well with DuckDB for analytical workflows:
* Seamless integration between Python and SQL
* Reactive execution that automatically updates dependent cells when queries change
* Interactive UI elements that can be used to parameterize SQL queries
* Ability to export notebooks as standalone applications or Python scripts, or even run entirely in the browser [with WebAssembly](https://docs.marimo.io/guides/wasm/).
#### Next Steps {#docs:stable:guides:python:marimo::next-steps}
* Read the [marimo docs](https://docs.marimo.io/).
* Try the SQL tutorial: `marimo tutorial sql`.
* The code for this guide is [available on GitHub](https://github.com/marimo-team/marimo/blob/main/examples/sql/duckdb_example.py). Run it with `marimo edit ⟨github_url⟩`.
### SQL on Pandas {#docs:stable:guides:python:sql_on_pandas}
Pandas DataFrames stored in local variables can be queried as if they are regular tables within DuckDB.
```python
import duckdb
import pandas
# Create a Pandas dataframe
my_df = pandas.DataFrame.from_dict({'a': [42]})
# query the Pandas DataFrame "my_df"
# Note: duckdb.sql connects to the default in-memory database connection
results = duckdb.sql("SELECT * FROM my_df").df()
```
The seamless integration of Pandas DataFrames into DuckDB SQL queries is made possible by [replacement scans](#docs:stable:clients:c:replacement_scans), which replace instances of accessing the `my_df` table (which does not exist in DuckDB) with a table function that reads the `my_df` dataframe.
### Import from Pandas {#docs:stable:guides:python:import_pandas}
[`CREATE TABLE ... AS`](#docs:stable:sql:statements:create_table::create-table--as-select-ctas) and [`INSERT INTO`](#docs:stable:sql:statements:insert) can be used to create a table from any query.
We can then create tables or insert into existing tables by referring to the [Pandas](https://pandas.pydata.org/) DataFrame in the query.
There is no need to register the DataFrames manually:
DuckDB can find them in the Python process by name thanks to [replacement scans](#docs:stable:guides:glossary::replacement-scan).
```python
import duckdb
import pandas
# Create a Pandas dataframe
my_df = pandas.DataFrame.from_dict({'a': [42]})
# create the table "my_table" from the DataFrame "my_df"
# Note: duckdb.sql connects to the default in-memory database connection
duckdb.sql("CREATE TABLE my_table AS SELECT * FROM my_df")
# insert into the table "my_table" from the DataFrame "my_df"
duckdb.sql("INSERT INTO my_table SELECT * FROM my_df")
```
If the order of columns is different or not all columns are present in the DataFrame, use [`INSERT INTO ... BY NAME`](#docs:stable:sql:statements:insert::insert-into--by-name):
```python
duckdb.sql("INSERT INTO my_table BY NAME SELECT * FROM my_df")
```
#### See Also {#docs:stable:guides:python:import_pandas::see-also}
DuckDB also supports [exporting to Pandas](#docs:stable:guides:python:export_pandas).
### Export to Pandas {#docs:stable:guides:python:export_pandas}
The result of a query can be converted to a [Pandas](https://pandas.pydata.org/) DataFrame using the `df()` function.
```python
import duckdb
# read the result of an arbitrary SQL query to a Pandas DataFrame
results = duckdb.sql("SELECT 42").df()
results
```
```text
42
0 42
```
#### See Also {#docs:stable:guides:python:export_pandas::see-also}
DuckDB also supports [importing from Pandas](#docs:stable:guides:python:import_pandas).
### Import from Numpy {#docs:stable:guides:python:import_numpy}
It is possible to query Numpy arrays from DuckDB.
There is no need to register the arrays manually:
DuckDB can find them in the Python process by name thanks to [replacement scans](#docs:stable:guides:glossary::replacement-scan).
For example:
```python
import duckdb
import numpy as np
my_arr = np.array([(1, 9.0), (2, 8.0), (3, 7.0)])
duckdb.sql("SELECT * FROM my_arr")
```
```text
┌─────────┬─────────┬─────────┐
│ column0 │ column1 │ column2 │
│ double  │ double  │ double  │
├─────────┼─────────┼─────────┤
│     1.0 │     2.0 │     3.0 │
│     9.0 │     8.0 │     7.0 │
└─────────┴─────────┴─────────┘
```
#### See Also {#docs:stable:guides:python:import_numpy::see-also}
DuckDB also supports [exporting to Numpy](#docs:stable:guides:python:export_numpy).
### Export to Numpy {#docs:stable:guides:python:export_numpy}
The result of a query can be converted to a Numpy array using the `fetchnumpy()` function. For example:
```python
import duckdb
import numpy as np
my_arr = duckdb.sql("SELECT unnest([1, 2, 3]) AS x, 5.0 AS y").fetchnumpy()
my_arr
```
```text
{'x': array([1, 2, 3], dtype=int32), 'y': masked_array(data=[5.0, 5.0, 5.0],
mask=[False, False, False],
fill_value=1e+20)}
```
Then, the array can be processed using Numpy functions, e.g.:
```python
np.sum(my_arr['x'])
```
```text
6
```
#### See Also {#docs:stable:guides:python:export_numpy::see-also}
DuckDB also supports [importing from Numpy](#docs:stable:guides:python:import_numpy).
### SQL on Apache Arrow {#docs:stable:guides:python:sql_on_arrow}
DuckDB can query multiple different types of Apache Arrow objects.
#### Apache Arrow Tables {#docs:stable:guides:python:sql_on_arrow::apache-arrow-tables}
[Arrow Tables](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) stored in local variables can be queried as if they are regular tables within DuckDB.
```python
import duckdb
import pyarrow as pa
# connect to an in-memory database
con = duckdb.connect()
my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
'j': ["one", "two", "three", "four"]})
# query the Apache Arrow Table "my_arrow_table" and return as an Arrow Table
results = con.execute("SELECT * FROM my_arrow_table WHERE i = 2").arrow()
```
#### Apache Arrow Datasets {#docs:stable:guides:python:sql_on_arrow::apache-arrow-datasets}
[Arrow Datasets](https://arrow.apache.org/docs/python/dataset.html) stored as variables can also be queried as if they were regular tables.
Datasets are useful for pointing to directories of Parquet files when analyzing large datasets.
DuckDB will push column selections and row filters down into the dataset scan operation so that only the necessary data is pulled into memory.
```python
import duckdb
import pyarrow as pa
import tempfile
import pathlib
import pyarrow.parquet as pq
import pyarrow.dataset as ds
# connect to an in-memory database
con = duckdb.connect()
my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
'j': ["one", "two", "three", "four"]})
# create example Parquet files and save in a folder
base_path = pathlib.Path(tempfile.gettempdir())
(base_path / "parquet_folder").mkdir(exist_ok = True)
pq.write_to_dataset(my_arrow_table, str(base_path / "parquet_folder"))
# link to Parquet files using an Arrow Dataset
my_arrow_dataset = ds.dataset(str(base_path / 'parquet_folder/'))
# query the Apache Arrow Dataset "my_arrow_dataset" and return as an Arrow Table
results = con.execute("SELECT * FROM my_arrow_dataset WHERE i = 2").arrow()
```
#### Apache Arrow Scanners {#docs:stable:guides:python:sql_on_arrow::apache-arrow-scanners}
[Arrow Scanners](https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Scanner.html) stored as variables can also be queried as if they were regular tables. Scanners read over a dataset and select specific columns or apply row-wise filtering. This is similar to how DuckDB pushes column selections and filters down into an Arrow Dataset, but using Arrow compute operations instead. Arrow can use asynchronous IO to quickly access files.
```python
import duckdb
import pyarrow as pa
import tempfile
import pathlib
import pyarrow.parquet as pq
import pyarrow.dataset as ds
import pyarrow.compute as pc
# connect to an in-memory database
con = duckdb.connect()
my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
'j': ["one", "two", "three", "four"]})
# create example Parquet files and save in a folder
base_path = pathlib.Path(tempfile.gettempdir())
(base_path / "parquet_folder").mkdir(exist_ok = True)
pq.write_to_dataset(my_arrow_table, str(base_path / "parquet_folder"))
# link to Parquet files using an Arrow Dataset
my_arrow_dataset = ds.dataset(str(base_path / 'parquet_folder/'))
# define the filter to be applied while scanning
# equivalent to "WHERE i = 2"
scanner_filter = (pc.field("i") == pc.scalar(2))
arrow_scanner = ds.Scanner.from_dataset(my_arrow_dataset, filter = scanner_filter)
# query the Apache Arrow scanner "arrow_scanner" and return as an Arrow Table
results = con.execute("SELECT * FROM arrow_scanner").arrow()
```
#### Apache Arrow RecordBatchReaders {#docs:stable:guides:python:sql_on_arrow::apache-arrow-recordbatchreaders}
[Arrow RecordBatchReaders](https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatchReader.html) are a reader for Arrow's streaming binary format and can also be queried directly as if they were tables. This streaming format is useful when sending Arrow data for tasks like interprocess communication or communicating between language runtimes.
```python
import duckdb
import pyarrow as pa
# connect to an in-memory database
con = duckdb.connect()
my_recordbatch = pa.RecordBatch.from_pydict({'i': [1, 2, 3, 4],
'j': ["one", "two", "three", "four"]})
my_recordbatchreader = pa.ipc.RecordBatchReader.from_batches(my_recordbatch.schema, [my_recordbatch])
# query the Apache Arrow RecordBatchReader "my_recordbatchreader" and return as an Arrow Table
results = con.execute("SELECT * FROM my_recordbatchreader WHERE i = 2").arrow()
```
### Import from Apache Arrow {#docs:stable:guides:python:import_arrow}
`CREATE TABLE AS` and `INSERT INTO` can be used to create a table from any query. We can then create tables or insert into existing tables by referring to the Apache Arrow object in the query. This example imports from an [Arrow Table](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html), but DuckDB can query different Apache Arrow formats as seen in the [SQL on Arrow guide](#docs:stable:guides:python:sql_on_arrow).
```python
import duckdb
import pyarrow as pa
# create an example Apache Arrow Table
my_arrow = pa.Table.from_pydict({'a': [42]})
# create the table "my_table" from the Arrow Table "my_arrow"
duckdb.sql("CREATE TABLE my_table AS SELECT * FROM my_arrow")
# insert into the table "my_table" from the Arrow Table "my_arrow"
duckdb.sql("INSERT INTO my_table SELECT * FROM my_arrow")
```
### Export to Apache Arrow {#docs:stable:guides:python:export_arrow}
All results of a query can be exported to an [Apache Arrow Table](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) using the `fetch_arrow_table` function. Alternatively, results can be returned as a [RecordBatchReader](https://arrow.apache.org/docs/python/generated/pyarrow.ipc.RecordBatchStreamReader.html) using the `arrow` function and results can be read one batch at a time. In addition, relations built using DuckDB's [Relational API](#docs:stable:guides:python:relational_api_pandas) can also be exported.
#### Export to an Arrow Table {#docs:stable:guides:python:export_arrow::export-to-an-arrow-table}
```python
import duckdb
import pyarrow as pa
my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
'j': ["one", "two", "three", "four"]})
# query the Apache Arrow Table "my_arrow_table" and return as an Arrow Table
results = duckdb.sql("SELECT * FROM my_arrow_table").fetch_arrow_table()
```
#### Export as a RecordBatchReader {#docs:stable:guides:python:export_arrow::export-as-a-recordbatchreader}
```python
import duckdb
import pyarrow as pa
my_arrow_table = pa.Table.from_pydict({'i': [1, 2, 3, 4],
'j': ["one", "two", "three", "four"]})
# query the Apache Arrow Table "my_arrow_table" and return as an Arrow RecordBatchReader
chunk_size = 1_000_000
result = duckdb.sql("SELECT * FROM my_arrow_table").arrow(chunk_size)
# Loop through the results; iteration ends once the RecordBatchReader is exhausted
for batch in result:
    # Process a single chunk here
    print(batch.to_pandas())
```
#### Export from Relational API {#docs:stable:guides:python:export_arrow::export-from-relational-api}
Arrow objects can also be exported from the Relational API. A relation can be converted to an Arrow table using either the `DuckDBPyRelation.fetch_arrow_table` or `DuckDBPyRelation.to_arrow_table` function, and to an Arrow record batch reader using either the `DuckDBPyRelation.arrow` or `DuckDBPyRelation.fetch_arrow_reader` function.
```python
import duckdb
# connect to an in-memory database
con = duckdb.connect()
con.execute('CREATE TABLE integers (i integer)')
con.execute('INSERT INTO integers VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (NULL)')
# Create a relation from the table and export the entire relation as Arrow
rel = con.table("integers")
relation_as_arrow = rel.to_arrow_table() # or .fetch_arrow_table()
# Calculate a result using that relation and export that result to Arrow
res = rel.aggregate("sum(i)").execute()
arrow_table = res.to_arrow_table() # or .fetch_arrow_table()
# You can also create an Arrow record batch reader from a relation
arrow_batch_reader = res.arrow() # or .fetch_arrow_reader()
for batch in arrow_batch_reader:
    # Process a single chunk here
    print(batch.to_pandas())
```
### Relational API on Pandas {#docs:stable:guides:python:relational_api_pandas}
DuckDB offers a relational API that can be used to chain together query operations. These are lazily evaluated so that DuckDB can optimize their execution. These operators can act on Pandas DataFrames, DuckDB tables or views (which can point to any underlying storage format that DuckDB can read, such as CSV or Parquet files, etc.). Here we show a simple example of reading from a Pandas DataFrame and returning a DataFrame.
```python
import duckdb
import pandas
# connect to an in-memory database
con = duckdb.connect()
input_df = pandas.DataFrame.from_dict({'i': [1, 2, 3, 4],
'j': ["one", "two", "three", "four"]})
# create a DuckDB relation from a dataframe
rel = con.from_df(input_df)
# chain together relational operators (this is a lazy operation, so the operations are not yet executed)
# equivalent to: SELECT i, j, i*2 AS two_i FROM input_df WHERE i >= 2 ORDER BY i DESC LIMIT 2
transformed_rel = rel.filter('i >= 2').project('i, j, i*2 AS two_i').order('i DESC').limit(2)
# trigger execution by requesting .df() of the relation
# .df() could have been added to the end of the chain above - it was separated for clarity
output_df = transformed_rel.df()
```
Relational operators can also be used to group rows, aggregate, find distinct combinations of values, join, union, and more. They are also able to directly insert results into a DuckDB table or write to a CSV.
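A minimal sketch of a few of these operators, with made-up data and an illustrative output file name:
```python
import duckdb
import pandas

con = duckdb.connect()
df = pandas.DataFrame({"species": ["Adelie", "Adelie", "Gentoo"],
                       "mass": [3700, 3800, 5000]})
rel = con.from_df(df)

# group and aggregate ...
print(rel.aggregate("species, avg(mass) AS avg_mass", "species"))
# ... keep only distinct rows ...
print(rel.distinct())
# ... or write the relation's result straight to a CSV file
rel.write_csv("species.csv")
```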
Please see [these additional examples](https://github.com/duckdb/duckdb/blob/main/examples/python/duckdb-python.py) and the [available relational methods on the `DuckDBPyRelation` class](#docs:stable:clients:python:reference:index::duckdb.DuckDBPyRelation).
### Multiple Python Threads {#docs:stable:guides:python:multiple_threads}
This page demonstrates how to simultaneously insert into and read from a DuckDB database across multiple Python threads.
This could be useful in scenarios where new data is flowing in and an analysis should be periodically re-run.
Note that this is all within a single Python process (see the [FAQ](#faq) for details on DuckDB concurrency).
Feel free to follow along in this [Google Colab notebook](https://colab.research.google.com/drive/190NB2m-LIfDcMamCY5lIzaD2OTMnYclB?usp=sharing).
#### Setup {#docs:stable:guides:python:multiple_threads::setup}
First, import DuckDB and several modules from the Python standard library.
Note: if using Pandas, add `import pandas` at the top of the script as well (as it must be imported prior to the multi-threading).
Then connect to a file-backed DuckDB database and create an example table to store inserted data.
This table will track the name of the thread that completed the insert and automatically insert the timestamp when that insert occurred using the [`DEFAULT` expression](#docs:stable:sql:statements:create_table::syntax).
```python
import duckdb
from threading import Thread, current_thread
import random
duckdb_con = duckdb.connect('my_persistent_db.duckdb')
# Use connect without parameters for an in-memory database
# duckdb_con = duckdb.connect()
duckdb_con.execute("""
    CREATE OR REPLACE TABLE my_inserts (
        thread_name VARCHAR,
        insert_time TIMESTAMP DEFAULT current_timestamp
    )
""")
```
#### Reader and Writer Functions {#docs:stable:guides:python:multiple_threads::reader-and-writer-functions}
Next, define functions to be executed by the writer and reader threads.
Each thread must use the `.cursor()` method to create a thread-local connection to the same DuckDB file based on the original connection.
This approach also works with in-memory DuckDB databases.
```python
def write_from_thread(duckdb_con):
    # Create a DuckDB connection specifically for this thread
    local_con = duckdb_con.cursor()
    # Insert a row with the name of the thread. insert_time is auto-generated.
    thread_name = str(current_thread().name)
    result = local_con.execute("""
        INSERT INTO my_inserts (thread_name)
        VALUES (?)
    """, (thread_name,)).fetchall()

def read_from_thread(duckdb_con):
    # Create a DuckDB connection specifically for this thread
    local_con = duckdb_con.cursor()
    # Query the current row count
    thread_name = str(current_thread().name)
    results = local_con.execute("""
        SELECT
            ? AS thread_name,
            count(*) AS row_counter,
            current_timestamp
        FROM my_inserts
    """, (thread_name,)).fetchall()
    print(results)
```
#### Create Threads {#docs:stable:guides:python:multiple_threads::create-threads}
We define how many writers and readers to use, and define a list to track all of the threads that will be created.
Then, create the writer threads first, followed by the reader threads.
Next, shuffle them so that they will be kicked off in a random order to simulate simultaneous writers and readers.
Note that the threads have not yet been executed, only defined.
```python
write_thread_count = 50
read_thread_count = 5
threads = []
# Create multiple writer and reader threads (in the same process)
# Pass in the same connection as an argument
for i in range(write_thread_count):
    threads.append(Thread(target = write_from_thread,
                          args = (duckdb_con,),
                          name = 'write_thread_' + str(i)))
for j in range(read_thread_count):
    threads.append(Thread(target = read_from_thread,
                          args = (duckdb_con,),
                          name = 'read_thread_' + str(j)))
# Shuffle the threads to simulate a mix of readers and writers
random.seed(6) # Set the seed to ensure consistent results when testing
random.shuffle(threads)
```
#### Run Threads and Show Results {#docs:stable:guides:python:multiple_threads::run-threads-and-show-results}
Now, kick off all threads to run in parallel, then wait for all of them to finish before printing out the results.
Note that the timestamps of readers and writers are interspersed as expected due to the randomization.
```python
# Kick off all threads in parallel
for thread in threads:
    thread.start()
# Ensure all threads complete before printing final results
for thread in threads:
    thread.join()
print(duckdb_con.execute("""
    SELECT *
    FROM my_inserts
    ORDER BY insert_time
""").df())
```
### Integration with Ibis {#docs:stable:guides:python:ibis}
[Ibis](https://ibis-project.org) is a Python dataframe library that supports 20+ backends, with DuckDB as the default. Ibis with DuckDB provides a Pythonic interface for SQL with great performance.
#### Installation {#docs:stable:guides:python:ibis::installation}
You can pip install Ibis with the DuckDB backend:
```batch
pip install 'ibis-framework[duckdb,examples]' # examples is only required to access the sample data Ibis provides
```
or use conda:
```batch
conda install ibis-framework
```
or use mamba:
```batch
mamba install ibis-framework
```
#### Create a Database File {#docs:stable:guides:python:ibis::create-a-database-file}
Ibis can work with several file types, but at its core, it connects to existing databases and interacts with the data there. You can get started with your own DuckDB databases or create a new one with example data.
```python
import ibis
con = ibis.connect("duckdb://penguins.ddb")
con.create_table(
"penguins", ibis.examples.penguins.fetch().to_pyarrow(), overwrite = True
)
```
```python
# Output:
DatabaseTable: penguins
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm int64
body_mass_g int64
sex string
year int64
```
You can now see the example dataset copied over to the database:
```python
# reconnect to the persisted database (dropping temp tables)
con = ibis.connect("duckdb://penguins.ddb")
con.list_tables()
```
```python
# Output:
['penguins']
```
There's one table, called `penguins`. We can ask Ibis to give us an object that we can interact with.
```python
penguins = con.table("penguins")
penguins
```
```text
# Output:
DatabaseTable: penguins
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm int64
body_mass_g int64
sex string
year int64
```
Ibis is lazily evaluated, so instead of seeing the data, we see the schema of the table. To peek at the data, we can call `head` and then `to_pandas` to get the first few rows of the table as a pandas DataFrame.
```python
penguins.head().to_pandas()
```
```text
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 male 2007
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 female 2007
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 female 2007
3 Adelie Torgersen NaN NaN NaN NaN None 2007
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 female 2007
```
`to_pandas` takes the existing lazy table expression and evaluates it. If we leave it off, you'll see the Ibis representation of the table expression that `to_pandas` will evaluate (when you're ready!).
```python
penguins.head()
```
```python
# Output:
r0 := DatabaseTable: penguins
species string
island string
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm int64
body_mass_g int64
sex string
year int64
Limit[r0, n=5]
```
Ibis returns results as a pandas DataFrame using `to_pandas`, but isn't using pandas to perform any of the computation. The query is executed by DuckDB. Only when `to_pandas` is called does Ibis then pull back the results and convert them into a DataFrame.
#### Interactive Mode {#docs:stable:guides:python:ibis::interactive-mode}
For the rest of this intro, we'll turn on interactive mode, which partially executes queries to give users a preview of the results. There is a small difference in the way the output is formatted, but otherwise this is the same as calling `to_pandas` on the table expression with a limit of 10 result rows returned.
```python
ibis.options.interactive = True
penguins.head()
```
```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
│ Adelie  │ Torgersen │            nan │           nan │              NULL │        NULL │ NULL   │  2007 │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
```
#### Common Operations {#docs:stable:guides:python:ibis::common-operations}
Ibis has a collection of useful table methods to manipulate and query the data in a table.
##### filter {#docs:stable:guides:python:ibis::filter}
`filter` allows you to select rows based on a condition or set of conditions.
We can filter so we only have penguins of the species Gentoo:
```python
penguins.filter(penguins.species == "Gentoo")
```
```text
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Gentoo  │ Biscoe │           46.1 │          13.2 │               211 │        4500 │ female │  2007 │
│ Gentoo  │ Biscoe │           50.0 │          16.3 │               230 │        5700 │ male   │  2007 │
│ Gentoo  │ Biscoe │           48.7 │          14.1 │               210 │        4450 │ female │  2007 │
│ Gentoo  │ Biscoe │           50.0 │          15.2 │               218 │        5700 │ male   │  2007 │
│ Gentoo  │ Biscoe │           47.6 │          14.5 │               215 │        5400 │ male   │  2007 │
│ Gentoo  │ Biscoe │           46.5 │          13.5 │               210 │        4550 │ female │  2007 │
│ Gentoo  │ Biscoe │           45.4 │          14.6 │               211 │        4800 │ female │  2007 │
│ Gentoo  │ Biscoe │           46.7 │          15.3 │               219 │        5200 │ male   │  2007 │
│ Gentoo  │ Biscoe │           43.3 │          13.4 │               209 │        4400 │ female │  2007 │
│ Gentoo  │ Biscoe │           46.8 │          15.4 │               215 │        5150 │ male   │  2007 │
│ …       │ …      │              … │             … │                 … │           … │ …      │     … │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
```
Or filter for Gentoo penguins that have a body mass larger than 6 kg.
```python
penguins.filter((penguins.species == "Gentoo") & (penguins.body_mass_g > 6000))
```
```text
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Gentoo  │ Biscoe │           49.2 │          15.2 │               221 │        6300 │ male   │  2007 │
│ Gentoo  │ Biscoe │           59.6 │          17.0 │               230 │        6050 │ male   │  2007 │
└─────────┴────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘
```
You can use any Boolean comparison in a filter (although if you try to do something like use `<` on a string, Ibis will yell at you).
##### select {#docs:stable:guides:python:ibis::select}
Your data analysis might not require all the columns present in a given table. `select` lets you pick out only those columns that you want to work with.
To select a column you can use the name of the column as a string:
```python
penguins.select("species", "island", "year").limit(3)
```
```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ …       │ …         │     … │
└─────────┴───────────┴───────┘
```
Or you can use column objects directly (this can be convenient when paired with tab-completion):
```python
penguins.select(penguins.species, penguins.island, penguins.year).limit(3)
```
```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ …       │ …         │     … │
└─────────┴───────────┴───────┘
```
Or you can mix-and-match:
```python
penguins.select("species", "island", penguins.year).limit(3)
```
```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ Adelie  │ Torgersen │  2007 │
│ …       │ …         │     … │
└─────────┴───────────┴───────┘
```
##### mutate {#docs:stable:guides:python:ibis::mutate}
`mutate` lets you add new columns to your table, derived from the values of existing columns.
```python
penguins.mutate(bill_length_cm=penguins.bill_length_mm / 10)
```
```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ bill_length_cm ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │ float64        │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼────────────────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │           3.91 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │           3.95 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │           4.03 │
│ Adelie  │ Torgersen │            nan │           nan │              NULL │        NULL │ NULL   │  2007 │            nan │
│ Adelie  │ Torgersen │           36.7 │          19.3 │               193 │        3450 │ female │  2007 │           3.67 │
│ Adelie  │ Torgersen │           39.3 │          20.6 │               190 │        3650 │ male   │  2007 │           3.93 │
│ Adelie  │ Torgersen │           38.9 │          17.8 │               181 │        3625 │ female │  2007 │           3.89 │
│ Adelie  │ Torgersen │           39.2 │          19.6 │               195 │        4675 │ male   │  2007 │           3.92 │
│ Adelie  │ Torgersen │           34.1 │          18.1 │               193 │        3475 │ NULL   │  2007 │           3.41 │
│ Adelie  │ Torgersen │           42.0 │          20.2 │               190 │        4250 │ NULL   │  2007 │           4.20 │
│ …       │ …         │              … │             … │                 … │           … │ …      │     … │              … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴────────────────┘
```
Notice that the table is now a little too wide to display all of its columns (depending on your screen size). `bill_length` is now present in millimeters _and_ centimeters. Let's use a `select` to trim down the number of columns we're looking at.
```python
penguins.mutate(bill_length_cm=penguins.bill_length_mm / 10).select(
"species",
"island",
"bill_depth_mm",
"flipper_length_mm",
"body_mass_g",
"sex",
"year",
"bill_length_cm",
)
```
```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ bill_length_cm ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string  │ string    │ float64       │ int64             │ int64       │ string │ int64 │ float64        │
├─────────┼───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼────────────────┤
│ Adelie  │ Torgersen │          18.7 │               181 │        3750 │ male   │  2007 │           3.91 │
│ Adelie  │ Torgersen │          17.4 │               186 │        3800 │ female │  2007 │           3.95 │
│ Adelie  │ Torgersen │          18.0 │               195 │        3250 │ female │  2007 │           4.03 │
│ Adelie  │ Torgersen │           nan │              NULL │        NULL │ NULL   │  2007 │            nan │
│ Adelie  │ Torgersen │          19.3 │               193 │        3450 │ female │  2007 │           3.67 │
│ Adelie  │ Torgersen │          20.6 │               190 │        3650 │ male   │  2007 │           3.93 │
│ Adelie  │ Torgersen │          17.8 │               181 │        3625 │ female │  2007 │           3.89 │
│ Adelie  │ Torgersen │          19.6 │               195 │        4675 │ male   │  2007 │           3.92 │
│ Adelie  │ Torgersen │          18.1 │               193 │        3475 │ NULL   │  2007 │           3.41 │
│ Adelie  │ Torgersen │          20.2 │               190 │        4250 │ NULL   │  2007 │           4.20 │
│ …       │ …         │             … │                 … │           … │ …      │     … │              … │
└─────────┴───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴────────────────┘
```
##### selectors {#docs:stable:guides:python:ibis::selectors}
Typing out _all_ of the column names _except_ one is a little annoying. Instead of doing that again, we can use a `selector` to quickly select or deselect groups of columns.
```python
import ibis.selectors as s
penguins.mutate(bill_length_cm=penguins.bill_length_mm / 10).select(
~s.matches("bill_length_mm")
# match every column except `bill_length_mm`
)
```
```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃ bill_length_cm ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ string  │ string    │ float64       │ int64             │ int64       │ string │ int64 │ float64        │
├─────────┼───────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┼────────────────┤
│ Adelie  │ Torgersen │          18.7 │               181 │        3750 │ male   │  2007 │           3.91 │
│ Adelie  │ Torgersen │          17.4 │               186 │        3800 │ female │  2007 │           3.95 │
│ Adelie  │ Torgersen │          18.0 │               195 │        3250 │ female │  2007 │           4.03 │
│ Adelie  │ Torgersen │           nan │              NULL │        NULL │ NULL   │  2007 │            nan │
│ Adelie  │ Torgersen │          19.3 │               193 │        3450 │ female │  2007 │           3.67 │
│ Adelie  │ Torgersen │          20.6 │               190 │        3650 │ male   │  2007 │           3.93 │
│ Adelie  │ Torgersen │          17.8 │               181 │        3625 │ female │  2007 │           3.89 │
│ Adelie  │ Torgersen │          19.6 │               195 │        4675 │ male   │  2007 │           3.92 │
│ Adelie  │ Torgersen │          18.1 │               193 │        3475 │ NULL   │  2007 │           3.41 │
│ Adelie  │ Torgersen │          20.2 │               190 │        4250 │ NULL   │  2007 │           4.20 │
│ …       │ …         │             … │                 … │           … │ …      │     … │              … │
└─────────┴───────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┴────────────────┘
```
You can also use a `selector` alongside a column name.
```python
penguins.select("island", s.numeric())
```
```text
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━┓
┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ year  ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━┩
│ string    │ float64        │ float64       │ int64             │ int64       │ int64 │
├───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼───────┤
│ Torgersen │           39.1 │          18.7 │               181 │        3750 │  2007 │
│ Torgersen │           39.5 │          17.4 │               186 │        3800 │  2007 │
│ Torgersen │           40.3 │          18.0 │               195 │        3250 │  2007 │
│ Torgersen │            nan │           nan │              NULL │        NULL │  2007 │
│ Torgersen │           36.7 │          19.3 │               193 │        3450 │  2007 │
│ Torgersen │           39.3 │          20.6 │               190 │        3650 │  2007 │
│ Torgersen │           38.9 │          17.8 │               181 │        3625 │  2007 │
│ Torgersen │           39.2 │          19.6 │               195 │        4675 │  2007 │
│ Torgersen │           34.1 │          18.1 │               193 │        3475 │  2007 │
│ Torgersen │           42.0 │          20.2 │               190 │        4250 │  2007 │
│ …         │              … │             … │                 … │           … │     … │
└───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴───────┘
```
You can read more about [`selectors`](https://ibis-project.org/reference/selectors/) in the docs!
##### `order_by` {#docs:stable:guides:python:ibis::order_by}
`order_by` arranges the values of one or more columns in ascending or descending order.
By default, `ibis` sorts in ascending order:
```python
penguins.order_by(penguins.flipper_length_mm).select(
"species", "island", "flipper_length_mm"
)
```
```text
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species   ┃ island    ┃ flipper_length_mm ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string    │ string    │ int64             │
├───────────┼───────────┼───────────────────┤
│ Adelie    │ Biscoe    │               172 │
│ Adelie    │ Biscoe    │               174 │
│ Adelie    │ Torgersen │               176 │
│ Adelie    │ Dream     │               178 │
│ Adelie    │ Dream     │               178 │
│ Adelie    │ Dream     │               178 │
│ Chinstrap │ Dream     │               178 │
│ Adelie    │ Dream     │               179 │
│ Adelie    │ Torgersen │               180 │
│ Adelie    │ Biscoe    │               180 │
│ …         │ …         │                 … │
└───────────┴───────────┴───────────────────┘
```
You can sort in descending order using the `desc` method of a column:
```python
penguins.order_by(penguins.flipper_length_mm.desc()).select(
"species", "island", "flipper_length_mm"
)
```
```text
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string  │ string │ int64             │
├─────────┼────────┼───────────────────┤
│ Gentoo  │ Biscoe │               231 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               229 │
│ Gentoo  │ Biscoe │               229 │
│ …       │ …      │                 … │
└─────────┴────────┴───────────────────┘
```
Or you can use `ibis.desc`:
```python
penguins.order_by(ibis.desc("flipper_length_mm")).select(
"species", "island", "flipper_length_mm"
)
```
```text
┏━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island ┃ flipper_length_mm ┃
┡━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ string  │ string │ int64             │
├─────────┼────────┼───────────────────┤
│ Gentoo  │ Biscoe │               231 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               230 │
│ Gentoo  │ Biscoe │               229 │
│ Gentoo  │ Biscoe │               229 │
│ …       │ …      │                 … │
└─────────┴────────┴───────────────────┘
```
##### aggregate {#docs:stable:guides:python:ibis::aggregate}
Ibis has several aggregate functions available to help summarize data: `mean`, `max`, `min`, `count`, `sum`, and more.
To aggregate an entire column, call the corresponding method on that column.
```python
penguins.flipper_length_mm.mean()
```
```text
200.91520467836258
```
You can compute multiple aggregates at once using the `aggregate` method:
```python
penguins.aggregate([penguins.flipper_length_mm.mean(), penguins.bill_depth_mm.max()])
```
```text
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ Mean(flipper_length_mm) ┃ Max(bill_depth_mm) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ float64                 │ float64            │
├─────────────────────────┼────────────────────┤
│              200.915205 │               21.5 │
└─────────────────────────┴────────────────────┘
```
But `aggregate` _really_ shines when it's paired with `group_by`.
##### `group_by` {#docs:stable:guides:python:ibis::group_by}
`group_by` creates groupings of rows that have the same value for one or more columns.
But it doesn't do much on its own; pair it with `aggregate` to get a result.
```python
penguins.group_by("species").aggregate()
```
```text
┏━━━━━━━━━━━┓
┃ species   ┃
┡━━━━━━━━━━━┩
│ string    │
├───────────┤
│ Adelie    │
│ Gentoo    │
│ Chinstrap │
└───────────┘
```
We grouped by the `species` column and handed it an “empty” aggregate command. The result of that is a column of the unique values in the `species` column.
If we add a second column to the `group_by`, we'll get each unique pairing of the values in those columns.
```python
penguins.group_by(["species", "island"]).aggregate()
```
```text
┏━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ species   ┃ island    ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━┩
│ string    │ string    │
├───────────┼───────────┤
│ Adelie    │ Torgersen │
│ Adelie    │ Biscoe    │
│ Adelie    │ Dream     │
│ Gentoo    │ Biscoe    │
│ Chinstrap │ Dream     │
└───────────┴───────────┘
```
Now, if we add an aggregation function to that, we start to really open things up.
```python
penguins.group_by(["species", "island"]).aggregate(penguins.bill_length_mm.mean())
```
```text
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ species   ┃ island    ┃ Mean(bill_length_mm) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ string    │ string    │ float64              │
├───────────┼───────────┼──────────────────────┤
│ Adelie    │ Torgersen │            38.950980 │
│ Adelie    │ Biscoe    │            38.975000 │
│ Adelie    │ Dream     │            38.501786 │
│ Gentoo    │ Biscoe    │            47.504878 │
│ Chinstrap │ Dream     │            48.833824 │
└───────────┴───────────┴──────────────────────┘
```
By adding that `mean` to the `aggregate`, we now have a concise way to calculate aggregates over each of the distinct groups in the `group_by`. And we can calculate as many aggregates as we need.
```python
penguins.group_by(["species", "island"]).aggregate(
[penguins.bill_length_mm.mean(), penguins.flipper_length_mm.max()]
)
```
```text
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ species   ┃ island    ┃ Mean(bill_length_mm) ┃ Max(flipper_length_mm) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string    │ string    │ float64              │ int64                  │
├───────────┼───────────┼──────────────────────┼────────────────────────┤
│ Adelie    │ Torgersen │            38.950980 │                    210 │
│ Adelie    │ Biscoe    │            38.975000 │                    203 │
│ Adelie    │ Dream     │            38.501786 │                    208 │
│ Gentoo    │ Biscoe    │            47.504878 │                    231 │
│ Chinstrap │ Dream     │            48.833824 │                    212 │
└───────────┴───────────┴──────────────────────┴────────────────────────┘
```
If we need more specific groups, we can add to the `group_by`.
```python
penguins.group_by(["species", "island", "sex"]).aggregate(
[penguins.bill_length_mm.mean(), penguins.flipper_length_mm.max()]
)
```
```text
┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ sex    ┃ Mean(bill_length_mm) ┃ Max(flipper_length_mm) ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩
│ string  │ string    │ string │ float64              │ int64                  │
├─────────┼───────────┼────────┼──────────────────────┼────────────────────────┤
│ Adelie  │ Torgersen │ male   │            40.586957 │                    210 │
│ Adelie  │ Torgersen │ female │            37.554167 │                    196 │
│ Adelie  │ Torgersen │ NULL   │            37.925000 │                    193 │
│ Adelie  │ Biscoe    │ female │            37.359091 │                    199 │
│ Adelie  │ Biscoe    │ male   │            40.590909 │                    203 │
│ Adelie  │ Dream     │ female │            36.911111 │                    202 │
│ Adelie  │ Dream     │ male   │            40.071429 │                    208 │
│ Adelie  │ Dream     │ NULL   │            37.500000 │                    179 │
│ Gentoo  │ Biscoe    │ female │            45.563793 │                    222 │
│ Gentoo  │ Biscoe    │ male   │            49.473770 │                    231 │
│ …       │ …         │ …      │                    … │                      … │
└─────────┴───────────┴────────┴──────────────────────┴────────────────────────┘
```
#### Chaining It All Together {#docs:stable:guides:python:ibis::chaining-it-all-together}
We've already chained some Ibis calls together: we used `mutate` to create a new column and then `select` to view only a subset of the new table, and we just paired `group_by` with `aggregate`.
There's nothing stopping us from putting all of these concepts together to ask questions of the data.
How about:
* What was the largest female penguin (by body mass) on each island in the year 2008?
```python
penguins.filter((penguins.sex == "female") & (penguins.year == 2008)).group_by(
["island"]
).aggregate(penguins.body_mass_g.max())
```
```text
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ island    ┃ Max(body_mass_g) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ string    │ int64            │
├───────────┼──────────────────┤
│ Biscoe    │             5200 │
│ Torgersen │             3800 │
│ Dream     │             3900 │
└───────────┴──────────────────┘
```
* What about the largest male penguin (by body mass) on each island for each year of data collection?
```python
penguins.filter(penguins.sex == "male").group_by(["island", "year"]).aggregate(
penguins.body_mass_g.max().name("max_body_mass")
).order_by(["year", "max_body_mass"])
```
```text
┏━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ island    ┃ year  ┃ max_body_mass ┃
┡━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━┩
│ string    │ int64 │ int64         │
├───────────┼───────┼───────────────┤
│ Dream     │  2007 │          4650 │
│ Torgersen │  2007 │          4675 │
│ Biscoe    │  2007 │          6300 │
│ Torgersen │  2008 │          4700 │
│ Dream     │  2008 │          4800 │
│ Biscoe    │  2008 │          6000 │
│ Torgersen │  2009 │          4300 │
│ Dream     │  2009 │          4475 │
│ Biscoe    │  2009 │          6000 │
└───────────┴───────┴───────────────┘
```
#### Learn More {#docs:stable:guides:python:ibis::learn-more}
That's all for this quick-start guide. If you want to learn more, check out the [Ibis documentation](https://ibis-project.org).
### Integration with Polars {#docs:stable:guides:python:polars}
[Polars](https://github.com/pola-rs/polars) is a DataFrames library built in Rust with bindings for Python and Node.js. It uses [Apache Arrow's columnar format](https://arrow.apache.org/docs/format/Columnar.html) as its memory model. DuckDB can read Polars DataFrames and convert query results to Polars DataFrames. It does this internally using the efficient Apache Arrow integration. Note that the `pyarrow` library must be installed for the integration to work.
#### Installation {#docs:stable:guides:python:polars::installation}
```batch
pip install -U duckdb 'polars[pyarrow]'
```
#### Polars to DuckDB {#docs:stable:guides:python:polars::polars-to-duckdb}
DuckDB can natively query Polars DataFrames by referring to the name of Polars DataFrames as they exist in the current scope.
```python
import duckdb
import polars as pl
df = pl.DataFrame(
{
"A": [1, 2, 3, 4, 5],
"fruits": ["banana", "banana", "apple", "apple", "banana"],
"B": [5, 4, 3, 2, 1],
"cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
}
)
duckdb.sql("SELECT * FROM df").show()
```
#### DuckDB to Polars {#docs:stable:guides:python:polars::duckdb-to-polars}
DuckDB can output results as Polars DataFrames using the `.pl()` result-conversion method.
```python
df = duckdb.sql("""
SELECT 1 AS id, 'banana' AS fruit
UNION ALL
SELECT 2, 'apple'
UNION ALL
SELECT 3, 'mango'"""
).pl()
print(df)
```
```text
shape: (3, 2)
┌─────┬────────┐
│ id  ┆ fruit  │
│ --- ┆ ---    │
│ i32 ┆ str    │
╞═════╪════════╡
│ 1   ┆ banana │
│ 2   ┆ apple  │
│ 3   ┆ mango  │
└─────┴────────┘
```
The optional `lazy` parameter allows returning Polars LazyFrames.
```python
df = duckdb.sql("""
SELECT 1 AS id, 'banana' AS fruit
UNION ALL
SELECT 2, 'apple'
UNION ALL
SELECT 3, 'mango'"""
).pl(lazy=True)
print(df)
```
```text
naive plan: (run LazyFrame.explain(optimized=True) to see the optimized plan)
PYTHON SCAN []
PROJECT */2 COLUMNS
```
To learn more about Polars, feel free to explore their [Python API Reference](https://pola-rs.github.io/polars/py-polars/html/reference/index.html).
### Using fsspec Filesystems {#docs:stable:guides:python:filesystems}
DuckDB support for [`fsspec`](https://filesystem-spec.readthedocs.io) filesystems allows querying data in filesystems that DuckDB's [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview) does not support. `fsspec` has a large number of [inbuilt filesystems](https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations), and there are also many [external implementations](https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations). This capability is only available in DuckDB's Python client because `fsspec` is a Python library, while the `httpfs` extension is available in many DuckDB clients.
#### Example {#docs:stable:guides:python:filesystems::example}
The following is an example of using `fsspec` to query a file in Google Cloud Storage (instead of using their S3-compatible API).
First, install `duckdb`, `fsspec`, and a filesystem implementation of your choice.
```batch
pip install duckdb fsspec gcsfs
```
Then, you can register whichever filesystem you'd like to query:
```python
import duckdb
from fsspec import filesystem
# this line will throw an exception if the appropriate filesystem interface is not installed
duckdb.register_filesystem(filesystem('gcs'))
duckdb.sql("SELECT * FROM read_csv('gcs:///bucket/file.csv')")
```
> These filesystems are not implemented in C++, hence, their performance may not be comparable to the ones provided by the `httpfs` extension.
> It is also worth noting that as they are third-party libraries, they may contain bugs that are beyond our control.
## SQL Editors {#guides:sql_editors}
### DBeaver SQL IDE {#docs:stable:guides:sql_editors:dbeaver}
[DBeaver](https://dbeaver.io/) is a powerful and popular desktop SQL editor and integrated development environment (IDE). It has both an open source and an enterprise version. It is useful for visually inspecting the available tables in DuckDB and for quickly building complex queries. DuckDB's [JDBC connector](https://search.maven.org/artifact/org.duckdb/duckdb_jdbc) allows DBeaver to query DuckDB files, and by extension, any other files that DuckDB can access (like [Parquet files](#docs:stable:guides:file_formats:query_parquet)).
#### Installing DBeaver {#docs:stable:guides:sql_editors:dbeaver::installing-dbeaver}
1. Install DBeaver using the download links and instructions found at their [download page](https://dbeaver.io/download/).
2. Open DBeaver and create a new connection. Either click on the “New Database Connection” button or go to Database > New Database Connection in the menu bar.


3. Search for DuckDB, select it, and click Next.

4. Enter the path or browse to the DuckDB database file you wish to query. To use an in-memory DuckDB instance (useful primarily for querying Parquet files or for testing), enter `:memory:` as the path.

5. Click “Test Connection”. This will then prompt you to install the DuckDB JDBC driver. If you are not prompted, see alternative driver installation instructions below.

6. Click “Download” to download DuckDB's JDBC driver from Maven. Once the download is complete, click “OK”, then click “Finish”.
* Note: If you are in a corporate environment or behind a firewall, before clicking download, click the “Download Configuration” link to configure your proxy settings.

7. You should now see a database connection to your DuckDB database in the left-hand “Database Navigator” pane. Expand it to see the tables and views in your database. Right-click on that connection and create a new SQL script.

8. Write some SQL and click the “Execute” button.

9. Now you're ready to fly with DuckDB and DBeaver!

#### Alternative Driver Installation {#docs:stable:guides:sql_editors:dbeaver::alternative-driver-installation}
1. If you are not prompted to install the DuckDB driver when testing your connection, return to the “Connect to a database” dialog and click “Edit Driver Settings”.

2. (Alternate) You may also access the driver settings menu by returning to the main DBeaver window and clicking Database > Driver Manager in the menu bar. Then select DuckDB, then click Edit.


3. Go to the “Libraries” tab, then click on the DuckDB driver and click “Download/Update”. If you do not see the DuckDB driver, first click on “Reset to Defaults”.

4. Click “Download” to download DuckDB's JDBC driver from Maven. Once the download is complete, click “OK”, then return to the main DBeaver window and continue with step 7 above.
* Note: If you are in a corporate environment or behind a firewall, before clicking download, click the “Download Configuration” link to configure your proxy settings.

## SQL Features {#guides:sql_features}
### AsOf Join {#docs:stable:guides:sql_features:asof_join}
#### What is an AsOf Join? {#docs:stable:guides:sql_features:asof_join::what-is-an-asof-join}
Time series data is not always perfectly aligned.
Clocks may be slightly off, or there may be a delay between cause and effect.
This can make connecting two sets of ordered data challenging.
AsOf joins are a tool for solving this and other similar problems.
One of the problems that AsOf joins are used to solve is
finding the value of a varying property at a specific point in time.
This use case is so common that it is where the name came from:
_Give me the value of the property **as of this time**_.
More generally, however, AsOf joins embody some common temporal analytic semantics,
which can be cumbersome and slow to implement in standard SQL.
#### Portfolio Example Dataset {#docs:stable:guides:sql_features:asof_join::portfolio-example-dataset}
Let's start with a concrete example.
Suppose we have a table of stock [`prices`](https://duckdb.org/data/prices.csv) with timestamps:
| ticker | when | price |
| :----- | :--- | ----: |
| APPL | 2001-01-01 00:00:00 | 1 |
| APPL | 2001-01-01 00:01:00 | 2 |
| APPL | 2001-01-01 00:02:00 | 3 |
| MSFT | 2001-01-01 00:00:00 | 1 |
| MSFT | 2001-01-01 00:01:00 | 2 |
| MSFT | 2001-01-01 00:02:00 | 3 |
| GOOG | 2001-01-01 00:00:00 | 1 |
| GOOG | 2001-01-01 00:01:00 | 2 |
| GOOG | 2001-01-01 00:02:00 | 3 |
We have another table containing portfolio [`holdings`](https://duckdb.org/data/holdings.csv) at various points in time:
| ticker | when | shares |
| :----- | :--- | -----: |
| APPL | 2000-12-31 23:59:30 | 5.16 |
| APPL | 2001-01-01 00:00:30 | 2.94 |
| APPL | 2001-01-01 00:01:30 | 24.13 |
| GOOG | 2000-12-31 23:59:30 | 9.33 |
| GOOG | 2001-01-01 00:00:30 | 23.45 |
| GOOG | 2001-01-01 00:01:30 | 10.58 |
| DATA | 2000-12-31 23:59:30 | 6.65 |
| DATA | 2001-01-01 00:00:30 | 17.95 |
| DATA | 2001-01-01 00:01:30 | 18.37 |
To load these tables to DuckDB, run:
```sql
CREATE TABLE prices AS FROM 'https://duckdb.org/data/prices.csv';
CREATE TABLE holdings AS FROM 'https://duckdb.org/data/holdings.csv';
```
#### Inner AsOf Joins {#docs:stable:guides:sql_features:asof_join::inner-asof-joins}
We can compute the value of each holding at that point in time by finding
the most recent price before the holding's timestamp using an AsOf Join:
```sql
SELECT h.ticker, h.when, price * shares AS value
FROM holdings h
ASOF JOIN prices p
ON h.ticker = p.ticker
AND h.when >= p.when;
```
This attaches the value of the holding at that time to each row:
| ticker | when | value |
| :----- | :--- | ----: |
| APPL | 2001-01-01 00:00:30 | 2.94 |
| APPL | 2001-01-01 00:01:30 | 48.26 |
| GOOG | 2001-01-01 00:00:30 | 23.45 |
| GOOG | 2001-01-01 00:01:30 | 21.16 |
It essentially executes a function defined by looking up nearby values in the `prices` table.
Note also that missing `ticker` values do not have a match and don't appear in the output.
#### Outer AsOf Joins {#docs:stable:guides:sql_features:asof_join::outer-asof-joins}
Because AsOf produces at most one match from the right hand side,
the left side table will not grow as a result of the join,
but it could shrink if there are missing times on the right.
To handle this situation, you can use an *outer* AsOf Join:
```sql
SELECT h.ticker, h.when, price * shares AS value
FROM holdings h
ASOF LEFT JOIN prices p
ON h.ticker = p.ticker
AND h.when >= p.when
ORDER BY ALL;
```
As you might expect, this will produce `NULL` prices and values instead of dropping left-side rows
when there is no matching ticker or the time is before the prices begin.
| ticker | when | value |
| :----- | :--- | ----: |
| APPL | 2000-12-31 23:59:30 | |
| APPL | 2001-01-01 00:00:30 | 2.94 |
| APPL | 2001-01-01 00:01:30 | 48.26 |
| GOOG | 2000-12-31 23:59:30 | |
| GOOG | 2001-01-01 00:00:30 | 23.45 |
| GOOG | 2001-01-01 00:01:30 | 21.16 |
| DATA | 2000-12-31 23:59:30 | |
| DATA | 2001-01-01 00:00:30 | |
| DATA | 2001-01-01 00:01:30 | |
#### AsOf Joins with the `USING` Keyword {#docs:stable:guides:sql_features:asof_join::asof-joins-with-the-using-keyword}
So far we have been explicit about specifying the conditions for AsOf,
but SQL also has a simplified join condition syntax
for the common case where the column names are the same in both tables.
This syntax uses the `USING` keyword to list the fields that should be compared for equality.
AsOf also supports this syntax, but with two restrictions:
* The last field is the inequality
* The inequality is `>=` (the most common case)
Our first query can then be written as:
```sql
SELECT ticker, h.when, price * shares AS value
FROM holdings h
ASOF JOIN prices p USING (ticker, "when");
```
##### Clarification on Column Selection with `USING` in ASOF Joins {#docs:stable:guides:sql_features:asof_join::clarification-on-column-selection-with-using-in-asof-joins}
When you use the `USING` keyword in a join, the columns specified in the `USING` clause are merged in the result set. This means that if you run:
```sql
SELECT *
FROM holdings h
ASOF JOIN prices p USING (ticker, "when");
```
You will get back only the columns `h.ticker, h.when, h.shares, p.price`. The columns `ticker` and `when` appear only once,
with both values coming from the left table (`holdings`).
This behavior is fine for the `ticker` column because the value is the same in both tables. However, for the `when` column, the values might
differ between the two tables due to the `>=` condition used in the AsOf join. The AsOf join is designed to match each row in the left
table (`holdings`) with the nearest preceding row in the right table (`prices`) based on the `when` column.
If you want to retrieve the `when` column from both tables to see both timestamps, you need to list the columns explicitly rather than
relying on `*`, like so:
```sql
SELECT h.ticker, h.when AS holdings_when, p.when AS prices_when, h.shares, p.price
FROM holdings h
ASOF JOIN prices p USING (ticker, "when");
```
This ensures that you get the complete information from both tables, avoiding any potential confusion caused by the default behavior of
the `USING` keyword.
#### See Also {#docs:stable:guides:sql_features:asof_join::see-also}
For implementation details, see the [blog post âDuckDB's AsOf joins: Fuzzy Temporal Lookupsâ](https://duckdb.org/2023/09/15/asof-joins-fuzzy-temporal-lookups).
### Full-Text Search {#docs:stable:guides:sql_features:full_text_search}
DuckDB supports full-text search via the [`fts` extension](#docs:stable:core_extensions:full_text_search).
A full-text index allows for a query to quickly search for all occurrences of individual words within longer text strings.
#### Example: Shakespeare Corpus {#docs:stable:guides:sql_features:full_text_search::example-shakespeare-corpus}
Here's an example of building a full-text index of Shakespeare's plays.
```sql
CREATE TABLE corpus AS
SELECT * FROM 'https://blobs.duckdb.org/data/shakespeare.parquet';
```
```sql
DESCRIBE corpus;
```
| column_name | column_type | null | key | default | extra |
|-------------|-------------|------|------|---------|-------|
| line_id | VARCHAR | YES | NULL | NULL | NULL |
| play_name | VARCHAR | YES | NULL | NULL | NULL |
| line_number | VARCHAR | YES | NULL | NULL | NULL |
| speaker | VARCHAR | YES | NULL | NULL | NULL |
| text_entry | VARCHAR | YES | NULL | NULL | NULL |
The text of each line is in `text_entry`, and a unique key for each line is in `line_id`.
#### Creating a Full-Text Search Index {#docs:stable:guides:sql_features:full_text_search::creating-a-full-text-search-index}
First, we create the index, specifying the table name, the unique id column, and the column(s) to index. We will just index the single column `text_entry`, which contains the text of the lines in the play.
```sql
PRAGMA create_fts_index('corpus', 'line_id', 'text_entry');
```
The table is now ready to query using the [Okapi BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking function. Rows with no match return a `NULL` score.
What does Shakespeare say about butter?
```sql
SELECT
fts_main_corpus.match_bm25(line_id, 'butter') AS score,
line_id, play_name, speaker, text_entry
FROM corpus
WHERE score IS NOT NULL
ORDER BY score DESC;
```
| score | line_id | play_name | speaker | text_entry |
|-------------------:|-------------|--------------------------|--------------|----------------------------------------------------|
| 4.427313429798464 | H4/2.4.494 | Henry IV | Carrier | As fat as butter. |
| 3.836270302568675 | H4/1.2.21 | Henry IV | FALSTAFF | prologue to an egg and butter. |
| 3.836270302568675 | H4/2.1.55 | Henry IV | Chamberlain | They are up already, and call for eggs and butter; |
| 3.3844488405497115 | H4/4.2.21 | Henry IV | FALSTAFF | toasts-and-butter, with hearts in their bellies no |
| 3.3844488405497115 | H4/4.2.62 | Henry IV | PRINCE HENRY | already made thee butter. But tell me, Jack, whose |
| 3.3844488405497115 | AWW/4.1.40 | Alls well that ends well | PAROLLES | butter-womans mouth and buy myself another of |
| 3.3844488405497115 | AYLI/3.2.93 | As you like it | TOUCHSTONE | right butter-womens rank to market. |
| 3.3844488405497115 | KL/2.4.132 | King Lear | Fool | kindness to his horse, buttered his hay. |
| 3.0278411214953107 | AWW/5.2.9 | Alls well that ends well | Clown | henceforth eat no fish of fortunes buttering. |
| 3.0278411214953107 | MWW/2.2.260 | Merry Wives of Windsor | FALSTAFF | Hang him, mechanical salt-butter rogue! I will |
| 3.0278411214953107 | MWW/2.2.284 | Merry Wives of Windsor | FORD | rather trust a Fleming with my butter, Parson Hugh |
| 3.0278411214953107 | MWW/3.5.7 | Merry Wives of Windsor | FALSTAFF | Ill have my brains taen out and buttered, and give |
| 3.0278411214953107 | MWW/3.5.102 | Merry Wives of Windsor | FALSTAFF | to heat as butter; a man of continual dissolution |
| 2.739219044070792 | H4/2.4.115 | Henry IV | PRINCE HENRY | Didst thou never see Titan kiss a dish of butter? |
Unlike standard indexes, full-text indexes don't auto-update as the underlying data is changed, so you need to `PRAGMA drop_fts_index(my_fts_index)` and recreate it when appropriate.
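For the Shakespeare example above, rebuilding the index after the `corpus` table changes looks like this:
```sql
PRAGMA drop_fts_index('corpus');
PRAGMA create_fts_index('corpus', 'line_id', 'text_entry');
```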
#### Note on Generating the Corpus Table {#docs:stable:guides:sql_features:full_text_search::note-on-generating-the-corpus-table}
For more details, see the [âGenerating a Shakespeare corpus for full-text searching from JSONâ blog post](https://duckdb.blogspot.com/2023/04/generating-shakespeare-corpus-for-full.html).
* The columns are: `line_id`, `play_name`, `line_number`, `speaker`, `text_entry`.
* We need a unique key for each row in order for full-text searching to work.
* The `line_id` value `KL/2.4.132` means King Lear, Act 2, Scene 4, Line 132.
### query and query_table Functions {#docs:stable:guides:sql_features:query_and_query_table_functions}
The [`query_table`](#docs:stable:sql:functions:utility::query_tabletbl_name)
and [`query`](#docs:stable:sql:functions:utility::queryquery_string_literal)
functions enable powerful and more dynamic SQL.
The `query_table` function returns the table whose name is specified by its string argument; the `query` function returns the table obtained by executing the query specified by its string argument.
Both functions only accept constant strings. For example, they allow passing in a table name as a prepared statement parameter:
```sql
CREATE TABLE my_table (i INTEGER);
INSERT INTO my_table VALUES (42);
PREPARE select_from_table AS SELECT * FROM query_table($1);
EXECUTE select_from_table('my_table');
```
| i |
|---:|
| 42 |
When combined with the [`COLUMNS` expression](#docs:stable:sql:expressions:star::columns), we can write very generic SQL-only macros. For example, below is a custom version of `SUMMARIZE` that computes the `min` and `max` of every column in a table:
```sql
CREATE OR REPLACE MACRO my_summarize(table_name) AS TABLE
SELECT
unnest([*COLUMNS('alias_.*')]) AS column_name,
unnest([*COLUMNS('min_.*')]) AS min_value,
unnest([*COLUMNS('max_.*')]) AS max_value
FROM (
SELECT
any_value(alias(COLUMNS(*))) AS "alias_\0",
min(COLUMNS(*))::VARCHAR AS "min_\0",
max(COLUMNS(*))::VARCHAR AS "max_\0"
FROM query_table(table_name::VARCHAR)
);
SELECT *
FROM my_summarize('https://blobs.duckdb.org/data/ontime.parquet')
LIMIT 3;
```
| column_name | min_value | max_value |
|-------------|----------:|----------:|
| year | 2017 | 2017 |
| quarter | 1 | 3 |
| month | 1 | 9 |
The `query` function allows for even more flexibility. For example, users who prefer pandas' `stack` syntax over SQL's `UNPIVOT` syntax may use:
```sql
CREATE OR REPLACE MACRO stack(table_name, index, name, values) AS TABLE
FROM query(
'UNPIVOT ' || table_name
|| ' ON COLUMNS(* EXCLUDE (' || array_to_string(index, ', ')
|| ')) INTO NAME ' || name || ' VALUES ' || values
);
WITH cities AS (
FROM (
VALUES
('NL', 'Amsterdam', '10', '12', '15'),
('US', 'New York', '100', '120', '150')
) _(country, city, '2000', '2010', '2020')
)
SELECT *
FROM stack('cities', ['country', 'city'], 'year', 'population');
```
| country | city | year | population |
|---------|-----------|------|------------|
| NL | Amsterdam | 2000 | 10 |
| NL | Amsterdam | 2010 | 12 |
| NL | Amsterdam | 2020 | 15 |
| US | New York | 2000 | 100 |
| US | New York | 2010 | 120 |
| US | New York | 2020 | 150 |
### Timestamp Issues {#docs:stable:guides:sql_features:timestamps}
#### Timestamp with Time Zone Promotion Casts {#docs:stable:guides:sql_features:timestamps::timestamp-with-time-zone-promotion-casts}
Working with time zones in SQL can be quite confusing at times.
For example, when filtering to a date range, one might try the following query:
```sql
SET timezone = 'America/Los_Angeles';
CREATE TABLE times AS
FROM range('2025-08-30'::TIMESTAMPTZ, '2025-08-31'::TIMESTAMPTZ, INTERVAL 1 HOUR) tbl(t);
FROM times WHERE t <= '2025-08-30';
```
```text
┌──────────────────────────┐
│            t             │
│ timestamp with time zone │
├──────────────────────────┤
│ 2025-08-30 00:00:00-07   │
└──────────────────────────┘
```
But if you change to another time zone, the results of the query change:
```sql
SET timezone = 'HST';
FROM times WHERE t <= '2025-08-30';
```
```text
┌──────────────────────────┐
│            t             │
│ timestamp with time zone │
├──────────────────────────┤
│ 2025-08-29 21:00:00-10   │
│ 2025-08-29 22:00:00-10   │
│ 2025-08-29 23:00:00-10   │
│ 2025-08-30 00:00:00-10   │
└──────────────────────────┘
```
Or worse:
```sql
SET timezone = 'America/New_York';
FROM times WHERE t <= '2025-08-30';
```
```text
┌──────────────────────────┐
│            t             │
│ timestamp with time zone │
├──────────────────────────┤
│          0 rows          │
└──────────────────────────┘
```
These confusing results are due to the SQL casting rules from `DATE` to `TIMESTAMP WITH TIME ZONE`.
This cast is required to promote the date to midnight _in the current time zone_.
In general, unless you need the current time zone for display (or for
[other temporal binning](https://duckdb.org/2022/01/06/time-zones) operations),
you should use plain `TIMESTAMP`s for temporal data.
This avoids confusing issues such as the one above, and arithmetic on plain timestamps is generally faster.
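As a sketch of the difference, the same series stored as plain (time zone-naive) `TIMESTAMP`s yields the same filter result no matter which `timezone` is set, because the string literal is now cast to a plain timestamp as well:
```sql
CREATE TABLE times_naive AS
    FROM range('2025-08-30'::TIMESTAMP, '2025-08-31'::TIMESTAMP, INTERVAL 1 HOUR) tbl(t);

SET timezone = 'America/New_York';
FROM times_naive WHERE t <= '2025-08-30'; -- always returns only the midnight row
```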
#### Time Zone Performance {#docs:stable:guides:sql_features:timestamps::time-zone-performance}
DuckDB uses the _International Components for Unicode_ time library for
[time zone support](https://duckdb.org/2022/01/06/time-zones).
This library has a number of advantages, including support for daylight saving time past 2037.
(Note: Pandas gives incorrect results past that year).
The downside of using ICU is that it is not highly performant.
One workaround for this is to create a calendar table for the timestamps being modeled.
For example, if the application is modeling electrical supply and demand out to 2100 at hourly resolution,
one can create the calendar table like so:
```sql
SET timezone = 'Europe/Netherlands';
CREATE OR REPLACE TABLE hourly AS
SELECT
ts,
year::SMALLINT AS year,
month::TINYINT AS month,
day::TINYINT AS day,
hour::TINYINT AS hour,
FROM (
SELECT ts, unnest(date_part(['year', 'month', 'day', 'hour',], ts))
FROM generate_series(
'2020-01-01'::DATE::TIMESTAMPTZ,
'2100-01-01'::DATE::TIMESTAMPTZ,
INTERVAL 1 HOUR) tbl(ts)
) parts;
```
You can then join this ~700K row table against any timestamp column
to quickly obtain the temporal bin values for the time zone in question.
The inner casts are not required, but they result in a smaller table
because `date_part` returns 64-bit integers for all parts.
Notice that we can extract _all_ of the parts with a single call to `date_part`.
This part-list version of the function is faster than extracting the parts one by one:
the underlying binning computation produces all parts anyway,
so picking out the ones in the list avoids duplicate calls to the slow ICU function.
Also notice that we are leveraging the `DATE` cast rules from the previous section
to bound the calendar to the model domain.
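For instance, assuming a hypothetical `measurements(ts, value)` table whose timestamps lie on the same hourly grid, binning by calendar fields then reduces to a plain join:
```sql
SELECT h.year, h.month, sum(m.value) AS total_value
FROM measurements m
JOIN hourly h USING (ts)
GROUP BY ALL
ORDER BY ALL;
```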
#### Half Open Intervals {#docs:stable:guides:sql_features:timestamps::half-open-intervals}
Another subtle problem in using SQL for temporal analytics is the `BETWEEN` operator.
Temporal analytics almost always uses
[half-open binning intervals](https://www.cs.arizona.edu/~rts/tdbbook.pdf)
to avoid overlaps at the ends.
Unfortunately, the `BETWEEN` operator is a closed-closed interval:
```sql
x BETWEEN begin AND end
-- expands to
begin <= x AND x <= end
-- not
begin <= x AND x < end
```
To avoid this problem, make sure you are explicit about comparison boundaries instead of using `BETWEEN`.
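For example, using the `times` table from the start of this page, an explicit half-open filter looks as follows:
```sql
-- closed-closed: both end points are included, so adjacent bins overlap
FROM times WHERE t BETWEEN '2025-08-30' AND '2025-08-31';

-- half-open: the upper bound is excluded and adjacent bins never overlap
FROM times WHERE t >= '2025-08-30' AND t < '2025-08-31';
```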
## Snippets {#guides:snippets}
### Create Synthetic Data {#docs:stable:guides:snippets:create_synthetic_data}
DuckDB allows you to quickly generate synthetic datasets. To do so, you may use:
* [range functions](#docs:stable:sql:functions:list::range-functions)
* hash functions, e.g.,
[`hash`](#docs:stable:sql:functions:utility::hashvalue),
[`md5`](#docs:stable:sql:functions:utility::md5string),
[`sha256`](#docs:stable:sql:functions:utility::sha256value)
* the [Faker Python package](https://faker.readthedocs.io/) via the [Python function API](#docs:stable:clients:python:function)
* [cross products (Cartesian products)](#docs:stable:sql:query_syntax:from::cross-product-joins-cartesian-product)
For example:
```python
import duckdb
from duckdb.typing import *
from faker import Faker
fake = Faker()
def random_date():
    return fake.date_between()
def random_short_text():
    return fake.text(max_nb_chars=20)
def random_long_text():
    return fake.text(max_nb_chars=200)
con = duckdb.connect()
con.create_function("random_date", random_date, [], DATE, type="native", side_effects=True)
con.create_function("random_short_text", random_short_text, [], VARCHAR, type="native", side_effects=True)
con.create_function("random_long_text", random_long_text, [], VARCHAR, type="native", side_effects=True)
res = con.sql("""
SELECT
hash(i * 10 + j) AS id,
random_date() AS creationDate,
random_short_text() AS short,
random_long_text() AS long,
IF (j % 2, true, false) AS flag
FROM generate_series(1, 5) s(i)
CROSS JOIN generate_series(1, 2) t(j)
""")
res.show()
```
This generates the following:
```text
┌──────────────────────┬──────────────┬─────────┐
│          id          │ creationDate │  flag   │
│        uint64        │     date     │ boolean │
├──────────────────────┼──────────────┼─────────┤
│  6770051751173734325 │ 2019-11-05   │ true    │
│ 16510940941872865459 │ 2002-08-03   │ true    │
│ 13285076694688170502 │ 1998-11-27   │ true    │
│ 11757770452869451863 │ 1998-07-03   │ true    │
│  2064835973596856015 │ 2010-09-06   │ true    │
│ 17776805813723356275 │ 2020-12-26   │ false   │
│ 13540103502347468651 │ 1998-03-21   │ false   │
│  4800297459639118879 │ 2015-06-12   │ false   │
│  7199933130570745587 │ 2005-04-13   │ false   │
│ 18103378254596719331 │ 2014-09-15   │ false   │
├──────────────────────┴──────────────┴─────────┤
│ 10 rows                              3 columns │
└────────────────────────────────────────────────┘
```
### Dutch Railway Datasets {#docs:stable:guides:snippets:dutch_railway_datasets}
Examples in this documentation often use datasets based on the [Dutch Railway datasets](https://www.rijdendetreinen.nl/en/open-data/).
These high-quality datasets are maintained by the team behind the [Rijden de Treinen _(Are the trains running?)_ application](https://www.rijdendetreinen.nl/en/about).
This page contains download links to our mirrors of the datasets.
> In 2024, we published a [blog post on the analysis of these datasets](https://duckdb.org/2024/05/31/analyzing-railway-traffic-in-the-netherlands).
#### Loading the Datasets {#docs:stable:guides:snippets:dutch_railway_datasets::loading-the-datasets}
You can load the datasets directly as follows:
```sql
CREATE TABLE services AS
FROM 'https://blobs.duckdb.org/nl-railway/services-2025-03.csv.gz';
```
```sql
DESCRIBE services;
```
| column_name | column_type | null | key | default | extra |
|------------------------------|--------------------------|------|------|---------|-------|
| Service:RDT-ID | BIGINT | YES | NULL | NULL | NULL |
| Service:Date | DATE | YES | NULL | NULL | NULL |
| Service:Type | VARCHAR | YES | NULL | NULL | NULL |
| Service:Company | VARCHAR | YES | NULL | NULL | NULL |
| Service:Train number | BIGINT | YES | NULL | NULL | NULL |
| Service:Completely cancelled | BOOLEAN | YES | NULL | NULL | NULL |
| Service:Partly cancelled | BOOLEAN | YES | NULL | NULL | NULL |
| Service:Maximum delay | BIGINT | YES | NULL | NULL | NULL |
| Stop:RDT-ID | BIGINT | YES | NULL | NULL | NULL |
| Stop:Station code | VARCHAR | YES | NULL | NULL | NULL |
| Stop:Station name | VARCHAR | YES | NULL | NULL | NULL |
| Stop:Arrival time | TIMESTAMP WITH TIME ZONE | YES | NULL | NULL | NULL |
| Stop:Arrival delay | BIGINT | YES | NULL | NULL | NULL |
| Stop:Arrival cancelled | BOOLEAN | YES | NULL | NULL | NULL |
| Stop:Departure time | TIMESTAMP WITH TIME ZONE | YES | NULL | NULL | NULL |
| Stop:Departure delay | BIGINT | YES | NULL | NULL | NULL |
| Stop:Departure cancelled | BOOLEAN | YES | NULL | NULL | NULL |
#### Datasets {#docs:stable:guides:snippets:dutch_railway_datasets::datasets}
##### 80-Month Datasets {#docs:stable:guides:snippets:dutch_railway_datasets::80-month-datasets}
* [2019-01 to 2025-08](https://blobs.duckdb.org/nl-railway/railway-services-80-months.zip): 80 months as uncompressed CSVs in a single zip
##### Yearly Datasets {#docs:stable:guides:snippets:dutch_railway_datasets::yearly-datasets}
The yearly datasets are about 350 MB each; several years can be loaded in a single `read_csv` call (see the sketch after this list).
* [2019](https://blobs.duckdb.org/nl-railway/services-2019.csv.gz)
* [2020](https://blobs.duckdb.org/nl-railway/services-2020.csv.gz)
* [2021](https://blobs.duckdb.org/nl-railway/services-2021.csv.gz)
* [2022](https://blobs.duckdb.org/nl-railway/services-2022.csv.gz)
* [2023](https://blobs.duckdb.org/nl-railway/services-2023.csv.gz)
* [2024](https://blobs.duckdb.org/nl-railway/services-2024.csv.gz)
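For example, a sketch for loading two of the yearly files into a single table in one go (any subset of the URLs above works the same way):
```sql
CREATE TABLE services AS
    FROM read_csv([
        'https://blobs.duckdb.org/nl-railway/services-2023.csv.gz',
        'https://blobs.duckdb.org/nl-railway/services-2024.csv.gz'
    ]);
```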
##### Monthly Datasets {#docs:stable:guides:snippets:dutch_railway_datasets::monthly-datasets}
The monthly datasets are about 30 MB each.
* [2024-01](https://blobs.duckdb.org/nl-railway/services-2024-01.csv.gz)
* [2024-02](https://blobs.duckdb.org/nl-railway/services-2024-02.csv.gz)
* [2024-03](https://blobs.duckdb.org/nl-railway/services-2024-03.csv.gz)
* [2024-04](https://blobs.duckdb.org/nl-railway/services-2024-04.csv.gz)
* [2024-05](https://blobs.duckdb.org/nl-railway/services-2024-05.csv.gz)
* [2024-06](https://blobs.duckdb.org/nl-railway/services-2024-06.csv.gz)
* [2024-07](https://blobs.duckdb.org/nl-railway/services-2024-07.csv.gz)
* [2024-08](https://blobs.duckdb.org/nl-railway/services-2024-08.csv.gz)
* [2024-09](https://blobs.duckdb.org/nl-railway/services-2024-09.csv.gz)
* [2024-10](https://blobs.duckdb.org/nl-railway/services-2024-10.csv.gz)
* [2024-11](https://blobs.duckdb.org/nl-railway/services-2024-11.csv.gz)
* [2024-12](https://blobs.duckdb.org/nl-railway/services-2024-12.csv.gz)
* [2025-01](https://blobs.duckdb.org/nl-railway/services-2025-01.csv.gz)
* [2025-02](https://blobs.duckdb.org/nl-railway/services-2025-02.csv.gz)
* [2025-03](https://blobs.duckdb.org/nl-railway/services-2025-03.csv.gz)
* [2025-04](https://blobs.duckdb.org/nl-railway/services-2025-04.csv.gz)
* [2025-05](https://blobs.duckdb.org/nl-railway/services-2025-05.csv.gz)
* [2025-06](https://blobs.duckdb.org/nl-railway/services-2025-06.csv.gz)
* [2025-07](https://blobs.duckdb.org/nl-railway/services-2025-07.csv.gz)
* [2025-08](https://blobs.duckdb.org/nl-railway/services-2025-08.csv.gz)
* [2025-09](https://blobs.duckdb.org/nl-railway/services-2025-09.csv.gz)
### Sharing Macros {#docs:stable:guides:snippets:sharing_macros}
DuckDB has a powerful [macro mechanism](#docs:stable:sql:statements:create_macro) that allows creating shorthands for common tasks.
#### Sharing a Scalar Macro {#docs:stable:guides:snippets:sharing_macros::sharing-a-scalar-macro}
First, we define a macro that pretty-prints a non-negative integer as a short string, using thousands (k), millions (M), and billions (B) suffixes and truncating rather than rounding:
```batch
duckdb pretty_print_integer_macro.duckdb
```
```sql
CREATE MACRO pretty_print_integer(n) AS
CASE
WHEN n >= 1_000_000_000 THEN printf('%dB', n // 1_000_000_000)
WHEN n >= 1_000_000 THEN printf('%dM', n // 1_000_000)
WHEN n >= 1_000 THEN printf('%dk', n // 1_000)
ELSE printf('%d', n)
END;
SELECT pretty_print_integer(25_500_000) AS x;
```
```text
┌─────────┐
│    x    │
│ varchar │
├─────────┤
│ 25M     │
└─────────┘
```
As one would expect, the macro gets persisted in the database.
But this also means that we can host it on an HTTPS endpoint and share it with anyone!
We have published this macro on `blobs.duckdb.org`.
You can try it from DuckDB:
```batch
duckdb
```
Make sure that the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview) is installed:
```sql
INSTALL httpfs;
```
You can now attach to the remote endpoint and use the macro:
```sql
ATTACH 'https://blobs.duckdb.org/data/pretty_print_integer_macro.duckdb'
AS pretty_print_macro_db;
SELECT pretty_print_macro_db.pretty_print_integer(42_123) AS x;
```
```text
┌─────────┐
│    x    │
│ varchar │
├─────────┤
│ 42k     │
└─────────┘
```
#### Sharing a Table Macro {#docs:stable:guides:snippets:sharing_macros::sharing-a-table-macro}
It's also possible to share table macros. For example, we created the [`checksum` macro](https://duckdb.org/2024/10/11/duckdb-tricks-part-2#computing-checksums-for-columns) as follows:
```batch
duckdb compute_table_checksum.duckdb
```
```sql
CREATE MACRO checksum(table_name) AS TABLE
SELECT bit_xor(md5_number(COLUMNS(*)::VARCHAR))
FROM query_table(table_name);
```
To use it, make sure that the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview) is installed:
```sql
INSTALL httpfs;
```
You can attach to the remote endpoint and use the macro:
```sql
ATTACH 'https://blobs.duckdb.org/data/compute_table_checksum.duckdb'
AS compute_table_checksum_db;
CREATE TABLE stations AS
FROM 'https://blobs.duckdb.org/stations.parquet';
.mode line
FROM compute_table_checksum_db.checksum('stations');
```
```text
id = -132780776949939723506211681506129908318
code = 126327004005066229305810236187733612209
uic = -145623335062491121476006068124745817380
name_short = -114540917565721687000878144381189869683
name_medium = -568264780518431562127359918655305384
name_long = 126079956280724674884063510870679874110
slug = -53458800462031706622213217090663245511
country = 143068442936912051858689770843609587944
type = 5665662315470785456147400604088879751
geo_lat = 160608116135251821259126521573759502306
geo_lng = -138297281072655463682926723171691547732
```
### Analyzing a Git Repository {#docs:stable:guides:snippets:analyze_git_repository}
You can use DuckDB to analyze Git logs using the output of the [`git log` command](https://git-scm.com/docs/git-log).
#### Exporting the Git Log {#docs:stable:guides:snippets:analyze_git_repository::exporting-the-git-log}
We start by picking a character that doesn't occur in any part of the commit log (author names, messages, etc.).
Since v1.2.0, DuckDB's CSV reader supports [4-byte delimiters](https://duckdb.org/2025/02/05/announcing-duckdb-120#csv-features), making it possible to use emojis!
Despite being featured in the [Emoji Movie](https://www.imdb.com/title/tt4877122/) (IMDb rating: 3.4),
we can assume that the [Fish Cake with Swirl emoji (🍥)](https://emojipedia.org/fish-cake-with-swirl) is not a common occurrence in most Git logs.
So, let's clone the [`duckdb/duckdb` repository](https://github.com/duckdb/duckdb) and export its log as follows:
```batch
git log --date=iso-strict --pretty=format:%ad🍥%h🍥%an🍥%s > git-log.csv
```
The resulting file looks like this:
```text
2025-02-25T18:12:54+01:00🍥d608a31e13🍥Mark🍥MAIN_BRANCH_VERSIONING: Adopt also for Python build and amalgamation (#16400)
2025-02-25T15:05:56+01:00🍥920b39ad96🍥Mark🍥Read support for Parquet Float16 (#16395)
2025-02-25T13:43:52+01:00🍥61f55734b9🍥Carlo Piovesan🍥MAIN_BRANCH_VERSIONING: Adopt also for Python build and amalgamation
2025-02-25T12:35:28+01:00🍥87eff7ebd3🍥Mark🍥Fix issue #16377 (#16391)
2025-02-25T10:33:49+01:00🍥35af26476e🍥Hannes Mühleisen🍥Read support for Parquet Float16
```
#### Loading the Git Log into DuckDB {#docs:stable:guides:snippets:analyze_git_repository::loading-the-git-log-into-duckdb}
Start DuckDB and read the log as a CSV (or rather, a 🍥SV):
```sql
CREATE TABLE commits AS
FROM read_csv(
'git-log.csv',
delim = '🍥',
header = false,
column_names = ['timestamp', 'hash', 'author', 'message']
);
```
This will result in a nice DuckDB table:
```sql
FROM commits
LIMIT 5;
```
```text
┌─────────────────────┬────────────┬──────────────────┬────────────────────────────────────────────────────────────────────────────────┐
│      timestamp      │    hash    │      author      │                                    message                                     │
│      timestamp      │  varchar   │     varchar      │                                    varchar                                     │
├─────────────────────┼────────────┼──────────────────┼────────────────────────────────────────────────────────────────────────────────┤
│ 2025-02-25 17:12:54 │ d608a31e13 │ Mark             │ MAIN_BRANCH_VERSIONING: Adopt also for Python build and amalgamation (#16400)  │
│ 2025-02-25 14:05:56 │ 920b39ad96 │ Mark             │ Read support for Parquet Float16 (#16395)                                      │
│ 2025-02-25 12:43:52 │ 61f55734b9 │ Carlo Piovesan   │ MAIN_BRANCH_VERSIONING: Adopt also for Python build and amalgamation           │
│ 2025-02-25 11:35:28 │ 87eff7ebd3 │ Mark             │ Fix issue #16377 (#16391)                                                      │
│ 2025-02-25 09:33:49 │ 35af26476e │ Hannes Mühleisen │ Read support for Parquet Float16                                               │
└─────────────────────┴────────────┴──────────────────┴────────────────────────────────────────────────────────────────────────────────┘
```
#### Analyzing the Log {#docs:stable:guides:snippets:analyze_git_repository::analyzing-the-log}
We can analyze the table as any other in DuckDB.
##### Common Topics {#docs:stable:guides:snippets:analyze_git_repository::common-topics}
Let's start with a simple question: which topic was the most commonly mentioned in the commit messages: CI, CLI, or Python?
```sql
SELECT
message.lower().regexp_extract('\b(ci|cli|python)\b') AS topic,
count(*) AS num_commits
FROM commits
WHERE topic <> ''
GROUP BY ALL
ORDER BY num_commits DESC;
```
```text
┌─────────┬─────────────┐
│  topic  │ num_commits │
│ varchar │    int64    │
├─────────┼─────────────┤
│ ci      │         828 │
│ python  │         666 │
│ cli     │          49 │
└─────────┴─────────────┘
```
Out of these three topics, commits related to continuous integration dominate the log!
We can also do a more exploratory analysis by looking at all words in the commit messages.
To do so, we first tokenize the messages:
```sql
CREATE TABLE words AS
SELECT unnest(
message
.lower()
.regexp_replace('\W', ' ')
.trim(' ')
.string_split_regex('\W')
) AS word
FROM commits;
```
Then, we remove stopwords using a pre-defined list:
```sql
CREATE TABLE stopwords AS
SELECT unnest(['a', 'about', 'above', 'after', 'again', 'against', 'all', 'am', 'an', 'and', 'any', 'are', 'as', 'at', 'be', 'because', 'been', 'before', 'being', 'below', 'between', 'both', 'but', 'by', 'can', 'did', 'do', 'does', 'doing', 'don', 'down', 'during', 'each', 'few', 'for', 'from', 'further', 'had', 'has', 'have', 'having', 'he', 'her', 'here', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'i', 'if', 'in', 'into', 'is', 'it', 'its', 'itself', 'just', 'me', 'more', 'most', 'my', 'myself', 'no', 'nor', 'not', 'now', 'of', 'off', 'on', 'once', 'only', 'or', 'other', 'our', 'ours', 'ourselves', 'out', 'over', 'own', 's', 'same', 'she', 'should', 'so', 'some', 'such', 't', 'than', 'that', 'the', 'their', 'theirs', 'them', 'themselves', 'then', 'there', 'these', 'they', 'this', 'those', 'through', 'to', 'too', 'under', 'until', 'up', 'very', 'was', 'we', 'were', 'what', 'when', 'where', 'which', 'while', 'who', 'whom', 'why', 'will', 'with', 'you', 'your', 'yours', 'yourself', 'yourselves']) AS word;
CREATE OR REPLACE TABLE words AS
FROM words
NATURAL ANTI JOIN stopwords
WHERE word != '';
```
> We use the `NATURAL ANTI JOIN` clause here, which allows us to elegantly filter out values that occur in the `stopwords` table.
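If you prefer to spell out the join key, an equivalent formulation of the same step is an explicit `ANTI JOIN` (a sketch):
```sql
CREATE OR REPLACE TABLE words AS
    SELECT words.word
    FROM words
    ANTI JOIN stopwords ON words.word = stopwords.word
    WHERE words.word != '';
```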
Finally, we select the top-20 most common words.
```sql
SELECT word, count(*) AS count FROM words
GROUP BY ALL
ORDER BY count DESC
LIMIT 20;
```
```text
┌──────────┬───────┐
│   word   │ count │
│ varchar  │ int64 │
├──────────┼───────┤
│ merge    │ 12550 │
│ fix      │  6402 │
│ branch   │  6005 │
│ pull     │  5950 │
│ request  │  5945 │
│ add      │  5687 │
│ test     │  3801 │
│ master   │  3289 │
│ tests    │  2339 │
│ issue    │  1971 │
│ main     │  1935 │
│ remove   │  1884 │
│ format   │  1819 │
│ duckdb   │  1710 │
│ use      │  1442 │
│ mytherin │  1410 │
│ fixes    │  1333 │
│ hawkfish │  1147 │
│ feature  │  1139 │
│ function │  1088 │
├──────────┴───────┤
│ 20 rows          │
└──────────────────┘
```
As expected, there are many Git terms (`merge`, `branch`, `pull`, etc.), followed by terminology related to development (`fix`, `test`/`tests`, `issue`, `format`).
We also see the account names of some developers ([`mytherin`](https://github.com/Mytherin), [`hawkfish`](https://github.com/hawkfish)), which likely show up in commit messages for merged pull requests (e.g., [“Merge pull request #13776 from Mytherin/expressiondepth”](https://github.com/duckdb/duckdb/commit/4d18b9d05caf88f0420dbdbe03d35a0faabf4aa7)).
Finally, we also see some DuckDB-related terms such as `duckdb` (shocking!) and `function`.
##### Visualizing the Number of Commits {#docs:stable:guides:snippets:analyze_git_repository::visualizing-the-number-of-commits}
Let's visualize the number of commits each year:
```sql
SELECT
year(timestamp) AS year,
count(*) AS num_commits,
num_commits.bar(0, 20_000) AS num_commits_viz
FROM commits
GROUP BY ALL
ORDER BY ALL;
```
```text
┌───────┬─────────────┬──────────────────────────────────────────────────────────────────┐
│ year  │ num_commits │                         num_commits_viz                          │
│ int64 │    int64    │                             varchar                              │
├───────┼─────────────┼──────────────────────────────────────────────────────────────────┤
│  2018 │         870 │ ███▍                                                             │
│  2019 │        1621 │ ██████▍                                                          │
│  2020 │        3484 │ █████████████▉                                                   │
│  2021 │        6488 │ █████████████████████████▉                                       │
│  2022 │        9817 │ ███████████████████████████████████████▎                        │
│  2023 │       14585 │ ██████████████████████████████████████████████████████████▎     │
│  2024 │       15949 │ ███████████████████████████████████████████████████████████████▊ │
│  2025 │        1788 │ ███████▏                                                         │
└───────┴─────────────┴──────────────────────────────────────────────────────────────────┘
```
We see steady growth over the years,
which is especially remarkable considering that many of DuckDB's functionalities and clients, originally part of the main repository, are now maintained in separate repositories
(e.g., [Java](https://github.com/duckdb/duckdb-java), [R](https://github.com/duckdb/duckdb-r)).
Happy hacking!
### Importing Duckbox Tables {#docs:stable:guides:snippets:importing_duckbox_tables}
> The scripts provided in this page work on Linux, macOS, and WSL.
By default, the DuckDB [CLI client](#docs:stable:clients:cli:overview) renders query results in the [duckbox format](#docs:stable:clients:cli:output_formats),
which uses rich, ASCII-art inspired tables to show data.
These tables are often shared verbatim in other documents.
For example, take the table used to demonstrate [new CSV features in the DuckDB v1.2.0 release blog post](https://duckdb.org/2025/02/05/announcing-duckdb-120#csv-features.md):
```text
┌─────────┬───────┐
│    a    │   b   │
│ varchar │ int64 │
├─────────┼───────┤
│ hello   │    42 │
│ world   │    84 │
└─────────┴───────┘
```
What if we would like to load this data back into DuckDB?
This is not supported out of the box, but it can be achieved with some scripting:
we can turn the table into a `│`-separated file and read it with DuckDB's [CSV reader](#docs:stable:data:csv:overview).
Note that the separator is not the pipe character `|`; instead, it is the [“Box Drawings Light Vertical” character](https://www.compart.com/en/unicode/U+2502) `│`.
#### Loading Duckbox Tables to DuckDB {#docs:stable:guides:snippets:importing_duckbox_tables::loading-duckbox-tables-to-duckdb}
First, we save the table above as `duckbox.csv`.
Then, we clean it using `sed`:
```bash
echo -n > duckbox-cleaned.csv
sed -n "2s/^â *//;s/ *â$//;s/ *â */â/p;2q" duckbox.csv >> duckbox-cleaned.csv
sed "1,4d;\$d;s/^â *//;s/ *â$//;s/ *â */â/g" duckbox.csv >> duckbox-cleaned.csv
```
The `duckbox-cleaned.csv` file looks as follows:
```text
a│b
hello│42
world│84
```
We can then simply load this to DuckDB via:
```sql
FROM read_csv('duckbox-cleaned.csv', delim = '│');
```
And export it to a CSV:
```sql
COPY (FROM read_csv('duckbox-cleaned.csv', delim = '│')) TO 'out.csv';
```
```text
a,b
hello,42
world,84
```
#### Using `shellfs` {#docs:stable:guides:snippets:importing_duckbox_tables::using-shellfs}
To parse duckbox tables with a single `read_csv` call, and without creating any temporary files, we can use the [`shellfs` community extension](#community_extensions:extensions:shellfs):
```sql
INSTALL shellfs FROM community;
LOAD shellfs;
FROM read_csv(
    '(sed -n "2s/^│ *//;s/ *│$//;s/ *│ */│/p;2q" duckbox.csv; ' ||
    'sed "1,4d;\$d;s/^│ *//;s/ *│$//;s/ *│ */│/g" duckbox.csv) |',
    delim = '│'
);
```
We can also create a [table macro](#docs:stable:sql:statements:create_macro::table-macros):
```sql
CREATE MACRO read_duckbox(path) AS TABLE
FROM read_csv(
    printf(
        '(sed -n "2s/^│ *//;s/ *│$//;s/ *│ */│/p;2q" %s; ' ||
        'sed "1,4d;\$d;s/^│ *//;s/ *│$//;s/ *│ */│/g" %s) |',
        path, path
    ),
    delim = '│'
);
```
Then, reading a duckbox table is as simple as:
```sql
FROM read_duckbox('duckbox.csv');
```
> `shellfs` is a community extension and it comes without any support or guarantees.
> Only use it if you can ensure that its inputs are appropriately sanitized.
> Please consult the [Securing DuckDB page](#docs:stable:operations_manual:securing_duckdb:overview) for more details.
#### Limitations {#docs:stable:guides:snippets:importing_duckbox_tables::limitations}
Please consider the following limitations when running this script:
* This approach only works if the table's values do not themselves contain the vertical line `│` character.
It also trims leading and trailing spaces from the table cell values.
Make sure to factor in these assumptions when running the script.
* The script is compatible with both BSD `sed` (which is the default on macOS) and GNU `sed` (which is the default on Linux and available on macOS as `gsed`).
* Only the data types [supported by the CSV sniffer](#docs:stable:data:csv:auto_detection::type-detection) are parsed correctly. Values containing nested data will be parsed as a `VARCHAR`.
### Copying an In-Memory Database to a File {#docs:stable:guides:snippets:copy_in-memory_database_to_file}
Imagine the following situation: you started DuckDB in in-memory mode, but you would like to persist the state of your database to disk.
To achieve this, **attach to a new disk-based database** and use the [`COPY FROM DATABASE ... TO` command](#docs:stable:sql:statements:copy::copy-from-database--to):
```sql
ATTACH 'my_database.db';
COPY FROM DATABASE memory TO my_database;
DETACH my_database;
```
> Ensure that the disk-based database file does not exist before attaching to it.
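To double-check that the data was persisted, you can reattach the new file and list its tables. A minimal sketch, reusing `my_database.db` from above:
```sql
ATTACH 'my_database.db' AS reopened;
-- list the tables that were copied into the new file
SELECT database_name, table_name
FROM duckdb_tables()
WHERE database_name = 'reopened';
```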
## Troubleshooting {#guides:troubleshooting}
### Crashes {#docs:stable:guides:troubleshooting:crashes}
DuckDB is [thoroughly tested](#why_duckdb::thoroughly-tested) via an extensive test suite.
However, bugs can still occur and these can sometimes lead to crashes.
This page contains practical information on how to troubleshoot DuckDB crashes.
#### Types of Crashes {#docs:stable:guides:troubleshooting:crashes::types-of-crashes}
There are a few major types of crashes:
* **Termination signals:** The process stops with a signal such as `SIGSEGV` (segmentation fault) or `SIGABRT`. These should never occur; please [submit an issue](#::submitting-an-issue).
* **Internal errors:** an operation may result in an [`Internal Error`](#docs:stable:dev:internal_errors), e.g.:
```console
INTERNAL Error:
Attempted to access index 3 within vector of size 3
```
After encountering an internal error, DuckDB enters a restricted mode where any further operations will result in the following error message:
```console
FATAL Error:
Failed: database has been invalidated because of a previous fatal error.
The database must be restarted prior to being used again.
```
* **Out of memory errors:** A DuckDB crash can also be a symptom of the operating system killing the process.
For example, many Linux distributions run an [OOM reaper or OOM killer process](https://learn.redhat.com/t5/Platform-Linux/Out-of-Memory-Killer/td-p/48828), which kills processes to free up their memory and thus prevents the operating system from running out of memory.
If your DuckDB session is killed by the OOM reaper, consult the [“OOM errors” page](#docs:stable:guides:troubleshooting:oom_errors).
#### Recovering Data {#docs:stable:guides:troubleshooting:crashes::recovering-data}
If your DuckDB session was writing to a persistent database file prior to crashing,
there might be a WAL ([write-ahead log](https://en.wikipedia.org/wiki/Write-ahead_logging)) file next to your database named `⟨database_filename⟩.wal`{:.language-sql .highlight}.
To recover data from the WAL file, simply start a new DuckDB session on the persistent database.
DuckDB will then replay the write-ahead log and perform a [checkpoint operation](#docs:stable:sql:statements:checkpoint), restoring the database to the state before the crash.
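A minimal sketch, assuming the database file is called `my_database.duckdb`:
```sql
-- Opening the database replays my_database.duckdb.wal automatically
ATTACH 'my_database.duckdb' AS recovered;
-- An explicit checkpoint then removes the now-redundant WAL file
CHECKPOINT recovered;
```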
#### Troubleshooting the Crash {#docs:stable:guides:troubleshooting:crashes::troubleshooting-the-crash}
##### Using the Latest Stable and Preview Builds {#docs:stable:guides:troubleshooting:crashes::using-the-latest-stable-and-preview-builds}
DuckDB is constantly improving, so there is a chance that the bug you have encountered has already been fixed in the codebase.
First, try updating to the [**latest stable build**](https://duckdb.org/install/index.html?version=stable).
If this doesn't resolve the problem, try using the [**preview build**](https://duckdb.org/install/index.html?version=main) (also known as the ânightly buildâ).
If you would like to use DuckDB with an [open pull request](https://github.com/duckdb/duckdb/pulls) applied to the codebase,
you can try [building it from source](#docs:stable:dev:building:overview).
##### Search for Existing Issues {#docs:stable:guides:troubleshooting:crashes::search-for-existing-issues}
There is a chance that someone else already reported the bug that causes the crash.
Please search in the [GitHub issue tracker](https://github.com/duckdb/duckdb/issues) for the error message to see potentially related issues.
DuckDB has a large community and there may be some suggestions for a workaround.
##### Disabling the Query Optimizer {#docs:stable:guides:troubleshooting:crashes::disabling-the-query-optimizer}
Some crashes are caused by DuckDB's query optimizer component.
To identify whether the optimizer is causing the crash, try to turn it off and re-run the query:
```sql
PRAGMA disable_optimizer;
```
If the query finishes successfully, then the crash was caused by one or more optimizer rules.
To pinpoint the specific rules that caused the crash, you can try to [selectively disable optimizer rules](#docs:stable:configuration:pragmas::selectively-disabling-optimizers). This way, your query can still benefit from the rest of the optimizer rules.
##### Try to Isolate the Issue {#docs:stable:guides:troubleshooting:crashes::try-to-isolate-the-issue}
Some issues are caused by the interplay of different components and extensions, or are specific to certain platforms or client languages.
You can often isolate the issue to a smaller problem.
###### Reproducing in Plain SQL {#docs:stable:guides:troubleshooting:crashes::reproducing-in-plain-sql}
Issues can also occur due to differences in client libraries.
To understand whether this is the case, try reproducing the issue using plain SQL queries with the [DuckDB CLI client](#docs:stable:clients:cli:overview).
If you cannot reproduce the issue in the command line client, it is likely related to the client library.
###### Different Hardware Setup {#docs:stable:guides:troubleshooting:crashes::different-hardware-setup}
In our experience, several crashes are caused by faulty hardware (overheating hard drives, overclocked CPUs, etc.).
Therefore, it's worth trying another computer to run the same workload.
###### Decomposing the Query {#docs:stable:guides:troubleshooting:crashes::decomposing-the-query}
It's a good idea to try to break down the query into multiple smaller queries with each using a separate DuckDB extension and SQL feature.
For example, if you have a query that targets a dataset in an AWS S3 bucket and performs two joins on it, try to rewrite it as a series of smaller steps as follows.
Download the dataset's files manually and load them into DuckDB.
Then perform the first join and the second join separately.
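To illustrate the idea, here is a minimal sketch of such a decomposition using locally generated stand-in tables (all table and column names are made up for this example):
```sql
-- Stand-in data replacing the remote S3 dataset
CREATE TABLE facts AS
    SELECT range AS id, range % 10 AS key1, range % 7 AS key2
    FROM range(1_000);
CREATE TABLE dim1 AS
    SELECT range AS key1, 'a' || range AS label1 FROM range(10);
CREATE TABLE dim2 AS
    SELECT range AS key2, 'b' || range AS label2 FROM range(7);
-- Step 1: only the first join, materialized
CREATE TABLE step1 AS
    SELECT * FROM facts JOIN dim1 USING (key1);
-- Step 2: only the second join, on the materialized intermediate
CREATE TABLE step2 AS
    SELECT * FROM step1 JOIN dim2 USING (key2);
```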
If the multi-step approach still crashes at one of the steps, the query for that step is a good basis for a minimal reproducible example. If the multi-step approach no longer crashes, try to reconstruct the original query and observe which step reintroduces the error.
In both cases, you will have a better understanding of what is causing the issue and potentially also a workaround that you can use right away.
In any case, please consider [submitting an issue](#::submitting-an-issue) with your findings.
#### Submitting an Issue {#docs:stable:guides:troubleshooting:crashes::submitting-an-issue}
If you found a crash in DuckDB, please consider submitting an issue in our [GitHub issue tracker](https://github.com/duckdb/duckdb/issues) with a [minimal reproducible example](https://en.wikipedia.org/wiki/Minimal_reproducible_example).
### Out of Memory Errors {#docs:stable:guides:troubleshooting:oom_errors}
DuckDB has a state-of-the-art out-of-core query engine that can spill to disk for larger-than-memory processing.
We continuously strive to improve DuckDB's scalability and to prevent out of memory errors whenever possible.
That said, you may still experience out-of-memory errors if you run queries with multiple [blocking operators](#docs:stable:guides:performance:how_to_tune_workloads::blocking-operators), certain aggregation functions, `PIVOT` operations, etc., or if you have very little available memory compared to the dataset size.
#### Types of “Out of Memory” Errors {#docs:stable:guides:troubleshooting:oom_errors::types-of-out-of-memory-errors}
Out of memory errors mainly occur in two forms:
##### `OutOfMemoryException` {#docs:stable:guides:troubleshooting:oom_errors::outofmemoryexception}
Most of the time DuckDB runs out of memory with an `OutOfMemoryException`.
For example:
```console
duckdb.duckdb.OutOfMemoryException: Out of Memory Error: failed to pin block of size 256.0 KiB (476.7 MiB/476.8 MiB used)
```
##### OOM Reaper (Linux) {#docs:stable:guides:troubleshooting:oom_errors::oom-reaper-linux}
Many Linux distributions have an [OOM killer or OOM reaper process](https://learn.redhat.com/t5/Platform-Linux/Out-of-Memory-Killer/td-p/48828)
whose goal is to prevent memory overcommitment.
If the OOM reaper killed your process, you will often only see the following message in the terminal where DuckDB was running:
```console
Killed
```
To get more detailed information, check the diagnostic messages using the [`dmesg` command](https://en.wikipedia.org/wiki/Dmesg) (you may need `sudo`):
```batch
sudo dmesg
```
If the process was killed by the OOM killer/reaper, you will find an entry like this:
```console
[Fri Apr 18 02:04:10 2025] Out of memory: Killed process 54400 (duckdb) total-vm:1037911068kB, anon-rss:770031964kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:1814612kB oom_score_adj:0
```
#### Troubleshooting Out of Memory Errors {#docs:stable:guides:troubleshooting:oom_errors::troubleshooting-out-of-memory-errors}
To prevent out of memory errors, try to reduce memory usage.
To this end, please consult the [“How to Tune Workloads” page](#docs:stable:guides:performance:how_to_tune_workloads).
In short:
* Reduce the number of threads using the `SET threads = ...` command.
* If your query reads a large amount of data from a file or writes a large amount of data, try setting the `preserve_insertion_order` option to `false`: `SET preserve_insertion_order = false`.
* Counter-intuitively, reducing the memory limit below the [default 80%](#docs:stable:operations_manual:limits) can help prevent out of memory errors. This is because some DuckDB operations circumvent the database's buffer manager and thus they can reserve more memory than allowed by the memory limit. If this happens (e.g., DuckDB is killed by the operating system or an OOM reaper process), set the memory limit to just 50-60% of the total system memory using the `SET memory_limit = '...'` statement.
* Break up the query into subqueries. This allows you to see where the intermediate results “blow up”, causing the query to run out of memory.
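For example, the first three recommendations combined look as follows. This is only a sketch; the values are placeholders that you should tune to your system:
```sql
SET threads = 4;
SET preserve_insertion_order = false;
SET memory_limit = '8GB';  -- e.g., roughly 50% of a 16 GB machine
```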
#### See Also {#docs:stable:guides:troubleshooting:oom_errors::see-also}
For more information on DuckDB's memory management, see the [“Memory Management in DuckDB” blog post](https://duckdb.org/2024/07/09/memory-management).
## Glossary of Terms {#docs:stable:guides:glossary}
This page contains a glossary of a few common terms used in DuckDB.
#### Terms {#docs:stable:guides:glossary::terms}
##### In-Process Database Management System {#docs:stable:guides:glossary::in-process-database-management-system}
The DBMS runs in the client application's process instead of running as a separate process, which is common in the traditional client–server setup. An alternative term is **embeddable** database management system. In general, the term _“embedded database management system”_ should be avoided, as it can be confused with DBMSs targeting _embedded systems_ (which run, e.g., on microcontrollers).
##### Replacement Scan {#docs:stable:guides:glossary::replacement-scan}
In DuckDB, replacement scans are used when a table name used by a query does not exist in the catalog. These scans can substitute another data source instead of the table. Using replacement scans allows DuckDB to, e.g., seamlessly read [Pandas DataFrames](#docs:stable:guides:python:sql_on_pandas) or read input data from remote sources without explicitly invoking the functions that perform this (e.g., [reading Parquet files from https](#docs:stable:guides:network_cloud_storage:http_import)). For details, see the [C API â Replacement Scans page](#docs:stable:clients:c:replacement_scans).
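For instance, querying a file path directly relies on a replacement scan, since the path is not a table in the catalog. A minimal, self-contained sketch:
```sql
-- create a small CSV file to query
COPY (SELECT 42 AS x, 'hello' AS y) TO 'example.csv';
-- 'example.csv' is not in the catalog, so DuckDB substitutes a CSV read via a replacement scan
SELECT * FROM 'example.csv';
```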
##### Extension {#docs:stable:guides:glossary::extension}
DuckDB has a flexible extension mechanism that allows for dynamically loading extensions. These may extend DuckDB's functionality by providing support for additional file formats, introducing new types, and domain-specific functionality. For details, see the [Extensions page](#docs:stable:extensions:overview).
##### Platform {#docs:stable:guides:glossary::platform}
The platform is a combination of the operating system (e.g., Linux, macOS, Windows), system architecture (e.g., AMD64, ARM64), and, optionally, the compiler used (e.g., GCC 4). Platforms are used to distribute DuckDB binaries and [extension packages](#docs:stable:extensions:extension_distribution::platforms).
## Browsing Offline {#docs:stable:guides:offline-copy}
The offline documentation is currently not available. Please check back later.
# Operations Manual {#operations_manual}
## Overview {#docs:stable:operations_manual:overview}
We designed DuckDB to be easy to deploy and operate. We believe that most users do not need to consult the pages of the operations manual.
However, there are certain setups, e.g., when DuckDB is running in mission-critical infrastructure, where we would like to offer advice on how to configure DuckDB.
The operations manual contains advice for these cases and also offers convenient configuration snippets such as Gitignore files.
For advice on getting the best performance from DuckDB, see also the [Performance Guide](#docs:stable:guides:performance:overview).
## DuckDB's Footprint {#operations_manual:footprint_of_duckdb}
### Files Created by DuckDB {#docs:stable:operations_manual:footprint_of_duckdb:files_created_by_duckdb}
DuckDB creates several files and directories on disk. This page lists both the global and the local ones.
#### Global Files and Directories {#docs:stable:operations_manual:footprint_of_duckdb:files_created_by_duckdb::global-files-and-directories}
DuckDB creates the following global files and directories in the user's home directory (denoted with `~`):
| Location | Description | Shared between versions | Shared between clients |
|-------|-------------------|--|--|
| `~/.duckdbrc` | The content of this file is executed when starting the [DuckDB CLI client](#docs:stable:clients:cli:overview). The commands can be both [dot commands](#docs:stable:clients:cli:dot_commands) and SQL statements. The naming of this file follows the `~/.bashrc` and `~/.zshrc` “run commands” files. | Yes | Only used by CLI |
| `~/.duckdb_history` | History file, similar to `~/.bash_history` and `~/.zsh_history`. Used by the [DuckDB CLI client](#docs:stable:clients:cli:overview). | Yes | Only used by CLI |
| `~/.duckdb/extensions` | Binaries of installed [extensions](#docs:stable:extensions:overview). | No | Yes |
| `~/.duckdb/stored_secrets` | [Persistent secrets](#docs:stable:configuration:secrets_manager::persistent-secrets) created by the [Secrets manager](#docs:stable:configuration:secrets_manager). | Yes | Yes |
#### Local Files and Directories {#docs:stable:operations_manual:footprint_of_duckdb:files_created_by_duckdb::local-files-and-directories}
DuckDB creates the following files and directories in the working directory (for in-memory connections) or relative to the database file (for persistent connections):
| Name | Description | Example |
|-------|-------------------|---|
| `⟨database_filename⟩`{:.language-sql .highlight} | Database file. Only created in on-disk mode. The file can have any extension with typical extensions being `.duckdb`, `.db`, and `.ddb`. | `weather.duckdb` |
| `.tmp/` | Temporary directory. Only created in in-memory mode. | `.tmp/` |
| `⟨database_filename⟩.tmp/`{:.language-sql .highlight} | Temporary directory. Only created in on-disk mode. | `weather.tmp/` |
| `⟨database_filename⟩.wal`{:.language-sql .highlight} | [Write-ahead log](https://en.wikipedia.org/wiki/Write-ahead_logging) file. If DuckDB exits normally, the WAL file is deleted upon exit. If DuckDB crashes, the WAL file is required to recover data. | `weather.wal` |
If you are working in a Git repository and would like to disable tracking these files by Git,
see the instructions on using [`.gitignore` for DuckDB](#docs:stable:operations_manual:footprint_of_duckdb:gitignore_for_duckdb).
### Gitignore for DuckDB {#docs:stable:operations_manual:footprint_of_duckdb:gitignore_for_duckdb}
If you work in a Git repository, you may want to configure your [Gitignore](https://git-scm.com/docs/gitignore) to disable tracking [files created by DuckDB](#docs:stable:operations_manual:footprint_of_duckdb:files_created_by_duckdb).
These potentially include the DuckDB database file, the write-ahead log, and temporary files.
#### Sample Gitignore Files {#docs:stable:operations_manual:footprint_of_duckdb:gitignore_for_duckdb::sample-gitignore-files}
In the following, we present sample Gitignore configuration snippets for DuckDB.
##### Ignore Temporary Files but Keep Database {#docs:stable:operations_manual:footprint_of_duckdb:gitignore_for_duckdb::ignore-temporary-files-but-keep-database}
This configuration is useful if you would like to keep the database file in the version control system:
```text
*.wal
*.tmp/
```
##### Ignore Database and Temporary Files {#docs:stable:operations_manual:footprint_of_duckdb:gitignore_for_duckdb::ignore-database-and-temporary-files}
If you would like to ignore both the database and the temporary files, extend the Gitignore file to include the database file.
The exact Gitignore configuration to achieve this depends on the extension you use for your DuckDB databases (`.duckdb`, `.db`, `.ddb`, etc.).
For example, if your DuckDB files use the `.duckdb` extension, add the following lines to your `.gitignore` file:
```text
*.duckdb*
*.wal
*.tmp/
```
### Reclaiming Space {#docs:stable:operations_manual:footprint_of_duckdb:reclaiming_space}
DuckDB uses a single-file format, which has some inherent limitations w.r.t. reclaiming disk space.
#### `CHECKPOINT` {#docs:stable:operations_manual:footprint_of_duckdb:reclaiming_space::checkpoint}
To reclaim space after deleting rows, use the [`CHECKPOINT` statement](#docs:stable:sql:statements:checkpoint).
#### `VACUUM` {#docs:stable:operations_manual:footprint_of_duckdb:reclaiming_space::vacuum}
The [`VACUUM` statement](#docs:stable:sql:statements:vacuum) does _not_ trigger vacuuming deletes and hence does not reclaim space.
#### Compacting a Database by Copying {#docs:stable:operations_manual:footprint_of_duckdb:reclaiming_space::compacting-a-database-by-copying}
To compact the database, you can create a fresh copy of the database using the [`COPY FROM DATABASE` statement](#docs:stable:sql:statements:copy::copy-from-database--to). In the following example, we first connect to the original database `db1`, then the new (empty) database `db2`. Then, we copy the content of `db1` to `db2`.
```sql
ATTACH 'db1.db' AS db1;
ATTACH 'db2.db' AS db2;
COPY FROM DATABASE db1 TO db2;
```
## Installing DuckDB {#operations_manual:installing_duckdb}
### Install Script {#docs:stable:operations_manual:installing_duckdb:install_script}
You can install the [DuckDB CLI client](#docs:stable:clients:cli:overview) using an install script.
#### Linux and macOS {#docs:stable:operations_manual:installing_duckdb:install_script::linux-and-macos}
To use the [DuckDB install script](https://install.duckdb.org) on Linux and macOS, run:
```bash
curl https://install.duckdb.org | sh
```
The output of the install script looks like the following:
```text
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 3507 100 3507 0 0 34367 0 --:--:-- --:--:-- --:--:-- 34382
https://install.duckdb.org/v1.4.1/duckdb_cli-osx-universal.gz
*** DuckDB Linux/MacOS installation script, version 1.4.1 ***
.;odxdl,
.xXXXXXXXXKc
0XXXXXXXXXXXd cooo:
,XXXXXXXXXXXXK OXXXXd
0XXXXXXXXXXXo cooo:
.xXXXXXXXXKc
.;odxdl,
########################################################################## 100.0%
Successfully installed DuckDB binary to /Users/your_user/.duckdb/cli/1.4.1/duckdb
with a link from /Users/your_user/.duckdb/cli/latest/duckdb
Hint: Append the following line to your shell profile:
export PATH='/Users/your_user/.duckdb/cli/latest':$PATH
To launch DuckDB now, type
/Users/your_user/.duckdb/cli/latest/duckdb
```
By default, this installs the latest stable version of DuckDB to `~/.duckdb/cli/latest/duckdb`.
To add the DuckDB binary to your path, append the following line to your shell profile or RC file (e.g., `~/.bashrc`, `~/.zshrc`):
```bash
export PATH="~/.duckdb/cli/latest":$PATH
```
You can install [past DuckDB releases](#release_calendar::past-releases) (all the way back to v1.0.0) using the `DUCKDB_VERSION` variable. For example, to install v1.2.2, run:
```bash
curl https://install.duckdb.org | DUCKDB_VERSION=1.2.2 sh
```
#### Windows {#docs:stable:operations_manual:installing_duckdb:install_script::windows}
The DuckDB install script is currently not available for Windows.
## Logging {#operations_manual:logging}
### Logging {#docs:stable:operations_manual:logging:overview}
DuckDB implements a logging mechanism that provides users with detailed information about events such as query execution,
performance metrics, and system events.
#### Basics {#docs:stable:operations_manual:logging:overview::basics}
The DuckDB logging mechanism can be enabled or disabled using a special function, `enable_logging`. Logs are stored in a special view
named `duckdb_logs`, which can be queried like any standard table.
Example:
```sql
CALL enable_logging();
-- Run some queries...
SELECT * FROM duckdb_logs;
```
To disable logging, run
```sql
CALL disable_logging();
```
To clear the current log, run
```sql
CALL truncate_duckdb_logs();
```
#### Log Level {#docs:stable:operations_manual:logging:overview::log-level}
DuckDB supports different logging levels that control the verbosity of the logs:
* `ERROR`: Only logs error messages
* `WARN`: Logs warnings and errors
* `INFO`: Logs general information, warnings and errors (default)
* `DEBUG`: Logs detailed debugging information
* `TRACE`: Logs very detailed tracing information
The log level can be set using:
```sql
CALL enable_logging(level = 'debug');
```
#### Log Types {#docs:stable:operations_manual:logging:overview::log-types}
In DuckDB, log messages can have an associated log type. Log types allow two main things:
* Fine-grained control over log message generation
* Support for structured logging
##### Logging-Specific Types {#docs:stable:operations_manual:logging:overview::logging-specific-types}
To log only messages of a specific type:
```sql
CALL enable_logging('HTTP');
```
The above function automatically sets the correct log level and adds the `HTTP` type to the `enabled_log_types` setting. This ensures that
only log messages of the `HTTP` type are written to the log.
To enable multiple log types, pass a list:
```sql
CALL enable_logging(['HTTP', 'QueryLog']);
```
##### Structured Logging {#docs:stable:operations_manual:logging:overview::structured-logging}
Some log types like `HTTP` will have an associated message schema. To make DuckDB automatically parse the message, use the `duckdb_logs_parsed()` macro. For example:
```sql
SELECT request.headers FROM duckdb_logs_parsed('HTTP');
```
To view the schema of each structured log type, run:
```sql
DESCRIBE FROM duckdb_logs_parsed('HTTP');
```
##### List of Available Log Types {#docs:stable:operations_manual:logging:overview::list-of-available-log-types}
This is a (non-exhaustive) list of the available log types in DuckDB.
| Log Type | Description | Structured |
|--------------|----------------------------------------------------------|------------|
| `QueryLog` | Logs which queries are executed in DuckDB | No |
| `FileSystem` | Logs all interactions with DuckDB's FileSystem API | Yes |
| `HTTP` | Logs all HTTP traffic from DuckDB's internal HTTP client | Yes |
#### Log Storages {#docs:stable:operations_manual:logging:overview::log-storages}
By default, DuckDB logs to an in-memory log storage (`memory`). DuckDB supports different types of log storage. Currently,
the following log storage types are implemented in core DuckDB:
| Log Storage | Description |
|-------------|-----------------------------------------------------------|
| `memory` | (default) Log to an in-memory buffer |
| `stdout` | Log to the stdout of the current process (in CSV format) |
| `file` | Log to one or more CSV files |
Note that the `duckdb_logs` view is automatically updated to target the currently active log storage. This means that switching
the log storage may influence what is returned by the `duckdb_logs` view.
##### Logging to stdout {#docs:stable:operations_manual:logging:overview::logging-to-stdout}
```sql
CALL enable_logging(storage = 'stdout');
```
##### Logging to File {#docs:stable:operations_manual:logging:overview::logging-to-file-}
```sql
CALL enable_logging(storage = 'file', storage_config = {'path': 'path/to/store/logs'});
```
or using the equivalent shorthand:
```sql
CALL enable_logging(storage_path = 'path/to/store/logs');
```
#### Advanced Usage {#docs:stable:operations_manual:logging:overview::advanced-usage}
##### Normalized vs. Denormalized Logging {#docs:stable:operations_manual:logging:overview::normalized-vs-denormalized-logging}
DuckDB's log storages can log in two ways: normalized vs. denormalized.
In denormalized logging, the log context information is appended directly to each log entry, while in normalized logging
the log entries are stored separately with context_ids referencing the context information.
| Log Storage | Normalized |
|-------------|--------------|
| `memory` | yes |
| `file` | configurable |
| `stdout` | no |
For file storage, you can switch between normalized and denormalized logging by providing a path ending in `.csv` (for denormalized)
or a path without `.csv` (for normalized). For file logging, normalized storage is generally recommended, since it increases performance
and reduces the total size of the logs. To configure normalization of the `file` log storage:
```sql
-- normalized: creates `/tmp/duckdb_log_contexts.csv` and `/tmp/duckdb_log_entries.csv`
CALL enable_logging(storage_path = '/tmp');
-- denormalized: creates `/tmp/logs.csv`
CALL enable_logging(storage_path = '/tmp/logs.csv');
```
Note that the difference between normalized and denormalized logging is typically hidden from users by the `duckdb_logs` view,
which automatically joins the normalized tables into a single unified result. To illustrate, both configurations above can be
queried using `FROM duckdb_logs;` and will produce identical results.
##### Buffer Size {#docs:stable:operations_manual:logging:overview::buffer-size}
The log storage in DuckDB implements a buffering mechanism to optimize logging performance. This implementation
introduces a potential delay between message logging and storage writing. This delay can obscure the actual message writing time,
which is particularly problematic when debugging crashes, as messages generated immediately before a crash might not be
written. To address this, the buffer size can be configured as follows:
```sql
CALL enable_logging(storage_config = {'buffer_size': 0});
```
or using the equivalent shorthand:
```sql
CALL enable_logging(storage_buffer_size = 0);
```
Note that the default buffer size is different for different log storages:
| Log Storage | Default buffer size |
|-------------|-------------------------------|
| `memory` | `STANDARD_VECTOR_SIZE` (2048) |
| `file` | `STANDARD_VECTOR_SIZE` (2048) |
| `stdout` | Disabled (0) |
So, for example, if you want to increase your `stdout` logging performance, enable buffering, which can greatly (>10x) speed up
your logging:
```sql
CALL enable_logging(storage = 'stdout', storage_buffer_size = 2048);
```
Or imagine you are debugging a crash in DuckDB and want to use the `file` logger to understand what's going on.
In that case, disable the buffering:
```sql
CALL enable_logging(storage_path = '/tmp/mylogs', storage_buffer_size = 0);
```
##### Syntactic Sugar {#docs:stable:operations_manual:logging:overview::syntactic-sugar}
DuckDB contains some syntactic sugar to make common configurations easier to write. For example, the following statements are all equivalent:
```sql
-- regular invocation
CALL enable_logging(storage = 'file', storage_config = {'path': 'path/to/store/logs'});
-- using shorthand for common path storage config param
CALL enable_logging(storage = 'file', storage_path = 'path/to/store/logs');
-- omitting `storage = 'file'` -> is implied from presence of `storage_config`
CALL enable_logging(storage_config = {'path': 'path/to/store/logs'});
```
## Securing DuckDB {#operations_manual:securing_duckdb}
### Securing DuckDB {#docs:stable:operations_manual:securing_duckdb:overview}
DuckDB is quite powerful, which can be problematic, especially if untrusted SQL queries are run, e.g., from public-facing user inputs.
This page lists some options to restrict the potential fallout from malicious SQL queries.
The approach to securing DuckDB varies depending on your use case, environment, and potential attack models.
Therefore, consider the security-related configuration options carefully, especially when working with confidential datasets.
If you plan to embed DuckDB in your application, please consult the [âEmbedding DuckDBâ](#docs:stable:operations_manual:securing_duckdb:embedding_duckdb) page.
#### Reporting Vulnerabilities {#docs:stable:operations_manual:securing_duckdb:overview::reporting-vulnerabilities}
If you discover a potential vulnerability, please [report it confidentially via GitHub](https://github.com/duckdb/duckdb/security/advisories/new).
#### Safe Mode (CLI) {#docs:stable:operations_manual:securing_duckdb:overview::safe-mode-cli}
DuckDB's CLI client supports [âsafe modeâ](#docs:stable:clients:cli:safe_mode), which prevents DuckDB from accessing external files other than the database file.
This can be activated via a command line argument or a [dot command](#docs:stable:clients:cli:dot_commands):
```batch
duckdb -safe ...
```
```plsql
.safe_mode
```
#### Restricting File Access {#docs:stable:operations_manual:securing_duckdb:overview::restricting-file-access}
DuckDB can list directories and read arbitrary files via its CSV parser's [`read_csv` function](#docs:stable:data:csv:overview) or read text via the [`read_text` function](#docs:stable:sql:functions:text::read_textsource).
This makes it possible to read from the local file system, for example:
```sql
SELECT *
FROM read_csv('/etc/passwd', sep = ':');
```
##### Disabling File Access {#docs:stable:operations_manual:securing_duckdb:overview::disabling-file-access}
File access can be disabled in two ways. First, you can disable individual file systems. For example:
```sql
SET disabled_filesystems = 'LocalFileSystem';
```
Second, you can completely disable external access by setting the [`enable_external_access` option](#docs:stable:configuration:overview::configuration-reference) to `false`.
```sql
SET enable_external_access = false;
```
This setting implies that:
* `ATTACH` cannot attach to a database in a file.
* `COPY` cannot read from or write to files.
* Functions such as `read_csv`, `read_parquet`, `read_json`, etc. cannot read from an external source.
##### The `allowed_directories` and `allowed_paths` Options {#docs:stable:operations_manual:securing_duckdb:overview::the-allowed_directories-and-allowed_paths-options}
You can restrict DuckDB's access to certain directories or files using the `allowed_directories` and `allowed_paths` options, respectively.
These options allow fine-grained access control for the file system.
For example, you can set DuckDB to only use the `/tmp` directory.
```sql
SET allowed_directories = ['/tmp'];
SET enable_external_access = false;
FROM read_csv('test.csv');
```
With the setting applied, DuckDB will refuse to read files in the current working directory:
```console
Permission Error:
Cannot access file "test.csv" - file system operations are disabled by configuration
```
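Similarly, the `allowed_paths` option restricts access to an explicit list of individual files. A minimal sketch; the path below is just a placeholder:
```sql
SET allowed_paths = ['/data/allowed_file.csv'];
SET enable_external_access = false;
-- only the explicitly listed file remains readable
FROM read_csv('/data/allowed_file.csv');
```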
#### Secrets {#docs:stable:operations_manual:securing_duckdb:overview::secrets}
[Secrets](#docs:stable:configuration:secrets_manager) are used to manage credentials to log into third party services like AWS or Azure. DuckDB can show a list of secrets using the `duckdb_secrets()` table function. This will redact any sensitive information such as security keys by default. The `allow_unredacted_secrets` option can be set to show all information contained within a security key. It is recommended not to turn on this option if you are running untrusted SQL input.
Queries can access the secrets defined in the Secrets Manager. For example, if there is a secret defined to authenticate with a user, who has write privileges to a given AWS S3 bucket, queries may write to that bucket. This is applicable for both persistent and temporary secrets.
[Persistent secrets](#docs:stable:configuration:secrets_manager::persistent-secrets) are stored in unencrypted binary format on the disk. These have the same permissions as SSH keys (`600`), i.e., only the user who is running the DuckDB (parent) process can read and write them.
#### Locking Configurations {#docs:stable:operations_manual:securing_duckdb:overview::locking-configurations}
Security-related configuration settings generally lock themselves for safety reasons. For example, while we can disable [community extensions](#community_extensions:index) using `SET allow_community_extensions = false`, we cannot re-enable them after the fact without restarting the database. Trying to do so will result in an error:
```console
Invalid Input Error: Cannot upgrade allow_community_extensions setting while database is running
```
This prevents untrusted SQL input from re-enabling settings that were explicitly disabled for security reasons.
Nevertheless, many configuration settings do not disable themselves, such as the resource constraints. If you allow users to run SQL statements unrestricted on your own hardware, it is recommended that you lock the configuration after your own configuration has finished using the following command:
```sql
SET lock_configuration = true;
```
This prevents any configuration settings from being modified from that point onwards.
#### Prepared Statements to Prevent SQL Injection {#docs:stable:operations_manual:securing_duckdb:overview::prepared-statements-to-prevent-sql-injection}
Similarly to other SQL databases, it's recommended to use [prepared statements](#docs:stable:sql:query_syntax:prepared_statements) in DuckDB to prevent [SQL injection](https://en.wikipedia.org/wiki/SQL_injection).
**Therefore, avoid concatenating strings for queries:**
```python
import duckdb
duckdb.execute("SELECT * FROM (VALUES (32, 'a'), (42, 'b')) t(x) WHERE x = " + str(42)).fetchall()
```
**Instead, use prepared statements:**
```python
import duckdb
duckdb.execute("SELECT * FROM (VALUES (32, 'a'), (42, 'b')) t(x) WHERE x = ?", [42]).fetchall()
```
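For reference, the same query can also be expressed with SQL-level `PREPARE`/`EXECUTE` statements; a minimal sketch (`find_row` is just an illustrative name):
```sql
PREPARE find_row AS
    SELECT * FROM (VALUES (32, 'a'), (42, 'b')) t(x, y) WHERE x = ?;
EXECUTE find_row(42);
```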
#### Constrain Resource Usage {#docs:stable:operations_manual:securing_duckdb:overview::constrain-resource-usage}
DuckDB can use quite a lot of CPU, RAM, and disk space. To avoid denial of service attacks, these resources can be limited.
The number of CPU threads that DuckDB can use can be set using, for example:
```sql
SET threads = 4;
```
Where 4 is the number of allowed threads.
The maximum amount of memory (RAM) can also be limited, for example:
```sql
SET memory_limit = '4GB';
```
The size of the temporary file directory can be limited with:
```sql
SET max_temp_directory_size = '4GB';
```
#### Extensions {#docs:stable:operations_manual:securing_duckdb:overview::extensions}
DuckDB has a powerful extension mechanism; extensions have the same privileges as the user running DuckDB's (parent) process.
This introduces security considerations. Therefore, we recommend reviewing the configuration options for [securing extensions](#docs:stable:operations_manual:securing_duckdb:securing_extensions).
#### Privileges {#docs:stable:operations_manual:securing_duckdb:overview::privileges}
Avoid running DuckDB as a root user (e.g., using `sudo`).
There is no good reason to run DuckDB as root.
#### Generic Solutions {#docs:stable:operations_manual:securing_duckdb:overview::generic-solutions}
Securing DuckDB can also be supported via proven generic means, for example:
* Scoping user privileges via [`chroot`](https://en.wikipedia.org/wiki/Chroot), relying on the operating system
* Containerization, e.g., via Docker or Podman. See the [“DuckDB Docker Container” page](#docs:stable:operations_manual:duckdb_docker)
* Running DuckDB in WebAssembly
### Embedding DuckDB {#docs:stable:operations_manual:securing_duckdb:embedding_duckdb}
#### CLI Client {#docs:stable:operations_manual:securing_duckdb:embedding_duckdb::cli-client}
The [Command Line Interface (CLI) client](#docs:stable:clients:cli:overview) is intended for interactive use cases and not for embedding.
As a result, it has more features that could be abused by a malicious actor.
For example, the CLI client has the `.sh` feature that allows executing arbitrary shell commands.
This feature is only present in the CLI client and not in any other DuckDB clients.
```sql
.sh ls
```
> **Tip.** Calling DuckDB's CLI client via shell commands is **not recommended** for embedding DuckDB. It is recommended to use one of the client libraries, e.g., [Python](#docs:stable:clients:python:overview), [R](#docs:stable:clients:r), [Java](#docs:stable:clients:java), etc.
### Securing Extensions {#docs:stable:operations_manual:securing_duckdb:securing_extensions}
DuckDB has a powerful extension mechanism; extensions have the same privileges as the user running DuckDB's (parent) process.
This introduces security considerations. Therefore, we recommend reviewing the configuration options listed on this page and setting them according to your attack models.
#### DuckDB Signature Checks {#docs:stable:operations_manual:securing_duckdb:securing_extensions::duckdb-signature-checks}
DuckDB extensions are checked on every load using the signature of the binaries.
There are currently three categories of extensions:
* Signed with a `core` key. Only extensions vetted by the core DuckDB team are signed with these keys.
* Signed with a `community` key. These are open-source extensions distributed via the [DuckDB Community Extensions repository](#community_extensions:index).
* Unsigned.
#### Overview of Security Levels for Extensions {#docs:stable:operations_manual:securing_duckdb:securing_extensions::overview-of-security-levels-for-extensions}
DuckDB offers the following security levels for extensions.
| Usable extensions | Description | Configuration |
|-----|---|---|
| `core` | Extensions can only be loaded if signed with a `core` key. | `SET allow_community_extensions = false` |
| `core` and `community` | Extensions can only be loaded if signed with a `core` or `community` key. | This is the default security level. |
| Any extension including unsigned | Any extensions can be loaded. | `SET allow_unsigned_extensions = true` |
Security-related configuration settings [lock themselves](#docs:stable:operations_manual:securing_duckdb:overview::locking-configurations), i.e., it is only possible to restrict capabilities in the current process.
For example, attempting the following configuration changes will result in an error:
```sql
SET allow_community_extensions = false;
SET allow_community_extensions = true;
```
```console
Invalid Input Error: Cannot upgrade allow_community_extensions setting while database is running
```
#### Community Extensions {#docs:stable:operations_manual:securing_duckdb:securing_extensions::community-extensions}
DuckDB has a [Community Extensions repository](#community_extensions:index), which allows convenient installation of third-party extensions.
Community extension repositories like pip or npm essentially enable remote code execution by design. This is less dramatic than it sounds. For better or worse, we are quite used to piping random scripts from the web into our shells, and routinely install a staggering amount of transitive dependencies without thinking twice. Some repositories like CRAN enforce a human inspection at some point, but that's no guarantee for anything either.
We've studied several different approaches to community extension repositories and have picked what we think is a sensible approach: we do not attempt to review the submissions, but require that the *source code of extensions is available*. We take over the complete build, signing, and distribution process. Note that this is a step up from pip and npm, which allow uploading arbitrary binaries, but a step down from reviewing everything manually. We allow users to [report malicious extensions](https://github.com/duckdb/community-extensions/security/advisories/new) and show adoption statistics like GitHub stars and download count. Because we manage the repository, we can remove problematic extensions from distribution quickly.
Despite this, installing and loading DuckDB extensions from the community extension repository will execute code written by third party developers, and therefore *can* be dangerous. A malicious developer could create and register a harmless-looking DuckDB extension that steals your crypto coins.
If you're running a web service that executes untrusted SQL from users with DuckDB, we recommend disabling community extensions. To do so, run:
```sql
SET allow_community_extensions = false;
```
#### Disabling Autoinstalling and Autoloading Known Extensions {#docs:stable:operations_manual:securing_duckdb:securing_extensions::disabling-autoinstalling-and-autoloading-known-extensions}
By default, DuckDB automatically installs and loads known extensions. To disable autoinstalling known extensions, run:
```sql
SET autoinstall_known_extensions = false;
```
To disable autoloading known extensions, run:
```sql
SET autoload_known_extensions = false;
```
To lock this configuration, use the [`lock_configuration` option](#docs:stable:operations_manual:securing_duckdb:overview::locking-configurations):
```sql
SET lock_configuration = true;
```
#### Always Require Signed Extensions {#docs:stable:operations_manual:securing_duckdb:securing_extensions::always-require-signed-extensions}
By default, DuckDB requires extensions to be either signed as core extensions (created by the DuckDB developers) or community extensions (created by third-party developers but distributed by the DuckDB developers).
The [`allow_unsigned_extensions` setting](#docs:stable:extensions:overview::unsigned-extensions) can be enabled on start-up to allow loading unsigned extensions.
While this setting is useful for extension development, enabling it will allow DuckDB to load _any extensions,_ which means more care must be taken to ensure malicious extensions are not loaded.
## Non-Deterministic Behavior {#docs:stable:operations_manual:non-deterministic_behavior}
Several operators in DuckDB exhibit non-deterministic behavior.
Most notably, SQL uses set semantics, which allows results to be returned in a different order.
DuckDB exploits this to improve performance, particularly when performing multi-threaded query execution.
Other factors, such as using different compilers, operating systems, and hardware architectures, can also cause changes in ordering.
This page documents the cases where non-determinism is an _expected behavior_.
If you would like to make your queries deterministic, see the [âWorking Around Non-Determinismâ section](#::working-around-non-determinism).
#### Set Semantics {#docs:stable:operations_manual:non-deterministic_behavior::set-semantics}
One of the most common sources of non-determinism is the set semantics used by SQL.
E.g., if you run the following query repeatedly, you may get two different results:
```sql
SELECT *
FROM (
SELECT 'A' AS x
UNION
SELECT 'B' AS x
);
```
Both results `A`, `B` and `B`, `A` are correct.
#### Different Results on Different Platforms: `array_distinct` {#docs:stable:operations_manual:non-deterministic_behavior::different-results-on-different-platforms-array_distinct}
The `array_distinct` function may return results [in a different order on different platforms](https://github.com/duckdb/duckdb/issues/13746):
```sql
SELECT array_distinct(['A', 'A', 'B', NULL, NULL]) AS arr;
```
For this query, both `[A, B]` and `[B, A]` are valid results.
#### Floating-Point Aggregate Operations with Multi-Threading {#docs:stable:operations_manual:non-deterministic_behavior::floating-point-aggregate-operations-with-multi-threading}
Floating-point inaccuracies may produce different results when queries run in a multi-threaded configuration.
For example, [`stddev` and `corr` may produce non-deterministic results](https://github.com/duckdb/duckdb/issues/13763):
```sql
CREATE TABLE tbl AS
SELECT 'ABCDEFG'[floor(random() * 7 + 1)::INT] AS s, 3.7 AS x, i AS y
FROM range(1, 1_000_000) r(i);
SELECT s, stddev(x) AS standard_deviation, corr(x, y) AS correlation
FROM tbl
GROUP BY s
ORDER BY s;
```
The expected standard deviations and correlations from this query are 0 for all values of `s`.
However, when executed on multiple threads, the query may return small numbers (`0 <= z < 10e-16`) due to floating-point inaccuracies.
#### Working Around Non-Determinism {#docs:stable:operations_manual:non-deterministic_behavior::working-around-non-determinism}
For the majority of use cases, non-determinism does not cause any issues.
However, there are some cases where deterministic results are desirable.
In these cases, try the following workarounds:
1. Limit the number of threads to prevent non-determinism introduced by multi-threading.
```sql
SET threads = 1;
```
2. Enforce ordering. For example, you can use the [`ORDER BY ALL` clause](#docs:stable:sql:query_syntax:orderby::order-by-all):
```sql
SELECT *
FROM (
SELECT 'A' AS x
UNION
SELECT 'B' AS x
)
ORDER BY ALL;
```
You can also sort lists using [`list_sort`](#docs:stable:sql:functions:list::list_sortlist):
```sql
SELECT list_sort(array_distinct(['A', 'A', 'B', NULL, NULL])) AS i
ORDER BY i;
```
It's also possible to introduce a [deterministic shuffling](https://duckdb.org/2024/08/19/duckdb-tricks-part-1#shuffling-data).
## Limits {#docs:stable:operations_manual:limits}
This page contains DuckDB's built-in limit values.
To check the value of a setting on your system, use the `current_setting` function.
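For example, a minimal sketch inspecting two of the configurable limits listed below:
```sql
SELECT
    current_setting('memory_limit') AS memory_limit,
    current_setting('max_temp_directory_size') AS max_temp_directory_size;
```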
#### Limit Values {#docs:stable:operations_manual:limits::limit-values}
| Limit | Default value | Configuration option | Comment |
|---|---|---|---|
| Array size | 100000 | - | |
| BLOB size | 4 GB | - | |
| Expression depth | 1000 | [`max_expression_depth`](#docs:stable:configuration:overview) | |
| Memory allocation for a vector | 128 GB | - | |
| Memory use | 80% of RAM | [`memory_limit`](#docs:stable:configuration:pragmas::memory-limit) | Note: This limit only applies to the buffer manager. |
| String size | 4 GB | - | |
| Temporary directory size | unlimited | [`max_temp_directory_size`](#docs:stable:configuration:overview) | |
#### Size of Database Files {#docs:stable:operations_manual:limits::size-of-database-files}
DuckDB doesn't have a practical limit for the size of a single DuckDB database file.
We have database files using 15 TB+ of disk space and they work fine.
However, connecting to such a huge database may take a few seconds, and [checkpointing](#docs:stable:sql:statements:checkpoint) can be slower.
## DuckDB Docker Container {#docs:stable:operations_manual:duckdb_docker}
DuckDB has an [official Docker image](https://github.com/duckdb/duckdb-docker), which supports both the arm64 (AArch64) and x86_64 (AMD64) architectures.
#### Usage {#docs:stable:operations_manual:duckdb_docker::usage}
To use the DuckDB Docker image, run:
```batch
docker run --rm -it -v "$(pwd):/workspace" -w /workspace duckdb/duckdb
```
#### Using the DuckDB UI with Docker {#docs:stable:operations_manual:duckdb_docker::using-the-duckdb-ui-with-docker}
To use the [DuckDB UI](#docs:stable:core_extensions:ui) with Docker, enable host networking.
> This setting forwards all ports from the container, so exercise caution and avoid it in secure environments.
```batch
docker run --rm -it -v "$(pwd):/workspace" -w /workspace --net host duckdb/duckdb
```
Then, launch the UI as follows:
```plsql
CALL start_ui();
```
To enable host networking in Docker Desktop, follow the instructions on the [Host network driver](https://docs.docker.com/engine/network/drivers/host/#docker-desktop) page.
# Development {#dev}
## DuckDB Repositories {#docs:stable:dev:repositories}
Several components of DuckDB are maintained in separate repositories.
#### Main Repositories {#docs:stable:dev:repositories::main-repositories}
* [`duckdb`](https://github.com/duckdb/duckdb): core DuckDB project
* [`duckdb-web`](https://github.com/duckdb/duckdb-web): documentation and blog
#### Clients {#docs:stable:dev:repositories::clients}
* [`duckdb-go`](https://github.com/duckdb/duckdb-go): Go client
* [`duckdb-java`](https://github.com/duckdb/duckdb-java): Java (JDBC) client
* [`duckdb-node`](https://github.com/duckdb/duckdb-node): Node.js client (deprecated)
* [`duckdb-node-neo`](https://github.com/duckdb/duckdb-node-neo): Node.js client
* [`duckdb-odbc`](https://github.com/duckdb/duckdb-odbc): ODBC client
* [`duckdb-pyodide`](https://github.com/duckdb/duckdb-pyodide): Pyodide client
* [`duckdb-python`](https://github.com/duckdb/duckdb-python): Python client
* [`duckdb-r`](https://github.com/duckdb/duckdb-r): R client
* [`duckdb-rs`](https://github.com/duckdb/duckdb-rs): Rust client
* [`duckdb-swift`](https://github.com/duckdb/duckdb-swift): Swift client
* [`duckdb-wasm`](https://github.com/duckdb/duckdb-wasm): WebAssembly client
* [`duckplyr`](https://github.com/tidyverse/duckplyr): a drop-in replacement for dplyr in R
#### Connectors {#docs:stable:dev:repositories::connectors}
* [`dbt-duckdb`](https://github.com/duckdb/dbt-duckdb): dbt
* [`duckdb-mysql`](https://github.com/duckdb/duckdb-mysql): MySQL connector
* [`duckdb-postgres`](https://github.com/duckdb/duckdb-postgres): PostgreSQL connector (connect to PostgreSQL from DuckDB)
* [`duckdb-sqlite`](https://github.com/duckdb/duckdb-sqlite): SQLite connector
* [`pg_duckdb`](https://github.com/duckdb/pg_duckdb): official PostgreSQL extension for DuckDB (run DuckDB in PostgreSQL)
#### Extensions {#docs:stable:dev:repositories::extensions}
* [`duckdb-ui`](https://github.com/duckdb/duckdb-ui): web UI for DuckDB
* Core extension repositories are linked in the [Official Extensions page](#docs:stable:core_extensions:overview)
* Community extensions are served from the [Community Extensions repository](#community_extensions:index)
#### Specifications {#docs:stable:dev:repositories::specifications}
* [DuckLake specification](https://ducklake.select/docs/stable/specification/introduction)
## Profiling {#docs:stable:dev:profiling}
Profiling is essential to help understand why certain queries exhibit specific performance characteristics.
DuckDB contains several built-in features to enable query profiling, which this page covers.
For a high-level example of using `EXPLAIN`, see the [“Inspect Query Plans” page](#docs:stable:guides:meta:explain).
For an in-depth explanation, see the [“Profiling” page](#docs:stable:dev:profiling) in the Developer Documentation.
#### Statements {#docs:stable:dev:profiling::statements}
##### The `EXPLAIN` Statement {#docs:stable:dev:profiling::the-explain-statement}
The first step in profiling a query is usually to examine the query plan.
The [`EXPLAIN`](#docs:stable:guides:meta:explain) statement shows the query plan and describes what is going on under the hood.
##### The `EXPLAIN ANALYZE` Statement {#docs:stable:dev:profiling::the-explain-analyze-statement}
The query plan helps developers understand the performance characteristics of the query.
However, it is often also necessary to examine the performance numbers of individual operators and the cardinalities that pass through them.
The [`EXPLAIN ANALYZE`](#docs:stable:guides:meta:explain_analyze) statement enables obtaining these, as it pretty-prints the query plan and also executes the query.
Thus, it provides the actual run-time performance numbers.
##### The `FORMAT` Option {#docs:stable:dev:profiling::the-format-option}
The `EXPLAIN [ANALYZE]` statement allows exporting to several formats:
* `text`: default ASCII-art style output
* `graphviz`: produces a DOT output, which can be rendered with [Graphviz](https://graphviz.org/)
* `html`: produces an HTML output, which can be rendered with [treeflex](https://dumptyd.github.io/treeflex/)
* `json`: produces a JSON output
To specify a format, use the `FORMAT` tag:
```sql
EXPLAIN (FORMAT html) SELECT 42 AS x;
```
#### Pragmas {#docs:stable:dev:profiling::pragmas}
DuckDB supports several pragmas for turning profiling on and off and controlling the level of detail in the profiling output.
The following pragmas are available and can be set using either `PRAGMA` or `SET`.
They can also be reset using `RESET`, followed by the setting name.
For more information, see the [“Profiling”](#docs:stable:configuration:pragmas::profiling) section of the pragmas page.
| Setting | Description | Default | Options |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------|----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------|
| [`enable_profiling`](#docs:stable:configuration:pragmas::enable_profiling), [`enable_profile`](#docs:stable:configuration:pragmas::enable_profiling) | Turn on profiling | `query_tree` | `query_tree`, `json`, `query_tree_optimizer`, `no_output` |
| [`profiling_coverage`](#docs:stable:configuration:pragmas::profiling_coverage) | Set the operators to profile | `SELECT` | `SELECT`, `ALL` |
| [`profiling_output`](#docs:stable:configuration:pragmas::profiling_output) | Set a profiling output file | Console | A filepath |
| [`profiling_mode`](#docs:stable:configuration:pragmas::profiling_mode) | Toggle additional optimizer and planner metrics | `standard` | `standard`, `detailed` |
| [`custom_profiling_settings`](#docs:stable:configuration:pragmas::custom_profiling_metrics) | Enable or disable specific metrics | All metrics except those activated by detailed profiling | A JSON object that matches the following: `{"METRIC_NAME": "boolean", ...}`. See the [metrics](#::metrics) section below. |
| [`disable_profiling`](#docs:stable:configuration:pragmas::disable_profiling), [`disable_profile`](#docs:stable:configuration:pragmas::disable_profiling) | Turn off profiling | | |
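As a minimal sketch of this workflow (the output path is an arbitrary example), profiling can be enabled with `SET`, pointed at a file, and later reset:
```sql
SET enable_profiling = 'json';
SET profiling_output = 'profile.json';
-- Queries executed here are profiled; their profiles are written to profile.json.
RESET enable_profiling;
RESET profiling_output;
```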
#### Metrics {#docs:stable:dev:profiling::metrics}
The query tree has two types of nodes: the `QUERY_ROOT` and `OPERATOR` nodes.
The `QUERY_ROOT` refers exclusively to the top-level node, and the metrics it contains are measured over the entire query.
The `OPERATOR` nodes refer to the individual operators in the query plan.
Some metrics are only available for `QUERY_ROOT` nodes, while others are only for `OPERATOR` nodes.
The table below describes each metric and which nodes they are available for.
Other than `QUERY_NAME` and `OPERATOR_TYPE`, it is possible to turn all metrics on or off.
| Metric | Return type | Unit | Query | Operator | Description |
|-------------------------|-------------|----------|:-----:|:--------:|-------------------------------------------------------------------------------------------------------------------------------|
| `BLOCKED_THREAD_TIME`   | `double`    | seconds  |   ✓   |          | The total time threads are blocked                                                                                              |
| `EXTRA_INFO`            | `string`    |          |   ✓   |    ✓     | Unique operator metrics                                                                                                         |
| `LATENCY`               | `double`    | seconds  |   ✓   |          | The total elapsed query execution time                                                                                          |
| `OPERATOR_CARDINALITY`  | `uint64`    | absolute |       |    ✓     | The cardinality of each operator, i.e., the number of rows it returns to its parent. Operator equivalent of `ROWS_RETURNED`     |
| `OPERATOR_ROWS_SCANNED` | `uint64`    | absolute |       |    ✓     | The total rows scanned by each operator                                                                                         |
| `OPERATOR_TIMING`       | `double`    | seconds  |       |    ✓     | The time taken by each operator. Operator equivalent of `LATENCY`                                                               |
| `OPERATOR_TYPE`         | `string`    |          |       |    ✓     | The name of each operator                                                                                                       |
| `QUERY_NAME`            | `string`    |          |   ✓   |          | The query string                                                                                                                |
| `RESULT_SET_SIZE`       | `uint64`    | bytes    |   ✓   |    ✓     | The size of the result                                                                                                          |
| `ROWS_RETURNED`         | `uint64`    | absolute |   ✓   |          | The number of rows returned by the query                                                                                        |
##### Cumulative Metrics {#docs:stable:dev:profiling::cumulative-metrics}
DuckDB also supports several cumulative metrics that are available in all nodes.
In the `QUERY_ROOT` node, these metrics represent the sum of the corresponding metrics across all operators in the query.
In the `OPERATOR` nodes, these metrics represent the sum of the operator's own metric and those of all its children, recursively.
These cumulative metrics can be enabled independently, even if the underlying specific metrics are disabled.
The table below shows the cumulative metrics.
It also shows the underlying metric from which DuckDB calculates each cumulative metric.
| Metric | Unit | Metric calculated cumulatively |
|---------------------------|----------|--------------------------------|
| `CPU_TIME` | seconds | `OPERATOR_TIMING` |
| `CUMULATIVE_CARDINALITY` | absolute | `OPERATOR_CARDINALITY` |
| `CUMULATIVE_ROWS_SCANNED` | absolute | `OPERATOR_ROWS_SCANNED` |
`CPU_TIME` measures the cumulative operator timings.
It does not include time spent in other stages, like parsing, query planning, etc.
Thus, for some queries, the `LATENCY` in the `QUERY_ROOT` can be greater than the `CPU_TIME`.
#### Detailed Profiling {#docs:stable:dev:profiling::detailed-profiling}
When the `profiling_mode` is set to `detailed`, an extra set of metrics is enabled, which are only available in the `QUERY_ROOT` node.
These include [`OPTIMIZER`](#::optimizer-metrics), [`PLANNER`](#::planner-metrics), and [`PHYSICAL_PLANNER`](#::physical-planner-metrics) metrics.
They are measured in seconds and returned as a `double`.
It is possible to toggle each of these additional metrics individually.
##### Optimizer Metrics {#docs:stable:dev:profiling::optimizer-metrics}
At the `QUERY_ROOT` node, there are metrics that measure the time taken by each [optimizer](#docs:stable:internals:overview::optimizer).
These metrics are only available when the specific optimizer is enabled.
The available optimizations can be queried using the [`duckdb_optimizers()`{:.language-sql .highlight} table function](#docs:stable:sql:meta:duckdb_table_functions::duckdb_optimizers).
Each optimizer has a corresponding metric that follows the template: `OPTIMIZER_⟨OPTIMIZER_NAME⟩`{:.language-sql .highlight}.
For example, the `OPTIMIZER_JOIN_ORDER` metric corresponds to the `JOIN_ORDER` optimizer.
Additionally, the following metrics are available to support the optimizer metrics:
* `ALL_OPTIMIZERS`: Enables all optimizer metrics and measures the time the optimizer parent node takes.
* `CUMULATIVE_OPTIMIZER_TIMING`: The cumulative sum of all optimizer metrics. It is usable without turning on all optimizer metrics.
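To check which optimizer names exist in your DuckDB version, and hence which `OPTIMIZER_⟨OPTIMIZER_NAME⟩` metrics can appear, you can list them with the table function mentioned above (a quick, illustrative query; the returned column is assumed to be `name`):
```sql
SELECT name
FROM duckdb_optimizers()
ORDER BY name;
```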
##### Planner Metrics {#docs:stable:dev:profiling::planner-metrics}
The planner is responsible for generating the logical plan. Currently, DuckDB measures two metrics in the planner:
* `PLANNER`: The time to generate the logical plan from the parsed SQL nodes.
* `PLANNER_BINDING`: The time taken to bind the logical plan.
##### Physical Planner Metrics {#docs:stable:dev:profiling::physical-planner-metrics}
The physical planner is responsible for generating the physical plan from the logical plan.
The following are the metrics supported in the physical planner:
* `PHYSICAL_PLANNER`: The time spent generating the physical plan.
* `PHYSICAL_PLANNER_COLUMN_BINDING`: The time spent binding the columns in the logical plan to physical columns.
* `PHYSICAL_PLANNER_RESOLVE_TYPES`: The time spent resolving the types in the logical plan to physical types.
* `PHYSICAL_PLANNER_CREATE_PLAN`: The time spent creating the physical plan.
#### Custom Metrics Examples {#docs:stable:dev:profiling::custom-metrics-examples}
The following examples demonstrate how to enable custom profiling and set the output format to `json`.
In the first example, we enable profiling and set the output to a file.
We only enable `EXTRA_INFO`, `OPERATOR_CARDINALITY`, and `OPERATOR_TIMING`.
```sql
CREATE TABLE students (name VARCHAR, sid INTEGER);
CREATE TABLE exams (eid INTEGER, subject VARCHAR, sid INTEGER);
INSERT INTO students VALUES ('Mark', 1), ('Joe', 2), ('Matthew', 3);
INSERT INTO exams VALUES (10, 'Physics', 1), (20, 'Chemistry', 2), (30, 'Literature', 3);
PRAGMA enable_profiling = 'json';
PRAGMA profiling_output = '/path/to/file.json';
PRAGMA custom_profiling_settings = '{"CPU_TIME": "false", "EXTRA_INFO": "true", "OPERATOR_CARDINALITY": "true", "OPERATOR_TIMING": "true"}';
SELECT name
FROM students
JOIN exams USING (sid)
WHERE name LIKE 'Ma%';
```
The file's content after executing the query:
```json
{
"extra_info": {},
"query_name": "SELECT name\nFROM students\nJOIN exams USING (sid)\nWHERE name LIKE 'Ma%';",
"children": [
{
"operator_timing": 0.000001,
"operator_cardinality": 2,
"operator_type": "PROJECTION",
"extra_info": {
"Projections": "name",
"Estimated Cardinality": "1"
},
"children": [
{
"extra_info": {
"Join Type": "INNER",
"Conditions": "sid = sid",
"Build Min": "1",
"Build Max": "3",
"Estimated Cardinality": "1"
},
"operator_cardinality": 2,
"operator_type": "HASH_JOIN",
"operator_timing": 0.00023899999999999998,
"children": [
...
```
The second example adds detailed metrics to the output.
```sql
PRAGMA profiling_mode = 'detailed';
SELECT name
FROM students
JOIN exams USING (sid)
WHERE name LIKE 'Ma%';
```
The contents of the outputted file:
```json
{
"all_optimizers": 0.001413,
"cumulative_optimizer_timing": 0.0014120000000000003,
"planner": 0.000873,
"planner_binding": 0.000869,
"physical_planner": 0.000236,
"physical_planner_column_binding": 0.000005,
"physical_planner_resolve_types": 0.000001,
"physical_planner_create_plan": 0.000226,
"optimizer_expression_rewriter": 0.000029,
"optimizer_filter_pullup": 0.000002,
"optimizer_filter_pushdown": 0.000102,
...
"optimizer_column_lifetime": 0.000009999999999999999,
"rows_returned": 2,
"latency": 0.003708,
"cumulative_rows_scanned": 6,
"cumulative_cardinality": 11,
"extra_info": {},
"cpu_time": 0.000095,
"optimizer_build_side_probe_side": 0.000017,
"result_set_size": 32,
"blocked_thread_time": 0.0,
"query_name": "SELECT name\nFROM students\nJOIN exams USING (sid)\nWHERE name LIKE 'Ma%';",
"children": [
{
"operator_timing": 0.000001,
"operator_rows_scanned": 0,
"cumulative_rows_scanned": 6,
"operator_cardinality": 2,
"operator_type": "PROJECTION",
"cumulative_cardinality": 11,
"extra_info": {
"Projections": "name",
"Estimated Cardinality": "1"
},
"result_set_size": 32,
"cpu_time": 0.000095,
"children": [
...
```
#### Query Graphs {#docs:stable:dev:profiling::query-graphs}
It is also possible to render the profiling output as a query graph.
The query graph visually represents the query plan, showing the operators and their relationships.
The query plan must be output in the `json` format and stored in a file.
After writing the profiling output to its designated file, you can render it as a query graph using the following Python script.
The script requires the `duckdb` Python module to be installed.
It generates an HTML file and opens it in your web browser.
```batch
python -m duckdb.query_graph /path/to/file.json
```
#### Notation in Query Plans {#docs:stable:dev:profiling::notation-in-query-plans}
In query plans, the [hash join](https://en.wikipedia.org/wiki/Hash_join) operators adhere to the following convention:
the _probe side_ of the join is the left operand, while the _build side_ is the right operand.
Join operators in the query plan show the join type used:
* Inner joins are denoted as `INNER`.
* Left outer joins and right outer joins are denoted as `LEFT` and `RIGHT`, respectively.
* Full outer joins are denoted as `FULL`.
> **Tip.** To visualize query plans, consider using the [DuckDB execution plan visualizer](https://db.cs.uni-tuebingen.de/explain/) developed by the [Database Systems Research Group at the University of Tübingen](https://github.com/DBatUTuebingen).
## Building DuckDB {#dev:building}
### Building DuckDB from Source {#docs:stable:dev:building:overview}
#### When Should You Build DuckDB? {#docs:stable:dev:building:overview::when-should-you-build-duckdb}
DuckDB binaries are available for _stable_ and _preview_ builds on the [installation page](https://duckdb.org/install).
In most cases, it's recommended to use these binaries.
When you are running on an experimental platform (e.g., [Raspberry Pi](#docs:stable:dev:building:raspberry_pi)) or you would like to build the project for an unmerged pull request,
you can build DuckDB from source based on the [`duckdb/duckdb` repository hosted on GitHub](https://github.com/duckdb/duckdb/).
This page explains the steps for building DuckDB.
#### Prerequisites {#docs:stable:dev:building:overview::prerequisites}
DuckDB needs CMake and a C++11-compliant compiler (e.g., GCC, Apple-Clang, MSVC).
Additionally, we recommend using the [Ninja build system](https://ninja-build.org/), which automatically parallelizes the build process.
#### Getting Started {#docs:stable:dev:building:overview::getting-started}
A `Makefile` wraps the build process.
See [Build Configuration](#docs:stable:dev:building:build_configuration) for targets and configuration flags.
```bash
make
make release # same as plain make
make debug
GEN=ninja make # for use with Ninja
BUILD_BENCHMARK=1 make # build with benchmarks
```
#### Platforms {#docs:stable:dev:building:overview::platforms}
##### Platforms with Full Support {#docs:stable:dev:building:overview::platforms-with-full-support}
DuckDB fully supports Linux, macOS and Windows. Both x86_64 (amd64) and AArch64 (arm64) builds are available for these platforms, and almost all extensions are distributed for these platforms.
| Platform name | Description |
|--------------------|------------------------------------------------------------------------|
| `linux_amd64` | Linux x86_64 (amd64) with [glibc](https://www.gnu.org/software/libc/) |
| `linux_arm64` | Linux AArch64 (arm64) with [glibc](https://www.gnu.org/software/libc/) |
| `osx_amd64` | macOS 12+ amd64 (Intel CPUs) |
| `osx_arm64` | macOS 12+ arm64 (Apple Silicon CPUs) |
| `windows_amd64` | Windows 10+ x86_64 (amd64) |
| `windows_arm64` | Windows 10+ AArch64 (arm64) |
For these platforms, builds are available for both the latest stable version and the preview version (nightly build).
In some circumstances, you may still want to build DuckDB from source, e.g., to test an unmerged [pull request](https://github.com/duckdb/duckdb/pulls).
For build instructions on these platforms, see:
* [Linux](#docs:stable:dev:building:linux)
* [macOS](#docs:stable:dev:building:macos)
* [Windows](#docs:stable:dev:building:windows)
##### Platforms with Partial Support {#docs:stable:dev:building:overview::platforms-with-partial-support}
There are several partially supported platforms.
For some platforms, DuckDB binaries and extensions (or a [subset of extensions](#docs:stable:extensions:extension_distribution::platforms)) are distributed.
For others, building from source is possible.
| Platform name | Description |
|------------------------|------------------------------------------------------------------------------------------------------|
| `linux_amd64_musl` | Linux x86_64 (amd64) with [musl libc](https://musl.libc.org/), e.g., Alpine Linux |
| `linux_arm64_musl` | Linux AArch64 (arm64) with [musl libc](https://musl.libc.org/), e.g., Alpine Linux |
| `linux_arm64_android` | Android AArch64 (arm64) |
| `wasm_eh` | WebAssembly Exception Handling |
Below, we provide detailed build instructions for some platforms:
* [Android](#docs:stable:dev:building:android)
* [Raspberry Pi](#docs:stable:dev:building:raspberry_pi)
##### Platforms with Best Effort Support {#docs:stable:dev:building:overview::platforms-with-best-effort-support}
| Platform name | Description |
|------------------------|------------------------------------------------------------------------------------------------------|
| `freebsd_amd64` | FreeBSD x86_64 (amd64) |
| `freebsd_arm64` | FreeBSD AArch64 (arm64) |
| `wasm_mvp` | WebAssembly Minimum Viable Product |
| `windows_amd64_mingw` | Windows 10+ x86_64 (amd64) with MinGW |
| `windows_arm64_mingw` | Windows 10+ AArch64 (arm64) with MinGW |
> These platforms are not covered by DuckDB's community support. For details on commercial support, see the [support policy page](https://duckdblabs.com/community_support_policy#platforms).
See also the [“Unofficial and Unsupported Platforms” page](#docs:stable:dev:building:unofficial_and_unsupported_platforms) for details.
##### Outdated Platforms {#docs:stable:dev:building:overview::outdated-platforms}
Some platforms were supported in older DuckDB versions but are no longer supported.
| Platform name | Description |
|------------------------|------------------------------------------------------------------------------------------------------|
| `linux_amd64_gcc4`     | Linux x86_64 (amd64) with GCC 4, e.g., CentOS 7                                                        |
| `linux_arm64_gcc4` | Linux AArch64 (arm64) with GCC 4, e.g., CentOS 7 |
| `windows_amd64_rtools` | Windows 10+ x86_64 (amd64) for [RTools](https://cran.r-project.org/bin/windows/Rtools/) |
DuckDB can also be built for end-of-life platforms such as [macOS 11](https://endoflife.date/macos) and [CentOS 7/8](https://endoflife.date/centos) using the instructions provided for macOS and Linux.
#### Amalgamation Build {#docs:stable:dev:building:overview::amalgamation-build}
DuckDB can be built as a single pair of C++ header and source files (`duckdb.hpp` and `duckdb.cpp`) with approximately 0.5 million lines of code.
To generate this file, run:
```bash
python scripts/amalgamation.py
```
Note that the amalgamation build is provided on a best-effort basis and is not officially supported.
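As a rough sketch of how the amalgamation can be consumed (the paths and program name are placeholders, not part of the official build process), the generated pair can be compiled directly into an application:
```bash
# Hypothetical usage sketch: compile the generated amalgamation and link it into a small program.
# Adjust the paths to wherever amalgamation.py writes its output; additional system libraries
# (e.g., -ldl on Linux) may be required depending on the platform.
g++ -std=c++11 -O2 -c duckdb.cpp -o duckdb.o
g++ -std=c++11 my_app.cpp duckdb.o -o my_app -lpthread
```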
#### Limitations {#docs:stable:dev:building:overview::limitations}
Currently, DuckDB has the following known compile-time limitations:
* The `-march=native` build flag, i.e., compiling DuckDB with the local machine's native instruction set, is not supported.
#### Troubleshooting Guides {#docs:stable:dev:building:overview::troubleshooting-guides}
We provide troubleshooting guides for building DuckDB:
* [Generic issues](#docs:stable:dev:building:troubleshooting)
* [Python](#docs:stable:dev:building:python)
* [R](#docs:stable:dev:building:r)
### Build Configuration {#docs:stable:dev:building:build_configuration}
#### Build Types {#docs:stable:dev:building:build_configuration::build-types}
DuckDB can be built with several different build types. Most of these correspond directly to CMake build types, but not all of them.
##### `release` {#docs:stable:dev:building:build_configuration::release}
This build is stripped of all assertions, debug symbols, and debug code, and is optimized for performance.
##### `debug` {#docs:stable:dev:building:build_configuration::debug}
This build runs with all the debug information, including symbols, assertions and `#ifdef DEBUG` blocks.
Due to these, binaries of this build are expected to be slow.
Note: the special debug defines are not automatically set for this build.
##### `relassert` {#docs:stable:dev:building:build_configuration::relassert}
This build does not trigger the `#ifdef DEBUG` code blocks, but it still has debug symbols that make it possible to step through the execution with line number information, and `D_ASSERT` lines are still checked in this build.
Binaries of this build mode are significantly faster than those of the `debug` mode.
##### `reldebug` {#docs:stable:dev:building:build_configuration::reldebug}
This build is similar to `relassert` in many ways, except that assertions are also stripped in this build.
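Each of the build types above is selected via the `Makefile`. Assuming the targets are named after the build types (as they are for `release` and `debug` in the Getting Started section), a build can be produced with, e.g.:
```bash
make relassert
GEN=ninja make reldebug
```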
##### `benchmark` {#docs:stable:dev:building:build_configuration::benchmark}
This build is a shorthand for `release` with `BUILD_BENCHMARK=1` set.
##### `tidy-check` {#docs:stable:dev:building:build_configuration::tidy-check}
This creates a build and then runs [Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/) to check for issues or style violations through static analysis.
The CI will also run this check, causing it to fail if this check fails.
##### `format-fix` | `format-changes` | `format-main` {#docs:stable:dev:building:build_configuration::format-fix--format-changes--format-main}
This doesn't actually create a build, but uses the following format checkers to check for style issues:
* [clang-format](https://clang.llvm.org/docs/ClangFormat.html) to fix format issues in the code.
* [cmake-format](https://cmake-format.readthedocs.io/en/latest/) to fix format issues in the `CMakeLists.txt` files.
The CI will also run this check, causing it to fail if this check fails.
#### Extension Selection {#docs:stable:dev:building:build_configuration::extension-selection}
[Core DuckDB extensions](#docs:stable:core_extensions:overview) are the ones maintained by the DuckDB team. These are hosted in the `duckdb` GitHub organization and are served by the `core` extension repository.
Core extensions can be built as part of DuckDB by setting the `CORE_EXTENSIONS` flag to the list of names of the extensions to be built.
```batch
CORE_EXTENSIONS='tpch;httpfs;fts;json;parquet' make
```
For more details, see [Building Extensions](#docs:stable:dev:building:building_extensions).
#### Package Flags {#docs:stable:dev:building:build_configuration::package-flags}
For every package that is maintained by core DuckDB, there exists a flag in the Makefile to enable building the package.
These can be enabled either by setting them in the current environment, e.g., through setup files like `.bashrc` or `.zshrc`, or by setting them before the call to `make`, for example:
```batch
BUILD_PYTHON=1 make debug
```
##### `BUILD_PYTHON` {#docs:stable:dev:building:build_configuration::build_python}
When this flag is set, the [Python](#docs:stable:clients:python:overview) package is built.
##### `BUILD_SHELL` {#docs:stable:dev:building:build_configuration::build_shell}
When this flag is set, the [CLI](#docs:stable:clients:cli:overview) is built. This is usually enabled by default.
##### `BUILD_BENCHMARK` {#docs:stable:dev:building:build_configuration::build_benchmark}
When this flag is set, DuckDB's in-house benchmark suite is built.
More information about this can be found [in the README](https://github.com/duckdb/duckdb/blob/main/benchmark/README.md).
##### `BUILD_JDBC` {#docs:stable:dev:building:build_configuration::build_jdbc}
When this flag is set, the [Java](#docs:stable:clients:java) package is built.
##### `BUILD_ODBC` {#docs:stable:dev:building:build_configuration::build_odbc}
When this flag is set, the [ODBC](#docs:stable:clients:odbc:overview) package is built.
#### Miscellaneous Flags {#docs:stable:dev:building:build_configuration::miscellaneous-flags}
##### `DISABLE_UNITY` {#docs:stable:dev:building:build_configuration::disable_unity}
To improve compilation time, we use [Unity Build](https://cmake.org/cmake/help/latest/prop_tgt/UNITY_BUILD.html) to combine translation units.
However, this can hide include bugs. This flag disables the unity build so that these errors can be detected.
##### `DISABLE_SANITIZER` {#docs:stable:dev:building:build_configuration::disable_sanitizer}
In some situations, running an executable that has been built with sanitizers enabled is not supported or can cause problems; Julia is an example of this.
With this flag enabled, the sanitizers are disabled for the build.
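Like the package flags, these can presumably be passed as environment variables to `make`; a brief sketch (combining them with a debug build is only an example):
```bash
DISABLE_UNITY=1 DISABLE_SANITIZER=1 make debug
```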
#### Overriding Git Hash and Version {#docs:stable:dev:building:build_configuration::overriding-git-hash-and-version}
It is possible to override the Git hash and version when building from source using the `OVERRIDE_GIT_DESCRIBE` environment variable.
This is useful when building from sources that are not part of a complete Git repository (e.g., an archive file with no information on commit hashes and tags).
For example:
```batch
OVERRIDE_GIT_DESCRIBE=v0.10.0-843-g09ea97d0a9 GEN=ninja make
```
This will result in the following output when running `./build/release/duckdb`:
```text
v0.10.1-dev843 09ea97d0a9
...
```
### Building Extensions {#docs:stable:dev:building:building_extensions}
[Extensions](#docs:stable:extensions:overview) can be built from source and installed from the resulting local binary.
#### Building Extensions {#docs:stable:dev:building:building_extensions::building-extensions}
To build using extension flags, set the `CORE_EXTENSIONS` flag to the list of extensions that you want to be built. For example:
```bash
CORE_EXTENSIONS='autocomplete;httpfs;icu;json;tpch' GEN=ninja make
```
This option also accepts out-of-tree extensions such as [`delta`](#docs:stable:core_extensions:delta):
```bash
CORE_EXTENSIONS='autocomplete;httpfs;icu;json;tpch;delta' GEN=ninja make
```
In most cases, extensions will be directly linked into the resulting DuckDB executable.
#### Special Extension Flags {#docs:stable:dev:building:building_extensions::special-extension-flags}
##### `BUILD_JEMALLOC` {#docs:stable:dev:building:building_extensions::build_jemalloc}
When this flag is set, the [`jemalloc` extension](#docs:stable:core_extensions:jemalloc) is built.
##### `BUILD_TPCE` {#docs:stable:dev:building:building_extensions::build_tpce}
When this flag is set, the [TPC-E](https://www.tpc.org/tpce/) library is built. Unlike TPC-H and TPC-DS, this is not a proper extension and is not distributed as such. Enabling this allows TPC-E-enabled queries to run through our test suite.
#### Debug Flags {#docs:stable:dev:building:building_extensions::debug-flags}
##### `CRASH_ON_ASSERT` {#docs:stable:dev:building:building_extensions::crash_on_assert}
`D_ASSERT(condition)` is used throughout the code; these assertions throw an `InternalException` in debug builds.
With this flag enabled, a triggered assertion will instead directly cause a crash.
##### `DISABLE_STRING_INLINE` {#docs:stable:dev:building:building_extensions::disable_string_inline}
In our execution format, `string_t` has the ability to “inline” strings that are under a certain length (12 bytes), meaning they don't require a separate allocation.
When this flag is set, this behavior is disabled and small strings are not inlined.
##### `DISABLE_MEMORY_SAFETY` {#docs:stable:dev:building:building_extensions::disable_memory_safety}
Our data structures that are used extensively throughout the non-performance-critical code have extra checks to ensure memory safety. These checks include:
* Making sure `nullptr` is never dereferenced.
* Making sure index out of bounds accesses don't trigger a crash.
With this flag enabled, these checks are removed; this is mostly done to verify that the performance hit of these checks is negligible.
##### `DESTROY_UNPINNED_BLOCKS` {#docs:stable:dev:building:building_extensions::destroy_unpinned_blocks}
With this flag enabled, previously pinned blocks in the BufferManager are destroyed instantly when they are unpinned, to make sure there are no situations where this memory is still being used despite not being pinned.
##### `DEBUG_STACKTRACE` {#docs:stable:dev:building:building_extensions::debug_stacktrace}
When a crash or assertion hit occurs in a test, print a stack trace.
This is useful when debugging a crash that is hard to pinpoint with a debugger attached.
#### Using a CMake Configuration File {#docs:stable:dev:building:building_extensions::using-a-cmake-configuration-file}
To build using a CMake configuration file, create an extension configuration file named `extension_config.cmake` with e.g., the following content:
```cmake
duckdb_extension_load(autocomplete)
duckdb_extension_load(fts)
duckdb_extension_load(inet)
duckdb_extension_load(icu)
duckdb_extension_load(json)
duckdb_extension_load(parquet)
```
Build DuckDB as follows:
```bash
GEN=ninja EXTENSION_CONFIGS="extension_config.cmake" make
```
Then, to install the extensions in one go, run:
```bash
# for release builds
cd build/release/extension/
# for debug builds
cd build/debug/extension/
# install extensions
for EXTENSION in *; do
    ../duckdb -c "INSTALL '${EXTENSION}/${EXTENSION}.duckdb_extension';"
done
```
### Android {#docs:stable:dev:building:android}
DuckDB has experimental support for Android. Please use the latest `main` branch of DuckDB instead of the stable versions.
#### Building the DuckDB Library Using the Android NDK {#docs:stable:dev:building:android::building-the-duckdb-library-using-the-android-ndk}
We provide build instructions for setups using macOS and Android Studio. For other setups, please adjust the steps accordingly.
1. Open [Android Studio](https://developer.android.com/studio).
Select the **Tools** menu and pick **SDK Manager**.
Select the SDK Tools tab and tick the **NDK (Side by side)** option.
Click **OK** to install.
1. Set the Android NDK's location. For example:
```bash
ANDROID_NDK=~/Library/Android/sdk/ndk/28.0.12433566/
```
1. Set the [Android ABI](https://developer.android.com/ndk/guides/abis). For example:
```bash
ANDROID_ABI=arm64-v8a
```
Or:
```bash
ANDROID_ABI=x86_64
```
1. If you would like to use the [Ninja build system](#docs:stable:dev:building:overview::prerequisites), make sure it is installed and available on the `PATH`.
1. Set the list of DuckDB extensions to build. These will be statically linked in the binary. For example:
```bash
DUCKDB_EXTENSIONS="icu;json;parquet"
```
1. Navigate to DuckDB's directory and run the build as follows:
```bash
PLATFORM_NAME="android_${ANDROID_ABI}"
BUILDDIR=./build/${PLATFORM_NAME}
mkdir -p ${BUILDDIR}
cd ${BUILDDIR}
cmake \
-G "Ninja" \
-DEXTENSION_STATIC_BUILD=1 \
-DDUCKDB_EXTRA_LINK_FLAGS="-llog" \
-DBUILD_EXTENSIONS=${DUCKDB_EXTENSIONS} \
-DENABLE_EXTENSION_AUTOLOADING=1 \
-DENABLE_EXTENSION_AUTOINSTALL=1 \
-DCMAKE_VERBOSE_MAKEFILE=on \
-DANDROID_PLATFORM=${ANDROID_PLATFORM} \
-DLOCAL_EXTENSION_REPO="" \
-DOVERRIDE_GIT_DESCRIBE="" \
-DDUCKDB_EXPLICIT_PLATFORM=${PLATFORM_NAME} \
-DBUILD_UNITTESTS=0 \
-DBUILD_SHELL=1 \
-DANDROID_ABI=${ANDROID_ABI} \
-DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
-DCMAKE_BUILD_TYPE=Release ../..
cmake \
--build . \
--config Release
```
1. For the `arm64-v8a` ABI, the build will produce the `build/android_arm64-v8a/duckdb` and `build/android_arm64-v8a/src/libduckdb.so` binaries.
#### Building the CLI in Termux {#docs:stable:dev:building:android::building-the-cli-in-termux}
1. To build the [command line client](#docs:stable:clients:cli:overview) in the [Termux application](https://termux.dev/), install the following packages:
```bash
pkg install -y git ninja clang cmake python3
```
1. Set the list of DuckDB extensions to build. These will be statically linked in the binary. For example:
```bash
DUCKDB_EXTENSIONS="icu;json"
```
1. Build DuckDB as follows:
```bash
mkdir build
cd build
export LDFLAGS="-llog"
cmake \
-G "Ninja" \
-DBUILD_EXTENSIONS="${DUCKDB_EXTENSIONS}" \
-DDUCKDB_EXPLICIT_PLATFORM=linux_arm64_android \
-DCMAKE_BUILD_TYPE=Release \
..
cmake --build . --config Release
```
Note that you can also use the Python client on Termux:
```bash
pip install --pre --upgrade duckdb
```
#### Troubleshooting {#docs:stable:dev:building:android::troubleshooting}
##### Log Library Is Missing {#docs:stable:dev:building:android::log-library-is-missing}
**Problem:**
The build throws the following error:
```console
ld.lld: error: undefined symbol: __android_log_write
```
**Solution:**
Make sure the log library is linked:
```bash
export LDFLAGS="-llog"
```
### Linux {#docs:stable:dev:building:linux}
#### Prerequisites {#docs:stable:dev:building:linux::prerequisites}
On Linux, install the required packages with the package manager of your distribution.
##### Ubuntu and Debian {#docs:stable:dev:building:linux::ubuntu-and-debian}
###### CLI Client {#docs:stable:dev:building:linux::cli-client}
On Ubuntu and Debian (and also MX Linux, Linux Mint, etc.), the requirements for building the DuckDB CLI client are the following:
```bash
sudo apt-get update
sudo apt-get install -y git g++ cmake ninja-build libssl-dev
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja make
```
##### Fedora, CentOS and Red Hat {#docs:stable:dev:building:linux::fedora-centos-and-red-hat}
###### CLI Client {#docs:stable:dev:building:linux::cli-client}
The requirements for building the DuckDB CLI client on Fedora, CentOS, Red Hat, AlmaLinux, Rocky Linux, etc. are the following:
```bash
sudo yum install -y git g++ cmake ninja-build openssl-devel
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja make
```
Note that on older Red Hat-based distributions, you may have to change the package name for `g++` to `gcc-c++`,
skip Ninja and manually configure the number of Make jobs:
```bash
sudo yum install -y git gcc-c++ cmake openssl-devel
git clone https://github.com/duckdb/duckdb
cd duckdb
mkdir build
cd build
cmake ..
make -j`nproc`
```
##### Arch, Omarchy and Manjaro {#docs:stable:dev:building:linux::arch-omarchy-and-manjaro}
###### CLI Client {#docs:stable:dev:building:linux::cli-client}
The requirements for building the DuckDB CLI client on Arch, Omarchy, Manjaro, etc. are the following:
```bash
sudo pacman -S git gcc cmake ninja openssl
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja make
```
DuckDB is also [available in AUR](https://aur.archlinux.org/packages/duckdb).
To install it, run:
```bash
yay -S duckdb
```
##### Alpine Linux {#docs:stable:dev:building:linux::alpine-linux}
###### CLI Client {#docs:stable:dev:building:linux::cli-client}
The requirements for building the DuckDB CLI client on Alpine Linux are the following:
```bash
apk add g++ git make cmake ninja
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja make
```
###### Performance with musl libc {#docs:stable:dev:building:linux::performance-with-musl-libc}
Note that Alpine Linux uses [musl libc](https://musl.libc.org/) as its C standard library.
DuckDB binaries built with musl libc have lower performance compared to the glibc variants: for some workloads, the slowdown can be more than 5×.
Therefore, it's recommended to use glibc for performance-oriented workloads.
###### Distribution for the `linux_*_musl` Platforms {#docs:stable:dev:building:linux::distribution-for-the-linux__musl-platforms}
Starting with DuckDB v1.2.0, [_DuckDB extensions_ are distributed for the `linux_amd64_musl` platform](https://duckdb.org/2025/02/05/announcing-duckdb-120#musl-extensions) (but not yet for the `linux_arm64_musl` platform).
However, no official _DuckDB binaries_ are distributed for musl libc, but DuckDB can be built with it manually following the instructions on this page.
###### Python Client on Alpine Linux {#docs:stable:dev:building:linux::python-client-on-alpine-linux}
Currently, installing the DuckDB Python client on Alpine Linux requires compilation from source.
To do so, install the required packages before running `pip`:
```bash
apk add g++ py3-pip python3-dev
pip install duckdb
```
#### Using the DuckDB CLI Client on Linux {#docs:stable:dev:building:linux::using-the-duckdb-cli-client-on-linux}
Once the build finishes successfully, you can find the `duckdb` binary in the `build` directory:
```bash
build/release/duckdb
```
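As a quick smoke test, you can run a single query directly with the freshly built CLI binary and exit (the query is only an example):
```bash
build/release/duckdb -c "SELECT version();"
```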
For different build configurations (`debug`, `relassert`, etc.), please consult the [“Build Configuration” page](#docs:stable:dev:building:build_configuration).
#### Building Using Extension Flags {#docs:stable:dev:building:linux::building-using-extension-flags}
To build using extension flags, set the `CORE_EXTENSIONS` flag to the list of extensions that you want to be build. For example:
```bash
CORE_EXTENSIONS='autocomplete;httpfs;icu;json;tpch' GEN=ninja make
```
#### Troubleshooting {#docs:stable:dev:building:linux::troubleshooting}
##### R Package on Linux AArch64: `too many GOT entries` Build Error {#docs:stable:dev:building:linux::r-package-on-linux-aarch64-too-many-got-entries-build-error}
**Problem:**
Building the R package on Linux running on an ARM64 architecture (AArch64) may result in the following error message:
```console
/usr/bin/ld: /usr/include/c++/10/bits/basic_string.tcc:206:
warning: too many GOT entries for -fpic, please recompile with -fPIC
```
**Solution:**
Create or edit the `~/.R/Makevars` file. This example also contains the [`MAKEFLAGS` setting to parallelize the build](#docs:stable:dev:building:r::parallelizing-the-build):
```ini
ALL_CXXFLAGS = $(PKG_CXXFLAGS) -fPIC $(SHLIB_CXXFLAGS) $(CXXFLAGS)
MAKEFLAGS = -j$(nproc)
```
##### Building the httpfs Extension Fails {#docs:stable:dev:building:linux::building-the-httpfs-extension-fails}
**Problem:**
When building the [`httpfs` extension](#docs:stable:core_extensions:httpfs:overview) on Linux, the build may fail with the following error.
```console
CMake Error at /usr/share/cmake-3.22/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY
OPENSSL_INCLUDE_DIR)
```
**Solution:**
Install the `libssl-dev` library.
```bash
sudo apt-get install -y libssl-dev
```
Then, build with:
```bash
GEN=ninja CORE_EXTENSIONS="httpfs" make
```
### macOS {#docs:stable:dev:building:macos}
#### Prerequisites {#docs:stable:dev:building:macos::prerequisites}
Install Xcode and [Homebrew](https://brew.sh/). Then, install the required packages with:
```bash
brew install git cmake ninja
```
#### Building DuckDB {#docs:stable:dev:building:macos::building-duckdb}
Clone and build DuckDB as follows.
```bash
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja make
```
Once the build finishes successfully, you can find the `duckdb` binary in the `build` directory:
```bash
build/release/duckdb
```
For different build configurations (`debug`, `relassert`, etc.), please consult the [Build Configuration page](#docs:stable:dev:building:build_configuration).
#### Troubleshooting {#docs:stable:dev:building:macos::troubleshooting}
##### Build Failure: `'string' file not found` {#docs:stable:dev:building:macos::build-failure-string-file-not-found}
**Problem:**
The build fails on macOS with following error:
```console
FAILED: third_party/libpg_query/CMakeFiles/duckdb_pg_query.dir/src_backend_nodes_list.cpp.o
/Library/Developer/CommandLineTools/usr/bin/c++ -DDUCKDB_BUILD_LIBRARY -DEXT_VERSION_PARQUET=\"9cba6a2a03\" -I/Users/builder/external/duckdb/src/include -I/Users/builder/external/duckdb/third_party/fsst -I/Users/builder/external/duckdb/third_party/fmt/include -I/Users/builder/external/duckdb/third_party/hyperloglog -I/Users/builder/external/duckdb/third_party/fastpforlib -I/Users/builder/external/duckdb/third_party/skiplist -I/Users/builder/external/duckdb/third_party/fast_float -I/Users/builder/external/duckdb/third_party/re2 -I/Users/builder/external/duckdb/third_party/miniz -I/Users/builder/external/duckdb/third_party/utf8proc/include -I/Users/builder/external/duckdb/third_party/concurrentqueue -I/Users/builder/external/duckdb/third_party/pcg -I/Users/builder/external/duckdb/third_party/tdigest -I/Users/builder/external/duckdb/third_party/mbedtls/include -I/Users/builder/external/duckdb/third_party/jaro_winkler -I/Users/builder/external/duckdb/third_party/yyjson/include -I/Users/builder/external/duckdb/third_party/libpg_query/include -O3 -DNDEBUG -O3 -DNDEBUG -std=c++11 -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX15.1.sdk -fPIC -fvisibility=hidden -fcolor-diagnostics -w -MD -MT third_party/libpg_query/CMakeFiles/duckdb_pg_query.dir/src_backend_nodes_list.cpp.o -MF third_party/libpg_query/CMakeFiles/duckdb_pg_query.dir/src_backend_nodes_list.cpp.o.d -o third_party/libpg_query/CMakeFiles/duckdb_pg_query.dir/src_backend_nodes_list.cpp.o -c /Users/builder/external/duckdb/third_party/libpg_query/src_backend_nodes_list.cpp
In file included from /Users/builder/external/duckdb/third_party/libpg_query/src_backend_nodes_list.cpp:35:
/Users/builder/external/duckdb/third_party/libpg_query/include/pg_functions.hpp:4:10: fatal error: 'string' file not found
4 | #include <string>
```
**Solution:**
Users report that reinstalling Xcode fixed their problem.
See related discussions on the [DuckDB GitHub issues](https://github.com/duckdb/duckdb/issues/14665#issuecomment-2452679953) and on [Stack Overflow](https://stackoverflow.com/questions/78999694/cant-compile-c-hello-world-with-clang-on-mac-sequoia-15-0-and-vs-code).
> **Warning.** Attempting to reinstall your Xcode suite may impact other applications on your system. Proceed with caution.
```bash
sudo rm -rf /Library/Developer/CommandLineTools
xcode-select --install
```
##### Debug Build Prints malloc Warning {#docs:stable:dev:building:macos::debug-build-prints-malloc-warning}
**Problem:**
The `debug` build on macOS prints a `malloc` warning, e.g.:
```text
duckdb(83082,0x205b30240) malloc: nano zone abandoned due to inability to reserve vm space.
```
**Solution:**
To prevent this, set the `MallocNanoZone` flag to 0:
```bash
MallocNanoZone=0 make debug
```
To apply this change for your future terminal sessions, you can add the following to your `~/.zshrc` file:
```bash
export MallocNanoZone=0
```
### Raspberry Pi {#docs:stable:dev:building:raspberry_pi}
DuckDB is not officially distributed for the Raspberry Pi OS (previously called Raspbian).
You can build it following the instructions on this page.
#### Raspberry Pi (64-bit) {#docs:stable:dev:building:raspberry_pi::raspberry-pi-64-bit}
First, install the required build packages:
```bash
sudo apt-get update
sudo apt-get install -y git g++ cmake ninja-build
```
Then, clone and build it as follows:
```bash
git clone https://github.com/duckdb/duckdb
cd duckdb
GEN=ninja CORE_EXTENSIONS="icu;json" make
```
Finally, run it:
```bash
build/release/duckdb
```
#### Raspberry Pi (32-bit) {#docs:stable:dev:building:raspberry_pi::raspberry-pi-32-bit}
On 32-bit Raspberry Pi boards, you need to add the [`-latomic` link flag](https://github.com/duckdb/duckdb/issues/13855#issuecomment-2341539339).
As extensions are not distributed for this platform, it's recommended to also include them in the build.
For example:
```bash
mkdir build
cd build
cmake .. \
-DCORE_EXTENSIONS="httpfs;json;parquet" \
-DDUCKDB_EXTRA_LINK_FLAGS="-latomic"
make -j4
```
### Windows {#docs:stable:dev:building:windows}
On Windows, DuckDB requires the [Microsoft Visual C++ Redistributable package](https://learn.microsoft.com/en-US/cpp/windows/latest-supported-vc-redist) both as a build-time and runtime dependency. Note that unlike the build process on UNIX-like systems, the Windows builds directly call CMake.
#### Visual Studio {#docs:stable:dev:building:windows::visual-studio}
To build DuckDB on Windows, we recommend using the Visual Studio compiler.
To use it, follow the instructions in the [CI workflow](https://github.com/duckdb/duckdb/blob/52b43b166091c82b3f04bf8af15f0ace18207a64/.github/workflows/Windows.yml#L73):
```bash
python scripts/windows_ci.py
cmake \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_GENERATOR_PLATFORM=x64 \
-DENABLE_EXTENSION_AUTOLOADING=1 \
-DENABLE_EXTENSION_AUTOINSTALL=1 \
-DDUCKDB_EXTENSION_CONFIGS="${GITHUB_WORKSPACE}/.github/config/bundled_extensions.cmake" \
-DDISABLE_UNITY=1 \
-DOVERRIDE_GIT_DESCRIBE="$OVERRIDE_GIT_DESCRIBE"
cmake --build . --config Release --parallel
```
#### MSYS2 and MinGW64 {#docs:stable:dev:building:windows::msys2-and-mingw64}
DuckDB on Windows can also be built with [MSYS2](https://www.msys2.org/) and [MinGW64](https://www.mingw-w64.org/).
Note that this build is only supported for compatibility reasons and should only be used if the Visual Studio build is not feasible on a given platform.
To build DuckDB with MinGW64, install the required dependencies using Pacman.
When prompted with `Enter a selection (default=all)`, select the default option by pressing `Enter`.
```bash
pacman -Syu git mingw-w64-x86_64-toolchain mingw-w64-x86_64-cmake mingw-w64-x86_64-ninja
git clone https://github.com/duckdb/duckdb
cd duckdb
cmake -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DBUILD_EXTENSIONS="icu;parquet;json"
cmake --build . --config Release
```
Once the build finishes successfully, you can find the `duckdb.exe` binary in the repository's directory:
```bash
./duckdb.exe
```
#### Building the Go Client {#docs:stable:dev:building:windows::building-the-go-client}
Building on Windows may result in the following error:
```batch
go build
```
```console
collect2.exe: error: ld returned 5 exit status
```
GitHub user [vdmitriyev](https://github.com/vdmitriyev) shared instructions for [building the DuckDB Go client on Windows](https://github.com/marcboeker/go-duckdb/issues/4#issuecomment-2176409066):
1. Get the four files (`.dll`, `.lib`, `.hpp`, `.h`) from the `libduckdb-windows-amd64.zip` archive.
2. Place them in a directory, e.g., `C:\duckdb-go\libs\`.
3. Install the dependencies following the [`duckdb-go` project](https://github.com/duckdb/duckdb-go).
4. Build your project using the following instructions:
```bash
set PATH=C:\duckdb-go\libs\;%PATH%
set CGO_CFLAGS=-IC:\duckdb-go\libs\
set CGO_LDFLAGS=-LC:\duckdb-go\libs\ -lduckdb
go build
```
### Python {#docs:stable:dev:building:python}
The DuckDB Python package has its own repository at [duckdb/duckdb-python](https://github.com/duckdb/duckdb-python) and uses [pybind11](https://pybind11.readthedocs.io/en/stable/) to create Python bindings with DuckDB.
#### Prerequisites {#docs:stable:dev:building:python::prerequisites}
This guide assumes:
1. You have a working copy of the DuckDB Python package source (including git submodules and tags)
2. You have [Astral UV](https://docs.astral.sh/uv/) version >= 0.8.0 installed
3. You run commands from the root of the duckdb-python source
We are opinionated about using **Astral UV** for Python environment and dependency management. While using pip for a development environment with an editable install without build isolation is possible, we don't provide guidance for that approach in this guide.
We use **CLion** as our IDE. This guide doesn't include specific instructions for other IDEs, but the setup should be similar.
##### 1. DuckDB Python Repository {#docs:stable:dev:building:python::1-duckdb-python-repository}
Start by [forking duckdb-python](https://github.com/duckdb/duckdb-python/fork) into a personal repository, then clone your fork:
```bash
git clone --recurse-submodules [YOUR_FORK_URL]
cd duckdb-python
git remote add upstream https://github.com/duckdb/duckdb-python.git
git fetch --all
```
If you've already cloned without submodules:
```bash
git submodule update --init --recursive
git remote add upstream https://github.com/duckdb/duckdb-python.git
git fetch --all
```
**Important notes:**
- DuckDB is vendored as a git submodule and must be initialized
- DuckDB version determination depends on local availability of git tags
- If switching between branches with different submodule refs, add the git hooks:
```bash
git config --local core.hooksPath .githooks/
```
##### 2. Install Astral uv {#docs:stable:dev:building:python::2-install-astral-uv}
[Install uv](https://docs.astral.sh/uv/getting-started/installation/) version >= 0.8.0.
#### Development Environment Setup {#docs:stable:dev:building:python::development-environment-setup}
##### 1. Platform-Specific Setup {#docs:stable:dev:building:python::1-platform-specific-setup}
**All Platforms:**
- Python 3.9+ supported
- uv >= 0.8.0 required
- CMake and Ninja (installed via UV)
- C++ compiler toolchain
**Linux (Ubuntu 24.04):**
```bash
sudo apt-get update
sudo apt-get install ccache
```
**macOS:**
```bash
# Xcode command line tools
xcode-select --install
```
**Windows:**
- Visual Studio 2019+ with C++ support
- Git for Windows
##### 2. Install Dependencies and Build {#docs:stable:dev:building:python::2-install-dependencies-and-build}
Set up the development environment in two steps:
```bash
# Install all development dependencies without building the project
uv sync --no-install-project
# Build and install the project without build isolation
uv sync --no-build-isolation
```
**Why two steps?**
- `uv sync` performs editable installs by default with scikit-build-core using a persistent build-dir
- The build happens in an isolated, ephemeral environment where cmake's paths point to non-existing directories
- Installing dependencies first, then building without isolation ensures proper cmake integration
##### 3. Enable Pre-Commit Hooks {#docs:stable:dev:building:python::3-enable-pre-commit-hooks}
We run a number of linting, formatting, and type-checking checks in CI. You can run all of these manually, but to make your life easier, you can install the exact same checks we run in CI as git hooks with pre-commit, which is already installed as part of the dev dependencies:
```bash
uvx pre-commit install
```
This will run all required checks before letting your commit pass.
You can also install a post-checkout hook that always runs `git submodule update --init --recursive`. When you change branches between main and a bugfix branch, this makes sure the duckdb submodule is always correctly initialized:
```bash
uvx pre-commit install --hook-type post-checkout
```
##### 4. Verify Installation {#docs:stable:dev:building:python::4-verify-installation}
```bash
uv run python -c "import duckdb; print(duckdb.sql('SELECT 42').fetchall())"
```
#### Development Workflow {#docs:stable:dev:building:python::development-workflow}
##### Running Tests {#docs:stable:dev:building:python::running-tests}
Run all tests:
```bash
uv run --no-build-isolation pytest ./tests --verbose
```
Run fast tests only (excludes slow directory):
```bash
uv run --no-build-isolation pytest ./tests --verbose --ignore=./tests/slow
```
##### Test Coverage {#docs:stable:dev:building:python::test-coverage}
Run with coverage (compiles extension with `--coverage` for C++ coverage):
```bash
COVERAGE=1 uv run --no-build-isolation coverage run -m pytest ./tests --verbose
```
Check Python coverage:
```bash
uv run coverage html -d htmlcov-python
uv run coverage report --format=markdown
```
Check C++ coverage:
```bash
uv run gcovr \
--gcov-ignore-errors all \
--root "$PWD" \
--filter "${PWD}/src/duckdb_py" \
--exclude '.*/\.cache/.*' \
--gcov-exclude '.*/\.cache/.*' \
--gcov-exclude '.*/external/.*' \
--gcov-exclude '.*/site-packages/.*' \
--exclude-unreachable-branches \
--exclude-throw-branches \
--html --html-details -o coverage-cpp.html \
build/coverage/src/duckdb_py \
--print-summary
```
##### Building Wheels {#docs:stable:dev:building:python::building-wheels}
Build wheel for your system:
```bash
uv build
```
Build for specific Python version:
```bash
uv build -p 3.9
```
##### Cleaning Build Artifacts {#docs:stable:dev:building:python::cleaning-build-artifacts}
```bash
uv cache clean
rm -rf build .venv uv.lock
```
#### IDE Setup (CLion) {#docs:stable:dev:building:python::ide-setup-clion}
For CLion users, the project can be configured for C++ debugging of the Python extension:
##### CMake Profile Configuration {#docs:stable:dev:building:python::cmake-profile-configuration}
In **Settings** → **Build, Execution, Deployment** → **CMake**, create a Debug profile:
- **Name**: Debug
- **Build type**: Debug
- **Generator**: Ninja
- **CMake Options**:
```
-DCMAKE_PREFIX_PATH=$CMakeProjectDir$/.venv;$CMAKE_PREFIX_PATH
```
##### Python Debug Configuration {#docs:stable:dev:building:python::python-debug-configuration--}
Create a **CMake Application** run configuration:
- **Name**: Python Debug
- **Target**: `All targets`
- **Executable**: `[PROJECT_DIR]/.venv/bin/python3`
- **Program arguments**: `$FilePath$`
- **Working directory**: `$ProjectFileDir$`
This allows setting C++ breakpoints and debugging Python scripts that use the DuckDB extension.
#### Debugging {#docs:stable:dev:building:python::debugging}
##### Command Line Debugging {#docs:stable:dev:building:python::command-line-debugging}
Set breakpoints and debug with lldb:
```bash
# Example Python script (test.py)
# import duckdb
# print(duckdb.sql("select * from range(1000)").df())
lldb -- .venv/bin/python3 test.py
```
In lldb:
```bash
# Set breakpoint (library loads when imported)
(lldb) br s -n duckdb::DuckDBPyRelation::FetchDF
(lldb) r
```
#### Cross-Platform Testing {#docs:stable:dev:building:python::cross-platform-testing}
You can run the packaging workflow manually on your fork for any branch, choosing platforms and test suites via the GitHub Actions web interface.
#### Troubleshooting {#docs:stable:dev:building:python::troubleshooting}
##### Build Issues {#docs:stable:dev:building:python::build-issues}
**Missing git tags**: If you forked DuckDB Python, ensure you have the upstream tags:
```bash
git remote add upstream https://github.com/duckdb/duckdb-python.git
git fetch --tags upstream
git push --tags
```
##### Platform-Specific Issues {#docs:stable:dev:building:python::platform-specific-issues}
**Windows compilation**: Ensure you have Visual Studio 2019+ with C++ support installed.
### R {#docs:stable:dev:building:r}
This page contains instructions for building the R client library.
#### Parallelizing the Build {#docs:stable:dev:building:r::parallelizing-the-build}
**Problem:**
By default, R compiles packages using a single thread, which causes the build to be slow.
**Solution:**
To parallelize the compilation, create or edit the `~/.R/Makevars` file, and add a line like the following:
```ini
MAKEFLAGS = -j8
```
The above will parallelize the compilation using 8 threads. On Linux/macOS, you can add the following to use all of the machine's threads:
```ini
MAKEFLAGS = -j$(nproc)
```
However, note that the more threads are used, the higher the RAM consumption. If the system runs out of RAM while compiling, the R session will crash.
### Troubleshooting {#docs:stable:dev:building:troubleshooting}
This page contains solutions to common problems reported by users. If you have platform-specific issues, make sure you also consult the troubleshooting guide for your platform such as the one for [Linux builds](#docs:stable:dev:building:linux::troubleshooting).
#### The Build Runs Out of Memory {#docs:stable:dev:building:troubleshooting::the-build-runs-out-of-memory}
**Problem:**
Ninja parallelizes the build, which can cause out-of-memory issues on systems with limited resources.
These issues have been reported to occur particularly often on Alpine Linux.
**Solution:**
Avoid using Ninja by setting the Makefile generator to empty via `GEN=`:
```bash
GEN= make
```
### Unofficial and Unsupported Platforms {#docs:stable:dev:building:unofficial_and_unsupported_platforms}
> **Warning.**
> The platforms listed on this page are not officially supported.
> The build instructions are provided on a best-effort basis.
> Community contributions are very welcome.
DuckDB is built and distributed for several platforms with [different levels of support](#docs:stable:dev:building:overview).
DuckDB _can be built_ for other platforms with varying levels of success.
This page provides an overview of these platforms to clarify which ones can be expected to work.
#### 32-bit Architectures {#docs:stable:dev:building:unofficial_and_unsupported_platforms::32-bit-architectures}
[32-bit architectures](https://en.wikipedia.org/wiki/32-bit_computing) are officially not supported but it is possible to build DuckDB manually for some of these platforms.
For example, see the build instructions for [32-bit Raspberry Pi boards](#docs:stable:dev:building:raspberry_pi::raspberry-pi-32-bit).
Note that 32-bit platforms are limited to using 4 GiB RAM due to the amount of addressable memory.
#### Big-Endian Architectures {#docs:stable:dev:building:unofficial_and_unsupported_platforms::big-endian-architectures}
[Big-endian architectures](https://en.wikipedia.org/wiki/Endianness) (such as PowerPC) are [not supported](https://duckdblabs.com/community_support_policy#architectures) by DuckDB.
While DuckDB can likely be built on such architectures,
the resulting binary may exhibit [correctness](https://github.com/duckdb/duckdb/issues/5548) [errors](https://github.com/duckdb/duckdb/issues/9714) on certain operations.
Therefore, its use is not recommended.
#### RISC-V Architectures {#docs:stable:dev:building:unofficial_and_unsupported_platforms::risc-v-architectures}
The user [“LivingLinux” on Bluesky](https://bsky.app/profile/livinglinux.bsky.social) managed to [build DuckDB](https://bsky.app/profile/livinglinux.bsky.social/post/3lak5q7mmg42j) for a [RISC-V](https://en.wikipedia.org/wiki/RISC-V) profile and [published a video about it](https://www.youtube.com/watch?v=G6uVDH3kvNQ). The instructions to build DuckDB, including the `fts` extension, are the following:
```bash
GEN=ninja \
CC='gcc-14 -march=rv64gcv_zicsr_zifencei_zihintpause_zvl256b' \
CXX='g++-14 -march=rv64gcv_zicsr_zifencei_zihintpause_zvl256b' \
CORE_EXTENSIONS='fts' \
make
```
For those who do not have a RISC-V development environment, you can cross-compile DuckDB using the latest [g++-riscv64-linux-gnu](https://github.com/riscv-collab/riscv-gnu-toolchain):
```bash
GEN=ninja \
CC='riscv64-linux-gnu-gcc -march=rv64gcv_zicsr_zifencei_zihintpause_zvl256b' \
CXX='riscv64-linux-gnu-g++ -march=rv64gcv_zicsr_zifencei_zihintpause_zvl256b' \
make
```
For more reference information on cross-compiling DuckDB for RISC-V, see [mocusez/duckdb-riscv-ci](https://github.com/mocusez/duckdb-riscv-ci) and [DuckDB pull request #16549](https://github.com/duckdb/duckdb/pull/16549).
## Benchmark Suite {#docs:stable:dev:benchmark}
DuckDB has an extensive benchmark suite.
When making changes that have potential performance implications, it is important to run these benchmarks to detect potential performance regressions.
#### Getting Started {#docs:stable:dev:benchmark::getting-started}
To build the benchmark suite, run the following command in the [DuckDB repository](https://github.com/duckdb/duckdb):
```batch
BUILD_BENCHMARK=1 CORE_EXTENSIONS='tpch' make
```
#### Listing Benchmarks {#docs:stable:dev:benchmark::listing-benchmarks}
To list all available benchmarks, run:
```batch
build/release/benchmark/benchmark_runner --list
```
#### Running Benchmarks {#docs:stable:dev:benchmark::running-benchmarks}
##### Running a Single Benchmark {#docs:stable:dev:benchmark::running-a-single-benchmark}
To run a single benchmark, issue the following command:
```batch
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark
```
The output will be printed to `stdout` in CSV format, as follows:
```text
name run timing
benchmark/micro/nulls/no_nulls_addition.benchmark 1 0.121234
benchmark/micro/nulls/no_nulls_addition.benchmark 2 0.121702
benchmark/micro/nulls/no_nulls_addition.benchmark 3 0.122948
benchmark/micro/nulls/no_nulls_addition.benchmark 4 0.122534
benchmark/micro/nulls/no_nulls_addition.benchmark 5 0.124102
```
You can also specify an output file using the `--out` flag. This will write only the timings (delimited by newlines) to that file.
```batch
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark --out=timings.out
```
The output will contain the following:
```text
0.182472
0.185027
0.184163
0.185281
0.182948
```
##### Running Multiple Benchmarks Using a Regular Expression {#docs:stable:dev:benchmark::running-multiple-benchmark-using-a-regular-expression}
You can also use a regular expression to specify which benchmarks to run.
Be careful of shell expansion of certain regex characters (e.g., `*` will likely be expanded by your shell, hence this requires proper quoting or escaping).
```batch
build/release/benchmark/benchmark_runner "benchmark/micro/nulls/.*"
```
###### Running All Benchmarks {#docs:stable:dev:benchmark::running-all-benchmarks}
Not specifying any argument will run all benchmarks.
```batch
build/release/benchmark/benchmark_runner
```
###### Other Options {#docs:stable:dev:benchmark::other-options}
The `--info` flag gives you some other information about the benchmark.
```batch
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark --info
```
```text
display_name:NULL Addition (no nulls)
group:micro
subgroup:nulls
```
The `--query` flag will print the query that is run by the benchmark.
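For example, running:
```batch
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark --query
```
prints the following query: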
```sql
SELECT min(i + 1) FROM integers;
```
The `--profile` flag will output a query tree.
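For example, to profile the benchmark used above:
```batch
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark --profile
```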
## Testing {#dev:sqllogictest}
### Overview {#docs:stable:dev:sqllogictest:overview}
#### How is DuckDB Tested? {#docs:stable:dev:sqllogictest:overview::how-is-duckdb-tested}
Testing is vital to make sure that DuckDB works properly and keeps working properly. For that reason, we put a large emphasis on thorough and frequent testing:
* We run a batch of small tests on every commit using [GitHub Actions](https://github.com/duckdb/duckdb/actions), and run a more exhaustive batch of tests on pull requests and commits in the `main` branch.
* We use a [fuzzer](https://github.com/duckdb/duckdb-fuzzer), which automatically reports issues found by fuzzing DuckDB.
* We use [SQLsmith](#docs:stable:core_extensions:sqlsmith) for generating random queries.
### sqllogictest Introduction {#docs:stable:dev:sqllogictest:intro}
For testing plain SQL, we use an extended version of the SQL logic test suite, adopted from [SQLite](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki). Every test is a single self-contained file located in the `test/sql` directory.
To run tests located outside of the default `test` directory, specify the root directory via the `--test-dir` option and make sure the provided test file paths are relative to that root directory.
The test describes a series of SQL statements, together with either the expected result, a `statement ok` indicator, or a `statement error` indicator. An example of a test file is shown below:
```sql
# name: test/sql/projection/test_simple_projection.test
# group [projection]

# enable query verification
statement ok
PRAGMA enable_verification

# create table
statement ok
CREATE TABLE a (i INTEGER, j INTEGER);

# insertion: 1 affected row
statement ok
INSERT INTO a VALUES (42, 84);

query II
SELECT * FROM a;
----
42	84
```
In this example, three statements and one query are executed. The statements are expected to succeed (each is prefixed by `statement ok`). The query is expected to return a single row with two columns (indicated by `query II`). The values of the row are expected to be `42` and `84` (separated by a tab character). For more information on query result verification, see the [result verification section](#docs:stable:dev:sqllogictest:result_verification).
The top of every file should contain a comment describing the name and group of the test. The name of the test is always the relative file path of the file. The group is the folder that the file is in. The name and group of the test are relevant because they can be used to execute *only* that test in the unittest group. For example, if we wanted to execute *only* the above test, we would run the command `unittest test/sql/projection/test_simple_projection.test`. If we wanted to run all tests in a specific directory, we would run the command `unittest "[projection]"`.
Any tests that are placed in the `test` directory are automatically added to the test suite. Note that the extension of the test is significant. The sqllogictests should either use the `.test` extension, or the `.test_slow` extension. The `.test_slow` extension indicates that the test takes a while to run, and will only be run when all tests are explicitly run using `unittest *`. Tests with the extension `.test` will be included in the fast set of tests.
#### Query Verification {#docs:stable:dev:sqllogictest:intro::query-verification}
Many simple tests start by enabling query verification. This can be done through the following `PRAGMA` statement:
```sql
statement ok
PRAGMA enable_verification
```
Query verification performs extra validation to ensure that the underlying code runs correctly. The most important part of that is that it verifies that optimizers do not cause bugs in the query. It does this by running both an unoptimized and optimized version of the query, and verifying that the results of these queries are identical.
Query verification is very useful because it not only discovers bugs in optimizers, but also finds bugs in, e.g., join implementations. This is because the unoptimized version will typically run using cross products instead. Because of this, query verification can be very slow when working with larger datasets. It is therefore recommended to turn on query verification for all unit tests, except those involving larger datasets (more than ~10-100 rows).
#### Editors & Syntax Highlighting {#docs:stable:dev:sqllogictest:intro::editors--syntax-highlighting}
The sqllogictests are not exactly an industry standard, but several other systems have adopted them as well. Parsing sqllogictests is intentionally simple. All statements have to be separated by empty lines. For that reason, writing a syntax highlighter is not extremely difficult.
A syntax highlighter exists for [Visual Studio Code](https://marketplace.visualstudio.com/items?itemName=benesch.sqllogictest). We have also [made a fork that supports the DuckDB dialect of the sqllogictests](https://github.com/Mytherin/vscode-sqllogictest). You can use the fork by installing the original, then copying the `syntaxes/sqllogictest.tmLanguage.json` into the installed extension (on macOS this is located in `~/.vscode/extensions/benesch.sqllogictest-0.1.1`).
A syntax highlighter is also available for [CLion](https://plugins.jetbrains.com/plugin/15295-sqltest). It can be installed directly on the IDE by searching SQLTest on the marketplace. A [GitHub repository](https://github.com/pdet/SQLTest) is also available, with extensions and bug reports being welcome.
##### Temporary Files {#docs:stable:dev:sqllogictest:intro::temporary-files}
For some tests (e.g., CSV/Parquet file format tests) it is necessary to create temporary files. Any temporary files should be created in the temporary testing directory. This directory can be used by placing the string `__TEST_DIR__` in a query. This string will be replaced by the path of the temporary testing directory.
```sql
statement ok
COPY csv_data TO '__TEST_DIR__/output_file.csv.gz' (COMPRESSION gzip);
```
##### Require & Extensions {#docs:stable:dev:sqllogictest:intro::require--extensions}
To avoid bloating the core system, certain functionality of DuckDB is available only as an extension. Tests can be built for those extensions by adding a `require` field to the test. If the extension is not loaded, any statements that occur after the `require` field will be skipped. Examples of this are `require parquet` or `require icu`.
Another usage is to limit a test to a specific vector size. For example, adding `require vector_size 512` to a test will prevent the test from being run unless the vector size is greater than or equal to 512. This is useful because certain functionality is not supported for low vector sizes, but we run tests using a vector size of 2 in our CI.
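For illustration, a minimal sketch of a test that uses `require` (the query is made up for this example; the Parquet file path is the one used elsewhere in this documentation):
```sql
# this test is skipped entirely if the parquet extension is not loaded
require parquet

# this test is skipped unless the vector size is at least 512
require vector_size 512

statement ok
SELECT * FROM 'data/parquet-testing/arrow/alltypes_plain.parquet';
```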
### Writing Tests {#docs:stable:dev:sqllogictest:writing_tests}
#### Development and Testing {#docs:stable:dev:sqllogictest:writing_tests::development-and-testing}
It is crucial that any new features that get added have correct tests that not only test the "happy path", but also test edge cases and incorrect usage of the feature. In this section, we describe how DuckDB tests are structured and how to make new tests for DuckDB.
The tests can be run by running the `unittest` program located in the `test` folder. For the default compilations this is located in either `build/release/test/unittest` (release) or `build/debug/test/unittest` (debug).
#### Philosophy {#docs:stable:dev:sqllogictest:writing_tests::philosophy}
When testing DuckDB, we aim to route all the tests through SQL. We try to avoid testing components individually because that makes those components more difficult to change later on. As such, almost all of our tests can (and should) be expressed in pure SQL. There are certain exceptions to this, which we will discuss in [Catch Tests](#docs:stable:dev:sqllogictest:catch). However, in most cases you should write your tests in plain SQL.
#### Frameworks {#docs:stable:dev:sqllogictest:writing_tests::frameworks}
SQL tests should be written using the [sqllogictest framework](#docs:stable:dev:sqllogictest:intro).
C++ tests can be written using the [Catch framework](#docs:stable:dev:sqllogictest:catch).
#### Client Connector Tests {#docs:stable:dev:sqllogictest:writing_tests::client-connector-tests}
DuckDB also has tests for various client connectors. These are generally written in the relevant client language, and can be found in `tools/*/tests`.
They also double as documentation of what should be doable from a given client.
#### Functions for Generating Test Data {#docs:stable:dev:sqllogictest:writing_tests::functions-for-generating-test-data}
DuckDB has built-in functions for generating test data.
##### `test_all_types` Function {#docs:stable:dev:sqllogictest:writing_tests::test_all_types-function}
The `test_all_types` table function generates a table whose columns correspond to types (`BOOL`, `TINYINT`, etc.).
The table has three rows encoding the minimum value, the maximum value, and the `NULL` value for each type.
```sql
FROM test_all_types();
```
```text
┌─────────┬─────────┬──────────┬─────────────┬──────────────────────┬──────────────────────┬───┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│  bool   │ tinyint │ smallint │     int     │        bigint        │       hugeint        │ … │        struct        │   struct_of_arrays   │   array_of_structs   │         map          │        union         │
│ boolean │  int8   │  int16   │    int32    │        int64         │        int128        │   │  struct(a integer, … │ struct(a integer[]…  │  struct(a integer, … │ map(varchar, varch…  │ union("name" varch…  │
├─────────┼─────────┼──────────┼─────────────┼──────────────────────┼──────────────────────┼───┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ false   │ -128    │ -32768   │ -2147483648 │ -9223372036854775808 │ -17014118346046923…  │ … │ {'a': NULL, 'b': N…  │ {'a': NULL, 'b': N…  │ []                   │ {}                   │ Frank                │
│ true    │ 127     │ 32767    │ 2147483647  │ 9223372036854775807  │ 170141183460469231…  │ … │ {'a': 42, 'b': 🦆…   │ {'a': [42, 999, NU…  │ [{'a': NULL, 'b': …  │ {key1=🦆🦆🦆🦆🦆🦆…  │ 5                    │
│ NULL    │ NULL    │ NULL     │ NULL        │ NULL                 │ NULL                 │ … │ NULL                 │ NULL                 │ NULL                 │ NULL                 │ NULL                 │
├─────────┴─────────┴──────────┴─────────────┴──────────────────────┴──────────────────────┴───┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┤
│ 3 rows                                                                                                                                                                                    44 columns (11 shown) │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
##### `test_vector_types` Function {#docs:stable:dev:sqllogictest:writing_tests::test_vector_types-function}
The `test_vector_types` table function takes _n_ arguments `col1`, ..., `coln` and an optional `BOOLEAN` argument `all_flat`.
The function generates a table with _n_ columns `test_vector`, `test_vector2`, ..., `test_vectorn`.
In each row, each field contains values conforming to the type of their respective column.
```sql
FROM test_vector_types(NULL::BIGINT);
```
```text
┌──────────────────────┐
│     test_vector      │
│        int64         │
├──────────────────────┤
│ -9223372036854775808 │
│  9223372036854775807 │
│ NULL                 │
│ ...                  │
└──────────────────────┘
```
```sql
FROM test_vector_types(NULL::ROW(i INTEGER, j VARCHAR, k DOUBLE), NULL::TIMESTAMP);
```
```text
┌──────────────────────────────────────────────────────────────────────┬──────────────────────────────┐
│                              test_vector                             │         test_vector2         │
│                struct(i integer, j varchar, k double)                │          timestamp           │
├──────────────────────────────────────────────────────────────────────┼──────────────────────────────┤
│ {'i': -2147483648, 'j': 🦆🦆🦆🦆🦆🦆, 'k': -1.7976931348623157e+308} │ 290309-12-22 (BC) 00:00:00   │
│ {'i': 2147483647, 'j': goo\0se, 'k': 1.7976931348623157e+308}        │ 294247-01-10 04:00:54.775806 │
│ {'i': NULL, 'j': NULL, 'k': NULL}                                    │ NULL                         │
│                                                  ...                                                 │
└───────────────────────────────────────────────────────────────────────────────────────────────────────┘
```
`test_vector_types` has an optional argument called `all_flat` of type `BOOL`. This only affects the internal representation of the vector.
```sql
FROM test_vector_types(NULL::ROW(i INTEGER, j VARCHAR, k DOUBLE), NULL::TIMESTAMP, all_flat = true);
-- the output is the same as above but with a different internal representation
```
### Debugging {#docs:stable:dev:sqllogictest:debugging}
The purpose of the tests is to figure out when things break. Inevitably changes made to the system will cause one of the tests to fail, and when that happens the test needs to be debugged.
First, it is always recommended to run in debug mode. This can be done by compiling the system using the command `make debug`. Second, it is recommended to only run the test that breaks. This can be done by passing the filename of the breaking test to the test suite as a command line parameter (e.g., `build/debug/test/unittest test/sql/projection/test_simple_projection.test`). For more options on running a subset of the tests see the [Triggering which tests to run](#::triggering-which-tests-to-run) section.
After that, a debugger can be attached to the program and the test can be debugged. In the sqllogictests it is normally difficult to break on a specific query, however, we have expanded the test suite so that a function called `query_break` is called with the line number `line` as parameter for every query that is run. This allows you to put a conditional breakpoint on a specific query. For example, if we want to break on line number 43 of the test file we can create the following break point:
```text
gdb: break query_break if line==43
lldb: break s -n query_break -c line==43
```
You can also skip certain queries from executing by placing `mode skip` in the file, followed by an optional `mode unskip`. Any queries between the two statements will not be executed.
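A minimal sketch of how `mode skip` and `mode unskip` can be used (the queries are illustrative):
```sql
query I
SELECT 1;
----
1

mode skip

# everything between 'mode skip' and 'mode unskip' is not executed
query I
SELECT 2;
----
2

mode unskip

query I
SELECT 3;
----
3
```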
#### Triggering Which Tests to Run {#docs:stable:dev:sqllogictest:debugging::triggering-which-tests-to-run}
When running the unittest program, by default all the fast tests are run. A specific test can be run by adding the name of the test as an argument. For the sqllogictests, this is the relative path to the test file.
To run only a single test:
```batch
build/debug/test/unittest test/sql/projection/test_simple_projection.test
```
All tests in a given directory can be executed by providing the directory as a parameter with square brackets.
To run all tests in the "projection" directory:
```batch
build/debug/test/unittest "[projection]"
```
All tests, including the slow tests, can be run by running the tests with an asterisk.
To run all tests, including the slow tests:
```batch
build/debug/test/unittest "*"
```
We can run a subset of the tests using the `--start-offset` and `--end-offset` parameters.
To run the tests 200..250:
```batch
build/debug/test/unittest --start-offset=200 --end-offset=250
```
These are also available in percentages. To run tests 10% - 20%:
```batch
build/debug/test/unittest --start-offset-percentage=10 --end-offset-percentage=20
```
The set of tests to run can also be loaded from a file containing one test name per line, and loaded using the `-f` command.
```batch
cat test.list
```
```text
test/sql/join/full_outer/test_full_outer_join_issue_4252.test
test/sql/join/full_outer/full_outer_join_cache.test
test/sql/join/full_outer/test_full_outer_join.test
```
To run only the tests labeled in the file:
```batch
build/debug/test/unittest -f test.list
```
### Result Verification {#docs:stable:dev:sqllogictest:result_verification}
The standard way of verifying results of queries is using the `query` statement, followed by the letter `I` times the number of columns that are expected in the result. After the query, four dashes (`----`) are expected, followed by the result values separated by tabs. For example:
```sql
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42 84
10 20
```
For legacy reasons the letters `R` and `T` are also accepted to denote columns.
> **Deprecated.** DuckDB deprecated the usage of types in the sqllogictest. The DuckDB test runner does not use or need them internally; therefore, only `I` should be used to denote columns.
#### NULL Values and Empty Strings {#docs:stable:dev:sqllogictest:result_verification::null-values-and-empty-strings}
Empty lines have special significance for the SQLLogic test runner: they signify an end of the current statement or query. For that reason, empty strings and NULL values have special syntax that must be used in result verification. NULL values should use the string `NULL`, and empty strings should use the string `(empty)`, e.g.:
```sql
query II
SELECT NULL, ''
----
NULL
(empty)
```
#### Error Verification {#docs:stable:dev:sqllogictest:result_verification::error-verification}
In order to signify that an error is expected, the `statement error` indicator can be used. The `statement error` also takes an optional expected result, which is interpreted as the *expected error message*. Similar to `query`, the expected error should be placed after the four dashes (`----`) following the query. The test passes if the error message *contains* the text under `statement error`; the entire error message does not need to be provided. It is recommended that you only use a subset of the error message, so that the test does not break unnecessarily if the formatting of error messages is changed.
```sql
statement error
SELECT * FROM non_existent_table;
----
Table with name non_existent_table does not exist!
```
#### Regex {#docs:stable:dev:sqllogictest:result_verification::regex}
In certain cases result values might be very large or complex, and we might only be interested in whether or not the result *contains* a snippet of text. In that case, we can use the `<REGEX>:` modifier followed by a certain regex. If the result value matches the regex, the test passes. This is primarily used for query plan analysis.
```sql
query II
EXPLAIN SELECT tbl.a FROM 'data/parquet-testing/arrow/alltypes_plain.parquet' tbl(a) WHERE a = 1 OR a = 2
----
physical_plan <REGEX>:.*PARQUET_SCAN.*Filters: a=1 OR a=2.*
```
If we instead want the result *not* to contain a snippet of text, we can use the `<!REGEX>:` modifier.
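For example, a sketch that checks that the plan does *not* contain a hash join (the query and pattern are illustrative):
```sql
query II
EXPLAIN SELECT tbl.a FROM 'data/parquet-testing/arrow/alltypes_plain.parquet' tbl(a) WHERE a = 1 OR a = 2
----
physical_plan <!REGEX>:.*HASH_JOIN.*
```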
#### File {#docs:stable:dev:sqllogictest:result_verification::file}
As results can grow quite large, and we might want to re-use results over multiple files, it is also possible to read expected results from files using the `<FILE>:` command. The expected result is read from the given file. As a convention, the file path should be provided as relative to the root of the GitHub repository.
```sql
query I
PRAGMA tpch(1)
----
<FILE>:extension/tpch/dbgen/answers/sf1/q01.csv
```
#### Row-Wise vs. Value-Wise Result Ordering {#docs:stable:dev:sqllogictest:result_verification::row-wise-vs-value-wise-result-ordering}
The result values of a query can be supplied either in row-wise order, with the individual values separated by tabs, or in value-wise order. In value-wise order, the individual *values* of the query must appear in row, column order, each on an individual line. Consider the following example in both row-wise and value-wise order:
```sql
# row-wise
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42 84
10 20
# value-wise
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42
84
10
20
```
#### Hashes and Outputting Values {#docs:stable:dev:sqllogictest:result_verification::hashes-and-outputting-values}
Besides direct result verification, the sqllogic test suite also has the option of using MD5 hashes for value comparisons. A test using hashes for result verification looks like this:
```sql
query I
SELECT g, string_agg(x,',') FROM strings GROUP BY g
----
200 values hashing to b8126ea73f21372cdb3f2dc483106a12
```
This approach is useful for reducing the size of tests when results have many output rows. However, it should be used sparingly, as hash values make the tests more difficult to debug if they do break.
After it is ensured that the system outputs the correct result, hashes of the queries in a test file can be computed by adding `mode output_hash` to the test file. For example:
```sql
mode output_hash
query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42 84
10 20
```
The expected output hashes for every query in the test file will then be printed to the terminal, as follows:
```text
================================================================================
SQL Query
SELECT 42, 84 UNION ALL SELECT 10, 20;
================================================================================
4 values hashing to 498c69da8f30c24da3bd5b322a2fd455
================================================================================
```
In a similar manner, `mode output_result` can be used in order to force the program to print the result to the terminal for every query run in the test file.
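For example, mirroring the `mode output_hash` example above:
```sql
mode output_result

query II
SELECT 42, 84 UNION ALL SELECT 10, 20;
----
42	84
10	20
```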
#### Result Sorting {#docs:stable:dev:sqllogictest:result_verification::result-sorting}
Queries can have an optional field that indicates that the result should be sorted in a specific manner. This field goes in the same location as the connection label. Because of that, connection labels and result sorting cannot be mixed.
The possible values of this field are `nosort`, `rowsort` and `valuesort`. An example of how this might be used is given below:
```sql
query I rowsort
SELECT 'world' UNION ALL SELECT 'hello'
----
hello
world
```
In general, we prefer not to use this field and rely on `ORDER BY` in the query to generate deterministic query answers. However, existing sqllogictests use this field extensively, hence it is important to know of its existence.
#### Query Labels {#docs:stable:dev:sqllogictest:result_verification::query-labels}
Another feature that can be used for result verification is `query labels`. These can be used to verify that different queries provide the same result. This is useful for comparing queries that are logically equivalent, but formulated differently. Query labels are provided after the connection label or sorting specifier.
Queries that have a query label do not need to have a result provided. Instead, the results of each of the queries with the same label are compared to each other. For example, the following script verifies that the queries `SELECT 42+1` and `SELECT 44-1` provide the same result:
```sql
query I nosort r43
SELECT 42+1;
----
query I nosort r43
SELECT 44-1;
----
```
### Persistent Testing {#docs:stable:dev:sqllogictest:persistent_testing}
By default, all tests are run in in-memory mode (unless `--force-storage` is enabled). In certain cases, we want to force the usage of a persistent database. We can initiate a persistent database using the `load` command, and trigger a reload of the database using the `restart` command.
```sql
# load the DB from disk
load __TEST_DIR__/storage_scan.db
statement ok
CREATE TABLE test (a INTEGER);
statement ok
INSERT INTO test VALUES (11), (12), (13), (14), (15), (NULL)
# ...
restart
query I
SELECT * FROM test ORDER BY a
----
NULL
11
12
13
14
15
```
Note that by default the tests run with `SET wal_autocheckpoint = '0KB'`, meaning a checkpoint is triggered after every statement. WAL tests typically run with the following settings to disable this behavior:
```sql
statement ok
PRAGMA disable_checkpoint_on_shutdown
statement ok
PRAGMA wal_autocheckpoint = '1TB'
```
### Loops {#docs:stable:dev:sqllogictest:loops}
Loops can be used in sqllogictests when it is required to execute the same query many times but with slight modifications in constant values. For example, suppose we want to fire off 100 queries that check for the presence of the values `0..100` in a table:
```sql
# create the table 'integers' with values 0..100
statement ok
CREATE TABLE integers AS SELECT * FROM range(0, 100, 1) t1(i);
# verify individually that all 100 values are there
loop i 0 100
# execute the query, replacing the value
query I
SELECT count(*) FROM integers WHERE i = ${i};
----
1
# end the loop (note that multiple statements can be part of a loop)
endloop
```
Similarly, `foreach` can be used to iterate over a set of values.
```sql
foreach partcode millennium century decade year quarter month day hour minute second millisecond microsecond epoch
query III
SELECT i, date_part('${partcode}', i) AS p, date_part(['${partcode}'], i) AS st
FROM intervals
WHERE p <> st['${partcode}'];
----
endloop
```
`foreach` also has a number of preset combinations that should be used when required. In this manner, when new combinations are added to the preset, old tests will automatically pick up these new combinations.
| Preset | Expansion |
|----------------|--------------------------------------------------------------|
| ⟨compression⟩ | none uncompressed rle bitpacking dictionary fsst chimp patas |
| ⟨signed⟩      | tinyint smallint integer bigint hugeint                      |
| ⟨unsigned⟩    | utinyint usmallint uinteger ubigint uhugeint                 |
| ⟨integral⟩    | ⟨signed⟩ ⟨unsigned⟩                                          |
| ⟨numeric⟩     | ⟨integral⟩ float double                                      |
| ⟨alltypes⟩    | ⟨numeric⟩ bool interval varchar json                         |
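A minimal sketch of how one of these presets might be used in a test file (in actual test files the presets are written with angle brackets, and the loop variable name here is illustrative):
```sql
foreach type <signed>

statement ok
CREATE OR REPLACE TABLE t AS SELECT 1::${type} AS i;

query I
SELECT i FROM t;
----
1

endloop
```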
> Use large loops sparingly. Executing hundreds of thousands of SQL statements will slow down tests unnecessarily. Do not use loops for inserting data.
#### Data Generation without Loops {#docs:stable:dev:sqllogictest:loops::data-generation-without-loops}
Loops should be used sparingly. While it might be tempting to use loops for inserting data using insert statements, this will considerably slow down the test cases. Instead, it is better to generate data using the built-in `range` and `repeat` functions.
To create the table `integers` with the values `[0, 1, ..., 98, 99]`, run:
```sql
CREATE TABLE integers AS SELECT * FROM range(0, 100, 1) t1(i);
```
To create the table `strings` with 100 times the value `hello`, run:
```sql
CREATE TABLE strings AS SELECT * FROM repeat('hello', 100) t1(s);
```
Using these two functions, together with clever use of cross products and other expressions, many different types of datasets can be efficiently generated. The `random()` function can also be used to generate random data.
An alternative option is to read data from an existing CSV or Parquet file. There are several large CSV files that can be loaded from the directory `test/sql/copy/csv/data/real` using a `COPY INTO` statement or the `read_csv_auto` function.
The TPC-H and TPC-DS extensions can also be used to generate synthetic data, using e.g. `CALL dbgen(sf = 1)` or `CALL dsdgen(sf = 1)`.
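For example, a sketch of a test that generates a small TPC-H dataset, assuming the `tpch` extension is available via `require`:
```sql
require tpch

statement ok
CALL dbgen(sf = 0.1);

query I
SELECT count(*) > 0 FROM lineitem;
----
true
```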
### Multiple Connections {#docs:stable:dev:sqllogictest:multiple_connections}
For tests whose purpose is to verify that the transactional management or versioning of data works correctly, it is generally necessary to use multiple connections. For example, if we want to verify that the creation of tables is correctly transactional, we might want to start a transaction and create a table in `con1`, then fire a query in `con2` that checks that the table is not accessible yet until committed.
We can use multiple connections in the sqllogictests using `connection labels`. The connection label can be optionally appended to any `statement` or `query`. All queries with the same connection label will be executed in the same connection. A test that would verify the above property would look as follows:
```sql
statement ok con1
BEGIN TRANSACTION
statement ok con1
CREATE TABLE integers (i INTEGER);
statement error con2
SELECT * FROM integers;
```
#### Concurrent Connections {#docs:stable:dev:sqllogictest:multiple_connections::concurrent-connections}
Using connection modifiers on the statement and queries will result in testing of multiple connections, but all the queries will still be run *sequentially* on a single thread. If we want to run code from multiple connections *concurrently* over multiple threads, we can use the `concurrentloop` construct. The queries in `concurrentloop` will be run concurrently on separate threads at the same time.
```sql
concurrentloop i 0 10
statement ok
CREATE TEMP TABLE t2 AS (SELECT 1);
statement ok
INSERT INTO t2 VALUES (42);
statement ok
DELETE FROM t2
endloop
```
One caveat with `concurrentloop` is that results are often unpredictable: as multiple clients can hammer the database at the same time, we might end up with (expected) transaction conflicts. `statement maybe` can be used to deal with these situations. `statement maybe` essentially accepts both a success and a failure with a specific error message.
```sql
concurrentloop i 1 10
statement maybe
CREATE OR REPLACE TABLE t2 AS (SELECT -54124033386577348004002656426531535114 FROM t2 LIMIT 70%);
----
write-write conflict
endloop
```
### Catch C/C++ Tests {#docs:stable:dev:sqllogictest:catch}
While we prefer the sqllogictests for testing most functionality, for certain tests SQL alone is not sufficient. This typically happens when you want to test the C++ API. When using pure SQL is really not an option, it might be necessary to make a C++ test using Catch.
Catch tests reside in the test directory as well. Here is an example of a catch test that tests the storage of the system:
```cpp
#include "catch.hpp"
#include "test_helpers.hpp"

TEST_CASE("Test simple storage", "[storage]") {
    auto config = GetTestConfig();
    unique_ptr<QueryResult> result;
    auto storage_database = TestCreatePath("storage_test");

    // make sure the database does not exist
    DeleteDatabase(storage_database);
    {
        // create a database and insert values
        DuckDB db(storage_database, config.get());
        Connection con(db);
        REQUIRE_NO_FAIL(con.Query("CREATE TABLE test (a INTEGER, b INTEGER);"));
        REQUIRE_NO_FAIL(con.Query("INSERT INTO test VALUES (11, 22), (13, 22), (12, 21), (NULL, NULL)"));
        REQUIRE_NO_FAIL(con.Query("CREATE TABLE test2 (a INTEGER);"));
        REQUIRE_NO_FAIL(con.Query("INSERT INTO test2 VALUES (13), (12), (11)"));
    }
    // reload the database from disk a few times
    for (idx_t i = 0; i < 2; i++) {
        DuckDB db(storage_database, config.get());
        Connection con(db);
        result = con.Query("SELECT * FROM test ORDER BY a");
        REQUIRE(CHECK_COLUMN(result, 0, {Value(), 11, 12, 13}));
        REQUIRE(CHECK_COLUMN(result, 1, {Value(), 22, 21, 22}));
        result = con.Query("SELECT * FROM test2 ORDER BY a");
        REQUIRE(CHECK_COLUMN(result, 0, {11, 12, 13}));
    }
    DeleteDatabase(storage_database);
}
```
The test uses the `TEST_CASE` wrapper to create each test. The database is created and queried using the C++ API. Results are checked using either `REQUIRE_FAIL` / `REQUIRE_NO_FAIL` (corresponding to `statement error` and `statement ok`, respectively) or `REQUIRE(CHECK_COLUMN(...))` (corresponding to `query` with a result check). Every test that is created in this way needs to be added to the corresponding `CMakeLists.txt`.
# Internals {#internals}
## Overview of DuckDB Internals {#docs:stable:internals:overview}
On this page is a brief description of the internals of the DuckDB engine.
#### Parser {#docs:stable:internals:overview::parser}
The parser converts a query string into the following tokens:
* [`SQLStatement`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parser/sql_statement.hpp)
* [`QueryNode`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parser/query_node.hpp)
* [`TableRef`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parser/tableref.hpp)
* [`ParsedExpression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parser/parsed_expression.hpp)
The parser is not aware of the catalog or any other aspect of the database. It will not throw errors if tables do not exist, and will not resolve **any** types of columns yet. It only transforms a query string into a set of tokens as specified.
##### ParsedExpression {#docs:stable:internals:overview::parsedexpression}
The ParsedExpression represents an expression within a SQL statement. This can be e.g., a reference to a column, an addition operator or a constant value. The type of the ParsedExpression indicates what it represents, e.g., a comparison is represented as a [`ComparisonExpression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/parser/expression/comparison_expression.hpp).
ParsedExpressions do **not** have types, except for nodes with explicit types such as `CAST` statements. The types for expressions are resolved in the Binder, not in the Parser.
##### TableRef {#docs:stable:internals:overview::tableref}
The TableRef represents any table source. This can be a reference to a base table, but it can also be a join, a table-producing function or a subquery.
##### QueryNode {#docs:stable:internals:overview::querynode}
The QueryNode represents either (1) a `SELECT` statement, or (2) a set operation (i.e., `UNION`, `INTERSECT` or `DIFFERENCE`).
##### SQL Statement {#docs:stable:internals:overview::sql-statement}
The SQLStatement represents a complete SQL statement. The type of the SQL Statement represents what kind of statement it is (e.g., `StatementType::SELECT` represents a `SELECT` statement). A single SQL string can be transformed into multiple SQL statements in case the original query string contains multiple queries.
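One convenient way to inspect what the parser produces, without writing any C++, is the `json` extension's `json_serialize_sql` function, which serializes the parsed (unbound) statements to JSON; the query below is only an illustrative sketch:
```sql
-- serialize the parse tree of a query to JSON (requires the json extension)
SELECT json_serialize_sql('SELECT a + 1 FROM tbl WHERE b > 10;');
```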
#### Binder {#docs:stable:internals:overview::binder}
The binder converts all nodes into their **bound** equivalents. In the binder phase:
* The tables and columns are resolved using the catalog
* Types are resolved
* Aggregate/window functions are extracted
The following conversions happen:
* SQLStatement → [`BoundStatement`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/bound_statement.hpp)
* QueryNode → [`BoundQueryNode`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/bound_query_node.hpp)
* TableRef → [`BoundTableRef`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/bound_tableref.hpp)
* ParsedExpression → [`Expression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/expression.hpp)
#### Logical Planner {#docs:stable:internals:overview::logical-planner}
The logical planner creates [`LogicalOperator`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/logical_operator.hpp) nodes from the bound statements. In this phase, the actual logical query tree is created.
#### Optimizer {#docs:stable:internals:overview::optimizer}
After the logical planner has created the logical query tree, the optimizers are run over that query tree to create an optimized query plan. The following query optimizers are run:
* **Expression Rewriter**: Simplifies expressions, performs constant folding
* **Filter Pushdown**: Pushes filters down into the query plan and duplicates filters over equivalency sets. Also prunes subtrees that are guaranteed to be empty (because of filters that statically evaluate to false).
* **Join Order Optimizer**: Reorders joins using dynamic programming. Specifically, the `DPccp` algorithm from the paper [Dynamic Programming Strikes Back](https://15721.courses.cs.cmu.edu/spring2017/papers/14-optimizer1/p539-moerkotte.pdf) is used.
* **Common Sub Expressions**: Extracts common subexpressions from projection and filter nodes to prevent unnecessary duplicate execution.
* **In Clause Rewriter**: Rewrites large static IN clauses to a MARK join or INNER join.
#### Column Binding Resolver {#docs:stable:internals:overview::column-binding-resolver}
The column binding resolver converts logical [`BoundColumnRefExpression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/expression/bound_columnref_expression.hpp) nodes that refer to a column of a specific table into [`BoundReferenceExpression`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/planner/expression/bound_reference_expression.hpp) nodes that refer to a specific index into the DataChunks that are passed around in the execution engine.
#### Physical Plan Generator {#docs:stable:internals:overview::physical-plan-generator}
The physical plan generator converts the resulting logical operator tree into a [`PhysicalOperator`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/execution/physical_operator.hpp) tree.
#### Execution {#docs:stable:internals:overview::execution}
In the execution phase, the physical operators are executed to produce the query result.
DuckDB uses a push-based vectorized model, where [`DataChunks`](https://github.com/duckdb/duckdb/blob/main/src/include/duckdb/common/types/data_chunk.hpp) are pushed through the operator tree.
For more information, see the talk [Push-Based Execution in DuckDB](https://www.youtube.com/watch?v=1kDrPgRUuEI).
## Storage Versions and Format {#docs:stable:internals:storage}
#### Compatibility {#docs:stable:internals:storage::compatibility}
##### Backward Compatibility {#docs:stable:internals:storage::backward-compatibility}
_Backward compatibility_ refers to the ability of a newer DuckDB version to read storage files created by an older DuckDB version. Version 0.10 is the first release of DuckDB that supports backward compatibility in the storage format. DuckDB v0.10 can read and operate on files created by the previous DuckDB version, DuckDB v0.9.
For future DuckDB versions, our goal is to ensure that any DuckDB version released **after** v0.10 can read files created by previous versions, starting from this release. We want to ensure that the file format is fully backward compatible. This allows you to keep data stored in DuckDB files around and guarantees that you will be able to read the files without having to worry about which version the file was written with or having to convert files between versions.
##### Forward Compatibility {#docs:stable:internals:storage::forward-compatibility}
_Forward compatibility_ refers to the ability of an older DuckDB version to read storage files produced by a newer DuckDB version. DuckDB v0.9 is [**partially** forward compatible with DuckDB v0.10](https://duckdb.org/2024/02/13/announcing-duckdb-0100#forward-compatibility). Certain files created by DuckDB v0.10 can be read by DuckDB v0.9.
Forward compatibility is provided on a **best effort** basis. While stability of the storage format is important, there are still many improvements and innovations that we want to make to the storage format in the future. As such, forward compatibility may be (partially) broken on occasion.
#### How to Move between Storage Formats {#docs:stable:internals:storage::how-to-move-between-storage-formats}
When you update DuckDB and open an old database file, you might encounter an error message about incompatible storage formats, pointing to this page.
To move your database(s) to a newer format, you only need the older and the newer DuckDB executables.
Open your database file with the older DuckDB and run the SQL statement `EXPORT DATABASE 'tmp'`. This saves the whole state of the current database into the folder `tmp`.
The content of the `tmp` folder will be overwritten, so choose an empty or not-yet-existing location. Then, start the newer DuckDB and execute `IMPORT DATABASE 'tmp'` (pointing to the previously populated folder) to load the database, which can then be saved to the file you pointed DuckDB to.
A Bash script to achieve this (adapt the file names and executable locations as needed) is the following:
```bash
/older/duckdb mydata.old.db -c "EXPORT DATABASE 'tmp'"
/newer/duckdb mydata.new.db -c "IMPORT DATABASE 'tmp'"
```
After this, `mydata.old.db` will remain in the old format, `mydata.new.db` will contain the same data but in a format accessible by the more recent DuckDB version, and the folder `tmp` will hold the same data in a universal format as different files.
Check [`EXPORT` documentation](#docs:stable:sql:statements:export) for more details on the syntax.
##### Explicit Storage Versions {#docs:stable:internals:storage::explicit-storage-versions}
[DuckDB v1.2.0 introduced the `STORAGE_VERSION` option](https://duckdb.org/2025/02/05/announcing-duckdb-120#explicit-storage-versions), which allows explicitly specifying the storage version.
Using this, you can opt in to newer, forward-incompatible features:
```sql
ATTACH 'file.db' (STORAGE_VERSION 'v1.2.0');
```
This setting specifies the minimum DuckDB version that should be able to read the database file. When database files are written with this option, the resulting files cannot be opened by DuckDB versions older than the specified version. They can be read by the specified version and all newer versions of DuckDB.
For attached DuckDB databases, you can query the storage versions using the following command:
```sql
SELECT database_name, tags FROM duckdb_databases();
```
This shows the storage versions:
```text
┌───────────────┬───────────────────────────────────┐
│ database_name │                tags               │
│    varchar    │       map(varchar, varchar)       │
├───────────────┼───────────────────────────────────┤
│ file1         │ {storage_version=v1.2.0}          │
│ file2         │ {storage_version=v1.0.0 - v1.1.3} │
│ ...           │ ...                               │
└───────────────┴───────────────────────────────────┘
```
This means that `file2` can be opened by past DuckDB versions while `file1` is compatible only with `v1.2.0` (or future versions).
##### Converting between Storage Versions {#docs:stable:internals:storage::converting-between-storage-versions}
To convert from the new format to the old format for compatibility, use the following sequence in DuckDB v1.2.0+:
```sql
ATTACH 'file1.db';
ATTACH 'converted_file.db' (STORAGE_VERSION 'v1.0.0');
COPY FROM DATABASE file1 TO converted_file;
```
#### Storage Header {#docs:stable:internals:storage::storage-header}
DuckDB files start with a `uint64_t` which contains a checksum for the main header, followed by four magic bytes (`DUCK`), followed by the storage version number in a `uint64_t`.
```batch
hexdump -n 20 -C mydata.db
```
```text
00000000 01 d0 e2 63 9c 13 39 3e 44 55 43 4b 2b 00 00 00 |...c..9>DUCK+...|
00000010 00 00 00 00 |....|
00000014
```
A simple example of reading the storage version using Python is below.
```python
import struct
pattern = struct.Struct('<8x4sQ')
with open('test/sql/storage_version/storage_version.db', 'rb') as fh:
    print(pattern.unpack(fh.read(pattern.size)))
```
#### Storage Version Table {#docs:stable:internals:storage::storage-version-table}
For changes in each given release, check out the [change log](https://github.com/duckdb/duckdb/releases) on GitHub.
To see the commits that changed each storage version, see the [commit log](https://github.com/duckdb/duckdb/commits/main/src/storage/storage_info.cpp).
| Storage version | DuckDB version(s) |
|----------------:|---------------------------------|
| 67 | v1.4.x |
| 66 | v1.3.x |
| 65 | v1.2.x |
| 64 | v0.9.x, v0.10.x, v1.0.0, v1.1.x |
| 51 | v0.8.x |
| 43 | v0.7.x |
| 39 | v0.6.x |
| 38 | v0.5.x |
| 33 | v0.3.3, v0.3.4, v0.4.0 |
| 31 | v0.3.2 |
| 27 | v0.3.1 |
| 25 | v0.3.0 |
| 21 | v0.2.9 |
| 18 | v0.2.8 |
| 17 | v0.2.7 |
| 15 | v0.2.6 |
| 13 | v0.2.5 |
| 11 | v0.2.4 |
| 6 | v0.2.3 |
| 4 | v0.2.2 |
| 1 | v0.2.1 and prior |
#### Compression {#docs:stable:internals:storage::compression}
DuckDB uses [lightweight compression](https://duckdb.org/2022/10/28/lightweight-compression).
By default, compression is only applied to persistent databases and is **not applied to in-memory instances**.
To turn on compression for in-memory databases, use `ATTACH` with the [`COMPRESS` option](#docs:stable:sql:statements:attach::options).
##### Compression Algorithms {#docs:stable:internals:storage::compression-algorithms}
The compression algorithms supported by DuckDB include the following:
* [Constant Encoding](https://duckdb.org/2022/10/28/lightweight-compression#constant-encoding)
* [Run-Length Encoding (RLE)](https://duckdb.org/2022/10/28/lightweight-compression#run-length-encoding-rle)
* [Bit Packing](https://duckdb.org/2022/10/28/lightweight-compression#bit-packing)
* [Frame of Reference (FOR)](https://duckdb.org/2022/10/28/lightweight-compression#frame-of-reference)
* [Dictionary Encoding](https://duckdb.org/2022/10/28/lightweight-compression#dictionary-encoding)
* [Fast Static Symbol Table (FSST)](https://duckdb.org/2022/10/28/lightweight-compression#fsst) – [VLDB 2020 paper](https://www.vldb.org/pvldb/vol13/p2649-boncz.pdf)
* [Adaptive Lossless Floating-Point Compression (ALP)](https://duckdb.org/2024/02/13/announcing-duckdb-0100#adaptive-lossless-floating-point-compression-alp) – [SIGMOD 2024 paper](https://ir.cwi.nl/pub/33334/33334.pdf)
* [Chimp](https://duckdb.org/2022/10/28/lightweight-compression#chimp--patas) – [VLDB 2022 paper](https://www.vldb.org/pvldb/vol15/p3058-liakos.pdf)
* [Patas](https://duckdb.org/2022/11/14/announcing-duckdb-060#compression-improvements)
#### Disk Usage {#docs:stable:internals:storage::disk-usage}
The disk usage of DuckDB's format depends on a number of factors, including the data type and the data distribution, the compression methods used, etc.
As a rough approximation, loading 100 GB of uncompressed CSV files into a DuckDB database file will require 25 GB of disk space, while loading 100 GB of Parquet files will require 120 GB of disk space.
#### Row Groups {#docs:stable:internals:storage::row-groups}
DuckDB's storage format stores the data in _row groups,_ i.e., horizontal partitions of the data.
This concept is equivalent to [Parquet's row groups](https://parquet.apache.org/docs/concepts/).
Several features in DuckDB, including [parallelism](#docs:stable:guides:performance:how_to_tune_workloads) and [compression](https://duckdb.org/2022/10/28/lightweight-compression) are based on row groups.
The row group size can be specified as an option of the `ATTACH` statement:
```sql
ATTACH '/tmp/somefile.db' AS db (ROW_GROUP_SIZE 16384);
```
#### Troubleshooting {#docs:stable:internals:storage::troubleshooting}
##### Error Message When Opening an Incompatible Database File {#docs:stable:internals:storage::error-message-when-opening-an-incompatible-database-file}
When opening a database file that has been written by a different DuckDB version from the one you are using, the following error message may occur:
```console
Error: unable to open database "...": Serialization Error: Failed to deserialize: ...
```
The message implies that the database file was created with a newer DuckDB version and uses features that are backward incompatible with the DuckDB version used to read the file.
There are two potential workarounds:
1. Update your DuckDB version to the latest stable version.
2. Open the database with the latest version of DuckDB, export it to a standard format (e.g., Parquet), then import it into any version of DuckDB. See the [`EXPORT/IMPORT DATABASE` statements](#docs:stable:sql:statements:export) for details.
## Execution Format {#docs:stable:internals:vector}
`Vector` is the container format used to store in-memory data during execution.
`DataChunk` is a collection of Vectors, used for instance to represent a column list in a `PhysicalProjection` operator.
#### Data Flow {#docs:stable:internals:vector::data-flow}
DuckDB uses a vectorized query execution model.
All operators in DuckDB are optimized to work on Vectors of a fixed size.
This fixed size is commonly referred to in the code as `STANDARD_VECTOR_SIZE`.
The default `STANDARD_VECTOR_SIZE` is 2048 tuples.
#### Vector Format {#docs:stable:internals:vector::vector-format}
Vectors logically represent arrays that contain data of a single type. DuckDB supports different *vector formats*, which allow the system to store the same logical data with a different *physical representation*. This allows for a more compressed representation, and potentially allows for compressed execution throughout the system. Below the list of supported vector formats is shown.
##### Flat Vectors {#docs:stable:internals:vector::flat-vectors}
Flat vectors are physically stored as a contiguous array; this is the standard uncompressed vector format.
For flat vectors the logical and physical representations are identical.

##### Constant Vectors {#docs:stable:internals:vector::constant-vectors}
Constant vectors are physically stored as a single constant value.

Constant vectors are useful when data elements are repeated: for example, when representing the result of a constant expression in a function call, the constant vector allows us to store the value only once.
```sql
SELECT lst || 'duckdb'
FROM range(1000) tbl(lst);
```
Since `duckdb` is a string literal, the value of the literal is the same for every row. In a flat vector, we would have to duplicate the literal 'duckdb' once for every row. The constant vector allows us to only store the literal once.
Constant vectors are also emitted by the storage when decompressing from constant compression.
##### Dictionary Vectors {#docs:stable:internals:vector::dictionary-vectors}
Dictionary vectors are physically stored as a child vector, and a selection vector that contains indexes into the child vector.

Just like constant vectors, dictionary vectors are also emitted by the storage.
When deserializing a dictionary compressed column segment, we store this in a dictionary vector so we can keep the data compressed during query execution.
##### Sequence Vectors {#docs:stable:internals:vector::sequence-vectors}
Sequence vectors are physically stored as an offset and an increment value.

Sequence vectors are useful for efficiently storing incremental sequences. They are generally emitted for row identifiers.
##### Unified Vector Format {#docs:stable:internals:vector::unified-vector-format}
These properties of the different vector formats are great for optimization purposes: for example, in the scenario where all the parameters to a function are constant, we can compute the result once and emit a constant vector.
However, writing specialized code for every combination of vector types for every function is infeasible due to the combinatorial explosion of possibilities.
Instead, whenever you want to use a vector generically, regardless of its type, the UnifiedVectorFormat can be used.
This format essentially acts as a generic view over the contents of the Vector. Every type of Vector can convert to this format.
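A minimal C++ sketch of this pattern, assuming DuckDB's internal API (`Vector::ToUnifiedFormat` and `UnifiedVectorFormat::GetData`); exact headers and signatures may differ between DuckDB versions:
```cpp
#include "duckdb/common/types/vector.hpp"

using namespace duckdb;

// Sum a BIGINT vector without caring whether it is flat, constant, or dictionary:
// the unified format exposes a selection vector, a validity mask, and the raw data.
int64_t SumBigintVector(Vector &input, idx_t count) {
    UnifiedVectorFormat format;
    input.ToUnifiedFormat(count, format);
    auto data = UnifiedVectorFormat::GetData<int64_t>(format);
    int64_t sum = 0;
    for (idx_t i = 0; i < count; i++) {
        auto idx = format.sel->get_index(i);
        if (format.validity.RowIsValid(idx)) {
            sum += data[idx];
        }
    }
    return sum;
}
```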
#### Complex Types {#docs:stable:internals:vector::complex-types}
##### String Vectors {#docs:stable:internals:vector::string-vectors}
To efficiently store strings, we make use of our `string_t` class.
```cpp
struct string_t {
    union {
        struct {
            uint32_t length;   // length of the string
            char prefix[4];    // first four bytes, used as an early out in comparisons
            char *ptr;         // pointer to the full string data
        } pointer;
        struct {
            uint32_t length;   // length of the string
            char inlined[12];  // the full string, inlined for short strings
        } inlined;
    } value;
};
```
Short strings (`<= 12 bytes`) are inlined into the structure, while larger strings are stored with a pointer to the data in the auxiliary string buffer. The length is used throughout the functions to avoid having to call `strlen` and having to continuously check for null-pointers. The prefix is used for comparisons as an early out (when the prefix does not match, we know the strings are not equal and don't need to chase any pointers).
##### List Vectors {#docs:stable:internals:vector::list-vectors}
List vectors are stored as a series of *list entries* together with a child Vector. The child vector contains the *values* that are present in the list, and the list entries specify how each individual list is constructed.
```cpp
struct list_entry_t {
    idx_t offset;  // start row of this list in the child vector
    idx_t length;  // number of entries in this list
};
```
The offset refers to the start row in the child Vector; the length keeps track of the size of the list for this row.
List vectors can be stored recursively. For nested list vectors, the child of a list vector is again a list vector.
For example, consider this mock representation of a Vector of type `BIGINT[][]`:
```json
{
"type": "list",
"data": "list_entry_t",
"child": {
"type": "list",
"data": "list_entry_t",
"child": {
"type": "bigint",
"data": "int64_t"
}
}
}
```
##### Struct Vectors {#docs:stable:internals:vector::struct-vectors}
Struct vectors store a list of child vectors. The number and types of the child vectors is defined by the schema of the struct.
##### Map Vectors {#docs:stable:internals:vector::map-vectors}
Internally map vectors are stored as a `LIST[STRUCT(key KEY_TYPE, value VALUE_TYPE)]`.
##### Union Vectors {#docs:stable:internals:vector::union-vectors}
Internally `UNION` utilizes the same structure as a `STRUCT`.
The first "child" is always occupied by the Tag Vector of the `UNION`, which records for each row which of the `UNION`'s types apply to that row.
## Pivot Internals {#docs:stable:internals:pivot}
#### `PIVOT` {#docs:stable:internals:pivot::pivot}
[Pivoting](#docs:stable:sql:statements:pivot) is implemented as a combination of SQL query re-writing and a dedicated `PhysicalPivot` operator for higher performance.
Each `PIVOT` is implemented as a set of aggregations into lists, and then the dedicated `PhysicalPivot` operator converts those lists into column names and values.
Additional pre-processing steps are required if the columns to be created when pivoting are detected dynamically (which occurs when the `IN` clause is not in use).
DuckDB, like most SQL engines, requires that all column names and types be known at the start of a query.
In order to automatically detect the columns that should be created as a result of a `PIVOT` statement, it must be translated into multiple queries.
[`ENUM` types](#docs:stable:sql:data_types:enum) are used to find the distinct values that should become columns.
Each `ENUM` is then injected into one of the `PIVOT` statement's `IN` clauses.
After the `IN` clauses have been populated with `ENUM`s, the query is re-written again into a set of aggregations into lists.
For example:
```sql
PIVOT cities
ON year
USING sum(population);
```
is initially translated into:
```sql
CREATE TEMPORARY TYPE __pivot_enum_0_0 AS ENUM (
SELECT DISTINCT
year::VARCHAR
FROM cities
ORDER BY
year
);
PIVOT cities
ON year IN __pivot_enum_0_0
USING sum(population);
```
and finally translated into:
```sql
SELECT country, name, list(year), list(population_sum)
FROM (
SELECT country, name, year, sum(population) AS population_sum
FROM cities
GROUP BY ALL
)
GROUP BY ALL;
```
This produces the result:
| country | name | list("year") | list(population_sum) |
|---------|---------------|--------------------|----------------------|
| NL | Amsterdam | [2000, 2010, 2020] | [1005, 1065, 1158] |
| US | Seattle | [2000, 2010, 2020] | [564, 608, 738] |
| US | New York City | [2000, 2010, 2020] | [8015, 8175, 8772] |
The `PhysicalPivot` operator converts those lists into column names and values to return this result:
| country | name | 2000 | 2010 | 2020 |
|---------|---------------|-----:|-----:|-----:|
| NL | Amsterdam | 1005 | 1065 | 1158 |
| US | Seattle | 564 | 608 | 738 |
| US | New York City | 8015 | 8175 | 8772 |
#### `UNPIVOT` {#docs:stable:internals:pivot::unpivot}
##### Internals {#docs:stable:internals:pivot::internals}
Unpivoting is implemented entirely as rewrites into SQL queries.
Each `UNPIVOT` is implemented as a set of `unnest` functions, operating on a list of the column names and a list of the column values.
If dynamically unpivoting, the `COLUMNS` expression is evaluated first to calculate the column list.
For example:
```sql
UNPIVOT monthly_sales
ON jan, feb, mar, apr, may, jun
INTO
NAME month
VALUE sales;
```
is translated into:
```sql
SELECT
empid,
dept,
unnest(['jan', 'feb', 'mar', 'apr', 'may', 'jun']) AS month,
unnest(["jan", "feb", "mar", "apr", "may", "jun"]) AS sales
FROM monthly_sales;
```
Note the single quotes to build a list of text strings to populate `month`, and the double quotes to pull the column values for use in `sales`.
This produces the same result as the initial example:
| empid | dept | month | sales |
|------:|-------------|-------|------:|
| 1 | electronics | jan | 1 |
| 1 | electronics | feb | 2 |
| 1 | electronics | mar | 3 |
| 1 | electronics | apr | 4 |
| 1 | electronics | may | 5 |
| 1 | electronics | jun | 6 |
| 2 | clothes | jan | 10 |
| 2 | clothes | feb | 20 |
| 2 | clothes | mar | 30 |
| 2 | clothes | apr | 40 |
| 2 | clothes | may | 50 |
| 2 | clothes | jun | 60 |
| 3 | cars | jan | 100 |
| 3 | cars | feb | 200 |
| 3 | cars | mar | 300 |
| 3 | cars | apr | 400 |
| 3 | cars | may | 500 |
| 3 | cars | jun | 600 |
# DuckDB Blog
## Testing Out DuckDB's Full Text Search Extension
**Publication date:** 2021-01-25
**Author:** Laurens Kuiper
**TL;DR:** DuckDB now has full-text search functionality, similar to the FTS5 extension in SQLite. The main difference is that our FTS extension is fully formulated in SQL. We tested it out on TREC disks 4 and 5.
Searching through textual data stored in a database can be cumbersome, as SQL does not provide a good way of formulating questions such as "Give me all the documents about __Mallard Ducks__": string patterns with `LIKE` will only get you so far. Despite SQL's shortcomings here, storing textual data in a database is commonplace. Consider the table `products (id INTEGER, name VARCHAR, description VARCHAR)`: it would be useful to search through the `name` and `description` columns for a website that sells these products.
We expect a search engine to return results within milliseconds. For a long time, databases were unsuitable for this task because they could not search large inverted indexes at this speed: transactional database systems are not made for this use case. However, analytical database systems can keep up with state-of-the-art information retrieval systems. The company [Spinque](https://www.spinque.com/) is a good example of this. At Spinque, MonetDB is used as a computation engine for customized search engines.
DuckDB's FTS implementation follows the paper "[Old Dogs Are Great at New Tricks](https://www.duckdb.org/pdf/SIGIR2014-column-stores-ir-prototyping.pdf)". A keen observation there is that advances made to the database system, such as parallelization, will speed up your search engine "for free"!
Alright, enough about the "why", let's get to the "how".
#### Preparing the Data
The TREC 2004 Robust Retrieval Track has 250 "topics" (search queries) over TREC disks 4 and 5. The data consist of many text files stored in SGML format, along with a corresponding DTD (document type definition) file. This format is rarely used anymore, but it is similar to XML. We will use OpenSP's command line tool `osx` to convert it to XML. Because there are many files, I wrote a Bash script:
```bash
mkdir -p latimes/xml
for i in $(seq -w 1 9); do
    cat dtds/la.dtd latimes-$i | osx > latimes/xml/latimes-$i.xml
done
```
This converts the `latimes` files to XML. Repeat for the `fbis`, `cr`, `fr94`, and `ft` files.
To parse the XML I used BeautifulSoup. Each document has a `docno` identifier, and a `text` field. Because the documents do not come from the same source, they differ in what other fields they have. I chose to take all of the fields.
```python
import duckdb
import multiprocessing
import pandas as pd
import re
from bs4 import BeautifulSoup as bs
from tqdm import tqdm
# fill variable 'files' with the path to each .xml file that we created here
def process_file(fpath):
    dict_list = []
    with open(fpath, 'r') as f:
        content = f.read()
    bs_content = bs(content, "html.parser")
    # find all 'doc' nodes
    for doc in bs_content.findChildren('doc', recursive=True):
        row_dict = {}
        for c in doc.findChildren(recursive=True):
            row_dict[c.name] = ''.join(c.findAll(text=True, recursive=False)).strip()
        dict_list.append(row_dict)
    return dict_list
# process documents (in parallel to speed things up)
pool = multiprocessing.Pool(multiprocessing.cpu_count())
list_of_dict_lists = []
for x in tqdm(pool.imap_unordered(process_file, files), total=len(files)):
    list_of_dict_lists.append(x)
pool.close()
pool.join()
# create pandas dataframe from the parsed data
documents_df = pd.DataFrame([x for sublist in list_of_dict_lists for x in sublist])
```
Now that we have a dataframe, we can register it in DuckDB.
```python
# create database connection and register the dataframe
con = duckdb.connect(database='db/trec04_05.db', read_only=False)
con.register('documents_df', documents_df)
# create a table from the dataframe so that it persists
con.execute("CREATE TABLE documents AS (SELECT * FROM documents_df)")
con.close()
```
This is the end of my preparation script, so I closed the database connection.
#### Building the Search Engine
We can now build the inverted index and the retrieval model using a `PRAGMA` statement.
The extension is [documented here](#docs:stable:core_extensions:full_text_search).
We create an index for the table `documents` (i.e., `main.documents`) that we created with our script.
The column that identifies our documents is called `docno`, and we wish to create an inverted index on the fields supplied.
I supplied all fields by using the '\*' shortcut.
```python
con = duckdb.connect(database='db/trec04_05.db', read_only=False)
con.execute("PRAGMA create_fts_index('documents', 'docno', '*', stopwords='english')")
```
Under the hood, a parameterized SQL script is called. The schema `fts_main_documents` is created, along with the tables `docs`, `terms`, `dict`, and `stats` that make up the inverted index. If you're curious what this looks like, take a look at the `extension` folder in DuckDB's source code!
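If you want to peek at these index tables yourself, one way to do so (a quick sketch; the schema name is the one created by the extension for our `documents` table) is to list the tables in that schema:
```python
# list the tables that create_fts_index generated in the fts_main_documents schema
print(con.execute("""
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'fts_main_documents'
""").fetchall())
```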
#### Running the Benchmark
The data is now fully prepared. Next, we want to run the benchmark queries one by one. We load the topics file as follows:
```python
# the 'topics' file is not structured nicely, therefore we need to parse some of it using regex
def after_tag(s, tag):
    m = re.findall(r'<' + tag + r'>([\s\S]*?)<.*>', s)
    return m[0].replace('\n', '').strip()
topic_dict = {}
with open('../../trec/topics', 'r') as f:
    bs_content = bs(f.read(), "lxml")
    for top in bs_content.findChildren('top'):
        top_content = top.getText()
        # we need the number and title of each topic
        num = after_tag(str(top), 'num').split(' ')[1]
        title = after_tag(str(top), 'title')
        topic_dict[num] = title
```
This gives us a dictionary that has query number as keys, and query strings as values, e.g., `301 -> 'International Organized Crime'`.
We want to store the results in a specific format, so that they can be evaluated by [trec eval](https://github.com/usnistgov/trec_eval.git):
```python
# create a prepared statement to make querying our document collection easier
con.execute("""
PREPARE fts_query AS (
WITH scored_docs AS (
SELECT *, fts_main_documents.match_bm25(docno, ?) AS score FROM documents)
SELECT docno, score
FROM scored_docs
WHERE score IS NOT NULL
ORDER BY score DESC
LIMIT 1000)
""")
# enable parallelism
con.execute('PRAGMA threads=32')
results = []
for query in topic_dict:
q_str = topic_dict[query].replace('\'', ' ')
con.execute("EXECUTE fts_query('" + q_str + "')")
for i, row in enumerate(con.fetchall()):
results.append(query + " Q0 " + row[0].trim() + " " + str(i) + " " + str(row[1]) + " STANDARD")
con.close()
with open('results', 'w+') as f:
for r in results:
f.write(r + '\n')
```
#### Results
Now that we have created our 'results' file, we can compare them to the relevance assessments `qrels` using `trec_eval`.
```bash
./trec_eval -m P.30 -m map qrels results
```
```text
map all 0.2324
P_30 all 0.2948
```
Not bad! While these results are not as high as those reproducible by [Anserini](https://github.com/castorini/anserini), they are definitely acceptable. The difference in performance can be explained by differences in:
1. Which stemmer was used (we used 'porter')
2. Which stopwords were used (we used the list of 571 English stopwords used in the SMART system)
3. Pre-processing (removal of accents, punctuation, numbers)
4. BM25 parameters (we used the default k=1.2 and b=0.75, non-conjunctive)
5. Which fields were indexed (we used all columns by supplying '\*')
Retrieval time for each query was between 0.5 and 1.3 seconds on our machine, and this will improve further as DuckDB itself improves. I hope you enjoyed reading this blog and feel inspired to test out the extension as well!
## Efficient SQL on Pandas with DuckDB
**Publication date:** 2021-05-14
**Authors:** Mark Raasveldt and Hannes Mühleisen
**TL;DR:** DuckDB, a free and open source analytical data management system, can efficiently run SQL queries directly on Pandas DataFrames.
Recently, an article was published [advocating for using SQL for Data Analysis](https://hakibenita.com/sql-for-data-analysis). Here at team DuckDB, we are huge fans of [SQL](https://en.wikipedia.org/wiki/SQL). It is a versatile and flexible language that allows the user to efficiently perform a wide variety of data transformations, without having to care about how the data is physically represented or how to do these data transformations in the most optimal way.
While you can very effectively perform aggregations and data transformations in an external database system such as Postgres if your data is stored there, at some point you will need to convert that data back into [Pandas](https://pandas.pydata.org) and [NumPy](https://numpy.org). These libraries serve as the standard for data exchange between the vast ecosystem of Data Science libraries in Python¹ such as [scikit-learn](https://scikit-learn.org/stable/) or [TensorFlow](https://www.tensorflow.org).
¹ [Apache Arrow](https://arrow.apache.org) is gaining significant traction in this domain as well, and DuckDB also quacks Arrow.
If you are reading from a file (e.g., a CSV or Parquet file) often your data will never be loaded into an external database system at all, and will instead be directly loaded into a Pandas DataFrame.
#### SQL on Pandas
After your data has been converted into a Pandas DataFrame often additional data wrangling and analysis still need to be performed. SQL is a very powerful tool for performing these types of data transformations. Using DuckDB, it is possible to run SQL efficiently right on top of Pandas DataFrames.
As a short teaser, here is a code snippet that allows you to do exactly that: run arbitrary SQL queries directly on Pandas DataFrames using DuckDB.
```python
# to install: pip install duckdb
import pandas as pd
import duckdb
mydf = pd.DataFrame({'a' : [1, 2, 3]})
print(duckdb.query("SELECT sum(a) FROM mydf").to_df())
```
In the rest of the article, we will go more in-depth into how this works and how fast it is.
#### Data Integration & SQL on Pandas
One of the core goals of DuckDB is that accessing data in common formats should be easy. DuckDB is fully capable of running queries in parallel *directly* on top of a Pandas DataFrame (or on a Parquet/CSV file, or on an Arrow table, …). A separate (time-consuming) import step is not necessary.
DuckDB can also write query results directly to any of these formats. You can use DuckDB to process a Pandas DataFrame in parallel using SQL, and convert the result back to a Pandas DataFrame again, so you can then use the result in other Data Science libraries.
When you run a query in SQL, DuckDB will look for Python variables whose name matches the table names in your query and automatically start reading your Pandas DataFrames. Looking back at the previous example we can see this in action:
```python
import pandas as pd
import duckdb
mydf = pd.DataFrame({'a' : [1, 2, 3]})
print(duckdb.query("SELECT sum(a) FROM mydf").to_df())
```
The SQL table name `mydf` is interpreted as the local Python variable `mydf` that happens to be a Pandas DataFrame, which DuckDB can read and query directly. The column names and types are also extracted automatically from the DataFrame.
Not only is this process painless, it is highly efficient. For many queries, you can use DuckDB to process data faster than Pandas, and with a much lower total memory usage, *without ever leaving the Pandas DataFrame binary format* ("Pandas-in, Pandas-out"). Unlike when using an external database system such as Postgres, the data transfer time of the input or the output is negligible (see Appendix A for details).
#### SQL on Pandas Performance
To demonstrate the performance of DuckDB when executing SQL on Pandas DataFrames, we now present a number of benchmarks. The source code for the benchmarks is available for interactive use [in Google Colab](https://colab.research.google.com/drive/1eg_TJpPQr2tyYKWjISJlX8IEAi8Qln3U?usp=sharing). In these benchmarks, we operate *purely* on Pandas DataFrames. Both the DuckDB code and the Pandas code operate fully on a `Pandas-in, Pandas-out` basis.
##### Benchmark Setup and Data Set
We run the benchmark entirely from within the Google Colab environment. For our benchmark dataset, we use the [infamous TPC-H data set](http://www.tpc.org/tpch/). Specifically, we focus on the `lineitem` and `orders` tables as these are the largest tables in the benchmark. The total dataset size is around 1 GB in uncompressed CSV format ("scale factor" 1).
As DuckDB is capable of using multiple processors (multi-threading), we include both a single-threaded variant and a variant with two threads. Note that while DuckDB can scale far beyond two threads, Google Colab only supports two.
##### Setup
First we need to install DuckDB. This is a simple one-liner.
```bash
pip install duckdb
```
To set up the dataset for processing we download two Parquet files using `wget`. After that, we load the data into a Pandas DataFrame using the built-in Parquet reader of DuckDB. The system automatically infers that we are reading a Parquet file by looking at the `.parquet` extension of the file.
```python
lineitem = duckdb.query(
"SELECT * FROM 'lineitemsf1.snappy.parquet'"
).to_df()
orders = duckdb.query(
"SELECT * FROM 'orders.parquet'"
).to_df()
```
##### Ungrouped Aggregates
For our first query, we will run a set of ungrouped aggregates over the Pandas DataFrame. Here is the SQL query:
```sql
SELECT
sum(l_extendedprice),
min(l_extendedprice),
max(l_extendedprice),
avg(l_extendedprice)
FROM lineitem;
```
The Pandas code looks similar:
```python
lineitem.agg(
Sum=('l_extendedprice', 'sum'),
Min=('l_extendedprice', 'min'),
Max=('l_extendedprice', 'max'),
Avg=('l_extendedprice', 'mean')
)
```
| Name | Time (s) |
|:-------------------|----------:|
| DuckDB (1 Thread) | 0.079 |
| DuckDB (2 Threads) | 0.048 |
| Pandas | 0.070 |
This benchmark involves a very simple query, and Pandas performs very well here. These simple queries are where Pandas excels (ha), as it can directly call into the NumPy routines that implement these aggregates, which are highly efficient. Nevertheless, we can see that DuckDB performs similarly to Pandas in the single-threaded scenario, and benefits from its multi-threading support when enabled.
##### Grouped Aggregate
For our second query, we will run the same set of aggregates, but this time include a grouping condition. In SQL, we can do this by adding a `GROUP BY` clause to the query.
```sql
SELECT
l_returnflag,
l_linestatus,
sum(l_extendedprice),
min(l_extendedprice),
max(l_extendedprice),
avg(l_extendedprice)
FROM lineitem
GROUP BY
l_returnflag,
l_linestatus;
```
In Pandas, we use the `groupby` function before we perform the aggregation.
```python
lineitem.groupby(
['l_returnflag', 'l_linestatus']
).agg(
Sum=('l_extendedprice', 'sum'),
Min=('l_extendedprice', 'min'),
Max=('l_extendedprice', 'max'),
Avg=('l_extendedprice', 'mean')
)
```
| Name | Time (s) |
|:-------------|----------:|
| DuckDB (1 Thread) | 0.43 |
| DuckDB (2 Threads) | 0.32 |
| Pandas | 0.84 |
This query is already getting more complex, and while Pandas does a decent job, it is a factor of two slower than the single-threaded version of DuckDB. DuckDB has a highly optimized aggregate hash-table implementation that performs both the grouping and the computation of all the aggregates in a single pass over the data.
##### Grouped Aggregate with a Filter
Now suppose that we don't want to perform an aggregate over all of the data, but instead only want to select a subset of the data to aggregate. We can do this by adding a filter clause that removes any tuples we are not interested in. In SQL, we can accomplish this through the `WHERE` clause.
```sql
SELECT
l_returnflag,
l_linestatus,
sum(l_extendedprice),
min(l_extendedprice),
max(l_extendedprice),
avg(l_extendedprice)
FROM lineitem
WHERE
l_shipdate <= DATE '1998-09-02'
GROUP BY
l_returnflag,
l_linestatus;
```
In Pandas, we can create a filtered variant of the DataFrame by using the selection brackets.
```python
# filter out the rows
filtered_df = lineitem[
lineitem['l_shipdate'] < "1998-09-02"]
# perform the aggregate
result = filtered_df.groupby(
['l_returnflag', 'l_linestatus']
).agg(
Sum=('l_extendedprice', 'sum'),
Min=('l_extendedprice', 'min'),
Max=('l_extendedprice', 'max'),
Avg=('l_extendedprice', 'mean')
)
```
In DuckDB, the query optimizer will combine the filter and aggregation into a single pass over the data, only reading relevant columns. In Pandas, however, we have no such luck. The filter as it is executed will actually subset the entire lineitem table, *including any columns we are not using!* As a result of this, the filter operation is much more time-consuming than it needs to be.
We can manually perform this optimization ("projection pushdown" in database literature). To do this, we first need to select only the columns that are relevant to our query and then subset the lineitem dataframe. We will end up with the following code snippet:
```python
# projection pushdown
pushed_down_df = lineitem[
['l_shipdate',
'l_returnflag',
'l_linestatus',
'l_extendedprice']
]
# perform the filter
filtered_df = pushed_down_df[
pushed_down_df['l_shipdate'] < "1998-09-02"]
# perform the aggregate
result = filtered_df.groupby(
['l_returnflag', 'l_linestatus']
).agg(
Sum=('l_extendedprice', 'sum'),
Min=('l_extendedprice', 'min'),
Max=('l_extendedprice', 'max'),
Avg=('l_extendedprice', 'mean')
)
```
| Name | Time (s) |
|:----------------------------|----------:|
| DuckDB (1 Thread) | 0.60 |
| DuckDB (2 Threads) | 0.42 |
| Pandas | 3.57 |
| Pandas (manual pushdown) | 2.23 |
While the manual projection pushdown significantly speeds up the query in Pandas, there is still a significant time penalty for the filtered aggregate. To process a filter, Pandas will write a copy of the entire DataFrame (minus the filtered out rows) back into memory. This operation can be time consuming when the filter is not very selective.
Due to its holistic query optimizer and efficient query processor, DuckDB performs significantly better on this query.
##### Joins
For the final query, we will join (`merge` in Pandas) the lineitem table with the orders table, and apply a filter that only selects orders which have the status we are interested in. This leads us to the following query in SQL:
```sql
SELECT
l_returnflag,
l_linestatus,
sum(l_extendedprice),
min(l_extendedprice),
max(l_extendedprice),
avg(l_extendedprice)
FROM lineitem
JOIN orders ON (l_orderkey = o_orderkey)
WHERE l_shipdate <= DATE '1998-09-02'
AND o_orderstatus='O'
GROUP BY
l_returnflag,
l_linestatus;
```
For Pandas, we have to add a `merge` step. In a basic approach, we merge lineitem and orders together, then apply the filters, and finally apply the grouping and aggregation. This will give us the following code snippet:
```python
# perform the join
merged = lineitem.merge(
orders,
left_on='l_orderkey',
right_on='o_orderkey')
# filter out the rows
filtered_a = merged[
merged['l_shipdate'] < "1998-09-02"]
filtered_b = filtered_a[
filtered_a['o_orderstatus'] == "O"]
# perform the aggregate
result = filtered_b.groupby(
['l_returnflag', 'l_linestatus']
).agg(
Sum=('l_extendedprice', 'sum'),
Min=('l_extendedprice', 'min'),
Max=('l_extendedprice', 'max'),
Avg=('l_extendedprice', 'mean')
)
```
Now we have missed two performance opportunities:
* First, we are merging far too many columns, because we are merging columns that are not required for the remainder of the query (projection pushdown).
* Second, we are merging far too many rows. We can apply the filters prior to the merge to reduce the amount of data that we need to merge (filter pushdown).
Applying these two optimizations manually results in the following code snippet:
```python
# projection & filter on lineitem table
lineitem_projected = lineitem[
['l_shipdate',
'l_orderkey',
'l_linestatus',
'l_returnflag',
'l_extendedprice']
]
lineitem_filtered = lineitem_projected[
lineitem_projected['l_shipdate'] < "1998-09-02"]
# projection and filter on order table
orders_projected = orders[
['o_orderkey',
'o_orderstatus']
]
orders_filtered = orders_projected[
orders_projected['o_orderstatus'] == 'O']
# perform the join
merged = lineitem_filtered.merge(
orders_filtered,
left_on='l_orderkey',
right_on='o_orderkey')
# perform the aggregate
result = merged.groupby(
['l_returnflag', 'l_linestatus']
).agg(
Sum=('l_extendedprice', 'sum'),
Min=('l_extendedprice', 'min'),
Max=('l_extendedprice', 'max'),
Avg=('l_extendedprice', 'mean')
)
```
Both of these optimizations are automatically applied by DuckDB's query optimizer.
| Name | Time (s) |
|:-------------------------|---------:|
| DuckDB (1 Thread) | 1.05 |
| DuckDB (2 Threads) | 0.53 |
| Pandas | 15.2 |
| Pandas (manual pushdown) | 3.78 |
We see that the basic approach is extremely time consuming compared to the optimized version. This demonstrates the usefulness of the automatic query optimizer. Even after optimizing, the Pandas code is still significantly slower than DuckDB because it stores intermediate results in memory after the individual filters and joins.
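If you are curious what the optimizer does with this query, one way to check (a quick sketch; the exact plan output varies between DuckDB versions) is to prefix the query with `EXPLAIN` and see where the filters and projections end up:
```python
# print DuckDB's query plan; pushed-down filters and projections appear
# inside the scan operators rather than as separate plan operators
print(duckdb.query("""
    EXPLAIN
    SELECT l_returnflag, l_linestatus, sum(l_extendedprice)
    FROM lineitem
    JOIN orders ON (l_orderkey = o_orderkey)
    WHERE l_shipdate <= DATE '1998-09-02'
      AND o_orderstatus = 'O'
    GROUP BY l_returnflag, l_linestatus
""").fetchall())
```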
##### Takeaway
Using DuckDB, you can take advantage of the powerful and expressive SQL language without having to worry about moving your data in and out of Pandas. DuckDB is extremely simple to install, and offers many advantages such as a query optimizer, automatic multi-threading and larger-than-memory computation. DuckDB uses the Postgres SQL parser, and offers many of the same SQL features as Postgres, including advanced features such as window functions, correlated subqueries, (recursive) common table expressions, nested types and sampling. If you are missing a feature, please [open an issue](https://github.com/duckdb/duckdb/issues).
#### Appendix A: There and Back Again: Transferring Data from Pandas to a SQL Engine and Back
Traditional SQL engines use the Client-Server paradigm, which means that a client program connects through a socket to a server. Queries are run on the server, and results are sent back down to the client afterwards. This is the same when using for example Postgres from Python. Unfortunately, this transfer [is a serious bottleneck](http://www.vldb.org/pvldb/vol10/p1022-muehleisen.pdf). In-process engines such as SQLite or DuckDB do not run into this problem.
To showcase how costly this data transfer over a socket is, we have run a benchmark involving Postgres, SQLite and DuckDB. The source code for the benchmark can be found on [GitHub](https://gist.github.com/hannes/a95a39a1eda63aeb0ca13fd82d1ba49c).
In this benchmark we copy a (fairly small) Pandas data frame consisting of 10M 4-byte integers (40 MB) from Python to the PostgreSQL, SQLite and DuckDB databases. Since the default Pandas `to_sql` was rather slow, we added a separate optimization in which we tell Pandas to write the data frame to a temporary CSV file, and then tell PostgreSQL to directly copy data from that file into a newly created table. This of course will only work if the database server is running on the same machine as Python.
| Name | Time (s) |
|:---------------------------------------------|----------:|
| Pandas to Postgres using to_sql | 111.25 |
| Pandas to Postgres using temporary CSV file | 5.57 |
| Pandas to SQLite using to_sql | 6.80 |
| Pandas to DuckDB | 0.03 |
While SQLite performs significantly better than Postgres here, it is still rather slow. That is because the `to_sql` function in Pandas runs a large number of `INSERT INTO` statements, which involves transforming all the individual values of the Pandas DataFrame into a row-wise representation of Python objects which are then passed onto the system. DuckDB on the other hand directly reads the underlying array from Pandas, which makes this operation almost instant.
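To make the difference concrete, here is a minimal sketch of the kind of comparison involved (not the exact benchmark script; the data frame contents and the in-memory SQLite database are assumptions for illustration):
```python
import duckdb
import numpy as np
import pandas as pd
import sqlite3

# roughly 10M 4-byte integers, similar in spirit to the benchmark data frame
df = pd.DataFrame({'i': np.arange(10_000_000, dtype=np.int32)})

# DuckDB reads the underlying NumPy array directly
con = duckdb.connect()
con.register('df_view', df)
con.execute("CREATE TABLE duck_table AS SELECT * FROM df_view")

# to_sql converts every value into a row-wise Python representation first
sqlite_con = sqlite3.connect(':memory:')
df.to_sql('sqlite_table', sqlite_con, index=False)
```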
Transferring query results or tables back from the SQL system into Pandas is another potential bottleneck. Using the built-in `read_sql_query` is extremely slow, but even the more optimized CSV route still takes at least a second for this tiny data set. DuckDB, on the other hand, also performs this transformation almost instantaneously.
| Name | Time (s) |
|:-----------------------------------------------|----------:|
| PostgreSQL to Pandas using read_sql_query | 7.08 |
| PostgreSQL to Pandas using temporary CSV file | 1.29 |
| SQLite to Pandas using read_sql_query | 5.20 |
| DuckDB to Pandas | 0.04 |
#### Appendix B: Comparison to PandaSQL
There is a package called [PandaSQL](https://pypi.org/project/pandasql/) that also provides the facilities of running SQL directly on top of Pandas. However, it is built on the `to_sql` and `read_sql` infrastructure that we have seen is extremely slow in Appendix A.
Nevertheless, for good measure we have run the first Ungrouped Aggregate query in PandaSQL to time it. When we first tried to run the query on the original dataset, however, we ran into an out-of-memory error that crashed our colab session. For that reason, we have decided to run the benchmark again for PandaSQL using a sample of 10% of the original data set size (600K rows). Here are the results:
| Name | Time (s) |
|:-------------|-----------:|
| DuckDB (1 Thread) | 0.023 |
| DuckDB (2 Threads) | 0.014 |
| Pandas | 0.017 |
| PandaSQL | 24.43 |
We can see that PandaSQL (powered by SQLite) is around 1000× slower than either Pandas or DuckDB on this straightforward benchmark. The performance difference was so large we have opted not to run the other benchmarks for PandaSQL.
#### Appendix C: Query on Parquet Directly
In the benchmarks above, we fully read the Parquet files into Pandas. However, DuckDB also has the capability of directly running queries on top of Parquet files (in parallel!). In this appendix, we show the performance of this compared to loading the file into Python first.
For the benchmark, we will run two queries: the simplest query (the ungrouped aggregate) and the most complex query (the final join) and compare the cost of running this query directly on the Parquet file, compared to loading it into Pandas using the `read_parquet` function.
##### Setup
In DuckDB, we can create a view over the Parquet file using the following query. This allows us to run queries over the Parquet file as if it was a regular table. Note that we do not need to worry about projection pushdown at all: we can just do a `SELECT *` and DuckDB's optimizer will take care of only projecting the required columns at query time.
```sql
CREATE VIEW lineitem_parquet AS
SELECT * FROM 'lineitemsf1.snappy.parquet';
CREATE VIEW orders_parquet AS
SELECT * FROM 'orders.parquet';
```
##### Ungrouped Aggregate
After we have set up this view, we can run the same queries we ran before, but this time against the `lineitem_parquet` table.
```sql
SELECT sum(l_extendedprice), min(l_extendedprice), max(l_extendedprice), avg(l_extendedprice) FROM lineitem_parquet;
```
For Pandas, we will first need to run `read_parquet` to load the data into Pandas. To do this, we use the Parquet reader powered by Apache Arrow. After that, we can run the query as we did before.
```python
lineitem_pandas_parquet = pd.read_parquet('lineitemsf1.snappy.parquet')
result = lineitem_pandas_parquet.agg(Sum=('l_extendedprice', 'sum'), Min=('l_extendedprice', 'min'), Max=('l_extendedprice', 'max'), Avg=('l_extendedprice', 'mean'))
```
However, we now again run into the problem where Pandas will read the Parquet file in its entirety. In order to circumvent this, we will need to perform projection pushdown manually again by providing the `read_parquet` method with the set of columns that we want to read.
The optimizer in DuckDB will figure this out by itself by looking at the query you are executing.
```python
lineitem_pandas_parquet = pd.read_parquet('lineitemsf1.snappy.parquet', columns=['l_extendedprice'])
result = lineitem_pandas_parquet.agg(Sum=('l_extendedprice', 'sum'), Min=('l_extendedprice', 'min'), Max=('l_extendedprice', 'max'), Avg=('l_extendedprice', 'mean'))
```
| Name | Time (s) |
|:------------------------------|---------:|
| DuckDB (1 Thread) | 0.16 |
| DuckDB (2 Threads) | 0.14 |
| Pandas | 7.87 |
| Pandas (manual pushdown) | 0.17 |
We can see that the performance difference between doing the pushdown and not doing the pushdown is dramatic. When we perform the pushdown, Pandas has performance in the same ballpark as DuckDB. Without the pushdown, however, it is loading the entire file from disk, including the other 15 columns that are not required to answer the query.
##### Joins
Now for the final query that we saw in the join section previously. To recap:
```sql
SELECT
l_returnflag,
l_linestatus,
sum(l_extendedprice),
min(l_extendedprice),
max(l_extendedprice),
avg(l_extendedprice)
FROM lineitem
JOIN orders ON (l_orderkey = o_orderkey)
WHERE l_shipdate <= DATE '1998-09-02'
AND o_orderstatus='O'
GROUP BY
l_returnflag,
l_linestatus;
```
For Pandas we again create two versions. A naive version, and a manually optimized version. The exact code used can be found [in Google Colab](https://colab.research.google.com/drive/1eg_TJpPQr2tyYKWjISJlX8IEAi8Qln3U?usp=sharing).
| Name | Time (s) |
|:-------------------------|---------:|
| DuckDB (1 Thread) | 1.04 |
| DuckDB (2 Threads) | 0.89 |
| Pandas | 20.4 |
| Pandas (manual pushdown) | 3.95 |
We see that for this more complex query the slight difference in performance between running over a Pandas DataFrame and a Parquet file vanishes, and the DuckDB timings become extremely similar to the timings we saw before. The added Parquet read again increases the necessity of manually performing optimizations on the Pandas code, which is not required at all when running SQL in DuckDB.
## Querying Parquet with Precision Using DuckDB
**Publication date:** 2021-06-25
**Authors:** Hannes Mühleisen and Mark Raasveldt
**TL;DR:** DuckDB, a free and open source analytical data management system, can run SQL queries directly on Parquet files and automatically take advantage of the advanced features of the Parquet format.
Apache Parquet is the most common "Big Data" storage format for analytics. In Parquet files, data is stored in a columnar-compressed binary format. Each Parquet file stores a single table. The table is partitioned into row groups, which each contain a subset of the rows of the table. Within a row group, the table data is stored in a columnar fashion.

The Parquet format has a number of properties that make it suitable for analytical use cases:
1. The columnar representation means that individual columns can be (efficiently) read. No need to always read the entire file!
2. The file contains per-column statistics in every row group (min/max value, and the number of `NULL` values). These statistics allow the reader to skip row groups if they are not required.
3. The columnar compression significantly reduces the file size of the format, which in turn reduces the storage requirement of data sets. This can often turn Big Data into Medium Data.
#### DuckDB and Parquet
DuckDB's zero-dependency Parquet reader is able to directly execute SQL queries on Parquet files without any import or analysis step. Because of the natural columnar format of Parquet, this is very fast!
DuckDB will read the Parquet files in a streaming fashion, which means you can perform queries on large Parquet files that do not fit in your main memory.
DuckDB is able to automatically detect which columns and rows are required for any given query. This allows users to analyze much larger and more complex Parquet files without needing to perform manual optimizations or investing in more hardware.
And as an added bonus, DuckDB can do all of this in parallel, and across multiple Parquet files at the same time using the glob syntax.
As a short teaser, here is a code snippet that allows you to directly run a SQL query on top of a Parquet file.
To install the DuckDB package:
```bash
pip install duckdb
```
To download the Parquet file:
```bash
wget https://blobs.duckdb.org/data/taxi_2019_04.parquet
```
Then, run the following Python script:
```python
import duckdb
print(duckdb.query('''
SELECT count(*)
FROM 'taxi_2019_04.parquet'
WHERE pickup_at BETWEEN '2019-04-15' AND '2019-04-20'
''').fetchall())
```
#### Automatic Filter & Projection Pushdown
Let us dive into the previous query to better understand the power of the Parquet format when combined with DuckDB's query optimizer.
```sql
SELECT count(*)
FROM 'taxi_2019_04.parquet'
WHERE pickup_at BETWEEN '2019-04-15' AND '2019-04-20';
```
In this query, we read a single column from our Parquet file (`pickup_at`). Any other columns stored in the Parquet file can be entirely skipped, as we do not need them to answer our query.

In addition, only rows that have a `pickup_at` between the 15th and the 20th of April 2019 influence the result of the query. Any rows that do not satisfy this predicate can be skipped.
We can use the statistics inside the Parquet file to great advantage here. Any row groups that have a max value of `pickup_at` lower than `2019-04-15`, or a min value higher than `2019-04-20`, can be skipped. In some cases, that allows us to skip reading entire files.
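If you want to see these per-row-group statistics yourself, DuckDB exposes them through the `parquet_metadata` table function (a quick sketch; the exact set of columns returned can differ between DuckDB versions):
```python
import duckdb
# inspect the Parquet footer: one row per (row group, column), including
# the min/max statistics that make row group skipping possible
print(duckdb.query("""
    SELECT *
    FROM parquet_metadata('taxi_2019_04.parquet')
    LIMIT 5
""").df())
```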
#### DuckDB versus Pandas
To illustrate how effective these automatic optimizations are, we will run a number of queries on top of Parquet files using both Pandas and DuckDB.
In these queries, we use a part of the infamous New York Taxi dataset stored as Parquet files, specifically data from April, May and June 2019. These files are ca. 360 MB in size together and contain around 21 million rows of 18 columns each. The three files are placed into the `taxi/` folder.
The examples are available as [an interactive notebook over at Google Colab](https://colab.research.google.com/drive/1e1beWqYOcFidKl2IxHtxT5s9i_6KYuNY). The timings reported here are from this environment for reproducibility.
#### Reading Multiple Parquet Files
First we look at some rows in the dataset. There are three Parquet files in the `taxi/` folder. [DuckDB supports the globbing syntax](#docs:stable:data:parquet:overview), which allows it to query all three files simultaneously.
```python
con.execute("""
SELECT *
FROM 'taxi/*.parquet'
LIMIT 5""").df()
```
| pickup_at | dropoff_at | passenger_count | trip_distance | rate_code_id |
|---------------------|---------------------|-----------------|---------------|--------------|
| 2019-04-01 00:04:09 | 2019-04-01 00:06:35 | 1 | 0.5 | 1 |
| 2019-04-01 00:22:45 | 2019-04-01 00:25:43 | 1 | 0.7 | 1 |
| 2019-04-01 00:39:48 | 2019-04-01 01:19:39 | 1 | 10.9 | 1 |
| 2019-04-01 00:35:32 | 2019-04-01 00:37:11 | 1 | 0.2 | 1 |
| 2019-04-01 00:44:05 | 2019-04-01 00:57:58 | 1 | 4.8 | 1 |
Despite the query selecting all columns from three (rather large) Parquet files, the query completes instantly. This is because DuckDB processes the Parquet file in a streaming fashion, and will stop reading after the first few rows, as that is all that is required to satisfy the query.
If we try to do the same in Pandas, we realize it is not so straightforward, as Pandas cannot read multiple Parquet files in one call. We will first have to use `pandas.concat` to concatenate the three Parquet files together:
```python
import pandas
import glob
df = pandas.concat(
    [pandas.read_parquet(file)
     for file
     in glob.glob('taxi/*.parquet')])
print(df.head(5))
```
Below are the timings for both of these queries.
| System | Time (s) |
|:-------|---------:|
| DuckDB | 0.015 |
| Pandas | 12.300 |
Pandas takes significantly longer to complete this query. That is because Pandas not only needs to read each of the three Parquet files in their entirety, it has to concatenate these three separate Pandas DataFrames together.
#### Concatenate into a Single File
We can address the concatenation issue by creating a single big Parquet file from the three smaller parts. We can use the `pyarrow` library for this, which has support for reading multiple Parquet files and streaming them into a single large file. Note that the `pyarrow` Parquet reader is the very same Parquet reader that is used by Pandas internally.
```python
import pyarrow.parquet as pq
# concatenate all three Parquet files
pq.write_table(pq.ParquetDataset('taxi/').read(), 'alltaxi.parquet', row_group_size=100000)
```
Note that [DuckDB also has support for writing Parquet files](#docs:stable:data:parquet:overview::writing-to-parquet-files) using the COPY statement.
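As a sketch of that alternative (assuming the same `taxi/` folder and output file name; the row group size option is optional), the concatenation can also be done entirely inside DuckDB:
```python
import duckdb
con = duckdb.connect()
# read all three Parquet files with a glob and write them out as a single file
con.execute("""
    COPY (SELECT * FROM 'taxi/*.parquet')
    TO 'alltaxi.parquet' (FORMAT parquet, ROW_GROUP_SIZE 100000)
""")
```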
#### Querying the Large File
Now let us repeat the previous experiment, but using the single file instead.
```python
# DuckDB
con.execute("""
SELECT *
FROM 'alltaxi.parquet'
LIMIT 5""").df()
# Pandas
pandas.read_parquet('alltaxi.parquet').head(5)
```
| System | Time (s) |
|:-------|---------:|
| DuckDB | 0.02 |
| Pandas | 7.50 |
We can see that Pandas performs better than before, as the concatenation is avoided. However, the entire file still needs to be read into memory, which takes both a significant amount of time and memory.
For DuckDB it does not really matter how many Parquet files need to be read in a query.
#### Counting Rows
Now suppose we want to figure out how many rows are in our data set. We can do that using the following code:
```python
# DuckDB
con.execute("""
SELECT count(*)
FROM 'alltaxi.parquet'
""").df()
# Pandas
len(pandas.read_parquet('alltaxi.parquet'))
```
| System | Time (s) |
|:-------|---------:|
| DuckDB | 0.015 |
| Pandas | 7.500 |
DuckDB completes the query very quickly, as it automatically recognizes what needs to be read from the Parquet file and minimizes the required reads. Pandas has to read the entire file again, which causes it to take the same amount of time as the previous query.
For this query, we can improve Pandas' time through manual optimization. In order to get a count, we only need a single column from the file. By manually specifying a single column to be read in the `read_parquet` command, we can get the same result but much faster.
```python
len(pandas.read_parquet('alltaxi.parquet', columns=['vendor_id']))
```
| System | Time (s) |
|:-------------------|---------:|
| DuckDB | 0.015 |
| Pandas | 7.500 |
| Pandas (optimized) | 1.200 |
While this is much faster, this still takes more than a second as the entire `vendor_id` column has to be read into memory as a Pandas column only to count the number of rows.
#### Filtering Rows
It is common to use some sort of filtering predicate to only look at the interesting parts of a data set. For example, imagine we want to know how many taxi rides occur after the 30th of June 2019. We can do that using the following query in DuckDB:
```python
con.execute("""
SELECT count(*)
FROM 'alltaxi.parquet'
WHERE pickup_at > '2019-06-30'
""").df()
```
The query completes in `45ms` and yields the following result:
| count |
|-------:|
| 167022 |
In Pandas, we can perform the same operation using a naive approach.
```python
# pandas naive
len(pandas.read_parquet('alltaxi.parquet')
.query("pickup_at > '2019-06-30'"))
```
However, this again reads the entire file into memory, causing the query to take 7.5 seconds. With manual projection pushdown, we can bring this down to 0.9 seconds, which is still significantly higher than DuckDB.
```python
# pandas projection pushdown
len(pandas.read_parquet('alltaxi.parquet', columns=['pickup_at'])
.query("pickup_at > '2019-06-30'"))
```
However, the `pyarrow` Parquet reader also allows us to push the filter down into the scan itself. Once we add this, the query completes in a much more competitive `70ms`.
```python
len(pandas.read_parquet('alltaxi.parquet', columns=['pickup_at'], filters=[('pickup_at', '>', '2019-06-30')]))
```
| System | Time (s) |
|:--------------------------------------|---------:|
| DuckDB | 0.05 |
| Pandas | 7.50 |
| Pandas (projection pushdown) | 0.90 |
| Pandas (projection & filter pushdown) | 0.07 |
This shows that the results here are not due to DuckDB's Parquet reader being faster than the `pyarrow` Parquet reader. The reason that DuckDB performs better on these queries is because its optimizers automatically extract all required columns and filters from the SQL query, which then get automatically utilized in the Parquet reader with no manual effort required.
Interestingly, both the `pyarrow` Parquet reader and DuckDB are significantly faster than performing this operation natively in Pandas on a materialized DataFrame.
```python
# read the entire Parquet file into Pandas
df = pandas.read_parquet('alltaxi.parquet')
# run the query natively in Pandas
# note: we only time this part
print(len(df[['pickup_at']].query("pickup_at > '2019-06-30'")))
```
| System | Time (s) |
|:--------------------------------------|---------:|
| DuckDB | 0.05 |
| Pandas | 7.50 |
| Pandas (projection pushdown) | 0.90 |
| Pandas (projection & filter pushdown) | 0.07 |
| Pandas (native) | 0.26 |
#### Aggregates
Finally, let's look at a more complex aggregation. Say we want to compute the number of rides per passenger count. With DuckDB and SQL, it looks like this:
```python
con.execute("""
SELECT passenger_count, count(*)
FROM 'alltaxi.parquet'
GROUP BY passenger_count""").df()
```
The query completes in `220ms` and yields the following result:
| passenger_count | count |
|---------------:|----------:|
| 0 | 408742 |
| 1 | 15356631 |
| 2 | 3332927 |
| 3 | 944833 |
| 4 | 439066 |
| 5 | 910516 |
| 6 | 546467 |
| 7 | 106 |
| 8 | 72 |
| 9 | 64 |
For the SQL-averse and as a teaser for a future blog post, DuckDB also has a "Relational API" that allows for a more Python-esque declaration of queries. Here's the equivalent to the above SQL query, that provides the exact same result and performance:
```python
(con.from_parquet('alltaxi.parquet')
    .aggregate('passenger_count, count(*)')
    .df())
```
Now as a comparison, let's run the same query in Pandas in the same way we did previously.
```python
# naive
(pandas.read_parquet('alltaxi.parquet')
    .groupby('passenger_count')
    .agg({'passenger_count' : 'count'}))
# projection pushdown
(pandas.read_parquet('alltaxi.parquet', columns=['passenger_count'])
    .groupby('passenger_count')
    .agg({'passenger_count' : 'count'}))
# native (parquet file pre-loaded into memory)
(df.groupby('passenger_count')
    .agg({'passenger_count' : 'count'}))
```
| System | Time (s) |
|:-----------------------------|---------:|
| DuckDB | 0.22 |
| Pandas | 7.50 |
| Pandas (projection pushdown) | 0.58 |
| Pandas (native) | 0.51 |
We can see that DuckDB is faster than Pandas in all three scenarios, without needing to perform any manual optimizations and without needing to load the Parquet file into memory in its entirety.
#### Conclusion
DuckDB can efficiently run queries directly on top of Parquet files without requiring an initial loading phase. The system will automatically take advantage of all of Parquet's advanced features to speed up query execution.
DuckDB is a free and open source database management system (MIT licensed). It aims to be the SQLite for Analytics, and provides a fast and efficient database system with zero external dependencies. It is available not just for Python, but also for C/C++, R, Java, and more.
## Fastest Table Sort in the West – Redesigning DuckDB's Sort
**Publication date:** 2021-08-27
**Author:** Laurens Kuiper
**TL;DR:** DuckDB, a free and open-source analytical data management system, has a new highly efficient parallel sorting implementation that can sort much more data than fits in main memory.
Database systems use sorting for many purposes, the most obvious purpose being when a user adds an `ORDER BY` clause to their query.
Sorting is also used within operators, such as window functions.
DuckDB recently improved its sorting implementation, which is now able to sort data in parallel and sort more data than fits in memory.
In this post, we will take a look at how DuckDB sorts, and how this compares to other data management systems.
Not interested in the implementation? [Jump straight to the experiments!](#comparison)
#### Sorting Relational Data
Sorting is one of the most well-studied problems in computer science, and it is an important aspect of data management. There are [entire communities](https://sortbenchmark.org) dedicated to who sorts fastest.
Research into sorting algorithms tends to focus on sorting large arrays or key/value pairs.
While important, this does not cover how to implement sorting in a database system.
There is a lot more to sorting tables than just sorting a large array of integers!
Consider the following example query on a snippet of a TPC-DS table:
```sql
SELECT c_customer_sk, c_birth_country, c_birth_year
FROM customer
ORDER BY c_birth_country DESC,
c_birth_year ASC NULLS LAST;
```
Which yields:
| c_customer_sk | c_birth_country | c_birth_year |
| ------------: | --------------- | -----------: |
| 64760 | NETHERLANDS | 1991 |
| 75011 | NETHERLANDS | 1992 |
| 89949 | NETHERLANDS | 1992 |
| 90766 | NETHERLANDS | NULL |
| 42927 | GERMANY | 1924 |
In other words: `c_birth_country` is ordered descendingly, and where `c_birth_country` is equal, we sort on `c_birth_year` ascendingly.
By specifying `NULLS LAST`, null values are placed after all other values in the `c_birth_year` column.
Whole rows are thus reordered, not just the columns in the `ORDER BY` clause. The columns that are not in the `ORDER BY` clause we call "payload columns".
Therefore, payload column `c_customer_sk` has to be reordered too.
It is easy to implement something that can evaluate the example query using any sorting implementation, for instance, __C++__'s `std::sort`.
While `std::sort` is excellent algorithmically, it is still a single-threaded approach that is unable to efficiently sort by multiple columns because function call overhead would quickly dominate sorting time.
Below we will discuss why that is.
To achieve good performance when sorting tables, a custom sorting implementation is needed. We are â of course â not the first to implement relational sorting, so we dove into the literature to look for guidance.
In 2006 the famous Goetz Graefe wrote a survey on [implementing sorting in database systems](http://wwwlgis.informatik.uni-kl.de/archiv/wwwdvs.informatik.uni-kl.de/courses/DBSREAL/SS2005/Vorlesungsunterlagen/Implementing_Sorting.pdf).
In this survey, he collected many sorting techniques that are known to the community. This is a great guideline if you are about to start implementing sorting for tables.
The cost of sorting is dominated by comparing values and moving data around.
Anything that makes these two operations cheaper will have a big impact on the total runtime.
There are two obvious ways to go about implementing a comparator when we have multiple `ORDER BY` clauses:
1. Loop through the clauses: Compare columns until we find one that is not equal, or until we have compared all columns.
This is fairly complex already, as this requires a loop with an if/else inside of it for every single row of data.
If we have columnar storage, this comparator has to jump between columns, [causing random access in memory](https://ir.cwi.nl/pub/13805); a sketch of such a comparator follows this list.
2. Entirely sort the data by the first clause, then sort by the second clause, but only where the first clause was equal, and so on.
This approach is especially inefficient when there are many duplicate values, as it requires multiple passes over the data.
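To make approach 1 concrete, here is a minimal Python sketch (for illustration only, not DuckDB's implementation) of a per-row, clause-by-clause comparator:
```python
def compare_rows(row_a, row_b, order_by):
    # order_by: list of (column_index, descending, nulls_last) tuples
    for col, descending, nulls_last in order_by:
        a, b = row_a[col], row_b[col]
        if a == b:
            continue                        # tie on this clause, try the next one
        if a is None:
            return 1 if nulls_last else -1  # per-clause NULL handling
        if b is None:
            return -1 if nulls_last else 1
        result = -1 if a < b else 1
        return -result if descending else result
    return 0                                # rows are equal on all clauses
```
Branches, per-clause state, and (with columnar storage) scattered memory accesses all end up inside the innermost loop of the sort, which is exactly what the binary string comparison below avoids.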
#### Binary String Comparison
The binary string comparison technique improves sorting performance by simplifying the comparator. It encodes *all* columns in the `ORDER BY` clause into a single binary sequence that, when compared using `memcmp`, will yield the correct overall sorting order. Encoding the data is not free, but since we are using the comparator so much during sorting, it pays off.
Let us take another look at 3 rows of the example:
| c_birth_country | c_birth_year |
|-----------------|-------------:|
| NETHERLANDS | 1991 |
| NETHERLANDS | 1992 |
| GERMANY | 1924 |
On [little-endian](https://en.wikipedia.org/wiki/Endianness) hardware, the bytes that represent these values look like this in memory, assuming 32-bit integer representation for the year:
```sql
c_birth_country
-- NETHERLANDS
01001110 01000101 01010100 01001000 01000101 01010010 01001100 01000001 01001110 01000100 01010011 00000000
-- GERMANY
01000111 01000101 01010010 01001101 01000001 01001110 01011001 00000000
c_birth_year
-- 1991
11000111 00000111 00000000 00000000
-- 1992
11001000 00000111 00000000 00000000
-- 1924
10000100 00000111 00000000 00000000
```
The trick is to convert these to a binary string that encodes the sorting order:
```sql
-- NETHERLANDS | 1991
10110001 10111010 10101011 10110111 10111010 10101101 10110011 10111110 10110001 10111011 10101100 11111111
10000000 00000000 00000111 11000111
-- NETHERLANDS | 1992
10110001 10111010 10101011 10110111 10111010 10101101 10110011 10111110 10110001 10111011 10101100 11111111
10000000 00000000 00000111 11001000
-- GERMANY | 1924
10111000 10111010 10101101 10110010 10111110 10110001 10100110 11111111 11111111 11111111 11111111 11111111
10000000 00000000 00000111 10000100
```
The binary string is fixed-size because this makes it much easier to move it around during sorting.
The string "GERMANY" is shorter than "NETHERLANDS", therefore it is padded with `00000000`'s.
All bits in column `c_birth_country` are subsequently inverted because this column is sorted descendingly.
If a string is too long we encode its prefix and only look at the whole string if the prefixes are equal.
The bytes in `c_birth_year` are swapped because we need the big-endian representation to encode the sorting order.
The first bit is also flipped, to preserve order between positive and negative integers for [signed integers](https://en.wikipedia.org/wiki/Signed_number_representations).
If there are `NULL` values, these must be encoded using an additional byte (not shown in the example).
With this binary string, we can now compare both columns at the same time by comparing only the binary string representation.
This can be done with a single `memcmp` in __C++__! The compiler will emit efficient assembly for this single function call, even auto-generating [SIMD instructions](https://en.wikipedia.org/wiki/SIMD).
This technique solves one of the problems mentioned above, namely the function call overhead when using complex comparators.
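To make the encoding concrete, here is a small Python sketch (illustrative only, not DuckDB's code) of how a single signed 32-bit integer key could be encoded so that a plain bytewise comparison yields the right order:
```python
import struct

def encode_int_key(value, descending=False):
    # big-endian byte order plus a flipped sign bit makes signed integers
    # compare correctly under a bytewise (memcmp-style) comparison
    key = bytearray(struct.pack('>i', value))
    key[0] ^= 0x80
    if descending:
        key = bytearray(b ^ 0xFF for b in key)  # invert all bits for DESC
    return bytes(key)

assert encode_int_key(-5) < encode_int_key(3)
assert encode_int_key(1924) < encode_int_key(1991) < encode_int_key(1992)
```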
#### Radix Sort
Now that we have a cheap comparator, we have to choose our sorting algorithm.
Every computer science student learns about [comparison-based](https://en.wikipedia.org/wiki/Sorting_algorithm#Comparison_sorts) sorting algorithms like [Quicksort](https://en.wikipedia.org/wiki/Quicksort) and [Merge sort](https://en.wikipedia.org/wiki/Merge_sort), which have _O(n log n)_ time complexity, where _n_ is the number of records being sorted.
However, there are also [distribution-based](https://en.wikipedia.org/wiki/Sorting_algorithm#Non-comparison_sorts) sorting algorithms, which typically have a time complexity of _O(nk)_, where _k_ is the width of the sorting key.
This class of sorting algorithms scales much better with a larger _n_ because _k_ is constant, whereas _log n_ is not.
One such algorithm is [Radix sort](https://en.wikipedia.org/wiki/Radix_sort).
This algorithm sorts the data by applying [Counting sort](https://en.wikipedia.org/wiki/Counting_sort) once per digit, until all digits have been processed.
It may sound counter-intuitive to encode the sorting key columns such that we have a cheap comparator, and then choose a sorting algorithm that does not compare records.
However, the encoding is necessary for Radix sort: Binary strings that produce a correct order with `memcmp` will produce a correct order if we do a byte-by-byte Radix sort.
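As a sketch of the idea (illustrative Python, not DuckDB's implementation), a least-significant-digit radix sort on the fixed-size binary strings performs one stable counting-sort pass per byte:
```python
def radix_sort_keys(keys):
    # LSD radix sort over fixed-width byte strings: one stable
    # counting-sort pass per byte position, least significant first
    if not keys:
        return keys
    width = len(keys[0])
    for pos in reversed(range(width)):
        buckets = [[] for _ in range(256)]
        for key in keys:
            buckets[key[pos]].append(key)
        keys = [key for bucket in buckets for key in bucket]
    return keys

assert radix_sort_keys([b'\x02\x01', b'\x01\x02', b'\x01\x01']) == [b'\x01\x01', b'\x01\x02', b'\x02\x01']
```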
#### Two-Phase Parallel Sorting
DuckDB uses [Morsel-Driven Parallelism](https://15721.courses.cs.cmu.edu/spring2016/papers/p743-leis.pdf), a framework for parallel query execution.
For the sorting operator, this means that multiple threads collect roughly an equal amount of data, in parallel, from the table.
We use this parallelism for sorting by first having each thread sort the data it collects using our Radix sort.
After this first sorting phase, each thread has one or more sorted blocks of data, which must be combined into the final sorted result.
[Merge sort](https://en.wikipedia.org/wiki/Merge_sort) is the algorithm of choice for this task.
There are two main ways of implementing merge sort: [K-way merge](https://en.wikipedia.org/wiki/K-way_merge_algorithm) and [Cascade merge](https://en.wikipedia.org/wiki/Cascade_merge_sort).
K-way merge merges K lists into one sorted list in one pass, and is traditionally [used for external sorting (sorting more data than fits in memory)](https://en.wikipedia.org/wiki/External_sorting#External_merge_sort) because it minimizes I/O.
Cascade merge merges two lists of sorted data at a time until only one sorted list remains, and is used for in-memory sorting because it is more efficient than K-way merge.
We aim to have an implementation that has high in-memory performance, which gracefully degrades as we go over the limit of available memory.
Therefore, we choose cascade merge.
In a cascade merge sort, we merge two blocks of sorted data at a time until only one sorted block remains.
Naturally, we want to use all available threads to compute the merge.
If we have many more sorted blocks than threads, we can assign each thread to merge two blocks.
However, as the blocks get merged, we will not have enough blocks to keep all threads busy.
This is especially slow when the final two blocks are merged: One thread has to process all the data.
To fully parallelize this phase, we have implemented [Merge Path](https://arxiv.org/pdf/1406.2628.pdf) by Oded Green et al.
Merge Path pre-computes *where* the sorted lists will intersect while merging, shown in the image below (taken from the paper).
*(Figure: the merge path of two sorted arrays, taken from the Merge Path paper.)*
The intersections along the merge path can be efficiently computed using [Binary Search](https://en.wikipedia.org/wiki/Binary_search_algorithm).
If we know where the intersections are, we can merge partitions of the sorted data independently in parallel.
This allows us to use all available threads effectively for the entire merge phase.
For another trick to improve merge sort, see [the appendix](#::predication).
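The core of Merge Path can be sketched in a few lines (illustrative Python, not DuckDB's code): a binary search finds, for a given output position `k`, how many elements each sorted run contributes, so each thread can merge its own partition independently:
```python
def merge_path_split(a, b, k):
    # find (i, j) with i + j == k such that a[:i] and b[:j] together
    # hold the k smallest elements of merging sorted lists a and b
    lo, hi = max(0, k - len(b)), min(k, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        j = k - i
        if a[i] < b[j - 1]:
            lo = i + 1
        else:
            hi = i
    return lo, k - lo

a, b = [1, 3, 5, 7], [2, 4, 6, 8]
assert merge_path_split(a, b, 4) == (2, 2)  # {1, 3} and {2, 4} are the 4 smallest
```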
#### Columns or Rows?
Besides comparisons, the other big cost of sorting is moving data around.
DuckDB has a vectorized execution engine.
Data is stored in a columnar layout, which is processed in batches (called chunks) at a time.
This layout is great for analytical query processing because the chunks fit in the CPU cache, and it gives a lot of opportunities for the compiler to generate SIMD instructions.
However, when the table is sorted, entire rows are shuffled around, rather than columns.
We could stick to the columnar layout while sorting: Sort the key columns, then re-order the payload columns one by one.
However, re-ordering will cause a random access pattern in memory for each column.
If there are many payload columns, this will be slow.
Converting the columns to rows will make re-ordering rows much easier.
This conversion is of course not free: Columns need to be copied to rows, and back from rows to columns again after sorting.
Because we want to support external sorting, we have to store data in [buffer-managed](https://research.cs.wisc.edu/coral/minibase/bufMgr/bufMgr.html) blocks that can be offloaded to disk.
Because we have to copy the input data to these blocks anyway, converting the columns to rows is effectively free.
There are a few operators that are inherently row-based, such as joins and aggregations.
DuckDB has a unified internal row layout for these operators, and we decided to use it for the sorting operator as well.
This layout has only been used in memory so far.
In the next section, we will explain how we got it to work on disk as well. We should note that we will only write sorting data to disk if main memory is not able to hold it.
#### External Sorting
The buffer manager can unload blocks from memory to disk.
This is not something we actively do in our sorting implementation, but rather something that the buffer manager decides to do if memory would fill up otherwise.
It uses a least-recently-used queue to decide which blocks to write.
More on how to properly use this queue in [the appendix](#::zigzag).
When we need a block, we "pin" it, which reads it from disk if it is not loaded already.
Accessing disk is much slower than accessing memory, therefore it is crucial that we minimize the number of reads and writes.
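As a rough illustration of the pin/unpin idea (a hypothetical sketch with made-up names, not DuckDB's buffer manager API): pinning loads the block if needed and marks it as most recently used, and only blocks with a pin count of zero are candidates for eviction.

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: a buffer manager that loads blocks on demand and tracks
// recency so the least-recently-used, unpinned blocks can be written to disk
// when memory runs low.
struct Block {
    std::vector<uint8_t> data;
    bool resident = false;
    int pin_count = 0;
};

class BufferManagerSketch {
public:
    Block &Pin(uint64_t block_id) {
        auto &block = blocks_[block_id];
        if (!block.resident) {
            // slow path: read the block's bytes back from disk (omitted here)
            block.resident = true;
        }
        block.pin_count++;
        Touch(block_id); // most recently used: last in line for eviction
        return block;
    }

    void Unpin(uint64_t block_id) {
        blocks_[block_id].pin_count--; // a pin count of zero makes the block evictable
    }

private:
    void Touch(uint64_t block_id) {
        lru_.remove(block_id);    // O(n), fine for a sketch
        lru_.push_back(block_id); // least recently used blocks stay at the front
    }

    std::unordered_map<uint64_t, Block> blocks_;
    std::list<uint64_t> lru_;
};
```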
Unloading data to disk is easy for fixed-size columns like integers, but more difficult for variable-sized columns like strings.
Our row layout uses fixed-size rows, which cannot fit strings with arbitrary sizes.
Therefore, strings are represented by a pointer, which points into a separate block of memory where the actual string data lives, a so-called "string heap".
We have changed our heap to also store strings row-by-row in buffer-managed blocks:


Each row has an additional 8-byte field `pointer` which points to the start of this row in the heap.
This is useless in the in-memory representation, but we will see why it is useful for the on-disk representation in just a second.
If the data fits in memory, the heap blocks stay pinned, and only the fixed-size rows are re-ordered while sorting.
If the data does not fit in memory, the blocks need to be offloaded to disk, and the heap will also be re-ordered while sorting.
When a heap block is offloaded to disk, the pointers pointing into it are invalidated.
When we load the block back into memory, the pointers will have changed.
This is where our row-wise layout comes into play.
The 8-byte `pointer` field is overwritten with an 8-byte `offset` field, denoting where in the heap block strings of this row can be found.
This technique is called ["pointer swizzling"](https://en.wikipedia.org/wiki/Pointer_swizzling).
When we swizzle the pointers, the row layout and heap block look like this:


The pointers to the subsequent string values are also overwritten with an 8-byte relative offset, denoting how far this string is offset from the start of the row in the heap (hence every `stringA` has an offset of `0`: It is the first string in the row).
Using relative offsets within rows rather than absolute offsets is very useful during sorting, as these relative offsets stay constant, and do not need to be updated when a row is copied.
When the blocks need to be scanned to read the sorted result, we "unswizzle" the pointers, making them point to the string again.
With this dual-purpose row-wise representation, we can easily copy around both the fixed-size rows and the variable-sized rows in the heap.
Besides having the buffer manager load/unload blocks, the only difference between in-memory and external sorting is that we swizzle/unswizzle pointers to the heap blocks, and copy data from the heap blocks during merge sort.
All this reduces overhead when blocks need to be moved in and out of memory, which will lead to graceful performance degradation as we approach the limit of available memory.
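A minimal sketch of the swizzle/unswizzle step itself, assuming a hypothetical 8-byte field that holds either an absolute pointer or a block-relative offset (illustrative only, not DuckDB's actual code):

```cpp
#include <cstdint>

// Hypothetical sketch of pointer swizzling for the 8-byte heap field of a row:
// before the heap block is written to disk, the absolute pointer is replaced
// by an offset relative to the start of the block; after the block is loaded
// back (possibly at a different address), the offset becomes a pointer again.
struct RowHeapRef {
    uint64_t value; // either an absolute pointer or an offset into the heap block
};

static void Swizzle(RowHeapRef &ref, const uint8_t *heap_block_start) {
    // store the distance from the block start instead of the raw address
    ref.value = reinterpret_cast<const uint8_t *>(ref.value) - heap_block_start;
}

static void Unswizzle(RowHeapRef &ref, const uint8_t *heap_block_start) {
    // rebuild a valid pointer from the (possibly new) base address of the block
    ref.value = reinterpret_cast<uint64_t>(heap_block_start + ref.value);
}
```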
#### Comparison with Other Systems
Now that we have covered most of the techniques that are used in our sorting implementation, we want to know how we compare to other systems.
DuckDB is often used for interactive data analysis, and is therefore often compared to tools like [dplyr](https://dplyr.tidyverse.org).
In this setting, people are usually running on laptops or PCs, therefore we will run these experiments on a 2020 MacBook Pro.
This laptop has an [Apple M1 CPU](https://en.wikipedia.org/wiki/Apple_M1), which is [ARM](https://en.wikipedia.org/wiki/ARM_architecture)-based.
The M1 processor has 8 cores: 4 high-performance (Firestorm) cores, and 4 energy-efficient (Icestorm) cores.
The Firestorm cores have very, very fast single-thread performance, so this should level the playing field between single- and multi-threaded sorting implementations somewhat.
The MacBook has 16 GB of memory, and [one of the fastest SSDs found in a laptop](https://eclecticlight.co/2020/12/12/how-fast-is-the-ssd-inside-an-m1-mac/).
We will be comparing against the following systems:
1. [ClickHouse](https://clickhouse.tech), version 21.7.5
2. [HyPer](https://dbdb.io/db/hyper), version 2021.2.1.12564
3. [Pandas](https://pandas.pydata.org), version 1.3.2
4. [SQLite](https://www.sqlite.org/index.html), version 3.36.0
ClickHouse and HyPer are included in our comparison because they are analytical SQL engines with an emphasis on performance.
Pandas and SQLite are included in our comparison because they can be used to perform relational operations within Python, like DuckDB.
Pandas operates fully in memory, whereas SQLite is a more traditional disk-based system.
This list of systems should give us a good mix of single-/multi-threaded, and in-memory/external sorting.
ClickHouse was built for M1 using [this guide](https://clickhouse.tech/docs/en/development/build-osx/).
We have set the memory limit to 12 GB, and `max_bytes_before_external_sort` to 10 GB, following [this suggestion](https://clickhouse.tech/docs/en/sql-reference/statements/select/order-by/#implementation-details).
HyPer is [Tableau's data engine](https://www.tableau.com/products/new-features/hyper), created by the [database group at the University of Munich](http://db.in.tum.de).
It does not run natively (yet) on ARM-based processors like the M1.
We will use [Rosetta 2](https://en.wikipedia.org/wiki/Rosetta_(software)#Rosetta_2), macOS's x86 emulator to run it.
Emulation causes some overhead, so we have included an experiment on an x86 machine in [the appendix](#::x86).
Benchmarking sorting in database systems is not straightforward.
Ideally, we would like to measure only the time it takes to sort the data, not the time it takes to read the input data and show the output.
Not every system has a profiler to measure the time of the sorting operator exactly, so this is not an option.
To approach a fair comparison, we will measure the end-to-end time of queries that sort the data and write the result to a temporary table, i.e.:
```sql
CREATE TEMPORARY TABLE output AS
SELECT ...
FROM ...
ORDER BY ...;
```
There is no perfect solution to this problem, but this should give us a good comparison because the end-to-end time of this query should be dominated by sorting.
For Pandas we will use `sort_values` with `inplace=False` to mimic this query.
In ClickHouse, temporary tables can only exist in memory, which is problematic for our out-of-core experiments.
Therefore we will use a regular `TABLE`, but then we also need to choose a table engine.
Most of the table engines apply compression or create an index, which we do not want to measure.
Therefore we have chosen the simplest on-disk engine, which is [File](https://clickhouse.tech/docs/en/engines/table-engines/special/file/#file), with format [Native](https://clickhouse.tech/docs/en/interfaces/formats/#native).
The table engine we chose for the input tables for ClickHouse is [MergeTree](https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree) with `ORDER BY tuple()`.
We chose this because we encountered strange behavior with `File(Native)` input tables, where there was no difference in runtime between the queries `SELECT * FROM ... ORDER BY` and `SELECT col1 FROM ... ORDER BY`.
Presumably, all columns in the table were sorted regardless of how many were selected.
To measure stable end-to-end query time, we run each query 5 times and report the median run time.
There are some differences in reading/writing tables between the systems.
For instance, Pandas cannot spill to disk, so both the input and the output data frame will be in memory.
DuckDB will not write the output table to disk unless there is not enough room to keep it in memory, and may therefore also have an advantage.
However, sorting dominates the total runtime, so these differences are not that impactful.
#### Random Integers
We will start with a simple example.
We have generated the first 100 million integers and shuffled them, and we want to know how well the systems can sort them.
This experiment is more of a micro-benchmark than anything else and is of little real-world significance.
For our first experiment, we will look at how the systems scale with the number of rows.
From the initial table with integers, we have made 9 more tables, with 10M, 20M, ..., 90M integers each.

Being a traditional disk-based database system, SQLite always opts for an external sorting strategy.
It writes intermediate sorted blocks to disk even if they fit in main memory, which makes it much slower.
The performance of the other systems is in the same ballpark, with DuckDB and ClickHouse going toe-to-toe with \~3 and \~4 seconds for 100M integers.
Because SQLite is so much slower, we will not include it in our next set of experiments (TPC-DS).
DuckDB and ClickHouse both make very good use of all available threads: each thread first sorts its own blocks of data, after which the sorted blocks are merged in parallel.
We are not sure what strategy HyPer uses.
For our next experiment, we will zoom in on multi-threading, and see how well ClickHouse and DuckDB scale with the number of threads (we were not able to set the number of threads for HyPer).

This plot demonstrates that radix sort is very fast.
DuckDB sorts 100M integers in just under 5 seconds using a single thread, which is much faster than ClickHouse.
Adding threads does not improve performance as much for DuckDB, because radix sort is so much faster than merge sort.
Both systems end up at about the same performance at 4 threads.
Beyond 4 threads we do not see performance improve much more, due to the CPU architecture.
For all of the other experiments, we have set both DuckDB and ClickHouse to use 4 threads.
For our last experiment with random integers, we will see how the sortedness of the input may impact performance.
This is especially important to do in systems that use Quicksort because Quicksort performs much worse on inversely sorted data than on random data.

Not surprisingly, all systems perform better on sorted data, sometimes by a large margin.
ClickHouse, Pandas, and SQLite likely have some optimization here: e.g., keeping track of sortedness in the catalog, or checking sortedness while scanning the input.
DuckDB and HyPer have only a very small difference in performance when the input data is sorted, and do not have such an optimization.
For DuckDB, the slightly improved performance can be explained by a better memory access pattern during sorting: when the data is already sorted, the access pattern is mostly sequential.
Another interesting result is that DuckDB sorts data faster than some of the other systems can read already sorted data.
#### TPC-DS
For the next comparison, we have improvised a relational sorting benchmark on two tables from the standard [TPC Decision Support benchmark (TPC-DS)](http://www.tpc.org/tpcds/).
TPC-DS is challenging for sorting implementations because it has wide tables (with many columns, unlike the tables in [TPC-H](http://www.tpc.org/tpch/)), and a mix of fixed- and variable-sized types.
The number of rows increases with the scale factor.
The tables used here are `catalog_sales` and `customer`.
`catalog_sales` has 34 columns, all fixed-size types (integer and double), and grows to have many rows as the scale factor increases.
`customer` has 18 columns (10 integers and 8 strings), and a decent number of rows that also grows with the scale factor.
The row counts of both tables at each scale factor are shown in the table below.
| SF | customer | catalog_sales |
| ---: | --------: | ------------: |
| 1 | 100,000 | 1,441,548 |
| 10 | 500,000 | 14,401,261 |
| 100 | 2,000,000 | 143,997,065 |
| 300 | 5,000,000 | 260,014,080 |
We will use `customer` at SF100 and SF300, which fits in memory at both scale factors.
We will use the `catalog_sales` table at SF10 and SF100; at SF100 it no longer fits in memory.
The data was generated using DuckDB's TPC-DS extension, then exported to CSV in a random order to undo any ordering patterns that could have been in the generated data.
#### Catalog Sales (Numeric Types)
Our first experiment on the `catalog_sales` table is selecting 1 column, then 2 columns, ..., up to all 34, always ordering by `cs_quantity` and `cs_item_sk`.
This experiment will tell us how well the different systems can re-order payload columns.

We see similar trends at SF10 and SF100, but for SF100, at around 12 payload columns or so, the data does not fit in memory anymore, and ClickHouse and HyPer show a big drop in performance.
ClickHouse switches to an external sorting strategy, which is much slower than its in-memory strategy.
Therefore, adding a few payload columns results in a runtime that is orders of magnitude higher.
At 20 payload columns ClickHouse runs into the following error:
```console
DB::Exception: Memory limit (for query) exceeded: would use 11.18 GiB (attempt to allocate chunk of 4204712 bytes), maximum: 11.18 GiB: (while reading column cs_list_price): (while reading from part ./store/523/5230c288-7ed5-45fa-9230-c2887ed595fa/all_73_108_2/ from mark 4778 with max_rows_to_read = 8192): While executing MergeTreeThread.
```
HyPer also drops in performance before erroring out with the following message:
```console
ERROR: Cannot allocate 333982248 bytes of memory: The `global memory limit` limit of 12884901888 bytes was exceeded.
```
As far as we are aware, HyPer uses [`mmap`](https://man7.org/linux/man-pages/man2/mmap.2.html), which creates a mapping between memory and a file.
This allows the operating system to move data between memory and disk.
While useful, it is no substitute for a proper external sort, as it creates random access to disk, which is very slow.
Pandas performs surprisingly well on SF100, despite the data not fitting in memory.
Pandas can only do this because macOS dynamically increases swap size.
Most operating systems do not do this and would fail to load the data at all.
Using swap usually slows down processing significantly, but the SSD is so fast that there is no visible performance drop!
While Pandas loads the data, swap size grows to an impressive \~40 GB: Both the file and the data frame are fully in memory/swap at the same time, rather than streamed into memory.
This goes down to \~20 GB of memory/swap when the file is done being read.
Pandas is able to get quite far into the experiment until it crashes with the following error:
```console
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
```
DuckDB performs well both in-memory and external, and there is no clear visible point at which data no longer fits in memory: Runtime is fast and reliable.
#### Customer (Strings & Integers)
Now that we have seen how the systems handle large amounts of fixed-size types, it is time to see some variable-size types!
For our first experiment on the `customer` table, we will select all columns, and order them by either 3 integer columns (`c_birth_year`, `c_birth_month`, `c_birth_day`), or by 2 string columns (`c_first_name`, `c_last_name`).
Comparing strings is much, much more difficult than comparing integers, because strings have variable sizes and need to be compared byte by byte, whereas comparing two integers is always a single, fixed-size operation.

As expected, ordering by strings is more expensive than ordering by integers, except for HyPer, which is impressive.
Pandas has only a slightly bigger difference between ordering by integers and ordering by strings than ClickHouse and DuckDB do.
This difference is explained by Pandas' more expensive string comparator.
Pandas uses [NumPy](https://numpy.org)'s sort, which is efficiently implemented in __C__.
However, when it sorts strings, it has to use virtual function calls to compare Python string objects, which is slower than a simple `<` between integers in __C__.
Nevertheless, Pandas performs well on the `customer` table.
In our next experiment, we will see how the payload type affects performance.
`customer` has 10 integer columns and 8 string columns.
We will either select all integer columns or all string columns and order by (`c_birth_year`, `c_birth_month`, `c_birth_day`) every time.

As expected, re-ordering strings takes much more time than re-ordering integers.
Pandas has an advantage here because it already has the strings in memory, and most likely only needs to re-order pointers to these strings.
The database systems need to copy strings twice: Once when reading the input table, and again when creating the output table.
Profiling in DuckDB reveals that the actual sorting takes less than a second at SF300, and most time is spent on (de)serializing strings.
#### Conclusion
DuckDB's new parallel sorting implementation can efficiently sort more data than fits in memory, making use of the speed of modern SSDs.
Where other systems crash because they run out of memory, or switch to an external sorting strategy that is much slower, DuckDB's performance gracefully degrades as it goes over the memory limit.
The code that was used to run the experiments can be found on [GitHub](https://github.com/lnkuiper/experiments/tree/master/sorting).
If we made any mistakes, please let us know!
DuckDB is a free and open-source database management system (MIT licensed). It aims to be the SQLite for Analytics, and provides a fast and efficient database system with zero external dependencies. It is available not just for Python, but also for C/C++, R, Java, and more.
[Discuss this post on Hacker News](https://news.ycombinator.com/item?id=28328657)
[Read our paper on sorting at ICDE '23](https://hannes.muehleisen.org/publications/ICDE2023-sorting.pdf)
Listen to Laurens' appearance on the Disseminate podcast:
* [Spotify](https://open.spotify.com/show/6IQIF9oRSf0FPjBUj0AkYA)
* [Google](https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5hY2FzdC5jb20vcHVibGljL3Nob3dzL2Rpc3NlbWluYXRl)
* [Apple](https://podcasts.apple.com/us/podcast/disseminate-the-computer-science-research-podcast/id1631350873)
#### Appendix A: Predication
Another technique we have used to speed up merge sort is _predication_.
With this technique, we turn code with _if/else_ branches into code without branches.
Modern CPUs try to predict whether the _if_ or the _else_ branch will be taken.
If this is hard to predict, the resulting branch mispredictions slow down the code.
Take a look at the example of pseudo-code with branches below.
```cpp
// continue until merged
while (l_ptr && r_ptr) {
    // check which side is smaller
    if (memcmp(l_ptr, r_ptr, entry) < 0) {
        // copy from left side and advance
        memcpy(result_ptr, l_ptr, entry);
        l_ptr += entry;
    } else {
        // copy from right side and advance
        memcpy(result_ptr, r_ptr, entry);
        r_ptr += entry;
    }
    // advance result
    result_ptr += entry;
}
```
We are merging the data from the left and right blocks into a result block, one entry at a time, by advancing pointers.
This code can be made _branchless_ by using the comparison boolean as a 0 or 1, shown in the pseudo-code below.
```cpp
// continue until merged
while (l_ptr && r_ptr) {
// store comparison result in a bool
bool left_less = memcmp(l_ptr, r_ptr, entry) < 0;
bool right_less = 1 - left_less;
// copy from either side
memcpy(result_ptr, l_ptr, left_less * entry);
memcpy(result_ptr, r_ptr, right_less * entry);
// advance either one
l_ptr += left_less * entry;
l_ptr += right_less * entry;
// advance result
result_ptr += entry;
}
```
When `left_less` is true, it is equal to 1.
This means `right_less` is false, and therefore equal to 0.
We use this to copy `entry` bytes from the left side and 0 bytes from the right side, and to increment the left and right pointers accordingly.
With predicated code, the CPU does not have to predict which instructions to execute, which means there will be fewer instruction cache misses!
#### Appendix B: Zig-Zagging
A simple trick to reduce I/O is zig-zagging through the pairs of blocks to merge in the cascaded merge sort.
This is illustrated in the image below (dashed arrows indicate the order in which the blocks are merged).


By zig-zagging through the blocks, we start an iteration by merging the last blocks that were merged in the previous iteration.
Those blocks are likely still in memory, saving us some precious read/write operations.
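The iteration order itself is simple to express; the following hypothetical sketch enumerates which pairs of blocks each pass would merge, reversing direction after every pass so that each pass starts where the previous one ended:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of the zig-zag merge order: the pairs merged first in
// one pass are the pairs merged last in the previous pass, so their blocks
// are likely still resident in memory.
static std::vector<std::pair<size_t, size_t>> ZigZagMergeOrder(size_t block_count) {
    std::vector<std::pair<size_t, size_t>> order;
    bool left_to_right = true;
    while (block_count > 1) {
        if (left_to_right) {
            for (size_t i = 0; i + 1 < block_count; i += 2) {
                order.emplace_back(i, i + 1);
            }
        } else {
            for (size_t i = block_count; i >= 2; i -= 2) {
                order.emplace_back(i - 2, i - 1);
            }
        }
        block_count = (block_count + 1) / 2; // every pair becomes one merged block
        left_to_right = !left_to_right;      // reverse direction for the next pass
    }
    return order;
}
```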
#### Appendix C: x86 Experiment
We also ran the `catalog_sales` SF100 experiment on a machine with an x86 CPU, to get a fairer comparison with HyPer (without Rosetta 2 emulation).
The machine has an Intel(R) Xeon(R) W-2145 CPU @ 3.70 GHz, which has 8 cores (up to 16 virtual threads), and 128 GB of RAM, so this time the data fits fully in memory.
We have set the number of threads that DuckDB and ClickHouse use to 8 because we saw no visible performance improvement beyond 8 threads.

Pandas performs comparatively worse than on the MacBook, because it has a single-threaded implementation, and this CPU has a lower single-thread performance.
Again, Pandas crashes with an error (this machine does not dynamically increase swap):
```console
numpy.core._exceptions.MemoryError: Unable to allocate 6.32 GiB for an array with shape (6, 141430723) and data type float64
```
DuckDB, HyPer, and ClickHouse all make good use of the additional threads, and are significantly faster than on the MacBook.
An interesting pattern in this plot is that DuckDB and HyPer scale very similarly with additional payload columns.
Although DuckDB is faster at sorting, re-ordering the payload seems to cost about the same for both systems.
Therefore it is likely that HyPer also uses a row layout.
ClickHouse scales worse with additional payload columns.
ClickHouse does not use a row layout, and therefore has to pay the cost of random access as each column is re-ordered after sorting.
## Windowing in DuckDB
**Publication date:** 2021-10-13
**Author:** Richard Wesley
**TL;DR:** DuckDB, a free and open-source analytical data management system, has a state-of-the-art windowing engine that can compute complex moving aggregates like inter-quartile ranges as well as simpler moving averages.
Window functions (those using the `OVER` clause) are important tools for analyzing data series,
but they can be slow if not implemented carefully.
In this post, we will take a look at how DuckDB implements windowing.
We will also see how DuckDB can leverage its aggregate function architecture
to compute useful moving aggregates such as moving inter-quartile ranges (IQRs).
#### Beyond Sets
The original relational model as developed by Codd in the 1970s treated relations as *unordered sets* of tuples.
While this was nice for theoretical computer science work,
it ignored the way humans think using physical analogies (the "embodied brain" model from neuroscience).
In particular, humans naturally order data to help them understand it and engage with it.
To help with this, SQL uses the `SELECT` clause for horizontal layout and the `ORDER BY` clause for vertical layout.
Still, the orderings that humans put on data are often more than neurological crutches.
For example, time places a natural ordering on measurements,
and wide swings in those measurements can themselves be important data,
or they may indicate that the data needs to be cleaned by smoothing.
Trends may be present or relative changes may be more important for analysis than raw values.
To help answer such questions, SQL introduced *analytic* (or *window*) functions in 2003.
##### Window Functions
Windowing works by breaking a relation up into independent *partitions*, *ordering* those partitions,
and then defining [various functions](#docs:stable:sql:functions:window_functions) that can be computed for each row
using the nearby values.
These functions include all the aggregate functions (such as `sum` and `avg`)
as well as some window-specific functions (such as `rank()` and `nth_value(expr, n)`).
Some window functions depend only on the partition boundary and the ordering,
but a few (including all the aggregates) also use a *frame*.
Frames are specified as a number of rows on either side (*preceding* or *following*) of the *current row*.
The distance can either be specified as a number of *rows* or a *range* of values
using the partition's ordering value and a distance.

Framing is the most confusing part of the windowing environment,
so let's look at a very simple example and ignore the partitioning and ordering for a moment.
```sql
SELECT points,
    sum(points) OVER (
        ROWS BETWEEN 1 PRECEDING
        AND 1 FOLLOWING) AS we
FROM results;
```
This query computes the `sum` of each point and the points on either side of it:

Notice that at the edge of the partition, there are only two values added together.
##### Power Generation Example
Now let's look at a concrete example of a window function query.
Suppose we have some power plant generation data:
| Plant | Date | MWh |
|:---|:---|---:|
| Boston | 2019-01-02 | 564337 |
| Boston | 2019-01-03 | 507405 |
| Boston | 2019-01-04 | 528523 |
| Boston | 2019-01-05 | 469538 |
| Boston | 2019-01-06 | 474163 |
| Boston | 2019-01-07 | 507213 |
| Boston | 2019-01-08 | 613040 |
| Boston | 2019-01-09 | 582588 |
| Boston | 2019-01-10 | 499506 |
| Boston | 2019-01-11 | 482014 |
| Boston | 2019-01-12 | 486134 |
| Boston | 2019-01-13 | 531518 |
| Worcester | 2019-01-02 | 118860 |
| Worcester | 2019-01-03 | 101977 |
| Worcester | 2019-01-04 | 106054 |
| Worcester | 2019-01-05 | 92182 |
| Worcester | 2019-01-06 | 94492 |
| Worcester | 2019-01-07 | 99932 |
| Worcester | 2019-01-08 | 118854 |
| Worcester | 2019-01-09 | 113506 |
| Worcester | 2019-01-10 | 96644 |
| Worcester | 2019-01-11 | 93806 |
| Worcester | 2019-01-12 | 98963 |
| Worcester | 2019-01-13 | 107170 |
The data is noisy, so we want to compute a 7-day moving average for each plant.
To do this, we can use this window query:
```sql
SELECT "Plant", "Date",
avg("MWh") OVER (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING)
AS "MWh 7-day Moving Average"
FROM "Generation History"
ORDER BY 1, 2;
```
This query computes the seven day moving average of the power generated by each power plant on each day.
The `OVER` clause is the way that SQL specifies that a function is to be computed in a window.
It partitions the data by `Plant` (to keep the different power plants' data separate),
orders each plant's partition by `Date` (to put the energy measurements next to each other),
and uses a `RANGE` frame of three days on either side of each day for the `avg`
(to handle any missing days).
Here is the result:
| Plant | Date | MWh 7-day Moving Average |
|:---|:---|---:|
| Boston | 2019-01-02 | 517450.75 |
| Boston | 2019-01-03 | 508793.20 |
| Boston | 2019-01-04 | 508529.83 |
| Boston | 2019-01-05 | 523459.85 |
| Boston | 2019-01-06 | 526067.14 |
| Boston | 2019-01-07 | 524938.71 |
| Boston | 2019-01-08 | 518294.57 |
| Boston | 2019-01-09 | 520665.42 |
| Boston | 2019-01-10 | 528859.00 |
| Boston | 2019-01-11 | 532466.66 |
| Boston | 2019-01-12 | 516352.00 |
| Boston | 2019-01-13 | 499793.00 |
| Worcester | 2019-01-02 | 104768.25 |
| Worcester | 2019-01-03 | 102713.00 |
| Worcester | 2019-01-04 | 102249.50 |
| Worcester | 2019-01-05 | 104621.57 |
| Worcester | 2019-01-06 | 103856.71 |
| Worcester | 2019-01-07 | 103094.85 |
| Worcester | 2019-01-08 | 101345.14 |
| Worcester | 2019-01-09 | 102313.85 |
| Worcester | 2019-01-10 | 104125.00 |
| Worcester | 2019-01-11 | 104823.83 |
| Worcester | 2019-01-12 | 102017.80 |
| Worcester | 2019-01-13 | 99145.75 |
You can request multiple different `OVER` clauses in the same `SELECT`, and each will be computed separately.
Often, however, you want to use the same window for multiple functions,
and you can do this by using a `WINDOW` clause to define a *named* window:
```sql
SELECT "Plant", "Date",
avg("MWh") OVER seven AS "MWh 7-day Moving Average"
FROM "Generation History"
WINDOW seven AS (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING)
ORDER BY 1, 2;
```
This would be useful, for example,
if one also wanted the 7-day moving `min` and `max` to show the bounds of the data.
#### Under the Feathers
That is a long list of complicated functionality!
Making it all work relatively quickly has many pieces,
so let's have a look at how they all get implemented in DuckDB.
##### Pipeline Breaking
The first thing to notice is that windowing is a "pipeline breaker".
That is, the `Window` operator has to read all of its inputs before it can start computing a function.
This means that if there is some other way to compute something,
it may well be faster to use a different technique.
One common analytic task is to find the last value in some group.
For example, suppose we want the last recorded power output for each plant.
It is tempting to use the `rank()` window function with a reverse sort for this task:
```sql
SELECT "Plant", "MWh"
FROM (
SELECT "Plant", "MWh",
rank() OVER (
PARTITION BY "Plant"
ORDER BY "Date" DESC) AS r
FROM table) t
WHERE r = 1;
```
but this requires materialising the entire table, partitioning it, sorting the partitions,
and then pulling out a single row from those partitions.
A much faster way to do this is to use a self join to filter the table
to contain only the last (`max`) value of the `DATE` field:
```sql
SELECT table."Plant", "MWh"
FROM table,
(SELECT "Plant", max("Date") AS "Date"
FROM table GROUP BY 1) lasts
WHERE table."Plant" = lasts."Plant"
AND table."Date" = lasts."Date";
```
This join query requires two scans of the table, but the only materialised data is the filtering table
(which is probably much smaller than the original table), and there is no sorting at all.
This type of query showed up [in a user's blog](https://bwlewis.github.io/duckdb_and_r/last/last.html)
and we found that the join query was over 20 times faster on their data set:

Of course most analytic tasks that use windowing *do* require using the `Window` operator,
and DuckDB uses a collection of techniques to make the performance as fast as possible.
##### Partitioning and Sorting
At one time, windowing was implemented by sorting on both the partition and the ordering fields
and then finding the partition boundaries.
This is resource intensive, both because the entire relation must be sorted,
and because sorting is `O(N log N)` in the size of the relation.
Fortunately, there are faster ways to implement this step.
To reduce resource consumption, DuckDB uses the partitioning scheme from Leis et al.'s
[*Efficient Processing of Window Functions in Analytical SQL Queries*](http://www.vldb.org/pvldb/vol8/p1058-leis.pdf)
and breaks the partitions up into 1024 chunks using `O(N)` hashing.
The chunks still need to be sorted on all the fields because there may be hash collisions,
but each partition can now be 1024 times smaller, which reduces the runtime significantly.
Moreover, the partitions can easily be extracted and processed in parallel.
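A hypothetical sketch of that O(N) scatter step (the single string key and the function name are assumptions for illustration; the real operator hashes the actual partition columns):

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch of the O(N) scatter step: every row is assigned to one
// of 1024 partitions based on a hash of its partition key. Each partition can
// then be sorted independently (and in parallel) on the partition and order columns.
static std::vector<std::vector<size_t>> HashPartition(const std::vector<std::string> &partition_keys) {
    constexpr size_t PARTITION_COUNT = 1024;
    std::vector<std::vector<size_t>> partitions(PARTITION_COUNT);
    std::hash<std::string> hasher;
    for (size_t row = 0; row < partition_keys.size(); row++) {
        partitions[hasher(partition_keys[row]) % PARTITION_COUNT].push_back(row); // O(1) per row
    }
    return partitions;
}
```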
Sorting in DuckDB recently got a [big performance boost](https://duckdb.org/2021/08/27/external-sorting),
along with the ability to work on partitions that were larger than memory.
This functionality has also been added to the `Window` operator,
resulting in a 33% improvement in the last-in-group example:

As a final optimization, even though you can request multiple window functions,
DuckDB will collect functions that use the same partitioning and ordering,
and share the data layout between those functions.
##### Aggregation
Most of the [general-purpose window functions](#docs:stable:sql:functions:window_functions) are straightforward to compute,
but windowed aggregate functions can be expensive because they need to look at multiple values for each row.
They often need to look at the same value multiple times, or repeatedly look at a large number of values,
so over the years several approaches have been taken to improve performance.
###### Naïve Windowed Aggregation
Before explaining how DuckDB implements windowed aggregation,
we need to take a short detour through how ordinary aggregates are implemented.
Aggregate "functions" are implemented using three required operations and one optional operation:
* *Initialize*: Creates a state that will be updated.
For `sum`, this is the running total, starting at `NULL` (because a `sum` of zero items is `NULL`, not zero).
* *Update*: Updates the state with a new value. For `sum`, this adds the value to the state.
* *Finalize*: Produces the final aggregate value from the state.
For `sum`, this just copies the running total.
* *Combine*: Combines two states into a single state.
Combine is optional, but when present it allows the aggregate to be computed in parallel.
For `sum`, this produces a new state with the sum of the two input values.
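To make the four operations concrete, here is a minimal sketch of a combinable integer `sum` (the names and the `std::optional` state are illustrative assumptions, not DuckDB's internal aggregate API):

```cpp
#include <cstdint>
#include <optional>

// Hypothetical sketch: the four operations of a combinable `sum` aggregate.
struct SumState {
    std::optional<int64_t> total; // empty means "no values seen yet", i.e., SQL NULL
};

static void Initialize(SumState &state) {
    state.total.reset(); // the sum of zero rows is NULL
}

static void Update(SumState &state, int64_t value) {
    state.total = state.total.value_or(0) + value; // add the new value to the running total
}

static void Combine(const SumState &source, SumState &target) {
    if (source.total) {
        target.total = target.total.value_or(0) + *source.total; // merge two partial sums
    }
}

static std::optional<int64_t> Finalize(const SumState &state) {
    return state.total; // NULL stays NULL, otherwise the running total
}
```

A parallel aggregation can then give every thread its own state, *update* them independently, and *combine* the partial states at the end.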
The simplest way to compute an individual windowed aggregate value is to *initialize* a state,
*update* the state with all the values in the window frame,
and then use *finalize* to produce the value of the windowed aggregate.
This naïve algorithm will always work, but it is quite inefficient.
For example, a running total will re-add all the values from the start of the partition
for each running total, and this has a run time of `O(N^2)`.
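A hypothetical sketch of this naïve approach for a windowed `sum`, with each row's frame given as a `[begin, end)` pair; rebuilding the state from scratch for every row is what makes a running total `O(N^2)`:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch of the naive algorithm: build a fresh state per row and
// update it with every value in that row's frame (NULL handling omitted for brevity).
static std::vector<int64_t> NaiveWindowedSum(const std::vector<int64_t> &values,
                                             const std::vector<std::pair<size_t, size_t>> &frames) {
    std::vector<int64_t> result(values.size());
    for (size_t row = 0; row < values.size(); row++) {
        int64_t total = 0; // "initialize"
        for (size_t i = frames[row].first; i < frames[row].second; i++) {
            total += values[i]; // "update" with every value in the frame, over and over again
        }
        result[row] = total; // "finalize"
    }
    return result;
}
```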
To improve on this, some databases add additional
["moving state" operations](https://www.postgresql.org/docs/14/sql-createaggregate.html)
that can add or remove individual values incrementally.
This reduces computation in some common cases,
but it can only be used for certain aggregates.
For example, it doesn't work for `min` because you don't know whether there are multiple duplicate minima.
Moreover, if the frame boundaries move around a lot, it can still degenerate to an `O(N^2)` run time.
###### Segment Tree Aggregation
Instead of adding more functions, DuckDB uses the *segment tree* approach from Leis et al. above.
This works by building a tree on top of the entire partition with the aggregated values at the bottom.
Values are combined into states at nodes above them in the tree until there is a single root:

To compute a value, the algorithm generates states for the ragged ends of the frame,
*combines* states in the tree above the values in the frame,
and *finalizes* the result from the last remaining state.
So in the example above (Figure 5 from Leis et al.), only three values need to be added instead of seven.
This technique can be used for all *combinable* aggregates.
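Here is a minimal, illustrative segment tree for a `sum` aggregate (a sketch of the idea, not DuckDB's implementation): internal nodes store the combined state of their children, so a frame is answered from its two ragged ends and `O(log N)` internal nodes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of segment tree aggregation for a combinable aggregate
// (here: sum). A frame [begin, end) is aggregated from O(log N) nodes instead
// of end - begin individual values.
struct SegmentTreeSum {
    size_t n;
    std::vector<int64_t> tree; // leaves live at indices [n, 2n), internal nodes below

    explicit SegmentTreeSum(const std::vector<int64_t> &values) : n(values.size()), tree(2 * values.size()) {
        if (n == 0) {
            return;
        }
        for (size_t i = 0; i < n; i++) {
            tree[n + i] = values[i]; // leaves: the aggregated values themselves
        }
        for (size_t i = n - 1; i > 0; i--) {
            tree[i] = tree[2 * i] + tree[2 * i + 1]; // internal nodes: "combine" the children
        }
    }

    // aggregate the frame [begin, end)
    int64_t Query(size_t begin, size_t end) const {
        int64_t result = 0;
        for (size_t l = begin + n, r = end + n; l < r; l /= 2, r /= 2) {
            if (l & 1) {
                result += tree[l++]; // ragged left end: take this node and move right
            }
            if (r & 1) {
                result += tree[--r]; // ragged right end: take the node just before r
            }
        }
        return result;
    }
};
```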
###### General Windowed Aggregation
The biggest drawback of segment trees is the need to manage a potentially large number of intermediate states.
For the simple states used for standard distributive aggregates like `sum`,
this is not a problem because the states are small,
the tree keeps the number of states logarithmically low,
and the state used to compute each value is also cheap.
For some aggregates, however, the state is not small.
Typically these are so-called *holistic* aggregates,
where the value depends on all the values of the frame.
Examples of such aggregates are `mode` and `quantile`,
where each state may have to contain a copy of *all* the values seen so far.
While segment trees *can* be used to implement moving versions of any combinable aggregate,
this can be quite expensive for large, complex states,
and this was not the original goal of the algorithm.
To solve this problem, we use the approach from Wesley and Xu's
[*Incremental Computation of Common Windowed Holistic Aggregates*](http://www.vldb.org/pvldb/vol9/p1221-wesley.pdf),
which generalises segment trees to aggregate-specific data structures.
The aggregate can define a fifth optional *window* operation,
which will be passed the bottom of the tree and the bounds of the current and previous frame.
The aggregate can then create an appropriate data structure for its implementation.
For example, the `mode` function maintains a hash table of counts that it can update efficiently,
and the `quantile` function maintains a partially sorted list of frame indexes.
Moreover, the `quantile` functions can take an array of quantile values,
which further increases performance by sharing the partially ordered results
among the different quantile values.
Because these aggregates can be used in a windowing context,
the moving average example above can be easily modified to produce a moving inter-quartile range:
```sql
SELECT "Plant", "Date",
quantile_cont("MWh", [0.25, 0.5, 0.75]) OVER seven
AS "MWh 7-day Moving IQR"
FROM "Generation History"
WINDOW seven AS (
PARTITION BY "Plant"
ORDER BY "Date" ASC
RANGE BETWEEN INTERVAL 3 DAYS PRECEDING
AND INTERVAL 3 DAYS FOLLOWING)
ORDER BY 1, 2;
```
Moving quantiles like this are
[more robust to anomalies](https://blogs.sas.com/content/iml/2021/05/26/running-median-smoother.html),
which makes them a valuable tool for data series analysis,
but they are not generally implemented in most database systems.
There are some approaches that can be used in some query engines,
but the lack of a general moving aggregation architecture means that these solutions can be
[unnatural](https://docs.oracle.com/cd/E57185_01/HIRUG/ch12s07s08.html)
or [complex](https://ndesmo.github.io/blog/oracle-moving-metrics/).
DuckDB's implementation uses the standard window notation,
which means you don't have to learn new syntax or pull the data out into another tool.
###### Ordered Set Aggregates
Window functions are often closely associated with some special
"[ordered set aggregates](https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE)"
defined by the SQL standard.
Some databases implement these functions using the `Window` operator,
but this is rather inefficient because sorting the data (an `O(N log N)` operation) is not required;
it suffices to use
Hoare's `O(N)`
[`FIND`](https://courses.cs.vt.edu/~cs3114/Summer15/Notes/Supplemental/p321-hoare.pdf)
algorithm as used in the STL's
[`std::nth_element`](https://en.cppreference.com/w/cpp/algorithm/nth_element).
DuckDB translates these ordered set aggregates to use the faster `quantile_cont`, `quantile_disc`,
and `mode` regular aggregate functions, thereby avoiding using windowing entirely.
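As an illustration of the selection-based approach (a sketch, not DuckDB's code), a discrete median can be computed with `std::nth_element` in `O(N)` expected time instead of sorting:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch: select the median without fully sorting the input.
// Assumes a non-empty input; for an even count this returns the upper middle element.
static double MedianDisc(std::vector<double> values) { // take a copy so the caller's data stays untouched
    auto mid = values.begin() + values.size() / 2;
    std::nth_element(values.begin(), mid, values.end()); // partial selection, not a full sort
    return *mid;
}
```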
###### Extensions
This architecture also means that any new aggregates we add
can benefit from the existing windowing infrastructure.
DuckDB is an open source project, and we welcome submissions of useful aggregate functions,
or you can create your own domain-specific ones in your own fork.
At some point we hope to have a UDF architecture that will allow plug-in aggregates,
and the simplicity and power of the interface will let these plugins leverage the notational
simplicity and run time performance that the internal functions enjoy.
#### Conclusion
DuckDB's windowing implementation uses a variety of techniques
to speed up what can be the slowest part of an analytic query.
It is well integrated with the sorting subsystem and the aggregate function architecture,
which makes expressing advanced moving aggregates both natural and efficient.
DuckDB is a free and open-source database management system (MIT licensed).
It aims to be the SQLite for Analytics,
and provides a fast and efficient database system with zero external dependencies.
It is available not just for Python, but also for C/C++, R, Java, and more.
## DuckDB-Wasm: Efficient Analytical SQL in the Browser
**Publication date:** 2021-10-29
**Authors:** André Kohn and Dominik Moritz
**TL;DR:** [DuckDB-Wasm](https://github.com/duckdb/duckdb-wasm) is an in-process analytical SQL database for the browser. It is powered by WebAssembly, speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests and has been tested with Chrome, Firefox, Safari and Node.js. You can try it at [shell.duckdb.org](https://shell.duckdb.org) or on [Observable](https://observablehq.com/@cmudig/duckdb).


*DuckDB-Wasm is fast! If you're here for performance numbers, head over to our benchmarks at [shell.duckdb.org/versus](https://shell.duckdb.org/versus).*
#### Efficient Analytics in the Browser
The web browser has evolved into a universal computation platform that even runs in your car. Its rise has been accompanied by increasing requirements for the browser programming language JavaScript.
JavaScript was, first and foremost, designed to be very flexible, which comes at the cost of reduced processing efficiency compared to native languages like C++.
This becomes particularly apparent when considering the execution times of more complex data analysis tasks that often fall behind the native execution by orders of magnitude.
In the past, such analysis tasks have therefore been pushed to servers that tie any client-side processing to additional round-trips over the internet and introduce their own set of scalability problems.
The processing capabilities of browsers were boosted tremendously 4 years ago with the introduction of WebAssembly:
> WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.
>
> The Wasm stack machine is designed to be encoded in a size- and load-time efficient binary format. WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms.
>
> (ref: [https://webassembly.org/](https://webassembly.org/))
Four years later, the WebAssembly revolution is in full progress with first implementations being shipped in four major browsers. It has already brought us game engines, [entire IDEs](https://blog.stackblitz.com/posts/introducing-webcontainers/) and even a browser version of [Photoshop](https://web.dev/ps-on-the-web/). Today, we join the ranks with a first release of the npm library [@duckdb/duckdb-wasm](https://www.npmjs.com/package/@duckdb/duckdb-wasm).
As an in-process analytical database, DuckDB has the rare opportunity to significantly speed up OLAP workloads in the browser. We believe that there is a need for a comprehensive and self-contained data analysis library. DuckDB-Wasm automatically offloads your queries to dedicated worker threads and reads Parquet, CSV and JSON files from either your local filesystem or HTTP servers, driven by plain SQL input.
In this blog post, we want to introduce the library and present challenges on our journey towards a browser-native OLAP database.
*DuckDB-Wasm is not yet stable. You will find rough edges and bugs in this release. Please share your thoughts with us [on GitHub](https://github.com/duckdb/duckdb-wasm/discussions).*
#### How to Get Data In?
Let's dive into examples.
DuckDB-Wasm provides a variety of ways to load your data. First, raw SQL value clauses like `INSERT INTO sometable VALUES (1, 'foo'), (2, 'bar')` are easy to formulate and only depend on plain SQL text. Alternatively, SQL statements like `CREATE TABLE foo AS SELECT * FROM 'somefile.parquet'` consult our integrated web filesystem to resolve `somefile.parquet` locally, remotely or from a buffer. The methods `insertCSVFromPath` and `insertJSONFromPath` further provide convenient ways to import CSV and JSON files using additional typed settings like column types. And finally, the method `insertArrowFromIPCStream` (optionally through `insertArrowTable`, `insertArrowBatches` or `insertArrowVectors`) copies raw IPC stream bytes directly into a WebAssembly stream decoder.
The following example presents different options how data can be imported into DuckDB-Wasm:
```ts
// Create a database connection
const c = await db.connect();
// Data can be inserted from an existing arrow.Table
await c.insertArrowTable(existingTable, { name: "arrow_table" });
// ..., from Arrow vectors
await c.insertArrowVectors({
    col1: arrow.Int32Vector.from([1, 2]),
    col2: arrow.Utf8Vector.from(["foo", "bar"]),
}, {
    name: "arrow_vectors"
});
// ..., from a raw Arrow IPC stream
const streamResponse = await fetch(`someapi`);
const streamReader = streamResponse.body.getReader();
const streamInserts = [];
while (true) {
    const { value, done } = await streamReader.read();
    if (done) break;
    streamInserts.push(c.insertArrowFromIPCStream(value, { name: "streamed" }));
}
await Promise.all(streamInserts);
// ..., from CSV files
// (interchangeable: registerFile{Text,Buffer,URL,Handle})
await db.registerFileText(`data.csv`, "1|foo\n2|bar\n");
// ... with typed insert options
await db.importCSVFromPath('data.csv', {
    schema: 'main',
    name: 'foo',
    detect: false,
    header: false,
    delimiter: '|',
    columns: {
        col1: new arrow.Int32(),
        col2: new arrow.Utf8(),
    }
});
// ..., from JSON documents in row-major format
await db.registerFileText("rows.json", `[
    { "col1": 1, "col2": "foo" },
    { "col1": 2, "col2": "bar" },
]`);
// ... or column-major format
await db.registerFileText("columns.json", `{
    "col1": [1, 2],
    "col2": ["foo", "bar"]
}`);
// ... with typed insert options
await db.importJSONFromPath('rows.json', { name: 'rows' });
await db.importJSONFromPath('columns.json', { name: 'columns' });
// ..., from Parquet files
const pickedFile: File = letUserPickFile();
await db.registerFileHandle("local.parquet", pickedFile);
await db.registerFileURL("remote.parquet", "https://origin/remote.parquet");
// ..., by specifying URLs in the SQL text
await c.query(`
    CREATE TABLE direct AS
        SELECT * FROM 'https://origin/remote.parquet'
`);
// ..., or by executing raw insert statements
await c.query(`INSERT INTO existing_table
    VALUES (1, "foo"), (2, "bar")`);
```
#### How to Get Data Out?
Now that we have the data loaded, DuckDB-Wasm can run queries in two different ways that differ in the result materialization. First, the method `query` runs a query to completion and returns the result as a single `arrow.Table`. Second, the method `send` fetches query results lazily through an `arrow.RecordBatchStreamReader`. Both methods are generic and allow for typed results in TypeScript:
```ts
// Either materialize the query result
await conn.query<{ v: arrow.Int32 }>(`
SELECT * FROM generate_series(1, 100) t(v)
`);
// ..., or fetch the result chunks lazily
for await (const batch of await conn.send<{ v: arrow.Int32 }>(`
SELECT * FROM generate_series(1, 100) t(v)
`)) {
// ...
}
```
Alternatively, you can prepare statements for parameterized queries using:
```ts
// Prepare query
const stmt = await conn.prepare<{ v: arrow.Int32 }>(
`SELECT (v + ?) AS v FROM generate_series(0, 10000) t(v);`
);
// ... and run the query with materialized results
await stmt.query(234);
// ... or result chunks
for await (const batch of await stmt.send(234)) {
// ...
}
```
#### Looks like Arrow to Me
DuckDB-Wasm uses [Arrow](https://arrow.apache.org) as the data protocol for data import and all query results. Arrow is a database-friendly columnar format that is organized in chunks of column vectors, called record batches, and supports zero-copy reads with only a small overhead. The npm library `apache-arrow` implements the Arrow format in the browser and is already used by other data processing frameworks, like [Arquero](https://github.com/uwdata/arquero). Arrow therefore not only spares us the implementation of the SQL type logic in JavaScript, it also makes us compatible with existing tools.
_Why not use plain Javascript objects?_
WebAssembly is isolated and memory-safe. This isolation is part of its DNA and drives fundamental design decisions in DuckDB-Wasm. For example, WebAssembly introduces a barrier towards the traditional JavaScript heap. Crossing this barrier is difficult as JavaScript has to deal with native function calls, memory ownership and serialization performance. Languages like C++ make this worse as they rely on smart pointers that are not available through the FFI. They leave us with the choice to either pass memory ownership to static singletons within the WebAssembly instance or maintain the memory through C-style APIs in JavaScript, a language that is too dynamic for sound implementations of the [RAII idiom](https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization). This memory isolation forces us to serialize data before we can pass it to the WebAssembly instance. Browsers can serialize JavaScript objects natively to and from JSON using the functions `JSON.stringify` and `JSON.parse`, but this is slower compared to, for example, copying raw native arrays.
#### Web Filesystem
DuckDB-Wasm integrates a dedicated filesystem for WebAssembly. DuckDB itself is built on top of a virtual filesystem that decouples higher level tasks, such as reading a Parquet file, from low-level filesystem APIs that are specific to the operating system. We leverage this abstraction in DuckDB-Wasm to tailor filesystem implementations to the different WebAssembly environments.
The following figure shows our current web filesystem in action. The sequence diagram presents a user running a SQL query that scans a single Parquet file. The query is first offloaded to a dedicated web worker through a JavaScript API. There, it is passed to the WebAssembly module that processes the query until the execution hits the `parquet_scan` table function. This table function then reads the file using a buffered filesystem which, in turn, issues paged reads on the web filesystem. This web filesystem then uses an environment-specific runtime to read the file from several possible locations.


Depending on the context, the Parquet file may either reside on the local device, on a remote server or in a buffer that was registered by the user upfront. We deliberately treat all three cases equally to unify the retrieval and processing of external data. This not only simplifies the analysis, it also enables more advanced features like partially consuming structured file formats. Parquet files, for example, consist of multiple row groups that store data in a column-major fashion. As a result, we may not need to download the entire file for a query, but only the required bytes.
A query like `SELECT count(*) FROM parquet_scan(...)`, for example, can be evaluated on the file metadata alone and will finish in milliseconds even on remote files that are several terabytes large. Another, more general example is paging scans with `LIMIT` and `OFFSET` qualifiers such as `SELECT * FROM parquet_scan(...) LIMIT 20 OFFSET 40`, or queries with selective filter predicates where entire row groups can be skipped based on metadata statistics. These partial file reads are no groundbreaking novelty and could be implemented in JavaScript today, but with DuckDB-Wasm, these optimizations are now driven by the semantics of SQL queries instead of fine-tuned application logic.
*Note: The common denominator among the available File APIs is unfortunately not large. This limits the features that we can provide in the browser. For example, local persistency of DuckDB databases would be a feature with significant impact but requires a way to either read and write synchronously into user-provided files or IndexedDB. We might be able to bypass these limitations in the future but this is subject of ongoing research.*
#### Advanced Features
WebAssembly 1.0 has landed in all major browsers. The WebAssembly Community Group fixed the design of this first version back in November 2017, which is now referred to as WebAssembly MVP. Since then, the development has been ongoing with eight additional features that have been added to the standard and at least five proposals that are currently in progress.
The rapid pace of this development presents challenges and opportunities for library authors. On the one hand, the different features find their way into the browsers at different speeds which leads to a fractured space of post-MVP functionality. On the other hand, features can bring flat performance improvements and are therefore indispensable when aiming for a maximum performance.
The most promising feature for DuckDB-Wasm is [exception handling](https://github.com/WebAssembly/exception-handling/blob/main/proposals/exception-handling/Exceptions.md) which is already enabled by default in Chrome 95. DuckDB and DuckDB-Wasm are written in C++ and use exceptions for faulty situations. DuckDB does not use exceptions for general control flow but to automatically propagate errors upwards to the top-level plan driver. In native environments, these exceptions are implemented as "zero-cost exceptions" as they induce no overhead until they are thrown. With the WebAssembly MVP, however, that is no longer possible as the compiler toolchain Emscripten has to emulate exceptions through JavaScript. Without WebAssembly exceptions, DuckDB-Wasm calls throwing functions through a JavaScript hook that can catch exceptions emulated through JavaScript `aborts`. An example for these hook calls is shown in the following figure. Both stack traces originate from a single paged read of a Parquet file in DuckDB-Wasm. The left side shows a stack trace with the WebAssembly MVP and requires multiple calls through the functions `wasm-to-js-i*` . The right stack trace uses WebAssembly exceptions without any hook calls.

This fractured feature space is a temporary challenge that will be resolved once high-impact features like exception handling, SIMD and bulk-memory operations are available everywhere. In the meantime, we will ship multiple WebAssembly modules that are compiled for different feature sets and adaptively pick the best bundle for you using dynamic browser checks.
The following example shows how the asynchronous version of DuckDB-Wasm can be instantiated using either manual or JsDelivr bundles:
```ts
// Import the ESM bundle (supports tree-shaking)
import * as duckdb from '@duckdb/duckdb-wasm/dist/duckdb-esm.js';
// Either bundle them manually, for example as Webpack assets
import duckdb_wasm from '@duckdb/duckdb-wasm/dist/duckdb.wasm';
import duckdb_wasm_next from '@duckdb/duckdb-wasm/dist/duckdb-next.wasm';
import duckdb_wasm_next_coi from '@duckdb/duckdb-wasm/dist/duckdb-next-coi.wasm';
const WEBPACK_BUNDLES: duckdb.DuckDBBundles = {
    asyncDefault: {
        mainModule: duckdb_wasm,
        mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-async.worker.js', import.meta.url).toString(),
    },
    asyncNext: {
        mainModule: duckdb_wasm_next,
        mainWorker: new URL('@duckdb/duckdb-wasm/dist/duckdb-browser-async-next.worker.js', import.meta.url).toString(),
    },
    asyncNextCOI: {
        mainModule: duckdb_wasm_next_coi,
        mainWorker: new URL(
            '@duckdb/duckdb-wasm/dist/duckdb-browser-async-next-coi.worker.js',
            import.meta.url,
        ).toString(),
        pthreadWorker: new URL(
            '@duckdb/duckdb-wasm/dist/duckdb-browser-async-next-coi.pthread.worker.js',
            import.meta.url,
        ).toString(),
    },
};
// ..., or load the bundles from jsdelivr
const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
// Select a bundle based on browser checks
const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
// Instantiate the asynchronous version of DuckDB-Wasm
const worker = new Worker(bundle.mainWorker!);
const logger = new duckdb.ConsoleLogger();
const db = new duckdb.AsyncDuckDB(logger, worker);
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
```
*You can also test the features and selected bundle in your browser using the web shell command `.features`.*
#### Multithreading
In 2018, the Spectre and Meltdown vulnerabilities sent crippling shockwaves through the internet. Today, we are facing the repercussions of these events, in particular in software that runs arbitrary user code, such as web browsers. Shortly after the publications, all major browser vendors restricted the use of `SharedArrayBuffers` to prevent dangerous timing attacks. `SharedArrayBuffers` are raw buffers that can be shared among web workers for global state and an alternative to the browser-specific message passing. These restrictions had detrimental effects on WebAssembly modules since `SharedArrayBuffers` are necessary for the implementation of POSIX threads in WebAssembly.
Without `SharedArrayBuffers`, WebAssembly modules can run in a dedicated web worker to unblock the main event loop but won't be able to spawn additional workers for parallel computations within the same instance. By default, we therefore cannot unleash the parallel query execution of DuckDB in the web. However, browser vendors have recently started to reenable `SharedArrayBuffers` for websites that are [cross-origin-isolated](https://web.dev/coop-coep/). A website is cross-origin-isolated if it ships the main document with the following HTTP headers:
```text
Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin
```
These headers will instruct browsers to A) isolate the top-level document from other top-level documents outside its own origin and B) prevent the document from making arbitrary cross-origin requests unless the requested resource explicitly opts in. Both restrictions have far reaching implications for a website since many third-party data sources won't yet provide the headers today and the top-level isolation currently hinders the communication with, for example, OAuth pop up's ([there are plans to lift that](https://github.com/whatwg/html/issues/6364)).
*We therefore assume that DuckDB-Wasm will find the majority of users on non-isolated websites. We are, however, experimenting with dedicated bundles for isolated sites using the suffix `-next-coi` and will closely monitor the future needs of our users.*
#### Web Shell
We further host a web shell powered by DuckDB-Wasm alongside the library release at [shell.duckdb.org](https://shell.duckdb.org).
Use the following shell commands to query remote TPC-H files at scale factor 0.01.
When querying your own, make sure to properly set CORS headers since your browser will otherwise block these requests.
You can alternatively use the `.files` command to register files from the local filesystem.
```sql
.timer on
SELECT count(*)
FROM 'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/lineitem.parquet';
SELECT count(*)
FROM 'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/customer.parquet';
SELECT avg(c_acctbal)
FROM 'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/customer.parquet';
SELECT *
FROM 'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/orders.parquet'
LIMIT 10;
SELECT n_name, avg(c_acctbal)
FROM
'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/customer.parquet',
'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/nation.parquet'
WHERE c_nationkey = n_nationkey
GROUP BY n_name;
SELECT *
FROM
'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/region.parquet',
'https://blobs.duckdb.org/data/tpch-sf0.01-parquet/nation.parquet'
WHERE r_regionkey = n_regionkey;
```
#### Evaluation
The following table teases the execution times of some TPC-H queries at scale factor 0.5 using the libraries [DuckDB-Wasm](https://www.npmjs.com/package/@duckdb/duckdb-wasm), [sql.js](https://github.com/sql-js/sql.js/), [Arquero](https://github.com/uwdata/arquero) and [Lovefield](https://github.com/google/lovefield). You can find a more in-depth discussion with all TPC-H queries, additional scale factors and microbenchmarks on the ["DuckDB-Wasm versus X" page](https://shell.duckdb.org/versus).
| Query | DuckDB-Wasm | sql.js | Arquero | Lovefield |
|--:|--:|--:|--:|--:|
| 1 | **0.855 s** | 8.441 s | 24.031 s | 12.666 s |
| 3 | **0.179 s** | 1.758 s | 16.848 s | 3.587 s |
| 4 | **0.151 s** | 0.384 s | 6.519 s | 3.779 s |
| 5 | **0.197 s** | 1.965 s | 18.286 s | 13.117 s |
| 6 | **0.086 s** | 1.294 s | 1.379 s | 5.253 s |
| 7 | **0.319 s** | 2.677 s | 6.013 s | 74.926 s |
| 8 | **0.236 s** | 4.126 s | 2.589 s | 18.983 s |
| 10 | **0.351 s** | 1.238 s | 23.096 s | 18.229 s |
| 12 | **0.276 s** | 1.080 s | 11.932 s | 10.372 s |
| 13 | **0.194 s** | 5.887 s | 16.387 s | 9.795 s |
| 14 | **0.086 s** | 1.194 s | 6.332 s | 6.449 s |
| 16 | **0.137 s** | 0.453 s | 0.294 s | 5.590 s |
| 19 | **0.377 s** | 1.272 s | 65.403 s | 9.977 s |
#### Future Research
We believe that WebAssembly unveils hitherto dormant potential for shared query processing between clients and servers. Pushing computation closer to the client can eliminate costly round-trips to the server and thus increase interactivity and scalability of in-browser analytics. We further believe that the release of DuckDB-Wasm could be the first step towards a more universal data plane spanning across multiple layers including traditional database servers, clients, CDN workers and computational storage. As an in-process analytical database, DuckDB might be the ideal driver for distributed query plans that increase the scalability and interactivity of SQL databases at low costs.
## Fast Moving Holistic Aggregates
**Publication date:** 2021-11-12
**Author:** Richard Wesley
**TL;DR:** DuckDB, a free and open-source analytical data management system, has a windowing API that can compute complex moving aggregates like interquartile ranges and median absolute deviation much faster than the conventional approaches.
In a [previous post](https://duckdb.org/2021/10/13/windowing),
we described the DuckDB windowing architecture and mentioned the support for
some advanced moving aggregates.
In this post, we will compare the performance of various possible moving implementations of these functions
and explain how DuckDB's performant implementations work.
#### What Is an Aggregate Function?
When people think of aggregate functions, they typically have something simple in mind such as `SUM` or `AVG`.
But more generally, what an aggregate function does is _summarise_ a set of values into a single value.
Such summaries can be arbitrarily complex, and involve any data type.
For example, DuckDB provides aggregates for concatenating strings (`STRING_AGG`)
and constructing lists (`LIST`).
In SQL, aggregated sets come from either a `GROUP BY` clause or an `OVER` windowing specification.
##### Holistic Aggregates
All of the basic SQL aggregate functions like `SUM` and `MAX` can be computed
by reading values one at a time and throwing them away.
But there are some functions that potentially need to keep track of all the values before they can produce a result.
These are called _holistic_ aggregates, and they require more care when implementing.
For some aggregates (like `STRING_AGG`) the order of the values can change the result.
This is not a problem for windowing because `OVER` clauses can specify an ordering,
but in a `GROUP BY` clause, the values are unordered.
To handle this, order-sensitive aggregates can include a `WITHIN GROUP (ORDER BY ...)` clause
to specify the order of the values.
Because the values must all be collected and sorted,
aggregates that use the `WITHIN GROUP` clause are holistic.
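As a quick illustration of an order-sensitive aggregate, here is a minimal sketch using the DuckDB Python client; it uses the `ORDER BY` modifier inside the aggregate call, which DuckDB accepts for order-sensitive aggregates such as `string_agg` (the table and column names are made up for the example).
```python
# Order-sensitive aggregation: the same values, aggregated with an explicit order.
import duckdb

print(duckdb.sql("""
    SELECT string_agg(letter, '' ORDER BY pos DESC) AS reversed
    FROM (VALUES ('a', 1), ('b', 2), ('c', 3)) t(letter, pos)
""").fetchall())  # [('cba',)]
```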
##### Statistical Holistic Aggregates
Because sorting the arguments to a windowed aggregate can be specified with the `OVER` clause,
you might wonder if there are any other kinds of holistic aggregates that do not use sorting,
or which use an ordering different from the one in the `OVER` clause.
It turns out that there are a number of important statistical functions that
turn into holistic aggregates in SQL.
In particular, here are the statistical holistic aggregates that DuckDB currently supports:
| Function | Description|
|:---|:---|
| `mode(x)` | The most common value in a set |
| `median(x)` | The middle value of a set |
| `quantile_disc(x, frac)` | The exact value corresponding to a fractional position. |
| `quantile_cont(x, frac)` | The interpolated value corresponding to a fractional position. |
| `quantile_disc(x, [frac, ...])` | A list of the exact values corresponding to a list of fractional positions. |
| `quantile_cont(x, [frac, ...])` | A list of the interpolated values corresponding to a list of fractional positions. |
| `mad(x)` | The median of the absolute values of the differences of each value from the median. |
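To make the table above concrete, here is a small sketch using the DuckDB Python client that computes each of these aggregates over a ten-row table (the table and column names are made up for the example):
```python
# Grouped (non-moving) versions of the holistic aggregates listed above.
import duckdb

con = duckdb.connect()
con.execute("CREATE TABLE obs AS SELECT range AS x FROM range(1, 11)")  # x = 1 .. 10
print(con.execute("""
    SELECT
        mode(x)                              AS mode_x,
        median(x)                            AS median_x,
        quantile_disc(x, 0.25)               AS q25_exact,
        quantile_cont(x, 0.25)               AS q25_interpolated,
        quantile_cont(x, [0.25, 0.5, 0.75])  AS quartiles,
        mad(x)                               AS mad_x
    FROM obs
""").fetchall())
```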
Where things get really interesting is when we try to compute moving versions of these aggregates.
For example, computing a moving `AVG` is fairly straightforward:
You can subtract values that have left the frame and add in the new ones,
or use the segment tree approach from the [previous post on windowing](https://duckdb.org/2021/10/13/windowing).
##### Python Example
Computing a moving median is not as easy.
Let's look at a simple example of how we might implement moving `median` in Python
for the following string data, using a frame that includes one element from each side:

For this example we are using strings so we don't have to worry about interpolating values.
```python
data = ('a', 'b', 'c', 'd', 'c', 'b',)
w = len(data)
for row in range(w):
    l = max(row - 1, 0)       # First index of the frame
    r = min(row + 1, w - 1)   # Last index of the frame
    frame = list(data[l:r+1]) # Copy the frame values
    frame.sort()              # Sort the frame values
    n = (r - l) // 2          # Middle index of the frame
    median = frame[n]         # The median is the middle value
    print(row, data[row], median)
```
Each frame has a different set of values to aggregate and we can't change the order in the table,
so we have to copy them each time before we sort.
Sorting is slow, and there is a lot of repetition.
All of these holistic aggregates have similar problems
if we just reuse the simple implementations for moving versions.
Fortunately, there are much faster approaches for all of them.
#### Moving Holistic Aggregation
In the [previous post on windowing](https://duckdb.org/2021/10/13/windowing),
we explained the component operations used to implement a generic aggregate function
(initialize, update, finalize, combine and window).
In the rest of this post, we will dig into how they can be implemented for these complex aggregates.
##### Quantile
The `quantile` aggregate variants all extract the value(s) at a given fraction (or fractions) of the way
through the ordered list of values in the set.
The simplest variant is the `median` function, which we met in the introduction and which uses a fraction of `0.5`.
There are other variants depending on whether the values are
quantitative (i.e., they have a distance and the values can be interpolated)
or merely ordinal (i.e., they can be ordered, but ties have to be broken.)
Still other variants depend on whether the fraction is a single value or a list of values,
but they can all be implemented in similar ways.
A common way to implement `quantile` that we saw in the Python example is to collect all the values into the state,
sort them, and then read out the values at the requested positions.
(This is probably why the SQL standard refers to it as an "ordered-set aggregate".)
States can be combined by concatenation,
which lets us group in parallel and build segment trees for windowing.
This approach is very time-consuming because sorting is `O(N log N)`,
but happily for `quantile` we can use a related algorithm called `QuickSelect`,
which can find a positional value in only `O(N)` time by _partially sorting_ the array.
You may have run into this algorithm if you have ever used the
`std::nth_element` algorithm in the C++ standard library.
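As a rough Python analogue (assuming NumPy is available; this is not DuckDB's C++ implementation), `numpy.partition` plays the role of `std::nth_element`: it partially orders the array so the requested position holds the right value without paying for a full sort.
```python
# Selecting the median with a partial ordering instead of a full sort.
import numpy as np

values = np.array([4, 7, 1, 9, 3, 8, 2, 6, 5])
k = len(values) // 2                  # middle position for the median
partially_ordered = np.partition(values, k)
print(partially_ordered[k])           # 5: the k-th smallest value, found in O(N) on average
```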
This works well for grouped quantiles, but for moving quantiles
the segment tree approach ends up being about 5% slower than just starting from scratch for each value.
To really improve the performance of moving quantiles,
we note that the partial order probably does not change much between frames.
If we maintain a list of indirect indices into the window and call `nth_element` on the indices,
we can reorder the partially ordered indices instead of the values themselves.
In the common case where the frame has the same size,
we can even check to see whether the new value disrupts the partial ordering at all,
and skip the reordering!
With this approach, we can obtain a significant performance boost of 1.5-10 times.
In this example, we have a 3-element frame (green) that moves one space to the right for each value:

The median values in orange must be computed from scratch.
Notice that in the example, this only happens at the start of the window.
The median values in white are computed using the existing partial ordering.
In the example, this happens when the frame changes size.
Finally, the median values in blue do not require reordering
because the new value is the same as the old value.
With this algorithm, we can create a faster implementation of single-fraction `quantile` without sorting.
##### InterQuartile Ranges (IQR)
We can extend this implementation to _lists_ of fractions by leveraging the fact that each call to `nth_element`
partially orders the values, which further improves performance.
The "reuse" trick can be generalised to distinguish between fractions that are undisturbed
and ones that need to be recomputed.
A common application of multiple fractions is computing
[interquartile ranges](https://en.wikipedia.org/wiki/Interquartile_range)
by using the fraction list `[0.25, 0.5, 0.75]`.
This is the fraction list we use for the multiple fraction benchmarks.
Combined with moving `MIN` and `MAX`,
this moving aggregate can be used to generate the data for a moving box-and-whisker plot.
##### Median Absolute Deviation (MAD)
Maintaining the partial ordering can also be used to boost the performance of the
[median absolute deviation](https://en.wikipedia.org/wiki/Median_absolute_deviation)
(or `mad`) aggregate.
Unfortunately, the second partial ordering can't use the single value trick
because the "function" being used to partially order the values will have changed if the data median changes.
Still, the values are probably not far off,
which again improves the performance of `nth_element`.
##### Mode
The `mode` aggregate returns the most common value in a set.
One common way to implement it is to accumulate all the values in the state,
sort them and then scan for the longest run.
These states can be combined by merging,
which lets us compute the mode in parallel and build segment trees for windowing.
Once again, this approach is very time-consuming because sorting is `O(N log N)`.
It may also use more memory than necessary because it keeps _all_ the values
instead of keeping only the unique values.
If there are heavy hitters in the list
(which is typically what `mode` is being used to find),
this can be significant.
Another way to implement `mode` is to use a hash map for the state that maps values to counts.
Hash tables are typically `O(N)` for accumulation, which is an improvement on sorting,
and they only need to store unique values.
If the state also tracks the largest value and count seen so far,
we can just return that value when we finalize the aggregate.
States can be combined by merging,
which lets us group in parallel and build segment trees for windowing.
Unfortunately, as the benchmarks below demonstrate, this segment tree approach for windowing is quite slow!
The overhead of merging the hash tables for the segment trees turns out to be about 5% slower
than just building a new hash table for each row in the window.
But for a moving `mode` computation,
we can instead make a single hash table and update it every time the frame moves,
removing the old values, adding the new values, and updating the value/count pair.
At times the current mode value may have its count decremented,
but when that happens we can rescan the table to find the new mode.
In this example, the 4-element frame (green) moves one space to the right for each value:

When the mode is unchanged (blue) it can be used directly.
When the mode becomes ambiguous (orange), we must rescan the table.
This approach is much faster,
and in the benchmarks it comes in between 15 and 55 times faster than the other two.
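The idea can be sketched in a few lines of Python (a simplified illustration for a fixed-width trailing frame, not DuckDB's implementation): keep a single value-to-count map, update it as the frame slides, and rescan the map only when the current mode's count is decremented.
```python
from collections import Counter

def moving_mode(data, width):
    counts = Counter()
    mode_value, mode_count = None, 0
    result = []
    for i, value in enumerate(data):
        # add the value entering the frame
        counts[value] += 1
        if counts[value] > mode_count:
            mode_value, mode_count = value, counts[value]
        # remove the value leaving the frame
        if i >= width:
            old = data[i - width]
            counts[old] -= 1
            if old == mode_value:
                # the mode became ambiguous: rescan the (small) hash map
                mode_value, mode_count = max(counts.items(), key=lambda kv: kv[1])
        result.append(mode_value)
    return result

print(moving_mode(['a', 'b', 'b', 'c', 'c', 'c', 'a'], width=3))
# ['a', 'a', 'b', 'b', 'c', 'c', 'c']
```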
#### Microbenchmarks
To benchmark the various implementations, we run moving window queries against a 10M table of integers:
```sql
CREATE TABLE rank100 AS
SELECT b % 100 AS a, b FROM range(10000000) tbl(b);
```
The results are then re-aggregated down to one row to remove the impact of streaming the results.
The frames are 100 elements wide, and the test is repeated with a fixed trailing frame:
```sql
SELECT quantile_cont(a, [0.25, 0.5, 0.75]) OVER (
    ORDER BY b ASC
    ROWS BETWEEN 100 PRECEDING AND CURRENT ROW) AS iqr
FROM rank100;
```
and a variable frame that moves pseudo-randomly around the current value:
```sql
SELECT quantile_cont(a, [0.25, 0.5, 0.75]) OVER (
    ORDER BY b ASC
    ROWS BETWEEN mod(b * 47, 521) PRECEDING AND 100 - mod(b * 47, 521) FOLLOWING) AS iqr
FROM rank100;
```
The two examples here are the interquartile range queries;
the other queries use the single argument aggregates `median`, `mad` and `mode`.
As a final step, we ran the same query with `count(*)`,
which has the same overhead as the other benchmarks, but is trivial to compute
(it just returns the frame size).
That overhead was subtracted from the run times to give the algorithm timings:

As can be seen, there is a substantial benefit from implementing the window operation
for all of these aggregates, often on the order of a factor of ten.
An unexpected finding was that the segment tree approach for these complex states
is always slower (by about 5%) than simply creating the state for each output row.
This suggests that when writing combinable complex aggregates,
it is well worth benchmarking the aggregate
and then considering providing a window operation instead of deferring to the segment tree machinery.
#### Conclusion
DuckDB's aggregate API enables aggregate functions to define a windowing operation
that can significantly improve the performance of moving window computations for complex aggregates.
This functionality has been used to significantly speed up windowing for several statistical aggregates,
such as mode, interquartile ranges and median absolute deviation.
DuckDB is a free and open-source database management system (MIT licensed).
It aims to be the SQLite for Analytics,
and provides a fast and efficient database system with zero external dependencies.
It is available not just for Python, but also for C/C++, R, Java, and more.
## DuckDB – Lord of the Enums: The Fellowship of the Categorical and Factors
**Publication date:** 2021-11-26
**Author:** Pedro Holanda

String types are one of the most commonly used types. However, often string columns have a limited number of distinct values. For example, a country column will never have more than a few hundred unique entries. Storing such a column as plain strings wastes storage and compromises query performance. A better solution is to dictionary encode these columns. In dictionary encoding, the data is split into two parts: the category and the values. The category stores the actual strings, and the values store references to the strings. This encoding is depicted below.
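As a tiny plain-Python illustration of this split (not DuckDB internals), the category holds each distinct string once and the values hold small integer codes:
```python
column = ['NL', 'US', 'NL', 'DE', 'US', 'NL']
category = sorted(set(column))                 # ['DE', 'NL', 'US'], each string stored once
values = [category.index(v) for v in column]   # [1, 2, 1, 0, 2, 1], small integer codes
decoded = [category[i] for i in values]        # decoding round-trips to the original strings
assert decoded == column
```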

In the old times, users would manually perform dictionary encoding by creating lookup tables and translating their ids back with join operations. Environments like Pandas and R support these types more elegantly. [Pandas Categorical](https://pandas.pydata.org/docs/reference/api/pandas.Categorical.html) and [R Factors](https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Factors) are types that allow for columns of strings with many duplicate entries to be efficiently stored through dictionary encoding.
Dictionary encoding not only allows immense storage savings but also allows systems to operate on numbers instead of on strings, drastically boosting query performance. By lowering RAM usage, `ENUM`s also allow DuckDB to scale to significantly larger datasets.
To allow DuckDB to fully integrate with these encoded structures, we implemented Enum Types. This blog post will show code snippets of how to use `ENUM` types from both SQL API and Python/R clients, and will demonstrate the performance benefits of the enum types over using regular strings. To the best of our knowledge, DuckDB is the first RDBMS that natively integrates with Pandas categorical columns and R factors.
#### SQL
Our Enum SQL syntax is heavily inspired by [Postgres](https://www.postgresql.org/docs/9.1/datatype-enum.html). Below, we depict how to create and use the `ENUM` type.
```sql
CREATE TYPE lotr_race AS ENUM ('Mayar', 'Hobbit', 'Orc');
CREATE TABLE character (
    name text,
    race lotr_race
);
INSERT INTO character VALUES ('Frodo Quackins','Hobbit'), ('Quackalf ', 'Mayar');
-- We can perform a normal string comparison
-- Note that 'Hobbit' will be cast to a lotr_race
-- hence this comparison is actually a fast integer comparison
SELECT name FROM character WHERE race = 'Hobbit';
----
Frodo Quackins
```
`ENUM` columns behave exactly the same as normal `VARCHAR` columns. They can be used in string functions (such as `LIKE` or `substring`), they can be compared, ordered, etc. The only exception is that `ENUM` columns can only hold the values that are specified in the enum definition. Inserting a value that is not part of the enum definition will result in an error.
DuckDB `ENUM`s are currently static (i.e., values cannot be added or removed after the `ENUM` definition). However, `ENUM` updates are on the roadmap for the next version.
See [the documentation](#docs:stable:sql:data_types:enum) for more information.
#### Python
##### Setup
First we need to install DuckDB and Pandas. The installation process of both libraries in Python is straightforward:
```bash
# Python Install
pip install duckdb
pip install pandas
```
##### Usage
Pandas columns from the categorical type are directly converted to DuckDB's `ENUM` types:
```python
import pandas as pd
import duckdb
# Our unencoded data.
data = ['Hobbit', 'Elf', 'Elf', 'Man', 'Mayar', 'Hobbit', 'Mayar']
# 'pd.Categorical' automatically encodes the data as a categorical column
df_in = pd.DataFrame({'races': pd.Categorical(data),})
# We can query this dataframe as we would any other
# The conversion from categorical columns to enums happens automatically
df_out = duckdb.execute("SELECT * FROM df_in").df()
```
#### R
##### Setup
We only need to install DuckDB in our R client, and we are ready to go.
```R
# R Install
install.packages("duckdb")
```
##### Usage
Similar to our previous example with Pandas, R Factor columns are also automatically converted to DuckDB's `ENUM` types.
```r
library ("duckdb")
con <- dbConnect(duckdb::duckdb())
on.exit(dbDisconnect(con, shutdown = TRUE))
# Our unencoded data.
data <- c('Hobbit', 'Elf', 'Elf', 'Man', 'Mayar', 'Hobbit', 'Mayar')
# Our R dataframe holding an encoded version of our data column
# 'as.factor' automatically encodes it.
df_in <- data.frame(races=as.factor(data))
duckdb::duckdb_register(con, "characters", df_in)
df_out <- dbReadTable(con, "characters")
```
#### Benchmark Comparison
To demonstrate the performance of DuckDB when running operations on categorical columns of Pandas DataFrames, we present a number of benchmarks. The source code for the benchmarks is available on [GitHub](https://raw.githubusercontent.com/duckdb/duckdb-web/main/_posts/benchmark_scripts/enum.py). In our benchmarks we always consume and produce Pandas DataFrames.
##### Dataset
Our dataset is composed of one dataframe with 4 columns and 10 million rows. The first two columns, named `race` and `subrace`, represent races. They are both categorical, with the same categories but different values. The other two columns, `race_string` and `subrace_string`, are the string representations of `race` and `subrace`.
```python
import numpy as np
import pandas as pd

def generate_df(size):
    race_categories = ['Hobbit', 'Elf', 'Man', 'Mayar']
    race = np.random.choice(race_categories, size)
    subrace = np.random.choice(race_categories, size)
    return pd.DataFrame({'race': pd.Categorical(race),
                         'subrace': pd.Categorical(subrace),
                         'race_string': race,
                         'subrace_string': subrace})

size = pow(10, 7)  # 10,000,000 rows
df = generate_df(size)
```
##### Grouped Aggregation
In our grouped aggregation benchmark, we do a count of how many characters for each race we have in the `race` or `race_string` column of our table.
```python
def duck_categorical(df):
    return con.execute("SELECT race, count(*) FROM df GROUP BY race").df()

def duck_string(df):
    return con.execute("SELECT race_string, count(*) FROM df GROUP BY race_string").df()

def pandas(df):
    return df.groupby(['race']).agg({'race': 'count'})

def pandas_string(df):
    return df.groupby(['race_string']).agg({'race_string': 'count'})
```
The table below depicts the timings of this operation. We can see the benefits of performing grouping on encoded values over strings, with DuckDB being 4× faster when grouping small unsigned values.
| Name | Time (s) |
|:-------------|---------:|
| DuckDB (Categorical) | 0.01 |
| DuckDB (String) | 0.04 |
| Pandas (Categorical) | 0.06 |
| Pandas (String) | 0.40 |
##### Filter
In our filter benchmark, we do a count of how many Hobbit characters we have in the `race` or `race_string` column of our table.
```python
def duck_categorical(df):
    return con.execute("SELECT count(*) FROM df WHERE race = 'Hobbit'").df()

def duck_string(df):
    return con.execute("SELECT count(*) FROM df WHERE race_string = 'Hobbit'").df()

def pandas(df):
    filtered_df = df[df.race == "Hobbit"]
    return filtered_df.agg({'race': 'count'})

def pandas_string(df):
    filtered_df = df[df.race_string == "Hobbit"]
    return filtered_df.agg({'race_string': 'count'})
```
For the DuckDB enum type, DuckDB converts the string `Hobbit` to a value in the `ENUM`, which returns an unsigned integer. We can then do fast numeric comparisons, instead of expensive string comparisons, which results in greatly improved performance.
| Name | Time (s) |
|:-------------|---------:|
| DuckDB (Categorical) | 0.003 |
| DuckDB (String) | 0.023 |
| Pandas (Categorical) | 0.158 |
| Pandas (String) | 0.440 |
##### Enum – Enum Comparison
In this benchmark, we perform an equality comparison of our two race columns, `race` and `subrace` (or `race_string` and `subrace_string`).
```python
def duck_categorical(df):
    return con.execute("SELECT count(*) FROM df WHERE race = subrace").df()

def duck_string(df):
    return con.execute("SELECT count(*) FROM df WHERE race_string = subrace_string").df()

def pandas(df):
    filtered_df = df[df.race == df.subrace]
    return filtered_df.agg({'race': 'count'})

def pandas_string(df):
    filtered_df = df[df.race_string == df.subrace_string]
    return filtered_df.agg({'race_string': 'count'})
```
DuckDB `ENUM`s can be compared directly on their encoded values. This results in a time difference similar to the previous case, again because we are able to compare numeric values instead of strings.
| Name | Time (s) |
|:----------------------|---------:|
| DuckDB (Categorical) | 0.005 |
| DuckDB (String) | 0.040 |
| Pandas (Categorical) | 0.130 |
| Pandas (String) | 0.550 |
##### Storage
In this benchmark, we compare the storage savings of storing `ENUM` Types vs Strings.
```python
race_categories = ['Hobbit', 'Elf', 'Man','Mayar']
race = np.random.choice(race_categories, size)
categorical_race = pd.DataFrame({'race': pd.Categorical(race),})
string_race = pd.DataFrame({'race': race,})
con = duckdb.connect('duck_cat.db')
con.execute("CREATE TABLE character AS SELECT * FROM categorical_race")
con = duckdb.connect('duck_str.db')
con.execute("CREATE TABLE character AS SELECT * FROM string_race")
```
The table below depicts the DuckDB file size differences when storing the same column as either an Enum or a plain string. Since the dictionary-encoding does not repeat the string values, we can see a reduction of one order of magnitude in size.
| Name | Size (MB) |
|:---------------------|----------:|
| DuckDB (Categorical) | 11 |
| DuckDB (String) | 102 |
#### What about the Sequels?
There are three main directions we will pursue in the following versions of DuckDB related to `ENUM`s.
1. Automatic Storage Encoding: As described in the introduction, users frequently define database columns as strings when in reality they are `ENUM`s. Our idea is to automatically detect and dictionary-encode these columns, without any input from the user and in a way that is completely invisible to them.
2. `ENUM` Updates: As said in the introduction, our `ENUM`s are currently static. We will allow the insertion and removal of `ENUM` categories.
3. Integration with other Data Formats: We want to expand our integration with data formats that implement `ENUM`-like structures.
#### Feedback
As usual, let us know what you think about our `ENUM` integration, which data formats you would like us to integrate with and any ideas you would like us to pursue on this topic! Feel free to send me an [email](#mailto:[email protected] ). If you encounter any problems when using our `ENUM`s, please open an issue in our [issue tracker](https://github.com/duckdb/duckdb/issues)!
## DuckDB Quacks Arrow: A Zero-Copy Data Integration between Apache Arrow and DuckDB
**Publication date:** 2021-12-03
**Authors:** Pedro Holanda and Jonathan Keane
**TL;DR:** The zero-copy integration between DuckDB and Apache Arrow allows for rapid analysis of larger than memory datasets in Python and R using either SQL or relational APIs.
This post is a collaboration with and cross-posted on the [Arrow blog](https://arrow.apache.org/blog/2021/12/03/arrow-duckdb/).
Part of [Apache Arrow](https://arrow.apache.org) is an in-memory data format optimized for analytical libraries. Like Pandas and R Dataframes, it uses a columnar data model. But the Arrow project contains more than just the format: The Arrow C++ library, which is accessible in Python, R, and Ruby via bindings, has additional features that allow you to compute efficiently on datasets. These additional features are on top of the implementation of the in-memory format described above. The datasets may span multiple files in Parquet, CSV, or other formats, and files may even be on remote or cloud storage like HDFS or Amazon S3. The Arrow C++ query engine supports the streaming of query results, has an efficient implementation of complex data types (e.g., Lists, Structs, Maps), and can perform important scan optimizations like Projection and Filter Pushdown.
[DuckDB](https://www.duckdb.org) is a new analytical data management system that is designed to run complex SQL queries within other processes. DuckDB has bindings for R and Python, among others. DuckDB can query Arrow datasets directly and stream query results back to Arrow. This integration allows users to query Arrow data using DuckDB's SQL Interface and API, while taking advantage of DuckDB's parallel vectorized execution engine, without requiring any extra data copying. Additionally, this integration takes full advantage of Arrow's predicate and filter pushdown while scanning datasets.
This integration is unique because it uses zero-copy streaming of data between DuckDB and Arrow and vice versa so that you can compose a query using both together. This results in three main benefits:
1. **Larger Than Memory Analysis:** Since both libraries support streaming query results, we are capable of executing on data without fully loading it from disk. Instead, we can execute one batch at a time. This allows us to execute queries on data that is bigger than memory.
2. **Complex Data Types:** DuckDB can efficiently process complex data types that can be stored in Arrow vectors, including arbitrarily nested structs, lists, and maps.
3. **Advanced Optimizer:** DuckDB's state-of-the-art optimizer can push down filters and projections directly into Arrow scans. As a result, only relevant columns and partitions will be read, allowing the system to e.g., take advantage of partition elimination in Parquet files. This significantly accelerates query execution.
For those that are just interested in benchmarks, you can jump ahead to the [benchmark section below](#::benchmark-comparison).
#### Quick Tour
Before diving into the details of the integration, in this section we provide a quick motivating example of how powerful and simple to use the DuckDB-Arrow integration is. With a few lines of code, you can already start querying Arrow datasets. Say you want to analyze the infamous [NYC Taxi Dataset](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) and figure out if groups tip more or less than single riders.
##### R
Both Arrow and DuckDB support dplyr pipelines for people more comfortable with using dplyr for their data analysis. The Arrow package includes two helper functions that allow us to pass data back and forth between Arrow and DuckDB (`to_duckdb()` and `to_arrow()`).
This is especially useful in cases where something is supported in one of Arrow or DuckDB but not the other. For example, if you find a complex dplyr pipeline where the SQL translation doesn't work with DuckDB, use `to_arrow()` before the pipeline to use the Arrow engine. Or, if you need functions (e.g., windowed aggregates) that aren't yet implemented in Arrow, use `to_duckdb()` to use the DuckDB engine. All while not paying any cost to (re)serialize the data when you pass it back and forth!
```R
library(duckdb)
library(arrow)
library(dplyr)
# Open dataset using year, month folder partition
ds <- arrow::open_dataset("nyc-taxi", partitioning = c("year", "month"))
ds %>%
  # Look only at 2015 on, where the number of passengers is positive, the trip distance is
  # greater than a quarter mile, and where the fare amount is positive
  filter(year > 2014 & passenger_count > 0 & trip_distance > 0.25 & fare_amount > 0) %>%
  # Pass off to DuckDB
  to_duckdb() %>%
  group_by(passenger_count) %>%
  mutate(tip_pct = tip_amount / fare_amount) %>%
  summarise(
    fare_amount = mean(fare_amount, na.rm = TRUE),
    tip_amount = mean(tip_amount, na.rm = TRUE),
    tip_pct = mean(tip_pct, na.rm = TRUE)
  ) %>%
  arrange(passenger_count) %>%
  collect()
```
##### Python
The workflow in Python is as simple as it is in R. In this example we use DuckDB's Relational API.
```python
import duckdb
import pyarrow as pa
import pyarrow.dataset as ds
# Open dataset using year, month folder partition
nyc = ds.dataset('nyc-taxi/', partitioning=["year", "month"])
# We transform the nyc dataset into a DuckDB relation
nyc = duckdb.arrow(nyc)
# Run same query again
nyc.filter("year > 2014 & passenger_count > 0 & trip_distance > 0.25 & fare_amount > 0")
.aggregate("SELECT avg(fare_amount), avg(tip_amount), avg(tip_amount / fare_amount) AS tip_pct", "passenger_count").arrow()
```
#### DuckDB and Arrow: The Basics
In this section, we will look at some basic examples of the code needed to read and output Arrow tables in both Python and R.
##### Setup
First we need to install DuckDB and Arrow. The installation process for both libraries is shown below.
Python:
```bash
pip install duckdb
pip install pyarrow
```
R:
```R
install.packages("duckdb")
install.packages("arrow")
```
To execute the sample examples in this section, we need to download the following custom Parquet files:
* [`integers.parquet`](https://duckdb.org/data/integers.parquet)
* [`lineitemsf1.snappy.parquet`](https://blobs.duckdb.org/data/lineitemsf1.snappy.parquet)
###### Python
There are two ways in Python of querying data from Arrow.
1. Through the Relational API:
```python
# Imports needed for the examples in this section
import duckdb
import pyarrow.parquet as pq

# Reads Parquet File to an Arrow Table
arrow_table = pq.read_table('integers.parquet')
# Transforms Arrow Table -> DuckDB Relation
rel_from_arrow = duckdb.arrow(arrow_table)
# we can run a SQL query on this and print the result
print(rel_from_arrow.query('arrow_table', 'SELECT sum(data) FROM arrow_table WHERE data > 50').fetchone())
# Transforms DuckDB Relation -> Arrow Table
arrow_table_from_duckdb = rel_from_arrow.arrow()
```
2. By using replacement scans and querying the object directly with SQL:
```python
# Reads Parquet File to an Arrow Table
arrow_table = pq.read_table('integers.parquet')
# Gets Database Connection
con = duckdb.connect()
# we can run a SQL query on this and print the result
print(con.execute('SELECT sum(data) FROM arrow_table WHERE data > 50').fetchone())
# Transforms Query Result from DuckDB to Arrow Table
# We can directly read the arrow object through DuckDB's replacement scans.
con.execute("SELECT * FROM arrow_table").fetch_arrow_table()
```
It is possible to transform both DuckDB Relations and Query Results back to Arrow.
###### R
In R, you can interact with Arrow data in DuckDB by registering the table as a view (an alternative is to use dplyr as shown above).
```r
library(duckdb)
library(arrow)
library(dplyr)
# Reads Parquet File to an Arrow Table
arrow_table <- arrow::read_parquet("integers.parquet", as_data_frame = FALSE)
# Gets Database Connection
con <- dbConnect(duckdb::duckdb())
# Registers arrow table as a DuckDB view
arrow::to_duckdb(arrow_table, table_name = "arrow_table", con = con)
# we can run a SQL query on this and print the result
print(dbGetQuery(con, "SELECT sum(data) FROM arrow_table WHERE data > 50"))
# Transforms Query Result from DuckDB to Arrow Table
result <- dbSendQuery(con, "SELECT * FROM arrow_table")
```
##### Streaming Data from/to Arrow
In the previous section, we depicted how to interact with Arrow tables. However, Arrow also allows users to interact with the data in a streaming fashion. Either consuming it (e.g., from an Arrow Dataset) or producing it (e.g., returning a RecordBatchReader). And of course, DuckDB is able to consume Datasets and produce RecordBatchReaders. This example uses the NYC Taxi Dataset, stored in Parquet files partitioned by year and month, which we can download through the Arrow R package:
```R
arrow::copy_files("s3://ursa-labs-taxi-data", "nyc-taxi")
```
###### Python
```python
import duckdb
import pyarrow.dataset as ds

# Reads dataset partitioning it in year/month folder
nyc_dataset = ds.dataset('nyc-taxi/', partitioning=["year", "month"])
# Gets Database Connection
con = duckdb.connect()
query = con.execute("SELECT * FROM nyc_dataset")
# DuckDB's queries can now produce a Record Batch Reader
record_batch_reader = query.fetch_record_batch()
# Which means we can stream the whole query per batch.
# This retrieves the first batch
chunk = record_batch_reader.read_next_batch()
```
###### R
```r
# Reads dataset partitioning it in year/month folder
nyc_dataset = open_dataset("nyc-taxi/", partitioning = c("year", "month"))
# Gets Database Connection
con <- dbConnect(duckdb::duckdb())
# We can use the same function as before to register our arrow dataset
duckdb::duckdb_register_arrow(con, "nyc", nyc_dataset)
res <- dbSendQuery(con, "SELECT * FROM nyc", arrow = TRUE)
# DuckDB's queries can now produce a Record Batch Reader
record_batch_reader <- duckdb::duckdb_fetch_record_batch(res)
# Which means we can stream the whole query per batch.
# This retrieves the first batch
cur_batch <- record_batch_reader$read_next_batch()
```
The preceding R code shows in low-level detail how the data is streaming. We provide the helper `to_arrow()` in the Arrow package which is a wrapper around this that makes it easy to incorporate this streaming into a dplyr pipeline.
> In Arrow 6.0.0, `to_arrow()` currently returns the full table, but will allow full streaming in our upcoming 7.0.0 release.
#### Benchmark Comparison
Here we demonstrate in a simple benchmark the performance difference between querying Arrow datasets with DuckDB and querying Arrow datasets with Pandas.
For both the Projection and Filter pushdown comparison, we will use Arrow tables. That is due to Pandas not being capable of consuming Arrow stream objects.
For the NYC Taxi benchmarks, we used a server in the SciLens cluster and for the TPC-H benchmarks, we used a MacBook Pro with an M1 CPU. In both cases, parallelism in DuckDB was used (which is now on by default).
For the comparison with Pandas, note that DuckDB runs in parallel, while Pandas only supports single-threaded execution. Besides that, one should note that we are comparing automatic optimizations. DuckDB's query optimizer can automatically push down filters and projections. This automatic optimization is not supported in Pandas, but it is possible for users to perform some of these projection and filter pushdowns manually by specifying them in the `read_parquet()` call.
##### Projection Pushdown
In this example we run a simple aggregation on two columns of our lineitem table.
```python
# DuckDB
import duckdb
import pyarrow.parquet as pq

lineitem = pq.read_table('lineitemsf1.snappy.parquet')
con = duckdb.connect()
# Transforms Query Result from DuckDB to Arrow Table
con.execute("""SELECT sum(l_extendedprice * l_discount) AS revenue
FROM
    lineitem;""").fetch_arrow_table()
```
```python
# Pandas
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

arrow_table = pq.read_table('lineitemsf1.snappy.parquet')
# Converts an Arrow table to a Dataframe
df = arrow_table.to_pandas()
# Runs aggregation
res = pd.DataFrame({'sum': [(df.l_extendedprice * df.l_discount).sum()]})
# Creates an Arrow Table from a Dataframe
new_table = pa.Table.from_pandas(res)
```
| Name | Time (s) |
|-------------|---------:|
| DuckDB | 0.19 |
| Pandas | 2.13 |
The lineitem table is composed of 16 columns, however, to execute this query only two columns `l_extendedprice` and `l_discount` are necessary. Since DuckDB can push down the projection of these columns, it is capable of executing this query about one order of magnitude faster than Pandas.
##### Filter Pushdown
For our filter pushdown we repeat the same aggregation used in the previous section, but add filters on 4 more columns.
```python
# DuckDB
lineitem = pq.read_table('lineitemsf1.snappy.parquet')
# Get database connection
con = duckdb.connect()
# Transforms Query Result from DuckDB to Arrow Table
con.execute("""SELECT sum(l_extendedprice * l_discount) AS revenue
FROM
    lineitem
WHERE
    l_shipdate >= CAST('1994-01-01' AS date)
    AND l_shipdate < CAST('1995-01-01' AS date)
    AND l_discount BETWEEN 0.05 AND 0.07
    AND l_quantity < 24;""").fetch_arrow_table()
```
```python
# Pandas
arrow_table = pq.read_table('lineitemsf1.snappy.parquet')
df = arrow_table.to_pandas()
filtered_df = df[
    (df.l_shipdate >= "1994-01-01") &
    (df.l_shipdate < "1995-01-01") &
    (df.l_discount >= 0.05) &
    (df.l_discount <= 0.07) &
    (df.l_quantity < 24)]
res = pd.DataFrame({'sum': [(filtered_df.l_extendedprice * filtered_df.l_discount).sum()]})
new_table = pa.Table.from_pandas(res)
```
| Name | Time (s) |
|-------------|---------:|
| DuckDB | 0.04 |
| Pandas | 2.29 |
The difference between DuckDB and Pandas is now more drastic, with DuckDB being two orders of magnitude faster than Pandas. Again, since both the filter and projection are pushed down to Arrow, DuckDB reads less data than Pandas, which can't automatically perform this optimization.
##### Streaming
As demonstrated before, DuckDB is capable of consuming and producing Arrow data in a streaming fashion. In this section we run a simple benchmark to showcase the benefits in speed and memory usage when comparing it to full materialization and Pandas. This example uses the full NYC Taxi dataset, which you can download as shown above.
```python
# DuckDB
# Open dataset using year, month folder partition
nyc = ds.dataset('nyc-taxi/', partitioning=["year", "month"])
# Get database connection
con = duckdb.connect()
# Run query that selects part of the data
query = con.execute("SELECT total_amount, passenger_count, year FROM nyc where total_amount > 100 and year > 2014")
# Create Record Batch Reader from Query Result.
# "fetch_record_batch()" also accepts an extra parameter related to the desired produced chunk size.
record_batch_reader = query.fetch_record_batch()
# Retrieve all batch chunks
chunk = record_batch_reader.read_next_batch()
while len(chunk) > 0:
    chunk = record_batch_reader.read_next_batch()
```
```python
# Pandas
# We must exclude one of the columns of the NYC dataset due to an unimplemented cast in Arrow.
working_columns = ["vendor_id", "pickup_at", "dropoff_at", "passenger_count", "trip_distance",
                   "pickup_longitude", "pickup_latitude", "store_and_fwd_flag", "dropoff_longitude",
                   "dropoff_latitude", "payment_type", "fare_amount", "extra", "mta_tax", "tip_amount",
                   "tolls_amount", "total_amount", "year", "month"]
# Open dataset using year, month folder partition
nyc_dataset = ds.dataset('nyc-taxi/', partitioning=["year", "month"])
# Generate a scanner to skip problematic column
dataset_scanner = nyc_dataset.scanner(columns=working_columns)
# Materialize dataset to an Arrow Table
nyc_table = dataset_scanner.to_table()
# Generate Dataframe from Arrow Table
nyc_df = nyc_table.to_pandas()
# Apply Filter
filtered_df = nyc_df[
    (nyc_df.total_amount > 100) &
    (nyc_df.year > 2014)]
# Apply Projection
res = filtered_df[["total_amount", "passenger_count", "year"]]
# Transform Result back to an Arrow Table
new_table = pa.Table.from_pandas(res)
```
| Name | Time (s) | Peak memory usage (GBs) |
|-------------|---------:|------------------------:|
| DuckDB | 0.05 | 0.3 |
| Pandas | 146.91 | 248 |
The difference in times between DuckDB and Pandas is a combination of all the integration benefits we explored in this article. In DuckDB, the filter pushdown is applied to perform partition elimination (i.e., we skip reading the Parquet files where the year is <= 2014). The filter pushdown is also used to eliminate unrelated row groups (i.e., row groups where the total amount is always <= 100). Due to our projection pushdown, Arrow only has to read the columns of interest from the Parquet files, which allows it to read only 4 out of 20 columns. On the other hand, Pandas is not capable of automatically pushing down any of these optimizations, which means that the full dataset must be read. **This results in a difference of more than three orders of magnitude in query execution time.**
In the table above, we also depict the comparison of peak memory usage between DuckDB (Streaming) and Pandas (Fully-Materializing). In DuckDB, we only need to load the row group of interest into memory. Hence our memory usage is low. We also have constant memory usage since we only have to keep one of these row groups in-memory at a time. Pandas, on the other hand, has to fully materialize all Parquet files when executing the query. Because of this, we see a constant steep increase in its memory consumption. **The total difference in memory consumption of the two solutions is around 3 orders of magnitude.**
#### Conclusion and Feedback
In this blog post, we mainly showcased how to execute queries on Arrow datasets with DuckDB. There are additional libraries that can also consume the Arrow format but they have different purposes and capabilities. As always, we are happy to hear if you want to see benchmarks with different tools for a post in the future! Feel free to drop us an [email](#mailto:[email protected] ;[email protected] ) or share your thoughts directly in the Hacker News post.
Last but not least, if you encounter any problems when using our integration, please open an issue in either [DuckDB's issue tracker](https://github.com/duckdb/duckdb/issues) or [Arrow's issue tracker](https://issues.apache.org/jira/projects/ARROW/), depending on which library has a problem.
## DuckDB Time Zones: Supporting Calendar Extensions
**Publication date:** 2022-01-06
**Author:** Richard Wesley
**TL;DR:** The DuckDB ICU extension now provides time zone support.
Time zone support is a common request for temporal analytics, but the rules are complex and somewhat arbitrary.
The most well supported library for locale-specific operations is the [International Components for Unicode (ICU)](https://icu.unicode.org).
DuckDB already provided collated string comparisons using ICU via an extension (to avoid dependencies),
and we have now connected the existing ICU calendar and time zone functions to the main code
via the new `TIMESTAMP WITH TIME ZONE` (or `TIMESTAMPTZ` for short) data type. The ICU extension is pre-bundled in DuckDB's Python client and can be optionally installed in the remaining clients.
In this post, we will describe how time works in DuckDB and what time zone functionality has been added.
#### What Is Time?
>People assume that time is a strict progression of cause to effect,
>but actually from a non-linear, non-subjective viewpoint
>it's more like a big ball of wibbly wobbly timey wimey stuff.
> -- Doctor Who: Blink
Time in databases can be very confusing because the way we talk about time is itself confusing.
Local time, GMT, UTC, time zones, leap years, proleptic Gregorian calendars – it all looks like a big mess.
But if you step back, modeling time is actually fairly simple, and can be reduced to two pieces: instants and binning.
##### Instants
You will often hear people (and documentation) say that database time is stored in UTC.
This is sort of right, but it is more accurate to say that databases store *instants*.
An instant is a point in universal time, and it is usually given as a count of some time increment from a fixed point in time (called the *epoch*).
In DuckDB, the fixed point is the Unix epoch `1970-01-01 00:00:00 +00:00`, and the increment is microseconds (µs).
(Note that to avoid confusion we will be using ISO-8601 y-m-d notation in this post to denote instants.)
In other words, a `TIMESTAMP` column contains instants.
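A quick plain-Python illustration of this definition (using only the standard library, not DuckDB itself): an instant is just a microsecond count added to the Unix epoch.
```python
from datetime import datetime, timedelta, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
instant_us = 1_640_995_200_000_000                  # some microsecond count
print(epoch + timedelta(microseconds=instant_us))   # 2022-01-01 00:00:00+00:00
```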
There are three other temporal types in SQL:
* `DATE` – an integral count of days from a fixed date. In DuckDB, the fixed date is `1970-01-01`, again in UTC.
* `TIME` – a (positive) count of microseconds up to a single day
* `INTERVAL` – a set of fields for counting time differences. In DuckDB, intervals count months, days and microseconds. (Months are not completely well-defined, but when present, they represent 30 days.)
Of these other temporal types, only `TIME` can have a `WITH TIME ZONE` modifier (with the shorter `TZ` suffix),
but to understand what that modifier means, we first need to talk about *temporal binning*.
##### Temporal Binning
Instants are pretty straightforward (they are just a number), but binning is the part that trips people up.
Binning is probably a familiar idea if you have worked with continuous data:
You break up a set of values into ranges and map each value to the range (or *bin*) that it falls into.
Temporal binning is just doing this to instants:


Temporal binning systems are often called *calendars*,
but we are going to avoid that term for now because calendars are usually associated with dates,
and temporal binning also includes rules for time.
These time rules are called *time zones*, and they also impact where the day boundaries used by the calendar fall.
For example, here is what the binning for a second time zone looks like at the epoch:


The most confusing thing about temporal binning is that there is more than one way to bin time,
and it is not always obvious what binning should be used.
For example, what I mean by "today" is a bin of instants often determined by where I live.
Every instant that is part of my "today" goes in that bin.
But notice that I qualified "today" with "where I live",
and that qualification determines what binning system is being used.
But "today" could also be determined by "where the events happened",
which would require a different binning to be applied.
The biggest temporal binning problem most people run into occurs when daylight savings time changes.
This example contains a daylight savings time change where the "hour" bin is two hours long!
To distinguish the two hours, we needed to include another bin containing the offset from UTC:


As this example shows, in order to bin the instants correctly, we need to know the binning rules that apply.
It also shows that we can't just use the built in binning operations,
because they don't understand daylight savings time.
##### Naïve Timestamps
Instants are sometimes created from a string format using a local binning system instead of an instant.
This results in the instants being offset from UTC, which can cause problems with daylight savings time.
These are called *naïve* timestamps, and they may constitute a data cleaning problem.
Cleaning naïve timestamps requires determining the offset for each timestamp and then updating the value to be an instant.
For most values, this can be done with an inequality join against a table containing the correct offsets,
but the ambiguous values may need to be fixed by hand.
It may also be possible to correct the ambiguous values by assuming that they were inserted in order
and looking for "backwards jumps" using window functions.
A simple way to avoid this situation going forward is to add the UTC offset to non-UTC strings: `2021-07-31 07:20:15 -07:00`.
The DuckDB `VARCHAR` cast operation parses these offsets correctly and will generate the corresponding instant.
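A minimal check of this behaviour, sketched with the DuckDB Python client (the printed value reflects the offset being applied during the cast, as described above):
```python
import duckdb

# The -07:00 offset in the string is parsed, so the stored instant is
# 2021-07-31 14:20:15 UTC rather than a naive 07:20:15.
print(duckdb.sql("SELECT '2021-07-31 07:20:15 -07:00'::TIMESTAMP AS instant").fetchall())
```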
#### Time Zone Data Types
The SQL standard defines temporal data types qualified by `WITH TIME ZONE`.
This terminology is confusing because it seems to imply that the time zone will be stored with the value,
but what it really means is "bin this value using the session's `TimeZone` setting".
Thus a `TIMESTAMPTZ` column also stores instants,
but expresses a "hint" that it should use a specific binning system.
There are a number of operations that can be performed on instants without a binning system:
* Comparing;
* Sorting;
* Increment (µs) difference;
* Casting to and from regular `TIMESTAMP`s.
These common operations have been implemented in the main DuckDB code base,
while the binning operations have been delegated to extensions such as ICU.
One small difference between the display of the new `WITH TIME ZONE` types and the older types
is that the new types will be displayed with a `+00` UTC offset.
This is simply to make the type differences visible in command line interfaces and for testing.
Properly formatting a `TIMESTAMPTZ` for display in a locale requires using a binning system.
#### ICU Temporal Binning
DuckDB already uses an ICU extension for collating strings for a particular locale,
so it was natural to extend it to expose the ICU calendar and time zone functionality.
##### ICU Time Zones
The first step for supporting time zones is to add the `TimeZone` setting that should be applied.
DuckDB extensions can define and validate their own settings, and the ICU extension now does this:
```sql
-- Load the extension
-- This is not needed in Python or R, as the extension is already installed
LOAD icu;
-- Show the current time zone. The default is set to ICU's current time zone.
SELECT * FROM duckdb_settings() WHERE name = 'TimeZone';
```
```text
TimeZone Europe/Amsterdam The current time zone VARCHAR
```
```sql
-- Choose a time zone.
SET TimeZone = 'America/Los_Angeles';
-- Emulate Postgres' time zone table
SELECT name, abbrev, utc_offset
FROM pg_timezone_names()
ORDER BY 1
LIMIT 5;
```
```text
ACT ACT 09:30:00
AET AET 10:00:00
AGT AGT -03:00:00
ART ART 02:00:00
AST AST -09:00:00
```
##### ICU Temporal Binning Functions
Databases like DuckDB and Postgres usually provide some temporal binning functions such as `YEAR` or `DATE_PART`.
These functions are part of a single binning system for the conventional (proleptic Gregorian) calendar and the UTC time zone.
Note that casting to a string is a binning operation because the text produced contains bin values.
Because timestamps that require custom binning have a different data type,
the ICU extension can define additional functions with bindings to `TIMESTAMPTZ`:
* `+` – Add an `INTERVAL` to a timestamp
* `-` – Subtract an `INTERVAL` from a timestamp
* `AGE` – Compute an `INTERVAL` describing the months/days/microseconds between two timestamps (or one timestamp and the current instant).
* `DATE_DIFF` – Count part boundary crossings between two timestamps
* `DATE_PART` – Extract a named timestamp part. This includes the part alias functions such as `YEAR`.
* `DATE_SUB` – Count the number of complete parts between two timestamps
* `DATE_TRUNC` – Truncate a timestamp to the given precision
* `LAST_DAY` – Return the last day of the month
* `MAKE_TIMESTAMPTZ` – Construct a `TIMESTAMPTZ` from parts, including an optional final time zone specifier.
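Putting a few of these bindings together, here is a short sketch using the DuckDB Python client (which, as noted above, bundles the ICU extension); the timestamp and time zone are arbitrary examples:
```python
import duckdb

con = duckdb.connect()
con.execute("SET TimeZone = 'America/Los_Angeles'")
print(con.execute("""
    SELECT
        date_trunc('day', TIMESTAMPTZ '2021-07-31 07:20:15+00')   AS local_day,
        date_part('hour', TIMESTAMPTZ '2021-07-31 07:20:15+00')   AS local_hour,
        TIMESTAMPTZ '2021-07-31 07:20:15+00' + INTERVAL 1 MONTH   AS one_month_later
""").fetchone())
```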
We have not implemented these functions for `TIMETZ` because this type has limited utility,
but it would not be difficult to add in the future.
We have also not implemented string formatting/casting to `VARCHAR`
because the type casting system is not yet extensible,
and the current [ICU build](https://github.com/Mytherin/minimal-icu-collation) we are using does not embed this data.
##### ICU Calendar Support
ICU can also perform binning operations for some non-Gregorian calendars.
We have added support for these calendars via a `Calendar` setting and the `icu_calendar_names` table function:
```sql
LOAD icu;
-- Show the current calendar. The default is set to ICU's current locale.
SELECT * FROM duckdb_settings() WHERE name = 'Calendar';
```
```text
Calendar gregorian The current calendar VARCHAR
```
```sql
-- List the available calendars
SELECT DISTINCT name FROM icu_calendar_names()
ORDER BY 1 DESC LIMIT 5;
```
```text
roc
persian
japanese
iso8601
islamic-umalqura
```
```sql
-- Choose a calendar
SET Calendar = 'japanese';
-- Extract the current Japanese era number using Tokyo time
SET TimeZone = 'Asia/Tokyo';
SELECT
era('2019-05-01 00:00:00+10'::TIMESTAMPTZ),
era('2019-05-01 00:00:00+09'::TIMESTAMPTZ);
```
```text
235 236
```
##### Caveats
ICU has some differences in behavior and representation from the DuckDB implementation. These are hopefully minor issues that should only be of concern to serious time nerds.
* ICU represents instants as millisecond counts using a `DOUBLE`. This makes it lose accuracy far from the epoch (e.g., around the first millennium)
* ICU uses the Julian calendar for dates before the Gregorian change on `1582-10-15` instead of the proleptic Gregorian calendar. This means that dates prior to the changeover will differ, although ICU will give the date as actually written at the time.
* ICU computes ages by using part increments instead of using the length of the earlier month like DuckDB and Postgres.
#### Future Work
Temporal analysis is a large area, and while the ICU time zone support is a big step forward, there is still much that could be done.
Some of these items are core DuckDB improvements that could benefit all temporal binning systems and some expose more ICU functionality.
There is also the prospect for writing other custom binning systems via extensions.
##### DuckDB Features
Here are some general projects that all binning systems could benefit from:
* Add a `DATE_ROLL` function that emulates the ICU calendar `roll` operation for "rotating" around a containing bin;
* Making casting operations extensible so extensions can add their own support;
##### ICU Functionality
ICU is a very rich library with a long pedigree, and there is much that could be done with the existing library:
* Create a more general `MAKE_TIMESTAMPTZ` variant that takes a `STRUCT` with the parts. This could be useful for some non-Gregorian calendars.
* Extend the embedded data to contain locale temporal information (such as month names) and support formatting (` to_char`) and parsing (` to_timestamp`) of local dates. One issue here is that the ICU date formatting language is more sophisticated than the Postgres language, so multiple functions might be required (e.g., `icu_to_char`);
* Extend the binning functions to take per-row calendar and time zone specifications to support row-level temporal analytics such as "what time of day did this happen"?
##### Separation of Concerns
Because the time zone data type is defined in the main code base, but the calendar operations are provided by an extension,
it is now possible to write application-specific extensions with custom calendar and time zone support such as:
* Financial 4-4-5 calendars;
* ISO week-based years;
* Table-driven calendars;
* Astronomical calendars with leap seconds;
* Fun calendars, such as Shire Reckoning and French Republican!
#### Conclusion and Feedback
In this blog post, we described the new DuckDB time zone functionality as implemented via the ICU extension.
We hope that the functionality provided can enable temporal analytic applications involving time zones.
We also look forward to seeing any custom calendar extensions that our users dream up!
Last but not least, if you encounter any problems when using our integration, please open an issue in DuckDB's issue tracker!
## Parallel Grouped Aggregation in DuckDB
**Publication date:** 2022-03-07
**Authors:** Hannes Mühleisen and Mark Raasveldt
**TL;DR:** DuckDB has a fully parallelized aggregate hash table that can efficiently aggregate over millions of groups.
Grouped aggregation is a core data analysis operation. It is particularly important for large-scale data analysis ("OLAP") because it is useful for computing statistical summaries of huge tables. DuckDB contains a highly optimized parallel aggregation capability for fast and scalable summarization.
Jump [straight to the benchmarks](#::experiments)?
#### Introduction
`GROUP BY` changes the result set cardinality: instead of returning the same number of rows as the input (like a normal `SELECT`), `GROUP BY` returns as many rows as there are groups in the data. Consider this (weirdly familiar) example query:
```sql
SELECT
l_returnflag,
l_linestatus,
sum(l_extendedprice),
avg(l_quantity)
FROM
lineitem
GROUP BY
l_returnflag,
l_linestatus;
```
`GROUP BY` is followed by two column names, `l_returnflag` and `l_linestatus`. Those are the columns to compute the groups on, and the resulting table will contain all combinations of values in those columns that occur in the data. We refer to the columns in the `GROUP BY` clause as the "grouping columns" and all occurring combinations of values therein as "groups". The `SELECT` clause contains four (not five) expressions: references to the grouping columns, and two aggregates: the `sum` over `l_extendedprice` and the `avg` over `l_quantity`. We refer to those as the "aggregates". If executed, the result of this query looks something like this:
| l_returnflag | l_linestatus | sum(l_extendedprice) | avg(l_quantity) |
|--------------|--------------|----------------------|----------------:|
| N | O | 114935210409.19 | 25.5 |
| R | F | 56568041380.9 | 25.51 |
| A | F | 56586554400.73 | 25.52 |
| N | F | 1487504710.38 | 25.52 |
In general, SQL allows only columns that are mentioned in the `GROUP BY` clause to be part of the `SELECT` expressions directly, all other columns need to be subject to one of the aggregate functions like `sum`, `avg` etc. There are [many more aggregate functions](#docs:stable:sql:functions:aggregates) depending on which SQL system you use.
How should a query processing engine compute such an aggregation? There are many design decisions involved, and we will discuss those below and in particular the decisions made by DuckDB. The main issue when computing grouping results is that the groups can occur in the input table in any order. Were the input already sorted on the grouping columns, computing the aggregation would be trivial, as we could just compare the current values for the grouping columns with the previous ones. If a change occurs, the next group begins and a new aggregation result needs to be computed. Since the sorted case is easy, one straightforward way of computing grouped aggregates is to sort the input table on the grouping columns first, and then use the trivial approach. But sorting the input is unfortunately still a computationally expensive operation [despite our best efforts](https://duckdb.org/2021/08/27/external-sorting). In general, sorting has a computational complexity of `O(n log n)` with `n` being the number of rows sorted.
#### Hash Tables for Aggregation
A better way is to use a hash table. Hash tables are a [foundational data structure in computing](https://en.wikipedia.org/wiki/Hash_table) that allow us to find entries with a computational complexity of `O(1)`. A full discussion on how hash tables work is far beyond the scope of this post. Below we try to focus on a very basic description and considerations related to aggregate computation.

O(n) plotted against O(n log n) to illustrate scaling behavior
To add `n` rows to a hash table we are looking at a complexity of `O(n)`, much, much better than `O(n log n)` for sorting, especially when `n` goes into the billions. The figure above illustrates how the complexity develops as the table size increases. Another big advantage is that we do not have to make a sorted copy of the input first, which is going to be just as large as the input. Instead, the hash table will have at most as many entries as there are groups, which can be (and usually are) dramatically fewer than input rows. The overall process is thus this: Scan the input table, and for each row, update the hash table accordingly. Once the input is exhausted, we scan the hash table to provide rows to upstream operators or the query result directly.
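As a minimal illustration of this scan-and-update process, here is a toy Python sketch for the example query, using a plain dictionary as the hash table; it is for illustration only and is not DuckDB's implementation:
```python
def grouped_aggregate(rows):
    """Compute sum(l_extendedprice) and avg(l_quantity) per (l_returnflag, l_linestatus)."""
    states = {}                                     # group key -> aggregate state
    for returnflag, linestatus, price, quantity in rows:
        key = (returnflag, linestatus)
        state = states.get(key)
        if state is None:                           # first time we see this group
            state = states[key] = {"sum_price": 0.0, "sum_qty": 0.0, "count": 0}
        state["sum_price"] += price
        state["sum_qty"] += quantity
        state["count"] += 1
    # once the input is exhausted, scan the hash table to produce the result rows
    return [
        (rf, ls, s["sum_price"], s["sum_qty"] / s["count"])
        for (rf, ls), s in states.items()
    ]
```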
##### Collision Handling
So, hash table it is then! We build a hash table on the input with the groups as keys and the aggregates as the entries. Then, for every input row, we compute a hash of the group values, find the entry in the hash table, and either create or update the aggregate states with the values from the row, right? It's unfortunately not that simple: two rows with *different* values for the grouping columns may result in a hash that points to the *same* hash table entry, which would lead to incorrect results.
There are two main approaches to [work around this problem](https://en.wikipedia.org/wiki/Hash_table#Collision_resolution): "chaining" or "linear probing". With chaining, we do not keep the aggregate values in the hash table directly, but rather keep a list of group values and aggregates. If the grouping values point to a hash table entry with an empty list, the new group and the aggregates are simply added. If they point to an existing list, we check for every list entry whether the grouping values match. If so, we update the aggregates for that group. If not, we create a new list entry. In linear probing there are no such lists; on finding an existing entry, we compare the grouping values, and if they match we update the entry. If they do not match, we move one entry down in the hash table and try again. This process finishes when either a matching group entry has been found or an empty hash table entry is found. While theoretically equivalent, computer hardware architecture favors linear probing because of cache locality. Because linear probing walks the hash table entries *linearly*, the next entry will very likely be in the CPU cache and hence access is faster. Chaining will generally lead to random access and much worse performance on modern hardware architectures. We have therefore adopted linear probing for our aggregate hash table.
Both chaining and linear probing will degrade in theoretical lookup performance from O(1) to O(n) with respect to the hash table size if there are too many collisions, i.e., too many groups hashing to the same hash table entry. A common solution to this problem is to resize the hash table once the "fill ratio" exceeds some threshold, e.g., 75% is the default for Java's `HashMap`. This is particularly important as we do not know the number of groups in the result before starting the aggregation. Neither do we assume to know the number of rows in the input table. We thus start with a fairly small hash table and resize it once the fill ratio exceeds a threshold. The basic hash table structure is shown in the figure below: the table has four slots. There are already three groups in the table, with group keys 12, 5, and 2, and each group has an aggregate value (e.g., from a `SUM`), such as 43.

Basic Aggregate Hash Table Structure
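To make the fill-ratio mechanics concrete, here is a toy open-addressing table with linear probing and a naive resize that re-inserts every group (exactly the cost discussed next); the initial capacity and the 75% threshold are illustrative choices, not DuckDB's actual values:
```python
class ToyLinearProbingTable:
    """Open-addressing aggregate hash table keeping one running SUM per group."""

    def __init__(self, capacity=4):
        self.slots = [None] * capacity               # each slot: (group, running_sum) or None
        self.count = 0

    def _insert(self, slots, group, value):
        idx = hash(group) % len(slots)
        while slots[idx] is not None and slots[idx][0] != group:
            idx = (idx + 1) % len(slots)             # walk linearly to the next slot
        is_new = slots[idx] is None
        current = 0 if is_new else slots[idx][1]
        slots[idx] = (group, current + value)
        return is_new

    def add(self, group, value):
        if (self.count + 1) / len(self.slots) > 0.75:
            bigger = [None] * (len(self.slots) * 2)
            for entry in self.slots:                 # naive resize: move every group
                if entry is not None:
                    self._insert(bigger, entry[0], entry[1])
            self.slots = bigger
        if self._insert(self.slots, group, value):
            self.count += 1
```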
A big challenge with resizing a partially filled hash table is that after the resize, all the groups are in the wrong place and we would have to move everything, which would be very expensive.

Two-Part Aggregate Hash Table
To support resizing efficiently, we have implemented a two-part aggregate hash table consisting of a separately allocated pointer array which points into payload blocks that contain the grouping values and aggregate states for each group. The pointers are not actual pointers but symbolic: they refer to a block ID and a row offset within that block. This is shown in the figure above, where the hash table entries are split over two payload blocks. On resize, we throw away the pointer array and allocate a bigger one. Then, we read all payload blocks again, hash the group values, and re-insert pointers to them into the new pointer array. The group data thus remains unchanged, which greatly reduces the cost of resizing the hash table. This can be seen in the figure below, where we double the pointer array size but the payload blocks remain unchanged.

Resizing Two-Part Aggregate Hash Table
The naive two-part hash table design would require a re-hashing of *all* group values on resize, which can be quite expensive, especially for string values. To speed this up, we also write the raw hash of the group values to the payload blocks for every group. Then, during resize, we don't have to re-hash the groups but can just read the hashes from the payload blocks, compute the new offset into the pointer array, and insert there.

Optimization: Adding Hashes to Payload
The two-part hash table has a big drawback when looking up entries: there is no ordering between the pointer array and the group entries in the payload blocks. Hence, following the pointer creates random access in the memory hierarchy. This will lead to unnecessary stalls in the computation. To mitigate this issue, we extend the memory layout of the pointer array to include some (1 or 2) bytes from the group hash in addition to the pointer to the payload value. This way, linear probing can first compare the hash bits in the pointer array with the current group hash and decide whether it's worth following the payload pointer or not. This can potentially continue for every group in the pointer chain. Only when the hash bits match do we have to actually follow the pointer and compare the actual groups. This optimization greatly reduces the number of times the pointer to the payload blocks has to be followed and thereby reduces the number of random accesses into memory, which are directly related to overall performance. It has the nice side effect of also greatly reducing full group comparisons, which can also be expensive, e.g., when aggregating on groups that contain strings.

Optimization: Adding Hash Bits to Pointer Array
Another (smaller) optimization here concerns the width of the pointer array entries. For small hash tables with few entries, we do not need many bits to encode the payload block offset pointers. DuckDB supports both 4 byte and 8 byte pointer array entries.
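Sketching the resulting two-part design in Python (illustrative only; DuckDB's real directory entries are symbolic block ID / offset references packed into 4 or 8 bytes): directory entries carry a few hash "salt" bits plus a reference into an append-only payload area, payload rows store the full hash so a resize never re-hashes or moves group data, and the salt is checked before the payload reference is followed:
```python
class ToyTwoPartTable:
    """Directory of (salt, payload_index) entries plus append-only payload rows.
    resize() must be called before the directory fills up."""

    def __init__(self, capacity=1024):
        self.directory = [None] * capacity            # the resizable pointer array
        self.payload = []                             # [full_hash, group, running_sum]; rows never move

    def _slot_and_salt(self, h):
        return h % len(self.directory), (h >> 48) & 0xFFFF

    def add(self, group, value):
        h = hash(group)
        slot, salt = self._slot_and_salt(h)
        while True:                                   # linear probing over the directory
            entry = self.directory[slot]
            if entry is None:                         # empty slot: append a new group
                self.payload.append([h, group, value])
                self.directory[slot] = (salt, len(self.payload) - 1)
                return
            # only follow the payload reference if the cheap salt comparison passes
            if entry[0] == salt and self.payload[entry[1]][1] == group:
                self.payload[entry[1]][2] += value
                return
            slot = (slot + 1) % len(self.directory)

    def resize(self, new_capacity):
        # throw the directory away; payload rows stay put and their stored
        # hashes are reused, so nothing is re-hashed or copied
        self.directory = [None] * new_capacity
        for idx, (h, _group, _sum) in enumerate(self.payload):
            slot, salt = self._slot_and_salt(h)
            while self.directory[slot] is not None:
                slot = (slot + 1) % len(self.directory)
            self.directory[slot] = (salt, idx)
```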
For most aggregate queries, the vast majority of query processing time is spent looking up hash table entries, which is why it's worth spending time on optimizing them. If you're curious, the code for all this is in the DuckDB repo, `aggregate_hashtable.cpp`. There is another optimization for when we know from column statistics that there are only a few distinct groups, the perfect hash aggregate, but that's for another post. But we're not done here just yet.
#### Parallel Aggregation
While we now have an aggregate hash table design that should do fairly well for grouped aggregations, we still have not considered the fact that DuckDB automatically parallelizes all queries to use multiple hardware threads ("CPUs"). How does parallelism work together with hash tables? In general, the answer is unfortunately: "badly". Hash tables are delicate structures that don't handle parallel modifications well. For example, imagine one thread wanting to resize the hash table while another wants to add some new group data to it. Or how should we handle multiple threads inserting new groups at the same time for the same entry? One could use locks to make sure that only one thread at a time is using the table, but this would mostly defeat parallelizing the query. There has been plenty of research into concurrency-friendly hash tables, but the short summary is that it's still an open issue.
It is possible to let each thread read data from downstream operators and build individual, local hash tables and merge those together later from a single thread. This works quite nicely if there are few groups, like in the example at the top of this post. If there are few groups, a single thread can merge many thread-local hash tables without creating a bottleneck. However, it's entirely possible there are as many groups as there are input rows; this tends to happen when someone groups on a column that would be a candidate for a primary key, e.g., `observation_number`, `timestamp`, etc. What is thus needed is a parallel merge of the parallel hash tables. We adopt a method from [Leis et al.](https://15721.courses.cs.cmu.edu/spring2016/papers/p743-leis.pdf): Each thread builds not one, but multiple *partitioned* hash tables based on a radix-partitioning on the group hash.

Partitioning Hash Tables for Parallelized Merging
The key observation here is that if two groups have a different hash value, they cannot possibly be the same. Because of this property, it is possible to use the hash values to create fully independent partitions of the groups without requiring any communication between threads as long as all the threads use the same partitioning scheme (see Phase 1 in the above diagram).
After all the local hash tables have been constructed, we assign individual partitions to each worker thread and merge the hash tables within that partition together (Phase 2). Because the partitions were created using the radix partitioning scheme on the hash, all worker threads can independently merge the hash tables within their respective partitions. The result is correct because each group goes into a single partition and that partition only.
One interesting detail is that we never need to build a final (possibly giant) hash table that holds all the groups because the radix group partitioning ensures that each group is localized to a partition.
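A small sketch of the two phases under these assumptions (a fixed number of radix partitions taken from the group hash, with plain Python dicts standing in for the thread-local hash tables):
```python
RADIX_BITS = 4
PARTITIONS = 1 << RADIX_BITS

def partition_of(group):
    # every thread uses the same rule, so equal groups always land in the same partition
    return hash(group) & (PARTITIONS - 1)

def build_thread_local_tables(rows):
    """Phase 1: one thread builds PARTITIONS small tables of partial sums."""
    tables = [dict() for _ in range(PARTITIONS)]
    for group, value in rows:
        table = tables[partition_of(group)]
        table[group] = table.get(group, 0) + value
    return tables

def merge_partition(all_thread_tables, p):
    """Phase 2: one worker merges partition p from every thread, with no coordination."""
    merged = {}
    for tables in all_thread_tables:
        for group, value in tables[p].items():
            merged[group] = merged.get(group, 0) + value
    return merged
```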
There are two additional optimizations for the parallel partitioned hash table strategy:
1) We only start partitioning once a single thread's aggregate hash table exceeds a fixed limit of entries, currently set to 10 000 rows. This is because using a partitioned hash table is not free. For every row added, we have to figure out which partition it should go into, and we have to merge everything back together at the end. For this reason, we will not start partitioning until the parallelization benefit outweighs the cost. Since the partitioning decision is individual to each thread, it may well be possible that only some threads start partitioning. If that is the case, we will need to partition the hash tables of the threads that have not done so before starting to merge them. This is, however, a fully thread-local operation and does not interfere with parallelism.
2) We will stop adding values to a hash table once its pointer array exceeds a certain threshold. Every thread then builds multiple sets of potentially partitioned hash tables. This is because we do not want the pointer array to become arbitrarily large. While this potentially creates duplicate entries for the same group in multiple hash tables, this is not problematic because we merge them all later anyway. This optimization works particularly well on data sets that have many distinct groups, but have group values that are clustered in the input in some manner. For example, when grouping by day in a data set that is ordered on date.
There are some kinds of aggregates that cannot use the parallel and partitioned hash table approach. While it is trivial to parallelize a sum, because the overall sum is just the sum of the partial results, there is no such simple way to combine partial results for computations like `median`, which DuckDB also supports. For this reason, DuckDB also offers `approx_quantile`, which *is* parallelizable.
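A tiny illustration of the difference (arbitrary values; exact `median` shown for contrast):
```python
from statistics import median

chunks = [[1, 2, 3], [1, 2, 3], [4, 5, 6]]           # three per-thread inputs

# a sum merges losslessly from per-thread partial results
total = sum(sum(chunk) for chunk in chunks)          # 27, same as summing all values at once

# a median does not: the median of the per-chunk medians is 2,
# while the median of the combined data is 3
print(median(median(c) for c in chunks))             # 2
print(median(v for c in chunks for v in c))          # 3
```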
#### Experiments
Putting all this together, it's now time for some performance experiments. We will compare DuckDB's aggregation operator as described above with the same operator in various Python data wrangling libraries. The other contenders are Pandas, Polars, and Arrow. Those are chosen since they can all execute an aggregation operator on Pandas DataFrames without converting into some other storage format first, just like DuckDB.
For our benchmarks, we generate a synthetic dataset with a pre-defined number of groups over two integer columns and some random integer data to aggregate. The entire dataset is shuffled before the experiments to prevent taking advantage of the clustered nature of the synthetically generated data. For each group, we compute two aggregates, the sum of the data column and a simple count. The SQL version of this aggregation would be `SELECT g1, g2, sum(d), count(*) FROM dft GROUP BY g1, g2 LIMIT 1;`. In the experiments below, we vary the dataset size and the number of groups. This should nicely show the scaling behavior of the aggregation.
Because we are not interested in measuring the result set materialization time, which would be significant for millions of groups, we follow the aggregation with an operator that only retrieves the first row. This does not change the complexity of the aggregation at all, since it needs to collect all data before producing even the first result row: there might be data in the very last input row that changes the result for the first output row. Of course this would be fairly unrealistic in practice, but it should nicely isolate the behavior of the aggregation operator itself, since a `head(1)` operation on three columns should be fairly cheap and constant in execution time.
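For reference, the rough shape of such an experiment in Python could look like the sketch below. The row and group counts, the way the two grouping columns are derived, and the use of the `duckdb` package's replacement scans over a local Pandas DataFrame are illustrative assumptions; the authoritative script is the gist linked in the next paragraph.
```python
import duckdb
import numpy as np
import pandas as pd

rows, groups = 10_000_000, 1_000_000                 # illustrative sizes only

rng = np.random.default_rng(42)
gid = rng.integers(0, groups, rows)                  # one group id, split over two columns
dft = pd.DataFrame({
    "g1": gid // 1_000,
    "g2": gid % 1_000,
    "d": rng.integers(0, 100, rows),
}).sample(frac=1, random_state=42).reset_index(drop=True)   # shuffle the rows

# DuckDB scans the DataFrame in place; only the first result row is materialized
result = duckdb.query(
    "SELECT g1, g2, sum(d), count(*) FROM dft GROUP BY g1, g2 LIMIT 1"
).df()
```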

Varying row count for 1000 groups
We measure the elapsed wall clock time required to complete each aggregation. To account for minor variation, we repeat each measurement three times and report the median time required. All experiments were run on a 2021 MacBook Pro with a ten-core M1 Max processor and 64 GB of RAM. Our data generation benchmark script [is available online](https://gist.github.com/hannes/e2599ae338d275c241c567934a13d422) and we invite interested readers to re-run the experiment on their machines.

Varying both row count and group count
Now let's discuss some results. We start by varying the number of rows in the table between one million and 100 million. We repeat the experiment for both a fixed (small) group count of 1000 and for the case where the number of groups is equal to the number of rows. Results are plotted as a *log-log plot*. We can see how DuckDB consistently outperforms the other systems, with the single-threaded Pandas being slowest and Polars and Arrow being generally similar.

Varying group count for 100M rows
For the next experiment, we fix the number of rows at 100M (the largest size we experimented with) and show the full behavior when increasing the group count. We can see again how DuckDB consistently exhibits good scaling behavior when increasing the group count, because it can effectively parallelize all phases of aggregation as outlined above. If you are interested in how we generated those plots, the plotting [script is available, too](https://gist.github.com/hannes/9b0e47625290b8af78de88e1d26441c0).
#### Conclusion
Data analysis pipelines that mostly consist of aggregation spend the vast majority of their execution time in the aggregate hash table, which is why it is worth spending an ungodly amount of human time optimizing it. We have some ideas for future work on this; for example, we would like to extend [our work on comparing sorting keys](https://duckdb.org/2021/08/27/external-sorting) to comparing groups in the aggregate hash table. We would also like to add the ability to dynamically choose the number of partitions a thread uses based on observation of the created hash tables, e.g., using more partitioning bits if the partitions are imbalanced. Another large area of future work is to make our aggregate hash table work with out-of-core operations, where an individual hash table no longer fits in memory; this is particularly problematic when merging. And of course there are always opportunities to fine-tune an aggregation operator, and we are continuously improving DuckDB's aggregation operator.
If you want to work on cutting-edge data engineering like this that will be used by thousands of people, consider contributing to DuckDB or joining us at DuckDB Labs in Amsterdam!
## Friendlier SQL with DuckDB
**Publication date:** 2022-05-04
**Author:** Alex Monahan
**TL;DR:** DuckDB offers several extensions to the SQL syntax. For a full list of these features, see the [Friendly SQL documentation page](/docs/guides/sql_features/friendly_sql).

An elegant user experience is a key design goal of DuckDB. This goal guides much of DuckDB's architecture: it is simple to install, seamless to integrate with other data structures like Pandas, Arrow, and R Dataframes, and requires no dependencies. Parallelization occurs automatically, and if a computation exceeds available memory, data is gracefully buffered out to disk. And of course, DuckDB's processing speed makes it easier to get more work accomplished.
However, SQL is not famous for being user-friendly. DuckDB aims to change that! DuckDB includes both a Relational API for dataframe-style computation, and a highly Postgres-compatible version of SQL. If you prefer dataframe-style computation, we would love your feedback on [our roadmap](https://github.com/duckdb/duckdb/issues/2000). If you are a SQL fan, read on to see how DuckDB is bringing together both innovation and pragmatism to make it easier to write SQL in DuckDB than anywhere else. Please reach out on [GitHub](https://github.com/duckdb/duckdb/discussions) or [Discord](https://discord.gg/vukK4xp7Rd) and let us know what other features would simplify your SQL workflows. Join us as we teach an old dog new tricks!
#### `SELECT * EXCLUDE`
A traditional SQL `SELECT` query requires that requested columns be explicitly specified, with one notable exception: the `*` wildcard. `SELECT *` allows SQL to return all relevant columns. This adds tremendous flexibility, especially when building queries on top of one another. However, we are often interested in *almost* all columns. In DuckDB, simply specify which columns to `EXCLUDE`:
```sql
SELECT * EXCLUDE (jar_jar_binks, midichlorians) FROM star_wars;
```
Now we can save time repeatedly typing all columns, improve code readability, and retain flexibility as additional columns are added to underlying tables.
DuckDB's implementation of this concept can even handle exclusions from multiple tables within a single statement:
```sql
SELECT
sw.* EXCLUDE (jar_jar_binks, midichlorians),
ff.* EXCLUDE cancellation
FROM star_wars sw, firefly ff;
```
#### `SELECT * REPLACE`
Similarly, we often wish to use all of the columns in a table, aside from a few small adjustments. This would also prevent the use of `*` and require a list of all columns, including those that remain unedited. In DuckDB, easily apply changes to a small number of columns with `REPLACE`:
```sql
SELECT
* REPLACE (movie_count+3 AS movie_count, show_count*1000 AS show_count)
FROM star_wars_owned_by_disney;
```
This allows views, CTEs, or subqueries to be built on one another in a highly concise way, while remaining adaptable to new underlying columns.
#### `GROUP BY ALL`
A common cause of repetitive and verbose SQL code is the need to specify columns in both the `SELECT` clause and the `GROUP BY` clause. In theory this adds flexibility to SQL, but in practice it rarely adds value. DuckDB now offers the `GROUP BY` we all expected when we first learned SQL – just `GROUP BY ALL` columns in the `SELECT` clause that aren't wrapped in an aggregate function!
```sql
SELECT
systems,
planets,
cities,
cantinas,
sum(scum + villainy) AS total_scum_and_villainy
FROM star_wars_locations
GROUP BY ALL;
-- GROUP BY systems, planets, cities, cantinas
```
Now changes to a query can be made in only one place instead of two! Plus this prevents many mistakes where columns are removed from a `SELECT` list, but not from the `GROUP BY`, causing duplication.
Not only does this dramatically simplify many queries, it also makes the above `EXCLUDE` and `REPLACE` clauses useful in far more situations. Imagine if we wanted to adjust the above query by no longer considering the level of scum and villainy in each specific cantina:
```sql
SELECT
* EXCLUDE (cantinas, booths, scum, villainy),
sum(scum + villainy) AS total_scum_and_villainy
FROM star_wars_locations
GROUP BY ALL;
-- GROUP BY systems, planets, cities
```
Now that is some concise and flexible SQL! How many of your `GROUP BY` clauses could be re-written this way?
#### `ORDER BY ALL`
Another common cause for repetition in SQL is the `ORDER BY` clause. DuckDB and other RDBMSs have previously tackled this issue by allowing queries to `ORDER BY` column positions (for example, `ORDER BY 1, 2, 3`). However, frequently the goal is to order by all columns in the query from left to right, and maintaining that numeric list when adding or subtracting columns can be error prone. In DuckDB, simply `ORDER BY ALL`:
```sql
SELECT
age,
sum(civility) AS total_civility
FROM star_wars_universe
GROUP BY ALL
ORDER BY ALL;
-- ORDER BY age, total_civility
```
This is particularly useful when building summaries, as many other client tools automatically sort results in this manner. DuckDB also supports `ORDER BY ALL DESC` to sort each column in reverse order, and options to specify `NULLS FIRST` or `NULLS LAST`.
#### Column Aliases in `WHERE` / `GROUP BY` / `HAVING`
In many SQL dialects, it is not possible to use an alias defined in a `SELECT` clause anywhere but in the `ORDER BY` clause of that statement. This commonly leads to verbose CTEs or subqueries in order to utilize those aliases. In DuckDB, a non-aggregate alias in the `SELECT` clause can be immediately used in the `WHERE` and `GROUP BY` clauses, and aggregate aliases can be used in the `HAVING` clause, even at the same query depth. No subquery needed!
```sql
SELECT
only_imperial_storm_troopers_are_so_precise AS nope,
turns_out_a_parsec_is_a_distance AS very_speedy,
sum(mistakes) AS total_oops
FROM oops
WHERE
nope = 1
GROUP BY
nope,
very_speedy
HAVING
total_oops > 0;
```
#### Case Insensitivity While Maintaining Case
DuckDB allows queries to be case insensitive, while maintaining the specified case as data flows into and out of the system. This simplifies queries within DuckDB while ensuring compatibility with external libraries.
```sql
CREATE TABLE mandalorian AS SELECT 1 AS "THIS_IS_THE_WAY";
SELECT this_is_the_way FROM mandalorian;
```
| THIS_IS_THE_WAY |
|----------------:|
| 1 |
#### Friendly Error Messages
Regardless of expertise, and despite DuckDB's best efforts to understand our intentions, we all make mistakes in our SQL queries. Many RDBMSs leave you trying to use the force to detect an error. In DuckDB, if you make a typo on a column or table name, you will receive a helpful suggestion about the most similar name. Not only that, you will receive an arrow that points directly to the offending location within your query.
```sql
SELECT * FROM star_trek;
```
```console
Error: Catalog Error: Table with name star_trek does not exist!
Did you mean "star_wars"?
LINE 1: SELECT * FROM star_trek;
^
```
(Don't worry, ducks and duck-themed databases still love some Trek as well).
DuckDB's suggestions are even context specific. Here, we receive a suggestion to use the most similar column from the table we are querying.
```sql
SELECT long_ago FROM star_wars;
```
```console
Error: Binder Error: Referenced column "long_ago" not found in FROM clause!
Candidate bindings: "star_wars.long_long_ago"
LINE 1: SELECT long_ago FROM star_wars;
^
```
#### String Slicing
Even as SQL fans, we know that SQL can learn a thing or two from newer languages. Instead of using bulky `SUBSTRING` functions, you can slice strings in DuckDB using bracket syntax. As a note, SQL is required to be 1-indexed, so that is a slight difference from other languages (although it keeps DuckDB internally consistent and similar to other DBs).
```sql
SELECT 'I love you! I know'[:-3] AS nearly_soloed;
```
| nearly_soloed |
|:---|
| I love you! I k |
#### Simple List and Struct Creation
DuckDB provides nested types to allow more flexible data structures than the purely relational model would allow, while retaining high performance. To make them as easy as possible to use, creating a `LIST` (array) or a `STRUCT` (object) uses simpler syntax than other SQL systems. Data types are automatically inferred.
```sql
SELECT
['A-Wing', 'B-Wing', 'X-Wing', 'Y-Wing'] AS starfighter_list,
{name: 'Star Destroyer', common_misconceptions: 'Can''t in fact destroy a star'} AS star_destroyer_facts;
```
#### List Slicing
Bracket syntax may also be used to slice a `LIST`. Again, note that this is 1-indexed for SQL compatibility.
```sql
SELECT
starfighter_list[2:2] AS dont_forget_the_b_wing
FROM (SELECT ['A-Wing', 'B-Wing', 'X-Wing', 'Y-Wing'] AS starfighter_list);
```
| dont_forget_the_b_wing |
|:---|
| [B-Wing] |
#### Struct Dot Notation
Use convenient dot notation to access the value of a specific key in a DuckDB `STRUCT` column. If keys contain spaces, double quotes can be used.
```sql
SELECT
planet.name,
planet."Amount of sand"
FROM (SELECT {name: 'Tatooine', 'Amount of sand': 'High'} AS planet);
```
#### Trailing Commas
Have you ever removed your final column from a SQL `SELECT` and been met with an error, only to find you needed to remove the trailing comma as well!? Never? Ok, Jedi... On a more serious note, this feature is an example of DuckDB's responsiveness to the community. In under 2 days from seeing this issue in a tweet (not even about DuckDB!), this feature was already built, tested, and merged into the primary branch. You can include trailing commas in many places in your query, and we hope this saves you from the most boring but frustrating of errors!
```sql
SELECT
x_wing,
proton_torpedoes,
--targeting_computer
FROM luke_whats_wrong
GROUP BY
x_wing,
proton_torpedoes,
;
```
#### Function Aliases from Other Databases
For many functions, DuckDB supports multiple names in order to align with other database systems. After all, ducks are pretty versatile – they can fly, swim, and walk! Most commonly, DuckDB supports PostgreSQL function names, but many SQLite names are supported, as well as some from other systems. If you are migrating your workloads to DuckDB and a different function name would be helpful, please reach out – they are very easy to add as long as the behavior is the same! See our [functions documentation](#docs:stable:sql:functions:overview) for details.
```sql
SELECT
'Use the Force, Luke'[:13] AS sliced_quote_1,
substr('I am your father', 1, 4) AS sliced_quote_2,
substring('Obi-Wan Kenobi, you''re my only hope', 17, 100) AS sliced_quote_3;
```
#### Auto-Increment Duplicate Column Names
As you are building a query that joins similar tables, you'll often encounter duplicate column names. If the query is the final result, DuckDB will simply return the duplicated column names without modifications. However, if the query is used to create a table, or nested in a subquery or Common Table Expression (where duplicate columns are forbidden by other databases!), DuckDB will automatically assign new names to the repeated columns to make query prototyping easier.
```sql
SELECT
*
FROM (
SELECT
s1.tie_fighter,
s2.tie_fighter
FROM squadron_one s1
CROSS JOIN squadron_two s2
) theyre_coming_in_too_fast;
```
| tie_fighter | tie_fighter:1 |
|:---|:---|
| green_one | green_two |
#### Implicit Type Casts
DuckDB believes in using specific data types for performance, but attempts to automatically cast between types whenever necessary. For example, when joining between an integer and a varchar, DuckDB will automatically cast them to be the same type and complete the join successfully. A `List` or `IN` expression may also be created with a mixture of types, and they will be automatically cast as well. Also, `INTEGER` and `BIGINT` are interchangeable, and thanks to DuckDB's new storage compression, a `BIGINT` usually doesn't even take up any extra space! Now you can store your data as the optimal data type, but use it easily for the best of both!
```sql
CREATE TABLE sith_count_int AS SELECT 2::INTEGER AS sith_count;
CREATE TABLE sith_count_varchar AS SELECT 2::VARCHAR AS sith_count;
SELECT
*
FROM sith_count_int s_int
JOIN sith_count_varchar s_char
ON s_int.sith_count = s_char.sith_count;
```
| sith_count | sith_count |
|---:|---:|
| 2 | 2 |
#### Other Friendly Features
There are many other features of DuckDB that make it easier to analyze data with SQL!
DuckDB [makes working with time easier in many ways](https://duckdb.org/2022/01/06/time-zones), including by accepting multiple different syntaxes (from other databases) for the [`INTERVAL` data type](#docs:stable:sql:data_types:interval) used to specify a length of time.
DuckDB also implements multiple SQL clauses outside of the traditional core clauses including the [`SAMPLE` clause](#docs:stable:sql:query_syntax:sample) for quickly selecting a random subset of your data and the [`QUALIFY` clause](#docs:stable:sql:query_syntax:qualify) that allows filtering of the results of window functions (much like a `HAVING` clause does for aggregates).
The [`DISTINCT ON` clause](#docs:stable:sql:statements:select) allows DuckDB to select unique combinations of a subset of the columns in a `SELECT` clause, while returning the first row of data for columns not checked for uniqueness.
#### Ideas for the Future
In addition to what has already been implemented, several other improvements have been suggested. Let us know if one would be particularly useful – we are flexible with our roadmap! If you would like to contribute, we are very open to PRs and you are welcome to reach out on [GitHub](https://github.com/duckdb/duckdb) or [Discord](https://discord.gg/vukK4xp7Rd) ahead of time to talk through a new feature's design.
- Choose columns via regex
- Decide which columns to select with a pattern rather than specifying columns explicitly
- ClickHouse supports this with the [`COLUMNS` expression](https://clickhouse.com/docs/en/sql-reference/statements/select/#columns-expression)
- Incremental column aliases
- Refer to previously defined aliases in subsequent calculated columns rather than re-specifying the calculations
- Dot operators for JSON types
- The JSON extension is brand new ([see our documentation!](#docs:stable:data:json:overview)) and already implements friendly `->` and `->>` syntax
Thanks for checking out DuckDB! May the Force be with you...
## Range Joins in DuckDB
**Publication date:** 2022-05-27
**Author:** Richard Wesley
**TL;DR:** DuckDB has fully parallelized range joins that can efficiently join millions of range predicates.
Range intersection joins are an important operation in areas such as
[temporal analytics](https://www2.cs.arizona.edu/~rts/tdbbook.pdf),
and occur when two inequality conditions are present in a join predicate.
Database implementations often rely on slow `O(N^2)` algorithms that compare every pair of rows
for these operations.
Instead, DuckDB leverages its fast sorting logic to implement two highly optimized parallel join operators
for these kinds of range predicates, resulting in 20–30× faster queries.
With these operators, DuckDB can be used effectively in more time-series-oriented use cases.
#### Introduction
Joining tables row-wise is one of the fundamental and distinguishing operations of the relational model.
A join connects two tables horizontally using some Boolean condition called a _predicate_.
This sounds straightforward, but how fast the join can be performed depends on the expressions in the predicate.
This has led to the creation of different join algorithms that are optimized for different predicate types.
In this post, we will explain several join algorithms and their capabilities.
In particular, we will describe a newly added "range join" algorithm
that makes connecting tables on overlapping time intervals or multiple ordering conditions much faster.
##### Flight Data
No, this part isn't about ducks, but about air group flight statistics from the Battlestar Galactica reboot.
We have a couple of tables we will be using: `Pilots`, `Crafts`, `Missions` and `Battles`.
Some data was lost when the fleet dispersed, but hopefully this is enough to provide some "real life" examples!
The `Pilots` table contains the pilots and their data that does not change (name, call sign, serial number):
| id | callsign | name | serial |
| --: | :------- | :--------------- | -----: |
| 1 | Apollo | Lee Adama | 234567 |
| 2 | Starbuck | Kara Thrace | 462753 |
| 3 | Boomer | Sharon Valeri | 312743 |
| 4 | Kat | Louanne Katraine | 244977 |
| 5 | Hotdog | Brendan Costanza | 304871 |
| 6 | Husker | William Adama | 204971 |
| ... | ... | ... | ... |
The `Crafts` table contains all the various fighting craft
(ignoring the ["Ship Of Theseus"](https://en.wikipedia.org/wiki/Ship_of_Theseus) problem of recycled parts!):
| id | type | tailno |
| --: | :-------- | :----- |
| 1 | Viper | N7242C |
| 2 | Viper | 2794NC |
| 3 | Raptor | 312 |
| 4 | Blackbird | N9999C |
| ... | ... | ... |
The `Missions` table contains all the missions flown by pilots.
Missions have a `begin` and `end` time logged with the flight deck.
We will use some common pairings
(and an unusual mission at the end where Commander Adama flew his old Viper):
| pid | cid | begin | end |
| --: | --: | :------------------ | :------------------ |
| 2 | 2 | 3004-05-04 13:22:12 | 3004-05-04 15:05:49 |
| 1 | 2 | 3004-05-04 10:00:00 | 3004-05-04 18:19:12 |
| 3 | 3 | 3004-05-04 13:33:52 | 3004-05-05 19:12:21 |
| 6 | 1 | 3008-03-20 08:14:37 | 3008-03-20 10:21:15 |
| ... | ... | ... | ... |
The `Battles` table contains the time window of each
battle with the Cylons.
| battle | begin | end |
| :------------------- | :------------------ | :------------------ |
| Fall of the Colonies | 3004-05-04 13:21:45 | 3004-05-05 02:47:16 |
| Red Moon | 3004-05-28 07:55:27 | 3004-05-28 08:12:19 |
| Tylium Asteroid | 3004-06-09 09:00:00 | 3004-06-09 11:14:29 |
| Resurrection Ship | 3004-10-28 22:00:00 | 3004-10-28 23:47:05 |
| ... | ... | ... |
These last two tables (`Missions` and `Battles`) are examples of _state tables_.
An object in a state table has a state that runs between two time points.
For the battles, the state is just yes/no.
For the missions, the state is a pilot/craft combination.
##### Equality Predicates
The most common type of join involves comparing one or more pairs of expressions for equality,
often a primary key and a foreign key.
For example, if we want a list of the craft flown by the pilots,
we can join the `Pilots` table to the `Crafts` table through the `Missions` table:
```sql
SELECT callsign, count(*), tailno
FROM Pilots p, Missions m, Crafts c
WHERE p.id = m.pid
AND c.id = m.cid
GROUP BY ALL
ORDER BY 2 DESC;
```
This will give us a table like:
| callsign | count(\*) | tailno |
| :------- | --------: | :----- |
| Starbuck | 127 | 2794NC |
| Boomer | 55 | R1234V |
| Apollo | 3 | N7242C |
| Husker | 1 | N7242C |
| ... | ... | ... |
##### Range Predicates
The thing to notice in this example is that the conditions joining the tables are equalities connected with `AND`s.
But relational joins can be defined using _any_ Boolean predicate – even ones without equality or `AND`.
One common operation in temporal databases is intersecting two state tables.
Suppose we want to find the time intervals when each pilot was engaged in combat
so we can compute combat hours for seniority?
Vipers are launched quickly, but not before the battle has started,
and there can be malfunctions or pilots may be delayed getting to the flight deck.
```sql
SELECT callsign, battle,
greatest(m.begin, b.begin) AS begin,
least(m.end, b.end) AS end
FROM Pilots p, Missions m, Crafts c, Battles b
WHERE m.begin < b.end
AND b.begin < m.end
AND p.id = m.pid
AND c.id = m.cid;
```
This join creates a set of records containing the call sign and period in combat for each pilot.
It handles the case where a pilot returns for a new craft, excludes patrol flights,
and even handles the situation when a patrol flight turns into combat!
This is because intersecting state tables this way produces a _joint state table_ –
an important temporal database operation.
Here are a few rows from the result:
| callsign | battle | begin | end |
| :------- | :------------------- | :------------------ | :------------------ |
| Starbuck | Fall of the Colonies | 3004-05-04 13:22:12 | 3004-05-04 15:05:49 |
| Apollo | Fall of the Colonies | 3004-05-04 13:21:45 | 3004-05-04 18:19:12 |
| Boomer | Fall of the Colonies | 3004-05-04 13:33:52 | 3004-05-05 02:47:16 |
| ... | ... | ... | ... |
Apollo was already in flight when the first Cylon attack came,
so the query puts his `begin` time for the battle at the start of the battle,
not when he launched for the decommissioning flyby.
Starbuck and Boomer were scrambled after the battle started,
but Boomer did not return until after the battle was effectively over,
so her `end` time is moved back to the official end of the battle.
What is important here is that the join condition between the pilot/mission/craft relation
and the battle table has no equalities in it.
This kind of join is traditionally very expensive to compute,
but as we will see, there are ways of speeding it up.
##### Infinite Time
One common problem with populating state tables is how to represent the open edges.
For example, the begin time for the first state might not be known,
or the current state may not have ended yet.
Often such values are represented by `NULL`s,
but this complicates the intersection query because comparing with `NULL` yields `NULL`.
This issue can be worked around by using `coalesce(end, )`,
but that adds a computation to every row, most of which don't need it.
Another approach is to just use `` directly instead of the `NULL`,
which solves the expression computation problem but introduces an arbitrary time value.
This value may give strange results when used in computations.
DuckDB provides a third alternative from Postgres that can be used for these situations:
[infinite time values](https://www.postgresql.org/docs/14/datatype-datetime.html#DATATYPE-DATETIME-SPECIAL-TABLE).
Infinite time values will compare as expected, but arithmetic with them will produce `NULL`s or infinities,
indicating that the computation is not well defined.
#### Common Join Algorithms
To see why these joins can be expensive, let's start by looking at the two most common join algorithms.
##### Hash Joins
Joins with at least one equality condition `AND`ed to the rest of the conditions are called _equi-joins_.
They are usually implemented using a hash table like this:
```python
# build side: hash every build row on its join key
hashes = {}
for b in build:
    hashes[b.pk] = b

# probe side: look up each probe row's key and emit the matching pairs
result = []
for p in probe:
    if p.fk in hashes:
        result.append((p, hashes[p.fk]))
```
The expressions from one side (the _build_ side) are computed and hashed,
then the corresponding expressions from the other side (the _probe_ side)
are looked up in the hash table and checked for a match.
We can modify this a bit when only _some_ of the `AND`ed conditions are equalities
by checking the other conditions once we find the equalities in the hash table.
The important point is that we can use a hash table to make the join run time `O(N)`.
This modification is a general technique that can be used with any join algorithm which reduces the possible matches.
##### Nested Loop Joins
Since relational joins can be defined using _any_ Boolean predicate – even one without equality or `AND` –
hash joins do not always work.
The join algorithm of last resort in these situations is called a _Nested Loop Join_ (or NLJ for short),
and consists of just comparing every row from the probe side with every row from the build side:
```python
result = []
for p in probe:
    for b in build:
        # evaluate the (arbitrary) join predicate for every pair of rows
        if compare(p, b):
            result.append((p, b))
```
This is `O(M x N)` in the number of rows, which can be very slow if the tables are large.
Even worse, most practical analytic queries (such as the combat hours example above)
will not return anything like this many results, so a lot of effort may be wasted.
But without an algorithm that is tuned for a kind of predicate,
this is what we would have to use.
#### Range Joins
When we have a range comparison (one of `<`, `<=`, `>`, `>=`) as one of the join conditions,
we can take advantage of the ordering it implies by sorting the input relations on some of the join conditions.
Sorting is `O(N log N)`, which suggests that this could be faster than an NLJ,
and indeed this turns out to be the case.
##### Piecewise Merge Join
Before the advent of hash joins, databases would often sort the join inputs to find matches.
For equi-joins, a repeated binary search would then find the matching values on the build side in `O(M log N)` time.
This is called a _Merge Join_, and it runs faster than `O(M x N)`, but not as fast as the `O(N)` time of a hash join.
Still, in the case where we have a single range comparison,
the binary search lets us find the first match for a probe value.
We can then find all the remaining matches by looking after the first one.
If we also sort the probe side, we can even know where to start the search for the next probe value
because it will be after where we found the previous value.
This is how _Piecewise Merge Join_ (PWMJ) works:
We sort the build side so that the values are ordered by the predicate (either `ASC` or `DESC`),
then sort each probe chunk the same way so we can quickly scan through sets of values to find possible matches.
This can be significantly faster than NLJ for these types of queries.
If there are more join conditions, we can then check the generated matches to make sure all conditions are met
because once again the sorting has significantly reduced the number of checks that have to be made.
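As a rough sketch of the idea (simplified to a single `probe < build` condition, and sorting the whole probe side at once rather than per chunk):
```python
from bisect import bisect_right

def piecewise_merge_join(probe_keys, build_keys):
    """Emit all (p, b) pairs with p < b, using sorted inputs and a moving start position."""
    build_sorted = sorted(build_keys)
    result = []
    start = 0                        # where the previous probe value found its first match
    for p in sorted(probe_keys):
        # first build value strictly greater than p; everything after it matches too
        start = bisect_right(build_sorted, p, lo=start)
        result.extend((p, b) for b in build_sorted[start:])
    return result

print(piecewise_merge_join([1, 3], [2, 3, 4]))       # [(1, 2), (1, 3), (1, 4), (3, 4)]
```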
##### Inequality Join (IEJoin)
For two range conditions (like the combat pay query), there are even faster algorithms available.
We have recently added a new join called [IEJoin](https://vldb.org/pvldb/vol8/p2074-khayyat.pdf),
which sorts on two predicates to really speed things up.
The way that IEJoin works is to first sort both tables on the values for the first condition
and merge the two sort keys into a combined table that tracks the two input tables' row numbers.
Next, it sorts the positions in the combined table on the second range condition.
It can then quickly scan for matches that pass both conditions.
And just like for hash joins, we can check any remaining conditions
because we have hopefully significantly reduced the number of pairs we have to test.
###### Walk Through
Because the algorithm is a bit tricky, let's step through a small example.
(If you are reading the paper, this is a simplified version of the "Union Arrays" optimization from §4.3,
but I find this version of the algorithm is much easier to understand than the version in §3.1.)
We are going to look at `Qp` from the paper, which is a self join on the table "West":
| West | t_id | time | cost | cores |
| :--- | ---: | ---: | ---: | ----: |
| s1 | 404 | 100 | 6 | 4 |
| s2 | 498 | 140 | 11 | 2 |
| s3 | 676 | 80 | 10 | 1 |
| s4 | 742 | 90 | 5 | 4 |
We are looking for pairs of billing ids where the second id had a shorter time than the first,
but a higher cost:
```sql
SELECT s1.t_id, s2.t_id AS t_id2
FROM west s1, west s2
WHERE s1.time > s2.time
AND s1.cost < s2.cost;
```
There are two pairs that meet this criteria:
| t_id | t_id2 |
| ---: | ----: |
| 404 | 676 |
| 742 | 676 |
(This is an example of another kind of double range query where we are looking for anomalies.)
First, we sort both input tables on the first condition key (`time`).
(We sort `DESC` because we want the values to satisfy the join condition (`>`) from left to right.)
Because they are sorted the same way,
we can merge the condition keys from the sorted tables into a new table called `L1`
after marking each row with the table it came from (using negative row numbers to indicate the right table):
| L1 | s2 | s2 | s1 | s1 | s4 | s4 | s3 | s3 |
| :--- | --: | --: | --: | --: | --: | --: | --: | --: |
| time | 140 | 140 | 100 | 100 | 90 | 90 | 80 | 80 |
| cost | 11 | 11 | 6 | 6 | 5 | 5 | 10 | 10 |
| rid | 1 | -1 | 2 | -2 | 3 | -3 | 4 | -4 |
The `rid` column lets us map rows in `L1` back to the original table.
Next, we build a second table `L2` with the second condition key (`cost`) and the row positions (`P`) of `L1`
(not the row numbers from the original tables!).
We sort `L2` on `cost` (`DESC` again this time because now we want the join condition to hold from right to left):
| L2 | s2 | s2 | s3 | s3 | s1 | s1 | s4 | s4 |
| :--- | --: | --: | --: | --: | --: | --: | --: | --: |
| cost | 11 | 11 | 10 | 10 | 6 | 6 | 5 | 5 |
| P | 0 | 1 | 6 | 7 | 2 | 3 | 4 | 5 |
The sorted column of `L1` row positions is called the _permutation array_,
and we can use it to find the corresponding position of the `time` value for a given `cost`.
At this point we have two tables (`L1` and `L2`),
each sorted on one of the join conditions and pointing back to the tables it was derived from.
Moreover, the sort orders have been chosen so that the condition holds from left to right
(resp. right to left).
Since the conditions are transitive,
this means that whenever we have a value that satisfies a condition at a point in the table,
it also satisfies it for everything to the right (resp. left)!
With this setup, we can scan `L2` from left to right
looking for rows that match both conditions using two indexes:
- `i` iterates across `L2` from left to right;
- `off2` tracks `i` and is used to identify `costs` that satisfy the join condition compared to `i`. (Note that for loose inequalities, this could be to the right of `i`);
We use a bitmap `B` to track which rows in `L1` that the `L2` scan
has already identified as satisfying the `cost` condition compared to the `L2` scan position `i`.
Because we only want matches between one left and one right row, we can skip matches where the `rid`s have the same sign.
To leverage this observation, we only process values of `i` that are in the left hand table (`rid[P[i]]` is positive),
and we only mark bits for rows in the right hand table (`rid[P[i]]` is negative).
In this example, the right side rows are the odd numbered values in `P` (which are conveniently also the odd values of `i`),
which makes them easy to track in the example.
For the other rows, here is what happens:
| i | off2 | cost[i] | cost[off2] | P[i] | rid[P[i]] | B | Result |
| --: | ---: | ------: | ---------: | ---: | --------: | :--------- | :--------- |
| 0 | 0 | 11 | 11 | 0 | 1 | `00000000` | [] |
| 2 | 0..2 | 10 | 11..10 | 6 | 4 | `01000000` | [] |
| 4 | 2..4 | 6 | 10..6 | 2 | 2 | `01000001` | [{s1, s3}] |
| 6 | 4..6 | 5 | 6..5 | 4 | 3 | `01010001` | [{s4, s3}] |
Whenever we find `cost`s that satisfy the condition to the left of the scan location (between `off2` and `i`),
we use `P[off2]` to mark the bits in `B` corresponding to those positions in `L1` that reference right side rows.
This records that the `cost` condition is satisfied for those rows.
Then whenever we have a position `P[i]` in `L1`,
we can scan `B` to the right to find values that also satisfy the `cost` condition.
This works because everything to the right of `P[i]` in `L1` satisfies the `time` condition
thanks to the sort order of `L1` and the transitivity of the comparison operations.
In more detail:
1. When `i` and `off2` are `0`, the `cost` condition `<` is not satisfied, so nothing happens;
1. When `i` is `1`, we are looking at a row from the right side of the join, so we skip it and move on;
1. When `i` is `2`, we are now looking at a row from the left side, so we bring `off2` forward until the `cost` condition fails, marking `B` where it succeeds at `P[1] = [1]`;
1. We then scan the `time` values in `L1` right from position `P[i=2] = 6` and find no matches in `B`;
1. When `i` is `4`, we bring `off2` forward again, marking `B` at `P[3] = [7]`;
1. We then scan `time` from position `2` and find matches at `[6,7]`, one of which (`7`) is from the right side table;
1. When `i` is `6`, we bring `off2` forward again, marking `B` at `P[5] = [3]`;
1. We then scan `time` from position `4` and again find matches at `[6,7]`;
1. Finally, when `i` runs off the end, we have no new `cost` values, so nothing happens;
What makes this fast is that we only have to check a few bits to find the matches.
When we do need to perform comparisons, we can use the fast radix comparison code from our sorting code,
which doesn't require special templated versions for every data type.
This not only reduces the code size and complexity, it "future-proofs" it against new data types.
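For the curious, here is a compact single-threaded Python rendering of the walk through (didactic only: the bitmap is scanned naively, ties are broken so that the strict inequalities hold, and none of the Bloom filter, exponential search, or parallel machinery described below is present):
```python
def ie_join(left, right, key1, key2):
    """Return pairs (l, r) with l[key1] > r[key1] AND l[key2] < r[key2]."""
    # L1: both sides merged and sorted on key1 DESC; among equal key1 values the
    # right side comes first, so any right-side row to the right of a left-side
    # row has a strictly smaller key1.
    l1 = [(row[key1], row[key2], 0, row) for row in right] + \
         [(row[key1], row[key2], 1, row) for row in left]
    l1.sort(key=lambda e: (-e[0], e[2]))

    # L2 / permutation array: L1 positions sorted on key2 DESC.
    perm = sorted(range(len(l1)), key=lambda p: -l1[p][1])

    matched = [False] * len(l1)      # bitmap B over L1 positions (right-side rows only)
    result = []
    off2 = 0
    for i in range(len(perm)):
        pos_i = perm[i]
        _, k2_i, side_i, row_i = l1[pos_i]
        if side_i == 0:
            continue                 # only probe from left-side rows
        # advance off2, marking right-side rows whose key2 is strictly larger
        while off2 < len(perm) and l1[perm[off2]][1] > k2_i:
            if l1[perm[off2]][2] == 0:
                matched[perm[off2]] = True
            off2 += 1
        # everything to the right of pos_i in L1 has a strictly smaller key1,
        # so every marked bit there satisfies both join conditions
        for q in range(pos_i + 1, len(l1)):
            if matched[q]:
                result.append((row_i, l1[q][3]))
    return result

west = [
    {"t_id": 404, "time": 100, "cost": 6},
    {"t_id": 498, "time": 140, "cost": 11},
    {"t_id": 676, "time": 80, "cost": 10},
    {"t_id": 742, "time": 90, "cost": 5},
]
print([(l["t_id"], r["t_id"]) for l, r in ie_join(west, west, "time", "cost")])
# [(404, 676), (742, 676)]
```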
###### Further Details
That walk through is a slightly simplified, single threaded version of the actual algorithm.
There are a few more details that may be of interest:
- Scanning large, mostly empty bit maps can be slow, so we use the Bloom filter optimization from §4.2.
- The published algorithm assumes that there are no duplicate `L1` values in either table. To handle the general case, we use an [exponential search](https://en.wikipedia.org/wiki/Exponential_search) to find the first `L1` value that satisfies the predicate with respect to the current position and scan right from that point;
- We also adapted the distributed Algorithm 3 from §5 by joining pairs of the sorted blocks generated by the sort code on separate threads. This allows us to fully parallelize the operator by first using parallel sorting and then by breaking up the join into independent pieces;
- Breaking up the pieces for parallel execution also allows us to spool join blocks that are not being processed to disk, making the join scalable.
#### Special Joins
One of the nice things about IEJoin is that it is very general and implements a number of more specialized join types reasonably efficiently.
For example, the state intersection query above is an example of an _interval join_
where we are looking to join on the intersection of two intervals.
Another specialized join that can be accelerated with `IEJoin` is a _band join_.
This can be used to join values that are "close" to each other:
```sql
SELECT r.id, s.id
FROM r, s
WHERE r.value - s.value BETWEEN a AND b;
```
This translates into a double inequality join condition:
```sql
SELECT r.id, s.id
FROM r, s
WHERE s.value + a <= r.value AND r.value <= s.value + b;
```
which is exactly the type of join expression that IEJoin handles.
#### Performance
So how fast is the IEJoin?
It is so fast that it is difficult to compare it to the previous range join algorithms
because the improvements are so large that the other algorithms do not complete in a reasonable amount of time!
##### Simple Measurements
To give an example, here are the run times for a 100K self join of some employee tax and salary data,
where the goal is to find the 1001 pairs of employees where one has a higher salary but the other has a higher tax rate:
```sql
SELECT
r.id,
s.id
FROM Employees r
JOIN Employees s
ON r.salary < s.salary
AND r.tax > s.tax;
```
| Algorithm | Time (s) |
| :-------- | -------: |
| NLJ | 21.440 |
| PWMJ | 38.698 |
| IEJoin | 0.280 |
Another example is a self join to find 3772 overlapping events in a 30K event table:
```sql
SELECT
r.id,
s.id
FROM events r
JOIN events s
ON r.start <= s.end
AND r.end >= s.start
AND r.id <> s.id;
```
| Algorithm | Time (s) |
| :-------- | -------: |
| NLJ | 6.985 |
| PWMJ | 4.780 |
| IEJoin | 0.226 |
In both cases we see performance improvements of 20-100×,
which is very helpful when you run a lot of queries like these!
##### Optimization Measurements
A third example demonstrates the importance of the join pair filtering and exponential search optimizations.
The data is a state table of
[library circulation data](https://www.opendata.dk/city-of-aarhus/transaktionsdata-fra-aarhus-kommunes-biblioteker)
from another [interval join paper](https://vldb.org/pvldb/vol10/p1346-bouros.pdf),
and the query is a point-in-period temporal query used to generate Figure 4d:
```sql
SELECT x, count(*) AS y
FROM books,
(SELECT x FROM range('2013-01-01'::TIMESTAMP, '2014-01-01'::TIMESTAMP, INTERVAL 1 DAY) tbl(x)) dates
WHERE checkout <= x AND x <= return
GROUP BY ALL
ORDER BY 1;
```
The result is a count of the number of books checked out at midnight on each day.
These are the runtimes on an 18 core iMac Pro:
| Improvement | Time | CPU |
| :---------- | -------: | ----: |
| Unoptimized | > 30 m | ~100% |
| Filtering | 119.76 s | 269% |
| Exponential | 11.21 s | 571% |
The query joins a 35M row table with a 365 row table, so most of the data comes from the left hand side.
By avoiding setting bits for the matching rows in the left table, we eliminate almost all `L1` checks.
This dramatically reduced the runtime and improved the CPU utilization.
The data also has a large number of rows corresponding to books that were checked out at the start of the year,
which all have the same `checkout` date.
Searching left linearly in the first block to find the first match for the scan
resulted in repeated runs of ~120K comparisons.
This caused the runtime to be completely dominated by processing the first block.
By reducing the number of comparisons for these rows from an average of ~60K to 16,
the runtime dropped by a factor of 10 and the CPU utilization doubled.
#### Conclusion and Feedback
In this blog post, we explained the new DuckDB range join improvements provided by the new IEJoin operator.
This should greatly improve the response time of state table joins and anomaly detection joins.
We hope this makes your DuckDB experience even better, and please let us know if you run into any problems!
Feel free to reach out on our [GitHub page](https://github.com/duckdb/duckdb), or our [Discord server](https://discord.gg/vukK4xp7Rd).
## Persistent Storage of Adaptive Radix Trees (ART) in DuckDB
**Publication date:** 2022-07-27
**Author:** Pedro Holanda
**TL;DR:** DuckDB uses Adaptive Radix Tree (ART) Indexes to enforce constraints and to speed up query filters. Up to this point, indexes were not persisted, causing issues like loss of indexing information and high reload times for tables with data constraints. We now persist ART Indexes to disk, drastically diminishing database loading times (up to orders of magnitude), and we no longer lose track of existing indexes. This blog post contains a deep dive into the implementation of ART storage, benchmarks, and future work. Finally, to better understand how our indexes are used, I'm asking you to answer the following [survey](https://forms.gle/eSboTEp9qpP7ybz98). It will guide us when defining our future roadmap.

DuckDB uses [ART Indexes](https://db.in.tum.de/~leis/papers/ART.pdf) to maintain primary key (PK), foreign key (FK), and unique constraints. They also speed up point queries, range queries (with high selectivity), and joins. Before the bleeding edge version (or v0.4.1, depending on when you are reading this post), DuckDB did not persist ART indexes on disk. When storing a database file, only the information about existing PKs and FKs would be stored, with all other indexes being transient and no longer existing when restarting the database. PKs and FKs would be fully reconstructed when reloading the database, creating the inconvenience of high loading times.
A lot of scientific work has been published regarding ART Indexes, most notably on [synchronization](https://db.in.tum.de/~leis/papers/artsync.pdf), [cache-efficiency](https://dbis.uibk.ac.at/sites/default/files/2018-06/hot-height-optimized.pdf), and [evaluation](https://bigdata.uni-saarland.de/publications/ARCD15.pdf). However, up to this point, no public work exists on serializing and buffer managing an ART Tree. [Some say](https://twitter.com/muehlbau/status/1548024479971807233) that Hyper, the database in Tableau, persists ART indexes, but again, there is no public information on how that is done.
This blog post will describe how DuckDB stores and loads ART indexes, in particular how the index is lazily loaded (i.e., an ART node is only loaded into memory when necessary). In the [ART Index Section](#::art-index), we go through what an ART Index is, how it works, and some examples. In the [ART in DuckDB Section](#::art-in-duckdb), we explain why we decided to use an ART index in DuckDB, where it is used, and the problems of not persisting ART indexes. In the [ART Storage Section](#::art-storage), we explain how we serialize and buffer manage ART Indexes in DuckDB. In the [Benchmarks Section](#::benchmarks), we compare DuckDB v0.4.0 (before ART storage) with the bleeding edge version of DuckDB. We demonstrate the difference in the loading costs of PKs and FKs in both versions and the differences between lazily loading an ART index and accessing a fully loaded ART Index. Finally, in the [Road Map section](#::roadmap), we discuss the drawbacks of our current implementation and our plans for future ART index work.
#### ART Index
Adaptive Radix Trees are, in essence, [Tries](https://en.wikipedia.org/wiki/Trie) that apply vertical and horizontal compression to create compact index structures.
##### [Trie](https://en.wikipedia.org/wiki/Trie)
Tries are tree data structures, where each tree level holds information on part of the dataset. They are commonly exemplified with strings. In the figure below, you can see a Trie representation of a table containing the strings "pedro", "paulo", and "peri". The root node represents the first character "p", with children "a" (from paulo) and "e" (from pedro and peri), and so on.

To perform lookups on a Trie, you must match each character of the key to the current level of the Trie. For example, if you search for pedro, you must check that the root contains the letter p. If it does, you check whether any of its children contains the letter e, and so on, until you reach a leaf node containing the pointer to the tuple that holds this string (see figure below).

The main advantage of Tries is that they have O(k) lookups, meaning that in the worst case, the lookup cost equals the length of the key.
In reality, Tries can also be used for numeric data types. However, storing them character by character, like strings, would be wasteful. Take, for example, the `UBIGINT` data type. Internally, `UBIGINT` is a `uint64_t`, which takes 64 bits (i.e., 8 bytes) of space. The maximum value of a `uint64_t` is `18,446,744,073,709,551,615`. Hence, if we represented it digit by digit, like in the example above, we would need 20 levels in the Trie. In practice, Tries are created with a bit fan-out, which determines how many bits are represented per level of the Trie. A `uint64_t` Trie with an 8-bit fan-out has a maximum of 8 levels, each representing one byte.
To have more realistic examples, from this point onwards, all depictions in this post will be with bit representations. In DuckDB, the fan-out is always 8 bits. However, for simplicity, the following examples in this blog post will have a fan-out of 2 bits.
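As a rough illustration of an 8-bit fan-out (hypothetical code, not DuckDB's internals), a 64-bit key is simply split into 8 bytes, and each byte selects the child at one level of the Trie, giving at most 8 levels per lookup:

```cpp
#include <array>
#include <cstdint>

// Split a 64-bit key into 8 bytes, most significant byte first, so that
// byte i selects the child pointer at level i of an 8-bit fan-out Trie.
static std::array<uint8_t, 8> key_bytes(uint64_t key) {
    std::array<uint8_t, 8> bytes {};
    for (int level = 0; level < 8; level++) {
        bytes[level] = static_cast<uint8_t>(key >> (56 - 8 * level));
    }
    return bytes;
}

// A lookup then walks at most one level per byte, i.e., O(k) with k = 8:
//
//     node = root;
//     for (uint8_t byte : key_bytes(key)) {
//         node = child_of(node, byte);   // hypothetical accessor
//         if (!node) return not_found;
//     }
```

The 2-bit examples in this post apply the same idea with 2-bit chunks per level instead of full bytes.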
In the example below, we have a Trie that indexes the values 7, 10, and 12. You can also see the binary representation of each value in the table next to them. Each node consists of the bits 0 and 1, with a pointer next to them. This pointer can either be set (represented by `*`) or null (represented by `Ø`). Similar to the string Trie we had before, each level of the Trie represents two bits, with the pointer next to these bits pointing to their children. Finally, the leaves point to the actual data.

One can quickly notice that this Trie representation is wasteful on two different fronts. First, many nodes only have one child (i.e., one path), which could be collapsed by vertical compression (i.e., a Radix Tree). Second, many nodes have null pointers, taking up space without holding any information, which could be resolved with horizontal compression.
##### Vertical Compression (i.e., [Radix Trees](https://en.wikipedia.org/wiki/Radix_tree))
The basic idea of vertical compression is that we collapse paths with nodes that only have one child. To support this, nodes store a prefix variable containing the collapsed path to that node. You can see a representation of this in the figure below. For example, one can see that the first four nodes have only one child. These nodes can be collapsed to the third node (i.e., the first one that bifurcates) as a prefix path. When performing lookups, the key must match all values included in the prefix path.

Below you can see the resulting Trie after vertical compression. This Trie variant is commonly known as a Radix Tree. Although a lot of space has already been saved with this variant, we still have many nodes with unset pointers.

##### Horizontal Compression (i.e., ART)
To fully understand the design decisions behind ART indexes, we must first extend the 2-bit fan-out to 8 bits, the fan-out commonly used in database systems.

Below you can see the same nodes as before in a Trie node with an 8-bit fan-out. Such a node stores 256 (2^8) pointers, with the key being the array position of the pointer. In the case depicted by this example, we have a node of 2048 bytes (256 pointers * 8 bytes) while only actually utilizing 24 bytes (3 pointers * 8 bytes), which means that 2024 bytes are entirely wasted. To avoid this situation, ART indexes are composed of 4 different node types, chosen depending on how full the current node is. Below I quickly describe each node type with a graphical representation. In the graphical representation, I present a conceptual visualization of the node and an example with keys 0, 4, and 255.
**Node 4**: Node 4 holds up to 4 different keys. Each key is one byte and is stored in a key array, with one pointer per key, giving a total size of 36 bytes (4\*1 + 4\*8). Note that the pointer array is aligned with the key array (e.g., key 0 is in position 0 of the key array, hence its pointer is in position 0 of the pointer array).

**Node 16**: Node 16 holds up to 16 different keys. As in Node 4, each key is stored in a one-byte array, with one pointer per key, and the pointer array is aligned with the key array. Its total size is 144 bytes (16\*1 + 16\*8).

**Node 48**: Node 48 holds up to 48 different keys. When a key is present in this node, the position representing that key in a 256-entry one-byte array holds an index into the pointer array, which points to the child of that key. Its total size is 640 bytes (256\*1 + 48\*8). Note that the pointer array and the key array are no longer aligned: each key array entry gives the position in the pointer array where that key's pointer is stored (e.g., entry 255 of the key array is set to 2 because position 2 of the pointer array points to the child for that key).

**Node 256**: Node 256 holds up to 256 different keys, hence all possible values of a key byte. It only has a pointer array: if a pointer is set, the key exists, and the pointer leads to its child. Its total size is 2048 bytes (256 pointers * 8 bytes).

For the example in the previous section, we could use a `Node 4` instead of a `Node 256` to store the keys, since we only have 3 keys present. Hence it would look like the following:

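To make the four layouts more concrete, here is a rough C++ sketch of the node variants (the field names are illustrative, not DuckDB's internal classes); the sizes in the descriptions above correspond to the key and child arrays shown here:

```cpp
#include <cstdint>

struct ARTNode; // any of the four variants below (plus leaves)

// Up to 4 keys; key[i] and child[i] are aligned (4*1 + 4*8 = 36 bytes).
struct Node4 {
    uint8_t key[4];
    ARTNode *child[4];
};

// Up to 16 keys; same aligned layout (16*1 + 16*8 = 144 bytes).
struct Node16 {
    uint8_t key[16];
    ARTNode *child[16];
};

// Up to 48 keys; the 256-entry byte array maps a key byte to a slot in the
// pointer array (256*1 + 48*8 = 640 bytes).
struct Node48 {
    uint8_t child_index[256]; // e.g., child_index[255] == 2  =>  child[2]
    ARTNode *child[48];
};

// All 256 possible key bytes; a set pointer means the key exists
// (256 * 8 = 2048 bytes).
struct Node256 {
    ARTNode *child[256];
};
```

A node starts out as a Node 4 and grows into the larger variants as keys are added, which is what keeps the structure compact.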
#### ART in DuckDB
When considering which index structure to implement in DuckDB, we wanted a structure that could be used to enforce PK/FK/Unique constraints while also being able to speed up range queries and joins. Database systems commonly implement [Hash-Tables](https://en.wikipedia.org/wiki/Hash_table) for constraint checks and [BP-Trees](https://en.wikipedia.org/wiki/B%2B_tree) for range queries. However, we saw in ART indexes an opportunity to diminish code complexity by having one data structure for both use cases. The main characteristics of the ART index that we take advantage of are:
1. Compact Structure. Since the ART internal nodes are rather small, they can fit in CPU caches, being a more cache-conscious structure than BP-Trees.
2. Fast Point Queries. The worst case for an ART point query is O(k), which is sufficiently fast for constraint checking.
3. No dramatic regression on insertions. Many Hash-Table variants must be rebuilt when they reach a certain size. In practice, one insert might cause a significant regression in time, with a query suddenly taking orders of magnitude more time to complete, with no apparent reason for the user. In the ART, inserts might cause node growths (e.g., a Node 4 might grow to a Node 16), but these are inexpensive.
4. Ability to run range queries. Although the ART does not run range queries as fast as BP-Trees, since it must perform tree traversals where the BP-Tree can scan leaf nodes sequentially, it still presents an advantage over hash tables, where these types of queries are not really possible (some might argue that you can use Hash Tables for range queries, but meh). This allows us to efficiently use ART for highly selective range queries and index joins.
5. Maintainability. Using one structure for both constraint checks and range queries instead of two is more code efficient and maintainable.
##### What Is It Used For?
As mentioned previously, ART indexes are mainly used in DuckDB on four fronts.
1. Data Constraints. Primary Key, Foreign Key, and Unique constraints are all maintained by an ART index. Inserting a tuple into a table with such a constraint effectively performs an insertion into the ART index, which fails if the key already exists.
```sql
CREATE TABLE integers (i INTEGER PRIMARY KEY);
-- Insert unique values into ART
INSERT INTO integers VALUES (3), (2);
-- Insert conflicting value in ART will fail
INSERT INTO integers VALUES (3);
CREATE TABLE fk_integers (j INTEGER, FOREIGN KEY (j) REFERENCES integers(i));
-- This insert works normally
INSERT INTO fk_integers VALUES (2), (3);
-- This fails after checking the ART in integers
INSERT INTO fk_integers VALUES (4);
```
2. Range Queries. Highly selective range queries on indexed columns will also use the ART index underneath.
```sql
CREATE TABLE integers (i INTEGER PRIMARY KEY);
-- Insert unique values into ART
INSERT INTO integers VALUES (3), (2), (1), (8) , (10);
-- Range queries (if highly selective) will also use the ART index
SELECT * FROM integers WHERE i >= 8;
```
3. Joins. Joins with a small number of matches will also utilize existing ART indexes.
```sql
-- Optionally you can always force index joins with the following pragma
PRAGMA force_index_join;
CREATE TABLE t1 (i INTEGER PRIMARY KEY);
CREATE TABLE t2 (i INTEGER PRIMARY KEY);
-- Insert unique values into ART
INSERT INTO t1 VALUES (3), (2), (1), (8), (10);
INSERT INTO t2 VALUES (3), (2), (1), (8), (10);
-- Joins will also use the ART index
SELECT * FROM t1 INNER JOIN t2 ON (t1.i = t2.i);
```
4. Indexes over expressions. ART indexes can also be used to quickly look up expressions.
```sql
CREATE TABLE integers (i INTEGER, j INTEGER);
INSERT INTO integers VALUES (1, 1), (2, 2), (3, 3);
-- Creates index over the i + j expression
CREATE INDEX i_index ON integers USING ART((i + j));
-- Uses ART index point query
SELECT i FROM integers WHERE i + j = 2;
```
#### ART Storage
There are two main constraints when storing ART indexes:
1. The index must be stored in an order that allows for lazy-loading. Otherwise, we would have to fully load the index, including nodes that might be unnecessary to queries that would be executed in that session.
2. It must not increase the node size. Otherwise, we diminish the cache-conscious effectiveness of the ART index.
##### Post-Order Traversal
To allow for lazy loading, we must first store all children of a node and collect the information of where each child was stored; then, when storing the node itself, we record the on-disk position of each of its children. To perform this type of operation, we do a post-order traversal.
The post-order traversal is shown in the figure below. The red circles represent the order in which the nodes are stored. If we start from the root node (i.e., the Node 4 with storage order 10), we must first store both of its children (i.e., the Node 16 with storage order 8 and the Leaf with storage order 9). This goes on recursively for each of its children.

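A minimal sketch of this idea (hypothetical code, not DuckDB's serializer): each node first serializes its children, collects the block/offset positions they were written to, and only then writes its own record containing those positions:

```cpp
#include <cstdint>
#include <vector>

// Where a serialized node lives on disk: a block id plus a byte offset.
struct BlockPointer {
    uint32_t block_id;
    uint32_t offset;
};

struct Node {
    std::vector<Node *> children;
    // ... keys, prefix, leaf payload, etc.
};

// Toy writer: pretends every record lands in block 0 at increasing offsets.
struct MetadataWriter {
    uint32_t next_offset = 0;
    BlockPointer Write(const Node & /*node*/, const std::vector<BlockPointer> &child_pointers) {
        BlockPointer position {0, next_offset};
        next_offset += 64 + 8 * static_cast<uint32_t>(child_pointers.size());
        return position;
    }
};

// Post-order serialization: children first, then the node itself, so the
// node's record can embed the on-disk position of every child.
BlockPointer Serialize(Node &node, MetadataWriter &writer) {
    std::vector<BlockPointer> child_pointers;
    child_pointers.reserve(node.children.size());
    for (Node *child : node.children) {
        child_pointers.push_back(Serialize(*child, writer));
    }
    return writer.Write(node, child_pointers);
}
```

Because every child is written before its parent, the root is written last, and its position is what ends up in the index metadata described below.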
The figure below shows an actual representation of what this looks like in DuckDB's block format. In DuckDB, data is stored in 256 kB contiguous blocks, with some blocks reserved for metadata and some for actual data. Each block is identified by an `id`. To allow for navigation within a block, blocks are addressed by byte offsets; hence, each block contains 256,000 different offsets.

In this example, `Block 0` stores some of our database metadata. In particular, between offsets 100,000 and 100,200 we store the information pertinent to one ART index. This includes information about the index (e.g., name, constraints, expression) and the block and offset position of its root node.
For example, let's assume we are doing a lookup of a key whose `row_ids` are stored in the Leaf with storage order 1. We would start by loading the ART root node from the block and offset recorded in the index metadata, then follow the stored child positions down the tree, loading only the nodes on the path to that leaf.
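The lazy-loading side can be sketched similarly (again hypothetical code): a child slot holds either an in-memory node or the block/offset it was serialized to, and it is only deserialized the first time a lookup needs to descend into it:

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// On-disk position of a serialized node.
struct BlockPointer {
    uint32_t block_id = 0;
    uint32_t offset = 0;
};

struct Node; // defined below

// A child slot that is either already in memory or still on disk.
struct ChildReference {
    std::unique_ptr<Node> in_memory; // set once the node has been loaded
    BlockPointer on_disk;            // where to deserialize it from otherwise
    Node &Resolve();                 // loads the node on first access
};

struct Node {
    // ... keys and prefix would live here; a real ART node has up to 256 slots
    std::vector<ChildReference> children;
};

// Toy loader: a real implementation would pin the block through the buffer
// manager and deserialize the node at `on_disk.offset`; here we just
// materialize an empty node to show the control flow.
Node &ChildReference::Resolve() {
    if (!in_memory) {
        in_memory = std::make_unique<Node>();
    }
    return *in_memory;
}
```

This is what keeps a lookup from touching any block that does not lie on its root-to-leaf path.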