Apache DataFusion Blog
https://datafusion.apache.org/blog/
Fri, 20 Mar 2026 00:00:00 +0000

Turning LIMIT into an I/O Optimization: Inside DataFusion’s Multi-Layer Pruning Stack
https://datafusion.apache.org/blog/2026/03/20/limit-pruning
<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <style> figure { margin: 20px 0; } figure img { display: block; max-width: 80%; margin: auto; } figcaption { font-style: italic; color: #555; font-size: 0.9em; max-width: 80%; margin: auto; text-align: center; } </style> <p><em>Xudong Wang, <a href="https://www.massive.com/">Massive</a></em></p> <p>Reading data efficiently means touching as little data as possible. The fastest I/O is the I/O you never make …</p>
xudong
Fri, 20 Mar 2026 00:00:00 +0000
tag:datafusion.apache.org,2026-03-20:/blog/2026/03/20/limit-pruning
blog

Optimizing SQL CASE Expression Evaluation
https://datafusion.apache.org/blog/2026/02/02/datafusion_case
<style> figure { margin: 20px 0; } figure img { display: block; max-width: 80%; margin: auto; } figcaption { font-style: italic; color: #555; font-size: 0.9em; max-width: 80%; margin: auto; text-align: center; } </style> <p>SQL's <code>CASE</code> expression is one of the few explicit conditional evaluation constructs the language provides. It allows you to control which expression from a …</p>
Pepijn Van Eeckhoudt
Mon, 02 Feb 2026 00:00:00 +0000
tag:datafusion.apache.org,2026-02-02:/blog/2026/02/02/datafusion_case
blog

Apache DataFusion Comet 0.13.0 Release
https://datafusion.apache.org/blog/2026/01/30/datafusion-comet-0.13.0
<p>The Apache DataFusion PMC is pleased to announce version 0.13.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>This release covers approximately eight weeks of development …</p>
pmc
Fri, 30 Jan 2026 00:00:00 +0000
tag:datafusion.apache.org,2026-01-30:/blog/2026/01/30/datafusion-comet-0.13.0
blog

Apache DataFusion 52.0.0 Released
https://datafusion.apache.org/blog/2026/01/12/datafusion-52.0.0
<p>We are proud to announce the release of <a href="https://crates.io/crates/datafusion/52.0.0">DataFusion 52.0.0</a>. This post highlights some of the major improvements since <a href="https://datafusion.apache.org/blog/2025/11/25/datafusion-51.0.0/">DataFusion 51.0.0</a>. The complete list of changes is available in the <a href="https://github.com/apache/datafusion/blob/branch-52/dev/changelog/52.0.0.md">changelog</a>.
Thanks to the <a href="https://github.com/apache/datafusion/blob/branch-52/dev/changelog/52.0.0.md#credits">121 contributors</a> for making this release possible.</p> <h2 id="performance-improvements">Performance Improvements 🚀<a class="headerlink" href="#performance-improvements" title="Permanent link">¶</a></h2> <p>We continue to …</p>
pmc
Mon, 12 Jan 2026 00:00:00 +0000
tag:datafusion.apache.org,2026-01-12:/blog/2026/01/12/datafusion-52.0.0
blog

Extending SQL in DataFusion: from ->> to TABLESAMPLE
https://datafusion.apache.org/blog/2026/01/12/extending-sql
<p>If you embed <a href="https://datafusion.apache.org/">DataFusion</a> in your product, your users will eventually run SQL that DataFusion does not recognize.
Not because the query is unreasonable, but because SQL in practice includes many dialects and system-specific statements.</p> <p>Suppose you store data as Parquet files on S3 and want users to attach an …</p>
Geoffrey Claude (Datadog)
Mon, 12 Jan 2026 00:00:00 +0000
tag:datafusion.apache.org,2026-01-12:/blog/2026/01/12/extending-sql
blog

Optimizing Repartitions in DataFusion: How I Went From Database Noob to Core Contribution
https://datafusion.apache.org/blog/2025/12/15/avoid-consecutive-repartitions
<div style="display: flex; align-items: center; gap: 20px; margin-bottom: 20px;"> <div style="flex: 1;"> Databases are some of the most complex yet interesting pieces of software. They are amazing pieces of abstraction: query engines optimize and execute complex plans, storage engines provide sophisticated infrastructure as the backbone of the system, while intricate file formats lay the groundwork for particular workloads.
All of this is …</div></div>
Gene Bordegaray
Mon, 15 Dec 2025 00:00:00 +0000
tag:datafusion.apache.org,2025-12-15:/blog/2025/12/15/avoid-consecutive-repartitions
blog

Apache DataFusion Comet 0.12.0 Release
https://datafusion.apache.org/blog/2025/12/04/datafusion-comet-0.12.0
<p>The Apache DataFusion PMC is pleased to announce version 0.12.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>This release covers approximately four weeks of development …</p>
pmc
Thu, 04 Dec 2025 00:00:00 +0000
tag:datafusion.apache.org,2025-12-04:/blog/2025/12/04/datafusion-comet-0.12.0
blog

Apache DataFusion 51.0.0 Released
https://datafusion.apache.org/blog/2025/11/25/datafusion-51.0.0
<h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2> <p>We are proud to announce the release of <a href="https://crates.io/crates/datafusion/51.0.0">DataFusion 51.0.0</a>. This post highlights some of the major improvements since <a href="https://datafusion.apache.org/blog/2025/09/29/datafusion-50.0.0/">DataFusion 50.0.0</a>. The complete list of changes is available in the <a href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>. Thanks to the <a href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128 contributors</a> for making this release possible.</p> <h2 id="performance-improvements">Performance Improvements 🚀<a class="headerlink" href="#performance-improvements" title="Permanent link">¶</a></h2> <p>We continue …</p>
pmc
Tue, 25 Nov 2025 00:00:00 +0000
tag:datafusion.apache.org,2025-11-25:/blog/2025/11/25/datafusion-51.0.0
blog

Apache DataFusion Comet 0.11.0 Release
https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0
<p>The Apache DataFusion PMC is pleased to announce version 0.11.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>This release covers approximately five weeks of development …</p>
pmc
Tue, 21 Oct 2025 00:00:00 +0000
tag:datafusion.apache.org,2025-10-21:/blog/2025/10/21/datafusion-comet-0.11.0
blog

Apache DataFusion 50.0.0 Released
https://datafusion.apache.org/blog/2025/09/29/datafusion-50.0.0
<!-- see https://github.com/apache/datafusion/issues/16347 for details --> <h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2> <p>We are proud to announce the release of <a href="https://crates.io/crates/datafusion/50.0.0">DataFusion 50.0.0</a>. This blog post highlights some of the major improvements since the release of <a href="https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0/">DataFusion 49.0.0</a>. The complete list of changes is available in the <a href="https://github.com/apache/datafusion/blob/branch-50/dev/changelog/50.0.0.md">changelog</a>. Thanks to <a href="https://github.com/apache/datafusion/blob/branch-50/dev/changelog/50.0.0.md#credits">numerous contributors</a> for making this release possible!</p> <h2 id="performance-improvements">Performance …</h2>
pmc
Mon, 29 Sep 2025 00:00:00 +0000
tag:datafusion.apache.org,2025-09-29:/blog/2025/09/29/datafusion-50.0.0
blog

Implementing User Defined Types and Custom Metadata in DataFusion
https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata
<p><a href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">Apache DataFusion</a> significantly improves support for user defined types and metadata. The user defined function APIs let users access metadata on the input columns to functions and produce metadata in the output.</p> <h2 id="user-defined-types-extension-types">User defined types == extension types<a class="headerlink" href="#user-defined-types-extension-types" title="Permanent link">¶</a></h2> <p>DataFusion directly uses <a href="https://arrow.apache.org">Apache Arrow</a>'s <a href="https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html">DataTypes</a> as its type system. This has …</p>
Tim Saucer (rerun.io), Dewey Dunnington (Wherobots), Andrew Lamb (InfluxData)
Sun, 21 Sep 2025 00:00:00 +0000
tag:datafusion.apache.org,2025-09-21:/blog/2025/09/21/custom-types-using-metadata
blog

Apache DataFusion Comet 0.10.0 Release
https://datafusion.apache.org/blog/2025/09/16/datafusion-comet-0.10.0
<p>The Apache DataFusion PMC is pleased to announce version 0.10.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>This release covers approximately ten weeks of development …</p>
pmc
Tue, 16 Sep 2025 00:00:00 +0000
tag:datafusion.apache.org,2025-09-16:/blog/2025/09/16/datafusion-comet-0.10.0
blog

Dynamic Filters: Passing Information Between Operators During Execution for 25x Faster Queries
https://datafusion.apache.org/blog/2025/09/10/dynamic-filters
<!-- diagrams source: https://docs.google.com/presentation/d/1FFYy27ydZdeFZWWuMjZGnYKUx9QNJfzuVLAH8AE5wlc/edit?slide=id.g364a74cba3d_0_92#slide=id.g364a74cba3d_0_92 Intended Audience: Query engine / data systems developers who want to learn about topk optimization Goal: Introduce TopK and dynamic filters as general optimization techniques for query engines, and how they were used to improve performance in DataFusion. --> <p>This blog post introduces the query engine optimization techniques called TopK and dynamic filters. We describe the motivating use case, how these optimizations work, and how we implemented them with the <a href="https://datafusion.apache.org/">Apache DataFusion</a> community to improve performance by an order of magnitude for some query patterns.</p> <h2 id="motivation-and-results">Motivation and Results<a class="headerlink" href="#motivation-and-results" title="Permanent link">¶</a></h2> <p>The …</p>
Adrian Garcia Badaracco (Pydantic), Andrew Lamb (InfluxData)
Wed, 10 Sep 2025 00:00:00 +0000
tag:datafusion.apache.org,2025-09-10:/blog/2025/09/10/dynamic-filters
blog

Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet
https://datafusion.apache.org/blog/2025/08/15/external-parquet-indexes
<!-- diagrams source https://docs.google.com/presentation/d/1e_Z_F8nt2rcvlNvhU11khF5lzJJVqNtqtyJ-G3mp4-Q --> <p>It is a common misconception that <a href="https://parquet.apache.org/">Apache Parquet</a> requires (slow) reparsing of metadata and is limited to indexing structures provided by the format.
In fact, caching parsed metadata and using custom external indexes along with Parquet's hierarchical data organization can significantly speed up query processing.</p> <p>In this blog, I describe …</p>
Andrew Lamb (InfluxData)
Fri, 15 Aug 2025 00:00:00 +0000
tag:datafusion.apache.org,2025-08-15:/blog/2025/08/15/external-parquet-indexes
blog

Apache DataFusion 49.0.0 Released
https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0
<!-- see https://github.com/apache/datafusion/issues/16347 for details --> <h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2> <p>We are proud to announce the release of <a href="https://crates.io/crates/datafusion/49.0.0">DataFusion 49.0.0</a>. This blog post highlights some of the major improvements since the release of <a href="https://datafusion.apache.org/blog/2025/07/18/datafusion-48.0.0/">DataFusion 48.0.0</a>.
The complete list of changes is available in the <a href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md">changelog</a>.</p> <h2 id="performance-improvements">Performance Improvements 🚀<a class="headerlink" href="#performance-improvements" title="Permanent link">¶</a></h2> <p>DataFusion continues to focus on enhancing performance, as …</p>
pmc
Mon, 28 Jul 2025 00:00:00 +0000
tag:datafusion.apache.org,2025-07-28:/blog/2025/07/28/datafusion-49.0.0
blog

Apache DataFusion 48.0.0 Released
https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0
<!-- see https://github.com/apache/datafusion/issues/16347 for details --> <p>We’re excited to announce the release of <strong>Apache DataFusion 48.0.0</strong>! As always, this version packs in a wide range of improvements and fixes. You can find the complete details in the full <a href="https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md">changelog</a>.
We’ll highlight the most important changes below and guide you through upgrading.</p> <h2 id="breaking-changes">Breaking …</h2>PMCWed, 16 Jul 2025 00:00:00 +0000tag:datafusion.apache.org,2025-07-16:/blog/2025/07/16/datafusion-48.0.0blogEmbedding User-Defined Indexes in Apache Parquet Fileshttps://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>It’s a common misconception that <a href="https://parquet.apache.org/">Apache Parquet</a> files are limited to basic Min/Max/Null Count statistics and Bloom filters, and that adding more advanced indexes requires changing the specification or creating a new file format. In fact, footer metadata and offset-based addressing already provide everything needed to embed …</p>Qi Zhu (Cloudera), Jigao Luo (Systems Group at TU Darmstadt), and Andrew Lamb (InfluxData)Mon, 14 Jul 2025 00:00:00 +0000tag:datafusion.apache.org,2025-07-14:/blog/2025/07/14/user-defined-parquet-indexesblogApache DataFusion 47.0.0 Releasedhttps://datafusion.apache.org/blog/2025/07/11/datafusion-47.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <!-- see https://github.com/apache/datafusion/issues/16347 for details --> <p>We’re excited to announce the release of <strong>Apache DataFusion 47.0.0</strong>! This new version represents a significant milestone for the project, packing in a wide range of improvements and fixes. You can find the complete details in the full <a href="https://github.com/apache/datafusion/blob/branch-47/dev/changelog/47.0.0.md">changelog</a>. We’ll highlight the most important changes below …</p>PMCFri, 11 Jul 2025 00:00:00 +0000tag:datafusion.apache.org,2025-07-11:/blog/2025/07/11/datafusion-47.0.0blogApache DataFusion Comet 0.9.0 Releasehttps://datafusion.apache.org/blog/2025/07/01/datafusion-comet-0.9.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>The Apache DataFusion PMC is pleased to announce version 0.9.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>This release covers approximately ten weeks of development …</p>pmcTue, 01 Jul 2025 00:00:00 +0000tag:datafusion.apache.org,2025-07-01:/blog/2025/07/01/datafusion-comet-0.9.0blogUsing Rust async for Query Execution and Cancelling Long-Running Querieshttps://datafusion.apache.org/blog/2025/06/30/cancellation<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <style> figure { margin: 20px 0; } figure img { display: block; max-width: 80%; margin: auto; } figcaption { font-style: italic; color: #555; font-size: 0.9em; max-width: 80%; margin: auto; text-align: center; } </style> <p>Have you ever tried to cancel a query that just wouldn't stop? In this post, we'll review how Rust's <a href="https://doc.rust-lang.org/book/ch17-00-async-await.html"><code>async</code> programming model</a> works, how …</p>Pepijn Van EeckhoudtMon, 30 Jun 2025 00:00:00 +0000tag:datafusion.apache.org,2025-06-30:/blog/2025/06/30/cancellationblogOptimizing SQL (and DataFrames) in DataFusion, Part 1: Query Optimization Overviewhttps://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-one<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <p><em>Note: this blog was originally published <a href="https://www.influxdata.com/blog/optimizing-sql-dataframes-part-one/">on the InfluxData blog</a></em></p> <h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2> <p>Sometimes Query Optimizers are seen as a sort of black magic, <a href="https://15799.courses.cs.cmu.edu/spring2025/">“the most challenging problem in computer science,”</a> according to Father Pavlo, or some behind-the-scenes player. We believe this perception is because:</p> <ol> <li> <p>One must implement the rest of a …</p></li></ol>alamb, akurmustafaSun, 15 Jun 2025 00:00:00 +0000tag:datafusion.apache.org,2025-06-15:/blog/2025/06/15/optimizing-sql-dataframes-part-oneblogOptimizing SQL (and DataFrames) in DataFusion, Part 2: Optimizers in Apache DataFusionhttps://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-two <!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <p><em>Note: this blog was originally published <a href="https://www.influxdata.com/blog/optimizing-sql-dataframes-part-two/">on the InfluxData blog</a>.</em></p> <p>In the <a href="https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-one">first part of this post</a>, we discussed what a Query Optimizer is and what role it plays, and described how industrial optimizers are organized. In this second post, we describe various optimizations that are found in <a href="https://datafusion.apache.org/">Apache DataFusion</a> and …</p>alamb, akurmustafaSun, 15 Jun 2025 00:00:00 +0000tag:datafusion.apache.org,2025-06-15:/blog/2025/06/15/optimizing-sql-dataframes-part-twoblogApache DataFusion Comet 0.8.0 Releasehttps://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>This release covers approximately six weeks of development …</p>pmcTue, 06 May 2025 00:00:00 +0000tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0blogUser defined Window Functions in DataFusionhttps://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>Window functions are a powerful feature in SQL, allowing for complex analytical computations over a subset of data. However, efficiently implementing them, especially sliding windows, can be quite challenging. 
With <a href="https://datafusion.apache.org/">Apache DataFusion</a>'s user-defined window functions, developers can easily take advantage of all the effort put into DataFusion's implementation.</p> <p>In …</p>Aditya Singh Rathore, Andrew LambSat, 19 Apr 2025 00:00:00 +0000tag:datafusion.apache.org,2025-04-19:/blog/2025/04/19/user-defined-window-functionsblogtpchgen-rs World’s fastest open source TPC-H data generator, written in Rusthttps://datafusion.apache.org/blog/2025/04/10/fastest-tpch-generator<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <style> /* Table borders */ table, th, td { border: 1px solid black; border-collapse: collapse; } th, td { padding: 3px; } </style> <p><strong>TLDR: TPC-H SF=100 in 1min using tpchgen-rs vs 30min+ with dbgen</strong>.</p> <p>Three members of the <a href="https://datafusion.apache.org/">Apache DataFusion</a> community used Rust and open source development to build <a href="https://github.com/clflushopt/tpchgen-rs">tpchgen-rs</a>, a fully open TPC-H data generator over …</p>Andrew Lamb, Achraf B, and Sean SmithThu, 10 Apr 2025 00:00:00 +0000tag:datafusion.apache.org,2025-04-10:/blog/2025/04/10/fastest-tpch-generatorblogApache DataFusion Python 46.0.0 Releasedhttps://datafusion.apache.org/blog/2025/03/30/datafusion-python-46.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>We are happy to announce that <a href="https://pypi.org/project/datafusion/46.0.0/">datafusion-python 46.0.0</a> has been released. This release brings in all of the new features of the core <a href="https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0">DataFusion 46.0.0</a> library. 
Since the last blog post for <a href="https://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0/">datafusion-python 43.1.0</a>, a large number of improvements have been made that can …</p>timsaucerSun, 30 Mar 2025 00:00:00 +0000tag:datafusion.apache.org,2025-03-30:/blog/2025/03/30/datafusion-python-46.0.0blogApache DataFusion 46.0.0 Releasedhttps://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>We’re excited to announce the release of <strong>Apache DataFusion 46.0.0</strong>! This new version represents a significant milestone for the project, packing in a wide range of improvements and fixes. You can find the complete details in the full <a href="https://github.com/apache/datafusion/blob/branch-46/dev/changelog/46.0.0.md">changelog</a>. 
We’ll highlight the most important changes below …</p>Oznur Hanci and Berkay Sahin on behalf of the PMCMon, 24 Mar 2025 00:00:00 +0000tag:datafusion.apache.org,2025-03-24:/blog/2025/03/24/datafusion-46.0.0blogEfficient Filter Pushdown in Parquethttps://datafusion.apache.org/blog/2025/03/21/parquet-pushdown<style> figure { margin: 20px 0; } figure img { display: block; max-width: 80%; } figcaption { font-style: italic; margin-top: 10px; color: #555; font-size: 0.9em; max-width: 80%; } </style> <!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p><em>Editor's Note: This blog was first published on <a href="https://blog.xiangpeng.systems/posts/parquet-pushdown/">Xiangpeng Hao's blog</a>. 
Thanks to <a href="https://www.influxdata.com/">InfluxData</a> for sponsoring this work as part of his PhD funding.</em></p> <hr/> <p>In the <a href="https://datafusion.apache.org/blog/2025/03/20/parquet-pruning">previous post …</a></p>Xiangpeng HaoFri, 21 Mar 2025 00:00:00 +0000tag:datafusion.apache.org,2025-03-21:/blog/2025/03/21/parquet-pushdownblogApache DataFusion Comet 0.7.0 Releasehttps://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>The Apache DataFusion PMC is pleased to announce version 0.7.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>Comet runs on commodity hardware and aims to …</p>pmcThu, 20 Mar 2025 00:00:00 +0000tag:datafusion.apache.org,2025-03-20:/blog/2025/03/20/datafusion-comet-0.7.0blogParquet Pruning in DataFusion: Read Only What Mattershttps://datafusion.apache.org/blog/2025/03/20/parquet-pruning<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p><em>Editor's Note: This blog was first published on <a href="https://blog.xiangpeng.systems/posts/parquet-to-arrow/">Xiangpeng Hao's blog</a>. Thanks to <a href="https://www.influxdata.com/">InfluxData</a> for sponsoring this work as part of his PhD funding.</em></p> <hr/> <p><a href="https://parquet.apache.org/">Apache Parquet</a> has become the industry standard for storing columnar data, and reading Parquet efficiently -- especially from remote storage -- is crucial for query performance.</p> <p><a href="https://datafusion.apache.org/">Apache DataFusion …</a></p>Xiangpeng HaoThu, 20 Mar 2025 00:00:00 +0000tag:datafusion.apache.org,2025-03-20:/blog/2025/03/20/parquet-pruningblogUsing Ordering for Better Plans in Apache DataFusionhttps://datafusion.apache.org/blog/2025/03/11/ordering-analysis<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <!-- see https://github.com/apache/datafusion/issues/11631 for details --> <h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2> <p>In this blog post, we explain when an ordering requirement of an operator is satisfied by its input data. This analysis is essential for order-based optimizations and is often more complex than one might initially think.</p> <blockquote style="border-left: 4px solid #007bff; padding: 10px; background-color: #f8f9fa;"> <strong>Ordering Requirement</strong> for an operator describes how the input data to that operator …</blockquote>Mustafa Akur, Andrew LambTue, 11 Mar 2025 00:00:00 +0000tag:datafusion.apache.org,2025-03-11:/blog/2025/03/11/ordering-analysisblogApache DataFusion 45.0.0 Releasedhttps://datafusion.apache.org/blog/2025/02/20/datafusion-45.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <!-- see https://github.com/apache/datafusion/issues/11631 for details --> <h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2> <p>We are very proud to announce <a href="https://crates.io/crates/datafusion/45.0.0">DataFusion 45.0.0</a>. This blog highlights some of the many major improvements since we released <a href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion 40.0.0</a> and a preview of what the community is thinking about in the next 6 months. It has been an exciting period of development …</p>pmcThu, 20 Feb 2025 00:00:00 +0000tag:datafusion.apache.org,2025-02-20:/blog/2025/02/20/datafusion-45.0.0blogApache DataFusion Comet 0.6.0 Releasehttps://datafusion.apache.org/blog/2025/02/17/datafusion-comet-0.6.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <p>The Apache DataFusion PMC is pleased to announce version 0.6.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>Comet runs on commodity hardware and aims to …</p>pmcMon, 17 Feb 2025 00:00:00 +0000tag:datafusion.apache.org,2025-02-17:/blog/2025/02/17/datafusion-comet-0.6.0blogApache DataFusion Ballista 43.0.0 Releasedhttps://datafusion.apache.org/blog/2025/02/02/datafusion-ballista-43.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>We are pleased to announce version <a href="https://github.com/apache/datafusion-ballista/blob/main/CHANGELOG.md#4300-2025-01-07">43.0.0</a> of the <a href="https://datafusion.apache.org/ballista/">DataFusion Ballista</a>. 
Ballista allows existing <a href="https://datafusion.apache.org">DataFusion</a> applications to be scaled out on a cluster for use cases that are not practical to run on a single node.</p> <h2 id="highlights-of-this-release">Highlights of this release<a class="headerlink" href="#highlights-of-this-release" title="Permanent link">¶</a></h2> <h3 id="seamless-integration-with-datafusion">Seamless Integration with DataFusion<a class="headerlink" href="#seamless-integration-with-datafusion" title="Permanent link">¶</a></h3> <p>The primary objective of …</p>milenkovicmSun, 02 Feb 2025 00:00:00 +0000tag:datafusion.apache.org,2025-02-02:/blog/2025/02/02/datafusion-ballista-43.0.0blogApache DataFusion Comet 0.5.0 Releasehttps://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>Comet runs on commodity hardware and aims to …</p>pmcFri, 17 Jan 2025 00:00:00 +0000tag:datafusion.apache.org,2025-01-17:/blog/2025/01/17/datafusion-comet-0.5.0blogApache DataFusion Python 43.1.0 Releasedhttps://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>We are happy to announce that <a href="https://pypi.org/project/datafusion/43.1.0/">datafusion-python 43.1.0</a> has been released. This release brings in all of the new features of the core <a href="https://github.com/apache/datafusion/blob/main/dev/changelog/43.0.0.md">DataFusion 43.0.0</a> library. 
Since the last blog post for <a href="https://datafusion.apache.org/blog/2024/08/20/python-datafusion-40.0.0/">datafusion-python 40.1.0</a>, a large number of improvements have been made that can …</p>timsaucerSat, 14 Dec 2024 00:00:00 +0000tag:datafusion.apache.org,2024-12-14:/blog/2024/12/14/datafusion-python-43.1.0blogApache DataFusion Comet 0.4.0 Releasehttps://datafusion.apache.org/blog/2024/11/20/datafusion-comet-0.4.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>The Apache DataFusion PMC is pleased to announce version 0.4.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>Comet runs on commodity hardware and aims to …</p>pmcWed, 20 Nov 2024 00:00:00 +0000tag:datafusion.apache.org,2024-11-20:/blog/2024/11/20/datafusion-comet-0.4.0blogComparing approaches to User Defined Functions in Apache DataFusion using Pythonhttps://datafusion.apache.org/blog/2024/11/19/datafusion-python-udf-comparisons<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <h2 id="personal-context">Personal Context<a class="headerlink" href="#personal-context" title="Permanent link">¶</a></h2> <p>For a few months now I’ve been working with <a href="https://datafusion.apache.org/">Apache DataFusion</a>, a fast query engine written in Rust. From my experience the language that nearly all data scientists are working in is Python. In general, data scientists often use <a href="https://pandas.pydata.org/">Pandas</a> for in-memory tasks and <a href="https://spark.apache.org/">PySpark</a> for larger …</p>timsaucerTue, 19 Nov 2024 00:00:00 +0000tag:datafusion.apache.org,2024-11-19:/blog/2024/11/19/datafusion-python-udf-comparisonsblogApache DataFusion is now the fastest single node engine for querying Apache Parquet fileshttps://datafusion.apache.org/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>I am extremely excited to announce that <a href="https://crates.io/crates/datafusion">Apache DataFusion</a> is the fastest engine for querying Apache Parquet files in <a href="https://benchmark.clickhouse.com/">ClickBench</a>. It is faster than <a href="https://duckdb.org/">DuckDB</a>, <a href="https://clickhouse.com/chdb">chDB</a> and <a href="https://clickhouse.com/">ClickHouse</a> using the same hardware. It also marks the first time a <a href="https://www.rust-lang.org/">Rust</a>-based engine holds the top spot, which has previously been …</p>Andrew Lamb, Staff Engineer at InfluxDataMon, 18 Nov 2024 00:00:00 +0000tag:datafusion.apache.org,2024-11-18:/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbenchblogApache DataFusion Comet 0.3.0 Releasehttps://datafusion.apache.org/blog/2024/09/27/datafusion-comet-0.3.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>The Apache DataFusion PMC is pleased to announce version 0.3.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>Comet runs on commodity hardware and aims to …</p>pmcFri, 27 Sep 2024 00:00:00 +0000tag:datafusion.apache.org,2024-09-27:/blog/2024/09/27/datafusion-comet-0.3.0blogUsing StringView / German Style Strings to Make Queries Faster: Part 1- Reading Parquethttps://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p><em>Editor's Note: This is the first of a <a href="../string-view-german-style-strings-part-2/">two part</a> blog series that was first published on the <a href="https://www.influxdata.com/blog/faster-queries-with-stringview-part-one-influxdb/">InfluxData blog</a>. 
Thanks to InfluxData for sponsoring this work as <a href="https://haoxp.xyz/">Xiangpeng Hao</a>'s summer intern project.</em></p> <p>This blog describes our experience implementing <a href="https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-view-layout">StringView</a> in the <a href="https://github.com/apache/arrow-rs">Rust implementation</a> of <a href="https://arrow.apache.org/">Apache Arrow</a>, and integrating …</p>Xiangpeng Hao, Andrew LambFri, 13 Sep 2024 00:00:00 +0000tag:datafusion.apache.org,2024-09-13:/blog/2024/09/13/string-view-german-style-strings-part-1blogUsing StringView / German Style Strings to make Queries Faster: Part 2 - String Operationshttps://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p><em>Editor's Note: This blog series was first published on the <a href="https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/">InfluxData blog</a>. 
Thanks to InfluxData for sponsoring this work as <a href="https://haoxp.xyz/">Xiangpeng Hao</a>'s summer intern project</em></p> <p>In the <a href="https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1">first post</a>, we discussed the nuances required to accelerate Parquet loading using StringViewArray by reusing buffers and reducing copies. In this second …</p>Xiangpeng Hao, Andrew LambFri, 13 Sep 2024 00:00:00 +0000tag:datafusion.apache.org,2024-09-13:/blog/2024/09/13/string-view-german-style-strings-part-2blogApache DataFusion Comet 0.2.0 Releasehttps://datafusion.apache.org/blog/2024/08/28/datafusion-comet-0.2.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <p>The Apache DataFusion PMC is pleased to announce version 0.2.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>Comet runs on commodity hardware and aims to …</p>pmcWed, 28 Aug 2024 00:00:00 +0000tag:datafusion.apache.org,2024-08-28:/blog/2024/08/28/datafusion-comet-0.2.0blogApache DataFusion Python 40.1.0 Released, Significant usability updateshttps://datafusion.apache.org/blog/2024/08/20/python-datafusion-40.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2> <p>We are happy to announce that <a href="https://pypi.org/project/datafusion/40.1.0/">DataFusion in Python 40.1.0</a> has been released. In addition to bringing in all of the new features of the core <a href="https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/">DataFusion 40.0.0</a> package, this release contains <em>significant</em> updates to the user interface and documentation. 
We listened to the python …</p>timsaucerTue, 20 Aug 2024 00:00:00 +0000tag:datafusion.apache.org,2024-08-20:/blog/2024/08/20/python-datafusion-40.0.0blogApache DataFusion 40.0.0 Releasedhttps://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <!-- see https://github.com/apache/datafusion/issues/9602 for details --> <h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2> <p>We are proud to announce <a href="https://crates.io/crates/datafusion/40.0.0">DataFusion 40.0.0</a>. This blog highlights some of the many major improvements since we released <a href="https://datafusion.apache.org/blog/2024/01/19/datafusion-34.0.0/">DataFusion 34.0.0</a> and a preview of what the community is thinking about in the next 6 months. We are hoping to make more regular blog posts …</p>pmcWed, 24 Jul 2024 00:00:00 +0000tag:datafusion.apache.org,2024-07-24:/blog/2024/07/24/datafusion-40.0.0blogApache DataFusion Comet 0.1.0 Releasehttps://datafusion.apache.org/blog/2024/07/20/datafusion-comet-0.1.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>The Apache DataFusion PMC is pleased to announce the first official source release of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>Comet runs on commodity hardware and aims …</p>pmcSat, 20 Jul 2024 00:00:00 +0000tag:datafusion.apache.org,2024-07-20:/blog/2024/07/20/datafusion-comet-0.1.0blogAnnouncing Apache Arrow DataFusion is now Apache DataFusionhttps://datafusion.apache.org/blog/2024/05/07/datafusion-tlp<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2> <p>TLDR; <a href="https://arrow.apache.org/">Apache Arrow</a> DataFusion --&gt; <a href="https://datafusion.apache.org/">Apache DataFusion</a></p> <p>The Arrow PMC and newly created DataFusion PMC are happy to announce that as of April 16, 2024 the Apache Arrow DataFusion subproject is now a top level <a href="https://www.apache.org/">Apache Software Foundation</a> project.</p> <h2 id="background">Background<a class="headerlink" href="#background" title="Permanent link">¶</a></h2> <p>Apache DataFusion is a fast, extensible query engine for building …</p>pmcTue, 07 May 2024 00:00:00 +0000tag:datafusion.apache.org,2024-05-07:/blog/2024/05/07/datafusion-tlpblogAnnouncing Apache Arrow DataFusion Comethttps://datafusion.apache.org/blog/2024/03/06/comet-donation<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <h1 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h1> <p>The Apache Arrow PMC is pleased to announce the donation of the <a href="https://github.com/apache/arrow-datafusion-comet">Comet project</a>, a native Spark SQL Accelerator built on <a href="https://arrow.apache.org/datafusion">Apache Arrow DataFusion</a>.</p> <p>Comet is an Apache Spark plugin that uses Apache Arrow DataFusion to accelerate Spark workloads. It is designed as a drop-in replacement for Spark's JVM …</p>pmcWed, 06 Mar 2024 00:00:00 +0000tag:datafusion.apache.org,2024-03-06:/blog/2024/03/06/comet-donationblogApache Arrow DataFusion 34.0.0 Released, Looking Forward to 2024https://datafusion.apache.org/blog/2024/01/19/datafusion-34.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <h2 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h2> <p>We recently <a href="https://crates.io/crates/datafusion/34.0.0">released DataFusion 34.0.0</a>. 
This blog highlights some of the major improvements since we <a href="https://arrow.apache.org/blog/2023/06/24/datafusion-25.0.0/.">released DataFusion 26.0.0</a> (spoiler alert: there are many) and a preview of where the community plans to focus in the next 6 months.</p> <p><a href="https://arrow.apache.org/datafusion/">Apache Arrow DataFusion</a> is an extensible query …</p>pmcFri, 19 Jan 2024 00:00:00 +0000tag:datafusion.apache.org,2024-01-19:/blog/2024/01/19/datafusion-34.0.0blogAggregating Millions of Groups Fast in Apache Arrow DataFusion 28.0.0https://datafusion.apache.org/blog/2023/08/05/datafusion_fast_grouping<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <!-- Converted from Google Docs using https://www.buymeacoffee.com/docstomarkdown --> <h2 id="aggregating-millions-of-groups-fast-in-apache-arrow-datafusion">Aggregating Millions of Groups Fast in Apache Arrow DataFusion<a class="headerlink" href="#aggregating-millions-of-groups-fast-in-apache-arrow-datafusion" title="Permanent link">¶</a></h2> <p>Andrew Lamb, Daniël Heres, Raphael Taylor-Davies</p> <p><em>Note: this article was originally published on the <a href="https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion">InfluxData Blog</a></em></p> <h2 id="tldr">TLDR<a class="headerlink" href="#tldr" title="Permanent link">¶</a></h2> <p>Grouped aggregations are a core part of any analytic tool, creating understandable summaries of huge data volumes. <a href="https://arrow.apache.org/datafusion/">Apache Arrow DataFusion</a>’s parallel aggregation capability …</p>alamb, Dandandan, tustvoldSat, 05 Aug 2023 00:00:00 +0000tag:datafusion.apache.org,2023-08-05:/blog/2023/08/05/datafusion_fast_groupingblogApache Arrow DataFusion 26.0.0https://datafusion.apache.org/blog/2023/06/24/datafusion-25.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <p>It has been a whirlwind 6 months of DataFusion development since <a href="https://arrow.apache.org/blog/2023/01/19/datafusion-16.0.0">our last update</a>: the community has grown, many features have been added, performance improved and we are <a href="https://github.com/apache/arrow-datafusion/discussions/6475">discussing</a> branching out to our own top level Apache Project.</p> <h2 id="background">Background<a class="headerlink" href="#background" title="Permanent link">¶</a></h2> <p><a href="https://arrow.apache.org/datafusion/">Apache Arrow DataFusion</a> is an extensible query engine and database toolkit …</p>pmcSat, 24 Jun 2023 00:00:00 +0000tag:datafusion.apache.org,2023-06-24:/blog/2023/06/24/datafusion-25.0.0blogApache Arrow DataFusion 16.0.0 Project Updatehttps://datafusion.apache.org/blog/2023/01/19/datafusion-16.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <h1 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h1> <p><a href="https://arrow.apache.org/datafusion/">DataFusion</a> is an extensible query execution framework, written in <a href="https://www.rust-lang.org/">Rust</a>, that uses <a href="https://arrow.apache.org">Apache Arrow</a> as its in-memory format. It is targeted primarily at developers creating data intensive analytics, and offers mature <a href="https://arrow.apache.org/datafusion/user-guide/sql/index.html">SQL support</a>, a DataFrame API, and many extension points.</p> <p>Systems based on DataFusion perform very well in benchmarks …</p>pmcThu, 19 Jan 2023 00:00:00 +0000tag:datafusion.apache.org,2023-01-19:/blog/2023/01/19/datafusion-16.0.0blogApache Arrow Ballista 0.9.0 Releasehttps://datafusion.apache.org/blog/2022/10/28/ballista-0.9.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <h1 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h1> <p><a href="https://github.com/apache/arrow-ballista">Ballista</a> is an Arrow-native distributed SQL query engine implemented in Rust.</p> <p>Ballista 0.9.0 is now available and is the most significant release since the project was <a href="http://arrow.apache.org/blog/2021/04/12/ballista-donation/">donated</a> to Apache Arrow in 2021.</p> <p>This release represents 4 weeks of work, with 66 commits from 14 contributors:</p> <pre><code> 22 Andy …</code></pre>pmcFri, 28 Oct 2022 00:00:00 +0000tag:datafusion.apache.org,2022-10-28:/blog/2022/10/28/ballista-0.9.0blogApache Arrow DataFusion 13.0.0 Project Updatehttps://datafusion.apache.org/blog/2022/10/25/datafusion-13.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <h1 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h1> <p><a href="https://arrow.apache.org/datafusion/">Apache Arrow DataFusion</a> <a href="https://crates.io/crates/datafusion"><code>13.0.0</code></a> is released, and this blog contains an update on the project for the 5 months since our <a href="https://arrow.apache.org/blog/2022/05/16/datafusion-8.0.0/">last update in May 2022</a>.</p> <p>DataFusion is an extensible and embeddable query engine, written in Rust, used to create modern, fast and efficient data pipelines, ETL …</p>pmcTue, 25 Oct 2022 00:00:00 +0000tag:datafusion.apache.org,2022-10-25:/blog/2022/10/25/datafusion-13.0.0blogApache Arrow DataFusion 8.0.0 Releasehttps://datafusion.apache.org/blog/2022/05/16/datafusion-8.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <h1 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">¶</a></h1> <p><a href="https://arrow.apache.org/datafusion/">DataFusion</a> is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.</p> <p>When you want to extend your Rust project with <a href="https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html">SQL support</a>, a DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV data, DataFusion is definitely worth …</p>pmcMon, 16 May 2022 00:00:00 +0000tag:datafusion.apache.org,2022-05-16:/blog/2022/05/16/datafusion-8.0.0blogIntroducing Apache Arrow DataFusion Contribhttps://datafusion.apache.org/blog/2022/03/21/datafusion-contrib<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <h1 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">&para;</a></h1> <p>Apache Arrow <a href="https://arrow.apache.org/datafusion/">DataFusion</a> is an extensible query execution framework, written in Rust, that uses <a href="https://arrow.apache.org">Apache Arrow</a> as its in-memory format.</p> <p>When you want to extend your Rust project with <a href="https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html">SQL support</a>, a DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV data, DataFusion is …</p>pmcMon, 21 Mar 2022 00:00:00 +0000tag:datafusion.apache.org,2022-03-21:/blog/2022/03/21/datafusion-contribblogApache Arrow DataFusion 7.0.0 Releasehttps://datafusion.apache.org/blog/2022/02/28/datafusion-7.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <h1 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">&para;</a></h1> <p><a href="https://arrow.apache.org/datafusion/">DataFusion</a> is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.</p> <p>When you want to extend your Rust project with <a href="https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html">SQL support</a>, a DataFrame API, or the ability to read and process Parquet, JSON, Avro or CSV data, DataFusion is definitely worth …</p>pmcMon, 28 Feb 2022 00:00:00 +0000tag:datafusion.apache.org,2022-02-28:/blog/2022/02/28/datafusion-7.0.0blogApache Arrow DataFusion 6.0.0 Releasehttps://datafusion.apache.org/blog/2021/11/19/2021-11-8-datafusion-6.0.0.md<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. 
{% endcomment %} --> <h1 id="introduction">Introduction<a class="headerlink" href="#introduction" title="Permanent link">&para;</a></h1> <p><a href="https://arrow.apache.org/datafusion/">DataFusion</a> is an embedded query engine which leverages the unique features of <a href="https://www.rust-lang.org/">Rust</a> and <a href="https://arrow.apache.org/">Apache Arrow</a> to provide a system that is high performance, easy to connect, easy to embed, and high quality.</p> <p>The Apache Arrow team is pleased to announce the DataFusion 6.0.0 release. This covers …</p>pmcFri, 19 Nov 2021 00:00:00 +0000tag:datafusion.apache.org,2021-11-19:/blog/2021/11/19/2021-11-8-datafusion-6.0.0.mdblogApache Arrow Ballista 0.5.0 Releasehttps://datafusion.apache.org/blog/2021/08/18/ballista-0.5.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>Ballista extends DataFusion to provide support for distributed queries. 
This is the first release of Ballista since the project was <a href="https://arrow.apache.org/blog/2021/04/12/ballista-donation/">donated</a> to the Apache Arrow project and includes 80 commits from 11 contributors.</p> <pre><code>git shortlog -sn 4.0.0..5.0.0 ballista/rust/client ballista/rust/core ballista/rust …</code></pre>pmcWed, 18 Aug 2021 00:00:00 +0000tag:datafusion.apache.org,2021-08-18:/blog/2021/08/18/ballista-0.5.0blogApache Arrow DataFusion 5.0.0 Releasehttps://datafusion.apache.org/blog/2021/08/18/datafusion-5.0.0<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>The Apache Arrow team is pleased to announce the DataFusion 5.0.0 release. This covers 4 months of development work and includes 211 commits from the following 31 distinct contributors.</p> <pre><code>$ git shortlog -sn 4.0.0..5.0.0 datafusion datafusion-cli datafusion-examples 61 Jiayu Liu 47 Andrew Lamb 27 …</code></pre>pmcWed, 18 Aug 2021 00:00:00 +0000tag:datafusion.apache.org,2021-08-18:/blog/2021/08/18/datafusion-5.0.0blogBallista: A Distributed Scheduler for Apache Arrowhttps://datafusion.apache.org/blog/2021/04/12/ballista-donation<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>We are excited to announce that <a href="https://github.com/apache/arrow-datafusion/tree/master/ballista">Ballista</a> has been donated to the Apache Arrow project. </p> <p>Ballista is a distributed compute platform primarily implemented in Rust, and powered by Apache Arrow. It is built on an architecture that allows other programming languages (such as Python, C++, and Java) to be supported …</p>agroveMon, 12 Apr 2021 00:00:00 +0000tag:datafusion.apache.org,2021-04-12:/blog/2021/04/12/ballista-donationblogDataFusion: A Rust-native Query Engine for Apache Arrowhttps://datafusion.apache.org/blog/2019/02/04/datafusion-donation<!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> <p>We are excited to announce that <a href="https://github.com/apache/arrow-datafusion">DataFusion</a> has been donated to the Apache Arrow project. DataFusion is an in-memory query engine for the Rust implementation of Apache Arrow.</p> <p>Although DataFusion was started two years ago, it was recently re-implemented to be Arrow-native and currently has limited capabilities but does support …</p>agroveMon, 04 Feb 2019 00:00:00 +0000tag:datafusion.apache.org,2019-02-04:/blog/2019/02/04/datafusion-donationblog