Add new array functions from upstream DataFusion v53: array_any_value, array_distance, array_max, array_min, array_reverse, arrays_zip, string_to_array, and gen_series. Add corresponding list_* aliases and missing list_* aliases for existing functions (list_empty, list_pop_back, list_pop_front, list_has, list_has_all, list_has_any). Also add array_contains/list_contains as aliases for array_has, generate_series as alias for gen_series, and string_to_list as alias for string_to_array. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests cover all functions and aliases added in the previous commit: array_any_value, array_distance, array_max, array_min, array_reverse, arrays_zip, string_to_array, gen_series, generate_series, array_contains, list_contains, list_empty, list_pop_back, list_pop_front, list_has, list_has_all, list_has_any, and list_* aliases for the new functions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…comment
- Make null_string optional in string_to_array/string_to_list
- Make step optional in gen_series/generate_series
- Rename second_array to element in array_contains/list_has/list_contains
- Restore # Window Functions section comment in __all__
- Add tests for optional parameter variants

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce 26 individual tests to 14 test functions with parametrized cases, eliminating boilerplate while maintaining full coverage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…block Merge standalone tests for list_empty, list_pop_back, list_pop_front, list_has, array_contains, list_contains, list_has_all, and list_has_any into the existing parametrized test_array_functions block alongside their array_* counterparts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use the richer multi-row dataset (including all-nulls case) for both array_any_value and list_any_value via the parametrized test. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5b592dc to ef48dd9
Pull request overview
This PR exposes several upstream DataFusion array/list scalar functions and aliases through the datafusion-python API, and adds Python unit tests to validate the new bindings and aliases (closing #1452).
Changes:
- Added Python API exports and wrappers for new array/list functions and `list_*` aliases (e.g., `array_any_value`, `array_distance`, `array_max`/`array_min`, `array_reverse`, `arrays_zip`, `string_to_array`, `gen_series`, plus `list_*` aliases).
- Added Rust pyo3 bindings for newly exposed functions that weren’t previously available in the Python extension module.
- Expanded unit test coverage to exercise new functions and alias behavior.
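The alias pattern described in the changes above can be sketched in plain Python. This is only an illustration under stated assumptions: the real wrappers operate on DataFusion expressions, whereas the stand-ins below work on ordinary Python lists, and their bodies are hypothetical rather than taken from the PR. Only the function names and the `element` parameter name come from the commit messages.

```python
# Hypothetical stand-ins that operate on plain Python lists.
# They are NOT the datafusion-python implementations, which build expressions.


def array_has(values: list, element: object) -> bool:
    """Canonical function: report whether `element` occurs in `values`."""
    return element in values


def list_has(values: list, element: object) -> bool:
    """Alias: forwards its arguments unchanged to array_has."""
    return array_has(values, element)


def array_contains(values: list, element: object) -> bool:
    """Alias: forwards its arguments unchanged to array_has."""
    return array_has(values, element)


def list_contains(values: list, element: object) -> bool:
    """Alias: forwards its arguments unchanged to array_has."""
    return array_has(values, element)
```

Keeping each alias as a thin forwarding function means there is a single place where behavior is defined, so a test can be parametrized over the canonical name and its aliases with the same expected result.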
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/datafusion/functions.py | Adds new public function exports (`__all__`) and Python-level wrappers/aliases for array/list functions. |
| crates/core/src/functions.rs | Adds pyo3 bindings for new DataFusion nested functions/UDFs and registers them in the Python extension module. |
| python/tests/test_functions.py | Adds unit tests for new functions and alias coverage in both the general array-function parametrized suite and targeted tests. |
These aliases match the upstream DataFusion SQL-level aliases, completing the set of missing array functions from issue apache#1452. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Which issue does this PR close?
Closes #1452
Rationale for this change
These functions are available in upstream DataFusion but are not yet exposed through the Python API.
What changes are included in this PR?
Add Python API wrappers and aliases for the new array/list functions
Add unit tests
Are there any user-facing changes?
Additions only: new functions and aliases are exposed.
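One of the additions noted in the commits is that `step` is optional in `gen_series`/`generate_series`. A minimal sketch of how such an optional argument might be forwarded is shown below. It is an assumption-laden illustration, not the PR's code: `_backend_gen_series` is a hypothetical stand-in for the native binding, it works on plain integers rather than expressions, and the default step of 1 and the inclusive upper bound are assumptions about the upstream semantics.

```python
from typing import Optional


def _backend_gen_series(*args: int) -> list:
    """Hypothetical stand-in for the native binding; takes 2 or 3 integers."""
    if len(args) == 2:
        start, stop = args
        step = 1  # assumption: an omitted step defaults to 1
    elif len(args) == 3:
        start, stop, step = args
    else:
        raise TypeError("expected (start, stop) or (start, stop, step)")
    if step == 0:
        raise ValueError("step must not be zero")
    # Assumption: the upper bound is inclusive, so extend the range by one step direction.
    end = stop + 1 if step > 0 else stop - 1
    return list(range(start, end, step))


def gen_series(start: int, stop: int, step: Optional[int] = None) -> list:
    """Forward `step` only when the caller supplied it."""
    args = [start, stop]
    if step is not None:
        args.append(step)
    return _backend_gen_series(*args)


def generate_series(start: int, stop: int, step: Optional[int] = None) -> list:
    """Alias: forwards its arguments unchanged to gen_series."""
    return gen_series(start, stop, step)
```

Forwarding the argument only when it is present leaves the choice of default to the backend instead of duplicating it in the wrapper.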