Add missing array functions #1468

Open
timsaucer wants to merge 7 commits into apache:main from timsaucer:feat/add-missing-array-fns

@timsaucer
Member

Which issue does this PR close?

Closes #1452

Rationale for this change

These functions are available in upstream DataFusion but are not yet exposed through the Python API.

What changes are included in this PR?

Add Python API
Add unit tests

Are there any user-facing changes?

Addition only.

timsaucer and others added 6 commits April 3, 2026 13:52
Add new array functions from upstream DataFusion v53: array_any_value,
array_distance, array_max, array_min, array_reverse, arrays_zip,
string_to_array, and gen_series. Add corresponding list_* aliases and
missing list_* aliases for existing functions (list_empty, list_pop_back,
list_pop_front, list_has, list_has_all, list_has_any). Also add
array_contains/list_contains as aliases for array_has, generate_series
as alias for gen_series, and string_to_list as alias for string_to_array.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
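
For readers unfamiliar with the upstream functions, the following is a plain-Python model of what a few of the names listed above are expected to compute on a single list value. It is only a sketch: it does not use the datafusion package or the bindings added in this PR, and the null handling and padding behavior shown are assumptions based on upstream DataFusion semantics rather than something this PR states.

```python
import math
from typing import Any, List, Optional, Sequence, Tuple


def array_any_value(values: Sequence[Optional[Any]]) -> Optional[Any]:
    """Return the first non-null element, or None if all elements are null."""
    return next((v for v in values if v is not None), None)


def array_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length numeric lists."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def array_reverse(values: Sequence[Any]) -> List[Any]:
    """Return the elements in reverse order."""
    return list(reversed(values))


def arrays_zip(*arrays: Sequence[Any]) -> List[Tuple[Any, ...]]:
    """Combine lists index by index (assumed: shorter lists are padded with None)."""
    longest = max((len(a) for a in arrays), default=0)
    return [
        tuple(a[i] if i < len(a) else None for a in arrays)
        for i in range(longest)
    ]


# A list_* alias is simply another name for the same function.
list_any_value = array_any_value
```

In the real API these functions take and return expressions evaluated per row; the model above only illustrates the per-value result.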
Tests cover all functions and aliases added in the previous commit:
array_any_value, array_distance, array_max, array_min, array_reverse,
arrays_zip, string_to_array, gen_series, generate_series,
array_contains, list_contains, list_empty, list_pop_back,
list_pop_front, list_has, list_has_all, list_has_any, and list_*
aliases for the new functions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…comment

- Make null_string optional in string_to_array/string_to_list
- Make step optional in gen_series/generate_series
- Rename second_array to element in array_contains/list_has/list_contains
- Restore # Window Functions section comment in __all__
- Add tests for optional parameter variants

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
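
The optional parameters described in this commit can be pictured with a plain-Python model. This is a sketch under stated assumptions, not the wrapper code from the PR: the parameter names other than null_string and step are illustrative, the series is assumed to include its upper bound, and null_string is assumed to mark substrings that become null.

```python
from typing import List, Optional


def string_to_array(
    string: str, delimiter: str, null_string: Optional[str] = None
) -> List[Optional[str]]:
    """Split a string on a delimiter; parts equal to null_string become None."""
    parts = string.split(delimiter)
    if null_string is None:
        # Optional argument omitted: no part is treated as null.
        return list(parts)
    return [None if part == null_string else part for part in parts]


def gen_series(start: int, stop: int, step: Optional[int] = None) -> List[int]:
    """Integers from start to stop (assumed inclusive), stepping by step or 1."""
    step = 1 if step is None else step
    if step == 0:
        raise ValueError("step must not be zero")
    # Shift the bound by one so that Python's exclusive range includes stop.
    end = stop + 1 if step > 0 else stop - 1
    return list(range(start, end, step))


# Aliases named in the PR description.
string_to_list = string_to_array
generate_series = gen_series
```

Making both arguments default to None lets callers write `gen_series(1, 5)` or `string_to_array("a,b", ",")` without passing a placeholder.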
Reduce 26 individual tests to 14 test functions with parametrized
cases, eliminating boilerplate while maintaining full coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…block

Merge standalone tests for list_empty, list_pop_back, list_pop_front,
list_has, array_contains, list_contains, list_has_all, and list_has_any
into the existing parametrized test_array_functions block alongside
their array_* counterparts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
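
The idea of testing aliases in the same table as their array_* counterparts can be sketched as follows. To keep the example runnable without extra packages it uses a plain loop over a case table instead of pytest; with pytest, the same table would typically be passed to `pytest.mark.parametrize`. The `array_has` function here is a hypothetical plain-Python stand-in, not the binding from this PR.

```python
from typing import Any, Callable, List, Sequence, Tuple


def array_has(values: Sequence[Any], element: Any) -> bool:
    """Stand-in: True if element occurs in values."""
    return element in values


# Aliases named in the PR; each is the same callable under another name.
array_contains = array_has
list_has = array_has
list_contains = array_has

# One row per case: (label, function, arguments, expected result).
Case = Tuple[str, Callable[..., Any], Tuple[Any, ...], Any]
CASES: List[Case] = [
    ("array_has", array_has, ([1, 2, 3], 2), True),
    ("list_has", list_has, ([1, 2, 3], 2), True),
    ("array_contains", array_contains, ([1, 2, 3], 4), False),
    ("list_contains", list_contains, ([1, 2, 3], 4), False),
]


def run_cases(cases: Sequence[Case]) -> List[str]:
    """Run every case and return a description of each failure."""
    failures = []
    for label, func, args, expected in cases:
        actual = func(*args)
        if actual != expected:
            failures.append(f"{label}{args}: expected {expected!r}, got {actual!r}")
    return failures
```

Adding an alias to the suite then costs one extra row in the table rather than a new test function.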
Use the richer multi-row dataset (including all-nulls case) for both
array_any_value and list_any_value via the parametrized test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR exposes several upstream DataFusion array/list scalar functions and aliases through the datafusion-python API, and adds Python unit tests to validate the new bindings and aliases (closing #1452).

Changes:

  • Added Python API exports and wrappers for new array/list functions and list_* aliases (e.g., array_any_value, array_distance, array_max/min, array_reverse, arrays_zip, string_to_array, gen_series, plus list_* aliases).
  • Added Rust pyo3 bindings for newly exposed functions that weren’t previously available in the Python extension module.
  • Expanded unit test coverage to exercise new functions and alias behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/datafusion/functions.py | Adds new public function exports (__all__) and Python-level wrappers/aliases for array/list functions. |
| crates/core/src/functions.rs | Adds pyo3 bindings for new DataFusion nested functions/UDFs and registers them in the Python extension module. |
| python/tests/test_functions.py | Adds unit tests for new functions and alias coverage in both the general array-function parametrized suite and targeted tests. |

These aliases match the upstream DataFusion SQL-level aliases, completing
the set of missing array functions from issue apache#1452.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@timsaucer timsaucer marked this pull request as ready for review April 3, 2026 19:14
Development

Successfully merging this pull request may close these issues.

Add missing array/list functions and aliases

2 participants