feat(rust/sedona-spatial-join): Integrate libgpuspatial into sedona-spatial-join by pwrliang · Pull Request #722 · apache/sedona-db

pwrliang · 2026-03-17T06:54:53Z

This PR integrates libgpuspatial by providing a GPUSpatialIndex and GPUSpatialIndexBuilder. The PR should pass the CI after its predecessor PRs (#717, #718, #719, #721) are merged.

pwrliang · 2026-03-17T06:55:34Z

@paleolimbot @Kontinuation Could you take a look when you are available?

rust/sedona-spatial-join/src/index/gpu_spatial_index_builder.rs

rust/sedona-spatial-join-gpu/src/index/gpu_spatial_index_builder.rs

Kontinuation · 2026-03-18T17:30:59Z

rust/sedona-spatial-join/src/operand_evaluator.rs

+                        if wkb.geometry_type() == GeometryType::Point {
+                            Some(Rect::new(
+                                coord!(x: min.x as f32, y: min.y as f32),
+                                coord!(x: max.x as f32, y: max.y as f32),
+                            ))


Will this introduce false negatives since the resulting bounding box is not guaranteed to cover the original bounding box?

Thanks for mentioning this. When one side contains only points, their coordinates in FP64 are downcast to FP32, which may result in false negatives. However, on the other side containing rectangles, I called next_after to produce a conservative bounding box. This compensates for potential false-negative issues.

For Geography joins we will also have to play similar games (i.e., create our own EvaluatedGeometryArray)...I will try to find an extension point here so that we can keep this logic separated.

I implemented this in #737 (the ability to override the Rect creation for a SpatialJoinProvider); however, it's worth double checking that these rectangles are not used anywhere that this assumption might cause problems.

paleolimbot

Thank you for working on this!

My main concern with this PR is that the GPU details are very entangled with the non-GPU details. How about:

sedona-spatial-join-common (defines the traits you nicely separated in the last PR and a SedonaSpatialJoinExtension trait)
sedona-spatial-join-gpu (implements the traits in sedona-spatial-join-common)
sedona-spatial-join (uses an Arc<dyn SedonaSpatialJoinExtension> object to decide if an extension join should be used in place of the normal join). The hook to register a join extension can be a global function for now and you can use a mock join extension built from the default implementation index/refiner to test it.
sedona (when built with the gpu feature, registers the Gpu join with sedona-spatial-join)

There are a few things I think this will help with:

We may want to add other join accelerators for other hardware (e.g., Apple GPU) or other data types (Geography x H3 cell join), or possibly Raster/Vector join.
It will make it easier for you to have clear ownership over a subdirectory so you can iterate faster.
It makes the sedona crate the single place where features are assembled (to avoid two levels of feature flags, the consequences of which you've started to identify here with the "all except GPU" feature).

I do think it's important to get this right early while we are all here and have time to dedicate to this!

pwrliang · 2026-03-21T02:42:19Z

Thank you for working on this!

My main concern with this PR is that the GPU details are very entangled with the non-GPU details. How about:

sedona-spatial-join-common (defines the traits you nicely separated in the last PR and a SedonaSpatialJoinExtension trait)

sedona-spatial-join-gpu (implements the traits in sedona-spatial-join-common)

sedona-spatial-join (uses an Arc<dyn SedonaSpatialJoinExtension> object to decide if an extension join should be used in place of the normal join). The hook to register a join extension can be a global function for now and you can use a mock join extension built from the default implementation index/refiner to test it.

sedona (when built with the gpu feature, registers the Gpu join with sedona-spatial-join)

There are a few things I think this will help with:

We may want to add other join accelerators for other hardware (e.g., Apple GPU) or other data types (Geography x H3 cell join), or possibly Raster/Vector join.

It will make it easier for you to have clear ownership over a subdirectory so you can iterate faster.

It makes the sedona crate the single place where features are assembled (to avoid two levels of feature flags, the consequences of which you've started to identify here with the "all except GPU" feature).

I do think it's important to get this right early while we are all here and have time to dedicate to this!

Thanks for the suggestions. If I understand correctly, this design implies that sedona-spatial-join-gpu will execute end-to-end spatial joins. This requires several components to work together, such as the optimizer, planner, and SpatialJoinStream. We would either need to export all these components to the GPU module or copy/paste the code. My concern is that this design could introduce a lot of redundant code. Since libgpuspatial functions as a refinement backend (similar to GEOS, GEO, or TG), I'm not sure why adding one more backend requiring such significant architectural changes.

paleolimbot · 2026-03-21T04:05:52Z

Thanks for the suggestions. If I understand correctly, this design implies that sedona-spatial-join-gpu will execute end-to-end spatial joins.

I don't think we need to (although we can if it is easier)....I had in mind that the sedona-spatial-join would be expose something like fn register_join_extension(extension: Arc<dyn SpatialJoinExtension>), where SpatialJoinExtension has members that instantiate the builder and refiner.

It may be worth trying to separate that all within sedona-spatial-join first to see what it will look like (and then move whatever has to be moved to sedona-spatial-join-common, and then move whatever has to be moved into sedona-spatial-join-gpu.

paleolimbot · 2026-03-21T04:15:19Z

Since libgpuspatial functions as a refinement backend (similar to GEOS, GEO, or TG), I'm not sure why adding one more backend requiring such significant architectural changes.

I think the main difference is that (1) the index is now configurable where it wasn't before and (2) tg, geo, and geos are trivial dependencies to satisfy.

I'm happy to give this a go on Monday!

paleolimbot

Thank you!

All just details from me. Looking forward to having this merged!

rust/sedona-common/src/option.rs

paleolimbot · 2026-04-09T18:24:38Z

rust/sedona-spatial-join/src/index/default_spatial_index.rs

        self.inner.schema.clone()
    }

+    #[allow(dead_code)]


Suggested change

#[allow(dead_code)]

I think this can be removed now because the trait is pub?

rust/sedona/src/context.rs

rust/sedona-spatial-join-gpu/Cargo.toml

rust/sedona-spatial-join-gpu/src/physical_planner.rs

rust/sedona-spatial-join-gpu/src/index/gpu_spatial_index_builder.rs

rust/sedona-spatial-join-gpu/src/index/gpu_spatial_index.rs

paleolimbot · 2026-04-09T18:59:49Z

rust/sedona-spatial-join-gpu/src/index/gpu_spatial_index.rs

+#[cfg(test)]
+#[cfg(feature = "gpu")]
+mod tests {
+    use crate::index::gpu_spatial_index_builder::GPUSpatialIndexBuilder;
+    use arrow_array::RecordBatch;
+    use arrow_schema::{DataType, Field, SchemaRef};
+    use datafusion_common::JoinType;
+    use datafusion_physical_expr::expressions::Column;
+    use futures::Stream;


For #702 we have to move all of these tests into rust/sedona/tests to avoid circular dev dependencies, where we can deduplicate the common pieces with sedona-spatial-join.

pwrliang · 2026-04-10T15:19:19Z

I have addressed most of the issues. I will run an end-to-end test on a machine with a GPU and post performance numbers on a subset of SpatialBench.

pwrliang · 2026-04-12T13:01:25Z

Instance: g6.2xlarge (NVIDIA L4)
Spilling is disabled.

SF=1

Query	sedonadb_gpu	datafusion batch size
q2	0.43s	100K
q4	0.60s	100K
q6	0.67s	100K
q9	0.03s	100K
q10	2.53s	2M
q11	4.29s	2M

SF=10

Query	sedonadb_gpu	datafusion batch size
q2	1.91s	100K
q4	1.23s	100K
q6	2.70s	100K
q9	0.12s	100K
q10	8.52s	2M
q11	25.51s	2M

I noticed performance regressions compared to my previous implementation.

After my investigation, the performance degradation is due to the introduction of ProbeShuffleExec node and the new implementation of GpuSpatialIndexBuilder with EvaluatedGeometryArray::interleave. The former can be addressed by setting sedona.spatial_join.repartition_probe_side = false and the latter can be improved by providing the EvaluatedGeometryArray::concatenate() method as suggested.

…ith tests

Kontinuation reviewed Mar 18, 2026

View reviewed changes

pwrliang force-pushed the integrate/libgpuspatial branch from 1d3b96b to 41b97c8 Compare March 19, 2026 11:59

paleolimbot reviewed Mar 20, 2026

View reviewed changes

paleolimbot mentioned this pull request Mar 24, 2026

refactor(rust/sedona-query-planner): Move query planner and utilities to dedicated crate #735

Merged

paleolimbot and others added 2 commits April 8, 2026 19:46

Add GPU join crate

fa0dbf2

Add gpu feature to python package

27299af

pwrliang force-pushed the integrate/libgpuspatial branch from 78a3d20 to 27299af Compare April 9, 2026 04:57

pwrliang added 4 commits April 9, 2026 12:58

Remove planner

69f1439

Fix clippy issues

7d26baa

Skip gpu crate in tests

149240f

Add missing license

466eccb

paleolimbot reviewed Apr 9, 2026

View reviewed changes

pwrliang added 6 commits April 10, 2026 11:01

Implement option injection

99b3c08

Remove unused deps

3c9253e

Cleanup planner rules

3ba472a

Remove the ctor having too many args

22cbc31

Address some issues

799ef0d

Fix clippy issues

58ba749

Bug fix

81204db

Temporary remove gpu feature for python package, will bring it back w…

e0ccc0a

…ith tests

Conversation

pwrliang commented Mar 17, 2026

Uh oh!

pwrliang commented Mar 17, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Kontinuation Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

pwrliang Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

paleolimbot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

paleolimbot Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

paleolimbot left a comment

Choose a reason for hiding this comment

Uh oh!

pwrliang commented Mar 21, 2026

Uh oh!

paleolimbot commented Mar 21, 2026

Uh oh!

paleolimbot commented Mar 21, 2026

Uh oh!

paleolimbot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

paleolimbot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

paleolimbot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

pwrliang commented Apr 10, 2026

Uh oh!

pwrliang commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

pwrliang commented Apr 12, 2026 •

edited

Loading