feat(rust/sedona-spatial-join): Integrate libgpuspatial into sedona-spatial-join#722
feat(rust/sedona-spatial-join): Integrate libgpuspatial into sedona-spatial-join#722pwrliang wants to merge 14 commits intoapache:mainfrom
Conversation
|
@paleolimbot @Kontinuation Could you take a look when you are available? |
rust/sedona-spatial-join/src/index/gpu_spatial_index_builder.rs
Outdated
Show resolved
Hide resolved
rust/sedona-spatial-join/src/index/gpu_spatial_index_builder.rs
Outdated
Show resolved
Hide resolved
rust/sedona-spatial-join-gpu/src/index/gpu_spatial_index_builder.rs
Outdated
Show resolved
Hide resolved
| if wkb.geometry_type() == GeometryType::Point { | ||
| Some(Rect::new( | ||
| coord!(x: min.x as f32, y: min.y as f32), | ||
| coord!(x: max.x as f32, y: max.y as f32), | ||
| )) |
There was a problem hiding this comment.
Will this introduce false negatives since the resulting bounding box is not guaranteed to cover the original bounding box?
There was a problem hiding this comment.
Thanks for mentioning this. When one side contains only points, their coordinates in FP64 are downcast to FP32, which may result in false negatives. However, on the other side containing rectangles, I called next_after to produce a conservative bounding box. This compensates for potential false-negative issues.
There was a problem hiding this comment.
For Geography joins we will also have to play similar games (i.e., create our own EvaluatedGeometryArray)...I will try to find an extension point here so that we can keep this logic separated.
There was a problem hiding this comment.
I implemented this in #737 (the ability to override the Rect creation for a SpatialJoinProvider); however, it's worth double checking that these rectangles are not used anywhere that this assumption might cause problems.
1d3b96b to
41b97c8
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you for working on this!
My main concern with this PR is that the GPU details are very entangled with the non-GPU details. How about:
sedona-spatial-join-common(defines the traits you nicely separated in the last PR and aSedonaSpatialJoinExtensiontrait)sedona-spatial-join-gpu(implements the traits in sedona-spatial-join-common)sedona-spatial-join(uses anArc<dyn SedonaSpatialJoinExtension>object to decide if an extension join should be used in place of the normal join). The hook to register a join extension can be a global function for now and you can use a mock join extension built from the default implementation index/refiner to test it.sedona(when built with the gpu feature, registers the Gpu join with sedona-spatial-join)
There are a few things I think this will help with:
- We may want to add other join accelerators for other hardware (e.g., Apple GPU) or other data types (Geography x H3 cell join), or possibly Raster/Vector join.
- It will make it easier for you to have clear ownership over a subdirectory so you can iterate faster.
- It makes the
sedonacrate the single place where features are assembled (to avoid two levels of feature flags, the consequences of which you've started to identify here with the "all except GPU" feature).
I do think it's important to get this right early while we are all here and have time to dedicate to this!
Thanks for the suggestions. If I understand correctly, this design implies that |
I don't think we need to (although we can if it is easier)....I had in mind that the It may be worth trying to separate that all within sedona-spatial-join first to see what it will look like (and then move whatever has to be moved to sedona-spatial-join-common, and then move whatever has to be moved into sedona-spatial-join-gpu. |
I think the main difference is that (1) the index is now configurable where it wasn't before and (2) tg, geo, and geos are trivial dependencies to satisfy. I'm happy to give this a go on Monday! |
78a3d20 to
27299af
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you!
All just details from me. Looking forward to having this merged!
| self.inner.schema.clone() | ||
| } | ||
|
|
||
| #[allow(dead_code)] |
There was a problem hiding this comment.
| #[allow(dead_code)] |
I think this can be removed now because the trait is pub?
rust/sedona-spatial-join-gpu/src/index/gpu_spatial_index_builder.rs
Outdated
Show resolved
Hide resolved
| #[cfg(test)] | ||
| #[cfg(feature = "gpu")] | ||
| mod tests { | ||
| use crate::index::gpu_spatial_index_builder::GPUSpatialIndexBuilder; | ||
| use arrow_array::RecordBatch; | ||
| use arrow_schema::{DataType, Field, SchemaRef}; | ||
| use datafusion_common::JoinType; | ||
| use datafusion_physical_expr::expressions::Column; | ||
| use futures::Stream; |
There was a problem hiding this comment.
For #702 we have to move all of these tests into rust/sedona/tests to avoid circular dev dependencies, where we can deduplicate the common pieces with sedona-spatial-join.
|
I have addressed most of the issues. I will run an end-to-end test on a machine with a GPU and post performance numbers on a subset of SpatialBench. |

This PR integrates libgpuspatial by providing a
GPUSpatialIndexandGPUSpatialIndexBuilder. The PR should pass the CI after its predecessor PRs (#717, #718, #719, #721) are merged.