Add a pull-style streaming select API #471

Open
iskakaushik wants to merge 1 commit into ClickHouse:master from iskakaushik:feature/add-streaming-select-api

@iskakaushik
Contributor

pg_clickhouse needs to consume select results one block at a time, but clickhouse-cpp only exposes a callback-driven select path today. That forces downstream users to layer coroutines or connection resets on top of the client when they need pull-style iteration.

Add BeginSelect(), ReceiveSelectBlock(), and EndSelect() to mirror the existing multi-step insert workflow. The implementation reuses the existing query and packet handling code, keeps Query callbacks active for progress, profile, and log packets, and drains canceled queries so connections remain reusable.

Add integration tests that cover full streaming iteration, preserved Query callbacks, early cleanup, end-of-stream reuse, and exception cleanup with subsequent reuse.
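
For readers unfamiliar with the pull style, here is a minimal sketch of how the three-step workflow might look from the caller's side. It does not use clickhouse-cpp: `FakeClient`, `Block`, `CountRows`, and `Reusable` are hypothetical stand-ins, and the signatures shown (in particular `ReceiveSelectBlock()` returning `std::optional`) are assumptions rather than the ones in this PR. The query string is ignored by the stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a result block; the real clickhouse::Block is richer.
struct Block {
    std::vector<int> rows;
};

// Hypothetical in-memory client that mimics the three-step select workflow.
// The queue plays the role of blocks still pending on the connection.
class FakeClient {
public:
    explicit FakeClient(std::deque<Block> pending) : pending_(std::move(pending)) {}

    // Starts a streaming select. Only one may be active at a time.
    void BeginSelect(const std::string& /*query*/) {
        assert(!selecting_ && "a select is already in progress");
        selecting_ = true;
    }

    // Returns the next block, or std::nullopt once the stream is exhausted.
    std::optional<Block> ReceiveSelectBlock() {
        assert(selecting_ && "BeginSelect must be called first");
        if (pending_.empty()) {
            selecting_ = false;
            return std::nullopt;
        }
        Block block = std::move(pending_.front());
        pending_.pop_front();
        return block;
    }

    // Drains any unread blocks so the connection can be reused.
    void EndSelect() {
        pending_.clear();
        selecting_ = false;
    }

    bool Reusable() const { return !selecting_ && pending_.empty(); }

private:
    std::deque<Block> pending_;
    bool selecting_ = false;
};

// Counts rows block by block and stops early once `limit` rows were seen.
std::size_t CountRows(FakeClient& client, std::size_t limit) {
    std::size_t seen = 0;
    client.BeginSelect("SELECT number FROM system.numbers LIMIT 6");
    while (auto block = client.ReceiveSelectBlock()) {
        seen += block->rows.size();
        if (seen >= limit) break;  // early exit; EndSelect() drains the rest
    }
    client.EndSelect();
    return seen;
}
```

The early `break` followed by `EndSelect()` corresponds to the "early cleanup" case listed above: the caller stops iterating, and the remaining blocks are drained rather than left on the connection.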

@slabko
Contributor

slabko commented Apr 7, 2026

@iskakaushik master seems to build fine. Can you take a look at the CI errors?

@slabko slabko left a comment

After some back and forth, implementing a ReceivePacket that parses only a single packet and returns a tagged union (std::variant) does not seem particularly complex. The current ReceivePacket can then be built on top of it.

This gives full control over the ExecuteQuery loop, allowing a synchronous implementation without workarounds. In other words, the library can become synchronous while still preserving the asynchronous (callback-based) API.

This also enables synchronous handling of other events.

Finally, clickhouse::Query callbacks should not be ignored in synchronous mode—they can still be invoked, making the implementation complete.

Some extra work will remain, but its scope should become clear after more experimentation with a synchronous version of ReceivePacket.
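
To make the idea concrete, here is a rough sketch of a single-packet `ReceivePacket` returning a `std::variant`, with a callback-driven loop layered on top. Everything in it is hypothetical: the packet types, `FakeConnection`, `Callbacks`, `Dispatch`, and `ReceiveAll` are illustrative names, and the real protocol has more packet kinds than the three shown.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical packet payloads; the real protocol has more kinds.
struct DataPacket { std::vector<int> rows; };
struct ProgressPacket { std::size_t rows_read; };
struct EndOfStreamPacket {};

using Packet = std::variant<DataPacket, ProgressPacket, EndOfStreamPacket>;

// Hypothetical callbacks, standing in for the ones carried by a Query.
struct Callbacks {
    std::function<void(const DataPacket&)> on_data;
    std::function<void(const ProgressPacket&)> on_progress;
};

// Hypothetical connection; the queue plays the role of the wire.
class FakeConnection {
public:
    explicit FakeConnection(std::deque<Packet> wire) : wire_(std::move(wire)) {}

    // Parses exactly one packet and returns it to the caller.
    Packet ReceivePacket() {
        assert(!wire_.empty() && "a real socket would block here");
        Packet packet = std::move(wire_.front());
        wire_.pop_front();
        return packet;
    }

private:
    std::deque<Packet> wire_;
};

// Invokes the callback matching the packet type.
// Returns false once the end-of-stream packet is seen.
bool Dispatch(const Packet& packet, const Callbacks& cb) {
    return std::visit([&cb](const auto& p) {
        using T = std::decay_t<decltype(p)>;
        if constexpr (std::is_same_v<T, DataPacket>) {
            if (cb.on_data) cb.on_data(p);
            return true;
        } else if constexpr (std::is_same_v<T, ProgressPacket>) {
            if (cb.on_progress) cb.on_progress(p);
            return true;
        } else {
            return false;  // EndOfStreamPacket
        }
    }, packet);
}

// Callback-driven loop built on top of the single-packet primitive.
void ReceiveAll(FakeConnection& conn, const Callbacks& cb) {
    while (Dispatch(conn.ReceivePacket(), cb)) {}
}
```

A pull-style caller would call `ReceivePacket()` directly and decide after each packet whether to continue, while still passing the packet to `Dispatch` so that the callbacks keep firing.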

bool inserting_;
bool inserting_ = false;
bool selecting_ = false;
bool discarding_select_data_ = false;

I believe this will not be needed once a proper synchronous version of ReceivePacket is implemented.

ServerInfo server_info_;

bool inserting_;
bool inserting_ = false;

A bunch of mutually exclusive bools looks like a good candidate for an enum.
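
A minimal sketch of what that could look like, assuming the three flags in the diff (`inserting_`, `selecting_`, `discarding_select_data_`) really are mutually exclusive. `ConnectionState` and `StateTracker` are hypothetical names, not part of the PR or the library.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical replacement for the three bools: exactly one state at a time.
enum class ConnectionState {
    Idle,
    Inserting,
    Selecting,
    DiscardingSelectData,
};

// Hypothetical helper that owns the state and guards the transitions.
class StateTracker {
public:
    ConnectionState state() const { return state_; }

    void BeginInsert() { Enter(ConnectionState::Inserting); }
    void BeginSelect() { Enter(ConnectionState::Selecting); }

    // Canceling only makes sense while a select is running.
    void CancelSelect() {
        assert(state_ == ConnectionState::Selecting);
        state_ = ConnectionState::DiscardingSelectData;
    }

    // Called once the current operation has fully completed or drained.
    void Finish() { state_ = ConnectionState::Idle; }

private:
    // A new operation may start only from Idle, so two operations
    // can never overlap the way two independent bools could.
    void Enter(ConnectionState next) {
        if (state_ != ConnectionState::Idle) {
            throw std::logic_error("another operation is in progress");
        }
        state_ = next;
    }

    ConnectionState state_ = ConnectionState::Idle;
};
```

With a single enum, an invalid combination such as "inserting and selecting at once" cannot be represented, so it no longer needs to be checked for.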

@slabko

slabko commented Apr 8, 2026

@iskakaushik
Here ReceivePacket now returns the data that it has received: #474

Regarding API naming, for a pull-based API I would use BeginSelect and SelectNext, which returns the next block (and still passes it through callbacks). Based on that, I could implement SelectAll, which passes all blocks through the callbacks. After that, I could re-implement Select by calling PrepareSelect and SelectAll. And here we are, with Select implemented in terms of the new pull-based ReceivePacket, while still preserving all callbacks embedded in Query.

With that in place, it is relatively easy to return data to the caller directly and allow interactive control over the loop.

This is where I would start.
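
A rough sketch of that layering, with an in-memory queue in place of a real connection. `LayeredClient`, `ResultBlock`, and `DataCallback` are hypothetical, the signatures are assumptions, and the sketch uses `BeginSelect` for the first step throughout.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a result block.
struct ResultBlock {
    std::vector<int> rows;
};

using DataCallback = std::function<void(const ResultBlock&)>;

// Hypothetical client; the queue plays the role of the server stream.
class LayeredClient {
public:
    explicit LayeredClient(std::deque<ResultBlock> stream)
        : stream_(std::move(stream)) {}

    // Step 1: start the select and remember the callback that every
    // block is still passed through.
    void BeginSelect(DataCallback on_data) {
        on_data_ = std::move(on_data);
        active_ = true;
    }

    // Step 2: pull one block, pass it through the callback, and return it.
    // Returns std::nullopt at end of stream.
    std::optional<ResultBlock> SelectNext() {
        assert(active_ && "BeginSelect must be called first");
        if (stream_.empty()) {
            active_ = false;
            return std::nullopt;
        }
        ResultBlock block = std::move(stream_.front());
        stream_.pop_front();
        if (on_data_) on_data_(block);
        return block;
    }

    // Step 3: drive the pull loop to completion; blocks reach the
    // caller only through the callback.
    void SelectAll() {
        while (SelectNext()) {}
    }

    // The callback-style entry point, expressed via the pull-based steps.
    void Select(DataCallback on_data) {
        BeginSelect(std::move(on_data));
        SelectAll();
    }

private:
    std::deque<ResultBlock> stream_;
    DataCallback on_data_;
    bool active_ = false;
};
```

A caller who wants interactive control calls `BeginSelect` and `SelectNext` directly and can stop after any block. A caller who only wants callbacks keeps using `Select`.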
