
feat(hook): move hooks to direct func call #35

Merged
bzp2010 merged 3 commits into main from bzp/feat-move-hook-to-func-call
Apr 12, 2026

Conversation

Collaborator

@bzp2010 bzp2010 commented Apr 10, 2026

Second part of #34

Summary by CodeRabbit

  • Refactor
    • Streamlined request context and hook plumbing for more consistent authorization and rate-limit checks; observability centralized for both streaming and non‑streaming flows.
  • Bug Fixes
    • Improved handling and reporting of rate‑limit failures and injection of rate‑limit headers into responses.
  • Enhancement
    • Added first‑token latency and final streaming usage metrics to improve performance visibility.


coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Replaces the legacy HookContext/HookManager hook framework with a new async RequestContext-based system; handlers now call standalone async hooks (authorization, rate_limit, observability), move request-scoped storage into RequestContext extensions, and remove the old hooks2/metric modules.
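The RequestContext described here carries request-scoped data in a type-keyed extensions store behind an async RwLock. A minimal std-only sketch of the underlying type-map pattern (the names and API below are illustrative, not the crate's actual types, and the async lock is omitted):

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Hypothetical sketch: one value per concrete type, keyed by TypeId,
// the same idea behind axum-style request extensions.
#[derive(Default)]
struct Extensions {
    map: HashMap<TypeId, Box<dyn Any>>,
}

impl Extensions {
    fn insert<T: Any>(&mut self, value: T) {
        self.map.insert(TypeId::of::<T>(), Box::new(value));
    }

    fn get<T: Any>(&self) -> Option<&T> {
        self.map
            .get(&TypeId::of::<T>())
            .and_then(|boxed| boxed.downcast_ref::<T>())
    }
}
```

Because lookups are keyed by type, hooks can stash values like an ApiKey or TokenUsage without the handler threading each one through explicitly.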

Changes

  • Chat Completions (src/proxy/handlers/chat_completions/mod.rs, src/proxy/handlers/chat_completions/types.rs):
    Removed HookContext plumbing; the handler now uses RequestContext and calls hooks::observability::record_start_time, hooks::authorization::check, and hooks::rate_limit::pre_check; the streaming flow centralizes finalization in finalize_stream_request; replaced HookError with explicit RateLimitError handling.
  • Embeddings (src/proxy/handlers/embeddings/mod.rs, src/proxy/handlers/embeddings/types.rs):
    Dropped the HookContext parameter; switched to RequestContext and standalone hooks::* calls for auth, rate limiting, and observability; replaced the generic HookError with RateLimitError.
  • Models handler (src/proxy/handlers/models.rs):
    Removed HookContext/Request parameters and the HOOK_MANAGER pre-call; list_models reads ApiKey from RequestContext extensions; ModelError collapsed to an empty enum (no hook error propagation).
  • Hook core & context (src/proxy/hooks/mod.rs, src/proxy/hooks/context.rs):
    Removed the HookManager/HookContext/ProxyHook trait APIs; added a new RequestContext (async RwLock-wrapped extensions, FromRequestParts impl) and restructured the hooks module to export authorization, observability, and rate_limit and re-export RequestContext.
  • Authorization (src/proxy/hooks/authorization/mod.rs):
    Now uses the RequestContext API; reads/writes ApiKey, model, and request-scoped data via extensions().await / extensions_mut().await.
  • Rate limiting (src/proxy/hooks/rate_limit/mod.rs):
    Removed the RateLimitHook impl; added standalone async functions pre_check, post_check, post_check_streaming, and inject_response_headers, plus RateLimitError (wraps a raw Response); stores rate-limit/concurrency state in RequestContext extensions.
  • Observability / metrics (src/proxy/hooks/observability/mod.rs):
    New module replacing the removed metric hook; exposes record_start_time, record_usage, record_first_token_latency, and record_streaming_usage, which read/write the start time and TokenUsage in RequestContext extensions and emit metrics.
  • Cleanup: removed legacy modules/types (src/proxy/hooks2/*, src/proxy/hooks/metric.rs, src/proxy/mod.rs, src/proxy/middlewares/mod.rs):
    Deleted the hooks2 RequestContext and metric hook; made hooks internal to proxy (visibility changed) and removed the exported RequestModel type.
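The first-token-latency metric introduced by the observability module can be illustrated with a small std-only sketch: record a start instant when the request arrives, then take a single elapsed measurement when the first streamed chunk shows up (the type and method names here are illustrative, not the module's actual API):

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch of the first-token-latency bookkeeping.
struct Observability {
    start: Instant,
    first_token_latency: Option<Duration>,
}

impl Observability {
    fn record_start_time() -> Self {
        Observability {
            start: Instant::now(),
            first_token_latency: None,
        }
    }

    // Only the first chunk sets the latency; later chunks are no-ops.
    fn record_first_token_latency(&mut self) {
        if self.first_token_latency.is_none() {
            self.first_token_latency = Some(self.start.elapsed());
        }
    }
}
```

In the real code these values would live in the RequestContext extensions rather than a dedicated struct, so streaming and non-streaming paths share the same storage.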

Sequence Diagrams

sequenceDiagram
  participant Client as Client
  participant Handler as Handler (chat_completions / embeddings)
  participant ReqCtx as RequestContext
  participant Auth as hooks::authorization
  participant RLPre as hooks::rate_limit::pre_check
  participant Provider as Model Provider
  participant RLPost as hooks::rate_limit::post_check
  participant Obs as hooks::observability

  Client->>Handler: HTTP request
  Handler->>ReqCtx: construct/obtain RequestContext
  Handler->>Obs: record_start_time(&mut ReqCtx)
  Handler->>Auth: authorization::check(&mut ReqCtx)
  Auth-->>Handler: ok
  Handler->>RLPre: rate_limit::pre_check(&mut ReqCtx)
  RLPre-->>Handler: ok
  Handler->>Provider: invoke model/provider
  Provider-->>Handler: response
  Handler->>RLPost: rate_limit::post_check(&mut ReqCtx, response)
  Handler->>Obs: record_usage(&mut ReqCtx, response)
  Handler-->>Client: HTTP response
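The non-streaming flow above is the heart of the refactor: hooks become ordinary fallible functions the handler calls in order, with `?` short-circuiting on failure, instead of trait objects dispatched by a HookManager. A simplified synchronous sketch (the real hooks are async and take &mut RequestContext; the names and fields below are illustrative):

```rust
// Hypothetical stand-in for RequestContext.
struct Ctx {
    authorized: bool,
    tokens_used: u64,
}

fn authorization_check(ctx: &mut Ctx) -> Result<(), String> {
    // Simplification: always authorize; the real hook validates an ApiKey.
    ctx.authorized = true;
    Ok(())
}

fn rate_limit_pre_check(ctx: &mut Ctx) -> Result<(), String> {
    if ctx.tokens_used > 1000 {
        Err("rate limited".into())
    } else {
        Ok(())
    }
}

fn handle(ctx: &mut Ctx) -> Result<&'static str, String> {
    // Direct calls: errors propagate with `?` rather than flowing
    // through a manager's dispatch loop.
    authorization_check(ctx)?;
    rate_limit_pre_check(ctx)?;
    Ok("response")
}
```

The payoff is that the compiler sees the exact call order and error types at each site, which is what makes the RateLimitError-specific handling in the handlers possible.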
sequenceDiagram
  participant Client as Client
  participant Handler as Handler (streaming)
  participant ReqCtx as RequestContext (cloned)
  participant Auth as hooks::authorization
  participant RLPre as hooks::rate_limit::pre_check
  participant Provider as Provider (stream)
  participant RLStream as hooks::rate_limit::post_check_streaming
  participant Obs as hooks::observability

  Client->>Handler: streaming request
  Handler->>ReqCtx: construct/clone RequestContext
  Handler->>Auth: authorization::check(&mut ReqCtx)
  Handler->>RLPre: rate_limit::pre_check(&mut ReqCtx)
  Handler->>Provider: start stream
  Provider-->>Handler: first chunk
  Handler->>Obs: record_first_token_latency(&mut ReqCtx)
  Note right of Handler: per-chunk TokenUsage inserted into ReqCtx extensions when present
  Provider-->>Handler: stream completes / [DONE]
  Handler->>RLStream: rate_limit::post_check_streaming(&mut ReqCtx)
  Handler->>Obs: record_streaming_usage(&mut ReqCtx)
  Handler-->>Client: streaming end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • LiteSun
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • E2e Test Quality Review ⚠️ Warning
    Explanation: The PR lacks E2E tests for a significant refactoring, is missing unit tests for the new RequestContext/observability/rate_limit modules, and has error-handling gaps where post_check failures are logged but not propagated to responses.
    Resolution: Add E2E tests covering the full chat_completions and embeddings flows (streaming and non-streaming), implement unit tests for the new modules, and improve error propagation from rate_limit post_check instead of silently logging.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately describes the main refactoring: moving from a hook manager/trait-based architecture to direct function calls for authorization, rate limiting, and observability.
  • Security Check ✅ Passed: RequestContext properly isolates sensitive data behind a RwLock; authorization checks use the ? operator for proper error propagation; no API keys or tokens are logged; specific error types prevent silent failures.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/proxy/handlers/chat_completions/mod.rs (1)

157-160: ⚠️ Potential issue | 🟠 Major

Finalize stream accounting before aborting on provider errors.

When the upstream stream yields Some(Err(_)), this branch closes the SSE immediately. Any usage already accumulated in request_ctx is then lost, so failed streams are neither rate-accounted nor measured. Run the same finalization used by the done branch before returning None.

Suggested fix
                         Some(Err(err)) => {
                             error!("Stream error: {}", err);
+                            if let Err(err) =
+                                hooks::rate_limit::post_check_streaming(&mut request_ctx).await
+                            {
+                                error!("Rate limit post_check_streaming error: {}", err);
+                            }
+                            hooks::observability::record_streaming_usage(&mut request_ctx).await;
                             drop(span);
                             None
                         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/handlers/chat_completions/mod.rs` around lines 157 - 160, When
handling Some(Err(err)) you currently log and return None without finalizing
request usage; replicate the same finalization performed in the `done` branch
before returning to ensure usage/rate accounting and metrics are recorded.
Modify the Some(Err(err)) branch (the block that currently does error!("Stream
error: {}", err); drop(span); None) to invoke the same finalization logic used
by the `done` branch for `request_ctx` (or extract that finalization into a
helper and call it here), then drop the span and return None.
🧹 Nitpick comments (7)
src/proxy/hooks/authorization/mod.rs (1)

87-88: Consider combining consecutive lock acquisitions.

Two separate extensions_mut().await calls acquire the write lock twice in succession. This could be combined into a single lock acquisition for slightly better performance.

♻️ Proposed optimization
-    ctx.extensions_mut().await.insert(model);
-    ctx.extensions_mut().await.insert(RequestModel(model_name));
+    {
+        let mut ext = ctx.extensions_mut().await;
+        ext.insert(model);
+        ext.insert(RequestModel(model_name));
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/hooks/authorization/mod.rs` around lines 87 - 88, The two
consecutive write-lock acquisitions on ctx.extensions_mut().await should be
merged: obtain the mutable extensions once (e.g., let mut extensions =
ctx.extensions_mut().await) and then call extensions.insert(model) followed by
extensions.insert(RequestModel(model_name)); update the code around the
ctx.extensions_mut().await calls so both insertions use the single mutable
reference to avoid duplicate await/lock overhead.
src/proxy/handlers/embeddings/mod.rs (2)

23-30: Missing #[fastrace::trace] decorator.

Per coding guidelines, public functions in src/**/*.rs should use #[fastrace::trace] for distributed tracing. The embeddings handler is missing this decorator.

♻️ Proposed fix
+#[fastrace::trace]
 pub async fn embeddings(
     State(_state): State<AppState>,
     mut request_ctx: RequestContext,

As per coding guidelines: "Use #[fastrace::trace] decorator for distributed tracing on public functions".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/handlers/embeddings/mod.rs` around lines 23 - 30, The public
handler function embeddings is missing the required #[fastrace::trace]
attribute; add the #[fastrace::trace] decorator immediately above the pub async
fn embeddings(...) declaration so the function is instrumented for distributed
tracing, ensuring the attribute appears before the function signature (no other
code changes needed inside hooks::observability or the function body).

33-38: Consider handling the unwrap() more gracefully.

While the TODO acknowledges this, the unwrap() could panic if authorization::check fails to insert the Model for any reason. An explicit error return would be more defensive.

♻️ Proposed defensive approach
-    //TODO: safe unwrap
-    let model = request_ctx
+    let model = request_ctx
         .extensions()
         .await
         .get::<ResourceEntry<Model>>()
         .cloned()
-        .unwrap();
+        .ok_or_else(|| {
+            error!("Model not found in context after authorization");
+            EmbeddingError::AuthorizationError(
+                crate::proxy::hooks::authorization::AuthorizationError::MissingApiKeyInContext
+            )
+        })?;

Alternatively, you could add a specific error variant for this case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/handlers/embeddings/mod.rs` around lines 33 - 38, The code
currently calls unwrap on
request_ctx.extensions().await.get::<ResourceEntry<Model>>().cloned(), which can
panic if the Model entry is missing; change this to a safe check that returns an
explicit error instead of panicking — e.g., replace the unwrap with a match or
ok_or/ok_or_else and return a typed error/Response (create or use an existing
error variant) from the handler when the ResourceEntry<Model> is absent;
reference the same symbols (request_ctx, extensions().await,
ResourceEntry<Model>, and the surrounding handler function) and ensure the error
path conveys that authorization::check failed to insert the Model.
src/proxy/handlers/models.rs (1)

52-57: The expect() here is redundant.

The RequestContext::from_request_parts implementation already uses expect() when extracting the ApiKey from request extensions (see src/proxy/hooks/context.rs:30-32). If the ApiKey were missing, extraction would have panicked before reaching this handler. Consider using unwrap() with a comment referencing the invariant, or simply trust the prior extraction.

♻️ Suggested simplification
     let api_key = request_ctx
         .extensions()
         .await
         .get::<ResourceEntry<ApiKey>>()
         .cloned()
-        .expect("apikey should exist in context");
+        .expect("ApiKey guaranteed by RequestContext extraction");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/handlers/models.rs` around lines 52 - 57, The call using expect()
when retrieving api_key in models.rs is redundant because
RequestContext::from_request_parts already guarantees an ApiKey is present;
replace the expect("apikey should exist in context") with a simple unwrap() (or
remove the explicit panic and call .cloned().unwrap()) and add a short comment
referencing that RequestContext::from_request_parts enforces this invariant (see
RequestContext::from_request_parts) so future readers know the absence cannot
occur.
src/proxy/hooks/context.rs (1)

22-40: Consider a more informative rejection type.

Using type Rejection = () means extraction failures produce an opaque 500 error. While the expect() guarantees ApiKey presence (assuming auth middleware runs), a custom rejection with IntoResponse would provide clearer error handling if the invariant is ever violated.

This is acceptable for now since the auth middleware guarantees ApiKey presence, but if the middleware chain changes in the future, the empty rejection provides no diagnostic information.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/hooks/context.rs` around lines 22 - 40, The extractor impl for
FromRequestParts<AppState> for RequestContext uses type Rejection = () and an
expect() on parts.extensions.remove::<ResourceEntry<ApiKey>>(), which yields
opaque 500s on failure; replace the unit rejection with a small custom rejection
type (e.g., RequestContextRejection) that implements IntoResponse and conveys a
clear error message/status, change from_request_parts to return
Err(RequestContextRejection::MissingApiKey) when the ResourceEntry<ApiKey> is
absent (avoid expect), and reference the FromRequestParts<AppState> impl, the
from_request_parts method, and ResourceEntry<ApiKey> when making the change so
the extractor produces informative responses on extraction failure.
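The rejection type suggested here could look roughly like the following std-only sketch; the real version would implement axum's IntoResponse, which is elided, and status_and_message is a hypothetical stand-in for that conversion:

```rust
// Hypothetical rejection type replacing `type Rejection = ()`.
#[derive(Debug)]
enum RequestContextRejection {
    MissingApiKey,
}

impl RequestContextRejection {
    // Maps the rejection to an HTTP status and a diagnostic message,
    // instead of the opaque 500 an empty rejection produces.
    fn status_and_message(&self) -> (u16, &'static str) {
        match self {
            RequestContextRejection::MissingApiKey => (
                500,
                "ApiKey missing from request extensions; auth middleware did not run",
            ),
        }
    }
}
```

Even if the status stays 500, the message makes a violated middleware invariant immediately diagnosable from logs or responses.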
src/proxy/hooks/rate_limit/mod.rs (1)

24-28: Let RateLimitHookError own its Axum response conversion.

The handlers currently have to unpack RateLimitHookError::Raw(resp) manually just to return the response. Implementing IntoResponse here keeps the HTTP mapping at the error-type boundary and matches the repo's proxy error guideline.

As per coding guidelines, src/{proxy,admin}/**/*.rs: Implement IntoResponse for error types to support Axum HTTP handlers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/hooks/rate_limit/mod.rs` around lines 24 - 28, Implement
IntoResponse for RateLimitHookError so handlers can return the error directly;
match the Raw(Response<Body>) variant and return its inner Response as the HTTP
response. Concretely, add an impl axum::response::IntoResponse for
RateLimitHookError that matches self { RateLimitHookError::Raw(resp) =>
resp.into_response() } (ensure you import axum::response::IntoResponse and use
the existing Response<Body> type).
src/proxy/hooks/observability/mod.rs (1)

8-9: Tighten this module's public API surface.

StartTime looks like an internal extension marker, so making it pub unnecessarily widens the hooks API. While touching these exports, the public helpers should also get /// docs and #[fastrace::trace] so the new observability path stays documented and visible in traces.

As per coding guidelines, src/**/*.rs: Use #[fastrace::trace] decorator for distributed tracing on public functions, and **/*.rs: Use /// for doc comments on public items in Rust.

Also applies to: 59-79

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/hooks/observability/mod.rs` around lines 8 - 9, StartTime is an
internal marker and should not be exported: remove the pub from the StartTime
tuple struct (change `pub struct StartTime(Instant);` to `struct
StartTime(Instant)`), and for the module's public helper functions (the
functions currently exported around the 59–79 region) add triple-slash doc
comments describing their purpose and annotate each with `#[fastrace::trace]` to
enable distributed tracing; ensure signatures remain the same and only
visibility, docs, and the `#[fastrace::trace]` attribute are changed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/proxy/handlers/chat_completions/mod.rs`:
- Around line 73-83: The success (non-streaming) completion path currently never
calls hooks::observability::record_usage, so add a call to
hooks::observability::record_usage(request_ctx,
&ResponseData::ChatCompletion(response.clone()), start_time). Locate the
non-streaming branch that builds resp = Json(response).into_response() and,
after injecting headers with hooks::rate_limit::inject_response_headers(...) and
before returning the response, await hooks::observability::record_usage with the
same request_ctx, the cloned ResponseData::ChatCompletion(response.clone()), and
the previously saved start_time (the same value used for chat_completions) so
latency/token metrics are emitted like the streaming path.

In `@src/proxy/hooks/rate_limit/mod.rs`:
- Around line 43-52: The code holds the extensions guard returned by
ctx.extensions_mut().await across .await points (e.g., in run_post_check calling
apply_post_check and similarly in the pre/concurrency check blocks), which can
deadlock request-local state; fix by reading needed resources (api_key, model)
and releasing the guard before calling async check functions, then reacquire
ctx.extensions_mut().await only to mutate RateLimitState or ConcurrencyState
with the results; specifically change run_post_check (and the blocks around
apply_pre_check and apply_concurrency_check) to compute check results outside of
the guard, drop the guard, await the check futures, then re-lock to update
RateLimitState / ConcurrencyState.
- Around line 174-180: The post_check_streaming function must not treat missing
TokenUsage as zero; update the TokenUsage retrieval in post_check_streaming to
detect when ctx.extensions().await.get::<TokenUsage>() is None, log an error via
the request context logger (e.g. ctx.logger or similar) explaining that
TokenUsage is missing for the streaming request, and return an Err (or early
return) instead of using 0 so token-based post-checks still run or upstream
logic can handle the unknown usage; modify the branch around TokenUsage lookup
in post_check_streaming (and any downstream uses of total_tokens) to avoid
silently defaulting to 0.
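The "read, release, then re-lock" pattern this fix asks for can be shown with a synchronous std::sync::Mutex analogue (the real code uses an async RwLock, where holding the guard across an .await is the deadlock risk); checked_increment and the limit are illustrative:

```rust
use std::sync::Mutex;

// Hypothetical sketch: copy state out of the lock, drop the guard
// before the slow check, and re-acquire only to mutate.
fn checked_increment(state: &Mutex<u64>, limit: u64) -> Result<u64, &'static str> {
    // Inner block scopes the guard so it is released immediately.
    let current = { *state.lock().unwrap() };

    // ...the slow check runs here with no lock held
    // (an `.await` on the limiter in the real code)...
    if current >= limit {
        return Err("limit exceeded");
    }

    // Re-lock only for the write.
    let mut guard = state.lock().unwrap();
    *guard += 1;
    Ok(*guard)
}
```

The trade-off is a check-then-act window between the two lock acquisitions; for rate limiting that is usually acceptable, whereas holding the guard across the await is not.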


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 936007f2-ab77-4963-9a72-ae9c602416c5

📥 Commits

Reviewing files that changed from the base of the PR and between 2c0a71f and ae96763.

📒 Files selected for processing (14)
  • src/proxy/handlers/chat_completions/mod.rs
  • src/proxy/handlers/chat_completions/types.rs
  • src/proxy/handlers/embeddings/mod.rs
  • src/proxy/handlers/embeddings/types.rs
  • src/proxy/handlers/models.rs
  • src/proxy/hooks/authorization/mod.rs
  • src/proxy/hooks/context.rs
  • src/proxy/hooks/metric.rs
  • src/proxy/hooks/mod.rs
  • src/proxy/hooks/observability/mod.rs
  • src/proxy/hooks/rate_limit/mod.rs
  • src/proxy/hooks2/context.rs
  • src/proxy/hooks2/mod.rs
  • src/proxy/mod.rs
💤 Files with no reviewable changes (3)
  • src/proxy/hooks/metric.rs
  • src/proxy/hooks2/context.rs
  • src/proxy/hooks2/mod.rs


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
src/proxy/hooks/rate_limit/mod.rs (1)

176-189: ⚠️ Potential issue | 🟠 Major

Silently defaulting to zero tokens when TokenUsage is missing.

When TokenUsage is absent (common for providers that don't emit usage in streaming chunks), defaulting to 0 bypasses token-based post-checks entirely. This could allow requests to exceed rate limits without detection.

Consider logging a warning or returning an error when usage data is unavailable for streaming requests.

🛡️ Proposed fix to log a warning
 pub async fn post_check_streaming(ctx: &mut RequestContext) -> Result<()> {
     let total_tokens = {
         match ctx.extensions().await.get::<TokenUsage>() {
             Some(usage) => usage.total_tokens,
-            None => 0,
+            None => {
+                error!("TokenUsage missing for streaming request; token-based rate limiting skipped");
+                0
+            }
         }
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/hooks/rate_limit/mod.rs` around lines 176 - 189,
post_check_streaming currently treats missing TokenUsage as zero which can
bypass rate checks; update post_check_streaming to detect when
ctx.extensions().await.get::<TokenUsage>() is None and either log a warning via
the request logger on RequestContext (include request id/ctx info) or return an
Err to surface missing usage, then only call run_post_check(ctx, total_tokens)
when usage is present; reference TokenUsage, post_check_streaming,
run_post_check and RequestContext when making the change.
🧹 Nitpick comments (5)
src/proxy/hooks/rate_limit/mod.rs (4)

191-205: Missing #[fastrace::trace] decorator on inject_response_headers.

Per coding guidelines, this public function should have the tracing decorator.

♻️ Proposed fix
+#[fastrace::trace]
 pub async fn inject_response_headers(

As per coding guidelines: "Use #[fastrace::trace] decorator for distributed tracing on public functions"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/hooks/rate_limit/mod.rs` around lines 191 - 205, Add the
#[fastrace::trace] attribute above the public async function
inject_response_headers to enable distributed tracing; locate the
inject_response_headers definition and insert the decorator on the line
immediately preceding the "pub async fn inject_response_headers(...)"
declaration (ensure the crate feature/imports for fastrace are available in the
module if not already).

167-174: Missing #[fastrace::trace] decorator on post_check.

Per coding guidelines, this public function should have the tracing decorator.

♻️ Proposed fix
+#[fastrace::trace]
 pub async fn post_check(ctx: &mut RequestContext, response: &ResponseData) -> Result<()> {

As per coding guidelines: "Use #[fastrace::trace] decorator for distributed tracing on public functions"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/hooks/rate_limit/mod.rs` around lines 167 - 174, Add the
#[fastrace::trace] attribute to the public async function post_check so it
participates in distributed tracing; locate the pub async fn post_check(ctx:
&mut RequestContext, response: &ResponseData) -> Result<()> declaration and add
the decorator directly above it (leaving the function body and the call to
run_post_check(ctx, usage.total_tokens).await unchanged) to comply with the
tracing guideline.

79-165: Missing #[fastrace::trace] decorator on public function.

Per coding guidelines, public functions should have the #[fastrace::trace] decorator for distributed tracing.

♻️ Proposed fix
+#[fastrace::trace]
 pub async fn pre_check(ctx: &mut RequestContext) -> Result<(), RateLimitError> {

As per coding guidelines: "Use #[fastrace::trace] decorator for distributed tracing on public functions"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/hooks/rate_limit/mod.rs` around lines 79 - 165, Add the
#[fastrace::trace] attribute directly above the public function declaration for
pre_check to enable distributed tracing (i.e., annotate pub async fn
pre_check(...) with #[fastrace::trace]); ensure the fastrace crate is available
in scope (add the dependency/import if missing) and then run format/lint to
confirm no attribute ordering or import issues.

39-50: Consider returning Result instead of panicking on missing resources.

Using .expect() will panic if resources are missing. While this may be by design (assuming prior middleware guarantees their presence), returning a Result would make the contract explicit and avoid runtime panics if upstream logic changes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/hooks/rate_limit/mod.rs` around lines 39 - 50, The get_resources
function should not panic with expect; change its signature from async fn
get_resources(ctx: &RequestContext) -> (ResourceEntry<ApiKey>,
ResourceEntry<Model>) to return a Result: async fn get_resources(ctx:
&RequestContext) -> Result<(ResourceEntry<ApiKey>, ResourceEntry<Model>), E>
(use an existing error type or define a small error like
RateLimitError::MissingResource or anyhow::Error), then replace the .expect()
calls on guard.get::<ResourceEntry<ApiKey>>() and
guard.get::<ResourceEntry<Model>>() with ok_or/ok_or_else to return Err with a
clear message when a resource is missing; finally update callers of
get_resources to propagate the error with ? or handle it accordingly.
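The ok_or_else refactor suggested here can be sketched against a plain type-map; get_resource and RateLimitError::MissingResource are illustrative names, not the module's actual API:

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Hypothetical error for a missing request-scoped resource.
#[derive(Debug)]
enum RateLimitError {
    MissingResource(&'static str),
}

// Look the resource up and turn absence into an Err the caller can
// propagate with `?`, instead of panicking via expect().
fn get_resource<T: Any + Clone>(
    ext: &HashMap<TypeId, Box<dyn Any>>,
    name: &'static str,
) -> Result<T, RateLimitError> {
    ext.get(&TypeId::of::<T>())
        .and_then(|boxed| boxed.downcast_ref::<T>())
        .cloned()
        .ok_or_else(|| RateLimitError::MissingResource(name))
}
```

Callers then write `let api_key: String = get_resource(&ext, "ApiKey")?;`, so a broken middleware chain surfaces as a typed error rather than a runtime panic.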
src/proxy/handlers/embeddings/mod.rs (1)

33-39: Address the TODO: .unwrap() on missing Model can panic.

The unwrap() will panic if ResourceEntry<Model> is missing from extensions. Consider returning an error instead of panicking, which would provide clearer failure semantics and avoid runtime panics.

♻️ Proposed fix
```diff
     let model = request_ctx
         .extensions()
         .await
         .get::<ResourceEntry<Model>>()
         .cloned()
-        .unwrap();
+        .ok_or_else(|| EmbeddingError::Internal("Model not found in request context".into()))?;
```

This assumes EmbeddingError has an Internal variant; adjust based on the actual error enum definition in types.rs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/proxy/handlers/embeddings/mod.rs` around lines 33 - 39, The code
currently unwraps the model from
request_ctx.extensions().await.get::<ResourceEntry<Model>>().cloned().unwrap(),
which can panic if the extension is missing; change this to handle the missing
case by returning a proper error (e.g., map the None into an
Err(EmbeddingError::Internal("missing Model in extensions")) or another
appropriate EmbeddingError variant) and propagate it out of the surrounding
function instead of panicking, ensuring any subsequent uses of model operate on
the safely extracted value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/proxy/hooks/rate_limit/mod.rs`:
- Around line 176-189: post_check_streaming currently treats missing TokenUsage
as zero, which can bypass rate checks; update post_check_streaming to detect when
ctx.extensions().await.get::<TokenUsage>() is None and either log a warning via
the request logger on RequestContext (include request id/ctx info) or return an
Err to surface missing usage, then only call run_post_check(ctx, total_tokens)
when usage is present; reference TokenUsage, post_check_streaming,
run_post_check and RequestContext when making the change.
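The guard described above can be sketched like this. It is a simplified illustration under stated assumptions: `TokenUsage`, `run_post_check`, and the `RateLimitError` variants are stand-ins for the real items in the rate-limit module, and the cap check inside `run_post_check` is a placeholder for the actual accounting.

```rust
#[derive(Debug, Clone, Copy)]
struct TokenUsage {
    total_tokens: u64,
}

#[derive(Debug, PartialEq)]
enum RateLimitError {
    // Hypothetical variants for illustration.
    MissingUsage,
    Exceeded,
}

// Placeholder for the real token accounting; here it just enforces a cap.
fn run_post_check(total_tokens: u64, limit: u64) -> Result<(), RateLimitError> {
    if total_tokens > limit {
        Err(RateLimitError::Exceeded)
    } else {
        Ok(())
    }
}

// Only run the post-check when usage was actually recorded; a missing
// TokenUsage is surfaced as an error instead of silently counting as zero.
fn post_check_streaming(
    usage: Option<TokenUsage>,
    limit: u64,
) -> Result<(), RateLimitError> {
    match usage {
        Some(u) => run_post_check(u.total_tokens, limit),
        None => Err(RateLimitError::MissingUsage),
    }
}

fn main() {
    assert_eq!(
        post_check_streaming(Some(TokenUsage { total_tokens: 10 }), 100),
        Ok(())
    );
    assert_eq!(
        post_check_streaming(None, 100),
        Err(RateLimitError::MissingUsage)
    );
}
```

Logging a warning with the request id instead of returning `Err` is the softer alternative the comment mentions; either way, the missing-usage case no longer slips through as zero tokens.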

---

Nitpick comments:
In `@src/proxy/handlers/embeddings/mod.rs`:
- Around line 33-39: The code currently unwraps the model from
request_ctx.extensions().await.get::<ResourceEntry<Model>>().cloned().unwrap(),
which can panic if the extension is missing; change this to handle the missing
case by returning a proper error (e.g., map the None into an
Err(EmbeddingError::Internal("missing Model in extensions")) or another
appropriate EmbeddingError variant) and propagate it out of the surrounding
function instead of panicking, ensuring any subsequent uses of model operate on
the safely extracted value.

In `@src/proxy/hooks/rate_limit/mod.rs`:
- Around line 191-205: Add the #[fastrace::trace] attribute above the public
async function inject_response_headers to enable distributed tracing; locate the
inject_response_headers definition and insert the decorator on the line
immediately preceding the "pub async fn inject_response_headers(...)"
declaration (ensure the crate feature/imports for fastrace are available in the
module if not already).
- Around line 167-174: Add the #[fastrace::trace] attribute to the public async
function post_check so it participates in distributed tracing; locate the pub
async fn post_check(ctx: &mut RequestContext, response: &ResponseData) ->
Result<()> declaration and add the decorator directly above it (leaving the
function body and the call to run_post_check(ctx, usage.total_tokens).await
unchanged) to comply with the tracing guideline.
- Around line 79-165: Add the #[fastrace::trace] attribute directly above the
public function declaration for pre_check to enable distributed tracing (i.e.,
annotate pub async fn pre_check(...) with #[fastrace::trace]); ensure the
fastrace crate is available in scope (add the dependency/import if missing) and
then run format/lint to confirm no attribute ordering or import issues.
- Around line 39-50: The get_resources function should not panic with expect;
change its signature from async fn get_resources(ctx: &RequestContext) ->
(ResourceEntry<ApiKey>, ResourceEntry<Model>) to return a Result: async fn
get_resources(ctx: &RequestContext) -> Result<(ResourceEntry<ApiKey>,
ResourceEntry<Model>), E> (use an existing error type or define a small error
like RateLimitError::MissingResource or anyhow::Error), then replace the
.expect() calls on guard.get::<ResourceEntry<ApiKey>>() and
guard.get::<ResourceEntry<Model>>() with ok_or/ok_or_else to return Err with a
clear message when a resource is missing; finally update callers of
get_resources to propagate the error with ? or handle it accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ce8bb26a-5bc9-4b1b-ab37-125b41a441f3

📥 Commits

Reviewing files that changed from the base of the PR and between e9c480d and cb9467a.

📒 Files selected for processing (7)
  • src/proxy/handlers/chat_completions/mod.rs
  • src/proxy/handlers/chat_completions/types.rs
  • src/proxy/handlers/embeddings/mod.rs
  • src/proxy/handlers/embeddings/types.rs
  • src/proxy/hooks/authorization/mod.rs
  • src/proxy/hooks/observability/mod.rs
  • src/proxy/hooks/rate_limit/mod.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/proxy/hooks/authorization/mod.rs
  • src/proxy/handlers/embeddings/types.rs
  • src/proxy/handlers/chat_completions/types.rs
  • src/proxy/hooks/observability/mod.rs
  • src/proxy/handlers/chat_completions/mod.rs

@bzp2010
Collaborator Author

bzp2010 commented Apr 12, 2026

TODO:

The rate limit module has been slightly refactored from its previous implementation, but it remains tightly coupled. This work will continue in future pull requests.

@bzp2010 bzp2010 merged commit 7de3d35 into main Apr 12, 2026
10 checks passed
@bzp2010 bzp2010 deleted the bzp/feat-move-hook-to-func-call branch April 12, 2026 08:21
