Skip to content

enh(Net,Foundation): multipart parsing performance and stream bulk-read optimization#5290

Open
matejk wants to merge 6 commits intomainfrom
5288-multipart-fixes-improvements
Open

enh(Net,Foundation): multipart parsing performance and stream bulk-read optimization#5290
matejk wants to merge 6 commits intomainfrom
5288-multipart-fixes-improvements

Conversation

@matejk
Copy link
Copy Markdown
Contributor

@matejk matejk commented Apr 3, 2026

Summary

Measured Performance (200 x 500KB parts, ~97 MB)

Handler using StreamCopier::copyToString (production MultiPartHandler path):

Scenario main This PR Speedup
With Content-Length 224 ms 9 ms 25x
Without Content-Length 227 ms 94 ms 2.4x

Handler using stream.ignore() (NullPartHandler):

Scenario main This PR Speedup
With Content-Length 426 ms 211 ms 2x
Without Content-Length 424 ms 296 ms 1.4x

Small parts (60,000 parts, 5.5 MB):

Scenario main This PR Speedup
NullPartHandler 31 ms 71 ms 0.4x (regression)

OSS-Fuzz #5288 test case (46 KB crafted message, 181 parts, 2086 Content-Type params):

main This PR Speedup
3,879 ms 7 ms (rejected: empty boundary) 554x

Small-parts regression explanation

The 60,000 small-parts regression (31ms -> 71ms) is from bulk sgetn over-reading past boundaries. Each boundary match requires storing over-read bytes in a holdback buffer and replaying them via PrependStreamBuf for the next part's header parsing. For tiny parts (~10 bytes content, ~100 bytes headers), the holdback management overhead exceeds the old line-by-line scanning cost. For realistic payloads (large file uploads), the bulk-read approach is significantly faster.

Side-effect performance improvements

BufferedStreamBuf::xsgetn (affects all BufferedStreamBuf-derived streams)

The xsgetn() override copies directly from the internal buffer instead of calling sbumpc() per byte. This benefits every consumer using istream::read() or StreamCopier on any BufferedStreamBuf-derived stream:

Stream class Module Expected benefit
InflatingStreamBuf Foundation High — gzip/zlib decompression via StreamCopier
DeflatingStreamBuf Foundation High — compression input path
CryptoStreamBuf Crypto High — bulk encrypt/decrypt
ZipStreamBuf + PartialStreamBuf Zip High — ZIP extraction
DigestBuf Foundation Medium — limited by hash CPU cost
MultipartStreamBuf Net Proven 25x with Content-Length

UnbufferedStreamBuf::xsgetn exception safety

Fixes StreamCopier::copyToString returning empty when reading through Base64Decoder on streams that throw mid-read (e.g., multipart without final boundary). Previously, an exception during xsgetn discarded all bytes read so far.

HTTP server impact

For HTTP multipart form uploads (HTMLForm::readMultipart):

Scenario Impact
File uploads with Content-Length per part 25x faster content reading
File uploads without Content-Length (typical browser) 2.4x faster (bulk boundary scanning)
Small form fields Negligible (< 1ms either way)
Plain HTTP body reading via StreamCopier Moderate — xsgetn benefits any read()-based consumer on HTTPFixedLengthStream

Changes

Fix #5288: MailMessage multipart parsing timeout

  • Cache multipart state in makeMultipart() — eliminates O(parts x content_type_params) re-parsing
  • Validate empty boundary before entering multipart loop
  • Add MailMessage::MAX_PARTS (100,000) limit as DoS protection
  • Replace char-by-char stream drain with istr.ignore()

Fix #4118: Content-Length bulk-read optimization

  • MultipartReader::nextPart() extracts Content-Length from part headers (with 32-bit overflow protection)
  • readContent() uses sgetn() for bulk reading when Content-Length is known
  • PatternMatcher validates boundary after Content-Length exhaustion (lazy-allocated only when needed)
  • Increase STREAM_BUFFER_SIZE from 1 KB to 16 KB

Bulk-read boundary scanning (no Content-Length path)

  • readContent() uses sgetn() to bulk-read into buffer, then scanForBoundary() searches in-buffer
  • Over-read bytes stored in holdback vector, replayed via PrependStreamBuf (reusable, zero-copy string_view prefix)
  • Persistent std::istream wrapper avoids per-part iostream construction overhead
  • _readAhead as vector<char> with std::move between parts eliminates copies

BufferedStreamBuf::xsgetn() bulk-read override

  • Copies directly from internal buffer via char_traits::copy + setg
  • Delegates refilling to existing underflow() for consistent behavior

UnbufferedStreamBuf::xsgetn() exception safety

  • Catches exceptions mid-read and returns bytes already copied
  • Fixes StreamCopier::copyToString through Base64Decoder on truncated streams

C++17 modernization

  • MailMessage: [[nodiscard]], = delete, structured bindings, std::none_of, if-with-initializer
  • MultipartReader: = delete, explicit != nullptr, static constexpr, enum class
  • PartHandler: = default constructor/destructor, = delete copy
  • PatternMatcher, PrependStreamBuf: implementation-private classes in .cpp, string_view, [[nodiscard]]

Test plan

  • All 23 MailMessage tests pass (7 new: empty boundary, Content-Length, zero-CL, 60K parts, 100K+ rejected, 200x500KB with/without CL)
  • All 9 MultipartReader tests pass (including Unix line ends, no final boundary)
  • All 15 HTMLForm tests pass
  • All 102 Foundation Streams+FileStream tests pass
  • Profiled with Apple Instruments — remaining hotspots are MessageHeader::read (3.1%), memory management (7.5%), our scanForBoundary+readContent (2.4%)
  • Full CI

matejk added 3 commits April 3, 2026 11:05
Fix OSS-Fuzz timeout (#5288) caused by O(parts × content_type_params)
complexity in readMultipart — a crafted message with 2086 Content-Type
parameters and 181 empty-boundary parts caused 60s+ timeout. Root cause:
isMultipart() re-parsed the massive Content-Type header for every part
via makeMultipart().

Add Content-Length bulk-read optimization (#4118) to MultipartStreamBuf
using sgetn fast path when Content-Length is present in part headers,
achieving ~2x throughput improvement on large binary parts.

Changes:
- Cache multipart state in makeMultipart() to avoid repeated
  Content-Type re-parsing
- Validate empty boundary before entering multipart loop
- Add MAX_PARTS (100000) limit to readMultipart() as DoS protection
- Add Content-Length-aware bulk-read path in MultipartStreamBuf
  with proper CRLF/boundary parsing and 32-bit overflow protection
- Increase STREAM_BUFFER_SIZE from 1KB to 16KB
- Replace char-by-char stream drain with istr.ignore()
- C++17 modernization: [[nodiscard]], = delete, = default,
  structured bindings, std::none_of, if-with-initializer
- Add tests for empty boundary, Content-Length path, many parts
  (60000), too-many-parts, and large parts with/without
  Content-Length (200 × 500KB) with elapsed time output
Override xsgetn() in BasicBufferedStreamBuf to copy directly from the
internal buffer instead of calling sbumpc() per byte. This eliminates
the per-byte virtual call overhead when consumers use istream::read()
or StreamCopier::copyToString(), yielding up to 46x throughput
improvement for large multipart parts with Content-Length.

The override delegates buffer refilling to the existing underflow()
method, ensuring consistent behavior with single-byte reads.

Also modernize PartHandler (= default, = delete) and test handler
(override, [[nodiscard]], std::move).
When readFromDevice() throws mid-read (e.g., Base64Decoder hitting
EOF while expecting padding), the exception propagated out of xsgetn,
causing istream::read() to set badbit with gcount=0 — discarding all
bytes read so far.

Fix: catch exceptions in xsgetn and return bytes already copied if
any were read. Only re-throw if no bytes have been consumed yet.

This fixes StreamCopier::copyToString returning empty through
Base64Decoder on multipart streams without a final boundary marker.

Also update test StringPartHandler to use StreamCopier::copyToString
(bulk reads) now that the underlying bug is fixed.
@matejk matejk force-pushed the 5288-multipart-fixes-improvements branch from 501a689 to be73cbe Compare April 3, 2026 09:52
matejk added 2 commits April 3, 2026 12:00
- Replace gbump(int) with setg() in BufferedStreamBuf::xsgetn to avoid
  int truncation risk with large buffers
- Add comment explaining boundary check reorder in readWithBoundaryDetection
- Add testReadMultiPartWithZeroContentLength for Content-Length: 0 edge case
…part

Replace line-by-line boundary detection with bulk sgetn + in-buffer
scanForBoundary for the no-Content-Length path (~8x faster for large
parts). Over-read bytes are stored in a holdback vector and replayed
via a reusable PrependStreamBuf that provides zero-copy string_view
prefix reading before the underlying stream.

For the Content-Length path, boundary validation after bulk content
read uses PatternMatcher (only allocated when Content-Length present).

Changes:
- Add PrependStreamBuf (reusable, string_view prefix, zero-copy)
- Add PatternMatcher for post-Content-Length boundary validation
- Both are implementation-private classes in MultipartReader.cpp
- Bulk sgetn + scanForBoundary for no-Content-Length path
- Holdback via std::move between parts (no string copies)
- _readAhead stored as vector<char> to avoid type conversions
- Fix _boundaryFound to always signal; return -1 not 0
- Fix PatternMatcher::reset() to clear _failedLen
- Add boundary length assertion
- Modernize MultipartReader (= delete, explicit nullptr checks)
- static constexpr STREAM_BUFFER_SIZE, enum class Result
- Content-Length: 0 test case
@matejk matejk requested review from aleks-f and obiltschnig April 3, 2026 15:22
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fuzzing Issue in MailMessage Poco Multipart parsing is 10x slower than its Boost/beat or restinio equivalent

2 participants