enh(Net,Foundation): multipart parsing performance and stream bulk-read optimization#5290
Open
enh(Net,Foundation): multipart parsing performance and stream bulk-read optimization#5290
Conversation
Fix OSS-Fuzz timeout (#5288) caused by O(parts × content_type_params) complexity in readMultipart — a crafted message with 2086 Content-Type parameters and 181 empty-boundary parts caused 60s+ timeout. Root cause: isMultipart() re-parsed the massive Content-Type header for every part via makeMultipart(). Add Content-Length bulk-read optimization (#4118) to MultipartStreamBuf using sgetn fast path when Content-Length is present in part headers, achieving ~2x throughput improvement on large binary parts. Changes: - Cache multipart state in makeMultipart() to avoid repeated Content-Type re-parsing - Validate empty boundary before entering multipart loop - Add MAX_PARTS (100000) limit to readMultipart() as DoS protection - Add Content-Length-aware bulk-read path in MultipartStreamBuf with proper CRLF/boundary parsing and 32-bit overflow protection - Increase STREAM_BUFFER_SIZE from 1KB to 16KB - Replace char-by-char stream drain with istr.ignore() - C++17 modernization: [[nodiscard]], = delete, = default, structured bindings, std::none_of, if-with-initializer - Add tests for empty boundary, Content-Length path, many parts (60000), too-many-parts, and large parts with/without Content-Length (200 × 500KB) with elapsed time output
Override xsgetn() in BasicBufferedStreamBuf to copy directly from the internal buffer instead of calling sbumpc() per byte. This eliminates the per-byte virtual call overhead when consumers use istream::read() or StreamCopier::copyToString(), yielding up to 46x throughput improvement for large multipart parts with Content-Length. The override delegates buffer refilling to the existing underflow() method, ensuring consistent behavior with single-byte reads. Also modernize PartHandler (= default, = delete) and test handler (override, [[nodiscard]], std::move).
When readFromDevice() throws mid-read (e.g., Base64Decoder hitting EOF while expecting padding), the exception propagated out of xsgetn, causing istream::read() to set badbit with gcount=0 — discarding all bytes read so far. Fix: catch exceptions in xsgetn and return bytes already copied if any were read. Only re-throw if no bytes have been consumed yet. This fixes StreamCopier::copyToString returning empty through Base64Decoder on multipart streams without a final boundary marker. Also update test StringPartHandler to use StreamCopier::copyToString (bulk reads) now that the underlying bug is fixed.
501a689 to
be73cbe
Compare
- Replace gbump(int) with setg() in BufferedStreamBuf::xsgetn to avoid int truncation risk with large buffers - Add comment explaining boundary check reorder in readWithBoundaryDetection - Add testReadMultiPartWithZeroContentLength for Content-Length: 0 edge case
…part Replace line-by-line boundary detection with bulk sgetn + in-buffer scanForBoundary for the no-Content-Length path (~8x faster for large parts). Over-read bytes are stored in a holdback vector and replayed via a reusable PrependStreamBuf that provides zero-copy string_view prefix reading before the underlying stream. For the Content-Length path, boundary validation after bulk content read uses PatternMatcher (only allocated when Content-Length present). Changes: - Add PrependStreamBuf (reusable, string_view prefix, zero-copy) - Add PatternMatcher for post-Content-Length boundary validation - Both are implementation-private classes in MultipartReader.cpp - Bulk sgetn + scanForBoundary for no-Content-Length path - Holdback via std::move between parts (no string copies) - _readAhead stored as vector<char> to avoid type conversions - Fix _boundaryFound to always signal; return -1 not 0 - Fix PatternMatcher::reset() to clear _failedLen - Add boundary length assertion - Modernize MultipartReader (= delete, explicit nullptr checks) - static constexpr STREAM_BUFFER_SIZE, enum class Result - Content-Length: 0 test case
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
xsgetn()bulk-read override toBufferedStreamBufand fix exception handling inUnbufferedStreamBuf::xsgetn()PrependStreamBuffor the no-Content-Length pathMailMessage,MultipartReader, andPartHandlerwith C++17 featuresMeasured Performance (200 x 500KB parts, ~97 MB)
Handler using
StreamCopier::copyToString(productionMultiPartHandlerpath):Handler using
stream.ignore()(NullPartHandler):Small parts (60,000 parts, 5.5 MB):
OSS-Fuzz #5288 test case (46 KB crafted message, 181 parts, 2086 Content-Type params):
Small-parts regression explanation
The 60,000 small-parts regression (31ms -> 71ms) is from bulk
sgetnover-reading past boundaries. Each boundary match requires storing over-read bytes in a holdback buffer and replaying them viaPrependStreamBuffor the next part's header parsing. For tiny parts (~10 bytes content, ~100 bytes headers), the holdback management overhead exceeds the old line-by-line scanning cost. For realistic payloads (large file uploads), the bulk-read approach is significantly faster.Side-effect performance improvements
BufferedStreamBuf::xsgetn(affects all BufferedStreamBuf-derived streams)The
xsgetn()override copies directly from the internal buffer instead of callingsbumpc()per byte. This benefits every consumer usingistream::read()orStreamCopieron anyBufferedStreamBuf-derived stream:UnbufferedStreamBuf::xsgetnexception safetyFixes
StreamCopier::copyToStringreturning empty when reading throughBase64Decoderon streams that throw mid-read (e.g., multipart without final boundary). Previously, an exception duringxsgetndiscarded all bytes read so far.HTTP server impact
For HTTP multipart form uploads (
HTMLForm::readMultipart):read()-based consumer on HTTPFixedLengthStreamChanges
Fix #5288: MailMessage multipart parsing timeout
makeMultipart()— eliminates O(parts x content_type_params) re-parsingMailMessage::MAX_PARTS(100,000) limit as DoS protectionistr.ignore()Fix #4118: Content-Length bulk-read optimization
MultipartReader::nextPart()extracts Content-Length from part headers (with 32-bit overflow protection)readContent()usessgetn()for bulk reading when Content-Length is knownSTREAM_BUFFER_SIZEfrom 1 KB to 16 KBBulk-read boundary scanning (no Content-Length path)
readContent()usessgetn()to bulk-read into buffer, thenscanForBoundary()searches in-bufferPrependStreamBuf(reusable, zero-copystring_viewprefix)std::istreamwrapper avoids per-part iostream construction overhead_readAheadasvector<char>withstd::movebetween parts eliminates copiesBufferedStreamBuf::xsgetn()bulk-read overridechar_traits::copy+setgunderflow()for consistent behaviorUnbufferedStreamBuf::xsgetn()exception safetyStreamCopier::copyToStringthroughBase64Decoderon truncated streamsC++17 modernization
MailMessage:[[nodiscard]],= delete, structured bindings,std::none_of, if-with-initializerMultipartReader:= delete, explicit!= nullptr,static constexpr,enum classPartHandler:= defaultconstructor/destructor,= deletecopyPatternMatcher,PrependStreamBuf: implementation-private classes in .cpp,string_view,[[nodiscard]]Test plan