Skip to content

[v22.x] src: clamp WriteUtf8 capacity to INT_MAX in EncodeInto#62621

Open
semimikoh wants to merge 2 commits intonodejs:v22.x-stagingfrom
semimikoh:fix-encodeinto-intmax
Open

[v22.x] src: clamp WriteUtf8 capacity to INT_MAX in EncodeInto#62621
semimikoh wants to merge 2 commits intonodejs:v22.x-stagingfrom
semimikoh:fix-encodeinto-intmax

Conversation

@semimikoh
Copy link
Copy Markdown
Contributor

Backport notice

This is a v22.x-only patch, not a cherry-pick. The bug it fixes was
incidentally resolved on main / v24.x by #58070, which migrated
EncodeInto from v8::String::WriteUtf8 to WriteUtf8V2. The
intent of #58070 was deprecation cleanup, not bug fixing.

WriteUtf8V2 is not available in v22.x's bundled V8 version, so
backporting #58070 in full is not feasible. This PR instead applies
a minimal scoped fix to EncodeInto only.

Problem

In TextEncoder.encodeInto, the destination buffer's byte length
is read as size_t but then implicitly narrowed to int when
passed as the capacity argument to v8::String::WriteUtf8. When
the destination view is larger than INT_MAX (2,147,483,647
bytes), the narrowing conversion underflows to a negative value.
V8 treats this as "no capacity available" and writes 0 bytes,
returning { read: 0, written: 0 } even though the buffer has
ample space.

Reproduction on v22.x (from #62610):

```js
const _TE = new TextEncoder();
const s = "aÿ我𝑒";
const u8a = new Uint8Array(2429682061); // ~2.4 GB
const offset = 38928786;

_TE.encodeInto(s, u8a.subarray(offset));
// v22.x: { read: 0, written: 0 } ← bug
// v24.x: { read: 5, written: 10 } ← correct

_TE.encodeInto(s, u8a.subarray(offset, offset + 10));
// Both versions: { read: 5, written: 10 }
// (works because the bounded view is only 10 bytes, well within int range)
```

This is a silent data loss bug — no error is thrown, the caller
just gets a successful-looking return value with zero bytes written.

Fix

Clamp the capacity passed to WriteUtf8 to INT_MAX in
EncodeInto. The source string in encodeInto is bounded in
practice and never requires more than INT_MAX bytes to encode;
only the destination view length can exceed INT_MAX.

Testing

Built locally on v22.x-staging (v22.22.2-pre) and verified:

  • ✅ Original reproduction from TextEncoder.encodeInto NOT work on BIG utf8 subarray ON 22.4.1 #62610 now returns { read: 5, written: 10 } for both unbounded and bounded subarrays
  • ✅ New pummel regression test (test/pummel/test-whatwg-encoding-encodeinto-large.js) passes with the patched binary. Follows existing pummel conventions: skips on 32-bit architectures via common.skipIf32Bits(), skips on insufficient os.totalmem(), and skips on allocation failure
  • test/parallel/test-whatwg-encoding*, test-textencoder*, test-textdecoder* — 11 tests pass
  • test/parallel/test-buffer-*, test-string-decoder* — 69 tests pass
  • test/pummel/test-whatwg-encoding* — passes
  • cpplint.py on the modified file — clean
  • make lint-js — clean

Refs: #58070
Fixes: #62610

In TextEncoder.encodeInto, the destination buffer's byte length is
read as a size_t but then implicitly narrowed to int when passed as
the capacity argument to v8::String::WriteUtf8. When the destination
view is larger than INT_MAX (2,147,483,647 bytes), the narrowing
conversion underflows to a negative value, V8 treats it as "no
capacity", and writes 0 bytes - returning { read: 0, written: 0 }
even though the buffer has plenty of room.

Clamp the capacity to INT_MAX before passing it to WriteUtf8. This
is sufficient because the source string in encodeInto is bounded in
practice and never requires more than INT_MAX bytes to encode; only
the destination view length can exceed INT_MAX.

This issue is already fixed on main and v24.x as a side effect of
PR nodejs#58070, which migrated to the non-deprecated WriteUtf8V2 method
whose capacity parameter is size_t. WriteUtf8V2 is not available in
v22.x's V8 version, so this minimal patch fixes only the EncodeInto
path instead of backporting the full migration.

Refs: nodejs#58070
Fixes: nodejs#62610
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. v22.x Issues that can be reproduced on v22.x or PRs targeting the v22.x-staging branch. labels Apr 7, 2026
@Renegade334
Copy link
Copy Markdown
Member

Please also ensure your commits are signed off.

@Renegade334 Renegade334 changed the title [v22.x backport] src: clamp WriteUtf8 capacity to INT_MAX in EncodeInto [v22.x] src: clamp WriteUtf8 capacity to INT_MAX in EncodeInto Apr 7, 2026
Signed-off-by: semimikoh <ejffjeosms@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. v22.x Issues that can be reproduced on v22.x or PRs targeting the v22.x-staging branch.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants