fix: encode lone UTF-16 surrogates as U+FFFD to match native bcrypt by spokodev · Pull Request #171 · dcodeIO/bcrypt.js

spokodev · 2026-06-25T19:22:29Z

Problem

The README states bcryptjs is "Compatible to the C++ bcrypt binding". For any password containing an unpaired UTF-16 surrogate, it is not: bcryptjs computes a different hash than native bcrypt, so a hash produced by one cannot be verified by the other.

utf8Array's 3-byte else branch encodes a lone surrogate as raw WTF-8. Every standards-compliant UTF-8 encoder instead substitutes the U+FFFD replacement character:

[...Buffer.from("\uD800", "utf8")]            // [0xEF, 0xBF, 0xBD]  (U+FFFD)
[...new TextEncoder().encode("\uD800")]       // [0xEF, 0xBF, 0xBD]  (U+FFFD)
// bcryptjs utf8Array("\uD800")               // [0xED, 0xA0, 0x80]  (WTF-8)

Because the bytes fed to the KDF differ, the hashes do not match the native binding:

const bcrypt = require("bcrypt");      // native C++ binding
const bcryptjs = require("bcryptjs");
const salt = "$2b$06$DCq7YPn5Rq63x1Lad4cll.";

bcrypt.hashSync("\uD800", salt);
// $2b$06$DCq7YPn5Rq63x1Lad4cll.nt6EZli4Z4eGj7C9JUnk.i4ERs/EVsq
bcryptjs.hashSync("\uD800", salt);
// $2b$06$DCq7YPn5Rq63x1Lad4cll.AEJkiq9W7zpkNhET9G4TKw3czJdIZTu   (mismatch)

The native binding maps every lone surrogate to the same U+FFFD bytes, exactly as Buffer.from(str, "utf8") does. bcryptjs is the only one producing distinct, non-interoperable output here.

Root cause

In utf8Array, a high+low surrogate pair is decoded correctly. Anything else with charCodeAt >= 2048 falls through to the final else, which 3-byte-encodes the raw code unit. For a code unit in 0xD800-0xDFFF that is not part of a valid pair, that produces WTF-8 rather than U+FFFD.

Fix

Before the generic 3-byte branch, detect a code unit in the surrogate range ((c1 & 0xf800) === 0xd800) that was not consumed as a valid pair, and emit 0xEF 0xBF 0xBD (U+FFFD), matching Buffer.from(str, "utf8") and the native binding. utf8Length already counts a lone surrogate as 3 bytes, so buffer sizing is unchanged.

Backwards-compatibility note

This changes the hash bcryptjs computes for the (rare, malformed) lone-surrogate password class. A previously-stored bcryptjs hash of such a password would no longer verify against the new output. The affected inputs were already non-interoperable with every other bcrypt implementation, so this aligns bcryptjs with the standard rather than breaking a working contract, but it is a behavior change worth your judgement.

Verification

Added a compat_lone_surrogate unit test in the same style as compat_hash, asserting bcryptjs matches the native bcrypt binding for five lone-surrogate cases (high alone, low alone, emoji high half, surrounded by ASCII, low-then-high). It fails on the current code and passes with the fix. The full npm test suite (unit + the four TypeScript projects) is green, and compat_hash still passes, confirming valid UTF-8, 8-bit, NUL, and 72-byte-truncation cases are unaffected.

A password containing an unpaired UTF-16 surrogate was encoded as WTF-8 (e.g. ED A0 80 for U+D800) by utf8Array, whereas Buffer.from(str, "utf8"), TextEncoder, and the native C++ bcrypt binding all map a lone surrogate to the U+FFFD replacement character (EF BF BD). The differing key bytes made bcryptjs hashes of such passwords non-verifiable across implementations, contrary to the README's "Compatible to the C++ bcrypt binding" claim. Emit EF BF BD for any code unit in the surrogate range that is not part of a valid pair. utf8Length already counts a lone surrogate as 3 bytes, so the buffer sizing is unchanged.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: encode lone UTF-16 surrogates as U+FFFD to match native bcrypt#171

fix: encode lone UTF-16 surrogates as U+FFFD to match native bcrypt#171
spokodev wants to merge 1 commit into
dcodeIO:mainfrom
spokodev:fix/utf8-lone-surrogate-replacement

spokodev commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

spokodev commented Jun 25, 2026

Problem

Root cause

Fix

Backwards-compatibility note

Verification

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant