[RNE Rewrite] Add text and image embeddings pipelines by msluszniak · Pull Request #1292 · software-mansion/react-native-executorch

msluszniak · 2026-06-30T10:56:04Z

Description

Adds text and image embeddings pipelines to the new architecture, achieving parity with the old flow. Embeddings are pure-TypeScript tasks (pooling + L2-norm stay baked into the .pte): text tokenizes and runs forward; image reuses the existing image preprocessor. To run the existing int64-input embedding models unchanged, this adds an int64/Long tensor dtype to the core (the tensor data path is byte-oriented, so it is a small dtype.{h,cpp} + tensor.ts change).

Text inputs are fed at their exact token length (no padding). model.execute validates dynamically-shaped forward inputs against the [min, max, step] bounds exposed by an optional get_dynamic_dims method; models without it keep exact per-dimension validation. This fixes scale-sensitive pooling heads (e.g. DistilUSE's tanh projection), which padding otherwise corrupts.
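The bounds check described above can be sketched in TypeScript to make the contract concrete. This is only an illustration: the real validation lives in the C++ core, the names `DimBounds`, `dimAllowed` and `validateShape` are hypothetical, and the assumption that `step` counts from `min` is mine rather than something the PR states.

```typescript
// Hypothetical model of the [min, max, step] check; not the library's API.
type DimBounds = readonly [number, number, number]; // [min, max, step]

function dimAllowed(size: number, bounds: DimBounds): boolean {
  const [min, max, step] = bounds;
  if (!Number.isInteger(size) || size < min || size > max) return false;
  // Assumption: allowed sizes are min, min + step, min + 2 * step, ...
  return step <= 1 || (size - min) % step === 0;
}

function validateShape(
  shape: readonly number[],
  bounds: readonly DimBounds[],
): void {
  if (shape.length !== bounds.length) {
    throw new Error(
      `rank mismatch: got ${shape.length}, expected ${bounds.length}`,
    );
  }
  bounds.forEach((b, d) => {
    const size = shape[d];
    if (size === undefined || !dimAllowed(size, b)) {
      throw new Error(
        `dim ${d}: size ${size} not allowed by [${b[0]}, ${b[1]}, ${b[2]}]`,
      );
    }
  });
}

// A [1, seqLen] token tensor whose second dimension may vary from 1 to 128.
const tokenBounds: DimBounds[] = [
  [1, 1, 1],
  [1, 128, 1],
];
validateShape([1, 17], tokenBounds); // accepted without padding
```

A model without `get_dynamic_dims` corresponds to every dimension having `min === max`, which reduces to the exact per-dimension check.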

Includes createTextEmbeddings / createImageEmbeddings tasks, useTextEmbeddings / useImageEmbeddings hooks, models.textEmbeddings / models.imageEmbeddings registry entries, an interactive text-embeddings demo in apps/nlp, and a CLIP zero-shot image-embeddings demo in apps/computer-vision.

Introduces a breaking change?

Yes
No

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

nlp app → Text Embeddings: seeds a sentence library; type a query and tap Find similar to rank by cosine similarity, and switch models via the chips. Verified on a physical Android device (arm64): all-MiniLM-L6-v2 returns 384-dim L2-normalized embeddings (~25 ms/forward on XNNPACK); DistilUSE ranks correctly with a wide similarity spread (previously compressed by padding).
computer-vision app → Image Embeddings: pick an image and rank editable text labels via CLIP zero-shot (image vs. text embeddings). Verified on device.

Screenshots

Related issues

#1247

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

DistilUSE and CLIP (text) are re-exported with the get_dynamic_dims method and pinned to v0.10.0; the remaining text-embedding models (all-MiniLM-L6-v2, all-mpnet-base-v2, multi-qa MiniLM/MPNet, paraphrase-ML) still need re-export to v0.10.0.

Add int64/Long tensor dtype support and text/image embeddings tasks, hooks, and model registry entries, plus an interactive text-embeddings demo screen in apps/nlp. Closes #1247

model.execute now validates dynamically-shaped forward inputs against the model-declared [min, max, step] bounds exposed by an optional get_dynamic_dims method, instead of requiring an exact shape match; models without it keep exact per-dimension validation. Text embeddings feed the exact token length with no padding, which fixes scale-sensitive pooling heads (e.g. DistilUSE's tanh projection). Point DistilUSE at v0.10.0 (re-exported with get_dynamic_dims).

…mbeddings demo

- Simplify text-embeddings cosine to a dot product (all models L2-normalize) and drop redundant inline comments.
- Move the get_dynamic_dims / input-validation contract into the ModelHostObject class docs; trim the inline narration in model.cpp.
- Add an Image Embeddings example to the computer-vision app: pick two images and compare their CLIP embeddings by cosine similarity.
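The cosine-to-dot-product simplification relies on every embedding having unit L2 norm, in which case the cosine denominator is 1. A minimal sketch of ranking a sentence library this way follows; `dot`, `rankBySimilarity` and the `LibraryEntry` shape are hypothetical names for illustration, not the demo's actual code.

```typescript
// Hypothetical helpers; assumes all embeddings are already L2-normalized,
// so the dot product equals the cosine similarity.
interface LibraryEntry {
  text: string;
  embedding: Float32Array;
}

function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i]! * b[i]!;
  }
  return sum;
}

function rankBySimilarity(
  query: Float32Array,
  library: readonly LibraryEntry[],
): { text: string; score: number }[] {
  return library
    .map(({ text, embedding }) => ({ text, score: dot(query, embedding) }))
    .sort((x, y) => y.score - x.score); // highest similarity first
}

// Toy 2-dimensional unit vectors standing in for real embeddings.
const ranked = rankBySimilarity(new Float32Array([1, 0]), [
  { text: "orthogonal", embedding: new Float32Array([0, 1]) },
  { text: "identical", embedding: new Float32Array([1, 0]) },
  { text: "close", embedding: new Float32Array([0.6, 0.8]) },
]);
console.log(ranked.map((r) => r.text));
```

If a model did not normalize its output, the scores would no longer be bounded by 1 and the ranking could differ from true cosine ranking, which is why the simplification depends on the L2-norm being baked into the exported model.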

Rework the computer-vision Image Embeddings screen (based on main's CLIP demo): pick an image and rank editable text labels by CLIP image/text embedding similarity, instead of the uninformative two-image score. Pads the scroll content past the Android nav bar. Point CLIP text + image at v0.10.0 (text re-exported with get_dynamic_dims; image unchanged) and declare the textEmbeddings feature in the app.

- model.{h,cpp}: read get_dynamic_dims once per model and cache it instead of re-executing the method on every forward() call; reject a present-but-malformed declaration (wrong dtype/rank/shape, bad min/max/step, or row count not matching forward's tensor input dims) with an explicit error instead of silently falling back to exact validation.
- textEmbeddings: throw a clear error when input tokenizes to zero tokens (was BigInt(undefined)); fix docstring to match no-padding behavior.
- useTextEmbeddings: expose localPath/tokenizerPath like sibling hooks.
- computer-vision: extract shared skImageToBuffer helper, dedup from classification and imageEmbeddings screens.

Rebase onto rne-rewrite adopted #1296's rewritten model.cpp, which delegates tensor dtype/shape checks to tensor::fromJs and already supports RangeDim [min, max, step] bounds. Re-implement variable-length forward inputs on top of it: parse get_dynamic_dims once per method into cached bounds, build a SymbolicShape of RangeDims, and pass it to fromJs. Statically shaped methods keep exact validation.
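The "parse once, cache per method" step can be modelled in TypeScript as follows. This is an illustrative sketch of the logic only, not the C++ in model.cpp: `RangeDim` here is a plain object, `buildSymbolicShapes` is a hypothetical name, and the exact rules for rejecting malformed rows (non-negative `min`, `max >= min`, non-negative `step`) are assumptions.

```typescript
// Illustrative TypeScript model of the caching logic; names are hypothetical.
type BoundsRow = readonly [number, number, number]; // [min, max, step]

interface RangeDim {
  min: number;
  max: number;
  step: number;
}

// Splits the flat list of rows into one shape per tensor input, by rank.
function buildSymbolicShapes(
  rows: readonly BoundsRow[],
  inputRanks: readonly number[],
): RangeDim[][] {
  const total = inputRanks.reduce((acc, rank) => acc + rank, 0);
  if (rows.length !== total) {
    throw new Error(
      `get_dynamic_dims declares ${rows.length} rows, inputs need ${total}`,
    );
  }
  const shapes: RangeDim[][] = [];
  let offset = 0;
  for (const rank of inputRanks) {
    const shape = rows.slice(offset, offset + rank).map(([min, max, step]) => {
      // Assumed sanity rules for a well-formed row.
      if (min < 0 || max < min || step < 0) {
        throw new Error(`malformed bounds row [${min}, ${max}, ${step}]`);
      }
      return { min, max, step: Math.max(step, 1) };
    });
    shapes.push(shape);
    offset += rank;
  }
  return shapes;
}

// Built once at model load and then looked up on every execute call.
const shapeCache = new Map<string, RangeDim[][]>();
shapeCache.set(
  "forward",
  buildSymbolicShapes(
    [
      [1, 1, 1],
      [1, 128, 1], // token ids: [1, seqLen]
      [1, 1, 1],
      [1, 128, 1], // attention mask: [1, seqLen]
    ],
    [2, 2],
  ),
);
```

Methods absent from the cache would fall back to exact validation against their static sizes.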

Align the text/image embeddings tasks with the add-task-pipeline skill (and every other task): allocate the static output tensor in a `[...] as const` array, destructure it, and dispose via `tensors.forEach`.

barhanc

Checked only lib implementation. I will take a look at app code on Monday.

Regarding core/ dynamic input support changes, I've added some suggestions that imo make the code more future-proof and easier to change.

Regarding the TS side, everything is fine, just some small nits. I was wondering, though, whether we should take the opportunity of this refactor to beef up the TS embeddings side a bit more, e.g. by unifying image and text embeddings into one pipeline that, on top of exposing a simple embed method, would also expose methods implementing a small vector search data structure, like insert, clear, find, etc. I don't know how much effort that would be, so it's your call. The current implementation is correct.
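To give a sense of the scope of such a vector search structure, here is a minimal in-memory sketch with `insert`, `clear` and `find`. It is purely hypothetical (the class name `EmbeddingIndex` and its signatures are not part of this PR), uses a brute-force scan, and assumes L2-normalized embeddings so that the dot product serves as the similarity score.

```typescript
// Hypothetical brute-force index; not part of the PR or the library.
interface Hit<T> {
  item: T;
  score: number;
}

class EmbeddingIndex<T> {
  private readonly dim: number;
  private entries: { item: T; embedding: Float32Array }[] = [];

  constructor(dim: number) {
    this.dim = dim;
  }

  get size(): number {
    return this.entries.length;
  }

  insert(item: T, embedding: Float32Array): void {
    if (embedding.length !== this.dim) {
      throw new Error(`expected ${this.dim} dims, got ${embedding.length}`);
    }
    this.entries.push({ item, embedding });
  }

  clear(): void {
    this.entries = [];
  }

  // Returns the k entries with the highest dot product against the query.
  find(query: Float32Array, k = 5): Hit<T>[] {
    if (query.length !== this.dim) {
      throw new Error(`expected ${this.dim} dims, got ${query.length}`);
    }
    return this.entries
      .map(({ item, embedding }) => {
        let score = 0;
        for (let i = 0; i < this.dim; i++) {
          score += query[i]! * embedding[i]!;
        }
        return { item, score };
      })
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

const index = new EmbeddingIndex<string>(2);
index.insert("a", new Float32Array([1, 0]));
index.insert("b", new Float32Array([0, 1]));
const best = index.find(new Float32Array([1, 0]), 1);
console.log(best);
```

A linear scan like this is usually adequate for the few hundred entries a demo screen holds; anything larger would call for a proper approximate nearest-neighbour structure.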

barhanc · 2026-07-03T20:19:44Z

+
+                    std::shared_ptr<TensorHostObject> tensorHostObject;
+                    if (dynamicInputBounds.empty()) {
+                        tensorHostObject = tensor::fromJs(rt, ctx, val, expectedDtype, tensorMeta.sizes());
+                    } else {
+                        // Map bounds by the method-declared rank so mapping is
+                        // independent of the caller-supplied shape.
+                        const auto rank = tensorMeta.sizes().size();
+                        if (boundsOffset + rank > dynamicInputBounds.size()) {
+                            throw jsi::JSError(rt, std::format("execute: get_dynamic_dims declares fewer "
+                                                               "dimensions ({}) than forward's tensor "
+                                                               "inputs require",
+                                                               dynamicInputBounds.size()));
+                        }
+                        tensor::SymbolicShape expectedShape;
+                        expectedShape.reserve(rank);
+                        for (size_t d = 0; d < rank; ++d) {
+                            const auto &row = dynamicInputBounds[boundsOffset + d];
+                            tensor::RangeDim rangeDim;
+                            rangeDim.min = static_cast<int32_t>(row[0]);
+                            rangeDim.max = static_cast<int32_t>(row[1]);
+                            if (row[2] > 1) {
+                                rangeDim.step = static_cast<int32_t>(row[2]);
+                            }
+                            expectedShape.emplace_back(rangeDim);
+                        }
+                        boundsOffset += rank;
+                        tensorHostObject = tensor::fromJs(rt, ctx, val, expectedDtype,
+                                                          std::optional<tensor::SymbolicShape>(std::move(expectedShape)));
+                    }


Suggested change

-                    std::shared_ptr<TensorHostObject> tensorHostObject;
-                    if (dynamicInputBounds.empty()) {
-                        tensorHostObject = tensor::fromJs(rt, ctx, val, expectedDtype, tensorMeta.sizes());
-                    } else {
-                        // Map bounds by the method-declared rank so mapping is
-                        // independent of the caller-supplied shape.
-                        const auto rank = tensorMeta.sizes().size();
-                        if (boundsOffset + rank > dynamicInputBounds.size()) {
-                            throw jsi::JSError(rt, std::format("execute: get_dynamic_dims declares fewer "
-                                                               "dimensions ({}) than forward's tensor "
-                                                               "inputs require",
-                                                               dynamicInputBounds.size()));
-                        }
-                        tensor::SymbolicShape expectedShape;
-                        expectedShape.reserve(rank);
-                        for (size_t d = 0; d < rank; ++d) {
-                            const auto &row = dynamicInputBounds[boundsOffset + d];
-                            tensor::RangeDim rangeDim;
-                            rangeDim.min = static_cast<int32_t>(row[0]);
-                            rangeDim.max = static_cast<int32_t>(row[1]);
-                            if (row[2] > 1) {
-                                rangeDim.step = static_cast<int32_t>(row[2]);
-                            }
-                            expectedShape.emplace_back(rangeDim);
-                        }
-                        boundsOffset += rank;
-                        tensorHostObject = tensor::fromJs(rt, ctx, val, expectedDtype,
-                                                          std::optional<tensor::SymbolicShape>(std::move(expectedShape)));
-                    }
+                    std::shared_ptr<TensorHostObject> tensorHostObject;
+                    if (self->dynamicInputShapes_.contains(methodName)) {
+                        auto expectedShape = self->dynamicInputShapes_[methodName][i];
+                        tensorHostObject = tensor::fromJs(rt, ctx, val, expectedDtype, expectedShape);
+                    } else {
+                        tensorHostObject = tensor::fromJs(rt, ctx, val, expectedDtype, tensorMeta.sizes());
+                    }

We want to minimize the changes to the execute method required for dynamic input validation so that the code is future-proof, e.g. when ExecuTorch adds native support or when we want to change how the dynamic shapes are parsed.

barhanc · 2026-07-03T20:20:26Z

+            if (!dynamicInputBounds.empty() && boundsOffset != dynamicInputBounds.size()) {
+                throw jsi::JSError(rt, std::format("execute: get_dynamic_dims declares more dimensions ({}) "
+                                                   "than forward's tensor inputs use ({})",
+                                                   dynamicInputBounds.size(), boundsOffset));
+            }
+


Suggested change

-            if (!dynamicInputBounds.empty() && boundsOffset != dynamicInputBounds.size()) {
-                throw jsi::JSError(rt, std::format("execute: get_dynamic_dims declares more dimensions ({}) "
-                                                   "than forward's tensor inputs use ({})",
-                                                   dynamicInputBounds.size(), boundsOffset));
-            }

barhanc · 2026-07-03T20:21:06Z

+            // Per-dimension [min, max, step] bounds parsed from get_dynamic_dims
+            // at construction. Absent for statically shaped methods, which then
+            // validate exactly.
+            const std::vector<std::array<int64_t, 3>> noBounds;
+            auto boundsIt = self->dynamicInputBounds_.find(methodName);
+            const auto &dynamicInputBounds =
+                boundsIt != self->dynamicInputBounds_.end() ? boundsIt->second : noBounds;
+            size_t boundsOffset = 0;
+


Suggested change

-            // Per-dimension [min, max, step] bounds parsed from get_dynamic_dims
-            // at construction. Absent for statically shaped methods, which then
-            // validate exactly.
-            const std::vector<std::array<int64_t, 3>> noBounds;
-            auto boundsIt = self->dynamicInputBounds_.find(methodName);
-            const auto &dynamicInputBounds =
-                boundsIt != self->dynamicInputBounds_.end() ? boundsIt->second : noBounds;
-            size_t boundsOffset = 0;

barhanc · 2026-07-03T20:23:07Z

   * Writes data from a typed array into this tensor's native buffer.
   * @param src The source typed array. Its size in bytes must match the
-   * tensor's size.
+   * tensor's size. Use a `BigInt64Array` for `int64` tensors.


Suggested change

-   * tensor's size. Use a `BigInt64Array` for `int64` tensors.
+   * tensor's size.

barhanc · 2026-07-03T20:24:16Z

    std::unique_ptr<executorch::extension::Module> etModule_;
    std::mutex mutex_;
+
+    std::unordered_map<std::string, std::vector<std::array<int64_t, 3>>> dynamicInputBounds_;


Suggested change

-    std::unordered_map<std::string, std::vector<std::array<int64_t, 3>>> dynamicInputBounds_;
+    std::unordered_map<std::string, std::vector<tensor::SymbolicShape>> dynamicInputShapes_;

Let's use the tensor::SymbolicShape directly so that we don't have to build it on every execute call and can simplify the execute code.

barhanc · 2026-07-03T20:43:29Z

+   * @param input The input text to embed.
+   * @returns A promise resolving to the embedding vector.
+   */
+  forward: (input: string) => Promise<Float32Array>;


Same as in image embeddings, a more descriptive name would be better imo.

barhanc · 2026-07-03T20:45:25Z

+    const tokenIds = tensor('int64', [1, len], idsData);
+    const attentionMask = tensor('int64', [1, len], maskData);


Please use the t<Name> naming convention for tensor variables.

barhanc · 2026-07-03T20:47:28Z

+ * @returns A promise resolving to an object containing the embedding and
+ * disposal controls.
+ */
+export async function createTextEmbeddings(


Probably should be named ...Embedder to match other tasks. Same with file name, perhaps ...Embedding.ts (no 's') would be more consistent.

barhanc · 2026-07-03T20:51:11Z

+    if (ids.length === 0) {
+      throw new Error('createTextEmbeddings: input tokenized to zero tokens');
+    }
+    const len = Math.min(ids.length, maxSeqLen);


Worth documenting the truncating behaviour on long inputs.
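For reference, the truncation in question can be illustrated with a small standalone sketch: tokens beyond `maxSeqLen` are dropped and the `int64` inputs are built at the resulting exact length. The helper name `buildInt64Inputs` and its return shape are hypothetical; only the zero-token guard, the `Math.min` truncation, and the use of `BigInt64Array` for `int64` tensors come from the diff.

```typescript
// Hypothetical helper mirroring the truncation step shown in the diff.
function buildInt64Inputs(ids: readonly number[], maxSeqLen: number) {
  if (ids.length === 0) {
    throw new Error("input tokenized to zero tokens");
  }
  // Inputs longer than maxSeqLen are truncated; shorter ones are not padded.
  const len = Math.min(ids.length, maxSeqLen);
  const idsData = new BigInt64Array(len);
  for (let i = 0; i < len; i++) {
    idsData[i] = BigInt(ids[i]!);
  }
  // Every kept token is a real token, so the attention mask is all ones.
  const maskData = new BigInt64Array(len).fill(BigInt(1));
  return { len, idsData, maskData, truncated: ids.length > maxSeqLen };
}

// Four token ids with a limit of three: the last id is dropped.
const inputs = buildInt64Inputs([101, 7592, 2088, 102], 3);
console.log(inputs.len, inputs.truncated);
```

Note that plain tail truncation can also drop a trailing special token (the last id in the example above), which is one more reason the behaviour deserves a line in the docs.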

barhanc · 2026-07-03T20:52:24Z

 export * from './tasks/tokenization';
+export * from './tasks/textEmbeddings';


Tasks shouldn't be explicitly exported from /extensions/<domain>/index.ts.

msluszniak self-assigned this Jun 30, 2026

msluszniak added the refactoring label Jun 30, 2026

msluszniak linked an issue Jun 30, 2026 that may be closed by this pull request

[RNE Rewrite] Add image and text embeddings pipelines #1247

Open

msluszniak added the feature PRs that implement a new feature label Jun 30, 2026

msluszniak commented Jul 1, 2026


msluszniak marked this pull request as ready for review July 1, 2026 16:07

msluszniak requested a review from barhanc July 1, 2026 16:07

msluszniak added 6 commits July 3, 2026 15:59

[RNE Rewrite] Add text and image embeddings pipelines

df762d6


fix(computer-vision): add JSDoc @param/@returns for skImageToBuffer

0fa2742

msluszniak marked this pull request as draft July 3, 2026 14:24

msluszniak force-pushed the @ms/add-embeddings branch from 8e0f200 to ede040e Compare July 3, 2026 14:25

msluszniak commented Jul 3, 2026


Comment thread packages/react-native-executorch/cpp/core/model.cpp Outdated

msluszniak force-pushed the @ms/add-embeddings branch from ede040e to 6c3ccc4 Compare July 3, 2026 14:37

msluszniak marked this pull request as ready for review July 3, 2026 14:37

refactor(embeddings): pre-allocate static tensors via as const array

da219cd


barhanc reviewed Jul 3, 2026

