feat(aicore/fallback): add opt-in model fallback for Orchestration v2 #185

Open
lenin-ribeiro wants to merge 4 commits into feat/orchestration-filtering from feat/aicore-model-fallback

@lenin-ribeiro lenin-ribeiro commented Jun 24, 2026

Disclaimer: Do not include SAP-internal or customer-specific information in this PR (e.g. internal system URLs, customer names, tenant IDs, or confidential configurations). This is a public repository.

Stacked on feat/orchestration-filtering. This branch builds on the filtering PR. Open against that branch as the base to keep the diff focused; retarget to main once filtering merges.

Description

Adds opt-in model fallback for SAP AI Core Orchestration v2 to the sap_cloud_sdk.aicore module.

Orchestration v2 supports preference-ordered fallback module configurations: when the primary call fails, the server transparently retries with the next preference. For non-streaming calls the triggers are a model unsupported in the region, HTTP 429, HTTP 408, or any 5xx; for streaming calls only the unsupported-model case triggers fallback. The underlying litellm SAP provider already builds body["config"]["modules"] as a list when fallback_sap_modules is present in optional_params; what was missing was an ergonomic SDK-side surface and response-side visibility into which preferences were skipped.

This PR introduces:

  • FallbackModel, FallbackConfig — typed dataclasses for declaring per-preference model + params + version.
  • set_fallbacks(config) — single entry point mirroring set_filtering(). Fallback is opt-in: set_aicore_config() does NOT activate it. Developers either call set_fallbacks(...) programmatically or set AICORE_FALLBACK_ENABLED=true (with AICORE_FALLBACK_MODELS or AICORE_FALLBACK_CONFIG) and call set_fallbacks() with no args.
  • response.intermediate_failures — when the fallback path fires, the per-preference failure list from the orchestration response is surfaced as an attribute on the returned ModelResponse. None when the primary succeeded, useful as a quick check.
  • Patch composition — the existing FilteringOrchestrationConfig is renamed to OrchestrationPatchConfig (alias kept for back-compat) and now owns both filtering and fallback. One install/uninstall path, no ordering issues.
  • Filtering broadcast across fallbacks — when both filtering and fallback are active, the filtering config is applied to every module entry on the wire (previously modules[0] only). Consistent SDK-side default; if a fallback should run unfiltered, the developer can call disable_filtering() before the call.
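
To make the shape of the typed configuration concrete, here is a minimal sketch of what the two dataclasses might look like. It is not the code in this PR: the field names `params` and `version`, the `to_module_list()` helper, and the empty-list validation are assumptions chosen for illustration. Only the class names, the `model=` keyword, the positional list argument, and the existence of `FallbackModel.to_dict()` come from this description.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FallbackModel:
    """One fallback preference. Field names other than `model` are assumptions."""

    model: str
    params: dict[str, Any] = field(default_factory=dict)
    version: str | None = None

    def to_dict(self) -> dict[str, Any]:
        # Copy params so callers cannot mutate the config through the result.
        entry: dict[str, Any] = {"model": self.model, "params": dict(self.params)}
        if self.version is not None:
            entry["version"] = self.version
        return entry


@dataclass(frozen=True)
class FallbackConfig:
    """Ordered fallback preferences, tried in order after the primary model."""

    models: list[FallbackModel]

    def __post_init__(self) -> None:
        # Assumed validation: an empty preference list is treated as a mistake.
        if not self.models:
            raise ValueError("FallbackConfig needs at least one FallbackModel")

    def to_module_list(self) -> list[dict[str, Any]]:
        # Hypothetical helper: one dict per preference, in declaration order.
        return [m.to_dict() for m in self.models]
```

Keeping the preferences in a plain ordered list mirrors the preference-ordered semantics described above: the position in the list is the retry order.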

Related Issue

N/A — additive feature, no issue tracked

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Code refactoring

How to Test

Unit tests (no live credentials required)

uv run python -m pytest tests/aicore -v

Expect 142 passed, 8 skipped (the 8 skips are integration scenarios waiting for live env vars).

Integration tests (requires AI Core access)

  1. Copy .env_integration_tests.example to .env_integration_tests and fill in the AI Core creds.
  2. Set the new fallback secrets:
    AICORE_FALLBACK_TEST_PRIMARY_MODEL=sap/this-model-does-not-exist
    AICORE_FALLBACK_TEST_FALLBACK_MODEL=sap/mistralai--mistral-small-instruct
    The primary should be a model name the orchestration server reports as unsupported in your region — a genuinely nonexistent name works and avoids depending on transient 5xx errors.
  3. Run:
    uv run python -m pytest tests/aicore/integration/test_fallback_bdd.py -v
  4. Expected: 4 scenarios pass (primary succeeds, primary unsupported → fallback used, filtering + fallback composition, streaming + fallback).

Manual smoke

from sap_cloud_sdk.aicore import (
    FallbackConfig, FallbackModel, set_aicore_config, set_fallbacks,
)
from litellm import completion

set_aicore_config()
set_fallbacks(FallbackConfig([
    FallbackModel(model="sap/mistralai--mistral-small-instruct"),
]))

response = completion(
    model="sap/gpt-4o",  # or any potentially-unsupported model
    messages=[{"role": "user", "content": "Translate 'hello' to German."}],
)
print(response.choices[0].message.content)
print(getattr(response, "intermediate_failures", None))
# None if primary worked; list of failure dicts if fallback fired.
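
For callers who want to branch on whether fallback fired, the `getattr` check above can be wrapped in small helpers. This is a sketch only: `fallback_fired` and `count_skipped_preferences` are hypothetical names, not part of the SDK, and the stand-in response objects below use a placeholder failure dict because the real failure schema is not described here.

```python
from types import SimpleNamespace
from typing import Any


def fallback_fired(response: Any) -> bool:
    """True when the response carries a non-empty intermediate_failures list."""
    failures = getattr(response, "intermediate_failures", None)
    return bool(failures)


def count_skipped_preferences(response: Any) -> int:
    """Number of preferences that failed before one succeeded (0 if none)."""
    failures = getattr(response, "intermediate_failures", None)
    return len(failures) if failures else 0


# Stand-ins for ModelResponse objects; the dict content is a placeholder.
primary_ok = SimpleNamespace()
after_fallback = SimpleNamespace(intermediate_failures=[{"detail": "placeholder"}])

if fallback_fired(after_fallback):
    print(f"{count_skipped_preferences(after_fallback)} preference(s) skipped")
```

Using `getattr` with a default keeps the helpers safe for streaming responses, where the attribute is not surfaced in v1.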

Checklist

  • I have read the Contributing Guidelines
  • I have verified that my changes solve the issue
  • I have added/updated automated tests to cover my changes
  • All tests pass locally (pytest tests/aicore, ruff check, ruff format --check, ty check)
  • I have verified that my code follows the Code Guidelines
  • I have updated documentation (if applicable) — aicore/user-guide.md gains a "Model Fallback (opt-in)" section
  • I have added type hints for all public APIs
  • My code does not contain sensitive information (credentials, tokens, etc.)
  • I have followed Conventional Commits for commit messages

Breaking Changes

None for users of sap_cloud_sdk.aicore public APIs. All existing names (FilteringOrchestrationConfig, _install, etc.) remain importable via aliases.

There is one user-visible behavioural change that only affects users who use filtering AND fallback together (an impossible combination on main today since fallback didn't exist): when both are active, the filtering configuration now applies to every module entry (primary + every fallback), not just modules[0]. This is the safe-by-default semantic — to run a fallback unfiltered, explicitly disable_filtering() before that call. Documented in the user guide.
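
The broadcast semantic can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the patch in this PR: the `"filtering"` key inside a module entry, the placeholder filtering payload, and the handling of a single-dict `modules` value are all assumptions. Only the `body["config"]["modules"]` path comes from this description.

```python
from __future__ import annotations

import copy
from typing import Any


def broadcast_filtering(body: dict[str, Any], filtering: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `body` with `filtering` applied to every module entry."""
    patched = copy.deepcopy(body)
    modules = patched["config"]["modules"]
    # Assumption: without fallback, `modules` is a single dict rather than a list.
    entries = modules if isinstance(modules, list) else [modules]
    for entry in entries:
        # Each entry gets its own copy so later edits do not leak across entries.
        entry["filtering"] = copy.deepcopy(filtering)
    return patched


# Placeholder payloads; the real module and filtering schemas are richer.
request_body = {"config": {"modules": [{"name": "primary"}, {"name": "fallback-1"}]}}
filtering_cfg = {"input": {"filters": []}}
patched_body = broadcast_filtering(request_body, filtering_cfg)
```

Applying the filter to every entry is what makes the combination safe by default: a fallback model never sees unfiltered input unless filtering is disabled for the whole call.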

Additional Notes

Design choices (selected via user Q&A during planning)

  • API style — patch-based, mirrors filtering. Single set_fallbacks() entry point, global state, no disable_fallbacks() (developers either opt in via set_fallbacks() or never call it; runtime clearing is set_fallbacks(None)).
  • One subclass for both concerns. OrchestrationPatchConfig handles filtering injection AND fallback injection in one transform_request. Single install/uninstall lifecycle. Idempotent.
  • Env vars opt-in. AICORE_FALLBACK_ENABLED defaults to false (unlike filtering, which is on by default after set_aicore_config()). Two-tier schema: AICORE_FALLBACK_MODELS (comma list, simple case) + AICORE_FALLBACK_CONFIG (JSON, full per-model config; takes precedence).
  • intermediate_failures on the response object. Pydantic ModelResponse uses extra="allow", so we can attach the field directly. Accessed via getattr(response, "intermediate_failures", None).
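
The two-tier environment schema can be sketched as a small resolver. The variable names and the precedence of AICORE_FALLBACK_CONFIG over AICORE_FALLBACK_MODELS come from the list above; everything else is an assumption for illustration: the function name `load_fallback_entries`, accepting only the string "true" (case-insensitive), the JSON being a list of per-model objects, and raising when fallback is enabled but no models are given.

```python
from __future__ import annotations

import json
from typing import Any, Mapping


def load_fallback_entries(env: Mapping[str, str]) -> list[dict[str, Any]] | None:
    """Resolve fallback entries from env vars, or None when fallback is off."""
    enabled = env.get("AICORE_FALLBACK_ENABLED", "false").strip().lower() == "true"
    if not enabled:
        return None  # opt-in: nothing happens unless explicitly enabled

    raw_config = env.get("AICORE_FALLBACK_CONFIG", "").strip()
    if raw_config:
        # The JSON form takes precedence over the comma-separated list.
        parsed = json.loads(raw_config)
        if not isinstance(parsed, list):
            raise ValueError("AICORE_FALLBACK_CONFIG must be a JSON list")
        return parsed

    names = [n.strip() for n in env.get("AICORE_FALLBACK_MODELS", "").split(",")]
    names = [n for n in names if n]
    if names:
        return [{"model": name} for name in names]

    # Assumed behaviour: enabling fallback without any model is an error.
    raise ValueError("fallback enabled but no models configured")
```

Passing the environment in as a mapping, rather than reading `os.environ` directly, keeps the resolver easy to unit test; a caller would pass `os.environ` in production code.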

v1 limitations (documented)

  • intermediate_failures is surfaced for non-streaming responses only. Capturing the field from SAPStreamIterator chunks requires deeper changes to litellm internals and is deferred to a future iteration. The streaming integration test asserts that fallback still fires correctly server-side; it doesn't assert intermediate_failures.

Telemetry

Added Operation.AICORE_SET_FALLBACKS = "set_fallbacks". The set_fallbacks entry point is decorated with @record_metrics(Module.AICORE, Operation.AICORE_SET_FALLBACKS) per docs/GUIDELINES.md.

Files added / changed

NEW:
  src/sap_cloud_sdk/aicore/fallback/__init__.py
  src/sap_cloud_sdk/aicore/fallback/fallback.py
  tests/aicore/fallback/__init__.py
  tests/aicore/fallback/unit/__init__.py
  tests/aicore/fallback/unit/test_fallback_config.py
  tests/aicore/fallback/unit/test_patch.py
  tests/aicore/fallback/unit/test_set_fallbacks.py
  tests/aicore/integration/fallback.feature
  tests/aicore/integration/test_fallback_bdd.py

MODIFIED:
  src/sap_cloud_sdk/aicore/__init__.py            # export FallbackModel/FallbackConfig/set_fallbacks; docstring
  src/sap_cloud_sdk/aicore/filtering/filters.py   # OrchestrationPatchConfig rename + split _install + broadcast filtering + attach intermediate_failures
  src/sap_cloud_sdk/aicore/user-guide.md          # new "Model Fallback (opt-in)" section
  src/sap_cloud_sdk/core/telemetry/operation.py   # AICORE_SET_FALLBACKS
  tests/aicore/integration/conftest.py            # extend with fallback fixtures + clean-skip behaviour
  tests/core/unit/telemetry/test_operation.py     # expected enum count 152 → 153
  .env_integration_tests.example                  # AICORE_FALLBACK_TEST_PRIMARY_MODEL / _FALLBACK_MODEL

Add a sibling 'fallback' subpackage exposing FallbackModel, FallbackConfig
(dataclasses) and a set_fallbacks() entry point that mirrors the filtering
API style. Fallback is opt-in: set_aicore_config() does not enable it; the
developer activates it explicitly via set_fallbacks() or by setting
AICORE_FALLBACK_ENABLED=true and AICORE_FALLBACK_MODELS / _CONFIG.

The litellm SAP provider already builds modules as a list when
fallback_sap_modules is present in optional_params. The SDK now injects
that kwarg from the active FallbackConfig via the existing transport patch.

Refactor filtering/filters.py to host both concerns in a single subclass,
OrchestrationPatchConfig (FilteringOrchestrationConfig kept as alias):

- _install split into _install_filter + _install_fallback sharing
  _apply_patch(); _install retained as alias for back-compat.
- transform_request now injects fallback_sap_modules before super(), and
  BROADCASTS filtering to every module entry (was modules[0] only).
- transform_response attaches response.intermediate_failures from the body
  so callers can inspect which preferences were skipped. Non-streaming only
  in v1; streaming surfacing is deferred.

Tests:
- 35 new unit tests across test_fallback_config.py, test_patch.py and
  test_set_fallbacks.py covering dataclass shape, env parsing, patch
  injection, filtering broadcast, intermediate_failures attachment, and
  install lifecycle composition with filtering.
- New BDD fallback.feature + test_fallback_bdd.py with 4 scenarios
  (primary success, primary unsupported -> fallback used, filtering
  composition, streaming + fallback). conftest skips cleanly when
  AICORE_FALLBACK_TEST_* env vars are missing.
- Bump expected enum count for AICORE_SET_FALLBACKS.

Docs & ops:
- user-guide.md gains a Model Fallback (opt-in) section with programmatic
  and env-driven examples, composition with filtering, and the v1
  streaming limitation.
- .env_integration_tests.example documents the new
  AICORE_FALLBACK_TEST_PRIMARY_MODEL / _FALLBACK_MODEL secrets.
@lenin-ribeiro lenin-ribeiro self-assigned this Jun 24, 2026
@lenin-ribeiro lenin-ribeiro requested a review from a team as a code owner June 24, 2026 08:38
…odule entries

litellm's transform_request only builds the primary module's template from
`messages`; fallback entries get whatever was popped from their dict's
"messages" key (transformation.py:371), which is `[]` for
FallbackModel.to_dict(). The orchestration server then rejected with
"config.modules[N].prompt_templating.prompt.template should be non-empty".

Mirrors the existing filtering broadcast in the same transform_request.
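
A rough sketch of the fix described above, assuming the module list has already been built and only the fallback entries carry an empty template. The `prompt_templating.prompt.template` path is taken from the server error message; the function name, the message-list shape of the template, and the choice to fill only empty templates are assumptions.

```python
from __future__ import annotations

import copy
from typing import Any


def broadcast_prompt_template(modules: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Copy the primary module's prompt template into fallback entries lacking one."""
    patched = copy.deepcopy(modules)
    if len(patched) < 2:
        return patched  # nothing to broadcast without fallback entries
    primary_template = patched[0]["prompt_templating"]["prompt"]["template"]
    for entry in patched[1:]:
        prompt = entry.setdefault("prompt_templating", {}).setdefault("prompt", {})
        # Assumption: an existing non-empty template on a fallback is left alone.
        if not prompt.get("template"):
            prompt["template"] = copy.deepcopy(primary_template)
    return patched


# Illustrative payload: the fallback entry starts with the empty template
# that the orchestration server rejects.
example_modules = [
    {"prompt_templating": {"prompt": {"template": [{"role": "user", "content": "hi"}]}}},
    {"prompt_templating": {"prompt": {"template": []}}},
]
fixed_modules = broadcast_prompt_template(example_modules)
```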

Adds a realistic unit test (and helper) that would have caught this before
integration — the previous list-modules fixture hardcoded an empty
template on both the primary AND fallback entries, normalising the bug away.

Parity with sibling subpackages aicore/ and aicore/filtering/, both of
which already ship a py.typed marker. The parent package marker already
covers the subpackage transitively, but the one-marker-per-subpackage
convention is what docs/GUIDELINES.md prescribes.
…d filtering package

The parent branch refactored aicore/filtering/filters.py into four files
(_api.py, _models.py, _patch.py, config.py). This branch's fallback code
hooked directly into filters.py; the merge requires porting:

- src/sap_cloud_sdk/aicore/fallback/_patch.py (new): owns
  OrchestrationPatchConfig (now a subclass of FilteringOrchestrationConfig),
  _active_fallback_cfg, and _install_fallback. Keeps the same hooks as
  before: fallback_sap_modules injection, prompt-template broadcast to
  every fallback module entry, filtering broadcast across all entries
  (overriding the parent's primary-only injection), intermediate_failures
  attachment.

- src/sap_cloud_sdk/aicore/filtering/_patch.py: _install now defers to the
  installed fallback subclass when _active_fallback_cfg is set, so calling
  set_filtering() while fallback is active no longer clobbers the patch.
  Lazy import of fallback._patch avoids a circular dependency.

- src/sap_cloud_sdk/aicore/fallback/fallback.py: import _install_fallback
  from the new ._patch module instead of the deleted filtering.filters.

- tests/aicore/fallback/unit/test_patch.py + test_set_fallbacks.py: rewired
  to the new import paths. Adjusted test_patch_installed_when_only_filtering
  to assert FilteringOrchestrationConfig (not OrchestrationPatchConfig) is
  installed — filtering-only no longer uses the combined subclass under the
  refactored design.

Local verification: pytest tests/aicore → 145 passed, 8 skipped; pytest
tests (sans live-credential integration suites) → 2610 passed, 73 skipped;
ruff check + ruff format --check + ty check → all green.