feat(datafabric): standalone ontology tool grounded on OWL + R2RML #911
sankalp-uipath wants to merge 31 commits into
Pull request overview
Adds an optional `fetch_ontology` inner tool to the Data Fabric SQL sub-agent so the inner LLM can retrieve a configured ontology's OWL schema from the QueryEngine REST API and use it to generate semantically correct SQL.
Changes:
- Introduces an ontology REST client (`fetch_ontology_owl`) with name validation and size limiting.
- Adds a `fetch_ontology` leaf tool with an instance-level cache and wires it into the inner Data Fabric subgraph alongside `execute_sql`.
- Threads `ontology_name` / `folder_key` into the Data Fabric tool construction path (with an env-var fallback).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/uipath_langchain/agent/tools/datafabric_tool/ontology_fetch_tool.py` | New leaf tool (`fetch_ontology`) and cached fetcher wrapper for inner SQL agent use. |
| `src/uipath_langchain/agent/tools/datafabric_tool/ontology_client.py` | New client helper to fetch OWL content via `EntitiesService.request_async`, including name validation and payload cap. |
| `src/uipath_langchain/agent/tools/datafabric_tool/models.py` | Adds an intentionally-empty args schema (`OntologyFetchInput`) for the new tool. |
| `src/uipath_langchain/agent/tools/datafabric_tool/datafabric_tool.py` | Plumbs `ontology_name` / `folder_key` into the query handler creation (currently with env-var fallback). |
| `src/uipath_langchain/agent/tools/datafabric_tool/datafabric_subgraph.py` | Adds optional `fetch_ontology` tool binding and dispatch-by-tool-name inside the inner subgraph. |
…logy_file (drop local client)
…age.status to match host node
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
    # Inner toolset: always execute_sql; optionally an LLM-decided
    # fetch_ontology tool when one or more ontologies are configured.
    inner_tools: list[BaseTool] = [self._execute_sql_tool]
    if ontologies:
Check how the `EnabledNewLlmClients` feature flag is implemented, to make sure our feature is behind a feature flag too.
    # fetch_ontology tool when one or more ontologies are configured.
    inner_tools: list[BaseTool] = [self._execute_sql_tool]
    if ontologies:
        inner_tools.append(
This doesn't update the subgraph, correct?
    entity set) as ``ontologySet`` items. Each carries its own ``folderId``, so
    it is fetched from its own folder.
    """
    items = getattr(resource, "ontology_set", None) or []
Same as the other PR: `ontology_set`?
8b04daa to 86e5912
    def test_fetch_ontology_bound_only_when_ontologies(make_graph):
        without = make_graph(None)
        assert "execute_sql" in without._tools_by_name
        assert "fetch_ontology" not in without._tools_by_name

        with_onto = make_graph([("library", None)])
        assert "fetch_ontology" in with_onto._tools_by_name
nit: splitting this test into two (should bind when present / should not bind when absent) is trivial and lets you see what failed from the test name alone, without checking the assertion message.
    # An ontology context is not a standalone tool — it only grounds the Data
    # Fabric entity tool, which gathers it via resolve_context_ontologies.
    if resource.context_type == AgentContextType.DATA_FABRIC_ONTOLOGY:
        return None
If it is not a standalone tool at runtime, I think it is confusing to model it as a top-level resource at design time. So far, all "resource nodes" in a low-code agent (either standalone or part of a flow) are independently executable and show up in traces. This is a different paradigm: an optional helper tool that is part of another tool's subgraph.
That being said, this only applies to how it's modeled today. If we do plan to expand ontology support so that ontologies can be queried directly (via SPARQL statements, for instance), then defining them at the top level (at least in the package mapping) is better for future-proofing. We can figure out a less confusing design-time experience for now.
Yes, we plan to expand ontology support into a primary design experience, i.e., the user selects the ontologies and the entities are resolved internally. Making it a top-level resource is part of that iterative development.
    lines.append("## Available Ontology (authoritative semantic schema)")
    lines.append("")
    lines.append(
        f"This agent has a semantic ontology attached for these entities: "
        f"{names}. It is the authoritative source for the exact column names, "
        "value formats (date formats, codes, zero-padding), allowed values, "
        "and the relationships between entities — richer and more reliable "
        "than the field list below, which omits value formats and semantics."
    )
    lines.append("")
    lines.append(
        "**Before writing any SQL, call the `fetch_ontology` tool once** to "
        "load it, then base your column names, filter values, and joins on "
        "what it says. The entity tables below are a quick reference only; "
        "the ontology is the source of truth when they disagree."
    )
    lines.append("")
nit: it could be cleaner to build this as a single formatted string depending on `names`, instead of appending each line individually like this.
This applies to the existing `sql_expert_system_prompt` as well, but that one wasn't introduced by this PR.
Fixed, please review. There are also some changes in the Data Fabric prompt builder linked to your other comment (adding the ontology text to the prompt).
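One way to read the single-string suggestion, sketched under assumptions: `build_ontology_section` is a hypothetical helper name, and the prompt wording is abridged from the diff above rather than copied from the final code.

```python
from textwrap import dedent


def build_ontology_section(names: list[str]) -> str:
    """Return the ontology prompt section as one string, or "" when no names."""
    if not names:
        return ""
    joined = ", ".join(names)
    # One template instead of a series of lines.append(...) calls.
    return dedent(
        f"""\
        ## Available Ontology (authoritative semantic schema)

        This agent has a semantic ontology attached for these entities: {joined}.
        It is the authoritative source for column names, value formats, allowed
        values, and the relationships between entities.

        **Before writing any SQL, call the `fetch_ontology` tool once** to load it.
        The ontology is the source of truth when it disagrees with the entity tables.
        """
    )
```

The template keeps the section readable as a whole, and the empty-`names` case is a single early return rather than a conditional around several appends.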
    # When short-circuiting to END, return ONLY the terminal-success
    # ToolMessages so the outer agent's result is the query rows — not a
    # co-issued fetch_ontology's OWL. On a non-terminal turn keep all messages
    # so the inner LLM can use them on its next pass.
Isn't concurrent execution of an ontology retrieval and a data service query an anomaly? It doesn't seem correct. Why not mechanically enforce ontology retrieval and inject it into the context? When is it useful for the LLM to choose not to fetch the ontology?
Agreed. Earlier we did this to support future use cases where the LLM queries the ontology (for example, using SPARQL) instead of giving the complete ontology to the agent.
But I agree that right now it makes more sense to mechanically inject it into the system prompt.
I have made the changes; please review again.
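A rough sketch of the mechanical-injection approach agreed on above: the ontology is fetched up front and placed in the system prompt, so the inner LLM has no fetch decision to make. `build_system_prompt`, the `fetch_owl` callable, and the `<ontology>` markers are hypothetical names for illustration, not the PR's API.

```python
from __future__ import annotations

from typing import Callable


def build_system_prompt(
    base_prompt: str,
    ontology_name: str | None,
    fetch_owl: Callable[[str], str],
) -> str:
    """Append the fetched ontology to the prompt when one is configured."""
    if not ontology_name:
        # No ontology configured: the prompt is unchanged and nothing is fetched.
        return base_prompt
    owl = fetch_owl(ontology_name)  # fetched once, before the LLM runs
    return (
        f"{base_prompt}\n\n"
        f"## Ontology: {ontology_name}\n"
        "<ontology>\n"
        f"{owl.strip()}\n"
        "</ontology>"
    )
```

Because the fetch happens before the graph runs, the inner agent can keep `execute_sql` as its only tool, and a turn that mixes an ontology fetch with a query cannot occur.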
    ontologies: list[tuple[str, str | None]] = []
    for resource in resources:
        if (
            isinstance(resource, AgentContextResourceConfig)
            and resource.is_datafabric_ontology
        ):
            for item in resource.ontology_set or []:
                ontologies.append((item.name, item.folder_key))
    return ontologies
If I understand correctly, we implicitly assume all ontologies apply to this Data Service entity context. Shouldn't the link be defined more explicitly? I.e., either:
a) when defining a Data Service Context resource, you can also specify one or more ontologies, or
b) when defining the Ontology Context resource, you specify the list of entities it describes.
I am currently adding the R2RML mapping, which will let the LLM node implicitly resolve the entities from the ontologies at agent runtime (this is in a separate PR, currently in progress).
…ring ontology prompt)
fbb0bea to 9a4a187
9a4a187 to a35807b
…ame)" This reverts commit eebdfc2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…builder self-contained
…ogy-fetch-tool # Conflicts: # pyproject.toml # tests/agent/tools/test_datafabric_prompt_builder.py
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Adds a standalone Data Fabric ontology tool for low-code agents. The agent selects ontologies (not entities); the tool derives the entities it may query from the ontology's R2RML, resolves their schemas, and runs the existing inner SQL sub-graph grounded on both the OWL (semantic schema) and the R2RML (ontology→table/column mapping).
An ontology context (`contextType: "datafabricontology"`) now becomes a tool on its own: no `entitySet` required in `agent.json`.

How it works (first invocation, cached)
- `EntitiesService.get_ontology_file_async`.
- `(entity_name, folder_path)` allow-list, reading `rr:tableName` + a `uipath:folderPath` annotation per `rr:TriplesMap`.
- `folderPath → folder_key` (`folders.retrieve_key_async`, cached per path) then `name → Entity` schema (`entities.retrieve_by_name_async`); build a folder-scoped `EntitiesService` via the public `folders_map` constructor param (no SDK change).
- `DataFabricGraph`: the inner agent still has a single tool, `execute_sql`.

Everything from the sub-graph down (`execute_sql` → `query_entity_records_async`) is reused unchanged from the entity tool.

Key modules
New
- `ontology_r2rml.py`: dependency-free parser; `parse_r2rml_entities()` → the `(entity_name, folder_path)` allow-list (block-based, `R2RMLParseError` on contract violations).
- `datafabric_ontology_tool.py`: `resolve_ontology_entities()` (the resolver), `DataFabricOntologyQueryHandler`, `create_datafabric_ontology_tool()`.
- `datafabric_ontology_prompt_builder.py`: the ontology tool's inner prompt (OWL + R2RML + entity schemas), fully self-contained so the entity-tool builder is left untouched by this feature.

Changed
- `ontology_fetcher.py`: generalized to `fetch_ontology_file` (raw content + media type; raises) + `fence_ontology_block`.
- `datafabric_subgraph.py`: `DataFabricGraph` is now prompt-agnostic (takes a pre-built `system_prompt`).
- `datafabric_tool.py` / `context_tool.py`: removed all ontology grounding from the entity-tool path; the ontology context now builds the standalone tool (flag-gated).
- `datafabric_prompt_builder.py`: not modified by this PR (kept byte-identical to `main`); it remains the entity tool's builder only.

Design decisions
- `execute_sql`'s blast radius = the resolved set.
- `uipath:folderPath` (a folder path, not a GUID): R2RML has no folder concept and is deployment-agnostic; `rr:tableName` stays a valid SQL table name. Folder identity is resolved through the trusted folder service, never from mapping content. (Authoring contract documented for the ontology-authoring skill.)
- `uipath-langchain-python` over existing public `uipath-platform` methods.

Feature flag
- `DataFabricOntologyEnabled` (default off), a single shared constant.
- `context_tool` (off ⇒ no tool created, feature fully inert) and the handler's lazy init (re-checks before any OWL/R2RML fetch/parse/resolve). Off ⇒ the agent runs exactly as before; the entity tool is flag-independent.

Security
- `execute_sql`'s single-statement `sqlparse` guard is unchanged.

Testing
- `folders_map`), the ontology prompt builder, the factory, and the flag guard.
- `datafabric/ab`, gpt-5.4, `california-schools` ontology): flag prefetched on → R2RML + OWL fetched → `folderPath` resolved → 3 entities resolved by name → OWL+R2RML-grounded SQL (join on the R2RML FK) executed folder-scoped → correct answer. Confirmed `retrieve_by_name` (`/metadata`) returns populated field schemas.

Notes / dependencies
- `ontologySet` on the context, `get_ontology_file_async`). Pinned to `uipath>=2.12.5, <2.13.0` and `uipath-platform>=0.1.91, <0.2.0`, the SDK versions that carry the ontology binding. `uipath-platform 0.1.91` is not yet on PyPI (only testpypi dev builds), so CI dependency resolution stays red until #1728 merges and publishes; no `.dev` pin is committed.
- `DataFabricOntologyEnabled` in `uipath-agents-python`'s `_ALL_FLAGS` prefetch + the gitops flag deployed for the target tenants (both already in place from the prior work).
- `uipath:folderPath` per `rr:TriplesMap` (authoring-skill guidelines provided); otherwise resolution fails loudly by design.
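
For the R2RML step described under "How it works", a dependency-free, block-based parse could look roughly like the sketch below. This is not the PR's `parse_r2rml_entities()`: `parse_allow_list`, `AllowListError`, and the Turtle layout in `SAMPLE` (with `uipath:folderPath` written directly on the `rr:TriplesMap` subject) are assumptions made for illustration.

```python
import re

# Each "a rr:TriplesMap" declaration opens a new block (assumed Turtle layout).
_MAP_START = re.compile(r"\ba\s+rr:TriplesMap\b")
_TABLE = re.compile(r'rr:tableName\s+"([^"]+)"')
_FOLDER = re.compile(r'uipath:folderPath\s+"([^"]+)"')

SAMPLE = """
<#SchoolMap> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "School" ] ;
    uipath:folderPath "Shared/Education" .

<#ScoreMap> a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "Score" ] ;
    uipath:folderPath "Shared/Education" .
"""


class AllowListError(ValueError):
    """Raised when a TriplesMap lacks a table name or a folder path."""


def parse_allow_list(r2rml: str) -> list[tuple[str, str]]:
    """Return the (entity_name, folder_path) pairs declared in the mapping."""
    starts = [m.start() for m in _MAP_START.finditer(r2rml)]
    if not starts:
        raise AllowListError("no rr:TriplesMap found")
    entities: list[tuple[str, str]] = []
    for i, begin in enumerate(starts):
        # A block runs from one declaration to the next (or to end of text).
        end = starts[i + 1] if i + 1 < len(starts) else len(r2rml)
        block = r2rml[begin:end]
        table = _TABLE.search(block)
        folder = _FOLDER.search(block)
        if table is None or folder is None:
            # Fail loudly rather than silently widening or narrowing the list.
            raise AllowListError(
                f"TriplesMap #{i + 1} needs rr:tableName and uipath:folderPath"
            )
        pair = (table.group(1), folder.group(1))
        if pair not in entities:
            entities.append(pair)
    return entities
```

Calling `parse_allow_list(SAMPLE)` yields one pair per map, and a map missing either annotation raises instead of being skipped, which matches the "fails loudly by design" note above. A regex pass like this does not handle every valid Turtle serialization; it only illustrates the allow-list idea.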