tx-generator: generic remote submission transport (with an Ogmios backend)#6609
tx-generator: generic remote submission transport (with an Ogmios backend)#6609palas wants to merge 12 commits into
Conversation
Strenghten and polish rough edges of implementation: * `runScript` now returns `Maybe AsyncBenchmarkControl` instead of fabricating a no-op control: submit modes that never start the benchmark machinery (LocalSocket, Ogmios) yield Nothing, and a failing run no longer dies with the misleading "AsyncBenchmarkControl uninitialized" error that masked the real one. noopBenchmarkControl is gone; both call sites in Command.hs only ever consumed fst, so they are unaffected. * WebSocket failures (DNS, refused connection, handshake rejection, mid-stream drops, close frames) are caught around WS.runClient and converted into TxGenError instead of escaping as raw exceptions past the error machinery and logging shutdown. * `parseOgmiosUrl` validates the scheme (plain `ws://` only; `wss://` was silently degrading to a plaintext connection), parses the port via readMaybe with a 1-65535 range check instead of a partial read (`ws://host:/` no longer crashes), and rejects empty hosts. * Submission responses are subject to a 90s timeout (generous, since the node may hold submissions back under mempool pressure) and their JSON-RPC id is verified against the request id; a mismatched or null id is treated as a protocol fault and aborts the run with the offending response described. * json_highlevel configs that set ogmiosUrl without debugMode: true are rejected at compile time with an explanatory error: Ogmios mode ignores tps/targetNodes and produces no benchmark metrics, so a config asking for a real benchmark must fail fast rather than run unpaced and unmeasured. Low-level json scripts are unaffected, and compileOptions fails before any node interaction. * Polish: parseOgmiosUrl/parseOgmiosResponse/OgmiosResult exported to make them unit-testable, `fromMaybe Null` instead of `maybe Null id`, unused `RankNTypes` pragma dropped, haddock module header added, import list put in stylish-haskell order.
* Document the limitation on the implementation of the support
of Ogmios as a target that limits the throughput to one request
in flight per round trip (Ogmios supports pipelining by JSON-RPC id,
but we are not supporting it for now.
* Document that only submission goes through Ogmios:
protocol-parameter and era queries as well as protocol startup still
require the local node socket and config file.
* Route per-transaction rejections through the benchmark tracer instead
of putStrLn, so they reach the trace stream like every other
submission event instead of interleaving arbitrarily with it. The
failure detail payload reported by Ogmios ('error.data'), which
carries the actual ledger failure and was previously discarded, is
included in the message along with the error code.
* Add `ogmiosUrl` to the README's connection-settings table plus a
'Submitting through Ogmios' section, add a changelog entry for the
feature (including the clean-exit behavior change for scripts that
never start the benchmark machinery), and bump the package version
to 2.17.
* Capture the tx-generator exit code with an if/else instead of a bare $? after the subshell: under 'set -e' a failure used to abort the script on the spot, so the log tails and the exit-code report - the part that matters exactly when the run fails - were dead code. (Note the 'if ! (...); then TX_EXIT=$?' form would not work either: the negation resets $? to 0.) * Actually fail on functional failure: if no new UTxOs appear at the benchmark address the script now exits 1 instead of falling through and reporting success. The confirmation window is doubled to 120s - in practice inclusion took ~50s, which the previous 60s window only just covered. * Derive the benchmark address from the hardcoded "BenchmarkingDone" signing key (cf. keyBenchmarkDone in Compiler.hs) at run time instead of hardcoding the bech32 - if the key or the derivation ever changes, the test follows instead of silently counting a stale address. * Default the ogmios flake ref to the repository's master branch instead of a personal work-in-progress branch that will rot away; a different ref can still be passed as the first argument. * Use 'find -perm -u+x' in the cabal fallback - the previous '+111' is BSD-only syntax that GNU find rejects; '-u+x' works in both. * Document that debugMode is mandatory alongside ogmiosUrl, silence the SC2329 false positive for the trap-only cleanup function on the CI-pinned shellcheck, and check for required host commands (nix, jq, nc) up front.
A rejected transaction used to be counted and traced but otherwise ignored: the run carried on and exited 0 even if every submission was rejected, so a script could mistake a completely failed run for a successful one. Now a rejection makes the run fail, with the strategy depending on the shape of the stream being submitted. Streams of chained setup transactions (genesis import, splitting) abort at the first rejection - everything after it spends outputs that will never exist, so carrying on would only produce a cascade of confusing follow-up rejections. The benchmarking phase's NtoM stream consists of mutually independent transactions, so it is submitted to the end and the action fails afterwards when the tally shows rejections. Either way the process exits non-zero, making exit codes trustworthy in scripts. The strategy is picked in submitInEra from the generator: NtoM (looked up through Take/Cycle/Sequence wrappers) submits to the end, everything else aborts at the first rejection.
Three issues kept the script from resolving and launching ogmios:
1. Submodules. Resolving ogmios as `github:IntersectMBO/ogmios/<ref>`
downloads a GitHub tarball, which omits the hjsonpointer, hjsonschema
and wai-routes git submodules that ogmios' cabal.project depends on.
haskell.nix then fails plan-to-nix with "modules/hjsonpointer does
not contain any .cabal file". This is platform-independent (it fails
on x86_64 too). Use the `git+https://…?submodules=1` fetcher, which
actually pulls the submodules. Note that `github:…?submodules=1` is
silently ignored and the flake's own `self.submodules = true` does
not rescue the tarball path either.
2. Default ref. The cardano-node tools are derived from a cardano-node
input of the ogmios flake, which only the testnet-tx-gen-tests branch
declares; ogmios master has no such input, so the old default
produced 'github:null/null/null' and a 404. Default to the branch
that actually provides it. Also guard the lookup so a missing input
fails with a clear message instead of a cryptic fetch error.
3. Binary path. The ogmios line captured the nix output *directory* and
ran it directly ("Is a directory"); the cardano-* lines already
append /bin/<name>. Append /bin/ogmios to match.
Jimbo4350
left a comment
There was a problem hiding this comment.
This should not be Ogmios specific, it needs to be generic and at most a small section on how to use with Ogmios.
Submission to a remote endpoint is no longer Ogmios-specific. A new backend-agnostic transport interface sits behind the submission path, with Ogmios as the first (and currently only) backend. * Add Cardano.Benchmarking.Script.Submission: a SubmitTransport record (one submitOne action), the transport-agnostic submitLoop (rejection policy, tally, tracing, exit code) and the rejection-policy helpers. The backend's rejection type is kept abstract and only rendered for tracing (via Pretty), so no transport detail leaks into the loop. * Cardano.Benchmarking.Script.Ogmios becomes that interface's Ogmios backend: withOgmiosTransport builds a SubmitTransport over a WebSocket connection; its rejection type and protocol-fault handling stay internal to the module. * Config: replace the ogmiosUrl key with submissionEndpointType and submissionEndpointURI, which must be set together; SubmitMode's Ogmios constructor becomes SubmitToEndpoint SubmissionEndpointType. * Update README, CHANGELOG and test-ogmios.sh accordingly.
| => WS.Connection -> IORef Int -> Tx era -> IO (Either OgmiosRejection ()) | ||
| ogmiosSubmitOne conn reqIdRef tx = do | ||
| reqId <- atomicModifyIORef' reqIdRef $ \n -> (n + 1, n) | ||
| WS.sendTextData conn $ Aeson.encode (mkSubmitRequest tx reqId) |
There was a problem hiding this comment.
if the process stalls on the other end, even receiving data might get stuck. worth wrapping in timeout.
There was a problem hiding this comment.
Do you mean even submitting? Yes, apparently it can, good call
| parseOgmiosResponse bs = | ||
| case Aeson.eitherDecode bs of | ||
| Left err -> Left err | ||
| Right val -> Aeson.parseEither parseResponse val | ||
| where | ||
| parseResponse = Aeson.withObject "OgmiosResponse" $ \obj -> do | ||
| respId <- obj .:? "id" | ||
| mResult <- obj .:? "result" | ||
| result <- case mResult of | ||
| Just resultVal -> do | ||
| txId <- Aeson.withObject "result" (\r -> do | ||
| txObj <- r .: "transaction" | ||
| Aeson.withObject "transaction" (.: "id") txObj | ||
| ) resultVal | ||
| return $ OgmiosSuccess txId | ||
| Nothing -> do | ||
| errVal <- obj .: "error" | ||
| Aeson.withObject "error" (\errObj -> do | ||
| code <- errObj .: "code" | ||
| msg <- errObj .: "message" | ||
| dat <- errObj .:? "data" | ||
| return $ OgmiosError code msg (fromMaybe Null dat) | ||
| ) errVal | ||
| return (respId, result) |
There was a problem hiding this comment.
Staircase pattern: nested case mResult → withObject "result" → withObject "transaction", then Nothing → withObject "error" - four levels of rightward drift.
Consider extracting parseSuccess and parseError helpers, e.g.:
parseResponse = Aeson.withObject "OgmiosResponse" $ \obj -> do
respId <- obj .:? "id"
mResult <- obj .:? "result"
result <- maybe (parseError obj) parseSuccess mResult
return (respId, result)
parseSuccess :: Value -> Aeson.Parser OgmiosResult
parseSuccess = Aeson.withObject "result" $ \r -> do
txObj <- r .: "transaction"
txId <- Aeson.withObject "transaction" (.: "id") txObj
return $ OgmiosSuccess txId
parseError :: Aeson.Object -> Aeson.Parser OgmiosResult
parseError obj = do
errVal <- obj .: "error"
Aeson.withObject "error" (\errObj ->
OgmiosError <$> errObj .: "code" <*> errObj .: "message"
<*> (fromMaybe Null <$> errObj .:? "data")
) errVal| onRejectionFor :: Generator -> OnRejection | ||
| onRejectionFor generator = case generator of | ||
| NtoM {} -> ContinueOnRejection | ||
| Take _ g -> onRejectionFor g | ||
| Cycle g -> onRejectionFor g | ||
| Sequence gs | ||
| | any ((ContinueOnRejection ==) . onRejectionFor) gs -> ContinueOnRejection | ||
| _ -> AbortOnRejection |
There was a problem hiding this comment.
onRejectionFor is decided once for the whole stream, but Sequence can mix chained and independent sub-generators. If a low-level JSON script uses Sequence [Split ..., NtoM ...], NtoM makes the whole sequence ContinueOnRejection - so a rejected Split tx doesn't abort the loop, and the rest of the chained split txs (plus the NtoM txs that depend on their outputs) are submitted anyway, all doomed.
The high-level compiler emits each phase as a separate Submit action, so this doesn't bite there. But the logic is still wrong for mixed Sequence generators in raw scripts.
One option: apply the policy per sub-generator instead of per stream - abort when the current sub-generator says abort, continue when it says continue. Alternatively, since Sequence with mixed dependency chains is an edge case, a simpler fix would be to just use AbortOnRejection for Sequence unconditionally (abort is always safe, just conservative).
There was a problem hiding this comment.
Hm, for the low-level, it is tricky to know what to do. Continuing is more conservative. But I am not even sure we should abort on split, because maybe nothing depends on it. Maybe we should just abort always on low level, or something like that
| parseOgmiosResponse bs = | ||
| case Aeson.eitherDecode bs of | ||
| Left err -> Left err | ||
| Right val -> Aeson.parseEither parseResponse val |
There was a problem hiding this comment.
| parseOgmiosResponse bs = | |
| case Aeson.eitherDecode bs of | |
| Left err -> Left err | |
| Right val -> Aeson.parseEither parseResponse val | |
| parseOgmiosResponse = fmap (Aeson.parseEither parseResponse) . Aeson.eitherDecode |
There was a problem hiding this comment.
This looks like a good use case for process-compose, rather than writing own shell glue for handling processes.
| -- submission metrics, so a config that asks for a real benchmark must fail | ||
| -- fast here instead of running unpaced and unmeasured. | ||
| submitMode <- case (endpoint, debugMode) of | ||
| (Just (eType, uri), True) -> pure $ SubmitToEndpoint eType uri |
There was a problem hiding this comment.
I think uri should be already parsed at this stage instead of being String
| DumpToFile :: !FilePath -> SubmitMode | ||
| DiscardTX :: SubmitMode | ||
| NodeToNode :: NonEmpty NodeIPv4Address -> SubmitMode --deprecated | ||
| SubmitToEndpoint :: !SubmissionEndpointType -> !String -> SubmitMode |
There was a problem hiding this comment.
I am not convinced if this is the most convenient representation here. I'd just add separate SubmitModes for each backend i.e. keep ogmios here instead in SubmissionEndpointType.
There was a problem hiding this comment.
I'm good with either. String needs a haddock though and the Type in SubmissionEndpointType is redundant.
There was a problem hiding this comment.
I'd just add separate SubmitModes for each backend i.e. keep ogmios here instead in SubmissionEndpointType.
That is what I did originally, and then I changed it so that it wasn't so ogmios specific. In retrospective, I kind of like the current approach, and I think it will probably make adding other endpoints easier in the future. It may be that when we implement gRPC we want to do it differently, but well...
There was a problem hiding this comment.
I like that this module is abstracted away from the supported backends. 👍🏻
carbolymer
left a comment
There was a problem hiding this comment.
LGTM but I'm leaving approval for perf team
|
Nit: "the script monad" — did you mean |
There was a problem hiding this comment.
Closer. I'll have another look when you've reconciled @carbolymer 's comments (where you see fit).
| -- the 'SubmitTransport' and rendered for tracing via 'Pretty'. Deliberately | ||
| -- not exported: the submission loop only needs to render it, so no other | ||
| -- module should depend on its shape. | ||
| data OgmiosRejection = OgmiosRejection !Int !Text !Value |
There was a problem hiding this comment.
Haddocks for this type is missing. What is Int? What is Text?
| DumpToFile :: !FilePath -> SubmitMode | ||
| DiscardTX :: SubmitMode | ||
| NodeToNode :: NonEmpty NodeIPv4Address -> SubmitMode --deprecated | ||
| SubmitToEndpoint :: !SubmissionEndpointType -> !String -> SubmitMode |
There was a problem hiding this comment.
I'm good with either. String needs a haddock though and the Type in SubmissionEndpointType is redundant.
Description
Adds a generic remote submission transport to
tx-generator: point it at a submission endpoint and every transaction is submitted there as the endpoint's native calls, instead of via the local socket / Node-to-Node protocols. The transport is backend-agnostic; the first (and currently only) backend is Ogmios (JSON-RPC 2.0submitTransactionover a WebSocket).It is a functional submission transport, not a benchmarking one — useful for exercising a node through an endpoint end-to-end, not for measuring throughput.
Following review, the submission path is intentionally not Ogmios-specific: it is a generic interface with Ogmios as one implementation. Adding another backend later is a localised change (a new endpoint type + a transport builder + a dispatch arm), with no changes to the submission loop or its error handling.
What it adds
submissionEndpointType+submissionEndpointURIin the high-level JSON config, which must be set together — e.g."submissionEndpointType": "Ogmios","submissionEndpointURI": "ws://127.0.0.1:1337". When set, every phase — genesis import, UTxO splitting, and the benchmark phase — submits through the endpoint.Cardano.Benchmarking.Script.Submission), with Ogmios as one backend (Cardano.Benchmarking.Script.Ogmios).websockets+network-urideps).test-ogmios.sh: an integration test that spins upcardano-testnet+ Ogmios and submits through it.debugMode: true, or low-level scripts with noBenchmarkphase) used to crash at shutdown with "AsyncBenchmarkControl absent". They now exit cleanly —runScriptreturnsMaybe AsyncBenchmarkControlinstead of fabricating a no-op control.Implementation nuances
submitLoopdrives aSubmitTransport(onesubmitOneaction per backend) and owns the rejection policy, the sent/failed tally, tracing, and the exit-code logic. A backend's rejection type stays its own and is only rendered for tracing (aPrettyconstraint), so no transport detail leaks into the loop.SubmitMode's constructor is nowSubmitToEndpoint SubmissionEndpointType.debugMode: true. A submission endpoint ignorestps/targetNodesand produces no benchmark metrics, so the compiler rejects an endpoint config withoutdebugMode: trueat compile time (fail fast, before any node interaction).submissionEndpointTypeandsubmissionEndpointURImust be set together, or the compiler rejects the config. Low-leveljsonscripts are unaffected.id) is deliberately not used yet; it is noted in the module docs as a potential next step.Testing
test-ogmios.shis a self-contained integration test (nix-resolved Ogmios +cardano-testnet). It is not wired into CI (it needs nix). The Ogmios URL parser (parseOgmiosUrl) is exported to keep it unit-testable.Checklist
See Running tests for more details
CHANGELOG.mdfor affected package.cabalfiles are updatedhlint. See.github/workflows/check-hlint.ymlto get thehlintversionstylish-haskell. See.github/workflows/stylish-haskell.ymlto get thestylish-haskellversionghc-9.6andghc-9.12