@data-fair/processing-datasets-list

Plugin for data-fair/processings. Builds and maintains a REST dataset that catalogs all the datasets accessible to the configured API key (typically every dataset of an organization).

Unlike the back-office datasets list view, this runs asynchronously: it can perform per-dataset computations and aggregations that would be too costly to do on the fly, and it materializes the result as a regular dataset — so it can be filtered, charted, embedded and published like any other.

How it works

1. Catalog dataset: on first run (create mode) the processing creates a REST dataset with the catalog schema (see below), then switches its own config to update mode and stores the reference to the created dataset.
2. Schema sync: on every run the catalog schema is re-pushed (PATCH), so new columns appear on existing catalogs without a manual rebuild.
3. Collect: it paginates through GET /api/v1/datasets (scoped by the API key), excluding the catalog dataset itself.
4. Upsert: each dataset becomes one line (keyed by the dataset id) pushed through the _bulk_lines API.
5. Prune: lines of datasets that no longer exist are deleted so the catalog stays in sync.
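
The collect, upsert and prune steps can be sketched as follows. This is a minimal illustration, not the plugin's actual code: `fetchPage` is a hypothetical injected function standing in for a call to GET /api/v1/datasets, the `count`/`results` response shape and the `_action` values are assumptions about the data-fair API, and only the `title` column is copied to keep the example short.

```typescript
// Hypothetical shapes: only `id` and `title` are used in this sketch.
interface DatasetSummary { id: string, title?: string }
interface DatasetsPage { count: number, results: DatasetSummary[] }

// Assumption: wraps GET /api/v1/datasets with page/size parameters.
type FetchPage = (page: number, size: number) => Promise<DatasetsPage>

// Collect: walk every page, skipping the catalog dataset itself.
async function collectDatasets (fetchPage: FetchPage, catalogId: string, size = 100): Promise<DatasetSummary[]> {
  const collected: DatasetSummary[] = []
  let seen = 0
  for (let page = 1; ; page++) {
    const { count, results } = await fetchPage(page, size)
    seen += results.length
    for (const dataset of results) {
      if (dataset.id !== catalogId) collected.push(dataset)
    }
    // Stop on an empty page or once every announced dataset has been seen.
    if (results.length === 0 || seen >= count) break
  }
  return collected
}

// Assumption: one bulk operation per line, keyed by `_id`.
interface BulkOperation { _id: string, _action: 'createOrUpdate' | 'delete', [key: string]: unknown }

// Upsert + prune: one line per collected dataset, one deletion per stale line.
function planBulk (collected: DatasetSummary[], existingLineIds: string[]): BulkOperation[] {
  const alive = new Set(collected.map(d => d.id))
  const upserts = collected.map((d): BulkOperation => ({ _id: d.id, _action: 'createOrUpdate', title: d.title }))
  const deletes = existingLineIds
    .filter(id => !alive.has(id))
    .map((id): BulkOperation => ({ _id: id, _action: 'delete' }))
  return [...upserts, ...deletes]
}
```

Injecting `fetchPage` keeps the pagination logic testable without a running data-fair instance; in a real run the resulting operations would be sent to the catalog dataset's _bulk_lines endpoint.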

Catalog schema

One line per dataset. It exposes the metadata data-fair stores on a dataset, plus aggregates that are not available in one click from the back-office. Columns are ordered by theme (data-fair renders dataset columns as a flat table, so the grouping is conveyed by ordering). Definitions live in lib/catalog.ts.

Columns are grouped with x-group. The schema leans on data-fair native rendering where it helps: owner carries the account concept (renders the owner avatar; value is type:id[:department]), page the WebPage concept (clickable link), description the description concept (markdown). Type, frequency and visibility store a stable code and map it to a nice label via x-labels (e.g. rest → Éditable). Only concepts that exist in data-fair's vocabulary (api/contract/vocabulary.js) are used.
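
To make these conventions concrete, here is a small sketch of how an owner value and a labelled code might be produced. The `Owner` shape, the helper names and the column object are hypothetical; only the `type:id[:department]` format, the `x-group` and `x-labels` keys, and the `rest` → Éditable mapping come from this README.

```typescript
// Hypothetical owner shape, modeled on the `type:id[:department]` format.
interface Owner { type: 'user' | 'organization', id: string, department?: string }

// Builds the value stored in the `owner` column.
function ownerValue (owner: Owner): string {
  const base = `${owner.type}:${owner.id}`
  return owner.department ? `${base}:${owner.department}` : base
}

// Hypothetical column definition: the stored value is a stable code,
// and `x-labels` maps it to the label shown to users.
const storageTypeColumn = {
  key: 'storageType',
  type: 'string',
  'x-group': 'Général',
  'x-labels': { rest: 'Éditable' } as Record<string, string>
}

// Falls back to the raw code when no label is declared.
function displayLabel (code: string): string {
  return storageTypeColumn['x-labels'][code] ?? code
}
```

Storing the code rather than the label means filters and charts keep working if a label is later reworded.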

| Group (`x-group`) | Columns |
| --- | --- |
| Général | `storageType` (Type: Fichier/Éditable/Virtuel/Métadonnées), `page` (link), `id`, `slug`, `title`, `summary`, `description` (markdown), `image`, `keywords`, `topics` |
| Métadonnées | `license`, `conformsTo` + `conformsToVersion` + `conformsToUrl`, `origin`, `creator`, `frequency`, `spatial`, `temporalStart`, `temporalEnd`, `modified` (DCAT source modification date), `relatedDatasets`, then one column per owner-defined custom field (discovered per run) |
| Métadonnées calculées | `bbox`, `projection`, `timeZone` (auto-detected, not editable) |
| Propriété & visibilité | `owner` (avatar), `visibility`, `published` |
| Fichier | `fileName`, `fileFormat` |
| Structure & stockage | `count`, `nbColumns`, `primaryKey`, `storageSize`, `indexedSize` |
| Relations & enrichissements | `nbExtensions`, `nbAttachments`, `nbChildren` (virtual sources), `nbUsedInVirtual` (datasets reusing it), `nbApplications`, `nbRelatedDatasets` |
| Données de référence | `isMasterData` (exposes reference-data services), `nbBulkSearchs` (bulk enrichment endpoints), `nbSingleSearchs` (code/label search endpoints); all derived from the dataset's `masterData` config |
| Publication | `publicationPortals` (resolved portal names), `nbPublicationSites`, `nbRequestedPublicationSites` |
| État | `status` |
| Audit & dates | `createdAt`, `createdBy`, `updatedAt`, `updatedBy`, `dataUpdatedAt`, `dataUpdatedBy`, `finalizedAt`, `portalModified` (modification date shown on the portal: `modified` › `dataUpdatedAt` › `updatedAt`) |

Multi-valued columns (keywords, topics, relatedDatasets, primaryKey, publicationPortals) carry a separator so data-fair treats them as arrays.
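
As a rough illustration, a multi-valued cell could be serialized as below. The separator value and the `joinMulti` helper are assumptions made for this example; the README does not state which separator the plugin actually uses.

```typescript
// Assumption: the real separator may differ.
const SEPARATOR = ' ; '

// Hypothetical column definition: declaring the separator is what lets
// data-fair split the stored string back into an array.
const keywordsColumn = { key: 'keywords', type: 'string', separator: SEPARATOR }

// Joins the non-empty values; returns undefined so empty cells stay empty.
function joinMulti (values: Array<string | undefined> | undefined): string | undefined {
  const cleaned = (values ?? [])
    .map(v => v?.trim())
    .filter((v): v is string => !!v)
  return cleaned.length > 0 ? cleaned.join(keywordsColumn.separator) : undefined
}
```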

The computed aggregates (nbColumns, nbChildren, nbUsedInVirtual, nbApplications…) are cross-dataset information a synchronous list view cannot afford; nbUsedInVirtual is a reverse index built once per run over the whole dataset list. Custom metadata columns are appended dynamically (one per owner-defined field) at the end of the Métadonnées group. Their titles, and the publication portal names, are resolved from the owner settings (datasets-metadata and publication-sites) when the API key can read them; otherwise the raw keys/ids are used. portalModified is recomputed locally (modified › dataUpdatedAt › updatedAt) because data-fair stores it as the internal _modified field, which the public API does not expose.
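
Two of these computations are simple enough to sketch. The code below is an illustration under assumptions, not the plugin's implementation: the interfaces are hypothetical, and it assumes a virtual dataset lists its source dataset ids under `virtual.children`.

```typescript
// Hypothetical subset of the date fields returned by the public API.
interface DatasetDates { modified?: string, dataUpdatedAt?: string, updatedAt?: string }

// portalModified: first defined value in the order
// modified › dataUpdatedAt › updatedAt.
function portalModified (dataset: DatasetDates): string | undefined {
  return dataset.modified ?? dataset.dataUpdatedAt ?? dataset.updatedAt
}

// Assumption: a virtual dataset exposes its sources as `virtual.children`.
interface DatasetRef { id: string, virtual?: { children?: string[] } }

// nbUsedInVirtual: reverse index built in a single pass over the list,
// mapping each source dataset id to the number of references to it.
function countUsedInVirtual (datasets: DatasetRef[]): Map<string, number> {
  const counts = new Map<string, number>()
  for (const dataset of datasets) {
    for (const childId of dataset.virtual?.children ?? []) {
      counts.set(childId, (counts.get(childId) ?? 0) + 1)
    }
  }
  return counts
}
```

Building the index once keeps the cost linear in the total number of child references, instead of rescanning the whole list for every dataset.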

Configuration

| Tab | Field | Description |
| --- | --- | --- |
| Jeu de données catalogue | `datasetMode` | `create` to create the catalog dataset, `update` to target an existing one |
| Jeu de données catalogue | `datasetTitle` / `dataset` | Title to create, or reference to the dataset to update |

The catalog always includes metadata-only datasets, populates the storage size columns, and deletes catalog lines of datasets that no longer exist — these are not configurable.
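
For illustration, the two documented fields could be typed as a discriminated union. This is a hypothetical sketch: the authoritative definition is processing-config-schema.json, and the `{ id, title }` shape of the `dataset` reference is an assumption.

```typescript
// Hypothetical typing of the documented fields.
type CatalogConfig =
  | { datasetMode: 'create', datasetTitle: string }
  | { datasetMode: 'update', dataset: { id: string, title?: string } }

// Narrowing on `datasetMode` gives access to the matching field only.
function describeTarget (config: CatalogConfig): string {
  return config.datasetMode === 'create'
    ? `create "${config.datasetTitle}"`
    : `update ${config.dataset.id}`
}

// After a first run in create mode, the config is switched to update mode
// with a reference to the dataset that was just created.
function toUpdateMode (created: { id: string, title?: string }): CatalogConfig {
  return { datasetMode: 'update', dataset: created }
}
```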

Development

npm install
npm run build-types       # generates the .type/ artifacts from the JSON schemas
npm run lint
npm test                  # runs against the data-fair instance in config/local-test.mjs

Create a config/local-test.mjs (gitignored) with a dataFairUrl and a dataFairAPIKey to run the integration test against a real instance.

Release

Publishing is handled automatically by CI: the plugin is pushed to the data-fair registry (@data-fair/registry), not to the public npm registry. A push to main/master publishes to the staging registry; pushing a v* tag publishes to production:

npm version minor       # version bump + v* tag
git push --follow-tags  # CI publishes to the production registry
