Plugin for data-fair/processings. Builds and maintains a REST dataset that catalogs all the datasets accessible to the configured API key (typically every dataset of an organization).
Unlike the back-office datasets list view, this runs asynchronously: it can perform per-dataset computations and aggregations that would be too costly to do on the fly, and it materializes the result as a regular dataset — so it can be filtered, charted, embedded and published like any other.
- Catalog dataset — on first run (
createmode) it creates a REST dataset with the catalog schema (see below) and switches its own config toupdatemode, storing the reference. - Schema sync — on every run the catalog schema is re-pushed (
PATCH), so new columns appear on existing catalogs without a manual rebuild. - Collect — it paginates through
GET /api/v1/datasets(scoped by the API key), excluding the catalog dataset itself. - Upsert — each dataset becomes one line (keyed by the dataset id) pushed through the
_bulk_linesAPI. - Prune — when enabled, lines of datasets that no longer exist are deleted so the catalog stays in sync.
One line per dataset. It exposes the metadata data-fair stores on a dataset, plus
aggregates that are not available in one click from the back-office. Columns are
ordered by theme (data-fair renders dataset columns as a flat table, so the
grouping is conveyed by ordering). Definitions live in lib/catalog.ts.
Columns are grouped with x-group. The schema leans on data-fair native
rendering where it helps: owner carries the account concept (renders the
owner avatar; value is type:id[:department]), page the WebPage concept
(clickable link), description the description concept (markdown). Type,
frequency and visibility store a stable code and map it to a nice label via
x-labels (e.g. rest → Éditable). Only concepts that exist in data-fair's
vocabulary (api/contract/vocabulary.js) are used.
Group (x-group) |
Columns |
|---|---|
| Général | storageType (Type: Fichier/Éditable/Virtuel/Métadonnées), page (link), id, slug, title, summary, description (markdown), image, keywords, topics |
| Métadonnées | license, conformsTo + conformsToVersion + conformsToUrl, origin, creator, frequency, spatial, temporalStart, temporalEnd, modified (DCAT source modification date), relatedDatasets, then one column per owner-defined custom field (discovered per run) |
| Métadonnées calculées | bbox, projection, timeZone (auto-detected, not editable) |
| Propriété & visibilité | owner (avatar), visibility, published |
| Fichier | fileName, fileFormat |
| Structure & stockage | count, nbColumns, primaryKey, storageSize, indexedSize |
| Relations & enrichissements | nbExtensions, nbAttachments, nbChildren (virtual sources), nbUsedInVirtual (datasets reusing it), nbApplications, nbRelatedDatasets |
| Données de référence | isMasterData (exposes reference-data services), nbBulkSearchs (bulk enrichment endpoints), nbSingleSearchs (code/label search endpoints) — derived from the dataset's masterData config |
| Publication | publicationPortals (resolved portal names), nbPublicationSites, nbRequestedPublicationSites |
| État | status |
| Audit & dates | createdAt, createdBy, updatedAt, updatedBy, dataUpdatedAt, dataUpdatedBy, finalizedAt, portalModified (modification date shown on the portal: modified › dataUpdatedAt › updatedAt) |
Multi-valued columns (keywords, topics, relatedDatasets, primaryKey,
publicationPortals) carry a separator so data-fair treats them as arrays.
The computed aggregates (nbColumns, nbChildren, nbUsedInVirtual,
nbApplications…) are cross-dataset information a synchronous list view cannot
afford; nbUsedInVirtual is a reverse index built once per run over the whole
dataset list. Custom metadata columns are appended dynamically (one per
owner-defined field) at the end of the Métadonnées group. Their titles, and
the publication portal names, are resolved from the owner settings
(datasets-metadata and publication-sites) when the API key can read them;
otherwise the raw keys/ids are used. portalModified is recomputed locally
(modified › dataUpdatedAt › updatedAt) because data-fair stores it as the
internal _modified field, which the public API does not expose.
| Tab | Field | Description |
|---|---|---|
| Jeu de données catalogue | datasetMode |
create to create the catalog dataset, update to target an existing one |
| Jeu de données catalogue | datasetTitle / dataset |
Title to create, or reference to the dataset to update |
The catalog always includes metadata-only datasets, populates the storage size columns, and deletes catalog lines of datasets that no longer exist — these are not configurable.
npm install
npm run build-types # generates the .type/ artifacts from the JSON schemas
npm run lint
npm test # runs against the data-fair instance in config/local-test.mjsCreate a config/local-test.mjs (gitignored) with a dataFairUrl and a dataFairAPIKey to run the integration test against a real instance.
Publishing is handled automatically by CI: the plugin is pushed to the data-fair registry (@data-fair/registry), not to the public npm registry. A push to main/master publishes to the staging registry; pushing a v* tag publishes to production:
npm version minor # version bump + v* tag
git push --follow-tags # CI publishes to the production registry