Provenance

Support for provenance was developed as a BIDS Extension Proposal. Please see Citing BIDS on how to appropriately credit this extension when referring to it in the context of the academic literature.

Example datasets

Several example datasets have been formatted using this specification and can be used for practical guidance when curating a new dataset.

This part of the BIDS specification is aimed at describing the provenance of a BIDS dataset. This description is retrospective: it describes a set of steps that were executed in order to establish the dataset and is based on W3C PROV (see Provenance graph).

Provenance information SHOULD be included in a BIDS dataset when possible. If provenance information is included, it MUST be described using the conventions detailed hereafter. Provenance information reflects the provenance of a full dataset and/or of specific files at any level of the BIDS hierarchy. Provenance information SHOULD NOT include human subject identifying data.

Note

Throughout this document, the terms Id and Label are used to provide identification for JSON objects related to provenance. Id is used to unambiguously identify those objects that may be referenced elsewhere, permitting automated tools to construct and query a graph. Label is a human-readable name for that object, which need not be unique, and should not be confused with the BIDS term label.

Provenance of a BIDS file

Provenance of a BIDS data file SHOULD be stored inside its sidecar JSON.

For that purpose, any sidecar JSON file MAY include the following keys:

Key name	Requirement Level	Data type	Description
GeneratedBy	OPTIONAL	array of strings	Identifier(s) of the activity/activities responsible for the creation of the file. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV `prov:wasGeneratedBy`.
SidecarGeneratedBy	OPTIONAL	array of strings	Identifier(s) of the activity/activities responsible for the creation of the sidecar JSON file. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV `prov:wasGeneratedBy`.
Digest	OPTIONAL	object	Object containing digests of the file. If the checksum function is one of the following, the key MUST be its name exactly as listed: `MD5`; `SHA1`; `SHA-224`; `SHA-256`; `SHA-384`; `SHA-512`; `SHA3-224`; `SHA3-256`; `SHA3-384`; `SHA3-512`; `BLAKE2B-256`; `BLAKE3-256`; `SHAKE128`; `SHAKE256`. Otherwise, the key MAY be an arbitrary label. The corresponding value is the checksum as computed by the function identified by the key.
Type	OPTIONAL	array of strings	Term(s) from (a) controlled vocabulary/vocabularies that more specifically describes the file. Corresponds to W3C PROV `prov:type`.

Example of metadata in a sidecar JSON file

{
    "GeneratedBy": "bids::prov#conversion-00f3a18f",
    "SidecarGeneratedBy": [
        "bids::prov#preparation-conversion-1xkhm1ft",
        "bids::prov#conversion-00f3a18f"
    ],
    "Digest": {
        "SHA-256": "66eeafb465559148e0222d4079558a8354eb09b9efabcc47cd5b8af6eed51907"
    }
}

For a complete example, see Provenance of DICOM to NIfTI conversion with heudiconv.
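The Digest value shown above can be produced with a standard hashing library. The following is a minimal sketch, assuming SHA-256 is the chosen function; the helper names `file_digest` and `digest_entry` are hypothetical and not part of this specification.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def digest_entry(path):
    """Build a Digest object keyed by the checksum name used in the table above."""
    return {"SHA-256": file_digest(path, "sha256")}
```

The returned object can be stored under the `Digest` key of the sidecar JSON file that accompanies the data file.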

Provenance of a BIDS dataset

Provenance of a BIDS dataset (raw, derivative, or study) SHOULD be stored inside its dataset_description.json file. The dataset_description.json file of a BIDS raw dataset or BIDS study dataset MAY include the GeneratedBy key to describe provenance. The dataset_description.json file of a BIDS derivative dataset MUST include the GeneratedBy key to describe provenance.

The GeneratedBy field MAY contain either of the following values:

Identifier(s) of the activity/activities responsible for the creation of the dataset (see Description using identifiers).
A description of processes or pipelines responsible for the creation of the dataset (see Description of processes or pipelines).

Description using identifiers

This section details how to describe provenance of a dataset using identifiers. The following field is intended for use in dataset_description.json to provide provenance information that applies to the entire dataset.

Key name	Requirement Level	Data type	Description
GeneratedBy	RECOMMENDED for BIDS raw datasets and BIDS study datasets, REQUIRED for BIDS derivative datasets	array of strings	Identifier(s) of the activity/activities responsible for the creation of the dataset. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV `prov:wasGeneratedBy`.

Example of GeneratedBy contents in a dataset_description.json

{
    "GeneratedBy": "bids::prov#preprocessing-xMpFqB5q"
}

For a complete example, see Provenance of fMRI preprocessing with fMRIPrep.

Description of processes or pipelines

This section details how to describe the provenance of a dataset using an array of objects representing pipelines or processes that generated the dataset.

Warning

This description can be equivalently represented using the previous section. This modeling is kept for backward-compatibility but might be removed in future BIDS releases (see BIDS 2.0).

Key name	Requirement Level	Data type	Description
GeneratedBy	RECOMMENDED for BIDS raw datasets and BIDS study datasets, REQUIRED for BIDS derivative datasets	array of objects	Used to specify provenance of the dataset.

Each object in the GeneratedBy array includes the following REQUIRED, RECOMMENDED and OPTIONAL keys:

Key name	Requirement Level	Data type	Description
Name	REQUIRED	string	Name of the pipeline or process that generated the dataset. Use `"Manual"` to indicate the derivatives were generated by hand, or adjusted manually after an initial run of an automated pipeline.
Version	RECOMMENDED	string	Version of the pipeline or process that generated the dataset.
Description	RECOMMENDED if `Name` is `"Manual"`, OPTIONAL otherwise	string	Plain-text description of the pipeline or process that generated the dataset.
CodeURL	OPTIONAL	string	URL where the code used to generate the dataset may be found.
Container	OPTIONAL	object	Used to specify the location and relevant attributes of software container image used to produce the dataset. Valid keys in this object include `Type`, `Tag` and `URI` with string values.

Example of GeneratedBy contents in a dataset_description.json

{
    "GeneratedBy": [
        {
          "Name": "reproin",
          "Version": "0.6.0",
          "Container": {
            "Type": "docker",
            "Tag": "repronim/reproin:0.6.0"
          }
        }
    ]
}
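Because GeneratedBy in dataset_description.json can take two forms, a tool reading it has to tell them apart. The sketch below is one possible way to do so; the function name `generated_by_form` is hypothetical, and both the tolerance for a bare string (as written in some examples on this page) and the rejection of empty arrays are assumptions rather than rules stated here.

```python
def generated_by_form(value):
    """Return 'identifiers' or 'pipelines' depending on the form of GeneratedBy."""
    if isinstance(value, str):
        # Assumption: accept a single identifier given as a bare string.
        value = [value]
    if not isinstance(value, list) or not value:
        raise ValueError("GeneratedBy is expected to be a non-empty array")
    if all(isinstance(item, str) for item in value):
        return "identifiers"
    if all(isinstance(item, dict) and "Name" in item for item in value):
        return "pipelines"
    raise ValueError("GeneratedBy mixes both forms or lacks the REQUIRED Name key")
```

For instance, the array of objects in the example above would be classified as `"pipelines"`, while an array of activity identifiers would be classified as `"identifiers"`.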

Provenance files

Any provenance information that cannot be stored in either sidecar JSON files (see Provenance of a BIDS file) or in dataset_description.json (see Provenance of a BIDS dataset) MUST be stored in provenance files under the /prov/ directory.

Template:

prov/
    prov-<label>_act.json
    prov-<label>_ent.json
    prov-<label>_env.json
    prov-<label>_soft.json

Legend:

For more information about filename elements (for example, entities, suffixes, extensions), follow the links embedded in the filename template.
<matches> is a placeholder to denote an arbitrary (and valid) sequence of entities and labels at the beginning of the filename (only BIDS "raw").
<source-entities> is a placeholder to denote an arbitrary sequence of entities and labels at the beginning of the filename matching a source file from which the file derives (only BIDS-Derivatives).
Filename entities or directories between square brackets (for example, [_ses-<label>]) are OPTIONAL.
Some entities may only allow specific values, in which case those values are listed in <>, separated by |.
_<suffix> means that there are several (>6) valid suffixes for this filename pattern.
.<extension> means that there are several (>6) valid extensions for this file type.
[.gz] means that both the unzipped and gzipped versions of the extension are valid.

Note

The prov entity makes it possible to group related provenance files, using an arbitrary value for <label>. A subdirectory MAY be used to group provenance files sharing the same prov entity.

The following suffixes specify the contents of provenance files.

Name	`suffix`	Description
Description of activities	act	A JSON file containing objects describing activities in the context of provenance. (See the `Activities` section).
Description of input and output data	ent	A JSON file containing objects describing input and output data in the context of provenance. (See the `Input and output data` section).
Description of environments	env	A JSON file containing objects describing environments in the context of provenance. (See the `Environments` section).
Description of software	soft	A JSON file containing objects describing software in the context of provenance. (See the `Software` section).

Example of organization for provenance files

prov/
├─ prov-preprocspm/
│  ├─ prov-preprocspm_act.json
│  └─ prov-preprocspm_ent.json
├─ prov-preprocfsl_act.json
├─ prov-preprocfsl_ent.json
├─ prov-preprocfsl_env.json
├─ prov-preprocfsl_soft.json
└─ ...
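The naming template above lends itself to a simple pattern match. The following sketch extracts the prov label and the suffix from a filename; the pattern assumes that labels are alphanumeric, and the names `PROV_FILE` and `parse_prov_filename` are hypothetical.

```python
import re

# Assumption: <label> consists of letters and digits only.
PROV_FILE = re.compile(
    r"^prov-(?P<label>[A-Za-z0-9]+)_(?P<suffix>act|ent|env|soft)\.json$"
)


def parse_prov_filename(name):
    """Return (label, suffix) for a provenance filename, or None if it does not match."""
    match = PROV_FILE.match(name)
    if match is None:
        return None
    return match.group("label"), match.group("suffix")
```

Applied to `prov-preprocspm_act.json`, this yields the label `preprocspm` and the suffix `act`; names such as `provenance.tsv` do not match and return `None`.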

Activities

Activities are transformations that have been applied to data.

Each file with an act suffix is a JSON file describing activities. It MUST include the following key:

Key name	Requirement Level	Data type	Description
Activities	REQUIRED	array of objects	Objects describing activities.

Each object in the Activities array includes the following keys:

Key name	Requirement Level	Data type	Description
Id	REQUIRED	string	Identifier for the activity. Corresponds to JSON-LD `@id`.
Label	REQUIRED	string	Name for the activity. Corresponds to RDF Schema `rdfs:label`.
Command	REQUIRED	string or null	Command (or commands) performed by the activity, including all parameters. Set to `null` to describe that the activity was performed manually.
Description	OPTIONAL	string	Plain-text extended description of the activity. RECOMMENDED if `Command` is set to `null`. Corresponds to RDF Schema `rdfs:comment`.
AssociatedWith	OPTIONAL	array of strings	Identifier(s) of the software package(s) used to compute the activity. Related software MUST be described as specified in the Software section. Corresponds to W3C PROV `prov:wasAssociatedWith`.
Used	OPTIONAL	array of strings	Identifier(s) of the input and output data or environment(s) used by the activity. Related input and output data MUST be described as specified in the Input and output data section. Related environment(s) MUST be described as specified in the Environments section. Corresponds to W3C PROV `prov:used`.
Type	OPTIONAL	array of strings	Term(s) from (a) controlled vocabulary/vocabularies that more specifically describes the activity. Corresponds to W3C PROV `prov:type`.
StartedAtTime	OPTIONAL	string	Timestamp tracking when the activity started. Corresponds to W3C PROV `prov:startedAtTime`.
EndedAtTime	OPTIONAL	string	Timestamp tracking when the activity ended. Corresponds to W3C PROV `prov:endedAtTime`.

Example: description of an activity in a prov/[<subdir>/]prov-<label>_act.json file

{
    "Activities": [
        {
            "Id": "bids::prov#conversion-00f3a18f",
            "Label": "DICOM to NIfTI conversion",
            "Command": "dcm2niix -o . -f sub-%i/anat/sub-%i_T1w sourcedata/dicoms",
            "AssociatedWith": "bids::prov#dcm2niix-khhkm7u1",
            "Used": [
                "bids::prov#fedora-uldfv058",
                "bids::sourcedata/dicoms"
            ],
            "StartedAtTime": "2025-03-13T10:26:00",
            "EndedAtTime": "2025-03-13T10:26:05"
        }
    ]
}

For a complete example, see Provenance of DICOM to NIfTI conversion with dcm2niix.
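The requirement levels in the table above can be checked mechanically. This is a minimal sketch of such a check, not a full validator; the name `check_activity` and the wording of the messages are hypothetical.

```python
REQUIRED_ACTIVITY_KEYS = ("Id", "Label", "Command")


def check_activity(activity):
    """Return a list of human-readable problems found in one Activities object."""
    problems = [
        f"missing REQUIRED key {key!r}"
        for key in REQUIRED_ACTIVITY_KEYS
        if key not in activity
    ]
    # Command may be null for manual activities; Description is then RECOMMENDED.
    if "Command" in activity and activity["Command"] is None:
        if "Description" not in activity:
            problems.append("Description is RECOMMENDED when Command is null")
    return problems
```

An empty list means that the REQUIRED keys are present and that no recommendation covered by this sketch was missed.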

Software

This section specifies how to describe software packages that computed the activities.

Each file with a soft suffix is a JSON file describing software. It MUST include the following key:

Key name	Requirement Level	Data type	Description
Software	REQUIRED	array of objects	Objects describing software.

Each object in the Software array includes the following keys:

Key name	Requirement Level	Data type	Description
Id	REQUIRED	string	Identifier for the software package. Corresponds to JSON-LD `@id`.
Label	REQUIRED	string	Name of the software package. Corresponds to RDF Schema `rdfs:label`.
Version	REQUIRED	string	Version of the software package.
AlternativeIdentifier	OPTIONAL	array of strings	URI(s) of (an) alternative identifier(s) (such as RRID) for the software package.
ActedOnBehalfOf	OPTIONAL	array of strings	Identifier(s) of other software package(s) that triggered the use of the software package. Example: if software A launches software B to perform activity C, then B ActedOnBehalfOf A. Related software MUST be described as specified in the Software section. Corresponds to W3C PROV `prov:actedOnBehalfOf`.

Example: description of a software package in a prov/[<subdir>/]prov-<label>_soft.json file

{
    "Software": [
        {
            "Id": "bids::prov#dcm2niix-khhkm7u1",
            "AlternativeIdentifier": ["RRID:SCR_023517"],
            "Label": "dcm2niix",
            "Version": "v1.0.20220720"
        }
    ]
}

For a complete example, see Provenance of DICOM to NIfTI conversion with dcm2niix.

Input and output data

This section specifies how to describe input and output data for activities. This data corresponds to the W3C PROV prov:Entity class that includes files, datasets and other types of data.

Each file with an ent suffix is a JSON file describing input and output data.

Note

The ent suffix stands for prov:Entity.

Warning

These files SHOULD NOT describe files that are available in the dataset. See Provenance of a BIDS file for this purpose.

These files SHOULD NOT describe the current dataset. See Provenance of a BIDS dataset for this purpose.

Each file MUST include one or more of the following keys:

Key name	Requirement Level	Data type	Description
Files	OPTIONAL, but REQUIRED if `prov:Entity` and `Datasets` fields are absent	array of objects	Objects describing files.
Datasets	OPTIONAL, but REQUIRED if `Files` and `prov:Entity` fields are absent	array of objects	Objects describing datasets.
prov:Entity	OPTIONAL, but REQUIRED if `Files` and `Datasets` fields are absent	array of objects	Objects describing prov:Entity objects other than files or datasets.
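The "one or more of" rule in the table above amounts to checking that at least one of the three keys is present. A short sketch, with the hypothetical name `entity_sections`:

```python
ENTITY_KEYS = ("Files", "Datasets", "prov:Entity")


def entity_sections(document):
    """Return the top-level keys present in an ent file, or raise if none is."""
    present = [key for key in ENTITY_KEYS if key in document]
    if not present:
        raise ValueError(
            "an ent file needs at least one of: " + ", ".join(ENTITY_KEYS)
        )
    return present
```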

Each object in the Files array includes the following keys:

Key name	Requirement Level	Data type	Description
Id	REQUIRED	string	Identifier for the file. Corresponds to JSON-LD `@id`.
Label	REQUIRED	string	Name for the file. Corresponds to RDF Schema `rdfs:label`.
Digest	RECOMMENDED	object	Object containing digests of the file. If the checksum function is one of the following, the key MUST be its name exactly as listed: `MD5`; `SHA1`; `SHA-224`; `SHA-256`; `SHA-384`; `SHA-512`; `SHA3-224`; `SHA3-256`; `SHA3-384`; `SHA3-512`; `BLAKE2B-256`; `BLAKE3-256`; `SHAKE128`; `SHAKE256`. Otherwise, the key MAY be an arbitrary label. The corresponding value is the checksum as computed by the function identified by the key.
AtLocation	OPTIONAL	string	Relative path to the file on disk. Corresponds to W3C PROV `prov:atLocation`.
GeneratedBy	OPTIONAL	array of strings	Identifier(s) of the activity/activities responsible for the creation of the file. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV `prov:wasGeneratedBy`.
Type	OPTIONAL	array of strings	Term(s) from (a) controlled vocabulary/vocabularies that more specifically describes the file. Corresponds to W3C PROV `prov:type`.

Each object in the Datasets array includes the following keys:

Key name	Requirement Level	Data type	Description
Id	REQUIRED	string	Identifier for the dataset. Corresponds to JSON-LD `@id`.
Label	REQUIRED	string	Name for the dataset. Corresponds to RDF Schema `rdfs:label`.
GeneratedBy	OPTIONAL	array of strings	Identifier(s) of the activity/activities responsible for the creation of the dataset. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV `prov:wasGeneratedBy`.

Each object in the prov:Entity array includes the following keys:

Key name	Requirement Level	Data type	Description
Id	REQUIRED	string	Identifier for the prov:Entity. Corresponds to JSON-LD `@id`.
Label	REQUIRED	string	Name for the prov:Entity. Corresponds to RDF Schema `rdfs:label`.
Digest	RECOMMENDED	object	Object containing digests of the prov:Entity. If the checksum function is one of the following, the key MUST be its name exactly as listed: `MD5`; `SHA1`; `SHA-224`; `SHA-256`; `SHA-384`; `SHA-512`; `SHA3-224`; `SHA3-256`; `SHA3-384`; `SHA3-512`; `BLAKE2B-256`; `BLAKE3-256`; `SHAKE128`; `SHAKE256`. Otherwise, the key MAY be an arbitrary label. The corresponding value is the checksum as computed by the function identified by the key.
GeneratedBy	OPTIONAL	array of strings	Identifier(s) of the activity/activities responsible for the creation of the prov:Entity. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV `prov:wasGeneratedBy`.
Type	OPTIONAL	array of strings	Term(s) from (a) controlled vocabulary/vocabularies that more specifically describes the prov:Entity. Corresponds to W3C PROV `prov:type`.

Example: description of a file in a prov/[<subdir>/]prov-<label>_ent.json file

{
    "Files": [
        {
            "Id": "bids::sub-01/anat/sub-01_T1w.nii#97a89211",
            "Label": "sub-01_T1w.nii",
            "AtLocation": "sub-01/anat/sub-01_T1w.nii",
            "GeneratedBy": "bids::prov#gunzip-e9264918",
            "Digest": {
                "SHA-256": "45485541db5734f565b7cac3e009f8b02907245fc6db435c700e84d1037773b5"
            }
        }
    ]
}

For a complete example, see Provenance of fMRI preprocessing with SPM.

Example: description of a dataset in a prov/[<subdir>/]prov-<label>_ent.json file

{
    "Datasets": [
        {
            "Id": "bids:ds001734:.",
            "Label": "NARPS"
        }
    ]
}

For a complete example, see Provenance of fMRI preprocessing with fMRIPrep.

Environments

This section specifies how to describe software environments in which activities were performed.

Each file with an env suffix is a JSON file describing environments. It MUST include the following key:

Key name	Requirement Level	Data type	Description
Environments	REQUIRED	array of objects	Objects describing environments.

Each object in the Environments array includes the following keys:

Key name	Requirement Level	Data type	Description
Id	REQUIRED	string	Identifier for the environment. Corresponds to JSON-LD `@id`.
Label	REQUIRED	string	Name for the environment. Corresponds to RDF Schema `rdfs:label`.
AlternativeIdentifier	OPTIONAL	array of strings	URI(s) of (an) alternative identifier(s) for the environment.
EnvironmentVariables	OPTIONAL	object	Object containing environment variables as key-value pairs.
OperatingSystem	OPTIONAL	string	Name of the operating system for the environment. Including the version of the kernel and/or distribution is RECOMMENDED when applicable.
Dependencies	OPTIONAL	object	Object containing names of the software dependencies as keys and their versions as values.

Example: description of an environment (docker container) in a prov/[<subdir>/]prov-<label>_env.json file

{
    "Environments": [
        {
            "Id": "bids::prov#poldracklab/fmriprep-mHl7Dqa0",
            "Label": "poldracklab/fmriprep:1.1.4",
            "AlternativeIdentifier": [
                "https://hub.docker.com/layers/poldracklab/fmriprep/1.1.4"
            ]
        }
    ]
}

For a complete example, see Provenance of fMRI preprocessing with fMRIPrep.
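An Environments object for the machine on which an activity runs could be assembled from the standard library. The sketch below is an illustration only: the function name `describe_environment` is hypothetical, and it records only the environment variables that are explicitly requested, since a full dump may contain identifying or sensitive values.

```python
import os
import platform


def describe_environment(identifier, label, variables=("PATH",)):
    """Build one Environments object describing the current machine."""
    return {
        "Id": identifier,
        "Label": label,
        # Operating system name and kernel release, as reported by the platform module.
        "OperatingSystem": f"{platform.system()} {platform.release()}",
        # Only the requested variables that are actually set are recorded.
        "EnvironmentVariables": {
            name: os.environ[name] for name in variables if name in os.environ
        },
    }
```

The result can be placed in the `Environments` array of a `prov-<label>_env.json` file.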

Provenance label file

Template:

prov/
    provenance.tsv
    provenance.json

The purpose of this RECOMMENDED file is to describe properties of prov- entities used in the names of provenance files. It MUST contain the column provenance_id, which MUST consist of prov-<label> values identifying one row for each prov entity in the dataset, followed by an optional column containing a description for the entity. Each entity MUST be described by one and only one row.

The following columns are defined for this file:

Column name	Requirement Level	Data type	Description
provenance_id	REQUIRED	string	An identifier of the form `prov-<label>`, matching a `prov` entity found in the dataset. There MUST be exactly one row for each `prov-<label>` entity. Values in `provenance_id` MUST be unique. This column MUST appear first in the file.
description	OPTIONAL	string	Free-form text description of the provenance file(s). This column MAY appear anywhere in the file.
Additional Columns	OPTIONAL	`n/a`	Additional columns are allowed if they are defined in the associated metadata file.

Throughout BIDS you can indicate missing values with n/a (for "not available").

provenance.tsv example:

provenance_id	description
prov-preprocspm	Provenance of preprocessing performed with SPM.
prov-preprocfsl	Provenance of preprocessing performed with FSL.

Additional columns may be added to provenance.tsv but MUST be accompanied by a provenance.json sidecar file to describe the TSV column names and properties of their values as outlined in common principles for tabular files.
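The constraints on provenance.tsv (first column, value form, uniqueness) can be checked with a few lines of code. The following is a sketch under the assumption that the file content is already loaded as text; the name `check_provenance_tsv` is hypothetical.

```python
import csv
import io


def check_provenance_tsv(text):
    """Return a list of problems found in the content of a provenance.tsv file."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    if header[0] != "provenance_id":
        return ["provenance_id must be the first column"]
    problems = []
    ids = [row[0] for row in body]
    for value in ids:
        # A valid value is "prov-" followed by a non-empty label.
        if not value.startswith("prov-") or len(value) == len("prov-"):
            problems.append(f"{value!r} is not of the form prov-<label>")
    for value in sorted({v for v in ids if ids.count(v) > 1}):
        problems.append(f"{value!r} appears more than once")
    return problems
```

This sketch does not check that each row matches a prov entity actually used in the dataset; that would require listing the files under prov/.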

Provenance identifiers

Identifiers for JSON objects related to provenance must be IRIs. The following rules and conventions are provided in order to have consistent, human-readable, unique, and explicit IRIs as identifiers.

Identifiers for input and output data

The identifier for a BIDS file or a BIDS dataset MUST be a BIDS URI. The identifier for a no-longer-existing BIDS file or BIDS dataset SHOULD be a BIDS URI with a fragment part.

Warning

The use of BIDS URIs may require defining the DatasetLinks object in dataset_description.json.

Apart from BIDS files and BIDS datasets, identifiers for a prov:Entity (see Input and output data) in a BIDS dataset <dataset-name> MAY have the following form, where <label> is an arbitrary value for identifying the prov:Entity.

bids:[<dataset-name>]:prov#entity-<label>

Examples of identifiers for input and output data

BIDS files and datasets

bids:ds000011:sub-01/anat/sub-01_T1w.nii.gz - identifier for a T1w file for subject sub-01 in the ds000011 dataset;
bids::sub-014/func/sub-014_task-MGT_run-01_events.tsv - identifier for an events file for subject sub-014 in the current dataset;
bids:fmriprep:sub-001/func/sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.nii.gz - identifier for a bold file for subject sub-001 in the fmriprep dataset;
bids:ds001734:. - identifier for the ds001734 dataset.

Other prov:Entity

bids::prov#entity-28c0ba28 - identifier for a prov:Entity that is described in the current dataset.

Identifiers for other objects

The identifier for an activity, software, or environment described in a BIDS dataset <dataset-name> SHOULD have the following form, where <label> is a human-readable name for coherently identifying the object and <uid> is a unique group of characters.

bids:[<dataset-name>]:prov#<label>-<uid>

The <uid> part of this identifier MUST make identifiers unique, so that any two activities, software packages, or environments that differ in any of their attributes have distinct identifiers.

Examples of identifiers for activities, environments and software

bids::prov#conversion-00f3a18f - a conversion activity described inside the current dataset;
bids::prov#fedora-uldfv058 - a Fedora based environment described inside the current dataset;
bids::prov#fmriprep-awf6cvk6 - the fMRIPrep software described inside the current dataset.
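One way to obtain a <uid> that changes whenever an attribute changes is to derive it from a hash of the object's attributes. This is only a sketch of that idea: the specification does not prescribe how <uid> is generated, and the function name `make_identifier`, the use of SHA-256, and the 8-character truncation are all assumptions.

```python
import hashlib
import json


def make_identifier(label, attributes, dataset_name=""):
    """Build an identifier of the form bids:[<dataset-name>]:prov#<label>-<uid>."""
    # Serialize with sorted keys so that key order does not affect the result.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    uid = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return f"bids:{dataset_name}:prov#{label}-{uid}"
```

Truncating the hash keeps identifiers short but makes collisions possible in principle, so a tool relying on this approach may want to detect duplicates and lengthen the <uid> when needed.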

Provenance graph

Objects describing provenance as defined in this specification can be aggregated into JSON-LD files, which makes it possible to represent provenance as an RDF graph (see Resource Description Framework (RDF)).

Minimal provenance graph

flowchart BT
    B[Brain extraction] -->|wasAssociatedWith| S{FSL}
    B -->|used| T1([sub-001_T1w.nii])
    B -->|used| L((Linux))
    T1p([sub-001_space-orig_dseg.nii]) -->|wasGeneratedBy| B

In this example, a brain extraction algorithm was applied on a T1-weighted image:

sub-001_T1w.nii is the original T1-weighted image;
sub-001_space-orig_dseg.nii is the skull-stripped image;
the Brain extraction activity was performed using the FSL software within a Linux software environment.
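The same minimal graph can be written as a list of subject, predicate, object triples and queried directly. The sketch below uses plain Python data rather than an RDF library; the names `TRIPLES`, `objects` and `inputs_of` are hypothetical.

```python
# Each triple mirrors one arrow of the flowchart above.
TRIPLES = [
    ("Brain extraction", "wasAssociatedWith", "FSL"),
    ("Brain extraction", "used", "sub-001_T1w.nii"),
    ("Brain extraction", "used", "Linux"),
    ("sub-001_space-orig_dseg.nii", "wasGeneratedBy", "Brain extraction"),
]


def objects(subject, predicate):
    """Return every object linked to a subject by a given predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def inputs_of(output):
    """Follow wasGeneratedBy, then used, to list what an output depended on."""
    return [
        used
        for activity in objects(output, "wasGeneratedBy")
        for used in objects(activity, "used")
    ]
```

Calling `inputs_of("sub-001_space-orig_dseg.nii")` returns both the T1-weighted image and the Linux environment, since both are linked to the activity by `used`.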

The terms defined in this specification to describe provenance are based on the RDF, the RDF Schema, JSON-LD, and W3C PROV. The corresponding IRIs are described in the JSON-LD context file provenance-context.json provided with this specification.

Furthermore, this specification allows provenance to be described with terms from other vocabularies. This can be done using the Type fields for Activities, Files or prov:Entity.

All BIDS examples related to provenance (see bids-examples, provenance section) show the aggregated version of the provenance metadata they contain. This comes as a JSON-LD file and a visualization of the graph. The JSON-LD file consists of an aggregation of the Activities, Software, Files, Datasets, prov:Entity and Environments objects inside a Records object, as well as a reference to the provenance-context.json file as JSON-LD @context.
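The aggregation described above can be sketched as a merge of the arrays found in the individual provenance files. The exact layout of the published JSON-LD files is not reproduced here: the function name `aggregate`, the nesting of each array under its own key inside Records, and the plain relative path used for @context are assumptions.

```python
RECORD_KEYS = (
    "Activities", "Software", "Files", "Datasets", "prov:Entity", "Environments",
)


def aggregate(documents, context="provenance-context.json"):
    """Merge parsed provenance files into a single JSON-LD style document."""
    records = {}
    for document in documents:
        for key in RECORD_KEYS:
            records.setdefault(key, []).extend(document.get(key, []))
    # Drop the kinds of objects for which nothing was found.
    records = {key: value for key, value in records.items() if value}
    return {"@context": context, "Records": records}
```

Here `documents` is expected to be a list of dictionaries, each obtained by parsing one `_act`, `_soft`, `_ent` or `_env` JSON file.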

Minimal examples

Provenance of a BIDS raw dataset

Example

For a complete example, see Provenance of DICOM to NIfTI conversion with dcm2niix.

In this example, we explain provenance metadata of a DICOM to NIfTI conversion with dcm2niix. Consider the following BIDS raw dataset:

├─ prov/
│  ├─ prov-dcm2niix_act.json 
│  ├─ prov-dcm2niix_soft.json 
│  └─ ... 
├─ sourcedata/
│  └─ dicoms/
│     └─ ... 
├─ sub-001/
│  └─ anat/
│     ├─ sub-001_T1w.json 
│     └─ sub-001_T1w.nii.gz 
└─ ...

The prov/prov-dcm2niix_soft.json file describes dcm2niix, the software package used for the DICOM conversion. As per the Provenance identifiers section, the identifier for the associated software object SHOULD start with bids:<dataset>:prov# (bids:: refers to the current dataset).

{
    "Software": [
        {
            "Id": "bids::prov#dcm2niix-khhkm7u1",
            "Label": "dcm2niix"
        }
    ]
}

The prov/prov-dcm2niix_act.json file describes the conversion activity. Note that the identifier for the previously described software package is used here to describe that the software package was used to compute this activity.

{
    "Activities": [
        {
            "Id": "bids::prov#conversion-00f3a18f",
            "Label": "Conversion",
            "AssociatedWith": "bids::prov#dcm2niix-khhkm7u1"
        }
    ]
}

Inside the sub-001/anat/sub-001_T1w.json file, the metadata field GeneratedBy indicates that the sub-001/anat/sub-001_T1w.nii.gz file was generated by the previously described activity.

{
    "GeneratedBy": "bids::prov#conversion-00f3a18f"
}
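Taken together, the three JSON fragments of this example form a chain from a data file to the software that produced it. The sketch below follows that chain; the function names are hypothetical, and bare strings are accepted alongside arrays because the fragments above write single identifiers as strings.

```python
def index_by_id(objects):
    """Map each object's Id to the object itself."""
    return {obj["Id"]: obj for obj in objects}


def as_list(value):
    """Normalize a missing value, a single identifier or an array to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def software_for(sidecar, activities, software):
    """Return the labels of the software behind the file described by a sidecar."""
    acts = index_by_id(activities)
    softs = index_by_id(software)
    labels = []
    for activity_id in as_list(sidecar.get("GeneratedBy")):
        for software_id in as_list(acts[activity_id].get("AssociatedWith")):
            labels.append(softs[software_id]["Label"])
    return labels
```

With the contents of sub-001_T1w.json, prov-dcm2niix_act.json and prov-dcm2niix_soft.json shown above, the result is a list containing `dcm2niix`.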

Provenance of a BIDS derivative dataset

Example

For a complete example, see Provenance of fMRI preprocessing with SPM.

In this example, we explain provenance metadata of fMRI preprocessing steps performed with SPM. Consider the following BIDS derivative dataset:

├─ prov/
│  ├─ prov-spm_act.json 
│  ├─ prov-spm_ent.json 
│  └─ ... 
├─ sub-01/
│  ├─ anat/
│  │  ├─ c1sub-01_T1w.json 
│  │  ├─ c1sub-01_T1w.nii 
│  │  ├─ ... 
│  │  ├─ sub-01_T1w.json 
│  │  └─ sub-01_T1w.nii 
│  └─ func/
│     └─ ... 
└─ ...

The prov/prov-spm_act.json file describes the preprocessing steps (activities) as JSON objects. Among them:

the bids::prov#movefile-bac3f385 activity needed a T1w file from the ds000011 dataset identified by bids:ds000011:sub-01/anat/sub-01_T1w.nii.gz;
the bids::prov#segment-7d5d4ac5 brain segmentation activity needed the two files listed inside the Used array.

{
    "Activities": [
        {
            "Id": "bids::prov#movefile-bac3f385",
            "Label": "Move file",
            "Used": [
                "bids:ds000011:sub-01/anat/sub-01_T1w.nii.gz"
            ]
        },
        {
            "Id": "bids::prov#segment-7d5d4ac5",
            "Label": "Segment",
            "Used": [
                "bids::prov#entity-28c0ba28",
                "bids::sub-01/anat/sub-01_T1w.nii"
            ]
        }
    ]
}

bids::sub-01/anat/sub-01_T1w.nii is a BIDS file available in the current dataset. The spm12/tpm/TPM.nii file is not inside the dataset; hence its description is stored inside prov/prov-spm_ent.json and its identifier uses the prov#entity-<label> form instead of the path of a file in the dataset:

{
    "Files": [
        {
            "Id": "bids::prov#entity-28c0ba28",
            "Label": "TPM.nii",
            "AtLocation": "spm12/tpm/TPM.nii"
        }
    ]
}

Inside the sub-01/anat/c1sub-01_T1w.json file, the metadata field GeneratedBy indicates that the sub-01/anat/c1sub-01_T1w.nii file was generated by the previously described brain segmentation activity.

{
    "GeneratedBy": "bids::prov#segment-7d5d4ac5"
}

Provenance of a BIDS study dataset

Example

For a complete example, see Provenance of manual segmentations.

In this example, we explain provenance metadata of manual segmentations performed by two experts on the same T1w file. Consider the following BIDS study dataset:

├─ dataset_description.json 
├─ derivatives/
│  ├─ seg-brain/
│  │  ├─ dataset_description.json 
│  │  ├─ descriptions.tsv 
│  │  ├─ ... 
│  │  ├─ prov/
│  │  │  ├─ provenance.tsv 
│  │  │  ├─ prov-seg_act.json 
│  │  │  ├─ prov-seg_soft.json 
│  │  │  └─ prov-seg_ent.json 
│  │  └─ sub-001/
│  │     ├─ sub-001_space-orig_desc-exp1_dseg.json 
│  │     ├─ sub-001_space-orig_desc-exp1_dseg.nii.gz 
│  │     ├─ sub-001_space-orig_desc-exp2_dseg.json 
│  │     └─ sub-001_space-orig_desc-exp2_dseg.nii.gz 
│  └─ seg-lesions/
│     └─ ... 
├─ ... 
└─ sourcedata/
   └─ raw/
      ├─ dataset_description.json 
      ├─ prov/
      │  └─ prov-raw_ent.json 
      └─ sub-001/
         └─ anat/
            ├─ sub-001_T1w.json 
            └─ sub-001_T1w.nii.gz

Inside the dataset_description.json file of the seg-brain derivative dataset, the DatasetLinks metadata field defines an alias that is needed to refer to the raw dataset using BIDS URIs.

{
    "DatasetLinks": {
        "raw": "../../sourcedata/raw"
    }
}
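The alias defined in DatasetLinks is what allows a BIDS URI such as bids:raw:... to be turned into a path. The following sketch shows one way to resolve such URIs; it assumes that DatasetLinks values are relative POSIX paths, as in the example above, and the function name `resolve_bids_uri` is hypothetical.

```python
import posixpath


def resolve_bids_uri(uri, dataset_root, dataset_links):
    """Resolve bids:<dataset-name>:<path>[#fragment] to a normalized path."""
    scheme, name, rest = uri.split(":", 2)
    if scheme != "bids":
        raise ValueError(f"not a BIDS URI: {uri!r}")
    # The fragment identifies a provenance object, not a location on disk.
    path, _, _fragment = rest.partition("#")
    if name == "":
        base = dataset_root  # empty name: the current dataset
    else:
        base = posixpath.join(dataset_root, dataset_links[name])
    return posixpath.normpath(posixpath.join(base, path))
```

Given the seg-brain dataset as root and the `raw` alias above, bids:raw:sub-001/anat/sub-001_T1w.nii.gz resolves to the T1w file under sourcedata/raw.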

The prov/prov-seg_act.json file describes activities during which the experts generated segmentations.

{
    "Activities": [
        {
            "Id": "bids::prov#segmentation-nO5RGsrb",
            "Label": "Manual brain segmentation",
            "Command": null,
            "Used": [
                "bids:raw:sub-001/anat/sub-001_T1w.nii.gz"
            ]
        },
        {
            "Id": "bids::prov#segmentation-mOOypIYB",
            "Label": "Manual brain segmentation",
            "Command": null,
            "Used": [
                "bids:raw:sub-001/anat/sub-001_T1w.nii.gz"
            ]
        }
    ]
}

Note that a description of the sub-001/anat/sub-001_T1w.nii.gz file is needed because this data file is related to the activities. Here we rely on the sourcedata/raw dataset to provide a description of the data file.

Under the derivatives/seg-brain dataset, the sub-001_space-orig_desc-exp1_dseg.json file describes which activity generated the sub-001_space-orig_desc-exp1_dseg.nii.gz file.

{
    "GeneratedBy": "bids::prov#segmentation-nO5RGsrb"
}

The derivatives/seg-brain/prov/provenance.tsv gives a description of the prov-seg entity.

provenance_id	description
prov-seg	Manual brain segmentation performed by two experts

The descriptions.tsv file gives descriptions of the desc- entities used for data files.

desc_id	description
desc-exp1	Files generated by expert #1
desc-exp2	Files generated by expert #2