
Provenance

Support for provenance was developed as a BIDS Extension Proposal. Please see Citing BIDS on how to appropriately credit this extension when referring to it in the context of the academic literature.

Example datasets

Several example datasets have been formatted using this specification and can be used for practical guidance when curating a new dataset.

This part of the BIDS specification is aimed at describing the provenance of a BIDS dataset. This description is retrospective: it describes a set of steps that were executed in order to establish the dataset and is based on W3C PROV (see Provenance graph).

Provenance information SHOULD be included in a BIDS dataset when possible. If provenance information is included, it MUST be described using the conventions detailed hereafter. Provenance information may describe a full dataset, specific files at any level of the BIDS hierarchy, or both. Provenance information SHOULD NOT include human subject identifying data.

Note

Throughout this document, the terms Id and Label are used to provide identification for JSON objects related to provenance. Id is used to unambiguously identify those objects that may be referenced elsewhere, permitting automated tools to construct and query a graph. Label is a human-readable name for that object, which need not be unique, and should not be confused with the BIDS term label.

Provenance of a BIDS file

Provenance of a BIDS data file SHOULD be stored inside its sidecar JSON.

For that purpose, any sidecar JSON file MAY include the following keys:

Key name Requirement Level Data type Description
GeneratedBy OPTIONAL array of strings Identifier(s) of the activity/activities responsible for the creation of the file.
Related activities MUST be described as specified in the Activities section.
Corresponds to W3C PROV prov:wasGeneratedBy.
SidecarGeneratedBy OPTIONAL array of strings Identifier(s) of the activity/activities responsible for the creation of the sidecar JSON file.
Related activities MUST be described as specified in the Activities section.
Corresponds to W3C PROV prov:wasGeneratedBy.
Digest OPTIONAL object Object containing digests of the file. If a checksum function appears in the following list, its key MUST be the name given there: MD5; SHA1; SHA-224; SHA-256; SHA-384; SHA-512; SHA3-224; SHA3-256; SHA3-384; SHA3-512; BLAKE2B-256; BLAKE3-256; SHAKE128; SHAKE256. Otherwise, the key MAY be an arbitrary label. The corresponding value is the checksum as computed by the function identified by the key.
Type OPTIONAL array of strings Term(s) from (a) controlled vocabulary/vocabularies that more specifically describe the file.
Corresponds to W3C PROV prov:type.

Example of metadata in a sidecar JSON file

{
    "GeneratedBy": "bids::prov#conversion-00f3a18f",
    "SidecarGeneratedBy": [
        "bids::prov#preparation-conversion-1xkhm1ft",
        "bids::prov#conversion-00f3a18f"
    ],
    "Digest": {
        "SHA-256": "66eeafb465559148e0222d4079558a8354eb09b9efabcc47cd5b8af6eed51907"
    }
}
For a complete example, see Provenance of DICOM to NIfTI conversion with heudiconv.
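The Digest and GeneratedBy sidecar keys can be filled in programmatically. The following Python sketch shows one way to do this, assuming the data file is readable locally. The function names are hypothetical, and only a subset of the listed checksum functions is mapped to Python's hashlib. The sketch emits GeneratedBy as an array, which matches the data type given in the table.

```python
import hashlib
from pathlib import Path

# Assumed mapping from a subset of the checksum names listed above to
# Python's hashlib constructors; BLAKE3 and the SHAKE functions are omitted.
HASHERS = {
    "MD5": hashlib.md5,
    "SHA1": hashlib.sha1,
    "SHA-256": hashlib.sha256,
    "SHA-512": hashlib.sha512,
    "SHA3-256": hashlib.sha3_256,
    "BLAKE2B-256": lambda: hashlib.blake2b(digest_size=32),
}


def compute_digest(path: Path, algorithms=("SHA-256",)) -> dict:
    """Return a Digest object mapping checksum names to hex digests."""
    hashers = {name: HASHERS[name]() for name in algorithms}
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large images need not fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def provenance_sidecar_fields(data_file: Path, activity_ids: list) -> dict:
    """Build the GeneratedBy and Digest keys for the sidecar of data_file."""
    return {
        "GeneratedBy": list(activity_ids),
        "Digest": compute_digest(data_file),
    }
```

The resulting dictionary can be merged into the other metadata of the sidecar before that sidecar is written with json.dump.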

Provenance of a BIDS dataset

Provenance of a BIDS dataset (raw, derivative, or study) SHOULD be stored inside its dataset_description.json file. The dataset_description.json file of a BIDS raw dataset or BIDS study dataset MAY include the GeneratedBy key to describe provenance. The dataset_description.json file of a BIDS derivative dataset MUST include the GeneratedBy key to describe provenance.

The GeneratedBy field MAY contain either of the following values:

Description using identifiers

This section details how to describe provenance of a dataset using identifiers. The following field is intended for use in dataset_description.json to provide provenance information that applies to the entire dataset.

Key name Requirement Level Data type Description
GeneratedBy RECOMMENDED for BIDS raw datasets and BIDS study datasets, REQUIRED for BIDS derivative datasets array of strings Identifier(s) of the activity/activities responsible for the creation of the dataset.
Related activities MUST be described as specified in the Activities section.
Corresponds to W3C PROV prov:wasGeneratedBy.

Example of GeneratedBy contents in a dataset_description.json

{
    "GeneratedBy": "bids::prov#preprocessing-xMpFqB5q"
}
For a complete example see Provenance of fMRI preprocessing with fMRIPrep.

Description of processes or pipelines

This section details how to describe the provenance of a dataset using an array of objects representing pipelines or processes that generated the dataset.

Warning

The same information can equivalently be represented with identifiers, as described in the previous section. This modeling is kept for backward compatibility but might be removed in future BIDS releases (see BIDS 2.0).

Key name Requirement Level Data type Description
GeneratedBy RECOMMENDED for BIDS raw datasets and BIDS study datasets, REQUIRED for BIDS derivative datasets array of objects Used to specify provenance of the dataset.

Each object in the GeneratedBy array includes the following REQUIRED, RECOMMENDED and OPTIONAL keys:

Key name Requirement Level Data type Description
Name REQUIRED string Name of the pipeline or process that generated the dataset. Use "Manual" to indicate the derivatives were generated by hand, or adjusted manually after an initial run of an automated pipeline.
Version RECOMMENDED string Version of the pipeline or process that generated the dataset.
Description RECOMMENDED if Name is "Manual", OPTIONAL otherwise string Plain-text description of the pipeline or process that generated the dataset. RECOMMENDED if Name is "Manual".
CodeURL OPTIONAL string URL where the code used to generate the dataset may be found.
Container OPTIONAL object Used to specify the location and relevant attributes of the software container image used to produce the dataset. Valid keys in this object include Type, Tag and URI, with string values.

Example of GeneratedBy contents in a dataset_description.json

{
    "GeneratedBy": [
        {
            "Name": "reproin",
            "Version": "0.6.0",
            "Container": {
                "Type": "docker",
                "Tag": "repronim/reproin:0.6.0"
            }
        }
    ]
}
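The two accepted forms of GeneratedBy in dataset_description.json can be distinguished mechanically. The following Python sketch is a hypothetical helper, not part of any BIDS validator. It assumes the dataset type is passed in explicitly. It flags a missing GeneratedBy in derivative datasets, a missing REQUIRED Name in pipeline objects, and a missing Description when Name is "Manual". Its check that identifiers start with "bids:" is only a heuristic based on the SHOULD-level convention in Provenance identifiers. Treating a mix of both forms as a problem is also an assumption.

```python
def check_generated_by(description: dict, dataset_type: str) -> list:
    """Return problems with GeneratedBy in a parsed dataset_description.json.

    dataset_type is assumed to be one of "raw", "derivative" or "study".
    """
    problems = []
    value = description.get("GeneratedBy")
    if value is None:
        if dataset_type == "derivative":
            problems.append("GeneratedBy is REQUIRED for derivative datasets")
        return problems
    entries = value if isinstance(value, list) else [value]
    if all(isinstance(e, str) for e in entries):
        # Identifier form: each entry should point at a described activity.
        for e in entries:
            if not e.startswith("bids:"):
                problems.append(f"identifier {e!r} does not use the bids: form")
    elif all(isinstance(e, dict) for e in entries):
        # Legacy pipeline-object form.
        for e in entries:
            if "Name" not in e:
                problems.append("pipeline object lacks REQUIRED Name")
            elif e["Name"] == "Manual" and "Description" not in e:
                problems.append("Description is RECOMMENDED when Name is 'Manual'")
    else:
        problems.append("GeneratedBy mixes identifiers and pipeline objects")
    return problems
```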

Provenance files

Any provenance information that cannot be stored in either sidecar JSON files (see Provenance of a BIDS file) or dataset_description.json (see Provenance of a BIDS dataset) MUST be stored in provenance files under the /prov/ directory.

Template:

Legend:
  • For more information about filename elements (for example, entities, suffixes, extensions), follow the links embedded in the filename template.

  • <matches> is a placeholder to denote an arbitrary (and valid) sequence of entities and labels at the beginning of the filename (only BIDS "raw").

  • <source-entities> is a placeholder to denote an arbitrary sequence of entities and labels at the beginning of the filename matching a source file from which the file derives (only BIDS-Derivatives).

  • Filename entities or directories between square brackets (for example, [_ses-<label>]) are OPTIONAL.

  • Some entities may only allow specific values, in which case those values are listed in <>, separated by |.

  • _<suffix> means that there are several (>6) valid suffixes for this filename pattern.

  • .<extension> means that there are several (>6) valid extensions for this file type.

  • [.gz] means that both the unzipped and gzipped versions of the extension are valid.

Note

The prov entity allows related provenance files to be grouped, using an arbitrary value for <label>. A subdirectory MAY be used to group provenance files sharing the same prov entity.

The following suffixes specify the contents of provenance files.

Name suffix Description
Description of activities act A JSON file containing objects describing activities in the context of provenance. (See the Activities section).
Description of input and output data ent A JSON file containing objects describing input and output data in the context of provenance. (See the Input and output data section).
Description of environments env A JSON file containing objects describing environments in the context of provenance. (See the Environments section).
Description of software soft A JSON file containing objects describing software in the context of provenance. (See the Software section).

Example of organization for provenance files

prov/
├─ prov-preprocspm/
│  ├─ prov-preprocspm_act.json
│  └─ prov-preprocspm_ent.json
├─ prov-preprocfsl_act.json
├─ prov-preprocfsl_ent.json
├─ prov-preprocfsl_env.json
├─ prov-preprocfsl_soft.json
└─ ...
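Provenance filenames such as those in the tree above can be recognized with a regular expression. The following Python sketch assumes alphanumeric labels, as for other BIDS entity labels. It also assumes that files sit either directly under prov/ or in a single grouping subdirectory. The pattern and the function name are illustrative, not taken from the BIDS schema.

```python
import re
from pathlib import PurePosixPath

# Suffixes defined in the table above.
PROV_SUFFIXES = {"act", "ent", "env", "soft"}

# Assumed pattern: prov-<label>_<suffix>.json with an alphanumeric label.
PROV_FILE = re.compile(r"^prov-(?P<label>[a-zA-Z0-9]+)_(?P<suffix>[a-z]+)\.json$")


def parse_prov_file(path: str):
    """Return (label, suffix) for a path relative to the dataset root, else None."""
    parts = PurePosixPath(path).parts
    # Files live directly under prov/ or in one grouping subdirectory.
    if len(parts) not in (2, 3) or parts[0] != "prov":
        return None
    match = PROV_FILE.match(parts[-1])
    if match is None or match["suffix"] not in PROV_SUFFIXES:
        return None
    return match["label"], match["suffix"]
```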

Activities

Activities are transformations that have been applied to data.

Each file with an act suffix is a JSON file describing activities. It MUST include the following key:

Key name Requirement Level Data type Description
Activities REQUIRED array of objects Objects describing activities.

Each object in the Activities array includes the following keys:

Key name Requirement Level Data type Description
Id REQUIRED string Identifier for the activity.
Corresponds to JSON-LD @id.
Label REQUIRED string Name for the activity.
Corresponds to RDF Schema rdfs:label.
Command REQUIRED string or null Command (or commands) performed by the activity, including all parameters.
Set to null to describe that the activity was performed manually.
Description OPTIONAL string Plain-text extended description of the activity.
RECOMMENDED if Command is set to null.
Corresponds to RDF Schema rdfs:comment.
AssociatedWith OPTIONAL array of strings Identifier(s) of the software package(s) used to compute the activity.
Related software MUST be described as specified in the Software section.
Corresponds to W3C PROV prov:wasAssociatedWith.
Used OPTIONAL array of strings Identifier(s) of the input and output data or environment(s) used by the activity.
Related input and output data MUST be described as specified in the Input and output data section.
Related environment(s) MUST be described as specified in the Environments section.
Corresponds to W3C PROV prov:used.
Type OPTIONAL array of strings Term(s) from (a) controlled vocabulary/vocabularies that more specifically describe the activity.
Corresponds to W3C PROV prov:type.
StartedAtTime OPTIONAL string Timestamp tracking when the activity started.
Corresponds to W3C PROV prov:startedAtTime.
EndedAtTime OPTIONAL string Timestamp tracking when the activity ended.
Corresponds to W3C PROV prov:endedAtTime.

Example: description of an activity in a prov/[<subdir>/]prov-<label>_act.json file

{
    "Activities": [
        {
            "Id": "bids::prov#conversion-00f3a18f",
            "Label": "Dicom to NIfTI conversion",
            "Command": "dcm2niix -o . -f sub-%i/anat/sub-%i_T1w sourcedata/dicoms",
            "AssociatedWith": "bids::prov#dcm2niix-khhkm7u1",
            "Used": [
                "bids::prov#fedora-uldfv058",
                "bids::sourcedata/dicoms"
            ],
            "StartedAtTime": "2025-03-13T10:26:00",
            "EndedAtTime": "2025-03-13T10:26:05"
        }
    ]
}
For a complete example, see Provenance of DICOM to NIfTI conversion with dcm2niix.
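A tool that records activities as it runs could build these objects as follows. The following Python sketch, with a hypothetical make_activity helper, always emits the REQUIRED Command key. Passing None produces JSON null, which marks a manual activity. The sketch warns when such an activity lacks the RECOMMENDED Description. It assumes that timestamps are written in ISO 8601 form, as in the example above.

```python
import warnings
from datetime import datetime
from typing import Optional


def make_activity(act_id: str, label: str, command: Optional[str],
                  started: Optional[datetime] = None,
                  ended: Optional[datetime] = None, **optional) -> dict:
    """Build one entry of the Activities array; command=None means manual."""
    activity = {"Id": act_id, "Label": label, "Command": command}
    if command is None and "Description" not in optional:
        warnings.warn(f"{act_id} is manual; a Description is RECOMMENDED")
    if started is not None:
        activity["StartedAtTime"] = started.isoformat()
    if ended is not None:
        if started is not None and ended < started:
            raise ValueError("EndedAtTime precedes StartedAtTime")
        activity["EndedAtTime"] = ended.isoformat()
    # Remaining keys (AssociatedWith, Used, Type, ...) are copied as given.
    activity.update(optional)
    return activity
```

For example, make_activity("bids::prov#conversion-00f3a18f", "Dicom to NIfTI conversion", "dcm2niix ...", started=datetime(2025, 3, 13, 10, 26, 0)) yields "StartedAtTime": "2025-03-13T10:26:00". The resulting objects can be collected into {"Activities": [...]} and serialized with json.dumps.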

Software

This section specifies how to describe software packages that computed the activities.

Each file with a soft suffix is a JSON file describing software. It MUST include the following key:

Key name Requirement Level Data type Description
Software REQUIRED array of objects Objects describing software.

Each object in the Software array includes the following keys:

Key name Requirement Level Data type Description
Id REQUIRED string Identifier for the software package.
Corresponds to JSON-LD @id.
Label REQUIRED string Name of the software package.
Corresponds to RDF Schema rdfs:label.
Version REQUIRED string Version of the software package.
AlternativeIdentifier OPTIONAL array of strings URI(s) of (an) alternative identifier(s) (such as RRID) for the software package.
ActedOnBehalfOf OPTIONAL array of strings Identifier(s) of other software package(s) that triggered the use of the software package.
Example: if software A launches software B to perform activity C, then B ActedOnBehalfOf A.
Related software MUST be described as specified in the Software section.
Corresponds to W3C PROV prov:actedOnBehalfOf.

Example: description of a software package in a prov/[<subdir>/]prov-<label>_soft.json file

{
    "Software": [
        {
            "Id": "bids::prov#dcm2niix-khhkm7u1",
            "AlternativeIdentifier": ["RRID:SCR_023517"],
            "Label": "dcm2niix",
            "Version": "v1.0.20220720"
        }
    ]
}
For a complete example, see Provenance of DICOM to NIfTI conversion with dcm2niix.
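Since Version is REQUIRED and ActedOnBehalfOf must reference software that is itself described, both can be checked once all Software objects of a dataset have been collected. The following Python sketch assumes that the objects from every _soft.json file have been merged into one list beforehand. The function is hypothetical.

```python
REQUIRED_SOFTWARE_KEYS = ("Id", "Label", "Version")


def check_software(records: list) -> list:
    """Check merged Software objects for REQUIRED keys and dangling references."""
    problems = []
    known = {s.get("Id") for s in records}
    for s in records:
        sid = s.get("Id", "<no Id>")
        for key in REQUIRED_SOFTWARE_KEYS:
            if key not in s:
                problems.append(f"{sid}: missing {key}")
        parents = s.get("ActedOnBehalfOf", [])
        if isinstance(parents, str):
            parents = [parents]
        # If software A launched software B, B's ActedOnBehalfOf names A,
        # so A must be described too.
        for parent in parents:
            if parent not in known:
                problems.append(f"{sid}: ActedOnBehalfOf {parent} is not described")
    return problems
```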

Input and output data

This section specifies how to describe input and output data for activities. This data corresponds to the W3C PROV prov:Entity class that includes files, datasets and other types of data.

Each file with an ent suffix is a JSON file describing input and output data.

Note

The ent suffix stands for prov:Entity.

Warning

These files SHOULD NOT describe files that are available in the dataset. See Provenance of a BIDS file for this purpose.

These files SHOULD NOT describe the current dataset. See Provenance of a BIDS dataset for this purpose.

Each file MUST include one or more of the following keys:

Key name Requirement Level Data type Description
Files OPTIONAL, but REQUIRED if prov:Entity and Datasets fields are absent array of objects Objects describing files.
Datasets OPTIONAL, but REQUIRED if Files and prov:Entity fields are absent array of objects Objects describing datasets.
prov:Entity OPTIONAL, but REQUIRED if Files and Datasets fields are absent array of objects Objects describing prov:Entity objects other than files or datasets.

Each object in the Files array includes the following keys:

Key name Requirement Level Data type Description
Id REQUIRED string Identifier for the file.
Corresponds to JSON-LD @id.
Label REQUIRED string Name for the file.
Corresponds to RDF Schema rdfs:label.
Digest RECOMMENDED object Object containing digests of the file. If a checksum function appears in the following list, its key MUST be the name given there: MD5; SHA1; SHA-224; SHA-256; SHA-384; SHA-512; SHA3-224; SHA3-256; SHA3-384; SHA3-512; BLAKE2B-256; BLAKE3-256; SHAKE128; SHAKE256. Otherwise, the key MAY be an arbitrary label. The corresponding value is the checksum as computed by the function identified by the key.
AtLocation OPTIONAL string Relative path to the file on disk.
Corresponds to W3C PROV prov:atLocation.
GeneratedBy OPTIONAL array of strings Identifier(s) of the activity/activities responsible for the creation of the file.
Related activities MUST be described as specified in the Activities section.
Corresponds to W3C PROV prov:wasGeneratedBy.
Type OPTIONAL array of strings Term(s) from (a) controlled vocabulary/vocabularies that more specifically describe the file.
Corresponds to W3C PROV prov:type.

Each object in the Datasets array includes the following keys:

Key name Requirement Level Data type Description
Id REQUIRED string Identifier for the dataset.
Corresponds to JSON-LD @id.
Label REQUIRED string Name for the dataset.
Corresponds to RDF Schema rdfs:label.
GeneratedBy OPTIONAL array of strings Identifier(s) of the activity/activities responsible for the creation of the dataset.
Related activities MUST be described as specified in the Activities section.
Corresponds to W3C PROV prov:wasGeneratedBy.

Each object in the prov:Entity array includes the following keys:

Key name Requirement Level Data type Description
Id REQUIRED string Identifier for the prov:Entity.
Corresponds to JSON-LD @id.
Label REQUIRED string Name for the prov:Entity.
Corresponds to RDF Schema rdfs:label.
Digest RECOMMENDED object Object containing digests of the prov:Entity. If a checksum function appears in the following list, its key MUST be the name given there: MD5; SHA1; SHA-224; SHA-256; SHA-384; SHA-512; SHA3-224; SHA3-256; SHA3-384; SHA3-512; BLAKE2B-256; BLAKE3-256; SHAKE128; SHAKE256. Otherwise, the key MAY be an arbitrary label. The corresponding value is the checksum as computed by the function identified by the key.
GeneratedBy OPTIONAL array of strings Identifier(s) of the activity/activities responsible for the creation of the prov:Entity.
Related activities MUST be described as specified in the Activities section.
Corresponds to W3C PROV prov:wasGeneratedBy.
Type OPTIONAL array of strings Term(s) from (a) controlled vocabulary/vocabularies that more specifically describe the prov:Entity.
Corresponds to W3C PROV prov:type.

Example: description of a file in a prov/[<subdir>/]prov-<label>_ent.json file

{
    "Files": [
        {
            "Id": "bids::sub-01/anat/sub-01_T1w.nii#97a89211",
            "Label": "sub-01_T1w.nii",
            "AtLocation": "sub-01/anat/sub-01_T1w.nii",
            "GeneratedBy": "bids::prov#gunzip-e9264918",
            "Digest": {
                "SHA-256": "45485541db5734f565b7cac3e009f8b02907245fc6db435c700e84d1037773b5"
            }
        }
    ]
}
For a complete example, see Provenance of fMRI preprocessing with SPM.

Example: description of a dataset in a prov/[<subdir>/]prov-<label>_ent.json file

{
    "Datasets": [
        {
            "Id": "bids:ds001734:.",
            "Label": "NARPS"
        }
    ]
}
For a complete example, see Provenance of fMRI preprocessing with fMRIPrep.
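The rule that an ent file must contain at least one of Files, Datasets or prov:Entity, and that every object must carry Id and Label, can be checked on the parsed JSON content. The following Python sketch is hypothetical and checks only these REQUIRED-level rules. It ignores RECOMMENDED keys such as Digest.

```python
ENTITY_KEYS = ("Files", "Datasets", "prov:Entity")


def check_ent_file(content: dict) -> list:
    """Check the REQUIRED structure of a parsed prov-<label>_ent.json file."""
    present = [key for key in ENTITY_KEYS if key in content]
    if not present:
        return ["an ent file needs at least one of Files, Datasets, prov:Entity"]
    problems = []
    for key in present:
        for obj in content[key]:
            for required in ("Id", "Label"):
                if required not in obj:
                    problems.append(f"{key} object missing {required}")
    return problems
```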

Environments

This section specifies how to describe software environments in which activities were performed.

Each file with an env suffix is a JSON file describing environments. It MUST include the following key:

Key name Requirement Level Data type Description
Environments REQUIRED array of objects Objects describing environments.

Each object in the Environments array includes the following keys:

Key name Requirement Level Data type Description
Id REQUIRED string Identifier for the environment.
Corresponds to JSON-LD @id.
Label REQUIRED string Name for the environment.
Corresponds to RDF Schema rdfs:label.
AlternativeIdentifier OPTIONAL array of strings URI(s) of (an) alternative identifier(s) for the environment.
EnvironmentVariables OPTIONAL object Object containing environment variables as key-value pairs.
OperatingSystem OPTIONAL string Name of the operating system for the environment. Including the version of the kernel and/or distribution is RECOMMENDED when applicable.
Dependencies OPTIONAL object Object containing names of the software dependencies as keys and their versions as values.

Example: description of an environment (docker container) in a prov/[<subdir>/]prov-<label>_env.json file

{
    "Environments": [
        {
            "Id": "bids::prov#poldracklab/fmriprep-mHl7Dqa0",
            "Label": "poldracklab/fmriprep:1.1.4",
            "AlternativeIdentifier": [
                "https://hub.docker.com/layers/poldracklab/fmriprep/1.1.4"
            ]
        }
    ]
}
For a complete example, see Provenance of fMRI preprocessing with fMRIPrep.
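When processing runs in Python, part of an Environments entry can be captured from the running interpreter. The following Python sketch uses the standard platform and importlib.metadata modules. The Id, the Label wording and the choice of packages and environment variables are assumptions left to the caller. Packages that are not installed are skipped rather than recorded.

```python
import os
import platform
from importlib import metadata


def describe_current_environment(env_id: str, packages=(), env_vars=()) -> dict:
    """Build an Environments entry describing the running Python interpreter."""
    dependencies = {}
    for name in packages:
        try:
            dependencies[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # skip packages that are not installed
    return {
        "Id": env_id,
        "Label": f"Python {platform.python_version()} on {platform.system()}",
        # platform.platform() includes kernel/distribution details when available.
        "OperatingSystem": platform.platform(),
        "EnvironmentVariables": {k: os.environ[k] for k in env_vars if k in os.environ},
        "Dependencies": dependencies,
    }
```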

Provenance label file

Template:

prov/
    provenance.tsv
    provenance.json

The purpose of this RECOMMENDED file is to describe properties of prov- entities used in the names of provenance files. It MUST contain the column provenance_id, which MUST consist of prov-<label> values identifying one row for each prov entity in the dataset, followed by an optional column containing a description for the entity. Each entity MUST be described by one and only one row.

We RECOMMEND making use of these columns and, if you do use them, we RECOMMEND using the following values for them:

Column name Requirement Level Data type Description
provenance_id REQUIRED string An identifier of the form prov-<label>, matching a prov entity found in the dataset. There MUST be exactly one row for each prov-<label> entity.

Values in provenance_id MUST be unique.

This column must appear first in the file.
description OPTIONAL string Free-form text description of the provenance file(s).

This column may appear anywhere in the file.
Additional Columns OPTIONAL n/a Additional columns are allowed if they are defined in the associated metadata file.

Throughout BIDS you can indicate missing values with n/a (for "not available").

provenance.tsv example:

provenance_id	description
prov-preprocspm	Provenance of preprocessing performed with SPM.
prov-preprocfsl	Provenance of preprocessing performed with FSL.

Additional columns may be added to provenance.tsv but MUST be accompanied by a provenance.json sidecar file describing the TSV column names and the properties of their values, as outlined in common principles for tabular files.
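The constraints on provenance.tsv can be checked against the prov-<label> values actually used in provenance filenames. These constraints are: provenance_id comes first, its values are unique, and there is one row per prov entity. The following Python sketch uses the standard csv module. The function name, and the assumption that the set of labels has already been gathered from filenames, are illustrative.

```python
import csv
import io


def check_provenance_tsv(tsv_text: str, prov_labels: set) -> list:
    """Check provenance.tsv against the prov-<label> values found in filenames."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows or not rows[0] or rows[0][0] != "provenance_id":
        return ["provenance_id must be the first column"]
    problems = []
    ids = [row[0] for row in rows[1:] if row]
    if len(ids) != len(set(ids)):
        problems.append("provenance_id values must be unique")
    expected = {f"prov-{label}" for label in prov_labels}
    for missing in sorted(expected - set(ids)):
        problems.append(f"no row for {missing}")
    for extra in sorted(set(ids) - expected):
        problems.append(f"{extra} does not match any provenance file")
    return problems
```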

Provenance identifiers

Identifiers for JSON objects related to provenance must be IRIs. The following rules and conventions are provided in order to have consistent, human-readable, unique, and explicit IRIs as identifiers.

Identifiers for input and output data

The identifier for a BIDS file or a BIDS dataset MUST be a BIDS URI. The identifier for a no-longer-existing BIDS file or BIDS dataset SHOULD be a BIDS URI with a fragment part.

Warning

The use of BIDS URIs may require defining the DatasetLinks object in dataset_description.json.

Apart from BIDS files and BIDS datasets, identifiers for a prov:Entity (see Input and output data) in a BIDS dataset <dataset-name> MAY have the following form, where <label> is an arbitrary value for identifying the prov:Entity.

bids:[<dataset-name>]:prov#entity-<label>

Examples of identifiers for input and output data

BIDS files and datasets

  • bids:ds000011:sub-01/anat/sub-01_T1w.nii.gz - identifier for a T1w file for subject sub-01 in the ds000011 dataset;
  • bids::sub-014/func/sub-014_task-MGT_run-01_events.tsv - identifier for an events file for subject sub-014 in the current dataset;
  • bids:fmriprep:sub-001/func/sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.nii.gz - identifier for a bold file for subject sub-001 in the fmriprep dataset;
  • bids:ds001734:. - identifier for the ds001734 dataset.

Other prov:Entity

  • bids::prov#entity-28c0ba28 - identifier for a prov:Entity that is described in the current dataset.
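The identifiers above all share the shape bids:[<dataset-name>]:<path>[#<fragment>]. An empty dataset name refers to the current dataset. The following Python sketch splits such identifiers into their parts and recognizes the prov:Entity form. The BidsUri type and the function names are hypothetical.

```python
from typing import NamedTuple, Optional


class BidsUri(NamedTuple):
    dataset: str               # "" means the current dataset
    path: str                  # relative path ("." for the dataset itself)
    fragment: Optional[str]    # text after "#", if any


def parse_bids_uri(uri: str) -> BidsUri:
    """Split bids:[<dataset-name>]:<path>[#<fragment>] into its parts."""
    if not uri.startswith("bids:"):
        raise ValueError(f"not a BIDS URI: {uri}")
    dataset, sep, rest = uri[len("bids:"):].partition(":")
    if not sep:
        raise ValueError(f"missing second colon: {uri}")
    path, hash_sign, fragment = rest.partition("#")
    return BidsUri(dataset, path, fragment if hash_sign else None)


def is_prov_entity_id(uri: str) -> bool:
    """True for identifiers of the form bids:[<dataset-name>]:prov#entity-<label>."""
    parsed = parse_bids_uri(uri)
    return parsed.path == "prov" and (parsed.fragment or "").startswith("entity-")
```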

Identifiers for other objects

The identifier for an activity, software, or environment described in a BIDS dataset <dataset-name> SHOULD have the following form, where <label> is a human-readable name that coherently identifies the object and <uid> is a unique string of characters.

bids:[<dataset-name>]:prov#<label>-<uid>

The <uid> part of this identifier MUST be chosen so that activities, software, or environments that differ in any of their attributes receive distinct identifiers.
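One way to satisfy this rule is to derive <uid> from the object's attributes, so that identical objects share an identifier and objects that differ in any attribute do not. The following Python sketch assumes this content-hash approach. It is not necessarily how the uids in the examples of this section were produced. It takes the first 8 hexadecimal characters of a SHA-256 over a canonical JSON serialization, so distinctness holds only up to the small chance of a truncated-hash collision. The make_prov_id name is hypothetical.

```python
import hashlib
import json


def make_prov_id(label: str, attributes: dict, dataset: str = "") -> str:
    """Build bids:[<dataset-name>]:prov#<label>-<uid> from an object's attributes."""
    # sort_keys makes the serialization independent of key order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    uid = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return f"bids:{dataset}:prov#{label}-{uid}"
```

A random identifier, such as part of a UUID, would also meet the rule, but the tool would then need to remember which objects it has already described. The hash-based variant makes that bookkeeping unnecessary.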

Examples of identifiers for activities, environments and software

  • bids::prov#conversion-00f3a18f - a conversion activity described inside the current dataset;
  • bids::prov#fedora-uldfv058 - a Fedora based environment described inside the current dataset;
  • bids::prov#fmriprep-awf6cvk6 - the fMRIPrep software described inside the current dataset.

Provenance graph

Objects describing provenance as defined in this specification can be aggregated into JSON-LD files, which allows provenance to be represented as an RDF graph (see Resource Description Framework (RDF)).

Minimal provenance graph

flowchart BT
    B[Brain extraction] -->|wasAssociatedWith| S{FSL<br>}
    B -->|used| T1([sub-001_T1w.nii])
    B -->|used| L((Linux))
    T1p([sub-001_space-orig_dseg.nii]) -->|wasGeneratedBy| B

In this example, a brain extraction algorithm was applied on a T1-weighted image:

  • sub-001_T1w.nii is the original T1-weighted image;
  • sub-001_space-orig_dseg.nii is the skull-stripped image;
  • the Brain extraction activity was performed using the FSL software within a Linux software environment.
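The edges of a graph like the one above can be read directly from the provenance objects. The following Python sketch turns Activities objects and GeneratedBy values into (subject, predicate, object) triples. It accepts both single strings and arrays, since the examples in this section use both. The identifiers used with it in practice would come from the dataset, and the function name is hypothetical.

```python
def provenance_edges(activities: list, generated_by: dict) -> list:
    """List (subject, predicate, object) edges as in the flowchart above.

    generated_by maps a file or entity identifier to its GeneratedBy value(s).
    """
    def as_list(value):
        return [value] if isinstance(value, str) else list(value or [])

    edges = []
    for act in activities:
        for software in as_list(act.get("AssociatedWith")):
            edges.append((act["Id"], "wasAssociatedWith", software))
        for used in as_list(act.get("Used")):
            edges.append((act["Id"], "used", used))
    for entity, value in generated_by.items():
        for act_id in as_list(value):
            edges.append((entity, "wasGeneratedBy", act_id))
    return edges
```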

The terms defined in this specification to describe provenance are based on the RDF, the RDF Schema, JSON-LD, and W3C PROV. The corresponding IRIs are described in the JSON-LD context file provenance-context.json provided with this specification.

Furthermore, this specification allows provenance to be described with terms from other vocabularies. This can be done using the Type fields for Activities, Files or prov:Entity.

All BIDS examples related to provenance (see bids-examples, provenance section) show the aggregated version of the provenance metadata they contain, as a JSON-LD file and a visualization of the graph. The JSON-LD file aggregates the Activities, Software, Files, Datasets, prov:Entity and Environments objects inside a Records object, and references the provenance-context.json file as its JSON-LD @context.
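A partial aggregation of this kind can be sketched by merging the arrays from every provenance file under prov/. The following Python sketch covers only the prov/ files. It ignores provenance stored in sidecar JSON files and dataset_description.json, which a fuller aggregation would also need to include. The value used for @context is a placeholder assumption; the actual location of provenance-context.json should be substituted.

```python
import json
from pathlib import Path

RECORD_KEYS = ("Activities", "Software", "Files", "Datasets", "prov:Entity", "Environments")


def aggregate_provenance(prov_dir: Path, context: str = "provenance-context.json") -> dict:
    """Merge every prov/**/prov-*.json file into a single JSON-LD document."""
    records = {key: [] for key in RECORD_KEYS}
    # The pattern skips provenance.json, whose name lacks the "prov-" prefix.
    for path in sorted(prov_dir.rglob("prov-*.json")):
        content = json.loads(path.read_text(encoding="utf-8"))
        for key in RECORD_KEYS:
            records[key].extend(content.get(key, []))
    return {
        "@context": context,
        "Records": {key: objs for key, objs in records.items() if objs},
    }
```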

Minimal examples

Provenance of a BIDS raw dataset

Example

For a complete example, see Provenance of DICOM to NIfTI conversion with dcm2niix.

In this example, we explain provenance metadata of a DICOM to NIfTI conversion with dcm2niix. Consider the following BIDS raw dataset:

├─ prov/
│  ├─ prov-dcm2niix_act.json 
│  ├─ prov-dcm2niix_soft.json 
│  └─ ... 
├─ sourcedata/
│  └─ dicoms/
│     └─ ... 
├─ sub-001/
│  └─ anat/
│     ├─ sub-001_T1w.json 
│     └─ sub-001_T1w.nii.gz 
└─ ... 

The prov/prov-dcm2niix_soft.json file describes dcm2niix, the software package used for the DICOM conversion. As per the Provenance identifiers section, the identifier for the associated software object SHOULD start with bids:<dataset>:prov# (bids:: refers to the current dataset).

{
    "Software": [
        {
            "Id": "bids::prov#dcm2niix-khhkm7u1",
            "Label": "dcm2niix",
            "Version": "v1.0.20220720"
        }
    ]
}

The prov/prov-dcm2niix_act.json file describes the conversion activity. Note that the identifier for the previously described software package is used here to describe that the software package was used to compute this activity.

{
    "Activities": [
        {
            "Id": "bids::prov#conversion-00f3a18f",
            "Label": "Conversion",
            "Command": "dcm2niix -o . -f sub-%i/anat/sub-%i_T1w sourcedata/dicoms",
            "AssociatedWith": "bids::prov#dcm2niix-khhkm7u1"
        }
    ]
}

Inside the sub-001/anat/sub-001_T1w.json file, the metadata field GeneratedBy indicates that the sub-001/anat/sub-001_T1w.nii.gz file was generated by the previously described activity.

{
    "GeneratedBy": "bids::prov#conversion-00f3a18f"
}
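The chain in this example runs from the sidecar GeneratedBy to the activity, and from the activity's AssociatedWith to the software. It holds only if every referenced identifier is described somewhere. The following Python sketch lists references that are not described. It assumes the Activities and Software objects have already been loaded from prov/ and the sidecars parsed into a mapping from path to content. The function name is hypothetical.

```python
def dangling_references(activities: list, software: list, sidecars: dict) -> list:
    """Return (referrer, key, identifier) triples whose identifier is not described."""
    def as_list(value):
        # The examples use bare strings; the tables specify arrays. Accept both.
        return [value] if isinstance(value, str) else list(value or [])

    activity_ids = {a["Id"] for a in activities}
    software_ids = {s["Id"] for s in software}
    missing = []
    for path, meta in sidecars.items():
        for key in ("GeneratedBy", "SidecarGeneratedBy"):
            for ref in as_list(meta.get(key)):
                if ref not in activity_ids:
                    missing.append((path, key, ref))
    for act in activities:
        for ref in as_list(act.get("AssociatedWith")):
            if ref not in software_ids:
                missing.append((act["Id"], "AssociatedWith", ref))
    return missing
```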

Provenance of a BIDS derivative dataset

Example

For a complete example, see Provenance of fMRI preprocessing with SPM.

In this example, we explain provenance metadata of fMRI preprocessing steps performed with SPM. Consider the following BIDS derivative dataset:

├─ prov/
│  ├─ prov-spm_act.json 
│  ├─ prov-spm_ent.json 
│  └─ ... 
├─ sub-01/
│  ├─ anat/
│  │  ├─ c1sub-01_T1w.json
│  │  ├─ c1sub-01_T1w.nii
│  │  ├─ ...
│  │  ├─ sub-01_T1w.json
│  │  └─ sub-01_T1w.nii
│  └─ func/
│     └─ ... 
└─ ... 

The prov/prov-spm_act.json file describes the preprocessing steps (activities) as JSON objects. Among them:

  • the bids::prov#movefile-bac3f385 activity needed a T1w file from the ds000011 dataset identified by bids:ds000011:sub-01/anat/sub-01_T1w.nii.gz;

  • the bids::prov#segment-7d5d4ac5 brain segmentation activity needed the two files listed inside the Used array.

{
    "Activities": [
        {
            "Id": "bids::prov#movefile-bac3f385",
            "Label": "Move file",
            "Used": [
                "bids:ds000011:sub-01/anat/sub-01_T1w.nii.gz"
            ]
        },
        {
            "Id": "bids::prov#segment-7d5d4ac5",
            "Label": "Segment",
            "Used": [
                "bids::prov#entity-28c0ba28",
                "bids::sub-01/anat/sub-01_T1w.nii"
            ]
        }
    ]
}

bids::sub-01/anat/sub-01_T1w.nii is a BIDS file available in the current dataset. The spm12/tpm/TPM.nii file is not inside the dataset; hence its description is stored in prov/prov-spm_ent.json, and its identifier is a prov:Entity identifier rather than a BIDS URI pointing to a file:

{
    "Files": [
        {
            "Id": "bids::prov#entity-28c0ba28",
            "Label": "TPM.nii",
            "AtLocation": "spm12/tpm/TPM.nii"
        }
    ]
}

Inside the sub-01/anat/c1sub-01_T1w.json file, the metadata field GeneratedBy indicates that the sub-01/anat/c1sub-01_T1w.nii file was generated by the previously described brain segmentation activity.

{
    "GeneratedBy": "bids::prov#segment-7d5d4ac5"
}
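The reasoning used above can be written as a small classifier that decides where the description of a Used identifier is expected. That reasoning says a file in the current dataset is described by its sidecar, while anything else needs a description elsewhere. The following Python sketch is a hypothetical helper. It follows the identifier conventions of the Provenance identifiers section, and the returned strings are illustrative only.

```python
def where_described(identifier: str) -> str:
    """Say where the description of a Used identifier is expected to live."""
    if not identifier.startswith("bids:"):
        return "unknown (not a bids: identifier)"
    dataset, _, rest = identifier[len("bids:"):].partition(":")
    path, has_fragment, _ = rest.partition("#")
    if path == "prov" and has_fragment:
        # bids:[<dataset-name>]:prov#... objects live in provenance files.
        return "a provenance file under prov/"
    if has_fragment:
        # A BIDS URI with a fragment denotes a file that no longer exists.
        return "a Files entry in a prov/*_ent.json file"
    if dataset:
        return f"the {dataset} dataset"
    return "the file's sidecar JSON in the current dataset"
```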

Provenance of a BIDS study dataset

Example

For a complete example, see Provenance of manual segmentations.

In this example, we explain provenance metadata of manual segmentations performed by two experts on the same T1w file. Consider the following BIDS study dataset:

├─ dataset_description.json 
├─ derivatives/
│  ├─ seg-brain/
│  │  ├─ dataset_description.json 
│  │  ├─ descriptions.tsv 
│  │  ├─ ... 
│  │  ├─ prov/
│  │  │  ├─ provenance.tsv 
│  │  │  ├─ prov-seg_act.json 
│  │  │  ├─ prov-seg_soft.json 
│  │  │  └─ prov-seg_ent.json 
│  │  └─ sub-001/
│  │     ├─ sub-001_space-orig_desc-exp1_dseg.json 
│  │     ├─ sub-001_space-orig_desc-exp1_dseg.nii.gz 
│  │     ├─ sub-001_space-orig_desc-exp2_dseg.json 
│  │     └─ sub-001_space-orig_desc-exp2_dseg.nii.gz 
│  └─ seg-lesions/
│     └─ ... 
├─ ... 
└─ sourcedata/
   └─ raw/
      ├─ dataset_description.json 
      ├─ prov/
      │  └─ prov-raw_ent.json 
      └─ sub-001/
         └─ anat/
            ├─ sub-001_T1w.json
            └─ sub-001_T1w.nii.gz

Inside the dataset_description.json file of the seg-brain derivative dataset, the DatasetLinks metadata field defines an alias that is needed to refer to the raw dataset using BIDS URIs.

{
    "DatasetLinks": {
        "raw": "../../sourcedata/raw"
    }
}
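With this alias in place, a BIDS URI such as bids:raw:sub-001/anat/sub-001_T1w.nii.gz can be mapped to a location on disk. The following Python sketch assumes that DatasetLinks values are local paths, interpreted relative to the root of the dataset that defines them. DatasetLinks values can also be remote URIs, which this sketch does not handle. The function name and the example root path are hypothetical.

```python
import posixpath


def resolve_bids_uri(uri: str, dataset_root: str, dataset_links: dict) -> str:
    """Map a BIDS URI to a POSIX path using the DatasetLinks aliases."""
    if not uri.startswith("bids:"):
        raise ValueError(f"not a BIDS URI: {uri}")
    alias, sep, rest = uri[len("bids:"):].partition(":")
    if not sep:
        raise ValueError(f"malformed BIDS URI: {uri}")
    relpath = rest.split("#", 1)[0]  # drop any fragment
    if alias == "":
        base = dataset_root  # the current dataset
    elif alias in dataset_links:
        # Relative links are resolved against the current dataset root.
        base = posixpath.join(dataset_root, dataset_links[alias])
    else:
        raise KeyError(f"alias {alias!r} is not defined in DatasetLinks")
    return posixpath.normpath(posixpath.join(base, relpath))
```

For example, with dataset_root "/study/derivatives/seg-brain" and the DatasetLinks object above, the URI of the raw T1w file resolves to "/study/sourcedata/raw/sub-001/anat/sub-001_T1w.nii.gz".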

The prov/prov-seg_act.json file describes activities during which the experts generated segmentations.

{
    "Activities": [
        {
            "Id": "bids::prov#segmentation-nO5RGsrb",
            "Label": "Manual brain segmentation",
            "Command": null,
            "Used": [
                "bids:raw:sub-001/anat/sub-001_T1w.nii.gz"
            ]
        },
        {
            "Id": "bids::prov#segmentation-mOOypIYB",
            "Label": "Manual brain segmentation",
            "Command": null,
            "Used": [
                "bids:raw:sub-001/anat/sub-001_T1w.nii.gz"
            ]
        }
    ]
}

Note that a description of the sub-001/anat/sub-001_T1w.nii.gz file is needed because this data file is related to the activities. Here we rely on the sourcedata/raw dataset to provide a description of the data file.

Under the derivatives/seg-brain dataset, the sub-001_space-orig_desc-exp1_dseg.json file describes which activity generated the sub-001_space-orig_desc-exp1_dseg.nii.gz file.

{
    "GeneratedBy": "bids::prov#segmentation-nO5RGsrb"
}

The derivatives/seg-brain/prov/provenance.tsv gives a description of the prov-seg entity.

provenance_id	description
prov-seg	Manual brain segmentation performed by two experts

The descriptions.tsv file gives descriptions of the desc- entities used in data filenames.

desc_id	description
desc-exp1	Files generated by expert #1
desc-exp2	Files generated by expert #2