Provenance
Support for provenance was developed as a BIDS Extension Proposal. Please see Citing BIDS on how to appropriately credit this extension when referring to it in the context of the academic literature.
Example datasets
Several example datasets have been formatted using this specification and can be used for practical guidance when curating a new dataset.
This part of the BIDS specification describes the provenance of a BIDS dataset. The description is retrospective: it records the steps that were executed to produce the dataset, and it is based on W3C PROV (see Provenance graph).
Provenance information SHOULD be included in a BIDS dataset when possible. If provenance information is included, it MUST be described using the conventions detailed hereafter. Provenance information can describe a full dataset and/or specific files at any level of the BIDS hierarchy. Provenance information SHOULD NOT include human subject identifying data.
Note
Throughout this document, the terms Id and Label are used to provide identification
for JSON objects related to provenance.
Id is used to unambiguously identify those objects
that may be referenced elsewhere,
permitting automated tools to construct and query a graph.
Label is a human-readable name for that object, which need not be unique,
and should not be confused with the BIDS term
label.
Provenance of a BIDS file
Provenance of a BIDS data file SHOULD be stored inside its sidecar JSON.
For that purpose, any sidecar JSON file MAY include the following keys:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| GeneratedBy | OPTIONAL | array of strings | Identifier(s) of the activity/activities responsible for the creation of the file. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV prov:wasGeneratedBy. |
| SidecarGeneratedBy | OPTIONAL | array of strings | Identifier(s) of the activity/activities responsible for the creation of the sidecar JSON file. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV prov:wasGeneratedBy. |
| Digest | OPTIONAL | object | Object containing digests of the file. If the checksum function is one of the following, the key MUST be its name as listed: MD5; SHA1; SHA-224; SHA-256; SHA-384; SHA-512; SHA3-224; SHA3-256; SHA3-384; SHA3-512; BLAKE2B-256; BLAKE3-256; SHAKE128; SHAKE256. Otherwise, the key MAY be an arbitrary label. The corresponding value is the checksum as computed by the function identified by the key. |
| Type | OPTIONAL | array of strings | Term(s) from (a) controlled vocabulary/vocabularies that more specifically describe the file. Corresponds to W3C PROV prov:type. |
Example of metadata in a sidecar JSON file
{
"GeneratedBy": "bids::prov#conversion-00f3a18f",
"SidecarGeneratedBy": [
"bids::prov#preparation-conversion-1xkhm1ft",
"bids::prov#conversion-00f3a18f"
],
"Digest": {
"SHA-256": "66eeafb465559148e0222d4079558a8354eb09b9efabcc47cd5b8af6eed51907"
}
}
heudiconv.
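As a minimal sketch, a Digest object can be computed with Python's standard hashlib module. The mapping from the key names listed above to hashlib constructors, the lowercase hexadecimal encoding (matching the example), and the use of BLAKE2b with a 32-byte output for BLAKE2B-256 are assumptions of this sketch. BLAKE3-256 is left out because it is not in the standard library, and SHAKE128/SHAKE256 are left out because their output length is not fixed by the key name.

```python
import hashlib

# Assumed mapping from BIDS Digest key names to hashlib constructors.
# BLAKE3-256 (not in the standard library) and SHAKE128/SHAKE256
# (variable-length output) are intentionally omitted from this sketch.
DIGEST_FUNCTIONS = {
    "MD5": hashlib.md5,
    "SHA1": hashlib.sha1,
    "SHA-224": hashlib.sha224,
    "SHA-256": hashlib.sha256,
    "SHA-384": hashlib.sha384,
    "SHA-512": hashlib.sha512,
    "SHA3-224": hashlib.sha3_224,
    "SHA3-256": hashlib.sha3_256,
    "SHA3-384": hashlib.sha3_384,
    "SHA3-512": hashlib.sha3_512,
    "BLAKE2B-256": lambda: hashlib.blake2b(digest_size=32),
}


def compute_digest(path, algorithms=("SHA-256",), chunk_size=1 << 20):
    """Return a Digest object such as {"SHA-256": "<hex>"} for the file at path."""
    hashers = {name: DIGEST_FUNCTIONS[name]() for name in algorithms}
    with open(path, "rb") as fh:
        # Read in chunks so that large imaging files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

For example, `compute_digest("sub-01/anat/sub-01_T1w.nii.gz", ["SHA-256"])` returns an object that can be stored directly under the Digest key of the sidecar.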
Provenance of a BIDS dataset
Provenance of a BIDS dataset (raw, derivative, or study) SHOULD be stored
inside its dataset_description.json file.
The dataset_description.json file of a BIDS raw dataset or BIDS study dataset SHOULD
include the GeneratedBy key to describe provenance.
The dataset_description.json file of a BIDS derivative dataset MUST
include the GeneratedBy key to describe provenance.
The GeneratedBy field MAY contain either of the following values:
- Identifier(s) of the activity/activities responsible for the creation of the dataset (see Description using identifiers).
- A description of pipelines or processes responsible for the creation of the dataset (see Description of processes or pipelines).
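A tool reading dataset_description.json has to tell these two forms apart. The following is a minimal sketch, assuming the value has already been parsed from JSON; because some examples in this specification use a single string rather than an array, a lone value is wrapped in a list before classification. The function name is hypothetical.

```python
def generated_by_form(value):
    """Classify a GeneratedBy value as "identifiers" or "pipelines"."""
    # A lone string or object is wrapped in a list, since some examples use a single string.
    items = [value] if isinstance(value, (str, dict)) else list(value)
    if items and all(isinstance(item, str) for item in items):
        return "identifiers"
    if items and all(isinstance(item, dict) for item in items):
        return "pipelines"
    raise ValueError("GeneratedBy must contain only identifiers or only objects")
```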
Description using identifiers
This section details how to describe provenance of a dataset using identifiers.
The following field is intended for use in dataset_description.json to provide
provenance information that applies to the entire dataset.
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| GeneratedBy | RECOMMENDED for BIDS raw datasets and BIDS study datasets, REQUIRED for BIDS derivative datasets | array of strings | Identifier(s) of the activity/activities responsible for the creation of the dataset. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV prov:wasGeneratedBy. |
Example of GeneratedBy contents in a dataset_description.json
{
"GeneratedBy": "bids::prov#preprocessing-xMpFqB5q"
}
fMRIPrep.
Description of processes or pipelines
This section details how to describe the provenance of a dataset using an array of objects representing pipelines or processes that generated the dataset.
Warning
This information can equivalently be represented with identifiers, as described in the previous section. This model is kept for backward compatibility but may be removed in future BIDS releases (see BIDS 2.0).
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| GeneratedBy | RECOMMENDED for BIDS raw datasets and BIDS study datasets, REQUIRED for BIDS derivative datasets | array of objects | Used to specify provenance of the dataset. |
Each object in the GeneratedBy array includes the following REQUIRED, RECOMMENDED
and OPTIONAL keys:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Name | REQUIRED | string | Name of the pipeline or process that generated the dataset. Use "Manual" to indicate the derivatives were generated by hand, or adjusted manually after an initial run of an automated pipeline. |
| Version | RECOMMENDED | string | Version of the pipeline or process that generated the dataset. |
| Description | RECOMMENDED if Name is "Manual", OPTIONAL otherwise | string | Plain-text description of the pipeline or process that generated the dataset. RECOMMENDED if Name is "Manual". |
| CodeURL | OPTIONAL | string | URL where the code used to generate the dataset may be found. |
| Container | OPTIONAL | object | Used to specify the location and relevant attributes of the software container image used to produce the dataset. Valid keys in this object include Type, Tag and URI, with string values. |
Example of GeneratedBy contents in a dataset_description.json
{
"GeneratedBy": [
{
"Name": "reproin",
"Version": "0.6.0",
"Container": {
"Type": "docker",
"Tag": "repronim/reproin:0.6.0"
}
}
]
}
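The requirement levels in the table above can be checked mechanically. This is only a sketch: the function name and the split into errors (REQUIRED rules) and warnings (RECOMMENDED rules) are illustrative choices, not part of the specification. Container keys are not restricted, since the table only says valid keys "include" Type, Tag and URI.

```python
def check_pipeline_entry(entry):
    """Return (errors, warnings) for one object of a GeneratedBy array."""
    errors, warnings = [], []
    name = entry.get("Name")
    if not isinstance(name, str):
        errors.append("Name is REQUIRED and must be a string")
    if "Version" not in entry:
        warnings.append("Version is RECOMMENDED")
    if name == "Manual" and "Description" not in entry:
        warnings.append('Description is RECOMMENDED when Name is "Manual"')
    # Container values (Type, Tag, URI, ...) are expected to be strings.
    for key, value in entry.get("Container", {}).items():
        if not isinstance(value, str):
            errors.append(f"Container.{key} must be a string")
    return errors, warnings
```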
Provenance files
Any provenance information that cannot be stored in either sidecar JSON files
(see Provenance of a BIDS file) or in dataset_description.json
(see Provenance of a BIDS dataset) MUST be stored in
provenance files under the /prov/ directory.
Template:
Legend:
- For more information about filename elements (for example, entities, suffixes, extensions), follow the links embedded in the filename template.
- <matches> is a placeholder to denote an arbitrary (and valid) sequence of entities and labels at the beginning of the filename (only BIDS "raw").
- <source-entities> is a placeholder to denote an arbitrary sequence of entities and labels at the beginning of the filename matching a source file from which the file derives (only BIDS-Derivatives).
- Filename entities or directories between square brackets (for example, [_ses-<label>]) are OPTIONAL.
- Some entities may only allow specific values, in which case those values are listed in <>, separated by |.
- _<suffix> means that there are several (>6) valid suffixes for this filename pattern.
- .<extension> means that there are several (>6) valid extensions for this file type.
- [.gz] means that both the unzipped and gzipped versions of the extension are valid.
Note
The prov entity allows related provenance files to be grouped,
using an arbitrary value for <label>.
A subdirectory MAY be used to group provenance files sharing the same prov entity.
The following suffixes specify the contents of provenance files.
| Name | Suffix | Description |
|---|---|---|
| Description of activities | act | A JSON file containing objects describing activities in the context of provenance. (See the Activities section). |
| Description of input and output data | ent | A JSON file containing objects describing input and output data in the context of provenance. (See the Input and output data section). |
| Description of environments | env | A JSON file containing objects describing environments in the context of provenance. (See the Environments section). |
| Description of software | soft | A JSON file containing objects describing software in the context of provenance. (See the Software section). |
Example of organization for provenance files
prov/
├─ prov-preprocspm/
│ ├─ prov-preprocspm_act.json
│ └─ prov-preprocspm_ent.json
├─ prov-preprocfsl_act.json
├─ prov-preprocfsl_ent.json
├─ prov-preprocfsl_env.json
├─ prov-preprocfsl_soft.json
└─ ...
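A provenance filename combines the prov entity, one of the four suffixes above, and the .json extension, optionally inside a subdirectory. The sketch below assumes the pattern prov/[<subdir>/]prov-<label>_<suffix>.json used in the examples of this section and an alphanumeric <label>, as for other BIDS labels; the helper names are hypothetical.

```python
import re
from pathlib import PurePosixPath

SUFFIXES = ("act", "ent", "env", "soft")
# prov-<label>_<suffix>.json, with an alphanumeric label as for other BIDS labels.
PROV_NAME = re.compile(r"^prov-(?P<label>[A-Za-z0-9]+)_(?P<suffix>act|ent|env|soft)\.json$")


def provenance_path(label, suffix, subdir=None):
    """Build a path such as prov/prov-preprocfsl_act.json, optionally in a subdirectory."""
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown provenance suffix: {suffix}")
    parts = ["prov"] + ([subdir] if subdir else []) + [f"prov-{label}_{suffix}.json"]
    return str(PurePosixPath(*parts))


def parse_provenance_name(path):
    """Return (label, suffix) for a provenance file path, or None if it does not match."""
    match = PROV_NAME.match(PurePosixPath(path).name)
    return (match["label"], match["suffix"]) if match else None
```

In the example layout, the subdirectory is named after the entity it groups (prov-preprocspm/), which is why the test below passes that name as subdir.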
Activities
Activities are transformations that have been applied to data.
Each file with an act suffix is a JSON file describing activities.
It MUST include the following key:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Activities | REQUIRED | array of objects | Objects describing activities. |
Each object in the Activities array includes the following keys:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Id | REQUIRED | string | Identifier for the activity. Corresponds to JSON-LD @id. |
| Label | REQUIRED | string | Name for the activity. Corresponds to RDF Schema rdfs:label. |
| Command | REQUIRED | string or null | Command (or commands) performed by the activity, including all parameters. Set to null to describe that the activity was performed manually. |
| Description | OPTIONAL | string | Plain-text extended description of the activity. RECOMMENDED if Command is set to null. Corresponds to RDF Schema rdfs:comment. |
| AssociatedWith | OPTIONAL | array of strings | Identifier(s) of the software package(s) used to compute the activity. Related software MUST be described as specified in the Software section. Corresponds to W3C PROV prov:wasAssociatedWith. |
| Used | OPTIONAL | array of strings | Identifier(s) of the input and output data or environment(s) used by the activity. Related input and output data MUST be described as specified in the Input and output data section. Related environment(s) MUST be described as specified in the Environments section. Corresponds to W3C PROV prov:used. |
| Type | OPTIONAL | array of strings | Term(s) from (a) controlled vocabulary/vocabularies that more specifically describe the activity. Corresponds to W3C PROV prov:type. |
| StartedAtTime | OPTIONAL | string | Timestamp tracking when the activity started. Corresponds to W3C PROV prov:startedAtTime. |
| EndedAtTime | OPTIONAL | string | Timestamp tracking when the activity ended. Corresponds to W3C PROV prov:endedAtTime. |
Example: description of an activity in a prov/[<subdir>/]prov-<label>_act.json file
{
"Activities": [
{
"Id": "bids::prov#conversion-00f3a18f",
"Label": "Dicom to NIfTI conversion",
"Command": "dcm2niix -o . -f sub-%i/anat/sub-%i_T1w sourcedata/dicoms",
"AssociatedWith": "bids::prov#dcm2niix-khhkm7u1",
"Used": [
"bids::prov#fedora-uldfv058",
"bids::sourcedata/dicoms"
],
"StartedAtTime": "2025-03-13T10:26:00",
"EndedAtTime": "2025-03-13T10:26:05"
}
]
}
dcm2niix.
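A tool that runs a command can record the corresponding activity object as it goes. The following is a sketch under assumptions: the helper name and the use of subprocess are illustrative, the timestamps use ISO 8601 without a time zone (matching the example above), and the identifier is supplied by the caller (see Provenance identifiers).

```python
import shlex
import subprocess
from datetime import datetime


def run_activity(activity_id, label, argv, associated_with=(), used=()):
    """Run argv and return an object for the Activities array of an act file.

    Raises subprocess.CalledProcessError if the command fails.
    """
    started = datetime.now().isoformat(timespec="seconds")
    subprocess.run(argv, check=True)
    ended = datetime.now().isoformat(timespec="seconds")
    activity = {
        "Id": activity_id,
        "Label": label,
        # Command is REQUIRED: record the full command line, parameters included.
        "Command": shlex.join(argv),
        "StartedAtTime": started,
        "EndedAtTime": ended,
    }
    # OPTIONAL keys are only written when there is something to record.
    if associated_with:
        activity["AssociatedWith"] = list(associated_with)
    if used:
        activity["Used"] = list(used)
    return activity
```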
Software
This section specifies how to describe software packages that computed the activities.
Each file with a soft suffix is a JSON file describing software.
It MUST include the following key:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Software | REQUIRED | array of objects | Objects describing software. |
Each object in the Software array includes the following keys:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Id | REQUIRED | string | Identifier for the software package. Corresponds to JSON-LD @id. |
| Label | REQUIRED | string | Name of the software package. Corresponds to RDF Schema rdfs:label. |
| Version | REQUIRED | string | Version of the software package. |
| AlternativeIdentifier | OPTIONAL | array of strings | URI(s) of (an) alternative identifier(s) (such as RRID) for the software package. |
| ActedOnBehalfOf | OPTIONAL | array of strings | Identifier(s) of other software package(s) that triggered the use of the software package. Example: if software A launches software B to perform activity C, then B ActedOnBehalfOf A. Related software MUST be described as specified in the Software section. Corresponds to W3C PROV prov:actedOnBehalfOf. |
Example: description of a software package in a prov/[<subdir>/]prov-<label>_soft.json file
{
"Software": [
{
"Id": "bids::prov#dcm2niix-khhkm7u1",
"AlternativeIdentifier": ["RRID:SCR_023517"],
"Label": "dcm2niix",
"Version": "v1.0.20220720"
}
]
}
dcm2niix
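Because ActedOnBehalfOf links software packages to one another, a consumer may want to confirm that every referenced identifier is described in a Software array. A minimal sketch, assuming the Software arrays of all soft files have already been loaded and concatenated; identifiers pointing to software described in another dataset are reported as unresolved, which may or may not be an error in practice. The identifiers in the test are hypothetical.

```python
def unresolved_software_refs(software):
    """List (Id, reference) pairs whose ActedOnBehalfOf target is not in software."""
    known = {entry["Id"] for entry in software}
    missing = []
    for entry in software:
        refs = entry.get("ActedOnBehalfOf", [])
        if isinstance(refs, str):  # tolerate a single identifier
            refs = [refs]
        missing.extend((entry["Id"], ref) for ref in refs if ref not in known)
    return missing
```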
Input and output data
This section specifies how to describe input and output data for activities. This data corresponds to the W3C PROV prov:Entity class that includes files, datasets and other types of data.
Each file with an ent suffix is a JSON file describing input and output data.
Note
The ent suffix stands for prov:Entity.
Warning
These files SHOULD NOT describe files that are available in the dataset. See Provenance of a BIDS file for this purpose.
These files SHOULD NOT describe the current dataset. See Provenance of a BIDS dataset for this purpose.
Each file MUST include one or more of the following keys:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Files | OPTIONAL, but REQUIRED if prov:Entity and Datasets fields are absent | array of objects | Objects describing files. |
| Datasets | OPTIONAL, but REQUIRED if Files and prov:Entity fields are absent | array of objects | Objects describing datasets. |
| prov:Entity | OPTIONAL, but REQUIRED if Files and Datasets fields are absent | array of objects | Objects describing prov:Entity objects other than files or datasets. |
Each object in the Files array includes the following keys:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Id | REQUIRED | string | Identifier for the file. Corresponds to JSON-LD @id. |
| Label | REQUIRED | string | Name for the file. Corresponds to RDF Schema rdfs:label. |
| Digest | RECOMMENDED | object | Object containing digests of the file. If the checksum function is one of the following, the key MUST be its name as listed: MD5; SHA1; SHA-224; SHA-256; SHA-384; SHA-512; SHA3-224; SHA3-256; SHA3-384; SHA3-512; BLAKE2B-256; BLAKE3-256; SHAKE128; SHAKE256. Otherwise, the key MAY be an arbitrary label. The corresponding value is the checksum as computed by the function identified by the key. |
| AtLocation | OPTIONAL | string | Relative path to the file on disk. Corresponds to W3C PROV prov:atLocation. |
| GeneratedBy | OPTIONAL | array of strings | Identifier(s) of the activity/activities responsible for the creation of the file. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV prov:wasGeneratedBy. |
| Type | OPTIONAL | array of strings | Term(s) from (a) controlled vocabulary/vocabularies that more specifically describe the file. Corresponds to W3C PROV prov:type. |
Each object in the Datasets array includes the following keys:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Id | REQUIRED | string | Identifier for the dataset. Corresponds to JSON-LD @id. |
| Label | REQUIRED | string | Name for the dataset. Corresponds to RDF Schema rdfs:label. |
| GeneratedBy | OPTIONAL | array of strings | Identifier(s) of the activity/activities responsible for the creation of the dataset. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV prov:wasGeneratedBy. |
Each object in the prov:Entity array includes the following keys:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Id | REQUIRED | string | Identifier for the prov:Entity. Corresponds to JSON-LD @id. |
| Label | REQUIRED | string | Name for the prov:Entity. Corresponds to RDF Schema rdfs:label. |
| Digest | RECOMMENDED | object | Object containing digests of the prov:Entity. If the checksum function is one of the following, the key MUST be its name as listed: MD5; SHA1; SHA-224; SHA-256; SHA-384; SHA-512; SHA3-224; SHA3-256; SHA3-384; SHA3-512; BLAKE2B-256; BLAKE3-256; SHAKE128; SHAKE256. Otherwise, the key MAY be an arbitrary label. The corresponding value is the checksum as computed by the function identified by the key. |
| GeneratedBy | OPTIONAL | array of strings | Identifier(s) of the activity/activities responsible for the creation of the prov:Entity. Related activities MUST be described as specified in the Activities section. Corresponds to W3C PROV prov:wasGeneratedBy. |
| Type | OPTIONAL | array of strings | Term(s) from (a) controlled vocabulary/vocabularies that more specifically describe the prov:Entity. Corresponds to W3C PROV prov:type. |
Example: description of a file in a prov/[<subdir>/]prov-<label>_ent.json file
{
"Files": [
{
"Id": "bids::sub-01/anat/sub-01_T1w.nii#97a89211",
"Label": "sub-01_T1w.nii",
"AtLocation": "sub-01/anat/sub-01_T1w.nii",
"GeneratedBy": "bids::prov#gunzip-e9264918",
"Digest": {
"SHA-256": "45485541db5734f565b7cac3e009f8b02907245fc6db435c700e84d1037773b5"
}
}
]
}
SPM
Example: description of a dataset in a prov/[<subdir>/]prov-<label>_ent.json file
{
"Datasets": [
{
"Id": "bids:ds001734:.",
"Label": "NARPS"
}
]
}
fMRIPrep.
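The rule that each ent file carries at least one of the three arrays, together with the REQUIRED Id and Label fields of their objects, translates directly into code. A minimal sketch, assuming the file content has been parsed with json.load; other keys and value types are not checked, and the function name is hypothetical.

```python
ENTITY_ARRAYS = ("Files", "Datasets", "prov:Entity")


def check_ent_file(content):
    """Return a list of problems found in a parsed *_ent.json file."""
    present = [key for key in ENTITY_ARRAYS if key in content]
    if not present:
        return ["at least one of Files, Datasets or prov:Entity is REQUIRED"]
    problems = []
    for key in present:
        for index, obj in enumerate(content[key]):
            # Id and Label are REQUIRED for every object in all three arrays.
            for required in ("Id", "Label"):
                if required not in obj:
                    problems.append(f"{key}[{index}] is missing {required}")
    return problems
```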
Environments
This section specifies how to describe software environments in which activities were performed.
Each file with an env suffix is a JSON file describing environments.
It MUST include the following key:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Environments | REQUIRED | array of objects | Objects describing environments. |
Each object in the Environments array includes the following keys:
| Key name | Requirement Level | Data type | Description |
|---|---|---|---|
| Id | REQUIRED | string | Identifier for the environment. Corresponds to JSON-LD @id. |
| Label | REQUIRED | string | Name for the environment. Corresponds to RDF Schema rdfs:label. |
| AlternativeIdentifier | OPTIONAL | array of strings | URI(s) of (an) alternative identifier(s) for the environment. |
| EnvironmentVariables | OPTIONAL | object | Object containing environment variables as key-value pairs. |
| OperatingSystem | OPTIONAL | string | Name of the operating system for the environment. Including the version of the kernel and/or distribution is RECOMMENDED when applicable. |
| Dependencies | OPTIONAL | object | Object containing names of the software dependencies as keys and their versions as values. |
Example: description of an environment (docker container) in a prov/[<subdir>/]prov-<label>_env.json file
{
"Environments": [
{
"Id": "bids::prov#poldracklab/fmriprep-mHl7Dqa0",
"Label": "poldracklab/fmriprep:1.1.4",
"AlternativeIdentifier": [
"https://hub.docker.com/layers/poldracklab/fmriprep/1.1.4"
]
}
]
}
fMRIPrep.
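Some environment attributes can be collected automatically at run time. The sketch below uses Python's platform and importlib.metadata modules; which environment variables and Python packages to record is left to the caller, and the Id is assumed to be supplied separately (see Provenance identifiers). Only Python distributions are looked up, so non-Python dependencies would need another source.

```python
import os
import platform
from importlib import metadata


def describe_environment(env_id, label, variables=(), packages=()):
    """Build an object for the Environments array of an env file."""
    environment = {
        "Id": env_id,
        "Label": label,
        # Kernel name and release, since including the version is RECOMMENDED.
        "OperatingSystem": f"{platform.system()} {platform.release()}",
    }
    if variables:
        # Record only explicitly requested variables, to avoid leaking secrets
        # or identifying data into the provenance record.
        environment["EnvironmentVariables"] = {
            name: os.environ[name] for name in variables if name in os.environ
        }
    if packages:
        dependencies = {}
        for name in packages:
            try:
                dependencies[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                continue  # skip packages that are not installed
        environment["Dependencies"] = dependencies
    return environment
```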
Provenance label file
Template:
prov/
provenance.tsv
provenance.json
This RECOMMENDED file describes the properties of the
prov- entities used in the names of provenance files.
It MUST contain the column provenance_id,
which MUST consist of prov-<label> values identifying one row for each
prov entity in the dataset,
optionally followed by a column containing a description of the entity.
Each entity MUST be described by one and only one row.
We RECOMMEND using these columns; if you do, we RECOMMEND the following definitions:
| Column name | Requirement Level | Data type | Description |
|---|---|---|---|
| provenance_id | REQUIRED | string | An identifier of the form prov-<label>, matching a prov entity found in the dataset. There MUST be exactly one row for each prov-<label> entity. Values in provenance_id MUST be unique. This column must appear first in the file. |
| description | OPTIONAL | string | Free-form text description of the provenance file(s). This column may appear anywhere in the file. |
| Additional Columns | OPTIONAL | n/a | Additional columns are allowed if they are defined in the associated metadata file. |
Throughout BIDS you can indicate missing values with n/a (for "not
available").
provenance.tsv example:
| provenance_id | description |
|---|---|
| prov-preprocspm | Provenance of preprocessing performed with SPM. |
| prov-preprocfsl | Provenance of preprocessing performed with FSL. |
Additional columns may be added to provenance.tsv but MUST be accompanied by a
provenance.json sidecar file describing the TSV column names and the properties of their values,
as outlined in common principles for tabular files.
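The constraints on provenance.tsv can be checked against the prov entities actually used in provenance filenames. A minimal sketch using the csv module, assuming a UTF-8, tab-separated file with a header row and that the set of prov-<label> values found in filenames has already been collected; the function name is hypothetical.

```python
import csv
from collections import Counter


def check_provenance_tsv(path, entities_in_dataset):
    """Return problems in provenance.tsv, given the prov-<label> values used in filenames."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        rows = list(reader)
        header = reader.fieldnames or []
    if not header or header[0] != "provenance_id":
        return ["provenance_id must be the first column"]
    counts = Counter(row["provenance_id"] for row in rows)
    listed = set(counts)
    expected = set(entities_in_dataset)
    # Exactly one row per entity, and every row must match an entity in the dataset.
    problems = [f"duplicate row for {i}" for i, n in sorted(counts.items()) if n > 1]
    problems += [f"missing row for {e}" for e in sorted(expected - listed)]
    problems += [f"no provenance files for {i}" for i in sorted(listed - expected)]
    return problems
```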
Provenance identifiers
Identifiers for JSON objects related to provenance must be IRIs. The following rules and conventions are provided so that identifiers are consistent, human-readable, unique, and explicit IRIs.
Identifiers for input and output data
The identifier for a BIDS file or a BIDS dataset MUST be a BIDS URI. The identifier for a no-longer-existing BIDS file or BIDS dataset SHOULD be a BIDS URI with a fragment part.
Warning
The use of BIDS URIs may require defining the DatasetLinks object
in dataset_description.json.
Apart from BIDS files and BIDS datasets, identifiers for a prov:Entity
(see Input and output data)
in a BIDS dataset <dataset-name> MAY have the following form,
where <label> is an arbitrary value for identifying the prov:Entity.
bids:[<dataset-name>]:prov#entity-<label>
Examples of identifiers for input and output data
BIDS files and datasets
- bids:ds000011:sub-01/anat/sub-01_T1w.nii.gz - identifier for a T1w file for subject sub-01 in the ds000011 dataset;
- bids::sub-014/func/sub-014_task-MGT_run-01_events.tsv - identifier for an events file for subject sub-014 in the current dataset;
- bids:fmriprep:sub-001/func/sub-001_task-MGT_run-01_bold_space-MNI152NLin2009cAsym_preproc.nii.gz - identifier for a bold file for subject sub-001 in the fmriprep dataset;
- bids:ds001734:. - identifier for the ds001734 dataset.
Other prov:Entity
- bids::prov#entity-28c0ba28 - identifier for a prov:Entity that is described in the current dataset.
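The identifiers above share the shape bids:[<dataset-name>]:<relative-path>, optionally followed by a #fragment. The sketch below splits such an identifier into its parts; it assumes the fragment starts at the first # and represents an empty dataset name (bids::) as None, meaning the current dataset. The class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class BidsUri(NamedTuple):
    dataset: Optional[str]  # None means the current dataset ("bids::")
    path: str
    fragment: Optional[str]


def parse_bids_uri(uri):
    """Split an identifier such as bids:ds000011:sub-01/anat/sub-01_T1w.nii.gz."""
    if not uri.startswith("bids:"):
        raise ValueError(f"not a BIDS URI: {uri}")
    rest = uri[len("bids:"):]
    dataset, sep, remainder = rest.partition(":")
    if not sep:
        raise ValueError(f"missing ':' after the dataset name: {uri}")
    path, hash_sign, fragment = remainder.partition("#")
    return BidsUri(dataset or None, path, fragment if hash_sign else None)
```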
Identifiers for other objects
The identifier for an activity, software, or environment described
in a BIDS dataset <dataset-name> SHOULD have the following form,
where <label> is a human-readable name that coherently identifies
the object and <uid> is a unique string of characters.
bids:[<dataset-name>]:prov#<label>-<uid>
The <uid> part of this identifier MUST be chosen so that any activities, software packages, or environments
that differ in any of their attributes have distinct identifiers.
Examples of identifiers for activities, environments and software
- bids::prov#conversion-00f3a18f - a conversion activity described inside the current dataset;
- bids::prov#fedora-uldfv058 - a Fedora based environment described inside the current dataset;
- bids::prov#fmriprep-awf6cvk6 - the fMRIPrep software described inside the current dataset.
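The specification requires that <uid> distinguish objects that differ in any attribute, but it does not prescribe how <uid> is computed. One possible approach, shown here only as a sketch, derives it from a hash of the object's other attributes, so that any change to an attribute yields a new identifier. The SHA-256 hash, the canonical JSON serialization, and the 8-character length (mirroring the examples above, although those use mixed-case letters rather than hexadecimal digits) are all assumptions.

```python
import hashlib
import json


def make_identifier(label, attributes, dataset="", length=8):
    """Return bids:[<dataset-name>]:prov#<label>-<uid> with a content-derived uid."""
    # Canonical JSON (sorted keys, no extra whitespace) so that equal attributes hash equally.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    uid = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]
    return f"bids:{dataset}:prov#{label}-{uid}"
```

With this design, two identical descriptions share one identifier, which keeps the graph free of duplicate nodes; a random <uid> would be a simpler alternative if that property is not wanted.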
Provenance graph
Objects describing provenance as defined in this specification can be aggregated into JSON-LD files, which allows provenance to be represented as an RDF graph (see Resource Description Framework (RDF)).
Minimal provenance graph
flowchart BT
B[Brain extraction] -->|wasAssociatedWith| S{FSL<br>}
B -->|used| T1([sub-001_T1w.nii])
B -->|used| L((Linux))
T1p([sub-001_space-orig_dseg.nii]) -->|wasGeneratedBy| B
In this example, a brain extraction algorithm was applied to a T1-weighted image:
- sub-001_T1w.nii is the original T1-weighted image;
- sub-001_space-orig_dseg.nii is the skull-stripped image;
- the Brain extraction activity was performed using the FSL software within a Linux software environment.
The terms defined in this specification to describe provenance
are based on the RDF, the RDF Schema,
JSON-LD, and W3C PROV.
The corresponding IRIs are described
in the JSON-LD context file provenance-context.json
provided with this specification.
Furthermore, this specification allows provenance to be described with terms from other vocabularies.
This can be done using the Type fields for Activities,
Files or prov:Entity.
All BIDS examples related to provenance (see bids-examples, provenance section)
show the aggregated version of the provenance metadata they contain.
This comes as a JSON-LD file and a visualization of the graph.
The JSON-LD file consists of an aggregation of the Activities,
Software, Files,
Datasets, prov:Entity
and Environments objects inside a Records object,
as well as a reference to the provenance-context.json file as
JSON-LD @context.
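The aggregation described above can be sketched as follows. The sketch assumes that all provenance files match prov-*.json somewhere under prov/ and simply concatenates their top-level arrays into a Records object; GeneratedBy metadata from sidecars and dataset_description.json is not included, and the default @context value is a placeholder to be replaced by the actual location of provenance-context.json.

```python
import json
from pathlib import Path

# Top-level arrays that may appear in provenance files.
RECORD_KEYS = ("Activities", "Software", "Files", "Datasets", "prov:Entity", "Environments")


def aggregate_provenance(dataset_root, context="provenance-context.json"):
    """Merge all provenance files under prov/ into a single JSON-LD document."""
    records = {key: [] for key in RECORD_KEYS}
    # provenance.json does not match prov-*.json, so only provenance files are read.
    for path in sorted(Path(dataset_root, "prov").rglob("prov-*.json")):
        content = json.loads(path.read_text(encoding="utf-8"))
        for key in RECORD_KEYS:
            records[key].extend(content.get(key, []))
    return {
        "@context": context,
        "Records": {key: value for key, value in records.items() if value},
    }
```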
Minimal examples
Provenance of a BIDS raw dataset
Example
For a complete example, see Provenance of DICOM to NIfTI conversion with dcm2niix.
In this example, we explain provenance metadata of a DICOM to NIfTI conversion with dcm2niix.
Consider the following BIDS raw dataset:
├─ prov/
│ ├─ prov-dcm2niix_act.json
│ ├─ prov-dcm2niix_soft.json
│ └─ ...
├─ sourcedata/
│ └─ dicoms/
│ └─ ...
├─ sub-001/
│ └─ anat/
│ ├─ sub-001_T1w.json
│ └─ sub-001_T1w.nii.gz
└─ ...
The prov/prov-dcm2niix_soft.json file describes dcm2niix,
the software package used for the DICOM conversion.
As per the Provenance identifiers
section, the identifier for the associated software object SHOULD
start with bids:<dataset>:prov# (bids:: refers to the current dataset).
{
"Software": [
{
"Id": "bids::prov#dcm2niix-khhkm7u1",
"Label": "dcm2niix"
}
]
}
The prov/prov-dcm2niix_act.json file describes the conversion activity.
Note that the identifier for the previously described software package is used here
to describe that the software package was used to compute this activity.
{
"Activities": [
{
"Id": "bids::prov#conversion-00f3a18f",
"Label": "Conversion",
"AssociatedWith": "bids::prov#dcm2niix-khhkm7u1"
}
]
}
Inside the sub-001/anat/sub-001_T1w.json file,
the metadata field GeneratedBy indicates that the sub-001/anat/sub-001_T1w.nii.gz file
was generated by the previously described activity.
{
"GeneratedBy": "bids::prov#conversion-00f3a18f"
}
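Taken together, these three files let a consumer answer "which software produced this file?" by following identifiers from the sidecar to the activity and then to the software. A minimal sketch, assuming the Activities and Software arrays are already loaded; it accepts either a single identifier or an array, since both appear in the examples. The function names are hypothetical.

```python
def as_list(value):
    """Normalize a missing value, a single identifier, or an array of identifiers to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def software_for_file(sidecar, activities, software):
    """Return the Labels of software associated with the activities that generated a file."""
    activity_by_id = {a["Id"]: a for a in activities}
    software_by_id = {s["Id"]: s for s in software}
    labels = []
    for activity_id in as_list(sidecar.get("GeneratedBy")):
        activity = activity_by_id.get(activity_id)
        if activity is None:
            continue  # described elsewhere, for example in another dataset
        for software_id in as_list(activity.get("AssociatedWith")):
            if software_id in software_by_id:
                labels.append(software_by_id[software_id]["Label"])
    return labels
```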
Provenance of a BIDS derivative dataset
Example
For a complete example, see Provenance of fMRI preprocessing with SPM.
In this example, we explain provenance metadata of fMRI preprocessing steps performed with SPM.
Consider the following BIDS derivative dataset:
├─ prov/
│ ├─ prov-spm_act.json
│ ├─ prov-spm_ent.json
│ └─ ...
├─ sub-01/
│ ├─ anat/
│ │ ├─ c1sub-01_T1w.json
│ │ ├─ c1sub-01_T1w.nii
│ │ ├─ ...
│ │ ├─ sub-01_T1w.json
│ │ └─ sub-01_T1w.nii
│ └─ func/
│ └─ ...
└─ ...
The prov/prov-spm_act.json file describes the preprocessing steps (activities) as JSON objects.
Among them:
- the bids::prov#movefile-bac3f385 activity needed a T1w file from the ds000011 dataset, identified by bids:ds000011:sub-01/anat/sub-01_T1w.nii.gz;
- the bids::prov#segment-7d5d4ac5 brain segmentation activity needed the two files listed inside the Used array.
{
"Activities": [
{
"Id": "bids::prov#movefile-bac3f385",
"Label": "Move file",
"Used": [
"bids:ds000011:sub-01/anat/sub-01_T1w.nii.gz"
]
},
{
"Id": "bids::prov#segment-7d5d4ac5",
"Label": "Segment",
"Used": [
"bids::prov#entity-28c0ba28",
"bids::sub-01/anat/sub-01_T1w.nii"
]
}
]
}
bids::sub-01/anat/sub-01_T1w.nii is a BIDS file available in the current dataset.
The spm12/tpm/TPM.nii file is not inside the dataset;
hence its description is stored inside prov/prov-spm_ent.json and
its identifier is not a BIDS URI:
{
"Files": [
{
"Id": "bids::prov#entity-28c0ba28",
"Label": "TPM.nii",
"AtLocation": "spm12/tpm/TPM.nii"
}
]
}
Inside the sub-01/anat/c1sub-01_T1w.json file,
the metadata field GeneratedBy indicates that the sub-01/anat/c1sub-01_T1w.nii file
was generated by the previously described brain segmentation activity.
{
"GeneratedBy": "bids::prov#segment-7d5d4ac5"
}
Provenance of a BIDS study dataset
Example
For a complete example, see Provenance of manual segmentations.
In this example, we explain provenance metadata of manual segmentations performed by two experts on the same T1w file. Consider the following BIDS study dataset:
├─ dataset_description.json
├─ derivatives/
│ ├─ seg-brain/
│ │ ├─ dataset_description.json
│ │ ├─ descriptions.tsv
│ │ ├─ ...
│ │ ├─ prov/
│ │ │ ├─ provenance.tsv
│ │ │ ├─ prov-seg_act.json
│ │ │ ├─ prov-seg_soft.json
│ │ │ └─ prov-seg_ent.json
│ │ └─ sub-001/
│ │ ├─ sub-001_space-orig_desc-exp1_dseg.json
│ │ ├─ sub-001_space-orig_desc-exp1_dseg.nii.gz
│ │ ├─ sub-001_space-orig_desc-exp2_dseg.json
│ │ └─ sub-001_space-orig_desc-exp2_dseg.nii.gz
│ └─ seg-lesions/
│ └─ ...
├─ ...
└─ sourcedata/
   └─ raw/
      ├─ dataset_description.json
      ├─ prov/
      │ └─ prov-raw_ent.json
      └─ sub-001/
         └─ anat/
            ├─ sub-001_T1w.json
            └─ sub-001_T1w.nii.gz
Inside the dataset_description.json file of the seg-brain derivative dataset,
the DatasetLinks metadata field defines an alias that is needed
to refer to the raw dataset using BIDS URIs.
{
"DatasetLinks": {
"raw": "../../sourcedata/raw"
}
}
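With this alias in place, a BIDS URI such as bids:raw:sub-001/anat/sub-001_T1w.nii.gz can be resolved to a path on disk. The sketch below assumes that DatasetLinks values are paths relative to the root of the dataset that defines them, as in this example; other kinds of values (for example, remote URLs) are not handled, and normalization is purely lexical, so symbolic links are not followed. The function name is hypothetical.

```python
import os
from pathlib import Path


def resolve_bids_uri(uri, dataset_root, dataset_links):
    """Map bids:<alias>:<path> to a local path using the DatasetLinks of dataset_root."""
    if not uri.startswith("bids:"):
        raise ValueError(f"not a BIDS URI: {uri}")
    alias, sep, relpath = uri[len("bids:"):].partition(":")
    if not sep:
        raise ValueError(f"malformed BIDS URI: {uri}")
    relpath = relpath.split("#", 1)[0]  # a fragment does not name a file
    if alias == "":
        base = Path(dataset_root)  # "bids::" refers to the current dataset
    elif alias in dataset_links:
        base = Path(dataset_root) / dataset_links[alias]
    else:
        raise KeyError(f"no DatasetLinks entry for {alias!r}")
    # Collapse ".." segments lexically, without touching the file system.
    return Path(os.path.normpath(base / relpath))
```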
The prov/prov-seg_act.json file describes activities during which
the experts generated segmentations.
{
"Activities": [
{
"Id": "bids::prov#segmentation-nO5RGsrb",
"Label": "Manual brain segmentation",
"Command": null,
"Used": [
"bids:raw:sub-001/anat/sub-001_T1w.nii.gz"
]
},
{
"Id": "bids::prov#segmentation-mOOypIYB",
"Label": "Manual brain segmentation",
"Command": null,
"Used": [
"bids:raw:sub-001/anat/sub-001_T1w.nii.gz"
]
}
]
}
Note that a description of the sub-001/anat/sub-001_T1w.nii.gz file is needed because
this data file is related to the activities.
Here we rely on the sourcedata/raw dataset to provide a description of the data file.
Under the derivatives/seg-brain dataset,
the sub-001_space-orig_desc-exp1_dseg.json file describes which activity generated
the sub-001_space-orig_desc-exp1_dseg.nii.gz file.
{
"GeneratedBy": "bids::prov#segmentation-nO5RGsrb"
}
The derivatives/seg-brain/prov/provenance.tsv gives a description of the prov-seg entity.
| provenance_id | description |
|---|---|
| prov-seg | Manual brain segmentation performed by two experts |
The descriptions.tsv file gives descriptions of the desc- entities used in data file names.
| desc_id | description |
|---|---|
| desc-exp1 | Files generated by expert #1 |
| desc-exp2 | Files generated by expert #2 |