Manifests

A manifest is a table of access methods and other metadata of the objects in some cohort. There are two manifest endpoints. The POST /cohorts/manifest/{cohort_id} API endpoint returns a manifest of a previously defined cohort. Parameters are sent to the endpoint in the request body, the manifestBody. The JSON schema of the manifestBody can be seen on the IDC API v2 UI page. Here is an example:

{
  "fields": [
    "Age_At_Diagnosis",
    "aws_bucket",
    "crdc_series_uuid",
    "Modality",
    "SliceThickness"
  ],
  "counts": false,
  "group_size": false,
  "sql": false,
  "page_size": 1000
}

The fields parameter of the body indicates the fields whose values are to be included in the returned manifests. The /fields API endpoint returns a list of the fields that can be included in a manifest.

The counts, group_size, sql and page_size parameters will be described in subsequent sections.

Every row in the returned manifest will include one value for each of the above fields.
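
As a minimal sketch, the manifestBody above could be assembled and attached to a POST request as follows. The base URL is an assumption and should be checked against the IDC API v2 UI page, the helper name build_manifest_request is hypothetical, and authentication is omitted. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: base URL of the IDC API v2; verify it on the API UI page.
BASE_URL = "https://api.imaging.datacommons.cancer.gov/v2"


def build_manifest_request(cohort_id, fields, counts=False, group_size=False,
                           sql=False, page_size=1000):
    """Build (but do not send) a POST request for a cohort manifest."""
    body = {
        "fields": list(fields),
        "counts": counts,
        "group_size": group_size,
        "sql": sql,
        "page_size": page_size,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/cohorts/manifest/{cohort_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_manifest_request(
    23,
    ["Age_At_Diagnosis", "aws_bucket", "crdc_series_uuid", "Modality", "SliceThickness"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request (for example with urllib.request.urlopen) is left out because it requires a cohort that actually exists and whatever credentials the API expects.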

The POST /cohorts/manifest/preview API endpoint accepts both a fields list and a cohort definition in the manifestPreviewBody. Here is an example manifestPreviewBody:

{
  "cohort_def": {
    "name": "mycohort",
    "description": "Example description",
    "filters": {
      "collection_id": [
        "TCGA_luad",
        "%_kirc"
      ],
      "Modality": [
        "CT",
        "MR"
      ],
      "Race": [
        "WHITE"
      ],
      "age_at_diagnosis_btw": [
        65,
        75
      ]
    }
  },
  "fields": [
    "Age_At_Diagnosis",
    "aws_bucket",
    "crdc_series_uuid",
    "Modality",
    "SliceThickness"
  ],
  "counts": true,
  "group_size": true,
  "sql": true,
  "page_size": 1000
}
    

This endpoint behaves like the following API sequence:

POST /cohorts    #Create a cohort
POST /cohorts/manifest/{cohort_id} # Get a manifest for the new cohort
DELETE /cohorts/{cohort_id} # Delete the new cohort

That is, it behaves as if a cohort is created, a manifest for that cohort is returned and the new cohort is deleted.
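
Because the preview endpoint takes the cohort definition and the manifest options in a single body, a client needs only one call rather than three. The following is a sketch of assembling such a manifestPreviewBody. The helper name build_preview_body is hypothetical, and the defaults simply mirror the example values shown above.

```python
import json


def build_preview_body(name, description, filters, fields,
                       counts=False, group_size=False, sql=False, page_size=1000):
    """Assemble a manifestPreviewBody: a cohort definition plus manifest options."""
    return {
        "cohort_def": {
            "name": name,
            "description": description,
            "filters": filters,
        },
        "fields": list(fields),
        "counts": counts,
        "group_size": group_size,
        "sql": sql,
        "page_size": page_size,
    }


preview_body = build_preview_body(
    name="mycohort",
    description="Example description",
    filters={
        "collection_id": ["TCGA_luad", "%_kirc"],
        "Modality": ["CT", "MR"],
        "Race": ["WHITE"],
        "age_at_diagnosis_btw": [65, 75],
    },
    fields=["Age_At_Diagnosis", "aws_bucket", "crdc_series_uuid", "Modality", "SliceThickness"],
    counts=True,
    group_size=True,
    sql=True,
)
print(json.dumps(preview_body, indent=2))
```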

The /cohorts/manifest/{cohort_id} endpoint returns a manifestResponse JSON object and the /cohorts/manifest/preview endpoint returns a manifestPreviewResponse JSON object. Here is an example manifestResponse:

{
  "code": 200,
  "cohort_def": {
    "cohort_id": 23,
    "description": "Example description",
    "user_email": "somebody@somemail.com",
    "filterSet": {
      "filters": {
        "Modality": [
          "CT",
          "MR"
        ],
        "age_at_diagnosis_btw": [
          65,
          75
        ],
        "collection_id": [
          "tcga_luad",
          "%_kirc"
        ],
        "race": [
          "WHITE"
        ]
      },
      "idc_data_version": "16.0"
    },
    "name": "mycohort"
  },
  "manifest": {
    "manifest_data": [
      {
        "Modality": "MR",
        "SliceThickness": "10.0",
        "age_at_diagnosis": 66,
        "aws_bucket": "idc-open-data",
        "crdc_series_uuid": "09bc812b-53f7-48fc-8895-72f6b03f642b"
      },
      {
        "Modality": "CT",
        "SliceThickness": "2.5",
        "age_at_diagnosis": 66,
        "aws_bucket": "idc-open-data",
        "crdc_series_uuid": "102d676d-6c6f-4c20-bb36-77ec81b81b13"
      },
      {
        "Modality": "CT",
        "SliceThickness": "8.0",
        "age_at_diagnosis": 66,
        "aws_bucket": "idc-open-data",
        "crdc_series_uuid": "1d365f52-bff4-4348-a508-82d399ca8442"
      },   
      :
      {
        "Modality": "CT",
        "SliceThickness": "1000.090881",
        "age_at_diagnosis": 74,
        "aws_bucket": "idc-open-data",
        "crdc_series_uuid": "faa47e10-45df-44a7-9f8b-2923a41196b4"
      }
    ],
    "rowsReturned": 626,
    "totalFound": 626
  },
  "next_page": ""
}

The cohort definition is included so that the manifest is self-documenting. The manifest_data component of the manifest component contains a row for each distinct combination of the requested fields in the cohort. The idc_data_version in the cohort_def is the IDC version when the cohort was created. To generate the manifest, the cohort's filter is applied against the data in that IDC version.

The structure of the manifestPreviewResponse returned by the /cohorts/manifest/preview API endpoint is identical to the manifestResponse except that it does not have a cohort_id or user_email component.

Because the /cohorts/manifest/preview API endpoint is always applied against the current IDC version, the idc_data_version in the cohort_def is always that of the current IDC version.

The next_page value is described in the next section.

Groups and group_size

We use the term group to indicate the set of all instances in the cohort having the values of some row in the manifest. Thus the values of the first row above:

"Modality": "MR",
"SliceThickness": "10.0",
"age_at_diagnosis": 66,
"aws_bucket": "idc-open-data",
"crdc_series_uuid": "09bc812b-53f7-48fc-8895-72f6b03f642b" 

implicitly define a group of instances in the cohort, each of which has those values.
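
To make the notion of a group concrete, here is a small sketch that selects the instances matching every value of a manifest row. The instance records and their identifiers are made up for illustration, and the helper name instances_in_group is hypothetical. The API computes groups on the server, so this only models the idea.

```python
def instances_in_group(instances, row):
    """Return the instances whose values match every field of a manifest row."""
    return [
        inst for inst in instances
        if all(inst.get(field) == value for field, value in row.items())
    ]


# Made-up instance records; real instances carry many more attributes.
instances = [
    {"SOPInstanceUID": "1.2.3.1", "Modality": "MR", "SliceThickness": "10.0",
     "crdc_series_uuid": "series-a"},
    {"SOPInstanceUID": "1.2.3.2", "Modality": "MR", "SliceThickness": "10.0",
     "crdc_series_uuid": "series-a"},
    {"SOPInstanceUID": "1.2.3.3", "Modality": "CT", "SliceThickness": "2.5",
     "crdc_series_uuid": "series-b"},
]

row = {"Modality": "MR", "SliceThickness": "10.0", "crdc_series_uuid": "series-a"}
group = instances_in_group(instances, row)
print([inst["SOPInstanceUID"] for inst in group])
```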

When the group_size parameter in the manifestBody or manifestPreviewBody is true, each row of the resulting manifest includes the total size in bytes of the instances in the corresponding group. Following is a fragment of the manifest for the same cohort as above, but with the group_size parameter set to true:

{
  "code": 200,
  "cohort_def": {
    "description": "Example description",
    "filterSet": {
      "filters": {
        "Modality": [
          "CT",
          "MR"
        ],
        "age_at_diagnosis_btw": [
          65,
          75
        ],
        "collection_id": [
          "tcga_luad",
          "tcga_kirc"
        ],
        "race": [
          "WHITE"
        ]
      },
      "idc_data_version": "16.0"
    },
    "name": "mycohort",
    "sql": ""
  },
  "manifest": {
    "manifest_data": [
      {
        "Modality": "MR",
        "SliceThickness": "10.0",
        "age_at_diagnosis": 66,
        "aws_bucket": "idc-open-data",
        "crdc_series_uuid": "09bc812b-53f7-48fc-8895-72f6b03f642b",
        "group_size": 2690320
      },
      {
        "Modality": "CT",
        "SliceThickness": "2.5",
        "age_at_diagnosis": 66,
        "aws_bucket": "idc-open-data",
        "crdc_series_uuid": "102d676d-6c6f-4c20-bb36-77ec81b81b13",
        "group_size": 42818868
      },
      {
        "Modality": "CT",
        "SliceThickness": "8.0",
        "age_at_diagnosis": 66,
        "aws_bucket": "idc-open-data",
        "crdc_series_uuid": "1d365f52-bff4-4348-a508-82d399ca8442",
        "group_size": 20064536
      },
      :
      :
      {
        "Modality": "CT",
        "SliceThickness": "1000.090881",
        "age_at_diagnosis": 74,
        "aws_bucket": "idc-open-data",
        "crdc_series_uuid": "faa47e10-45df-44a7-9f8b-2923a41196b4",
        "group_size": 6518724
      }
    ],
    "rowsReturned": 626,
    "totalFound": 626
  },
  "next_page": ""
}

Here we see that the instances in the group corresponding to the first result row have a total size of 2,690,320 bytes.

The totalFound value at the end of the manifest tells us that there are 626 rows in the manifest, meaning the manifest contains 626 different combinations of Modality, SliceThickness, age_at_diagnosis, aws_bucket, and crdc_series_uuid. (The group size does not add to the combinatorics.) The rowsReturned value indicates that all the rows in the manifest were returned in the first "page". If not all the rows had been returned, we could ask for additional "pages" as described in the next section.

The group_size parameter is optional and defaults to false.
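
A client might use these values to estimate download size and to check whether the first page is complete. The sketch below assumes a response shaped like the fragment above. It keeps only the first three rows, so its rowsReturned and totalFound values are set to 3 for illustration, and the helper names are hypothetical.

```python
def total_bytes(response):
    """Sum the group_size values of all rows in a manifest response."""
    rows = response["manifest"]["manifest_data"]
    return sum(row["group_size"] for row in rows)


def all_rows_returned(response):
    """True when the page holds every row found for the manifest."""
    manifest = response["manifest"]
    return manifest["rowsReturned"] == manifest["totalFound"]


# Trimmed sample: three rows from the fragment above, counts adjusted to match.
sample_response = {
    "manifest": {
        "manifest_data": [
            {"crdc_series_uuid": "09bc812b-53f7-48fc-8895-72f6b03f642b", "group_size": 2690320},
            {"crdc_series_uuid": "102d676d-6c6f-4c20-bb36-77ec81b81b13", "group_size": 42818868},
            {"crdc_series_uuid": "1d365f52-bff4-4348-a508-82d399ca8442", "group_size": 20064536},
        ],
        "rowsReturned": 3,
        "totalFound": 3,
    },
    "next_page": "",
}

print(total_bytes(sample_response))        # 65573724
print(all_rows_returned(sample_response))  # True
```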

Manifest granularity

If the counts parameter is true, the resulting manifest will selectively include counts of the instances, series, studies, patients and collections in each group. Which counts are included is determined by the manifest's granularity, which, in turn, is determined by which of certain fields appear in the fields parameter list of the manifestBody or manifestPreviewBody.

For example, if the fields parameter list includes the SOPInstanceUID field, there will be one group per instance in the manifest. Thus the manifest has instance granularity. A manifest has one of instance, series, study, patient, collection or version granularity.

For a given manifest granularity, and when counts is true, counts of the "lower level" objects are reported in the manifest. Thus, if a manifest has series granularity, then the count of all instances in each group is reported. If a manifest has study granularity, then the count of all instances in each group and of all series in each group is reported. And so on. This is described in detail in the remainder of this section.

In the following, manifest examples are based on this filterSet:

   "filters": {
      "collection_id": [
        "tcga_luad",
        "tcga_kirc"
      ],
      "Modality": [
        "CT",
        "MR"
      ],
      "Race": [
        "WHITE"
      ],
      "age_at_diagnosis_btw": [
        65,
        75
      ]
    }

Instance granularity

A manifest will have instance granularity if the fields parameter list includes one or both of the fields:

  • SOPInstanceUID

  • crdc_instance_uuid

Both of these fields are unique to each instance. Therefore the resulting manifest will include one row for each instance in the specified cohort. For example, the following fields list will result in a manifest having a row per instance:

{
  "fields": [
    "SOPInstanceUID",
    "Modality",
    "SliceThickness"
  ]
}

Each row will include the SOPInstanceUID, Modality and SliceThickness of the corresponding instance.

The counts parameter is ignored because there are no "lower level" objects than instances.

Series granularity

A manifest will have series granularity if it does not have instance granularity and the fields parameter list includes one or both of the fields:

  • SeriesInstanceUID

  • crdc_series_uuid

Both of these fields are unique to each series, and therefore the resulting manifest will include at least one row per series in the specified cohort. For example, the following fields list will result in a manifest having one or more rows per series:

"fields": [
  "Modality",
  "SliceThickness",
  "collection_id",
  "PatientID",
  "StudyInstanceUID",
  "SeriesInstanceUID"
]

Because the SeriesInstanceUID is unique to each series in a cohort (more accurately, all instances in a series have the same SeriesInstanceUID), there will be at least one row per series in the resulting manifest. However, SliceThickness is not necessarily the same across all instances in a series. Therefore, the resulting manifest may have multiple rows for a given series: rows in which the SeriesInstanceUID is the same but the SliceThickness values differ. DICOM Modality should always be the same for all instances in a series; therefore it is not expected to result in multiple rows per series.

If the counts parameter is true, each row of the manifest will have:

  • an instance_count value that is the count of instances in the group corresponding to the row

If the fields list is as above and counts is true, then this is a fragment of the series granularity manifest of our example cohort:

{
  "code": 200,
  "cohort_def": {
    "description": "Example description",
    "filterSet": {
      "filters": {
        "Modality": [
          "CT",
          "MR"
        ],
        "age_at_diagnosis_btw": [
          65,
          75
        ],
        "collection_id": [
          "tcga_luad",
          "tcga_kirc"
        ],
        "race": [
          "WHITE"
        ]
      },
      "idc_data_version": "16.0"
    },
    "name": "mycohort",
    "sql": ""
  },
  "manifest": {
    "manifest_data": [
      {
        "Modality": "CT",
        "PatientID": "TCGA-50-6592",
        "SeriesInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.9002.141004994853145237754973938025",
        "SliceThickness": null,
        "StudyInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.9002.256822832756566055874151999412",
        "collection_id": "tcga_luad",
        "instance_count": "151"
      },
      {
        "Modality": "CT",
        "PatientID": "TCGA-50-6592",
        "SeriesInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.9002.212096199865546132848990878032",
        "SliceThickness": null,
        "StudyInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.9002.256822832756566055874151999412",
        "collection_id": "tcga_luad",
        "instance_count": "61"
      },
      {
        "Modality": "CT",
        "PatientID": "TCGA-50-6595",
        "SeriesInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.9002.829269157955398706933292266867",
        "SliceThickness": "0.578125",
        "StudyInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.9002.414530650520592976265083061155",
        "collection_id": "tcga_luad",
        "instance_count": "1"
      },
      :
      :
      {
        "Modality": "MR",
        "PatientID": "TCGA-B0-5109",
        "SeriesInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.4004.370888372270096165934432087127",
        "SliceThickness": "20.0",
        "StudyInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.4004.167173047835125001355984228239",
        "collection_id": "tcga_kirc",
        "instance_count": "50"
      }
    ],
    "rowsReturned": 742,
    "totalFound": 742
  },
  "next_page": ""
}

This tells us that the group of instances corresponding to the first row of the manifest results has 151 members.
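
The way rows and instance counts arise at series granularity can be modeled with a simple group-by. This is only a sketch over made-up instance records, with a hypothetical helper name; the service performs the equivalent computation on its own data.

```python
from collections import Counter


def series_manifest_rows(instances, fields):
    """Group instances by their values for `fields` and count each group."""
    counts = Counter(tuple(inst.get(f) for f in fields) for inst in instances)
    return [dict(zip(fields, key), instance_count=n) for key, n in counts.items()]


# Made-up instances: series S1 mixes two slice thicknesses, series S2 has one instance.
instances = [
    {"SeriesInstanceUID": "S1", "Modality": "CT", "SliceThickness": "2.5"},
    {"SeriesInstanceUID": "S1", "Modality": "CT", "SliceThickness": "2.5"},
    {"SeriesInstanceUID": "S1", "Modality": "CT", "SliceThickness": "5.0"},
    {"SeriesInstanceUID": "S2", "Modality": "MR", "SliceThickness": "10.0"},
]

rows = series_manifest_rows(instances, ["SeriesInstanceUID", "Modality", "SliceThickness"])
for row in rows:
    print(row)
```

Series S1 yields two rows because its instances differ in SliceThickness, which is the multiple-rows-per-series case described above.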

Study granularity

A manifest will have study granularity if it does not have series or instance granularity and the fields parameter list includes one or both of the fields:

  • StudyInstanceUID

  • crdc_study_uuid

Both of these fields are unique to each study, and therefore the resulting manifest will include at least one row per study in the specified cohort. For example, the following fields list will result in a manifest having one or more rows per study:

"fields": [
    "Modality",
    "SliceThickness",
    "collection_id",
    "PatientID",
    "StudyInstanceUID"
]

SliceThickness can vary not only among the instances in a series, but also among the series in a study, and the series in a study can differ in Modality. Therefore, the resulting manifest may have multiple rows for a study that differ from each other in SliceThickness, Modality, or both.

If the counts parameter is true, each row of the manifest will have:

  • an instance_count value that is the count of instances in the group corresponding to the row

  • a series_count value that is the count of series in the group corresponding to the row

If the fields list is as above and counts is true, then this is a fragment of the study granularity manifest of our example cohort:

{
  "code": 200,
  "cohort_def": {
    "description": "Example description",
    "filterSet": {
      "filters": {
        "Modality": [
          "CT",
          "MR"
        ],
        "age_at_diagnosis_btw": [
          65,
          75
        ],
        "collection_id": [
          "tcga_luad",
          "tcga_kirc"
        ],
        "race": [
          "WHITE"
        ]
      },
      "idc_data_version": "16.0"
    },
    "name": "mycohort",
    "sql": ""
  },
  "manifest": {
    "manifest_data": [
      {
        "Modality": "CT",
        "PatientID": "TCGA-50-6592",
        "SliceThickness": null,
        "StudyInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.9002.256822832756566055874151999412",
        "collection_id": "tcga_luad",
        "instance_count": 212,
        "series_count": 2
      },
      {
        "Modality": "CT",
        "PatientID": "TCGA-50-6595",
        "SliceThickness": "0.578125",
        "StudyInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.9002.414530650520592976265083061155",
        "collection_id": "tcga_luad",
        "instance_count": 1,
        "series_count": 1
      },
      {
        "Modality": "CT",
        "PatientID": "TCGA-B8-4153",
        "SliceThickness": "0.6",
        "StudyInstanceUID": "1.3.6.1.4.1.14519.5.2.1.8421.4004.499780439902438461273732269226",
        "collection_id": "tcga_kirc",
        "instance_count": 2,
        "series_count": 1
      },
      :
      :
      {
        "Modality": "MR",
        "PatientID": "TCGA-B0-5109",
        "SliceThickness": "20.0",
        "StudyInstanceUID": "1.3.6.1.4.1.14519.5.2.1.6450.4004.167173047835125001355984228239",
        "collection_id": "tcga_kirc",
        "instance_count": 100,
        "series_count": 2
      }
    ],
    "rowsReturned": 324,
    "totalFound": 324
  },
  "next_page": ""
}

This tells us that the group of instances corresponding to the first row of the manifest results has 212 members, divided among two series. The group of instances corresponding to the third row of the manifest results has two members in a single series.

Patient granularity

A manifest will have patient granularity if it does not have study, series or instance granularity and the fields parameter list includes the field PatientID. This field is unique to each patient, and therefore the resulting manifest will include at least one row per patient in the specified cohort. For example, the following fields list will result in a manifest having one or more rows per patient:

"fields": [
    "Modality",
    "SliceThickness",
    "collection_id",
    "PatientID"
]

Because the PatientID is unique to each patient in a cohort (more accurately, all instances belonging to a patient have the same PatientID), there will be at least one row per patient in the resulting manifest. However, a patient's series can differ in Modality or SliceThickness. Therefore, the resulting manifest may well have more than one row per patient.

If the counts parameter is true, each row of the manifest will have:

  • an instance_count value that is the count of instances in the group corresponding to the row

  • a series_count value that is the count of series in the group corresponding to the row

  • a study_count value that is the count of studies in the group corresponding to the row

If the fields list is as above and counts is true, then this is a fragment of the patient granularity manifest of our example cohort:

{
  "code": 200,
  "cohort_def": {
    "description": "Example description",
    "filterSet": {
      "filters": {
        "Modality": [
          "CT",
          "MR"
        ],
        "age_at_diagnosis_btw": [
          65,
          75
        ],
        "collection_id": [
          "tcga_luad",
          "tcga_kirc"
        ],
        "race": [
          "WHITE"
        ]
      },
      "idc_data_version": "16.0"
    },
    "name": "mycohort",
    "sql": ""
  },
  "next_page": "",
  "manifest": {
    "manifest_data": [
      {
        "Modality": "CT",
        "PatientID": "TCGA-50-6592",
        "SliceThickness": null,
        "collection_id": "tcga_luad",
        "instance_count": "212",
        "series_count": "2",
        "study_count": "1"
      },
      {
        "Modality": "CT",
        "PatientID": "TCGA-50-6595",
        "SliceThickness": "0.578125",
        "collection_id": "tcga_luad",
        "instance_count": "1",
        "series_count": "1",
        "study_count": "1"
      },
      {
        "Modality": "CT",
        "PatientID": "TCGA-B8-4153",
        "SliceThickness": "0.6",
        "collection_id": "tcga_kirc",
        "instance_count": "6",
        "series_count": "2",
        "study_count": "2"
      },
      :
      :
      {
        "Modality": "MR",
        "PatientID": "TCGA-B0-5109",
        "SliceThickness": "20.0",
        "collection_id": "tcga_kirc",
        "instance_count": "100",
        "series_count": "2",
        "study_count": "1"
      }
    ],
    "rowsReturned": 301,
    "totalFound": 301
  }
}

This tells us that the group of instances corresponding to the first row of the manifest results has 212 members divided among two series, and both in a single study.
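
The example fragments in this section show count values sometimes as JSON numbers and sometimes as strings. A client that does arithmetic on them may want to normalize both forms. The following is a sketch under that assumption, with a hypothetical helper name; it does not describe any guaranteed behavior of the API.

```python
COUNT_KEYS = (
    "instance_count",
    "series_count",
    "study_count",
    "patient_count",
    "collection_count",
)


def normalize_counts(row):
    """Return a copy of a manifest row with any count values converted to int."""
    normalized = dict(row)
    for key in COUNT_KEYS:
        if normalized.get(key) is not None:
            normalized[key] = int(normalized[key])
    return normalized


# First row of the patient granularity fragment above.
row = {
    "Modality": "CT",
    "PatientID": "TCGA-50-6592",
    "SliceThickness": None,
    "collection_id": "tcga_luad",
    "instance_count": "212",
    "series_count": "2",
    "study_count": "1",
}

print(normalize_counts(row))
```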

Collection granularity

A manifest will have collection granularity if it does not have patient, study, series or instance granularity and the fields parameter list includes the field collection_id. This field is unique to each collection, and therefore the resulting manifest will include at least one row per collection in the specified cohort. For example, the following fields list will result in a manifest having one or more rows per collection:

"fields": [
    "Modality",
    "SliceThickness",
    "collection_id"
]

Because the collection_id is unique to each collection in a cohort (more accurately, all instances in a collection have the same collection_id), there will be at least one row per collection in the resulting manifest. The series in a collection commonly differ in Modality or SliceThickness. Therefore, the resulting manifest may well have more than one row per collection.

If the counts parameter is true, each row of the manifest will have instance_count, series_count, study_count and patient_count values for the group corresponding to the row.

If the fields list is as above and counts is true, then this is a fragment of the collection granularity manifest of our example cohort:

{
  "code": 200,
  "cohort_def": {
    "description": "Example description",
    "filterSet": {
      "filters": {
        "Modality": [
          "CT",
          "MR"
        ],
        "age_at_diagnosis_btw": [
          65,
          75
        ],
        "collection_id": [
          "tcga_luad",
          "tcga_kirc"
        ],
        "race": [
          "WHITE"
        ]
      },
      "idc_data_version": "16.0"
    },
    "name": "mycohort",
    "sql": ""
  },
  "manifest": {
    "manifest_data": [
      {
        "Modality": "CT",
        "SliceThickness": null,
        "collection_id": "tcga_luad",
        "instance_count": "212",
        "patient_count": "1",
        "series_count": "2",
        "study_count": "1"
      },
      {
        "Modality": "CT",
        "SliceThickness": "0.578125",
        "collection_id": "tcga_luad",
        "instance_count": "1",
        "patient_count": "1",
        "series_count": "1",
        "study_count": "1"
      },
      {
        "Modality": "CT",
        "SliceThickness": "0.6",
        "collection_id": "tcga_kirc",
        "instance_count": "29",
        "patient_count": "9",
        "series_count": "16",
        "study_count": "14"
      },
      :
      :
      {
        "Modality": "MR",
        "SliceThickness": "20.0",
        "collection_id": "tcga_kirc",
        "instance_count": "100",
        "patient_count": "1",
        "series_count": "2",
        "study_count": "1"
      }
    ],
    "rowsReturned": 88,
    "totalFound": 88
  },
  "next_page": ""
}

Version granularity

A manifest will have version granularity if it does not have collection, patient, study, series or instance granularity. At this granularity level, the rows in the manifest return the combinations of queried values across all collections, patients, studies, series and instances in the cohort.

When counts is true and the fields list is as follows:

"fields": [
    "Modality",
    "SliceThickness"
]

then this is a fragment of the version granularity manifest of our example cohort:

{
  "code": 200,
  "cohort_def": {
    "description": "Example description",
    "filterSet": {
      "filters": {
        "Modality": [
          "CT",
          "MR"
        ],
        "age_at_diagnosis_btw": [
          65,
          75
        ],
        "collection_id": [
          "tcga_luad",
          "tcga_kirc"
        ],
        "race": [
          "WHITE"
        ]
      },
      "idc_data_version": "16.0"
    },
    "name": "mycohort",
    "sql": ""
  },
  "manifest": {
    "manifest_data": [
      {
        "Modality": "CT",
        "SliceThickness": null,
        "collection_count": "1",
        "instance_count": "212",
        "patient_count": "1",
        "series_count": "2",
        "study_count": "1"
      },
      {
        "Modality": "CT",
        "SliceThickness": "0.578125",
        "collection_count": "1",
        "instance_count": "1",
        "patient_count": "1",
        "series_count": "1",
        "study_count": "1"
      },
      {
        "Modality": "CT",
        "SliceThickness": "0.6",
        "collection_count": "2",
        "instance_count": "34",
        "patient_count": "11",
        "series_count": "19",
        "study_count": "17"
      },
      :
      :
      {
        "Modality": "MR",
        "SliceThickness": "20.0",
        "collection_count": "1",
        "instance_count": "100",
        "patient_count": "1",
        "series_count": "2",
        "study_count": "1"
      }
    ],
    "rowsReturned": 87,
    "totalFound": 87
  },
  "next_page": ""
}

Row one of the results tells us that the cohort has 212 instances having a null SliceThickness and a Modality of CT. Also, the totalFound value shows that there are 87 different combinations of Modality and SliceThickness in the cohort.
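
The granularity rules in this section can be condensed into a short lookup. The sketch below is a reading of the rules as stated here, not the service's implementation. The function and table names are hypothetical, and matching field names case-insensitively is an assumption based on the examples, where the requested patientID appears as PatientID in the results.

```python
# Identifying fields per granularity, checked from finest to coarsest.
GRANULARITY_KEYS = [
    ("instance", {"sopinstanceuid", "crdc_instance_uuid"}),
    ("series", {"seriesinstanceuid", "crdc_series_uuid"}),
    ("study", {"studyinstanceuid", "crdc_study_uuid"}),
    ("patient", {"patientid"}),
    ("collection", {"collection_id"}),
]

# Counts reported for each granularity when the counts parameter is true.
COUNTS_BY_GRANULARITY = {
    "instance": [],
    "series": ["instance_count"],
    "study": ["instance_count", "series_count"],
    "patient": ["instance_count", "series_count", "study_count"],
    "collection": ["instance_count", "series_count", "study_count", "patient_count"],
    "version": ["instance_count", "series_count", "study_count",
                "patient_count", "collection_count"],
}


def manifest_granularity(fields):
    """Return the granularity implied by a fields list."""
    requested = {field.lower() for field in fields}  # assumption: case-insensitive
    for level, keys in GRANULARITY_KEYS:
        if requested & keys:
            return level
    return "version"


for fields in (
    ["SOPInstanceUID", "Modality", "SliceThickness"],
    ["Modality", "SliceThickness", "collection_id", "PatientID", "StudyInstanceUID"],
    ["Modality", "SliceThickness"],
):
    level = manifest_granularity(fields)
    print(level, COUNTS_BY_GRANULARITY[level])
```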
