Creating a cohort helps you quickly return to a subset of interest from the vast collection of data available in the Imaging Data Commons. A cohort is composed of your selections in the Search Scope and Search Configuration panels.
Do the following to create a cohort.
- 2. Click the Save As New Cohort button in the top-right of the portal. The Save Cohort dialog box appears, showing your selected filters.
- 3. Enter a name for the cohort. Optionally, enter a description.
- 4. Click Save Cohort. The cohort details page appears.
Save Cohort dialog box
The cohort details page shows the cohort name, description (if available), and filter definition.
Greyed out filter options on the Search Configuration panel
Cohort details page
The cohort manifest identifies the sources of data in your cohort and allows you to download this data. You can export a cohort manifest to CSV, TSV, and JSON formats, and to BigQuery.
To export the cohort manifest
- 2. Click the Export Cohort Manifest button. The Export Cohort Manifest dialog box appears.
- 4. Select whether you want to export the cohort as files or using BigQuery.
- To download the cohort as a CSV or TSV file, select Include header fields if you want the header data in the export file.
- Choose the file format by clicking Download CSV, Download TSV, or Download JSON.
- To export the cohort using BigQuery, click Get BQ Table.
The header of the manifest contains the cohort name, the user, the filters used, the date generated, and the total number of records. Data is split across multiple files until the limit of 65,000 files is reached. If your data set is larger than this, you must export it using BigQuery.
Example cohort manifest
The fields provided in a cohort manifest are:
PatientID: value of the corresponding DICOM attribute
collection_id: abbreviated identifier of the source data collection
StudyInstanceUID: value of the corresponding DICOM attribute
SeriesInstanceUID: value of the corresponding DICOM attribute
SOPInstanceUID: value of the corresponding DICOM attribute (NB: included only when the manifest is exported into BigQuery!)
source_DOI: Digital Object Identifier (DOI) of the source data collection. Prepending https://doi.org/ gives you the URL of the collection dataset
instance_uuid: CRDC UUID of the object maintained by CRDC IndexD corresponding to the DICOM instance, which can be resolved to the URL of the underlying objects (NB: included only when the manifest is exported into BigQuery!)
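As an illustration of how these fields can be used, the following minimal Python sketch reads a CSV manifest and builds the collection URL for each collection_id by prepending https://doi.org/ to source_DOI. The exact layout of the header lines is not described here, so the sketch assumes that the column-name row is the first row containing SeriesInstanceUID and that everything before it is header information. The function names and all sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict

DOI_RESOLVER = "https://doi.org/"


def read_manifest_rows(text):
    """Yield one dict per data row of a CSV manifest.

    Assumption: header lines (if any) come first, and the column-name
    row is the first row that contains 'SeriesInstanceUID'.
    """
    rows = csv.reader(io.StringIO(text))
    for row in rows:
        if "SeriesInstanceUID" in row:
            columns = row
            break
    else:
        raise ValueError("no column-name row found in manifest")
    for row in rows:
        # Skip blank or malformed lines rather than guessing their meaning.
        if len(row) == len(columns):
            yield dict(zip(columns, row))


def doi_urls_by_collection(rows):
    """Map each collection_id to the sorted DOI URLs of its source data."""
    urls = defaultdict(set)
    for row in rows:
        urls[row["collection_id"]].add(DOI_RESOLVER + row["source_DOI"])
    return {collection: sorted(found) for collection, found in urls.items()}


# Hypothetical manifest content, for illustration only.
SAMPLE_MANIFEST = """# hypothetical header line
PatientID,collection_id,StudyInstanceUID,SeriesInstanceUID,source_DOI
P1,coll_a,1.2.3,1.2.3.1,10.0000/example-a
P1,coll_a,1.2.3,1.2.3.2,10.0000/example-a
P2,coll_b,1.2.4,1.2.4.1,10.0000/example-b
"""

if __name__ == "__main__":
    print(doi_urls_by_collection(read_manifest_rows(SAMPLE_MANIFEST)))
```

For a TSV manifest, the same approach should work by passing delimiter="\t" to csv.reader.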
You must have a Google account to use BigQuery.
Exporting a manifest to BigQuery allows you to run complex, analytical, SQL-based queries on large sets of data. After you start the export, the following window appears on the cohort details page, showing your unique link.
The exported cohort manifest table is intended for short-term use, and will be deleted after seven days. However, you can always re-export the cohort manifest.
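To give a sense of the kind of query an exported table supports, the sketch below builds a SQL statement that counts distinct series per collection, using only column names from the field list above. It also estimates when the table will be removed, based on the seven-day retention stated here. The table identifier, the function names, and the simplified identifier check are assumptions for illustration; use the table name shown in your own export link.

```python
import re
from datetime import date, timedelta

# Retention period stated in the text above.
RETENTION_DAYS = 7

# Simplified check for a "project.dataset.table" identifier (an assumption,
# not the full BigQuery naming rules).
_TABLE_ID = re.compile(r"[A-Za-z0-9_\-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_\-]+")


def series_per_collection_sql(table_id: str) -> str:
    """Build a query counting distinct series in each collection."""
    if not _TABLE_ID.fullmatch(table_id):
        raise ValueError(f"unexpected table identifier: {table_id!r}")
    return (
        "SELECT collection_id, COUNT(DISTINCT SeriesInstanceUID) AS series_count\n"
        f"FROM `{table_id}`\n"
        "GROUP BY collection_id\n"
        "ORDER BY series_count DESC"
    )


def expected_deletion_date(export_date: date) -> date:
    """Estimate the deletion date as export date plus the retention period."""
    return export_date + timedelta(days=RETENTION_DAYS)


if __name__ == "__main__":
    # Hypothetical table identifier.
    print(series_per_collection_sql("my-project.my_dataset.my_cohort_manifest"))
    print(expected_deletion_date(date.today()))
```

The resulting string can be pasted into the BigQuery console or passed to a BigQuery client of your choice.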
Comma-Separated Values (CSV) and Tab-Separated Values (TSV) files include header fields, so you can customize which of those fields you want to include in the export. You can also select which columns to include in the file.
On the cohort manifest export confirmation window, you can select or clear header fields that you want to appear in your export.
Cohort manifest header field options
JSON files do not include any header information but, like CSV and TSV files, require that you identify which columns to include in your export. You can import the JSON file to a BigQuery table for further analysis.
Cohort manifest column options
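BigQuery loads JSON data in newline-delimited form (one object per line), so a downloaded JSON manifest may need converting before import. The structure of the JSON export is not described here; the sketch below assumes you have already parsed it into a list of row objects, and the function name and sample values are hypothetical.

```python
import json


def to_ndjson(records):
    """Serialize a list of row objects as newline-delimited JSON."""
    return "".join(
        json.dumps(record, separators=(",", ":")) + "\n" for record in records
    )


if __name__ == "__main__":
    # Hypothetical rows, standing in for the parsed manifest content.
    rows = [
        {"PatientID": "P1", "collection_id": "coll_a"},
        {"PatientID": "P2", "collection_id": "coll_b"},
    ]
    print(to_ndjson(rows), end="")
```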
Click the Cohorts button in the header to view a list of all the cohorts you created under your account. Each row in the cohorts list includes the corresponding cohort ID, name, the number of cases, studies, and series in the cohort, and the version of the IDC data against which the cohort was created. Pressing the plus icon on any row in the cohort list opens a second related row that provides information about the cohort's collections and filters.
Cohort manifests can be exported directly from the Cohorts page. Click the checkboxes corresponding to the desired cohorts, then click the Export Manifest button, which opens the Export Cohort Manifest dialog box discussed above. Complete this dialog box to export separate manifests for all selected cohorts. Note that only a single cohort manifest can be exported to a file download at one time. Also, cohorts with inactive data versions cannot be downloaded to a file. When multiple manifests are exported to BigQuery, they are copied to separate BigQuery tables. One or more cohorts can be deleted by clicking the relevant checkboxes and then clicking the Delete button.
In the cohort list, the Data Version cells of cohorts created with previous data versions have a grey background. The cohort manifest remains accessible for all cohorts after the active data version changes. However, cohorts created with older data versions can no longer be opened within the portal. Clicking the button in the Version Compare cell in the cohort list opens a pop-up window that compares the number of cases, studies, and series in the current cohort with those of a new cohort that would be created by applying the cohort's filters to the current data set.
Clicking on the Load New Version button will open the explorer page, applying these cohort filters to the current data version. The user can then save this new cohort or modify the filters as desired.