Exploring data and cohorts

Exploring imaging data

You can explore Imaging Data Commons (IDC) data and metadata by selecting filters in the Search Scope and Search Configuration panels on the IDC portal home page. Selecting filters narrows down the available image series to those that meet your criteria. You can then save your filter selection as a cohort for later use.

The IDC portal user interface has four main panels for exploring imaging data: the Search Scope, Search Configuration, Search Results, and Collections panels.

  • Search Scope panel: Use the Search Scope panel primarily to filter by collection. More than 20 collections are currently available.

  • Search Configuration panel: The Search Configuration panel provides more detailed attribute filters, including case-level, segmentation, qualitative analysis, and quantitative analysis attributes.

  • Search Results panel: The Search Results panel summarizes your search results as pie charts for the selected attribute filters.

  • Collections panel: Use the Collections panel to view the studies and series in a collection without selecting any attribute filters.

    The attribute options available in the Search Configuration and Search Results panels are described in more detail below.

Search Scope, Search Configuration, and Search Results panels
Search Scope, Search Configuration, Filter Definition, and Search Results Panels
Collections panel

The pie charts in the Search Results panel show the number of cases (or patients) in your search results by Anatomical Region, Segmentation Category, and Segmentation Type. Hover over a pie slice to see the value it represents (the anatomical region, segmentation category, or segmentation type), its number of cases, and its percentage of the total in your search results.

You can also explore the IDC data without filters. If you want to view a collection's cases, studies, and series, scroll down the IDC portal home page until you reach the Collections panel. Click any link on the Collections panel to view available data about your selection in tabular form in the Filter Definition, Selected Cases, Selected Studies, and Selected Series panels.

Collections Panel

Defining search scope and configuration

Do the following to define the scope and configuration of your search.

  1. In the Search Scope panel, click COLLECTION to view the collections in the portal. Over 20 collections are available.

  2. Click the box to the left of a collection name to select one or more collections. You can hover over a collection name to view more information about the collection.

  3. In the Search Configuration panel, select filters on the Original, Derived, and Related tabs to narrow down the available image series. Click any of the filter names on these tabs to view and select the available options. Attribute filter options that have no data available are greyed out. Optionally, hide attributes with 0 cases by selecting the checkbox at the top of the panel. The following table describes each of the tabs in this panel.

Tab

Description

Original

This attribute set is built from DICOM objects produced by image acquisition equipment (e.g., MR, CT, or PET images). This tab also includes groups of attributes that are common to all DICOM objects, such as Modality. For more information, see Original data.

Derived

Use Derived attributes to filter analyzed and post-processed data. More than 25 attribute filter options are available.

For more information, see Derived data.

The IDC portal sorts derived objects by the following categories:

  • Segmentations: volumetric annotations of the image regions stored as DICOM Segmentation objects

  • Qualitative Analysis: Qualitative evaluation results (e.g., scores or categories associated with image findings) stored in DICOM Structured Reporting TID1500 objects

  • Quantitative Analysis: Quantitative evaluation results (e.g., numeric measurements such as the volume or diameter of an image region) stored in DICOM Structured Reporting TID1500 objects

Related

The Cancer Genome Atlas (TCGA) collections have a rich set of filters for clinical data associated with the imaging data. This filter set is useful when working primarily with the TCGA collections.

Filter attributes in this tab only filter cases within the TCGA collections. Other collections are not affected by these filters.

The organization of the TCGA related data is described in detail in the ISB-CGC documentation.

Understanding counts in the search results

Many attributes in the Imaging Data Commons are not mutually exclusive: a single case can have several values of the same attribute. As a result, attribute values you did not select may appear in your search results. Keep this in mind when analyzing the data in your search results.

On the Search Configuration panel, the number of unique cases (or patients) shown for each attribute value is computed by adding that value, if it is not already selected, to the currently defined filter.

On the Search Results panel, each pie chart reports the number of cases (or patients) for every value of a given attribute, given the currently defined filter set. Once a case is selected by the filter, all of its instances contribute to the charts, including instances that do not themselves meet the search criteria. For example, cases selected because they contain CT images may also contain PET images, and the PET modality of those cases is also counted in the chart summary.
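The counting rules above can be illustrated with a small, self-contained Python sketch. The data, case IDs, and function names are hypothetical, and the portal's actual implementation may differ; in particular, the sketch assumes that an added value is combined with the current filter using AND. It only shows why a chart can include values you did not select, and how a per-value count relates to the current filter.

```python
from collections import defaultdict

# Hypothetical toy data: one (case ID, modality) pair per image series.
SERIES = [
    ("case-1", "CT"),
    ("case-1", "PET"),
    ("case-2", "CT"),
    ("case-3", "MR"),
]


def cases_matching(series, value):
    """Cases that have at least one series with the given attribute value."""
    return {case for case, v in series if v == value}


def chart_counts(series, selected_cases):
    """Unique cases per value, counted over ALL series of the selected cases."""
    cases_per_value = defaultdict(set)
    for case, value in series:
        if case in selected_cases:
            cases_per_value[value].add(case)
    return {value: len(cases) for value, cases in cases_per_value.items()}


def facet_count(series, selected_cases, value):
    """Cases that would remain if `value` were ANDed with the current filter (assumption)."""
    return len(selected_cases & cases_matching(series, value))


selected = cases_matching(SERIES, "CT")     # filter: Modality = CT
print(sorted(selected))                     # ['case-1', 'case-2']
print(chart_counts(SERIES, selected))       # {'CT': 2, 'PET': 1}
print(facet_count(SERIES, selected, "MR"))  # 0
```

Here case-1 is selected because it has CT images, so its PET series is also counted in the modality chart, matching the example in the text. A value whose count drops to 0, such as MR in this toy data, corresponds to an option that would appear greyed out.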

Viewing collections, studies, and series

The Collections panel lists all collections in IDC. For each collection, the panel shows the collection name, the total number of cases, and the number of cases in the current cohort. You can choose how many entries to show per page and move between the previous and next pages.

Click one or more collections to select them. The selected row or rows are highlighted. The available cases for the selected collection(s) appear in the Selected Cases panel. Click the up or down arrow to sort the list alphabetically or numerically, as appropriate for the column.

Collections panel

You must select a collection before you can view data in the Selected Cases, Selected Studies, and Selected Series panels.

Selecting a case per collection

All cases available for the selected collection appear in the Selected Cases panel. The panel shows the collection name, case ID, total number of studies, and total number of series for each case. You can choose how many entries to show per page and move between the previous and next pages.

Selected Cases panel

Click one or more cases to select them. The selected row or rows are highlighted. The available studies for the selected case(s) appear in the Selected Studies panel. Click the up or down arrow to sort the list alphabetically or numerically, as appropriate for the column.

Viewing studies per case

All studies available for the selected case appear in the Selected Studies panel. The panel shows the project name, case ID, study ID, and study description for each study. You can choose how many entries to show per page and move between the previous and next pages.

Selected Studies panel

Click one or more studies to select them. The selected row or rows are highlighted. The available series for the selected studies appear in the Selected Series panel. Click the up or down arrow to sort the list alphabetically or numerically, as appropriate for the column.

Click the icon in the View column for a study row to view study objects in the IDC Viewer, which is based on the Open Health Imaging Foundation (OHIF) Viewer.

For more detailed information on the OHIF viewer, see Visualizing images.

Viewing a series per study per case

All series available for the selected study appear in the Selected Series panel. The panel shows the study ID, series number, modality, body part examined, and series description for each series. You can choose how many entries to show per page and move between the previous and next pages. Click the up or down arrow to sort the list alphabetically or numerically, as appropriate for the column.

Selected Series panel

Click the icon in the View column for a series row to view the series in the IDC Viewer, which is based on the Open Health Imaging Foundation (OHIF) Viewer.

Some objects can only be opened by the OHIF viewer at their related study level and not at the series level. For these objects, the icon in the View column shows that viewing is disabled.

For more detailed information on the OHIF viewer, see Image visualization.
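The Selected Cases, Selected Studies, and Selected Series panels follow the DICOM hierarchy of case (patient), study, and series. As a sketch, assuming you have manifest rows with the default PatientID, collection_id, StudyInstanceUID, and SeriesInstanceUID fields, the per-case study and series totals shown in the Selected Cases panel can be reproduced by counting distinct UIDs. The function name and the placeholder UIDs below are hypothetical.

```python
from collections import defaultdict


def summarize_cases(rows):
    """Map (collection_id, PatientID) to (number of studies, number of series)."""
    studies = defaultdict(set)
    series = defaultdict(set)
    for row in rows:
        key = (row["collection_id"], row["PatientID"])
        # Sets ignore repeats, so instance-level rows are counted once per study/series.
        studies[key].add(row["StudyInstanceUID"])
        series[key].add(row["SeriesInstanceUID"])
    return {key: (len(studies[key]), len(series[key])) for key in studies}


# Placeholder rows (one per instance); these are not real DICOM UIDs.
ROWS = [
    {"collection_id": "example", "PatientID": "P1", "StudyInstanceUID": "st1", "SeriesInstanceUID": "se1"},
    {"collection_id": "example", "PatientID": "P1", "StudyInstanceUID": "st1", "SeriesInstanceUID": "se1"},
    {"collection_id": "example", "PatientID": "P1", "StudyInstanceUID": "st2", "SeriesInstanceUID": "se2"},
    {"collection_id": "example", "PatientID": "P2", "StudyInstanceUID": "st3", "SeriesInstanceUID": "se3"},
]
print(summarize_cases(ROWS))  # {('example', 'P1'): (2, 2), ('example', 'P2'): (1, 1)}
```

Because a manifest has one row per instance (SOPInstanceUID), counting distinct UIDs rather than rows is what keeps the totals at the study and series level.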

Viewing data in BigQuery

After you export a cohort manifest to BigQuery, you can view IDC data in BigQuery as a BQ table. The following example query returns all studies in the QIN_HEADNECK collection.

```sql
-- One row per distinct patient/study pair in the collection
SELECT PatientID, StudyInstanceUID
FROM `idc-dev-etl.idc_tcia_mvp_wave0.dicom_derived_all`
WHERE collection_id = 'qin_headneck'
GROUP BY PatientID, StudyInstanceUID
```
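The same query can also be run from Python. This is a minimal sketch, assuming the google-cloud-bigquery client library is installed and that you have Google Cloud credentials and a default billing project configured. The table name is copied from the example query above and may not be accessible from your project; the function name is hypothetical. The collection ID is passed as a named query parameter instead of being pasted into the SQL string.

```python
# Sketch: run the example query with the google-cloud-bigquery client library.
# Assumes `pip install google-cloud-bigquery` and configured Google Cloud credentials.

QUERY = """
SELECT PatientID, StudyInstanceUID
FROM `idc-dev-etl.idc_tcia_mvp_wave0.dicom_derived_all`
WHERE collection_id = @collection_id
GROUP BY PatientID, StudyInstanceUID
"""


def list_studies(collection_id, project=None):
    """Return (PatientID, StudyInstanceUID) pairs for one collection."""
    # Imported here so this module can be loaded without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)  # project=None uses the default project
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("collection_id", "STRING", collection_id)
        ]
    )
    rows = client.query(QUERY, job_config=job_config).result()
    return [(row["PatientID"], row["StudyInstanceUID"]) for row in rows]
```

For example, `list_studies("qin_headneck")` would return the same patient/study pairs as the SQL query above, provided the table is readable with your credentials.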

Understanding cohorts

Creating a cohort helps you quickly return to a subset of interest from the vast collection of data available in the Imaging Data Commons. A cohort is composed of your selections in the Search Scope and Search Configuration panels.

Creating a cohort

Do the following to create a cohort.

  1. Select filters on the Search Scope and Search Configuration panels.

  2. Click the Save As New Cohort button in the top-right of the portal.

    The Save Cohort dialog box appears, showing your selected filters.

  3. Enter a name for the cohort. Optionally, enter a description.

  4. Click Save Cohort. The cohort details page appears.

Save Cohort dialog box

The cohort details page shows the cohort name, description (if available), and filter definition. Attribute filter selections in the Search Configuration panel that have no data available are greyed out and show "0 cases."

Greyed out filter options on the Search Configuration panel

You can open any study or series associated with the cohort using the IDC Viewer. For more information on image visualization, see Visualizing images.

Cohort details page

Accessing the cohort manifest

The cohort manifest identifies the sources of data in your cohort and allows you to download this data. You can export a cohort manifest to CSV, TSV, and JSON formats, and to BigQuery.

You can download cohorts of up to 650,000 rows as a multipart download of up to ten files, with each file limited to 65,000 rows. To access a cohort larger than 650,000 rows, export it to BigQuery.
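The arithmetic behind these limits is simple: a download needs ceil(rows / 65,000) files, and a cohort that would need more than ten files (more than 650,000 rows) must go through BigQuery. A minimal sketch; the function name is hypothetical and the limits are the ones stated in this guide.

```python
import math

ROWS_PER_FILE = 65_000  # per-file row limit stated in this guide
MAX_FILES = 10          # 10 files x 65,000 rows = 650,000 rows


def plan_export(total_rows):
    """Number of files a multipart download needs, or None if BigQuery is required."""
    if total_rows > ROWS_PER_FILE * MAX_FILES:
        return None
    # Even a very small cohort is delivered as at least one file (assumption).
    return max(1, math.ceil(total_rows / ROWS_PER_FILE))


for rows in (40_000, 130_001, 650_000, 650_001):
    print(rows, plan_export(rows))
```

For example, a 130,001-row cohort needs three files, because two files hold at most 130,000 rows.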

To export the cohort manifest

  1. Create a cohort or click Cohorts on the top menu bar and select a cohort you previously created.

  2. Click the Export Cohort Manifest button. The Export Cohort Manifest dialog box appears.

  3. Select the header fields and columns you want to appear in the export file.

  4. Select whether you want to export the cohort as files or using BigQuery.

    • If you want to download the cohort as CSV or TSV files, select Include header fields to include the header fields in the export file.

    • Choose the file format by clicking Download CSV, Download TSV, or Download JSON.

    • To export the cohort to BigQuery instead, click Get BQ Table.

Understanding the cohort manifest

The manifest header contains the cohort name, the user, the filters used, the date the manifest was generated, and the total number of records. Data is split across multiple files of up to 65,000 rows each, up to a maximum of ten files (650,000 rows). If your data set is larger than this, you must export it using BigQuery.

Example cohort manifest

The default fields provided in a cohort manifest are:

  • PatientID: value of the corresponding DICOM attribute

  • collection_id: abbreviated identifier of the source data collection

  • StudyInstanceUID: value of the corresponding DICOM attribute

  • SeriesInstanceUID: value of the corresponding DICOM attribute

  • SOPInstanceUID: value of the corresponding DICOM attribute

  • source_DOI: Digital Object Identifier (DOI) of the source data collection. Prepend https://doi.org/ to source_DOI to get the URL of the collection dataset

  • crdc_instance_uid: unique identifier of the object maintained by CRDC IndexD (details on how to use this UID will be shared at a later time, when the corresponding capability is available)

  • gcs_url: gs:// URL that can be used to access the object using the GCP gsutil tool
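As a sketch of working with these fields, the following Python reads a CSV manifest, builds the collection URL from source_DOI, and generates a gsutil cp command from gcs_url. It assumes the manifest was exported without header fields, so that the first line holds the column names. The helper names, sample row, DOI, and bucket are placeholders, not real IDC values.

```python
import csv
import io
import shlex


def read_manifest(text):
    """Parse a CSV manifest whose first line holds the column names (assumption)."""
    return list(csv.DictReader(io.StringIO(text)))


def collection_url(source_doi):
    """Prepend the DOI resolver to source_DOI, as described above."""
    return "https://doi.org/" + source_doi


def gsutil_copy_command(gcs_url, dest_dir="."):
    """Shell command that copies one object with gsutil (arguments shell-quoted)."""
    return f"gsutil cp {shlex.quote(gcs_url)} {shlex.quote(dest_dir)}"


# Placeholder manifest: not real IDC identifiers or buckets.
SAMPLE = (
    "PatientID,collection_id,source_DOI,gcs_url\n"
    "P1,example_collection,10.0000/example,gs://example-bucket/object-1.dcm\n"
)

for row in read_manifest(SAMPLE):
    print(collection_url(row["source_DOI"]))
    print(gsutil_copy_command(row["gcs_url"], "downloads/"))
```

For large cohorts, one command per object is slow; gsutil's -m (parallel) option combined with cp -I (read the URLs to copy from standard input) is usually more practical.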

Colab notebooks show an example of how to use an IDC cohort manifest to retrieve the files that make up the cohort.

A multipart file export can have a maximum of ten files with 65,000 rows each. If your export is larger than this, you must export the manifest via BigQuery.

Exporting to BigQuery

You must have a Google account to use BigQuery.

Exporting a manifest to BigQuery allows you to run complex, analytical, SQL-based queries on large sets of data. The export table is available for seven days. After you start the export, a window appears on the cohort details page showing the unique link to your table.

Be sure to save this URL or pin the BigQuery table in the Google Cloud console.

After the export table expires, you can create a new manifest for analysis.
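Because the export table is available for seven days, it can help to record when you exported it. A minimal sketch with hypothetical helper names; the exact expiry moment is determined by BigQuery, so treat the computed time as approximate.

```python
from datetime import datetime, timedelta, timezone

EXPORT_LIFETIME = timedelta(days=7)  # availability period stated in this guide


def export_expires_at(exported_at):
    """Approximate time at which an exported manifest table expires."""
    return exported_at + EXPORT_LIFETIME


def is_expired(exported_at, now=None):
    """True once the seven-day availability period has elapsed."""
    now = now or datetime.now(timezone.utc)
    return now >= export_expires_at(exported_at)


exported = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)  # example export time
print(export_expires_at(exported).isoformat())  # 2024-03-08T12:00:00+00:00
```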

Exporting as a file

Comma-Separated Values (CSV) and Tab-Separated Values (TSV) files can include header fields, and you can choose which of those fields to include in the export. You can also select which columns to include in the file.

On the cohort manifest export confirmation window, you can select or clear header fields that you want to appear in your export.

Cohort manifest header field options

JSON files do not include header fields but, like CSV and TSV files, require you to select which columns to include in the export. You can import the JSON file into a BigQuery table for further analysis.
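BigQuery loads JSON data in newline-delimited form (one JSON object per line). If your exported JSON file is instead a single JSON array of records, which is an assumption about the export format, a conversion like the following sketch can prepare it for loading, for example with `bq load --source_format=NEWLINE_DELIMITED_JSON`. The function name and sample record are hypothetical.

```python
import json


def to_ndjson(json_text):
    """Convert a JSON array of records into newline-delimited JSON (one record per line)."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return "".join(json.dumps(record) + "\n" for record in records)


example = '[{"PatientID": "P1", "collection_id": "example_collection"}]'
print(to_ndjson(example), end="")
```

If the file already has one object per line, it can be loaded as is and this step is unnecessary.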

Cohort manifest column options

You must select at least one column to export the cohort manifest.

Viewing the cohorts list

Click the Cohorts button in the header to view a list of all the cohorts you created under your account. The cohorts list includes the corresponding cohort ID, name, description, owner, how many times it has been shared (the ability to share the cohort is not available in the current version of the portal), and the version of the IDC data against which the cohort was created.

Cohorts list

To delete a cohort, click the box in its row and then click the Delete button. You can delete multiple cohorts at once.