IDC User Guide


Data release notes


Last updated 4 months ago


Data hosted by IDC is ingested from several sources, including , , and .

Please refer to the license and terms of use, which are defined in the license_url and source_doi of the IDC BigQuery . You can filter the data by license type in the .
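As a minimal sketch of how the license_url and source_doi values mentioned above could be summarized per collection, the helper below only builds a BigQuery SQL string. The default table name `bigquery-public-data.idc_current.dicom_all` and the `collection_id` grouping column are assumptions for illustration; substitute the table that the documentation refers to.

```python
def license_summary_sql(
    table: str = "bigquery-public-data.idc_current.dicom_all",  # assumed table name
) -> str:
    """Build a SQL string that counts rows per collection, license, and DOI.

    license_url and source_doi are the columns named in the text above;
    collection_id is assumed to be present in the chosen table.
    """
    return (
        "SELECT collection_id, license_url, source_doi, COUNT(*) AS row_count\n"
        f"FROM `{table}`\n"
        "GROUP BY collection_id, license_url, source_doi\n"
        "ORDER BY collection_id"
    )


if __name__ == "__main__":
    # Print the query so it can be pasted into the BigQuery console.
    print(license_summary_sql())
```

The function deliberately stops at producing the query text, so it can be run through whichever BigQuery client or console is already in use.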

IDC releases summary view

v20 - November 2024

New radiology collections

New pathology collections

Revised radiology collections

Revised pathology collections

Revised analysis results

    The segmentation of an instance in each of the following series was excluded due to having a DICOM PixelData size greater than or equal to 2GB:

    1. 1.2.826.0.1.3680043.10.511.3.10544506665348704312902213950958190

    2. 1.2.826.0.1.3680043.10.511.3.11183783347037364699862133130586654

    3. 1.2.826.0.1.3680043.10.511.3.11834745481756047014039855874680259

    4. 1.2.826.0.1.3680043.10.511.3.11901667084519361717338400810055642

    5. 1.2.826.0.1.3680043.10.511.3.12041600048156613329793822566495651

    6. 1.2.826.0.1.3680043.10.511.3.12718116375608495830041119776887887

    7. 1.2.826.0.1.3680043.10.511.3.13386724401829265460622415500801368

    8. 1.2.826.0.1.3680043.10.511.3.14042734131864468280344737986870899

    9. 1.2.826.0.1.3680043.10.511.3.17374765903080083648409690755539184

    10. 1.2.826.0.1.3680043.10.511.3.17429002643681869326389465422353495

    11. 1.2.826.0.1.3680043.10.511.3.20359930476040698387716730891020638

    12. 1.2.826.0.1.3680043.10.511.3.28397033639127902823368316410884210

    13. 1.2.826.0.1.3680043.10.511.3.28425539132321749931109935391487352

    14. 1.2.826.0.1.3680043.10.511.3.34574227972763695321794092913087775

    15. 1.2.826.0.1.3680043.10.511.3.36216094237641867532902805456135029

    16. 1.2.826.0.1.3680043.10.511.3.39533936694797964318706337783276378

    17. 1.2.826.0.1.3680043.10.511.3.39900930856460689132625586523683939

    18. 1.2.826.0.1.3680043.10.511.3.41633795217567037218184715094985555

    19. 1.2.826.0.1.3680043.10.511.3.42218106649761752724553401155203874

    20. 1.2.826.0.1.3680043.10.511.3.49098870621170235412220976183110770

    21. 1.2.826.0.1.3680043.10.511.3.50064322235999800062455171235601125

    22. 1.2.826.0.1.3680043.10.511.3.50905421517530127976832505410705816

    23. 1.2.826.0.1.3680043.10.511.3.62935684444056080516153739948364303

    24. 1.2.826.0.1.3680043.10.511.3.73572792121235596011940904319511291

    25. 1.2.826.0.1.3680043.10.511.3.74494366757564543824303304482444570

    26. 1.2.826.0.1.3680043.10.511.3.79988146996803179892075404247166692

    27. 1.2.826.0.1.3680043.10.511.3.80004293150506819482091023564947091

    28. 1.2.826.0.1.3680043.10.511.3.82774274518897141254234567300292686

    29. 1.2.826.0.1.3680043.10.511.3.84202416467561501610598853920808906

    30. 1.2.826.0.1.3680043.10.511.3.86214492184712627544696209982376598

    31. 1.2.826.0.1.3680043.10.511.3.90193069664920622990317347485104073

    32. 1.2.826.0.1.3680043.10.511.3.95666157880521064637011880609274546

    33. 1.2.826.0.1.3680043.10.511.3.96676982370873257329281821215166082

    34. 1.2.826.0.1.3680043.10.511.3.98258035017480972315346136181769675
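To illustrate the size criterion above, here is a rough sketch that estimates the uncompressed PixelData size of a multi-frame object and compares it with the limit. It assumes that "2GB" means 2 GiB (2**31 bytes) and that frames are stored uncompressed; the function names are hypothetical and this is not the check used by IDC.

```python
TWO_GIB = 2 ** 31  # assumption: "2GB" is interpreted as 2 GiB


def pixel_data_bytes(rows, columns, frames, bits_allocated=1, samples_per_pixel=1):
    """Approximate uncompressed PixelData size in bytes.

    bits_allocated=1 corresponds to a bit-packed binary segmentation;
    use 8 or 16 for other encodings.
    """
    total_bits = rows * columns * frames * samples_per_pixel * bits_allocated
    return (total_bits + 7) // 8  # round up to whole bytes


def exceeds_limit(rows, columns, frames, bits_allocated=1,
                  samples_per_pixel=1, limit=TWO_GIB):
    """Return True when the estimated size is greater than or equal to the limit."""
    size = pixel_data_bytes(rows, columns, frames, bits_allocated, samples_per_pixel)
    return size >= limit


if __name__ == "__main__":
    # A binary segmentation with 4096 x 4096 frames reaches the limit at 1024 frames.
    print(exceeds_limit(4096, 4096, 1024))
    print(exceeds_limit(4096, 4096, 1023))
```

The estimate ignores compression and any padding, so it is only a first-order indication of whether an instance would be affected.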

New clinical metadata tables

v19 - September 2024

New pathology collections

New analysis results

Revised radiology collections

Cancer Moonshot Biobank (CMB) radiology images were updated to fix incorrect values assigned to PatientID (see details on the collection pages linked above). The updated images have different DICOM Study/Series/SOPInstanceUIDs.

Revised analysis results

New clinical metadata tables

v18 - April 2024

New radiology collections

New analysis results

Revised radiology collections

(starred collections are revised due to new or revised analysis results)

Revised pathology collections

(starred collections are revised due to new or revised analysis results)

    1. Also added missing instance SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.3459553143.523311062.1687086765943.9.0

    2. Removed corrupted instances

      1. SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2164023716.1899467316.1685791236516.37.0

      2. SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2411736851.773458418.1686038949651.37.0

      3. SOPInstanceUID: 1.3.6.1.4.1.5962.99.1.2411736851.773458418.16860389

  1. TCGA-DLBC (No description page)

New clinical metadata tables

Notes

The deprecated columns tcia_api_collection_id and idc_webapp_collection_id have been removed from the auxiliary_metadata table in the idc_v18 BQ dataset. These columns were duplicates of columns collection_name and collection_id respectively.
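Queries written against earlier versions may still reference the removed columns. The sketch below rewrites such references using the mapping stated above; the helper name `update_query` is hypothetical, and the replacement is purely textual, so it does not parse SQL and would also rewrite matches inside string literals.

```python
import re

# Mapping taken from the note above: each deprecated column duplicated
# the column it is mapped to here.
RENAMED_COLUMNS = {
    "tcia_api_collection_id": "collection_name",
    "idc_webapp_collection_id": "collection_id",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in RENAMED_COLUMNS) + r")\b"
)


def update_query(sql: str) -> str:
    """Replace deprecated auxiliary_metadata column names with their successors."""
    return _PATTERN.sub(lambda match: RENAMED_COLUMNS[match.group(1)], sql)


if __name__ == "__main__":
    old = "SELECT idc_webapp_collection_id, tcia_api_collection_id FROM t"
    print(update_query(old))
```

Because the word-boundary match treats underscores as part of a name, longer identifiers that merely contain one of the deprecated names are left untouched.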

v17 - December 2023

New radiology collections

New analysis results

  1. Collections analyzed:

Revised radiology collections

New clinical metadata tables

v16 - September 2023

New radiology collections

New pathology collections

Revised radiology collections

New analysis results

New clinical metadata tables

v15 - July 2023

New radiology collections

New pathology collections

Revised radiology collections

Revised pathology collections

New analysis results

Revised analysis results

New clinical metadata tables

v14 - May 2023

v13 - Mar 2023

New analysis results collection:

New clinical data collections:

v12 - Nov 2022

New collections:

Updated collections:

Other:

Metadata corresponding to "limited" access collections are removed.

New clinical data collections:

Other clinical data updates:

Limited access collections are removed. Clinical metadata for the COVID-19-NY-SUB and ACRIN 6698/I-SPY2 Breast DWI collections now includes information ingested from data dictionaries associated with these collections. In v11 the string value 'NA' was being changed to null during the ETL process for some columns/collections. This is now fixed in v12 and the value 'NA' is preserved.

v11 - Sept 2022

This release introduces clinical data ingested for a subset of collections, and now available via a dedicated BigQuery dataset.

New collections:

v10 - Aug 2022

New collections:

Updated collections:

CPTAC, TCGA and NLST collections have been reconverted due to a technical issue identified with a subset of images included in v9.

  1. TCGA-DLBC

  • TCGA-KIRP: PatientID TCGA-5P-A9KA, StudyInstanceUID 2.25.191236165605958868867890945341011875563

  • TCGA-BRCA: PatientID TCGA-OL-A66H, StudyInstanceUID 2.25.82800314486527687800038836287574075736

The affected files will be included in IDC when the infrastructure limitation is addressed.

Collection access level change:

v9 - May 2022

This data release introduces the concept of differential licensing to IDC: some of the collections maintained by IDC contain items that have different licenses. For example, the radiology component of the TCGA-GBM collection is covered by the TCIA limited access license and is not available in IDC, while the digital pathology component is covered by CC-BY. With this release, we complete the sharing in full of the digital pathology component of the datasets released by the CPTAC and TCGA programs.

New collections:

Updated collections:

v8 - April 2022

The main highlight of this release is the addition of the NLST and TCGA Slide Microscopy imaging data. New TCGA content includes new (to IDC) TCGA collections that have only a slide microscopy component, and the addition of the slide microscopy component to IDC collections that were available earlier and included only the radiology component.

New collections

  1. TCGA-DLBC (TCGA-DLBC collection does not have a description page)

Updated collections

v7 - February 2022

The main highlight of this release is the addition of the Slide Microscopy imaging component to the remaining CPTAC collections.

New collections

Updated collections

v6 - January 2022

Original collections:

Analysis results collections:

v5 - December 2021

New collections:

New analysis results collections:

Updated collections:

v4 - September 2021

1) CT images available as any other imaging collection (via IDC Portal, BigQuery metadata tables, and storage buckets);

3) One instance is missing from patient/study/series: 126153/1.2.840.113654.2.55.319335498043274792486636919135185299851/1.2.840.113654.2.55.262421043240525317038356381369289737801

4) Three instances are missing from patient/study/series: 215303/1.3.6.1.4.1.14519.5.2.1.7009.9004.337968382369511017896638591276/1.3.6.1.4.1.14519.5.2.1.7009.9004.180224303090109944523368212991

v3 - August 2021

The DICOM Slide Microscopy (SM) images included in the collections above in IDC are not available in TCIA. TCIA only includes images in the vendor-specific SVS format!

v2 - June 2021

New original collections:

New analysis results collections:

v1 - October 2020

Original collections included:

Analysis collections included:

Collections analyzed:

Collections analyzed:

WARNING: After the release of v20, it was discovered that a mistake made during data conversion affected the newly released segmentations accompanying the "RMS-Mutation-Prediction" collection. Segmentations released in v20 for this collection have the segment labels for alveolar rhabdomyosarcoma (ARMS) and embryonal rhabdomyosarcoma (ERMS) switched in the metadata relative to the correct labels. Thus segment 3 in the released files is labelled in the metadata (the SegmentSequence) as ARMS but should be interpreted as ERMS, and conversely segment 4 in the released files is labelled as ERMS but should be interpreted as ARMS. We apologize for the mistake and any confusion it has caused, and will release a corrected version of the files as soon as possible in an upcoming release.

Collections analyzed:
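Until corrected files are available, the label swap described in the warning above can be handled when reading the v20 segmentations. The following is a minimal sketch, assuming the segment number and label have already been read from the SegmentSequence; the function name and the short label strings "ARMS" and "ERMS" are illustrative, and the actual label text stored in the files may differ.

```python
# Correct interpretation of the affected segments, as stated in the warning:
# segment 3 is labelled ARMS but is ERMS; segment 4 is labelled ERMS but is ARMS.
CORRECT_LABELS = {3: "ERMS", 4: "ARMS"}


def corrected_label(segment_number: int, released_label: str) -> str:
    """Return the label to use for a segment of a v20 RMS-Mutation-Prediction file.

    Segments other than 3 and 4 keep the label found in the file.
    """
    return CORRECT_LABELS.get(segment_number, released_label)


if __name__ == "__main__":
    print(corrected_label(3, "ARMS"))
    print(corrected_label(4, "ERMS"))
```

This correction applies only to the v20 files of this collection and should not be applied to corrected files from later releases.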

Collections analyzed:

Collections analyzed:

Collections analyzed:

* Collections analyzed:

** Collections analyzed:

(revisions only to clinical data)

**

(fix PatientAges > 090Y)

(fix PatientAges > 090Y)

*

(All TCGA revisions are to correct multiple manufacturer values within same series)

Collections analyzed:

(TCIA description: “Repair of DICOM tag (0008,0005) to value "ISO_IR 100" in 79 series”)

(Revised because results from CPTAC-CRCC-Tumor-Annotations were added)

(Revised because results from CPTAC-UCEC-Tumor-Annotations were added)

(Revised because results from CPTAC-PDA-Tumor-Annotations were added)

(ICDC-Glioma radiology added in a previous version)

(TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)

(TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)

(TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)

(TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)

(TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)

(TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)

(TCIA description: “Radiology modality data cleanup to remove extraneous scans.”)

(TCIA description: “Added DICOM version of MED_ABD_LYMPH_MASKS.zip segmentations that were previously available”)

(Revised because QIBA-VolCT-1B analysis results were added)

(Revised because analysis results from nnU-Net-BPR-Annotations were revised)

(Revised because analysis results from nnU-Net-BPR-Annotations were revised)

(11 pathology-only patients removed at request of data owner)

(1 pathology-only patient removed at request of data owner)

(Analysis of NLST and NSCLC-Radiomics)

(Annotations of NLST and NSCLC-Radiomics radiology)

This release does not introduce any new data, but changes the bucket organization and introduces replication of IDC files in Amazon AWS storage buckets, as described in .

In this release we introduce a new HTAN program, which currently includes three collections released by the .

*

*

Note that the TCGA-KIRP and TCGA-BRCA collections (marked with an asterisk in the list above) are currently missing SM high resolution layer files/instances due to a limitation of Google Healthcare that makes it impossible to ingest datasets that exceed some internal limits. Specifically, the following patient/studies are affected:

is now available as public access collection

The following collections became limited access due to the , which is the original source of those collections.

Outcome Prediction in Patients with Glioblastoma by Using Imaging, Clinical, and Genomic Biomarkers: Focus on the Nonenhancing Component of the Tumor ()

DICOM-SEG Conversions for TCGA-LGG and TCGA-GBM Segmentation Datasets ()

is added. The data included consists of the following components:

2) a subset of clinical data available in the BigQuery tables starting with nlst_ under the idc_v4 dataset, as documented in the section.
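One way to find the tables mentioned above is to list the tables in the idc_v4 dataset whose names start with nlst_. The sketch below only builds the SQL text for BigQuery's INFORMATION_SCHEMA.TABLES view; the project name `bigquery-public-data` is an assumption, and the function name is hypothetical.

```python
def nlst_tables_sql(
    project: str = "bigquery-public-data",  # assumed hosting project
    dataset: str = "idc_v4",
    prefix: str = "nlst_",
) -> str:
    """Build a SQL string listing tables in a dataset whose names start with a prefix."""
    return (
        "SELECT table_name\n"
        f"FROM `{project}`.{dataset}.INFORMATION_SCHEMA.TABLES\n"
        f"WHERE STARTS_WITH(table_name, '{prefix}')\n"
        "ORDER BY table_name"
    )


if __name__ == "__main__":
    # Print the query so it can be run in the BigQuery console.
    print(nlst_tables_sql())
```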

The following radiology collections were updated to include DICOM Slide Microscopy (SM) images converted from the original vendor-specific representation into .

Listed below are all of the and collections of currently hosted by IDC, with the links to the Digital Object Identifiers (DOIs) of those collections.

Listed below are all of the and collections of currently hosted by IDC, with the links to the Digital Object Identifiers (DOIs) of those collections.

(only items corresponding to the LIDC-IDRI original collection are included)

(only items corresponding to the ISPY1 original collection are included)

Mediastinal-Lymph-Node-SEG
Spine-Mets-CT-SEG
CMB-BRCA
CMB-OV
CMB-AML
CMB-CRC
CMB-GEC
CMB-LCA
CMB-MEL
CMB-MML
CMB-PCA
CCDI-MCI
CMB-AML
CMB-CRC
CMB-GEC
CMB-LCA
CMB-MEL
CMB-MML
CMB-PCA
BAMF-AIMI-Annotations
UPENN-GBM
Pan-Cancer-Nuclei-Seg-DICOM
TCGA-BLCA
TCGA-BRCA
TCGA-CESC
TCGA-COAD
TCGA-GBM
TCGA-LUAD
TCGA-LUSC
TCGA-PAAD
TCGA-PRAD
TCGA-READ
TCGA-SKCM
TCGA-STAD
TCGA-UCEC
TCGA-UVM
RMS-Mutation-Prediction-Expert-Annotations
RMS-Mutation-Prediction
mediastinal_lymph_node_seg_clinical
spine_mets_ct_seg_clinical
CCDI-MCI
CMB-AML
CMB-CRC
CMB-GEC
CMB-LCA
CMB-MEL
CMB-MML
CMB-PCA
GTEx
Pancreas-CT-SEG
Pancreas-CT
Pan-Cancer-Nuclei-Seg-DICOM
TCGA-BLCA
TCGA-BRCA
TCGA-CESC
TCGA-COAD
TCGA-GBM
TCGA-LUAD
TCGA-LUSC
TCGA-PAAD
TCGA-PRAD
TCGA-READ
TCGA-SKCM
TCGA-STAD
TCGA-UCEC
TCGA-UVM
Advanced-MRI-Breast-Lesions
CMB-AML
CMB-CRC
CMB-GEC
CMB-LCA
CMB-MEL
CMB-MML
CMB-PCA
CPTAC-CCRCC
CPTAC-LSCC
CPTAC-UCEC
NLM-Visible-Human-Project
RIDER Lung CT
BAMF-AIMI-Annotations
ACRIN-NSCLC-FDG-PET
Anti-PD-1_Lung
Colorectal-Liver-Metastases
CPTAC-CCRCC
Duke-Breast-Cancer-MRI
HCC-TACE-Seg
Lung-PET-CT-Dx
NLST
NSCLC Radiogenomics
Prostate-MRI-US-Biopsy
PROSTATEx
QIN-BREAST
QIN LUNG CT
RIDER Lung PET-CT
SPIE-AAPM Lung CT Challenge
TCGA-KICH
TCGA-KIRC
TCGA-KIRP
TCGA-LIHC
TCGA-LUAD
TCGA-LUSC
UPENN-GBM
acrin_contralateral_breast_mr_A0
acrin_contralateral_breast_mr_AB
acrin_contralateral_breast_mr_F1
acrin_contralateral_breast_mr_I1
acrin_contralateral_breast_mr_IA
acrin_contralateral_breast_mr_IM
acrin_contralateral_breast_mr_IS
acrin_contralateral_breast_mr_KS
acrin_contralateral_breast_mr_MS
acrin_contralateral_breast_mr_M4
acrin_contralateral_breast_mr_P8
acrin_contralateral_breast_mr_PA
acrin_contralateral_breast_mr_PD
acrin_contralateral_breast_mr_PE
acrin_contralateral_breast_mr_PR
acrin_contralateral_breast_mr_QA
advanced_mri_breast_lesions_clinical
upenn_gbm
Advanced-MRI-Breast-Lesions
RMS-Mutation-Prediction-Expert-Annotations
RMS-Mutation-Prediction
TotalSegmentator-CT-Segmentations
NLST
Breast-Cancer-Screening-DBT
NLST
CPTAC-BRCA
CPTAC-COAD
RMS-Mutation-Prediction
TCGA-BLCA
TCGA-BRCA
TCGA-CHOL
TCGA-COAD
TCGA-ESCA
TCGA-HNSC
TCGA-KIRC
TCGA-KIRP
TCGA-LIHC
TCGA-LUAD
TCGA-LUSC
TCGA-PAAD
TCGA-PRAD
TCGA-READ
TCGA-SARC
TCGA-SKCM
TCGA-STAD
TCGA-TGCT
TCGA-THCA
TCGA-THYM
TCGA-UCEC
TCGA-UCS
acrin_nsclc_fdg_pet_bamf_lung_pet_ct_segmentation
anti_pd_1_lung_bamf_lung_ct_segmentation
anti_pd_1_lung_bamf_lung_fdg_pet_ct_segmentation
lung_pet_ct_dx_bamf_lung_ct_segmentation
lung_pet_ct_dx_bamf_lung_fdg_pet_ct_segmentation
nsclc_radiogenomics_bamf_lung_ct_segmentation
nsclc_radiogenomics_bamf_lung_fdg_pet_ct_segmentation
prostatex_bamf_segmentations
qin_breast_bamf_breast_segmentation
rider_lung_pet_ct_bamf_lung_ct_segmentation
rider_lung_pet_ct_bamf_lung_fdg_pet_ct_segmentation
tcga_kirc_bamf_kidney_segmentation
tcga_lihc_bamf_liver_ct_segmentation
tcga_lihc_bamf_liver_mr_segmentation
tcga_luad_bamf_lung_ct_segmentation
tcga_luad_bamf_lung_mr_segmentation
tcga_lusc_bamf_lung_ct_segmentation
tcga_lusc_bamf_lung_mr_segmentation
CMB-AML
CT-Phantom4Radiomics
EA1141
ReMIND
Vestibular-Schwannoma-MC-RC
BAMF-AIMI-Annotations
ACRIN-NSCLC-FDG-PET
Anti-PD-1-Lung
LUNG-PET-CT-Dx
NSCLC Radiogenomics
ProstateX
QIN-Breast
RIDER Lung PET-CT
TCGA-KIRC
TCGA-LIHC
TCGA-LUAD
TCGA-LUSC
Prostate-MRI-US-Biopsy-DICOM-Annotations
Prostate-MRI-US-Biopsy
Prostate-MRI-US-Biopsy
CMB-CRC
CMB-GEC
CMB-LCA
CMB-MEL
CMB-MML
CMB-PCA
CPTAC-CCRCC
CPTAC-PDA
ea1141_demographics
ea1141_mri
ea1141_risk_model
ea1141_screening
ea1141_status_12mo
ea1141_status_6mo
ea1141_tomosynthesis
htan_ohsu_demographics
htan_vanderbilt_demographics
htan_vanderbilt_diagnosis
htan_vanderbilt_exposure
htan_vanderbilt_familyhistory
htan_vanderbilt_followup
htan_vanderbilt_moleculartest
htan_vanderbilt_therapy
remind_clinical