IDC User Guide

Data known issues

Last updated 3 years ago

  1. Indexing of the NSCLC-Radiomics collection by the Data Commons Framework is pending.

  2. QIN multi-site collection of Lung CT data with Nodule Segmentations: only items corresponding to the LIDC-IDRI original collection are included.

  3. DICOM SR of clinical data and measurement for breast cancer collections to TCIA: only items corresponding to the ISPY1 original collection are included.

  4. ISPY1 (ACRIN 6657): some of the segmentations in this collection are empty (for example, SeriesNumber 42100 with SeriesDescription "VOI PE Segmentation thresh=70" in this study is empty).

  5. Due to existing limitations of the Google Healthcare API, not all DICOM attributes are extracted and available in BigQuery tables. Specifically:

    • sequences that have more than 15 levels of nesting are not extracted (see https://cloud.google.com/bigquery/docs/nested-repeated). We believe this limitation does not affect the data stored in IDC.

    • sequences that contain around 1 MiB of data are currently dropped from the BigQuery export and from RetrieveMetadata output. 1 MiB is not an exact limit, but it can be used as a rough estimate of whether the API will drop the tag (this limitation was not documented at the time of writing). We know that some of the instances in IDC are affected by this limitation. According to communication with Google Healthcare support, the fix for this limitation is targeted for sometime in 2021.
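
To get a rough idea of whether a given instance might be affected by the BigQuery export limitations described in item 5, the following minimal sketch walks instance metadata expressed in the DICOM JSON model (a dictionary of tags, where sequences have "vr" equal to "SQ" and their items listed under "Value"). The function names `sequence_depth` and `flag_sequences` are hypothetical and not part of any IDC or Google tooling. Measuring size as the length of the JSON serialization is an assumption made for illustration only; how the Healthcare API measures the roughly 1 MiB threshold is not documented here, so treat the result as a hint rather than a definitive answer.

```python
import json

MAX_NESTING_LEVELS = 15             # sequences nested deeper than this are not extracted
APPROX_SIZE_LIMIT_BYTES = 1 << 20   # ~1 MiB; a rough estimate, not an exact limit


def sequence_depth(dataset):
    """Return the deepest level of SQ nesting in a DICOM JSON dataset (0 if none)."""
    deepest = 0
    for element in dataset.values():
        if element.get("vr") == "SQ":
            items = element.get("Value", [])
            child = max((sequence_depth(item) for item in items), default=0)
            deepest = max(deepest, 1 + child)
    return deepest


def flag_sequences(dataset):
    """Map each top-level sequence tag to the reasons it may be missing from BigQuery."""
    flagged = {}
    for tag, element in dataset.items():
        if element.get("vr") != "SQ":
            continue
        reasons = []
        depth = sequence_depth({tag: element})
        if depth > MAX_NESTING_LEVELS:
            reasons.append(f"nesting depth {depth} exceeds {MAX_NESTING_LEVELS}")
        # Assumption: JSON-serialized length stands in for the size the API checks.
        size = len(json.dumps(element).encode("utf-8"))
        if size >= APPROX_SIZE_LIMIT_BYTES:
            reasons.append(f"approximately {size} bytes when serialized")
        if reasons:
            flagged[tag] = reasons
    return flagged
```

Calling `flag_sequences(metadata)` on the parsed JSON metadata of one instance returns an empty dictionary when no top-level sequence crosses either threshold. For any tag that is flagged, the attribute is better read from the DICOM file itself than from the BigQuery tables.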