Data known issues

  1. Indexing of the NSCLC-Radiomics collection by the Data Commons Framework is pending.

  2. QIN multi-site collection of Lung CT data with Nodule Segmentations: only items corresponding to the original LIDC-IDRI collection are included.

  3. DICOM SR of clinical data and measurement for breast cancer collections to TCIA: only items corresponding to the original ISPY1 collection are included.

  4. ISPY1 (ACRIN 6657): Some segmentations in this collection are empty (for example, the series with SeriesNumber 42100 and SeriesDescription "VOI PE Segmentation thresh=70" in this study).

  5. Due to current limitations of the Google Healthcare API, not all DICOM attributes are extracted and made available in the BigQuery tables. Specifically:

    • Sequences with more than 15 levels of nesting are not extracted (see https://cloud.google.com/bigquery/docs/nested-repeated). We believe this limitation does not affect the data stored in IDC.

    • Sequences that contain around 1 MiB of data are currently dropped from both the BigQuery export and the RetrieveMetadata output. 1 MiB is not an exact limit, but it serves as a rough estimate of whether the API will drop the tag (this limitation was not documented at the time of writing). We know that some of the instances in IDC are affected by this limitation. According to communication with Google Healthcare support, a fix is targeted for sometime in 2021.
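
To gauge whether a given instance might be affected by the two sequence limitations above, one option is to inspect its metadata in the DICOM JSON model before relying on the BigQuery tables. The sketch below is illustrative only and is not how the Google Healthcare API decides what to drop: the helper names `sequence_depth` and `at_risk_sequences` are hypothetical, the serialized JSON size is used as a rough stand-in for "amount of data in a sequence", and the 1 MiB threshold is the approximate figure quoted above rather than an exact limit.

```python
import json

ONE_MIB = 1024 * 1024  # approximate threshold from the note above, not an exact limit
MAX_NESTING = 15       # nesting limit cited above


def sequence_depth(dataset):
    """Return the deepest level of SQ nesting in a DICOM JSON dataset (0 if none)."""
    deepest = 0
    for element in dataset.values():
        if element.get("vr") == "SQ":
            items = element.get("Value", [])
            # Recurse into each sequence item; an empty sequence still counts as one level.
            child = max((sequence_depth(item) for item in items), default=0)
            deepest = max(deepest, 1 + child)
    return deepest


def at_risk_sequences(dataset, size_limit=ONE_MIB):
    """List top-level SQ tags whose serialized JSON size reaches size_limit (an assumed proxy)."""
    flagged = []
    for tag, element in dataset.items():
        if element.get("vr") == "SQ":
            size = len(json.dumps(element).encode("utf-8"))
            if size >= size_limit:
                flagged.append(tag)
    return flagged


# Made-up instance metadata: a content sequence nested two levels deep,
# carrying a text value of about 1 MiB.
instance = {
    "00080060": {"vr": "CS", "Value": ["SR"]},
    "0040A730": {"vr": "SQ", "Value": [
        {"0040A730": {"vr": "SQ", "Value": [
            {"0040A160": {"vr": "UT", "Value": ["x" * ONE_MIB]}}
        ]}}
    ]},
}

print(sequence_depth(instance) > MAX_NESTING)  # False: only 2 levels of nesting
print(at_risk_sequences(instance))             # ['0040A730']
```

Because the size check is only a proxy, a flagged tag is a hint to verify that attribute against the original DICOM file rather than proof that it was dropped.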
