IDC User Guide

Resolving CRDC Globally Unique Identifiers (GUIDs)


Last updated 9 months ago


As described in the Data Versioning section, a UUID identifies a particular version of an IDC data object. Thus, there is a UUID for every version of every DICOM instance in IDC-hosted data. An IDC BigQuery manifest can optionally include the UUID (called crdc_instance_uuid) of each instance (version) in the cohort.

From the GA4GH Data Repository Service API specification:

"The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data objects in a single, standard way regardless of where they are stored and how they are managed. The primary functionality of DRS is to map a logical ID to a means for physically retrieving the data represented by the ID."

Each such UUID can be used to form a DRS ID that has been indexed by the NCI CRDC Data Commons Framework (DCF), and that can be used to access data that describes the object. In particular, this data includes the GCS and AWS URLs of the DICOM instance file. Although the GCS or AWS URL of an instance might change over time, the UUID of an instance can always be resolved to obtain its current URLs. Thus, for long-term curation of data, it is recommended to record instance UUIDs.

The data object returned by the server is a GA4GH DRS DrsObject.

For example, 641121f1-5ca0-42cc-9156-fb5538c14355 is a typical IDC instance UUID of a (version of a) DICOM instance, and dg.4DFC/641121f1-5ca0-42cc-9156-fb5538c14355 is the corresponding DRS ID.
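The mapping from an instance UUID to its DRS ID in the example above can be sketched in Python. This sketch assumes that every IDC instance UUID takes the same `dg.4DFC` prefix seen in the example; the function name `uuid_to_drs_id` is hypothetical and not part of any IDC or CRDC library.

```python
import uuid

# Prefix taken from the example DRS ID on this page; assumed (not confirmed)
# to apply to every IDC instance UUID.
IDC_DRS_PREFIX = "dg.4DFC"


def uuid_to_drs_id(instance_uuid: str) -> str:
    """Form a DRS ID of the form 'dg.4DFC/<uuid>' from a crdc_instance_uuid."""
    # uuid.UUID raises ValueError for malformed input; str() normalizes to
    # the lowercase, hyphenated form used in IDC manifests.
    canonical = str(uuid.UUID(instance_uuid))
    return f"{IDC_DRS_PREFIX}/{canonical}"


print(uuid_to_drs_id("641121f1-5ca0-42cc-9156-fb5538c14355"))
# dg.4DFC/641121f1-5ca0-42cc-9156-fb5538c14355
```

Validating the UUID first means a typo in a recorded identifier fails early, rather than producing a DRS ID that the resolver cannot find.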

A DRS ID can be resolved by appending it to the following URL, which is the resolution service within CRDC: https://nci-crdc.datacommons.io/ga4gh/drs/v1/objects/ . For example, the following curl command:

>> curl https://nci-crdc.datacommons.io/ga4gh/drs/v1/objects/dg.4DFC/641121f1-5ca0-42cc-9156-fb5538c14355

returns this DrsObject:

{
  "access_methods": [
    {
      "access_id": "gs",
      "access_url": {
        "url": "gs://public-datasets-idc/cc9c8541-949d-48d9-beaf-7028aa4906dc/641121f1-5ca0-42cc-9156-fb5538c14355.dcm"
      },
      "region": "",
      "type": "gs"
    },
    {
      "access_id": "s3",
      "access_url": {
        "url": "s3://idc-open-data/cc9c8541-949d-48d9-beaf-7028aa4906dc/641121f1-5ca0-42cc-9156-fb5538c14355.dcm"
      },
      "region": "",
      "type": "s3"
    }
  ],
  "aliases": [],
  "checksums": [
    {
      "checksum": "f338e8c5e3d8955d222a04d5f3f6e2b4",
      "type": "md5"
    }
  ],
  "created_time": "2020-06-01T00:00:00",
  "description": "DICOM instance",
  "form": "object",
  "id": "dg.4DFC/641121f1-5ca0-42cc-9156-fb5538c14355",
  "index_created_time": "2023-06-26T18:27:45.810110",
  "index_updated_time": "2023-06-26T18:27:45.810110",
  "mime_type": "application/json",
  "name": "1.3.6.1.4.1.14519.5.2.1.7695.1700.277743171070833720282648319465",
  "self_uri": "drs://dg.4DFC:641121f1-5ca0-42cc-9156-fb5538c14355",
  "size": 135450,
  "updated_time": "2020-06-01T00:00:00",
  "version": "IDC version: 1"
}

As can be seen, the access_methods component of the returned DrsObject includes a URL for the corresponding file in Google Cloud Storage (GCS) and another in AWS S3.
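The resolution step and the use of access_methods described above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client. The function names are hypothetical, and the sketch assumes the resolver returns JSON shaped like the DrsObject shown above. It also uses the md5 entry under `checksums` to verify a file that you have downloaded separately. The download itself is not shown.

```python
import hashlib
import json
import urllib.request

# CRDC DRS resolution endpoint from this page; the DRS ID is appended to it.
DRS_RESOLVER = "https://nci-crdc.datacommons.io/ga4gh/drs/v1/objects/"


def resolve_drs_id(drs_id: str, timeout: float = 30.0) -> dict:
    """Fetch the DrsObject for a DRS ID (same request as the curl command)."""
    url = DRS_RESOLVER + drs_id
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # json.load accepts the bytes returned by resp.read().
        return json.load(resp)


def access_urls(drs_object: dict) -> dict:
    """Map each access method type (e.g. 'gs', 's3') to its current URL."""
    return {
        method["type"]: method["access_url"]["url"]
        for method in drs_object.get("access_methods", [])
        if "access_url" in method
    }


def md5_matches(data: bytes, drs_object: dict) -> bool:
    """Compare downloaded bytes with the md5 checksum listed in the DrsObject."""
    for entry in drs_object.get("checksums", []):
        if entry.get("type") == "md5":
            return hashlib.md5(data).hexdigest() == entry["checksum"]
    raise ValueError("DrsObject lists no md5 checksum")
```

For example, `access_urls(resolve_drs_id("dg.4DFC/641121f1-5ca0-42cc-9156-fb5538c14355"))` would return a dict with "gs" and "s3" keys holding the URLs listed in the DrsObject, provided the service is reachable. Because these URLs can change over time, the sketch resolves them at use time from the recorded UUID instead of storing them.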
