Frequently asked questions
We provide a summary of what you can do with IDC, and what you will need to access specific capabilities. If you want to explore the cloud capabilities that require a billing account, and would like to understand the costs before committing your credit card, you can apply for a free Google Cloud credit allocation using this form.
IDC is a cloud-based repository of publicly available cancer imaging data, co-located with analysis and exploration tools and resources.
First, we want to enable you to accomplish a lot of tasks related to data exploration (search, visualization, subsetting, cohort building) without having to download anything to your computer. We accomplish this by co-locating data with the cloud-based tools that support those tasks.
Once you have identified data of interest, you can either download it to your computer and proceed with your analysis workflows, or cross-load the data to a cloud-based VM for further analysis. The latter should make it easier to reproduce and share your analysis workflow.
The IDC pilot release took place in Fall 2020, followed by the production release in September 2021. You can learn about the planned milestones for IDC development in these slides presented at RSNA 2019.
For a dataset to become part of the IDC offering, it has to be de-identified and curated by TCIA, and released as a public TCIA collection. Once this is done, it will (eventually!) be replicated in IDC.
Please cite the paper below:
Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S., Aerts, H. J. W. L., Homeyer, A., Lewis, R., Akbarzadeh, A., Bontempi, D., Clifford, W., Herrmann, M. D., Höfener, H., Octaviano, I., Osborne, C., Paquette, S., Petts, J., Punzo, D., Reyes, M., Schacherer, D. P., Tian, M., White, G., Ziegler, E., Shmulevich, I., Pihl, T., Wagner, U., Farahani, K. & Kikinis, R. NCI Imaging Data Commons. Cancer Res. 81, 4188–4193 (2021). http://dx.doi.org/10.1158/0008-5472.CAN-21-0950
IDC and TCIA are partners in providing FAIR data for cancer imaging researchers. While some of the functions between the two resources are similar, there are also key differences. The table below provides a summary of similarities and differences.
IDC is the right choice for you if you want to...
- Explore metadata, visualize images and annotations, build cohorts from the data included in public TCIA collections
- Analyze TCIA public collections data on the cloud
- Use existing tools such as Google Colab, BigQuery, and DataStudio with the TCIA public collections data
- Perform complex queries against any of the DICOM attributes in the TCIA public collections
- Utilize other resources available in CRDC, such as CRDC Cloud Resources, for data analysis
- Quickly visualize specific images from TCIA public collections
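As a rough illustration of what a query against DICOM attributes might look like, here is a minimal Python sketch that builds a parameterized BigQuery SQL statement. The table name `bigquery-public-data.idc_current.dicom_all`, the helper names `build_cohort_query` and `run_query`, and the choice of the `Modality` and `BodyPartExamined` attributes are assumptions made for this example and are not taken from this page; check the current IDC documentation for the actual dataset and table names before relying on them.

```python
# Assumption: IDC metadata is queryable from a BigQuery table. The table name
# below is an illustrative default; verify it against the IDC documentation.
DEFAULT_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_cohort_query(modality, body_part, table=DEFAULT_TABLE, limit=100):
    """Return (sql, params) selecting series that match two DICOM attributes."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    sql = (
        "SELECT DISTINCT PatientID, StudyInstanceUID, SeriesInstanceUID\n"
        f"FROM `{table}`\n"
        "WHERE Modality = @modality\n"
        "  AND BodyPartExamined = @body_part\n"
        f"LIMIT {limit}"
    )
    # Values are passed as named query parameters, not pasted into the SQL.
    params = {"modality": modality, "body_part": body_part}
    return sql, params


def run_query(sql, params):
    """Run the query with google-cloud-bigquery (needs a GCP project and credentials)."""
    from google.cloud import bigquery  # imported lazily so the builder works offline

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(name, "STRING", value)
            for name, value in params.items()
        ]
    )
    rows = client.query(sql, job_config=job_config).result()
    return [dict(row.items()) for row in rows]


if __name__ == "__main__":
    sql, params = build_cohort_query("CT", "CHEST", limit=10)
    print(sql)
    print(params)
```

Running the script only prints the SQL and its parameters. Calling `run_query(sql, params)` additionally requires the `google-cloud-bigquery` package and a Google Cloud project with billing or credits configured.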
IDC is NOT the right choice for you if you want to...
- Upload and share publicly the imaging dataset you collected
- De-identify your dataset
1. Define relevant imaging data cohorts from public datasets (based on rich standardized metadata).
2. Explore the cohort, check quality (visualize images and image-derived data).
3. Apply off-the-shelf cloud analytics tools to the cohort:
   - BigQuery, DataStudio, Colab Notebooks
   - Further cohort refinement, metadata exploration, analysis
4. Perform exploratory analysis of data on a cloud VM.
5. Scale up analysis.
6. Integrate with other data sources:
   - Private data
   - Non-imaging data from CRDC (i.e., genomics and proteomics)
7. Share analysis results and workflows.
At the moment, non-imaging data, such as the spreadsheets with clinical information, is not replicated in IDC, and it is not possible to search this data using the IDC Portal. You will need to access this data from TCIA.
Our short-term plan is to selectively make such spreadsheets available as collection-specific BigQuery tables within the release dataset (for example, such tables are already available for the NLST collection). We may expose some of those tables/attributes in the IDC Portal.
Our longer-term plan is to work with the CRDC Center for Cancer Data Harmonization (CCDH) to harmonize the data in these spreadsheets, and identify the appropriate format and location for the resulting harmonized data.