Getting started with GCP
Whether you are new to the cloud or consider yourself an expert, we encourage you to apply for the free Google Cloud credits that we provide to support cancer imaging research projects that work with Imaging Data Commons. All reasonable requests will receive a $300 allocation of credits that do not expire, and we will not require you to provide credit card information to verify your identity. All you have to do is fill out and submit this application form.
You are also encouraged to review the slides from the following presentation, which provides an introduction to GCP and shares some best practices for its usage.
W. Longabaugh. Introduction to Google Cloud Platform. Presented at MICCAI 2021. (slides in Google Slides)
Google Cloud Platform provides a range of solutions to better understand and analyze data hosted by IDC. Depending on what you want to do (see the range of options here), you may need to complete one or more of the following steps.
The steps concerning creating a Google Cloud project and setting up billing are covered in this short video tutorial.
Do you have a Google identity? If so, you can proceed to the next step.
If not, it only takes a minute to create a Google account. Note that you do NOT need a Gmail email account - you can use your non-Gmail email address to create one instead.
To perform queries against IDC BigQuery tables you will need a cloud project. You can get started with a free Google Cloud project by following these steps (they are also illustrated in this short video):
- 2. Click the "Select a project" button in the upper left corner of the screen, and then click "New project".
- 3. Open the GCP Dashboard ( ≡ > Cloud overview > Dashboard) and take note of the "Project ID" value - you will need it to perform some of the operations.
Additional reading materials:
IDC uses BigQuery to manage metadata for the hosted data. To locate the tables that contain this metadata, complete the following steps:
- 2. Click the "+ ADD DATA" button, and select "Pin a project > Enter project name".
- 3. Enter bigquery-public-data in the text box and click the "PIN" button.
- 4. In the left panel, expand the bigquery-public-data drop-down, and navigate to the IDC datasets, such as idc_current, which contain the metadata tables maintained by IDC. Numbered datasets correspond to the IDC data versions documented in Data Release Notes. idc_current is an alias that always points to the latest IDC version.
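The same datasets can also be inspected programmatically. The following is a minimal sketch, not an official IDC utility: it assumes the google-cloud-bigquery Python package is installed and that you are authenticated, the helper names are hypothetical, and the "idc_v<N>" naming of the numbered datasets is an assumption that you should confirm in the BigQuery console.

```python
def idc_dataset_id(version=None, project="bigquery-public-data"):
    """Return a fully qualified dataset ID for an IDC release.

    version=None selects the idc_current alias. An integer selects a
    numbered dataset; the "idc_v<N>" naming is an assumption.
    """
    name = "idc_current" if version is None else f"idc_v{int(version)}"
    return f"{project}.{name}"


def list_idc_tables(billing_project, version=None):
    """List table names in an IDC dataset (needs credentials and network)."""
    # Imported here so idc_dataset_id() works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=billing_project)
    tables = client.list_tables(idc_dataset_id(version))
    return sorted(table.table_id for table in tables)


if __name__ == "__main__":
    print(idc_dataset_id())
    # Replace the placeholder with your own Project ID before uncommenting:
    # print(list_idc_tables("my-project-id"))
```

The billing_project argument is your own Project ID; the data itself stays in the bigquery-public-data project.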
Navigate to the GCP BigQuery API page. If the BigQuery API has not been enabled, you will see a blue "ENABLE" button; click it to enable the API. This is required to query IDC BigQuery tables using the Python API.
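Once the API is enabled, a query from Python might look like the sketch below. It assumes the google-cloud-bigquery package and working credentials. The table name dicom_all and the columns collection_id and PatientID are assumptions for illustration; check the actual schema in the BigQuery console. The project ID is a placeholder for the value you noted earlier.

```python
# Illustrative query; table and column names are assumptions.
EXAMPLE_SQL = """
SELECT collection_id, COUNT(DISTINCT PatientID) AS patients
FROM `bigquery-public-data.idc_current.dicom_all`
GROUP BY collection_id
ORDER BY patients DESC
LIMIT 5
"""


def run_query(project_id, sql=EXAMPLE_SQL):
    """Run a query billed to project_id and return rows as dictionaries."""
    # Imported lazily so this module loads without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    rows = client.query(sql).result()  # blocks until the job finishes
    return [dict(row.items()) for row in rows]


if __name__ == "__main__":
    # Replace the placeholder with your own Project ID before uncommenting:
    # for record in run_query("my-project-id"):
    #     print(record)
    print(EXAMPLE_SQL.strip())
```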
Follow the instructions here to install and configure the Google Cloud SDK: https://cloud.google.com/sdk/docs/install-sdk.
Note that you will need to do this only if you want to interact with IDC data from your computer. If you use Google Colab, or Google Compute Engine VMs, Cloud SDK tools will be pre-installed and ready to use.
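To check whether the Cloud SDK is already available on a given machine, something like the following sketch may help. It only looks for the gcloud executable on the PATH and asks it for its version; the function names are hypothetical.

```python
import shutil
import subprocess


def find_gcloud():
    """Return the path of the gcloud executable, or None if not on PATH."""
    return shutil.which("gcloud")


def gcloud_version():
    """Return the output of `gcloud --version`, or None if unavailable."""
    path = find_gcloud()
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = gcloud_version()
    if version is None:
        print("gcloud not found; install the Cloud SDK first.")
    else:
        print(version)
```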
You will not need to set up billing for your project to do basic operations with IDC, such as running Colab notebooks, or executing queries, as long as you stay within the GCP free tier.
You will need to set up project billing if you want to launch your own VMs, or use resources beyond the free usage tier.
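One way to stay aware of free-tier usage is to ask BigQuery how much data a query would scan before running it. The sketch below uses the dry-run option of the google-cloud-bigquery client. The 1 TiB monthly query allowance is an assumption about the free tier at the time of writing; check current Google Cloud pricing before relying on it. The function names are hypothetical.

```python
# Assumed free-tier allowance for query processing (1 TiB per month).
FREE_TIER_QUERY_BYTES = 1024 ** 4


def free_tier_fraction(bytes_processed, allowance=FREE_TIER_QUERY_BYTES):
    """Return the share of the assumed monthly allowance a query would use."""
    return bytes_processed / allowance


def estimate_query_bytes(project_id, sql):
    """Dry-run a query and return the number of bytes it would process."""
    # Imported lazily so the arithmetic helper works without the library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # nothing is executed
    return job.total_bytes_processed


if __name__ == "__main__":
    # Example with a made-up estimate of 50 GiB:
    print(f"{free_tier_fraction(50 * 1024 ** 3):.1%} of the assumed allowance")
```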
If you are just starting, it may be easiest to take advantage of the IDC "early adopter" free cloud credits allocation by filling out this form.
Once you set up billing, we can't stress enough how important it is to be diligent in tracking your usage of GCP resources!
- Be sure to shut down anything you aren't using. Otherwise, your free trial credits, IDC-provided credits, or credit card will be charged for resources that sit idle.
- Be careful with your login information. If someone takes over your account they could run up a huge bill that you will be responsible for paying.
- Unless you are not concerned about billing, remember to SHUT DOWN THE MACHINE when you aren't using it! You are billed continuously while the VM instance is running.
- Even after you stop the VMs, you keep paying for the disk storage attached to those machines! You can delete the VM instances to stop incurring those costs.
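As a habit, it can help to check periodically which VMs are still running. The sketch below wraps the gcloud compute instances list command and filters its JSON output. It assumes the Cloud SDK is installed and authenticated; the function names and the project ID are hypothetical.

```python
import json
import subprocess


def running_instances(instances):
    """Return (name, zone) pairs for instances whose status is RUNNING."""
    result = []
    for inst in instances:
        if inst.get("status") == "RUNNING":
            # The zone is reported as a URL; keep only its last segment.
            zone = inst.get("zone", "").rsplit("/", 1)[-1]
            result.append((inst.get("name"), zone))
    return result


def list_instances(project_id):
    """Return all VM instances in a project as parsed JSON (needs gcloud)."""
    cmd = [
        "gcloud", "compute", "instances", "list",
        f"--project={project_id}", "--format=json",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(out.stdout)


if __name__ == "__main__":
    # Replace the placeholder with your own Project ID before uncommenting:
    # for name, zone in running_instances(list_instances("my-project-id")):
    #     print(f"{name} is still running in {zone}")
    pass
```

Instances that are stopped but not deleted will not appear in this filtered list, yet their attached disks continue to incur storage charges.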