Getting started with GCP
You are also encouraged to review the slides from the following presentation, which provides an introduction to GCP and shares some best practices for using it.
W. Longabaugh. Introduction to Google Cloud Platform. Presented at MICCAI 2021. (slides in Google Slides)
Google Cloud Platform provides a range of solutions for understanding and analyzing data hosted by IDC. Depending on what you want to do (see the range of options here), you may need to complete one or more of the steps below.
The steps for creating a Google Cloud project and setting up billing are covered in Part 1 of our "Getting started" tutorial series, and in this short video tutorial.
Obtain a Google identity
Do you have a Google identity? If so, you can proceed to the next step.
If not, creating a Google account takes only a minute. Note that you do NOT need a Gmail account: you can create a Google account using your existing non-Gmail email address.
Set up a Google Cloud Project
To query IDC BigQuery tables, you will need a Google Cloud project. You can create one free of charge by following these steps (they are also illustrated in this short video):
Go to https://console.cloud.google.com/ and accept the terms and conditions.
Click the "Select a project" button in the upper left corner of the screen, and then click "New project". Enter a name for your project and click "Create".
Open the GCP Dashboard (≡ > Cloud overview > Dashboard) and take note of the "Project ID" value: you will need it for some operations, such as running BigQuery queries from Python.
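The Project ID is what you pass to client libraries so that queries are attributed to your project. The sketch below is illustrative only: it checks that a string follows Google's documented Project ID format (6 to 30 lowercase letters, digits, or hyphens, starting with a letter and not ending with a hyphen) and then creates a BigQuery client for it. It assumes the google-cloud-bigquery package is installed and credentials are configured; the function names and the "my-idc-project" value are hypothetical placeholders.

```python
import re

# Documented Project ID format: 6-30 chars of lowercase letters, digits, hyphens;
# must start with a letter and must not end with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id is a well-formed Google Cloud Project ID."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None


def make_bigquery_client(project_id: str):
    """Create a BigQuery client whose queries run in (and are billed to) project_id.

    Hypothetical helper; requires google-cloud-bigquery and valid credentials.
    """
    if not is_valid_project_id(project_id):
        raise ValueError(f"not a valid Project ID: {project_id!r}")
    # Imported lazily so the format check above works without the package.
    from google.cloud import bigquery
    return bigquery.Client(project=project_id)


if __name__ == "__main__":
    # Replace the hypothetical "my-idc-project" with the Project ID from your Dashboard.
    print(is_valid_project_id("my-idc-project"))
    print(is_valid_project_id("My_Project"))
```

Checking the format locally catches a common mistake: copying the project name (which may contain spaces or capitals) instead of the Project ID.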
Additional reading materials:
See Google’s documentation about how to create a Google Cloud Project.
Learn about how to add members and roles to a project.
Locate IDC metadata tables in Cloud BigQuery console
IDC uses BigQuery to manage the metadata for the hosted data. To locate the tables that contain this metadata, complete the following steps:
Open BigQuery console: https://console.cloud.google.com/bigquery
Click the "+ ADD" button, and select "Star a project by name" from the menu of additional resources.
Type bigquery-public-data in the text box and click the "PIN" button.
In the left panel, expand the bigquery-public-data drop-down, and navigate to the items called idc_v1, idc_v2, ..., idc_current, which are the datasets containing metadata tables maintained by IDC. Numbered datasets correspond to the IDC data versions documented in the Data Release Notes. idc_current is an alias that always points to the latest IDC version.
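When you move from the console to code, the same datasets are addressed by fully qualified IDs of the form project.dataset. The sketch below builds those IDs for either a numbered version (idc_v1, idc_v2, ...) or the idc_current alias, and lists the tables in a dataset with the BigQuery Python client's list_tables method. The helper names are hypothetical, and the listing function assumes you pass in an already-authenticated google.cloud.bigquery.Client.

```python
from typing import Optional

IDC_PROJECT = "bigquery-public-data"


def idc_dataset_id(version: Optional[int] = None) -> str:
    """Return the fully qualified ID of an IDC metadata dataset.

    version=None selects the idc_current alias (always the latest release);
    an integer n selects the pinned release idc_v<n>.
    """
    if version is None:
        return f"{IDC_PROJECT}.idc_current"
    if version < 1:
        raise ValueError("IDC data versions are numbered starting at 1")
    return f"{IDC_PROJECT}.idc_v{version}"


def list_idc_tables(client, version: Optional[int] = None) -> list:
    """Return the names of the tables in an IDC dataset.

    `client` is assumed to be a google.cloud.bigquery.Client; list_tables
    accepts a "project.dataset" string and yields items with a table_id.
    """
    return [table.table_id for table in client.list_tables(idc_dataset_id(version))]


if __name__ == "__main__":
    print(idc_dataset_id())   # latest release via the alias
    print(idc_dataset_id(2))  # a specific, fixed release
```

Using a numbered dataset rather than idc_current makes an analysis reproducible, because the alias moves forward with every new IDC release.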
Enable GCP BigQuery API
Navigate to the GCP BigQuery API page. If the BigQuery API has not been enabled, you will see a blue "ENABLE" button; click it to enable the API. This is required to query IDC BigQuery tables using the Python API.
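Once the API is enabled, a query can be sent from Python. The following is a minimal sketch, assuming the google-cloud-bigquery package is installed and that the IDC dataset contains a dicom_all table with collection_id and PatientID columns; the helper names are hypothetical. The SQL builder is plain Python, while run_query calls the documented Client.query(...).result() pattern.

```python
def collection_counts_sql(dataset_id: str = "bigquery-public-data.idc_current") -> str:
    """Build a query counting distinct patients per IDC collection.

    Assumes `dataset_id` contains a dicom_all table with collection_id and
    PatientID columns.
    """
    return (
        "SELECT collection_id, COUNT(DISTINCT PatientID) AS num_patients\n"
        f"FROM `{dataset_id}.dicom_all`\n"
        "GROUP BY collection_id\n"
        "ORDER BY num_patients DESC"
    )


def run_query(project_id: str, sql: str) -> list:
    """Run `sql` in `project_id` and return the result rows as dictionaries.

    If the BigQuery API is not enabled in the project, this typically fails
    with a permission (403) error pointing you to the API page.
    """
    from google.cloud import bigquery  # pip install google-cloud-bigquery
    client = bigquery.Client(project=project_id)
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    print(collection_counts_sql())
    # With a real project: run_query("your-project-id", collection_counts_sql())
```

Note that the table reference is wrapped in backticks, which BigQuery standard SQL requires here because the project name bigquery-public-data contains hyphens.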
[skip if using Google Colab] Install and configure Cloud SDK
Follow the instructions here to install and configure Google Cloud SDK: https://cloud.google.com/sdk/docs/install-sdk.
Note that you will need to do this only if you want to interact with IDC data from your computer. If you use Google Colab, or Google Compute Engine VMs, Cloud SDK tools will be pre-installed and ready to use.
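The main practical difference between Colab and a local machine is how Python code obtains credentials. A sketch of one way to handle both cases is shown below; it assumes that in Colab the google.colab.auth module is available, and that on a local machine you have already created Application Default Credentials with the Cloud SDK command `gcloud auth application-default login`. The function names are hypothetical.

```python
def in_colab() -> bool:
    """Return True when running inside Google Colab."""
    try:
        import google.colab  # noqa: F401  (importable only inside Colab)
    except ImportError:
        return False
    return True


def authenticate() -> None:
    """Make Google credentials available to the Cloud client libraries."""
    if in_colab():
        # Colab: interactive sign-in with your Google identity.
        from google.colab import auth
        auth.authenticate_user()
    else:
        # Local machine: assumes Application Default Credentials were created
        # beforehand with `gcloud auth application-default login`.
        import google.auth
        credentials, project = google.auth.default()
        print(f"Using Application Default Credentials (default project: {project})")


if __name__ == "__main__":
    print("Running in Colab:", in_colab())
```

Client libraries such as google-cloud-bigquery pick up these credentials automatically, so no key files need to be handled in notebook code.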
OPTIONAL: Set up billing for your project
You do not need to set up billing for your project to perform basic operations with IDC, such as running Colab notebooks or executing queries, as long as you stay within the GCP free tier.
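One way to stay within the free tier is to check how much data a query will scan before running it. The sketch below assumes the google-cloud-bigquery package and uses two documented QueryJobConfig options: dry_run, which reports total_bytes_processed without executing (or billing) the query, and maximum_bytes_billed, which makes a query fail instead of running if it would bill more than the cap. The 1 TiB monthly figure reflects BigQuery's on-demand free tier at the time of writing; check current pricing before relying on it. Helper names are hypothetical.

```python
FREE_TIER_BYTES_PER_MONTH = 1024 ** 4  # 1 TiB; verify against current BigQuery pricing


def fraction_of_free_tier(bytes_processed: int) -> float:
    """Express a query's scanned bytes as a fraction of the monthly free tier."""
    return bytes_processed / FREE_TIER_BYTES_PER_MONTH


def estimate_query_bytes(project_id: str, sql: str) -> int:
    """Dry-run `sql` and return the number of bytes it would process."""
    from google.cloud import bigquery
    client = bigquery.Client(project=project_id)
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)  # a dry run is not billed
    return job.total_bytes_processed


def run_capped_query(project_id: str, sql: str, max_bytes: int) -> list:
    """Run `sql`, refusing to bill more than `max_bytes`."""
    from google.cloud import bigquery
    client = bigquery.Client(project=project_id)
    config = bigquery.QueryJobConfig(maximum_bytes_billed=max_bytes)
    return list(client.query(sql, job_config=config).result())


if __name__ == "__main__":
    # A query scanning 10 GiB would use this share of the monthly free tier:
    print(f"{fraction_of_free_tier(10 * 1024 ** 3):.2%}")
```

Selecting only the columns you need (rather than SELECT *) is the most effective way to reduce the bytes a BigQuery query scans.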
You will need to set up project billing if you want to launch your own VMs, or use resources beyond the free usage tier.
Once you set up billing, we can't stress enough how important it is to be diligent in tracking your usage of GCP resources!
Be sure to shut down any resources you aren't using; otherwise, they will keep drawing on your free trial credits, your IDC-provided credits, or your credit card.
Be careful with your login information. If someone takes over your account they could run up a huge bill that you will be responsible for paying.
Remember to SHUT DOWN THE MACHINE when you aren't using it! You are billed continuously while the VM instance is running.
Even after you stop a VM, you keep paying for the disk storage attached to it! Delete VM instances you no longer need (together with their disks) to stop incurring those costs.
You can use Budget Alerts to monitor your adherence to a budget!
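The cost rules above can be made concrete with a little arithmetic, and running instances can be found programmatically. The sketch below is illustrative only: the hourly VM rate and per-GB disk rate in the example are hypothetical placeholders, not real prices, and the instance listing assumes the google-cloud-compute package and follows the InstancesClient.aggregated_list pattern from Google's Compute Engine samples. Helper names are hypothetical.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cost(running_hours: float, hourly_vm_rate: float,
                 disk_gb: float, disk_rate_per_gb_month: float) -> float:
    """Estimate one month's cost of a VM plus its persistent disk.

    Compute is billed only for the hours the VM runs; the disk is billed for
    the whole month as long as it exists, even while the VM is stopped.
    """
    if not 0 <= running_hours <= HOURS_PER_MONTH:
        raise ValueError("running_hours must be between 0 and HOURS_PER_MONTH")
    return running_hours * hourly_vm_rate + disk_gb * disk_rate_per_gb_month


def running_instances(project_id: str) -> list:
    """Return (zone, name) for every RUNNING VM in the project, in all zones."""
    from google.cloud import compute_v1  # pip install google-cloud-compute
    client = compute_v1.InstancesClient()
    request = compute_v1.AggregatedListInstancesRequest()
    request.project = project_id
    found = []
    for zone, scoped_list in client.aggregated_list(request=request):
        for instance in scoped_list.instances:
            if instance.status == "RUNNING":
                found.append((zone, instance.name))
    return found


if __name__ == "__main__":
    # Hypothetical rates: $0.10 per VM-hour, $0.04 per GB-month of disk.
    print("running all month:", round(monthly_cost(730, 0.10, 100, 0.04), 2))
    print("stopped all month:", round(monthly_cost(0, 0.10, 100, 0.04), 2))
```

With any rates, the "stopped all month" case is never zero while the disk exists, which is why deleting unused instances and disks matters.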