
Getting started with GCP

We encourage you to review the slides from the following presentation, which introduces GCP and shares some best practices for its use.

W. Longabaugh. Introduction to Google Cloud Platform. Presented at MICCAI 2021. (slides in Google Slides)

Google Cloud Platform (GCP) provides a range of solutions to better understand and analyze data hosted by IDC. Depending on what you want to do (see the range of options here), you may need to complete one or more of the steps below.

The steps for creating a Google Cloud project and setting up billing are covered in Part 1 of our "Getting started" tutorial series, and in this short video tutorial.

Obtain a Google identity

Do you have a Google identity? If so, you can proceed to the next step.

If not, it only takes a minute to create a Google account. Note that you do NOT need a Gmail email account - you can use your non-Gmail email address to create one instead.

Set up a Google Cloud Project

To perform queries against IDC BigQuery tables you will need a cloud project. You can get a free Google Cloud project by following these steps (they are also illustrated in this short video):

  1. Go to https://console.cloud.google.com/, and accept the Terms and Conditions.

  2. Click the "Select a project" button in the upper left corner of the screen, and then click "New project".

  3. Open the GCP Dashboard (navigation menu > Cloud overview > Dashboard) and take note of the "Project ID" value - you will need it whenever a tool or query asks which project to run under.

Additional reading materials:

Locate IDC metadata tables in Cloud BigQuery console

IDC uses BigQuery for managing metadata for the hosted data. In order to locate the tables that contain such metadata, complete the following steps:

  1. Click the "+ ADD" button, and select "Star a project by name" from the Additional Resource table.

  2. Type bigquery-public-data in the text box and click the "PIN" button.

  3. In the left panel, expand the bigquery-public-data drop-down, and navigate to the items called idc_v1, idc_v2, ..., idc_current, which are the datasets containing metadata tables maintained by IDC. Numbered datasets correspond to the IDC data versions documented in Data Release Notes. idc_current is an alias that always points to the latest IDC version.
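The naming scheme above (numbered datasets plus the idc_current alias) can be sketched in code as follows. This is only an illustration: the helper names `idc_dataset` and `idc_table` are hypothetical, and the table name `dicom_all` is an assumption used as an example, so check the BigQuery console for the tables actually present in each dataset.

```python
# Hypothetical helpers that build fully qualified BigQuery table names
# for IDC metadata hosted in the bigquery-public-data project.
PUBLIC_PROJECT = "bigquery-public-data"


def idc_dataset(version=None):
    """Return the dataset name for an IDC version, or the idc_current alias."""
    if version is None:
        return "idc_current"
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(version, bool) or not isinstance(version, int) or version < 1:
        raise ValueError("IDC version must be a positive integer")
    return f"idc_v{version}"


def idc_table(table, version=None):
    """Return a `project.dataset.table` identifier for an IDC table."""
    return f"{PUBLIC_PROJECT}.{idc_dataset(version)}.{table}"


# "dicom_all" is an assumed example table name.
print(idc_table("dicom_all"))             # latest IDC version via the alias
print(idc_table("dicom_all", version=2))  # pinned to IDC v2
```

Pinning a numbered dataset keeps a query reproducible across IDC releases, while idc_current always follows the latest version.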

Enable GCP BigQuery API

Navigate to the GCP BigQuery API page. If the BigQuery API has not been enabled, you will see a blue "ENABLE" button; click it to enable the API. This is required in order to query IDC BigQuery tables using the Python API.
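Once the API is enabled, a query from Python might look like the minimal sketch below. It assumes the `google-cloud-bigquery` package is installed and that you are already authenticated (for example, through Application Default Credentials). The table `dicom_all` and the column `collection_id` are assumptions used for illustration, and `build_collection_query` and `run_query` are hypothetical helper names.

```python
def build_collection_query(
    table="bigquery-public-data.idc_current.dicom_all",  # assumed table name
    limit=10,
):
    """Build a small query listing distinct collections (assumed column name)."""
    return f"SELECT DISTINCT collection_id FROM `{table}` LIMIT {int(limit)}"


def run_query(project_id, sql):
    """Run `sql` under your own project and return the rows as dictionaries."""
    # Imported here so the query builder works without the package installed.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=project_id)
    return [dict(row.items()) for row in client.query(sql).result()]


sql = build_collection_query()
print(sql)

# With credentials configured, replace the placeholder with your Project ID:
# rows = run_query("your-project-id", sql)
```

The `project_id` argument is the "Project ID" value noted when the project was created; the query is executed under that project even though the data lives in bigquery-public-data.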

[skip if using Google Colab] Install and configure Cloud SDK

Follow the instructions here to install and configure Google Cloud SDK: https://cloud.google.com/sdk/docs/install-sdk.

Note that you will need to do this only if you want to interact with IDC data from your computer. If you use Google Colab, or Google Compute Engine VMs, Cloud SDK tools will be pre-installed and ready to use.
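To confirm that the Cloud SDK is available in a given environment, a quick check along these lines may help. It is a sketch that only looks for the `gcloud` executable on the PATH and, if found, prints the output of `gcloud version`; the function names are hypothetical.

```python
import shutil
import subprocess


def find_gcloud():
    """Return the full path to the gcloud executable, or None if not on PATH."""
    return shutil.which("gcloud")


def gcloud_version(path):
    """Return the text printed by `gcloud version`."""
    result = subprocess.run(
        [path, "version"], capture_output=True, text=True, check=True
    )
    return result.stdout


path = find_gcloud()
if path is None:
    print("gcloud not found on PATH; install the Cloud SDK first.")
else:
    print(gcloud_version(path))
```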

OPTIONAL: Set up billing for your project

You will not need to set up billing for your project in order to do basic operations with IDC, such as running Colab notebooks, or executing queries, as long as you stay within the GCP free tier.

You will need to set up project billing if you want to launch your own VMs, or use resources beyond the free usage tier.
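One way to judge whether a query fits within the free tier is to ask BigQuery for a dry-run estimate before running it. The sketch below assumes the `google-cloud-bigquery` package and working credentials; the 1 TiB monthly query allowance is an assumption about the free tier at the time of writing, so check the current free tier terms. The function names are hypothetical.

```python
# Assumed monthly free-tier allowance for query processing (1 TiB).
FREE_TIER_QUERY_BYTES = 1024 ** 4


def estimate_query_bytes(project_id, sql):
    """Return the bytes a query would process, without actually running it."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client(project=project_id)
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def free_tier_fraction(num_bytes, allowance=FREE_TIER_QUERY_BYTES):
    """Return the share of the assumed monthly allowance a query would use."""
    return num_bytes / allowance


# Example: a query that would scan 50 GiB.
print(f"{free_tier_fraction(50 * 1024 ** 3):.1%}")  # prints "4.9%"
```

A dry run is not billed, so it is a low-risk way to catch unexpectedly large scans, such as a `SELECT *` over a wide table.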


You can use Budget Alerts to monitor your adherence to a budget!
