# IDC API Concepts

The IDC API is based on IDC Data Model concepts. Several of these concepts have been previously introduced in the context of the IDC Portal. We discuss these concepts here with respect to the IDC API.

## IDC Versions

As described previously, IDC data is versioned such that searching an IDC version according to some criteria (some *filter set* as described below) will always identify exactly the same set of DICOM objects.

The *GET* /versions endpoint returns a list of the current and previous IDC data versions.
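As a minimal sketch, the endpoint can be called from Python with only the standard library. The base URL below is an assumption, inferred from the address of the IDC API UI, and the helper names `endpoint_url` and `get_json` are hypothetical. The shape of the response is not assumed here, so the usage comment simply prints it.

```python
import json
import urllib.request

# Assumption: base URL inferred from the IDC API UI (Swagger) address.
BASE_URL = "https://api.imaging.datacommons.cancer.gov/v2"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path such as '/versions'."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(path: str, timeout: float = 30.0) -> dict:
    """Issue a GET request and decode the JSON response body."""
    with urllib.request.urlopen(endpoint_url(path), timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a network call, so it is left commented out):
# versions = get_json("/versions")
# print(json.dumps(versions, indent=2))
```

The same helper should work for the other GET endpoints described on this page, such as /collections and /analysis\_results.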

## Original Collections

An *original collection* is a set of DICOM data provided by a single source. (We usually just use *collection* to mean *original collection*.) Such *collections* consist primarily of DICOM image data that was obtained from some set of patients. However, some original collections also include annotations, segmentations or other analyses of the image data in the collection. Typically, the patients in a *collection* are related by a common cancer type, though this is not always the case.

The *GET* /collections endpoint returns a list of the *original collections* in the current IDC version. Some metadata about each collection is provided.

## Analysis Results

*Analysis results* consist of DICOM data that was generated by analyzing data in one or more *original collections.* Typically, such analysis is performed by a different entity than that which provided the *original* *collection(s)* on which the analysis is based. Examples of data in *analysis results* include segmentations, annotations and further processing of original images.

Because a DICOM instance in an *analysis result* is "in" the same series and study as the DICOM instance data of which it is an analysis result, it is also "in" the same patient, and therefore is considered to be "in" the same collection.

Specifically, each instance in IDC data has an associated *collection\_id.* An *analysis result* will have the same *collection\_id* as the *original collection* of which it is an analysis result.

The *GET* /analysis\_results endpoint returns a list of the *analysis results* in the current IDC version, with some metadata about each.

## Filter Sets

A *filter set* selects some set of DICOM objects in IDC hosted data, and is a set of conditions, where each condition is defined by an **attribute** and an array of values. An attribute identifies a field (column) in some data source (BQ table). Each *filter set* also includes the *IDC data version* upon which it operates.

Filter sets are JSON encoded. Here is an example *filter set*:

```
{
  "filters": {
    "collection_id": [
      "TCGA-LUAD",
      "TCGA-KIRC"
    ],
    "Modality": [
      "CT",
      "MR"
    ],
    "race": [
      "WHITE"
    ],
    "age_at_diagnosis_btw": [
      65, 
      75
    ]
  }
}
```

A *filter set* selects a DICOM instance if, for every *attribute* in the *filter set*, the instance's corresponding value *satisfies* one or more of the values in the associated array of values. This is explained further below.

For example, the (attribute, \[values]) pair ("Modality", \["MR", "CT"]) is satisfied if an instance "has" a Modality of MR or CT.

Note that if a *filter set* includes more than one (attribute, \[values]) pair having the same attribute, then only the last such (attribute, \[values]) pair is used. Thus if a *filter set* includes the (attribute, \[values]) pairs ("Modality", \["MR"]) and ("Modality", \["CT"]), in that order, only ("Modality", \["CT"]) is used.
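Python's standard `json` module happens to treat repeated keys the same way (the last one wins), which makes the effect easy to see locally. The sketch below illustrates that behavior and one way to avoid losing values by merging them into a single array before sending the request. It says nothing about how the API itself is implemented, and `merge_values` is a hypothetical helper.

```python
import json

# A JSON document that repeats the "Modality" key.
raw = '{"filters": {"Modality": ["MR"], "Modality": ["CT"]}}'

# json.loads keeps only the last value for a repeated key.
parsed = json.loads(raw)
print(parsed["filters"])  # {'Modality': ['CT']}


def merge_values(pairs):
    """Combine repeated attributes into one value array, preserving order."""
    merged = {}
    for attribute, values in pairs:
        bucket = merged.setdefault(attribute, [])
        for value in values:
            if value not in bucket:
                bucket.append(value)
    return merged


# Merging first keeps both values under a single "Modality" key.
filters = merge_values([("Modality", ["MR"]), ("Modality", ["CT"])])
print(filters)  # {'Modality': ['MR', 'CT']}
```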

The *filter set* above will select any instance in the current IDC version that is in the TCGA\_KIRC or the TCGA\_LUAD collection. To be selected by the filter, an instance must also have a Modality of CT or MR, a race of WHITE, and an age\_at\_diagnosis value greater than 65 and less than 75.
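The selection rule (every attribute must be satisfied, and any one listed value satisfies an attribute) can be modeled locally. This is only a sketch of the logic for simple equality matching, not the API's implementation. The `selects` function and the instance records are made up for illustration.

```python
def selects(filters: dict, instance: dict) -> bool:
    """Return True if every attribute in `filters` is satisfied by `instance`.

    An attribute is satisfied when the instance's value equals any one of the
    listed values (compared case-insensitively).
    """
    for attribute, values in filters.items():
        actual = instance.get(attribute)
        if actual is None:
            return False
        accepted = {str(value).lower() for value in values}
        if str(actual).lower() not in accepted:
            return False
    return True


filters = {
    "collection_id": ["TCGA-LUAD", "TCGA-KIRC"],
    "Modality": ["CT", "MR"],
}

# Hypothetical instance records, not real IDC data.
instances = [
    {"SOPInstanceUID": "1.2.3.1", "collection_id": "TCGA-LUAD", "Modality": "CT"},
    {"SOPInstanceUID": "1.2.3.2", "collection_id": "TCGA-LUAD", "Modality": "PT"},
    {"SOPInstanceUID": "1.2.3.3", "collection_id": "TCGA-BRCA", "Modality": "MR"},
]

selected = [i["SOPInstanceUID"] for i in instances if selects(filters, i)]
print(selected)  # ['1.2.3.1']
```

Only the first record is selected: the second fails the Modality condition and the third fails the collection\_id condition.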

Because of the hierarchical nature of DICOM, if a *filter set* selects an instance, it implicitly selects the series, study, patient and collection which contain that instance. A manifest can be configured to return data about some or all of these entities.

Note that when defining a cohort through the API, the IDC version is always the current IDC version.

## Data Sources

IDC maintains a set of GCP BigQuery (BQ) tables containing various types of metadata that together describe IDC data.

In the context of the API, a *data source* (or just *source*) is a BQ table that contains some portion of the metadata against which a *filter set* is applied. An API query to construct a manifest is performed against one or more such tables as needed.

## Attributes

Both the IDC Web App and API expose selected fields against which queries can be performed. The */filters* endpoint returns the available filter attributes. The */filters/values/{filter}* endpoint returns a list of the values which a specified Categorical String or Categorical Numeric filter attribute will match. Each *attribute* has a data type, one of:

* **String:**\
  An *attribute* with data type ***String*** may have an arbitrary string value. For example, the possible values of a *StudyDescription* attribute are arbitrary. An object is selected if its String attribute matches any of the values in the values array. Matching is insensitive to the case (upper case, lower case) of the characters in the strings. Thus ("StudyDescription", \["PETCT Skull-Thigh"]) will match a StudyDescription containing the substring "PETCT SKULL-THIGH", "petct skull-thigh", etc.\
  Pattern matching in String attributes is also supported. The ("StudyDescription", \["%SKULL%", "ABDOMEN%", "%Pelvis"]) filter will match any StudyDescription that contains "SKULL", "skull", "Skull", etc., starts with "ABDOMEN", "abdomen", etc., or ends with "Pelvis", "PELVIS", etc.
* **Categorical String:**\
  An *attribute* with data type Categorical String will have one of a defined set of string values. For example, Modality is a Categorical String *attribute* that has possible values 'CT', 'MR', 'PT', etc.\
  Categorical String attributes have the same matching semantics as for Strings.\
  The /filters/values/{filter} endpoint returns a list of the values accepted for a specified Categorical String attribute (filter).
* **Categorical Numeric:**\
  An *attribute* with data type Categorical Numeric has one of a defined set of numeric values. The corresponding value array must have a single numeric value. The (attribute, value array) pair for a Categorical Numeric is satisfied if the *attribute* is equal to the value in the value array.\
  The /filters/values/{filter} endpoint returns a list of the values accepted for a Categorical Numeric attribute (filter).
* **Ranged Integer:**\
  An attribute with data type Ranged Integer will have an integer value. For example, *age\_at\_diagnosis* is an attribute of data type Ranged Integer. In order to enable relative numeric queries, the API exposes nine variations of each Ranged Integer attribute as *filter* *attribute* names. Each variation is the base attribute name followed by an underscore and one of the suffixes *eq*, *gt*, *gte*, *btw*, *btwe*, *ebtw*, *ebtwe*, *lte*, or *lt*, e.g. *age\_at\_diagnosis\_eq*. The value array of the *btw*, *btwe*, *ebtw*, and *ebtwe* variations must contain exactly two **integer** values, in numeric order (least value first). The value array of the *eq*, *gt*, *gte*, *lte*, and *lt* variations must contain exactly one **integer** value. The (attribute, value array) pair for a Ranged Integer attribute is satisfied according to the suffix as follows:

  * eq: if an *attribute* is equal to the value in the value array
  * gt: if an *attribute* is greater than the value in the value array
  * gte: if an *attribute* is greater than or equal to the value in the value array
  * btw: if an *attribute* is greater than the first value and less than the second value in the value array
  * ebtw: if an *attribute* is greater than or equal to the first value and less than the second value in the value array
  * btwe: if an *attribute* is greater than the first value and less than or equal to the second value in the value array
  * ebtwe: if an *attribute* is greater than or equal to the first value and less than or equal to the second value in the value array
  * lte: if an *attribute* is less than or equal to the value in the value array
  * lt: if an *attribute* is less than the value in the value array

* **Ranged Number:**\
  An attribute with data type Ranged Number will have a numeric (integer or float) value. For example, *diameter* is an attribute of data type Ranged Number. In order to enable relative numeric queries, the API exposes nine variations of each Ranged Number attribute as *filter* *attribute* names. Each variation is the base *attribute* name followed by an underscore and one of the suffixes *eq*, *gt*, *gte*, *btw*, *btwe*, *ebtw*, *ebtwe*, *lte*, or *lt*, e.g. *diameter\_eq*. The value array of the *btw*, *btwe*, *ebtw*, and *ebtwe* variations must contain exactly two numeric values, in numeric order (least value first). The value array of the *eq*, *gt*, *gte*, *lte*, and *lt* variations must contain exactly one numeric value. The (attribute, value array) pair for a Ranged Number attribute is satisfied according to the suffix as follows:

  * eq: if an *attribute* is equal to the value in the value array
  * gt: if an *attribute* is greater than the value in the value array
  * gte: if an *attribute* is greater than or equal to the value in the value array
  * btw: if an *attribute* is greater than the first value and less than the second value in the value array
  * ebtw: if an *attribute* is greater than or equal to the first value and less than the second value in the value array
  * btwe: if an *attribute* is greater than the first value and less than or equal to the second value in the value array
  * ebtwe: if an *attribute* is greater than or equal to the first value and less than or equal to the second value in the value array
  * lte: if an *attribute* is less than or equal to the value in the value array
  * lt: if an *attribute* is less than the value in the value array
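The matching rules in this list can be sketched in a few lines of Python. This is a local model of the semantics described above, not the API's implementation: `ranged_match` and `string_match` are hypothetical helpers, and treating `%` as "any run of characters" is an assumption based on the pattern examples.

```python
import operator
import re

# Two-value suffixes: (test against first value, test against second value).
RANGE_OPS = {
    "btw": (operator.gt, operator.lt),
    "ebtw": (operator.ge, operator.lt),
    "btwe": (operator.gt, operator.le),
    "ebtwe": (operator.ge, operator.le),
}
# One-value suffixes.
SINGLE_OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lte": operator.le,
    "lt": operator.lt,
}


def ranged_match(suffix: str, actual, values: list) -> bool:
    """Apply one Ranged Integer / Ranged Number variation to a value."""
    if suffix in RANGE_OPS:
        if len(values) != 2 or values[0] > values[1]:
            raise ValueError("expected two values, least value first")
        low_ok, high_ok = RANGE_OPS[suffix]
        return low_ok(actual, values[0]) and high_ok(actual, values[1])
    if suffix in SINGLE_OPS:
        if len(values) != 1:
            raise ValueError("expected exactly one value")
        return SINGLE_OPS[suffix](actual, values[0])
    raise ValueError(f"unknown suffix: {suffix}")


def string_match(actual: str, patterns: list) -> bool:
    """Case-insensitive match against any pattern; '%' is a wildcard.

    Assumption: a pattern without '%' is compared to the whole string here,
    which may be stricter than what the API does.
    """
    for pattern in patterns:
        regex = ".*".join(re.escape(part) for part in pattern.split("%"))
        if re.fullmatch(regex, actual, flags=re.IGNORECASE | re.DOTALL):
            return True
    return False


# "age_at_diagnosis_btw" splits into the base attribute name and the suffix.
base, suffix = "age_at_diagnosis_btw".rsplit("_", 1)
print(base, suffix, ranged_match(suffix, 70, [65, 75]))
print(string_match("PETCT Skull-Thigh", ["%SKULL%", "ABDOMEN%", "%Pelvis"]))
```

Note how the boundary values behave: with `btw`, a value of exactly 65 is not matched by \[65, 75], while `ebtw` and `ebtwe` do match it.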

## Cohorts

A *cohort* is the set of DICOM objects in IDC hosted data selected by a *filter set.*

The API no longer supports user-defined cohorts. However, the ***POST*** **/cohorts/manifest/preview** endpoint effectively creates a cohort, queries the cohort to obtain a manifest of metadata of the objects in the cohort, and then deletes the cohort. The data in the manifest is highly configurable and can be used, with suitable tools, to obtain DICOM files from cloud storage. A manifest returned by the API can include values from a large set of fields.
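A request to this endpoint might be prepared as in the sketch below. Several things here are assumptions rather than documented facts: the base URL (inferred from the IDC API UI address), the body layout with `cohort_def` and `fields` keys, and the particular field names requested. Check the IDC API UI for the authoritative request schema before relying on any of them.

```python
import json
import urllib.request

# Assumption: base URL inferred from the IDC API UI (Swagger) address.
BASE_URL = "https://api.imaging.datacommons.cancer.gov/v2"


def build_preview_request(body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to /cohorts/manifest/preview."""
    return urllib.request.Request(
        BASE_URL + "/cohorts/manifest/preview",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Hypothetical request body: the key names and fields are assumptions.
body = {
    "cohort_def": {
        "name": "example",
        "description": "CT or MR instances from two collections",
        "filters": {
            "collection_id": ["TCGA-LUAD", "TCGA-KIRC"],
            "Modality": ["CT", "MR"],
        },
    },
    "fields": ["collection_id", "PatientID", "Modality"],
}

request = build_preview_request(body)
print(request.get_method(), request.full_url)

# Sending the request performs a network call, so it is left commented out:
# with urllib.request.urlopen(request, timeout=60) as resp:
#     manifest = json.load(resp)
```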

Manifests are discussed in the next section.

## **IDC API UI**

The [IDC API UI](https://api.imaging.datacommons.cancer.gov/v2/swagger) can be used to see details about the syntax of each call, and also provides an interface to test requests. Each endpoint is also documented in the [Endpoint Details](https://learn.canceridc.dev/api/endpoint-details) section.

### Make a Request

For a quick demonstration of the syntax of an API call, test the [GET /collections](https://api.imaging.datacommons.cancer.gov/v2/swagger#/data%20model/getCollections) request. You can experiment with this endpoint by clicking the ‘Try it out’ button, and then the ‘Execute’ button.

The API will return collection metadata for the current IDC data version.

**Request Response**

The Swagger UI submits the request and shows the ***curl*** code that was submitted. The ***Response body*** section will display the response to the request. The expected JSON schema format of the response to this API request is shown below:

```
{
  "collections": [
    {
      "cancer_type": "string",
      "collection_id": "string",
      "date_updated": "string",
      "description": "string",
      "doi": "string",
      "image_types": "string",
      "location": "string",
      "species": "string",
      "subject_count": 0,
      "supporting_data": "string"
    }
  ],
  "code": 200
}
```

The actual JSON formatted response can be downloaded to your local file system by clicking the ‘Download’ button.
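Once a response has been saved or fetched, it can be processed with any JSON library. The sketch below follows the schema shown above, but the sample values are made up for illustration and are not real IDC collections. `subjects_by_collection` is a hypothetical helper.

```python
import json

# Made-up response that follows the schema above; values are illustrative only.
SAMPLE_RESPONSE = """
{
  "collections": [
    {"collection_id": "example_a", "cancer_type": "Lung", "subject_count": 12},
    {"collection_id": "example_b", "cancer_type": "Kidney", "subject_count": 30}
  ],
  "code": 200
}
"""


def subjects_by_collection(response: dict) -> dict:
    """Map each collection_id to its subject_count."""
    if response.get("code") != 200:
        raise ValueError(f"unexpected response code: {response.get('code')}")
    return {c["collection_id"]: c["subject_count"] for c in response["collections"]}


response = json.loads(SAMPLE_RESPONSE)
counts = subjects_by_collection(response)
largest = max(counts, key=counts.get)

print(counts)   # {'example_a': 12, 'example_b': 30}
print(largest)  # example_b
```

To process a downloaded file instead, replace `json.loads(SAMPLE_RESPONSE)` with `json.load(open(path))`, where `path` is wherever the file was saved.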
