DICOMweb

We welcome your questions and comments on the content of this documentation page! Please start a thread on the IDC forum, and we will be happy to help.

Background

The DICOMweb interface is available for accessing IDC data. This interface is especially useful for efficiently downloading small parts of large digital pathology images. While the entire pathology whole-slide image (WSI) pyramid can reach gigabytes in size, the part needed for a specific visualization or analysis task can be rather small and localized to specific image tiles at a given resolution.

New to DICOM WSI? Check out our introductory tutorial to learn how slide microscopy images are organized in DICOM.

Detailed information on the DICOMweb endpoints that are available to access IDC data is provided here. In brief, two DICOM stores are available: the IDC-maintained DICOM store and the Google-maintained DICOM store. We recommend that you familiarize yourself with that documentation to learn about the differences between the two and select the option best suited to your use case.

Unique identifiers: locating the relevant slides

IDC uses DICOM for data organization, and every image contains metadata organized following the data model documented here. Each slide corresponds to a DICOM Series, uniquely identified by the SeriesInstanceUID, which in turn belongs to a DICOM Study identified by the StudyInstanceUID. You will need these two identifiers to access any DICOM slide using DICOMweb!
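To make the role of the two identifiers concrete, the sketch below shows where they appear in the resource paths defined by the DICOMweb standard (DICOM PS3.18). The base URL, the UIDs, and the helper names are hypothetical placeholders; in practice, the libraries introduced later on this page build these paths for you.

```python
# Hypothetical base URL; substitute a real DICOMweb endpoint.
BASE_URL = "https://example.org/dicomWeb"


def series_url(base_url, study_uid, series_uid):
    """Return the DICOMweb path of one series (one slide in IDC)."""
    return f"{base_url.rstrip('/')}/studies/{study_uid}/series/{series_uid}"


def frames_url(base_url, study_uid, series_uid, sop_instance_uid, frame_numbers):
    """Return the path for retrieving selected frames (tiles) of one instance.

    Frame numbers are 1-based, as in the DICOM standard.
    """
    frame_list = ",".join(str(n) for n in frame_numbers)
    return (
        f"{series_url(base_url, study_uid, series_uid)}"
        f"/instances/{sop_instance_uid}/frames/{frame_list}"
    )


# Placeholder UIDs, for illustration only.
print(series_url(BASE_URL, "1.2", "1.2.3") + "/metadata")
print(frames_url(BASE_URL, "1.2", "1.2.3", "1.2.3.4", [1, 2]))
```

Both the study and the series UID are part of every path, which is why a slide cannot be addressed with the SeriesInstanceUID alone.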

Since IDC contains many terabytes of images, you will typically want to first select the images/slides that meet your needs. IDC offers various interfaces to explore and subset the data, ranging from the IDC Portal to the Python package idc-index (covered in this tutorial) and BigQuery SQL interfaces (see this tutorial). We strongly recommend you work through the referenced tutorials, but for the purposes of this tutorial, we will demonstrate how you can locate the UIDs of a slide that corresponds to pancreas tissue.

First, install idc-index with pip install --upgrade idc-index (the --upgrade part is very important to make sure you are working with the latest data release of IDC!).

Next, the following snippet demonstrates how to select slides of pancreas tissue (you can also select by the lens magnification, stain, and many other attributes - see this tutorial for details).

from idc_index import IDCClient

# Instantiate the client
idc_client = IDCClient()
idc_client.fetch_index('sm_index')


# Filter the slides
query = """
SELECT index.StudyInstanceUID, sm_index.SeriesInstanceUID
FROM sm_index
JOIN index ON sm_index.SeriesInstanceUID = index.SeriesInstanceUID
WHERE Modality = 'SM' AND primaryAnatomicStructure_CodeMeaning = 'Pancreas'
"""

pancreas_slides = idc_client.sql_query(query)

Next, we select the first slide and use its StudyInstanceUID and SeriesInstanceUID in the subsequent sections.

('2.25.25332367070577326639024635995523878122', '1.3.6.1.4.1.5962.99.1.3380245274.1362068963.1639762817818.2.0')
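The selection step could be sketched as follows. idc-index returns query results as a pandas DataFrame; the stand-in DataFrame below exists only so that the snippet runs without downloading the index, and in practice you would use the pancreas_slides variable from the previous snippet. The variable names first_slide, sample_study_uid and sample_series_uid are our own choice.

```python
import pandas as pd

# Stand-in for the DataFrame returned by idc_client.sql_query(query).
pancreas_slides = pd.DataFrame(
    {
        "StudyInstanceUID": ["2.25.25332367070577326639024635995523878122"],
        "SeriesInstanceUID": [
            "1.3.6.1.4.1.5962.99.1.3380245274.1362068963.1639762817818.2.0"
        ],
    }
)

# Take the first row and keep both UIDs for later use.
first_slide = pancreas_slides.iloc[0]
sample_study_uid = first_slide["StudyInstanceUID"]
sample_series_uid = first_slide["SeriesInstanceUID"]
print((sample_study_uid, sample_series_uid))
```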

Reading slide regions via DICOMweb

We recommend the following two Python libraries that facilitate access to a DICOM store via DICOMweb:

  • wsidicom

  • ez-wsi-dicomweb

Both libraries can be installed using pip: pip install wsidicom ez-wsi-dicomweb

wsidicom is based upon the dicomweb_client Python library, while ez-wsi-dicomweb includes its own DICOMweb implementation.

The following code snippets show how to use each of the libraries to access a subregion of a DICOM slide identified by the UIDs we selected earlier:

  • sample_study_uid = 2.25.25332367070577326639024635995523878122

  • sample_series_uid = 1.3.6.1.4.1.5962.99.1.3380245274.1362068963.1639762817818.2.0

wsidicom

When you work with wsidicom, the first step requires setting up dicomweb_client’s DICOMwebClient:

If you are accessing the Google-maintained DICOM store, you need to authenticate with your Google credentials first and set up an authorized session for the DICOMwebClient.

Otherwise, if you prefer using the IDC-maintained proxied DICOM store, you can skip the authentication step and set up your DICOMwebClient using the proxy URL.
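The two options above could be sketched as follows. This is a minimal sketch, not the exact code of the original tutorial: the helper names are hypothetical, the store coordinates and the proxy URL must be taken from the IDC documentation on DICOMweb endpoints, and the Google variant assumes that application default credentials are already configured in your environment. The third-party imports are placed inside the functions so that the URL helper can be used even when those packages are not installed.

```python
def google_dicomweb_url(project, location, dataset, dicom_store):
    """Build the DICOMweb base URL of a Google Cloud Healthcare API DICOM store."""
    return (
        "https://healthcare.googleapis.com/v1"
        f"/projects/{project}/locations/{location}"
        f"/datasets/{dataset}/dicomStores/{dicom_store}/dicomWeb"
    )


def make_google_client(url):
    """Return a DICOMwebClient that authenticates with Google credentials."""
    import google.auth
    from google.auth.transport.requests import AuthorizedSession
    from dicomweb_client.api import DICOMwebClient

    # Uses application default credentials found in the environment.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    return DICOMwebClient(url=url, session=AuthorizedSession(credentials))


def make_proxy_client(proxy_url):
    """Return a DICOMwebClient for a store that needs no authentication."""
    from dicomweb_client.api import DICOMwebClient

    return DICOMwebClient(url=proxy_url)


# Placeholder coordinates; replace them with the values documented by IDC.
print(google_dicomweb_url("my-project", "my-location", "my-dataset", "my-store"))
```

Either function returns an object that can be used as the DICOMwebClient in the steps that follow.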

Slide access with wsidicom

You now need to wrap the previously set-up DICOMwebClient into wsidicom’s WsiDicomWebClient. Then you can use the open_web() functionality to find, open and navigate the content of the selected slide:

To access a certain part of a slide, wsidicom offers the read_region() functionality:
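A minimal sketch of both steps is shown below. It assumes that dw_client is an already configured DICOMwebClient, that WsiDicomWebClient accepts it directly, and that read_region() takes the location, pyramid level and size in that order; check these against the wsidicom version you have installed. The function name read_slide_region and the region values in the usage comment are illustrative only.

```python
def read_slide_region(dw_client, study_uid, series_uid, location, level, size):
    """Open a slide via DICOMweb and return one region as an image.

    location: (x, y) of the upper-left corner of the region.
    level:    pyramid level to read from.
    size:     (width, height) of the region in pixels.
    """
    # Imported here so the function can be defined without wsidicom installed.
    from wsidicom import WsiDicom, WsiDicomWebClient

    # Wrap the generic DICOMweb client into wsidicom's own client type.
    wsi_client = WsiDicomWebClient(dw_client)
    slide = WsiDicom.open_web(wsi_client, study_uid, series_uid)
    try:
        return slide.read_region(location, level, size)
    finally:
        slide.close()


# Example call (requires network access and a configured dw_client):
# region = read_slide_region(
#     dw_client, sample_study_uid, sample_series_uid,
#     location=(1000, 1000), level=6, size=(500, 500),
# )
```

If you plan to read many regions from the same slide, it is likely better to open the slide once and call read_region() repeatedly instead of reopening it for every region as this helper does.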

Screenshot of slide region

ez-wsi-dicomweb

The following code shows how to set up an interface for DICOMweb with ez-wsi-dicomweb. You can only use this interface for accessing data from the Google-maintained DICOM store, which means authentication with your Google account is required.

The slide, slide level information and slide regions can be accessed as follows. To accelerate image retrieval, ez-wsi-dicomweb can be configured to fetch frames in blocks and cache them for subsequent use. For more information, check out this notebook, section “Enabling EZ-WSI DICOMweb Frame Cache”.

Screenshot of slide region

Iterating through tiles using DICOMweb

To iterate over image tiles, you can wrap the functionality presented above into your own function that iterates over the coordinates of interest. If you prefer to iterate over the frames as they are stored within the DICOM file, wsidicom also offers a read_tile() method. With ez-wsi-dicomweb, iterating over a slide and accessing tiles from an area defined by a tissue mask can be achieved using DICOMPatchGenerator, as described in this notebook in section “Generating patches from a level image”.
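One way to write such a wrapper is sketched below. The helper names iter_tiles and read_tiles are hypothetical, and the sketch assumes a reader callable with the same argument order as wsidicom's read_region(), that is location, level and size, with coordinates expressed in pixels of the requested level. The width and height of that level have to be obtained from the slide separately.

```python
def iter_tiles(width, height, tile_size):
    """Yield (x, y, tile_width, tile_height) covering a width x height area.

    Tiles are visited row by row; tiles at the right and bottom edges are
    clipped to the area instead of being skipped.
    """
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield x, y, min(tile_size, width - x), min(tile_size, height - y)


def read_tiles(read_region, level, width, height, tile_size):
    """Yield ((x, y), image) pairs using any read_region-style callable."""
    for x, y, w, h in iter_tiles(width, height, tile_size):
        yield (x, y), read_region((x, y), level, (w, h))


# Tile grid for a small 5 x 3 pixel area split into 2 x 2 pixel tiles.
for tile in iter_tiles(5, 3, 2):
    print(tile)
```

Passing the reader as an argument keeps the iteration logic independent of the library, so the same loop can be reused with a different backend or restricted to coordinates of interest by filtering the output of iter_tiles.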

Recommendations

Both libraries, ez-wsi-dicomweb and wsidicom, can be recommended for reliable DICOMweb access to IDC data. In our experience, ez-wsi-dicomweb is often faster, likely due to its caching capabilities and its customizations for efficient access to image patches from a Google DICOM store for AI model training. wsidicom, on the other hand, is a more general-purpose tool offering extensive functionality for accessing DICOM files (images as well as annotation files), either from local disk or from the cloud via DICOMweb. Note that access times may be slightly longer when running code locally than when running it in the cloud (for example, in a Colab notebook).
