Introduction to DICOM
IDC relies on DICOM for data modeling, representation and communication. Most of the data stored in IDC is in DICOM format. If you want to use IDC, you (hopefully!) do not need to become a DICOM expert, but you do need a basic understanding of how DICOM data is structured, and of how to transform DICOM objects into alternative representations that can be used by the tools familiar to you.
This section is not intended to be a comprehensive introduction to the standard, but rather a very brief overview of the concepts you will need to understand to make better use of IDC data.
As discussed in REF, the main mechanism for accessing the data stored in IDC is through the storage buckets, which contain individual files that are indexed through other interfaces. Each of the files in the collection-specific storage buckets encodes a DICOM object. Each DICOM object is a collection of data elements, also called attributes. Below is an example of a subset of attributes in a DICOM object, as generated by the IDC Viewer (which can be toggled by clicking the "Tag browser" icon in the IDC Viewer toolbar):
The standard defines constraints on what kind of data each of the attributes can contain. Every single attribute defined by the standard is listed in the DICOM Data Dictionary, which defines those constraints:
- Value Representation (VR) defines the type of data that a data element can contain.
- Value Multiplicity (VM) defines the number of items of the prescribed VR that can be contained in a given data element.
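To make the role of these constraints concrete, here is a minimal sketch that models a handful of Data Dictionary entries and checks a value against its VM. The four entries are transcribed from the Data Dictionary; the names `DictionaryEntry`, `DATA_DICTIONARY`, `vm_allows` and `split_values` are hypothetical helpers introduced only for this illustration, and are not part of any DICOM toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    tag: str      # (group,element) pair in hexadecimal
    keyword: str  # attribute keyword
    vr: str       # Value Representation code, e.g. CS = Code String
    vm: str       # Value Multiplicity specification


# A few entries from the DICOM Data Dictionary (hypothetical container).
DATA_DICTIONARY = {
    "Modality": DictionaryEntry("(0008,0060)", "Modality", "CS", "1"),
    "PixelSpacing": DictionaryEntry("(0028,0030)", "PixelSpacing", "DS", "2"),
    "ImagePositionPatient": DictionaryEntry(
        "(0020,0032)", "ImagePositionPatient", "DS", "3"
    ),
    "ImageType": DictionaryEntry("(0008,0008)", "ImageType", "CS", "2-n"),
}


def vm_allows(vm: str, count: int) -> bool:
    """Check `count` values against a VM spec such as "1", "3", "1-3" or "2-n".

    Simplification: forms like "2-2n" are not handled and raise ValueError.
    """
    low, _, high = vm.partition("-")
    if not high:
        return count == int(low)
    if high == "n":
        return count >= int(low)
    return int(low) <= count <= int(high)


def split_values(raw: str) -> list[str]:
    """Split a multi-valued string; DICOM separates values with a backslash."""
    return raw.split("\\")


values = split_values("0.5\\0.5")  # two decimal strings, as in PixelSpacing
print(vm_allows(DATA_DICTIONARY["PixelSpacing"].vm, len(values)))  # True
```

In practice a DICOM library performs this kind of lookup for you; the sketch is only meant to show that VR and VM are properties of the dictionary entry, not of an individual file.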
What attributes are included in a given object is determined by the type of object (or, to follow the DICOM nomenclature, Information Object). Part 3 of the DICOM standard is dedicated to the definitions (IODs) of those objects.
How do you know what object is encoded in a given file (or instance of the object, using the DICOM lingo)? There's an attribute for that: SOPClassUID uniquely identifies the class of the encoded object. The content of this attribute is not easy to interpret, since it is a unique identifier. To map it to the specific object class name, you can consult the complete list of object classes available in Part 4 of the standard.
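As a rough illustration of that mapping, the sketch below translates a few SOPClassUID values into their object class names. The table is a small hand-picked subset of the classes defined by the standard, and `SOP_CLASS_NAMES` and `sop_class_name` are hypothetical names used only for this example.

```python
# A small, hand-picked subset of SOP Class UIDs and their names.
SOP_CLASS_NAMES = {
    "1.2.840.10008.5.1.4.1.1.2": "CT Image Storage",
    "1.2.840.10008.5.1.4.1.1.4": "MR Image Storage",
    "1.2.840.10008.5.1.4.1.1.7": "Secondary Capture Image Storage",
    "1.2.840.10008.5.1.4.1.1.66.4": "Segmentation Storage",
    "1.2.840.10008.5.1.4.1.1.481.3": "RT Structure Set Storage",
}


def sop_class_name(uid: str) -> str:
    """Return the object class name for a UID, or a placeholder if unknown."""
    return SOP_CLASS_NAMES.get(uid, f"Unknown SOP class ({uid})")


print(sop_class_name("1.2.840.10008.5.1.4.1.1.66.4"))  # Segmentation Storage
print(sop_class_name("1.2.3"))  # Unknown SOP class (1.2.3)
```

A DICOM library such as pydicom can read the SOPClassUID attribute from a file directly, so a table like this is mainly useful when you only have the UID strings, for example in a query result.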
When you use the IDC Portal to build your cohort, unique identifiers for the object classes are mapped to their names, which are available under the "Object class" group of facets in the search interface.
A somewhat related attribute that hints at the type of object is Modality, which is defined by the standard as "Type of equipment that originally acquired the data used to create the images in this Series", and is expected to take one of the values from this list. However, Modality is not equivalent to SOPClassUID, and should not be used as a substitute. For example, data derived from the original modality could be saved as a different object class while keeping the same value of Modality.
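The difference can be sketched with a few made-up instance records. The records below are hypothetical and exist only for illustration; the three UIDs are real SOP Class UIDs, and `sop_classes_by_modality` and `select_by_sop_class` are names assumed for this example.

```python
from collections import defaultdict

CT_IMAGE = "1.2.840.10008.5.1.4.1.1.2"             # CT Image Storage
ENHANCED_CT_IMAGE = "1.2.840.10008.5.1.4.1.1.2.1"  # Enhanced CT Image Storage
SECONDARY_CAPTURE = "1.2.840.10008.5.1.4.1.1.7"    # Secondary Capture Image Storage

# Hypothetical records: all share Modality "CT" but differ in object class.
instances = [
    {"name": "instance-a", "Modality": "CT", "SOPClassUID": CT_IMAGE},
    {"name": "instance-b", "Modality": "CT", "SOPClassUID": ENHANCED_CT_IMAGE},
    {"name": "instance-c", "Modality": "CT", "SOPClassUID": SECONDARY_CAPTURE},
]


def sop_classes_by_modality(records):
    """Group the distinct SOPClassUID values observed for each Modality."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["Modality"]].add(record["SOPClassUID"])
    return dict(grouped)


def select_by_sop_class(records, uid):
    """Select records by object class rather than by Modality."""
    return [record for record in records if record["SOPClassUID"] == uid]


print(len(sop_classes_by_modality(instances)["CT"]))  # 3
print([r["name"] for r in select_by_sop_class(instances, CT_IMAGE)])  # ['instance-a']
```

Filtering on Modality alone would return all three records, while filtering on SOPClassUID returns only the instances of the object class you actually want to process.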