Building Queries from Ontology Terms
Introduced in i2b2 core version 1.6
In the i2b2 1.6 release we allow the direct use of the patient_dimension and the visit_dimension tables in a query built with the i2b2 query tool. The capability exists to some extent in earlier releases, but was not fully defined and implemented.
The ontology cell provides instructions on how the data repository cell should perform its query for any given term or set of terms. Each row in an ontology metadata table(s) represents a term or item, such as a diagnosis like "diabetes", or a laboratory test like "white blood cell count". In order to find these items in the data repository, the data repository cell needs to know which table to find them in and how to identify them. The information contained in the metadata tables is utilized to build a metadata SELECT SQL statement as follows:
<span style="color: #0000ff">select</span> [c_facttablecolumnname] <span style="color: #0000ff">from</span> [c_tablename] <span style="color: #0000ff">where</span> [c_columnname] [c_operator] [c_dimcode]
The intent of this query is to associate a link between the dimension tables and the fact table for a given term. As a result every metadata SELECT SQL statement should return a fact table key.
A sample concept_dimension-based term is shown:
<span style="color: #0000ff">select</span> concept_cd <span style="color: #0000ff">from</span> concept_dimension <span style="color: #0000ff">where</span> concept_path <span style="color: #808080">LIKE</span> <span style="color: #ff0000">'\i2b2\Diagnoses\Circulatory system (390-459)%'</span>
The result of this query is a list of concept_cds to query against the fact table. The data repository cell uses this data to find a set of patients having a fact entry for this list of concept codes. This patient set finder query is as follows:
<span style="color: #0000ff">select</span> patient_num <span style="color: #0000ff">from</span> [c_facttablename] <span style="color: #0000ff">where</span> [c_facttablecolumnname] <span style="color: #808080">IN</span>
<span style="color: #808080">(</span><span style="color: #0000ff">select</span> [c_facttablecolumnname] <span style="color: #0000ff">from</span> [c_tablename] <span style="color: #0000ff">where</span> [c_columnname] [c_operator] [c_dimcode]<span style="color: #808080">)</span>
- Note that the sub-query in parentheses is the metadata SELECT SQL query.
Following through with our example, the patient set finder query would be:
select patient_num from observation_fact where concept_cd IN
(select concept_cd from concept_dimension where concept_path LIKE '\i2b2\Diagnoses\Circulatory system (390-459)%')
The ability to create a statement from the entries in the ontology metadata tables allows new items to be defined in a flexible manner. It also allows the ontology metadata table entries to have possible destructive effects if erroneous or malicious entries are added. Therefore, the trust relationship between the data repository cell and the ontology cell must be respected, and ontology metadata table entries must only be created by a trusted source.
Extending Queries to Dimension-Based Metadata Terms
Provider_dimension-based Metadata Terms
The Provider_dimension-based metadata terms are very similar in scope to concept_dimension-based terms: fact table key concept_cd is replaced by provider_id; metadata table's concept_path is replaced by provider_path. A sample patient set finder query is as follows:
select patient_num from observation_fact where provider_id in
(select provider_id from provider_dimension where provider_path LIKE '\i2b2\Providers\Emergency%')
Patient_dimension-based Metadata Terms
The Patient_dimension-based metadata terms are generally demographics-related: any of the following codes could be queried if they existed in the patient_dimension table:
- sex_cd
- language_cd
- race_cd
- marital_status_cd
- religion_cd
- zip_cd
- statecityzip_path
- income_cd
- vital_status_cd
The associated metadata SELECT query is straightforward for each of these.
select patient_num from patient_dimension where race_cd = 'white'
select patient_num from patient_dimension where sex_cd = 'F'
select patient_num from patient_dimension where statecityzip_path LIKE
'Zip codes\Massachusetts\Cambridge%'
select patient_num from patient_dimension where birth_date < '01/01/2005'
Slightly more complicated are queries based on birth_date or age. The following identifies patients between the ages of 10 and 17 years old:
select patient_num from patient_dimension where birth_date BETWEEN
sysdate – (365.25*18) AND sysdate – (365.25*10)
Visit_dimension-based Metadata Terms
The Visit_dimension-based metadata terms can be queried for any of the following codes if they exist in the visit_dimension table:
- active_status_cd
- inout_cd
- location_cd
- location_path
- length_of_stay
The associated metadata SELECT query is straightforward for each of these.
select encounter_num from visit_dimension where length_of_stay = 2
select encounter_num from visit_dimension where inout_cd = 'I'
Dimcode Rules
Notice that the term's dimcode passed by the ONT cell doesn't necessarily match the [c_dimcode] used in the metadata SELECT SQL query 100%. In general, the dimcodes provided by the metadata table are allowed to lack punctuation required by the SQL statement if the "fixing" of the terms is unambiguous. The data repository cell remedies this according to a set of rules for each data type.
Text (T) Data Type
The following are the rules for the Text (T) data type:
# 1 |
For the LIKE operator, if the dimcode is missing '\' at the end AND there are no enclosing single quotes AND there are no open and close parenthesis, then '\' will be added at the end. |
|
|
# 2 |
For the LIKE operator, if the dimcode is missing '%' at the end AND there are no enclosing single quotes AND there are no open and close parenthesis, then '%' will be added at the end. |
|
|
# 3 |
For any text operator, enclosing single quotes will be added, if the dimcode is missing them AND there are no open and close parentheses. |
|
|
# 4 |
For the IN operator, open and close parenthesis will be added, if the dimcode is missing them. |
Text Data Examples
Operator |
Dimcode Format |
Sample Metadata XML |
SQL |
LIKE |
Backslash, '%' and enclosing single quotes will be added to the dimcode, if they are omitted |
<concept> |
SELECT patient_num |
IN |
('val1,'val2','val3') or 'val1','val2','val3' |
<concept> |
SELECT patient_num |
All text Operators |
'val1' or val1 |
<concept> |
SELECT patient_num |
Number (N) Data Type
The following are the rules for the Text (T) data type:
# 1 |
For the IN operator, open and close parenthesis will be added if the dimcode is missing them. |
Numeric Data Examples
Operator |
Dimcode Format |
Sample Metadata XML |
SQL |
IN |
(val1,val2,val3) or val1,val2,val3 |
<concept> |
SELECT patient_num |
Date (D) Data Type
The following are the rules for the Date (D) data type:
# 1 |
For the IN operator, open and close parenthesis will be added if the dimcode is missing them. |
Date Data Examples
Operator |
Dimcode Format |
Sample Metadata XML |
SQL |
IN |
('val1','val2','val3') or 'val1','val2','val3' |
For SQLServer: |
SELECT patient_num |
All date operators (= , > , < , <> , BETWEEN) |
Follows the database specific value format related to the date operator |
For SQLServer: |
For SQL Server: |
|
|
For Oracle: |
For Oracle: |
© Shawn Murphy 2010