i2b2 ontologies have a column c_totalnum that can store the total count of patients associated with every item in the ontology tree. This can be visualized in the i2b2 webclient to assist with query building (e.g., to find concepts that have many patients) or be used for data quality (to find areas where patient counts do not make sense). It is also used by the query builder to optimize queries. The ENACT network uses these counts for additional analytics across their network. We recommend you run these counts after each ETL.
Info |
---|
i2b2 users must have the DATA_AGG user permission to view the counts through the web client. |
Installation
This should have already occurred in previous sections of this guide, but verify you have run these steps:
- If upgrading, create the totalnum and totalnum_report tables. In Release_1-7/Upgrade/Metadata, run the ant upgrade script.
ant -f data_build.xml upgrade_tables_release_1-7-12a
- In Release_1-7/NewInstall/Metadata/, run the ant script to create the stored procedures.
ant -f data_build.xml create_metadata_procedures_release_1-7
- Set privileges: If using multiple schemas, the stored procedure should be run from the metadata schema. Make sure the stored procedure can read the tables in the crcdata schema (observation_fact, visit_dimension, patient_dimension) and can both read and update ontology tables in the metadata schema (including table_access).
Fast Totalnum
i2b2 1.8 introduces a version that is 5-10x faster. This faster version is presently only available for MSSQL and has only been extensively tested with the ACT ontology. The Fast Totalnum procedures replace the pat_count_dimensions and run_all_counts stored procedures.
Configuration
1. The first time you run this, and whenever your local ontology changes, you must run the preparatory procedure. This creates a view of distinct concept codes and patient nums (OBSFACT_PAIRS), a unified ontology table (TNUM_ONTOLOGY), and a transitive closure table (CONCEPT_CLOSURE). It could take an hour to run.
exec FastTotalnumPrep or exec FastTotalnumPrep 'dbo'
- Optionally you can specify the schema name, as above.
- The ACT_VISIT_DETAILS_V4 and ACT_DEM_V4 table names are presently hardcoded, so change them if your table names are different.
- If you use more than one fact table, the obsfact_pairs view will need to be customized. (See the example in the code comments.)
2. Run the actual counting. This relies on the i2b2 data tables and the closure and ontology tables created in step 1. It takes no parameters. Its output goes into the totalnum table, which was created when upgrading/installing i2b2 1.7.12 or 1.7.13 or 1.8. It typically runs in 1-3 hours.
exec FastTotalnumCount
3. Output the results to the totalnum_report table (as obfuscated counts) and into the totalnum column in the ontologies (for viewing in the query tool).
exec FastTotalnumOutput or exec FastTotalnumOutput 'dbo','@'
- Optionally you can specify the schema name and a single table name to run on a single ontology table (or @ for all).
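To make the roles of the three prep objects more concrete, here is a minimal sketch of closure-based counting, written with Python's built-in sqlite3 module and toy data. The column layouts used here (obsfact_pairs with patient_num and concept_cd, tnum_ontology with path_num, c_fullname and c_basecode, concept_closure with ancestor and descendant) are simplified assumptions for illustration and are not the actual definitions created by FastTotalnumPrep.

```python
import sqlite3

# Toy stand-ins for the objects built by the prep step (layouts are assumed).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE obsfact_pairs (patient_num INTEGER, concept_cd TEXT);
    CREATE TABLE tnum_ontology (path_num INTEGER PRIMARY KEY,
                                c_fullname TEXT, c_basecode TEXT);
    CREATE TABLE concept_closure (ancestor INTEGER, descendant INTEGER);
    """
)
conn.executemany(
    "INSERT INTO tnum_ontology VALUES (?, ?, ?)",
    [(1, "\\Dx\\", None), (2, "\\Dx\\A\\", "ICD:A"), (3, "\\Dx\\B\\", "ICD:B")],
)
# The closure lists every (ancestor, descendant) pair, including each node itself.
conn.executemany(
    "INSERT INTO concept_closure VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 2), (3, 3)],
)
# Patient 102 has both codes, so the folder must not count that patient twice.
conn.executemany(
    "INSERT INTO obsfact_pairs VALUES (?, ?)",
    [(101, "ICD:A"), (102, "ICD:A"), (102, "ICD:B"), (103, "ICD:B")],
)

def count_patients_per_node(conn):
    """Return {c_fullname: distinct patients with a fact at or below that node}."""
    rows = conn.execute(
        """
        SELECT o.c_fullname, COUNT(DISTINCT p.patient_num)
        FROM tnum_ontology o
        JOIN concept_closure c ON c.ancestor = o.path_num
        JOIN tnum_ontology d ON d.path_num = c.descendant
        JOIN obsfact_pairs p ON p.concept_cd = d.c_basecode
        GROUP BY o.c_fullname
        """
    ).fetchall()
    return dict(rows)

counts = count_patients_per_node(conn)
```

In this toy data the two leaves each have two patients while the parent folder has three, because the closure join counts distinct patients across all descendants instead of summing the child counts.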
Execution
Run the following commands in a SQL client.
exec FastTotalnumPrep or exec FastTotalnumPrep 'dbo' (Run once when ontology changes.)
exec FastTotalnumCount (Actual counting, takes several hours.)
exec FastTotalnumOutput or exec FastTotalnumOutput 'dbo','@' (Output results to report table and UI.)
Some additional notes on running on Postgres
Some users have reported difficulty executing the totalnum scripts due to user permissions. Lav Patel at UKMC has offered some solutions:
- Make sure the i2b2 user has access to insert, select, and update all i2b2 schemas... e.g.,
GRANT ALL PRIVILEGES ON DATABASE i2b2 TO i2b2;
- Make the i2b2 user a super user:
ALTER USER i2b2 with SUPERUSER;
- Change the schema ownership to the i2b2 user (requires function in the postgres directory of this repository):
select change_schema_owner('i2b2demodata', 'i2b2');
select change_schema_owner('i2b2metadata', 'i2b2');
select change_schema_owner('i2b2pm', 'i2b2');
select change_schema_owner('i2b2hive', 'i2b2');
Some additional notes on running on OMOP
It is possible to run counts on OMOP tables through the ENACT-OMOP feature in i2b2 1.8. The new 1.8 totalnum procedure works on OMOP: simply load the file totalnum_usp/sqlserver/totalnum_fast_prep_OMOP.sql instead of totalnum_fast_prep.sql.
Totalnum Classic (slower but more compatible with ontologies that use dimcode-based queries)
Configuration
If using multiple fact tables, the recommended approach is to create a fact table view as the union of all your fact tables. (This is essentially going back to a single fact table, but it is only used for totalnum counting. It is needed to correctly count patients when a hierarchy references multiple fact tables.)
e.g., create view observation_fact_view as select * from CONDITION_VIEW union all select * from drug_view
- If running the counting script in SQL Server, add the wildcard flag, to ignore multifact references in the ontology: e.g. exec RunTotalnum 'observation_fact_view','dbo','@','Y'
This is automatically accounted for in the other database platforms. Note this approach does not work if you have conflicting concept_cds across fact tables.
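The reason for the union view can be seen in a small sketch. The example below uses Python's built-in sqlite3 module with two toy fact tables. The view and table names follow the example above, but the two-column layout and the data are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Toy fact tables; real i2b2 fact tables have many more columns.
    CREATE TABLE condition_view (patient_num INTEGER, concept_cd TEXT);
    CREATE TABLE drug_view (patient_num INTEGER, concept_cd TEXT);
    INSERT INTO condition_view VALUES (1, 'DX:1'), (2, 'DX:1');
    INSERT INTO drug_view VALUES (2, 'RX:1'), (3, 'RX:1');

    CREATE VIEW observation_fact_view AS
        SELECT * FROM condition_view
        UNION ALL
        SELECT * FROM drug_view;
    """
)

def distinct_patients(table):
    # Table names are fixed literals in this sketch, so formatting them in is safe.
    query = f"SELECT COUNT(DISTINCT patient_num) FROM {table}"
    return conn.execute(query).fetchone()[0]

# Counting each fact table separately and adding the results double counts patient 2.
per_table_sum = distinct_patients("condition_view") + distinct_patients("drug_view")

# Counting over the union view counts each patient once.
via_view = distinct_patients("observation_fact_view")
```

Here per_table_sum is 4 while via_view is 3, which is the discrepancy a folder spanning both fact tables would show without the view.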
Execution: this can be done in two ways:
1. Run the ant command to execute the data_build.xml file with below specified target
POSTGRESQL : ant -f data_build.xml db_metadata_run_total_count_postgresql
ORACLE : ant -f data_build.xml db_metadata_run_total_count_oracle
SQL SERVER : ant -f data_build.xml db_metadata_run_total_count_sqlserver
2. Execute the RunTotalNum stored procedure manually against your database from a SQL Client. This can take several hours for large databases or large ontologies. Examples are below.
Oracle: | begin You can optionally include a table name if you only want to count one ontology table (this IS case sensitive): Note: If you get the error as: ERROR at line 1: ORA-01031: insufficient privilege, then run the command: |
SQL Server: | exec RunTotalnum 'observation_fact','dbo','@' Parameters are: 1) the observation table name (for multi-fact-table setups), 2) the schema name, 3) a single table name to run on a single ontology table or '@' to run on all, and 4) a wildcard flag that will ignore multifact references in the ontology if 'Y'. |
PostgreSQL: | select RUNTOTALNUM('observation_fact','public') Replace 'public' with the schema name for the fact table. If using a schema other than public for metadata, you might also need to run "set search_path to 'i2b2metadata','public'" first. |
When finished, verify it is complete by checking that c_totalnum columns in your ontology tables contain numbers (not nulls). These total counts will be visible in the ontology browser in the web client.
Parent folders will get counts (of all patients with facts in the leaves) except for ontology folders derived from visit_dimension or patient_dimension. These cannot be rolled up because of the way these terms are defined in the ontology. They will have no count at all (not a zero).
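As a rough illustration of the verification step, the sketch below counts how many rows of an ontology table have a c_totalnum value and how many are still null. It uses Python's built-in sqlite3 module with a toy table. The table name my_ontology and its data are hypothetical, and against a real i2b2 database the same SELECT would be run from your usual SQL client.

```python
import sqlite3

def totalnum_summary(conn, table):
    """Return (rows_with_count, rows_without_count) for one ontology table."""
    # COUNT(column) skips NULLs, so the difference from COUNT(*) is the null rows.
    # The table name is a fixed literal in this sketch, so formatting it in is safe.
    query = f"SELECT COUNT(c_totalnum), COUNT(*) - COUNT(c_totalnum) FROM {table}"
    filled, missing = conn.execute(query).fetchone()
    return filled, missing

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_ontology (c_fullname TEXT, c_totalnum INTEGER)")
conn.executemany(
    "INSERT INTO my_ontology VALUES (?, ?)",
    [
        ("\\Dx\\", 120),
        ("\\Dx\\A\\", 80),
        ("\\Visit\\", None),  # stands in for a dimension-derived folder with no count
    ],
)

summary = totalnum_summary(conn, "my_ontology")
```

A small number of remaining nulls is expected for folders derived from visit_dimension or patient_dimension, while nulls throughout a table suggest the counting did not complete.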