UMMS has designed and populated a clinical data warehouse. The warehouse strategy uses the i2b2 (Informatics for Integrating Biology and the Bedside) software, developed by an NIH-funded National Center for Biomedical Computing (NCBC) based at Partners HealthCare System in Boston, to store the clinical data on all patients served by UMMHC’s hospitals and ambulatory care offices. The UMMHC data currently represent more than 50 percent of central MA residents.
As depicted in the data flow diagram, clinical data from UMMHC HIS are delivered via encrypted Internet transport. We designed for Internet transport so that external organizations can be accommodated in the future. The data are aggregated and normalized to create the TIDE Data Warehouse. The data are then mapped to the i2b2 ontology and de-identified to populate the MICARD i2b2 research repository. Identified data are transported back to UMMHC to populate the Medical Management quality reporting system. A separate network segment, termed TIDE2, has been created for managing clinical study data. TIDE2 hosts virtualized servers to support applications for clinical studies. After IRB approval, clinical data are loaded from the TIDE Data Warehouse into a study-specific database in TIDE2. More details on our current data warehouse implementation are provided below.
UMMS i2b2 Implementation. We have implemented release 1.3 of the i2b2 software. The flexible hive design and the ability to merge multiple data sources are particular strengths of the i2b2 framework. UMMS was among the first academic health centers to adopt the i2b2 framework, which is now being implemented at many others. UMMS is an active participant in the i2b2 Academic Users Group. Our implementation currently includes inpatient demographics, primary and secondary discharge diagnoses, and pharmacy and laboratory data from the existing UMMHC Meditech system. We are also loading IDX (scheduling and billing) data and health plan data. As of August 31, 2009, we have 121,920,724 facts in production on a pool of 2,050,425 patients. In the next phase we will begin integrating data from other UMMHC systems (e.g., Allscripts, eICU). We will do this by leveraging the UMMHC Initiate EMPI and the dBMotion unified web portal and Health Information Exchange (HIE) engine to feed data to the data warehouse via web services. A data governance committee has been established to set development and access priorities. There are three components in our i2b2 implementation: (1) TIDE Data Warehouse ETL (extract, transform and load) processing servers, (2) MICARD de-identified i2b2 research data mart, and (3) Medical Management reporting data mart.
1. The TIDE Data Warehouse is created as the ETL processing server aggregates, cleanses and standardizes raw clinical data. Data from hospital transactional systems (e.g., Meditech, IDX) are loaded into the ETL server, which normalizes the data to ensure consistency and accuracy. We are using public data files to develop reference code tables for the LOINC, ICD9, and NDC coding systems. LOINC and ICD9 codes are mature data standards that can be translated into meaningful searches. The NDC drug codes will be cross-referenced with the RxNorm and Orange Book drug databases to normalize the clinical drug component. RxNorm codes from the National Library of Medicine will be maintained wherever possible to ensure a common vocabulary across multiple data sources. Proprietary codes that do not map cleanly to standard ontologies are presented in i2b2 as entity-specific ontologies. The reference tables had to be loaded, cleansed, and processed into tables in suitable formats. Stored procedures were written to document those processes. The cleansing process has been tested and found to be replicable, secure, and consistent. Exceptions are identified for manual processing. Additional ETL servers will be added to support other healthcare entities.
2. MICARD i2b2 research data mart. The Massachusetts Integrated Clinical and Academic Research Database (MICARD) is the UMMS implementation of the i2b2 Clinical Research Chart (CRC) for research. This is a normalized, de-identified data mart. Investigators access the data directly using the i2b2 web client workbench. The MICARD workbench web client allows investigators to rapidly identify patient cohorts with particular clinical attributes in the database. This facilitates proposal development and IRB workflow. Additionally, the system reports the availability of consented biospecimens from the Conquering Diseases Biorepository (see Section 9, Translational Technologies and Resources). Genetic data produced from these biospecimens will be accessible via a MICARD plug-in. This design increases the protection of privacy and confidentiality and substantially reduces the time required for investigators to plan and begin studies. Once a protocol is approved by the IRB, an analyst in ARCS runs the same query against the identified TIDE Data Trust to produce the limited data set approved for the clinical study. The data set is delivered to secure systems in TIDE2.
3. Medical Management reporting data mart. UMMHC’s Medical Management Department has licensed Recombinant Data Corporation’s Quality Reporting System to generate and deliver reminders to physicians about patients who are overdue for cancer screenings and/or well care visits, as well as clinical alerts about glycemic and/or lipemic control for their patients with diabetes and/or coronary artery disease. A combination of laboratory test results (including hemoglobin A1c and lipid profiles), paid medical claims, paid pharmacy claims, and unprocessed billing and scheduling data is used to generate reports that assist providers in identifying and intervening with their patients.
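The code standardization step described in item 1 (mapping source-system codes to standard vocabularies and routing unmapped codes to manual review) can be illustrated with a minimal sketch. This is not the TIDE implementation, which is built on stored procedures; it is a simplified Python example under stated assumptions. The local codes, the record layout, and the names `LOCAL_TO_LOINC`, `normalize_code`, `EtlResult`, and `map_records` are all hypothetical, and the two LOINC codes are included only as illustrative targets.

```python
from dataclasses import dataclass, field

# Hypothetical reference table: local lab code -> LOINC code.
# A real table would be built from the public LOINC distribution files.
LOCAL_TO_LOINC = {
    "HGBA1C": "4548-4",   # illustrative: hemoglobin A1c
    "LDL": "13457-7",     # illustrative: calculated LDL cholesterol
}


def normalize_code(raw: str) -> str:
    """Cleanse a raw source code: trim whitespace and upper-case it."""
    return raw.strip().upper()


@dataclass
class EtlResult:
    mapped: list = field(default_factory=list)      # records with a standard code
    exceptions: list = field(default_factory=list)  # records needing manual review


def map_records(records, reference):
    """Attach a standard code to each record, or flag it as an exception."""
    result = EtlResult()
    for rec in records:
        key = normalize_code(rec["local_code"])
        standard = reference.get(key)
        if standard is None:
            # No clean mapping: hold the record for manual processing.
            result.exceptions.append(rec)
        else:
            result.mapped.append(
                {**rec, "local_code": key, "standard_code": standard}
            )
    return result


# Hypothetical input rows, as they might arrive from a source system.
records = [
    {"patient_num": 1, "local_code": " hgba1c "},
    {"patient_num": 2, "local_code": "XYZ"},
]
result = map_records(records, LOCAL_TO_LOINC)
```

In this sketch the first record is cleansed and mapped, while the second has no entry in the reference table and is set aside in `result.exceptions`, mirroring the statement that exceptions are identified for manual processing. Running the same function twice on the same input yields the same output, which is the sense in which such a step is replicable.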