Big Data Analytics for Healthcare
The U.S. health care system is a complex network composed of thousands of organizations and millions of individuals that deliver health care services to the population of the United States and work with governments and multinational organizations to protect the public's health domestically and globally. Hughes (2004) stated that the system provides services across the continuum of care: from health promotion and prevention, to disease or injury diagnosis and treatment, through rehabilitation and maintenance for those suffering from lingering illness or chronic disease. Several characteristics distinguish the U.S. health care system from other segments of the domestic economy and from its counterparts in other industrialized nations.
As stated by Burns, Bradley, and Shortell (2012), one of the critical challenges facing health care firms is the delivery of value, defined as the quotient of quality divided by cost. Providers are therefore expected to deliver a higher level of quality at the same cost, the same level of quality at a lower cost, or higher quality at a lower cost. Responses to this challenge, as stated by Burns, Bradley, and Shortell (2012), have been proposed by (a) providers, in the form of accountable care organizations (A.C.O.s) and pay-for-performance, (b) suppliers, in the form of demonstrating the comparative clinical effectiveness of their products (versus alternative therapies), and (c) insurers and providers, in the form of value-based purchasing.
The reality is that one of the reasons delivering value has been a challenge lies in the denominator of the definition: health care cost. Health care costs have been rising over the years; in the United States, for example, they have risen three to four percent annually. Among the arguments and possible explanations for this rising cost are technology and its application to new patients and patient indications. As stated by Burns, Bradley, and Shortell (2012), technology contributes to rising costs because it is a complement to, rather than a substitute for, labor (e.g., the requirement for technicians to operate new equipment). Congressional Budget Office (C.B.O.) testimony likewise pointed to advances in medical technologies as a primary driver of increasing health care costs. Advances in medical technology are essential, but there is no requirement that effectiveness be demonstrated before a technology is adopted in the U.S. health care market. However, as stated by Mack (2016), eagerness for innovation seems to have created a culture in which medical technologies are adopted prematurely and put to additional uses beyond their original intent. In some instances, technologies that offer only marginal improvements over existing treatments, but with dramatically higher price tags, are adopted broadly and rapidly.
As part of the technology adopted by health care, implementing big data to support analytics has the potential to improve the health of the population and ultimately save lives. As stated by Sheeran and Steele (2017), the increasing richness of data in healthcare means there is a whole range of health-related areas to which analytics can contribute, such as value-based care, health system efficiency, population health management, epidemiology, phenome-genome relationships, clinical practice knowledge discovery, personal health optimization, and personalized medicine. Big data technologies enable capturing and storing large data sets and then analyzing that data to reveal patterns and trends that can lead to improvements in the service or product concerned. Big data technologies are concerned with quantities of data at such a scale that they cannot be easily handled by traditional means such as relational databases or data warehouses (Sheeran and Steele, 2017). In health care, big data approaches draw on large data sets, typically patient and past health information, to personalize and improve patient care through the storage, processing, and analysis of data related to the patient and to phenomic, genomic, environmental, and contextual information.
Historically, the healthcare industry has generated large amounts of data, driven by record keeping, compliance, regulatory requirements, and patient care. While most data is stored in hard copy form, the current trend is toward rapid digitalization of these massive amounts of data. Driven by mandatory requirements and the potential to improve the quality of healthcare delivery while reducing costs, these gigantic quantities of data (big data) hold the promise of supporting a wide range of medical and healthcare functions, including, among others, clinical decision support, disease surveillance, and population health management (Raghupathi and Raghupathi, 2014).
Big data in healthcare refers to electronic health data sets so large and complex that they are difficult (or impossible) to manage with traditional software and hardware; nor can they be easily managed with conventional or standard data management tools and methods. Big data in healthcare is overwhelming because of its volume, the diversity of data types, and the speed at which it must be managed. The totality of patient healthcare and well-being data makes up "big data" in the healthcare industry. It includes clinical data from computerized physician order entry (C.P.O.E.) and clinical decision support systems, such as physicians' written notes and prescriptions, medical imaging, laboratory, pharmacy, insurance, and other administrative data; patient data in electronic patient records (E.P.R.s); machine-generated/sensor data, such as from the monitoring of vital signs; and social media posts, including Twitter feeds, blogs, and status updates on Facebook and other platforms (Raghupathi and Raghupathi, 2014).
As stated by Raghupathi and Raghupathi (2014), by digitizing, combining, and effectively using big data, healthcare organizations ranging from single-physician offices and multi-provider groups to large hospital networks and accountable care organizations stand to realize significant benefits. Potential benefits include detecting diseases at earlier stages, when they can be treated more efficiently and effectively, managing specific individual and population health, and detecting health care fraud more quickly and efficiently. The volume of health data has increased over the last few years, and healthcare reimbursement has been changing as well, with meaningful use and pay-for-performance emerging as new critical factors. Even though profit is not and should not be the primary motivator, it is essential for healthcare organizations to acquire the available tools, infrastructure, and techniques to leverage big data effectively or risk losing potentially millions of dollars in revenue and profits.
As stated by Sheeran and Steele (2017):
“In 1854, before a large amount of data was collected on hospitals and patients, an early example of health analytics was used to analyze and control an outbreak of cholera in London. Dr. John Snow, an English physician, collected data on the patients and analyzed their similarities in lifestyle. Realizing there was a connection to London’s Broad Street, he drew a map and further analyzed what may have been causing the outbreak. He noticed that those using the water from a pump on Broad Street were contracting the sickness - but those with a private well or not drinking water were not getting sick. This led to the water pump being inspected and changed and the improvement of the health of residents. Although he did not discover what was causing cholera at the time, he showed the usefulness of using and analyzing data to discover causes and prevent future outbreaks of diseases. While this is a very early example, the challenge in modern-day health is managing the vast and exploding quantities of health-related data available and carrying out useful analytics tasks to provide valuable and actionable information.”
Big data technologies make it possible to capture and store many data sets, analyze them for patterns and trends, and provide insightful information that supports better services and products.
However, while it is ubiquitous today, 'big data' as a concept is relatively young and has uncertain origins. Diebold (2012) argues that the term "big data … probably originated in lunch-table conversations at Silicon Graphics Inc. (S.G.I.) in the mid-1990s, in which John Mashey figured prominently". Despite the references to the mid-nineties, the current hype can be attributed to the promotional initiatives of I.B.M. and other leading technology companies, which invested in building the niche analytics market (Gandomi and Haider, 2015).
Size is the first characteristic that comes to mind when considering the question "what is big data?" However, other features of big data have emerged recently. For instance, Laney (2001) suggested that Volume, Variety, and Velocity (or the Three V's) are the three dimensions of challenges in data management. The Three V's have emerged as a common framework to describe big data (Chen et al., 2012; Kwon et al., 2014). For example, Gartner, Inc. defines big data in similar terms: "Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making" (Gartner IT Glossary, n.d.). Similarly, TechAmerica Foundation defines big data as follows: "Big data is a term that describes large volumes of high velocity, complex and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information" (TechAmerica Foundation's Federal Big Data Commission, 2012).
As stated by Gandomi and Haider (2015), the three V's are described as follows:
1. Volume refers to the magnitude of data. Big data sizes are reported in multiple terabytes and petabytes. A survey conducted by I.B.M. in mid-2012 revealed that just over half of the 1144 respondents considered datasets over one terabyte to be big data (Schroeck, Shockley, Smart, Romero-Morales, & Tufano, 2012). One terabyte stores as much data as would fit on 1500 CDs or 220 DVDs, enough to store around 16 million Facebook photographs. Beaver, Kumar, Li, Sobel, and Vajgel (2010) report that Facebook processes up to one million pictures per second. One petabyte equals 1024 terabytes. Earlier estimates suggest that Facebook stored 260 billion photos using storage space of over 20 petabytes.
2. Variety refers to the structural heterogeneity in a dataset. Technological advances allow firms to use various types of structured, semi-structured, and unstructured data. Structured data, which constitutes only 5% of all existing data (Cukier, 2010), refers to the tabular data found in spreadsheets or relational databases. Text, images, audio, and video are examples of unstructured data that sometimes lack the structural organization machines require for analysis. Spanning a continuum between fully structured and unstructured data, semi-structured data do not conform to strict standards. Extensible Markup Language (XML), a textual language for exchanging data on the Web, is a typical example of semi-structured data. XML documents contain user-defined data tags, which make them machine-readable.
3. Velocity refers to the rate at which data are generated and the speed at which they should be analyzed and acted upon. The proliferation of digital devices such as smartphones and sensors has led to an unprecedented rate of data creation, driving a growing need for real-time analytics and evidence-based planning. Even conventional retailers are generating high-frequency data. Wal-Mart, for instance, processes more than one million transactions per hour (Cukier, 2010). The data emanating from mobile devices and flowing through mobile apps produce torrents of information that can be used to generate real-time, personalized offers for everyday customers. This data provides useful information about customers, such as geospatial location, demographics, and past buying patterns, which can be analyzed in real time to create real customer value.
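The real-time analytics need that the velocity dimension creates can be made concrete with a small sliding-window rate monitor, a basic primitive of stream processing. The following is an illustrative sketch only; the class name, window size, and simulated event source are assumptions, not taken from the cited sources.

```python
# A minimal sliding-window rate monitor: counts events seen in the
# last `window_seconds` seconds and reports events per second.
from collections import deque

class RateMonitor:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps of recent events

    def record(self, timestamp):
        self.events.append(timestamp)
        # Evict events that have fallen out of the window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()

    def rate(self):
        """Events per second over the current window."""
        return len(self.events) / self.window

monitor = RateMonitor(window_seconds=10)
for t in range(25):            # one simulated event per second
    monitor.record(t)
print(monitor.rate())          # 1.0 once the window is full
```

A production stream processor would shard such counters across machines, but the eviction-on-arrival idea is the same.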
In addition to the three V's mentioned above, there are three additional dimensions defined by Gandomi and Haider (2015) as follows:
1. Veracity. I.B.M. coined Veracity as the fourth V, which represents the unreliability inherent in some sources of data. For example, customer sentiments in social media are uncertain since they entail human judgment; yet they contain valuable information. Thus, the need to deal with imprecise and uncertain data is another facet of big data, which is addressed using tools and analytics developed for managing and mining uncertain data.
2. Variability (and complexity). S.A.S. introduced Variability and Complexity as two additional dimensions of big data. Variability refers to the variation in the data flow rates. Often, big data velocity is not consistent and has periodic peaks and troughs. Complexity refers to the fact that big data are generated through a myriad of sources. This imposes a critical challenge: the need to connect, match, cleanse and transform data received from different sources.
3. Value. Oracle introduced value as a defining attribute of big data. Based on Oracle's definition, big data are often characterized by relatively "low-value density": the data received in their original form usually have a low value relative to their volume. However, a high value can be obtained by analyzing large amounts of such data.
It is not only the sheer size of the data that impedes the direct application of conventional clustering methods; big data in healthcare often exhibit a multilevel structure, with complex correlation among observations and a mix of variable types. Massive data collection is now a fact of 21st-century science in fields such as bioinformatics, health, biomedicine, and biometrics/biostatistics, meaning that we live in the era of "Big Data." Big data in healthcare research is now commonplace, due to increasingly information-rich governments and sophisticated data management and data-linking technologies, and the extraction of useful information on group structures from these data can contribute to improving the quality, effectiveness, and cost-effectiveness of care for a sustainable health system.
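One standard way to handle the mix of variable types that defeats conventional clustering is a Gower-style dissimilarity, which scales numeric fields and applies simple matching to categorical ones. The sketch below is illustrative only; the field names, records, and assumed value ranges are hypothetical, not from any dataset discussed in the text.

```python
# Gower-style dissimilarity for mixed numeric/categorical records.
def gower_distance(a, b, numeric_ranges):
    """Average per-field dissimilarity between two records (0 = identical)."""
    total = 0.0
    for field, rng in numeric_ranges.items():
        total += abs(a[field] - b[field]) / rng       # range-scaled numeric gap
    categorical = [f for f in a if f not in numeric_ranges]
    for field in categorical:
        total += 0.0 if a[field] == b[field] else 1.0  # simple mismatch
    return total / (len(numeric_ranges) + len(categorical))

# Hypothetical patient records mixing numeric and categorical fields.
p1 = {"age": 40, "bmi": 24.0, "smoker": "no",  "sex": "F"}
p2 = {"age": 60, "bmi": 30.0, "smoker": "yes", "sex": "F"}
ranges = {"age": 80, "bmi": 30}    # assumed ranges used for scaling

print(gower_distance(p1, p2, ranges))
```

Such a dissimilarity can then feed any distance-based clustering method, sidestepping the need for a single uniform variable type.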
The healthcare field introduces a range of unique characteristics, challenges, and potential transformative benefits. For example, Singh, Bhatia, and Bhatia (2017) stated that big data technologies provide rapid solutions to a huge range of biomedical problems; prediction-based analysis can deliver insights faster than traditional reporting by the World Health Organisation (W.H.O.) and the Centre for Disease Control (C.D.C.). Applying big data analysis to patient care datasets, which include physicians' notes, lab reports, X-ray reports, case histories, diet regimes, the doctors and nurses in a particular hospital, data from national health registers, and the identification of expiry dates of medicines and surgical instruments based on RFID data, can yield better insight into care coordination, health management, and patient engagement.
There are various strategies that management could implement to be more efficient in gathering, collecting, storing, and analyzing data in healthcare. For example, the Internet of Things (IoT) can be seen as enabling user-specific, interoperable communication services by integrating sensing and actuation technologies with other leading technologies such as telecommunications, networking, security, and cloud computing. Advancements in IoT technologies, bioengineering, and cloud computing have opened new frontiers in critical healthcare systems. Mobile health (m-Health) systems are gaining wide popularity because they enable patient-centric, personalized healthcare services. M-Health systems provide a ubiquitous Internet interface through which medical personnel, or any authenticated and authorized end-user, can remotely monitor health parameters. Despite all security enhancements in data collection and communication technologies, m-Health systems remain highly vulnerable to malicious hacker attacks. A healthcare system based on IoT can store its data in the cloud, and the system can notify the end-user if any critical situation occurs.
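The notification behavior described above, a system raising an alert when a monitored reading turns critical, can be sketched in a few lines. This is a minimal illustration; the vital-sign thresholds, function names, and the notify hook are assumptions, not part of any system cited in the text.

```python
# Threshold-based alerting on a monitored vital sign.
HEART_RATE_RANGE = (40, 140)   # bpm; assumed safe range for illustration

def check_reading(patient_id, heart_rate, notify):
    """Call notify() and return True if a reading is outside the safe range."""
    low, high = HEART_RATE_RANGE
    if not (low <= heart_rate <= high):
        notify(f"ALERT patient {patient_id}: heart rate {heart_rate} bpm")
        return True
    return False

alerts = []
check_reading("p-001", 72, alerts.append)    # normal reading, no alert
check_reading("p-001", 180, alerts.append)   # critical reading, alert raised
print(alerts)
```

In a real IoT deployment the `notify` hook would push to a cloud message queue or a clinician's mobile device rather than a local list.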
A second possible strategy or model that can be implemented to collect healthcare data globally is the Common Data Model (C.D.M.), which stores data in a fixed generic format instead of a source-specific format. As stated by Khan, Kothari, Kuchekar, and Koshy (2018), the C.D.M. should be independent of the structure of its source data files, because analysis tools are highly dependent on the data on which they operate. The problem of integrating multiple data sources into one warehouse is a long-standing one, and there have been several research advances in building standard models to store data about a particular department or institution (Khan, Kothari, Kuchekar, and Koshy, 2018). For example, the Integrated Aircraft Health Management (IAHM) program successfully developed a standard data model specifically for aircraft vehicle health management. The proposed system collected flight data, maintenance log information, and test stand runs and integrated them into the designed C.D.M. to support the development of a wide variety of aircraft health management analysis tools.
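The "fixed generic format" idea behind a C.D.M. can be sketched as a single observation schema into which heterogeneous source records are mapped. The field names and mapping functions below are hypothetical illustrations, not the schema of any specific C.D.M. standard.

```python
# Mapping heterogeneous source records into one generic observation schema.
from dataclasses import dataclass

@dataclass
class Observation:
    patient_id: str
    concept: str      # what was measured, in a shared vocabulary
    value: float
    unit: str
    source: str       # which upstream system supplied the record

def from_lab_report(row):
    """Map a lab-system row (source-specific format) into the generic schema."""
    return Observation(row["pid"], row["test_name"],
                       row["result"], row["units"], source="lab")

def from_device(row):
    """Map a monitoring-device row into the same generic schema."""
    return Observation(row["patient"], row["sensor"],
                       row["reading"], row["unit"], source="device")

obs = [
    from_lab_report({"pid": "p1", "test_name": "glucose",
                     "result": 5.4, "units": "mmol/L"}),
    from_device({"patient": "p1", "sensor": "heart_rate",
                 "reading": 72.0, "unit": "bpm"}),
]
# Analysis tools can now operate on one schema, independent of the sources.
print(len(obs), obs[0].concept)
```

The point the source makes is visible here: only the per-source mapping functions know the source formats; everything downstream sees one structure.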
A third possible strategy is a basic model for assessing primary health care E.M.R. data quality, comprising a set of data quality measures within four domains. The model is offered as a starting point from which data users can refine their approach based on their needs. The data a primary health care practitioner requires for the care of a patient can differ from what is needed for other purposes, such as research, and the overall assessment of data quality can vary with the intended use, a characteristic of data quality captured by the concept of "fitness for purpose." As stated by Terry, Stewart, Cejic, Marshall, de Lusignan, Chesworth, and Thind (2019), the four basic tasks that the basic model of primary health care E.M.R. data quality should complete are: (a) conceptualizing data quality domains, (b) developing data quality measures, (c) operationalizing the data quality measures, and (d) testing the data quality measures. The challenge with this model is that not all measures can be applied to all datasets, which itself illustrates the variability in data quality; even so, the model is a step toward a standard set of data quality measures.
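As one example of what operationalizing a data quality measure (task (c) above) might look like, a completeness check computes the share of records in which a field is present and non-empty. The records and field names below are hypothetical, and completeness is only one of the many measures such a model would include.

```python
# Field-level completeness: fraction of records with a usable value.
def completeness(records, field):
    """Share of records in which `field` is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

# Hypothetical E.M.R. extracts with uneven field coverage.
records = [
    {"dob": "1970-01-01", "smoking_status": "never"},
    {"dob": "1985-06-12", "smoking_status": ""},
    {"dob": "1990-03-30"},
]
print(completeness(records, "dob"))             # 1.0
print(completeness(records, "smoking_status"))  # 1/3: one empty, one missing
```

A measure like this also exposes the model's stated limitation: a dataset that simply lacks a field cannot be scored on it, illustrating the variability across datasets.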
Conclusion
Big data in healthcare often involve data collected from multiple sources or platforms. These sources may include clinical decision support systems (medical imaging, laboratory, pharmacy, immunological and genomic marker expressions, and other health administrative data), hospital emergency department records, Medicare, health insurance records, and social media posts. Complex correlation structures and a possible mix of variable types make such big data difficult to analyze with classical cluster analytic approaches. Many inferences and much information can nonetheless be obtained from these data via analysis, in a field of data analytics that is rapidly blooming. The challenge is that even though data is available, it is scattered and segregated by domain. For example, patient data found in the electronic medical record (E.M.R.) could, as stated by Sheeran and Steele (2017), be helpful for analysis of a patient's possible diagnosis, and in the future the E.M.R. will increasingly include other sensor-captured data.
References
Burns, L. R., Bradley, E. H., Weiner, B. J., & Shortell, S. M. (2012). Shortell and Kaluzny’s health care management: Organization, design, and behavior. Clifton Park, NY: Delmar/Cengage Learning.
Burton, D.A. (2013). Population health management: Implementing a strategy for success. Retrieved from: https://healthcatalyst.com/wp-content/uploads/2013/07/WhitePaper PopulationHealthManagement.pdf
Dall, T.M., Gallo, P.D., Chakrabarti, R., West, T., Semilla, A.P. and Storm, M.V. (2013). An aging population and a growing disease burden will require a large and specialized healthcare workforce by 2025. Health Affairs, 32(11), 2013-2020. Retrieved from: http://content.healthaffairs.org/content/32/11/2013
Gandomi, A. and Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International journal of information management. 35, 137-144. Retrieved from: https://doi-org.ezp.waldenulibrary.org/10.1016/j.ijinfomgt.2014.10.007
Gartner, IT Glossary (n.d.). Retrieved from: http://www.gartner.com/it-glossary/big-data
Hibbard, J.H. and Greene, J. (2013). What evidence shows about patient activation: Better health outcomes and care experiences; fewer data on costs. Health Affairs, 32(2), 201-214. Retrieved from: http://content.healthaffairs.org/content/32/2/207
Hughes, J. (2004). U.S. health care system. In M. J. Stahl (Ed.), Encyclopedia of health care management (pp. 574-576). Thousand Oaks, CA: SAGE Publications, Inc. doi: 10.4135/9781412950602.n828
Kaufman, D. (2004). Health care services. In M. J. Stahl (Ed.), Encyclopedia of health care management (pp. 253-253). Thousand Oaks, CA: SAGE Publications, Inc. doi: 10.4135/9781412950602.n364
Khan, M. S., Guan, B. Y., Audimulam, J., Liceras, F. C., Coker, R. J., Joanne, Y., & ... Yoong, J. (2016). Economic interventions to improve population health: a scoping study of systematic reviews. B.M.C. Public Health, 16(528), 1–9. doi:10.1186/s12889-016-3119-5
Mack, M. (2016). What drives rising healthcare costs? Government Finance Review, (4), 26. Retrieved from https://search-ebscohost-com.ezp.waldenulibrary.org/login.aspx?direct=true&db=edsgea&AN=edsgcl.462236742&site=eds-live&scope=site
Moreno-Serra, R., & Smith, P. C. (2012). Does progress towards universal health coverage improve population health? The Lancet, 380(9845), 917–923. doi:10.1016/S0140-6736(12)61039-3
Raghupathi, W., and Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(1), 1-10. Retrieved from: https://link.springer.com/article/10.1186/2047-2501-2-3.
Singh, M., Bhatia, V., and Bhatia, M. (2017). Big data analytics: Solution to healthcare. 2017 International Conference on Intelligent Communication and Computational Techniques (ICCT), 239. Retrieved from: https://doi-org.ezp.waldenulibrary.org/10.1109/INTELCCT.2017.8324052
Sheeran, M. and Steele, R. (2017). A framework for big data technology in health and healthcare. 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), 401. Retrieved from: https://doi-org.ezp.waldenulibrary.org/10.1109/UEMCON.2017.8249095