Science Policy For All

Because science policy affects everyone.

The Debut of Health Care Data Science

By: Fabrício Kury, M.D.

Image source: MedCityNews.com

It is easy for a millennial (a person born between the mid-1980s and the late 1990s) to be unaware of just how young the current methods of health care research really are. Randomized controlled trials (RCTs), dating only from the late 1940s, are probably younger than most millennials’ grandparents. Case-control methodology and Kaplan-Meier curves originated only in the 1950s, while meta-analyses were accepted by medical researchers only in the late 1970s. Step into the 1980s, and early millennials are as old as, if not older than, propensity scores and the concept that is today called cost-effectiveness research. The term “Evidence-Based Medicine” is as young as a millennial born in the early 1990s, while the late 1990s and 2000s saw the explosion of genomics, proteomics, metabolomics, and other -omics research. Finally, the 2010s so far might be credited as the decade in which the term Data Science (“the fourth paradigm of science”) gained widespread prominence and established its modern meaning: long story short, the practice of producing knowledge out of data that had been created for other purposes.

While the second half of the 20th century transformed health care research into an ever more rigorous and technology-driven science, it also set off an unrelenting rise in U.S. health care costs, from a comfortable 5% of Gross Domestic Product (GDP) in 1960 to a crushing 18% in 2015. Medical bills have become the leading cause of personal bankruptcy in the nation, while life expectancy and other basic health indicators depicted a country nowhere close to getting as much value for each dollar as other developed nations. In 2009, the Obama administration prescribed for the health care sector a remedy that had brought efficiency and cost savings to every industry it had previously touched: information technology. The Health Information Technology for Economic and Clinical Health (HITECH) Act, part of the American Recovery and Reinvestment Act of 2009, handed out as much as $36.5 billion of taxpayers’ money to hospitals and physician practices so that they would buy and “meaningfully use” electronic health records (EHRs). This outpouring of money was overseen by the Office of the National Coordinator for Health Information Technology (ONC), which had existed since 2004 by presidential Executive Order but received a statutory mandate through HITECH. The act rapidly moved the country from mostly paper-based health care in 2008 to near-universal EHR adoption by 2015, giving electronic life, and potential reuse for research, to streams of health data previously dormant in paper troves.

Moreover, in March 2010 the Patient Protection and Affordable Care Act (PPACA, a.k.a. “Obamacare”) was signed into law and, among many other interventions, secured a few hundred million dollars for the creation of the Patient-Centered Outcomes Research Institute (PCORI). PCORI’s mission is to fund research that responds directly to patients’ real-life concerns. To that end, one of PCORI’s first initiatives was the creation of PCORnet, a network of institutions capable of providing electronic health data for research. Most recently, in January 2015, President Obama announced the Precision Medicine Initiative (PMI). The PMI seeks to assemble a nationwide, representative cohort of 1 million individuals, from whom a wealth of health data will be collected with no predefined goal other than to serve as a multipurpose, high-quality dataset for observational electronic research. Meanwhile, other initiatives such as Informatics for Integrating Biology and the Bedside (i2b2) and Observational Health Data Sciences and Informatics (OHDSI) have also set out to access and do research on health care’s big data, and their publications can be easily found in PubMed.

These initiatives depict a political and societal hope (or hype?) that information technology, among its other roles in health care as a whole, can make health care research faster, broader, more transparent, more reproducible, and perhaps also closer to the everyday lives of people. One premise is that by using existing EHRs for research, instead of data collected on demand for a particular study, the researcher gets closer to the “real world” individuals who ultimately receive the treatments and conclusions produced by the study. In traditional clinical trials and other studies, the patients who participate are highly selected and often remarkably unrepresentative of the general population. EHR-based research also has the potential to include more individuals than any previous method could possibly attempt. This broader reach makes rare conditions (or combinations of conditions) common enough to be studied readily, and allows subtler variations in disease to become detectable. On top of that, these studies can be done at the speed of thought. Indeed, EHR-based clinical research was recently published in the Proceedings of the National Academy of Sciences (PNAS), demonstrating that it is feasible at an international scale of hundreds of millions of patients within a breathtakingly short time span. Altogether, one can sense in this picture that the billions of dollars spent on HITECH, PCORnet, the PMI, and the NIH’s Data Science research grants might not have been just unfounded hype.

The relationship between IT and health care, however, has a rather long history of frustrated expectations. In 1968, for example, Dr. Laurence Weed, the father of today’s prevailing paradigm of patient notes, predicted that in the future all text narratives in electronic health records would be entered in a structured form that enables scientific analysis. Today, to say the least, we have become less confident about whether such a change is feasible or even desirable to begin with. In 1987, Barnett and colleagues believed that “relatively simple computational models” could be used to construct “an effective [diagnostic] assistant to the physician in daily practice” and distributed nationwide, but such an assistant has yet to arrive at your physician’s office downtown (although, in fairness, it might be around the corner). While presently teeming with excitement and blessed with incentives, the journey of IT into health care and health care research remains one of uncertainty and risk. Health information technology has been accused of provoking life-threatening medical errors, as well as of harming the patient-physician relationship and the quality of care, an accusation also leveled at earlier technological breakthroughs in the history of medicine, including the stethoscope. Early this year, the editors of the New England Journal of Medicine went so far as to state that data scientists are regarded by some clinical researchers as “research parasites.”

Moreover, the Federal Bureau of Investigation has reported that medical information can sell on the black market for 10 times more than a credit card number, while cybersecurity experts have been alarmed by the extreme vulnerability of current U.S. health care facilities. This gives reasonable grounds for concern about patient privacy violations and identity theft now that health records have moved from paper into computers. Unlike a credit card, your medical and identity information cannot be cancelled over the phone and replaced with new ones. Patient matching, i.e. techniques for recognizing that data produced at separate sites refer to the same person, often meets blunt opposition from the public, while the ultimate ideal of a National Patient Identifier, although originally called for by HIPAA in 1996, has been blocked since 1999 by a provision in annual federal appropriations laws that bars the Department of Health and Human Services from funding its development. Yet such a seamless flow of interoperable health data between providers is the very first recommendation the Institute of Medicine made in 2012 for realizing the Learning Health Care System: one that revolves around the patient and in which scientific discovery is a natural outgrowth of patient care.

With or without attaining the ideal of a Learning Health Care System, the U.S. health care system will be transformed sooner or later, by intervention or on its own, because the share of GDP spent on health care can only keep increasing for so long. Information technology is at the very least a sensible “bet” for improving efficiency; however, the power of IT to improve efficiency lies not in greasing the wheels of existing paradigms, but in outclassing them with novel ones. This might partly explain the resistance to IT, although there is also some evidence that IT can sometimes do more harm than good in health care, and here “harm” can mean harm to patients. The cold truth is that, in spite of decades of scientific interest in using computers for health care, only very recently has the health care industry become computerized, so health care informatics is still not far from its infancy. Nevertheless, in 2011 the American Board of Medical Specialties unanimously approved Clinical Informatics as a board-certified physician subspecialty, signaling that the medical community sees IT as a permanent and complex part of health care. Similarly, in late 2013 the NIH appointed its first Associate Director for Data Science, also signaling that this novel field holds importance for health care research. Finally, there might be little that can be done with the entire -omics enterprise, with its thousands upon thousands of measurements multiplied by millions of patients, that does not require data-scientific techniques.

The first cars were slower than horses, and today’s high-speed, road-only automobiles became feasible only after the country was dependably covered with a network of roads and freeways. That network was built not by the automobile makers, but by the government, upon recognizing that it would constitute a public good. The same principle may well apply to health care IT’s pressing issues of privacy, security and interoperability, with the added complication that it is easy for an EHR vendor to design a solution but then block its users from connecting their system to software from competing companies. Now that health care records are electronic, we need the government to step in once again and build or coordinate the dependable freeways of health care data and IT standards, which will also constitute a public good and unlock the technology’s fundamental potential. Health care, on top of its humanitarian dimension, is fundamentally data- and information-intensive, so it is reasonable to conjecture that information technology can be important, even revolutionary, for health care. It took one hundred years for Einstein’s gravitational waves to evolve from a conjecture based on theoretical foundations to an experimentally demonstrated fact. Perhaps in the future (let us hope not a century from today!) some data-scientific methods, such as Artificial Neural Networks, Support Vector Machines, Naïve Bayes classifiers, and Decision Trees, in the hands of millennials will stand the test of time and earn a place in the standard vocabulary of medical research, just as, in their generations, meta-analyses, case-control studies, Kaplan-Meier curves, propensity scores, and the big grandpa of them all, the randomized controlled trial, were accepted.

Written by sciencepolicyforall

July 13, 2016 at 11:15 am
