News|Articles|May 9, 2024

MHE May 2024
Volume 34
Issue 5

As Data Pile Up, Some Healthcare Organizations Turn To ‘Lakehouses’

Healthcare organizations generate and store vast amounts of data, but they often have to choose between speed and structure. Data "lakehouses" may be the answer.

Cloud computing offers a number of advantages, but it would be a mistake to assume that it automatically leads to lower costs, says Nick Stepro, the chief product and technology officer at Arcadia, a healthcare IT firm in Boston.

“Those that have gone to the cloud have probably learned that of all of the things the cloud has made easy, it has made spending money easier than anything,” Stepro says. “You can basically blink and wake up to a seven-digit cloud bill like that.”

A 2022 report by the data security firm Netwrix found that 73% of healthcare organizations store sensitive data in the cloud. Respondents said that within 12 to 18 months, they expected more than half of their workload would be performed in the cloud.

Ironically, the most common reason respondents said they were moving to the cloud was to cut costs. Stepro says high cloud costs are one reason it is important to use the cloud efficiently. He says making cloud computing cost effective requires thinking not just about where an organization stores its data, but also how it stores the data. That is why some healthcare organizations are turning to a new approach to managing data: the data lakehouse.

Why a lakehouse?

If the coinage “data lakehouse” seems odd, that’s because the concept is actually a combination of two other methods of collecting and storing large amounts of data.

The first — and older — method is called data warehousing. Just like a warehouse for an online retailer, everything that goes into a data warehouse needs to be carefully structured and organized right from the start so that it is easily trackable and accessible from the moment it arrives in the warehouse.

“It’s incredibly reliable,” Stepro says. “It makes sure that your data always writes when you ask it to write, reliably, and there’s no drift in data.”

However, all of that structure comes with a trade-off. “It’s really, really hard to run fast,” he explains. “And it’s really, really hard to take advantage of cloud-native compute and cloud-native storage.”

Those problems led to the creation of a different approach, the “data lake.” Stepro says this model started to gain traction about a decade ago. If data warehouses are akin to highly organized corporate warehouses, data lakes look more like your 9-year-old’s bedroom.

“It’s just like throwing your data over the wall,” he says. “We don’t really care. Data in any source in any format, any structure — throw it into the lake and we can apply really, really high-scale cloud compute on top of that to do all sorts of stuff.”

In other words, the emphasis is on collecting the data as quickly as possible, but at the expense of strict, front-end organization and quality control.
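The contrast between the two models can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the schema, the field names, and the `warehouse_write` and `lake_write` helpers are all hypothetical. The warehouse-style write checks structure before anything is stored, while the lake-style write stores the raw payload and defers all checks.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
PATIENT_VISIT_SCHEMA = {"patient_id": str, "visit_date": str, "charge_usd": float}


def warehouse_write(table, record, schema):
    """Warehouse-style write: reject anything that does not match the schema."""
    if set(record) != set(schema):
        mismatched = sorted(set(record) ^ set(schema))
        raise ValueError(f"unexpected or missing fields: {mismatched}")
    for field, expected_type in schema.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    table.append(record)


def lake_write(lake, payload, source):
    """Lake-style write: keep the raw payload as-is and note only its source."""
    lake.append({"source": source, "raw": payload})


warehouse, lake = [], []

# Made-up example records.
good = {"patient_id": "p-001", "visit_date": "2024-05-09", "charge_usd": 120.0}
odd = '{"site": "plant-7", "copies_per_l": 48000}'  # a shape nobody planned for

warehouse_write(warehouse, good, PATIENT_VISIT_SCHEMA)
lake_write(lake, json.dumps(good), source="billing")
lake_write(lake, odd, source="wastewater")  # accepted without any checks

try:
    warehouse_write(warehouse, json.loads(odd), PATIENT_VISIT_SCHEMA)
    rejected = False
except ValueError:
    rejected = True  # the warehouse refuses data that does not fit its structure
```

The lake ends up holding both payloads, including the one the warehouse refused, which is the speed-for-structure trade the two models represent.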

“And the challenge with that is that there’s basically no governance on it whatsoever,” Stepro says. “Your data can’t be reliably traced across schema changes, and it takes a lot of heavy processing people on top of that to govern that.”

If an organization is not careful, its data lake can turn into a swamp, full of duplicated, inaccurate or incomplete data.

Best of both worlds

The idea behind data lakehouses is to combine the strengths of data warehouses and data lakes while trying to minimize their downsides. That means leveraging the speed and lower cost of the data lake approach but also applying tier-based governance and technology in order to provide structure and usability.

“So you can still run really, really, really fast, you can still create massive, thousand-node compute layers on top of it, but you don’t turn it into a swamp, which is what happens to a lot of lakes,” says Stepro.

Arcadia markets a data lakehouse product designed to meet the needs of the healthcare industry, which poses some challenges because it is highly regulated and generates a wide array of data types. Other companies are selling data lakehouse products. Databricks, a data and artificial intelligence (AI) company in San Francisco, launched a healthcare-focused data lakehouse platform in 2022. Companies such as eClinical are promoting data lakehouses as tools to optimize digital clinical trials.

Among the healthcare organizations adopting data lakehouse infrastructures is Umpqua Health, an Oregon-based coordinated care organization. Juliana Landry, M.P.H., Umpqua’s vice president of health systems performance, said in a press release that she believes the lakehouse approach will ensure “the rapid delivery of updated data to our care teams, significantly improving our processes.

“From optimizing member outreach opportunities to accelerating care program enrollments and improving care coordination efficiency, this rapid data refresh means fresher insights and more informed decision-making across the enterprise and improves our ability to deliver better outcomes,” she added.

Stepro noted that healthcare organizations have everything ranging from very mundane data like physician directories and work hour logs to highly technical clinical trial data and highly personal health records.

A tiered approach

One way Arcadia’s lakehouse structure deals with these diverse needs is through a tiered “medallion” system. Under that system, data can enter the lakehouse as raw, “untransformed” data — the “bronze” tier. Over time, the data can be structured into a more useful product while the system applies tier-appropriate controls to govern access.

“As they are progressing through our system, every piece of data is tagged with record-level security that is temporally applied,” Stepro says. “And so what we’re able to do is to give access to big data technologies and data stores but do that in such a way where it respects the privacy of individuals and locks down different end-users from being able to access more than they should.”
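One way to picture record-level security that is "temporally applied" is a tag on each record naming who may read it and during what period. The sketch below is only an assumption about what such tagging could look like; the article does not describe Arcadia's implementation, and the `AccessTag`, `Record`, and `visible_records` names, the roles, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessTag:
    """Hypothetical tag: one role may read the record within a date window."""
    role: str
    valid_from: date
    valid_to: date


@dataclass(frozen=True)
class Record:
    record_id: str
    payload: dict
    tags: tuple  # tuple of AccessTag


def visible_records(records, role, on_day):
    """Return only the records this role is allowed to read on the given day."""
    return [
        r for r in records
        if any(
            t.role == role and t.valid_from <= on_day <= t.valid_to
            for t in r.tags
        )
    ]


# Made-up records for illustration.
records = [
    Record(
        "r1",
        {"note": "care plan"},
        (AccessTag("care_team", date(2024, 1, 1), date(2024, 6, 30)),),
    ),
    Record(
        "r2",
        {"note": "directory entry"},
        (
            AccessTag("care_team", date(2024, 1, 1), date(2024, 12, 31)),
            AccessTag("analyst", date(2024, 1, 1), date(2024, 12, 31)),
        ),
    ),
]

in_window = visible_records(records, "care_team", date(2024, 5, 9))
after_window = visible_records(records, "care_team", date(2024, 9, 1))
analyst_view = visible_records(records, "analyst", date(2024, 5, 9))
```

Because the filter runs per record rather than per table, the same underlying store can serve different users with different views, and access can lapse on its own once a window closes.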

Stepro gave an example of data from wastewater surveillance. Such data can be used to track the spread of illnesses such as COVID-19 through a population. The potential value of the data is high, yet it does not conform to any of the normal data structures that healthcare organizations are used to. “In the prior mindset, the work to negotiate that data into their existing structures and data strategy can be a monthlong exercise,” he says.

The lakehouse approach is to get the data into the database as soon as possible and worry about making it actionable later with the help of machine learning (ML) and other tools.

“We have this data, let it be loosely structured, throw it in the system, apply ML to traverse that dataset and do some initial insights in a relatively unstructured fashion,” he says, “and then you can graduate that data up the silver and gold chain of that modality and data model for more production, analytics assets.”
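The bronze, silver, and gold progression can be sketched as a small pipeline. The tier names come from the article; what happens at each tier here (parsing and deduplicating for silver, a weekly average for gold) is an assumption chosen for illustration, and the wastewater readings, field names, and the `to_silver` and `to_gold` helpers are made up.

```python
import json
from collections import defaultdict

# Bronze: raw payloads stored exactly as they arrived (made-up examples).
bronze = [
    '{"site": "plant-7", "week": "2024-W18", "copies_per_l": 48000}',
    '{"site": "plant-7", "week": "2024-W18", "copies_per_l": 48000}',  # duplicate
    '{"site": "plant-2", "week": "2024-W18", "copies_per_l": "n/a"}',  # unusable
    '{"site": "plant-2", "week": "2024-W19", "copies_per_l": 30500}',
    "not json at all",
]


def to_silver(raw_rows):
    """Silver: keep rows that parse, carry a numeric reading, and are not repeats."""
    seen, silver = set(), []
    for raw in raw_rows:
        try:
            row = json.loads(raw)
        except json.JSONDecodeError:
            continue  # leave unparseable rows behind in bronze
        if not isinstance(row, dict):
            continue
        value = row.get("copies_per_l")
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        key = (row.get("site"), row.get("week"))
        if key in seen:
            continue  # drop duplicate site/week readings
        seen.add(key)
        silver.append(
            {"site": row["site"], "week": row["week"], "copies_per_l": float(value)}
        )
    return silver


def to_gold(silver_rows):
    """Gold: an analytics-ready summary, here the mean reading per week."""
    by_week = defaultdict(list)
    for row in silver_rows:
        by_week[row["week"]].append(row["copies_per_l"])
    return {week: sum(vals) / len(vals) for week, vals in sorted(by_week.items())}


silver = to_silver(bronze)
gold = to_gold(silver)
```

Nothing is deleted from the bronze list along the way, so rows that fail today's rules remain available if the rules change later.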

Lakehouses and artificial intelligence

Though one application of AI and ML is to make sense out of unstructured data, another role is to analyze that data in order to develop new insights that could improve patient care. Stepro says the lakehouse architecture can enable both functions.

“You can start leveraging this huge morass of unstructured data in which there’s a tremendous amount of signal,” he says, “but it needs to be married with appropriate context for these models to appropriately take advantage of it.”

After all, the “garbage in, garbage out” maxim is a major concern in the AI era. Flawed or incomplete data will lead to faulty insights, such as biased results that do not adequately account for different subgroups or scenarios. Stepro says the ideal situation is to be able to leverage a sufficient amount of data in a strategic way, one that does not require running up a sky-high cloud bill.

“Everyone right now is using a sledgehammer like GPT-4 on every problem,” he says, “but maturity will mean using finer-tuned, smaller models for very specific problems.”
