Preparing Life Sciences Data for AI Applications

Artificial intelligence (AI) has the potential to fundamentally change healthcare and the life sciences that underpin it. In some applications, digital twins and computer vision are already matching or outperforming humans at predicting disease and acute medical events, diagnosing existing diseases and injuries, and designing potential treatments. Two substantial roadblocks stand in the way of wider application of AI in healthcare: regulatory red tape and the state of the data needed to train AI models.

Laboratory informatics professionals have little control over the regulatory environment. They can, however, prepare their data now for potential use in AI models, if and when that environment changes.

Current Uses for and Issues with AI in the Life Sciences

The U.S. Food and Drug Administration (FDA) regulates AI software intended for medical purposes as a medical device under its software as a medical device (SaMD) framework. As part of its approach to ensuring the safety and effectiveness of SaMD, the FDA has published Good Machine Learning Practice for Medical Device Development: Guiding Principles. In response, the Association of Food and Drug Officials and the Regulatory Affairs Professionals Society formed a work team to establish good machine learning practices (GMLP) that increase effectiveness and decrease patient risk.

This complex regulatory environment limits applications of AI in the life sciences today. The current regulatory framework was developed for traditional medical devices and pharmaceuticals, and its rigorous preclinical and clinical trial requirements were not designed for predictive modeling or digital twins, the new paradigms in SaMD. Put simply, this mismatch between regulatory expectations and how these tools actually work has throttled innovation in software applications at both the patient-facing end of the pipeline and the research and development end.

The other issue facing AI applications in drug design is the available data. Machine learning requires high-quality data, and life sciences data in particular must be as free from bias as possible. Increasing interconnection and automation in laboratories are generating exponentially more data, but access to existing paper-based or siloed data, often called dark data, remains limited. Unlocking this dark data will also be necessary to increase the effectiveness of AI and machine learning (ML) modeling.

Data Types and Quality for AI in Life Sciences

Despite these data access issues, the life sciences are embracing open science, a trend that will contribute more data to the sources that AI and ML leverage. Many existing data sources in the life sciences industry could benefit from the application of AI or ML today. These data include

To properly draw insights from these data, AI models need high-quality, standardized, and FAIR data. Getting there brings us to challenges in data preparation.

Preparing Data for AI Applications

First, you need a strategy for data collection and a working definition of high-quality data. Fortunately, if you work in a regulated environment and have been through a validation or two, your data already carries many quality markers. The ALCOA+ framework for data integrity (Attributable, Legible, Contemporaneous, Original, Accurate, Complete, Consistent, Enduring, and Available) already ensures a reasonable level of quality. From there, it isn’t much of a stretch to Findable, Accessible, Interoperable, and Reusable (FAIR) data. So, much of the data generated in regulated life sciences environments already meets the definition of high quality.
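As a rough illustration of how some ALCOA+ attributes can be checked mechanically, the Python sketch below flags records that cannot demonstrate a few of them. The `LabRecord` fields, the `alcoa_gaps` function, and the 24-hour threshold for "contemporaneous" are all hypothetical; attributes such as Accurate or Consistent need domain review and are not covered here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LabRecord:
    # Hypothetical record layout; real LIMS or ELN schemas will differ.
    record_id: str
    value: Optional[float]
    unit: Optional[str]
    recorded_by: Optional[str]        # who entered the result (Attributable)
    recorded_at: Optional[datetime]   # when it was entered (Contemporaneous)
    measured_at: Optional[datetime]   # when the measurement happened
    source_system: Optional[str]      # instrument or system of origin (Original)


def alcoa_gaps(rec: LabRecord, max_delay_hours: float = 24.0) -> list[str]:
    """Return the ALCOA+ attributes this record cannot demonstrate.

    Only a few attributes are checked; the delay threshold is an assumption.
    """
    gaps: list[str] = []
    if not rec.recorded_by:
        gaps.append("Attributable")
    if rec.recorded_at is None or rec.measured_at is None:
        gaps.append("Contemporaneous")
    elif (rec.recorded_at - rec.measured_at).total_seconds() > max_delay_hours * 3600:
        gaps.append("Contemporaneous")  # entered too long after the measurement
    if not rec.source_system:
        gaps.append("Original")
    if rec.value is None or not rec.unit:
        gaps.append("Complete")
    return gaps


if __name__ == "__main__":
    rec = LabRecord(
        record_id="S-001", value=5.4, unit="mmol/L", recorded_by="jdoe",
        recorded_at=datetime(2024, 5, 1, 10, 5, tzinfo=timezone.utc),
        measured_at=datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc),
        source_system="analyzer-03",
    )
    print(alcoa_gaps(rec))  # → []
```

A check like this is most useful as a report over an existing data set: the records it flags are the ones that will need remediation before they can be trusted as training data.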

Your strategy for data collection can be harder to pin down. Begin by identifying what information you need, where it lives today, and the easiest way to bring it together. Before consolidating it in a platform or data lake, clean the data to remove inaccuracies, duplicate records or sources, and inconsistent formats. This is where an ontology becomes important: it provides a standard vocabulary and framework for your data, which supports both model training and interoperability.
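A minimal cleaning pass along these lines might look like the following Python sketch. It assumes hypothetical input rows with `sample_id`, `analyte`, `date`, and `value` fields, a small hand-written synonym table standing in for a real ontology (the `EX:` identifiers are placeholders, not real ontology terms), and a fixed list of date formats. Rows that cannot be mapped are set aside for review rather than guessed at.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical mapping from local labels to ontology term IDs. A real pipeline
# would load these from a curated ontology instead of hard-coding them.
TERM_MAP = {
    "glucose": "EX:0001",
    "glu": "EX:0001",
    "blood glucose": "EX:0001",
    "hba1c": "EX:0002",
    "hemoglobin a1c": "EX:0002",
}

# Date formats assumed to appear in the source systems (slashes = day-first).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")


def parse_date(text: str) -> str | None:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_records(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Map labels to ontology terms, normalize dates, and drop duplicates."""
    cleaned, rejected = [], []
    seen = set()
    for row in rows:
        term = TERM_MAP.get(row["analyte"].strip().lower())
        date = parse_date(row["date"])
        if term is None or date is None:
            rejected.append(row)  # route to a person for review instead of guessing
            continue
        value = float(row["value"])
        key = (row["sample_id"], term, date, value)
        if key in seen:
            continue  # same record after normalization, so skip it
        seen.add(key)
        cleaned.append({"sample_id": row["sample_id"], "term": term,
                        "date": date, "value": value})
    return cleaned, rejected
```

Day-first and month-first dates that use the same separator cannot be told apart automatically, so the format list has to be confirmed for each source system before a pass like this is trusted.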

Ethical Considerations of AI in Life Sciences

Ethical considerations are the elephant in the room when talking about AI. Keeping sensitive data safe becomes far more difficult once it is shared with a cloud-based large language model (LLM). AI and ML models need large data sets to make informed decisions, but some LLMs have been trained on biased data sets. It’s important to know what those biases are and how they could affect AI-based decision-making. The Open Worldwide Application Security Project (OWASP) publishes a Top 10 list of security risks for LLM applications. It’s worth reading if your organization is considering adopting an LLM for any reason.
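One common safeguard before records leave your organization, for example to be sent to a cloud-hosted LLM, is to drop or pseudonymize direct identifiers. The Python sketch below uses a keyed hash (HMAC-SHA256 from the standard library) so that the same patient still maps to the same token across records. The field lists and function names are hypothetical, and keyed hashing alone is not anonymization: the remaining fields may still allow re-identification.

```python
import hashlib
import hmac

# Hypothetical field lists; adjust to your own data and privacy requirements.
PSEUDONYMIZE = {"patient_id", "sample_id"}
DROP = {"name", "date_of_birth"}


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex encoded).

    The same identifier and key always give the same token, so records can
    still be linked, but the token cannot be recomputed without the key.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_external_model(record: dict, key: bytes) -> dict:
    """Return a copy of the record with direct identifiers dropped or hashed."""
    out = {}
    for field, value in record.items():
        if field in DROP:
            continue  # not needed for analysis, so never sent
        if field in PSEUDONYMIZE:
            out[field] = pseudonymize(str(value), key)
        else:
            out[field] = value
    return out
```

The key should be stored separately from the data (for example, in a secrets manager), because anyone holding it can recompute tokens for known identifiers.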

Preparing for Eventual Policy Changes Around AI in Life Sciences

The undeniable advantages of AI and ML applications in the life sciences are beginning to outweigh the drawbacks. As awareness of biases and security risks has grown, so have the remedies to address them. Government agencies are beginning to develop policies for the responsible application of AI in the life sciences. For example, the FDA is actively considering how to apply its regulatory framework to AI in drug manufacturing.

Laboratory informatics will have a central role in the adoption of AI by life sciences organizations, as data is the foundation of any laboratory informatics system. Taking steps early to ensure that your data is clean and FAIR will help your organization stay ahead of regulatory changes and be ready for the full realization of Pharma 4.0. Start preparing your data now.


In what ways could your lab leverage AI today and derive better value from your existing data?
