Top Use Cases for Machine Learning in Pharma

Real-World Use Cases for AI & ML in Pharma

For decades, pharmaceutical data analytics has been a largely manual and tedious task carried out by the commercial research, health outcomes, R&D and clinical study groups at Pharma companies both small and large. With the emergence of machine learning, artificial intelligence and other disruptive innovations, Pharma, like other industries, has begun a slow but sure transition to a more agile, data-driven model: one where in-house research is supplemented by intelligence gathered by applying algorithms across terabytes of physician Rx, patient claims and other related datasets.

The examples in this post highlight some of the work we have done to deliver predictive analytics projects to Pharma companies over the past 3-5 years. Please contact us if you are interested in learning more about our projects and how we can help you identify machine learning opportunities and execute them. Before turning to the use cases, we first look at some of the challenges of applying such analytics to pharmaceutical datasets. The use cases discussed are:

  • Patient Finder (or Rare Disease Patient Finder) using Claims Databases
  • Treatment Pathways & Patient Journey for Health Outcomes
  • Finding Physician Trends for Commercial Market Research
  • Risk-Based Monitoring in Clinical Trials
  • Physician Matching
  • Clinical Studies and R&D
  • Market Mix Modeling (or Promotion Response Modeling)


The Challenge with Machine Learning in the Pharmaceutical domain

One of the primary obstacles to applying machine learning in Pharma has been the relative lack of proven enterprise use cases in the industry. Machine learning excels in fields that rely heavily on quantitative information, but unlike purely quantitative disciplines, Pharma and healthcare require a strong element of human intuition. A high fever accompanied by low blood pressure can be caused by a myriad of factors, and making an accurate diagnosis is as much an art as it is a science. For this reason, while algorithms may be very good at identifying malignant tumours, differentiating a common cold from a more complex condition still requires experience.

McKinsey & Company released their well-known paper, “The big-data revolution in US health care: Accelerating value and innovation”, which spoke to the growing demand for, and necessity of, data science in managing rising healthcare costs and changes in socio-economic factors.

Real-World Applications of Machine Learning and Artificial Intelligence

Despite the aforementioned challenges, the Pharma industry has scored some notable successes in applying algorithms to day-to-day tasks. The team at RxDataScience Inc. has been working with clients for more than 5 years, delivering cutting-edge data science solutions (data mining and predictive intelligence) for some of the toughest challenges in Pharma, using R, Python, Spark MLlib, kdb+ and Jupyter notebooks along with various machine learning libraries. The use cases are very diverse, and while it is not possible to list all of them, some of the more interesting ones are described here.



Patient Finder (or Rare Disease Patient Finder) using Claims Databases

Finding patients in claims databases, such as APLD (Anonymised Patient-Level Data) and Truven MarketScan, can be accomplished by identifying patients who show characteristics similar to those of other patients with the same diagnosis codes. For example, using a cohort of patients who have been confirmed to have diabetes, we can train ML models that can then be applied to other patients to identify potentially undiagnosed cases.

The standard patient identification approach can also be used to find patients with rare diseases, a challenging yet very popular topic. Rare disease patients often remain undiagnosed until it is too late; by using ML, it may be possible to detect the disease earlier in its progression. It is also economically attractive for pharma companies, as rare disease drugs are often extremely expensive and the per-patient revenue can be very significant. We have applied various tree-based models, such as CART (Classification and Regression Trees), C5.0 and random forests, in client projects involving the discovery of rare disease patients using a machine learning based approach.
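
As a rough illustration of the idea, the sketch below trains a one-level decision tree (a "stump", the simplest CART-style split chosen by Gini impurity) on a confirmed cohort and then scores patients without a diagnosis code. It is not the models used in our projects: the feature names, patient IDs and all values are invented for illustration, and a real patient finder would use far more features, deeper trees or ensembles, and proper validation.

```python
from collections import namedtuple

# Hypothetical per-patient features derived from claims; all values are invented.
Patient = namedtuple("Patient", ["patient_id", "features", "diagnosed"])
FEATURE_NAMES = ["glucose_tests_12m", "metformin_fills_12m", "office_visits_12m"]

training = [
    Patient("p01", [4, 3, 6], 1),
    Patient("p02", [5, 2, 8], 1),
    Patient("p03", [3, 4, 5], 1),
    Patient("p04", [0, 0, 2], 0),
    Patient("p05", [1, 0, 7], 0),
    Patient("p06", [0, 0, 1], 0),
    Patient("p07", [1, 0, 9], 0),
    Patient("p08", [6, 1, 4], 1),
]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(patients):
    """Pick the single feature/threshold split with the lowest weighted Gini."""
    best = None
    n = len(patients)
    for j in range(len(patients[0].features)):
        values = sorted({p.features[j] for p in patients})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [p.diagnosed for p in patients if p.features[j] <= thr]
            right = [p.diagnosed for p in patients if p.features[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best["score"]:
                best = {
                    "feature": j,
                    "threshold": thr,
                    "score": score,
                    "p_left": sum(left) / len(left),    # share diagnosed at or below threshold
                    "p_right": sum(right) / len(right),  # share diagnosed above threshold
                }
    return best


def predict_proba(stump, features):
    """Estimated probability of the diagnosis for one patient's feature vector."""
    side = "p_left" if features[stump["feature"]] <= stump["threshold"] else "p_right"
    return stump[side]


stump = fit_stump(training)

# Patients with no diagnosis code on record (hypothetical).
unlabeled = {"u01": [5, 0, 3], "u02": [0, 0, 4], "u03": [3, 1, 2]}
flagged = [pid for pid, feats in unlabeled.items() if predict_proba(stump, feats) >= 0.5]
print(FEATURE_NAMES[stump["feature"]], stump["threshold"], flagged)
```

Flagged patients are candidates for review, not confirmed cases; with rare diseases in particular, the class imbalance means precision needs to be checked carefully before acting on such a list.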

Treatment Pathways & Patient Journey for Health Outcomes

Patient Journey and Treatment Pathways refer to the process of finding how a patient progresses from one disease state to another through multiple lines of therapy. In general, most current work focuses on using historical data to describe these pathways.

However, using clustering and scoring models, machine learning can also help assess which treatments and/or drugs should be recommended for patients based on the historical outcomes and success rates of treatment pathways. Creating treatment pathways often involves computing across temporal (i.e., time-series) data. To make our processes faster, we have employed time-series databases such as kdb+ in conjunction with Python (jupyterq), TensorFlow (Google) and GPU-based hardware to improve processing speed and efficiency.
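
The descriptive first step, turning time-stamped claims into pathways, can be sketched in a few lines of plain Python. This is a simplified illustration rather than our production pipeline (which runs on kdb+): the claims rows are invented, and a "line of therapy" is assumed here to be just a run of consecutive fills of the same drug, whereas real definitions also handle gaps, combinations and switches back.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical claims rows: (patient_id, fill_date, drug). Values are invented.
claims = [
    ("p1", date(2020, 1, 5), "metformin"),
    ("p1", date(2020, 2, 4), "metformin"),
    ("p1", date(2020, 6, 1), "sulfonylurea"),
    ("p1", date(2021, 1, 10), "insulin"),
    ("p2", date(2020, 3, 2), "metformin"),
    ("p2", date(2020, 9, 15), "insulin"),
    ("p3", date(2020, 7, 7), "sulfonylurea"),
    ("p3", date(2020, 1, 20), "metformin"),
    ("p3", date(2021, 2, 1), "insulin"),
]


def build_pathways(rows):
    """Order each patient's fills by date and collapse repeats into lines of therapy."""
    by_patient = defaultdict(list)
    for patient_id, fill_date, drug in rows:
        by_patient[patient_id].append((fill_date, drug))

    pathways = {}
    for patient_id, events in by_patient.items():
        events.sort()  # chronological order
        path = []
        for _, drug in events:
            if not path or path[-1] != drug:  # a new drug starts a new line
                path.append(drug)
        pathways[patient_id] = tuple(path)
    return pathways


pathways = build_pathways(claims)

# How many patients followed each full pathway.
pathway_counts = Counter(pathways.values())

# How often each drug-to-drug transition occurs across all patients.
transition_counts = Counter(
    (a, b) for path in pathways.values() for a, b in zip(path, path[1:])
)

for path, n in pathway_counts.most_common():
    print(" -> ".join(path), n)
```

Pathway and transition counts like these are the raw material that clustering or scoring models would then work on, for example by attaching an outcome measure to each pathway.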


Finding Physician Trends for Commercial Market Research

Using association rule mining (for example, the Apriori algorithm), data scientists can develop models in which the outcome variable is derived from a quantitative value in the Rx records. For instance, given a dataset of physicians’ Rx records in which each physician’s prescribing of a particular medicine is marked as increasing or decreasing on a quarterly or monthly basis, we can use association rules to find previously unknown patterns (e.g., 75% of physicians who accepted Cigna and were located in TX wrote 25% more Rx of Medicine A this month compared to the prior month). While such patterns can arguably also be detected using brute-force methods, Apriori is a much more efficient and principled approach to finding patterns in large-scale data.
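
A minimal sketch of the Apriori idea follows, using only the Python standard library. Each row is one hypothetical physician with a payer, a state and an Rx trend flag; the data is invented and chosen so that the Cigna/TX rule comes out at 75% confidence. The efficiency gain over brute force comes from the pruning step: an itemset is only counted if all of its smaller subsets were already frequent.

```python
from itertools import combinations

# Hypothetical physician records as sets of attribute=value items (invented data).
transactions = [
    frozenset({"payer=Cigna", "state=TX", "trend=up"}),
    frozenset({"payer=Cigna", "state=TX", "trend=up"}),
    frozenset({"payer=Cigna", "state=TX", "trend=up"}),
    frozenset({"payer=Cigna", "state=TX", "trend=down"}),
    frozenset({"payer=Aetna", "state=TX", "trend=down"}),
    frozenset({"payer=Aetna", "state=NY", "trend=up"}),
    frozenset({"payer=Cigna", "state=NY", "trend=down"}),
    frozenset({"payer=Aetna", "state=NY", "trend=down"}),
]


def support(itemset, rows):
    """Fraction of rows that contain every item in the itemset."""
    return sum(1 for r in rows if itemset <= r) / len(rows)


def apriori(rows, min_support):
    """Level-wise search that only extends itemsets already known to be frequent."""
    items = {i for r in rows for i in r}
    level = {frozenset([i]) for i in items if support(frozenset([i]), rows) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, rows)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c, rows) >= min_support
        }
    return frequent


def rules_for(frequent, consequent, min_confidence):
    """Rules of the form antecedent -> consequent with their confidence and lift."""
    base_rate = frequent[frozenset([consequent])]
    found = []
    for itemset, sup in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - {consequent}
            confidence = sup / frequent[antecedent]
            if confidence >= min_confidence:
                found.append((antecedent, confidence, confidence / base_rate))
    return found


frequent = apriori(transactions, min_support=0.3)
found = rules_for(frequent, "trend=up", min_confidence=0.7)
for antecedent, confidence, lift in found:
    print(sorted(antecedent), "-> trend=up", confidence, lift)
```

Confidence alone can mislead when the outcome is common anyway, which is why the sketch also reports lift (confidence divided by the overall rate of the outcome); a lift well above 1 suggests the pattern is more than the baseline.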

Risk-Based Monitoring in Clinical Trials

Risk-Based Monitoring (RBM) allows clinical trial organisations (e.g., CROs such as IQVIA (Quintiles)) to gather critical patient and subject information in real time and react proactively to prevent adverse events before they occur. The use of machine learning with sensors and connected devices for EDC (Electronic Data Capture), such as devices for ECG, actigraphy, oximetry and others, has been made possible largely by the capabilities of consumer products such as the Apple Watch and iOS/Android mobile devices. Data from such sensors can be transmitted in real time to a mobile device, which can then apply machine learning to detect unusual changes or anomalies in vital signs and sensor measurements. An illustrative example is the application of machine learning to inertial sensors along with blood pressure monitors. A sudden and abrupt change in a patient’s position coupled with an elevated blood pressure level can immediately trigger an alert if the algorithm has been trained to recognize similar events that can lead to adverse outcomes.
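
The inertial-sensor and blood-pressure example can be approximated with a very simple rule, sketched below. Note the substitution: instead of a trained model, it uses a trailing-window z-score as a stand-in anomaly detector, and raises an alert only when both signals are anomalous at the same time step. The readings, window size and threshold are all invented for illustration.

```python
from statistics import mean, stdev


def zscore_alerts(readings, window=5, threshold=3.0):
    """Indices where a reading deviates strongly from its trailing window."""
    alerts = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(readings[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts


# Hypothetical time-aligned sensor streams (one value per time step).
systolic = [118, 120, 119, 121, 120, 119, 122, 120, 165, 121]   # mmHg
accel_g = [1.0, 1.01, 0.99, 1.0, 1.02, 1.0, 0.98, 1.01, 2.9, 1.0]  # acceleration magnitude

bp_alerts = zscore_alerts(systolic)
motion_alerts = zscore_alerts(accel_g)

# Alert only when an abrupt movement coincides with a blood pressure spike.
combined = sorted(set(bp_alerts) & set(motion_alerts))
print(bp_alerts, motion_alerts, combined)
```

Requiring both signals to agree cuts down on false alarms from either sensor alone. A trained classifier, as described above, would replace the fixed threshold with patterns learned from labelled adverse events.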

Physician Matching

Pharma data vendors often use their own physician IDs across their datasets. Unless the data records carry an NPI (National Provider Identifier), the absence of a common physician key makes it extremely challenging to join disparate data sources. Data disambiguation techniques can be used to correlate physician and patient records across such datasets. For example, if Pharma Data Vendor 1 and Pharma Data Vendor 2 both hold records on Physician A but use different IDs, it may be possible to link the records using these techniques. Such methods are often used to link disparate data sources and enrich existing datasets. The same theory and algorithms can also be applied to patient matching.
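
One common way to approach this kind of record linkage is to normalise names, restrict comparisons to records that share a blocking key, and score the remaining pairs with a string similarity measure. The sketch below does this with Python's standard `difflib.SequenceMatcher`; the vendor records, the choice of ZIP code as the blocking key and the 0.85 threshold are all assumptions made for illustration, not a description of any vendor's data.

```python
from difflib import SequenceMatcher

# Hypothetical records from two vendors that use different physician IDs.
vendor1 = [
    {"id": "V1-001", "name": "Dr. John A. Smith", "zip": "75201"},
    {"id": "V1-002", "name": "Maria Gonzalez MD", "zip": "10016"},
]
vendor2 = [
    {"id": "B-77", "name": "Smith, John", "zip": "75201"},
    {"id": "B-78", "name": "Gonzales, Maria", "zip": "10016"},
    {"id": "B-79", "name": "John Smith", "zip": "94105"},
]

TITLES = {"dr", "md", "do", "jr", "sr"}


def normalize(name):
    """Lowercase, drop punctuation and titles, and sort tokens so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    tokens = [t for t in cleaned.split() if t not in TITLES]
    return " ".join(sorted(tokens))


def similarity(a, b):
    """Similarity in [0, 1] between two normalised names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link(left, right, threshold=0.85):
    """Map each left ID to its best-scoring right ID within the same ZIP code."""
    links = {}
    for rec in left:
        # Blocking: only compare records that share a ZIP code.
        candidates = [r for r in right if r["zip"] == rec["zip"]]
        if not candidates:
            continue
        best = max(candidates, key=lambda r: similarity(rec["name"], r["name"]))
        if similarity(rec["name"], best["name"]) >= threshold:
            links[rec["id"]] = best["id"]
    return links


links = link(vendor1, vendor2)
print(links)
```

The "John Smith" in a different ZIP code is deliberately left unlinked: blocking trades some recall (a physician who moved would be missed) for far fewer false matches and far fewer comparisons on large datasets.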

Machine Learning for Clinical Studies and R&D

Finding the right active molecules that work on specific targets (and not on unintended ones) is a common challenge in R&D. Machine learning is used in conjunction with chemical structures and related data to find the optimal methods of targeting. Algorithms such as deep learning / neural networks and Bayesian machine learning are well suited to such use cases, which often involve discovering latent factors.


Market Mix Modeling (or Promotion Response Modeling)

It is common practice within pharmaceutical companies to apply promotion response modeling to find the optimal mix of multi-channel marketing (direct marketing, advertising, etc.) and other activities such as details (P1, P2) and call frequency. Traditionally, this has been done using a negative exponential model (a response curve that rises quickly but soon levels out, indicating that additional promotion spending will have less and less impact on revenues). More recently, alternatives such as support vector machines have been used successfully to find the optimal mix. The preliminary results have been encouraging and should prompt further research to assess the viability of new approaches.
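
The traditional negative exponential curve is usually written as response = ceiling * (1 - exp(-rate * spend)). The sketch below shows its diminishing returns and uses it for a simple greedy budget split across two channels. The channel names, ceilings and rates are invented; in practice these parameters would be fitted to historical promotion and sales data, and this sketch does not cover the support vector machine alternative.

```python
import math


def response(spend, ceiling, rate):
    """Negative exponential response: rises quickly, then saturates at `ceiling`."""
    return ceiling * (1.0 - math.exp(-rate * spend))


def marginal_gain(params, current):
    """Extra response from one more unit of spend at the current spend level."""
    return response(current + 1, **params) - response(current, **params)


# Hypothetical channel parameters (ceiling = maximum incremental revenue).
channels = {
    "detailing": {"ceiling": 100.0, "rate": 0.05},
    "digital": {"ceiling": 60.0, "rate": 0.10},
}


def allocate(budget_units, channels):
    """Give each budget unit to the channel with the largest marginal gain.

    Because every curve is concave (each extra unit is worth less than the
    previous one), this unit-by-unit greedy rule is sufficient here.
    """
    spend = {name: 0 for name in channels}
    for _ in range(budget_units):
        best = max(channels, key=lambda n: marginal_gain(channels[n], spend[n]))
        spend[best] += 1
    return spend


# Diminishing returns: each additional unit of detailing adds less than the last.
gains = [marginal_gain(channels["detailing"], x) for x in range(5)]

allocation = allocate(20, channels)
total = sum(response(allocation[n], **channels[n]) for n in channels)
print(gains, allocation, total)
```

The allocation stops favouring a channel once its curve flattens, which is exactly the "levels out" behaviour described above.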


In the next part of Machine Learning and AI in Pharma and Healthcare, we will share more use cases in which Pharma is applying predictive and prescriptive analytics to improve internal research capabilities or supplement them with algorithmic insights. For a limited time, we are offering free customized enterprise data science workshops focused on Pharma and Healthcare if you wish to learn more about any specific topic. If you found this article useful or have any additional questions, please feel free to contact me at