The ETL (extract, transform and load) process by which organizations prepare data for storage is an essential part of modern database systems, particularly for business intelligence applications. The problem is that it can be inefficient and slow, often too slow for companies to run real-time and streaming analytics.
For many organizations, the lagginess of ETL is just part and parcel of the data engineering process. However, this excuse isn't going to cut it for many pharmaceutical professionals, who need their data faster than ever before. Real-time ETL needs to be a priority for the pharmaceutical industry as a whole.
ETL Is Changing
A number of recent data science thought pieces have described the major changes happening in ETL, with one going so far as to proclaim that "ETL is dead." While that headline might be a tad sensationalized, it's also true that ETL as we know it will soon be a thing of the past.
Despite its clear utility and value, ETL suffers from several common pain points. It demands a specialized skill set and substantial maintenance to keep running, not to mention the lengthy manual prep work that often causes delays. These requirements simply aren't sustainable in a data-driven, real-time business landscape.
However, the goal of most data science professionals isn't to kill ETL, but to reform it. Rather than doing away with ETL entirely, companies can optimize their ETL processes for real-time insights, all while minimizing or even eliminating the manual prep work involved.
Why the Pharmaceutical Industry Needs to Prioritize Real-Time ETL
Real-time data operations and analytics are especially important for companies in the pharmaceutical industry. Here are just a few reasons why big pharma needs to make real-time ETL a priority now.
Technology Develops Faster
Recent technological developments in pharmaceuticals are generating and consuming more data than ever before, particularly real-time data. This is especially evident with the advent of futuristic inventions like wearable technologies, the internet of things (IoT), and "smart pills" that track whether patients have swallowed them. In the case of the smart pill, for example, results need to be as close to real time as possible.
The opportunities that new technologies open up for healthcare and pharmaceuticals are nearly limitless. Johnson & Johnson is currently using smart sensors in its production workflow to produce certain medications continuously rather than in a batch process. The company is also creating "smart contact lenses," personalizing them for the individual wearer in order to help with allergies and reduce glare and strain.
Data Gets Stale
The future of pharma data analytics depends on having access to fresh, real-time information, but ETL too often gets in the way. One study showed that "nearly two-thirds of data moved via ETL was at least five days old by the time it reached an analytics database."
Getting access to the data and extracting insights more quickly will make the pharmaceutical industry more efficient across a variety of applications, from research and development to marketing and the patient journey. Consulting firm McKinsey & Company estimates that applying big data strategies intelligently can increase revenues by $100 billion across the U.S. healthcare industry.
Mid-Market Companies Need to Compete
Since 2011, Novartis has hired over 1,200 engineers and mathematicians charged with analyzing big pharmaceutical data sets in order to determine drug values. While mid-market pharma companies obviously can't compete with this level of capital investment, they can minimize their time to insight in order to remain competitive.
The truth is that cost remains a huge barrier to growth in pharma. However, you can reduce costs by using tools that lessen the need for pricey ETL experts and tedious manual prep work.
The Streaming ETL Revolution
Instead of the laborious ETL process — by which companies must clean, filter, reshape and roll up data before loading it into the database — an increasing number of businesses are choosing "streaming ETL." This means processing data in real time as it streams through a server, providing performance and scalability even as the volumes of data increase. In particular, "streaming analytics" solutions can extract meaning in real time, even while the data is in motion.
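To make the contrast with batch ETL concrete, here is a minimal Python sketch of the idea described above: each event is cleaned, filtered, reshaped and rolled up as it arrives, rather than after a full batch has been collected. The field names (`device_id`, `timestamp`, `value`), the per-minute rollup and the `stream_etl` function are all hypothetical choices made for illustration, and a plain in-memory list stands in for a real event stream.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Optional


def clean(record: dict) -> Optional[dict]:
    """Normalize one raw event; return None if it cannot be parsed.

    The field names used here are assumptions made for this sketch.
    """
    try:
        return {
            "device_id": str(record["device_id"]).strip(),
            # Reshape: bucket the timestamp (in seconds) into whole minutes.
            "minute": int(record["timestamp"]) // 60,
            "value": float(record["value"]),
        }
    except (KeyError, TypeError, ValueError):
        return None


def stream_etl(events: Iterable[dict]) -> Iterator[dict]:
    """Yield an updated per-device, per-minute average after each valid event."""
    totals = defaultdict(lambda: [0.0, 0])  # (device, minute) -> [sum, count]
    for raw in events:
        row = clean(raw)
        # Filter: drop unparseable events and negative readings.
        if row is None or row["value"] < 0:
            continue
        key = (row["device_id"], row["minute"])
        totals[key][0] += row["value"]
        totals[key][1] += 1
        total, count = totals[key]
        # Roll up: emit the running aggregate immediately, while data is in motion.
        yield {
            "device_id": key[0],
            "minute": key[1],
            "avg": total / count,
            "count": count,
        }


# A small in-memory list standing in for a live event stream.
sample_events = [
    {"device_id": "pill-1", "timestamp": 0, "value": "2.0"},
    {"device_id": "pill-1", "timestamp": 30, "value": 4.0},
    {"device_id": "pill-1", "timestamp": 45, "value": "bad"},  # dropped by clean()
    {"device_id": "pill-2", "timestamp": 61, "value": -1},     # dropped by the filter
    {"device_id": "pill-1", "timestamp": 65, "value": 6.0},
]

for update in stream_etl(sample_events):
    print(update)
```

Because `stream_etl` is a generator, a downstream consumer sees a fresh aggregate after every event instead of waiting for a batch load. In a production setting, the list would presumably be replaced by a consumer reading from a message queue or streaming platform, and the running totals would need to be expired or persisted rather than kept in memory indefinitely.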
Although the financial services industry has long used streaming ETL for business problems such as algorithmic trading and fraud detection, the approach has yet to gain widespread adoption in other industries, such as pharmaceuticals. For early adopters, this is good news: any pharma company that chooses streaming ETL over traditional methods can gain a competitive edge over its rivals.
Even better, businesses no longer need to rely on Hadoop for streaming ETL. They now have access to lighter-weight tools that also require less maintenance.