How data science is driving genomics in the pharmaceutical industry

Posted: 7 January 2021 | Ramya Sriram (Kolabtree)

In this article, Ramya Sriram describes how data science is driving innovations in medical biotechnology and genomics.

The human body contains a plethora of genomic data. Each copy of the human genome is made up of about three billion base pairs, and the DNA in a single cell would be about two metres long if stretched out. Laid end to end, all the DNA in the human body would reportedly stretch to twice the diameter of the Solar System.
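The per-cell length can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope estimate, not a measurement: it assumes two genome copies per cell (most body cells are diploid) and the textbook spacing of about 0.34 nanometres per base pair for B-form DNA. The constant and function names are illustrative. Under these assumptions the result comes out at roughly two metres per cell.

```python
# Back-of-envelope estimate of the DNA length in one human cell.
# All three constants are stated assumptions, not values from the article.
BASE_PAIRS_PER_GENOME_COPY = 3_000_000_000  # "about three billion" bases
GENOME_COPIES_PER_CELL = 2                   # assumption: a diploid cell
RISE_PER_BASE_PAIR_NM = 0.34                 # assumption: B-form DNA spacing


def dna_length_per_cell_m(
    base_pairs: int = BASE_PAIRS_PER_GENOME_COPY,
    copies: int = GENOME_COPIES_PER_CELL,
    rise_nm: float = RISE_PER_BASE_PAIR_NM,
) -> float:
    """Return the stretched-out DNA length of one cell in metres."""
    total_nm = base_pairs * copies * rise_nm
    return total_nm * 1e-9  # convert nanometres to metres


length_m = dna_length_per_cell_m()
print(f"Approximate DNA length per cell: {length_m:.2f} m")
```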

Biotechnology, the use of living organisms or biological systems and their derivatives to make products, is propelled forward by data, information and statistics. In 2014, according to Science, bioinformatics became a discipline in its own right, rather than a tool in a biologist or biotechnologist’s armoury. Business intelligence, data analytics and technological advances are crucial to developing new technologies and treatments and to overcoming current challenges. By making sense of big data, from genomics or from sensors, we can identify potential drug targets, improve processes, bring new drugs to market and reduce errors in clinical trials.

Genomics

If you think of big data in the context of biotechnology, your first thought likely relates to genome sequencing. The Human Genome Project, which ran from 1990 to 2003, was a pioneering effort that gave us access to three billion bases of data, opening the door to information on mutations, genes and more. We now live in a world where genome data is at our fingertips; a genome can be sequenced in a few hours for under £1,000.

The data now available allows researchers to obtain valuable insights in fields such as medicine and forensic investigation. To work with it effectively, data scientists use frameworks and tools to store, track, receive, analyse and interpret data. Tools are now being built to automatically annotate specific genes, and software companies like DNAnexus, Knome and NextBio have begun to tackle genome interpretation. Interestingly, NextBio has even worked with Intel to improve Hadoop for genomic big data analysis. The pharmaceutical and healthcare industries can use this insight to improve diagnostics, aid drug discovery or develop personalised medicine strategies.
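To give a flavour of what interpreting and annotating sequence data involves, here is a deliberately tiny sketch. It compares a sample sequence with a reference of the same length, lists single-base differences, and reports which region each one falls in. The sequences, the region names (`GENE_A`, `GENE_B`) and their coordinates are all invented for illustration; real pipelines work on aligned reads and curated annotation databases, and this is not how any of the companies named above implement their tools.

```python
# Toy example: find single-base substitutions and label the region they hit.
# Sequences and region coordinates are hypothetical, for illustration only.
REFERENCE = "ATGGCGTACGTTAGC"
SAMPLE = "ATGGCGTACATTAGC"

# Hypothetical annotation: region name -> (start, end), 1-based inclusive.
HYPOTHETICAL_REGIONS = {"GENE_A": (1, 6), "GENE_B": (8, 12)}


def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this toy comparison needs sequences of equal length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def annotate(position: int, regions: dict[str, tuple[int, int]]) -> list[str]:
    """Return the names of all regions that contain the given position."""
    return [name for name, (start, end) in regions.items() if start <= position <= end]


for position, ref_base, sample_base in find_substitutions(REFERENCE, SAMPLE):
    hits = annotate(position, HYPOTHETICAL_REGIONS) or ["no annotated region"]
    print(f"Position {position}: {ref_base} -> {sample_base} ({', '.join(hits)})")
```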

Drug discovery, development and genomics

Bringing a new pharmaceutical product to market is a long, arduous process with many bottlenecks. Trials regularly fail to meet their objectives, for example in terms of enrolment, which can add further delay and therefore increase the costs of an already expensive process. However, before patient recruitment for a clinical trial can even begin, scientists first need to identify a drug candidate, a step that itself involves numerous data points, experiments and risk/benefit analyses.

We can now use automated software to screen millions of compounds to identify drug candidates for a clinical trial. Pharmaceutical professionals can let artificial intelligence (AI) do the hard work of sifting through a huge library of potential drugs, assessing what is likely to work against the trial’s specific criteria.

Biotechnology company Numerate, for example, builds predictive models to help with small molecule drug design, making predictions on toxicity, metabolism, absorption, distribution and more. AI can also be used to invent new combinations of compounds. Pharmaceutical companies can therefore screen drug candidates and pick the ones most likely to succeed in clinical trials.
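The idea of screening a compound library can be illustrated with a much simpler stand-in than the predictive models described above. The sketch below applies Lipinski's rule of five, a widely used rule of thumb for oral drug-likeness, as a strict pass/fail filter. It is a rule-based filter rather than an AI model, and it is not Numerate's method. The `Compound` class, the compound names and all property values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Minimal record of the properties used by the rule of five."""
    name: str
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def passes_rule_of_five(compound: Compound) -> bool:
    """Strict variant: every one of the four criteria must hold."""
    return (
        compound.molecular_weight <= 500
        and compound.log_p <= 5
        and compound.h_bond_donors <= 5
        and compound.h_bond_acceptors <= 10
    )


# Hypothetical library; a real screen would cover millions of compounds.
library = [
    Compound("cmpd-001", 320.4, 2.1, 2, 5),
    Compound("cmpd-002", 612.7, 4.3, 3, 9),   # too heavy
    Compound("cmpd-003", 455.0, 5.8, 1, 6),   # too lipophilic
    Compound("cmpd-004", 289.3, 1.2, 6, 4),   # too many H-bond donors
    Compound("cmpd-005", 498.9, 4.9, 5, 10),
]

shortlist = [compound.name for compound in library if passes_rule_of_five(compound)]
print("Shortlisted candidates:", shortlist)
```

In practice a filter like this would only be a first pass; predictions of toxicity, metabolism or absorption need models trained on experimental data.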

Big data in biotechnology is not only about genomics; data may also be collected by sensors. Wearable, ingestible or implantable sensors can provide a continuous data stream for clinical trials. This data can fill the gaps between measurements taken at appointments, mitigate human error, help identify reasons for dropout and may allow patients to go about their normal lives more easily. Any improvement in the drug discovery or clinical trial process can save millions of dollars in development costs and shorten the time it takes to bring a potentially life-saving drug to market.
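One simple use of a continuous sensor stream is spotting periods when a device stopped reporting, which could hint at non-adherence or an impending dropout. The sketch below flags any pause between consecutive readings that exceeds a chosen threshold. The timestamps, the 15-minute threshold and the function name `find_gaps` are assumptions made for illustration, not values from any real trial.

```python
from datetime import datetime, timedelta


def find_gaps(
    timestamps: list[datetime], max_interval: timedelta
) -> list[tuple[datetime, datetime]]:
    """Return (last reading before, first reading after) for each long pause."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_interval
    ]


# Hypothetical readings from one participant's wearable sensor.
readings = [
    datetime(2021, 1, 7, 8, 0),
    datetime(2021, 1, 7, 8, 5),
    datetime(2021, 1, 7, 8, 10),
    datetime(2021, 1, 7, 9, 30),  # the sensor was silent before this reading
    datetime(2021, 1, 7, 9, 35),
]

gaps = find_gaps(readings, max_interval=timedelta(minutes=15))
for start, end in gaps:
    print(f"No data between {start:%H:%M} and {end:%H:%M} ({end - start})")
```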

The keys to the future

Data scientists hold the keys to the future of data analytics for medical biotechnology. For innovation to take place, the industry needs trained data scientists and biotechnicians with skills in programming and query languages such as Python, R, C++ and SQL. They also require an underlying knowledge of data collection, storage, algorithms, validation and visualisation to generate meaning from biological data.

The knowledge and skills needed to write in these languages are primarily held by data scientists with advanced degrees, meaning it is important to invest in either freelancers or training for in-house talent to make full use of genomic data.

About the author

Ramya Sriram manages digital content and communications at Kolabtree, a freelancing platform for scientists. She has over a decade of experience in publishing, advertising and digital content creation.

Related organisations
DNAnexus, Human Genome Project, Knome, NextBio, Numerate

