
Part four: an industry leader’s perspective on managing data quality

In this four-part series, Dr Raminderpal Singh discusses the challenges surrounding limited data quality and offers some pragmatic solutions. In this fourth article, he talks to John Conway, Chief Visioneer Officer at 20/15 Visioneers, for an expert perspective.


The first article in this series, published on Wednesday 14 August, discussed the significance of data quality for the effectiveness of data analyses by machine learning (ML) and artificial intelligence (AI). In this article, John Conway, Chief Visioneer Officer at 20/15 Visioneers¹ and an industry leader, shares his perspectives on addressing data quality issues in the life sciences industry with Dr Raminderpal Singh. Read on to discover their key discussion points.

The state of data quality in drug discovery

In the realm of drug discovery, data quality is paramount. However, despite the advancements in technology and data generation tools, the quality of scientific data has not kept pace. Conway notes that while the scientific community has made significant strides in generating large volumes of data through high-throughput technologies, issues such as missing metadata, poor contextualisation, and inconsistent data management practices continue to plague the field.

The rapid increase in data generation has also led to a proliferation of what Conway refers to as “cheap data,” which is often collected without the necessary rigour to ensure its long-term usability. This has resulted in a landscape where vast amounts of data are underutilised or rendered useless due to a lack of proper management, leading to wasted resources and delayed drug discovery processes.

Addressing data quality challenges

Improving data quality in drug discovery requires a comprehensive approach that addresses the technical, procedural, and cultural aspects of data management. See below for strategies to tackle these challenges:

  1. Adopting a unified data strategy

A critical first step is for organisations to develop and implement a unified scientific data strategy, which should ensure that all data generated within the organisation is findable, accessible, interoperable, and reusable (FAIR). By aligning all departments and research teams to a common set of data management standards, organisations can prevent the fragmentation and inconsistency that often arise when different teams operate in silos.

  2. Investing in data governance

Effective data governance is essential to maintaining high data quality. This involves setting up governance structures that oversee the collection, management, and use of data across the organisation. Data governance should include the creation of standardised protocols for data capture, metadata generation, and data storage, ensuring that data is managed consistently and with the necessary context to be useful in the future.

  3. Automating metadata capture

One of the significant challenges in data quality is the manual effort required to capture metadata. Conway points out that scientists are often too busy to dedicate extra time to this task, which can lead to incomplete or inaccurate metadata. To address this, organisations should invest in technologies that automate the capture of metadata at the point of data generation. By embedding metadata requirements into the design of experiments and data capture systems, organisations can ensure that data is accompanied by the necessary contextual information without placing additional burdens on researchers; a short illustrative sketch of this idea follows this list.

  4. Implementing Standard Operating Procedures (SOPs)

Standard Operating Procedures (SOPs) are essential for ensuring consistency in data generation and management. These procedures should be designed to integrate seamlessly with the scientific workflow, providing clear guidelines on how data should be captured, stored, and analysed. SOPs should also be flexible enough to accommodate the specific needs of different research projects while maintaining a consistent approach to data quality.

  5. Encouraging reproducibility through rigorous methodology

Reproducibility is a cornerstone of scientific research, and improving data quality is key to achieving it. Organisations should prioritise rigorous experimental design and methodology, ensuring that all experiments are conducted in a way that enables accurate replication. This includes thorough documentation of experimental conditions, data collection methods, and any variables that may affect the outcomes.

  6. Leveraging technology for data quality control

Advances in AI and ML can play a significant role in improving data quality. These technologies can be used to monitor data in real time, flagging potential issues such as inconsistencies, missing metadata, or errors in data capture. By integrating these tools into the research process, organisations can proactively address data quality issues before they impact the research outcomes. A simple illustrative sketch of such an automated check follows this list.

  7. Continuous training and development

Finally, ensuring data quality requires ongoing education and training for all members of the research team. This includes not only formal training in data management best practices but also continuous professional development to keep pace with new technologies and methodologies. By fostering a culture of learning and adaptation, organisations can ensure that their teams are equipped to maintain high standards of data quality in an ever-evolving scientific landscape.
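To make the "Automating metadata capture" point above concrete, the sketch below shows one way metadata can be attached automatically at the point of data generation. It is a minimal illustration in Python, not any particular vendor's API; the function name, the required keys, and the session fields are all hypothetical. The idea is that contextual metadata is generated and validated when a measurement is saved, rather than entered by hand afterwards.

    # A minimal sketch (hypothetical names), not a specific vendor API.
    # Contextual metadata is attached and validated automatically at the
    # point of data capture, so researchers do not enter it by hand later.
    from datetime import datetime, timezone
    import uuid

    REQUIRED_KEYS = {"instrument_id", "operator", "protocol_id", "sample_id"}


    def save_measurement(values, session_metadata, store):
        """Attach generated context, validate required keys, then persist."""
        metadata = {
            "record_id": str(uuid.uuid4()),
            "captured_at": datetime.now(timezone.utc).isoformat(),
            **session_metadata,  # pulled from the instrument / ELN session
        }
        missing = REQUIRED_KEYS - metadata.keys()
        if missing:
            # Refuse to store "cheap data": a record without context is not reusable.
            raise ValueError(f"missing required metadata: {sorted(missing)}")
        store.append({"values": values, "metadata": metadata})
        return metadata["record_id"]


    store = []
    session = {
        "instrument_id": "plate-reader-07",
        "operator": "j.doe",
        "protocol_id": "SOP-123",
        "sample_id": "S-0001",
    }
    record_id = save_measurement([0.42, 0.39, 0.44], session, store)
    print(record_id, store[0]["metadata"]["captured_at"])

In practice this logic would sit inside the instrument software, LIMS, or electronic lab notebook integration rather than a standalone script, but the principle is the same: context is recorded at the moment the data is, and records without it are rejected rather than silently stored.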
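The "Leveraging technology for data quality control" point can be illustrated in the same spirit. The sketch below is a deliberately simple rule-based check, not the AI/ML tooling Conway describes; the function name, keys, and thresholds are hypothetical. It scans incoming records and flags missing metadata, empty measurements, and out-of-range values for review before they propagate downstream.

    # A minimal, rule-based sketch of automated data quality monitoring.
    # An illustration only: each record is checked as it arrives and any
    # problems are flagged for human review.
    REQUIRED_KEYS = {"instrument_id", "operator", "protocol_id", "sample_id"}


    def quality_flags(record, expected_range=(0.0, 1.0)):
        """Return a list of human-readable issues found in one data record."""
        flags = []
        metadata = record.get("metadata", {})

        missing = REQUIRED_KEYS - metadata.keys()
        if missing:
            flags.append(f"missing metadata: {sorted(missing)}")

        values = record.get("values", [])
        if not values:
            flags.append("no measured values")
        lo, hi = expected_range
        out_of_range = [v for v in values if not (lo <= v <= hi)]
        if out_of_range:
            flags.append(f"values outside {expected_range}: {out_of_range}")

        return flags


    records = [
        {"values": [0.42, 0.39],
         "metadata": {"instrument_id": "plate-reader-07", "operator": "j.doe",
                      "protocol_id": "SOP-123", "sample_id": "S-0001"}},
        {"values": [1.7], "metadata": {"operator": "j.doe"}},  # problematic record
    ]

    for i, rec in enumerate(records):
        for flag in quality_flags(rec):
            print(f"record {i}: {flag}")

A production system might replace or extend these fixed rules with learned models, for example anomaly detection over historical runs, but the workflow is the same: quality issues are surfaced automatically as data arrives rather than discovered months later during analysis.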

Reference

1 https://www.20visioneers15.com/ 

About the author

Dr Raminderpal Singh

Dr Raminderpal Singh is a recognised visionary in the implementation of AI across technology and science-focused industries. He has over 30 years of global experience leading and advising teams, helping early to mid-stage companies achieve breakthroughs through the effective use of computational modelling.

Raminderpal is currently the Global Head of AI and GenAI Practice at 20/15 Visioneers. He founded and leads the HitchhikersAI.org open-source community, and is a co-founder of Incubate Bio, a techbio providing a service to life sciences companies that are looking to accelerate their research and lower their wet lab costs through in silico modelling.

Raminderpal has extensive experience building businesses in both Europe and the US. As a business executive at IBM Research in New York, Dr Singh led the go-to-market for IBM Watson Genomics Analytics. He was also Vice President and Head of the Microbiome Division at Eagle Genomics Ltd in Cambridge. Raminderpal earned his PhD in semiconductor modelling in 1997. He has published several papers and two books and has twelve issued patents. In 2003, he was selected by EE Times as one of the top 13 most influential people in the semiconductor industry.

For more: http://raminderpalsingh.com; http://20visioneers15.com; http://hitchhikersAI.org; http://incubate.bio