The role of the Data Scientist is relatively new and continuously evolving in biopharma R&D. Its defining characteristic is the combination of strong domain knowledge in a therapeutic area (Neurology, Immunology, Inflammation etc.) with skills in Computational Statistics, Machine Learning and in silico modelling.
Many R&D functions have been using data to derive knowledge and insights for at least a couple of decades, so strong Computational Science teams already exist within each of those functions, such as Computational Chemistry, Computational Biology, Biostatistics and Computational Pharmacology. As Data Science derives a lot of its value from newer technologies such as Statistical Machine Learning, Deep Learning and cloud-native tech stacks, it works hand-in-hand with the therapeutic-area teams from the Computational Sciences.
This also means that the domain knowledge of a Data Scientist who works with a Computational Science team is specialised towards the domain of application. We therefore already see a proliferation of domain-specific Data Science roles. Here, I would like to cover some of these roles in brief.
Multi-omics Data Scientist - This is where Data Scientists work together with Computational Biologists, Systems Biologists and Proteomics Biomarker Scientists to understand disease and drug mechanisms. The data are high-throughput multi-omics datasets, and the insights feed into novel target identification, target validation, or biomarker discovery and development. Much of the data is generated from cell-culture lab experiments, or it can be patient-derived, for example from large biobanks. ML models that work with multi-modal data are often used to derive insights. Because the data is very high-dimensional, ML-based dimensionality reduction and regularisation techniques are often used.
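To make the regularisation point a little more concrete, here is a minimal sketch of how an L1 (lasso) penalty prunes features. It assumes the textbook special case of an orthonormal design with objective (1/2)||y - Xb||^2 + penalty * ||b||_1, where the lasso estimate is simply the least-squares coefficient soft-thresholded by the penalty. The feature names and coefficient values are made up for illustration, and a real multi-omics analysis would rely on an established ML library rather than this toy.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Shrink a coefficient towards zero; anything within the penalty becomes exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_orthonormal(ols_coefs: dict[str, float], penalty: float) -> dict[str, float]:
    """Lasso solution under the (assumed) orthonormal-design special case."""
    return {name: soft_threshold(coef, penalty) for name, coef in ols_coefs.items()}


# Hypothetical least-squares coefficients for four features (illustrative values only).
ols = {"gene_a": 2.0, "gene_b": -0.25, "gene_c": 0.5, "gene_d": -1.5}

shrunk = lasso_orthonormal(ols, penalty=0.5)

# Features whose coefficient survived the penalty.
selected = [name for name, coef in shrunk.items() if coef != 0.0]
```

Small coefficients are set to exactly zero while the larger ones are shrunk, which is why this kind of penalty is attractive when there are far more features than samples.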
Genomics Data Scientist - Human genomics data is unique: it is one of the major enablers of Precision Medicine and provides the blueprint for all other omic modalities. Genomics Data Scientists work with Bioinformatics Scientists and Geneticists on novel target identification, and with Biostatisticians and Epidemiologists on applications such as genetics-based risk stratification of patients, clinical decision making and pharmacogenetics-based therapeutic targeting. The role therefore lends itself to both Early Drug Discovery and clinical applications. The Genomics Data Scientist focusses on developing validated, explainable ML models, with the goal of enabling genomics-based drug discovery and treatment.
Drug Discovery Data Scientist - The field of Drug Discovery is being shaped by AI-based generative algorithms for de novo design of both small and large molecules. The Drug Discovery Data Scientist works closely with the Chemoinformatics and Computational Chemistry teams to complement their standard workflows with Deep Learning-based generative models and QSAR-based models that predict bioactivity and other parameters, in Hit-to-Lead generation as well as Lead Optimisation. The goal is to reduce Design-Make-Test-Analyse (DMTA) cycle times and hence bring candidate molecules into pre-clinical development faster.
Pharmacometrics Data Scientist - The Pharmacometrics Data Scientist works closely with Quantitative DMPK and Toxicology teams, essentially bringing predictive ML tools to complement the incumbent computational models. The goal of the Pharmacometrics Data Scientist is to optimise the properties of the drug molecule so that it has the desired effect on patients when brought to the clinic. The Data Scientist works with in-vitro and animal in-vivo data along with some early-stage human PK/PD data. As the datasets tend to be small, a lot of care has to be taken over data quality and the robustness of the ML models. There is also more focus on explainability and on probabilistic ML models.
Biomedical Data Scientist - Here, Data Scientists collaborate with Translational Scientists who work on bridging the gap between the lab (Bio-) and the clinic (Medicine). One of the core tasks of Biomedical Data Scientists is to look at human data and derive insights, such as identifying ML-driven disease phenotypes that would have the most favourable benefit-risk profile on exposure to the drug. There is close collaboration here with the Quantitative Pharmacology teams, who determine the optimal dose and treatment regimens for early clinical development. The Quantitative Pharmacology Scientists use mechanistic models to simulate the Pharmacokinetics and Pharmacodynamics of the drug. The Biomedical Data Scientists work together with them but adopt a less mechanistic approach to PK-PD predictions.
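For readers unfamiliar with what "mechanistic" means here, the following is a minimal sketch of the simplest textbook case: a one-compartment model with an intravenous bolus dose and first-order elimination, where concentration decays as C(t) = (dose / V) * exp(-(CL / V) * t). The parameter values are invented for illustration, and the models used in practice are considerably richer than this.

```python
import math


def concentration(dose_mg: float, volume_l: float,
                  clearance_l_per_h: float, time_h: float) -> float:
    """Plasma concentration (mg/L) at time_h after an IV bolus, one-compartment model."""
    k_el = clearance_l_per_h / volume_l  # first-order elimination rate constant (1/h)
    return (dose_mg / volume_l) * math.exp(-k_el * time_h)


def half_life(volume_l: float, clearance_l_per_h: float) -> float:
    """Elimination half-life (h): the time for the concentration to halve."""
    return math.log(2) * volume_l / clearance_l_per_h


# Hypothetical parameters, chosen only to keep the arithmetic easy to follow.
DOSE_MG, VOLUME_L, CLEARANCE_L_PER_H = 100.0, 20.0, 2.0

c_start = concentration(DOSE_MG, VOLUME_L, CLEARANCE_L_PER_H, 0.0)
t_half = half_life(VOLUME_L, CLEARANCE_L_PER_H)
c_half = concentration(DOSE_MG, VOLUME_L, CLEARANCE_L_PER_H, t_half)
```

Every parameter has a physiological meaning (volume of distribution, clearance), which is the contrast with a data-driven model that learns the exposure-response relationship directly from observations.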
Clinical Data Scientist - In the early Clinical Development of a drug, there is considerable overlap between the roles of Biomedical and Clinical Data Scientists. In late Clinical Development, however, the distinction becomes clearer. Along with the Clinical Scientists and the Biostatistics team, the Clinical Data Scientist focusses on ML-driven approaches to improve the design and analysis of clinical trials. They look to build models that predict drug response from patient-level baseline data. Another area of interest for Clinical Data Scientists is the optimisation of Clinical Operations. Although this article discusses Data Science roles primarily within biopharma R&D, Clinical Data Scientists are also found in Academic Medical Centres and University Hospitals that conduct independent clinical research. There the scope of the role can be even broader, for example building AI-enabled disease diagnosis systems or decision support systems for clinicians.
Digital Data Scientist - The newest and perhaps the fastest-growing of the roles mentioned so far is that of the Digital Data Scientist. Digital devices, wearables and health apps are now expected to be key enablers of Digital Medicine and Decentralised Clinical Trials. These include digital measures or digital biomarkers captured by wearables, such as sleep or physical activity. They also include digital health apps, which could be prescribed either as an add-on to a pharmacological intervention (companion apps) or as standalone therapeutics (DTx). All of this signals the creation of large amounts of patient-generated health data. The emergence of Digital Health and Data Science more or less coincided, so there is no established Computational Science team that has handled digital measure data. This allows Data Scientists to develop their own methods of data capture and analytics, borrowing from the established field of Signal Processing.
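As a small illustration of the kind of Signal Processing building block that gets borrowed, here is a minimal sketch of a moving-average filter applied to a short series of wearable readings. The readings and the window length are made up, and real digital-measure pipelines involve far more than smoothing (resampling, artefact removal, feature extraction and so on).

```python
def moving_average(signal: list[float], window: int) -> list[float]:
    """Smooth a signal with a sliding mean; returns len(signal) - window + 1 values."""
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    running = sum(signal[:window])
    smoothed = [running / window]
    for i in range(window, len(signal)):
        # Slide the window forward: add the newest sample, drop the oldest.
        running += signal[i] - signal[i - window]
        smoothed.append(running / window)
    return smoothed


# Hypothetical activity counts from a wearable (illustrative values only).
activity = [2.0, 4.0, 6.0, 8.0, 10.0]

smoothed_activity = moving_average(activity, window=3)
```

A longer window gives a smoother trace at the cost of blurring short events, which is the kind of trade-off that has to be settled per digital measure.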
Health Outcomes / RWE Data Scientist - The last and possibly one of the most diverse roles is that of the Health Outcomes / RWE Data Scientist. In biopharma R&D, they collaborate with teams whose remit covers post-approval and real-world studies of drugs, such as Medical Affairs and Market Access. They work on one of the most dynamic sources of data in biopharma R&D: Real-World Data (RWD). This is all the patient-level data collected outside the confines of clinical research - EHR/EMR, Claims, Rx, Registries etc. Traditionally, Computational Epidemiologists have dealt with such data, but with the growth of RWD to truly "big-data" scale, Health Outcomes Data Scientists are well placed to take on the challenge. Access to such data is often very costly and use-case specific. Health Outcomes Data Scientists complement the typical workflow of Computational Epidemiologists and drive use-cases such as the creation of synthetic control arms and the identification and prediction of patient trajectories. They also add value to traditional cost-effectiveness analyses with ML-based predictive and causal models.
In summary, there may be other Data Science roles that I have missed altogether (such as Imaging Data Scientists or NLP Data Scientists) - feel free to drop a comment if you feel that way!