Old maths & new tech combine to increase speed of drug discovery


Obtaining regulatory approval for a new drug can take up to 10 years, a process that runs from identifying possible drug 'targets' in a given disease, and the substances capable of acting on them, through to clinical trials. It is extremely expensive and risky: around 90% of attempts fail.

Most potential new drugs being studied never become approved medicines for patients. The main reason for this low conversion rate is that the drug target itself ultimately turns out not to play a critical role in the disease in question. Genetic evidence supporting a drug target therefore plays a crucial role in drug discovery: it can double the success rate in drug development, resulting in more effective medicines for patients.

However, the genetic basis of complex disorders such as Parkinson's disease and diabetes is extremely difficult to understand. In Parkinson's, for example, 70% of the 'genetic load' (the share of an illness that is due to genetic rather than environmental factors) is unexplained. Conventional analysis yields limited information and insight, because the onset of complex disorders is thought to depend on interactions between many variables, which traditional methods do not examine.

A research partnership between the Centre, the small drug discovery company C4X Discovery Limited, and Clive Bowman of Oxford University's Mathematical Institute aims to shorten this target-identification stage. It does so by speeding up the analysis of large patient DNA datasets and by analysing multiple genetic variables at once, to reveal both the genes that matter in specific disorders and the interactions between them. The software the partners have optimised for this purpose (Taxonomy3®) uses a distinctive mathematical approach to help fill this 'heritability gap', which should lead to novel genetic insights and, in turn, to highly valuable drug targets with a greater chance of success.

The mathematical method is based on the theory of 'individualised divergences', developed in 2005: a non-linear transformation that the Mathematical Institute has made applicable to real patient and control datasets. The transformation turns any type of data into a numerical measure that can be contrasted across groups, so that the genes of patients with complex disorders can be compared with those of healthy individuals while taking into account factors such as the patients' physical characteristics, gender or age.
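The article does not give the individualised-divergence formula, so the sketch below is only a stand-in to show the general idea of "turn each subject's data into a number, then contrast groups". It assumes genotypes coded as 0, 1 or 2 copies of a minor allele and scores each subject by surprisal, -log2 of the genotype's frequency among controls. The function names and the add-one smoothing are illustrative choices, not Taxonomy3's method.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for an "individualised divergence": each subject's genotype
# (0, 1 or 2 copies of the minor allele) is scored by how surprising it is under the
# control population, -log2 p(genotype | controls). This is NOT Taxonomy3's formula.

GENOTYPES = (0, 1, 2)

def control_frequencies(control_genotypes):
    """Genotype frequencies in controls, with add-one smoothing so no class has probability 0."""
    counts = Counter(control_genotypes)
    total = len(control_genotypes) + len(GENOTYPES)
    return {g: (counts[g] + 1) / total for g in GENOTYPES}

def divergence_scores(genotypes, reference_freqs):
    """Map each subject to a non-negative score; higher means more unusual relative to controls."""
    return [-math.log2(reference_freqs[g]) for g in genotypes]

def group_contrast(case_genotypes, control_genotypes):
    """Mean divergence of cases minus mean divergence of controls at one variant."""
    reference = control_frequencies(control_genotypes)
    cases = divergence_scores(case_genotypes, reference)
    controls = divergence_scores(control_genotypes, reference)
    return sum(cases) / len(cases) - sum(controls) / len(controls)

cases = [2, 2, 1, 2, 0]
controls = [0, 0, 1, 0, 0, 1, 0, 0]
print(round(group_contrast(cases, controls), 2))  # prints 1.62
```

A positive contrast means patients carry genotypes that are unusual relative to controls. Because every subject gets an individual score, the same scores could in principle be compared within strata such as gender or age group.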

The Centre provides the computing expertise needed to analyse thousands of variables across large datasets at once. The software, accelerated with CUDA code running on GPUs, identifies significant records and patterns across patients. One such analysis compared 1 million genetic variables in 51 patients with drug-induced liver injury against those of 282 healthy controls matched by country and gender. It identified the gene GJB1, which is central to drug-induced hepatotoxicity (liver damage) in mice, supporting the view that genes identified by the new method offer a significant opportunity for drug development.
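To see why this workload suits GPUs, consider the simplest possible case/control scan: every variant is scored independently of every other, so a million variants means a million small, identical computations. The sketch below assumes genotypes coded 0, 1 or 2 and uses the difference in minor-allele frequency as a deliberately simple, illustrative statistic; the function names are hypothetical and this is not the Taxonomy3 scoring.

```python
def allele_frequency(genotypes):
    """Minor-allele frequency from 0/1/2 genotype codes (two alleles per subject)."""
    return sum(genotypes) / (2 * len(genotypes))

def scan_variants(case_matrix, control_matrix, top_k=10):
    """case_matrix[i][v] is case subject i's genotype at variant v (likewise for controls).

    Returns (variant_index, case_freq - control_freq) for the top_k variants,
    ranked by the absolute frequency difference."""
    n_variants = len(case_matrix[0])
    results = []
    # Each iteration is independent of the others: this is the loop a CUDA kernel
    # would spread across thousands of GPU threads.
    for v in range(n_variants):
        case_freq = allele_frequency([row[v] for row in case_matrix])
        control_freq = allele_frequency([row[v] for row in control_matrix])
        results.append((v, case_freq - control_freq))
    results.sort(key=lambda item: abs(item[1]), reverse=True)
    return results[:top_k]

cases = [[2, 0, 1], [2, 0, 1], [1, 0, 1]]
controls = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 1]]
print(scan_variants(cases, controls, top_k=1))  # variant 0 ranks first
```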

However, even on the fastest processors available, 5,000 resamplings on a hundred subjects with a million variables still occupy thousands of CPUs for several weeks and cost upwards of £10,000. To make the process more efficient, the collaborators have moved the analyses from CPUs to GPUs, which has significantly reduced the execution time of one of the analysis bottlenecks. For example, analysing 500,000 variables in 1,000 subjects took 1 second instead of 116 seconds, at a quarter of the cost. The cost saving grows with the size of the dataset: for analyses on 4,000 subjects, costs fall 13-fold.
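The "resamplings" above refer to repeatedly reshuffling data to judge whether an observed difference could arise by chance. A common way to do this is a label-permutation test, sketched below for a single variable; the article does not say which resampling scheme Taxonomy3 uses, so treat this as an illustrative assumption. The difference in means is a placeholder statistic, and the function names are hypothetical.

```python
import random

def mean_difference(values, labels):
    """Mean of cases minus mean of controls; labels[i] is True for a case."""
    cases = [x for x, is_case in zip(values, labels) if is_case]
    controls = [x for x, is_case in zip(values, labels) if not is_case]
    return sum(cases) / len(cases) - sum(controls) / len(controls)

def permutation_p_value(values, labels, n_resamplings=5000, seed=0):
    """Two-sided permutation p-value for a case/control difference in means.

    Shuffles the case/control labels n_resamplings times and counts how often the
    shuffled statistic is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean_difference(values, labels))
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_resamplings):
        rng.shuffle(shuffled)
        if abs(mean_difference(values, shuffled)) >= observed:
            extreme += 1
    # Add-one correction: the observed labelling counts as one of the permutations,
    # so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamplings + 1)

values = [2] * 10 + [0] * 10
labels = [True] * 10 + [False] * 10
print(permutation_p_value(values, labels))
```

The cost multiplies quickly: 5,000 resamplings for each of a million variables is 5 billion recomputed statistics, each over every subject, which is why this step is a natural target for GPU acceleration.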

Faster, cheaper analysis makes it feasible to work with larger datasets, such as those relating to Parkinson's disease. The records of 1,705 UK-based patients with the disorder were analysed alongside 3,000 healthy controls, using public-domain data from the Wellcome Trust Case Control Consortium (WTCCC). Previous classical analysis of this dataset identified only two significant genes. Our new method revealed thirteen, plus three distinct genetic subtypes, each with its own genes and gene-gene interactions driving the same apparent disorder. Using this information, specific drug targets can now be identified for each of the subgroups.
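The gene-gene interactions mentioned above can be made concrete with one standard epidemiological measure: multiplicative interaction between two variants, computed from case/control counts. This is offered only as an illustration of what "interaction" means; it is not necessarily how Taxonomy3 detects interactions, and the function name and input layout are hypothetical.

```python
def interaction_ratio(counts):
    """counts maps (carries_a, carries_b) -> (n_cases, n_controls) for all four combinations.

    Returns OR_ab / (OR_a * OR_b), where each odds ratio compares a carrier group with
    subjects carrying neither variant. A value near 1 means the joint effect matches what
    the two single-variant effects predict on a multiplicative scale; values far from 1
    suggest a gene-gene interaction."""
    if any(n == 0 for cell in counts.values() for n in cell):
        raise ValueError("all cells need non-zero counts (consider adding a pseudocount)")
    ref_cases, ref_controls = counts[(False, False)]
    reference_odds = ref_cases / ref_controls

    def odds_ratio(cell):
        n_cases, n_controls = counts[cell]
        return (n_cases / n_controls) / reference_odds

    return odds_ratio((True, True)) / (odds_ratio((True, False)) * odds_ratio((False, True)))

# Joint odds ratio (6) equals the product of the single-variant odds ratios (2 x 3).
no_interaction = {
    (False, False): (100, 100),
    (True, False): (200, 100),
    (False, True): (300, 100),
    (True, True): (600, 100),
}
print(interaction_ratio(no_interaction))  # prints 1.0
```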

Olivier Delrieu, VP of Clinical Development & Mathematics at C4X Discovery Ltd, says: "Taxonomy3® is a promising tool to help us understand complex disorders such as Parkinson's disease, and derive innovative drug targets with direct genetic support. We are now scaling up this important work to fuel the C4XD drug discovery pipeline, and have already hired a team of analysts for this purpose."