  • Expectations from a good QSAR tool in Drug Discovery Applications

    At every stage of the drug discovery pipeline, QSAR is evidently beneficial, yet in its current state its reliability is limited. In this commentary, the various applications of QSAR are reviewed with respect to the drug discovery stages of compound library design, virtual screening, and lead optimization. The features required, performance expectations, and design constraints for an effective QSAR application vary significantly from stage to stage; however, certain requirements are common across all stages.

    A QSAR software application could comprise three functionally distinct modules: (i) ‘Model Building Software Tools’, (ii) ‘QSAR Models’ and (iii) ‘Model Deployment and Prediction Systems’.

    (i) Model Building Software Tools:
     
    A QSAR model building software toolset is expected to handle common molecular structure format representations, perform structure optimizations, compute or import descriptors and property values for the input compounds, contain a set of machine learning algorithms for building QSAR Models as well as methods to validate them.  

    While QSAR modelers use a collection of statistical and computational chemistry software tools to achieve the above functions, very sophisticated specialty QSAR modeling software products are now available. These software products provide a broad selection of features useful at all stages of model building. With the implementation of current best practices, intelligent wizards, and guided modeling workflows, these products enable modelers of all skill levels to build good models of their data. Building the best models of any data is possible only by running the data through a wide range of statistical methods and model building algorithms over a wide range of parameter sets, and by applying a variety of methods to validate the models and assess their robustness over intended ranges. Further, QSAR modeling software provides a rich interactive graphical interface for visual examination of data and results at all stages.
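
    The idea of running data through a range of parameter sets and validating each resulting model can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a k-nearest-neighbour regressor as a stand-in for whatever algorithms a real QSAR package offers, scans a few values of k, and scores each by 5-fold cross-validated RMSE. The function names, the toy descriptors, and the synthetic end-point values are all hypothetical.

```python
import math
import random
from statistics import mean


def knn_predict(train, query, k):
    """Predict by averaging the end-point values of the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return mean(y for _, y in nearest)


def cross_validated_rmse(data, k, n_folds=5, seed=0):
    """Return the RMSE of k-NN predictions under n-fold cross-validation."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    squared_errors = []
    for fold in range(n_folds):
        # Every n_folds-th row (starting at `fold`) is held out for testing.
        test = rows[fold::n_folds]
        train = [r for i, r in enumerate(rows) if i % n_folds != fold]
        for x, y in test:
            squared_errors.append((knn_predict(train, x, k) - y) ** 2)
    return math.sqrt(mean(squared_errors))


def scan_parameters(data, k_values):
    """Score every candidate k and return (best_k, {k: rmse})."""
    scores = {k: cross_validated_rmse(data, k) for k in k_values}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical toy data: ((descriptor_1, descriptor_2), end_point).
# The end-point is a made-up linear function of the descriptors plus noise.
rng = random.Random(42)
toy_data = []
for _ in range(40):
    x1, x2 = rng.uniform(0, 5), rng.uniform(0, 5)
    toy_data.append(((x1, x2), 2.0 * x1 - x2 + rng.gauss(0, 0.1)))

best_k, scores = scan_parameters(toy_data, k_values=[1, 3, 5, 9])
for k, rmse in sorted(scores.items()):
    print(f"k={k}: cross-validated RMSE = {rmse:.3f}")
print("selected k:", best_k)
```

    A production tool would scan several algorithms and descriptor sets as well, and would normally keep an external test set aside in addition to cross-validation.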

    (ii) QSAR Models:
     
    QSAR models can be categorized in a few ways. Depending on the type of end-point they are meant to predict, models can be activity, ADME, or toxicity models. Models are also either global or local: local models are designed to predict over a small chemical space, such as a target-focused library, a therapeutic class, or a certain range of end-point values, while global models are expected to cover a wider range of chemical space.

    There are several ‘Pre-built’ QSAR models for ADME and Toxicity predictions available commercially and in the public domain. Most of the pre-built models available are ‘black boxes’ with little information about the applicability domain and the prediction confidence metrics available to the users of the models. There are some model providers, though, that provide abundant information about the models, such as the training compounds, range and distribution of end-point values used, the descriptor features used in building the model, the algorithms and parameter settings employed, and so on. When the training data set is packaged with the pre-built models, it allows modelers to “localize” or “globalize” them by sub-setting or adding new or in-house data and retraining these models.

    (iii) Model Deployment and Prediction System:

    Information that allows users of models to attach confidence to the predictions, such as the similarity of input compounds to the model training compounds in the chemical and descriptor spaces, is an essential aspect of an effective software system through which models are deployed. The model deployment system should also allow users to visually examine the effect of variations on the compounds, like R-group enumerations, on the predictions. More often than not, the users of models are less sophisticated users of computer programs than the modelers, so particular attention to ease of use and intuitiveness is essential in designing model deployment and prediction systems.
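
    One simple way to attach confidence to a prediction is to report how similar the query compound is to its nearest neighbour in the training set. The sketch below assumes fingerprints are already available as sets of "on" bit positions (a real system would compute them with a cheminformatics toolkit) and uses the Tanimoto coefficient. The 0.5 cut-off and the function names are illustrative assumptions, not a recommended standard.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def nearest_training_similarity(query_fp, training_fps):
    """Similarity of the query to its most similar training compound."""
    return max(tanimoto(query_fp, fp) for fp in training_fps)


def confidence_label(similarity, threshold=0.5):
    """Flag a prediction as inside or outside the model's domain.

    The threshold is an assumed value for illustration only.
    """
    return "inside domain" if similarity >= threshold else "outside domain"


# Hypothetical fingerprints: each set holds the positions of the on-bits.
training_fps = [
    frozenset({1, 4, 7, 9}),
    frozenset({2, 4, 8}),
    frozenset({1, 7, 9, 12}),
]
query_close = frozenset({1, 4, 7, 12})
query_far = frozenset({20, 21, 22})

for name, fp in [("query_close", query_close), ("query_far", query_far)]:
    sim = nearest_training_similarity(fp, training_fps)
    print(f"{name}: nearest similarity {sim:.2f} -> {confidence_label(sim)}")
```

    Showing the nearest-neighbour similarity next to each predicted value gives non-expert users a direct cue about when a prediction is an extrapolation.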

    QSAR models are commonly built and “thrown over the wall” for users. Focus is seldom on proper ways to collect information on performance and usage of these models. This information feedback would be vital to model builders to continually improve the predictive performance of the models. This also allows organizations to assess the value addition of the QSAR technology applications to their research efficiency. An effective model deployment system should focus on keeping the models updated. New data, especially data on compounds for which decisions were made upstream based on QSAR model predictions, should be made available to tune and improve the models as and when it becomes available from the labs.

     

    While the above modules and their desired functions describe the overall expectations of a good QSAR software system, the following description attempts to capture the specific requirements of each application area of QSAR in drug discovery research.

    Application #1: Compound Library Design.

    When designing target focused compound libraries, QSAR models are helpful in assessing drug-likeness of compounds using appropriate global ADME and toxicity models. At the same time, local QSAR models built to be sensitive to the bioisosteric transformations around scaffolds of known activity can be helpful in designing novel bio-equivalent compounds. 

    The features of QSAR software that are of most interest at this stage of discovery are:

    -    Global ADME and toxicity QSAR models; models based on human data from known drugs may be more valuable.
    -    Structure (Tanimoto) and descriptor space similarity searches
    -    Interpretable set of descriptors that are sensitive to R-group changes.
    -    Features that allow automated R-group enumerations.
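
    As a rough illustration of automated R-group enumeration, the sketch below fills the attachment points of a scaffold template with every combination of candidate substituents. It works purely by string substitution on SMILES-like text and does not check chemical validity; the scaffold, the substituent lists, and the function name are hypothetical examples.

```python
from itertools import product


def enumerate_r_groups(scaffold, r_groups):
    """Yield (assignment, filled_template) for every R-group combination.

    `scaffold` is a template string with named placeholders such as {R1};
    `r_groups` maps each placeholder name to its candidate substituents.
    """
    positions = sorted(r_groups)
    for choice in product(*(r_groups[p] for p in positions)):
        assignment = dict(zip(positions, choice))
        yield assignment, scaffold.format(**assignment)


# Hypothetical scaffold and substituents written as SMILES fragments.
scaffold = "c1ccc({R1})cc1C(=O)N{R2}"
r_groups = {
    "R1": ["F", "Cl", "OC"],
    "R2": ["C", "CC", "C1CC1"],
}

library = list(enumerate_r_groups(scaffold, r_groups))
for assignment, smiles in library:
    print(assignment, smiles)
```

    Each enumerated structure could then be passed to the relevant QSAR models, so the effect of a substituent change on the predicted properties can be compared side by side.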

    Application #2: Virtual Screening

    The value in assessing ADME/toxicity liabilities of compounds as early as virtual screening has been well established. Predictions from robust global ADME and toxicity models in addition to HTS results can be valuable in decision support for selecting the hit series. Results from early HTS runs can be used to build QSAR models, and such models can then be used to profile compound repositories and optimize the number of compounds that need to be synthesized and/or covered in the subsequent runs. Availability of global models based on legacy in vitro assay data would be useful in rank ordering hits as they are lined up for in vitro studies.

    The features of QSAR software that are of most interest at this stage of discovery are:
    -    Global ADME and toxicity QSAR models; models based on data from standardized in vitro assays.
    -    Local models based on “first-wave” of HTS runs.

    Application #3: Lead Optimization

    A large number of iterations of design, synthesis, and testing cycles characterize the lead optimization stage of drug design. By this stage, a significant amount of assay data on the chemical series to which the leads belong would be available. This presents an opportunity to build a variety of QSAR models from this data and make them available to medicinal chemists.

    Lead optimization is essentially an optimization of the chemical structure over multiple parameter dimensions. Features that allow chemists to study the effect of structure changes on these dimensions, in real time, through predictions from relevant QSAR models can be valuable in reducing the number of iterations required to converge on optimal candidates.

    The features of QSAR software that are of most interest at this stage of discovery are:

    -    Activity, ADME and toxicity QSAR models; models based on data from standardized in vitro and in vivo assays.
    -    An easy-to-use interface that allows editing structures interactively and computing QSAR predictions on the fly.
    -    A function that allows the chemists to define the optimization functions by providing relative weights for properties and also metrics of desirability for each property. This will allow the chemist to simultaneously examine a large collection of compounds against a single function value.
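
    A weighted optimization function of the kind described in the last item could look like the sketch below. It assumes one common formulation: each predicted property is mapped to a desirability between 0 and 1 by a linear ramp, and the desirabilities are combined with a weighted geometric mean, so a compound that fails any weighted property outright scores zero. The property names, ranges, weights, and predicted values are hypothetical.

```python
import math


def linear_desirability(value, low, high, higher_is_better=True):
    """Map a property value onto [0, 1] with a linear ramp between low and high."""
    fraction = (value - low) / (high - low)
    fraction = min(1.0, max(0.0, fraction))  # clip values outside the range
    return fraction if higher_is_better else 1.0 - fraction


def overall_score(desirabilities, weights):
    """Weighted geometric mean of per-property desirabilities."""
    active = {name: w for name, w in weights.items() if w > 0}
    if any(desirabilities[name] == 0.0 for name in active):
        return 0.0  # one unacceptable property vetoes the compound
    total_weight = sum(active.values())
    log_sum = sum(w * math.log(desirabilities[name]) for name, w in active.items())
    return math.exp(log_sum / total_weight)


# Hypothetical chemist-defined profile: (low, high, higher_is_better).
profile = {
    "pIC50": (5.0, 9.0, True),
    "logP": (1.0, 5.0, False),
    "hERG_pIC50": (4.0, 6.0, False),
}
weights = {"pIC50": 2.0, "logP": 1.0, "hERG_pIC50": 1.0}

# Hypothetical QSAR predictions for two compounds.
compounds = {
    "cmpd_A": {"pIC50": 8.0, "logP": 3.0, "hERG_pIC50": 4.5},
    "cmpd_B": {"pIC50": 9.0, "logP": 5.5, "hERG_pIC50": 4.0},
}

scores = {}
for name, predicted in compounds.items():
    d = {prop: linear_desirability(predicted[prop], *profile[prop]) for prop in profile}
    scores[name] = overall_score(d, weights)

ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(f"{name}: score = {scores[name]:.3f}")
```

    The geometric mean is a design choice: unlike a weighted sum, it prevents an excellent value in one property from compensating for an unacceptable value in another.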

    Closing remarks:

    Despite the criticism about the effectiveness of QSAR methods for drug discovery, most large pharmaceutical companies are employing these methods for the type of applications discussed here. A simple fact is that new data is being generated every day in addition to the mountain of data already available, and methods like QSAR are very powerful in mining and modeling this wealth of information. Significant work is still needed to apply QSAR methods effectively, and software products need to play a significant role in that effort.