3D-QSAR and Physical Property Modeling Using Quantum-Mechanically-
Derived Molecular Surface Properties
A Dissertation
Kendall Byler
2007
Submitted to the Faculties of Natural Sciences of the
Friedrich-Alexander-Universität Erlangen-Nürnberg
for
the award of the doctoral degree
by
Kendall Grant Byler
from Huntsville
Approved as a dissertation by the Faculties of Natural Sciences
of the Friedrich-Alexander-Universität Erlangen-Nürnberg.
Date of the oral examination: 11 May 2007
Chair of the Doctoral Committee: Prof. Dr. E. Bänsch
First reviewer: Prof. Dr. T. Clark
Second reviewer: Prof. Dr. P. Gmeiner
Acknowledgements
I would like to thank those without whom this work would not have been
possible. The first of these is Professor Dr. Tim Clark, who provided the opportunity
and the guidance for my study of computational chemistry. Thanks also go to the
members of the Clark group who helped me in my endeavors: Dr. Nico van Eikema
Hommes, Dr. Harald Lanig, Dr. Ralph Puchta, Dr. Matthias Hennemann, Matthias
Brüstle, Anselm Horn, Dr. Olaf Othersen, Dr. Gudrun Schürer, Dr. Tatyana
Shubina, Florian Haberl, Kirsten Höhfeld, Catalin Rusu, Jr-Hung Lin, Hakan Kayi,
and Sergio Sanchez; to the members of the Gasteiger group for their
assistance: Dr. Simon Spycher, Prof. Dr. Fernando da Costa, Dimitar Hristozov, Dr.
Christof Schwab, and Dr. Thomas Engel; and, of course, to Adrian Jung of the Kirsch
group. Thanks also to the Pfizer Corporation for its financial support of this
research.
I would also like to thank my family: my parents, Paul and Carol Byler; my sister,
Ashley; my grandparents, Henry and Martha Snoddy and Elza and Emma Byler; and
my beautiful wife, Anastasia. Finally, I thank my friends everywhere who remained
friends despite the separations of time and distance.
Contents
1 Introduction
1.1 Drug Discovery
1.2 Property Modeling
1.3 A Quantum-Mechanical, Molecular Orbital Approach
2 Surface-Integral QSPR Models: Local Energy Properties
2.1 Introduction
2.1.1 Local Molecular Properties
2.1.2 Surface-Integral Models
2.2 Methods
2.3 Results
2.3.1 Octanol/Water Partition Coefficient
2.3.2 Free Energy of Solvation
2.3.2.1 Free Energy of Solvation in Octanol
2.3.2.2 Free Energy of Solvation in Water
2.3.3 Acid Dissociation Constant
2.3.4 Boiling Point
2.3.5 Glass Transition Temperature
2.3.6 Aqueous Solubility
2.4 Discussion
2.5 Conclusions
3 Support Vector Classification of Phospholipidosis-Inducing Drugs
3.1 Introduction
3.1.1 Phospholipidosis
3.1.2 Phospholipidosis Models
3.1.3 Surface Autocorrelations
3.1.4 Statistical Methods
3.1.4.1 Support Vector Machines
3.1.4.2 Multivariate Adaptive Regression Splines
3.2 Methods
3.3 Results
3.3.1 Support Vector Machines
3.3.2 Multivariate Adaptive Regression Splines Using Autocorrelation Indices
3.4 Discussion
3.5 Conclusions
4 3D-QSAR Using Local Properties
4.1 Introduction
4.1.1 Comparative Molecular Field Analysis
4.1.2 Partial Least Squares Regression
4.1.3 Local Properties
4.2 Computational Methods
4.3 Results and Discussion
4.3.1 Serotonin Receptor Agonists/Antagonists
4.3.2 Adrenergic Receptor Agonists/Antagonists
4.3.3 Dopamine D4 Antagonists
4.3.4 Avian Influenza Neuraminidase Inhibitors
4.3.5 Mutagenic Tertiary Amides
4.3.6 The Effect of Grid Orientation on Predictivity
4.4 Conclusions
5 Conclusions and Outlook
5.1 Conclusions
5.2 Outlook
6 Summary
7 Zusammenfassung (Summary in German)
Appendix A
Appendix B
References
iv
Chapter 1
Introduction
1.1 Drug Discovery
It has been estimated1 that, out of a pool of millions of compounds screened,
10,000 reach the animal testing phase, which will then likely produce ten drug candidates
for human clinical trials, of which only one will reach the market. It may also require 15
years and 750,000 U.S. dollars in the process. Drug candidates that fail late in the testing
process will never produce a return for the company that has invested so much time and
money. Pharmaceutical companies must offset these losses by recouping the expenditure
from among the several successfully tested drugs they produce.
In an effort to minimize the potential loss from focusing on compounds that will
never result in a marketable drug, much preliminary research and testing are done. The
rational drug-design approach2 to this problem begins by identifying a molecular target
involved in a pathophysiological process and characterizing its structure and function;
then begins the search for a lead compound. This is usually achieved by means of an array
of in vitro screens for biological activity. Large groups of compounds may be evaluated
simultaneously in this way and the procedure is referred to as high-throughput screening
(HTS). Once a lead compound is discovered, it may also be found to have some
undesirable properties such as high toxicity, poor bioavailability or pharmacokinetics.
Libraries of compounds may be synthesized that have modifications to the general
structure of the lead compound in an effort to modulate the desirable and undesirable
effects. Structure-activity relationships (SARs) that point to a common chemical
substructure responsible for the pharmacological effect may be observed concurrently with
the study of the combinatorial library. The medicinal chemist can then make various
modifications to the pharmacophore in order to improve its properties.
Kubinyi3 describes the drug-design process in terms of a design cycle in which a lead
compound is optimized iteratively in an evolutionary manner4 (Figure 1.1).
[Figure: the drug design cycle, linking biological concept; computer-aided design (protein crystallography, NMR, 3D databases, de novo design); lead structures; series design and synthesis design; syntheses; biological testing; structure-activity relationships, QSAR, and molecular modeling; candidates for further development; investigational new drug; new drug.]
Figure 1.1 The drug design cycle from Kubinyi’s lectures on drug design4.
All of this takes considerable time, and clinical development and the lengthy
drug-approval process still lie ahead. Thus, to improve the efficiency of high-throughput
screening further, chemists use molecular-modeling schemes that calculate properties from
chemical structure to aid the screening process. These virtual-screening methods include
molecular-dynamics simulations, protein-ligand docking, protein-protein docking, membrane
simulation, similarity searching of pharmacophore databases, and quantitative
structure-activity relationships (QSARs). These tools allow pharmaceutical companies to
screen out compounds that possess too many undesirable characteristics before investing
time in producing, chemically analyzing, and testing them. Of particular interest is the
elucidation of a set of chemical/physical properties that modulate the relationship between
chemical structure and pharmacological activity and that could be used to predict activity
from chemical structure alone.
1.2 Property Modeling
Property modeling for the purpose of prediction has taken many forms. One of the
first examinations of electronic effects on reactivity is Hammett’s linear free-energy
relationship5 for substituent effects on the ionization of benzoic acids and on the rates of
related reactions, such as benzoate ester hydrolysis. He derived a series of substituent
constants from these effects, which could then be used to predict substituent effects on
other reaction rates. Hansch suggested6 a similar relationship between lipophilicity and
biological activity. Unless a drug is actively transported across the cell membrane, it must
passively diffuse through the membrane7, which is composed of a lipid bilayer. The
lipophilicity of a compound should therefore affect the drug’s ability to enter the cell and
produce the pharmacological effect; indeed, this correlation between lipophilicity and
biological activity had been observed as early as the late nineteenth century8. Because
direct measurement of the solubility of compounds in cellular membranes is difficult at
best, Hansch6 approximated lipophilicity by the ratio of a compound’s equilibrium
concentrations in n-octanol and in water, as defined9 by
$$P = \frac{[\text{compound}]_{\text{oct}}}{[\text{compound}]_{\text{aq}}\,(1-\alpha)} \tag{1.1}$$
where α is the degree to which the compound dissociates in water, calculated from its
ionization constant, so that (1 − α) is the fraction present in the neutral form. Because
ionization makes some compounds appear more soluble in water, measurements are often
performed with an aqueous buffer over a range of pH values, giving the distribution
coefficient (logD). Substituent constants similar to those of Hammett were used to
calculate logP and logD from the chemical structure alone. More recently, atom- and
fragment-based methods were developed10 for the prediction of logP and logD. A roughly
Gaussian dependence of drug potency (log 1/C) on logP, peaking at a logP of
approximately 2, has been observed11.
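As a minimal numerical sketch of Eq. (1.1), the following Python code assumes a monoprotic acid whose degree of dissociation α follows from its pKa and the buffer pH through the Henderson-Hasselbalch relation, and assumes that only the neutral species partitions into n-octanol. The function names and concentrations are hypothetical illustrations, not measured data.

```python
import math


def dissociated_fraction_acid(pka: float, ph: float) -> float:
    """Degree of dissociation alpha of a monoprotic acid (assumed case).

    Henderson-Hasselbalch: [A-]/[HA] = 10**(pH - pKa), hence
    alpha = [A-] / ([HA] + [A-]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def partition_coefficient(c_oct: float, c_aq: float, alpha: float) -> float:
    """Eq. (1.1): P = [compound]_oct / ([compound]_aq * (1 - alpha)).

    c_aq is the total aqueous concentration (neutral plus ionized), so
    c_aq * (1 - alpha) is the concentration of the neutral species,
    assumed to be the only species that enters n-octanol.
    """
    return c_oct / (c_aq * (1.0 - alpha))


# Hypothetical acid with pKa 4.0 measured in a pH 4.0 buffer.
alpha = dissociated_fraction_acid(pka=4.0, ph=4.0)      # half dissociated
c_oct, c_aq = 1.0, 0.1                                   # arbitrary units
log_d = math.log10(c_oct / c_aq)                         # apparent ratio
log_p = math.log10(partition_coefficient(c_oct, c_aq, alpha))
print(f"alpha={alpha:.2f} logD={log_d:.3f} logP={log_p:.3f}")
# alpha=0.50 logD=1.000 logP=1.301
```

In this example the gap between logP and logD is log10(2) ≈ 0.30, which is the value of log10(1 + 10^(pH − pKa)) at pH = pKa under these assumptions.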
Lipinski observed12 that a compound’s oral absorption and distribution seem to depend
on certain structural characteristics. This observation is commonly referred to as the
Rule of Five, which states that a compound with two or more of the following
characteristics is likely to be poorly absorbed and distributed in the body:
• A molecular weight > 500 amu.
• A logP > 5.
• More than 5 hydrogen-bond donors (sum of OH and NH groups).
• More than 10 hydrogen-bond acceptors (sum of N and O atoms).
Drugs that passively diffuse across the cell membrane tend to follow this rule, whereas
those that are actively transported do not depend on the same lipophilicity/hydrophilicity
criteria and are exceptions. More recently4, it was observed that drug-like molecules are
distributed regularly along these properties, bounded on one side by the rule-of-five
values. The general implication of this simple rule is that the region of property space that
must be sampled to derive physicochemical or pharmacological properties is small. It is
necessary only to discover the particular set of molecular descriptors that adequately
describes the properties to be predicted. The trend in property prediction thus appears to
be toward reduced-space approaches, which can account for complex interactions with
relatively simple terms.
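The rule of five listed above can be expressed as a simple filter. This is a minimal sketch: the descriptor values are assumed to be computed elsewhere and passed in, the class and function names are hypothetical, and the paracetamol values (MW 151.16; logP about 0.46, an approximate literature value; 2 donors; 3 acceptors counted as N + O atoms) serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompoundProperties:
    """Precomputed descriptors; values must be supplied by the user."""
    name: str
    molecular_weight: float  # amu
    log_p: float
    h_bond_donors: int       # sum of OH and NH groups
    h_bond_acceptors: int    # sum of N and O atoms


def rule_of_five_violations(c: CompoundProperties) -> List[str]:
    """Return the rule-of-five criteria that the compound exceeds."""
    violations = []
    if c.molecular_weight > 500:
        violations.append("MW > 500")
    if c.log_p > 5:
        violations.append("logP > 5")
    if c.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if c.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def likely_poorly_absorbed(c: CompoundProperties) -> bool:
    """Flag compounds with two or more violations, as stated in the text."""
    return len(rule_of_five_violations(c)) >= 2


paracetamol = CompoundProperties("paracetamol", 151.16, 0.46, 2, 3)
hypothetical = CompoundProperties("hypothetical", 650.0, 6.2, 3, 8)
for compound in (paracetamol, hypothetical):
    print(compound.name, rule_of_five_violations(compound),
          likely_poorly_absorbed(compound))
```

Actively transported drugs are the known exceptions, so a flag from this filter marks a compound for closer inspection rather than rejection.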
1.3 A Quantum-Mechanical, Molecular Orbital Approach
Most modeling methods employ as much theory as is practicable for the system
to be studied. For example, molecular dynamics may be used to model proteins and
protein-ligand interactions in solution, but the complexity of the system requires the use of
classical mechanics with a reduced set of non-bonded interactions and a simplified
representation of the solvent molecules. Other approximations are made so that the
simulation can be completed in a reasonable time. Although the system as a
whole may be well represented, this approach often leaves interactions near a particular
site of interest poorly described13,14. This has led to the development of hybrid quantum
mechanical/molecular mechanical (QM/MM) methods15,16, which employ quantum
mechanical calculations in the regions where a higher level of theory is required, while the
bulk of the system is represented by force-field calculations. These regions are usually
those where ligands interact with protein residues in a binding pocket, because quantum
mechanical methods describe electrostatic intermolecular interactions better than
force-field techniques based on atomic monopoles17.
Because every drug inevitably makes contact with its target at the molecular surfaces of
both the drug and the target, a descriptive model of the molecular surface is needed.
This surface is electronic in nature, and quantum-mechanical methods are those that
describe the electronic structure of the molecule. Quantum-mechanical calculations take
into account the behavior of electrons in delocalized molecular orbitals rather than
localized atomic orbitals, whereas force-field techniques must rely on atomic constants
parameterized to heats of formation. Local properties such as the molecular electrostatic
potential (MEP) have been used to describe strong non-covalent interactions that are based
primarily on charge. Murray and Politzer18-23 projected the MEP onto molecular isodensity
surfaces to calculate descriptors for physical-property prediction.
Recently, additional local properties were described24,25 to complement the MEP and
provide a more complete description of the local electronic environment at the molecular
surface. Local properties such as polarizability24, ionization potential24,26, electron
affinity24, electronegativity27-29, and hardness29, taken together, can readily be calculated
by quantum-mechanical methods. Dispersion forces, which dominate in the case of
nonpolar molecules, may be described by calculating local molecular polarizability30. The
tertiary structure of proteins and the stability of biological membranes depend
fundamentally on these dispersion interactions between nonpolar regions31-33.
Figure 1.2 Surface-integral electrostatic potential surface for paracetamol.
The use of surface-integral models33 (SIMs) to predict physical properties by
integrating a function of one or more local properties over the molecular surface has
been demonstrated in the literature31,32,34. In addition to physical-property models,
surface-integral QSAR models may be constructed from local properties to predict
biological activities such as enzyme inhibition constants (Ki) and protein-ligand
dissociation constants (Kd). These activities, treated as local properties, may then be
mapped onto the molecular surface to expose regions that are significant to the observed
activity. In this way, the portions of a drug’s molecular surface important to the binding and
activation of its target may be examined as functions of both local electronic properties and
local activities, concurrently with the property/activity predictions of the virtual
high-throughput screen.
Chapter 2
Surface-Integral QSPR Models:
Local Energy Properties
2.1 Introduction
The tools used for quantitative structure-activity relationships (QSAR),
quantitative structure-property relationships (QSPR), protein-ligand docking, and scoring
functions, among others in the cheminformatics toolbox, generally apply an atom-based
approach. In an attempt to move from this atom-based scheme to a quantum-mechanical
surface-based approach13,14,17, a local properties method has been developed to define
properties and interactions at the molecular surface. These local properties are used in
statistical models for the prediction of physical properties and biological activities in terms
of Coulomb, exchange repulsion, dispersion, and donor-acceptor interactions. The
following describes the local-property/surface-integral approach implemented in
Parasurf (version ’06)35, a program from CEPOS InSilico, and its use in producing
QSAR/QSPR models for the octanol-water partition coefficient (logP), the free energy of
solvation in water (ΔGsolv.(H2O)), the free energy of solvation in n-octanol
(ΔGsolv.(oct.)), the acid dissociation constant (pKa) of nitrogenous compounds, the boiling
point (Tb) of organic compounds, the glass transition temperature (Tg) of organic polymers,
and water solubility (logS).
2.1.1 Local Molecular Properties
The electrostatic potential at the molecular surface has been examined widely22,36
as a descriptor of the electronic environment of the molecular surface and has been used to
describe the noncovalent interactions possible for a given structure. Murray and Politzer
have used the molecular electrostatic potential (MEP, V) and statistical measures derived
from it to calculate pharmacological properties with their general interaction properties
function (GIPF) approach18-23,37. Tripos’ SYBYL36 calculates the MEP for use in
comparative molecular field analysis. Additional local properties defined at the molecular
surface have recently been examined as predictors of two-electron donor-acceptor
interactions in order to describe intermolecular electronic interactions more completely24,25.
The molecular electrostatic potential V(r) is the interaction energy of a unit positive point
charge placed at a point r (here, on the molecular surface) with the nuclei and electrons of
the molecule, and is given by
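$$V(\mathbf{r}) = \sum_{i=1}^{n} \frac{Z_i}{\lvert \mathbf{R}_i - \mathbf{r} \rvert} - \int \frac{\rho(\mathbf{r}')}{\lvert \mathbf{r}' - \mathbf{r} \rvert}\, d\mathbf{r}' \tag{2.1}$$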
where n is the number of atoms in the molecule, ρ (r) is the electron-density function for
the molecule, and Zi is the nuclear charge of atom i at Ri.
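To make Eq. (2.1) concrete, the following sketch evaluates V(r) in atomic units with the integral replaced by a sum over quadrature points, each carrying an electron population ρ(r′)dV. This discretization is an illustrative assumption; it is not the atomic multipole technique used for the calculations in this work, and the function name and test system are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def electrostatic_potential(r: Point,
                            nuclei: Sequence[Tuple[float, Point]],
                            electrons: Sequence[Tuple[float, Point]]) -> float:
    """Discretized Eq. (2.1) in atomic units (hartree per unit charge).

    nuclei:    (Z_i, R_i) pairs.
    electrons: (n_k, r_k) pairs, where n_k = rho(r_k) * dV_k is the number of
               electrons assigned to quadrature point r_k (assumed scheme).
    """
    nuclear = sum(z / math.dist(r, pos) for z, pos in nuclei)
    electronic = sum(n / math.dist(r, pos) for n, pos in electrons)
    return nuclear - electronic


# Hypothetical hydrogen-like test: one proton at the origin and one electron
# spread over six points 0.5 bohr away along the Cartesian axes.
proton = [(1.0, (0.0, 0.0, 0.0))]
shell = [(1.0 / 6.0, p) for p in [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0),
                                   (0.0, 0.5, 0.0), (0.0, -0.5, 0.0),
                                   (0.0, 0.0, 0.5), (0.0, 0.0, -0.5)]]
print(electrostatic_potential((5.0, 0.0, 0.0), proton, shell))   # close to zero
print(electrostatic_potential((0.25, 0.0, 0.0), proton, shell))  # positive
```

Far from the neutral test system the nuclear and electronic terms cancel; inside the electron shell the nuclear term dominates and V is positive.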
The local ionization potential26 IEL(r) is a density-weighted Koopmans’ ionization
potential38 at a point r at the surface that describes the tendency of a molecule to interact
with electron acceptors (electrophilic reactivity) and is defined by
$$IE_L(\mathbf{r}) = \frac{-\sum_{i=1}^{HOMO} \rho_i(\mathbf{r})\,\varepsilon_i}{\sum_{i=1}^{HOMO} \rho_i(\mathbf{r})} \tag{2.2}$$
where ρi(r) is the electron density at r due to molecular orbital i and εi is its eigenvalue.
Local electron affinity EAL is defined in an analogous Koopmans’ formulation
using the virtual orbitals and describes the tendency of a molecule to interact with electron
donors. It is defined by:
$$EA_L(\mathbf{r}) = \frac{-\sum_{i=LUMO}^{n_{orbs}} \rho_i(\mathbf{r})\,\varepsilon_i}{\sum_{i=LUMO}^{n_{orbs}} \rho_i(\mathbf{r})} \tag{2.3}$$
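Equations (2.2) and (2.3) are the same density-weighted average of orbital energies, taken over the occupied or the virtual orbitals, respectively. The following is a minimal sketch, assuming the orbital densities ρi(r) at one surface point and the eigenvalues εi (here in eV) have already been obtained from a quantum-chemical calculation; the function names and numbers are hypothetical.

```python
from typing import Sequence


def koopmans_weighted_energy(densities: Sequence[float],
                             eigenvalues: Sequence[float]) -> float:
    """Return -sum(rho_i * eps_i) / sum(rho_i) over the supplied orbitals."""
    total_density = sum(densities)
    if total_density <= 0.0:
        raise ValueError("orbital densities at this point sum to zero")
    weighted = sum(rho * eps for rho, eps in zip(densities, eigenvalues))
    return -weighted / total_density


def local_ie(occ_densities: Sequence[float],
             occ_eigenvalues: Sequence[float]) -> float:
    """Eq. (2.2): average over occupied orbitals 1..HOMO."""
    return koopmans_weighted_energy(occ_densities, occ_eigenvalues)


def local_ea(virt_densities: Sequence[float],
             virt_eigenvalues: Sequence[float]) -> float:
    """Eq. (2.3): average over virtual orbitals LUMO..n_orbs."""
    return koopmans_weighted_energy(virt_densities, virt_eigenvalues)


# Hypothetical values at one surface point (eV for the eigenvalues).
print(local_ie([0.1, 0.3], [-12.0, -10.0]))  # weighted towards -10 eV orbital
print(local_ea([0.2, 0.2], [1.0, 3.0]))
```

The orbital that contributes most density at r dominates the local value, which is why IE_L and EA_L vary across the surface even though the orbital energies themselves are global quantities.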
Local hardness29 ηL and local Mulliken electronegativity27 χL are derived from the
two previous properties24 by:
$$\eta_L = \frac{IE_L - EA_L}{2} \tag{2.4}$$
$$\chi_L = \frac{IE_L + EA_L}{2} \tag{2.5}$$
These represent additional local properties that are readily interpretable in chemical terms.
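Once IE_L and EA_L are known at a surface point, Eqs. (2.4) and (2.5) are simple arithmetic. A short sketch with hypothetical point values in eV; the function names are illustrative only.

```python
def local_hardness(ie_l: float, ea_l: float) -> float:
    """Eq. (2.4): eta_L = (IE_L - EA_L) / 2."""
    return (ie_l - ea_l) / 2.0


def local_electronegativity(ie_l: float, ea_l: float) -> float:
    """Eq. (2.5): chi_L = (IE_L + EA_L) / 2, the Mulliken form."""
    return (ie_l + ea_l) / 2.0


# Hypothetical surface-point values in eV.
ie_l, ea_l = 10.5, -2.0
print(local_hardness(ie_l, ea_l), local_electronegativity(ie_l, ea_l))
# 6.25 4.25
```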
Local polarizability αL is an occupation- and density-weighted average of atomic-orbital
polarizabilities obtained with Rivail’s variational technique39-43, in which the contribution
of each atomic orbital is determined by the electron density of that atomic orbital at
point r. It is defined by:
$$\alpha_L(\mathbf{r}) = \frac{\sum_{j=1}^{n_{orbs}} q_j\,\rho_j(\mathbf{r})\,\alpha_j}{\sum_{j=1}^{n_{orbs}} q_j\,\rho_j(\mathbf{r})} \tag{2.6}$$
where qj is the Coulson occupation and αj the isotropic polarizability of atomic orbital j,
and ρj is the electron density at r due to exactly one electron in atomic orbital j. The
five local properties used in the following regression models have been shown to be
essentially orthogonal25, with ηL correlating weakly with the local ionization
potential.
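Eq. (2.6) is likewise a weighted average, with weights qjρj(r). A minimal sketch, assuming the Coulson occupations, the atomic-orbital densities at r, and the isotropic atomic-orbital polarizabilities (which Rivail's technique would supply) are given as plain lists; the function name and example values are hypothetical.

```python
from typing import Sequence


def local_polarizability(occupations: Sequence[float],
                         densities: Sequence[float],
                         ao_polarizabilities: Sequence[float]) -> float:
    """Eq. (2.6): sum_j q_j rho_j alpha_j / sum_j q_j rho_j.

    occupations:         Coulson occupations q_j of the atomic orbitals
    densities:           rho_j(r), density at r of one electron in orbital j
    ao_polarizabilities: isotropic atomic-orbital polarizabilities alpha_j
    """
    weights = [q * rho for q, rho in zip(occupations, densities)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("no occupied atomic-orbital density at this point")
    return sum(w * a for w, a in zip(weights, ao_polarizabilities)) / total


# Hypothetical two-orbital example.
print(local_polarizability([2.0, 1.0], [0.1, 0.2], [1.0, 4.0]))
```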
2.1.2 Surface-Integral Models
The surface-integral models are defined by the general expression:
$$P = \sum_{i=1}^{n_{tri}} f\left(V^i, IE_L^i, EA_L^i, \alpha_L^i, \eta_L^i\right) \cdot A^i \tag{2.7}$$
where P is the modeled property and f is a nonlinear function of the five local properties,
and the summation runs over all ntri surface triangles that make up the molecular
surface. The individual surface properties are taken at the center of each triangle,
denoted by the superscript i, which has an associated area Ai. The function f is
determined by multiple regression using pre-calculated sums of component terms, as
listed in Appendix A, Table A1.
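Because f in Eq. (2.7) is fitted as a linear combination of power and product terms gk, the property can be written as P = Σk ck Sk with Sk = Σi gk(local properties of triangle i)·Ai, so the area-weighted sums Sk can be pre-computed per molecule and handed to the regression. The following sketch illustrates this under stated assumptions: the two candidate terms, their coefficients, and the constant are hypothetical (the actual term set is given in Appendix A, Table A1), and each triangle's properties are assumed to be evaluated at its center.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Vec = Tuple[float, float, float]
Props = Dict[str, float]  # local properties at a triangle center


def triangle_area(a: Vec, b: Vec, c: Vec) -> float:
    """Area A^i of a surface triangle: half the norm of an edge cross product."""
    u = [b[k] - a[k] for k in range(3)]
    w = [c[k] - a[k] for k in range(3)]
    cross = (u[1] * w[2] - u[2] * w[1],
             u[2] * w[0] - u[0] * w[2],
             u[0] * w[1] - u[1] * w[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def surface_integral_terms(triangles: Sequence[Tuple[Vec, Vec, Vec]],
                           props: Sequence[Props],
                           terms: Dict[str, Callable[[Props], float]]
                           ) -> Dict[str, float]:
    """Pre-computed sums S_k = sum_i g_k(props_i) * A^i, one per candidate term."""
    areas = [triangle_area(*tri) for tri in triangles]
    return {name: sum(g(p) * area for p, area in zip(props, areas))
            for name, g in terms.items()}


def predict_property(sums: Dict[str, float], coefficients: Dict[str, float],
                     constant: float = 0.0) -> float:
    """P = constant + sum_k c_k * S_k (Eq. 2.7 with f linear in the terms)."""
    return constant + sum(c * sums[name] for name, c in coefficients.items())


# Hypothetical one-triangle "surface" and two hypothetical candidate terms.
triangles = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
props = [{"V": 2.0, "IE": 10.0, "EA": -1.0, "alpha": 3.0, "eta": 5.5}]
terms = {
    "V*alpha": lambda p: p["V"] * p["alpha"],
    "V^2": lambda p: p["V"] ** 2,
}
sums = surface_integral_terms(triangles, props, terms)
print(sums)  # {'V*alpha': 3.0, 'V^2': 2.0}
print(predict_property(sums, {"V*alpha": 0.1, "V^2": -0.5}, constant=1.0))
```

Pre-computing the sums once per molecule means that the regression itself only ever sees an ordinary descriptor table, one column per candidate term.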
The local properties may be fitted to an isodensity or spherical-harmonic surface39.
When a spherical-harmonic approach is used, the surfaces, as well as the local properties,
are fit to a spherical-harmonic expansion of radial distances,
$$r(\alpha,\beta) = \sum_{l=0}^{N} \sum_{m=-l}^{l} c_l^m N_l^m P_l^m(\cos\alpha)\cos(m\beta) \tag{2.8}$$
where Plm(cos α) are associated Legendre functions, Nlm are normalization factors, and l
and m are integers (−l ≤ m ≤ l). The number of harmonics used depends on the
application. In general, the higher the maximum order N, the more tightly the surface is
fitted to the molecular framework. Spherical-harmonic fitting may only be used with a
shrink-wrap surface, because the surface must be single-valued along every radial vector
extending outward from the center.
The local properties are calculated for each of a set of triangles fitted to the surface
of the molecule. These tesserae may be summed over the entire surface in order to
derive quantitative structure-activity and structure-property models. In this way, the local
properties and the properties/activities derived from them can be mapped onto the
molecular surface and visualized using molecular visualization software such as GEISHA44
or Pymol45 (see Table 2.2). In addition to the QSAR/QSPR models that may be derived
from the surface-integral approach, descriptors based on various statistical features of the
local property surfaces may also be used. Parasurf generates a set of 40 molecular
descriptors derived from the local surface properties for use in statistical models. Models
generated using Murray-Politzer-type18,19,22 statistical descriptors take the general form:
$$P = f(D_1, \ldots, D_{40}) \tag{2.9}$$
These statistical descriptors are described in the following table:
Table 2.1 Parasurf ’06 statistical descriptor set.
Description | Symbol
Dipole moment | $\mu$
Dipolar density | $\mu_D$
Molecular electronic polarizability | $\alpha$
Molecular weight | $MW$
Globularity | $Glob$
Molecular surface area | $A$
Molecular volume | $Vol$
Most positive MEP | $V_{max}$
Most negative MEP | $V_{min}$
Mean of positive MEP values | $\bar{V}_{+}$
Mean of negative MEP values | $\bar{V}_{-}$
Mean of all MEP values | $\bar{V}$
Range of MEP values | $\Delta V$
Total variance of positive MEP | $\sigma_{+}^{2}$
Total variance of negative MEP | $\sigma_{-}^{2}$
Total variance in MEP | $\sigma_{tot}^{2}$
MEP balance parameter | $\nu_{MEP}$
Product of MEP balance and variance | $\sigma_{tot}^{2}\,\nu_{MEP}$
Maximum ionization potential value | $IE_L^{max}$
Minimum ionization potential value | $IE_L^{min}$
Mean ionization potential value | $\overline{IE}_L = \frac{1}{N}\sum_{i=1}^{N} IE_L^{i}$
Range of ionization potential | $\Delta IE_L = IE_L^{max} - IE_L^{min}$
Total variance in ionization potential | $\sigma_{IE}^{2} = \frac{1}{N}\sum_{i=1}^{N}\left[IE_L^{i} - \overline{IE}_L\right]^{2}$
Maximum electron affinity value | $EA_L^{max}$
Minimum electron affinity value | $EA_L^{min}$
Mean of positive electron affinity values | $\overline{EA}_L^{+} = \frac{1}{N^{+}}\sum_{i=1}^{N^{+}} EA_L^{i+}$
Mean of negative electron affinity values | $\overline{EA}_L^{-} = \frac{1}{N^{-}}\sum_{i=1}^{N^{-}} EA_L^{i-}$
Mean of electron affinity values | $\overline{EA}_L = \frac{1}{N}\sum_{i=1}^{N} EA_L^{i}$
Range of electron affinity | $\Delta EA_L = EA_L^{max} - EA_L^{min}$
Variance in positive electron affinity | $\sigma_{EA+}^{2} = \frac{1}{m}\sum_{i=1}^{m}\left[EA_i^{+} - \overline{EA}^{+}\right]^{2}$
Variance in negative electron affinity | $\sigma_{EA-}^{2} = \frac{1}{n}\sum_{i=1}^{n}\left[EA_i^{-} - \overline{EA}^{-}\right]^{2}$
Sum of pos., neg. variances for EA | $\sigma_{EAtot}^{2} = \sigma_{EA+}^{2} + \sigma_{EA-}^{2}$
EA balance parameter | $\nu_{EA} = \frac{\sigma_{EA+}^{2}\,\sigma_{EA-}^{2}}{\left[\sigma_{EAtot}^{2}\right]^{2}}$
Fraction of surface with pos. EA | $\delta A_{EA+}$
Mean electronegativity value | $\bar{\chi}_L$
Maximum local polarizability value | $\alpha_L^{max}$
Minimum local polarizability value | $\alpha_L^{min}$
Mean local polarizability value | $\bar{\alpha}_L$
Range of local polarizability | $\Delta\alpha_L$
Variance in local polarizability | $\sigma_{\alpha}^{2}$
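The sign-split statistics of Table 2.1 can be computed from the local property values at the surface points. This sketch assumes unweighted sums over points, as in the 1/N formulas of the table, assumes that the MEP balance parameter has the same form as νEA, and approximates the fraction of surface with a positive value by the fraction of points, which holds only if every point represents an equal area. The function name and dictionary keys are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Dict, Sequence


def signed_surface_statistics(values: Sequence[float]) -> Dict[str, float]:
    """Murray-Politzer-type statistics of one local property over surface points."""
    pos = [v for v in values if v > 0]
    neg = [v for v in values if v < 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative values")
    var_pos = pvariance(pos)   # (1/m) * sum (v_i - mean_pos)**2
    var_neg = pvariance(neg)   # (1/n) * sum (v_i - mean_neg)**2
    var_tot = var_pos + var_neg
    balance = var_pos * var_neg / var_tot ** 2 if var_tot > 0 else 0.0
    return {
        "max": max(values), "min": min(values),
        "mean_pos": fmean(pos), "mean_neg": fmean(neg), "mean": fmean(values),
        "range": max(values) - min(values),
        "var_pos": var_pos, "var_neg": var_neg, "var_tot": var_tot,
        "balance": balance, "var_tot_x_balance": var_tot * balance,
        "fraction_pos": len(pos) / len(values),  # equal-area assumption
    }


stats = signed_surface_statistics([1.0, 3.0, -2.0, -4.0])
print(stats["balance"], stats["var_tot_x_balance"])  # 0.25 0.5
```

The balance parameter reaches its maximum of 0.25 when the positive and negative variances are equal, as in this symmetric example.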
Yet other property models use spherical-harmonic hybridization coefficients as
terms in the multiple regression, with the same general form. The set of
spherical-harmonic terms consists of 100 hybridization coefficients Hl: 16 shape hybrids
and 21 each of V, IEL, EAL, and αL hybrids. These are defined by:
$$H_l = \sum_{m=-l}^{l} \left(c_l^m\right)^2 \tag{2.10}$$
When a molecular shape or a local property is fitted to the spherical-harmonic expansion,
the shape or property may be described by the hybridization coefficients in an analogous
fashion to the linear combination of atomic orbitals (LCAO).
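Eq. (2.10) collapses the 2l + 1 coefficients of each order l into a single number. A minimal sketch, assuming the expansion coefficients of Eq. (2.8) are stored in a hypothetical nested dictionary coeffs[l][m], with missing entries treated as zero:

```python
from typing import Dict, List


def hybridization_coefficients(coeffs: Dict[int, Dict[int, float]]) -> List[float]:
    """Eq. (2.10): H_l = sum over m = -l..l of (c_l^m)**2, for l = 0..l_max.

    coeffs[l][m] holds c_l^m; any missing l or m is treated as zero.
    """
    l_max = max(coeffs)
    return [sum(coeffs.get(l, {}).get(m, 0.0) ** 2 for m in range(-l, l + 1))
            for l in range(l_max + 1)]


# Hypothetical coefficients for l = 0 and l = 1.
coeffs = {0: {0: 2.0}, 1: {-1: 3.0, 0: 0.0, 1: 4.0}}
print(hybridization_coefficients(coeffs))  # [4.0, 25.0]
```

For an orthonormal real spherical-harmonic basis, rotating the molecule about the expansion center mixes the coefficients only within each l, so each Hl is unchanged by such a rotation.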
Figure 2.1 Molecular electrostatic potential surface for N-(3-acetylphenyl)-acetamide.
Quantitative structure-property models for several physical properties have been
derived using surface-integral methods31,32, including logP as a measure of
hydrophobicity33,46-48 and solvation free energies. Several surface-integral QSPR models
employing the aforementioned local properties are presented here. This treatment is
well suited to donor-acceptor and dispersion interactions between molecular
surfaces, which play a significant role in solvation by non-polar solvents49 and in
protein-ligand binding to non-polar residues.
Table 2.2 Local property surfaces for paracetamol calculated with Parasurf.
[Surface images: electron affinity, electronegativity, hardness, ionization potential, molecular electrostatic potential, molecular polarizability.]
2.2 Methods
Structures for the data sets assembled from the literature were converted from 2D
structures to 3D MDL SD files using Molecular Networks’ CORINA50,51. The molecular
geometries were then optimized with the AM1 Hamiltonian52 using VAMP (version 9.0)53.
In cases where the addition of d-orbitals improved the overall structure, the AM1*
Hamiltonian54 was used for optimization, followed by a single-point AM1 calculation in
order to obtain the necessary polarizability data. The five local surface properties were
calculated for each structure by Parasurf (version ’06)35 for either an electron isodensity
surface or a spherical-harmonic-fitted surface, generated by either a marching-cube or a
shrink-wrap algorithm, as indicated. Molecular electrostatic potentials were calculated
using the atomic multipole technique based on the zero-differential-overlap
approximation, and the local ionization energy, electron affinity, and polarizability were
calculated as described previously40-43,55,56. Multiple regression models were generated
with Tsar (version 3.3)57 using functions of powers and products of the five local
properties. One hundred fifty nonlinear product and power terms of the local properties
were generated by script and used as descriptors in a multiple regression routine
(Appendix A, Table A1). The multiple regressions were performed with the Leave Out
Groups of Three cross-validation method, using an F-to-enter value of 4.0 and an
F-to-leave value of 3.9 and excluding variables with a cross-correlation greater than 0.9.
A Leave One Out method is often used with the multiple regression routine, yielding
predictive r2cv values that may be very close to the corresponding r2 values. However, the
authors of Tsar consider leaving out what amounts to a third of the data, to be predicted by
the remaining two-thirds, a better measure of predictivity for stepwise regression
(Tsar reference guide). Individual terms used in the multiple regressions that
cross-correlated with R>0.86 were excluded from the regression.
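The leave-out-a-third validation described above can be sketched as follows. This is a minimal illustration, not Tsar's implementation: it fits ordinary least squares through the normal equations, assigns compound k to group k mod 3 (an assumed grouping), omits the stepwise F-to-enter/F-to-leave selection and the cross-correlation filter, and reports r2cv = 1 − PRESS/SStot. All function names and data are hypothetical.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, ...] with a constant term b0."""
    rows = [[1.0, *x] for x in X]
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict_linear(coef: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate b0 + sum_k b_k x_k for one compound."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r2_cv(X: Sequence[Sequence[float]], y: Sequence[float],
          n_groups: int = 3) -> float:
    """Cross-validated r2 = 1 - PRESS / SS_tot, leaving out one group at a time.

    Compound k is placed in group k % n_groups (an assumed grouping); with
    three groups each model predicts about a third of the data from the rest.
    """
    mean_y = sum(y) / len(y)
    press = 0.0
    for g in range(n_groups):
        train = [k for k in range(len(y)) if k % n_groups != g]
        test = [k for k in range(len(y)) if k % n_groups == g]
        coef = fit_ols([X[k] for k in train], [y[k] for k in train])
        press += sum((y[k] - predict_linear(coef, X[k])) ** 2 for k in test)
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - press / ss_tot


# Hypothetical single-descriptor data, exact and slightly noisy.
X = [[float(k)] for k in range(9)]
y_exact = [2.0 * k + 1.0 for k in range(9)]
y_noisy = [1.0, 3.5, 4.8, 7.2, 9.1, 10.7, 13.3, 15.0, 16.9]
print(fit_ols(X, y_exact), r2_cv(X, y_exact), r2_cv(X, y_noisy))
```

Because each left-out third must be predicted by a model that never saw it, r2cv from this scheme is typically lower than a leave-one-out estimate on the same data, which is the behavior the Tsar authors' recommendation relies on.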
The surface-integral models obtained using the marching-cube method of
generating isodensity surfaces were fitted at an isodensity value of 0.008 e/Å3
(corresponding approximately to a van der Waals surface). For the regression models
using the shrink-wrap method of generating surfaces, including those models employing
spherical-harmonic coefficients as regression terms, the local properties were fit to
spherical harmonics at an isodensity value of 0.0002 e/Å3, which, for a spherical-harmonic
fit, is approximately the van der Waals surface. The set of spherical-harmonic terms
consists of 100 hybridization coefficients: 16 shape hybrids and 21 each of V, IEL, EAL,
and local polarizability hybrids. These were also used as terms to generate linear
regression models. In this chapter, the fit of each regression model to the surface data is
given below the plot of experimental versus calculated property values as the coefficient of
determination, r2. The predictive capacity of the models is expressed by r2cv, the
cross-validated coefficient of determination, and by the mean unsigned error (MUE) and the
root-mean-square deviation (RMSD) of the predictions.
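The fit and prediction statistics quoted in this chapter can be computed as follows. A minimal sketch, assuming r2 is taken as 1 − SSres/SStot (which equals the squared correlation coefficient for a least-squares fit with a constant term evaluated on its own training data) and that the RMSD divides by n rather than n − 1; the function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def fit_statistics(observed: Sequence[float],
                   predicted: Sequence[float]) -> Dict[str, float]:
    """MUE, RMSD and r2 = 1 - SS_res / SS_tot for paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "MUE": sum(abs(e) for e in residuals) / n,
        "RMSD": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical observed and predicted values.
print(fit_statistics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Applied to cross-validated predictions instead of fitted values, the same r2 expression becomes 1 − PRESS/SStot, i.e. r2cv.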
2.3 Results
2.3.1 Octanol/Water Partition Coefficient
The n-octanol/water partition coefficient (here, logP) data set consists of 168
structures assembled from the literature58-60, with values ranging from -3.64 to 8.23 logP
units (Appendix A, Table A2). The surface-integral model for logP, derived by multiple
regression with the set of 150 property terms (marching-cube surfaces) as starting
variables, is an 8-term regression equation based on neutral structures (including
zwitterionic amino acids) and represents the best model to date. The regression equation
is given by:
[Equation (2.11): 8-term surface-integral regression equation for f(log P); individual terms not recoverable from the source]
Chapter 2
[Plot: calculated vs. experimental logPOW]
Figure 2.2 Experimental and calculated values of logP for the training set:
N=168, MUE=0.227, RMSD=0.500, r2=0.797, r2cv=0.685.
In a prior model, the amino acids phenylalanine and tryptophan were represented in their
uncharged forms, which resulted in a 7-term regression equation:
[Equation (2.12): 7-term surface-integral regression equation for f(log P); individual terms not recoverable from the source]
As a comparison of Figures 2.2 and 2.3 shows, the regression statistics and the clustering of points improve slightly when the zwitterionic forms of these amino acids are used.
[Plot: calculated vs. experimental logPOW]
Figure 2.3 Neutral logP set with non-zwitterionic amino acids:
N=168, r2=0.782, r2cv=0.656, MUE=0.238, RMSD=0.516.
Another regression model was generated for the same set. It used ionized structures for those compounds calculated, from their pKa values, to be more than 50% ionized at pH 7.0, and gave the 10-term equation:
[Equation (2.13): 10-term surface-integral regression equation for f(log P); individual terms not recoverable from the source]
[Plot: calculated vs. experimental logPOW]
Figure 2.4 Surface-integral model for logP using compounds charged according to pKa at pH=7:
r2=0.729, r2cv=0.145, MUE=0.252, RMSD=0.576.
Triflupromazine is an outlier in this model; removing it significantly improves r2cv, as well as the MUE and RMSD. The model regenerated without it nevertheless predicted poorly, exhibiting a negative value for r2cv. The number of cross-validation groups was therefore reduced from ten (the standard for these models) to six in order to generate a model with better statistics. This gives the 9-term equation:
[Equation (2.14): 9-term surface-integral regression equation for f(log P); individual terms not recoverable from the source]
[Plot: calculated vs. experimental logPOW]
Figure 2.5 “Charged” logP model with outlier removed: r2=0.735, r2cv=0.573, MUE=0.151, RMSD=0.437.
This model predicts much better than the 10-term charged model (r2cv of 0.573 vs. 0.145), whereas r2 improves only slightly (0.735 vs. 0.729). Because the use of charged structures diminished the predictivity of the models, including them in these QSAR/QSPR models was judged not to be useful. The local properties are calculated in the gas phase, where no solvent shielding can occur, so the regression models may exaggerate the impact of ionization on the target values.
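The >50%-ionized criterion used above for the "charged" logP structures can be illustrated with the Henderson-Hasselbalch relation. This sketch rests on several assumptions:
- each compound has a single ionizable group with a known pKa;
- for bases, the pKa is that of the conjugate acid;
- zwitterions and multiprotic compounds are not handled.

The function names are hypothetical.

```python
def fraction_ionized(pka, kind, ph=7.0):
    """Fraction of a monoprotic compound present in its charged form at a given pH.

    kind="acid": HA <=> A- + H+, pka is that of HA.
    kind="base": BH+ <=> B + H+, pka is that of the conjugate acid BH+.
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    if kind == "base":
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


def use_charged_structure(pka, kind, ph=7.0):
    """Apply the >50% ionization rule at the given pH."""
    return fraction_ionized(pka, kind, ph) > 0.5


# Usage: a carboxylic acid with pKa 4.8 and an amine whose conjugate acid has pKa 9.5.
print(round(fraction_ionized(4.8, "acid"), 3), use_charged_structure(4.8, "acid"))  # 0.994 True
print(round(fraction_ionized(9.5, "base"), 3), use_charged_structure(9.5, "base"))  # 0.997 True
```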
A model using the 40 statistical descriptors as starting variables yielded a 10-term
equation:
[Equation (2.15): 10-term linear regression equation for f(log P) in the statistical descriptors; individual terms not recoverable from the source]
[Plot: calculated vs. experimental logPOW]
Figure 2.6 Linear regression model for logP using statistical descriptors:
N=168, MUE=0.775, RMSD=0.996, r2=0.743, r2cv=0.635.
Although the regression statistics are comparable to those of the best nonlinear model (r2=0.743 vs. 0.797), this model predicts noticeably worse, with an RMS error of nearly 1 logP unit (0.996, compared with 0.500).
The regression model for logP using spherical-harmonic hybridization coefficients comprises 11 terms:
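$$
\begin{aligned}
f(\log P) ={}& 0.8771\,H^{\mathrm{R}}_{1} + 1.389\,H^{\mathrm{R}}_{4} - 0.0459\,H^{\mathrm{MEP}}_{1} - 0.0584\,H^{\mathrm{MEP}}_{3}\\
&- 0.1117\,H^{\mathrm{MEP}}_{4} - 0.2729\,H^{\mathrm{MEP}}_{9} + 0.0404\,H^{\mathrm{IEL}}_{10} + 0.1722\,H^{\mathrm{EAL}}_{11}\\
&+ 6.808\,H^{\mathrm{EAL}}_{21} - 8.842\,H^{\alpha}_{1} - 5.786
\end{aligned}
\qquad (2.16)
$$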
[Plot: calculated vs. experimental logPOW]
Figure 2.7 Regression model for logP using spherical-harmonic hybridization coefficients:
N=168, MUE=0.756, RMSD=0.966, r2=0.759, r2cv=0.516.
The surface-integral model using a spherical-harmonic-fitted surface gives the 4-term equation:
[Equation (2.17): 4-term surface-integral regression equation for f(log P); individual terms not recoverable from the source]
Overall, among the models built from neutral structures, the surface-integral model was best in every measure: regression coefficient, r2cv, MUE, and RMSD. This model used the local properties from the marching-cube surface and included the amino acids in their zwitterionic form. The r2 values of these models varied little (0.743 to 0.797), although their RMS errors varied by nearly ½ of a logP unit (0.500 to 0.996).
[Plot: calculated vs. experimental logPOW]
Figure 2.8 Surface-integral model for logP using spherical-harmonic-fitted surface:
N=162, MUE=0.770, RMSD=0.963, r2=0.745, r2cv=0.662.
2.3.2 Free Energy of Solvation
2.3.2.1 Free Energy of Solvation in Octanol
The surface-integral model for the free energy of solvation in n-octanol (ΔGsolv(oct)) was generated using the 165 compounds in Table A3 of Appendix A, taken from Ehresmann et al.34. The resulting regression equation comprises 17 terms:
[Equation (2.18): 17-term surface-integral regression equation for f(ΔGsolv(oct)); individual terms not recoverable from the source]
[Plot: calculated vs. experimental ΔGsolv(octanol) (kcal mol-1)]
Figure 2.9 Experimental and calculated free energies of solvation in octanol for the training set:
N=165, MUE=0.569, RMSD=0.713, r2=0.914, r2cv=0.798.
To examine the effect of the optimal conformation in solution, the structures from the same data set were geometry-optimized (AM1). The optimization used the conductor-like screening model61 (COSMO) of Klamt and Schüürmann with a bulk dielectric constant (EPS) of 10, corresponding approximately to n-octanol. The surface-integral model using these structures yields an 11-term equation:
[Equation (2.19): 11-term surface-integral regression equation for f(ΔGsolv(oct)); individual terms not recoverable from the source]
[Plot: calculated vs. experimental ΔGsolv(octanol) (kcal mol-1)]
Figure 2.10 Surface-integral model for free energy of solvation in octanol using the COSMO-optimized
training set: N=165, MUE=0.648, RMSD=0.841, r2=0.875, r2cv=0.816.
Using the COSMO-optimized structures reduces the predictivity of the surface-integral model in terms of the mean unsigned error and RMS error. These are 0.648 and 0.841 kcal·mol-1, compared with 0.569 and 0.713 kcal·mol-1 for the original structures (Figures 2.9 and 2.10), although r2cv increases slightly (0.816 vs. 0.798).
A regression model using spherical-harmonic hybrid coefficients yields an 18-term
equation:
[Equation (2.20): 18-term linear regression equation for f(ΔGsolv(oct)) in spherical-harmonic hybrid coefficients; individual terms not recoverable from the source]
[Plot: calculated vs. experimental ΔGsolv(octanol) (kcal mol-1)]
Figure 2.11 Experimental and calculated free energies of solvation in octanol using hybrid coefficients:
N=165, MUE=0.636, RMSD=0.813, r2=0.889, r2cv=0.704.
This model has regression statistics similar to those of the surface-integral model using the marching-cube surface, but its RMS error is higher by 0.1 kcal·mol-1. The surface-integral model using a spherical-harmonic-fitted surface yields the 12-term equation:
[Equation (2.21): 12-term surface-integral regression equation for f(ΔGsolv(oct)); individual terms not recoverable from the source]
[Plot: calculated vs. experimental ΔGsolv(octanol) (kcal mol-1)]
Figure 2.12 Surface-integral model for free energy of solvation in octanol using the spherical-harmonic-fitted surface:
N=165, MUE=0.719, RMSD=0.924, r2=0.865, r2cv=0.729.
Here again, the regression statistics are somewhat poorer for the surface-integral model employing a spherical-harmonic surface than for the one employing the marching-cube surface (r2=0.865 vs. 0.914). There is also an accompanying increase in RMS error of ~0.2 kcal·mol-1.
2.3.2.2 Free Energy of Solvation in Water
The data set for the free energy of solvation in water (ΔGsolv(H2O)), presented in Table A4 of Appendix A, consists of 384 compounds assembled from the literature62-64. The regression equation for the surface-integral model of the free energy of hydration comprises 21 terms:
[Equation (2.22): 21-term surface-integral regression equation for f(ΔGsolv(H2O)); individual terms not recoverable from the source]
[Plot: calculated vs. experimental ΔGsolv(H2O) (kcal mol-1)]
Figure 2.13 Experimental and calculated free energies of solvation in water for the training set given in
Table A4: N=384, MUE=0.727, RMSD=1.503, r2=0.983, r2cv=0.825.
As Figure 2.13 shows, the predictivity suffers somewhat from the inclusion of the charged species, which exert a leverage effect on the regression. As a result, the whole set of structures cannot be fitted as robustly as either the charged or the uncharged portion alone. Using only the neutral compounds (N=362) from the data set (Appendix A, Table A4, rows 1-362) in a surface-integral model results in a 17-term equation:
[Equation (2.23): 17-term surface-integral regression equation for f(ΔGsolv(H2O)); individual terms not recoverable from the source]
[Plot: calculated vs. experimental ΔGsolv(H2O) (kcal mol-1)]
Figure 2.14 Experimental and calculated free energies of solvation in water for
the uncharged components of the training set given in Table A4:
N=362, MUE=0.789, RMSD=1.031, r2=0.891, r2cv=0.845.
This gives a model with a higher r2cv (0.845 vs. 0.825) and an RMS error lower by about ½ kcal·mol-1 of solvation free energy (1.031 vs. 1.503), although its r2 is lower (0.891 vs. 0.983). This data set, minus two outliers, was then geometry-optimized with the COSMO solvation model (EPS=80.0, approximating water), giving the 13-term equation:
[Equation (2.24): 13-term surface-integral regression equation for f(ΔGsolv(H2O)); individual terms not recoverable from the source]
[Plot: calculated vs. experimental ΔGsolv(H2O) (kcal mol-1)]
Figure 2.15 Experimental and calculated free energies of solvation in water for the
uncharged components using the COSMO solvation model (EPS=80.0):
N=360, MUE=0.922, RMSD=1.139, r2=0.862, r2cv=0.805.
Thus, using structures geometry-optimized with the COSMO model again reduces the predictivity, increasing the RMS error by about 0.1 kcal·mol-1 (1.139 vs. 1.031). The best regression model using spherical-harmonic hybridization coefficients yielded a 21-term equation with poor regression statistics (MUE=1.47, RMSD=2.14) and is not presented here. The surface-integral model using a spherical-harmonic-fitted surface is defined by the 18-term equation:
[Equation (2.25): 18-term surface-integral regression equation for f(ΔGsolv(H2O)); individual terms not recoverable from the source]
[Plot: calculated vs. experimental ΔGsolv(H2O) (kcal mol-1)]
Figure 2.16 Surface-integral model for free energy of solvation in water for the
uncharged structures using the spherical-harmonic-fitted surface:
N=360, MUE=0.929, RMSD=1.18, r2=0.842, r2cv=0.781.
Again, the spherical-harmonic-fitted surface alters the local property space such that predictivity decreases. Its RMS error is ~0.15 kcal·mol-1 higher than that of the marching-cube model (1.18 vs. 1.031), and the model with the best predictivity is the surface-integral model using the marching-cube surface. Because this appears to be a trend, the surface-integral models presented hereafter use only local properties fitted on the marching-cube surface.
2.3.3 Acid Dissociation Constant
A surface-integral model was generated for pKa using the data set in Table A5 of Appendix A. It consists of 268 nitrogenous compounds (primary and secondary amines, anilines, and pyridines) taken from the article by Tehan et al.65 on pKa estimation. The regression equation has 23 terms:
[Equation (2.26): 23-term surface-integral regression equation for f(pKa); individual terms not recoverable from the source]
[Plot: calculated vs. experimental pKa]
Figure 2.17 Experimental and calculated pKa values for the training set:
N=268, MUE=1.03, RMSD=1.339, r2=0.841, r2cv=0.767.
Tehan et al. performed separate regressions for each class of nitrogenous compound (e.g., amines, anilines, and pyridines) and report regression statistics for each class. These range from r2=0.55, r2cv=0.54 for nitrogenous heterocycles (N=150) to r2=0.94, r2cv=0.94 for a combined set of anilines and amines (N=132). Each of the reported regression equations consists of a constant and a single term (electrophilic superdelocalizability)65.
Using the standard statistical descriptor output of Parasurf gave a model with four
terms:
[Equation (2.27): 4-term linear regression equation for f(pKa) in the statistical descriptors; individual terms not recoverable from the source]
[Plot: calculated vs. experimental pKa]
Figure 2.18 Experimental and calculated pKa values using statistical descriptors:
N=268, MUE=1.32, RMSD=1.678, r2=0.769, r2cv=0.736.
The regression statistics for the surface-integral model (r2=0.841, r2cv=0.767) are only slightly better than those obtained for the statistical-descriptor model above (r2=0.769, r2cv=0.736). Because the statistical-descriptor model needs only four linear terms (versus 23 nonlinear terms), it may lend itself more easily to physical interpretation. Its major drawback is an RMS error that is 0.34 pKa units higher (1.678 vs. 1.339).
2.3.4 Boiling Point
The boiling-point data set consists of 1642 compounds taken from Syracuse Research Corporation’s PHYSPROP database66. The surface-integral model for this data set, generated using the marching-cube surface, has 17 terms:
[Equation (2.28): 17-term surface-integral regression equation for f(Tb); individual terms not recoverable from the source]
[Plot: calculated vs. experimental Tb (°C)]
Figure 2.19 Surface-integral model for boiling point using the marching-cube surface:
N=1642, MUE=22.2, RMSD=33.9, r2=0.740, r2cv=0.574.
When 19 outliers are removed from the data set, the resulting regression model has 19 terms:
[Equation (2.29): 19-term surface-integral regression equation for f(Tb); individual terms not recoverable from the source]
[Plot: calculated vs. experimental Tb (°C)]
Figure 2.20 Surface-integral model for boiling point with outliers removed:
N=1623, MUE=26.0, RMSD=35.1, r2=0.752, r2cv=0.728.
Both regression statistics improve (r2 from 0.740 to 0.752 and r2cv from 0.574 to 0.728). Somewhat ironically, however, the prediction errors increase with the removal of the outliers: the MUE by 3.8 °C and the RMSD by 1.2 °C.
The linear regression model using spherical-harmonic hybrid coefficients for the boiling
point data set has 29 terms:
[Equation (2.30): 29-term linear regression equation for f(Tb) in spherical-harmonic hybrid coefficients; individual terms not recoverable from the source]
[Plot: calculated vs. experimental Tb (°C)]
Figure 2.21 Regression model for boiling point using spherical-harmonic hybrid coefficients:
MUE=24.6, RMSD=34.6, r2=0.779, r2cv=0.742.
When the set of 40 statistical descriptors is used, the regression model for the boiling point
data set has 16 terms:
[Equation (2.31): 16-term linear regression equation for f(Tb) in the statistical descriptors; individual terms not recoverable from the source]
[Plot: calculated vs. experimental Tb (°C)]
Figure 2.22 Regression model for boiling point using statistical descriptors:
MUE=25.3, RMSD=36.8, r2=0.750, r2cv=0.733.
With an average RMS error of 35.1 °C across the four boiling-point models, none of them predicts well enough to be used in practical applications. Rather, they highlight the limitations of the method. They also confirm the intuitive notion that, at or near the boiling point, where a fraction of the molecules is entering the gas phase, the molecular surface properties no longer describe the interactions between molecules well.
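As a worked check, the 35.1 °C average is the mean of the RMS errors reported for the four boiling-point models in Figures 2.19 to 2.22:

$$
\overline{\mathrm{RMSD}} = \frac{33.9 + 35.1 + 34.6 + 36.8}{4}\ {}^{\circ}\mathrm{C} = \frac{140.4}{4}\ {}^{\circ}\mathrm{C} = 35.1\ {}^{\circ}\mathrm{C}
$$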
2.3.5 Glass Transition Temperature
The glass transition temperature (Tg) is the temperature at which an amorphous material changes from a rigid, glassy state to a rubbery or viscous, liquid-like state. It is used as a measure of the thermal failure limit of organic light-emitting diodes (OLEDs)67-69. The surface-integral model for the glass transition temperature, using the marching-cube surface, was generated from a set of 73 OLED materials assembled from the literature70 (Table A6 of Appendix A). The resulting regression equation has 4 terms:
[Equation (2.32): 4-term surface-integral regression equation for f(Tg); individual terms not recoverable from the source]
[Plot: calculated vs. experimental Tg (°C)]
Figure 2.23 Experimental and calculated glass transition temperatures for the training set:
N=73, MUE=16.8, RMSD=22.5, r2=0.690, r2cv=0.582.
Lowering the F-statistic thresholds for terms to enter and to leave the multiple regression gives an equation with more terms and a slightly improved r2 value. The resulting model is, however, much less predictive (r2cv approaches zero).
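The role of the F-to-enter and F-to-leave thresholds can be illustrated with a generic forward/backward stepwise procedure, sketched below. It is not Tsar's implementation. The partial-F statistic is the textbook one, and the defaults `f_enter=4.0` and `f_leave=3.9` are hypothetical. Lowering both thresholds admits terms that explain less variance, which raises r2 on the training data but can erode r2cv.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit of y on X[:, cols] plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def partial_f(X, y, reduced, full):
    """Partial F statistic for the single extra term in `full` compared with `reduced`."""
    df = len(y) - len(full) - 1  # residual degrees of freedom of the full model
    if df <= 0:
        return 0.0  # too few compounds to test another term
    rss_full = rss(X, y, full)
    rss_reduced = rss(X, y, reduced)
    if rss_full == 0.0:
        return float("inf") if rss_reduced > 0.0 else 0.0
    return (rss_reduced - rss_full) / (rss_full / df)


def stepwise(X, y, f_enter=4.0, f_leave=3.9, max_steps=100):
    """Return indices of the columns of X chosen by F-to-enter/F-to-leave stepwise regression."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the candidate with the largest partial F if it exceeds f_enter.
        candidates = [c for c in range(X.shape[1]) if c not in selected]
        if candidates:
            f_add = {c: partial_f(X, y, selected, selected + [c]) for c in candidates}
            best = max(f_add, key=f_add.get)
            if f_add[best] > f_enter:
                selected.append(best)
                changed = True
        # Backward step: remove the weakest selected term if its partial F is below f_leave.
        if selected:
            f_drop = {c: partial_f(X, y, [s for s in selected if s != c], selected) for c in selected}
            worst = min(f_drop, key=f_drop.get)
            if f_drop[worst] < f_leave:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected


# Usage on synthetic data: y depends on column 0; column 1 is an unrelated pattern.
i = np.arange(30)
X = np.column_stack([i.astype(float), np.cos(1.3 * i)])
y = 3.0 * i + 0.1 * np.sin(2.7 * i)
print(stepwise(X, y))
```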
The COSMO-optimized data set yields a model with 12 terms. It was optimized using the bulk dielectric constant of 10.0 employed above for n-octanol:
[Equation (2.33): 12-term surface-integral regression equation for f(Tg); individual terms not recoverable from the source]
[Plot: calculated vs. experimental Tg (°C)]
Figure 2.24 Surface-integral model for glass transition temperature using COSMO-optimized structures:
MUE=15.3, RMSD=18.7, r2=0.779, r2cv=0.491.
The predictivity of this model is comparable to that of the previous one (r2cv=0.491 vs. 0.582), with an RMS error lower by 3.8 °C (18.7 vs. 22.5 °C). The same data set, without COSMO optimization, was used to generate a regression model from the statistical descriptors, which yielded a 9-term equation:
[Equation (2.34): 9-term linear regression equation for f(Tg) in the statistical descriptors; individual terms not recoverable from the source]
[Plot: calculated vs. experimental Tg (°C)]
Figure 2.25 Regression model for glass transition temperature using statistical descriptors:
MUE=12.7, RMSD=15.7, r2=0.844, r2cv=0.521.
Compared with the surface-integral model, this model has a higher r2 (0.844 vs. 0.690) and a significantly lower prediction error (RMSD=15.7 vs. 22.5 °C), although its r2cv is slightly lower (0.521 vs. 0.582). The regression model using the spherical-harmonic hybrid coefficients yields an equation with only two terms that predicts very poorly (MUE = 22.2, RMSD = 28.5):
$$
f(T_g) = 144.05\,H^{\mathrm{R}}_{16} + 2.982\,H^{\mathrm{MEP}}_{4} + 285.1 \qquad (2.35)
$$
[Plot: calculated vs. experimental Tg (°C)]
Figure 2.26 Regression model for glass transition temperature using hybridization coefficients:
MUE=22.2, RMSD=28.5, r2=0.501, r2cv=0.224.
The best-predicting model for this property uses the statistical descriptor set and has an RMS error of 15.7 °C. This is roughly 10% of the range of the Tg values in the data set, which is rather large; the best boiling-point model predicts to within ~6% of its range. Here again, however, the local properties are being used to predict a phase change: the point at which the forces dictating the arrangement of molecules cease to apply in the same manner.
2.3.6 Aqueous Solubility
The aqueous solubility data set in Table 1.6 of Appendix A is a small subset of 589 compounds taken from The University of Arizona’s AQUASOL database71. Because the solubility values were reported in some 100 different units, all values were converted to molar units (mol/L, M), and their base-10 logarithms were used as target values (logS).
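The unit harmonization described above can be sketched as follows. This minimal version assumes that each record provides a numeric value, a unit string, and the compound's molar mass. Only a few mass-per-volume units are covered, and the unit strings and the `TO_G_PER_L` table are hypothetical. Units such as mole fraction or mass per mass of solvent would also require density assumptions, which are not handled here.

```python
import math

# Factors converting mass-per-volume concentrations to g/L (illustrative subset).
TO_G_PER_L = {
    "g/L": 1.0,
    "mg/L": 1e-3,
    "ug/mL": 1e-3,   # 1 ug/mL = 1 mg/L
    "mg/mL": 1.0,    # 1 mg/mL = 1 g/L
    "g/100mL": 10.0,
}


def log_s(value, unit, mol_weight):
    """Return logS, the base-10 logarithm of the solubility in mol/L.

    mol_weight is the molar mass in g/mol; it is ignored for molar units.
    """
    if unit in ("M", "mol/L"):
        molar = value
    elif unit in TO_G_PER_L:
        molar = value * TO_G_PER_L[unit] / mol_weight
    else:
        raise ValueError(f"unsupported unit: {unit!r}")
    return math.log10(molar)


# Usage: 1000 mg/L of a compound with molar mass 100 g/mol is 0.01 M, i.e. logS = -2.
print(log_s(1000.0, "mg/L", 100.0))
```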
The regression equation for the surface-integral model derived using the marching-cube
surface consists of 11 terms:
[Equation (2.36): 11-term surface-integral regression equation for f(log S); individual terms not recoverable from the source]
[Plot: calculated vs. experimental logS (H2O)]
Figure 2.27 Surface-integral model for logS using the marching-cube surface:
N=589, MUE=0.844, RMSD=1.14, r2=0.578, r2cv=0.411.
The regression model using the spherical-harmonic hybrid coefficients yielded an 18-term
equation:
[Equation (2.37): 18-term regression equation for f(log S); individual terms not recoverable from the source]
Figure 2.28 Regression model for logS using spherical-harmonic hybrid coefficients (calculated vs. experimental logS (H2O)): N = 589, MUE = 0.960, RMSD = 1.27, r² = 0.469, r²cv = 0.333.
The regression model using the statistical descriptors has 14 terms:
[Equation (2.38): 14-term regression expression for logS in the local properties V, IEL, EAL, αL and ηL]
Figure 2.29 Regression model for logS using statistical descriptors (calculated vs. experimental logS (H2O)): N = 589, MUE = 0.791, RMSD = 1.07, r² = 0.624, r²cv = 0.528.
As in the case of the glass transition temperature, the statistical descriptor set again gives the best regression model in terms of RMS error (~1 logS unit). The regression statistics, however, are not encouraging. The RMS error of our best model only matches the upper end of the range of RMS errors reported for the commercially available ACD/Labs Solubility DB72 in the prediction of a set of test compounds73,74.
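The figure captions above report MUE, RMSD, r² and a cross-validated r²cv. The sketch below computes these statistics for an ordinary multiple linear regression; it assumes that r²cv is a leave-one-out q² (the text does not state which cross-validation scheme was used), and the helper names are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def fit_statistics(X, y):
    """MUE, RMSD and r^2 of the fit, plus a leave-one-out r^2_cv."""
    resid = y - predict(fit_ols(X, y), X)
    ss_tot = np.sum((y - y.mean()) ** 2)
    press = 0.0  # predicted residual sum of squares
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # drop compound i, refit, predict it
        coef_i = fit_ols(X[keep], y[keep])
        press += (y[i] - predict(coef_i, X[i:i + 1])[0]) ** 2
    return {
        "MUE": float(np.mean(np.abs(resid))),
        "RMSD": float(np.sqrt(np.mean(resid ** 2))),
        "r2": float(1.0 - np.sum(resid ** 2) / ss_tot),
        "r2cv": float(1.0 - press / ss_tot),
    }
```

For an ordinary least-squares fit the leave-one-out residuals are never smaller than the fitted residuals, so r²cv ≤ r² always holds in this setting, consistent with every pair of values quoted in Figures 2.27 to 2.29.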
2.4 Discussion
As described in Ehresmann et al.34, the construction of the surface-integral models involves two approximations: that the target properties may be treated as a sum of local surface values, and that gas-phase electron densities from semiempirical calculations can be used to represent properties that, in bulk, depend on the presence of a polar medium. The free-energy-of-solvation models themselves give reliable estimates, although they are not as accurate as the most reliable methods available63,75. It should be noted that this surface-integral approach relies on gas-phase electron densities and optimized structures from semiempirical calculations and can include solute polarization only implicitly, through the local polarizability on the molecular surface. It was also found that using COSMO-optimized structures with this method results in models of slightly lower predictive power.
To evaluate the predictivity of our model, the logP data set was first predicted using KOWWIN 1.67, which is included in the U.S. Environmental Protection Agency’s property estimation package EPISUITE 3.11. KOWWIN is a Windows implementation of Syracuse Research Corporation’s LogKow10, an atom/fragment-based method for estimating logP that was trained with 2,410 compounds using 175 fragment groups (r² = 0.98). The statistics reported for a 13,058-compound validation set are a standard deviation of 0.436, an MUE of 0.316 and an r² of 0.95. The SD files for our logP training set were converted to the corresponding SMILES strings and run as a batch job. This set yielded a mean signed error of 0.396, a mean unsigned error (MUE) of 1.172 and a root-mean-square (RMS) error of 2.046. As a second trial, and as a rough measure of the comparative prediction accuracy of the two methods, the logP values of a small set of 17 recognizable biological structures taken from Exploring QSAR59 were predicted using both methods (Table 2.3). The KOWWIN model yielded a mean unsigned error of 1.12 and an RMS error of 1.61. Our local property model performed slightly worse, with a mean unsigned error of 1.40 and an RMS error of 1.82.
Previous regressions showed a tendency of the models to predict poorly for logP values at or below zero. This was attributed to under-representation in the data set of compounds that are much more soluble in water than in n-octanol, since compounds with logP values at the other end of the solubility spectrum showed a similar prediction error.
Table 2.3 Results of KOWWIN and Parasurf logP predictions.
No. Compound Exp. KOWWIN Parasurf
1 estradiol 4.01 1.55 3.02
2 imipramine 4.80 5.01 4.78
3 pentazocine 3.31 5.03 4.65
4 rifampin 1.32 2.08 2.15
5 vincristine 2.57 3.11 3.83
6 digitoxin 1.68 2.04 3.61
7 terfenadine 3.22 7.62 7.87
8 sufentanil 3.24 3.62 3.22
9 colchicine 1.3 1.86 1.64
10 tetracycline -1.44 -0.18 0.57
11 hexetidine 2.00 5.26 5.30
12 Δ9-tetrahydrocannabinol 6.97 7.6 5.84
13 yohimbine 2.73 2.11 2.86
14 quinine 2.64 3.29 4.52
15 acyclovir -1.56 -2.41 0.20
16 diazepam 2.99 2.70 3.83
17 codeine 1.14 1.28 2.54
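The error statistics quoted for Table 2.3 can be reproduced directly from the tabulated values. The short script below is a worked check in which signed errors are taken as predicted minus experimental logP.

```python
import math

# Table 2.3: compound -> (experimental, KOWWIN, Parasurf) logP
TABLE_2_3 = {
    "estradiol": (4.01, 1.55, 3.02),
    "imipramine": (4.80, 5.01, 4.78),
    "pentazocine": (3.31, 5.03, 4.65),
    "rifampin": (1.32, 2.08, 2.15),
    "vincristine": (2.57, 3.11, 3.83),
    "digitoxin": (1.68, 2.04, 3.61),
    "terfenadine": (3.22, 7.62, 7.87),
    "sufentanil": (3.24, 3.62, 3.22),
    "colchicine": (1.3, 1.86, 1.64),
    "tetracycline": (-1.44, -0.18, 0.57),
    "hexetidine": (2.00, 5.26, 5.30),
    "delta9-tetrahydrocannabinol": (6.97, 7.6, 5.84),
    "yohimbine": (2.73, 2.11, 2.86),
    "quinine": (2.64, 3.29, 4.52),
    "acyclovir": (-1.56, -2.41, 0.20),
    "diazepam": (2.99, 2.70, 3.83),
    "codeine": (1.14, 1.28, 2.54),
}

def errors(column):
    """Signed errors (predicted - experimental) for column 1 (KOWWIN) or 2 (Parasurf)."""
    return [row[column] - row[0] for row in TABLE_2_3.values()]

def mue(errs):
    return sum(abs(e) for e in errs) / len(errs)

def rms(errs):
    return math.sqrt(sum(e * e for e in errs) / len(errs))

for column, method in ((1, "KOWWIN"), (2, "Parasurf")):
    errs = errors(column)
    print(f"{method}: MUE = {mue(errs):.2f}, RMS = {rms(errs):.2f}")
```

This prints `KOWWIN: MUE = 1.12, RMS = 1.61` and `Parasurf: MUE = 1.40, RMS = 1.82`, matching the values given in the text.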
The principal disadvantage of the local-property/surface-integral method lies in treating small regions of the molecular surface as having well-defined local values in the units of the property. The projected property value is not a true measure of the actual property value at a given point but rather an index of it, describing the variations in the property over the whole surface. This abstraction also applies to the local properties themselves. It has been noted24 that EAL does not, in fact, represent a real electron affinity, even within the definition of Koopmans’ theorem, but is rather a local indicator of electron-accepting regions on the molecular surface, that is, regions likely to be sites of nucleophilic attack. Nor does the local electronegativity correspond to a real electronegativity based on chemical potential37. Another disadvantage of the method is the lack of an obvious physical interpretation of the regression equation. The surface-integral models are nonlinear relationships between a physical property and a set of surface properties (or indices) whose terms must be understood within their own context or through a method that relates local property variations to chemical structural features.
Figure 2.30 Surface-integral-modeled logP surface of decylsulfonic acid.
These drawbacks notwithstanding, the models approach the predictive ability of accepted, commonly used prediction methods, with the added benefit of allowing the researcher to visualize the physical property surface for a given structure and to use the surface-mapped physical property as a local property in its own right. Predictions of actual physical property values at individual surface points from these surface-integral-derived properties would, however, be dubious at best. Rather, these “extended” properties are appropriate only as surface descriptors in a statistical analysis or a classification scheme.
2.5 Conclusions
The quantitative structure-property models presented here represent a shift towards
a completely wave-function, molecular-orbital-based approach to QSAR/QSPR prediction
using surface-integral models. One particularly attractive feature of the surface-integral
technique is the ability to use the predicted property (or activity) as a local property itself.
The property in question is defined as a local property and projected onto an isodensity
surface for visualization or for further statistical analysis. Thus, not only are the surface-
integral models QSPR/QSAR prediction methods, but they are also indicators of the
molecular surface features that contribute to the particular property. With further
investigation into the physical meanings of surface-integral models in terms of the local
properties of which they are comprised, any physical property or biological activity that is
a function of molecular surface interactions can be predicted and visualized by this
method.
Chapter 3
Support Vector Classification of
Phospholipidosis-Inducing Drugs
3.1 Introduction
3.1.1 Phospholipidosis
Drug-induced phospholipidosis is a physiopathological condition characterized by the appearance of microscopic subcellular structures, called lamellar bodies or lysosomal inclusion bodies, that contain primarily large deposits of undegraded phospholipids. The lysosomal bodies aggregate inside the cells of the lungs, liver, kidneys, corneas, and brain, and their presence often coincides with adverse clinical effects such as inflammatory reactions and fibrosis, although the relationship is as yet unexplained. Indeed, the onset of phospholipidosis may or may not be associated with adverse symptoms76. It is nevertheless well documented that drug-induced phospholipidosis affects cellular function by impairing lysosomal protein degradation, membrane fusion, and pino- and endocytosis77,78. For this reason, pharmaceutical companies have an interest in developing a screen for phospholipidosis induction.
The most common feature of the drugs that induce phospholipidosis is that they are both cationic and amphiphilic: they have a positively charged water-soluble portion and a hydrophobic portion. Referred to, therefore, as cationic amphiphilic drugs (CADs), these compounds are found accumulated inside the lysosomal bodies along with aggregated phospholipids. It is thought that CADs pass into the lysosomal compartment, where the pH is low, and become trapped: being weakly basic, they are protonated and therefore cannot pass back through the phospholipid bilayer79. This may also be a defense mechanism by which the cell protects itself against exogenous xenobiotics and metabolites.
While the molecular mechanism of phospholipidosis induction is not presently known, there are two basic hypotheses. The first involves CADs binding directly to phospholipids, resulting in indigestible complexes that are stored in the lysosomal lamellar bodies79. The other hypothesis takes note of the concomitant inhibition of phospholipase activity in the lamellar bodies. CADs can inhibit phospholipases either by binding directly to them or, if their concentration becomes high enough, by effectively raising the pH such that phospholipase function is impaired, resulting in an accumulation of phospholipids that cannot be degraded78.
Working from either of these hypotheses it should be possible to elucidate some
molecular surface features common among the drugs that promote the induction of
phospholipidosis in order to establish the relationship between biological activity and
structure (or surface properties). The challenge comes in that, while most drugs that
induce phospholipidosis are cationic and amphiphilic, not all cationic amphiphilic
compounds induce phospholipidosis. Thus, the most defining characteristics of the drug
class may not be useful in classifying its activity. In fact, these properties may completely
overshadow other aspects of the molecular surface more pertinent to the function of
CADs as phospholipase inhibitors or as sites of phospholipid complexation/aggregation.
Given that the octanol-water partition coefficient (P, or in this case logP), itself an index
of hydrophobicity32, has been shown to be additive in terms of the solvent-accessible
surface80, local properties such as the ones described in the previous chapter (Equations
2.1 – 2.6) that were used in the logP surface-integral model should be particularly useful
in predicting phospholipidosis induction provided there is a sufficiently rigorous statistical
method available to classify compounds based on their molecular surface property values.
3.1.2 Phospholipidosis Models
Our recent work on phospholipidosis prediction was performed using a 144-compound data set of structures assayed for phospholipidosis induction by transmission electron microscopy81, provided by Anne Tilloy-Ellul (Pfizer Global R&D, Amboise Laboratories, France) and Marcel de Groot (Pfizer Global R&D, Sandwich Laboratories, UK). This data set, shown in Table 3.1, was divided in half into a training set and a test set by sorting the compounds by their Parasurf-calculated free energies of solvation in octanol (ΔGsolv(oct.)) and assigning them alternately to the training set and the test set. Thus, the training set of 72 structures consists of 44 positives and 28 negatives, and the test set of 72 compounds consists of 42 positives and 30 negatives. Carbon tetrachloride and valproic acid were duplicated in the complete data set and therefore appear in both the training set and the test set. The approach taken to classify these data according to the likelihood of inducing phospholipidosis in the assay involved two statistical methods, each applied to two types of local-property descriptor sets. The first descriptor set consisted of the statistical descriptors described in the previous chapter (Table 2.2). The second consisted of surface autocorrelation indices, which had recently been implemented in Parasurf ’06 and are described briefly in the following section.
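The alternating split described above can be sketched in a few lines. This is a minimal illustration; which set receives the first (lowest-ΔGsolv) compound is not stated in the text, so sending it to the training set is an assumption, and the compounds and values below are hypothetical.

```python
def alternating_split(compounds, key):
    """Sort compounds by `key`, then alternate them between training and test sets.

    Assumption: the first (lowest-key) compound goes to the training set.
    """
    ranked = sorted(compounds, key=key)
    return ranked[0::2], ranked[1::2]

# Hypothetical (name, dG_solv(oct.)) pairs
data = [("A", -5.0), ("B", -1.0), ("C", -3.0), ("D", -2.0)]
train, test = alternating_split(data, key=lambda c: c[1])
print([n for n, _ in train], [n for n, _ in test])  # ['A', 'D'] ['C', 'B']
```

Sorting before alternating makes both halves span the same range of solvation free energies, which is the point of this kind of split.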
Table 3.1 The phospholipidosis training set (left) and test set (right).
Training set | Inducer | Test set | Inducer
17-a-ethynylestradiol | Negative | 3-methylcholanthrene | Negative
1-amino-4-octylpiperazine | Positive | AC-3579 | Positive
1-chloro-10,11-dehydroamitriptyline | Positive | acetaminophen | Negative
1-chloroamitriptyline | Positive | amikacin | Positive
6-hydroxydopamine | Positive | amiodarone | Positive
abacavir | Negative | amitriptyline | Positive
amineptine | Negative | anticoman | Negative
amodiaquine | Positive | aricept | Negative
azaserine | Negative | AY-25329 | Negative
azithromycin | Positive | AY-9944 | Positive
bilirubin | Positive | bicalutamide | Negative
brompheniramine | Positive | boxidine | Positive
caffeine | Negative | bupropion | Negative
carbamazepine | Negative | carbon tetrachloride | Negative
carbon tetrachloride | Negative | ceftazidime | Negative
ceftazidime | Negative | cephaloridine | Positive
chloroquine | Positive | chlorcyclizine | Positive
chlorpromazine | Positive | chloroform | Negative
chlortetracycline | Negative | chlorphentermine | Positive
ciprofibrate | Negative | citalopram | Positive
clociguanil | Positive | clindamycin | Positive
clofibrate | Negative | clomipramine | Positive
colchicine | Negative | clozapine | Positive
cyclizine | Positive | cocaine | Positive
cyproterone acetate | Negative | coralgil | Positive
desipramine | Positive | dantrolene | Negative
(d-)H-4,4-bis-diethylaminoethoxy-diethylphenylethane | Positive | demeclocycline | Negative
dibucaine | Positive | desferal | Negative
erythromycin | Positive | dibekacin | Positive
etoposide | Negative | diclofenac | Negative
felbamate | Negative | diflunisal | Negative
fenofibrate | Negative | di-isobutamide | Positive
fluoxetine | Positive | doxapram | Negative
galactosamine | Negative | doxycycline | Negative
gemfibrozil | Negative | emetine | Positive
gentamicin | Positive | ethyl fluclozepate | Positive
hydroxyzine | Positive | famotidine | Negative
hypoglycin-A | Negative | fenfluramine | Positive
iprindole | Positive | flutamide | Negative
ketoconazole | Positive | homochlorcyclizine | Positive
lysergide | Positive | hydrazine | Negative
methotrexate | Negative | hydroxyurea | Negative
methyldopa | Negative | IA3 | Positive
norchlorcyclizine | Positive | imipramine | Positive
nortriptyline | Positive | indoramin | Positive
phenacetin | Positive | (l)-ethionine | Negative
pheniramine | Positive | maprotiline | Positive
phenobarbital | Negative | meclizine | Positive
phentermine | Positive | mesoridazine | Positive
physostigmine | Negative | metformin | Negative
piroxicam | Negative | methadone | Negative
quinine | Positive | methapyrilene | Negative
R-800 | Positive | mianserin | Positive
RMI10393 | Positive | netilmicin | Positive
SC-45864 | Probable | noxiptiline | Positive
stilbamidine | Positive | paraquat | Positive
sulindac | Negative | perhexiline | Positive
suramin | Positive | procaine | Negative
tamoxifen | Positive | promethazine | Positive
temozolomide | Negative | propranolol | Positive
tetracaine | Positive | quinacrine | Positive
thioacetamide | Negative | quinidine | Positive
tilorone | Positive | rolitetracycline | Negative
tobramycin | Positive | SDZ_200-125 | Positive
tocainide | Positive | SKF-14336-D | Positive
trimeprazine | Positive | stavudine | Negative
trimethoprim sulfamethoxazole | Positive | tacrine | Negative
trospectomycin sulfate | Positive | trifluperazine | Positive
valproic_acid | Negative | triparanol | Positive
WY-14643 | Negative | tunicamycin | Positive
zidovudine | Negative | valproic_acid | Negative
zileuton | Negative | zimelidine | Positive
3.1.3 Surface Autocorrelations
Surface autocorrelations are cross-correlations of various surface property values
that have been shifted by some distance on the molecular surface. Introduced by
Gasteiger82,83 as descriptors for use in molecular binding studies, they are used to discover
periodic patterns or fundamental harmonics that may not be apparent by inspection. There
are six general autocorrelation functions implemented in Parasurf ’06 that describe local
molecular electrostatic potential (3 functions), shape (1 function), local ionization
potential (1 function), and local electron affinity (1 function) cross-correlations. These are
used in the general vector equation:
$$A(R) = \frac{1}{n_{tri}} \sum_{i=1}^{n_{tri}} \sum_{j=i+1}^{n_{tri}} \omega_{ij}\, e^{-\sigma \left(R - r_{ij}\right)^{2}} \tag{3.1}$$
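where rij is the distance between surface points i and j, ωij is one of the autocorrelation functions described above, σ is the Gaussian smoothing parameter, and ntri is the number of surface points over which the sums run. Each autocorrelation vector has 128 elements, starting at a radius of 2.5 Å with increments of 0.06 Å.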
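Equation (3.1) can be evaluated directly for a small set of surface points. The sketch below is not Parasurf’s implementation: the default value of σ is a hypothetical placeholder (the text does not give it), and the pair weights ωij are passed in as a precomputed matrix.

```python
import numpy as np

def autocorrelation_vector(points, omega, sigma=10.0, r0=2.5, dr=0.06, n_bins=128):
    """Gaussian-smoothed surface autocorrelation A(R) of Eq. (3.1).

    points : (n, 3) surface-point coordinates in Angstrom
    omega  : (n, n) matrix of pair weights omega_ij
    sigma  : Gaussian parameter (hypothetical default)
    Memory grows as n^2 * n_bins, so this is only suitable for small surfaces.
    """
    n = len(points)
    i, j = np.triu_indices(n, k=1)                       # all pairs with j > i
    r_ij = np.linalg.norm(points[i] - points[j], axis=1)
    R = r0 + dr * np.arange(n_bins)                      # 2.50, 2.56, ... Angstrom
    gauss = np.exp(-sigma * (R[:, None] - r_ij[None, :]) ** 2)
    return R, gauss @ omega[i, j] / n

# Hypothetical example: two surface points 5 Angstrom apart with unit weight
pts = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
R, A = autocorrelation_vector(pts, np.ones((2, 2)))
print(R[np.argmax(A)])  # the grid radius closest to 5 Angstrom
```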
The three MEP functions that describe the three possible sets of cross products are
defined in the following table.
Table 3.2 Molecular electrostatic functions used in surface autocorrelations.
Plus-plus MEP autocorrelation, VPP: ωij = Vi × Vj where (Vi > 0 and Vj > 0); ωij = 0.0 where (Vi < 0 or Vj < 0)
Minus-minus MEP autocorrelation, VMM: ωij = Vi × Vj where (Vi < 0 and Vj < 0)
Plus-minus MEP autocorrelation, VPM: ωij = −Vi × Vj where (Vi × Vj < 0); ωij = 0.0 where (Vi × Vj > 0)
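The three MEP weighting functions of Table 3.2 translate into element-wise matrix operations. This NumPy sketch returns the full matrix of ωij values for a vector of surface MEP values V; for the minus-minus case it assumes ωij = 0 whenever either potential is non-negative, which the table leaves implicit.

```python
import numpy as np

def omega_pp(V):
    """Plus-plus: Vi*Vj if Vi > 0 and Vj > 0, else 0."""
    pos = V > 0
    return np.where(np.outer(pos, pos), np.outer(V, V), 0.0)

def omega_mm(V):
    """Minus-minus: Vi*Vj if Vi < 0 and Vj < 0, else 0 (zero case assumed)."""
    neg = V < 0
    return np.where(np.outer(neg, neg), np.outer(V, V), 0.0)

def omega_pm(V):
    """Plus-minus: -Vi*Vj if Vi*Vj < 0, else 0."""
    prod = np.outer(V, V)
    return np.where(prod < 0, -prod, 0.0)

# Hypothetical MEP values at three surface points
V = np.array([1.0, -2.0, 3.0])
print(omega_pm(V))
```

All three functions return non-negative weights, so each MEP autocorrelation vector is built from products of like-signed magnitudes.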
Similarity indices may be calculated for data sets by comparison with a reference structure
by
$$S = \frac{1}{N} \sum_{i=1}^{N} \frac{2 \min\left(A_1(R_i), A_2(R_i)\right)}{A_1(R_i) + A_2(R_i)} \tag{3.2}$$
for
$$\sum_{i=1}^{N} \left[A_1(R_i) + A_2(R_i)\right] > 0$$
where A1(Ri) is the value of the autocorrelation function for molecule 1 at a distance Ri and
N is the number of points within the defined range of R for which the sum is non-zero.
The similarity indices are calculated over the full range of each of the autocorrelation functions, as well as over each of the four quarters of the distance range for each function.
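A minimal sketch of Equation (3.2) follows. It applies the positivity condition point by point, following the prose definition of N as the number of points for which the sum is non-zero; that reading, and returning NaN when no point qualifies, are assumptions.

```python
import numpy as np

def similarity_index(a1, a2):
    """Eq. (3.2): mean over distance points of 2*min(A1, A2) / (A1 + A2).

    Only points with A1 + A2 > 0 contribute; returns NaN if there are none.
    """
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    total = a1 + a2
    mask = total > 0
    if not mask.any():
        return float("nan")
    return float(np.mean(2.0 * np.minimum(a1[mask], a2[mask]) / total[mask]))

def quarter_similarities(a1, a2):
    """Similarity over each quarter of the distance range (128 bins -> 4 x 32)."""
    q1s = np.array_split(np.asarray(a1, dtype=float), 4)
    q2s = np.array_split(np.asarray(a2, dtype=float), 4)
    return [similarity_index(q1, q2) for q1, q2 in zip(q1s, q2s)]
```

Identical autocorrelation vectors give S = 1, and the index falls toward 0 as the two vectors stop overlapping.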
3.1.4 Statistical Methods
The principal statistical method used to predict phospholipidosis induction from molecular surface descriptors was the support vector machine84-86 (SVM); a multivariate adaptive regression splines87,88 (MARS) method was used for comparison with the prediction accuracy of the best SVM models. If the support vector machines showed a small difference in prediction accuracy between the training set and the test set while the regression-spline models showed a relatively large difference, over-fitting of the data by the SVMs would be assumed.
3.1.4.1 Support Vector Machines
The Support Vector Machine (SVM) is a machine-learning technique for classification that involves a non-linear mapping of data into a high-dimensional feature space and then uses structural risk minimization to find the separating hyperplane with the largest margin between the transformed data. These learning machines have been shown to classify with an accuracy at least as good as that of various neural network methods85.
Figure 3.1 Linear maximum-margin hyperplane with circled support vectors (diagram labels: margin, hyperplanes H1 and H2, normal vector w, origin, distance −b/|w|).
The method for solving the maximum-margin hyperplane problem (Figure 3.1) involves minimization of the Lagrangian formulation
$$L_P \equiv \frac{1}{2}\lVert\mathbf{w}\rVert^{2} - \sum_{i=1}^{l} \alpha_i y_i \left(\mathbf{x}_i \cdot \mathbf{w} + b\right) + \sum_{i=1}^{l} \alpha_i \tag{3.3}$$
where w is a vector normal to the hyperplane, |b|/‖w‖ is the perpendicular distance from the hyperplane to the origin, and αi ≥ 0 are Lagrange multipliers with i = 1, …, l, one for each input vector, subject to the constraint yi(xi · w + b) − 1 ≥ 0 for all i. In the convex quadratic programming solution, LP is minimized with respect to w and b (and maximized with respect to the αi); requiring that the gradient of LP with respect to w and b vanish gives the conditions
$$\mathbf{w} = \sum_{i} \alpha_i y_i \mathbf{x}_i \tag{3.4}$$
and
$$\sum_{i} \alpha_i y_i = 0 \tag{3.5}$$
Substituting these into LP yields the following “dual formulation”, in which the training data appear only as dot products of vector pairs xi and xj:
$$L_D = \sum_{i} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j \tag{3.6}$$
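A two-point toy problem makes Equations (3.4) to (3.6) concrete. With one training vector per class, Equation (3.5) forces α1 = α2 = α, the dual reduces to LD = 2α − 2α², and its maximum at α = 0.5 gives w = (1, 0) and b = 0 by hand. The script below checks this numerically with a simple grid scan; it is an illustration only, not a real quadratic-programming solver.

```python
import numpy as np

# Hypothetical toy problem: one training vector per class.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
G = np.outer(y, y) * (X @ X.T)          # entries y_i y_j (x_i . x_j)

def dual_objective(alpha):
    """Eq. (3.6): L_D = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i.x_j"""
    return alpha.sum() - 0.5 * alpha @ G @ alpha

# Eq. (3.5) forces alpha_1 = alpha_2 here, so scan a single value alpha >= 0.
grid = np.linspace(0.0, 1.0, 1001)
scores = [dual_objective(np.array([a, a])) for a in grid]
a_best = grid[int(np.argmax(scores))]
alpha = np.array([a_best, a_best])

w = (alpha * y) @ X                      # Eq. (3.4): w = sum_i alpha_i y_i x_i
b = y[0] - X[0] @ w                      # a support vector satisfies y_i (x_i . w + b) = 1
print(a_best, w, b)
```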
In the nonlinear case this solution is difficult unless one employs the “kernel trick”84 to express the dot products of the mappings Φ into the Hilbert space H,
$$\Phi: \mathbb{R}^{d} \rightarrow \mathcal{H} \tag{3.7}$$
in terms of a kernel function of the form
$$K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j) \tag{3.8}$$
Because the dot products in the new space are defined by a single function, it becomes unnecessary to determine Φ explicitly. This is especially useful for the commonly used radial basis function
$$K(\mathbf{x}_i, \mathbf{x}_j) = e^{-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}} \tag{3.9}$$
(for γ > 0), which renders H infinite-dimensional.
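The kernel trick means that only the matrix of kernel values is ever computed, never Φ itself. The sketch below builds the radial-basis-function Gram matrix of Equation (3.9) for a few hypothetical descriptor vectors scaled to [−1, +1]; γ = 0.016 is one of the values reported later in this chapter.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Gram matrix K_ij = exp(-gamma * ||x_i - z_j||^2), Eq. (3.9); requires gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    sq_dist = (np.sum(X ** 2, axis=1)[:, None]
               + np.sum(Z ** 2, axis=1)[None, :]
               - 2.0 * X @ Z.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

# Hypothetical example: 5 compounds x 43 scaled descriptors
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(5, 43))
K = rbf_kernel(X, X, gamma=0.016)
```

The resulting matrix is symmetric with ones on the diagonal and is positive semi-definite, which is what allows it to stand in for dot products in the dual formulation.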
In C-SVC, or C-support vector classification86, the construction of the optimal hyperplane involves minimization of the functional
$$\min_{\mathbf{w}, b, \xi} \left\{ \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C \sum_{i=1}^{l} \xi_i \right\} \tag{3.10}$$
subject to yi(wᵀφ(xi) + b) ≥ 1 − ξi and ξi ≥ 0, where w again represents the hyperplane vector, b is a bias term, φ(xi) is the feature-space mapping, and the summation term is the sum of the slack variables ξi, which measure the training errors; minimizing the first term maximizes the margin for the correctly classified vectors. If the training data can be separated without errors, the constructed hyperplane corresponds to the optimal margin hyperplane86. By varying the value of C, one can vary the trade-off between the complexity of the decision surface and the frequency of errors, in effect “tuning” the SVM’s ability to generalize.
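For a linear kernel (φ(x) = x), the constraints of Equation (3.10) can be folded into the objective by setting ξi = max(0, 1 − yi(w·xi + b)), which turns the problem into hinge-loss minimization that plain subgradient descent can handle. The sketch below uses that substitution on hypothetical toy data; it is not the solver used in libsvm, and the step size and iteration count are arbitrary choices.

```python
import numpy as np

def train_linear_soft_margin(X, y, C=1.0, lr=0.01, n_iter=1000):
    """Minimize 1/2 w.w + C * sum_i max(0, 1 - y_i (w.x_i + b)) by subgradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        viol = margins < 1.0                           # points with slack xi_i > 0
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def primal_objective(w, b, X, y, C):
    """Value of Eq. (3.10) with the slack variables at their optimal values."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + C * slack.sum()

# Hypothetical, linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
              [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w, b = train_linear_soft_margin(X, y)
print(w, b, primal_objective(w, b, X, y, 1.0))
```

A larger C penalizes slack more heavily and pushes toward fewer training errors; a smaller C favors a wider margin.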
Another algorithm for constructing the hyperplane is the ν-SVC method89, which uses a parameter, ν, that sets an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The method is formulated as
$$\min_{\mathbf{w}, b, \xi, \rho} \left\{ \frac{1}{2}\mathbf{w}^{T}\mathbf{w} - \nu\rho + \frac{1}{l}\sum_{i=1}^{l} \xi_i \right\} \tag{3.11}$$
subject to yi(wᵀφ(xi) + b) ≥ ρ − ξi, ξi ≥ 0 and ρ ≥ 0, where l is the number of input vectors, b is a bias term, and φ(xi) is the feature-space mapping.
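Because ν is a lower bound on the fraction of support vectors, the support-vector counts reported for the ν-SVC models in Section 3.3 can be checked against ν. The check below transcribes those counts and assumes that each final model was trained on all 72 training compounds.

```python
N_TRAIN = 72  # assumed training-set size for every final model

# (model, nu, number of support vectors) as reported in Section 3.3
MODELS = [
    ("statistical descriptors, spherical harmonics", 0.5, 43),
    ("statistical descriptors, marching cube", 0.4, 42),
    ("enriched 11-descriptor set", 0.5, 41),
    ("EA_L autocorrelation", 0.4, 46),
    ("IE_L autocorrelation", 0.2, 25),
    ("shape autocorrelation", 0.3, 40),
    ("V_MM autocorrelation", 0.2, 42),
    ("V_PM autocorrelation", 0.2, 44),
    ("V_PP autocorrelation", 0.2, 40),
    ("combined autocorrelations", 0.5, 45),
]

for name, nu, n_sv in MODELS:
    frac = n_sv / N_TRAIN
    print(f"{name}: SV fraction {frac:.2f}, nu = {nu}, bound satisfied: {frac >= nu}")
```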
The expectation value of the probability of error is bounded by the ratio of the expectation value of the number of support vectors to the number of training vectors, expressed as:
$$\Pr(\text{error}) \le \frac{n_{sv}}{n_{tv}} \tag{3.12}$$
where nsv is the number of support vectors and ntv is the number of training vectors. So, if
the optimal hyperplane can be constructed from a small number of support vectors relative
to the training set size, then the ability of the SVM to generalize will be high. This is true
even when the feature space is infinite dimensional, since the complexity of the learning
algorithm does not depend on the dimensionality of the feature space, but on the number
of support vectors86.
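Equation (3.12) can be turned into a rough number for the trained models. Since the bound concerns the expected error, a single model’s nsv/ntv ratio is only an indication; the counts below are transcribed from Section 3.3, and ntv = 72 is an assumption.

```python
N_TRAIN = 72  # assumed number of training vectors

def sv_error_bound(n_sv, n_train=N_TRAIN):
    """Eq. (3.12): expected error probability <= n_sv / n_train."""
    return n_sv / n_train

# (model, support vectors, test-set misclassifications) from Section 3.3
RESULTS = [
    ("statistical descriptors, spherical harmonics", 43, 16 + 2),
    ("IE_L autocorrelation vectors", 25, 13 + 10),
    ("EA_L autocorrelation vectors", 46, 19 + 11),
]
for name, n_sv, n_wrong in RESULTS:
    print(f"{name}: bound {sv_error_bound(n_sv):.2f}, "
          f"observed test error {n_wrong / 72:.2f}")
```

Models built from fewer support vectors, such as the IEL autocorrelation SVM, have a tighter bound, in line with the argument above.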
3.1.4.2 Multivariate Adaptive Regression Splines
The second classification method applied to the surface property data was the
multivariate adaptive regression splines technique87, which generates a prediction model
by selecting a weighted sum of basis functions from the set of basis functions spanning all
descriptor values, then adding basis functions to the predictive equation on a least squares
goodness-of-fit criterion, according to the general equation:
$$f(\mathbf{x}) = \sum_{m=0}^{M} \alpha_m B_m(\mathbf{x}) \tag{3.13}$$
where Bm(x) represents the set of basis functions and αm is a real number indicating the contribution of each basis function:
$$B_m(\mathbf{x}) = \prod_{k=1}^{K_m} b_{km}\left(x_{v(k,m)}\right) \tag{3.14}$$
where v(k,m) is an index of the factor used as the argument of bkm.
The basis functions here are categorical two-sided truncated linear functions used to
approximate the response to predictor variance:
$$b_{km}^{+}\left(x_{v(k,m)}\right) = I\left(x_{v(k,m)} \in A_{km}\right) \tag{3.15}$$
$$b_{km}^{-}\left(x_{v(k,m)}\right) = I\left(x_{v(k,m)} \notin A_{km}\right) \tag{3.16}$$
where I(·) is an indicator function with a value of one if its argument is true and zero if it is false, and Akm is a subset of the possible values of xv(k,m).
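Equations (3.13) to (3.16) can be evaluated with a few small functions. The sketch below assumes the usual MARS convention B0(x) = 1 for the constant term; the descriptor names, category subsets and coefficients are hypothetical, and the fitting procedure (forward selection of basis functions) is not shown.

```python
from math import prod

def b_plus(x, subset):
    """Eq. (3.15): 1 if x is in the category subset A_km, else 0."""
    return 1.0 if x in subset else 0.0

def b_minus(x, subset):
    """Eq. (3.16): 1 if x is not in A_km, else 0."""
    return 1.0 if x not in subset else 0.0

def basis_function(x, factors):
    """Eq. (3.14): B_m(x) = product of b_km over the factors of basis function m.

    factors: list of (variable name v(k,m), subset A_km, sign) with sign +1 or -1.
    """
    return prod((b_plus if sign > 0 else b_minus)(x[v], subset)
                for v, subset, sign in factors)

def mars_predict(x, alphas, bases):
    """Eq. (3.13) with B_0(x) = 1: f(x) = alpha_0 + sum_m alpha_m * B_m(x)."""
    return alphas[0] + sum(a * basis_function(x, f) for a, f in zip(alphas[1:], bases))

# Hypothetical model with two basis functions on categorical descriptors
bases = [
    [("charge", {"cationic"}, +1)],
    [("charge", {"cationic"}, +1), ("ring", {"aliphatic"}, -1)],
]
alphas = [0.1, 0.5, 0.3]
print(mars_predict({"charge": "cationic", "ring": "aromatic"}, alphas, bases))
```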
Both of the multivariate analysis techniques described (support vector machines and
multivariate adaptive regression splines) are readily applied to problems of high
dimensionality, where the number of predictors is large and the variance within the
predictors tends to add noise to underlying correlations.
3.2 Methods
The Pfizer set of 144 canonical SMILES90 (Table 3.1) was converted to 3D structures with CORINA50,51, and the structures were then geometry-optimized in the gas phase using the AM1 Hamiltonian in VAMP 9.0. The molecular surface properties were calculated and mapped with Parasurf ’06, using the marching-cube algorithm to fit to an isodensity surface of 8.0×10⁻³ e·Å⁻³. Parasurf’s 40 standard statistical descriptors (Chapter 2, Table 2.2) were augmented with three physical property descriptors calculated from the multiple regression models for logP, the free energy of solvation in octanol, ΔGsolv.(oct.), and the free energy of solvation in water, ΔGsolv.(H2O), described in the chapter on surface-integral models. These 43 descriptors were then used to train C-support vector classification (C-SVC) and ν-support vector classification (ν-SVC) machines using Chang and Lin’s libsvm91. The data set was scaled linearly to the range −1 to +1 to prevent any single descriptor from dominating the training, and a radial basis function (RBF) kernel was used, with the adjustable parameters C and γ determined by the cross-validation and grid-search routines included with libsvm. The grid search is an automated trial of (C, γ) pairs that identifies the pair giving the best cross-validation accuracy. The γ parameter is the multiplier in the RBF exponential (Equation 3.9), and the C parameter corresponds, in the libsvm authors’ formulation of the soft-margin hyperplane minimization expression, to an upper bound applied to the sum of training errors (Equation 3.10).
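The two preprocessing and tuning steps described above, linear scaling to [−1, +1] and an exhaustive (C, γ) grid search, are sketched below. The scaling is a per-descriptor min/max mapping similar in spirit to libsvm’s svm-scale tool; fitting the ranges on the training set and reusing them for the test set is an assumption, as is the power-of-two grid. `cv_accuracy` is a hypothetical callback that would train and cross-validate an RBF-kernel SVM for one (C, γ) pair.

```python
import itertools
import numpy as np

def fit_scaler(X_train, lo=-1.0, hi=1.0):
    """Return a function mapping each descriptor linearly onto [lo, hi]
    using the training-set minimum and maximum of that descriptor."""
    x_min, x_max = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # constant columns map to lo

    def transform(X):
        return lo + (hi - lo) * (X - x_min) / span

    return transform

def grid_search(cv_accuracy, log2_C=range(-5, 16, 2), log2_gamma=range(-15, 4, 2)):
    """Evaluate every (C, gamma) pair on a power-of-two grid and keep the best.

    Returns (best cross-validation accuracy, C, gamma).
    """
    best = None
    for lc, lg in itertools.product(log2_C, log2_gamma):
        C, gamma = 2.0 ** lc, 2.0 ** lg
        acc = cv_accuracy(C, gamma)
        if best is None or acc > best[0]:
            best = (acc, C, gamma)
    return best
```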
[Flowchart: SMILES → CORINA (3D structures) → VAMP 9.0 (geometry optimization) → ParaSurf ’06 (surface descriptors) → libsvm (support vector machine)]
Figure 3.2 General processing pathway from SMILES strings to phospholipidosis prediction.
The University of Minnesota’s XTAL package92 was used to train the multivariate adaptive regression splines. Piecewise-linear splines were used, with the maximum number of basis functions varied and the number of interactions set to the number of predictors in each case. A leave-one-out cross-validation scheme was used, with the training parameters varied individually until a minimum RMS error was achieved.
3.3 Results
In the following section, predictions of the SVM and MARS models are presented
in the form of confusion matrices, with the actual value of positive or negative with
respect to the induction of phospholipidosis in rows with italicized text labels and the
predicted values in columns with plus (+) and minus (−) symbols as labels.
3.3.1 Support Vector Machines
The support vector classification using the standard set of statistical descriptors
with spherical-harmonic fitting yielded a model with a prediction accuracy of 47% for
negatives and 95% for positives; an overall accuracy of 75%. This SVM (ν-SVC, ν=0.5)
uses a radial basis function (γ=0.016) and possesses 43 support vectors.
− +
Negative 14 16
Positive 2 40
Figure 3.3 Confusion matrix for test set using spherical-harmonic fitting.
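The percentages quoted with each confusion matrix follow directly from its four entries. A small helper, written for the row and column convention stated at the start of this section, reproduces the figures for Figure 3.3.

```python
def class_accuracies(matrix):
    """Per-class and overall accuracy from [[TN, FP], [FN, TP]].

    Rows are actual classes (negative, positive); columns are predictions (-, +).
    """
    (tn, fp), (fn, tp) = matrix
    return {
        "negatives": tn / (tn + fp),
        "positives": tp / (fn + tp),
        "overall": (tn + tp) / (tn + fp + fn + tp),
    }

fig_3_3 = [[14, 16], [2, 40]]  # Figure 3.3, test set
for label, value in class_accuracies(fig_3_3).items():
    print(f"{label}: {value:.0%}")
# negatives: 47%
# positives: 95%
# overall: 75%
```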
A support vector machine was also trained with the marching-cube surface-fitted properties, yielding a model with a prediction accuracy of 43% for negatives and 90% for positives; an overall accuracy of 71%. This SVM (ν-SVC, ν = 0.4) uses a radial basis function (γ = 0.016) and has 42 support vectors.
− +
Negative 13 17
Positive 4 38
Figure 3.4 Confusion matrix for the test set using marching-cube fitting.
An analysis of the correlation between the Parasurf statistical descriptors and the target classification was performed, and the ten most significant descriptors for the marching-cube-fitted surface properties were used, together with calculated logP, as an enriched predictor set (Table 3.3) to train SVMs. Three of these descriptors were from a set of newly added Parasurf statistical descriptors describing the skewness and kurtosis of the distribution of values for each of the local properties. The best ν-SVM (ν = 0.500) with a radial basis function (γ = 0.091) had 41 support vectors and an overall prediction accuracy of 75% for the test set (57% for negatives, 88% for positives).
− +
Negative 17 13
Positive 5 37
Figure 3.5 Confusion matrix for the test set using descriptors that are highly-correlated with the target
values.
Table 3.3 Enriched descriptor set: 11 descriptors.
1 logP
2 Mol. Vol.
3 Mean (−) MEP
4 MEP (−) variance
5 MEP kurtosis
6 Mean IEL
7 IEL variance
8 IEL kurtosis
9 Mean χL
10 χL variance
11 χL skewness
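The correlation analysis used to build the enriched descriptor sets is not specified in detail. One plausible sketch, assumed here, ranks each descriptor by the absolute Pearson correlation between its values and a 0/1 class label (the point-biserial correlation) and keeps the top k; the function and argument names are hypothetical.

```python
import numpy as np

def rank_descriptors(X, labels, names, k=10):
    """Rank descriptors by |Pearson r| with a 0/1 class label; return the top k.

    Constant descriptors get r = 0 instead of a division by zero.
    """
    y = np.asarray(labels, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (Xc * yc[:, None]).sum(axis=0) / np.where(denom > 0, denom, np.inf)
    order = np.argsort(-np.abs(r))[:k]
    return [(names[i], float(r[i])) for i in order]
```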
Given that the compounds observed to induce phospholipidosis are generally cationic amphiphilic structures, an additional correlation analysis was performed using pKa and logP values calculated from surface-integral models. These two surface-integral properties proved to correlate well with the target classification and were added to a list of the 18 most significant descriptors (Table 3.4), which was used to train a support vector machine (C-SVC, C = 0.8; radial basis function, γ = 0.048). The resulting model had 49 support vectors and yielded a prediction accuracy of 53% for negatives and 98% for positives, or 79% overall.
− +
Negative 16 14
Positive 1 41
Figure 3.6 Confusion matrix for the test set using additional descriptors, pKa and logP.
Table 3.4 Enriched descriptor set: 20 descriptors.
1 pKa
2 logP
3 Dipolar density, μD
4 Molecular polarizability, α
5 Mol. Wt.
6 Globularity
7 Mol. surface area
8 Mol. volume
9 Mean (+) MEP
10 Mean (−) MEP
11 MEP (−) variance
12 MEP total variance
13 MEP kurtosis
14 Mean IEL
15 IEL variance
16 IEL skewness
17 IEL kurtosis
18 Mean χL
19 χL skewness
20 ηL skewness
21 αL skewness
Six ν-SVMs were trained, each using a 128-element autocorrelation vector for one of shape, molecular electrostatic potential (plus-plus, minus-minus and plus-minus), electron affinity, and ionization potential. The SVM using electron affinity autocorrelation vectors consisted of 46 support vectors (ν = 0.400; RBF, γ = 0.008) and yielded a prediction accuracy of 37% for negatives and 74% for positives, or 58% overall.
− +
Negative 11 19
Positive 11 31
Figure 3.7 Confusion matrix for the test set using EAL autocorrelation vectors.
The SVM using ionization potential autocorrelation vectors consisted of 25 support
vectors (ν=0.200; RBF, γ=0.008) and yielded a prediction accuracy of 57% for negatives
and 76% for positives, or 68% overall.
− +
Negative 17 13
Positive 10 32
Figure 3.8 Confusion matrix for the test set using IEL autocorrelation vectors.
The SVM using shape autocorrelation vectors consisted of 40 support vectors (ν=0.300;
RBF, γ=0.008) and yielded a prediction accuracy of 37% for negatives and 83% for
positives, or 64% overall.
− +
Negative 11 19
Positive 7 35
Figure 3.9 Confusion matrix for the test set using shape autocorrelation vectors.
The SVM using molecular electrostatic potential autocorrelation vectors (minus-minus)
consisted of 42 support vectors (ν=0.200; RBF, γ=0.008) and yielded a prediction
accuracy of 47% for negatives and 88% for positives, or 71% overall.
− +
Negative 14 16
Positive 5 37
Figure 3.10 Confusion matrix for the test set using VMM autocorrelation vectors.
The SVM using molecular electrostatic potential autocorrelation vectors (plus-minus)
consisted of 44 support vectors (ν=0.200; RBF, γ=0.008) and yielded a prediction
accuracy of 37% for negatives and 76% for positives, or 60% overall.
− +
Negative 11 19
Positive 10 32
Figure 3.11 Confusion matrix for the test set using VPM autocorrelation vectors.
The SVM using molecular electrostatic potential autocorrelation vectors (plus-plus)
consisted of 40 support vectors (ν=0.200; RBF, γ=0.008) and yielded a prediction
accuracy of 43% for negatives and 90% for positives, or 71% overall.
− +
Negative 13 17
Positive 4 38
Figure 3.12 Confusion matrix for the test set using VPP autocorrelation vectors.
The sets of autocorrelation vectors were also combined (truncated to 28 increments each)
and used to train a ν-SVM (ν=0.500; RBF, γ=0.006) with 45 support vectors. The
prediction accuracy was 53% for negatives and 95% for positives for an overall accuracy
of 78%.
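The combined descriptor can be pictured as follows: each of the six autocorrelation vectors (shape, EAL, IEL, VMM, VPM, and VPP) is cut to 28 increments and the pieces are concatenated into one 168-element input vector per molecule. This is a sketch under stated assumptions: that each vector has 128 increments, that truncation keeps the first 28 of them, and that the concatenation order is fixed. The dictionary keys and the random placeholder values are hypothetical.

```python
import random

N_INCREMENTS = 128  # increments per autocorrelation vector in the single-property SVMs
KEEP = 28           # increments retained per property in the combined SVM
# Concatenation order is an assumption; any fixed order works if used consistently.
PROPERTIES = ["shape", "EAL", "IEL", "VMM", "VPM", "VPP"]


def combine_autocorrelations(vectors, keep=KEEP, order=PROPERTIES):
    """Truncate each property's autocorrelation vector and concatenate them."""
    combined = []
    for name in order:
        vec = vectors[name]
        if len(vec) < keep:
            raise ValueError(f"{name} vector has only {len(vec)} increments")
        combined.extend(vec[:keep])  # assumption: the first `keep` increments are kept
    return combined


# Random placeholder values stand in for one molecule's six autocorrelation vectors
rng = random.Random(0)
molecule = {name: [rng.random() for _ in range(N_INCREMENTS)] for name in PROPERTIES}
features = combine_autocorrelations(molecule)
print(len(features))  # 6 properties x 28 increments = 168
```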
− +
Negative 16 14
Positive 2 40
Figure 3.13 Confusion matrix for the test set using all autocorrelation vectors.
The ν-SVMs were trained with a 10-fold cross-validation scheme and no misclassification penalty bias (misclassifying a negative was penalized the same as misclassifying a positive during training). As such, the support vector machines tended toward far more false negatives than false positives.
3.3.2 Multivariate Adaptive Regression Splines
Using Autocorrelation Indices
Using the local properties mapped onto the marching-cubes surface of each structure, the 66 autocorrelation similarity indices described in Section 3.1.3 were calculated for all compounds in both sets, with the surface of valproic acid (a negative in the test set) as the reference. The final model consisted of 12 basis functions and had a generalized cross-validation error of 0.219. The prediction accuracy for the training set was 97%, with one false positive and one false negative (Figure 3.14). When the regression-spline equations were applied to the test set, the prediction accuracy was 68%, with 20 false positives and 3 false negatives (Figure 3.15).
− +
Negative 27 1
Positive 1 42
Figure 3.14 Confusion matrix for the training set.
− +
Negative 10 3
Positive 20 39
Figure 3.15 Confusion matrix for the test set.
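The generalized cross-validation (GCV) error reported for each MARS model is, in Friedman's formulation, the training mean squared error inflated by a penalty on model complexity, and the models themselves are sums of hinge functions. The minimal sketch below assumes that formulation with a user-supplied effective number of parameters C(M); the MARS program used here may penalize or scale differently, so this code is not expected to reproduce the 0.219 or 0.351 values quoted in the text. The toy data are hypothetical.

```python
def hinge(x, knot, direction=1):
    """MARS hinge basis function: max(0, x - knot) for direction=+1,
    max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))


def gcv(y_true, y_pred, effective_params):
    """Generalized cross-validation error in Friedman's form:
    mean squared residual / (1 - C(M)/N)**2."""
    n = len(y_true)
    if effective_params >= n:
        raise ValueError("effective number of parameters must be smaller than N")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return mse / (1.0 - effective_params / n) ** 2


# Hypothetical toy data fitted by a single hinge at knot 1.0: f(x) = 1 + max(0, x - 1)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.1, 1.9, 3.2, 3.9]
preds = [1.0 + hinge(x, 1.0) for x in xs]
print(preds)  # [1.0, 1.0, 2.0, 3.0, 4.0]
print(round(gcv(ys, preds, effective_params=2.0), 4))
```

The denominator grows smaller as C(M) approaches N, so adding basis functions is only rewarded when it lowers the residual error faster than it raises the penalty.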
The same procedure was applied using the standard set of Parasurf ’06 statistical descriptors. The resulting model consisted of 9 basis functions and had a generalized cross-validation error of 0.351. The prediction accuracy for the training set was 90%, with three false positives and four false negatives (Figure 3.16). When the regression equations were applied to the test set, the prediction accuracy was 72%, with 15 false positives and 5 false negatives (Figure 3.17).
− +
Negative 25 4
Positive 3 40
Figure 3.16 Confusion matrix for the training set.
− +
Negative 15 5
Positive 15 37
Figure 3.17 Confusion matrix for the test set.
Thus, the MARS models are slightly less accurate than the support vector machines, but they also predict many more false positives (while the SVMs predict many more false negatives).
3.4 Discussion
Comparison with the MARS prediction accuracies (approximately 70%) suggests that the SVMs, with an average prediction accuracy of 75.6%, do not obviously over-fit the training data. The C-SVC machine (RBF; C=0.8; γ=0.048) with the largest feature-space margin (ρ=2.405), comprising 49 support vectors, classified 57 of the 72 compounds in the test set correctly (79% accuracy). The size of the feature-space margin, ρ, is generally a useful indicator of an SVM's relative ability to generalize to new data, but because this value is expressed in the units of the n-dimensional transformed feature space of a particular SVM, it cannot be used as a standard measure across machines. More useful is a direct comparison with the predictive capacity of another multivariate analysis technique, such as the MARS analyses presented here. That the two techniques give similar predictive accuracies suggests that the best models will generalize as well as the training data allow.
Among the SVMs trained on the various descriptor sets, 16 test-set molecules were predicted correctly by every trained machine, while several others were misclassified repeatedly. In Table 3.5, the consistently well-predicted compounds are those with zero misclassifications; the most frequently misclassified compound is Ceftazidime, with four misclassifications.
Table 3.5 Test set misclassifications among trained support vector machines.
Compound Number of misclassifications
3-Methylcholanthrene 2
AC-3579 1
Acetaminophen 2
Amikacin 1
Amiodarone 1
Amitriptyline 1
Anticoman 3
Aricept 3
AY-25329 2
AY-9944 0
Bicalutamide 2
Boxidine 1
Bupropion 1
Carbon_tetrachloride 3
Ceftazidime 4
Cephaloridine 1
Chlorcyclizine 2
Chloroform 3
Chlorphentermine 1
Citalopram 1
Clindamycin 0
Clomipramine 1
Clozapine 3
Cocaine 2
Coralgil 1
Dantrolene 1
Demeclocycline 0
Desferal 3
Dibekacin 1
Diclofenac 2
Diflunisal 2
Di-isobutamide 0
Doxapram 3
Doxycycline 1
Emetine 0
Ethyl_fluclozepate 2
Famotidine 2
Fenfluramine 2
Flutamide 0
Homochlorcyclizine 2
Hydrazine 2
Hydroxyurea 1
IA3 1
Imipramine 0
Indoramin 2
L-Ethionine 2
Maprotiline 1
Meclizine 0
Mesoridazine 1
Metformin 0
Methadone 3
Methapyrilene 3
Mianserin 2
Netilmicin 0
Noxiptiline 1
Paraquat 2
Perhexiline 1
Procaine 1
Promethazine 2
Propranolol 1
Quinacrine 1
Quinidine 1
Rolitetracycline 0
SDZ_200-125 3
SKF-14336-D 0
Stavudine 0
Tacrine 0
Trifluperazine 2
Triparanol 0
Tunicamycin 1
Valproic_acid 3
Zimelidine 0
In general, structures with a negative assay result for phospholipidosis induction are under-represented in the data set. The multivariate adaptive regression splines applied to the surface-property data predicted more false positives than false negatives, while the support vector machines showed the opposite tendency. In both cases the multivariate methods tend to bias their predictions toward the correct classification of primarily positives or primarily negatives, leaving the border between the two classes rather unresolved. Overall, the best combinations of surface predictors and multivariate methods give a prediction accuracy of 75-79% for this data set. As more research on the prediction of phospholipidosis is published, larger data sets will become available for constructing computational models, and the models themselves should improve accordingly.
In a previous experiment examining the effect of charge state on the predictive capacity of the SVMs, the structures in the training set were ionized according to their predominant charge state (> 50% ionized) in solution at physiological pH (7.4) and used to train several SVMs. The structures in the test set were ionized by the same criterion and used to test the predictive capacity of the trained machines. The resulting SVMs proved excellent at predicting the charge state of the molecules but very unreliable at predicting phospholipidosis induction (~50% overall accuracy). As a result, all structures were used in their neutral forms. It would seem that, while the charge state of a given molecule may
represent its true state in solution, the effect on the surface descriptors is to diminish the
impact of the weaker, non-electrostatic components such as molecular polarizability in the
subsequent classification schemes.
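The "> 50% ionized at pH 7.4" criterion follows from the Henderson-Hasselbalch equation for a single ionizable group. The minimal sketch below treats one monoprotic acidic or basic site at a time and assumes sites titrate independently; the function names and the pKa values in the example are hypothetical, not taken from the data set.

```python
def fraction_ionized(pka, ph, kind):
    """Fraction of a monoprotic group in its charged form at a given pH.

    acid: fraction deprotonated (A-) = 1 / (1 + 10**(pKa - pH))
    base: fraction protonated (BH+)  = 1 / (1 + 10**(pH - pKa))
    """
    if kind == "acid":
        return 1.0 / (1.0 + 10 ** (pka - ph))
    if kind == "base":
        return 1.0 / (1.0 + 10 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


def use_charged_form(pka, ph=7.4, kind="base", threshold=0.5):
    """True if the group is more than 50% ionized, i.e. the charged form is used."""
    return fraction_ionized(pka, ph, kind) > threshold


# Illustrative pKa values only (not from the data set)
print(round(fraction_ionized(9.5, 7.4, "base"), 3), use_charged_form(9.5))
print(round(fraction_ionized(5.0, 7.4, "base"), 3), use_charged_form(5.0))
print(round(fraction_ionized(4.8, 7.4, "acid"), 3), use_charged_form(4.8, kind="acid"))
```

The 50% threshold is equivalent to comparing pKa with pH: a base is mostly protonated when its pKa exceeds the pH, and an acid is mostly deprotonated when its pKa is below the pH.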
The work of Tomizawa et al.93, drawing on earlier work by Ploemen94 and Fischer95, describes the use of two predictors of phospholipidosis-inducing potential (PLIP): ClogP and the net molecular charge (NC), which is based on the relative charge distribution of molecules in solution at a specified pH of 4.0, as obtained from calculated pKa values. These predictors are used to assign a PLIP risk rating to each compound in their combined set of 63 compounds. The reported prediction accuracy is 98%, with only one misclassified compound in their validation set. This simple and efficient method seems highly predictive, but it is little more than a set of rules in the manner of Lipinski’s Rule of Five96 and, as the authors indicate, its accuracy depends wholly on the accuracy of the atom/fragment-based calculations of pKa and ClogP. Moreover, aside from predicting that cationic amphiphilic species are, in fact, cationic and amphiphilic, the method does not
explore or allow for the exploration of the underlying relationships between the CAD and
its environment, in terms of the close-contact regions with the surfaces of intra-lysosomal
phospholipids or phospholipases.
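The net molecular charge (NC) used by Tomizawa et al. is derived from calculated pKa values at pH 4.0. One simple way to obtain such a value is to sum the Henderson-Hasselbalch charge contributions of every acidic and basic group. The sketch below assumes that ionizable sites titrate independently (the original work may treat microspecies differently), and the pKa values in the example are hypothetical.

```python
def net_charge(ph, basic_pkas=(), acidic_pkas=()):
    """Average net charge from independent ionizable groups.

    Each basic group contributes +1 / (1 + 10**(pH - pKa));
    each acidic group contributes -1 / (1 + 10**(pKa - pH)).
    """
    positive = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in basic_pkas)
    negative = sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in acidic_pkas)
    return positive - negative


# Hypothetical molecule: one tertiary amine (pKa 9.0) and one carboxylic acid (pKa 4.0)
print(round(net_charge(4.0, basic_pkas=[9.0], acidic_pkas=[4.0]), 3))
```

At pH 4.0 the amine in this example is essentially fully protonated while the acid is half deprotonated, so the net charge is close to +0.5.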
Thus, for efficient high-throughput virtual screening, the lighter, less computationally intensive methods are the more desirable; in this case, these are the methods of Tomizawa, Ploemen, and Fischer, whatever their actual prediction accuracies might be. In the case of our local property/SVM technique, the marginally greater computational cost is offset by the accumulation of local property information that may be used to ascribe electronic surface interactions to actual processes involved in inducing phospholipidosis. The main drawback here, again, is the present lack of interpretability of the local properties as a collection of statistical measures. However, insofar as pKa and logP themselves may be accurately predicted by local property surface-integral models (Chapter 2), it seems clear that local surface properties must play a significant role in the interplay of forces governing the initiation of phospholipid aggregation within the lysosome.
3.5 Conclusions
This study demonstrated the use of local surface properties in a support vector machine methodology to predict phospholipidosis induction from molecular surfaces described by statistical measures of their local properties. Notably, the support vector machine trained with the additional pKa and logP descriptors calculated from surface-integral models was the most predictive of the machines trained on local-property descriptors. This suggests not only the importance of these particular properties to phospholipidosis induction but, because these values are themselves calculated from the same pool of local surface properties, also the importance of the local properties in describing the range of surface interactions involved in the process.
Chapter 4
Three-Dimensional Quantitative
Structure-Activity Relationships Using
Local Properties
4.1 Introduction
4.1.1 Comparative Molecular Field Analysis
Comparative Molecular Field Analysis97, or CoMFA, is a 3D-QSAR method developed by the group of Richard D. Cramer III that models the relationship between ligands and receptors in terms of steric and electrostatic interactions. A set of molecular structures with associated activity values (log K, inhibitory concentration, etc.) is first aligned. A three-dimensional grid is then generated around the aligned molecules and a probe “atom” is placed at each grid point. The steric and electrostatic potentials that arise from the probe's proximity to the atoms of the aligned molecules are recorded and used in a partial least squares regression, with the field values as the independent variables and the activity values as the dependent variable. The Tripos SYBYL36 implementation of CoMFA uses a leave-one-out cross-validation scheme to estimate the predictive capacity of the model in terms of q2, the cross-validated r2 of the model. A three-dimensional contour map is then plotted that relates regions of steric and electrostatic potential to activity, with colored regions in the space around the aligned molecules indicating where the fields relate positively or negatively to activity (Figure 4.1).
Figure 4.1 A representation of a SYBYL CoMFA analysis of coumarin substrates
as inhibitors of cytochrome P450 2A598.
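The field-sampling step can be illustrated for the electrostatic term. The minimal sketch below evaluates, at each grid point, the Coulomb interaction energy between a +1 probe charge and the partial charges of one aligned molecule. It assumes a constant dielectric of 1 and applies no energy cutoff, so it is a simplified stand-in rather than SYBYL's CoMFA implementation; the steric term is omitted, and the atom coordinates, charges, and grid points are hypothetical.

```python
import math

COULOMB_KCAL = 332.0637  # Coulomb constant in kcal·Å/(mol·e²)


def probe_electrostatic_energy(grid_point, atoms, probe_charge=1.0, dielectric=1.0,
                               min_dist=1e-6):
    """Electrostatic energy (kcal/mol) of a probe charge at one grid point.

    atoms: iterable of ((x, y, z), partial_charge) tuples, coordinates in Å, charges in e.
    """
    energy = 0.0
    for coords, charge in atoms:
        r = max(math.dist(grid_point, coords), min_dist)  # avoid division by zero on an atom
        energy += COULOMB_KCAL * probe_charge * charge / (dielectric * r)
    return energy


# Hypothetical polar two-atom fragment and three probe positions along/around it
atoms = [((0.0, 0.0, 0.0), 0.5), ((1.2, 0.0, 0.0), -0.5)]
grid = [(-2.0, 0.0, 0.0), (0.6, 2.0, 0.0), (3.2, 0.0, 0.0)]
field = [probe_electrostatic_energy(p, atoms) for p in grid]
print([round(e, 2) for e in field])
```

Each molecule in the aligned set yields one such vector of field values, and the rows for all molecules form the descriptor matrix that enters the PLS regression.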
The most common method of molecular alignment is by substructure. The
alignment algorithms use a reference fragment as a template for aligning all other
structures, as in SYBYL. Cepos InSilico’s Parafit aligns structures using a set of
spherical harmonic functions that are produced by Parasurf to generate a molecular
surface. Local properties mapped onto the surface, i.e., onto the spherical harmonic functions, can then be used as a template for alignment by common electronic properties, so that the set of molecules need not share a common substructure. In
addition to the alignment by overlaying the spatial positions of spherical harmonics,
alignment may also be conducted by similarity of fitted local electronic properties, such as
electronegativity.
3D-QSAR Using Local Properties
The measure of the predictive capacity of a CoMFA model, according to the SYBYL manual, is found in the statistical measures r2, the squared correlation coefficient (coefficient of determination) of the model, and q2, the “predictive r2”. The latter measures the fit of the cross-validated predictions, which, in standard CoMFA and in the method employed here, come from a full leave-one-out cross-validation scheme in which each case is left out in turn and predicted by the rest of the data set. The value of r2 should always be greater than 0.6 (a good model should have r2 > 0.9), and the value of q2 falls into one of three categories:
• q2 > 0.6: The model is fairly good.
• 0.4 < q2 < 0.6: The model is questionable.
• q2 < 0.4: The model is poor.
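The r2 and q2 statistics and the q2 categories listed above can be computed from fitted and leave-one-out predictions as in the minimal sketch below. It takes q2 as 1 - PRESS/SS, where PRESS is the sum of squared leave-one-out prediction errors and SS is the sum of squared deviations of the observed values from their mean; the function names are illustrative.

```python
def _one_minus_ratio(observed, predicted):
    """1 - sum((y - yhat)**2) / sum((y - ybar)**2)."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


def r2(observed, fitted):
    """r2 from the predictions of the model fitted to all the data."""
    return _one_minus_ratio(observed, fitted)


def q2(observed, loo_predicted):
    """q2 = 1 - PRESS / SS, using leave-one-out predictions."""
    return _one_minus_ratio(observed, loo_predicted)


def q2_band(value):
    """Label a q2 value with the three categories listed above.

    The list leaves q2 = 0.6 and q2 = 0.4 unassigned; here they fall into the
    lower band.
    """
    if value > 0.6:
        return "fairly good"
    if value > 0.4:
        return "questionable"
    return "poor"


# q2 values from Table 4.2 (5-HT1A data set)
for name, value in [("chiL", 0.82511), ("V", 0.509428), ("delta_e", 0.22754)]:
    print(name, q2_band(value))
```

A model whose leave-one-out predictions are no better than simply predicting the mean activity for every compound scores q2 = 0, and q2 can become negative when the cross-validated predictions are worse than that.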
In addition, the number of vector components (described in the following section) should be kept to the minimum, with each additional component improving r2 by at least 5%. Typically, a model should contain no more than seven or eight components. In general, the fewer the components, the more straightforward the relationship between the probe parameters and the observed activity.
4.1.2 Partial Least Squares Regression
Representing in a meaningful way the large number of steric and electrostatic potential values determined at the many grid points (in some cases, thousands of values) is difficult for typical statistical methods; the problem is how to select the important predictors from such a large set of data. In QSAR/QSPR modeling, where multiple regression analyses can give poorly predicting models through over-fitting of the data, or where large numbers of factors cannot be avoided because of the nature of the experiment, a statistical method related to principal component analysis, called Partial Least Squares (PLS) analysis99, can be used to extract latent factors in the data that account for the variation in the target values. Introduced by Wold100 and co-workers around 1979, and referred to as Projection to Latent Structures in statistics texts101, the method transforms the matrices of the factors and the target values into new vector spaces such that the covariance between successive pairs of factor and target scores is as high as possible. Directions in the transformed factor space that account for the greatest variance in the responses, and that are also biased toward directions in response space that give accurate predictions, are used to model the target values indirectly. The extracted factors depend on all input variables, with each successive factor contributing less to the predictivity of the model. Thus, although the process itself involves no data reduction, only a limited number of factors, or vector components (usually determined by some measure of residual variance), is used in the final model. The most common method of determining the number of vector components to use is cross-validation, in which components are added one at a time while monitoring the cross-validated error in predicting each target
value until a minimum value is reached.
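To make the latent-factor idea concrete, the sketch below implements single-response PLS (PLS1) with the NIPALS algorithm in plain Python, together with a leave-one-out loop that picks the number of components with the lowest cross-validated standard error of prediction (SEP). The SIMPLS algorithm used for the analyses in this chapter is a different formulation; for a single response it is generally reported to give the same model as PLS1, but this code is an illustration, not the in-house program. The SEP definition, sqrt(PRESS/(n - a - 1)) for a components, is an assumed common form, and the function names and toy data are hypothetical. Only mean-centering is applied here, not re-scaling.

```python
def _mean(values):
    return sum(values) / len(values)


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS on mean-centered data; returns a model dict."""
    n, m = len(X), len(X[0])
    x_mean = [_mean([row[j] for row in X]) for j in range(m)]
    y_mean = _mean(y)
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residual
    f = [v - y_mean for v in y]                                 # y residual
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0.0:
            break  # no covariance left to model
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                            # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predict one sample by projecting it through the successive deflations."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


def loo_press(X, y, n_components):
    """Sum of squared leave-one-out prediction errors (PRESS)."""
    press = 0.0
    for k in range(len(X)):
        model = pls1_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], n_components)
        press += (y[k] - pls1_predict(model, X[k])) ** 2
    return press


def choose_components(X, y, max_components):
    """Component count with the smallest SEP = sqrt(PRESS / (n - a - 1)) (assumed form)."""
    n = len(X)
    sep = {a: (loo_press(X, y, a) / (n - a - 1)) ** 0.5
           for a in range(1, max_components + 1)}
    return min(sep, key=sep.get), sep


# Hypothetical toy data: y depends linearly on two descriptors
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [2 * a - b for a, b in X]
best, sep = choose_components(X, y, max_components=2)
model = pls1_fit(X, y, best)
print(best, round(pls1_predict(model, [3, 2]), 6))
```

With as many components as independent descriptors, PLS reduces to ordinary least squares, which is why the two-component model reproduces this exactly linear toy relationship; with grid data the useful number of components is far smaller than the number of grid points.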
4.1.3 Local Properties
It has been argued102 that steric and electrostatic fields do not present a complete
picture of drug-receptor interactions, so more recently other 3D-QSAR methods have been
developed that take additional physicochemical properties into account. One such method
is Comparative Molecular Similarity Index Analysis103 (CoMSIA), which follows the same general methodology as CoMFA but uses atomic probes for local hydrophobicity and hydrogen-bond donor and acceptor character in addition to steric and electrostatic contributions. Another major difference is that CoMSIA applies a Gaussian distance function to the grid values, so that property values do not change abruptly from one grid point to the next, and uses similarity indices between structures for each property as the factors in the PLS analysis. The indices are calculated by:
$$S_{k}^{q}(j) = -\sum_{i=1}^{n} \omega_{q}\,\omega_{i}\,e^{-\alpha r_{iq}^{2}} \qquad (4.1)$$
where $S_{k}^{q}(j)$ is the similarity index of molecule j at grid point q for property k, $\omega_{i}$ is the value of property k for atom i, $\omega_{q}$ is the value of property k for the probe atom at grid point q (charge +1, radius 1 Å, hydrophobicity +1, H-bond donor index +1, and H-bond acceptor index +1), $r_{iq}$ is the distance between grid point q and atom i, n is the number of atoms in molecule j, and α is an attenuation factor. The models that result from CoMSIA are generally more
predictive in terms of q2 than their CoMFA counterparts and have the ability to model the
binding surfaces of the ligand-substrate complex more accurately in terms that are familiar
to the biochemist.
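Equation 4.1 can be evaluated directly. The sketch below computes the similarity index of one molecule at one grid point for a single property, with the probe value ω_q = +1 as listed above. The attenuation factor α = 0.3 used as the default is an assumed illustrative value, and the atom coordinates and property values are hypothetical.

```python
import math


def comsia_index(grid_point, atoms, probe_value=1.0, alpha=0.3):
    """Similarity index of one molecule at one grid point (Eq. 4.1).

    atoms: iterable of ((x, y, z), property_value) pairs for a single property k.
    The Gaussian exp(-alpha * r**2) replaces CoMFA-style 1/r terms, so the index
    varies smoothly and stays finite even at atomic positions.
    """
    total = 0.0
    for coords, value in atoms:
        r_sq = sum((a - b) ** 2 for a, b in zip(grid_point, coords))
        total += probe_value * value * math.exp(-alpha * r_sq)
    return -total


# Hypothetical two-atom molecule with opposite property values
atoms = [((0.0, 0.0, 0.0), 0.4), ((1.5, 0.0, 0.0), -0.4)]
for point in [(0.0, 0.0, 0.0), (0.75, 0.0, 0.0), (10.0, 0.0, 0.0)]:
    print(point, round(comsia_index(point, atoms), 4))
```

Because the Gaussian never diverges, grid points inside the molecule need no special cutoff, which is one reason CoMSIA fields are smoother than CoMFA fields.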
In the interest of expanding the descriptive vocabulary of 3D-QSAR, a method has
been developed that uses local properties to model the electronic interactions of drug
binding surfaces using a methodology analogous to CoMFA. The approach described
below begins with the standard steric and electrostatic descriptors of CoMFA, in the form
of the local electron density and the local molecular electrostatic potential, respectively.
To these are added the local properties of electron affinity, ionization potential,
electronegativity, hardness, and polarizability. The result is an augmented molecular field
analysis that is interpreted in a 3D-graphical manner identical to that of CoMFA, but with
additional property fields that may be used alone or in combination to reveal important
intermolecular interactions not elucidated by shape and charge fields alone.
Figure 4.2 A set of aligned structures in a CoMFA grid.
4.2 Computational Methods
The structures in the following data sets were aligned using SYBYL 7.0 via
conversion to Tripos mol2 format, followed by conversion back into MDL SD format.
Semi-empirical MO calculations with the AM1 Hamiltonian were performed on each structure using VAMP 9.0 to calculate charges and orbital information, with or without geometry optimization as indicated. A three-dimensional grid with a point spacing of 2 Å and a 4 Å border was generated by a script and used in the calculation of seven
local properties: electron density δe, electron affinity EAL, electronegativity χL, hardness
ηL, ionization potential IEL, electrostatic potential V, and polarizability αL at each grid
point using Parasurf ’07104 (Figure 4.2).
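The grid construction, together with the removal of interior points described next, can be sketched as follows: a box enclosing all aligned atoms is padded by a 4 Å border and filled with points at 2 Å spacing, and any point lying within the 1.16 Å "generalized" radius of an atom is discarded. This is an illustrative reconstruction, not the original script; in particular, anchoring the grid at the lower corner of the padded box and rounding the point count up are assumptions, and the two-atom example is hypothetical.

```python
import itertools
import math


def make_grid(coords, spacing=2.0, border=4.0):
    """Regular grid enclosing all aligned atoms plus a border (units: Å).

    The number of points per axis is rounded up, so the border is at least
    `border` on every side.
    """
    lows = [min(c[k] for c in coords) - border for k in range(3)]
    highs = [max(c[k] for c in coords) + border for k in range(3)]
    axes = []
    for lo, hi in zip(lows, highs):
        n_points = math.ceil((hi - lo) / spacing) + 1
        axes.append([lo + i * spacing for i in range(n_points)])
    return list(itertools.product(*axes))


def remove_interior(grid, coords, radius=1.16):
    """Keep only grid points at least `radius` Å from every atom."""
    return [p for p in grid if all(math.dist(p, c) >= radius for c in coords)]


# Hypothetical two-atom "aligned set"
atoms = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0)]
grid = make_grid(atoms)
kept = remove_interior(grid, atoms)
print(len(grid), len(kept))
```

In practice the atom list would contain every atom of every aligned molecule, so that a point is kept only if it lies outside all of them.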
Figure 4.3 Representation of the local-property/activity CoMFA field for EAL.
Points interior to the molecular surface were removed from the grid by using a
“generalized” van der Waals radius of 1.16 Ångstroms in order to ensure that property
values that bear no direct relation to surface activity would not appear in the PLS analyses.
This atomic radius, which is slightly smaller than the van der Waals radius for the
hydrogen atom (1.20Å), was chosen in order to leave enough surface electron density to
use the local electron density as a steric parameter in the 3D-QSAR analysis. The local
property values at the grid points were then used as independent variables in separate
partial least squares regressions, using associated physical property values as target values.
The partial least squares analyses were performed with an in-house program using the SIMPLS105 algorithm and a full cross-validation scheme (i.e., each case is excluded and predicted in turn), re-centering and re-scaling the included data for each run. The PLS regressions were initially carried out to ten vector components in order to determine the maximum number of components to include in the final model, using the cross-validated standard error of prediction (SEP) to choose the appropriate number of components (as in SYBYL). Where PLS analyses were performed using single local properties, the property data were normalized by the mean. The regression coefficients of the final model were then used to generate a three-dimensional representation of the property space with colored spheres using Pymol45. Grid points with a positive relationship to the target property are colored red, while those with a negative relationship are colored blue. In addition, the size of each grid sphere, determined by the standardized magnitude of its regression coefficient, represents the strength of the relationship between the local property at that point and the target value (see Figure 4.3).
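The color-and-size encoding of the regression coefficients can be expressed as a small mapping function. This sketch assumes that "standardized magnitude" means the absolute coefficient divided by the largest absolute coefficient, and that sphere radii scale linearly between illustrative minimum and maximum values; the function name and radius limits are hypothetical, and the PyMOL rendering step is not shown.

```python
def coefficient_spheres(coefficients, min_radius=0.1, max_radius=0.8):
    """Map PLS regression coefficients to (color, radius) pairs for display.

    Positive coefficients -> "red", negative -> "blue". The radius grows linearly
    with |coefficient| / max|coefficient|. Exactly zero coefficients are drawn
    "grey" at the minimum radius (a choice made here; the text does not cover it).
    """
    largest = max(abs(c) for c in coefficients)
    spheres = []
    for c in coefficients:
        scale = abs(c) / largest if largest > 0 else 0.0
        if c > 0:
            color = "red"
        elif c < 0:
            color = "blue"
        else:
            color = "grey"
        radius = min_radius + scale * (max_radius - min_radius)
        spheres.append((color, radius))
    return spheres


# Hypothetical coefficients for three grid points
for color, radius in coefficient_spheres([0.5, -1.0, 0.0]):
    print(color, round(radius, 2))
```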
4.3 Results and Discussion
4.3.1 Serotonin Receptor Agonists/Antagonists
The common 5-HT1A and α1-adrenergic agonist/antagonist data set consists of 23 thienopyrimidinone structures in Tripos mol2 format that had been optimized previously106 with MM3107. The structures in Table 4.1 were aligned to structure 23 and converted to MDL SD format. Single-point AM1 calculations were used to calculate the charges and orbital information needed by Parasurf ’07 for the grid points (4 Å border, 2 Å spacing) surrounding the aligned molecules. The pIC50 values for 1) 5-HT1A receptor binding and 2) α1-adrenergic receptor binding of each compound were used as target values in PLS analyses, where IC50 is the concentration of ligand that causes 50% dissociation of [3H]-8-hydroxy-2-(di-N-propylamino)tetralin from the 5-HT1A receptor or
50% dissociation of [3H]-prazosin from the α1 receptor in binding assays.
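Because the target values are negative decadic logarithms of molar concentrations, converting between IC50 and pIC50 is a one-line calculation. The sketch below assumes IC50 is given in mol/L (a value in nM must first be multiplied by 1e-9); the function names are illustrative, and the same conversion applies to Ki and pKi.

```python
import math


def pic50_from_molar(ic50_molar):
    """pIC50 = -log10(IC50 in mol/L)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_molar)


def molar_from_pic50(pic50):
    """Inverse conversion back to a molar concentration."""
    return 10 ** (-pic50)


# Structure 23 in Table 4.1 has a 5-HT1A pIC50 of 9.52, i.e. an IC50 of about 0.3 nM
ic50_nm = molar_from_pic50(9.52) * 1e9
print(round(ic50_nm, 2))
```

On this scale, one pIC50 unit corresponds to a tenfold change in potency, so the spread of roughly 5.7 to 9.5 in Table 4.1 spans almost four orders of magnitude in IC50.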
The PLS analysis of the aligned data set using all local properties yielded a q2 of
0.761 with one vector component, an overall SEP of 0.793, and a model r2 of 0.870. The
cross-validated predictions are presented below in Figure 4.4. The results of the PLS
analyses using individual local properties are presented in Table 4.2.
Table 4.1 Thienopyrimidinone 5-HT1A and α1 agonists/antagonists.
[Structure drawings: general thienopyrimidinone scaffold with substituents R1-R4 (compounds 1-20 and 23), and separately drawn structures 21 and 22]
Structure R1 R2 R3 R4 5-HT1A pIC50 α1 pIC50
1 Me Me H 2-Cl-Ph 6.34 6.79
2 Me Me H 3-Cl-Ph 6.01 6.52
3 Me Me H 2-OMe-Ph 7.62 7.40
4 Me Me H 1-Naphthyl 6.45 6.05
5 Me Me H 2-Pyrimidyl 6.65 5.96
6 -(CH2)4- H 2-Cl-Ph 6.03 6.78
7 -(CH2)4- H 2-OMe-Ph 7.23 7.42
8 -(CH2)4- H 1-Naphthyl 6.43 6.35
9 -(CH2)4- H 2-Cl-Ph 6.30 5.74
10 H Ph H 2-OMethenyl 6.41 6.65
11 H Ph H 1-Naphthyl 5.70 5.61
12 -(CH=CH)- H 2-OMe-Ph 7.34 7.04
13 H H NH2 2-OMe-Ph 8.92 8.54
14 -(CH2)4- Me 2-OMe-Ph 8.15 7.19
15 -(CH2)4- NH2 2-OMe-Ph 8.89 7.41
16 Me Me NH2 Ph 8.48 7.37
17 Me Me Me 2-OMe-Ph 8.52 7.57
18 Me Me NH-Ph 2-OMe-Ph 6.30 7.49
19 Me Me Me 2-Pyrimidyl 7.19 5.69
20 Me Me NH2 2-Pyrimidyl 8.17 6.30
21 − − − − 9.10 7.44
22 − − − − 9.30 8.40
23 Me Me NH2 2-OMe-Ph 9.52 8.14
[Plot: calculated pIC50 (y-axis) vs. experimental pIC50 (x-axis), axes 4 to 9]
Figure 4.4 Cross-validation predictions vs. actual values of 5-HT1A receptor binding pIC50.
Table 4.2 Partial least squares regression results for the 5-HT1A data set.
δe EAL χL ηL IEL V αL ALL
r2 0.682911 0.759950 0.974536 0.818111 0.936360 0.613866 0.780818 0.869525
q2 0.22754 0.66781 0.82511 0.698682 0.750238 0.509428 0.699553 0.761065
SEP 1.245238 0.913798 0.690154 0.874884 0.81127 1.059668 0.874447 0.792926
Components 1 1 2 1 2 1 1 1
With two exceptions, none of the PLS regressions required more than two vector components to return q2 values greater than 0.6 (Table 4.2). The exceptions are the electron density (δe) and the molecular electrostatic potential (V), the terms analogous to the two standard CoMFA parameters, which did not reach a q2 of 0.6. Judging from the q2 value (0.825) of the regression using only the local electronegativity, a 3D-QSAR model based on this single local property (Figure 4.5) appears sufficient to predict the 5-HT1A inhibitory concentrations for the data set.
[Plot: calculated pIC50 (y-axis) vs. experimental pIC50 (x-axis), axes 4 to 9]
Figure 4.5 Cross-validation predictions vs. actual values of 5-HT1A receptor
binding pIC50 using only local electronegativity.
Figure 4.6 Local-electronegativity/activity field for the aligned 5-HT1A data set.
In Figure 4.6, a strong positive relationship with activity is observed near the distal end of the nitrogenous substituents of the aligned thienopyrimidinones, while a larger region with a negative relationship to activity lies near the distal nitrogen of the aligned piperazine rings.
4.3.2 Adrenergic Receptor Agonists/Antagonists
The PLS analysis of the aligned data set using all local properties yielded a q2 of 0.700 with four vector components, an overall SEP of 0.602, and a model r2 of 0.980. The cross-validated predictions are presented below in Figure 4.7, and the results of the PLS analyses using individual local properties are presented in Table 4.3.
[Plot: calculated pIC50 (y-axis) vs. experimental pIC50 (x-axis), axes 4 to 9]
Figure 4.7 Cross-validation predictions vs. actual values of α1-adrenergic receptor binding pIC50.
Table 4.3 Partial least squares regression results for the α1-adrenergic receptor data set.
δe EAL χL ηL IEL V αL ALL
r2 0.837437 0.992341 0.964925 0.976671 0.949911 0.480773 0.656486 0.980002
q2 0.300313 0.676767 0.765624 0.635388 0.677426 0.296878 0.519698 0.700124
SEP 0.817689 0.633365 0.545112 0.659089 0.621647 0.820678 0.724712 0.60231
Components 2 7 3 4 3 2 1 4
Here again, with a q2 of 0.766, the local electronegativity model predicts slightly better than the combined local property model, and the two standard CoMFA parameters are the poorest-performing of the local properties. Additionally, more vector components were required to construct most of the α1 receptor models than were required for the corresponding 5-HT1A models. The plot of experimental versus calculated pIC50 is presented below in Figure 4.8.
[Plot: calculated pIC50 (y-axis) vs. experimental pIC50 (x-axis), axes 4 to 9]
Figure 4.8 Cross-validation predictions vs. actual values of α1-adrenergic
receptor binding pIC50 using only local electronegativity.
Figure 4.9 Local electronegativity/activity field for the aligned α1-receptor data set.
As in the case of the 5-HT1A data, a positive response in local electronegativity near the
distal end of the thienopyrimidinone rings was observed for the α1 receptor data (Figure
4.9). There are several negative response regions, however, that describe a rather
complicated response in local electronegativity, primarily on the thienopyrimidinone end
of the structures. The property field regions of positive response common to both sets of
data indicate a relationship between electronegative (nitrogenous) substituents on the
thienopyrimidinone ring and inhibitory activity, while the regions of negative impact are
different for the two data sets.
4.3.3 Dopamine D4 Antagonists
The D4 dopamine antagonist data set consists of 29 piperazine structures in MDL SD format that had been optimized previously108 with MM3. The structures were converted to mol2 format, aligned to a central substructure using SYBYL 7.0, and converted back to SD format. Single-point AM1 calculations using VAMP 9.0 provided the charges and orbital information used as input for Parasurf ’07, which calculated the local properties at each point of a three-dimensional grid (4 Å border, 2 Å spacing) surrounding the aligned molecules. The pKi values (the negative logarithm of the inhibition constant) for dopamine D4 receptor binding of each compound were used as the target values in PLS analyses.
Table 4.4 Piperazine dopamine D4 receptor antagonists.
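[Structure drawings: piperazine scaffold for compounds 1-16 and 26-29 (substituents R1 and R2, ring positions 4-7), and a chlorinated scaffold with variable R1 for compounds 17-25]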
Structure R1 R2 exp. pKi calc. pKi
1 p-Cl-Ph H 8.64 7.935
2 Ph H 7.78 7.379
3 p-I-Ph H 8.52 8.047
4 p-F-Ph H 7.70 7.834
5 Me H 5.14 7.028
6 Et H 4.62 5.911
7 p-Cl-Ph 4-Me 7.30 7.522
8 p-Cl-Ph 7-I 8.30 9.439
9 p-Cl-Ph 7-Me 8.57 7.921
10 p-Cl-Ph 7-ethinyl 8.91 7.905
11 cyclohexyl H 5.35 6.698
12 m-Cl-Ph H 8.41 7.661
13 p-Cl-Ph 4,5-benzo 5.74 7.513
14 p-Cl-Ph 6,7-benzo 6.85 8.194
15 m-Cl-Ph 4,5-benzo 6.10 6.532
16 p-Cl-Ph 6,7-benzo 6.66 7.297
17 (structure drawing) − 7.58 7.805
18 (structure drawing) − 8.02 8.426
19 (structure drawing) − 7.41 8.982
20 (structure drawing) − 8.24 8.834
21 (structure drawing) − 7.74 7.822
22 (structure drawing) − 8.66 7.64
23 (structure drawing) − 6.60 7.705
24 (structure drawing) − 9.21 7.581
25 (structure drawing) − 7.80 7.101
26 p-ethinyl-Ph H 8.36 8.036
27 m,p-Cl-Ph H 8.25 7.708
28 m-CF3-Ph H 8.72 7.646
29 H CH2OH 7.71 7.935
The PLS regression model using all local properties yields a q2 value of 0.623 with three vector components, an overall SEP of 0.960, and a model r2 of 0.906. A plot of the cross-validated predictions is presented in Figure 4.11. The results of the PLS analyses using individual local properties are presented in Table 4.5.
Table 4.5 Partial least squares regression results for the D4 receptor data set.
δe EAL χL ηL IEL V αL ALL
r2 0.886454 0.533339 0.900134 0.922590 0.900603 0.930679 0.375814 0.905785
q2 0.67420 0.27462 0.626792 0.566274 0.616129 0.778059 0.098352 0.623449
SEP 0.94303 1.188695 0.955174 1.012352 0.972363 0.784165 1.235116 0.959651
Components 3 1 3 4 3 7 1 3
Figure 4.10 Molecular electrostatic potential field for the aligned D4 receptor set.
With this data set, the electrostatic potential (Figure 4.10) and electron density
regressions yield the most predictive models, with local electronegativity and local
ionization potential also providing significant contributions. It is, therefore, to be
expected that a standard CoMFA would produce a comparable model and, indeed, the
reported q2 for the standard analysis with this data set was 0.739, with an SEP of 0.734
using seven vector components108.
[Plot: calculated pKi (y-axis) vs. experimental pKi (x-axis), axes 4 to 9]
Figure 4.11 Cross-validation predictions vs. actual values of dopamine D4 receptor binding pKi.
The original article described the use of an all-orientation109 sampling of CoMFA property
space to return the best possible q2 value, which may over-estimate the relationship
between the observed activity and the combined steric and electrostatic parameters.
4.3.4 Avian Influenza Neuraminidase Inhibitors
A subset of 21 2D structures and accompanying pIC50 values (Table 4.6) was taken from a larger set of 126 avian influenza neuraminidase inhibitors110. These were converted to 3D MDL SD files using Molecular Networks’ CORINA and subsequently geometry-optimized at the AM1 level with VAMP 9.0. The structures were aligned with SYBYL 7.0, and the set of Parasurf local properties was calculated for a grid (4 Å border, 2 Å spacing) surrounding the structures.
Table 4.6 Avian influenza neuraminidase inhibitors.
[Structure drawing: ring scaffold bearing COOH, R1O, R2, and NHR3 substituents]
Structure R1 R2 R3 exp. pIC50 calc. pIC50
1 CHEt2 NH2 COMe 9.00 7.15
2 C3H7 NH2 COMe 6.89 6.44
3 CH2CH2CF3 NH2 COMe 6.65 5.59
4 CH2CH=CH2 NH2 COMe 5.66 6.66
5 Me NH2 COMe 5.43 6.74
6 C2H5 NH2 COMe 5.70 7.08
7 C4H9 NH2 COMe 6.52 6.44
8 C5H11 NH2 COMe 6.70 6.86
9 C6H13 NH2 COMe 6.82 6.61
10 C7H15 NH2 COMe 6.57 6.93
11 C8H17 NH2 COMe 6.74 6.81
12 C9H19 NH2 COMe 6.68 6.40
13 C10H21 NH2 COMe 6.22 6.39
14 CH2CHMe2 NH2 COMe 6.70 5.56
15 CH(Me)Et (S) NH2 COMe 8.05 7.17
16 CH2C6H5 NH2 COMe 6.21 7.06
17 H NHC(=NH)NH2 COMe 7.00 7.75
18 C3H7 NHC(=NH)NH2 COMe 8.70 8.30
19 C4H9 NHC(=NH)NH2 COMe 8.52 8.43
20 CH(Me)Et NHC(=NH)NH2 COMe 9.30 9.86
21 C3H7 NH2 COMe 5.82 6.76
The PLS regression model using all local properties yields a q2 value of 0.678 with
four vector components and an overall SEP of 0.847, and a model r2 of 0.965. As can be
seen in Table 4.7, all of the local properties contribute significantly to the predictivity of
the model, with the exception of the local electron density and the local molecular
electrostatic potential. The regressions of local electron affinity and local molecular
polarizability are the best predictors of activity in this case, with q2 values greater than 0.7.
Either of these local properties alone should therefore be adequate to predict the observed
activity and to indicate the dependence of the activity on EAL or αL.
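The cross-validated statistics quoted above can be illustrated with a leave-one-out loop around a single-response PLS regression. This is a minimal pure-Python sketch, not the SYBYL implementation. It assumes q2 = 1 - PRESS/SS and the common CoMFA convention SEP = sqrt(PRESS/(n - c - 1)) for c components. It also omits the column scaling and filtering usually applied to field data, so it is not expected to reproduce the tabulated values.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    components = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # y is already fully explained
            break
        w = [v / norm for v in w]                  # weight vector
        t = [_dot(row, w) for row in E]            # scores
        tt = _dot(t, t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = _dot(f, t) / tt
        # Deflate X and y by the extracted latent variable.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict y for one descriptor vector by sequential deflation."""
    x_mean, y_mean, components = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = _dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat


def loo_q2_sep(X, y, n_components):
    """Leave-one-out q2 and SEP for a PLS1 model with n_components."""
    n = len(y)
    press = 0.0
    for i in range(n):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    y_mean = sum(y) / n
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss, math.sqrt(press / (n - n_components - 1))


X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]
y = [2 * a + 3 * b for a, b in X]  # noise-free linear response
q2, sep = loo_q2_sep(X, y, n_components=2)
```

The response in this toy example is exactly linear in two descriptors. Two components therefore reproduce it, and q2 is 1 to within rounding. With real field data, PRESS grows as noisy columns are added, which is one reason individual properties can outperform the combined set.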
Table 4.7 Partial least squares regression results for the neuraminidase inhibitor data set.
δe EAL χL ηL IEL V αL ALL
r2 0.711406 0.953357 0.963215 0.986630 0.982036 0.606607 0.913741 0.965417
q2 0.559258 0.729133 0.680904 0.690092 0.681546 0.463248 0.733392 0.677654
SEP 0.937534 0.858793 0.838464 0.825561 0.836767 1.008261 0.782692 0.847242
Components 2 4 4 6 5 1 3 4
Figure 4.12 Cross-validation predictions vs. actual values of neuraminidase inhibition pIC50. [Plot: calculated pIC50 (5 to 9) vs. experimental pIC50 (5 to 9).]
The regions of both positive and negative response for the local molecular polarizability
field are situated very near the central ring on the same side of the ring, suggesting a
somewhat complex positive response to polarizable R1 moieties.
Figure 4.13 Local molecular polarizability field for the aligned neuraminidase inhibitor set.
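Field maps such as Figure 4.13 are conventionally derived in CoMFA from the product of each grid point's PLS coefficient and the standard deviation of the descriptor at that point (StDev*Coeff). These products are then contoured at high and low contribution levels. The sketch below assumes that convention and the commonly used 80%/20% levels. The settings used for the figures here are not stated, and the function names are hypothetical.

```python
import statistics


def stdev_coeff(X, coefficients):
    """StDev*Coeff value for every grid point.

    X            -- rows = molecules, columns = local-property value per grid point
    coefficients -- PLS regression coefficient for each grid point
    """
    columns = zip(*X)  # one column of values per grid point
    return [statistics.pstdev(col) * c for col, c in zip(columns, coefficients)]


def contour_regions(field, high=0.8, low=0.2):
    """Indices of grid points at or above the `high` percentile of the field
    (positive-response region) and at or below the `low` percentile
    (negative-response region)."""
    ranked = sorted(field)
    top = ranked[round(high * (len(ranked) - 1))]
    bottom = ranked[round(low * (len(ranked) - 1))]
    positive = [i for i, v in enumerate(field) if v >= top]
    negative = [i for i, v in enumerate(field) if v <= bottom]
    return positive, negative
```

Weighting by the standard deviation suppresses grid points where the property hardly varies across the data set, even if their coefficients are large.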
4.3.5 Mutagenic Tertiary Amides
A set of 49 N-acyloxy-N-alkoxyamide structures111,112 that possess a fully sp3-hybridized
central amide nitrogen, together with their mutagenicity values (log[mutagenicity] at a
concentration of 1 μmol/plate in Salmonella typhimurium TA100), is presented in
Table 4.8. These compounds have been shown to react with N7 of guanine in the major
groove of DNA through an SN2 mechanism involving displacement of the N-acyloxy
group. Somewhat counter-intuitively, however, the less reactive the compound, the more
mutagenic it is113.
The 2D structures were converted to 3D MDL SD files using CORINA and
subsequently geometry-optimized with AM1. The structures were aligned using SYBYL
7.0, and the set of Parasurf local properties was calculated for a grid (4Å border, 2Å
spacing) surrounding the structures. The PLS regression model using all local properties
yields a q2 value of 0.694 with four vector components, an overall SEP of 0.266 and a
model r2 of 0.932. The results of the local property regressions are presented in Table 4.9
below, and the cross-validated predictions of the model are presented in Figure 4.14.
Table 4.8 The mutagenic tertiary amides data set.
[General structure of the N-acyloxy-N-alkoxyamides not reproduced; substituents R1, R2 and R3 as listed.]
Structure  R1              R2             R3
1          Me              φ              p-Br-φ-CH2-
2          2,6-diMe-φ-     φ              Bu
3          3,5-diMe-φ-     φ              Bu
4          Me              3,5-diMe-φ-    Bu
5          Me              Me             Bu
6          Me              p-Br-φ-        Bu
7          Me              φ              Bu
8          Me              p-Cl-φ-        Bu
9          Me              p-Me-φ-        Bu
10         Me              m-NO2-φ-       Bu
11         Me              p-NO2-φ-       Bu
12         Me              p-φ-φ-         Bu
13         Me              p-t-Bu-φ-      Bu
14         adamantanyl     φ              Bu
15         Bu              φ              Bu
16         φ               adamantanyl    Bu
17         φ               φ              Bu
18         φ               i-Pr           Bu
19         φ               t-Bu-CH2-      Bu
20         φ               Et             Bu
21         i-Pr            φ              Bu
22         t-Bu-CH2-       φ              Bu
23         t-Bu            φ              Bu
24         Me              Me             φ-CH2-
25         Me              φ              φ-CH2-
26         φ               φ              φ-CH2-
27         φ               p-t-Bu-φ-      φ-CH2-
28         p-benzaldehyde  φ              φ-CH2-
29         p-Cl-φ-         φ              φ-CH2-
30         p-cyano-φ-      φ              φ-CH2-
31         p-Me-φ-         φ              φ-CH2-
32         p-MeO-φ-        φ              φ-CH2-
33         m-MeO-φ-        φ              φ-CH2-
34         m-NO2-φ-        φ              φ-CH2-
35         p-NO2-φ-        φ              φ-CH2-
36         p-φ-φ-          φ              φ-CH2-
37         p-t-Bu-φ-       φ              φ-CH2-
38         Me              φ              p-Cl-φ-CH2-
39         Me              φ              Et
40         Me              φ              i-Pr
41         Me              φ              p-Me-φ-CH2-
42         Me              φ              p-MeO-φ-CH2-
43         Me              φ              n-octyl
44         Me              φ              p-φ-φ-CH2-
45         Me              φ              p-φ-O-φ-CH2-
46         Me              φ              Pr
47         Me              φ              p-t-Bu-φ-CH2-
48         Me              p-t-Bu-φ-      p-t-Bu-φ-CH2-
49         φ               φ              p-t-Bu-φ-CH2-
Table 4.9 Partial least squares regression results for the mutagenic tertiary amides data set.
            δe        EAL       χL        ηL        IEL       V         αL        ALL
r2          0.790466  0.846869  0.726179  0.708745  0.717700  0.365975  0.531608  0.932495
q2          0.560957  0.621656  0.636329  0.615513  0.627104  0.322679  0.287702  0.693979
SEP         0.304009  0.291694  0.28339   0.289702  0.286228  0.350909  0.395195  0.266125
Components  3         4         1         1         1         1         1         4
Here again, the standard CoMFA steric and electrostatic parameters would be expected to
produce a less predictive model, since the local electron density and the local MEP give
lower q2 values and larger SEP values than the other local properties. The q2 for local
polarizability is also very low with this data set.
In one of the original papers describing these compounds112, steric factors are said to
affect mutagenicity in two respects. The first concerns the ability of the amides to enter the
major groove of DNA in such a way that a stable transition-state geometry can be
achieved; the second involves an observed decrease in SN2 reactivity with increased bulk
near the amide nitrogen. In our analysis, the major responses to activity were found in the
local properties EAL, χL, ηL and IEL. The local electronegativity field indicates a
relationship between mutagenicity and the electron-withdrawing character of
para-substituents on the alkoxy phenyl ring moieties, as seen in Figure 4.15, where the
large red sphere is situated above the para position (with respect to the central nitrogen) of
the benzoxy ring(s).
Figure 4.14 Cross-validation predictions vs. actual values for the mutagenic amides data set. [Plot: calculated (1 to 3) vs. experimental (1 to 3).]
Figure 4.15 Local electronegativity field for the aligned mutagenic amides set.
The local electron affinity field shown in Figure 4.16, which also contributes strongly to
predictivity, indicates a strong positive relationship with activity on the “puckered” side of
the central nitrogen and a strong negative relationship with activity on the opposite side.
Figure 4.16 Local electron affinity field for the aligned mutagenic amides set.
4.3.6 The Effect of Grid Orientation on Predictivity
A relationship between the orientation of the grid surrounding the aligned structures and
the predictive q2 has recently been observed109. An effect of grid spacing on the
predictivity of CoMFA models has also been reported114. It is now well documented that
CoMFA analyses give a range of q2 values as the grid spacing is changed or the grid is
re-oriented around the aligned structures115. Tropsha et al. report that q2 may vary by as
much as 0.5 between grid re-orientations, which is tantamount to the difference between a
predictive and a non-predictive model.
Wang et al.109 have presented a grid orientation routine that samples all possible
translations and rotations to return the best possible CoMFA q2 value. It would seem that
this approach defeats the purpose of using a validation scheme to estimate the predictivity
of the model, since it exploits a defect in the method to obtain the best possible statistic.
Böhm et al.114 have attributed the dependence of q2 on grid orientation to the steepness of
the calculated steric and electrostatic potentials at the lattice points at a typical 2Å spacing,
and to the use of arbitrary cutoff values. If the grid spacing and orientation lead to
discontinuous values that bias the partial least squares model toward underestimated
predictivity (low q2 values), then might not the same discontinuities also lead to an
overestimation of predictivity? A more reasonable use of an all-orientation method might
take the median q2 value from the resulting distribution of values, in a manner similar to
the method used by Kroemer et al.116 in their examination of the effect of cross-validation
techniques on predictivity.
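Reporting the median rather than the maximum over all sampled orientations takes only a few lines. This sketch assumes that a q2 value has already been computed for each grid orientation, for example with a leave-one-out PLS loop. The function name `summarize_q2` is hypothetical.

```python
import statistics


def summarize_q2(q2_by_angle):
    """Summarize cross-validated q2 values obtained for several grid orientations.

    q2_by_angle -- mapping of rotation angle (degrees) to q2
    Returns the median (a less optimistic summary than the maximum) together
    with the extremes and the spread across orientations.
    """
    values = list(q2_by_angle.values())
    return {
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }
```

A large range indicates that the model's apparent predictivity depends strongly on an arbitrary choice of grid placement. In that case, quoting only the maximum would overstate it.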
In an effort to offset the effect of grid orientation, the use of grids with 1Å spacing
has been reported108,116, but such grids have not generally been adopted, possibly because
of the significant additional computational expense. For instance, a 1Å-spaced grid with
the same dimensions as a 2Å-spaced grid has roughly eight times as many points, resulting
in about eight times as many independent variables in the PLS analysis, which can easily
number ~10,000 when several property descriptors are used. Additionally, the smaller grid
spacing may not improve q2. Indeed, neither the analyses using the local properties
presented here nor those in Kroemer’s investigation116 exhibit an improvement in q2 with
the smaller grid spacing.
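The "roughly eight times" figure follows from halving the spacing along each of the three axes (2³ = 8). For finite boxes the ratio is somewhat smaller, because each axis carries one extra end point. A quick count, assuming cubic boxes whose edges are multiples of the spacing (the box sizes are illustrative only):

```python
import math


def lattice_points(box_edges, spacing):
    """Number of lattice points in an axis-aligned box (edges and spacing in Å)."""
    n = 1
    for edge in box_edges:
        n *= int(math.floor(edge / spacing + 1e-9)) + 1
    return n


for edge in (10.0, 20.0, 40.0):
    coarse = lattice_points((edge,) * 3, 2.0)
    fine = lattice_points((edge,) * 3, 1.0)
    print(f"{edge:4.0f} Å box: {coarse:6d} vs {fine:6d} points, ratio {fine / coarse:.2f}")
```

The number of PLS columns is the point count multiplied by the number of property fields. It therefore grows by the same factor.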
To investigate the response of q2 for the Parasurf-generated local properties to
rotation of the grid, two aligned data sets were placed in a series of grids rotated in 15°
steps from −90° to +90° about a common Z-axis. Partial least squares analyses were
performed for each local property, and for the combined set of properties, at each grid
rotation.
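The rotation protocol can be sketched as follows. The sketch assumes that the grid (rather than the molecules) is rotated about a Z-axis through a chosen centre, such as the centroid of the aligned set. The two choices are equivalent up to the sign of the angle. The function name is hypothetical.

```python
import math


def rotate_about_z(points, angle_deg, center=(0.0, 0.0, 0.0)):
    """Rotate (x, y, z) points by angle_deg about a Z-axis through `center`."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cy, _ = center
    rotated = []
    for x, y, z in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + cos_a * dx - sin_a * dy,
                        cy + sin_a * dx + cos_a * dy,
                        z))  # z is unchanged by a rotation about Z
    return rotated


# The 13 orientations: -90° to +90° in 15° steps.
angles = list(range(-90, 91, 15))
```

The local properties would then be evaluated on `rotate_about_z(grid, angle, center)` for each entry in `angles`, and a separate PLS analysis run for every orientation.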
Table 4.10 Isoquinoline influenza neuraminidase inhibitors data set.
[Core structure drawing not reproduced; X denotes the substituent and its position.]
Structure  X          log 1/C
1          4-NO2      2.90
2          4-Br       2.77
3          4-CN       2.84
4          4-Cl       2.81
5          4-F        2.63
6          H          2.58
7          4-CH3      2.68
8          4-OCH3     2.62
9          4-OH       2.24
10         4-OC2H5    2.65
11         4-OC3H7    2.79
12         4-OC4H9    2.78
13         4-C(CH3)3  3.15
14         3-CH3      2.78
15         3-F        2.67
16         3-Cl       2.82
The set of 16 aligned influenza neuraminidase inhibitors117 in Table 4.10 exhibited a
very poor correlation with activity (log 1/C), with an average q2 value of −0.066 for the
combined set of local property descriptors. The q2 values of all the individual local
property PLS analyses varied greatly among the rotated grids. The electron density
models, which gave the best overall scores, yielded a maximum q2 value of 0.682 and a
minimum of −0.012, a range of 0.694 q2 units. The other local properties varied similarly,
as shown in Figure 4.17. Local ionization potential (IEL) and local electronegativity
(ENEG) periodically exhibit some correlation with activity, while local polarizability
(POL) consistently contributes nothing to the predictivity of the model.
The 5-HT1A data set described previously, which had exhibited good q2 statistics, was
also evaluated for grid orientation dependence. The results of the PLS analyses are
presented in Figure 4.18. For five of the local properties, namely electronegativity (ENEG),
hardness (HARD), ionization potential (IEL), polarizability (POL) and electron affinity
(EAL), grid rotation has a very small effect on predictivity as measured by q2. By contrast,
electron density (DENS) and electrostatic potential (MEP) exhibit a large variance in q2.
This would seem to suggest that when these five local properties contribute significantly,
the model depends much less on the orientation of the grid. For the analogues of the
CoMFA steric and electrostatic parameters, however, the story is much the same as before:
the predictive quality of models that include them is subject to rather severe dependence on
the orientation of the grid.
Figure 4.17 Response of q2 to grid rotation for each local property using the isoquinoline data set. [Line plot: q2 (−0.4 to 0.8) vs. grid rotation (−90° to +90° in 15° steps) for DENS, ENEG, HARD, IEL, MEP, POL and EAL.]
Figure 4.18 Response of q2 to grid rotation for each local property using the 5-HT1A data set. [Line plot: q2 (0 to 1) vs. grid rotation (−90° to +90° in 15° steps) for DENS, ENEG, HARD, IEL, MEP, POL and EAL.]
The q2 for the PLS analysis using the combined set of local properties also exhibits
much less rotation dependence than for the isoquinoline neuraminidase inhibitor data set (Figure 4.19).
Figure 4.19 Response of q2 to grid rotation for combined local properties using the 5-HT1A data set. [Line plot: q2 (0.5 to 1) vs. grid rotation (−90° to +90° in 15° steps).]
The predictive capacity (q2) of the PLS regression models using the combined set
of local properties seems to suffer somewhat in comparison with that of the individual
local-property PLS regressions, presumably because of the “noise” added by poorly
predicting local properties. This is not observed, however, for the set of mutagenic
tertiary amides (Section 4.3.5), where the modestly predictive local properties appear to
have an additive effect on the overall predictive capacity, even with the inclusion of the
poorly predicting properties.
4.4 Conclusions
Applying the newly introduced local properties EAL, χL, ηL, IEL, V and αL,
previously used only in a molecular-surface context, to several comparative molecular
field analyses has proved essentially as predictive as the standard CoMFA steric and
electrostatic parameters. Indeed, the MEP, the analogue of the electrostatic parameter in
standard CoMFA, was found to be the least predictive of the local properties in some of the
cases presented here. The MEP is a well-established property for describing Coulombic
interactions (which are, in turn, the strongest contributors to intermolecular interaction
energies in the gas phase). It is, however, strongly attenuated in polar solvents, so that
interactions described by the other local properties may predominate in the overall
intermolecular interaction. Where a ligand molecule is “solvated” by inclusion in an
enzyme binding pocket, the additional local properties offer a rationale for the binding
orientation that goes beyond charge cancellation.
In addition to the advantage gained by having a complement of local electronic
properties from which to establish structure-activity relationships, these local properties
seem to be exceptionally robust with respect to grid orientation, such that all-orientation
grid placement schemes may not be required to extract the best possible PLS q2 value.
The local property 3D-QSAR method presented here is similar to the CoMSIA method103
in that several properties contribute to the overall predictive response of the resulting
model. The local properties could easily be adapted to a standard CoMSIA methodology
by applying similarity indices to the local property values at the grid points. In principle,
the local properties can also be used within the Hypothetical Active Site Lattice106,118
(HASL) technique. In HASL, a 3D lattice of points internal to the van der Waals radii of
the aligned set of molecules is iteratively optimized in terms of partial (biological, etc.)
activities at each lattice point, generating a composite pharmacophore. Using the local
properties within any of these methods should prove an apt way to describe
molecular/macromolecular interactions at the point of close contact. As such, salient
electronic features of the binding regions of drug targets may be better described by this
ensemble of electronic terms.
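In CoMSIA, field values are replaced by Gaussian-weighted similarity indices of the form A_F,k(q) = −Σ_i w_probe,k · w_ik · exp(−α r_iq²). These indices avoid the singularities and arbitrary cutoffs of Lennard-Jones and Coulomb fields. One possible adaptation to the local properties is sketched below. Using property values sampled at molecular-surface points as w_ik is an assumption made for illustration, not a procedure used in this work. The attenuation α = 0.3 Å⁻² is the usual CoMSIA default, and the function name is hypothetical.

```python
import math


def gaussian_similarity(grid_point, samples, alpha=0.3, probe=1.0):
    """CoMSIA-style similarity index of one local property at one grid point.

    samples -- list of ((x, y, z), value) pairs, e.g. local-property values at
               molecular-surface points of one molecule (assumed input)
    alpha   -- Gaussian attenuation factor in Å^-2
    probe   -- probe weight for this property (hypothetical, default 1)
    """
    gx, gy, gz = grid_point
    total = 0.0
    for (x, y, z), value in samples:
        r2 = (x - gx) ** 2 + (y - gy) ** 2 + (z - gz) ** 2
        total += probe * value * math.exp(-alpha * r2)
    return -total  # CoMSIA sign convention
```

Because the Gaussian decays smoothly, small shifts or rotations of the grid change the indices only slightly. This is the property that makes CoMSIA less sensitive to grid placement than CoMFA.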
Chapter 5
Conclusions and Outlook
5.1 Conclusions
The computational methods described here give drug researchers a means of
applying quantum-mechanically-derived local electronic properties to in silico
high-throughput screening schemes. These schemes can not only predict and classify
compounds by various physical properties and biological activities, but also describe in
chemical terms the nature of the observed activity as a function of surface properties.
These properties may also be visualized by mapping them onto an isodensity surface,
making the identification of important functional moieties readily accessible to everyone
involved in the drug design pathway. Large sets of 2D structures or SMILES strings may
be evaluated for potential problems, such as phospholipidosis-inducing potential, in terms
that are useful to the medicinal chemist. In this way, rapid screening of compounds can be
achieved for any property or activity that can be explained by local properties. The same
pool of data can then be used to identify the regions at or near the molecular surface that
have the greatest impact on activity. The major drawback of the local properties lies in
their application to surface-integral models, where the interpretation of the model is less
than straightforward. When they are applied in a 3D-QSAR scheme, however, the
individual contributions of the local properties become evident and may be easily
interpreted by indicating the regions that contribute greatly to function.
Figure 5.1 The contact MEP surface for 5-acetamido-1,3,4-thiadiazole-2-sulfonamide bound to
human carbonic anhydrase II (from the RCSB Protein Data Bank: ID=1YDA)
5.2 Outlook
Given the interest in describing the interactions between a drug and its target in as
much detail as is feasible, adding new electronic terms to the traditional combination of
steric and electrostatic parameters used to calculate binding constants and free energies
(as in automated docking routines) may improve model predictivity and provide new
insights into the drug-binding process itself. Free energy of binding models have been built
up from the contributions of molecular fragments119, and a similar fragment-based
approach has been used to identify the portions of the binding region most important to
protein-ligand binding. Borrowing these ideas, it may be possible to construct a binding
energy model based on the contributions of close-contact surface properties. In essence,
the idea is to treat the free energy of binding (ΔGbind) as a kind of surface-integral-based
solvation free energy, in which the contact free energy term (ΔGcontact) is the energy
associated with the “solvation” of the ligand by the substrate (see Equation 5.1).
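$$\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{contact}} + \Delta G_{\mathrm{conf}}^{\mathrm{lig}} + \Delta G_{\mathrm{conf}}^{\mathrm{substr}} + \Delta G_{\mathrm{solv}} \qquad (5.1)$$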
Only the portion of the ligand surface that is in close proximity to the surface of the
enzyme binding pocket is taken to represent the binding surface. The model is constructed
by taking the bound ligand and binding-pocket atoms from crystal structures (along with
associated binding data such as experimentally determined Kd or Ki values), adding
hydrogens and optimizing the positions of these hydrogens while keeping the heavy atoms
fixed. Close-contact regions are then identified, either from sums of van der Waals radii or
from the overlap of isodensity surfaces, and recorded. The substrate atoms are removed,
leaving only the ligand in a geometry very near that of the bound structure. The local
properties are then calculated for the molecule. Taking only the close-contact portion of the
surface, a regression model of binding energy is constructed from either 1) statistical
descriptors of the close-contact surface or 2) a surface-integral treatment of the
close-contact surface.
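The van der Waals criterion for close contacts can be sketched as a pairwise distance test between ligand and pocket atoms. The radii below are a subset of Bondi's values. The 0.5 Å tolerance and the function name are hypothetical choices, since the procedure above does not fix them.

```python
import math

# Bondi van der Waals radii in Å (subset, for illustration).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def close_contacts(ligand, pocket, tolerance=0.5):
    """Return (ligand_index, pocket_index) pairs whose separation is at most the
    sum of their van der Waals radii plus `tolerance` (Å).

    ligand, pocket -- lists of (element, (x, y, z)) tuples
    """
    contacts = []
    for i, (el_i, xyz_i) in enumerate(ligand):
        for j, (el_j, xyz_j) in enumerate(pocket):
            cutoff = VDW_RADII[el_i] + VDW_RADII[el_j] + tolerance
            if math.dist(xyz_i, xyz_j) <= cutoff:
                contacts.append((i, j))
    return contacts
```

The ligand atoms that appear in any contact pair could then be used to select the close-contact portion of the ligand surface before the local properties are integrated.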
One approximation made in this procedure is the neglect of conformational changes in
both the ligand and the substrate upon binding. However, several authors120-122 note that
the bound conformations of ligands are typically low-energy geometries and may not
contribute greatly to the overall free energy of binding as defined here. The conformational
free energy of the substrate cannot be evaluated by this method and is assumed to be small
and relatively constant among proteins. The solvation free energy term and the contact free
energy term in Equation 5.1 are inversely related: as the ligand enters the binding pocket,
the same portion of its molecular surface becomes solvated by the binding pocket and
desolvated from bulk solution. The solvation free energy model described in Chapter 2,
Section 2.3.2, is also calculated by a surface-integral approach, so it is a function of the
solvent-exposed surface. Predicting the free energy of protein/ligand binding from local
surface properties then becomes a matter of including a small portion of the protein’s
binding pocket in the initial treatment of the ligand. Further, it may be possible to use the
previously described model to predict the maximum possible free energy of binding123 for a
given ligand without a crystal structure. It may likewise be possible to estimate the amount
of contact surface area, and the actual contact surface regions, from a given binding
constant.
Chapter 6
Summary
Of great interest to the pharmaceutical industry is the elucidation of a set of
chemical/physical properties that modulate the relationship between chemical structure and
pharmacological activity and that could be used to predict activity from chemical structure
alone. It is then necessary only to discover the particular set of molecular descriptors that
adequately describes the activity to be predicted. Since the point of contact for all drugs
lies inevitably at the molecular surfaces of both the drug and the drug target, a descriptive
model of the molecular surface is needed. This surface is electronic in nature, and
quantum-mechanical methods are those that describe the electronic structure of the
molecule. Local properties defined at points on the molecular surface, such as the
molecular electrostatic potential (MEP), have been used to describe strong non-covalent
interactions that are based primarily on charge. Recently, additional local properties have
been described that complement the MEP and provide a more complete description of the
local electronic environment at the molecular surface. This work describes the
implementation of six local electronic properties calculated with Parasurf, namely electron
affinity EAL, electronegativity χL, hardness ηL, ionization potential IEL, molecular
electrostatic potential MEP and polarizability αL, in three principal methods of
quantitative structure-activity relationship (QSAR) and quantitative structure-property
relationship (QSPR) prediction for use in virtual high-throughput screening.
The first of these methods involves the construction of surface-integral models,
which relate physical properties to the sum of the individual contributions of the local
surface properties, as determined by multiple regression. Similar regression models for
activity have also been constructed from statistical measures of the local properties, such as
maxima, minima and ranges. The predicted properties may then be mapped onto the
molecular surface as local properties in their own right, exposing the surface regions that
relate to the observed activity. This method therefore provides not only a predicted
property value but also a visualization of the property surface.
Representation of a local property surface (MEP) used to construct surface-integral models.
Seven such models have been constructed for the prediction of 1) the n-octanol/water
partition coefficient logP, 2) the free energy of solvation in water ΔGsolv.(H2O), 3) the free
energy of solvation in n-octanol ΔGsolv.(oct.), 4) the acid dissociation constant pKa, 5)
boiling point Tb, 6) the glass transition temperature of organic LED materials Tg, and 7)
water solubility logS.
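A surface-integral model can be illustrated as follows. Terms such as those in Table A1 are evaluated at each surface point, weighted by the area of the surface element and summed. The resulting integrals, together with statistical descriptors such as maxima, minima and ranges, then serve as regressors. This sketch assumes the surface is already available as points with areas and property values (as produced by a surface program such as Parasurf). The values and function names are made up for illustration.

```python
def surface_integral(points, term):
    """Area-weighted surface integral of term(properties) over surface points.

    points -- list of (area, properties) pairs: area of the surface element in
              Å^2 and a dict mapping names such as "V" or "IEL" to local values
    term   -- function of the properties dict returning one number
    """
    return sum(area * term(props) for area, props in points)


def property_statistics(points, name):
    """Simple statistical descriptors of one local property over the surface."""
    values = [props[name] for _, props in points]
    total_area = sum(area for area, _ in points)
    return {
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "mean": surface_integral(points, lambda p: p[name]) / total_area,
    }


# Two illustrative surface elements (areas in Å^2, made-up property values).
surface = [(1.0, {"V": 1.0, "IEL": 10.0}), (2.0, {"V": -2.0, "IEL": 12.0})]
descriptors = [
    surface_integral(surface, lambda p: p["V"]),
    surface_integral(surface, lambda p: p["V"] ** 2),
    surface_integral(surface, lambda p: p["V"] * p["IEL"]),
]
```

A multiple regression over many molecules would then fit one coefficient per integral. The fitted sum can be evaluated point by point to map the predicted property back onto the surface.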
The second statistical method employing the local properties uses support vector
machines to classify drug compounds as inducers or non-inducers of phospholipidosis.
Drug-induced phospholipidosis is an undesirable side effect, primarily of cationic
amphiphilic drugs. It causes lysosomal bodies containing large deposits of undegraded
phospholipids to aggregate inside the cells of the lungs, liver, kidneys, corneas and brain.
Their presence often coincides with adverse clinical effects such as inflammatory reactions
and fibrosis. The support vector machines use the statistical measures of the local
properties as descriptors to classify a test set of compounds, based on a model constructed
from a training set of the same size.
Representative local property field (EAL) using CoMFA methodology.
The third statistical method to which the local properties were applied was the
3D-QSAR method of Comparative Molecular Field Analysis (CoMFA). CoMFA involves
aligning a set of molecules and determining, by partial least squares analysis, the
relationship between biological activity and the steric and electrostatic potentials at each of
a set of grid points surrounding the structures. It has been argued that steric and
electrostatic fields alone do not adequately represent drug-receptor interactions, so that
additional physicochemical properties are required. In our formulation, the standard
CoMFA electrostatic parameter corresponds to the MEP and the steric parameter to the
electron density, with the additional local properties augmenting these basic parameters.
Five structure-activity data sets were examined, and our method produced q2 values
comparable to, if not better than, the reported standard CoMFA values. In addition, most of
the individual local properties gave better q2 values than either the MEP or the electron
density parameters and proved to be significantly less sensitive to grid orientation. This
method effectively expands the descriptive vocabulary of 3D-QSAR and is better able to
reveal important intermolecular interactions not elucidated by shape and charge fields
alone.
The computational methods described here give drug researchers a means of
applying quantum-mechanically-derived local electronic properties to in silico
high-throughput screening schemes. These schemes can not only predict and classify
compounds by various physical properties and biological activities, but also describe in
chemical terms the nature of the observed activity as a function of surface properties.
These properties may also be visualized by mapping them onto an isodensity surface,
making the identification of important functional moieties readily accessible to everyone
involved in the drug design pathway.
Chapter 7
Zusammenfassung (Summary)
Elucidating the physicochemical properties that directly link chemical structure and
pharmacological activity, and thus allow activity to be estimated solely from the structural
information of a substance, is of essential importance to the pharmaceutical industry. Once
this is achieved, all that is needed to describe the activity to be predicted adequately is a
suitable set of molecular descriptors. The interface between a drug and the active site of its
target molecule is inevitably determined by the molecular surfaces of both substances,
which is why a model describing these surfaces is required. Since these are electronic
surfaces, quantum-chemical methods are used to describe them. Local properties on the
molecular surface, such as the molecular electrostatic potential (MEP), have so far been
used to describe strong, charge-based, non-covalent interactions. Recently, additional local
properties have been incorporated into this approach. They complement the information
provided by the MEP and thus lead to a more complete description of the local electronic
environment at the molecular surface. This work describes the integration of six local
electronic properties, namely electron affinity EAL, electronegativity χL, hardness ηL,
ionization potential IEL, molecular electrostatic potential MEP and polarizability αL, into
three principal methods of quantitative structure-activity (QSAR) and structure-property
(QSPR) prediction for use in high-throughput screening (HTS) applications.
Representation of an MEP surface used to construct surface-integral models
The first of these methods involves the construction of surface-integral models. These
relate physical properties to the sum of the individual contributions of the local surface
properties, as determined by multiple regression. Comparable regression models for
activity were also constructed using statistical quantities such as maxima, minima and
ranges. The predicted properties can then be mapped onto the molecular surface as local
properties, thereby revealing the regions associated with the observed activity. This
method therefore not only predicts property values but is also suitable for visualizing a
property surface. In total, seven such models were constructed, which are suitable for
predicting 1) the n-octanol/water partition coefficient (logP), 2) the free energy of
solvation in water ΔGsolv.(H2O), 3) the free energy of solvation in n-octanol
ΔGsolv.(oct.), 4) the acid dissociation constant pKa, 5) the boiling point Tb, 6) the glass
transition temperature of organic LED materials Tg, and 7) the water solubility logS.
The second statistical method employing the local properties uses support vector
machines to classify drug compounds as causes of phospholipidosis. Drug-induced
phospholipidosis is an undesirable side effect of mostly cationic amphiphilic drugs. It leads
to the aggregation of lysosomes containing high concentrations of undegraded
phospholipids, which in the lungs, liver, kidneys, cornea and brain frequently results in
adverse effects such as inflammation and fibrosis. The support vector machines use the
statistical measures of the local properties as descriptors to classify a selection of
compounds, based on a prior training procedure.
Representation of the spatial distribution of a local property (EAL) generated using the CoMFA methodology.
The third statistical method into which the local properties were integrated is the
3D-QSAR method of Comparative Molecular Field Analysis (CoMFA). Starting from a set
of mutually aligned molecules, the CoMFA method uses partial least squares analysis to
establish, on a set of grid points, a relationship between biological activity, steric
properties and the electrostatic potential. It has been suggested that fields based on sterics
and electrostatics alone do not adequately represent drug-receptor interactions, which is
why additional chemical and physical properties had to be taken into account. In the
approach developed here, the electrostatic CoMFA parameter is represented by the MEP
and the steric parameter by the electron density, with the additional local properties
complementing these basic parameters. Five structure-activity data sets examined with this
method gave q2 values comparable to, or even better than, the literature values obtained
with standard CoMFA. In addition, the individual local properties generally gave better q2
values than the MEP or electron density parameters alone, and they proved less sensitive
to grid orientation. The method presented in this work extends the capabilities of 3D-QSAR
and is far better able to reveal important information about intermolecular interactions that
could not previously be explained by structural fields and charge distributions alone.
The computational methods described here make it possible to apply
quantum-mechanically generated local electronic properties in in silico HTS. Not only can
physical properties and biological activities be predicted and classified; a chemical
description of the nature of the observed property as a function of surface information can
also be provided. This information can then be mapped graphically onto a molecular
surface and displayed, making the identification of important functional sites readily
accessible to everyone involved in the drug development process.
Appendix A
Data Sets
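Table A1 Nonlinear regression terms used in calculating surface-integral models.
No. Term
1 $\sqrt{V(\mathbf{r})}$
2 $V(\mathbf{r})$
3 $[V(\mathbf{r})]^{3/2}$
4 $[V(\mathbf{r})]^{2}$
5 $[V(\mathbf{r})]^{5/2}$
6 $[V(\mathbf{r})]^{3}$
7 $\sqrt{IE_L(\mathbf{r})}$
8 $IE_L(\mathbf{r})$
9 $[IE_L(\mathbf{r})]^{3/2}$
10 $[IE_L(\mathbf{r})]^{2}$
11 $[IE_L(\mathbf{r})]^{5/2}$
12 $[IE_L(\mathbf{r})]^{3}$
13 $\sqrt{EA_L(\mathbf{r})}$
14 $EA_L(\mathbf{r})$
15 $[EA_L(\mathbf{r})]^{3/2}$
16 $[EA_L(\mathbf{r})]^{2}$
17 $[EA_L(\mathbf{r})]^{5/2}$
18 $[EA_L(\mathbf{r})]^{3}$
19 $\sqrt{\alpha_L(\mathbf{r})}$
20 $\alpha_L(\mathbf{r})$
21 $[\alpha_L(\mathbf{r})]^{3/2}$
22 $[\alpha_L(\mathbf{r})]^{2}$
23 $[\alpha_L(\mathbf{r})]^{5/2}$
24 $[\alpha_L(\mathbf{r})]^{3}$
25 $\sqrt{\eta_L(\mathbf{r})}$
26 $\eta_L(\mathbf{r})$
27 $[\eta_L(\mathbf{r})]^{3/2}$
28 $[\eta_L(\mathbf{r})]^{2}$
29 $[\eta_L(\mathbf{r})]^{5/2}$
30 $[\eta_L(\mathbf{r})]^{3}$
31 $\sqrt{V(\mathbf{r})\cdot IE_L(\mathbf{r})}$
32 $V(\mathbf{r})\cdot IE_L(\mathbf{r})$
33 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})]^{3/2}$
34 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})]^{2}$
35 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})]^{5/2}$
36 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})]^{3}$
37 $\sqrt{V(\mathbf{r})\cdot EA_L(\mathbf{r})}$
38 $V(\mathbf{r})\cdot EA_L(\mathbf{r})$
39 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})]^{3/2}$
40 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})]^{2}$
41 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})]^{5/2}$
42 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})]^{3}$
43 $\sqrt{V(\mathbf{r})\cdot \alpha_L(\mathbf{r})}$
44 $V(\mathbf{r})\cdot \alpha_L(\mathbf{r})$
45 $[V(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3/2}$
46 $[V(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{2}$
47 $[V(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{5/2}$
48 $[V(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3}$
49 $\sqrt{V(\mathbf{r})\cdot \eta_L(\mathbf{r})}$
50 $V(\mathbf{r})\cdot \eta_L(\mathbf{r})$
51 $[V(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3/2}$
52 $[V(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{2}$
53 $[V(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{5/2}$
54 $[V(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3}$
55 $\sqrt{IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})}$
56 $IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})$
57 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})]^{3/2}$
58 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})]^{2}$
59 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})]^{5/2}$
60 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})]^{3}$
61 $\sqrt{IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})}$
62 $IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})$
63 $[IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3/2}$
64 $[IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{2}$
65 $[IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{5/2}$
66 $[IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3}$
67 $\sqrt{IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})}$
68 $IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})$
69 $[IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3/2}$
70 $[IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{2}$
71 $[IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{5/2}$
72 $[IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3}$
73 $\sqrt{EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})}$
74 $EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})$
75 $[EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3/2}$
76 $[EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{2}$
77 $[EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{5/2}$
78 $[EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3}$
79 $\sqrt{EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})}$
80 $EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})$
81 $[EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3/2}$
82 $[EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{2}$
83 $[EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{5/2}$
84 $[EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3}$
85 $\sqrt{\alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})}$
86 $\alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})$
87 $[\alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3/2}$
88 $[\alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{2}$
89 $[\alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{5/2}$
90 $[\alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3}$
91 $\sqrt{V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})}$
92 $V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})$
93 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})]^{3/2}$
94 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})]^{2}$
95 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})]^{5/2}$
96 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})]^{3}$
97 $\sqrt{V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})}$
98 $V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})$
99 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3/2}$
100 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{2}$
101 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{5/2}$
102 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3}$
103 $\sqrt{V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})}$
104 $V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})$
105 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3/2}$
106 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{2}$
107 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{5/2}$
108 $[V(\mathbf{r})\cdot IE_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3}$
109 $\sqrt{V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})}$
110 $V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})$
111 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3/2}$
112 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{2}$
113 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{5/2}$
114 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3}$
115 $\sqrt{V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})}$
116 $V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})$
117 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3/2}$
118 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{2}$
119 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{5/2}$
120 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3}$
121 $\sqrt{IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})}$
122 $IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})$
123 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3/2}$
124 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{2}$
125 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{5/2}$
126 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})]^{3}$
127 $\sqrt{IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})}$
128 $IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})$
129 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3/2}$
130 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{2}$
131 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{5/2}$
132 $[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3}$
133 $\sqrt{IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})}$
134 $IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})$
135 $[IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3/2}$
136 $[IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{2}$
137 $[IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{5/2}$
138 $[IE_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3}$
139 $\sqrt{EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})}$
140 $EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})$
141 $[EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3/2}$
142 $[EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{2}$
143 $[EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{5/2}$
144 $[EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3}$
145 $\sqrt{V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})}$
146 $V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})$
147 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3/2}$
148 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{2}$
149 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{5/2}$
150 $[V(\mathbf{r})\cdot EA_L(\mathbf{r})\cdot \alpha_L(\mathbf{r})\cdot \eta_L(\mathbf{r})]^{3}$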
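Each term in Table A1 is a function of the local properties that is integrated over the molecular surface. The sketch below shows one way such surface-integral descriptors could be assembled. It assumes the surface is available as a set of points carrying the five local property values and an area weight, and approximates each integral by an area-weighted sum. The six exponents (1/2, 1, 3/2, 2, 5/2, 3) and the 25 base functions are read off the term pattern of Table A1. The table does not say how fractional powers of negative values (the MEP is frequently negative) were treated, so the sign-preserving power used here is purely an assumption, as are the names `PROPS`, `BASE_FUNCTIONS` and `surface_integral_terms`; this is not the dissertation's implementation.

```python
import itertools

import numpy as np

# Local properties in Table A1 order: MEP V, local ionization energy IE_L,
# local electron affinity EA_L, local polarizability alpha_L, local hardness eta_L.
PROPS = ("V", "IE_L", "EA_L", "alpha_L", "eta_L")

# The 25 base functions in Table A1 order: 5 single properties, all 10 pairs,
# 9 triples (V*alpha_L*eta_L does not appear in the table) and one quadruple.
BASE_FUNCTIONS = (
    [(p,) for p in PROPS]
    + list(itertools.combinations(PROPS, 2))
    + [t for t in itertools.combinations(PROPS, 3) if t != ("V", "alpha_L", "eta_L")]
    + [("V", "EA_L", "alpha_L", "eta_L")]
)

# Each base function is raised to six powers, giving 25 x 6 = 150 terms.
POWERS = (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)


def apply_power(x, p):
    """Raise x to the power p; half-integer powers keep the sign of x (assumption)."""
    if float(p).is_integer():
        return x ** int(p)
    return np.sign(x) * np.abs(x) ** p


def surface_integral_terms(props, weights):
    """Approximate the 150 surface integrals as area-weighted sums over surface points.

    props   -- dict mapping each name in PROPS to an array of values at the points
    weights -- array of surface-area elements associated with those points
    """
    weights = np.asarray(weights, dtype=float)
    terms = []
    for combo in BASE_FUNCTIONS:
        # Pointwise product of the properties that make up this base function.
        base = np.prod([np.asarray(props[name], dtype=float) for name in combo], axis=0)
        for p in POWERS:
            terms.append(np.sum(weights * apply_power(base, p)))
    return np.array(terms)


# Usage with random stand-in data (not real surface properties).
rng = np.random.default_rng(0)
n_points = 500
demo_props = {name: rng.normal(size=n_points) for name in PROPS}
demo_weights = np.full(n_points, 0.02)  # hypothetical area elements
descriptors = surface_integral_terms(demo_props, demo_weights)
print(descriptors.shape)  # (150,)
```

The resulting 150-element vector would then serve as the pool of candidate regressors from which a regression model selects terms.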
Table A2 The logP data set.
No. Compound Exp. Calc.
1 glutamine -3.64 -3.16
2 citric acid -1.72 -0.59
3 phenylalanine -1.52 -1.79
4 tryptophan -1.06 -1.12
5 1,3-propanediol -1.04 -1.02
6 maleic acid-hydrazide -0.84 -0.43
7 N-formylcyclobutane carboxamide -0.70 0.19
8 allopurinol -0.55 1.23
9 2,2-dimethylpropionic acid-hydrazide -0.35 0.41
10 3-fluoropropanol -0.28 -0.14
11 thiamphenicol -0.27 1.19
12 2',3'-didesoxyadenosine -0.22 1.17
13 3-mesylphenyl urea -0.12 -0.95
14 imidazole -0.08 0.26
15 caffeine -0.07 0.91
16 o-methyl THPO -0.04 0.82
17 5,6-dihydro-2-methyl-1,4-oxathiin-3-carboxylic acid 0.04 0.32
18 mercaptoacetic acid 0.09 0.62
19 6-methylthioinosine 0.09 1.85
20 merbarone 0.14 2.64
21 atenolol 0.16 1.68
22 o-methylbenzoyl hydrazine 0.22 1.01
23 pentoxifylline 0.29 1.49
24 nikethamide 0.33 0.10
25 p-hydroxybenzamide 0.33 2.02
26 2,2-dichloroethanol 0.37 1.27
27 antipyrine 0.38 1.87
28 1-acetyl-N-(4-fluorophenyl)hydrazine carboxamide 0.42 -0.11
29 sulpiride 0.42 0.68
30 piperazine-2-carboxanilide 0.48 0.76
31 fluconazole 0.50 3.06
32 acetaminophen 0.51 0.20
33 2-amino-5-methoxy benzimidazole 0.57 0.99
34 sotalol 0.59 0.98
35 glutaric acid dimethyl ester 0.62 0.07
36 3-(5-nitro-2-furanyl)-2-propenoic amide 0.65 0.84
37 gallic acid 0.70 -0.23
38 1-acetyl-6-dimethyl-7-methoxymitosene 0.72 0.86
39 2-azacycloheptanthione 0.75 2.11
40 N-(2-benzoyl-oxyacetyl)-2-carboxyazetidine 0.79 0.37
41 chloropentazide 0.84 1.89
42 4-pyridinebutylamine 0.86 2.24
43 procainamide 0.88 1.52
44 tiapride 0.90 0.68
45 4-methylthiazole 0.97 1.40
46 6-cyanoquinoxaline 1.01 2.46
47 syringic acid 1.04 -0.02
48 1-phenyl-3-cyanoguanidine 1.05 2.06
49 m-acetylaminoacetophenone 1.10 0.98
50 acetylsalicylic acid 1.19 0.63
51 benzaldehydesemicarbazone 1.27 0.71
52 4-oxo-4-phenylbutanoic acid 1.30 1.00
53 2-phenylethanol 1.36 1.62
54 carocainide 1.38 1.43
55 3-bromobenzenesulfonamide 1.39 1.30
56 bromochloromethane 1.41 1.10
57 trimethylacetic acid 1.47 0.81
58 o-fluorophenylacetic acid 1.50 1.19
59 2-(2,6-dichloro-4-hydroxyphenylimino)imidazolidine 1.52 1.72
60 N-phenyl-4-aminophenylsulfonamide 1.55 0.54
61 hydrocortisone 1.55 1.21
62 tryptamine 1.55 1.57
63 acetophenone 1.58 1.39
64 p-(N,N-dimethylcarbamate)-N,N-dimethylcarbamate, benzyl ester 1.59 1.48
65 prednisolone 1.60 1.82
66 1-dodecanesulfonic acid 1.60 2.46
67 2-methylquinoxaline 1.61 2.72
68 3,5-dimethoxyphenol 1.64 0.74
69 bromazepam 1.65 2.89
70 indole-3-ethanolcarbamate 1.69 1.44
71 3-indolylpropionic acid 1.75 1.84
72 pindolol 1.75 2.09
73 propylene 1.77 1.54
74 2-oxoisopropyl-5-phenyl-5'-ethylbarbituric acid 1.79 3.11
75 4-dimethylamino-thieno(2,3-D)pyrimidine 1.82 2.09
76 dexamethasone 1.83 1.28
77 2-acetyl-oxyethyl benzoate 1.85 1.40
78 4-chloroaniline 1.88 1.53
79 N-methyl-2,3-dimethylphenyl carbamate 1.95 1.72
80 o-methylphenoxyacetic acid 1.98 0.95
81 acetic acid-m-methoxybenzoate 2.02 1.42
82 quinoline 2.03 2.69
83 1,1'-dioxo-3-cyclohexen-3-yl-1,2,4-benzothiadiazine 2.05 1.67
84 3,4-dimethylacetanilide 2.10 1.71
85 indole 2.14 1.59
86 mexiletine 2.15 2.32
87 griseofulvin 2.18 2.37
88 carbamazepine 2.19 2.08
89 hydrocortisone acetate 2.19 2.88
90 o-methylbenzaldehyde 2.26 1.48
91 4-bromoaniline 2.26 1.52
92 2,6-dimethoxypyridine 2.30 1.35
93 thiophene-2-carboxylic acid, ethyl ester 2.33 1.50
94 21-desoxybetamethasone 2.35 2.11
95 thiosalicylic acid 2.39 1.98
96 butyl gallate 2.41 1.14
97 1-pyrrol-2-yl-pentanone 2.42 1.55
98 4-phenylbutyric acid 2.42 2.04
99 5,5'-diphenylhydantoin 2.47 2.49
100 8-trifluoromethylquinoline 2.50 1.28
101 3-chlorophenol 2.50 1.93
102 lorazepam 2.51 3.04
103 2,17-dihydroxy-3-oxolactone-7,21-dicarboxy-pregan-4-ene 2.54 3.65
104 di-isopyramide 2.58 3.65
105 N-benzyl-N-formylaniline 2.62 2.62
106 5,6-diazaphenanthrene 2.71 3.47
107 lormetazepam 2.72 3.58
108 1-methyl-1,3-dihydro-5-(2-fluorophenyl)-7-chloro-1,4-benzodiazepin-2-one 2.75 3.57
109 diazepam 2.79 3.73
110 3-butyl-R,S-1-(3H)-isobenzofuranone 2.80 2.57
111 2-anilino-1,4-naphthoquinone 2.84 2.33
112 4-aminobiphenyl 2.86 2.90
113 quinidine 2.88 4.55
114 chlorobenzene 2.89 1.95
115 dihydromorphanthridine 2.90 3.67
116 p-phenoxyaniline 2.93 2.67
117 3-bromoquinoline 3.03 2.96
118 octanoic acid 3.05 2.04
119 deoxycorticosterone acetate 3.08 3.42
120 alprenolol 3.10 3.38
121 indecainide 3.11 4.09
122 N-(3,4-dichlorophenyl)difluoroacetamide 3.18 2.61
123 benzophenone 3.18 2.67
124 p-fluorotoluene 3.20 1.71
125 testosterone 3.29 3.37
126 1-(3,4-dichlorophenyl)-2-isopropylaminoethanol 3.32 3.65
127 3-methoxy-4-cyclohexylmethoxyphenylacetic acid 3.35 3.28
128 naphthalene 3.37 3.29
129 anthraquinone 3.39 2.06
130 1,2-dichlorobenzene 3.43 2.14
131 prometrin 3.51 2.37
132 4,7-dichloroquinoline 3.57 3.80
133 9-(N-((N,N'-diethylamino)acetyl)amino)fluorene 3.64 4.66
134 3,5-dichlorophenol 3.68 2.15
135 indigo 3.72 2.55
136 flecainide 3.78 2.38
137 3,4-dimethylchlorobenzene 3.82 3.30
138 diflubenzuron 3.83 2.14
139 estradiol 4.01 3.55
140 1-(4-cyclohexylphenyl)-3-methoxy-3-methylurea 4.08 3.61
141 1-phenyl-1-benzyl-2-methyl-3-(N,N-dimethylamino)-propanoic acid, propyl ester 4.18 4.48
142 2,6-dimethylnaphthalene 4.31 3.89
143 aminopyrene 4.31 4.15
144 fluphenazine 4.36 3.87
145 1,3-dimethylnaphthalene 4.42 4.08
146 1,3-dithiolan-2-ylidene-propanoic acid, dibutyl ester 4.60 4.32
147 1,2,4,5-tetrachlorobenzene 4.60 5.32
148 propafenone 4.63 3.53
149 bifonazole 4.77 5.82
150 aprindine 4.86 6.00
151 diethylstilbestrol 5.07 4.56
152 fluoranthene 5.16 4.93
153 triflupromazine 5.19 4.18
154 clotrimazole 5.20 5.25
155 teflubenzuron 5.39 3.79
156 hexaflumuron 5.43 3.77
157 2,4,4'-trichlorobiphenyl 5.62 5.07
158 2,4,5-trichlorobiphenyl 5.90 4.50
159 thioridazine 5.90 5.47
160 phenylanthracene 6.01 6.33
161 flufenoxuron 6.16 5.10
162 1,3,7,8-tetrachlorodibenzodioxin 6.30 6.92
163 chlorfluazuron 6.63 5.09
164 1,2,3,6,7-pentachlorodibenzodioxin 6.74 8.00
165 linoleic acid 7.05 5.90
166 palmitic acid 7.17 5.26
167 3,3',4,4',5,5'-hexachlorobiphenyl 7.41 6.46
168 stearic acid 8.23 6.15
Table A3 The free energy of solvation in n-octanol data set.
No. Compound Exp. Calc.
1 methane 0.51 -0.67
2 ethane -0.64 -1.19
3 propane -1.26 -1.77
4 cyclopropane -1.60 -0.98
5 2-methylpropane -1.45 -2.17
6 2,2-dimethylpropane -1.74 -2.36
7 n-butane -1.86 -2.22
8 cyclopentane -2.65 -3.49
9 n-pentane -2.45 -2.80
10 n-hexane -3.01 -3.32
11 cyclohexane -3.46 -3.16
12 methylcyclohexane -3.21 -3.36
13 n-heptane -3.74 -3.90
14 n-octane -4.18 -4.47
15 ethylene -0.27 -0.74
16 propylene -1.14 -1.42
17 2-methylpropene -2.03 -2.04
18 1-butene -1.89 -2.23
19 1-hexene -2.94 -3.37
20 1,3-butadiene -2.10 -2.16
21 acetylene -0.51 -0.61
22 propyne -1.59 -1.25
23 1-pentyne -2.79 -2.55
24 1-hexyne -3.43 -3.02
25 benzene -3.72 -3.92
26 toluene -4.55 -4.50
27 ethylbenzene -5.08 -5.20
28 m-xylene -5.25 -5.11
29 o-xylene -5.07 -4.98
30 p-xylene -5.19 -5.07
31 naphthalene -6.97 -6.97
32 anthracene -10.47 -10.11
33 1,1-difluoroethane -1.13 -2.39
34 tetrafluoromethane 1.50 0.36
35 fluorobenzene -3.87 -5.15
36 chlorotrifluoromethane -1.97 -0.49
37 dichlorodifluoromethane -1.25 -1.68
38 fluorotrichloromethane -2.63 -2.87
39 1,1,2-trichloro-1,2,2-trifluoroethane -2.54 -2.66
40 1-bromo-1-chloro-2,2,2-trifluoroethane -3.27 -4.01
41 bromotrifluoromethane -0.75 -1.44
42 dichloromethane -3.07 -2.44
43 trichloromethane -3.81 -3.15
44 chloroethane -2.58 -2.20
45 1,1,1-trichloroethane -3.69 -4.05
46 1,1,2-trichloroethane -4.53 -3.95
47 1-chloropropane -3.06 -2.93
48 2-chloropropane -2.84 -1.51
49 cis-1,2-dichloroethylene -3.71 -3.33
50 trans-1,2-dichloroethylene -3.61 -2.75
51 trichloroethylene -3.75 -3.29
52 tetrachloroethylene -4.24 -3.97
53 chlorobenzene -5.00 -5.42
54 1,2-dichlorobenzene -6.01 -6.26
55 1,4-dichlorobenzene -5.67 -6.55
56 2,2'-dichlorobiphenyl -9.41 -8.78
57 2,3-dichlorobiphenyl -9.23 -9.98
58 2,2',3'-trichlorobiphenyl -9.12 -9.81
59 bromomethane -2.43 -2.40
60 dibromomethane -4.18 -4.90
61 tribromomethane -5.62 -5.05
62 bromoethane -2.90 -2.93
63 1-bromopropane -3.42 -3.59
64 2-bromopropane -3.40 -2.94
65 1-bromobutane -4.16 -4.41
66 1-bromopentane -4.68 -5.02
67 3-bromopropene -3.30 -4.04
68 bromobenzene -5.46 -5.59
69 1,4-dibromobenzene -7.47 -7.46
70 p-bromotoluene -6.36 -6.14
71 methanol -3.87 -4.13
72 ethanol -4.36 -4.49
73 ethylene glycol -7.44 -6.90
74 1-propanol -5.02 -5.11
75 2-propanol -4.62 -4.93
76 1,1,1-trifluoro-2-propanol -5.12 -6.15
77 1,1,1,3,3,3-hexafluoro-2-propanol -5.76 -3.60
78 1-butanol -5.71 -5.49
79 t-butanol -4.78 -4.75
80 1-pentanol -6.40 -6.13
81 1-hexanol -7.06 -6.67
82 1-heptanol -7.75 -7.19
83 1-octanol -8.13 -7.75
84 allyl alcohol -5.27 -5.34
85 phenol -8.69 -7.46
86 4-bromophenol -10.59 -9.48
87 2-cresol -8.49 -7.89
88 3-cresol -8.20 -8.05
89 4-cresol -8.84 -8.12
90 2,2,2-trifluoroethanol -4.81 -6.75
91 2-methoxyethanol -5.83 -5.14
92 methyl propyl ether -3.63 -3.41
93 methyl isopropyl ether -4.64 -3.48
94 methyl t-butyl ether -3.49 -3.28
95 diethyl ether -2.89 -3.14
96 THF -3.93 -3.61
97 anisole -5.47 -5.82
98 ethyl phenyl ether -5.65 -6.29
99 1,2-dimethoxyethane -4.55 -4.25
100 1,4-dioxane -4.89 -5.23
101 propanal -4.13 -4.29
102 butanal -4.62 -4.54
103 benzaldehyde -6.13 -6.66
104 m-hydroxybenzaldehyde -11.39 -10.71
105 p-hydroxybenzaldehyde -12.36 -10.94
106 acetone -3.15 -3.94
107 2-butanone -3.78 -4.41
108 3,3-dimethyl-2-butanone -4.53 -5.14
109 2-pentanone -4.35 -4.96
110 3-pentanone -4.36 -4.77
111 cyclopentanone -5.01 -5.67
112 2-hexanone -5.02 -5.21
113 2-heptanone -5.65 -5.94
114 2-octanone -6.38 -6.31
115 acetophenone -6.74 -7.14
116 acetic acid -6.35 -5.20
117 propionic acid -6.86 -5.70
118 butyric acid -7.58 -6.27
119 pentanoic acid -8.22 -6.92
120 hexanoic acid -8.82 -7.41
121 4-amino-3,5,6-trichloropyridine-2-carboxylic acid -12.37 -13.33
122 methyl formate -2.82 -5.09
123 methyl acetate -3.54 -4.16
124 ethyl acetate -4.06 -4.70
125 propyl acetate -4.55 -5.18
126 butyl acetate -4.96 -5.76
127 methyl propionate -4.06 -4.69
128 methyl butyrate -4.59 -5.22
129 methyl pentanoate -5.13 -5.93
130 methyl benzoate -7.26 -8.06
131 methylamine -3.78 -3.31
132 ethylamine -4.09 -3.99
133 propylamine -4.77 -4.57
134 butylamine -5.35 -5.00
135 diethylamine -4.75 -4.36
136 dipropylamine -6.02 -5.64
137 trimethylamine -3.60 -2.35
138 piperazine -5.80 -5.91
139 aniline -6.71 -8.18
140 hydrazine -6.48 -7.06
141 morpholine -5.99 -5.33
142 piperidine -6.27 -4.67
143 pyridine -5.34 -4.89
144 2-methylpyridine -6.14 -5.38
145 3-methylpyridine -6.40 -5.60
146 4-methylpyridine -6.60 -5.66
147 2-ethylpyridine -6.40 -5.99
148 2-methylpyrazine -5.87 -6.30
149 2-ethyl-3-methoxypyrazine -6.85 -7.54
150 acetonitrile -3.15 -2.29
151 propionitrile -3.66 -3.14
152 butyronitrile -4.25 -3.67
153 benzonitrile -6.09 -6.54
154 2,6-dichlorobenzonitrile -9.18 -8.05
155 1-propanethiol -3.52 -3.80
156 thiophenol -5.99 -6.68
157 thioanisole -6.47 -6.99
158 dimethyl sulfide -4.24 -2.61
159 diethyl sulfide -4.09 -3.48
160 dipropyl sulfide -3.89 -4.99
161 trimethyl phosphate -7.81 -8.94
162 triethyl phosphate -8.88 -8.78
163 tripropyl phosphate -8.65 -8.22
164 2,2-dichloroethenyl dimethyl phosphate -8.59 -7.89
165 o-ethyl-o'-(4-bromo-2-chlorophenyl) S-propyl phosphorothioate -10.49 -10.80
Table A4 The free energy of solvation in water data set.
No. Compound Exp. Calc.
1 methane 1.98 0.98
2 ethane 1.83 0.85
3 propane 1.96 1.04
4 cyclopropane 0.75 0.07
5 2-methylpropane 2.32 1.42
6 2,2-dimethylpropane 2.50 1.98
7 n-butane 2.08 1.24
8 2,2-dimethylbutane 2.59 2.14
9 cyclopentane 1.20 -0.64
10 n-pentane 2.33 1.44
11 2-methylpentane 2.52 1.81
12 3-methylpentane 2.51 1.73
13 2,4-dimethylpentane 2.88 2.22
14 2,2,4-trimethylpentane 2.85 2.72
15 methylcyclopentane 1.60 -0.13
16 n-hexane 2.49 1.64
17 cyclohexane 1.23 0.74
18 methylcyclohexane 1.71 1.18
19 cis-1,2-dimethylcyclohexane 1.58 1.59
20 n-heptane 2.62 1.84
21 n-octane 2.89 2.02
22 ethylene 1.27 0.86
23 propylene 1.27 0.72
24 2-methylpropene 1.16 0.46
25 1-butene 1.38 0.87
26 2-methyl-2-butene 1.31 0.26
27 3-methyl-1-butene 1.83 1.18
28 1-pentene 1.66 1.14
29 trans-2-pentene 1.34 0.72
30 4-methyl-1-pentene 1.91 1.51
31 cyclopentene 0.56 -0.77
32 1-hexene 1.66 1.35
33 cyclohexene 0.37 -0.58
34 trans-2-heptene 1.66 1.13
35 1-methylcyclohexene 0.67 -0.75
36 1-octene 2.17 1.85
37 1,3-butadiene 0.61 0.77
38 2-methyl-1,3-butadiene 0.68 0.57
39 2,3-dimethyl-1,3-butadiene 0.40 0.37
40 1,4-pentadiene 0.94 0.89
41 1,5-hexadiene 1.01 1.06
42 acetylene -0.01 0.55
43 propyne -0.48 -0.11
44 1-butyne -0.15 0.05
45 1-pentyne -0.16 0.48
46 1-hexyne 0.01 0.61
47 1-heptyne 0.60 0.88
48 1-octyne 0.71 1.05
49 1-nonyne 1.05 1.28
50 vinyl acetate 0.04 0.19
51 benzene -0.89 -0.88
52 toluene -0.76 -0.96
53 1,2,4-trimethylbenzene -0.86 -1.12
54 ethylbenzene -0.61 -0.77
55 m-xylene -0.80 -1.05
56 o-xylene -0.90 -1.01
57 p-xylene -0.80 -1.04
58 propylbenzene -0.53 -0.37
59 butylbenzene -0.40 -0.27
60 t-butylbenzene -0.44 0.23
61 t-amylbenzene -0.18 0.36
62 naphthalene -2.41 -2.43
63 anthracene -4.23 -4.02
64 phenanthrene -4.06 -4.09
65 acenaphthene -3.40 -3.75
66 p-chlorotoluene -1.92 -2.19
67 fluoromethane -0.22 -2.29
68 1,1-difluoroethane -0.11 -3.26
69 trifluoromethane 0.80 -1.15
70 tetrafluoromethane 3.16 2.09
71 hexafluoroethane 3.94 3.33
72 octafluoropropane 4.28 4.80
73 fluorobenzene -0.78 -3.31
74 2-chloro-1,1,1-trifluoroethane 0.05 -1.15
75 chlorofluoromethane -0.77 -0.68
76 chlorodifluoromethane 0.11 1.13
77 chlorotrifluoromethane 2.52 2.93
78 dichlorodifluoromethane 1.69 2.54
79 fluorotrichloromethane 0.82 0.33
80 1,1,2-trichloro-1,2,2-trifluoroethane 1.77 3.05
81 1,1,2,2-tetrachlorodifluoroethane 0.82 2.38
82 chloropentafluoroethane 2.86 3.46
83 1,1-dichlorotetrafluoroethane 2.50 2.75
84 1,2-dichlorotetrafluoroethane 2.31 3.97
85 1-bromo-1-chloro-2,2,2-trifluoroethane -0.13 -1.23
86 bromotrifluoromethane 1.79 -1.16
87 1-bromo-1,2,2,2-tetrafluoroethane 0.52 -0.80
88 chloromethane -0.56 -0.81
89 dichloromethane -1.36 -1.02
90 trichloromethane -1.07 -0.34
91 tetrachloromethane 0.10 -0.38
92 chloroethane -0.63 -0.76
93 1,1-dichloroethane -0.85 -1.42
94 (E)-1,2-dichloroethane -1.73 -1.64
95 1,1,1-trichloroethane -0.25 -1.26
96 1,1,2-trichloroethane -1.95 -1.66
97 1,1,1,2-tetrachloroethane -1.15 -0.57
98 1,1,2,2-tetrachloroethane -2.36 -0.79
99 pentachloroethane -1.36 -0.09
100 hexachloroethane -1.40 0.52
101 1-chloropropane -0.35 -0.43
102 2-chloropropane -0.24 0.49
103 1,2-dichloropropane -1.25 -1.01
104 1,3-dichloropropane -1.90 -1.93
105 1-chlorobutane -0.14 -0.31
106 2-chlorobutane 0.07 0.51
107 1,1-dichlorobutane -0.70 -1.05
108 1-chloropentane -0.07 -0.13
109 2-chloropentane 0.07 0.73
110 3-chloropentane 0.07 0.49
111 chloroethylene 0.49 -0.68
112 cis-1,2-dichloroethylene -1.17 -1.50
113 trans-1,2-dichloroethylene -0.76 -1.32
114 trichloroethylene -0.44 -1.16
115 tetrachloroethylene 0.05 -0.55
116 chlorobenzene -1.01 -1.97
117 o-chlorotoluene -1.15 -1.69
118 1,2-dichlorobenzene -1.36 -2.68
119 1,3-dichlorobenzene -0.98 -2.86
120 1,4-dichlorobenzene -1.01 -3.03
121 2,2'-dichlorobiphenyl -2.73 -2.38
122 2,3-dichlorobiphenyl -2.45 -3.48
123 2,2',3'-trichlorobiphenyl -1.99 -3.34
124 bromotrichloromethane -0.93 -2.28
125 1-chloro-2-bromoethane -1.95 -1.91
126 bromomethane -0.82 -0.49
127 dibromomethane -2.11 -2.39
128 tribromomethane -1.98 -3.00
129 bromoethane -0.70 -0.71
130 1,2-dibromoethane -2.10 -1.58
131 1-bromopropane -0.56 -0.46
132 2-bromopropane -0.48 1.76
133 1,2-dibromopropane -1.94 -0.23
134 1,3-dibromopropane -1.96 -1.75
135 1-bromo-2-methylpropane -0.03 0.31
136 1-bromobutane -0.41 -0.38
137 1-bromoisobutane -0.03 0.73
138 1-bromo-3-methylbutane 0.20 -0.03
139 1-bromopentane -0.08 -0.22
140 3-bromopropene -0.86 -0.42
141 bromobenzene -1.46 -1.53
142 1,4-dibromobenzene -2.30 -1.71
143 p-bromotoluene -1.39 -1.77
144 1-bromo-2-ethylbenzene -1.19 -1.17
145 o-bromocumene -0.85 -0.22
146 methanol -5.07 -4.58
147 ethanol -4.90 -4.86
148 ethylene glycol -9.30 -7.52
149 1-propanol -4.85 -4.57
150 2-propanol -4.75 -4.92
151 1,1,1-trifluoro-2-propanol -4.16 -4.18
152 2,2,3,3-tetrafluoropropanol -4.90 -4.30
153 2,2,3,3,3-pentafluoropropanol -4.15 -5.57
154 1,1,1,3,3,3-hexafluoro-2-propanol -3.76 -0.48
155 2-methyl-1-propanol -4.51 -5.11
156 1-butanol -4.72 -4.34
157 2-butanol -4.61 -3.06
158 t-butanol -4.51 -4.37
159 2-methyl-1-butanol -4.42 -3.27
160 3-methyl-1-butanol -4.42 -3.87
161 2-methyl-2-butanol -4.43 -3.59
162 2,3-dimethyl-1-butanol -3.91 -4.46
163 1-pentanol -4.49 -4.15
164 2-pentanol -4.39 -3.12
165 3-pentanol -4.35 -3.45
166 2-methyl-1-pentanol -3.93 -4.52
167 2-methyl-2-pentanol -3.93 -3.14
168 2-methyl-3-pentanol -3.89 -2.90
169 4-methyl-2-pentanol -3.74 -2.74
170 cyclopentanol -5.49 -5.77
171 1-hexanol -4.36 -3.94
172 3-hexanol -3.68 -3.11
173 cyclohexanol -4.95 -4.82
174 4-heptanol -4.01 -2.72
175 cycloheptanol -5.49 -4.05
176 1-heptanol -4.25 -3.73
177 1-octanol -4.10 -3.54
178 allyl alcohol -5.03 -5.61
179 phenol -6.53 -5.58
180 4-bromophenol -7.10 -6.06
181 4-t-butylphenol -5.92 -4.14
182 2-cresol -5.86 -5.10
183 3-cresol -5.49 -5.76
184 4-cresol -6.12 -5.81
185 2,2,2-trifluoroethanol -4.31 -5.49
186 p-bromophenol -7.13 -6.06
187 2-methoxyethanol -6.77 -4.81
188 dimethoxymethane -2.93 -2.52
189 methyl propyl ether -1.66 -1.51
190 methyl isopropyl ether -2.00 -1.82
191 methyl t-butyl ether -2.21 -0.28
192 diethyl ether -1.75 -2.13
193 ethyl propyl ether -1.81 -1.72
194 dipropyl ether -1.16 -1.41
195 diisopropyl ether -0.53 -0.83
196 di-n-butyl ether -0.83 -1.05
197 THF -3.12 -3.83
198 2-methyltetrahydrofuran -3.30 -3.70
199 anisole -2.45 -1.99
200 ethyl phenyl ether -4.28 -1.81
201 1,1-diethoxyethane -3.27 -3.17
202 1,2-dimethoxyethane -4.84 -3.06
203 1,2-diethoxyethane -3.53 -3.89
204 1,3-dioxolane -4.09 -6.12
205 1,4-dioxane -5.05 -5.18
206 2,2,2-trifluoroethyl vinyl ether -0.12 -1.53
207 1-chloro-2,2,2-trifluoroethyl difluoromethyl ether 0.11 0.01
208 acetaldehyde -3.50 -3.21
209 propanal -3.44 -4.10
210 butanal -3.18 -2.81
211 pentanal -3.03 -3.79
212 hexanal -2.81 -2.50
213 heptanal -2.67 -3.43
214 octanal -2.29 -2.15
215 nonanal -2.07 -3.16
216 trans-2-butenal -4.23 -4.62
217 trans-2-hexenal -3.68 -4.29
218 trans-2-octenal -3.44 -3.93
219 trans,trans-2,4-hexadienal -4.64 -3.52
220 benzaldehyde -4.02 -5.05
221 m-hydroxybenzaldehyde -9.51 -9.02
222 p-hydroxybenzaldehyde -10.47 -9.23
223 acetone -3.80 -3.75
224 2-butanone -3.71 -4.41
225 3-methyl-2-butanone -3.24 -4.02
226 3,3-dimethyl-2-butanone -2.89 -3.65
227 2-pentanone -3.52 -3.99
228 3-pentanone -3.41 -4.22
229 4-methyl-2-pentanone -3.06 -3.67
230 2,4-dimethyl-3-pentanone -2.74 -3.37
231 cyclopentanone -4.68 -4.09
232 2-hexanone -3.41 -3.92
233 2-heptanone -3.04 -3.65
234 4-heptanone -2.93 -3.61
235 2-octanone -2.88 -3.48
236 2-nonanone -2.48 -3.33
237 5-nonanone -2.67 -2.99
238 2-undecanone -2.15 -2.87
239 acetophenone -4.58 -4.94
240 acetic acid -6.70 -5.32
241 propionic acid -6.46 -5.40
242 butyric acid -6.35 -4.73
243 pentanoic acid -6.16 -4.71
244 hexanoic acid -6.21 -4.29
245 4-amino-3,5,6-trichloropyridine-2-carboxylic acid -11.96 -12.75
246 methyl formate -2.78 -4.55
247 ethyl formate -2.65 -4.77
248 propyl formate -2.48 -4.28
249 methyl acetate -3.31 -3.69
250 isopropyl formate -2.02 -4.81
251 isobutyl formate -2.22 -4.84
252 isoamyl formate -2.13 -3.66
253 ethyl acetate -3.08 -3.56
254 propyl acetate -2.85 -3.06
255 isopropyl acetate -2.65 -3.70
256 butyl acetate -2.55 -2.88
257 isobutyl acetate -2.36 -4.84
258 amyl acetate -2.45 -2.62
259 isoamyl acetate -2.21 -2.52
260 hexyl acetate -2.26 -2.44
261 methyl propionate -2.97 -3.60
262 ethyl propionate -2.80 -3.37
263 propyl propionate -2.54 -3.03
264 isopropyl propionate -2.22 -3.38
265 pentyl propionate -1.99 -2.78
266 methyl butyrate -2.84 -3.10
267 ethyl butyrate -2.50 -3.06
268 propyl butyrate -2.28 -2.69
269 methyl pentanoate -2.54 -2.97
270 ethyl pentanoate -2.52 -2.74
271 methyl hexanoate -2.48 -2.61
272 ethyl heptanoate -2.30 -2.32
273 methyl octanoate -2.05 -2.15
274 methyl benzoate -4.28 -5.93
275 methylamine -4.60 -3.98
276 ethylamine -4.61 -4.34
277 propylamine -4.50 -4.02
278 butylamine -4.38 -3.88
279 pentylamine -4.09 -3.64
280 hexylamine -4.04 -3.50
281 dimethylamine -4.28 -3.28
282 diethylamine -4.06 -3.52
283 dipropylamine -3.65 -2.97
284 dibutylamine -3.31 -2.49
285 trimethylamine -3.23 -1.70
286 triethylamine -3.03 -2.38
287 azetidine -5.56 -4.02
288 piperazine -7.40 -8.45
289 N,N'-dimethylpiperazine -7.58 -5.74
290 N-methylpiperazine -7.77 -7.15
291 aniline -5.49 -6.28
292 1,1-dimethyl-3-phenylurea -11.87 -9.31
293 N,N-dimethylaniline -2.90 -4.97
294 ethylenediamine -9.75 -8.93
295 hydrazine -9.30 -9.78
296 2-methoxy-1-ethanamine -6.55 -6.26
297 morpholine -7.17 -6.61
298 N-methylmorpholine -6.34 -5.22
299 N-methylpyrrolidine -3.97 -3.82
300 N-methylpiperidine -3.89 -2.67
301 pyrrolidine -5.47 -4.47
302 piperidine -5.10 -3.98
303 pyridine -4.69 -3.32
304 2-methylpyridine -4.62 -3.49
305 3-methylpyridine -4.77 -3.54
306 4-methylpyridine -4.92 -3.58
307 2-ethylpyridine -4.32 -3.42
308 3-ethylpyridine -4.60 -3.40
309 4-ethylpyridine -4.72 -3.45
310 2,3-dimethylpyridine -4.81 -3.58
311 2,4-dimethylpyridine -4.85 -3.75
312 2,5-dimethylpyridine -4.70 -3.74
313 2,6-dimethylpyridine -4.60 -3.60
314 3,4-dimethylpyridine -5.21 -3.71
315 3,5-dimethylpyridine -4.84 -3.77
316 2-methylpyrazine -5.51 -4.80
317 2-ethylpyrazine -5.45 -4.72
318 2-isobutylpyrazine -5.05 -4.10
319 2-ethyl-3-methoxypyrazine -4.39 -4.57
320 2-isobutyl-3-methoxypyrazine -3.68 -3.67
321 9-methyladenine -13.60 -13.77
322 1-methylthymine -10.40 -11.22
323 methylimidazole -10.25 -7.61
324 N-propylguanidine -10.92 -10.73
325 acetonitrile -3.89 -1.49
326 propionitrile -3.85 -1.73
327 butyronitrile -3.64 -1.48
328 benzonitrile -4.10 -3.87
329 2,6-dichlorobenzonitrile -5.22 -5.11
330 3,5-dibromo-4-hydroxybenzonitrile -9.00 -9.34
331 N,N-dimethylformamide -4.90 -6.63
332 N-methylformamide -10.00 -8.02
333 acetamide -9.72 -8.41
334 (E)-N-methylacetamide -10.00 -7.25
335 (Z)-N-methylacetamide -10.00 -7.42
336 propionamide -9.42 -8.55
337 methanethiol -1.24 -1.82
338 ethanethiol -1.30 -1.22
339 1-propanethiol -1.05 -0.91
340 thiophenol -2.55 -3.03
341 thioanisole -2.73 -2.80
342 dimethyl sulfide -1.54 -2.21
343 diethyl sulfide -1.43 -0.90
344 methyl ethyl sulfide -1.49 -1.60
345 dipropyl sulfide -1.27 -0.51
346 2,2'-dichlorodiethyl sulfide -3.92 -3.44
347 dimethyl disulfide -1.83 -2.67
348 diethyl disulfide -1.63 -1.60
349 trimethyl phosphate -8.70 -8.52
350 triethyl phosphate -7.80 -7.65
351 tripropyl phosphate -6.10 -4.16
352 2,2-dichloroethenyl dimethyl phosphate -6.61 -6.55
353 dimethyl-5-(4-chloro)-bicyclo[3.2.0]-heptyl phosphate -7.28 -8.62
354 o-ethyl-o'-(4-bromo-2-chlorophenyl) S-propyl phosphorothioate -4.09 -5.04
355 hydroquinone -10.77 -10.18
356 1,2,3-trimethoxybenzene -5.40 -6.31
357 1,2-benzenediol -7.62 -9.56
358 1,3-benzenediol -9.67 -9.68
359 o-phenylenediamine -7.19 -10.91
360 m-phenylenediamine -10.26 -12.01
361 2-methylaniline -5.47 -6.41
362 N-methylaniline -4.54 -5.38
363 acetylene anion -73 -66
364 protonated methanol -85 -76
365 protonated dimethyl ether -70 -72
366 protonated 2-propanol -64 -66
367 methanolate ion -95 -92
368 formylate ion -77 -75
369 dimethyl ether carbanion -81 -78
370 phenolate ion -72 -80
371 toluene carbanion -59 -62
372 superoxide -87 -84
373 methyl ammonium ion -70 -79
374 protonated acetamide -66 -63
375 protonated N-methylmethanamine -63 -70
376 protonated N,N-dimethylmethanamine -59 -62
377 pyridinium ion -59 -68
378 ammonium ion -79 -67
379 acetonitrile carbanion -75 -73
380 azide ion -74 -74
381 methylsulfonium ion -74 -74
382 protonated dimethyl sulfide -61 -55
383 1-propanethiolate anion -76 -75
384 thiophenolate ion -67 -66
Table A5 The pKa data set. No. Compound Exp. Calc.
1 2,3,4,5,6-pentafluoroaniline -0.28 0.26 2 2,3,5,6-tetramethyl-4-nitrobenzeneamine 2.36 2.97 3 2,3-dichloroaniline 1.76 2.03 4 2,4,5-trichloroaniline 1.09 1.33 5 2,4,6-trichloroaniline -0.03 1.38 6 2,4-dibromoaniline 2.30 0.92 7 2,4-dichloroaniline 2.00 2.23 8 2,4-dinitroaniline -4.25 -2.34 9 2,5-dichloroaniline 2.05 2.09
10 2,5-dimethoxyaniline 3.93 6.15 11 2,6-dichloro-4-nitroaniline -2.55 -1.07 12 2,6-dichloroaniline 0.42 2.18 13 2,6-dimethyl-4-nitrobenzeneamine 0.98 2.37 14 2,6-dinitroaniline -5.00 -2.07 15 2-amino-4-nitrophenol 3.10 2.33 16 2-aminobenzoic acid,ethyl ester 2.51 3.52 17 2-aminobenzoic acid 2.14 2.83 18 2-aminobiphenyl 3.83 6.77 19 2-aminophenol 4.84 5.26 20 2-chloro-4-nitroaniline -0.94 -0.64 21 2-methoxy-5-nitroaniline 2.49 2.36 22 2-nitro-4-toluidine 0.40 1.62 23 3,4-dichloroaniline 2.97 2.18 24 3,5-dichloroaniline 2.51 1.99 25 3,5-dimethyl-4-nitrobenzeneamine 2.54 2.22 26 3,5-dinitroaniline 0.30 -0.33 27 3-aminobenzoic acid 3.07 3.17 28 3-aminophenol 4.37 5.24 29 3-bromoaniline 3.58 2.62 30 3-methyl-4-bromoaniline 4.05 3.46 31 3-methyl-4-nitroaniline 1.64 1.70 32 3-nitro-4-toluidine 3.03 2.96 33 3-trifluoromethylaniline 3.49 2.46 34 4-aminobenzoic acid 2.38 1.86 35 4-aminobiphenyl 4.35 6.32
133
36 4-aminophenol 5.48 5.91 37 4-benzoylaniline 2.24 1.11 38 4-bromoaniline 3.86 4.30 39 4-chloro-2-nitroaniline -1.02 -0.22 40 4-chloro-3-nitrobenzeneamine 1.90 -0.52 41 4-methoxy-2-nitrobenzenamine 0.77 1.03 42 4-methylsulfonylaniline 1.35 0.31 43 4-nitro-2-toluidine 1.04 0.90 44 5-nitro-2-toluidine 2.35 2.65 45 butyl-4-aminobenzoate 2.47 3.98 46 methyl-4-aminobenzoate 2.47 3.48 47 methyl anthranilate 2.23 2.97 48 o-bromoaniline 2.53 1.67 49 p-aminobenzoic acid,ethyl ester 2.51 2.60 50 p-aminosalicylic acid 2.05 2.73 51 propyl-4-aminobenzoate 2.49 4.55 52 p-trifluoromethylaniline 2.45 2.01 53 1,2,2,6,6-pentamethylpiperidine 11.25 9.80 54 1,2,3,4-tetrahydro-2-naphthalenamine 9.93 10.17 55 1-methylpyrrolidine 10.32 8.72 56 2,2,2-trifluoroethylamine 5.70 4.95 57 2,2,6,6-tetramethylpiperidine 11.72 11.19 58 2,2-bipyridine 4.33 3.97 59 2,3,4,5,6-pentachloropyridine -1.00 -0.72 60 2,3,5,6-tetrachloropyridine -0.80 -1.88 61 2,3,5,6-tetramethylpyridine 7.90 7.64 62 2,3-dichloropyridine -0.85 0.70 63 2,3-dimethylpyridine 6.57 6.44 64 2,4,6-collidine 7.43 6.93 65 2,4-dimethylpyridine 6.99 6.51 66 2,5-dimethylpyridine 6.40 6.53 67 2,6-dichloropyridine -2.86 1.44 68 2,6-dimethoxypyridine 1.60 4.41 69 2,6-lutidine 6.60 6.18 70 2-acetylpyridine 2.73 3.02 71 2-amino-5-methylpyridine 7.22 5.93 72 2-aminomethylfuran 8.89 8.22 73 2-benzylpyridine 5.13 6.08 74 2-bromopyridine 0.90 1.91 75 2-chloropyridine 0.49 2.78 76 2-ethylpyridine 5.89 6.38 77 2-fluoropyridine -0.44 3.32 78 2-hydroxypyridine 0.75 3.70 79 2-methoxypyridine 3.06 4.55 80 2-methyl-5-vinylpyridine 5.67 5.94 81 2-methylpiperidine 11.08 10.10 82 2-methylpyridine 6.00 5.69 83 2-methylthiopyridine 3.59 1.71 84 2-phenethylamine 9.96 7.35 85 2-phenylpyridine 4.48 4.81
134
86 2-phenylpyrrolidine 9.40 9.24 87 2-propylpiperidine 11.00 10.88 88 2-pyridinecarboxyaldehyde 3.80 4.25 89 2-pyridineethanol 5.31 5.13 90 2-pyridinepropanol 5.61 5.81 91 2-t-butylpyridine 5.76 7.26 92 2-vinylpyridine 4.98 4.74 93 3,4-dimethylpyridine 6.46 6.70 94 3,4-methylenedioxyamphetamine 9.67 9.40 95 3,5-dichloropyridine 0.67 1.60 96 3,5-dimethylpyridine 6.15 6.86 97 3-bromopyridine 2.91 1.11 98 3-ethylpyridine 5.56 6.77 99 3-formylpyridine 3.80 3.51
100 3-hydroxypyridine 4.80 4.31 101 3-methoxypyridine 4.91 4.71 102 3-methylpyridine 5.63 6.07 103 3-nitropyridine 1.18 0.51 104 3-phenylpropylamine 10.16 9.18 105 3-phenylpyridine 4.80 5.50 106 3-pyridinemethaneamine 5.96 8.10 107 3-pyridinemethanol 4.90 4.67 108 3-pyridinepropanol 5.47 6.08 109 4,4-bipyridinyl 4.82 5.16 110 4-acetylpyridine 3.59 3.45 111 4-benzylpyridine 5.59 6.78 112 4-bromopyridine 3.78 3.24 113 4-chloropyridine 3.84 3.65 114 4-cyanopyridine 1.90 3.85 115 4-ethylmorpholine 7.67 8.03 116 4-ethylpyridine 5.87 6.72 117 4-formylpyridine 4.77 3.29 118 4-methoxypyridine 6.47 5.05 119 4-methylbenzenemethanamine 9.36 9.22 120 4-methylpyridine 5.98 6.01 121 4-phenylbutylamine 10.36 9.66 122 4-phenylpyridine 5.55 5.33 123 4-propylpyridine 6.05 7.56 124 4-pyridineethanol 5.60 6.88 125 4-pyridinemethanol 5.33 5.56 126 4-pyridinepropanol 5.84 7.11 127 4-t-butylpyridine 5.99 7.84 128 4-vinylpyridine 5.62 5.15 129 5-ethyl-2-methylpyridine 6.51 7.20 130 allylamine 9.70 8.45 131 α-methylbenzeneethanamine 10.13 9.45 132 α-methylbenzenepropanamine 9.79 9.51 133 anabasine 8.70 8.91 134 arecoline 7.16 6.34 135 azetidine 11.29 8.87
136 benzylamine 9.33 7.82
137 bis-(2-chloroethyl)ethylamine 6.57 6.57
138 chlorpheniramine 9.13 7.09
139 cyclohexanamine 10.63 10.50
140 diallylamine 9.29 9.17
141 dibutylamine 11.39 11.89
142 dicyclohexylamine 10.40 11.82
143 diethylamine 11.09 10.22
144 diisopropylamine 11.07 10.99
145 dimethylamine 10.73 9.46
146 dimethylbutylamine 10.19 9.61
147 dinicotinic acid 1.10 3.88
148 diphenhydramine 8.98 6.74
149 dipropylamine 11.00 11.01
150 E-3-nicotinoylacrylic acid 3.82 1.90
151 ethylamine 10.87 10.11
152 ethyldimethylamine 10.16 8.72
153 fenpropidin 10.10 12.19
154 fenpropimorph 6.98 11.76
155 hexamethyleneimine 11.07 10.21
156 isobutylamine 10.68 10.84
157 isonicotinic acid, ethyl ester 1.70 4.11
158 isonicotinic acid, methyl ester 3.45 3.45
159 isonicotinic acid 3.26 2.96
160 isopropylamine 10.63 10.42
161 mescaline 9.56 8.09
162 methadone 8.94 7.28
163 methamphetamine 9.87 9.35
164 methylamine 10.62 9.23
165 methylbutylamine 10.90 10.29
166 morpholine 8.49 7.85
167 moxisylyte 8.72 8.01
168 N-β-dimethylbenzeneethanamine 9.87 8.73
169 N-butylamine 10.78 10.50
170 N-ethylbenzenemethanamine 9.64 9.78
171 nicotine 8.18 8.64
172 nicotinic acid, ethyl ester 3.35 3.51
173 nicotinic acid, methyl ester 3.13 2.81
174 nikethamide 3.50 3.97
175 n-methylbenzeneethanamine 10.08 8.13
176 n-methylbenzylamine 9.54 7.85
177 n-methylmorpholine 7.38 7.72
178 n-methylpiperidine 10.08 8.83
179 N,N-di-2-propenyl-2-propen-1-amine 8.31 8.60
180 N,N-dimethyl-2-(3-pyridyl)ethylamine 8.86 8.46
181 N,N-dimethyl-2-[5-methyl-2-(1-methylethyl)phenoxy]ethanamine 8.66 9.74
182 N,N-dimethyl-(2-pyridine)ethanamine 8.75 7.82
183 N,N-dimethyl-3-pyridylmethylamine 8.00 7.63
184 N,N-dimethylbenzylamine 8.91 7.50
185 orphenadrine 8.91 8.33
186 picolinic acid, methyl ester 2.21 2.49
187 picolinic acid 1.06 2.29
188 piperalin 8.90 8.34
189 piperidine 11.28 9.30
190 p-methoxyamphetamine 9.53 10.04
191 propylamine 10.71 10.09
192 pyridine 5.23 5.27
193 pyrrolidine 11.31 8.97
194 sec-butylamine 10.56 10.80
195 t-butylamine 10.68 10.52
196 triethylamine 10.78 9.64
197 trimethylamine 9.80 8.37
198 tri-N-butylamine 10.89 12.41
199 tripropylamine 10.65 10.91
200 1-acetyl-1H-imidazole 3.60 4.53
201 1-methyl-4-nitro-1H-imidazole -0.53 -2.05
202 1-methyl-5-nitroimidazole 2.13 -0.07
203 1-phenylmethyl-1H-imidazole 6.70 6.39
204 2-(2,4-dimethylphenyl)-5-nitrobenzimidazole 5.29 2.75
205 2-(2-methoxyphenyl)benzimidazole 7.17 4.37
206 2-(2-methylphenyl)-5-nitrobenzimidazole 4.87 2.72
207 2,4,6-pyrimidinetriamine 6.81 2.23
208 2-(4-aminophenylmethyl)-5-chlorobenzimidazole 7.47 4.67
209 2-(4-bromophenylmethyl)-5-chlorobenzimidazole 5.42 7.96
210 2-(4-chlorphenylmethyl)-5-chlorobenzimidazole 4.86 4.82
211 2,4-dimethylquinoline 5.12 6.41
212 2-(4-methoxyphenylmethyl)-5-nitrobenzimidazole 4.26 1.76
213 2-(4-methylphenyl)benzimidazole 6.90 6.27
214 2-(4-methylphenylmethyl)-5-chlorobenzimidazole 7.09 5.70
215 2,6-dimethylquinoline 6.10 6.62
216 2-amino-4,6-dimethylpyrimidine 4.82 4.35
217 2-aminopyrimidine 3.45 2.98
218 2-bromopyrimidine -1.63 1.24
219 2-ethoxypyrimidine 1.27 3.11
220 2-methyl-1H-imidazole 7.85 6.22
221 2-methyl-8-quinolinol 5.55 4.92
222 2-methylquinoline 5.71 5.70
223 2-methylthio-4,6-dimethylpyrimidine 0.59 4.86
224 2-methylthiopyrimidine 6.48 3.72
225 2-phenyl-1H-imidazole -0.68 5.11
226 2-pyrimidinecarboxylic acid, methyl ester 2.12 0.70
227 3-bromoquinoline 2.69 3.66
228 3-methylquinoline 5.17 6.09
229 3-quinolinol 4.28 4.73
230 4,6-dimethylpyrimidine 2.70 5.05
231 4,7-dichloroquinoline 2.80 1.94
232 4-methyl-8-quinolinol 5.56 4.89
233 4-methylpyrimidine 1.91 4.40
234 4-methylquinoline 5.67 5.87
235 4-nitroimidazole -0.05 -0.20
236 5-chloro-8-quinolinol 3.56 2.62
237 5-nitropyrimidine 0.72 -0.44
238 5-quinolinol 5.02 4.37
239 6-bromoquinoline 3.87 1.17
240 6-chloroquinoline 3.85 3.18
241 6-hydroxyquinoline 5.15 3.85
242 6-methoxyquinoline 5.03 4.93
243 6-methylquinoline 5.34 6.06
244 7-bromoquinoline 3.87 3.98
245 7-methoxyquinoline 5.03 4.67
246 7-methylquinoline 5.34 6.00
247 7-quinolinol 5.46 4.56
248 8-chloroquinoline 3.12 3.28
249 8-fluoroquinoline 3.34 4.20
250 8-methoxyquinoline 5.01 4.42
251 8-methylquinoline 5.05 5.81
252 8-quinolinol 4.90 4.21
253 anserine 7.04 8.67
254 benzimidazole 5.53 5.20
255 cimetidine 6.80 6.07
256 cloquintocetmexyl 3.75 3.90
257 fenclorim 4.23 0.07
258 imidazole 6.95 5.97
259 pentostatin 5.20 7.38
260 pilocarpol 6.78 5.75
261 prochloraz 3.80 2.85
262 pyrimethanil 3.52 4.06
263 pyrimidine 1.23 3.61
264 quinoline 4.90 5.16
265 triflumizole 3.70 2.29
Table A6 The glass transition temperature data set (Tg in K).^a
No. Compound Exp. Calc.
1 1-TNATA 386 380
2 2-TNATA 383 395
3 AODF1 353 362
4 AODF2 353 379
5 BMA-1T 359 346
6 BMA-2T 363 357
7 BMA-3T 366 374
8 BMA-4T 371 376
9 BMB-2T 380 360
10 BMB-3T 366 372
11 BNpA-1T 364 364
12 BPAPF 440 422
13 EFPCA 405 393
14 EFPPCA 458 446
15 EM1 407 401
16 EM2 395 366
17 EM3 391 374
18 EM4 372 389
19 EM5 440 424
20 ENPPCA 447 403
21 EPPCA 447 419
22 EtCz2 343 364
23 F1AMB-1T 397 384
24 m-BPD 354 365
25 m-MTDAB 320 344
26 m-MTDAPB 378 387
27 m-MTDATA 348 361
28 m-MTDATz 315 360
29 MPPPCA 456 430
30 MTBDAB 407 451
31 m-TTA 353 399
32 NEFAPQ 389 380
33 NPB 368 351
34 NPCA 396 402
35 NPECAPPP 425 382
36 o-MTDAB 315 337
37 o-MTDAPB 382 387
38 o-MTDATA 349 355
39 o-MTDATz 328 357
40 PAB 401 402
41 PAE3b 388 398
42 PAE3c 412 428
43 PAPA 394 406
44 PATB4a 398 394
45 PATB4d 423 421
46 PATB4e 416 449
47 p-BrTDAB 345 352
48 p-ClTDAB 337 320
49 p-DPA-TDAB 380 388
50 p-FTDAB 327 326
51 PhAMB-1T 357 355
52 PhCz2 363 376
53 p-MTDAB 328 344
54 p-MTDAPB 383 388
55 p-MTDATA 353 362
56 PPACBN 467 422
57 PPATC3e 415 445
58 PPCA 453 431
59 PPPCA 457 440
60 p-TTA 405 398
61 TBB 361 428
62 TBPSF 468 392
63 TCB 399 393
64 TCPB 445 453
65 TCTA 423 414
66 TDAPB 394 377
67 TDATA 362 352
68 TMB-TB 433 430
69 TPD 338 335
70 TPOB 410 383
71 TPTAB1 311 324
72 TPTAB2 319 310
73 TPTE 403 402
^a Structure codes from Yin, S.; Wang, Y. J. Chem. Inf. Comput. Sci. 2003, 43, 970-977.
139
Table A7 The aqueous solubility data set (logS).
No. Compound Exp. Calc.
1 1-Bromoheptane -4.431 -3.409
2 1-Bromohexane -3.807 -3.214
3 Acetyl-R-mandelic acid -1.231 -2.342
4 1,1-Diphenylethene -4.436 -4.155
5 Benzo[b]triphenylene -8.222 -5.931
6 1,2,4,5-Tetrafluorobenzene -2.376 -1.534
7 1,3-Butadiene -1.867 -2.157
8 1,4-Dimethylcyclohexane -4.466 -3.070
9 1,4-Pentadiene -2.087 -2.309
10 1,5-Hexadiene -2.687 -2.562
11 1,6-Heptadiene -3.340 -2.836
12 1,6-Heptadiyne -1.747 -2.436
13 1,8-Nonadiyne -2.983 -2.954
14 1-Chloro-2-[2,2-dichloro-1-(4-chlorophenyl)ethyl]benzene -6.506 -6.201
15 1-Anthranol -4.721 -3.929
16 1-Bromo-2-naphthylisothiocyanate -0.319 -4.858
17 1-Bromo-3-chloropropane -1.848 -2.394
18 1-Bromo-3-fluorobenzene -2.666 -3.146
19 1-Butene -2.403 -2.109
20 1-Butyne -1.275 -1.911
21 1-Chloro-2,4-dinitronaphthalene -5.402 -4.238
22 1-Chloro-2-fluorobenzene -2.416 -2.613
23 1-Chloro-3-fluorobenzene -2.346 -2.637
24 1-Chloroheptane -3.996 -3.223
25 1-Ethyl-2-methylbenzene -3.207 -3.156
26 1-Heptene -3.733 -2.934
27 1-Heptyne -3.010 -2.714
28 1-Hexen-3-ol -2.344 -1.952
29 1-Methyl Tetrahydrofuran -1.538 -1.555
30 1-Methyl-1-cyclohexene -3.267 -2.605
31 1-Methylphenanthrene -5.854 -4.358
32 1-Naphthaleneacetic Acid -2.652 -3.037
33 1-Naphthol -3.519 -2.939
34 1-Naphthyl Isothiocyanate -4.602 -4.307
35 1-Nonene -5.053 -3.487
36 1-Nonyne -4.237 -3.263
37 1-Octyne -3.662 -2.997
38 1-Pentene -2.676 -2.392
39 17-Methyltestosterone -3.951 -3.260
40 2,2',3,3',4,4',6-Heptachlorobiphenyl -8.301 -7.524
41 2,2',4,5-Tetrachlorobiphenyl -7.252 -5.963
42 2,2',4,4'-Tetrachlorobiphenyl -6.123 -5.933
43 2,2',3,5'-Tetrachlorobiphenyl -6.562 -6.121
44 2,2',3,4,5'-Pentachlorobiphenyl -7.854 -6.648
45 2,2',3,3',4,4',5,5'-Octachlorobiphenyl -9.000 -7.928
46 2,2',3,4,5,5',6-Heptachlorobiphenyl -9.000 -7.564
47 2,2',3,4,6-Pentachlorobiphenyl -7.432 -6.587
48 2,2',3,3',5,6-Hexachlorobiphenyl -8.523 -7.135
49 2,2,3-Trimethyl-3-pentanol -0.833 -2.237
50 2,2,5,5-Tetramethyl-3-hexyne -3.833 -3.521
51 2,2,5-Trimethyl-3-hexyne -3.618 -3.270
52 2,2-Dimethyl-3-butanol -2.368 -1.938
53 2,2-Dimethyl-3-hexyne -4.143 -2.981
54 2,3',5-Trichlorobiphenyl -6.000 -5.381
55 2,3,4,5,6-Pentachlorophenoxyacetic Acid -3.745 -4.964
56 2,3,4,6-Tetrachlorophenoxyacetic Acid -3.409 -4.382
57 2,3,4-Trichlorophenoxyacetic Acid -3.097 -3.944
58 2,3,5-Trichloro-4-hydroxypyridine -4.286 -3.480
59 2,3,5-Trichlorophenoxyacetic Acid -3.000 -3.889
60 2,3,6-Trichlorophenoxyacetic Acid -2.620 -3.848
61 2,3-Dichlorophenoxyacetic Acid -2.810 -3.296
62 2,3-Dimethyl-1-butanol -2.133 -1.834
63 2,3-Dimethyl-2-pentanol -2.622 -2.132
64 2,3-Dimethyl-3-pentanol -2.595 -2.076
65 2,3-Xylenol -1.427 -2.334
66 2,4,6-Trichlorophenol -2.341 -3.826
67 2,4,6-Trichlorophenoxyacetic Acid -3.013 -3.837
68 2,4-Decadione -2.585 -2.236
69 2,4-Dimethyl-2-pentanol -2.683 -2.305
70 2,4-Dimethyl-3-pentanol -2.448 -2.172
71 2,4-Dimethyl-3-pentanone -3.046 -1.981
72 2,4-Dimethylquinoline -1.942 -3.341
73 2,4-Dinitrobenzoic Acid -1.067 -3.120
74 2,4-Dinitrophenol -2.598 -3.138
75 2,4-Lutidine -1.231 -2.370
76 2,4-Octadione -1.559 -1.644
77 2,5-Dichlorophenoxyacetic Acid -2.616 -3.313
78 2,5-Dimethyl-4-acetaminophenol -2.013 -2.034
79 2,5-Piperazinedione -0.831 -0.271
80 2,5-Xylenol -1.538 -2.407
81 2,6-Dichlorophenoxyacetic Acid -2.152 -3.287
82 2,6-Diethyl-4-acetaminophenol -2.531 -2.602
83 2,6-Diisopropyl-4-acetaminophenol -3.214 -2.977
84 2,6-Dimethyl-4-acetaminophenol -1.911 -2.023
85 2,6-Dimethyl-4-heptanol -3.904 -2.783
86 2,6-Dimethylnaphthalene -4.893 -3.766
87 2,6-Xylenol -1.305 -2.335
88 2-(2-Methyl-4-chlorophenoxy)propionic Acid -2.407 -3.191
89 2-Anthranol -4.328 -3.911
90 2-Chlorophenoxyacetic Acid -2.164 -2.639
91 2-Ethyl-1-butanol -3.152 -2.031
92 2-Ethylnaphthalene -4.291 -3.853
93 2-Fluorobenzyl Chloride -2.541 -2.623
94 2-Heptene -3.816 -2.909
95 2-Heptyne -3.770 -2.733
96 2-Hexanol -2.617 -1.998
97 2-Methyl-1-pentanol -2.976 -2.069
98 2-Methyl-1-pentene -3.033 -2.631
99 2-Methyl-2-hexanol -2.823 -2.241
100 2-Methyl-2-pentanol -2.244 -1.981
101 2-Methyl-3-hexyne -3.745 -2.756
102 2-Methyl-3-pentanol -2.451 -1.876
103 2-Methyl-4-acetaminophenol -1.595 -1.834
104 2-Methyl-4-penten-3-ol -2.260 -1.991
105 2-Methyl-5-t-butylphenol -2.594 -3.068
106 2-Methyldecalin -6.573 -3.702
107 2-Naphthoic Acid -3.886 -2.696
108 2-Naphthyl Isothiocyanate -4.444 -4.289
109 2-Nitrobenzaldehyde -3.878 -2.274
110 2-Pentene -2.538 -2.354
111 2-Thiouracil -2.257 -2.135
112 3,3'-Dichlorobiphenyl-4,4'-diamine -4.910 -3.903
113 3,3'-Dichlorobiphenyl -5.699 -4.777
114 3,3-Diphenylphthalide -4.855 -4.432
115 3,4,5-Trichlorophenoxyacetic Acid -2.939 -3.788
116 3,4,7,8-Tetramethyl-1,10-phenanthroline -5.222 -3.986
117 3,4-Dichlorophenoxyacetic Acid -2.684 -3.133
118 3,4-Xylenol -1.409 -2.408
119 3,5-Dichlorophenoxyacetic Acid -2.362 -3.096
120 3,5-Dinitrobenzoic Acid -2.197 -3.252
121 3,5-Pyridinedicarboxylic Acid -2.194 -1.400
122 3,5-Xylenol -1.398 -2.398
123 3-(5-tert-Butyl-1,3,4-thiadiazol-2-yl)-4-hydroxyl-l -1.877 -3.088
124 3-Bromo-2-nitrobenzoic Acid -2.872 -3.334
125 3-Bromobenzyl Isothiocyanate -3.971 -4.378
126 3-Carboxyphenylisothiocyanate -3.252 -3.038
127 3-Chloro-2-nitrobenzoic Acid -2.632 -2.687
128 3-Chlorobenzyl Isothiocyanate -3.863 -4.139
129 3-Chlorophenoxyacetic Acid -1.898 -2.447
130 3-Cyclohexyl-6-dimethylamino-1-methyl-1,3,5-triazine-2,4-dione -0.883 -2.268
131 3-Ethyl-3-pentanol -2.585 -2.210
132 3-Fluorobenzyl Chloride -2.544 -2.609
133 3-Heptanol -3.208 -2.259
134 3-Hexanol -2.547 -1.987
135 3-Hexanone -2.578 -1.763
136 3-Hexyne -2.167 -2.463
137 3-Hydroxy-5-methyl Isoxazole -0.067 -0.878
138 3-Hydroxyphenyl Isothiocyanate -1.991 -3.174
139 3-Methyl-1-butene -2.732 -2.381
140 3-Methyl-1-pentanol -3.121 -1.897
141 3-Methyl-2,4-pentadione -0.010 -1.132
142 3-Methyl-2-butanone -1.896 -1.479
143 3-Methyl-2-pentanol -2.466 -1.782
144 3-Methyl-2-pentanone -2.425 -1.814
145 3-Methyl-3-hexanol -2.734 -2.191
146 3-Methyl-3-pentanol -2.125 -1.890
147 3-Nitrobenzaldehyde -4.179 -2.293
148 3-Nitrobenzyl Isothiocyanate -4.086 -3.879
149 3-Nitropentane -1.955 -1.983
150 3-Nitrophenyl Isothiocyanate -3.553 -3.783
151 3-Nitrophthalic Acid -1.021 -1.862
152 3-Penten-2-ol -1.730 -1.811
153 3-Pentyl-2,4-pentadione -1.851 -2.006
154 3-Propyl-2,4-pentadione -0.876 -1.436
155 3-Thenoic Acid -1.474 -1.787
156 4,4'-Dimethylbiphenyl -6.000 -4.166
157 4,7-Dimethyl-1,10-phenanthroline -3.971 -3.616
158 4-(Methylthio)phenyl Dipropyl Phosphate -3.386 -3.280
159 4-(4-Chlorophenoxy)butyric Acid -3.290 -2.770
160 4-(2,4,5-Trichlorophenoxy)butyric Acid -3.829 -4.276
161 4-Benzoyl Phenylisothiocyanate -4.854 -4.286
162 4-Bromo-1-butene -2.247 -2.518
163 4-Bromobiphenyl -5.523 -4.708
164 4-Bromophenyl Isothiocyanate -0.268 -3.622
165 4-Carbethoxyphenylisothiocyanate -4.046 -3.612
166 4-Carboxyphenylisothiocyanate -3.975 -3.025
167 4-Chlorobenzyl Isothiocyanate -3.830 -4.152
168 4-Chlorophenyl Phenyl Ether -4.793 -4.151
169 4-Cyanobenzyl Isothiocyanate -3.495 -3.743
170 4-Dimethylaminophenyl Isothiocyanate -4.125 -3.628
171 4-Hexen-3-ol -2.165 -2.008
172 4-Hydroxyphenyl Isothiocyanate -2.668 -3.128
173 4-Methyl-1-pentene -3.244 -2.653
174 4-Methyl-3-pentanone -2.564 -1.813
175 4-Methylbenzaldehyde -1.724 -1.895
176 4-Methylbiphenyl -4.620 -3.924
177 4-Nitrobenzyl Isothiocyanate -3.633 -3.724
178 4-Nitrocatechol -1.571 -2.074
179 4-Nitroresorcinol -3.022 -2.113
180 4-Nonylphenol -4.498 -4.436
181 4-Penten-1-ol -1.924 -1.585
182 4-Penten-3-ol -1.766 -1.764
183 4-Vinyl-1-cyclohexene -3.335 -2.853
184 4-s-Butylphenol -2.194 -3.021
185 4-t-Butylphenol -2.413 -3.041
186 5,5-Dimethyl-2,4-hexadione -1.631 -1.753
187 5,5-Dipropylbarbituric Acid -2.398 -2.667
188 5,6-Dimethyl-2-thiouracil -2.056 -2.734
189 5-Bromo-2-nitrobenzoic Acid -1.521 -3.173
190 5-Bromo-3-tert-butyl-6-methyluracil -2.804 -3.657
191 5-Carboethoxy-2-thiouracil -2.099 -2.792
192 5-Chloro-2-nitrobenzoic Acid -1.319 -2.941
193 5-Ethyl-5-methylbarbituric Acid -1.096 -1.759
194 5-Ethyl-5-N-butylbarbituric Acid -1.638 -2.564
195 5-Ethyl-5-N-heptylbarbituric Acid -3.218 -3.307
196 5-Ethyl-5-N-hexylbarbituric Acid -3.049 -3.048
197 5-Ethyl-5-N-nonylbarbituric Acid -3.462 -3.856
198 5-Ethyl-5-N-octylbarbituric Acid -3.943 -3.582
199 5-Ethyl-5-N-propylbarbituric Acid -1.442 -2.241
200 5-Ethyl-5-pentylbarbituric Acid -2.177 -2.675
201 5-Methyl-2-thiouracil -2.446 -2.342
202 5-Nitro-1,10-phenanthroline -3.917 -3.543
203 6-Amino-2-thiouracil -2.747 -2.341
204 6-Methyl-2,4-heptadione -1.604 -1.897
205 6-Nitrophthalide -2.651 -2.556
206 7-Methylsulfinyl-2-xanthonecarboxylic Acid -5.062 -3.570
207 7-Methylthio-2-xanthonecarboxylic Acid -6.042 -4.030
208 Acetal -0.429 -1.870
209 Acetaminophen Acetate -2.781 -1.906
210 Acetaminophen Butyrate -2.826 -2.473
211 Acetaminophen Hexanoate -4.141 -3.040
212 Acetaminophen Laurate -4.745 -4.701
213 Acetaminophen Octanoate -4.443 -3.602
214 Acetaminophen Palmitate -4.892 -5.809
215 Acetaminophen Propionate -2.811 -2.181
216 Acetaminophen Stearate -4.922 -6.380
217 Adenine -2.119 -1.322
218 Adipic Acid -2.654 -1.194
219 Alachlor -3.261 -3.695
220 Allyl Bromide -1.499 -2.215
221 Ametryn -3.037 -3.408
222 Amikacin -0.500 2.060
223 Amitrole 0.522 -0.679
224 Ampyrone 0.554 -2.290
225 Amyl Acetate -1.877 -2.105
226 Ancymidol -2.596 -3.160
227 Androstenedione -3.699 -3.293
228 Anethole -3.126 -3.103
229 Anthraquinone -5.187 -3.220
230 Arginine 0.019 -0.482
231 Aspirin Phenylalanine Ethyl Ester -3.328 -3.278
232 Azinphos-methyl -4.039 -4.179
233 Barban -4.370 -4.146
234 Bendroflumethiazide -3.590 -3.535
235 Benzamide -0.956 -1.560
236 Benzhydrol -2.553 -3.325
237 Benzidine-2,2'-disulfonic Acid -4.634 -2.876
238 Benzoin -2.850 -3.116
239 Benzophenone -3.125 -3.043
240 Benzoyl-r-mandelic Acid -1.509 -3.408
241 Benzoylprop-ethyl -4.263 -5.408
242 Benzyl Alcohol -0.402 -1.909
243 Benzyl Isothiocyanate -3.137 -3.582
244 Benzylamine -1.533 -1.908
245 Bibenzyl -4.627 -4.296
246 Borneol -2.320 -2.563
247 Bromochloromethane -0.889 -2.139
248 Bromomethionic Acid 1.131 -1.712
249 Butadiyne -4.699 -2.021
250 Butyl Dibutyl Phosphinate -1.717 -1.753
251 CDAA -0.945 -2.781
252 Camphor -1.987 -2.384
253 Caproic Aldehyde -1.302 -1.806
254 Caprylic Aldehyde -2.360 -2.058
255 Carbazole -5.265 -3.330
256 Carbofuran -2.500 -2.378
257 Carbon Disulfide -3.170 -0.794
258 Carbonyl Sulfide -1.682 -1.954
259 Carboxin -3.141 -3.474
260 Carvacrol -2.080 -3.065
261 Chelidonic Acid -1.110 -1.596
262 Chloromethionic Acid -4.812 -1.444
263 Chloroneb -4.413 -3.675
264 Chloropicrin -2.006 -2.818
265 Chlorothalonil -5.647 -4.505
266 Chlorothiazide -3.020 -2.921
267 Chlorpropham -3.380 -3.439
268 Chlorpyrifos-methyl -4.907 -3.113
269 Chlorquinox -5.428 -5.049
270 Cinchonidine -3.168 -3.961
271 Cinchophen -3.193 -4.176
272 Cinnamaldehyde -1.991 -2.166
273 Citraconic Acid 0.779 -0.899
274 Cortisone -3.110 -2.867
275 Cortisone Acetate -4.277 -3.232
276 Cortisone Propionate -4.717 -3.192
277 Cumene Hydroperoxide -1.039 -2.409
278 Cyclobarbital -1.456 -2.508
279 Cycloheptane -3.515 -2.806
280 Cycloheptene -3.164 -2.650
281 Cyclohexanone -2.339 -1.312
282 Cyclooctane -4.152 -2.883
283 Cytosine -1.143 -0.874
284 D-Alanine 0.267 -0.611
285 D-Glutamic Acid -1.219 -0.773
286 DCPA -5.822 -4.796
287 d,l-2-(4-Chlorophenoxy)propionic Acid- -2.134 -3.066
288 d,l-Aminooctanoic Acid -4.202 -1.954
289 d,l-Aspartic Acid -1.212 -0.601
290 d,l-Glutamic Acid -0.746 -0.698
291 d,l-Isoleucine -0.778 -1.382
292 d,l-Methionine -0.659 -1.570
293 d,l-Norvaline -0.145 -1.154
294 d,l-Phenylalanine -1.066 -1.995
295 d,l-alpha-Aminobutyric Acid 0.287 -0.904
296 DMPA -4.798 -4.202
297 Dalapon 0.545 -2.391
298 Daminozide -0.205 -0.533
299 Dazomet -3.876 -3.342
300 Decabromodiphenyl Ether -7.585 -6.483
301 Decyl-p-hydroxybenzoate -2.885 -4.198
302 Dexamethasone -3.644 -2.464
303 Dianisidine -3.610 -2.850
304 Dibenzo-18-crown-6 -4.693 -4.413
305 Dibenzothiophene -5.542 -4.281
306 Dibutyl Butyl Phosphonate -2.700 -2.307
307 Dibutyl Ethoxybutyl Phosphate -2.647 -3.030
308 Dibutyl Ethyl Phosphate -1.846 -2.140
309 Dibutyl Ethyl Phosphonate -1.569 -1.917
310 Dibutyl Hydrogen Phosphonate -1.425 -2.025
311 Dibutyl Methyl Phosphate -1.499 -2.039
312 Dibutyl Methyl Phosphonate -1.415 -1.467
313 Dicamba -1.691 -3.214
314 Dichlobenil -4.236 -3.581
315 Dichlofenthion -6.110 -3.750
316 Dichlone -6.357 -3.822
317 Dichlorodifluoromethane -2.635 -1.823
318 Dichlorophen -5.698 -4.484
319 Dichlorprop -2.453 -3.807
320 Dicofol -5.448 -6.544
321 Diethyl Amyl Phosphate -1.476 -1.821
322 Diethyl Butyl Phosphate -1.147 -1.638
323 Diethyl Hexyl Phosphonate -2.666 -1.737
324 Diethyl Trichloromethyl Phosphonate -1.754 -1.426
325 Diethylstilbestrol -4.350 -4.487
326 Digallic Acid -2.809 -2.687
327 Digitoxin -5.293 -4.771
328 Dimethirimol -2.242 -2.497
329 Dinitramine -5.467 -3.654
330 Dinoseb -3.665 -3.960
331 Diphenic Acid -2.284 -2.697
332 Diphenyl Methyl Phosphate -2.121 -2.592
333 Diphenylacetic Acid -3.222 -3.405
334 Diphenylnitrosamine -3.752 -3.623
335 Dixanthogen -4.959 -5.389
336 EPTC -2.703 -3.846
337 Estragole -2.921 -3.159
338 Ethalfluralin -6.222 -4.160
339 Ethoate-methyl -1.457 -2.134
340 Ethohexadiol -2.287 -1.849
341 Ethyl Cinnamate -2.996 -2.928
342 Ethyl Cyanoacetate -0.753 -1.556
343 Ethyl Dibutyl Phosphonate -1.200 -1.917
344 Ethyl Hydrocinnamate -2.909 -3.054
345 Ethyl Phthalate -2.347 -2.715
346 Ethyl Propionate -0.667 -1.437
347 Ethyl m-Isothiocyanobenzoate -3.602 -3.757
348 Ethyl p-Benzoate -2.319 -2.492
349 Ethylidene Chloride -1.292 -2.360
350 Ethylmalonic Acid 0.732 -1.310
351 Eugenol -1.824 -2.627
352 Fenarimol -4.383 -4.917
353 Fenbufen -5.046 -3.578
354 Fensulfothion -2.302 -3.569
355 Flufenamic Acid -4.398 -2.902
356 Fluometuron -3.412 -1.743
357 Fluorobenzene -1.792 -1.930
358 Fluorometholone -5.843 -2.861
359 Fumaric Acid -1.220 -0.956
360 Glutamic Acid -1.235 -0.773
361 Glutamine -0.548 -0.337
362 Glyphosate -1.149 0.581
363 Heptanoic Acid -1.665 -1.960
364 Heptyl p-Hydroxybenzoate -2.234 -3.386
365 Hexachloro-1,3-butadiene -4.907 -4.854
366 Hexachlorobenzene -7.770 -5.329
367 Hexadecyl p-Hydroxybenzoate -2.981 -5.818
368 Hexobarbital -2.735 -2.910
369 Hexyl Acetate -2.451 -2.419
370 Hexyl p-Hydroxybenzoate -2.768 -3.114
371 Histidine -0.533 -0.901
372 Hydantoic Acid -0.483 -0.534
373 Ibuprofen -4.367 -3.407
374 Isoamyl Acetate -3.558 -2.124
375 Isoamyl Salicylate -3.157 -3.227
376 Isoamylmalonic Acid 0.543 -1.867
377 Isobutane -3.075 -2.188
378 Isobutyl Isobutyrate -2.403 -2.275
379 Isobutylbenzene -4.123 -3.469
380 Isobutylene -2.329 -2.054
381 Isobutyraldehyde 0.091 -1.351
382 Isoprene -2.026 -2.334
383 Isopropalin -6.491 -4.909
384 Isopropyl Ether -4.608 -2.049
385 Isopropyl tert-Butyl Ether -2.366 -2.395
386 L-Asparagine -0.652 -0.351
387 L-Cystine -3.343 -2.327
388 Glutamic Acid -1.235 -0.773
389 Histidine -0.533 -0.901
390 L-Isoleucine -0.582 -1.428
391 L-Mandelic Acid -0.233 -1.923
392 Lactamide -5.057 -0.578
393 Lenacil -4.592 -2.712
394 Leptophos -5.235 -5.609
395 Levodopa -1.717 -1.389
396 Limonene -4.194 -3.312
397 Linalool -1.987 -2.879
398 MCPA -2.233 -2.945
399 MCPB -3.678 -3.238
400 Maleic Acid 0.579 -0.949
401 Mandelic Acid -5.924 -2.030
402 Meconic Acid -1.377 -2.011
403 Meconin -1.890 -2.402
404 Menthol -2.535 -2.602
405 Methacrylonitrile -2.167 -1.769
406 Methidathion -3.100 -3.376
407 Methionine -0.421 -1.570
408 Methomyl -0.447 -2.070
409 Methotrimeprazine -5.960 -5.427
410 Methyl Benzoate -1.839 -2.217
411 Methyl Butyrate -0.833 -1.595
412 Methyl Dixanthogen -3.939 -4.791
413 Methyl Isopropyl Ether -1.802 -1.599
414 Methyl Oxalate -0.292 -1.380
415 Methyl Propyl Ether -2.130 -1.785
416 Methyl m-Isothiocyanobenzoate -3.565 -3.502
417 Methylamine 1.605 -0.677
418 Methylaniline -1.280 -2.120
419 Methylmalonic Acid 0.760 -1.061
420 Methyltestosterone Acetate -4.854 -2.524
421 Methylthiouracil -2.426 -2.562
422 Metolazone -3.783 -3.828
423 Mirex -6.807 -8.824
424 Monolinuron -2.465 -2.719
425 Mustard Gas -2.363 -2.964
426 Myristic Acid -5.301 -3.878
427 Myristyl Alcohol -6.053 -4.127
428 N',N'-Dimethyl-m-aminophenyl Isothiocyanate -3.710 -3.650
429 Naproxen -4.161 -3.482
430 Naptalam -3.163 -3.639
431 Neburon -4.758 -4.242
432 Neopentyl Alcohol -2.146 -1.800
433 Niridazole -4.962 -2.763
434 Nitralin -5.760 -4.499
435 Nitrilotriacetic Acid -0.510 -0.608
436 Nitroethane -1.917 -1.439
437 Nitroguanidine -1.374 -1.432
438 Nitromethane 0.256 -1.265
439 Nonyl Aldehyde -3.171 -2.326
440 Nonyl p-Hydroxybenzoate -2.316 -3.917
441 Norflurazon -4.035 -2.834
442 Octadecyl-p-hydroxybenzoate -3.079 -6.375
443 Octyl p-Hydroxybenzoate -2.485 -3.670
444 Octylamine -2.810 -2.586
445 Oxamyl 0.106 -2.708
446 Oxanilic Acid -1.302 -2.244
447 Oxycarboxin -2.427 -2.526
448 Palmitic Acid -5.523 -4.424
449 Parathion -4.084 -3.290
450 Pentachlorbenzyl Alcohol -6.147 -4.886
451 Pentachloroethane -2.607 -3.641
452 Pentachlorophenol -4.208 -4.929
453 Pentamethylmelamine -1.958 -1.573
454 Phenacetin -2.350 -2.103
455 Phenetole -2.301 -2.592
456 Phenothiazine -5.097 -4.073
457 Phenoxyacetic Acid -3.959 -1.956
458 Phenyl Isothiocyanate -3.177 -3.392
459 Phenyl Salicylate -3.155 -3.398
460 Phosmet -4.104 -3.897
461 Phthalimide -2.611 -1.919
462 Picloram -2.749 -3.370
463 Pindone -4.107 -3.122
464 Pipemidic Acid -2.975 -2.083
465 Pirimicarb -1.946 -2.624
466 Prednisolone -3.953 -3.105
467 Propyl Acetate -2.453 -1.485
468 Propyl Dixanthogen -5.699 -6.363
469 Propylthiouracil -2.151 -3.138
470 Propyne -1.042 -1.639
471 Propyzamide -4.232 -3.956
472 Protoporphyrin IX -3.721 -7.160
473 Pyrocatechol -5.378 -1.817
474 Quinethazone -5.031 -2.480
475 Quinhydrone -1.730 -1.662
476 Quinidine -3.365 -4.283
477 Resorcinol -5.186 -1.731
478 Rhodanine -1.772 -2.583
479 Saccharin -1.629 -2.192
480 Salicin -0.855 -1.176
481 Salicylanilide -3.589 -3.104
482 Serine 0.607 -0.308
483 Siduron -4.111 -3.365
484 Sucrose 0.793 0.347
485 Sulfapyridine -2.969 -2.776
486 Sulfathiazole -2.835 -3.224
487 Tetradecyl p-Hydroxybenzoate -2.975 -5.321
488 Tetrahydropyran -1.776 -1.659
489 Thiometon -3.091 -3.536
490 Thionazin -2.338 -2.624
491 Threonine -0.089 -0.488
492 Thymine -1.519 -0.938
493 Triallate -4.882 -5.516
494 Tributylamine -3.000 -3.974
495 Tributylphosphine Oxide -0.593 -0.091
496 Trichlorfon -0.223 -2.151
497 Tricyclazole -2.073 -3.165
498 Tridecyl p-Hydroxybenzoate -2.945 -5.050
499 Trietazine -4.060 -3.298
500 Triethyl Phosphate 0.439 -1.100
501 Triethylamine -0.138 -2.265
502 Trimethoprim -2.861 -2.836
503 Trimethyl Phosphate 0.553 -0.629
504 Trimethylamine 0.841 -1.561
505 Triphenylcarbinol -2.260 -4.631
506 Tripropylamine -2.301 -3.176
507 Undecane -7.553 -4.137
508 Undecyl p-Hydroxybenzoate -2.092 -4.488
509 Uric Acid -3.730 -1.188
510 Vanillin -1.140 -1.756
511 Xylene -3.000 -2.849
512 Xylidine -2.040 -2.273
513 α,α,α-Trifluoro-o-toluic Acid -1.598 -1.316
514 2,2'-Bipyridine -1.420 -3.021
515 alpha-1,2,3,4,5,6-Hexachlorocyclohexane -5.163 -4.975
516 alpha-Endosulfan -5.885 -5.455
517 alpha-Hydroxycaproamide -1.081 -1.286
518 beta,beta,beta-Trichlorolactic Acid -2.645 -2.705
519 beta-1,2,3,4,5,6-Hexachlorocyclohexane -6.084 -4.757
520 beta-Alanine -5.213 -0.683
521 beta-Aminobutyric Acid 1.084 -0.488
522 beta-Endosulfan -6.162 -5.641
523 cis-1,2-Dimethylcyclohexane -4.272 -3.055
524 d-Borneol -2.319 -2.530
525 d-Camphoric Acid -1.421 -2.014
526 d-Fenchone -1.851 -2.416
527 d-Limonene -3.996 -3.316
528 dl-2-Octanol -2.036 -2.541
529 epsilon-Aminocaproic Acid -5.415 -1.053
530 4,4'-Bipyridine -1.538 -2.889
531 gamma-Aminobutyric Acid 1.101 -0.651
532 l-Menthone -2.492 -2.562
533 m-Acetoxyphenyl Isothiocyanate -3.114 -3.547
534 m-Acetylphenyl Isothiocyanate -4.328 -3.423
535 m-Biphenyl Isothiocyanate -4.523 -4.709
536 m-Cyanophenyl Isothiocyanate -3.193 -3.542
537 m-Ethoxyphenyl Isothiocyanate -3.420 -3.790
538 m-Fluorobenzoic Acid -1.970 -1.642
539 m-Isopropoxyphenyl Isothiocyanate -3.328 -4.109
540 m-Isothiocyanobenzoic Acid -3.097 -3.297
541 m-Isothiocyanophenyl Isothiocyanate -4.699 -4.109
542 m-Methylphenyl Isothiocyanate -3.848 -3.648
543 m-Terphenyl -5.155 -4.980
544 m-Toluenesulfonamide -1.341 -2.453
545 n-2-Hydroxy-n2,n4,n4,n6,n6-pentamethylmelamine -2.371 -1.802
546 n-Amyl Bromide -3.077 -2.937
547 n-Amyl beta-Ethoxypropionate -2.196 -2.845
548 n-Butyl Chloride -2.025 -2.356
549 n-Butyl Ether -3.592 -2.744
550 n-Butyl beta-Ethoxypropionate -1.639 -2.404
551 n-Butylmalonic Acid 0.437 -1.771
552 n-Capric Acid -3.445 -2.789
553 n-Ethyl beta-Ethoxypropionate -0.421 -1.857
554 n-Hexyl beta-Ethoxypropionate -2.829 -2.936
555 n-Methyl beta-Ethoxypropionate -0.072 -1.758
556 n-Methylolpentamethylmelamine -2.400 -1.666
557 n-Methylolpentamethylmelamine Methyl Ether -2.205 -2.230
558 n-Octyl Bromide -5.063 -3.678
559 n-Propyl beta-Ethoxypropionate -1.017 -2.136
560 n-Propylcyclopentane -4.740 -2.998
561 n-Propylmalonic Acid 0.680 -1.390
562 n-Valeraldehyde -0.867 -1.249
563 n2,n2,n4,n4-Tetramethylmelamine -2.688 -1.332
564 n2,n4,n6-Triethyl-n2,n4,n6-trimethylmelamine -3.703 -2.998
565 n6,n6-Diethyl-n2,n2,n4,n4-tetramethylmelamine -3.507 -2.616
566 o,p'-DDE -6.357 -6.399
567 o-Chlorobenzoic Acid -1.872 -2.191
568 o-Chlorophenol -1.054 -2.587
569 o-Fluorobenzoic Acid -1.289 -1.250
570 o-Nitrophenol -1.745 -2.368
571 o-Phenylphenol -2.386 -3.337
572 o-Terphenyl -5.301 -5.014
573 o-Tolidine -2.213 -3.111
574 o-Toluenesulfonamide -2.023 -2.367
575 o-Toluic Acid -2.060 -1.951
576 o-Toluidine -0.860 -2.042
577 4-(Dodecyloxy)benzoic Acid -2.447 -4.514
578 p-Acetylphenyl Isothiocyanate -0.022 -3.122
579 p-Anisaldehyde -1.502 -1.662
580 p-Biphenyl Isothiocyanate -4.854 -4.705
581 p-Cresol -0.701 -2.243
582 p-Dibromobenzene -4.072 -4.302
583 p-Ethoxyphenyl Isothiocyanate -4.260 -3.718
584 p-Ethylphenol -1.397 -2.588
585 p-Fluorobenzoic Acid -2.067 -1.625
586 p-Methylbenzyl Isothiocyanate -3.796 -3.869
587 p-Phenylenediamine -0.379 -1.196
588 p-Phenylphenol -3.481 -3.323
589 p-Toluenesulfonamide -1.734 -2.350
590 p-Tolyl Isothiocyanate -4.721 -3.638
591 p-tert-Pentylphenol -2.990 -3.250
592 l-Mandelic Acid -0.233 -1.923
593 S-Trioxane 0.288 -0.479
594 tert-Amylbenzene -4.150 -3.552
595 trans-Crotonic Acid 0.000 -1.287
596 trans-Stilbene -5.793 -4.322
Appendix B
The measures of skewness and kurtosis of the distribution of surface property
values were added to the set of Parasurf ’07 statistical descriptors during the study of
phospholipidosis-inducing drugs. These measures are described briefly below.
Skewness, or the third standardized moment, is a measure of the asymmetry of a data
distribution, that is, of whether a distribution of values is left- or right-handed. It is
defined by the equation:

$$\gamma_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^3}{(N-1)\,\sigma^3}$$

where $\bar{x}$ is the mean, $\sigma$ is the standard deviation, and $N$ is the number of data points. The
skewness of a normal distribution is zero, and any symmetric data set should have a
skewness near zero. Negative values of the skewness indicate data that are skewed left
(a longer tail on the low-value side), and positive values indicate data that are skewed
right (a longer tail on the high-value side).
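The skewness definition can be checked numerically with a short Python sketch. This is not the Parasurf '07 implementation: it assumes that σ is the sample standard deviation (N - 1 in its denominator), which the appendix does not specify, and the input lists are hypothetical surface-property samples rather than data from this study.

```python
import math
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Skewness as written in Appendix B:
    gamma_1 = sum((x_i - mean)**3) / ((N - 1) * sigma**3).

    Assumption: sigma is the sample standard deviation (N - 1 denominator);
    the appendix does not state which estimator Parasurf '07 uses.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if sigma == 0.0:
        raise ValueError("all values are identical; skewness is undefined")
    third = sum((x - mean) ** 3 for x in values)
    return third / ((n - 1) * sigma ** 3)


# Hypothetical surface-property samples (not data from this study).
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_tail = [1.0, 2.0, 3.0, 4.0, 10.0]
left_tail = [-x for x in right_tail]

print(skewness(symmetric))   # 0.0: perfectly symmetric data
print(skewness(right_tail))  # positive: long tail toward high values
print(skewness(left_tail))   # negative: mirror image, long tail toward low values
```

Mirroring the data about zero flips the sign of every cubed deviation while leaving σ unchanged, so the two tailed samples give skewness values of equal magnitude and opposite sign.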
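Kurtosis, or the fourth standardized moment, is a measure of whether the data distribution
is peaked or flat relative to a normal distribution. It is defined by the equation:

$$\gamma_2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^4}{(N-1)\,\sigma^4}$$

where $\bar{x}$ is the mean, $\sigma$ is the standard deviation, and $N$ is the number of data points.
Because no constant is subtracted in this definition, a large sample from a normal
distribution gives a kurtosis of approximately 3, which is the reference value for judging
whether a distribution is peaked or flat. Data sets with high kurtosis tend to have a
distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with
low kurtosis tend to have a flat top near the mean rather than a sharp peak.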
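The kurtosis definition can be illustrated the same way. The following Python sketch is an assumption-laden illustration, not the Parasurf '07 code: σ is again taken as the sample standard deviation, and the three input samples (an evenly spread set, a sharply peaked set with two outliers, and pseudo-random normal draws) are hypothetical.

```python
import math
import random
from typing import Sequence


def kurtosis(values: Sequence[float]) -> float:
    """Kurtosis as written in Appendix B:
    gamma_2 = sum((x_i - mean)**4) / ((N - 1) * sigma**4).

    No 3 is subtracted, so normal data give roughly 3.
    Assumption: sigma is the sample standard deviation (N - 1 denominator).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if sigma == 0.0:
        raise ValueError("all values are identical; kurtosis is undefined")
    fourth = sum((x - mean) ** 4 for x in values)
    return fourth / ((n - 1) * sigma ** 4)


# Hypothetical samples (not data from this study).
flat = [1.0, 2.0, 3.0, 4.0, 5.0]           # evenly spread, no peak
peaked = [0.0] * 8 + [10.0, -10.0]         # sharp central peak, heavy tails
random.seed(1)
gaussian = [random.gauss(0.0, 1.0) for _ in range(100_000)]

print(round(kurtosis(flat), 2))      # 1.36, well below 3
print(round(kurtosis(peaked), 2))    # 4.5, above 3
print(round(kurtosis(gaussian), 1))  # close to 3 for normal data
```

The flat sample falls below the normal reference value of about 3 and the peaked sample lies above it, matching the qualitative description of low and high kurtosis given above.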
References
1. Kennedy, S. P.; Bormann, B. J. Effective partnering of academic and physician scientists with the pharmaceutical drug development industry. Experimental Biology and Medicine 2006, 231, 1690-1694.
2. Silverman, R. B. The Organic Chemistry of Drug Design and Drug Action. 2nd ed.; Elsevier Academic Press: New York, 2004.
3. Kubinyi, H. Opinion: Drug research: myths, hype, and reality. Nature Reviews Drug Discovery 2003, 2, 665-668.
4. Kubinyi, H. Lectures of the Drug Design Course. http://www.kubinyi.de/lectures.html
5. Hammett, L. P. Effect of structure upon the reactions of organic compounds. Benzene derivatives. Journal of the American Chemical Society 1937, 59, 96-103.
6. Hansch, C.; Maloney, P.; Fujita, T.; Muir, R. M. Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature 1962, 194, 178-180.
7. Fischer, H.; Gottschlich, R.; Seelig, A. Blood-Brain Barrier Permeation: Molecular Parameters Governing Passive Diffusion. Journal of Membrane Biology 1998, 165, 201-211.
8. Overton, E. Osmotic properties of the cells and their importance for toxicology and pharmacology. Zeitschrift für Physikalische Chemie, Stöchiometrie und Verwandtschaftslehre 1897, 22, 189-209.
9. Sangster, J. Octanol-Water Partition Coefficients: Fundamentals and Physical Chemistry. Wiley: New York, 1997; pp 79-112.
10. Meylan, W. M.; Howard, P. H. Atom/fragment contribution method for estimating octanol-water partition coefficients. Journal of Pharmaceutical Sciences 1995, 84, 83-92.
11. Hansch, C.; Steward, A. R.; Anderson, S. M.; Bentley, D. The parabolic dependence of drug action upon lipophilic character as revealed by a study of hypnotics. Journal of Medicinal Chemistry 1968, 11(1), 1-11.
12. Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews 1997, 23, 3-25.
13. Clark, T. Quantum Cheminformatics: An Oxymoron? Beilstein Institute Workshop, Chemical Data Analysis in the Large, May 22-26, 2000, Bozen, Italy 2000.
14. Clark, T.; Ford, M.; Essex, J.; Richards, W. G.; Ritchie, D. W. A non-atom-based paradigm for modeling QSAR and QSPR. In QSAR and Molecular Modelling in Rational Design of Bioactive Molecules, Proceedings of the 15th European Symposium on Structure-Activity Relationships (QSAR) and Modelling, Istanbul, Turkey, Sept. 5-10, 2004.
15. Monard, G.; Merz, K. M., Jr. Combined Quantum Mechanical/Molecular Mechanical Methodologies Applied to Biomolecular Systems. Accounts of Chemical Research 1999, 32(10), 904-911.
16. Warshel, A.; Levitt, M. Theoretical studies of enzymic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. Journal of Molecular Biology 1976, 103(2), 227-49.
17. Clark, T. Modelling the chemistry: time to break the mould? Euro QSAR 2002, Designing drugs and crop protectants, 111-121.
18. Murray, J. S.; Lane, P.; Brinck, T.; Paulsen, K.; Grice, M. E.; Politzer, P. Relationships of critical constants and boiling points to computed molecular surface properties. Journal of Physical Chemistry 1993, 97(37), 9369-9373.
19. Murray, J. S.; Politzer, P. Statistical analysis of the molecular surface electrostatic potential: an approach to describing noncovalent interactions in condensed phases. Journal of Molecular Structure 1998, 425, 107-114.
20. Murray, J. S.; Ranganathan, S.; Politzer, P. Correlations between the solvent hydrogen bond acceptor parameter β and the calculated molecular electrostatic potential. Journal of Organic Chemistry 1991, 56, 3734-3737.
21. Politzer, P.; Lane, P.; Murray, J. S.; Brinck, T. Investigation of relationships between solute molecule surface electrostatic potentials and solubilities in supercritical fluids. Journal of Physical Chemistry 1992, 96(20), 7938-7943.
22. Politzer, P.; Murray, J. S. Molecular electrostatic potentials and chemical reactivity. In Rev. Comput. Chem., Lipkowitz, K. B.; Boyd, D. B., Eds. VCH: New York, 1991; Vol. 2, p 273.
23. Politzer, P.; Murray, J. S.; Peralta-Inga, Z. Molecular Surface Electrostatic Potentials in Relation to Noncovalent Interactions in Biological Systems. International Journal of Quantum Chemistry 2001, 85, 676-684.
24. Ehresmann, B.; Martin, B.; Horn, A. H. C.; Clark, T. Local molecular properties and their use in predicting reactivity. Journal of Molecular Modeling 2003, 9, 342-347.
25. Ehresmann, B.; de Groot, M. J.; Alex, A.; Clark, T. New Molecular Descriptors Based on Local Properties at the Molecular Surface and a Boiling-Point Model Derived from Them. Journal of Chemical Information and Computational Sciences 2004, 44, 658-668.
26. Sjoberg, P.; Murray, J. S.; Brinck, T.; Politzer, P. A. Average local ionization energies on the molecular surfaces of aromatic systems as guides to chemical reactivity. Canadian Journal of Chemistry 1990, 68, 1440-1443.
27. Mulliken, R. S. New electroaffinity scale; together with data on valence states and on valence ionization potentials and electron affinities. Journal of Chemical Physics 1934, 2, 782-93.
28. Mulliken, R. S. Electronic population analysis on LCAO-MO molecular wave functions. II. Overlap populations, bond orders, and covalent bond energies. Journal of Chemical Physics 1955, 23, 1833-40.
29. Pearson, R. G. Density functional theory: electronegativity and hardness. Chemtracts: Inorganic Chemistry 1991, 3(6), 317-33.
30. Schürer, G.; Gedeck, P.; Gottschalk, M.; Clark, T. Accurate parametrized variational calculations of the molecular electronic polarizability by NDDO-based methods. International Journal of Quantum Chemistry 1999, 75, 17.
31. Jäger, R.; Kast, S. M.; Brickmann, J. Parameterization Strategy for the MolFESD Concept: Quantitative Surface Representation of Local Hydrophobicity. Journal of Chemical Information and Computational Sciences 2003, 43, 237-247.
32. Jäger, T.; Schmidt, F.; Schilling, B.; Brickmann, J. Localization and quantification of hydrophobicity; The molecular free energy density (MolFESD) concept and its application to the sweetness recognition. Journal of Computer-Aided Molecular Design 2000, 14, 631-646.
33. Pixner, P.; Heiden, W.; Merx, H.; Möller, A.; Moeckel, G.; Brickmann, J. Empirical Method for the Quantification and Localization of Molecular Hydrophobicity. Journal of Chemical Information and Computational Sciences 1994, 34, 1309-1319.
34. Ehresmann, B.; de Groot, M. J.; Clark, T. A Surface-Integral Solvation Energy Model: The Local Solvation Energy. Journal of Chemical Information and Computational Sciences 2005, 45, 1053-1060.
35. Clark, T.; Lin, J.-H.; Horn, A. H. C. Parasurf '06, A1; CEPOS InSilico Ltd.: 26 Brookfield Gardens Ryde, Isle of Wight PO33 3NP, 2005.
36. SYBYL 7.0, Tripos Inc.: 1699 South Hanley Rd., St. Louis, Missouri, 63144, USA.
37. Politzer, P.; Weinstein, H. Some relations between electronic distribution and electronegativity. Journal of Chemical Physics 1979, 71, 4218-4220.
38. Koopmans, T. C. The distribution of wave function and characteristic value among the individual electrons of an atom. Physica 1933, 1, 104-113.
39. Lin, J.-H.; Clark, T. An Analytical, Variable Resolution, Complete Description of Static Molecules and Their Intermolecular Binding Properties. Journal of Chemical Information and Modeling 2005, 45(4), 1010-1016.
40. Rivail, J.-L.; Cartier, A. Variational Calculation of Electronic Multipole Molecular Polarizabilities. Molecular Physics 1978, 36, 1085-1097.
41. Rivail, J.-L.; Cartier, A. An Extended Variational Method for Calculating Molecular Multipole Polarizabilities. Chemical Physics Letters 1979, 61, 469-472.
42. Martin, B.; Clark, T. Dispersion treatment for NDDO-based semiempirical MO techniques. International Journal of Quantum Chemistry 2006, 106(5), 1208-1216.
43. Martin, B.; Gedeck, P.; Clark, T. Additive NDDO-based atomic polarizability model. International Journal of Quantum Chemistry 2000, 77(1), 473-497.
44. Schamberger, J.; Gedeck, P.; Martin, B.; Schindler, T.; Hennemann, M.; Horn, A. H. C.; Ehresmann, B.; Clark, T. GEISHA, Erlangen, Germany, 2003.
45. DeLano, W. L. The PyMOL Molecular Graphics System, DeLano Scientific: Palo Alto, CA, USA, 2002.
46. Lombardo, F.; Shalaeva, M. Y.; Tupper, K. A.; Gao, F.; Abraham, M. H. ElogPoct: A Tool for Lipophilicity Determination in Drug Discovery. Journal of Medicinal Chemistry 2000, 43, 2922-2928.
47. Mannhold, R.; Cruciani, G.; Dross, K.; Rekker, R. Multivariate analysis of experimental and computational descriptors of molecular lipophilicity. Journal of Computer-Aided Molecular Design 1998, 12, 573-581.
48. Mannhold, R.; van de Waterbeemd, H. Substructure and whole molecule approaches for calculating logP. Journal of Computer-Aided Molecular Design 2001, 15, 337-354.
49. Nadig, G.; Van Zant, L. C.; Dixon, S. L.; Merz, K. M., Jr. Charge-Transfer Interactions in Macromolecular Systems: A New View of the Protein/Water Interface. Journal of the American Chemical Society 1998, 120(22), 5593-5594.
50. CORINA 3D Structure Generator, Molecular Networks, GmbH: Erlangen, Germany, 2006.
51. Sadowski, J.; Gasteiger, J.; Klebe, G. Comparison of Automatic Three-Dimensional Model Builders Using 639 X-Ray Structures. Journal of Chemical Information and Computational Sciences 1994, 34, 1000-1008.
52. Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. Development and use of quantum mechanical molecular models. 76. AM1: a new general purpose quantum mechanical molecular model. Journal of the American Chemical Society 1985, 107(13), 3902-9.
53. Clark, T.; Alex, A.; Beck, B.; Burkhardt, F.; Chandrasekhar, J.; Gedeck, P.; Horn, A. H. C.; Hutter, M.; Martin, B.; Rauhut, G.; Sauer, W.; Schindler, T.; Steinke, T. VAMP, 9.0; Accelrys Inc.: San Diego, 2003.
54. Winget, P.; Horn, A. H. C.; Selçuki, C.; Martin, B.; Clark, T. AM1* Parameters for Phosphorus, Sulfur and Chlorine. J. Mol. Model. 2003, 9, 408-414.
55. Rinaldi, D.; Rivail, J.-L. Molecular polarizabilities and dielectric effect of the medium in the liquid state. Theoretical study of the water molecule and its dimers. Theor. Chim. Acta 1973, 32, 57.
56. Rinaldi, D.; Rivail, J.-L. Calculation of molecular electronic polarizabilities. Comparison of different methods. Theor. Chim. Acta 1974, 32, 243-251.
57. TSAR 3.3; Oxford Molecular Ltd.: Oxford, England, 2000.
58. Breindl, A.; Beck, B.; Clark, T. Prediction of the n-Octanol/Water Partition Coefficient, logP, Using a Combination of Semiempirical MO-Calculations and a Neural Network. Journal of Molecular Modeling 1997, 3, 142-155.
59. Hansch, C.; Leo, A.; Hoekman, D. Exploring QSAR: Hydrophobic, Electronic, and Steric Constants. The American Chemical Society: Washington, D.C., 1995.
60. Sotomatsu, T.; Nakagawa, Y.; Fujita, T. Quantitative Structure-Activity Studies of Benzoylphenylurea Larvicides. Pesticide Biochemistry and Physiology 1987, 27, 156-164.
61. Klamt, A.; Schüürmann, G. COSMO: A new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient. J. Chem. Soc., Perkin Transactions 2 1993, 799-805.
62. Schüürmann, G. Prediction of Henry's Law Constant of Benzene Derivatives Using Quantum Chemical Continuum-Solvation Models. Journal of Computational Chemistry 2000, 21, 17-34.
63. Thompson, J. D.; Cramer, C. J.; Truhlar, D. G. New Universal Solvation Model and Comparison of the Accuracy of the SM5.42R, SM5.43R, C-PCM, D-PCM, and IEF-PCM Continuum Solvation Models for Aqueous and Organic Solvation Free Energies and for Vapor Pressures. Journal of Physical Chemistry B 2004, 108, 6532-6542.
64. Wang, J.; Wang, W.; Huo, S.; Lee, M.; Kollman, P. A. Solvation Model Based on Weighted Solvent Accessible Surface Area. Journal of Physical Chemistry B 2001, 105, 5055-5067.
65. Tehan, B. G.; Lloyd, E. J.; Wong, M. G.; Pitt, W. R.; Gancia, E.; Manallack, D. T. Estimation of pKa Using Semiempirical Molecular Orbital Methods. Part 2: Application to Amines, Anilines, and Various Nitrogen Containing Heterocyclic Compounds. Quantitative Structure-Activity Relationships 2002, 21.
66. Physical/Chemical Property Database (PHYSPROP), Syracuse Research Corporation, Environmental Research Center: Syracuse, NY, USA.
67. Shirakawa, H.; Louis, E. J.; MacDiarmid, A. G. Synthesis of electrically conducting organic polymers: halogen derivatives of polyacetylene, (CH)x. J. Chem. Soc., Chem. Commun. 1977, 578-580.
68. Thomas, K. R. J.; Lin, J. T.; Tao, Y.-T.; Chuen, C.-H. Quinoxalines Incorporating Triarylamines: Potential Electroluminescent Materials with Tunable Emission Characteristics. Chemistry of Materials 2002, 14, 2796-2802.
69. Thomas, K. R. J.; Lin, J. T.; Tao, Y.-T.; Ko, C.-W. New Star-Shaped Luminescent Triarylamines: Synthesis, Thermal, Photophysical, and Electroluminescent Characteristics. Chemistry of Materials 2002, 14, 1354-1361.
70. Yin, S.; Shuai, Z.; Wang, Y. A Quantitative Structure-Property Relationship Study of the Glass Transition Temperature of OLED Materials. Journal of Chemical Information and Computational Sciences 2003, 43, 970-977.
71. Yalkowsky, S. H.; Dannenfelser, R. M. AQUASOL database of aqueous solubility. In College of Pharmacy, University of Arizona, Tucson, AZ: 2000.
72. ACD/Solubility DB, release 10.0, Advanced Chemistry Development, Inc.: Toronto ON, Canada, 2006.
73. Cheng, A.; Merz, K. M., Jr. Prediction of Aqueous Solubility of a Diverse Set of Compounds Using Quantitative Structure-Property Relationships. Journal of Medicinal Chemistry 2003, 46(17), 3572-3580.
74. Delaney, J. S. ESOL: Estimating Aqueous Solubility Directly from Molecular Structure. Journal of Chemical Information and Computational Sciences 2004, 44(3), 1000-1005.
75. Xie, L.; Liu, H. The Treatment of Solvation by a Generalized Born Model and a Self-Consistent Charge-Density Functional Theory-Based Tight-Binding Model. Journal of Computational Chemistry 2002, 23, 1404-1415.
76. Reasor, M. J. A review of the biology and toxicologic implications of the induction of lysosomal bodies by drugs. Toxicology and Applied Pharmacology 1989, 97, 47-56.
77. Anderson, N.; Borlak, J. Drug-induced phospholipidosis. Federation of European Biochemical Societies Letters 2006, 580, 5533-5540.
78. Reasor, M. J.; Kacew, S. Drug-Induced Phospholipidosis: Are There Functional Consequences? Experimental Biology and Medicine 2001, 226, 825-830.
79. Halliwell, W. H. Cationic amphiphilic drug-induced phospholipidosis. Toxicologic Pathology 1997, 25, 53-60.
80. Fujita, T.; Iwasa, J.; Hansch, C. A new substituent constant, π, derived from partition coefficients. Journal of the American Chemical Society 1964, 86(23), 5175-5180.
81. Coulombe, P. A.; Kan, F. W.; Bendayan, M. Introduction of a high-resolution cytochemical method for studying the distribution of phospholipids in biological tissues. European Journal of Cell Biology 1988, 46(3), 564-76.
82. Bauknecht, H.; Zell, A.; Bayer, H.; Levi, P.; Wagener, M.; Sadowski, J.; Gasteiger, J. Locating biologically active compounds in medium-sized heterogeneous datasets by topological autocorrelation vectors: dopamine and benzodiazepine agonists. Journal of Chemical Information and Computational Sciences 1996, 36(6), 1205-13.
83. Sadowski, J.; Wagener, M.; Gasteiger, J. Assessing similarity and diversity of combinatorial chemistry libraries by spatial autocorrelation functions and neural networks. Angewandte Chemie, Int'l Ed. 1996, 34(24), 2674-7.
84. Boser, B. E.; Guyon, I.; Vapnik, V. N. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory 1992, 5, 144-152.
85. Burges, C. J. C. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery 1998, 2, 121-167.
86. Cortes, C.; Vapnik, V. Support-Vector Networks. Machine Learning 1995, 20(3), 273-297.
87. Friedman, J. H. Multivariate Adaptive Regression Splines. Annals of Statistics 1991, 19(1), 1-141.
88. Friedman, J. H. Estimating functions of mixed ordinal and categorical variables using adaptive splines. In New Directions in Statistical Data Analysis and Robustness, Morgenthaler, S.; Ronchetti, E.; Stahel, W. A., Eds. Birkhäuser: 1993; pp 73-113.
89. Schölkopf, B.; Sung, K.-K.; Burges, C. J. C.; Girosi, F.; Niyogi, P.; Poggio, T.; Vapnik, V. Comparing support vector machines with gaussian kernels to radial basis function classifiers. IEEE Trans. on Signal Processing 1997, 45, 2758-2765.
90. Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. Journal of Chemical Information and Computational Sciences 1988, 28, 31-36.
91. Chang, C.-C.; Lin, C.-J. LIBSVM: a Library for Support Vector Machines. 2003.
92. Cherkassky, V.; Gehring, D.; Mulier, F.; Friedman, J. H.; Masters, T. XTAL Software Package, ver. 5, University of Minnesota Electrical Engineering Dept.: Minnesota, 1995.
93. Tomizawa, K.; Sugano, K.; Yamada, H.; Horii, I. Physicochemical and Cell-Based Approach for Early Screening of Phospholipidosis-Inducing Potential. Journal of Toxicological Sciences 2006, 31(4), 315-324.
94. Ploemen, J.-P. H. T. M.; Kelder, J.; Hafmans, T.; Sandt, H. v. d.; Burgsteden, J. A. v.; Salemink, P. J. M.; Esch, E. v. Use of physicochemical calculation of pKa and ClogP to predict phospholipidosis-inducing potential. Experimental and Toxicologic Pathology 2004, 55, 347-355.
95. Fischer, H.; Kansy, M.; Potthast, M.; Csato, M. Prediction of in vitro phospholipidosis of drugs by means of their amphiphilic properties. In Rational Approaches to Drug Design, Proceedings of the 13th European Symposium on Quantitative Structure-Activity Relationships, Hoeltje, H. D.; Sippl, W., Eds. Prous Science: Barcelona, 2001; pp 286-289.
96. Lipinski, C. A.; Lombardo, F.; Dominy, B. W. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews 2001, 46, 3-26.
97. Cramer, R. D., III; Patterson, D. E.; Bunce, J. D. Comparative Molecular Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. Journal of the American Chemical Society 1988, 110, 5959-5967.
98. Poso, A.; Juvonen, R.; Gynther, J. Comparative molecular field analyses of compounds with CYP2A5 binding affinity. Quantitative Structure-Activity Relationships 1995, 14, 507-511.
99. Geladi, P.; Kowalski, B. Partial least squares regression: A tutorial. Analytica Chimica Acta 1986, 185, 1-17.
100. Gerlach, R. W.; Kowalski, B. R.; Wold, H. O. A. Partial least-squares path modelling with latent variables. Analytica Chimica Acta 1979, 112(4), 417-21.
101. Dijkstra, T. Latent variables in linear stochastic models: Reflections on maximum likelihood and partial least squares methods. 2nd ed.; Sociometric Research Foundation: Amsterdam, The Netherlands, 1985.
102. Green, S. M.; Marshall, G. R. 3D-QSAR: a current perspective. Trends in Pharmacological Sciences 1995, 16(9), 285-91.
103. Klebe, G.; Abraham, U.; Mietzner, T. Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. Journal of Medicinal Chemistry 1994, 37, 4130-4146.
104. Clark, T.; Lin, J.-H.; Horn, A. H. C. Parasurf '07, A1; CEPOS InSilico Ltd.: 26 Brookfield Gardens Ryde, Isle of Wight PO33 3NP, 2006.
105. de Jong, S. SIMPLS: an alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems 1993, 18, 251-263.
106. Guccione, S.; Doweyko, A. M.; Chen, H.; Barretta, G. U.; Balzano, F. 3D-QSAR using 'Multiconformer' alignment: The use of HASL in the analysis of 5-HT1A thienopyrimidinone ligands. Journal of Computer-Aided Molecular Design 2000, 14, 647-657.
107. Allinger, N. L.; Yuh, Y. H.; Lii, J.-H. Molecular Mechanics. The MM3 Force Field for Hydrocarbons. Journal of the American Chemical Society 1989, 111(23).
108. Lanig, H.; Utz, W.; Gmeiner, P. Comparative Molecular Field Analysis of Dopamine D4 Receptor Antagonists Including 3-[4-(4-Chlorophenyl)piperazin-1-ylmethyl]pyrazolo[1,5-a]pyridine (FAUC 113), 3-[4-(4-Chlorophenyl)piperazin-1-ylmethyl]-1H-pyrrolo-[2,3-b]pyridine (L-745,870), and Clozapine. Journal of Medicinal Chemistry 2001, 44, 1151-1157.
109. Wang, R.; Gao, Y.; Liu, L.; Lai, L. All-Orientation Search and All-Placement Search in Comparative Molecular Field Analysis. Journal of Molecular Modeling 1998, 4, 276-283.
110. Zheng, M.; Yu, K.; Liu, H.; Luo, X.; Chen, K.; Zhu, W.; Jiang, H. QSAR analyses on avian influenza virus neuraminidase inhibitors using CoMFA, CoMSIA, and HQSAR. Journal of Computer-Aided Molecular Design 2006, 20, 549-566.
111. Andrews, L. E.; Banks, T. M.; Bonin, A. M.; Clay, S. F.; Gillson, A.-M. E.; Glover, S. A. Mutagenic N-Acyloxy-N-alkoxyamides: Probes for Drug-DNA Interactions. Australian Journal of Chemistry 2004, 57, 377-381.
112. Andrews, L. E.; Bonin, A. M.; Fransson, L. E.; Gillson, A.-M. E.; Glover, S. A. The role of steric effects in the direct mutagenicity of N-acyloxy-N-alkoxyamides. Mutation Research 2006, 605, 51-62.
113. Bonin, A. M.; Glover, S. A.; Hammond, G. P. A comparison of the reactivity and mutagenicity of N-benzoyloxy-N-benzyloxybenzamides. Journal of Organic Chemistry 1998, 63, 9684-9689.
114. Böhm, M.; Stürzebecher, J.; Klebe, G. Three-Dimensional Quantitative Structure-Activity Relationship Analyses Using Comparative Molecular Field Analysis and Comparative Molecular Similarity Indices Analysis To Elucidate Selectivity Differences of Inhibitors Binding to Trypsin, Thrombin, and Factor Xa. Journal of Medicinal Chemistry 1999, 42, 458-477.
115. Tropsha, A.; Cho, S. J. Cross-validated r2 guided region selection for CoMFA studies. Perspectives in Drug Discovery and Design 1998, 12/13/14, 57-69.
116. Kroemer, R. T.; Hecht, P.; Guessregen, S.; Liedl, K. R. Improving the Predictive Quality of CoMFA Models. Perspectives in Drug Discovery and Design 1998, 14, 41-56.
117. Verma, R. P.; Hansch, C. A QSAR study on influenza neuraminidase inhibitors. Bioorganic & Medicinal Chemistry 2006, 14, 982-996.
118. Doweyko, A. M. The hypothetical active site lattice. An approach to modelling active sites from data on inhibitor molecules. Journal of Medicinal Chemistry 1988, 31(7), 1396-406.
119. Andrews, P. R.; Craik, D. J.; Martin, J. L. Functional group contributions to drug-receptor interactions. Journal of Medicinal Chemistry 1984, 27(12), 1648-57.
120. Becker, O. M.; Levy, Y.; Ravitz, O. Flexibility, Conformation Spaces, and Bioactivity. Journal of Physical Chemistry B 2000, 104, 2123-2135.
121. Furnham, N.; Blundell, T. L.; DePristo, M. A.; Terwilliger, T. Is one solution good enough? Nature Structural and Molecular Biology 2006, 13(3), 184-185.
122. Günther, S.; Senger, C.; Michalsky, E.; Goede, A.; Preissner, R. Representation of target-bound drugs by computed conformers: implications for conformational libraries. BMC Bioinformatics 2006, 7, 1-11.
123. Kuntz, I. D.; Chen, K.; Sharp, K. A.; Kollman, P. A. The maximal affinity of ligands. Proceedings of the National Academy of Sciences of the United States of America 1999, 96, 9997-10002.
Curriculum Vitae
Name: Kendall Grant Byler
Birthdate: 21.05.1970
Birthplace: Huntsville, AL, United States
Education
05/03-05/07
Doctor rerum naturalium
Friedrich-Alexander-Universität, Erlangen-Nürnberg
Computer-Chemie-Centrum, Prof. Dr. Tim Clark
08/97-12/01
Master of Science
The University of Alabama in Huntsville
08/88-05/93
Bachelor of Science
The University of Alabama in Huntsville
Publications
• Byler, K.; de Groot, M. J.; Clark, T. Support Vector Classification for the Prediction of
Phospholipidosis Induction. The 20th Darmstädter Molecular Modelling Workshop Erlangen,
Germany 2006.
• Byler, K.; Ehresmann, B.; de Groot, M. J.; Clark, T. Surface-Integral QSPR Models: Local
Energy Properties. The 19th Darmstädter Molecular Modelling Workshop Erlangen,
Germany 2005.
• Lawton, R. O.; Alexander, L. D.; Setzer, W. N.; Byler, K. G. Floral essential oil of
Guettarda poasana inhibits yeast growth. Biotropica 1993, 25, 483-486.
• Setzer, W. N.; Flair, M. N.; Byler, K. G.; Huang, J.; Thompson, M. A.; Moriarty, D. M.;
Lawton, R. O.; Windham-Carswell, D. B. Antimicrobial and cytotoxic activity of crude
extracts of Araliaceae from Monteverde, Costa Rica. Brenesia 1992, 38, 123-130.