
3D-QSAR and Physical Property Modeling Using Quantum-Mechanically-

Derived Molecular Surface Properties

A Dissertation

Kendall Byler

2007

3D-QSAR and Physical Property Modeling Using

Quantum Mechanically Derived Molecular Surface

Properties

Submitted to the Faculties of Natural Sciences of the

Friedrich-Alexander-Universität Erlangen-Nürnberg

for the attainment of the doctoral degree

by

Kendall Grant Byler

from Huntsville

Approved as a dissertation by the Faculties of Natural Sciences of the

Friedrich-Alexander-Universität Erlangen-Nürnberg.

Date of oral examination: 11.05.2007

Chair of the doctoral committee: Prof. Dr. E. Bänsch

First reviewer: Prof. Dr. T. Clark

Second reviewer: Prof. Dr. P. Gmeiner

Acknowledgements

I would like to thank those but for whom this work would not have been

possible. The first of these is Professor Dr. Tim Clark, who provided the opportunity

and the guidance in my study of computational chemistry. And thanks go to the

members of the Clark group who helped me in my endeavors: Dr. Nico van Eikema

Hommes, Dr. Harald Lanig, Dr. Ralph Puchta, Dr. Matthias Hennemann, Matthias

Brüstle, Anselm Horn, Dr. Olaf Othersen, Dr. Gudrun Schürer, Dr. Tatyana

Shubina, Florian Haberl, Kirsten Höhfeld, Catalin Rusu, Jr-Hung Lin, Hakan Kayi,

and Sergio Sanchez. And also to members of the Gasteiger group for their

assistance: Dr. Simon Spycher, Prof. Dr. Fernando da Costa, Dimitar Hristozov, Dr.

Christof Schwab, and Dr. Thomas Engel, and of course Adrian Jung of the Kirsch

group. Thanks also to the Pfizer Corporation for their financial support of this

research.

I would thank my family: my parents, Paul and Carol Byler, my sister,

Ashley, my grandparents, Henry and Martha Snoddy, Elza and Emma Byler, and

my beautiful wife, Anastasia. And I would thank the friends everywhere that stayed

friends despite the separations of time and distance.


Contents

1 Introduction
    1.1 Drug Discovery
    1.2 Property Modeling
    1.3 A Quantum-Mechanical, Molecular Orbital Approach

2 Surface-Integral QSPR Models: Local Energy Properties
    2.1 Introduction
        2.1.1 Local Molecular Properties
        2.1.2 Surface-Integral Models
    2.2 Methods
    2.3 Results
        2.3.1 Octanol/Water Partition Coefficient
        2.3.2 Free Energy of Solvation
            2.3.2.1 Free Energy of Solvation in Octanol
            2.3.2.2 Free Energy of Solvation in Water
        2.3.3 Acid Dissociation Constant
        2.3.4 Boiling Point
        2.3.5 Glass Transition Temperature
        2.3.6 Aqueous Solubility
    2.4 Discussion
    2.5 Conclusions

3 Support Vector Classification of Phospholipidosis-Inducing Drugs
    3.1 Introduction
        3.1.1 Phospholipidosis
        3.1.2 Phospholipidosis Models
        3.1.3 Surface Autocorrelations
        3.1.4 Statistical Methods
            3.1.4.1 Support Vector Machines
            3.1.4.2 Multivariate Adaptive Regression Splines
    3.2 Methods
    3.3 Results
        3.3.1 Support Vector Machines
        3.3.2 Multivariate Adaptive Regression Splines Using Autocorrelation Indices
    3.4 Discussion
    3.5 Conclusions

4 3D-QSAR Using Local Properties
    4.1 Introduction
        4.1.1 Comparative Molecular Field Analysis
        4.1.2 Partial Least Squares Regression
        4.1.3 Local Properties
    4.2 Computational Methods
    4.3 Results and Discussion
        4.3.1 Serotonin Receptor Agonists/Antagonists
        4.3.2 Adrenergic Receptor Agonists/Antagonists
        4.3.3 Dopamine D4 Antagonists
        4.3.4 Avian Influenza Neuraminidase Inhibitors
        4.3.5 Mutagenic Tertiary Amides
        4.3.6 The Effect of Grid Orientation on Predictivity
    4.4 Conclusions

5 Conclusions and Outlook
    5.1 Conclusions
    5.2 Outlook

6 Summary

7 Zusammenfassung (summary in German)

Appendix A

Appendix B

References


Chapter 1

Introduction

1.1 Drug Discovery

It has been estimated1 that, out of a pool of millions of compounds screened,

10,000 reach the animal testing phase, which will then likely produce ten drug candidates

for human clinical trials, of which only one will reach the market. It may also require 15

years and 750,000 U.S. dollars in the process. Drug candidates that fail late in the testing

process will never produce a return for the company that has invested so much time and

money. Pharmaceutical companies must offset these losses by recouping the expenditure

from among the several successfully tested drugs they produce.

In an effort to minimize the potential loss from focusing on compounds that will

never result in a marketable drug, much preliminary research and testing are done. The

rational drug-design approach2 to this problem begins by identifying a molecular target

involved in a pathophysiological process and characterizing its structure and function;

then begins the search for a lead compound. This is usually achieved by means of an array

of in vitro screens for biological activity. Large groups of compounds may be evaluated

simultaneously in this way and the procedure is referred to as high-throughput screening

(HTS). Once a lead compound is discovered, it may also be found to have some

undesirable properties such as high toxicity, poor bioavailability or pharmacokinetics.

Libraries of compounds may be synthesized that have modifications to the general

structure of the lead compound in an effort to modulate the desirable and undesirable


effects. Structure-activity relationships (SAR’s) may be observed concurrently with the

study of the combinatorial library that point to a common chemical substructure that

produces the pharmacological effect. The medicinal chemist can then make various

modifications to the pharmacophore in order to improve its properties.

Kubinyi3 describes the drug-design process in terms of a design cycle wherein the

optimization of a lead compound is improved iteratively in an evolutionary manner4

(Figure 1.1).

[Figure: cycle diagram with the labeled stages: Biological concept; Computer-aided design (protein crystallography, NMR, 3D databases, de novo design); Lead structures; Series design, synthesis design; Syntheses; Biological testing; Structure-activity relationships, QSAR, molecular modeling; Candidates for further development; Investigational New Drug; New Drug.]

Figure 1.1 The drug design cycle from Kubinyi’s lectures on drug design4.

However, all of this takes quite a lot of time and the questions of clinical

development and lengthy drug approval process have yet to be addressed. Thus, to

improve the efficiency of the HT screen further, chemists use molecular-modeling

schemes to calculate properties based on chemical structure to aid in the screening

process. These virtual-screening methods include molecular-dynamics simulations,

protein-ligand docking, protein-protein docking, membrane simulation, similarity

searching of pharmacophore databases, and quantitative structure-activity relationships


(QSAR’s). These tools allow pharmaceutical companies to screen out compounds that

possess too many undesirable characteristics before investing time in producing,

chemically analyzing, and testing. Of great interest is the elucidation of a set of

chemical/physical properties that modulate the relationship between chemical structure

and pharmacological activity that could be used to predict activity based solely on

chemical structure.

1.2 Property Modeling

The use of property modeling for the purpose of prediction has taken many

approaches. One of the first examinations of electronic effects on activity lies with

Hammett’s linear free-energy relationship5 of substituent effects on benzoic acid

hydrolysis reaction rates. He generated a series of substituent constants from a plot of the

effect on reaction rate, which could then be used in the prediction of the substituent effect

on other reaction rates. Hansch suggested6 a similar relationship between lipophilicity and

biological activity. Unless a drug is actively transported across the cell membrane, it must

passively diffuse through the membrane7, which is composed of a lipid bilayer. Thus, the

lipophilicity of a compound must have a corresponding effect on the drug’s ability to enter

the cell and produce the pharmacological effect and, indeed, this correlation between

lipophilicity and biological activity had been observed as early as the late nineteenth

century8. Since direct measurement of the solubility of compounds in cellular membranes

is difficult at best, Hansch6 approximated this property of lipophilicity by a measure of the

ratio of a compound’s solubility in n-octanol and in water as defined9 by

\[ P = \frac{[\mathrm{compound}]_{oct}}{[\mathrm{compound}]_{aq}\,(1-\alpha)} \tag{1.1} \]

where α is the degree to which the compound dissociates in water, as calculated from its

ionization constant, so that the term (1-α) is the fraction that remains undissociated. As

some compounds are ionizable, making them appear more soluble in water, solubility

measurements in water are often performed in an aqueous buffer and taken over a pH

range (logD). Substituent constants

similar to those of Hammett were used to calculate logP and logD based solely on the


chemical structure. More recently atom/fragment based methods were developed10 for the

prediction of logP and logD. A more or less Gaussian distribution of logP values

correlating to the drug potency (log 1/C), with a peak value of approximately 2, has been

observed11.
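The relationship between logP and logD described above can be illustrated numerically. The following is a minimal sketch, assuming a monoprotic acid whose degree of dissociation follows the Henderson-Hasselbalch relation and assuming that only the neutral form partitions into octanol. The function names and the pKa and logP values are hypothetical illustrations, not data from this work.

```python
import math


def degree_of_dissociation(pka: float, ph: float) -> float:
    """Fraction alpha of a monoprotic acid that is ionized at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def log_d(log_p: float, alpha: float) -> float:
    """Apparent logD, assuming only the neutral species enters the octanol phase."""
    # D = P * (1 - alpha), so logD = logP + log10(1 - alpha).
    return log_p + math.log10(1.0 - alpha)


# Hypothetical acid: pKa 4.2 and logP 1.9, measured in a pH 7.4 buffer.
alpha = degree_of_dissociation(pka=4.2, ph=7.4)
apparent_log_d = log_d(log_p=1.9, alpha=alpha)
print(f"alpha = {alpha:.4f}, logD = {apparent_log_d:.2f}")
```

Because the acid in this example is almost fully ionized at pH 7.4, the apparent logD falls well below logP, which is why partition measurements on ionizable compounds are reported together with the buffer pH.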

Lipinski made the observation12 that a compound’s oral absorption and distribution

seemed to depend on certain structural characteristics. This is commonly referred to as the

Rule of Five and states that a compound with two or more of the following characteristics

will be poorly absorbed and distributed in the body. These are:

• A molecular weight > 500 amu.

• A logP > 5.

• More than 5 hydrogen-bond donors (sum of –OH’s and –NH’s)

• More than 10 hydrogen-bond acceptors (sum of N and O atoms)

Drugs that passively diffuse across the cell membrane tend to follow this rule, while those

that are actively transported do not depend on the same criteria of

lipophilicity/hydrophilicity and are exceptions. More recently4, the observation was made

that the absorption of drug-like molecules is regularly distributed along these properties,

bounded on one side by the rule-of-five values. The general implication of this simple

rule is that the amount of property space that needs to be sampled in order to derive

physicochemical or pharmacological properties is small. It is necessary only to discover

the particular set of molecular descriptors that describe the set of properties to be predicted

adequately. This trend in property prediction seems to be toward a reduced space

approach, which can account for complex interactions by relatively simple terms.
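The Rule of Five as stated above can be written as a simple filter. This is a minimal sketch: the `Compound` class, the function names, and the descriptor values are hypothetical, and in practice the descriptors would come from a cheminformatics toolkit rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float       # amu
    log_p: float
    h_bond_donors: int      # sum of -OH and -NH groups
    h_bond_acceptors: int   # sum of N and O atoms


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of the four criteria listed above the compound meets."""
    checks = [
        c.mol_weight > 500.0,
        c.log_p > 5.0,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


def likely_poorly_absorbed(c: Compound) -> bool:
    """Two or more violations flag a compound as likely to be poorly absorbed."""
    return rule_of_five_violations(c) >= 2


# Hypothetical compounds for illustration only.
small = Compound("small_example", 151.2, 0.5, 2, 3)
large = Compound("large_example", 650.0, 6.2, 2, 4)
for compound in (small, large):
    print(compound.name, rule_of_five_violations(compound), likely_poorly_absorbed(compound))
```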

1.3 A Quantum-Mechanical,

Molecular Orbital Approach

Most modeling methods employ as much theory as is practicable given the system

to be studied. For example, molecular dynamics may be used to model proteins and


protein-ligand interactions in solution, but the complexity of the system requires the use of

classical mechanics with a reduced set of non-bonded interactions, and a simplified

representation of the solvent molecules. Other approximations are made in order that the

simulation may be made in some reasonable period of time. Although the system as a

whole may be well represented, this approach often leaves interactions near a particular

site of interest poorly described13,14. This has led to the development of hybrid quantum

mechanical/molecular mechanical (QM/MM) methods15,16, which employ quantum

mechanical calculations in the regions where a higher level of theory is required, while the

bulk of the system is represented by force-field calculations. These regions are usually

those where ligands interact with protein residues in a binding pocket and quantum

mechanical methods describe electrostatic intermolecular interactions better than atomic-

monopole-based force field techniques17.

Since the point of contact for all drugs lies inevitably with the molecular surface of

both the drug and the drug target, a descriptive model of the molecular surface is needed.

The nature of this surface is electronic and quantum mechanical methods are those which

describe the electronic structure of the molecule. Quantum mechanical calculations take

into account the behavior of electrons in molecular orbitals rather than localized atomic

orbitals, whereas force field techniques must inevitably rely on atomic constants

parameterized to heats of formation. Local properties such as the molecular electrostatic

potential (MEP) have been used to describe strong non-covalent interactions that are based

primarily on charge. The MEP has been projected onto molecular isodensity surfaces to

calculate descriptors for physical property prediction by Murray and Politzer18-23.

Recently, additional local properties were described24,25 to complement the MEP and

provide a more complete description of the local electronic environment at the molecular

surface. Local properties such as polarizability24, ionization potential24,26, electron

affinity24, electronegativity27-29, and hardness29, taken together, can readily be calculated

by quantum-mechanical methods. Dispersion forces, which dominate in the case of

nonpolar molecules, may be described by calculating local molecular polarizability30. The

tertiary structure of proteins and the stability of biological membranes depend

fundamentally on these dispersion interactions between nonpolar regions31-33.


Figure 1.2 Surface-integral electrostatic potential surface for paracetamol.

The use of surface-integral models33 (SIM’s) to predict physical properties by the

integration of a functional of one or more local properties over the molecular surface has

been demonstrated in the literature31,32,34. In addition to predicting physical properties,

surface-integral QSAR models may be constructed from local properties that predict

biological activities such as enzyme inhibition constants (Ki) and protein-ligand binding

(Kd) constants. These activities, used as local properties, may then be mapped onto the

molecular surface to expose regions that are significant to the observed activity. In this

way, the portions of a drug’s molecular surface important to the binding and activation of

its target may be examined as functions of both local electronic properties and local

activities concomitant with the property/activity predictions of the virtual high-throughput

screen.


Chapter 2

Surface-Integral QSPR Models:

Local Energy Properties

2.1 Introduction

The tools used for quantitative structure-activity relationships (QSAR),

quantitative structure-property relationships (QSPR), protein-ligand docking, and scoring

functions, among others in the cheminformatics toolbox, generally apply an atom-based

approach. In an attempt to move from this atom-based scheme to a quantum-mechanical

surface-based approach13,14,17, a local properties method has been developed to define

properties and interactions at the molecular surface. These local properties are used in

statistical models for the prediction of physical properties and biological activities in terms

of Coulomb, exchange repulsion, dispersion, and donor-acceptor interactions. The

following describes the local-property/surface-integral approach implemented by the

CEPOS InSilico program Parasurf ’06 [35] used in producing QSAR/QSPR models for the

octanol-water partition coefficient (logP), the free energy of solvation in water

(ΔGsolv.(H2O)), the free energy of solvation in n-octanol (ΔGsolv.(oct.)), the acid

dissociation constant (pKa) for nitrogenous compounds, the boiling point (Tb) for organic


compounds, the glass transition temperature (Tg) for organic polymers, and water

solubility (logS).

2.1.1 Local Molecular Properties

The electrostatic potential at the molecular surface has been examined widely22,36

as a descriptor of the electronic environment of the molecular surface and has been used to

describe the noncovalent interactions possible for a given structure. Murray and Politzer

have used the molecular electrostatic potential (MEP, V) and statistical measures derived

from it to calculate pharmacological properties by a general interaction properties

method18-23,37. Tripos’ SYBYL36 uses a calculation of MEP for use in comparative

molecular-field analyses. Additional local properties defined at the molecular surface

have recently been examined as predictors of two-electron donor-acceptor interactions in

order to describe intermolecular electronic interactions more completely 24,25.

The molecular electrostatic potential V(r) is defined as the interaction energy between the

molecule and a positive unit point charge placed at a point r on the molecular surface, and

is described by the equation

\[ V(\mathbf{r}) = \sum_{i=1}^{n} \frac{Z_i}{\left|\mathbf{R}_i - \mathbf{r}\right|} - \int \frac{\rho(\mathbf{r}')\,d\mathbf{r}'}{\left|\mathbf{r}' - \mathbf{r}\right|} \tag{2.1} \]

where n is the number of atoms in the molecule, ρ(r) is the electron-density function for

the molecule, and Zi is the nuclear charge of atom i located at Ri.

The local ionization potential26 IEL(r) is a density-weighted Koopmans’ ionization

potential38 at a point r at the surface that describes the tendency of a molecule to interact

with electron acceptors (electrophilic reactivity) and is defined by

\[ IE_L(\mathbf{r}) = \frac{-\sum_{i=1}^{HOMO} \rho_i(\mathbf{r})\,\varepsilon_i}{\sum_{i=1}^{HOMO} \rho_i(\mathbf{r})} \tag{2.2} \]

where ρi(r) is the electron density at r due to molecular orbital i and εi is its eigenvalue.

Local electron affinity EAL is defined in an analogous Koopmans’ formulation

using the virtual orbitals and describes the tendency of a molecule to interact with electron

donors. It is defined by:

\[ EA_L(\mathbf{r}) = \frac{-\sum_{i=LUMO}^{n_{orbs}} \rho_i(\mathbf{r})\,\varepsilon_i}{\sum_{i=LUMO}^{n_{orbs}} \rho_i(\mathbf{r})} \tag{2.3} \]

Local hardness29 ηL and local Mulliken electronegativity27 χL are derived from the

two previous properties24 by:

\[ \eta_L = \frac{IE_L - EA_L}{2} \tag{2.4} \]

\[ \chi_L = \frac{IE_L + EA_L}{2} \tag{2.5} \]

and represent additional local properties that are readily-interpretable chemical terms.
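The density-weighted averages that define the local ionization potential and local electron affinity, and the two properties derived from them, can be sketched for a single surface point as follows. This is an illustrative sketch only: the function names are hypothetical, the orbital densities and eigenvalues are invented numbers rather than output from a semiempirical calculation, and orbitals are assumed to be ordered by energy with the occupied ones first.

```python
from typing import Sequence


def local_ionization_energy(rho: Sequence[float], eps: Sequence[float], n_occ: int) -> float:
    """Density-weighted mean of -eps over the occupied orbitals (1..HOMO)."""
    occ = list(zip(rho[:n_occ], eps[:n_occ]))
    return sum(-r * e for r, e in occ) / sum(r for r, _ in occ)


def local_electron_affinity(rho: Sequence[float], eps: Sequence[float], n_occ: int) -> float:
    """Density-weighted mean of -eps over the virtual orbitals (LUMO..n_orbs)."""
    virt = list(zip(rho[n_occ:], eps[n_occ:]))
    return sum(-r * e for r, e in virt) / sum(r for r, _ in virt)


def local_hardness(ie: float, ea: float) -> float:
    return (ie - ea) / 2.0


def local_electronegativity(ie: float, ea: float) -> float:
    return (ie + ea) / 2.0


# Hypothetical orbital densities at one surface point and orbital eigenvalues.
rho = [0.02, 0.05, 0.01, 0.03]
eps = [-15.0, -10.0, 1.0, 3.0]
ie = local_ionization_energy(rho, eps, n_occ=2)
ea = local_electron_affinity(rho, eps, n_occ=2)
print(ie, ea, local_hardness(ie, ea), local_electronegativity(ie, ea))
```

Orbitals that contribute more density at the point dominate the average, so the same molecule yields different values at different positions on its surface.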

Local polarizability αL is an occupation-weighted sum of the orbital polarizabilities

over atomic orbitals using Rivail’s variational technique39-43 in which the contribution of

each atomic orbital is determined by the electron density of the individual atomic orbital at

point r and is defined by:

\[ \alpha_L(\mathbf{r}) = \frac{\sum_{j=1}^{n_{orbs}} \rho_j^{1}(\mathbf{r})\, q_j\, \alpha_j}{\sum_{j=1}^{n_{orbs}} \rho_j^{1}(\mathbf{r})\, q_j} \tag{2.6} \]

where qj is the Coulson occupation, α j is the isotropic polarizability for atomic orbital j,

and density ρ j is defined as the electron density at r due to an exactly singly occupied

atomic orbital j. The five local properties used in the following regression models have

been shown to be essentially orthogonal25, with ηL correlating weakly with local ionization

potential.

2.1.2 Surface-Integral Models

The surface-integral models are defined by the general expression:

\[ P = \sum_{i=1}^{n_{tri}} f\left(V^i, IE_L^i, EA_L^i, \alpha_L^i, \eta_L^i\right) \cdot A^i \tag{2.7} \]

where P is the modeled property and f is a nonlinear function of the five local properties.

The summation runs over all ntri surface triangles that make up the molecular surface. The

individual surface properties are taken from the center of each triangle, denoted by the

superscript i, which has an associated area Ai. The function f is determined by multiple

regression using pre-calculated sums of component terms as listed in Appendix A, Table A1.
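The summation over surface triangles can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `SurfaceTriangle` class, the function names, and the coefficients in `example_f` are hypothetical and do not correspond to any fitted model in this work.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SurfaceTriangle:
    v: float      # molecular electrostatic potential at the triangle center
    ie: float     # local ionization potential
    ea: float     # local electron affinity
    alpha: float  # local polarizability
    eta: float    # local hardness
    area: float   # triangle area


def surface_integral(triangles: Iterable[SurfaceTriangle],
                     f: Callable[[SurfaceTriangle], float]) -> float:
    """Sum f(local properties) * area over all triangles of the surface."""
    return sum(f(t) * t.area for t in triangles)


def example_f(t: SurfaceTriangle) -> float:
    """Hypothetical nonlinear function of the local properties."""
    return 1.0e-3 * t.v ** 2 + 2.0e-4 * t.alpha * t.eta


# Two hypothetical triangles standing in for a full triangulated surface.
triangles = [
    SurfaceTriangle(v=2.0, ie=400.0, ea=-50.0, alpha=0.2, eta=225.0, area=0.5),
    SurfaceTriangle(v=-1.0, ie=380.0, ea=-60.0, alpha=0.3, eta=220.0, area=1.5),
]
print(surface_integral(triangles, example_f))
```

Because the area factor is constant for a given surface, each term of f can be summed over the triangles once and stored, which is what makes it possible to fit f by ordinary multiple regression on pre-calculated sums.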

The local properties may be fitted to an isodensity or spherical-harmonic surface39.

When a spherical-harmonic approach is used, the surfaces, as well as the local properties,

are fit to a spherical-harmonic expansion of radial distances,

\[ r_{\alpha,\beta} = \sum_{l=0}^{N} \sum_{m=-l}^{l} c_l^m\, N_{lm}\, P_l^m(\cos\alpha)\, \cos m\beta \tag{2.8} \]

where P_l^m(cos α) are associated Legendre functions, Nlm are normalization factors, and l

and m are integers (−l ≤ m ≤ l). The number of harmonics to be used depends on the

application. In general, the higher the order of l, the more tightly the surface is fitted to the

molecular framework. Spherical-harmonic fitting may only be used with a shrink-wrap

surface because the surface properties must be single-valued at any point extending

outward from the center along a radial vector.

The local properties are calculated for each of a set of triangles fitted to the surface

of the molecule. This set of tesserae may be integrated over the entire surface in order to

derive quantitative structure-activity and structure-property models. In this way the local

properties and the properties/activities derived from them, mapped to the molecular

surface, may be visualized using molecular visualization software such as GEISHA44 or

Pymol45 (See Table 2.2). In addition to QSAR/QSPR models that may be derived from

the surface-integral approach, descriptors based on various statistical features of the local

property surfaces may also be used. A set of 40 molecular descriptors derived from the

local surface properties are generated by Parasurf for use in statistical models. Models

generated using Murray-Politzer-type18, 19, 22 statistical descriptors use the general formula:

\[ P = f\left(D_1, \ldots, D_{40}\right) \tag{2.9} \]

These statistical descriptors are described in the following table:

Table 2.1 Parasurf ‘06 statistical descriptor set.

Description                                   Symbol or definition

Dipole moment                                 $\mu$
Dipolar density                               $\mu_D$
Molecular electronic polarizability           $\alpha$
Molecular weight                              $MW$
Globularity                                   $Glob$
Molecular surface area                        $A$
Molecular volume                              $Vol$
Most positive MEP                             $V_{max}$
Most negative MEP                             $V_{min}$
Mean of positive MEP values                   $\bar{V}_{+}$
Mean of negative MEP values                   $\bar{V}_{-}$
Mean of all MEP values                        $\bar{V}$
Range of MEP values                           $\Delta V$
Total variance of positive MEP                $\sigma_{+}^{2}$
Total variance of negative MEP                $\sigma_{-}^{2}$
Total variance in MEP                         $\sigma_{tot}^{2}$
MEP balance parameter                         $\nu_{MEP}$
Product of MEP balance and variance           $\sigma_{tot}^{2}\,\nu_{MEP}$
Maximum ionization potential value            $IE_L^{max}$
Minimum ionization potential value            $IE_L^{min}$
Mean ionization potential value               $\overline{IE}_L = \frac{1}{N}\sum_{i=1}^{N} IE_L^i$
Range of ionization potential                 $\Delta IE_L = IE_L^{max} - IE_L^{min}$
Total variance in ionization potential        $\sigma_{IE}^{2} = \frac{1}{N}\sum_{i=1}^{N}\left[IE_L^i - \overline{IE}_L\right]^2$
Maximum electron affinity value               $EA_L^{max}$
Minimum electron affinity value               $EA_L^{min}$
Mean of positive electron affinity values     $\overline{EA}_{L+} = \frac{1}{N_+}\sum_{i=1}^{N_+} EA_{L+}^i$
Mean of negative electron affinity values     $\overline{EA}_{L-} = \frac{1}{N_-}\sum_{i=1}^{N_-} EA_{L-}^i$
Mean of electron affinity values              $\overline{EA}_L = \frac{1}{N}\sum_{i=1}^{N} EA_L^i$
Range of electron affinity                    $\Delta EA_L = EA_L^{max} - EA_L^{min}$
Variance in positive electron affinity        $\sigma_{EA+}^{2} = \frac{1}{m}\sum_{i=1}^{m}\left[EA_i^{+} - \overline{EA}^{+}\right]^2$
Variance in negative electron affinity        $\sigma_{EA-}^{2} = \frac{1}{n}\sum_{i=1}^{n}\left[EA_i^{-} - \overline{EA}^{-}\right]^2$
Sum of pos., neg. variances for EA            $\sigma_{EAtot}^{2} = \sigma_{EA+}^{2} + \sigma_{EA-}^{2}$
EA balance parameter                          $\nu_{EA} = \frac{\sigma_{EA+}^{2}\cdot\sigma_{EA-}^{2}}{\left[\sigma_{EAtot}^{2}\right]^2}$
Fraction of surface with pos. EA              $\delta A_{EA+}$
Mean electronegativity value                  $\bar{\chi}_L$
Maximum local polarizability value            $\alpha_L^{max}$
Minimum local polarizability value            $\alpha_L^{min}$
Mean local polarizability value               $\bar{\alpha}_L$
Range of local polarizability                 $\Delta\alpha_L$
Variance in local polarizability              $\sigma_{\alpha}^{2}$
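As an illustration of how such statistical descriptors follow from the property values sampled on the surface, the sketch below computes the electron-affinity descriptors using the plain means and variances given in the table. The function names are hypothetical, the sample values are invented, and the choice to ignore points with a value of exactly zero in the positive and negative subsets is an assumption; this is not the Parasurf implementation.

```python
from typing import Dict, List, Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs) if xs else 0.0


def _variance(xs: Sequence[float]) -> float:
    """Population variance, matching the 1/N form used in the table."""
    if not xs:
        return 0.0
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def ea_descriptors(values: Sequence[float]) -> Dict[str, float]:
    """Statistical descriptors of local electron affinity at N surface points."""
    pos: List[float] = [v for v in values if v > 0.0]
    neg: List[float] = [v for v in values if v < 0.0]
    var_pos, var_neg = _variance(pos), _variance(neg)
    var_tot = var_pos + var_neg
    return {
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "mean": _mean(values),
        "mean_pos": _mean(pos),
        "mean_neg": _mean(neg),
        "var_pos": var_pos,
        "var_neg": var_neg,
        "var_tot": var_tot,
        # The balance parameter is largest when both variances are equal.
        "balance": var_pos * var_neg / var_tot ** 2 if var_tot > 0.0 else 0.0,
    }


# Hypothetical local electron-affinity values at four surface points.
descriptors = ea_descriptors([1.0, 3.0, -2.0, -4.0])
print(descriptors)
```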

Yet other property models use spherical-harmonic hybridization coefficients as

terms in the multiple regression with the same general form. The set of spherical-harmonic

terms consists of 100 hybridization coefficients Hl: 16 shape hybrids, and 21

each of V, IEL, EAL, and αL hybrids. These are defined by:

\[ H_l = \sum_{m=-l}^{l} \left(c_l^m\right)^2 \tag{2.10} \]

When a molecular shape or a local property is fitted to the spherical-harmonic expansion,

the shape or property may be described by the hybridization coefficients in an analogous

fashion to the linear combination of atomic orbitals (LCAO).
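The reduction of a set of spherical-harmonic expansion coefficients to one hybridization coefficient per order l can be sketched as follows. The function name and the coefficient values are hypothetical, and the sketch assumes the hybridization coefficient is the sum of squared expansion coefficients over all m for a given l.

```python
from typing import Dict, List, Tuple


def hybridization_coefficients(coeffs: Dict[Tuple[int, int], float]) -> List[float]:
    """Return [H_0, H_1, ...] where H_l is the sum over m of (c_l^m)^2."""
    max_l = max(l for l, _ in coeffs)
    h = [0.0] * (max_l + 1)
    for (l, m), c in coeffs.items():
        if abs(m) > l:
            raise ValueError(f"invalid index pair l={l}, m={m}")
        h[l] += c * c
    return h


# Hypothetical expansion coefficients c_l^m keyed by (l, m).
coeffs = {
    (0, 0): 2.0,
    (1, -1): 1.0,
    (1, 0): 2.0,
    (1, 1): 2.0,
}
print(hybridization_coefficients(coeffs))
```

Summing over m discards the information about how each order is distributed among its components, which is what reduces a full expansion to a short, fixed-length descriptor vector.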

Figure 2.1 Molecular electrostatic potential surface for N-(3-acetylphenyl)-acetamide.

Quantitative structure-property models for several physical properties have been

derived using surface-integral methods31,32, including logP as a measure of

hydrophobicity33,46-48 and solvation free energies. Several surface-integral QSPR models

employing the aforementioned local properties are presented here. This treatment is

well suited to describing donor-acceptor and dispersion interactions between molecular

surfaces that play a significant role in solvation by non-polar solvents49 and protein-ligand

binding to non-polar residues.



Table 2.2 Local property surfaces for paracetamol calculated with Parasurf.

[Surface images omitted. Panels and their color-scale labels as printed:]

Electron Affinity, Electronegativity: -147 0 93 329

Hardness, Ionization Potential: 200 432 320 761

Molecular Electrostatic Potential, Molecular Polarizability: -5424 1907 159 346

2.2 Methods

Structures for the data sets assembled from the literature were converted from 2D structures to 3D MDL SD files using Molecular Networks’ CORINA [50,51]. The molecular geometries were then optimized with the AM1 Hamiltonian [52] using VAMP 9.0 [53]. In cases where the addition of d-orbitals improved the overall structure, the AM1* Hamiltonian [54] was used for optimization, followed by a single-point AM1 calculation in order to retrieve essential polarizability data. The five local surface properties were calculated for each structure by Parasurf ’06 [35] for either an electron isodensity surface or a spherical-harmonic-fitted surface, generated by either a marching-cube or a shrink-wrap algorithm, as indicated. Molecular electrostatic potentials were calculated using the zero-differential-overlap-based atomic multipole technique, and the local ionization energy, electron affinity, and polarizability were calculated as described previously [40-43,55,56]. Multiple regression models were generated with Tsar 3.3 [57] using functions of powers and products of the five local properties. One hundred fifty nonlinear product and power terms of the local properties were generated by script and used as descriptors in a multiple regression routine (Appendix A, Table A1). The multiple regressions were performed with the Leave Out Groups of Three cross-validation method, using an F-to-enter value of 4.0 and an F-to-leave value of 3.9, excluding variables if there is a cross-correlation greater than 0.9. A Leave One Out method is often used with the multiple regression routine, yielding predictive r²cv values that may be very close to the corresponding r² values. However, the authors of Tsar consider it a better measure of predictivity, in the case of stepwise regression, to leave out what amounts to a third of the data to be predicted by the remaining two-thirds (Tsar reference guide). Individual terms used in the multiple regressions that cross-correlated with R > 0.86 were excluded from the regression.
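The idea of leaving out a third of the data at a time can be sketched as follows. This is not the Tsar procedure: it is a simplified illustration that assumes three interleaved groups, uses a single descriptor with ordinary least squares in place of stepwise multiple regression, and computes the cross-validated coefficient as 1 − PRESS/SS. All names and data values are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_r2(xs: Sequence[float], ys: Sequence[float], n_groups: int = 3) -> float:
    """Leave each group out in turn and predict it from the remaining groups."""
    n = len(xs)
    predictions: List[float] = [0.0] * n
    for g in range(n_groups):
        train = [i for i in range(n) if i % n_groups != g]
        test = [i for i in range(n) if i % n_groups == g]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            predictions[i] = slope * xs[i] + intercept
    mean_y = sum(ys) / n
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and property values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
print(cross_validated_r2(xs, ys))
```

Each prediction comes from a model that never saw the predicted compound, so the resulting coefficient is a more conservative estimate of predictivity than the fitted r².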

The surface-integral models obtained using the marching-cube method of

generating isodensity surfaces were fitted at an isodensity value of 0.008 e/Å3

(corresponding approximately to a van der Waals surface). For the regression models

using the shrink-wrap method of generating surfaces, including those models employing

spherical harmonic coefficients as regression terms, the local properties were fit to

spherical harmonics at an isodensity value of 0.0002 e/Å3, which, for a spherical-harmonic



fit, is approximately the van der Waals’ surface. The set of spherical harmonic terms

consists of 100 hybridization coefficients: 16 shape hybrids, and 21 each of V, IEL, EAL,

and local polarizability hybrids. These were also used as terms to generate linear

regression models. In this chapter, the statistical measure of the fit of the regression

models to the surface data are presented below the plots of experimental and calculated

physical property values by the regression coefficient, r2. Measures of the predictive

capacity of the models are expressed as r2cv, the cross-validated regression coefficient, the

mean unsigned error (MUE), and the root-mean-square error (RMSD) of the predictions.
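The statistics quoted under each plot can be computed as in the following sketch. It assumes that r² is the squared Pearson correlation between experimental and calculated values (which equals 1 − SSE/SST for a least-squares fit with an intercept); the function names are illustrative only.

```python
import math
from typing import Sequence


def mue(experimental: Sequence[float], calculated: Sequence[float]) -> float:
    """Mean unsigned (absolute) error."""
    return sum(abs(c - e) for e, c in zip(experimental, calculated)) / len(experimental)


def rmsd(experimental: Sequence[float], calculated: Sequence[float]) -> float:
    """Root-mean-square deviation of calculated from experimental values."""
    n = len(experimental)
    return math.sqrt(sum((c - e) ** 2 for e, c in zip(experimental, calculated)) / n)


def r_squared(experimental: Sequence[float], calculated: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient (assumed definition of r2)."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_c = sum(calculated) / n
    sxy = sum((e - mean_e) * (c - mean_c) for e, c in zip(experimental, calculated))
    sxx = sum((e - mean_e) ** 2 for e in experimental)
    syy = sum((c - mean_c) ** 2 for c in calculated)
    return sxy * sxy / (sxx * syy)
```

Because the RMSD squares each deviation before averaging, it is never smaller than the MUE and is more sensitive to outliers, which is why the two measures can move differently when single compounds are removed.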

2.3 Results

2.3.1 Octanol/Water Partition Coefficient

The n-octanol/water partition coefficient (here, logP) data set consists of 168 structures assembled from the literature⁵⁸⁻⁶⁰, with values ranging from -3.64 to 8.23 logP units (Appendix A, Table A2). The surface-integral model for logP, derived by multiple regression with the set of 150 property terms (using the marching-cube algorithm) as starting variables, yielded an 8-term regression equation using neutral structures (including zwitterionic amino acids) and represents the best model to date. The regression equation is given by:

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

( ) ( )

5 3 3-6 82

5 213 62

216

36 2

10

(log ) 1.6967 10 4.6367 10 0.25768

5.2448 10 4.4222 10

7.7213 10

1.5978 10

1.4233 10

L

L L

L L

L L

L

f P V V

V IE

V IE

V EA

V EA

α

α η

η

α

−

− −

−

−

−

= × ⋅ + × ⋅ − ⋅⎡ ⎤ ⎡ ⎤ ⎡⎣ ⎦ ⎣ ⎦ ⎣

− × ⋅ ⋅ + × ⋅ ⋅⎡ ⎤ ⎡⎣ ⎦ ⎣

+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

+ × ⋅ ⋅ ⋅

r r r

r r r r

r r r

r r r

r r ( )

L

⎤⎦

⎤⎦

r

52 0.1784Lα +⎡ ⎤⎣ ⎦r

(2.11)

Figure 2.2 Experimental and calculated values of logP for the test set: N=168, MUE=0.227, RMSD=0.500, r²=0.797, r²cv=0.685. [Plot of calculated vs. experimental logPOW.]

In a prior model, the amino acids phenylalanine and tryptophan were represented in their

uncharged forms, which resulted in a 7-term regression equation:

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

2 3-6 8 10

337 5 2

311 2

324 2

(log ) 6.2390 10 2.7378 10 2.2779 10

5.0736 10 1.6563 10

2.0941 10

8.5026 10

0.3042

L

L L L L

L L

L L

f P V V IE

EA

V IE EA

V IE

α α η

η

− −

− −

−

−

= × ⋅ + × ⋅ − × ⋅⎡ ⎤ ⎡ ⎤ ⎡⎣ ⎦ ⎣ ⎦ ⎣

− × ⋅ ⋅ − × ⋅ ⋅⎡ ⎤ ⎡ ⎤⎣ ⎦ ⎣ ⎦

− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦+

r r r

r r r r

r r r

r r r

3⎤⎦r

(2.12)

As a comparison of Figures 2.2 and 2.3 shows, the regression statistics and the clustering of points improve slightly with the use of the zwitterionic forms of these amino acids.

Figure 2.3 Neutral logP set with non-zwitterionic amino acids: N=168, r²=0.782, r²cv=0.656, MUE=0.238, RMSD=0.516. [Plot of calculated vs. experimental logPOW.]

Another regression model was generated for the same set using ionized structures for those compounds that are more than 50% ionized at pH 7.0, as calculated from their pKa values, giving the 10-term equation:

( ) ( ) ( ) ( )

( ) ( )

( ) ( ) ( ) ( )

( ) ( )

3 5 32 25 7

3 3

5 314 182

25 7

(log ) 8.3660 10 5.4673 10 2.2713 10

4.4369 10 5.2747 10

6.0514 10 4.4293 10

1.4686 10 9.7195 10

L L

L L L

L L

f P V V V

IE EA

V IE IE

EA EA

η

α

− −

− −

− −

− −

⎡ ⎤ ⎡ ⎤= − × ⋅ + × ⋅ + × ⋅8− ⎡ ⎤⎣ ⎦⎣ ⎦ ⎣ ⎦

+ × ⋅ − × ⋅

− × ⋅ ⋅ − × ⋅ ⋅⎡ ⎤ ⎡ ⎤⎣ ⎦ ⎣ ⎦

+ × ⋅ ⋅ + × ⋅⎡ ⎤⎣ ⎦

r r r

r r

r r r r

r r ( ) ( )( ) ( ) ( )81.2682 10 0.02807

L L

L LV IE EA

η−

⋅

+ × ⋅ ⋅ ⋅ +

r r

r r r

r

(2.13)

Figure 2.4 Surface-integral model for logP using compounds charged by pKa at pH=7: r²=0.729, r²cv=0.145, MUE=0.252, RMSD=0.576. [Plot of calculated vs. experimental logPOW.]
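The criterion used to select charged structures for this model (more than 50% ionized at pH 7.0) can be sketched with the Henderson-Hasselbalch relation. This is an illustration under the assumption of a single ionizable group per compound; the function names and the `acid` flag are hypothetical and not taken from the thesis.

```python
def fraction_ionized(pka: float, ph: float = 7.0, acid: bool = True) -> float:
    """Fraction of the ionized form from the Henderson-Hasselbalch relation.

    acid=True:  HA <-> A- + H+, the ionized form is A-.
    acid=False: pka refers to the conjugate acid BH+ of a base,
                the ionized form is BH+.
    """
    exponent = (pka - ph) if acid else (ph - pka)
    return 1.0 / (1.0 + 10.0 ** exponent)


def model_as_ion(pka: float, ph: float = 7.0, acid: bool = True,
                 threshold: float = 0.5) -> bool:
    """True if the compound is more than `threshold` ionized at the given pH."""
    return fraction_ionized(pka, ph, acid) > threshold
```

Under this rule an acid is modelled as its anion when its pKa lies below 7.0, and a base is modelled as its cation when the pKa of its conjugate acid lies above 7.0.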

Trifluopromazine is an outlier in this model, and the r2cv statistic improves significantly

with its removal, as do the MUE and RMSD. The resulting model predicted poorly,

however, exhibiting a negative value for r2cv, so the number of cross-validation sets was

reduced from ten (standard for these models) to six in order to generate a model with

better statistics. This gives the 9-term equation:

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )

3 52 24 7

39 3 2

3 52 25 6

318 8

(log ) 1.0814 10 2.2696 10

9.0847 10 7.0265 10 1.5322 10

4.0126 10 1.1769 10

7.1156 10 1.1316 10

L L

L L

L L L

f P V V

V IE

EA V

IE V IE

α

η

− −

− − −

− −

− −

⎡ ⎤ ⎡ ⎤= − × ⋅ + × ⋅⎣ ⎦ ⎣ ⎦

+ × ⋅ + × ⋅ − × ⋅⎡ ⎤⎣ ⎦

⎡ ⎤ ⎡ ⎤+ × ⋅ + × ⋅ ⋅⎣ ⎦ ⎣ ⎦

− × ⋅ ⋅ + × ⋅ ⋅⎡ ⎤⎣ ⎦

r r r

r r

r r r

r r r r ( )0.08667

LEA⋅

−

r

EA r

(2.14)

Figure 2.5 “Charged” logP model with outlier removed: r²=0.735, r²cv=0.573, MUE=0.151, RMSD=0.437. [Plot of calculated vs. experimental logPOW.]

This greatly improves the predictivity of the model, while improving the regression statistic only slightly. Overall, however, the use of charged structures diminished the predictivity of the models, and it was decided that their inclusion in these QSAR/QSPR models was not useful. Because the local properties are calculated in the gas phase, where no solvent shielding can occur, the impact of ionization on target values as derived from the regression models might be exaggerated.

A model using the 40 statistical descriptors as starting variables yielded a 10-term equation:

( )( )

(log ) 0.5690 43.14 ( ) 0.1467 0.1577

10.45 0.0130 max 0.1397 0.0342

7.056 min( ) 14.84 38.72MEP L

f P MEP

IE EA

μ ρ μ α

ν χ

α α

+

−

= − ⋅ + ⋅ + ⋅ − ⋅

+ ⋅ − ⋅ + ⋅ − ⋅

+ ⋅ − ⋅ +

r

(2.15)

Figure 2.6 Linear regression model for logP using statistical descriptors: N=168, MUE=0.775, RMSD=0.996, r²=0.743, r²cv=0.635. [Plot of calculated logPOW.]

It is evident that, although the regression statistics are comparable to the best nonlinear

model, the predictive capacity of this model is not quite as good, with a root mean square

error of nearly 1 logP unit.

The regression model for logP using spherical-harmonic hybridization coefficients consists of 11 terms:

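\[
\begin{aligned}
f(\log P)(\mathbf{r}) ={}& 0.8771\,H_{R,1} + 1.389\,H_{R,4} - 0.0459\,H_{\mathrm{MEP},1} - 0.0584\,H_{\mathrm{MEP},3} \\
& - 0.1117\,H_{\mathrm{MEP},4} - 0.2729\,H_{\mathrm{MEP},9} + 0.0404\,H_{\mathrm{IEL},10} + 0.1722\,H_{\mathrm{EAL},11} \\
& + 6.808\,H_{\mathrm{EAL},21} - 8.842\,H_{\alpha,1} - 5.786
\end{aligned}
\tag{2.16}
\]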

Figure 2.7 Regression model for logP using spherical-harmonic hybridization coefficients: N=168, MUE=0.756, RMSD=0.966, r²=0.759, r²cv=0.516. [Plot of calculated logPOW.]

The surface-integral model using a spherical harmonic-fitted surface gives the 4-term

equation:

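\[
\begin{aligned}
f(\log P)(\mathbf{r}) ={}& -5.668\times10^{-3}\,V(\mathbf{r}) + 2.345\times10^{-1}\,\bigl[\alpha_L(\mathbf{r})\bigr]^{3/2} - 2.154\times10^{-1}\,\bigl[\alpha_L(\mathbf{r})\bigr]^{5/2} \\
& + 9.453\times10^{-17}\,\bigl[IE_L(\mathbf{r})\cdot EA_L(\mathbf{r})\bigr]^{3}
\end{aligned}
\tag{2.17}
\]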

Overall, the model with the best regression coefficient and best predictivity in terms of

r2cv, MUE, and RMSD was the surface-integral model that used the local properties from

the marching-cube surface and included the amino acids in their zwitterionic form. There

was not much variation among the several models in the r2 fit of the surface properties,

although the RMS error varied by nearly ½ of a logP unit.
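How a surface-integral model of the kind compared above turns a fitted local function into a property value can be sketched as follows. The sketch assumes that the property is obtained by summing the local function f over discrete surface elements, each weighted by its area; the data layout, the names, and the coefficients in `example_f` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SurfaceElement:
    """One element (e.g. a triangle) of the molecular surface."""
    area: float     # surface area of the element
    v: float        # molecular electrostatic potential, V
    ie_l: float     # local ionization energy, IE_L
    ea_l: float     # local electron affinity, EA_L
    alpha_l: float  # local polarizability, alpha_L
    eta_l: float    # local hardness, eta_L


def surface_integral(elements: Iterable[SurfaceElement],
                     f: Callable[[SurfaceElement], float]) -> float:
    """Approximate the surface integral of f as sum_i f(r_i) * A_i."""
    return sum(f(e) * e.area for e in elements)


def example_f(e: SurfaceElement) -> float:
    """Illustrative local function; the coefficients are made up."""
    return 0.01 * e.v - 0.002 * e.alpha_l ** 1.5 + 1e-6 * (e.ie_l * e.ea_l)
```

Keeping the local function separate from the summation means the same routine can evaluate models fitted on either the marching-cube or the spherical-harmonic surface; only the list of surface elements changes.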

Figure 2.8 Surface-integral model for logP using spherical-harmonic-fitted surface: N=162, MUE=0.770, RMSD=0.963, r²=0.745, r²cv=0.662. [Plot of calculated logPOW.]

2.3.2 Free Energy of Solvation

2.3.2.1 Free Energy of Solvation in Octanol

The surface-integral model for the free energy of solvation in n-octanol (ΔGsolv.(oct.)) was generated using the 165 compounds in Table A3 of Appendix A, taken from Ehresmann et al.³⁴. The resulting regression equation consists of 17 terms:


( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( )

32-2 -4 -2

3-16 -3

3-4 -13

32-4 -9

( )( ) 1.3705 10 3.9476 10 6.8874 10

5.5092 10 1.0796 10

1.1937 10 1.1179 10

3.5384 10 5.2971 10

solv L

L L

L L

L

octf G V V

V IE V EA

V EA V EA

V

α

α

⎡ ⎤Δ = × ⋅ − × ⋅ − × ⋅⎣ ⎦

+ × ⋅ ⋅ + × ⋅ ⋅⎡ ⎤⎣ ⎦

+ × ⋅ ⋅ + × ⋅ ⋅⎡ ⎤ ⎡ ⎤⎣ ⎦ ⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ − ×⎣ ⎦

r r r

r r r r

r r r r

r r ( ) ( )

( ) ( ) ( ) ( ) ( )( ) ( ) ( ) ( ) ( ) ( )

( ) ( ) ( )

r

( ) ( ) ( )

3

3 3-17 -21

-5 -8

32-12

-4

-17

1.6949 10 3.9527 10

3.6011 10 4.7541 10

2.8234 10

2.7129 10

6.5137 10

L L

L L L L

L L L L

L L

L L

IE

IE V IE EA

V IE V IE

V IE

V EA

α

η

α η

η

α

⋅ ⋅⎡ ⎤⎣ ⎦

− × ⋅ ⋅ + × ⋅ ⋅ ⋅⎡ ⎤ ⎡⎣ ⎦ ⎣− × ⋅ ⋅ ⋅ + × ⋅ ⋅ ⋅⎡ ⎤ ⎡⎣ ⎦ ⎣

⎡ ⎤+ × ⋅ ⋅ ⋅⎣ ⎦− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

− ×

r r

r r r r r

r r r r r r

r r r

r r r

( ) ( ) ( )

⎤⎦⎤⎦

( ) ( ) ( ) ( )

52

52-162.0072 10 0.3585

L L

L L L

V EA

V EA

η

α η

⎡ ⎤⋅ ⋅ ⋅⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ ⋅ ⋅ +⎣ ⎦

r r r

r r r r

(2.18)

Figure 2.9 Experimental and calculated free energies of solvation in octanol for the training set: N=165, MUE=0.569, RMSD=0.713, r²=0.914, r²cv=0.798. [Plot of calculated vs. experimental ΔGsolv(octanol) in kcal mol⁻¹.]

In order to examine the effect of optimal conformation in solution, the structures from the same data set were geometry-optimized (AM1) using the conductor-like screening model⁶¹ (COSMO) of Klamt and Schüürmann with a bulk dielectric constant (EPS) of 10. The surface-integral model using these structures yields an 11-term equation:

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

3 1

21 3

325 7

3 313 17

( )( ) 5.0064 10 5.5198 10

3.6463 10 2.7085 10

2.3985 10 7.3706 10

1.1133 10 1.8658 10

1.2607

solv L L

L L

L L

L L

octf G IE

V EA

V EA V EA

V EA IE

α

α

η

− −

− −

− −

− −

Δ = × ⋅ − × ⋅

+ × ⋅ + × ⋅ ⋅⎡ ⎤⎣ ⎦

L

⎡ ⎤+ × ⋅ ⋅ − × ⋅ ⋅⎡ ⎤⎣ ⎦ ⎣ ⎦

+ × ⋅ ⋅ − × ⋅ ⋅⎡ ⎤ ⎡ ⎤⎣ ⎦ ⎣− ×

r r r

r r r

r r r r

r r r r

( ) ( ) ( )( ) ( ) ( )

( ) ( ) ( ) ( )

⎦5

8

5216

10

1.8956 10

1.2349 10 0.0834

L L

L L

L L L

V IE

V IE

V EA

α

η

α η

−

−

−

⋅ ⋅ ⋅⎡ ⎤⎣ ⎦+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ ⋅ ⋅ +⎣ ⎦

r r r

r r r

r r r r

(2.19)

Figure 2.10 Surface-integral model for free energy of solvation in octanol using the COSMO-optimized training set: N=165, MUE=0.648, RMSD=0.841, r²=0.875, r²cv=0.816. [Plot of calculated vs. experimental ΔGsolv(octanol) in kcal mol⁻¹.]

The use of the COSMO-optimized structures reduces the predictivity of the surface-integral model in terms of the mean unsigned error and RMS error, as seen in Figure 2.10.

A regression model using spherical-harmonic hybrid coefficients yields an 18-term equation:

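\[
\begin{aligned}
f(\Delta G_{\mathrm{solv}}(\mathrm{oct}))(\mathbf{r}) ={}& -1.003\,H_{R,1} + 0.058\,H_{\mathrm{MEP},1} - 0.071\,H_{\mathrm{MEP},4} - 0.143\,H_{\mathrm{MEP},5} \\
& - 0.156\,H_{\mathrm{MEP},7} - 0.557\,H_{\mathrm{MEP},11} - 1.670\,H_{\mathrm{MEP},18} - 0.024\,H_{\mathrm{IEL},1} \\
& - 0.013\,H_{\mathrm{IEL},2} + 0.010\,H_{\mathrm{IEL},4} + 0.015\,H_{\mathrm{IEL},6} + 0.018\,H_{\mathrm{IEL},7} \\
& - 0.042\,H_{\mathrm{IEL},11} + 0.127\,H_{\mathrm{IEL},17} + 0.169\,H_{\mathrm{EAL},12} - 0.262\,H_{\mathrm{EAL},16} \\
& - 4.943\,H_{\alpha,1} + 8.343\,H_{\alpha,3} + 53.187
\end{aligned}
\tag{2.20}
\]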

Figure 2.11 Experimental and calculated free energies of solvation in octanol using hybrid coefficients: N=165, MUE=0.636, RMSD=0.813, r²=0.889, r²cv=0.704. [Plot of calculated vs. experimental ΔGsolv(octanol) in kcal mol⁻¹.]

This model has similar statistics to the surface-integral model using the marching-cube surface, but the RMS error increases by 0.1 kcal·mol⁻¹. The surface-integral model using a spherical-harmonic-fitted surface yields the 12-term equation:

( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )

22

314 5

55 215 62

8

4

1

( )( ) 0.1315 8.01 10

2.195 10 8.852 10

3.266 10 4.793 10

2.059 10

1.413 10

7.322 10

solv L L

L L

L L L L

L L

L L

octf G

V IE V EA

IE EA

V IE

V EA

α α

η α

η

α

−

− −

− −

−

−

−

Δ = − ⋅ + × ⋅ ⎡ ⎤⎣ ⎦

− × ⋅ ⋅ + × ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤− × ⋅ ⋅ + × ⋅ ⋅⎡ ⎤⎣ ⎦ ⎣ ⎦+ × ⋅ ⋅ ⋅

− × ⋅ ⋅ ⋅

+ ×

r r r

r r r r

r r r r

r r r

r r r

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

31

324

5

329

7.020 10

9.707 10

1.792 10 3.126

L L

L L L

L L L

L L L

V EA

IE EA

V EA

V EA

α

η

α η

α η

−

−

−

⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

+ × ⋅ ⋅ ⋅ ⋅

⎡ ⎤− × ⋅ ⋅ ⋅ ⋅ +⎣ ⎦

r r r

r r r

r r r r

r r r r

(2.21)

Figure 2.12 Surface-integral model for free energy of solvation in octanol using spherical harmonic surface: N=165, MUE=0.719, RMSD=0.924, r²=0.865, r²cv=0.729. [Plot of calculated vs. experimental ΔGsolv(octanol) in kcal mol⁻¹.]

Here again, the regression statistics are not quite as good for the surface-integral model employing a spherical-harmonic surface as for the model using the marching-cube surface. There is also an accompanying increase in RMS error of ~0.2 kcal·mol⁻¹.


2.3.2.2 Free Energy of Solvation in Water

The data set for the free energy of solvation in water (ΔGsolv.(H2O)), presented in Table A4 of Appendix A, comprises 384 compounds assembled from the literature⁶²⁻⁶⁴. The regression equation for the free energy of hydration surface-integral model consists of 21 terms:

( ) ( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )

( )

2-2 -52

3-9

5 32-7 -16

52-2 -6

-15

( )( ) 2.0192 10 4.3422 10

6.7859 10 0.3486

8.9213 10 1.2382 10

2.2924 10 3.1783 10

1.4154 10

solv

L L

L L

L L

L L

H Of G V V

IE

V IE

V V

IE EA

α

η

α α

Δ = × ⋅ − × ⋅ ⎡ ⎤⎣ ⎦

− × + ⋅⎡ ⎤⎣ ⎦

⎡ ⎤+ × ⋅ − × ⋅ ⋅⎡ ⎤⎣ ⎦⎣ ⎦

⎡ ⎤− × ⋅ ⋅ − × ⋅ ⋅⎡ ⎤⎣ ⎦ ⎣ ⎦

+ × ⋅ ⋅

r r r

r r

r r r

r r r r

r ( )

( ) ( ) ( ) ( )

( ) ( )

( ) ( ) ( )

( ) ( ) ( )( ) ( ) ( )

( ) ( ) ( )

3

55 2-7 -52

3-15

3-22

2-10

-8

-3

5.0232 10 5.7349 10

2.5633 10

1.1298 10

8.3577 10

3.3400 10

5.7742 10

L L L L

L L

L L

L L

L L

L L

IE EA

EA

V IE EA

V IE

V IE

V EA

α α

η

α

η

α

⎡ ⎤⎣ ⎦

⎡ ⎤− × ⋅ ⋅ − × ⋅ ⋅⎡ ⎤⎣ ⎦ ⎣ ⎦

− × ⋅ ⋅⎡ ⎤⎣ ⎦

− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

− × ⋅ ⋅ ⋅

r

r r r r

r r

r r r

r r r

r r r

r r r

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )

-5

52-11

3-14

32-12

3-20

1.5065 10

1.5836 10

2.4093 10

7.9139 10

1.2174 10 0.2167

L L L

L L L

L L L

L L L

L L L

IE EA

IE EA

IE EA

IE EA

V EA

α

α

α

η

α η

+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ ⋅⎣ ⎦

− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤− × ⋅ ⋅ ⋅⎣ ⎦

+ × ⋅ ⋅ ⋅ ⋅ −⎡ ⎤⎣ ⎦

r r r

r r r

r r r

r r r

r r r r

(2.22)

Figure 2.13 Experimental and calculated free energies of solvation in water for the training set given in Table A4: N=384, MUE=0.727, RMSD=1.503, r²=0.983, r²cv=0.825. [Plot of calculated vs. experimental ΔGsolv(H2O) in kcal mol⁻¹.]

As can be seen in Figure 2.13, the predictivity suffers somewhat from the inclusion of the charged species, which exert a lever effect on the regression such that the whole set of structures cannot be fitted with the same robustness as either the charged or the uncharged portion alone. Using only the neutral compounds (N=362) from the data set (Appendix A, Table A4, rows 1-362) in a surface-integral model results in a 17-term equation:


( ) ( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

382

32 27

3 213 5

3 37 15

( )( ) 8.385 10 0.0194 1.232

1.032 7.411 10

3.829 10 2.527 10

1.329 10 5.618 10

4.288 10

solv L L

L L

L L

L L

H Of G V IE

V EA

V EA V

V V

α

α

α

α η

−

−

− −

− −

Δ = − × ⋅ ⋅ − ⋅⎡ ⎤⎣ ⎦

⎡ ⎤+ ⋅ − × ⋅ ⋅⎡ ⎤⎣ ⎦ ⎣ ⎦

+ × ⋅ ⋅ − × ⋅ ⋅⎡ ⎤ ⎡⎣ ⎦ ⎣

+ × ⋅ ⋅ + × ⋅ ⋅⎡ ⎤ ⎡⎣ ⎦ ⎣

− ×

r r r

r r r

r r r r

r r r r

( ) ( )( ) ( ) ( )

( ) ( ) ( )

⎤⎦

⎤⎦

r

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

317

7

5218

6

3212

326

312

1.828 10

7.725 10

9.213 10

3.365 10

1.550 10

2.357 10

9.339 10

L L

L L

L L

L L

L L

L L

L L

IE

V IE EA

V IE EA

V IE

V IE

V EA

V EA

η

α

η

α

α

−

−

−

−

−

−

−

⋅ ⋅⎡ ⎤⎣ ⎦− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤− × ⋅ ⋅ ⋅⎣ ⎦− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ ⋅⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ ⋅⎣ ⎦

− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦+ ×

r r

r r r

r r r

r r r

r r r

r r r

r r r

( ) ( ) ( ) ( )7 0.5539L L LV EA α η− ⋅ ⋅ ⋅ ⋅ −⎡ ⎤⎣ ⎦r r r r

(2.23)

Figure 2.14 Experimental and calculated free energies of solvation in water for the uncharged components of the training set given in Table A4: N=362, MUE=0.789, RMSD=1.031, r²=0.891, r²cv=0.845. [Plot of calculated vs. experimental ΔGsolv(H2O) in kcal mol⁻¹.]

This gives a model with similar regression statistics, but with an RMS error improved by about ½ kcal·mol⁻¹ of solvation free energy. When this data set, minus two outliers, was optimized with the COSMO solvation model (EPS=80.0), the result was the 13-term equation:

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )

22

324 7

317

5

8

( )( ) 0.0165 0.8593 0.4845

8.100 10 4.357 10

0.0152 4.485 10

5.086 10

4.071 10

9.0

solv L L L

L L

L L L

L L

L L

H Of G IE

V EA V EA

V IE

V IE

V IE

α α

α η

α

η

− −

−

−

−

Δ = ⋅ − ⋅ + ⋅ ⎡ ⎤⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ − × ⋅ ⋅⎣ ⎦

+ ⋅ ⋅ − × ⋅ ⋅⎡ ⎤ ⎡ ⎤⎣ ⎦ ⎣ ⎦− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦+

r r r r

r r r r

r r r r

r r r

r r r

( ) ( ) ( )( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )

5

24

316

5217

77 10

9.522 10

3.271 10

5.327 10 0.9146

L L

L L L

L L L

L L L

V EA

IE EA

IE EA

V EA

α

η

η

α η

−

−

−

−

× ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ ⋅ ⋅ −⎣ ⎦

r r r

r r r

r r r

r r r r

(2.24)


Figure 2.15 Experimental and calculated free energies of solvation in water for the uncharged components using the COSMO solvation model (EPS=80.0): N=360, MUE=0.922, RMSD=1.139, r²=0.862, r²cv=0.805. [Plot of calculated ΔGsolv(H2O) in kcal mol⁻¹.]

Thus, the use of structures geometry-optimized with the COSMO model again reduces the predictivity by 0.1 kcal·mol⁻¹. The best regression model using spherical-harmonic hybridization coefficients yielded a 21-term equation with poor regression statistics and is not presented here (MUE=1.47, RMSD=2.14). The surface-integral model using a spherical-harmonic-fitted surface is defined by the 18-term equation:

( ) ( ) ( )

( ) ( )( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( )

2 34 62

523 7

13 3

324 6

312

( )( ) 1.545 10 8.236 10

5.710 10 5.879 10

0.247 1.651 10

1.054 10 1.043 10

4.746 10 5.11

solv

L L

L L

L L

L

H Of G V V

IE EA

V IE

V EA V EA

V EA

α

− −

− −

−

− −

−

Δ = × ⋅ + × ⋅⎡ ⎤ ⎡ ⎤⎣ ⎦ ⎣ ⎦

⎡ ⎤+ × ⋅ − × ⋅ ⎣ ⎦− ⋅ − × ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ − × ⋅ ⋅⎣ ⎦

+ × ⋅ ⋅ +⎡ ⎤⎣ ⎦

r r r

r r

r r r

r r r r

r r ( ) ( )

( ) ( ) ( ) ( )

( ) ( )

( ) ( ) ( )( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

525

536 15 2

36

3211

4

326

10

3 10

7.387 10 6.237 10

1.639 10

3.699 10

3.319 10

5.071 10

1.036 10

L

L L

L L

L L

L L

L L

L L

V

V IE

EA

V IE

V EA

V EA

V EA

α

α η

α

η

α

α

α

−

− −

−

−

−

−

−

L

⎡ ⎤× ⋅ ⋅⎣ ⎦

− × ⋅ ⋅ − × ⋅ ⋅⎡ ⎤ ⎡⎣ ⎦ ⎣

− × ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤− × ⋅ ⋅ ⋅⎣ ⎦− × ⋅ ⋅ ⋅

⎡ ⎤+ × ⋅ ⋅ ⋅⎣ ⎦

+ × ⋅ ⋅ ⋅

r r

r r r r

r r

r r r

r r r

r r r

r r r

( ) ( ) ( ) ( )

3

3172.508 10 1.003L L LV EA α η−

⎡ ⎤⎣ ⎦

− × ⋅ ⋅ ⋅ ⋅ +⎡ ⎤⎣ ⎦r r r r

⎤⎦

(2.25)

Figure 2.16 Surface-integral model for free energy of solvation in water for the uncharged structures and using the spherical-harmonic-fitted surface: N=360, MUE=0.929, RMSD=1.18, r²=0.842, r²cv=0.781. [Plot of calculated ΔGsolv(H2O) in kcal mol⁻¹.]

Again, fitting the surface to spherical harmonics affects the local property space such that predictivity is decreased (~0.2 kcal·mol⁻¹), and the model with the best predictivity was the surface-integral model using the marching-cube surface. Because this appears to be a trend, the surface-integral models presented hereafter use only local properties fitted on the marching-cube surface.

2.3.3 Acid Dissociation Constant

A surface-integral model was generated for pKa using the data set in Table A5 of Appendix A, which consists of 268 nitrogenous compounds (primary and secondary amines, anilines, and pyridines) taken from the article by Tehan et al.⁶⁵ on pKa estimation. The regression equation has 23 terms:


( ) ( )

[ ] [ ][ ]

32 3 4 2

56 32

536 2

315 3

7

6.979 10 ( ) 6.469 10 ( ) 3.829 10 ( )

4.278 10 ( ) 8.326 10 ( )

1.552 10 ( ) 4.6127 ( )

3.0124 10 ( ) ( ) 7.818 10 ( ) ( )

7.530 10 ( )

a

L

L L

L L

f pK V V V

V IE

IE

V IE V EA

V

α

− − −

− −

−

− −

−

= − × ⋅ + × ⋅ + × ⋅ ⎡ ⎤⎣ ⎦

− × ⋅ ⎡ ⎤ + × ⋅⎣ ⎦

− × ⋅ − ⋅

+ × ⋅ ⋅ + × ⋅ ⋅

− × ⋅ ⋅

r r r

r r

r r

r r r r

r

r

[ ] [ ]

[ ]

[ ][ ]

3 5112 2

313 2

3 33 72

57 2

5

( ) 9.208 10 ( ) ( )

2.379 10 ( ) ( ) 4.549 10 ( ) ( )

1.522 10 ( ) ( ) 5.233 10 ( ) ( )

9.698 10 ( ) ( )

9.348 10 ( ) ( ) ( )

4.1

L L

L L

L L

L L

L L

EA V EA

V EA V

V EA V

IE

V IE

α

α

α

α

−

− −

− −

−

−

⎡ ⎤ + × ⋅ ⎡ ⋅⎣ ⎦ ⎣ ⎦

+ × ⋅ ⋅ − × ⋅ ⋅

+ × ⋅ ⎡ ⋅ ⎤ − × ⋅ ⋅⎣ ⎦

+ × ⋅ ⋅

+ × ⋅ ⋅ ⋅

−

r r

r r r r

r r r r

r r

r r r

[ ]

⎤r

[ ][ ][ ][ ]

8

519 2

323

312

213

319

31 10 ( ) ( ) ( )

2.534 10 ( ) ( ) ( )

7.872 10 ( ) ( ) ( )

6.732 10 ( ) ( ) ( )

9.212 10 ( ) ( ) ( ) ( )

5.947 10 ( ) ( ) ( ) ( ) 4.

L L

L L

L L

L L

L L L

L L L

V IE

V IE

V IE

V EA

V EA

V EA

η

η

η

α

α η

α η

−

−

−

−

−

−

× ⋅ ⋅ ⋅

+ × ⋅ ⎡ ⋅ ⋅ ⎤⎣ ⎦

− × ⋅ ⋅ ⋅

+ × ⋅ ⋅ ⋅

− × ⋅ ⋅ ⋅ ⋅

− × ⋅ ⋅ ⋅ ⋅ +

r r r

r r r

r r r

r r r

r r r r

r r r r 9512

(2.26)

Figure 2.17 Experimental and calculated pKa values for the training set: N=268, MUE=1.03, RMSD=1.339, r²=0.841, r²cv=0.767. [Plot of calculated vs. experimental pKa.]

Tehan et al. performed separate regressions for each class of nitrogenous compound (i.e. amines, anilines, pyridines, etc.) and report regression statistics for each class. These values range from lows of r²=0.55, r²cv=0.54 for nitrogenous heterocycles (N=150) to highs of r²=0.94, r²cv=0.94 for a combined set of anilines and amines (N=132). The reported regression equations consist of a constant and a single term (electrophilic superdelocalizability)⁶⁵.

Using the standard statistical descriptor output of Parasurf gave a model with four terms:

( )( ) 2 1

1

4.599 10 1.281 10 3.065 10

1.200 10 12.451

a L

L

2f pK MEP MEP IE

EA

+− −

−−

= × ⋅ − × ⋅ − × ⋅

− × ⋅ +

r −

(2.27)

Figure 2.18 Experimental and calculated pKa values using statistical descriptors: N=268, MUE=1.32, RMSD=1.678, r²=0.769, r²cv=0.736. [Plot of calculated vs. experimental pKa.]

The regression statistics for the surface-integral model are only slightly better than those obtained for the statistical descriptor model above and, because it needs only four linear terms (versus 23 nonlinear terms), the statistical descriptor model may lend itself more easily to physical interpretation. Its major drawback is an increase in RMS error of 0.34 pKa units.

2.3.4 Boiling Point

The boiling point data set, taken from Syracuse Research Corporation’s PHYSPROP database⁶⁶, consists of 1642 compounds. The surface-integral model for this data set, using the marching-cube surface, has 17 terms:

( )( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( )

( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )

32 5 5

22 5

38

319

4

8

8.018 10 4.143 10 0.786 10

2.195 10 3.922 10

7.287 10

4.727 10

3.316 10

2.515 10

b L

L L

L L

L L

L L

L L

f T V V

V EA IE

IE

V IE EA

V IE EA

V IE

α

α

α

α

− − −

− −

−

−

−

−

= × ⋅ − × ⋅ + × ⋅⎡ ⎤⎣ ⎦

− × ⋅ ⋅ + × ⋅ ⋅ L⎡ ⎤⎣ ⎦

− × ⋅ ⋅⎡ ⎤⎣ ⎦

− × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦+ × ⋅ ⋅ ⋅

− × ⋅ ⋅ ⋅⎡⎣

r r r

r r r r

r r

r r r

r r r

r r r

( ) ( ) ( )

( ) ( ) ( )

r

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

( ) ( )

2

7

3210

320

527

5219

512 2

11

7.114 10

2.367 10

3.463 10

1.071 10

6.296 10

1.032 10

9.686 10

L L

L L

L L

L L

L L L

L L L

L L

V IE

V IE

V IE

V EA

IE EA

IE

EA

η

η

η

α

α

α η

α η

−

−

−

−

−

−

−

⎤⎦− × ⋅ ⋅ ⋅

⎡ ⎤+ × ⋅ ⋅ ⋅⎣ ⎦

+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤− × ⋅ ⋅ ⋅⎣ ⎦

⎡ ⎤− × ⋅ ⋅ ⋅⎣ ⎦

+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

− × ⋅ ⋅ ⋅

r r r

r r r

r r r

r r r

r r r

r r r

r r ( )

( ) ( ) ( ) ( )

52

2101.165 10 65.41

L

L L LV EA α η−

⎡ ⎤⎣ ⎦

+ × ⋅ ⋅ ⋅ ⋅ −⎡ ⎤⎣ ⎦

r

r r r r

(2.28)

Figure 2.19 Surface-integral model for boiling point using the marching-cube surface: N=1642, MUE=22.2, RMSD=33.9, r²=0.740, r²cv=0.574. [Plot of calculated vs. experimental Tb in °C.]

When 19 outliers are removed from the data set, the resulting regression model possesses

19 terms:

( )( ) ( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( ) ( ) ( )

322

2 3

3 325 10

5 5210 13 2

0.235 0.106 1.023 10

6.926 5.028 1.529 10

6.297 10 1.126 10

2.656 10 1.144 10

8.485

b L

L L L

L L

L L

f T V V V IE

V EA

V EA V EA

V IE

α α

η η

−

−

− −

− −

⎡ ⎤= ⋅ + ⋅ − × ⋅ −⎣ ⎦

+ ⋅ − ⋅ − × ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ − × ⋅ ⋅

L

⎡ ⎤⎣ ⎦⎣ ⎦

⎡ ⎤+ × ⋅ ⋅ + × ⋅ ⋅⎡ ⎤⎣ ⎦⎣ ⎦

+ ×

r r r r

r r r

r r r r

r r r r

( ) ( ) ( ) ( )

r

r

( ) ( ) ( )( ) ( ) ( )

( ) ( ) ( )( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )

536 6 2

3

7

320

3

324

16

10 3.106 10

2.101 10

9.788 10

3.008 10

3.397 10

1.313 10

1.131 10

L L L L

L L

L L

L L

L L

L L

L L L

EA

V IE EA

V IE

V IE

V IE

V IE

V EA

α α

η

η

α

α

α η

− −

−

−

−

−

−

−

⋅ ⋅ + × ⋅ ⋅⎡ ⎤ ⎡⎣ ⎦ ⎣

− × ⋅ ⋅ ⋅

− × ⋅ ⋅ ⋅

+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦+ × ⋅ ⋅ ⋅

⎡ ⎤− × ⋅ ⋅ ⋅⎣ ⎦

+ × ⋅ ⋅ ⋅ ⋅⎡

r r r r

r r r

r r r

r r r

r r r

r r r

r r r r3

87.97−⎤⎣ ⎦

η ⎤⎦

(2.29)

Figure 2.20 Surface-integral model for boiling point with outliers removed: N=1623, MUE=26.0, RMSD=35.1, r²=0.752, r²cv=0.728. [Plot of calculated vs. experimental Tb in °C.]

Both regression statistics are improved but, somewhat ironically, the prediction error increases with the removal of the outliers (MUE: +3.8 °C; RMSD: +1.2 °C). The linear regression model using spherical-harmonic hybrid coefficients for the boiling point data set has 29 terms:

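\[
\begin{aligned}
f(T_b)(\mathbf{r}) ={}& 47.52\,H_{R,1} + 13.05\,H_{R,3} - 8.54\,H_{R,4} - 35.30\,H_{R,6} + 31.27\,H_{R,7} \\
& + 1.39\,H_{\mathrm{MEP},2} + 1.61\,H_{\mathrm{MEP},5} + 1.88\,H_{\mathrm{MEP},7} + 2.86\,H_{\mathrm{MEP},8} \\
& + 16.39\,H_{\mathrm{MEP},13} + 19.35\,H_{\mathrm{MEP},15} - 17.13\,H_{\mathrm{MEP},19} + 0.44\,H_{\mathrm{IEL},1} \\
& + 0.35\,H_{\mathrm{IEL},2} + 0.26\,H_{\mathrm{IEL},4} + 0.19\,H_{\mathrm{IEL},6} - 0.46\,H_{\mathrm{IEL},7} - 0.27\,H_{\mathrm{IEL},8} \\
& - 0.87\,H_{\mathrm{IEL},9} - 0.50\,H_{\mathrm{IEL},10} - 1.57\,H_{\mathrm{IEL},17} - 0.49\,H_{\mathrm{EAL},1} - 0.49\,H_{\mathrm{EAL},2} \\
& - 1.03\,H_{\mathrm{EAL},7} - 2.10\,H_{\mathrm{EAL},9} + 3.08\,H_{\mathrm{EAL},15} + 61.41\,H_{\alpha,1} - 378.51\,H_{\alpha,7} - 143
\end{aligned}
\tag{2.30}
\]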

Figure 2.21 Regression model for boiling point using spherical-harmonic hybrid coefficients: MUE=24.6, RMSD=34.6, r²=0.779, r²cv=0.742. [Plot of calculated vs. experimental Tb in °C.]

When the set of 40 statistical descriptors is used, the regression model for the boiling point

data set has 16 terms:

( ) max

2 min

max

max 2

( ) 23.19 7523 11.45 0.3103 1.163

12.65 8.179 0.7882 0.2502

0.9949 1.251 0.7031 1.285

127.9 449.5 757.2 699.2

b D

L

L L L

L L

L

f T M

V V IE

EA EA EA

α

μ μ α

σ

χ

α α σ

+ +

−

= ⋅ − ⋅ + ⋅ − ⋅ + ⋅

+ ⋅ − ⋅ − ⋅ − ⋅

− ⋅ − ⋅ + ⋅Δ + ⋅

− ⋅ + ⋅ + ⋅ −

r W V

(2.31)

Figure 2.22 Regression model for boiling point using statistical descriptors: MUE=25.3, RMSD=36.8, r²=0.750, r²cv=0.733. [Plot of calculated vs. experimental Tb in °C.]

With an average RMS error of 35.1 °C for the set of property models, none of the individual models predicts well enough to be used in a practical application. Rather, they serve to highlight the limitations of the method and confirm the intuitive notion that, at or near the boiling point, where a fraction of the molecules is entering the gas phase, the collective set of local properties ceases to describe well the interactions between molecules in terms of molecular surface properties.
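The quoted average appears to be the arithmetic mean of the four RMSD values reported for the boiling point models in Figures 2.19 to 2.22; this reading is an inference from the reported numbers rather than something stated explicitly in the text:

```latex
\[
\overline{\mathrm{RMSD}}
  = \frac{33.9 + 35.1 + 34.6 + 36.8}{4}
  = \frac{140.4}{4}
  = 35.1\ ^{\circ}\mathrm{C}
\]
```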

2.3.5 Glass Transition Temperature

Glass transition temperature (Tg) is the temperature at which amorphous materials change from a somewhat crystalline phase to a liquid phase and is used as a measure of the thermal failure limit for organic light-emitting diodes⁶⁷⁻⁶⁹ (OLEDs). The surface-integral model using the marching-cube surface for the glass transition temperature was generated from a set of 73 OLED materials in Table A6 of Appendix A, assembled from the literature⁷⁰. The resulting regression equation has 4 terms:

( )( ) ( ) ( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( )

5 324 20

315

327

2.608 10 2.336 10

3.215 10

1.078 10 255.98

g L L

L L L

L LL

f T V V IE

IE

EA

η

α η

α η

− −

−

−

⎡ ⎤= − × ⋅ − × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦⎣ ⎦

+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

⎡ ⎤− × ⋅ ⋅ ⋅ +⎣ ⎦

r r r r

r r r

r r r

r

(2.32)

Figure 2.23 Experimental and calculated glass transition temperatures for the training set: N=73, MUE=16.8, RMSD=22.5, r²=0.690, r²cv=0.582. [Plot of calculated vs. experimental Tg in °C.]

Using lower F statistic values (for individual terms to enter and to leave the regression

equation) in the multiple regression results in an equation with more terms and a slightly

improved r2 value, but also yields a much less predictive model (r2cv approaches zero).
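The role of the F thresholds in stepwise regression can be sketched with the standard partial F-test for adding or removing a single term. This is a generic illustration and not Tsar's implementation; the default values 4.0 and 3.9 are those given in the methods of this chapter, and the function names are hypothetical.

```python
F_TO_ENTER = 4.0  # threshold quoted in the methods section
F_TO_LEAVE = 3.9  # threshold quoted in the methods section


def partial_f(rss_without: float, rss_with: float,
              n_samples: int, n_params_with: int) -> float:
    """Partial F statistic for one term.

    rss_without:   residual sum of squares of the model lacking the term
    rss_with:      residual sum of squares of the model containing it
    n_params_with: number of fitted parameters (including the intercept)
                   in the model containing the term
    """
    dof = n_samples - n_params_with
    return (rss_without - rss_with) / (rss_with / dof)


def should_enter(f_value: float, f_to_enter: float = F_TO_ENTER) -> bool:
    """A candidate term enters if its partial F reaches the threshold."""
    return f_value >= f_to_enter


def should_leave(f_value: float, f_to_leave: float = F_TO_LEAVE) -> bool:
    """A term already in the model leaves if its partial F falls below the threshold."""
    return f_value < f_to_leave
```

Lowering both thresholds lets terms with a smaller individual contribution into the equation, which raises r² on the training data but, as noted above, can sharply reduce r²cv.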

The COSMO-optimized data set (using a bulk dielectric constant of 10.0, the value used for n-octanol) yields a model with 12 terms:


( )( ) ( ) ( ) ( )

( ) ( ) ( )

( ) ( ) ( ) ( )

( ) ( )

( ) ( ) ( )

322 3

536 3 2

324 2

35

320

0.286 5.091 10 9.633 10

6.251 10 3.745 10 5.833

7.761 10 2.385 10

3.135 10

5.192 10

2.52

g

L L

L L

L

L L

f T V V V

V EA

V EA V

V

V IE EA

α

α

α

− −

− −

− −

−

−

⎡ ⎤= − ⋅ − × ⋅ + × ⋅ ⎣ ⎦

+ × ⋅ − × ⋅ + ⋅⎡ ⎤ ⎡ ⎤⎣ ⎦ ⎣ ⎦

⎡ ⎤− × ⋅ ⋅ − × ⋅ ⋅⎣ ⎦

− × ⋅ ⋅⎡ ⎤⎣ ⎦

+ × ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

+

r r r

r r

r r r r

r r

r r r

( ) ( ) ( )

( ) ( ) ( )

313

310

1 10

1.843 10 252.25

L L

L L

V IE

V EA

α

α

−

−

× ⋅ ⋅ ⋅⎡ ⎤⎣ ⎦

+ × ⋅ ⋅ ⋅ +⎡ ⎤⎣ ⎦

r r r

r r r

r

r

(2.33)


Figure 2.24 Surface-integral model for glass transition temperature using COSMO-optimized structures: MUE=15.3, RMSD=18.7, r²=0.779, r²cv=0.491. [Plot of calculated Tg in °C.]

The predictivity of this model is comparable to that of the previous one, with an improvement in RMS error of 3.8 °C (22.5 °C versus 18.7 °C). The same data set (not using COSMO-optimized structures) was used to generate a regression model using the statistical descriptors, which yielded a 9-term equation:

( )( ) max

min max 2

1.253 6.112 7.938 4.544 1.221

0.334 6.220 0.125 6160 A840.6

g L

L L EA EA

f T V V V

IE EA

α

σ δ+ −

+−

= ⋅ − ⋅ − ⋅ + ⋅ − ⋅

+ ⋅ + ⋅ + ⋅ − ⋅+

r IE

(2.34)


Figure 2.25 Regression model for glass transition temperature using statistical descriptors: MUE=12.7, RMSD=15.7, r²=0.844, r²cv=0.521. [Plot of calculated Tg in °C.]

This model possesses better regression statistics, with a significantly improved prediction

error, compared with that of the surface-integral model. The regression model using the

spherical-harmonic hybrid coefficients yields an equation with only two terms that

predicts very poorly (MUE = 22.2, RMSD = 28.5):

\[
f(T_g)(\mathbf{r}) = 144.05\,H_{R,16} + 2.982\,H_{\mathrm{MEP},4} + 285.1 \tag{2.35}
\]

Figure 2.26 Regression model for glass transition temperature using hybridization coefficients: MUE=22.2, RMSD=28.5, r²=0.501, r²cv=0.224. [Plot of calculated vs. experimental temperature in °C.]

The best-predicting property model here uses the statistical descriptor set and has an RMS error of 15.7 °C, which represents roughly 10% of the range of the Tg values in the data set. This is rather large (the best boiling point model predicts within ~6% of its range). But here again, the local properties are being used to predict a phase change, the point at which the forces dictating the arrangement of molecules cease to apply in the same manner.

2.3.6 Aqueous Solubility

The aqueous solubility data set in Table 1.6 of Appendix A is a small subset of 589 compounds taken from The University of Arizona’s AQUASOL database⁷¹. Because the solubility values were reported in some 100 different units, all values were converted to standard molarity units (moles/liter, M) and the base-10 logarithm was taken as the target value (logS).
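The conversion to logS can be sketched as follows. Only a handful of mass-per-volume units are shown; the unit labels, the lookup table, and the function name are illustrative assumptions and do not reproduce the actual conversion script or the unit list of the AQUASOL data.

```python
import math

# Grams of solute per litre represented by one unit of each label
# (hypothetical labels chosen for this sketch).
GRAMS_PER_LITRE = {
    "g/L": 1.0,
    "mg/L": 1e-3,
    "mg/mL": 1.0,
    "g/100mL": 10.0,
}


def log_s(value: float, unit: str, molar_mass: float) -> float:
    """Return log10 of the solubility in mol/L.

    value:      solubility in the given unit
    unit:       "mol/L" or one of the keys of GRAMS_PER_LITRE
    molar_mass: molar mass of the solute in g/mol
    """
    if unit == "mol/L":
        molarity = value
    else:
        molarity = value * GRAMS_PER_LITRE[unit] / molar_mass
    return math.log10(molarity)
```

For example, a solubility of 1 g per 100 mL for a compound with a molar mass of 100 g/mol corresponds to 0.1 mol/L, that is, logS = -1.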


The regression equation for the surface-integral model derived using the marching-cube

surface consists of 11 terms:

[Equation 2.36: 11-term surface-integral regression equation for logS in the local properties V, IEL, EAL, αL, and ηL; the typeset equation did not survive text extraction.]

[Plot axes: Experimental logS (H2O) and Calculated logS (H2O), range -8 to 2.]

Figure 2.27 Surface-integral model for logS using the marching cube surface: N=589, MUE=0.844, RMSD=1.14, r2=0.578, r2cv=0.411.



The regression model using the spherical-harmonic hybrid coefficients yielded an 18-term

equation:

[Equation 2.37: 18-term regression equation for logS; the typeset equation did not survive text extraction.]


[Plot axis: Calculated logS (H2O), range -8 to 2.]

Figure 2.28 Regression model for logS using spherical-harmonic hybrid coefficients: N=589, MUE=0.960, RMSD=1.27, r2=0.469, r2cv=0.333.


The regression model using the statistical descriptors has 14 terms:

[Equation 2.38: 14-term regression equation for logS in the statistical descriptors; the typeset equation did not survive text extraction.]


[Plot axis: Calculated logS (H2O), range -8 to 2.]

Figure 2.29 Regression model for logS using statistical descriptors: N=589, MUE=0.791, RMSD=1.07, r2=0.624, r2cv=0.528.

As in the case of glass transition temperature, the statistical descriptor set again gives the best regression model in terms of RMS error (~1 logS unit). The regression statistics, however, are not encouraging. Compared with the commercially available product ACD/Labs’ Solubility DB72, the RMS error of our best model is only as good as the upper end of the range of RMS errors found in the prediction of a set of test compounds73,74.



2.4 Discussion

As described in Ehresmann, et al.34, the construction of the surface-integral models

involves two approximations: that the target properties may be treated using a sum of local

surface values and that gas-phase electron densities from semiempirical calculations can

be used to represent properties that, in bulk, depend on the presence of a polar medium.

The free energy of solvation models themselves give reliable estimates, although they are

not as accurate as the most reliable methods available63,75. It should be noted that this

surface-integral approach relies on the gas-phase electron densities and optimized

structures from semiempirical calculations and can only include solute polarization

implicitly by the local polarizability for the molecular surface. It was also found that the

use of COSMO-optimized structures by this method results in models of slightly lower

predictive power.

In order to evaluate the predictivity of our model, the logP data set was first

predicted using KOWWIN 1.67, included in the U.S. Environmental Protection Agency’s

property estimation package, EPISUITE 3.11. KOWWIN is a Windows implementation

of Syracuse Research Corporation’s LogKow10, which is an atom/fragment-based

method for estimating logP that was trained with 2,410 compounds, using 175 fragment

groups (r2 = 0.98). The statistics reported for a 13,058 compound validation set are as

follows: a standard deviation of 0.436, an MUE of 0.316 and an r2 of 0.95. The SD files

for our logP training set were converted to their corresponding SMILES strings and run as

a batch job. This test set yielded a mean signed error of 0.396, a mean unsigned error

(MUE) of 1.172, and a root mean square (RMS) error of 2.046. As a second trial and a

rough measure of the comparative prediction accuracy of the two methods, the logP values

of a small set of 17 recognizable biological structures taken from Exploring QSAR59 were

predicted using both methods (Table 2.3). The KOWWIN model yielded a mean

unsigned error of 1.12 and an RMS error of 1.61. Our local property model performed

slightly worse, with a mean unsigned error of 1.40 and an RMS error of 1.82.

From previous regressions, a tendency of the models to predict poorly for logP

values at or below zero logP units was observed and attributed to an under-representation

by the data set of compounds much more soluble in water than in n-octanol (since


compounds with logP values at the other end of the solubility spectrum also presented a

similar prediction error).

Table 2.3 Results of KOWWIN and Parasurf logP predictions.

No. Compound Exp. KOWWIN Parasurf

1 estradiol 4.01 1.55 3.02

2 imipramine 4.80 5.01 4.78

3 pentazocine 3.31 5.03 4.65

4 rifampin 1.32 2.08 2.15

5 vincristine 2.57 3.11 3.83

6 digitoxin 1.68 2.04 3.61

7 terfenadine 3.22 7.62 7.87

8 sufentanil 3.24 3.62 3.22

9 colchicine 1.3 1.86 1.64

10 tetracycline -1.44 -0.18 0.57

11 hexetidine 2.00 5.26 5.30

12 Δ9-tetrahydrocannabinol 6.97 7.6 5.84

13 yohimbine 2.73 2.11 2.86

14 quinine 2.64 3.29 4.52

15 acyclovir -1.56 -2.41 0.20

16 diazepam 2.99 2.70 3.83

17 codeine 1.14 1.28 2.54
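The error statistics quoted for this 17-compound comparison can be recomputed directly from Table 2.3. The sketch below is only an illustration: the helper name is hypothetical, and the sign convention (predicted minus experimental) is an assumption, since the text does not state one.

```python
import math

# (experimental, KOWWIN, Parasurf) logP values transcribed from Table 2.3.
TABLE_2_3 = [
    (4.01, 1.55, 3.02), (4.80, 5.01, 4.78), (3.31, 5.03, 4.65),
    (1.32, 2.08, 2.15), (2.57, 3.11, 3.83), (1.68, 2.04, 3.61),
    (3.22, 7.62, 7.87), (3.24, 3.62, 3.22), (1.30, 1.86, 1.64),
    (-1.44, -0.18, 0.57), (2.00, 5.26, 5.30), (6.97, 7.60, 5.84),
    (2.73, 2.11, 2.86), (2.64, 3.29, 4.52), (-1.56, -2.41, 0.20),
    (2.99, 2.70, 3.83), (1.14, 1.28, 2.54),
]


def error_stats(pairs):
    """Return (mean signed error, mean unsigned error, RMS error)."""
    # Assumed convention: error = predicted - experimental.
    errors = [predicted - experimental for experimental, predicted in pairs]
    n = len(errors)
    mean_signed = sum(errors) / n
    mean_unsigned = sum(abs(e) for e in errors) / n
    rms = math.sqrt(sum(e * e for e in errors) / n)
    return mean_signed, mean_unsigned, rms


kowwin = error_stats([(exp, kow) for exp, kow, _ in TABLE_2_3])
parasurf = error_stats([(exp, par) for exp, _, par in TABLE_2_3])
```

With the tabulated values, the mean unsigned and RMS errors round to 1.12 and 1.61 for KOWWIN and to 1.40 and 1.82 for Parasurf, in agreement with the figures given in the text.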

The principal disadvantage of the local-property/surface-integral method lies in treating small regions of the molecular surface as having well-defined local values in the units of the property. The projected property value is not a true measure of the actual property value at a given point, but rather an index of it that describes the variations in the property over the whole of the surface. This abstraction also applies to the

local properties themselves. It has been noted24 that EAL does not, in fact, represent a real

electron affinity, even within the definition of Koopmans’ theorem, but rather is a local

indicator of electron-accepting regions on the molecular surface - regions that are likely to

be the site of nucleophilic attack. The local electronegativity also does not correspond to a

real electronegativity based on chemical potential37. Another disadvantage of the method



is the lack of an obvious physical interpretation of the regression equation. The surface-

integral models are nonlinear relationships between a physical property and a set of

surface properties (or indices) that require an understanding of the terms within their own

context or a method of relating local property variations to chemical structural features.

Figure 2.30 Surface-integral-modeled logP surface of decylsulfonic acid.

These drawbacks notwithstanding, the models themselves approach the predictive

ability of accepted, commonly-used prediction methods, with the added benefit of

allowing the researcher to visualize the physical property surface for a given structure, as

well as use the surface-mapped physical property as a local property in itself. The quality of prediction of actual physical property values from surface-integral-derived physical properties would, however, be dubious at best. Rather, these “extended” properties are appropriate only as surface descriptors in a statistical analysis or a classification scheme.


2.5 Conclusions

The quantitative structure-property models presented here represent a shift towards

a completely wave-function, molecular-orbital-based approach to QSAR/QSPR prediction

using surface-integral models. One particularly attractive feature of the surface-integral

technique is the ability to use the predicted property (or activity) as a local property itself.

The property in question is defined as a local property and projected onto an isodensity

surface for visualization or for further statistical analysis. Thus, not only are the surface-

integral models QSPR/QSAR prediction methods, but they are also indicators of the

molecular surface features that contribute to the particular property. With further

investigation into the physical meanings of surface-integral models in terms of the local

properties of which they are comprised, any physical property or biological activity that is

a function of molecular surface interactions can be predicted and visualized by this

method.


Chapter 3

Support Vector Classification of

Phospholipidosis-Inducing Drugs

3.1 Introduction

3.1.1 Phospholipidosis

Drug-induced phospholipidosis is a physiopathological condition characterized by

the appearance of microscopic subcellular structures, called lamellar bodies or lysosomal inclusion bodies, that contain primarily large deposits of undegraded phospholipids. The

lysosomal bodies aggregate inside the cells of the lungs, liver, kidneys, corneas, and brain

and their presence often coincides with adverse clinical effects such as inflammatory

reactions and fibrosis, although the relationship is as yet unexplained. Indeed, the onset of

phospholipidosis may or may not be associated with a presentation of adverse

symptoms76. It is nevertheless well-documented that drug-induced phospholipidosis does

affect cellular function by impairing lysosomal protein degradation, membrane fusion, and

pino- and endo-cytosis77,78. For this reason it is desirable for pharmaceutical companies to develop a screen for phospholipidosis induction.


The most common feature of the drugs that induce phospholipidosis is that they are

both cationic and amphiphilic: they have a positively-charged water-soluble portion and a

hydrophobic portion. Referred to, therefore, as cationic amphiphilic drugs (CAD’s), these

compounds are found accumulated inside the lysosomal bodies along with aggregated

phospholipids. It is thought that the CAD’s pass into the lysosomal compartment where

the pH is low and become trapped. By virtue of being weakly basic, they are protonated

so that they cannot pass back through the phospholipid bilayer79. This may also be a

defense mechanism of the cell to protect itself against exogenous xenobiotics and

metabolites.

While the molecular mechanism of phospholipidosis induction is not presently

known, there are two basic hypotheses. The first involves the CAD’s binding directly to

phospholipids, resulting in indigestible complexes that are stored in the lysosomal lamellar

bodies79. The other hypothesis takes note of the concomitant inhibition of phospholipase

activity in the lamellar bodies. The CAD’s can inhibit phospholipases either by binding

directly to them, or if the concentration of the CAD’s becomes high enough, they may

effectively raise the pH such that phospholipase function becomes impaired, resulting in

an accumulation of phospholipids that cannot be degraded78.

Working from either of these hypotheses it should be possible to elucidate some

molecular surface features common among the drugs that promote the induction of

phospholipidosis in order to establish the relationship between biological activity and

structure (or surface properties). The challenge comes in that, while most drugs that

induce phospholipidosis are cationic and amphiphilic, not all cationic amphiphilic

compounds induce phospholipidosis. Thus, the most defining characteristics of the drug

class may not be useful in classifying its activity. In fact, these properties may completely

overshadow other aspects of the molecular surface more pertinent to the function of

CAD’s as phospholipase inhibitors or as sites of phospholipid complexation/aggregation.

Given that the octanol-water partition coefficient (P, or in this case logP), itself an index

of hydrophobicity32, has been shown to be additive in terms of the solvent-accessible

surface80, local properties such as the ones described in the previous chapter (Equations

2.1 – 2.6) that were used in the logP surface-integral model should be particularly useful

in predicting phospholipidosis induction provided there is a sufficiently rigorous statistical

method available to classify compounds based on their molecular surface property values.


3.1.2 Phospholipidosis Models

Our recent work with phospholipidosis prediction was performed using a 144-

compound data set of structures with a positive assay for phospholipidosis induction as

determined by transmission electron microscopy81, which was provided by Anne Tilloy-

Ellul (Pfizer Global R&D, Amboise Laboratories, France) and Marcel de Groot (Pfizer

Global R&D, Sandwich Laboratories, UK). This data set, shown in Table 3.1, was

divided in half, with one portion to be used as a training set, and the other as a test set by

sorting the provided set by the Parasurf-calculated free energies of solvation in octanol

(ΔGsolv(oct.)) and placing every other compound into either the training set or the test set.

Thus, the training set of 72 structures consists of 44 positives and 28 negatives, and the

test set of 72 compounds consists of 42 positives and 30 negatives. The two compounds

carbon tetrachloride and valproic acid were duplicates in the complete data set and are in

both the training set and test set. The basic approach undertaken to classify these data

according to the likelihood of inducing phospholipidosis in assay involved the use of two

statistical methods, each applied to two types of descriptor sets of local properties. The

first set of property descriptors used were the statistical descriptors described in the

previous chapter and shown in Table 2.2. The second used consisted of surface

autocorrelation indices, which had recently been implemented in Parasurf ’06 and will be

described briefly in the following section.
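The alternating assignment described above (sort by a calculated property, then place every other compound in the training or the test set) can be sketched as follows. The function name and the example records are hypothetical; the values are not taken from the Pfizer data set.

```python
def alternating_split(records, key):
    """Sort records by `key`, then deal them alternately into two sets."""
    ordered = sorted(records, key=key)
    training = ordered[0::2]  # 1st, 3rd, 5th, ... after sorting
    test = ordered[1::2]      # 2nd, 4th, 6th, ... after sorting
    return training, test


# Hypothetical (name, solvation free energy in octanol) records.
example = [("cmpd_a", -3.0), ("cmpd_b", -9.5), ("cmpd_c", -6.2), ("cmpd_d", -1.1)]
training_set, test_set = alternating_split(example, key=lambda rec: rec[1])
```

Splitting a sorted list in this way gives both halves a similar spread of the sorting property, which is presumably why it was preferred here to a purely random split.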

Table 3.1 The phospholipidosis training set (l) and test set (r).

Training Set  Inducer    Test Set  Inducer
17-a-ethynylestradiol  Negative    3-methylcholanthrene  Negative
1-amino-4-octylpiperazine  Positive    AC-3579  Positive
1-chloro-10,11-dehydroamitriptyline  Positive    acetaminophen  Negative
1-chloroamitriptyline  Positive    amikacin  Positive
6-hydroxydopamine  Positive    amiodarone  Positive
abacavir  Negative    amitriptyline  Positive
amineptine  Negative    anticoman  Negative
amodiaquine  Positive    aricept  Negative
azaserine  Negative    AY-25329  Negative
azithromycin  Positive    AY-9944  Positive
bilirubin  Positive    bicalutamide  Negative
brompheniramine  Positive    boxidine  Positive
caffeine  Negative    bupropion  Negative
carbamazepine  Negative    carbon tetrachloride  Negative
carbon tetrachloride  Negative    ceftazidime  Negative
ceftazidime  Negative    cephaloridine  Positive
chloroquine  Positive    chlorcyclizine  Positive
chlorpromazine  Positive    chloroform  Negative
chlortetracycline  Negative    chlorphentermine  Positive
ciprofibrate  Negative    citalopram  Positive
clociguanil  Positive    clindamycin  Positive
clofibrate  Negative    clomipramine  Positive
colchicine  Negative    clozapine  Positive
cyclizine  Positive    cocaine  Positive
cyproterone acetate  Negative    coralgil  Positive
desipramine  Positive    dantrolene  Negative
(d-)H-4,4-bis-diethylaminoethoxy-diethylphenylethane  Positive    demeclocycline  Negative
dibucaine  Positive    desferal  Negative
erythromycin  Positive    dibekacin  Positive
etoposide  Negative    diclofenac  Negative
felbamate  Negative    diflunisal  Negative
fenofibrate  Negative    di-isobutamide  Positive
fluoxetine  Positive    doxapram  Negative
galactosamine  Negative    doxycycline  Negative
gemfibrozil  Negative    emetine  Positive
gentamicin  Positive    ethyl fluclozepate  Positive
hydroxyzine  Positive    famotidine  Negative
hypoglycin-A  Negative    fenfluramine  Positive
iprindole  Positive    flutamide  Negative
ketoconazole  Positive    homochlorcyclizine  Positive
lysergide  Positive    hydrazine  Negative
methotrexate  Negative    hydroxyurea  Negative
methyldopa  Negative    IA3  Positive
norchlorcyclizine  Positive    imipramine  Positive
nortriptyline  Positive    indoramin  Positive
phenacetin  Positive    (l)-ethionine  Negative
pheniramine  Positive    maprotiline  Positive
phenobarbital  Negative    meclizine  Positive
phentermine  Positive    mesoridazine  Positive
physostigmine  Negative    metformin  Negative
piroxicam  Negative    methadone  Negative
quinine  Positive    methapyrilene  Negative
R-800  Positive    mianserin  Positive
RMI10393  Positive    netilmicin  Positive
SC-45864  Probable    noxiptiline  Positive
stilbamidine  Positive    paraquat  Positive
sulindac  Negative    perhexiline  Positive
suramin  Positive    procaine  Negative
tamoxifen  Positive    promethazine  Positive
temozolomide  Negative    propranolol  Positive
tetracaine  Positive    quinacrine  Positive
thioacetamide  Negative    quinidine  Positive
tilorone  Positive    rolitetracycline  Negative
tobramycin  Positive    SDZ_200-125  Positive
tocainide  Positive    SKF-14336-D  Positive
trimeprazine  Positive    stavudine  Negative
trimethoprim sulfamethoxazole  Positive    tacrine  Negative
trospectomycin sulfate  Positive    trifluperazine  Positive
valproic_acid  Negative    triparanol  Positive
WY-14643  Negative    tunicamycin  Positive
zidovudine  Negative    valproic_acid  Negative
zileuton  Negative
zimelidine  Positive



3.1.3 Surface Autocorrelations

Surface autocorrelations are cross-correlations of various surface property values

that have been shifted by some distance on the molecular surface. Introduced by

Gasteiger82,83 as descriptors for use in molecular binding studies, they are used to discover

periodic patterns or fundamental harmonics that may not be apparent by inspection. There

are six general autocorrelation functions implemented in Parasurf ’06 that describe local

molecular electrostatic potential (3 functions), shape (1 function), local ionization

potential (1 function), and local electron affinity (1 function) cross-correlations. These are

used in the general vector equation:

\[ A(R) = \frac{1}{n_{tri}} \sum_{i=1}^{n_{tri}} \sum_{j=i+1}^{n_{tri}} \omega_{ij}\, e^{-\sigma \left(R - r_{ij}\right)^{2}} \tag{3.1} \]

where rij is the distance between surface points i and j and ωij is one of the autocorrelation functions. Each autocorrelation vector has 128 elements, starting at a radius of 2.5 Å with increments of 0.06 Å.

The three MEP functions that describe the three possible sets of cross products are

defined in the following table.

Table 3.2 Molecular electrostatic functions used in surface autocorrelations.

Function                                Definition
Plus-plus MEP autocorrelation, VPP      ωij = Vi × Vj where (Vi > 0 and Vj > 0); ωij = 0.0 where (Vi < 0 or Vj < 0)
Minus-minus MEP autocorrelation, VMM    ωij = Vi × Vj where (Vi < 0 and Vj < 0)
Plus-minus MEP autocorrelation, VPM     ωij = −Vi × Vj where (Vi × Vj < 0); ωij = 0.0 where (Vi × Vj > 0)
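To make the construction of an autocorrelation vector concrete, the following minimal sketch evaluates Equation 3.1 with the three MEP functions of Table 3.2. It is not the Parasurf implementation. The function names, the choice of surface points as plain coordinate tuples, the value of σ, and the assumption that the minus-minus function is zero outside its stated condition are all illustrative assumptions.

```python
import math


def omega_pp(v_i, v_j):
    """Plus-plus MEP product: non-zero only when both potentials are positive."""
    return v_i * v_j if v_i > 0 and v_j > 0 else 0.0


def omega_mm(v_i, v_j):
    """Minus-minus MEP product (assumed to be zero otherwise)."""
    return v_i * v_j if v_i < 0 and v_j < 0 else 0.0


def omega_pm(v_i, v_j):
    """Plus-minus MEP product: non-zero only when the signs differ."""
    return -v_i * v_j if v_i * v_j < 0 else 0.0


def autocorrelation(points, values, omega, sigma,
                    r_start=2.5, r_step=0.06, n_bins=128):
    """Evaluate A(R) on the grid of 128 radii described in the text."""
    n = len(points)
    vector = []
    for k in range(n_bins):
        radius = r_start + k * r_step
        total = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                r_ij = math.dist(points[i], points[j])
                weight = math.exp(-sigma * (radius - r_ij) ** 2)
                total += omega(values[i], values[j]) * weight
        vector.append(total / n)  # normalization by the number of points
    return vector
```

The double loop scales quadratically with the number of surface points, so a practical implementation would vectorize or bin the pair distances; the loop form is kept here because it mirrors the double sum of the equation.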


Similarity indices may be calculated for data sets by comparison with a reference structure

by

\[ S = \frac{1}{N} \sum_{i=1}^{N} \frac{2 \cdot \min\left(A_1(R_i),\, A_2(R_i)\right)}{A_1(R_i) + A_2(R_i)} \tag{3.2} \]

for \( \sum_{i=1}^{N} \left[ A_1(R_i) + A_2(R_i) \right] > 0 \)

where A1(Ri) is the value of the autocorrelation function for molecule 1 at a distance Ri and

N is the number of points within the defined range of R for which the sum is non-zero.

The similarity indices are calculated for the full range of each of the autocorrelation functions, as well as for each of the four quarters of the distance range for each of the functions.
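A minimal sketch of the similarity index of Equation 3.2 is given below. It follows the statement in the text that N counts only the points at which the sum of the two autocorrelation values is non-zero, and it additionally assumes that the autocorrelation values are non-negative; the function name is hypothetical.

```python
def similarity_index(a1, a2):
    """Mean of 2*min(A1, A2) / (A1 + A2) over points with a non-zero sum."""
    terms = [2.0 * min(x, y) / (x + y)
             for x, y in zip(a1, a2)
             if x + y > 0]  # skip points where both values vanish
    if not terms:
        raise ValueError("no distance with a non-zero autocorrelation sum")
    return sum(terms) / len(terms)
```

Identical vectors give an index of 1, and a pair of values such as 1.0 and 3.0 contributes 2 × 1.0 / 4.0 = 0.5. The same function can be applied to a slice of the two vectors to obtain the index for one quarter of the distance range.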

3.1.4 Statistical Methods

The principal statistical method used to predict phospholipidosis induction from

molecular surface descriptors was the Support Vector Machine84-86 (SVM), with a

multivariate adaptive regression splines87,88 (MARS) method used to compare the

prediction accuracy of the best SVM models. In the case of a small difference in

prediction accuracy between the training set and the test set for the support vector

machines and a relatively large difference in prediction accuracy for the regression splines

models, over-fitting of the data by the SVM’s would be assumed.

3.1.4.1 Support Vector Machines

The Support Vector Machine (SVM) is a machine-learning technique for

classification that involves a non-linear mapping of the data into a high-dimensional feature space, followed by structural risk minimization to find the separating hyperplane with the largest margin between the transformed data. These learning machines have been shown

to classify with an accuracy at least as good as the various neural net methods85.



[Diagram labels: margin, hyperplanes H1 and H2, normal vector w, origin, offset −b/|w|.]

Figure 3.1 Linear maximum-margin hyperplane with circled support vectors.

The method for solving the maximum-margin hyperplane (Figure 3.1) problem involves a minimization of the Lagrangian formulation

\[ L_P \equiv \tfrac{1}{2} \lVert \mathbf{w} \rVert^{2} - \sum_{i=1}^{l} \alpha_i y_i \left( \mathbf{x}_i \cdot \mathbf{w} + b \right) + \sum_{i=1}^{l} \alpha_i \tag{3.3} \]

where w is a vector normal to the hyperplane, |b|/‖w‖ is the perpendicular distance from the hyperplane to the origin, and αi are Lagrange multipliers with i = 1, …, l, one for each input vector, subject to the constraint yi(xi·w + b) − 1 ≥ 0, ∀i. In the convex quadratic programming solution, LP is minimized with respect to w and b and maximized with respect to the αi. The training data then appear only as dot products (represented here by the vector pair, xi and xj) after requiring that the gradient of LP with respect to w and b vanish, which gives:

\[ \mathbf{w} = \sum_{i} \alpha_i y_i \mathbf{x}_i \tag{3.4} \]

and

\[ \sum_{i} \alpha_i y_i = 0 \tag{3.5} \]

in the following “dual formulation” equation:

\[ L_D = \sum_{i} \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j \tag{3.6} \]


In the nonlinear case this is a difficult solution until one employs the “kernel trick”84 to express the dot products and the mappings Φ into the Hilbert space:

\[ \Phi : \mathbb{R}^{d} \rightarrow \mathcal{H} \tag{3.7} \]

in terms of some kernel function of the form

\[ K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j) \tag{3.8} \]

By this method, wherein the dot products are defined in the new space as a single function, it becomes unnecessary to determine Φ explicitly. This is especially useful in the case of the commonly-used radial basis function:

\[ K(\mathbf{x}_i, \mathbf{x}_j) = e^{-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^{2}} \tag{3.9} \]

(for γ>0), which renders H infinite-dimensional.
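The practical consequence of the kernel trick is that only kernel values between pairs of input vectors are ever needed. The short sketch below evaluates the radial basis function of Equation 3.9 and assembles the matrix of pairwise kernel values; the function names are illustrative and are not part of libsvm.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """Radial basis function kernel exp(-gamma * ||x_i - x_j||^2)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


def gram_matrix(vectors, gamma):
    """Matrix of kernel values K(x_i, x_j) for all pairs of input vectors."""
    return [[rbf_kernel(u, v, gamma) for v in vectors] for u in vectors]
```

Every diagonal element of this matrix equals 1, and the off-diagonal elements decay towards 0 as the descriptor vectors move apart, at a rate set by γ.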

In C-SVC, or C-support vector classification86, the construction of the optimal hyperplane involves the minimization of the functional

\[ \min_{\mathbf{w},\, b,\, \xi} \left\{ \tfrac{1}{2} \mathbf{w}^{T} \mathbf{w} + C \sum_{i=1}^{l} \xi_i \right\} \tag{3.10} \]

subject to yi(wTφ(xi) + b) ≥ 1 − ξi and ξi ≥ 0, where w again represents the hyperplane vector, b is a bias term, and the summation term is the sum of the deviations, ξi, of the training errors; minimizing the wTw term maximizes the margin for the correctly classified vectors. If the training data can be separated without errors, then the constructed hyperplane corresponds to the optimal margin hyperplane86. And by varying the value of the C term in the expression, one can vary the trade-off between the complexity of the decision surface and the frequency of error, in effect “tuning” the SVM’s ability to generalize.

Another algorithm for constructing the hyperplane is the ν-SVC method89, which uses a parameter, ν, to set an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. This method minimizes the functional

\[ \min_{\mathbf{w},\, b,\, \xi,\, \rho} \left\{ \tfrac{1}{2} \mathbf{w}^{T} \mathbf{w} - \nu \rho + \frac{1}{l} \sum_{i=1}^{l} \xi_i \right\} \tag{3.11} \]

subject to yi(wTφ(xi) + b) ≥ ρ − ξi and ξi ≥ 0, ρ ≥ 0, where l is the number of input vectors, b is a bias term, and φ(xi) is the feature space mapping.

The expectation value of the probability of error is bounded by the ratio of the

expectation value of the number of support vectors to the number of training vectors,

expressed as:

\[ \Pr(\text{error}) \leq \frac{n_{sv}}{n_{tv}} \tag{3.12} \]

where nsv is the number of support vectors and ntv is the number of training vectors. So, if

the optimal hyperplane can be constructed from a small number of support vectors relative

to the training set size, then the ability of the SVM to generalize will be high. This is true

even when the feature space is infinite dimensional, since the complexity of the learning

algorithm does not depend on the dimensionality of the feature space, but on the number

of support vectors86.
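As a rough numerical illustration of Equation 3.12, the ratio can be evaluated for a model of the size reported later in Section 3.3.1 (43 support vectors for a 72-compound training set). The helper name is hypothetical, and because the bound is stated for expectation values, a single model gives only an indicative figure.

```python
def support_vector_ratio(n_support_vectors, n_training_vectors):
    """Ratio n_sv / n_tv appearing on the right-hand side of Equation 3.12."""
    if n_training_vectors <= 0:
        raise ValueError("the training set must not be empty")
    return n_support_vectors / n_training_vectors


ratio = support_vector_ratio(43, 72)  # about 0.60
```

A ratio near 0.6 is a loose bound, which is consistent with the remark above that good generalization is expected when the number of support vectors is small relative to the size of the training set.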

3.1.4.2 Multivariate Adaptive Regression Splines

The second classification method applied to the surface property data was the

multivariate adaptive regression splines technique87, which generates a prediction model

by selecting a weighted sum of basis functions from the set of basis functions spanning all

descriptor values, then adding basis functions to the predictive equation on a least squares

goodness-of-fit criterion, according to the general equation:

\[ f(\mathbf{x}) = \sum_{m=0}^{M} \alpha_m B_m(\mathbf{x}) \tag{3.13} \]

where Bm(x) represents the set of basis functions and αm is a real number indicating the contribution of the basis function:

\[ B_m(\mathbf{x}) = \prod_{k=1}^{K_m} b_{km}\left( x_{v(k,m)} \right) \tag{3.14} \]

where v(k,m) is an index of the factor used as the argument of bkm.

The basis functions here are categorical two-sided truncated linear functions used to

approximate the response to predictor variance:


\[ b^{+}_{km}\left( x_{v(k,m)} \right) = I\left( x_{v(k,m)} \in A_{km} \right) \tag{3.15} \]

\[ b^{-}_{km}\left( x_{v(k,m)} \right) = I\left( x_{v(k,m)} \notin A_{km} \right) \tag{3.16} \]

with I(xv(k,m)) representing an indicator function having a value of one if true and zero if

false. Akm is a subset of the possible values of xv(k,m).
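Equations 3.13 to 3.16 can be read as a recipe for evaluating a fitted model: each term is a coefficient multiplied by a product of indicator factors. The sketch below evaluates such a model but does not perform the stepwise selection of basis functions. The data layout (a list of coefficient and factor pairs) and all names are hypothetical.

```python
def indicator_basis(subset, inside=True):
    """Return b+ (inside=True) or b- (inside=False) for the subset A_km."""
    def basis(value):
        return 1.0 if (value in subset) == inside else 0.0
    return basis


def mars_predict(x, terms):
    """Evaluate f(x) = sum_m alpha_m * prod_k b_km(x[v(k, m)]).

    `terms` is a list of (alpha_m, factors); each factor is a pair
    (predictor index v(k, m), basis function b_km). An empty factor list
    gives the constant basis function B_0 = 1.
    """
    total = 0.0
    for alpha, factors in terms:
        product = 1.0
        for index, basis in factors:
            product *= basis(x[index])
        total += alpha * product
    return total


# Hypothetical three-term model on two categorical predictors.
model = [
    (0.5, []),
    (2.0, [(0, indicator_basis({"a", "b"}))]),
    (-1.0, [(0, indicator_basis({"a"}, inside=False)),
            (1, indicator_basis({1}))]),
]
```

For the input `("a", 1)` the second term is active and the third is not, giving 0.5 + 2.0 = 2.5; for `("c", 1)` only the third term is active, giving 0.5 − 1.0 = −0.5.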

Both of the multivariate analysis techniques described (support vector machines and

multivariate adaptive regression splines) are readily applied to problems of high

dimensionality, where the number of predictors is large and the variance within the

predictors tends to add noise to underlying correlations.

3.2 Methods

The Pfizer set of 144 canonical SMILES90 (Table 3.1) were converted to 3D

structures with CORINA50, 51 and then geometry-optimized in the gas phase using the

AM1 Hamiltonian in VAMP 9.0. The molecular surface properties were calculated and

mapped with Parasurf ‘06, using the marching-cube algorithm to fit to an isodensity

surface of 8.0×10-3 e·Å-1. Parasurf’s 40 standard statistical descriptors (Chapter 2, Table

2.2) were augmented with three physical property descriptors calculated from multiple

regression models for logP, the free energy of solvation in octanol ΔGsolv.(oct.), and the

free energy of solvation in water ΔGsolv.(H2O), as described in the chapter on surface-

integral models. These 43 descriptors were then used to train the C-support vector

classification (C-SVC) and ν-support vector classification (ν-SVC) machines using Chang

and Lin’s libsvm91. The data set was linearly scaled from −1 to +1 to prevent any single descriptor from dominating the training, and a radial basis function (RBF) kernel was used with adjustable parameters, C and γ, determined by the cross-validation and grid-search routine included with libsvm. The grid search is an automated trial of (C,γ) pairs that selects the pair giving the best cross-validation accuracy. The γ parameter is the multiplier in the RBF

exponential (Equation 3.9) and the C parameter corresponds, in the libsvm authors’



formulation for the soft margin hyperplane minimization expression, to an upper bound

applied to the sum of training errors (Equation 3.10).
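The two preprocessing and tuning steps described above, linear scaling of each descriptor to the interval from −1 to +1 and a grid search over (C, γ) pairs, can be sketched in plain Python. This is not the libsvm tooling itself: the function names are hypothetical, the `score` argument stands in for whatever cross-validation accuracy routine is used, and the candidate values for C and γ are left to the caller.

```python
def scale_columns(rows, lower=-1.0, upper=1.0):
    """Linearly scale each descriptor (column) to the range [lower, upper]."""
    scaled_columns = []
    for column in zip(*rows):
        lo, hi = min(column), max(column)
        if hi == lo:
            # A constant descriptor carries no information; use the midpoint.
            scaled_columns.append([(lower + upper) / 2.0] * len(column))
        else:
            scaled_columns.append(
                [lower + (upper - lower) * (v - lo) / (hi - lo) for v in column]
            )
    return [list(row) for row in zip(*scaled_columns)]


def grid_search(score, c_values, gamma_values):
    """Try every (C, gamma) pair and return (best score, C, gamma)."""
    best = None
    for c in c_values:
        for gamma in gamma_values:
            current = score(c, gamma)  # e.g. a cross-validation accuracy
            if best is None or current > best[0]:
                best = (current, c, gamma)
    return best
```

Scaling parameters should be derived from the training set and then applied unchanged to the test set, so that no information from the test compounds enters the training.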

[Flowchart: SMILES → CORINA (3D structures) → VAMP 9.0 (geometry optimization) → ParaSurf ‘06 (surface descriptors) → libsvm (support vector machine).]

Figure 3.2 General processing pathway from SMILES strings to phospholipidosis prediction.

The University of Minnesota’s XTAL package92 was used for the training of

multivariate adaptive regression splines. Piecewise-linear splines were used with a

varying maximum number of basis functions to be used and the number of interactions set

to the number of predictors in each case. A Leave One Out cross-validation scheme was

used with the training parameters varied individually until a minimum RMS error was

achieved.

3.3 Results

In the following section, predictions of the SVM and MARS models are presented

in the form of confusion matrices, with the actual value of positive or negative with

respect to the induction of phospholipidosis in rows with italicized text labels and the

predicted values in columns with plus (+) and minus (−) symbols as labels.


3.3.1 Support Vector Machines

The support vector classification using the standard set of statistical descriptors

with spherical-harmonic fitting yielded a model with a prediction accuracy of 47% for

negatives and 95% for positives; an overall accuracy of 75%. This SVM (ν-SVC, ν=0.5)

uses a radial basis function (γ=0.016) and possesses 43 support vectors.

− +

Negative 14 16

Positive 2 40

Figure 3.3 Confusion matrix for test set using spherical-harmonic fitting.
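The per-class and overall accuracies quoted in this section follow directly from the confusion matrices. As a worked check, the sketch below recomputes them for the matrix of Figure 3.3; the function name and argument order are illustrative.

```python
def class_accuracies(true_neg, false_pos, false_neg, true_pos):
    """Return (negative accuracy, positive accuracy, overall accuracy)."""
    negatives = true_neg + false_pos   # first row of the confusion matrix
    positives = false_neg + true_pos   # second row of the confusion matrix
    neg_accuracy = true_neg / negatives
    pos_accuracy = true_pos / positives
    overall = (true_neg + true_pos) / (negatives + positives)
    return neg_accuracy, pos_accuracy, overall


# Figure 3.3: Negative row (14, 16), Positive row (2, 40).
neg_acc, pos_acc, overall_acc = class_accuracies(14, 16, 2, 40)
```

This gives 14/30 ≈ 47% for negatives, 40/42 ≈ 95% for positives, and 54/72 = 75% overall, matching the values stated above. The same function applies to the other confusion matrices in this section.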

A support vector machine was trained with the marching-cube surface-fitted

properties, which yields a model with a prediction accuracy of 43% for negatives and 90%

for positives; an overall accuracy of 71%. The SVM (ν-SVC, ν=0.4) uses a radial basis function (γ=0.016) and possesses 42 support vectors.

− +

Negative 13 17

Positive 4 38

Figure 3.4 Confusion matrix for the test set using marching-cube fitting.

An analysis of the correlation between the Parasurf statistical descriptors and the

target classification was performed and the ten most significant descriptors for the

marching-cube-fitted surface properties were used, along with calculated logP, as an

enriched predictor set (Table 3.3) and SVM’s trained. Three of these descriptors were

from a set of newly-added Parasurf statistical descriptors, describing the skewness and

kurtosis of the distribution of values for each of the local properties. The best ν-SVM



(ν=0.500) with a radial basis function (γ=0.091) had 41 support vectors and an overall

prediction accuracy of 75% for the test set (57% for negatives, 88% for positives).

− +

Negative 17 13

Positive 5 37

Figure 3.5 Confusion matrix for the test set using descriptors that are highly-correlated with the target

values.

Table 3.3 Enriched descriptor set: 11 descriptors.

1 logP

2 Mol. Vol.

3 Mean (−) MEP

4 MEP (−) variance

5 MEP kurtosis

6 Mean IEL

7 IEL variance

8 IEL kurtosis

9 Mean χL

10 χL variance

11 χL skewness

Given that the compounds that have been observed to induce phospholipidosis are

generally cationic amphiphilic structures, an additional correlation analysis was performed

using pKa and logP values calculated from surface-integral models. These two surface-

integral properties proved to correlate well and were added to a list of the 18 most

significant descriptors (Table 3.4) and used to train a support vector machine (C-SVC,

C=0.8; radial basis function, γ=0.048). This consisted of 49 support vectors and yielded a

prediction accuracy of 53% for negatives and 98% for positives; or 79% overall.


− +

Negative 16 14

Positive 1 41

Figure 3.6 Confusion matrix for the test set using additional descriptors, pKa and logP.

Table 3.4 Enriched descriptor set: 20 descriptors.

1 pKa

2 logP

3 Dipolar density, μD

4 Molecular polarizability, α

5 Mol. Wt.

6 Globularity

7 Mol. surface area

8 Mol. volume

9 Mean (+) MEP

10 Mean (−) MEP

11 MEP (−) variance

12 MEP total variance

13 MEP kurtosis

14 Mean IEL

15 IEL variance

16 IEL skewness

17 IEL kurtosis

18 Mean χL

19 χL skewness

20 ηL skewness

21 αL skewness

Six ν-SVM’s were trained using sets of 128 each of autocorrelation vectors for

shape, molecular electrostatic potential, electron affinity, and ionization potential. The

SVM using electron affinity autocorrelation vectors consisted of 46 support vectors

(ν=0.400; RBF, γ=0.008) and yielded a prediction accuracy of 37% for negatives and 74%

for positives, or 58% overall.



− +

Negative 11 19

Positive 11 31

Figure 3.7 Confusion matrix for the test set using EAL autocorrelation vectors.

The SVM using ionization potential autocorrelation vectors consisted of 25 support

vectors (ν=0.200; RBF, γ=0.008) and yielded a prediction accuracy of 57% for negatives

and 76% for positives, or 68% overall.

− +

Negative 17 13

Positive 10 32

Figure 3.8 Confusion matrix for the test set using IEL autocorrelation vectors.

The SVM using shape autocorrelation vectors consisted of 40 support vectors (ν=0.300;

RBF, γ=0.008) and yielded a prediction accuracy of 37% for negatives and 83% for

positives, or 64% overall.

− +

Negative 11 19

Positive 7 35

Figure 3.9 Confusion matrix for the test set using shape autocorrelation vectors.

The SVM using molecular electrostatic potential autocorrelation vectors (minus-minus)

consisted of 42 support vectors (ν=0.200; RBF, γ=0.008) and yielded a prediction

accuracy of 47% for negatives and 88% for positives, or 71% overall.


− +

Negative 14 16

Positive 5 37

Figure 3.10 Confusion matrix for the test set using VMM autocorrelation vectors.

The SVM using molecular electrostatic potential autocorrelation vectors (plus-minus)
yielded a prediction accuracy of 37% for negatives and 76% for positives, or 60% overall.



− +

Negative 11 19

Positive 10 32

Figure 3.11 Confusion matrix for the test set using VPM autocorrelation vectors.

The SVM using molecular electrostatic potential autocorrelation vectors (plus-plus)
yielded a prediction accuracy of 43% for negatives and 90% for positives, or 71% overall.



− +

Negative 13 17

Positive 4 38

Figure 3.12 Confusion matrix for test set using VPP autocorrelation vectors.

The sets of autocorrelation vectors were also combined (truncated to 28 increments each)

and used to train a ν-SVM (ν=0.500; RBF, γ=0.006) with 45 support vectors. The

prediction accuracy was 53% for negatives and 95% for positives for an overall accuracy

of 78%.



− +

Negative 16 14

Positive 2 40

Figure 3.13 Confusion matrix for test set using all autocorrelation vectors.

The ν-SVM’s were trained with a 10-fold cross-validation scheme with no
misclassification penalty bias (in training, misclassification of a negative is penalized
equally to misclassification of a positive). As such, the support vector machines
tended toward far more false negatives than false positives.

3.3.2 Multivariate Adaptive Regression Splines

Using Autocorrelation Indices

Using the local properties mapped onto the marching cube surface for each

structure, 66 autocorrelation similarity indices described in Section 3.1.3 for all

compounds in both sets were calculated using the surface of valproic acid (test set;

negative) as a reference. The final model consisted of 12 basis functions and had a

generalized cross-validation error of 0.219. The prediction accuracy for the training set

was 97%, predicting one false positive and one false negative (Figure 3.14). When the
regression spline equations were applied to the test set, a prediction accuracy of 68%
was found, with 20 false positives and 3 false negatives (Figure 3.15).

− +

Negative 27 1

Positive 1 42

Figure 3.14 Confusion matrix for the training set.


− +

Negative 10 3

Positive 20 39

Figure 3.15 Confusion matrix for the test set.

The same procedure was applied using the standard set of Parasurf ’06 statistical

descriptors. The training set yielded a model that consists of 9 basis functions and had a

generalized cross-validation error of 0.351. The prediction accuracy for the training set

was 90%, predicting three false positives and four false negatives (Figure 3.16). When the
regression equations were applied to the test set, a prediction accuracy of 72% was found,
with 15 false positives and 5 false negatives (Figure 3.17).

− +

Negative 25 4

Positive 3 40

Figure 3.16 Confusion matrix for the training set.

− +

Negative 15 5

Positive 15 37

Figure 3.17 Confusion matrix for the test set.

Thus, the MARS models are slightly less accurate in their predictions as compared

to the support vector machines, but they also predict many more false positives (while the

SVM’s predict many more false negatives).



3.4 Discussion

It seems clear from the comparison with the MARS prediction accuracies (of

approximately 70%) that there is not an obvious condition of over-fitting of the training

data in the case of the SVM’s, with an averaged prediction accuracy of 75.6%. The C-

SVC machine (RBF; C=0.8; γ=0.048) with the largest feature space margin (ρ=2.405)

with 49 support vectors was able to classify 57 of the 72 compounds in the test set

correctly (79% accuracy). It is generally useful to note the size of the feature space

margin, ρ, as an indicator of the relative ability of the SVM to generalize in the prediction

of new data, but as this value is in the units of the n-dimensional transformed feature space

for a particular SVM, it cannot be used as a standard measure. More useful is a direct

comparison with the predictive capacity of another multivariate analysis technique, such

as the MARS analyses presented here. That the two techniques give similar predictive

accuracies suggests that the best models will generalize as well as the training data will

allow.

Among the different SVM trainings using the various descriptor sets, there were 16
cases where molecules were predicted correctly by every trained machine and several
cases where molecules were repeatedly predicted incorrectly. The most consistently
well-predicted members of the test set (those with zero misclassifications) are shown
below in Table 3.5 in bold italics, while the most misclassified compound, Ceftazidime,
is underlined.

Table 3.5 Test set misclassifications among trained support vector machines.

Compound                Misclassifications    Compound                Misclassifications
3-Methylcholanthrene    2                     AC-3579                 1
Acetaminophen           2                     Amikacin                1
Amiodarone              1                     Amitriptyline           1
Anticoman               3                     Aricept                 3
AY-25329                2                     AY-9944                 0
Bicalutamide            2                     Boxidine                1
Bupropion               1                     Carbon_tetrachloride    3
Ceftazidime             4                     Cephaloridine           1
Chlorcyclizine          2                     Chloroform              3
Chlorphentermine        1                     Citalopram              1
Clindamycin             0                     Clomipramine            1
Clozapine               3                     Cocaine                 2
Coralgil                1                     Dantrolene              1
Demeclocycline          0                     Desferal                3
Dibekacin               1                     Diclofenac              2
Diflunisal              2                     Di-isobutamide          0
Doxapram                3                     Doxycycline             1
Emetine                 0                     Ethyl_fluclozepate      2
Famotidine              2                     Fenfluramine            2
Flutamide               0                     Homochlorcyclizine      2
Hydrazine               2                     Hydroxyurea             1
IA3                     1                     Imipramine              0
Indoramin               2                     L-Ethionine             2
Maprotiline             1                     Meclizine               0
Mesoridazine            1                     Metformin               0
Methadone               3                     Methapyrilene           3
Mianserin               2                     Netilmicin              0
Noxiptiline             1                     Paraquat                2
Perhexiline             1                     Procaine                1
Promethazine            2                     Propranolol             1
Quinacrine              1                     Quinidine               1
Rolitetracycline        0                     SDZ_200-125             3
SKF-14336-D             0                     Stavudine               0
Tacrine                 0                     Trifluperazine          2
Triparanol              0                     Tunicamycin             1
Valproic_acid           3                     Zimelidine              0


In general, structures that have a negative assay result for phospholipidosis

induction are under-represented in the data set and the multivariate adaptive regression

splines that have been applied to the surface property data have proved to predict more

false positives than false negatives, while the support vector machines predict in an

opposite manner. In both cases there is a tendency of the multivariate methods to bias

their predictions toward the correct classification of primarily positives or primarily

negatives, with the border between them remaining rather unresolved. Overall, the best

combinations of surface predictors and multivariate methods give a prediction accuracy of

75-79% for this data set. As more research is published on the prediction of

phospholipidosis, larger data sets will be available for use in the construction of

computational models and thus, the models themselves will improve.

In a previous experiment examining the effect of charge state on the predictive

capacity of the SVM’s, the structures in the training set were ionized according to their

charge state (ionized > 50%) in solution at physiological pH (7.4) and used to train several

SVM’s. The structures in the test set were also ionized by this criterion and used to test

the predictive capacity of the trained machines. The resulting SVM’s proved excellent in

predicting the charge state of the molecules, but very unreliable in predicting

phospholipidosis induction (~50% overall accuracy). As a result, all structures were used

in their neutral forms. It would seem that, while the charge state of a given molecule may

represent its true state in solution, the effect on the surface descriptors is to diminish the

impact of the weaker, non-electrostatic components such as molecular polarizability in the

subsequent classification schemes.

The work of Tomizawa, et al.93, drawing on earlier work by Ploemen94 and

Fischer95, describes the use of two predictors, net molecular charge (NC) (based on the

relative charge distribution of molecules in solution at a specified pH of 4.0 from a

calculation of pKa) and ClogP, in the prediction of phospholipidosis-inducing potential

(PLIP), giving a PLIP risk rating to each compound in their combined set of 63

compounds. The reported prediction accuracy is 98%, with only one misclassified

compound in their validation set. This simple and efficient method seems highly

predictive, but it is little more than a set of rules in the manner of Lipinski’s Rule of Five96

and, as the authors indicate, its accuracy is wholly dependent on the degree of accuracy of

the atom/fragment-based calculations of pKa and ClogP. And aside from predicting that

cationic amphiphilic species are, in fact, cationic and amphiphilic, the method does not


explore or allow for the exploration of the underlying relationships between the CAD and

its environment, in terms of the close-contact regions with the surfaces of intra-lysosomal

phospholipids or phospholipases.

Thus, in terms of application to efficient high-throughput virtual screening, the

more lightweight, less computationally-intensive methods are the more desirable, which,

in this case are the methods of Tomizawa, Ploemen, and Fischer, with whatever their

actual prediction accuracies might be. In the case of our local property/SVM technique,

what is lost in a marginally greater computational cost is made up for in the accumulation

of local property information that may be used to ascribe electronic surface interactions to

actual processes involved in inducing phospholipidosis. The main drawback here, again,

is the present lack of interpretability of the local properties as a collection of statistical

measures. However, insofar as the properties of pKa and logP, themselves, may be

accurately predicted by local property surface-integral models (Chapter 2), it seems clear

that local surface properties must play a significant role in the interplay of forces

governing the initiation of phospholipid aggregation within the lysosome.

3.5 Conclusions

This study demonstrated the use of local surface properties in a support vector

machine methodology to predict phospholipidosis induction given a set of molecular

surfaces as described by statistical measures of the local properties. It is interesting to

note that the support vector machine trained with the additional pKa and logP descriptors

calculated from surface-integral models was the most accurately predictive in terms of

classification by local property descriptor. This suggests not only the importance of these

particular properties to the process of phospholipidosis induction, but, as these values are

themselves calculated from the same pool of local surface properties, the importance of

the local properties in describing the range of surface interactions involved in the process

associated with phospholipidosis.


Chapter 4

Three-Dimensional Quantitative

Structure-Activity Relationships Using

Local Properties

4.1 Introduction

4.1.1 Comparative Molecular Field Analysis

Comparative Molecular Field Analysis97, or CoMFA, is a 3D-QSAR method

developed by the group of Richard D. Cramer, III that involves modeling the relationship

between ligands and receptors in terms of steric and electrostatic interactions. This is

done by aligning a set of molecular structures that have an associated activity value (logK,

inhibitory concentration, etc.). A three-dimensional grid is generated around the aligned

molecules and probe “atoms” are placed at each point in the grid. The steric and

electrostatic potentials that arise from proximity with the atoms in the aligned molecules

are recorded and used in a partial least squares regression with the activity values as
dependent variables. A Leave-One-Out cross-validation scheme is used in Tripos’
SYBYL36 implementation of CoMFA to estimate the predictive capacity of the model in
terms of q2, the cross-validated r2 of the model. A three-dimensional contour map is then
plotted that relates regions of steric and electrostatic potential to activity. Colored figures
in the space around the aligned molecules indicate regions that relate positively and
negatively to activity (Figure 4.1).

Figure 4.1 A representation of a SYBYL CoMFA analysis of coumarin substrates
as inhibitors of cytochrome P450 2A5 (ref. 98).

The most common method of molecular alignment is by substructure. The

alignment algorithms use a reference fragment as a template for aligning all other

structures, as in SYBYL. Cepos InSilico’s Parafit aligns structures using a set of

spherical harmonic functions that are produced by Parasurf to generate a molecular

surface. Local properties that are mapped onto the surface, i.e. onto the spherical

harmonic functions, can then be used as a template for alignment by common electronic

properties such that the set of molecules need not have a common substructure. In

addition to the alignment by overlaying the spatial positions of spherical harmonics,

alignment may also be conducted by similarity of fitted local electronic properties, such as

electronegativity.


3D-QSAR Using Local Properties

The measure of the predictive capacity of a CoMFA model, according to the
SYBYL manual, is found in the statistical measures r2, the coefficient of determination
of the model, and q2, the “predictive r2”. The latter is the measure of the fit of the
cross-validated predictions which, in the case of the standard CoMFA and the method
employed here, come from a full Leave-One-Out cross-validation scheme, with each case
left out in turn and predicted by the rest of the data set. The value of r2 should always be
greater than 0.6 (a good model should have an r2 > 0.9) and the value of q2 falls into one
of three categories:

• q2 > 0.6: The model is fairly good.

• 0.4 < q2 < 0.6: The model is questionable.

• q2 < 0.4: The model is poor.

In addition, a minimum number of vector components (described in the following section)

should be used that improves r2 by at least 5%. Typically, the number of components in a

given model should be no greater than seven or eight. In general, the lower the number of

components, the more straightforward the relationship between the probe parameters and

observed activity.
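As an illustration of how q2 is obtained from a set of Leave-One-Out predictions, the sketch below uses the common definition q2 = 1 − PRESS/SS, where PRESS is the sum of squared cross-validated prediction errors and SS is the sum of squared deviations of the observed values from their mean. The definition is assumed to match the one used by SYBYL, the data are invented, and the handling of values falling exactly on 0.4 or 0.6 is a choice made here, since the categories above leave it open.

```python
import numpy as np


def q2_score(y_observed, y_cross_validated):
    """Cross-validated r2: 1 - PRESS / SS about the mean of the observations."""
    y_obs = np.asarray(y_observed, dtype=float)
    y_cv = np.asarray(y_cross_validated, dtype=float)
    press = np.sum((y_obs - y_cv) ** 2)
    ss = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - press / ss


def rate_model(q2):
    """Map q2 onto the three categories listed above."""
    if q2 > 0.6:
        return "fairly good"
    if q2 > 0.4:
        return "questionable"
    return "poor"


# Invented activities and their leave-one-out predictions.
y_obs = [6.0, 7.0, 8.0, 9.0]
y_cv = [6.5, 6.5, 8.5, 8.5]
q2 = q2_score(y_obs, y_cv)
print(q2, rate_model(q2))
```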

4.1.2 Partial Least Squares Regression

Representing the large number of steric and electrostatic potential values

determined for each of the many grid points (in some cases, thousands of values) in a

meaningful way becomes difficult for typical statistical analytical methods. It is a

problem of how to select the important predictors from such a large set of data. In

instances of QSAR/QSPR modeling where multiple regression analyses result in poorly-

predicting models due to cases of over-fitting of the data or where large numbers of

factors cannot be avoided due to the nature of the experiment, a statistical method very

similar to principal component analysis, called Partial Least Squares (PLS) analysis99, can

be used to extract latent factors in the data that account for the variation in the target

values. Introduced by Wold100 and co-workers around 1979, and referred to as the
Projection to Latent Structures in statistics texts101, the general method involves
transforming the matrices of the factors and the target values into new vector spaces such
that the relationship between successive pairs of scores is as high as possible. Directions in
transformed factor space that associate with the greatest variance in the responses that are
also biased toward directions in response space that result in accurate predictions are used
as a means of indirectly modeling the target values. The extracted factors depend on all
input variables, with each factor contributing successively less to the predictivity of the
model. Thus, while there is no data reduction involved in the process itself, only a certain
number of factors, or vector components (usually determined by some measure of residual
variance), are used in the final model. The most common method of determining the
number of vector components to be used is cross-validation: components are added, with
each case left out and predicted in turn, until the cross-validated prediction error reaches
a minimum.
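The selection of the number of components by cross-validation can be sketched as follows. This is a minimal single-response PLS written with the NIPALS algorithm and plain NumPy, not the in-house SIMPLS program used in this work; for a single response the two algorithms are generally reported to give the same predictions. The data are synthetic, the data are only re-centred (not re-scaled) in each run, and all function names are illustrative.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Fit PLS1 by NIPALS on centred data; return (x_mean, y_mean, b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean      # residuals, deflated in the loop
    W, P, c = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                    # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # nothing left to explain
            break
        w = w / norm
        t = Xr @ w                       # scores
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        ck = (yr @ t) / tt               # y loading
        Xr = Xr - np.outer(t, p)
        yr = yr - ck * t
        W.append(w)
        P.append(p)
        c.append(ck)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, c = np.array(W).T, np.array(P).T, np.array(c)
    b = W @ np.linalg.solve(P.T @ W, c)  # regression vector in original variables
    return x_mean, y_mean, b


def loo_q2(X, y, n_components):
    """Leave-one-out q2 for a model with the given number of components."""
    n = len(y)
    predicted = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        x_mean, y_mean, b = pls1_coefficients(X[keep], y[keep], n_components)
        predicted[i] = y_mean + (X[i] - x_mean) @ b
    press = np.sum((y - predicted) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def choose_components(X, y, max_components):
    """Return the component count with the highest q2, plus all q2 values."""
    scores = [loo_q2(X, y, k) for k in range(1, max_components + 1)]
    return int(np.argmax(scores)) + 1, scores


# Synthetic example: 20 cases, 30 correlated variables driven by 2 latent factors.
rng = np.random.default_rng(0)
T = rng.normal(size=(20, 2))
X = T @ rng.normal(size=(2, 30)) + 0.05 * rng.normal(size=(20, 30))
y = T[:, 0] + 0.5 * T[:, 1] + 0.05 * rng.normal(size=20)
best, scores = choose_components(X, y, max_components=5)
print(best, [round(s, 3) for s in scores])
```

Maximising q2 is equivalent to minimising PRESS; a criterion based on the standard error of prediction additionally penalises the number of components and may favour a smaller model.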

4.1.3 Local Properties

It has been argued102 that steric and electrostatic fields do not present a complete

picture of drug-receptor interactions, so more recently other 3D-QSAR methods have been

developed that take additional physicochemical properties into account. One such method

is known as Comparative Molecular Similarity Index Analysis103 (CoMSIA) and follows

the same general CoMFA methodology, but using atomic probes for local hydrophobicity,

hydrogen bond donors and acceptors, as well as for steric and electrostatic potential

contributions. Another major difference lies in the use of a Gaussian distance function

applied to grid values such that there are no dramatic property changes from grid point to

grid point and the use of similarity indices between structures for each property used as

factors in PLS analysis. The indices are calculated by:

\[
S_k^q(j) = -\sum_{i=1}^{n} \omega_k \cdot \omega_i \cdot e^{-\alpha r_{iq}^{2}} \tag{4.1}
\]

where, for molecule j with n atoms, ωi is the target property value of atom i, ωk is the
value of property k for the probe atom placed at grid point q (charge +1, radius 1Å,
hydrophobicity +1, H-bond donor index +1, and H-bond acceptor index +1), riq is the
distance between grid point q and atom i, and α is an attenuation factor. The models that
result from CoMSIA are generally more predictive in terms of q2 than their CoMFA
counterparts and have the ability to model the binding surfaces of the ligand-substrate
complex more accurately in terms that are familiar to the biochemist.
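A minimal sketch of the similarity index of Equation 4.1, read here as the negative sum over all atoms of the probe value times the atomic property value times a Gaussian of the atom-to-grid-point distance. The coordinates, property values and the attenuation factor of 0.3 are illustrative assumptions, not values taken from this work.

```python
import numpy as np


def similarity_index(grid_point, atom_coords, atom_values,
                     probe_value=1.0, alpha=0.3):
    """CoMSIA-style similarity index at one grid point for one property."""
    atoms = np.asarray(atom_coords, dtype=float)
    point = np.asarray(grid_point, dtype=float)
    values = np.asarray(atom_values, dtype=float)
    r_squared = np.sum((atoms - point) ** 2, axis=1)  # squared distances r_iq^2
    return -np.sum(probe_value * values * np.exp(-alpha * r_squared))


# Hypothetical two-atom molecule with property values +1 and -0.5.
coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
values = [1.0, -0.5]
s_near = similarity_index([0.0, 0.0, 0.0], coords, values)
s_far = similarity_index([10.0, 0.0, 0.0], coords, values)
```

Because the Gaussian decays smoothly and never diverges, the index varies gradually from grid point to grid point and vanishes far from the molecule.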

In the interest of expanding the descriptive vocabulary of 3D-QSAR, a method has

been developed that uses local properties to model the electronic interactions of drug

binding surfaces using a methodology analogous to CoMFA. The approach described

below begins with the standard steric and electrostatic descriptors of CoMFA, in the form

of the local electron density and the local molecular electrostatic potential, respectively.

To these are added the local properties of electron affinity, ionization potential,

electronegativity, hardness, and polarizability. The result is an augmented molecular field

analysis that is interpreted in a 3D-graphical manner identical to that of CoMFA, but with

additional property fields that may be used alone or in combination to reveal important

intermolecular interactions not elucidated by shape and charge fields alone.

Figure 4.2 A set of aligned structures in a CoMFA grid.


4.2 Computational Methods

The structures in the following data sets were aligned using SYBYL 7.0 via

conversion to Tripos mol2 format, followed by conversion back into MDL SD format.

Semi-empirical MO calculations with the AM1 Hamiltonian were performed on each

using VAMP 9.0 to calculate charges and orbital information, with or without geometry

optimization as indicated. A three-dimensional grid with a point spacing of 2 Ångstroms

and a 4 Ångstrom border was generated by script and was used in the calculation of seven

local properties: electron density δe, electron affinity EAL, electronegativity χL, hardness

ηL, ionization potential IEL, electrostatic potential V, and polarizability αL at each grid

point using Parasurf ’07104 (Figure 4.2).
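The grid construction described above can be sketched as follows. The script actually used is not reproduced in this work, so the function below is an assumption about its general form: a rectangular box around the atomic coordinates of the aligned set, extended by the border on every side and filled with points at the stated spacing.

```python
import numpy as np


def make_grid(coords, spacing=2.0, border=4.0):
    """Regular 3D grid enclosing all atoms, extended by `border` on each side."""
    coords = np.asarray(coords, dtype=float)
    low = coords.min(axis=0) - border
    high = coords.max(axis=0) + border
    # Half a spacing is added so that the upper face is included when the
    # box length is an exact multiple of the spacing.
    axes = [np.arange(a, b + spacing / 2, spacing) for a, b in zip(low, high)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])


# Hypothetical coordinates (in Ångstroms) for two atoms of an aligned set.
atoms = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
grid = make_grid(atoms)
print(grid.shape)
```

When the box length is not a multiple of the spacing, the last grid plane in this sketch falls short of the upper border by less than one spacing.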

Figure 4.3 Representation of the local-property/activity CoMFA field for EAL.

Points interior to the molecular surface were removed from the grid by using a

“generalized” van der Waals radius of 1.16 Ångstroms in order to ensure that property

values that bear no direct relation to surface activity would not appear in the PLS analyses.

This atomic radius, which is slightly smaller than the van der Waals radius for the



hydrogen atom (1.20Å), was chosen in order to leave enough surface electron density to

use the local electron density as a steric parameter in the 3D-QSAR analysis. The local

property values at the grid points were then used as independent variables in separate

partial least squares regressions, using associated physical property values as target values.

The partial least squares analyses were performed using an in-house program using the

SIMPLS105 algorithm and a full cross-validation scheme (i.e. all cases are excluded and

predicted in turn), re-centering and re-scaling the included data for each run. The PLS

regressions were carried out initially to ten vector components in order to determine the

maximum number of components to be included in the final model, using the cross-

validated standard error of prediction (SEP) of the model to choose the appropriate

number of components (as in SYBYL). In the cases where PLS analyses were performed

using single local properties, the property data was normalized by the mean. The

regression coefficients for the final model were then used to generate a three-dimensional

representation of the property space with colored spheres using Pymol45. Those grid

points with a positive relationship to the particular target property are color-coded red,

while those with a negative relationship to the target property are color-coded blue. In

addition to color-coding, the size of the grid spheres, determined by the standardized

magnitude of the regression coefficients, represents the magnitude of the relationship

between the local property at that point to the target value (See Figure 4.3).
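The removal of grid points interior to the molecular surface, described earlier in this section, can be sketched as a simple distance filter. The 1.16 Å radius is the value given above; the function name, the example coordinates and the choice to discard points lying exactly at the cutoff are assumptions.

```python
import numpy as np


def outside_surface(grid, atom_coords, radius=1.16):
    """Keep only grid points farther than `radius` from every atom."""
    grid = np.asarray(grid, dtype=float)
    atoms = np.asarray(atom_coords, dtype=float)
    # Squared distance from every grid point to every atom.
    d2 = np.sum((grid[:, None, :] - atoms[None, :, :]) ** 2, axis=2)
    return grid[d2.min(axis=1) > radius ** 2]


# One atom at the origin and three candidate grid points along x.
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
kept = outside_surface(points, [[0.0, 0.0, 0.0]])
print(kept)
```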

4.3 Results and Discussion

4.3.1 Serotonin Receptor Agonists/Antagonists

The common 5-HT1A and α1-adrenergic agonist/antagonist data set consists of 23
thienopyrimidinone structures in Tripos mol2 format that had been optimized
previously106 with MM3107. The structures in Table 4.1 were aligned to structure 23 and
converted to MDL SD format. Single-point AM1 calculations were used to calculate the
charges and orbital information needed by Parasurf ’07 for the grid points (4Å border, 2Å
spacing) surrounding the aligned molecules. The pIC50 values for 1) 5-HT1A
receptor binding and 2) α1-adrenergic receptor binding for each compound were used as
target values in PLS analyses, where IC50 is the concentration of ligand that causes 50%
dissociation of [3H]-8-hydroxy-2-(di-N-propylamino)tetralin from the 5-HT1A receptor or
50% dissociation of [3H]-prazosin from the α1 receptor in binding assays.

The PLS analysis of the aligned data set using all local properties yielded a q2 of

0.761 with one vector component, an overall SEP of 0.793, and a model r2 of 0.870. The

cross-validated predictions are presented below in Figure 4.4. The results of the PLS

analyses using individual local properties are presented in Table 4.2.

Table 4.1 Thienopyrimidinone 5-HT1A and α1 agonists/antagonists.

[Structure diagrams: general thienopyrimidinone scaffold with substituents R1 to R4; structures 21 and 22 drawn explicitly]

Structure   R1     R2     R3      R4             5-HT1A pIC50   α1 pIC50
1           Me     Me     H       2-Cl-Ph        6.34           6.79
2           Me     Me     H       3-Cl-Ph        6.01           6.52
3           Me     Me     H       2-OMe-Ph       7.62           7.40
4           Me     Me     H       1-Naphthyl     6.45           6.05
5           Me     Me     H       2-Pyrimidyl    6.65           5.96
6           -(CH2)4-      H       2-Cl-Ph        6.03           6.78
7           -(CH2)4-      H       2-OMe-Ph       7.23           7.42
8           -(CH2)4-      H       1-Naphthyl     6.43           6.35
9           -(CH2)4-      H       2-Cl-Ph        6.30           5.74
10          H      Ph     H       2-OMethenyl    6.41           6.65
11          H      Ph     H       1-Naphthyl     5.70           5.61
12          -(CH=CH)-     H       2-OMe-Ph       7.34           7.04
13          H      H      NH2     2-OMe-Ph       8.92           8.54
14          -(CH2)4-      Me      2-OMe-Ph       8.15           7.19
15          -(CH2)4-      NH2     2-OMe-Ph       8.89           7.41
16          Me     Me     NH2     Ph             8.48           7.37
17          Me     Me     Me      2-OMe-Ph       8.52           7.57
18          Me     Me     NH-Ph   2-OMe-Ph       6.30           7.49
19          Me     Me     Me      2-Pyrimidyl    7.19           5.69
20          Me     Me     NH2     2-Pyrimidyl    8.17           6.30
21          −      −      −       −              9.10           7.44
22          −      −      −       −              9.30           8.40
23          Me     Me     NH2     2-OMe-Ph       9.52           8.14

[Scatter plot: calculated pIC50 vs. experimental pIC50, both axes 4 to 9]

Figure 4.4 Cross-validation predictions vs. actual values of 5-HT1A receptor binding pIC50.

Table 4.2 Partial least squares regression results for the 5-HT1A data set.

              δe         EAL        χL         ηL         IEL        V          αL         ALL
r2            0.682911   0.759950   0.974536   0.818111   0.936360   0.613866   0.780818   0.869525
q2            0.22754    0.66781    0.82511    0.698682   0.750238   0.509428   0.699553   0.761065
SEP           1.245238   0.913798   0.690154   0.874884   0.81127    1.059668   0.874447   0.792926
Components    1          1          2          1          2          1          1          1

None of the PLS regressions required more than two vector components, and all but two
returned q2 values greater than 0.6. The two exceptions (Table 4.2) are electron density
(δe) and molecular electrostatic potential (V), which are the terms analogous to the two
standard CoMFA parameters. Judging from the q2 value (0.825) for the regression using
only the local electronegativity, a 3D-QSAR model (Figure 4.5) using this single local
property is sufficient to predict the serotonin inhibitory concentration for the data set.


[Scatter plot: calculated pIC50 (vertical axis, 4 to 9) vs. experimental pIC50]

Figure 4.5 Cross-validation predictions vs. actual values of 5-HT1A receptor

binding pIC50 using only local electronegativity.

Figure 4.6 Local-electronegativity/activity field for the aligned 5-HT1A data set.



In Figure 4.6, a strong positive relationship with activity is observed near the distal end of

aligned thienopyrimidinone nitrogenous substituents, while a larger region of negative

activity resides near the distal nitrogen of the aligned piperazine rings.

4.3.2 Adrenergic Receptor Agonists/Antagonists

The PLS analysis of the aligned data set using all local properties yielded a q2 of
0.700 with two vector components, an overall SEP of 0.602, and a model r2 of 0.980. The
cross-validated predictions are presented below in Figure 4.7 and the results of the PLS
analyses using individual local properties are presented in Table 4.3.



[Scatter plot: calculated pIC50 (vertical axis, 4 to 9) vs. experimental pIC50]

Figure 4.7 Cross-validation predictions vs. actual values of α1-adrenergic receptor binding pIC50.

Table 4.3 Partial least squares regression results for the α1-adrenergic receptor data set.


              δe         EAL        χL         ηL         IEL        V          αL         ALL
r2            0.837437   0.992341   0.964925   0.976671   0.949911   0.480773   0.656486   0.980002
q2            0.300313   0.676767   0.765624   0.635388   0.677426   0.296878   0.519698   0.700124
SEP           0.817689   0.633365   0.545112   0.659089   0.621647   0.820678   0.724712   0.60231
Components    2          7          3          4          3          2          1          4


Here again, with a q2 of 0.766, the local electronegativity model predicts slightly
better than the combined local property model, and the two standard CoMFA parameters
are the poorest-performing of the local properties. Additionally, more vector components
were required to construct the α1 receptor models than were required for the 5-HT1A
models for every property except local polarizability (Tables 4.2 and 4.3). The plot of
experimental versus calculated pIC50 is presented below in Figure 4.8.


4

5

6

7

8

9

Cal

cula

ted

pIC

50

Figure 4.8 Cross-validation predictions vs. actual values of α1-adrenergic

receptor binding pIC50 using only local electronegativity.

Figure 4.9 Local electronegativity/activity field for the aligned α1-receptor data set.



As in the case of the 5-HT1A data, a positive response in local electronegativity near the

distal end of the thienopyrimidinone rings was observed for the α1 receptor data (Figure

4.9). There are several negative response regions, however, that describe a rather

complicated response in local electronegativity, primarily on the thienopyrimidinone end

of the structures. The property field regions of positive response common to both sets of

data indicate a relationship between electronegative (nitrogenous) substituents on the

thienopyrimidinone ring and inhibitory activity, while the regions of negative impact are

different for the two data sets.

4.3.3 Dopamine D4 Antagonists

The D4 dopamine antagonist data set consists of 29 MDL SD piperazine structures

that had been optimized previously108 with MM3. The structures were converted to mol2

format, then aligned to a central substructure using SYBYL 7.0 and converted back to SD

format. Single-point AM1 calculations using VAMP 9.0 were used to calculate charges

and orbital information that were used as input for Parasurf ‘07, which calculated the

local properties for each point in a three-dimensional grid (4Å border, 2Å spacing)

surrounding the aligned molecules. The pKi (the negative logarithm of the inhibition

constant) values for the dopamine D4 receptor binding for each compound were used as

the target values in PLS analyses.

Table 4.4 Piperazine dopamine D4 receptor antagonists.

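[Structure diagrams: general scaffold for structures 1-16 and 26-29 (substituents R1 and R2, ring positions 4 to 7) and general scaffold for structures 17-25 (substituent R1)]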

Structure   R1              R2           exp. pKi   calc. pKi
1           p-Cl-Ph         H            8.64       7.935
2           Ph              H            7.78       7.379
3           p-I-Ph          H            8.52       8.047
4           p-F-Ph          H            7.70       7.834
5           Me              H            5.14       7.028
6           Et              H            4.62       5.911
7           p-Cl-Ph         4-Me         7.30       7.522
8           p-Cl-Ph         7-I          8.30       9.439
9           p-Cl-Ph         7-Me         8.57       7.921
10          p-Cl-Ph         7-ethinyl    8.91       7.905
11          cyclohexyl      H            5.35       6.698
12          m-Cl-Ph         H            8.41       7.661
13          p-Cl-Ph         4,5-benzo    5.74       7.513
14          p-Cl-Ph         6,7-benzo    6.85       8.194
15          m-Cl-Ph         4,5-benzo    6.10       6.532
16          p-Cl-Ph         6,7-benzo    6.66       7.297
17          [drawn]         −            7.58       7.805
18          [drawn]         −            8.02       8.426
19          [drawn]         −            7.41       8.982
20          [drawn]         −            8.24       8.834
21          [drawn]         −            7.74       7.822
22          [drawn]         −            8.66       7.64
23          [drawn]         −            6.60       7.705
24          [drawn]         −            9.21       7.581
25          [drawn]         −            7.80       7.101
26          p-ethinyl-Ph    H            8.36       8.036
27          m,p-Cl-Ph       H            8.25       7.708
28          m-CF3-Ph        H            8.72       7.646
29          H               CH2OH        7.71       7.935

[drawn]: R1 was given as a structure drawing and is not recoverable from the text.


The PLS regression model using all local properties yields a q2 value of 0.623 with
three vector components, an overall SEP of 0.960, and a model r2 of 0.906. A plot of
the cross-validated predictions is presented in Figure 4.11. The results of the PLS
analyses using individual local properties are presented in Table 4.5.


Table 4.5 Partial least squares regression results for the D4 receptor data set.


              δe         EAL        χL         ηL         IEL        V          αL         ALL
r2            0.886454   0.533339   0.900134   0.922590   0.900603   0.930679   0.375814   0.905785
q2            0.67420    0.27462    0.626792   0.566274   0.616129   0.778059   0.098352   0.623449
SEP           0.94303    1.188695   0.955174   1.012352   0.972363   0.784165   1.235116   0.959651
Components    3          1          3          4          3          7          1          3

Figure 4.10 Molecular electrostatic potential field for the aligned D4 receptor set.

With this data set, the electrostatic potential (Figure 4.10) and electron density

regressions yield the most predictive models, with local electronegativity and local

ionization potential also providing significant contributions. It is, therefore, to be


expected that a standard CoMFA would produce a comparable model and, indeed, the

reported q2 for the standard analysis with this data set was 0.739, with an SEP of 0.734

using seven vector components108.

[Scatter plot: calculated pKi vs. experimental pKi, both axes 4 to 9]

Figure 4.11 Cross-validation predictions vs. actual values of dopamine D4 receptor binding pKi.

The original article described the use of an all-orientation109 sampling of CoMFA property

space to return the best possible q2 value, which may over-estimate the relationship

between the observed activity and the combined steric and electrostatic parameters.

4.3.4 Avian Influenza Neuraminidase Inhibitors

A subset of 21 2D structures and accompanying pIC50 values (Table 4.6) was

taken from a larger set of 126 avian influenza neuraminidase inhibitors110. These were

converted to 3D MDL SD files using Molecular Networks’ CORINA and subsequently

geometry-optimized at the AM1 level with VAMP 9.0. The structures were aligned with

SYBYL 7.0 and the set of Parasurf local properties was calculated on a grid (4 Å border,

2 Å spacing) surrounding the structures.
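
As an illustration of the grid set-up described above, the sketch below builds a regular lattice around a set of aligned atomic coordinates. It assumes an axis-aligned box that extends 4 Å beyond the extreme atomic coordinates with a 2 Å spacing; how SYBYL actually positions its lattice is not stated here, and the function name is hypothetical.

```python
import numpy as np

def build_grid(coords, border=4.0, spacing=2.0):
    """Return an (N, 3) array of lattice points enclosing `coords` (in Å)."""
    lo = coords.min(axis=0) - border                     # lower corner of the box
    hi = coords.max(axis=0) + border                     # upper corner of the box
    n = np.ceil((hi - lo) / spacing).astype(int) + 1     # points along each axis
    axes = [lo[k] + spacing * np.arange(n[k]) for k in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
```

Each local property evaluated at these points contributes one block of descriptor columns to the PLS analysis.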



Table 4.6 Avian influenza neuraminidase inhibitors. [Structure drawing: common scaffold bearing COOH, NHR3, R2, and R1O substituents.]

Structure R1 R2 R3 exp. pIC50 calc. pIC50

1 CHEt2 NH2 COMe 9.00 7.15

2 C3H7 NH2 COMe 6.89 6.44

3 CH2CH2CF3 NH2 COMe 6.65 5.59

4 CH2CH=CH2 NH2 COMe 5.66 6.66

5 Me NH2 COMe 5.43 6.74

6 C2H5 NH2 COMe 5.70 7.08

7 C4H9 NH2 COMe 6.52 6.44

8 C5H11 NH2 COMe 6.70 6.86

9 C6H13 NH2 COMe 6.82 6.61

10 C7H15 NH2 COMe 6.57 6.93

11 C8H17 NH2 COMe 6.74 6.81

12 C9H19 NH2 COMe 6.68 6.40

13 C10H21 NH2 COMe 6.22 6.39

14 CH2CHMe2 NH2 COMe 6.70 5.56

15 CH(Me)Et (S) NH2 COMe 8.05 7.17

16 CH2C6H5 NH2 COMe 6.21 7.06

17 H NHC(=NH)NH2 COMe 7.00 7.75

18 C3H7 NHC(=NH)NH2 COMe 8.70 8.30

19 C4H9 NHC(=NH)NH2 COMe 8.52 8.43

20 CH(Me)Et NHC(=NH)NH2 COMe 9.30 9.86

21 C3H7 NH2 COMe 5.82 6.76

The PLS regression model using all local properties yields a q2 value of 0.678 with

four vector components and an overall SEP of 0.847, and a model r2 of 0.965. As can be

seen in Table 4.7, all of the local properties contribute significantly to the predictivity of

the model, with the exception of the local electron density and the local molecular

electrostatic potential. The regressions of local electron affinity and local molecular

polarizability are the best predictors of activity in this case, with q2 values greater than 0.7.


Either of these local properties, alone, should be adequate in predicting the observed

activity and indicating the dependence of the activity on EAL or αL.

Table 4.7 Partial least squares regression results for the neuraminidase inhibitor data set.


r2 0.711406 0.953357 0.963215 0.986630 0.982036 0.606607 0.913741 0.965417

q2 0.559258 0.729133 0.680904 0.690092 0.681546 0.463248 0.733392 0.677654

SEP 0.937534 0.858793 0.838464 0.825561 0.836767 1.008261 0.782692 0.847242

Components 2 4 4 6 5 1 3 4

Figure 4.12 Cross-validation predictions vs. actual values of neuraminidase inhibition pIC50. [Plot: calculated pIC50 vs. experimental pIC50, both axes spanning 5 to 9.]

The regions of both positive and negative response for the local molecular polarizability

field are situated very near the central ring on the same side of the ring, suggesting a

somewhat complex positive response to polarizable R1 moieties.



Figure 4.13 Local molecular polarizability field for the aligned neuraminidase inhibitor set.

4.3.5 Mutagenic Tertiary Amides

A set of 49 N-acyloxy-N-alkoxyamide structures111,112 that possess a fully sp3-

hybridized central nitrogen amide and accompanying mutagenicity values

(log[mutagenicity] at a concentration of 1 μmol/plate in Salmonella typhimurium TA100)

are presented in Table 4.8. These compounds have been shown to react with N7 of

guanine in the major groove of DNA through an SN2 mechanism involving the

displacement of the N-acyloxy group. Somewhat counter-intuitively, however, the less

reactive the compound, the more mutagenic it is113.

The 2D structures were converted to 3D MDL SD files using CORINA and

subsequently geometry-optimized with AM1. The structures were aligned using SYBYL

7.0 and the set of Parasurf local properties calculated for a grid (4Å border, 2Å spacing)

surrounding the structures. The PLS regression model using all local properties yields a q2

value of 0.694 with four vector components and an overall SEP of 0.266, and a model r2

of 0.932. The results of the local property regressions are presented in Table 4.9 below

and the cross-validated predictions of the model are presented in Figure 4.14.


Table 4.8 The mutagenic tertiary amides data set. [Structure drawing: N-acyloxy-N-alkoxyamide scaffold with substituents R1, R2, and R3; φ denotes a phenyl ring.]

Structure  R1              R2              R3
1          Me              φ               p-Br-φ-CH2-
2          2,6-diMe-φ-     φ               Bu
3          3,5-diMe-φ-     φ               Bu
4          Me              3,5-diMe-φ-     Bu
5          Me              Me              Bu
6          Me              p-Br-φ-         Bu
7          Me              φ               Bu
8          Me              p-Cl-φ-         Bu
9          Me              p-Me-φ-         Bu
10         Me              m-NO2-φ-        Bu
11         Me              p-NO2-φ-        Bu
12         Me              p-φ-φ-          Bu
13         Me              p-t-Bu-φ-       Bu
14         adamantanyl     φ               Bu
15         Bu              φ               Bu
16         φ               adamantanyl     Bu
17         φ               φ               Bu
18         φ               i-Pr            Bu
19         φ               t-Bu-CH2-       Bu
20         φ               Et              Bu
21         i-Pr            φ               Bu
22         t-Bu-CH2-       φ               Bu
23         t-Bu            φ               Bu
24         Me              Me              φ-CH2-
25         Me              φ               φ-CH2-
26         φ               φ               φ-CH2-
27         φ               p-t-Bu-φ-       φ-CH2-
28         p-benzaldehyde  φ               φ-CH2-
29         p-Cl-φ-         φ               φ-CH2-
30         p-cyano-φ-      φ               φ-CH2-
31         p-Me-φ-         φ               φ-CH2-
32         p-MeO-φ-        φ               φ-CH2-
33         m-MeO-φ-        φ               φ-CH2-
34         m-NO2-φ-        φ               φ-CH2-
35         p-NO2-φ-        φ               φ-CH2-
36         p-φ-φ-          φ               φ-CH2-
37         p-t-Bu-φ-       φ               φ-CH2-
38         Me              φ               p-Cl-φ-CH2-
39         Me              φ               Et
40         Me              φ               i-Pr
41         Me              φ               p-Me-φ-CH2-
42         Me              φ               p-MeO-φ-CH2-
43         Me              φ               n-octanol
44         Me              φ               p-φ-φ-CH2-
45         Me              φ               p-φ-O-φ-CH2-
46         Me              φ               Pr
47         Me              φ               p-t-Bu-φ-CH2-
48         Me              p-t-Bu-φ-       p-t-Bu-φ-CH2-
49         φ               φ               p-t-Bu-φ-CH2-

Table 4.9 Partial least squares regression results for the mutagenic tertiary amides data set.


r2 0.790466 0.846869 0.726179 0.708745 0.717700 0.365975 0.531608 0.932495

q2 0.560957 0.621656 0.636329 0.615513 0.627104 0.322679 0.287702 0.693979

SEP 0.304009 0.291694 0.28339 0.289702 0.286228 0.350909 0.395195 0.266125

Components 3 4 1 1 1 1 1 4

Here again the standard CoMFA steric and electrostatic parameters would be

expected to produce a less predictive model as a result of lower q2 values and larger SEP

values for local electron density and local MEP in comparison to the other local property

q2 values. The q2 for local polarizability is also very low with this data set.

In one of the original papers describing these compounds112, it is indicated that

steric factors affect mutagenicity in two respects. The first of these concerns the ability of

the amides to enter the major groove of DNA in such a way that a stable transition state

geometry can be achieved and the second involves an observed decrease in SN2 reactivity

with increased bulk near the amide nitrogen. In our analysis, the major responses to

activity were found in the local properties EAL, χL, ηL, and IEL, with the local


electronegativity field indicating a relationship between mutagenicity and the electron-withdrawing

character of para-substituents on the alkoxy phenyl ring moieties, as seen in

Figure 4.15, where the large red sphere is situated above the para- position (with respect

to the central nitrogen) of the benzoxy ring(s).

Figure 4.14 Cross-validation predictions vs. actual values for the mutagenic amides data set. [Plot: calculated vs. experimental values, both axes spanning 1 to 3.]

Figure 4.15 Local electronegativity field for the aligned mutagenic amides set.



The local electron affinity field shown in Figure 4.16, which also contributes strongly to

predictivity, indicates a strong positive relationship with activity on the “puckered” side of

the central nitrogen and a strong negative relationship with activity on the opposite side.

Figure 4.16 Local electron affinity field for the aligned mutagenic amides set.

4.3.6 The Effect of Grid Orientation on Predictivity

A relationship recently has been observed109 between the relative orientation of the

grid surrounding the aligned structures and the predictive q2. An effect of the size of the

grid spacing on the predictivity of CoMFA models has also been reported114. It is now

well-documented that CoMFA analyses tend to give a range of q2 values as the grid

spacing changes or the grid is re-oriented around the aligned structures115. Tropsha, et al.

report that q2 may vary as much as 0.5 q2 units between grid re-orientations, which is

tantamount to the difference between a predictive model and a non-predictive model.

Wang, et al.109 have presented a grid orientation routine that samples all possible

translations and rotations to return the best possible CoMFA q2 value. It would seem that


this approach defeats the purpose of using a validation scheme to estimate the predictivity

of the model since it takes advantage of a defect in the method to give the best possible

statistic. Böhm, et al.114 have attributed the dependence of q2 on grid orientation to the

steepness of calculated steric and electrostatic potentials at the lattice points with a typical

2Å spacing and the use of arbitrary cutoff values. If the grid spacing and orientation lead

to discontinuous values that bias the partial least squares model toward less estimated

predictivity (low q2 values), then might not these same discontinuities lead to an

overestimation of predictivity? A more reasonable use of an all-alignment method might

take the median q2 value from the distribution of values in a manner similar to the method

used by Kroemer, et al.116 in their examination of the effect of cross-validation techniques

on predictivity.
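
The suggestion of reporting a median rather than the best q2 can be sketched as follows. This is only an illustration of the bookkeeping; the function name is hypothetical and the numbers are placeholder values, not results from this work.

```python
from statistics import median

def summarize_q2(q2_by_angle):
    """Summarize q2 values obtained for a series of grid orientations."""
    values = list(q2_by_angle.values())
    return {
        "max": max(values),                    # what an all-orientation search reports
        "min": min(values),
        "range": max(values) - min(values),    # sensitivity to orientation
        "median": median(values),              # more conservative summary
    }

# Hypothetical placeholder values keyed by rotation angle in degrees.
example = {-90: 0.10, -45: 0.35, 0: 0.50, 45: 0.20, 90: 0.30}
summary = summarize_q2(example)
```

Reporting the range alongside the median makes the orientation dependence visible instead of hiding it behind a single optimized value.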

In an effort to offset the effect of grid orientation, the use of grids with 1Å spacing

has been reported108,116, but has not generally been adopted, possibly due to the

significant additional computational expense. For instance, a 1Å-spaced grid with the

same dimensions as a 2Å-spaced grid has roughly eight times as many points, resulting in

about eight times the number of independent (descriptor) variables in the PLS analysis, which can easily

be ~10,000 in number with the use of several property descriptors. Additionally, the use

of the smaller grid spacing may not result in an improvement in q2. Indeed, neither the

analyses using the local properties presented here, nor those in Kroemer’s investigation116,

exhibit an improvement in q2 with the use of the smaller grid spacing.
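
The factor of roughly eight follows from counting lattice points. The short worked estimate below assumes a rectangular box whose edge lengths L_k (in Å) are multiples of the spacing s; the 20 Å cube and the seven property fields are illustrative assumptions, not dimensions taken from the data sets in this chapter.

```latex
N(s) = \prod_{k \in \{x,y,z\}} \left( \frac{L_k}{s} + 1 \right),
\qquad
\frac{N(1\,\text{\AA})}{N(2\,\text{\AA})}
  = \prod_{k} \frac{L_k + 1}{L_k/2 + 1}
  \;\longrightarrow\; 2^{3} = 8 \quad (L_k \gg s).

% Illustrative 20 \AA cube:
N(2\,\text{\AA}) = 11^{3} = 1331, \qquad
N(1\,\text{\AA}) = 21^{3} = 9261, \qquad
\frac{9261}{1331} \approx 7.0 .

% With seven property fields on the 2 \AA lattice:
7 \times 1331 = 9317 \ \text{descriptor columns}.
```

For boxes of realistic size the ratio therefore approaches, but stays slightly below, eight.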

To investigate the response in q2 for the Parasurf-generated local properties to

rotation of the grid, two data sets were aligned in a set of grids rotated in steps of 15°

through a range of −90° to +90° about a common Z-axis. Partial least squares analyses

were performed for each local property, including the combined set of properties for each

grid rotation.
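
The rotation scan described above can be sketched as follows. The sketch assumes that rotating the grid by an angle about Z is handled equivalently by rotating the aligned coordinates by the opposite angle inside a fixed, axis-aligned grid; the names are hypothetical and the actual procedure used with SYBYL is not specified in the text.

```python
import numpy as np

# Thirteen orientations: -90, -75, ..., +75, +90 degrees.
GRID_ANGLES = np.arange(-90, 91, 15)

def rotation_matrix_z(angle_deg):
    """Right-handed rotation matrix about the Z-axis."""
    theta = np.radians(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def rotate_about_z(coords, angle_deg, center=None):
    """Rotate an (N, 3) coordinate array about a Z-axis through `center`."""
    if center is None:
        center = coords.mean(axis=0)           # default: centroid of the aligned set
    return (coords - center) @ rotation_matrix_z(angle_deg).T + center

def orientations(coords):
    """Yield (grid_angle, coordinates) pairs; molecules turn by -angle."""
    for angle in GRID_ANGLES:
        yield int(angle), rotate_about_z(coords, -float(angle))
```

A separate PLS analysis would then be run for each yielded orientation and each local property.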



Table 4.10 Isoquinoline influenza neuraminidase inhibitors data set. [Structure drawing: scaffold containing N and O with variable substituent X.]

Structure X log 1/C

1 4-NO2 2.90

2 4-Br 2.77

3 4-CN 2.84

4 4-Cl 2.81

5 4-F 2.63

6 H 2.58

7 4-CH3 2.68

8 4-OCH3 2.62

9 4-OH 2.24

10 4-OC2H5 2.65

11 4-OC3H7 2.79

12 4-OC4H9 2.78

13 4-C(CH3)3 3.15

14 3-CH3 2.78

15 3-F 2.67

16 3-Cl 2.82

A set of 16 aligned influenza neuraminidase inhibitors117 in Table 4.10 exhibited a

very poor correlation to activity (log 1/C), with an average q2 value of −0.06593 for the

combined set of local property descriptors. The values of q2 for all of the individual local

property PLS analyses were observed to vary greatly among the rotated grids. The

electron density models, which gave the overall best scores, yielded a maximum q2 value

of 0.682 and a minimum value of −0.012, a range of 0.694 q2 units. The other local

properties varied similarly, as shown in Figure 4.17. Local ionization potential (IEL) and

local electronegativity (ENEG) periodically exhibit some measure of correlation to


activity, while local polarizability (POL) consistently contributes nothing to the

predictivity of the model.

The previously described 5-HT1A data set had exhibited good q2 statistics and was

also evaluated for grid orientation dependency. The results of the PLS analyses are

presented in Figure 4.18 below. For five of the local properties: electronegativity,

hardness (HARD), ionization potential, polarizability, and electron affinity (EAL), grid

rotation has a very small effect on the predictivity as measured by q2, while electron

density (DENS) and electrostatic potential (MEP) exhibit a large variance in q2. This

would seem to suggest that when there is a significant contribution from the first five local

properties there is much less dependence on the orientation of the grid. For the

steric and electrostatic parameters, however, the story is much the same as before: the predictive

quality of models that include them is subject to rather severe dependence on the

orientation of the grid.

Figure 4.17 Response of q2 to grid rotation for each local property using the isoquinoline data set. [Plot: q2 (−0.4 to 0.8) vs. grid rotation (−90° to 90° in 15° steps); series: DENS, ENEG, HARD, IEL, MEP, POL, EAL.]



Figure 4.18 Response of q2 to grid rotation for each local property using the 5-HT1A data set. [Plot: q2 (0 to 1) vs. grid rotation (−90° to 90° in 15° steps); series: DENS, ENEG, HARD, IEL, MEP, POL, EAL.]

The q2 for the PLS analysis using the set of combined local properties also exhibits

much less rotation dependence than for the neuraminidase inhibitor data set (Figure 4.19).

Figure 4.19 Response of q2 to grid rotation for combined local properties using the 5-HT1A data set. [Plot: q2 (0.5 to 1) vs. grid rotation (−90° to 90° in 15° steps).]


The predictive capacity (q2) of the PLS regression models using the combined set

of local properties seems to suffer somewhat in comparison with the q2 values of the

individual local-property PLS regressions, presumably due to the addition of “noise” from

poorly-predicting local properties. This is not observed, however, in the case of the set of

mutagenic tertiary amides (Section 4.3.5), where the modestly-predictive local properties

appear to have an additive effect on the overall predictive capacity, even with the

inclusion of the poorly-predicting properties.

4.4 Conclusions

The application of the newly-introduced local properties: EAL, χL, ηL, IEL, V, and

αL, used previously only in a molecular surface context, to several comparative molecular

field analyses has proved to be substantially as predictive as the standard CoMFA steric

and electrostatic parameters. Indeed, the MEP, analogous to the electrostatic parameter in

standard CoMFA, has been found to be the least predictive of the local properties in some

of the cases presented here. Although it is a well-established property used in describing

Coulombic interactions (which are, in turn, the strongest contributors to intermolecular

interaction energies in the gas phase), the MEP is strongly attenuated in polar solvent such

that interactions from among other local properties may predominate in the overall

intermolecular interaction. In the case where a ligand molecule is “solvated” by inclusion

into an enzyme binding pocket, additional local properties offer a rationale for the

orientation of binding that goes beyond charge-cancellation.

In addition to the advantage gained by having a complement of local electronic

properties from which to establish structure-activity relationships, these local properties

seem to be exceptionally robust with respect to grid orientation, such that all-orientation

grid placement schemes may not be required to extract the best possible PLS q2 value.

The local property 3D-QSAR method presented here is similar to the CoMSIA method103

in that several properties contribute to the overall predictive response of the resulting

model. The local properties could very easily be adapted to a standard CoMSIA

methodology by the application of similarity indices to the local property values at the grid

points. The local properties can, in principle, also be used within the Hypothetical Active



Site Lattice106,118 technique (HASL), whereby a 3D lattice of points internal to the van der

Waals radii of the aligned set of molecules is iteratively optimized in terms of partial

(biological, etc.) activities at each lattice point, generating a composite pharmacophore.

The application by any of these methods should prove to be an apt use of local electronic

properties to describe molecular/macromolecular interactions at the point of close contact.

As such, salient electronic features of the binding regions of drug targets may be better-

described by this ensemble of electronic terms.


Chapter 5

Conclusions and Outlook

5.1 Conclusions

The computational methods described here allow drug researchers a means of

applying quantum-mechanically-derived local electronic properties to in silico high

throughput screening schemes in such a way as to not only predict and classify by various

physical properties and biological activities, but also to describe in chemical terms the

nature of the observed activity as a function of surface properties. These properties may

also be visualized by mapping them onto an isodensity surface, making the identification

of important functional moieties readily accessible to everyone involved in the drug design

pathway. Large sets of 2D structures or SMILES strings may be evaluated for potential

problems, such as phospholipidosis-inducing potential, in terms that are useful to the

medicinal chemist. In this way, rapid screening of compounds can be achieved for any

property or activity that can be explained by local properties. The same pool of data can

then be used to extrapolate regions at or near the molecular surface that impact activity the

most. The major drawback in the use of local properties lies in their application to the

surface-integral models, where the interpretation of the model is less than straightforward.

However, when applied to a 3D-QSAR scheme, the individual contributions of the local

properties become evident and may be easily interpreted by indicating the regions that

contribute greatly to function.


Figure 5.1 The contact MEP surface for 5-acetamido-1,3,4-thiadiazole-2-sulfonamide bound to

human carbonic anhydrase II (from the RCSB Protein Data Bank: ID=1YDA)

5.2 Outlook

Given the interest in describing the interactions between a drug and its target in as

much detail as is feasible, the addition of new electronic terms to the traditional

combination of steric and electrostatic parameters in the calculation of binding constants

and free energies, as in the case of automated docking routines, may provide better model

predictivity as well as new insights into the drug-binding process itself. Borrowing from the

ideas of building up a free energy of binding model from the contributions of molecular

fragments119 and the same sort of fragment-based approach in identifying the portions of

the binding region most important to protein-ligand binding, we find that we may be able

to construct a binding energy model based on the contributions of close-contact surface

properties. In essence, the idea is to treat the free energy of binding (ΔGbind) as a sort of

surface-integral-based solvation free energy where the contact free energy term (ΔGcontact)

is the energy associated with the “solvation” of ligand by the substrate (See Equation 5.1).


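$$\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{contact}} + \Delta G_{\mathrm{conf}}^{\mathrm{lig}} + \Delta G_{\mathrm{conf}}^{\mathrm{substr}} + \Delta G_{\mathrm{solv}} \qquad (5.1)$$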

Only the portion of the ligand surface that is in close proximity to the surface of the

enzyme binding pocket is taken to represent the binding surface. The model is constructed

by taking the bound ligand/binding-pocket atoms from crystal structures (along with

associated binding data such as experimentally-determined Kd or Ki values), adding

hydrogens and optimizing the geometries of these hydrogens, keeping the heavy atoms

fixed. Close-contact regions are identified by means of sums of van der Waals radii or by

overlap of isodensity surfaces and recorded. The substrate atoms are removed, leaving

only the ligand in a geometry that is very near that of the bound structure. The local

properties are then calculated for the molecule and, taking only the close-contact portion

of the surface, a regression model of binding energy is constructed from either 1)

statistical descriptors of the close-contact surface, or 2) a surface-integral treatment of the

close-contact surface.
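
The close-contact step of the procedure above can be sketched for the van der Waals criterion. The sketch assumes Bondi van der Waals radii, an adjustable tolerance added to the sum of radii (a hypothetical parameter, not a value from this work), and coordinates already read from the crystal structure; the isodensity-overlap alternative is not shown.

```python
import numpy as np

# Bondi van der Waals radii in Å for a few common elements.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def close_contact_atoms(lig_xyz, lig_elems, pocket_xyz, pocket_elems, tolerance=0.5):
    """Return indices of ligand atoms in close contact with any pocket atom.

    A contact is counted when the interatomic distance does not exceed the
    sum of the two van der Waals radii plus `tolerance` (assumed, in Å).
    """
    lig_r = np.array([VDW_RADII[e] for e in lig_elems])
    poc_r = np.array([VDW_RADII[e] for e in pocket_elems])
    # Pairwise ligand-pocket distances, shape (n_ligand, n_pocket).
    dist = np.linalg.norm(lig_xyz[:, None, :] - pocket_xyz[None, :, :], axis=2)
    cutoff = lig_r[:, None] + poc_r[None, :] + tolerance
    return np.where((dist <= cutoff).any(axis=1))[0]
```

Surface points belonging to the returned atoms would then define the close-contact portion of the ligand surface used in the regression.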

One approximation that is made with this procedure is the neglect of the changes in

conformation in both the ligand and substrate upon binding, but several authors120-122 note

that the bound conformations of ligands are inevitably low-energy geometries and may not

contribute greatly to the overall free energy of binding as defined here. The

conformational free energy of the substrate cannot be evaluated by this method and is

assumed to be small and relatively constant among proteins. The free energy of solvation

term and the contact free energy term in Equation 5.1 are inversely related: as the

ligand enters the binding pocket, it becomes solvated by the binding pocket and desolvated

in bulk solution for the same portion of the molecular surface. Since the solvation free

energy model described in Chapter 2, Section 2.3.2, is calculated by a surface-integral

approach as well, it is a function of the solvent-exposed surface. The prediction of the free

energy of protein/ligand binding by local surface properties then becomes a matter of

including a small portion of the binding pocket of the protein in the initial handling of the

ligand. Further, it may be possible to predict the maximum possible free energy of

binding123 for a given ligand without a crystal structure, using the previously-described

model, and to estimate the amount of contact surface area and the actual contact surface

regions from a given binding constant.


Chapter 6

Summary

Of great interest to the pharmaceutical industry is the elucidation of a set of

chemical/physical properties modulating the relationship between chemical structure and

pharmacological activity that could be used to predict activity based solely on chemical

structure. It is necessary then, only to discover the particular set of molecular descriptors

which adequately describe the activity to be predicted. Since the point of contact for all

drugs lies inevitably with the molecular surface of both the drug and the drug target, a

descriptive model of the molecular surface is needed. The nature of this surface is

electronic and quantum mechanical methods are those which describe the electronic

structure of the molecule. Local properties defined at points on the molecular surface,

such as molecular electrostatic potential (MEP), have been used to describe strong non-

covalent interactions that are based primarily on charge. Recently, additional local

properties have been described which complement MEP and provide a more complete

description of the local electronic environment at the molecular surface. This work

describes the implementation of the five local electronic properties using Parasurf:

electron affinity EAL, electronegativity χL, hardness ηL, ionization potential IEL, molecular

electrostatic potential MEP, and polarizability αL, into three principal methods of

quantitative structure-activity (QSAR) and structure-property prediction (QSPR) for use in

virtual high-throughput screening.


The first of these methods involves the construction of surface-integral models,

which relate physical properties to the sum of the individual contributions of the local

surface properties, as determined by the statistical technique of multiple regression.

Similar regression models for activity have also been constructed from statistical measures

of the local properties, such as maxima, minima, ranges, etc. The predicted properties

may then be mapped onto the molecular surface as local properties, themselves, to expose

surface regions that relate to the observed activity. So, this method provides not only a

predicted property value, but allows for the visualization of the property surface.
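
The idea of a surface-integral model can be sketched as an area-weighted sum of functions of a local property over the surface elements, with coefficients obtained by multiple linear regression. The sketch below is an illustration only: it uses a single local property and integer powers as regression terms, and the function names are hypothetical rather than the models reported in this work.

```python
import numpy as np

def surface_descriptors(values, areas, exponents=(1, 2)):
    """Area-weighted sums of powers of a local property over the surface."""
    return np.array([np.sum(areas * values ** p) for p in exponents])

def fit_surface_integral_model(surfaces, targets, exponents=(1, 2)):
    """Least-squares fit; `surfaces` is a list of (values, areas) per molecule."""
    rows = [np.concatenate(([1.0], surface_descriptors(v, a, exponents)))
            for v, a in surfaces]
    coef, *_ = np.linalg.lstsq(np.array(rows),
                               np.asarray(targets, dtype=float), rcond=None)
    return coef                                 # [intercept, c_1, c_2, ...]

def local_contributions(values, areas, coef, exponents=(1, 2)):
    """Per-surface-element contributions, suitable for mapping onto the surface."""
    return areas * sum(c * values ** p for c, p in zip(coef[1:], exponents))
```

Summing `local_contributions` over a surface and adding the intercept reproduces the predicted property, which is what allows the prediction itself to be displayed as a local property.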

Representation of a local property surface (MEP) used to construct surface-integral models.

Seven such models have been constructed for the prediction of 1) the n-octanol/water

partition coefficient logP, 2) the free energy of solvation in water ΔGsolv.(H2O), 3) the free

energy of solvation in n-octanol ΔGsolv.(oct.), 4) the acid dissociation constant pKa, 5)

boiling point Tb, 6) the glass transition temperature of organic LED materials Tg, and 7)

water solubility logS.

The second statistical method employing the local properties uses support vector

machines to classify drug compounds as inducers of phospholipidosis. Drug-induced

phospholipidosis is an undesirable side-effect of, primarily, cationic amphiphilic drugs


that causes lysosomal bodies, which contain large deposits of undegraded phospholipids,

to aggregate inside the cells of the lungs, liver, kidneys, corneas, and brain. Their

presence often coincides with adverse clinical effects such as inflammatory reactions and

fibrosis. The support vector machines use the statistical measures of the local properties

as descriptors for the classification of a test set of compounds, based on the model

constructed from a training set with the same number of compounds.

Representative local property field (EAL) using CoMFA methodology.

The third statistical method to which the local properties were applied was the 3D-

QSAR method of Comparative Molecular Field Analysis (CoMFA), which involves

aligning a set of molecules and determining the relationship between biological activity

and steric and electrostatic potentials at each of a set of grid points surrounding the

structures by a partial least squares analysis. It has been argued that steric and

electrostatic fields alone do not present an adequate representation of drug-receptor

interactions, so additional physicochemical properties are required. In our formulation,


the standard CoMFA electrostatic parameter corresponds to the MEP and the steric

parameter corresponds to the electron density, with the additional local properties

augmenting these basic parameters. Five structure-activity data sets were

examined and our method produced q2 values comparable to, if not better than, the

reported standard CoMFA values. In addition, it was noted that the individual local

properties consistently gave better q2 values than either the MEP or electron density

parameters and proved to be significantly less sensitive to grid orientation. This method

effectively expands the descriptive vocabulary of 3D-QSAR and is better able to reveal

important intermolecular interactions not elucidated by shape and charge fields alone.

The computational methods described here allow drug researchers a means of

applying quantum-mechanically-derived local electronic properties to in silico high

throughput screening schemes in such a way as to not only predict and classify by various

physical properties and biological activities, but also to describe in chemical terms the

nature of the observed activity as a function of surface properties. These properties may

also be visualized by mapping them onto an isodensity surface, making the identification

of important functional moieties readily accessible to everyone involved in the drug design

pathway.


Chapter 7

Zusammenfassung (Summary)

The elucidation of physicochemical properties that establish a direct relationship
between chemical structure and pharmacological activity, and thus allow activity to be
estimated solely from the structural information of a substance, is of essential importance
to the pharmaceutical industry. If this succeeds, only a suitable set of molecular
descriptors is then needed to describe adequately the activity to be predicted. The
interface between a drug and the active site of the target molecule is inevitably determined
by the molecular surfaces of both species, which is why a model describing these surfaces
is needed. Because these surfaces are electronic in nature, quantum-chemical methods are
used to describe them. Until now, local properties on the molecular surface, such as the
molecular electrostatic potential (MEP), have been used to describe strong, charge-based,
non-covalent interactions. Recently, additional local properties have been integrated into
this approach; they complement the information provided by the MEP and thus lead to a
more complete description of the local electronic environment at the molecular surface.
This work describes the integration of five different local electronic properties, such as
electron affinity EAL, electronegativity χL, hardness ηL, ionization potential IEL,
molecular electrostatic potential MEP, and polarizability αL, into three principal methods
of quantitative structure-activity (QSAR) and structure-property (QSPR) prediction for use
in high-throughput screening (HTS) applications.

Representation of an MEP surface used for the construction of surface-integral models.

The first of these methods involves the construction of surface-integral models.
These relate physical properties to the sum of the individual contributions of the local
properties on the surface, as determined by multiple regression. Comparable regression
models for the determination of activity were also set up using statistical measures such
as maxima, minima, and ranges. The predicted properties can then be mapped onto the
molecular surface as local properties in order to reveal the regions that are associated
with the observed activity. This method therefore not only predicts property values but is
also suited to visualizing a property surface. In total, seven such models were
constructed, for the prediction of the following properties: 1) the n-octanol/water
partition coefficient (logP), 2) the free energy of solvation in water ΔGsolv.(H2O), 3) the
free energy of solvation in n-octanol ΔGsolv.(oct.), 4) the acid dissociation constant pKa,
5) the boiling point Tb, 6) the glass transition temperature of organic LED materials Tg,
and 7) the water solubility logS.

The second statistical method employing the local properties uses support vector
machines to classify drug compounds as inducers of phospholipidosis. Drug-induced
phospholipidosis is an undesirable side effect of mostly cationic amphiphilic drugs that
leads to the aggregation of lysosomes containing high concentrations of undegraded
phospholipids; in the lungs, liver, kidneys, cornea, and brain this frequently results in
harmful effects such as inflammation and fibrosis. The support vector machines use the
statistical measures of the local properties as descriptors for classifying a selection of
substances, based on a model obtained beforehand in a training procedure.

Representation of the spatial distribution of a local property (EAL) generated using the CoMFA methodology.


The third statistical method into which the local properties were integrated is the 3D-QSAR method Comparative Molecular Field Analysis (CoMFA). Starting from a set of mutually aligned molecules, CoMFA uses partial least-squares analysis on a set of grid points to establish a relationship between biological activity and the steric and electrostatic potentials. It was suspected that fields based on sterics and electrostatics alone do not adequately represent the drug-receptor interaction, so additional chemical and physical properties had to be taken into account. In the approach developed here, the electrostatic CoMFA parameter is represented by the MEP and the steric parameter by the electron density, and the additional local properties supplement these basic parameters. Five of the structure-activity data sets treated with this method gave q² values comparable to, or even better than, the literature values obtained with standard CoMFA. In addition, the individual local properties consistently gave better q² values than those calculated from the MEP or electron-density parameters alone, and they proved less sensitive to the orientation of the grid. The method presented in this work extends the capabilities of 3D-QSAR and is far better able to reveal important information about intermolecular interactions that could not previously be explained with steric fields and charge distributions alone.
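The cross-validated q² quoted above can be made concrete with a short sketch. It uses the common leave-one-out definition q² = 1 − PRESS/SS, where PRESS is the sum of squared errors of predictions made without the predicted compound and SS is the sum of squared deviations of the observed activities from their mean. To stay self-contained it replaces the partial least-squares model of CoMFA with a one-descriptor ordinary least-squares fit; this substitution and the numbers are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single descriptor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def q2_leave_one_out(xs, ys):
    """q2 = 1 - PRESS / SS with leave-one-out predictions."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss

# Hypothetical descriptor values and activities.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1]
q2 = q2_leave_one_out(descriptor, activity)
```

Unlike the fitted r², q² can become negative when the left-out predictions are worse than simply predicting the mean activity.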

The computational methods described here open up the possibility of applying quantum-mechanically generated local electronic properties in in silico HTS. They allow not only the prediction and classification of physical properties and biological activities, but also a chemical description of the nature of the observed property as a function of surface information. This information can then be mapped graphically onto a molecular surface and displayed, which makes the identification of important functional sites readily accessible to everyone involved in the drug-development process chain.


Appendix A

Data Sets

Table A1 Nonlinear regression terms used in calculating surface-integral models.

Each base function $f(\mathbf{r})$ listed below contributes six consecutively numbered terms with increasing exponent: $[f(\mathbf{r})]^{1/2}$, $f(\mathbf{r})$, $[f(\mathbf{r})]^{3/2}$, $[f(\mathbf{r})]^{2}$, $[f(\mathbf{r})]^{5/2}$, $[f(\mathbf{r})]^{3}$. The exponents 3/2 to 3 are printed explicitly in the table; the exponents 1/2 and 1 of the first two terms in each group are not legible and are inferred from that progression.

Term numbers  Base function $f(\mathbf{r})$
1–6  $V(\mathbf{r})$
7–12  $IE_L(\mathbf{r})$
13–18  $EA_L(\mathbf{r})$
19–24  $\alpha_L(\mathbf{r})$
25–30  $\eta_L(\mathbf{r})$
31–36  $V(\mathbf{r}) \cdot IE_L(\mathbf{r})$
37–42  $V(\mathbf{r}) \cdot EA_L(\mathbf{r})$
43–48  $V(\mathbf{r}) \cdot \alpha_L(\mathbf{r})$
49–54  $V(\mathbf{r}) \cdot \eta_L(\mathbf{r})$
55–60  $IE_L(\mathbf{r}) \cdot EA_L(\mathbf{r})$
61–66  $IE_L(\mathbf{r}) \cdot \alpha_L(\mathbf{r})$
67–72  $IE_L(\mathbf{r}) \cdot \eta_L(\mathbf{r})$
73–78  $EA_L(\mathbf{r}) \cdot \alpha_L(\mathbf{r})$
79–84  $EA_L(\mathbf{r}) \cdot \eta_L(\mathbf{r})$
85–90  $\alpha_L(\mathbf{r}) \cdot \eta_L(\mathbf{r})$
91–96  $V(\mathbf{r}) \cdot IE_L(\mathbf{r}) \cdot EA_L(\mathbf{r})$
97–102  $V(\mathbf{r}) \cdot IE_L(\mathbf{r}) \cdot \alpha_L(\mathbf{r})$
103–108  $V(\mathbf{r}) \cdot IE_L(\mathbf{r}) \cdot \eta_L(\mathbf{r})$
109–114  $V(\mathbf{r}) \cdot EA_L(\mathbf{r}) \cdot \alpha_L(\mathbf{r})$
115–120  $V(\mathbf{r}) \cdot EA_L(\mathbf{r}) \cdot \eta_L(\mathbf{r})$
121–126  $IE_L(\mathbf{r}) \cdot EA_L(\mathbf{r}) \cdot \alpha_L(\mathbf{r})$
127–132  $IE_L(\mathbf{r}) \cdot EA_L(\mathbf{r}) \cdot \eta_L(\mathbf{r})$
133–138  $IE_L(\mathbf{r}) \cdot \alpha_L(\mathbf{r}) \cdot \eta_L(\mathbf{r})$
139–144  $EA_L(\mathbf{r}) \cdot \alpha_L(\mathbf{r}) \cdot \eta_L(\mathbf{r})$
145–150  $V(\mathbf{r}) \cdot EA_L(\mathbf{r}) \cdot \alpha_L(\mathbf{r}) \cdot \eta_L(\mathbf{r})$
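How terms of this kind enter a surface-integral model can be sketched as follows. The sketch assumes that the model sums, over all surface points, a linear combination of the regression terms weighted by the surface area assigned to each point, and that each group of six terms raises a product of local properties to the powers 1/2, 1, 3/2, 2, 5/2, and 3. The property names, coefficients, surface points, and areas are hypothetical, and all property values are taken as positive so that the fractional powers stay real.

```python
# Exponents assumed for the six terms of each group.
EXPONENTS = (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)

def term_value(local_values, exponent):
    """Raise the product of the selected local properties to the given power."""
    base = 1.0
    for value in local_values:
        base *= value
    return base ** exponent

def surface_integral(points, properties, coefficients):
    """Sum coefficient * term * area over all surface points.

    points:       list of dicts holding local property values and an 'area' entry
    properties:   names of the properties multiplied to form the base function
    coefficients: one (hypothetical) regression coefficient per exponent
    """
    total = 0.0
    for point in points:
        values = [point[name] for name in properties]
        for coeff, exponent in zip(coefficients, EXPONENTS):
            total += coeff * term_value(values, exponent) * point["area"]
    return total

# Two hypothetical surface points with made-up property values and areas.
points = [
    {"IE": 4.0, "alpha": 1.0, "area": 0.5},
    {"IE": 9.0, "alpha": 1.0, "area": 2.0},
]
# Only the square-root term of the IE * alpha group is switched on here.
value = surface_integral(points, ("IE", "alpha"), (1.0, 0.0, 0.0, 0.0, 0.0, 0.0))
```

A full model would add the contributions of every retained group, with coefficients obtained by regression against the experimental property.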

Table A2 The logP data set.

No.  Compound  Exp.  Calc.
1  glutamine  -3.64  -3.16
2  citric acid  -1.72  -0.59
3  phenylalanine  -1.52  -1.79
4  tryptophan  -1.06  -1.12
5  1,3-propanediol  -1.04  -1.02
6  maleic acid-hydrazide  -0.84  -0.43
7  N-formylcyclobutane carboxamide  -0.70  0.19
8  allopurinol  -0.55  1.23
9  2,2-dimethylpropionic acid-hydrazide  -0.35  0.41
10  3-fluoropropanol  -0.28  -0.14
11  thiamphenicol  -0.27  1.19
12  2',3'-dideoxyadenosine  -0.22  1.17
13  3-mesylphenyl urea  -0.12  -0.95
14  imidazole  -0.08  0.26
15  caffeine  -0.07  0.91
16  o-methyl THPO  -0.04  0.82
17  5,6-dihydro-2-methyl-1,4-oxathiin-3-carboxylic acid  0.04  0.32
18  mercaptoacetic acid  0.09  0.62
19  6-methylthioinosine  0.09  1.85
20  merbarone  0.14  2.64
21  atenolol  0.16  1.68
22  o-methylbenzoyl hydrazine  0.22  1.01
23  pentoxifylline  0.29  1.49
24  nikethamide  0.33  0.10
25  p-hydroxybenzamide  0.33  2.02
26  2,2-dichloroethanol  0.37  1.27
27  antipyrine  0.38  1.87
28  1-acetyl-N-(4-fluorophenyl)hydrazine carboxamide  0.42  -0.11
29  sulpiride  0.42  0.68
30  piperazine-2-carboxanilide  0.48  0.76
31  fluconazole  0.50  3.06
32  acetaminophen  0.51  0.20
33  2-amino-5-methoxy benzimidazole  0.57  0.99
34  sotalol  0.59  0.98
35  glutaric acid dimethyl ester  0.62  0.07
36  3-(5-nitro-2-furanyl)-2-propenoic amide  0.65  0.84
37  gallic acid  0.70  -0.23
38  1-acetyl-6-dimethyl-7-methoxymitosene  0.72  0.86
39  2-azacycloheptanethione  0.75  2.11
40  N-(2-benzoyl-oxyacetyl)-2-carboxyazetidine  0.79  0.37
41  chloropentazide  0.84  1.89
42  4-pyridinebutylamine  0.86  2.24
43  procainamide  0.88  1.52
44  tiapride  0.90  0.68
45  4-methylthiazole  0.97  1.40
46  6-cyanoquinoxaline  1.01  2.46
47  syringic acid  1.04  -0.02
48  1-phenyl-3-cyanoguanidine  1.05  2.06
49  m-acetylaminoacetophenone  1.10  0.98
50  acetylsalicylic acid  1.19  0.63
51  benzaldehydesemicarbazone  1.27  0.71
52  4-oxo-4-phenylbutanoic acid  1.30  1.00
53  2-phenylethanol  1.36  1.62
54  carocainide  1.38  1.43
55  3-bromobenzenesulfonamide  1.39  1.30
56  bromochloromethane  1.41  1.10
57  trimethylacetic acid  1.47  0.81
58  o-fluorophenylacetic acid  1.50  1.19
59  2-(2,6-dichloro-4-hydroxyphenylimino)imidazolidine  1.52  1.72
60  N-phenyl-4-aminophenylsulfonamide  1.55  0.54
61  hydrocortisone  1.55  1.21
62  tryptamine  1.55  1.57
63  acetophenone  1.58  1.39
64  p-(N,N-dimethylcarbamate)-N,N-dimethylcarbamate, benzyl ester  1.59  1.48
65  prednisolone  1.60  1.82
66  1-dodecanesulfonic acid  1.60  2.46
67  2-methylquinoxaline  1.61  2.72
68  3,5-dimethoxyphenol  1.64  0.74
69  bromazepam  1.65  2.89
70  indole-3-ethanolcarbamate  1.69  1.44
71  3-indolylpropionic acid  1.75  1.84
72  pindolol  1.75  2.09
73  propylene  1.77  1.54
74  2-oxoisopropyl-5-phenyl-5'-ethylbarbituric acid  1.79  3.11
75  4-dimethylamino-thieno(2,3-D)pyrimidine  1.82  2.09
76  dexamethasone  1.83  1.28
77  2-acetyl-oxyethyl benzoate  1.85  1.40
78  4-chloroaniline  1.88  1.53
79  N-methyl-2,3-dimethylphenyl carbamate  1.95  1.72
80  o-methylphenoxyacetic acid  1.98  0.95
81  acetic acid-m-methoxybenzoate  2.02  1.42
82  quinoline  2.03  2.69
83  1,1'-dioxo-3-cyclohexen-3-yl-1,2,4-benzothiadiazine  2.05  1.67
84  3,4-dimethylacetanilide  2.10  1.71
85  indole  2.14  1.59
86  mexiletine  2.15  2.32
87  griseofulvin  2.18  2.37
88  carbamazepine  2.19  2.08
89  hydrocortisone acetate  2.19  2.88
90  o-methylbenzaldehyde  2.26  1.48
91  4-bromoaniline  2.26  1.52
92  2,6-dimethoxypyridine  2.30  1.35
93  thiophene-2-carboxylic acid, ethyl ester  2.33  1.50
94  21-deoxybetamethasone  2.35  2.11
95  thiosalicylic acid  2.39  1.98
96  butyl gallate  2.41  1.14
97  1-pyrrol-2-yl-pentanone  2.42  1.55
98  4-phenylbutyric acid  2.42  2.04
99  5,5'-diphenylhydantoin  2.47  2.49
100  8-trifluoromethylquinoline  2.50  1.28
101  3-chlorophenol  2.50  1.93
102  lorazepam  2.51  3.04
103  2,17-dihydroxy-3-oxolactone-7,21-dicarboxy-pregan-4-ene  2.54  3.65
104  disopyramide  2.58  3.65
105  N-benzyl-N-formylaniline  2.62  2.62
106  5,6-diazaphenanthrene  2.71  3.47
107  lormetazepam  2.72  3.58
108  1-methyl-1,3-dihydro-5-(2-fluorophenyl)-7-chloro-1,4-benzodiazepin-2-one  2.75  3.57
109  diazepam  2.79  3.73
110  3-butyl-R,S-1-(3H)-isobenzofuranone  2.80  2.57
111  2-anilino-1,4-naphthoquinone  2.84  2.33
112  4-aminobiphenyl  2.86  2.90
113  quinidine  2.88  4.55
114  chlorobenzene  2.89  1.95
115  dihydromorphanthridine  2.90  3.67
116  p-phenoxyaniline  2.93  2.67
117  3-bromoquinoline  3.03  2.96
118  octanoic acid  3.05  2.04
119  deoxycorticosterone acetate  3.08  3.42
120  alprenolol  3.10  3.38
121  indecainide  3.11  4.09
122  N-(3,4-dichlorophenyl)difluoroacetamide  3.18  2.61
123  benzophenone  3.18  2.67
124  p-fluorotoluene  3.20  1.71
125  testosterone  3.29  3.37
126  1-(3,4-dichlorophenyl)-2-isopropylaminoethanol  3.32  3.65
127  3-methoxy-4-cyclohexylmethoxyphenylacetic acid  3.35  3.28
128  naphthalene  3.37  3.29
129  anthraquinone  3.39  2.06
130  1,2-dichlorobenzene  3.43  2.14
131  prometryn  3.51  2.37
132  4,7-dichloroquinoline  3.57  3.80
133  9-(N-((N,N’-diethylamino)acetyl)amino)fluorene  3.64  4.66
134  3,5-dichlorophenol  3.68  2.15
135  indigo  3.72  2.55
136  flecainide  3.78  2.38
137  3,4-dimethylchlorobenzene  3.82  3.30
138  diflubenzuron  3.83  2.14
139  estradiol  4.01  3.55
140  1-(4-cyclohexylphenyl)-3-methoxy-3-methylurea  4.08  3.61
141  1-phenyl-1-benzyl-2-methyl-3-(N,N-dimethylamino)-propanoic acid, propyl ester  4.18  4.48
142  2,6-dimethylnaphthalene  4.31  3.89
143  aminopyrene  4.31  4.15
144  fluphenazine  4.36  3.87
145  1,3-dimethylnaphthalene  4.42  4.08
146  1,3-dithiolan-2-ylidine-propanoic acid, dibutyl ester  4.60  4.32
147  1,2,4,5-tetrachlorobenzene  4.60  5.32
148  propafenone  4.63  3.53
149  bifonazole  4.77  5.82
150  aprindine  4.86  6.00
151  diethylstilbestrol  5.07  4.56
152  fluoranthene  5.16  4.93
153  triflupromazine  5.19  4.18
154  clotrimazole  5.20  5.25
155  teflubenzuron  5.39  3.79
156  hexaflumuron  5.43  3.77
157  2,4,4'-trichlorobiphenyl  5.62  5.07
158  2,4,5-trichlorobiphenyl  5.90  4.50
159  thioridazine  5.90  5.47
160  phenylanthracene  6.01  6.33
161  flufenoxuron  6.16  5.10
162  1,3,7,8-tetrachlorodibenzodioxin  6.30  6.92
163  chlorfluazuron  6.63  5.09
164  1,2,3,6,7-pentachlorodibenzodioxin  6.74  8.00
165  linoleic acid  7.05  5.90
166  palmitic acid  7.17  5.26
167  3,3',4,4',5,5'-hexachlorobiphenyl  7.41  6.46
168  stearic acid  8.23  6.15
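The agreement between the Exp. and Calc. columns of such a table is usually summarised by a few error statistics. The following sketch computes the mean signed error, mean unsigned error, and root-mean-square error for the first five rows of Table A2 only; the function name `error_summary` is a hypothetical helper, and the choice of statistics is an assumption rather than the evaluation protocol of this work.

```python
import math

# (compound, experimental logP, calculated logP): first five rows of Table A2.
rows = [
    ("glutamine", -3.64, -3.16),
    ("citric acid", -1.72, -0.59),
    ("phenylalanine", -1.52, -1.79),
    ("tryptophan", -1.06, -1.12),
    ("1,3-propanediol", -1.04, -1.02),
]

def error_summary(data):
    """Return (mean signed error, mean unsigned error, RMSE) of calc - exp."""
    errors = [calc - exp for _, exp, calc in data]
    n = len(errors)
    mean_signed = sum(errors) / n
    mean_unsigned = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_signed, mean_unsigned, rmse

mse, mue, rmse = error_summary(rows)
```

Extending `rows` to the full table gives the statistics for the complete data set.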

Table A3 The free energy of solvation in n-octanol data set.

No.  Compound  Exp.  Calc.
1  methane  0.51  -0.67
2  ethane  -0.64  -1.19
3  propane  -1.26  -1.77
4  cyclopropane  -1.60  -0.98
5  2-methylpropane  -1.45  -2.17
6  2,2-dimethylpropane  -1.74  -2.36
7  n-butane  -1.86  -2.22
8  cyclopentane  -2.65  -3.49
9  n-pentane  -2.45  -2.80
10  n-hexane  -3.01  -3.32
11  cyclohexane  -3.46  -3.16
12  methylcyclohexane  -3.21  -3.36
13  n-heptane  -3.74  -3.90
14  n-octane  -4.18  -4.47
15  ethylene  -0.27  -0.74
16  propylene  -1.14  -1.42
17  2-methylpropene  -2.03  -2.04
18  1-butene  -1.89  -2.23
19  1-hexene  -2.94  -3.37
20  1,3-butadiene  -2.10  -2.16
21  acetylene  -0.51  -0.61
22  propyne  -1.59  -1.25
23  1-pentyne  -2.79  -2.55
24  1-hexyne  -3.43  -3.02
25  benzene  -3.72  -3.92
26  toluene  -4.55  -4.50
27  ethylbenzene  -5.08  -5.20
28  m-xylene  -5.25  -5.11
29  o-xylene  -5.07  -4.98
30  p-xylene  -5.19  -5.07
31  naphthalene  -6.97  -6.97
32  anthracene  -10.47  -10.11
33  1,1-difluoroethane  -1.13  -2.39
34  tetrafluoromethane  1.50  0.36
35  fluorobenzene  -3.87  -5.15
36  chlorotrifluoromethane  -1.97  -0.49
37  dichlorodifluoromethane  -1.25  -1.68
38  fluorotrichloromethane  -2.63  -2.87
39  1,1,2-trichloro-1,2,2-trifluoroethane  -2.54  -2.66
40  1-bromo-1-chloro-2,2,2-trifluoroethane  -3.27  -4.01
41  bromotrifluoromethane  -0.75  -1.44
42  dichloromethane  -3.07  -2.44
43  trichloromethane  -3.81  -3.15
44  chloroethane  -2.58  -2.20
45  1,1,1-trichloroethane  -3.69  -4.05
46  1,1,2-trichloroethane  -4.53  -3.95
47  1-chloropropane  -3.06  -2.93
48  2-chloropropane  -2.84  -1.51
49  cis-1,2-dichloroethylene  -3.71  -3.33
50  trans-1,2-dichloroethylene  -3.61  -2.75
51  trichloroethylene  -3.75  -3.29
52  tetrachloroethylene  -4.24  -3.97
53  chlorobenzene  -5.00  -5.42
54  1,2-dichlorobenzene  -6.01  -6.26
55  1,4-dichlorobenzene  -5.67  -6.55
56  2,2'-dichlorobiphenyl  -9.41  -8.78
57  2,3-dichlorobiphenyl  -9.23  -9.98
58  2,2',3'-trichlorobiphenyl  -9.12  -9.81
59  bromomethane  -2.43  -2.40
60  dibromomethane  -4.18  -4.90
61  tribromomethane  -5.62  -5.05
62  bromoethane  -2.90  -2.93
63  1-bromopropane  -3.42  -3.59
64  2-bromopropane  -3.40  -2.94
65  1-bromobutane  -4.16  -4.41
66  1-bromopentane  -4.68  -5.02
67  3-bromopropene  -3.30  -4.04
68  bromobenzene  -5.46  -5.59
69  1,4-dibromobenzene  -7.47  -7.46
70  p-bromotoluene  -6.36  -6.14
71  methanol  -3.87  -4.13
72  ethanol  -4.36  -4.49
73  ethylene glycol  -7.44  -6.90
74  1-propanol  -5.02  -5.11
75  2-propanol  -4.62  -4.93
76  1,1,1-trifluoro-2-propanol  -5.12  -6.15
77  1,1,1,3,3,3-hexafluoro-2-propanol  -5.76  -3.60
78  1-butanol  -5.71  -5.49
79  t-butanol  -4.78  -4.75
80  1-pentanol  -6.40  -6.13
81  1-hexanol  -7.06  -6.67
82  1-heptanol  -7.75  -7.19
83  1-octanol  -8.13  -7.75
84  allyl alcohol  -5.27  -5.34
85  phenol  -8.69  -7.46
86  4-bromophenol  -10.59  -9.48
87  2-cresol  -8.49  -7.89
88  3-cresol  -8.20  -8.05
89  4-cresol  -8.84  -8.12
90  2,2,2-trifluoroethanol  -4.81  -6.75
91  2-methoxyethanol  -5.83  -5.14
92  methyl propyl ether  -3.63  -3.41
93  methyl isopropyl ether  -4.64  -3.48
94  methyl t-butyl ether  -3.49  -3.28
95  diethyl ether  -2.89  -3.14
96  THF  -3.93  -3.61
97  anisole  -5.47  -5.82
98  ethyl phenyl ether  -5.65  -6.29
99  1,2-dimethoxyethane  -4.55  -4.25
100  1,4-dioxane  -4.89  -5.23
101  propanal  -4.13  -4.29
102  butanal  -4.62  -4.54
103  benzaldehyde  -6.13  -6.66
104  m-hydroxybenzaldehyde  -11.39  -10.71
105  p-hydroxybenzaldehyde  -12.36  -10.94
106  acetone  -3.15  -3.94
107  2-butanone  -3.78  -4.41
108  3,3-dimethyl-2-butanone  -4.53  -5.14
109  2-pentanone  -4.35  -4.96
110  3-pentanone  -4.36  -4.77
111  cyclopentanone  -5.01  -5.67
112  2-hexanone  -5.02  -5.21
113  2-heptanone  -5.65  -5.94
114  2-octanone  -6.38  -6.31
115  acetophenone  -6.74  -7.14
116  acetic acid  -6.35  -5.20
117  propionic acid  -6.86  -5.70
118  butyric acid  -7.58  -6.27
119  pentanoic acid  -8.22  -6.92
120  hexanoic acid  -8.82  -7.41
121  4-amino-3,5,6-trichloropyridine-2-carboxylic acid  -12.37  -13.33
122  methyl formate  -2.82  -5.09
123  methyl acetate  -3.54  -4.16
124  ethyl acetate  -4.06  -4.70
125  propyl acetate  -4.55  -5.18
126  butyl acetate  -4.96  -5.76
127  methyl propionate  -4.06  -4.69
128  methyl butyrate  -4.59  -5.22
129  methyl pentanoate  -5.13  -5.93
130  methyl benzoate  -7.26  -8.06
131  methylamine  -3.78  -3.31
132  ethylamine  -4.09  -3.99
133  propylamine  -4.77  -4.57
134  butylamine  -5.35  -5.00
135  diethylamine  -4.75  -4.36
136  dipropylamine  -6.02  -5.64
137  trimethylamine  -3.60  -2.35
138  piperazine  -5.80  -5.91
139  aniline  -6.71  -8.18
140  hydrazine  -6.48  -7.06
141  morpholine  -5.99  -5.33
142  piperidine  -6.27  -4.67
143  pyridine  -5.34  -4.89
144  2-methylpyridine  -6.14  -5.38
145  3-methylpyridine  -6.40  -5.60
146  4-methylpyridine  -6.60  -5.66
147  2-ethylpyridine  -6.40  -5.99
148  2-methylpyrazine  -5.87  -6.30
149  2-ethyl-3-methoxypyrazine  -6.85  -7.54
150  acetonitrile  -3.15  -2.29
151  propionitrile  -3.66  -3.14
152  butyronitrile  -4.25  -3.67
153  benzonitrile  -6.09  -6.54
154  2,6-dichlorobenzonitrile  -9.18  -8.05
155  1-propanethiol  -3.52  -3.80
156  thiophenol  -5.99  -6.68
157  thioanisole  -6.47  -6.99
158  dimethyl sulfide  -4.24  -2.61
159  diethyl sulfide  -4.09  -3.48
160  dipropyl sulfide  -3.89  -4.99
161  trimethyl phosphate  -7.81  -8.94
162  triethyl phosphate  -8.88  -8.78
163  tripropyl phosphate  -8.65  -8.22
164  2,2-dichloroethenyl dimethyl phosphate  -8.59  -7.89
165  o-ethyl-o'-(4-bromo-2-chlorophenyl) S-propyl phosphorothioate  -10.49  -10.80

Table A4 The free energy of solvation in water data set.

No.  Compound  Exp.  Calc.
1  methane  1.98  0.98
2  ethane  1.83  0.85
3  propane  1.96  1.04
4  cyclopropane  0.75  0.07
5  2-methylpropane  2.32  1.42
6  2,2-dimethylpropane  2.50  1.98
7  n-butane  2.08  1.24
8  2,2-dimethylbutane  2.59  2.14
9  cyclopentane  1.20  -0.64
10  n-pentane  2.33  1.44
11  2-methylpentane  2.52  1.81
12  3-methylpentane  2.51  1.73
13  2,4-dimethylpentane  2.88  2.22
14  2,2,4-trimethylpentane  2.85  2.72
15  methylcyclopentane  1.60  -0.13
16  n-hexane  2.49  1.64
17  cyclohexane  1.23  0.74
18  methylcyclohexane  1.71  1.18
19  cis-1,2-dimethylcyclohexane  1.58  1.59
20  n-heptane  2.62  1.84
21  n-octane  2.89  2.02
22  ethylene  1.27  0.86
23  propylene  1.27  0.72
24  2-methylpropene  1.16  0.46
25  1-butene  1.38  0.87
26  2-methyl-2-butene  1.31  0.26
27  3-methyl-1-butene  1.83  1.18
28  1-pentene  1.66  1.14
29  trans-2-pentene  1.34  0.72
30  4-methyl-1-pentene  1.91  1.51
31  cyclopentene  0.56  -0.77
32  1-hexene  1.66  1.35
33  cyclohexene  0.37  -0.58
34  trans-2-heptene  1.66  1.13
35  1-methylcyclohexene  0.67  -0.75
36  1-octene  2.17  1.85
37  1,3-butadiene  0.61  0.77
38  2-methyl-1,3-butadiene  0.68  0.57
39  2,3-dimethyl-1,3-butadiene  0.40  0.37
40  1,4-pentadiene  0.94  0.89
41  1,5-hexadiene  1.01  1.06
42  acetylene  -0.01  0.55
43  propyne  -0.48  -0.11
44  1-butyne  -0.15  0.05
45  1-pentyne  -0.16  0.48
46  1-hexyne  0.01  0.61
47  1-heptyne  0.60  0.88
48  1-octyne  0.71  1.05
49  1-nonyne  1.05  1.28
50  vinyl acetate  0.04  0.19
51  benzene  -0.89  -0.88
52  toluene  -0.76  -0.96
53  1,2,4-trimethylbenzene  -0.86  -1.12
54  ethylbenzene  -0.61  -0.77
55  m-xylene  -0.80  -1.05
56  o-xylene  -0.90  -1.01
57  p-xylene  -0.80  -1.04
58  propylbenzene  -0.53  -0.37
59  butylbenzene  -0.40  -0.27
60  t-butylbenzene  -0.44  0.23
61  t-amylbenzene  -0.18  0.36
62  naphthalene  -2.41  -2.43
63  anthracene  -4.23  -4.02
64  phenanthrene  -4.06  -4.09
65  acenaphthene  -3.40  -3.75
66  p-chlorotoluene  -1.92  -2.19
67  fluoromethane  -0.22  -2.29
68  1,1-difluoroethane  -0.11  -3.26
69  trifluoromethane  0.80  -1.15
70  tetrafluoromethane  3.16  2.09
71  hexafluoroethane  3.94  3.33
72  octafluoropropane  4.28  4.80
73  fluorobenzene  -0.78  -3.31
74  2-chloro-1,1,1-trifluoroethane  0.05  -1.15
75  chlorofluoromethane  -0.77  -0.68
76  chlorodifluoromethane  0.11  1.13
77  chlorotrifluoromethane  2.52  2.93
78  dichlorodifluoromethane  1.69  2.54
79  fluorotrichloromethane  0.82  0.33
80  1,1,2-trichloro-1,2,2-trifluoroethane  1.77  3.05
81  1,1,2,2-tetrachlorodifluoroethane  0.82  2.38
82  chloropentafluoroethane  2.86  3.46
83  1,1-dichlorotetrafluoroethane  2.50  2.75
84  1,2-dichlorotetrafluoroethane  2.31  3.97
85  1-bromo-1-chloro-2,2,2-trifluoroethane  -0.13  -1.23
86  bromotrifluoromethane  1.79  -1.16
87  1-bromo-1,2,2,2-tetrafluoroethane  0.52  -0.80
88  chloromethane  -0.56  -0.81
89  dichloromethane  -1.36  -1.02
90  trichloromethane  -1.07  -0.34
91  tetrachloromethane  0.10  -0.38
92  chloroethane  -0.63  -0.76
93  1,1-dichloroethane  -0.85  -1.42
94  (E)-1,2-dichloroethane  -1.73  -1.64
95  1,1,1-trichloroethane  -0.25  -1.26
96  1,1,2-trichloroethane  -1.95  -1.66
97  1,1,1,2-tetrachloroethane  -1.15  -0.57
98  1,1,2,2-tetrachloroethane  -2.36  -0.79
99  pentachloroethane  -1.36  -0.09
100  hexachloroethane  -1.40  0.52
101  1-chloropropane  -0.35  -0.43
102  2-chloropropane  -0.24  0.49
103  1,2-dichloropropane  -1.25  -1.01
104  1,3-dichloropropane  -1.90  -1.93
105  1-chlorobutane  -0.14  -0.31
106  2-chlorobutane  0.07  0.51
107  1,1-dichlorobutane  -0.70  -1.05
108  1-chloropentane  -0.07  -0.13
109  2-chloropentane  0.07  0.73
110  3-chloropentane  0.07  0.49
111  chloroethylene  0.49  -0.68
112  cis-1,2-dichloroethylene  -1.17  -1.50
113  trans-1,2-dichloroethylene  -0.76  -1.32
114  trichloroethylene  -0.44  -1.16
115  tetrachloroethylene  0.05  -0.55
116  chlorobenzene  -1.01  -1.97
117  o-chlorotoluene  -1.15  -1.69
118  1,2-dichlorobenzene  -1.36  -2.68
119  1,3-dichlorobenzene  -0.98  -2.86
120  1,4-dichlorobenzene  -1.01  -3.03
121  2,2'-dichlorobiphenyl  -2.73  -2.38
122  2,3-dichlorobiphenyl  -2.45  -3.48
123  2,2',3'-trichlorobiphenyl  -1.99  -3.34
124  bromotrichloromethane  -0.93  -2.28
125  1-chloro-2-bromoethane  -1.95  -1.91
126  bromomethane  -0.82  -0.49
127  dibromomethane  -2.11  -2.39
128  tribromomethane  -1.98  -3.00
129  bromoethane  -0.70  -0.71
130  1,2-dibromoethane  -2.10  -1.58
131  1-bromopropane  -0.56  -0.46
132  2-bromopropane  -0.48  1.76
133  1,2-dibromopropane  -1.94  -0.23
134  1,3-dibromopropane  -1.96  -1.75
135  1-bromo-2-methylpropane  -0.03  0.31
136  1-bromobutane  -0.41  -0.38
137  1-bromoisobutane  -0.03  0.73
138  1-bromo-3-methylbutane  0.20  -0.03
139  1-bromopentane  -0.08  -0.22
140  3-bromopropene  -0.86  -0.42
141  bromobenzene  -1.46  -1.53
142  1,4-dibromobenzene  -2.30  -1.71
143  p-bromotoluene  -1.39  -1.77
144  1-bromo-2-ethylbenzene  -1.19  -1.17
145  o-bromocumene  -0.85  -0.22
146  methanol  -5.07  -4.58
147  ethanol  -4.90  -4.86
148  ethylene glycol  -9.30  -7.52
149  1-propanol  -4.85  -4.57
150  2-propanol  -4.75  -4.92
151  1,1,1-trifluoro-2-propanol  -4.16  -4.18
152  2,2,3,3-tetrafluoropropanol  -4.90  -4.30
153  2,2,3,3,3-pentafluoropropanol  -4.15  -5.57
154  1,1,1,3,3,3-hexafluoro-2-propanol  -3.76  -0.48
155  2-methyl-1-propanol  -4.51  -5.11
156  1-butanol  -4.72  -4.34
157  2-butanol  -4.61  -3.06
158  t-butanol  -4.51  -4.37
159  2-methyl-1-butanol  -4.42  -3.27
160  3-methyl-1-butanol  -4.42  -3.87
161  2-methyl-2-butanol  -4.43  -3.59
162  2,3-dimethyl-1-butanol  -3.91  -4.46
163  1-pentanol  -4.49  -4.15
164  2-pentanol  -4.39  -3.12
165  3-pentanol  -4.35  -3.45
166  2-methyl-1-pentanol  -3.93  -4.52
167  2-methyl-2-pentanol  -3.93  -3.14
168  2-methyl-3-pentanol  -3.89  -2.90
169  4-methyl-2-pentanol  -3.74  -2.74
170  cyclopentanol  -5.49  -5.77
171  1-hexanol  -4.36  -3.94
172  3-hexanol  -3.68  -3.11
173  cyclohexanol  -4.95  -4.82
174  4-heptanol  -4.01  -2.72
175  cycloheptanol  -5.49  -4.05
176  1-heptanol  -4.25  -3.73
177  1-octanol  -4.10  -3.54
178  allyl alcohol  -5.03  -5.61
179  phenol  -6.53  -5.58
180  4-bromophenol  -7.10  -6.06
181  4-t-butylphenol  -5.92  -4.14
182  2-cresol  -5.86  -5.10
183  3-cresol  -5.49  -5.76
184  4-cresol  -6.12  -5.81
185  2,2,2-trifluoroethanol  -4.31  -5.49
186  p-bromophenol  -7.13  -6.06
187  2-methoxyethanol  -6.77  -4.81
188  dimethoxymethane  -2.93  -2.52
189  methyl propyl ether  -1.66  -1.51
190  methyl isopropyl ether  -2.00  -1.82
191  methyl t-butyl ether  -2.21  -0.28
192  diethyl ether  -1.75  -2.13
193  ethyl propyl ether  -1.81  -1.72
194  dipropyl ether  -1.16  -1.41
195  diisopropyl ether  -0.53  -0.83
196  di-n-butyl ether  -0.83  -1.05
197  THF  -3.12  -3.83
198  2-methyltetrahydrofuran  -3.30  -3.70
199  anisole  -2.45  -1.99
200  ethyl phenyl ether  -4.28  -1.81
201  1,1-diethoxyethane  -3.27  -3.17
202  1,2-dimethoxyethane  -4.84  -3.06
203  1,2-diethoxyethane  -3.53  -3.89
204  1,3-dioxolane  -4.09  -6.12
205  1,4-dioxane  -5.05  -5.18
206  2,2,2-trifluoroethyl vinyl ether  -0.12  -1.53
207  1-chloro-2,2,2-trifluoroethyl difluoromethyl ether  0.11  0.01
208  acetaldehyde  -3.50  -3.21
209  propanal  -3.44  -4.10
210  butanal  -3.18  -2.81
211  pentanal  -3.03  -3.79
212  hexanal  -2.81  -2.50
213  heptanal  -2.67  -3.43
214  octanal  -2.29  -2.15
215  nonanal  -2.07  -3.16
216  trans-2-butenal  -4.23  -4.62
217  trans-2-hexenal  -3.68  -4.29
218  trans-2-octenal  -3.44  -3.93
219  trans,trans-2,4-hexadienal  -4.64  -3.52
220  benzaldehyde  -4.02  -5.05
221  m-hydroxybenzaldehyde  -9.51  -9.02
222  p-hydroxybenzaldehyde  -10.47  -9.23
223  acetone  -3.80  -3.75
224  2-butanone  -3.71  -4.41
225  3-methyl-2-butanone  -3.24  -4.02
226  3,3-dimethyl-2-butanone  -2.89  -3.65
227  2-pentanone  -3.52  -3.99
228  3-pentanone  -3.41  -4.22
229  4-methyl-2-pentanone  -3.06  -3.67
230  2,4-dimethyl-3-pentanone  -2.74  -3.37
231  cyclopentanone  -4.68  -4.09
232  2-hexanone  -3.41  -3.92
233  2-heptanone  -3.04  -3.65
234  4-heptanone  -2.93  -3.61
235  2-octanone  -2.88  -3.48
236  2-nonanone  -2.48  -3.33
237  5-nonanone  -2.67  -2.99
238  2-undecanone  -2.15  -2.87
239  acetophenone  -4.58  -4.94
240  acetic acid  -6.70  -5.32
241  propionic acid  -6.46  -5.40
242  butyric acid  -6.35  -4.73
243  pentanoic acid  -6.16  -4.71
244  hexanoic acid  -6.21  -4.29
245  4-amino-3,5,6-trichloropyridine-2-carboxylic acid  -11.96  -12.75
246  methyl formate  -2.78  -4.55
247  ethyl formate  -2.65  -4.77
248  propyl formate  -2.48  -4.28
249  methyl acetate  -3.31  -3.69
250  isopropyl formate  -2.02  -4.81
251  isobutyl formate  -2.22  -4.84
252  isoamyl formate  -2.13  -3.66
253  ethyl acetate  -3.08  -3.56
254  propyl acetate  -2.85  -3.06
255  isopropyl acetate  -2.65  -3.70
256  butyl acetate  -2.55  -2.88
257  isobutyl acetate  -2.36  -4.84
258  amyl acetate  -2.45  -2.62
259  isoamyl acetate  -2.21  -2.52
260  hexyl acetate  -2.26  -2.44
261  methyl propionate  -2.97  -3.60
262  ethyl propionate  -2.80  -3.37
263  propyl propionate  -2.54  -3.03
264  isopropyl propionate  -2.22  -3.38
265  pentyl propionate  -1.99  -2.78
266  methyl butyrate  -2.84  -3.10
267  ethyl butyrate  -2.50  -3.06
268  propyl butyrate  -2.28  -2.69
269  methyl pentanoate  -2.54  -2.97
270  ethyl pentanoate  -2.52  -2.74
271  methyl hexanoate  -2.48  -2.61
272  ethyl heptanoate  -2.30  -2.32
273  methyl octanoate  -2.05  -2.15
274  methyl benzoate  -4.28  -5.93
275  methylamine  -4.60  -3.98
276  ethylamine  -4.61  -4.34
277  propylamine  -4.50  -4.02
278  butylamine  -4.38  -3.88
279  pentylamine  -4.09  -3.64
280  hexylamine  -4.04  -3.50
281  dimethylamine  -4.28  -3.28
282  diethylamine  -4.06  -3.52
283  dipropylamine  -3.65  -2.97
284  dibutylamine  -3.31  -2.49
285  trimethylamine  -3.23  -1.70
286  triethylamine  -3.03  -2.38
287  azetidine  -5.56  -4.02
288  piperazine  -7.40  -8.45
289  N,N'-dimethylpiperazine  -7.58  -5.74
290  N-methylpiperazine  -7.77  -7.15
291  aniline  -5.49  -6.28
292  1,1-dimethyl-3-phenylurea  -11.87  -9.31
293  N,N-dimethylaniline  -2.90  -4.97
294  ethylenediamine  -9.75  -8.93
295  hydrazine  -9.30  -9.78
296  2-methoxy-1-ethanamine  -6.55  -6.26
297  morpholine  -7.17  -6.61
298  N-methylmorpholine  -6.34  -5.22
299  N-methylpyrrolidine  -3.97  -3.82
300  N-methylpiperidine  -3.89  -2.67
301  pyrrolidine  -5.47  -4.47
302  piperidine  -5.10  -3.98
303  pyridine  -4.69  -3.32
304  2-methylpyridine  -4.62  -3.49
305  3-methylpyridine  -4.77  -3.54
306  4-methylpyridine  -4.92  -3.58
307  2-ethylpyridine  -4.32  -3.42
308  3-ethylpyridine  -4.60  -3.40
309  4-ethylpyridine  -4.72  -3.45
310  2,3-dimethylpyridine  -4.81  -3.58
311  2,4-dimethylpyridine  -4.85  -3.75
312  2,5-dimethylpyridine  -4.70  -3.74
313  2,6-dimethylpyridine  -4.60  -3.60
314  3,4-dimethylpyridine  -5.21  -3.71
315  3,5-dimethylpyridine  -4.84  -3.77
316  2-methylpyrazine  -5.51  -4.80
317  2-ethylpyrazine  -5.45  -4.72
318  2-isobutylpyrazine  -5.05  -4.10
319  2-ethyl-3-methoxypyrazine  -4.39  -4.57
320  2-isobutyl-3-methoxypyrazine  -3.68  -3.67
321  9-methyladenine  -13.60  -13.77
322  1-methylthymine  -10.40  -11.22
323  methylimidazole  -10.25  -7.61
324  N-propylguanidine  -10.92  -10.73
325  acetonitrile  -3.89  -1.49
326  propionitrile  -3.85  -1.73
327  butyronitrile  -3.64  -1.48
328  benzonitrile  -4.10  -3.87
329  2,6-dichlorobenzonitrile  -5.22  -5.11
330  3,5-dibromo-4-hydroxybenzonitrile  -9.00  -9.34
331  N,N-dimethylformamide  -4.90  -6.63
332  N-methylformamide  -10.00  -8.02
333  acetamide  -9.72  -8.41
334  (E)-N-methylacetamide  -10.00  -7.25
335  (Z)-N-methylacetamide  -10.00  -7.42
336  propionamide  -9.42  -8.55
337  methanethiol  -1.24  -1.82
338  ethanethiol  -1.30  -1.22
339  1-propanethiol  -1.05  -0.91
340  thiophenol  -2.55  -3.03
341  thioanisole  -2.73  -2.80
342  dimethyl sulfide  -1.54  -2.21
343  diethyl sulfide  -1.43  -0.90
344  methyl ethyl sulfide  -1.49  -1.60
345  dipropyl sulfide  -1.27  -0.51
346  2,2'-dichlorodiethyl sulfide  -3.92  -3.44
347  dimethyl disulfide  -1.83  -2.67
348  diethyl disulfide  -1.63  -1.60
349  trimethyl phosphate  -8.70  -8.52
350  triethyl phosphate  -7.80  -7.65
351  tripropyl phosphate  -6.10  -4.16
352  2,2-dichloroethenyl dimethyl phosphate  -6.61  -6.55
353  dimethyl-5-(4-chloro)-bicyclo[3.2.0]-heptyl phosphate  -7.28  -8.62
354  o-ethyl-o'-(4-bromo-2-chlorophenyl) S-propyl phosphorothioate  -4.09  -5.04
355  hydroquinone  -10.77  -10.18
356  1,2,3-trimethoxybenzene  -5.40  -6.31
357  1,2-benzenediol  -7.62  -9.56
358  1,3-benzenediol  -9.67  -9.68
359  o-phenylenediamine  -7.19  -10.91
360  m-phenylenediamine  -10.26  -12.01
361  2-methylaniline  -5.47  -6.41
362  N-methylaniline  -4.54  -5.38
363  acetylene anion  -73  -66
364  protonated methanol  -85  -76
365  protonated dimethyl ether  -70  -72
366  protonated 2-propanol  -64  -66
367  methanolate ion  -95  -92
368  formylate ion  -77  -75
369  dimethyl ether carbanion  -81  -78
370  phenolate ion  -72  -80
371  toluene carbanion  -59  -62
372  superoxide  -87  -84
373  methyl ammonium ion  -70  -79
374  protonated acetamide  -66  -63
375  protonated N-methylmethanamine  -63  -70
376  protonated N,N-dimethylmethanamine  -59  -62
377  pyridinium ion  -59  -68
378  ammonium ion  -79  -67
379  acetonitrile carbanion  -75  -73
380  azide ion  -74  -74
381  methylsulfonium ion  -74  -74
382  protonated dimethyl sulfide  -61  -55
383  1-propanethiolate anion  -76  -75
384  thiophenolate ion  -67  -66

Table A5 The pKa data set. No. Compound Exp. Calc.

1 2,3,4,5,6-pentafluoroaniline -0.28 0.26 2 2,3,5,6-tetramethyl-4-nitrobenzeneamine 2.36 2.97 3 2,3-dichloroaniline 1.76 2.03 4 2,4,5-trichloroaniline 1.09 1.33 5 2,4,6-trichloroaniline -0.03 1.38 6 2,4-dibromoaniline 2.30 0.92 7 2,4-dichloroaniline 2.00 2.23 8 2,4-dinitroaniline -4.25 -2.34 9 2,5-dichloroaniline 2.05 2.09

10 2,5-dimethoxyaniline 3.93 6.15 11 2,6-dichloro-4-nitroaniline -2.55 -1.07 12 2,6-dichloroaniline 0.42 2.18 13 2,6-dimethyl-4-nitrobenzeneamine 0.98 2.37 14 2,6-dinitroaniline -5.00 -2.07 15 2-amino-4-nitrophenol 3.10 2.33 16 2-aminobenzoic acid,ethyl ester 2.51 3.52 17 2-aminobenzoic acid 2.14 2.83 18 2-aminobiphenyl 3.83 6.77 19 2-aminophenol 4.84 5.26 20 2-chloro-4-nitroaniline -0.94 -0.64 21 2-methoxy-5-nitroaniline 2.49 2.36 22 2-nitro-4-toluidine 0.40 1.62 23 3,4-dichloroaniline 2.97 2.18 24 3,5-dichloroaniline 2.51 1.99 25 3,5-dimethyl-4-nitrobenzeneamine 2.54 2.22 26 3,5-dinitroaniline 0.30 -0.33 27 3-aminobenzoic acid 3.07 3.17 28 3-aminophenol 4.37 5.24 29 3-bromoaniline 3.58 2.62 30 3-methyl-4-bromoaniline 4.05 3.46 31 3-methyl-4-nitroaniline 1.64 1.70 32 3-nitro-4-toluidine 3.03 2.96 33 3-trifluoromethylaniline 3.49 2.46 34 4-aminobenzoic acid 2.38 1.86 35 4-aminobiphenyl 4.35 6.32

133

36 4-aminophenol 5.48 5.91 37 4-benzoylaniline 2.24 1.11 38 4-bromoaniline 3.86 4.30 39 4-chloro-2-nitroaniline -1.02 -0.22 40 4-chloro-3-nitrobenzeneamine 1.90 -0.52 41 4-methoxy-2-nitrobenzenamine 0.77 1.03 42 4-methylsulfonylaniline 1.35 0.31 43 4-nitro-2-toluidine 1.04 0.90 44 5-nitro-2-toluidine 2.35 2.65 45 butyl-4-aminobenzoate 2.47 3.98 46 methyl-4-aminobenzoate 2.47 3.48 47 methyl anthranilate 2.23 2.97 48 o-bromoaniline 2.53 1.67 49 p-aminobenzoic acid,ethyl ester 2.51 2.60 50 p-aminosalicylic acid 2.05 2.73 51 propyl-4-aminobenzoate 2.49 4.55 52 p-trifluoromethylaniline 2.45 2.01 53 1,2,2,6,6-pentamethylpiperidine 11.25 9.80 54 1,2,3,4-tetrahydro-2-naphthalenamine 9.93 10.17 55 1-methylpyrrolidine 10.32 8.72 56 2,2,2-trifluoroethylamine 5.70 4.95 57 2,2,6,6-tetramethylpiperidine 11.72 11.19 58 2,2-bipyridine 4.33 3.97 59 2,3,4,5,6-pentachloropyridine -1.00 -0.72 60 2,3,5,6-tetrachloropyridine -0.80 -1.88 61 2,3,5,6-tetramethylpyridine 7.90 7.64 62 2,3-dichloropyridine -0.85 0.70 63 2,3-dimethylpyridine 6.57 6.44 64 2,4,6-collidine 7.43 6.93 65 2,4-dimethylpyridine 6.99 6.51 66 2,5-dimethylpyridine 6.40 6.53 67 2,6-dichloropyridine -2.86 1.44 68 2,6-dimethoxypyridine 1.60 4.41 69 2,6-lutidine 6.60 6.18 70 2-acetylpyridine 2.73 3.02 71 2-amino-5-methylpyridine 7.22 5.93 72 2-aminomethylfuran 8.89 8.22 73 2-benzylpyridine 5.13 6.08 74 2-bromopyridine 0.90 1.91 75 2-chloropyridine 0.49 2.78 76 2-ethylpyridine 5.89 6.38 77 2-fluoropyridine -0.44 3.32 78 2-hydroxypyridine 0.75 3.70 79 2-methoxypyridine 3.06 4.55 80 2-methyl-5-vinylpyridine 5.67 5.94 81 2-methylpiperidine 11.08 10.10 82 2-methylpyridine 6.00 5.69 83 2-methylthiopyridine 3.59 1.71 84 2-phenethylamine 9.96 7.35 85 2-phenylpyridine 4.48 4.81


86 2-phenylpyrrolidine 9.40 9.24 87 2-propylpiperidine 11.00 10.88 88 2-pyridinecarboxyaldehyde 3.80 4.25 89 2-pyridineethanol 5.31 5.13 90 2-pyridinepropanol 5.61 5.81 91 2-t-butylpyridine 5.76 7.26 92 2-vinylpyridine 4.98 4.74 93 3,4-dimethylpyridine 6.46 6.70 94 3,4-methylenedioxyamphetamine 9.67 9.40 95 3,5-dichloropyridine 0.67 1.60 96 3,5-dimethylpyridine 6.15 6.86 97 3-bromopyridine 2.91 1.11 98 3-ethylpyridine 5.56 6.77 99 3-formylpyridine 3.80 3.51

100 3-hydroxypyridine 4.80 4.31 101 3-methoxypyridine 4.91 4.71 102 3-methylpyridine 5.63 6.07 103 3-nitropyridine 1.18 0.51 104 3-phenylpropylamine 10.16 9.18 105 3-phenylpyridine 4.80 5.50 106 3-pyridinemethaneamine 5.96 8.10 107 3-pyridinemethanol 4.90 4.67 108 3-pyridinepropanol 5.47 6.08 109 4,4-bipyridinyl 4.82 5.16 110 4-acetylpyridine 3.59 3.45 111 4-benzylpyridine 5.59 6.78 112 4-bromopyridine 3.78 3.24 113 4-chloropyridine 3.84 3.65 114 4-cyanopyridine 1.90 3.85 115 4-ethylmorpholine 7.67 8.03 116 4-ethylpyridine 5.87 6.72 117 4-formylpyridine 4.77 3.29 118 4-methoxypyridine 6.47 5.05 119 4-methylbenzenemethanamine 9.36 9.22 120 4-methylpyridine 5.98 6.01 121 4-phenylbutylamine 10.36 9.66 122 4-phenylpyridine 5.55 5.33 123 4-propylpyridine 6.05 7.56 124 4-pyridineethanol 5.60 6.88 125 4-pyridinemethanol 5.33 5.56 126 4-pyridinepropanol 5.84 7.11 127 4-t-butylpyridine 5.99 7.84 128 4-vinylpyridine 5.62 5.15 129 5-ethyl-2-methylpyridine 6.51 7.20 130 allylamine 9.70 8.45 131 α-methylbenzeneethanamine 10.13 9.45 132 α-methylbenzenepropanamine 9.79 9.51 133 anabasine 8.70 8.91 134 arecoline 7.16 6.34 135 azetidine 11.29 8.87


136 benzylamine 9.33 7.82 137 bis-(2-chloroethyl)ethylamine 6.57 6.57 138 chlorpheniramine 9.13 7.09 139 cyclohexanamine 10.63 10.50 140 diallylamine 9.29 9.17 141 dibutylamine 11.39 11.89 142 dicyclohexylamine 10.40 11.82 143 diethylamine 11.09 10.22 144 diisopropylamine 11.07 10.99 145 dimethylamine 10.73 9.46 146 dimethylbutylamine 10.19 9.61 147 dinicotinic acid 1.10 3.88 148 diphenhydramine 8.98 6.74 149 dipropylamine 11.00 11.01 150 E-3-nicotinoylacrylic acid 3.82 1.90 151 ethylamine 10.87 10.11 152 ethyldimetyhlamine 10.16 8.72 153 fenpropidin 10.10 12.19 154 fenpropimorph 6.98 11.76 155 hexamethyleneimine 11.07 10.21 156 isobutylamine 10.68 10.84 157 isonicotinic acid, ethyl ester 1.70 4.11 158 isonicotinic acid, methyl ester 3.45 3.45 159 isonicotinic acid 3.26 2.96 160 isopropylamine 10.63 10.42 161 mescaline 9.56 8.09 162 methadone 8.94 7.28 163 methamphetamine 9.87 9.35 164 methylamine 10.62 9.23 165 methylbutylamine 10.90 10.29 166 morpholine 8.49 7.85 167 moxisylyte 8.72 8.01 168 N-β-dimethylbenzeneethanamine 9.87 8.73 169 N-butylamine 10.78 10.50 170 N-ethylbenzenemethanamine 9.64 9.78 171 nicotine 8.18 8.64 172 nicotinic acid, ethyl ester 3.35 3.51 173 nicotinic acid, methyl ester 3.13 2.81 174 nikethamide 3.50 3.97 175 n-methylbenzeneethanamine 10.08 8.13 176 n-methylbenzylamine 9.54 7.85 177 n-methylmorpholine 7.38 7.72 178 n-methylpiperidine 10.08 8.83 179 N,N-di-2-propenyl-2-propen-1-amine 8.31 8.60 180 N,N-dimethyl-2-(3-pyridyl)ethylamine 8.86 8.46

181 N,N-dimethyl-2-[5-methyl-2-(1-methylethyl)phenoxy]ethanamine 8.66 9.74

182 N,N-dimethyl-(2-pyridine)ethanamine 8.75 7.82 183 N,N-dimethyl-3-pyridylmethylamine 8.00 7.63 184 N,N-dimethylbenzylamine 8.91 7.50


185 orphenadrine 8.91 8.33 186 picolinic acid, methyl ester 2.21 2.49 187 picolinic acid 1.06 2.29 188 piperalin 8.90 8.34 189 piperidine 11.28 9.30 190 p-methoxyamphetamine 9.53 10.04 191 propylamine 10.71 10.09 192 pyridine 5.23 5.27 193 pyrrolidine 11.31 8.97 194 sec-butylamine 10.56 10.80 195 t-butylamine 10.68 10.52 196 triethylamine 10.78 9.64 197 trimethylamine 9.80 8.37 198 tri-N-butylamine 10.89 12.41 199 tripropylamine 10.65 10.91 200 1-acetyl-1H-imidazole 3.60 4.53 201 1-methyl-4-nitro-1H-imidazole -0.53 -2.05 202 1-methyl-5-nitroimidazole 2.13 -0.07 203 1-phenylmethyl-1H-imidazole 6.70 6.39 204 2-(2,4-dimethylphenyl)-5-nitrobenzimidazole 5.29 2.75 205 2-(2-methoxyphenyl)benzimidazole 7.17 4.37 206 2-(2-methylphenyl)-5-nitrobenzimidazole 4.87 2.72 207 2,4,6-pyrimidinetriamine 6.81 2.23 208 2-(4-aminophenylmethyl)-5-chlorobenzimidazole 7.47 4.67 209 2-(4-bromophenylmethyl)-5-chlorobenzimidazole 5.42 7.96 210 2-(4-chlorphenylmethyl)-5-chlorobenzimidazole 4.86 4.82 211 2,4-dimethylquinoline 5.12 6.41 212 2-(4-methoxyphenylmethyl)-5-nitrobenzimidazole 4.26 1.76 213 2-(4-methylphenyl)benzimidazole 6.90 6.27 214 2-(4-methylphenylmethyl)-5-chlorobenzimidazole 7.09 5.70 215 2,6-dimethylquinoline 6.10 6.62 216 2-amino-4,6-dimethylpyrimidine 4.82 4.35 217 2-aminopyrimidine 3.45 2.98 218 2-bromopyrimidine -1.63 1.24 219 2-ethoxypyrimidine 1.27 3.11 220 2-methyl-1H-imidazole 7.85 6.22 221 2-methyl-8-quinolinol 5.55 4.92 222 2-methylquinoline 5.71 5.70 223 2-methylthio-4,6-dimethylpyrimidine 0.59 4.86 224 2-methylthiopyrimidine 6.48 3.72 225 2-phenyl-1H-imidazole -0.68 5.11 226 2-pyrimidinecarboxylic acid, methyl ester 2.12 0.70 227 3-bromoquinoline 2.69 3.66 228 3-methylquinoline 5.17 6.09 229 3-quinolinol 4.28 4.73 230 4,6-dimethylpyrimidine 2.70 5.05 231 4,7-dichloroquinoline 2.80 1.94 232 4-methyl-8-quinolinol 5.56 4.89 233 4-methylpyrimidine 1.91 4.40 234 4-methylquinoline 5.67 5.87


235 4-nitroimidazole -0.05 -0.20 236 5-chloro-8-quinolinol 3.56 2.62 237 5-nitropyrimidine 0.72 -0.44 238 5-quinolinol 5.02 4.37 239 6-bromoquinoline 3.87 1.17 240 6-chloroquinoline 3.85 3.18 241 6-hydroxyquinoline 5.15 3.85 242 6-methoxyquinoline 5.03 4.93 243 6-methylquinoline 5.34 6.06 244 7-bromoquinoline 3.87 3.98 245 7-methoxyquinoline 5.03 4.67 246 7-methylquinoline 5.34 6.00 247 7-quinolinol 5.46 4.56 248 8-chloroquinoline 3.12 3.28 249 8-fluoroquinoline 3.34 4.20 250 8-methoxyquinoline 5.01 4.42 251 8-methylquinoline 5.05 5.81 252 8-quinolinol 4.90 4.21 253 anserine 7.04 8.67 254 benzimidazole 5.53 5.20 255 cimetidine 6.80 6.07 256 cloquintocetmexyl 3.75 3.90 257 fenclorim 4.23 0.07 258 imidazole 6.95 5.97 259 pentostatin 5.20 7.38 260 pilocarpol 6.78 5.75 261 prochloraz 3.80 2.85 262 pyrimethanil 3.52 4.06 263 pyrimidine 1.23 3.61 264 quinoline 4.90 5.16 265 triflumizole 3.70 2.29

Table A6 The glass transition temperature data set.^a

No.  Compound     Exp.  Calc.
  1  1-TNATA      386   380
  2  2-TNATA      383   395
  3  AODF1        353   362
  4  AODF2        353   379
  5  BMA-1T       359   346
  6  BMA-2T       363   357
  7  BMA-3T       366   374
  8  BMA-4T       371   376
  9  BMB-2T       380   360
 10  BMB-3T       366   372
 11  BNpA-1T      364   364
 12  BPAPF        440   422
 13  EFPCA        405   393
 14  EFPPCA       458   446
 15  EM1          407   401
 16  EM2          395   366
 17  EM3          391   374
 18  EM4          372   389
 19  EM5          440   424
 20  ENPPCA       447   403
 21  EPPCA        447   419
 22  EtCz2        343   364
 23  F1AMB-1T     397   384
 24  m-BPD        354   365
 25  m-MTDAB      320   344
 26  m-MTDAPB     378   387
 27  m-MTDATA     348   361
 28  m-MTDATz     315   360
 29  MPPPCA       456   430
 30  MTBDAB       407   451
 31  m-TTA        353   399
 32  NEFAPQ       389   380
 33  NPB          368   351
 34  NPCA         396   402
 35  NPECAPPP     425   382
 36  o-MTDAB      315   337
 37  o-MTDAPB     382   387
 38  o-MTDATA     349   355
 39  o-MTDATz     328   357
 40  PAB          401   402
 41  PAE3b        388   398
 42  PAE3c        412   428
 43  PAPA         394   406
 44  PATB4a       398   394
 45  PATB4d       423   421
 46  PATB4e       416   449
 47  p-BrTDAB     345   352
 48  p-ClTDAB     337   320
 49  p-DPA-TDAB   380   388
 50  p-FTDAB      327   326
 51  PhAMB-1T     357   355
 52  PhCz2        363   376
 53  p-MTDAB      328   344
 54  p-MTDAPB     383   388
 55  p-MTDATA     353   362
 56  PPACBN       467   422
 57  PPATC3e      415   445
 58  PPCA         453   431
 59  PPPCA        457   440
 60  p-TTA        405   398
 61  TBB          361   428
 62  TBPSF        468   392
 63  TCB          399   393
 64  TCPB         445   453
 65  TCTA         423   414
 66  TDAPB        394   377
 67  TDATA        362   352
 68  TMB-TB       433   430
 69  TPD          338   335
 70  TPOB         410   383
 71  TPTAB1       311   324
 72  TPTAB2       319   310
 73  TPTE         403   402

^a Structure codes from Yin, S.; Wang, Y., J. Chem. Inf. Comput. Sci., 2003, 43, 970-977.


Table A7 The aqueous solubility data set (logS).

No. Compound Exp. Calc.

1 1-Bromoheptane -4.431 -3.409 2 1-Bromohexane -3.807 -3.214 3 Acetyl-R-mandelic acid -1.231 -2.342 4 1,1-Diphenylethene -4.436 -4.155 5 Benzo[b]triphenylene -8.222 -5.931 6 1,2,4,5-Tetrafluorobenzene -2.376 -1.534 7 1,3-Butadiene -1.867 -2.157 8 1,4-Dimethylcyclohexane -4.466 -3.070 9 1,4-Pentadiene -2.087 -2.309

10 1,5-Hexadiene -2.687 -2.562 11 1,6-Heptadiene -3.340 -2.836 12 1,6-Heptadiyne -1.747 -2.436 13 1,8-Nonadiyne -2.983 -2.954 14 1-Chloro-2-[2,2-dichloro-1-(4-chlorophenyl)ethyl]benzene -6.506 -6.201 15 1-Anthranol -4.721 -3.929 16 1-Bromo-2-naphthylisothiocyanate -0.319 -4.858 17 1-Bromo-3-chloropropane -1.848 -2.394 18 1-Bromo-3-fluorobenzene -2.666 -3.146 19 1-Butene -2.403 -2.109 20 1-Butyne -1.275 -1.911 21 1-Chloro-2,4-dinitronaphthalene -5.402 -4.238 22 1-Chloro-2-fluorobenzene -2.416 -2.613 23 1-Chloro-3-fluorobenzene -2.346 -2.637 24 1-Chloroheptane -3.996 -3.223 25 1-Ethyl-2-methylbenzene -3.207 -3.156 26 1-Heptene -3.733 -2.934 27 1-Heptyne -3.010 -2.714 28 1-Hexen-3-ol -2.344 -1.952 29 1-Methyl Tetrahydrofuran -1.538 -1.555 30 1-Methyl-1-cyclohexene -3.267 -2.605 31 1-Methylphenanthrene -5.854 -4.358 32 1-Naphthaleneacetic Acid -2.652 -3.037 33 1-Naphthol -3.519 -2.939 34 1-Naphthyl Isothiocyanate -4.602 -4.307 35 1-Nonene -5.053 -3.487 36 1-Nonyne -4.237 -3.263 37 1-Octyne -3.662 -2.997 38 1-Pentene -2.676 -2.392 39 17-Methyltestosterone -3.951 -3.260 40 2,2',3,3',4,4',6-Heptachlorobiphenyl -8.301 -7.524 41 2,2',4,5-Tetrachlorobiphenyl -7.252 -5.963 42 2,2',4,4'-Tetrachlorobiphenyl -6.123 -5.933 43 2,2',3,5'-Tetrachlorobiphenyl -6.562 -6.121 44 2,2',3,4,5'-Pentachlorobiphenyl -7.854 -6.648 45 2,2',3,3',4,4',5,5'-Octachlorobiphenyl -9.000 -7.928 46 2,2',3,4,5,5',6-Heptachlorobiphenyl -9.000 -7.564 47 2,2',3,4,6-Pentachlorobiphenyl -7.432 -6.587 48 2,2',3,3',5,6-Hexachlorobiphenyl -8.523 -7.135 49 2,2,3-Trimethyl-3-pentanol -0.833 -2.237 50 2,2,5,5-Tetramethyl-3-hexyne -3.833 -3.521 51 2,2,5-Trimethyl-3-hexyne -3.618 -3.270 52 2,2-Dimethyl-3-butanol -2.368 -1.938 53 2,2-Dimethyl-3-hexyne -4.143 -2.981 54 2,3',5-Trichlorobiphenyl -6.000 -5.381


55 2,3,4,5,6-Pentachlorophenoxyacetic Acid -3.745 -4.964 56 2,3,4,6-Tetrachlorophenoxyacetic Acid -3.409 -4.382 57 2,3,4-Trichlorophenoxyacetic Acid -3.097 -3.944 58 2,3,5-Trichloro-4-hydroxypyridine -4.286 -3.480 59 2,3,5-Trichlorophenoxyacetic Acid -3.000 -3.889 60 2,3,6-Trichlorophenoxyacetic Acid -2.620 -3.848 61 2,3-Dichlorophenoxyacetic Acid -2.810 -3.296 62 2,3-Dimethyl-1-butanol -2.133 -1.834 63 2,3-Dimethyl-2-pentanol -2.622 -2.132 64 2,3-Dimethyl-3-pentanol -2.595 -2.076 65 2,3-Xylenol -1.427 -2.334 66 2,4,6-Trichlorophenol -2.341 -3.826 67 2,4,6-Trichlorophenoxyacetic Acid -3.013 -3.837 68 2,4-Decadione -2.585 -2.236 69 2,4-Dimethyl-2-pentanol -2.683 -2.305 70 2,4-Dimethyl-3-pentanol -2.448 -2.172 71 2,4-Dimethyl-3-pentanone -3.046 -1.981 72 2,4-Dimethylquinoline -1.942 -3.341 73 2,4-Dinitrobenzoic Acid -1.067 -3.120 74 2,4-Dinitrophenol -2.598 -3.138 75 2,4-Lutidine -1.231 -2.370 76 2,4-Octadione -1.559 -1.644 77 2,5-Dichlorophenoxyacetic Acid -2.616 -3.313 78 2,5-Dimethyl-4-acetaminophenol -2.013 -2.034 79 2,5-Piperazinedione -0.831 -0.271 80 2,5-Xylenol -1.538 -2.407 81 2,6-Dichlorophenoxyacetic Acid -2.152 -3.287 82 2,6-Diethyl-4-acetaminophenol -2.531 -2.602 83 2,6-Diisopropyl-4-acetaminophenol -3.214 -2.977 84 2,6-Dimethyl-4-acetaminophenol -1.911 -2.023 85 2,6-Dimethyl-4-heptanol -3.904 -2.783 86 2,6-Dimethylnaphthalene -4.893 -3.766 87 2,6-Xylenol -1.305 -2.335 88 2-(2-Methyl-4-chlorophenoxy)propionic Acid -2.407 -3.191 89 2-Anthranol -4.328 -3.911 90 2-Chlorophenoxyacetic Acid -2.164 -2.639 91 2-Ethyl-1-butanol -3.152 -2.031 92 2-Ethylnaphthalene -4.291 -3.853 93 2-Fluorobenzyl Chloride -2.541 -2.623 94 2-Heptene -3.816 -2.909 95 2-Heptyne -3.770 -2.733 96 2-Hexanol -2.617 -1.998 97 2-Methyl-1-pentanol -2.976 -2.069 98 2-Methyl-1-pentene -3.033 -2.631 99 2-Methyl-2-hexanol -2.823 -2.241

100 2-Methyl-2-pentanol -2.244 -1.981 101 2-Methyl-3-hexyne -3.745 -2.756 102 2-Methyl-3-pentanol -2.451 -1.876 103 2-Methyl-4-acetaminophenol -1.595 -1.834 104 2-Methyl-4-penten-3-ol -2.260 -1.991 105 2-Methyl-5-t-butylphenol -2.594 -3.068 106 2-Methyldecalin -6.573 -3.702 107 2-Naphthoic Acid -3.886 -2.696 108 2-Naphthyl Isothiocyanate -4.444 -4.289 109 2-Nitrobenzaldehyde -3.878 -2.274 110 2-Pentene -2.538 -2.354 111 2-Thiouracil -2.257 -2.135 112 3,3'-Dichlorobiphenyl-4,4'-diamine -4.910 -3.903 113 3,3'-Dichlorobiphenyl -5.699 -4.777


114 3,3-Diphenylphthalide -4.855 -4.432 115 3,4,5-Trichlorophenoxyacetic Acid -2.939 -3.788 116 3,4,7,8-Tetramethyl-1,10-phenanthroline -5.222 -3.986 117 3,4-Dichlorophenoxyacetic Acid -2.684 -3.133 118 3,4-Xylenol -1.409 -2.408 119 3,5-Dichlorophenoxyacetic Acid -2.362 -3.096 120 3,5-Dinitrobenzoic Acid -2.197 -3.252 121 3,5-Pyridinedicarboxylic Acid -2.194 -1.400 122 3,5-Xylenol -1.398 -2.398 123 3-(5-tert-Butyl-1,3,4-thiadiazol-2-yl)-4-hydroxyl-l -1.877 -3.088 124 3-Bromo-2-nitrobenzoic Acid -2.872 -3.334 125 3-Bromobenzyl Isothiocyanate -3.971 -4.378 126 3-Carboxyphenylisothiocyanate -3.252 -3.038 127 3-Chloro-2-nitrobenzoic Acid -2.632 -2.687 128 3-Chlorobenzyl Isothiocyanate -3.863 -4.139 129 3-Chlorophenoxyacetic Acid -1.898 -2.447

130 3-Cyclohexyl-6-dimethylamino-1-methyl-1,3,5-triazine-2,4-dione -0.883 -2.268

131 3-Ethyl-3-pentanol -2.585 -2.210 132 3-Fluorobenzyl Chloride -2.544 -2.609 133 3-Heptanol -3.208 -2.259 134 3-Hexanol -2.547 -1.987 135 3-Hexanone -2.578 -1.763 136 3-Hexyne -2.167 -2.463 137 3-Hydroxy-5-methyl Isoxazole -0.067 -0.878 138 3-Hydroxyphenyl Isothiocyanate -1.991 -3.174 139 3-Methyl-1-butene -2.732 -2.381 140 3-Methyl-1-pentanol -3.121 -1.897 141 3-Methyl-2,4-pentadione -0.010 -1.132 142 3-Methyl-2-butanone -1.896 -1.479 143 3-Methyl-2-pentanol -2.466 -1.782 144 3-Methyl-2-pentanone -2.425 -1.814 145 3-Methyl-3-hexanol -2.734 -2.191 146 3-Methyl-3-pentanol -2.125 -1.890 147 3-Nitrobenzaldehyde -4.179 -2.293 148 3-Nitrobenzyl Isothiocyanate -4.086 -3.879 149 3-Nitropentane -1.955 -1.983 150 3-Nitrophenyl Isothiocyanate -3.553 -3.783 151 3-Nitrophthalic Acid -1.021 -1.862 152 3-Penten-2-ol -1.730 -1.811 153 3-Pentyl-2,4-pentadione -1.851 -2.006 154 3-Propyl-2,4-pentadione -0.876 -1.436 155 3-Thenoic Acid -1.474 -1.787 156 4,4'-Dimethylbiphenyl -6.000 -4.166 157 4,7-Dimethyl-1,10-phenanthroline -3.971 -3.616 158 4-(Methylthio)phenyl Dipropyl Phosphate -3.386 -3.280 159 4-(4-Chlorophenoxy)butyric Acid -3.290 -2.770 160 4-(2,4,5-Trichlorophenoxy)butyric Acid -3.829 -4.276 161 4-Benzoyl Phenylisothiocyanate -4.854 -4.286 162 4-Bromo-1-butene -2.247 -2.518 163 4-Bromobiphenyl -5.523 -4.708 164 4-Bromophenyl Isothiocyanate -0.268 -3.622 165 4-Carbethoxyphenylisothiocyanate -4.046 -3.612 166 4-Carboxyphenylisothiocyanate -3.975 -3.025 167 4-Chlorobenzyl Isothiocyanate -3.830 -4.152 168 4-Chlorophenyl Phenyl Ether -4.793 -4.151 169 4-Cyanobenzyl Isothiocyanate -3.495 -3.743 170 4-Dimethylaminophenyl Isothiocyanate -4.125 -3.628 171 4-Hexen-3-ol -2.165 -2.008


172 4-Hydroxyphenyl Isothiocyanate -2.668 -3.128 173 4-Methyl-1-pentene -3.244 -2.653 174 4-Methyl-3-pentanone -2.564 -1.813 175 4-Methylbenzaldehyde -1.724 -1.895 176 4-Methylbiphenyl -4.620 -3.924 177 4-Nitrobenzyl Isothiocyanate -3.633 -3.724 178 4-Nitrocatechol -1.571 -2.074 179 4-Nitroresorcinol -3.022 -2.113 180 4-Nonylphenol -4.498 -4.436 181 4-Penten-1-ol -1.924 -1.585 182 4-Penten-3-ol -1.766 -1.764 183 4-Vinyl-1-cyclohexene -3.335 -2.853 184 4-s-Butylphenol -2.194 -3.021 185 4-t-Butylphenol -2.413 -3.041 186 5,5-Dimethyl-2,4-hexadione -1.631 -1.753 187 5,5-Dipropylbarbituric Acid -2.398 -2.667 188 5,6-Dimethyl-2-thiouracil -2.056 -2.734 189 5-Bromo-2-nitrobenzoic Acid -1.521 -3.173 190 5-Bromo-3-tert-butyl-6-methyluracil -2.804 -3.657 191 5-Carboethoxy-2-thiouracil -2.099 -2.792 192 5-Chloro-2-nitrobenzoic Acid -1.319 -2.941 193 5-Ethyl-5-methylbarbituric Acid -1.096 -1.759 194 5-Ethyl-5-N-butylbarbituric Acid -1.638 -2.564 195 5-Ethyl-5-N-heptylbarbituric Acid -3.218 -3.307 196 5-Ethyl-5-N-hexylbarbituric Acid -3.049 -3.048 197 5-Ethyl-5-N-nonylbarbituric Acid -3.462 -3.856 198 5-Ethyl-5-N-octylbarbituric Acid -3.943 -3.582 199 5-Ethyl-5-N-propylbarbituric Acid -1.442 -2.241 200 5-Ethyl-5-pentylbarbituric Acid -2.177 -2.675 201 5-Methyl-2-thiouracil -2.446 -2.342 202 5-Nitro-1,10-phenanthroline -3.917 -3.543 203 6-Amino-2-thiouracil -2.747 -2.341 204 6-Methyl-2,4-heptadione -1.604 -1.897 205 6-Nitrophthalide -2.651 -2.556 206 7-Methylsulfinyl-2-xanthonecarboxylic Acid -5.062 -3.570 207 7-Methylthio-2-xanthonecarboxylic Acid -6.042 -4.030 208 Acetal -0.429 -1.870 209 Acetaminophen Acetate -2.781 -1.906 210 Acetaminophen Butyrate -2.826 -2.473 211 Acetaminophen Hexanoate -4.141 -3.040 212 Acetaminophen Laurate -4.745 -4.701 213 Acetaminophen Octanoate -4.443 -3.602 214 Acetaminophen Palmitate -4.892 -5.809 215 Acetaminophen Propionate -2.811 -2.181 216 Acetaminophen Stearate -4.922 -6.380 217 Adenine -2.119 -1.322 218 Adipic Acid -2.654 -1.194 219 Alachlor 
-3.261 -3.695 220 Allyl Bromide -1.499 -2.215 221 Ametryn -3.037 -3.408 222 Amikacin -0.500 2.060 223 Amitrole 0.522 -0.679 224 Ampyrone 0.554 -2.290 225 Amyl Acetate -1.877 -2.105 226 Ancymidol -2.596 -3.160 227 Androstenedione -3.699 -3.293 228 Anethole -3.126 -3.103 229 Anthraquinone -5.187 -3.220 230 Arginine 0.019 -0.482


231 Aspirin Phenylalanine Ethyl Ester -3.328 -3.278 232 Azinphos-methyl -4.039 -4.179 233 Barban -4.370 -4.146 234 Bendroflumethiazide -3.590 -3.535 235 Benzamide -0.956 -1.560 236 Benzhydrol -2.553 -3.325 237 Benzidine-2,2'-disulfonic Acid -4.634 -2.876 238 Benzoin -2.850 -3.116 239 Benzophenone -3.125 -3.043 240 Benzoyl-r-mandelic Acid -1.509 -3.408 241 Benzoylprop-ethyl -4.263 -5.408 242 Benzyl Alcohol -0.402 -1.909 243 Benzyl Isothiocyanate -3.137 -3.582 244 Benzylamine -1.533 -1.908 245 Bibenzyl -4.627 -4.296 246 Borneol -2.320 -2.563 247 Bromochloromethane -0.889 -2.139 248 Bromomethionic Acid 1.131 -1.712 249 Butadiyne -4.699 -2.021 250 Butyl Dibutyl Phosphinate -1.717 -1.753 251 CDAA -0.945 -2.781 252 Camphor -1.987 -2.384 253 Caproic Aldehyde -1.302 -1.806 254 Caprylic Aldehyde -2.360 -2.058 255 Carbazole -5.265 -3.330 256 Carbofuran -2.500 -2.378 257 Carbon Disulfide -3.170 -0.794 258 Carbonyl Sulfide -1.682 -1.954 259 Carboxin -3.141 -3.474 260 Carvacrol -2.080 -3.065 261 Chelidonic Acid -1.110 -1.596 262 Chloromethionic Acid -4.812 -1.444 263 Chloroneb -4.413 -3.675 264 Chloropicrin -2.006 -2.818 265 Chlorothalonil -5.647 -4.505 266 Chlorothiazide -3.020 -2.921 267 Chlorpropham -3.380 -3.439 268 Chlorpyrifos-methyl -4.907 -3.113 269 Chlorquinox -5.428 -5.049 270 Cinchonidine -3.168 -3.961 271 Cinchophen -3.193 -4.176 272 Cinnamaldehyde -1.991 -2.166 273 Citraconic Acid 0.779 -0.899 274 Cortisone -3.110 -2.867 275 Cortisone Acetate -4.277 -3.232 276 Cortisone Propionate -4.717 -3.192 277 Cumene Hydroperoxide -1.039 -2.409 278 Cyclobarbital -1.456 -2.508 279 Cycloheptane -3.515 -2.806 280 Cycloheptene -3.164 -2.650 281 Cyclohexanone -2.339 -1.312 282 Cyclooctane -4.152 -2.883 283 Cytosine -1.143 -0.874 284 D-Alanine 0.267 -0.611 285 D-Glutamic Acid -1.219 -0.773 286 DCPA -5.822 -4.796 287 d,l-2-(4-Chlorophenoxy)propionic Acid- -2.134 -3.066 288 d,l-Aminooctanoic Acid -4.202 -1.954 289 d,l-Aspartic Acid -1.212 -0.601


290 d,l-Glutamic Acid -0.746 -0.698 291 d,l-Isoleucine -0.778 -1.382 292 d,l-Methionine -0.659 -1.570 293 d,l-Norvaline -0.145 -1.154 294 d,l-Phenylalanine -1.066 -1.995 295 d,l-alpha-Aminobutyric Acid 0.287 -0.904 296 DMPA -4.798 -4.202 297 Dalapon 0.545 -2.391 298 Daminozide -0.205 -0.533 299 Dazomet -3.876 -3.342 300 Decabromodiphenyl Ether -7.585 -6.483 301 Decyl-p-hydroxybenzoate -2.885 -4.198 302 Dexamethasone -3.644 -2.464 303 Dianisidine -3.610 -2.850 304 Dibenzo-18-crown-6 -4.693 -4.413 305 Dibenzothiophene -5.542 -4.281 306 Dibutyl Butyl Phosphonate -2.700 -2.307 307 Dibutyl Ethoxybutyl Phosphate -2.647 -3.030 308 Dibutyl Ethyl Phosphate -1.846 -2.140 309 Dibutyl Ethyl Phosphonate -1.569 -1.917 310 Dibutyl Hydrogen Phosphonate -1.425 -2.025 311 Dibutyl Methyl Phosphate -1.499 -2.039 312 Dibutyl Methyl Phosphonate -1.415 -1.467 313 Dicamba -1.691 -3.214 314 Dichlobenil -4.236 -3.581 315 Dichlofenthion -6.110 -3.750 316 Dichlone -6.357 -3.822 317 Dichlorodifluoromethane -2.635 -1.823 318 Dichlorophen -5.698 -4.484 319 Dichlorprop -2.453 -3.807 320 Dicofol -5.448 -6.544 321 Diethyl Amyl Phosphate -1.476 -1.821 322 Diethyl Butyl Phosphate -1.147 -1.638 323 Diethyl Hexyl Phosphonate -2.666 -1.737 324 Diethyl Trichloromethyl Phosphonate -1.754 -1.426 325 Diethylstilbestrol -4.350 -4.487 326 Digallic Acid -2.809 -2.687 327 Digitoxin -5.293 -4.771 328 Dimethirimol -2.242 -2.497 329 Dinitramine -5.467 -3.654 330 Dinoseb -3.665 -3.960 331 Diphenic Acid -2.284 -2.697 332 Diphenyl Methyl Phosphate -2.121 -2.592 333 Diphenylacetic Acid -3.222 -3.405 334 Diphenylnitrosamine -3.752 -3.623 335 Dixanthogen -4.959 -5.389 336 EPTC -2.703 -3.846 337 Estragole -2.921 -3.159 338 Ethalfluralin -6.222 -4.160 339 Ethoate-methyl -1.457 -2.134 340 Ethohexadiol -2.287 -1.849 341 Ethyl Cinnamate -2.996 -2.928 342 Ethyl Cyanoacetate -0.753 -1.556 343 Ethyl Dibutyl Phosphonate -1.200 -1.917 344 Ethyl Hydrocinnamate -2.909 -3.054 345 Ethyl Phthalate -2.347 -2.715 346 Ethyl Propionate 
-0.667 -1.437 347 Ethyl m-Isothiocyanobenzoate -3.602 -3.757 348 Ethyl p-Benzoate -2.319 -2.492


349 Ethylidene Chloride -1.292 -2.360 350 Ethylmalonic Acid 0.732 -1.310 351 Eugenol -1.824 -2.627 352 Fenarimol -4.383 -4.917 353 Fenbufen -5.046 -3.578 354 Fensulfothion -2.302 -3.569 355 Flufenamic Acid -4.398 -2.902 356 Fluometuron -3.412 -1.743 357 Fluorobenzene -1.792 -1.930 358 Fluorometholone -5.843 -2.861 359 Fumaric Acid -1.220 -0.956 360 Glutamic Acid -1.235 -0.773 361 Glutamine -0.548 -0.337 362 Glyphosate -1.149 0.581 363 Heptanoic Acid -1.665 -1.960 364 Heptyl p-Hydroxybenzoate -2.234 -3.386 365 Hexachloro-1,3-butadiene -4.907 -4.854 366 Hexachlorobenzene -7.770 -5.329 367 Hexadecyl p-Hydroxybenzoate -2.981 -5.818 368 Hexobarbital -2.735 -2.910 369 Hexyl Acetate -2.451 -2.419 370 Hexyl p-Hydroxybenzoate -2.768 -3.114 371 Histidine -0.533 -0.901 372 Hydantoic Acid -0.483 -0.534 373 Ibuprofen -4.367 -3.407 374 Isoamyl Acetate -3.558 -2.124 375 Isoamyl Salicylate -3.157 -3.227 376 Isoamylmalonic Acid 0.543 -1.867 377 Isobutane -3.075 -2.188 378 Isobutyl Isobutyrate -2.403 -2.275 379 Isobutylbenzene -4.123 -3.469 380 Isobutylene -2.329 -2.054 381 Isobutyraldehyde 0.091 -1.351 382 Isoprene -2.026 -2.334 383 Isopropalin -6.491 -4.909 384 Isopropyl Ether -4.608 -2.049 385 Isopropyl tert-Butyl Ether -2.366 -2.395 386 L-Asparagine -0.652 -0.351 387 L-Cystine -3.343 -2.327 388 Glutamic Acid -1.235 -0.773 389 Histidine -0.533 -0.901 390 L-Isoleucine -0.582 -1.428 391 L-Mandelic Acid -0.233 -1.923 392 Lactamide -5.057 -0.578 393 Lenacil -4.592 -2.712 394 Leptophos -5.235 -5.609 395 Levodopa -1.717 -1.389 396 Limonene -4.194 -3.312 397 Linalool -1.987 -2.879 398 MCPA -2.233 -2.945 399 MCPB -3.678 -3.238 400 Maleic Acid 0.579 -0.949 401 Mandelic Acid -5.924 -2.030 402 Meconic Acid -1.377 -2.011 403 Meconin -1.890 -2.402 404 Menthol -2.535 -2.602 405 Methacrylonitrile -2.167 -1.769 406 Methidathion -3.100 -3.376 407 Methionine -0.421 -1.570


408 Methomyl -0.447 -2.070 409 Methotrimeprazine -5.960 -5.427 410 Methyl Benzoate -1.839 -2.217 411 Methyl Butyrate -0.833 -1.595 412 Methyl Dixanthogen -3.939 -4.791 413 Methyl Isopropyl Ether -1.802 -1.599 414 Methyl Oxalate -0.292 -1.380 415 Methyl Propyl Ether -2.130 -1.785 416 Methyl m-Isothiocyanobenzoate -3.565 -3.502 417 Methylamine 1.605 -0.677 418 Methylaniline -1.280 -2.120 419 Methylmalonic Acid 0.760 -1.061 420 Methyltestosterone Acetate -4.854 -2.524 421 Methylthiouracil -2.426 -2.562 422 Metolazone -3.783 -3.828 423 Mirex -6.807 -8.824 424 Monolinuron -2.465 -2.719 425 Mustard Gas -2.363 -2.964 426 Myristic Acid -5.301 -3.878 427 Myristyl Alcohol -6.053 -4.127 428 N',N'-Dimethyl-m-aminophenyl Isothiocyanate -3.710 -3.650 429 Naproxen -4.161 -3.482 430 Naptalam -3.163 -3.639 431 Neburon -4.758 -4.242 432 Neopentyl Alcohol -2.146 -1.800 433 Niridazole -4.962 -2.763 434 Nitralin -5.760 -4.499 435 Nitrilotriacetic Acid -0.510 -0.608 436 Nitroethane -1.917 -1.439 437 Nitroguanidine -1.374 -1.432 438 Nitromethane 0.256 -1.265 439 Nonyl Aldehyde -3.171 -2.326 440 Nonyl p-Hydroxybenzoate -2.316 -3.917 441 Norflurazon -4.035 -2.834 442 Octadecyl-p-hydroxybenzoate -3.079 -6.375 443 Octyl p-Hydroxybenzoate -2.485 -3.670 444 Octylamine -2.810 -2.586 445 Oxamyl 0.106 -2.708 446 Oxanilic Acid -1.302 -2.244 447 Oxycarboxin -2.427 -2.526 448 Palmitic Acid -5.523 -4.424 449 Parathion -4.084 -3.290 450 Pentachlorbenzyl Alcohol -6.147 -4.886 451 Pentachloroethane -2.607 -3.641 452 Pentachlorophenol -4.208 -4.929 453 Pentamethylmelamine -1.958 -1.573 454 Phenacetin -2.350 -2.103 455 Phenetole -2.301 -2.592 456 Phenothiazine -5.097 -4.073 457 Phenoxyacetic Acid -3.959 -1.956 458 Phenyl Isothiocyanate -3.177 -3.392 459 Phenyl Salicylate -3.155 -3.398 460 Phosmet -4.104 -3.897 461 Phthalimide -2.611 -1.919 462 Picloram -2.749 -3.370 463 Pindone -4.107 -3.122 464 Pipemidic Acid -2.975 -2.083 465 Pirimicarb -1.946 -2.624 466 Prednisolone -3.953 -3.105


467 Propyl Acetate -2.453 -1.485 468 Propyl Dixanthogen -5.699 -6.363 469 Propylthiouracil -2.151 -3.138 470 Propyne -1.042 -1.639 471 Propyzamide -4.232 -3.956 472 Protoporphyrin IX -3.721 -7.160 473 Pyrocatechol -5.378 -1.817 474 Quinethazone -5.031 -2.480 475 Quinhydrone -1.730 -1.662 476 Quinidine -3.365 -4.283 477 Resorcinol -5.186 -1.731 478 Rhodanine -1.772 -2.583 479 Saccharin -1.629 -2.192 480 Salicin -0.855 -1.176 481 Salicylanilide -3.589 -3.104 482 Serine 0.607 -0.308 483 Siduron -4.111 -3.365 484 Sucrose 0.793 0.347 485 Sulfapyridine -2.969 -2.776 486 Sulfathiazole -2.835 -3.224 487 Tetradecyl p-Hydroxybenzoate -2.975 -5.321 488 Tetrahydropyran -1.776 -1.659 489 Thiometon -3.091 -3.536 490 Thionazin -2.338 -2.624 491 Threonine -0.089 -0.488 492 Thymine -1.519 -0.938 493 Triallate -4.882 -5.516 494 Tributylamine -3.000 -3.974 495 Tributylphosphine Oxide -0.593 -0.091 496 Trichlorfon -0.223 -2.151 497 Tricyclazole -2.073 -3.165 498 Tridecyl p-Hydroxybenzoate -2.945 -5.050 499 Trietazine -4.060 -3.298 500 Triethyl Phosphate 0.439 -1.100 501 Triethylamine -0.138 -2.265 502 Trimethoprim -2.861 -2.836 503 Trimethyl Phosphate 0.553 -0.629 504 Trimethylamine 0.841 -1.561 505 Triphenylcarbinol -2.260 -4.631 506 Tripropylamine -2.301 -3.176 507 Undecane -7.553 -4.137 508 Undecyl p-Hydroxybenzoate -2.092 -4.488 509 Uric Acid -3.730 -1.188 510 Vanillin -1.140 -1.756 511 Xylene -3.000 -2.849 512 Xylidine -2.040 -2.273 513 α,α,α-Trifluoro-o-toluic Acid -1.598 -1.316 514 2,2’-Bipyridine -1.420 -3.021 515 alpha-1,2,3,4,5,6-Hexachlorocyclohexane -5.163 -4.975 516 alpha-Endosulfan -5.885 -5.455 517 alpha-Hydroxycaproamide -1.081 -1.286 518 beta,beta,beta-Trichlorolactic Acid -2.645 -2.705 519 beta-1,2,3,4,5,6-Hexachlorocyclohexane -6.084 -4.757 520 beta-Alanine -5.213 -0.683 521 beta-Aminobutyric Acid 1.084 -0.488 522 beta-Endosulfan -6.162 -5.641 523 cis-1,2-Dimethylcyclohexane -4.272 -3.055 524 d-Borneol -2.319 -2.530 525 d-Camphoric Acid -1.421 -2.014


526 d-Fenchone -1.851 -2.416 527 d-Limonene -3.996 -3.316 528 dl-2-Octanol -2.036 -2.541 529 epsilon-Aminocaproic Acid -5.415 -1.053 530 4,4’-Bipyridine -1.538 -2.889 531 gamma-Aminobutyric Acid 1.101 -0.651 532 l-Menthone -2.492 -2.562 533 m-Acetoxyphenyl Isothiocyanate -3.114 -3.547 534 m-Acetylphenyl Isothiocyanate -4.328 -3.423 535 m-Biphenyl Isothiocyanate -4.523 -4.709 536 m-Cyanophenyl Isothiocyanate -3.193 -3.542 537 m-Ethoxyphenyl Isothiocyanate -3.420 -3.790 538 m-Fluorobenzoic Acid -1.970 -1.642 539 m-Isopropoxyphenyl Isothiocyanate -3.328 -4.109 540 m-Isothiocyanobenzoic Acid -3.097 -3.297 541 m-Isothiocyanophenyl Isothiocyanate -4.699 -4.109 542 m-Methylphenyl Isothiocyanate -3.848 -3.648 543 m-Terphenyl -5.155 -4.980 544 m-Toluenesulfonamide -1.341 -2.453 545 n-2-Hydroxy-n2,n4,n4,n6,n6-pentamethylmelamine -2.371 -1.802 546 n-Amyl Bromide -3.077 -2.937 547 n-Amyl beta-Ethoxypropionate -2.196 -2.845 548 n-Butyl Chloride -2.025 -2.356 549 n-Butyl Ether -3.592 -2.744 550 n-Butyl beta-Ethoxypropionate -1.639 -2.404 551 n-Butylmalonic Acid 0.437 -1.771 552 n-Capric Acid -3.445 -2.789 553 n-Ethyl beta-Ethoxypropionate -0.421 -1.857 554 n-Hexyl beta-Ethoxypropionate -2.829 -2.936 555 n-Methyl beta-Ethoxypropionate -0.072 -1.758 556 n-Methylolpentamethylmelamine -2.400 -1.666 557 n-Methylolpentamethylmelamine Methyl Ether -2.205 -2.230 558 n-Octyl Bromide -5.063 -3.678 559 n-Propyl beta-Ethoxypropionate -1.017 -2.136 560 n-Propylcyclopentane -4.740 -2.998 561 n-Propylmalonic Acid 0.680 -1.390 562 n-Valeraldehyde -0.867 -1.249 563 n2,n2,n4,n4-Tetramethylmelamine -2.688 -1.332 564 n2,n4,n6-Triethyl-n2,n4,n6-trimethylmelamine -3.703 -2.998 565 n6,n6-Diethyl-n2,n2,n4,n4-tetramethylmelamine -3.507 -2.616 566 o,p'-DDE -6.357 -6.399 567 o-Chlorobenzoic Acid -1.872 -2.191 568 o-Chlorophenol -1.054 -2.587 569 o-Fluorobenzoic Acid -1.289 -1.250 570 o-Nitrophenol -1.745 -2.368 571 o-Phenylphenol -2.386 -3.337 572 o-Terphenyl -5.301 -5.014 573 o-Tolidine -2.213 -3.111 574 
o-Toluenesulfonamide -2.023 -2.367 575 o-Toluic Acid -2.060 -1.951 576 o-Toluidine -0.860 -2.042 577 4-(Dodecyloxy)benzoic Acid -2.447 -4.514 578 p-Acetylphenyl Isothiocyanate -0.022 -3.122 579 p-Anisaldehyde -1.502 -1.662 580 p-Biphenyl Isothiocyanate -4.854 -4.705 581 p-Cresol -0.701 -2.243 582 p-Dibromobenzene -4.072 -4.302 583 p-Ethoxyphenyl Isothiocyanate -4.260 -3.718 584 p-Ethylphenol -1.397 -2.588


585 p-Fluorobenzoic Acid -2.067 -1.625 586 p-Methylbenzyl Isothiocyanate -3.796 -3.869 587 p-Phenylenediamine -0.379 -1.196 588 p-Phenylphenol -3.481 -3.323 589 p-Toluenesulfonamide -1.734 -2.350 590 p-Tolyl Isothiocyanate -4.721 -3.638 591 p-tert-Pentylphenol -2.990 -3.250 592 l-Mandelic Acid -0.233 -1.923 593 S-Trioxane 0.288 -0.479 594 tert-Amylbenzene -4.150 -3.552 595 trans-Crotonic Acid 0.000 -1.287 596 trans-Stilbene -5.793 -4.322


Appendix B

The measures of skewness and kurtosis of the distribution of surface property

values were added to the set of Parasurf ’07 statistical descriptors during the study of

phospholipidosis-inducing drugs. These measures are described briefly below.

Skewness, or the third standardized moment, is a measure of the asymmetry of a data

distribution, describing the left- or right-handedness of a distribution of values. It is

described by the equation:

\[
\gamma_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^3}{(N-1)\,\sigma^3}
\]

where x̄ is the mean, σ is the standard deviation, and N is the number of data points. The

skewness of a normal distribution is zero, and any symmetric data set should have a

skewness near zero. Negative values of the skewness indicate data that are skewed left,

and positive values indicate data that are skewed right.

Kurtosis, or the fourth standardized moment, is a measure of whether the data distribution

is peaked or flat relative to a normal distribution and is described by the equation:

\[
\gamma_2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^4}{(N-1)\,\sigma^4}
\]

where x̄ is the mean, σ is the standard deviation, and N is the number of data points.

Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather

rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the

mean rather than a sharp peak.
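
As a minimal sketch of the two definitions in this appendix, the following Python functions compute the skewness and kurtosis of a list of values using the (N − 1)σ^k denominators. The function names are illustrative and are not taken from ParaSurf. The text does not say how σ is estimated, so the sample standard deviation (N − 1 denominator) is assumed here.

```python
import math
from typing import Sequence


def standardized_moment(values: Sequence[float], order: int) -> float:
    """Return sum((x_i - mean)^order) / ((N - 1) * sigma^order).

    Assumption: sigma is the sample standard deviation (N - 1 denominator);
    the appendix does not state which estimator is used.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    sigma = math.sqrt(variance)
    if sigma == 0.0:
        raise ValueError("standard deviation is zero; moment is undefined")
    return sum((x - mean) ** order for x in values) / ((n - 1) * sigma ** order)


def skewness(values: Sequence[float]) -> float:
    """Third standardized moment (gamma_1 in the appendix)."""
    return standardized_moment(values, 3)


def kurtosis(values: Sequence[float]) -> float:
    """Fourth standardized moment (gamma_2 in the appendix), not excess kurtosis."""
    return standardized_moment(values, 4)


if __name__ == "__main__":
    symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
    right_skewed = [1.0, 1.0, 1.0, 10.0]
    print("skewness (symmetric):", skewness(symmetric))
    print("skewness (right-skewed):", skewness(right_skewed))
    print("kurtosis (symmetric):", kurtosis(symmetric))
```

Because no constant is subtracted in this form of the kurtosis, a large sample drawn from a normal distribution gives a value close to 3 rather than 0.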


References

1. Kennedy, S. P.; Bormann, B. J. Effective partnering of academic and physician scientists with the pharmaceutical drug development industry. Experimental Biology and Medicine 2006, 231, 1690-1694.

2. Silverman, R. B. The Organic Chemistry of Drug Design and Drug Action. 2nd ed.; Elsevier Academic Press: New York, 2004.

3. Kubinyi, H. Opinion: Drug research: myths, hype, and reality. Nature Reviews Drug Discovery 2003, 2, 665-668.

4. Kubinyi, H. Lectures of the Drug Design Course. http://www.kubinyi.de/lectures.html

5. Hammett, L. P. Effect of structure upon the reactions of organic compounds. Benzene derivatives. Journal of the American Chemical Society 1937, 59, 96-103.

6. Hansch, C.; Maloney, P.; Fujita, T.; Muir, R. M. Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature 1962, 194, 178-180.

7. Fischer, H.; Gottschlich, R.; Seelig, A. Blood-Brain Barrier Permeation: Molecular Parameters Governing Passive Diffusion. Journal of Membrane Biology 1998, 165, 201-211.

8. Overton, E. Osmotic properties of the cells and their importance for toxicology and pharmacology. Zeitschrift für Physikalische Chemie, Stöchiometrie und Verwandtschaftslehre 1897, 22, 189-209.

9. Sangster, J. Octanol-Water Partition Coefficients: Fundamentals and Physical Chemistry. Wiley: New York, 1997; p 79-112.

10. Meylan, W. M.; Howard, P. H. Atom/fragment contribution method for estimating octanol-water partition coefficients. Journal of Pharmaceutical Science 1995, 84, 83-92.

11. Hansch, C.; Steward, A. R.; Anderson, S. M.; Bentley, D. The parabolic dependence of drug action upon lipophilic character as revealed by a study of hypnotics. Journal of Medicinal Chemistry 1968, 11(1), 1-11.

12. Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews 1997, 1997, 3-25.

13. Clark, T. Quantum Cheminformatics: An Oxymoron? Beilstein Institute Workshop, Chemical Data Analysis in the Large, May 22-26, 2000, Bozen, Italy 2000.

14. Clark, T.; Ford, M.; Essex, J.; Richards, W. G.; Ritchie, D. W. A non-atom-based paradigm for modeling QSAR and QSPR. In QSAR and Molecular Modelling in Rational Design of Bioactive Molecules, Proceedings of the 15th European Symposium on Structure-Activity Relationships (QSAR) and Modelling Istanbul, Turkey, Sept. 5-10, 2004.

15. Monard, G.; Kenneth M. Merz, J. Combined Quantum Mechanical/Molecular Mechanical Methodologies Applied to Biomolecular Systems. Accounts of Chemical Research 1999, 32(10), 904-911.

16. Warshel, A.; Levitt, M. Theoretical studies of enzymic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. Journal of Molecular Biology 1976, 103(2), 227-49.

152

http://www.kubinyi.de/lectures.html

17. Clark, T. Modelling the chemistry: time to break the mould? Euro QSAR 2002, Designing drugs and crop protectants, 111-121.

18. Murray, J. S.; Lane, P.; Brinck, T.; Paulsen, K.; Grice, M. E.; Politzer, P. Relationships of critical constants and boiling points to computed molecular surface properties. Journal of Physical Chemistry 1993, 97(37), 9369-9373.

19. Murray, J. S.; Politzer, P. Statistical analysis of the molecular surface electrostatic potential: an approach to describing noncovalent interactions in condensed phases. Journal of Molecular Structure 1998, 425, 107-114.

20. Murray, J. S.; Ranganathan, S.; Politzer, P. Correlations between the solvent hydrogen bond acceptor parameter β and the calculated molecular electrostatic potential. Journal of Organic Chemistry 1991, 56, 3734-3737.

21. Politzer, P.; Lane, P.; Murray, J. S.; Brinck, T. Investigation of relationships between solute molecule surface electrostatic potentials and solubilities in supercritical fluids. Journal of Physical Chemistry 1992, 96(20), 7938-7943.

22. Politzer, P.; Murray, J. S. Molecular electrostatic potentials and chemical reactivity. In Rev. Comput. Chem., Lipkowitz, K.; Boyd, R. B., Eds. VCH: New York, 1998; Vol. 2, p 273.

23. Politzer, P.; Murray, J. S.; Peralta-Inga, Z. Molecular Surface Electrostatic Potentials in Relation to Noncovalent Interactions in Biological Systems. International Journal of Quantum Chemistry 2001, 85, 676-684.

24. Ehresmann, B.; Martin, B.; Horn, A. H. C.; Clark, T. Local molecular properties and their use in predicting reactivity. Journal of Molecular Modeling 2003, 9, 342-347.

25. Ehresmann, B.; Groot, M. J. d.; Alex, A.; Clark, T. New Molecular Descriptors Based on Local Properties at the Molecular Surface and a Boiling-Point Model Derived from Them. Journal of Chemical Information and Computational Sciences 2004, 43, 658-668.

26. Sjoberg, P.; Murray, J. S.; Brinck, T.; Politzer, P. A. Average local ionization energies on the molecular surfaces of aromatic systems as guides to chemical reactivity. Canadian Journal of Chemistry 1990, 68, 1440-1443.

27. Mulliken, R. S. New electroaffinity scale; together with data on valence states and on valence ionization potentials and electron affinites. Journal of Chemical Physics 1934, 2, 782-93.

28. Mulliken, R. S. Electronic population analysis on LCAO-MO molecular wave functions. II. Overlap populations, bond orders, and covalent bond energies. Journal of Chemical Physics 1955, 23, 1833-40.

29. Pearson, R. G. Density functional theory: electronegativity and hardness. Chemtracts: Inorganic Chemistry 1991, 3(6), 317-33.

30. Schürer, G.; Gedeck, P.; Gottschalk, M.; Clark, T. Accurate parametrized variational calculations of the molecular electronic polarizability by NDDO-based methods. International Journal of Quantum Chemistry 1999, 75, 17.

31. Jäger, R.; Kast, S. M.; Brickmann, J. Parameterization Strategy for the MolFESD Concept: Quantitative Surface Representation of Local Hydrophobicity. Journal of Chemical Information and Computational Sciences 2003, 43, 237-247.

32. Jäger, T.; Schmidt, F.; Schilling, B.; Brickmann, J. Localization and quantification of hydrophobicity; The molecular free energy density (MolFESD) concept and its application to the sweetness recognition. Journal of Computer-Aided Molecular Design 2000, 14, 631-646.

33. Pixner, P.; Heiden, W.; Merx, H.; Möller, A.; Moeckel, G.; Brickmann, J. Empirical Method for the Quantification and Localization of Molecular

153

Hydrophobicity. Journal of Chemical Information and Computational Sciences 1994, 34, 1309-1319.

34. Ehresmann, B.; Groot, M. J. d.; Clark, T. A Surface-Integral Solvation Energy Model: The Local Solvation Energy. Journal of Chemical Information and Computational Sciences 2005, 45, 1053-1060.

35. Clark, T.; Lin, J.-H.; Horn, A. H. C. Parasurf '06, A1; CEPOS InSilico Ltd.: 26 Brookfield Gardens Ryde, Isle of Wight PO33 3NP, 2005.

36. SYBYL 7.0, Tripos Inc.: 1699 South Hanley Rd., St. Louis, Missouri, 63144, USA. 37. Politzer, P.; Weinstein, H. Some relations between electronic distribution and

electronegativity. Journal of Chemical Physics 1979, 71, 4218-4220. 38. Koopmans, T. C. The distribution of wave function and characteristic value among

the individual electrons of an atom. Physica 1933, 1, 140-113. 39. Lin, J.-H.; Clark, T. An Analytical, Variable Resolution, Complete Description of

Static Molecules and Their Intermolecular Binding Properties. Journal of Chemical Information and Modelling 2005, 45(4), 1010-1016.

40. Rivail, J.-L.; Cartier, A. Variational Calculation of Electronic Multipole Molecular Polarizabilites. Molecular Physics 1978, 36, 1085-1097.

41. Rivail, J.-L.; Cartier, A. An Extended Variational Method for Calculating Molecular Multipole Polarizabilities. Chemical Physics Letters 1979, 61, 469-472.

42. Martin, B.; Clark, T. Dispersion treatment for NDDO-based semiempirical MO techniques. International Journal of Quantum Chemistry 2006, 106(5), 1208-1216.

43. Martin, B.; Gedeck, P.; Clark, T. Additive NDDO-based atomic polarizability model. International Journal of Quantum Chemistry 2000, 77(1), 473-497.

44. Schamberger, J.; Gedeck, P.; Martin, B.; Schindler, T.; Hennemann, M.; Horn, A. H. C.; Ehresmann, B.; Clark, T. GEISHA, Erlangen, Germany, 2003.

45. DeLano, W. L. The PyMOL Molecular Graphics System, DeLano Scientific: Palo Alto, CA, USA, 2002.

46. Lombardo, F.; Shalaeva, M. Y.; Tupper, K. A.; Gao, F.; Abraham, M. H. ElogPoct: A Tool for Lipophilicity Determination in Drug Discovery. Journal of Medicinal Chemistry 2000, 43, 2922-2928.

47. Mannhold, R.; Cruciani, G.; Dross, K.; Rekker, R. Multivariate analysis of experimental and computational descriptors of molecular lipophilicity. Journal of Computer-Aided Molecular Design 1998, 12, 573-581.

48. Mannhold, R.; van de Waterbeemd, H. Substructure and whole molecule approaches for calculating logP. Journal of Computer-Aided Molecular Design 2001, 15, 337-354.

49. Nadig, G.; Zant, L. C. V.; Dixon, S. L.; Kenneth M. Merz, J. Charge-Transfer Interactions in Macromolecular Systems: A New View of the Protein/Water Interface. Journal of the American Chemical Society 1998, 120(22), 5593-5594.

50. CORINA 3D Structure Generator, Molecular Networks, GmbH: Erlangen, Germany, 2006.

51. Sadowski, J.; Gasteiger, J.; Klebe, G. Comparison of Automatic Three-Dimensional Model Builders Using 639 X-Ray Structures. Journal of Chemical Information and Computational Sciences 1994, 34, 1000-1008.

52. Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. Development and use of quantum mechanical molecular models. 76. AM1: a new general purpose quantum mechanical molecular model. Journal of the American Chemical Society 1985, 107(13), 3902-9.

154

53. Clark, T.; Alex, A.; Beck, B.; Burkhardt, F.; Chandrasekhar, J.; Gedeck, P.; Horn, A. H. C.; Hutter, M.; Martin, B.; Rauhut, G.; Sauer, W.; Schindler, T.; Steinke, T. VAMP, 9.0; Accelrys Inc.: San Diego, 2003.

54. Winget, P.; Horn, A. H. C.; Selçuki, C.; Martin, B.; Clark, T. AM1* Parameters for Phosphorous, Sulfur and Chlorine. J. Mol. Model. 2003, 9, 408-414.

55. Rinaldi, D.; Rivail, J.-L. Molecular polarizabilities and dielectric effect of the medium in the liquid state. Theoretical study of the water molecule and its dimers. Theor. Chim. Acta 1973, 32, 57.

56. Rinaldi, D.; Rivail, J.-L. Calculation of molecular electronic polarizabilities. Comparison of different methods. Theor. Chim. Acta 1974, 32, 243-251.

57. TSAR 3.3, 3.3; Oxford Molecular Ltd.: Oxford, England, 2000. 58. Breindl, A.; Beck, B.; Clark, T. Prediction of the n-Octanol/Water Partition

Coefficient, logP, Using a Combination of Semiempirical MO-Calculations and a Neural Network. Journal of Molecular Modelling 1997, 3, 142-155.

59. Hansch, C.; Leo, A.; Hoekman, D. Exploring QSAR: Hydrophobic, Electronic, and Steric Constants. The American Chemical Society: Washington, D.C., 1995.

60. Sotomatsu, T.; Nakagawa, Y.; Fujita, T. Quantitative Structure-Activity Studies of Benzoylphenylurea Larvicides. Pesticides Biochem. and Physiol. 1987, 27, 156-164.

61. Klammt, A.; Schüürmann, G. COSMO: A new approach to dielectric screening in solvents with explicit expressions for the screening energy and its gradient. J. Chem. Soc., Perkin Transactions 1993, 2, 799-805.

62. Schüürmann, G. Prediction of Henry's Law Constant of Benzene Derivatives Using Quantum Chemical Continuum-Solvation Models. Journal of Computational Chemistry 2000, 21, 17-34.

63. Thompson, J. D.; Cramer, C. J.; Truhlar, D. G. New Universal Solvation Model and Comparison of the Accuracy of the SM5.42R, SM5.43R, C-PCM, D-PCM, and IEF-PCM Continuum Solvation Models for Aqueous and Organic Solvation Free Energies and for Vapor Pressures. Journal of Physical Chemistry B 2004, 108, 6532-6542.

64. Wang, J.; Wang, W.; Huo, S.; Lee, M.; Kollman, P. A. Solvation Model Based on Weighted Solvent Accessible Surface Area. Journal of Physical Chemistry B 2001, 105, 5055-5067.

65. Tehan, B. G.; Lloyd, E. J.; Wong, M. G.; Pitt, W. R.; Gancia, E.; Manallack, D. T. Estimation of pKa Using Semiempirical Molecular Orbital Methods. Part 2: Application to Amines, Anilines, and Various Nitrogen Containing Heterocyclic Compounds. Quantitative Structure-Activity Relationships 2002, 21.

66. Physical/Chemical Property Database (PHYSPROP), Syracuse Research Corporation, Environmental Research Center: Syracuse, NY, USA.

67. Shirakawa, H.; Louis, E. J.; MacDiarmid, A. G. Synthesis of electrically conducting organic polymers: halogen derivatives of polyacetylene, (CH)x. J. Chem. Soc., Chem. Commun. 1977, 578-580.

68. Thomas, K. R. J.; Lin, J. T.; Tao, Y.-T.; Chuen, C.-H. Quinoxalines Incorporating Triarylamines: Potential Electroluminescent Materials with Tunable Emission Characteristics. Chemistry of Materials 2002, 14, 2796-2802.

69. Thomas, K. R. J.; Lin, J. T.; Tao, Y.-T.; Ko, C.-W. New Star-Shaped Luminescent Triarylamines: Synthesis, Thermal, Photophysical, and Electroluminescent Characteristics. Chemistry of Materials 2002, 14, 1354-1361.

155

70. Yin, S.; Shuai, Z.; Wang, Y. A Quantitative Structure-Property Relationship Study of the Glass Transition Temperature of OLED Materials. Journal of Chemical Information and Computational Sciences 2003, 43, 970-977.

71. Yalkowsky, S. H.; Dannenfelser, R. M. AQUASOL database of aqueous solubility. In College of Pharmacy, University of Arizona, Tucson, AZ: 2000.

72. ACD/Solubility DB, release 10.0, Advanced Chemistry Development, Inc.: Toronto ON, Canada, 2006.

73. Cheng, A.; K. M. Merz, J. Prediction of Aqueous Solubility of a Diverse Set of Compounds Using Quantitative Structure-Property Relationships. Journal of Medicinal Chemistry 2003, 46(17), 3572-3580.

74. Delaney, J. S. ESOL: Estimating Aqueous Solubility Directly from Molecular Structure. Journal of Chemical Information and Computational Sciences 2004, 44(3), 1000-1005.

75. Xie, L.; Liu, H. The Treatment of Solvation by a Generalized Born Model and a Self-Consistent Charge-Density Functional Theory-Based Tight-Binding Model. Journal of Computational Chemistry 2002, 23, 1404-1415.

76. Reasor, M. J. A review of the biology and toxicologic implications of the induction of lysosomal bodies by drugs. Toxicology and Applied Pharmacology 1989, 97, 47-56.

77. Anderson, N.; Borlak, J. Drug-induced phospholipidosis. Federation of European Biochemical Societies Letters 2006, 580, 5533-5540.

78. Reasor, M. J.; Kacew, S. Drug-Induced Phospholipidosis: Are There Functional Consequences? Experimental Biology and Medicine 2001, 226, 825-830.

79. Halliwell, W. H. Cationic amphiphilic drug-induced phospholipidosis. Toxicologic Pathology 1997, 25, 53-60.

80. Fujita, T.; Iwasa, J.; Hansch, C. A new substituent constant, π, derived from partition coefficients. Journal of the American Chemical Society 1964, 86(23), 5175-5180.

81. Coulombe, P. A.; Kan, F. W.; Bendayan, M. Introduction of a high-resolution cytochemical method for studying the distribution of phospholipids in biological tissues. European Journal of Cell Biology 1988, 46(3), 564-76.

82. Bauknecht, H.; Zell, A.; Bayer, H.; Levi, P.; Wagener, M.; Sadowski, J.; Gasteiger, J. Locating biologically active compounds in medium-sized heterogenous datasets by topological autocorrelation vectors: dopamine and benzodiazepine agonists. Journal of Chemical Information and Computational Sciences 1996, 36(6), 1205-13.

83. Sadowski, J.; Wagener, M.; Gasteiger, J. Assessing similarity and diversity of combinatorial chemistry libraries by spatial autocorrelation functions and neural networks. Angewandte Chemie, Int'l Ed. 1996, 34(24), 2674-7.

84. Boser, B. E.; Guyon, I.; Vapnik, V. N. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory 1992, 5, 144-152.

85. Burges, C. J. C. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery 1998, 2, 121-167.

86. Cortes, C.; Vapnik, V. Support-Vector Networks. Machine Learning 1995, 20(3), 273-297.

87. Friedman, J. H. Multivariate Adaptive Regression Splines. Annals of Statistics 1991, 19(1), 1-141.

88. Friedman, J. H. Estimating functions of mixed ordinal and categorical variables using adaptive splines. In New Direction in Statistical Data Analysis and

156

Robustness, Morgenthaler, S.; Ronchetti, E.; Stahl, W. A., Eds. Birkhaüser: 1993; pp 73-113.

89. Schölkopf, B.; Sung, K.-K.; Burges, C. J. C.; Girosi, F.; Niyogi, P.; Poggio, T.; Vapnik, V. Comparing support vector machines with gaussian kernels to radial basis function classifiers. IEEE Trans. on Signal Processing 1997, 45, 2758-2765.

90. Weininger, D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. Journal of Chemical Information and Computational Sciences 1988, 28, 31-36.

91. Chang, C.-C.; Lin, C.-J. LIBSVM: a Library for Support Vector Machines. 2003. 92. Cherkassky, V.; Gehring, D.; Mulier, F.; Friedman, J. H.; Masters, T. XTAL

Software Package, ver. 5, University of Minnesota Electrical Engineering Dept.: Minnesota, 1995.

93. Tomizawa, K.; Sugano, K.; Yamada, H.; Horii, I. Physicochemical and Cell-Based Approach for Early Screening of Phospholipidosis-Inducing Potential. Journal of Toxicological Sciences 2006, 31(4), 315-324.

94. Ploemen, J.-P. H. T. M.; Kelder, J.; Hafmans, T.; Sandt, H. v. d.; Burgsteden, J. A. v.; Salemink, P. J. M.; Esch, E. v. Use of physicochemical calculation of pKa and ClogP to predict phospholipidosis-inducing potential. Experimental and Toxicologic Pathology 2004, 55, 347-355.

95. Fischer, H.; Kansy, M.; Potthast, M.; Csato, M. Prediction of in vitro phospholipidosis of drugs by means of their amphiphilic properties. In Rational Approaches to Drug Design, Proceedings of the 13th European Symposium on Quantitative Structure-Activity Relationships, Hoeltje, H. D.; Sippl, W., Eds. Prous Science: Barcelona, 2001; pp 286-289.

96. Lipinski, C. A.; Lombardo, F.; Dominy, B. W. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Advanced Drug Delivery Reviews 2001, 46, 3-26.

97. Cramer, R. D., III; Patterson, D. E.; Bunce, J. D. Comparative Molecular Field Analysis (CoMFA). 1. Effect of Shape on Binding of Steroids to Carrier Proteins. Journal of the American Chemical Society 1988, 110, 5959-5967.

98. Poso, A.; Juvonen, R.; Gynther, J. Comparative molecular field analyses of compounds with CYP2A5 binding affinity. Quantitative Structure-Activity Relationships 1995, 14, 507-511.

99. Geladi, P.; Kowalski, B. Partial least squares regression: A tutorial. Analytica Chimica Acta 1986, 185, 1-17.

100. Gerlach, R. W.; Kowalski, B. R.; Wold, H. O. A. Partial least-squares path modelling with latent variables. Analytica Chimica Acta 1979, 112(4), 417-21.

101. Dijkstra, T. Latent variables in linear stochastic models: Reflections on maximum likelihood and partial least squares methods. 2nd ed.; Sociometric Research Foundation: Amsterdam, The Netherlands, 1985.

102. Green, S. M.; Marshall, G. R. 3D-QSAR: a current perspective. Trends in pharmacological sciences 1995, 16(9), 285-91.

103. Klebe, G.; Abraham, U.; Mietzner, T. Molecular similarity indices in a comparative analysis (CoMSIA) of drug molecules to correlate and predict their biological activity. Journal of Medicinal Chemistry 1994, 37, 4130-4146.

104. Clark, T.; Lin, J.-H.; Horn, A. H. C. Parasurf '07, A1; CEPOS InSilico Ltd.: 26 Brookfield Gardens Ryde, Isle of Wight PO33 3NP, 2006.

105. de Jong, S. SIMPLS: an alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems 1993, 18, 251-263.

157

106. Guccione, S.; Doweyko, A. M.; Chen, H.; Barretta, G. U.; Balzano, F. 3D-QSAR using 'Multiconformer' alignment: The use of HASL in the analysis of 5-HT1A thienopyrimidinone ligands. Journal of Computer-Aided Molecular Design 2000, 14, 647-657.

107. Allinger, N. L.; Yuh, Y. H.; Lii, J.-H. Molecular Mechanics. The MM3 Force Field for Hydrocarbons. Journal of the American Chemical Society 1989, 111(23).

108. Lanig, H.; Utz, W.; Gmeiner, P. Comparative Molecular Field Analysis of Dopamine D4 Receptor Antagonists Including 3-[4-(4-Chlorophenyl)piperazin-1-ylmethyl]pyrazolo[1,5-a]pyridine (FAUC 113), 3-[4-(4-Chlorophenyl)piperazin-1-ylmethyl]-1H-pyrrolo-[2,3-b]pyridine (L-745,870), and Clozapine. Journal of Medicinal Chemistry 2001, 44, 1151-1157.

109. Wang, R.; Gao, Y.; Liu, L.; Lai, L. All-Orientation Search and All-Placement Search in Comparative Molecular Field Analysis. Journal of Molecular Modeling 1998, 4, 276-283.

110. Zheng, M.; Yu, K.; Liu, H.; Luo, X.; Chen, K.; Zhu, W.; Jiang, H. QSAR analyses on avian influenza virus neuraminidase inhibitors using CoMFA, CoMSIA, and HQSAR. Journal of Computer-Aided Molecular Design 2006, 20, 549-566.

111. Andrews, L. E.; Banks, T. M.; Bonin, A. M.; Clay, S. F.; Gillson, A.-M. E.; Glover, S. A. Mutagenic N-Acyloxy-N-alkoxyamides: Probes for Drug-DNA Interactions. Australian Journal of Chemistry 2004, 57, 377-381.

112. Andrews, L. E.; Bonin, A. M.; Fransson, L. E.; Gillson, A.-M. E.; Glover, S. A. The role of steric effects in the direct mutagenicity of N-acyloxy-N-alkoxyamides. Mutation Research 2006, 605, 51-62.

113. Bonin, A. M.; Glover, S. A.; Hammond, G. P. A comparison of the reactivity and mutagenicity of N-benzoyloxy-N-benzyloxybenzamides. Journal of Organic Chemistry 1998, 63, 9684-9689.

114. Böhm, M.; Stürzebecher, J.; Klebe, G. Three-Dimensional Quantitative Structure-Activity Relationship Analyses Using Comparative Molecular Field Analysis and Comparative Molecular Similarity Indices Analysis To Elucidate Selectivity Differences of Inhibitors Binding to Trypsin, Thrombin, and Factor Xa. Journal of Medicinal Chemistry 1999, 42, 458-477.

115. Tropsha, A.; Cho, S. J. Cross-validated r2 guided region selection for CoMFA studies. Perspectives in Drug Discovery and Design 1998, 12/13/14, 57-69.

116. Kroemer, R. T.; Hecht, P.; Guessregen, S.; Liedl, K. R. Improving the Predictive Quality of CoMFA Models. Perspectives in Drug Discovery and Design 1998, 14, 41-56.

117. Verma, R. P.; Hansch, C. A QSAR study on influenza neuraminidase inhibitors. Bioorganic & Medicinal Chemistry 2006, 14, 982-996.

118. Doweyko, A. M. The hypothetical active site lattice. An approach to modelling active sites from data on inhibitor molecules. Journal of Medicinal Chemistry 1988, 31(7), 1396-406.

119. Andrews, P. R.; Craik, D. J.; Martin, J. L. Functional group contributions to drug-receptor interactions. Journal of Medicinal Chemistry 1984, 27(12), 1648-57.

120. Becker, O. M.; Levy, Y.; Ravitz, O. Flexibility, Conformation Spaces, and Bioactivity. Journal of Physical Chemistry B 2000, 104, 2123-2135.

121. Furnham, N.; Blundell, T. L.; DePristo, M. A.; Terwilliger, T. Is one solution good enough? Nature Structural and Molecular Biology 2006, 13(3), 184-185.

122. Günther, S.; Senger, C.; Michalsky, E.; Goede, A.; Preissner, R. Representation of target-bound drugs by computed conformers: implications for conformational libraries. BMC Bioinformatics 2006, 7, 1-11.

158

123. Kuntz, I. D.; Chen, K.; Sharp, K. A.; Kollman, P. A. The maximal affinity of ligands. Proceedings of the National Academy of Sciences of the United States of America 1999, 96, 9997-10002.


Curriculum Vitae

Name: Kendall Grant Byler

Birthdate: 21.05.1970

Birthplace: Huntsville, AL, United States

Education

05/03-05/07

Doctor rerum naturalium

Friedrich-Alexander-Universität, Erlangen-Nürnberg

Computer-Chemie-Centrum, Prof. Dr. Tim Clark

08/97-12/01

Master of Science

The University of Alabama in Huntsville

08/88-05/93

Bachelor of Science

The University of Alabama in Huntsville

Publications

• Byler, K.; de Groot, M. J.; Clark, T. Support Vector Classification for the Prediction of Phospholipidosis Induction. The 20th Darmstädter Molecular Modelling Workshop, Erlangen, Germany, 2006.

• Byler, K.; Ehresmann, B.; de Groot, M. J.; Clark, T. Surface-Integral QSPR Models: Local Energy Properties. The 19th Darmstädter Molecular Modelling Workshop, Erlangen, Germany, 2005.

• Lawton, R. O.; Alexander, L. D.; Setzer, W. N.; Byler, K. G. Floral essential oil of Guettarda poasana inhibits yeast growth. Biotropica 1993, 25, 483-486.

• Setzer, W. N.; Flair, M. N.; Byler, K. G.; Huang, J.; Thompson, M. A.; Moriarty, D. M.; Lawton, R. O.; Windham-Carswell, D. B. Antimicrobial and cytotoxic activity of crude extracts of Araliaceae from Monteverde, Costa Rica. Brenesia 1992, 38, 123-130.
