IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 3, MARCH 2010 1355

Simulated Multispectral Imagery for Tree Species Classification Using Support Vector Machines

Ville Heikkinen, Timo Tokola, Jussi Parkkinen, Ilkka Korpela, and Timo Jääskeläinen

Abstract: The information content of remotely sensed data depends primarily on the spatial and spectral properties of the imaging device. This paper focuses on the classification performance of the different spectral features (hyper- and multispectral measurements) with respect to three tree species. The Support Vector Machine was chosen as the classification algorithm for these features. A simulated optical radiation model was constructed to evaluate the identification performance of the given multispectral system for the tree species, and the effects of spectral-band selection and data preprocessing were studied in this setting. Simulations were based on the reflectance measurements of the pine (Pinus sylvestris L.), spruce [Picea abies (L.) H. Karst.], and birch trees (Betula pubescens Ehrh. and Betula pendula Roth). The Leica ADS80 airborne sensor with four spectral bands (channels) was used as a fixed multispectral sensor system that leads to response values for the at-sensor radiance signal. Results suggest that this four-band system has inadequate classification performance for the three tree species. The simulations demonstrate on average a 5–15 percentage point improvement in classification performance when the Leica system is combined with one additional spectral band. It is also demonstrated for the Leica data that feature mapping through a Mahalanobis kernel leads to a 5–10 percentage point improvement in classification performance when compared with other kernels.

Index Terms: Feature extraction, image sensors, pattern classification, radiometry, remote sensing.

I. INTRODUCTION

CLASSIFICATION is one of the approaches to deriving information from remotely sensed forest data, and detailed tree species classification is important in forest inventories for technical, ecological, and economic reasons [14]. The adequate accuracy level for practical applications is above 90%, as the value of the forest data deteriorates rapidly at lower accuracies [15]. Tree species classification is an evident bottleneck in current remote sensing of forests, in spite of the ample research carried out into the use of airborne laser scanning and the recently introduced digital aerial multispectral cameras (e.g., [25]). These cameras offer enhanced radiometric and geometric properties when compared with traditional film cameras, but they are customized not for forestry applications but for surveying and mapping purposes. There is still a substantial lack of basic research into the spectral characteristics distinguishing given forest objects, and such information would be valuable in specifying optimal sensors designed for forest use.

Manuscript received April 6, 2009; revised May 22, 2009 and July 27, 2009. First published December 4, 2009; current version published February 24, 2010. This work was supported by the Academy of Finland under Grant 123193.

V. Heikkinen, J. Parkkinen, and T. Jääskeläinen are with the Faculty of Science, InFotonics Center, University of Joensuu, 80101 Joensuu, Finland (e-mail: [email protected].).

T. Tokola is with the Faculty of Forest Sciences, University of Joensuu, 80101 Joensuu, Finland.

I. Korpela is with the Faculty of Agriculture and Forestry, Department of Forest Resource Management, University of Helsinki, 00014 Helsinki, Finland.

Digital Object Identifier 10.1109/TGRS.2009.2032239

The classification of objects in images is based on their geometrical or spectral features. At the single-tree level, important structures contribute to the image texture only at very high resolutions, and such images are often too expensive to acquire over large areas. We will thus ignore the spatial features of images here and focus entirely on features derived from pointwise multispectral measurements. Tree species recognition algorithms that operate at the individual tree level have been developed, and they are mainly based on the spectral properties of the observed signal [9], [10], [13], [22].

A property which characterizes the spectral imaging system is the number of the individual wavelength bands sensed in the electromagnetic spectrum. Every spectral band of the sensor has a corresponding spectral response function with some shape, and the number of these bands defines the dimensionality of the measurement vectors. The bandwidth of an individual band in the sensor system is usually called its spectral resolution. Current sensor technology allows the capture of spectral data using hundreds of high-resolution spectral bands simultaneously. Imaging devices with these capabilities make it possible to use well-known analytical methods to extract representative spectral space features from the data. It has been shown that classification based on high-dimensional reflectance data can be carried out accurately and efficiently using linear mappings to lower-dimensional subspaces for each class [3], [13], [20].

Although the most informative spectral data are obtained with systems involving hundreds of spectral bands, the use of such imaging devices can be impractical or too costly in some applications. For example, the width of the imaged area (swath width) of hyperspectral devices for remote sensing is smaller than that of multispectral devices. Usually, the high-dimensional hyperspectral data also involve a high level of redundancy, implying inefficient data management and storage. The identification and usage of a small number of data-dependent relevant bands would increase the applicability of such an imaging system.

When the imaging device has a low number of spectral bands, the available data already reside in the fixed lower-dimensional subspace defined by the spectral response functions of the data-independent sensor system. Consequently, an efficient linear feature extraction might be disrupted due to the lower information content of the measured data. It is also possible that the system may have been optimized for some particular task, which could lead to poor performance when it is used for classification purposes. In this paper, the modeling problems due to the lower information content of the multispectral measurements are compensated for by means of various data preprocessing methods and nonlinear feature space mappings through positive definite kernel functions. The kernel techniques do not compensate for the lack of information content, but they introduce the tools to model complex data structures nonlinearly. The features derived from the kernel give representations for the data in a high-dimensional feature space, where the classification task is assumed to be easier to accomplish [24].

0196-2892/$26.00 © 2009 IEEE

Recently, support vector machine (SVM) classifiers have been found to achieve excellent accuracy when used for the classification of remotely sensed data. In particular, when combined with kernel functions, SVMs have been found to give results that compete well with the best previously available classifiers [4], [7], [11], [18], [19]. An SVM classifier is also robust to noise and high-dimensional data and gives the solution (support vectors) in the form of a sparse representation, which is beneficial in practical usage [4].

The performance of a classifier is also affected by the data preprocessing stage. Standard preprocessing methods include different scaling methods, outlier detection, principal component analysis, and whitening. In this paper, we concentrate on outlier removal and the whitening transformation. The whitening transformation is related to the use of the translation-invariant Mahalanobis kernel, which has been suggested to allow fast model selection for the SVM algorithm [1].

The objective of the present paper is to evaluate the effects of spectral-band selection and data preprocessing on tree species identification with the SVM algorithm. Simulated high-spectral-resolution radiance measurements and simulated response values [digital numbers (DNs)] of a four-channel Leica ADS80 airborne camera were used as a basis for this paper [16]. The simulation features are based on real reflectance measurements of pine (Pinus sylvestris L.), spruce [Picea abies (L.) H. Karst.], and birch trees (Betula pubescens Ehrh. and Betula pendula Roth). The simulated setting and the availability of a reflectance ensemble allow the Leica ADS80 system to be studied in conjunction with additional spectral response functions. Simulated measurements obtained using alternative multispectral systems are compared in terms of the classification accuracy of the SVM algorithm, when the first-order polynomial, Gaussian, and Mahalanobis kernels are used. Statistical significance between classifications using different kernels and systems was evaluated with McNemar's test.

We demonstrate that significant improvements in classification accuracy are obtained when an additional spectral response function is added to the Leica system. In addition, we present improvements in classification accuracy obtained by preprocessing the data.

II. OPTICAL RADIATION MODEL

In the following, we introduce the optical radiation model and data used for the simulations of the hyper- and multispectral measurements. In the notation used in the following, the wavelength variable is denoted by $\lambda$ (in nanometers) and the wavelength region by $\Lambda$. Operators are written in capital letters, and functions in lower case letters. Vectors are denoted by boldface letters, and the corresponding vector components are superscripted.

In the visible and near-infrared regions, the at-sensor radiance component of a perfectly diffuse reflecting surface (also known as the Lambertian surface model, where reflectance is independent of viewing angle) is approximated as

$$\phi(\lambda) = \frac{r(\lambda)\, l_0(\lambda)\, \tau_s(\lambda)\, \tau_v(\lambda)}{\pi} \cos(\theta) + \frac{r(\lambda)}{\pi}\, l_{\mathrm{sky}}(\lambda)\, \tau_v(\lambda) + \phi_s(\lambda) \tag{1}$$

where [23]

$l_0 : \Lambda \to \mathbb{R}^+$ is the exo-atmospheric solar irradiance;

$l_{\mathrm{sky}} : \Lambda \to \mathbb{R}^+$ is the irradiance at the surface due to skylight;

$\tau_s : \Lambda \to [0, 1]$ is the atmospheric transmittance along the solar path;

$\tau_v : \Lambda \to [0, 1]$ is the atmospheric transmittance along the sensor view path;

$r : \Lambda \to [0, 1]$ is the spectral reflectance of the object;

$\theta$ is the angle between the surface normal and the solar incident angle;

$\phi_s(\lambda)$ is the path scattered radiance at-sensor component.

The dependence on the spatial location is not explicitly written into the above equation. For a non-Lambertian surface, we could assume a fixed viewing angle and replace $r(\lambda)/\pi$ with the bidirectional reflectance distribution function [23].

We approximate that $\tau_v = 1$ for our airborne sensor and assume that $\phi_s(\lambda) = 0$ for the indirect component. Assuming that the angle $\theta$ is fixed, the effect of the electromagnetic radiation in the $k$-band camera can be modeled as

$$x^i = \Gamma\!\left( \kappa_1^i \int_\Lambda \phi(\lambda)\, \tau_c(\lambda)\, s_i(\lambda)\, d\lambda \right) \tag{2}$$

where $i = 1, \ldots, k$, $s_i : \Lambda \to [0, 1]$ is the spectral response function of the $i$th camera band, and $\tau_c : \Lambda \to [0, 1]$ is the transmittance of the camera optics. The scalar $\kappa_1^i$ corresponds to the chosen measurement geometry and exposure setting, while the function $\Gamma$ gathers together the nonlinearity of the system. Substituting the radiance function into (2), we have

$$x^i = \Gamma\!\left( \kappa_2^i \int_\Lambda w_i(\lambda)\, r(\lambda)\, d\lambda \right) \tag{3}$$

with $\kappa_2^i = (1/\pi)\, \kappa_1^i \cos(\theta)$ as the system calibration constant and

$$w_i(\lambda) = \bigl( l_0(\lambda)\, \tau_s(\lambda) + l_{\mathrm{sky}}(\lambda) \bigr)\, \tau_c(\lambda)\, s_i(\lambda) = l_d(\lambda)\, \tau_c(\lambda)\, s_i(\lambda). \tag{4}$$

We assume here that the response functions $\{s_i\}_{i=1}^{k}$ are located in the wavelength interval of the visible and near-infrared radiation $\Lambda = [390, 850]$ nm. This wavelength region was based on the properties of the available reflectance ensemble.

We use the discrete high-resolution daylight measurement as an approximation for the irradiance

$$l_d(\lambda) = l_0(\lambda)\, \tau_s(\lambda) + l_{\mathrm{sky}}(\lambda). \tag{5}$$

HEIKKINEN et al. : SIMULATED MULTISPECTRAL IMAGERY FOR TREE SPECIES CLASSIFICATION 1357

Fig. 1. Daylight irradiance, sampling of 5 nm.

The spectral power distribution of daylight used here corresponds to spectroradiometer measurements of the hemispheric daylight in the region of 380–780 nm, including the global spectral irradiance E on a horizontal surface from direct sunlight and the entire sky. The measurements correspond to midday conditions with clear weather, and they were carried out in Joensuu, Finland. Due to the restricted wavelength range of the measurements, the daylight irradiance was extended to the region of 380–850 nm using a constant value in the region of 780–850 nm (see Fig. 1). This daylight irradiance was used for all the simulations.

A. Reflectance Ensemble

The reflectance spectra of needles of young (less than 40 years old) Scots pines (Pinus sylvestris L.) and Norway spruces [Picea abies (L.) H. Karst.] and the leaves of birches (Betula pubescens Ehrh. and Betula pendula Roth) collected from Finland and Sweden were used as experimental data [13]. The spectroradiometric measurements were made in clear weather during the growing season in June 1992. Each radiance measurement represented the average spectrum of thousands of leaves on a growing tree. The measured crowns were thick in order to minimize the effects of branch color and background illumination. The reflectance component of the signal was calculated using the baseline measurement at a distance of 5 m, with the aid of a BaSO4 reference surface [12]. In the measurement setting, the sun was always behind the measurement direction of the sensor (backscattering), with a clear path toward the object. The solar vector had an almost constant elevation angle, but the azimuth angle with respect to the measurement direction varied.

The measurements can be considered to be free of spectral signatures from other classes. They were carried out on the ground at a distance of 50 m (field of view 0.6 m²) using a PR 713/702 AM spectroradiometer in the wavelength interval of 390–1070 nm with a 4-nm spectral sampling. The repetition accuracy of the device is 3.5%. The original spectra were transformed to the wavelength range of 390–850 nm at 5-nm sampling intervals by linear interpolation. The upper limit of the wavelength was set to 850 nm, as no significant differences in the spectra were found above this wavelength for these measurements. The ensemble sizes for the three classes were almost equal: 336 samples for birch, 369 samples for pine, and 348 samples for spruce. Differences between average spectra of deciduous (birch) and coniferous (pine and spruce) groups are shown in Fig. 2. More details on the measurement setting and the analysis of the data subspaces for discrimination and approximation are given in [13].

Fig. 2. Average spectra of birch, pine, and spruce ensembles.
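The resampling step described above (4-nm spectra mapped to a 5-nm grid on 390–850 nm by linear interpolation) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `resample_spectrum` and the synthetic linear spectrum are assumptions made for the example, and NumPy's `np.interp` stands in for whatever interpolation routine was actually used.

```python
import numpy as np

def resample_spectrum(wl_src, refl_src, wl_dst):
    """Linearly interpolate a reflectance spectrum onto a new wavelength grid."""
    return np.interp(wl_dst, wl_src, refl_src)

# Hypothetical source grid: 390-1070 nm at 4-nm steps, with a synthetic
# linear spectrum standing in for a measured reflectance curve.
wl_src = np.arange(390.0, 1071.0, 4.0)
refl_src = 0.001 * (wl_src - 390.0)

# Target grid: 390-850 nm at 5-nm steps (93 samples).
wl_dst = np.arange(390.0, 851.0, 5.0)
refl_dst = resample_spectrum(wl_src, refl_src, wl_dst)
```

Because the synthetic spectrum is linear in wavelength, linear interpolation reproduces it exactly on the new grid, which makes the sketch easy to check.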

B. Response Functions

Systems $\{s_i\}_{i=1}^{k}$ based on rectangular response functions were used in this paper. Examples of real multispectral systems using response functions of this kind are the Leica ADS40 and ADS80 (airborne digital sensor) systems [16]. These ideal functions $\{s_i\}_{i=1}^{k}$ correspond to the characteristic functions of nonoverlapping wavelength intervals $\{\Lambda_i\}_{i=1}^{k}$ and are defined as

$$s_i(\lambda) = c_i\, \chi_i(\lambda) \tag{6}$$

where $c_i \in \mathbb{R}^+$ and

$$\chi_i(\lambda) = \begin{cases} 1, & \lambda \in \Lambda_i \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

The wavelength supports for the Leica system are $\Lambda_1 = [428, 492]$, $\Lambda_2 = [533, 587]$, $\Lambda_3 = [608, 662]$, and $\Lambda_4 = [833, 887]$. These bands are comparable in their spectral properties to Landsat bands 1–4 [2]. The response functions of the Leica system are shown in Fig. 3.

C. Numerical Approximation of Multispectral Responses

High-spectral-resolution reflectance measurements were used as approximations for the true reflectance values $r(\lambda_i)$, where the measurements $i = 1, \ldots, n$ correspond to the spectroradiometer measurements and $\lambda_i$ correspond to uniform sampling of the wavelength interval $\Lambda$.

For the camera, we assume that $\Gamma$ can be inverted and $\tau_c = 1$. We used Simpson's rule as a standard quadrature model for the numerical integration in order to simulate the camera response values in accordance with (2). The approximated model is formulated as

$$x^i \approx \kappa_2^i\, \frac{|\Lambda|}{3m} \sum_{t=0}^{m} w_i(\lambda_t)\, q(\lambda_t)\, r(\lambda_t) \tag{8}$$

Fig. 3. Leica system with (solid) an additional band in the 705–755-nm waveband, (dash-dotted) an additional band in the 710–725-nm waveband, and (dashed) an additional band in the 695–725-nm waveband.

where $m$ is given by the fixed sampling interval for the region $\Lambda$ and $q$ is the quadrature weight of Simpson's rule [21].

Using a vector notation $\mathbf{r} = (r(\lambda_0), \ldots, r(\lambda_m))^T \in \mathbb{R}^n$ ($n = m + 1$) for the measured reflectance and a matrix notation for the responsivity matrix $W \in \mathbb{R}^{k \times n}$ (including the weight $q$), we can write (2) as

$$\mathbf{x} = (x^1, \ldots, x^k)^T \approx W \mathbf{r}. \tag{9}$$

This numerical model was used to form the simulated responses from a sampled spectrum $\mathbf{r}$. The sampling interval was set to 5 nm ($n = 93$), which corresponded to the maximal spectral resolution of our reflectance data.
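The discretized response model can be sketched as below: rectangular band indicators on the 5-nm grid, composite Simpson weights, and a responsivity matrix applied to a sampled spectrum. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names, the flat placeholder irradiance and reflectance, and the choice of unit constants (band constants and calibration constant set to 1) are all hypothetical; the interval-length prefactor follows the standard composite Simpson formula.

```python
import numpy as np

# Wavelength grid: 390-850 nm at 5-nm steps, n = 93 samples (m = 92 intervals).
wl = np.arange(390.0, 851.0, 5.0)

# Band supports of the four-band Leica system as listed in the text (nm).
bands = [(428, 492), (533, 587), (608, 662), (833, 887)]

def simpson_weights(m):
    """Composite Simpson weights 1, 4, 2, ..., 4, 1 for an even number m of intervals."""
    if m % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    q = np.ones(m + 1)
    q[1:-1:2] = 4.0   # odd interior nodes
    q[2:-1:2] = 2.0   # even interior nodes
    return q

def responsivity_matrix(wl, bands, irradiance):
    """Rows hold chi_i * l_d * q, scaled by (interval length) / (3 m)."""
    m = wl.size - 1
    scale = (wl[-1] - wl[0]) / (3.0 * m)
    q = simpson_weights(m)
    chi = np.array([(wl >= lo) & (wl <= hi) for lo, hi in bands], dtype=float)
    return scale * chi * irradiance * q

irradiance = np.ones_like(wl)   # placeholder for the measured daylight l_d
r = np.full(wl.size, 0.5)       # placeholder reflectance spectrum
W = responsivity_matrix(wl, bands, irradiance)
x = W @ r                       # simulated four-band response
```

A fifth band can be simulated in the same way by appending another interval, for example `(705, 755)`, to `bands`.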

III. EXTENSION OF THE SENSOR SYSTEM

We chose to study how the addition of a new characteristic response function would improve the classification performance of the four-band Leica system introduced in Section II-B. Three alternative response functions with different bandwidths are considered in this paper.

In the first case, the additional response function was chosen based on the existing real system. In some configurations, the Leica system provides an optional near-infrared channel supported in the region of 705–755 nm [2]. The four-band Leica system and the extended system are shown in Fig. 3.

In the second case, the four-band Leica system was extended using an extra response function based on the properties of the reflectance ensembles of the trees and the original Leica system. We fixed two different bandwidths and calculated the positions of these response functions by measuring the separability among the three classes. The positions of the new response functions were allowed to change only in the nonsupported 662–833-nm gap of the original Leica system. The details of the calculation are described below.

Two new response functions corresponding to the different bandwidths are derived by maximizing the average Jeffries–Matusita distance (see [26]) of the three tree classes. This distance is based on the calculation of the average Bhattacharyya distance between the density functions of two classes. Assuming normal distribution for the classes, the distance between classes $i$ and $j$ is defined as

$$J_{ij} = 2\bigl(1 - \exp(-\alpha_{ij})\bigr) \tag{10}$$

where

$$\alpha_{ij} = \frac{1}{8} (\bar{\mathbf{x}}_i - \bar{\mathbf{x}}_j)^T \bigl( (\Sigma_i + \Sigma_j)/2 \bigr)^{-1} (\bar{\mathbf{x}}_i - \bar{\mathbf{x}}_j) + \frac{1}{2} \beta_{ij} \tag{11}$$

$$\beta_{ij} = \ln(10) \left( \lg\bigl( |(\Sigma_i + \Sigma_j)/2| \bigr) - \frac{1}{2} \lg(|\Sigma_i|) - \frac{1}{2} \lg(|\Sigma_j|) \right). \tag{12}$$

Fig. 4. Leica system and average spectra for the second derivatives of the tree ensembles in the wavelength region of 420–850 nm (solid line = spruce, dash-dotted line = birch, and dashed line = pine).

In the above formulation, covariance matrices $\Sigma_i$ and $\Sigma_j$ correspond to the simulated 5-D multispectral responses, and the determinants of these matrices are presented in their ten-based logarithmic scales (see [26] for more information). Multispectral responses were simulated using (9) with the concatenated responsivity matrix $W_{p,b}^T = [\, \mathbf{w}_{p,b} \;\; W \,]^T \in \mathbb{R}^{n \times 5}$, where $W$ corresponds to the four-band Leica system and $\mathbf{w}_{p,b}$ corresponds to the fifth response function with bandwidth $b$ and position $p$. In this paper, the bandwidths were chosen to be 30 and 15 nm. For both of these bandwidths, we calculated the optimal positions from the region of 662–833 nm (using a 5-nm sampling grid), which were defined to correspond to the maximal average distances between the classes. In the calculations, we used randomizations of the available dataset, where 75% of the available reflectance data were used to simulate the multispectral ensembles for the three classes. Approximately the same number of data samples was used for every class. Because of the deviation in the properties of the randomized datasets, there was some variation in the optimal positions. This variation with respect to optimal positions was approximately 10 nm, but all the distance values in this region were close to the maximal values.

The response functions chosen for this paper are located in the regions of 695–725 and 710–725 nm (see Fig. 3).

To support the choice of the new response functions presented earlier, we analyzed the differences in second-order derivative features between classes so as to identify changes in the reflectance curve. Derivative analysis has already been used for remotely sensed data in previous studies [5], [26]. In order to dampen the effect of measurement noise, mean filtering with a window size of three units was performed before the calculation of the divided difference approximation for the second derivative [26]. When the average derivative curves are analyzed, it can be seen that the 660–830-nm interval shows an interesting behavior when compared with other wavelength regions and locations of the original Leica response functions (see Fig. 4). The average of the second derivative for the birch spectra deviates strongly from that of the pine and spruce spectra in the regions of 690–720 and 725–755 nm. It can also be seen that the derivative of spruce spectra shows deviation in shape from that of the pine and birch spectra in the region of 710–725 nm (see Fig. 5).

Fig. 5. Average spectra for the second derivatives of the tree ensembles in the unsupported wavelength region of 660–780 nm of the Leica system (solid line = spruce, dash-dotted line = birch, and dashed line = pine).
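The derivative analysis described above (mean filtering followed by a divided-difference second derivative) can be sketched as follows. This is a minimal illustration, assuming that the "window size of three units" means three consecutive samples and that a central second difference is used; the function name and the quadratic test spectrum are hypothetical.

```python
import numpy as np

def smoothed_second_derivative(spectrum, step=5.0):
    """Mean-filter with a 3-sample window, then take the central second difference."""
    s = np.asarray(spectrum, dtype=float)
    smooth = np.convolve(s, np.ones(3) / 3.0, mode="valid")   # length n - 2
    # Central divided difference on the smoothed curve; length n - 4.
    return (smooth[2:] - 2.0 * smooth[1:-1] + smooth[:-2]) / step**2

# Synthetic check: a quadratic in wavelength has a constant second derivative of 2.
wl = np.arange(390.0, 851.0, 5.0)
d2 = smoothed_second_derivative(wl**2, step=5.0)
```

The filter and the difference each shorten the curve by two samples, so the derivative is defined on the interior of the wavelength grid only.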

Summarizing the earlier discussion, the final candidates for the fifth band were $\chi_{[705,755]}(\lambda)$, $\chi_{[695,725]}(\lambda)$, and $\chi_{[710,725]}(\lambda)$. The classification performance was studied for the three five-channel systems corresponding to these response functions. Details of the classification are presented in the following sections.
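The separability criterion of (10)–(12) used to position the fifth band can be sketched as below. This is an illustrative sketch under the Gaussian class assumption stated in the text, not the authors' code: the function names are hypothetical, sample means and covariances are estimated with NumPy, and natural-log determinants from `np.linalg.slogdet` replace the ten-based logarithms multiplied by ln(10).

```python
import numpy as np
from itertools import combinations

def jeffries_matusita(x_i, x_j):
    """JM distance between two classes (rows are samples), assuming Gaussian classes."""
    diff = x_i.mean(axis=0) - x_j.mean(axis=0)
    cov_i = np.cov(x_i, rowvar=False)
    cov_j = np.cov(x_j, rowvar=False)
    cov = 0.5 * (cov_i + cov_j)
    # slogdet returns (sign, log|A|); natural logs equal ln(10) * lg(.).
    beta = (np.linalg.slogdet(cov)[1]
            - 0.5 * np.linalg.slogdet(cov_i)[1]
            - 0.5 * np.linalg.slogdet(cov_j)[1])
    alpha = 0.125 * diff @ np.linalg.solve(cov, diff) + 0.5 * beta
    return 2.0 * (1.0 - np.exp(-alpha))

def average_jm(classes):
    """Average JM distance over all pairs of classes."""
    pairs = list(combinations(classes, 2))
    return sum(jeffries_matusita(a, b) for a, b in pairs) / len(pairs)

# Synthetic 5-D responses standing in for simulated camera data.
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 5))
b = a + 10.0   # same spread, strongly shifted mean
```

In a band search, `average_jm` would be evaluated for each candidate position of the fifth band, and the position with the largest value retained.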

IV. CLASSIFICATION DETAILS

We used an SVM classification algorithm to discriminate between the simulated multispectral or hyperspectral measurements. The algorithm is based on the optimization problem of finding a separating hyperplane between the feature vectors of two classes [24]. The separating hyperplanes identified by the SVM maximize the margin between the classes and are robust for the classification of unseen samples. Since the multispectral/hyperspectral signatures inside the three classes were not mixed with signatures from other sources, this setting can be defined as a pure pixel classification [3].

Let us assume that we have a binary classification problem with training data in the form $\{\mathbf{x}_i, y_i\}_{i=1}^{l}$, with $\mathbf{x}_i \in \mathbb{R}^k$ and $y_i \in \{-1, 1\}$. In the SVM framework, it is assumed that the data are mapped to some feature space $F$ with feature map

$$\Phi : \mathbb{R}^k \to F \tag{13}$$

and an explicit representation of the decision function is written as

$$f(\mathbf{x}) = \mathrm{sign}\bigl( \mathbf{w}^T \Phi(\mathbf{x}) + b \bigr) \tag{14}$$

where $b$ is a bias term and $\mathbf{w}^T \Phi(\mathbf{x}) + b = 0$ defines a hyperplane in the feature space. If the data are separable in the feature space, it can be written that $f(\mathbf{x}_i) \geq +1$, when $y_i = +1$ and $f(\mathbf{x}_i) \leq -1$, when $y_i = -1$.

Assuming that the two classes are not separable in the feature space, the classification model is derived as the solution to the minimization problem

$$\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2} \mathbf{w}^T \mathbf{w} + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.} \quad y_i \bigl( \mathbf{w}^T \Phi(\mathbf{x}_i) + b \bigr) \geq 1 - \xi_i, \;\; \xi_i \geq 0, \;\; i = 1, \ldots, l \tag{15}$$

where the term $\mathbf{w}^T \mathbf{w} / 2$ corresponds to the margin between classes, the parameter $C$ controls the penalization of the samples located at the incorrect side of the decision boundary, and $\{\xi_i\}_{i=1}^{l}$ are slack variables which indicate misclassification of sample $\mathbf{x}_i$ when $\xi_i > 1$ [24].

The solution is obtained using the dual space of Lagrange multipliers and the property

$$\kappa(\mathbf{x}, \mathbf{z}) = \Phi(\mathbf{x})^T \Phi(\mathbf{z}) \tag{16}$$

where kernel $\kappa$ defines the mapping $\Phi : \mathbb{R}^k \to F$ of input samples $\mathbf{x}, \mathbf{z} \in \mathbb{R}^k$ to the feature space $F$ [24]. In this way, nonlinear decision boundaries in the input space are defined without having to make explicit use of the possibly infinite dimensional feature space $F$. The decision function for the SVM becomes

$$f(\mathbf{x}) = \mathrm{sign}\left( \sum_{i=1}^{l_s} \alpha_i y_i\, \kappa(\mathbf{x}, \mathbf{x}_i) + b \right) \tag{17}$$

where $l_s$ is the number of support vectors, $\{\alpha_i\}_{i=1}^{l_s}$ are the calculated Lagrange multipliers, and $\kappa$ is the selected positive definite kernel function [24]. The multiclass algorithm extends the binary case by combining separate binary classifications. For more information on SVM and multiclass techniques, please refer to [7], [18], and [24].

The present SVM classification was performed with a polynomial kernel of first degree

$$\kappa_L(\mathbf{x}, \mathbf{z}) = \mathbf{x}^T \mathbf{z} + 1 \tag{18}$$

and a Gaussian kernel

$$\kappa_G(\mathbf{x}, \mathbf{z}) = \exp\bigl( -\gamma \| \mathbf{x} - \mathbf{z} \|_2^2 \bigr) \tag{19}$$

where $\gamma$ defines the length scale of the kernel. When the first-degree polynomial kernel is used, it is assumed that the decision boundaries between the classes are hyperplanes in the original input space.
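The two kernels and the kernelized decision function can be sketched as below. This is a minimal illustration, not the toolbox code used in the paper: the function names are hypothetical, the Gaussian kernel is written with a single multiplicative scale parameter `gamma` (an assumed parameterization), and the support vectors, multipliers, and bias are hand-picked toy values rather than the output of a solver.

```python
import numpy as np

def linear_kernel(X, Z):
    """First-degree polynomial kernel x^T z + 1 for all row pairs of X and Z."""
    return X @ Z.T + 1.0

def gaussian_kernel(X, Z, gamma=1.0):
    """exp(-gamma * ||x - z||^2) for all row pairs; gamma is an assumed scale parameter."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def svm_decision(K, alpha, y_sv, b):
    """sign(sum_i alpha_i y_i k(x, x_i) + b); K has shape (n_test, n_support)."""
    return np.sign(K @ (alpha * y_sv) + b)

# Toy support vectors and multipliers (hand-picked, not fitted).
sv = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_sv = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
x_test = np.array([[2.0, 0.0], [-2.0, 0.0]])
labels = svm_decision(linear_kernel(x_test, sv), alpha, y_sv, b=0.0)
```

Swapping `linear_kernel` for `gaussian_kernel` in the last line changes only the kernel matrix; the decision rule itself is unchanged.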

A. Data Preprocessing

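The data were standardized to a zero mean and unit variance before the calculations. For the Gaussian kernel, this preprocessing is equivalent to the use of a kernel

$$\kappa_G(\mathbf{x}, \mathbf{z}) = \exp\bigl( -\gamma \| \mathbf{x} - \mathbf{z} \|_{\Sigma^{-1}}^2 \bigr). \tag{20}$$

The norm $\| \mathbf{x} \|_{\Sigma^{-1}}^2 = \mathbf{x}^T \Sigma^{-1} \mathbf{x}$ is defined by a diagonal matrix $\Sigma^{-1}_{ii} = 1/\sigma_i^2$, and $\sigma_i$ denotes the standard deviation of the $i$th components of the training set.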

The above preprocessing step can be generalized with a full covariance matrix using the translation invariant Mahalanobis kernel

$$\kappa_M(\mathbf{x}, \mathbf{z}) = \exp\bigl( -(\delta/m)\, \| \mathbf{x} - \mathbf{z} \|_{\Sigma^{-1}}^2 \bigr) \tag{21}$$

where $\Sigma = E[(\mathbf{x} - \bar{\mathbf{x}})(\mathbf{x} - \bar{\mathbf{x}})^T]$ denotes the covariance matrix of the training ensemble $\{\mathbf{x}_i\}_{i=1}^{l}$, $\bar{\mathbf{x}}$ denotes the expected value of the ensemble, and $m = k$ [1]. This preprocessing corresponds to data whitening, where the data are first represented in orthogonal directions defined by the eigenvectors of the covariance matrix and scaled to have equal variance in this orthogonal representation. Using the eigendecomposition of the covariance matrix $\Sigma = U S U^T$ and notation $\mathbf{d} = (\mathbf{x} - \mathbf{z})$ for the difference vector, the generalized norm can be written in the form

$$\mathbf{d}^T \Sigma^{-1} \mathbf{d} = \mathbf{d}^T U S^{-1} U^T \mathbf{d} = \sum_{i=1}^{k} (\mathbf{d}^T \mathbf{u}_i)^2 / S_{ii} \tag{22}$$

where $\mathbf{u}_i$ and $S_{ii}$ denote the $i$th eigenvector and corresponding eigenvalue, respectively. Equation (22) shows that coordinates $\mathbf{d}^T \mathbf{u}_i$ of the difference vector are divided by the data variances in the corresponding directions. Directions corresponding to a small data variance are given more weight.

Fig. 6. Assumed birch outliers (10% of the data) for one training set. The outliers are calculated from the Leica DN using the Mahalanobis distance. The data in the figure are represented in terms of the two most significant principal components of the set.
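The link between whitening and the Mahalanobis kernel can be sketched as follows: the data are projected onto the eigenvectors of the covariance matrix and rescaled, after which the kernel reduces to a Gaussian of the Euclidean distance. This is an illustrative sketch, not the authors' implementation; the function name and the synthetic data are assumptions.

```python
import numpy as np

def mahalanobis_kernel(X, Z, cov, delta=1.0):
    """exp(-(delta / m) (x - z)^T cov^{-1} (x - z)), with m the input dimension."""
    m = X.shape[1]
    S, U = np.linalg.eigh(cov)      # cov = U diag(S) U^T
    whiten = U / np.sqrt(S)         # column i is u_i / sqrt(S_ii)
    Xw, Zw = X @ whiten, Z @ whiten
    sq = ((Xw[:, None, :] - Zw[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-(delta / m) * sq)

# Synthetic four-band responses with very different variances per band.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4)) * np.array([1.0, 5.0, 0.1, 20.0])
cov = np.cov(X, rowvar=False)
K = mahalanobis_kernel(X, X, cov)

# Direct evaluation of the quadratic form for one pair, for comparison.
d = X[0] - X[1]
direct = np.exp(-(1.0 / 4) * d @ np.linalg.inv(cov) @ d)
```

The whitened route avoids forming the inverse covariance explicitly, but it still fails when eigenvalues are close to zero, which is the situation the paper reports for the full-resolution data.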

Usage of the Mahalanobis kernel is closely related to outlier removal based on the Mahalanobis distance [23]

$$\| \mathbf{x} - \bar{\mathbf{x}} \|_{\Sigma^{-1}}^2 = (\mathbf{x} - \bar{\mathbf{x}})^T \Sigma^{-1} (\mathbf{x} - \bar{\mathbf{x}}). \tag{23}$$

When using this distance for data preprocessing, we assume that the data distribution in one class has an ellipsoidal shape. According to this assumption, large Mahalanobis distances indicate anomalous data items, which should be removed from the ensemble. For example, the dataset used in this study includes reflectance measurements with different solar geometries and therefore contains some variation that might disrupt the classification performance. It can be expected that this kind of refinement method for training data weights the essential parts of the ensemble more efficiently (see Fig. 6).
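Outlier removal with the distance in (23) can be sketched as below for a single class. This is a minimal sketch, assuming the class mean and covariance are estimated from the same training samples that are being screened; the function name and the synthetic data with one planted outlier are hypothetical.

```python
import numpy as np

def remove_outliers(X, fraction=0.10):
    """Drop the given fraction of rows with the largest Mahalanobis distances."""
    diff = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Squared Mahalanobis distance of every row to the class mean.
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    n_keep = X.shape[0] - int(round(fraction * X.shape[0]))
    keep = np.sort(np.argsort(d2)[:n_keep])
    return X[keep], d2

# Synthetic single-class training set with one planted anomaly in row 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[0] = 50.0
kept, d2 = remove_outliers(X)
```

In the setting of the paper, this screening would be applied to each class separately before training.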

B. Model Training

The kernel and margin parameters were found using a tenfold cross-validation routine [24], with a grid search performed for the Gaussian kernel, while an alternative two-step line search method was used for the Mahalanobis kernel [1]. In the first step, a line search for parameter $C$ was performed using a fixed value $\delta = 1$. In the second step, the resulting parameter $C$ was fixed, and a line search was performed for parameter $\delta$. This provided a significant speeding up of the training phase without any significant decrease in accuracy.

TABLE I
CLASSIFICATION ERROR RATES WHEN THE SIMULATED HYPERSPECTRAL DATA ARE USED. STATISTICALLY SIGNIFICANT DIFFERENCES TO THE BEST PERFORMING KERNEL ARE INDICATED WITH UNDERLINES

The kernel evaluations were carried out using the high-dimensional radiance measurements or simulated camera responses (DNs). The multiclass one-against-one method was used for the classifications, which were carried out using the SimpleSVM Matlab toolbox (v.2.31) [17]. In this method, the SVM is trained separately for each pair of classes, and the decision regarding the class label for the test sample $\mathbf{x}$ is made by voting between the binary classifiers concerned. This strategy has been verified as accurate for land cover classification [18].
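The one-against-one voting step can be sketched as follows. This is an illustrative sketch only: the pairwise classifiers are toy nearest-center rules on a scalar input standing in for trained binary SVMs, and all names and values are hypothetical. Ties are resolved in favor of the label that received its first vote earliest, which is one possible convention and is not specified in the text.

```python
from collections import Counter
from itertools import combinations

def one_against_one_predict(x, binary_classifiers):
    """Each pairwise classifier casts one vote; the most voted label wins."""
    votes = Counter(clf(x) for clf in binary_classifiers.values())
    return votes.most_common(1)[0][0]

# Toy class centers on a scalar feature (stand-ins for trained models).
centers = {"birch": 0.0, "pine": 1.0, "spruce": 2.0}

def make_pair_classifier(a, b):
    """Return a rule that picks whichever of the two classes has the nearer center."""
    return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b

classifiers = {(a, b): make_pair_classifier(a, b)
               for a, b in combinations(centers, 2)}
label = one_against_one_predict(1.9, classifiers)
```

For three classes, this requires three binary classifiers, one per pair.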

V. EXPERIMENTAL DETAILS

The classification performance was calculated for the DNs from the simulated multispectral systems and for the simulated full-resolution radiance data with 5-nm sampling. The wavelength interval was fixed at 390–850 nm. The covariance matrix is poorly suited for use with full-resolution data due to the small effective dimensionality of the data and the instability of the inversion. The condition numbers of the covariance matrices of the radiance data were of the order of $10^{11}$. Because of this, the results of the classification of the full-resolution data with the Mahalanobis kernel are not presented. A subspace mapping or regularization technique of some kind would be needed in order to use this kernel efficiently with high-dimensional data.

Misclassification ratios were calculated for the combinations of sensor system and kernel, employing five randomizations of the available data to the training and test sets. These same randomizations were used for all the systems and kernels. The sample sizes for the tree classes were the following: birch, 336; pine, 369; and spruce, 348. In each randomization, 75% of the samples in each class were assigned to the training set, and the remaining 25% were used in the test set. The data were standardized for the first-degree polynomial and Gaussian kernels.
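The randomized per-class 75%/25% split can be sketched as below. This is a minimal sketch, not the authors' procedure: the function name, the seeds, and the rounding of the per-class training size to the nearest integer are assumptions, since the text does not state how fractional counts were handled.

```python
import numpy as np

def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices into train and test sets, class by class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = int(round(train_fraction * idx.size))   # assumed rounding rule
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return np.sort(train), np.sort(test)

# Class sizes as reported in the text.
labels = np.repeat(["birch", "pine", "spruce"], [336, 369, 348])
# Five randomizations, reused for every sensor system and kernel.
splits = [stratified_split(labels, seed=s) for s in range(5)]
train_idx, test_idx = splits[0]
```

Fixing the seeds is what allows the same five randomizations to be reused across all systems and kernels, as the paired McNemar comparison requires.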

This data preprocessing method was also compared to preprocessing with outlier removal. In this paper, 10% of the training data items corresponding to the largest Mahalanobis distances were removed; this was done for each class separately using the simulated 4- or 5-D camera responses. The results are presented in Tables I–V. The results obtained after extending the Leica system with additional sensors are denoted in Tables IV and V by the respective wavelength intervals supported. In the following, the Gaussian and Mahalanobis kernels are referred to as nonlinear kernels.

TABLE II
CLASSIFICATION ERROR RATES WHEN DNS FROM THE LEICA SYSTEM ARE USED. STATISTICALLY SIGNIFICANT DIFFERENCES TO THE BEST PERFORMING KERNEL ARE INDICATED WITH UNDERLINES

TABLE III
CLASSIFICATION ERROR RATES WHEN DNS FROM THE LEICA SYSTEM ARE USED AND 10% OF THE TRAINING DATA ITEMS ARE REMOVED AS OUTLIERS. STATISTICALLY SIGNIFICANT DIFFERENCES TO THE BEST PERFORMING KERNEL ARE INDICATED WITH UNDERLINES

TABLE IV
CLASSIFICATION ERROR RATES WHEN DNS FROM THE FIVE-CHANNEL SYSTEMS ARE USED. STATISTICALLY SIGNIFICANT DIFFERENCES TO THE BEST PERFORMING KERNEL AND SYSTEM ARE INDICATED WITH UNDERLINES AND BOLDFACE, RESPECTIVELY

TABLE V
CLASSIFICATION ERROR RATES WHEN DNS FROM THE FIVE-CHANNEL SYSTEMS ARE USED AND 10% OF THE TRAINING DATA ITEMS ARE REMOVED AS OUTLIERS. STATISTICALLY SIGNIFICANT DIFFERENCES TO THE BEST PERFORMING KERNEL AND SYSTEM ARE INDICATED WITH UNDERLINES AND BOLDFACE, RESPECTIVELY

A. Statistical Significance of Classification Differences

McNemar's test was used to test the statistical significance between the classification results [6], [8]. The McNemar's value with continuity correction is defined as

$$M = \frac{(|f_{12} - f_{21}| - 1)^2}{f_{12} + f_{21}} \tag{24}$$

where $f_{12}$ is the number of samples misclassified by classifier 1 but not by classifier 2 and $f_{21}$ is the number of samples misclassified by classifier 2 but not by classifier 1. The null hypothesis is that the two different classifiers 1 and 2 have the same error rate, which means that $f_{12} = f_{21}$. McNemar's test is based on a $\chi^2$ test with one degree of freedom. In this paper, the $\chi^2$ critical value with a 5% level of significance was chosen, and with one degree of freedom, the value is 3.8414. If the null hypothesis is true, the probability of having a McNemar's value greater than the critical value is less than 5%.
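The test statistic in (24) and the decision against the 5% critical value can be sketched as below. This is a minimal illustration: the function names and the example counts are hypothetical, the critical value is the one quoted in the text, and returning zero when both counts are zero is an assumed convention for the undefined case.

```python
CRITICAL_5PCT = 3.8414   # chi-square critical value quoted in the text (1 d.o.f.)

def mcnemar_statistic(f12, f21):
    """McNemar's value with continuity correction."""
    if f12 + f21 == 0:
        return 0.0   # assumed convention: no discordant samples, no evidence
    return (abs(f12 - f21) - 1) ** 2 / (f12 + f21)

def discordant_counts(y_true, pred_1, pred_2):
    """Count samples misclassified by exactly one of the two classifiers."""
    f12 = sum(p1 != t and p2 == t for t, p1, p2 in zip(y_true, pred_1, pred_2))
    f21 = sum(p1 == t and p2 != t for t, p1, p2 in zip(y_true, pred_1, pred_2))
    return f12, f21

# Hypothetical counts: classifier 1 alone errs 15 times, classifier 2 alone 5 times.
m_value = mcnemar_statistic(15, 5)
significant = m_value > CRITICAL_5PCT
```

Both classifiers must be evaluated on the same test samples, since the statistic depends only on the samples where exactly one of them is wrong.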

Results are presented in Tables I–V so that, for every sensor system, statistically significant differences to the best performing kernel are indicated with underlining. McNemar's test shows that there are significant differences between the first-degree polynomial and nonlinear kernels. A significant difference between the Gaussian and Mahalanobis kernels can be seen only for the simulated four-channel Leica system (Sets 2 and 5). When outlier removal is performed, the significant differences between these kernels vanish. In the case of the five-channel systems, no significant difference is detected between these two kernels.

For every kernel (Tables IV and V), statistically significant differences to the best performing sensor system are indicated with boldface. It was verified for all the kernels that there are no statistically significant differences between the systems with additional support in the 695–725- and 705–755-nm regions. This result also holds when outlier removal is used. Significant differences can be found when the system supported in the region of 710–725 nm is compared with the other two systems.

B. Classification Results

The results in Table I show that classification based on the hyperspectral measurements of radiance is superior to classification based on the four-band measurements in Table II. The large number of wavelength bands allows small changes in spectral shape to be detected. For the high-dimensional radiance data, a first-degree polynomial kernel gives a slight improvement in classification accuracy over the Gaussian kernel. The results for the low-dimensional inputs suggest that it is beneficial to use a nonlinear kernel. The use of a nonlinear feature space compensates for the poor spectral accuracy of the measurement system; a decrease of approximately 5–23 percentage points in the misclassification rate is achieved when a nonlinear kernel is substituted for the first-degree polynomial kernel. The Mahalanobis kernel outperforms the Gaussian kernel in four cases and has almost equal performance for one set. The maximal difference in misclassification in favor of the Mahalanobis kernel is 16 percentage points.

Table III shows that the removal of 10% of the data points as outliers clearly increases the accuracy of the first-degree polynomial and Gaussian kernels when compared with the training sets including outliers, whereas for the Mahalanobis kernel, the results are similar with or without this preprocessing.

We also studied how an additional response function would enhance the classification performance. Three wavelength locations with varying support were tested. Table IV shows that all these five-channel systems give a significant improvement in classification performance relative to the results for the four-channel system in Table II, with the misclassification ratio decreasing by 1–23 percentage points depending on the kernel. Nonlinear kernels seem to benefit most from the addition of a new band; the largest increase in accuracy is obtained for Set 2. For the Gaussian kernel, there is a decrease of 22.8 percentage points in the misclassification ratio. The response function in the 705–755-nm region shows almost identical performance to that in the 695–725-nm region, with the misclassification ratio differing between these two regions by 0–4 percentage points in favor of the former. A response function in the region of 710–725 nm leads to the most accurate classification with both the Gaussian and Mahalanobis kernels, reducing the average error by approximately 0.5–9 percentage points relative to the other two five-channel systems. On the other hand, with the first-degree polynomial kernel, this sensor system leads to an increase in the errors relative to the other five-channel systems. In most cases, the Gaussian and Mahalanobis kernels have similar performance with these five-channel systems.

Outlier removal for the five-channel systems (Table V) again leads to improved performance for the first-degree polynomial kernel with almost every dataset and every system, whereas the results for the Gaussian kernel with these five-channel systems are poorer in almost every case after the preprocessing. For the Mahalanobis kernel, the results again show somewhat similar performance with or without preprocessing. The results are poorer for the systems with additional support in the 695–725- and 705–755-nm regions but remain almost unaltered for the system with a sensor in the 710–725-nm region.

VI. DISCUSSION

We presented the classification results when additional spectral response functions were used with the Leica system. The bandwidth of the best performing response function, supported on [710, 725] nm, is smaller than that of the other two response functions, supported on [705, 755] and [695, 725] nm, and the wavelength range of 710–725 nm is also covered by the other two additional bands. Thus, the results suggest that the deviation in performance is partly due to the smaller bandwidth of this response function. The effectiveness of the [710, 725]-nm band also depends on the kernel because the results suggested that the first-degree polynomial kernel does not show a consistent performance difference between the three bands. The results also show that there was no significant performance difference between the [705, 755]- and [695, 725]-nm bands, although the bandwidth difference is 20 nm.

The experiments suggest that a decrease in the bandwidth (from 30 to 15 nm in this paper) is useful only if the location is specified accurately. For a response function with a 15-nm bandwidth, a small shift in the position sometimes had a significant effect on the classification of samples from the test sets. The experiments also showed that the band-selection method for the optimal positions gave varying results, depending on the data randomization used (75% of the available data were used in every randomization). Although the deviation in optimal positions was small, a careful data evaluation is needed, particularly when the position is calculated for a response function with a small bandwidth.

The three new response functions were located on the wavelength interval of 660–830 nm in this paper. This region is motivated by the fact that response functions in this region avoid overlap with the four Leica response functions. In addition, the derivative analysis shows that the tree spectra have interesting features in this region. It can be argued that other useful regions can be found beyond the 660–830-nm region. For example, the average second derivatives also have elevated values in the intervals of 495–525 and 590–605 nm, as shown in Figs. 4 and 5. These two intervals are not supported by the original Leica system and might also provide valuable information about the tree species. However, further investigation is needed to find out whether these wavelength regions also have potential for the classification of these tree species.
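A derivative analysis of this kind can be illustrated with a central second difference on uniformly sampled spectra. The sketch below is an assumption about one reasonable way to do it, not the procedure used for Figs. 4 and 5: the sampling step, the averaging of absolute values over spectra, and the function names are all hypothetical.

```python
def second_derivative(spectrum, step_nm):
    """Central second difference; defined for interior samples only."""
    return [(spectrum[i - 1] - 2.0 * spectrum[i] + spectrum[i + 1]) / step_nm ** 2
            for i in range(1, len(spectrum) - 1)]


def mean_abs_second_derivative(spectra, step_nm):
    """Average, over all spectra, of the absolute second derivative per wavelength."""
    derivs = [second_derivative(s, step_nm) for s in spectra]
    return [sum(abs(d[k]) for d in derivs) / len(derivs)
            for k in range(len(derivs[0]))]


# Toy spectra on a 5-nm grid: quadratics whose second derivative is +2 and -2.
wavelengths = [400.0 + 5.0 * i for i in range(9)]
spectra = [[w ** 2 for w in wavelengths], [-(w ** 2) for w in wavelengths]]
profile = mean_abs_second_derivative(spectra, step_nm=5.0)
```

Wavelength intervals where such a profile is elevated are candidates for additional bands, although finite differences amplify measurement noise, so real spectra are usually smoothed first.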

When classification of these reflectance data at the same resolution as in the present case was attempted previously with linear subspace classifiers in [13], a linear pseudoinverse classifier was found to give the best performance, with a similar performance to that of the SVM model used here for the simulated radiance data. In comparison with our results for the low-dimensional multispectral measurements, it has been reported that it is beneficial to increase the order of the polynomial kernel if only a small number of input variables are present [11].

Results suggest that the measurements obtained from a simulated hyperspectral imaging device capture the essential information from the training set even with a first-degree polynomial kernel. Some techniques for the preprocessing of training sets were evaluated here in order to further improve the performance of the SVM classification based on the low-dimensional multispectral measurements. For the four-channel system, it was verified that the use of the Mahalanobis distance as a preprocessor or the use of the Mahalanobis kernel improved the classification accuracy. This suggests that the SVM model based on the Mahalanobis kernel is robust with respect to outliers in the case of the datasets used in this paper. The results for the five-channel systems suggest that the outlier removal used here is beneficial only when the classifier is used with a first-degree polynomial kernel. The Gaussian kernel was capable of extracting the essential information from the training data of these five-channel systems without outlier processing. On the other hand, the use of the Mahalanobis kernel did not lead to any decrease in performance when compared with the Gaussian kernel.

It can be assumed that some of the so-called outliers in the data are due to, for instance, varying measurement conditions (e.g., variable sensor–object–sun geometry) or measurement errors. On the other hand, the set of outliers also includes samples due to natural spectral variation within and between species. In practice, it might be difficult to remove the disrupting data variation automatically in an optimal way so that no essential information is removed from the training set. The results obtained here for the five-channel systems, for example, suggest that the removal of 10% of the training samples using the Mahalanobis distance was too extreme and decreased the classification accuracy. On the other hand, the results for this dataset suggest that the Mahalanobis kernel handled the outliers more efficiently and automatically, without the need to set any threshold value.

It should be noted that different cross-validation routines were used for training the SVM with the Mahalanobis kernel than for the Gaussian kernel. The training method based on two line searches has an effect on the performance of the Mahalanobis kernel, but the difference to the grid search was found to be small for our data. It is also noted that, when the Mahalanobis kernel is used, the inverse covariance matrix includes the effect of the training samples from all three classes (total covariance matrix). It was verified for these data that the set of outliers detected using the Mahalanobis distance for every class separately is similar to the set of outliers calculated using the inverse of the total covariance matrix. This similarity gives some explanation for the similar results of the Mahalanobis and Gaussian kernels with outlier removal.
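To make the role of the inverse covariance matrix concrete, a Mahalanobis-type kernel can be sketched as a Gaussian kernel whose squared Euclidean distance is replaced by a quadratic form. The exact scaling used in this paper is defined in an earlier section and is not reproduced here; the width parameter `delta` and the example matrices below are assumptions for illustration.

```python
import math


def mahalanobis_kernel(x, y, inv_cov, delta=1.0):
    """exp(-delta * (x - y)^T inv_cov (x - y)) for vectors given as lists."""
    diff = [a - b for a, b in zip(x, y)]
    quad = sum(diff[i] * inv_cov[i][j] * diff[j]
               for i in range(len(diff)) for j in range(len(diff)))
    return math.exp(-delta * quad)


# With the identity matrix the kernel reduces to an ordinary Gaussian kernel.
identity = [[1.0, 0.0], [0.0, 1.0]]
# Inverse covariance when channel 0 has variance 4 and channel 1 has variance 1.
stretched = [[0.25, 0.0], [0.0, 1.0]]

k_gauss = mahalanobis_kernel([0.0, 0.0], [2.0, 0.0], identity, delta=0.5)
k_mahal = mahalanobis_kernel([0.0, 0.0], [2.0, 0.0], stretched, delta=0.5)
```

A displacement along a high-variance channel is down-weighted by the inverse covariance, so the two samples appear more similar under the Mahalanobis-type kernel than under the Gaussian kernel.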

VII. CONCLUSION

A significant amount of redundancy exists in spectral radiance from natural objects, and intelligent signal measurement (or compression) is appropriate. This was achieved here using a four-channel system based on a real Leica ADS40/ADS80 sensor system. A simulated optical radiation model was used to evaluate the tree species classification performance of the given sensor system using the SVM classifier with three different kernel functions. The effects of an additional fifth spectral band and data preprocessing were studied using this simulator.

We have employed a model in the simulations which includes the following properties for the sensed signal.

1) The reflectance spectra (as measured at ground level) are assumed to correspond to the signal sensed from the geometry of the airborne camera.

2) A pure pixel assumption [3]. Depending on the distance between the camera and the surface (varying flight altitude), the reflectance distribution for practical measurements is a mixture of different signatures present in the scene.

3) The incident light at the sensor also has some effect on account of an indirect scattered component, which has been ignored in this paper.

4) An unknown measurement noise component is included in the measured reflectance distributions and is propagated to the simulated responses via the quadrature model in (9).

5) The spectral response functions of the simulated camera were an idealization, and in reality, they will be influenced by the properties of the lens, beam splitter, and interference filters and by the sensitivity of the charge-coupled device system [2].
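The way a sampled spectrum is turned into a simulated channel response can be sketched with a simple rectangle-rule quadrature. This is only an illustration under stated assumptions: the quadrature model in (9) is not reproduced in this section, and the box-shaped response functions, the 5-nm sampling grid, the flat test radiance, and all function names below are hypothetical.

```python
def box_response(lo_nm, hi_nm):
    """Idealized response function: 1 inside [lo_nm, hi_nm], 0 elsewhere."""
    def response(wavelength_nm):
        return 1.0 if lo_nm <= wavelength_nm <= hi_nm else 0.0
    return response


def simulated_response(wavelengths_nm, radiance, response, step_nm):
    """Rectangle-rule quadrature of response(lambda) * radiance(lambda)."""
    return sum(response(w) * r for w, r in zip(wavelengths_nm, radiance)) * step_nm


# Assumed sampling: 400-1000 nm at 5-nm intervals, with a flat test radiance.
wavelengths = [400.0 + 5.0 * i for i in range(121)]
flat_radiance = [2.0] * len(wavelengths)

# The three candidate fifth bands discussed in the text.
bands = {
    "710-725": box_response(710.0, 725.0),
    "705-755": box_response(705.0, 755.0),
    "695-725": box_response(695.0, 725.0),
}
responses = {name: simulated_response(wavelengths, flat_radiance, resp, 5.0)
             for name, resp in bands.items()}
```

Because the quadrature is a weighted sum of the measured samples, any noise in the measured spectrum carries over linearly into the simulated response, which is the propagation noted in item 4).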

In a previous study, it was shown that it is possible to classify reflectance spectra accurately in a 3-D subspace using spectral response functions from linear subspace classifiers [13]. The modeling presented in this paper is more closely related to practical band construction since it allows us to interpret the subspace-mapped data directly as simulated measurements from a multispectral camera. Real construction of spectral response functions derived from the optimal subspace classifier is impossible due to the wildly oscillating behavior of these functions (see [13]), and it may also be unrealistic to use only certain very narrow wavelength bands in the measurement system.

Classification performance nevertheless degrades significantly from the results obtained with high-dimensional measurements when the camera DNs corresponding to the fixed four-channel Leica system are used as input vectors for the SVM classifier. The results indicate a need for a higher number of bands, a decrease in the bandwidths, or new positioning of the bands in order to improve the classification accuracy. Of the three extensions of the Leica system to a five-channel system evaluated here, band selection based on the use of the interval of 710–725 nm showed promising results, with an average misclassification ratio of 15%.

It was also assumed that the classification performance in low-dimensional multispectral spaces was decreased due to outliers in the class samples. It was shown for the four-channel Leica system that the use of the Mahalanobis kernel or outlier removal increased the accuracy of the SVM classifier. In addition, the results suggest that the Mahalanobis kernel performed the outlier processing automatically without any user intervention and also provided a significant speedup in the training phase of the classifier.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their advice and suggestions concerning this paper.

REFERENCES

[1] S. Abe, "Training of support vector machines with Mahalanobis kernel," in Proc. ICANN, 2005, pp. 571–576.

[2] U. Beisl, "Absolute spectroradiometric calibration of the ADS40 sensor," in Proc. Congrès ISPRS Commission Technique I. Symp., Marne-la-Vallée, France, 2006, pp. 14–18.

[3] C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York: Kluwer, 2003.

[4] G. Camps-Valls and L. Bruzzone, "Kernel-based methods for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, Jun. 2005.

[5] T. H. Demetriades-Shah, M. D. Steven, and J. A. Clark, "High-resolution derivatives spectra in remote sensing," Remote Sens. Environ., vol. 33, no. 1, pp. 55–64, Jul. 1990.

[6] T. G. Dietterich, "Approximate statistical tests for comparing supervised classification learning algorithms," Neural Comput., vol. 10, no. 7, pp. 1895–1923, Oct. 1998.

[7] G. M. Foody and A. Mathur, "A relative evaluation of multiclass image classification by support vector machines," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 6, pp. 1335–1343, Jun. 2004.

[8] G. M. Foody, "Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy," Photogramm. Eng. Remote Sens., vol. 70, no. 5, pp. 627–633, 2004.

[9] F. A. Gougeon, D. A. Leckie, D. Paradine, and I. Scott, "Individual tree crown species recognition: The Nahmint study," in Proc. Int. Forum Autom. Interpretation High Spatial Resolution Digital Imagery Forestry, D. A. Hill and D. G. Leckie, Eds., Victoria, BC, Canada, 1998, pp. 209–223.

[10] A. Haara and M. Haarala, "Tree species classification using semi-automatic delineation of trees on aerial images," Scand. J. For. Res., vol. 17, no. 6, pp. 556–565, Nov. 2002.

[11] C. Huang, L. S. Davis, and J. R. G. Townshend, "An assessment of support vector machines for land cover classification," Int. J. Remote Sens., vol. 23, no. 4, pp. 725–749, Feb. 2002.

[12] R. D. Jackson, S. M. Moran, P. N. Slater, and S. F. Biggar, "Field calibration of reference reflectance panels," Remote Sens. Environ., vol. 22, no. 1, pp. 145–158, Jun. 1987.

[13] T. Jääskeläinen, R. Silvennoinen, J. Hiltunen, and J. P. S. Parkkinen, "Classification of the reflectance spectra of pine, spruce, and birch," Appl. Opt., vol. 33, no. 2, pp. 2356–2362, Apr. 1994.

[14] I. Korpela, "Individual tree measurements by means of digital aerial photogrammetry," Silva Fennica Monographs, vol. 3, 2004.

[15] I. Korpela and T. Tokola, "Potential of aerial image-based monoscopic and multiview single-tree forest inventory: A simulation approach," For. Sci., vol. 52, no. 2, pp. 136–147, Apr. 2006.

[16] ADS80 Datasheet, Leica Geosystems AG, Heerbrugg, Switzerland, 2008. [Online]. Available: http://www.leica-geosystems.com

[17] G. Loosli, SimpleSVM: The Matlab Toolbox, 2004. [Online]. Available: http://gaelle.loosli.fr/research/tools/simplesvm.html

[18] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.

[19] G. Mercier and M. Lennon, "Support vector machines for hyperspectral image classification with spectral-based kernels," in Proc. IEEE IGARSS, 2003, pp. 288–290.

[20] E. Oja, Subspace Methods of Pattern Recognition. Hertfordshire, U.K.: Res. Studies Press, 1983.

[21] G. M. Phillips and P. J. Taylor, Theory and Applications of Numerical Analysis. New York: Academic, 1973.

[22] A. Pinz, "Tree isolation and species classification," in Proc. Int. Forum Autom. Interpretation High Spatial Resolution Digital Imagery Forestry, D. A. Hill and D. G. Leckie, Eds., Victoria, BC, Canada, 1998, pp. 127–139.

[23] R. A. Schowengerdt, Remote Sensing: Models and Methods for Image Processing, 3rd ed. Amsterdam, The Netherlands: Elsevier, 2007.

[24] B. Schölkopf and A. J. Smola, Learning With Kernels. Cambridge, MA: MIT Press, 2002.

[25] R. Säynäjoki, P. Packalén, M. Maltamo, M. Vehmas, and K. Eerikäinen, "Detection of aspens using high resolution aerial laser scanning data and digital aerial images," Sensors, vol. 8, no. 8, pp. 5037–5054, 2008.

[26] F. Tsai and W. D. Philpot, "Derivative-aided hyperspectral image analysis system for land-cover classification," IEEE Trans. Geosci. Remote Sens., vol. 40, no. 2, pp. 416–425, Feb. 2002.

Ville Heikkinen received the M.Sc. degree in applied mathematics from the University of Joensuu, Joensuu, Finland, in 2004.

He is currently with the Department of Computer Science and Statistics, University of Joensuu. He has worked on method development in spectral data analysis and classification.

Timo Tokola received the D.Sc. degree in forestry from the University of Joensuu, Joensuu, Finland.

He has over 20 years of professional experience. He is currently a Professor of forest information technology with the Faculty of Forest Sciences, University of Joensuu. He previously worked mainly in the fields of natural resource inventory, geographical information systems (GIS), information system planning, and forest management planning. He has been working in various projects as a Coordinator. His private and public sector assignments include modern forest management including aerial photography, photogrammetry, satellite remote sensing, terrestrial and airborne laser scanning, GPS-based mapping, GIS database design, analysis of GIS data, and implementation of desktop GIS systems. He has published over 50 scientific refereed papers on database design, GIS, forest resource inventory, and remote sensing. His main interests include developing methods for using remote sensing in natural resource inventory and computer applications for supporting regional decision making.

Jussi Parkkinen received the M.Sc. degree in medical physics and the Ph.D. degree in mathematics from the University of Kuopio, Kuopio, Finland, in 1982 and 1989, respectively.

In 1989–1990, he was a Visiting Researcher with The University of Iowa, Iowa City. In 1990, he was a Visiting Professor with the University of Saskatchewan, Saskatoon, SK, Canada. In 1991–1992, he was a Professor and the Head of the Department of Computer Science, University of Kuopio. In 1992–1998, he was a Professor of information processing, and in 1995–1998, he was the Dean of the Department of Information Technology, Lappeenranta University of Technology, Lappeenranta, Finland. Since 1999, he has been a Professor of computer science, and since 2007, he has been the Vice Rector responsible for research with the University of Joensuu, Joensuu, Finland. He specializes in spectral color image analysis and pattern recognition. Since 2007, he has been a Visiting Professor with Chiba University, Chiba, Japan.

Dr. Parkkinen was the Chairman of the Finnish Pattern Recognition Society in 1995–1999. He is a fellow and a member of the governing board of the International Association of Pattern Recognition. He was the Chairman of the CIE TC8-07 technical committee on multispectral imaging in 2004–2008.

Ilkka Korpela received the Ph.D. degree in forestry from the University of Helsinki, Helsinki, Finland, in 2004.

He is currently a Researcher with the Department of Forest Resource Management, University of Helsinki. He has worked on method development in 3-D measurement and classification of forest vegetation using terrestrial and airborne image and light detection and ranging (LiDAR) data.

Timo Jääskeläinen was born in Luumaaki, Finland, in 1953. He received the Ph.D. degree in physics from the University of Joensuu, Joensuu, Finland, in 1981.

From 1981 to 1991, he was a Chief Assistant and Acting Associate Professor with the University of Kuopio, Kuopio, Finland. From 1987 to 1989, he was a Visiting Research Scientist with Saitama University, Saitama, Japan. In 1991, he was an Associate Professor with the Department of Physics, University of Joensuu. Since 1994, he has been a Professor with the same department. He is also the Head of the department. He has published approximately 100 articles. His research interests include optical materials research, optical metrology, color research, and information optics.

Prof. Jääskeläinen is a member of the Optical Society of America.
