A Study of Automatic Medical Image Segmentation using
Independent Component Analysis
Soohyun Bae
Interdisciplinary Program in Biomedical Engineering
(Electrical and Electronic Engineering)
The Graduate School, Yonsei University
Supervisor: Professor Sun Kook Yoo (유선국)
Submitted as a doctoral dissertation
June 2001
This certifies that the doctoral dissertation of Soohyun Bae is approved.
(Signature lines for five committee members)
The Graduate School, Yonsei University
June 2001
Dedicated to My Parents
I dedicate this dissertation to my parents, who always prayed for me.
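Baengnokdam (白鹿潭): A Sketch of Mt. Halla, by Jeong Ji-yong
1
The nearer the summit, the more the cuckoo-flowers are worn down in height. Climb one ridge and their waists vanish; on the next ridge their necks are gone, and at last only their faces peep out. They are stamped in place like a floral pattern. Where the chill of the wind rivals the far end of Hamgyeong Province, the cuckoo-flowers lose their height altogether, and yet through the August season they bloom in profusion like scattered stars. When the mountain shadows darken, stars light up in the cuckoo-flower fields all the same. The stars shift where they stand. Here I was exhausted.
2
I moistened my throat with crowberries (巖古蘭), fruit as pretty as pills, and rose up alive.
3
Beside the white birch I live on until the birch becomes a skull. It is not unseemly that when I die I should be white like the birch.
4
At a bend where even ghosts, too lonely, do not live, the goblin flower turns pale with fright, alone even in daylight.
5
Here, six thousand cheok above sea level, horses and cattle live paying little mind to people. Horses with horses, cattle with cattle; a foal follows a mother cow, a calf follows a mother mare, and soon they part.
6
A cow suffered greatly giving birth to her first calf. In her confusion she fled a hundred ri down the mountain road to Seogwipo. The calf, bereft of its mother before it was even dry, cried moo, moo. It clung wildly to horses and to hikers alike. I wept to think that our own young, too, would be left to a mother of a different coat.
7
The fragrance of wind orchids, the sound of orioles calling to each other, the whistling of the Jeju bush warbler, the sound of water rolling apart over stones, the swish of pines when the sea crumples far away; among ash, camellia and oak I lost my way, then came out again on a winding path of white stones where arrowroot vines crept. A dappled horse I suddenly met does not step aside.
8
Royal fern, bracken, deodeok shoots, bellflower, aster greens, umbrella greens, bamboo grass, rock tripe, alpine plants hung with star-like bells: I dwell on them, grow drunk on them, and sleep. The procession formed upon the mountain range, longing for the clear water of Baengnokdam, is more majestic than the clouds. Pelted by showers and dried by rainbows, my flesh swells, flower juice stained onto my backside.
9
In the blue water of Baengnokdam, where not even a crayfish crawls, the sky turns. A cow passed around my legs, weary almost to the point of crippling. Even a wisp of driven cloud dims Baengnokdam. Baengnokdam, folded upon my face for half a day, is lonely. Waking and dozing, I had forgotten even to pray.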
CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
1. INTRODUCTION
2. BASIC CONCEPTS OF INDEPENDENT COMPONENT ANALYSIS
2.1 Principal Component Analysis
2.2 Independent Component Analysis
2.3 Overcomplete Representations of Independent Component Analysis
2.4 Medical Image Segmentation Techniques
3. MEDICAL IMAGE SEGMENTATION USING INDEPENDENT COMPONENT ANALYSIS
3.1 Acquiring CT Image
3.2 Acquiring MR Image
3.3 Methodology
4. RESULTS & ANALYSIS
4.1 Simple Test Example
4.2 Evaluations of Automatic Segmentation with Test Data
4.3 Automatic Segmentation with Medical Images
4.4 Evaluations of Medical Image Segmentation
5. CONCLUSION
REFERENCES
ABSTRACT (In Korean)
LIST OF FIGURES
2.1 Example 2-D data distribution and the corresponding principal component and independent component axes [28].
2.2 A simple example of principal component analysis
2.3 The instantaneous mixing and unmixing model
2.4 Optimal information flow in sigmoidal neurons
2.5 Illustration of basis vectors in a two-dimensional data space with two sparse sources (top) or three sparse sources (bottom)
2.6 A hypothetical frequency distribution f(I) of intensity values I(x, y) for fat, muscle and bone in a CT image. Low intensity values correspond to fat tissues, whereas high intensity values correspond to bone
3.1 A simplified x-ray beam I_0 attenuated through a pixel and the resulting attenuated beam I. A two-dimensional matrix of linear attenuation coefficients of the image.
3.2 Selected image data used in this experiment.
3.3 Flow chart of medical image segmentation using independent component analysis
3.4 Examples of three levels of kurtosis. Each of the distributions has the same variance
4.1 Test example of independent component analysis
4.2 The simple test image consisting of two cylinders and one ellipse
4.3 The result of automatic segmentation
4.4 The selected original data used in evaluation
4.5 Small squares extracted from mixed test data
4.6 Large squares extracted from mixed test data
4.7 Plot of PE and UMA values for small squares
4.8 Plot of PE and UMA values for large squares
4.9 The kurtosis graph of the input data set
4.10 The input image and the segmentation result
4.11 The segmentation result using threshold
4.12 The segmentation result of CT image
4.13 Result of 16 selected axial CT image segmentation
4.14 Volume rendered image using the result of Figure 4.6
4.15 The result of manual segmentation
4.16 The sensitivity (True Positive Rate) comparison between the independent component analysis method and the thresholding method
4.17 The result of "False Positive Rate" comparison
4.18 The result of mislabelling rate comparison
LIST OF TABLES
4.1 Probability of Error and Ultimate Measurement Accuracy values of independent components.
4.2 The kurtosis of the input data set
4.3 The result of the paired t-test on sensitivity at the 0.05 significance level
4.4 The result of the paired t-test on "False Positive Rate" at the 0.05 significance level
4.5 The result of the paired t-test on mislabelling rate at the 0.05 significance level
ABSTRACT
A Study of Automatic Medical Image Segmentation
using Independent Component Analysis
Soohyun Bae
Dept. of Biomedical Engineering
The Graduate School
Yonsei University
Medical image segmentation is a process that partitions original medical images into homogeneous regions that are meaningful for computer-aided diagnosis and visualization. This dissertation proposes an automatic medical image segmentation technique based on independent component analysis. Independent component analysis is a method for solving the blind source separation problem. It finds a linear coordinate system such that the resulting signals are as statistically independent as possible.
Using independent component analysis as an automatic medical image segmentation technique allows for more accurate segmentation of medical images, under the assumption that medical images consist of statistically independent parts.
The proposed method is applied to CT images and to numerically synthesized test data to demonstrate the performance of automatic segmentation. The performance measures chosen in this dissertation are the Probability of Error (PE) and the Ultimate Measurement Accuracy (UMA). The result of automatic segmentation is also compared with a general threshold-based segmentation method in terms of sensitivity (True Positive Rate, TPR), specificity (1 - False Positive Rate, FPR) and mislabelling rate, and a paired t-test is applied to the evaluation results. For the test data, most of the PE and UMA values are close to zero, the TPRs are over 95 percent, the FPRs are 1 percent, and the mislabelling rate is near 1 percent, which indicates that the automatic method segments the data well. Paired t-tests on TPR, FPR and mislabelling rate at the 0.05 significance level give p-values much lower than the significance level, which indicates that the automatic method proposed in this dissertation performs better than the general method.
Key words: Segmentation, Independent Component Analysis, Medical Image, Statistically independent component
CHAPTER 1
INTRODUCTION
The advantages of medical imaging modalities such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) over other diagnostic imaging modalities are their high spatial resolution and excellent discrimination of soft tissues, bones and other internal organs. These kinds of medical images provide rich information about anatomical structure, enabling quantitative pathological or clinical studies, the derivation of computerized anatomical atlases, and pre- and intra-operative guidance for therapeutic intervention.
Advanced applications that use the morphologic contents of medical images frequently require segmentation of the imaged volume into organ types. This problem has received considerable attention; the comprehensive survey article by Bezdek et al. [1] lists 90 citations. Segmentation of medical images is a prerequisite for a variety of image analysis and visualization tasks. It is often performed by commercial software using existing algorithms such as region growing, thresholding, boundary detection, and morphological filtering [2]-[7]. However, incomplete segmentation frequently occurs because of several difficulties. First, partial volume artifacts due to the small gap between voxels compromise the resolution of medical images, which leads to ambiguous boundaries and mixed voxels. Second, structures adjacent to the region of interest often have similar intensity distributions. As a result, the region of interest cannot be easily separated from other interconnected structures. Hence, localization of the features from a large number of sectional images represents a major challenge.
Manual modification is the most common way of completing a segmentation when automatic segmentation is incomplete [3]-[6][8]. It is a time-consuming and tedious task because extensive user interaction is needed to modify the incorrectly segmented border on each sectional image. It also requires experienced users to carefully define the features on medical images because of their complexity. Nevertheless, the resulting rendered surfaces still appear uneven, and even the results produced by experienced users are not consistent.
Many segmentation approaches exist to eliminate the labor-intensive work of the manual method. One of them is the deformable active contour model, which is also referred to as the snake model [8]-[13]. It has been successfully applied to a variety of image segmentation tasks, but it still requires user interaction to locate the initial contour and adjust the internal parameters of the model. Another approach is knowledge-based automatic segmentation tuned specifically to a particular human organ [14]. However, it is generally thought to be extremely difficult to formalize complex knowledge of the human anatomy. A promising alternative is interactive segmentation, which lets users guide the segmentation process based on 3D visual feedback [15][16]. Instead of relying on poorly represented knowledge, it utilizes an expert's judgement. Although the feasibility and utility of the interactive segmentation approach have been demonstrated in CT and MR image segmentation, the obstacles to wide application are the much-increased computational power required for real-time 3D visualization and the need for appropriate operators to resolve mixed structures.
In this dissertation, I propose a new automatic approach to segment the features from medical images using Independent Component Analysis (ICA), also known as Blind Source Separation (BSS). Independent component analysis decomposes each signal of an ensemble into components (also called 'basis vectors') that are as independent as possible by a linear transformation of the signals [17]-[19]. The amplitude of a particular component is extracted by a corresponding weight vector (also called a 'filter' [17]). Using these characteristics of independent component analysis, I propose a new automatic algorithm to segment medical images.
This thesis is organized as follows. Chapter 2 presents the basic concepts of independent component analysis, principal component analysis and some evaluation methods. Chapter 3 describes how medical images are acquired, the assumptions made in this thesis, and the methodology. Chapter 4 shows the results of my experiments and their evaluation. Chapter 5 gives the conclusion and future work.
CHAPTER 2
BASIC CONCEPTS OF INDEPENDENT COMPONENT ANALYSIS
The goal of blind signal separation is to recover independent sources given only sensor observations that are linear mixtures of independent source signals [26]. The term blind indicates that both the source signals and the way the signals were mixed are unknown. Independent component analysis is a method which can be used to solve the blind signal separation problem. It finds weight vectors that make the resulting signals as statistically independent from each other as possible. Other algorithms, such as Principal Component Analysis (PCA), have also been used for source separation. Principal component analysis is a correlation-based transformation. It is perhaps the oldest and best-known technique in multivariate analysis [32][51]. It is very similar to the SVD (Singular Value Decomposition) algorithm, which separates the second-order dependencies. In contrast to correlation-based transformations, independent component analysis not only decorrelates the signals (second-order statistics) but also reduces higher-order statistical dependencies [26]. Independent component analysis is a generalization of principal component analysis that separates the high-order dependencies in the input, in addition to the second-order dependencies [27]. Principal component analysis is a way of encoding second-order dependencies in the data by rotating the axes to correspond to directions of maximum covariance. Consider a set of data points derived from two underlying distributions, as shown in Figure 2.1.
Figure 2.1 Example 2-D data distribution and the corresponding principal component and independent component axes [28].
Principal component analysis models the data as a multivariate Gaussian and would place an orthogonal set of axes such that the projections of the two distributions would be completely overlapping. Independent component analysis does not constrain the axes to be orthogonal, and attempts to place them in the directions of statistical dependencies in the data. Each weight vector in independent component analysis attempts to separate a portion of the dependencies in the input, so that the dependencies between the elements of the output are removed. The projections of the two distributions onto the independent component analysis axes would have less overlap, and the output distributions of the two weight vectors would be highly kurtotic [29]; that is, a large concentration of values lies near zero, with rare occurrences of large positive or negative values in the tails. This is equivalent to the minimum entropy codes discussed by Barlow [24].
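The notion of a highly kurtotic output can be made concrete with a small numerical sketch. The snippet below is only an illustration and is not taken from the experiments in this dissertation; the helper name `excess_kurtosis`, the sample size and the choice of test distributions are assumptions.

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis: E[(v - mean)^4] / var^2 - 3 (zero for a Gaussian)."""
    v = np.asarray(v, dtype=float)
    c = v - v.mean()
    return np.mean(c ** 4) / np.mean(c ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
n = 200_000                               # assumed sample size
gauss = rng.normal(size=n)                # reference distribution
laplace = rng.laplace(size=n)             # peaked at zero with heavy tails
uniform = rng.uniform(-1.0, 1.0, size=n)  # flat, no tails

k_gauss = excess_kurtosis(gauss)
k_laplace = excess_kurtosis(laplace)
k_uniform = excess_kurtosis(uniform)
print(k_gauss, k_laplace, k_uniform)
```

The Laplacian sample, concentrated near zero with rare large values, gives a clearly positive excess kurtosis, the Gaussian sample gives a value near zero, and the uniform sample gives a negative value.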
Bell and Sejnowski [19] developed an algorithm for separating the statistically independent sources of a mixed signal through unsupervised learning. The algorithm is based on the principle of maximum information transfer through sigmoidal functions. It generalizes Linsker's information maximization principle [30] to the multi-unit case and maximizes the joint entropy of the output units. Another way of describing the difference between principal component analysis and independent component analysis is therefore that principal component analysis maximizes the joint variance (covariance) of the outputs, whereas independent component analysis maximizes the joint entropy of the outputs.
2.1 Principal Component Analysis
Principal component analysis [31][32] is a well-established technique for dimensionality reduction, that is, representing a set of m-dimensional vectors with n < m components for each vector so that as little information as possible is lost. It was first introduced by Pearson (1901), who used it in a biological context to recast linear regression analysis into a new form. It was then developed by Hotelling (1933) in work done on psychometry. It appeared once again, and quite independently, in the setting of probability theory, as considered by Karhunen (1947), and was subsequently generalized by Loève (1963) [31]. Examples of its many applications include data compression, image processing, visualization, exploratory data analysis, pattern recognition and time series prediction. In terms of linear algebra, the problem of dimensionality reduction consists of finding a new basis for the data so that if we drop or zero some of the components in the new basis, the reconstruction error is as small as possible. Consider the simple example of Figure 2.2. In Figure 2.2(a) we have a two-dimensional data set in which there is a significant correlation (linear dependence) between the components. If we project the data onto the subspace indicated by the solid line in the figure, that is, represent the data with just one number, the value of the projection onto the subspace, we get the data set of Figure 2.2(b).
Figure 2.2 A simple example of principal component analysis. (a) A data set and its principal axes. (b) Reduction of the data to one dimension.
The most common derivation of principal component analysis is in terms of a standardized linear projection which maximizes the variance in the projected space [33]. For a set of observed $d$-dimensional data vectors $\{t_n\}$, $n \in \{1, \ldots, N\}$, the $q$ principal axes $w_j$, $j \in \{1, \ldots, q\}$, are those orthonormal axes onto which the retained variance under projection is maximal. It can be shown that the vectors $w_j$ are given by the $q$ dominant eigenvectors (i.e. those with the largest associated eigenvalues $\lambda_j$) of the sample covariance matrix
$$ S = \frac{1}{N} \sum_n (t_n - \bar{t})(t_n - \bar{t})^T, $$
where $\bar{t}$ is the data sample mean, such that $S w_j = \lambda_j w_j$. The $q$ principal components of the observed vector $t_n$ are given by the vector $x_n = W^T (t_n - \bar{t})$, where $W = (w_1, w_2, \ldots, w_q)$. The variables $x_j$ are then uncorrelated, such that the covariance matrix $\frac{1}{N} \sum_n x_n x_n^T$ is diagonal with elements $\lambda_j$.
A complementary property of principal component analysis, and that most closely related to the original discussions of Pearson, is that, of all orthogonal linear projections $x_n = W^T (t_n - \bar{t})$, the principal component projection minimizes the squared reconstruction error $\sum_n \| t_n - \hat{t}_n \|^2$, where the optimal linear reconstruction of $t_n$ is given by $\hat{t}_n = W x_n + \bar{t}$.
However, a notable feature of these definitions of principal component analysis (and one remarked upon in many texts) is the absence of an associated probabilistic model for the observed data [33].
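The eigenvector construction and the two properties stated above (diagonal covariance of the principal components, minimal reconstruction error) can be checked with a minimal NumPy sketch. The synthetic two-dimensional data set and the variable names (`T`, `t_bar`, `T_hat`) are assumptions made for illustration; they are not the data of Figure 2.2.

```python
import numpy as np

rng = np.random.default_rng(0)
N, q = 500, 1                      # number of samples, number of retained axes

# Synthetic correlated 2-D data (illustrative only).
z = rng.normal(size=N)
T = np.column_stack([z, 0.5 * z + 0.1 * rng.normal(size=N)]) + np.array([3.0, -1.0])

t_bar = T.mean(axis=0)                         # sample mean
S = (T - t_bar).T @ (T - t_bar) / N            # sample covariance matrix S
lam, vecs = np.linalg.eigh(S)                  # eigenvalues in ascending order
order = np.argsort(lam)[::-1]                  # sort descending
lam, vecs = lam[order], vecs[:, order]
W = vecs[:, :q]                                # q dominant eigenvectors

X = (T - t_bar) @ W                            # rows are x_n = W^T (t_n - t_bar)
T_hat = X @ W.T + t_bar                        # reconstruction W x_n + t_bar
err = np.sum((T - T_hat) ** 2)                 # squared reconstruction error

print("covariance of components:", X.T @ X / N, "lambda:", lam[:q])
print("reconstruction error:", err, "N * discarded eigenvalues:", N * lam[q:].sum())
```

In this sketch the covariance of the retained component equals the largest eigenvalue, and the reconstruction error equals N times the sum of the discarded eigenvalues, which is why keeping the dominant eigenvectors minimizes it.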
2.2 Independent Component Analysis
Two events $A$ and $B$ are called independent if
$$ P(A \cap B) = P(A) P(B) \tag{2.1} $$
Noting that the conditional probability $P(B \mid A)$ is given by
$$ P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \tag{2.2} $$
we can see that independence implies $P(B \mid A) = P(B)$, if $P(A) \neq 0$.
Assume that we have an input signal composed of mixed source signals, and that both the mixing process and any prior information about the input signal are unknown. If the probabilities of the output signals can be made to satisfy Eq. (2.1), the output signals are independent components of the input signal. Making the output signals as independent as possible is the goal of independent component analysis, and it can be accomplished by maximizing the mutual information between the input and the output signals. The mutual information between $X$ and $Y$ is the sum of the marginal entropies minus the joint entropy. This is defined as
$$ I(Y, X) = H(X) + H(Y) - H(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) \tag{2.3} $$
where $X$ is the input, $Y$ is the output and $H(Y)$ is the entropy of the output, while $H(Y \mid X)$ is whatever entropy the output has which did not come from the input [19]. Entropy is defined as
$$ H(X) = \sum_{x \in X} P(x) \log \frac{1}{P(x)}, \tag{2.4} $$
where the ensemble $X$ is a random variable $x$ with a set of possible outcomes. For $P(x) = 0$, the corresponding term is zero by definition. $H(X)$ is always greater than or equal to zero. $H(X, Y)$ is the joint entropy of the two variables, and the mutual information $I(Y, X)$ can be interpreted as the redundancy between $X$ and $Y$ or, alternatively, as the reduction in uncertainty of one variable (e.g. $X$) due to the observation of the other variable $Y$. Independent component analysis attempts to maximize the joint entropy of suitably transformed component maps using a neural network, and in so doing reduces the redundancy between the distributions of map values for different components. This, in effect, results in blind separation of the mixed signal into spatially independent components.
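As a small worked example of Eq. (2.3) and Eq. (2.4), the sketch below computes entropies and mutual information from a discrete joint probability table. The two 2 x 2 tables and the function names are hypothetical and chosen only to contrast an independent pair with a fully dependent one; logarithms are taken to base 2, so the values are in bits.

```python
import numpy as np

def entropy(p):
    """H = sum p * log2(1/p), with terms for p = 0 taken as zero (Eq. 2.4)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

def mutual_information(joint):
    """I(Y, X) = H(X) + H(Y) - H(X, Y) for a joint table P(x, y) (Eq. 2.3)."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return entropy(p_x) + entropy(p_y) - entropy(joint)

# Independent pair: P(x, y) = P(x) P(y), as in Eq. (2.1).
independent = np.outer([0.5, 0.5], [0.25, 0.75])
# Fully dependent pair: y is a copy of x.
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

For the independent table the mutual information vanishes, while for the copied variable it equals the full entropy of one variable (1 bit), which is the sense in which mutual information measures redundancy.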
The basic problem of independent component analysis is how to maximize the mutual information that the output $Y$ of a neural network processor contains about its input $X$. To solve the problem, we consider here only the gradient of information-theoretic quantities with respect to some parameter, $w$, in our network, and assume that $Y$ is a function of $w$:
$$ \frac{\partial}{\partial w} I(Y, X) = \frac{\partial}{\partial w} H(Y) \tag{2.5} $$
because $H(Y \mid X)$ does not depend on $w$. This can be seen by considering a system which avoids infinities:
$$ Y = G(X) + N, \tag{2.6} $$
where $G$ is a nonlinear squashing function and $N$ is additive noise on the outputs. In this case $H(Y \mid X) = H(N)$ [34]. Whatever the level of this additive noise, maximization of the mutual information, $I(Y, X)$, is equivalent to maximization of the output entropy, $H(Y)$, because $\partial H(N) / \partial w = 0$. Bell and Sejnowski's independent component analysis algorithm [19] is an unsupervised learning rule that was derived from the principle of optimal information transfer through sigmoidal neurons, based on Eq. (2.5) and Eq. (2.6). The model is described in Figure 2.3. Independent sources $s$ are mixed by $A$, and the observed signals are $x$. The goal is to learn $W$ that inverts the mixing $A$, and $u$ are the estimates of the recovered sources. The infomax approach is one way to find the unmixing system $W$. It requires a nonlinear transfer function $g(u)$.
Consider the case of a single input, $x$, and output, $y$, passed through a nonlinear squashing function, $g$:
$$ u = w x + w_0 \tag{2.7} $$
$$ y = g(u) = \frac{1}{1 + e^{-u}} \tag{2.8} $$
Figure 2.3 The instantaneous mixing and unmixing model. Independent sources $s$ are mixed by $A$. The observed signals are $x$. The goal is to learn $W$ that inverts the mixing $A$, and $u$ are the estimates of the recovered sources. The infomax approach is one way to find the unmixing system $W$. It requires a nonlinear transfer function $g(u)$.
As illustrated in Figure 2.4, the optimal weight $w$ on $x$ for maximizing information transfer is the one that best matches the probability density of $x$ to the slope of the nonlinearity. The optimal $w$ produces the flattest possible output density, which, in other words, maximizes the entropy of the output.
Figure 2.4 Optimal information flow in sigmoidal neurons. (a) Input $x$ having density function $f_x(x)$, in this case a Gaussian, is passed through a nonlinear function $g(x)$. The information in the resulting density, $f_y(y)$, depends on matching the mean and variance of $x$ to the threshold, $w_0$, and slope, $w$, of $g(x)$ (see Schraudolph et al. 1991). (b) $f_y(y)$ is plotted for different values of the weight $w$. The optimal weight, $w_{opt}$, transmits the most information [19].
The derivation of independent component analysis is as follows. When $g(x)$ is monotonically increasing or decreasing (i.e., has a unique inverse), the probability density function of the output, $f_y(y)$, can be written as a function of the probability density function of the input, $f_x(x)$ [50]:
$$ f_y(y) = \frac{f_x(x)}{\left| \partial y / \partial x \right|} \tag{2.9} $$
$$ H(y) = -E[\ln f_y(y)] = -\int_{-\infty}^{\infty} f_y(y) \ln f_y(y) \, dy \tag{2.10} $$
Substituting Eq. (2.9) into Eq. (2.10) gives Eq. (2.11).
$$ H(y) = E\left[ \ln \left| \frac{\partial y}{\partial x} \right| \right] - E[\ln f_x(x)] \tag{2.11} $$
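The second term on the right (the entropy of $x$) may be considered to be unaffected by alterations in a parameter $w$ determining $g(x)$. Therefore, in order to maximize the entropy of $y$ by changing $w$, we need only concentrate on maximizing the first term, which is the average log of how the input affects the output, as shown in Eq. (2.12):
$$ \Delta w \propto \frac{\partial H}{\partial w} = \frac{\partial}{\partial w} \left( \ln \left| \frac{\partial y}{\partial x} \right| \right) = \left( \frac{\partial y}{\partial x} \right)^{-1} \frac{\partial}{\partial w} \left( \frac{\partial y}{\partial x} \right) \tag{2.12} $$
For the logistic transfer function of Eq. (2.7) and Eq. (2.8), the terms in Eq. (2.12) are given by Eq. (2.13) and Eq. (2.14):
$$ \frac{\partial y}{\partial x} = w y (1 - y) \tag{2.13} $$
$$ \frac{\partial}{\partial w} \left( \frac{\partial y}{\partial x} \right) = y (1 - y) \left( 1 + w x (1 - 2y) \right) \tag{2.14} $$
Dividing Eq. (2.14) by Eq. (2.13) gives the learning rule for the logistic function, as calculated from the general rule of Eq. (2.12):
$$ \Delta w \propto \frac{1}{w} + x (1 - 2y) \tag{2.15} $$
$$ \Delta w_0 \propto 1 - 2y \tag{2.16} $$
The optimal weight is found by gradient ascent on the entropy of the output, $y$, with respect to $w$.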
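The single-unit learning rule of Eq. (2.15) and Eq. (2.16) can be tried out on synthetic data. The following is a minimal sketch, not the implementation used in this work: the Gaussian input, the initial weights, the step size `lr`, the number of iterations and the use of batch averages instead of per-sample updates are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=0.5, size=5000)   # synthetic Gaussian input (assumed)

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def mean_log_slope(w, w0):
    """E[ln|dy/dx|], the part of H(y) in Eq. (2.11) that depends on w and w0."""
    y = logistic(w * x + w0)
    return np.mean(np.log(np.abs(w) * y * (1.0 - y)))

w, w0, lr = 1.0, 0.0, 0.1        # initial weights and step size (assumed)
start = mean_log_slope(w, w0)
for _ in range(5000):
    y = logistic(w * x + w0)
    dw = 1.0 / w + np.mean(x * (1.0 - 2.0 * y))   # Eq. (2.15), averaged over the batch
    dw0 = np.mean(1.0 - 2.0 * y)                  # Eq. (2.16), averaged over the batch
    w, w0 = w + lr * dw, w0 + lr * dw0
end = mean_log_slope(w, w0)

threshold = -w0 / w              # input value at which y = 0.5
print(w, w0, threshold, start, end)
```

After training, the point where the sigmoid crosses 0.5 sits near the mean of the input and the slope has grown to match the spread of the input, which is the matching of density to nonlinearity illustrated in Figure 2.4.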
When there are multiple inputs and outputs, maximizing the joint entropy of the output encourages the individual outputs to move towards statistical independence. The derivation is very similar to that for a single input and a single output. When the form of the nonlinear transfer function $g$ is the same as the cumulative density functions of the underlying independent components (up to scaling and translation), it can be shown that maximizing the mutual information between the input $X$ and the output $Y$ also minimizes the mutual information between the $u_i$ [18][34]. Many natural signals, such as sound sources, have been shown to have a super-Gaussian distribution, meaning that the kurtosis of the probability distribution exceeds that of a Gaussian [19]. For mixtures of super-Gaussian signals, the logistic transfer function has been found to be sufficient to separate the signals [19].
The update rule for the weight matrix, $W$, for multiple inputs and outputs is given by
$$ \Delta W = -\varepsilon \left( \frac{\partial H(Y)}{\partial W} \right) W^T W = -\varepsilon \left( I + \hat{y} u^T \right) W \tag{2.17} $$
where $\hat{y}_i = \frac{\partial}{\partial y_i} \frac{\partial y_i}{\partial u_i} = \frac{\partial}{\partial u_i} \ln \frac{\partial y_i}{\partial u_i}$ and $\varepsilon$ is a learning rate. The learning rate is decided empirically and is typically near 0.001. The $W^T W$ term in Eq. (2.17), first proposed by Amari et al. [20], avoids matrix inversions and speeds convergence. During training, the learning rate is reduced gradually until the weight matrix $W$ stops changing appreciably (e.g. root mean square change for all elements $< 10^{-6}$). We employed the logistic transfer function, Eq. (2.8), giving $\hat{y}_i = 1 - 2 y_i$.
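The multi-unit rule of Eq. (2.17) with the logistic transfer function can be sketched on a toy mixture as follows. This is an illustration under stated assumptions rather than the procedure used in the experiments: the two Laplacian sources, the mixing matrix `A`, the batch-averaged update, the step size `eps` and the fixed iteration count are hypothetical. The step is written as W ← W + eps (I + ŷ uᵀ) W with eps > 0, on the assumption that the update is meant to ascend H(Y), as the gradient-ascent description of the single-unit case suggests.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
S = rng.laplace(size=(2, n))              # two super-Gaussian sources (assumed)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # hypothetical mixing matrix
X = A @ S                                 # observed mixtures

W = np.eye(2)                             # initial unmixing matrix
eps = 0.1                                 # batch step size (assumed)
for _ in range(1000):
    U = W @ X                             # current source estimates
    Y = 1.0 / (1.0 + np.exp(-U))          # logistic transfer function, Eq. (2.8)
    Y_hat = 1.0 - 2.0 * Y                 # y_hat_i = 1 - 2 y_i
    # Batch average of (I + y_hat u^T) W, applied as an ascent step.
    W = W + eps * (np.eye(2) + Y_hat @ U.T / n) @ W

P = W @ A                                 # ideally a scaled permutation matrix
P_rel = np.abs(P) / np.abs(P).max(axis=1, keepdims=True)
print(np.round(P_rel, 3))
```

Each row of `P_rel` should contain one entry equal to 1 and one small entry, meaning that each output recovers a single source up to scale and order.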
The algorithm includes a "sphering" step prior to learning [18]. The row means are subtracted from the dataset, $X$, and then $X$ is passed through the zero-phase whitening filter, $W_z$, which is twice the inverse square root of the covariance matrix:
$$ W_z = 2 \langle X X^T \rangle^{-1/2} \tag{2.18} $$
This removes both the first- and the second-order statistics of the data: the means and the covariances are set to zero and the variances are equalized. The full transform from the zero-mean input was calculated as the product of the sphering matrix and the learned matrix, $W_I = W W_z$. The pre-whitening filter in the independent component analysis algorithm has the Mexican-hat shape of retinal ganglion cell receptive fields, which remove much of the variability due to lighting [18].
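The sphering step of Eq. (2.18) can be written in a few lines. In the sketch below the test data and variable names are assumptions, and the inverse square root of the covariance matrix is computed through its eigendecomposition, which is one common way to obtain a symmetric (zero-phase) whitening matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
mix = np.array([[2.0, 1.0],
                [0.5, 1.5]])                        # hypothetical mixing
X = mix @ rng.laplace(size=(2, 10000)) + np.array([[5.0], [-3.0]])

Xc = X - X.mean(axis=1, keepdims=True)              # subtract the row means
C = Xc @ Xc.T / Xc.shape[1]                         # covariance <X X^T>
lam, E = np.linalg.eigh(C)                          # C = E diag(lam) E^T
Wz = 2.0 * E @ np.diag(lam ** -0.5) @ E.T           # W_z = 2 <X X^T>^(-1/2)

Z = Wz @ Xc                                         # sphered data
cov_Z = Z @ Z.T / Z.shape[1]
print(np.round(cov_Z, 6))
```

With the factor of two in Eq. (2.18), the sphered data have zero covariances and every variance equal to 4, so the covariance matrix of `Z` is four times the identity.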
The difference between independent component analysis and principal component analysis can be illustrated as follows. Consider a set of data points derived from two underlying distributions, as shown in Figure 2.1. Principal component analysis encodes second-order dependencies in the data by rotating the axes to correspond to directions of maximum covariance. It models the data as a multivariate Gaussian and would place an orthogonal set of axes such that the projections of the two distributions would be completely overlapping. Independent component analysis does not constrain the axes to be orthogonal, and attempts to place them in the directions of statistical dependencies in the input, so that the projections of the distributions onto the independent component analysis axes would have less overlap, and the output distributions of the two weight vectors would be kurtotic [29].
2.3 Overcomplete Representations of Independent Component Analysis
The goal of independent component analysis is to find a linear transformation $W$ of the dependent sensor signals $X$ that makes the outputs as independent as possible:
$$ U(t) = W X(t) = W A S(t) \tag{2.19} $$
where $U$ is an estimate of the sources. The sources are perfectly recovered when $W$ is the inverse of $A$ up to permutation and scale change:
$$ P = R S = W A \tag{2.20} $$
where $R$ is a permutation matrix and $S$ is the scaling matrix. The two matrices define the performance matrix $P$, so that if $P$ is normalized and reordered, a perfect separation leads to the identity matrix. For the linear mixing and unmixing model, the following assumptions are made [27][35][36]:
1. The number of sensors $N$ is greater than or equal to the number of sources $M$: $N \geq M$.
2. The sources $S(t)$ are at each time instant mutually independent.
3. At most one source is normally distributed.
4. No sensor noise, or only low additive noise, is permitted.
Assumption 1 is needed to make $A$ a full rank matrix. Assumption 2 is the basis of independent component analysis and can be expressed as follows:
$$ p(S(t)) = \prod_{i=1}^{M} p(S_i(t)) \tag{2.21} $$
Assumptions 3 and 4 are necessary to recover the sources using the infomax condition, in which the mutual information between outputs is minimized only in the low-noise case.
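The role of the performance matrix in Eq. (2.20) can be illustrated with a short sketch. Here an unmixing matrix is constructed by hand as a permuted and rescaled inverse of a mixing matrix, so the example only demonstrates the normalization and reordering step; the matrices and the helper `normalize_and_reorder` are hypothetical.

```python
import numpy as np

A = np.array([[1.0, 0.6, 0.2],
              [0.3, 1.0, 0.5],
              [0.4, 0.1, 1.0]])           # hypothetical mixing matrix
R = np.eye(3)[[2, 0, 1]]                  # permutation matrix
S = np.diag([2.0, -0.5, 3.0])             # scaling matrix
W = R @ S @ np.linalg.inv(A)              # inverse of A up to permutation and scale

P = W @ A                                 # performance matrix, equal to R S

def normalize_and_reorder(P):
    """Scale each row so its dominant entry is 1, then sort rows by that entry's column."""
    idx = np.argmax(np.abs(P), axis=1)
    P = P / P[np.arange(len(P)), idx][:, None]
    return P[np.argsort(idx)]

P_id = normalize_and_reorder(P)
print(np.round(P_id, 6))
```

For a perfect separation the normalized and reordered matrix is the identity; for a learned `W`, the size of the remaining off-diagonal entries indicates how much of each other source leaks into an output.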
Assumption 1 means that the standard formulation of independent component analysis requires at least as many sensors as sources. This is one of the major drawbacks of independent component analysis, but also one of its distinct features. Lewicki and Sejnowski have proposed a generalized independent component analysis method for learning overcomplete representations of the data, which allows for more basis vectors than dimensions in the input [28]. This technique overcomes assumptions 1 and 4 by assuming a linear mixing model with additive noise. The approach provides a natural solution to decomposition by finding the maximum a posteriori representation of the data. The prior distribution on the basis function coefficients removes the redundancy in the representation and leads to representations that are sparse and are a nonlinear function of the data.
The goal of overcomplete representations is illustrated in Figure 2.5. In a two-dimensional data space, the observations $X$ in Figure 2.5(a, b) were generated by a linear mixture of 2 independent random sparse sources. In this space, Figure 2.5(a) shows orthogonal basis vectors (principal component analysis) and Figure 2.5(b) shows independent basis vectors.
If the 2-dimensional observed data are generated by 3 sparse sources, as shown in Figure 2.5(c, d), the complete independent component analysis representation in Figure 2.5(c) cannot model the data adequately, but the overcomplete independent component analysis representation in Figure 2.5(d) finds 3 basis vectors that fit the underlying distribution of the data.
The observed $M$-dimensional data $x = [x_1, \ldots, x_M]^T$ may be modeled with a linear overcomplete mixing matrix, $A$ ($M \times N$), and additive noise:
$$ x = A s + n \tag{2.22} $$
where $s = [s_1, \ldots, s_N]^T$ are the sources and $n$ is assumed to be white Gaussian noise with variance $\sigma^2$, so that
$$ \log P(x \mid A, s) \propto -\frac{1}{2\sigma^2} (x - A s)^2 \tag{2.23} $$
Figure 2.5 Illustration of basis vectors in a two-dimensional data space with two sparse sources (top) or three sparse sources (bottom). (a) Principal component analysis finds orthogonal basis vectors and (b) the independent component analysis representation finds independent basis vectors. (c) Independent component analysis cannot model the data distribution adequately with three sources, but (d) the overcomplete independent component analysis representation finds 3 basis vectors that match the underlying data distribution [28].
It is also assumed that the sources $s_i$ are mutually independent, so that the joint probability distribution has the form of Eq. (2.21), and that each source $s_i$ has a sparse distribution, such as the Laplacian density:
$$ P(s_i) \propto \exp(-|s_i|) \tag{2.24} $$
Given the above model and assumptions, the goal is to infer both
the basis vectors A and the sources s from the mixtures x.
Because of the additive noise and the rectangular mixing matrix A, the
solution for s cannot be found by the pseudo-inverse $s = A^{+} x$. A
probabilistic approach to estimating the sources is based on finding the
maximum a posteriori value of s:
$$\hat{s} = \arg\max_{s} P(s \mid x, A) = \arg\max_{s} P(x \mid A, s)\, P(s) \tag{2.25}$$
Given the basis vectors A and an observation x, Eq. (2.25) can be
optimized by gradient ascent on the log posterior distribution.
In the case of zero noise and Gaussian P(s), maximizing Eq. (2.25)
is equivalent to $\min_s \|s\|_2$ subject to $x = As$. The solution can be
obtained with the pseudoinverse, $s = A^{+} x$, and is a linear function of
x. In the case of zero noise and Laplacian P(s), as in Eq.
(2.24), maximizing Eq. (2.25) is equivalent to $\min_s \|s\|_1$ subject to
$x = As$. Unlike the Gaussian prior, this solution cannot be obtained by
a simple linear operation.
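The difference between the two priors can be sketched numerically. The following is a minimal illustration, not the thesis implementation: it runs gradient ascent on the log posterior for a hypothetical 2×3 basis, using a smooth tanh surrogate for the Laplacian prior. The noise precision `lam`, the prior parameters `theta` and `beta`, the step-size rule, and the example basis are all assumptions chosen for the demonstration.

```python
import numpy as np

def log_posterior(s, x, A, lam, theta, beta):
    """Log posterior up to a constant: Gaussian noise term plus a
    smooth (log-cosh) surrogate of the Laplacian prior."""
    residual = x - A @ s
    log_cosh = np.logaddexp(beta * s, -beta * s) - np.log(2.0)
    return -0.5 * lam * residual @ residual - (theta / beta) * np.sum(log_cosh)

def map_sources(x, A, lam=100.0, theta=1.0, beta=50.0, n_iter=5000):
    """Gradient ascent on the log posterior, started at the pseudoinverse solution."""
    s = np.linalg.pinv(A) @ x
    # 1 / (upper bound on the curvature) keeps every step an ascent step.
    step = 1.0 / (lam * np.linalg.norm(A, 2) ** 2 + theta * beta)
    for _ in range(n_iter):
        grad = lam * A.T @ (x - A @ s) - theta * np.tanh(beta * s)
        s = s + step * grad
    return s

# Hypothetical overcomplete basis: 3 unit-length vectors in 2 dimensions.
A = np.array([[1.0, 0.0, np.sqrt(0.5)],
              [0.0, 1.0, np.sqrt(0.5)]])
s_true = np.array([0.0, 0.0, 1.0])      # only the third source is active
x = A @ s_true

s_pinv = np.linalg.pinv(A) @ x          # minimum L2-norm solution (Gaussian prior)
s_map = map_sources(x, A)               # approximate MAP under the sparse prior
print("pseudoinverse:", np.round(s_pinv, 3))
print("sparse MAP   :", np.round(s_map, 3))
```

The pseudoinverse spreads the activity over all three coefficients, whereas the sparse prior concentrates it on the basis vector that actually generated the observation.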
The objective for learning the basis vectors A is to maximize the
probability of the data, which requires marginalizing over all possible
sources:
$$P(x \mid A) = \int P(x \mid A, s)\, P(s)\, ds \tag{2.26}$$
For general overcomplete bases, this integral is intractable. For the
special case of zero noise and A invertible (a complete basis), the
integral in Eq. (2.26) is solvable and leads to the standard independent
component analysis learning algorithm [19][26][37]. Lewicki and
Sejnowski approximated Eq. (2.26) by fitting a multivariate Gaussian
around $\hat{s}$. The basis vectors were learned by performing gradient
ascent on the approximation to $\log P(x \mid A)$. A brief derivation of
the learning rule for the basis vectors follows [28]. For a set of
independent data vectors $x_{1:K} = x_1, \ldots, x_K$, expand the posterior
probability density by a saddle point approximation:
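$$\log P(x_{1:K} \mid A) = \log \prod_k P(x_k \mid A) \approx \frac{KL}{2} \log\frac{\lambda}{2\pi} + \frac{KM}{2} \log(2\pi) + \sum_{k=1}^{K} \left[ \log P(\hat{s}_k) - \frac{\lambda}{2} (x_k - A \hat{s}_k)^2 - \frac{1}{2} \log\det H(\hat{s}_k) \right] \tag{2.27}$$
where $\lambda = 1/\sigma^2$ and $H$ is the Hessian of the log posterior at $\hat{s}_k$.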
The derivative of Eq. (2.27) with respect to A is given by Eq. (2.28),
where $\nabla = \partial / \partial A$.
$$\nabla \log P(x \mid A) = \nabla \log P(\hat{s}) - \frac{\lambda}{2} \nabla (x - A\hat{s})^2 - \frac{1}{2} \nabla \log\det H \tag{2.28}$$
The first and third terms in Eq. (2.28) are given by Eq. (2.29) and (2.30).
The second term in Eq. (2.28) contributes no gradient component, so it
vanishes.
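$$\frac{\partial \log P(\hat{s})}{\partial A} = -W^T z\, \hat{s}^T \tag{2.29}$$
$$\frac{\partial \log\det H}{\partial A} = 2\lambda A H^{-1} - 2 W^T y\, \hat{s}^T \tag{2.30}$$
Gathering the above equations leads to
$$\Delta A \propto A A^T \frac{\partial}{\partial A} \log P(x \mid A) \approx -A \left( \varphi(\hat{s})\, \hat{s}^T + I \right) \tag{2.31}$$
where $\varphi(s_i) = \partial \log P(s_i) / \partial s_i$ is called the score function, and I is the
identity matrix. The prefactor $A A^T$ produces the natural gradient
extension [20], which speeds convergence. Note that A in Eq. (2.31) is
not restricted to be a square matrix.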
In the case of a Laplacian prior on $s_m$,
$$\log P(s) \equiv \sum_m \log P(s_m) = M \log\theta - \theta \sum_{m=1}^{M} |s_m| \tag{2.32}$$
Because log P(s) is piecewise linear in s, the curvature is zero, and
there is a discontinuity in the derivative at zero. To approximate the
volume contribution for P(s), we use
$$\frac{\partial}{\partial s_m} \log P(s) \approx -\theta \tanh(\beta s_m) \tag{2.33}$$
which is equivalent to using the approximation $P(s_m) \propto \cosh^{-\theta/\beta}(\beta s_m)$.
For large $\beta$ this approximates the true Laplacian prior while staying
smooth around zero. This leads to the following diagonal expression for
the second-derivative matrix:
$$\frac{\partial^2}{\partial s_m^2} \log P(s_m) = -\theta \beta\, \mathrm{sech}^2(\beta s_m) \tag{2.34}$$
2.4 Medical Image Segmentation Techniques
Since the advent of computed tomography, methods for automatically
detecting anatomical objects in CT and MR images have been active
areas of research in the medical imaging community. However, the
anatomical objects in medical images are complex, and in some cases
the anatomical scale of the objects is small relative to the field of
view of the images. As a result, the spatial contrast of the features of
interest is very poor and the features become very noisy and ambiguous.
This section reviews existing image segmentation methods and provides
the motivation for medical image segmentation using independent
component analysis.
If an image has been preprocessed appropriately to remove noise and
artifacts, segmentation is often the key step in interpreting the image.
Image segmentation is a process in which regions or features sharing
similar characteristics are identified and grouped together. Image
segmentation may use statistical classification, thresholding, edge
detection, region detection, or any combination of these techniques. The
output of the segmentation step is usually a set of classified elements,
such as tissue regions or tissue boundaries.
Most segmentation techniques are either region-based or edge-based.
Region-based techniques rely on common patterns in intensity values
within a cluster of neighboring pixels. The cluster is referred to as the
region, and the goal of the segmentation algorithm is to group regions
according to their anatomical or functional roles. Edge-based techniques
rely on discontinuities in image values between distinct regions, and the
goal of the segmentation algorithm is to accurately demarcate the
boundary separating these regions.
Thresholding is one of the simplest and most important
approaches to image segmentation. It is used extensively in many
image processing applications [39]-[43] and is a region-based
segmentation technique.
Thresholding is based on the notion that regions corresponding to
different tissue types can be classified by using a range function
applied to the intensity values of image pixels. The assumption is that
different tissue types have distinct frequency distributions and can
be discriminated on the basis of the mean and standard deviation of
each distribution, as shown in Figure 2.6. For example, given a
two-dimensional image I(x, y), we can define a simple threshold rule to
classify bone or a compound threshold rule to classify soft tissue:
Simple: (if I(x, y) > I_0 ⇒ Bone)
Compound: (if I_0 < I(x, y) < I_1 ⇒ Soft tissue)
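The two rules can be sketched in a few lines. This is a minimal illustration on a made-up 3×3 intensity array; the threshold values (`bone_level`, `low`, `high`) are hypothetical and would in practice be read off the intensity histogram of the modality in use.

```python
import numpy as np

def simple_threshold(image, bone_level):
    """Simple rule: pixels brighter than bone_level are labelled bone."""
    return image > bone_level

def compound_threshold(image, low, high):
    """Compound rule: pixels strictly between low and high are labelled soft tissue."""
    return (image > low) & (image < high)

# Hypothetical gray values, not taken from the thesis data.
image = np.array([[10,  40, 120],
                  [60, 200, 250],
                  [90, 130,  30]])

bone_mask = simple_threshold(image, bone_level=180)
soft_mask = compound_threshold(image, low=50, high=180)
print(bone_mask.astype(int))
print(soft_mask.astype(int))
```

Each rule returns a Boolean mask of the same shape as the image, so the masks can be used directly to index or count the classified pixels.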
In practice, the type of thresholding just described can be expected
to be successful in highly controlled environments. One of the areas in
which this is often possible is industrial inspection,
where illumination control is usually feasible. Thresholding has been
used by Chow and Kaneko to segment ventricles from cineangiograms
of the human heart [44]. They chose this technique because of the
strong bimodal distribution of image values corresponding to regions
that are interior and exterior to the ventricles.
Figure 2.6 A hypothetical frequency distribution f(I) of intensity
values I(x, y) for fat, muscle and bone in a CT image.
Low intensity values correspond to fat tissues, whereas
high intensity values correspond to bone. Intermediate
intensity values correspond to muscle tissue. F+ and
F- refer to the false positives and false negatives; T+
and T- refer to the true positives and true negatives.
The major drawback of threshold-based approaches is that they
often lack the sensitivity and specificity needed for accurate
segmentation. Sensitivity is defined as the true positive rate for a
function or a test that must detect the presence or absence of some
intrinsic property [45][46][47]. Hence, the purpose of the test is to
determine, as accurately as possible, the presence or absence of this
intrinsic property. Formally, the sensitivity of a segmentation test is
defined as follows:
$$\mathrm{SENSITIVITY} = \frac{\mathrm{TRUE}^{+}}{\mathrm{INTRINSIC}^{+}} \tag{2.35}$$
where $\mathrm{TRUE}^{+}$ is defined as the number of samples that have the
intrinsic property and were categorized by the test as positive, and
$\mathrm{INTRINSIC}^{+}$ is defined as the total number of elements that have the
intrinsic property (regardless of the outcome of the test).
In Figure 2.6, the area under the Bone curve and to the right of the
Bone Threshold line is classified as T+ because this test will correctly
categorize such elements as being bone. The area under the Bone curve
and to the left of the Bone Threshold line is classified as F- because
this test will incorrectly classify such elements as not being bone. In
this example, the sensitivity is a measure of how well this test can
categorize as bone those tissues that are truly bone.
On the other hand, specificity is defined as the complement of the
false positive rate:
$$\mathrm{SPECIFICITY} = 1 - \frac{\mathrm{FALSE}^{+}}{\mathrm{INTRINSIC}^{-}} \tag{2.36}$$
where $\mathrm{FALSE}^{+}$ is defined as the total number of samples that do not
have the intrinsic property but were categorized incorrectly as positive, and
$\mathrm{INTRINSIC}^{-}$ is defined as the total number of elements that do not
have the intrinsic property [45][46].
In Figure 2.6, the area under the Muscle curve that lies to the right
of the Bone Threshold line is classified as F+ because this test will
incorrectly categorize such elements as being bone. Conversely, the area
under the Muscle curve that lies to the left of the Bone Threshold line is
classified as T- because this test will correctly categorize such
elements as not being bone. The specificity of this test is a measure of
how well this test will reject from the bone category those elements
that truly are not bone. Because their denominators are different, the
sensitivity and specificity measures are not complements of one another.
In fact, they measure two distinct aspects of any segmentation test: the
ability to correctly reject false properties and the ability to correctly
accept true properties.
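As a worked illustration of Eq. (2.35) and (2.36), the sketch below computes both measures from a pair of Boolean masks. The function name and the ten-element example are hypothetical; the only assumption about the data is that both classes are present in the ground truth, so that neither denominator is zero.

```python
import numpy as np

def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) of a binary test against ground truth."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    true_pos = np.sum(predicted & truth)      # have the property, test positive
    false_pos = np.sum(predicted & ~truth)    # lack the property, test positive
    intrinsic_pos = np.sum(truth)             # all elements with the property
    intrinsic_neg = np.sum(~truth)            # all elements without the property
    sensitivity = true_pos / intrinsic_pos
    specificity = 1.0 - false_pos / intrinsic_neg
    return sensitivity, specificity

# Hypothetical example: 4 bone pixels and 6 non-bone pixels.
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
predicted = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0], dtype=bool)
se, sp = sensitivity_specificity(predicted, truth)
print(se, sp)
```

Here one bone pixel is missed and one non-bone pixel is accepted, and because the two rates are normalized by different class sizes they do not sum to one.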
CHAPTER 3
MEDICAL IMAGE SEGMENTATION USING
INDEPENDENT COMPONENT ANALYSIS
CT and MR are the most widely used medical imaging modalities because of
their high spatial resolution and excellent discrimination of soft
tissues, bones and other internal organs compared with other medical
images. Medical images provide rich information about anatomical
structure, enabling quantitative pathological or clinical studies, the
derivation of computerized anatomical atlases, and pre- and
intra-operative guidance for therapeutic intervention.
To understand the design specifications and the trade-offs made in
the development of medical image segmentation using independent
component analysis, it is helpful first to review the techniques for
acquiring medical images such as CT and MR.
3.1 Acquiring CT Image
The purpose of the CT scanner is to acquire a large
number (100-1200) of CT projections around the patient [48]. Unlike film,
which acquires a two-dimensional image, the detector or detector array
on a CT scanner only acquires data along a thin line. So from one
complete set of CT projections going all the way around the patient,
only a single CT image can be computed. Before the acquisition of the
next slice, the table that the patient is lying on is moved slightly in the
cranial-caudal direction, which positions a different axial slice of
tissue in the path of the x-ray beam for the next series of acquired
projections.
The basic principle of x-ray CT involves x-ray generation,
detection, digitization, processing, and computer image reconstruction.
X-rays passing through a body are attenuated at different rates by
different tissues. The detected intensities are converted into numbers
or data by the analog-to-digital converters (ADC). The digital data are
fed into a computing device for image reconstruction.
The photon density that emerges when a narrow beam of
monoenergetic photons with energy E and intensity $I_0$ passes through
a homogeneous absorber of thickness x can be expressed as:
$$I = I_0 \exp[-\mu(\rho, Z, E)\, x] \tag{3.1}$$
where $\mu$, $\rho$, and Z are the linear attenuation coefficient, density of the
absorber, and atomic number, respectively. In the energy region where
most commercial x-ray CT systems operate for medical
tomography, two types of interactions are dominant, namely
photoelectric absorption and Compton scattering [38].
In the photoelectric interaction the x-ray photon is completely absorbed
by transferring all of its energy to an electron. In Compton scattering,
on the other hand, scattered x-rays undergo both a directional and an
energy change. If the absorber is not homogeneous, $\mu(\rho, Z)$ is simply a
space-variant function dependent on the distribution of the material.
By directing a monochromatic x-ray beam in the y direction, for
example, the output x-ray intensity I(x) can be written as Eq. (3.2).
$$I(x) = I_0(x) \exp\left[ -\int \mu(x, y)\, dy \right] \tag{3.2}$$
where $I_0$ and $\mu(x, y)$ are the incident x-ray intensity and the x-ray
attenuation coefficient, respectively. By taking the
logarithm and rearranging Eq. (3.2), one can obtain the projection data p(x)
as in Eq. (3.3).
$$p(x) = -\ln\left[ \frac{I(x)}{I_0(x)} \right] = \int \mu(x, y)\, dy \tag{3.3}$$
where p(x) is equivalent to a simple integration or summation of the
total attenuation coefficients along the x-ray path. Eq. (3.4) shows the
digital form of Eq. (3.3).
$$p(x) = \sum_{i=1}^{N} \mu_i(x, y) \tag{3.4}$$
Eq. (3.4) represents the summation of the attenuation coefficients of
N pixels along a given x-ray path.
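The relation between Eq. (3.2), (3.3) and (3.4) can be checked with a small numerical sketch. The attenuation map `mu`, the incident intensity and the unit pixel size below are hypothetical values chosen for illustration; rays are assumed to travel along the y direction (array axis 0), one ray per x position.

```python
import numpy as np

def projections(mu, pixel_size=1.0):
    """Discrete line integral: sum attenuation coefficients along y for each x."""
    return mu.sum(axis=0) * pixel_size

def transmitted_intensity(mu, incident, pixel_size=1.0):
    """Beam intensity after the absorber for every ray (exponential attenuation)."""
    return incident * np.exp(-projections(mu, pixel_size))

# Hypothetical 3x3 map of linear attenuation coefficients (rows = y, columns = x).
mu = np.array([[0.0, 0.2, 0.0],
               [0.1, 0.5, 0.1],
               [0.0, 0.2, 0.0]])
incident = np.full(3, 1000.0)

p = projections(mu)
intensity = transmitted_intensity(mu, incident)
recovered = -np.log(intensity / incident)   # log-transform of the measured ratio
print(p, intensity, recovered)
```

Taking the negative logarithm of the measured intensity ratio returns exactly the column sums of the attenuation map, which is the quantity a reconstruction algorithm works from.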
In x-ray CT the contrast is associated with the different attenuation
coefficients of the materials involved. Since each set of projection data
represents the integral value of the attenuation coefficients along the
path, the projection data taken at different views are the basis for
tomographic image reconstruction, as illustrated in Figure 3.1.
Figure 3.1 (a) A simplified x-ray beam $I_0$ attenuated through
a pixel, resulting in the attenuated beam $I$.
(b) A two-dimensional matrix of linear attenuation
coefficients of the image. Attenuated beam
intensities for the corresponding rows (i.e., $I_1$, $I_2$, ...)
are shown at the right.
3.2 Acquiring MR Image
Spin-echo sequences of various types are the most widely used
imaging techniques, and their basic form consists of 90° and 180° rf
pulses. For 3-D imaging, this pulse sequence is repeated both in the z
and y directions, provided the x-gradient is the frequency-encoding or
readout gradient. In conventional spin-echo imaging the repetition time is
usually longer than the echo time to allow recovery of the longitudinal
magnetization, or $T_1$ recovery. The first 90° rf pulse rotates or flips
the magnetization of the spins into the transverse plane. Immediately
after the 90° flip, spins in the transverse plane start the $T_1$ and $T_2$
relaxations. Although the $T_1$ relaxation process continues, the addition
of a 180° pulse at the time $T_E/2$ flips the spins to the opposite side,
and eventually rephases the spins that were dephased by $T_2$ decay
during the time from $t = 0$ to $t = T_E/2$. In addition to true $T_2$ decay,
the transverse component of the magnetization also decays due to the
magnetic field inhomogeneity. The equivalent decay time due to both
the $T_2$ relaxation and the field inhomogeneity is known as $T_2^{*}$, which
is given by Eq. (3.5).
$$\frac{1}{T_2^{*}} = \frac{1}{T_2} + \frac{\gamma\, \Delta B_0}{2} \tag{3.5}$$
where $\gamma$ is the gyromagnetic ratio and $\Delta B_0$ is the inhomogeneity of the
magnetic field. It should be noted, however, that the decay of the signal
because of the field inhomogeneity can be recovered by the application
of a 180° rf pulse or rf spin echo, which rephases the spins that have
been dephased during the period between the 90° and 180° rf pulses.
Spin dephasing due to $T_2$ relaxation, however, cannot be recovered.
With the application of three orthogonal gradients, the acquired echo
signal can now be expressed as Eq. (3.6).
$$s(t, g_y, g_z) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \rho(x, y, z)\, e^{i\gamma (G_x x t + g_y y T_y + g_z z T_z)}\, dx\, dy\, dz \tag{3.6}$$
where $\rho(x, y, z)$ represents the spin density function including the $T_1$
and $T_2$ decays, $G_x$ is the readout gradient (constant gradient during
data acquisition), $g_y$ and $g_z$ are the phase-encoding gradients in the y
and z directions with amplitudes varying in steps, and $T_y$ and $T_z$ are
the durations of the $g_y$ and $g_z$ gradients, respectively. From Eq.
(3.6) it can be shown that the echo signal is the 3-D Fourier transform
of the spin density function. Hence the spin density function modulated
by the $T_1$ and $T_2$ relaxations can be obtained by the 3-D Fourier
transform of the spin-echo signal and is given by Eq. (3.7).
$$\rho(x, y, z) = \rho_0(x, y, z) \left\{ \exp\left[ -\frac{T_E}{T_2(x, y, z)} \right] \right\} \left\{ 1 - \exp\left[ -\frac{T_R}{T_1(x, y, z)} \right] \right\} \tag{3.7}$$
where $\rho_0(x, y, z)$ denotes the initial value of the spin density function
at a location (x, y, z) and the two terms in { } denote the $T_1$ and $T_2$
decays.
With a short echo time, the $T_2$-dependent term in Eq. (3.7) will be
unity and Eq. (3.7) becomes
$$\rho(x, y) \approx \rho_0(x, y) \left\{ 1 - \exp\left[ -\frac{T_R}{T_1(x, y)} \right] \right\} \tag{3.8}$$
With a short $T_R$, Eq. (3.8) can be further approximated as
$$\rho(x, y) \approx \rho_0(x, y) \left[ \frac{T_R}{T_1(x, y)} \right] \tag{3.9}$$
The short $T_R$ and short $T_E$ sequence can therefore be considered as
an imaging mode in which the image intensity is inversely proportional to
the longitudinal relaxation time $T_1$. This mode is often called
$T_1$-weighted imaging.
With a long repetition time ($T_R \gg T_1$), the $T_R$-dependent exponential in Eq.
(3.7) will become zero, and Eq. (3.7) approaches Eq. (3.10).
$$\rho(x, y) \approx \rho_0(x, y) \left\{ \exp\left[ -\frac{T_E}{T_2(x, y)} \right] \right\} \tag{3.10}$$
In this mode the signal is usually large because of the long $T_R$. On
the other hand, due to the long echo time, the image is heavily
weighted by $T_2$ and the signal is reduced significantly. This mode is
called $T_2$-weighted imaging and the image is somewhat noisy.
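The two limiting cases can be verified numerically from the full spin-echo expression. The sketch below is illustrative only: the relaxation times and the sequence timings (in milliseconds) are hypothetical round numbers, not values measured in this work.

```python
import numpy as np

def spin_echo_signal(rho0, t1, t2, tr, te):
    """Spin density weighted by T2 decay and T1 recovery (spin-echo model)."""
    return rho0 * np.exp(-te / t2) * (1.0 - np.exp(-tr / t1))

# Hypothetical tissue parameters in milliseconds.
rho0, t1, t2 = 1.0, 1000.0, 100.0

# Short TR and short TE: intensity is roughly rho0 * TR / T1 (T1-weighted).
t1_weighted = spin_echo_signal(rho0, t1, t2, tr=50.0, te=1.0)
t1_approx = rho0 * 50.0 / t1

# Long TR (TR >> T1) and long TE: intensity is roughly rho0 * exp(-TE / T2).
t2_weighted = spin_echo_signal(rho0, t1, t2, tr=10000.0, te=80.0)
t2_approx = rho0 * np.exp(-80.0 / t2)

print(t1_weighted, t1_approx)
print(t2_weighted, t2_approx)
```

With these values the linearized $T_1$-weighted expression agrees with the full model to within a few percent, and the $T_2$-weighted expression is almost exact because the recovery term has saturated.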
3.3 Methodology
In this section I describe the method of automatic medical image
segmentation using the independent component analysis algorithm. To verify
the performance of medical image segmentation using independent
component analysis, computer simulations were done with test data
and a medical image data set. The data set in this experiment
consists of 27 axial CT images of a patient's head, starting from images
below the chin and ending at images in the upper portion of the nose. The
original image data were obtained using a General Electric High-speed
Advantage Computerized Tomography scanner at 120 kVp
and 200 mA. No special post-processing was performed on the image
data, other than reducing the bit resolution to 8 bits/pixel for
efficient memory usage. Figure 3.2 shows some selected original images
from the data set. It is assumed that the data set consists of
three parts, bone, soft tissue and background, and the final goal of my
experiment is to extract bone from the data set. Figure 3.3 shows the
flow chart of the bone extraction process.
Figure 3.2 Selected image data which were used in this
experiment.
As described in sections 3.1 and 3.2, it can be assumed that different
parts of medical images have some independent components. In CT
images, bone and soft tissue have different attenuation coefficients, and
this results in different CT numbers or gray values. In MR images,
different relaxation times result in differently weighted images. It is
therefore assumed that the internal parts of a medical image are
statistically independent, and this is the starting point of my experiment.
Figure 3.3 Flowchart of medical image segmentation using
independent component analysis.
To extract the bone regions from each of the 27 axial image slices,
the prior of the data should be decided, and it can be decided from the
probability density function of the data. The prior can be chosen in
terms of the kurtosis of the distribution, where the kurtosis is defined
from the fourth moment according to Eq. (3.11).
$$\mathrm{KURTOSIS} = \frac{\sum_i (b_i - \bar{b})^4}{\left( \sum_i (b_i - \bar{b})^2 \right)^2} - 3 \tag{3.11}$$
where $\bar{b}$ is the mean value.
If the kurtosis of the data is zero (Gaussian) or smaller than
zero (sub-Gaussian or platykurtic), the Gaussian prior can be chosen, and
if the kurtosis of the data is larger than zero (super-Gaussian or
leptokurtic), the Laplacian prior in Eq. (3.12) should be chosen.
$$P(s_m) \propto \exp(-\theta |s_m|) \tag{3.12}$$
A super-Gaussian distribution has longer tails and a sharper peak than a
Gaussian distribution, as in Figure 3.4. Compared with a Gaussian, the
Laplacian distribution puts greater weight on values close to zero, and
as a result the representations are more sparse [29].
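The prior-selection rule can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this work: the sums in Eq. (3.11) are taken as sample means (the usual moment normalization, so that a Gaussian gives a value near zero), and the function names and synthetic samples are hypothetical.

```python
import numpy as np

def excess_kurtosis(b):
    """Fourth central moment over squared variance, minus 3 (zero for a Gaussian)."""
    b = np.asarray(b, dtype=float).ravel()
    d = b - b.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0

def choose_prior(b):
    """Laplacian prior for super-Gaussian data, Gaussian prior otherwise."""
    return "laplacian" if excess_kurtosis(b) > 0 else "gaussian"

# Synthetic stand-ins for image data: one super-Gaussian, one sub-Gaussian sample.
rng = np.random.default_rng(0)
laplace_samples = rng.laplace(size=100_000)
uniform_samples = rng.uniform(-1.0, 1.0, size=100_000)

print(excess_kurtosis(laplace_samples), choose_prior(laplace_samples))
print(excess_kurtosis(uniform_samples), choose_prior(uniform_samples))
```

For an image slice, the pixel array would simply be passed to `choose_prior` in place of the synthetic samples.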
Figure 3.4 Examples of three levels of kurtosis. Each of the
distributions has the same variance. A Gaussian
distribution has minimal redundancy (highest entropy) for
a fixed variance. The higher the kurtosis, the higher the
redundancy. With high kurtosis, there is a higher
probability of a low response or a high response with a
reduced probability of a mid-level response.
After choosing the prior, the data set is iterated using independent
component analysis. In this thesis, the algorithm described in section 2.3
is chosen because the standard independent component analysis
algorithm described in section 2.2 requires the same number of inputs
and outputs, so that the linear transform matrix W is a
square matrix. The goal of this dissertation, however, is to segment bones
and other parts of medical images from one slice, which requires an
overcomplete matrix. Before the data set is iterated using independent
component analysis, the data set should be sphered using Eq. (2.10).
This removes both the first- and second-order statistics of the data:
the mean and the covariances are set to zero and the variances are
equalized. To ensure that the input ensemble was stationary in
sequence, the sequence index of the signals was permuted. This means
that at each iteration of the training, the independent component
analysis training system would receive input from a random sequence
index point.
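The preprocessing described above can be sketched as follows. This is a minimal illustration, assuming the common symmetric form of sphering (multiplication by the inverse square root of the sample covariance), which may differ in detail from Eq. (2.10); the mixing matrix and the synthetic data are hypothetical.

```python
import numpy as np

def sphere(x):
    """Remove the mean, decorrelate the channels and equalize their variances.
    x has shape (channels, samples)."""
    centered = x - x.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / centered.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitening = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return whitening @ centered

rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.6],
                   [0.2, 0.9]])                      # hypothetical mixing matrix
x = mixing @ rng.laplace(size=(2, 20_000)) + 5.0     # correlated, non-zero mean

z = sphere(x)

# Permute the sample index so that training sees the data in random order.
order = rng.permutation(z.shape[1])
z_shuffled = z[:, order]

cov_z = z_shuffled @ z_shuffled.T / z_shuffled.shape[1]
print(np.round(z.mean(axis=1), 6))
print(np.round(cov_z, 6))
```

After sphering, the sample covariance is the identity matrix, and the permutation leaves these statistics unchanged while removing any dependence on the original ordering.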
In order to evaluate the segmentation results of the medical image set,
SENSITIVITY and SPECIFICITY, described in Eq. (2.35) and (2.36), and
other evaluation functions, named Empirical Discrepancy Methods (EDM),
are used. In practical segmentation applications, some errors in the
segmented image can be tolerated. On the other hand, if the image to be
segmented is complex and the algorithm used is fully automatic, error is
inevitable [52][56]. The disparity between an actually segmented image
and a correctly (ideally) segmented image (the reference image), which is
the best expected result, can be used to assess the performance of the
algorithm. Both the actually segmented image and the reference image are
obtained from the same input image. The reference image is sometimes
called the gold standard.
Weszka and Rosenfeld used an approach to measure the difference
between an ideal (correct) image and a segmented image [54]. Under the
assumption that the image consists of objects and background, each
having a specified distribution of gray levels, they compute, for any
given standard segmented value, the probability of misclassifying an
object pixel as background, or vice versa. This probability in turn
provides an index of segmentation results, which can be used for
evaluating segmentation algorithms. In their work, such a probability is
minimized in the process of selecting an appropriate segmentation.
Recently, a discrepancy measure based on the same principle has been
defined. It is termed probability of error (PE). For a two-class problem
PE can be calculated by Eq. (3.13) [57].
$$PE = P(O) \times P(B \mid O) + P(B) \times P(O \mid B) \tag{3.13}$$
where $P(B \mid O)$ is the probability of error in classifying objects as
background, $P(O \mid B)$ is the probability of error in classifying
background as objects, and $P(O)$ and $P(B)$ are the a priori probabilities of
objects and background in images. The closer the PE value is
to 0, the better the segmentation result.
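As a worked example of Eq. (3.13), the sketch below computes PE from a segmented mask and a reference mask. The masks are hypothetical, and the a priori probabilities are estimated from the reference image, which is an assumption of this illustration; both classes must be present in the reference.

```python
import numpy as np

def probability_of_error(segmented, reference):
    """PE for a two-class problem; True marks object pixels, False background."""
    segmented = np.asarray(segmented, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    p_object = reference.mean()                      # P(O)
    p_background = 1.0 - p_object                    # P(B)
    p_b_given_o = np.mean(~segmented[reference])     # objects labelled background
    p_o_given_b = np.mean(segmented[~reference])     # background labelled object
    return p_object * p_b_given_o + p_background * p_o_given_b

# Hypothetical 10x10 example: a 4x4 object segmented one column too far right.
reference = np.zeros((10, 10), dtype=bool)
reference[2:6, 2:6] = True
segmented = np.zeros((10, 10), dtype=bool)
segmented[2:6, 3:7] = True

pe = probability_of_error(segmented, reference)
print(pe)
```

In this example four object pixels are lost and four background pixels are gained, so PE equals the fraction of misclassified pixels in the image, 8 out of 100.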
Image analysis is concerned with the extraction of information from
an image: an image goes in and data come out. Here the data are the
measurement values of object features obtained from segmented images. One
fundamental question in image analysis is whether a measurement made
on the objects from segmented images is as accurate as one made on
the objects from the original images. According to this measure, a
segmented image has the highest quality if the object features extracted
from it precisely match the features in the original. The ultimate goal
of image segmentation in the context of image analysis is to obtain
measurements of object features. The accuracy of these measurements
obtained from the segmented image with respect to the reference image
provides useful discrepancy measures. This accuracy can be termed
ultimate measurement accuracy (UMA) to reflect the ultimate goal of
segmentation. Let $R_f$ denote the feature value obtained from the
reference image and $S_f$ denote the feature value measured from the
segmented image; the UMA is then defined as Eq. (3.14).
$$UMA = \frac{|R_f - S_f|}{R_f} \tag{3.14}$$
As with PE, a UMA value close to 0 means that the result of
segmentation is good.
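A short illustration of Eq. (3.14) follows. The choice of object area (pixel count) as the measured feature is an assumption made for this example; any other scalar feature could be substituted, provided the reference value is non-zero.

```python
import numpy as np

def ultimate_measurement_accuracy(reference_feature, segmented_feature):
    """Relative deviation of a feature measured on the segmented image."""
    return abs(reference_feature - segmented_feature) / reference_feature

# Hypothetical masks: the segmentation misses one column of a 4x4 object.
reference = np.zeros((10, 10), dtype=bool)
reference[2:6, 2:6] = True      # area 16
segmented = np.zeros((10, 10), dtype=bool)
segmented[2:6, 2:5] = True      # area 12

uma = ultimate_measurement_accuracy(reference.sum(), segmented.sum())
print(uma)
```

Because UMA compares only the feature values, a segmentation with the correct area in the wrong place would still score zero, which is why it is used alongside PE rather than instead of it.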
Another evaluation function, called the mislabelling rate and described
in Eq. (3.15), is chosen to evaluate the results of segmentation using
independent component analysis.
$$F(I) = \sqrt{R} \times \sum_{i=1}^{R} \frac{e_i^2}{\sqrt{A_i}} \tag{3.15}$$
where I is the image to be segmented, R is the number of regions in
the segmented image, $A_i$ is the area, or the number of pixels, of the ith
region, and $e_i$ is the error of region i [49]. The term $\sqrt{R}$ is a global
measure which penalizes small regions or regions with a large error. $e_i$
indicates whether or not an appropriate feature is assigned to a region. A
large value of $e_i$ means that the feature of the region is not well
captured during the segmentation process. As Eq. (3.15) shows, a larger
value of the evaluation function means a worse segmentation result.
CHAPTER 4
RESULT & ANALYSIS
In this chapter we describe the applications of automatic medical
image segmentation using the independent component analysis algorithm.
To verify the performance of medical image segmentation using
independent component analysis, computer simulations were done with a
test data set and 27 axial CT images. The performance evaluations
were done using SENSITIVITY, SPECIFICITY, mislabelling rate and
EDM, which were described in Chapter 2 and Chapter 3.
4.1 Simple Test Example
In the first simulation, time-sequence test data were used to
verify independent component analysis. The test data consisted of three
speech signals that were obtained from three different persons. The
data were mixed with a 3×3 random matrix, each element ranging from 0
to 1. Using the independent component analysis algorithm, the original
speech signals were separated, as shown in Figure 4.1. Figure 4.1(a) shows
the original signals. These signals are mixed with a random matrix, as
shown in Figure 4.1(b). After 1,000 iterations of independent
component analysis, the original signals were separated from the mixed
signals. This shows that statistically independent components in mixed
signals are nearly perfectly separated using independent component
analysis.
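The structure of this experiment can be sketched with a small, self-contained example. It is an illustration under several assumptions, not the code used for Figure 4.1: synthetic Laplacian signals stand in for the speech recordings, a batch natural-gradient update with a tanh nonlinearity is used for the square (3×3) case, and the learning rate and iteration count are hypothetical.

```python
import numpy as np

def whiten(x):
    """Center the data and transform it to unit covariance (channels x samples)."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(x @ x.T / x.shape[1])
    return (eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T) @ x

def natural_gradient_ica(z, n_iter=2000, lr=0.1):
    """Batch natural-gradient ICA for super-Gaussian sources on whitened data z."""
    n, t = z.shape
    w = np.eye(n)
    for _ in range(n_iter):
        y = w @ z
        w += lr * (np.eye(n) - np.tanh(y) @ y.T / t) @ w
    return w

rng = np.random.default_rng(0)
sources = rng.laplace(size=(3, 5000))           # stand-in for three speech signals
mixing = rng.uniform(0.0, 1.0, size=(3, 3))     # random 3x3 matrix, elements in [0, 1]
mixed = mixing @ sources

z = whiten(mixed)
w = natural_gradient_ica(z)
separated = w @ z

# Absolute correlation between each true source and each recovered component.
corr = np.abs(np.corrcoef(np.vstack([sources, separated]))[:3, 3:])
print(np.round(corr, 3))
```

Each row of the correlation matrix should contain a single entry close to one, reflecting that the sources are recovered only up to permutation, sign and scale.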
Figure 4.2 shows a simple synthesized test image consisting of a
combination of two cylinders and one ellipse. To keep the task simple,
we only considered the three-object synthesized 160×160 image in this
simulation. In this experiment, as above, Figure 4.2(a) and Figure
4.2(b) are summed with two random coefficients between 0 and 1.
The gray values of the two cylinders and the ellipse are very similar, but
they lie in different parts of the images, so we can assume that they are
statistically independent. The architecture in Figure 2.3 and the
algorithm proposed in section 2.3 were sufficient to perform the
segmentation. After mixing the two image data with random coefficients,
the unmixing matrix W was trained with independent component
analysis. After about 100,000 iterations, the unmixing matrix converged.
The learning rate was set to 0.0001 in the first 10,000 iterations, 0.005 in
the next 20,000 iterations and 0.001 in the remaining iterations. Figure
4.3 shows the result of the segmentation. Although the input is a mixture
of two objects, the output is unmixed. This indicates that the two
objects are statistically independent.
Figure 4.1 Test example of independent component analysis.
(a) The original speech signals obtained from
three different persons.
(b) Signals mixed with a 3×3 random matrix.
(c) Signals separated using independent component
analysis.
Figure 4.2 The simple test image consisting of two
cylinders and one ellipse. In this
experiment, (a) and (b) are mixed with
random coefficients.
Figure 4.3 The result of automatic segmentation.
The learning process has no prior
information and segments the image
into two parts, two cylinders and one
ellipse.
There are some differences between the gray values of the original image
and the unmixed image. Although one might expect that unmixed data
having the same gray values as the original image would be more efficient,
the learning process has no prior information and segments the image
into two parts, two cylinders and one ellipse. Figure 4.3(b) shows that
some parts of the ellipse remain mixed with the two cylinders. This is
because in the original image the gray values of the ellipse and the two
cylinders were not significantly different.
4.2 Evaluations of Automatic Segmentation with Test Data
This section describes an evaluation of segmentation using
independent component analysis. The original data shown in Figure 4.4
are used. The original data are mixed using a matrix which has 4
random coefficients ranging between 0 and 1. The original data are
gray-valued and are composed of two squares. At first, the standard
deviations of the two squares are set to 35, the mean value of the large
square (Figure 4.4(a)) is set to 70, and the mean value of the small
square (Figure 4.4(b)) is set to 50. In this experiment the standard
deviations of the two squares and the mean value of the large square are not
changed, but the mean value of the small square is increased in steps of
10 until it reaches 190. Figure 4.4 shows
selected images whose mean values are 70 (Figure 4.4(a)) and
130 (Figure 4.4(b)). Figure 4.4(c) shows the graph of the probability density
functions when the mean value of the large square is 70 and the mean
value of the small one is 130.
Figure 4.4 The selected original data used in the evaluation.
(a) Standard deviation 35, mean 70
(b) Standard deviation 35, mean 130
(c) The graph of the probability density functions
Figures 4.5 and 4.6 show the results of segmentation using
independent component analysis. The results are ordered by the mean
value of the small square (Figure 4.4(b)). Each result shows that the
independent components in the mixed data can be extracted nearly perfectly.
In the mixing process, the large squares cover all parts of the small
squares. As a result, the independent components which represent the
small squares have some false positive values (Figure 4.5). The
independent components which represent the large squares, however, have
almost no false positive or false negative values.
To evaluate the performance of automatic segmentation using
independent component analysis, the probability of error (PE) and ultimate
measurement accuracy (UMA) values described in Eq. (3.13)
and (3.14) are calculated. These values are close to 0 if the results of
segmentation are close to the original data. Table 4.1 and Figures 4.7 and 4.8
show the results of the evaluation using PE and UMA values. Both
values stay below 0.30 for the small squares and below 0.10 for the large
squares. This indicates that segmentation using
independent component analysis can extract the independent components
from the mixed data and that it can be applied to medical image
segmentation.
Figure 4.5 Small squares extracted from mixed
test data.
Figure 4.6 Large squares extracted from mixed
test data.
Table 4.1 Probability of Error (PE) and Ultimate Measurement Accuracy
(UMA) values of the independent components.

PE Value of      UMA Value of     PE Value of      UMA Value of
Small Squares    Small Squares    Large Squares    Large Squares
0.256            0.269            0.092            0.093
0.268            0.254            0.091            0.089
0.297            0.287            0.077            0.075
0.269            0.271            0.064            0.068
0.257            0.263            0.071            0.072
0.257            0.258            0.068            0.067
0.248            0.250            0.068            0.069
0.245            0.248            0.065            0.066
0.237            0.238            0.067            0.065
0.256            0.254            0.094            0.082
0.233            0.241            0.098            0.097
0.278            0.267            0.093            0.095
0.258            0.257            0.094            0.097
0.269            0.264            0.094            0.094
0.231            0.237            0.093            0.091
- 58 -
Figure 4.7 Plot of PE and UMA values for the small squares
Figure 4.8 Plot of PE and UMA values for the large squares
4.3 Automatic Segmentation with Medical Images
When segmenting medical images using the independent component
analysis algorithm described in Chapter 3, a prior for the data set
must be chosen. The prior was chosen using kurtosis. Figure 4.9 and
Table 4.2 show the kurtosis graph and the kurtosis values of the data
set, respectively. Because the medical image data set used in this
dissertation had high kurtosis, the Laplacian prior shown in
equation (3.12) was chosen.
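The kurtosis check can be sketched as follows. The code assumes the excess kurtosis (fourth standardized moment minus 3), which is 0 for a Gaussian, so that a positive value indicates a super-Gaussian density as stated in the caption of Table 4.2. The function names and the random test images are hypothetical; they are not the CT data used in this dissertation.

```python
import numpy as np

def excess_kurtosis(image):
    """Fourth standardized moment minus 3 (0 for a Gaussian distribution)."""
    x = np.asarray(image, dtype=float).ravel()
    centered = x - x.mean()
    variance = np.mean(centered ** 2)   # assumes the image is not constant
    return np.mean(centered ** 4) / variance ** 2 - 3.0

def is_super_gaussian(image):
    """True when the gray-value distribution is more peaked than a Gaussian,
    the case in which a Laplacian prior is a natural choice."""
    return excess_kurtosis(image) > 0.0

# Hypothetical 256x256 inputs drawn from known distributions.
rng = np.random.default_rng(0)
laplacian_image = rng.laplace(size=(256, 256))   # super-Gaussian
uniform_image = rng.uniform(size=(256, 256))     # sub-Gaussian

print(excess_kurtosis(laplacian_image), is_super_gaussian(laplacian_image))
print(excess_kurtosis(uniform_image), is_super_gaussian(uniform_image))
```

Only the sign of the kurtosis is used for the decision, so the exact estimator (biased or unbiased) matters little for images of this size.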
Figure 4.10 shows the input image, an axial slice from the data set,
and the result of segmentation. The slice shows the chin and cervical
spine of a female patient. Usually the gray value of soft tissue and
that of the inside of the cervical spine are nearly the same, so it is
very difficult to segment the cervical spine exactly using
thresholding or a similar method, as shown in Figure 4.11. Using the
automatic method developed in this thesis, the chin and cervical spine
were segmented exactly. Note the significant improvement in the inside
area of the cervical spine. In Figure 4.11(a) and (b), the inside area
of the cervical spine is completely absent. In Figure 4.11(c), it is
partly absent. In Figure 4.11(d), the inside area of the cervical
spine has somewhat exact values, but the chin has some false positive
values and that part cannot be discriminated exactly.
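The difficulty of thresholding when two tissues share a gray value can be illustrated with a toy example. The 5x5 image and its gray values (0, 100, 200) are hypothetical and chosen only so that the inside of the "spine" has the same gray value as the surrounding "soft tissue"; they are not taken from the CT data.

```python
import numpy as np

# Hypothetical gray values: background 0, soft tissue 100,
# inside of the spine 100, cortical bone 200.
image = np.array([
    [0,   100, 100, 100, 0],
    [100, 200, 200, 200, 100],
    [100, 200, 100, 200, 100],
    [100, 200, 200, 200, 100],
    [0,   100, 100, 100, 0],
])

# Reference: the whole spine, i.e. the bone ring and its inside.
reference = np.zeros_like(image, dtype=bool)
reference[1:4, 1:4] = True

def threshold_segment(img, t):
    """Label every pixel with gray value >= t as object."""
    return img >= t

results = {}
for t in (50, 150):
    mask = threshold_segment(image, t)
    missed = int((reference & ~mask).sum())   # spine pixels not segmented
    extra = int((mask & ~reference).sum())    # soft tissue labelled as spine
    results[t] = (missed, extra)
    print(f"threshold={t}: missed={missed}, false positives={extra}")
```

In this toy image the low threshold keeps the inside of the spine but also labels soft tissue as object, while the high threshold removes the soft tissue but loses the inside of the spine; no single threshold gives both.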
Table 4.2 The kurtosis of the input data set. All of the kurtosis
values are higher than 0, which means that the input data set has a
super-Gaussian probability density function.
Input data (256×256 image)    Kurtosis
CTHEAD00                      3.0487
CTHEAD01                      3.4076
CTHEAD02                      3.4840
CTHEAD03                      3.1982
CTHEAD04                      2.8927
CTHEAD05                      2.6120
CTHEAD06                      2.6840
CTHEAD07                      2.7024
CTHEAD08                      2.7019
CTHEAD09                      2.7735
CTHEAD10                      3.0565
CTHEAD11                      5.4718
CTHEAD12                      6.4063
CTHEAD13                      6.8815
CTHEAD14                      6.5358
CTHEAD15                      5.0069
CTHEAD16                      3.8800
CTHEAD17                      2.4190
CTHEAD18                      2.1906
CTHEAD19                      2.1755
CTHEAD20                      2.1586
CTHEAD21                      2.1212
CTHEAD22                      2.0798
CTHEAD23                      2.0982
CTHEAD24                      2.1921
CTHEAD25                      2.1131
CTHEAD26                      1.9202
Figure 4.9 The kurtosis graph of the input data set. All of the
kurtosis values are higher than 0, which means that the input
data set has a super-Gaussian probability density function.
Figure 4.10 The input image and the segmentation result.
(a) An axial slice from a General Electric High-speed
Advantage Computerized Tomography scanner, showing the
teeth and cervical spine of a female patient.
(b) The segmentation result. The teeth and cervical spine
were extracted from (a), excluding the soft tissue.
Figure 4.12 shows the segmentation result of the data set. The left
images are the original CT images and the right ones are the images
segmented by the automatic method using independent component
analysis. Since bones in CT images have higher gray values than other
parts and are the dominant part of those images, bones are easily
segmented using independent component analysis. Figure 4.13 shows 16
selected axial images segmented using independent component analysis,
and Figure 4.14 shows the volume-rendered image obtained from the
result shown in Figure 4.13.
Figure 4.11 The segmentation result using thresholding. The
original image in Figure 4.10(a) is a gray-scale image. The
threshold value was gradually increased to segment the
chin and cervical spine, but the threshold method did not
segment them exactly because some parts of the soft tissue
and some parts of the cervical spine and chin had the same
gray values.
Figure 4.12 The segmentation result of CT images. The left
images are the original CT images and the right ones are
the images segmented by the automatic method using
independent component analysis.
Figure 4.13 Result of 16 selected axial CT image segmentations
Figure 4.14 Volume-rendered image using the result of
Figure 4.13
4.4 Evaluations of Medical Image Segmentation
This section compares segmentation using independent component
analysis with segmentation using the thresholding method. The manual
segmentation results produced by radiologists were chosen as the
reference for the comparison. Figure 4.15 shows the binary result of
manual segmentation.
First, the sensitivity of the independent component analysis method
was compared with that of the thresholding method. Sensitivity, which
is defined in equation (2.35), is the "True Positive Rate" of the
segmented data with respect to the reference; higher sensitivity means
a better segmentation result. Figure 4.16 shows the result of the
sensitivity comparison. In Figure 4.16, 3 of the 27 images segmented
using independent component analysis have lower sensitivity than with
the thresholding method. This is because those 3 images contain teeth
and the cervical spine, and the teeth produce metal artifacts. The
metal artifacts spread into other parts of the soft tissue, and these
regions were not segmented clearly using independent component
analysis. A paired t-test on sensitivity was performed at the 0.05
significance level, and the p value of the test (4.8E-07) was much
lower than the significance level. This means that, despite the poor
results in three cases, segmentation using independent component
analysis gave significantly higher sensitivity overall (mean 0.983
versus 0.899). Table 4.3 shows the result of the paired t-test on
sensitivity.
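The paired t-test used here can be sketched as follows. The per-image rates behind Table 4.3 are not listed in this dissertation, so the two short lists below are hypothetical values used only to show the calculation. The sketch compares |t| with the two-sided 0.05 critical value of Student's t distribution (2.776 for 4 degrees of freedom, 2.056 for the 26 degrees of freedom of the 27-image data set); whether the p value in Table 4.3 is one-sided or two-sided is not stated, so the two-sided test is an assumption.

```python
import math
from statistics import mean, stdev

# Two-sided critical values of Student's t at the 0.05 level.
T_CRITICAL_005 = {4: 2.776, 26: 2.056}

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample std. dev., n - 1
    return mean(diffs) / standard_error, n - 1

# Hypothetical sensitivities for five images (not the thesis data).
ica_tpr = [0.99, 0.97, 0.98, 0.96, 0.99]
threshold_tpr = [0.90, 0.91, 0.89, 0.90, 0.88]

t, dof = paired_t_statistic(ica_tpr, threshold_tpr)
significant = abs(t) > T_CRITICAL_005[dof]
print(f"t = {t:.3f}, dof = {dof}, significant at 0.05: {significant}")
```

Because each image is segmented by both methods, the test is applied to the per-image differences rather than to the two groups separately, which is why the number of degrees of freedom is the number of images minus one.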
Figure 4.15 The result of manual segmentation. These images were
used as the reference for comparing segmentation using
independent component analysis with segmentation using
the thresholding method.
Figure 4.16 The sensitivity (True Positive Rate) comparison between
the independent component analysis method and the
thresholding method. 3 of the 27 images segmented using
independent component analysis have lower sensitivity than
with the thresholding method. This is because those 3
images contain teeth and the cervical spine, and the teeth
produce metal artifacts.
Table 4.3 The result of the paired t-test on sensitivity at the 0.05
significance level. The p value is much lower than the
significance level. This means that, despite the poor
results in three cases, segmentation using independent
component analysis gives good results.
                                 True Positive Rate of    True Positive Rate of
                                 Independent Component    Threshold
                                 Analysis
Mean                             0.982963                 0.898519
Variance                         0.003691                 0.000328
Number of samples                27                       27
Significance level               0.05
Degrees of freedom               26
p(T<=t) value of paired t-test   4.8E-07
Second, the specificity was compared. Specificity is defined in
equation (2.36), and 1 - specificity is the "False Positive Rate" of
the segmented data with respect to the reference; a lower "False
Positive Rate" means a better segmentation result. Figure 4.17 shows
the result of the "False Positive Rate" comparison. In Figure 4.17, 6
of the 27 images segmented using independent component analysis have a
lower "False Positive Rate" than with the thresholding method, but the
other results have nearly the same "False Positive Rate". This means
that, in terms of specificity, the segmentation method using
independent component analysis does not improve on the thresholding
method. A paired t-test on the "False Positive Rate" shows that the p
value (0.125) is higher than the 0.05 significance level, which means
that there is no significant difference between the independent
component analysis method and the thresholding method. Table 4.4 shows
the result of the paired t-test on the "False Positive Rate".
However, in Figures 4.16 and 4.17, the "True Positive Rate" of
independent component analysis is much higher than that of the
thresholding method and the "False Positive Rate" is lower, so we can
infer that the Receiver Operating Characteristic curve of the
independent component analysis method has a much better shape than
that of the thresholding method. This means that the independent
component analysis method usually gives better results than the
thresholding method.
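The two rates compared above can be computed from a segmented mask and a reference mask as in the following sketch. It assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), for equations (2.35) and (2.36), which are not reproduced in this section. The function name and the small 2x4 masks are hypothetical.

```python
import numpy as np

def tpr_fpr(segmented, reference):
    """Return (True Positive Rate, False Positive Rate) for binary masks."""
    seg = np.asarray(segmented, dtype=bool)
    ref = np.asarray(reference, dtype=bool)
    tp = int((seg & ref).sum())      # object pixels labelled as object
    fp = int((seg & ~ref).sum())     # background pixels labelled as object
    fn = int((~seg & ref).sum())     # object pixels labelled as background
    tn = int((~seg & ~ref).sum())    # background pixels labelled as background
    tpr = tp / (tp + fn)             # sensitivity
    fpr = fp / (fp + tn)             # 1 - specificity
    return tpr, fpr

# Hypothetical masks: the reference plays the role of the manual
# segmentation, the other mask the output of a segmentation method.
reference = [[1, 1, 0, 0],
             [1, 1, 0, 0]]
segmented = [[1, 1, 1, 0],
             [1, 0, 0, 0]]

tpr, fpr = tpr_fpr(segmented, reference)
print(tpr, fpr)
```

Each segmentation result gives one (FPR, TPR) point; a method whose points lie closer to the upper left corner, with high TPR and low FPR, corresponds to a better Receiver Operating Characteristic curve.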
Figure 4.17 The result of the "False Positive Rate" comparison. 6 of
the 27 images segmented using independent component
analysis have a lower "False Positive Rate" than with the
thresholding method, but the other results have nearly the
same "False Positive Rate". This means that, in terms of
specificity, the segmentation method using independent
component analysis does not improve on the thresholding
method.
Table 4.4 The result of the paired t-test on the "False Positive Rate"
at the 0.05 significance level. The p value is higher than the
significance level. This means that there is no significant
difference between the independent component analysis method
and the thresholding method.
                                 False Positive Rate of   False Positive Rate of
                                 Independent Component    Threshold
                                 Analysis
Mean                             0.984074074              0.999259259
Variance                         0.00245584               7.12251E-06
Number of samples                27                       27
Significance level               0.05
Degrees of freedom               26
p(T<=t) value of paired t-test   0.12460627
Finally, the mislabelling rate of the independent component analysis
method was compared with that of the thresholding method. The
mislabelling rate, which is defined in equation (3.13), is a
combination of the "True Positive Rate" and the "False Positive Rate".
There is always a trade-off between preserving details and suppressing
noise, which is reflected in the mislabelling rate. If there are too
many details in the segmented image, the error of each region may be
smaller. But since many small regions are formed and the number of
regions is large, the mislabelling rate is large, which indicates that
the segmentation result is not good. Figure 4.18 shows the result of
the mislabelling rate comparison. In Figure 4.18, only 2 of the 27
cases using independent component analysis have a higher mislabelling
rate than the thresholding method, and the remaining cases have a much
smaller rate. This means that although the result of the "False
Positive Rate" comparison is not good, the independent component
analysis method gives a much better result overall than the
thresholding method. A paired t-test on the mislabelling rate was also
performed at the 0.05 significance level, and the p value of the test
(4.87E-08) was much lower than the significance level. Table 4.5 shows
the result of the paired t-test on the mislabelling rate.
Figure 4.18 The result of the mislabelling rate comparison. Only 2 of
the 27 cases using independent component analysis have a
higher mislabelling rate than the thresholding method. The
remaining cases have a much smaller mislabelling rate. This
means that although the result of the "False Positive Rate"
comparison is not good, the independent component analysis
method gives a much better result overall than the
thresholding method.
Table 4.5 The result of the paired t-test on the mislabelling rate at
the 0.05 significance level. The p value is much lower than the
significance level. This means that although the result of the
"False Positive Rate" comparison is not good, the independent
component analysis method gives a much better result overall
than the thresholding method.
                                 Mislabelling Rate of     Mislabelling Rate of
                                 Independent Component    Threshold
                                 Analysis
Mean                             1.069558847              9.254604451
Variance                         13.4430021               11.47477914
Number of samples                27                       27
Significance level               0.05
Degrees of freedom               26
p(T<=t) value of paired t-test   4.86657E-08
CHAPTER 5
CONCLUSION
In this dissertation, an automatic medical image segmentation method
using independent component analysis was demonstrated. The
performance of this method was evaluated using PE and UMA values. It
was also compared with the general thresholding method using the TPR
(True Positive Rate), the FPR (False Positive Rate), and the
mislabelling rate. For the test data, all of the results are close to
the original data. The segmentation method using independent component
analysis has a TPR of over 95 percent, an FPR of 1 percent, and a
mislabelling rate near 1 percent. This means that the automatic method
demonstrated in this dissertation gives good results. The segmentation
method using independent component analysis offers several distinct
advantages over other segmentation methods. First, no a priori
information about the region to be segmented is needed before
segmentation. Second, the independent component analysis method gives
good spatial resolution compared to the general threshold method;
using this method, finer details in medical images can be
discriminated.
Medical image segmentation is the first step of 3-dimensional medical
image reconstruction, which is used for diagnosis, treatment,
preoperative planning, and outcome simulation for various
interventional options. However, poor segmentation results have been a
major obstacle to 3-dimensional medical image reconstruction. The
segmentation method described in this dissertation efficiently
segmented detailed parts of medical images, and it can therefore
improve 3-dimensional medical image reconstruction.
Independent component analysis has many potential application areas,
including blind separation of electroencephalographic and
magnetoencephalographic data, as well as feature extraction and
analysis of natural images. Blind source separation can be applied to
noise reduction in biomedical signals, for example, extraction of
ocular noise and 60 Hz artifacts from EEG, fetal monitoring signal
analysis, and EKG signal analysis. Independent component analysis
relies on several model assumptions, which can be applied to each of
the cases described above. However, these assumptions may be
inaccurate or incorrect. Finding and formulating suitable model
assumptions will be an interesting research area.
RE F E RE N CE S
[1] J . Besdek, L. Hall, and L. Clarke, "Review of MR image
segmentation techniques using pattern recognition ," Medical Physics,
vol.20, no.4, pp . 1033- 1048, 1993
[2] G. Wang, M . W . Vannier , M . W . Skinner , W . A . Kalender , A .
Polacin and D. R. Ketten,“"Unwrapping cochlear implants by spiral
CT ,"” IEEE T ransactions of Biomedical Engineering , vol.43, no.9,
pp .891- 900, 1996
[3] H . L. Seldon, "T hree- dimensional reconstruction of temporal bone
from computed tomographic scans on a per sonal computer ,"” Arch.
Otolaryngol. Head Neck Surg ., vol.117, pp .1158- 1161, 1991
[4] H . T akahashi and I. Sando, "Computer - aided 3- D temporal bone
anatomy for cochlear implant surgery ,"” Laryngoscope, vol.100,
pp .417- 421, 1990
[5] R. Frankenthaler , V . Moharir , R. Kikinis, P . V . Kipshagen, F . Jolesz,
C. Umans, and M . P . Fried, "Virtual Otoscopy, Computers in
Otolaryngology ," vol.31, pp.383- 392, 1998
[6] S . K. Yoo, G. Wang, J . T . Rubinstein , M . W . Skinner and M . W .
Vannier , "T hree- dimensional modelling and visualization of the
cochlea on the internet ," IEEE T ran . Info. T ech. in Biomed., June,
2000
- 80 -
[7] T . Himi, A . Kataura, M . Sakata , Y. Odaw ara, J . Satoh , and M .
Sawaishi, "T hree- dimensional imaging of the temporal bone using a
helical CT scan and it s application in patient s with cochlear
implantation ," ORL; Journal of Oto- Rhino- Laryngology & its related
specialt ies , vol.58, pp.298- 300
[8] C. Yuan, E . Lin, J . Millard, and J . Hwang, "Closed contour edge
detection of blood vessel lumen and outer wall boundaries in
black - blood MR images," Magnetic Resonance Imaging , vol.17, no.2,
pp .257- 266, 1999
[9] D. J . Williams, and M . Shah, "A fast algorithm for active contour s
and curvature estimation ," CVGIP : Image Under standing , vol.55, no.1,
pp .14- 26, 1992
[10] G. Xu, E . Segaw a, and S . T suji, "Robust active contours with
insensitive parameter s," Pattern Recognition , vol.27, no.7, pp .879- 884,
1994
[11] L. D. Cohen, "On active contour models and balloons," CVGIP :
Image Under standing , vol.53, no.2, pp.211- 218, 1991
[12] M . Kass, A . Witkin and D. T erzopoulos, "Snakes : active contour
models ," Int . J . Comput . Vision , vol.1, pp.321- 331, 1987
[13] J . Ivins , and J . Porrill, "Statistical snakes : active region models ,"
Image and Vision Computing , vol.13, no.5, pp .431- 438, 1995
[14] M . Sonka, W . Park , and E . A . Hoffman, "Rule- based detection of
intrathoracic airw ay trees," IEEE T rans . Med. Imag ., vol.15, no.3,
pp .314- 326, 1996
- 81 -
[15] K. H . Höhne and W . A . Hanson, "Interactive 3D segmentation of
MRI and CT volumes using morphological operations ," J . Comput .
Assist . T omogr ., vol.16, no.2, pp.285- 294, 1992.
[16] P . Salviroonporn , A . Robatino, J . Zahajszky , R. Kikinis and F . A .
Jolesz, "Real- time interactive three- dimensional segmentation ," Acad.
Radiol., vol.5, no.1, pp .49- 56, 1998
[17] A . J . Bell and T . J . Sejnow ski, "Edges are the ' independent
components ' of natural scenes," In Advances in neural information
processing systems 9, pp.831- 837, Cambridge: MIT Press , 1997
[18] A . J . Bell and T . J . Sejnow ski, "T he 'independent component s ' of
natural scenes are edge filter s," Vision Research , vol.37, no.23,
pp .3327- 3338, 1997
[19] A . J . Bell and T . J . Sejnow ski, "An information- maximization
approach to blind separation and blind deconvolution ," Neural
Computation , vol.7, no.6, pp.1129- - 1159, 1995
[20] S . Amari, A . Cichocki, and H . H . Yang, "A new learning algorithm
for blind source separation ," In Advances in Neural Information
Processing 8, pp .757- 763, Cambridge: MIT Press, 1996
[21] A . Hyvärinen and E . Oja, "Simple neuron models for independent
component analysis," International Journal of Neural Sy stem s, vol.7,
no.6, pp.671- 687, 1996
[22] B. A . Olshausen and D. J . Field, Spar se coding with an
overcomplete basis set : A strategy employed by V 1?," Vision
Research, vol.37, pp .3311- 3325, 1997
- 82 -
[23] B. A . Olshausen and D. J . Field, "Emergence of simple- cell
receptive field properties by learning a sparse code for natural
images ," Nature, vol381, 607- 609, 1996
[24] H . Barlow , "Unsupervised learning ," Neural Computation , vol.1,
pp .295- 311, 1989
[25] H . Barlow , "What is the computational goal of the neocortex ?" In
C. Koch, editor , Large scale neuronal theories of the brain , pp.1- 22,
Cambridge: MIT Press, 1994
[26] T e- Won Lee, Indep endent Comp onent A naly s is : Theory and
A pp lications , Kluwer Academic Publisher s, 1998
[27] P . Comon, "Independent component analysis - a new concept ?"
Signal Processing , vol.36, pp.287- 314, 1994
[28] M . S . Lewicki and T . Sejnow ski, "Learning overcomplete
representation ," Neural Computation, vol.12, no.2, pp.337- 365, 2000
[29] D. J . Field, "What is the goal of sensory coding?" Neural
Computation , vol.6, pp .559- 601, 1994
[30] R. Linsker , "Self- organization in a perceptual network ," Computer ,
vol.21, no.3, pp .105- - 17, 1988.
[31] S . Haykin , N eural N etworks, A Comp rehens ive F oundation ,
Prentice- Hall, 1994
[32] I. T . Jolliffe, P rincip al Comp onent A naly s is , Springer - Verlag , 1986
[33] M . E . T ipping and C. M . Bishop, "Probabilistic Principal Component
Analysis," Journal of the Royal Statistical Society , Series B, 61, Part
3, pp.611- 622, 1999
- 83 -
[34] J . P . Nadal and N. Parga, "Non - linear neurons in the low noise
limit : a factorial code maximizes information transfer ," Netw ork,
vol.5, pp.565- 581, 1994
[35] J . F . Cardoso and B. Laheld, "Equivalent adaptive source
separation ," IEEE T rans . on Signal Processing , vol.45, no.2,
pp .434- 444, 1996
[36] T . W . Lee, M . Girolami, A . J . Bell and T . J . Sejnow ski, "A
Unifying Information - theoretic Framework for Independent
Component Analysis," Computers & Mathematics with Applications,
vol.31, no.11, pp .1- 21, 2000
[37] J . F . Cardoso, "Blind signal processing : statistical principles,"
Proceedings of IEEE, vol.86, no.10, pp .2009 - 2025, 1998
[38] Z. H . Cho, J . P . Jones and M . Singh, F oundations of M edical
Imag ing , John Wiley & Sons , Inc. 1993
[39] R. C. Gonzalez and R. E . Woods, D ig ital Im ag e P rocess ing ,
Addison - Wesley Publishing Company , 1992
[40] J . S . Lim, Two- D im ens ional S ignal and Imag e P rocess ing ,
Prentice- Hall, 1990
[41] R. Crane, A s imp lif ied app roach to Imag e P rocess ing ,
Prentice- Hall, 1997
[42] N. Lu, F ractal Im ag ing , Academic Press, 1997
[43] E . Gose, R . Johnsonbaugh and S . Jost , Pattern Recognition and
Image Analysis , Prentice- Hall, 1996
- 84 -
[44] C. K. Chow and T . Kaneko, "Automatic boundary detection of the
left ventricle from cineangiograms," Computers and Biomedical
Research, vol.5, no.4, pp .388- 410, 1972
[45] R. Pichumani, Cons truction of a Three - D im ens ional Geom etric
M odel F or S egm entation and Vis ualiz ation of Cervical Sp ine
Imag es , Ph . D. Dissertation , 1997
[46] H . C. Sox Jr ., M . A . Blatt , M . C. Higgins, and K. I. Marton ,
M edical D ecis ion M ak ing . Butterw orths, 1988
[47] T . A . Russ , "Using hindsight in medical decision making, "
Computer Methods and Program s in Biomedicine, vol.32, no.1,
pp .81- 90, 1990
[48] J . T . Bushberg , J . A . Seibert , E . M . Leidholdt Jr . and J . M . Boone,
T he Essential Physics of Medical Imaging , Willians & Wilkins, 1994
[49] J . Liu and Y. H . Yang, "Multiresolution Color Image Segmentation ,"
IEEE T rans . on Pattern Analysis and Machine Intelligence, vol.16,
no.7, 1994
[50] A . Papoulis , P robability , R andom Variables, and S tochas tic
P rocesses , McGraw - Hill, 1991
[51] R. W . Preisendorfer , P rincip al Comp onent A naly s is in M eteorology
and Oceanog rap hy , Elsevier , 1988
[52] Y. J . Zhang, "A survey on evaluation methods for image
segmentation," Pattern Recognition , vol.29, no.8, pp .1335- 1346, 1996
[53] M . D. Levine and A . Nazif, "Dynamic measurement of computer
generated image segmentation ," IEEE T rans . PAMI- 7, pp.155- 164,
1985
- 85 -
[54] J . S . Weszka and A . Rosenfeld, "T hreshold evaluation techniques,"
IEEE T rans . SMC- 8, pp .622- 629, 1978
[55] A . M . Nazif and M . D. Levine, "Low level image segmentation : an
expert sy stem ," IEEE T rans . PAMI- 6, pp.555- 577, 1984
[56] C. N . Graaf, A . S . E . Koster , K. L. Vincken and M . A . Viergever ,
"Validation of the interleaved pyramid for the segmentation of 3D
vector images ," Pattern Recognition , vol.15, pp .467- 475, 1994
[57] S . U. Lee, S . Y. Chung and R. H . Park , "A comparative
performance study of several global thresholding techniques for
segmentation," CVGIP , vol.52, pp .171- 190, 1990
[58] Vikram Chalana and Yongmin Kim , "A Methodology for Evaluation
of Image Segmentation Algorithm s on Medical Images," SPIE
vol.2710, pp.178- 189, 1996
- 86 -
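KOREAN ABSTRACT
A Study of Automatic Medical Image Segmentation using
Independent Component Analysis
배수현
Graduate School, Yonsei University
Graduate Program in Biomedical Engineering
Electrical and Electronic Engineering
Medical image segmentation is a method of dividing a medical image
into several tissues with common properties, as required for
computer-aided diagnosis and visualization. This dissertation studied
an automatic medical image segmentation method using Independent
Component Analysis. Independent Component Analysis solves the Blind
Source Separation problem by building a linear system that transforms
input data, for which no prior information is available, into
statistically independent output data.
In this dissertation, the automatic medical image segmentation method
using Independent Component Analysis was applied to test data
synthesized for performance evaluation and to CT images. To evaluate
the performance of the proposed method, the Probability of Error (PE)
and Ultimate Measurement Accuracy (UMA) values were measured. In
addition, to compare the performance of the automatic segmentation
method with that of a general segmentation method, the sensitivity
(True Positive Rate), the specificity (1 - False Positive Rate), and
the mislabelling rate were measured, and the paired t-test was applied
to the evaluation results to verify the statistical significance of
the experimental results. Under the assumption that a medical image is
composed of several statistically independent organs, the automatic
segmentation method using Independent Component Analysis was able to
segment each part of the medical image accurately.
The results obtained in this study are as follows.
(1) For the test data used in this dissertation, most of the PE and
UMA values were close to 0. This means that the segmentation
results of the proposed method nearly coincide with the original
data from which the test data were composed.
(2) When the proposed method was applied to CT images, the TPR (True
Positive Rate) was above 95% and the FPR (False Positive Rate) was
below 1%. The mislabelling rate was about 1%. This means that when
the proposed method is applied to medical image segmentation, the
region to be segmented is segmented almost exactly.
(3) An analysis using the paired t-test at the 5% significance level
showed that the proposed method gives better results than the
general image segmentation method.
Key words: segmentation, Independent Component Analysis, medical
image, independence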