Technical Report
FLLL–TR–0210
Dimension Reduction: A Comparison of Methods with
respect to Fault Diagnosis of Engine Test Bench Data
Edwin Lughofer
Fuzzy Logic Laboratorium Linz-Hagenberg
e-mail [email protected]
Werner Groissböck
Fuzzy Logic Laboratorium Linz-Hagenberg
e-mail [email protected]
Johannes Kepler Universität Linz
Institut für Algebra, Stochastik und
wissensbasierte mathematische Systeme
A-4040 Linz
Austria
Fuzzy Logic Laboratorium Linz-Hagenberg
FLLL
Softwarepark Hagenberg
Hauptstrasse 99
A-4232 Hagenberg
Austria
1 Motivation
In many real-world data analysis, data transformation and data processing applications in industry, the question arises of how to deal with a manageable data dimension (for example the number of columns in a data matrix). In a number of situations it is therefore useful or even indispensable to first reduce the dimension of the data, keeping as much of the original information as possible, and then feed the reduced data into the system.
Situations in which dimension reduction algorithms are urgently needed include:
Too small a number of data records for a given input dimension:
A well-known and widely used rule of thumb for how many data records should be available to apply data analysis methods to a given number n of measured channels is n^2. Of course, this is just a rough estimate and has to be adapted to the needs of the concrete method (see [1]). If too few data points are used for data analysis methods, the probability of inaccurate or even unstable results increases.
The empty space phenomenon:
The curse of dimensionality refers to the fact that the sample size needed to estimate a function of several variables to a given degree of accuracy grows exponentially with the number of variables (= number of dimensions). A related fact, responsible for the curse of dimensionality, is the so-called empty space phenomenon: high-dimensional spaces are inherently sparse. Consider, for example, a multi-dimensional lattice over the input space and a fixed number of data points characterizing the input. Then the density of data points in the cuboids formed by the lattice decreases as dimensions are added. For practical usage there is a lower bound on how many data points should lie in one cuboid, and this bound is commonly taken to be 10. Thus, in a multi-dimensional lattice of dimension dim, 10^dim data points are needed (see also [1] or [2]).
Visualization of data clouds and data models:
In scientific applications of data analysis and data processing it is highly welcome or even indispensable, first, to visualize the data in order to get a first impression of its nature and, second, to visualize the empirical models which describe the data, in order to verify the model generation methods. For this purpose dimension reduction is also needed, because only models of up to 3 dimensions can be visualized directly; 4- or even 5-dimensional models can be prepared for visualization by setting 1 or 2 dimensions to fixed values, but this only simulates a visualization, so that information cannot easily be read off from the resulting curves, points, etc.
Fast computation time for data analysis methods:
For the purpose of verification and validation of results achieved by theoretical research it is desirable to use as much data as possible. This does not hold for real-time applications, where for example data models need to be computed online. The computation time of some data processing algorithms even grows exponentially with the dimension of the data. Hence, dimension reduction is needed to push the computation time below the
required maximum. Local regression methods, for instance, which build up models on the "check side" from the current test data points, take a long time to reach a plausibility decision if the whole set of dimensions is taken as input space. For a test run checking the speed of the method and the improvement gained by dimension reduction, a 200-dimensional data set with 1534 data points was chosen and taken as input to a local regression method. The calculation was performed in MATHEMATICA under Windows 2000 on a Pentium II 366 MHz processor with 128 MB RAM. The following results were achieved:

Computation time for the 200-dimensional input data set: 20.23 seconds
Computation time after reducing the data dimension to 97: 3.17 seconds

It can easily be seen that the computation time was reduced by nearly a factor of 7, whereas the dimension was only reduced by roughly a factor of 2.
Complexity reduction of data analysis methods:
Complexity reduction here means above all reducing the memory required by a data processing procedure, in order to make an algorithm more efficient or applicable at all. For example, the so-called RENO (REgularized Numerical Optimization) approach (see [3], [4], [5]), which builds up rule bases for fuzzy inference systems by forming all possible combinations of the fuzzy sets defined over each input dimension, was applied to data with 10 input dimensions and 5 fuzzy sets per dimension. This configuration yields 5^10 = 9765625 different rules and produced a virtual memory overflow during the computation of RENO on a PC with 512 MB RAM.
In Figure 1 the workflow of a data analysis process using a dimension reduction step is shown.

Figure 1: General Dimension Reduction Process
2 Dimension Reduction Techniques - Theoretical Aspects
Motivated by the previous chapter, several dimension reduction techniques were studied and applied to data coming from engine test stations during an industrial project. These techniques are described in the sections below. It should also be mentioned that this is not the complete current state of the art in dimension reduction algorithms; a full account would include further techniques, for example:
Projection pursuit
Generalized additive models
Self-organizing maps (Kohonen maps)
Generative topographic mapping
See also [2]
2.1 Neglecting Redundancy
A substantial contribution to dimension reduction can be made by exploiting redundant data vectors, often called data channels. Redundancy is information which exists several times independently inside the system and therefore shows up in the columns of the data matrix. For removing redundancy from a data set obtained from an engine test bench there is no need to distinguish between redundancy among measured channels and redundancy among calculated channels.
Redundancy between 2 data channels can simply be discovered by using correlation coefficients, such as the empirical correlation coefficient for identifying redundancy reflecting a linear relationship:
r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}    (1)
or by the Spearman rank correlation coefficient for identifying redundancy reflecting a monotonic relationship:
r_{xy} = \frac{\sum_{i=1}^{n}\left(Rg(x_i) - \frac{n+1}{2}\right)\left(Rg(y_i) - \frac{n+1}{2}\right)}{\sqrt{\sum_{i=1}^{n}\left(Rg(x_i) - \frac{n+1}{2}\right)^2 \sum_{i=1}^{n}\left(Rg(y_i) - \frac{n+1}{2}\right)^2}}    (2)
where x_i and y_i for i = 1, ..., n are the measurements of the 2 channels, \bar{x} and \bar{y} the mean estimators of the 2 channels, and Rg(x_i), Rg(y_i) the ranks of the i-th measurements in the sorted data columns x_sort and y_sort. If now

|r_{xy}| \geq r_{red}    (3)

where r_red is slightly smaller than 1, a redundancy is identified.
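As an illustration, the following Python sketch (not part of the original tool chain; the channel names, the function name and the threshold value are only examples) flags redundant channel pairs of a data matrix according to criterion (3), using either the empirical correlation coefficient (1) or the Spearman rank correlation coefficient (2):

import numpy as np
from scipy.stats import spearmanr

def redundant_pairs(data, names, r_red=0.99, method="pearson"):
    """Return all channel pairs whose |correlation| exceeds r_red, cf. (3)."""
    pairs = []
    for i in range(data.shape[1]):
        for j in range(i + 1, data.shape[1]):
            x, y = data[:, i], data[:, j]
            if method == "pearson":
                r = np.corrcoef(x, y)[0, 1]      # empirical correlation, eq. (1)
            else:
                r, _ = spearmanr(x, y)           # rank correlation, eq. (2)
            if abs(r) >= r_red:
                pairs.append((names[i], names[j], float(r)))
    return pairs

# toy example with two nearly redundant channels (VE and VE_H)
rng = np.random.default_rng(0)
ve = rng.normal(size=758)
data = np.column_stack([ve, 2.0 * ve + 0.001 * rng.normal(size=758),
                        rng.normal(size=758)])
print(redundant_pairs(data, ["VE", "VE_H", "NOX"], r_red=0.99))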
We tried out this redundancy detection on measured data from a diesel engine containing 101 channels and 758 samples and obtained the results shown in Figure 2. So, for example, if somebody wants to build an empirical model with input channels VE, GL, QLUFT, NOX and NOX_S, the dimension can simply be reduced from 5 to 3 by using the redundancy table and neglecting QLUFT and NOX_S.
One problem during the determination of redundant channels is that redundancy may be detected although it is merely spurious (for example in the case of 2 constant channels). Fortunately, as channels are mostly not exactly constant, this circumstance does not appear very often. Moreover, a single-channel analysis can check whether a (nearly) constant channel appears in the data matrix, so that it may be neglected.
Channel    Redundant with
VE         VE_H
TA31_1     TA31_2, TAZ2, TAZ3
TA31_2     TAZ4, TAZ5
GL         GAH, QLUFT, GLF
NOX        NOX_S
GNOX       GNOX_S

Figure 2: Example of a redundancy causing linear behaviour (left) and the list of channel redundancies (right)
2.2 T-Values
Another approach to channel reduction is based on so-called T-values, which denote the degree of influence of an input channel when building an empirical model for the reproduction of another channel (the output channel). In order to compute the T-values, first a regression with linear or also nonlinear regressors has to be carried out for estimating the parameters β_i, i = 1, ..., k, where k is the number of regressors. (Note: for static linear regression without any time-discrete dynamics the number of regressors k is equal to the number of channels m; for a detailed discussion see [6].) Then the T-value can be calculated by:
T_i = \frac{\beta_i}{s\sqrt{d_i}}    (4)
where s is an estimator of the perturbation in the regression analysis and can be calculated by

s = \sqrt{\frac{SS_{reg}}{n - (m + 1)}}    (5)

with SS_{reg} = \sum_{i=1}^{n}(\hat{y}_i - y_i)^2 the sum of squared errors (residuals) of the regression and d_i the i-th diagonal element of the matrix (x^t x)^{-1}.
The following "influence" matrix with the 4 channels having the greatest influence and each
row containing the channel which is reconstructed (output channel), could be obtained by using
above formulas on a data set coming from a large diesel engine:
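A minimal numerical sketch of equations (4) and (5), assuming a static linear regression with an intercept (so that m input channels yield m + 1 estimated parameters) and using only numpy, could look as follows; the function name and the synthetic data are purely illustrative:

import numpy as np

def t_values(X, y):
    """T-values of the linear regression y ~ X, following eqs. (4) and (5)."""
    n, m = X.shape
    Xr = np.column_stack([np.ones(n), X])          # regressor matrix with intercept
    beta, *_ = np.linalg.lstsq(Xr, y, rcond=None)  # estimated parameters beta_i
    s = np.sqrt(np.sum((y - Xr @ beta) ** 2) / (n - (m + 1)))   # eq. (5)
    d = np.diag(np.linalg.inv(Xr.T @ Xr))          # diagonal of (x^t x)^-1
    return beta / (s * np.sqrt(d))                 # eq. (4)

# synthetic example: the first channel dominates the reproduction of y
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 4.0 * X[:, 0] + 0.1 * X[:, 2] + 0.05 * rng.normal(size=500)
print(np.round(t_values(X, y)[1:], 2))             # t-values of the 3 input channels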
Another approach to gauge the degree of influence of a channel in an empirical model using the regression estimators β_i is to perform a statistical test with hypothesis H_{0i}: β_i = 0 and rejection domain

\frac{|H\beta - \zeta_0|}{\sqrt{n\sigma^2\,(H(x^t x)^{-1} H^t)}}\,\sqrt{n - (m + 1)} \;\geq\; t_{1-\alpha/2}(n - (m + 1))    (6)

where x is the data matrix containing all data channels and t_{1-α/2} the (1-α/2)-quantile of the Student distribution. A large value of the left-hand side of the above inequality (equivalently, a small p-value) leads to a rejection of the hypothesis and therefore indicates that β_i ≠ 0 and hence an influence of the corresponding regressor, in dependency of α: with the parameter α the rejection domain can be adjusted in such a way that inequality (6) holds for almost all β, not only for those whose absolute values are significantly greater than 0.
Output Ch   Ch 1      t-value   Ch 2      t-value   Ch 3      t-value   Ch 4      t-value
N_MI        POEL      30.18     LBDREZ    23.6      P11       20.12     LBDF2_1   19.46
PE_MI       MD_MI     8620      TA32      2.5301    TP        2.44      P31       2.405
BE          PHC       70.12     PNOX_S    69.48     PSOOT     29.57     KNOX_S    8.22
VE          VE_H      134.4     STOECH    64.4      X         9.59      KLFS      8.24
VE_H        VE        134.4     STOECH    55.02     X         10.17     P_MI      9.57
RW_S1       GSOOT     30.49     GNOX_S    13.56     LBDREZ    11.67     SOOT      8.37
LAMBDA      KHC       38.05     HC        26.25     GHC       19.71     KNOX_S    16.32
LBDREZ      N_MI      23.6      TA41      22.8      BH        16.43     LAMBDA    16.04
LBDL        LBDF2_1   53.6      TL2_1     44.78     QLUFT     29.81     NOX_S     20.7

Table 1: Table of channels having greatest influence for reproducing output channels plus t-values (= influence levels) of these channels
The information stated in Table 1 can be used as a strategy to build models with a smaller number of data channels (=> lower-dimensional models); for example, in the case of reproducing channel PE_MI (second line), MD_MI (with an influence value of 8620!) plays the crucial role and the others can more or less be neglected:
Input:  train data matrix, threshold for the number of necessary channels
Output: sorted channel list (sorted from top to bottom by the
        degree of influence) for each channel -> channel matrix

Step 1: Designate "well-measured" channels in the data matrix
For i = 1 to number_of_well_measured_channels
    Step 2: Perform a multi-dimensional regression for channel i
    Step 3: Compute the t-values (as described above)
    Step 4: Sort the t-values from top to bottom
    Step 5: Choose a threshold (either computed from the actual
            t-values or fixed)
    Step 6: Store the channels in the channel list according to the threshold
End For
End
It should be mentioned here that Step 1 is more or less a simple pre-filtering step and is described in [7]. Step 5 can be carried out in many different ways, some of which are stated in chapter 3. Step 2 is described in [7], [6] and [9].
This channel matrix, which denotes the most important channels for building up a model for each channel, can be taken as input by any data-based model generation algorithm (a variety of model generation and identification algorithms is demonstrated in [7] and [8]).
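The following Python sketch (our own illustration, not the original implementation) outlines Steps 2-6 of the above procedure for a single output channel, using statsmodels to obtain the t-values and a fixed threshold on the number of channels to keep:

import numpy as np
import statsmodels.api as sm

def select_channels(data, names, output, n_keep=5):
    """Return the n_keep input channels with the largest |t-values|
    for reproducing the given output channel (Steps 2-6)."""
    y_idx = names.index(output)
    in_idx = [i for i in range(len(names)) if i != y_idx]
    X = sm.add_constant(data[:, in_idx])          # Step 2: regression with intercept
    fit = sm.OLS(data[:, y_idx], X).fit()
    tvals = np.abs(fit.tvalues[1:])               # Step 3: t-values (skip intercept)
    order = np.argsort(tvals)[::-1]               # Step 4: sort from top to bottom
    return [(names[in_idx[i]], float(tvals[i]))   # Steps 5/6: keep the best n_keep
            for i in order[:n_keep]]

# usage on a synthetic data matrix with hypothetical channel names
rng = np.random.default_rng(2)
data = rng.normal(size=(800, 4))
data[:, 0] = 3.0 * data[:, 1] - 0.5 * data[:, 3] + 0.1 * rng.normal(size=800)
print(select_channels(data, ["N_MI", "POEL", "P11", "LBDF2_1"], output="N_MI", n_keep=2))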
2.3 Principal Component Analysis (PCA)
Principal component analysis is possibly the dimension reduction technique most widely used in practice, perhaps due to its conceptual simplicity and to the fact that relatively efficient algorithms exist for its computation.
Let's start with a data matrix A containing channels (= columns) x_1, x_2, ..., x_n, let µ_x = E{x} denote the mean values of the channels, and let the matrix

C_x = E\{(x - \mu_x)(x - \mu_x)^t\}    (7)

be the covariance matrix of the data set. The components of C_x, denoted by c_ij, represent the covariances between the data channels x_i and x_j. The covariance matrix is, by definition, always symmetric.
From such a symmetric matrix it is possible to calculate an orthogonal basis by finding its eigenvalues λ_i and eigenvectors e_i, which are the solutions of the equation

C_x e_i = \lambda_i e_i, \quad \forall i = 1, ..., n    (8)
For simplicity it can be assumed that the λ_i are distinct. These values can be found as the solutions of the characteristic equation

|C_x - \lambda I| = 0    (9)

where I is the identity matrix having the same order as C_x and |.| denotes the determinant of the matrix. If the data vector has n components (= dimensions), the characteristic equation is of order n. This is easy to solve only if n is small. Computing eigenvalues and the corresponding eigenvectors is a non-trivial task (the zeros of a characteristic polynomial of degree n have to be found), and many methods exist which can perform this in O(n^3): singular value decomposition, Cholesky decomposition, etc. (see [10] for further details).
By ordering the eigenvectors in the order of descending eigenvalues (largest first), one can
create an ordered orthogonal basis with the first eigenvector having the direction of largest variance
of the data. In this way, we can find directions in which the data set has the most significant
amounts of energy and neglect the others ⇒ dimension reduction.
Figure 3 shows a 2-dimensional data set and how these 2 dimensions can be represented by just one principal component by applying principal component analysis.
It can easily be seen that the first eigenvector, which has the largest eigenvalue, points in the direction of largest variance (right and upwards), whereas the second eigenvector is orthogonal to the first one (pointing left and upwards). In this example the first eigenvalue, corresponding to the first eigenvector, is λ_1 = 0.1737 while the other eigenvalue is λ_2 = 0.0001. By comparing the eigenvalues to the total sum of eigenvalues, we can get an idea of how much of the energy is concentrated along a particular eigenvector. In this case, the first eigenvector contains almost all of the energy, therefore the data could be well approximated with a one-dimensional representation.

Figure 3: Principal components for 2 data channels
Instead of using all eigenvalues and consequently all eigenvectors in a transformation matrix B, which maps data points from the original coordinate system into the coordinate system spanned by the principal components, a partial subspace consisting of the eigenvectors corresponding to the largest K eigenvalues can be chosen and represented as a matrix B_K. The transformation of arbitrary input points from the original space to the "principal component space" can then be performed using the operation

y = B_K (x - \mu_x)    (10)

This has to be done, for example, if models are generated based on the principal component vectors and new data points should be checked against these models.
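A compact numpy sketch of equations (7)-(10), computing the covariance matrix, its eigendecomposition and the projection y = B_K(x - µ_x) (function and variable names are our own), might look like this:

import numpy as np

def pca_transform(A, K):
    """Project the rows of data matrix A onto the K largest principal components."""
    mu = A.mean(axis=0)                        # channel means mu_x
    C = np.cov(A - mu, rowvar=False)           # covariance matrix C_x, eq. (7)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues/-vectors, eq. (8)
    order = np.argsort(eigvals)[::-1]          # descending eigenvalues
    B_K = eigvecs[:, order[:K]].T              # K x n transformation matrix
    energy = eigvals[order[:K]].sum() / eigvals.sum()
    return (A - mu) @ B_K.T, energy            # eq. (10) applied to every data row

# reduce a 5-dimensional synthetic data set to 2 principal components
rng = np.random.default_rng(3)
A = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))
Y, kept = pca_transform(A, K=2)
print(Y.shape, f"{kept:.1%} of the total variance retained")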
If the data is concentrated in a linear subspace, this provides a way to compress the data without losing much information while simplifying the representation. By picking the eigenvectors having the largest eigenvalues we lose as little information as possible in the mean-square sense. One can, e.g., choose a fixed number of eigenvectors and their respective eigenvalues and get a consistent representation, or abstraction, of the data. This preserves a varying amount of the energy of the original data. Alternatively, we can fix approximately the amount of energy to be preserved and allow a varying number of eigenvectors and their respective eigenvalues. This in turn gives an approximately consistent amount of information at the expense of representations that vary with regard to the dimension of the subspace.
Besides, PCA offers a convenient way to control the trade-off between losing information and simplifying the problem at hand by reducing dimensions.
Recapitulating, 3 side effects can be achieved:
Orthogonalisation of the components of the input vector
Ordering of the resulting orthogonal components (principal components) so that those with the largest variation come first
Elimination of those components that contribute least to the variation in the data set
Although principal component analysis yields a clear, compact mathematical description of the data through transformed channels (the principal components), compared to the channel selection approach based on regression analysis, 3 disadvantages appear in practical usage:
PCA is an unsupervised method, which means that PCA does not take into account any
output channel which should be mapped by others. Therefore, in a measurement system
containing N channels N − 1 PCAs with N − 1 input channels have to be carried out,
which probably would lead to an enormous computational effort.
As every channel appears in every principal component, faults detected through plausibility
analysis algorithms using empirical models based on principal components can hardly be
isolated by FI (=Fault Isolation) algorithms - see [12].
The number of components used for building up an empirical model has to be adjusted: too small a number causes an inaccurate model, too large a number increases the computational complexity.
For further details about special variants of PCA (for example clustered PCA, recursive PCA, etc.) see [11].
2.4 Partial Low-Dimensional Models
Another approach, which was pursued, implemented and verified in an industrial project and which is not a conventional dimension reduction method like those in the sections before, consists of selecting partial low-dimensional models out of the whole data matrix. In other words, these partial models can be seen as a cloud of low-dimensional models covering subspaces of the original input space. This approach can thus be seen as dimension reduction performed not "top-to-bottom" (as is done, for example, by principal component analysis) but rather "bottom-to-top".
From the mathematical point of view this approach is not really sophisticated; it relies purely on channel selection, either based on measures which give hints for which channel combinations data-based models should be built up, or based on expert knowledge defined by the operator. The main effort lies in the strategy of how to select the channels. For our model generation approaches (see [7], chapter 8), 2 different selection strategies were implemented:
Selecting pairs of channels by using redundancy obtained through the empirical (1) and/or Spearman rank correlation coefficient (2) and reproducing this redundancy by an empirical model.
Computing models (especially 3-dimensional fuzzy inference systems) based on a list (also called leading channel list) defined by an operator with physical expert knowledge.
The practical and also theoretical advantages of performing small partial models are the following:
Fast computation time: Although one can raise the objection that in theory up to n-choose-2 2-dimensional models may have to be built for n channels, in practice it turned out that, given a reasonable lower bound on the correlation coefficient required for building useful models, approximately 2n models were generated. Thus, if n is large, for many model generation algorithms building 2n small low-dimensional models takes much less time than computing 1 high-dimensional model taking all input channels; the reason is that the computational effort of many model generation algorithms grows exponentially with the dimension (see [7]).
Strong flexibility with respect to changes in the channel configuration: In practical use with offline training and online checking it can happen that some channels which were taken as input during offline training are no longer present during online checking, while others may have been added. In this case a plausibility check based on a model that uses all input channels present during offline training cannot be performed, while checking against those low-dimensional partial models which only consist of channels that are actually present in the online configuration is still possible and valid; therefore, with a collection of low-dimensional models a plausibility statement for a set of test data points can still be reached.
Easier fault isolation: If there are small partial models containing just a few channels and faults are detected with these models, the fault can be isolated much more easily than if there is only one model in which all channels are included (for further details see the concepts of FDI in [7], chapter 9).
Another hint for building up a model is given by the so-called multiple correlation coefficient, which can be evaluated, for example in the 3-dimensional case, via the following formula:

R_{123} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}}    (11)

where r_12, r_13 and r_23 are the correlation coefficients between the channels x_1 and x_2, x_1 and x_3, and x_2 and x_3. For higher dimensions this multiple correlation coefficient generalizes to the so-called r-squared values (for further details see [13]).
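Evaluating equation (11) only requires the three pairwise correlation coefficients; a small Python sketch (the helper name is our own choice) is:

import numpy as np

def multiple_correlation(x1, x2, x3):
    """Multiple correlation coefficient R_123 of x1 w.r.t. x2 and x3, eq. (11)."""
    r12 = np.corrcoef(x1, x2)[0, 1]
    r13 = np.corrcoef(x1, x3)[0, 1]
    r23 = np.corrcoef(x2, x3)[0, 1]
    return np.sqrt((r12**2 + r13**2 - 2.0 * r12 * r13 * r23) / (1.0 - r23**2))

# a value close to 1 hints that x1 can be modelled well from x2 and x3
rng = np.random.default_rng(4)
x2, x3 = rng.normal(size=(2, 500))
x1 = 0.8 * x2 - 0.4 * x3 + 0.2 * rng.normal(size=500)
print(round(float(multiple_correlation(x1, x2, x3)), 3))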
3 Results with respect to Fault Diagnosis
In this chapter the results of fault detection based on empirical models are compared for different dimension reduction techniques used as preliminary steps before the generation of fuzzy inference systems with the well-known MATLAB method genfis2.
Genfis2, as opposed to RENO (see [3], [4], [5]), has the great advantage of being able to deal with a dimensionality of up to 20, while RENO produces a virtual memory overflow on a machine with 256 MB RAM if the input dimension is higher than 7. Genfis2 first uses the subtractive clustering algorithm (see [16]) to determine the number of rules and the antecedent membership functions and then uses linear least squares estimation to determine each rule's consequent equations. The Gaussian-shaped fuzzy sets are obtained by projecting each cluster onto each axis. This method returns a Takagi-Sugeno-type fuzzy inference system (see [15]) structure that contains a set of fuzzy rules covering the feature space (see [17], [18]).
An approach for the fault detection process itself, i.e. the decision based on data-based models such as fuzzy inference systems whether a current point is faulty or not, is to filter the data in order to estimate a parameter which represents the deviation from the reference situation (see [8]).
Input:  train data set, check data set with rating channel
Output: detection and over-detection rate

Step 1: Define parameters
Step 2: Pre-filter the train data matrix
Step 3: Apply a special dimension reduction method to the filtered data
Step 4: Adjust parameter(s) for genfis2
Step 5: Generate the FIS with genfis2
Step 6: Evaluate the quality of the FIS
Step 7: Perform fault detection on the check data
Step 8: Calculate the detection and over-detection rate

Table 2: Complete process scheme using dimension reduction methods as preliminary steps for model generation and fault diagnosis
Additionally, an internal and an external quality measure are calculated, which both indicate the trustworthiness of a model (see [7]) and are taken into account in the whole fault diagnosis process: statements about faults coming from models with better quality are weighted higher than those obtained from worse ones.
The performance is reflected by the detection and over-detection rates for faulty points which occurred during a test procedure of a large diesel engine. The train data matrix for building up the fuzzy inference systems, containing 1810 data points (rows) with 80 channels (columns), was first pre-filtered by the outlier detection algorithms described in [7], chapter 7, and through the well-known Mahalanobis distance measure (see [14]) before being passed to the dimension reduction methods. A so-called check data set of 250 points with the same 80 channels as contained in the train data set was defined for verification and validation of the complete general process scheme as described in Table 2, including 129 faulty and 121 fault-free samples. Additionally, a so-called rating channel was placed into the check data matrix, which states for each sample whether there is a fault in it or not (1 or 0).
Step 3 of the above scheme was carried out with the following 3 different dimension reduction techniques:
Dimension reduction based on physical expert knowledge ⇒ partial low dimensional models (see above)
Principal Component Analysis with different variations of parameters
Dimension Reduction based on t-values with different variations of parameters
In Table 3 a summary of the results is stated, including comments on the choice of parameters and on the output structure of the fuzzy inference systems generated by genfis2. For the calculation of the detection and over-detection rate (Step 8 in Table 2) see [7].
1. Expert Knowledge: 12 3-dimensional models with inputs N_MI (rev) and PE (torsional) and 12 common leading channels for diesel engines; radius_cluster = 0.2; detection rate 56.2%, over-detection rate 6.0%

2. PCA: 10 10-dimensional models, taking the 10 most significant principal components and reproducing each of these 10 by the other 9; radius_cluster = 0.2; detection rate 71.32%, over-detection rate 62.28%

3. PCA: 62 5-dimensional models, taking all original channels (after filtering!) as output channels and reproducing them by the 5 most significant principal components; radius_cluster = 0.2; detection rate 46.32%, over-detection rate 0.88%

4. T-Values: 62 5-dimensional models, taking all original channels (after filtering!) as output channels and reproducing them by the 5 most significant channels; radius_cluster = 0.2; detection rate 48.53%, over-detection rate 1.75%

5. T-Values: 62 up-to-5-dimensional models, taking all original channels (after filtering!) as output channels and reproducing them by the most (up to 5) significant channels; radius_cluster = 0.2; detection rate 45.59%, over-detection rate 1.75%

6. T-Values: 62 up-to-5-dimensional models, taking all original channels (after filtering!) as output channels and reproducing them by the most (up to 5) significant channels; radius_cluster = dynamic; detection rate 50.0%, over-detection rate 1.75%

7. T-Values: 62 up-to-10-dimensional models, taking all original channels (after filtering!) as output channels and reproducing them by the most (up to 10) significant channels; radius_cluster = dynamic; detection rate 45.59%, over-detection rate 1.75%

Table 3: Performance results of fault diagnosis obtained by using different dimension reduction techniques (each line: dimension reduction method, model structure, parameters, detection rate, over-detection rate)
"Up-to-n-dimensional models" here means that the number of inputs for reproducing one output channel is restricted by n; below n the input dimension is dynamically adapted for output channel m through the following assignment:

dim_m = \left|\{t_{val} \in T_m \;|\; t_{val} \geq 0.3\,\max_i(t_{val_i}),\; t_{val_i} \in T_m\}\right|    (12)
where Tm is the set of t-values belonging to input channels for reproducing output channel m.
Moreover, the crucial genfis2 parameter radius_cluster can also be adjusted dynamically in dependency of the actual input dimension by
rad = \begin{cases} \frac{1}{n - dim_m} & dim_m < n - 1 \\ 0.7 & dim_m \geq n - 1 \end{cases}    (13)
Radius_cluster always lies between 0 and 1 and specifies the range of influence of one cluster in each data dimension as a fraction of the width of the data space. If radius_cluster is a scalar (as in our case), the scalar value is applied to all data dimensions, i.e. each cluster center will have a spherical neighborhood of influence with the given radius.
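A direct transcription of equations (12) and (13) into Python (variable and function names are ours) shows how the number of inputs and the radius_cluster parameter would be adapted for one output channel:

def dynamic_inputs(t_values_m):
    """Number of inputs kept for one output channel, eq. (12):
    count all t-values within 30% of the largest one."""
    t_max = max(t_values_m)
    return sum(1 for t in t_values_m if t >= 0.3 * t_max)

def dynamic_radius(dim_m, n):
    """Dynamically adapted radius_cluster for genfis2, eq. (13)."""
    return 1.0 / (n - dim_m) if dim_m < n - 1 else 0.7

# example: t-values of the candidate input channels for one output channel
T_m = [30.18, 23.6, 20.12, 19.46, 2.4]
n = 5                                          # upper bound on the input dimension
dim_m = dynamic_inputs(T_m)
print(dim_m, dynamic_radius(dim_m, n))         # -> 4 inputs, radius_cluster = 0.7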
4 Interpretation of Results and Conclusion
Since a too high over-detection rate poses a great danger that the operators' confidence in the complete fault detection system decreases, the dimension reduction method stated in line 6 of Table 3 should be preferred over the others; hence, the dynamisation of both the used input dimension and the genfis2 parameter radius_cluster has an essential impact on the performance. The method based on expert knowledge indeed achieves a higher detection rate, but an over-detection rate 3 to 4 times higher than that of the others is unacceptable.
Furthermore, too many fixed input dimensions (as in the case of the algorithm in line 2) explain the variances of the data in too much detail and therefore lead to a so-called overfitting of the data, which means nothing else than a too exact adaptation of the model to the data, such that new incoming fault-free points often do not fit the model ⇒ high over-detection rate.
Even though the over-detection rate can be squeezed down to almost 0 by decoupling the dynamic fuzzy expert systems from the physical a-priori knowledge and hence shifting them into the area of purely data-based modeling methods (see the functional framework figure in [7], chapter 4), which as an important side effect also reduces the parametrization effort, the detection rate of about 50% is not satisfying enough. Therefore, advanced techniques in dimension reduction and channel/variable selection will be needed in order to improve the fault detection process further.
References
[1] Edwin Lughofer, "Testdata Requirement Specification"
[2] Miguel A. Carreira-Perpinan, "A review of dimension reduction techniques", technical report
CS-96-09, Dept. Of Computer Science, University of Sheffield
[3] Josef Haslinger, Ulrich Bodenhofer, Martin Burger, "Data-Driven Construction of Sugeno Controllers: Analytical Aspects and New Numerical Results", Software Competence Center Hagenberg, 2000
[4] Martin Burger, Josef Haslinger and Ulrich Bodenhofer, "Regularized Optimization of Fuzzy
Controllers", Technical Report, SCCH-TR-0056, Software Competence Center Hagenberg,
2000
[5] H.Schwetlick and Thorsten Schütze, "Least Squares Approximation by Splines with Free
Knots", BIT 35(3):361-384, 1995.
[6] Frank E. Harrell Jr., "Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis", Springer Series in Statistics
[7] Günther Frizberg, Edwin Lughofer, Thomas Strasser, "Final Report AMPA01"
[8] Oliver Nelles, "Nonlinear System Identification - From Classical Approaches to Neural Networks and Fuzzy Models", ISBN 3-540-67369-5 Springer-Verlag Berlin Heidelberg New
York
[9] Peter Weiss, "Statistik 2"
[10] W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, "Numerical Recipes in C: The Art of Scientific Computing", Cambridge University Press, Cambridge, U.K., second ed., 1992
[11] J. Edward Jackson, "A user’s guide to principal components", ISBN: 0471622672, John
Wiley & Sons
[12] Patton, R.J. et al., "Issues of Fault Diagnosis for Dynamic Systems", Springer-Verlag, London Limited, 2000, pp. 87-114
[13] Murray R. Spiegel, "Statistik", McGraw-Hill Book Company Europe
[14] P. C. Mahalanobis, "Proc. Natl. Institute of Science of India", 2, 49 (Original reference to
Mahalanobis distance calculations.)
[15] Hung T. Nguyen, Michio Sugeno, Richard Tong and Ronald R. Yager, "Theoretical Aspects
of Fuzzy Control", ISBN 0-471-02079-6, John Wiley and Sons, Inc.
[16] Robert P. Velthuizen, Lawrence O. Hall, Laurence P. Clark and Martin L. Silbiger, "An Investigation of Mountain Method Clustering for large Data Sets", Pattern Recognition, Vol.
30, No. 7, pp. 1121-1135, 1997
[17] Ronald R. Yager and Dimitar P. Filev, "Generation of Fuzzy Rules by Mountain Clustering",
Technical Report MII-1318R, Machine Intelligence Institute, Iona College, New Rochelle,
NY 10801
[18] S. Chiu, "Fuzzy Model Identification Based on Cluster Estimation," Journal of Intelligent &
Fuzzy Systems, Vol. 2, No. 3, 1994.